Added mapreduce-python to README.

2024-03-22 13:30:56 +08:00 · 2015-04-15 15:48:33 -04:00
commit 30e48368e6
parent 30b5b68f08
2 changed files with 16 additions and 18 deletions
--- a/README.md
+++ b/README.md
@@ -5,22 +5,6 @@
 # ipython-data-notebooks
 Continually updated IPython Data Science Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, Python, and various command lines.

-## Index
-
-* [spark and hdfs](#spark)
-* [hadoop mapreduce](#aws)
-* [amazon web services](#aws)
-* [kaggle](#kaggle)
-* [scikit-learn](#scikit-learn)
-* [matplotlib](#matplotlib)
-* [pandas](#pandas)
-* [numpy](#numpy)
-* [scipy](#scipy)
-* [python](#python-data)
-* [command lines](#commands)
-* [credits](#credits)
-* [license](#license)
-
 <br/>
 <p align="center">
  <img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/spark.png">
@@ -35,6 +19,19 @@ IPython Notebook(s) demonstrating spark and HDFS functionality.
 | [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. |
 | [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |

+<br/>
+<p align="center">
+  <img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/mrjob.png">
+</p>
+
+## mapreduce-python
+
+IPython Notebook(s) demonstrating Hadoop MapReduce functionality.
+
+| Notebook | Description |
+|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
+| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Supports MapReduce jobs in Python, running them locally or on Hadoop clusters. |
+
 <br/>
 <p align="center">
  <img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/aws.png">
@@ -46,9 +43,8 @@ IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functi

 | Notebook | Description |
 |------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| [mrjob](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#mrjob) | Supports MapReduce jobs in Python 2.5+ and runs them locally or on Hadoop clusters. |
-| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file.  S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
 | [s3cmd](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3cmd) | Interacts with S3 through the command line. |
+| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file.  S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
 | [s3-parallel-put](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3-parallel-put) | Uploads multiple files to S3 in parallel. |
 | [redshift](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#redshift) | Acts as a fast data warehouse built on top of technology from massive parallel processing (MPP). |
 | [kinesis](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#kinesis) | Streams data in real time with the ability to process thousands of data streams per second. |
@@ -173,6 +169,8 @@ IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.
 * [PyCon 2015 Scikit-learn Tutorial](https://github.com/jakevdp/sklearn_pycon2015) by Jake VanderPlas
 * [Parallel Machine Learning with scikit-learn and IPython](https://github.com/ogrisel/parallel_ml_tutorial) by Olivier Grisel
 * [Think Stats](http://www.amazon.com/Think-Stats-Allen-B-Downey/dp/1449307116) by Allen Downey
+* [Spark Docs](https://spark.apache.org/docs/latest/)
+* [AWS Docs](http://aws.amazon.com/documentation/)

 ## license

--- a/images/mrjob.png
+++ b/images/mrjob.png