Added mapreduce-python to README.

Donne Martin 2015-04-15 15:48:33 -04:00
parent 30b5b68f08
commit 30e48368e6
2 changed files with 16 additions and 18 deletions


@ -5,22 +5,6 @@
# ipython-data-notebooks
Continually updated IPython Data Science Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, Python, and various command lines.
## Index
* [spark and hdfs](#spark)
* [hadoop mapreduce](#aws)
* [amazon web services](#aws)
* [kaggle](#kaggle)
* [scikit-learn](#scikit-learn)
* [matplotlib](#matplotlib)
* [pandas](#pandas)
* [numpy](#numpy)
* [scipy](#scipy)
* [python](#python-data)
* [command lines](#commands)
* [credits](#credits)
* [license](#license)
<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/spark.png">
@ -35,6 +19,19 @@ IPython Notebook(s) demonstrating spark and HDFS functionality.
| [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and well suited for machine learning algorithms. |
| [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |
<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/mrjob.png">
</p>
## mapreduce-python
IPython Notebook(s) demonstrating Hadoop MapReduce functionality.
| Notebook | Description |
|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Supports MapReduce jobs in Python, running them locally or on Hadoop clusters. |
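
To illustrate what a MapReduce job in Python looks like, here is a minimal sketch of the map, shuffle, and reduce steps for a word count, run locally in plain Python. It is not taken from the notebook and does not use any Hadoop library; the names `mapper`, `reducer`, and `run_job` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: sum all counts collected for a single word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle (group by key), and reduce in a single local process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    print(run_job(["spark and hdfs", "hadoop and spark"]))
```

On a real cluster the shuffle step is handled by the framework, so a job typically only defines the map and reduce functions.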
<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/aws.png">
@ -46,9 +43,8 @@ IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functi
| Notebook | Description |
|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [mrjob](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#mrjob) | Supports MapReduce jobs in Python 2.5+ and runs them locally or on Hadoop clusters. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [s3cmd](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3cmd) | Interacts with S3 through the command line. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [s3-parallel-put](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3-parallel-put) | Uploads multiple files to S3 in parallel. |
| [redshift](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#redshift) | Acts as a fast data warehouse built on massively parallel processing (MPP) technology. |
| [kinesis](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#kinesis) | Streams data in real time with the ability to process thousands of data streams per second. |
@ -173,6 +169,8 @@ IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.
* [PyCon 2015 Scikit-learn Tutorial](https://github.com/jakevdp/sklearn_pycon2015) by Jake VanderPlas
* [Parallel Machine Learning with scikit-learn and IPython](https://github.com/ogrisel/parallel_ml_tutorial) by Olivier Grisel
* [Think Stats](http://www.amazon.com/Think-Stats-Allen-B-Downey/dp/1449307116) by Allen Downey
* [Spark Docs](https://spark.apache.org/docs/latest/)
* [AWS Docs](http://aws.amazon.com/documentation/)
## license

BIN
images/mrjob.png Normal file

Binary file not shown.
