mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
Added mapreduce-python to README.
parent 30b5b68f08
commit 30e48368e6
34
README.md
@@ -5,22 +5,6 @@
# ipython-data-notebooks

Continually updated IPython Data Science Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, Python, and various command lines.

## Index

* [spark and hdfs](#spark)
* [hadoop mapreduce](#aws)
* [amazon web services](#aws)
* [kaggle](#kaggle)
* [scikit-learn](#scikit-learn)
* [matplotlib](#matplotlib)
* [pandas](#pandas)
* [numpy](#numpy)
* [scipy](#scipy)
* [python](#python-data)
* [command lines](#commands)
* [credits](#credits)
* [license](#license)

<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/spark.png">
@@ -35,6 +19,19 @@ IPython Notebook(s) demonstrating spark and HDFS functionality.
| [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. |
| [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |

<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/mrjob.png">
</p>

## mapreduce-python

IPython Notebook(s) demonstrating Hadoop MapReduce functionality.

| Notebook | Description |
|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Supports MapReduce jobs in Python, running them locally or on Hadoop clusters. |

<br/>
<p align="center">
<img src="https://raw.githubusercontent.com/donnemartin/ipython-data-notebooks/master/images/aws.png">
@@ -46,9 +43,8 @@ IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functi

| Notebook | Description |
|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [mrjob](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#mrjob) | Supports MapReduce jobs in Python 2.5+ and runs them locally or on Hadoop clusters. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [s3cmd](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3cmd) | Interacts with S3 through the command line. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [s3-parallel-put](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3-parallel-put) | Uploads multiple files to S3 in parallel. |
| [redshift](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#redshift) | Acts as a fast data warehouse built on top of technology from massive parallel processing (MPP). |
| [kinesis](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#kinesis) | Streams data in real time with the ability to process thousands of data streams per second. |
@@ -173,6 +169,8 @@ IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.
* [PyCon 2015 Scikit-learn Tutorial](https://github.com/jakevdp/sklearn_pycon2015) by Jake VanderPlas
* [Parallel Machine Learning with scikit-learn and IPython](https://github.com/ogrisel/parallel_ml_tutorial) by Olivier Grisel
* [Think Stats](http://www.amazon.com/Think-Stats-Allen-B-Downey/dp/1449307116) by Allen Downey
* [Spark Docs](https://spark.apache.org/docs/latest/)
* [AWS Docs](http://aws.amazon.com/documentation/)

## license

BIN
images/mrjob.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 46 KiB |