Mirror of https://github.com/donnemartin/data-science-ipython-notebooks.git (synced 2024-03-22 13:30:56 +08:00)
Re-order sections.
parent 7f7a7e7bc5
commit 84a524f4f3
README.md: 88 lines changed
@@ -12,9 +12,6 @@
 
 ## Index
 
-* [spark](#spark)
-* [mapreduce-python](#mapreduce-python)
-* [kaggle-and-business-analyses](#kaggle-and-business-analyses)
 * [deep-learning](#deep-learning)
 * [scikit-learn](#scikit-learn)
 * [statistical-inference-scipy](#statistical-inference-scipy)
@@ -22,6 +19,9 @@
 * [matplotlib](#matplotlib)
 * [numpy](#numpy)
 * [python-data](#python-data)
+* [kaggle-and-business-analyses](#kaggle-and-business-analyses)
+* [spark](#spark)
+* [mapreduce-python](#mapreduce-python)
 * [amazon web services](#aws)
 * [command lines](#commands)
 * [misc](#misc)
@@ -31,47 +31,6 @@
 * [contact-info](#contact-info)
 * [license](#license)
 
-<br/>
-<p align="center">
-<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/spark.png">
-</p>
-
-## spark
-
-IPython Notebook(s) demonstrating spark and HDFS functionality.
-
-| Notebook | Description |
-|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
-| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. |
-| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |
-
-<br/>
-<p align="center">
-<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/mrjob.png">
-</p>
-
-## mapreduce-python
-
-IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality.
-
-| Notebook | Description |
-|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
-| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another python-based alternative.|
-
-<br/>
-<p align="center">
-<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/kaggle.png">
-</p>
-
-## kaggle-and-business-analyses
-
-IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses.
-
-| Notebook | Description |
-|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
-| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. |
-| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/descrimination.|
-
 <br/>
 <p align="center">
 <img src="http://i.imgur.com/ZhKXrKZ.png">
@@ -230,6 +189,47 @@ IPython Notebook(s) demonstrating Python functionality geared towards data analy
 | [pdb](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/pdb.ipynb) | Learn how to debug in Python with the interactive source code debugger. |
 | [unit tests](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/unit_tests.ipynb) | Learn how to test in Python with Nose unit tests. |
 
+<br/>
+<p align="center">
+<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/kaggle.png">
+</p>
+
+## kaggle-and-business-analyses
+
+IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses.
+
+| Notebook | Description |
+|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
+| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. |
+| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/descrimination.|
+
+<br/>
+<p align="center">
+<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/spark.png">
+</p>
+
+## spark
+
+IPython Notebook(s) demonstrating spark and HDFS functionality.
+
+| Notebook | Description |
+|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
+| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. |
+| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |
+
+<br/>
+<p align="center">
+<img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/mrjob.png">
+</p>
+
+## mapreduce-python
+
+IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality.
+
+| Notebook | Description |
+|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
+| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another python-based alternative.|
+
 <br/>
 <p align="center">
 <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/aws.png">