diff --git a/README.md b/README.md index 88a38d5..3093c5d 100644 --- a/README.md +++ b/README.md @@ -12,9 +12,6 @@ ## Index -* [spark](#spark) -* [mapreduce-python](#mapreduce-python) -* [kaggle-and-business-analyses](#kaggle-and-business-analyses) * [deep-learning](#deep-learning) * [scikit-learn](#scikit-learn) * [statistical-inference-scipy](#statistical-inference-scipy) @@ -22,6 +19,9 @@ * [matplotlib](#matplotlib) * [numpy](#numpy) * [python-data](#python-data) +* [kaggle-and-business-analyses](#kaggle-and-business-analyses) +* [spark](#spark) +* [mapreduce-python](#mapreduce-python) * [amazon web services](#aws) * [command lines](#commands) * [misc](#misc) @@ -31,47 +31,6 @@ * [contact-info](#contact-info) * [license](#license) -
-

- -

- -## spark - -IPython Notebook(s) demonstrating spark and HDFS functionality. - -| Notebook | Description | -|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| -| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. | -| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. | - -
-

- -

- -## mapreduce-python - -IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality. - -| Notebook | Description | -|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| -| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another python-based alternative.| - -
-

- -

- -## kaggle-and-business-analyses - -IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses. - -| Notebook | Description | -|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| -| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. | -| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/descrimination.| -

@@ -230,6 +189,47 @@ IPython Notebook(s) demonstrating Python functionality geared towards data analy | [pdb](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/pdb.ipynb) | Learn how to debug in Python with the interactive source code debugger. | | [unit tests](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/unit_tests.ipynb) | Learn how to test in Python with Nose unit tests. | +
+

+ +

+ +## kaggle-and-business-analyses + +IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses. + +| Notebook | Description | +|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| +| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. | +| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination.| + +
+

+ +

+ +## spark + +IPython Notebook(s) demonstrating spark and HDFS functionality. + +| Notebook | Description | +|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| +| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. | +| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. | + +
+

+ +

+ +## mapreduce-python + +IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality. + +| Notebook | Description | +|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| +| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another Python-based alternative.| +