![alt text](http://i2.wp.com/donnemartin.com/wp-content/uploads/2015/02/ipython_notebook_cover2-e1425213196820.png)

# ipython-data-notebooks

Continually updated IPython Data Science Notebooks geared towards processing big data (AWS, Spark, Hadoop, Linux command line, Python, NumPy, pandas, matplotlib, SciPy, scikit-learn, Kaggle).

## kaggle

IPython Notebooks used in [Kaggle](https://www.kaggle.com/) competitions.

* [titanic](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/kaggle/titanic.ipynb): Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning.
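
The titanic notebook's clean, explore, model pipeline can be illustrated with a small standard-library sketch. It assumes hypothetical passenger records that reuse the `Sex`, `Age` and `Survived` column names from Kaggle's Titanic training data; the values are invented, `Gender` is a hypothetical derived column, and median-age imputation plus a sex-only baseline are illustrative choices, not necessarily what the notebook does.

```python
from statistics import median

# Hypothetical toy rows using column names from Kaggle's Titanic train.csv;
# the values are invented for illustration.
passengers = [
    {"Sex": "male", "Age": 22.0, "Survived": 0},
    {"Sex": "female", "Age": 38.0, "Survived": 1},
    {"Sex": "female", "Age": None, "Survived": 1},
    {"Sex": "male", "Age": 35.0, "Survived": 0},
    {"Sex": "male", "Age": None, "Survived": 1},
]


def clean(rows):
    """Cleaning: fill missing ages with the median known age, encode Sex as 0/1."""
    fill_age = median([r["Age"] for r in rows if r["Age"] is not None])
    return [
        {
            "Age": r["Age"] if r["Age"] is not None else fill_age,
            "Gender": 1 if r["Sex"] == "male" else 0,  # hypothetical column
            "Survived": r["Survived"],
        }
        for r in rows
    ]


def survival_rate_by(rows, key):
    """Exploration: survival rate for each distinct value of `key`."""
    totals, survived = {}, {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + 1
        survived[r[key]] = survived.get(r[key], 0) + r["Survived"]
    return {value: survived[value] / totals[value] for value in totals}


def predict(row):
    """Baseline model: predict that female passengers survive, male do not."""
    return 1 if row["Gender"] == 0 else 0


cleaned = clean(passengers)
rates = survival_rate_by(cleaned, "Gender")
accuracy = sum(predict(r) == r["Survived"] for r in cleaned) / len(cleaned)
print(rates, accuracy)
```

A sex-only rule like `predict` is a common baseline to beat before adding features such as `Pclass` or `Fare`.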

## aws

IPython Notebooks demonstrating Amazon Web Services functionality.

* [aws commands index](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb)
* [s3cmd](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3cmd): Interacts with S3 through the command line.
* [s3-parallel-put](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3-parallel-put): Uploads multiple files to S3 in parallel.
* [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp): Combines many small files into larger ones, given a pattern to group them by and a target file. S3DistCp can also transfer large volumes of data from S3 to your Hadoop cluster.
* [mrjob](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#mrjob): Lets you write MapReduce jobs in Python 2.5+ and run them locally or on Hadoop clusters.
* [redshift](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#redshift): Acts as a fast data warehouse built on massively parallel processing (MPP) technology.
* [kinesis](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#kinesis): Streams data in real time, with the ability to process thousands of data streams per second.
* [lambda](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#lambda): Runs code in response to events, automatically managing compute resources.
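
s3-parallel-put speeds up bulk uploads by running many uploads at once. The sketch below shows the same idea with `concurrent.futures.ThreadPoolExecutor`. `upload_file` is a hypothetical stand-in that only records what it would send, so the example runs offline (with boto3 you could swap its body for `s3_client.upload_file(str(path), bucket, key)`); `my-bucket` and the `logs/` prefix are made-up names.

```python
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for S3: maps object URI -> uploaded size in bytes.
uploaded = {}
_lock = threading.Lock()


def upload_file(path, bucket, key):
    """Pretend to upload one file; a real version would call S3 here."""
    size = len(path.read_bytes())
    with _lock:  # several worker threads write to the shared dict
        uploaded[f"s3://{bucket}/{key}"] = size
    return key


def parallel_put(paths, bucket, prefix="", workers=8):
    """Upload files concurrently; uploads are I/O-bound, so threads overlap network waits."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `paths`.
        return list(pool.map(lambda p: upload_file(p, bucket, prefix + p.name), paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(5):
        path = Path(tmp) / f"part-{i}.txt"
        path.write_text("x" * (i + 1))
        paths.append(path)
    keys = parallel_put(paths, "my-bucket", prefix="logs/")

print(keys)
```

Threads suit this workload because each upload spends most of its time waiting on the network rather than using the CPU.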
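
S3DistCp groups files with a regular expression (its `--groupBy` option): files whose names yield the same value for the pattern's capture group are concatenated into one output. The local sketch below imitates that idea on an in-memory dict of hypothetical file names and contents; it is not how S3DistCp is implemented, and it ignores details such as the output size limit (`--targetSize` in S3DistCp).

```python
import re
from collections import defaultdict


def group_by_pattern(files, pattern):
    """Concatenate files whose names share the same first capture-group value."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in sorted(files):  # sort so concatenation order is deterministic
        match = regex.fullmatch(name)
        if match:  # in this sketch, files that do not match are skipped
            groups[match.group(1)].append(files[name])
    return {key: "".join(parts) for key, parts in groups.items()}


# Hypothetical hourly log files, grouped into one output per day.
files = {
    "logs/2015-02-01-00.log": "a\n",
    "logs/2015-02-01-01.log": "b\n",
    "logs/2015-02-02-00.log": "c\n",
    "logs/readme.txt": "ignored\n",
}
combined = group_by_pattern(files, r"logs/(\d{4}-\d{2}-\d{2})-\d{2}\.log")
print(combined)
```

Combining many small files this way matters because Hadoop handles a few large files far more efficiently than thousands of tiny ones.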
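
An mrjob job is written as a mapper and a reducer, with Hadoop (or mrjob's local runner) handling the shuffle that groups mapper output by key. The plain-Python sketch below walks through that flow for a word count without needing mrjob or Hadoop. In mrjob itself, the two functions would be methods of an `MRJob` subclass with signatures like `mapper(self, _, line)` and `reducer(self, word, counts)`.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(lines):
    # Map: apply the mapper to every input line.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort: bring all values for the same key together.
    pairs.sort(key=itemgetter(0))
    results = {}
    # Reduce: call the reducer once per key with all of its values.
    for word, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            results[key] = total
    return results


counts = run_mapreduce(["the quick brown fox", "The lazy dog", "the end"])
print(counts)
```

Because the mapper and reducer only see one line or one key at a time, the framework is free to run many copies of each in parallel across a cluster.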
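
Kinesis routes each record to a shard by hashing its partition key with MD5 into a 128-bit integer and picking the shard whose hash key range contains that value. The sketch below reproduces the lookup locally, assuming the digest is read as a big-endian unsigned integer and that the key space is split evenly across a hypothetical 4-shard stream (real shard ranges can become uneven after splits and merges).

```python
import hashlib


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5 (read big-endian)."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")


def even_shard_ranges(shard_count):
    """Split the 128-bit key space into equal, contiguous (start, end) ranges."""
    space = 2 ** 128
    step = space // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = space - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key, ranges):
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


ranges = even_shard_ranges(4)
for key in ["user-1", "user-2", "user-1"]:
    print(key, shard_for(key, ranges))
```

Records with the same partition key always hash to the same shard, which is how Kinesis keeps per-key ordering while spreading different keys across shards.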
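
A Python Lambda function is a handler that Lambda calls with the triggering `event` and a runtime `context` object. The sketch below assumes an S3 upload trigger and a trimmed-down, hypothetical test event (`my-bucket` and the object key are made up; real S3 events carry many more fields). The name `lambda_handler` is only the common default and must match the handler configured for the function.

```python
import urllib.parse


def lambda_handler(event, context):
    """Collect the S3 URI of every object referenced in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded (for example, spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    return {"processed": objects}


# Hypothetical, trimmed-down test event; context is unused, so None suffices.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "uploads/daily+report.csv"}}}
    ]
}
result = lambda_handler(sample_event, None)
print(result)
```

Calling the handler directly with a hand-built event like this is a quick way to test the logic locally before deploying.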

## spark

IPython Notebooks demonstrating Spark and HDFS functionality.

* [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb): Open-source, in-memory cluster computing framework that can run up to 100 times faster than Hadoop MapReduce for certain applications and is well suited to machine learning algorithms.
* [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb): Reliably stores very large files across machines in a large cluster.
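
HDFS stores a file by splitting it into large blocks and replicating each block on several machines. The arithmetic below sketches the resulting footprint, assuming the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); clusters often override both, and Hadoop 1.x defaulted to 64 MB blocks.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Estimate how HDFS lays out one file across a cluster."""
    # Ceiling division: the last block may be only partially filled.
    blocks = (file_size_bytes + block_size - 1) // block_size
    return {
        "blocks": blocks,
        # Each block is copied to `replication` DataNodes.
        "block_replicas": blocks * replication,
        # A partial last block only uses as much disk as its data, so raw
        # usage is the file size times the replication factor.
        "raw_bytes": file_size_bytes * replication,
    }


print(hdfs_footprint(1024 * MB))  # {'blocks': 8, 'block_replicas': 24, 'raw_bytes': 3221225472}
```

Large blocks keep the NameNode's metadata small and favor long sequential reads, which suits batch processing of very large files.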

## python-core

IPython Notebooks demonstrating core Python functionality geared towards data analysis.

* [data structures](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs.ipynb)
* [data structure utilities](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs_utils.ipynb)
* [functions](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/functions.ipynb)
* [datetime](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/datetime.ipynb)
* [unit tests](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/unit_tests.ipynb)

## pandas

IPython Notebooks demonstrating pandas functionality.

* [pandas](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/pandas/pandas.ipynb)
* [pandas io](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/pandas/pandas_io.ipynb)
* [pandas cleaning](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/pandas/pandas_clean.ipynb)

## commands

IPython Notebooks demonstrating command-line usage for Linux, Git, and other tools.

* [linux](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/linux.ipynb)
* [anaconda](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/misc.ipynb#anaconda)
* [ipython notebook](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/misc.ipynb#ipython-notebook)
* [git](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/misc.ipynb#git)
* [ruby](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/misc.ipynb#ruby)
* [jekyll](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/commands/misc.ipynb#jekyll)

## matplotlib

[Coming Soon] IPython Notebooks demonstrating matplotlib functionality.

## scikit-learn

[Coming Soon] IPython Notebooks demonstrating scikit-learn functionality.

## scipy

[Coming Soon] IPython Notebooks demonstrating SciPy functionality.

## numpy

[Coming Soon] IPython Notebooks demonstrating NumPy functionality.

## References

* [Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython](http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793)
* [Building Machine Learning Systems with Python](http://www.amazon.com/Building-Machine-Learning-Systems-Python/dp/1782161406)
* [Think Bayes](http://www.amazon.com/Think-Bayes-Allen-B-Downey/dp/1449370780)
* [Think Stats](http://www.amazon.com/Think-Stats-Allen-B-Downey/dp/1449307116)

## License

Copyright 2014 Donne Martin

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.