mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
Tweaked repo description, reordered spark and aws sections, added tables to python-core section.
parent
8063018571
commit
b23feee87d
33
README.md
@@ -1,7 +1,7 @@
![alt text](http://i2.wp.com/donnemartin.com/wp-content/uploads/2015/02/ipython_notebook_cover2-e1425213196820.png)

# ipython-data-notebooks
Continually updated IPython Data Science Notebooks geared towards processing big data (AWS, Spark, Hadoop, Linux command line, Python, NumPy, pandas, matplotlib, SciPy, scikit-learn, Kaggle).
Continually updated IPython Data Science Notebooks geared towards processing big data (AWS, Spark, Hadoop MapReduce, HDFS, Linux command line, Python, NumPy, pandas, matplotlib, SciPy, scikit-learn, Kaggle).

## kaggle

@@ -11,6 +11,15 @@ IPython Notebooks used in [kaggle](https://www.kaggle.com/) competitions.
|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| [titanic](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. |

## spark

IPython Notebooks demonstrating Spark and HDFS functionality.

| Notebook | Description |
|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|
| [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework that is up to 100 times faster for certain applications and is well suited to machine learning algorithms. |
| [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. |

## aws

IPython Notebooks demonstrating Amazon Web Services functionality.

@@ -19,29 +28,23 @@ IPython Notebooks demonstrating Amazon Web Services functionality.
|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [s3cmd](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3cmd) | Interacts with S3 through the command line. |
| [s3-parallel-put](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3-parallel-put) | Uploads multiple files to S3 in parallel. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file.,S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [s3distcp](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#s3distcp) | Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster. |
| [mrjob](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#mrjob) | Supports MapReduce jobs in Python 2.5+ and runs them locally or on Hadoop clusters. |
| [redshift](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#redshift) | Acts as a fast data warehouse built on massively parallel processing (MPP) technology. |
| [kinesis](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#kinesis) | Streams data in real time with the ability to process thousands of data streams per second. |
| [lambda](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/aws/aws.ipynb#lambda) | Runs code in response to events, automatically managing compute resources. |
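
The mrjob entry above concerns MapReduce jobs written in Python. As a rough illustration of the map, shuffle, and reduce phases such a job goes through, here is a minimal pure-Python word count. It deliberately does not use mrjob's API: `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical names chosen for this sketch, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Reduce phase: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up input, used only for illustration.
    print(word_count(["big data", "Big clusters"]))
```

A real job would hand the mapper and reducer to a framework that distributes the shuffle step across machines; the in-memory `defaultdict` here only stands in for that step.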

## spark

IPython Notebooks demonstrating spark and HDFS functionality.

* [spark](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/spark.ipynb): Open-source in-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms.

* [hdfs](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/spark/hdfs.ipynb): Reliably stores very large files across machines in a large cluster.

## python-core

IPython Notebooks demonstrating core Python functionality geared towards data analysis.

* [data structures](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs.ipynb)
* [data structure utilities](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs_utils.ipynb)
* [functions](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/functions.ipynb)
* [datetime](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/datetime.ipynb)
* [unit tests](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/unit_tests.ipynb)

| Notebook | Description |
|-----------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| [data structures](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs.ipynb) | Tuples, lists, dicts, sets. |
| [data structure utilities](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/structs_utils.ipynb) | Slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, list comprehensions. |
| [functions](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/functions.ipynb) | Functions as objects, lambda functions, closures, *args, **kwargs, currying, generators, generator expressions, itertools. |
| [datetime](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/datetime.ipynb) | Basics of datetime, strftime, strptime, timedelta. |
| [unit tests](http://nbviewer.ipython.org/github/donnemartin/ipython-data-notebooks/blob/master/python-core/unit_tests.ipynb) | Nose unit tests. |
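
A few of the utilities named in the table above can be sketched in a handful of lines. This is an illustrative example written for Python 3 (where `range` takes over the role of Python 2's `xrange`), not an excerpt from the notebooks; the sample names, scores, and dates are made up.

```python
import bisect
from datetime import datetime, timedelta

# Made-up sample data, used only for illustration.
names = ["ann", "bob", "cy"]
scores = [70, 85, 90]  # already sorted, which bisect requires

# zip pairs items positionally; enumerate adds a running index.
score_by_name = dict(zip(names, scores))
labelled = ["{}:{}".format(i, name) for i, name in enumerate(names)]

# A list comprehension with a filter keeps only names scoring 85 or more.
passed = [name for name, score in zip(names, scores) if score >= 85]

# bisect finds where a new value belongs in a sorted list;
# insort inserts it there, keeping the list sorted.
position = bisect.bisect(scores, 88)
with_new_score = list(scores)
bisect.insort(with_new_score, 88)

# sorted returns a new list; here the sort key is the string length.
by_length = sorted(["pandas", "nose", "numpy"], key=len)

# strptime parses text into a datetime, timedelta shifts it,
# and strftime formats the result back into text.
start = datetime.strptime("2015-02-28", "%Y-%m-%d")
next_day = (start + timedelta(days=1)).strftime("%Y-%m-%d")

print(score_by_name, labelled, passed, position, with_new_score, by_length, next_day)
```

Copying `scores` before calling `insort` keeps the original list intact, which makes it easier to compare the list before and after the insertion.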

## pandas