Commit Graph

321 Commits

Author SHA1 Message Date
Donne Martin
14f3b3e54c Cleaned up notebook. 2015-04-10 11:07:17 -04:00
Donne Martin
ad82bdeefc Cleaned up notebook. 2015-04-10 11:03:00 -04:00
Donne Martin
83cf7b1278 Added snippets for creating sample data. Some code cleanup. 2015-04-10 10:59:16 -04:00
Donne Martin
6b31f9cad7 Moved mrjob and s3distcp to the front of the aws section. 2015-04-10 06:58:32 -04:00
Donne Martin
b546d7d484 Moved numpy section prior to commands section. 2015-04-09 13:48:27 -04:00
Donne Martin
66770a5ebd Added numpy to README. Reordered some sections. 2015-04-09 13:34:15 -04:00
Donne Martin
40300d1a6a Added numpy snippets for creating fake data and adding noise. 2015-04-09 13:30:56 -04:00
Donne Martin
6b6a0dafb8 Added numpy snippets for combining arrays. 2015-04-09 13:29:27 -04:00
Donne Martin
475e930f8e Added numpy snippets for reshaping and in-place editing. 2015-04-09 13:28:09 -04:00
Donne Martin
1f260e392d Added numpy snippets for common array operations. 2015-04-09 13:27:00 -04:00
Donne Martin
eef925653d Added numpy IPython Notebook with snippets for NumPy Arrays, dtypes, and shapes. 2015-04-09 13:25:59 -04:00
Donne Martin
1d9797b5c5 Added references section. 2015-04-09 11:54:01 -04:00
Donne Martin
c4f4a8aae3 Added commands to configure a remote for a fork and to sync a fork. Deleted duplicate git pull origin master call 2015-04-08 08:04:09 -04:00
Donne Martin
360379c72e Added matplotlib kernel density estimation plots. 2015-04-07 15:23:53 -04:00
Donne Martin
2a672bf6b6 Added matplotlib IPython Notebook to README. Tweaked section ordering. Changed Notebook to Notebook(s). 2015-04-06 09:31:03 -04:00
Donne Martin
0069e10997 Added snippets for scatter plots, subplots. 2015-04-06 08:55:32 -04:00
Donne Martin
bf27e997e4 Added snippets for normalized plots. 2015-04-06 08:54:12 -04:00
Donne Martin
9496652892 Added snippets for bar plots, histograms, and using subplot2grid. 2015-04-06 08:52:43 -04:00
Donne Martin
21b19dd12f Added matplotlib IPython Notebook. Contains code to clean data, data will be plotted in the notebook and setting of global params. 2015-04-06 08:51:28 -04:00
Donne Martin
557b76f267 Updated linux section with list of commands. 2015-04-05 08:24:40 -04:00
Donne Martin
818cf705c4 Added unit test for sample mrjob mapper and reducer to parse logs on s3. 2015-04-05 08:14:53 -04:00
Donne Martin
8d1d56fc22 Revert README changes b41ba644be and 3d2d550852 regarding whitespace tweaks and moving the section images below the text headers. Images now appear before section text headers. 2015-04-04 09:51:50 -04:00
Donne Martin
b41ba644be Tweaked whitespace. 2015-04-04 09:31:24 -04:00
Donne Martin
3d2d550852 Moved section images to below section text headers. 2015-04-04 09:27:38 -04:00
Donne Martin
6e2c1fd5d2 Added images for each section. Removed outdated References section--will update in the future. 2015-04-04 08:53:04 -04:00
Donne Martin
bac10f9f61 Added all images shown in README. 2015-04-04 08:50:37 -04:00
Donne Martin
21facbb91f Add new repo cover image. 2015-04-04 08:36:56 -04:00
Donne Martin
a12fc148ad Converted pandas and commands sections to use tables for legibility. Fixed a typo in datetime description. 2015-04-04 08:17:13 -04:00
Donne Martin
b23feee87d Tweaked repo description, reordered spark and aws sections, added tables to python-core section. 2015-04-04 07:37:20 -04:00
Donne Martin
8063018571 Converted notebook links and descriptions to tables for readability. 2015-04-04 07:23:31 -04:00
Donne Martin
2cbff15b57 Added more whitespace to try to improve legibility 2015-04-03 06:38:26 -04:00
Donne Martin
eb7bba9377 Added more detailed descriptions to each notebook in the categories kaggle, aws, and spark. 2015-04-03 06:34:32 -04:00
Donne Martin
1403cf4134 Added sample mrjob mapper and reducer to parse logs on s3 following the standard bucket logging format. 2015-04-03 06:06:46 -04:00
Donne Martin
d4ab154643 Transformed Embarked to dummy variables instead of integer representations. The latter implies ordering, which isn't the case with Embarked. 2015-04-02 23:29:33 -04:00
Donne Martin
011747c17a Added Spark accumulators snippets. 2015-03-31 21:41:21 -04:00
Donne Martin
7195c5bc82 Added Spark broadcast variables snippets. 2015-03-30 19:01:07 -04:00
Donne Martin
b3fb4ae219 Added Spark streaming with states snippets. 2015-03-29 17:53:52 -04:00
Donne Martin
f316416d88 Fix genders_mapping being recalculated. 2015-03-28 13:47:21 -04:00
Donne Martin
89e04b3a89 Renamed data munging to data wrangling, fixed spacing between variables passed to confusion_matrix. 2015-03-27 06:36:23 -04:00
Donne Martin
53e0ae41c5 Reduced confusion matrix image, it was too wide and forced a horizontal scroll bar on nbviewer. 2015-03-25 07:56:39 -04:00
Donne Martin
5e38505cd7 Added random forest classification report. 2015-03-23 07:24:12 -04:00
Donne Martin
4c7a3a52a1 Added confusion matrix and accuracy metrics to evaluate the model's performance. 2015-03-22 12:18:51 -04:00
Donne Martin
20ddcd2a01 Added random forest score on training data. Code cleanup. 2015-03-21 10:46:01 -04:00
Donne Martin
bffbb61bc3 Renamed df to df_train to be more explicit of the DataFrame's purpose. 2015-03-20 14:42:05 -04:00
Donne Martin
a6153e5020 Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers. 2015-03-20 12:54:20 -04:00
Donne Martin
5bc4d9bef4 Formatted intro section, added Titanic image. 2015-03-20 11:42:43 -04:00
Donne Martin
01d65fd232 Added Random Forest: Prepare for Kaggle Submission section. 2015-03-20 11:36:41 -04:00
Donne Martin
055cd52cd3 Added Random Forest Predicting section. 2015-03-20 11:35:25 -04:00
Donne Martin
3fcbc8364f Added Random Forest training section. 2015-03-20 11:33:41 -04:00
Donne Martin
ad54e0ae70 Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook. 2015-03-20 11:27:06 -04:00