Commit Graph

318 Commits

Author SHA1 Message Date
Donne Martin
6b31f9cad7 Moved mrjob and s3distcp to the front of the aws section. 2015-04-10 06:58:32 -04:00
Donne Martin
b546d7d484 Moved numpy section prior to commands section. 2015-04-09 13:48:27 -04:00
Donne Martin
66770a5ebd Added numpy to README. Reordered some sections. 2015-04-09 13:34:15 -04:00
Donne Martin
40300d1a6a Added numpy snippets for creating fake data and adding noise. 2015-04-09 13:30:56 -04:00
Donne Martin
6b6a0dafb8 Added numpy snippets for combining arrays. 2015-04-09 13:29:27 -04:00
Donne Martin
475e930f8e Added numpy snippets for reshaping and in-place editing. 2015-04-09 13:28:09 -04:00
Donne Martin
1f260e392d Added numpy snippets for common array operations. 2015-04-09 13:27:00 -04:00
Donne Martin
eef925653d Added numpy IPython Notebook with snippets for NumPy Arrays, dtypes, and shapes. 2015-04-09 13:25:59 -04:00
Donne Martin
1d9797b5c5 Added references section. 2015-04-09 11:54:01 -04:00
Donne Martin
c4f4a8aae3 Added commands to configure a remote for a fork and to sync a fork. Deleted duplicate git pull origin master call. 2015-04-08 08:04:09 -04:00
Donne Martin
360379c72e Added matplotlib kernel density estimation plots. 2015-04-07 15:23:53 -04:00
Donne Martin
2a672bf6b6 Added matplotlib IPython Notebook to README. Tweaked section ordering. Changed Notebook to Notebook(s). 2015-04-06 09:31:03 -04:00
Donne Martin
0069e10997 Added snippets for scatter plots, subplots. 2015-04-06 08:55:32 -04:00
Donne Martin
bf27e997e4 Added snippets for normalized plots. 2015-04-06 08:54:12 -04:00
Donne Martin
9496652892 Added snippets for bar plots, histograms, and using subplot2grid. 2015-04-06 08:52:43 -04:00
Donne Martin
21b19dd12f Added matplotlib IPython Notebook. Contains code to clean the data that will be plotted in the notebook and to set global params. 2015-04-06 08:51:28 -04:00
Donne Martin
557b76f267 Updated linux section with list of commands. 2015-04-05 08:24:40 -04:00
Donne Martin
818cf705c4 Added unit test for sample mrjob mapper and reducer to parse logs on s3. 2015-04-05 08:14:53 -04:00
Donne Martin
8d1d56fc22 Revert README changes b41ba644be and 3d2d550852 regarding whitespace tweaks and moving the section images below the text headers. Images now appear before section text headers. 2015-04-04 09:51:50 -04:00
Donne Martin
b41ba644be Tweaked whitespace. 2015-04-04 09:31:24 -04:00
Donne Martin
3d2d550852 Moved section images to below section text headers. 2015-04-04 09:27:38 -04:00
Donne Martin
6e2c1fd5d2 Added images for each section. Removed outdated References section--will update in the future. 2015-04-04 08:53:04 -04:00
Donne Martin
bac10f9f61 Added all images shown in README. 2015-04-04 08:50:37 -04:00
Donne Martin
21facbb91f Add new repo cover image. 2015-04-04 08:36:56 -04:00
Donne Martin
a12fc148ad Converted pandas and commands sections to use tables for legibility. Fixed a typo in datetime description. 2015-04-04 08:17:13 -04:00
Donne Martin
b23feee87d Tweaked repo description, reordered spark and aws sections, added tables to python-core section. 2015-04-04 07:37:20 -04:00
Donne Martin
8063018571 Converted notebook links and descriptions to tables for readability. 2015-04-04 07:23:31 -04:00
Donne Martin
2cbff15b57 Added more whitespace to try to improve legibility. 2015-04-03 06:38:26 -04:00
Donne Martin
eb7bba9377 Added more detailed descriptions to each notebook in the categories kaggle, aws, and spark. 2015-04-03 06:34:32 -04:00
Donne Martin
1403cf4134 Added sample mrjob mapper and reducer to parse logs on s3 following the standard bucket logging format. 2015-04-03 06:06:46 -04:00
Donne Martin
d4ab154643 Transformed Embarked to dummy variables instead of integer representations. The latter implies ordering, which isn't the case with Embarked. 2015-04-02 23:29:33 -04:00
Donne Martin
011747c17a Added Spark accumulators snippets. 2015-03-31 21:41:21 -04:00
Donne Martin
7195c5bc82 Added Spark broadcast variables snippets. 2015-03-30 19:01:07 -04:00
Donne Martin
b3fb4ae219 Added Spark streaming with states snippets. 2015-03-29 17:53:52 -04:00
Donne Martin
f316416d88 Fix genders_mapping being recalculated. 2015-03-28 13:47:21 -04:00
Donne Martin
89e04b3a89 Renamed data munging to data wrangling, fixed spacing between variables passed to confusion_matrix. 2015-03-27 06:36:23 -04:00
Donne Martin
53e0ae41c5 Reduced confusion matrix image, it was too wide and forced a horizontal scroll bar on nbviewer. 2015-03-25 07:56:39 -04:00
Donne Martin
5e38505cd7 Added random forest classification report. 2015-03-23 07:24:12 -04:00
Donne Martin
4c7a3a52a1 Added confusion matrix and accuracy metrics to evaluate the model's performance. 2015-03-22 12:18:51 -04:00
Donne Martin
20ddcd2a01 Added random forest score on training data. Code cleanup. 2015-03-21 10:46:01 -04:00
Donne Martin
bffbb61bc3 Renamed df to df_train to be more explicit of the DataFrame's purpose. 2015-03-20 14:42:05 -04:00
Donne Martin
a6153e5020 Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers. 2015-03-20 12:54:20 -04:00
Donne Martin
5bc4d9bef4 Formatted intro section, added Titanic image. 2015-03-20 11:42:43 -04:00
Donne Martin
01d65fd232 Added Random Forest: Prepare for Kaggle Submission section. 2015-03-20 11:36:41 -04:00
Donne Martin
055cd52cd3 Added Random Forest Predicting section. 2015-03-20 11:35:25 -04:00
Donne Martin
3fcbc8364f Added Random Forest training section. 2015-03-20 11:33:41 -04:00
Donne Martin
ad54e0ae70 Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook. 2015-03-20 11:27:06 -04:00
Donne Martin
387922662f Replaced nested for loop that calculated the median age based on sex and passenger class with groupby + apply instead. 2015-03-20 11:21:27 -04:00
Donne Martin
c9ca38d211 Only attempt to fill missing ports of embarkation if there are missing values. Reworked the AgeFill process. Dropped SibSp and Parch columns as they are part of FamilySize. 2015-03-18 19:25:58 -04:00
Donne Martin
d58e6423b3 Updated Notebook TOC, dropped PassengerId as it won't be used in the machine learning algorithms. 2015-03-18 14:45:18 -04:00