Commit Graph

328 Commits

Author SHA1 Message Date
Donne Martin
bffbb61bc3 Renamed df to df_train to be more explicit of the DataFrame's purpose. 2015-03-20 14:42:05 -04:00
Donne Martin
a6153e5020 Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers. 2015-03-20 12:54:20 -04:00
Donne Martin
5bc4d9bef4 Formatted intro section, added Titanic image. 2015-03-20 11:42:43 -04:00
Donne Martin
01d65fd232 Added Random Forest: Prepare for Kaggle Submission section. 2015-03-20 11:36:41 -04:00
Donne Martin
055cd52cd3 Added Random Forest Predicting section. 2015-03-20 11:35:25 -04:00
Donne Martin
3fcbc8364f Added Random Forest training section. 2015-03-20 11:33:41 -04:00
Donne Martin
ad54e0ae70 Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook. 2015-03-20 11:27:06 -04:00
Donne Martin
387922662f Replaced nested for loop that calculated the median age based on sex and passenger class with groupby + apply instead. 2015-03-20 11:21:27 -04:00
Donne Martin
c9ca38d211 Only attempt to fill missing ports of embarkation if there are missing values. Reworked the AgeFill process. Dropped SibSp and Parch columns as they are part of FamilySize. 2015-03-18 19:25:58 -04:00
Donne Martin
d58e6423b3 Updated Notebook TOC, dropped PassengerId as it won't be used in the machine learning algorithms. 2015-03-18 14:45:18 -04:00
Donne Martin
000fea0862 Added section Final Data Preparation for Machine Learning, which drops unused columns and converts the DataFrame to a numpy array. 2015-03-18 14:32:28 -04:00
Donne Martin
2662e2bb03 Added feature engineering description, a description on the family size histogram, and a brief discussion on a potential feature related to the passenger's name. 2015-03-18 14:21:25 -04:00
Donne Martin
81660c59d1 Reordered README sections. 2015-03-17 16:21:33 -04:00
Donne Martin
a93f599a9b Cleaned up code, charts, and descriptions in various sections. 2015-03-17 16:16:42 -04:00
Donne Martin
7d4c5532a8 Rework the age analysis, adding more details and graphs. 2015-03-17 15:44:50 -04:00
Donne Martin
ce3ef575bd Added additional plots to further explore the port of embarkation feature. 2015-03-17 14:53:22 -04:00
Donne Martin
011313d2e1 Added snippets of feature engineering: creating a new feature, family size, by combining the number of parents and siblings. 2015-03-17 14:05:07 -04:00
Donne Martin
44eeaf447d Cleaned up some sections, added plots of survival rate by Sex and Pclass. 2015-03-17 14:03:58 -04:00
Donne Martin
0c56902027 Add plots for features we will analyze in the exploratory data analysis section. 2015-03-17 08:51:30 -04:00
Donne Martin
10d63efb4a Added Spark streaming snippets. 2015-03-16 16:01:51 -04:00
Donne Martin
8364d476b3 Updated variable descriptions section to be a markdown cell with a pre tag. 2015-03-15 08:36:23 -04:00
Donne Martin
8d72ba4cd4 Cleaned up various portions of the notebook. 2015-03-15 08:33:36 -04:00
Donne Martin
66a98b61d2 Added title and axes labels for Age charts. 2015-03-15 08:09:56 -04:00
Donne Martin
02b8c05fe9 Fixed range of embarked histogram, as it was not showing the NaN value. Added title and axes labels for passenger gender charts. 2015-03-15 06:32:55 -04:00
Donne Martin
2b8bf79cfa Added title and axes labels for passenger gender charts. 2015-03-15 06:27:48 -04:00
Donne Martin
1b887f75ca Added title and axes labels for passenger classes charts. 2015-03-15 06:24:07 -04:00
Donne Martin
e9d533232b Added competition site URL. Fixed Description header. 2015-03-15 06:14:27 -04:00
Donne Martin
3b8eb8f823 Added snippets to analyze the Titanic passenger Age feature. 2015-03-15 06:11:01 -04:00
Donne Martin
b0f14105ae Added snippets to analyze the Titanic Embarked feature. 2015-03-15 04:07:05 -04:00
Donne Martin
0394852ba5 Added snippets to analyze the Titanic Sex (Gender) feature. 2015-03-14 20:03:48 -04:00
Donne Martin
b2c7f4f850 Added snippets to analyze the Titanic Passenger Class feature. 2015-03-14 20:01:55 -04:00
Donne Martin
8babd3a1cf Added Kaggle section to README. 2015-03-14 19:57:55 -04:00
Donne Martin
4ad409aa63 Added snippets to start exploring the Titanic data. 2015-03-14 19:56:28 -04:00
Donne Martin
bcfae90101 Added preliminary Kaggle Titanic survivor analysis containing the competition description, evaluation, data set, and snippet to read in the data to pandas. 2015-03-14 19:53:56 -04:00
Donne Martin
1fbbd20c68 Added Kaggle Titanic data files. 2015-03-14 19:49:07 -04:00
Donne Martin
8196b4bdfb Updated repo description. 2015-03-14 09:22:01 -04:00
Donne Martin
ce605a6fdf Added snippets for configuring Spark applications. 2015-03-13 08:25:50 -04:00
Donne Martin
53789e0e3e Prefixed Spark commands with ! so they can be executed within IPython Notebook. 2015-03-13 08:09:01 -04:00
Donne Martin
87b017fd37 Prefixed HDFS commands with ! so they can be executed within IPython Notebook. 2015-03-13 08:07:17 -04:00
Donne Martin
8c251e43cd Prefixed various misc commands with ! so they can be executed within IPython Notebook. 2015-03-13 08:05:56 -04:00
Donne Martin
8c4541ae33 Added git reset and pull commands. 2015-03-13 08:03:01 -04:00
Donne Martin
a9ea93b872 Prefixed Linux commands with ! so they can be executed within IPython Notebook. 2015-03-13 07:59:28 -04:00
Donne Martin
23d3866b8e Prefixed AWS commands with ! so they can be executed within IPython Notebook. 2015-03-13 07:57:12 -04:00
Donne Martin
1c4e2157a6 Added snippets to demonstrate writing and running a Spark app. 2015-03-12 06:25:40 -04:00
Donne Martin
9fd62a73ae Added sed command to delete matching lines in place. Added command to display all matching running processes with full formatting. Tweaked formatting of vim section regarding vimtutor and vim syntax coloring. 2015-03-11 20:30:23 -04:00
Donne Martin
31c4f3299a Updated AWS index. 2015-03-10 17:00:44 -04:00
Donne Martin
5600ab0377 Added Lambda commands. 2015-03-10 17:00:08 -04:00
Donne Martin
cd84ffb2f0 Added Kinesis commands. 2015-03-09 16:10:54 -04:00
Donne Martin
1815c9a122 Added snippets to checkpoint RDDs in Spark. 2015-03-08 05:55:45 -04:00
Donne Martin
0481497848 Added snippets to cache RDDs in Spark. 2015-03-08 05:55:05 -04:00