42 Commits (master)

Author SHA1 Message Date
Jalem Raj Rohit 176172101d Replaced `sort` with `sorted`. `sort` throws an error, so replaced it with `sorted`, which serves the same purpose. 2016-02-25 19:41:53 +05:30
Donne Martin fe696f5efd Updated notebook author and license info. 2015-11-01 06:43:21 -05:00
Donne Martin e3660ca21d Updated notebook to v3. 2015-05-31 08:53:16 -04:00
Donne Martin 30b5b68f08 Added missing __init__.py files to kaggle and mapreduce packages. 2015-04-15 15:20:56 -04:00
Donne Martin d4ab154643 Transformed Embarked to dummy variables instead of integer representations. The latter implies ordering, which isn't the case with Embarked. 2015-04-02 23:29:33 -04:00
Donne Martin f316416d88 Fix genders_mapping being recalculated. 2015-03-28 13:47:21 -04:00
Donne Martin 89e04b3a89 Renamed data munging to data wrangling, fixed spacing between variables passed to confusion_matrix. 2015-03-27 06:36:23 -04:00
Donne Martin 53e0ae41c5 Reduced confusion matrix image, it was too wide and forced a horizontal scroll bar on nbviewer. 2015-03-25 07:56:39 -04:00
Donne Martin 5e38505cd7 Added random forest classification report. 2015-03-23 07:24:12 -04:00
Donne Martin 4c7a3a52a1 Added confusion matrix and accuracy metrics to evaluate the model's performance. 2015-03-22 12:18:51 -04:00
Donne Martin 20ddcd2a01 Added random forest score on training data. Code cleanup. 2015-03-21 10:46:01 -04:00
Donne Martin bffbb61bc3 Renamed df to df_train to be more explicit of the DataFrame's purpose. 2015-03-20 14:42:05 -04:00
Donne Martin a6153e5020 Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers. 2015-03-20 12:54:20 -04:00
Donne Martin 5bc4d9bef4 Formatted intro section, added Titanic image. 2015-03-20 11:42:43 -04:00
Donne Martin 01d65fd232 Added Random Forest: Prepare for Kaggle Submission section. 2015-03-20 11:36:41 -04:00
Donne Martin 055cd52cd3 Added Random Forest Predicting section. 2015-03-20 11:35:25 -04:00
Donne Martin 3fcbc8364f Added Random Forest training section. 2015-03-20 11:33:41 -04:00
Donne Martin ad54e0ae70 Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook. 2015-03-20 11:27:06 -04:00
Donne Martin 387922662f Replaced nested for loop that calculated the median age based on sex and passenger class with groupby + apply instead. 2015-03-20 11:21:27 -04:00
Donne Martin c9ca38d211 Only attempt to fill missing ports of embarkation if there are missing values. Reworked the AgeFill process. Dropped SibSp and Parch columns as they are part of FamilySize. 2015-03-18 19:25:58 -04:00
Donne Martin d58e6423b3 Updated Notebook TOC, dropped PassengerId as it won't be used in the machine learning algorithms. 2015-03-18 14:45:18 -04:00
Donne Martin 000fea0862 Added section Final Data Preparation for Machine Learning, which drops unused columns and converts the DataFrame to a numpy array. 2015-03-18 14:32:28 -04:00
Donne Martin 2662e2bb03 Added feature engineering description, a description on the family size histogram, and a brief discussion on a potential feature related to the passenger's name. 2015-03-18 14:21:25 -04:00
Donne Martin a93f599a9b Cleaned up code, charts, and descriptions in various sections. 2015-03-17 16:16:42 -04:00
Donne Martin 7d4c5532a8 Rework the age analysis, adding more details and graphs. 2015-03-17 15:44:50 -04:00
Donne Martin ce3ef575bd Added additional plots to further explore the port of embarkation feature. 2015-03-17 14:53:22 -04:00
Donne Martin 011313d2e1 Added snippets of feature engineering: creating a new feature family size by combining number of parents and siblings. 2015-03-17 14:05:07 -04:00
Donne Martin 44eeaf447d Cleaned up some sections, added plots of survival rate by Sex and Pclass. 2015-03-17 14:03:58 -04:00
Donne Martin 0c56902027 Add plots for features we will analyze in the exploratory data analysis section. 2015-03-17 08:51:30 -04:00
Donne Martin 8364d476b3 Updated variable descriptions section to be a markdown cell with a pre tag. 2015-03-15 08:36:23 -04:00
Donne Martin 8d72ba4cd4 Cleaned up various portions of the notebook. 2015-03-15 08:33:36 -04:00
Donne Martin 66a98b61d2 Added title and axes labels for Age charts. 2015-03-15 08:09:56 -04:00
Donne Martin 02b8c05fe9 Fixed range of embarked histogram, as it was not showing the NaN value. Added title and axes labels for passenger gender charts. 2015-03-15 06:32:55 -04:00
Donne Martin 2b8bf79cfa Added title and axes labels for passenger gender charts. 2015-03-15 06:27:48 -04:00
Donne Martin 1b887f75ca Added title and axes labels for passenger classes charts. 2015-03-15 06:24:07 -04:00
Donne Martin e9d533232b Added competition site URL. Fixed Description header. 2015-03-15 06:14:27 -04:00
Donne Martin 3b8eb8f823 Added snippets to analyze the Titanic passenger Age feature. 2015-03-15 06:11:01 -04:00
Donne Martin b0f14105ae Added snippets to analyze the Titanic Embarked feature. 2015-03-15 04:07:05 -04:00
Donne Martin 0394852ba5 Added snippets to analyze the Titanic Sex (Gender) feature. 2015-03-14 20:03:48 -04:00
Donne Martin b2c7f4f850 Added snippets to analyze the Titanic Passenger Class feature. 2015-03-14 20:01:55 -04:00
Donne Martin 4ad409aa63 Added snippets to start exploring the Titanic data. 2015-03-14 19:56:28 -04:00
Donne Martin bcfae90101 Added preliminary Kaggle Titanic survivor analysis containing the competition description, evaluation, data set, and snippet to read in the data to pandas. 2015-03-14 19:53:56 -04:00
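
Several of the messages above describe data-wrangling changes (176172101d, d4ab154643, 387922662f) without showing the code. The sketch below illustrates what such changes might look like in pandas. It is not taken from the notebook: the toy data is made up, the column names are assumed to follow the Kaggle Titanic data set, and `groupby` + `transform` is used here in place of the `groupby` + `apply` named in the log because it keeps the original row index.

```python
import pandas as pd

# Hypothetical toy data; column names assumed from the Kaggle Titanic data set.
df_train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female", "female"],
    "Pclass": [3, 1, 3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, None, 54.0, 30.0, None, 26.0],
    "Embarked": ["S", "C", "Q", "S", "S", "C", "S"],
})

# sorted() returns a new sorted list from any iterable, so it can be used
# directly on the array of unique values (assumed shape of the mapping).
genders = sorted(df_train["Sex"].unique())
genders_mapping = {gender: code for code, gender in enumerate(genders)}

# Dummy variables: one 0/1 column per port, so no ordering between
# ports is implied (unlike an integer code such as C=0, Q=1, S=2).
embarked_dummies = pd.get_dummies(df_train["Embarked"], prefix="Embarked").astype(int)
df_train = pd.concat([df_train, embarked_dummies], axis=1)

# Median age per (Sex, Pclass) group, aligned to the original rows,
# then used to fill only the missing ages.
group_median = df_train.groupby(["Sex", "Pclass"])["Age"].transform("median")
df_train["AgeFill"] = df_train["Age"].fillna(group_median)

print(df_train[["Sex", "Pclass", "Age", "AgeFill"]])
```

Each row ends up with exactly one `Embarked_*` column set to 1, and rows that already have an age keep it unchanged in `AgeFill`.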