Donne Martin
|
b23feee87d
|
Tweaked repo description, reordered spark and aws sections, added tables to python-core section.
|
2015-04-04 07:37:20 -04:00 |
|
Donne Martin
|
8063018571
|
Converted notebook links and descriptions to tables for readability.
|
2015-04-04 07:23:31 -04:00 |
|
Donne Martin
|
2cbff15b57
|
Added more whitespace to try to improve legibility
|
2015-04-03 06:38:26 -04:00 |
|
Donne Martin
|
eb7bba9377
|
Added more detailed descriptions to each notebook in the categories kaggle, aws, and spark.
|
2015-04-03 06:34:32 -04:00 |
|
Donne Martin
|
1403cf4134
|
Added sample mrjob mapper and reducer to parse logs on s3 following the standard bucket logging format.
|
2015-04-03 06:06:46 -04:00 |
|
Donne Martin
|
d4ab154643
|
Transformed Embarked to dummy variables instead of integer representations. The latter implies ordering, which isn't the case with Embarked.
|
2015-04-02 23:29:33 -04:00 |
|
Donne Martin
|
011747c17a
|
Added Spark accumulators snippets.
|
2015-03-31 21:41:21 -04:00 |
|
Donne Martin
|
7195c5bc82
|
Added Spark broadcast variables snippets.
|
2015-03-30 19:01:07 -04:00 |
|
Donne Martin
|
b3fb4ae219
|
Added Spark streaming with states snippets.
|
2015-03-29 17:53:52 -04:00 |
|
Donne Martin
|
f316416d88
|
Fix genders_mapping being recalculated.
|
2015-03-28 13:47:21 -04:00 |
|
Donne Martin
|
89e04b3a89
|
Renamed data munging to data wrangling, fixed spacing between variables passed to confusion_matrix.
|
2015-03-27 06:36:23 -04:00 |
|
Donne Martin
|
53e0ae41c5
|
Reduced confusion matrix image, it was too wide and forced a horizontal scroll bar on nbviewer.
|
2015-03-25 07:56:39 -04:00 |
|
Donne Martin
|
5e38505cd7
|
Added random forest classification report.
|
2015-03-23 07:24:12 -04:00 |
|
Donne Martin
|
4c7a3a52a1
|
Added confusion matrix and accuracy metrics to evaluate the model's performance.
|
2015-03-22 12:18:51 -04:00 |
|
Donne Martin
|
20ddcd2a01
|
Added random forest score on training data. Code cleanup.
|
2015-03-21 10:46:01 -04:00 |
|
Donne Martin
|
bffbb61bc3
|
Renamed df to df_train to be more explicit about the DataFrame's purpose.
|
2015-03-20 14:42:05 -04:00 |
|
Donne Martin
|
a6153e5020
|
Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers.
|
2015-03-20 12:54:20 -04:00 |
|
Donne Martin
|
5bc4d9bef4
|
Formatted intro section, added Titanic image.
|
2015-03-20 11:42:43 -04:00 |
|
Donne Martin
|
01d65fd232
|
Added Random Forest: Prepare for Kaggle Submission section.
|
2015-03-20 11:36:41 -04:00 |
|
Donne Martin
|
055cd52cd3
|
Added Random Forest Predicting section.
|
2015-03-20 11:35:25 -04:00 |
|
Donne Martin
|
3fcbc8364f
|
Added Random Forest training section.
|
2015-03-20 11:33:41 -04:00 |
|
Donne Martin
|
ad54e0ae70
|
Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook.
|
2015-03-20 11:27:06 -04:00 |
|
Donne Martin
|
387922662f
|
Replaced nested for loop that calculated the median age based on sex and passenger class with groupby + apply instead.
|
2015-03-20 11:21:27 -04:00 |
|
Donne Martin
|
c9ca38d211
|
Only attempt to fill missing ports of embarkation if there are missing values. Reworked the AgeFill process. Dropped SibSp and Parch columns as they are part of FamilySize.
|
2015-03-18 19:25:58 -04:00 |
|
Donne Martin
|
d58e6423b3
|
Updated Notebook TOC, dropped PassengerId as it won't be used in the machine learning algorithms.
|
2015-03-18 14:45:18 -04:00 |
|
Donne Martin
|
000fea0862
|
Added section Final Data Preparation for Machine Learning, which drops unused columns and converts the DataFrame to a numpy array.
|
2015-03-18 14:32:28 -04:00 |
|
Donne Martin
|
2662e2bb03
|
Added feature engineering description, a description on the family size histogram, and a brief discussion on a potential feature related to the passenger's name.
|
2015-03-18 14:21:25 -04:00 |
|
Donne Martin
|
81660c59d1
|
Reordered README sections.
|
2015-03-17 16:21:33 -04:00 |
|
Donne Martin
|
a93f599a9b
|
Cleaned up code, charts, and descriptions in various sections.
|
2015-03-17 16:16:42 -04:00 |
|
Donne Martin
|
7d4c5532a8
|
Rework the age analysis, adding more details and graphs.
|
2015-03-17 15:44:50 -04:00 |
|
Donne Martin
|
ce3ef575bd
|
Added additional plots to further explore the port of embarkation feature.
|
2015-03-17 14:53:22 -04:00 |
|
Donne Martin
|
011313d2e1
|
Added snippets of feature engineering: creating a new feature family size by combining number of parents and siblings.
|
2015-03-17 14:05:07 -04:00 |
|
Donne Martin
|
44eeaf447d
|
Cleaned up some sections, added plots of survival rate by Sex and Pclass.
|
2015-03-17 14:03:58 -04:00 |
|
Donne Martin
|
0c56902027
|
Add plots for features we will analyze in the exploratory data analysis section.
|
2015-03-17 08:51:30 -04:00 |
|
Donne Martin
|
10d63efb4a
|
Added Spark streaming snippets.
|
2015-03-16 16:01:51 -04:00 |
|
Donne Martin
|
8364d476b3
|
Updated variable descriptions section to be a markdown cell with a pre tag.
|
2015-03-15 08:36:23 -04:00 |
|
Donne Martin
|
8d72ba4cd4
|
Cleaned up various portions of the notebook.
|
2015-03-15 08:33:36 -04:00 |
|
Donne Martin
|
66a98b61d2
|
Added title and axes labels for Age charts.
|
2015-03-15 08:09:56 -04:00 |
|
Donne Martin
|
02b8c05fe9
|
Fixed range of embarked histogram, as it was not showing the NaN value. Added title and axes labels for passenger gender charts.
|
2015-03-15 06:32:55 -04:00 |
|
Donne Martin
|
2b8bf79cfa
|
Added title and axes labels for passenger gender charts.
|
2015-03-15 06:27:48 -04:00 |
|
Donne Martin
|
1b887f75ca
|
Added title and axes labels for passenger classes charts.
|
2015-03-15 06:24:07 -04:00 |
|
Donne Martin
|
e9d533232b
|
Added competition site URL. Fixed Description header.
|
2015-03-15 06:14:27 -04:00 |
|
Donne Martin
|
3b8eb8f823
|
Added snippets to analyze the Titanic passenger Age feature.
|
2015-03-15 06:11:01 -04:00 |
|
Donne Martin
|
b0f14105ae
|
Added snippets to analyze the Titanic Embarked feature.
|
2015-03-15 04:07:05 -04:00 |
|
Donne Martin
|
0394852ba5
|
Added snippets to analyze the Titanic Sex (Gender) feature.
|
2015-03-14 20:03:48 -04:00 |
|
Donne Martin
|
b2c7f4f850
|
Added snippets to analyze the Titanic Passenger Class feature.
|
2015-03-14 20:01:55 -04:00 |
|
Donne Martin
|
8babd3a1cf
|
Added Kaggle section to README.
|
2015-03-14 19:57:55 -04:00 |
|
Donne Martin
|
4ad409aa63
|
Added snippets to start exploring the Titanic data.
|
2015-03-14 19:56:28 -04:00 |
|
Donne Martin
|
bcfae90101
|
Added preliminary Kaggle Titanic survivor analysis containing the competition description, evaluation, data set, and snippet to read in the data to pandas.
|
2015-03-14 19:53:56 -04:00 |
|
Donne Martin
|
1fbbd20c68
|
Added Kaggle Titanic data files.
|
2015-03-14 19:49:07 -04:00 |
|
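Commit d4ab154643 above replaces integer codes for Embarked with dummy variables because integer codes imply an ordering the ports do not have. The snippet below is a minimal sketch of that idea, not the notebook's actual code: the toy DataFrame, the integer mapping, and the `Embarked_` prefix are assumptions made for illustration; only the `Embarked` column name and the `df_train` name come from the log.

```python
import pandas as pd

# Toy stand-in for the Titanic training data (values are illustrative only).
df_train = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Embarked": ["S", "C", "Q", "S"],
})

# Integer representation: C=0, Q=1, S=2 (hypothetical mapping).
# A model may read this as C < Q < S, an ordering the ports do not have.
embarked_codes = df_train["Embarked"].map({"C": 0, "Q": 1, "S": 2})

# Dummy variables: one indicator column per port, with no implied order.
dummies = pd.get_dummies(df_train["Embarked"], prefix="Embarked")

# Swap the original column for its indicator columns.
df_encoded = pd.concat([df_train.drop(columns="Embarked"), dummies], axis=1)
```

Each row of `dummies` has exactly one indicator set, so a model treats the three ports as unrelated categories instead of points on a scale.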