Commit Graph

  • 1403cf4134 Added sample mrjob mapper and reducer to parse logs on s3 following the standard bucket logging format. Donne Martin 2015-04-03 06:06:46 -0400
  • d4ab154643 Transformed Embarked to dummy variables instead of integer representations. The latter implies ordering, which isn't the case with Embarked. Donne Martin 2015-04-02 23:29:33 -0400
  • 011747c17a Added Spark accumulators snippets. Donne Martin 2015-03-31 21:41:21 -0400
  • 7195c5bc82 Added Spark broadcast variables snippets. Donne Martin 2015-03-30 19:01:07 -0400
  • b3fb4ae219 Added Spark streaming with states snippets. Donne Martin 2015-03-29 17:53:52 -0400
  • f316416d88 Fix genders_mapping being recalculated. Donne Martin 2015-03-28 13:47:21 -0400
  • 89e04b3a89 Renamed data munging to data wrangling, fixed spacing between variables passed to confusion_matrix. Donne Martin 2015-03-27 06:36:23 -0400
  • 53e0ae41c5 Reduced confusion matrix image, it was too wide and forced a horizontal scroll bar on nbviewer. Donne Martin 2015-03-25 07:56:39 -0400
  • 5e38505cd7 Added random forest classification report. Donne Martin 2015-03-23 07:24:12 -0400
  • 4c7a3a52a1 Added confusion matrix and accuracy metrics to evaluate the model's performance. Donne Martin 2015-03-22 12:18:51 -0400
  • 20ddcd2a01 Added random forest score on training data. Code cleanup. Donne Martin 2015-03-21 10:46:01 -0400
  • bffbb61bc3 Renamed df to df_train to be more explicit of the DataFrame's purpose. Donne Martin 2015-03-20 14:42:05 -0400
  • a6153e5020 Tweaked slicing indices to use single : instead of ::, which I find more readable. Tweaked Feature: Sex headers. Donne Martin 2015-03-20 12:54:20 -0400
  • 5bc4d9bef4 Formatted intro section, added Titanic image. Donne Martin 2015-03-20 11:42:43 -0400
  • 01d65fd232 Added Random Forest: Prepare for Kaggle Submission section. Donne Martin 2015-03-20 11:36:41 -0400
  • 055cd52cd3 Added Random Forest Predicting section. Donne Martin 2015-03-20 11:35:25 -0400
  • 3fcbc8364f Added Random Forest training section. Donne Martin 2015-03-20 11:33:41 -0400
  • ad54e0ae70 Added Data Munging Summary section which contains all the data cleaning and transformation steps described in the notebook. Donne Martin 2015-03-20 11:27:06 -0400
  • 387922662f Replaced nested for loop that calculated the median age based on sex and passenger class with groupby + apply instead. Donne Martin 2015-03-20 11:21:27 -0400
  • c9ca38d211 Only attempt to fill missing ports of embarkation if there are missing values. Reworked the AgeFill process. Dropped SibSp and Parch columns as they are part of FamilySize. Donne Martin 2015-03-18 19:25:58 -0400
  • d58e6423b3 Updated Notebook TOC, dropped PassengerId as it won't be used in the machine learning algorithms. Donne Martin 2015-03-18 14:45:18 -0400
  • 000fea0862 Added section Final Data Preparation for Machine Learning, which drops unused columns and converts the DataFrame to a numpy array. Donne Martin 2015-03-18 14:32:28 -0400
  • 2662e2bb03 Added feature engineering description, a description on the family size histogram, and a brief discussion on a potential feature related to the passenger's name. Donne Martin 2015-03-18 14:21:25 -0400
  • 81660c59d1 Reordered README sections. Donne Martin 2015-03-17 16:21:33 -0400
  • a93f599a9b Cleaned up code, charts, and descriptions in various sections. Donne Martin 2015-03-17 16:16:42 -0400
  • 7d4c5532a8 Rework the age analysis, adding more details and graphs. Donne Martin 2015-03-17 15:44:50 -0400
  • ce3ef575bd Added additional plots to further explore the port of embarkation feature. Donne Martin 2015-03-17 14:53:22 -0400
  • 011313d2e1 Added snippets of feature engineering: creating a new feature family size by combining number of parents and siblings. Donne Martin 2015-03-17 14:05:07 -0400
  • 44eeaf447d Cleaned up some sections, added plots of survival rate by Sex and Pclass. Donne Martin 2015-03-17 14:03:58 -0400
  • 0c56902027 Add plots for features we will analyze in the exploratory data analysis section. Donne Martin 2015-03-17 08:51:30 -0400
  • 10d63efb4a Added Spark streaming snippets. Donne Martin 2015-03-16 16:01:51 -0400
  • 8364d476b3 Updated variable descriptions section to be a markdown cell with a pre tag. Donne Martin 2015-03-15 08:36:23 -0400
  • 8d72ba4cd4 Cleaned up various portions of the notebook. Donne Martin 2015-03-15 08:33:36 -0400
  • 66a98b61d2 Added title and axes labels for Age charts. Donne Martin 2015-03-15 08:09:56 -0400
  • 02b8c05fe9 Fixed range of embarked histogram, as it was not showing the NaN value. Added title and axes labels for passenger gender charts. Donne Martin 2015-03-15 06:32:55 -0400
  • 2b8bf79cfa Added title and axes labels for passenger gender charts. Donne Martin 2015-03-15 06:27:48 -0400
  • 1b887f75ca Added title and axes labels for passenger classes charts. Donne Martin 2015-03-15 06:24:07 -0400
  • e9d533232b Added competition site URL. Fixed Description header. Donne Martin 2015-03-15 06:14:27 -0400
  • 3b8eb8f823 Added snippets to analyze the Titanic passenger Age feature. Donne Martin 2015-03-15 06:11:01 -0400
  • b0f14105ae Added snippets to analyze the Titanic Embarked feature. Donne Martin 2015-03-15 04:07:05 -0400
  • 0394852ba5 Added snippets to analyze the Titanic Sex (Gender) feature. Donne Martin 2015-03-14 20:03:48 -0400
  • b2c7f4f850 Added snippets to analyze the Titanic Passenger Class feature. Donne Martin 2015-03-14 20:01:55 -0400
  • 8babd3a1cf Added Kaggle section to README. Donne Martin 2015-03-14 19:57:55 -0400
  • 4ad409aa63 Added snippets to start exploring the Titanic data. Donne Martin 2015-03-14 19:56:28 -0400
  • bcfae90101 Added preliminary Kaggle Titanic survivor analysis containing the competition description, evaluation, data set, and snippet to read in the data to pandas. Donne Martin 2015-03-14 19:53:56 -0400
  • 1fbbd20c68 Added Kaggle Titanic data files. Donne Martin 2015-03-14 19:49:07 -0400
  • 8196b4bdfb Updated repo description. Donne Martin 2015-03-14 09:22:01 -0400
  • ce605a6fdf Added snippets for configuring Spark applications. Donne Martin 2015-03-13 08:25:50 -0400
  • 53789e0e3e Prefixed Spark commands with ! so they can be executed within IPython Notebook. Donne Martin 2015-03-13 08:09:01 -0400
  • 87b017fd37 Prefixed HDFS commands with ! so they can be executed within IPython Notebook. Donne Martin 2015-03-13 08:07:17 -0400
  • 8c251e43cd Prefixed various misc commands with ! so they can be executed within IPython Notebook. Donne Martin 2015-03-13 08:05:56 -0400
  • 8c4541ae33 Added git reset and pull commands. Donne Martin 2015-03-13 08:03:01 -0400
  • a9ea93b872 Prefixed Linux commands with ! so they can be executed within IPython Notebook. Donne Martin 2015-03-13 07:59:28 -0400
  • 23d3866b8e Prefixed AWS commands with ! so they can be executed within IPython Notebook. Donne Martin 2015-03-13 07:57:12 -0400
  • 1c4e2157a6 Added snippets to demonstrate writing and running a Spark app. Donne Martin 2015-03-12 06:25:40 -0400
  • 9fd62a73ae Added sed command to delete matching lines in place. Added command to display all matching running processes with full formatting. Tweaked formatting of vim section regarding vimtutor and vim syntax coloring. Donne Martin 2015-03-11 20:30:23 -0400
  • 31c4f3299a Updated AWS index. Donne Martin 2015-03-10 17:00:44 -0400
  • 5600ab0377 Added Lambda commands. Donne Martin 2015-03-10 17:00:08 -0400
  • cd84ffb2f0 Added Kinesis commands. Donne Martin 2015-03-09 16:10:54 -0400
  • 1815c9a122 Added snippets to checkpoint RDDs in Spark. Donne Martin 2015-03-08 05:55:45 -0400
  • 0481497848 Added snippets to cache RDDs in Spark. Donne Martin 2015-03-08 05:55:05 -0400
  • 404676a1f7 Added discussion and snippet for working with partitions in Spark. Donne Martin 2015-03-07 09:07:18 -0500
  • bef3dfc9fc Added discussion on viewing the Spark application UI. Donne Martin 2015-03-06 07:53:17 -0500
  • 72cf3af7f1 Added snippets to run Spark on a cluster. Donne Martin 2015-03-05 07:26:54 -0500
  • a5a3da5b28 Added Spark pair RDDs snippets. Donne Martin 2015-03-04 08:28:07 -0500
  • e8b481f480 Added snippets for basic RDD operations. Donne Martin 2015-03-03 10:36:30 -0500
  • 6c7e7b5239 Added Spark IPython Notebook, currently contains snippets for starting the pyspark shell and viewing the spark context. Donne Martin 2015-03-03 10:32:59 -0500
  • a0ac867b7b Added various Linux compression commands. Donne Martin 2015-03-03 10:17:10 -0500
  • 47211cb729 Added anchors for each AWS command line topic. Updated README with AWS topics. Donne Martin 2015-03-02 10:32:03 -0500
  • 39db0b5057 Added curl commands. Donne Martin 2015-03-01 16:27:48 -0500
  • f1f69fbd19 Added commands to view running processes. Donne Martin 2015-03-01 15:51:25 -0500
  • 14ea9025c1 Added Redshift reference tables for create, sort key, dist key, and discussions on how to choose the appropriate keys. Donne Martin 2015-03-01 08:39:59 -0500
  • 17e7736974 Changed Vim commands cell type to code for better formatting on nbviewer. Donne Martin 2015-03-01 08:18:24 -0500
  • 11d7e041fb Added Vim commands. Donne Martin 2015-03-01 08:16:23 -0500
  • 80741219ce Updated repo image. Donne Martin 2015-03-01 07:34:49 -0500
  • d0c4f48469 Tweaked header anchors to work with nbviewer. Donne Martin 2015-03-01 07:00:42 -0500
  • 49c6ae6488 Added anchors for each misc command topic. Updated README with misc command anchors. Donne Martin 2015-03-01 06:56:15 -0500
  • ab3bfad838 Added hyperlinks for each topic listed in misc commands IPython Notebook. Tweaked Jekyll description. Donne Martin 2015-03-01 06:25:57 -0500
  • f9f5e6bfe1 Added index of contents to linux IPython Notebook. Donne Martin 2015-03-01 06:22:43 -0500
  • 09b45a01fa Added IPython Notebook commands. Donne Martin 2015-03-01 06:19:18 -0500
  • 8256531f24 Added Anaconda commands. Added Git description. Donne Martin 2015-03-01 06:01:30 -0500
  • d7b54123e2 Added Ruby commands. Ruby is used to interact with the AWS command line and for Jekyll, a blog framework hosted on GitHub Pages. Donne Martin 2015-03-01 05:57:09 -0500
  • 2363787d59 Combined git and jekyll commands into misc commands IPython Notebook. Donne Martin 2015-03-01 05:53:38 -0500
  • 7586ae1b73 Removed incomplete snippet for pandas idxmax causing an exception. Donne Martin 2015-02-28 18:37:44 -0500
  • 1689cc3e76 Added image source used for slice snippets. Donne Martin 2015-02-28 18:34:17 -0500
  • 21f593da5d Added repo image. Donne Martin 2015-02-28 18:29:16 -0500
  • 36a8dc504f Updated README to include aws and spark. Removed commands suffix from linux, git, jekyll commands as it seemed redundant. Donne Martin 2015-02-28 18:28:50 -0500
  • b0fe318517 Added IPython Notebook for git commands. Donne Martin 2015-02-28 15:14:59 -0500
  • ee4d38ae71 Added __init__.py files to spark and aws folders. Donne Martin 2015-02-28 13:07:31 -0500
  • fb40d146e5 Added ozone data files used in pandas IPython Notebooks. Donne Martin 2015-02-28 13:06:28 -0500
  • 43c191c8da Moved AWS IPython Notebook to its own directory. Donne Martin 2015-02-28 13:03:06 -0500
  • 7fac30fa79 Renamed folder core to python-core to be more explicit about its contents. Donne Martin 2015-02-28 12:48:47 -0500
  • 87402ca5b8 Added IPython Notebook containing HDFS snippets. Donne Martin 2015-02-28 12:44:56 -0500
  • 85a316fc29 Added instructions on how to add terminal colors by editing your .bash_profile. Donne Martin 2015-02-28 12:40:18 -0500
  • a2191d03bd Added linux snippet to uncompress all tar.gz files in the current directory to another directory. Donne Martin 2015-02-28 12:38:52 -0500
  • d28c32b8da Added grep snippets to check number of files matching a search term and snippet to check the number of MapReduce records processed. Donne Martin 2015-02-28 12:37:18 -0500
  • 133cddb267 Added linux commands to count lines and split files into multiple parts based on line counts. Donne Martin 2015-02-28 12:32:08 -0500
  • a709c709ce Added linux commands IPython Notebook, initially contains disk usage commands. Donne Martin 2015-02-28 12:30:36 -0500
  • 68b826c212 Added note about donnemartin.com, my mirror site. Donne Martin 2015-02-28 09:17:43 -0500
  • fb1768f39e Added comments on the jekyll build and serve commands. Added Jekyll to list of IPython Notebooks. Donne Martin 2015-02-28 09:15:38 -0500