diff --git a/kaggle/titanic.ipynb b/kaggle/titanic.ipynb
index 438a758..3b9409f 100644
--- a/kaggle/titanic.ipynb
+++ b/kaggle/titanic.ipynb
@@ -1,7 +1,7 @@
 {
  "metadata": {
   "name": "",
-  "signature": "sha256:c536f631f40b2ee6ad2ff384cb9076172d442d8b19019094b1af1a8657120e10"
+  "signature": "sha256:8faa925c9373212bcde3580896d60777b13a18934069bb0bec50503c01d983b0"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -2333,6 +2333,8 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
+      "Feature engineering involves creating new features or modifying existing ones in ways that might benefit a machine learning algorithm.\n",
+      "\n",
       "Define a new feature FamilySize that is the sum of Parch (number of parents or children on board) and SibSp (number of siblings or spouses):"
      ]
     },
@@ -2547,14 +2549,22 @@
      ],
      "prompt_number": 35
     },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Based on the histograms, it is not immediately obvious what impact FamilySize has on survival. The machine learning algorithms might still benefit from this feature.\n",
+      "\n",
+      "Additional features could be engineered from the Name column. For example, honorary or ordinary titles might give clues and improve predictive power for a male's survival."
+     ]
+    },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [],
      "language": "python",
      "metadata": {},
-     "outputs": [],
-     "prompt_number": 35
+     "outputs": []
     }
    ],
    "metadata": {}
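
The two features discussed in the added markdown cells can be sketched in a few lines. This is a minimal illustration, not the notebook's own code: the sample rows are made up, the helper names (`family_size`, `extract_title`, `add_features`) are hypothetical, and the title pattern assumes names follow a "Surname, Title. Given names" layout. It uses only the standard library so it runs without the notebook's data or dependencies.

```python
import re

# Hypothetical sample rows for illustration only; the notebook itself would
# read the Name, SibSp and Parch columns from the Titanic training data.
passengers = [
    {"Name": "Doe, Mr. John", "SibSp": 1, "Parch": 0},
    {"Name": "Roe, Master. Tim", "SibSp": 3, "Parch": 2},
    {"Name": "Poe, Dr. Ann", "SibSp": 0, "Parch": 0},
]

# Assumption: the title sits between the first comma and the next period.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def family_size(row):
    """FamilySize as defined in the notebook text: Parch plus SibSp."""
    return row["SibSp"] + row["Parch"]


def extract_title(name):
    """Return the title found in a name, or None if the pattern does not match."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None


def add_features(rows):
    """Return copies of the rows with FamilySize and Title added."""
    return [
        dict(row, FamilySize=family_size(row), Title=extract_title(row["Name"]))
        for row in rows
    ]


engineered = add_features(passengers)
for row in engineered:
    print("{0}: FamilySize={1}, Title={2}".format(
        row["Name"], row["FamilySize"], row["Title"]))
```

Returning copies rather than mutating the input keeps the original rows available for comparison. Whether a Title feature actually improves predictions for male passengers is the open question the markdown cell raises, and this sketch does not answer it.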