data-science-ipython-notebooks/pandas/pandas.ipynb

6672 lines
153 KiB
Python
Raw Normal View History

2015-01-28 08:22:37 +08:00
{
2015-05-17 19:36:41 +08:00
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://github.com/donnemartin/data-science-ipython-notebooks)."
]
},
2015-01-28 08:22:37 +08:00
{
2015-05-17 19:36:41 +08:00
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas\n",
"\n",
"Credits: The following are notes taken while working through [Python for Data Analysis](http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793) by Wes McKinney\n",
"\n",
2015-05-17 19:36:41 +08:00
"* Series\n",
"* DataFrame\n",
"* Reindexing\n",
"* Dropping Entries\n",
"* Indexing, Selecting, Filtering\n",
"* Arithmetic and Data Alignment\n",
"* Function Application and Mapping\n",
"* Sorting and Ranking\n",
"* Axis Indices with Duplicate Values\n",
"* Summarizing and Computing Descriptive Statistics\n",
"* Cleaning Data (Under Construction)\n",
"* Input and Output (Under Construction)"
2015-05-17 19:36:41 +08:00
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pandas import Series, DataFrame\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Series\n",
"\n",
"A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels. The data can be any NumPy data type and the labels are the Series' index."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a Series:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 1\n",
"2 2\n",
"3 -3\n",
"4 -5\n",
"5 8\n",
"6 13\n",
"dtype: int64"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_1 = Series([1, 1, 2, -3, -5, 8, 13])\n",
"ser_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get the array representation of a Series:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 1, 2, -3, -5, 8, 13])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_1.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Index objects are immutable and hold the axis labels and metadata such as names and axis names.\n",
"\n",
"Get the index of the Series:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_1.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a Series with a custom index:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 1\n",
"c 2\n",
"d -3\n",
"e -5\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2 = Series([1, 1, 2, -3, -5], index=['a', 'b', 'c', 'd', 'e'])\n",
"ser_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a value from a Series:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[4] == ser_2['e']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a set of values from a Series by passing in a list:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"c 2\n",
"a 1\n",
"b 1\n",
"dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[['c', 'a', 'b']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get values great than 0:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 1\n",
"c 2\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[ser_2 > 0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scalar multiply:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 2\n",
"b 2\n",
"c 4\n",
"d -6\n",
"e -10\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2 * 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply a numpy math function:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 2.718282\n",
"b 2.718282\n",
"c 7.389056\n",
"d 0.049787\n",
"e 0.006738\n",
"dtype: float64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"np.exp(ser_2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A Series is like a fixed-length, ordered dict. \n",
"\n",
"Create a series by passing in a dict:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"bar 200\n",
"baz 300\n",
"foo 100\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict_1 = {'foo' : 100, 'bar' : 200, 'baz' : 300}\n",
"ser_3 = Series(dict_1)\n",
"ser_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-order a Series by passing in an index (indices not found are NaN):"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"foo 100\n",
"bar 200\n",
"baz 300\n",
"qux NaN\n",
"dtype: float64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"index = ['foo', 'bar', 'baz', 'qux']\n",
"ser_4 = Series(dict_1, index=index)\n",
"ser_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check for NaN with the pandas method:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"foo False\n",
"bar False\n",
"baz False\n",
"qux True\n",
"dtype: bool"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.isnull(ser_4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check for NaN with the Series method:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"foo False\n",
"bar False\n",
"baz False\n",
"qux True\n",
"dtype: bool"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4.isnull()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Series automatically aligns differently indexed data in arithmetic operations:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"bar 400\n",
"baz 600\n",
"foo 200\n",
"qux NaN\n",
"dtype: float64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_3 + ser_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Name a Series:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"ser_4.name = 'foobarbazqux'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Name a Series index:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"ser_4.index.name = 'label'"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"label\n",
"foo 100\n",
"bar 200\n",
"baz 300\n",
"qux NaN\n",
"Name: foobarbazqux, dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rename a Series' index in place:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"fo 100\n",
"br 200\n",
"bz 300\n",
"qx NaN\n",
"Name: foobarbazqux, dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4.index = ['fo', 'br', 'bz', 'qx']\n",
"ser_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrame\n",
"\n",
"A DataFrame is a tabular data structure containing an ordered collection of columns. Each column can have a different type. DataFrames have both row and column indices and is analogous to a dict of Series. Row and column operations are treated roughly symmetrically. Columns returned when indexing a DataFrame are views of the underlying data, not a copy. To obtain a copy, use the Series' copy method.\n",
"\n",
"Create a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pop</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> VA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> VA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> VA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> MD</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> MD</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pop state year\n",
"0 5.0 VA 2012\n",
"1 5.1 VA 2013\n",
"2 5.2 VA 2014\n",
"3 4.0 MD 2014\n",
"4 4.1 MD 2015"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_1 = {'state' : ['VA', 'VA', 'VA', 'MD', 'MD'],\n",
" 'year' : [2012, 2013, 2014, 2014, 2015],\n",
" 'pop' : [5.0, 5.1, 5.2, 4.0, 4.1]}\n",
"df_1 = DataFrame(data_1)\n",
"df_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a DataFrame specifying a sequence of columns:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop\n",
"0 2012 VA 5.0\n",
"1 2013 VA 5.1\n",
"2 2014 VA 5.2\n",
"3 2014 MD 4.0\n",
"4 2015 MD 4.1"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_2 = DataFrame(data_1, columns=['year', 'state', 'pop'])\n",
"df_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like Series, columns that are not present in the data are NaN:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"0 2012 VA 5.0 NaN\n",
"1 2013 VA 5.1 NaN\n",
"2 2014 VA 5.2 NaN\n",
"3 2014 MD 4.0 NaN\n",
"4 2015 MD 4.1 NaN"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3 = DataFrame(data_1, columns=['year', 'state', 'pop', 'unempl'])\n",
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve a column by key, returning a Series:\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 VA\n",
"1 VA\n",
"2 VA\n",
"3 MD\n",
"4 MD\n",
"Name: state, dtype: object"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3['state']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrive a column by attribute, returning a Series:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 2012\n",
"1 2013\n",
"2 2014\n",
"3 2014\n",
"4 2015\n",
"Name: year, dtype: int64"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.year"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retrieve a row by position:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"year 2012\n",
"state VA\n",
"pop 5\n",
"unempl NaN\n",
"Name: 0, dtype: object"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.ix[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Update a column by assignment:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"0 2012 VA 5.0 0\n",
"1 2013 VA 5.1 1\n",
"2 2014 VA 5.2 2\n",
"3 2014 MD 4.0 3\n",
"4 2015 MD 4.1 4"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3['unempl'] = np.arange(5)\n",
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assign a Series to a column (note if assigning a list or array, the length must match the DataFrame, unlike a Series):"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"0 2012 VA 5.0 NaN\n",
"1 2013 VA 5.1 NaN\n",
"2 2014 VA 5.2 6.0\n",
"3 2014 MD 4.0 6.0\n",
"4 2015 MD 4.1 6.1"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"unempl = Series([6.0, 6.0, 6.1], index=[2, 3, 4])\n",
"df_3['unempl'] = unempl\n",
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assign a new column that doesn't exist to create a new column:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>state_dup</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> VA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> VA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> VA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> MD</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> MD</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl state_dup\n",
"0 2012 VA 5.0 NaN VA\n",
"1 2013 VA 5.1 NaN VA\n",
"2 2014 VA 5.2 6.0 VA\n",
"3 2014 MD 4.0 6.0 MD\n",
"4 2015 MD 4.1 6.1 MD"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3['state_dup'] = df_3['state']\n",
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Delete a column:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"0 2012 VA 5.0 NaN\n",
"1 2013 VA 5.1 NaN\n",
"2 2014 VA 5.2 6.0\n",
"3 2014 MD 4.0 6.0\n",
"4 2015 MD 4.1 6.1"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"del df_3['state_dup']\n",
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a DataFrame from a nested dict of dicts (the keys in the inner dicts are unioned and sorted to form the index in the result, unless an explicit index is specified):"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MD</th>\n",
" <th>VA</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2013</th>\n",
" <td> NaN</td>\n",
" <td> 5.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td> 4.0</td>\n",
" <td> 5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td> 4.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MD VA\n",
"2013 NaN 5.1\n",
"2014 4.0 5.2\n",
"2015 4.1 NaN"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pop = {'VA' : {2013 : 5.1, 2014 : 5.2},\n",
" 'MD' : {2014 : 4.0, 2015 : 4.1}}\n",
"df_4 = DataFrame(pop)\n",
"df_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transpose the DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>2013</th>\n",
" <th>2014</th>\n",
" <th>2015</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>MD</th>\n",
" <td> NaN</td>\n",
" <td> 4.0</td>\n",
" <td> 4.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>VA</th>\n",
" <td> 5.1</td>\n",
" <td> 5.2</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 2013 2014 2015\n",
"MD NaN 4.0 4.1\n",
"VA 5.1 5.2 NaN"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_4.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a DataFrame from a dict of Series:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MD</th>\n",
" <th>VA</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td> NaN</td>\n",
" <td> 5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td> 4.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MD VA\n",
"2014 NaN 5.2\n",
"2015 4.1 NaN"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_2 = {'VA' : df_4['VA'][1:],\n",
" 'MD' : df_4['MD'][2:]}\n",
"df_5 = DataFrame(data_2)\n",
"df_5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the DataFrame index name:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MD</th>\n",
" <th>VA</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td> NaN</td>\n",
" <td> 5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td> 4.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MD VA\n",
"year \n",
"2014 NaN 5.2\n",
"2015 4.1 NaN"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_5.index.name = 'year'\n",
"df_5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set the DataFrame columns name:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>state</th>\n",
" <th>MD</th>\n",
" <th>VA</th>\n",
" </tr>\n",
" <tr>\n",
" <th>year</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2014</th>\n",
" <td> NaN</td>\n",
" <td> 5.2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2015</th>\n",
" <td> 4.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"state MD VA\n",
"year \n",
"2014 NaN 5.2\n",
"2015 4.1 NaN"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_5.columns.name = 'state'\n",
"df_5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Return the data contained in a DataFrame as a 2D ndarray:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ nan, 5.2],\n",
" [ 4.1, nan]])"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_5.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the columns are different dtypes, the 2D ndarray's dtype will accomodate all of the columns:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[2012, 'VA', 5.0, nan],\n",
" [2013, 'VA', 5.1, nan],\n",
" [2014, 'VA', 5.2, 6.0],\n",
" [2014, 'MD', 4.0, 6.0],\n",
" [2015, 'MD', 4.1, 6.1]], dtype=object)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reindexing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a new object with the data conformed to a new index. Any missing values are set to NaN."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"0 2012 VA 5.0 NaN\n",
"1 2013 VA 5.1 NaN\n",
"2 2014 VA 5.2 6.0\n",
"3 2014 MD 4.0 6.0\n",
"4 2015 MD 4.1 6.1"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reindexing rows returns a new frame with the specified index:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2015</td>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2014</td>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2014</td>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 2013</td>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 2012</td>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year state pop unempl\n",
"5 NaN NaN NaN NaN\n",
"4 2015 MD 4.1 6.1\n",
"3 2014 MD 4.0 6.0\n",
"2 2014 VA 5.2 6.0\n",
"1 2013 VA 5.1 NaN\n",
"0 2012 VA 5.0 NaN"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.reindex(list(reversed(range(0, 6))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Missing values can be set to something other than NaN:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [year, state, pop, unempl]\n",
"Index: []"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.reindex(range(6, 0), fill_value=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interpolate ordered data like a time series:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"ser_5 = Series(['foo', 'bar', 'baz'], index=[0, 2, 4])"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 foo\n",
"1 foo\n",
"2 bar\n",
"3 bar\n",
"4 baz\n",
"dtype: object"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
2015-01-28 08:22:37 +08:00
}
],
2015-05-17 19:36:41 +08:00
"source": [
"ser_5.reindex(range(5), method='ffill')"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 foo\n",
"1 bar\n",
"2 bar\n",
"3 baz\n",
"4 baz\n",
"dtype: object"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_5.reindex(range(5), method='bfill')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reindex columns:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.reindex(columns=['state', 'pop', 'unempl', 'year'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reindex rows and columns while filling rows:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 0</td>\n",
" <td> 0.0</td>\n",
" <td> 0.0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"5 0 0.0 0.0 0\n",
"4 MD 4.1 6.1 2015\n",
"3 MD 4.0 6.0 2014\n",
"2 VA 5.2 6.0 2014\n",
"1 VA 5.1 NaN 2013\n",
"0 VA 5.0 NaN 2012"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_3.reindex(index=list(reversed(range(0, 6))),\n",
" fill_value=0,\n",
" columns=['state', 'pop', 'unempl', 'year'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Reindex using ix:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6 = df_3.ix[range(0, 7), ['state', 'pop', 'unempl', 'year']]\n",
"df_6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dropping Entries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Drop rows from a Series or DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_7 = df_6.drop([0, 1])\n",
"df_7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Drop columns from a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop year\n",
"2 VA 5.2 2014\n",
"3 MD 4.0 2014\n",
"4 MD 4.1 2015\n",
"5 NaN NaN NaN\n",
"6 NaN NaN NaN"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_7 = df_7.drop('unempl', axis=1)\n",
"df_7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Indexing, Selecting, Filtering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Series indexing is similar to NumPy array indexing with the added bonus of being able to use the Series' index values."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 1\n",
"c 2\n",
"d -3\n",
"e -5\n",
"dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a value from a Series:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[0] == ser_2['a']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a slice from a Series:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"b 1\n",
"c 2\n",
"d -3\n",
"dtype: int64"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[1:4]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select specific values from a Series:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"b 1\n",
"c 2\n",
"d -3\n",
"dtype: int64"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[['b', 'c', 'd']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select from a Series based on a filter:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 1\n",
"c 2\n",
"dtype: int64"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2[ser_2 > 0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a slice from a Series with labels (note the end point is inclusive):"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1\n",
"b 1\n",
"dtype: int64"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2['a':'b']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assign to a Series slice (note the end point is inclusive):"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 0\n",
"b 0\n",
"c 2\n",
"d -3\n",
"e -5\n",
"dtype: int64"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_2['a':'b'] = 0\n",
"ser_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas supports indexing into a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select specified columns from a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pop unempl\n",
"0 5.0 NaN\n",
"1 5.1 NaN\n",
"2 5.2 6.0\n",
"3 4.0 6.0\n",
"4 4.1 6.1\n",
"5 NaN NaN\n",
"6 NaN NaN"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6[['pop', 'unempl']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a slice from a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td>NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td>NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6[:2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select from a DataFrame based on a filter:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td>NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6 2014"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6[df_6['pop'] > 5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform a scalar comparison on a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> True</td>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" <td> True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" <td> False</td>\n",
" <td> True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> True</td>\n",
" <td> False</td>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> True</td>\n",
" <td> False</td>\n",
" <td> True</td>\n",
" <td> True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" <td> False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 True False False True\n",
"1 True True False True\n",
"2 True True True True\n",
"3 True False True True\n",
"4 True False True True\n",
"5 False False False False\n",
"6 False False False False"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6 > 5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perform a scalar comparison on a DataFrame, retain the values that pass the filter:"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> NaN</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> NaN</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA NaN NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD NaN 6.0 2014\n",
"4 MD NaN 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6[df_6 > 5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a slice of rows from a DataFrame (note the end point is inclusive):"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"2 VA 5.2 6 2014\n",
"3 MD 4.0 6 2014"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.ix[2:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a slice of rows from a specific column of a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.ix[0:2, 'pop']\n",
"df_6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select rows based on an arithmetic operation on a specific row:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015"
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.ix[df_6.unempl > 5.0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Arithmetic and Data Alignment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding Series objects results in the union of index pairs if the pairs are not the same, resulting in NaN for indices that do not overlap:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1.764052\n",
"b 0.400157\n",
"c 0.978738\n",
"d 2.240893\n",
"e 1.867558\n",
"dtype: float64"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(0)\n",
"ser_6 = Series(np.random.randn(5),\n",
" index=['a', 'b', 'c', 'd', 'e'])\n",
"ser_6"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 1.624345\n",
"c -0.611756\n",
"e -0.528172\n",
"f -1.072969\n",
"g 0.865408\n",
"dtype: float64"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(1)\n",
"ser_7 = Series(np.random.randn(5),\n",
" index=['a', 'c', 'e', 'f', 'g'])\n",
"ser_7"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 3.388398\n",
"b NaN\n",
"c 0.366982\n",
"d NaN\n",
"e 1.339386\n",
"f NaN\n",
"g NaN\n",
"dtype: float64"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_6 + ser_7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set a fill value instead of NaN for indices that do not overlap:"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 3.388398\n",
"b 0.400157\n",
"c 0.366982\n",
"d 2.240893\n",
"e 1.339386\n",
"f -1.072969\n",
"g 0.865408\n",
"dtype: float64"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_6.add(ser_7, fill_value=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding DataFrame objects results in the union of index pairs for rows and columns if the pairs are not the same, resulting in NaN for indices that do not overlap:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.548814</td>\n",
" <td> 0.715189</td>\n",
" <td> 0.602763</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.544883</td>\n",
" <td> 0.423655</td>\n",
" <td> 0.645894</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.437587</td>\n",
" <td> 0.891773</td>\n",
" <td> 0.963663</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c\n",
"0 0.548814 0.715189 0.602763\n",
"1 0.544883 0.423655 0.645894\n",
"2 0.437587 0.891773 0.963663"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(0)\n",
"df_8 = DataFrame(np.random.rand(9).reshape((3, 3)),\n",
" columns=['a', 'b', 'c'])\n",
"df_8"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.417022</td>\n",
" <td> 0.720324</td>\n",
" <td> 0.000114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.302333</td>\n",
" <td> 0.146756</td>\n",
" <td> 0.092339</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.186260</td>\n",
" <td> 0.345561</td>\n",
" <td> 0.396767</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" b c d\n",
"0 0.417022 0.720324 0.000114\n",
"1 0.302333 0.146756 0.092339\n",
"2 0.186260 0.345561 0.396767"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.seed(1)\n",
"df_9 = DataFrame(np.random.rand(9).reshape((3, 3)),\n",
" columns=['b', 'c', 'd'])\n",
"df_9"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td> 1.132211</td>\n",
" <td> 1.323088</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td> 0.725987</td>\n",
" <td> 0.792650</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td> 1.078033</td>\n",
" <td> 1.309223</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 NaN 1.132211 1.323088 NaN\n",
"1 NaN 0.725987 0.792650 NaN\n",
"2 NaN 1.078033 1.309223 NaN"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_8 + df_9"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set a fill value instead of NaN for indices that do not overlap:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.548814</td>\n",
" <td> 1.132211</td>\n",
" <td> 1.323088</td>\n",
" <td> 0.000114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.544883</td>\n",
" <td> 0.725987</td>\n",
" <td> 0.792650</td>\n",
" <td> 0.092339</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.437587</td>\n",
" <td> 1.078033</td>\n",
" <td> 1.309223</td>\n",
" <td> 0.396767</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 0.548814 1.132211 1.323088 0.000114\n",
"1 0.544883 0.725987 0.792650 0.092339\n",
"2 0.437587 1.078033 1.309223 0.396767"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_10 = df_8.add(df_9, fill_value=0)\n",
"df_10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like NumPy, pandas supports arithmetic operations between DataFrames and Series.\n",
"\n",
"Match the index of the Series on the DataFrame's columns, broadcasting down the rows:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.003930</td>\n",
" <td>-0.406224</td>\n",
" <td>-0.530438</td>\n",
" <td> 0.092224</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.111226</td>\n",
" <td>-0.054178</td>\n",
" <td>-0.013864</td>\n",
" <td> 0.396653</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 0.000000 0.000000 0.000000 0.000000\n",
"1 -0.003930 -0.406224 -0.530438 0.092224\n",
"2 -0.111226 -0.054178 -0.013864 0.396653"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_8 = df_10.ix[0]\n",
"df_11 = df_10 - ser_8\n",
"df_11"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Match the index of the Series on the DataFrame's columns, broadcasting down the rows and union the indices that do not match:"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 0\n",
"d 1\n",
"e 2\n",
"dtype: int64"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_9 = Series(range(3), index=['a', 'd', 'e'])\n",
"ser_9"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" <th>e</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-1.000000</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-0.003930</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-0.907776</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-0.111226</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>-0.603347</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d e\n",
"0 0.000000 NaN NaN -1.000000 NaN\n",
"1 -0.003930 NaN NaN -0.907776 NaN\n",
"2 -0.111226 NaN NaN -0.603347 NaN"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_11 - ser_9"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcast over the columns and match the rows (axis=0) by using an arithmetic method:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.548814</td>\n",
" <td> 1.132211</td>\n",
" <td> 1.323088</td>\n",
" <td> 0.000114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.544883</td>\n",
" <td> 0.725987</td>\n",
" <td> 0.792650</td>\n",
" <td> 0.092339</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.437587</td>\n",
" <td> 1.078033</td>\n",
" <td> 1.309223</td>\n",
" <td> 0.396767</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 0.548814 1.132211 1.323088 0.000114\n",
"1 0.544883 0.725987 0.792650 0.092339\n",
"2 0.437587 1.078033 1.309223 0.396767"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_10"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 100\n",
"1 200\n",
"2 300\n",
"dtype: int64"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_10 = Series([100, 200, 300])\n",
"ser_10"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> -99.451186</td>\n",
" <td> -98.867789</td>\n",
" <td> -98.676912</td>\n",
" <td> -99.999886</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-199.455117</td>\n",
" <td>-199.274013</td>\n",
" <td>-199.207350</td>\n",
" <td>-199.907661</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-299.562413</td>\n",
" <td>-298.921967</td>\n",
" <td>-298.690777</td>\n",
" <td>-299.603233</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 -99.451186 -98.867789 -98.676912 -99.999886\n",
"1 -199.455117 -199.274013 -199.207350 -199.907661\n",
"2 -299.562413 -298.921967 -298.690777 -299.603233"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_10.sub(ser_10, axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Function Application and Mapping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy ufuncs (element-wise array methods) operate on pandas objects:"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.003930</td>\n",
" <td> 0.406224</td>\n",
" <td> 0.530438</td>\n",
" <td> 0.092224</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.111226</td>\n",
" <td> 0.054178</td>\n",
" <td> 0.013864</td>\n",
" <td> 0.396653</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 0.000000 0.000000 0.000000 0.000000\n",
"1 0.003930 0.406224 0.530438 0.092224\n",
"2 0.111226 0.054178 0.013864 0.396653"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_11 = np.abs(df_11)\n",
"df_11"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply a function on 1D arrays to each column:"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"a 0.111226\n",
"b 0.406224\n",
"c 0.530438\n",
"d 0.396653\n",
"dtype: float64"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func_1 = lambda x: x.max() - x.min()\n",
"df_11.apply(func_1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply a function on 1D arrays to each row:"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 0.000000\n",
"1 0.526508\n",
"2 0.382789\n",
"dtype: float64"
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_11.apply(func_1, axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply a function and return a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>min</th>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" <td> 0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td> 0.111226</td>\n",
" <td> 0.406224</td>\n",
" <td> 0.530438</td>\n",
" <td> 0.396653</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"min 0.000000 0.000000 0.000000 0.000000\n",
"max 0.111226 0.406224 0.530438 0.396653"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func_2 = lambda x: Series([x.min(), x.max()], index=['min', 'max'])\n",
"df_11.apply(func_2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply an element-wise Python function to a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>c</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 0.00</td>\n",
" <td> 0.00</td>\n",
" <td> 0.00</td>\n",
" <td> 0.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 0.00</td>\n",
" <td> 0.41</td>\n",
" <td> 0.53</td>\n",
" <td> 0.09</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 0.11</td>\n",
" <td> 0.05</td>\n",
" <td> 0.01</td>\n",
" <td> 0.40</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" a b c d\n",
"0 0.00 0.00 0.00 0.00\n",
"1 0.00 0.41 0.53 0.09\n",
"2 0.11 0.05 0.01 0.40"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"func_3 = lambda x: '%.2f' %x\n",
"df_11.applymap(func_3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Apply an element-wise Python function to a Series:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 0.00\n",
"1 0.00\n",
"2 0.11\n",
"Name: a, dtype: object"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_11['a'].map(func_3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sorting and Ranking"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"fo 100\n",
"br 200\n",
"bz 300\n",
"qx NaN\n",
"Name: foobarbazqux, dtype: float64"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort a Series by its index:"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"br 200\n",
"bz 300\n",
"fo 100\n",
"qx NaN\n",
"Name: foobarbazqux, dtype: float64"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4.sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort a Series by its values:"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"fo 100\n",
"br 200\n",
"bz 300\n",
"qx NaN\n",
"Name: foobarbazqux, dtype: float64"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_4.order()"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c</th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>three</th>\n",
" <td> 0</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>one</th>\n",
" <td> 4</td>\n",
" <td> 5</td>\n",
" <td> 6</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td> 8</td>\n",
" <td> 9</td>\n",
" <td> 10</td>\n",
" <td> 11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" c a b d\n",
"three 0 1 2 3\n",
"one 4 5 6 7\n",
"two 8 9 10 11"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_12 = DataFrame(np.arange(12).reshape((3, 4)),\n",
" index=['three', 'one', 'two'],\n",
" columns=['c', 'a', 'b', 'd'])\n",
"df_12"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort a DataFrame by its index:"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c</th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>one</th>\n",
" <td> 4</td>\n",
" <td> 5</td>\n",
" <td> 6</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>three</th>\n",
" <td> 0</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td> 8</td>\n",
" <td> 9</td>\n",
" <td> 10</td>\n",
" <td> 11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" c a b d\n",
"one 4 5 6 7\n",
"three 0 1 2 3\n",
"two 8 9 10 11"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_12.sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort a DataFrame by columns in descending order:"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>d</th>\n",
" <th>c</th>\n",
" <th>b</th>\n",
" <th>a</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>three</th>\n",
" <td> 3</td>\n",
" <td> 0</td>\n",
" <td> 2</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>one</th>\n",
" <td> 7</td>\n",
" <td> 4</td>\n",
" <td> 6</td>\n",
" <td> 5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td> 11</td>\n",
" <td> 8</td>\n",
" <td> 10</td>\n",
" <td> 9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" d c b a\n",
"three 3 0 2 1\n",
"one 7 4 6 5\n",
"two 11 8 10 9"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_12.sort_index(axis=1, ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort a DataFrame's values by column:"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>c</th>\n",
" <th>a</th>\n",
" <th>b</th>\n",
" <th>d</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>three</th>\n",
" <td> 0</td>\n",
" <td> 1</td>\n",
" <td> 2</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>one</th>\n",
" <td> 4</td>\n",
" <td> 5</td>\n",
" <td> 6</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>two</th>\n",
" <td> 8</td>\n",
" <td> 9</td>\n",
" <td> 10</td>\n",
" <td> 11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" c a b d\n",
"three 0 1 2 3\n",
"one 4 5 6 7\n",
"two 8 9 10 11"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_12.sort_index(by=['d', 'c'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ranking is similar to numpy.argsort except that ties are broken by assigning each group the mean rank:"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 -5\n",
"5 0\n",
"4 2\n",
"3 4\n",
"6 4\n",
"0 7\n",
"2 7\n",
"7 7\n",
"dtype: int64"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_11 = Series([7, -5, 7, 4, 2, 0, 4, 7])\n",
"ser_11 = ser_11.order()\n",
"ser_11"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 1.0\n",
"5 2.0\n",
"4 3.0\n",
"3 4.5\n",
"6 4.5\n",
"0 7.0\n",
"2 7.0\n",
"7 7.0\n",
"dtype: float64"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_11.rank()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rank a Series according to when they appear in the data:"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 1\n",
"5 2\n",
"4 3\n",
"3 4\n",
"6 5\n",
"0 6\n",
"2 7\n",
"7 8\n",
"dtype: float64"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_11.rank(method='first')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rank a Series in descending order, using the maximum rank for the group:"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 8\n",
"5 7\n",
"4 6\n",
"3 5\n",
"6 5\n",
"0 3\n",
"2 3\n",
"7 3\n",
"dtype: float64"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_11.rank(ascending=False, method='max')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"DataFrames can rank over rows or columns."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>bar</th>\n",
" <th>baz</th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-5</td>\n",
" <td>-1</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 4</td>\n",
" <td> 2</td>\n",
" <td>-5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 2</td>\n",
" <td> 3</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4</td>\n",
" <td> 5</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 7</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 7</td>\n",
" <td> 9</td>\n",
" <td> 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 8</td>\n",
" <td> 5</td>\n",
" <td> 7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" bar baz foo\n",
"0 -5 -1 7\n",
"1 4 2 -5\n",
"2 2 3 7\n",
"3 0 0 4\n",
"4 4 5 2\n",
"5 7 9 0\n",
"6 7 9 4\n",
"7 8 5 7"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_13 = DataFrame({'foo' : [7, -5, 7, 4, 2, 0, 4, 7],\n",
" 'bar' : [-5, 4, 2, 0, 4, 7, 7, 8],\n",
" 'baz' : [-1, 2, 3, 0, 5, 9, 9, 5]})\n",
"df_13"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rank a DataFrame over rows:"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>bar</th>\n",
" <th>baz</th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 1.0</td>\n",
" <td> 1.0</td>\n",
" <td> 7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 4.5</td>\n",
" <td> 3.0</td>\n",
" <td> 1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 3.0</td>\n",
" <td> 4.0</td>\n",
" <td> 7.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 2.0</td>\n",
" <td> 2.0</td>\n",
" <td> 4.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.5</td>\n",
" <td> 5.5</td>\n",
" <td> 3.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 6.5</td>\n",
" <td> 7.5</td>\n",
" <td> 2.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 6.5</td>\n",
" <td> 7.5</td>\n",
" <td> 4.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 8.0</td>\n",
" <td> 5.5</td>\n",
" <td> 7.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" bar baz foo\n",
"0 1.0 1.0 7.0\n",
"1 4.5 3.0 1.0\n",
"2 3.0 4.0 7.0\n",
"3 2.0 2.0 4.5\n",
"4 4.5 5.5 3.0\n",
"5 6.5 7.5 2.0\n",
"6 6.5 7.5 4.5\n",
"7 8.0 5.5 7.0"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_13.rank()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rank a DataFrame over columns:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>bar</th>\n",
" <th>baz</th>\n",
" <th>foo</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 1.0</td>\n",
" <td> 2.0</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 3.0</td>\n",
" <td> 2.0</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 1.0</td>\n",
" <td> 2.0</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 1.5</td>\n",
" <td> 1.5</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 2.0</td>\n",
" <td> 3.0</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> 2.0</td>\n",
" <td> 3.0</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> 2.0</td>\n",
" <td> 3.0</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> 3.0</td>\n",
" <td> 1.0</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" bar baz foo\n",
"0 1.0 2.0 3\n",
"1 3.0 2.0 1\n",
"2 1.0 2.0 3\n",
"3 1.5 1.5 3\n",
"4 2.0 3.0 1\n",
"5 2.0 3.0 1\n",
"6 2.0 3.0 1\n",
"7 3.0 1.0 2"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_13.rank(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Axis Indexes with Duplicate Values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Labels do not have to be unique in Pandas:"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"foo 0\n",
"foo 1\n",
"bar 2\n",
"bar 3\n",
"baz 4\n",
"dtype: int64"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_12 = Series(range(5), index=['foo', 'foo', 'bar', 'bar', 'baz'])\n",
"ser_12"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_12.index.is_unique"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select Series elements:"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"foo 0\n",
"foo 1\n",
"dtype: int64"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser_12['foo']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select DataFrame elements:"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>foo</th>\n",
" <td>-2.363469</td>\n",
" <td> 1.135345</td>\n",
" <td>-1.017014</td>\n",
" <td> 0.637362</td>\n",
" </tr>\n",
" <tr>\n",
" <th>foo</th>\n",
" <td>-0.859907</td>\n",
" <td> 1.772608</td>\n",
" <td>-1.110363</td>\n",
" <td> 0.181214</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td> 0.564345</td>\n",
" <td>-0.566510</td>\n",
" <td> 0.729976</td>\n",
" <td> 0.372994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td> 0.533811</td>\n",
" <td>-0.091973</td>\n",
" <td> 1.913820</td>\n",
" <td> 0.330797</td>\n",
" </tr>\n",
" <tr>\n",
" <th>baz</th>\n",
" <td> 1.141943</td>\n",
" <td>-1.129595</td>\n",
" <td>-0.850052</td>\n",
" <td> 0.960820</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3\n",
"foo -2.363469 1.135345 -1.017014 0.637362\n",
"foo -0.859907 1.772608 -1.110363 0.181214\n",
"bar 0.564345 -0.566510 0.729976 0.372994\n",
"bar 0.533811 -0.091973 1.913820 0.330797\n",
"baz 1.141943 -1.129595 -0.850052 0.960820"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_14 = DataFrame(np.random.randn(5, 4),\n",
" index=['foo', 'foo', 'bar', 'bar', 'baz'])\n",
"df_14"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" <th>1</th>\n",
" <th>2</th>\n",
" <th>3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td> 0.564345</td>\n",
" <td>-0.566510</td>\n",
" <td> 0.729976</td>\n",
" <td> 0.372994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>bar</th>\n",
" <td> 0.533811</td>\n",
" <td>-0.091973</td>\n",
" <td> 1.913820</td>\n",
" <td> 0.330797</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0 1 2 3\n",
"bar 0.564345 -0.566510 0.729976 0.372994\n",
"bar 0.533811 -0.091973 1.913820 0.330797"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_14.ix['bar']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summarizing and Computing Descriptive Statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike NumPy arrays, Pandas descriptive statistics automatically exclude missing data. NaN values are excluded unless the entire row or column is NA."
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>pop</th>\n",
" <th>unempl</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VA</td>\n",
" <td> 5.0</td>\n",
" <td> NaN</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VA</td>\n",
" <td> 5.1</td>\n",
" <td> NaN</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VA</td>\n",
" <td> 5.2</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MD</td>\n",
" <td> 4.0</td>\n",
" <td> 6.0</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MD</td>\n",
" <td> 4.1</td>\n",
" <td> 6.1</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state pop unempl year\n",
"0 VA 5.0 NaN 2012\n",
"1 VA 5.1 NaN 2013\n",
"2 VA 5.2 6.0 2014\n",
"3 MD 4.0 6.0 2014\n",
"4 MD 4.1 6.1 2015\n",
"5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"pop 23.4\n",
"unempl 18.1\n",
"year 10068.0\n",
"dtype: float64"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sum over the rows:"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 2017.0\n",
"1 2018.1\n",
"2 2025.2\n",
"3 2024.0\n",
"4 2025.2\n",
"5 NaN\n",
"6 NaN\n",
"dtype: float64"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.sum(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Account for NaNs:"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 NaN\n",
"1 NaN\n",
"2 2025.2\n",
"3 2024.0\n",
"4 2025.2\n",
"5 NaN\n",
"6 NaN\n",
"dtype: float64"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_6.sum(axis=1, skipna=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cleaning Data (Under Construction)\n",
"* Replace\n",
"* Drop\n",
"* Concatenate"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pandas import Series, DataFrame\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Setup a DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> VA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> VA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> VA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> MD</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> MD</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population state year\n",
"0 5.0 VA 2012\n",
"1 5.1 VA 2013\n",
"2 5.2 VA 2014\n",
"3 4.0 MD 2014\n",
"4 4.1 MD 2015"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_1 = {'state' : ['VA', 'VA', 'VA', 'MD', 'MD'],\n",
" 'year' : [2012, 2013, 2014, 2014, 2015],\n",
" 'population' : [5.0, 5.1, 5.2, 4.0, 4.1]}\n",
"df_1 = DataFrame(data_1)\n",
"df_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Replace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Replace all occurrences of a string with another string, in place (no copy):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> MD</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> MD</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population state year\n",
"0 5.0 VIRGINIA 2012\n",
"1 5.1 VIRGINIA 2013\n",
"2 5.2 VIRGINIA 2014\n",
"3 4.0 MD 2014\n",
"4 4.1 MD 2015"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_1.replace('VA', 'VIRGINIA', inplace=True)\n",
"df_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In a specified column, replace all occurrences of a string with another string, in place (no copy):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> MARYLAND</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> MARYLAND</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population state year\n",
"0 5.0 VIRGINIA 2012\n",
"1 5.1 VIRGINIA 2013\n",
"2 5.2 VIRGINIA 2014\n",
"3 4.0 MARYLAND 2014\n",
"4 4.1 MARYLAND 2015"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_1.replace({'state' : { 'MD' : 'MARYLAND' }}, inplace=True)\n",
"df_1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Drop"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Drop the 'population' column and return a copy of the DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> VIRGINIA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> VIRGINIA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> VIRGINIA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> MARYLAND</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> MARYLAND</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" state year\n",
"0 VIRGINIA 2012\n",
"1 VIRGINIA 2013\n",
"2 VIRGINIA 2014\n",
"3 MARYLAND 2014\n",
"4 MARYLAND 2015"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_2 = df_1.drop('population', axis=1)\n",
"df_2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Concatenate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concatenate two DataFrames:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 6.0</td>\n",
" <td> NY</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 6.1</td>\n",
" <td> NY</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 6.2</td>\n",
" <td> NY</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 3.0</td>\n",
" <td> FL</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 3.1</td>\n",
" <td> FL</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population state year\n",
"0 6.0 NY 2012\n",
"1 6.1 NY 2013\n",
"2 6.2 NY 2014\n",
"3 3.0 FL 2014\n",
"4 3.1 FL 2015"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_2 = {'state' : ['NY', 'NY', 'NY', 'FL', 'FL'],\n",
" 'year' : [2012, 2013, 2014, 2014, 2015],\n",
" 'population' : [6.0, 6.1, 6.2, 3.0, 3.1]}\n",
"df_3 = DataFrame(data_2)\n",
"df_3"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>population</th>\n",
" <th>state</th>\n",
" <th>year</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 5.0</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 5.1</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 5.2</td>\n",
" <td> VIRGINIA</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 4.0</td>\n",
" <td> MARYLAND</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 4.1</td>\n",
" <td> MARYLAND</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 6.0</td>\n",
" <td> NY</td>\n",
" <td> 2012</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 6.1</td>\n",
" <td> NY</td>\n",
" <td> 2013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 6.2</td>\n",
" <td> NY</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 3.0</td>\n",
" <td> FL</td>\n",
" <td> 2014</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 3.1</td>\n",
" <td> FL</td>\n",
" <td> 2015</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" population state year\n",
"0 5.0 VIRGINIA 2012\n",
"1 5.1 VIRGINIA 2013\n",
"2 5.2 VIRGINIA 2014\n",
"3 4.0 MARYLAND 2014\n",
"4 4.1 MARYLAND 2015\n",
"0 6.0 NY 2012\n",
"1 6.1 NY 2013\n",
"2 6.2 NY 2014\n",
"3 3.0 FL 2014\n",
"4 3.1 FL 2015"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_4 = pd.concat([df_1, df_3])\n",
"df_4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input and Output (Under Construction)\n",
"* Reading\n",
"* Writing"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from pandas import Series, DataFrame\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reading"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Read data from a CSV file into a DataFrame (use sep='\\t' for TSV):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"df_1 = pd.read_csv(\"../data/ozone.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a summary of the DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Ozone</th>\n",
" <th>Solar.R</th>\n",
" <th>Wind</th>\n",
" <th>Temp</th>\n",
" <th>Month</th>\n",
" <th>Day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td> 116.000000</td>\n",
" <td> 146.000000</td>\n",
" <td> 153.000000</td>\n",
" <td> 153.000000</td>\n",
" <td> 153.000000</td>\n",
" <td> 153.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td> 42.129310</td>\n",
" <td> 185.931507</td>\n",
" <td> 9.957516</td>\n",
" <td> 77.882353</td>\n",
" <td> 6.993464</td>\n",
" <td> 15.803922</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td> 32.987885</td>\n",
" <td> 90.058422</td>\n",
" <td> 3.523001</td>\n",
" <td> 9.465270</td>\n",
" <td> 1.416522</td>\n",
" <td> 8.864520</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td> 1.000000</td>\n",
" <td> 7.000000</td>\n",
" <td> 1.700000</td>\n",
" <td> 56.000000</td>\n",
" <td> 5.000000</td>\n",
" <td> 1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td> 18.000000</td>\n",
" <td> 115.750000</td>\n",
" <td> 7.400000</td>\n",
" <td> 72.000000</td>\n",
" <td> 6.000000</td>\n",
" <td> 8.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td> 31.500000</td>\n",
" <td> 205.000000</td>\n",
" <td> 9.700000</td>\n",
" <td> 79.000000</td>\n",
" <td> 7.000000</td>\n",
" <td> 16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td> 63.250000</td>\n",
" <td> 258.750000</td>\n",
" <td> 11.500000</td>\n",
" <td> 85.000000</td>\n",
" <td> 8.000000</td>\n",
" <td> 23.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td> 168.000000</td>\n",
" <td> 334.000000</td>\n",
" <td> 20.700000</td>\n",
" <td> 97.000000</td>\n",
" <td> 9.000000</td>\n",
" <td> 31.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Ozone Solar.R Wind Temp Month Day\n",
"count 116.000000 146.000000 153.000000 153.000000 153.000000 153.000000\n",
"mean 42.129310 185.931507 9.957516 77.882353 6.993464 15.803922\n",
"std 32.987885 90.058422 3.523001 9.465270 1.416522 8.864520\n",
"min 1.000000 7.000000 1.700000 56.000000 5.000000 1.000000\n",
"25% 18.000000 115.750000 7.400000 72.000000 6.000000 8.000000\n",
"50% 31.500000 205.000000 9.700000 79.000000 7.000000 16.000000\n",
"75% 63.250000 258.750000 11.500000 85.000000 8.000000 23.000000\n",
"max 168.000000 334.000000 20.700000 97.000000 9.000000 31.000000"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_1.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"List the first five rows of the DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Ozone</th>\n",
" <th>Solar.R</th>\n",
" <th>Wind</th>\n",
" <th>Temp</th>\n",
" <th>Month</th>\n",
" <th>Day</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 41</td>\n",
" <td> 190</td>\n",
" <td> 7.4</td>\n",
" <td> 67</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 36</td>\n",
" <td> 118</td>\n",
" <td> 8.0</td>\n",
" <td> 72</td>\n",
" <td> 5</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 12</td>\n",
" <td> 149</td>\n",
" <td> 12.6</td>\n",
" <td> 74</td>\n",
" <td> 5</td>\n",
" <td> 3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 18</td>\n",
" <td> 313</td>\n",
" <td> 11.5</td>\n",
" <td> 62</td>\n",
" <td> 5</td>\n",
" <td> 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td> NaN</td>\n",
" <td> 14.3</td>\n",
" <td> 56</td>\n",
" <td> 5</td>\n",
" <td> 5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Ozone Solar.R Wind Temp Month Day\n",
"0 41 190 7.4 67 5 1\n",
"1 36 118 8.0 72 5 2\n",
"2 12 149 12.6 74 5 3\n",
"3 18 313 11.5 62 5 4\n",
"4 NaN NaN 14.3 56 5 5"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_1.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Writing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a copy of the CSV file, encoded in UTF-8 and hiding the index and header labels:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"df_1.to_csv('../data/ozone_copy.csv', \n",
" encoding='utf-8', \n",
" index=False, \n",
" header=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"View the data directory:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 16\r\n",
"-rw-r--r--@ 1 dmartin 1443163707 2902 Dec 26 2012 ozone.csv\r\n",
"-rw-r--r-- 1 dmartin 1443163707 3324 Feb 14 06:40 ozone_copy.csv\r\n"
]
}
],
"source": [
"!ls -l ../data/"
]
2015-05-17 19:36:41 +08:00
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
2015-01-28 08:22:37 +08:00
}
2015-05-17 19:36:41 +08:00
},
"nbformat": 4,
"nbformat_minor": 0
}