2015-01-27 19:22:37 -05:00
{
2015-05-17 07:36:41 -04:00
" cells " : [
2015-06-18 21:07:36 -04:00
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
2015-11-01 06:44:00 -05:00
" This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://github.com/donnemartin/data-science-ipython-notebooks). "
2015-06-18 21:07:36 -04:00
]
} ,
2015-01-27 19:22:37 -05:00
{
2015-05-17 07:36:41 -04:00
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" # Pandas \n " ,
" \n " ,
2015-06-18 21:07:36 -04:00
" Credits: The following are notes taken while working through [Python for Data Analysis](http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793) by Wes McKinney \n " ,
2015-06-14 07:04:58 -04:00
" \n " ,
2015-05-17 07:36:41 -04:00
" * Series \n " ,
" * DataFrame \n " ,
" * Reindexing \n " ,
" * Dropping Entries \n " ,
" * Indexing, Selecting, Filtering \n " ,
" * Arithmetic and Data Alignment \n " ,
" * Function Application and Mapping \n " ,
" * Sorting and Ranking \n " ,
" * Axis Indices with Duplicate Values \n " ,
2015-06-12 20:51:00 -04:00
" * Summarizing and Computing Descriptive Statistics \n " ,
" * Cleaning Data (Under Construction) \n " ,
" * Input and Output (Under Construction) "
2015-05-17 07:36:41 -04:00
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 1 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" from pandas import Series, DataFrame \n " ,
" import pandas as pd \n " ,
" import numpy as np "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Series \n " ,
" \n " ,
" A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels. The data can be any NumPy data type and the labels are the Series ' index. "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 2 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 1 \n " ,
" 1 1 \n " ,
" 2 2 \n " ,
" 3 -3 \n " ,
" 4 -5 \n " ,
" 5 8 \n " ,
" 6 13 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 2 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_1 = Series([1, 1, 2, -3, -5, 8, 13]) \n " ,
" ser_1 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Get the array representation of a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 3 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" array([ 1, 1, 2, -3, -5, 8, 13]) "
]
} ,
" execution_count " : 3 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_1.values "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Index objects are immutable and hold the axis labels and metadata such as names and axis names. \n " ,
" \n " ,
" Get the index of the Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 4 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" Int64Index([0, 1, 2, 3, 4, 5, 6], dtype= ' int64 ' ) "
]
} ,
" execution_count " : 4 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_1.index "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a Series with a custom index: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 5 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1 \n " ,
" b 1 \n " ,
" c 2 \n " ,
" d -3 \n " ,
" e -5 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 5 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2 = Series([1, 1, 2, -3, -5], index=[ ' a ' , ' b ' , ' c ' , ' d ' , ' e ' ]) \n " ,
" ser_2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Get a value from a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 6 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" True "
]
} ,
" execution_count " : 6 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[4] == ser_2[ ' e ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Get a set of values from a Series by passing in a list: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 7 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" c 2 \n " ,
" a 1 \n " ,
" b 1 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 7 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[[ ' c ' , ' a ' , ' b ' ]] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Get values great than 0: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 8 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1 \n " ,
" b 1 \n " ,
" c 2 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 8 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[ser_2 > 0] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Scalar multiply: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 9 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 2 \n " ,
" b 2 \n " ,
" c 4 \n " ,
" d -6 \n " ,
" e -10 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 9 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2 * 2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply a numpy math function: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 10 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 2.718282 \n " ,
" b 2.718282 \n " ,
" c 7.389056 \n " ,
" d 0.049787 \n " ,
" e 0.006738 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 10 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" import numpy as np \n " ,
" np.exp(ser_2) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" A Series is like a fixed-length, ordered dict. \n " ,
" \n " ,
" Create a series by passing in a dict: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 11 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" bar 200 \n " ,
" baz 300 \n " ,
" foo 100 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 11 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" dict_1 = { ' foo ' : 100, ' bar ' : 200, ' baz ' : 300} \n " ,
" ser_3 = Series(dict_1) \n " ,
" ser_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Re-order a Series by passing in an index (indices not found are NaN): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 12 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" foo 100 \n " ,
" bar 200 \n " ,
" baz 300 \n " ,
" qux NaN \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 12 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" index = [ ' foo ' , ' bar ' , ' baz ' , ' qux ' ] \n " ,
" ser_4 = Series(dict_1, index=index) \n " ,
" ser_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Check for NaN with the pandas method: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 13 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" foo False \n " ,
" bar False \n " ,
" baz False \n " ,
" qux True \n " ,
" dtype: bool "
]
} ,
" execution_count " : 13 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" pd.isnull(ser_4) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Check for NaN with the Series method: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 14 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" foo False \n " ,
" bar False \n " ,
" baz False \n " ,
" qux True \n " ,
" dtype: bool "
]
} ,
" execution_count " : 14 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4.isnull() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Series automatically aligns differently indexed data in arithmetic operations: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 15 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" bar 400 \n " ,
" baz 600 \n " ,
" foo 200 \n " ,
" qux NaN \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 15 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_3 + ser_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Name a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 16 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" ser_4.name = ' foobarbazqux ' "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Name a Series index: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 17 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" ser_4.index.name = ' label ' "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 18 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" label \n " ,
" foo 100 \n " ,
" bar 200 \n " ,
" baz 300 \n " ,
" qux NaN \n " ,
" Name: foobarbazqux, dtype: float64 "
]
} ,
" execution_count " : 18 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Rename a Series ' index in place: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 19 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" fo 100 \n " ,
" br 200 \n " ,
" bz 300 \n " ,
" qx NaN \n " ,
" Name: foobarbazqux, dtype: float64 "
]
} ,
" execution_count " : 19 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4.index = [ ' fo ' , ' br ' , ' bz ' , ' qx ' ] \n " ,
" ser_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## DataFrame \n " ,
" \n " ,
" A DataFrame is a tabular data structure containing an ordered collection of columns. Each column can have a different type. DataFrames have both row and column indices and is analogous to a dict of Series. Row and column operations are treated roughly symmetrically. Columns returned when indexing a DataFrame are views of the underlying data, not a copy. To obtain a copy, use the Series ' copy method. \n " ,
" \n " ,
" Create a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 20 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>pop</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" pop state year \n " ,
" 0 5.0 VA 2012 \n " ,
" 1 5.1 VA 2013 \n " ,
" 2 5.2 VA 2014 \n " ,
" 3 4.0 MD 2014 \n " ,
" 4 4.1 MD 2015 "
]
} ,
" execution_count " : 20 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" data_1 = { ' state ' : [ ' VA ' , ' VA ' , ' VA ' , ' MD ' , ' MD ' ], \n " ,
" ' year ' : [2012, 2013, 2014, 2014, 2015], \n " ,
" ' pop ' : [5.0, 5.1, 5.2, 4.0, 4.1]} \n " ,
" df_1 = DataFrame(data_1) \n " ,
" df_1 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a DataFrame specifying a sequence of columns: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 21 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop \n " ,
" 0 2012 VA 5.0 \n " ,
" 1 2013 VA 5.1 \n " ,
" 2 2014 VA 5.2 \n " ,
" 3 2014 MD 4.0 \n " ,
" 4 2015 MD 4.1 "
]
} ,
" execution_count " : 21 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_2 = DataFrame(data_1, columns=[ ' year ' , ' state ' , ' pop ' ]) \n " ,
" df_2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Like Series, columns that are not present in the data are NaN: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 22 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 0 2012 VA 5.0 NaN \n " ,
" 1 2013 VA 5.1 NaN \n " ,
" 2 2014 VA 5.2 NaN \n " ,
" 3 2014 MD 4.0 NaN \n " ,
" 4 2015 MD 4.1 NaN "
]
} ,
" execution_count " : 22 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3 = DataFrame(data_1, columns=[ ' year ' , ' state ' , ' pop ' , ' unempl ' ]) \n " ,
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Retrieve a column by key, returning a Series: \n "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 23 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 VA \n " ,
" 1 VA \n " ,
" 2 VA \n " ,
" 3 MD \n " ,
" 4 MD \n " ,
" Name: state, dtype: object "
]
} ,
" execution_count " : 23 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3[ ' state ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Retrive a column by attribute, returning a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 24 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 2012 \n " ,
" 1 2013 \n " ,
" 2 2014 \n " ,
" 3 2014 \n " ,
" 4 2015 \n " ,
" Name: year, dtype: int64 "
]
} ,
" execution_count " : 24 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.year "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Retrieve a row by position: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 25 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" year 2012 \n " ,
" state VA \n " ,
" pop 5 \n " ,
" unempl NaN \n " ,
" Name: 0, dtype: object "
]
} ,
" execution_count " : 25 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.ix[0] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Update a column by assignment: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 26 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> 0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 4</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 0 2012 VA 5.0 0 \n " ,
" 1 2013 VA 5.1 1 \n " ,
" 2 2014 VA 5.2 2 \n " ,
" 3 2014 MD 4.0 3 \n " ,
" 4 2015 MD 4.1 4 "
]
} ,
" execution_count " : 26 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3[ ' unempl ' ] = np.arange(5) \n " ,
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Assign a Series to a column (note if assigning a list or array, the length must match the DataFrame, unlike a Series): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 27 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 0 2012 VA 5.0 NaN \n " ,
" 1 2013 VA 5.1 NaN \n " ,
" 2 2014 VA 5.2 6.0 \n " ,
" 3 2014 MD 4.0 6.0 \n " ,
" 4 2015 MD 4.1 6.1 "
]
} ,
" execution_count " : 27 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" unempl = Series([6.0, 6.0, 6.1], index=[2, 3, 4]) \n " ,
" df_3[ ' unempl ' ] = unempl \n " ,
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Assign a new column that doesn ' t exist to create a new column: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 28 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>state_dup</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> VA</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> VA</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> VA</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> MD</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> MD</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl state_dup \n " ,
" 0 2012 VA 5.0 NaN VA \n " ,
" 1 2013 VA 5.1 NaN VA \n " ,
" 2 2014 VA 5.2 6.0 VA \n " ,
" 3 2014 MD 4.0 6.0 MD \n " ,
" 4 2015 MD 4.1 6.1 MD "
]
} ,
" execution_count " : 28 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3[ ' state_dup ' ] = df_3[ ' state ' ] \n " ,
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Delete a column: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 29 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 0 2012 VA 5.0 NaN \n " ,
" 1 2013 VA 5.1 NaN \n " ,
" 2 2014 VA 5.2 6.0 \n " ,
" 3 2014 MD 4.0 6.0 \n " ,
" 4 2015 MD 4.1 6.1 "
]
} ,
" execution_count " : 29 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" del df_3[ ' state_dup ' ] \n " ,
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a DataFrame from a nested dict of dicts (the keys in the inner dicts are unioned and sorted to form the index in the result, unless an explicit index is specified): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 30 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>MD</th> \n " ,
" <th>VA</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2013</th> \n " ,
" <td> NaN</td> \n " ,
" <td> 5.1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2014</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> 5.2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2015</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" MD VA \n " ,
" 2013 NaN 5.1 \n " ,
" 2014 4.0 5.2 \n " ,
" 2015 4.1 NaN "
]
} ,
" execution_count " : 30 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" pop = { ' VA ' : { 2013 : 5.1, 2014 : 5.2}, \n " ,
" ' MD ' : { 2014 : 4.0, 2015 : 4.1}} \n " ,
" df_4 = DataFrame(pop) \n " ,
" df_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Transpose the DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 31 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>2013</th> \n " ,
" <th>2014</th> \n " ,
" <th>2015</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>MD</th> \n " ,
" <td> NaN</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 4.1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>VA</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" 2013 2014 2015 \n " ,
" MD NaN 4.0 4.1 \n " ,
" VA 5.1 5.2 NaN "
]
} ,
" execution_count " : 31 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_4.T "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a DataFrame from a dict of Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 32 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>MD</th> \n " ,
" <th>VA</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2014</th> \n " ,
" <td> NaN</td> \n " ,
" <td> 5.2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2015</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" MD VA \n " ,
" 2014 NaN 5.2 \n " ,
" 2015 4.1 NaN "
]
} ,
" execution_count " : 32 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" data_2 = { ' VA ' : df_4[ ' VA ' ][1:], \n " ,
" ' MD ' : df_4[ ' MD ' ][2:]} \n " ,
" df_5 = DataFrame(data_2) \n " ,
" df_5 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Set the DataFrame index name: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 33 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>MD</th> \n " ,
" <th>VA</th> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>year</th> \n " ,
" <th></th> \n " ,
" <th></th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2014</th> \n " ,
" <td> NaN</td> \n " ,
" <td> 5.2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2015</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" MD VA \n " ,
" year \n " ,
" 2014 NaN 5.2 \n " ,
" 2015 4.1 NaN "
]
} ,
" execution_count " : 33 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_5.index.name = ' year ' \n " ,
" df_5 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Set the DataFrame columns name: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 34 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th>state</th> \n " ,
" <th>MD</th> \n " ,
" <th>VA</th> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>year</th> \n " ,
" <th></th> \n " ,
" <th></th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2014</th> \n " ,
" <td> NaN</td> \n " ,
" <td> 5.2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2015</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state MD VA \n " ,
" year \n " ,
" 2014 NaN 5.2 \n " ,
" 2015 4.1 NaN "
]
} ,
" execution_count " : 34 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_5.columns.name = ' state ' \n " ,
" df_5 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Return the data contained in a DataFrame as a 2D ndarray: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 35 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" array([[ nan, 5.2], \n " ,
" [ 4.1, nan]]) "
]
} ,
" execution_count " : 35 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_5.values "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" If the columns are different dtypes, the 2D ndarray ' s dtype will accomodate all of the columns: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 36 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" array([[2012, ' VA ' , 5.0, nan], \n " ,
" [2013, ' VA ' , 5.1, nan], \n " ,
" [2014, ' VA ' , 5.2, 6.0], \n " ,
" [2014, ' MD ' , 4.0, 6.0], \n " ,
" [2015, ' MD ' , 4.1, 6.1]], dtype=object) "
]
} ,
" execution_count " : 36 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.values "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Reindexing "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a new object with the data conformed to a new index. Any missing values are set to NaN. "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 37 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 0 2012 VA 5.0 NaN \n " ,
" 1 2013 VA 5.1 NaN \n " ,
" 2 2014 VA 5.2 6.0 \n " ,
" 3 2014 MD 4.0 6.0 \n " ,
" 4 2015 MD 4.1 6.1 "
]
} ,
" execution_count " : 37 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Reindexing rows returns a new frame with the specified index: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 38 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2015</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2014</td> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2014</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 2013</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 2012</td> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" year state pop unempl \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 4 2015 MD 4.1 6.1 \n " ,
" 3 2014 MD 4.0 6.0 \n " ,
" 2 2014 VA 5.2 6.0 \n " ,
" 1 2013 VA 5.1 NaN \n " ,
" 0 2012 VA 5.0 NaN "
]
} ,
" execution_count " : 38 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.reindex(list(reversed(range(0, 6)))) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Missing values can be set to something other than NaN: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 39 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>year</th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" Empty DataFrame \n " ,
" Columns: [year, state, pop, unempl] \n " ,
" Index: [] "
]
} ,
" execution_count " : 39 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.reindex(range(6, 0), fill_value=0) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Interpolate ordered data like a time series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 40 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" ser_5 = Series([ ' foo ' , ' bar ' , ' baz ' ], index=[0, 2, 4]) "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 41 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 foo \n " ,
" 1 foo \n " ,
" 2 bar \n " ,
" 3 bar \n " ,
" 4 baz \n " ,
" dtype: object "
]
} ,
" execution_count " : 41 ,
" metadata " : { } ,
" output_type " : " execute_result "
2015-01-27 19:22:37 -05:00
}
2015-01-31 06:30:06 -05:00
] ,
2015-05-17 07:36:41 -04:00
" source " : [
" ser_5.reindex(range(5), method= ' ffill ' ) "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 42 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 foo \n " ,
" 1 bar \n " ,
" 2 bar \n " ,
" 3 baz \n " ,
" 4 baz \n " ,
" dtype: object "
]
} ,
" execution_count " : 42 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_5.reindex(range(5), method= ' bfill ' ) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Reindex columns: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 43 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 "
]
} ,
" execution_count " : 43 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.reindex(columns=[ ' state ' , ' pop ' , ' unempl ' , ' year ' ]) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Reindex rows and columns while filling rows: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 44 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> 0</td> \n " ,
" <td> 0.0</td> \n " ,
" <td> 0.0</td> \n " ,
" <td> 0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 5 0 0.0 0.0 0 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 0 VA 5.0 NaN 2012 "
]
} ,
" execution_count " : 44 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_3.reindex(index=list(reversed(range(0, 6))), \n " ,
" fill_value=0, \n " ,
" columns=[ ' state ' , ' pop ' , ' unempl ' , ' year ' ]) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Reindex using ix: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 45 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 45 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6 = df_3.ix[range(0, 7), [ ' state ' , ' pop ' , ' unempl ' , ' year ' ]] \n " ,
" df_6 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Dropping Entries "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Drop rows from a Series or DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 46 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 46 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_7 = df_6.drop([0, 1]) \n " ,
" df_7 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Drop columns from a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 47 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop year \n " ,
" 2 VA 5.2 2014 \n " ,
" 3 MD 4.0 2014 \n " ,
" 4 MD 4.1 2015 \n " ,
" 5 NaN NaN NaN \n " ,
" 6 NaN NaN NaN "
]
} ,
" execution_count " : 47 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_7 = df_7.drop( ' unempl ' , axis=1) \n " ,
" df_7 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Indexing, Selecting, Filtering "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Series indexing is similar to NumPy array indexing with the added bonus of being able to use the Series ' index values. "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 48 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1 \n " ,
" b 1 \n " ,
" c 2 \n " ,
" d -3 \n " ,
" e -5 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 48 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a value from a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 49 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" True "
]
} ,
" execution_count " : 49 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[0] == ser_2[ ' a ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a slice from a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 50 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" b 1 \n " ,
" c 2 \n " ,
" d -3 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 50 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[1:4] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select specific values from a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 51 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" b 1 \n " ,
" c 2 \n " ,
" d -3 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 51 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[[ ' b ' , ' c ' , ' d ' ]] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select from a Series based on a filter: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 52 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1 \n " ,
" b 1 \n " ,
" c 2 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 52 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[ser_2 > 0] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a slice from a Series with labels (note the end point is inclusive): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 53 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1 \n " ,
" b 1 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 53 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[ ' a ' : ' b ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Assign to a Series slice (note the end point is inclusive): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 54 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 0 \n " ,
" b 0 \n " ,
" c 2 \n " ,
" d -3 \n " ,
" e -5 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 54 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_2[ ' a ' : ' b ' ] = 0 \n " ,
" ser_2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Pandas supports indexing into a DataFrame. "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 55 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 55 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select specified columns from a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 56 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" pop unempl \n " ,
" 0 5.0 NaN \n " ,
" 1 5.1 NaN \n " ,
" 2 5.2 6.0 \n " ,
" 3 4.0 6.0 \n " ,
" 4 4.1 6.1 \n " ,
" 5 NaN NaN \n " ,
" 6 NaN NaN "
]
} ,
" execution_count " : 56 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6[[ ' pop ' , ' unempl ' ]] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a slice from a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 57 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td>NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td>NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 "
]
} ,
" execution_count " : 57 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6[:2] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select from a DataFrame based on a filter: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 58 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td>NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6 2014 "
]
} ,
" execution_count " : 58 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6[df_6[ ' pop ' ] > 5] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Perform a scalar comparison on a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 59 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> True</td> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" <td> True</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" <td> False</td> \n " ,
" <td> True</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> True</td> \n " ,
" <td> False</td> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> True</td> \n " ,
" <td> False</td> \n " ,
" <td> True</td> \n " ,
" <td> True</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" <td> False</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 True False False True \n " ,
" 1 True True False True \n " ,
" 2 True True True True \n " ,
" 3 True False True True \n " ,
" 4 True False True True \n " ,
" 5 False False False False \n " ,
" 6 False False False False "
]
} ,
" execution_count " : 59 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6 > 5 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Perform a scalar comparison on a DataFrame, retain the values that pass the filter: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 60 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA NaN NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD NaN 6.0 2014 \n " ,
" 4 MD NaN 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 60 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6[df_6 > 5] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a slice of rows from a DataFrame (note the end point is inclusive): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 61 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 2 VA 5.2 6 2014 \n " ,
" 3 MD 4.0 6 2014 "
]
} ,
" execution_count " : 61 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.ix[2:3] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select a slice of rows from a specific column of a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 62 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 62 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.ix[0:2, ' pop ' ] \n " ,
" df_6 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select rows based on an arithmetic operation on a specific row: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 63 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 "
]
} ,
" execution_count " : 63 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.ix[df_6.unempl > 5.0] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Arithmetic and Data Alignment "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Adding Series objects results in the union of index pairs if the pairs are not the same, resulting in NaN for indices that do not overlap: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 64 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1.764052 \n " ,
" b 0.400157 \n " ,
" c 0.978738 \n " ,
" d 2.240893 \n " ,
" e 1.867558 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 64 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" np.random.seed(0) \n " ,
" ser_6 = Series(np.random.randn(5), \n " ,
" index=[ ' a ' , ' b ' , ' c ' , ' d ' , ' e ' ]) \n " ,
" ser_6 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 65 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 1.624345 \n " ,
" c -0.611756 \n " ,
" e -0.528172 \n " ,
" f -1.072969 \n " ,
" g 0.865408 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 65 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" np.random.seed(1) \n " ,
" ser_7 = Series(np.random.randn(5), \n " ,
" index=[ ' a ' , ' c ' , ' e ' , ' f ' , ' g ' ]) \n " ,
" ser_7 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 66 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 3.388398 \n " ,
" b NaN \n " ,
" c 0.366982 \n " ,
" d NaN \n " ,
" e 1.339386 \n " ,
" f NaN \n " ,
" g NaN \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 66 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_6 + ser_7 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Set a fill value instead of NaN for indices that do not overlap: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 67 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 3.388398 \n " ,
" b 0.400157 \n " ,
" c 0.366982 \n " ,
" d 2.240893 \n " ,
" e 1.339386 \n " ,
" f -1.072969 \n " ,
" g 0.865408 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 67 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_6.add(ser_7, fill_value=0) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Adding DataFrame objects results in the union of index pairs for rows and columns if the pairs are not the same, resulting in NaN for indices that do not overlap: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 68 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.548814</td> \n " ,
" <td> 0.715189</td> \n " ,
" <td> 0.602763</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.544883</td> \n " ,
" <td> 0.423655</td> \n " ,
" <td> 0.645894</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.437587</td> \n " ,
" <td> 0.891773</td> \n " ,
" <td> 0.963663</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c \n " ,
" 0 0.548814 0.715189 0.602763 \n " ,
" 1 0.544883 0.423655 0.645894 \n " ,
" 2 0.437587 0.891773 0.963663 "
]
} ,
" execution_count " : 68 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" np.random.seed(0) \n " ,
" df_8 = DataFrame(np.random.rand(9).reshape((3, 3)), \n " ,
" columns=[ ' a ' , ' b ' , ' c ' ]) \n " ,
" df_8 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 69 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.417022</td> \n " ,
" <td> 0.720324</td> \n " ,
" <td> 0.000114</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.302333</td> \n " ,
" <td> 0.146756</td> \n " ,
" <td> 0.092339</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.186260</td> \n " ,
" <td> 0.345561</td> \n " ,
" <td> 0.396767</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" b c d \n " ,
" 0 0.417022 0.720324 0.000114 \n " ,
" 1 0.302333 0.146756 0.092339 \n " ,
" 2 0.186260 0.345561 0.396767 "
]
} ,
" execution_count " : 69 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" np.random.seed(1) \n " ,
" df_9 = DataFrame(np.random.rand(9).reshape((3, 3)), \n " ,
" columns=[ ' b ' , ' c ' , ' d ' ]) \n " ,
" df_9 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 70 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td>NaN</td> \n " ,
" <td> 1.132211</td> \n " ,
" <td> 1.323088</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td>NaN</td> \n " ,
" <td> 0.725987</td> \n " ,
" <td> 0.792650</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td>NaN</td> \n " ,
" <td> 1.078033</td> \n " ,
" <td> 1.309223</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 NaN 1.132211 1.323088 NaN \n " ,
" 1 NaN 0.725987 0.792650 NaN \n " ,
" 2 NaN 1.078033 1.309223 NaN "
]
} ,
" execution_count " : 70 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_8 + df_9 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Set a fill value instead of NaN for indices that do not overlap: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 71 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.548814</td> \n " ,
" <td> 1.132211</td> \n " ,
" <td> 1.323088</td> \n " ,
" <td> 0.000114</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.544883</td> \n " ,
" <td> 0.725987</td> \n " ,
" <td> 0.792650</td> \n " ,
" <td> 0.092339</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.437587</td> \n " ,
" <td> 1.078033</td> \n " ,
" <td> 1.309223</td> \n " ,
" <td> 0.396767</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 0.548814 1.132211 1.323088 0.000114 \n " ,
" 1 0.544883 0.725987 0.792650 0.092339 \n " ,
" 2 0.437587 1.078033 1.309223 0.396767 "
]
} ,
" execution_count " : 71 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_10 = df_8.add(df_9, fill_value=0) \n " ,
" df_10 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Like NumPy, pandas supports arithmetic operations between DataFrames and Series. \n " ,
" \n " ,
" Match the index of the Series on the DataFrame ' s columns, broadcasting down the rows: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 72 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td>-0.003930</td> \n " ,
" <td>-0.406224</td> \n " ,
" <td>-0.530438</td> \n " ,
" <td> 0.092224</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td>-0.111226</td> \n " ,
" <td>-0.054178</td> \n " ,
" <td>-0.013864</td> \n " ,
" <td> 0.396653</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 0.000000 0.000000 0.000000 0.000000 \n " ,
" 1 -0.003930 -0.406224 -0.530438 0.092224 \n " ,
" 2 -0.111226 -0.054178 -0.013864 0.396653 "
]
} ,
" execution_count " : 72 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_8 = df_10.ix[0] \n " ,
" df_11 = df_10 - ser_8 \n " ,
" df_11 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Match the index of the Series on the DataFrame ' s columns, broadcasting down the rows and union the indices that do not match: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 73 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 0 \n " ,
" d 1 \n " ,
" e 2 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 73 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_9 = Series(range(3), index=[ ' a ' , ' d ' , ' e ' ]) \n " ,
" ser_9 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 74 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" <th>e</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.000000</td> \n " ,
" <td>NaN</td> \n " ,
" <td>NaN</td> \n " ,
" <td>-1.000000</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td>-0.003930</td> \n " ,
" <td>NaN</td> \n " ,
" <td>NaN</td> \n " ,
" <td>-0.907776</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td>-0.111226</td> \n " ,
" <td>NaN</td> \n " ,
" <td>NaN</td> \n " ,
" <td>-0.603347</td> \n " ,
" <td>NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d e \n " ,
" 0 0.000000 NaN NaN -1.000000 NaN \n " ,
" 1 -0.003930 NaN NaN -0.907776 NaN \n " ,
" 2 -0.111226 NaN NaN -0.603347 NaN "
]
} ,
" execution_count " : 74 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_11 - ser_9 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Broadcast over the columns and match the rows (axis=0) by using an arithmetic method: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 75 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.548814</td> \n " ,
" <td> 1.132211</td> \n " ,
" <td> 1.323088</td> \n " ,
" <td> 0.000114</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.544883</td> \n " ,
" <td> 0.725987</td> \n " ,
" <td> 0.792650</td> \n " ,
" <td> 0.092339</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.437587</td> \n " ,
" <td> 1.078033</td> \n " ,
" <td> 1.309223</td> \n " ,
" <td> 0.396767</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 0.548814 1.132211 1.323088 0.000114 \n " ,
" 1 0.544883 0.725987 0.792650 0.092339 \n " ,
" 2 0.437587 1.078033 1.309223 0.396767 "
]
} ,
" execution_count " : 75 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_10 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 76 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 100 \n " ,
" 1 200 \n " ,
" 2 300 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 76 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_10 = Series([100, 200, 300]) \n " ,
" ser_10 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 77 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> -99.451186</td> \n " ,
" <td> -98.867789</td> \n " ,
" <td> -98.676912</td> \n " ,
" <td> -99.999886</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td>-199.455117</td> \n " ,
" <td>-199.274013</td> \n " ,
" <td>-199.207350</td> \n " ,
" <td>-199.907661</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td>-299.562413</td> \n " ,
" <td>-298.921967</td> \n " ,
" <td>-298.690777</td> \n " ,
" <td>-299.603233</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 -99.451186 -98.867789 -98.676912 -99.999886 \n " ,
" 1 -199.455117 -199.274013 -199.207350 -199.907661 \n " ,
" 2 -299.562413 -298.921967 -298.690777 -299.603233 "
]
} ,
" execution_count " : 77 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_10.sub(ser_10, axis=0) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Function Application and Mapping "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" NumPy ufuncs (element-wise array methods) operate on pandas objects: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 78 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.003930</td> \n " ,
" <td> 0.406224</td> \n " ,
" <td> 0.530438</td> \n " ,
" <td> 0.092224</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.111226</td> \n " ,
" <td> 0.054178</td> \n " ,
" <td> 0.013864</td> \n " ,
" <td> 0.396653</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 0.000000 0.000000 0.000000 0.000000 \n " ,
" 1 0.003930 0.406224 0.530438 0.092224 \n " ,
" 2 0.111226 0.054178 0.013864 0.396653 "
]
} ,
" execution_count " : 78 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_11 = np.abs(df_11) \n " ,
" df_11 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply a function on 1D arrays to each column: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 79 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" a 0.111226 \n " ,
" b 0.406224 \n " ,
" c 0.530438 \n " ,
" d 0.396653 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 79 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" func_1 = lambda x: x.max() - x.min() \n " ,
" df_11.apply(func_1) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply a function on 1D arrays to each row: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 80 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 0.000000 \n " ,
" 1 0.526508 \n " ,
" 2 0.382789 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 80 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_11.apply(func_1, axis=1) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply a function and return a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 81 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>min</th> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" <td> 0.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>max</th> \n " ,
" <td> 0.111226</td> \n " ,
" <td> 0.406224</td> \n " ,
" <td> 0.530438</td> \n " ,
" <td> 0.396653</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" min 0.000000 0.000000 0.000000 0.000000 \n " ,
" max 0.111226 0.406224 0.530438 0.396653 "
]
} ,
" execution_count " : 81 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" func_2 = lambda x: Series([x.min(), x.max()], index=[ ' min ' , ' max ' ]) \n " ,
" df_11.apply(func_2) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply an element-wise Python function to a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 82 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>c</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 0.00</td> \n " ,
" <td> 0.00</td> \n " ,
" <td> 0.00</td> \n " ,
" <td> 0.00</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 0.00</td> \n " ,
" <td> 0.41</td> \n " ,
" <td> 0.53</td> \n " ,
" <td> 0.09</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 0.11</td> \n " ,
" <td> 0.05</td> \n " ,
" <td> 0.01</td> \n " ,
" <td> 0.40</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" a b c d \n " ,
" 0 0.00 0.00 0.00 0.00 \n " ,
" 1 0.00 0.41 0.53 0.09 \n " ,
" 2 0.11 0.05 0.01 0.40 "
]
} ,
" execution_count " : 82 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" func_3 = lambda x: ' %.2f ' %x \n " ,
" df_11.applymap(func_3) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Apply an element-wise Python function to a Series: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 83 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 0.00 \n " ,
" 1 0.00 \n " ,
" 2 0.11 \n " ,
" Name: a, dtype: object "
]
} ,
" execution_count " : 83 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_11[ ' a ' ].map(func_3) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Sorting and Ranking "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 84 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" fo 100 \n " ,
" br 200 \n " ,
" bz 300 \n " ,
" qx NaN \n " ,
" Name: foobarbazqux, dtype: float64 "
]
} ,
" execution_count " : 84 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sort a Series by its index: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 85 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" br 200 \n " ,
" bz 300 \n " ,
" fo 100 \n " ,
" qx NaN \n " ,
" Name: foobarbazqux, dtype: float64 "
]
} ,
" execution_count " : 85 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4.sort_index() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sort a Series by its values: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 86 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" fo 100 \n " ,
" br 200 \n " ,
" bz 300 \n " ,
" qx NaN \n " ,
" Name: foobarbazqux, dtype: float64 "
]
} ,
" execution_count " : 86 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_4.order() "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 87 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>c</th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>three</th> \n " ,
" <td> 0</td> \n " ,
" <td> 1</td> \n " ,
" <td> 2</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>one</th> \n " ,
" <td> 4</td> \n " ,
" <td> 5</td> \n " ,
" <td> 6</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>two</th> \n " ,
" <td> 8</td> \n " ,
" <td> 9</td> \n " ,
" <td> 10</td> \n " ,
" <td> 11</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" c a b d \n " ,
" three 0 1 2 3 \n " ,
" one 4 5 6 7 \n " ,
" two 8 9 10 11 "
]
} ,
" execution_count " : 87 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_12 = DataFrame(np.arange(12).reshape((3, 4)), \n " ,
" index=[ ' three ' , ' one ' , ' two ' ], \n " ,
" columns=[ ' c ' , ' a ' , ' b ' , ' d ' ]) \n " ,
" df_12 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sort a DataFrame by its index: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 88 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>c</th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>one</th> \n " ,
" <td> 4</td> \n " ,
" <td> 5</td> \n " ,
" <td> 6</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>three</th> \n " ,
" <td> 0</td> \n " ,
" <td> 1</td> \n " ,
" <td> 2</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>two</th> \n " ,
" <td> 8</td> \n " ,
" <td> 9</td> \n " ,
" <td> 10</td> \n " ,
" <td> 11</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" c a b d \n " ,
" one 4 5 6 7 \n " ,
" three 0 1 2 3 \n " ,
" two 8 9 10 11 "
]
} ,
" execution_count " : 88 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_12.sort_index() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sort a DataFrame by columns in descending order: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 89 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>d</th> \n " ,
" <th>c</th> \n " ,
" <th>b</th> \n " ,
" <th>a</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>three</th> \n " ,
" <td> 3</td> \n " ,
" <td> 0</td> \n " ,
" <td> 2</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>one</th> \n " ,
" <td> 7</td> \n " ,
" <td> 4</td> \n " ,
" <td> 6</td> \n " ,
" <td> 5</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>two</th> \n " ,
" <td> 11</td> \n " ,
" <td> 8</td> \n " ,
" <td> 10</td> \n " ,
" <td> 9</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" d c b a \n " ,
" three 3 0 2 1 \n " ,
" one 7 4 6 5 \n " ,
" two 11 8 10 9 "
]
} ,
" execution_count " : 89 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_12.sort_index(axis=1, ascending=False) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sort a DataFrame ' s values by column: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 90 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>c</th> \n " ,
" <th>a</th> \n " ,
" <th>b</th> \n " ,
" <th>d</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>three</th> \n " ,
" <td> 0</td> \n " ,
" <td> 1</td> \n " ,
" <td> 2</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>one</th> \n " ,
" <td> 4</td> \n " ,
" <td> 5</td> \n " ,
" <td> 6</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>two</th> \n " ,
" <td> 8</td> \n " ,
" <td> 9</td> \n " ,
" <td> 10</td> \n " ,
" <td> 11</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" c a b d \n " ,
" three 0 1 2 3 \n " ,
" one 4 5 6 7 \n " ,
" two 8 9 10 11 "
]
} ,
" execution_count " : 90 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_12.sort_index(by=[ ' d ' , ' c ' ]) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Ranking is similar to numpy.argsort except that ties are broken by assigning each group the mean rank: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 91 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 1 -5 \n " ,
" 5 0 \n " ,
" 4 2 \n " ,
" 3 4 \n " ,
" 6 4 \n " ,
" 0 7 \n " ,
" 2 7 \n " ,
" 7 7 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 91 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_11 = Series([7, -5, 7, 4, 2, 0, 4, 7]) \n " ,
" ser_11 = ser_11.order() \n " ,
" ser_11 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 92 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 1 1.0 \n " ,
" 5 2.0 \n " ,
" 4 3.0 \n " ,
" 3 4.5 \n " ,
" 6 4.5 \n " ,
" 0 7.0 \n " ,
" 2 7.0 \n " ,
" 7 7.0 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 92 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_11.rank() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Rank a Series according to when they appear in the data: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 93 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 1 1 \n " ,
" 5 2 \n " ,
" 4 3 \n " ,
" 3 4 \n " ,
" 6 5 \n " ,
" 0 6 \n " ,
" 2 7 \n " ,
" 7 8 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 93 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_11.rank(method= ' first ' ) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Rank a Series in descending order, using the maximum rank for the group: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 94 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 1 8 \n " ,
" 5 7 \n " ,
" 4 6 \n " ,
" 3 5 \n " ,
" 6 5 \n " ,
" 0 3 \n " ,
" 2 3 \n " ,
" 7 3 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 94 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_11.rank(ascending=False, method= ' max ' ) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" DataFrames can rank over rows or columns. "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 95 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>bar</th> \n " ,
" <th>baz</th> \n " ,
" <th>foo</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td>-5</td> \n " ,
" <td>-1</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 4</td> \n " ,
" <td> 2</td> \n " ,
" <td>-5</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 2</td> \n " ,
" <td> 3</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 0</td> \n " ,
" <td> 0</td> \n " ,
" <td> 4</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4</td> \n " ,
" <td> 5</td> \n " ,
" <td> 2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> 7</td> \n " ,
" <td> 9</td> \n " ,
" <td> 0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> 7</td> \n " ,
" <td> 9</td> \n " ,
" <td> 4</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>7</th> \n " ,
" <td> 8</td> \n " ,
" <td> 5</td> \n " ,
" <td> 7</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" bar baz foo \n " ,
" 0 -5 -1 7 \n " ,
" 1 4 2 -5 \n " ,
" 2 2 3 7 \n " ,
" 3 0 0 4 \n " ,
" 4 4 5 2 \n " ,
" 5 7 9 0 \n " ,
" 6 7 9 4 \n " ,
" 7 8 5 7 "
]
} ,
" execution_count " : 95 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_13 = DataFrame( { ' foo ' : [7, -5, 7, 4, 2, 0, 4, 7], \n " ,
" ' bar ' : [-5, 4, 2, 0, 4, 7, 7, 8], \n " ,
" ' baz ' : [-1, 2, 3, 0, 5, 9, 9, 5]}) \n " ,
" df_13 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Rank a DataFrame over rows: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 96 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>bar</th> \n " ,
" <th>baz</th> \n " ,
" <th>foo</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 1.0</td> \n " ,
" <td> 1.0</td> \n " ,
" <td> 7.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 4.5</td> \n " ,
" <td> 3.0</td> \n " ,
" <td> 1.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 3.0</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 7.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 2.0</td> \n " ,
" <td> 2.0</td> \n " ,
" <td> 4.5</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.5</td> \n " ,
" <td> 5.5</td> \n " ,
" <td> 3.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> 6.5</td> \n " ,
" <td> 7.5</td> \n " ,
" <td> 2.0</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> 6.5</td> \n " ,
" <td> 7.5</td> \n " ,
" <td> 4.5</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>7</th> \n " ,
" <td> 8.0</td> \n " ,
" <td> 5.5</td> \n " ,
" <td> 7.0</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" bar baz foo \n " ,
" 0 1.0 1.0 7.0 \n " ,
" 1 4.5 3.0 1.0 \n " ,
" 2 3.0 4.0 7.0 \n " ,
" 3 2.0 2.0 4.5 \n " ,
" 4 4.5 5.5 3.0 \n " ,
" 5 6.5 7.5 2.0 \n " ,
" 6 6.5 7.5 4.5 \n " ,
" 7 8.0 5.5 7.0 "
]
} ,
" execution_count " : 96 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_13.rank() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Rank a DataFrame over columns: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 97 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>bar</th> \n " ,
" <th>baz</th> \n " ,
" <th>foo</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 1.0</td> \n " ,
" <td> 2.0</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 3.0</td> \n " ,
" <td> 2.0</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 1.0</td> \n " ,
" <td> 2.0</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 1.5</td> \n " ,
" <td> 1.5</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 2.0</td> \n " ,
" <td> 3.0</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> 2.0</td> \n " ,
" <td> 3.0</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> 2.0</td> \n " ,
" <td> 3.0</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>7</th> \n " ,
" <td> 3.0</td> \n " ,
" <td> 1.0</td> \n " ,
" <td> 2</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" bar baz foo \n " ,
" 0 1.0 2.0 3 \n " ,
" 1 3.0 2.0 1 \n " ,
" 2 1.0 2.0 3 \n " ,
" 3 1.5 1.5 3 \n " ,
" 4 2.0 3.0 1 \n " ,
" 5 2.0 3.0 1 \n " ,
" 6 2.0 3.0 1 \n " ,
" 7 3.0 1.0 2 "
]
} ,
" execution_count " : 97 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_13.rank(axis=1) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Axis Indexes with Duplicate Values "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Labels do not have to be unique in Pandas: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 98 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" foo 0 \n " ,
" foo 1 \n " ,
" bar 2 \n " ,
" bar 3 \n " ,
" baz 4 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 98 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_12 = Series(range(5), index=[ ' foo ' , ' foo ' , ' bar ' , ' bar ' , ' baz ' ]) \n " ,
" ser_12 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 99 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" False "
]
} ,
" execution_count " : 99 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_12.index.is_unique "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select Series elements: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 100 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" foo 0 \n " ,
" foo 1 \n " ,
" dtype: int64 "
]
} ,
" execution_count " : 100 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" ser_12[ ' foo ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Select DataFrame elements: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 101 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>0</th> \n " ,
" <th>1</th> \n " ,
" <th>2</th> \n " ,
" <th>3</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>foo</th> \n " ,
" <td>-2.363469</td> \n " ,
" <td> 1.135345</td> \n " ,
" <td>-1.017014</td> \n " ,
" <td> 0.637362</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>foo</th> \n " ,
" <td>-0.859907</td> \n " ,
" <td> 1.772608</td> \n " ,
" <td>-1.110363</td> \n " ,
" <td> 0.181214</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>bar</th> \n " ,
" <td> 0.564345</td> \n " ,
" <td>-0.566510</td> \n " ,
" <td> 0.729976</td> \n " ,
" <td> 0.372994</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>bar</th> \n " ,
" <td> 0.533811</td> \n " ,
" <td>-0.091973</td> \n " ,
" <td> 1.913820</td> \n " ,
" <td> 0.330797</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>baz</th> \n " ,
" <td> 1.141943</td> \n " ,
" <td>-1.129595</td> \n " ,
" <td>-0.850052</td> \n " ,
" <td> 0.960820</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" 0 1 2 3 \n " ,
" foo -2.363469 1.135345 -1.017014 0.637362 \n " ,
" foo -0.859907 1.772608 -1.110363 0.181214 \n " ,
" bar 0.564345 -0.566510 0.729976 0.372994 \n " ,
" bar 0.533811 -0.091973 1.913820 0.330797 \n " ,
" baz 1.141943 -1.129595 -0.850052 0.960820 "
]
} ,
" execution_count " : 101 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_14 = DataFrame(np.random.randn(5, 4), \n " ,
" index=[ ' foo ' , ' foo ' , ' bar ' , ' bar ' , ' baz ' ]) \n " ,
" df_14 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 102 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>0</th> \n " ,
" <th>1</th> \n " ,
" <th>2</th> \n " ,
" <th>3</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>bar</th> \n " ,
" <td> 0.564345</td> \n " ,
" <td>-0.566510</td> \n " ,
" <td> 0.729976</td> \n " ,
" <td> 0.372994</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>bar</th> \n " ,
" <td> 0.533811</td> \n " ,
" <td>-0.091973</td> \n " ,
" <td> 1.913820</td> \n " ,
" <td> 0.330797</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" 0 1 2 3 \n " ,
" bar 0.564345 -0.566510 0.729976 0.372994 \n " ,
" bar 0.533811 -0.091973 1.913820 0.330797 "
]
} ,
" execution_count " : 102 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_14.ix[ ' bar ' ] "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Summarizing and Computing Descriptive Statistics "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Unlike NumPy arrays, Pandas descriptive statistics automatically exclude missing data. NaN values are excluded unless the entire row or column is NA. "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 103 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>pop</th> \n " ,
" <th>unempl</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.0</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.1</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VA</td> \n " ,
" <td> 5.2</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.0</td> \n " ,
" <td> 6.0</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MD</td> \n " ,
" <td> 4.1</td> \n " ,
" <td> 6.1</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>5</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>6</th> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> NaN</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state pop unempl year \n " ,
" 0 VA 5.0 NaN 2012 \n " ,
" 1 VA 5.1 NaN 2013 \n " ,
" 2 VA 5.2 6.0 2014 \n " ,
" 3 MD 4.0 6.0 2014 \n " ,
" 4 MD 4.1 6.1 2015 \n " ,
" 5 NaN NaN NaN NaN \n " ,
" 6 NaN NaN NaN NaN "
]
} ,
" execution_count " : 103 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 104 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" pop 23.4 \n " ,
" unempl 18.1 \n " ,
" year 10068.0 \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 104 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.sum() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Sum over the rows: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 105 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 2017.0 \n " ,
" 1 2018.1 \n " ,
" 2 2025.2 \n " ,
" 3 2024.0 \n " ,
" 4 2025.2 \n " ,
" 5 NaN \n " ,
" 6 NaN \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 105 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.sum(axis=1) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Account for NaNs: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 106 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/plain " : [
" 0 NaN \n " ,
" 1 NaN \n " ,
" 2 2025.2 \n " ,
" 3 2024.0 \n " ,
" 4 2025.2 \n " ,
" 5 NaN \n " ,
" 6 NaN \n " ,
" dtype: float64 "
]
} ,
" execution_count " : 106 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_6.sum(axis=1, skipna=False) "
]
2015-06-12 20:51:00 -04:00
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Cleaning Data (Under Construction) \n " ,
" * Replace \n " ,
" * Drop \n " ,
" * Concatenate "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 1 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" from pandas import Series, DataFrame \n " ,
" import pandas as pd "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Setup a DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 2 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>population</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> VA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" population state year \n " ,
" 0 5.0 VA 2012 \n " ,
" 1 5.1 VA 2013 \n " ,
" 2 5.2 VA 2014 \n " ,
" 3 4.0 MD 2014 \n " ,
" 4 4.1 MD 2015 "
]
} ,
" execution_count " : 2 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" data_1 = { ' state ' : [ ' VA ' , ' VA ' , ' VA ' , ' MD ' , ' MD ' ], \n " ,
" ' year ' : [2012, 2013, 2014, 2014, 2015], \n " ,
" ' population ' : [5.0, 5.1, 5.2, 4.0, 4.1]} \n " ,
" df_1 = DataFrame(data_1) \n " ,
" df_1 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ### Replace "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Replace all occurrences of a string with another string, in place (no copy): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 4 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>population</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> MD</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" population state year \n " ,
" 0 5.0 VIRGINIA 2012 \n " ,
" 1 5.1 VIRGINIA 2013 \n " ,
" 2 5.2 VIRGINIA 2014 \n " ,
" 3 4.0 MD 2014 \n " ,
" 4 4.1 MD 2015 "
]
} ,
" execution_count " : 4 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_1.replace( ' VA ' , ' VIRGINIA ' , inplace=True) \n " ,
" df_1 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" In a specified column, replace all occurrences of a string with another string, in place (no copy): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 5 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>population</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" population state year \n " ,
" 0 5.0 VIRGINIA 2012 \n " ,
" 1 5.1 VIRGINIA 2013 \n " ,
" 2 5.2 VIRGINIA 2014 \n " ,
" 3 4.0 MARYLAND 2014 \n " ,
" 4 4.1 MARYLAND 2015 "
]
} ,
" execution_count " : 5 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_1.replace( { ' state ' : { ' MD ' : ' MARYLAND ' }}, inplace=True) \n " ,
" df_1 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ### Drop "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Drop the ' population ' column and return a copy of the DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 6 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" state year \n " ,
" 0 VIRGINIA 2012 \n " ,
" 1 VIRGINIA 2013 \n " ,
" 2 VIRGINIA 2014 \n " ,
" 3 MARYLAND 2014 \n " ,
" 4 MARYLAND 2015 "
]
} ,
" execution_count " : 6 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_2 = df_1.drop( ' population ' , axis=1) \n " ,
" df_2 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ### Concatenate "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Concatenate two DataFrames: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 7 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>population</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 6.0</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 6.1</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 6.2</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 3.0</td> \n " ,
" <td> FL</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 3.1</td> \n " ,
" <td> FL</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" population state year \n " ,
" 0 6.0 NY 2012 \n " ,
" 1 6.1 NY 2013 \n " ,
" 2 6.2 NY 2014 \n " ,
" 3 3.0 FL 2014 \n " ,
" 4 3.1 FL 2015 "
]
} ,
" execution_count " : 7 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" data_2 = { ' state ' : [ ' NY ' , ' NY ' , ' NY ' , ' FL ' , ' FL ' ], \n " ,
" ' year ' : [2012, 2013, 2014, 2014, 2015], \n " ,
" ' population ' : [6.0, 6.1, 6.2, 3.0, 3.1]} \n " ,
" df_3 = DataFrame(data_2) \n " ,
" df_3 "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 8 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>population</th> \n " ,
" <th>state</th> \n " ,
" <th>year</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 5.0</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 5.1</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 5.2</td> \n " ,
" <td> VIRGINIA</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 4.0</td> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 4.1</td> \n " ,
" <td> MARYLAND</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 6.0</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2012</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 6.1</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2013</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 6.2</td> \n " ,
" <td> NY</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 3.0</td> \n " ,
" <td> FL</td> \n " ,
" <td> 2014</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td> 3.1</td> \n " ,
" <td> FL</td> \n " ,
" <td> 2015</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" population state year \n " ,
" 0 5.0 VIRGINIA 2012 \n " ,
" 1 5.1 VIRGINIA 2013 \n " ,
" 2 5.2 VIRGINIA 2014 \n " ,
" 3 4.0 MARYLAND 2014 \n " ,
" 4 4.1 MARYLAND 2015 \n " ,
" 0 6.0 NY 2012 \n " ,
" 1 6.1 NY 2013 \n " ,
" 2 6.2 NY 2014 \n " ,
" 3 3.0 FL 2014 \n " ,
" 4 3.1 FL 2015 "
]
} ,
" execution_count " : 8 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_4 = pd.concat([df_1, df_3]) \n " ,
" df_4 "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ## Input and Output (Under Construction) \n " ,
" * Reading \n " ,
" * Writing "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 1 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" from pandas import Series, DataFrame \n " ,
" import pandas as pd "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ### Reading "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Read data from a CSV file into a DataFrame (use sep= ' \\ t ' for TSV): "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 2 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" df_1 = pd.read_csv( \" ../data/ozone.csv \" ) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Get a summary of the DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 3 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>Ozone</th> \n " ,
" <th>Solar.R</th> \n " ,
" <th>Wind</th> \n " ,
" <th>Temp</th> \n " ,
" <th>Month</th> \n " ,
" <th>Day</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>count</th> \n " ,
" <td> 116.000000</td> \n " ,
" <td> 146.000000</td> \n " ,
" <td> 153.000000</td> \n " ,
" <td> 153.000000</td> \n " ,
" <td> 153.000000</td> \n " ,
" <td> 153.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>mean</th> \n " ,
" <td> 42.129310</td> \n " ,
" <td> 185.931507</td> \n " ,
" <td> 9.957516</td> \n " ,
" <td> 77.882353</td> \n " ,
" <td> 6.993464</td> \n " ,
" <td> 15.803922</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>std</th> \n " ,
" <td> 32.987885</td> \n " ,
" <td> 90.058422</td> \n " ,
" <td> 3.523001</td> \n " ,
" <td> 9.465270</td> \n " ,
" <td> 1.416522</td> \n " ,
" <td> 8.864520</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>min</th> \n " ,
" <td> 1.000000</td> \n " ,
" <td> 7.000000</td> \n " ,
" <td> 1.700000</td> \n " ,
" <td> 56.000000</td> \n " ,
" <td> 5.000000</td> \n " ,
" <td> 1.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>25 % </th> \n " ,
" <td> 18.000000</td> \n " ,
" <td> 115.750000</td> \n " ,
" <td> 7.400000</td> \n " ,
" <td> 72.000000</td> \n " ,
" <td> 6.000000</td> \n " ,
" <td> 8.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>50 % </th> \n " ,
" <td> 31.500000</td> \n " ,
" <td> 205.000000</td> \n " ,
" <td> 9.700000</td> \n " ,
" <td> 79.000000</td> \n " ,
" <td> 7.000000</td> \n " ,
" <td> 16.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>75 % </th> \n " ,
" <td> 63.250000</td> \n " ,
" <td> 258.750000</td> \n " ,
" <td> 11.500000</td> \n " ,
" <td> 85.000000</td> \n " ,
" <td> 8.000000</td> \n " ,
" <td> 23.000000</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>max</th> \n " ,
" <td> 168.000000</td> \n " ,
" <td> 334.000000</td> \n " ,
" <td> 20.700000</td> \n " ,
" <td> 97.000000</td> \n " ,
" <td> 9.000000</td> \n " ,
" <td> 31.000000</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" Ozone Solar.R Wind Temp Month Day \n " ,
" count 116.000000 146.000000 153.000000 153.000000 153.000000 153.000000 \n " ,
" mean 42.129310 185.931507 9.957516 77.882353 6.993464 15.803922 \n " ,
" std 32.987885 90.058422 3.523001 9.465270 1.416522 8.864520 \n " ,
" min 1.000000 7.000000 1.700000 56.000000 5.000000 1.000000 \n " ,
" 25 % 18.000000 115.750000 7.400000 72.000000 6.000000 8.000000 \n " ,
" 50 % 31.500000 205.000000 9.700000 79.000000 7.000000 16.000000 \n " ,
" 75 % 63.250000 258.750000 11.500000 85.000000 8.000000 23.000000 \n " ,
" max 168.000000 334.000000 20.700000 97.000000 9.000000 31.000000 "
]
} ,
" execution_count " : 3 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_1.describe() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" List the first five rows of the DataFrame: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 4 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" data " : {
" text/html " : [
" <div style= \" max-height:1000px;max-width:1500px;overflow:auto; \" > \n " ,
" <table border= \" 1 \" class= \" dataframe \" > \n " ,
" <thead> \n " ,
" <tr style= \" text-align: right; \" > \n " ,
" <th></th> \n " ,
" <th>Ozone</th> \n " ,
" <th>Solar.R</th> \n " ,
" <th>Wind</th> \n " ,
" <th>Temp</th> \n " ,
" <th>Month</th> \n " ,
" <th>Day</th> \n " ,
" </tr> \n " ,
" </thead> \n " ,
" <tbody> \n " ,
" <tr> \n " ,
" <th>0</th> \n " ,
" <td> 41</td> \n " ,
" <td> 190</td> \n " ,
" <td> 7.4</td> \n " ,
" <td> 67</td> \n " ,
" <td> 5</td> \n " ,
" <td> 1</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>1</th> \n " ,
" <td> 36</td> \n " ,
" <td> 118</td> \n " ,
" <td> 8.0</td> \n " ,
" <td> 72</td> \n " ,
" <td> 5</td> \n " ,
" <td> 2</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>2</th> \n " ,
" <td> 12</td> \n " ,
" <td> 149</td> \n " ,
" <td> 12.6</td> \n " ,
" <td> 74</td> \n " ,
" <td> 5</td> \n " ,
" <td> 3</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>3</th> \n " ,
" <td> 18</td> \n " ,
" <td> 313</td> \n " ,
" <td> 11.5</td> \n " ,
" <td> 62</td> \n " ,
" <td> 5</td> \n " ,
" <td> 4</td> \n " ,
" </tr> \n " ,
" <tr> \n " ,
" <th>4</th> \n " ,
" <td>NaN</td> \n " ,
" <td> NaN</td> \n " ,
" <td> 14.3</td> \n " ,
" <td> 56</td> \n " ,
" <td> 5</td> \n " ,
" <td> 5</td> \n " ,
" </tr> \n " ,
" </tbody> \n " ,
" </table> \n " ,
" </div> "
] ,
" text/plain " : [
" Ozone Solar.R Wind Temp Month Day \n " ,
" 0 41 190 7.4 67 5 1 \n " ,
" 1 36 118 8.0 72 5 2 \n " ,
" 2 12 149 12.6 74 5 3 \n " ,
" 3 18 313 11.5 62 5 4 \n " ,
" 4 NaN NaN 14.3 56 5 5 "
]
} ,
" execution_count " : 4 ,
" metadata " : { } ,
" output_type " : " execute_result "
}
] ,
" source " : [
" df_1.head() "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" ### Writing "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" Create a copy of the CSV file, encoded in UTF-8 and hiding the index and header labels: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 5 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [ ] ,
" source " : [
" df_1.to_csv( ' ../data/ozone_copy.csv ' , \n " ,
" encoding= ' utf-8 ' , \n " ,
" index=False, \n " ,
" header=False) "
]
} ,
{
" cell_type " : " markdown " ,
" metadata " : { } ,
" source " : [
" View the data directory: "
]
} ,
{
" cell_type " : " code " ,
" execution_count " : 6 ,
" metadata " : {
" collapsed " : false
} ,
" outputs " : [
{
" name " : " stdout " ,
" output_type " : " stream " ,
" text " : [
" total 16 \r \n " ,
" -rw-r--r--@ 1 dmartin 1443163707 2902 Dec 26 2012 ozone.csv \r \n " ,
" -rw-r--r-- 1 dmartin 1443163707 3324 Feb 14 06:40 ozone_copy.csv \r \n "
]
}
] ,
" source " : [
" !ls -l ../data/ "
]
2015-05-17 07:36:41 -04:00
}
] ,
" metadata " : {
" kernelspec " : {
" display_name " : " Python 2 " ,
" language " : " python " ,
" name " : " python2 "
} ,
" language_info " : {
" codemirror_mode " : {
" name " : " ipython " ,
" version " : 2
} ,
" file_extension " : " .py " ,
" mimetype " : " text/x-python " ,
" name " : " python " ,
" nbconvert_exporter " : " python " ,
" pygments_lexer " : " ipython2 " ,
2015-06-12 20:51:00 -04:00
" version " : " 2.7.10 "
2015-01-27 19:22:37 -05:00
}
2015-05-17 07:36:41 -04:00
} ,
" nbformat " : 4 ,
" nbformat_minor " : 0
}