data-science-ipython-notebooks/pandas/03.14 Pandas For Data Analysis.ipynb

6590 lines
168 KiB
Python
Raw Normal View History

2021-01-27 23:44:49 +05:30
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Introduction \n",
"\n",
"Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.\n",
"\n",
"You can say pandas is extremely powerful version of Excel\n",
"\n",
"In this section we are going to talk about \n",
"\n",
"* Introduction To pandas \n",
"* Seies \n",
"* DataFrames \n",
"* Missing Data\n",
"* Merging , Joining , And Concatenating \n",
"* Operations \n",
"* Data Input and Output "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Series "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fristly we are going to talk about Series DataType .\n",
"\n",
"A Series is very similar to numpy array , it is built on top of NumPy Array..\n",
"But Series can have axis labels , meaning it can be indexed by labels instead of just number location \n",
"\n",
"Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.). The axis labels are collectively called index."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Lets import numpy and pandas \n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# We can convert a list , numpy array , or dict to Series\n",
"\n",
"labels = ['Shivendra','Ragavendra','Narendra']\n",
"my_list= [21,25,30]\n",
"arr=np.array([10,20,30])\n",
"d={'Shivendra':21,'Raghavendra':25,'Narendra':30}"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 21\n",
"1 25\n",
"2 30\n",
"dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using List \n",
"pd.Series(data=my_list)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Shivendra 21\n",
"Ragavendra 25\n",
"Narendra 30\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.Series (data=my_list,index=labels )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Shivendra 21\n",
"Ragavendra 25\n",
"Narendra 30\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.Series(my_list,labels)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 10\n",
"1 20\n",
"2 30\n",
"dtype: int32"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# NumPy Array\n",
"pd.Series(arr)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Shivendra 10\n",
"Ragavendra 20\n",
"Narendra 30\n",
"dtype: int32"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.Series (data=arr,index=labels )"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Shivendra 21\n",
"Raghavendra 25\n",
"Narendra 30\n",
"dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Dictonary\n",
"pd.Series (d)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data In A Series \n",
"\n",
"A Pandas Series can hold a variety of Objects "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 Shivendra\n",
"1 Ragavendra\n",
"2 Narendra\n",
"dtype: object"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.Series (data=labels )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using an Index\n",
"\n",
"The key to using a Series is understanding its index. Pandas makes use of these index names or numbers by allowing for fast look ups of information (works like a hash table or dictionary).\n",
"\n",
"Let's see some examples of how to grab information from a Series. Let us create two sereis, ser1 and ser2:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"ser1= pd.Series ([1,2,3,4], index =['Chennai','Bihar','West Bengal','Rajasthan'])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 1\n",
"Bihar 2\n",
"West Bengal 3\n",
"Rajasthan 4\n",
"dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser1"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"ser2=pd.Series ([1,2,5,4],index=['Chennai','Bihar','Assam','Rajasthan'])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 1\n",
"Bihar 2\n",
"Assam 5\n",
"Rajasthan 4\n",
"dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser2"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ser1['Chennai']"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Assam NaN\n",
"Bihar 4.0\n",
"Chennai 2.0\n",
"Rajasthan 8.0\n",
"West Bengal NaN\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Operations are then also done based off of index:\n",
"ser1+ser2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## DataFrames\n",
"Pandas DataFrame is two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components, the data, rows, and columns.\n",
"\n",
"DataFrames are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a bunch of Series objects put together to share the same index. Let's use pandas to explore this topic!"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from numpy.random import randn\n",
"np.random.seed(101)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"df=pd.DataFrame (randn(5,5),index='Chennai Bihar UtterPredesh Delhi Mumbai'.split(),columns ='SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay'.split())"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selection and Indexing\n",
"\n",
"Let's learn the various methods to grab data from a DataFrame"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 2.706850\n",
"Bihar -0.319318\n",
"UtterPredesh 0.528813\n",
"Delhi 0.955057\n",
"Mumbai 0.302665\n",
"Name: SRM, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['SRM']"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>BHU</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.907969</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>0.605965</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>0.188695</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>1.978757</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>-1.706086</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM BHU\n",
"Chennai 2.706850 0.907969\n",
"Bihar -0.319318 0.605965\n",
"UtterPredesh 0.528813 0.188695\n",
"Delhi 0.955057 1.978757\n",
"Mumbai 0.302665 -1.706086"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can pass a list of columns names \n",
"df[['SRM' , 'BHU']]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 2.706850\n",
"Bihar -0.319318\n",
"UtterPredesh 0.528813\n",
"Delhi 0.955057\n",
"Mumbai 0.302665\n",
"Name: SRM, dtype: float64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.SRM # SQL syntax"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# Dataframe Columns are just Series"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df['SRM'])"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"# Creating a new columns \n",
"df['UPES']=df['SRM']"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"df['Harshita']=df['SRM'] + df['BHU']"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" <th>UPES</th>\n",
" <th>Harshita</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" <td>2.706850</td>\n",
" <td>3.614819</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" <td>-0.319318</td>\n",
" <td>0.286647</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" <td>0.528813</td>\n",
" <td>0.717509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" <td>0.955057</td>\n",
" <td>2.933814</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" <td>0.302665</td>\n",
" <td>-1.403420</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay UPES \\\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118 2.706850 \n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122 -0.319318 \n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237 0.528813 \n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509 0.955057 \n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841 0.302665 \n",
"\n",
" Harshita \n",
"Chennai 3.614819 \n",
"Bihar 0.286647 \n",
"UtterPredesh 0.717509 \n",
"Delhi 2.933814 \n",
"Mumbai -1.403420 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop('UPES',axis=1) # Axis = 1 for column"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" <th>UPES</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" <td>2.706850</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" <td>-0.319318</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" <td>0.528813</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" <td>0.955057</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" <td>0.302665</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay UPES\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118 2.706850\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122 -0.319318\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237 0.528813\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509 0.955057\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841 0.302665"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df # But again it will be appeared we need to use inplace to remove it parmanently"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"df.drop('UPES',axis=1,inplace =True )"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"df.drop('Harshita',axis=1,inplace = True)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.drop('Delhi',axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SRM 2.706850\n",
"NIT_PATNA 0.628133\n",
"BHU 0.907969\n",
"IIT_DELHI 0.503826\n",
"IIT_Bombay 0.651118\n",
"Name: Chennai, dtype: float64"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['Chennai']"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SRM -0.319318\n",
"NIT_PATNA -0.848077\n",
"BHU 0.605965\n",
"IIT_DELHI -2.018168\n",
"IIT_Bombay 0.740122\n",
"Name: Bihar, dtype: float64"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can select based on indexing \n",
"df.iloc[1]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.8480769834036315"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['Bihar','NIT_PATNA']"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>NIT_PATNA</th>\n",
" <th>IIT_DELHI</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.848077</td>\n",
" <td>-2.018168</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>1.693723</td>\n",
" <td>-1.159119</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" NIT_PATNA IIT_DELHI\n",
"Bihar -0.848077 -2.018168\n",
"Mumbai 1.693723 -1.159119"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[['Bihar','Mumbai'],['NIT_PATNA','IIT_DELHI']]"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai True True True True True\n",
"Bihar False False True False True\n",
"UtterPredesh True False True False False\n",
"Delhi True True True True True\n",
"Mumbai True True False False False"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df>0"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>0.605965</td>\n",
" <td>NaN</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>NaN</td>\n",
" <td>0.188695</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar NaN NaN 0.605965 NaN 0.740122\n",
"UtterPredesh 0.528813 NaN 0.188695 NaN NaN\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 NaN NaN NaN"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df>0]"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df [df['SRM']>0] # It will not print Bihar Cz Bihar is having negetive number"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 0.907969\n",
"UtterPredesh 0.188695\n",
"Delhi 1.978757\n",
"Mumbai -1.706086\n",
"Name: BHU, dtype: float64"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['SRM']>0]['BHU'] # It will not print Bihar data since it is having negetive number "
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Chennai 2.706850\n",
"Bihar -0.319318\n",
"UtterPredesh 0.528813\n",
"Delhi 0.955057\n",
"Name: SRM, dtype: float64"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['BHU']>0]['SRM'] # It will not print mumbai's data"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" BHU IIT_DELHI\n",
"Chennai 0.907969 0.503826\n",
"UtterPredesh 0.188695 -0.758872\n",
"Delhi 1.978757 2.605967\n",
"Mumbai -1.706086 -1.159119"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['SRM']>0][['BHU','IIT_DELHI']]"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['SRM']>0.955)& df['BHU']>0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## More Index Details\n",
"\n",
"Let's discuss some more features of indexing, including resetting the index or setting it something else. We'll also talk about index hierarchy!"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>index</th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Chennai</td>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Bihar</td>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>UtterPredesh</td>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Delhi</td>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Mumbai</td>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" index SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"0 Chennai 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"1 Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"2 UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"3 Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"4 Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.reset_index()"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"newind ='Tamil_Nadu BIHAR UP Delhi Maharastra'.split()"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"df['States']=newind"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" <th>States</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" <td>Tamil_Nadu</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" <td>BIHAR</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" <td>UP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" <td>Delhi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" <td>Maharastra</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay States\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118 Tamil_Nadu\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122 BIHAR\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237 UP\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509 Delhi\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841 Maharastra"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" <tr>\n",
" <th>States</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tamil_Nadu</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>BIHAR</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UP</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Maharastra</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"States \n",
"Tamil_Nadu 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"BIHAR -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UP 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Maharastra 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.set_index('States')"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" <th>States</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chennai</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" <td>Tamil_Nadu</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Bihar</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" <td>BIHAR</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UtterPredesh</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" <td>UP</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" <td>Delhi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mumbai</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" <td>Maharastra</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay States\n",
"Chennai 2.706850 0.628133 0.907969 0.503826 0.651118 Tamil_Nadu\n",
"Bihar -0.319318 -0.848077 0.605965 -2.018168 0.740122 BIHAR\n",
"UtterPredesh 0.528813 -0.589001 0.188695 -0.758872 -0.933237 UP\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509 Delhi\n",
"Mumbai 0.302665 1.693723 -1.706086 -1.159119 -0.134841 Maharastra"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"df.set_index('States',inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SRM</th>\n",
" <th>NIT_PATNA</th>\n",
" <th>BHU</th>\n",
" <th>IIT_DELHI</th>\n",
" <th>IIT_Bombay</th>\n",
" </tr>\n",
" <tr>\n",
" <th>States</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Tamil_Nadu</th>\n",
" <td>2.706850</td>\n",
" <td>0.628133</td>\n",
" <td>0.907969</td>\n",
" <td>0.503826</td>\n",
" <td>0.651118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>BIHAR</th>\n",
" <td>-0.319318</td>\n",
" <td>-0.848077</td>\n",
" <td>0.605965</td>\n",
" <td>-2.018168</td>\n",
" <td>0.740122</td>\n",
" </tr>\n",
" <tr>\n",
" <th>UP</th>\n",
" <td>0.528813</td>\n",
" <td>-0.589001</td>\n",
" <td>0.188695</td>\n",
" <td>-0.758872</td>\n",
" <td>-0.933237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Delhi</th>\n",
" <td>0.955057</td>\n",
" <td>0.190794</td>\n",
" <td>1.978757</td>\n",
" <td>2.605967</td>\n",
" <td>0.683509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Maharastra</th>\n",
" <td>0.302665</td>\n",
" <td>1.693723</td>\n",
" <td>-1.706086</td>\n",
" <td>-1.159119</td>\n",
" <td>-0.134841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SRM NIT_PATNA BHU IIT_DELHI IIT_Bombay\n",
"States \n",
"Tamil_Nadu 2.706850 0.628133 0.907969 0.503826 0.651118\n",
"BIHAR -0.319318 -0.848077 0.605965 -2.018168 0.740122\n",
"UP 0.528813 -0.589001 0.188695 -0.758872 -0.933237\n",
"Delhi 0.955057 0.190794 1.978757 2.605967 0.683509\n",
"Maharastra 0.302665 1.693723 -1.706086 -1.159119 -0.134841"
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Index and Index Hierarchy\n",
"\n",
"Let us go over how to work with Multi-Index, first we'll create a quick example of what a Multi-Indexed DataFrame would look like:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"# index Levels \n",
"outside =['Big_data','Big_data','Big_data','AI','AI','AI']\n",
"inside =[1,2,3,1,2,3]\n",
"hier_index=list(zip(outside,inside))\n",
"hier_index=pd.MultiIndex.from_tuples(hier_index)"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MultiIndex([('Big_data', 1),\n",
" ('Big_data', 2),\n",
" ('Big_data', 3),\n",
" ( 'AI', 1),\n",
" ( 'AI', 2),\n",
" ( 'AI', 3)],\n",
" )"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hier_index"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"df=pd.DataFrame(np.random.rand (6,2),index=hier_index,columns=['Core','volunteers'])"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Core</th>\n",
" <th>volunteers</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">Big_data</th>\n",
" <th>1</th>\n",
" <td>0.701371</td>\n",
" <td>0.487635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.680678</td>\n",
" <td>0.521548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.043397</td>\n",
" <td>0.223937</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">AI</th>\n",
" <th>1</th>\n",
" <td>0.575205</td>\n",
" <td>0.120434</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.500117</td>\n",
" <td>0.138010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.052808</td>\n",
" <td>0.178277</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Core volunteers\n",
"Big_data 1 0.701371 0.487635\n",
" 2 0.680678 0.521548\n",
" 3 0.043397 0.223937\n",
"AI 1 0.575205 0.120434\n",
" 2 0.500117 0.138010\n",
" 3 0.052808 0.178277"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's show how to index this! For index hierarchy we use df.loc[], if this was on the columns axis, you would just use normal bracket notation df[]. Calling one level of the index returns the sub-dataframe:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Core</th>\n",
" <th>volunteers</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.701371</td>\n",
" <td>0.487635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.680678</td>\n",
" <td>0.521548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.043397</td>\n",
" <td>0.223937</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Core volunteers\n",
"1 0.701371 0.487635\n",
"2 0.680678 0.521548\n",
"3 0.043397 0.223937"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['Big_data']"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Core 0.701371\n",
"volunteers 0.487635\n",
"Name: 1, dtype: float64"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['Big_data'].loc[1]"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"FrozenList([None, None])"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index.names"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [],
"source": [
"df.index.names=['Domain','S.NO']"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>Core</th>\n",
" <th>volunteers</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Domain</th>\n",
" <th>S.NO</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">Big_data</th>\n",
" <th>1</th>\n",
" <td>0.701371</td>\n",
" <td>0.487635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.680678</td>\n",
" <td>0.521548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.043397</td>\n",
" <td>0.223937</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"3\" valign=\"top\">AI</th>\n",
" <th>1</th>\n",
" <td>0.575205</td>\n",
" <td>0.120434</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.500117</td>\n",
" <td>0.138010</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.052808</td>\n",
" <td>0.178277</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Core volunteers\n",
"Domain S.NO \n",
"Big_data 1 0.701371 0.487635\n",
" 2 0.680678 0.521548\n",
" 3 0.043397 0.223937\n",
"AI 1 0.575205 0.120434\n",
" 2 0.500117 0.138010\n",
" 3 0.052808 0.178277"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Core</th>\n",
" <th>volunteers</th>\n",
" </tr>\n",
" <tr>\n",
" <th>S.NO</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.701371</td>\n",
" <td>0.487635</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.680678</td>\n",
" <td>0.521548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.043397</td>\n",
" <td>0.223937</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Core volunteers\n",
"S.NO \n",
"1 0.701371 0.487635\n",
"2 0.680678 0.521548\n",
"3 0.043397 0.223937"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.xs('Big_data')"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"weather_data = {\n",
" 'day': ['1/1/2017','1/2/2017','1/3/2017','1/4/2017','1/5/2017','1/6/2017'],\n",
" 'temperature': [32,35,28,24,32,31],\n",
" 'windspeed': [6,7,2,7,4,2],\n",
" 'event': ['Rain', 'Sunny', 'Snow','Snow','Rain', 'Sunny']\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"df=pd.DataFrame(weather_data)"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" <th>event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/1/2017</td>\n",
" <td>32</td>\n",
" <td>6</td>\n",
" <td>Rain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" <td>7</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/3/2017</td>\n",
" <td>28</td>\n",
" <td>2</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/4/2017</td>\n",
" <td>24</td>\n",
" <td>7</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/5/2017</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>Rain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1/6/2017</td>\n",
" <td>31</td>\n",
" <td>2</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature windspeed event\n",
"0 1/1/2017 32 6 Rain\n",
"1 1/2/2017 35 7 Sunny\n",
"2 1/3/2017 28 2 Snow\n",
"3 1/4/2017 24 7 Snow\n",
"4 1/5/2017 32 4 Rain\n",
"5 1/6/2017 31 2 Sunny"
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6, 4)"
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape # rows, columns = df.shape"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" <th>event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/1/2017</td>\n",
" <td>32</td>\n",
" <td>6</td>\n",
" <td>Rain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" <td>7</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/3/2017</td>\n",
" <td>28</td>\n",
" <td>2</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/4/2017</td>\n",
" <td>24</td>\n",
" <td>7</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/5/2017</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>Rain</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature windspeed event\n",
"0 1/1/2017 32 6 Rain\n",
"1 1/2/2017 35 7 Sunny\n",
"2 1/3/2017 28 2 Snow\n",
"3 1/4/2017 24 7 Snow\n",
"4 1/5/2017 32 4 Rain"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head() # df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" <th>event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" <td>7</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/3/2017</td>\n",
" <td>28</td>\n",
" <td>2</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/4/2017</td>\n",
" <td>24</td>\n",
" <td>7</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/5/2017</td>\n",
" <td>32</td>\n",
" <td>4</td>\n",
" <td>Rain</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1/6/2017</td>\n",
" <td>31</td>\n",
" <td>2</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature windspeed event\n",
"1 1/2/2017 35 7 Sunny\n",
"2 1/3/2017 28 2 Snow\n",
"3 1/4/2017 24 7 Snow\n",
"4 1/5/2017 32 4 Rain\n",
"5 1/6/2017 31 2 Sunny"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail() # df.tail(2)"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" <th>event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" <td>7</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/3/2017</td>\n",
" <td>28</td>\n",
" <td>2</td>\n",
" <td>Snow</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature windspeed event\n",
"1 1/2/2017 35 7 Sunny\n",
"2 1/3/2017 28 2 Snow"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='blue'>Columns</font>"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['day', 'temperature', 'windspeed', 'event'], dtype='object')"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1/1/2017\n",
"1 1/2/2017\n",
"2 1/3/2017\n",
"3 1/4/2017\n",
"4 1/5/2017\n",
"5 1/6/2017\n",
"Name: day, dtype: object"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['day']"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(df['day'])"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1/1/2017</td>\n",
" <td>32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1/3/2017</td>\n",
" <td>28</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1/4/2017</td>\n",
" <td>24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1/5/2017</td>\n",
" <td>32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1/6/2017</td>\n",
" <td>31</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature\n",
"0 1/1/2017 32\n",
"1 1/2/2017 35\n",
"2 1/3/2017 28\n",
"3 1/4/2017 24\n",
"4 1/5/2017 32\n",
"5 1/6/2017 31"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['day','temperature']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## <font color='blue'>Operations On DataFrame</font>"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"35"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['temperature'].max()"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>day</th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" <th>event</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1/2/2017</td>\n",
" <td>35</td>\n",
" <td>7</td>\n",
" <td>Sunny</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" day temperature windspeed event\n",
"1 1/2/2017 35 7 Sunny"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['temperature']>32]"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 1/2/2017\n",
"Name: day, dtype: object"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['day'][df['temperature'] == df['temperature'].max()] # Kinda doing SQL in pandas"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3.8297084310253524"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['temperature'].std()"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Sunny'"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['event'].max() # But mean() won't work since data type is string"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>temperature</th>\n",
" <th>windspeed</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>6.000000</td>\n",
" <td>6.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>30.333333</td>\n",
" <td>4.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>3.829708</td>\n",
" <td>2.338090</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>24.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>28.750000</td>\n",
" <td>2.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>31.500000</td>\n",
" <td>5.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>32.000000</td>\n",
" <td>6.750000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>35.000000</td>\n",
" <td>7.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" temperature windspeed\n",
"count 6.000000 6.000000\n",
"mean 30.333333 4.666667\n",
"std 3.829708 2.338090\n",
"min 24.000000 2.000000\n",
"25% 28.750000 2.500000\n",
"50% 31.500000 5.000000\n",
"75% 32.000000 6.750000\n",
"max 35.000000 7.000000"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Missing Data\n",
"\n",
"Let's show a few convenient methods to deal with Missing Data in pandas:"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame({'A':[1,2,np.nan],\n",
" 'B':[5,np.nan,np.nan],\n",
" 'C':[1,2,3]})"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 1.0 5.0 1\n",
"1 2.0 NaN 2\n",
"2 NaN NaN 3"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 1.0 5.0 1"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" C\n",
"0 1\n",
"1 2\n",
"2 3"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dropna(axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>5.0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 1.0 5.0 1\n",
"1 2.0 NaN 2"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dropna(thresh=2)"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>shivendra</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>shivendra</td>\n",
" <td>shivendra</td>\n",
" <td>3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C\n",
"0 1 5 1\n",
"1 2 shivendra 2\n",
"2 shivendra shivendra 3"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.fillna(value='shivendra')"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 2.0\n",
"2 1.5\n",
"Name: A, dtype: float64"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['A'].fillna(value=df['A'].mean())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Groupby\n",
"\n",
"The groupby method allows you to group rows of data together and call aggregate functions"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"# Create dataframe\n",
"data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],\n",
" 'Person':['Shivendra','Abhishek','Sowjanya','Manish','Mini','Satya'],\n",
" 'Sales':[200,120,340,124,243,350]}"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"df = pd.DataFrame(data)"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Company</th>\n",
" <th>Person</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>GOOG</td>\n",
" <td>Shivendra</td>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>GOOG</td>\n",
" <td>Abhishek</td>\n",
" <td>120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>MSFT</td>\n",
" <td>Sowjanya</td>\n",
" <td>340</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>MSFT</td>\n",
" <td>Manish</td>\n",
" <td>124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>FB</td>\n",
" <td>Mini</td>\n",
" <td>243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>FB</td>\n",
" <td>Satya</td>\n",
" <td>350</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Company Person Sales\n",
"0 GOOG Shivendra 200\n",
"1 GOOG Abhishek 120\n",
"2 MSFT Sowjanya 340\n",
"3 MSFT Manish 124\n",
"4 FB Mini 243\n",
"5 FB Satya 350"
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"** Now you can use the .groupby() method to group rows together based off of a column name. For instance let's group based off of Company. This will create a DataFrameGroupBy object:**"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001F12523CB08>"
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Company')"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>296.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>160.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>232.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sales\n",
"Company \n",
"FB 296.5\n",
"GOOG 160.0\n",
"MSFT 232.0"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#You can save this object as a new variable:\n",
"by_comp = df.groupby(\"Company\")\n",
"#And then call aggregate methods off the object:\n",
"by_comp.mean()\n"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>296.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>160.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>232.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sales\n",
"Company \n",
"FB 296.5\n",
"GOOG 160.0\n",
"MSFT 232.0"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('Company').mean()"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>75.660426</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>56.568542</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>152.735065</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sales\n",
"Company \n",
"FB 75.660426\n",
"GOOG 56.568542\n",
"MSFT 152.735065"
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#More examples of aggregate methods:\n",
"by_comp.std()"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Person</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>Satya</td>\n",
" <td>350</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>Shivendra</td>\n",
" <td>200</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>Sowjanya</td>\n",
" <td>340</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Person Sales\n",
"Company \n",
"FB Satya 350\n",
"GOOG Shivendra 200\n",
"MSFT Sowjanya 340"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"by_comp.max()"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Person</th>\n",
" <th>Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>Mini</td>\n",
" <td>243</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>Abhishek</td>\n",
" <td>120</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>Manish</td>\n",
" <td>124</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Person Sales\n",
"Company \n",
"FB Mini 243\n",
"GOOG Abhishek 120\n",
"MSFT Manish 124"
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"by_comp.min()"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead tr th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe thead tr:last-of-type th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr>\n",
" <th></th>\n",
" <th colspan=\"8\" halign=\"left\">Sales</th>\n",
" </tr>\n",
" <tr>\n",
" <th></th>\n",
" <th>count</th>\n",
" <th>mean</th>\n",
" <th>std</th>\n",
" <th>min</th>\n",
" <th>25%</th>\n",
" <th>50%</th>\n",
" <th>75%</th>\n",
" <th>max</th>\n",
" </tr>\n",
" <tr>\n",
" <th>Company</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>FB</th>\n",
" <td>2.0</td>\n",
" <td>296.5</td>\n",
" <td>75.660426</td>\n",
" <td>243.0</td>\n",
" <td>269.75</td>\n",
" <td>296.5</td>\n",
" <td>323.25</td>\n",
" <td>350.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>GOOG</th>\n",
" <td>2.0</td>\n",
" <td>160.0</td>\n",
" <td>56.568542</td>\n",
" <td>120.0</td>\n",
" <td>140.00</td>\n",
" <td>160.0</td>\n",
" <td>180.00</td>\n",
" <td>200.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>MSFT</th>\n",
" <td>2.0</td>\n",
" <td>232.0</td>\n",
" <td>152.735065</td>\n",
" <td>124.0</td>\n",
" <td>178.00</td>\n",
" <td>232.0</td>\n",
" <td>286.00</td>\n",
" <td>340.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Sales \n",
" count mean std min 25% 50% 75% max\n",
"Company \n",
"FB 2.0 296.5 75.660426 243.0 269.75 296.5 323.25 350.0\n",
"GOOG 2.0 160.0 56.568542 120.0 140.00 160.0 180.00 200.0\n",
"MSFT 2.0 232.0 152.735065 124.0 178.00 232.0 286.00 340.0"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"by_comp.describe()"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Company</th>\n",
" <th>FB</th>\n",
" <th>GOOG</th>\n",
" <th>MSFT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th rowspan=\"8\" valign=\"top\">Sales</th>\n",
" <th>count</th>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" <td>2.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>296.500000</td>\n",
" <td>160.000000</td>\n",
" <td>232.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>75.660426</td>\n",
" <td>56.568542</td>\n",
" <td>152.735065</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>243.000000</td>\n",
" <td>120.000000</td>\n",
" <td>124.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>269.750000</td>\n",
" <td>140.000000</td>\n",
" <td>178.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>296.500000</td>\n",
" <td>160.000000</td>\n",
" <td>232.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>323.250000</td>\n",
" <td>180.000000</td>\n",
" <td>286.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>350.000000</td>\n",
" <td>200.000000</td>\n",
" <td>340.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Company FB GOOG MSFT\n",
"Sales count 2.000000 2.000000 2.000000\n",
" mean 296.500000 160.000000 232.000000\n",
" std 75.660426 56.568542 152.735065\n",
" min 243.000000 120.000000 124.000000\n",
" 25% 269.750000 140.000000 178.000000\n",
" 50% 296.500000 160.000000 232.000000\n",
" 75% 323.250000 180.000000 286.000000\n",
" max 350.000000 200.000000 340.000000"
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"by_comp.describe().transpose()"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sales count 2.000000\n",
" mean 160.000000\n",
" std 56.568542\n",
" min 120.000000\n",
" 25% 140.000000\n",
" 50% 160.000000\n",
" 75% 180.000000\n",
" max 200.000000\n",
"Name: GOOG, dtype: float64"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"by_comp.describe().transpose()['GOOG']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Merging, Joining, and Concatenating\n",
"\n",
"There are 3 main ways of combining DataFrames together: Merging, Joining and Concatenating. In this we will discuss these 3 methods with examples.\n",
"\n",
"____"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n",
" 'B': ['B0', 'B1', 'B2', 'B3'],\n",
" 'C': ['C0', 'C1', 'C2', 'C3'],\n",
" 'D': ['D0', 'D1', 'D2', 'D3']},\n",
" index=[0, 1, 2, 3])"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n",
" 'B': ['B4', 'B5', 'B6', 'B7'],\n",
" 'C': ['C4', 'C5', 'C6', 'C7'],\n",
" 'D': ['D4', 'D5', 'D6', 'D7']},\n",
" index=[4, 5, 6, 7]) "
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n",
" 'B': ['B8', 'B9', 'B10', 'B11'],\n",
" 'C': ['C8', 'C9', 'C10', 'C11'],\n",
" 'D': ['D8', 'D9', 'D10', 'D11']},\n",
" index=[8, 9, 10, 11])"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>A0</td>\n",
" <td>B0</td>\n",
" <td>C0</td>\n",
" <td>D0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>A1</td>\n",
" <td>B1</td>\n",
" <td>C1</td>\n",
" <td>D1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>A2</td>\n",
" <td>B2</td>\n",
" <td>C2</td>\n",
" <td>D2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>A3</td>\n",
" <td>B3</td>\n",
" <td>C3</td>\n",
" <td>D3</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"0 A0 B0 C0 D0\n",
"1 A1 B1 C1 D1\n",
"2 A2 B2 C2 D2\n",
"3 A3 B3 C3 D3"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>A4</td>\n",
" <td>B4</td>\n",
" <td>C4</td>\n",
" <td>D4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>A5</td>\n",
" <td>B5</td>\n",
" <td>C5</td>\n",
" <td>D5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>A6</td>\n",
" <td>B6</td>\n",
" <td>C6</td>\n",
" <td>D6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>A7</td>\n",
" <td>B7</td>\n",
" <td>C7</td>\n",
" <td>D7</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"4 A4 B4 C4 D4\n",
"5 A5 B5 C5 D5\n",
"6 A6 B6 C6 D6\n",
"7 A7 B7 C7 D7"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>A8</td>\n",
" <td>B8</td>\n",
" <td>C8</td>\n",
" <td>D8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>A9</td>\n",
" <td>B9</td>\n",
" <td>C9</td>\n",
" <td>D9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>A10</td>\n",
" <td>B10</td>\n",
" <td>C10</td>\n",
" <td>D10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>A11</td>\n",
" <td>B11</td>\n",
" <td>C11</td>\n",
" <td>D11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"8 A8 B8 C8 D8\n",
"9 A9 B9 C9 D9\n",
"10 A10 B10 C10 D10\n",
"11 A11 B11 C11 D11"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concatenation\n",
"\n",
"Concatenation basically glues together DataFrames. Keep in mind that dimensions should match along the axis you are concatenating on. You can use **pd.concat** and pass in a list of DataFrames to concatenate together:"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>A0</td>\n",
" <td>B0</td>\n",
" <td>C0</td>\n",
" <td>D0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>A1</td>\n",
" <td>B1</td>\n",
" <td>C1</td>\n",
" <td>D1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>A2</td>\n",
" <td>B2</td>\n",
" <td>C2</td>\n",
" <td>D2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>A3</td>\n",
" <td>B3</td>\n",
" <td>C3</td>\n",
" <td>D3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>A4</td>\n",
" <td>B4</td>\n",
" <td>C4</td>\n",
" <td>D4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>A5</td>\n",
" <td>B5</td>\n",
" <td>C5</td>\n",
" <td>D5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>A6</td>\n",
" <td>B6</td>\n",
" <td>C6</td>\n",
" <td>D6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>A7</td>\n",
" <td>B7</td>\n",
" <td>C7</td>\n",
" <td>D7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>A8</td>\n",
" <td>B8</td>\n",
" <td>C8</td>\n",
" <td>D8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>A9</td>\n",
" <td>B9</td>\n",
" <td>C9</td>\n",
" <td>D9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>A10</td>\n",
" <td>B10</td>\n",
" <td>C10</td>\n",
" <td>D10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>A11</td>\n",
" <td>B11</td>\n",
" <td>C11</td>\n",
" <td>D11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"0 A0 B0 C0 D0\n",
"1 A1 B1 C1 D1\n",
"2 A2 B2 C2 D2\n",
"3 A3 B3 C3 D3\n",
"4 A4 B4 C4 D4\n",
"5 A5 B5 C5 D5\n",
"6 A6 B6 C6 D6\n",
"7 A7 B7 C7 D7\n",
"8 A8 B8 C8 D8\n",
"9 A9 B9 C9 D9\n",
"10 A10 B10 C10 D10\n",
"11 A11 B11 C11 D11"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([df1,df2,df3])"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>A0</td>\n",
" <td>B0</td>\n",
" <td>C0</td>\n",
" <td>D0</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>A1</td>\n",
" <td>B1</td>\n",
" <td>C1</td>\n",
" <td>D1</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>A2</td>\n",
" <td>B2</td>\n",
" <td>C2</td>\n",
" <td>D2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>A3</td>\n",
" <td>B3</td>\n",
" <td>C3</td>\n",
" <td>D3</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A4</td>\n",
" <td>B4</td>\n",
" <td>C4</td>\n",
" <td>D4</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A5</td>\n",
" <td>B5</td>\n",
" <td>C5</td>\n",
" <td>D5</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A6</td>\n",
" <td>B6</td>\n",
" <td>C6</td>\n",
" <td>D6</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A7</td>\n",
" <td>B7</td>\n",
" <td>C7</td>\n",
" <td>D7</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A8</td>\n",
" <td>B8</td>\n",
" <td>C8</td>\n",
" <td>D8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A9</td>\n",
" <td>B9</td>\n",
" <td>C9</td>\n",
" <td>D9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A10</td>\n",
" <td>B10</td>\n",
" <td>C10</td>\n",
" <td>D10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>A11</td>\n",
" <td>B11</td>\n",
" <td>C11</td>\n",
" <td>D11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D A B C D A B C D\n",
"0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN\n",
"4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN\n",
"5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN\n",
"6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN\n",
"7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN\n",
"8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8\n",
"9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9\n",
"10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10\n",
"11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.concat([df1,df2,df3],axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Operations\n",
"\n",
"There are lots of operations with pandas that will be really useful to you, but don't fall into any distinct category. Let's show them here in this lecture:"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>444</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1 col2 col3\n",
"0 1 444 abc\n",
"1 2 555 def\n",
"2 3 666 ghi\n",
"3 4 444 xyz"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'col1':[1,2,3,4],'col2':[444,555,666,444],'col3':['abc','def','ghi','xyz']})\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([444, 555, 666], dtype=int64)"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col2'].unique()"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col2'].nunique()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"444 2\n",
"555 1\n",
"666 1\n",
"Name: col2, dtype: int64"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col2'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"#Select from DataFrame using criteria from multiple columns\n",
"newdf = df[(df['col1']>2) & (df['col2']==444)]"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1 col2 col3\n",
"3 4 444 xyz"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"newdf"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [],
"source": [
"# Applying Functions\n",
"def times2(x):\n",
" return x*2"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2\n",
"1 4\n",
"2 6\n",
"3 8\n",
"Name: col1, dtype: int64"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col1'].apply(times2)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 3\n",
"1 3\n",
"2 3\n",
"3 3\n",
"Name: col3, dtype: int64"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col3'].apply(len)"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['col1'].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"** Permanently Removing a Column**"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"del df['col1']"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>444</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col2 col3\n",
"0 444 abc\n",
"1 555 def\n",
"2 666 ghi\n",
"3 444 xyz"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['col2', 'col3'], dtype='object')"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# get columns and index names \n",
"df.columns "
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=4, step=1)"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.index"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>444</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col2 col3\n",
"0 444 abc\n",
"1 555 def\n",
"2 666 ghi\n",
"3 444 xyz"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>444</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col2 col3\n",
"0 444 abc\n",
"3 444 xyz\n",
"1 555 def\n",
"2 666 ghi"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(by='col2') #inplace=False by default"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col2 col3\n",
"0 False False\n",
"1 False False\n",
"2 False False\n",
"3 False False"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check is there any null value or not \n",
"df.isnull()"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>444</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col2 col3\n",
"0 444 abc\n",
"1 555 def\n",
"2 666 ghi\n",
"3 444 xyz"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Drop rows with NaN Values\n",
"df.dropna()"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>NaN</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2.0</td>\n",
" <td>555.0</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3.0</td>\n",
" <td>666.0</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>444.0</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1 col2 col3\n",
"0 1.0 NaN abc\n",
"1 2.0 555.0 def\n",
"2 3.0 666.0 ghi\n",
"3 NaN 444.0 xyz"
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.DataFrame({'col1':[1,2,3,np.nan],\n",
" 'col2':[np.nan,555,666,444],\n",
" 'col3':['abc','def','ghi','xyz']})\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>col1</th>\n",
" <th>col2</th>\n",
" <th>col3</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>FILL</td>\n",
" <td>abc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>555</td>\n",
" <td>def</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>666</td>\n",
" <td>ghi</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>FILL</td>\n",
" <td>444</td>\n",
" <td>xyz</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" col1 col2 col3\n",
"0 1 FILL abc\n",
"1 2 555 def\n",
"2 3 666 ghi\n",
"3 FILL 444 xyz"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.fillna('FILL')"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [],
"source": [
"data = {'A':['foo','foo','foo','bar','bar','bar'],\n",
" 'B':['one','one','two','two','one','one'],\n",
" 'C':['x','y','x','y','x','y'],\n",
" 'D':[1,3,2,5,4,1]}\n",
"\n",
"df = pd.DataFrame(data)"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>A</th>\n",
" <th>B</th>\n",
" <th>C</th>\n",
" <th>D</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>foo</td>\n",
" <td>one</td>\n",
" <td>x</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>foo</td>\n",
" <td>one</td>\n",
" <td>y</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>foo</td>\n",
" <td>two</td>\n",
" <td>x</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>bar</td>\n",
" <td>two</td>\n",
" <td>y</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>bar</td>\n",
" <td>one</td>\n",
" <td>x</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>bar</td>\n",
" <td>one</td>\n",
" <td>y</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" A B C D\n",
"0 foo one x 1\n",
"1 foo one y 3\n",
"2 foo two x 2\n",
"3 bar two y 5\n",
"4 bar one x 4\n",
"5 bar one y 1"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Great"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}