{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*\n",
"\n",
"*No changes were made to the contents of this notebook from the original.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) | [Contents](Index.ipynb) | [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb) >"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Aggregations: Min, Max, and Everything In Between"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question.\n",
"Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the \"typical\" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).\n",
"\n",
"NumPy has fast built-in aggregation functions for working on arrays; we'll discuss and demonstrate some of them here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summing the Values in an Array\n",
"\n",
"As a quick example, consider computing the sum of all values in an array.\n",
"Python itself can do this using the built-in ``sum`` function:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"55.61209116604941"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"L = np.random.random(100)\n",
"sum(L)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The syntax is quite similar to that of NumPy's ``sum`` function, and the result is the same in the simplest case:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"55.612091166049424"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.sum(L)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, because it executes the operation in compiled code, NumPy's version of the operation is computed much more quickly:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 104 ms per loop\n",
"1000 loops, best of 3: 442 µs per loop\n"
]
}
],
"source": [
"big_array = np.random.rand(1000000)\n",
"%timeit sum(big_array)\n",
"%timeit np.sum(big_array)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Be careful, though: the ``sum`` function and the ``np.sum`` function are not identical, which can sometimes lead to confusion!\n",
"In particular, their optional arguments have different meanings, and ``np.sum`` is aware of multiple array dimensions, as we will see in the following section."
]
},
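{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of that difference, using a small throwaway array ``x``: Python's ``sum`` treats its second argument as a *start value* added to the total, while ``np.sum`` treats it as the *axis* to aggregate over, so the two calls below give different results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x = np.arange(6).reshape(2, 3)\n",
"# Python's sum: 1 is a start value, so this computes 1 + x[0] + x[1]\n",
"# NumPy's sum: 1 is the axis argument, so this sums within each row\n",
"sum(x, 1), np.sum(x, 1)"
]
},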
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Minimum and Maximum\n",
"\n",
"Similarly, Python has built-in ``min`` and ``max`` functions, used to find the minimum value and maximum value of any given array:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(1.1717128136634614e-06, 0.9999976784968716)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"min(big_array), max(big_array)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy's corresponding functions have similar syntax, and again operate much more quickly:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(1.1717128136634614e-06, 0.9999976784968716)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.min(big_array), np.max(big_array)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10 loops, best of 3: 82.3 ms per loop\n",
"1000 loops, best of 3: 497 µs per loop\n"
]
}
],
"source": [
"%timeit min(big_array)\n",
"%timeit np.min(big_array)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For ``min``, ``max``, ``sum``, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.17171281366e-06 0.999997678497 499911.628197\n"
]
}
],
"source": [
"print(big_array.min(), big_array.max(), big_array.sum())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Multi-dimensional aggregates\n",
"\n",
"One common type of aggregation operation is an aggregate along a row or column.\n",
"Say you have some data stored in a two-dimensional array:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 0.8967576 0.03783739 0.75952519 0.06682827]\n",
" [ 0.8354065 0.99196818 0.19544769 0.43447084]\n",
" [ 0.66859307 0.15038721 0.37911423 0.6687194 ]]\n"
]
}
],
"source": [
"M = np.random.random((3, 4))\n",
"print(M)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, each NumPy aggregation function will return the aggregate over the entire array:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"6.0850555667307118"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Aggregation functions take an additional argument specifying the *axis* along which the aggregate is computed. For example, we can find the minimum value within each column by specifying ``axis=0``:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.66859307, 0.03783739, 0.19544769, 0.06682827])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.min(axis=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function returns four values, corresponding to the four columns of numbers.\n",
"\n",
"Similarly, we can find the maximum value within each row:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.8967576 , 0.99196818, 0.6687194 ])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"M.max(axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way the axis is specified here can be confusing to users coming from other languages.\n",
"The ``axis`` keyword specifies the *dimension of the array that will be collapsed*, rather than the dimension that will be returned.\n",
"So specifying ``axis=0`` means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated."
]
},
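{
"cell_type": "markdown",
"metadata": {},
"source": [
"One quick way to see the collapsing, as a minimal check on the shapes involved: aggregating ``M`` (shape ``(3, 4)``) with ``axis=0`` collapses the first dimension and leaves four values, while ``axis=1`` collapses the second and leaves three."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# the collapsed dimension disappears from the result's shape\n",
"print(M.shape, M.min(axis=0).shape, M.max(axis=1).shape)"
]
},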
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other aggregation functions\n",
"\n",
"NumPy provides many other aggregation functions, but we won't discuss them in detail here.\n",
"Additionally, most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value (for a fuller discussion of missing data, see [Handling Missing Data](03.04-Missing-Values.ipynb)).\n",
"Some of these ``NaN``-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.\n",
"\n",
"The following table provides a list of useful aggregation functions available in NumPy:\n",
"\n",
"|Function Name | NaN-safe Version | Description |\n",
"|-------------------|---------------------|-----------------------------------------------|\n",
"| ``np.sum`` | ``np.nansum`` | Compute sum of elements |\n",
"| ``np.prod`` | ``np.nanprod`` | Compute product of elements |\n",
"| ``np.mean`` | ``np.nanmean`` | Compute mean of elements |\n",
"| ``np.std`` | ``np.nanstd`` | Compute standard deviation |\n",
"| ``np.var`` | ``np.nanvar`` | Compute variance |\n",
"| ``np.min`` | ``np.nanmin`` | Find minimum value |\n",
"| ``np.max`` | ``np.nanmax`` | Find maximum value |\n",
"| ``np.argmin`` | ``np.nanargmin`` | Find index of minimum value |\n",
"| ``np.argmax`` | ``np.nanargmax`` | Find index of maximum value |\n",
"| ``np.median`` | ``np.nanmedian`` | Compute median of elements |\n",
"| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements |\n",
"| ``np.any`` | N/A | Evaluate whether any elements are true |\n",
"| ``np.all`` | N/A | Evaluate whether all elements are true |\n",
"\n",
"We will see these aggregates often throughout the rest of the book."
]
},
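{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the ``NaN``-safe behavior, using a small throwaway array ``vals``: the ordinary aggregate propagates the missing value, while the ``nan``-prefixed version ignores it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"vals = np.array([1.0, np.nan, 3.0])\n",
"# np.sum returns nan because the NaN poisons the ordinary aggregate;\n",
"# np.nansum skips the missing value and returns 4.0\n",
"np.sum(vals), np.nansum(vals)"
]
},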
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example: What is the Average Height of US Presidents?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Aggregates available in NumPy can be extremely useful for summarizing a set of values.\n",
"As a simple example, let's consider the heights of all US presidents.\n",
"This data is available in the file *president_heights.csv*, which is a simple comma-separated list of labels and values:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"order,name,height(cm)\r\n",
"1,George Washington,189\r\n",
"2,John Adams,170\r\n",
"3,Thomas Jefferson,189\r\n"
]
}
],
"source": [
"!head -4 data/president_heights.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use the Pandas package, which we'll explore more fully in [Chapter 3](03.00-Introduction-to-Pandas.ipynb), to read the file and extract this information (note that the heights are measured in centimeters)."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173\n",
" 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183\n",
" 177 185 188 188 182 185]\n"
]
}
],
"source": [
"import pandas as pd\n",
"data = pd.read_csv('data/president_heights.csv')\n",
"heights = np.array(data['height(cm)'])\n",
"print(heights)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have this data array, we can compute a variety of summary statistics:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mean height: 179.738095238\n",
"Standard deviation: 6.93184344275\n",
"Minimum height: 163\n",
"Maximum height: 193\n"
]
}
],
"source": [
"print(\"Mean height: \", heights.mean())\n",
"print(\"Standard deviation:\", heights.std())\n",
"print(\"Minimum height: \", heights.min())\n",
"print(\"Maximum height: \", heights.max())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that in each case, the aggregation operation reduced the entire array to a single summarizing value, which gives us information about the distribution of values.\n",
"We may also wish to compute quantiles:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"25th percentile: 174.25\n",
"Median: 182.0\n",
"75th percentile: 183.0\n"
]
}
],
"source": [
"print(\"25th percentile: \", np.percentile(heights, 25))\n",
"print(\"Median: \", np.median(heights))\n",
"print(\"75th percentile: \", np.percentile(heights, 75))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that the median height of US presidents is 182 cm, or just shy of six feet.\n",
"\n",
"Of course, sometimes it's more useful to see a visual representation of this data, which we can accomplish using tools in Matplotlib (we'll discuss Matplotlib more fully in [Chapter 4](04.00-Introduction-To-Matplotlib.ipynb)). For example, this code generates the following chart:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import seaborn; seaborn.set() # set plot style"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAfYAAAFtCAYAAAD1Skg8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VNXdx/HvJJNEskAWEyzIJiWgqLwKFFHBIsrrCRZq\nArUImIBaqUKEVqgIRJBSjMCTUkCwRNyIYKxABCyCLxTxAZTNumBBVLayBxJCEsAsc58/eDGSlZkk\nMxNOPu+/Mss953fuuck3d5k7NsuyLAEAACP4+boAAABQdwh2AAAMQrADAGAQgh0AAIMQ7AAAGIRg\nBwDAIAQ76r0OHTrozJkzZZ7LysrS448/fsVl//CHP+iHH36o9j0TJkzQa6+9Vulr8+fP10cffVTh\n+SNHjuimm25SQkKCEhIS9Jvf/EYDBw7Uu+++63zP3LlztXLlymr7rqr98stXtg6u5Ouvv9aUKVMk\nSbt27dKYMWPcWr4mHA6HnnjiCcXFxWnJkiVlXqtqzh5//HHneissLNSzzz6r/v376/7779eAAQP0\nzjvvVNpXVlaWunbtqoSEBA0YMEDx8fEaMmSIvvjiizobT1Xbz7p165SYmFirth999FG35xRwhd3X\nBQBXYrPZarzswoULa9X3Z599pnbt2lX62jXXXKOsrCzn46NHj2r48OEKCQlRnz59NHr06Fq1f/ny\nNVkH3333nU6cOCFJuvnmmzVnzhy323DX8ePHtWXLFn3xxRc1qjktLU0hISFavXq1JCk7O1uDBg1S\n8+bNdccdd1R4f9euXfWPf/zD+XjDhg1KTk7WJ598Ij+/2u+3VLf91Ga7lKTNmzfXanmgKgQ76r0r\n3UOpuLhY//u//6vt27fL4XDoxhtvVEpKikJCQtS7d2/NmzdPHTt2VHp6upYvX66QkBB17dpV69ev\nd+4tf/7551q3bp1Onz6t2NhYpaWlafny5dq1a5dmzpwpPz8/3XvvvdXW0axZM40ePVqvvPKK+vTp\nowkTJig2NlYPP/yw5s6dqw8//FABAQEKDw9XamqqPvjggzLtf/jhhzpz5owOHz6sXr166dSpU87l\nLcvS3/72N+3atUuWZWnMmDHq1auXsrKytG7dOme4XXr83HPPad68eSooKNDEiRMVHx+vadOmafXq\n1SooKNDUqVO1Z88e2Ww29ezZU2PHjpWfn59uvfVWjRgxQps3b1Z2drYSExM1bNiwCmPdsWOHZs2a\npQsXLiggIEBjxoxR586d9dhjj6mkpEQDBgzQ3Llz1aJFC7fmOjs7W9dee62Ki4sVEBCg6OhozZs3\nT02aNHFp+dtvv12nT5/W2bNnNWPGjDLrc/To0VVuJ0uXLtXbb7+twMBABQUFaerUqWrbtm2Z7WfO\nnDl67733FBERoZYtWzr7vNL2N2DAAH366ac6duyY7rvvPo0bN04TJkyQJCUlJenll1/Whx9+WGn/\nQE1wKB5XhaSkJOdh7/j4eM2dO9f5Wnp6uux2u1asWKF3331XMTExSktLK7P8pk2b9O6772r58uVa\nsWKFCgsLy+xxnTx5UosXL9a6det07NgxffDBBxo6dKhuvvlmPf3001cM9Us6dOigb7/9tsxzx48f\n1+LFi7Vs2TItW7ZMPXr00FdffeVsf/z48c72f/zxR61evVpjx46t0HarVq20YsUKzZw5U+PHj1du\nbm6VdVx33XUaPXq0unTpoueff77Ma9OmTVNERIRWr16t5cuXa8+ePXrllVckSUVFRYqMjNRbb72l\nOXPmKC0tTUVFRWWWP3PmjMaMGaOUlBStXLlSL7zwgv785z/rzJkzSk9PV1BQkLKystwOdUlKTk7W\nli1bdPvtt+v3v/+9FixYoJCQEF1//fUuLZ+Zmal27dopPDxcUtn1WdV24nA4lJqaqldeeUXvvPOO\nfve73+nzzz8v0+6HH36o9evXa9WqVcrMzFRBQYHztSttf+fOndOSJUv01ltvKSMjQ0eOHFFqaqok\nKSMjQ9HR0VfsH3AHe+y4KmRkZJTZa7u0ZypJH3/8sfLz852HNktKShQVFVVm+Y0bNyouLk6hoaGS\npKFDh+qzzz5zvn7PPfcoMDBQkhQbG6ucnJwa1Wmz2dSoUaMyzzVt2lQ33nijEhIS1LNnT9111126\n/fbbna9ffkSic+fOVbb94IMPSpLatWundu3a1fhc8v/93/8pMzNTkhQQEKDBgwfrjTfe0GOPPSbp\n4rqQpI4dO6q4uFjnz593rhtJ+vLLL9WqVSvdcsstkqSf//zn6ty5s7Zt26Zu3bpV2W9Vh64dDofz\nsHn79u21bt06/ec//9G2bdu0efNmLVy4UHPmzFGvXr0qLLtjxw4lJCRIurjnfMMNN2jevHnO1y9f\nn1VtJ35+furbt68GDRqkXr166c4771T//v3L9PPpp5+qT58+zrkdOHCgMjIyqm33kkvrs2nTpoqK\nilJeXp6aN28u6eLcu9I/4A6CHVeF6g7Hl5aWatKkSerZs6ck6fz58/rxxx/LvMdut5dpo/z514CA\nAOfPNpvtiof/q/LVV18pNja2zHM2m00ZGRnatWuXtmzZotTUVHXv3l0TJ06ssHxISEiVbV9es8Ph\nkN1urxCWxcXFV6zR4XBUeFxSUuJ8HBQUVOb18uuisnVTvo3KREREVHqx2KlTpxQeHq7S0lJNnTpV\n48aN00033aSbbrpJw4cP10svvaTMzMxKg738OfbyLl+f1W0nM2fO1Pfff68tW7bo5Zdf1vLlyzV/\n/vwqx+3v7+9Su9LFazGqaueS8v0vW7ZMCxYsqHJcQHU4FI+rXs+ePbVkyRIVFxfL4XBo0qRJ+tvf\n/lbmPb/61a/0wQcfOA+hLlu2zKWLn+x2e5WBVf4P9P79+/XSSy/pkUceKfP8nj171K9fP7Vt21Yj\nRozQ8OHDtWfPniu2X96KFSskSd98840OHTqkTp06KSIiQnv37lVRUZFKSkrKXGHv7+9fads9evRw\nXrFeVFSkt99+W3feeadLY5SkTp06af/+/fr6668lXbxIb+fOnbrtttuqXEa6uPd86NAh7dy50/nc\n1q1bdfToUXXu3Fn+/v7av3+/FixY4Ky7pKREhw4dUseOHa+4fq6kqu0kNzdXvXr1Unh4uJKSkvTH\nP/7ROT+XL7t27Vrl5+fL4XCU+bSDK9tfZS7NfWX9lz+dA7iDPXbUe1cK4JEjR2rmzJlKSEhwXrw0\nfvz4Mst2795dDzzwgB588EFdc801ateuXYVD5pW5++67NWPGDBUVFSk+Pr7Ma0VFRc7DwDabTUFB\nQRo3bpzuuuuuMu/r0KGD+vbtqwEDBig4OFiNGjVSSkpKhfavtA4OHz6shIQE2Ww2zZ49W40bN1aP\nHj3UrVs3xcXFKSYmRrfddpszFH7xi1/o73//u5588skyH81KSUnRtGnT1L9/fxUXF6tnz57Oj6GV\nX9eVrfuIiAjNmTNH06ZN0/nz5+Xv76/
U1FS1bNlSR44cqXK+wsLCNG/ePKWlpencuXMqKSlRZGSk\n0tPTnadI5s2bp5kzZ+p//ud/FBwcLMuydM8992jUqFHVrh9XVLWdhISEaOTIkRo2bJiCgoIUEBCg\n6dOnlxn/r371K3333XcaOHCgmjRpog4dOjivcXBl+6tsfd57770aMmSIFixYUGX/QE3Y+NpWNAS7\ndu3Sv//9b2fAvf766/rqq69c2rMCgKuJxw/Ff/nll84/prt379bQoUOVlJSk3//+9zW+QAlwV+vW\nrbVjxw71799f/fv312effaZnnnnG12UBQJ3z6B77okWLtHLlSoWEhCgzM1OJiYlKSUlR+/bt9fbb\nb2v//v38cQUAoA55dI+9VatWZa4snT17ttq3by/p4kUx5a++BQAAtePRYO/Tp0+Zj4Vce+21ki7e\n5Wvp0qUaPny4J7sHAKDB8fpV8WvWrNHChQuVnp6uiIiIK77fsqxa35MZgPn27t2rxAlLFdwkxtel\nVHAu76QyUodUuMcB4Al
"text/plain": [
"<matplotlib.figure.Figure at 0x116102f98>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.hist(heights)\n",
"plt.title('Height Distribution of US Presidents')\n",
"plt.xlabel('height (cm)')\n",
"plt.ylabel('number');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These aggregates are some of the fundamental pieces of exploratory data analysis that we'll explore in more depth in later chapters of the book."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) | [Contents](Index.ipynb) | [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb) >"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}