mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
802 lines
99 KiB
Python
802 lines
99 KiB
Python
|
{
|
|||
|
"cells": [
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"<!--BOOK_INFORMATION-->\n",
|
|||
|
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
|
|||
|
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
|
|||
|
"\n",
|
|||
|
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*\n",
|
|||
|
"\n",
|
|||
|
"*No changes were made to the contents of this notebook from the original.*"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"<!--NAVIGATION-->\n",
|
|||
|
"< [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb) | [Contents](Index.ipynb) | [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb) >"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"# Computation on Arrays: Broadcasting"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"We saw in the previous section how NumPy's universal functions can be used to *vectorize* operations and thereby remove slow Python loops.\n",
|
|||
|
"Another means of vectorizing operations is to use NumPy's *broadcasting* functionality.\n",
|
|||
|
"Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Introducing Broadcasting\n",
|
|||
|
"\n",
|
|||
|
"Recall that for arrays of the same size, binary operations are performed on an element-by-element basis:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 1,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"import numpy as np"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([5, 6, 7])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 2,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a = np.array([0, 1, 2])\n",
|
|||
|
"b = np.array([5, 5, 5])\n",
|
|||
|
"a + b"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Broadcasting allows these types of binary operations to be performed on arrays of different sizes–for example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([5, 6, 7])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 3,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a + 5"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"We can think of this as an operation that stretches or duplicates the value ``5`` into the array ``[5, 5, 5]``, and adds the results.\n",
|
|||
|
"The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model as we think about broadcasting.\n",
|
|||
|
"\n",
|
|||
|
"We can similarly extend this to arrays of higher dimension. Observe the result when we add a one-dimensional array to a two-dimensional array:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[ 1., 1., 1.],\n",
|
|||
|
" [ 1., 1., 1.],\n",
|
|||
|
" [ 1., 1., 1.]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 4,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"M = np.ones((3, 3))\n",
|
|||
|
"M"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[ 1., 2., 3.],\n",
|
|||
|
" [ 1., 2., 3.],\n",
|
|||
|
" [ 1., 2., 3.]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 5,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"M + a"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Here the one-dimensional array ``a`` is stretched, or broadcast across the second dimension in order to match the shape of ``M``.\n",
|
|||
|
"\n",
|
|||
|
"While these examples are relatively easy to understand, more complicated cases can involve broadcasting of both arrays. Consider the following example:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 6,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"name": "stdout",
|
|||
|
"output_type": "stream",
|
|||
|
"text": [
|
|||
|
"[0 1 2]\n",
|
|||
|
"[[0]\n",
|
|||
|
" [1]\n",
|
|||
|
" [2]]\n"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a = np.arange(3)\n",
|
|||
|
"b = np.arange(3)[:, np.newaxis]\n",
|
|||
|
"\n",
|
|||
|
"print(a)\n",
|
|||
|
"print(b)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[0, 1, 2],\n",
|
|||
|
" [1, 2, 3],\n",
|
|||
|
" [2, 3, 4]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 7,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a + b"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Just as before we stretched or broadcasted one value to match the shape of the other, here we've stretched *both* ``a`` and ``b`` to match a common shape, and the result is a two-dimensional array!\n",
|
|||
|
"The geometry of these examples is visualized in the following figure (Code to produce this plot can be found in the [appendix](06.00-Figure-Code.ipynb#Broadcasting), and is adapted from source published in the [astroML](http://astroml.org) documentation. Used by permission)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"![Broadcasting Visual](figures/02.05-broadcasting.png)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The light boxes represent the broadcasted values: again, this extra memory is not actually allocated in the course of the operation, but it can be useful conceptually to imagine that it is."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Rules of Broadcasting\n",
|
|||
|
"\n",
|
|||
|
"Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:\n",
|
|||
|
"\n",
|
|||
|
"- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.\n",
|
|||
|
"- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.\n",
|
|||
|
"- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.\n",
|
|||
|
"\n",
|
|||
|
"To make these rules clear, let's consider a few examples in detail."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Broadcasting example 1\n",
|
|||
|
"\n",
|
|||
|
"Let's look at adding a two-dimensional array to a one-dimensional array:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 8,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"M = np.ones((2, 3))\n",
|
|||
|
"a = np.arange(3)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Let's consider an operation on these two arrays. The shape of the arrays are\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape = (2, 3)``\n",
|
|||
|
"- ``a.shape = (3,)``\n",
|
|||
|
"\n",
|
|||
|
"We see by rule 1 that the array ``a`` has fewer dimensions, so we pad it on the left with ones:\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape -> (2, 3)``\n",
|
|||
|
"- ``a.shape -> (1, 3)``\n",
|
|||
|
"\n",
|
|||
|
"By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape -> (2, 3)``\n",
|
|||
|
"- ``a.shape -> (2, 3)``\n",
|
|||
|
"\n",
|
|||
|
"The shapes match, and we see that the final shape will be ``(2, 3)``:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[ 1., 2., 3.],\n",
|
|||
|
" [ 1., 2., 3.]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 9,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"M + a"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Broadcasting example 2\n",
|
|||
|
"\n",
|
|||
|
"Let's take a look at an example where both arrays need to be broadcast:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 10,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"a = np.arange(3).reshape((3, 1))\n",
|
|||
|
"b = np.arange(3)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Again, we'll start by writing out the shape of the arrays:\n",
|
|||
|
"\n",
|
|||
|
"- ``a.shape = (3, 1)``\n",
|
|||
|
"- ``b.shape = (3,)``\n",
|
|||
|
"\n",
|
|||
|
"Rule 1 says we must pad the shape of ``b`` with ones:\n",
|
|||
|
"\n",
|
|||
|
"- ``a.shape -> (3, 1)``\n",
|
|||
|
"- ``b.shape -> (1, 3)``\n",
|
|||
|
"\n",
|
|||
|
"And rule 2 tells us that we upgrade each of these ones to match the corresponding size of the other array:\n",
|
|||
|
"\n",
|
|||
|
"- ``a.shape -> (3, 3)``\n",
|
|||
|
"- ``b.shape -> (3, 3)``\n",
|
|||
|
"\n",
|
|||
|
"Because the result matches, these shapes are compatible. We can see this here:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[0, 1, 2],\n",
|
|||
|
" [1, 2, 3],\n",
|
|||
|
" [2, 3, 4]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 11,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a + b"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Broadcasting example 3\n",
|
|||
|
"\n",
|
|||
|
"Now let's take a look at an example in which the two arrays are not compatible:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 12,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"M = np.ones((3, 2))\n",
|
|||
|
"a = np.arange(3)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"This is just a slightly different situation than in the first example: the matrix ``M`` is transposed.\n",
|
|||
|
"How does this affect the calculation? The shape of the arrays are\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape = (3, 2)``\n",
|
|||
|
"- ``a.shape = (3,)``\n",
|
|||
|
"\n",
|
|||
|
"Again, rule 1 tells us that we must pad the shape of ``a`` with ones:\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape -> (3, 2)``\n",
|
|||
|
"- ``a.shape -> (1, 3)``\n",
|
|||
|
"\n",
|
|||
|
"By rule 2, the first dimension of ``a`` is stretched to match that of ``M``:\n",
|
|||
|
"\n",
|
|||
|
"- ``M.shape -> (3, 2)``\n",
|
|||
|
"- ``a.shape -> (3, 3)``\n",
|
|||
|
"\n",
|
|||
|
"Now we hit rule 3–the final shapes do not match, so these two arrays are incompatible, as we can observe by attempting this operation:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 13,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"ename": "ValueError",
|
|||
|
"evalue": "operands could not be broadcast together with shapes (3,2) (3,) ",
|
|||
|
"output_type": "error",
|
|||
|
"traceback": [
|
|||
|
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
|
|||
|
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
|
|||
|
"\u001b[0;32m<ipython-input-13-9e16e9f98da6>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mM\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
|
|||
|
"\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (3,2) (3,) "
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"M + a"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Note the potential confusion here: you could imagine making ``a`` and ``M`` compatible by, say, padding ``a``'s shape with ones on the right rather than the left.\n",
|
|||
|
"But this is not how the broadcasting rules work!\n",
|
|||
|
"That sort of flexibility might be useful in some cases, but it would lead to potential areas of ambiguity.\n",
|
|||
|
"If right-side padding is what you'd like, you can do this explicitly by reshaping the array (we'll use the ``np.newaxis`` keyword introduced in [The Basics of NumPy Arrays](02.02-The-Basics-Of-NumPy-Arrays.ipynb)):"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"(3, 1)"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 14,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"a[:, np.newaxis].shape"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[ 1., 1.],\n",
|
|||
|
" [ 2., 2.],\n",
|
|||
|
" [ 3., 3.]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 15,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"M + a[:, np.newaxis]"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Also note that while we've been focusing on the ``+`` operator here, these broadcasting rules apply to *any* binary ``ufunc``.\n",
|
|||
|
"For example, here is the ``logaddexp(a, b)`` function, which computes ``log(exp(a) + exp(b))`` with more precision than the naive approach:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([[ 1.31326169, 1.31326169],\n",
|
|||
|
" [ 1.69314718, 1.69314718],\n",
|
|||
|
" [ 2.31326169, 2.31326169]])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 16,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"np.logaddexp(M, a[:, np.newaxis])"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"For more information on the many available universal functions, refer to [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb)."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"## Broadcasting in Practice"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"Broadcasting operations form the core of many examples we'll see throughout this book.\n",
|
|||
|
"We'll now take a look at a couple simple examples of where they can be useful."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Centering an array"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"In the previous section, we saw that ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops. Broadcasting extends this ability.\n",
|
|||
|
"One commonly seen example is when centering an array of data.\n",
|
|||
|
"Imagine you have an array of 10 observations, each of which consists of 3 values.\n",
|
|||
|
"Using the standard convention (see [Data Representation in Scikit-Learn](05.02-Introducing-Scikit-Learn.ipynb#Data-Representation-in-Scikit-Learn)), we'll store this in a $10 \\times 3$ array:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 17,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"X = np.random.random((10, 3))"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"We can compute the mean of each feature using the ``mean`` aggregate across the first dimension:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([ 0.53514715, 0.66567217, 0.44385899])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 18,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"Xmean = X.mean(0)\n",
|
|||
|
"Xmean"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"And now we can center the ``X`` array by subtracting the mean (this is a broadcasting operation):"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 19,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"X_centered = X - Xmean"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"To double-check that we've done this correctly, we can check that the centered array has near zero mean:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"text/plain": [
|
|||
|
"array([ 2.22044605e-17, -7.77156117e-17, -1.66533454e-17])"
|
|||
|
]
|
|||
|
},
|
|||
|
"execution_count": 20,
|
|||
|
"metadata": {},
|
|||
|
"output_type": "execute_result"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"X_centered.mean(0)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"To within machine precision, the mean is now zero."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"### Plotting a two-dimensional function"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"One place that broadcasting is very useful is in displaying images based on two-dimensional functions.\n",
|
|||
|
"If we want to define a function $z = f(x, y)$, broadcasting can be used to compute the function across the grid:"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 21,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"# x and y have 50 steps from 0 to 5\n",
|
|||
|
"x = np.linspace(0, 5, 50)\n",
|
|||
|
"y = np.linspace(0, 5, 50)[:, np.newaxis]\n",
|
|||
|
"\n",
|
|||
|
"z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"We'll use Matplotlib to plot this two-dimensional array (these tools will be discussed in full in [Density and Contour Plots](04.04-Density-and-Contour-Plots.ipynb)):"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 22,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [],
|
|||
|
"source": [
|
|||
|
"%matplotlib inline\n",
|
|||
|
"import matplotlib.pyplot as plt"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "code",
|
|||
|
"execution_count": 23,
|
|||
|
"metadata": {
|
|||
|
"collapsed": false
|
|||
|
},
|
|||
|
"outputs": [
|
|||
|
{
|
|||
|
"data": {
|
|||
|
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATYAAAEACAYAAAA5n1oZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsvW+sd9t21/UZY871++39nHN7ae+tlXD7ByhUYzCEmPIv\nkSZNpCCCL0hsNTH6whBjE4zKG94YEn3RSJRoIVJTDSQo+sJaXkgpiQmkAaEiNTEUUxEqbW8qtPde\nes7Z+/dbc87hizHmXHOtvfc5zznPPueWe/d8sp71+7fX37m+8zvG+I4xxcx4aS/tpb20r6SmX+4D\neGkv7aW9tOduL8D20l7aS/uKay/A9tJe2kv7imsvwPbSXtpL+4prL8D20l7aS/uKay/A9tJe2kv7\nimv5dX4kIn8P+BLQgNXMvv3jPKiX9tJe2kt7k/ZawIYD2neY2Rc+zoN5aS/tpb2052iva4rKh/jt\nS3tpL+2lfVnb64KVAX9RRH5cRP6tj/OAXtpLe2kv7U3b65qiv93MPi8iX48D3E+a2Y99nAf20l7a\nS3tpH7W9FrCZ2edj/Q9E5IeAbwd2wCYiL0mnL+2lfZmamcmb/P23fONiP/0z5XV//tNm9i1vsr+P\nu8kHJcGLyCtAzewdEXkL+FHgj5jZjx5+Z9/1jX8QS0J5+0R9+0z91Iny9on1U8r6KbjGwqcar96+\n5623Lr5++57PnN7h60+/xNcvvv50vuNEYZHKIoUTlTtbeK+debededdO/KNyyy+sb/EL17f5xetb\n/ML1Le7uT1zuTtzfnbjcL5T7jF5Br4JeBL2CNOEf/uUf4Z/4bd+FGCBGy4zFMti5wU3Dbnx9c3Ph\na2/v+BU37/G1t+/xtTfv8dnll/j6vC1fo1duBW7ExnrFWA1W89fvtIUv1Fd8sd36ur7iC+stX1xf\n8cX1li+sr3hvPXFfFu7XzGXNXErGTPiF/+F/4et+/3diJog2khopNVJqLKlym6+8WlZeLVdul5W3\n04W304W3sq9f6ZUsjSyVJI1M5WKZ9+qZ99qJ99qZd8qZL9zf8sXLK75w/4ov3N9S312QdxO8k5B3\nEvqOsvxS4/RLNdYNefeCvHuPvHePvHsP91f+zv3f5Nek3wCt+XJakJsz3JyR8xl7dWL91ELpy9sn\n1k8Z69dA+ZSxfsrIbxe+5q33+PSrOz4d669b3uHr8rt8Jr/DZ5Z3eUsvJGkk2lj/UrvlH9WbWN/y\nxfKKX7y84gvXV/zi5S1+8fKKcp8pd5n1fqHcZeyifOHP/yif/ed/J7oKuuJOGO8iAJhCW6KvLGCL\nobcVvSm+vi28fXPPZ2/e5bPnd/nMzTt85vwun0nv8pkc6/QOr7RyEuEEnERIwL0J98ZYvlRv+EJ9\ni1+sb/m6vOJL660v11u+tN5QWuKv/M7/5I2BTURs/fyvfa3fLr/y7zzYn4j8IPB7gJ83s3/2iX38\n58DvAt4F/g0z+4n4/LuAP4a7xn7QzL7vI59ItNdhbN8A/FAwsgz8mSOoPdkewUzpH5tg8bqZ0JD9\n2oQmEpuQ6X9DxFAzVAzFUJq/jkXEEDX/AzEQ2cIfsl/eD9b7d479ghnYdJxm07H2Y8cwBMPG+R33\nIrv19p1tZ+j7MpmWuGbN32OK0WgIglIwiiZKa1xrJmnjKpmrVs6tUDRRSWgcn+DXKtFYtHKiUFGq\nKvcpc59WbvPK/ZJZT0K7QjsJ7azU1UhXqFdBr0q9gpaElgQlI2tGaoOrIEnxwdP8QtYGpYKu/v1F\n0EVJi2K50hZxwMjQsiBJaJq56sJ76YxkI1thscqJyqIVAxapnKSwUMcFTmIsUrnRlVu98iplLilz\nyZmrJS61ca0GTWhNaADJ4GyYQBPvINKiD5j3IYsF8d/1r3pf7v2imFIssY5FWU0pKMWMBDRxyYFO\nPUBjSVOfVmmjv/enwkzet/9+2Fatvcmf/zfAfwH86ce+FJHfBfxaM/t1IvKbgf8S+C0iosD3A98J\n/Bzw4yLyw2b2t9/kYD4Q2Mzs7wK/8aPuQGx7kLeN9m33myNjPUBjWnbbExDzb5Q9mA1QE4sd2wMw\nswnQTGIknj7fdnS8DhPgIDRTX+IYt2NmgNrhdKfN2g7U5l11ELN4YjqwEX3OmoMbCm3Efnw7RRur\nNpI2cstcW2VtiVUTpSWKKoqSpcV+HdiyVE6imK60JNymzH1eubSV+5bRRVlPQlkd1OwE9Rws+Ap1\nVaworAldM3ZdkGKgGkuDCjSDVqEUv5FJkIuii2KLklPDsmCL0BZBF0GyUlPikhckN+oCi1UWqrP5\nVBCBs6w09SuZ4mIpjUUqZ1m51TSBWmYlk6qz92ZK6SCRDDs1DEXYgG0AnIIlQC3ArfefPuB4H65N\nqQFsvuhYr6ZUaX5Jor/MHWQDN+/jiRZ9vbF5fPrA+UZEbdfaG8Ckmf2YiHzz+/zk9xGgZ2Z/TUQ+\nLSLfAPxq4KfM7KcBROTPxm8/XmB7lhbDmWwUJtYPQW2A28To4qf0VWcbKm0Dtx3I+XNDH1XVgcsE\nJNjbq2/+1kfB7v36Se9IGzuTPbiZYGL7U5yaHNYAO9dkB/wOexNjw+D8T/0aB7XWAdbBrbsTqiqr\nJlKA20kzaysBbkoxJYvQAsxFjISxUDEVJID0klbu88p9W7ltC5wUVnEAWxP1rOgKeoV0FeoKrIqs\nCVkTnDKUxteeP+c3ofbza1htiJRxrWVR9KKQM5IabVHaAimLuwWSA9s1Z1qG66IBaoVTqpxqRQWa\nCgIkdTCDYGwUTKFY4pquDmqWWEnQoDWhmHAxB7Lbf+ZXw8mHpoYhLYCtia8FLFmAW/QvZkuEwdhq\ngFgfXFbTALi4F7EPG33PBhFQcEYt+77eh8/xbDwjZVutfvCPPnr7VcDfn97/THz22OdvnADwsQDb\nB44htt0UMx+12hHcBlvbttZvqgZrm8GsU/UjY7MOcgfW9upbvhVr0+cfcOAdgB+C7wx0xHFPHW8+\n6b4LiXOR/XdjBN4xWaA5U7v5tl/roNb8pBrm7NU6Y3NQW1NCm3GtNR4oZw3VnEVY7FgxRLZxWgNk\n7/PKbXNwu9hKK0o9JXRtUIxWcNP0LNQryCqwJuSa0FOGk0ExPvPqG7H7i9+AfrNrPDwtrsclQcqk\nVNFU3QRdlBSmaM1Cy5lrhmtWWBJZ3AQ9ZV9SuB2SNU5WaPTzayzBgKpeuaYUbC1RJGFNKE25oCQy\nBeWt3/AttGuLPghSHdikekc13/AYLHcujZmxdQBrmylaLLHin9fRX+JSDBYdTE06Y2sD4JK0YYbC\n85uiTzG2H/sr9/zYX708456A14CJN2mfDGMjzmKmX1OveMDYdubdY3TbfLAMgFD6yBbgphO4dXNh\n8o2M7mDswG5/sPGTR+51NxV7p5yP3SzMkgOs0c3eafN7Y3s+y9lxs5k43cdGC34wdewW/rGaEqUa\nqsZVE9eWWGvy15bJVqmm43jBH6AsDgcmcKMrN+nKK3OzrWWlLsp6SlxLphSFq2Cr0K5CXQVdhXZN\ntLVhq2GlOYjVCrVASvsL2NzXJmuBtcBlhZTQbKQlk7Obo2ShJcGyhsNeuMrCvZx4R29Y1M0zM0NS\nQyOA0PuNIYj4+Z2kctaVW8usKVGysraNVWHuZ6pkar97VaCK950ad0jBuikaZqkPnkG3DgNeH1Bq\nAF4l3tNosvWUYY0AiQ5mRqaFy8CXNFkqz6lFqE8A22/9bWd+6287j/ff95/+0kfZ/M8C3zi9/1x8\ndgK+6ZHP36h9oqYoEH6L7vyc/GwdIKYO8aiPrfM46extW5JsACdqiIap172x4r6pwITxemZzj40j\nw8Rge1icce59gbYZJIdj3pNCgRHPcAa3P8uxn7g+9HWblmCG0lmduTlWq1JUEXXz56qJq2auLXNt\nhYVEEX+4mok/HDi7UfzBPEvhRleuemXNiZqVkpXrkrmcCmtJyFmwVWmrM7a6CrIKuibaakjJG6iV\njOS6jRKDqlcHt2sAn16
|
|||
|
"text/plain": [
|
|||
|
"<matplotlib.figure.Figure at 0x10becee48>"
|
|||
|
]
|
|||
|
},
|
|||
|
"metadata": {},
|
|||
|
"output_type": "display_data"
|
|||
|
}
|
|||
|
],
|
|||
|
"source": [
|
|||
|
"plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],\n",
|
|||
|
" cmap='viridis')\n",
|
|||
|
"plt.colorbar();"
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"The result is a compelling visualization of the two-dimensional function."
|
|||
|
]
|
|||
|
},
|
|||
|
{
|
|||
|
"cell_type": "markdown",
|
|||
|
"metadata": {},
|
|||
|
"source": [
|
|||
|
"<!--NAVIGATION-->\n",
|
|||
|
"< [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb) | [Contents](Index.ipynb) | [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb) >"
|
|||
|
]
|
|||
|
}
|
|||
|
],
|
|||
|
"metadata": {
|
|||
|
"anaconda-cloud": {},
|
|||
|
"kernelspec": {
|
|||
|
"display_name": "Python 3",
|
|||
|
"language": "python",
|
|||
|
"name": "python3"
|
|||
|
},
|
|||
|
"language_info": {
|
|||
|
"codemirror_mode": {
|
|||
|
"name": "ipython",
|
|||
|
"version": 3
|
|||
|
},
|
|||
|
"file_extension": ".py",
|
|||
|
"mimetype": "text/x-python",
|
|||
|
"name": "python",
|
|||
|
"nbconvert_exporter": "python",
|
|||
|
"pygments_lexer": "ipython3",
|
|||
|
"version": "3.4.3"
|
|||
|
}
|
|||
|
},
|
|||
|
"nbformat": 4,
|
|||
|
"nbformat_minor": 0
|
|||
|
}
|