mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
800 lines
73 KiB
Python
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Effect Size\n",
|
||
|
"======================\n",
|
||
|
"\n",
|
||
|
"Credits: Forked from [CompStats](https://github.com/AllenDowney/CompStats) by Allen Downey. License: [Creative Commons Attribution 4.0 International](http://creativecommons.org/licenses/by/4.0/)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 1,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"from __future__ import print_function, division\n",
|
||
|
"\n",
|
||
|
"import numpy\n",
|
||
|
"import scipy.stats\n",
|
||
|
"\n",
|
||
|
"import matplotlib.pyplot as pyplot\n",
|
||
|
"\n",
|
||
|
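"try:\n",
"    from ipywidgets import interact, fixed\n",
"    import ipywidgets as widgets\n",
"except ImportError:  # IPython < 4.0 shipped the widgets as IPython.html\n",
"    from IPython.html.widgets import interact, fixed\n",
"    from IPython.html import widgets\n",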
|
||
|
"\n",
|
||
|
"# seed the random number generator so we all get the same results\n",
|
||
|
"numpy.random.seed(17)\n",
|
||
|
"\n",
|
||
|
"# some nice colors from http://colorbrewer2.org/\n",
|
||
|
"COLOR1 = '#7fc97f'\n",
|
||
|
"COLOR2 = '#beaed4'\n",
|
||
|
"COLOR3 = '#fdc086'\n",
|
||
|
"COLOR4 = '#ffff99'\n",
|
||
|
"COLOR5 = '#386cb0'\n",
|
||
|
"\n",
|
||
|
"%matplotlib inline"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"To explore statistics that quantify effect size, we'll look at the difference in height between men and women. I used data from the Behavioral Risk Factor Surveillance System (BRFSS) to estimate the mean and standard deviation of height in cm for adult women and men in the U.S.\n",
|
||
|
"\n",
|
||
|
"I'll use `scipy.stats.norm` to represent the distributions. The result is an `rv` object (which stands for random variable)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"mu1, sig1 = 178, 7.7\n",
|
||
|
"male_height = scipy.stats.norm(mu1, sig1)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 3,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"mu2, sig2 = 163, 7.3\n",
|
||
|
"female_height = scipy.stats.norm(mu2, sig2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The following function evaluates the normal (Gaussian) probability density function (PDF) within 4 standard deviations of the mean. It takes an rv object and returns a pair of NumPy arrays."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 4,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"def eval_pdf(rv, num=4):\n",
"    \"\"\"Evaluate the PDF of rv at 100 points within num standard deviations of the mean.\"\"\"\n",
"    mean, std = rv.mean(), rv.std()\n",
"    xs = numpy.linspace(mean - num*std, mean + num*std, 100)\n",
"    ys = rv.pdf(xs)\n",
"    return xs, ys"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Here's what the two distributions look like."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 5,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<matplotlib.figure.Figure at 0x7f75ddc471d0>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"xs, ys = eval_pdf(male_height)\n",
|
||
|
"pyplot.plot(xs, ys, label='male', linewidth=4, color=COLOR2)\n",
|
||
|
"\n",
|
||
|
"xs, ys = eval_pdf(female_height)\n",
|
||
|
"pyplot.plot(xs, ys, label='female', linewidth=4, color=COLOR3)\n",
|
||
|
"pyplot.xlabel('height (cm)')\n",
|
||
|
"None"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Let's assume for now that those are the true distributions for the population. Of course, in real life we never observe the true population distribution. We generally have to work with a random sample.\n",
|
||
|
"\n",
|
||
|
"I'll use `rvs` to generate random samples from the population distributions. Note that these are totally random, totally representative samples, with no measurement error!"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 6,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"male_sample = male_height.rvs(1000)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 7,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"female_sample = female_height.rvs(1000)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Both samples are NumPy arrays. Now we can compute sample statistics like the mean and standard deviation."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 8,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"(178.16511665818112, 7.8419961712899502)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 8,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"mean1, std1 = male_sample.mean(), male_sample.std()\n",
|
||
|
"mean1, std1"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The sample mean is close to the population mean, but not exact, as expected."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 9,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"(163.48610226651135, 7.382384919896662)"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 9,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"mean2, std2 = female_sample.mean(), female_sample.std()\n",
|
||
|
"mean2, std2"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"And the results are similar for the female sample.\n",
|
||
|
"\n",
|
||
|
"Now, there are many ways to describe the magnitude of the difference between these distributions. An obvious one is the difference in the means:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 10,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"14.679014391669767"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 10,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"difference_in_means = male_sample.mean() - female_sample.mean()\n",
|
||
|
"difference_in_means # in cm"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"On average, men are 14 to 15 centimeters taller. For some applications, that would be a good way to describe the difference, but there are a few problems:\n",
|
||
|
"\n",
|
||
|
"* Without knowing more about the distributions (like the standard deviations) it's hard to interpret whether a difference like 15 cm is a lot or not.\n",
|
||
|
"\n",
|
||
|
"* The magnitude of the difference depends on the units of measure, making it hard to compare across different studies.\n",
|
||
|
"\n",
|
||
|
"There are a number of ways to quantify the difference between distributions. A simple option is to express the difference as a percentage of the mean.\n"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 11,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"8.2389946286916569"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 11,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"# Exercise: what is the relative difference in means, expressed as a percentage?\n",
|
||
|
"\n",
|
||
|
"relative_difference = difference_in_means / male_sample.mean()\n",
|
||
|
"relative_difference * 100 # percent"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"But a problem with relative differences is that you have to choose which mean to express them relative to."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 12,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"8.9787536605040401"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 12,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"relative_difference = difference_in_means / female_sample.mean()\n",
|
||
|
"relative_difference * 100 # percent"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Part Two\n",
|
||
|
"========\n",
|
||
|
"\n",
|
||
|
"An alternative way to express the difference between distributions is to see how much they overlap. To define overlap, we choose a threshold between the two means. The simple threshold is the midpoint between the means:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 13,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"170.82560946234622"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 13,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"simple_thresh = (mean1 + mean2) / 2\n",
|
||
|
"simple_thresh"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"A better, but slightly more complicated, threshold is (approximately) the place where the PDFs cross."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 14,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"170.6040359174722"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 14,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"thresh = (std1 * mean2 + std2 * mean1) / (std1 + std2)\n",
|
||
|
"thresh"
|
||
|
]
|
||
|
},
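{
"cell_type": "markdown",
"metadata": {},
"source": [
"The formula above places the threshold the same number of standard deviations from each mean. When the two standard deviations differ, that point is close to, but not exactly, where the PDFs cross. As a rough sketch, assuming normal distributions with the sample estimates as parameters, the crossing point can also be found numerically; `pdf_difference` and `crossing` are names introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import scipy.optimize\n",
"\n",
"def pdf_difference(x):\n",
"    \"\"\"Male PDF minus female PDF, using the sample estimates.\"\"\"\n",
"    return (scipy.stats.norm.pdf(x, mean1, std1) -\n",
"            scipy.stats.norm.pdf(x, mean2, std2))\n",
"\n",
"# the difference changes sign between the two means,\n",
"# so a bracketing root finder can locate the crossing\n",
"crossing = scipy.optimize.brentq(pdf_difference, mean2, mean1)\n",
"crossing"
]
},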
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In this example, there's not much difference between the two thresholds.\n",
|
||
|
"\n",
|
||
|
"Now we can count how many men are below the threshold:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 15,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"164"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 15,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"male_below_thresh = sum(male_sample < thresh)\n",
|
||
|
"male_below_thresh"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"And how many women are above it:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 16,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"174"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 16,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"female_above_thresh = sum(female_sample > thresh)\n",
|
||
|
"female_above_thresh"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"The \"overlap\" is the total area under the curves that ends up on the wrong side of the threshold."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 17,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"0.33799999999999997"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 17,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"overlap = male_below_thresh / len(male_sample) + female_above_thresh / len(female_sample)\n",
|
||
|
"overlap"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Or in more practical terms, you might report the fraction of people who would be misclassified if you tried to use height to guess sex:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 18,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"0.16899999999999998"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 18,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"misclassification_rate = overlap / 2\n",
|
||
|
"misclassification_rate"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Another way to quantify the difference between distributions is what's called \"probability of superiority\", which is a problematic term, but in this context it's the probability that a randomly chosen man is taller than a randomly chosen woman."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 19,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"0.91100000000000003"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 19,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"# Exercise: suppose I choose a man and a woman at random.\n",
|
||
|
"# What is the probability that the man is taller?\n",
|
||
|
"sum(x > y for x, y in zip(male_sample, female_sample)) / len(male_sample)"
|
||
|
]
|
||
|
},
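{
"cell_type": "markdown",
"metadata": {},
"source": [
"The estimate above pairs each man with exactly one woman, so it uses only 1000 of the possible comparisons. A minimal sketch of an alternative, assuming the samples are small enough that a 1000 x 1000 boolean array fits in memory, compares every man with every woman; `all_pairs` is a name introduced here for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# broadcasting a column against a row compares every man with every woman\n",
"all_pairs = male_sample[:, numpy.newaxis] > female_sample[numpy.newaxis, :]\n",
"\n",
"# the mean of a boolean array is the fraction of True entries\n",
"all_pairs.mean()"
]
},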
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Overlap (or misclassification rate) and \"probability of superiority\" have two good properties:\n",
|
||
|
"\n",
|
||
|
"* As probabilities, they don't depend on units of measure, so they are comparable between studies.\n",
|
||
|
"\n",
|
||
|
"* They are expressed in operational terms, so a reader has a sense of what practical effect the difference makes.\n",
|
||
|
"\n",
|
||
|
"There is one other common way to express the difference between distributions. Cohen's $d$ is the difference in means, standardized by dividing by the standard deviation. Here's a function that computes it:\n"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 20,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"def CohenEffectSize(group1, group2):\n",
"    \"\"\"Compute Cohen's d.\n",
"\n",
"    group1: Series or NumPy array\n",
"    group2: Series or NumPy array\n",
"\n",
"    returns: float\n",
"    \"\"\"\n",
"    diff = group1.mean() - group2.mean()\n",
"\n",
"    n1, n2 = len(group1), len(group2)\n",
"    var1 = group1.var()\n",
"    var2 = group2.var()\n",
"\n",
"    # weight each group's variance by its sample size\n",
"    pooled_var = (n1 * var1 + n2 * var2) / (n1 + n2)\n",
"    d = diff / numpy.sqrt(pooled_var)\n",
"    return d"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Computing the denominator is a little complicated; in fact, people have proposed several ways to do it. This implementation uses the \"pooled standard deviation\", which is the square root of a weighted average of the variances of the two groups.\n",
|
||
|
"\n",
|
||
|
"And here's the result for the difference in height between men and women."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 21,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"1.9274780043619493"
|
||
|
]
|
||
|
},
|
||
|
"execution_count": 21,
|
||
|
"metadata": {},
|
||
|
"output_type": "execute_result"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"CohenEffectSize(male_sample, female_sample)"
|
||
|
]
|
||
|
},
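{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted above, people have proposed several denominators. One common alternative pools the variances with $n - 1$ weights (Bessel's correction). The sketch below is a variant under that assumption, not the method used in the rest of this notebook; `cohen_d_unbiased` is a hypothetical name. With samples of 1000, the two versions should give nearly the same value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def cohen_d_unbiased(group1, group2):\n",
"    \"\"\"Cohen's d with variances computed using ddof=1 (illustrative variant).\"\"\"\n",
"    diff = group1.mean() - group2.mean()\n",
"\n",
"    n1, n2 = len(group1), len(group2)\n",
"    # ddof=1 divides each sum of squares by n - 1 instead of n\n",
"    var1 = group1.var(ddof=1)\n",
"    var2 = group2.var(ddof=1)\n",
"\n",
"    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)\n",
"    return diff / numpy.sqrt(pooled_var)\n",
"\n",
"cohen_d_unbiased(male_sample, female_sample)"
]
},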
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Most people don't have a good sense of how big $d=1.9$ is, so let's make a visualization to get calibrated.\n",
|
||
|
"\n",
|
||
|
"Here's a function that encapsulates the code we already saw for computing overlap and probability of superiority."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 22,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"def overlap_superiority(control, treatment, n=1000):\n",
"    \"\"\"Estimates overlap and superiority based on a sample.\n",
"\n",
"    control: scipy.stats rv object\n",
"    treatment: scipy.stats rv object\n",
"    n: sample size\n",
"\n",
"    returns: tuple of (overlap, superiority)\n",
"    \"\"\"\n",
"    control_sample = control.rvs(n)\n",
"    treatment_sample = treatment.rvs(n)\n",
"    # threshold at the midpoint of the population means\n",
"    thresh = (control.mean() + treatment.mean()) / 2\n",
"\n",
"    control_above = sum(control_sample > thresh)\n",
"    treatment_below = sum(treatment_sample < thresh)\n",
"    overlap = (control_above + treatment_below) / n\n",
"\n",
"    superiority = sum(x > y for x, y in zip(treatment_sample, control_sample)) / n\n",
"    return overlap, superiority"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Here's the function that takes Cohen's $d$, plots normal distributions with the given effect size, and prints their overlap and superiority."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 23,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"def plot_pdfs(cohen_d=2):\n",
"    \"\"\"Plot PDFs for distributions that differ by some number of stds.\n",
"\n",
"    cohen_d: number of standard deviations between the means\n",
"    \"\"\"\n",
"    control = scipy.stats.norm(0, 1)\n",
"    treatment = scipy.stats.norm(cohen_d, 1)\n",
"    xs, ys = eval_pdf(control)\n",
"    pyplot.fill_between(xs, ys, label='control', color=COLOR3, alpha=0.7)\n",
"\n",
"    xs, ys = eval_pdf(treatment)\n",
"    pyplot.fill_between(xs, ys, label='treatment', color=COLOR2, alpha=0.7)\n",
"\n",
"    o, s = overlap_superiority(control, treatment)\n",
"    print('overlap', o)\n",
"    print('superiority', s)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Here's an example that demonstrates the function:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 24,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"overlap 0.278\n",
|
||
|
"superiority 0.932\n"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<matplotlib.figure.Figure at 0x7f75ddb98d10>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"plot_pdfs(2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"And an interactive widget you can use to visualize what different values of $d$ mean:"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 25,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [
|
||
|
{
|
||
|
"name": "stdout",
|
||
|
"output_type": "stream",
|
||
|
"text": [
|
||
|
"overlap 0.305\n",
|
||
|
"superiority 0.931\n"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"data": {
|
||
|
"text/plain": [
|
||
|
"<matplotlib.figure.Figure at 0x7f75ddba2310>"
|
||
|
]
|
||
|
},
|
||
|
"metadata": {},
|
||
|
"output_type": "display_data"
|
||
|
}
|
||
|
],
|
||
|
"source": [
|
||
|
"slider = widgets.FloatSlider(min=0, max=4, value=2)\n",
|
||
|
"interact(plot_pdfs, cohen_d=slider)\n",
|
||
|
"None"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Cohen's $d$ has a few nice properties:\n",
|
||
|
"\n",
|
||
|
"* Because mean and standard deviation have the same units, their ratio is dimensionless, so we can compare $d$ across different studies.\n",
|
||
|
"\n",
|
||
|
"* In fields that commonly use $d$, people are calibrated to know what values should be considered big, surprising, or important.\n",
|
||
|
"\n",
|
||
|
"* Given $d$ (and the assumption that the distributions are normal), you can compute overlap, superiority, and related statistics."
|
||
|
]
|
||
|
},
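{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the last point, here is a minimal sketch that computes overlap and superiority directly from $d$ instead of estimating them from a sample. It assumes two normal distributions with equal (unit) variance and the midpoint threshold used in `overlap_superiority`; the name `overlap_superiority_from_d` is introduced here for illustration. The results should be close to the simulated values printed by `plot_pdfs`, up to sampling noise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def overlap_superiority_from_d(cohen_d):\n",
"    \"\"\"Analytic overlap and superiority for two unit-variance normals.\n",
"\n",
"    cohen_d: number of standard deviations between the means\n",
"\n",
"    returns: tuple of (overlap, superiority)\n",
"    \"\"\"\n",
"    # each distribution has a tail of mass Phi(-d/2) on the wrong side of the midpoint\n",
"    overlap = 2 * scipy.stats.norm.cdf(-cohen_d / 2.0)\n",
"\n",
"    # the difference of two independent unit normals has standard deviation sqrt(2)\n",
"    superiority = scipy.stats.norm.cdf(cohen_d / numpy.sqrt(2))\n",
"    return overlap, superiority\n",
"\n",
"overlap_superiority_from_d(2)"
]
},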
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"In summary, the best way to report effect size often depends on the audience and your goals. There is often a tradeoff between summary statistics that have good technical properties and statistics that are meaningful to a general audience."
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 2",
|
||
|
"language": "python",
|
||
|
"name": "python2"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 2
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython2",
|
||
|
"version": "2.7.10"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 0
|
||
|
}
|