From 98375d25ec3cc457a876bca6d069099f2d2eb4cf Mon Sep 17 00:00:00 2001 From: Donne Martin Date: Sun, 27 Dec 2015 09:28:26 -0500 Subject: [PATCH] Add Theano logistic regression notebook. --- README.md | 1 + .../intro_theano/logistic_regression.ipynb | 445 ++++++++++++++++++ 2 files changed, 446 insertions(+) create mode 100644 deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb diff --git a/README.md b/README.md index 17d2a1d..5492897 100644 --- a/README.md +++ b/README.md @@ -95,6 +95,7 @@ IPython Notebook(s) demonstrating deep learning functionality. | [tsf-lstm](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/tensor-flow-exercises/6_lstm.ipynb) | Train a LSTM character model over Text8 data in TensorFlow. | | [theano-intro](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/intro_theano.ipynb) | Intro to Theano, which allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation. | | [theano-scan](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/scan_tutorial/scan_tutorial.ipynb) | Learn scans, a mechanism to perform loops in a Theano graph. | +| [theano-logistic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb) | Implement logistic regression in Theano. | | [deep-dream](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/deep-dream/dream.ipynb) | Caffe-based computer vision program which uses a convolutional neural network to find and enhance patterns in images. |
diff --git a/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb b/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb new file mode 100644 index 0000000..2d30611 --- /dev/null +++ b/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb @@ -0,0 +1,445 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Logistic Regression in Theano\n", + "\n", + "Credits: Forked from [summerschool2015](https://github.com/mila-udem/summerschool2015) by mila-udem\n", + "\n", + "This notebook is inspired by [the tutorial on logistic regression](http://deeplearning.net/tutorial/logreg.html) on [deeplearning.net](http://deeplearning.net).\n", + "\n", + "In this notebook, we show how Theano can be used to implement one of the most basic classifiers: **logistic regression**. We start off with a quick primer on the model, which serves both as a refresher and as a way to anchor the notation and show how mathematical expressions are mapped onto Theano graphs.\n", + "\n", + "In the deepest of machine learning traditions, this tutorial will tackle the exciting problem of MNIST digit classification." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Get the data\n", + "\n", + "First, let's download a pre-packaged version of MNIST, and load each split of the dataset as NumPy ndarrays." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import os\n", + "import requests\n", + "import gzip\n", + "import six\n", + "from six.moves import cPickle\n", + "\n", + "if not os.path.exists('mnist.pkl.gz'):\n", + " r = requests.get('http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz')\n", + " with open('mnist.pkl.gz', 'wb') as data_file:\n", + " data_file.write(r.content)\n", + "\n", + "with gzip.open('mnist.pkl.gz', 'rb') as data_file:\n", + " if six.PY3:\n", + " train_set, valid_set, test_set = cPickle.load(data_file, encoding='latin1')\n", + " else:\n", + " train_set, valid_set, test_set = cPickle.load(data_file)\n", + "\n", + "train_set_x, train_set_y = train_set\n", + "valid_set_x, valid_set_y = valid_set\n", + "test_set_x, test_set_y = test_set" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The model\n", + "Logistic regression is a probabilistic, linear classifier. It is parametrized\n", + "by a weight matrix $W$ and a bias vector $b$. Classification is\n", + "done by projecting an input vector onto a set of hyperplanes, each of which\n", + "corresponds to a class. The distance from the input to a hyperplane reflects\n", + "the probability that the input is a member of the corresponding class.\n", + "\n", + "Mathematically, the probability that an input vector $x$ is a member of a\n", + "class $i$, a value of a stochastic variable $Y$, can be written as:\n", + "\n", + "$$P(Y=i|x, W,b) = {\\rm softmax}_i(W x + b) = \\frac {e^{W_i x + b_i}} {\\sum_j e^{W_j x + b_j}}$$\n", + "\n", + "The model's prediction $y_{pred}$ is the class whose probability is maximal, specifically:\n", + "\n", + "$$ y_{pred} = {\\rm argmax}_i P(Y=i|x,W,b)$$\n", + "\n", + "Now, let us define our input variables. First, we need to define the dimensions of our tensors:\n", + "- `n_in` is the length of each training vector,\n", + "- `n_out` is the number of classes.\n", + "\n", + "Our variables will be:\n", + "- `x` is a matrix, where each row contains a different example of the dataset.
Its shape is `(batch_size, n_in)`, but `batch_size` does not have to be specified in advance, and can change during training.\n", + "- `W` is a shared matrix, of shape `(n_in, n_out)`, initialized with zeros. Column `k` of `W` represents the separation hyperplane for class `k`.\n", + "- `b` is a shared vector, of length `n_out`, initialized with zeros. Element `k` of `b` represents the free parameter of hyperplane `k`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import numpy\n", + "import theano\n", + "from theano import tensor\n", + "\n", + "# Size of the data\n", + "n_in = 28 * 28\n", + "# Number of classes\n", + "n_out = 10\n", + "\n", + "x = tensor.matrix('x')\n", + "W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),\n", + " name='W',\n", + " borrow=True)\n", + "b = theano.shared(value=numpy.zeros((n_out,), dtype=theano.config.floatX),\n", + " name='b',\n", + " borrow=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we can build a symbolic expression for the matrix of class-membership probabilities (`p_y_given_x`), and for the class whose probability is maximal (`y_pred`)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "p_y_given_x = tensor.nnet.softmax(tensor.dot(x, W) + b)\n", + "y_pred = tensor.argmax(p_y_given_x, axis=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Defining a loss function\n", + "Learning optimal model parameters involves minimizing a loss function. In the\n", + "case of multi-class logistic regression, it is very common to use the negative\n", + "log-likelihood as the loss. This is equivalent to maximizing the likelihood of the\n", + "data set $\\mathcal{D}$ under the model parameterized by $\\theta$. Let\n", + "us start by defining the likelihood $\\mathcal{L}$ and the loss\n", + "$\\ell$:\n", + "\n", + "$$\\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D}) =\n", + " \\sum_{i=0}^{|\\mathcal{D}|-1} \\log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\\\\n", + " \\ell (\\theta=\\{W,b\\}, \\mathcal{D}) = - \\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D})\n", + "$$\n", + "\n", + "Again, we will express these quantities using Theano. We have one additional input, the actual target class `y`:\n", + "- `y` is an input vector of integers, of length `batch_size` (which will have to match the number of rows of `x` at runtime). The length of `y` can be symbolically expressed by `y.shape[0]`.\n", + "- `log_prob` is a `(batch_size, n_out)` matrix containing the log probabilities of class membership for each example.\n", + "- `arange(y.shape[0])` is a symbolic vector which will contain `[0, 1, 2, ..., batch_size-1]`.\n", + "- `log_likelihood` is a vector containing the log probability of the target class, for each example.\n", + "- `loss` is the mean of the negative `log_likelihood` over the examples in the minibatch."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "y = tensor.lvector('y')\n", + "log_prob = tensor.log(p_y_given_x)\n", + "log_likelihood = log_prob[tensor.arange(y.shape[0]), y]\n", + "loss = - log_likelihood.mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training procedure\n", + "This notebook will use the method of minibatch stochastic gradient descent (MSGD) to find values of `W` and `b` that minimize the loss.\n", + "\n", + "We can let Theano compute symbolic expressions for the gradients of the loss with respect to `W` and `b`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "g_W, g_b = theano.grad(cost=loss, wrt=[W, b])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`g_W` and `g_b` are symbolic variables, which can be used as part of a computation graph. In particular, let us define the expressions for one step of gradient descent for `W` and `b`, for a fixed learning rate." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "learning_rate = numpy.float32(0.13)\n", + "new_W = W - learning_rate * g_W\n", + "new_b = b - learning_rate * g_b" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can then define **update expressions**, or pairs of (shared variable, expression for its update), that we will use when compiling the Theano function. The updates will be performed each time the function gets called.\n", + "\n", + "The following function, `train_model`, returns the loss on the current minibatch, then changes the values of the shared variables according to the update rules. It needs to be passed `x` and `y` as inputs, but not the shared variables, which are implicit inputs.\n", + "\n", + "The entire learning algorithm thus consists of looping over all examples in the dataset, considering the examples of one minibatch at a time, and repeatedly calling the `train_model` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "train_model = theano.function(inputs=[x, y],\n", + " outputs=loss,\n", + " updates=[(W, new_W),\n", + " (b, new_b)])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Testing the model\n", + "When testing the model, we are interested in the number of misclassified examples (and not only in the likelihood). Here, we build a symbolic expression for retrieving the number of misclassified examples in a minibatch.\n", + "\n", + "This will also be useful to apply to the validation and test sets, in order to monitor the progress of the model during training, and to do early stopping."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "misclass_nb = tensor.neq(y_pred, y)\n", + "misclass_rate = misclass_nb.mean()\n", + "\n", + "test_model = theano.function(inputs=[x, y],\n", + " outputs=misclass_rate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Training the model\n", + "Here is the main training loop of the algorithm:\n", + "- For each *epoch*, or pass through the training set\n", + " - split the training set into minibatches, and call `train_model` on each minibatch\n", + " - split the validation set into minibatches, and call `test_model` on each minibatch to measure the misclassification rate\n", + " - if the misclassification rate has not improved in a while, stop training\n", + "- Measure performance on the test set\n", + "\n", + "The **early stopping procedure** is what decides whether the performance has improved enough. There are many variants, and we will not go into the details of this one here.\n", + "\n", + "We first need to define a few parameters for the training loop and the early stopping procedure." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "## Define a couple of helper variables and functions for the optimization\n", + "batch_size = 500\n", + "# compute number of minibatches for training, validation and testing\n", + "n_train_batches = train_set_x.shape[0] // batch_size\n", + "n_valid_batches = valid_set_x.shape[0] // batch_size\n", + "n_test_batches = test_set_x.shape[0] // batch_size\n", + "\n", + "def get_minibatch(i, dataset_x, dataset_y):\n", + " start_idx = i * batch_size\n", + " end_idx = (i + 1) * batch_size\n", + " batch_x = dataset_x[start_idx:end_idx]\n", + " batch_y = dataset_y[start_idx:end_idx]\n", + " return (batch_x, batch_y)\n", + "\n", + "## early-stopping parameters\n", + "# maximum number of epochs\n", + "n_epochs = 1000\n", + "# look at this many examples regardless\n", + "patience = 5000\n", + "# wait this much longer when a new best is found\n", + "patience_increase = 2\n", + "# a relative improvement of this much is considered significant\n", + "improvement_threshold = 0.995\n", + "\n", + "# go through this many minibatches before checking the network on the validation set;\n", + "# in this case we check every epoch\n", + "validation_frequency = min(n_train_batches, patience // 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false, + "scrolled": true + }, + "outputs": [], + "source": [ + "import timeit\n", + "from six.moves import xrange\n", + "\n", + "best_validation_loss = numpy.inf\n", + "test_score = 0.\n", + "start_time = timeit.default_timer()\n", + "\n", + "done_looping = False\n", + "epoch = 0\n", + "while (epoch < n_epochs) and (not done_looping):\n", + " epoch = epoch + 1\n", + " for minibatch_index in xrange(n_train_batches):\n", + " minibatch_x, minibatch_y = get_minibatch(minibatch_index, train_set_x, train_set_y)\n", + " minibatch_avg_cost = train_model(minibatch_x, minibatch_y)\n", + "\n", + " # iteration number\n", + " iter = (epoch - 1) * n_train_batches + minibatch_index\n", + " if (iter + 1) % validation_frequency == 0:\n", + " # compute zero-one loss on validation set\n", + " validation_losses = []\n", + " for i in xrange(n_valid_batches):\n", + " valid_xi, valid_yi = get_minibatch(i, valid_set_x, valid_set_y)\n", + " 
validation_losses.append(test_model(valid_xi, valid_yi))\n", + " this_validation_loss = numpy.mean(validation_losses)\n", + " print('epoch %i, minibatch %i/%i, validation error %f %%' %\n", + " (epoch,\n", + " minibatch_index + 1,\n", + " n_train_batches,\n", + " this_validation_loss * 100.))\n", + "\n", + " # if we got the best validation score until now\n", + " if this_validation_loss < best_validation_loss:\n", + " # improve patience if loss improvement is good enough\n", + " if this_validation_loss < best_validation_loss * improvement_threshold:\n", + " patience = max(patience, iter * patience_increase)\n", + "\n", + " best_validation_loss = this_validation_loss\n", + "\n", + " # test it on the test set\n", + " test_losses = []\n", + " for i in xrange(n_test_batches):\n", + " test_xi, test_yi = get_minibatch(i, test_set_x, test_set_y)\n", + " test_losses.append(test_model(test_xi, test_yi))\n", + "\n", + " test_score = numpy.mean(test_losses)\n", + " print(' epoch %i, minibatch %i/%i, test error of best model %f %%' %\n", + " (epoch,\n", + " minibatch_index + 1,\n", + " n_train_batches,\n", + " test_score * 100.))\n", + "\n", + " # save the best parameters\n", + " numpy.savez('best_model.npz', W=W.get_value(), b=b.get_value())\n", + "\n", + " if patience <= iter:\n", + " done_looping = True\n", + " break\n", + "\n", + "end_time = timeit.default_timer()\n", + "print('Optimization complete with best validation score of %f %%, '\n", + " 'with test performance %f %%' %\n", + " (best_validation_loss * 100., test_score * 100.))\n", + "\n", + "print('The code ran for %d epochs, with %f epochs/sec' %\n", + " (epoch, 1. * epoch / (end_time - start_time)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualization\n", + "You can visualize the columns of `W`, which correspond to the separation hyperplanes for each class." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import matplotlib.pyplot as plt\n", + "from utils import tile_raster_images\n", + "\n", + "plt.clf()\n", + "\n", + "# Increase the size of the figure\n", + "plt.gcf().set_size_inches(15, 10)\n", + "\n", + "plot_data = tile_raster_images(W.get_value(borrow=True).T,\n", + " img_shape=(28, 28), tile_shape=(2, 5), tile_spacing=(1, 1))\n", + "plt.imshow(plot_data, cmap='Greys', interpolation='none')\n", + "plt.axis('off')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.4.3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}
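The training loop above saves the best parameters to `best_model.npz` but the notebook never reloads them. As a minimal sketch of how those saved parameters could be reused for prediction, here is a plain-NumPy version of the `p_y_given_x` / `y_pred` computation; the `predict` helper name and the commented usage lines are illustrative only, and assume the file written by the `numpy.savez` call in the loop.

```python
import numpy

def predict(x_batch, params_path='best_model.npz'):
    """Classify a (batch_size, 28*28) array of MNIST images using the
    parameters saved by the training loop (W has shape (784, 10), b has shape (10,))."""
    params = numpy.load(params_path)
    W, b = params['W'], params['b']
    # Linear scores, one column per class (mirrors tensor.dot(x, W) + b)
    logits = x_batch.dot(W) + b
    # Numerically stable softmax over the class dimension
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = numpy.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    # Most probable class for each example (mirrors tensor.argmax(..., axis=1))
    return probs.argmax(axis=1)

# Illustrative usage with the test split loaded earlier in the notebook:
# predictions = predict(test_set_x)
# print('test error: %.2f %%' % (100.0 * (predictions != test_set_y).mean()))
```

Doing this in NumPy avoids recompiling a Theano function just to run inference, and subtracting the row-wise maximum keeps the softmax numerically stable without changing the argmax.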