mirror of
https://github.com/donnemartin/data-science-ipython-notebooks.git
synced 2024-03-22 13:30:56 +08:00
446 lines
17 KiB
Python
446 lines
17 KiB
Python
|
{
|
||
|
"cells": [
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"# Logistic Regression in Theano\n",
|
||
|
"\n",
|
||
|
"Credits: Forked from [summerschool2015](https://github.com/mila-udem/summerschool2015) by mila-udem\n",
|
||
|
"\n",
|
||
|
"This notebook is inspired from [the tutorial on logistic regression](http://deeplearning.net/tutorial/logreg.html) on [deeplearning.net](http://deeplearning.net).\n",
|
||
|
"\n",
|
||
|
"In this notebook, we show how Theano can be used to implement the most basic classifier: the **logistic regression**. We start off with a quick primer of the model, which serves both as a refresher but also to anchor the notation and show how mathematical expressions are mapped onto Theano graphs.\n",
|
||
|
"\n",
|
||
|
"In the deepest of machine learning traditions, this tutorial will tackle the exciting problem of MNIST digit classification."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Get the data\n",
|
||
|
"\n",
|
||
|
"In the mean time, let's just download a pre-packaged version of MNIST, and load each split of the dataset as NumPy ndarrays."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import os\n",
|
||
|
"import requests\n",
|
||
|
"import gzip\n",
|
||
|
"import six\n",
|
||
|
"from six.moves import cPickle\n",
|
||
|
"\n",
|
||
|
"if not os.path.exists('mnist.pkl.gz'):\n",
|
||
|
" r = requests.get('http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz')\n",
|
||
|
" with open('mnist.pkl.gz', 'wb') as data_file:\n",
|
||
|
" data_file.write(r.content)\n",
|
||
|
"\n",
|
||
|
"with gzip.open('mnist.pkl.gz', 'rb') as data_file:\n",
|
||
|
" if six.PY3:\n",
|
||
|
" train_set, valid_set, test_set = cPickle.load(data_file, encoding='latin1')\n",
|
||
|
" else:\n",
|
||
|
" train_set, valid_set, test_set = cPickle.load(data_file)\n",
|
||
|
"\n",
|
||
|
"train_set_x, train_set_y = train_set\n",
|
||
|
"valid_set_x, valid_set_y = valid_set\n",
|
||
|
"test_set_x, test_set_y = test_set"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## The model\n",
|
||
|
"Logistic regression is a probabilistic, linear classifier. It is parametrized\n",
|
||
|
"by a weight matrix $W$ and a bias vector $b$. Classification is\n",
|
||
|
"done by projecting an input vector onto a set of hyperplanes, each of which\n",
|
||
|
"corresponds to a class. The distance from the input to a hyperplane reflects\n",
|
||
|
"the probability that the input is a member of the corresponding class.\n",
|
||
|
"\n",
|
||
|
"Mathematically, the probability that an input vector $x$ is a member of a\n",
|
||
|
"class $i$, a value of a stochastic variable $Y$, can be written as:\n",
|
||
|
"\n",
|
||
|
"$$P(Y=i|x, W,b) = softmax_i(W x + b) = \\frac {e^{W_i x + b_i}} {\\sum_j e^{W_j x + b_j}}$$\n",
|
||
|
"\n",
|
||
|
"The model's prediction $y_{pred}$ is the class whose probability is maximal, specifically:\n",
|
||
|
"\n",
|
||
|
"$$ y_{pred} = {\\rm argmax}_i P(Y=i|x,W,b)$$\n",
|
||
|
"\n",
|
||
|
"Now, let us define our input variables. First, we need to define the dimension of our tensors:\n",
|
||
|
"- `n_in` is the length of each training vector,\n",
|
||
|
"- `n_out` is the number of classes.\n",
|
||
|
"\n",
|
||
|
"Our variables will be:\n",
|
||
|
"- `x` is a matrix, where each row contains a different example of the dataset. Its shape is `(batch_size, n_in)`, but `batch_size` does not have to be specified in advance, and can change during training.\n",
|
||
|
"- `W` is a shared matrix, of shape `(n_in, n_out)`, initialized with zeros. Column `k` of `W` represents the separation hyperplane for class `k`.\n",
|
||
|
"- `b` is a shared vector, of length `n_out`, initialized with zeros. Element `k` of `b` represents the free parameter of hyperplane `k`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import numpy\n",
|
||
|
"import theano\n",
|
||
|
"from theano import tensor\n",
|
||
|
"\n",
|
||
|
"# Size of the data\n",
|
||
|
"n_in = 28 * 28\n",
|
||
|
"# Number of classes\n",
|
||
|
"n_out = 10\n",
|
||
|
"\n",
|
||
|
"x = tensor.matrix('x')\n",
|
||
|
"W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),\n",
|
||
|
" name='W',\n",
|
||
|
" borrow=True)\n",
|
||
|
"b = theano.shared(value=numpy.zeros((n_out,), dtype=theano.config.floatX),\n",
|
||
|
" name='b',\n",
|
||
|
" borrow=True)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Now, we can build a symbolic expression for the matrix of class-membership probability (`p_y_given_x`), and for the class whose probability is maximal (`y_pred`)."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"p_y_given_x = tensor.nnet.softmax(tensor.dot(x, W) + b)\n",
|
||
|
"y_pred = tensor.argmax(p_y_given_x, axis=1)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Defining a loss function\n",
|
||
|
"Learning optimal model parameters involves minimizing a loss function. In the\n",
|
||
|
"case of multi-class logistic regression, it is very common to use the negative\n",
|
||
|
"log-likelihood as the loss. This is equivalent to maximizing the likelihood of the\n",
|
||
|
"data set $\\cal{D}$ under the model parameterized by $\\theta$. Let\n",
|
||
|
"us first start by defining the likelihood $\\cal{L}$ and loss\n",
|
||
|
"$\\ell$:\n",
|
||
|
"\n",
|
||
|
"$$\\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D}) =\n",
|
||
|
" \\sum_{i=0}^{|\\mathcal{D}|} \\log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\\\\n",
|
||
|
" \\ell (\\theta=\\{W,b\\}, \\mathcal{D}) = - \\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D})\n",
|
||
|
"$$\n",
|
||
|
"\n",
|
||
|
"Again, we will express those expressions using Theano. We have one additional input, the actual target class `y`:\n",
|
||
|
"- `y` is an input vector of integers, of length `batch_size` (which will have to match the length of `x` at runtime). The length of `y` can be symbolically expressed by `y.shape[0]`.\n",
|
||
|
"- `log_prob` is a `(batch_size, n_out)` matrix containing the log probabilities of class membership for each example.\n",
|
||
|
"- `arange(y.shape[0])` is a symbolic vector which will contain `[0,1,2,... batch_size-1]`\n",
|
||
|
"- `log_likelihood` is a vector containing the log probability of the target, for each example.\n",
|
||
|
"- `loss` is the mean of the negative `log_likelihood` over the examples in the minibatch."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"y = tensor.lvector('y')\n",
|
||
|
"log_prob = tensor.log(p_y_given_x)\n",
|
||
|
"log_likelihood = log_prob[tensor.arange(y.shape[0]), y]\n",
|
||
|
"loss = - log_likelihood.mean()"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Training procedure\n",
|
||
|
"This notebook will use the method of stochastic gradient descent with mini-batches (MSGD) to find values of `W` and `b` that minimize the loss.\n",
|
||
|
"\n",
|
||
|
"We can let Theano compute symbolic expressions for the gradient of the loss wrt `W` and `b`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"g_W, g_b = theano.grad(cost=loss, wrt=[W, b])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"`g_W` and `g_b` are symbolic variables, which can be used as part of a computation graph. In particular, let us define the expressions for one step of gradient descent for `W` and `b`, for a fixed learning rate."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"learning_rate = numpy.float32(0.13)\n",
|
||
|
"new_W = W - learning_rate * g_W\n",
|
||
|
"new_b = b - learning_rate * g_b"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"We can then define **update expressions**, or pairs of (shared variable, expression for its update), that we will use when compiling the Theano function. The updates will be performed each time the function gets called.\n",
|
||
|
"\n",
|
||
|
"The following function, `train_model`, returns the loss on the current minibatch, then changes the values of the shared variables according to the update rules. It needs to be passed `x` and `y` as inputs, but not the shared variables, which are implicit inputs.\n",
|
||
|
"\n",
|
||
|
"The entire learning algorithm thus consists in looping over all examples in the dataset, considering all the examples in one minibatch at a time, and repeatedly calling the `train_model` function."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"train_model = theano.function(inputs=[x, y],\n",
|
||
|
" outputs=loss,\n",
|
||
|
" updates=[(W, new_W),\n",
|
||
|
" (b, new_b)])"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### Testing the model\n",
|
||
|
"When testing the model, we are interested in the number of misclassified examples (and not only in the likelihood). Here, we build a symbolic expression for retrieving the number of misclassified examples in a minibatch.\n",
|
||
|
"\n",
|
||
|
"This will also be useful to apply on the validation and testing sets, in order to monitor the progress of the model during training, and to do early stopping."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"misclass_nb = tensor.neq(y_pred, y)\n",
|
||
|
"misclass_rate = misclass_nb.mean()\n",
|
||
|
"\n",
|
||
|
"test_model = theano.function(inputs=[x, y],\n",
|
||
|
" outputs=misclass_rate)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Training the model\n",
|
||
|
"Here is the main training loop of the algorithm:\n",
|
||
|
"- For each *epoch*, or pass through the training set\n",
|
||
|
" - split the training set in minibatches, and call `train_model` on each minibatch\n",
|
||
|
" - split the validation set in minibatches, and call `test_model` on each minibatch to measure the misclassification rate\n",
|
||
|
" - if the misclassification rate has not improved in a while, stop training\n",
|
||
|
"- Measure performance on the test set\n",
|
||
|
"\n",
|
||
|
"The **early stopping procedure** is what decide whether the performance has improved enough. There are many variants, and we will not go into the details of this one here.\n",
|
||
|
"\n",
|
||
|
"We first need to define a few parameters for the training loop and the early stopping procedure."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"## Define a couple of helper variables and functions for the optimization\n",
|
||
|
"batch_size = 500\n",
|
||
|
"# compute number of minibatches for training, validation and testing\n",
|
||
|
"n_train_batches = train_set_x.shape[0] // batch_size\n",
|
||
|
"n_valid_batches = valid_set_x.shape[0] // batch_size\n",
|
||
|
"n_test_batches = test_set_x.shape[0] // batch_size\n",
|
||
|
"\n",
|
||
|
"def get_minibatch(i, dataset_x, dataset_y):\n",
|
||
|
" start_idx = i * batch_size\n",
|
||
|
" end_idx = (i + 1) * batch_size\n",
|
||
|
" batch_x = dataset_x[start_idx:end_idx]\n",
|
||
|
" batch_y = dataset_y[start_idx:end_idx]\n",
|
||
|
" return (batch_x, batch_y)\n",
|
||
|
"\n",
|
||
|
"## early-stopping parameters\n",
|
||
|
"# maximum number of epochs\n",
|
||
|
"n_epochs = 1000\n",
|
||
|
"# look as this many examples regardless\n",
|
||
|
"patience = 5000\n",
|
||
|
"# wait this much longer when a new best is found\n",
|
||
|
"patience_increase = 2\n",
|
||
|
"# a relative improvement of this much is considered significant\n",
|
||
|
"improvement_threshold = 0.995\n",
|
||
|
"\n",
|
||
|
"# go through this many minibatches before checking the network on the validation set;\n",
|
||
|
"# in this case we check every epoch\n",
|
||
|
"validation_frequency = min(n_train_batches, patience / 2)"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": false,
|
||
|
"scrolled": true
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import timeit\n",
|
||
|
"from six.moves import xrange\n",
|
||
|
"\n",
|
||
|
"best_validation_loss = numpy.inf\n",
|
||
|
"test_score = 0.\n",
|
||
|
"start_time = timeit.default_timer()\n",
|
||
|
"\n",
|
||
|
"done_looping = False\n",
|
||
|
"epoch = 0\n",
|
||
|
"while (epoch < n_epochs) and (not done_looping):\n",
|
||
|
" epoch = epoch + 1\n",
|
||
|
" for minibatch_index in xrange(n_train_batches):\n",
|
||
|
" minibatch_x, minibatch_y = get_minibatch(minibatch_index, train_set_x, train_set_y)\n",
|
||
|
" minibatch_avg_cost = train_model(minibatch_x, minibatch_y)\n",
|
||
|
"\n",
|
||
|
" # iteration number\n",
|
||
|
" iter = (epoch - 1) * n_train_batches + minibatch_index\n",
|
||
|
" if (iter + 1) % validation_frequency == 0:\n",
|
||
|
" # compute zero-one loss on validation set\n",
|
||
|
" validation_losses = []\n",
|
||
|
" for i in xrange(n_valid_batches):\n",
|
||
|
" valid_xi, valid_yi = get_minibatch(i, valid_set_x, valid_set_y)\n",
|
||
|
" validation_losses.append(test_model(valid_xi, valid_yi))\n",
|
||
|
" this_validation_loss = numpy.mean(validation_losses)\n",
|
||
|
" print('epoch %i, minibatch %i/%i, validation error %f %%' %\n",
|
||
|
" (epoch,\n",
|
||
|
" minibatch_index + 1,\n",
|
||
|
" n_train_batches,\n",
|
||
|
" this_validation_loss * 100.))\n",
|
||
|
"\n",
|
||
|
" # if we got the best validation score until now\n",
|
||
|
" if this_validation_loss < best_validation_loss:\n",
|
||
|
" # improve patience if loss improvement is good enough\n",
|
||
|
" if this_validation_loss < best_validation_loss * improvement_threshold:\n",
|
||
|
" patience = max(patience, iter * patience_increase)\n",
|
||
|
"\n",
|
||
|
" best_validation_loss = this_validation_loss\n",
|
||
|
"\n",
|
||
|
" # test it on the test set\n",
|
||
|
" test_losses = []\n",
|
||
|
" for i in xrange(n_test_batches):\n",
|
||
|
" test_xi, test_yi = get_minibatch(i, test_set_x, test_set_y)\n",
|
||
|
" test_losses.append(test_model(test_xi, test_yi))\n",
|
||
|
"\n",
|
||
|
" test_score = numpy.mean(test_losses)\n",
|
||
|
" print(' epoch %i, minibatch %i/%i, test error of best model %f %%' %\n",
|
||
|
" (epoch,\n",
|
||
|
" minibatch_index + 1,\n",
|
||
|
" n_train_batches,\n",
|
||
|
" test_score * 100.))\n",
|
||
|
"\n",
|
||
|
" # save the best parameters\n",
|
||
|
" numpy.savez('best_model.npz', W=W.get_value(), b=b.get_value())\n",
|
||
|
"\n",
|
||
|
" if patience <= iter:\n",
|
||
|
" done_looping = True\n",
|
||
|
" break\n",
|
||
|
"\n",
|
||
|
"end_time = timeit.default_timer()\n",
|
||
|
"print('Optimization complete with best validation score of %f %%, '\n",
|
||
|
" 'with test performance %f %%' %\n",
|
||
|
" (best_validation_loss * 100., test_score * 100.))\n",
|
||
|
"\n",
|
||
|
"print('The code ran for %d epochs, with %f epochs/sec' %\n",
|
||
|
" (epoch, 1. * epoch / (end_time - start_time)))"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## Visualization\n",
|
||
|
"You can visualize the columns of `W`, which correspond to the separation hyperplanes for each class."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {
|
||
|
"collapsed": false
|
||
|
},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"%matplotlib inline\n",
|
||
|
"import matplotlib.pyplot as plt\n",
|
||
|
"from utils import tile_raster_images\n",
|
||
|
"\n",
|
||
|
"plt.clf()\n",
|
||
|
"\n",
|
||
|
"# Increase the size of the figure\n",
|
||
|
"plt.gcf().set_size_inches(15, 10)\n",
|
||
|
"\n",
|
||
|
"plot_data = tile_raster_images(W.get_value(borrow=True).T,\n",
|
||
|
" img_shape=(28, 28), tile_shape=(2, 5), tile_spacing=(1, 1))\n",
|
||
|
"plt.imshow(plot_data, cmap='Greys', interpolation='none')\n",
|
||
|
"plt.axis('off')"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.4.3"
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 0
|
||
|
}
|