Merge pull request #19 from donnemartin/feature/deep-learning

Add Theano notebooks.
This commit is contained in:
Donne Martin 2015-12-27 09:34:47 -05:00
commit 5281bee77b
18 changed files with 4516 additions and 0 deletions

View File

@ -93,6 +93,11 @@ IPython Notebook(s) demonstrating deep learning functionality.
| [tsf-convolutions](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/tensor-flow-exercises/4_convolutions.ipynb) | Create convolutional neural networks in TensorFlow. |
| [tsf-word2vec](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/tensor-flow-exercises/5_word2vec.ipynb) | Train a skip-gram model over Text8 data in TensorFlow. |
| [tsf-lstm](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/tensor-flow-exercises/6_lstm.ipynb) | Train a LSTM character model over Text8 data in TensorFlow. |
| [theano-intro](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/intro_theano.ipynb) | Intro to Theano, which allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation. |
| [theano-scan](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/scan_tutorial/scan_tutorial.ipynb) | Learn scans, a mechanism to perform loops in a Theano graph. |
| [theano-logistic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb) | Implement logistic regression in Theano. |
| [theano-rnn](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/rnn_tutorial/simple_rnn.ipynb) | Implement recurrent neural networks in Theano. |
| [theano-mlp](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/theano_mlp/theano_mlp.ipynb) | Implement multilayer perceptrons in Theano. |
| [deep-dream](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/deep-dream/dream.ipynb) | Caffe-based computer vision program which uses a convolutional neural network to find and enhance patterns in images. |
<br/>

View File

@ -0,0 +1,3 @@
intro_theano.pdf: slides_source/intro_theano.tex
cd slides_source; pdflatex --shell-escape intro_theano.tex
mv slides_source/intro_theano.pdf .

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,445 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Logistic Regression in Theano\n",
"\n",
"Credits: Forked from [summerschool2015](https://github.com/mila-udem/summerschool2015) by mila-udem\n",
"\n",
"This notebook is inspired from [the tutorial on logistic regression](http://deeplearning.net/tutorial/logreg.html) on [deeplearning.net](http://deeplearning.net).\n",
"\n",
"In this notebook, we show how Theano can be used to implement the most basic classifier: the **logistic regression**. We start off with a quick primer of the model, which serves both as a refresher but also to anchor the notation and show how mathematical expressions are mapped onto Theano graphs.\n",
"\n",
"In the deepest of machine learning traditions, this tutorial will tackle the exciting problem of MNIST digit classification."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get the data\n",
"\n",
"In the mean time, let's just download a pre-packaged version of MNIST, and load each split of the dataset as NumPy ndarrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import gzip\n",
"import six\n",
"from six.moves import cPickle\n",
"\n",
"if not os.path.exists('mnist.pkl.gz'):\n",
" r = requests.get('http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz')\n",
" with open('mnist.pkl.gz', 'wb') as data_file:\n",
" data_file.write(r.content)\n",
"\n",
"with gzip.open('mnist.pkl.gz', 'rb') as data_file:\n",
" if six.PY3:\n",
" train_set, valid_set, test_set = cPickle.load(data_file, encoding='latin1')\n",
" else:\n",
" train_set, valid_set, test_set = cPickle.load(data_file)\n",
"\n",
"train_set_x, train_set_y = train_set\n",
"valid_set_x, valid_set_y = valid_set\n",
"test_set_x, test_set_y = test_set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The model\n",
"Logistic regression is a probabilistic, linear classifier. It is parametrized\n",
"by a weight matrix $W$ and a bias vector $b$. Classification is\n",
"done by projecting an input vector onto a set of hyperplanes, each of which\n",
"corresponds to a class. The distance from the input to a hyperplane reflects\n",
"the probability that the input is a member of the corresponding class.\n",
"\n",
"Mathematically, the probability that an input vector $x$ is a member of a\n",
"class $i$, a value of a stochastic variable $Y$, can be written as:\n",
"\n",
"$$P(Y=i|x, W,b) = softmax_i(W x + b) = \\frac {e^{W_i x + b_i}} {\\sum_j e^{W_j x + b_j}}$$\n",
"\n",
"The model's prediction $y_{pred}$ is the class whose probability is maximal, specifically:\n",
"\n",
"$$ y_{pred} = {\\rm argmax}_i P(Y=i|x,W,b)$$\n",
"\n",
"Now, let us define our input variables. First, we need to define the dimension of our tensors:\n",
"- `n_in` is the length of each training vector,\n",
"- `n_out` is the number of classes.\n",
"\n",
"Our variables will be:\n",
"- `x` is a matrix, where each row contains a different example of the dataset. Its shape is `(batch_size, n_in)`, but `batch_size` does not have to be specified in advance, and can change during training.\n",
"- `W` is a shared matrix, of shape `(n_in, n_out)`, initialized with zeros. Column `k` of `W` represents the separation hyperplane for class `k`.\n",
"- `b` is a shared vector, of length `n_out`, initialized with zeros. Element `k` of `b` represents the free parameter of hyperplane `k`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy\n",
"import theano\n",
"from theano import tensor\n",
"\n",
"# Size of the data\n",
"n_in = 28 * 28\n",
"# Number of classes\n",
"n_out = 10\n",
"\n",
"x = tensor.matrix('x')\n",
"W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),\n",
" name='W',\n",
" borrow=True)\n",
"b = theano.shared(value=numpy.zeros((n_out,), dtype=theano.config.floatX),\n",
" name='b',\n",
" borrow=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can build a symbolic expression for the matrix of class-membership probability (`p_y_given_x`), and for the class whose probability is maximal (`y_pred`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"p_y_given_x = tensor.nnet.softmax(tensor.dot(x, W) + b)\n",
"y_pred = tensor.argmax(p_y_given_x, axis=1)"
]
},
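{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (this cell is an addition, not part of the original tutorial), we can compile the forward pass and verify that the symbolic batch size is indeed free: the same compiled function accepts minibatches of different sizes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"predict = theano.function(inputs=[x], outputs=[p_y_given_x, y_pred])\n",
"\n",
"# Two random minibatches of different sizes, handled by the same function.\n",
"# With W and b still at zero, all class probabilities are uniform (0.1).\n",
"probs_3, preds_3 = predict(numpy.random.rand(3, n_in).astype(theano.config.floatX))\n",
"probs_7, preds_7 = predict(numpy.random.rand(7, n_in).astype(theano.config.floatX))\n",
"print(probs_3.shape, preds_3.shape)\n",
"print(probs_7.shape, preds_7.shape)"
]
},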
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining a loss function\n",
"Learning optimal model parameters involves minimizing a loss function. In the\n",
"case of multi-class logistic regression, it is very common to use the negative\n",
"log-likelihood as the loss. This is equivalent to maximizing the likelihood of the\n",
"data set $\\cal{D}$ under the model parameterized by $\\theta$. Let\n",
"us first start by defining the likelihood $\\cal{L}$ and loss\n",
"$\\ell$:\n",
"\n",
"$$\\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D}) =\n",
" \\sum_{i=0}^{|\\mathcal{D}|} \\log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\\\\n",
" \\ell (\\theta=\\{W,b\\}, \\mathcal{D}) = - \\mathcal{L} (\\theta=\\{W,b\\}, \\mathcal{D})\n",
"$$\n",
"\n",
"Again, we will express those expressions using Theano. We have one additional input, the actual target class `y`:\n",
"- `y` is an input vector of integers, of length `batch_size` (which will have to match the length of `x` at runtime). The length of `y` can be symbolically expressed by `y.shape[0]`.\n",
"- `log_prob` is a `(batch_size, n_out)` matrix containing the log probabilities of class membership for each example.\n",
"- `arange(y.shape[0])` is a symbolic vector which will contain `[0,1,2,... batch_size-1]`\n",
"- `log_likelihood` is a vector containing the log probability of the target, for each example.\n",
"- `loss` is the mean of the negative `log_likelihood` over the examples in the minibatch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"y = tensor.lvector('y')\n",
"log_prob = tensor.log(p_y_given_x)\n",
"log_likelihood = log_prob[tensor.arange(y.shape[0]), y]\n",
"loss = - log_likelihood.mean()"
]
},
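{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a side note (an addition to the original tutorial), the same loss can also be written with Theano's built-in cross-entropy, which accepts a matrix of probabilities and a vector of integer targets; the sketch below just defines an equivalent symbolic expression."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Equivalent formulation of the loss using the built-in helper.\n",
"loss_alt = tensor.nnet.categorical_crossentropy(p_y_given_x, y).mean()"
]
},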
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training procedure\n",
"This notebook will use the method of stochastic gradient descent with mini-batches (MSGD) to find values of `W` and `b` that minimize the loss.\n",
"\n",
"We can let Theano compute symbolic expressions for the gradient of the loss wrt `W` and `b`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"g_W, g_b = theano.grad(cost=loss, wrt=[W, b])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`g_W` and `g_b` are symbolic variables, which can be used as part of a computation graph. In particular, let us define the expressions for one step of gradient descent for `W` and `b`, for a fixed learning rate."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"learning_rate = numpy.float32(0.13)\n",
"new_W = W - learning_rate * g_W\n",
"new_b = b - learning_rate * g_b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then define **update expressions**, or pairs of (shared variable, expression for its update), that we will use when compiling the Theano function. The updates will be performed each time the function gets called.\n",
"\n",
"The following function, `train_model`, returns the loss on the current minibatch, then changes the values of the shared variables according to the update rules. It needs to be passed `x` and `y` as inputs, but not the shared variables, which are implicit inputs.\n",
"\n",
"The entire learning algorithm thus consists in looping over all examples in the dataset, considering all the examples in one minibatch at a time, and repeatedly calling the `train_model` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"train_model = theano.function(inputs=[x, y],\n",
" outputs=loss,\n",
" updates=[(W, new_W),\n",
" (b, new_b)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the model\n",
"When testing the model, we are interested in the number of misclassified examples (and not only in the likelihood). Here, we build a symbolic expression for retrieving the number of misclassified examples in a minibatch.\n",
"\n",
"This will also be useful to apply on the validation and testing sets, in order to monitor the progress of the model during training, and to do early stopping."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"misclass_nb = tensor.neq(y_pred, y)\n",
"misclass_rate = misclass_nb.mean()\n",
"\n",
"test_model = theano.function(inputs=[x, y],\n",
" outputs=misclass_rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the model\n",
"Here is the main training loop of the algorithm:\n",
"- For each *epoch*, or pass through the training set\n",
" - split the training set in minibatches, and call `train_model` on each minibatch\n",
" - split the validation set in minibatches, and call `test_model` on each minibatch to measure the misclassification rate\n",
" - if the misclassification rate has not improved in a while, stop training\n",
"- Measure performance on the test set\n",
"\n",
"The **early stopping procedure** is what decide whether the performance has improved enough. There are many variants, and we will not go into the details of this one here.\n",
"\n",
"We first need to define a few parameters for the training loop and the early stopping procedure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## Define a couple of helper variables and functions for the optimization\n",
"batch_size = 500\n",
"# compute number of minibatches for training, validation and testing\n",
"n_train_batches = train_set_x.shape[0] // batch_size\n",
"n_valid_batches = valid_set_x.shape[0] // batch_size\n",
"n_test_batches = test_set_x.shape[0] // batch_size\n",
"\n",
"def get_minibatch(i, dataset_x, dataset_y):\n",
" start_idx = i * batch_size\n",
" end_idx = (i + 1) * batch_size\n",
" batch_x = dataset_x[start_idx:end_idx]\n",
" batch_y = dataset_y[start_idx:end_idx]\n",
" return (batch_x, batch_y)\n",
"\n",
"## early-stopping parameters\n",
"# maximum number of epochs\n",
"n_epochs = 1000\n",
"# look as this many examples regardless\n",
"patience = 5000\n",
"# wait this much longer when a new best is found\n",
"patience_increase = 2\n",
"# a relative improvement of this much is considered significant\n",
"improvement_threshold = 0.995\n",
"\n",
"# go through this many minibatches before checking the network on the validation set;\n",
"# in this case we check every epoch\n",
"validation_frequency = min(n_train_batches, patience / 2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"import timeit\n",
"from six.moves import xrange\n",
"\n",
"best_validation_loss = numpy.inf\n",
"test_score = 0.\n",
"start_time = timeit.default_timer()\n",
"\n",
"done_looping = False\n",
"epoch = 0\n",
"while (epoch < n_epochs) and (not done_looping):\n",
" epoch = epoch + 1\n",
" for minibatch_index in xrange(n_train_batches):\n",
" minibatch_x, minibatch_y = get_minibatch(minibatch_index, train_set_x, train_set_y)\n",
" minibatch_avg_cost = train_model(minibatch_x, minibatch_y)\n",
"\n",
" # iteration number\n",
" iter = (epoch - 1) * n_train_batches + minibatch_index\n",
" if (iter + 1) % validation_frequency == 0:\n",
" # compute zero-one loss on validation set\n",
" validation_losses = []\n",
" for i in xrange(n_valid_batches):\n",
" valid_xi, valid_yi = get_minibatch(i, valid_set_x, valid_set_y)\n",
" validation_losses.append(test_model(valid_xi, valid_yi))\n",
" this_validation_loss = numpy.mean(validation_losses)\n",
" print('epoch %i, minibatch %i/%i, validation error %f %%' %\n",
" (epoch,\n",
" minibatch_index + 1,\n",
" n_train_batches,\n",
" this_validation_loss * 100.))\n",
"\n",
" # if we got the best validation score until now\n",
" if this_validation_loss < best_validation_loss:\n",
" # improve patience if loss improvement is good enough\n",
" if this_validation_loss < best_validation_loss * improvement_threshold:\n",
" patience = max(patience, iter * patience_increase)\n",
"\n",
" best_validation_loss = this_validation_loss\n",
"\n",
" # test it on the test set\n",
" test_losses = []\n",
" for i in xrange(n_test_batches):\n",
" test_xi, test_yi = get_minibatch(i, test_set_x, test_set_y)\n",
" test_losses.append(test_model(test_xi, test_yi))\n",
"\n",
" test_score = numpy.mean(test_losses)\n",
" print(' epoch %i, minibatch %i/%i, test error of best model %f %%' %\n",
" (epoch,\n",
" minibatch_index + 1,\n",
" n_train_batches,\n",
" test_score * 100.))\n",
"\n",
" # save the best parameters\n",
" numpy.savez('best_model.npz', W=W.get_value(), b=b.get_value())\n",
"\n",
" if patience <= iter:\n",
" done_looping = True\n",
" break\n",
"\n",
"end_time = timeit.default_timer()\n",
"print('Optimization complete with best validation score of %f %%, '\n",
" 'with test performance %f %%' %\n",
" (best_validation_loss * 100., test_score * 100.))\n",
"\n",
"print('The code ran for %d epochs, with %f epochs/sec' %\n",
" (epoch, 1. * epoch / (end_time - start_time)))"
]
},
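{
"cell_type": "markdown",
"metadata": {},
"source": [
"The best parameters were saved to `best_model.npz` with `numpy.savez` during training. As a small addition (not part of the original tutorial, and assuming training ran long enough for the file to be written), they can be loaded back into the shared variables and re-evaluated on the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Restore the best parameters found during training.\n",
"saved = numpy.load('best_model.npz')\n",
"W.set_value(saved['W'])\n",
"b.set_value(saved['b'])\n",
"\n",
"# Re-evaluate the restored model on the test set.\n",
"restored_losses = [test_model(*get_minibatch(i, test_set_x, test_set_y))\n",
"                   for i in xrange(n_test_batches)]\n",
"print('test error of restored model: %f %%' % (numpy.mean(restored_losses) * 100.))"
]
},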
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization\n",
"You can visualize the columns of `W`, which correspond to the separation hyperplanes for each class."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"from utils import tile_raster_images\n",
"\n",
"plt.clf()\n",
"\n",
"# Increase the size of the figure\n",
"plt.gcf().set_size_inches(15, 10)\n",
"\n",
"plot_data = tile_raster_images(W.get_value(borrow=True).T,\n",
" img_shape=(28, 28), tile_shape=(2, 5), tile_spacing=(1, 1))\n",
"plt.imshow(plot_data, cmap='Greys', interpolation='none')\n",
"plt.axis('off')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@ -0,0 +1,140 @@
""" This file contains different utility functions that are not connected
in any way to the networks presented in the tutorials, but rather help in
processing the outputs into a more understandable way.
For example ``tile_raster_images`` helps in generating an easy-to-grasp
image from a set of samples or weights.
"""
import numpy
from six.moves import xrange
def scale_to_unit_interval(ndar, eps=1e-8):
""" Scales all values in the ndarray ndar to be between 0 and 1 """
ndar = ndar.copy()
ndar -= ndar.min()
ndar *= 1.0 / (ndar.max() + eps)
return ndar
def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
scale_rows_to_unit_interval=True,
output_pixel_vals=True):
"""
Transform an array with one flattened image per row, into an array in
which images are reshaped and layed out like tiles on a floor.
This function is useful for visualizing datasets whose rows are images,
and also columns of matrices for transforming those rows
(such as the first layer of a neural net).
:type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
be 2-D ndarrays or None;
:param X: a 2-D array in which every row is a flattened image.
:type img_shape: tuple; (height, width)
:param img_shape: the original shape of each image
:type tile_shape: tuple; (rows, cols)
:param tile_shape: the number of images to tile (rows, cols)
:param output_pixel_vals: if output should be pixel values (i.e. int8
values) or floats
:param scale_rows_to_unit_interval: if the values need to be scaled before
being plotted to [0,1] or not
:returns: array suitable for viewing as an image.
(See:`Image.fromarray`.)
:rtype: a 2-d array with same dtype as X.
"""
assert len(img_shape) == 2
assert len(tile_shape) == 2
assert len(tile_spacing) == 2
# The expression below can be re-written in a more C style as
# follows :
#
# out_shape = [0,0]
# out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] -
# tile_spacing[0]
# out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] -
# tile_spacing[1]
out_shape = [
(ishp + tsp) * tshp - tsp
for ishp, tshp, tsp in zip(img_shape, tile_shape, tile_spacing)
]
if isinstance(X, tuple):
assert len(X) == 4
# Create an output numpy ndarray to store the image
if output_pixel_vals:
out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
dtype='uint8')
else:
out_array = numpy.zeros((out_shape[0], out_shape[1], 4),
dtype=X.dtype)
#colors default to 0, alpha defaults to 1 (opaque)
if output_pixel_vals:
channel_defaults = [0, 0, 0, 255]
else:
channel_defaults = [0., 0., 0., 1.]
for i in xrange(4):
if X[i] is None:
# if channel is None, fill it with zeros of the correct
# dtype
dt = out_array.dtype
if output_pixel_vals:
dt = 'uint8'
out_array[:, :, i] = numpy.zeros(
out_shape,
dtype=dt
) + channel_defaults[i]
else:
# use a recurrent call to compute the channel and store it
# in the output
out_array[:, :, i] = tile_raster_images(
X[i], img_shape, tile_shape, tile_spacing,
scale_rows_to_unit_interval, output_pixel_vals)
return out_array
else:
# if we are dealing with only one channel
H, W = img_shape
Hs, Ws = tile_spacing
# generate a matrix to store the output
dt = X.dtype
if output_pixel_vals:
dt = 'uint8'
out_array = numpy.zeros(out_shape, dtype=dt)
for tile_row in xrange(tile_shape[0]):
for tile_col in xrange(tile_shape[1]):
if tile_row * tile_shape[1] + tile_col < X.shape[0]:
this_x = X[tile_row * tile_shape[1] + tile_col]
if scale_rows_to_unit_interval:
# if we should scale values to be between 0 and 1
# do this by calling the `scale_to_unit_interval`
# function
this_img = scale_to_unit_interval(
this_x.reshape(img_shape))
else:
this_img = this_x.reshape(img_shape)
# add the slice to the corresponding position in the
# output array
c = 1
if output_pixel_vals:
c = 255
out_array[
tile_row * (H + Hs): tile_row * (H + Hs) + H,
tile_col * (W + Ws): tile_col * (W + Ws) + W
] = this_img * c
return out_array
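# ---------------------------------------------------------------------------
# Small usage sketch appended for illustration; not part of the original file.
# It tiles ten random 28x28 "images" into a 2x5 grid and prints the resulting
# array shape, which should be (57, 144) with 1-pixel spacing between tiles.
if __name__ == "__main__":
    random_rows = numpy.random.rand(10, 28 * 28)
    tiled = tile_raster_images(random_rows,
                               img_shape=(28, 28), tile_shape=(2, 5),
                               tile_spacing=(1, 1))
    print(tiled.shape)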

View File

@ -0,0 +1,13 @@
all: instruction.pdf rnn_lstm.pdf
instruction.pdf: slides_source/instruction.tex
cd slides_source; pdflatex --shell-escape instruction.tex
cd slides_source; pdflatex --shell-escape instruction.tex
cd slides_source; pdflatex --shell-escape instruction.tex
mv slides_source/instruction.pdf .
rnn_lstm.pdf: slides_source/rnn_lstm.tex
cd slides_source; pdflatex --shell-escape rnn_lstm.tex
cd slides_source; pdflatex --shell-escape rnn_lstm.tex
cd slides_source; pdflatex --shell-escape rnn_lstm.tex
mv slides_source/rnn_lstm.pdf .

View File

@ -0,0 +1,508 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"In this demo, you'll see a more practical application of RNNs/LSTMs as character-level language models. The emphasis will be more on parallelization and using RNNs with data from Fuel.\n",
"\n",
"To get started, we first need to download the training text, validation text and a file that contains a dictionary for mapping characters to integers. We also need to import quite a list of modules."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import gzip\n",
"\n",
"from six.moves import cPickle as pkl\n",
"import time\n",
"\n",
"import numpy\n",
"import theano\n",
"import theano.tensor as T\n",
"\n",
"from theano.tensor.nnet import categorical_crossentropy\n",
"from theano import config\n",
"from fuel.datasets import TextFile\n",
"from fuel.streams import DataStream\n",
"from fuel.schemes import ConstantScheme\n",
"from fuel.transformers import Batch, Padding\n",
"\n",
"if not os.path.exists('traindata.txt'):\n",
" r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/traindata.txt.gz')\n",
" with open('traindata.txt.gz', 'wb') as data_file:\n",
" data_file.write(r.content)\n",
" with gzip.open('traindata.txt.gz', 'rb') as data_file:\n",
" with open('traindata.txt', 'w') as out_file:\n",
" out_file.write(data_file.read())\n",
" \n",
"if not os.path.exists('valdata.txt'):\n",
" r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/valdata.txt.gz')\n",
" with open('valdata.txt.gz', 'wb') as data_file:\n",
" data_file.write(r.content)\n",
" with gzip.open('valdata.txt.gz', 'rb') as data_file:\n",
" with open('valdata.txt', 'w') as out_file:\n",
" out_file.write(data_file.read())\n",
"\n",
"if not os.path.exists('dictionary.pkl'):\n",
" r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/dictionary.pkl')\n",
" with open('dictionary.pkl', 'wb') as data_file:\n",
" data_file.write(r.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##The Model\n",
"The code below shows an implementation of an LSTM network. Note that there are various different variations of the LSTM in use and this one doesn't include the so-called 'peephole connections'. We used a separate method for the dynamic update to make it easier to generate from the network later. The `index_dot` function doesn't safe much verbosity, but it clarifies that certain dot products have been replaced with indexing operations because this network will be applied to discrete data. Last but not least, note the addition of the `mask` argument which is used to ignore certain parts of the input sequence."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def gauss_weight(rng, ndim_in, ndim_out=None, sd=.005):\n",
" if ndim_out is None:\n",
" ndim_out = ndim_in\n",
" W = rng.randn(ndim_in, ndim_out) * sd\n",
" return numpy.asarray(W, dtype=config.floatX)\n",
"\n",
"\n",
"def index_dot(indices, w):\n",
" return w[indices.flatten()]\n",
"\n",
"\n",
"class LstmLayer:\n",
"\n",
" def __init__(self, rng, input, mask, n_in, n_h):\n",
"\n",
" # Init params\n",
" self.W_i = theano.shared(gauss_weight(rng, n_in, n_h), 'W_i', borrow=True)\n",
" self.W_f = theano.shared(gauss_weight(rng, n_in, n_h), 'W_f', borrow=True)\n",
" self.W_c = theano.shared(gauss_weight(rng, n_in, n_h), 'W_c', borrow=True)\n",
" self.W_o = theano.shared(gauss_weight(rng, n_in, n_h), 'W_o', borrow=True)\n",
"\n",
" self.U_i = theano.shared(gauss_weight(rng, n_h), 'U_i', borrow=True)\n",
" self.U_f = theano.shared(gauss_weight(rng, n_h), 'U_f', borrow=True)\n",
" self.U_c = theano.shared(gauss_weight(rng, n_h), 'U_c', borrow=True)\n",
" self.U_o = theano.shared(gauss_weight(rng, n_h), 'U_o', borrow=True)\n",
"\n",
" self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
" 'b_i', borrow=True)\n",
" self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
" 'b_f', borrow=True)\n",
" self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
" 'b_c', borrow=True)\n",
" self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
" 'b_o', borrow=True)\n",
"\n",
" self.params = [self.W_i, self.W_f, self.W_c, self.W_o,\n",
" self.U_i, self.U_f, self.U_c, self.U_o,\n",
" self.b_i, self.b_f, self.b_c, self.b_o]\n",
"\n",
" outputs_info = [T.zeros((input.shape[1], n_h)),\n",
" T.zeros((input.shape[1], n_h))]\n",
"\n",
" rval, updates = theano.scan(self._step,\n",
" sequences=[mask, input],\n",
" outputs_info=outputs_info)\n",
"\n",
" # self.output is in the format (length, batchsize, n_h)\n",
" self.output = rval[0]\n",
"\n",
" def _step(self, m_, x_, h_, c_):\n",
"\n",
" i_preact = (index_dot(x_, self.W_i) +\n",
" T.dot(h_, self.U_i) + self.b_i)\n",
" i = T.nnet.sigmoid(i_preact)\n",
"\n",
" f_preact = (index_dot(x_, self.W_f) +\n",
" T.dot(h_, self.U_f) + self.b_f)\n",
" f = T.nnet.sigmoid(f_preact)\n",
"\n",
" o_preact = (index_dot(x_, self.W_o) +\n",
" T.dot(h_, self.U_o) + self.b_o)\n",
" o = T.nnet.sigmoid(o_preact)\n",
"\n",
" c_preact = (index_dot(x_, self.W_c) +\n",
" T.dot(h_, self.U_c) + self.b_c)\n",
" c = T.tanh(c_preact)\n",
"\n",
" c = f * c_ + i * c\n",
" c = m_[:, None] * c + (1. - m_)[:, None] * c_\n",
"\n",
" h = o * T.tanh(c)\n",
" h = m_[:, None] * h + (1. - m_)[:, None] * h_\n",
"\n",
" return h, c"
]
},
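{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference (this cell is an addition, not part of the original notebook), the `_step` method above implements a standard LSTM update without peephole connections. Writing $m_t$ for the mask value at time $t$, and noting that each input product $x_t W$ is realized as a row lookup (`index_dot`) because $x_t$ is a character index:\n",
"\n",
"$$i_t = \\sigma(x_t W_i + h_{t-1} U_i + b_i), \\quad f_t = \\sigma(x_t W_f + h_{t-1} U_f + b_f), \\quad o_t = \\sigma(x_t W_o + h_{t-1} U_o + b_o)$$\n",
"\n",
"$$\\tilde{c}_t = \\tanh(x_t W_c + h_{t-1} U_c + b_c), \\qquad c_t = m_t \\odot \\left(f_t \\odot c_{t-1} + i_t \\odot \\tilde{c}_t\\right) + (1 - m_t) \\odot c_{t-1}$$\n",
"\n",
"$$h_t = m_t \\odot \\left(o_t \\odot \\tanh(c_t)\\right) + (1 - m_t) \\odot h_{t-1}$$"
]
},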
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next block contains some code that computes cross-entropy for masked sequences and a stripped down version of the logistic regression class from the deep learning tutorials which we will need later."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def sequence_categorical_crossentropy(prediction, targets, mask):\n",
" prediction_flat = prediction.reshape(((prediction.shape[0] *\n",
" prediction.shape[1]),\n",
" prediction.shape[2]), ndim=2)\n",
" targets_flat = targets.flatten()\n",
" mask_flat = mask.flatten()\n",
" ce = categorical_crossentropy(prediction_flat, targets_flat)\n",
" return T.sum(ce * mask_flat)\n",
"\n",
"\n",
"class LogisticRegression(object):\n",
" \n",
" def __init__(self, rng, input, n_in, n_out):\n",
" \n",
" W = gauss_weight(rng, n_in, n_out)\n",
" self.W = theano.shared(value=numpy.asarray(W, dtype=theano.config.floatX),\n",
" name='W', borrow=True)\n",
" # initialize the biases b as a vector of n_out 0s\n",
" self.b = theano.shared(value=numpy.zeros((n_out,),\n",
" dtype=theano.config.floatX),\n",
" name='b', borrow=True)\n",
"\n",
" # compute vector of class-membership probabilities in symbolic form\n",
" energy = T.dot(input, self.W) + self.b\n",
" energy_exp = T.exp(energy - T.max(energy, axis=2, keepdims=True))\n",
" pmf = energy_exp / energy_exp.sum(axis=2, keepdims=True)\n",
" self.p_y_given_x = pmf\n",
" self.params = [self.W, self.b]"
]
},
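{
"cell_type": "markdown",
"metadata": {},
"source": [
"In equations (added here for reference, not part of the original notebook), `sequence_categorical_crossentropy` computes\n",
"\n",
"$$\\mathrm{CE} = \\sum_{t} \\sum_{b} m_{t,b} \\, \\bigl(-\\log p_{t,b}[y_{t,b}]\\bigr),$$\n",
"\n",
"where $m_{t,b}$ is the mask, so padded positions contribute nothing to the cost."
]
},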
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Processing the Data\n",
"The data in `traindata.txt` and `valdata.txt` is simply English text but formatted in such a way that every sentence is conveniently separated by the newline symbol. We'll use some of the functionality of fuel to perform the following preprocessing steps:\n",
"* Convert everything to lowercase\n",
"* Map characters to indices\n",
"* Group the sentences into batches\n",
"* Convert each batch in a matrix/tensor as long as the longest sequence with zeros padded to all the shorter sequences\n",
"* Add a mask matrix that encodes the length of each sequence (a timestep at which the mask is 0 indicates that there is no data available)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"batch_size = 100\n",
"n_epochs = 40\n",
"n_h = 50\n",
"DICT_FILE = 'dictionary.pkl'\n",
"TRAIN_FILE = 'traindata.txt'\n",
"VAL_FILE = 'valdata.txt'\n",
"\n",
"# Load the datasets with Fuel\n",
"dictionary = pkl.load(open(DICT_FILE, 'r'))\n",
"# add a symbol for unknown characters\n",
"dictionary['~'] = len(dictionary)\n",
"reverse_mapping = dict((j, i) for i, j in dictionary.items())\n",
"\n",
"train = TextFile(files=[TRAIN_FILE],\n",
" dictionary=dictionary,\n",
" unk_token='~',\n",
" level='character',\n",
" preprocess=str.lower,\n",
" bos_token=None,\n",
" eos_token=None)\n",
"\n",
"train_stream = DataStream.default_stream(train)\n",
"\n",
"# organize data in batches and pad shorter sequences with zeros\n",
"train_stream = Batch(train_stream,\n",
" iteration_scheme=ConstantScheme(batch_size))\n",
"train_stream = Padding(train_stream)\n",
"\n",
"# idem dito for the validation text\n",
"val = TextFile(files=[VAL_FILE],\n",
" dictionary=dictionary,\n",
" unk_token='~',\n",
" level='character',\n",
" preprocess=str.lower,\n",
" bos_token=None,\n",
" eos_token=None)\n",
"\n",
"val_stream = DataStream.default_stream(val)\n",
"\n",
"# organize data in batches and pad shorter sequences with zeros\n",
"val_stream = Batch(val_stream,\n",
" iteration_scheme=ConstantScheme(batch_size))\n",
"val_stream = Padding(val_stream)"
]
},
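{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (not part of the original notebook), we can pull a single batch out of the stream; `Padding` yields the padded character matrix together with its mask, both of shape `(batch_size, max_sequence_length)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Peek at one padded batch; get_epoch_iterator() creates a fresh iterator,\n",
"# so this does not interfere with the training loop below.\n",
"x_sample, mask_sample = next(train_stream.get_epoch_iterator())\n",
"print x_sample.shape, mask_sample.shape\n",
"print 'fraction of padded positions in this batch:', 1.0 - mask_sample.mean()"
]
},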
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##The Theano Graph\n",
"We'll now define the complete Theano graph for computing costs and gradients among other things. The cost will be the cross-entropy of the next character in the sequence and the network will try to predict it based on the previous characters."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Set the random number generator' seeds for consistency\n",
"rng = numpy.random.RandomState(12345)\n",
"\n",
"x = T.lmatrix('x')\n",
"mask = T.matrix('mask')\n",
"\n",
"# Construct an LSTM layer and a logistic regression layer\n",
"recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)\n",
"logreg_layer = LogisticRegression(rng=rng, input=recurrent_layer.output[:-1],\n",
" n_in=n_h, n_out=111)\n",
"\n",
"# define a cost variable to optimize\n",
"cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,\n",
" x[1:],\n",
" mask[1:]) / batch_size\n",
"\n",
"# create a list of all model parameters to be fit by gradient descent\n",
"params = logreg_layer.params + recurrent_layer.params\n",
"\n",
"# create a list of gradients for all model parameters\n",
"grads = T.grad(cost, params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now compile the function that updates the gradients. We also added a function that computes the cost without updating for monitoring purposes."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/pbrakel/Repositories/Theano/theano/scan_module/scan_perform_ext.py:117: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility\n",
" from scan_perform.scan_perform import *\n"
]
}
],
"source": [
"learning_rate = 0.1\n",
"updates = [\n",
" (param_i, param_i - learning_rate * grad_i)\n",
" for param_i, grad_i in zip(params, grads)\n",
"]\n",
"\n",
"update_model = theano.function([x, mask], cost, updates=updates)\n",
"\n",
"evaluate_model = theano.function([x, mask], cost)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##Generating Sequences\n",
"To see if the networks learn something useful (and to make results monitoring more entertaining), we'll also write some code to generate sequences. For this, we'll first compile a function that computes a single state update for the network to have more control over the values of each variable at each time step."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"x_t = T.iscalar()\n",
"h_p = T.vector()\n",
"c_p = T.vector()\n",
"h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)\n",
"energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b\n",
"\n",
"energy_exp = T.exp(energy - T.max(energy, axis=1, keepdims=True))\n",
"\n",
"output = energy_exp / energy_exp.sum(axis=1, keepdims=True)\n",
"single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])\n",
"\n",
"def speak(single_step, prefix='the meaning of life is ', n_steps=450):\n",
" try:\n",
" h_p = numpy.zeros((n_h,), dtype=config.floatX)\n",
" c_p = numpy.zeros((n_h,), dtype=config.floatX)\n",
" sentence = prefix\n",
" for char in prefix:\n",
" x_t = dictionary[char]\n",
" prediction, h_p, c_p = single_step(x_t, h_p.flatten(),\n",
" c_p.flatten())\n",
" # Renormalize probability in float64\n",
" flat_prediction = prediction.flatten()\n",
" flat_pred_sum = flat_prediction.sum(dtype='float64')\n",
" if flat_pred_sum > 1:\n",
" flat_prediction = flat_prediction.astype('float64') / flat_pred_sum\n",
" sample = numpy.random.multinomial(1, flat_prediction)\n",
"\n",
" for i in range(n_steps):\n",
" x_t = numpy.argmax(sample)\n",
" prediction, h_p, c_p = single_step(x_t, h_p.flatten(),\n",
" c_p.flatten())\n",
" # Renormalize probability in float64\n",
" flat_prediction = prediction.flatten()\n",
" flat_pred_sum = flat_prediction.sum(dtype='float64')\n",
" if flat_pred_sum > 1:\n",
" flat_prediction = flat_prediction.astype('float64') / flat_pred_sum\n",
" sample = numpy.random.multinomial(1, flat_prediction)\n",
"\n",
" sentence += reverse_mapping[x_t]\n",
"\n",
" return sentence\n",
" except ValueError as e:\n",
" print 'Something went wrong during sentence generation: {}'.format(e)\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch: 0\n",
"\n",
"LSTM: \"the meaning of life is i<>ateisn ^ltbagss7tuodkca r9 msd,forreypoctlluoiasrn?at<61>netteofkotenni<6E>cf/vattosnlrxisiovu<76>al.hahau<61>ootwo tuost! ]cw<63> eweunhufaaecihtdtk tticiss cvt2f etoct bllstsluohh-,retti?eusrv eikly an<61>ade'i stiel<65>doelnamtuartoci<63>ht.<2E>woi 2kfs$an tpeo<65>miiadain9.e eegtamiaesboeinne<6E>unlocityqe dansapeaeiyo<79>ihaewmtrt<72>'aa svteatae ,otrr.gsac.-perioswetgoc<6F>io froaoeismhsgtulherbttrh fl<66>i el nnltnta<74>sat yhomsnttwlnwnenaee.mhits r<>us-thist sn man4lamhpac.osdopl g<>\"\n",
"\n",
"epoch: 0 minibatch: 40\n",
"Average validation CE per sentence: 251.167072292\n"
]
},
{
"ename": "KeyboardInterrupt",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-13-7c09df6ae427>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m 9\u001b[0m \u001b[0miteration\u001b[0m \u001b[1;33m+=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 10\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 11\u001b[1;33m \u001b[0mcross_entropy\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mupdate_model\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx_\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmask_\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 12\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 13\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/home/pbrakel/Repositories/Theano/theano/compile/function_module.pyc\u001b[0m in \u001b[0;36m__call__\u001b[1;34m(self, *args, **kwargs)\u001b[0m\n\u001b[0;32m 577\u001b[0m \u001b[0mt0_fn\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 578\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 579\u001b[1;33m \u001b[0moutputs\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 580\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mException\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 581\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'position_of_error'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;32m/home/pbrakel/Repositories/Theano/theano/scan_module/scan_op.pyc\u001b[0m in \u001b[0;36mrval\u001b[1;34m(p, i, o, n)\u001b[0m\n\u001b[0;32m 649\u001b[0m \u001b[1;31m# default arguments are stored in the closure of `rval`\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 650\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 651\u001b[1;33m \u001b[1;32mdef\u001b[0m \u001b[0mrval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode_input_storage\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mo\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode_output_storage\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mn\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 652\u001b[0m \u001b[0mr\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mp\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mo\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 653\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mo\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mnode\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mKeyboardInterrupt\u001b[0m: "
]
}
],
"source": [
"start_time = time.clock()\n",
"\n",
"iteration = 0\n",
"\n",
"for epoch in range(n_epochs):\n",
" print 'epoch:', epoch\n",
"\n",
" for x_, mask_ in train_stream.get_epoch_iterator():\n",
" iteration += 1\n",
"\n",
" cross_entropy = update_model(x_.T, mask_.T)\n",
"\n",
"\n",
" # Generate some text after each 20 minibatches\n",
" if iteration % 40 == 0:\n",
" sentence = speak(single_step, prefix='the meaning of life is ', n_steps=450)\n",
" print\n",
" print 'LSTM: \"' + sentence + '\"'\n",
" print\n",
" print 'epoch:', epoch, ' minibatch:', iteration\n",
" val_scores = []\n",
" for x_val, mask_val in val_stream.get_epoch_iterator():\n",
" val_scores.append(evaluate_model(x_val.T, mask_val.T))\n",
" print 'Average validation CE per sentence:', numpy.mean(val_scores)\n",
"\n",
"end_time = time.clock()\n",
"print('Optimization complete.')\n",
"print('The code ran for %.2fm' % ((end_time - start_time) / 60.))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"It can take a while before the text starts to look more reasonable but here are some things to experiment with:\n",
"* Smarter optimization algorithms (or at least momentum)\n",
"* Initializing the recurrent weights orthogonally\n",
"* The sizes of the initial weights and biases (think about what the gates do)\n",
"* Different sentence prefixes\n",
"* Changing the temperature of the character distribution during generation. What happens when you generate deterministically?"
]
},
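{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a small sketch added for illustration (it is not part of the original notebook): a momentum version of the SGD updates defined earlier, and a helper for sampling with a temperature. Both assume the cells above have been run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Momentum SGD (sketch): keep one velocity per parameter and reuse the\n",
"# gradients `grads` computed earlier.\n",
"momentum = 0.9\n",
"velocities = [theano.shared(numpy.zeros_like(p.get_value())) for p in params]\n",
"\n",
"momentum_updates = []\n",
"for param_i, grad_i, vel_i in zip(params, grads, velocities):\n",
"    new_vel = momentum * vel_i - learning_rate * grad_i\n",
"    momentum_updates.append((vel_i, new_vel))\n",
"    momentum_updates.append((param_i, param_i + new_vel))\n",
"\n",
"update_model_momentum = theano.function([x, mask], cost, updates=momentum_updates)\n",
"\n",
"\n",
"# Temperature (sketch): sharpen (temperature < 1) or flatten (temperature > 1)\n",
"# the distribution returned by `single_step` before sampling from it.\n",
"def apply_temperature(probs, temperature=0.7):\n",
"    log_probs = numpy.log(probs + 1e-8) / temperature\n",
"    scaled = numpy.exp(log_probs - log_probs.max())\n",
"    return scaled / scaled.sum()"
]
},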
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@ -0,0 +1,299 @@
import cPickle as pkl
import time
import numpy
import theano
from theano import config
import theano.tensor as T
from theano.tensor.nnet import categorical_crossentropy
from fuel.datasets import TextFile
from fuel.streams import DataStream
from fuel.schemes import ConstantScheme
from fuel.transformers import Batch, Padding
# These files can be downloaded from
# http://www-etud.iro.umontreal.ca/~brakelp/train.txt.gz
# http://www-etud.iro.umontreal.ca/~brakelp/dictionary.pkl
# don't forget to change the paths and gunzip train.txt.gz
TRAIN_FILE = '/u/brakelp/temp/traindata.txt'
VAL_FILE = '/u/brakelp/temp/valdata.txt'
DICT_FILE = '/u/brakelp/temp/dictionary.pkl'
def sequence_categorical_crossentropy(prediction, targets, mask):
prediction_flat = prediction.reshape(((prediction.shape[0] *
prediction.shape[1]),
prediction.shape[2]), ndim=2)
targets_flat = targets.flatten()
mask_flat = mask.flatten()
ce = categorical_crossentropy(prediction_flat, targets_flat)
return T.sum(ce * mask_flat)
def gauss_weight(ndim_in, ndim_out=None, sd=.005):
if ndim_out is None:
ndim_out = ndim_in
W = numpy.random.randn(ndim_in, ndim_out) * sd
return numpy.asarray(W, dtype=config.floatX)
class LogisticRegression(object):
"""Multi-class Logistic Regression Class
The logistic regression is fully described by a weight matrix :math:`W`
and bias vector :math:`b`. Classification is done by projecting data
points onto a set of hyperplanes, the distance to which is used to
determine a class membership probability.
"""
def __init__(self, input, n_in, n_out):
""" Initialize the parameters of the logistic regression
:type input: theano.tensor.TensorType
:param input: symbolic variable that describes the input of the
architecture (one minibatch)
:type n_in: int
:param n_in: number of input units, the dimension of the space in
which the datapoints lie
:type n_out: int
:param n_out: number of output units, the dimension of the space in
which the labels lie
"""
# initialize with 0 the weights W as a matrix of shape (n_in, n_out)
self.W = theano.shared(value=numpy.zeros((n_in, n_out),
dtype=theano.config.floatX),
name='W', borrow=True)
# initialize the baises b as a vector of n_out 0s
self.b = theano.shared(value=numpy.zeros((n_out,),
dtype=theano.config.floatX),
name='b', borrow=True)
# compute vector of class-membership probabilities in symbolic form
energy = T.dot(input, self.W) + self.b
energy_exp = T.exp(energy - T.max(energy, 2)[:, :, None])
pmf = energy_exp / energy_exp.sum(2)[:, :, None]
self.p_y_given_x = pmf
# compute prediction as class whose probability is maximal in
# symbolic form
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
# parameters of the model
self.params = [self.W, self.b]
def index_dot(indices, w):
return w[indices.flatten()]
class LstmLayer:
def __init__(self, rng, input, mask, n_in, n_h):
# Init params
self.W_i = theano.shared(gauss_weight(n_in, n_h), 'W_i', borrow=True)
self.W_f = theano.shared(gauss_weight(n_in, n_h), 'W_f', borrow=True)
self.W_c = theano.shared(gauss_weight(n_in, n_h), 'W_c', borrow=True)
self.W_o = theano.shared(gauss_weight(n_in, n_h), 'W_o', borrow=True)
self.U_i = theano.shared(gauss_weight(n_h), 'U_i', borrow=True)
self.U_f = theano.shared(gauss_weight(n_h), 'U_f', borrow=True)
self.U_c = theano.shared(gauss_weight(n_h), 'U_c', borrow=True)
self.U_o = theano.shared(gauss_weight(n_h), 'U_o', borrow=True)
self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_i', borrow=True)
self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_f', borrow=True)
self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_c', borrow=True)
self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_o', borrow=True)
self.params = [self.W_i, self.W_f, self.W_c, self.W_o,
self.U_i, self.U_f, self.U_c, self.U_o,
self.b_i, self.b_f, self.b_c, self.b_o]
outputs_info = [T.zeros((input.shape[1], n_h)),
T.zeros((input.shape[1], n_h))]
rval, updates = theano.scan(self._step,
sequences=[mask, input],
outputs_info=outputs_info)
# self.output is in the format (length, batchsize, n_h)
self.output = rval[0]
def _step(self, m_, x_, h_, c_):
i_preact = (index_dot(x_, self.W_i) +
T.dot(h_, self.U_i) + self.b_i)
i = T.nnet.sigmoid(i_preact)
f_preact = (index_dot(x_, self.W_f) +
T.dot(h_, self.U_f) + self.b_f)
f = T.nnet.sigmoid(f_preact)
o_preact = (index_dot(x_, self.W_o) +
T.dot(h_, self.U_o) + self.b_o)
o = T.nnet.sigmoid(o_preact)
c_preact = (index_dot(x_, self.W_c) +
T.dot(h_, self.U_c) + self.b_c)
c = T.tanh(c_preact)
c = f * c_ + i * c
c = m_[:, None] * c + (1. - m_)[:, None] * c_
h = o * T.tanh(c)
h = m_[:, None] * h + (1. - m_)[:, None] * h_
return h, c
def train_model(batch_size=100, n_h=50, n_epochs=40):
# Load the datasets with Fuel
dictionary = pkl.load(open(DICT_FILE, 'r'))
dictionary['~'] = len(dictionary)
reverse_mapping = dict((j, i) for i, j in dictionary.items())
print("Loading the data")
train = TextFile(files=[TRAIN_FILE],
dictionary=dictionary,
unk_token='~',
level='character',
preprocess=str.lower,
bos_token=None,
eos_token=None)
train_stream = DataStream.default_stream(train)
# organize data in batches and pad shorter sequences with zeros
train_stream = Batch(train_stream,
iteration_scheme=ConstantScheme(batch_size))
train_stream = Padding(train_stream)
# idem dito for the validation text
val = TextFile(files=[VAL_FILE],
dictionary=dictionary,
unk_token='~',
level='character',
preprocess=str.lower,
bos_token=None,
eos_token=None)
val_stream = DataStream.default_stream(val)
# organize data in batches and pad shorter sequences with zeros
val_stream = Batch(val_stream,
iteration_scheme=ConstantScheme(batch_size))
val_stream = Padding(val_stream)
print('Building model')
# Set the random number generator' seeds for consistency
rng = numpy.random.RandomState(12345)
x = T.lmatrix('x')
mask = T.matrix('mask')
# Construct the LSTM layer
recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)
logreg_layer = LogisticRegression(input=recurrent_layer.output[:-1],
n_in=n_h, n_out=111)
cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,
x[1:],
mask[1:]) / batch_size
# create a list of all model parameters to be fit by gradient descent
params = logreg_layer.params + recurrent_layer.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)
# update_model is a function that updates the model parameters by
# SGD. Since this model has many parameters, it would be tedious to
# manually create an update rule for each model parameter. We thus
# create the updates list by automatically looping over all
# (params[i], grads[i]) pairs.
learning_rate = 0.1
updates = [
(param_i, param_i - learning_rate * grad_i)
for param_i, grad_i in zip(params, grads)
]
update_model = theano.function([x, mask], cost, updates=updates)
evaluate_model = theano.function([x, mask], cost)
# Define and compile a function for generating a sequence step by step.
x_t = T.iscalar()
h_p = T.vector()
c_p = T.vector()
h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)
energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b
energy_exp = T.exp(energy - T.max(energy, 1)[:, None])
output = energy_exp / energy_exp.sum(1)[:, None]
single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])
start_time = time.clock()
iteration = 0
for epoch in range(n_epochs):
print 'epoch:', epoch
for x_, mask_ in train_stream.get_epoch_iterator():
iteration += 1
cross_entropy = update_model(x_.T, mask_.T)
# Generate some text after each 20 minibatches
if iteration % 40 == 0:
try:
prediction = numpy.ones(111, dtype=config.floatX) / 111.0
h_p = numpy.zeros((n_h,), dtype=config.floatX)
c_p = numpy.zeros((n_h,), dtype=config.floatX)
initial = 'the meaning of life is '
sentence = initial
for char in initial:
x_t = dictionary[char]
prediction, h_p, c_p = single_step(x_t, h_p.flatten(),
c_p.flatten())
sample = numpy.random.multinomial(1, prediction.flatten())
for i in range(450):
x_t = numpy.argmax(sample)
prediction, h_p, c_p = single_step(x_t, h_p.flatten(),
c_p.flatten())
sentence += reverse_mapping[x_t]
sample = numpy.random.multinomial(1, prediction.flatten())
print 'LSTM: "' + sentence + '"'
except ValueError:
print 'Something went wrong during sentence generation.'
if iteration % 40 == 0:
print 'epoch:', epoch, ' minibatch:', iteration
val_scores = []
for x_val, mask_val in val_stream.get_epoch_iterator():
val_scores.append(evaluate_model(x_val.T, mask_val.T))
print 'Average validation CE per sentence:', numpy.mean(val_scores)
end_time = time.clock()
print('Optimization complete.')
print('The code ran for %.2fm' % ((end_time - start_time) / 60.))
if __name__ == '__main__':
train_model()

View File

@ -0,0 +1,234 @@
"""This file is only here to speed up the execution of notebooks.
It contains a subset of the code defined in simple_rnn.ipynb and
lstm_text.ipynb, in particular the code compiling Theano functions.
Executing this script first will populate the cache of compiled C code,
which will make subsequent compilations faster.
The use case is to run this script in the background when starting a demo VM
such as the one for NVIDIA's qwikLABS, so that the compilation phase
started from the notebooks is faster.
"""
import numpy
import theano
import theano.tensor as T
from theano import config
from theano.tensor.nnet import categorical_crossentropy
floatX = theano.config.floatX
# simple_rnn.ipynb
class SimpleRNN(object):
def __init__(self, input_dim, recurrent_dim):
w_xh = numpy.random.normal(0, .01, (input_dim, recurrent_dim))
w_hh = numpy.random.normal(0, .02, (recurrent_dim, recurrent_dim))
self.w_xh = theano.shared(numpy.asarray(w_xh, dtype=floatX), name='w_xh')
self.w_hh = theano.shared(numpy.asarray(w_hh, dtype=floatX), name='w_hh')
self.b_h = theano.shared(numpy.zeros((recurrent_dim,), dtype=floatX), name='b_h')
self.parameters = [self.w_xh, self.w_hh, self.b_h]
def _step(self, input_t, previous):
return T.tanh(T.dot(previous, self.w_hh) + input_t)
def __call__(self, x):
x_w_xh = T.dot(x, self.w_xh) + self.b_h
result, updates = theano.scan(self._step,
sequences=[x_w_xh],
outputs_info=[T.zeros_like(self.b_h)])
return result
w_ho_np = numpy.random.normal(0, .01, (15, 1))
w_ho = theano.shared(numpy.asarray(w_ho_np, dtype=floatX), name='w_ho')
b_o = theano.shared(numpy.zeros((1,), dtype=floatX), name='b_o')
x = T.matrix('x')
my_rnn = SimpleRNN(1, 15)
hidden = my_rnn(x)
prediction = T.dot(hidden, w_ho) + b_o
parameters = my_rnn.parameters + [w_ho, b_o]
l2 = sum((p**2).sum() for p in parameters)
mse = T.mean((prediction[:-1] - x[1:])**2)
cost = mse + .0001 * l2
gradient = T.grad(cost, wrt=parameters)
lr = .3
updates = [(par, par - lr * gra) for par, gra in zip(parameters, gradient)]
update_model = theano.function([x], cost, updates=updates)
get_cost = theano.function([x], mse)
predict = theano.function([x], prediction)
get_hidden = theano.function([x], hidden)
get_gradient = theano.function([x], gradient)
predict = theano.function([x], prediction)
# Generating sequences
x_t = T.vector()
h_p = T.vector()
preactivation = T.dot(x_t, my_rnn.w_xh) + my_rnn.b_h
h_t = my_rnn._step(preactivation, h_p)
o_t = T.dot(h_t, w_ho) + b_o
single_step = theano.function([x_t, h_p], [o_t, h_t])
# lstm_text.ipynb
def gauss_weight(rng, ndim_in, ndim_out=None, sd=.005):
if ndim_out is None:
ndim_out = ndim_in
W = rng.randn(ndim_in, ndim_out) * sd
return numpy.asarray(W, dtype=config.floatX)
def index_dot(indices, w):
return w[indices.flatten()]
class LstmLayer:
def __init__(self, rng, input, mask, n_in, n_h):
# Init params
self.W_i = theano.shared(gauss_weight(rng, n_in, n_h), 'W_i', borrow=True)
self.W_f = theano.shared(gauss_weight(rng, n_in, n_h), 'W_f', borrow=True)
self.W_c = theano.shared(gauss_weight(rng, n_in, n_h), 'W_c', borrow=True)
self.W_o = theano.shared(gauss_weight(rng, n_in, n_h), 'W_o', borrow=True)
self.U_i = theano.shared(gauss_weight(rng, n_h), 'U_i', borrow=True)
self.U_f = theano.shared(gauss_weight(rng, n_h), 'U_f', borrow=True)
self.U_c = theano.shared(gauss_weight(rng, n_h), 'U_c', borrow=True)
self.U_o = theano.shared(gauss_weight(rng, n_h), 'U_o', borrow=True)
self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_i', borrow=True)
self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_f', borrow=True)
self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_c', borrow=True)
self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
'b_o', borrow=True)
self.params = [self.W_i, self.W_f, self.W_c, self.W_o,
self.U_i, self.U_f, self.U_c, self.U_o,
self.b_i, self.b_f, self.b_c, self.b_o]
outputs_info = [T.zeros((input.shape[1], n_h)),
T.zeros((input.shape[1], n_h))]
rval, updates = theano.scan(self._step,
sequences=[mask, input],
outputs_info=outputs_info)
# self.output is in the format (length, batchsize, n_h)
self.output = rval[0]
def _step(self, m_, x_, h_, c_):
i_preact = (index_dot(x_, self.W_i) +
T.dot(h_, self.U_i) + self.b_i)
i = T.nnet.sigmoid(i_preact)
f_preact = (index_dot(x_, self.W_f) +
T.dot(h_, self.U_f) + self.b_f)
f = T.nnet.sigmoid(f_preact)
o_preact = (index_dot(x_, self.W_o) +
T.dot(h_, self.U_o) + self.b_o)
o = T.nnet.sigmoid(o_preact)
c_preact = (index_dot(x_, self.W_c) +
T.dot(h_, self.U_c) + self.b_c)
c = T.tanh(c_preact)
c = f * c_ + i * c
c = m_[:, None] * c + (1. - m_)[:, None] * c_
h = o * T.tanh(c)
h = m_[:, None] * h + (1. - m_)[:, None] * h_
return h, c
def sequence_categorical_crossentropy(prediction, targets, mask):
prediction_flat = prediction.reshape(((prediction.shape[0] *
prediction.shape[1]),
prediction.shape[2]), ndim=2)
targets_flat = targets.flatten()
mask_flat = mask.flatten()
ce = categorical_crossentropy(prediction_flat, targets_flat)
return T.sum(ce * mask_flat)
class LogisticRegression(object):
def __init__(self, rng, input, n_in, n_out):
W = gauss_weight(rng, n_in, n_out)
self.W = theano.shared(value=numpy.asarray(W, dtype=theano.config.floatX),
name='W', borrow=True)
# initialize the biases b as a vector of n_out 0s
self.b = theano.shared(value=numpy.zeros((n_out,),
dtype=theano.config.floatX),
name='b', borrow=True)
# compute vector of class-membership probabilities in symbolic form
energy = T.dot(input, self.W) + self.b
energy_exp = T.exp(energy - T.max(energy, axis=2, keepdims=True))
pmf = energy_exp / energy_exp.sum(axis=2, keepdims=True)
self.p_y_given_x = pmf
self.params = [self.W, self.b]
batch_size = 100
n_h = 50
# The Theano graph
# Set the random number generator' seeds for consistency
rng = numpy.random.RandomState(12345)
x = T.lmatrix('x')
mask = T.matrix('mask')
# Construct an LSTM layer and a logistic regression layer
recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)
logreg_layer = LogisticRegression(rng=rng, input=recurrent_layer.output[:-1],
n_in=n_h, n_out=111)
# define a cost variable to optimize
cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,
x[1:],
mask[1:]) / batch_size
# create a list of all model parameters to be fit by gradient descent
params = logreg_layer.params + recurrent_layer.params
# create a list of gradients for all model parameters
grads = T.grad(cost, params)
learning_rate = 0.1
updates = [
(param_i, param_i - learning_rate * grad_i)
for param_i, grad_i in zip(params, grads)
]
update_model = theano.function([x, mask], cost, updates=updates)
evaluate_model = theano.function([x, mask], cost)
# Generating Sequences
x_t = T.iscalar()
h_p = T.vector()
c_p = T.vector()
h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)
energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b
energy_exp = T.exp(energy - T.max(energy, axis=1, keepdims=True))
output = energy_exp / energy_exp.sum(axis=1, keepdims=True)
single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])
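# A minimal, illustrative sketch (not taken from the notebook) of sampling text one
# character at a time with `single_step`. `start_index` and `n_chars` are placeholder
# names introduced here; character ids are assumed to lie in [0, 111).
def sample_characters(start_index, n_chars):
    h = numpy.zeros(n_h, dtype=config.floatX)  # initial hidden state
    c = numpy.zeros(n_h, dtype=config.floatX)  # initial cell state
    idx = start_index
    sampled = [idx]
    for _ in range(n_chars):
        probs, h, c = single_step(idx, h, c)
        h, c = h[0], c[0]                      # drop the singleton batch axis
        idx = int(numpy.random.choice(probs.shape[1], p=probs[0]))
        sampled.append(idx)
    return sampled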

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,85 @@
import collections
import numpy as np
def mackey_glass(sample_len=1000, tau=17, seed=None, n_samples = 1):
'''
mackey_glass(sample_len=1000, tau=17, seed = None, n_samples = 1) -> input
Generate the Mackey Glass time-series. Parameters are:
- sample_len: length of the time-series in timesteps. Default is 1000.
    - tau: delay of the MG system. Commonly used values are tau=17 (mild
chaos) and tau=30 (moderate chaos). Default is 17.
- seed: to seed the random generator, can be used to generate the same
timeseries at each invocation.
- n_samples : number of samples to generate
'''
delta_t = 10
history_len = tau * delta_t
# Initial conditions for the history of the system
timeseries = 1.2
if seed is not None:
np.random.seed(seed)
samples = []
for _ in range(n_samples):
history = collections.deque(1.2 * np.ones(history_len) + 0.2 * \
(np.random.rand(history_len) - 0.5))
# Preallocate the array for the time-series
inp = np.zeros((sample_len,1))
for timestep in range(sample_len):
for _ in range(delta_t):
xtau = history.popleft()
history.append(timeseries)
timeseries = history[-1] + (0.2 * xtau / (1.0 + xtau ** 10) - \
0.1 * history[-1]) / delta_t
inp[timestep] = timeseries
# Squash timeseries through tanh
inp = np.tanh(inp - 1)
samples.append(inp)
return samples
def mso(sample_len=1000, n_samples = 1):
'''
mso(sample_len=1000, n_samples = 1) -> input
Generate the Multiple Sinewave Oscillator time-series, a sum of two sines
with incommensurable periods. Parameters are:
- sample_len: length of the time-series in timesteps
- n_samples: number of samples to generate
'''
signals = []
for _ in range(n_samples):
phase = np.random.rand()
x = np.atleast_2d(np.arange(sample_len)).T
signals.append(np.sin(0.2 * x + phase) + np.sin(0.311 * x + phase))
return signals
def lorentz(sample_len=1000, sigma=10, rho=28, beta=8 / 3, step=0.01):
"""This function generates a Lorentz time series of length sample_len,
with standard parameters sigma, rho and beta.
"""
x = np.zeros([sample_len])
y = np.zeros([sample_len])
z = np.zeros([sample_len])
# Initial conditions taken from 'Chaos and Time Series Analysis', J. Sprott
    x[0] = 0
    y[0] = -0.01
    z[0] = 9
for t in range(sample_len - 1):
x[t + 1] = x[t] + sigma * (y[t] - x[t]) * step
y[t + 1] = y[t] + (x[t] * (rho - z[t]) - y[t]) * step
z[t + 1] = z[t] + (x[t] * y[t] - beta * z[t]) * step
x.shape += (1,)
y.shape += (1,)
z.shape += (1,)
return np.concatenate((x, y, z), axis=1)
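# A small, illustrative usage check (not part of the original module); the shapes
# in the comments are what the functions above return with these arguments.
if __name__ == '__main__':
    mg = mackey_glass(sample_len=500, tau=17, seed=42, n_samples=2)
    print(len(mg), mg[0].shape)   # 2 samples, each of shape (500, 1)
    sines = mso(sample_len=500, n_samples=1)
    print(sines[0].shape)         # (500, 1)
    xyz = lorentz(sample_len=500)
    print(xyz.shape)              # (500, 3): the x, y and z coordinates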

View File

@ -0,0 +1,28 @@
import theano
import theano.tensor as T
import numpy as np
coefficients = T.vector("coefficients")
x = T.scalar("x")
max_coefficients_supported = 10000
def step(coeff, power, prior_value, free_var):
return prior_value + (coeff * (free_var ** power))
# Generate the components of the polynomial
full_range = T.arange(max_coefficients_supported)
outputs_info = np.zeros((), dtype=theano.config.floatX)
components, updates = theano.scan(fn=step,
sequences=[coefficients, full_range],
outputs_info=outputs_info,
non_sequences=x)
polynomial = components[-1]
calculate_polynomial = theano.function(inputs=[coefficients, x],
outputs=polynomial,
updates=updates)
test_coeff = np.asarray([1, 0, 2], dtype=theano.config.floatX)
print(calculate_polynomial(test_coeff, 3))
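# Expected output: 19.0  (1 * 3**0 + 0 * 3**1 + 2 * 3**2 = 19)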

View File

@ -0,0 +1,50 @@
import theano
import theano.tensor as T
import numpy as np
probabilities = T.vector()
nb_samples = T.iscalar()
rng = T.shared_randomstreams.RandomStreams(1234)
def sample_from_pvect(pvect):
""" Provided utility function: given a symbolic vector of
probabilities (which MUST sum to 1), sample one element
and return its index.
"""
onehot_sample = rng.multinomial(n=1, pvals=pvect)
sample = onehot_sample.argmax()
return sample
def set_p_to_zero(pvect, i):
""" Provided utility function: given a symbolic vector of
probabilities and an index 'i', set the probability of the
i-th element to 0 and renormalize the probabilities so they
sum to 1.
"""
new_pvect = T.set_subtensor(pvect[i], 0.)
new_pvect = new_pvect / new_pvect.sum()
return new_pvect
def step(p):
sample = sample_from_pvect(p)
new_p = set_p_to_zero(p, sample)
return new_p, sample
output, updates = theano.scan(fn=step,
outputs_info=[probabilities, None],
n_steps=nb_samples)
modified_probabilities, samples = output
f = theano.function(inputs=[probabilities, nb_samples],
outputs=[samples],
updates=updates)
# Testing the function
test_probs = np.asarray([0.6, 0.3, 0.1], dtype=theano.config.floatX)
for i in range(10):
print(f(test_probs, 2))
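# Each call prints a length-2 array of distinct indices (sampling without
# replacement), e.g. [array([0, 1])].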

View File

@ -0,0 +1,644 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Scan Theano\n",
"\n",
"Credits: Forked from [summerschool2015](https://github.com/mila-udem/summerschool2015) by mila-udem\n",
"\n",
"## In short\n",
"\n",
"* Mechanism to perform loops in a Theano graph\n",
"* Supports nested loops and reusing results from previous iterations \n",
"* Highly generic\n",
"\n",
"## Implementation\n",
"\n",
"A Theano function graph is composed of two types of nodes; Variable nodes which represent data and Apply node which apply Ops (which represent some computation) to Variables to produce new Variables.\n",
"\n",
"From this point of view, a node that applies a Scan op is just like any other. Internally, however, it is very different from most Ops.\n",
"\n",
"Inside a Scan op is yet another Theano graph which represents the computation to be performed at every iteration of the loop. During compilation, that graph is compiled into a function and, during execution, the Scan op will call that function repeatedly on its inputs to produce its outputs.\n",
"\n",
"## Example 1 : As simple as it gets\n",
"\n",
"Scan's interface is complex and, thus, best introduced by examples. So, let's dive right in and start with a simple example; perform an element-wise multiplication between two vectors. \n",
"\n",
"This particular example is simple enough that Scan is not the best way to do things but we'll gradually work our way to more complex examples where Scan gets more interesting.\n",
"\n",
"Let's first setup our use case by defining Theano variables for the inputs :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import theano\n",
"import theano.tensor as T\n",
"import numpy as np\n",
"\n",
"vector1 = T.vector('vector1')\n",
"vector2 = T.vector('vector2')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we call the `scan()` function. It has many parameters but, because our use case is simple, we only need two of them. We'll introduce other parameters in the next examples.\n",
"\n",
"The parameter `sequences` allows us to specify variables that Scan should iterate over as it loops. The first iteration will take as input the first element of every sequence, the second iteration will take as input the second element of every sequence, etc. These individual element have will have one less dimension than the original sequences. For example, for a matrix sequence, the individual elements will be vectors.\n",
"\n",
"The parameter `fn` receives a function or lambda expression that expresses the computation to do at every iteration. It operates on the symbolic inputs to produce symbolic outputs. It will **only ever be called once**, to assemble the Theano graph used by Scan at every the iterations.\n",
"\n",
"Since we wish to iterate over both `vector1` and `vector2` simultaneously, we provide them as sequences. This means that every iteration will operate on two inputs: an element from `vector1` and the corresponding element from `vector2`. \n",
"\n",
"Because what we want is the elementwise product between the vectors, we provide a lambda expression that, given an element `a` from `vector1` and an element `b` from `vector2` computes and return the product."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"output, updates = theano.scan(fn=lambda a, b : a * b,\n",
" sequences=[vector1, vector2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling `scan()`, we see that it returns two outputs.\n",
"\n",
"The first output contains the outputs of `fn` from every timestep concatenated into a tensor. In our case, the output of a single timestep is a scalar so output is a vector where `output[i]` is the output of the i-th iteration.\n",
"\n",
"The second output details if and how the execution of the Scan updates any shared variable in the graph. It should be provided as an argument when compiling the Theano function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"f = theano.function(inputs=[vector1, vector2],\n",
" outputs=output,\n",
" updates=updates)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `updates` is omitted, the state of any shared variables modified by Scan will not be updated properly. Random number sampling, for instance, relies on shared variables. If `updates` is not provided, the state of the random number generator won't be updated properly and the same numbers might be sampled repeatedly. **Always** provide `updates` when compiling your Theano function.\n",
"\n",
"Now that we've defined how to do elementwise multiplication with Scan, we can see that the result is as expected :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"vector1_value = np.arange(0, 5).astype(theano.config.floatX) # [0,1,2,3,4]\n",
"vector2_value = np.arange(1, 6).astype(theano.config.floatX) # [1,2,3,4,5]\n",
"print(f(vector1_value, vector2_value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An interesting thing is that we never explicitly told Scan how many iteration it needed to run. It was automatically inferred; when given sequences, Scan will run as many iterations as the length of the shortest sequence : "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(f(vector1_value, vector2_value[:4]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2 : Non-sequences\n",
"\n",
"In this example, we introduce another of Scan's features; non-sequences. To demonstrate how to use them, we use Scan to compute the activations of a linear MLP layer over a minibatch.\n",
"\n",
"It is not yet a use case where Scan is truly useful but it introduces a requirement that sequences cannot fulfill; if we want to use Scan to iterate over the minibatch elements and compute the activations for each of them, then we need some variables (the parameters of the layer), to be available 'as is' at every iteration of the loop. We do *not* want Scan to iterate over them and give only part of them at every iteration.\n",
"\n",
"Once again, we begin by setting up our Theano variables :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = T.matrix('X') # Minibatch of data\n",
"W = T.matrix('W') # Weights of the layer\n",
"b = T.vector('b') # Biases of the layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the sake of variety, in this example we define the computation to be done at every iteration of the loop using a Python function, `step()`, instead of a lambda expression.\n",
"\n",
"To have the full weight matrix W and the full bias vector b available at every iteration, we use the argument non_sequences. Contrary to sequences, non-sequences are not iterated upon by Scan. Every non-sequence is passed as input to every iteration.\n",
"\n",
"This means that our `step()` function will need to operate on three symbolic inputs; one for our sequence X and one for each of our non-sequences W and b. \n",
"\n",
"The inputs that correspond to the non-sequences are **always** last and in the same order at the non-sequences are provided to Scan. This means that the correspondence between the inputs of the `step()` function and the arguments to `scan()` is the following : \n",
"\n",
"* `v` : individual element of the sequence `X` \n",
"* `W` and `b` : non-sequences `W` and `b`, respectively"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def step(v, W, b):\n",
" return T.dot(v, W) + b\n",
"\n",
"output, updates = theano.scan(fn=step,\n",
" sequences=[X],\n",
" non_sequences=[W, b])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now compile our Theano function and see that it gives the expected results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"f = theano.function(inputs=[X, W, b],\n",
" outputs=output,\n",
" updates=updates)\n",
"\n",
"X_value = np.arange(-3, 3).reshape(3, 2).astype(theano.config.floatX)\n",
"W_value = np.eye(2).astype(theano.config.floatX)\n",
"b_value = np.arange(2).astype(theano.config.floatX)\n",
"print(f(X_value, W_value, b_value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 3 : Reusing outputs from the previous iterations\n",
"\n",
"In this example, we will use Scan to compute a cumulative sum over the first dimension of a matrix $M$. This means that the output will be a matrix $S$ in which the first row will be equal to the first row of $M$, the second row will be equal to the sum of the two first rows of $M$, and so on.\n",
"\n",
"Another way to express this, which is the way we will implement here, is that $S[t] = S[t-1] + M[t]$. Implementing this with Scan would involve iterating over the rows of the matrix $M$ and, at every iteration, reuse the cumulative row that was output at the previous iteration and return the sum of it and the current row of $M$.\n",
"\n",
"If we assume for a moment that we can get Scan to provide the output value from the previous iteration as an input for every iteration, implementing a step function is simple :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def step(m_row, cumulative_sum):\n",
" return m_row + cumulative_sum"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The trick part is informing Scan that our step function expects as input the output of a previous iteration. To achieve this, we need to use a new parameter of the `scan()` function: `outputs_info`. This parameter is used to tell Scan how we intend to use each of the outputs that are computed at each iteration.\n",
"\n",
"This parameter can be omitted (like we did so far) when the step function doesn't depend on any output of a previous iteration. However, now that we wish to have recurrent outputs, we need to start using it.\n",
"\n",
"`outputs_info` takes a sequence with one element for every output of the `step()` function :\n",
"* For a **non-recurrent output** (like in every example before this one), the element should be `None`.\n",
"* For a **simple recurrent output** (iteration $t$ depends on the value at iteration $t-1$), the element must be a tensor. Scan will interpret it as being an initial state for a recurrent output and give it as input to the first iteration, pretending it is the output value from a previous iteration. For subsequent iterations, Scan will automatically handle giving the previous output value as an input.\n",
"\n",
"The `step()` function needs to expect one additional input for each simple recurrent output. These inputs correspond to outputs from previous iteration and are **always** after the inputs that correspond to sequences but before those that correspond to non-sequences. The are received by the `step()` function in the order in which the recurrent outputs are declared in the outputs_info sequence."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"M = T.matrix('X')\n",
"s = T.vector('s') # Initial value for the cumulative sum\n",
"\n",
"output, updates = theano.scan(fn=step,\n",
" sequences=[M],\n",
" outputs_info=[s])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now compile and test the Theano function :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"f = theano.function(inputs=[M, s],\n",
" outputs=output,\n",
" updates=updates)\n",
"\n",
"M_value = np.arange(9).reshape(3, 3).astype(theano.config.floatX)\n",
"s_value = np.zeros((3, ), dtype=theano.config.floatX)\n",
"print(f(M_value, s_value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An important thing to notice here, is that the output computed by the Scan does **not** include the initial state that we provided. It only outputs the states that it has computed itself.\n",
"\n",
"If we want to have both the initial state and the computed states in the same Theano variable, we have to join them ourselves."
]
},
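  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible way to do that join, shown here as a minimal sketch reusing the `M`, `s`, `output` and `updates` variables defined above, is to prepend the initial state to the Scan output with `T.concatenate`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Illustrative sketch: prepend the initial state to the computed states\n",
    "full_history = T.concatenate([s.dimshuffle('x', 0), output], axis=0)\n",
    "f_full = theano.function(inputs=[M, s],\n",
    "                         outputs=full_history,\n",
    "                         updates=updates)\n",
    "print(f_full(M_value, s_value))"
   ]
  },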
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 4 : Reusing outputs from multiple past iterations\n",
"\n",
"The Fibonacci sequence is a sequence of numbers F where the two first numbers both 1 and every subsequence number is defined as such : $F_n = F_{n-1} + F_{n-2}$. Thus, the Fibonacci sequence goes : 1, 1, 2, 3, 5, 8, 13, ...\n",
"\n",
"In this example, we will cover how to compute part of the Fibonacci sequence using Scan. Most of the tools required to achieve this have been introduced in the previous examples. The only one missing is the ability to use, at iteration $i$, outputs from iterations older than $i-1$.\n",
"\n",
"Also, since every example so far had only one output at every iteration of the loop, we will also compute, at each timestep, the ratio between the new term of the Fibonacci sequence and the previous term.\n",
"\n",
"Writing an appropriate step function given two inputs, representing the two previous terms of the Fibonacci sequence, is easy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def step(f_minus2, f_minus1):\n",
" new_f = f_minus2 + f_minus1\n",
" ratio = new_f / f_minus1\n",
" return new_f, ratio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is defining the value of `outputs_info`.\n",
"\n",
"Recall that, for **non-recurrent outputs**, the value is `None` and, for **simple recurrent outputs**, the value is a single initial state. For **general recurrent outputs**, where iteration $t$ may depend on multiple past values, the value is a dictionary. That dictionary has two values:\n",
"* taps : list declaring which previous values of that output every iteration will need. `[-3, -2, -1]` would mean every iteration should take as input the last 3 values of that output. `[-2]` would mean every iteration should take as input the value of that output from two iterations ago.\n",
"* initial : tensor of initial values. If every initial value has $n$ dimensions, `initial` will be a single tensor of $n+1$ dimensions with as many initial values as the oldest requested tap. In the case of the Fibonacci sequence, the individual initial values are scalars so the `initial` will be a vector. \n",
"\n",
"In our example, we have two outputs. The first output is the next computed term of the Fibonacci sequence so every iteration should take as input the two last values of that output. The second output is the ratio between successive terms and we don't reuse its value so this output is non-recurrent. We define the value of `outputs_info` as such :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"f_init = T.fvector()\n",
"outputs_info = [dict(initial=f_init, taps=[-2, -1]),\n",
" None]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've defined the step function and the properties of our outputs, we can call the `scan()` function. Because the `step()` function has multiple outputs, the first output of `scan()` function will be a list of tensors: the first tensor containing all the states of the first output and the second tensor containing all the states of the second input.\n",
"\n",
"In every previous example, we used sequences and Scan automatically inferred the number of iterations it needed to run from the length of these\n",
"sequences. Now that we have no sequence, we need to explicitly tell Scan how many iterations to run using the `n_step` parameter. The value can be real or symbolic."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output, updates = theano.scan(fn=step,\n",
" outputs_info=outputs_info,\n",
" n_steps=10)\n",
"\n",
"next_fibonacci_terms = output[0]\n",
"ratios_between_terms = output[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compile our Theano function which will take a vector of consecutive values from the Fibonacci sequence and compute the next 10 values :"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"f = theano.function(inputs=[f_init],\n",
" outputs=[next_fibonacci_terms, ratios_between_terms],\n",
" updates=updates)\n",
"\n",
"out = f([1, 1])\n",
"print(out[0])\n",
"print(out[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Precisions about the order of the arguments to the step function\n",
"\n",
"When we start using many sequences, recurrent outputs and non-sequences, it's easy to get confused regarding the order in which the step function receives the corresponding inputs. Below is the full order:\n",
"\n",
"* Element from the first sequence\n",
"* ...\n",
"* Element from the last sequence\n",
"* First requested tap from first recurrent output\n",
"* ...\n",
"* Last requested tap from first recurrent output\n",
"* ...\n",
"* First requested tap from last recurrent output\n",
"* ...\n",
"* Last requested tap from last recurrent output\n",
"* First non-sequence\n",
"* ...\n",
"* Last non-sequence"
]
},
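  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of this ordering (the variable names below are chosen freely), the sketch uses two sequences, one recurrent output with taps `[-2, -1]` and one non-sequence, so its step function receives its inputs exactly in the order listed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "seq1 = T.vector('seq1')\n",
    "seq2 = T.vector('seq2')\n",
    "init = T.vector('init')       # two initial values for the recurrent output\n",
    "scale = T.scalar('scale')     # non-sequence\n",
    "\n",
    "def order_step(s1_t, s2_t, out_tm2, out_tm1, non_seq):\n",
    "    # sequences first, then the recurrent taps (oldest first), then non-sequences\n",
    "    return (s1_t + s2_t + out_tm2 + out_tm1) * non_seq\n",
    "\n",
    "demo_output, demo_updates = theano.scan(fn=order_step,\n",
    "                                        sequences=[seq1, seq2],\n",
    "                                        outputs_info=[dict(initial=init, taps=[-2, -1])],\n",
    "                                        non_sequences=[scale])\n",
    "demo_fn = theano.function([seq1, seq2, init, scale], demo_output, updates=demo_updates)\n",
    "print(demo_fn(np.arange(4, dtype=theano.config.floatX),\n",
    "              np.ones(4, dtype=theano.config.floatX),\n",
    "              np.zeros(2, dtype=theano.config.floatX), 1.0))"
   ]
  },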
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## When to use Scan and when not to\n",
"\n",
"Scan is not appropriate for every problem. Here's some information to help you figure out if Scan is the best solution for a given use case.\n",
"\n",
"### Execution speed\n",
"\n",
"Using Scan in a Theano function typically makes it slighly slower compared to the equivalent Theano graph in which the loop is unrolled. Both of these approaches tend to be much slower than a vectorized implementation in which large chunks of the computation can be done in parallel.\n",
"\n",
"### Compilation speed\n",
"\n",
"Scan also adds an overhead to the compilation, potentially making it slower, but using it can also dramatically reduce the size of your graph, making compilation much faster. In the end, the effect of Scan on compilation speed will heavily depend on the size of the graph with and without Scan.\n",
"\n",
"The compilation speed of a Theano function using Scan will usually be comparable to one in which the loop is unrolled if the number of iterations is small. It the number of iterations is large, however, the compilation will usually be much faster with Scan.\n",
"\n",
"### In summary\n",
"\n",
"If you have one of the following cases, Scan can help :\n",
"* A vectorized implementation is not possible (due to the nature of the computation and/or memory usage)\n",
"* You want to do a large or variable number of iterations\n",
"\n",
"If you have one of the following cases, you should consider other options :\n",
"* A vectorized implementation could perform the same computation => Use the vectorized approach. It will often be faster during both compilation and execution.\n",
"* You want to do a small, fixed, number of iterations (ex: 2 or 3) => It's probably better to simply unroll the computation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercises\n",
"\n",
"### Exercise 1 - Computing a polynomial\n",
"\n",
"In this exercise, the initial version already works. It computes the value of a polynomial ($n_0 + n_1 x + n_2 x^2 + ... $) of at most 10000 degrees given the coefficients of the various terms and the value of x.\n",
"\n",
"You must modify it such that the reduction (the sum() call) is done by Scan."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"coefficients = theano.tensor.vector(\"coefficients\")\n",
"x = T.scalar(\"x\")\n",
"max_coefficients_supported = 10000\n",
"\n",
"def step(coeff, power, free_var):\n",
" return coeff * free_var ** power\n",
"\n",
"# Generate the components of the polynomial\n",
"full_range=theano.tensor.arange(max_coefficients_supported)\n",
"components, updates = theano.scan(fn=step,\n",
" outputs_info=None,\n",
" sequences=[coefficients, full_range],\n",
" non_sequences=x)\n",
"\n",
"polynomial = components.sum()\n",
"calculate_polynomial = theano.function(inputs=[coefficients, x],\n",
" outputs=polynomial,\n",
" updates=updates)\n",
"\n",
"test_coeff = np.asarray([1, 0, 2], dtype=theano.config.floatX)\n",
"print(calculate_polynomial(test_coeff, 3))\n",
"# 19.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Solution** : run the cell below to display the solution to this exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load scan_ex1_solution.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 2 - Sampling without replacement\n",
"\n",
"In this exercise, the goal is to implement a Theano function that :\n",
"* takes as input a vector of probabilities and a scalar\n",
"* performs sampling without replacements from those probabilities as many times as the value of the scalar\n",
"* returns a vector containing the indices of the sampled elements.\n",
"\n",
"Partial code is provided to help with the sampling of random numbers since this is not something that was covered in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"probabilities = T.vector()\n",
"nb_samples = T.iscalar()\n",
"\n",
"rng = T.shared_randomstreams.RandomStreams(1234)\n",
"\n",
"def sample_from_pvect(pvect):\n",
" \"\"\" Provided utility function: given a symbolic vector of\n",
" probabilities (which MUST sum to 1), sample one element\n",
" and return its index.\n",
" \"\"\"\n",
" onehot_sample = rng.multinomial(n=1, pvals=pvect)\n",
" sample = onehot_sample.argmax()\n",
" return sample\n",
"\n",
"def set_p_to_zero(pvect, i):\n",
" \"\"\" Provided utility function: given a symbolic vector of\n",
" probabilities and an index 'i', set the probability of the\n",
" i-th element to 0 and renormalize the probabilities so they\n",
" sum to 1.\n",
" \"\"\"\n",
" new_pvect = T.set_subtensor(pvect[i], 0.)\n",
" new_pvect = new_pvect / new_pvect.sum()\n",
" return new_pvect\n",
" \n",
"\n",
"# TODO use Scan to sample from the vector of probabilities and\n",
"# symbolically obtain 'samples' the vector of sampled indices.\n",
"samples = None\n",
"\n",
"# Compiling the function\n",
"f = theano.function(inputs=[probabilities, nb_samples],\n",
" outputs=[samples])\n",
"\n",
"# Testing the function\n",
"test_probs = np.asarray([0.6, 0.3, 0.1], dtype=theano.config.floatX)\n",
"for i in range(10):\n",
" print(f(test_probs, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Solution** : run the cell below to display the solution to this exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%load scan_ex2_solution.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}

View File

@ -0,0 +1,787 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multilayer Perceptron in Theano\n",
"\n",
"Credits: Forked from [summerschool2015](https://github.com/mila-udem/summerschool2015) by mila-udem\n",
"\n",
"This notebook describes how to implement the building blocks for a multilayer perceptron in Theano, in particular how to define and combine layers.\n",
"\n",
"We will continue using the MNIST digits classification dataset, still using Fuel.\n",
"\n",
"## The Model\n",
"We will focus on fully-connected layers, with an elementwise non-linearity on each hidden layer, and a softmax layer (similar to the logistic regression model) for classification on the top layer.\n",
"\n",
"### A class for hidden layers\n",
"This class does all its work in its constructor:\n",
"- Create and initialize shared variables for its parameters (`W` and `b`), unless there are explicitly provided. Note that the initialization scheme for `W` is the one described in [Glorot & Bengio (2010)](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf).\n",
"- Build the Theano expression for the value of the output units, given a variable for the input.\n",
"- Store the input, output, and shared parameters as members."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy\n",
"import theano\n",
"from theano import tensor\n",
"\n",
"# Set lower precision float, otherwise the notebook will take too long to run\n",
"theano.config.floatX = 'float32'\n",
"\n",
"\n",
"class HiddenLayer(object):\n",
" def __init__(self, rng, input, n_in, n_out, W=None, b=None,\n",
" activation=tensor.tanh):\n",
" \"\"\"\n",
" Typical hidden layer of a MLP: units are fully-connected and have\n",
" sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)\n",
" and the bias vector b is of shape (n_out,).\n",
"\n",
" NOTE : The nonlinearity used here is tanh\n",
"\n",
" Hidden unit activation is given by: tanh(dot(input,W) + b)\n",
"\n",
" :type rng: numpy.random.RandomState\n",
" :param rng: a random number generator used to initialize weights\n",
"\n",
" :type input: theano.tensor.dmatrix\n",
" :param input: a symbolic tensor of shape (n_examples, n_in)\n",
"\n",
" :type n_in: int\n",
" :param n_in: dimensionality of input\n",
"\n",
" :type n_out: int\n",
" :param n_out: number of hidden units\n",
"\n",
" :type activation: theano.Op or function\n",
" :param activation: Non linearity to be applied in the hidden layer\n",
" \"\"\"\n",
" self.input = input\n",
"\n",
" # `W` is initialized with `W_values` which is uniformely sampled\n",
" # from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden))\n",
" # for tanh activation function\n",
" # the output of uniform if converted using asarray to dtype\n",
" # theano.config.floatX so that the code is runable on GPU\n",
" # Note : optimal initialization of weights is dependent on the\n",
" # activation function used (among other things).\n",
" # For example, results presented in Glorot & Bengio (2010)\n",
" # suggest that you should use 4 times larger initial weights\n",
" # for sigmoid compared to tanh\n",
" if W is None:\n",
" W_values = numpy.asarray(\n",
" rng.uniform(\n",
" low=-numpy.sqrt(6. / (n_in + n_out)),\n",
" high=numpy.sqrt(6. / (n_in + n_out)),\n",
" size=(n_in, n_out)\n",
" ),\n",
" dtype=theano.config.floatX\n",
" )\n",
" if activation == tensor.nnet.sigmoid:\n",
" W_values *= 4\n",
"\n",
" W = theano.shared(value=W_values, name='W', borrow=True)\n",
"\n",
" if b is None:\n",
" b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)\n",
" b = theano.shared(value=b_values, name='b', borrow=True)\n",
"\n",
" self.W = W\n",
" self.b = b\n",
"\n",
" lin_output = tensor.dot(input, self.W) + self.b\n",
" self.output = (\n",
" lin_output if activation is None\n",
" else activation(lin_output)\n",
" )\n",
" # parameters of the model\n",
" self.params = [self.W, self.b]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A softmax class for the output\n",
"This class performs computations similar to what was performed in the [logistic regression tutorial](../intro_theano/logistic_regression.ipynb).\n",
"\n",
"Here as well, the expression for the output is built in the class constructor, which takes the input as argument. We also add the target, `y`, and store it as an argument."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class LogisticRegression(object):\n",
" \"\"\"Multi-class Logistic Regression Class\n",
"\n",
" The logistic regression is fully described by a weight matrix :math:`W`\n",
" and bias vector :math:`b`. Classification is done by projecting data\n",
" points onto a set of hyperplanes, the distance to which is used to\n",
" determine a class membership probability.\n",
" \"\"\"\n",
"\n",
" def __init__(self, input, target, n_in, n_out):\n",
" \"\"\" Initialize the parameters of the logistic regression\n",
"\n",
" :type input: theano.tensor.TensorType\n",
" :param input: symbolic variable that describes the input of the\n",
" architecture (one minibatch)\n",
" \n",
" :type target: theano.tensor.TensorType\n",
" :type target: column tensor that describes the target for training\n",
"\n",
" :type n_in: int\n",
" :param n_in: number of input units, the dimension of the space in\n",
" which the datapoints lie\n",
"\n",
" :type n_out: int\n",
" :param n_out: number of output units, the dimension of the space in\n",
" which the labels lie\n",
"\n",
" \"\"\"\n",
" # keep track of model input and target.\n",
" # We store a flattened (vector) version of target as y, which is easier to handle\n",
" self.input = input\n",
" self.target = target\n",
" self.y = target.flatten()\n",
"\n",
" self.W = theano.shared(value=numpy.zeros((n_in, n_out), dtype=theano.config.floatX),\n",
" name='W',\n",
" borrow=True)\n",
" self.b = theano.shared(value=numpy.zeros((n_out,), dtype=theano.config.floatX),\n",
" name='b',\n",
" borrow=True)\n",
" \n",
" # class-membership probabilities\n",
" self.p_y_given_x = tensor.nnet.softmax(tensor.dot(input, self.W) + self.b)\n",
"\n",
" # class whose probability is maximal\n",
" self.y_pred = tensor.argmax(self.p_y_given_x, axis=1)\n",
"\n",
" # parameters of the model\n",
" self.params = [self.W, self.b]\n",
" \n",
"\n",
" def negative_log_likelihood(self):\n",
" \"\"\"Return the mean of the negative log-likelihood of the prediction\n",
" of this model under a given target distribution.\n",
"\n",
" Note: we use the mean instead of the sum so that\n",
" the learning rate is less dependent on the batch size\n",
" \"\"\"\n",
" log_prob = tensor.log(self.p_y_given_x)\n",
" log_likelihood = log_prob[tensor.arange(self.y.shape[0]), self.y]\n",
" loss = - log_likelihood.mean()\n",
" return loss\n",
"\n",
" def errors(self):\n",
" \"\"\"Return a float representing the number of errors in the minibatch\n",
" over the total number of examples of the minibatch\n",
" \"\"\"\n",
" misclass_nb = tensor.neq(self.y_pred, self.y)\n",
" misclass_rate = misclass_nb.mean()\n",
" return misclass_rate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The MLP class\n",
"That class brings together the different parts of the model.\n",
"\n",
"It also adds additional controls on the training of the full network, for instance an expression for L1 or L2 regularization (weight decay).\n",
"\n",
"We can specify an arbitrary number of hidden layers, providing an empty one will reproduce the logistic regression model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"class MLP(object):\n",
" \"\"\"Multi-Layer Perceptron Class\n",
"\n",
" A multilayer perceptron is a feedforward artificial neural network model\n",
" that has one layer or more of hidden units and nonlinear activations.\n",
" Intermediate layers usually have as activation function tanh or the\n",
" sigmoid function (defined here by a ``HiddenLayer`` class) while the\n",
" top layer is a softmax layer (defined here by a ``LogisticRegression``\n",
" class).\n",
" \"\"\"\n",
"\n",
" def __init__(self, rng, input, target, n_in, n_hidden, n_out, activation=tensor.tanh):\n",
" \"\"\"Initialize the parameters for the multilayer perceptron\n",
"\n",
" :type rng: numpy.random.RandomState\n",
" :param rng: a random number generator used to initialize weights\n",
"\n",
" :type input: theano.tensor.TensorType\n",
" :param input: symbolic variable that describes the input of the\n",
" architecture (one minibatch)\n",
" \n",
" :type target: theano.tensor.TensorType\n",
" :type target: column tensor that describes the target for training\n",
"\n",
" :type n_in: int\n",
" :param n_in: number of input units, the dimension of the space in\n",
" which the datapoints lie\n",
"\n",
" :type n_hidden: list of int\n",
" :param n_hidden: number of hidden units in each hidden layer\n",
"\n",
" :type n_out: int\n",
" :param n_out: number of output units, the dimension of the space in\n",
" which the labels lie\n",
" \n",
" :type activation: theano.Op or function\n",
" :param activation: Non linearity to be applied in all hidden layers\n",
" \"\"\"\n",
" # keep track of model input and target.\n",
" # We store a flattened (vector) version of target as y, which is easier to handle\n",
" self.input = input\n",
" self.target = target\n",
" self.y = target.flatten()\n",
"\n",
" # Build all necessary hidden layers and chain them\n",
" self.hidden_layers = []\n",
" layer_input = input\n",
" layer_n_in = n_in\n",
"\n",
" for nh in n_hidden:\n",
" hidden_layer = HiddenLayer(\n",
" rng=rng,\n",
" input=layer_input,\n",
" n_in=layer_n_in,\n",
" n_out=nh,\n",
" activation=activation)\n",
" self.hidden_layers.append(hidden_layer)\n",
"\n",
" # prepare variables for next layer\n",
" layer_input = hidden_layer.output\n",
" layer_n_in = nh\n",
"\n",
" # The logistic regression layer gets as input the hidden units of the hidden layer,\n",
" # and the target\n",
" self.log_reg_layer = LogisticRegression(\n",
" input=layer_input,\n",
" target=target,\n",
" n_in=layer_n_in,\n",
" n_out=n_out)\n",
" \n",
" # self.params has all the parameters of the model,\n",
" # self.weights contains only the `W` variables.\n",
" # We also give unique name to the parameters, this will be useful to save them.\n",
" self.params = []\n",
" self.weights = []\n",
" layer_idx = 0\n",
" for hl in self.hidden_layers:\n",
" self.params.extend(hl.params)\n",
" self.weights.append(hl.W)\n",
" for hlp in hl.params:\n",
" prev_name = hlp.name\n",
" hlp.name = 'layer' + str(layer_idx) + '.' + prev_name\n",
" layer_idx += 1\n",
" self.params.extend(self.log_reg_layer.params)\n",
" self.weights.append(self.log_reg_layer.W)\n",
" for lrp in self.log_reg_layer.params:\n",
" prev_name = lrp.name\n",
" lrp.name = 'layer' + str(layer_idx) + '.' + prev_name\n",
"\n",
" # L1 norm ; one regularization option is to enforce L1 norm to be small\n",
" self.L1 = sum(abs(W).sum() for W in self.weights)\n",
"\n",
" # square of L2 norm ; one regularization option is to enforce square of L2 norm to be small\n",
" self.L2_sqr = sum((W ** 2).sum() for W in self.weights)\n",
" \n",
" def negative_log_likelihood(self):\n",
" # negative log likelihood of the MLP is given by the negative\n",
" # log likelihood of the output of the model, computed in the\n",
" # logistic regression layer\n",
" return self.log_reg_layer.negative_log_likelihood()\n",
"\n",
" def errors(self):\n",
" # same holds for the function computing the number of errors\n",
" return self.log_reg_layer.errors()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training Procedure\n",
"We will re-use the same training algorithm: stochastic gradient descent with mini-batches, and the same early-stopping criterion. Here, the number of parameters to train is variable, and we have to wait until the MLP model is actually instantiated to have an expression for the cost and the updates.\n",
"\n",
"### Gradient and Updates\n",
"Let us define helper functions for getting expressions for the gradient of the cost wrt the parameters, and the parameter updates. The following ones are simple, but many variations can exist, for instance:\n",
"- regularized costs, including L1 or L2 regularization\n",
"- more complex learning rules, such as momentum, RMSProp, ADAM, ..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def nll_grad(mlp_model):\n",
" loss = mlp_model.negative_log_likelihood()\n",
" params = mlp_model.params\n",
" grads = theano.grad(loss, wrt=params)\n",
" # Return (param, grad) pairs\n",
" return zip(params, grads)\n",
"\n",
"def sgd_updates(params_and_grads, learning_rate):\n",
" return [(param, param - learning_rate * grad)\n",
" for param, grad in params_and_grads]\n",
"\n",
"def get_simple_training_fn(mlp_model, learning_rate):\n",
" inputs = [mlp_model.input, mlp_model.target]\n",
" params_and_grads = nll_grad(mlp_model)\n",
" updates = sgd_updates(params_and_grads, learning_rate=lr)\n",
" \n",
" return theano.function(inputs=inputs, outputs=[], updates=updates)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def regularized_cost_grad(mlp_model, L1_reg, L2_reg):\n",
" loss = (mlp_model.negative_log_likelihood() +\n",
" L1_reg * mlp_model.L1 + \n",
" L2_reg * mlp_model.L2_sqr)\n",
" params = mlp_model.params\n",
" grads = theano.grad(loss, wrt=params)\n",
" # Return (param, grad) pairs\n",
" return zip(params, grads)\n",
"\n",
"def get_regularized_training_fn(mlp_model, L1_reg, L2_reg, learning_rate):\n",
" inputs = [mlp_model.input, mlp_model.target]\n",
" params_and_grads = regularized_cost_grad(mlp_model, L1_reg, L2_reg)\n",
" updates = sgd_updates(params_and_grads, learning_rate=lr)\n",
" return theano.function(inputs, updates=updates)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def get_test_fn(mlp_model):\n",
" return theano.function([mlp_model.input, mlp_model.target], mlp_model.errors())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Model\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training procedure\n",
"We first need to define a few parameters for the training loop and the early stopping procedure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import timeit\n",
"from fuel.streams import DataStream\n",
"from fuel.schemes import SequentialScheme\n",
"from fuel.transformers import Flatten\n",
"\n",
"## early-stopping parameters tuned for 1-2 min runtime\n",
"def sgd_training(train_model, test_model, train_set, valid_set, test_set, model_name='mlp_model',\n",
" # maximum number of epochs\n",
" n_epochs=20,\n",
" # look at this many examples regardless\n",
" patience=5000,\n",
" # wait this much longer when a new best is found\n",
" patience_increase=2,\n",
" # a relative improvement of this much is considered significant\n",
" improvement_threshold=0.995,\n",
" batch_size=20):\n",
"\n",
" n_train_batches = train_set.num_examples // batch_size\n",
"\n",
" # Create data streams to iterate through the data.\n",
" train_stream = Flatten(DataStream.default_stream(\n",
" train_set, iteration_scheme=SequentialScheme(train_set.num_examples, batch_size)))\n",
" valid_stream = Flatten(DataStream.default_stream(\n",
" valid_set, iteration_scheme=SequentialScheme(valid_set.num_examples, batch_size)))\n",
" test_stream = Flatten(DataStream.default_stream(\n",
" test_set, iteration_scheme=SequentialScheme(test_set.num_examples, batch_size)))\n",
"\n",
" # go through this many minibatches before checking the network on the validation set;\n",
" # in this case we check every epoch\n",
" validation_frequency = min(n_train_batches, patience / 2)\n",
"\n",
" best_validation_loss = numpy.inf\n",
" test_score = 0.\n",
" start_time = timeit.default_timer()\n",
"\n",
" done_looping = False\n",
" epoch = 0\n",
" while (epoch < n_epochs) and (not done_looping):\n",
" epoch = epoch + 1\n",
" minibatch_index = 0\n",
" for minibatch_x, minibatch_y in train_stream.get_epoch_iterator():\n",
" train_model(minibatch_x, minibatch_y)\n",
"\n",
" # iteration number\n",
" iter = (epoch - 1) * n_train_batches + minibatch_index\n",
" if (iter + 1) % validation_frequency == 0:\n",
" # compute zero-one loss on validation set\n",
" validation_losses = []\n",
" for valid_xi, valid_yi in valid_stream.get_epoch_iterator():\n",
" validation_losses.append(test_model(valid_xi, valid_yi))\n",
" this_validation_loss = numpy.mean(validation_losses)\n",
" print('epoch %i, minibatch %i/%i, validation error %f %%' %\n",
" (epoch,\n",
" minibatch_index + 1,\n",
" n_train_batches,\n",
" this_validation_loss * 100.))\n",
"\n",
" # if we got the best validation score until now\n",
" if this_validation_loss < best_validation_loss:\n",
" # improve patience if loss improvement is good enough\n",
" if this_validation_loss < best_validation_loss * improvement_threshold:\n",
" patience = max(patience, iter * patience_increase)\n",
"\n",
" best_validation_loss = this_validation_loss\n",
"\n",
" # test it on the test set\n",
" test_losses = []\n",
" for test_xi, test_yi in test_stream.get_epoch_iterator():\n",
" test_losses.append(test_model(test_xi, test_yi))\n",
"\n",
" test_score = numpy.mean(test_losses)\n",
" print(' epoch %i, minibatch %i/%i, test error of best model %f %%' %\n",
" (epoch,\n",
" minibatch_index + 1,\n",
" n_train_batches,\n",
" test_score * 100.))\n",
"\n",
" # save the best parameters\n",
" # build a name -> value dictionary\n",
" best = {param.name: param.get_value() for param in mlp_model.params}\n",
" numpy.savez('best_{}.npz'.format(model_name), **best)\n",
"\n",
" minibatch_index += 1\n",
" if patience <= iter:\n",
" done_looping = True\n",
" break\n",
"\n",
" end_time = timeit.default_timer()\n",
" print('Optimization complete with best validation score of %f %%, '\n",
" 'with test performance %f %%' %\n",
" (best_validation_loss * 100., test_score * 100.))\n",
"\n",
" print('The code ran for %d epochs, with %f epochs/sec (%.2fm total time)' %\n",
" (epoch, 1. * epoch / (end_time - start_time), (end_time - start_time) / 60.))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then load our data set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from fuel.datasets import MNIST\n",
"\n",
"# the full set is usually (0, 50000) for train, (50000, 60000) for valid and no slice for test.\n",
"# We only selected a subset to go faster.\n",
"train_set = MNIST(which_sets=('train',), sources=('features', 'targets'), subset=slice(0, 20000))\n",
"valid_set = MNIST(which_sets=('train',), sources=('features', 'targets'), subset=slice(20000, 24000))\n",
"test_set = MNIST(which_sets=('test',), sources=('features', 'targets'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Build the Model\n",
"Now is the time to specify and build a particular instance of the MLP. Let's start with one with a single hidden layer of 500 hidden units, and a tanh non-linearity."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rng = numpy.random.RandomState(1234)\n",
"x = tensor.matrix('x')\n",
"# The labels coming from Fuel are in a \"column\" format\n",
"y = tensor.icol('y')\n",
"\n",
"n_in = 28 * 28\n",
"n_out = 10\n",
"\n",
"mlp_model = MLP(\n",
" rng=rng,\n",
" input=x,\n",
" target=y,\n",
" n_in=n_in,\n",
" n_hidden=[500],\n",
" n_out=n_out,\n",
" activation=tensor.tanh)\n",
"\n",
"lr = numpy.float32(0.1)\n",
"L1_reg = numpy.float32(0)\n",
"L2_reg = numpy.float32(0.0001)\n",
"\n",
"train_model = get_regularized_training_fn(mlp_model, L1_reg, L2_reg, lr)\n",
"test_model = get_test_fn(mlp_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Launch the training phase"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sgd_training(train_model, test_model, train_set, valid_set, test_set)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How can we make it better?\n",
"\n",
"- Max-column normalization\n",
"- Dropout\n"
]
},
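  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of both ideas follows; the helpers (`max_col_norm`, `dropout`, `p_drop`, `srng`) are named freely here and are not wired into the training functions above, so wiring them in is left as an exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 1) Max-column-norm constraint: after a gradient update, rescale any column\n",
    "#    of a weight matrix whose L2 norm exceeds max_norm.\n",
    "def max_col_norm(W_updated, max_norm=3.0):\n",
    "    col_norms = tensor.sqrt(tensor.sum(tensor.sqr(W_updated), axis=0))\n",
    "    desired = tensor.clip(col_norms, 0, max_norm)\n",
    "    return W_updated * (desired / (1e-7 + col_norms))\n",
    "\n",
    "# 2) Dropout: randomly zero activations at training time and rescale them,\n",
    "#    so no change is needed at test time (inverted dropout).\n",
    "srng = tensor.shared_randomstreams.RandomStreams(1234)\n",
    "\n",
    "def dropout(layer_output, p_drop=0.5):\n",
    "    mask = srng.binomial(n=1, p=1. - p_drop, size=layer_output.shape,\n",
    "                         dtype=theano.config.floatX)\n",
    "    return layer_output * mask / (1. - p_drop)"
   ]
  },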
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ReLU activation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def relu(x):\n",
" return x * (x > 0)\n",
"\n",
"rng = numpy.random.RandomState(1234)\n",
"\n",
"mlp_relu = MLP(\n",
" rng=rng,\n",
" input=x,\n",
" target=y,\n",
" n_in=n_in,\n",
" n_hidden=[500],\n",
" n_out=n_out,\n",
" activation=relu)\n",
"\n",
"lr = numpy.float32(0.1)\n",
"L1_reg = numpy.float32(0)\n",
"L2_reg = numpy.float32(0.0001)\n",
"\n",
"train_relu = get_regularized_training_fn(mlp_relu, L1_reg, L2_reg, lr)\n",
"test_relu = get_test_fn(mlp_relu)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"sgd_training(train_relu, test_relu, train_set, valid_set, test_set, model_name='mlp_relu')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Momentum training (Adadelta, RMSProp, ...)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# This implements simple momentum\n",
"def get_momentum_updates(params_and_grads, lr, rho):\n",
" res = []\n",
"\n",
" # numpy will promote (1 - rho) to float64 otherwise\n",
" one = numpy.float32(1.)\n",
" \n",
" for p, g in params_and_grads:\n",
" up = theano.shared(p.get_value() * 0)\n",
" res.append((p, p - lr * up))\n",
" res.append((up, rho * up + (one - rho) * g))\n",
"\n",
" return res\n",
"\n",
"\n",
"# This implements the parameter updates for Adadelta\n",
"def get_adadelta_updates(params_and_grads, rho):\n",
" up2 = [theano.shared(p.get_value() * 0, name=\"up2 for \" + p.name) for p, g in params_and_grads]\n",
" grads2 = [theano.shared(p.get_value() * 0, name=\"grads2 for \" + p.name) for p, g in params_and_grads]\n",
"\n",
" # This is dumb but numpy will promote (1 - rho) to float64 otherwise\n",
" one = numpy.float32(1.)\n",
" \n",
" rg2up = [(rg2, rho * rg2 + (one - rho) * (g ** 2))\n",
" for rg2, (p, g) in zip(grads2, params_and_grads)]\n",
"\n",
" updir = [-(tensor.sqrt(ru2 + 1e-6) / tensor.sqrt(rg2 + 1e-6)) * g\n",
" for (p, g), ru2, rg2 in zip(params_and_grads, up2, grads2)]\n",
"\n",
" ru2up = [(ru2, rho * ru2 + (one - rho) * (ud ** 2))\n",
" for ru2, ud in zip(up2, updir)]\n",
"\n",
" param_up = [(p, p + ud) for (p, g), ud in zip(params_and_grads, updir)]\n",
" \n",
" return rg2up + ru2up + param_up\n",
"\n",
"# You can try to write an RMSProp function and train the model with it.\n",
"\n",
"def get_momentum_training_fn(mlp_model, L1_reg, L2_reg, lr, rho):\n",
" inputs = [mlp_model.input, mlp_model.target]\n",
" params_and_grads = regularized_cost_grad(mlp_model, L1_reg, L2_reg)\n",
" updates = get_momentum_updates(params_and_grads, lr=lr, rho=rho)\n",
" return theano.function(inputs, updates=updates)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"rng = numpy.random.RandomState(1234)\n",
"x = tensor.matrix('x')\n",
"# The labels coming from Fuel are in a \"column\" format\n",
"y = tensor.icol('y')\n",
"\n",
"n_in = 28 * 28\n",
"n_out = 10\n",
"\n",
"mlp_model = MLP(\n",
" rng=rng,\n",
" input=x,\n",
" target=y,\n",
" n_in=n_in,\n",
" n_hidden=[500],\n",
" n_out=n_out,\n",
" activation=tensor.tanh)\n",
"\n",
"lr = numpy.float32(0.1)\n",
"L1_reg = numpy.float32(0)\n",
"L2_reg = numpy.float32(0.0001)\n",
"rho = numpy.float32(0.95)\n",
"\n",
"momentum_train = get_momentum_training_fn(mlp_model, L1_reg, L2_reg, lr=lr, rho=rho)\n",
"test_fn = get_test_fn(mlp_model)\n",
"\n",
"sgd_training(momentum_train, test_fn, train_set, valid_set, test_set, n_epochs=20, model_name='mlp_momentum')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}