Add Theano recurrent neural networks notebook.

2024-03-22 13:30:56 +08:00 · 2015-12-27 09:31:45 -05:00 · 2015-12-27 09:31:45 -05:00 · 73baf7a469
commit 73baf7a469
parent 98375d25ec
9 changed files with 1533 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -96,6 +96,7 @@ IPython Notebook(s) demonstrating deep learning functionality.
 | [theano-intro](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/intro_theano.ipynb) |  Intro to Theano, which allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation. |
 | [theano-scan](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/scan_tutorial/scan_tutorial.ipynb) |  Learn scans, a mechanism to perform loops in a Theano graph. |
 | [theano-logistic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/intro_theano/logistic_regression.ipynb) |  Implement logistic regression in Theano. |
+| [theano-rnn](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/theano-tutorial/rnn_tutorial/simple_rnn.ipynb) |  Implement recurrent neural networks in Theano. |
 | [deep-dream](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/deep-learning/deep-dream/dream.ipynb) |  Caffe-based computer vision program which uses a convolutional neural network to find and enhance patterns in images. |

 <br/>
--- a/deep-learning/theano-tutorial/rnn_tutorial/Makefile
+++ b/deep-learning/theano-tutorial/rnn_tutorial/Makefile
@@ -0,0 +1,13 @@
+all: instruction.pdf rnn_lstm.pdf
+
+instruction.pdf: slides_source/instruction.tex
+	cd slides_source; pdflatex --shell-escape instruction.tex
+	cd slides_source; pdflatex --shell-escape instruction.tex
+	cd slides_source; pdflatex --shell-escape instruction.tex
+	mv slides_source/instruction.pdf .
+
+rnn_lstm.pdf: slides_source/rnn_lstm.tex
+	cd slides_source; pdflatex --shell-escape rnn_lstm.tex
+	cd slides_source; pdflatex --shell-escape rnn_lstm.tex
+	cd slides_source; pdflatex --shell-escape rnn_lstm.tex
+	mv slides_source/rnn_lstm.pdf .
--- a/deep-learning/theano-tutorial/rnn_tutorial/instruction.pdf
+++ b/deep-learning/theano-tutorial/rnn_tutorial/instruction.pdf
--- a/deep-learning/theano-tutorial/rnn_tutorial/lstm_text.ipynb
+++ b/deep-learning/theano-tutorial/rnn_tutorial/lstm_text.ipynb
@@ -0,0 +1,508 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction\n",
+    "In this demo, you'll see a more practical application of RNNs/LSTMs as character-level language models. The emphasis will be more on parallelization and using RNNs with data from Fuel.\n",
+    "\n",
+    "To get started, we first need to download the training text, the validation text, and a file containing a dictionary that maps characters to integers. We also need to import quite a few modules."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import requests\n",
+    "import gzip\n",
+    "\n",
+    "from six.moves import cPickle as pkl\n",
+    "import time\n",
+    "\n",
+    "import numpy\n",
+    "import theano\n",
+    "import theano.tensor as T\n",
+    "\n",
+    "from theano.tensor.nnet import categorical_crossentropy\n",
+    "from theano import config\n",
+    "from fuel.datasets import TextFile\n",
+    "from fuel.streams import DataStream\n",
+    "from fuel.schemes import ConstantScheme\n",
+    "from fuel.transformers import Batch, Padding\n",
+    "\n",
+    "if not os.path.exists('traindata.txt'):\n",
+    "    r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/traindata.txt.gz')\n",
+    "    with open('traindata.txt.gz', 'wb') as data_file:\n",
+    "        data_file.write(r.content)\n",
+    "    with gzip.open('traindata.txt.gz', 'rb') as data_file:\n",
+    "        with open('traindata.txt', 'wb') as out_file:\n",
+    "            out_file.write(data_file.read())\n",
+    "        \n",
+    "if not os.path.exists('valdata.txt'):\n",
+    "    r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/valdata.txt.gz')\n",
+    "    with open('valdata.txt.gz', 'wb') as data_file:\n",
+    "        data_file.write(r.content)\n",
+    "    with gzip.open('valdata.txt.gz', 'rb') as data_file:\n",
+    "        with open('valdata.txt', 'wb') as out_file:\n",
+    "            out_file.write(data_file.read())\n",
+    "\n",
+    "if not os.path.exists('dictionary.pkl'):\n",
+    "    r = requests.get('http://www-etud.iro.umontreal.ca/~brakelp/dictionary.pkl')\n",
+    "    with open('dictionary.pkl', 'wb') as data_file:\n",
+    "        data_file.write(r.content)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The Model\n",
+    "The code below shows an implementation of an LSTM network. Note that several variants of the LSTM are in use, and this one doesn't include the so-called 'peephole connections'. We used a separate method (`_step`) for the dynamic update to make it easier to generate from the network later. The `index_dot` function doesn't save much verbosity, but it makes clear that certain dot products have been replaced with indexing operations because this network will be applied to discrete data. Last but not least, note the addition of the `mask` argument, which is used to ignore certain parts of the input sequence."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def gauss_weight(rng, ndim_in, ndim_out=None, sd=.005):\n",
+    "    if ndim_out is None:\n",
+    "        ndim_out = ndim_in\n",
+    "    W = rng.randn(ndim_in, ndim_out) * sd\n",
+    "    return numpy.asarray(W, dtype=config.floatX)\n",
+    "\n",
+    "\n",
+    "def index_dot(indices, w):\n",
+    "    return w[indices.flatten()]\n",
+    "\n",
+    "\n",
+    "class LstmLayer:\n",
+    "\n",
+    "    def __init__(self, rng, input, mask, n_in, n_h):\n",
+    "\n",
+    "        # Init params\n",
+    "        self.W_i = theano.shared(gauss_weight(rng, n_in, n_h), 'W_i', borrow=True)\n",
+    "        self.W_f = theano.shared(gauss_weight(rng, n_in, n_h), 'W_f', borrow=True)\n",
+    "        self.W_c = theano.shared(gauss_weight(rng, n_in, n_h), 'W_c', borrow=True)\n",
+    "        self.W_o = theano.shared(gauss_weight(rng, n_in, n_h), 'W_o', borrow=True)\n",
+    "\n",
+    "        self.U_i = theano.shared(gauss_weight(rng, n_h), 'U_i', borrow=True)\n",
+    "        self.U_f = theano.shared(gauss_weight(rng, n_h), 'U_f', borrow=True)\n",
+    "        self.U_c = theano.shared(gauss_weight(rng, n_h), 'U_c', borrow=True)\n",
+    "        self.U_o = theano.shared(gauss_weight(rng, n_h), 'U_o', borrow=True)\n",
+    "\n",
+    "        self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
+    "                                 'b_i', borrow=True)\n",
+    "        self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
+    "                                 'b_f', borrow=True)\n",
+    "        self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
+    "                                 'b_c', borrow=True)\n",
+    "        self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),\n",
+    "                                 'b_o', borrow=True)\n",
+    "\n",
+    "        self.params = [self.W_i, self.W_f, self.W_c, self.W_o,\n",
+    "                       self.U_i, self.U_f, self.U_c, self.U_o,\n",
+    "                       self.b_i, self.b_f, self.b_c, self.b_o]\n",
+    "\n",
+    "        outputs_info = [T.zeros((input.shape[1], n_h)),\n",
+    "                        T.zeros((input.shape[1], n_h))]\n",
+    "\n",
+    "        rval, updates = theano.scan(self._step,\n",
+    "                                    sequences=[mask, input],\n",
+    "                                    outputs_info=outputs_info)\n",
+    "\n",
+    "        # self.output is in the format (length, batchsize, n_h)\n",
+    "        self.output = rval[0]\n",
+    "\n",
+    "    def _step(self, m_, x_, h_, c_):\n",
+    "\n",
+    "        i_preact = (index_dot(x_, self.W_i) +\n",
+    "                    T.dot(h_, self.U_i) + self.b_i)\n",
+    "        i = T.nnet.sigmoid(i_preact)\n",
+    "\n",
+    "        f_preact = (index_dot(x_, self.W_f) +\n",
+    "                    T.dot(h_, self.U_f) + self.b_f)\n",
+    "        f = T.nnet.sigmoid(f_preact)\n",
+    "\n",
+    "        o_preact = (index_dot(x_, self.W_o) +\n",
+    "                    T.dot(h_, self.U_o) + self.b_o)\n",
+    "        o = T.nnet.sigmoid(o_preact)\n",
+    "\n",
+    "        c_preact = (index_dot(x_, self.W_c) +\n",
+    "                    T.dot(h_, self.U_c) + self.b_c)\n",
+    "        c = T.tanh(c_preact)\n",
+    "\n",
+    "        c = f * c_ + i * c\n",
+    "        c = m_[:, None] * c + (1. - m_)[:, None] * c_\n",
+    "\n",
+    "        h = o * T.tanh(c)\n",
+    "        h = m_[:, None] * h + (1. - m_)[:, None] * h_\n",
+    "\n",
+    "        return h, c"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The next block contains some code that computes the cross-entropy for masked sequences, along with a stripped-down version of the logistic regression class from the deep learning tutorials, which we will need later."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def sequence_categorical_crossentropy(prediction, targets, mask):\n",
+    "    prediction_flat = prediction.reshape(((prediction.shape[0] *\n",
+    "                                           prediction.shape[1]),\n",
+    "                                          prediction.shape[2]), ndim=2)\n",
+    "    targets_flat = targets.flatten()\n",
+    "    mask_flat = mask.flatten()\n",
+    "    ce = categorical_crossentropy(prediction_flat, targets_flat)\n",
+    "    return T.sum(ce * mask_flat)\n",
+    "\n",
+    "\n",
+    "class LogisticRegression(object):\n",
+    "   \n",
+    "    def __init__(self, rng, input, n_in, n_out):\n",
+    "        \n",
+    "        W = gauss_weight(rng, n_in, n_out)\n",
+    "        self.W = theano.shared(value=numpy.asarray(W, dtype=theano.config.floatX),\n",
+    "                               name='W', borrow=True)\n",
+    "        # initialize the biases b as a vector of n_out 0s\n",
+    "        self.b = theano.shared(value=numpy.zeros((n_out,),\n",
+    "                                                 dtype=theano.config.floatX),\n",
+    "                               name='b', borrow=True)\n",
+    "\n",
+    "        # compute vector of class-membership probabilities in symbolic form\n",
+    "        energy = T.dot(input, self.W) + self.b\n",
+    "        energy_exp = T.exp(energy - T.max(energy, axis=2, keepdims=True))\n",
+    "        pmf = energy_exp / energy_exp.sum(axis=2, keepdims=True)\n",
+    "        self.p_y_given_x = pmf\n",
+    "        self.params = [self.W, self.b]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Processing the Data\n",
+    "The data in `traindata.txt` and `valdata.txt` is plain English text, formatted so that every sentence is separated by a newline. We'll use some of the functionality of Fuel to perform the following preprocessing steps:\n",
+    "* Convert everything to lowercase\n",
+    "* Map characters to indices\n",
+    "* Group the sentences into batches\n",
+    "* Convert each batch into a matrix/tensor as long as its longest sequence, padding all shorter sequences with zeros\n",
+    "* Add a mask matrix that encodes the length of each sequence (a timestep at which the mask is 0 indicates that there is no data available)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "batch_size = 100\n",
+    "n_epochs = 40\n",
+    "n_h = 50\n",
+    "DICT_FILE = 'dictionary.pkl'\n",
+    "TRAIN_FILE = 'traindata.txt'\n",
+    "VAL_FILE = 'valdata.txt'\n",
+    "\n",
+    "# Load the datasets with Fuel\n",
+    "dictionary = pkl.load(open(DICT_FILE, 'rb'))\n",
+    "# add a symbol for unknown characters\n",
+    "dictionary['~'] = len(dictionary)\n",
+    "reverse_mapping = dict((j, i) for i, j in dictionary.items())\n",
+    "\n",
+    "train = TextFile(files=[TRAIN_FILE],\n",
+    "                 dictionary=dictionary,\n",
+    "                 unk_token='~',\n",
+    "                 level='character',\n",
+    "                 preprocess=str.lower,\n",
+    "                 bos_token=None,\n",
+    "                 eos_token=None)\n",
+    "\n",
+    "train_stream = DataStream.default_stream(train)\n",
+    "\n",
+    "# organize data in batches and pad shorter sequences with zeros\n",
+    "train_stream = Batch(train_stream,\n",
+    "                     iteration_scheme=ConstantScheme(batch_size))\n",
+    "train_stream = Padding(train_stream)\n",
+    "\n",
+    "# do the same for the validation text\n",
+    "val = TextFile(files=[VAL_FILE],\n",
+    "               dictionary=dictionary,\n",
+    "               unk_token='~',\n",
+    "               level='character',\n",
+    "               preprocess=str.lower,\n",
+    "               bos_token=None,\n",
+    "               eos_token=None)\n",
+    "\n",
+    "val_stream = DataStream.default_stream(val)\n",
+    "\n",
+    "# organize data in batches and pad shorter sequences with zeros\n",
+    "val_stream = Batch(val_stream,\n",
+    "                   iteration_scheme=ConstantScheme(batch_size))\n",
+    "val_stream = Padding(val_stream)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The Theano Graph\n",
+    "We'll now define the complete Theano graph for computing costs and gradients among other things. The cost will be the cross-entropy of the next character in the sequence and the network will try to predict it based on the previous characters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# Set the random number generator's seed for consistency\n",
+    "rng = numpy.random.RandomState(12345)\n",
+    "\n",
+    "x = T.lmatrix('x')\n",
+    "mask = T.matrix('mask')\n",
+    "\n",
+    "# Construct an LSTM layer and a logistic regression layer\n",
+    "recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)\n",
+    "logreg_layer = LogisticRegression(rng=rng, input=recurrent_layer.output[:-1],\n",
+    "                                  n_in=n_h, n_out=111)\n",
+    "\n",
+    "# define a cost variable to optimize\n",
+    "cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,\n",
+    "                                         x[1:],\n",
+    "                                         mask[1:]) / batch_size\n",
+    "\n",
+    "# create a list of all model parameters to be fit by gradient descent\n",
+    "params = logreg_layer.params + recurrent_layer.params\n",
+    "\n",
+    "# create a list of gradients for all model parameters\n",
+    "grads = T.grad(cost, params)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can now compile the function that updates the parameters with a gradient descent step. We also add a function that computes the cost without updating anything, for monitoring purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/pbrakel/Repositories/Theano/theano/scan_module/scan_perform_ext.py:117: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility\n",
+      "  from scan_perform.scan_perform import *\n"
+     ]
+    }
+   ],
+   "source": [
+    "learning_rate = 0.1\n",
+    "updates = [\n",
+    "    (param_i, param_i - learning_rate * grad_i)\n",
+    "    for param_i, grad_i in zip(params, grads)\n",
+    "]\n",
+    "\n",
+    "update_model = theano.function([x, mask], cost, updates=updates)\n",
+    "\n",
+    "evaluate_model = theano.function([x, mask], cost)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generating Sequences\n",
+    "To see if the networks learn something useful (and to make results monitoring more entertaining), we'll also write some code to generate sequences. For this, we'll first compile a function that computes a single state update for the network to have more control over the values of each variable at each time step."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "x_t = T.iscalar()\n",
+    "h_p = T.vector()\n",
+    "c_p = T.vector()\n",
+    "h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)\n",
+    "energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b\n",
+    "\n",
+    "energy_exp = T.exp(energy - T.max(energy, axis=1, keepdims=True))\n",
+    "\n",
+    "output = energy_exp / energy_exp.sum(axis=1, keepdims=True)\n",
+    "single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])\n",
+    "\n",
+    "def speak(single_step, prefix='the meaning of life is ', n_steps=450):\n",
+    "    try:\n",
+    "        h_p = numpy.zeros((n_h,), dtype=config.floatX)\n",
+    "        c_p = numpy.zeros((n_h,), dtype=config.floatX)\n",
+    "        sentence = prefix\n",
+    "        for char in prefix:\n",
+    "            x_t = dictionary[char]\n",
+    "            prediction, h_p, c_p = single_step(x_t, h_p.flatten(),\n",
+    "                                               c_p.flatten())\n",
+    "        # Renormalize probability in float64\n",
+    "        flat_prediction = prediction.flatten()\n",
+    "        flat_pred_sum = flat_prediction.sum(dtype='float64')\n",
+    "        if flat_pred_sum > 1:\n",
+    "            flat_prediction = flat_prediction.astype('float64') / flat_pred_sum\n",
+    "        sample = numpy.random.multinomial(1, flat_prediction)\n",
+    "\n",
+    "        for i in range(n_steps):\n",
+    "            x_t = numpy.argmax(sample)\n",
+    "            prediction, h_p, c_p = single_step(x_t, h_p.flatten(),\n",
+    "                                               c_p.flatten())\n",
+    "            # Renormalize probability in float64\n",
+    "            flat_prediction = prediction.flatten()\n",
+    "            flat_pred_sum = flat_prediction.sum(dtype='float64')\n",
+    "            if flat_pred_sum > 1:\n",
+    "                flat_prediction = flat_prediction.astype('float64') / flat_pred_sum\n",
+    "            sample = numpy.random.multinomial(1, flat_prediction)\n",
+    "\n",
+    "            sentence += reverse_mapping[x_t]\n",
+    "\n",
+    "        return sentence\n",
+    "    except ValueError as e:\n",
+    "        print 'Something went wrong during sentence generation: {}'.format(e)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "epoch: 0\n",
+      "\n",
+      "LSTM: \"the meaning of life is i<>ateisn ^ltbagss7tuodkca r9 msd,forreypoctlluoiasrn?at<61>netteofkotenni<6E>cf/vattosnlrxisiovu<76>al.hahau<61>ootwo tuost! ]cw<63> eweunhufaaecihtdtk tticiss cvt2f etoct bllstsluohh-,retti?eusrv eikly an<61>ade'i stiel<65>doelnamtuartoci<63>ht.<2E>woi 2kfs$an tpeo<65>miiadain9.e eegtamiaesboeinne<6E>unlocityqe dansapeaeiyo<79>ihaewmtrt<72>'aa svteatae ,otrr.gsac.-perioswetgoc<6F>io froaoeismhsgtulherbttrh fl<66>i el  nnltnta<74>sat yhomsnttwlnwnenaee.mhits r<>us-thist sn man4lamhpac.osdopl g<>\"\n",
+      "\n",
+      "epoch: 0   minibatch: 40\n",
+      "Average validation CE per sentence: 251.167072292\n"
+     ]
+    },
+    {
+     "ename": "KeyboardInterrupt",
+     "evalue": "",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
+      "\u001b[1;32m<ipython-input-13-7c09df6ae427>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      9\u001b[0m         \u001b[0miteration\u001b[0m \u001b[1;33m+=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     10\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 11\u001b[1;33m         \u001b[0mcross_entropy\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mupdate_model\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx_\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mmask_\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mT\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m     12\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     13\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
+      "\u001b[1;32m/home/pbrakel/Repositories/Theano/theano/compile/function_module.pyc\u001b[0m in \u001b[0;36m__call__\u001b[1;34m(self, *args, **kwargs)\u001b[0m\n\u001b[0;32m    577\u001b[0m         \u001b[0mt0_fn\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    578\u001b[0m         \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 579\u001b[1;33m             \u001b[0moutputs\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    580\u001b[0m         \u001b[1;32mexcept\u001b[0m \u001b[0mException\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    581\u001b[0m             \u001b[1;32mif\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'position_of_error'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+      "\u001b[1;32m/home/pbrakel/Repositories/Theano/theano/scan_module/scan_op.pyc\u001b[0m in \u001b[0;36mrval\u001b[1;34m(p, i, o, n)\u001b[0m\n\u001b[0;32m    649\u001b[0m         \u001b[1;31m# default arguments are stored in the closure of `rval`\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    650\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 651\u001b[1;33m         \u001b[1;32mdef\u001b[0m \u001b[0mrval\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mp\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode_input_storage\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mo\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode_output_storage\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mn\u001b[0m\u001b[1;33m=\u001b[0m\u001b[0mnode\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m    652\u001b[0m             \u001b[0mr\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mp\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m0\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mi\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mo\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m    653\u001b[0m             \u001b[1;32mfor\u001b[0m \u001b[0mo\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mnode\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0moutputs\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
+      "\u001b[1;31mKeyboardInterrupt\u001b[0m: "
+     ]
+    }
+   ],
+   "source": [
+    "start_time = time.clock()\n",
+    "\n",
+    "iteration = 0\n",
+    "\n",
+    "for epoch in range(n_epochs):\n",
+    "    print 'epoch:', epoch\n",
+    "\n",
+    "    for x_, mask_ in train_stream.get_epoch_iterator():\n",
+    "        iteration += 1\n",
+    "\n",
+    "        cross_entropy = update_model(x_.T, mask_.T)\n",
+    "\n",
+    "\n",
+    "        # Generate some text after every 40 minibatches\n",
+    "        if iteration % 40 == 0:\n",
+    "            sentence = speak(single_step, prefix='the meaning of life is ', n_steps=450)\n",
+    "            print\n",
+    "            print 'LSTM: \"' + sentence + '\"'\n",
+    "            print\n",
+    "            print 'epoch:', epoch, '  minibatch:', iteration\n",
+    "            val_scores = []\n",
+    "            for x_val, mask_val in val_stream.get_epoch_iterator():\n",
+    "                val_scores.append(evaluate_model(x_val.T, mask_val.T))\n",
+    "            print 'Average validation CE per sentence:', numpy.mean(val_scores)\n",
+    "\n",
+    "end_time = time.clock()\n",
+    "print('Optimization complete.')\n",
+    "print('The code ran for %.2fm' % ((end_time - start_time) / 60.))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "It can take a while before the text starts to look more reasonable but here are some things to experiment with:\n",
+    "* Smarter optimization algorithms (or at least momentum)\n",
+    "* Initializing the recurrent weights orthogonally\n",
+    "* The sizes of the initial weights and biases (think about what the gates do)\n",
+    "* Different sentence prefixes\n",
+    "* Changing the temperature of the character distribution during generation. What happens when you generate deterministically?"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
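
Note on the final markdown cell of `lstm_text.ipynb`: it suggests changing the temperature of the character distribution during generation and asks what happens when you generate deterministically. The following is a minimal sketch of that idea, not code from the commit. The helper names `apply_temperature`, `sample_index`, and `greedy_index` are hypothetical, and the sketch uses only the standard library (rather than `numpy`, as the notebook does) so that it stays self-contained.

```python
import math
import random


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability vector (hypothetical helper, not in the notebook).

    temperature < 1 sharpens the distribution, temperature > 1 flattens it,
    and temperature == 1 leaves it unchanged up to rounding.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Work in log space; the small floor avoids log(0) for zero entries.
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, temperature=1.0, rng=random):
    """Draw one index from the temperature-rescaled distribution."""
    scaled = apply_temperature(probs, temperature)
    threshold = rng.random()
    cumulative = 0.0
    for index, p in enumerate(scaled):
        cumulative += p
        if threshold < cumulative:
            return index
    # Fall back to the last index if rounding keeps the sum just below 1.
    return len(scaled) - 1


def greedy_index(probs):
    """Deterministic generation: always pick the most likely index."""
    return max(range(len(probs)), key=lambda i: probs[i])
```

In the notebook's `speak` function, a rescaling like this would presumably be applied to `flat_prediction` before the call to `numpy.random.multinomial`. Deterministic generation corresponds to taking the most likely character at each step instead of sampling, which is the limit of very low temperature.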
--- a/deep-learning/theano-tutorial/rnn_tutorial/lstm_text.py
+++ b/deep-learning/theano-tutorial/rnn_tutorial/lstm_text.py
@@ -0,0 +1,299 @@
+import cPickle as pkl
+import time
+
+import numpy
+import theano
+from theano import config
+import theano.tensor as T
+from theano.tensor.nnet import categorical_crossentropy
+
+from fuel.datasets import TextFile
+from fuel.streams import DataStream
+from fuel.schemes import ConstantScheme
+from fuel.transformers import Batch, Padding
+
+
+# These files can be downloaded from
+# http://www-etud.iro.umontreal.ca/~brakelp/train.txt.gz
+# http://www-etud.iro.umontreal.ca/~brakelp/dictionary.pkl
+# don't forget to change the paths and gunzip train.txt.gz
+TRAIN_FILE = '/u/brakelp/temp/traindata.txt'
+VAL_FILE = '/u/brakelp/temp/valdata.txt'
+DICT_FILE = '/u/brakelp/temp/dictionary.pkl'
+
+
+def sequence_categorical_crossentropy(prediction, targets, mask):
+    prediction_flat = prediction.reshape(((prediction.shape[0] *
+                                           prediction.shape[1]),
+                                          prediction.shape[2]), ndim=2)
+    targets_flat = targets.flatten()
+    mask_flat = mask.flatten()
+    ce = categorical_crossentropy(prediction_flat, targets_flat)
+    return T.sum(ce * mask_flat)
+
+
+def gauss_weight(ndim_in, ndim_out=None, sd=.005):
+    if ndim_out is None:
+        ndim_out = ndim_in
+    W = numpy.random.randn(ndim_in, ndim_out) * sd
+    return numpy.asarray(W, dtype=config.floatX)
+
+
+class LogisticRegression(object):
+    """Multi-class Logistic Regression Class
+
+    The logistic regression is fully described by a weight matrix :math:`W`
+    and bias vector :math:`b`. Classification is done by projecting data
+    points onto a set of hyperplanes, the distance to which is used to
+    determine a class membership probability.
+    """
+
+    def __init__(self, input, n_in, n_out):
+        """ Initialize the parameters of the logistic regression
+
+        :type input: theano.tensor.TensorType
+        :param input: symbolic variable that describes the input of the
+                      architecture (one minibatch)
+
+        :type n_in: int
+        :param n_in: number of input units, the dimension of the space in
+                     which the datapoints lie
+
+        :type n_out: int
+        :param n_out: number of output units, the dimension of the space in
+                      which the labels lie
+
+        """
+
+        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
+        self.W = theano.shared(value=numpy.zeros((n_in, n_out),
+                                                 dtype=theano.config.floatX),
+                               name='W', borrow=True)
+        # initialize the biases b as a vector of n_out 0s
+        self.b = theano.shared(value=numpy.zeros((n_out,),
+                                                 dtype=theano.config.floatX),
+                               name='b', borrow=True)
+
+        # compute vector of class-membership probabilities in symbolic form
+        energy = T.dot(input, self.W) + self.b
+        energy_exp = T.exp(energy - T.max(energy, 2)[:, :, None])
+        pmf = energy_exp / energy_exp.sum(2)[:, :, None]
+        self.p_y_given_x = pmf
+
+        # compute prediction as class whose probability is maximal in
+        # symbolic form
+        self.y_pred = T.argmax(self.p_y_given_x, axis=2)
+
+        # parameters of the model
+        self.params = [self.W, self.b]
+
+
+def index_dot(indices, w):
+    return w[indices.flatten()]
+
+
+class LstmLayer:
+
+    def __init__(self, rng, input, mask, n_in, n_h):
+
+        # Init params
+        self.W_i = theano.shared(gauss_weight(n_in, n_h), 'W_i', borrow=True)
+        self.W_f = theano.shared(gauss_weight(n_in, n_h), 'W_f', borrow=True)
+        self.W_c = theano.shared(gauss_weight(n_in, n_h), 'W_c', borrow=True)
+        self.W_o = theano.shared(gauss_weight(n_in, n_h), 'W_o', borrow=True)
+
+        self.U_i = theano.shared(gauss_weight(n_h), 'U_i', borrow=True)
+        self.U_f = theano.shared(gauss_weight(n_h), 'U_f', borrow=True)
+        self.U_c = theano.shared(gauss_weight(n_h), 'U_c', borrow=True)
+        self.U_o = theano.shared(gauss_weight(n_h), 'U_o', borrow=True)
+
+        self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_i', borrow=True)
+        self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_f', borrow=True)
+        self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_c', borrow=True)
+        self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_o', borrow=True)
+
+        self.params = [self.W_i, self.W_f, self.W_c, self.W_o,
+                       self.U_i, self.U_f, self.U_c, self.U_o,
+                       self.b_i, self.b_f, self.b_c, self.b_o]
+
+        outputs_info = [T.zeros((input.shape[1], n_h)),
+                        T.zeros((input.shape[1], n_h))]
+
+        rval, updates = theano.scan(self._step,
+                                    sequences=[mask, input],
+                                    outputs_info=outputs_info)
+
+        # self.output is in the format (length, batchsize, n_h)
+        self.output = rval[0]
+
+    def _step(self, m_, x_, h_, c_):
+
+        i_preact = (index_dot(x_, self.W_i) +
+                    T.dot(h_, self.U_i) + self.b_i)
+        i = T.nnet.sigmoid(i_preact)
+
+        f_preact = (index_dot(x_, self.W_f) +
+                    T.dot(h_, self.U_f) + self.b_f)
+        f = T.nnet.sigmoid(f_preact)
+
+        o_preact = (index_dot(x_, self.W_o) +
+                    T.dot(h_, self.U_o) + self.b_o)
+        o = T.nnet.sigmoid(o_preact)
+
+        c_preact = (index_dot(x_, self.W_c) +
+                    T.dot(h_, self.U_c) + self.b_c)
+        c = T.tanh(c_preact)
+
+        c = f * c_ + i * c
+        c = m_[:, None] * c + (1. - m_)[:, None] * c_
+
+        h = o * T.tanh(c)
+        h = m_[:, None] * h + (1. - m_)[:, None] * h_
+
+        return h, c
+
+
+def train_model(batch_size=100, n_h=50, n_epochs=40):
+
+    # Load the datasets with Fuel
+    with open(DICT_FILE, 'rb') as dict_file:
+        dictionary = pkl.load(dict_file)
+    dictionary['~'] = len(dictionary)
+    reverse_mapping = dict((j, i) for i, j in dictionary.items())
+
+    print("Loading the data")
+    train = TextFile(files=[TRAIN_FILE],
+                     dictionary=dictionary,
+                     unk_token='~',
+                     level='character',
+                     preprocess=str.lower,
+                     bos_token=None,
+                     eos_token=None)
+
+    train_stream = DataStream.default_stream(train)
+
+    # organize data in batches and pad shorter sequences with zeros
+    train_stream = Batch(train_stream,
+                         iteration_scheme=ConstantScheme(batch_size))
+    train_stream = Padding(train_stream)
+
+    # do the same for the validation text
+    val = TextFile(files=[VAL_FILE],
+                   dictionary=dictionary,
+                   unk_token='~',
+                   level='character',
+                   preprocess=str.lower,
+                   bos_token=None,
+                   eos_token=None)
+
+    val_stream = DataStream.default_stream(val)
+
+    # organize data in batches and pad shorter sequences with zeros
+    val_stream = Batch(val_stream,
+                       iteration_scheme=ConstantScheme(batch_size))
+    val_stream = Padding(val_stream)
+
+    print('Building model')
+
+    # Set the random number generator's seed for consistency
+    rng = numpy.random.RandomState(12345)
+
+    x = T.lmatrix('x')
+    mask = T.matrix('mask')
+
+    # Construct the LSTM layer
+    recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)
+
+    logreg_layer = LogisticRegression(input=recurrent_layer.output[:-1],
+                                      n_in=n_h, n_out=111)
+
+    cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,
+                                             x[1:],
+                                             mask[1:]) / batch_size
+
+    # create a list of all model parameters to be fit by gradient descent
+    params = logreg_layer.params + recurrent_layer.params
+
+    # create a list of gradients for all model parameters
+    grads = T.grad(cost, params)
+
+    # update_model is a function that updates the model parameters by
+    # SGD. Since this model has many parameters, it would be tedious to
+    # manually create an update rule for each model parameter. We thus
+    # create the updates list by automatically looping over all
+    # (params[i], grads[i]) pairs.
+    learning_rate = 0.1
+    updates = [
+        (param_i, param_i - learning_rate * grad_i)
+        for param_i, grad_i in zip(params, grads)
+    ]
+
+    update_model = theano.function([x, mask], cost, updates=updates)
+
+    evaluate_model = theano.function([x, mask], cost)
+
+    # Define and compile a function for generating a sequence step by step.
+    x_t = T.iscalar()
+    h_p = T.vector()
+    c_p = T.vector()
+    h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)
+    energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b
+
+    energy_exp = T.exp(energy - T.max(energy, 1)[:, None])
+
+    output = energy_exp / energy_exp.sum(1)[:, None]
+    single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])
+
+    start_time = time.clock()
+
+    iteration = 0
+
+    for epoch in range(n_epochs):
+        print 'epoch:', epoch
+
+        for x_, mask_ in train_stream.get_epoch_iterator():
+            iteration += 1
+
+            cross_entropy = update_model(x_.T, mask_.T)
+
+
+            # Generate some text after every 40 minibatches
+            if iteration % 40 == 0:
+                try:
+                    prediction = numpy.ones(111, dtype=config.floatX) / 111.0
+                    h_p = numpy.zeros((n_h,), dtype=config.floatX)
+                    c_p = numpy.zeros((n_h,), dtype=config.floatX)
+                    initial = 'the meaning of life is '
+                    sentence = initial
+                    for char in initial:
+                        x_t = dictionary[char]
+                        prediction, h_p, c_p = single_step(x_t, h_p.flatten(),
+                                                           c_p.flatten())
+                    sample = numpy.random.multinomial(1, prediction.flatten())
+                    for i in range(450):
+                        x_t = numpy.argmax(sample)
+                        prediction, h_p, c_p = single_step(x_t, h_p.flatten(),
+                                                           c_p.flatten())
+                        sentence += reverse_mapping[x_t]
+                        sample = numpy.random.multinomial(1, prediction.flatten())
+                    print 'LSTM: "' + sentence + '"'
+                except ValueError:
+                    print 'Something went wrong during sentence generation.'
+
+            if iteration % 40 == 0:
+                print 'epoch:', epoch, '  minibatch:', iteration
+                val_scores = []
+                for x_val, mask_val in val_stream.get_epoch_iterator():
+                    val_scores.append(evaluate_model(x_val.T, mask_val.T))
+                print 'Average validation CE per sentence:', numpy.mean(val_scores)
+
+    end_time = time.clock()
+    print('Optimization complete.')
+    print('The code ran for %.2fm' % ((end_time - start_time) / 60.))
+
+
+if __name__ == '__main__':
+    train_model()
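The masking lines in `LstmLayer._step` (`m_[:, None] * h + (1. - m_)[:, None] * h_`) are what keep padded sequences from advancing their state. The following is a minimal plain-Python sketch of that update, not the Theano code itself; the helper name `masked_update` and the toy values are assumptions made for illustration.

```python
def masked_update(mask, new_state, prev_state):
    """Blend states per batch row, as in m * new + (1 - m) * prev.

    mask:       one float per batch row, 1.0 for a real token, 0.0 for padding
    new_state:  candidate state for this time step, one row per batch item
    prev_state: state carried over from the previous time step
    """
    return [
        [m * n + (1.0 - m) * p for n, p in zip(new_row, prev_row)]
        for m, new_row, prev_row in zip(mask, new_state, prev_state)
    ]


# Hypothetical batch of two sequences: the first is still running,
# the second has already ended and is being padded.
mask = [1.0, 0.0]
h_prev = [[0.1, 0.2], [0.3, 0.4]]
h_new = [[0.9, 0.8], [0.7, 0.6]]

h = masked_update(mask, h_new, h_prev)
print(h)  # row 0 takes the new state, row 1 keeps the previous one
```

Because the mask is exactly 0 or 1, the padded row is copied through unchanged, so the final hidden state of a short sequence is the one computed at its last real character.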
--- a/deep-learning/theano-tutorial/rnn_tutorial/rnn_lstm.pdf
+++ b/deep-learning/theano-tutorial/rnn_tutorial/rnn_lstm.pdf
--- a/deep-learning/theano-tutorial/rnn_tutorial/rnn_precompile.py
+++ b/deep-learning/theano-tutorial/rnn_tutorial/rnn_precompile.py
@ -0,0 +1,234 @@
+"""This file is only here to speed up the execution of notebooks.
+
+It contains a subset of the code defined in simple_rnn.ipynb and
+lstm_text.ipynb, in particular the code that compiles Theano functions.
+Executing this script first will populate the cache of compiled C code,
+which will make subsequent compilations faster.
+
+The use case is to run this script in the background when a demo VM,
+such as the one for NVIDIA's qwikLABS, starts up, so that the compilation
+phase started from the notebooks is faster.
+
+"""
+import numpy
+
+import theano
+import theano.tensor as T
+
+from theano import config
+from theano.tensor.nnet import categorical_crossentropy
+
+
+floatX = theano.config.floatX
+
+
+# simple_rnn.ipynb
+
+class SimpleRNN(object):
+    def __init__(self, input_dim, recurrent_dim):
+        w_xh = numpy.random.normal(0, .01, (input_dim, recurrent_dim))
+        w_hh = numpy.random.normal(0, .02, (recurrent_dim, recurrent_dim))
+        self.w_xh = theano.shared(numpy.asarray(w_xh, dtype=floatX), name='w_xh')
+        self.w_hh = theano.shared(numpy.asarray(w_hh, dtype=floatX), name='w_hh')
+        self.b_h = theano.shared(numpy.zeros((recurrent_dim,), dtype=floatX), name='b_h')
+        self.parameters = [self.w_xh, self.w_hh, self.b_h]
+
+    def _step(self, input_t, previous):
+        return T.tanh(T.dot(previous, self.w_hh) + input_t)
+
+    def __call__(self, x):
+        x_w_xh = T.dot(x, self.w_xh) + self.b_h
+        result, updates = theano.scan(self._step,
+                                      sequences=[x_w_xh],
+                                      outputs_info=[T.zeros_like(self.b_h)])
+        return result
+
+
+w_ho_np = numpy.random.normal(0, .01, (15, 1))
+w_ho = theano.shared(numpy.asarray(w_ho_np, dtype=floatX), name='w_ho')
+b_o = theano.shared(numpy.zeros((1,), dtype=floatX), name='b_o')
+
+x = T.matrix('x')
+my_rnn = SimpleRNN(1, 15)
+hidden = my_rnn(x)
+prediction = T.dot(hidden, w_ho) + b_o
+parameters = my_rnn.parameters + [w_ho, b_o]
+l2 = sum((p**2).sum() for p in parameters)
+mse = T.mean((prediction[:-1] - x[1:])**2)
+cost = mse + .0001 * l2
+gradient = T.grad(cost, wrt=parameters)
+
+lr = .3
+updates = [(par, par - lr * gra) for par, gra in zip(parameters, gradient)]
+update_model = theano.function([x], cost, updates=updates)
+get_cost = theano.function([x], mse)
+predict = theano.function([x], prediction)
+get_hidden = theano.function([x], hidden)
+get_gradient = theano.function([x], gradient)
+
+# Generating sequences
+
+x_t = T.vector()
+h_p = T.vector()
+preactivation = T.dot(x_t, my_rnn.w_xh) + my_rnn.b_h
+h_t = my_rnn._step(preactivation, h_p)
+o_t = T.dot(h_t, w_ho) + b_o
+
+single_step = theano.function([x_t, h_p], [o_t, h_t])
+
+# lstm_text.ipynb
+
+def gauss_weight(rng, ndim_in, ndim_out=None, sd=.005):
+    if ndim_out is None:
+        ndim_out = ndim_in
+    W = rng.randn(ndim_in, ndim_out) * sd
+    return numpy.asarray(W, dtype=config.floatX)
+
+
+def index_dot(indices, w):
+    return w[indices.flatten()]
+
+
+class LstmLayer:
+
+    def __init__(self, rng, input, mask, n_in, n_h):
+
+        # Init params
+        self.W_i = theano.shared(gauss_weight(rng, n_in, n_h), 'W_i', borrow=True)
+        self.W_f = theano.shared(gauss_weight(rng, n_in, n_h), 'W_f', borrow=True)
+        self.W_c = theano.shared(gauss_weight(rng, n_in, n_h), 'W_c', borrow=True)
+        self.W_o = theano.shared(gauss_weight(rng, n_in, n_h), 'W_o', borrow=True)
+
+        self.U_i = theano.shared(gauss_weight(rng, n_h), 'U_i', borrow=True)
+        self.U_f = theano.shared(gauss_weight(rng, n_h), 'U_f', borrow=True)
+        self.U_c = theano.shared(gauss_weight(rng, n_h), 'U_c', borrow=True)
+        self.U_o = theano.shared(gauss_weight(rng, n_h), 'U_o', borrow=True)
+
+        self.b_i = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_i', borrow=True)
+        self.b_f = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_f', borrow=True)
+        self.b_c = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_c', borrow=True)
+        self.b_o = theano.shared(numpy.zeros((n_h,), dtype=config.floatX),
+                                 'b_o', borrow=True)
+
+        self.params = [self.W_i, self.W_f, self.W_c, self.W_o,
+                       self.U_i, self.U_f, self.U_c, self.U_o,
+                       self.b_i, self.b_f, self.b_c, self.b_o]
+
+        outputs_info = [T.zeros((input.shape[1], n_h)),
+                        T.zeros((input.shape[1], n_h))]
+
+        rval, updates = theano.scan(self._step,
+                                    sequences=[mask, input],
+                                    outputs_info=outputs_info)
+
+        # self.output is in the format (length, batchsize, n_h)
+        self.output = rval[0]
+
+    def _step(self, m_, x_, h_, c_):
+
+        i_preact = (index_dot(x_, self.W_i) +
+                    T.dot(h_, self.U_i) + self.b_i)
+        i = T.nnet.sigmoid(i_preact)
+
+        f_preact = (index_dot(x_, self.W_f) +
+                    T.dot(h_, self.U_f) + self.b_f)
+        f = T.nnet.sigmoid(f_preact)
+
+        o_preact = (index_dot(x_, self.W_o) +
+                    T.dot(h_, self.U_o) + self.b_o)
+        o = T.nnet.sigmoid(o_preact)
+
+        c_preact = (index_dot(x_, self.W_c) +
+                    T.dot(h_, self.U_c) + self.b_c)
+        c = T.tanh(c_preact)
+
+        c = f * c_ + i * c
+        c = m_[:, None] * c + (1. - m_)[:, None] * c_
+
+        h = o * T.tanh(c)
+        h = m_[:, None] * h + (1. - m_)[:, None] * h_
+
+        return h, c
+
+
+def sequence_categorical_crossentropy(prediction, targets, mask):
+    prediction_flat = prediction.reshape(((prediction.shape[0] *
+                                           prediction.shape[1]),
+                                          prediction.shape[2]), ndim=2)
+    targets_flat = targets.flatten()
+    mask_flat = mask.flatten()
+    ce = categorical_crossentropy(prediction_flat, targets_flat)
+    return T.sum(ce * mask_flat)
+
+
+class LogisticRegression(object):
+
+    def __init__(self, rng, input, n_in, n_out):
+
+        W = gauss_weight(rng, n_in, n_out)
+        self.W = theano.shared(value=numpy.asarray(W, dtype=theano.config.floatX),
+                               name='W', borrow=True)
+        # initialize the biases b as a vector of n_out 0s
+        self.b = theano.shared(value=numpy.zeros((n_out,),
+                                                 dtype=theano.config.floatX),
+                               name='b', borrow=True)
+
+        # compute vector of class-membership probabilities in symbolic form
+        energy = T.dot(input, self.W) + self.b
+        energy_exp = T.exp(energy - T.max(energy, axis=2, keepdims=True))
+        pmf = energy_exp / energy_exp.sum(axis=2, keepdims=True)
+        self.p_y_given_x = pmf
+        self.params = [self.W, self.b]
+
+batch_size = 100
+n_h = 50
+
+# The Theano graph
+# Set the random number generator's seed for consistency
+rng = numpy.random.RandomState(12345)
+
+x = T.lmatrix('x')
+mask = T.matrix('mask')
+
+# Construct an LSTM layer and a logistic regression layer
+recurrent_layer = LstmLayer(rng=rng, input=x, mask=mask, n_in=111, n_h=n_h)
+logreg_layer = LogisticRegression(rng=rng, input=recurrent_layer.output[:-1],
+                                  n_in=n_h, n_out=111)
+
+# define a cost variable to optimize
+cost = sequence_categorical_crossentropy(logreg_layer.p_y_given_x,
+                                         x[1:],
+                                         mask[1:]) / batch_size
+
+# create a list of all model parameters to be fit by gradient descent
+params = logreg_layer.params + recurrent_layer.params
+
+# create a list of gradients for all model parameters
+grads = T.grad(cost, params)
+
+learning_rate = 0.1
+updates = [
+    (param_i, param_i - learning_rate * grad_i)
+    for param_i, grad_i in zip(params, grads)
+]
+
+update_model = theano.function([x, mask], cost, updates=updates)
+
+evaluate_model = theano.function([x, mask], cost)
+
+# Generating Sequences
+x_t = T.iscalar()
+h_p = T.vector()
+c_p = T.vector()
+h_t, c_t = recurrent_layer._step(T.ones(1), x_t, h_p, c_p)
+energy = T.dot(h_t, logreg_layer.W) + logreg_layer.b
+
+energy_exp = T.exp(energy - T.max(energy, axis=1, keepdims=True))
+
+output = energy_exp / energy_exp.sum(axis=1, keepdims=True)
+single_step = theano.function([x_t, h_p, c_p], [output, h_t, c_t])
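`sequence_categorical_crossentropy` flattens the (length, batch) axes, computes a per-position cross-entropy, and sums only the positions where the mask is 1. A minimal plain-Python sketch of the same quantity follows; it uses nested lists instead of Theano tensors, and the function name and toy inputs are assumptions chosen for illustration.

```python
import math


def masked_sequence_cross_entropy(probs, targets, mask):
    """Sum -log p[target] over all positions whose mask entry is 1.

    probs:   probs[t][b] is a list of class probabilities
    targets: targets[t][b] is the integer index of the correct class
    mask:    mask[t][b] is 1.0 for a real token and 0.0 for padding
    """
    total = 0.0
    for probs_t, targets_t, mask_t in zip(probs, targets, mask):
        for p, y, m in zip(probs_t, targets_t, mask_t):
            total += -math.log(p[y]) * m
    return total


# Hypothetical input: 2 time steps, batch of 2, 2 classes.
# The last position of the second sequence is padding.
probs = [[[0.5, 0.5], [0.25, 0.75]],
         [[0.5, 0.5], [0.9, 0.1]]]
targets = [[0, 1], [1, 0]]
mask = [[1.0, 1.0], [1.0, 0.0]]

loss = masked_sequence_cross_entropy(probs, targets, mask)
print(loss)
```

The script above then divides this sum by `batch_size`, so the reported cost is a per-sequence total rather than a per-character average.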
--- a/deep-learning/theano-tutorial/rnn_tutorial/simple_rnn.ipynb
+++ b/deep-learning/theano-tutorial/rnn_tutorial/simple_rnn.ipynb
--- a/deep-learning/theano-tutorial/rnn_tutorial/synthetic.py
+++ b/deep-learning/theano-tutorial/rnn_tutorial/synthetic.py
@ -0,0 +1,85 @@
+import collections
+import numpy as np
+
+
+def mackey_glass(sample_len=1000, tau=17, seed=None, n_samples=1):
+    '''
+    mackey_glass(sample_len=1000, tau=17, seed=None, n_samples=1) -> input
+    Generate the Mackey Glass time-series. Parameters are:
+        - sample_len: length of the time-series in timesteps. Default is 1000.
+        - tau: delay of the MG - system. Commonly used values are tau=17 (mild 
+          chaos) and tau=30 (moderate chaos). Default is 17.
+        - seed: to seed the random generator, can be used to generate the same
+          timeseries at each invocation.
+        - n_samples : number of samples to generate
+    '''
+    delta_t = 10
+    history_len = tau * delta_t 
+    # Initial conditions for the history of the system
+    timeseries = 1.2
+    
+    if seed is not None:
+        np.random.seed(seed)
+
+    samples = []
+
+    for _ in range(n_samples):
+        history = collections.deque(1.2 * np.ones(history_len) + 0.2 * \
+                                    (np.random.rand(history_len) - 0.5))
+        # Preallocate the array for the time-series
+        inp = np.zeros((sample_len,1))
+        
+        for timestep in range(sample_len):
+            for _ in range(delta_t):
+                xtau = history.popleft()
+                history.append(timeseries)
+                timeseries = history[-1] + (0.2 * xtau / (1.0 + xtau ** 10) - \
+                             0.1 * history[-1]) / delta_t
+            inp[timestep] = timeseries
+        
+        # Squash timeseries through tanh
+        inp = np.tanh(inp - 1)
+        samples.append(inp)
+    return samples
+
+
+def mso(sample_len=1000, n_samples=1):
+    '''
+    mso(sample_len=1000, n_samples=1) -> input
+    Generate the Multiple Sinewave Oscillator time-series, a sum of two sines
+    with incommensurable periods. Parameters are:
+        - sample_len: length of the time-series in timesteps
+        - n_samples: number of samples to generate
+    '''
+    signals = []
+    for _ in range(n_samples):
+        phase = np.random.rand()
+        x = np.atleast_2d(np.arange(sample_len)).T
+        signals.append(np.sin(0.2 * x + phase) + np.sin(0.311 * x + phase))
+    return signals
+
+
+def lorentz(sample_len=1000, sigma=10, rho=28, beta=8.0 / 3, step=0.01):
+    """Generate a Lorenz time series of length sample_len, with the
+    standard parameters sigma, rho and beta. The default beta is written
+    as 8.0 / 3 so that Python 2 does not truncate it to 2.
+    """
+
+    x = np.zeros([sample_len])
+    y = np.zeros([sample_len])
+    z = np.zeros([sample_len])
+
+    # Initial conditions taken from 'Chaos and Time Series Analysis', J. Sprott
+    x[0] = 0
+    y[0] = -0.01
+    z[0] = 9
+
+    for t in range(sample_len - 1):
+        x[t + 1] = x[t] + sigma * (y[t] - x[t]) * step
+        y[t + 1] = y[t] + (x[t] * (rho - z[t]) - y[t]) * step
+        z[t + 1] = z[t] + (x[t] * y[t] - beta * z[t]) * step
+
+    x.shape += (1,)
+    y.shape += (1,)
+    z.shape += (1,)
+
+    return np.concatenate((x, y, z), axis=1)
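The loop in `lorentz` is a forward Euler integration of the Lorenz system: each coordinate is advanced by its derivative times `step`. The sketch below isolates a single update in plain Python so the arithmetic can be checked by hand; the helper name `lorenz_euler_step` is an assumption for illustration and is not part of synthetic.py.

```python
def lorenz_euler_step(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                      step=0.01):
    """Advance the Lorenz system by one forward Euler step."""
    dx = sigma * (y - x)       # same terms as the x[t + 1] update
    dy = x * (rho - z) - y     # same terms as the y[t + 1] update
    dz = x * y - beta * z      # same terms as the z[t + 1] update
    return x + dx * step, y + dy * step, z + dz * step


# Start from the initial conditions used in lorentz().
state = (0.0, -0.01, 9.0)
state = lorenz_euler_step(*state)
print(state)  # approximately (-0.001, -0.0099, 8.76)
```

Forward Euler is simple but only first-order accurate, so the trajectory depends on `step`; a smaller step or a higher-order integrator would give a different (and generally more faithful) series.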