{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Credits: Forked from [deep-learning-keras-tensorflow](https://github.com/leriomaggio/deep-learning-keras-tensorflow) by Valerio Maggio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Recurrent Neural Networks\n",
    "====="
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### RNN "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "<img src =\"imgs/rnn.png\" width=\"20%\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A recurrent neural network (RNN) is a class of artificial neural network in which connections between units form a directed cycle. This cycle gives the network an internal state, which allows it to exhibit dynamic temporal behavior."
   ]
  },
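  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the recurrence concrete, the cell below is a minimal NumPy sketch of a single RNN layer unrolled over a short sequence, $h_t = \\tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$. The weight names are illustrative only; this is not the Keras implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Minimal NumPy sketch of the RNN recurrence (illustrative names, not Keras internals)\n",
    "import numpy as np\n",
    "\n",
    "input_dim, hidden_dim, timesteps = 3, 4, 5\n",
    "rng = np.random.RandomState(0)\n",
    "\n",
    "W_xh = rng.randn(input_dim, hidden_dim) * 0.1   # input-to-hidden weights\n",
    "W_hh = rng.randn(hidden_dim, hidden_dim) * 0.1  # hidden-to-hidden (recurrent) weights\n",
    "b_h = np.zeros(hidden_dim)                      # hidden bias\n",
    "\n",
    "x = rng.randn(timesteps, input_dim)  # one input sequence\n",
    "h = np.zeros(hidden_dim)             # initial hidden state\n",
    "\n",
    "for t in range(timesteps):\n",
    "    # the same weights are shared across all time steps\n",
    "    h = np.tanh(x[t].dot(W_xh) + h.dot(W_hh) + b_h)\n",
    "    print('h at t=%d: %s' % (t, h))"
   ]
  },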
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "keras.layers.recurrent.SimpleRNN(output_dim, \n",
    "                                 init='glorot_uniform', inner_init='orthogonal', activation='tanh', \n",
    "                                 W_regularizer=None, U_regularizer=None, b_regularizer=None, \n",
    "                                 dropout_W=0.0, dropout_U=0.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Backpropagation Through Time (BPTT)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In contrast to a feed-forward neural network, an RNN can encode information from earlier time steps, which makes it well suited to sequential data. Backpropagation Through Time (BPTT) extends the ordinary backpropagation algorithm to the recurrent\n",
    "architecture: the network is unrolled over the time steps of the input sequence, and gradients are accumulated across the unrolled copies."
   ]
  },
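  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of the idea: unrolling the recurrence $h_t = \\tanh(W x_t + U h_{t-1} + b)$ over $T$ steps, the gradient of the loss $L$ with respect to the recurrent weights $U$ accumulates a term from every time step:\n",
    "\n",
    "$$\\frac{\\partial L}{\\partial U} = \\sum_{t=1}^{T} \\frac{\\partial L}{\\partial h_T} \\left( \\prod_{k=t+1}^{T} \\frac{\\partial h_k}{\\partial h_{k-1}} \\right) \\frac{\\partial h_t}{\\partial U}$$\n",
    "\n",
    "The repeated product of Jacobians is why gradients tend to vanish or explode over long time spans, which motivates the LSTM and GRU units below."
   ]
  },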
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "source": [
    "<img src =\"imgs/rnn2.png\" width=\"45%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using Theano backend.\n"
     ]
    }
   ],
   "source": [
    "from __future__ import print_function  # must precede every other statement\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import theano\n",
    "import theano.tensor as T\n",
    "import keras\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Dropout, Activation, Flatten\n",
    "from keras.layers import Convolution2D, MaxPooling2D\n",
    "from keras.layers.embeddings import Embedding\n",
    "from keras.layers.recurrent import LSTM, GRU, SimpleRNN\n",
    "from keras.layers.core import TimeDistributedDense, RepeatVector\n",
    "from keras.datasets import imdb\n",
    "from keras.datasets import mnist\n",
    "from keras.preprocessing import image, sequence\n",
    "from keras.utils import np_utils\n",
    "from keras.callbacks import EarlyStopping, ModelCheckpoint\n",
    "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
    "from sklearn.cross_validation import train_test_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### IMDB sentiment classification task"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. \n",
    "\n",
    "It provides a set of 25,000 highly polar movie reviews for training and 25,000 for testing. \n",
    "\n",
    "There is additional unlabeled data available as well. Both raw text and an already-processed bag-of-words format are provided. \n",
    "\n",
    "http://ai.stanford.edu/~amaas/data/sentiment/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Data Preparation - IMDB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading data...\n",
      "20000 train sequences\n",
      "5000 test sequences\n",
      "Example:\n",
      "[ [1, 20, 28, 716, 48, 495, 79, 27, 493, 8, 5067, 7, 50, 5, 4682, 13075, 10, 5, 852, 157, 11, 5, 1716, 3351, 10, 5, 500, 7308, 6, 33, 256, 41, 13610, 7, 17, 23, 48, 1537, 3504, 26, 269, 929, 18, 2, 7, 2, 4284, 8, 105, 5, 2, 182, 314, 38, 98, 103, 7, 36, 2184, 246, 360, 7, 19, 396, 17, 26, 269, 929, 18, 1769, 493, 6, 116, 7, 105, 5, 575, 182, 27, 5, 1002, 1085, 130, 62, 17, 24, 89, 17, 13, 381, 1421, 8, 5167, 7, 5, 2723, 38, 325, 7, 17, 23, 93, 9, 156, 252, 19, 235, 20, 28, 5, 104, 76, 7, 17, 169, 35, 14764, 17, 23, 1460, 7, 36, 2184, 934, 56, 2134, 6, 17, 891, 214, 11, 5, 1552, 6, 92, 6, 33, 256, 82, 7]]\n",
      "Pad sequences (samples x time)\n",
      "X_train shape: (20000L, 100L)\n",
      "X_test shape: (5000L, 100L)\n"
     ]
    }
   ],
   "source": [
    "max_features = 20000\n",
    "maxlen = 100  # cut texts after this number of words (among top max_features most common words)\n",
    "batch_size = 32\n",
    "\n",
    "print(\"Loading data...\")\n",
    "(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features, test_split=0.2)\n",
    "print(len(X_train), 'train sequences')\n",
    "print(len(X_test), 'test sequences')\n",
    "\n",
    "print('Example:')\n",
    "print(X_train[:1])\n",
    "\n",
    "print(\"Pad sequences (samples x time)\")\n",
    "X_train = sequence.pad_sequences(X_train, maxlen=maxlen)\n",
    "X_test = sequence.pad_sequences(X_test, maxlen=maxlen)\n",
    "print('X_train shape:', X_train.shape)\n",
    "print('X_test shape:', X_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Model building "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Build model...\n",
      "Train...\n",
      "Train on 20000 samples, validate on 5000 samples\n",
      "Epoch 1/1\n",
      "20000/20000 [==============================] - 174s - loss: 0.7213 - val_loss: 0.6179\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x20519860>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print('Build model...')\n",
    "model = Sequential()\n",
    "model.add(Embedding(max_features, 128, input_length=maxlen))\n",
    "model.add(SimpleRNN(128))\n",
    "model.add(Dropout(0.5))\n",
    "model.add(Dense(1))\n",
    "model.add(Activation('sigmoid'))\n",
    "\n",
    "# try using different optimizers and different optimizer configs\n",
    "model.compile(loss='binary_crossentropy', optimizer='adam', class_mode=\"binary\")\n",
    "\n",
    "print(\"Train...\")\n",
    "model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=1, validation_data=(X_test, y_test), show_accuracy=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LSTM "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An LSTM network is an artificial neural network that contains LSTM blocks instead of, or in addition to, regular network units. An LSTM block may be described as a \"smart\" network unit that can remember a value for an arbitrary length of time. \n",
    "\n",
    "Unlike a traditional RNN, a Long Short-Term Memory network is well suited to learning from experience to classify, process, and predict time series when there are very long time lags of unknown size between important events."
   ]
  },
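  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the standard formulation (common textbook notation, not Keras variable names), the input, forget, and output gates and the cell state at step $t$ are:\n",
    "\n",
    "$$\\begin{aligned}\n",
    "i_t &= \\sigma(W_i x_t + U_i h_{t-1} + b_i) \\\\\n",
    "f_t &= \\sigma(W_f x_t + U_f h_{t-1} + b_f) \\\\\n",
    "o_t &= \\sigma(W_o x_t + U_o h_{t-1} + b_o) \\\\\n",
    "\\tilde{c}_t &= \\tanh(W_c x_t + U_c h_{t-1} + b_c) \\\\\n",
    "c_t &= f_t \\odot c_{t-1} + i_t \\odot \\tilde{c}_t \\\\\n",
    "h_t &= o_t \\odot \\tanh(c_t)\n",
    "\\end{aligned}$$\n",
    "\n",
    "The additive update of the cell state $c_t$ is what lets gradients flow across long time lags."
   ]
  },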
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "source": [
    "<img src =\"imgs/gru.png\" width=\"60%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "keras.layers.recurrent.LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal', \n",
    "                            forget_bias_init='one', activation='tanh', \n",
    "                            inner_activation='hard_sigmoid', \n",
    "                            W_regularizer=None, U_regularizer=None, b_regularizer=None, \n",
    "                            dropout_W=0.0, dropout_U=0.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GRU "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gated recurrent units are a gating mechanism in recurrent neural networks. \n",
    "\n",
    "They are similar to LSTMs, but have fewer parameters because they lack an output gate."
   ]
  },
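  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the standard notation (again, not Keras variable names), the update gate $z_t$, reset gate $r_t$, and state $h_t$ are:\n",
    "\n",
    "$$\\begin{aligned}\n",
    "z_t &= \\sigma(W_z x_t + U_z h_{t-1} + b_z) \\\\\n",
    "r_t &= \\sigma(W_r x_t + U_r h_{t-1} + b_r) \\\\\n",
    "\\tilde{h}_t &= \\tanh(W_h x_t + U_h (r_t \\odot h_{t-1}) + b_h) \\\\\n",
    "h_t &= (1 - z_t) \\odot h_{t-1} + z_t \\odot \\tilde{h}_t\n",
    "\\end{aligned}$$\n",
    "\n",
    "A single state vector $h_t$ plays the role of both the LSTM's cell and output, which is where the parameter savings come from."
   ]
  },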
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "keras.layers.recurrent.GRU(output_dim, init='glorot_uniform', inner_init='orthogonal', \n",
    "                           activation='tanh', inner_activation='hard_sigmoid', \n",
    "                           W_regularizer=None, U_regularizer=None, b_regularizer=None, \n",
    "                           dropout_W=0.0, dropout_U=0.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Your Turn! - Hands-on RNN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "print('Build model...')\n",
    "model = Sequential()\n",
    "model.add(Embedding(max_features, 128, input_length=maxlen))\n",
    "\n",
    "# Play with these! Try to get better results!\n",
    "# (uncomment exactly one recurrent layer before running)\n",
    "#model.add(SimpleRNN(128))\n",
    "#model.add(GRU(128))\n",
    "#model.add(LSTM(128))\n",
    "\n",
    "model.add(Dropout(0.5))\n",
    "model.add(Dense(1))\n",
    "model.add(Activation('sigmoid'))\n",
    "\n",
    "# try using different optimizers and different optimizer configs\n",
    "model.compile(loss='binary_crossentropy', optimizer='adam', class_mode=\"binary\")\n",
    "\n",
    "print(\"Train...\")\n",
    "model.fit(X_train, y_train, batch_size=batch_size, \n",
    "          nb_epoch=4, validation_data=(X_test, y_test), show_accuracy=True)\n",
    "score, acc = model.evaluate(X_test, y_test, batch_size=batch_size, show_accuracy=True)\n",
    "print('Test score:', score)\n",
    "print('Test accuracy:', acc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sentence Generation Using an RNN (LSTM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt\n",
      "598016/600901 [============================>.] - ETA: 0s('corpus length:', 600901)\n",
      "('total chars:', 59)\n",
      "('nb sequences:', 200287)\n",
      "Vectorization...\n",
      "Build model...\n",
      "()\n",
      "--------------------------------------------------\n",
      "('Iteration', 1)\n",
      "Epoch 1/1\n",
      "200287/200287 [==============================] - 1367s - loss: 1.9977 \n",
      "()\n",
      "('----- diversity:', 0.2)\n",
      "----- Generating with seed: \"nd the frenzied\n",
      "speeches of the prophets\"\n",
      "nd the frenzied\n",
      "speeches of the prophets and the present and and the preases and the soul to the sense of the morals and the some the consequence of the most and one only the some of the proment and interent of the some devertal to the self-consertion of the some deverent of the some distiness and the sense of the some of the morality of the most proves and the some of the some in the seem of the self-conception of the sees of the sense()\n",
      "()\n",
      "('----- diversity:', 0.5)\n",
      "----- Generating with seed: \"nd the frenzied\n",
      "speeches of the prophets\"\n",
      "nd the frenzied\n",
      "speeches of the prophets of the preat weak to the master of man who onow in interervain of even which who with it is the isitaial conception of the some live the contented the one who exilfacied in the sees to raters, and the passe expecience the inte that the persented in the pass, in the experious of the soulity of the waith the morally distanding of the some of the most interman only and as a period of the sense and o()\n",
      "()\n",
      "('----- diversity:', 1.0)\n",
      "----- Generating with seed: \"nd the frenzied\n",
      "speeches of the prophets\"\n",
      "nd the frenzied\n",
      "speeches of the prophets of\n",
      "ar self now no ecerspoped ivent so not,\n",
      "that itsed undiswerbatarlials. what it is altrenively evok\n",
      "now be scotnew\n",
      "prigardiness intagualds, and coumond-grow to\n",
      "the respence you as penires never wand be\n",
      "natuented ost ablinice to love worts an who itnopeancew be than mrank againribl\n",
      "some something lines in the estlenbtupenies of korils divenowry apmains, curte, were,\n",
      "ind \"feulness. a will, natur()\n",
      "()\n",
      "('----- diversity:', 1.2)\n",
      "----- Generating with seed: \"nd the frenzied\n",
      "speeches of the prophets\"\n",
      "nd the frenzied\n",
      "speeches of the prophets, ind someaterting will stroour hast-fards and lofe beausold, in souby in ruarest, we withquus. \"the capinistin and it a mode what it be\n",
      "my oc, to th[se condectay\n",
      "of ymo fre\n",
      "dunt and so asexthersess renieved concecunaulies tound\"), from glubiakeitiouals kenty am feelitafouer deceanw or sumpind, and by afolod peall--phasoos of sole\n",
      "iy copprajakias\n",
      "in\n",
      "in adcyont-mean to prives apf-rigionall thust wi()\n",
      "()\n",
      "--------------------------------------------------\n",
      "('Iteration', 2)\n",
      "Epoch 1/1\n",
      " 40576/200287 [=====>........................] - ETA: 1064s - loss: 1.6878"
     ]
    }
   ],
   "source": [
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Activation, Dropout\n",
    "from keras.layers import LSTM\n",
    "from keras.optimizers import RMSprop\n",
    "from keras.utils.data_utils import get_file\n",
    "import numpy as np\n",
    "import random\n",
    "import sys\n",
    "\n",
    "path = get_file('nietzsche.txt', origin=\"https://s3.amazonaws.com/text-datasets/nietzsche.txt\")\n",
    "text = open(path).read().lower()\n",
    "print('corpus length:', len(text))\n",
    "\n",
    "chars = sorted(list(set(text)))\n",
    "print('total chars:', len(chars))\n",
    "char_indices = dict((c, i) for i, c in enumerate(chars))\n",
    "indices_char = dict((i, c) for i, c in enumerate(chars))\n",
    "\n",
    "# cut the text in semi-redundant sequences of maxlen characters\n",
    "maxlen = 40\n",
    "step = 3\n",
    "sentences = []\n",
    "next_chars = []\n",
    "for i in range(0, len(text) - maxlen, step):\n",
    "    sentences.append(text[i: i + maxlen])\n",
    "    next_chars.append(text[i + maxlen])\n",
    "print('nb sequences:', len(sentences))\n",
    "\n",
    "print('Vectorization...')\n",
    "X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)\n",
    "y = np.zeros((len(sentences), len(chars)), dtype=np.bool)\n",
    "for i, sentence in enumerate(sentences):\n",
    "    for t, char in enumerate(sentence):\n",
    "        X[i, t, char_indices[char]] = 1\n",
    "    y[i, char_indices[next_chars[i]]] = 1\n",
    "\n",
    "\n",
    "# build the model: a single LSTM\n",
    "print('Build model...')\n",
    "model = Sequential()\n",
    "model.add(LSTM(128, input_shape=(maxlen, len(chars))))\n",
    "model.add(Dense(len(chars)))\n",
    "model.add(Activation('softmax'))\n",
    "\n",
    "optimizer = RMSprop(lr=0.01)\n",
    "model.compile(loss='categorical_crossentropy', optimizer=optimizer)\n",
    "\n",
    "\n",
    "def sample(preds, temperature=1.0):\n",
    "    # helper function to sample an index from a probability array\n",
    "    preds = np.asarray(preds).astype('float64')\n",
    "    preds = np.log(preds) / temperature\n",
    "    exp_preds = np.exp(preds)\n",
    "    preds = exp_preds / np.sum(exp_preds)\n",
    "    probas = np.random.multinomial(1, preds, 1)\n",
    "    return np.argmax(probas)\n",
    "\n",
    "# train the model, output generated text after each iteration\n",
    "for iteration in range(1, 60):\n",
    "    print()\n",
    "    print('-' * 50)\n",
    "    print('Iteration', iteration)\n",
    "    model.fit(X, y, batch_size=128, nb_epoch=1)\n",
    "\n",
    "    start_index = random.randint(0, len(text) - maxlen - 1)\n",
    "\n",
    "    for diversity in [0.2, 0.5, 1.0, 1.2]:\n",
    "        print()\n",
    "        print('----- diversity:', diversity)\n",
    "\n",
    "        generated = ''\n",
    "        sentence = text[start_index: start_index + maxlen]\n",
    "        generated += sentence\n",
    "        print('----- Generating with seed: \"' + sentence + '\"')\n",
    "        sys.stdout.write(generated)\n",
    "\n",
    "        for i in range(400):\n",
    "            x = np.zeros((1, maxlen, len(chars)))\n",
    "            for t, char in enumerate(sentence):\n",
    "                x[0, t, char_indices[char]] = 1.\n",
    "\n",
    "            preds = model.predict(x, verbose=0)[0]\n",
    "            next_index = sample(preds, diversity)\n",
    "            next_char = indices_char[next_index]\n",
    "\n",
    "            generated += next_char\n",
    "            sentence = sentence[1:] + next_char\n",
    "\n",
    "            sys.stdout.write(next_char)\n",
    "            sys.stdout.flush()\n",
    "        print()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}