Add tensor-flow-basic-tutorials

Tuan Vu 2016-05-14 21:07:33 -07:00
parent c1ed46758c
commit 210edeabf5
10 changed files with 1562 additions and 0 deletions

.DS_Store (binary file, not shown)


@@ -80,6 +80,18 @@ IPython Notebook(s) demonstrating deep learning functionality.
<img src="https://avatars0.githubusercontent.com/u/15658638?v=3&s=100">
</p>
### tensor-flow-basic-tutorials
These notebooks are derived from [learningtensorflow](http://learningtensorflow.com/).

| Notebook | Description |
|----------|-------------|
| [tensorflow-basic](http://nbviewer.jupyter.org/github/tuanvu216/machine-learning-ipython-notebooks/blob/master/deep-learning/tensorflow-tutorials/1_tensorflow_basic.ipynb) | Learn basic operations in TensorFlow, a library from Google for a wide range of perceptual and language-understanding tasks. |
| [tensorflow-array](http://nbviewer.jupyter.org/github/tuanvu216/machine-learning-ipython-notebooks/blob/master/deep-learning/tensorflow-tutorials/2_Arrays_working_with_images.ipynb) | Use arrays to work with images. |
| [tensorflow-placeholders](http://nbviewer.jupyter.org/github/tuanvu216/machine-learning-ipython-notebooks/blob/master/deep-learning/tensorflow-tutorials/3_Placeholders.ipynb) | Understand the concept of placeholders. |
| [tensorflow-iteration](http://nbviewer.jupyter.org/github/tuanvu216/machine-learning-ipython-notebooks/blob/master/deep-learning/tensorflow-tutorials/4_Iteration.ipynb) | Iteration in TensorFlow. |
| [tensorflow-clustering](http://nbviewer.jupyter.org/github/tuanvu216/machine-learning-ipython-notebooks/blob/master/deep-learning/tensorflow-tutorials/5_clustering.ipynb) | Implement k-means clustering in TensorFlow. |
### tensor-flow-tutorials
| Notebook | Description |


@@ -0,0 +1,348 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": "true"
},
"source": [
"# Table of Contents\n",
" <p><div class=\"lev1\"><a href=\"#What-does-TensorFlow-do?\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>What does TensorFlow do?</a></div><div class=\"lev1\"><a href=\"#Exercises\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Exercises</a></div><div class=\"lev2\"><a href=\"#Exercise-1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Exercise 1</a></div><div class=\"lev2\"><a href=\"#Exercise-2\"><span class=\"toc-item-num\">2.2&nbsp;&nbsp;</span>Exercise 2</a></div><div class=\"lev2\"><a href=\"#Exercise-3\"><span class=\"toc-item-num\">2.3&nbsp;&nbsp;</span>Exercise 3</a></div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What does TensorFlow do?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- http://learningtensorflow.com/lesson2/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TensorFlow is a way of representing computation without actually performing it until asked. In this sense, it is a form of lazy computing, and it allows for some great improvements to the running of code:\n",
"\n",
"- Faster computation of complex variables\n",
"- Distributed computation across multiple systems, including GPUs.\n",
"- Reduced redundency in some computations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets have a look at this in action. First, a very basic python script:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40\n"
]
}
],
"source": [
"x = 35\n",
"y = x + 5\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This script basically just says “create a variable x with value 35, set the value of a new variable y to that plus 5, which is currently 40, and print it out”. The value 40 will print out when you run this program."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<tensorflow.python.ops.variables.Variable object at 0x10ae81890>\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"x = tf.constant(35, name='x')\n",
"y = tf.Variable(x + 5, name='y')\n",
"\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After running this, youll get quite a funny output, something like ```<tensorflow.python.ops.variables.Variable object at 0x7f074bfd9ef0>```. This is clearly not the value 40.\n",
"\n",
"The reason why, is that our program actually does something quite different to the previous one. The code here does the following:\n",
"\n",
"- Import the tensorflow module and call it tf\n",
"- Create a constant value called x, and give it the numerical value 35\n",
"- Create a Variable called y, and define it as being the equation x + 5\n",
"- Print out the equation object for y\n",
"\n",
"The subtle difference is that y isnt given “the current value of x + 5” as in our previous program. Instead, it is effectively an equation that means “when this variable is computed, take the value of x (as it is then) and add 5 to it”. The computation of the value of y is never actually performed in the above program.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"40\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"x = tf.constant(35, name='x')\n",
"y = tf.Variable(x + 5, name='y')\n",
"\n",
"model = tf.initialize_all_variables()\n",
"\n",
"with tf.Session() as session:\n",
" session.run(model)\n",
" print(session.run(y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have removed the print(y) statement, and instead we have code that creates a session, and actually computes the value of y. This is quite a bit of boilerplate, but it works like this:\n",
"\n",
"1. Import the tensorflow module and call it tf\n",
"2. Create a constant value called x, and give it the numerical value 35\n",
"3. Create a Variable called y, and define it as being the equation x + 5\n",
"4. Initialize the variables with initialize_all_variables (we will go into more detail on this)\n",
"5. Create a session for computing the values\n",
"6. Run the model created in 4\n",
"7. Run just the variable y and print out its current value\n",
"\n",
"The step 4 above is where some magic happens. In this step, a graph is created of the dependencies between the variables. In this case, the variable y depends on the variable x, and that value is transformed by adding 5 to it. Keep in mind that this value isnt computed until step 7, as up until then, only equations and relations are computed."
]
},
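{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick aside (a sketch, not part of the original lesson): **session.run** also accepts a list of fetches, so several values can be computed in a single call once the variables are initialized:\n",
"\n",
"```python\n",
"with tf.Session() as session:\n",
"    session.run(model)\n",
"    x_value, y_value = session.run([x, y])  # fetch both tensors in one call\n",
"    print(x_value)  # 35\n",
"    print(y_value)  # 40\n",
"```"
]
},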
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1\n",
"\n",
"- Constants can also be arrays. Predict what this code will do, then run it to confirm:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[40 45 50]\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"\n",
"x = tf.constant([35, 40, 45], name='x')\n",
"y = tf.Variable(x + 5, name='y')\n",
"\n",
"\n",
"model = tf.initialize_all_variables()\n",
"\n",
"with tf.Session() as session:\n",
" session.run(model)\n",
" print(session.run(y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2\n",
"- Generate a NumPy array of 10,000 random numbers (called x) and create a Variable storing the equation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$y = 5x^2 - 3x + 15$$\n",
"\n",
"You can generate the NumPy array using the following code:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([136, 612, 947, ..., 205, 238, 803])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"data = np.random.randint(1000, size=10000)\n",
"data"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 92087 1870899 4481219 ..., 209525 282521 3221651]\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"\n",
"x = tf.constant(data, name='x')\n",
"y = tf.Variable(5*(x**2) - (3*x) + 15, name='y')\n",
"\n",
"\n",
"model = tf.initialize_all_variables()\n",
"\n",
"with tf.Session() as session:\n",
" session.run(model)\n",
" print(session.run(y))"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## Exercise 3\n",
"- You can also update variables in loops, which we will use later for machine learning. Take a look at this code, and predict what it will do (then run it to check):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1\n",
"2\n",
"3\n",
"4\n",
"5\n"
]
}
],
"source": [
"import tensorflow as tf\n",
"\n",
"x = tf.Variable(0, name='x')\n",
"\n",
"model = tf.initialize_all_variables()\n",
"\n",
"with tf.Session() as session:\n",
" for i in range(5):\n",
" session.run(model)\n",
" x = x + 1\n",
" print(session.run(x))"
]
}
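,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on the loop above (an aside, not from the original lesson): **x = x + 1** rebinds the Python name x to a new addition operation on each pass, so the graph grows by one node per iteration. A sketch of an alternative that updates the Variable in place with **tf.assign**, assuming the same TensorFlow 0.x API used throughout this notebook:\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"\n",
"x = tf.Variable(0, name='x')\n",
"update = tf.assign(x, x + 1)  # an op that adds 1 to x each time it is run\n",
"\n",
"model = tf.initialize_all_variables()\n",
"\n",
"with tf.Session() as session:\n",
"    session.run(model)\n",
"    for i in range(5):\n",
"        print(session.run(update))  # prints 1, 2, 3, 4, 5\n",
"```"
]
}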
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
},
"toc": {
"toc_cell": true,
"toc_number_sections": true,
"toc_section_display": "none",
"toc_threshold": "8",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,291 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Iteration"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have a few examples under our belt, let us take a look at what is happening a bit more closely.\n",
"\n",
"As we have identified earlier, TensorFlow allows us to create a graph of operations and variables. These variables are called **Tensors**, and represent data, whether that is a single number, a string, a matrix, or something else. **Tensors** are combined through operations, and this whole process is modelled in a graph.\n",
"\n",
"First, make sure you have your **tensorenv** virtual environment activated, Once it is activated type in **conda install jupyter** to install jupter books.\n",
"\n",
"Then, run **jupyter notebook** to launch a browser session of the Jupyter Notebook (previously called the IPython Notebook). (If your browser doesnt open, open it and type **localhost:8888** into the browsers address bar.)\n",
"\n",
"Click “New” and then “Python 3” under “Notebooks”. This will launch a new browser tab. Give the notebook a name by clicking “Untitled” at the top and give it a name (I used “Interactive TensorFlow”).\n",
"\n",
"> If you have never used a Jupyter Notebook (or IPython Notebook) before, take a look [at this site](http://opentechschool.github.io/python-data-intro/core/notebook.html) for a brief introduction.\n",
"\n",
"Next, as before, lets create a basic TensorFlow program. One major change is the use of an **InteractiveSession**, which allows us to run variables without needing to constantly refer to the session object (less typing!). Code blocks below are broken into different cells. If you see a break in the code, you will need to run the previous cell first. Also, if you arent otherwise confident, ensure all of the code in a given block is type into a cell before you run it."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"session = tf.InteractiveSession()\n",
"\n",
"x = tf.constant(list(range(10)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section of code, we create an **InteractiveSession**, and then define a **constant** value, which is like a placeholder, but with a set value (that doesnt change). In the next cell, we can evaluate this constant and print the result."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 1 2 3 4 5 6 7 8 9]\n"
]
}
],
"source": [
"print(x.eval())"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"session.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Closing sessions is quite important, and can be easy to forget. For that reason, we were using the **with** keyword in earlier tutorials to handle this. When the **with** block is finished executing, the session will be closed (this also happens if an error happens - the session is still closed).\n",
"\n",
"Now lets take a look at a larger example. In this example, we will take a very large matrix and compute on it, keeping track of when memory is used. First, lets find out how much memory our Python session is currently using:"
]
},
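{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the pattern from the earlier lessons looks like this (a sketch; the session is closed automatically when the block exits, even if an error is raised):\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"\n",
"x = tf.constant(list(range(10)))\n",
"\n",
"with tf.Session() as session:\n",
"    print(session.run(x))  # the session is closed for us afterwards\n",
"```"
]
},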
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3944091648 Kb\n"
]
}
],
"source": [
"import resource\n",
"print(\"{} Kb\".format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))"
]
},
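{
"cell_type": "markdown",
"metadata": {},
"source": [
"A caveat the original lesson doesn't mention: **ru_maxrss** is reported in kilobytes on Linux but in bytes on macOS, which is why the number above can look enormous. A sketch of a platform-aware version:\n",
"\n",
"```python\n",
"import sys\n",
"import resource\n",
"\n",
"usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss\n",
"if sys.platform == 'darwin':\n",
"    usage //= 1024  # macOS reports bytes; convert to kilobytes\n",
"print(\"{} Kb\".format(usage))\n",
"```"
]
},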
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On my system, this is using 78496 kilobytes, after running the above code as well. Now, create a new session, and define two matrices:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"session = tf.InteractiveSession()\n",
"\n",
"X = tf.constant(np.eye(10000))\n",
"Y = tf.constant(np.random.randn(10000, 300))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets take a look at our memory usage again:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3944091648 Kb\n"
]
}
],
"source": [
"print(\"{} Kb\".format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On my system, the memory usage jumped to 885,220 Kb - those matrices are large!\n",
"\n",
"Now, lets multiply those matrices together using matmul:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"Z = tf.matmul(X, Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we check our memory usage now, we find that no more memory has been used no actual computation of Z has taken place. It is only when we evaluate the operation do we actually computer this. For an interactive session, you can just use **Z.eval()**, rather than run **session.run(Z)**. Note that you cant always rely on .eval(), as this is a shortcut that uses the “default” session, not necessarily the one you want to use."
]
},
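{
"cell_type": "markdown",
"metadata": {},
"source": [
"If several sessions are open, you can pass the one you want explicitly via the standard **session** argument of **eval** (a sketch):\n",
"\n",
"```python\n",
"Z.eval(session=session)  # equivalent to session.run(Z) on that session\n",
"```"
]
},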
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-1.34447547, 0.21090638, 0.43680784, ..., 0.73037215,\n",
" -0.23213385, -1.12871138],\n",
" [ 0.64485571, -0.54936434, 0.35649891, ..., -0.12413736,\n",
" -0.58614079, -0.32230335],\n",
" [ 0.37195075, 0.20654139, -0.15417305, ..., 0.11346775,\n",
" 0.23759908, -1.27450529],\n",
" ..., \n",
" [ 0.63094722, -2.96401008, -1.58592691, ..., -1.57240119,\n",
" -0.34051408, 0.53935437],\n",
" [-1.97724198, -0.32331035, -0.98294343, ..., -1.56477455,\n",
" -0.02398526, -0.37687652],\n",
" [-0.70687284, 2.04289277, -0.57564784, ..., -0.40614639,\n",
" -0.12381717, -2.23741137]])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Z.eval()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your computer will think for quite a while, because only now is it actually performing the action of multiplying those matrices. Checking the memory usage afterwards reveals that this computation has happened, as it now uses nearly 3Gb!"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4016365568 Kb\n"
]
}
],
"source": [
"print(\"{} Kb\".format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dont forget to close your session!"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"session.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
},
"toc": {
"toc_cell": false,
"toc_number_sections": false,
"toc_threshold": "8",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -0,0 +1,285 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"toc": "true"
},
"source": [
"# Table of Contents\n",
" <p><div class=\"lev1\"><a href=\"#Clustering-and-k-means\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Clustering and k-means</a></div><div class=\"lev1\"><a href=\"#Generating-Samples\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Generating Samples</a></div><div class=\"lev1\"><a href=\"#Initialisation\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Initialisation</a></div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Clustering and k-means"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now venture into our first application, which is clustering with the k-means algorithm. Clustering is a data mining exercise where we take a bunch of data and find groups of points that are similar to each other. K-means is an algorithm that is great for finding clusters in many types of datasets."
]
},
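{
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation (a plain-NumPy sketch, not part of this lesson's code), one iteration of k-means alternates two steps: assign each sample to its nearest centroid, then move each centroid to the mean of the samples assigned to it:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def kmeans_step(samples, centroids):\n",
"    # Distance from every sample to every centroid\n",
"    distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)\n",
"    nearest = np.argmin(distances, axis=1)\n",
"    # New centroid = mean of the samples assigned to it\n",
"    return np.array([samples[nearest == k].mean(axis=0)\n",
"                     for k in range(len(centroids))])\n",
"```"
]
},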
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generating Samples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First up, we are going to need to generate some samples. We could generate the samples randomly, but that is likely to either give us very sparse points, or just one big group - not very exciting for clustering.\n",
"\n",
"Instead, we are going to start by generating three centroids, and then randomly choose (with a normal distribution) around that point. First up, here is a method for doing this:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting functions.py\n"
]
}
],
"source": [
"%%writefile functions.py\n",
"\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"\n",
"\n",
"def create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed):\n",
" np.random.seed(seed)\n",
" slices = []\n",
" centroids = []\n",
" # Create samples for each cluster\n",
" for i in range(n_clusters):\n",
" samples = tf.random_normal((n_samples_per_cluster, n_features),\n",
" mean=0.0, stddev=5.0, dtype=tf.float32, seed=seed, name=\"cluster_{}\".format(i))\n",
" current_centroid = (np.random.random((1, n_features)) * embiggen_factor) - (embiggen_factor/2)\n",
" centroids.append(current_centroid)\n",
" samples += current_centroid\n",
" slices.append(samples)\n",
" # Create a big \"samples\" dataset\n",
" samples = tf.concat(0, slices, name='samples')\n",
" centroids = tf.concat(0, centroids, name='centroids')\n",
" return centroids, samples\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way this works is to create **n_clusters different** centroids at random (using **np.random.random((1, n_features))**) and using those as the centre points for **tf.random_normal**. The **tf.random_normal** function generates normally distributed random values, which we then add to the current centre point. This creates a blob of points around that center. We then record the centroids (**centroids.append**) and the generated samples (**slices.append(samples)**). Finally, we create “One big list of samples” using **tf.concat**, and convert the centroids to a TensorFlow Variable as well, also using **tf.concat**.\n",
"\n",
"Saving this **create_samples** method in a file called **functions.py** allows us to import these methods into our scripts for this (and the next!) lesson. Create a new file called **generate_samples.py**, which has the following code:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting generate_samples.py\n"
]
}
],
"source": [
"%%writefile generate_samples.py\n",
"\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"\n",
"from functions import create_samples\n",
"\n",
"n_features = 2\n",
"n_clusters = 3\n",
"n_samples_per_cluster = 500\n",
"seed = 700\n",
"embiggen_factor = 70\n",
"\n",
"np.random.seed(seed)\n",
"\n",
"centroids, samples = create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed)\n",
"\n",
"model = tf.initialize_all_variables()\n",
"with tf.Session() as session:\n",
" sample_values = session.run(samples)\n",
" centroid_values = session.run(centroids)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This just sets up the number of clusters and features (I recommend keeping the number of features at 2, allowing us to visualise them later), and the number of samples to generate. Increasing the [embiggen_factor](https://en.wiktionary.org/wiki/embiggen) will increase the “spread” or the size of the clusters. I chose a value here that provides good learning opportunity, as it generates visually identifiable clusters.\n",
"\n",
"To visualise the results, lets create a plotting function using matplotlib. Add this code to functions.py:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Appending to functions.py\n"
]
}
],
"source": [
"%%writefile -a functions.py\n",
"\n",
"\n",
"def plot_clusters(all_samples, centroids, n_samples_per_cluster):\n",
" import matplotlib.pyplot as plt\n",
" # Plot out the different clusters\n",
" # Choose a different colour for each cluster\n",
" colour = plt.cm.rainbow(np.linspace(0,1,len(centroids)))\n",
" for i, centroid in enumerate(centroids):\n",
" # Grab just the samples fpr the given cluster and plot them out with a new colour\n",
" samples = all_samples[i*n_samples_per_cluster:(i+1)*n_samples_per_cluster]\n",
" plt.scatter(samples[:,0], samples[:,1], c=colour[i])\n",
" # Also plot centroid\n",
" plt.plot(centroid[0], centroid[1], markersize=35, marker=\"x\", color='k', mew=10)\n",
" plt.plot(centroid[0], centroid[1], markersize=30, marker=\"x\", color='m', mew=5)\n",
" plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All this code does is plots out the samples from each cluster using a different colour, and creates a big magenta X where the centroid is. The centroid is given as an argument, which will be handy later on.\n",
"\n",
"Update the **generate_samples.py** to import this function by adding **from functions import plot_clusters** to the top of the file. Then, add this line of code to the bottom:\n",
"\n",
"```python\n",
"plot_clusters(sample_values, centroid_values, n_samples_per_cluster)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting generate_samples.py\n"
]
}
],
"source": [
"%%writefile generate_samples.py\n",
"\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"\n",
"from functions import create_samples\n",
"from functions import plot_clusters\n",
"\n",
"n_features = 2\n",
"n_clusters = 3\n",
"n_samples_per_cluster = 500\n",
"seed = 700\n",
"embiggen_factor = 70\n",
"\n",
"np.random.seed(seed)\n",
"\n",
"centroids, samples = create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed)\n",
"\n",
"model = tf.initialize_all_variables()\n",
"with tf.Session() as session:\n",
" sample_values = session.run(samples)\n",
" centroid_values = session.run(centroids)\n",
" \n",
"plot_clusters(sample_values, centroid_values, n_samples_per_cluster)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!python generate_samples.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Initialisation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
},
"toc": {
"toc_cell": true,
"toc_number_sections": true,
"toc_threshold": "8",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}


@@ -0,0 +1,35 @@
import tensorflow as tf
import numpy as np


def create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed):
    np.random.seed(seed)
    slices = []
    centroids = []
    # Create samples for each cluster
    for i in range(n_clusters):
        samples = tf.random_normal((n_samples_per_cluster, n_features),
                                   mean=0.0, stddev=5.0, dtype=tf.float32, seed=seed, name="cluster_{}".format(i))
        current_centroid = (np.random.random((1, n_features)) * embiggen_factor) - (embiggen_factor/2)
        centroids.append(current_centroid)
        samples += current_centroid
        slices.append(samples)
    # Create a big "samples" dataset
    samples = tf.concat(0, slices, name='samples')
    centroids = tf.concat(0, centroids, name='centroids')
    return centroids, samples


def plot_clusters(all_samples, centroids, n_samples_per_cluster):
    import matplotlib.pyplot as plt
    # Plot out the different clusters
    # Choose a different colour for each cluster
    colour = plt.cm.rainbow(np.linspace(0, 1, len(centroids)))
    for i, centroid in enumerate(centroids):
        # Grab just the samples for the given cluster and plot them out with a new colour
        samples = all_samples[i*n_samples_per_cluster:(i+1)*n_samples_per_cluster]
        plt.scatter(samples[:, 0], samples[:, 1], c=colour[i])
        # Also plot centroid
        plt.plot(centroid[0], centroid[1], markersize=35, marker="x", color='k', mew=10)
        plt.plot(centroid[0], centroid[1], markersize=30, marker="x", color='m', mew=5)
    plt.show()


@@ -0,0 +1,23 @@
import tensorflow as tf
import numpy as np

from functions import create_samples
from functions import plot_clusters

n_features = 2
n_clusters = 3
n_samples_per_cluster = 500
seed = 700
embiggen_factor = 70

np.random.seed(seed)

centroids, samples = create_samples(n_clusters, n_samples_per_cluster, n_features, embiggen_factor, seed)

model = tf.initialize_all_variables()
with tf.Session() as session:
    sample_values = session.run(samples)
    centroid_values = session.run(centroids)

plot_clusters(sample_values, centroid_values, n_samples_per_cluster)

(binary image added, 5.6 MiB; file not shown)