data-science-ipython-notebooks/core/structs.ipynb

{
"metadata": {
"name": "",
"signature": "sha256:a7df82135c56674dfb56c48aead92e0e7fe6ce1bdc8cdbc48eb820284aa24f65"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Structures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## tuple"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One-dimensional, fixed-length, immutable sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tup = (1, 2, 3)\n",
"tup"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"(1, 2, 3)"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_list = [1, 2, 3]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert to a tuple"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(tuple(a_list))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"tuple"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nested tuples"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_tup = ([1, 2, 3], (4, 5))\n",
"nested_tup"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"([1, 2, 3], (4, 5))"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Access by index: O(1)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_tup[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"[1, 2, 3]"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although tuples are immutable, they can contain mutable objects, which can still be modified in place"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_tup[0].append(4)\n",
"nested_tup[0]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"[1, 2, 3, 4]"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concatenating tuples creates a new tuple and copies the references to the objects into it"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"(1, 3, 2) + (4, 5, 6)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"(1, 3, 2, 4, 5, 6)"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multiplying a tuple by an integer repeats the references to its objects (the objects themselves are not copied)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"('foo', 'bar') * 2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"('foo', 'bar', 'foo', 'bar')"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unpack tuples"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a, b = nested_tup\n",
"a, b"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"([1, 2, 3, 4], (4, 5))"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unpack nested tuples"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"(a, b, c, d), (e, f) = nested_tup\n",
"a, b, c, d, e, f"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"(1, 2, 3, 4, 4, 5)"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A common use of variable unpacking is when iterating over sequences of tuples or lists"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]\n",
"for a, b, c in seq:\n",
"    print(a, b, c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"(1, 2, 3)\n",
"(4, 5, 6)\n",
"(7, 8, 9)\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One-dimensional, variable-length, mutable sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_list = [1, 2, 3]\n",
"a_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"[1, 2, 3]"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Convert to a list"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(list(tup))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"list"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Nested list"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list = [(1, 2, 3), [4, 5]]\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"[(1, 2, 3), [4, 5]]"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Access by index"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list[1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"[4, 5]"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Append an element: amortized O(1)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list.append(6)\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 16,
"text": [
"[(1, 2, 3), [4, 5], 6]"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Insert an element at a specific index. Insert is expensive, O(n), because it has to shift all subsequent elements."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list.insert(0, 'start')\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"['start', (1, 2, 3), [4, 5], 6]"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pop removes and returns the element at a specified index. Popping from the front or middle is expensive, O(n), because subsequent elements have to be shifted; popping the last element is O(1)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list.pop(0)\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 18,
"text": [
"[(1, 2, 3), [4, 5], 6]"
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remove locates the first occurrence of a value and removes it: O(n)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list.remove((1, 2, 3))\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"[[4, 5], 6]"
]
}
],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check if a list contains a value: O(n)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"6 in nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"True"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concatenating lists creates a new list and copies the references to the objects into it"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"[1, 3, 2] + [4, 5, 6]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"[1, 3, 2, 4, 5, 6]"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Extend a list in place by appending the elements of an iterable. This is usually faster than concatenation, which creates a new list and copies both operands into it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nested_list.extend([7, 8, 9])\n",
"nested_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"[[4, 5], 6, 7, 8, 9]"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## sort"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort in place: O(n log n)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"a_list = [1, 5, 3, 9, 7, 6]\n",
"a_list.sort()\n",
"a_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"[1, 3, 5, 6, 7, 9]"
]
}
],
"prompt_number": 23
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sort by a key function, here the string length. The sort is stable, so strings of equal length keep their original order."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"b_list = ['the', 'quick', 'brown', 'fox', 'jumps', 'over']\n",
"b_list.sort(key=len)\n",
"b_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
"['the', 'fox', 'over', 'quick', 'brown', 'jumps']"
]
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## bisect"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The bisect module does not check whether the list is sorted, as this check would be expensive: O(n). Using bisect on an unsorted list will not raise an error but can return incorrect results."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import bisect"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Find the location where an element should be inserted to keep the list sorted"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"c_list = [1, 2, 2, 3, 5, 13]\n",
"bisect.bisect(c_list, 8)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
"5"
]
}
],
"prompt_number": 26
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Insert an element at the location that keeps the list sorted"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bisect.insort(c_list, 8)\n",
"c_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
"[1, 2, 2, 3, 5, 8, 13]"
]
}
],
"prompt_number": 27
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## slice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![String slicing diagram](http://www.nltk.org/images/string-slicing.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Select a section of a sequence type (lists, tuples, strings, NumPy arrays) using [start:stop]. start is included, stop is not. The number of elements in the result is stop - start."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list = 'Monty Python'\n",
"d_list[6:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"'Pyth'"
]
}
],
"prompt_number": 28
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Omit start to default to start of the sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list[:5]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"'Monty'"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Omit stop to default to end of the sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list[6:]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 30,
"text": [
"'Python'"
]
}
],
"prompt_number": 30
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Negative indices slice relative to the end"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list[-12:-7]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 31,
"text": [
"'Monty'"
]
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A slice can also take a step; the one below takes every other element"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list[::2]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 32,
"text": [
"'MnyPto'"
]
}
],
"prompt_number": 32
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Passing -1 for the step reverses the sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"d_list[::-1]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 33,
"text": [
"'nohtyP ytnoM'"
]
}
],
"prompt_number": 33
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assign elements to a slice. The length of the slice does not have to equal the number of elements assigned."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"e_list = [1, 1, 2, 3, 5, 8, 13]\n",
"e_list[5:] = ['H', 'a', 'l', 'l']\n",
"e_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 34,
"text": [
"[1, 1, 2, 3, 5, 'H', 'a', 'l', 'l']"
]
}
],
"prompt_number": 34
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compare assigning into a slice (above) versus assigning into an index (below)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"e_list = [1, 1, 2, 3, 5, 8, 13]\n",
"e_list[5] = ['H', 'a', 'l', 'l']\n",
"e_list"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 35,
"text": [
"[1, 1, 2, 3, 5, ['H', 'a', 'l', 'l'], 13]"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## sorted"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Return a new sorted list from the elements of a sequence"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sorted([2, 5, 1, 8, 7, 9])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 36,
"text": [
"[1, 2, 5, 7, 8, 9]"
]
}
],
"prompt_number": 36
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"sorted('foo bar baz')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 37,
"text": [
"[' ', ' ', 'a', 'a', 'b', 'b', 'f', 'o', 'o', 'r', 'z']"
]
}
],
"prompt_number": 37
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's common to get a sorted list of unique elements by combining sorted and set"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"seq = [2, 5, 1, 8, 7, 9, 9, 2, 5, 1, (4, 2), (1, 2), (1, 2)]\n",
"sorted(set(seq))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 38,
"text": [
"[1, 2, 5, 7, 8, 9, (1, 2), (4, 2)]"
]
}
],
"prompt_number": 38
}
],
"metadata": {}
}
]
}