interactive-coding-challenges/arrays-strings/compress.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<small><i>This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://bit.ly/code-notes).</i></small>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem: Compress a string such that 'AAABCCDDDD' becomes 'A3B1C2D4'\n",
    "\n",
    "* [Clarifying Questions](#Clarifying-Questions)\n",
    "* [Test Cases](#Test-Cases)\n",
    "* [Algorithm: List](#Algorithm:-List)\n",
    "* [Code: List](#Code:-List)\n",
    "* [Algorithm: Byte Array](#Algorithm:-Byte-Array)\n",
    "* [Code: Byte array](#Code:-Byte-Array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clarifying Questions\n",
    "\n",
    "* Is the string ASCII (extended)?  Or Unicode?\n",
    "    * ASCII extended, which is 256 characters\n",
    "* Can you use additional data structures?  \n",
    "    * Yes\n",
    "* Is this case sensitive?\n",
    "    * Yes\n",
    "* Do you compress even if it doesn't save space?\n",
    "    * No"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test Cases\n",
    "\n",
    "* NULL\n",
    "* '' -> ''\n",
    "* 'ABC' -> 'ABC'\n",
    "* 'AAABCCDDDD' -> 'A3B1C2D4'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from nose.tools import assert_equal\n",
    "\n",
    "class Test(object):\n",
    "    def test_compress(self, func):\n",
    "        assert_equal(func(None), None)\n",
    "        assert_equal(func(''), '')\n",
    "        assert_equal(func('ABC'), 'ABC')\n",
    "        assert_equal(func('AAABCCDDDD'), 'A3B1C2D4')\n",
    "\n",
    "def run_tests(func):\n",
    "    test = Test()\n",
    "    test.test_compress(func)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Algorithm: List\n",
    "\n",
    "![alt text](https://raw.githubusercontent.com/donnemartin/algorithms-data-structures/master/images/compress_string.jpg)\n",
    "\n",
    "Since Python strings are immutable, we'll use a list of characters to exercise string manipulation.  Note using a list vs a bytearray will will result in additional space to create the list and to convert the list to a string.\n",
    "\n",
    "* If string is empty return string\n",
    "* count = 0\n",
    "* size = 0\n",
    "* last_char = first char in string\n",
    "* For each char in string\n",
    "    * If char == last_char\n",
    "        count++\n",
    "    * Else\n",
    "        size += 2\n",
    "        count++\n",
    "        last_char = char\n",
    "* size += 2\n",
    "* If the compressed string size is >= string size, return string\n",
    "* Create compressed_string\n",
    "* For each char in string\n",
    "    * If char == last_char\n",
    "        count++\n",
    "    * Else\n",
    "        * Append last_char to compressed_string\n",
    "        * append count to compressed_string\n",
    "        * count = 1\n",
    "        * last_char = char\n",
    "    * Append last_char to compressed_string\n",
    "    * append count to compressed_string\n",
    "* return compressed_string\n",
    "\n",
    "Complexity:\n",
    "* Time: O(n)\n",
    "* Space: O(2m) where m is the size of the compressed list and the resulting string copied from the list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code: List"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def compress_string(string):\n",
    "    if string is None or len(string) == 0:\n",
    "        return string\n",
    "    size = 0\n",
    "    count = 0\n",
    "    last_char = string[0]\n",
    "    for char in string:\n",
    "        if char == last_char:\n",
    "            count += 1\n",
    "        else:\n",
    "            size += 2\n",
    "            count = 1\n",
    "            last_char = char\n",
    "    size += 2\n",
    "    if size >= len(string):\n",
    "        return string\n",
    "    compressed_string = list()\n",
    "    count = 0\n",
    "    last_char = string[0]\n",
    "    for char in string:\n",
    "        if char == last_char:\n",
    "            count += 1\n",
    "        else:\n",
    "            compressed_string.append(last_char)\n",
    "            compressed_string.append(str(count))\n",
    "            count = 1\n",
    "            last_char = char\n",
    "    compressed_string.append(last_char)\n",
    "    compressed_string.append(str(count))\n",
    "    return \"\".join(compressed_string)\n",
    "\n",
    "run_tests(compress_string)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Algorithm: Byte Array\n",
    "\n",
    "![alt text](https://raw.githubusercontent.com/donnemartin/algorithms-data-structures/master/images/compress_string.jpg)\n",
    "\n",
    "Since Python strings are immutable, we'll use a bytearray to exercise array manipulation.  As seen above, we could use a list of characters to create the compressed string then convert it to a string in the end, but this will result in additional space.\n",
    "\n",
    "The algorithm is the same, except we will need to work with the bytearray's character codes instead of the characters as we did above when we implemented this solution with a list.\n",
    "\n",
    "Complexity:\n",
    "* Time: O(n)\n",
    "* Space: O(m) where m is the size of the compressed bytearray"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code: Byte Array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def compress_string_alt(string):\n",
    "    if string is None or len(string) == 0:\n",
    "        return string\n",
    "    size = 0\n",
    "    count = 0\n",
    "    last_char_code = string[0]\n",
    "    for char_code in string:\n",
    "        if char_code == last_char_code:\n",
    "            count += 1\n",
    "        else:\n",
    "            size += 2\n",
    "            count = 1\n",
    "            last_char_code = char_code\n",
    "    size += 2\n",
    "    if size >= len(string):\n",
    "        return string\n",
    "    compressed_string = bytearray(size)\n",
    "    pos = 0\n",
    "    count = 0\n",
    "    last_char_code = string[0]\n",
    "    for char_code in string:\n",
    "        if char_code == last_char_code:\n",
    "            count += 1\n",
    "        else:\n",
    "            compressed_string[pos] = last_char_code\n",
    "            compressed_string[pos + 1] = ord(str(count))\n",
    "            pos += 2\n",
    "            count = 1\n",
    "            last_char_code = char_code\n",
    "    compressed_string[pos] = last_char_code\n",
    "    compressed_string[pos + 1] = ord(str(count))\n",
    "    return compressed_string\n",
    "\n",
    "run_tests(compress_string_alt)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}