{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem: Compress a String Such that 'AAABCCDDDD' Becomes 'A3B1C2D4'. Only Compress if it Saves Space.\n", "\n", "* [Clarifying Questions](#Clarifying-Questions)\n", "* [Test Cases](#Test-Cases)\n", "* [Algorithm: List](#Algorithm:-List)\n", "* [Code: List](#Code:-List)\n", "* [Algorithm: Byte Array](#Algorithm:-Byte-Array)\n", "* [Code: Byte array](#Code:-Byte-Array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clarifying Questions\n", "\n", "* Is the string ASCII (extended)? Or Unicode?\n", " * ASCII extended, which is 256 characters\n", "* Can you use additional data structures? \n", " * Yes\n", "* Is this case sensitive?\n", " * Yes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test Cases\n", "\n", "* NULL\n", "* '' -> ''\n", "* 'ABC' -> 'ABC'\n", "* 'AAABCCDDDD' -> 'A3B1C2D4'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithm: List\n", "\n", "![alt text](https://raw.githubusercontent.com/donnemartin/algorithms-data-structures/master/images/compress_string.jpg)\n", "\n", "Since Python strings are immutable, we'll use a list to exercise array manipulation. Note using a list vs a bytearray will will result in additional space to create the list and to convert the list to a string.\n", "\n", "* If string is empty return string\n", "* count = 0\n", "* size = 0\n", "* last_char = first char in string\n", "* For each char in string\n", " * If char == last_char\n", " count++\n", " * Else\n", " size += 2\n", " count++\n", " last_char = char\n", "* size += 2\n", "* If the compressed string size is >= string size, return string\n", "* Create compressed_string\n", "* For each char in string\n", " * If char == last_char\n", " count++\n", " * Else\n", " * Append last_char to compressed_string\n", " * append count to compressed_string\n", " * count = 1\n", " * last_char = char\n", " * Append last_char to compressed_string\n", " * append count to compressed_string\n", "* return compressed_string\n", "\n", "Complexity:\n", "* Time: O(n)\n", "* Space: O(m) where m is the size of the compressed bytearray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code: List" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def compress_string(string):\n", " if string is None or len(string) == 0:\n", " return string\n", " size = 0\n", " count = 0\n", " last_char = string[0]\n", " for char in string:\n", " if char == last_char:\n", " count += 1\n", " else:\n", " size += 2\n", " count = 1\n", " last_char = char\n", " size += 2\n", " if size >= len(string):\n", " return string\n", " compressed_string = list()\n", " count = 0\n", " last_char = string[0]\n", " for char in string:\n", " if char == last_char:\n", " count += 1\n", " else:\n", " compressed_string.append(last_char)\n", " compressed_string.append(str(count))\n", " count = 1\n", " last_char = char\n", " compressed_string.append(last_char)\n", " compressed_string.append(str(count))\n", " return \"\".join(compressed_string)\n", "\n", "string0 = None\n", "string1 = ''\n", "string2 = 'ABC'\n", "string3 = 'AAABCCDDDD'\n", "print(compress_string(string0))\n", "print(compress_string(string1))\n", "print(compress_string(string2))\n", "print(compress_string(string3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithm: Byte Array\n", "\n", "![alt text](https://raw.githubusercontent.com/donnemartin/algorithms-data-structures/master/images/compress_string.jpg)\n", "\n", "Since Python strings are immutable, we'll use a bytearray to exercise array manipulation. We could use a list of characters to create the compressed string then convert it to a string in the end, but this will result in additional space.\n", "\n", "* If bytearray is empty return bytearray\n", "* count = 0\n", "* size = 0\n", "* last_char_code = first char code in bytearray\n", "* For each char code in bytearray\n", " * If char code == last_char_code\n", " count++\n", " * Else\n", " size += 2\n", " count++\n", " last_char_code = char code\n", "* size += 2\n", "* If the compressed bytearray size is >= bytearray size, return string\n", "* Create compressed_bytearray\n", "* pos = 0\n", "* For each char code in bytearray\n", " * If char code == last_char_code\n", " count++\n", " * Else\n", " * compressed_bytearray[pos] = last_char_code\n", " * compressed_bytearray[pos + 1] = count\n", " * pos += 2\n", " * count = 1\n", " * last_char_code = char code\n", " * compressed_bytearray[pos] = last_char_code\n", " * compressed_bytearray[pos + 1] = count\n", "* return compressed_bytearray\n", "\n", "Complexity:\n", "* Time: O(n)\n", "* Space: O(m) where m is the size of the compressed bytearray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code: Byte Array" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def compress_string(string):\n", " if string is None or len(string) == 0:\n", " return string\n", " size = 0\n", " count = 0\n", " last_char_code = string[0]\n", " for char_code in string:\n", " if char_code == last_char_code:\n", " count += 1\n", " else:\n", " size += 2\n", " count = 1\n", " last_char_code = char_code\n", " size += 2\n", " if size >= len(string):\n", " return string\n", " compressed_string = bytearray(size)\n", " pos = 0\n", " count = 0\n", " last_char_code = string[0]\n", " for char_code in string:\n", " if char_code == last_char_code:\n", " count += 1\n", " else:\n", " compressed_string[pos] = last_char_code\n", " compressed_string[pos + 1] = ord(str(count))\n", " pos += 2\n", " count = 1\n", " last_char_code = char_code\n", " compressed_string[pos] = last_char_code\n", " compressed_string[pos + 1] = ord(str(count))\n", " return compressed_string\n", "\n", "string0 = None\n", "string1 = bytearray('')\n", "string2 = bytearray('ABC')\n", "string3 = bytearray('AAABCCDDDD')\n", "print(compress_string(string0))\n", "print(compress_string(string1))\n", "print(compress_string(string2))\n", "print(compress_string(string3))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }