data-science-ipython-notebooks/algorithmia/Algorithmia.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook was prepared by [Algorithmia](algorithmia.com). Source and license info is on [GitHub](https://github.com/donnemartin/data-science-ipython-notebooks)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Algorithmia\n",
"\n",
"Reference: [Algorithmia Documentation](http://docs.algorithmia.com/)\n",
"\n",
"Table of Contents:\n",
"1. Authentication\n",
"2. Face Detection\n",
"3. Content Summarizer\n",
"4. Latent Dirichlet Allocation\n",
"5. Optical Character Recognition"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import Algorithmia\n",
"import pprint\n",
"\n",
"pp = pprint.PrettyPrinter(indent=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Authentication\n",
"\n",
"You only need your Algorithmia API Key to run the following commands."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"API_KEY = 'YOUR_API_KEY'\n",
"# Create a client instance\n",
"client = Algorithmia.client(API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Face Detection\n",
"\n",
"Uses a pretrained model to detect faces in a given image.\n",
"\n",
"Read more about Face Detection [here](https://algorithmia.com/algorithms/opencv/FaceDetection)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"https://s3.amazonaws.com/algorithmia-assets/data-science-ipython-notebooks/face.jpg\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import Image\n",
"\n",
"face_url = 'https://s3.amazonaws.com/algorithmia-assets/data-science-ipython-notebooks/face.jpg'\n",
"\n",
"# Sample Face Image\n",
"Image(url=face_url)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYF\nBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBDAQICAgICAgUDAwUKBwYHCgoKCgoK\nCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgr/wAARCAICAgIDASIA\nAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA\nAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3\nODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm\np6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA\nAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx\nBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK\nU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3\nuLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD809lu\nnyf6zd/Cv96m28P7xnEO3+//AHqmks5k/jX/AGam8mHZsjC/N/47XjxlE+gKscYSX5P3f/XWnQo7\nr5zv/F8rVaW2z+7mTzP7m/8Au/3vlp2+aGSTfGq/xbdvy1P+ECFbW58zfNCrL/C1Nt4X85diL975\n/n/hq1ImHj3/AMS7f97/AGf9mjyUfciI37v+CWpl+8p+6EZEcNsUt43f5o9/ybKdJZ/vvn+bd95V\nT7v/AAL/AIDVqOzuZot7/wASfOuz/wAd/wBmmx2yIuxPm/6Zf73+1UqX8wc41YUwty833m2q71oW\n8bvbSRpIsjbtz75f9Y1Q2dqnmL8nyq3zf7tWrO2Ty43mRtv8a/3fl+9W3NzEykWLOT/ViL5V/g3L\n/FV7ydjb0dVXyl3J/tU2ztoX+5t3bNrJ/wAB/vVe+zI9vGbaH5VVv4Pl+7XR78Uc8uaY6OGHy/mn\nX5v4tv8An+7Vq3s53Mbp8u35n+Xdt/2f/QadHZvvm8wbdr7beJE3fxfLuar1vYPDuEP+sbcrpF8v\nzN97/wBlo5f3XMTKUY/COgs3KrsdVZf4ivy7f93/AL5rQs4UdWTyd38X93b/AA/L/wB9VHZ2FzM3\nyJubZ/7NWhaw2fmR3Mz+Yqv8mxWVtyt/9jWkY+5HlM/hgWrGzzJG/wC8kWT76L8qrubb97/x3/gV\naWnw2zrHMk0Lf88n2bv++f8AgO3/AGvmqvb2f7zznRo45Nsn3Pl3f7P/AAKtyOzztCfNu+a4VF/v\nf/tLV1/hjymfIFv5KbftNhIrRv5zyxP/AOOqyr/u1oWdn5zSGayjZpIlWXf/AKvd8vy/+PVHY2aP\ndSTQw/d/ufxf3fl+983y1rW8LpDvFsq/NtTd/wB9f3fl+7/6FV81PpEn4Ik0kLzRyQ3kLSN95n8r\n93u/h+9/D8y/xL937tWNLs4f3f2l9vztHbyt/CrNt3N/31t/i+VqbZabMJI/+Wn3vKT7u5f4v/Zv\n8rWhZ2cKTL9v85W8rc7unyq3y7f++flrojy/4jGY1tMh2zW00LM3lKrf7O1du3du/wDQv738NXNQ\nsEhjuId8O7zfMl2oytHt+997aq/xfxf99VH
Gj7v9JTy/O+W4i3ru2/N8qqzfN95fmWrw3+Xsfaqq\n+3zVTa397+JqqNOUvdM5lOzhmSOOGZ/+WHzt/wA9Pvf3flb7rVrW+jzf8sH85Y/9U7syru3MvzL/\nAN9f3f8A2WizhheVrb7HHD+9/wBa3yru2/db/e/9mrctke5uGffH5m3/AGvu7Vb73+6zf3vmrSFO\nrT+0ECvHpVtDZ/PCyxyJ5fztt/hb5d23/e+838X/AAJtSx8N3NtqF1BcpHZyWqyrcfaF+8yr8zbf\nm/u/d/2vl/vLNZ6C7yNNcpH5km6FIkf94y7fvLuZWZv/AGar2+5v/wDSd8k0kkqtb/Ov3l+6zbfv\nNtX7v+1RUp8suX7QR5IzKMcf9mrapM6w+Y3+lPu+bb/d/wBn7zfN/wCPL8q1pabbWyWvkw3K7V3b\nvKT5Y5Wi+822Jl2ruVWX/ab7tR6en2y8kuY4fLVv4N+1pIv4dv8Atfd/75rUuNN0rbH9mj/drtW4\nlR4m2r5f8Xzf3q0JqcnMV7eezSOSa2eSGOHcuzbuXcu5fvN833l3f8CqSF9NvIZFhh87dL8turtu\n+7/tN827aq/3ambQUs/tCW0025t3m/wru+b7rbflX7rVoWlhbeZ5FnNbx3H3vK83y1+78q/N/ssy\n/wDfP92s4y5Y8vKTze6U9Qs0+2SPDpv+j27NJL5SMvyqu1lb7y/3fm/6a/7NGl6VZ/66azaaRZWV\novK2xyMzeZtX5tyrtb/x7/drUs7DRX1CPTZobiazkf5vI+bcqs393/d/hb+L71SaH/plxHcvcrGz\nXW17h/MkXd83zKqr821v975m+981bU5e7yyDm55GH/ZsKCS/ePyWb5pYtnzR/L8zbWb5vm3fe/u0\nXFg8MbI6Rtui27PN2+X83zSbt38O37v+zXTR2dnYfvpoY122q/Nb/dkXav8ACy/ekbd/Du+as/VN\nKtk0uOaa8jjaZIGlR32q0n+0rbf4Vb/v633aqNTln7xMvfMWTTfOvVmeGOTzH3eU7/el3fd+993b\n/wChf7NNuHs7Dy0toZGjVP8AlrtXcvmKzK3zf7NaEcKeV5H2lZN3zIkv3tv3drKv8P3v93/ZqjfP\neW0n2x0k2qvmJ/Ey/wAX3v4v++a29zk/vE0/cIZLBMr5dt+8b5kdfu/3f++du7/x3+7TbNIU2pbW\n0czRz/xRbd3+VX/0KnW/nXW7eiyf88P9lfuqzf8AfK1oSWyTSSWbwt/di+f5vmVtrf3vu7v++qnl\nlAPfM+OzeG4WSFG/ePu3q3/jq1JCiI3yW0jSRuzMnysyr/d/u1ekR0s/szzfL5u5n8pW/u/xL/lf\nmptvcvtjSzmbzdy/Pv8ALb7vy/erSPL8XKHNze6V7fTX8xrZ0j+b77K3zN827d5f/AfvVIsL38vn\neS0zSIskXzr8r7f9n/dWrlvpvkySW0L/AOrRdiRRNHu+8q/NUcdtDN/o1zcw7W+Vk37W2/xf+OrW\ncpfygUfs3nTb03eXtbYnyx7l/hVv/Hf+Att/u037H5LfaoYY/LjfdEu/93Iy/wB5v++q0JLbzpld\n0baqbvufeX+L5m/3Wp0sKP8Aubny5JN/72JP9T937qt/u/8AoLVTj8MpkcsSjZWFm/8ApMMe6RUX\n/eX5v/Hqjm09GjW2mtv9XLufb/wJfm/4DurSh025ePY+2T+JG/i+7u/z/ur/AHqb5LvuE0K+Yrrv\n3p/vL/7N/datH+8+Iv44GHJYJ5aoj/6vd8zu277y7m3f8Caqf2OEL5M3ysu5Xfb937zfe/4FW8yJ\nDcbHhaNZE+dH3N/tN/st95l/4DWfMkaXrFEVvvbkbbubd/ern9nFc0oyCny/ylW1hms49if6tvlf\nYy7Y921W3N/7KtU5IYV+d3kkkZfM/ey/MrbmZdv+ztZf+BLWksMybtkK+X/q1/e/3v8AP95arzQj\ny/Lugyx
rt+4+3d82373/AAL7v+1/DXLKUeW8pfEVGjIybq2M/mWbpuZpW3sq/eX/ADuqO48nb+6T\n+BV2/wB1f937396rFz
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Algorithmia.apiKey = 'Simple ' + API_KEY\n",
"\n",
"input = [face_url, \"data://.algo/temp/face_result.jpg\"]\n",
"\n",
"algo = client.algo('opencv/FaceDetection/0.1.8')\n",
"algo.pipe(input)\n",
"\n",
"# Result Image is in under another algorithm name because FaceDetection calls ObjectDetectionWithModels\n",
"result_image_data_api_path = '.algo/opencv/ObjectDetectionWithModels/temp/face_result.jpg'\n",
"\n",
"# Result Image with coordinates for the detected face region\n",
"result_coord_data_api_path = '.algo/opencv/ObjectDetectionWithModels/temp/face_result.jpgrects.txt'\n",
"\n",
"result_file = Algorithmia.file(result_image_data_api_path).getBytes()\n",
"\n",
"result_coord = Algorithmia.file(result_coord_data_api_path).getString()\n",
"\n",
"# Show Result Image\n",
"Image(data=result_file)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Detected face region coordinates: 103\t88\t280\t280\n",
"\n"
]
}
],
"source": [
"# Show detected face region coordinates\n",
"print 'Detected face region coordinates: ' + result_coord"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Content Summarizer\n",
"\n",
"SummarAI is an advanced content summarizer with the option of generating context-controlled summaries. It is based on award-winning patented methods related to artificial intelligence and vector space developed at Lawrence Berkeley National Laboratory."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wikipedia article length: 39683\n"
]
}
],
"source": [
"# Get a Wikipedia article as content\n",
"wiki_article_name = 'Technological Singularity'\n",
"client = Algorithmia.client(API_KEY)\n",
"algo = client.algo('web/WikipediaParser/0.1.0')\n",
"wiki_page_content = algo.pipe(wiki_article_name)['content']\n",
"print 'Wikipedia article length: ' + str(len(wiki_page_content))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wikipedia generated summary length: 406\n",
"The term was popularized by mathematician, computer scientist and science fiction author Vernor Vinge, who argues that artificial intelligence, human biological enhancement, or brain computer interfaces could be possible causes of the singularity...The technological singularity is a hypothetical event related to the advent of genuine artificial general intelligence also known as quote_tokenstrong AI\"...\n"
]
}
],
"source": [
"# Summarize the Wikipedia article\n",
"client = Algorithmia.client(API_KEY)\n",
"algo = client.algo('SummarAI/Summarizer/0.1.2')\n",
"summary = algo.pipe(wiki_page_content.encode('utf-8'))\n",
"print 'Wikipedia generated summary length: ' + str(len(summary['summarized_data']))\n",
"print summary['summarized_data']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Latent Dirichlet Allocation\n",
"\n",
"This algorithm takes a group of documents (anything that is made of up text), and returns a number of topics (which are made up of a number of words) most relevant to these documents.\n",
"\n",
"Read more about Latent Dirichlet Allocation [here](https://algorithmia.com/algorithms/nlp/LDA)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of Wikipedia articles scraped: 17\n"
]
}
],
"source": [
"# Get up to 20 random Wikipedia articles\n",
"client = Algorithmia.client(API_KEY)\n",
"algo = client.algo('web/WikipediaParser/0.1.0')\n",
"random_wiki_article_names = algo.pipe({\"random\":20})\n",
"\n",
"random_wiki_articles = []\n",
"\n",
"for article_name in random_wiki_article_names:\n",
" try:\n",
" article_content = algo.pipe(article_name)['content']\n",
" random_wiki_articles.append(article_content)\n",
" except:\n",
" pass\n",
"print 'Number of Wikipedia articles scraped: ' + str(len(random_wiki_articles))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ { u'album': 9,\n",
" u'bomfunk': 10,\n",
" u'kurepa': 13,\n",
" u'mathematics': 9,\n",
" u\"mc's\": 9,\n",
" u'music': 9,\n",
" u'university': 21,\n",
" u'zagreb': 9},\n",
" { u'british': 23,\n",
" u'hindu': 115,\n",
" u'india': 47,\n",
" u'indian': 33,\n",
" u'movement': 24,\n",
" u'national': 22,\n",
" u'political': 28,\n",
" u'rss': 29},\n",
" { u'berlin': 9,\n",
" u'film': 26,\n",
" u'gotcha': 12,\n",
" u'jonathan': 33,\n",
" u'sasha': 23,\n",
" u'states': 8,\n",
" u'township': 15,\n",
" u'united': 8},\n",
" { u'belfast': 8,\n",
" u'building': 6,\n",
" u'church': 12,\n",
" u'history': 7,\n",
" u'incumbent': 7,\n",
" u'james': 5,\n",
" u\"rev'd\": 7,\n",
" u'worship': 5}]\n"
]
}
],
"source": [
"# Find topics from 20 random Wikipedia articles\n",
"algo = client.algo('nlp/LDA/0.1.0')\n",
"\n",
"input = {\"docsList\": random_wiki_articles, \"mode\": \"quality\"}\n",
"\n",
"topics = algo.pipe(input)\n",
"\n",
"pp.pprint(topics)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Optical Character Recognition\n",
"\n",
"Recognize text in your images.\n",
"\n",
"Read more about Optical Character Recognition [here](https://algorithmia.com/algorithms/tesseractocr/OCR)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"https://s3.amazonaws.com/algorithmia-assets/data-science-ipython-notebooks/businesscard.jpg\"/>"
],
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import Image\n",
"\n",
"businesscard_url = 'https://s3.amazonaws.com/algorithmia-assets/data-science-ipython-notebooks/businesscard.jpg'\n",
"\n",
"# Sample Image\n",
"Image(url=businesscard_url)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{ u'compound': { u'': 95,\n",
" u'206.552.9054': 84,\n",
" u'@doppenhe': 85,\n",
" u'AALGORITHMIA': 53,\n",
" u'CEO': 88,\n",
" u'DIEGO': 88,\n",
" u'OPPENHEIMER': 88,\n",
" u'Q': 55,\n",
" u'diego@algorithmia.com': 83,\n",
" u'doppenheimer': 79,\n",
" u'o': 77},\n",
" u'result': u' \\n \\n \\n \\nAALGORITHMIA \\nDIEGO \\nOPPENHEIMER \\nCEO \\ndiego@algorithmia.com \\no \\n@doppenhe \\n206.552.9054 \\nQ \\ndoppenheimer \\n \\n \\n'}\n"
]
}
],
"source": [
"input = {\"src\": businesscard_url,\n",
"\"hocr\":{\n",
"\"tessedit_create_hocr\":1,\n",
"\"tessedit_pageseg_mode\":1,\n",
"\"tessedit_char_whitelist\":\"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-@/.,:()\"}}\n",
"\n",
"algo = client.algo('tesseractocr/OCR/0.1.0')\n",
"pp.pprint(algo.pipe(input))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}