mirror of
https://github.com/awesomedata/awesome-public-datasets.git
synced 2024-04-18 07:30:58 +08:00
A topic-centric list of HQ open datasets.
Awesome Public Datasets
=======================

`This list of public data sources <https://github.com/caesar0301/awesome-public-datasets>`_
is collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free; however, some are not.
Other amazingly awesome lists can be found in the
`awesome-awesomeness <https://github.com/bayandin/awesome-awesomeness>`_ and
`another awesome <https://github.com/sindresorhus/awesome>`_ lists.

Agriculture
------------

* `U.S. Department of Agriculture's PLANTS Database <http://www.plants.usda.gov/dl_all.html>`_

Biology
-------

* `1000 Genomes <http://www.1000genomes.org/data>`_
* `Collaborative Research in Computational Neuroscience (CRCNS) <http://crcns.org/data-sets>`_
* `Gene Expression Omnibus (GEO) <http://www.ncbi.nlm.nih.gov/geo/>`_
* `Human Microbiome Project (HMP) <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_
* `ICOS PSP Benchmark <http://www.infobiotic.net/PSPbenchmarks/>`_
* `MIT Cancer Genomics Data <http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
* `NIH Microarray data (FTP) <http://bit.do/VVW6>`_
* `Protein Data Bank <http://pdb.org/>`_
* `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
* `PubGene (now Coremine Medical) <http://www.pubgene.org/>`_
* `Stanford Microarray Data <http://smd.stanford.edu/>`_
* `The Personal Genome Project <http://www.personalgenomes.org/>`_ or `PGP <https://my.pgp-hms.org/public_genetic_data>`_
* `UCSC Public Data <http://hgdownload.soe.ucsc.edu/downloads.html>`_
* `UniGene <http://www.ncbi.nlm.nih.gov/unigene>`_

Climate/Weather
---------------

* `Australian Weather <http://www.bom.gov.au/climate/dwo/>`_
* `Canadian Meteorological Centre <https://weather.gc.ca/grib/index_e.html>`_
* `Climate Data from UEA (updated at roughly monthly intervals) <http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/>`_
* `Global Climate Data Since 1929 <http://www.tutiempo.net/en/Climate>`_
* `NOAA Bering Sea Climate <http://www.beringclimate.noaa.gov/>`_
* `NOAA Climate Datasets <http://ncdc.noaa.gov/data-access/quick-links>`_
* `NOAA Realtime Weather Models <http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction>`_
* `WU Historical Weather Worldwide <http://www.wunderground.com/history/index.html>`_

Complex Networks
----------------

* `CrossRef DOI URLs <https://archive.org/details/doi-urls>`_
* `DBLP Citation dataset <https://kdl.cs.umass.edu/display/public/DBLP>`_
* `NBER Patent Citations <http://nber.org/patents/>`_
* `NIST complex networks data collection <http://math.nist.gov/~RPozo/complex_datasets.html>`_
* `Protein-protein interaction network <http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm>`_
* `PyPI and Maven Dependency Network <http://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/>`_
* `Scopus Citation Database <http://www.elsevier.com/online-tools/scopus>`_
* `Stanford GraphBase (Steven Skiena) <http://www3.cs.stonybrook.edu/~algorith/implement/graphbase/implement.shtml>`_
* `Stanford Large Network Dataset Collection <http://snap.stanford.edu/data/>`_
* `The Koblenz Network Collection <http://konect.uni-koblenz.de/>`_
* `The Laboratory for Web Algorithmics (UNIMI) <http://law.di.unimi.it/datasets.php>`_
* `UCI Network Data Repository <http://networkdata.ics.uci.edu/resources.php>`_
* `UFL sparse matrix collection <http://www.cise.ufl.edu/research/sparse/matrices/>`_
* `WSU Graph Database <http://www.eecs.wsu.edu/mgd/gdb.html>`_

Computer Networks
-----------------

* `3.5B Web Pages <http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us>`_
* `53.5B Web clicks <http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset>`_
* `CAIDA Internet Datasets <http://www.caida.org/data/overview/>`_
* `ClueWeb09 <http://lemurproject.org/clueweb09/>`_
* `ClueWeb12 <http://lemurproject.org/clueweb12/>`_
* `CommonCrawl Web Data <http://commoncrawl.org/the-data/get-started/>`_
* `Dartmouth CRAWDAD Wireless datasets <http://crawdad.cs.dartmouth.edu/>`_
* `OpenMobileData (MobiPerf) <https://console.developers.google.com/storage/openmobiledata_public/>`_
* `UCSD Network Telescope <http://www.caida.org/projects/network_telescope/>`_

Data Challenges
---------------

* `Challenges in Machine Learning <http://www.chalearn.org/>`_
* `DrivenData Competitions for Social Good <http://www.drivendata.org/>`_
* `ICWSM Data Challenge (since 2009) <http://icwsm.cs.umbc.edu/>`_
* `Kaggle Competition Data <http://www.kaggle.com/>`_
* `KDD Cup by Tencent 2012 <https://www.kddcup2012.org/>`_
* `Localytics Data Visualization Challenge <https://github.com/localytics/data-viz-challenge>`_
* `Netflix Prize <http://www.netflixprize.com/leaderboard>`_
* `Yelp Dataset Challenge <http://www.yelp.com/dataset_challenge>`_

Economics
---------

* `American Economic Association (AEA) <http://www.aeaweb.org/RFE/toc.php?show=complete>`_
* `EconData from UMD <http://inforumweb.umd.edu/econdata/econdata.html>`_
* `Internet Product Code Database <http://www.upcdatabase.com/>`_

Energy
------

* `AMPds <http://ampds.org/>`_
* `BLUEd <http://nilm.cmubi.org/>`_
* `COMBED <http://combed.github.io/>`_
* `Dataport <https://dataport.pecanstreet.org/>`_
* `ECO <http://www.vs.inf.ethz.ch/res/show.html?what=eco-data>`_
* `EIA <http://www.eia.gov/electricity/data/eia923/>`_
* `HFED <http://hfed.github.io/>`_
* `iAWE <http://iawe.github.io/>`_
* `Plaid <http://plaidplug.com/>`_
* `REDD <http://redd.csail.mit.edu/>`_
* `UK-Dale <http://www.doc.ic.ac.uk/~dk3810/data/>`_

Finance
-------

* `CBOE Futures Exchange <http://cfe.cboe.com/Data/>`_
* `Google Finance <https://www.google.com/finance>`_
* `Google Trends <http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0>`_
* `NASDAQ <https://data.nasdaq.com/>`_
* `OANDA <http://www.oanda.com/>`_
* `OSU Financial data <http://fisher.osu.edu/fin/fdf/osudata.htm>`_
* `Quandl <http://www.quandl.com/>`_
* `St. Louis Federal Reserve <http://research.stlouisfed.org/fred2/>`_
* `Yahoo Finance <http://finance.yahoo.com/>`_

GeoSpace/GIS
------------

* `BODC (marine data of nearly 22,000 oceanographic vars) <http://www.bodc.ac.uk/data/where_to_find_data/>`_
* `EOSDIS <http://sedac.ciesin.columbia.edu/data/sets/browse>`_
* `Factual Global Location Data <http://www.factual.com/>`_
* `GADM (Global Administrative Areas database) <http://www.gadm.org/>`_
* `Geo Spatial Data from ASU <http://geodacenter.asu.edu/datalist/>`_
* `GeoNames (over eight million placenames) <http://www.geonames.org/>`_
* `Natural Earth (vectors and rasters of the world) <http://www.naturalearthdata.com/>`_
* `OpenStreetMap (a free map worldwide) <http://wiki.openstreetmap.org/wiki/Downloading_data>`_
* `TIGER/Line (official United States boundaries and roads) <http://www.census.gov/geo/maps-data/data/tiger-line.html>`_
* `twofishes (Foursquare's coarse geocoder) <https://github.com/foursquare/twofishes>`_
* `tz_world (timezone polygons) <http://efele.net/maps/tz/world/>`_

Government
----------

* `Australia <http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument>`_ (abs.gov.au)
* `Australia <https://data.gov.au/>`_ (data.gov.au)
* `Canada <http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1>`_
* `Chicago <https://data.cityofchicago.org/>`_
* `EuroStat <http://ec.europa.eu/eurostat/data/database>`_
* `FedStats <http://www.fedstats.gov/cgi-bin/A2Z.cgi>`_
* `Germany <https://www-genesis.destatis.de/genesis/online>`_
* `Glasgow, Scotland, UK <http://data.glasgow.gov.uk/>`_
* `Guardian world governments <http://www.guardian.co.uk/world-government-data>`_
* `London Datastore, U.K. <http://data.london.gov.uk/dataset>`_
* `Netherlands <https://data.overheid.nl/>`_
* `New Zealand <http://www.stats.govt.nz/browse_for_stats.aspx>`_
* `NYC betanyc <http://betanyc.us/>`_
* `NYC Open Data <http://nycplatform.socrata.com/>`_
* `OECD <http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html>`_
* `Open Government Data (OGD) Platform India <http://www.data.gov.in/>`_
* `San Francisco Data sets <http://datasf.org/>`_
* `South Africa <http://beta2.statssa.gov.za/>`_
* `The World Bank <http://wdronline.worldbank.org/>`_
* `U.K. Government Data <http://data.gov.uk/data>`_
* `U.S. American Community Survey <http://www.census.gov/acs/www/data_documentation/data_release_info/>`_
* `U.S. CDC Public Health datasets <http://www.cdc.gov/nchs/data_access/ftp_data.htm>`_
* `U.S. Census Bureau <http://www.census.gov/data.html>`_
* `U.S. Department of Housing and Urban Development (HUD) <http://www.huduser.org/portal/datasets/pdrdatas.html>`_
* `U.S. Federal Government Agencies <http://www.data.gov/metric>`_
* `U.S. Federal Government Data Catalog <http://catalog.data.gov/dataset>`_
* `U.S. Food and Drug Administration (FDA) <https://open.fda.gov/index.html>`_
* `U.S. Open Government <http://www.data.gov/open-gov/>`_
* `UK 2011 Census Open Atlas Project <http://www.alex-singleton.com/2011-census-open-atlas-project/>`_
* `United Nations <http://data.un.org/>`_

Healthcare
----------

* `EHDP Large Health Data Sets <http://www.ehdp.com/vitalnet/datasets.htm>`_
* `Gapminder <http://www.gapminder.org/data/>`_
* `Medicare Data File <http://go.cms.gov/19xxPN4>`_

Image Processing
----------------

* `2GB of photos of cats <http://137.189.35.203/WebUI/CatDatabase/catData.html>`_
* `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
* `ImageNet <http://www.image-net.org/>`_

Machine Learning
----------------

* `eBay Online Auctions <http://www.modelingonlineauctions.com/datasets>`_
* `IMDb database <http://www.imdb.com/interfaces>`_
* `Keel Repository <http://sci2s.ugr.es/keel/datasets.php>`_
* `Lending Club Loan Data <https://www.lendingclub.com/info/download-data.action>`_
* `Machine Learning Data Set Repository <http://mldata.org/>`_
* `Million Song Dataset <http://blog.echonest.com/post/3639160982/million-song-dataset>`_
* `More Song Datasets <http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets>`_
* `MovieLens Data Sets <http://datahub.io/dataset/movielens>`_
* `RDataMining R and Data Mining ebook data <http://www.rdatamining.com/data>`_
* `Registered meteorites on Earth <http://www.analyticbridge.com/profiles/blogs/registered-meteorites-that-has-impacted-on-earth-visualized>`_
* `SF restaurants dataset <http://missionlocal.org/san-francisco-restaurant-health-inspections/>`_
* `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml/>`_
* `University of Toronto Delve Datasets <http://www.cs.toronto.edu/~delve/data/datasets.html>`_
* `Yahoo Ratings and Classification Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=r>`_

Museums
-------

* `Cooper-Hewitt's Collection Database <https://github.com/cooperhewitt/collection>`_
* `Minneapolis Institute of Arts metadata <https://github.com/artsmia/collection>`_
* `Tate Collection metadata <https://github.com/tategallery/collection>`_
* `The Getty vocabularies <http://vocab.getty.edu>`_

Music
-----

* `Discogs Data <http://www.discogs.com/data/>`_

Natural Language
----------------

* `ClueWeb09 FACC - Annotated English-language Web pages from the ClueWeb09 corpora. <http://lemurproject.org/clueweb09/FACC1/>`_
* `ClueWeb12 FACC - Annotated English-language Web pages from the ClueWeb12 corpora. <http://lemurproject.org/clueweb12/FACC1/>`_
* `DBpedia - Multi-domain ontology describing 4.58M “things” with 583M “facts”. <http://wiki.dbpedia.org/Datasets>`_
* `Flickr Personal Taxonomies - Personalized tagging pictures with descriptive labels. <http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html>`_
* `Google Books Ngrams (2.2TB) - N-gram corpuses extracted from Google Books. <http://aws.amazon.com/datasets/8172056142375670>`_
* `Google Web 5gram (1TB, 2006) - 5-gram corpuses extracted from Web pages. <https://catalog.ldc.upenn.edu/LDC2006T13>`_
* `Gutenberg eBooks List - Basic information about each eBook from Project Gutenberg. <http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs>`_
* `Hansards - 1.3M aligned text chunks from official records of Canadian Parliament. <http://www.isi.edu/natural-language/download/hansard/>`_
* `Machine Translation - The recurring translation task focusing on European languages. <http://statmt.org/wmt11/translation-task.html#download>`_
* `SMS Spam Collection - 5,574 real English messages, labeled as being ham or spam. <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `USENET corpus - A collection of public USENET postings between Oct 2005 and Jan 2011. <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
* `Wikidata - Wikipedia databases available in JSON and XML formats. <https://www.wikidata.org/wiki/Wikidata:Database_download>`_
* `Wikipedia Links data - 40 Million Entities in Context. <https://code.google.com/p/wiki-links/downloads/list>`_
* `WordNet - Databases, associated packages and tools. <http://wordnet.princeton.edu/wordnet/download/>`_

Physics
-------

* `CERN Open Data Portal - Experimental data of CMS experiment, ALICE, ATLAS and LHCb <http://opendata.cern.ch/>`_
* `NSSDC (NASA) - More than 230 TB of data from about 550 space science spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_

Public Domains
--------------

* `Amazon <http://aws.amazon.com/datasets>`_
* `Archive.org Datasets <https://archive.org/details/datasets>`_
* `CMU JASA data archive <http://lib.stat.cmu.edu/jasadata/>`_
* `CMU StatLab collections <http://lib.stat.cmu.edu/datasets/>`_
* `Data360 <http://www.data360.org/index.aspx>`_
* `Datamob.org <http://datamob.org/datasets>`_
* `Google <http://www.google.com/publicdata/directory>`_
* `Infochimps <http://www.infochimps.com/>`_
* `KDNuggets Data Collections <http://www.kdnuggets.com/datasets/index.html>`_
* `Numbrary <http://numbrary.com/>`_
* `Reddit Datasets <http://www.reddit.com/r/datasets>`_
* `RevolutionAnalytics Collection <http://www.revolutionanalytics.com/subscriptions/datasets/>`_
* `Sample R data sets <http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html>`_
* `Stats4Stem R data sets <http://www.stats4stem.org/data-sets.html>`_
* `StatSci.org <http://www.statsci.org/datasets.html>`_
* `The Washington Post List <http://www.washingtonpost.com/wp-srv/metro/data/datapost.html>`_
* `UCLA SOCR data collection <http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data>`_
* `UFO Reports <http://www.nuforc.org/webreports.html>`_
* `Wikileaks 911 pager intercepts <http://911.wikileaks.org/files/index.html>`_
* `Yahoo Webscope <http://webscope.sandbox.yahoo.com/catalog.php>`_

Search Engines
--------------

* `Academic Torrents (UMB) - Sharing enormous datasets, for researchers, by researchers. <http://academictorrents.com/>`_
* `Archive-it - Web archiving service built at the Internet Archive <https://www.archive-it.org/explore?show=Collections>`_
* `Datahub.io - The easy way to get, use and share data <http://datahub.io/dataset>`_
* `DataMarket (Qlik) <https://datamarket.com/data/list/?q=all>`_
* `Freebase.com - A community-curated database of well-known people, places, and things <http://www.freebase.com/>`_
* `Harvard Dataverse Network - Scientific data for reproducible research <http://thedata.harvard.edu/dvn/>`_
* `ICPSR (UMICH) - Find and analyze data <http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp>`_
* `Statista.com - Statistics and Studies from more than 18,000 Sources <http://www.statista.com/>`_

Social Sciences
---------------

* `Ancestry.com Forum Dataset - Forum users and messages over ten years <http://www.cs.cmu.edu/~jelsas/data/ancestry.com/>`_
* `CMU Enron Email - 150 users, mostly senior management of Enron <http://www.cs.cmu.edu/~enron/>`_
* `Facebook Data Scrape (2005) - 100 American colleges and universities <https://archive.org/details/oxford-2005-facebook-matrix>`_
* `Facebook Social Networks from LAW (since 2007) <http://law.di.unimi.it/datasets.php>`_
* `Foursquare (2010, 2011) - Social networks, check-in locations and categories <http://www.public.asu.edu/~hgao16/dataset.html>`_
* `Foursquare from UMN/Sarwat (2013) - Users, venues, check-ins, ratings etc. <https://archive.org/details/201309_foursquare_dataset_umn>`_
* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, topics etc. <http://www3.norc.org/GSS+Website/>`_
* `GetGlue - Users rating TV shows <http://bit.ly/1aL8XS0>`_
* `GitHub Archive - Programmers' collaboration, project progress etc. <http://www.githubarchive.org/>`_
* `Mobile Social Networks (UMASS) - Timestamped mote-to-mote (up to 27 subjects) connections <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `PewResearch Internet Project - A wide range of surveys about library usage, online dating etc. <http://www.pewinternet.org/datasets/pages/2/>`_
* `SourceForge.net Research Data - Historic and status statistics of projects and users' activities <http://www.nd.edu/~oss/Data/data.html>`_
* `Stack Exchange Data Explorer - User-contributed content on the Stack Exchange network <http://data.stackexchange.com/help>`_
* `Titanic Survival Data Set - Demographic information of Titanic passengers <http://bit.do/dataset-titanic-csv-zip>`_
* `Twitter Graph - Crawled entire Twitter site including tweets, user profiles, relations <http://an.kaist.ac.kr/traces/WWW2010.html>`_
* `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas <http://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive - A collection of social science data on the Web <http://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
* `UNIMI/LAW Social Network Datasets - Social networks like amazon, LiveJournal, dblp and more <http://law.di.unimi.it/datasets.php>`_
* `Universities Worldwide - Links to 9307 Universities in 205 countries <http://univ.cc/>`_
* `UPJOHN for Employment Research - Labor surveys, unemployment spells and more <http://www.upjohn.org/erdc/erdc.html>`_
* `Yahoo Graph and Social Data - Web page graph, user-group membership, IM friends etc. <http://webscope.sandbox.yahoo.com/catalog.php?datatype=g>`_
* `YouTube Video Graph (2007, 2008) - Video relations, uploaders, views, ratings and more <http://netsg.cs.sfu.ca/youtubedata/>`_

Sports
------

* `Betfair Event Results - Fully time-stamped historical Betfair exchange data <http://data.betfair.com/>`_
* `Cricsheet (cricket) - Thousands of cricket matches <http://cricsheet.org/>`_
* `Ergast Formula 1, from 1950 up to date (API available) <http://ergast.com/mrd/db>`_
* `Football/Soccer resources (data and APIs) <http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/>`_
* `Lahman's Baseball Database - Batting and pitching statistics, team stats etc. <http://www.seanlahman.com/baseball-archive/statistics/>`_
* `Retrosheet (baseball) - Play-by-Play files, game logs and schedules <http://www.retrosheet.org/game.htm>`_

Time Series
-----------

* `Time Series Data Library (TSDL), created by Rob Hyndman, MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series, for classification and clustering. <http://www.cs.ucr.edu/~eamonn/time_series_data/>`_

Transportation
--------------

* `Airlines OD Data 1987-2008, used by ASA Challenge 2009 <http://stat-computing.org/dataexpo/2009/the-data.html>`_
* `Bike Share Data Systems - Trip histories, site maps etc. <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>`_
* `Edge data for US domestic flights 1990 to 2009 <http://data.memect.com/?p=229>`_
* `Half a million Hubway rides in MA <http://hubwaydatachallenge.org/trip-history-data/>`_
* `Marine Traffic - Ship tracks, port calls and more <https://www.marinetraffic.com/de/p/api-services>`_
* `NYC Taxi Trip Data 2013 - FOIA/FOILed by Chris Whong <https://archive.org/details/nycTaxiTripData2013>`_
* `OpenFlights - Airport, airline and route data <http://openflights.org/data.html>`_
* `RITA Airline On-Time Performance data of major air carriers in US <http://www.transtats.bts.gov/Tables.asp?DB_ID=120>`_
* `RITA/BTS transport data collection (TranStat) <http://www.transtats.bts.gov/DataIndex.asp>`_
* `Transport for London (TFL) - Trip histories and networking statistics <http://www.tfl.gov.uk/info-for/open-data-users/our-feeds>`_
* `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 <http://www.cmap.illinois.gov/data/transportation/travel-tracker-survey>`_
* `U.S. Bureau of Transportation Statistics (BTS) <http://www.rita.dot.gov/bts/>`_
* `U.S. Freight Analysis Framework - Freight movement among states since 2007 <http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm>`_

Complementary Collections
-------------------------

* DataWrangling: `Some Datasets Available on the Web <http://www.datawrangling.com/some-datasets-available-on-the-web>`_
* Inside-r: `Finding Data on the Internet <http://www.inside-r.org/howto/finding-data-internet>`_
* Quora: `Where can I find large datasets open to the public? <http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>`_
* RS.io: `100+ Interesting Data Sets for Statistics <http://rs.io/2014/05/29/list-of-data-sets.html>`_
* StaTrek: `Leveraging open data to understand urban lives <http://hsiamin.com/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>`_