Climate/Weather
* `Australian Weather <http://www.bom.gov.au/climate/dwo/>`_
* `Canadian Meteorological Centre <https://weather.gc.ca/grib/index_e.html>`_
* Climate Data from UEA (updated at roughly monthly intervals): `CRU temperature data <http://www.cru.uea.ac.uk/cru/data/temperature/#datter>`_ and `ftp.cmdl.noaa.gov <ftp://ftp.cmdl.noaa.gov/>`_
* `Global Climate Data Since 1929 <http://www.tutiempo.net/en/Climate>`_
* `NOAA Bering Sea Climate <http://www.beringclimate.noaa.gov/>`_
* `NOAA Climate Datasets <http://ncdc.noaa.gov/data-access/quick-links>`_
Computer Networks
* `3.5B Web Pages from CommonCrawl 2012 <http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us>`_
* `53.5B Web clicks of 100K users in Indiana Univ. <http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset>`_
* `CAIDA Internet Datasets <http://www.caida.org/data/overview/>`_
* `ClueWeb09 - 1B web pages <http://lemurproject.org/clueweb09/>`_
* `ClueWeb12 - 733M web pages <http://lemurproject.org/clueweb12/>`_
* `CommonCrawl Web Data over 7 years <http://commoncrawl.org/the-data/get-started/>`_
* `CRAWDAD Wireless datasets from Dartmouth Univ. <http://crawdad.cs.dartmouth.edu/>`_
* `Open Mobile Data by MobiPerf <https://console.developers.google.com/storage/openmobiledata_public/>`_
* `UCSD Network Telescope, IPv4 /8 net <http://www.caida.org/projects/network_telescope/>`_
Data Challenges
* `American Economic Association (AEA) <http://www.aeaweb.org/RFE/toc.php?show=complete>`_
* `EconData from UMD <http://inforumweb.umd.edu/econdata/econdata.html>`_
* `Internet Product Code Database <http://www.upcdatabase.com/>`_
* `BODC - marine data of ~22K vars <http://www.bodc.ac.uk/data/where_to_find_data/>`_
* `EOSDIS - NASA's Earth observing system data <http://sedac.ciesin.columbia.edu/data/sets/browse>`_
* `Factual Global Location Data <http://www.factual.com/>`_
* `Global Administrative Areas Database (GADM) <http://www.gadm.org/>`_
* `Geo Spatial Data from ASU <http://geodacenter.asu.edu/datalist/>`_
* `GeoNames Worldwide <http://www.geonames.org/>`_
* `Natural Earth - vectors and rasters of the world <http://www.naturalearthdata.com/>`_
* `Open Street Map (OSM) <http://wiki.openstreetmap.org/wiki/Downloading_data>`_
* `TIGER/Line - U.S. boundaries and roads <http://www.census.gov/geo/maps-data/data/tiger-line.html>`_
* `TwoFishes - Foursquare's coarse geocoder <https://github.com/foursquare/twofishes>`_
* `TZ Timezones shapefiles <http://efele.net/maps/tz/world/>`_
* `Australia (abs.gov.au) <http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument>`_
* `Australia (data.gov.au) <https://data.gov.au/>`_
* `Canada <http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1>`_
* `Chicago <https://data.cityofchicago.org/>`_
* `EuroStat <http://ec.europa.eu/eurostat/data/database>`_
* `EHDP Large Health Data Sets <http://www.ehdp.com/vitalnet/datasets.htm>`_
* `Gapminder World, demographic databases <http://www.gapminder.org/data/>`_
* `Medicare Coverage Database (MCD), U.S. <http://www.cms.gov/medicare-coverage-database/>`_
* `Medicare Data Engine of medicare.gov Data <https://data.medicare.gov/>`_
* `Medicare Data File <http://go.cms.gov/19xxPN4>`_
Image Processing
* `2GB of Photos of Cats <>`_
* `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
* `ImageNet - an image database in WordNet hierarchy <http://www.image-net.org/>`_
Machine Learning
* `Delve Datasets for classification and regression (Univ. of Toronto) <http://www.cs.toronto.edu/~delve/data/datasets.html>`_
* `Discogs Monthly Data <http://www.discogs.com/data/>`_
* `Registered Meteorites on Earth <http://www.analyticbridge.com/profiles/blogs/registered-meteorites-that-has-impacted-on-earth-visualized>`_
* `Restaurants Health Score Data in San Francisco <http://missionlocal.org/san-francisco-restaurant-health-inspections/>`_
* `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml/>`_
* `Yahoo! Ratings and Classification Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=r>`_
Museums
* `The Getty vocabularies <http://vocab.getty.edu>`_
Natural Language
* `ClueWeb09 FACC <http://lemurproject.org/clueweb09/FACC1/>`_
* `ClueWeb12 FACC <http://lemurproject.org/clueweb12/FACC1/>`_
* `DBpedia - 4.58M “things” with 583M “facts” <http://wiki.dbpedia.org/Datasets>`_
* `Flickr Personal Taxonomies <http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html>`_
* `Google Books Ngrams (2.2TB) <http://aws.amazon.com/datasets/8172056142375670>`_
* `Google Web 5gram (1TB, 2006) <https://catalog.ldc.upenn.edu/LDC2006T13>`_
* `Gutenberg eBooks List <http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs>`_
* `Hansards text chunks of the Canadian Parliament <http://www.isi.edu/natural-language/download/hansard/>`_
* `Machine Translation of European languages <http://statmt.org/wmt11/translation-task.html#download>`_
* `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `USENET postings corpus of 2005-2011 <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
* `Wikidata - Wikipedia databases <https://www.wikidata.org/wiki/Wikidata:Database_download>`_
* `Wikipedia Links data - 40 Million Entities in Context <https://code.google.com/p/wiki-links/downloads/list>`_
* `WordNet databases and tools <http://wordnet.princeton.edu/wordnet/download/>`_
* `CERN Open Data Portal <http://opendata.cern.ch/>`_
* `NSSDC (NASA) data of 550 spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_
Public Domains
Search Engines
* `Academic Torrents - data sharing from UMB <http://academictorrents.com/>`_
* `Archive-it from Internet Archive <https://www.archive-it.org/explore?show=Collections>`_
* `Datahub.io <http://datahub.io/dataset>`_
* `Freebase.com of people, places, and things <http://www.freebase.com/>`_
* `Harvard Dataverse Network of scientific data <http://thedata.harvard.edu/dvn/>`_
* `ICPSR (UMICH) <http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp>`_
* `Statista.com - statistics and studies <http://www.statista.com/>`_
Social Sciences
* `Ancestry.com Forum Dataset over 10 years <http://www.cs.cmu.edu/~jelsas/data/ancestry.com/>`_
* `CMU Enron Email of 150 users <http://www.cs.cmu.edu/~enron/>`_
* `Facebook Data Scrape (2005) <https://archive.org/details/oxford-2005-facebook-matrix>`_
* `Facebook Social Networks from LAW (since 2007) <http://law.di.unimi.it/datasets.php>`_
* `Foursquare Social Network in 2010, 2011 <http://www.public.asu.edu/~hgao16/dataset.html>`_
* `Foursquare from UMN/Sarwat (2013) <https://archive.org/details/201309_foursquare_dataset_umn>`_
* `General Social Survey (GSS) since 1972 <http://www3.norc.org/GSS+Website/>`_
* `GetGlue - users rating TV shows <http://bit.ly/1aL8XS0>`_
* `GitHub Collaboration Archive <http://www.githubarchive.org/>`_
* `Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `PewResearch Internet Survey Project <http://www.pewinternet.org/datasets/pages/2/>`_
* `SourceForge.net Research Data <http://www.nd.edu/~oss/Data/data.html>`_
* `StackExchange Data Explorer <http://data.stackexchange.com/help>`_
* `Titanic Survival Data Set <http://bit.do/dataset-titanic-csv-zip>`_
* `Twitter Graph of entire Twitter site <http://an.kaist.ac.kr/traces/WWW2010.html>`_
* `UCB's Archive of Social Science Data (D-Lab) <http://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive <http://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
* `UNIMI/LAW Social Network Datasets <http://law.di.unimi.it/datasets.php>`_
* `Universities Worldwide <http://univ.cc/>`_
* `UPJOHN for Labor Employment Research <http://www.upjohn.org/erdc/erdc.html>`_
* `Yahoo! Graph and Social Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=g>`_
* `YouTube Video Social Graph in 2007, 2008 <http://netsg.cs.sfu.ca/youtubedata/>`_
* `Betfair Historical Exchange Data <http://data.betfair.com/>`_
* `Cricsheet Matches (cricket) <http://cricsheet.org/>`_
* `Ergast Formula 1, from 1950 up to date (API) <http://ergast.com/mrd/db>`_
* `Football/Soccer resources (data and APIs) <http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/>`_
* `Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series Dataset <http://www.cs.ucr.edu/~eamonn/time_series_data/>`_
* `Airlines OD Data 1987-2008 <http://stat-computing.org/dataexpo/2009/the-data.html>`_
* `Bike Share Systems (BSS) collection <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>`_
* `Hubway Million Rides in MA <http://hubwaydatachallenge.org/trip-history-data/>`_
* `Marine Traffic - ship tracks, port calls and more <https://www.marinetraffic.com/de/p/api-services>`_
* `NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>`_
* `OpenFlights - airport, airline and route data <http://openflights.org/data.html>`_
* `RITA Airline On-Time Performance data <http://www.transtats.bts.gov/Tables.asp?DB_ID=120>`_
* `RITA/BTS transport data collection (TranStat) <http://www.transtats.bts.gov/DataIndex.asp>`_
* `U.S. Domestic Flights 1990 to 2009 <http://data.memect.com/?p=229>`_
* `U.S. Freight Analysis Framework since 2007 <http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm>`_
Complementary Collections
* Inside-r: `Finding Data on the Internet <http://www.inside-r.org/howto/finding-data-internet>`_
* Quora: `Where can I find large datasets open to the public? <http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>`_
* RS.io: `100+ Interesting Data Sets for Statistics <http://rs.io/2014/05/29/list-of-data-sets.html>`_
* StaTrek: `Leveraging open data to understand urban lives <http://hsiamin.com/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>`_