Awesome Public Datasets
=======================

.. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
   :alt: Awesome
   :target: https://github.com/sindresorhus/awesome

.. image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg
   :target: https://travis-ci.org/caesar0301/awesome-public-datasets

`This list of public data sources <https://github.com/caesar0301/awesome-public-datasets>`_
is collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free; however, some are not.
Other amazingly awesome lists can be found in the
`awesome-awesomeness <https://github.com/bayandin/awesome-awesomeness>`_ and
`sindresorhus's awesome <https://github.com/sindresorhus/awesome>`_ lists.

.. contents:: Table of Contents

Agriculture
-----------

* `U.S. Department of Agriculture's PLANTS Database <http://www.plants.usda.gov/dl_all.html>`_

Biology
-------

* `1000 Genomes <http://www.1000genomes.org/data>`_
* `American Gut (Microbiome Project) <https://github.com/biocore/American-Gut>`_
* `Broad Cancer Cell Line Encyclopedia (CCLE) <http://www.broadinstitute.org/ccle/home>`_
* `Broad Bioimage Benchmark Collection (BBBC) <https://www.broadinstitute.org/bbbc>`_
* `Cell Image Library <http://www.cellimagelibrary.org>`_
* `Collaborative Research in Computational Neuroscience (CRCNS) <http://crcns.org/data-sets>`_
* `Complete Genomics Public Data <http://www.completegenomics.com/public-data/69-genomes/>`_
* `EBI ArrayExpress <http://www.ebi.ac.uk/arrayexpress/>`_
* `EBI Protein Data Bank in Europe <http://www.ebi.ac.uk/pdbe/emdb/index.html/>`_
* `Electron Microscopy Pilot Image Archive (EMPIAR) <http://www.ebi.ac.uk/pdbe/emdb/empiar/>`_
* `ENCODE project <https://www.encodeproject.org>`_
* `Ensembl Genomes <http://ensemblgenomes.org/info/genomes>`_
* `Gene Expression Omnibus (GEO) <http://www.ncbi.nlm.nih.gov/geo/>`_
* `Gene Ontology (GO) <http://geneontology.org/page/download-annotations>`_
* `Global Biotic Interactions (GloBI) <https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data>`_
* `Harvard Medical School (HMS) LINCS Project <http://lincs.hms.harvard.edu>`_
* `Human Genome Diversity Project <http://www.hagsc.org/hgdp/files.html>`_
* `Human Microbiome Project (HMP) <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_
* `ICOS PSP Benchmark <http://ico2s.org/datasets/psp_benchmark.html>`_
* `International HapMap Project <http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en>`_
* `Journal of Cell Biology DataViewer <http://jcb-dataviewer.rupress.org>`_
* `MIT Cancer Genomics Data <http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
* `NCBI Proteins <http://www.ncbi.nlm.nih.gov/guide/proteins/#databases>`_
* `NCBI Taxonomy <http://www.ncbi.nlm.nih.gov/taxonomy>`_
* `NeuroData <http://neurodata.io>`_
* `NIH Microarray data <http://bit.do/VVW6>`_ or `FTP <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/>`_
* `OpenSNP genotypes data <https://opensnp.org/>`_
* `Pathguide - Protein-Protein Interactions Catalog <http://www.pathguide.org/>`_
* `Protein Data Bank <http://www.rcsb.org/>`_
* `Psychiatric Genomics Consortium <https://www.med.unc.edu/pgc/downloads>`_
* `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
* `PubGene (now Coremine Medical) <http://www.pubgene.org/>`_
* `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) <http://cancer.sanger.ac.uk/cosmic>`_
* `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) <http://www.cancerrxgene.org/>`_
* `Sequence Read Archive (SRA) <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_
* `Stanford Microarray Data <http://smd.stanford.edu/>`_
* `Stowers Institute Original Data Repository <http://www.stowers.org/research/publications/odr>`_
* `Systems Science of Biological Dynamics (SSBD) Database <http://ssbd.qbic.riken.jp>`_
* `Temple University Hospital EEG Database <https://www.nedcdata.org/drupal/node/12>`_
* `The Cancer Genome Atlas (TCGA), available via Broad GDAC <https://gdac.broadinstitute.org/>`_
* `The Catalogue of Life <http://www.catalogueoflife.org/content/annual-checklist-archive>`_
* `The Personal Genome Project <http://www.personalgenomes.org/>`_ or `PGP <https://my.pgp-hms.org/public_genetic_data>`_
* `UCSC Public Data <http://hgdownload.soe.ucsc.edu/downloads.html>`_
* `Universal Protein Resource (UniProt) <http://www.uniprot.org/downloads>`_
* `UniGene <http://www.ncbi.nlm.nih.gov/unigene>`_

Climate/Weather
---------------

* `Australian Weather <http://www.bom.gov.au/climate/dwo/>`_
* `Brazilian Weather - Historical data (In Portuguese) <http://sinda.crn2.inpe.br/PCD/SITE/novo/site/>`_
* `Canadian Meteorological Centre <http://weather.gc.ca/grib/index_e.html>`_
* `Climate Data from UEA (updated monthly) <https://crudata.uea.ac.uk/cru/data/temperature/#datter>`_ and `NOAA FTP <ftp://ftp.cmdl.noaa.gov/>`_
* `European Climate Assessment & Dataset <http://eca.knmi.nl/>`_
* `DWD Climate Data Center (CDC) - Deutscher Wetterdienst <http://ftp-cdc.dwd.de/pub/CDC/>`_
* `Global Climate Data Since 1929 <http://en.tutiempo.net/climate>`_
* `NASA Global Imagery Browse Services <https://wiki.earthdata.nasa.gov/display/GIBS>`_
* `NOAA Bering Sea Climate <http://www.beringclimate.noaa.gov/>`_
* `NOAA Climate Datasets <http://www.ncdc.noaa.gov/data-access/quick-links>`_
* `NOAA Realtime Weather Models <http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction>`_
* `The World Bank Open Data Resources for Climate Change <http://data.worldbank.org/developers/climate-data-api>`_
* `UEA Climatic Research Unit <http://www.cru.uea.ac.uk/data>`_
* `WorldClim - Global Climate Data <http://www.worldclim.org>`_
* `WU Historical Weather Worldwide <https://www.wunderground.com/history/index.html>`_

Complex Networks
----------------

* `AMiner Citation Network Dataset <http://aminer.org/citation>`_
* `CrossRef DOI URLs <https://archive.org/details/doi-urls>`_
* `DBLP Citation dataset <https://kdl.cs.umass.edu/display/public/DBLP>`_
* `NBER Patent Citations <http://nber.org/patents/>`_
* `Network Repository with Interactive Exploratory Analysis Tools <http://networkrepository.com/>`_
* `NIST complex networks data collection <http://math.nist.gov/~RPozo/complex_datasets.html>`_
* `Protein-protein interaction network <http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm>`_
* `PyPI and Maven Dependency Network <https://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/>`_
* `Scopus Citation Database <https://www.elsevier.com/solutions/scopus>`_
* `Small Network Data <http://www-personal.umich.edu/~mejn/netdata/>`_
* `Stanford GraphBase (Steven Skiena) <http://www3.cs.stonybrook.edu/~algorith/implement/graphbase/implement.shtml>`_
* `Stanford Large Network Dataset Collection <http://snap.stanford.edu/data/>`_
* `Stanford Longitudinal Network Data Sources <http://stanford.edu/group/sonia/dataSources/index.html>`_
* `The Koblenz Network Collection <http://konect.uni-koblenz.de/>`_
* `The Laboratory for Web Algorithmics (UNIMI) <http://law.di.unimi.it/datasets.php>`_
* `The Nexus Network Repository <http://nexus.igraph.org/>`_
* `UCI Network Data Repository <https://networkdata.ics.uci.edu/resources.php>`_
* `UFL sparse matrix collection <http://www.cise.ufl.edu/research/sparse/matrices/>`_
* `WSU Graph Database <http://www.eecs.wsu.edu/mgd/gdb.html>`_
* `DIMACS Road Networks Collection <http://www.dis.uniroma1.it/challenge9/download.shtml>`_

Computer Networks
-----------------

* `3.5B Web Pages from CommonCrawl 2012 <http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us>`_
* `53.5B Web clicks of 100K users in Indiana Univ. <http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/>`_
* `CAIDA Internet Datasets <http://www.caida.org/data/overview/>`_
* `ClueWeb09 - 1B web pages <http://lemurproject.org/clueweb09/>`_
* `ClueWeb12 - 733M web pages <http://lemurproject.org/clueweb12/>`_
* `CommonCrawl Web Data over 7 years <http://commoncrawl.org/the-data/get-started/>`_
* `CRAWDAD Wireless datasets from Dartmouth Univ. <https://crawdad.cs.dartmouth.edu/>`_
* `Criteo click-through data <http://labs.criteo.com/2015/03/criteo-releases-its-new-dataset/>`_
* `Open Mobile Data by MobiPerf <https://console.developers.google.com/storage/openmobiledata_public/>`_
* `Rapid7 Sonar Internet Scans <https://sonar.labs.rapid7.com/>`_
* `UCSD Network Telescope, IPv4 /8 net <http://www.caida.org/projects/network_telescope/>`_

Contextual Data
---------------

* `Context-aware data sets from five domains <http://students.depaul.edu/~yzheng8/DataSets.html#Data>`_ or `GitHub <https://github.com/irecsys/CARSKit/tree/master/context-aware_data_sets>`_

Data Challenges
---------------

* `Challenges in Machine Learning <http://www.chalearn.org/>`_
* `CrowdANALYTIX dataX <http://data.crowdanalytix.com>`_
* `D4D Challenge of Orange <http://www.d4d.orange.com/en/home>`_
* `DrivenData Competitions for Social Good <http://www.drivendata.org/>`_
* `ICWSM Data Challenge (since 2009) <http://icwsm.cs.umbc.edu/>`_
* `Kaggle Competition Data <https://www.kaggle.com/>`_
* `KDD Cup by Tencent 2012 <http://www.kddcup2012.org/>`_
* `Localytics Data Visualization Challenge <https://github.com/localytics/data-viz-challenge>`_
* `Netflix Prize <http://www.netflixprize.com/leaderboard>`_
* `Space Apps Challenge <https://2015.spaceappschallenge.org>`_
* `Telecom Italia Big Data Challenge <https://dandelion.eu/datamine/open-big-data/>`_
* `Yelp Dataset Challenge <http://www.yelp.com/dataset_challenge>`_
* `Bruteforce Database <https://github.com/duyetdev/bruteforce-database>`_

Economics
---------

* `American Economic Association (AEA) <https://www.aeaweb.org/RFE/toc.php?show=complete>`_
* `EconData from UMD <http://inforumweb.umd.edu/econdata/econdata.html>`_
* `Economic Freedom of the World Data <http://www.freetheworld.com/datasets_efw.html>`_
* `Historical MacroEconomic Statistics <http://www.historicalstatistics.org/>`_
* `International Trade Statistics <http://www.econostatistics.co.za/>`_
* `Internet Product Code Database <http://www.upcdatabase.com/>`_
* `Joint External Debt Data Hub <http://www.jedh.org/>`_
* `Jon Haveman International Trade Data Links <http://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeData.html>`_
* `OpenCorporates Database of Companies in the World <https://opencorporates.com/>`_
* `Our World in Data <http://ourworldindata.org/>`_
* `SciencesPo World Trade Gravity Datasets <http://econ.sciences-po.fr/thierry-mayer/data>`_
* `The Atlas of Economic Complexity <http://atlas.cid.harvard.edu>`_
* `The Center for International Data <http://cid.econ.ucdavis.edu>`_
* `The Observatory of Economic Complexity <http://atlas.media.mit.edu/en/>`_
* `UN Commodity Trade Statistics <http://comtrade.un.org/db/>`_
* `UN Human Development Reports <http://hdr.undp.org/en>`_

Education
---------

* `Student Data from Free Code Camp <http://academictorrents.com/details/030b10dad0846b5aecc3905692890fb02404adbf>`_

Energy
------

* `AMPds <http://ampds.org/>`_
* `BLUED <http://nilm.cmubi.org/>`_
* `COMBED <http://combed.github.io/>`_
* `Dataport <https://dataport.pecanstreet.org/>`_
* `ECO <http://www.vs.inf.ethz.ch/res/show.html?what=eco-data>`_
* `EIA <http://www.eia.gov/electricity/data/eia923/>`_
* `HFED <http://hfed.github.io/>`_
* `iAWE <http://iawe.github.io/>`_
* `PLAID <http://plaidplug.com/>`_
* `REDD <http://redd.csail.mit.edu/>`_
* `UK-DALE <http://www.doc.ic.ac.uk/~dk3810/data/>`_

Finance
-------

* `CBOE Futures Exchange <http://cfe.cboe.com/Data/>`_
* `Google Finance <https://www.google.com/finance>`_
* `Google Trends <http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0>`_
* `NASDAQ <https://data.nasdaq.com/>`_
* `OANDA <http://www.oanda.com/>`_
* `OSU Financial data <http://fisher.osu.edu/fin/fdf/osudata.htm>`_
* `Quandl <https://www.quandl.com/>`_
* `St. Louis Federal Reserve (FRED) <https://research.stlouisfed.org/fred2/>`_
* `Yahoo Finance <http://finance.yahoo.com/>`_
* `NYSE Market Data <ftp://ftp.nyxdata.com>`_

Geology
-------

* `Earth Models <http://www.earthmodels.org/>`_
* `Smithsonian Institution Global Volcano and Eruption Database <http://volcano.si.edu/>`_
* `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_

GIS/Environment
---------------

* `BODC - marine data of ~22K vars <http://www.bodc.ac.uk/data/where_to_find_data/>`_
* `Cambridge, MA, US, GIS data on GitHub <http://cambridgegis.github.io/gisdata.html>`_
* `EOSDIS - NASA's earth observing system data <http://sedac.ciesin.columbia.edu/data/sets/browse>`_
* `Factual Global Location Data <https://www.factual.com/>`_
* `Geo Spatial Data from ASU <http://geodacenter.asu.edu/datalist/>`_
* `Geo Wiki Project - Citizen-driven Environmental Monitoring <http://geo-wiki.org/>`_
* `GeoFabrik - OSM data extracted to a variety of formats and areas <http://download.geofabrik.de/>`_
* `GeoNames Worldwide <http://www.geonames.org/>`_
* `Global Administrative Areas Database (GADM) <http://www.gadm.org/>`_
* `Homeland Infrastructure Foundation-Level Data <https://hifld-dhs-gii.opendata.arcgis.com/>`_
* `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements <https://imos.aodn.org.au>`_ or `on S3 <http://imos-data.s3-website-ap-southeast-2.amazonaws.com/>`_
* `International Institute for Applied Systems Analysis - GIS Datasets <http://www.iiasa.ac.at/web/home/research/modelsData/Models--Tools--Data.en.html>`_
* `Landsat 8 on AWS <https://aws.amazon.com/public-data-sets/landsat/>`_
* `List of all countries in all languages <https://github.com/umpirsky/country-list>`_
* `Marinexplore - Open Oceanographic Data <http://marinexplore.org/>`_
* `National Weather Service GIS Data Portal <http://www.nws.noaa.gov/gis/>`_
* `Natural Earth - vectors and rasters of the world <http://www.naturalearthdata.com/>`_
* `OpenAddresses <http://openaddresses.io/>`_
* `OpenStreetMap (OSM) <http://wiki.openstreetmap.org/wiki/Downloading_data>`_
* `Pleiades - Gazetteer and graph of ancient places <http://pleiades.stoa.org/>`_
* `Reverse Geocoder using OSM data <https://github.com/kno10/reversegeocode>`_ & `additional high-resolution data files <http://data.ub.uni-muenchen.de/61/>`_
* `TIGER/Line - U.S. boundaries and roads <http://www.census.gov/geo/maps-data/data/tiger-line.html>`_
* `TwoFishes - Foursquare's coarse geocoder <https://github.com/foursquare/twofishes>`_
* `TZ Timezones shapefiles <http://efele.net/maps/tz/world/>`_
* `UN Environmental Data <http://geodata.grid.unep.ch/>`_
* `World boundaries from the U.S. Department of State <https://hiu.state.gov/data/data.aspx>`_
* `World countries in multiple formats <https://github.com/mledoze/countries>`_

Government
----------

* `OpenDataSoft's list of 1,600 open data portals <https://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/>`_
* `A list of cities and countries contributed by the community <https://github.com/caesar0301/awesome-public-datasets/blob/master/Government.rst>`_

Healthcare
----------

* `EHDP Large Health Data Sets <http://www.ehdp.com/vitalnet/datasets.htm>`_
* `Gapminder World demographic databases <http://www.gapminder.org/data/>`_
* `Medicare Coverage Database (MCD), U.S. <https://www.cms.gov/medicare-coverage-database/>`_
* `Medicare Data Engine of medicare.gov Data <https://data.medicare.gov/>`_
* `Medicare Data File <http://go.cms.gov/19xxPN4>`_
* `MeSH, the vocabulary thesaurus used for indexing articles for PubMed <https://www.nlm.nih.gov/mesh/filelist.html>`_
* `Number of Ebola Cases and Deaths in Affected Countries (2014) <https://data.hdx.rwlabs.org/dataset/ebola-cases-2014>`_
* `Open-ODS (structure of the UK NHS) <http://www.openods.co.uk>`_
* `OpenPaymentsData, Healthcare financial relationship data <https://openpaymentsdata.cms.gov>`_
* `The Cancer Genome Atlas project (TCGA) <https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp>`_ and `BigQuery table <http://google-genomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html>`_
* `World Health Organization Global Health Observatory <http://www.who.int/gho/en/>`_

Image Processing
----------------

* `10k US Adult Faces Database <http://wilmabainbridge.com/facememorability2.html>`_
* `2GB of Photos of Cats <http://137.189.35.203/WebUI/CatDatabase/catData.html>`_ or `Archive version <https://web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html>`_
* `Affective Image Classification <http://www.imageemotion.org/>`_
* `Animals with attributes <http://attributes.kyb.tuebingen.mpg.de/>`_
* `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
* `ImageNet (in WordNet hierarchy) <http://www.image-net.org/>`_
* `Indoor Scene Recognition <http://web.mit.edu/torralba/www/indoor.html>`_
* `International Affective Picture System, UFL <http://csea.phhp.ufl.edu/media/iapsmessage.html>`_
* `Massive Visual Memory Stimuli, MIT <http://cvcl.mit.edu/MM/stimuli.html>`_
* `Several Shape-from-Silhouette Datasets <http://kaiwolf.no-ip.org/3d-model-repository.html>`_
* `Stanford Dogs Dataset <http://vision.stanford.edu/aditya86/ImageNetDogs/>`_
* `SUN database, MIT <http://groups.csail.mit.edu/vision/SUN/hierarchy.html>`_
* `The Oxford-IIIT Pet Dataset <http://www.robots.ox.ac.uk/~vgg/data/pets/>`_
* `YouTube Faces Database <http://www.cs.tau.ac.il/~wolf/ytfaces/>`_
* `Adience Unfiltered faces for gender and age classification <http://www.openu.ac.il/home/hassner/Adience/data.html>`_
* `The Action Similarity Labeling (ASLAN) Challenge <http://www.openu.ac.il/home/hassner/data/ASLAN/ASLAN.html>`_
* `Violent-Flows - Crowd Violence / Non-violence Database and benchmark <http://www.openu.ac.il/home/hassner/data/violentflows/>`_

Machine Learning
----------------

* `Delve Datasets for classification and regression (Univ. of Toronto) <http://www.cs.toronto.edu/~delve/data/datasets.html>`_
* `Discogs Monthly Data <http://data.discogs.com/>`_
* `eBay Online Auctions (2012) <http://www.modelingonlineauctions.com/datasets>`_
* `IMDb Database <http://www.imdb.com/interfaces>`_
* `Keel Repository for classification, regression and time series <http://sci2s.ugr.es/keel/datasets.php>`_
* `Labeled Faces in the Wild (LFW) <http://vis-www.cs.umass.edu/lfw/>`_
* `Lending Club Loan Data <https://www.lendingclub.com/info/download-data.action>`_
* `Machine Learning Data Set Repository <http://mldata.org/>`_
* `Million Song Dataset <http://labrosa.ee.columbia.edu/millionsong/>`_
* `More Song Datasets <http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets>`_
* `MovieLens Data Sets <http://grouplens.org/datasets/movielens/>`_
* `RDataMining - "R and Data Mining" ebook data <http://www.rdatamining.com/data>`_
* `Registered Meteorites on Earth <http://healthintelligence.drupalgardens.com/content/registered-meteorites-has-impacted-earth-visualized>`_
* `Restaurants Health Score Data in San Francisco <http://missionlocal.org/san-francisco-restaurant-health-inspections/>`_
* `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml/>`_
* `Yahoo! Ratings and Classification Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=r>`_

Museums
-------

* `Canada Science and Technology Museums Corporation's Open Data <http://techno-science.ca/en/data.php>`_
* `Cooper-Hewitt's Collection Database <https://github.com/cooperhewitt/collection>`_
* `Minneapolis Institute of Arts metadata <https://github.com/artsmia/collection>`_
* `Natural History Museum (London) Data Portal <http://data.nhm.ac.uk/>`_
* `Rijksmuseum Historical Art Collection <https://www.rijksmuseum.nl/en/api>`_
* `Tate Collection metadata <https://github.com/tategallery/collection>`_
* `The Getty vocabularies <http://vocab.getty.edu>`_

Natural Language
----------------

* `Blogger Corpus <http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm>`_
* `CLiPS Stylometry Investigation Corpus <http://www.clips.uantwerpen.be/datasets/csi-corpus>`_
* `ClueWeb09 FACC <http://lemurproject.org/clueweb09/FACC1/>`_
* `ClueWeb12 FACC <http://lemurproject.org/clueweb12/FACC1/>`_
* `DBpedia - 4.58M things with 583M facts <http://wiki.dbpedia.org/Datasets>`_
* `Flickr Personal Taxonomies <http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html>`_
* `Freebase.com of people, places, and things <http://www.freebase.com/>`_
* `Google Books Ngrams (2.2TB) <https://aws.amazon.com/datasets/google-books-ngrams/>`_
* `Google Web 5gram (1TB, 2006) <https://catalog.ldc.upenn.edu/LDC2006T13>`_
* `Gutenberg eBooks List <http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs>`_
* `Hansards text chunks of Canadian Parliament <http://www.isi.edu/natural-language/download/hansard/>`_
* `Machine Comprehension Test (MCTest) of text from Microsoft Research <http://research.microsoft.com/en-us/um/redmond/projects/mctest/index.html>`_
* `Machine Translation of European languages <http://statmt.org/wmt11/translation-task.html#download>`_
* `Personae Corpus <http://www.clips.uantwerpen.be/datasets/personae-corpus>`_
* `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) <https://github.com/ParallelMazen/SaudiNewsNet>`_
* `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `USENET postings corpus of 2005~2011 <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
* `Wikidata - Wikipedia databases <https://www.wikidata.org/wiki/Wikidata:Database_download>`_
* `Wikipedia Links data - 40 Million Entities in Context <https://code.google.com/p/wiki-links/downloads/list>`_
* `WordNet databases and tools <http://wordnet.princeton.edu/wordnet/download/>`_

Physics
-------

* `CERN Open Data Portal <http://opendata.cern.ch/>`_
* `Crystallography Open Database <http://www.crystallography.net/>`_
* `NASA Exoplanet Archive <http://exoplanetarchive.ipac.caltech.edu/>`_
* `NSSDC (NASA) data of 550 spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_
* `Sloan Digital Sky Survey (SDSS) - Mapping the Universe <http://www.sdss.org/>`_

Psychology/Cognition
--------------------

* `OSU Cognitive Modeling Repository Datasets <http://www.cmr.osu.edu/browse/datasets>`_

Public Domains
--------------

* `Amazon <http://aws.amazon.com/datasets/>`_
* `Archive-it from Internet Archive <https://www.archive-it.org/explore?show=Collections>`_
* `Archive.org Datasets <https://archive.org/details/datasets>`_
* `CMU JASA data archive <http://lib.stat.cmu.edu/jasadata/>`_
* `CMU StatLab collections <http://lib.stat.cmu.edu/datasets/>`_
* `Data360 <http://www.data360.org/index.aspx>`_
* `Datamob.org <http://datamob.org/datasets>`_
* `Google <http://www.google.com/publicdata/directory>`_
* `Infochimps <http://www.infochimps.com/>`_
* `KDNuggets Data Collections <http://www.kdnuggets.com/datasets/index.html>`_
* `Microsoft Azure Data Market Free DataSets <http://datamarket.azure.com/browse/data?price=free>`_
* `Numbrary <http://numbrary.com/>`_
* `Open Library Data Dumps <https://openlibrary.org/developers/dumps>`_
* `Reddit Datasets <https://www.reddit.com/r/datasets>`_
* `RevolutionAnalytics Collection <http://packages.revolutionanalytics.com/datasets/>`_
* `Sample R data sets <http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html>`_
* `Stats4Stem R data sets <http://www.stats4stem.org/data-sets.html>`_
* `StatSci.org <http://www.statsci.org/datasets.html>`_
* `The Washington Post List <http://www.washingtonpost.com/wp-srv/metro/data/datapost.html>`_
* `UCLA SOCR data collection <http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data>`_
* `UFO Reports <http://www.nuforc.org/webreports.html>`_
* `Wikileaks 911 pager intercepts <https://911.wikileaks.org/files/index.html>`_
* `Yahoo Webscope <http://webscope.sandbox.yahoo.com/catalog.php>`_

2014-11-21 17:10:09 +08:00
2014-12-26 22:12:33 +08:00
Search Engines
--------------
2015-01-31 17:18:37 +08:00
* `Academic Torrents of data sharing from UMB <http://academictorrents.com/> `_
2015-11-21 01:15:47 +08:00
* `Datahub.io <https://datahub.io/dataset> `_
2015-01-06 16:17:58 +08:00
* `DataMarket (Qlik) <https://datamarket.com/data/list/?q=all> `_
2015-11-21 01:15:47 +08:00
* `Harvard Dataverse Network of scientific data <https://dataverse.harvard.edu/> `_
2015-01-31 17:18:37 +08:00
* `ICPSR (UMICH) <http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp> `_
2016-01-02 20:23:00 +08:00
* `Institute of Education Sciences <http://eric.ed.gov> `_
2016-01-17 20:22:58 +08:00
* `National Technical Reports Library <http://www.ntis.gov/products/ntrl/> `_
2015-11-21 01:15:47 +08:00
* `Open Data Certificates (beta) <https://certificates.theodi.org/en/datasets> `_
2016-01-02 20:23:00 +08:00
* `OpenDataNetwork - A search engine of all Socrata powered data portals <http://www.opendatanetwork.com/> `_
2015-01-31 17:18:37 +08:00
* `Statista.com - statistics and Studies <http://www.statista.com/> `_
* `Zenodo - An open dependable home for the long-tail of science <https://zenodo.org/collection/datasets>`_

Social Networks
---------------

* `72 hours #gamergate Twitter Scrape <http://waxy.org/random/misc/gamergate_tweets.csv>`_
* `Ancestry.com Forum Dataset over 10 years <http://www.cs.cmu.edu/~jelsas/data/ancestry.com/>`_
* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape <https://archive.org/details/twitter_cikm_2010>`_
* `CMU Enron Email of 150 users <http://www.cs.cmu.edu/~enron/>`_
* `EDRM Enron Email of 151 users, hosted on S3 <https://aws.amazon.com/datasets/enron-email-data/>`_
* `Facebook Data Scrape (2005) <https://archive.org/details/oxford-2005-facebook-matrix>`_
* `Facebook Social Networks from LAW (since 2007) <http://law.di.unimi.it/datasets.php>`_
* `Foursquare from UMN/Sarwat (2013) <https://archive.org/details/201309_foursquare_dataset_umn>`_
* `GetGlue - users rating TV shows <http://getglue-data.s3.amazonaws.com/getglue_sample.tar.gz>`_
* `GitHub Collaboration Archive <https://www.githubarchive.org/>`_
* `Google Scholar citation relations <http://www3.cs.stonybrook.edu/~leman/data/gscholar.db>`_
* `High-Resolution Contact Networks from Wearable Sensors <http://www.sociopatterns.org/datasets/>`_
* `Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `Network Twitter Data <http://snap.stanford.edu/data/higgs-twitter.html>`_
* `Reddit Comments <https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/>`_
* `Skytrax' Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>`_
* `Social Twitter Data <http://snap.stanford.edu/data/egonets-Twitter.html>`_
* `SourceForge.net Research Data <http://www3.nd.edu/~oss/Data/data.html>`_
* `Twitter Data for Sentiment Analysis <http://help.sentiment140.com/for-students/>`_
* `Twitter Data for Online Reputation Management <http://nlp.uned.es/replab2013/>`_
* `Twitter Graph of entire Twitter site <http://an.kaist.ac.kr/traces/WWW2010.html>`_
* `Twitter Scrape Calufa May 2011 <http://archive.org/details/2011-05-calufa-twitter-sql>`_
* `UNIMI/LAW Social Network Datasets <http://law.di.unimi.it/datasets.php>`_
* `Yahoo! Graph and Social Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=g>`_
* `YouTube Video Social Graph in 2007, 2008 <http://netsg.cs.sfu.ca/youtubedata/>`_

Social Sciences
---------------

* `ACLED (Armed Conflict Location & Event Data Project) <http://www.acleddata.com/>`_
* `Canadian Legal Information Institute <https://www.canlii.org/en/index.php>`_
* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc. <http://www.systemicpeace.org/>`_
* `Correlates of War Project <http://www.correlatesofwar.org/>`_
* `Cryptome Conspiracy Theory Items <http://cryptome.org>`_
* `Datacards <http://datacards.org>`_
* `European Social Survey <http://www.europeansocialsurvey.org/data/>`_
* `FBI Hate Crime 2013 - aggregated data <https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013>`_
* `GDELT Global Events Database <http://gdeltproject.org/data.html>`_
* `General Social Survey (GSS) since 1972 <http://gss.norc.org>`_
* `German Social Survey <http://www.gesis.org/en/home/>`_
* `Global Religious Futures Project <http://www.globalreligiousfutures.org/>`_
* `Humanitarian Data Exchange <https://data.hdx.rwlabs.org/>`_
* `Institute for Demographic Studies <http://www.ined.fr/en/>`_
* `International Networks Archive <http://www.princeton.edu/~ina/>`_
* `International Social Survey Program (ISSP) <http://www.issp.org>`_
* `International Studies Compendium Project <http://www.isacompendium.com/public/>`_
* `James McGuire Cross National Data <http://jmcguire.faculty.wesleyan.edu/welcome/cross-national-data/>`_
* `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste <http://nsd.uib.no>`_
* `MIT Reality Mining Dataset <http://realitycommons.media.mit.edu/realitymining.html>`_
* `Open Crime and Policing Data in England, Wales and Northern Ireland <https://data.police.uk/data/>`_
* `Paul Hensel General International Data Page <http://www.paulhensel.org/dataintl.html>`_
* `PewResearch Internet Survey Project <http://www.pewinternet.org/datasets/pages/2/>`_
* `PewResearch Society Data Collection <http://www.pewresearch.org/data/download-datasets/>`_
* `Political Polarity Data <http://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>`_
* `StackExchange Data Explorer <http://data.stackexchange.com/help>`_
* `Terrorism Research and Analysis Consortium <http://www.trackingterrorism.org/>`_
* `Texas Inmates Executed Since 1984 <http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html>`_
* `Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>`_
* `UCB's Archive of Social Science Data (D-Lab) <http://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive <http://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
* `UN Civil Society Database <http://esango.un.org/civilsociety/>`_
* `Universities Worldwide <http://univ.cc/>`_
* `UPJOHN for Labor Employment Research <http://www.upjohn.org/services/resources/employment-research-data-center>`_
* `World Bank Data <http://data.worldbank.org/>`_
* `WorldPop project - Worldwide human population distributions <http://www.worldpop.org.uk/data/get_data/>`_

Software
--------

* `FLOSSmole data about free, libre, and open source software development <http://flossdata.syr.edu/data/>`_

Sports
------

* `Basketball (NBA/NCAA/Euro) Player Database and Statistics <http://www.draftexpress.com/stats.php>`_
* `Betfair Historical Exchange Data <http://data.betfair.com/>`_
* `Cricsheet Matches (cricket) <http://cricsheet.org/>`_
* `Ergast Formula 1, from 1950 up to date (API) <http://ergast.com/mrd/db>`_
* `Football/Soccer resources (data and APIs) <http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/>`_
* `Lahman's Baseball Database <http://www.seanlahman.com/baseball-archive/statistics/>`_
* `Pinhooker: Thoroughbred Bloodstock Sale Data <https://github.com/phillc73/pinhooker>`_
* `Retrosheet Baseball Statistics <http://www.retrosheet.org/game.htm>`_

Time Series
-----------

* `Databanks International Cross National Time Series Data Archive <http://www.cntsdata.com>`_
* `Hard Drive Failure Rates <https://www.backblaze.com/hard-drive-test-data.html>`_
* `Heart Rate Time Series from MIT <http://ecg.mit.edu/time-series/>`_
* `Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series Dataset <http://www.cs.ucr.edu/~eamonn/time_series_data/>`_

Transportation
--------------

* `Airlines OD Data 1987-2008 <http://stat-computing.org/dataexpo/2009/the-data.html>`_
* `Bay Area Bike Share Data <http://www.bayareabikeshare.com/open-data>`_
* `Bike Share Systems (BSS) collection <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>`_
* `GeoLife GPS Trajectory from Microsoft Research <http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/>`_
* `German train system by Deutsche Bahn <http://data.deutschebahn.com/datasets/>`_
* `Hubway Million Rides in MA <http://hubwaydatachallenge.org/trip-history-data/>`_
* `Marine Traffic - ship tracks, port calls and more <http://www.marinetraffic.com/de/ais-api-services>`_
* `Montreal BIXI Bike Share <https://montreal.bixi.com/donn%C3%A9es-libre-service>`_
* `NYC Taxi Trip Data 2009- <http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml>`_
* `NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>`_
* `NYC Uber trip data April 2014 to September 2014 <https://github.com/fivethirtyeight/uber-tlc-foil-response>`_
* `Open Traffic collection <https://github.com/graphhopper/open-traffic-collection>`_
* `OpenFlights - airport, airline and route data <http://openflights.org/data.html>`_
* `Philadelphia Bike Share Stations (JSON) <https://www.rideindego.com/stations/json/>`_
* `Plane Crash Database, since 1920 <http://www.planecrashinfo.com/database.htm>`_
* `RITA Airline On-Time Performance data <http://www.transtats.bts.gov/Tables.asp?DB_ID=120>`_
* `RITA/BTS transport data collection (TranStats) <http://www.transtats.bts.gov/DataIndex.asp>`_
* `Toronto Bike Share Stations (XML file) <http://www.bikesharetoronto.com/data/stations/bikeStations.xml>`_
* `Transport for London (TFL) <https://tfl.gov.uk/info-for/open-data-users/our-feeds>`_
* `Travel Tracker Survey (TTS) for Chicago <http://www.cmap.illinois.gov/data/transportation/travel-tracker-survey>`_
* `U.S. Bureau of Transportation Statistics (BTS) <http://www.rita.dot.gov/bts/>`_
* `U.S. Domestic Flights 1990 to 2009 <http://academictorrents.com/details/a2ccf94bbb4af222bf8e69dad60a68a29f310d9a>`_
* `U.S. Freight Analysis Framework since 2007 <http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm>`_

Complementary Collections
-------------------------

* `Data Packaged Core Datasets <https://github.com/datasets/>`_
* `Database of Scientific Code Contributions <https://mozillascience.org/collaborate>`_
* DataWrangling: `Some Datasets Available on the Web <http://www.datawrangling.com/some-datasets-available-on-the-web>`_
* Inside-r: `Finding Data on the Internet <http://www.inside-r.org/howto/finding-data-internet>`_
* OpenDataMonitor: `An overview of available open data resources in Europe <http://opendatamonitor.eu>`_
* Quora: `Where can I find large datasets open to the public? <http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>`_
* RS.io: `100+ Interesting Data Sets for Statistics <http://rs.io/100-interesting-data-sets-for-statistics/>`_
* StaTrek: `Leveraging open data to understand urban lives <http://xiaming.me/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>`_