Clear format.
Xiaming Chen 2015-12-08 13:23:43 +08:00
parent c716fe4040
commit bf5e282f43

@@ -27,12 +27,11 @@ Biology
* `1000 Genomes <http://www.1000genomes.org/data>`_
* `American Gut (Microbiome Project) <https://github.com/biocore/American-Gut>`_
* `Collaborative Research in Computational Neuroscience (CRCNS) <http://crcns.org/data-sets>`_
* `EBI ArrayExpress <http://www.ebi.ac.uk/arrayexpress/>`_
* `ENCODE project <https://www.encodeproject.org>`_
* `Gene Expression Omnibus (GEO) <http://www.ncbi.nlm.nih.gov/geo/>`_
* `Gene Ontology (GO) <http://geneontology.org/page/download-annotations>`_
* `Global Biotic Interactions (GloBI) <https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data>`_
* `Sequence Read Archive (SRA) <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_
* `EBI ArrayExpress <http://www.ebi.ac.uk/arrayexpress/>`_
* `ENCODE project <https://www.encodeproject.org>`_
* `Human Microbiome Project (HMP) <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_
* `ICOS PSP Benchmark <http://ico2s.org/datasets/psp_benchmark.html>`_
* `MIT Cancer Genomics Data <http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
@@ -42,11 +41,12 @@ Biology
* `Protein Data Bank <http://www.rcsb.org/>`_
* `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
* `PubGene (now Coremine Medical) <http://www.pubgene.org/>`_
* `Sequence Read Archive (SRA) <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_
* `Stanford Microarray Data <http://smd.stanford.edu/>`_
* `The Catalogue of Life <http://www.catalogueoflife.org/content/annual-checklist-archive>`_
* `The Personal Genome Project <http://www.personalgenomes.org/>`_ or `PGP <https://my.pgp-hms.org/public_genetic_data>`_
* `UCSC Public Data <http://hgdownload.soe.ucsc.edu/downloads.html>`_
* `UniGene <http://www.ncbi.nlm.nih.gov/unigene>`_
* `The Catalogue of Life <http://www.catalogueoflife.org/content/annual-checklist-archive>`_
Climate/Weather
@@ -62,8 +62,8 @@ Climate/Weather
* `NOAA Climate Datasets <http://www.ncdc.noaa.gov/data-access/quick-links>`_
* `NOAA Realtime Weather Models <http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction>`_
* `The World Bank Open Data Resources for Climate Change <http://data.worldbank.org/developers/climate-data-api>`_
* `WorldClim - Global Climate Data <http://www.worldclim.org>`_
* `UEA Climatic Research Unit <http://www.cru.uea.ac.uk/data>`_
* `WorldClim - Global Climate Data <http://www.worldclim.org>`_
* `WU Historical Weather Worldwide <http://www.wunderground.com/history/index.html>`_
@@ -114,8 +114,8 @@ Data Challenges
---------------
* `Challenges in Machine Learning <http://www.chalearn.org/>`_
* `D4D Challenge of Orange <http://www.d4d.orange.com/en/home>`_
* `CrowdANALYTIX dataX <http://data.crowdanalytix.com>`_
* `D4D Challenge of Orange <http://www.d4d.orange.com/en/home>`_
* `DrivenData Competitions for Social Good <http://www.drivendata.org/>`_
* `ICWSM Data Challenge (since 2009) <http://icwsm.cs.umbc.edu/>`_
* `Kaggle Competition Data <https://www.kaggle.com/>`_
@@ -166,8 +166,9 @@ Finance
Geology
-------
* `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_
* `Smithsonian Institution Global Volcano and Eruption Database <http://volcano.si.edu/>`_
* `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_
GeoSpace/GIS
@@ -181,14 +182,14 @@ GeoSpace/GIS
* `GeoNames Worldwide <http://www.geonames.org/>`_
* `Global Administrative Areas Database (GADM) <http://www.gadm.org/>`_
* `Landsat 8 on AWS <https://aws.amazon.com/public-data-sets/landsat/>`_
* `List of all countries in all languages <https://github.com/umpirsky/country-list>`_
* `Natural Earth - vectors and rasters of the world <http://www.naturalearthdata.com/>`_
* `OpenAddresses <http://openaddresses.io/>`_
* `OpenStreetMap (OSM) <http://wiki.openstreetmap.org/wiki/Downloading_data>`_
* `TIGER/Line - U.S. boundaries and roads <http://www.census.gov/geo/maps-data/data/tiger-line.html>`_
* `TwoFishes - Foursquare's coarse geocoder <https://github.com/foursquare/twofishes>`_
* `TZ Timezones shapefiles <http://efele.net/maps/tz/world/>`_
* `World countries in multiple formats <https://github.com/mledoze/countries>`_
* `List of all countries in all languages <https://github.com/umpirsky/country-list>`_
* `OpenAddresses <http://openaddresses.io/>`_
Government
@@ -232,6 +233,7 @@ Government
* `Open Government Data (OGD) Platform India <https://data.gov.in/>`_
* `Oregon <https://data.oregon.gov/>`_
* `Portland, Oregon <http://www.portlandonline.com/28130/>`_
* `Puerto Rico Government <https://data.pr.gov//>`_
* `Rio de Janeiro, Brazil <http://data.rio.rj.gov.br/>`_
* `Romania <http://data.gov.ro/>`_
* `Russia <http://data.gov.ru>`_
@@ -240,22 +242,21 @@ Government
* `Singapore Government Data <https://data.gov.sg/>`_
* `South Africa <http://beta2.statssa.gov.za/>`_
* `Switzerland <http://www.opendata.admin.ch/>`_
* `The World Bank <http://wdronline.worldbank.org/>`_
* `Texas Open Data <https://data.texas.gov/>`_
* `Puerto Rico Government <https://data.pr.gov//>`_
* `The World Bank <http://wdronline.worldbank.org/>`_
* `U.K. Government Data <http://data.gov.uk/data>`_
* `Uruguay <https://catalogodatos.gub.uy/>`_
* `U.S. American Community Survey <http://www.census.gov/acs/www/data_documentation/data_release_info/>`_
* `U.S. CDC Public Health datasets <http://www.cdc.gov/nchs/data_access/ftp_data.htm>`_
* `U.S. Census Bureau <http://www.census.gov/data.html>`_
* `U.S. National Center for Education Statistics (NCES) <http://nces.ed.gov/>`_
* `U.S. Department of Housing and Urban Development (HUD) <http://www.huduser.gov/portal/datasets/pdrdatas.html>`_
* `U.S. Federal Government Agencies <http://www.data.gov/metrics>`_
* `U.S. Federal Government Data Catalog <http://catalog.data.gov/dataset>`_
* `U.S. Food and Drug Administration (FDA) <https://open.fda.gov/index.html>`_
* `U.S. National Center for Education Statistics (NCES) <http://nces.ed.gov/>`_
* `U.S. Open Government <http://www.data.gov/open-gov/>`_
* `UK 2011 Census Open Atlas Project <http://www.alex-singleton.com/r/2013/02/05/2011-census-open-atlas-project/>`_
* `United Nations <http://data.un.org/>`_
* `Uruguay <https://catalogodatos.gub.uy/>`_
* `Vancouver, BC Open Data Catalog <http://data.vancouver.ca/datacatalogue/>`_
@@ -270,6 +271,7 @@ Healthcare
* `MeSH, the vocabulary thesaurus used for indexing articles for PubMed <https://www.nlm.nih.gov/mesh/filelist.html>`_
* `Number of Ebola Cases and Deaths in Affected Countries (2014) <https://data.hdx.rwlabs.org/dataset/ebola-cases-2014>`_
* `Open-ODS (structure of the UK NHS) <http://www.openods.co.uk>`_
* `The Cancer Genome Atlas project (TCGA) <https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp>`_ and `BigQuery table <http://google-genomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html>`_
Image Processing
@@ -277,17 +279,17 @@ Image Processing
* `10k US Adult Faces Database <http://wilmabainbridge.com/facememorability2.html>`_
* `2GB of Photos of Cats (Original down - 20 Aug 2015) <http://137.189.35.203/WebUI/CatDatabase/catData.html>`_ or `Archive version <https://web.archive.org/web/20150520175645/http://137.189.35.203/WebUI/CatDatabase/catData.html>`_
* `Stanford Dogs Dataset <http://vision.stanford.edu/aditya86/ImageNetDogs/>`_
* `The Oxford-IIIT Pet Dataset <http://www.robots.ox.ac.uk/~vgg/data/pets/>`_
* `Animals with attributes <http://attributes.kyb.tuebingen.mpg.de/>`_
* `Affective Image Classification <http://www.imageemotion.org/>`_
* `Animals with attributes <http://attributes.kyb.tuebingen.mpg.de/>`_
* `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
* `ImageNet (in WordNet hierarchy) <http://www.image-net.org/>`_
* `Indoor Scene Recognition <http://web.mit.edu/torralba/www/indoor.html>`_
* `International Affective Picture System, UFL <http://csea.phhp.ufl.edu/media/iapsmessage.html>`_
* `Massive Visual Memory Stimuli, MIT <http://cvcl.mit.edu/MM/stimuli.html>`_
* `Stanford Dogs Dataset <http://vision.stanford.edu/aditya86/ImageNetDogs/>`_
* `SUN database, MIT <http://groups.csail.mit.edu/vision/SUN/hierarchy.html>`_
* `The Oxford-IIIT Pet Dataset <http://www.robots.ox.ac.uk/~vgg/data/pets/>`_
* `YouTube Faces Database <http://www.cs.tau.ac.il/~wolf/ytfaces/>`_
* `Indoor Scene Recognition <http://web.mit.edu/torralba/www/indoor.html>`_
Machine Learning
@@ -334,8 +336,8 @@ Natural Language
* `Gutenberg eBooks List <http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs>`_
* `Hansards text chunks of Canadian Parliament <http://www.isi.edu/natural-language/download/hansard/>`_
* `Machine Translation of European languages <http://statmt.org/wmt11/translation-task.html#download>`_
* `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) <https://github.com/ParallelMazen/SaudiNewsNet>`_
* `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `USENET postings corpus of 2005~2011 <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
* `Wikidata - Wikipedia databases <https://www.wikidata.org/wiki/Wikidata:Database_download>`_
* `Wikipedia Links data - 40 Million Entities in Context <https://code.google.com/p/wiki-links/downloads/list>`_
@@ -346,10 +348,11 @@ Physics
-------
* `CERN Open Data Portal <http://opendata.cern.ch/>`_
* `NSSDC (NASA) data of 550 spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_
* `NASA Exoplanet Archive <http://exoplanetarchive.ipac.caltech.edu/>`_
* `NSSDC (NASA) data of 550 spacecraft <http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html>`_
* `Sloan Digital Sky Survey (SDSS) - Mapping the Universe <http://www.sdss.org/>`_
Psychology/Cognition
--------------
@@ -395,6 +398,7 @@ Search Engines
* `Open Data Certificates (beta) <https://certificates.theodi.org/en/datasets>`_
* `Statista.com - statistics and Studies <http://www.statista.com/>`_
Social Networks
---------------
@@ -405,6 +409,7 @@ Social Networks
* `Social Twitter Data <http://snap.stanford.edu/data/egonets-Twitter.html>`_
* `Twitter Data for Sentiment Analysis <http://help.sentiment140.com/for-students/>`_
Social Sciences
---------------
@@ -414,19 +419,23 @@ Social Sciences
* `Facebook Data Scrape (2005) <https://archive.org/details/oxford-2005-facebook-matrix>`_
* `Facebook Social Networks from LAW (since 2007) <http://law.di.unimi.it/datasets.php>`_
* `FBI Hate Crime 2013 - aggregated data <https://github.com/emorisse/FBI-Hate-Crime-Statistics/tree/master/2013>`_
* `Foursquare Social Network in 2010, 2011 <http://www.public.asu.edu/~hgao16/dataset.html>`_
* `Foursquare from UMN/Sarwat (2013) <https://archive.org/details/201309_foursquare_dataset_umn>`_
* `Foursquare Social Network in 2010, 2011 <http://www.public.asu.edu/~hgao16/dataset.html>`_
* `GDELT Global Events Database <http://gdeltproject.org/data.html>`_
* `General Social Survey (GSS) since 1972 <http://www3.norc.org/GSS+Website/>`_
* `GetGlue - users rating TV shows <http://bit.ly/1aL8XS0>`_
* `GitHub Collaboration Archive <https://www.githubarchive.org/>`_
* `Google Scholar citation relations <http://www3.cs.stonybrook.edu/~leman/data/gscholar.db>`_
* `MIT Reality Mining Dataset <http://realitycommons.media.mit.edu/realitymining.html>`_
* `Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `PewResearch Internet Survey Project <http://www.pewinternet.org/datasets/pages/2/>`_
* `Political Polarity Data <http://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>`_
* `Reddit Comments <https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/>`_
* `Skytrax' Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>`_
* `SourceForge.net Research Data <http://www3.nd.edu/~oss/Data/data.html>`_
* `StackExchange Data Explorer <http://data.stackexchange.com/help>`_
* `Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>`_
* `Texas Inmates Executed Since 1984 <http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html>`_
* `Titanic Survival Data Set <https://github.com/caesar0301/awesome-public-datasets/tree/master/Datasets>`_
* `Twitter Graph of entire Twitter site <http://an.kaist.ac.kr/traces/WWW2010.html>`_
* `UCB's Archive of Social Science Data (D-Lab) <http://ucdata.berkeley.edu/>`_
* `UCLA Social Sciences Data Archive <http://dataarchives.ss.ucla.edu/Home.DataPortals.htm>`_
@@ -435,10 +444,6 @@ Social Sciences
* `UPJOHN for Labor Employment Research <http://www.upjohn.org/erdc/erdc.html>`_
* `Yahoo! Graph and Social Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=g>`_
* `Youtube Video Social Graph in 2007,2008 <http://netsg.cs.sfu.ca/youtubedata/>`_
* `Google Scholar citation relations <http://www3.cs.stonybrook.edu/~leman/data/gscholar.db>`_
* `Political Polarity Data <http://www3.cs.stonybrook.edu/~leman/data/14-icwsm-political-polarity-data.zip>`_
* `GDELT Global Events Database <http://gdeltproject.org/data.html>`_
* `Skytrax' Air Travel Reviews Dataset <https://github.com/quankiquanki/skytrax-reviews-dataset>`_
Sports
@@ -455,23 +460,24 @@ Sports
Time Series
-----------
* `Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series Dataset <http://www.cs.ucr.edu/~eamonn/time_series_data/>`_
* `Hard Drive Failure Rates <https://www.backblaze.com/hard-drive-test-data.html>`_
* `Heart Rate Time Series from MIT <http://ecg.mit.edu/time-series/>`_
* `Time Series Data Library (TSDL) from MU <https://datamarket.com/data/list/?q=provider:tsdl>`_
* `UC Riverside Time Series Dataset <http://www.cs.ucr.edu/~eamonn/time_series_data/>`_
Transportation
--------------
* `Airlines OD Data 1987-2008 <http://stat-computing.org/dataexpo/2009/the-data.html>`_
* `Bike Share Systems (BSS) collection <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>`_
* `Bay Area Bike Share Data <http://www.bayareabikeshare.com/open-data>`_
* `Bike Share Systems (BSS) collection <https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems>`_
* `GeoLife GPS Trajectory from Microsoft Research <http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/>`_
* `Hubway Million Rides in MA <http://hubwaydatachallenge.org/trip-history-data/>`_
* `Marine Traffic - ship tracks, port calls and more <http://www.marinetraffic.com/de/ais-api-services>`_
* `NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>`_
* `NYC Taxi Trip Data 2009- <http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml>`_
* `NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>`_
* `NYC Uber trip data April 2014 to September 2014 <https://github.com/fivethirtyeight/uber-tlc-foil-response>`_
* `OpenFlights - airport, airline and route data <http://openflights.org/data.html>`_
* `Plane Crash Database, since 1920 <http://www.planecrashinfo.com/database.htm>`_
* `RITA Airline On-Time Performance data <http://www.transtats.bts.gov/Tables.asp?DB_ID=120>`_
@@ -481,7 +487,6 @@ Transportation
* `U.S. Bureau of Transportation Statistics (BTS) <http://www.rita.dot.gov/bts/>`_
* `U.S. Domestic Flights 1990 to 2009 <http://academictorrents.com/details/a2ccf94bbb4af222bf8e69dad60a68a29f310d9a>`_
* `U.S. Freight Analysis Framework since 2007 <http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm>`_
* `NYC Uber trip data April 2014 to September 2014 <https://github.com/fivethirtyeight/uber-tlc-foil-response>`_
Complementary Collections
@@ -489,9 +494,9 @@ Complementary Collections
* DataWrangling: `Some Datasets Available on the Web <http://www.datawrangling.com/some-datasets-available-on-the-web>`_
* Inside-r: `Finding Data on the Internet <http://www.inside-r.org/howto/finding-data-internet>`_
* OpenDataMonitor: `An overview of available open data resources in Europe <http://opendatamonitor.eu>`_
* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits <http://www.opendatanetwork.com/>`_
* Quora: `Where can I find large datasets open to the public? <http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public>`_
* RS.io: `100+ Interesting Data Sets for Statistics <http://rs.io/100-interesting-data-sets-for-statistics/>`_
* StaTrek: `Leveraging open data to understand urban lives <http://xiaming.me/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/>`_
* OpenDataMonitor: `An overview of available open data resources in Europe <http://opendatamonitor.eu>`_
* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits <http://www.opendatanetwork.com/>`_
* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. <https://zenodo.org/collection/datasets>`_