From 3ba773df2de068da80e495437f1b8663a1f6939f Mon Sep 17 00:00:00 2001 From: Daniel Darabos Date: Thu, 5 Jan 2017 17:07:31 +0100 Subject: [PATCH 01/30] Fix typo. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 25186ef..6a03677 100755 --- a/README.rst +++ b/README.rst @@ -113,7 +113,7 @@ Complex Networks Computer Networks ----------------- -* `3.5B Web Pages from CommonCraw 2012 `_ +* `3.5B Web Pages from CommonCrawl 2012 `_ * `53.5B Web clicks of 100K users in Indiana Univ. `_ * `CAIDA Internet Datasets `_ * `ClueWeb09 - 1B web pages `_ From cddb768b860c18928e35b5ffc4b13cea481986e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Sun, 8 Jan 2017 14:17:45 -0500 Subject: [PATCH 02/30] Update Government.rst --- Government.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/Government.rst b/Government.rst index db7f229..85f5efd 100644 --- a/Government.rst +++ b/Government.rst @@ -96,6 +96,7 @@ Government * `U.S. Food and Drug Administration (FDA) `_ * `U.S. National Center for Education Statistics (NCES) `_ * `U.S. 
Open Government `_ +* `Uganda Bureau of Statistics `_ * `UK 2011 Census Open Atlas Project `_ * `United Nations `_ * `Uruguay `_ From 6ea30d09b4f01d27ac433062df457aabac5c66d2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Sun, 8 Jan 2017 14:23:43 -0500 Subject: [PATCH 03/30] Update README.rst --- README.rst | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 6a03677..05a5b8e 100755 --- a/README.rst +++ b/README.rst @@ -68,7 +68,7 @@ Biology Climate/Weather --------------- - +* `Actuaries Climate Index `_ * `Australian Weather `_ * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ * `Brazilian Weather - Historical data (In Portuguese) `_ @@ -151,7 +151,6 @@ Data Challenges * `Bruteforce Database `_ * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ - Earth Science ------------- @@ -259,7 +258,8 @@ GIS Government ---------- -* `OpenDataSoft's list of 1,600 open data portals `_ +* `OpenDataSoft's list of 1,600 open data `_ +* `Open Data for Africa `_ * `A list of cities and countries contributed by community `_ @@ -487,11 +487,13 @@ Social Sciences * `Datacards `_ * `European Social Survey `_ * `FBI Hate Crime 2013 - aggregated data `_ +* `Fragile States Index `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ * `German Social Survey `_ * `Global Religious Futures Project `_ * `Humanitarian Data Exchange `_ +* `INFORM Index for Risk Management `_ * `Institute for Demographic Studies `_ * `International Networks Archive `_ * `International Social Survey Program ISSP `_ @@ -500,6 +502,7 @@ Social Sciences * `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ * `Minnesota Population Center `_ * `MIT Reality Mining Dataset `_ +* `Notre Dame Global Adaptation Index (NG-DAIN) `_ * `Open Crime and Policing Data in England, Wales and Northern Ireland `_ * `Paul Hensel General 
International Data Page `_ * `PewResearch Internet Survey Project `_ @@ -515,7 +518,7 @@ Social Sciences * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ -* `World Bank Data `_ +* `World Bank Open Data `_ * `WorldPop project - Worldwide human population distributions `_ From e07bb6ccc26ed59f0680ffd45cd28d2d9dd6266a Mon Sep 17 00:00:00 2001 From: Katherine Schinkel Date: Sun, 15 Jan 2017 19:41:14 -0800 Subject: [PATCH 04/30] Add College Scorecard https://collegescorecard.ed.gov/data/ --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..a003f47 100755 --- a/README.rst +++ b/README.rst @@ -189,6 +189,7 @@ Economics Education ------------ +* `College Scorecard Data `_ * `Student Data from Free Code Camp `_ From ff5ed076f4cef7ec935fd7ff444eaa8d38c15fee Mon Sep 17 00:00:00 2001 From: Raul Jimenez Ortega Date: Fri, 27 Jan 2017 08:10:21 +0100 Subject: [PATCH 05/30] Adding ArcGIS Open Data portal --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..fee51aa 100755 --- a/README.rst +++ b/README.rst @@ -231,6 +231,7 @@ Finance GIS --- +* `ArcGIS Open Data portal `_ * `Cambridge, MA, US, GIS data on GitHub `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ From 1c940529b037528433049fdc0e9d6e0d5d0d7b2a Mon Sep 17 00:00:00 2001 From: Jad Chaar Date: Sat, 28 Jan 2017 23:43:32 -0500 Subject: [PATCH 06/30] Added links to SURFRAD data --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..8b03172 100755 --- a/README.rst +++ b/README.rst @@ -80,6 +80,7 @@ Climate/Weather * `NOAA Bering Sea Climate `_ * `NOAA Climate Datasets `_ * `NOAA Realtime Weather Models `_ +* `NOAA SURFRAD Meteorology and Radiation Datasets `_ * `The World Bank Open Data Resources for Climate Change `_ * `UEA Climatic Research Unit `_ * `WorldClim - Global Climate Data `_ From 
92ede117e165d4e2883bcb8c8b696d74a23b49a6 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 4 Feb 2017 13:24:06 +0800 Subject: [PATCH 07/30] fix link issue #276 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 3596eda..c300e61 100755 --- a/README.rst +++ b/README.rst @@ -131,7 +131,7 @@ Computer Networks Contextual Data --------------- -* `Context-aware data sets from five domains `_ or `GitHub `_ +* `Context-aware data sets from five domains `_ Data Challenges From 20ad345175ca9e16ed7c6896448e8c2e813305e2 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 4 Feb 2017 13:25:54 +0800 Subject: [PATCH 08/30] Fix link issue #277 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index c300e61..10da14f 100755 --- a/README.rst +++ b/README.rst @@ -156,7 +156,7 @@ Earth Science ------------- * `AQUASTAT - Global water resources and uses `_ -* `BODC - marine data of ~22K vars `_ +* `BODC - marine data of ~22K vars `_ * `Earth Models `_ * `EOSDIS - NASA's earth observing system data `_ * `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_ From cb41229790348825ded701259413459cac920591 Mon Sep 17 00:00:00 2001 From: Philip Fung Date: Tue, 7 Feb 2017 12:24:59 -0800 Subject: [PATCH 09/30] adding National Cancer Institute - Genomic Data Commons --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 10da14f..202d181 100755 --- a/README.rst +++ b/README.rst @@ -45,6 +45,7 @@ Biology * `MIT Cancer Genomics Data `_ * `NCBI Proteins `_ * `NCBI Taxonomy `_ +* `NCI Genomic Data Commons `_ * `NIH Microarray data `_ or `FTP `_ (see FTP link on `RAW `_) * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ From 64fe2cc8c35d8765bfe0735890e18ff409e1cfcd Mon Sep 17 00:00:00 2001 From: Alex Date: Mon, 13 Feb 2017 14:49:11 +1300 Subject: [PATCH 10/30] added 
youtube 8 and visual genome --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index 10da14f..47eff1f 100755 --- a/README.rst +++ b/README.rst @@ -304,6 +304,7 @@ Image Processing * `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ +* `Visual genome `_ Machine Learning ---------------- @@ -325,6 +326,7 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! Ratings and Classification Data `_ +* `Youtube 8m `_ Museums From e5cea9a18422088a4f641d9d21e6b323f9fd6526 Mon Sep 17 00:00:00 2001 From: Alex Date: Mon, 13 Feb 2017 14:57:38 +1300 Subject: [PATCH 11/30] Update README.rst --- README.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 47eff1f..6b57705 100755 --- a/README.rst +++ b/README.rst @@ -304,7 +304,7 @@ Image Processing * `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ -* `Visual genome `_ +* `Visual genome `_ Machine Learning ---------------- @@ -326,7 +326,7 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! 
Ratings and Classification Data `_ -* `Youtube 8m `_ +* `Youtube 8m `_ Museums From 5587d232b599a2b9dc23ab4b1c99bc2bc19ed399 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:30:01 +0800 Subject: [PATCH 12/30] Add EveryPolitician, #280 --- Government.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Government.rst b/Government.rst index 85f5efd..1df8d04 100644 --- a/Government.rst +++ b/Government.rst @@ -1,6 +1,8 @@ Government ---------- +* `EveryPolitician, ongoing project collating and sharing data on every politician. `_ + * `Alberta, Province of Canada `_ * `Antwerp, Belgium `_ * `Argentina (non official) `_ From 49e07e34c284b9292cd68fb590affeb57756194e Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:34:21 +0800 Subject: [PATCH 13/30] Add data.world #279 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 6b57705..2bee928 100755 --- a/README.rst +++ b/README.rst @@ -417,6 +417,7 @@ Public Domains * `CMU StatLab collections `_ * `Data360 `_ * `Datamob.org `_ +* `Data.World `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ From 7ac9f9e367cdc5d47d897fc788b68ead5135d827 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:45:07 +0800 Subject: [PATCH 14/30] Add Tennis database from Jeff Sackmann #278 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 2bee928..b59d440 100755 --- a/README.rst +++ b/README.rst @@ -544,6 +544,7 @@ Sports * `Lahman's Baseball Database `_ * `Pinhooker: Thoroughbred Bloodstock Sale Data `_ * `Retrosheet Baseball Statistics `_ +* `Tennis database of rankings, results, and stats for ATP `_, `WTA `_, `Grand Slams `_ and `Match Charting Project `_ Time Series From 6141e30d29e36a90eeaddc756f08f7164f351b74 Mon Sep 17 00:00:00 2001 From: Emre Bolat Date: Thu, 23 Feb 2017 10:26:22 +0200 Subject: [PATCH 15/30] New addition to Agriculture category U.S. 
Department of Agriculture's Nutrient Database link added. --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..b59c6c7 100755 --- a/README.rst +++ b/README.rst @@ -17,6 +17,7 @@ Other amazingly awesome lists can be found in the Agriculture ------------ * `U.S. Department of Agriculture's PLANTS Database `_ +* `U.S. Department of Agriculture's Nutrient Database `_ Biology From e746ff23857f0550d47ad3074af00d597446188a Mon Sep 17 00:00:00 2001 From: Alex Date: Fri, 24 Feb 2017 14:20:01 +1300 Subject: [PATCH 16/30] added comp vision dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..d5a2910 100755 --- a/README.rst +++ b/README.rst @@ -306,6 +306,7 @@ Image Processing * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ * `Visual genome `_ +* `Caltech Pedestrian Detection Benchmark `_ Machine Learning ---------------- From dc1f51b3263d700596603c4a52c54dd9b44d0955 Mon Sep 17 00:00:00 2001 From: Martin Linkov Date: Wed, 1 Mar 2017 11:14:10 +0100 Subject: [PATCH 17/30] CoolDatasets The twitter account upgraded to a website, the collection grows, I think it is worth including in the Complementary List --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..79ac468 100755 --- a/README.rst +++ b/README.rst @@ -592,6 +592,7 @@ Complementary Collections * `Data Packaged Core Datasets `_ * `Database of Scientific Code Contributions `_ * DataWrangling: `Some Datasets Available on the Web `_ +* A growing collection of public datasets: `CoolDatasets. `_ * Inside-r: `Finding Data on the Internet `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ * Quora: `Where can I find large datasets open to the public? 
`_ From aff0331e4e2dcbfc259b92a464c734ad73ffcd28 Mon Sep 17 00:00:00 2001 From: owkwen Date: Thu, 9 Mar 2017 13:54:36 -0500 Subject: [PATCH 18/30] Resurrected link Montreal BIXI Bike Share link is dead. Updated with new link and in english. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index aba0efe..de92f21 100755 --- a/README.rst +++ b/README.rst @@ -568,7 +568,7 @@ Transportation * `German train system by Deutsche Bahn `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ -* `Montreal BIXI Bike Share `_ +* `Montreal BIXI Bike Share `_ * `NYC Taxi Trip Data 2009- `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Uber trip data April 2014 to September 2014 `_ From 1633901880b97b47194c97f6abd896a5dbe14e8f Mon Sep 17 00:00:00 2001 From: Clement Michaud Date: Tue, 28 Mar 2017 22:04:21 +0200 Subject: [PATCH 19/30] Fix broken link to Transport for London open datasets --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index aba0efe..4a57290 100755 --- a/README.rst +++ b/README.rst @@ -579,7 +579,7 @@ Transportation * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ * `Toronto Bike Share Stations (XML file) `_ -* `Transport for London (TFL) `_ +* `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ * `U.S. 
Domestic Flights 1990 to 2009 `_ From 863c2c831100a9d03eb6fba2b0644f068edf4d91 Mon Sep 17 00:00:00 2001 From: shagun Sodhani Date: Thu, 6 Apr 2017 14:00:41 +0530 Subject: [PATCH 20/30] Added webhose datasets - related to News/Blogs in multiple languages --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index b59c6c7..87071ab 100755 --- a/README.rst +++ b/README.rst @@ -372,6 +372,7 @@ Natural Language * `WordNet databases and tools `_ * `Open Multilingual Wordnet `_ * `Automatic Keyphrase Extracttion `_ +* `News/Blogs in multiple languages `_ Neuroscience From e53e99c4c468cb6528cc4993ba40cfaf58467114 Mon Sep 17 00:00:00 2001 From: Katherine Schinkel Date: Thu, 6 Apr 2017 21:09:07 -0700 Subject: [PATCH 21/30] Create PULL_REQUEST_TEMPLATE.md --- PULL_REQUEST_TEMPLATE.md | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 PULL_REQUEST_TEMPLATE.md diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..4690fa4 --- /dev/null +++ b/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,3 @@ +# Overview +Dataset Description:
+[link to dataset](putlinkhere.com) From f96c461782a6d899e21046de3d4a7b622b19e598 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 7 Apr 2017 16:47:40 +0800 Subject: [PATCH 22/30] Clear format and fix #291 --- README.rst | 69 +++++++++++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 35 deletions(-) diff --git a/README.rst b/README.rst index da3a2e7..4068950 100755 --- a/README.rst +++ b/README.rst @@ -25,8 +25,8 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ -* `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Broad Bioimage Benchmark Collection (BBBC) `_ +* `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Cell Image Library `_ * `Complete Genomics Public Data `_ * `EBI ArrayExpress `_ @@ -64,12 +64,13 @@ Biology * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ -* `Universal Protein Resource (UnitProt) `_ * `UniGene `_ +* `Universal Protein Resource (UnitProt) `_ Climate/Weather --------------- + * `Actuaries Climate Index `_ * `Australian Weather `_ * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ @@ -95,6 +96,7 @@ Complex Networks * `AMiner Citation Network Dataset `_ * `CrossRef DOI URLs `_ * `DBLP Citation dataset `_ +* `DIMACS Road Networks Collection `_ * `NBER Patent Citations `_ * `Network Repository with Interactive Exploratory Analysis Tools `_ * `NIST complex networks data collection `_ @@ -111,7 +113,7 @@ Complex Networks * `UCI Network Data Repository `_ * `UFL sparse matrix collection `_ * `WSU Graph Database `_ -* `DIMACS Road Networks Collection `_ + Computer Networks ----------------- @@ -130,15 +132,10 @@ Computer Networks * `UCSD Network Telescope, IPv4 /8 net `_ -Contextual Data ---------------- - -* `Context-aware data sets from five domains `_ - - Data Challenges --------------- +* `Bruteforce Database `_ * `Challenges in Machine Learning `_ * `CrowdANALYTIX dataX `_ * `D4D 
Challenge of Orange `_ @@ -150,9 +147,9 @@ Data Challenges * `Netflix Prize `_ * `Space Apps Challenge `_ * `Telecom Italia Big Data Challenge `_ -* `Yelp Dataset Challenge `_ -* `Bruteforce Database `_ * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ +* `Yelp Dataset Challenge `_ + Earth Science ------------- @@ -216,7 +213,6 @@ Energy * `WHITED `_ - Finance ------- @@ -224,12 +220,12 @@ Finance * `Google Finance `_ * `Google Trends `_ * `NASDAQ `_ +* `NYSE Market Data `_ (see FTP link on `RAW `_) * `OANDA `_ * `OSU Financial data `_ * `Quandl `_ * `St Louis Federal `_ * `Yahoo Finance `_ -* `NYSE Market Data `_ (see FTP link on `RAW `_) GIS @@ -263,9 +259,9 @@ GIS Government ---------- -* `OpenDataSoft's list of 1,600 open data `_ -* `Open Data for Africa `_ * `A list of cities and countries contributed by community `_ +* `Open Data for Africa `_ +* `OpenDataSoft's list of 1,600 open data `_ Healthcare @@ -289,10 +285,13 @@ Image Processing * `10k US Adult Faces Database `_ * `2GB of Photos of Cats `_ or `Archive version `_ +* `Adience Unfiltered faces for gender and age classification `_ * `Affective Image Classification `_ * `Animals with attributes `_ +* `Caltech Pedestrian Detection Benchmark `_ * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ * `Face Recognition Benchmark `_ +* `GDXray: X-ray images for X-ray testing and Computer Vision `_ * `ImageNet (in WordNet hierarchy) `_ * `Indoor Scene Recognition `_ * `International Affective Picture System, UFL `_ @@ -301,17 +300,17 @@ Image Processing * `Several Shape-from-Silhouette Datasets `_ * `Stanford Dogs Dataset `_ * `SUN database, MIT `_ -* `The Oxford-IIIT Pet Dataset `_ -* `YouTube Faces Database `_ -* `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ +* `The Oxford-IIIT Pet Dataset `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ * `Visual genome 
`_ -* `Caltech Pedestrian Detection Benchmark `_ +* `YouTube Faces Database `_ + Machine Learning ---------------- +* `Context-aware data sets from five domains `_ * `Delve Datasets for classification and regression (Univ. of Toronto) `_ * `Discogs Monthly Data `_ * `eBay Online Auctions (2012) `_ @@ -322,8 +321,8 @@ Machine Learning * `Machine Learning Data Set Repository `_ * `Million Song Dataset `_ * `More Song Datasets `_ -* `New Yorker caption contest ratings `_ * `MovieLens Data Sets `_ +* `New Yorker caption contest ratings `_ * `RDataMining - "R and Data Mining" ebook data `_ * `Registered Meteorites on Earth `_ * `Restaurants Health Score Data in San Francisco `_ @@ -347,6 +346,7 @@ Museums Natural Language ---------------- +* `Automatic Keyphrase Extracttion `_ * `Blogger Corpus `_ * `CLiPS Stylometry Investigation Corpus `_ * `ClueWeb09 FACC `_ @@ -361,37 +361,36 @@ Natural Language * `Hansards text chunks of Canadian Parliament `_ * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ * `Machine Translation of European languages `_ -* `Multi-Domain Sentiment Dataset (version 2.0) `_ * `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_ +* `Multi-Domain Sentiment Dataset (version 2.0) `_ +* `Open Multilingual Wordnet `_ * `Personae Corpus `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ +* `Universal Dependencies `_ * `USENET postings corpus of 2005~2011 `_ +* `Webhose - News/Blogs in multiple languages `_ * `Wikidata - Wikipedia databases `_ * `Wikipedia Links data - 40 Million Entities in Context `_ -* `Universal Dependencies `_ * `WordNet databases and tools `_ -* `Open Multilingual Wordnet `_ -* `Automatic Keyphrase Extracttion `_ -* `News/Blogs in multiple languages `_ - + Neuroscience ------------- * `Allen Institute Datasets `_ * `Brain Catalogue `_ -* `Brainomics `_ -* `CodeNeuro Datasets `_ +* `Brainomics `_ +* `CodeNeuro Datasets `_ * 
`Collaborative Research in Computational Neuroscience (CRCNS) `_ * `FCP-INDI `_ -* `Human Connectome Project `_ +* `Human Connectome Project `_ * `NDAR `_ -* `NIMH Data Archive `_ * `NeuroData `_ +* `Neuroelectro `_ +* `NIMH Data Archive `_ * `OASIS `_ * `OpenfMRI `_ -* `Neuroelectro `_ * `Study Forrest `_ @@ -419,9 +418,9 @@ Public Domains * `Archive.org Datasets `_ * `CMU JASA data archive `_ * `CMU StatLab collections `_ +* `Data.World `_ * `Data360 `_ * `Datamob.org `_ -* `Data.World `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ @@ -477,8 +476,8 @@ Social Networks * `Skytrax' Air Travel Reviews Dataset `_ * `Social Twitter Data `_ * `SourceForge.net Research Data `_ -* `Twitter Data for Sentiment Analysis `_ * `Twitter Data for Online Reputation Management `_ +* `Twitter Data for Sentiment Analysis `_ * `Twitter Graph of entire Twitter site `_ * `Twitter Scrape Calufa May 2011 `_ * `UNIMI/LAW Social Network Datasets `_ @@ -523,11 +522,11 @@ Social Sciences * `Texas Inmates Executed Since 1984 `_ * `Titanic Survival Data Set `_ or `on Kaggle `_ * `UCB's Archive of Social Science Data (D-Lab) `_ -* `Uppsala Conflict Data Program `_ * `UCLA Social Sciences Data Archive `_ * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ +* `Uppsala Conflict Data Program `_ * `World Bank Open Data `_ * `WorldPop project - Worldwide human population distributions `_ @@ -594,8 +593,8 @@ Complementary Collections * `Data Packaged Core Datasets `_ * `Database of Scientific Code Contributions `_ -* DataWrangling: `Some Datasets Available on the Web `_ * A growing collection of public datasets: `CoolDatasets. `_ +* DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ * Quora: `Where can I find large datasets open to the public? 
`_ From 68088197e998355435117ec3a660d8ad96bf4aad Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 7 Apr 2017 16:59:02 +0800 Subject: [PATCH 23/30] Modify pull_request_template --- PULL_REQUEST_TEMPLATE.md | 3 --- PULL_REQUEST_TEMPLATE.rst | 3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) delete mode 100644 PULL_REQUEST_TEMPLATE.md create mode 100644 PULL_REQUEST_TEMPLATE.rst diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index 4690fa4..0000000 --- a/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,3 +0,0 @@ -# Overview -Dataset Description:
-[link to dataset](putlinkhere.com) diff --git a/PULL_REQUEST_TEMPLATE.rst b/PULL_REQUEST_TEMPLATE.rst new file mode 100644 index 0000000..1014736 --- /dev/null +++ b/PULL_REQUEST_TEMPLATE.rst @@ -0,0 +1,3 @@ +# Overview + +* `Dataset Description `_ From e3dcb1c503e792d692f64a179f8ee1a81a75ce1b Mon Sep 17 00:00:00 2001 From: Cameron Date: Fri, 28 Apr 2017 15:00:28 -0700 Subject: [PATCH 24/30] add flickr logo dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 47e51b5..6e50e87 100755 --- a/README.rst +++ b/README.rst @@ -291,6 +291,7 @@ Image Processing * `Caltech Pedestrian Detection Benchmark `_ * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ * `Face Recognition Benchmark `_ +* `Flickr: 32 Class Brand Logos `_ * `GDXray: X-ray images for X-ray testing and Computer Vision `_ * `ImageNet (in WordNet hierarchy) `_ * `Indoor Scene Recognition `_ From dac0811dc28755fa5101613f31bbcbf01f887d05 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C3=ABl=20Defferrard?= Date: Wed, 10 May 2017 15:54:12 +0200 Subject: [PATCH 25/30] Add Free Music Archive --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 47e51b5..761f7e2 100755 --- a/README.rst +++ b/README.rst @@ -319,6 +319,7 @@ Machine Learning * `Labeled Faces in the Wild (LFW) `_ * `Lending Club Loan Data `_ * `Machine Learning Data Set Repository `_ +* `Free Music Archive `_ * `Million Song Dataset `_ * `More Song Datasets `_ * `MovieLens Data Sets `_ From 0bde4fd8edcf044131d5669fd22a1ac10f1b2ee3 Mon Sep 17 00:00:00 2001 From: Ryan Barrett Date: Thu, 29 Jun 2017 07:36:48 -0700 Subject: [PATCH 26/30] Add Indie Map --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index edab464..b169fcb 100755 --- a/README.rst +++ b/README.rst @@ -472,6 +472,7 @@ Social Networks * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ * 
`High-Resolution Contact Networks from Wearable Sensors `_ +* `Indie Map: social graph and crawl of top IndieWeb sites `_ * `Mobile Social Networks from UMASS `_ * `Network Twitter Data `_ * `Reddit Comments `_ From 1c57e245bd11f2f6d650ad07a4c3b4d92bc6d087 Mon Sep 17 00:00:00 2001 From: Tom Morris Date: Tue, 11 Jul 2017 10:37:39 -0400 Subject: [PATCH 27/30] Datamob is gone --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index edab464..1a33385 100755 --- a/README.rst +++ b/README.rst @@ -422,7 +422,6 @@ Public Domains * `CMU StatLab collections `_ * `Data.World `_ * `Data360 `_ -* `Datamob.org `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ From 76ee6a0012c8d5d835581928e15b3f8416b71383 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 10:54:22 +0800 Subject: [PATCH 28/30] Fix #308 --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 1a33385..f631ee5 100755 --- a/README.rst +++ b/README.rst @@ -269,6 +269,7 @@ Healthcare * `EHDP Large Health Data Sets `_ * `Gapminder World demographic databases `_ +* `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. `_ * `Medicare Coverage Database (MCD), U.S. 
`_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ @@ -276,7 +277,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `OpenPaymentsData, Healthcare financial relationship data `_ -* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ +* The Cancer Genome Atlas project (TCGA) (refer to `GDC `_ and `BigQuery table `_) * `World Health Organization Global Health Observatory `_ From a12a3b41693047128bda88552ad1543950c4bb32 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 10:55:40 +0800 Subject: [PATCH 29/30] Fix #307 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index f631ee5..8155e1e 100755 --- a/README.rst +++ b/README.rst @@ -349,7 +349,7 @@ Museums Natural Language ---------------- -* `Automatic Keyphrase Extracttion `_ +* `Automatic Keyphrase Extraction `_ * `Blogger Corpus `_ * `CLiPS Stylometry Investigation Corpus `_ * `ClueWeb09 FACC `_ From 853dbff93781b301cc4af8249927c505192d1d41 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 11:06:01 +0800 Subject: [PATCH 30/30] #306 --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 8155e1e..9472dc3 100755 --- a/README.rst +++ b/README.rst @@ -4,7 +4,7 @@ Awesome Public Datasets :alt: Awesome :target: https://github.com/sindresorhus/awesome -`This list of public data sources `_ +`This list of a topic-centric public data sources `_ in high quality. They are collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free, however, some are not. Other amazingly awesome lists can be found in the @@ -270,6 +270,7 @@ Healthcare * `EHDP Large Health Data Sets `_ * `Gapminder World demographic databases `_ * `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. 
`_ +* `PhysioBank Databases - a large and growing archive of physiological data `_ * `Medicare Coverage Database (MCD), U.S. `_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_