From e746ff23857f0550d47ad3074af00d597446188a Mon Sep 17 00:00:00 2001
From: Alex
Date: Fri, 24 Feb 2017 14:20:01 +1300
Subject: [PATCH 1/8] added comp vision dataset

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index aba0efe..d5a2910 100755
--- a/README.rst
+++ b/README.rst
@@ -306,6 +306,7 @@ Image Processing
 * `The Action Similarity Labeling (ASLAN) Challenge `_
 * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_
 * `Visual genome `_
+* `Caltech Pedestrian Detection Benchmark `_
 
 Machine Learning
 ----------------

From dc1f51b3263d700596603c4a52c54dd9b44d0955 Mon Sep 17 00:00:00 2001
From: Martin Linkov
Date: Wed, 1 Mar 2017 11:14:10 +0100
Subject: [PATCH 2/8] CoolDatasets

The twitter account upgraded to a website, the collection grows, I think it is worth including in the Complementary List
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index aba0efe..79ac468 100755
--- a/README.rst
+++ b/README.rst
@@ -592,6 +592,7 @@ Complementary Collections
 * `Data Packaged Core Datasets `_
 * `Database of Scientific Code Contributions `_
 * DataWrangling: `Some Datasets Available on the Web `_
+* A growing collection of public datasets: `CoolDatasets. `_
 * Inside-r: `Finding Data on the Internet `_
 * OpenDataMonitor: `An overview of available open data resources in Europe `_
 * Quora: `Where can I find large datasets open to the public? `_

From aff0331e4e2dcbfc259b92a464c734ad73ffcd28 Mon Sep 17 00:00:00 2001
From: owkwen
Date: Thu, 9 Mar 2017 13:54:36 -0500
Subject: [PATCH 3/8] Resurrected link

Montreal BIXI Bike Share link is dead. Updated with new link and in english.
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index aba0efe..de92f21 100755
--- a/README.rst
+++ b/README.rst
@@ -568,7 +568,7 @@ Transportation
 * `German train system by Deutsche Bahn `_
 * `Hubway Million Rides in MA `_
 * `Marine Traffic - ship tracks, port calls and more `_
-* `Montreal BIXI Bike Share `_
+* `Montreal BIXI Bike Share `_
 * `NYC Taxi Trip Data 2009- `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
 * `NYC Uber trip data April 2014 to September 2014 `_

From 1633901880b97b47194c97f6abd896a5dbe14e8f Mon Sep 17 00:00:00 2001
From: Clement Michaud
Date: Tue, 28 Mar 2017 22:04:21 +0200
Subject: [PATCH 4/8] Fix broken link to Transport for London open datasets

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index aba0efe..4a57290 100755
--- a/README.rst
+++ b/README.rst
@@ -579,7 +579,7 @@ Transportation
 * `RITA Airline On-Time Performance data `_
 * `RITA/BTS transport data collection (TranStat) `_
 * `Toronto Bike Share Stations (XML file) `_
-* `Transport for London (TFL) `_
+* `Transport for London (TFL) `_
 * `Travel Tracker Survey (TTS) for Chicago `_
 * `U.S. Bureau of Transportation Statistics (BTS) `_
 * `U.S. Domestic Flights 1990 to 2009 `_

From 863c2c831100a9d03eb6fba2b0644f068edf4d91 Mon Sep 17 00:00:00 2001
From: shagun Sodhani
Date: Thu, 6 Apr 2017 14:00:41 +0530
Subject: [PATCH 5/8] Added webhose datasets - related to News/Blogs in multiple languages

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index b59c6c7..87071ab 100755
--- a/README.rst
+++ b/README.rst
@@ -372,6 +372,7 @@ Natural Language
 * `WordNet databases and tools `_
 * `Open Multilingual Wordnet `_
 * `Automatic Keyphrase Extracttion `_
+* `News/Blogs in multiple languages `_
 
 
 Neuroscience

From e53e99c4c468cb6528cc4993ba40cfaf58467114 Mon Sep 17 00:00:00 2001
From: Katherine Schinkel
Date: Thu, 6 Apr 2017 21:09:07 -0700
Subject: [PATCH 6/8] Create PULL_REQUEST_TEMPLATE.md

---
 PULL_REQUEST_TEMPLATE.md | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 PULL_REQUEST_TEMPLATE.md

diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 0000000..4690fa4
--- /dev/null
+++ b/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,3 @@
+# Overview
+Dataset Description:
+[link to dataset](putlinkhere.com)

From f96c461782a6d899e21046de3d4a7b622b19e598 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Fri, 7 Apr 2017 16:47:40 +0800
Subject: [PATCH 7/8] Clear format and fix #291

---
 README.rst | 69 +++++++++++++++++++++++++++---------------------------
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/README.rst b/README.rst
index da3a2e7..4068950 100755
--- a/README.rst
+++ b/README.rst
@@ -25,8 +25,8 @@ Biology
 
 * `1000 Genomes `_
 * `American Gut (Microbiome Project) `_
-* `Broad Cancer Cell Line Encyclopedia (CCLE) `_
 * `Broad Bioimage Benchmark Collection (BBBC) `_
+* `Broad Cancer Cell Line Encyclopedia (CCLE) `_
 * `Cell Image Library `_
 * `Complete Genomics Public Data `_
 * `EBI ArrayExpress `_
@@ -64,12 +64,13 @@ Biology
 * `The Catalogue of Life `_
 * `The Personal Genome Project `_ or `PGP `_
 * `UCSC Public Data `_
-* `Universal Protein Resource (UnitProt) `_
 * `UniGene `_
+* `Universal Protein Resource (UnitProt) `_
 
 
 Climate/Weather
 ---------------
+
 * `Actuaries Climate Index `_
 * `Australian Weather `_
 * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_
@@ -95,6 +96,7 @@ Complex Networks
 * `AMiner Citation Network Dataset `_
 * `CrossRef DOI URLs `_
 * `DBLP Citation dataset `_
+* `DIMACS Road Networks Collection `_
 * `NBER Patent Citations `_
 * `Network Repository with Interactive Exploratory Analysis Tools `_
 * `NIST complex networks data collection `_
@@ -111,7 +113,7 @@ Complex Networks
 * `UCI Network Data Repository `_
 * `UFL sparse matrix collection `_
 * `WSU Graph Database `_
-* `DIMACS Road Networks Collection `_
+
 
 Computer Networks
 -----------------
@@ -130,15 +132,10 @@ Computer Networks
 * `UCSD Network Telescope, IPv4 /8 net `_
 
 
-Contextual Data
----------------
-
-* `Context-aware data sets from five domains `_
-
-
 Data Challenges
 ---------------
 
+* `Bruteforce Database `_
 * `Challenges in Machine Learning `_
 * `CrowdANALYTIX dataX `_
 * `D4D Challenge of Orange `_
@@ -150,9 +147,9 @@ Data Challenges
 * `Netflix Prize `_
 * `Space Apps Challenge `_
 * `Telecom Italia Big Data Challenge `_
-* `Yelp Dataset Challenge `_
-* `Bruteforce Database `_
 * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_
+* `Yelp Dataset Challenge `_
+
 
 Earth Science
 -------------
@@ -216,7 +213,6 @@ Energy
 * `WHITED `_
 
 
-
 Finance
 -------
 
@@ -224,12 +220,12 @@ Finance
 * `Google Finance `_
 * `Google Trends `_
 * `NASDAQ `_
+* `NYSE Market Data `_ (see FTP link on `RAW `_)
 * `OANDA `_
 * `OSU Financial data `_
 * `Quandl `_
 * `St Louis Federal `_
 * `Yahoo Finance `_
-* `NYSE Market Data `_ (see FTP link on `RAW `_)
 
 
 GIS
@@ -263,9 +259,9 @@ GIS
 Government
 ----------
 
-* `OpenDataSoft's list of 1,600 open data `_
-* `Open Data for Africa `_
 * `A list of cities and countries contributed by community `_
+* `Open Data for Africa `_
+* `OpenDataSoft's list of 1,600 open data `_
 
 
 Healthcare
@@ -289,10 +285,13 @@ Image Processing
 
 * `10k US Adult Faces Database `_
 * `2GB of Photos of Cats `_ or `Archive version `_
+* `Adience Unfiltered faces for gender and age classification `_
 * `Affective Image Classification `_
 * `Animals with attributes `_
+* `Caltech Pedestrian Detection Benchmark `_
 * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_
 * `Face Recognition Benchmark `_
+* `GDXray: X-ray images for X-ray testing and Computer Vision `_
 * `ImageNet (in WordNet hierarchy) `_
 * `Indoor Scene Recognition `_
 * `International Affective Picture System, UFL `_
@@ -301,17 +300,17 @@ Image Processing
 * `Several Shape-from-Silhouette Datasets `_
 * `Stanford Dogs Dataset `_
 * `SUN database, MIT `_
-* `The Oxford-IIIT Pet Dataset `_
-* `YouTube Faces Database `_
-* `Adience Unfiltered faces for gender and age classification `_
 * `The Action Similarity Labeling (ASLAN) Challenge `_
+* `The Oxford-IIIT Pet Dataset `_
 * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_
 * `Visual genome `_
-* `Caltech Pedestrian Detection Benchmark `_
+* `YouTube Faces Database `_
+
 
 Machine Learning
 ----------------
 
+* `Context-aware data sets from five domains `_
 * `Delve Datasets for classification and regression (Univ. of Toronto) `_
 * `Discogs Monthly Data `_
 * `eBay Online Auctions (2012) `_
@@ -322,8 +321,8 @@ Machine Learning
 * `Machine Learning Data Set Repository `_
 * `Million Song Dataset `_
 * `More Song Datasets `_
-* `New Yorker caption contest ratings `_
 * `MovieLens Data Sets `_
+* `New Yorker caption contest ratings `_
 * `RDataMining - "R and Data Mining" ebook data `_
 * `Registered Meteorites on Earth `_
 * `Restaurants Health Score Data in San Francisco `_
@@ -347,6 +346,7 @@ Museums
 Natural Language
 ----------------
 
+* `Automatic Keyphrase Extracttion `_
 * `Blogger Corpus `_
 * `CLiPS Stylometry Investigation Corpus `_
 * `ClueWeb09 FACC `_
@@ -361,37 +361,36 @@ Natural Language
 * `Hansards text chunks of Canadian Parliament `_
 * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_
 * `Machine Translation of European languages `_
-* `Multi-Domain Sentiment Dataset (version 2.0) `_
 * `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_
+* `Multi-Domain Sentiment Dataset (version 2.0) `_
+* `Open Multilingual Wordnet `_
 * `Personae Corpus `_
 * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_
 * `SMS Spam Collection in English `_
+* `Universal Dependencies `_
 * `USENET postings corpus of 2005~2011 `_
+* `Webhose - News/Blogs in multiple languages `_
 * `Wikidata - Wikipedia databases `_
 * `Wikipedia Links data - 40 Million Entities in Context `_
-* `Universal Dependencies `_
 * `WordNet databases and tools `_
-* `Open Multilingual Wordnet `_
-* `Automatic Keyphrase Extracttion `_
-* `News/Blogs in multiple languages `_
-
+
 
 Neuroscience
 -------------
 
 * `Allen Institute Datasets `_
 * `Brain Catalogue `_
-* `Brainomics `_
-* `CodeNeuro Datasets `_
+* `Brainomics `_
+* `CodeNeuro Datasets `_
 * `Collaborative Research in Computational Neuroscience (CRCNS) `_
 * `FCP-INDI `_
-* `Human Connectome Project `_
+* `Human Connectome Project `_
 * `NDAR `_
-* `NIMH Data Archive `_
 * `NeuroData `_
+* `Neuroelectro `_
+* `NIMH Data Archive `_
 * `OASIS `_
 * `OpenfMRI `_
-* `Neuroelectro `_
 * `Study Forrest `_
 
 
@@ -419,9 +418,9 @@ Public Domains
 * `Archive.org Datasets `_
 * `CMU JASA data archive `_
 * `CMU StatLab collections `_
+* `Data.World `_
 * `Data360 `_
 * `Datamob.org `_
-* `Data.World `_
 * `Google `_
 * `Infochimps `_
 * `KDNuggets Data Collections `_
@@ -477,8 +476,8 @@ Social Networks
 * `Skytrax' Air Travel Reviews Dataset `_
 * `Social Twitter Data `_
 * `SourceForge.net Research Data `_
-* `Twitter Data for Sentiment Analysis `_
 * `Twitter Data for Online Reputation Management `_
+* `Twitter Data for Sentiment Analysis `_
 * `Twitter Graph of entire Twitter site `_
 * `Twitter Scrape Calufa May 2011 `_
 * `UNIMI/LAW Social Network Datasets `_
@@ -523,11 +522,11 @@ Social Sciences
 * `Texas Inmates Executed Since 1984 `_
 * `Titanic Survival Data Set `_ or `on Kaggle `_
 * `UCB's Archive of Social Science Data (D-Lab) `_
-* `Uppsala Conflict Data Program `_
 * `UCLA Social Sciences Data Archive `_
 * `UN Civil Society Database `_
 * `Universities Worldwide `_
 * `UPJOHN for Labor Employment Research `_
+* `Uppsala Conflict Data Program `_
 * `World Bank Open Data `_
 * `WorldPop project - Worldwide human population distributions `_
 
@@ -594,8 +593,8 @@ Complementary Collections
 
 * `Data Packaged Core Datasets `_
 * `Database of Scientific Code Contributions `_
-* DataWrangling: `Some Datasets Available on the Web `_
 * A growing collection of public datasets: `CoolDatasets. `_
+* DataWrangling: `Some Datasets Available on the Web `_
 * Inside-r: `Finding Data on the Internet `_
 * OpenDataMonitor: `An overview of available open data resources in Europe `_
 * Quora: `Where can I find large datasets open to the public? `_

From 68088197e998355435117ec3a660d8ad96bf4aad Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Fri, 7 Apr 2017 16:59:02 +0800
Subject: [PATCH 8/8] Modify pull_request_template

---
 PULL_REQUEST_TEMPLATE.md  | 3 ---
 PULL_REQUEST_TEMPLATE.rst | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)
 delete mode 100644 PULL_REQUEST_TEMPLATE.md
 create mode 100644 PULL_REQUEST_TEMPLATE.rst

diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md
deleted file mode 100644
index 4690fa4..0000000
--- a/PULL_REQUEST_TEMPLATE.md
+++ /dev/null
@@ -1,3 +0,0 @@
-# Overview
-Dataset Description:
-[link to dataset](putlinkhere.com)
diff --git a/PULL_REQUEST_TEMPLATE.rst b/PULL_REQUEST_TEMPLATE.rst
new file mode 100644
index 0000000..1014736
--- /dev/null
+++ b/PULL_REQUEST_TEMPLATE.rst
@@ -0,0 +1,3 @@
+# Overview
+
+* `Dataset Description `_