From cc4d07233187558982ebe7dd49a9f5bad1c2d406 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Tue, 6 Jan 2015 12:25:46 +0800
Subject: [PATCH 001/276] Cleansing transportation data description.

---
 README.rst | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/README.rst b/README.rst
index d4d2ce8..6d74fe7 100644
--- a/README.rst
+++ b/README.rst
@@ -336,39 +336,39 @@ Sports
 Time Series
 -----------
 
-* `Time Series data Library (TSDL) `_: The Time Series Data Library was created by Rob Hyndman, Professor of Statistics at Monash University, Australia.
+* `Time Series data Library (TSDL), created by Rob Hyndman, MU `_
 
-* `UC Riverside Time Series `_: This data resource was created as a public service to the data mining/machine learning community, to encourage reproducible research for time series classification and clustering.
+* `UC Riverside Time Series, for classification and clustering. `_
 
 
 Transportation
 --------------
 
-* `Airlines Data 1987-2008 `_: Flight OD data used by ASA Challenge, 2009.
+* `Airlines OD Data 1987-2008, used by ASA Challenge 2009 `_
 
-* `Bike Share Data Systems `_: A collection of bike sharing systems and trip histories over the world.
+* `Bike Share Data Systems - trip histories, site maps etc. `_
 
 * `Edge data for US domestic flights 1990 to 2009 `_
 
-* `Half a million Hubway rides `_: Bike trip histories (since 2011) in MA published by Hubway.
+* `Half a million Hubway rides in MA `_
 
 * `Marine Traffic - ship tracks, port calls and more `_
 
-* `NYC Taxi Trip Data 2013 `_: FOIA/FOILed Taxi Trip Data from the NYC Taxi and Limousine Commission 2013, released by a civic hacker, Chris Whong.
+* `NYC Taxi Trip Data 2013 - FOIA/FOILed by Chris Whong `_
 
-* `OpenFlights `_: Airport, airline and route data collected contributed by open communities.
+* `OpenFlights - airport, airline and route data `_
 
-* `RITA Airline On-Time Performance Data `_: On-time arrival details for domestic flights by major air carriers in US.
+* `RITA Airline On-Time Performance Data of major air carriers in US `_
 
-* `RITA transport data collection (TranStat) `_: Various transportation databases published by BTS.
+* `RITA/BTS transport data collection (TranStat) `_
 
-* `Transport for London (TFL) `_: Providing London transportation data including bike sharing system, bus, train, and networking statistics.
+* `Transport for London (TFL) - trip histories and networking statistics `_
 
-* `Travel Tracker Survey, Chicago `_: Data collection took place between January 2007 and February 2008. A total of 10,552 households participated in either a 1-day or 2-day survey, providing a detailed travel inventory for each member of their household on the assigned travel day(s).
+* `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_
 
-* `U.S. Bureau of Transportation Statistics (BTS) `_: As part of the RITA, BTS covers nearly all of transportation resources to create, manage, and share transportation statistical knowledge with public.
+* `U.S. Bureau of Transportation Statistics (BTS) `_
 
-* `U.S. Freight Analysis Framework `_: Freight movement data among states and major metropolitan areas since 2007.
+* `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_
 
 
 Complementary Collections

From 4602a72b4a53ad811bee47c8476d732faa91efef Mon Sep 17 00:00:00 2001
From: "Jamin X. Chen"
Date: Tue, 6 Jan 2015 14:18:24 +0800
Subject: [PATCH 002/276] Add shorter into to Social Sciences and Sports categories

---
 README.rst | 81 +++++++++++++++++++++++-------------------------------
 1 file changed, 34 insertions(+), 47 deletions(-)

diff --git a/README.rst b/README.rst
index 6d74fe7..2d350ff 100644
--- a/README.rst
+++ b/README.rst
@@ -291,53 +291,52 @@ Search Engines
 * `DataMarket.com `_
 * `Freebase.com `_
 * `Harvard Dataverse `_
+* `ICPSR `_
 * `Statista.com `_
 
 
 Social Sciences
 ---------------
 
-* `CMU Enron Email `_
-* `Facebook Social Networks (since 2007) `_
-* `Facebook100 (2005) `_
-* `Foursquare (2010,2011) `_
-* `Foursquare (UMN/Sarwat, 2013) `_
-* `General Social Survey (GSS) `_
-* `GetGlue (users rating TV shows) `_
-* `GitHub Archive `_
-* `ICPSR `_
-* `Mobile Social Networks (UMASS) `_
-* `PewResearch Internet Project `_
-* `Social Networking `_
-* `SourceForge Graph `_
-* `Stack Exchange Network (Data Explorer) `_
-* `Titanic Survival Data Set `_
-* `Twitter Graph `_
-* `UC Berkeley's D-Lab Achive `_
-* `UCLA Social Sciences Data Archive `_
-* `UNIMI Social Network Datasets `_
-* `Universities Worldwide `_
-* `UPJOHN for Employment Research `_
-* `Yahoo Graph and Social Data `_
-* `Youtube Graph (2007,2008) `_
+* `Ancestry.com Forum Dataset - Forum users and messages over ten years `_
+* `CMU Enron Email - 150 users, mostly senior management of Enron `_
+* `Facebook Data Scrape (2005) - 100 American colleges and univ. `_
+* `Facebook Social Networks from LAW (since 2007) `_
+* `Foursquare (2010, 2011) - Social networks, check-in locations and categories `_
+* `Foursquare from UMN/Sarwat (2013) - Users, venues, check-ins, ratings etc. `_
+* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, plus topics of interests `_
+* `GetGlue - Users rating TV shows `_
+* `GitHub Archive - Programmers collaboration, projects progress etc. `_
+* `Mobile Social Networks (UMASS) - Timestamped mote-to-mote (up to 27 subjects) connections `_
+* `PewResearch Internet Project - A wide range of surveys about library usage, cell ownership, online dating etc. `_
+* `SourceForge.net Research Data (authority requested) - Historic and status statistics of projects and users' activities `_
+* `Stack Exchange Data Explorer - User-contributed content on the Stack Exchange network `_
+* `Titanic Survival Data Set - Demographic information of Titanic passengers `_
+* `Twitter Graph (authority requested) - Crawled entire Twitter site including user profiles, relations, topics and tweets `_
+* `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas `_
+* `UCLA Social Sciences Data Archive - A collection of social science data on the Web, e.g., DHS surveys `_
+* `UNIMI/LAW Social Network Datasets - Multiple social networks like amazon, LiveJournal, dblp, hollywood and more `_
+* `Universities Worldwide - Links to 9307 Universities in 205 countries `_
+* `UPJOHN for Employment Research - Labor surveys, unemployment spells and more `_
+* `Yahoo Graph and Social Data - Web page hyperlink graph, user-group membership, IM friends etc. `_
+* `Youtube Video Graph (2007,2008) - video relations, uploader, category, views, ratings and more `_
 
 
 Sports
 ------
 
-* `Betfair (betting exchange) Event Results `_
-* `Cricsheet (cricket) `_
-* `Ergast Formula 1 (API available) `_
-* `Football/Soccer data and APIs `_
-* `Lahman's Baseball Database `_
-* `Retrosheet (baseball) `_
+* `Betfair Event Results - Fully time-stamped historical Betfair exchange data `_
+* `Cricsheet (baseball) - Thousands of Cricket matches `_
+* `Ergast Formula 1, from 1950 up to date (API available) `_
+* `Football/Soccer resouces (data and APIs) `_
+* `Lahman's Baseball Database - Batting and pitching statistics, team stats etc. `_
+* `Retrosheet (baseball) - Play-by-Play files, game logs and schedules `_
 
 
 Time Series
 -----------
 
 * `Time Series data Library (TSDL), created by Rob Hyndman, MU `_
-
 * `UC Riverside Time Series, for classification and clustering. `_
 
 
@@ -345,29 +344,17 @@ Transportation
 --------------
 
 * `Airlines OD Data 1987-2008, used by ASA Challenge 2009 `_
-
-* `Bike Share Data Systems - trip histories, site maps etc. `_
-
+* `Bike Share Data Systems - Trip histories, site maps etc. `_
 * `Edge data for US domestic flights 1990 to 2009 `_
-
 * `Half a million Hubway rides in MA `_
-
-* `Marine Traffic - ship tracks, port calls and more `_
-
+* `Marine Traffic - Ship tracks, port calls and more `_
 * `NYC Taxi Trip Data 2013 - FOIA/FOILed by Chris Whong `_
-
-* `OpenFlights - airport, airline and route data `_
-
-* `RITA Airline On-Time Performance Data of major air carriers in US `_
-
+* `OpenFlights - Airport, airline and route data `_
+* `RITA Airline On-Time Performance data of major air carriers in US `_
 * `RITA/BTS transport data collection (TranStat) `_
-
-* `Transport for London (TFL) - trip histories and networking statistics `_
-
+* `Transport for London (TFL) - Trip histories and networking statistics `_
 * `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_
-
 * `U.S. Bureau of Transportation Statistics (BTS) `_
-
 * `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_
 
 

From d71b6b1d6562a779c3fd96719278c2b4c4ed3c39 Mon Sep 17 00:00:00 2001
From: "Jamin X. Chen"
Date: Tue, 6 Jan 2015 16:17:58 +0800
Subject: [PATCH 003/276] Update intro to physics and SE

---
 README.rst | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/README.rst b/README.rst
index 2d350ff..bc90f74 100644
--- a/README.rst
+++ b/README.rst
@@ -253,8 +253,8 @@ Natural Language
 Physics
 -------
 
-* `CERN Open Data Portal `_
-* `NASA `_
+* `CERN Open Data Portal - Experimental data of CMS experiment, ALICE, ATLAS and LHCb `_
+* `NSSDC (NASA) - More than 230 TB of data from about 550 space science spacecraft `_
 
 
 Public Domains
@@ -285,14 +285,14 @@ Public Domains
 Search Engines
 --------------
 
-* `Academic Torrents `_
-* `Archive-it `_
-* `Datahub.io `_
-* `DataMarket.com `_
-* `Freebase.com `_
-* `Harvard Dataverse `_
-* `ICPSR `_
-* `Statista.com `_
+* `Academic Torrents (UMB) - Sharing enormous datasets, for researchers, by researchers. `_
+* `Archive-it - Web archiving service built at the Internet Archive `_
+* `Datahub.io - The easy way to get, use and share data `_
+* `DataMarket (Qlik) `_
+* `Freebase.com - A community-curated database of well-known people, places, and things `_
+* `Harvard Dataverse Network - Scientific data for reproducible research `_
+* `ICPSR (UMICH) - Find and analyze data `_
+* `Statista.com - Statistics and Studies from more than 18,000 Sources `_
 
 
 Social Sciences
@@ -304,22 +304,22 @@ Social Sciences
 * `Facebook Social Networks from LAW (since 2007) `_
 * `Foursquare (2010, 2011) - Social networks, check-in locations and categories `_
 * `Foursquare from UMN/Sarwat (2013) - Users, venues, check-ins, ratings etc. `_
-* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, plus topics of interests `_
+* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, topics etc. `_
 * `GetGlue - Users rating TV shows `_
 * `GitHub Archive - Programmers collaboration, projects progress etc. `_
 * `Mobile Social Networks (UMASS) - Timestamped mote-to-mote (up to 27 subjects) connections `_
-* `PewResearch Internet Project - A wide range of surveys about library usage, cell ownership, online dating etc. `_
-* `SourceForge.net Research Data (authority requested) - Historic and status statistics of projects and users' activities `_
+* `PewResearch Internet Project - A wide range of surveys about library usage, online dating etc. `_
+* `SourceForge.net Research Data - Historic and status statistics of projects and users' activities `_
 * `Stack Exchange Data Explorer - User-contributed content on the Stack Exchange network `_
 * `Titanic Survival Data Set - Demographic information of Titanic passengers `_
-* `Twitter Graph (authority requested) - Crawled entire Twitter site including user profiles, relations, topics and tweets `_
+* `Twitter Graph - Crawled entire Twitter site including tweets, user profiles, relations `_
 * `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas `_
 * `UCLA Social Sciences Data Archive - A collection of social science data on the Web, e.g., DHS surveys `_
-* `UNIMI/LAW Social Network Datasets - Multiple social networks like amazon, LiveJournal, dblp, hollywood and more `_
+* `UNIMI/LAW Social Network Datasets - Social networks like amazon, LiveJournal, dblp and more `_
 * `Universities Worldwide - Links to 9307 Universities in 205 countries `_
 * `UPJOHN for Employment Research - Labor surveys, unemployment spells and more `_
 * `Yahoo Graph and Social Data - Web page hyperlink graph, user-group membership, IM friends etc. `_
-* `Youtube Video Graph (2007,2008) - video relations, uploader, category, views, ratings and more `_
+* `Youtube Video Graph (2007,2008) - Video relations, uploaders, views, ratings and more `_
 
 
 Sports

From 234197dffbf9743688280db06195552e312a7f06 Mon Sep 17 00:00:00 2001
From: "Jamin X. Chen"
Date: Sun, 11 Jan 2015 12:29:38 +0800
Subject: [PATCH 004/276] Update basic intro of NL category.

---
 README.rst | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/README.rst b/README.rst
index bc90f74..f3825f4 100644
--- a/README.rst
+++ b/README.rst
@@ -234,20 +234,20 @@ Music
 Natural Language
 ----------------
 
-* `40 Million Entities in Context `_
-* `ClueWeb09 FACC `_
-* `ClueWeb12 FACC `_
-* `DBpedia `_
-* `Flickr personal taxonomies `_
-* `Google Books Ngrams `_
-* `Google Web 5gram, 2006 (1T) `_
-* `Gutenberg eBooks List `_
-* `Hansards `_
-* `Machine Translation `_
-* `SMS Spam Collection `_
-* `USENET corpus `_
-* `Wikidata `_
-* `WordNet `_
+* `ClueWeb09 FACC - Annotated English-language Web pages from the ClueWeb09 corpora. `_
+* `ClueWeb12 FACC - Annotated English-language Web pages from the ClueWeb12 corpora. `_
+* `DBpedia - Multi-domain ontology describing 4.58M “things” with 583M “facts”. `_
+* `Flickr Personal Taxonomies - Personalized tagging pictures with descriptive labels. `_
+* `Google Books Ngrams (2.2TB) - N-gram corpuses extracted from Google Books. `_
+* `Google Web 5gram (1TB, 2006) - 5-gram corpuses extracted from Web pages. `_
+* `Gutenberg eBooks List - Basic information about each eBook from Project Gutenberg. `_
+* `Hansards - 1.3M aligned text chunks from official records of Canadian Parliament. `_
+* `Machine Translation - The recurring translation task focusing on European languages. `_
+* `SMS Spam Collection - 5,574 real English messages, labled as being ham or spam. `_
+* `USENET corpus - A collection of public USENET postings between Oct 2005 and Jan 2011. `_
+* `Wikidata - Wikipedia databases available in JSON and XML formats. `_
+* `Wikipedia Links data - 40 Million Entities in Context. `_
+* `WordNet - Databases, associated packages and tools. `_
 
 
 Physics
@@ -314,11 +314,11 @@ Social Sciences
 * `Titanic Survival Data Set - Demographic information of Titanic passengers `_
 * `Twitter Graph - Crawled entire Twitter site including tweets, user profiles, relations `_
 * `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas `_
-* `UCLA Social Sciences Data Archive - A collection of social science data on the Web, e.g., DHS surveys `_
+* `UCLA Social Sciences Data Archive - A collection of social science data on the Web `_
 * `UNIMI/LAW Social Network Datasets - Social networks like amazon, LiveJournal, dblp and more `_
 * `Universities Worldwide - Links to 9307 Universities in 205 countries `_
 * `UPJOHN for Employment Research - Labor surveys, unemployment spells and more `_
-* `Yahoo Graph and Social Data - Web page hyperlink graph, user-group membership, IM friends etc. `_
+* `Yahoo Graph and Social Data - Web page graph, user-group membership, IM friends etc. `_
 * `Youtube Video Graph (2007,2008) - Video relations, uploaders, views, ratings and more `_
 
 
@@ -355,7 +355,7 @@ Transportation
 * `Transport for London (TFL) - Trip histories and networking statistics `_
 * `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_
 * `U.S. Bureau of Transportation Statistics (BTS) `_
-* `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_
+* `**U.S. Freight Analysis Framework** - Freight movement among states since 2007 `_
 
 
 Complementary Collections

From 7720b3f01de591252659fc93b8c18bc9a1ee79f3 Mon Sep 17 00:00:00 2001
From: "Jamin X. Chen"
Date: Sun, 11 Jan 2015 13:05:53 +0800
Subject: [PATCH 005/276] Update basic intro of ML and Music categories.
--- README.rst | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/README.rst b/README.rst index f3825f4..ad5bc13 100644 --- a/README.rst +++ b/README.rst @@ -192,7 +192,7 @@ Healthcare Image Processing ---------------- -* `2GB of photos of cats `_ +* `2GB of Photos of Cats `_ * `Face Recognition Benchmark `_ * `ImageNet `_ @@ -200,20 +200,20 @@ Image Processing Machine Learning ---------------- -* `eBay Online Auctions `_ -* `IMDb database `_ -* `Keel Repository `_ -* `Lending Club Loan Data `_ -* `Machine Learning Data Set Repository `_ -* `Million Song Dataset `_ -* `More Song Datasets `_ -* `MovieLens Data Sets `_ -* `RDataMining R and Data Mining ebook data `_ -* `Registered meteorites on Earth `_ -* `SF restaurants dataset `_ -* `UCI Machine Learning Repository `_ -* `University of Toronto Delve Datasets `_ -* `Yahoo Ratings and Classification Data `_ +* `Delve Datasets (Univ. of Toronto) - Evaluating datasets for classification and regression. `_ +* `eBay Online Auctions (2012) - Seller-auction-bidder data with closing prices. `_ +* `IMDb Database - An online database of films, TB programs, and video games. `_ +* `Keel Repository - Multiple datasets for classification, regression, time series. `_ +* `Lending Club Loan Data - Loan status (Current, Late, Fully Paid, etc.) and latest payment info. `_ +* `Machine Learning Data Set Repository - A data search engine for machine learning tasks. `_ +* `Million Song Dataset - Audio features and metadata for a million popular music tracks. `_ +* `More Song Datasets - Complementary data of cover songs, lyrics, user listening data. `_ +* `MovieLens Data Sets - Online movie recommendation including movie tags, user ratings. `_ +* `RDataMining - "R and Data Mining" ebook data `_ +* `Registered Meteorites on Earth - 34,513 meteorites updated to 2012. `_ +* `Restaurants Health Score Data - Health status of restaurants in San Francisco. 
`_ +* `UCI Machine Learning Repository - One of most famous ML data repositories. `_ +* `Yahoo Ratings and Classification Data - About music, movies, user clicks, images etc. `_ Museums @@ -228,7 +228,7 @@ Museums Music ----- -* `Discogs Data `_ +* `Discogs Data - Monthly dumps of Discogs Release, Artist and Label data. `_ Natural Language From 718b67b0873adf50fe6c104ec50434b9a4da3591 Mon Sep 17 00:00:00 2001 From: "Jamin X. Chen" Date: Sun, 11 Jan 2015 13:21:56 +0800 Subject: [PATCH 006/276] Update basic intro of Image Processing and Healthcare categories. --- README.rst | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/README.rst b/README.rst index ad5bc13..efb88f9 100644 --- a/README.rst +++ b/README.rst @@ -184,17 +184,20 @@ Government Healthcare ---------- -* `EHDP Large Health Data Sets `_ -* `Gapminder `_ +* `EHDP Large Health Data Sets - A collection of health datasets across domains and countries. `_ +* `Gapminder World - A collection of multi-domain, demographic databases for our world. `_ +* `Medicare Coverage Database (MCD) - Containing national and local Coverage Determinations. `_ +* `Medicare Data Engine - Download, Explore, and Visualize Medicare.gov Data. `_ * `Medicare Data File `_ + Image Processing ---------------- -* `2GB of Photos of Cats `_ -* `Face Recognition Benchmark `_ -* `ImageNet `_ +* `2GB of Photos of Cats - 10K cat images with basic annotations. `_ +* `Face Recognition Benchmark - A collection of face datasets for benchmarking algorithms. `_ +* `ImageNet - An image database organized according to the WordNet hierarchy. 
`_ Machine Learning From 37c74139cf89bfc0a264ee143a40f75fd453c8ac Mon Sep 17 00:00:00 2001 From: Xiaming Date: Sun, 11 Jan 2015 13:23:29 +0800 Subject: [PATCH 007/276] Update README.rst --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index efb88f9..f202777 100644 --- a/README.rst +++ b/README.rst @@ -358,7 +358,7 @@ Transportation * `Transport for London (TFL) - Trip histories and networking statistics `_ * `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ -* `**U.S. Freight Analysis Framework** - Freight movement among states since 2007 `_ +* `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_ Complementary Collections From ffdedc1fceb7d9f3b0f3f7d9bb5f09e1b4c187bc Mon Sep 17 00:00:00 2001 From: "Jamin X. Chen" Date: Tue, 13 Jan 2015 21:06:49 +0800 Subject: [PATCH 008/276] Update intro of CN and GIS categories. --- README.rst | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/README.rst b/README.rst index f202777..fbce385 100644 --- a/README.rst +++ b/README.rst @@ -68,15 +68,15 @@ Complex Networks Computer Networks ----------------- -* `3.5B Web Pages `_ -* `53.5B Web clicks `_ -* `CAIDA Internet Datasets `_ -* `ClueWeb09 `_ -* `ClueWeb12 `_ -* `CommonCrawl Web Data `_ -* `Dartmouth CRAWDAD Wireless datasets `_ -* `OpenMobileData (MobiPerf) `_ -* `UCSD Network Telescope `_ +* `3.5B Web Pages - Web graph extracted from CommonCraw 2012 web corpus. `_ +* `53.5B Web clicks - Anonymized HTTP records from 100K users in Indiana Univ. `_ +* `CAIDA Internet Datasets - Network traces and topologies at geographically diverse locations. `_ +* `ClueWeb09 - About 1B web pages in ten languages that were collected in Jan. and Feb. 2009. `_ +* `ClueWeb12 - About 733M web pages collected between Feb. and May 2012. 
`_ +* `CommonCrawl Web Data - Petabytes of data collected over 7 years of web crawling. `_ +* `CRAWDAD Wireless datasets (Dartmouth) - A wireless network data resource for research communities. `_ +* `OpenMobileData (MobiPerf) - Mobile performance measurement data collected with active tests. `_ +* `UCSD Network Telescope - A passive traffic monitoring system covering IPv4 /8 net. `_ Data Challenges @@ -133,17 +133,17 @@ Finance GeoSpace/GIS ------------ -* `BODC (marine data of nearly 22,000 oceanographic vars) `_ -* `EOSDIS `_ -* `Factual Global Location Data `_ -* `GADM (Global Administrative Areas database) `_ -* `Geo Spatial Data from ASU `_ -* `GeoNames (over eight million placenames) `_ -* `Natural Earth (vectors and rasters of the world) `_ -* `OpenStreetMap (a free map worldwide) `_ -* `TIGER/Line (official United States boundaries and roads) `_ -* `twofishes (Foursquare's coarse geocoder) `_ -* `tz_world (timezone polygons) `_ +* `BODC - Marine data of nearly 22,000 oceanographic vars. `_ +* `EOSDIS - A data collection of NASA's earth observing system data and information system. `_ +* `Factual Global Location Data - 65M POIs with extended attributes in 50 countries. `_ +* `Global Administrative Areas Database (GADM) - For countries and low-level subdivisions. `_ +* `Geo Spatial Data from ASU - Several small spatial or GIS datasets. `_ +* `GeoNames - Over eight million placenames (countries, city stat etc.) of the world. `_ +* `Natural Earth - Vectors and rasters of the world in multiple scales. `_ +* `OpenStreetMap - A free map worldwide maintained by the communities. `_ +* `TIGER/Line - Official United States boundaries and roads. `_ +* `TwoFishes - Foursquare's coarse geocoder. `_ +* `TZ Timezones - A shapefile of the TZ timezones of the world. `_ Government @@ -187,7 +187,7 @@ Healthcare * `EHDP Large Health Data Sets - A collection of health datasets across domains and countries. 
`_ * `Gapminder World - A collection of multi-domain, demographic databases for our world. `_ * `Medicare Coverage Database (MCD) - Containing national and local Coverage Determinations. `_ -* `Medicare Data Engine - Download, Explore, and Visualize Medicare.gov Data. `_ +* `Medicare Data Engine - Download, explore, and visualize Medicare.gov Data. `_ * `Medicare Data File `_ From 736ded17d796734f2b28f72e41b539112b5edf37 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Fri, 23 Jan 2015 14:19:12 +0800 Subject: [PATCH 009/276] Update MovieLens data site --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index fbce385..ed560e6 100644 --- a/README.rst +++ b/README.rst @@ -211,7 +211,7 @@ Machine Learning * `Machine Learning Data Set Repository - A data search engine for machine learning tasks. `_ * `Million Song Dataset - Audio features and metadata for a million popular music tracks. `_ * `More Song Datasets - Complementary data of cover songs, lyrics, user listening data. `_ -* `MovieLens Data Sets - Online movie recommendation including movie tags, user ratings. `_ +* `MovieLens Data Sets - Online movie recommendation including movie tags, user ratings. `_ * `RDataMining - "R and Data Mining" ebook data `_ * `Registered Meteorites on Earth - 34,513 meteorites updated to 2012. `_ * `Restaurants Health Score Data - Health status of restaurants in San Francisco. `_ From 44c58b64263dc8288f6d86fda491b70e6240d80a Mon Sep 17 00:00:00 2001 From: Xiaming Date: Fri, 30 Jan 2015 10:43:36 +0800 Subject: [PATCH 010/276] Add U.S. MassGIS data --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index ed560e6..1168426 100644 --- a/README.rst +++ b/README.rst @@ -158,7 +158,8 @@ Government * `Germany `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ -* `London Datastore, U.K `_ +* `London Datastore, UK `_ +* `MassGIS, Massachusetts, U.S. 
`_ * `Netherlands `_ * `New Zealand `_ * `NYC betanyc `_ From 43889987053539f58940af8f8ce3cc05ec19dc28 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 31 Jan 2015 17:18:37 +0800 Subject: [PATCH 011/276] Tidy data description --- README.rst | 221 ++++++++++++++++++++++++++--------------------------- 1 file changed, 108 insertions(+), 113 deletions(-) diff --git a/README.rst b/README.rst index 1168426..2a43faf 100644 --- a/README.rst +++ b/README.rst @@ -38,7 +38,7 @@ Climate/Weather * `Australian Weather `_ * `Canadian Meteorological Centre `_ -* `Climate Data from UEA (updated at roughly monthly intervals) `_ +* `Climate Data from UEA (updated monthly) `_ * `Global Climate Data Since 1929 `_ * `NOAA Bering Sea Climate `_ * `NOAA Climate Datasets `_ @@ -68,15 +68,15 @@ Complex Networks Computer Networks ----------------- -* `3.5B Web Pages - Web graph extracted from CommonCraw 2012 web corpus. `_ -* `53.5B Web clicks - Anonymized HTTP records from 100K users in Indiana Univ. `_ -* `CAIDA Internet Datasets - Network traces and topologies at geographically diverse locations. `_ -* `ClueWeb09 - About 1B web pages in ten languages that were collected in Jan. and Feb. 2009. `_ -* `ClueWeb12 - About 733M web pages collected between Feb. and May 2012. `_ -* `CommonCrawl Web Data - Petabytes of data collected over 7 years of web crawling. `_ -* `CRAWDAD Wireless datasets (Dartmouth) - A wireless network data resource for research communities. `_ -* `OpenMobileData (MobiPerf) - Mobile performance measurement data collected with active tests. `_ -* `UCSD Network Telescope - A passive traffic monitoring system covering IPv4 /8 net. `_ +* `3.5B Web Pages from CommonCraw 2012 `_ +* `53.5B Web clicks of 100K users in Indiana Univ. `_ +* `CAIDA Internet Datasets `_ +* `ClueWeb09 - 1B web pages `_ +* `ClueWeb12 - 733M web pages `_ +* `CommonCrawl Web Data over 7 years `_ +* `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ +* `Open Mobile Data by MobiPerf `_ +* `UCSD Network Telescope, IPv4 /8 net `_ Data Challenges @@ -95,7 +95,7 @@ Data Challenges Economics --------- -* `American Economic Ass. (AEA) `_ +* `American Economic Ass (AEA) `_ * `EconData from UMD `_ * `Internet Product Code Database `_ @@ -133,24 +133,24 @@ Finance GeoSpace/GIS ------------ -* `BODC - Marine data of nearly 22,000 oceanographic vars. `_ -* `EOSDIS - A data collection of NASA's earth observing system data and information system. `_ -* `Factual Global Location Data - 65M POIs with extended attributes in 50 countries. `_ -* `Global Administrative Areas Database (GADM) - For countries and low-level subdivisions. `_ -* `Geo Spatial Data from ASU - Several small spatial or GIS datasets. `_ -* `GeoNames - Over eight million placenames (countries, city stat etc.) of the world. `_ -* `Natural Earth - Vectors and rasters of the world in multiple scales. `_ -* `OpenStreetMap - A free map worldwide maintained by the communities. `_ -* `TIGER/Line - Official United States boundaries and roads. `_ -* `TwoFishes - Foursquare's coarse geocoder. `_ -* `TZ Timezones - A shapefile of the TZ timezones of the world. `_ +* `BODC - marine data of ~22K vars `_ +* `EOSDIS - NASA's earth observing system data `_ +* `Factual Global Location Data `_ +* `Global Administrative Areas Database (GADM) `_ +* `Geo Spatial Data from ASU `_ +* `GeoNames Worldwide `_ +* `Natural Earth - vectors and rasters of the world `_ +* `Open Street Map (OSM) `_ +* `TIGER/Line - U.S. boundaries and roads `_ +* `TwoFishes - Foursquare's coarse geocoder `_ +* `TZ Timezones shapfiles `_ Government ---------- -* `Australia `_ (abs.gov.au) -* `Australia `_ (data.gov.au) +* `Australia (abs.gov.au) `_ +* `Australia (data.gov.au) `_ * `Canada `_ * `Chicago `_ * `EuroStat `_ @@ -185,10 +185,10 @@ Government Healthcare ---------- -* `EHDP Large Health Data Sets - A collection of health datasets across domains and countries. 
`_ -* `Gapminder World - A collection of multi-domain, demographic databases for our world. `_ -* `Medicare Coverage Database (MCD) - Containing national and local Coverage Determinations. `_ -* `Medicare Data Engine - Download, explore, and visualize Medicare.gov Data. `_ +* `EHDP Large Health Data Sets `_ +* `Gapminder World, demographic databases `_ +* `Medicare Coverage Database (MCD), U.S. `_ +* `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ @@ -196,28 +196,29 @@ Healthcare Image Processing ---------------- -* `2GB of Photos of Cats - 10K cat images with basic annotations. `_ -* `Face Recognition Benchmark - A collection of face datasets for benchmarking algorithms. `_ -* `ImageNet - An image database organized according to the WordNet hierarchy. `_ +* `2GB of Photos of Cats `_ +* `Face Recognition Benchmark `_ +* `ImageNet - an image database in WordNet hierarchy `_ Machine Learning ---------------- -* `Delve Datasets (Univ. of Toronto) - Evaluating datasets for classification and regression. `_ -* `eBay Online Auctions (2012) - Seller-auction-bidder data with closing prices. `_ -* `IMDb Database - An online database of films, TB programs, and video games. `_ -* `Keel Repository - Multiple datasets for classification, regression, time series. `_ -* `Lending Club Loan Data - Loan status (Current, Late, Fully Paid, etc.) and latest payment info. `_ -* `Machine Learning Data Set Repository - A data search engine for machine learning tasks. `_ -* `Million Song Dataset - Audio features and metadata for a million popular music tracks. `_ -* `More Song Datasets - Complementary data of cover songs, lyrics, user listening data. `_ -* `MovieLens Data Sets - Online movie recommendation including movie tags, user ratings. `_ +* `Delve Datasets for classification and regression (Univ. 
of Toronto) `_ +* `Discogs Monthly Data `_ +* `eBay Online Auctions (2012) `_ +* `IMDb Database `_ +* `Keel Repository for classification, regression and time series `_ +* `Lending Club Loan Data `_ +* `Machine Learning Data Set Repository `_ +* `Million Song Dataset `_ +* `More Song Datasets `_ +* `MovieLens Data Sets `_ * `RDataMining - "R and Data Mining" ebook data `_ -* `Registered Meteorites on Earth - 34,513 meteorites updated to 2012. `_ -* `Restaurants Health Score Data - Health status of restaurants in San Francisco. `_ -* `UCI Machine Learning Repository - One of most famous ML data repositories. `_ -* `Yahoo Ratings and Classification Data - About music, movies, user clicks, images etc. `_ +* `Registered Meteorites on Earth `_ +* `Restaurants Health Score Data in San Francisco `_ +* `UCI Machine Learning Repository `_ +* `Yahoo! Ratings and Classification Data `_ Museums @@ -229,36 +230,30 @@ Museums * `The Getty vocabularies `_ -Music ------ - -* `Discogs Data - Monthly dumps of Discogs Release, Artist and Label data. `_ - - Natural Language ---------------- -* `ClueWeb09 FACC - Annotated English-language Web pages from the ClueWeb09 corpora. `_ -* `ClueWeb12 FACC - Annotated English-language Web pages from the ClueWeb12 corpora. `_ -* `DBpedia - Multi-domain ontology describing 4.58M “things” with 583M “facts”. `_ -* `Flickr Personal Taxonomies - Personalized tagging pictures with descriptive labels. `_ -* `Google Books Ngrams (2.2TB) - N-gram corpuses extracted from Google Books. `_ -* `Google Web 5gram (1TB, 2006) - 5-gram corpuses extracted from Web pages. `_ -* `Gutenberg eBooks List - Basic information about each eBook from Project Gutenberg. `_ -* `Hansards - 1.3M aligned text chunks from official records of Canadian Parliament. `_ -* `Machine Translation - The recurring translation task focusing on European languages. `_ -* `SMS Spam Collection - 5,574 real English messages, labled as being ham or spam. 
`_
-* `USENET corpus - A collection of public USENET postings between Oct 2005 and Jan 2011. `_
-* `Wikidata - Wikipedia databases available in JSON and XML formats. `_
-* `Wikipedia Links data - 40 Million Entities in Context. `_
-* `WordNet - Databases, associated packages and tools. `_
+* `ClueWeb09 FACC `_
+* `ClueWeb12 FACC `_
+* `DBpedia - 4.58M “things” with 583M “facts”`_
+* `Flickr Personal Taxonomies `_
+* `Google Books Ngrams (2.2TB) `_
+* `Google Web 5gram (1TB, 2006) `_
+* `Gutenberg eBooks List `_
+* `Hansards text chunks of Canadian Parliament `_
+* `Machine Translation of European languages `_
+* `SMS Spam Collection in English `_
+* `USENET postings corpus of 2005~2011 `_
+* `Wikidata - Wikipedia databases `_
+* `Wikipedia Links data - 40 Million Entities in Context `_
+* `WordNet databases and tools `_
 Physics
 -------
-* `CERN Open Data Portal - Experimental data of CMS experiment, ALICE, ATLAS and LHCb `_
-* `NSSDC (NASA) - More than 230 TB of data from about 550 space science spacecraft `_
+* `CERN Open Data Portal `_
+* `NSSDC (NASA) data of 550 space spacecraft `_
 Public Domains
@@ -289,77 +284,77 @@ Public Domains
 Search Engines
 --------------
-* `Academic Torrents (UMB) - Sharing enormous datasets, for researchers, by researchers.
`_
-* `Archive-it - Web archiving service built at the Internet Archive `_
-* `Datahub.io - The easy way to get, use and share data `_
+* `Academic Torrents of data sharing from UMB `_
+* `Archive-it from Internet Archive `_
+* `Datahub.io `_
 * `DataMarket (Qlik) `_
-* `Freebase.com - A community-curated database of well-known people, places, and things `_
-* `Harvard Dataverse Network - Scientific data for reproducible research `_
-* `ICPSR (UMICH) - Find and analyze data `_
-* `Statista.com - Statistics and Studies from more than 18,000 Sources `_
+* `Freebase.com of people, places, and things `_
+* `Harvard Dataverse Network of scientific data `_
+* `ICPSR (UMICH) `_
+* `Statista.com - statistics and Studies `_
 Social Sciences
 ---------------
-* `Ancestry.com Forum Dataset - Forum users and messages over ten years `_
-* `CMU Enron Email - 150 users, mostly senior management of Enron `_
-* `Facebook Data Scrape (2005) - 100 American colleges and univ. `_
+* `Ancestry.com Forum Dataset over 10 years `_
+* `CMU Enron Email of 150 users `_
+* `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
-* `Foursquare (2010, 2011) - Social networks, check-in locations and categories `_
-* `Foursquare from UMN/Sarwat (2013) - Users, venues, check-ins, ratings etc. `_
-* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, topics etc. `_
-* `GetGlue - Users rating TV shows `_
-* `GitHub Archive - Programmers collaboration, projects progress etc. `_
-* `Mobile Social Networks (UMASS) - Timestamped mote-to-mote (up to 27 subjects) connections `_
-* `PewResearch Internet Project - A wide range of surveys about library usage, online dating etc.
`_
-* `SourceForge.net Research Data - Historic and status statistics of projects and users' activities `_
-* `Stack Exchange Data Explorer - User-contributed content on the Stack Exchange network `_
-* `Titanic Survival Data Set - Demographic information of Titanic passengers `_
-* `Twitter Graph - Crawled entire Twitter site including tweets, user profiles, relations `_
-* `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas `_
-* `UCLA Social Sciences Data Archive - A collection of social science data on the Web `_
-* `UNIMI/LAW Social Network Datasets - Social networks like amazon, LiveJournal, dblp and more `_
-* `Universities Worldwide - Links to 9307 Universities in 205 countries `_
-* `UPJOHN for Employment Research - Labor surveys, unemployment spells and more `_
-* `Yahoo Graph and Social Data - Web page graph, user-group membership, IM friends etc. `_
-* `Youtube Video Graph (2007,2008) - Video relations, uploaders, views, ratings and more `_
+* `Foursquare Social Network in 2010, 2011 `_
+* `Foursquare from UMN/Sarwat (2013) `_
+* `General Social Survey (GSS) since 1972 `_
+* `GetGlue - users rating TV shows `_
+* `GitHub Collaboration Archive `_
+* `Mobile Social Networks from UMASS `_
+* `PewResearch Internet Survey Project `_
+* `SourceForge.net Research Data `_
+* `StackExchange Data Explorer `_
+* `Titanic Survival Data Set `_
+* `Twitter Graph of entire Twitter site `_
+* `UCB's Archive of Social Science Data (D-Lab) `_
+* `UCLA Social Sciences Data Archive `_
+* `UNIMI/LAW Social Network Datasets `_
+* `Universities Worldwide `_
+* `UPJOHN for Labor Employment Research `_
+* `Yahoo!
Graph and Social Data `_
+* `Youtube Video Social Graph in 2007,2008 `_
 Sports
 ------
-* `Betfair Event Results - Fully time-stamped historical Betfair exchange data `_
-* `Cricsheet (baseball) - Thousands of Cricket matches `_
-* `Ergast Formula 1, from 1950 up to date (API available) `_
+* `Betfair Historical Exchange Data `_
+* `Cricsheet Matches (baseball) `_
+* `Ergast Formula 1, from 1950 up to date (API) `_
 * `Football/Soccer resouces (data and APIs) `_
-* `Lahman's Baseball Database - Batting and pitching statistics, team stats etc. `_
-* `Retrosheet (baseball) - Play-by-Play files, game logs and schedules `_
+* `Lahman's Baseball Database `_
+* `Retrosheet Baseball Statistics `_
 Time Series
 -----------
-* `Time Series data Library (TSDL), created by Rob Hyndman, MU `_
-* `UC Riverside Time Series, for classification and clustering. `_
+* `Time Series Data Library (TSDL) from MU `_
+* `UC Riverside Time Series Dataset `_
 Transportation
 --------------
-* `Airlines OD Data 1987-2008, used by ASA Challenge 2009 `_
-* `Bike Share Data Systems - Trip histories, site maps etc.
`_
-* `Edge data for US domestic flights 1990 to 2009 `_
-* `Half a million Hubway rides in MA `_
-* `Marine Traffic - Ship tracks, port calls and more `_
-* `NYC Taxi Trip Data 2013 - FOIA/FOILed by Chris Whong `_
-* `OpenFlights - Airport, airline and route data `_
-* `RITA Airline On-Time Performance data of major air carriers in US `_
+* `Airlines OD Data 1987-2008 `_
+* `Bike Share Systems (BSS) collection `_
+* `Hubway Million Rides in MA `_
+* `Marine Traffic - ship tracks, port calls and more `_
+* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
+* `OpenFlights - airport, airline and route data `_
+* `RITA Airline On-Time Performance data `_
 * `RITA/BTS transport data collection (TranStat) `_
-* `Transport for London (TFL) - Trip histories and networking statistics `_
-* `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_
+* `Transport for London (TFL) `_
+* `Travel Tracker Survey (TTS) for Chicago `_
 * `U.S. Bureau of Transportation Statistics (BTS) `_
-* `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_
+* `U.S. Domestic Flights 1990 to 2009 `_
+* `U.S. Freight Analysis Framework since 2007 `_
 Complementary Collections
@@ -369,4 +364,4 @@ Complementary Collections
 * Inside-r: `Finding Data on the Internet `_
 * Quora: `Where can I find large datasets open to the public?
`_
 * RS.io: `100+ Interesting Data Sets for Statistics `_
-* StaTrek: `Leveraging open data to understand urban lives `_
+* StaTrek: `Leveraging open data to understand urban lives `_
\ No newline at end of file
From bcba2f0ccc36acf41c09084857d1b23c1b048088 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Sat, 31 Jan 2015 17:21:26 +0800
Subject: [PATCH 012/276] Remove format error
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 2a43faf..45e0fbd 100644
--- a/README.rst
+++ b/README.rst
@@ -235,7 +235,7 @@ Natural Language
 * `ClueWeb09 FACC `_
 * `ClueWeb12 FACC `_
-* `DBpedia - 4.58M “things” with 583M “facts”`_
+* `DBpedia - 4.58M things with 583M facts `_
 * `Flickr Personal Taxonomies `_
 * `Google Books Ngrams (2.2TB) `_
 * `Google Web 5gram (1TB, 2006) `_
From 93111c8fc5015d4423a38443acc7f72edf016569 Mon Sep 17 00:00:00 2001
From: "Dana \"Dani\""
Date: Tue, 3 Feb 2015 13:55:36 -0600
Subject: [PATCH 013/276] Update README.rst
Added Dallas, Denver, Seattle city open data
---
 README.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 45e0fbd..0367bda 100644
--- a/README.rst
+++ b/README.rst
@@ -153,6 +153,8 @@ Government
 * `Australia (data.gov.au) `_
 * `Canada `_
 * `Chicago `_
+* `Dallas Open Data `_
+* `Denver Open Data `_
 * `EuroStat `_
 * `FedStats `_
 * `Germany `_
@@ -167,6 +169,7 @@ Government
 * `OECD `_
 * `Open Government Data (OGD) Platform India `_
 * `San Francisco Data sets `_
+* `Seattle `_
 * `South Africa `_
 * `The World Bank `_
 * `U.K. Government Data `_
@@ -364,4 +367,4 @@ Complementary Collections
 * Inside-r: `Finding Data on the Internet `_
 * Quora: `Where can I find large datasets open to the public?
`_
 * RS.io: `100+ Interesting Data Sets for Statistics `_
-* StaTrek: `Leveraging open data to understand urban lives `_
\ No newline at end of file
+* StaTrek: `Leveraging open data to understand urban lives `_
From 6608a4db4228e300e7110eeecfff443b657eb164 Mon Sep 17 00:00:00 2001
From: gwulfs
Date: Wed, 4 Feb 2015 14:36:47 -0500
Subject: [PATCH 014/276] add dataset
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 0367bda..3d236b1 100644
--- a/README.rst
+++ b/README.rst
@@ -340,6 +340,7 @@ Time Series
 * `Time Series Data Library (TSDL) from MU `_
 * `UC Riverside Time Series Dataset `_
+* `Hard Drive Failure Rates `_
 Transportation
From 854741f3668bdbe502d4050b9ece47a027dcf172 Mon Sep 17 00:00:00 2001
From: "Jamin X. Chen"
Date: Thu, 5 Feb 2015 10:27:24 +0800
Subject: [PATCH 015/276] Add serveral datasets contributed by community
---
 README.rst | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/README.rst b/README.rst
index 3d236b1..e3bfdb8 100644
--- a/README.rst
+++ b/README.rst
@@ -134,11 +134,12 @@ GeoSpace/GIS
 ------------
 * `BODC - marine data of ~22K vars `_
+* `Cambridge, MA, US, GIS data on GitHub `_
 * `EOSDIS - NASA's earth observing system data `_
 * `Factual Global Location Data `_
-* `Global Administrative Areas Database (GADM) `_
 * `Geo Spatial Data from ASU `_
 * `GeoNames Worldwide `_
+* `Global Administrative Areas Database (GADM) `_
 * `Natural Earth - vectors and rasters of the world `_
 * `Open Street Map (OSM) `_
 * `TIGER/Line - U.S.
boundaries and roads `_
@@ -151,12 +152,15 @@ Government
 * `Australia (abs.gov.au) `_
 * `Australia (data.gov.au) `_
+* `Brazil `_
+* `Cambridge, MA, US `_
 * `Canada `_
 * `Chicago `_
 * `Dallas Open Data `_
 * `Denver Open Data `_
 * `EuroStat `_
 * `FedStats `_
+* `France `_
 * `Germany `_
 * `Glasgow, Scotland, UK `_
 * `Guardian world governments `_
@@ -199,9 +203,14 @@ Healthcare
 Image Processing
 ----------------
+* `10k US Adult Faces Database `_
 * `2GB of Photos of Cats `_
+* `Affective Image Classification `_
 * `Face Recognition Benchmark `_
-* `ImageNet - an image database in WordNet hierarchy `_
+* `ImageNet (in WordNet hierarchy) `_
+* `International Affective Picture System, UFL `_
+* `Massive Visual Memory Stimuli, MIT `_
+* `SUN database, MIT `_
 Machine Learning
@@ -294,6 +303,7 @@ Search Engines
 * `Freebase.com of people, places, and things `_
 * `Harvard Dataverse Network of scientific data `_
 * `ICPSR (UMICH) `_
+* `Open Data Certificates (beta) `_
 * `Statista.com - statistics and Studies `_
@@ -348,6 +358,7 @@ Transportation
 * `Airlines OD Data 1987-2008 `_
 * `Bike Share Systems (BSS) collection `_
+* `Bay Area Bike Share Data `_
 * `Hubway Million Rides in MA `_
 * `Marine Traffic - ship tracks, port calls and more `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
From e546c35955fe54a7c722c64fa86538a12a6b881d Mon Sep 17 00:00:00 2001
From: saurzcode
Date: Mon, 23 Feb 2015 17:56:24 +0530
Subject: [PATCH 016/276] Adding Indian Goverment Data Set
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e3bfdb8..ec33ccc 100644
--- a/README.rst
+++ b/README.rst
@@ -164,6 +164,7 @@ Government
 * `Germany `_
 * `Glasgow, Scotland, UK `_
 * `Guardian world governments `_
+* `Indian Government `_
 * `London Datastore, UK `_
 * `MassGIS, Massachusetts, U.S.
`_
 * `Netherlands `_
From 0c70538c270b1ccffb4538106c8ac16ec99d7bd1 Mon Sep 17 00:00:00 2001
From: saurzcode
Date: Mon, 23 Feb 2015 17:58:38 +0530
Subject: [PATCH 017/276] Adding Indian Government Data.
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index ec33ccc..4150713 100644
--- a/README.rst
+++ b/README.rst
@@ -164,7 +164,7 @@ Government
 * `Germany `_
 * `Glasgow, Scotland, UK `_
 * `Guardian world governments `_
-* `Indian Government `_
+* `Indian Government Data `_
 * `London Datastore, UK `_
 * `MassGIS, Massachusetts, U.S. `_
 * `Netherlands `_
From 6de2446e68db11f2d73ddd470ad8f844a9e7a463 Mon Sep 17 00:00:00 2001
From: gwulfs
Date: Wed, 25 Feb 2015 22:56:02 -0500
Subject: [PATCH 018/276] blogger corpus
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 4150713..945ece0 100644
--- a/README.rst
+++ b/README.rst
@@ -246,6 +246,7 @@ Museums
 Natural Language
 ----------------
+* `Blogger Corpus `_
 * `ClueWeb09 FACC `_
 * `ClueWeb12 FACC `_
 * `DBpedia - 4.58M things with 583M facts `_
From 575032e57663261be6959ddcc70e752579f8843a Mon Sep 17 00:00:00 2001
From: bore3601
Date: Fri, 6 Mar 2015 17:26:38 +0800
Subject: [PATCH 019/276] Update README.rst
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/README.rst b/README.rst
index 945ece0..7250eee 100644
--- a/README.rst
+++ b/README.rst
@@ -334,6 +334,8 @@ Social Sciences
 * `UPJOHN for Labor Employment Research `_
 * `Yahoo!
Graph and Social Data `_
 * `Youtube Video Social Graph in 2007,2008 `_
+* `Google Scholar citation relations `_
+* `Political Polarity Data `_
 Sports
From 24f4bdd48b5d979bd5cf53ab152c8394f51bfc20 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 18 Mar 2015 15:03:50 +0800
Subject: [PATCH 020/276] Fix the link of StaTrek
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 7250eee..2769b9e 100644
--- a/README.rst
+++ b/README.rst
@@ -383,4 +383,4 @@ Complementary Collections
 * Inside-r: `Finding Data on the Internet `_
 * Quora: `Where can I find large datasets open to the public? `_
 * RS.io: `100+ Interesting Data Sets for Statistics `_
-* StaTrek: `Leveraging open data to understand urban lives `_
+* StaTrek: `Leveraging open data to understand urban lives `_
From 68fbe5674e6046411fa9ebe15a129b7262b3f5ae Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Fri, 20 Mar 2015 10:41:24 +0800
Subject: [PATCH 021/276] Fix #21
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 2769b9e..b2f8392 100644
--- a/README.rst
+++ b/README.rst
@@ -372,7 +372,7 @@ Transportation
 * `Transport for London (TFL) `_
 * `Travel Tracker Survey (TTS) for Chicago `_
 * `U.S. Bureau of Transportation Statistics (BTS) `_
-* `U.S. Domestic Flights 1990 to 2009 `_
+* `U.S. Domestic Flights 1990 to 2009 `_
 * `U.S. Freight Analysis Framework since 2007 `_
From 4493cd0c447d0eda3631576165c8cd1ad78cbd48 Mon Sep 17 00:00:00 2001
From: Shaun Walbridge
Date: Sun, 22 Mar 2015 22:30:09 -0400
Subject: [PATCH 022/276] Add Landsat 8 on Amazon AWS
Landsat 8 access over AWS; includes instructions for access from 5 different providers.
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index b2f8392..28f65bb 100644
--- a/README.rst
+++ b/README.rst
@@ -140,6 +140,7 @@ GeoSpace/GIS
 * `Geo Spatial Data from ASU `_
 * `GeoNames Worldwide `_
 * `Global Administrative Areas Database (GADM) `_
+* `Landsat 8 on AWS `_
 * `Natural Earth - vectors and rasters of the world `_
 * `Open Street Map (OSM) `_
 * `TIGER/Line - U.S. boundaries and roads `_
From f52376d2707a2f76ea062e5659dae9ac34ca2ae3 Mon Sep 17 00:00:00 2001
From: Nilemar de Barcelos
Date: Mon, 30 Mar 2015 13:06:33 -0300
Subject: [PATCH 023/276] Added dataset for "Number of Ebola Cases and Deaths in Affected Countries"
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 28f65bb..a892d91 100644
--- a/README.rst
+++ b/README.rst
@@ -199,7 +199,7 @@ Healthcare
 * `Medicare Coverage Database (MCD), U.S. `_
 * `Medicare Data Engine of medicare.gov Data `_
 * `Medicare Data File `_
-
+* `Number of Ebola Cases and Deaths in Affected Countries (2014) `_
 Image Processing
From 1423edaa0b87299d573ec97d43554b4e4c338f91 Mon Sep 17 00:00:00 2001
From: gwulfs
Date: Tue, 31 Mar 2015 17:48:15 -0400
Subject: [PATCH 024/276] criteo
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 4150713..04b35f3 100644
--- a/README.rst
+++ b/README.rst
@@ -75,6 +75,7 @@ Computer Networks
 * `ClueWeb12 - 733M web pages `_
 * `CommonCrawl Web Data over 7 years `_
 * `CRAWDAD Wireless datasets from Dartmouth Univ.
`_
+* `Criteo click-through data `_
 * `Open Mobile Data by MobiPerf `_
 * `UCSD Network Telescope, IPv4 /8 net `_
From 5c1a131a22966e10fdf4709bf39e42061cdfaaef Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 1 Apr 2015 12:56:22 +0800
Subject: [PATCH 025/276] Add GeoLife data and MIT reality mining
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/README.rst b/README.rst
index d81bbb5..c6f59f5 100644
--- a/README.rst
+++ b/README.rst
@@ -323,6 +323,7 @@ Social Sciences
 * `General Social Survey (GSS) since 1972 `_
 * `GetGlue - users rating TV shows `_
 * `GitHub Collaboration Archive `_
+* `MIT Reality Mining Dataset `_
 * `Mobile Social Networks from UMASS `_
 * `PewResearch Internet Survey Project `_
 * `SourceForge.net Research Data `_
@@ -365,6 +366,7 @@ Transportation
 * `Airlines OD Data 1987-2008 `_
 * `Bike Share Systems (BSS) collection `_
 * `Bay Area Bike Share Data `_
+* `GeoLife GPS Trajectory from Microsoft Research `_
 * `Hubway Million Rides in MA `_
 * `Marine Traffic - ship tracks, port calls and more `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
From 4ad9bb7e0d2805fe5165bf59e5442b06934343f7 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 1 Apr 2015 13:09:03 +0800
Subject: [PATCH 026/276] Add D4D challenge
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index c6f59f5..256eb58 100644
--- a/README.rst
+++ b/README.rst
@@ -84,6 +84,7 @@ Data Challenges
 ---------------
 * `Challenges in Machine Learning `_
+* `D4D Challenge of Orange `_
 * `DrivenData Competitions for Social Good `_
 * `ICWSM Data Challenge (since 2009) `_
 * `Kaggle Competition Data `_
From 4f4478380d94ea345179e0e0ccc8d6c119c5fef4 Mon Sep 17 00:00:00 2001
From: Bill Chambers
Date: Wed, 1 Apr 2015 14:36:57 -0700
Subject: [PATCH 027/276] Added Network Data
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/README.rst b/README.rst
index 256eb58..0bd5455 100644
--- a/README.rst
+++ b/README.rst
@@ -53,6 +53,8 @@ Complex
Networks
 * `DBLP Citation dataset `_
 * `NBER Patent Citations `_
 * `NIST complex networks data collection `_
+* `Small Network Data `_
+* `UCI Network Data Repository `_
 * `Protein-protein interaction network `_
 * `PyPI and Maven Dependency Network `_
 * `Scopus Citation Database `_
From 642ca446e1d9c9a9ca7689ab564777969028908a Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Thu, 2 Apr 2015 20:04:02 +0800
Subject: [PATCH 028/276] Add world countries collection
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 0bd5455..888f668 100644
--- a/README.rst
+++ b/README.rst
@@ -150,6 +150,7 @@ GeoSpace/GIS
 * `TIGER/Line - U.S. boundaries and roads `_
 * `TwoFishes - Foursquare's coarse geocoder `_
 * `TZ Timezones shapfiles `_
+* `World countries in multiple formats `_
 Government
From 9e4908ca1322e9ab42abdfacc93a94be6967555e Mon Sep 17 00:00:00 2001
From: Mike McGann
Date: Mon, 6 Apr 2015 09:45:29 -0400
Subject: [PATCH 029/276] Added the NASA Global Imagery Browse Serivces and the International Space Apps Challenge
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/README.rst b/README.rst
index 888f668..e8c8987 100644
--- a/README.rst
+++ b/README.rst
@@ -40,6 +40,7 @@ Climate/Weather
 * `Canadian Meteorological Centre `_
 * `Climate Data from UEA (updated monthly) `_
 * `Global Climate Data Since 1929 `_
+* `NASA Global Imagery Browse Services `_
 * `NOAA Bering Sea Climate `_
 * `NOAA Climate Datasets `_
 * `NOAA Realtime Weather Models `_
@@ -92,6 +93,7 @@ Data Challenges
 * `Kaggle Competition Data `_
 * `KDD Cup by Tencent 2012 `_
 * `Localytics Data Visualization Challenge `_
+* `Space Apps Challenge `_
 * `Netflix Prize `_
 * `Yelp Dataset Challenge `_
From 7b437321bf184558a3d960a73e5bb4b31bd7affc Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Tue, 14 Apr 2015 15:51:04 +0800
Subject: [PATCH 030/276] Add THE NETWORK REPOSITORY
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index
e8c8987..540da90 100644
--- a/README.rst
+++ b/README.rst
@@ -63,6 +63,7 @@ Complex Networks
 * `Stanford Large Network Dataset Collection `_
 * `The Koblenz Network Collection `_
 * `The Laboratory for Web Algorithmics (UNIMI) `_
+* `The Nexus Network Repository `_
 * `UCI Network Data Repository `_
 * `UFL sparse matrix collection `_
 * `WSU Graph Database `_
From 7563089a90750a92fd8a8e9334c76e3a87f8c63c Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 15 Apr 2015 21:19:21 +0800
Subject: [PATCH 031/276] Add Telecom Italia Big Data Challenge
---
 README.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 540da90..78ce71a 100644
--- a/README.rst
+++ b/README.rst
@@ -94,8 +94,9 @@ Data Challenges
 * `Kaggle Competition Data `_
 * `KDD Cup by Tencent 2012 `_
 * `Localytics Data Visualization Challenge `_
-* `Space Apps Challenge `_
 * `Netflix Prize `_
+* `Space Apps Challenge `_
+* `Telecom Italia Big Data Challenge `_
 * `Yelp Dataset Challenge `_
From 3a0f991e7f74777c97574710d09c8b5af01e423e Mon Sep 17 00:00:00 2001
From: Ian Dees
Date: Wed, 15 Apr 2015 23:53:43 -0400
Subject: [PATCH 032/276] Add OpenAddresses to the README
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 78ce71a..6bbbcd6 100644
--- a/README.rst
+++ b/README.rst
@@ -155,6 +155,7 @@ GeoSpace/GIS
 * `TwoFishes - Foursquare's coarse geocoder `_
 * `TZ Timezones shapfiles `_
 * `World countries in multiple formats `_
+* `OpenAddresses `_
 Government
From a71e555a65cb2f196d3d68c25599bb7f4b1c50df Mon Sep 17 00:00:00 2001
From: ulrich
Date: Sat, 18 Apr 2015 12:54:03 +0100
Subject: [PATCH 033/276] Add OpenDataMonitor
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 6bbbcd6..ae33283 100644
--- a/README.rst
+++ b/README.rst
@@ -397,3 +397,4 @@ Complementary Collections
 * Quora: `Where can I find large datasets open to the public?
`_
 * RS.io: `100+ Interesting Data Sets for Statistics `_
 * StaTrek: `Leveraging open data to understand urban lives `_
+* OpenDataMonitor: `An overview of available open data resources in Europe `_
From 2f4060167ad2be257f5a105dd869b822fe507e4b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Hannu=20Kro=CC=88ger?=
Date: Sat, 18 Apr 2015 22:03:34 -0400
Subject: [PATCH 034/276] Add Finnish open data
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 6bbbcd6..9df9318 100644
--- a/README.rst
+++ b/README.rst
@@ -171,6 +171,7 @@ Government
 * `Denver Open Data `_
 * `EuroStat `_
 * `FedStats `_
+* `Finland `_
 * `France `_
 * `Germany `_
 * `Glasgow, Scotland, UK `_
From 1278bf572513352e969ec428c39959163619ac63 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Mon, 20 Apr 2015 16:08:29 +0800
Subject: [PATCH 035/276] Add NCES and LG Inform
---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/README.rst b/README.rst
index 92e5787..61aee73 100644
--- a/README.rst
+++ b/README.rst
@@ -169,6 +169,7 @@ Government
 * `Chicago `_
 * `Dallas Open Data `_
 * `Denver Open Data `_
+* `England LGInform `_
 * `EuroStat `_
 * `FedStats `_
 * `Finland `_
@@ -193,6 +194,7 @@ Government
 * `U.S. American Community Survey `_
 * `U.S. CDC Public Health datasets `_
 * `U.S. Census Bureau `_
+* `U.S. National Center for Education Statistics (NCES) `_
 * `U.S. Department of Housing and Urban Development (HUD) `_
 * `U.S. Federal Government Agencies `_
 * `U.S. Federal Government Data Catalog `_
From 6e938419c69b182fc80d70e5053b169764640ea1 Mon Sep 17 00:00:00 2001
From: George Ehrhorn
Date: Mon, 20 Apr 2015 09:44:27 -0400
Subject: [PATCH 036/276] Update README.rst
The Cricsheet Matches site indicates it's data about cricket, not baseball.
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 61aee73..e8a122b 100644
--- a/README.rst
+++ b/README.rst
@@ -357,7 +357,7 @@ Sports
 ------
 * `Betfair Historical Exchange Data `_
-* `Cricsheet Matches (baseball) `_
+* `Cricsheet Matches (cricket) `_
 * `Ergast Formula 1, from 1950 up to date (API) `_
 * `Football/Soccer resouces (data and APIs) `_
 * `Lahman's Baseball Database `_
From b9aec20adc873edb74c2c3749ff978e2701b71a2 Mon Sep 17 00:00:00 2001
From: jpagand
Date: Mon, 20 Apr 2015 18:44:31 +0200
Subject: [PATCH 037/276] Added Switzerland Government
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e8a122b..7c56a17 100644
--- a/README.rst
+++ b/README.rst
@@ -189,6 +189,7 @@ Government
 * `San Francisco Data sets `_
 * `Seattle `_
 * `South Africa `_
+* `Switzerland `_
 * `The World Bank `_
 * `U.K. Government Data `_
 * `U.S. American Community Survey `_
From 2b44daf2ef24a96daf2be3ab10668358b2b0f7c2 Mon Sep 17 00:00:00 2001
From: Austin Richardson
Date: Mon, 20 Apr 2015 14:53:04 -0400
Subject: [PATCH 038/276] Added American Gut
Added more recent data similar to the HMP.
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e8a122b..8c1d43c 100644
--- a/README.rst
+++ b/README.rst
@@ -21,6 +21,7 @@ Biology
 * `Collaborative Research in Computational Neuroscience (CRCNS) `_
 * `Gene Expression Omnibus (GEO) `_
 * `Human Microbiome Project (HMP) `_
+* `American Gut (Microbiome Project) `_
 * `ICOS PSP Benchmark `_
 * `MIT Cancer Genomics Data `_
 * `NIH Microarray data (FTP) `_
From e64d2ae5dc681ddba0f7c1c081d4aaf04c108469 Mon Sep 17 00:00:00 2001
From: Chip Rosenthal
Date: Mon, 20 Apr 2015 13:55:54 -0500
Subject: [PATCH 039/276] Added "Austin, TX, US" to Government.
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e8a122b..ee0f683 100644
--- a/README.rst
+++ b/README.rst
@@ -163,6 +163,7 @@ Government
 * `Australia (abs.gov.au) `_
 * `Australia (data.gov.au) `_
+* `Austin, TX, US `_
 * `Brazil `_
 * `Cambridge, MA, US `_
 * `Canada `_
From 84e775bf4f9df4db31ffe4a023178cb009c19889 Mon Sep 17 00:00:00 2001
From: Chip Rosenthal
Date: Mon, 20 Apr 2015 13:57:00 -0500
Subject: [PATCH 040/276] fixed ordering
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index ee0f683..4976b85 100644
--- a/README.rst
+++ b/README.rst
@@ -161,9 +161,9 @@ GeoSpace/GIS
 Government
 ----------
+* `Austin, TX, US `_
 * `Australia (abs.gov.au) `_
 * `Australia (data.gov.au) `_
-* `Austin, TX, US `_
 * `Brazil `_
 * `Cambridge, MA, US `_
 * `Canada `_
From 71a4691dce530c0bf1ad8386846fb80e28e71ec2 Mon Sep 17 00:00:00 2001
From: Alexandru Guzinschi
Date: Tue, 21 Apr 2015 09:12:07 +0300
Subject: [PATCH 041/276] Update Readme
Added Romania Government
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e8a122b..50d6707 100644
--- a/README.rst
+++ b/README.rst
@@ -186,6 +186,7 @@ Government
 * `NYC Open Data `_
 * `OECD `_
 * `Open Government Data (OGD) Platform India `_
+* `Romania `_
 * `San Francisco Data sets `_
 * `Seattle `_
 * `South Africa `_
From 45c6fb1cf2af2e53326eefb30f701aa1b84788f5 Mon Sep 17 00:00:00 2001
From: Greg Bakos
Date: Tue, 21 Apr 2015 07:52:31 +0100
Subject: [PATCH 042/276] added The World Bank Open Data Resources for Climate Change
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index e8a122b..7e62682 100644
--- a/README.rst
+++ b/README.rst
@@ -44,6 +44,7 @@ Climate/Weather
 * `NOAA Bering Sea Climate `_
 * `NOAA Climate Datasets `_
 * `NOAA Realtime Weather Models `_
+* `The World Bank Open Data Resources for Climate Change `
 * `WU Historical Weather Worldwide `_
From
b2502513ef25f4ad5530c32c2818844d4672bf32 Mon Sep 17 00:00:00 2001
From: Greg Bakos
Date: Tue, 21 Apr 2015 07:54:28 +0100
Subject: [PATCH 043/276] added UEA Climatic Research Unit
---
 README.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index 7e62682..bce1392 100644
--- a/README.rst
+++ b/README.rst
@@ -44,7 +44,8 @@ Climate/Weather
 * `NOAA Bering Sea Climate `_
 * `NOAA Climate Datasets `_
 * `NOAA Realtime Weather Models `_
-* `The World Bank Open Data Resources for Climate Change `
+* `The World Bank Open Data Resources for Climate Change `_
+* `UEA Climatic Research Unit `_
 * `WU Historical Weather Worldwide `_
From 67ac43f039303fab784aba3e0d0a4b08bcc9026c Mon Sep 17 00:00:00 2001
From: Javier Rey
Date: Tue, 21 Apr 2015 09:53:36 -0300
Subject: [PATCH 044/276] Adds Uruguay government open datasets
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 2bf903e..538a9c2 100644
--- a/README.rst
+++ b/README.rst
@@ -197,6 +197,7 @@ Government
 * `Switzerland `_
 * `The World Bank `_
 * `U.K. Government Data `_
+* `Uruguay `_
 * `U.S. American Community Survey `_
 * `U.S. CDC Public Health datasets `_
 * `U.S. Census Bureau `_
From 93b396bbbba812507e56fa68f8f1124202789d40 Mon Sep 17 00:00:00 2001
From: Luqmaan Dawoodjee
Date: Tue, 21 Apr 2015 10:33:53 -0500
Subject: [PATCH 045/276] Add State of Texas Open Data Portal
https://data.texas.gov/
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 2bf903e..190bfd9 100644
--- a/README.rst
+++ b/README.rst
@@ -196,6 +196,7 @@ Government
 * `South Africa `_
 * `Switzerland `_
 * `The World Bank `_
+* `Texas Open Data `_
 * `U.K. Government Data `_
 * `U.S. American Community Survey `_
 * `U.S.
CDC Public Health datasets `_
From e9bbc75fc389ba1578666fec43691f84204ff671 Mon Sep 17 00:00:00 2001
From: Zachary Drummond
Date: Tue, 21 Apr 2015 11:03:36 -0700
Subject: [PATCH 046/276] Added EDRM version of the Enron Email
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 2bf903e..fafddd7 100644
--- a/README.rst
+++ b/README.rst
@@ -334,6 +334,7 @@ Social Sciences
 * `Ancestry.com Forum Dataset over 10 years `_
 * `CMU Enron Email of 150 users `_
+* `EDRM Enron EMail of 151 users, hosted on S3 `_
 * `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
 * `Foursquare Social Network in 2010, 2011 `_
From 8e855e7b153f80cde96b858147896b026319eb41 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Julio=20Acu=C3=B1a?=
Date: Tue, 21 Apr 2015 14:08:05 -0500
Subject: [PATCH 047/276] Update README.rst
Adds Mexico government open datasets
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 2bf903e..791d600 100644
--- a/README.rst
+++ b/README.rst
@@ -184,6 +184,7 @@ Government
 * `Indian Government Data `_
 * `London Datastore, UK `_
 * `MassGIS, Massachusetts, U.S. `_
+* `Mexico `_
 * `Netherlands `_
 * `New Zealand `_
 * `NYC betanyc `_
From d6e44a2deedda3a47c8c657f434096bee1d91d1c Mon Sep 17 00:00:00 2001
From: Mark Farrell
Date: Wed, 22 Apr 2015 10:04:47 +0000
Subject: [PATCH 048/276] Add Rijksmuseum.
Signed-off-by: Mark Farrell
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 2bf903e..3f58304 100644
--- a/README.rst
+++ b/README.rst
@@ -261,6 +261,7 @@ Museums
 * `Minneapolis Institute of Arts metadata `_
 * `Tate Collection metadata `_
 * `The Getty vocabularies `_
+* `Rijksmuseum Historical Art Collection `_
 Natural Language
From 41f6c37858bbbc306fdd4f38f6c4d93e68abe6c2 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 22 Apr 2015 23:58:36 +0800
Subject: [PATCH 049/276] Atlas Project moved #73
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.rst b/README.rst
index df1a105..90503bf 100644
--- a/README.rst
+++ b/README.rst
@@ -209,7 +209,7 @@ Government
 * `U.S. Federal Government Data Catalog `_
 * `U.S. Food and Drug Administration (FDA) `_
 * `U.S. Open Government `_
-* `UK 2011 Census Open Atlas Project `_
+* `UK 2011 Census Open Atlas Project `_
 * `United Nations `_
From 307d268132ef5dcf56551fa81f5c190dd674cdac Mon Sep 17 00:00:00 2001
From: devmoreno
Date: Wed, 22 Apr 2015 12:21:51 -0400
Subject: [PATCH 050/276] Added Puerto Rico Open data to the List
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)
diff --git a/README.rst b/README.rst
index 90503bf..9cc8282 100644
--- a/README.rst
+++ b/README.rst
@@ -198,6 +198,7 @@ Government
 * `Switzerland `_
 * `The World Bank `_
 * `Texas Open Data `_
+* `Puerto Rico Government `_
 * `U.K. Government Data `_
 * `Uruguay `_
 * `U.S.
American Community Survey `_ From 80d3676805a5539baac12f1693845898c9fd1648 Mon Sep 17 00:00:00 2001 From: Gideon Wulfsohn Date: Thu, 23 Apr 2015 01:26:33 -0400 Subject: [PATCH 051/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 9cc8282..fdab49a 100644 --- a/README.rst +++ b/README.rst @@ -236,6 +236,7 @@ Image Processing * `International Affective Picture System, UFL `_ * `Massive Visual Memory Stimuli, MIT `_ * `SUN database, MIT `_ +* `YouTube Faces Database `_ Machine Learning From c0c42aa2c296e997c624371fdf5fab03a170cfa2 Mon Sep 17 00:00:00 2001 From: Jonah Duckles Date: Thu, 23 Apr 2015 12:23:44 -0500 Subject: [PATCH 052/276] Adding data.ok.gov website to Government section --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index fdab49a..0aeb651 100644 --- a/README.rst +++ b/README.rst @@ -190,6 +190,7 @@ Government * `NYC betanyc `_ * `NYC Open Data `_ * `OECD `_ +* `Oklahoma `_ * `Open Government Data (OGD) Platform India `_ * `Romania `_ * `San Francisco Data sets `_ From c882b2867a6f3211cfd0a5a1db58491fe10fc494 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Sa=C5=A1a=20Stamenkovi=C4=87?= Date: Fri, 24 Apr 2015 16:33:19 +0200 Subject: [PATCH 053/276] Add Country List --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 0aeb651..623dd2c 100644 --- a/README.rst +++ b/README.rst @@ -158,6 +158,7 @@ GeoSpace/GIS * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ * `World countries in multiple formats `_ +* `List of all countries with names and ISO 3166-1 codes in all languages and all data formats `_ * `OpenAddresses `_ From e9eb17fabb291a71a26927e10aad465e71840015 Mon Sep 17 00:00:00 2001 From: Lynn Langit Date: Sun, 26 Apr 2015 12:15:39 -0700 Subject: [PATCH 054/276] Added link to Azure Free Datasets from Microsoft Azure Data Market --- README.rst | 1 + 1 file changed, 1 insertion(+) diff 
--git a/README.rst b/README.rst index 0aeb651..f75dcd9 100644 --- a/README.rst +++ b/README.rst @@ -309,6 +309,7 @@ Public Domains * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ +* `Microsoft Azure Data Market Free DataSets `_ * `Numbray `_ * `Reddit Datasets `_ * `RevolutionAnalytics Collection `_ From bef8d7862a57301ffd529998d4744355475acf70 Mon Sep 17 00:00:00 2001 From: James Davenport Date: Mon, 27 Apr 2015 10:21:18 -0700 Subject: [PATCH 055/276] added geology datasets --- README.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/README.rst b/README.rst index 0aeb651..311f99e 100644 --- a/README.rst +++ b/README.rst @@ -140,6 +140,11 @@ Finance * `St Louis Federal `_ * `Yahoo Finance `_ +Geology +------- +* `USGS Earthquake Archives `_ +* `Smithsonian Institution Global Volcano and Eruption Database `_ + GeoSpace/GIS ------------ From 274de9d8450365c967f2f4f7015e05c71360939a Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 28 Apr 2015 09:54:10 +0800 Subject: [PATCH 056/276] Fix Titanic survival data #80 --- README.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 9173c7d..3aa8cfb 100644 --- a/README.rst +++ b/README.rst @@ -6,7 +6,7 @@ are collected and tidyed from blogs, answers, and user reponses. Most of the data sets listed below are free, however, some are not. Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and -`another awesome `_ list. +`sindresorhus's awesome `_ list. 
 Agriculture
@@ -163,7 +163,7 @@ GeoSpace/GIS
 * `TwoFishes - Foursquare's coarse geocoder `_
 * `TZ Timezones shapfiles `_
 * `World countries in multiple formats `_
-* `List of all countries with names and ISO 3166-1 codes in all languages and all data formats `_
+* `List of all countries in all languages `_
 * `OpenAddresses `_
@@ -361,7 +361,7 @@ Social Sciences
 * `PewResearch Internet Survey Project `_
 * `SourceForge.net Research Data `_
 * `StackExchange Data Explorer `_
-* `Titanic Survival Data Set `_
+* `Titanic Survival Data Set `_
 * `Twitter Graph of entire Twitter site `_
 * `UCB's Archive of Social Science Data (D-Lab) `_
 * `UCLA Social Sciences Data Archive `_

From c04574a73badf7d2f0e6a7b1c0897438986a56f3 Mon Sep 17 00:00:00 2001
From: OctoMiao
Date: Wed, 29 Apr 2015 14:09:15 -0600
Subject: [PATCH 057/276] new dataset: NASA exoplanet archive

There are many exoplanets sets for exoplanets but this one is always updated.
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 3aa8cfb..8610014 100644
--- a/README.rst
+++ b/README.rst
@@ -301,6 +301,7 @@ Physics
 * `CERN Open Data Portal `_
 * `NSSDC (NASA) data of 550 space spacecraft `_
+* `NASA Exoplanet Archive `_
 Public Domains

From a268928e15c9319a7d77b182363bc08a07aae684 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Thu, 30 Apr 2015 15:38:51 +0800
Subject: [PATCH 058/276] Update license copyright info.

---
 LICENSE | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/LICENSE b/LICENSE
index 36cde46..35fcb87 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 The MIT License (MIT)
-Copyright (c) 2014 Xiaming
+Copyright (c) 2014-2015 Xiaming Chen and other contributors to this list.
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

From 54b0ca4924274056f9ed30571c7ec92df5e9d4ce Mon Sep 17 00:00:00 2001
From: Matthias Kurz
Date: Tue, 19 May 2015 09:43:23 +0200
Subject: [PATCH 059/276] Added link to open data website for Austria

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 8610014..42a9eb7 100644
--- a/README.rst
+++ b/README.rst
@@ -173,6 +173,7 @@ Government
 * `Austin, TX, US `_
 * `Australia (abs.gov.au) `_
 * `Australia (data.gov.au) `_
+* `Austria (data.gv.at) `_
 * `Brazil `_
 * `Cambridge, MA, US `_
 * `Canada `_

From 4aae4b12d3c1b944e3282d0a0a6031e717edfa7b Mon Sep 17 00:00:00 2001
From: Nicholas Arner
Date: Tue, 19 May 2015 18:24:31 -0400
Subject: [PATCH 060/276] New dataset: MIT Heart Rate Time Series

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 42a9eb7..184add3 100644
--- a/README.rst
+++ b/README.rst
@@ -393,6 +393,7 @@ Time Series
 * `Time Series Data Library (TSDL) from MU `_
 * `UC Riverside Time Series Dataset `_
 * `Hard Drive Failure Rates `_
+* `Heart Rate Time Series from MIT `_
 Transportation

From d922038aacea31823ba858aa76813e9b743f8147 Mon Sep 17 00:00:00 2001
From: Nicholas Arner
Date: Tue, 19 May 2015 18:27:51 -0400
Subject: [PATCH 061/276] Added new category and dataset

---
 README.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.rst b/README.rst
index 184add3..135759c 100644
--- a/README.rst
+++ b/README.rst
@@ -304,6 +304,11 @@ Physics
 * `NSSDC (NASA) data of 550 space spacecraft `_
 * `NASA Exoplanet Archive `_
+Psychology/Cognition
+--------------
+
+* `OSU Cognitive Modeling Repositor Datasets `_
+
 Public Domains
 --------------

From 91cf5d53287bfe3fd097024b6cb53375a9b4a993 Mon Sep 17 00:00:00 2001
From: Nicholas Arner
Date: Wed, 20 May 2015 18:14:15 -0400
Subject: [PATCH 062/276] Typo fix

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 135759c..aa93db8 100644
--- a/README.rst
+++ b/README.rst
@@ -307,7 +307,7 @@ Physics
 Psychology/Cognition
 --------------
-* `OSU Cognitive Modeling Repositor Datasets `_
+* `OSU Cognitive Modeling Repository Datasets `_
 Public Domains

From 128483668fa904f2a60dcdef5e94f94922da9ebb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Heitor=20Guimar=C3=A3es?=
Date: Thu, 28 May 2015 13:01:26 -0300
Subject: [PATCH 063/276] Adding in Weather and Government data of Brazil and the SDSS in Physics.

---
 README.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.rst b/README.rst
index aa93db8..014f4ee 100644
--- a/README.rst
+++ b/README.rst
@@ -38,6 +38,7 @@ Climate/Weather
 ---------------
 * `Australian Weather `_
+* `Brazilian Weather - Historical data (In Portuguese) `_
 * `Canadian Meteorological Centre `_
 * `Climate Data from UEA (updated monthly) `_
 * `Global Climate Data Since 1929 `_
@@ -199,6 +200,7 @@ Government
 * `OECD `_
 * `Oklahoma `_
 * `Open Government Data (OGD) Platform India `_
+* `Rio de Janeiro, Brazil `_
 * `Romania `_
 * `San Francisco Data sets `_
 * `Seattle `_
@@ -303,6 +305,7 @@ Physics
 * `CERN Open Data Portal `_
 * `NSSDC (NASA) data of 550 space spacecraft `_
 * `NASA Exoplanet Archive `_
+* `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_
 Psychology/Cognition
 --------------

From 0542ba31997ac671367dedcb32449ef8e72151d3 Mon Sep 17 00:00:00 2001
From: Ben Van Dyke
Date: Sun, 7 Jun 2015 11:16:06 -0700
Subject: [PATCH 064/276] Add Los Angeles Open Data

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 014f4ee..9813452 100644
--- a/README.rst
+++ b/README.rst
@@ -191,6 +191,7 @@ Government
 * `Guardian world governments `_
 * `Indian Government Data `_
 * `London Datastore, UK `_
+* `Los Angeles Open Data `_
 * `MassGIS, Massachusetts, U.S. `_
 * `Mexico `_
 * `Netherlands `_

From 4227cffb35067f5a2a4dfc8fa5558276f7d7ee03 Mon Sep 17 00:00:00 2001
From: Michael Harrison
Date: Thu, 11 Jun 2015 15:53:07 -0500
Subject: [PATCH 065/276] Add Houston, TX Open Data

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 014f4ee..544f310 100644
--- a/README.rst
+++ b/README.rst
@@ -189,6 +189,7 @@ Government
 * `Germany `_
 * `Glasgow, Scotland, UK `_
 * `Guardian world governments `_
+* `Houston Open Data `_
 * `Indian Government Data `_
 * `London Datastore, UK `_
 * `MassGIS, Massachusetts, U.S. `_

From e50788009e38076d1f80bd7376f59e313ee52789 Mon Sep 17 00:00:00 2001
From: Erich Morisse
Date: Sat, 20 Jun 2015 09:48:10 -0400
Subject: [PATCH 066/276] Added aggregated FBI 2013 hate crime data

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 17faaae..66e4626 100644
--- a/README.rst
+++ b/README.rst
@@ -363,6 +363,7 @@ Social Sciences
 * `EDRM Enron EMail of 151 users, hosted on S3 `_
 * `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
+* `FBI Hate Crime 2013 - aggregated data `
 * `Foursquare Social Network in 2010, 2011 `_
 * `Foursquare from UMN/Sarwat (2013) `_
 * `General Social Survey (GSS) since 1972 `_

From 6c7df5eba53fa4b74007305ee9d766ae35e01a8d Mon Sep 17 00:00:00 2001
From: ebutler
Date: Fri, 3 Jul 2015 16:28:54 -0700
Subject: [PATCH 067/276] Update README.rst

added vancouver city open data catalog under government
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 66e4626..0c7afa0 100644
--- a/README.rst
+++ b/README.rst
@@ -224,6 +224,7 @@ Government
 * `U.S. Open Government `_
 * `UK 2011 Census Open Atlas Project `_
 * `United Nations `_
+* `Vancouver, BC Open Data Catalog`_
 Healthcare

From 4fd844f5d9e14e76fd5b22621598cb7e913186df Mon Sep 17 00:00:00 2001
From: Hendra
Date: Sun, 5 Jul 2015 21:24:59 +0800
Subject: [PATCH 068/276] Add Indonesian and Singaporean government data portal

---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.rst b/README.rst
index 66e4626..5bd1546 100644
--- a/README.rst
+++ b/README.rst
@@ -191,6 +191,7 @@ Government
 * `Guardian world governments `_
 * `Houston Open Data `_
 * `Indian Government Data `_
+* `Indonesian Data Portal `_
 * `London Datastore, UK `_
 * `Los Angeles Open Data `_
 * `MassGIS, Massachusetts, U.S. `_
@@ -206,6 +207,7 @@ Government
 * `Romania `_
 * `San Francisco Data sets `_
 * `Seattle `_
+* `Singapore Government Data `_
 * `Switzerland `_
 * `The World Bank `_

From dc3366ff5ea60fc4283a814352bf526f862489ae Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Thu, 9 Jul 2015 15:55:53 +0800
Subject: [PATCH 069/276] Update README.rst

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index f6cd6da..207bc41 100644
--- a/README.rst
+++ b/README.rst
@@ -226,7 +226,7 @@ Government
 * `U.S. Open Government `_
 * `UK 2011 Census Open Atlas Project `_
 * `United Nations `_
-* `Vancouver, BC Open Data Catalog`_
+* `Vancouver, BC Open Data Catalog `_
 Healthcare

From a997f8d50beaf2e36781076a5f571864524d1e46 Mon Sep 17 00:00:00 2001
From: Tim Smith
Date: Thu, 9 Jul 2015 12:49:48 -0700
Subject: [PATCH 070/276] OpenStreetMap is 1 word

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 207bc41..93dae1d 100644
--- a/README.rst
+++ b/README.rst
@@ -159,7 +159,7 @@ GeoSpace/GIS
 * `Global Administrative Areas Database (GADM) `_
 * `Landsat 8 on AWS `_
 * `Natural Earth - vectors and rasters of the world `_
-* `Open Street Map (OSM) `_
+* `OpenStreetMap (OSM) `_
 * `TIGER/Line - U.S. boundaries and roads `_
 * `TwoFishes - Foursquare's coarse geocoder `_
 * `TZ Timezones shapfiles `_

From b93bd06536dd9d19ae682938d5565d3fa89768c6 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Tue, 14 Jul 2015 17:44:44 +0800
Subject: [PATCH 071/276] Update README.rst

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 93dae1d..cce43ef 100644
--- a/README.rst
+++ b/README.rst
@@ -375,6 +375,7 @@ Social Sciences
 * `MIT Reality Mining Dataset `_
 * `Mobile Social Networks from UMASS `_
 * `PewResearch Internet Survey Project `_
+* `Reddit Comments `_
 * `SourceForge.net Research Data `_
 * `StackExchange Data Explorer `_
 * `Titanic Survival Data Set `_

From 1d819e86b51dff199fd59d379e1af8609c72d8b8 Mon Sep 17 00:00:00 2001
From: Austin Davis-Richardson
Date: Thu, 30 Jul 2015 12:31:56 -0400
Subject: [PATCH 072/276] added opensnp.org

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index cce43ef..d94fa22 100644
--- a/README.rst
+++ b/README.rst
@@ -25,6 +25,7 @@ Biology
 * `ICOS PSP Benchmark `_
 * `MIT Cancer Genomics Data `_
 * `NIH Microarray data (FTP) `_
+* `OpenSNP genotypes data `_
 * `Protein Data Bank `_
 * `PubChem Project `_
 * `PubGene (now Coremine Medical) `_

From 1a3f1f037ae9cdf1eb7cb3501d0be95a06c939ba Mon Sep 17 00:00:00 2001
From: John Wittenauer
Date: Fri, 31 Jul 2015 20:44:25 -0400
Subject: [PATCH 073/276] Added link to the GDELT project.

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index cce43ef..93e4138 100644
--- a/README.rst
+++ b/README.rst
@@ -389,6 +389,7 @@ Social Sciences
 * `Youtube Video Social Graph in 2007,2008 `_
 * `Google Scholar citation relations `_
 * `Political Polarity Data `_
+* `GDELT Global Events Database `_
 Sports

From 1a8281977cb9b3e57a670b0aa7e1a76d90b050dc Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Mon, 3 Aug 2015 10:44:30 +0800
Subject: [PATCH 074/276] Fix format error

---
 README.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.rst b/README.rst
index 93e4138..53447de 100644
--- a/README.rst
+++ b/README.rst
@@ -207,7 +207,7 @@ Government
 * `Romania `_
 * `San Francisco Data sets `_
 * `Seattle `_
-* `Singapore Government Data `_
 * `South Africa `_
 * `Switzerland `_
 * `The World Bank `_
@@ -366,7 +366,7 @@ Social Sciences
 * `EDRM Enron EMail of 151 users, hosted on S3 `_
 * `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
-* `FBI Hate Crime 2013 - aggregated data `
+* `FBI Hate Crime 2013 - aggregated data `_
 * `Foursquare Social Network in 2010, 2011 `_
 * `Foursquare from UMN/Sarwat (2013) `_
 * `General Social Survey (GSS) since 1972 `_

From fb8128e88da64f81276c038a336a1f9f36bef7f7 Mon Sep 17 00:00:00 2001
From: Craig Davison
Date: Fri, 7 Aug 2015 17:30:45 +0100
Subject: [PATCH 075/276] Add awesome list badge

---
 README.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index faaacfa..2bcf430 100644
--- a/README.rst
+++ b/README.rst
@@ -1,6 +1,8 @@
 Awesome Public Datasets
 =======================
-
+.. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
+   :alt: Awesome
+   :target: https://github.com/sindresorhus/awesome
 `This list of public data sources `_
 are collected and tidyed from blogs, answers, and user reponses.
 Most of the data sets listed below are free, however, some are not.

From 822a56532458ee82fad8b364ea79dd0ebe85333f Mon Sep 17 00:00:00 2001
From: Steven Weaver
Date: Sat, 8 Aug 2015 10:19:50 -0700
Subject: [PATCH 076/276] README typo correction

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index faaacfa..4985aa0 100644
--- a/README.rst
+++ b/README.rst
@@ -2,7 +2,7 @@ Awesome Public Datasets
 =======================
 `This list of public data sources `_
-are collected and tidyed from blogs, answers, and user reponses.
+are collected and tidied from blogs, answers, and user reponses.
 Most of the data sets listed below are free, however, some are not.
 Other amazingly awesome lists can be found in the
 `awesome-awesomeness `_ and

From 00c944ecc1b7c938b9ff63a2dd263ad1444e50c5 Mon Sep 17 00:00:00 2001
From: Mazen Abdulaziz
Date: Tue, 11 Aug 2015 19:18:56 +0300
Subject: [PATCH 077/276] Update README.rst

Added a link to SaudiNewsNet corpus
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index c5ea0c8..25c38f1 100644
--- a/README.rst
+++ b/README.rst
@@ -301,6 +301,7 @@ Natural Language
 * `Hansards text chunks of Canadian Parliament `_
 * `Machine Translation of European languages `_
 * `SMS Spam Collection in English `_
+* `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_
 * `USENET postings corpus of 2005~2011 `_
 * `Wikidata - Wikipedia databases `_
 * `Wikipedia Links data - 40 Million Entities in Context `_

From f034456bc5a82de1983a5754573205bbc09f4d41 Mon Sep 17 00:00:00 2001
From: Alison
Date: Sat, 15 Aug 2015 21:26:04 -0500
Subject: [PATCH 078/276] add 2 sources

---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.rst b/README.rst
index 25c38f1..dada0ab 100644
--- a/README.rst
+++ b/README.rst
@@ -184,6 +184,7 @@ Government
 * `Chicago `_
 * `Dallas Open Data `_
 * `Denver Open Data `_
+* `Durham, NC Open Data `_
 * `England LGInform `_
 * `EuroStat `_
 * `FedStats `_
@@ -383,6 +384,7 @@ Social Sciences
 * `SourceForge.net Research Data `_
 * `StackExchange Data Explorer `_
 * `Titanic Survival Data Set `_
+* `Texas Inmates Executed Since 1984 `_
 * `Twitter Graph of entire Twitter site `_
 * `UCB's Archive of Social Science Data (D-Lab) `_
 * `UCLA Social Sciences Data Archive `_

From 0626801b5c6101560b596386a687b20efa818031 Mon Sep 17 00:00:00 2001
From: Alison
Date: Sat, 15 Aug 2015 22:38:59 -0500
Subject: [PATCH 079/276] add twitter data

---
 README.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/README.rst b/README.rst
index dada0ab..b4bb0b9 100644
--- a/README.rst
+++ b/README.rst
@@ -362,6 +362,15 @@ Search Engines
 * `Open Data Certificates (beta) `_
 * `Statista.com - statistics and Studies `_
+Social Networks
+---------------
+
+* `May 2011 Calufa Twitter Scrape` _
+* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape` _
+* `Twitter Data for Sentiment Analysis` _
+* `Network Twitter Data` _
+* `Social Twitter Data` _
+* `72 hours #gamergate scrape` _
 Social Sciences
 ---------------

From 9a8bb991bb2c59adde6c1f26f1edcc4fb444018e Mon Sep 17 00:00:00 2001
From: Alison
Date: Sat, 15 Aug 2015 22:41:36 -0500
Subject: [PATCH 080/276] social networks section clean-up

---
 README.rst | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.rst b/README.rst
index b4bb0b9..8a2aad9 100644
--- a/README.rst
+++ b/README.rst
@@ -365,12 +365,12 @@ Search Engines
 Social Networks
 ---------------
-* `May 2011 Calufa Twitter Scrape` _
-* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape` _
-* `Twitter Data for Sentiment Analysis` _
-* `Network Twitter Data` _
-* `Social Twitter Data` _
-* `72 hours #gamergate scrape` _
+* `72 hours #gamergate scrape `_
+* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_
+* `May 2011 Calufa Twitter Scrape `_
+* `Network Twitter Data `_
+* `Social Twitter Data `_
+* `Twitter Data for Sentiment Analysis `_
 Social Sciences
 ---------------

From 660567a9dd7de1de75403b9a44ccb071e0a1dc10 Mon Sep 17 00:00:00 2001
From: ilkercam
Date: Thu, 20 Aug 2015 10:23:43 +0200
Subject: [PATCH 081/276] added 3 pet datasets

---
 README.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 8a2aad9..452d2da 100644
--- a/README.rst
+++ b/README.rst
@@ -248,7 +248,10 @@ Image Processing
 ----------------
 * `10k US Adult Faces Database `_
-* `2GB of Photos of Cats `_
+* `2GB of Photos of Cats (Down - 20Agst2015) `_
+* `Stanford Dogs Dataset `_
+* `The Oxford-IIIT Pet Dataset `_
+* `Animals with attributes `_
 * `Affective Image Classification `_
 * `Face Recognition Benchmark `_
 * `ImageNet (in WordNet hierarchy) `_

From ac59d756188b301b47c32c48f62894afe3f1739b Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Thu, 20 Aug 2015 17:36:34 +0800
Subject: [PATCH 082/276] Update README.rst

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 452d2da..515edf2 100644
--- a/README.rst
+++ b/README.rst
@@ -248,7 +248,7 @@ Image Processing
 ----------------
 * `10k US Adult Faces Database `_
-* `2GB of Photos of Cats (Down - 20Agst2015) `_
+* `2GB of Photos of Cats (Original down - 20Agst2015) `_ or `Archive version `_
 * `Stanford Dogs Dataset `_
 * `The Oxford-IIIT Pet Dataset `_
 * `Animals with attributes `_

From 56183c6ee57578660ab1c3dee4d793e7001f7644 Mon Sep 17 00:00:00 2001
From: Lorenzo Cafaro
Date: Tue, 25 Aug 2015 06:29:17 +0200
Subject: [PATCH 083/276] Added indoor scene recognition (mit.edu)

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 515edf2..c0767ce 100644
--- a/README.rst
+++ b/README.rst
@@ -259,6 +259,7 @@ Image Processing
 * `Massive Visual Memory Stimuli, MIT `_
 * `SUN database, MIT `_
 * `YouTube Faces Database `_
+* `Indoor Scene Recognition `_
 Machine Learning

From f01ce77b3a60fd591a0e47cf8b5a76b88100775e Mon Sep 17 00:00:00 2001
From: Yan Hong
Date: Fri, 28 Aug 2015 17:59:34 -0700
Subject: [PATCH 084/276] Add NYC Uber trip data 2014

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index c0767ce..4c27406 100644
--- a/README.rst
+++ b/README.rst
@@ -449,6 +449,7 @@ Transportation
 * `U.S. Bureau of Transportation Statistics (BTS) `_
 * `U.S. Domestic Flights 1990 to 2009 `_
 * `U.S. Freight Analysis Framework since 2007 `_
+* `NYC Uber trip data April 2014 to September 2014 `_
 Complementary Collections

From 0000a4c7a5069bac6125b7ed90aa4e6e90f19d86 Mon Sep 17 00:00:00 2001
From: Quang Nguyen
Date: Mon, 31 Aug 2015 11:25:24 +0700
Subject: [PATCH 085/276] Update README.rst

Added an air travel review dataset scraped and wrangled from Skytrax' website (www.airlinequality.com).
Articles that have used this dataset:
http://www.quangn.com/exploring-reviews-of-airline-services/
http://priceonomics.com/what-are-the-worst-airports-in-the-world/
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index c0767ce..5734dfb 100644
--- a/README.rst
+++ b/README.rst
@@ -409,6 +409,7 @@ Social Sciences
 * `Google Scholar citation relations `_
 * `Political Polarity Data `_
 * `GDELT Global Events Database `_
+* `Skytrax' Air Travel Reviews Dataset `_
 Sports

From 24a326452c00e049bfb057bfec15e1829c93e42f Mon Sep 17 00:00:00 2001
From: "Peter K. Shultz"
Date: Tue, 1 Sep 2015 11:03:19 -0400
Subject: [PATCH 086/276] Update README.rst

Change "resouce" to "resource".
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 78acf31..db73083 100644
--- a/README.rst
+++ b/README.rst
@@ -418,7 +418,7 @@ Sports
 * `Betfair Historical Exchange Data `_
 * `Cricsheet Matches (cricket) `_
 * `Ergast Formula 1, from 1950 up to date (API) `_
-* `Football/Soccer resouces (data and APIs) `_
+* `Football/Soccer resources (data and APIs) `_
 * `Lahman's Baseball Database `_
 * `Retrosheet Baseball Statistics `_

From 983915fce9c4182b0b0ab3bca97bdd87c9189243 Mon Sep 17 00:00:00 2001
From: Mark Silverberg
Date: Fri, 11 Sep 2015 18:39:17 -0400
Subject: [PATCH 087/276] Update to link to Open Data Network search engine

All datasets have an API!
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index db73083..87a737b 100644
--- a/README.rst
+++ b/README.rst
@@ -462,3 +462,4 @@ Complementary Collections
 * RS.io: `100+ Interesting Data Sets for Statistics `_
 * StaTrek: `Leveraging open data to understand urban lives `_
 * OpenDataMonitor: `An overview of available open data resources in Europe `_
+* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_

From fc6038aca5114b4996d93e7d395e5bcc8c2e0dd5 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Wed, 16 Sep 2015 13:43:41 +0800
Subject: [PATCH 088/276] Add pathguide under biology

---
 README.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/README.rst b/README.rst
index 87a737b..bf75ab7 100644
--- a/README.rst
+++ b/README.rst
@@ -20,14 +20,15 @@ Biology
 -------
 * `1000 Genomes `_
+* `American Gut (Microbiome Project) `_
 * `Collaborative Research in Computational Neuroscience (CRCNS) `_
 * `Gene Expression Omnibus (GEO) `_
 * `Human Microbiome Project (HMP) `_
-* `American Gut (Microbiome Project) `_
 * `ICOS PSP Benchmark `_
 * `MIT Cancer Genomics Data `_
 * `NIH Microarray data (FTP) `_
 * `OpenSNP genotypes data `_
+* `Pathguid: Protein-Protein Interactions Catalog `_
 * `Protein Data Bank `_
 * `PubChem Project `_
 * `PubGene (now Coremine Medical) `_
@@ -61,17 +62,17 @@ Complex Networks
 * `DBLP Citation dataset `_
 * `NBER Patent Citations `_
 * `NIST complex networks data collection `_
-* `Small Network Data `_
-* `UCI Network Data Repository `_
 * `Protein-protein interaction network `_
 * `PyPI and Maven Dependency Network `_
 * `Scopus Citation Database `_
+* `Small Network Data `_
 * `Stanford GraphBase (Steven Skiena) `_
 * `Stanford Large Network Dataset Collection `_
 * `The Koblenz Network Collection `_
 * `The Laboratory for Web Algorithmics (UNIMI) `_
 * `The Nexus Network Repository `_
 * `UCI Network Data Repository `_
+* `UCI Network Data Repository `_
 * `UFL sparse matrix collection `_
 * `WSU Graph Database `_

From aff1329432586f836c4bbf4ba6081fcb671d1ddb Mon Sep 17 00:00:00 2001
From: Clapinton
Date: Thu, 17 Sep 2015 13:21:45 -0700
Subject: [PATCH 089/276] Update README.rst

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index bf75ab7..4b9637b 100644
--- a/README.rst
+++ b/README.rst
@@ -97,6 +97,7 @@ Data Challenges
 * `Challenges in Machine Learning `_
 * `D4D Challenge of Orange `_
+* `dataX `_
 * `DrivenData Competitions for Social Good `_
 * `ICWSM Data Challenge (since 2009) `_
 * `Kaggle Competition Data `_

From d67fe16c8b19d16381379f9a2738528c9812339a Mon Sep 17 00:00:00 2001
From: Clapinton
Date: Thu, 17 Sep 2015 13:22:23 -0700
Subject: [PATCH 090/276] Update README.rst

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 4b9637b..680a254 100644
--- a/README.rst
+++ b/README.rst
@@ -97,7 +97,7 @@ Data Challenges
 * `Challenges in Machine Learning `_
 * `D4D Challenge of Orange `_
-* `dataX `_
+* `CrowdANALYTIX dataX `_
 * `DrivenData Competitions for Social Good `_
 * `ICWSM Data Challenge (since 2009) `_
 * `Kaggle Competition Data `_

From e060de8ac4c37cc6d1fcba766539628c93f09d99 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Sat, 19 Sep 2015 10:49:36 +0800
Subject: [PATCH 091/276] :sparkles: Add google group :sparkles:

---
 README.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.rst b/README.rst
index 680a254..5a27c92 100644
--- a/README.rst
+++ b/README.rst
@@ -10,6 +10,8 @@ Other amazingly awesome lists can be found in the
 `awesome-awesomeness `_ and
 `sindresorhus's awesome `_ list.
+* `Visit our Google Group on HQOD `_
+
 Agriculture
 ------------

From 7d870f1cd1321d7028ea14b4211d0b669351a676 Mon Sep 17 00:00:00 2001
From: Edward Lu
Date: Sat, 3 Oct 2015 16:40:14 -0400
Subject: [PATCH 092/276] Update README.rst

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 5a27c92..5caf308 100644
--- a/README.rst
+++ b/README.rst
@@ -446,6 +446,7 @@ Transportation
 * `Hubway Million Rides in MA `_
 * `Marine Traffic - ship tracks, port calls and more `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
+* `NYC Taxi Trip Data 2009- `_
 * `OpenFlights - airport, airline and route data `_
 * `RITA Airline On-Time Performance data `_
 * `RITA/BTS transport data collection (TranStat) `_

From e8c8862758cea5d2e9a9d1956d37a4747a25fcf7 Mon Sep 17 00:00:00 2001
From: Pierre Fenoll
Date: Tue, 6 Oct 2015 09:33:08 -0700
Subject: [PATCH 093/276] Add the Catalogue of Life

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 5a27c92..740cd9b 100644
--- a/README.rst
+++ b/README.rst
@@ -38,6 +38,7 @@ Biology
 * `The Personal Genome Project `_ or `PGP `_
 * `UCSC Public Data `_
 * `UniGene `_
+* `The Catalogue of Life `_
 Climate/Weather

From 146b5960994c74cfd7bf3d627d9b200869f12a8e Mon Sep 17 00:00:00 2001
From: Anthony Ringoet
Date: Fri, 9 Oct 2015 11:37:52 +0200
Subject: [PATCH 094/276] Added Belgian datasets (Ghent, Antwerp, Belgium)

---
 README.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.rst b/README.rst
index 5a27c92..20c7fe8 100644
--- a/README.rst
+++ b/README.rst
@@ -178,10 +178,12 @@ GeoSpace/GIS
 Government
 ----------
+* `Antwerp, Belgium `_
 * `Austin, TX, US `_
 * `Australia (abs.gov.au) `_
 * `Australia (data.gov.au) `_
 * `Austria (data.gv.at) `_
+* `Belgium `_
 * `Brazil `_
 * `Cambridge, MA, US `_
 * `Canada `_
@@ -195,6 +197,7 @@ Government
 * `Finland `_
 * `France `_
 * `Germany `_
+* `Ghent, Belgium `_
 * `Glasgow, Scotland, UK `_
 * `Guardian world governments `_
 * `Houston Open Data `_

From 3da29aea5e9de47cd36fd8ddd4dc85c6b29424b2 Mon Sep 17 00:00:00 2001
From: Oliver Kopp
Date: Sun, 11 Oct 2015 16:01:01 +0200
Subject: [PATCH 095/276] Add Zenodo

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 5a27c92..ee9727a 100644
--- a/README.rst
+++ b/README.rst
@@ -467,3 +467,4 @@ Complementary Collections
 * StaTrek: `Leveraging open data to understand urban lives `_
 * OpenDataMonitor: `An overview of available open data resources in Europe `_
 * OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_
+* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_
\ No newline at end of file

From 7ed1fec199908806e2a862558e0cb516c07e9350 Mon Sep 17 00:00:00 2001
From: Pierre Fenoll
Date: Mon, 12 Oct 2015 19:20:43 -0700
Subject: [PATCH 096/276] Look for dead links

---
 .travis.yml | 8 ++++++++
 1 file changed, 8 insertions(+)
 create mode 100644 .travis.yml

diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..2353d2e
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,8 @@
+language: bash
+sudo: false
+
+before_install:
+  - set -e
+
+script:
+  - ag -o '([^\s<>]+://[^\s<>]+)' README.rst | while read url; do if [[ "$url" != '' ]]; then echo "$url"; curl --output /dev/null --silent --head --fail --location --insecure "$url" || exit 1; fi; done

From 47efe527627782e0b7adcc99b041c0c8981d068b Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Tue, 13 Oct 2015 15:42:43 +0800
Subject: [PATCH 097/276] Update README.rst

---
 README.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.rst b/README.rst
index a53f131..8681dd1 100644
--- a/README.rst
+++ b/README.rst
@@ -3,6 +3,9 @@ Awesome Public Datasets
 .. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
    :alt: Awesome
    :target: https://github.com/sindresorhus/awesome
+.. image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg
+   :target: https://travis-ci.org/caesar0301/awesome-public-datasets
+
 `This list of public data sources `_
 are collected and tidied from blogs, answers, and user reponses.
 Most of the data sets listed below are free, however, some are not.
@@ -10,7 +13,7 @@ Other amazingly awesome lists can be found in the
 `awesome-awesomeness `_ and
 `sindresorhus's awesome `_ list.
-* `Visit our Google Group on HQOD `_ +* `Visit our Google Group on APD `_ Agriculture @@ -472,4 +475,4 @@ Complementary Collections * StaTrek: `Leveraging open data to understand urban lives `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ * OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ -* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ \ No newline at end of file +* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ From 8eefbc1313b5ad6b62ed0edfd226b7807f4901b9 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Wed, 14 Oct 2015 09:48:17 +0800 Subject: [PATCH 098/276] Add MeSH. Thanks #112 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8681dd1..00c6a5e 100644 --- a/README.rst +++ b/README.rst @@ -252,6 +252,7 @@ Healthcare * `Medicare Coverage Database (MCD), U.S. 
`_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ +* `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ From 9b0fdc06607063b62d0da80086855cb6f9b5efa4 Mon Sep 17 00:00:00 2001 From: hanguangchun Date: Wed, 14 Oct 2015 14:59:56 +0800 Subject: [PATCH 099/276] add 3 famous biological public data repositories:SRA, ENCODE&Arrayexpress --- README.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.rst b/README.rst index 00c6a5e..18be9bd 100644 --- a/README.rst +++ b/README.rst @@ -28,6 +28,9 @@ Biology * `American Gut (Microbiome Project) `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `Gene Expression Omnibus (GEO) `_ +* `Sequence Read Archive(SRA) `_ +* `EBI ArrayExrepss `_ +* `ENCODE project `_ * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ * `MIT Cancer Genomics Data `_ From 236e791fa616c11d9c4e9cd06bb856d9dfc4a614 Mon Sep 17 00:00:00 2001 From: Sally Jenkinson Date: Wed, 14 Oct 2015 12:28:46 +0100 Subject: [PATCH 100/276] Added the Natural History Museum Data Portal and reordered Museums alphabetically --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 00c6a5e..6c102c3 100644 --- a/README.rst +++ b/README.rst @@ -299,9 +299,10 @@ Museums * `Cooper-Hewitt's Collection Database `_ * `Minneapolis Institute of Arts metadata `_ +* `Natural History Museum (London) Data Portal `_ +* `Rijksmuseum Historical Art Collection `_ * `Tate Collection metadata `_ * `The Getty vocabularies `_ -* `Rijksmuseum Historical Art Collection `_ Natural Language From 119369800bf8e5ed487f96886983fca097046585 Mon Sep 17 00:00:00 2001 From: Chris Mungall Date: Mon, 19 Oct 2015 19:45:53 -0700 Subject: [PATCH 101/276] Update README.rst Adding link to GloBI. 
cc @jhpoelen --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index de923f9..926c1b9 100644 --- a/README.rst +++ b/README.rst @@ -28,6 +28,7 @@ Biology * `American Gut (Microbiome Project) `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `Gene Expression Omnibus (GEO) `_ +* `Global Biotic Interations (GloBI) `_ * `Sequence Read Archive(SRA) `_ * `EBI ArrayExrepss `_ * `ENCODE project `_ From 5e382afcf7680ab4c75e887e431ad9774a3ddc84 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 20 Oct 2015 15:11:34 +0800 Subject: [PATCH 102/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 926c1b9..7314a10 100644 --- a/README.rst +++ b/README.rst @@ -28,6 +28,7 @@ Biology * `American Gut (Microbiome Project) `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `Gene Expression Omnibus (GEO) `_ +* `Gene Ontology (GO) `_ * `Global Biotic Interations (GloBI) `_ * `Sequence Read Archive(SRA) `_ * `EBI ArrayExrepss `_ From a89c1c72bdd4fa2d45d32eb27e574581cb363c38 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Mon, 26 Oct 2015 12:37:09 +0800 Subject: [PATCH 103/276] Add place crash info. 
--- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 7314a10..f4172c5 100644 --- a/README.rst +++ b/README.rst @@ -462,6 +462,7 @@ Transportation * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Taxi Trip Data 2009- `_ * `OpenFlights - airport, airline and route data `_ +* `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ * `Transport for London (TFL) `_ From c7826448fa18174dce2c42c2c7889879bd049531 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Thu, 29 Oct 2015 14:19:04 +0800 Subject: [PATCH 104/276] Add contextual data category and data --- README.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.rst b/README.rst index f4172c5..d3397f7 100644 --- a/README.rst +++ b/README.rst @@ -103,6 +103,12 @@ Computer Networks * `UCSD Network Telescope, IPv4 /8 net `_ +Contextual Data +--------------- + +* `Context-aware data sets from five domains `_ or `GitHub `_ + + Data Challenges --------------- From efe0eadc79fbfcd865c861b73e0ba8188bc2f45b Mon Sep 17 00:00:00 2001 From: Dave Justice Date: Mon, 9 Nov 2015 09:56:16 -0800 Subject: [PATCH 105/276] Add Oregon and City of Portland Datasets --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index d3397f7..faad736 100644 --- a/README.rst +++ b/README.rst @@ -229,6 +229,8 @@ Government * `OECD `_ * `Oklahoma `_ * `Open Government Data (OGD) Platform India `_ +* `Oregon `_ +* `Portland, Oregon `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ * `San Francisco Data sets `_ From df727b98401458a1043c3f67c916bd5ae32e1135 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Fri, 20 Nov 2015 09:15:47 -0800 Subject: [PATCH 106/276] Update README URLs based on HTTP redirects --- README.rst | 88 +++++++++++++++++++++++++++--------------------------- 1 file changed, 44 insertions(+), 44 deletions(-) diff --git a/README.rst b/README.rst index faad736..09fcbd7 100644 
--- a/README.rst +++ b/README.rst @@ -34,12 +34,12 @@ Biology * `EBI ArrayExrepss `_ * `ENCODE project `_ * `Human Microbiome Project (HMP) `_ -* `ICOS PSP Benchmark `_ +* `ICOS PSP Benchmark `_ * `MIT Cancer Genomics Data `_ * `NIH Microarray data (FTP) `_ * `OpenSNP genotypes data `_ * `Pathguid: Protein-Protein Interactions Catalog `_ -* `Protein Data Bank `_ +* `Protein Data Bank `_ * `PubChem Project `_ * `PubGene (now Coremine Medical) `_ * `Stanford Microarray Data `_ @@ -56,10 +56,10 @@ Climate/Weather * `Brazilian Weather - Historical data (In Portuguese) `_ * `Canadian Meteorological Centre `_ * `Climate Data from UEA (updated monthly) `_ -* `Global Climate Data Since 1929 `_ +* `Global Climate Data Since 1929 `_ * `NASA Global Imagery Browse Services `_ * `NOAA Bering Sea Climate `_ -* `NOAA Climate Datasets `_ +* `NOAA Climate Datasets `_ * `NOAA Realtime Weather Models `_ * `The World Bank Open Data Resources for Climate Change `_ * `UEA Climatic Research Unit `_ @@ -74,8 +74,8 @@ Complex Networks * `NBER Patent Citations `_ * `NIST complex networks data collection `_ * `Protein-protein interaction network `_ -* `PyPI and Maven Dependency Network `_ -* `Scopus Citation Database `_ +* `PyPI and Maven Dependency Network `_ +* `Scopus Citation Database `_ * `Small Network Data `_ * `Stanford GraphBase (Steven Skiena) `_ * `Stanford Large Network Dataset Collection `_ @@ -92,13 +92,13 @@ Computer Networks ----------------- * `3.5B Web Pages from CommonCraw 2012 `_ -* `53.5B Web clicks of 100K users in Indiana Univ. `_ +* `53.5B Web clicks of 100K users in Indiana Univ. `_ * `CAIDA Internet Datasets `_ * `ClueWeb09 - 1B web pages `_ * `ClueWeb12 - 733M web pages `_ * `CommonCrawl Web Data over 7 years `_ -* `CRAWDAD Wireless datasets from Dartmouth Univ. `_ -* `Criteo click-through data `_ +* `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ +* `Criteo click-through data `_ * `Open Mobile Data by MobiPerf `_ * `UCSD Network Telescope, IPv4 /8 net `_ @@ -114,14 +114,14 @@ Data Challenges * `Challenges in Machine Learning `_ * `D4D Challenge of Orange `_ -* `CrowdANALYTIX dataX `_ +* `CrowdANALYTIX dataX `_ * `DrivenData Competitions for Social Good `_ * `ICWSM Data Challenge (since 2009) `_ -* `Kaggle Competition Data `_ +* `Kaggle Competition Data `_ * `KDD Cup by Tencent 2012 `_ * `Localytics Data Visualization Challenge `_ * `Netflix Prize `_ -* `Space Apps Challenge `_ +* `Space Apps Challenge `_ * `Telecom Italia Big Data Challenge `_ * `Yelp Dataset Challenge `_ @@ -129,7 +129,7 @@ Data Challenges Economics --------- -* `American Economic Ass (AEA) `_ +* `American Economic Ass (AEA) `_ * `EconData from UMD `_ * `Internet Product Code Database `_ @@ -159,14 +159,14 @@ Finance * `NASDAQ `_ * `OANDA `_ * `OSU Financial data `_ -* `Quandl `_ -* `St Louis Federal `_ +* `Quandl `_ +* `St Louis Federal `_ * `Yahoo Finance `_ Geology ------- * `USGS Earthquake Archives `_ -* `Smithsonian Institution Global Volcano and Eruption Database `_ +* `Smithsonian Institution Global Volcano and Eruption Database `_ GeoSpace/GIS @@ -175,7 +175,7 @@ GeoSpace/GIS * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ * `EOSDIS - NASA's earth observing system data `_ -* `Factual Global Location Data `_ +* `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ @@ -201,7 +201,7 @@ Government * `Belgium `_ * `Brazil `_ * `Cambridge, MA, US `_ -* `Canada `_ +* `Canada `_ * `Chicago `_ * `Dallas Open Data `_ * `Denver Open Data `_ @@ -214,9 +214,9 @@ Government * `Germany `_ * `Ghent, Belgium `_ * `Glasgow, Scotland, UK `_ -* `Guardian world governments `_ +* `Guardian world governments `_ * `Houston Open Data `_ -* `Indian Government Data `_ +* `Indian Government Data `_ * `Indonesian Data Portal `_ * `London 
Datastore, UK `_ * `Los Angeles Open Data `_ @@ -225,17 +225,17 @@ Government * `Netherlands `_ * `New Zealand `_ * `NYC betanyc `_ -* `NYC Open Data `_ +* `NYC Open Data `_ * `OECD `_ * `Oklahoma `_ -* `Open Government Data (OGD) Platform India `_ +* `Open Government Data (OGD) Platform India `_ * `Oregon `_ -* `Portland, Oregon `_ +* `Portland, Oregon `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ * `San Francisco Data sets `_ * `Seattle `_ -* `Singapore Government Data `_ +* `Singapore Government Data `_ * `South Africa `_ * `Switzerland `_ * `The World Bank `_ @@ -247,8 +247,8 @@ Government * `U.S. CDC Public Health datasets `_ * `U.S. Census Bureau `_ * `U.S. National Center for Education Statistics (NCES) `_ -* `U.S. Department of Housing and Urban Development (HUD) `_ -* `U.S. Federal Government Agencies `_ +* `U.S. Department of Housing and Urban Development (HUD) `_ +* `U.S. Federal Government Agencies `_ * `U.S. Federal Government Data Catalog `_ * `U.S. Food and Drug Administration (FDA) `_ * `U.S. Open Government `_ @@ -262,7 +262,7 @@ Healthcare * `EHDP Large Health Data Sets `_ * `Gapminder World, demographic databases `_ -* `Medicare Coverage Database (MCD), U.S. `_ +* `Medicare Coverage Database (MCD), U.S. 
`_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ @@ -326,7 +326,7 @@ Natural Language * `ClueWeb12 FACC `_ * `DBpedia - 4.58M things with 583M facts `_ * `Flickr Personal Taxonomies `_ -* `Google Books Ngrams (2.2TB) `_ +* `Google Books Ngrams (2.2TB) `_ * `Google Web 5gram (1TB, 2006) `_ * `Gutenberg eBooks List `_ * `Hansards text chunks of Canadian Parliament `_ @@ -356,7 +356,7 @@ Psychology/Cognition Public Domains -------------- -* `Amazon `_ +* `Amazon `_ * `Archive.org Datasets `_ * `CMU JASA data archive `_ * `CMU StatLab collections `_ @@ -367,15 +367,15 @@ Public Domains * `KDNuggets Data Collections `_ * `Microsoft Azure Data Market Free DataSets `_ * `Numbray `_ -* `Reddit Datasets `_ -* `RevolutionAnalytics Collection `_ +* `Reddit Datasets `_ +* `RevolutionAnalytics Collection `_ * `Sample R data sets `_ * `Stats4Stem R data sets `_ * `StatSci.org `_ * `The Washington Post List `_ * `UCLA SOCR data collection `_ * `UFO Reports `_ -* `Wikileaks 911 pager intercepts `_ +* `Wikileaks 911 pager intercepts `_ * `Yahoo Webscope `_ @@ -384,20 +384,20 @@ Search Engines * `Academic Torrents of data sharing from UMB `_ * `Archive-it from Internet Archive `_ -* `Datahub.io `_ +* `Datahub.io `_ * `DataMarket (Qlik) `_ * `Freebase.com of people, places, and things `_ -* `Harvard Dataverse Network of scientific data `_ +* `Harvard Dataverse Network of scientific data `_ * `ICPSR (UMICH) `_ -* `Open Data Certificates (beta) `_ +* `Open Data Certificates (beta) `_ * `Statista.com - statistics and Studies `_ Social Networks --------------- * `72 hours #gamergate scrape `_ -* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ -* `May 2011 Calufa Twitter Scrape `_ +* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ +* `May 2011 Calufa Twitter Scrape `_ * `Network Twitter Data `_ * `Social Twitter Data `_ * `Twitter Data for Sentiment 
Analysis `_ @@ -407,7 +407,7 @@ Social Sciences * `Ancestry.com Forum Dataset over 10 years `_ * `CMU Enron Email of 150 users `_ -* `EDRM Enron EMail of 151 users, hosted on S3 `_ +* `EDRM Enron EMail of 151 users, hosted on S3 `_ * `Facebook Data Scrape (2005) `_ * `Facebook Social Networks from LAW (since 2007) `_ * `FBI Hate Crime 2013 - aggregated data `_ @@ -415,12 +415,12 @@ Social Sciences * `Foursquare from UMN/Sarwat (2013) `_ * `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ -* `GitHub Collaboration Archive `_ +* `GitHub Collaboration Archive `_ * `MIT Reality Mining Dataset `_ * `Mobile Social Networks from UMASS `_ * `PewResearch Internet Survey Project `_ * `Reddit Comments `_ -* `SourceForge.net Research Data `_ +* `SourceForge.net Research Data `_ * `StackExchange Data Explorer `_ * `Titanic Survival Data Set `_ * `Texas Inmates Executed Since 1984 `_ @@ -463,10 +463,10 @@ Transportation * `Airlines OD Data 1987-2008 `_ * `Bike Share Systems (BSS) collection `_ -* `Bay Area Bike Share Data `_ +* `Bay Area Bike Share Data `_ * `GeoLife GPS Trajectory from Microsoft Research `_ * `Hubway Million Rides in MA `_ -* `Marine Traffic - ship tracks, port calls and more `_ +* `Marine Traffic - ship tracks, port calls and more `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Taxi Trip Data 2009- `_ * `OpenFlights - airport, airline and route data `_ @@ -487,8 +487,8 @@ Complementary Collections * DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ * Quora: `Where can I find large datasets open to the public? 
`_ -* RS.io: `100+ Interesting Data Sets for Statistics `_ +* RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ -* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ +* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ * Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ From b2480be7a992c68dd396c94ddbb3d92037712b77 Mon Sep 17 00:00:00 2001 From: smblance Date: Sat, 21 Nov 2015 01:43:27 +0300 Subject: [PATCH 107/276] Added Russian official data portal --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index faad736..b96b1f6 100644 --- a/README.rst +++ b/README.rst @@ -233,6 +233,7 @@ Government * `Portland, Oregon `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ +* `Russia `_ * `San Francisco Data sets `_ * `Seattle `_ * `Singapore Government Data `_ From 756301186fe8a17282e1fcc373224218a3b14456 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Sat, 21 Nov 2015 18:00:46 +0800 Subject: [PATCH 108/276] Fix issue metioned in #120 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index feb485d..f897c5c 100644 --- a/README.rst +++ b/README.rst @@ -302,7 +302,7 @@ Machine Learning * `More Song Datasets `_ * `MovieLens Data Sets `_ * `RDataMining - "R and Data Mining" ebook data `_ -* `Registered Meteorites on Earth `_ +* `Registered Meteorites on Earth `_ * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! 
Ratings and Classification Data `_ From ad05c35998ef624006abab0cec1af9a4010a6d90 Mon Sep 17 00:00:00 2001 From: Sally Jenkinson Date: Wed, 2 Dec 2015 10:55:04 +0000 Subject: [PATCH 109/276] Added reference to Open-ODS API (an open API for the structure of the UK National Health Service) --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index f897c5c..209ed58 100644 --- a/README.rst +++ b/README.rst @@ -268,6 +268,7 @@ Healthcare * `Medicare Data File `_ * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ +* `Open-ODS (structure of the UK NHS) `_ Image Processing From 1788e514c206829a274c4e76402fb651fd6a1d22 Mon Sep 17 00:00:00 2001 From: Derwin McGeary Date: Tue, 8 Dec 2015 00:33:13 +0000 Subject: [PATCH 110/276] Add worldclim.org This site has good global climate data. Licence: "This dataset is freely available for academic and other non-commercial use. Redistribution, or commercial use is not allowed without prior permission." --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index f897c5c..88b0cc5 100644 --- a/README.rst +++ b/README.rst @@ -62,6 +62,7 @@ Climate/Weather * `NOAA Climate Datasets `_ * `NOAA Realtime Weather Models `_ * `The World Bank Open Data Resources for Climate Change `_ +* `WorldClim - Global Climate Data `_ * `UEA Climatic Research Unit `_ * `WU Historical Weather Worldwide `_ From bf5e282f438f4f43af02f99db260f445d172acf1 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Tue, 8 Dec 2015 13:23:43 +0800 Subject: [PATCH 111/276] Add TCGA #126; Clear format. --- README.rst | 71 +++++++++++++++++++++++++++++------------------------- 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/README.rst b/README.rst index 7c94bde..a7aab67 100644 --- a/README.rst +++ b/README.rst @@ -5,7 +5,7 @@ Awesome Public Datasets :target: https://github.com/sindresorhus/awesome .. 
image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg :target: https://travis-ci.org/caesar0301/awesome-public-datasets - + `This list of public data sources `_ are collected and tidied from blogs, answers, and user reponses. Most of the data sets listed below are free, however, some are not. @@ -27,12 +27,11 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ +* `EBI ArrayExrepss `_ +* `ENCODE project `_ * `Gene Expression Omnibus (GEO) `_ * `Gene Ontology (GO) `_ * `Global Biotic Interations (GloBI) `_ -* `Sequence Read Archive(SRA) `_ -* `EBI ArrayExrepss `_ -* `ENCODE project `_ * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ * `MIT Cancer Genomics Data `_ @@ -42,11 +41,12 @@ Biology * `Protein Data Bank `_ * `PubChem Project `_ * `PubGene (now Coremine Medical) `_ +* `Sequence Read Archive(SRA) `_ * `Stanford Microarray Data `_ +* `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ * `UniGene `_ -* `The Catalogue of Life `_ Climate/Weather @@ -62,8 +62,8 @@ Climate/Weather * `NOAA Climate Datasets `_ * `NOAA Realtime Weather Models `_ * `The World Bank Open Data Resources for Climate Change `_ -* `WorldClim - Global Climate Data `_ * `UEA Climatic Research Unit `_ +* `WorldClim - Global Climate Data `_ * `WU Historical Weather Worldwide `_ @@ -114,8 +114,8 @@ Data Challenges --------------- * `Challenges in Machine Learning `_ -* `D4D Challenge of Orange `_ * `CrowdANALYTIX dataX `_ +* `D4D Challenge of Orange `_ * `DrivenData Competitions for Social Good `_ * `ICWSM Data Challenge (since 2009) `_ * `Kaggle Competition Data `_ @@ -166,8 +166,9 @@ Finance Geology ------- -* `USGS Earthquake Archives `_ + * `Smithsonian Institution Global Volcano and Eruption Database `_ +* `USGS Earthquake Archives `_ GeoSpace/GIS @@ -181,14 +182,14 @@ GeoSpace/GIS * `GeoNames Worldwide `_ * `Global Administrative Areas 
Database (GADM) `_ * `Landsat 8 on AWS `_ +* `List of all countries in all languages `_ * `Natural Earth - vectors and rasters of the world `_ +* `OpenAddresses `_ * `OpenStreetMap (OSM) `_ * `TIGER/Line - U.S. boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ * `World countries in multiple formats `_ -* `List of all countries in all languages `_ -* `OpenAddresses `_ Government @@ -232,6 +233,7 @@ Government * `Open Government Data (OGD) Platform India `_ * `Oregon `_ * `Portland, Oregon `_ +* `Puerto Rico Government `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ * `Russia `_ @@ -240,22 +242,21 @@ Government * `Singapore Government Data `_ * `South Africa `_ * `Switzerland `_ -* `The World Bank `_ * `Texas Open Data `_ -* `Puerto Rico Government `_ +* `The World Bank `_ * `U.K. Government Data `_ -* `Uruguay `_ * `U.S. American Community Survey `_ * `U.S. CDC Public Health datasets `_ * `U.S. Census Bureau `_ -* `U.S. National Center for Education Statistics (NCES) `_ * `U.S. Department of Housing and Urban Development (HUD) `_ * `U.S. Federal Government Agencies `_ * `U.S. Federal Government Data Catalog `_ * `U.S. Food and Drug Administration (FDA) `_ +* `U.S. National Center for Education Statistics (NCES) `_ * `U.S. 
Open Government `_ * `UK 2011 Census Open Atlas Project `_ * `United Nations `_ +* `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ @@ -270,6 +271,7 @@ Healthcare * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ +* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ Image Processing @@ -277,17 +279,17 @@ Image Processing * `10k US Adult Faces Database `_ * `2GB of Photos of Cats (Original down - 20Agst2015) `_ or `Archive version `_ -* `Stanford Dogs Dataset `_ -* `The Oxford-IIIT Pet Dataset `_ -* `Animals with attributes `_ * `Affective Image Classification `_ +* `Animals with attributes `_ * `Face Recognition Benchmark `_ * `ImageNet (in WordNet hierarchy) `_ +* `Indoor Scene Recognition `_ * `International Affective Picture System, UFL `_ * `Massive Visual Memory Stimuli, MIT `_ +* `Stanford Dogs Dataset `_ * `SUN database, MIT `_ +* `The Oxford-IIIT Pet Dataset `_ * `YouTube Faces Database `_ -* `Indoor Scene Recognition `_ Machine Learning @@ -334,8 +336,8 @@ Natural Language * `Gutenberg eBooks List `_ * `Hansards text chunks of Canadian Parliament `_ * `Machine Translation of European languages `_ -* `SMS Spam Collection in English `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ +* `SMS Spam Collection in English `_ * `USENET postings corpus of 2005~2011 `_ * `Wikidata - Wikipedia databases `_ * `Wikipedia Links data - 40 Million Entities in Context `_ @@ -346,10 +348,11 @@ Physics ------- * `CERN Open Data Portal `_ -* `NSSDC (NASA) data of 550 space spacecraft `_ * `NASA Exoplanet Archive `_ +* `NSSDC (NASA) data of 550 space spacecraft `_ * `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_ + Psychology/Cognition -------------- @@ -395,6 +398,7 @@ Search Engines * `Open Data Certificates (beta) `_ * `Statista.com - statistics and Studies `_ + Social Networks 
--------------- @@ -405,6 +409,7 @@ Social Networks * `Social Twitter Data `_ * `Twitter Data for Sentiment Analysis `_ + Social Sciences --------------- @@ -414,19 +419,23 @@ Social Sciences * `Facebook Data Scrape (2005) `_ * `Facebook Social Networks from LAW (since 2007) `_ * `FBI Hate Crime 2013 - aggregated data `_ -* `Foursquare Social Network in 2010, 2011 `_ * `Foursquare from UMN/Sarwat (2013) `_ +* `Foursquare Social Network in 2010, 2011 `_ +* `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ +* `Google Scholar citation relations `_ * `MIT Reality Mining Dataset `_ * `Mobile Social Networks from UMASS `_ * `PewResearch Internet Survey Project `_ +* `Political Polarity Data `_ * `Reddit Comments `_ +* `Skytrax' Air Travel Reviews Dataset `_ * `SourceForge.net Research Data `_ * `StackExchange Data Explorer `_ -* `Titanic Survival Data Set `_ * `Texas Inmates Executed Since 1984 `_ +* `Titanic Survival Data Set `_ * `Twitter Graph of entire Twitter site `_ * `UCB's Archive of Social Science Data (D-Lab) `_ * `UCLA Social Sciences Data Archive `_ @@ -435,10 +444,6 @@ Social Sciences * `UPJOHN for Labor Employment Research `_ * `Yahoo! 
Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ -* `Google Scholar citation relations `_ -* `Political Polarity Data `_ -* `GDELT Global Events Database `_ -* `Skytrax' Air Travel Reviews Dataset `_ Sports @@ -455,23 +460,24 @@ Sports Time Series ----------- -* `Time Series Data Library (TSDL) from MU `_ -* `UC Riverside Time Series Dataset `_ * `Hard Drive Failure Rates `_ * `Heart Rate Time Series from MIT `_ +* `Time Series Data Library (TSDL) from MU `_ +* `UC Riverside Time Series Dataset `_ Transportation -------------- * `Airlines OD Data 1987-2008 `_ -* `Bike Share Systems (BSS) collection `_ * `Bay Area Bike Share Data `_ +* `Bike Share Systems (BSS) collection `_ * `GeoLife GPS Trajectory from Microsoft Research `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ -* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Taxi Trip Data 2009- `_ +* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ +* `NYC Uber trip data April 2014 to September 2014 `_ * `OpenFlights - airport, airline and route data `_ * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ @@ -481,7 +487,6 @@ Transportation * `U.S. Bureau of Transportation Statistics (BTS) `_ * `U.S. Domestic Flights 1990 to 2009 `_ * `U.S. Freight Analysis Framework since 2007 `_ -* `NYC Uber trip data April 2014 to September 2014 `_ Complementary Collections @@ -489,9 +494,9 @@ Complementary Collections * DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ +* OpenDataMonitor: `An overview of available open data resources in Europe `_ +* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ * Quora: `Where can I find large datasets open to the public? 
`_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ -* OpenDataMonitor: `An overview of available open data resources in Europe `_ -* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ * Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ From 13c09f81d8e3e8fe4d4a10a4801791b26aed894b Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 8 Dec 2015 14:47:50 +0800 Subject: [PATCH 112/276] Remove dup UCI net data --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index a7aab67..91bbec8 100644 --- a/README.rst +++ b/README.rst @@ -83,7 +83,6 @@ Complex Networks * `The Koblenz Network Collection `_ * `The Laboratory for Web Algorithmics (UNIMI) `_ * `The Nexus Network Repository `_ -* `UCI Network Data Repository `_ * `UCI Network Data Repository `_ * `UFL sparse matrix collection `_ * `WSU Graph Database `_ From 64d40f70bfe6aba3e9b873a6b1f10ae32a1fa1f2 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 8 Dec 2015 15:25:12 +0800 Subject: [PATCH 113/276] Fix link of JPJOHN employment research data --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 91bbec8..cc3e00a 100644 --- a/README.rst +++ b/README.rst @@ -440,7 +440,7 @@ Social Sciences * `UCLA Social Sciences Data Archive `_ * `UNIMI/LAW Social Network Datasets `_ * `Universities Worldwide `_ -* `UPJOHN for Labor Employment Research `_ +* `UPJOHN for Labor Employment Research `_ * `Yahoo! 
Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ From 078e4e353d5bdc5e2dccbfd8b42356c9f0389613 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 8 Dec 2015 16:05:03 +0800 Subject: [PATCH 114/276] FQ 2011, 2012 data link dead --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index cc3e00a..d0327b8 100644 --- a/README.rst +++ b/README.rst @@ -419,7 +419,6 @@ Social Sciences * `Facebook Social Networks from LAW (since 2007) `_ * `FBI Hate Crime 2013 - aggregated data `_ * `Foursquare from UMN/Sarwat (2013) `_ -* `Foursquare Social Network in 2010, 2011 `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ From 4a13b653df46528a448193f47e466b2ef98ede7d Mon Sep 17 00:00:00 2001 From: Erich Schubert Date: Wed, 16 Dec 2015 10:20:48 +0100 Subject: [PATCH 115/276] Add fast reverse-geocoder using OSM data --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index d0327b8..3ddb381 100644 --- a/README.rst +++ b/README.rst @@ -185,6 +185,7 @@ GeoSpace/GIS * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ +* `Reverse Geocoder using OSM data `_ (capable of 1m lookups/s); `additional high-resolution data files `_ * `TIGER/Line - U.S. 
boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ From 334650ec541bffa7fef49647230b1e15f3e1b23e Mon Sep 17 00:00:00 2001 From: Xiaming Date: Wed, 16 Dec 2015 20:03:54 +0800 Subject: [PATCH 116/276] Update README.rst --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 3ddb381..bed310f 100644 --- a/README.rst +++ b/README.rst @@ -185,7 +185,7 @@ GeoSpace/GIS * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ -* `Reverse Geocoder using OSM data `_ (capable of 1m lookups/s); `additional high-resolution data files `_ +* `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ * `TIGER/Line - U.S. boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ From b2d27fdf2e4eb3c9461d27d6debf0a17516c5b9b Mon Sep 17 00:00:00 2001 From: Derwin McGeary Date: Wed, 16 Dec 2015 12:31:28 +0000 Subject: [PATCH 117/276] Add Ensemble Genomes --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index bed310f..a0c3133 100644 --- a/README.rst +++ b/README.rst @@ -29,6 +29,7 @@ Biology * `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `EBI ArrayExrepss `_ * `ENCODE project `_ +* `Ensembl Genomes `_ * `Gene Expression Omnibus (GEO) `_ * `Gene Ontology (GO) `_ * `Global Biotic Interations (GloBI) `_ From c51feea86cb4b212a73a6e76bd31c9ccab2f760b Mon Sep 17 00:00:00 2001 From: Marcus Emmanuel Barnes Date: Thu, 17 Dec 2015 14:44:45 -0800 Subject: [PATCH 118/276] Update README.rst Add the Canada Science and Technology Museums Corporation open data page under the Museums section. 
--- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index bed310f..c4bfe91 100644 --- a/README.rst +++ b/README.rst @@ -321,6 +321,7 @@ Museums * `Rijksmuseum Historical Art Collection `_ * `Tate Collection metadata `_ * `The Getty vocabularies `_ +* `Canada Science and Technology Museums Corporation's Open Data `_ Natural Language From b7edc5cb38ac8113af042cb6c270df670d04f197 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Mon, 21 Dec 2015 18:50:36 +0800 Subject: [PATCH 119/276] Add German train data. #131 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 5ea84a6..5f828fe 100644 --- a/README.rst +++ b/README.rst @@ -474,6 +474,7 @@ Transportation * `Bay Area Bike Share Data `_ * `Bike Share Systems (BSS) collection `_ * `GeoLife GPS Trajectory from Microsoft Research `_ +* `German train system by Deutsche Bahn `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ * `NYC Taxi Trip Data 2009- `_ From 1c336735d59a068e1766b551200fb04dca7a0576 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 21 Dec 2015 08:36:13 -0800 Subject: [PATCH 120/276] Update README URLs based on HTTP redirects --- README.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.rst b/README.rst index 5f828fe..e6bca1b 100644 --- a/README.rst +++ b/README.rst @@ -56,7 +56,7 @@ Climate/Weather * `Australian Weather `_ * `Brazilian Weather - Historical data (In Portuguese) `_ * `Canadian Meteorological Centre `_ -* `Climate Data from UEA (updated monthly) `_ +* `Climate Data from UEA (updated monthly) `_ * `Global Climate Data Since 1929 `_ * `NASA Global Imagery Browse Services `_ * `NOAA Bering Sea Climate `_ @@ -186,7 +186,7 @@ GeoSpace/GIS * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ -* `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ +* `Reverse Geocoder 
using OSM data `_ & `additional high-resolution data files `_ * `TIGER/Line - U.S. boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ @@ -216,7 +216,7 @@ Government * `France `_ * `Germany `_ * `Ghent, Belgium `_ -* `Glasgow, Scotland, UK `_ +* `Glasgow, Scotland, UK `_ * `Guardian world governments `_ * `Houston Open Data `_ * `Indian Government Data `_ @@ -423,7 +423,7 @@ Social Sciences * `FBI Hate Crime 2013 - aggregated data `_ * `Foursquare from UMN/Sarwat (2013) `_ * `GDELT Global Events Database `_ -* `General Social Survey (GSS) since 1972 `_ +* `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ @@ -484,7 +484,7 @@ Transportation * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ -* `Transport for London (TFL) `_ +* `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ * `U.S. 
Domestic Flights 1990 to 2009 `_ From 716778896a44d414b6d97f1c28803bbdb00695fb Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 22 Dec 2015 10:50:15 +0800 Subject: [PATCH 121/276] Add MCTest #133 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index e6bca1b..d15388c 100644 --- a/README.rst +++ b/README.rst @@ -338,6 +338,7 @@ Natural Language * `Gutenberg eBooks List `_ * `Hansards text chunks of Canadian Parliament `_ * `Machine Translation of European languages `_ +* `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ * `USENET postings corpus of 2005~2011 `_ From bc53888819a0c773e847d4647c89fff5b4f1b965 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Tue, 22 Dec 2015 12:52:22 +0800 Subject: [PATCH 122/276] Add awesome_bot to check links; Repair several dead links reported by awesome_bot. #130 --- .travis.yml | 13 ++++++------- README.rst | 8 ++++---- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/.travis.yml b/.travis.yml index 2353d2e..07ebbc1 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,8 +1,7 @@ -language: bash -sudo: false - -before_install: - - set -e - +language: ruby +rvm: + - 2.2 +before_script: + - gem install awesome_bot script: - - ag -o '([^\s<>]+://[^\s<>]+)' README.rst | while read url; do if [[ "$url" != '' ]]; then echo "$url"; curl --output /dev/null --silent --head --fail --location --insecure "$url" || exit 1; fi; done + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list travis \ No newline at end of file diff --git a/README.rst b/README.rst index d15388c..7ca5f13 100644 --- a/README.rst +++ b/README.rst @@ -36,7 +36,7 @@ Biology * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ * `MIT Cancer Genomics Data `_ -* `NIH Microarray data (FTP) `_ +* `NIH Microarray data (FTP) `_ * `OpenSNP genotypes data `_ * `Pathguid: 
Protein-Protein Interactions Catalog `_ * `Protein Data Bank `_ @@ -217,7 +217,7 @@ Government * `Germany `_ * `Ghent, Belgium `_ * `Glasgow, Scotland, UK `_ -* `Guardian world governments `_ +* `Guardian world governments `_ * `Houston Open Data `_ * `Indian Government Data `_ * `Indonesian Data Portal `_ @@ -233,7 +233,7 @@ Government * `Oklahoma `_ * `Open Government Data (OGD) Platform India `_ * `Oregon `_ -* `Portland, Oregon `_ +* `Portland, Oregon `_ * `Puerto Rico Government `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ @@ -425,7 +425,7 @@ Social Sciences * `Foursquare from UMN/Sarwat (2013) `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ -* `GetGlue - users rating TV shows `_ +* `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ * `MIT Reality Mining Dataset `_ From 4a0d02d9a23b7a41e45ef79631fe614b8a569918 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Tue, 22 Dec 2015 13:00:21 +0800 Subject: [PATCH 123/276] Fix NIH FTP link --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 7ca5f13..3618730 100644 --- a/README.rst +++ b/README.rst @@ -36,7 +36,7 @@ Biology * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ * `MIT Cancer Genomics Data `_ -* `NIH Microarray data (FTP) `_ +* `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ * `Pathguid: Protein-Protein Interactions Catalog `_ * `Protein Data Bank `_ From 09ebe4a9a7cd689b3645ecfc6a4823585c95c2ad Mon Sep 17 00:00:00 2001 From: Kai Wolf Date: Tue, 22 Dec 2015 14:25:14 +0100 Subject: [PATCH 124/276] Add some shape-from-silhoutte datasets --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 3618730..cea7d91 100644 --- a/README.rst +++ b/README.rst @@ -291,6 +291,7 @@ Image Processing * `SUN database, MIT `_ * `The Oxford-IIIT Pet Dataset `_ * `YouTube Faces Database `_ +* `Several 
Shape-from-Silhouette Datasets `_ Machine Learning From 628f0ed6943a03bc5ba0a1c0b7549f75f7207bfc Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Wed, 23 Dec 2015 16:04:01 +0800 Subject: [PATCH 125/276] Recheck travis list --- .travis.yml | 3 ++- README.rst | 8 ++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/.travis.yml b/.travis.yml index 07ebbc1..77eab30 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,4 +4,5 @@ rvm: before_script: - gem install awesome_bot script: - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list travis \ No newline at end of file + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,gdeltproject.org,www.cmr.osu.edu,$site404 \ No newline at end of file diff --git a/README.rst b/README.rst index cea7d91..f3242bd 100644 --- a/README.rst +++ b/README.rst @@ -65,7 +65,7 @@ Climate/Weather * `The World Bank Open Data Resources for Climate Change `_ * `UEA Climatic Research Unit `_ * `WorldClim - Global Climate Data `_ -* `WU Historical Weather Worldwide `_ +* `WU Historical Weather Worldwide `_ Complex Networks @@ -119,7 +119,7 @@ Data Challenges * `DrivenData Competitions for Social Good `_ * `ICWSM Data Challenge (since 2009) `_ * `Kaggle Competition Data `_ -* `KDD Cup by Tencent 2012 `_ +* `KDD Cup by Tencent 2012 `_ * `Localytics Data Visualization Challenge `_ * `Netflix Prize `_ * `Space Apps Challenge `_ @@ -211,7 +211,7 @@ Government * `Durham, NC Open Data `_ * `England LGInform `_ * `EuroStat `_ -* `FedStats `_ +* `FedStats `_ * `Finland `_ * `France `_ * `Germany `_ @@ -298,7 +298,7 @@ Machine Learning ---------------- * `Delve Datasets for classification and regression (Univ. 
of Toronto) `_ -* `Discogs Monthly Data `_ +* `Discogs Monthly Data `_ * `eBay Online Auctions (2012) `_ * `IMDb Database `_ * `Keel Repository for classification, regression and time series `_ From b048ab1afe35ddc0f133c52095917d5634962607 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Wed, 23 Dec 2015 16:15:22 +0800 Subject: [PATCH 126/276] add SSL certificate error to white list --- .travis.yml | 2 +- README.rst | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index 77eab30..9bead36 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,4 +5,4 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,gdeltproject.org,www.cmr.osu.edu,$site404 \ No newline at end of file + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,gdeltproject.org,www.cmr.osu.edu,wiki.earthdata.nasa.gov,weather.gc.ca \ No newline at end of file diff --git a/README.rst b/README.rst index f3242bd..23b101a 100644 --- a/README.rst +++ b/README.rst @@ -55,7 +55,7 @@ Climate/Weather * `Australian Weather `_ * `Brazilian Weather - Historical data (In Portuguese) `_ -* `Canadian Meteorological Centre `_ +* `Canadian Meteorological Centre `_ * `Climate Data from UEA (updated monthly) `_ * `Global Climate Data Since 1929 `_ * `NASA Global Imagery Browse Services `_ From 0b0766acc933e5897284254e883f411bae3bbdeb Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Wed, 23 Dec 2015 16:25:56 +0800 Subject: [PATCH 127/276] Update 2G cat link --- .travis.yml | 2 +- README.rst | 2 +- 2 files changed, 2 
insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index 9bead36..23b0500 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,4 +5,4 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,gdeltproject.org,www.cmr.osu.edu,wiki.earthdata.nasa.gov,weather.gc.ca \ No newline at end of file + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov \ No newline at end of file diff --git a/README.rst b/README.rst index 23b101a..ee795d9 100644 --- a/README.rst +++ b/README.rst @@ -279,7 +279,7 @@ Image Processing ---------------- * `10k US Adult Faces Database `_ -* `2GB of Photos of Cats (Original down - 20Agst2015) `_ or `Archive version `_ +* `2GB of Photos of Cats `_ or `Archive version `_ * `Affective Image Classification `_ * `Animals with attributes `_ * `Face Recognition Benchmark `_ From e6dc40ad8583fcef174523639f9f84ff81d88bf7 Mon Sep 17 00:00:00 2001 From: Ignacio Peluffo Date: Wed, 23 Dec 2015 11:04:00 -0300 Subject: [PATCH 128/276] Datasets from Argentina added Datasets from Argentina added to the Government list. 
I added two open data resources for Argentina and one for Buenos Aires --- README.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.rst b/README.rst index ee795d9..ca5a3fb 100644 --- a/README.rst +++ b/README.rst @@ -197,12 +197,15 @@ Government ---------- * `Antwerp, Belgium `_ +* `Argentina `_ +* `Argentina (non official) `_ * `Austin, TX, US `_ * `Australia (abs.gov.au) `_ * `Australia (data.gov.au) `_ * `Austria (data.gv.at) `_ * `Belgium `_ * `Brazil `_ +* `Buenos Aires, Argentina `_ * `Cambridge, MA, US `_ * `Canada `_ * `Chicago `_ From 19647877e14a000ab1f0b0a36f09da524f8e3268 Mon Sep 17 00:00:00 2001 From: Marcus Emmanuel Barnes Date: Wed, 23 Dec 2015 13:59:55 -0800 Subject: [PATCH 129/276] Update README.rst MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Government of British Columbia (Canada) data portal, which includes access to over 1,500 data sets licensed under the Open Government License – British Columbia. --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index ca5a3fb..bddaacc 100644 --- a/README.rst +++ b/README.rst @@ -262,6 +262,7 @@ Government * `United Nations `_ * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ +* `DataBC - data from the Province of British Columbia `_ Healthcare From 309c82668d8bc6b29b9dc6d3ea27f777af60b97b Mon Sep 17 00:00:00 2001 From: Tim Carnus Date: Thu, 24 Dec 2015 00:22:56 +0000 Subject: [PATCH 130/276] Adding european climate assessment dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index ca5a3fb..fa325d1 100644 --- a/README.rst +++ b/README.rst @@ -57,6 +57,7 @@ Climate/Weather * `Brazilian Weather - Historical data (In Portuguese) `_ * `Canadian Meteorological Centre `_ * `Climate Data from UEA (updated monthly) `_ +* `European Climate Assessment & Dataset `_ * `Global Climate Data Since 1929 `_ * `NASA Global Imagery Browse Services `_ * `NOAA Bering Sea Climate `_ From 
c178c90b66e52f66ac6527851cf8258eb0a68f6f Mon Sep 17 00:00:00 2001 From: Camilo Nova Date: Tue, 29 Dec 2015 13:58:26 -0500 Subject: [PATCH 131/276] Fix typo --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index d455a4c..ff6a750 100644 --- a/README.rst +++ b/README.rst @@ -7,7 +7,7 @@ Awesome Public Datasets :target: https://travis-ci.org/caesar0301/awesome-public-datasets `This list of public data sources `_ -are collected and tidied from blogs, answers, and user reponses. +are collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free, however, some are not. Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and From 795252c7f76ae835553e52ec5d60cfd93477a907 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Wed, 30 Dec 2015 17:18:44 +0800 Subject: [PATCH 132/276] 1. Add society data from Pew Research Center; 2. Merge social networks into social science; --- README.rst | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/README.rst b/README.rst index ff6a750..1f8b63d 100644 --- a/README.rst +++ b/README.rst @@ -13,8 +13,6 @@ Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and `sindresorhus's awesome `_ list. 
-* `Visit our Google Group on APD `_ - Agriculture ------------ @@ -339,12 +337,13 @@ Natural Language * `ClueWeb12 FACC `_ * `DBpedia - 4.58M things with 583M facts `_ * `Flickr Personal Taxonomies `_ +* `Freebase.com of people, places, and things `_ * `Google Books Ngrams (2.2TB) `_ * `Google Web 5gram (1TB, 2006) `_ * `Gutenberg eBooks List `_ * `Hansards text chunks of Canadian Parliament `_ -* `Machine Translation of European languages `_ * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ +* `Machine Translation of European languages `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ * `USENET postings corpus of 2005~2011 `_ @@ -401,28 +400,18 @@ Search Engines * `Archive-it from Internet Archive `_ * `Datahub.io `_ * `DataMarket (Qlik) `_ -* `Freebase.com of people, places, and things `_ * `Harvard Dataverse Network of scientific data `_ * `ICPSR (UMICH) `_ * `Open Data Certificates (beta) `_ * `Statista.com - statistics and Studies `_ -Social Networks ---------------- - -* `72 hours #gamergate scrape `_ -* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ -* `May 2011 Calufa Twitter Scrape `_ -* `Network Twitter Data `_ -* `Social Twitter Data `_ -* `Twitter Data for Sentiment Analysis `_ - - Social Sciences --------------- +* `72 hours #gamergate scrape `_ * `Ancestry.com Forum Dataset over 10 years `_ +* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_ * `CMU Enron Email of 150 users `_ * `EDRM Enron EMail of 151 users, hosted on S3 `_ * `Facebook Data Scrape (2005) `_ @@ -436,15 +425,20 @@ Social Sciences * `Google Scholar citation relations `_ * `MIT Reality Mining Dataset `_ * `Mobile Social Networks from UMASS `_ +* `Network Twitter Data `_ * `PewResearch Internet Survey Project `_ +* `PewResearch Society Data Collection `_ * `Political Polarity Data `_ * `Reddit Comments `_ * `Skytrax' Air Travel Reviews Dataset `_ +* 
`Social Twitter Data `_ * `SourceForge.net Research Data `_ * `StackExchange Data Explorer `_ * `Texas Inmates Executed Since 1984 `_ * `Titanic Survival Data Set `_ +* `Twitter Data for Sentiment Analysis `_ * `Twitter Graph of entire Twitter site `_ +* `Twitter Scrape Calufa May 2011 `_ * `UCB's Archive of Social Science Data (D-Lab) `_ * `UCLA Social Sciences Data Archive `_ * `UNIMI/LAW Social Network Datasets `_ From fbf46c30e2d0ba9a702a04d071ebad162b409e61 Mon Sep 17 00:00:00 2001 From: Herman Slatman Date: Thu, 31 Dec 2015 00:44:45 +0100 Subject: [PATCH 133/276] OpenCorporates database of companies --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 1f8b63d..5abeef2 100644 --- a/README.rst +++ b/README.rst @@ -132,7 +132,7 @@ Economics * `American Economic Ass (AEA) `_ * `EconData from UMD `_ * `Internet Product Code Database `_ - +* `OpenCorporates Database of Companies in the World `_ Energy ------ From 9d895a6473b9d51ed41ea88c94de8059625b1b96 Mon Sep 17 00:00:00 2001 From: CW Dillon Date: Wed, 30 Dec 2015 20:45:52 -0500 Subject: [PATCH 134/276] Adding a few data sources from my data bookmarks --- README.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/README.rst b/README.rst index 1f8b63d..17d84b3 100644 --- a/README.rst +++ b/README.rst @@ -264,6 +264,7 @@ Government * `DataBC - data from the Province of British Columbia `_ + Healthcare ---------- @@ -446,6 +447,12 @@ Social Sciences * `UPJOHN for Labor Employment Research `_ * `Yahoo! 
Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ +* `The MacroData Guide - Norsk samfunnsvitenskapelig datatjeneste`_ +* `Cryptome - Random Government Items `_ +* ``_ +* ``_ +* ``_ +* ``_ Sports From f9cdb924cd3767a69bbaf6b549e411af3d45f959 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Wed, 30 Dec 2015 23:52:03 -0500 Subject: [PATCH 135/276] New data sources from Canada Added Canada and other miscellaneous open data sources --- README.rst | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/README.rst b/README.rst index 5abeef2..1e972fd 100644 --- a/README.rst +++ b/README.rst @@ -195,6 +195,7 @@ GeoSpace/GIS Government ---------- +* `Alberta, Province of Canada `_ * `Antwerp, Belgium `_ * `Argentina `_ * `Argentina (non official) `_ @@ -202,31 +203,43 @@ Government * `Australia (abs.gov.au) `_ * `Australia (data.gov.au) `_ * `Austria (data.gv.at) `_ +* `Baton Rouge, LA, US `_ * `Belgium `_ * `Brazil `_ * `Buenos Aires, Argentina `_ +* `Calgary, AB, Canada ` * `Cambridge, MA, US `_ * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ * `Denver Open Data `_ * `Durham, NC Open Data `_ +* `Edmonton, AB, Canada `_ * `England LGInform `_ * `EuroStat `_ * `FedStats `_ * `Finland `_ * `France `_ +* `Fredericton, NB, Canada `_ +* `Gatineau, QC, Canada `_ * `Germany `_ * `Ghent, Belgium `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ +* `Halifax, NS, Canada ` +* `Helsinki Region, Finland ` * `Houston Open Data `_ * `Indian Government Data `_ * `Indonesian Data Portal `_ +* `Laval, QC, Canada `_ +* `London, ON, Canada `_ * `London Datastore, UK `_ * `Los Angeles Open Data `_ * `MassGIS, Massachusetts, U.S. 
`_ * `Mexico `_ +* `Missisauga, ON, Canada `_ +* `Moncton, NB, Canada `_ +* `Montreal, QC, Canada `_ * `Netherlands `_ * `New Zealand `_ * `NYC betanyc `_ @@ -235,18 +248,25 @@ Government * `Oklahoma `_ * `Open Government Data (OGD) Platform India `_ * `Oregon `_ +* `Ottawa, ON, Canada `_ * `Portland, Oregon `_ * `Puerto Rico Government `_ +* `Quebec City, QC, Canada `_ +* `Quebec Province of Canada `_ +* `Regina SK, Canada `_ * `Rio de Janeiro, Brazil `_ * `Romania `_ * `Russia `_ * `San Francisco Data sets `_ +* `Saskatchewan, Province of Canada `_ * `Seattle `_ * `Singapore Government Data `_ * `South Africa `_ +* `State of Utah, US `_ * `Switzerland `_ * `Texas Open Data `_ * `The World Bank `_ +* `Toronto, ON, Canada ` * `U.K. Government Data `_ * `U.S. American Community Survey `_ * `U.S. CDC Public Health datasets `_ @@ -261,6 +281,7 @@ Government * `United Nations `_ * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ +* `Victoria, BC, Canada `_ * `DataBC - data from the Province of British Columbia `_ @@ -296,6 +317,10 @@ Image Processing * `YouTube Faces Database `_ * `Several Shape-from-Silhouette Datasets `_ +Legal +---------------- + +* `Canadian Legal Information Institute `_ Machine Learning ---------------- @@ -478,6 +503,7 @@ Transportation * `German train system by Deutsche Bahn `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ +* `Montreal BIXI Bike Share `_ * `NYC Taxi Trip Data 2009- `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Uber trip data April 2014 to September 2014 `_ @@ -485,6 +511,7 @@ Transportation * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ +* `Toronto Bike Share Stations (XML file) `_ * `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. 
Bureau of Transportation Statistics (BTS) `_ From 4c94713af0070aaa7f35d70f6a9af9c34125c10d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Wed, 30 Dec 2015 23:55:15 -0500 Subject: [PATCH 136/276] Update README.rst --- README.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 1e972fd..8f1e641 100644 --- a/README.rst +++ b/README.rst @@ -207,7 +207,7 @@ Government * `Belgium `_ * `Brazil `_ * `Buenos Aires, Argentina `_ -* `Calgary, AB, Canada ` +* `Calgary, AB, Canada `_ * `Cambridge, MA, US `_ * `Canada `_ * `Chicago `_ @@ -226,8 +226,8 @@ Government * `Ghent, Belgium `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ -* `Halifax, NS, Canada ` -* `Helsinki Region, Finland ` +* `Halifax, NS, Canada `_ +* `Helsinki Region, Finland `_ * `Houston Open Data `_ * `Indian Government Data `_ * `Indonesian Data Portal `_ @@ -266,7 +266,7 @@ Government * `Switzerland `_ * `Texas Open Data `_ * `The World Bank `_ -* `Toronto, ON, Canada ` +* `Toronto, ON, Canada `_ * `U.K. Government Data `_ * `U.S. American Community Survey `_ * `U.S. 
CDC Public Health datasets `_ From 549c99ca14f005bff5674391c5817e872d6b19f0 Mon Sep 17 00:00:00 2001 From: CW Dillon Date: Thu, 31 Dec 2015 08:24:05 -0500 Subject: [PATCH 137/276] Adding a few data sources from my data bookmarks --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 17d84b3..d1c88e4 100644 --- a/README.rst +++ b/README.rst @@ -449,7 +449,7 @@ Social Sciences * `Youtube Video Social Graph in 2007,2008 `_ * `The MacroData Guide - Norsk samfunnsvitenskapelig datatjeneste`_ * `Cryptome - Random Government Items `_ -* ``_ +* `Datacards`_ * ``_ * ``_ * ``_ From c990c1085eb8b3d84d801c0b2def5c45643a9e05 Mon Sep 17 00:00:00 2001 From: usuallycwdillon Date: Thu, 31 Dec 2015 14:56:58 -0500 Subject: [PATCH 138/276] Added several links from my personal bookmarks --- README.rst | 55 +++++++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 46 insertions(+), 9 deletions(-) diff --git a/README.rst b/README.rst index 7dc7afa..ffa9c01 100644 --- a/README.rst +++ b/README.rst @@ -86,6 +86,7 @@ Complex Networks * `UCI Network Data Repository `_ * `UFL sparse matrix collection `_ * `WSU Graph Database `_ +* `Stanford Longitudnal Network Data Sources `_ Computer Networks @@ -133,6 +134,19 @@ Economics * `EconData from UMD `_ * `Internet Product Code Database `_ * `OpenCorporates Database of Companies in the World `_ +* `Joint External Debt Data Hub `_ +* `The Atlas of Economic Complexity `_ +* `The Observatory of Economic Complexity `_ +* `The Center for International Data `_ +* `UN Commodity Trade Statistics `_ +* `UN Human Development Reports `_ +* `International Trade Statistics `_ +* `Historical MacroEconomc Statistics `_ +* `SciencesPo World Trade Gravity Datasets `_ +* `Jon Haveman International Trade Data Links `_ +* `Economic Freedom of the World Data `_ +* `Our World in Data `_ + Energy ------ @@ -163,11 +177,13 @@ Finance * `St Louis Federal `_ * `Yahoo Finance `_ + Geology ------- * 
`Smithsonian Institution Global Volcano and Eruption Database `_ * `USGS Earthquake Archives `_ +* `Earth Models `_ GeoSpace/GIS @@ -175,7 +191,7 @@ GeoSpace/GIS * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ +* `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `GeoNames Worldwide `_ @@ -190,6 +206,9 @@ GeoSpace/GIS * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ * `World countries in multiple formats `_ +* `International Institute for Systems Analysis - GIS Datasets `_ +* `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ +* `UN Environmental Data `_ Government @@ -262,6 +281,7 @@ Government * `Seattle `_ * `Singapore Government Data `_ * `South Africa `_ +* `South Africa Trade Statistics `_ * `State of Utah, US `_ * `Switzerland `_ * `Texas Open Data `_ @@ -285,12 +305,11 @@ Government * `DataBC - data from the Province of British Columbia `_ - Healthcare ---------- * `EHDP Large Health Data Sets `_ -* `Gapminder World, demographic databases `_ +* `Gapminder World demographic databases `_ * `Medicare Coverage Database (MCD), U.S. 
`_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ @@ -298,6 +317,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ +* `World Health Organization Global Health Observatory `_ Image Processing @@ -323,6 +343,7 @@ Legal * `Canadian Legal Information Institute `_ + Machine Learning ---------------- @@ -430,6 +451,8 @@ Search Engines * `ICPSR (UMICH) `_ * `Open Data Certificates (beta) `_ * `Statista.com - statistics and Studies `_ +* `Institute of Education Sciences `_ +* `National Technical Reports Library `_ Social Sciences @@ -472,12 +495,23 @@ Social Sciences * `UPJOHN for Labor Employment Research `_ * `Yahoo! Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ -* `The MacroData Guide - Norsk samfunnsvitenskapelig datatjeneste`_ -* `Cryptome - Random Government Items `_ -* `Datacards`_ -* ``_ -* ``_ -* ``_ +* `Correlates of War Project `_ +* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ +* `Cryptome Conspiracy Theory Items `_ +* `Datacards `_ +* `Global Religious Futures Project `_ +* `Institute for Demographic Studies `_ +* `UN Civil Society Database `_ +* `Terrorism Research and Analysis Consortium `_ +* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ +* `International Networks Archive `_ +* `Paul Hensel General International Data Page `_ +* `James McGuire Cross National Data `_ +* `International Studies Compendium Project `_ +* `European Social Survey `_ +* `General Social Survey `_ +* `International Social Survey Program ISSP `_ +* `German Social Survey `_ Sports @@ -498,6 +532,7 @@ Time Series * `Heart Rate Time Series from MIT `_ * `Time Series Data Library (TSDL) from MU `_ * `UC Riverside Time Series Dataset `_ +* `Databanks International Cross National Time Series Data Archive `_ Transportation @@ -537,3 
+572,5 @@ Complementary Collections * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ * Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ +* `Database of Scientific Code Contributions `_ + From d2f8cb854921faaa1a95964a1c82212a53212d9c Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 2 Jan 2016 20:23:00 +0800 Subject: [PATCH 139/276] Clean list format --- .travis.yml | 4 +- README.rst | 115 ++++++++++++++++++++++++++-------------------------- 2 files changed, 61 insertions(+), 58 deletions(-) diff --git a/.travis.yml b/.travis.yml index 23b0500..8a16046 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,4 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,datamob.org,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov \ No newline at end of file + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org + - site503=labrosa.ee.columbia.edu/millionsong,datamob.org + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 \ No newline at end of file diff --git a/README.rst b/README.rst index ffa9c01..db47ca9 100644 --- a/README.rst +++ b/README.rst @@ -36,7 +36,7 @@ Biology * `MIT Cancer Genomics Data `_ * `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ -* `Pathguid: Protein-Protein Interactions Catalog `_ +* `Pathguid - Protein-Protein Interactions Catalog `_ * `Protein Data Bank `_ * 
`PubChem Project `_ * `PubGene (now Coremine Medical) `_ @@ -132,20 +132,20 @@ Economics * `American Economic Ass (AEA) `_ * `EconData from UMD `_ +* `Economic Freedom of the World Data `_ +* `Historical MacroEconomc Statistics `_ +* `International Trade Statistics `_ * `Internet Product Code Database `_ -* `OpenCorporates Database of Companies in the World `_ * `Joint External Debt Data Hub `_ +* `Jon Haveman International Trade Data Links `_ +* `OpenCorporates Database of Companies in the World `_ +* `Our World in Data `_ +* `SciencesPo World Trade Gravity Datasets `_ * `The Atlas of Economic Complexity `_ -* `The Observatory of Economic Complexity `_ * `The Center for International Data `_ +* `The Observatory of Economic Complexity `_ * `UN Commodity Trade Statistics `_ * `UN Human Development Reports `_ -* `International Trade Statistics `_ -* `Historical MacroEconomc Statistics `_ -* `SciencesPo World Trade Gravity Datasets `_ -* `Jon Haveman International Trade Data Links `_ -* `Economic Freedom of the World Data `_ -* `Our World in Data `_ Energy @@ -181,9 +181,9 @@ Finance Geology ------- +* `Earth Models `_ * `Smithsonian Institution Global Volcano and Eruption Database `_ * `USGS Earthquake Archives `_ -* `Earth Models `_ GeoSpace/GIS @@ -194,8 +194,10 @@ GeoSpace/GIS * `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ +* `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ * `GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ +* `International Institute for Systems Analysis - GIS Datasets `_ * `Landsat 8 on AWS `_ * `List of all countries in all languages `_ * `Natural Earth - vectors and rasters of the world `_ @@ -205,10 +207,8 @@ GeoSpace/GIS * `TIGER/Line - U.S. 
boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ -* `World countries in multiple formats `_ -* `International Institute for Systems Analysis - GIS Datasets `_ -* `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ * `UN Environmental Data `_ +* `World countries in multiple formats `_ Government @@ -216,8 +216,8 @@ Government * `Alberta, Province of Canada `_ * `Antwerp, Belgium `_ -* `Argentina `_ * `Argentina (non official) `_ +* `Argentina `_ * `Austin, TX, US `_ * `Australia (abs.gov.au) `_ * `Australia (data.gov.au) `_ @@ -231,6 +231,7 @@ Government * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ +* `DataBC - data from the Province of British Columbia `_ * `Denver Open Data `_ * `Durham, NC Open Data `_ * `Edmonton, AB, Canada `_ @@ -251,8 +252,8 @@ Government * `Indian Government Data `_ * `Indonesian Data Portal `_ * `Laval, QC, Canada `_ -* `London, ON, Canada `_ * `London Datastore, UK `_ +* `London, ON, Canada `_ * `Los Angeles Open Data `_ * `MassGIS, Massachusetts, U.S. 
`_ * `Mexico `_ @@ -302,7 +303,6 @@ Government * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ * `Victoria, BC, Canada `_ -* `DataBC - data from the Province of British Columbia `_ Healthcare @@ -332,16 +332,11 @@ Image Processing * `Indoor Scene Recognition `_ * `International Affective Picture System, UFL `_ * `Massive Visual Memory Stimuli, MIT `_ +* `Several Shape-from-Silhouette Datasets `_ * `Stanford Dogs Dataset `_ * `SUN database, MIT `_ * `The Oxford-IIIT Pet Dataset `_ * `YouTube Faces Database `_ -* `Several Shape-from-Silhouette Datasets `_ - -Legal ----------------- - -* `Canadian Legal Information Institute `_ Machine Learning @@ -367,13 +362,13 @@ Machine Learning Museums ------- +* `Canada Science and Technology Museums Corporation's Open Data `_ * `Cooper-Hewitt's Collection Database `_ * `Minneapolis Institute of Arts metadata `_ * `Natural History Museum (London) Data Portal `_ * `Rijksmuseum Historical Art Collection `_ * `Tate Collection metadata `_ * `The Getty vocabularies `_ -* `Canada Science and Technology Museums Corporation's Open Data `_ Natural Language @@ -409,7 +404,7 @@ Physics Psychology/Cognition --------------- +-------------------- * `OSU Cognitive Modeling Repository Datasets `_ @@ -449,69 +444,77 @@ Search Engines * `DataMarket (Qlik) `_ * `Harvard Dataverse Network of scientific data `_ * `ICPSR (UMICH) `_ -* `Open Data Certificates (beta) `_ -* `Statista.com - statistics and Studies `_ * `Institute of Education Sciences `_ -* `National Technical Reports Library `_ +* `National Technical Reports Library `_ +* `Open Data Certificates (beta) `_ +* `OpenDataNetwork - A search engine of all Socrata powered data portals `_ +* `Statista.com - statistics and Studies `_ +* `Zenodo - An open dependable home for the long-tail of science `_ -Social Sciences +Social Networks --------------- -* `72 hours #gamergate scrape `_ +* `72 hours #gamergate Twitter Scrape `_ * `Ancestry.com Forum Dataset over 10 years `_ * `Cheng-Caverlee-Lee 
September 2009 - January 2010 Twitter Scrape `_ * `CMU Enron Email of 150 users `_ * `EDRM Enron EMail of 151 users, hosted on S3 `_ * `Facebook Data Scrape (2005) `_ * `Facebook Social Networks from LAW (since 2007) `_ -* `FBI Hate Crime 2013 - aggregated data `_ * `Foursquare from UMN/Sarwat (2013) `_ -* `GDELT Global Events Database `_ -* `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ -* `MIT Reality Mining Dataset `_ * `Mobile Social Networks from UMASS `_ * `Network Twitter Data `_ -* `PewResearch Internet Survey Project `_ -* `PewResearch Society Data Collection `_ -* `Political Polarity Data `_ * `Reddit Comments `_ * `Skytrax' Air Travel Reviews Dataset `_ * `Social Twitter Data `_ * `SourceForge.net Research Data `_ -* `StackExchange Data Explorer `_ -* `Texas Inmates Executed Since 1984 `_ -* `Titanic Survival Data Set `_ * `Twitter Data for Sentiment Analysis `_ * `Twitter Graph of entire Twitter site `_ * `Twitter Scrape Calufa May 2011 `_ -* `UCB's Archive of Social Science Data (D-Lab) `_ -* `UCLA Social Sciences Data Archive `_ * `UNIMI/LAW Social Network Datasets `_ -* `Universities Worldwide `_ -* `UPJOHN for Labor Employment Research `_ * `Yahoo! 
Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ + + +Social Sciences +--------------- + +* `Canadian Legal Information Institute `_ +* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ * `Correlates of War Project `_ -* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ * `Cryptome Conspiracy Theory Items `_ * `Datacards `_ +* `European Social Survey `_ +* `FBI Hate Crime 2013 - aggregated data `_ +* `GDELT Global Events Database `_ +* `General Social Survey (GSS) since 1972 `_ +* `General Social Survey `_ +* `German Social Survey `_ * `Global Religious Futures Project `_ * `Institute for Demographic Studies `_ -* `UN Civil Society Database `_ -* `Terrorism Research and Analysis Consortium `_ -* `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ * `International Networks Archive `_ -* `Paul Hensel General International Data Page `_ -* `James McGuire Cross National Data `_ -* `International Studies Compendium Project `_ -* `European Social Survey `_ -* `General Social Survey `_ * `International Social Survey Program ISSP `_ -* `German Social Survey `_ +* `International Studies Compendium Project `_ +* `James McGuire Cross National Data `_ +* `MIT Reality Mining Dataset `_ +* `Paul Hensel General International Data Page `_ +* `PewResearch Internet Survey Project `_ +* `PewResearch Society Data Collection `_ +* `Political Polarity Data `_ +* `StackExchange Data Explorer `_ +* `Terrorism Research and Analysis Consortium `_ +* `Texas Inmates Executed Since 1984 `_ +* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ +* `Titanic Survival Data Set `_ +* `UCB's Archive of Social Science Data (D-Lab) `_ +* `UCLA Social Sciences Data Archive `_ +* `UN Civil Society Database `_ +* `Universities Worldwide `_ +* `UPJOHN for Labor Employment Research `_ Sports @@ -528,11 +531,11 @@ Sports Time Series ----------- +* `Databanks International Cross 
National Time Series Data Archive `_ * `Hard Drive Failure Rates `_ * `Heart Rate Time Series from MIT `_ * `Time Series Data Library (TSDL) from MU `_ * `UC Riverside Time Series Dataset `_ -* `Databanks International Cross National Time Series Data Archive `_ Transportation @@ -564,13 +567,11 @@ Transportation Complementary Collections ------------------------- +* `Database of Scientific Code Contributions `_ * DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ -* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_ * Quora: `Where can I find large datasets open to the public? `_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ -* Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_ -* `Database of Scientific Code Contributions `_ From a9c241aa87edf640e60d3c7e634002e90d664dd9 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Tue, 5 Jan 2016 00:06:18 +0800 Subject: [PATCH 140/276] Remove dup GSS --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index db47ca9..507ea36 100644 --- a/README.rst +++ b/README.rst @@ -492,7 +492,6 @@ Social Sciences * `FBI Hate Crime 2013 - aggregated data `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ -* `General Social Survey `_ * `German Social Survey `_ * `Global Religious Futures Project `_ * `Institute for Demographic Studies `_ From 81cd6895cab4e27bad65842f8de5844e4a0fa19c Mon Sep 17 00:00:00 2001 From: raybuhr Date: Tue, 5 Jan 2016 00:11:23 -0600 Subject: [PATCH 141/276] add http:// prefix to a few links Some of the links returned 404 error messages due to the rst used. 
Rst assumes a link without a prefix is contained in the local directory, though none of the links in this file are. For example, the line * `The Atlas of Economic Complexity `_ would proceed to the url https://github.com/caesar0301/awesome-public-datasets/blob/master/atlas.cid.harvard.edu, resulting in a 404 error. My change prepends http:// to the link so that line now routes to the correct address. New line: * `The Atlas of Economic Complexity `_ --- README.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.rst b/README.rst index 507ea36..df784ac 100644 --- a/README.rst +++ b/README.rst @@ -141,11 +141,11 @@ Economics * `OpenCorporates Database of Companies in the World `_ * `Our World in Data `_ * `SciencesPo World Trade Gravity Datasets `_ -* `The Atlas of Economic Complexity `_ -* `The Center for International Data `_ -* `The Observatory of Economic Complexity `_ -* `UN Commodity Trade Statistics `_ -* `UN Human Development Reports `_ +* `The Atlas of Economic Complexity `_ +* `The Center for International Data `_ +* `The Observatory of Economic Complexity `_ +* `UN Commodity Trade Statistics `_ +* `UN Human Development Reports `_ Energy @@ -488,7 +488,7 @@ Social Sciences * `Correlates of War Project `_ * `Cryptome Conspiracy Theory Items `_ * `Datacards `_ -* `European Social Survey `_ +* `European Social Survey `_ * `FBI Hate Crime 2013 - aggregated data `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ From c7828639c876b92c42c7a853eade81856fa1d750 Mon Sep 17 00:00:00 2001 From: Wes Turner Date: Fri, 8 Jan 2016 07:10:53 -0600 Subject: [PATCH 142/276] DOC: README.rst: .. contents:: --- README.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.rst b/README.rst index df784ac..db0b8de 100644 --- a/README.rst +++ b/README.rst @@ -13,6 +13,9 @@ Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and `sindresorhus's awesome `_ list. +Contents +---------- +.. 
contents:: Agriculture ------------ From bf251cea26eab5597dd232fb523fd85e570ca800 Mon Sep 17 00:00:00 2001 From: Krishna Chaitanya Date: Sun, 10 Jan 2016 11:16:22 +0530 Subject: [PATCH 143/276] Added the dataset 'Labeled Faces in the Wild' --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index df784ac..71a0700 100644 --- a/README.rst +++ b/README.rst @@ -357,6 +357,7 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! Ratings and Classification Data `_ +* `Labeled Faces in the Wild (LFW) `_ Museums From 04400158cecbc810885b2bba7b8bfe7a729d5432 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:28:34 -0800 Subject: [PATCH 144/276] [travis] white list gutenberg.org --- .travis.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index 8a16046..e0e704a 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 \ No newline at end of file + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From ccb87d4fc3fec5e573d82d73c957f39c72f4920a Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:30:10 -0800 Subject: [PATCH 145/276] [travis] 404 
http://www.oecd.org/document/0 --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index e0e704a..36d66e0 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ rvm: before_script: - gem install awesome_bot script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 8e55d64b62931fee77876e66622f4b37b89277cd Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:31:52 -0800 Subject: [PATCH 146/276] [travis] white list donnees.gouv.qc.ca --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 36d66e0..9aac90b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca - 
site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 22009c64929347a5b5be30b2882421df1f9f6cc9 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:33:36 -0800 Subject: [PATCH 147/276] [travis] white list data.rio.rj.gov.br --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 9aac90b..ffa52b7 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 14404dacef75c3bf5efd14cb01b5259e4cc3bd4a Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:36:12 -0800 Subject: [PATCH 148/276] [travis] white list cvcl.mit.edu --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index ffa52b7..6d604a4 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - - 
whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 1dc044131cb119f9ce5fc99ad133d22e3b448c79 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:36:51 -0800 Subject: [PATCH 149/276] [travis] white list data.ohouston.org --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 6d604a4..f58076b 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From f4b331a6174373eeed1ba724792245e3cfc60581 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:37:32 
-0800 Subject: [PATCH 150/276] [travis] white list ntrl.ntis.gov --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index f58076b..cede4dc 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 8db25faf8dcb961124544bf95eb9054d5d7fe66b Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Sat, 16 Jan 2016 06:40:19 -0800 Subject: [PATCH 151/276] [travis] 404 data.gov.be --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index cede4dc..c689448 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ rvm: before_script: - gem install awesome_bot script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0 + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be - 
whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 60a7a434aa9a439001bd42b03af969300f9d4146 Mon Sep 17 00:00:00 2001 From: Phill Date: Sun, 17 Jan 2016 10:32:07 +0000 Subject: [PATCH 152/276] Added Pinhooker to Sport --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index dbe0a2a..78fffc4 100644 --- a/README.rst +++ b/README.rst @@ -528,6 +528,7 @@ Sports * `Ergast Formula 1, from 1950 up to date (API) `_ * `Football/Soccer resources (data and APIs) `_ * `Lahman's Baseball Database `_ +* `Pinhooker: Thoroughbred Bloodstock Sale Data `_ From 535be187b1d9a19b91d3c8a809745ab51ffb04a8 Mon Sep 17 00:00:00 2001 From: Phill Date: Sun, 17 Jan 2016 10:33:21 +0000 Subject: [PATCH 153/276] Fix Pinhooker URL --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 78fffc4..0564ead 100644 --- a/README.rst +++ b/README.rst @@ -528,7 +528,7 @@ Sports * `Ergast Formula 1, from 1950 up to date (API) `_ * `Football/Soccer resources (data and APIs) `_ * `Lahman's Baseball Database `_ -* `Pinhooker: Thoroughbred Bloodstock Sale Data `_ * `Retrosheet Baseball Statistics `_ From 7916027e4d2f9bd9fc73bdf4b1c9f906d5a862db Mon Sep 17 00:00:00 2001 From: Phill Date: Sun, 17 Jan 2016 12:22:58 +0000 Subject: [PATCH 154/276] Fix Broken Links Travis build failed on a number of broken links. I've rectified some of the links, but the following I cannot: 3. http://cvcl.mit.edu/MM/stimuli.html Connection refused - connect(2) for "cvcl.mit.edu" port 80 4. 403 http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs 5. 
http://data.ohouston.org Net::ReadTimeout 6. http://data.rio.rj.gov.br/ Connection timed out - connect(2) for "data.rio.rj.gov.br" port 80 --- README.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 0564ead..1630212 100644 --- a/README.rst +++ b/README.rst @@ -226,7 +226,7 @@ Government * `Australia (data.gov.au) `_ * `Austria (data.gv.at) `_ * `Baton Rouge, LA, US `_ -* `Belgium `_ +* `Belgium `_ * `Brazil `_ * `Buenos Aires, Argentina `_ * `Calgary, AB, Canada `_ @@ -267,7 +267,7 @@ Government * `New Zealand `_ * `NYC betanyc `_ * `NYC Open Data `_ -* `OECD `_ +* `OECD `_ * `Oklahoma `_ * `Open Government Data (OGD) Platform India `_ * `Oregon `_ @@ -449,7 +449,7 @@ Search Engines * `Harvard Dataverse Network of scientific data `_ * `ICPSR (UMICH) `_ * `Institute of Education Sciences `_ -* `National Technical Reports Library `_ +* `National Technical Reports Library `_ * `Open Data Certificates (beta) `_ * `OpenDataNetwork - A search engine of all Socrata powered data portals `_ * `Statista.com - statistics and Studies `_ From 52183c015fc3b84a5576ff58db1f24e449c2977d Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 18 Jan 2016 15:33:16 +0800 Subject: [PATCH 155/276] Add WorldPop project --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 1630212..407b223 100644 --- a/README.rst +++ b/README.rst @@ -518,6 +518,7 @@ Social Sciences * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ +* `WorldPop project - Worldwide human population distributions `_ Sports From cd8064eafef2534d55616132f5def68cd5913be8 Mon Sep 17 00:00:00 2001 From: Helen Flynn Date: Thu, 21 Jan 2016 16:38:29 +0000 Subject: [PATCH 156/276] Add OME powered data repositories --- README.rst | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 407b223..8504e4b 100644 --- a/README.rst +++ 
b/README.rst @@ -27,15 +27,19 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ +* `Cell Image Library `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ -* `EBI ArrayExrepss `_ +* `EBI ArrayExpress `_ +* `EBI Protein Data Bank in Europe `_ * `ENCODE project `_ * `Ensembl Genomes `_ * `Gene Expression Omnibus (GEO) `_ * `Gene Ontology (GO) `_ * `Global Biotic Interations (GloBI) `_ +* `Harvard Medical School (HMS) LINCS Project `_ * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ +* `Journal of Cell Biology DataViewer `_ * `MIT Cancer Genomics Data `_ * `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ @@ -45,6 +49,8 @@ Biology * `PubGene (now Coremine Medical) `_ * `Sequence Read Archive(SRA) `_ * `Stanford Microarray Data `_ +* `Stowers Institute Original Data Repository `_ +* `Systems Science of Biological Dynamics (SSBD) Database `_ * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ From 633bf45d45f6b8eeb8b53d367727b1cf67dcd983 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 25 Jan 2016 07:34:17 -0800 Subject: [PATCH 157/276] [travis] 404 census.gov/acs/www/data_documentation/data_release_info/ --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index c689448..66e6508 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ rvm: before_script: - gem install awesome_bot script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/ - 
whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov - site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From e2cf49a247a27ced2922ce960891b05e548e4e45 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 25 Jan 2016 07:49:50 -0800 Subject: [PATCH 158/276] [travis] update --- .travis.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index 66e6508..e6e15a0 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ rvm: before_script: - gem install awesome_bot script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu - 
site503=labrosa.ee.columbia.edu/millionsong,datamob.org - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From f871e81fe0944f810c631942054fc40a846108d7 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Thu, 28 Jan 2016 16:39:21 -0800 Subject: [PATCH 159/276] [travis] white list update --- .travis.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.travis.yml b/.travis.yml index e6e15a0..d2ad548 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu - - site503=labrosa.ee.columbia.edu/millionsong,datamob.org + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc + - site503=labrosa.ee.columbia.edu/millionsong,datamob.org,wikileaks - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From d252d73097736938c8f35dbfb1ae626975e32fcf Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Fri, 29 Jan 2016 07:08:31 -0800 Subject: [PATCH 160/276] [travis] white list statista --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 
deletion(-) diff --git a/.travis.yml b/.travis.yml index d2ad548..3ceb8e5 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista - site503=labrosa.ee.columbia.edu/millionsong,datamob.org,wikileaks - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 3f8a982be8c7dd881a9bf4554376c384976561bf Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Fri, 29 Jan 2016 07:19:39 -0800 Subject: [PATCH 161/276] [travis] white list moncton.ca --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 3ceb8e5..a9b5ef3 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - 
site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista,moncton.ca - site503=labrosa.ee.columbia.edu/millionsong,datamob.org,wikileaks - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From 8df05809de0543c894653d060a6ca539cef856d1 Mon Sep 17 00:00:00 2001 From: Jordan Matelsky Date: Sat, 30 Jan 2016 22:11:43 -0500 Subject: [PATCH 162/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8504e4b..da86e8d 100644 --- a/README.rst +++ b/README.rst @@ -41,6 +41,7 @@ Biology * `ICOS PSP Benchmark `_ * `Journal of Cell Biology DataViewer `_ * `MIT Cancer Genomics Data `_ +* `NeuroData `_ * `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ From 788b7af22e05b6094f0e02c409f9f8bf68fb2992 Mon Sep 17 00:00:00 2001 From: Daniel Date: Sat, 30 Jan 2016 22:49:11 -0500 Subject: [PATCH 163/276] Update README.rst Added Open Payments Data --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git 
a/README.rst b/README.rst index 8504e4b..7fd58fb 100644 --- a/README.rst +++ b/README.rst @@ -325,6 +325,7 @@ Healthcare * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ +* `OpenPaymentsData, Healthcare financial relationship data `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ * `World Health Organization Global Health Observatory `_ From 29fccee399063c77f45e606d703990d68e79649b Mon Sep 17 00:00:00 2001 From: Suyash Shringarpure Date: Sat, 30 Jan 2016 22:54:38 -0800 Subject: [PATCH 164/276] Added more genomics datasets HGDP/HapMap/CGI Added datasets from the Human Genome Diversity Project, HapMap Project and Complete Genomics. --- README.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.rst b/README.rst index 8504e4b..fb9ce9a 100644 --- a/README.rst +++ b/README.rst @@ -29,6 +29,7 @@ Biology * `American Gut (Microbiome Project) `_ * `Cell Image Library `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ +* `Complete Genomics Public Data `_ * `EBI ArrayExpress `_ * `EBI Protein Data Bank in Europe `_ * `ENCODE project `_ @@ -37,8 +38,10 @@ Biology * `Gene Ontology (GO) `_ * `Global Biotic Interations (GloBI) `_ * `Harvard Medical School (HMS) LINCS Project `_ +* `Human Genome Diversity Project `_ * `Human Microbiome Project (HMP) `_ * `ICOS PSP Benchmark `_ +* `International HapMap Project `_ * `Journal of Cell Biology DataViewer `_ * `MIT Cancer Genomics Data `_ * `NIH Microarray data `_ or `FTP `_ From 1418f271f83bde195ff1329dd35ed2b01f10072b Mon Sep 17 00:00:00 2001 From: Will Oemler Date: Sun, 31 Jan 2016 08:10:01 -0500 Subject: [PATCH 165/276] Added some cancer genomics resources. 
--- README.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.rst b/README.rst index 8504e4b..17321dd 100644 --- a/README.rst +++ b/README.rst @@ -27,6 +27,7 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ +* `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Cell Image Library `_ * `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `EBI ArrayExpress `_ @@ -47,10 +48,13 @@ Biology * `Protein Data Bank `_ * `PubChem Project `_ * `PubGene (now Coremine Medical) `_ +* `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) `_ +* `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) `_ * `Sequence Read Archive(SRA) `_ * `Stanford Microarray Data `_ * `Stowers Institute Original Data Repository `_ * `Systems Science of Biological Dynamics (SSBD) Database `_ +* `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ From 4f9f1181ef1ba53322a969f5d79e4e646ce973e9 Mon Sep 17 00:00:00 2001 From: Peter Date: Sun, 31 Jan 2016 14:39:53 +0100 Subject: [PATCH 166/276] added open traffic collection --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8504e4b..b399e4a 100644 --- a/README.rst +++ b/README.rst @@ -564,6 +564,7 @@ Transportation * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Uber trip data April 2014 to September 2014 `_ * `OpenFlights - airport, airline and route data `_ +* `Open Traffic collection `_ * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ From 41db20085686ed4a95880293a1bc4478dbab35e4 Mon Sep 17 00:00:00 2001 From: Dan Bartlett Date: Sun, 31 Jan 2016 15:07:13 +0000 Subject: [PATCH 167/276] Update README.rst Link to latest version of Census Open Atlas --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 
8504e4b..fe50808 100644 --- a/README.rst +++ b/README.rst @@ -307,7 +307,7 @@ Government * `U.S. Food and Drug Administration (FDA) `_ * `U.S. National Center for Education Statistics (NCES) `_ * `U.S. Open Government `_ -* `UK 2011 Census Open Atlas Project `_ +* `UK 2011 Census Open Atlas Project `_ * `United Nations `_ * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ From e6c70b9f47b657d572e01d60cc4251082a318472 Mon Sep 17 00:00:00 2001 From: Tome Date: Sun, 31 Jan 2016 15:07:40 +0000 Subject: [PATCH 168/276] Added Portuguese database --- README.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 8504e4b..b28aec0 100644 --- a/README.rst +++ b/README.rst @@ -200,7 +200,7 @@ GeoSpace/GIS * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ +* `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ @@ -240,7 +240,7 @@ Government * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ -* `DataBC - data from the Province of British Columbia `_ +* `DataBC - data from the Province of British Columbia `_ * `Denver Open Data `_ * `Durham, NC Open Data `_ * `Edmonton, AB, Canada `_ @@ -279,11 +279,12 @@ Government * `Oregon `_ * `Ottawa, ON, Canada `_ * `Portland, Oregon `_ +* `Portugal - Pordata `_ * `Puerto Rico Government `_ * `Quebec City, QC, Canada `_ * `Quebec Province of Canada `_ * `Regina SK, Canada `_ -* `Rio de Janeiro, Brazil `_ +* `Rio de Janeiro, Brazil `_ * `Romania `_ * `Russia `_ * `San Francisco Data sets `_ @@ -326,7 +327,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ -* `World Health Organization Global Health Observatory `_ +* `World Health Organization 
Global Health Observatory `_ Image Processing From 64f0325f38d7de0b53cdad8091e0f99345b62ad3 Mon Sep 17 00:00:00 2001 From: Tome Date: Sun, 31 Jan 2016 15:07:40 +0000 Subject: [PATCH 169/276] Added Portuguese stats atabase --- README.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 8504e4b..b28aec0 100644 --- a/README.rst +++ b/README.rst @@ -200,7 +200,7 @@ GeoSpace/GIS * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ +* `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ @@ -240,7 +240,7 @@ Government * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ -* `DataBC - data from the Province of British Columbia `_ +* `DataBC - data from the Province of British Columbia `_ * `Denver Open Data `_ * `Durham, NC Open Data `_ * `Edmonton, AB, Canada `_ @@ -279,11 +279,12 @@ Government * `Oregon `_ * `Ottawa, ON, Canada `_ * `Portland, Oregon `_ +* `Portugal - Pordata `_ * `Puerto Rico Government `_ * `Quebec City, QC, Canada `_ * `Quebec Province of Canada `_ * `Regina SK, Canada `_ -* `Rio de Janeiro, Brazil `_ +* `Rio de Janeiro, Brazil `_ * `Romania `_ * `Russia `_ * `San Francisco Data sets `_ @@ -326,7 +327,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ -* `World Health Organization Global Health Observatory `_ +* `World Health Organization Global Health Observatory `_ Image Processing From 9792bace9e7763c9ae591c39017dfb2d02f92ec6 Mon Sep 17 00:00:00 2001 From: Tome Date: Sun, 31 Jan 2016 15:24:54 +0000 Subject: [PATCH 170/276] Added Portuguese stats database --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst 
b/README.rst index b28aec0..685ff79 100644 --- a/README.rst +++ b/README.rst @@ -279,7 +279,7 @@ Government * `Oregon `_ * `Ottawa, ON, Canada `_ * `Portland, Oregon `_ -* `Portugal - Pordata `_ +* `Portugal - Pordata organization `_ * `Puerto Rico Government `_ * `Quebec City, QC, Canada `_ * `Quebec Province of Canada `_ From 9e5a4aef8e3a81f837a5174b0a1756b1dd8a0158 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 1 Feb 2016 07:49:53 -0800 Subject: [PATCH 171/276] [travis] white list openflights --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index a9b5ef3..d1935ca 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista,moncton.ca + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista,moncton.ca,openflights - site503=labrosa.ee.columbia.edu/millionsong,datamob.org,wikileaks - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 From ce186bb56d3788906eb32b0024fe31be03340424 Mon Sep 17 00:00:00 2001 
From: Quincy Larson Date: Mon, 1 Feb 2016 14:42:02 -0800 Subject: [PATCH 172/276] Add Free Code Camp's 150,000 record open data set For more information on the dataset: https://medium.freecodecamp.com/free-code-camp-christmas-special-giving-the-gift-of-data-6ecbf0313d62#.4y2k11ta2 --- README.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.rst b/README.rst index 8504e4b..2acb54d 100644 --- a/README.rst +++ b/README.rst @@ -157,6 +157,12 @@ Economics * `UN Human Development Reports `_ +Education +------------ + +* `Student Data from Free Code Camp `_ + + Energy ------ From baee4a3fdd523712eac95585f5f6044240297d59 Mon Sep 17 00:00:00 2001 From: Sean Ryan Date: Tue, 2 Feb 2016 09:17:02 +0000 Subject: [PATCH 173/276] Ireland's open data --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8504e4b..2fadfa7 100644 --- a/README.rst +++ b/README.rst @@ -260,6 +260,7 @@ Government * `Houston Open Data `_ * `Indian Government Data `_ * `Indonesian Data Portal `_ +* `Ireland's Open Data Portal `_ * `Laval, QC, Canada `_ * `London Datastore, UK `_ * `London, ON, Canada `_ From 59a5dc490b31d2218b0acff711bb41d4d4fb6252 Mon Sep 17 00:00:00 2001 From: Ben Verhoeven Date: Tue, 2 Feb 2016 13:25:17 +0100 Subject: [PATCH 174/276] Update README.rst added Personae and CSI corpus to Natural Language --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index 8504e4b..769aefd 100644 --- a/README.rst +++ b/README.rst @@ -385,6 +385,7 @@ Natural Language ---------------- * `Blogger Corpus `_ +* `CLiPS Stylometry Investigation Corpus `_ * `ClueWeb09 FACC `_ * `ClueWeb12 FACC `_ * `DBpedia - 4.58M things with 583M facts `_ @@ -396,6 +397,7 @@ Natural Language * `Hansards text chunks of Canadian Parliament `_ * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ * `Machine Translation of European languages `_ +* `Personae Corpus `_ * `SaudiNewsNet Collection of Saudi Newspaper 
Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ * `USENET postings corpus of 2005~2011 `_ From 717a5e490037204348181518968709e260c320da Mon Sep 17 00:00:00 2001 From: Alex Urquhart Date: Tue, 2 Feb 2016 12:21:51 -0500 Subject: [PATCH 175/276] Update README.rst --- README.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 8504e4b..b7cf4bc 100644 --- a/README.rst +++ b/README.rst @@ -197,7 +197,6 @@ Geology GeoSpace/GIS ------------ - * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ * `EOSDIS - NASA's earth observing system data `_ @@ -209,6 +208,7 @@ GeoSpace/GIS * `International Institute for Systems Analysis - GIS Datasets `_ * `Landsat 8 on AWS `_ * `List of all countries in all languages `_ +* `National Weather Service GIS Data Portal `_ * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ @@ -217,6 +217,7 @@ GeoSpace/GIS * `TwoFishes - Foursquare's coarse geocoder `_ * `TZ Timezones shapfiles `_ * `UN Environmental Data `_ +* `World boundaries from the U.S. 
Department of State `_ * `World countries in multiple formats `_ @@ -493,6 +494,7 @@ Social Networks Social Sciences --------------- +* `ACLED (Armed Conflict Location & Event Data Project) `_ * `Canadian Legal Information Institute `_ * `Center for Systemic Peace Datasets - Conflict Trends, Polities, State Fragility, etc `_ * `Correlates of War Project `_ @@ -504,6 +506,7 @@ Social Sciences * `General Social Survey (GSS) since 1972 `_ * `German Social Survey `_ * `Global Religious Futures Project `_ +* `Humanitarian Data Exchange _ * `Institute for Demographic Studies `_ * `International Networks Archive `_ * `International Social Survey Program ISSP `_ From 80c484fa7e385180b243f834966072a3812aeebc Mon Sep 17 00:00:00 2001 From: Alex Urquhart Date: Tue, 2 Feb 2016 12:25:44 -0500 Subject: [PATCH 176/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index b7cf4bc..1fc2b7e 100644 --- a/README.rst +++ b/README.rst @@ -197,6 +197,7 @@ Geology GeoSpace/GIS ------------ + * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ * `EOSDIS - NASA's earth observing system data `_ From a1534d5cf5baf3de4227e0059d80e2ac8855ebb2 Mon Sep 17 00:00:00 2001 From: Alex Urquhart Date: Tue, 2 Feb 2016 12:39:02 -0500 Subject: [PATCH 177/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 1fc2b7e..0a01961 100644 --- a/README.rst +++ b/README.rst @@ -213,6 +213,7 @@ GeoSpace/GIS * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ +* `GeoFabrik - OSM data extracted to a variety of formats and areas `_ * `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ * `TIGER/Line - U.S. 
boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ From a9b5b6095e5f270366602c5a6d4d26620f214210 Mon Sep 17 00:00:00 2001 From: Alex Urquhart Date: Tue, 2 Feb 2016 12:44:33 -0500 Subject: [PATCH 178/276] Update README.rst --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 0a01961..7c9eee8 100644 --- a/README.rst +++ b/README.rst @@ -508,7 +508,7 @@ Social Sciences * `General Social Survey (GSS) since 1972 `_ * `German Social Survey `_ * `Global Religious Futures Project `_ -* `Humanitarian Data Exchange _ +* `Humanitarian Data Exchange `_ * `Institute for Demographic Studies `_ * `International Networks Archive `_ * `International Social Survey Program ISSP `_ From c6b678ad6a32b96da4753f97afc38f07213340a4 Mon Sep 17 00:00:00 2001 From: Chase Southard Date: Tue, 2 Feb 2016 14:13:34 -0500 Subject: [PATCH 179/276] add link to lexinton's open data collection --- README.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 8504e4b..c4881c9 100644 --- a/README.rst +++ b/README.rst @@ -200,7 +200,7 @@ GeoSpace/GIS * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ +* `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ @@ -240,7 +240,7 @@ Government * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ -* `DataBC - data from the Province of British Columbia `_ +* `DataBC - data from the Province of British Columbia `_ * `Denver Open Data `_ * `Durham, NC Open Data `_ * `Edmonton, AB, Canada `_ @@ -261,6 +261,7 @@ Government * `Indian Government Data `_ * `Indonesian Data Portal `_ * `Laval, QC, Canada `_ +* `Lexington, KY `_ * `London Datastore, UK `_ * `London, ON, Canada `_ * `Los Angeles Open Data `_ @@ -283,7 +284,7 @@ Government * 
`Quebec City, QC, Canada `_ * `Quebec Province of Canada `_ * `Regina SK, Canada `_ -* `Rio de Janeiro, Brazil `_ +* `Rio de Janeiro, Brazil `_ * `Romania `_ * `Russia `_ * `San Francisco Data sets `_ @@ -326,7 +327,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ -* `World Health Organization Global Health Observatory `_ +* `World Health Organization Global Health Observatory `_ Image Processing From ccb6eb82c62e1767ee641775a2ba5d0c2499fd1e Mon Sep 17 00:00:00 2001 From: Daniel Fowler Date: Wed, 3 Feb 2016 16:40:50 +0300 Subject: [PATCH 180/276] Update README.rst Add data packaged "core" datasets --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8504e4b..b6202b5 100644 --- a/README.rst +++ b/README.rst @@ -585,4 +585,5 @@ Complementary Collections * Quora: `Where can I find large datasets open to the public? 
`_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ +* `Data Packaged Core Datasets `_ From 4726d58dcbdb039511f69101ebbec25ca7c7b8a1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bernhard=20M=C3=A4ser?= Date: Wed, 3 Feb 2016 16:37:29 +0100 Subject: [PATCH 181/276] added the Vienna (Austria) 'Open Government Data' catalogue --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8504e4b..8de2e9a 100644 --- a/README.rst +++ b/README.rst @@ -312,6 +312,7 @@ Government * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ * `Victoria, BC, Canada `_ +* `Vienna, Austria `_ Healthcare From c0fbb8cc0e199aeebc4a38b1fa4950bcc8a681bc Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:06:44 +0800 Subject: [PATCH 182/276] Merge #180 --- README.rst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 7e8072b..ab207ff 100644 --- a/README.rst +++ b/README.rst @@ -13,10 +13,12 @@ Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and `sindresorhus's awesome `_ list. + Contents ---------- .. contents:: + Agriculture ------------ * `U.S. 
Department of Agriculture's PLANTS Database `_ @@ -535,7 +537,9 @@ Social Sciences * `International Social Survey Program ISSP `_ * `International Studies Compendium Project `_ * `James McGuire Cross National Data `_ +* `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ * `MIT Reality Mining Dataset `_ +* `Open Crime and Policing Data in England, Wales and Northern Ireland `_ * `Paul Hensel General International Data Page `_ * `PewResearch Internet Survey Project `_ * `PewResearch Society Data Collection `_ @@ -543,13 +547,13 @@ Social Sciences * `StackExchange Data Explorer `_ * `Terrorism Research and Analysis Consortium `_ * `Texas Inmates Executed Since 1984 `_ -* `The MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ * `Titanic Survival Data Set `_ * `UCB's Archive of Social Science Data (D-Lab) `_ * `UCLA Social Sciences Data Archive `_ * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ +* `World Bank Data `_ * `WorldPop project - Worldwide human population distributions `_ From de00186b9628bd10aa2b9e31ffa7e170cfd707b6 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:09:31 +0800 Subject: [PATCH 183/276] Merge #179 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index ab207ff..e57e542 100644 --- a/README.rst +++ b/README.rst @@ -280,6 +280,7 @@ Government * `Indian Government Data `_ * `Indonesian Data Portal `_ * `Ireland's Open Data Portal `_ +* `Japan `_ * `Laval, QC, Canada `_ * `Lexington, KY `_ * `London Datastore, UK `_ From 845e78f006577cbd540e5bbd4fd2328b7843a670 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:10:29 +0800 Subject: [PATCH 184/276] Merge #178 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index e57e542..4ec9926 100644 --- a/README.rst +++ b/README.rst @@ -595,6 +595,7 @@ Transportation * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC 
Uber trip data April 2014 to September 2014 `_ * `OpenFlights - airport, airline and route data `_ +* `Philadelphia Bike Share Stations (JSON) `_ * `Open Traffic collection `_ * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ From d5030b0f5b4e4d82639ee2bf1d4c46241eef7abc Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:12:01 +0800 Subject: [PATCH 185/276] Merge #175 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 4ec9926..e67e5b0 100644 --- a/README.rst +++ b/README.rst @@ -229,6 +229,7 @@ GeoSpace/GIS * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ +* `Pleiades - Gazetteer and graph of ancient places `_ * `GeoFabrik - OSM data extracted to a variety of formats and areas `_ * `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ * `TIGER/Line - U.S. boundaries and roads `_ From 5323085486b8d97cf08288e1e2ca5f4b9c98124e Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:14:31 +0800 Subject: [PATCH 186/276] Merge #167 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index e67e5b0..8373b72 100644 --- a/README.rst +++ b/README.rst @@ -60,6 +60,7 @@ Biology * `Stanford Microarray Data `_ * `Stowers Institute Original Data Repository `_ * `Systems Science of Biological Dynamics (SSBD) Database `_ +* `Temple University Hospital EEG Database `_ * `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ From a58b29365dd96f38afe5aea5567f0dc298330925 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:15:56 +0800 Subject: [PATCH 187/276] Merge #163 --- .travis.yml | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/.travis.yml b/.travis.yml index d1935ca..b563278 100644 --- a/.travis.yml +++ b/.travis.yml @@ -4,7 +4,7 @@ rvm: 
before_script: - gem install awesome_bot script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,data.gov.be,census.gov/acs/www/data_documentation/data_release_info/,europeansocialsurvey.org/data/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,wiki.earthdata.nasa.gov,missionlocal.org,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,cvcl.mit.edu,data.ohouston.org,ntrl.ntis.gov,networkdata.ics.uci.edu,sinda.crn2.inpe.br,archive.ics.uci.edu,hmpdacc,statista,moncton.ca,openflights - - site503=labrosa.ee.columbia.edu/millionsong,datamob.org,wikileaks - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 + - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca + - site503=datamob.org,research.microsoft.com + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 --set-timeout=5 From a467d56ac5dc731161d643f914bf4fef8832a295 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 4 Feb 2016 22:20:49 +0800 Subject: [PATCH 188/276] Clean format and thanks for every contribution in last days --- README.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.rst b/README.rst index 8373b72..082f666 100644 --- a/README.rst +++ b/README.rst @@ -100,13 +100,13 @@ Complex Networks * `Small Network Data `_ * `Stanford GraphBase (Steven Skiena) `_ * 
`Stanford Large Network Dataset Collection `_ +* `Stanford Longitudnal Network Data Sources `_ * `The Koblenz Network Collection `_ * `The Laboratory for Web Algorithmics (UNIMI) `_ * `The Nexus Network Repository `_ * `UCI Network Data Repository `_ * `UFL sparse matrix collection `_ * `WSU Graph Database `_ -* `Stanford Longitudnal Network Data Sources `_ Computer Networks @@ -221,6 +221,7 @@ GeoSpace/GIS * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ +* `GeoFabrik - OSM data extracted to a variety of formats and areas `_ * `GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ * `International Institute for Systems Analysis - GIS Datasets `_ @@ -231,7 +232,6 @@ GeoSpace/GIS * `OpenAddresses `_ * `OpenStreetMap (OSM) `_ * `Pleiades - Gazetteer and graph of ancient places `_ -* `GeoFabrik - OSM data extracted to a variety of formats and areas `_ * `Reverse Geocoder using OSM data `_ & `additional high-resolution data files `_ * `TIGER/Line - U.S. boundaries and roads `_ * `TwoFishes - Foursquare's coarse geocoder `_ @@ -383,6 +383,7 @@ Machine Learning * `eBay Online Auctions (2012) `_ * `IMDb Database `_ * `Keel Repository for classification, regression and time series `_ +* `Labeled Faces in the Wild (LFW) `_ * `Lending Club Loan Data `_ * `Machine Learning Data Set Repository `_ * `Million Song Dataset `_ @@ -393,7 +394,6 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! 
Ratings and Classification Data `_ -* `Labeled Faces in the Wild (LFW) `_ Museums @@ -596,9 +596,9 @@ Transportation * `NYC Taxi Trip Data 2009- `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Uber trip data April 2014 to September 2014 `_ +* `Open Traffic collection `_ * `OpenFlights - airport, airline and route data `_ * `Philadelphia Bike Share Stations (JSON) `_ -* `Open Traffic collection `_ * `Plane Crash Database, since 1920 `_ * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ @@ -613,6 +613,7 @@ Transportation Complementary Collections ------------------------- +* `Data Packaged Core Datasets `_ * `Database of Scientific Code Contributions `_ * DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ @@ -620,5 +621,4 @@ Complementary Collections * Quora: `Where can I find large datasets open to the public? `_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ -* `Data Packaged Core Datasets `_ From 74fb770e3a51b426a2a656010ea8ff93d9e052e4 Mon Sep 17 00:00:00 2001 From: Brant Strand Date: Fri, 5 Feb 2016 14:25:29 -0800 Subject: [PATCH 189/276] Adding NCBI protein and taxonomy databases --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index 082f666..cae2651 100644 --- a/README.rst +++ b/README.rst @@ -47,6 +47,8 @@ Biology * `International HapMap Project `_ * `Journal of Cell Biology DataViewer `_ * `MIT Cancer Genomics Data `_ +* `NCBI Proteins `_ +* `NCBI Taxonomy `_ * `NeuroData `_ * `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ From a00a61fe4e1ebef31e17f1c7a0a21ea0d50d5395 Mon Sep 17 00:00:00 2001 From: Brant Strand Date: Fri, 5 Feb 2016 14:27:41 -0800 Subject: [PATCH 190/276] Adding UniProt proteins --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index cae2651..b10de7d 100644 --- a/README.rst +++ 
b/README.rst @@ -67,6 +67,7 @@ Biology * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ +* `Universal Protein Resource (UnitProt) `_ * `UniGene `_ From 31b6c3c0870129202b3ac286715800d953766d9d Mon Sep 17 00:00:00 2001 From: Diomidis Spinellis Date: Sun, 7 Feb 2016 12:36:29 +0200 Subject: [PATCH 191/276] Add Greece's government data site --- README.rst | 1 + 1 file changed, 1 insertion(+) mode change 100644 => 100755 README.rst diff --git a/README.rst b/README.rst old mode 100644 new mode 100755 index 082f666..fcb8afd --- a/README.rst +++ b/README.rst @@ -275,6 +275,7 @@ Government * `Germany `_ * `Ghent, Belgium `_ * `Glasgow, Scotland, UK `_ +* `Greece `_ * `Guardian world governments `_ * `Halifax, NS, Canada `_ * `Helsinki Region, Finland `_ From 0a0bf5b1e01808bc7de3e003ec870e14984d1a62 Mon Sep 17 00:00:00 2001 From: kenguish Date: Mon, 8 Feb 2016 03:47:32 +0800 Subject: [PATCH 192/276] Add Hong Kong (China) government data site --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 082f666..a0088df 100644 --- a/README.rst +++ b/README.rst @@ -278,6 +278,7 @@ Government * `Guardian world governments `_ * `Halifax, NS, Canada `_ * `Helsinki Region, Finland `_ +* `Hong Kong, China `_ * `Houston Open Data `_ * `Indian Government Data `_ * `Indonesian Data Portal `_ From 15be9e7fc06f67b4654ab4e7dae08aa172835505 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 8 Feb 2016 07:47:38 -0800 Subject: [PATCH 193/276] [travis] correct format for --set-timeout --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index b563278..aee2b88 100644 --- a/.travis.yml +++ b/.travis.yml @@ -7,4 +7,4 @@ script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - 
whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca - site503=datamob.org,research.microsoft.com - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list $site404,$whtlist,$site503 --set-timeout=5 + - awesome_bot README.rst --allow-dupe --allow-redirect --white-list --set-timeout 5 $site404,$whtlist,$site503 From 361498e759c2a4110b0564604ec8364ad7aac681 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 8 Feb 2016 07:53:46 -0800 Subject: [PATCH 194/276] [travis] fix typo --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index aee2b88..1295262 100644 --- a/.travis.yml +++ b/.travis.yml @@ -7,4 +7,4 @@ script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca - site503=datamob.org,research.microsoft.com - - awesome_bot README.rst --allow-dupe --allow-redirect --white-list --set-timeout 5 $site404,$whtlist,$site503 + - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --white-list $site404,$whtlist,$site503 From 2454028eb06a660f95a4f5c1fc74b0446a8764bd Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Mon, 8 Feb 2016 07:57:30 -0800 Subject: [PATCH 195/276] [travis] white list update --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 1295262..547031f 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem 
install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia - site503=datamob.org,research.microsoft.com - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --white-list $site404,$whtlist,$site503 From 46e601cfa3481b47725ed15011449a76cbdcee6a Mon Sep 17 00:00:00 2001 From: HashirZahir Date: Tue, 9 Feb 2016 12:19:23 +0800 Subject: [PATCH 196/276] Added Basketball Player Database and Statistics --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..6985f3f 100755 --- a/README.rst +++ b/README.rst @@ -568,6 +568,7 @@ Social Sciences Sports ------ +* `Basketball (NBA/NCAA/Euro) Player Database and Statistics `_ * `Betfair Historical Exchange Data `_ * `Cricsheet Matches (cricket) `_ * `Ergast Formula 1, from 1950 up to date (API) `_ From 299dd2c9522eab9ddcb2722905177cea84449d8c Mon Sep 17 00:00:00 2001 From: Damiano Spina Date: Tue, 9 Feb 2016 23:33:36 +1100 Subject: [PATCH 197/276] Adding 'Twitter Data for Online Reputation Management' Added the RepLab 2013 dataset into the 'Social Networks' category --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..f38d835 100755 --- 
a/README.rst +++ b/README.rst @@ -517,6 +517,7 @@ Social Networks * `Social Twitter Data `_ * `SourceForge.net Research Data `_ * `Twitter Data for Sentiment Analysis `_ +* `Twitter Data for Online Reputation Management `_ * `Twitter Graph of entire Twitter site `_ * `Twitter Scrape Calufa May 2011 `_ * `UNIMI/LAW Social Network Datasets `_ From 39abe366703ebfd1b63c400b87090d176d2226c2 Mon Sep 17 00:00:00 2001 From: pdeardorff-r7 Date: Tue, 9 Feb 2016 21:04:27 -0800 Subject: [PATCH 198/276] Add Rapid7 Sonar internet scans --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..9cb2782 100755 --- a/README.rst +++ b/README.rst @@ -124,6 +124,7 @@ Computer Networks * `CRAWDAD Wireless datasets from Dartmouth Univ. `_ * `Criteo click-through data `_ * `Open Mobile Data by MobiPerf `_ +* `Rapid7 Sonar Internet Scans `_ * `UCSD Network Telescope, IPv4 /8 net `_ From a8d357192b02924e021b2be272f5311909502940 Mon Sep 17 00:00:00 2001 From: Van-Duyet Le Date: Wed, 10 Feb 2016 12:09:46 +0700 Subject: [PATCH 199/276] Add Bruteforce Database --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index a3cd554..7f75f41 100755 --- a/README.rst +++ b/README.rst @@ -148,7 +148,7 @@ Data Challenges * `Space Apps Challenge `_ * `Telecom Italia Big Data Challenge `_ * `Yelp Dataset Challenge `_ - +* `Bruteforce Database `_ Economics --------- From ea894f47d169cd5eb41d94f6891f9c73b82fb8ff Mon Sep 17 00:00:00 2001 From: Van-Duyet Le Date: Wed, 10 Feb 2016 12:29:53 +0700 Subject: [PATCH 200/276] Update .travis.yml --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 547031f..354bbb4 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - 
site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com - site503=datamob.org,research.microsoft.com - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --white-list $site404,$whtlist,$site503 From 2d0d9c9ca766c3b6253e6a966f8d1d78f3a107ec Mon Sep 17 00:00:00 2001 From: Van-Duyet Le Date: Wed, 10 Feb 2016 12:34:30 +0700 Subject: [PATCH 201/276] Update .travis.yml --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 354bbb4..4cdd1dc 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - - 
whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov - site503=datamob.org,research.microsoft.com - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --white-list $site404,$whtlist,$site503 From 734dc4a40721d3fda78cd20911641d9244876462 Mon Sep 17 00:00:00 2001 From: "M. 
Valdes" Date: Wed, 10 Feb 2016 03:09:40 -0300 Subject: [PATCH 202/276] add Chile Open Data to README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..c8cbc7d 100755 --- a/README.rst +++ b/README.rst @@ -263,6 +263,7 @@ Government * `Cambridge, MA, US `_ * `Canada `_ * `Chicago `_ +* `Chile `_ * `Dallas Open Data `_ * `DataBC - data from the Province of British Columbia `_ * `Denver Open Data `_ From 18f0b961bff5958234fc3801e41d57210a0e372e Mon Sep 17 00:00:00 2001 From: shai harel Date: Wed, 10 Feb 2016 17:39:44 +0200 Subject: [PATCH 203/276] Update README.rst added Adience ASLAN and violent flow DATASETES --- README.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index a3cd554..9d0d0c9 100755 --- a/README.rst +++ b/README.rst @@ -378,7 +378,9 @@ Image Processing * `SUN database, MIT `_ * `The Oxford-IIIT Pet Dataset `_ * `YouTube Faces Database `_ - +* `Adience Unfiltered faces for gender and age classification `_ +* `The Action Similarity Labeling (ASLAN) Challenge `_ +* `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ Machine Learning ---------------- From 71d2854ec55381d9807f1b981341b6b2be47902a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Panisson?= Date: Wed, 10 Feb 2016 16:45:43 +0100 Subject: [PATCH 204/276] Add High-Resolution Contact Networks from Wearable Sensors --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..670d732 100755 --- a/README.rst +++ b/README.rst @@ -510,6 +510,7 @@ Social Networks * `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ +* `High-Resolution Contact Networks from Wearable Sensors `_ * `Mobile Social Networks from UMASS `_ * `Network Twitter Data `_ * `Reddit Comments `_ From 28765b8cbca34c69a54c82b40a41f2e025e2268f Mon Sep 17 00:00:00 2001 From: Robert Porsch Date: Thu, 11 Feb 
2016 16:34:14 +0800 Subject: [PATCH 205/276] Added data available from the Psychiatric Genomics Consortium --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..e917338 100755 --- a/README.rst +++ b/README.rst @@ -54,6 +54,7 @@ Biology * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ * `Protein Data Bank `_ +* `Psychiatric Genomics Consortium `_ * `PubChem Project `_ * `PubGene (now Coremine Medical) `_ * `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) `_ From 2467b46057bae726cbdad6a025469724bfbc0363 Mon Sep 17 00:00:00 2001 From: Dmitri Suvorov Date: Sat, 13 Feb 2016 00:25:00 +0200 Subject: [PATCH 206/276] Added Moldova government data site --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a3cd554..d16897d 100755 --- a/README.rst +++ b/README.rst @@ -296,6 +296,7 @@ Government * `MassGIS, Massachusetts, U.S. `_ * `Mexico `_ * `Missisauga, ON, Canada `_ +* `Moldova `_ * `Moncton, NB, Canada `_ * `Montreal, QC, Canada `_ * `Netherlands `_ From 9a18e153b2ce2ce4126307e6e7f1e484459e4cbe Mon Sep 17 00:00:00 2001 From: andycheng Date: Sat, 13 Feb 2016 18:18:13 +0800 Subject: [PATCH 207/276] Datasets from Taiwan added --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index a3cd554..1a82045 100755 --- a/README.rst +++ b/README.rst @@ -324,6 +324,8 @@ Government * `South Africa Trade Statistics `_ * `State of Utah, US `_ * `Switzerland `_ +* `Taiwan `_ +* `Taiwan g0v `_ * `Texas Open Data `_ * `The World Bank `_ * `Toronto, ON, Canada `_ From 9dd7a97da3cdac77ff257c12acc58770f3a6413a Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sun, 14 Feb 2016 01:09:49 +0800 Subject: [PATCH 208/276] Merge #189 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 01e58d9..bbbe7c4 100755 --- a/README.rst +++ b/README.rst @@ -232,6 +232,7 @@ GeoSpace/GIS * `International 
Institute for Systems Analysis - GIS Datasets `_ * `Landsat 8 on AWS `_ * `List of all countries in all languages `_ +* `Marinexplore - Open Oceanographic Data `_ * `National Weather Service GIS Data Portal `_ * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ From fb909aa46fd31b7041c16f4eee71dfe413a56aea Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sun, 14 Feb 2016 01:18:12 +0800 Subject: [PATCH 209/276] Move ArchiveIt! to PublicDomains; --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index bbbe7c4..333d65f 100755 --- a/README.rst +++ b/README.rst @@ -466,6 +466,7 @@ Public Domains -------------- * `Amazon `_ +* `Archive-it from Internet Archive `_ * `Archive.org Datasets `_ * `CMU JASA data archive `_ * `CMU StatLab collections `_ @@ -476,6 +477,7 @@ Public Domains * `KDNuggets Data Collections `_ * `Microsoft Azure Data Market Free DataSets `_ * `Numbray `_ +* `Open Library Data Dumps `_ * `Reddit Datasets `_ * `RevolutionAnalytics Collection `_ * `Sample R data sets `_ @@ -492,7 +494,6 @@ Search Engines -------------- * `Academic Torrents of data sharing from UMB `_ -* `Archive-it from Internet Archive `_ * `Datahub.io `_ * `DataMarket (Qlik) `_ * `Harvard Dataverse Network of scientific data `_ From 38ecc63b95aae7b510f9975a3e718fbfc4f75a44 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sun, 14 Feb 2016 01:25:23 +0800 Subject: [PATCH 210/276] Change GeoSpace/GIS to GIS/Environment; Add IMOS data; --- README.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.rst b/README.rst index 333d65f..3324b92 100755 --- a/README.rst +++ b/README.rst @@ -217,8 +217,8 @@ Geology * `USGS Earthquake Archives `_ -GeoSpace/GIS ------------- +GIS/Environment +--------------- * `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ @@ -229,6 +229,7 @@ GeoSpace/GIS * `GeoFabrik - OSM data extracted to a variety of formats and areas `_ * 
`GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ +* `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_ * `International Institute for Systems Analysis - GIS Datasets `_ * `Landsat 8 on AWS `_ * `List of all countries in all languages `_ @@ -246,7 +247,6 @@ GeoSpace/GIS * `World boundaries from the U.S. Department of State `_ * `World countries in multiple formats `_ - Government ---------- From c9a3a0affc6aea95d3a9dd03e36a89e04ba2c551 Mon Sep 17 00:00:00 2001 From: anatoly techtonik Date: Sun, 14 Feb 2016 07:12:18 +0300 Subject: [PATCH 211/276] Add Crystallography Open Database --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 3324b92..28c91b2 100755 --- a/README.rst +++ b/README.rst @@ -451,6 +451,7 @@ Physics ------- * `CERN Open Data Portal `_ +* `Crystallography Open Database `_ * `NASA Exoplanet Archive `_ * `NSSDC (NASA) data of 550 space spacecraft `_ * `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_ From b259eb2a3f5e99ce622eac08c1e37a082862cb16 Mon Sep 17 00:00:00 2001 From: Prayag Verma Date: Sun, 14 Feb 2016 23:07:37 +0530 Subject: [PATCH 212/276] Fix typos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `Interations` → `Interactions` `Longitudnal` → `Longitudinal` --- README.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 28c91b2..ca9ba09 100755 --- a/README.rst +++ b/README.rst @@ -39,7 +39,7 @@ Biology * `Ensembl Genomes `_ * `Gene Expression Omnibus (GEO) `_ * `Gene Ontology (GO) `_ -* `Global Biotic Interations (GloBI) `_ +* `Global Biotic Interactions (GloBI) `_ * `Harvard Medical School (HMS) LINCS Project `_ * `Human Genome Diversity Project `_ * `Human Microbiome Project (HMP) `_ @@ -104,7 +104,7 @@ Complex Networks * `Small Network Data `_ * `Stanford GraphBase (Steven Skiena) `_ * `Stanford Large Network Dataset Collection `_ 
-* `Stanford Longitudnal Network Data Sources `_
+* `Stanford Longitudinal Network Data Sources `_
 * `The Koblenz Network Collection `_
 * `The Laboratory for Web Algorithmics (UNIMI) `_
 * `The Nexus Network Repository `_

From feb840727c94ab2798a78da418fb5346dfad1eba Mon Sep 17 00:00:00 2001
From: Megan Squire
Date: Sun, 14 Feb 2016 12:58:38 -0500
Subject: [PATCH 213/276] Update README.rst

Added FLOSSmole 60,000 data sets about free, libre, and open source software development practices with corrected link
---
 README.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.rst b/README.rst
index 28c91b2..79e5413 100755
--- a/README.rst
+++ b/README.rst
@@ -578,6 +578,11 @@ Social Sciences
 * `WorldPop project - Worldwide human population distributions `_
 
 
+Software
+--------
+
+* `FLOSSmole data about free, libre, and open source software development `_
+
 Sports
 ------
 

From dea18ce15828c603b2f7960d2f05e83c583d6dd9 Mon Sep 17 00:00:00 2001
From: lukeleslie
Date: Fri, 19 Feb 2016 17:32:46 -0600
Subject: [PATCH 214/276] Add Road Networks source to Complex Networks.
---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index cf519e4..ed79191 100755
--- a/README.rst
+++ b/README.rst
@@ -111,7 +111,7 @@ Complex Networks
 * `UCI Network Data Repository `_
 * `UFL sparse matrix collection `_
 * `WSU Graph Database `_
-
+* `DIMACS Road Networks Collection `_
 
 Computer Networks
 -----------------

From abd28a9836aa6e90908f53405bb797eac1e77fa1 Mon Sep 17 00:00:00 2001
From: Ron
Date: Wed, 24 Feb 2016 15:21:28 -0800
Subject: [PATCH 215/276] added network repository to complex networks

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index ed79191..890fc60 100755
--- a/README.rst
+++ b/README.rst
@@ -97,6 +97,7 @@ Complex Networks
 * `CrossRef DOI URLs `_
 * `DBLP Citation dataset `_
 * `NBER Patent Citations `_
+* `Network Repository with Interactive Exploratory Analysis Tools `_
 * `NIST complex networks data collection `_
 * `Protein-protein interaction network `_
 * `PyPI and Maven Dependency Network `_

From 08e3bda416791444527665d951efeb0b2320920a Mon Sep 17 00:00:00 2001
From: Alex Urquhart
Date: Thu, 25 Feb 2016 05:48:48 -0500
Subject: [PATCH 216/276] Added HIFLD GIS data

Homeland Infrastructure Foundation-Level Data - https://hifld-dhs-gii.opendata.arcgis.com/
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index ed79191..6495013 100755
--- a/README.rst
+++ b/README.rst
@@ -229,6 +229,7 @@ GIS/Environment
 * `GeoFabrik - OSM data extracted to a variety of formats and areas `_
 * `GeoNames Worldwide `_
 * `Global Administrative Areas Database (GADM) `_
+* `Homeland Infrastructure Foundation-Level Data `_
 * `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_
 * `International Institute for Systems Analysis - GIS Datasets `_
 * `Landsat 8 on AWS `_

From ddc77bdf6974f5831dc4e31bc4eba10f4133b9d0 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Thu, 25 Feb 2016 19:28:36 +0800
Subject: [PATCH 217/276] Add AMiner Citation Network Dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 6f328a8..7f3123f 100755 --- a/README.rst +++ b/README.rst @@ -94,6 +94,7 @@ Climate/Weather Complex Networks ---------------- +* `AMiner Citation Network Dataset `_ * `CrossRef DOI URLs `_ * `DBLP Citation dataset `_ * `NBER Patent Citations `_ From f85d5195898379687720f9a31f259b5c78da98c0 Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Thu, 25 Feb 2016 15:40:02 -0800 Subject: [PATCH 218/276] [travis] allow timeout --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 4cdd1dc..1abe2b9 100644 --- a/.travis.yml +++ b/.travis.yml @@ -7,4 +7,4 @@ script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov - site503=datamob.org,research.microsoft.com - - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --white-list $site404,$whtlist,$site503 + - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503 From febb09ef8be478a4cf96a8b55393b95ddfcaad7b Mon Sep 17 00:00:00 2001 From: ReadmeCritic Date: Thu, 25 Feb 2016 15:41:05 -0800 Subject: [PATCH 219/276] [travis] white lis arcgis,bixi --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index 
1abe2b9..d4709b6 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,6 +5,6 @@ before_script: - gem install awesome_bot script: - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov + - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov,arcgis,bixi - site503=datamob.org,research.microsoft.com - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503 From 5c553144274240164a58ac69db168d1afb951d7d Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 26 Feb 2016 11:06:00 +0800 Subject: [PATCH 220/276] Add OpenDataSoft's portal list #208; Move collected government to separated file to make the list short and clean. 
--- Government.rst | 103 +++++++++++++++++++++++++++++++++++++++++++++++++ README.rst | 102 +----------------------------------------------- 2 files changed, 105 insertions(+), 100 deletions(-) create mode 100644 Government.rst diff --git a/Government.rst b/Government.rst new file mode 100644 index 0000000..26555da --- /dev/null +++ b/Government.rst @@ -0,0 +1,103 @@ +Government +---------- + +* `Alberta, Province of Canada `_ +* `Antwerp, Belgium `_ +* `Argentina (non official) `_ +* `Argentina `_ +* `Austin, TX, US `_ +* `Australia (abs.gov.au) `_ +* `Australia (data.gov.au) `_ +* `Austria (data.gv.at) `_ +* `Baton Rouge, LA, US `_ +* `Belgium `_ +* `Brazil `_ +* `Buenos Aires, Argentina `_ +* `Calgary, AB, Canada `_ +* `Cambridge, MA, US `_ +* `Canada `_ +* `Chicago `_ +* `Chile `_ +* `Dallas Open Data `_ +* `DataBC - data from the Province of British Columbia `_ +* `Denver Open Data `_ +* `Durham, NC Open Data `_ +* `Edmonton, AB, Canada `_ +* `England LGInform `_ +* `EuroStat `_ +* `FedStats `_ +* `Finland `_ +* `France `_ +* `Fredericton, NB, Canada `_ +* `Gatineau, QC, Canada `_ +* `Germany `_ +* `Ghent, Belgium `_ +* `Glasgow, Scotland, UK `_ +* `Greece `_ +* `Guardian world governments `_ +* `Halifax, NS, Canada `_ +* `Helsinki Region, Finland `_ +* `Hong Kong, China `_ +* `Houston Open Data `_ +* `Indian Government Data `_ +* `Indonesian Data Portal `_ +* `Ireland's Open Data Portal `_ +* `Japan `_ +* `Laval, QC, Canada `_ +* `Lexington, KY `_ +* `London Datastore, UK `_ +* `London, ON, Canada `_ +* `Los Angeles Open Data `_ +* `MassGIS, Massachusetts, U.S. 
`_ +* `Mexico `_ +* `Missisauga, ON, Canada `_ +* `Moldova `_ +* `Moncton, NB, Canada `_ +* `Montreal, QC, Canada `_ +* `Netherlands `_ +* `New Zealand `_ +* `NYC betanyc `_ +* `NYC Open Data `_ +* `OECD `_ +* `Oklahoma `_ +* `Open Government Data (OGD) Platform India `_ +* `Oregon `_ +* `Ottawa, ON, Canada `_ +* `Portland, Oregon `_ +* `Portugal - Pordata organization `_ +* `Puerto Rico Government `_ +* `Quebec City, QC, Canada `_ +* `Quebec Province of Canada `_ +* `Regina SK, Canada `_ +* `Rio de Janeiro, Brazil `_ +* `Romania `_ +* `Russia `_ +* `San Francisco Data sets `_ +* `Saskatchewan, Province of Canada `_ +* `Seattle `_ +* `Singapore Government Data `_ +* `South Africa `_ +* `South Africa Trade Statistics `_ +* `State of Utah, US `_ +* `Switzerland `_ +* `Taiwan `_ +* `Taiwan g0v `_ +* `Texas Open Data `_ +* `The World Bank `_ +* `Toronto, ON, Canada `_ +* `U.K. Government Data `_ +* `U.S. American Community Survey `_ +* `U.S. CDC Public Health datasets `_ +* `U.S. Census Bureau `_ +* `U.S. Department of Housing and Urban Development (HUD) `_ +* `U.S. Federal Government Agencies `_ +* `U.S. Federal Government Data Catalog `_ +* `U.S. Food and Drug Administration (FDA) `_ +* `U.S. National Center for Education Statistics (NCES) `_ +* `U.S. 
Open Government `_ +* `UK 2011 Census Open Atlas Project `_ +* `United Nations `_ +* `Uruguay `_ +* `Vancouver, BC Open Data Catalog `_ +* `Victoria, BC, Canada `_ +* `Vienna, Austria `_ \ No newline at end of file diff --git a/README.rst b/README.rst index 7f3123f..956c36e 100755 --- a/README.rst +++ b/README.rst @@ -253,106 +253,8 @@ GIS/Environment Government ---------- -* `Alberta, Province of Canada `_ -* `Antwerp, Belgium `_ -* `Argentina (non official) `_ -* `Argentina `_ -* `Austin, TX, US `_ -* `Australia (abs.gov.au) `_ -* `Australia (data.gov.au) `_ -* `Austria (data.gv.at) `_ -* `Baton Rouge, LA, US `_ -* `Belgium `_ -* `Brazil `_ -* `Buenos Aires, Argentina `_ -* `Calgary, AB, Canada `_ -* `Cambridge, MA, US `_ -* `Canada `_ -* `Chicago `_ -* `Chile `_ -* `Dallas Open Data `_ -* `DataBC - data from the Province of British Columbia `_ -* `Denver Open Data `_ -* `Durham, NC Open Data `_ -* `Edmonton, AB, Canada `_ -* `England LGInform `_ -* `EuroStat `_ -* `FedStats `_ -* `Finland `_ -* `France `_ -* `Fredericton, NB, Canada `_ -* `Gatineau, QC, Canada `_ -* `Germany `_ -* `Ghent, Belgium `_ -* `Glasgow, Scotland, UK `_ -* `Greece `_ -* `Guardian world governments `_ -* `Halifax, NS, Canada `_ -* `Helsinki Region, Finland `_ -* `Hong Kong, China `_ -* `Houston Open Data `_ -* `Indian Government Data `_ -* `Indonesian Data Portal `_ -* `Ireland's Open Data Portal `_ -* `Japan `_ -* `Laval, QC, Canada `_ -* `Lexington, KY `_ -* `London Datastore, UK `_ -* `London, ON, Canada `_ -* `Los Angeles Open Data `_ -* `MassGIS, Massachusetts, U.S. 
`_ -* `Mexico `_ -* `Missisauga, ON, Canada `_ -* `Moldova `_ -* `Moncton, NB, Canada `_ -* `Montreal, QC, Canada `_ -* `Netherlands `_ -* `New Zealand `_ -* `NYC betanyc `_ -* `NYC Open Data `_ -* `OECD `_ -* `Oklahoma `_ -* `Open Government Data (OGD) Platform India `_ -* `Oregon `_ -* `Ottawa, ON, Canada `_ -* `Portland, Oregon `_ -* `Portugal - Pordata organization `_ -* `Puerto Rico Government `_ -* `Quebec City, QC, Canada `_ -* `Quebec Province of Canada `_ -* `Regina SK, Canada `_ -* `Rio de Janeiro, Brazil `_ -* `Romania `_ -* `Russia `_ -* `San Francisco Data sets `_ -* `Saskatchewan, Province of Canada `_ -* `Seattle `_ -* `Singapore Government Data `_ -* `South Africa `_ -* `South Africa Trade Statistics `_ -* `State of Utah, US `_ -* `Switzerland `_ -* `Taiwan `_ -* `Taiwan g0v `_ -* `Texas Open Data `_ -* `The World Bank `_ -* `Toronto, ON, Canada `_ -* `U.K. Government Data `_ -* `U.S. American Community Survey `_ -* `U.S. CDC Public Health datasets `_ -* `U.S. Census Bureau `_ -* `U.S. Department of Housing and Urban Development (HUD) `_ -* `U.S. Federal Government Agencies `_ -* `U.S. Federal Government Data Catalog `_ -* `U.S. Food and Drug Administration (FDA) `_ -* `U.S. National Center for Education Statistics (NCES) `_ -* `U.S. 
Open Government `_ -* `UK 2011 Census Open Atlas Project `_ -* `United Nations `_ -* `Uruguay `_ -* `Vancouver, BC Open Data Catalog `_ -* `Victoria, BC, Canada `_ -* `Vienna, Austria `_ +* `OpenDataSoft's list of 1,600 open data portals `_ +* `A list of cities and countries contributed by community `_ Healthcare From a355d0ef933b403a0106e203b0de81fb728285b7 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 26 Feb 2016 11:14:07 +0800 Subject: [PATCH 221/276] Clean TOC --- README.rst | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 956c36e..dee2bf2 100755 --- a/README.rst +++ b/README.rst @@ -13,10 +13,7 @@ Other amazingly awesome lists can be found in the `awesome-awesomeness `_ and `sindresorhus's awesome `_ list. - -Contents ----------- -.. contents:: +.. contents:: Table of Contents Agriculture From 0f850530464e6cae74c75375abeba21280d6e193 Mon Sep 17 00:00:00 2001 From: David Dao Date: Fri, 18 Mar 2016 09:36:16 -0400 Subject: [PATCH 222/276] Adding Broad Bioimage Benchmark Collection (BBBC) The Broad Bioimage Benchmark Collection (BBBC) is a large curated collection of published data sets in bio imaging. It includes all the images, metadata and ground truths. The BBBC resource is described in the following publication: Ljosa V, Sokolnicki KL, Carpenter AE (2012). Annotated high-throughput microscopy image sets for validation. Nature Methods 9(7):637 / doi. PMID: 22743765 PMCID: PMC3627348. 
Available at http://dx.doi.org/10.1038/nmeth.2083
---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index dee2bf2..0aa0cdd 100755
--- a/README.rst
+++ b/README.rst
@@ -27,6 +27,7 @@ Biology
 * `1000 Genomes `_
 * `American Gut (Microbiome Project) `_
 * `Broad Cancer Cell Line Encyclopedia (CCLE) `_
+* `Broad Bioimage Benchmark Collection (BBBC) `_
 * `Cell Image Library `_
 * `Collaborative Research in Computational Neuroscience (CRCNS) `_
 * `Complete Genomics Public Data `_

From 8a09814e7778b54bb1ea5ed70e9c2fca242c6143 Mon Sep 17 00:00:00 2001
From: Xiaming
Date: Fri, 15 Apr 2016 14:02:08 +0800
Subject: [PATCH 223/276] Add EMPIAR to bio. cat #215

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 0aa0cdd..d63864e 100755
--- a/README.rst
+++ b/README.rst
@@ -33,6 +33,7 @@ Biology
 * `Complete Genomics Public Data `_
 * `EBI ArrayExpress `_
 * `EBI Protein Data Bank in Europe `_
+* `Electron Microscopy Pilot Image Archive (EMPIAR) `_
 * `ENCODE project `_
 * `Ensembl Genomes `_
 * `Gene Expression Omnibus (GEO) `_

From b59f3bbb6503e9bfca3d2611a8cd512bcc3e320f Mon Sep 17 00:00:00 2001
From: Pierre Fenoll
Date: Tue, 26 Apr 2016 20:54:35 +0200
Subject: [PATCH 224/276] Add NYSE

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index d63864e..11748fc 100755
--- a/README.rst
+++ b/README.rst
@@ -208,6 +208,7 @@ Finance
 * `Quandl `_
 * `St Louis Federal `_
 * `Yahoo Finance `_
+* `NYSE Market Data `_
 
 
 Geology

From 4400bf5a80b1b81e06acfcdbdf6fdac4c5e2dd05 Mon Sep 17 00:00:00 2001
From: Jack Kelly
Date: Wed, 8 Jun 2016 13:19:18 +0100
Subject: [PATCH 225/276] Update README.rst

Adding more Energy datasets.
And fixing capitalisation for UK-DALE and PLAID --- README.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 11748fc..247a1e0 100755 --- a/README.rst +++ b/README.rst @@ -187,13 +187,18 @@ Energy * `BLUEd `_ * `COMBED `_ * `Dataport `_ +* `DRED `_ * `ECO `_ * `EIA `_ +* `HES `_ - Household Electricity Study, UK * `HFED `_ * `iAWE `_ -* `Plaid `_ +* `PLAID `_ - the Plug Load Appliance Identification Dataset * `REDD `_ -* `UK-Dale `_ +* `Tracebase `_ +* `UK-DALE `_ - UK Domestic Appliance-Level Electricity +* `WHITED `_ + Finance From 2f40e980d27a8ced2274bdbb2244f25d026b9fe2 Mon Sep 17 00:00:00 2001 From: John Pellman Date: Thu, 23 Jun 2016 05:24:21 -0400 Subject: [PATCH 226/276] Added Brain Catalogue. --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 11748fc..f337771 100755 --- a/README.rst +++ b/README.rst @@ -26,6 +26,7 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ +* `Brain Catalogue `_ * `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Broad Bioimage Benchmark Collection (BBBC) `_ * `Cell Image Library `_ From 7e00e1a52b09d80d59a99bc3144cae4f3e9e0da4 Mon Sep 17 00:00:00 2001 From: John Pellman Date: Mon, 4 Jul 2016 11:05:14 -0400 Subject: [PATCH 227/276] Neuroscience data added; new section for neuroscience --- README.rst | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index f337771..25de67c 100755 --- a/README.rst +++ b/README.rst @@ -26,11 +26,9 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ -* `Brain Catalogue `_ * `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Broad Bioimage Benchmark Collection (BBBC) `_ * `Cell Image Library `_ -* `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `Complete Genomics Public Data `_ * `EBI ArrayExpress `_ * `EBI Protein Data Bank in Europe `_ @@ -49,7 +47,6 @@ Biology * `MIT Cancer Genomics 
Data `_ * `NCBI Proteins `_ * `NCBI Taxonomy `_ -* `NeuroData `_ * `NIH Microarray data `_ or `FTP `_ * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ @@ -63,7 +60,6 @@ Biology * `Stanford Microarray Data `_ * `Stowers Institute Original Data Repository `_ * `Systems Science of Biological Dynamics (SSBD) Database `_ -* `Temple University Hospital EEG Database `_ * `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ @@ -352,6 +348,23 @@ Natural Language * `Wikipedia Links data - 40 Million Entities in Context `_ * `WordNet databases and tools `_ +Neuroscience +------------- + +* `Allen Institute Datasets `_ +* `Brain Catalogue `_ +* `Brainomics `_ +* `CodeNeuro Datasets `_ +* `Collaborative Research in Computational Neuroscience (CRCNS) `_ +* `FCP-INDI `_ +* `Human Connectome Project `_ +* `NDAR `_ +* `NIMH Data Archive `_ +* `NeuroData `_ +* `OASIS `_ +* `OpenfMRI `_ +* `Neuroelectro `_ +* `Study Forrest `_ Physics ------- From a3bde36abbb7192bc27b64849dc051218c35ee3c Mon Sep 17 00:00:00 2001 From: Alexandre Rademaker Date: Tue, 5 Jul 2016 05:34:44 -0300 Subject: [PATCH 228/276] wordnet and the corpora from UD project --- README.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 11748fc..bf567ac 100755 --- a/README.rst +++ b/README.rst @@ -349,8 +349,10 @@ Natural Language * `USENET postings corpus of 2005~2011 `_ * `Wikidata - Wikipedia databases `_ * `Wikipedia Links data - 40 Million Entities in Context `_ +* `Universal Dependencies `_ * `WordNet databases and tools `_ - +* `Open Multilingual Wordnet `_ + Physics ------- From af605c3869628da629ec19b6d4605fe8fec4718f Mon Sep 17 00:00:00 2001 From: handmadeby Date: Thu, 7 Jul 2016 14:33:06 +0100 Subject: [PATCH 229/276] Updated TFL to current API link. 
The Transport for London API link was pointing to a legacy page - I updated to the current valid page. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 11748fc..d24f52b 100755 --- a/README.rst +++ b/README.rst @@ -532,7 +532,7 @@ Transportation * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ * `Toronto Bike Share Stations (XML file) `_ -* `Transport for London (TFL) `_ +* `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ * `U.S. Domestic Flights 1990 to 2009 `_ From 21ffee83e3926fbf3d397d8bb230985e06c1dc4a Mon Sep 17 00:00:00 2001 From: Haochi Kiang Date: Wed, 20 Jul 2016 10:39:51 +0800 Subject: [PATCH 230/276] Added Uppsala Conflict Data Program "The Uppsala Conflict Data Program (UCDP) offers a number of datasets on organised violence and peacemaking, all of which can be downloaded for free through the links below." 
--- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 11748fc..549b3f7 100755 --- a/README.rst +++ b/README.rst @@ -475,6 +475,7 @@ Social Sciences * `Texas Inmates Executed Since 1984 `_ * `Titanic Survival Data Set `_ * `UCB's Archive of Social Science Data (D-Lab) `_ +* `Uppsala Conflict Data Program `_ * `UCLA Social Sciences Data Archive `_ * `UN Civil Society Database `_ * `Universities Worldwide `_ From 2bf5f661f48801bcbcd5ffa4e160d1bd606b5500 Mon Sep 17 00:00:00 2001 From: Scott Sievert Date: Fri, 22 Jul 2016 10:52:48 -0500 Subject: [PATCH 231/276] adds caption contest dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 11748fc..75624b5 100755 --- a/README.rst +++ b/README.rst @@ -307,6 +307,7 @@ Machine Learning * `Machine Learning Data Set Repository `_ * `Million Song Dataset `_ * `More Song Datasets `_ +* `New Yorker caption contest ratings `_ * `MovieLens Data Sets `_ * `RDataMining - "R and Data Mining" ebook data `_ * `Registered Meteorites on Earth `_ From 9bb6ab1e8919e0aefb9a4c33fa3b95fcbf09b95c Mon Sep 17 00:00:00 2001 From: jeremie Date: Wed, 10 Aug 2016 11:04:50 +0200 Subject: [PATCH 232/276] Fix broken link: Netflix prize --- README.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 11748fc..6e64370 100755 --- a/README.rst +++ b/README.rst @@ -126,7 +126,7 @@ Computer Networks * `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ * `Criteo click-through data `_ * `Open Mobile Data by MobiPerf `_ -* `Rapid7 Sonar Internet Scans `_ +* `Rapid7 Sonar Internet Scans `_ * `UCSD Network Telescope, IPv4 /8 net `_ @@ -147,7 +147,7 @@ Data Challenges * `Kaggle Competition Data `_ * `KDD Cup by Tencent 2012 `_ * `Localytics Data Visualization Challenge `_ -* `Netflix Prize `_ +* `Netflix Prize `_ * `Space Apps Challenge `_ * `Telecom Italia Big Data Challenge `_ * `Yelp Dataset Challenge `_ @@ -268,7 +268,7 @@ Healthcare * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_ * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ -* `OpenPaymentsData, Healthcare financial relationship data `_ +* `OpenPaymentsData, Healthcare financial relationship data `_ * `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ * `World Health Organization Global Health Observatory `_ @@ -550,4 +550,3 @@ Complementary Collections * Quora: `Where can I find large datasets open to the public? 
`_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ - From 71d9c2466db3704a409d43cbebc6f43c6da18230 Mon Sep 17 00:00:00 2001 From: Sammy X Chen Date: Thu, 11 Aug 2016 10:45:55 +0800 Subject: [PATCH 233/276] add International Economics Database --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 11748fc..8c2f9f2 100755 --- a/README.rst +++ b/README.rst @@ -160,6 +160,7 @@ Economics * `EconData from UMD `_ * `Economic Freedom of the World Data `_ * `Historical MacroEconomc Statistics `_ +* `International Economics Database `_ and `various data tools `_ * `International Trade Statistics `_ * `Internet Product Code Database `_ * `Joint External Debt Data Hub `_ From 86fe0cf6dcc5f4c1c1ad5fd628dbd0ba91dfdeae Mon Sep 17 00:00:00 2001 From: Sammy X Chen Date: Thu, 11 Aug 2016 10:51:08 +0800 Subject: [PATCH 234/276] add AWC --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 8c2f9f2..835e9b3 100755 --- a/README.rst +++ b/README.rst @@ -75,6 +75,7 @@ Climate/Weather --------------- * `Australian Weather `_ +* `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ * `Brazilian Weather - Historical data (In Portuguese) `_ * `Canadian Meteorological Centre `_ * `Climate Data from UEA (updated monthly) `_ From e2e48c39a080f8538c8d9d8d2013585a694513fc Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 15 Aug 2016 11:18:24 +0800 Subject: [PATCH 235/276] #230 --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 1513444..7265c24 100755 --- a/README.rst +++ b/README.rst @@ -154,7 +154,7 @@ Data Challenges Economics --------- -* `American Economic Ass (AEA) `_ +* `American Economic Association (AEA) `_ * `EconData from UMD `_ * `Economic Freedom of the World Data `_ * `Historical MacroEconomc Statistics `_ @@ 
-485,6 +485,7 @@ Social Sciences * `International Studies Compendium Project `_ * `James McGuire Cross National Data `_ * `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ +* `Minnesota Population Center `_ * `MIT Reality Mining Dataset `_ * `Open Crime and Policing Data in England, Wales and Northern Ireland `_ * `Paul Hensel General International Data Page `_ From 87df786d26266a95ba09e2a3f52ed10aa1c8414e Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 15 Aug 2016 11:26:55 +0800 Subject: [PATCH 236/276] Disable fake reports of links --- .travis.yml | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/.travis.yml b/.travis.yml index d4709b6..066e607 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,10 +1,10 @@ -language: ruby -rvm: - - 2.2 -before_script: - - gem install awesome_bot -script: - - site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ - - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov,arcgis,bixi - - site503=datamob.org,research.microsoft.com - - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503 +# language: ruby +# rvm: +# - 2.2 +# before_script: +# - gem install awesome_bot +# script: +# - 
site404=www.datawrangling.com,getglue-data.s3.amazonaws.com,archive.org/details/2011-05-calufa-twitter-sql,www.stats4stem.org,lib.stat.cmu.edu,http://www.oecd.org/document/0,census.gov/acs/www/data_documentation/data_release_info/ +# - whtlist=travis,crawdad.cs.dartmouth.edu,data.nasdaq.com,137.189.35.203/WebUI/CatDatabase/catData.html,numbrary.com,www.cmr.osu.edu,gutenberg.org,donnees.gouv.qc.ca,data.rio.rj.gov.br,ntrl.ntis.gov,openflights.org,www.data.gov.bc.ca,earthdata.nasa,pgp-hms,cru.uea.ac.uk,networkdata.ics,datos.argentina,data.gov.ie,isi.edu,data.go.id,wiki.dbpedia,www.laval.ca,www.wunderground.com,data.lexingtonky.gov,arcgis,bixi +# - site503=datamob.org,research.microsoft.com +# - awesome_bot README.rst --allow-dupe --allow-redirect --set-timeout 5 --allow-timeout --white-list $site404,$whtlist,$site503 From 9d1f4fb10d6a2944a60012bd668e02fe094b1971 Mon Sep 17 00:00:00 2001 From: Sammy X Chen Date: Mon, 15 Aug 2016 13:59:28 +0800 Subject: [PATCH 237/276] Add AQUASTAT and category Earth Science Earth Science maintains data from geoscience and earth-related fields, like environment, water etc. --- README.rst | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/README.rst b/README.rst index c0f7ff4..c04cf75 100755 --- a/README.rst +++ b/README.rst @@ -3,8 +3,6 @@ Awesome Public Datasets .. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg :alt: Awesome :target: https://github.com/sindresorhus/awesome -.. image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg - :target: https://travis-ci.org/caesar0301/awesome-public-datasets `This list of public data sources `_ are collected and tidied from blogs, answers, and user responses. 
@@ -151,6 +149,20 @@ Data Challenges * `Yelp Dataset Challenge `_ * `Bruteforce Database `_ + +Earth Science +------------- + +* `AQUASTAT - Global water resources and uses `_ +* `BODC - marine data of ~22K vars `_ +* `Earth Models `_ +* `EOSDIS - NASA's earth observing system data `_ +* `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_ +* `Marinexplore - Open Oceanographic Data `_ +* `Smithsonian Institution Global Volcano and Eruption Database `_ +* `USGS Earthquake Archives `_ + + Economics --------- @@ -215,20 +227,10 @@ Finance * `NYSE Market Data `_ -Geology -------- +GIS +--- -* `Earth Models `_ -* `Smithsonian Institution Global Volcano and Eruption Database `_ -* `USGS Earthquake Archives `_ - - -GIS/Environment ---------------- - -* `BODC - marine data of ~22K vars `_ * `Cambridge, MA, US, GIS data on GitHub `_ -* `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ * `Geo Wiki Project - Citizen-driven Environmental Monitoring `_ @@ -236,11 +238,8 @@ GIS/Environment * `GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ * `Homeland Infrastructure Foundation-Level Data `_ -* `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_ -* `International Institute for Systems Analysis - GIS Datasets `_ * `Landsat 8 on AWS `_ * `List of all countries in all languages `_ -* `Marinexplore - Open Oceanographic Data `_ * `National Weather Service GIS Data Portal `_ * `Natural Earth - vectors and rasters of the world `_ * `OpenAddresses `_ @@ -254,6 +253,7 @@ GIS/Environment * `World boundaries from the U.S. 
Department of State `_ * `World countries in multiple formats `_ + Government ---------- From 2530bbf1338df2deed7cbd7caf0c942f89e18415 Mon Sep 17 00:00:00 2001 From: Sammy X Chen Date: Mon, 15 Aug 2016 14:04:32 +0800 Subject: [PATCH 238/276] Update README.rst --- README.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index c04cf75..bc0b84b 100755 --- a/README.rst +++ b/README.rst @@ -45,7 +45,7 @@ Biology * `MIT Cancer Genomics Data `_ * `NCBI Proteins `_ * `NCBI Taxonomy `_ -* `NIH Microarray data `_ or `FTP `_ +* `NIH Microarray data `_ or `FTP `_ (see FTP link on `RAW `_) * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ * `Protein Data Bank `_ @@ -224,7 +224,7 @@ Finance * `Quandl `_ * `St Louis Federal `_ * `Yahoo Finance `_ -* `NYSE Market Data `_ +* `NYSE Market Data `_ (see FTP link on `RAW `_) GIS From 0954d9aa6b21f61782358fb0debd6aad65aad2e9 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 11 Nov 2016 09:48:18 +0800 Subject: [PATCH 239/276] Add Kaggle link to Titanic data --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index bc0b84b..7146d69 100755 --- a/README.rst +++ b/README.rst @@ -500,7 +500,7 @@ Social Sciences * `StackExchange Data Explorer `_ * `Terrorism Research and Analysis Consortium `_ * `Texas Inmates Executed Since 1984 `_ -* `Titanic Survival Data Set `_ +* `Titanic Survival Data Set `_ or `on Kaggle `_ * `UCB's Archive of Social Science Data (D-Lab) `_ * `Uppsala Conflict Data Program `_ * `UCLA Social Sciences Data Archive `_ From 57d9c7bff7eb0ac17b8963c4ef4e9578f909cc2f Mon Sep 17 00:00:00 2001 From: Samuel Taylor Date: Sat, 12 Nov 2016 09:41:05 -0600 Subject: [PATCH 240/276] Remove dead link to GetGlue --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index 7146d69..dc2029d 100755 --- a/README.rst +++ b/README.rst @@ -449,7 +449,6 @@ Social Networks * 
`Facebook Data Scrape (2005) `_ * `Facebook Social Networks from LAW (since 2007) `_ * `Foursquare from UMN/Sarwat (2013) `_ -* `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ * `Google Scholar citation relations `_ * `High-Resolution Contact Networks from Wearable Sensors `_ From 80ecc66409f548ab4d8e2a607b94ece1dbb74300 Mon Sep 17 00:00:00 2001 From: Diomidis Spinellis Date: Sun, 27 Nov 2016 10:47:59 +0200 Subject: [PATCH 241/276] Add Microsoft's Data Science for Research --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 7146d69..ae62485 100755 --- a/README.rst +++ b/README.rst @@ -408,6 +408,7 @@ Public Domains * `Infochimps `_ * `KDNuggets Data Collections `_ * `Microsoft Azure Data Market Free DataSets `_ +* `Microsoft Data Science for Research `_ * `Numbray `_ * `Open Library Data Dumps `_ * `Reddit Datasets `_ From 6b7120dad2cfa2a28966ee5cf3c06bd42e6170f8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Arturo=20Filast=C3=B2?= Date: Thu, 8 Dec 2016 18:44:01 +0000 Subject: [PATCH 242/276] Add OONI data Add a link to data provided by the Open Observatory of Network Interference on internet censorship --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 7146d69..5706729 100755 --- a/README.rst +++ b/README.rst @@ -121,6 +121,7 @@ Computer Networks * `CommonCrawl Web Data over 7 years `_ * `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ * `Criteo click-through data `_ +* `OONI: Open Observatory of Network Interference - Internet censorship data `_ * `Open Mobile Data by MobiPerf `_ * `Rapid7 Sonar Internet Scans `_ * `UCSD Network Telescope, IPv4 /8 net `_ From 4dc886ac006ecf418ad49d4e4f54416fe973025a Mon Sep 17 00:00:00 2001 From: Maxwell Rebo Date: Sun, 11 Dec 2016 15:17:54 +0400 Subject: [PATCH 243/276] Update README.rst --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 7146d69..b771d33 100755 --- a/README.rst +++ b/README.rst @@ -357,6 +357,7 @@ Natural Language * `Universal Dependencies `_ * `WordNet databases and tools `_ * `Open Multilingual Wordnet `_ +* `Automatic Keyphrase Extracttion `_ Neuroscience From 0d0117a88a7f8ba4d8053b4305e834dea25c2ad6 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sun, 18 Dec 2016 16:08:36 +0800 Subject: [PATCH 244/276] Update new image sets and three NLP sets Images: Chars74K dataset and MNIST, NLP: Google MC-AFP, MS-MACRO, and MDST --- README.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/README.rst b/README.rst index 7146d69..e971eba 100755 --- a/README.rst +++ b/README.rst @@ -284,11 +284,13 @@ Image Processing * `2GB of Photos of Cats `_ or `Archive version `_ * `Affective Image Classification `_ * `Animals with attributes `_ +* `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ * `Face Recognition Benchmark `_ * `ImageNet (in WordNet hierarchy) `_ * `Indoor Scene Recognition `_ * `International Affective Picture System, UFL `_ * `Massive Visual Memory Stimuli, MIT `_ +* `MNIST database of handwritten digits, near 1 million examples `_ * `Several Shape-from-Silhouette Datasets `_ * `Stanford Dogs Dataset `_ * `SUN database, MIT `_ @@ -343,11 +345,14 @@ Natural Language * `Flickr Personal Taxonomies `_ * `Freebase.com of people, places, and things `_ * `Google Books Ngrams (2.2TB) `_ +* `Google MC-AFP, generated based on the public available 
Gigaword dataset using Paragraph Vectors `_ * `Google Web 5gram (1TB, 2006) `_ * `Gutenberg eBooks List `_ * `Hansards text chunks of Canadian Parliament `_ * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ * `Machine Translation of European languages `_ +* `Multi-Domain Sentiment Dataset (version 2.0) `_ +* `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_ * `Personae Corpus `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ From d5a61529bc585d4d11889cef03098d3e0309fc45 Mon Sep 17 00:00:00 2001 From: Victor Laerte Oliveira Date: Sun, 18 Dec 2016 20:57:22 -0300 Subject: [PATCH 245/276] Adding TravisTorrent MSR2017 Mining Challenge. TravisTorrent, a GHTorrent partner project, provides free and easy-to-use Travis CI build analyses to the masses through its open database. --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index a07f976..a119174 100755 --- a/README.rst +++ b/README.rst @@ -148,6 +148,7 @@ Data Challenges * `Telecom Italia Big Data Challenge `_ * `Yelp Dataset Challenge `_ * `Bruteforce Database `_ +* `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ Earth Science From 606189b55c1f628b0fa6c815f0496756cd3efc15 Mon Sep 17 00:00:00 2001 From: ghazy ben ahmed Date: Wed, 28 Dec 2016 20:56:27 +0100 Subject: [PATCH 246/276] Added Tunisia government data site --- Government.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/Government.rst b/Government.rst index 26555da..db7f229 100644 --- a/Government.rst +++ b/Government.rst @@ -85,6 +85,7 @@ Government * `Texas Open Data `_ * `The World Bank `_ * `Toronto, ON, Canada `_ +* `Tunisia `_ * `U.K. Government Data `_ * `U.S. American Community Survey `_ * `U.S. 
CDC Public Health datasets `_ @@ -100,4 +101,4 @@ Government * `Uruguay `_ * `Vancouver, BC Open Data Catalog `_ * `Victoria, BC, Canada `_ -* `Vienna, Austria `_ \ No newline at end of file +* `Vienna, Austria `_ From 3ba773df2de068da80e495437f1b8663a1f6939f Mon Sep 17 00:00:00 2001 From: Daniel Darabos Date: Thu, 5 Jan 2017 17:07:31 +0100 Subject: [PATCH 247/276] Fix typo. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 25186ef..6a03677 100755 --- a/README.rst +++ b/README.rst @@ -113,7 +113,7 @@ Complex Networks Computer Networks ----------------- -* `3.5B Web Pages from CommonCraw 2012 `_ +* `3.5B Web Pages from CommonCrawl 2012 `_ * `53.5B Web clicks of 100K users in Indiana Univ. `_ * `CAIDA Internet Datasets `_ * `ClueWeb09 - 1B web pages `_ From cddb768b860c18928e35b5ffc4b13cea481986e9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Sun, 8 Jan 2017 14:17:45 -0500 Subject: [PATCH 248/276] Update Government.rst --- Government.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/Government.rst b/Government.rst index db7f229..85f5efd 100644 --- a/Government.rst +++ b/Government.rst @@ -96,6 +96,7 @@ Government * `U.S. Food and Drug Administration (FDA) `_ * `U.S. National Center for Education Statistics (NCES) `_ * `U.S. 
Open Government `_ +* `Uganda Bureau of Statistics `_ * `UK 2011 Census Open Atlas Project `_ * `United Nations `_ * `Uruguay `_ From 6ea30d09b4f01d27ac433062df457aabac5c66d2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fran=C3=A7ois=20Pelletier?= Date: Sun, 8 Jan 2017 14:23:43 -0500 Subject: [PATCH 249/276] Update README.rst --- README.rst | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/README.rst b/README.rst index 6a03677..05a5b8e 100755 --- a/README.rst +++ b/README.rst @@ -68,7 +68,7 @@ Biology Climate/Weather --------------- - +* `Actuaries Climate Index `_ * `Australian Weather `_ * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ * `Brazilian Weather - Historical data (In Portuguese) `_ @@ -151,7 +151,6 @@ Data Challenges * `Bruteforce Database `_ * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ - Earth Science ------------- @@ -259,7 +258,8 @@ GIS Government ---------- -* `OpenDataSoft's list of 1,600 open data portals `_ +* `OpenDataSoft's list of 1,600 open data `_ +* `Open Data for Africa `_ * `A list of cities and countries contributed by community `_ @@ -487,11 +487,13 @@ Social Sciences * `Datacards `_ * `European Social Survey `_ * `FBI Hate Crime 2013 - aggregated data `_ +* `Fragile States Index `_ * `GDELT Global Events Database `_ * `General Social Survey (GSS) since 1972 `_ * `German Social Survey `_ * `Global Religious Futures Project `_ * `Humanitarian Data Exchange `_ +* `INFORM Index for Risk Management `_ * `Institute for Demographic Studies `_ * `International Networks Archive `_ * `International Social Survey Program ISSP `_ @@ -500,6 +502,7 @@ Social Sciences * `MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste `_ * `Minnesota Population Center `_ * `MIT Reality Mining Dataset `_ +* `Notre Dame Global Adaptation Index (NG-DAIN) `_ * `Open Crime and Policing Data in England, Wales and Northern Ireland `_ * `Paul Hensel General 
International Data Page `_ * `PewResearch Internet Survey Project `_ @@ -515,7 +518,7 @@ Social Sciences * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ -* `World Bank Data `_ +* `World Bank Open Data `_ * `WorldPop project - Worldwide human population distributions `_ From e07bb6ccc26ed59f0680ffd45cd28d2d9dd6266a Mon Sep 17 00:00:00 2001 From: Katherine Schinkel Date: Sun, 15 Jan 2017 19:41:14 -0800 Subject: [PATCH 250/276] Add College Scorecard https://collegescorecard.ed.gov/data/ --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..a003f47 100755 --- a/README.rst +++ b/README.rst @@ -189,6 +189,7 @@ Economics Education ------------ +* `College Scorecard Data `_ * `Student Data from Free Code Camp `_ From ff5ed076f4cef7ec935fd7ff444eaa8d38c15fee Mon Sep 17 00:00:00 2001 From: Raul Jimenez Ortega Date: Fri, 27 Jan 2017 08:10:21 +0100 Subject: [PATCH 251/276] Adding ArcGIS Open Data portal --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..fee51aa 100755 --- a/README.rst +++ b/README.rst @@ -231,6 +231,7 @@ Finance GIS --- +* `ArcGIS Open Data portal `_ * `Cambridge, MA, US, GIS data on GitHub `_ * `Factual Global Location Data `_ * `Geo Spatial Data from ASU `_ From 1c940529b037528433049fdc0e9d6e0d5d0d7b2a Mon Sep 17 00:00:00 2001 From: Jad Chaar Date: Sat, 28 Jan 2017 23:43:32 -0500 Subject: [PATCH 252/276] Added links to SURFRAD data --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 05a5b8e..8b03172 100755 --- a/README.rst +++ b/README.rst @@ -80,6 +80,7 @@ Climate/Weather * `NOAA Bering Sea Climate `_ * `NOAA Climate Datasets `_ * `NOAA Realtime Weather Models `_ +* `NOAA SURFRAD Meteorology and Radiation Datasets `_ * `The World Bank Open Data Resources for Climate Change `_ * `UEA Climatic Research Unit `_ * `WorldClim - Global Climate Data `_ From 
92ede117e165d4e2883bcb8c8b696d74a23b49a6 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 4 Feb 2017 13:24:06 +0800 Subject: [PATCH 253/276] fix link issue #276 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 3596eda..c300e61 100755 --- a/README.rst +++ b/README.rst @@ -131,7 +131,7 @@ Computer Networks Contextual Data --------------- -* `Context-aware data sets from five domains `_ or `GitHub `_ +* `Context-aware data sets from five domains `_ Data Challenges From 20ad345175ca9e16ed7c6896448e8c2e813305e2 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 4 Feb 2017 13:25:54 +0800 Subject: [PATCH 254/276] Fix link issue #277 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index c300e61..10da14f 100755 --- a/README.rst +++ b/README.rst @@ -156,7 +156,7 @@ Earth Science ------------- * `AQUASTAT - Global water resources and uses `_ -* `BODC - marine data of ~22K vars `_ +* `BODC - marine data of ~22K vars `_ * `Earth Models `_ * `EOSDIS - NASA's earth observing system data `_ * `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ or `on S3 `_ From cb41229790348825ded701259413459cac920591 Mon Sep 17 00:00:00 2001 From: Philip Fung Date: Tue, 7 Feb 2017 12:24:59 -0800 Subject: [PATCH 255/276] adding National Cancer Institute - Genomic Data Commons --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 10da14f..202d181 100755 --- a/README.rst +++ b/README.rst @@ -45,6 +45,7 @@ Biology * `MIT Cancer Genomics Data `_ * `NCBI Proteins `_ * `NCBI Taxonomy `_ +* `NCI Genomic Data Commons `_ * `NIH Microarray data `_ or `FTP `_ (see FTP link on `RAW `_) * `OpenSNP genotypes data `_ * `Pathguid - Protein-Protein Interactions Catalog `_ From 64fe2cc8c35d8765bfe0735890e18ff409e1cfcd Mon Sep 17 00:00:00 2001 From: Alex Date: Mon, 13 Feb 2017 14:49:11 +1300 Subject: [PATCH 256/276] 
added youtube 8 and visual genome --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index 10da14f..47eff1f 100755 --- a/README.rst +++ b/README.rst @@ -304,6 +304,7 @@ Image Processing * `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ +* `Visual genome `_ Machine Learning ---------------- @@ -325,6 +326,7 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! Ratings and Classification Data `_ +* `Youtube 8m `_ Museums From e5cea9a18422088a4f641d9d21e6b323f9fd6526 Mon Sep 17 00:00:00 2001 From: Alex Date: Mon, 13 Feb 2017 14:57:38 +1300 Subject: [PATCH 257/276] Update README.rst --- README.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 47eff1f..6b57705 100755 --- a/README.rst +++ b/README.rst @@ -304,7 +304,7 @@ Image Processing * `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ -* `Visual genome `_ +* `Visual genome `_ Machine Learning ---------------- @@ -326,7 +326,7 @@ Machine Learning * `Restaurants Health Score Data in San Francisco `_ * `UCI Machine Learning Repository `_ * `Yahoo! 
Ratings and Classification Data `_ -* `Youtube 8m `_ +* `Youtube 8m `_ Museums From 5587d232b599a2b9dc23ab4b1c99bc2bc19ed399 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:30:01 +0800 Subject: [PATCH 258/276] Add EveryPolitician, #280 --- Government.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Government.rst b/Government.rst index 85f5efd..1df8d04 100644 --- a/Government.rst +++ b/Government.rst @@ -1,6 +1,8 @@ Government ---------- +* `EveryPolitician, ongoing project collating and sharing data on every politician. `_ + * `Alberta, Province of Canada `_ * `Antwerp, Belgium `_ * `Argentina (non official) `_ From 49e07e34c284b9292cd68fb590affeb57756194e Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:34:21 +0800 Subject: [PATCH 259/276] Add data.world #279 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 6b57705..2bee928 100755 --- a/README.rst +++ b/README.rst @@ -417,6 +417,7 @@ Public Domains * `CMU StatLab collections `_ * `Data360 `_ * `Datamob.org `_ +* `Data.World `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ From 7ac9f9e367cdc5d47d897fc788b68ead5135d827 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Mon, 13 Feb 2017 11:45:07 +0800 Subject: [PATCH 260/276] Add Tennis database from Jeff Sackmann #278 --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 2bee928..b59d440 100755 --- a/README.rst +++ b/README.rst @@ -544,6 +544,7 @@ Sports * `Lahman's Baseball Database `_ * `Pinhooker: Thoroughbred Bloodstock Sale Data `_ * `Retrosheet Baseball Statistics `_ +* `Tennis database of rankings, results, and stats for ATP `_, `WTA `_, `Grand Slams `_ and `Match Charting Project `_ Time Series From 6141e30d29e36a90eeaddc756f08f7164f351b74 Mon Sep 17 00:00:00 2001 From: Emre Bolat Date: Thu, 23 Feb 2017 10:26:22 +0200 Subject: [PATCH 261/276] New addition to Agriculture category U.S. 
Department of Agriculture's Nutrient Database link added. --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..b59c6c7 100755 --- a/README.rst +++ b/README.rst @@ -17,6 +17,7 @@ Other amazingly awesome lists can be found in the Agriculture ------------ * `U.S. Department of Agriculture's PLANTS Database `_ +* `U.S. Department of Agriculture's Nutrient Database `_ Biology From e746ff23857f0550d47ad3074af00d597446188a Mon Sep 17 00:00:00 2001 From: Alex Date: Fri, 24 Feb 2017 14:20:01 +1300 Subject: [PATCH 262/276] added comp vision dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..d5a2910 100755 --- a/README.rst +++ b/README.rst @@ -306,6 +306,7 @@ Image Processing * `The Action Similarity Labeling (ASLAN) Challenge `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ * `Visual genome `_ +* `Caltech Pedestrian Detection Benchmark `_ Machine Learning ---------------- From dc1f51b3263d700596603c4a52c54dd9b44d0955 Mon Sep 17 00:00:00 2001 From: Martin Linkov Date: Wed, 1 Mar 2017 11:14:10 +0100 Subject: [PATCH 263/276] CoolDatasets The twitter account upgraded to a website, the collection grows, I think it is worth including in the Complementary List --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index aba0efe..79ac468 100755 --- a/README.rst +++ b/README.rst @@ -592,6 +592,7 @@ Complementary Collections * `Data Packaged Core Datasets `_ * `Database of Scientific Code Contributions `_ * DataWrangling: `Some Datasets Available on the Web `_ +* A growing collection of public datasets: `CoolDatasets. `_ * Inside-r: `Finding Data on the Internet `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ * Quora: `Where can I find large datasets open to the public? 
`_ From aff0331e4e2dcbfc259b92a464c734ad73ffcd28 Mon Sep 17 00:00:00 2001 From: owkwen Date: Thu, 9 Mar 2017 13:54:36 -0500 Subject: [PATCH 264/276] Resurrected link Montreal BIXI Bike Share link is dead. Updated with new link and in english. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index aba0efe..de92f21 100755 --- a/README.rst +++ b/README.rst @@ -568,7 +568,7 @@ Transportation * `German train system by Deutsche Bahn `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ -* `Montreal BIXI Bike Share `_ +* `Montreal BIXI Bike Share `_ * `NYC Taxi Trip Data 2009- `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ * `NYC Uber trip data April 2014 to September 2014 `_ From 1633901880b97b47194c97f6abd896a5dbe14e8f Mon Sep 17 00:00:00 2001 From: Clement Michaud Date: Tue, 28 Mar 2017 22:04:21 +0200 Subject: [PATCH 265/276] Fix broken link to Transport for London open datasets --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index aba0efe..4a57290 100755 --- a/README.rst +++ b/README.rst @@ -579,7 +579,7 @@ Transportation * `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ * `Toronto Bike Share Stations (XML file) `_ -* `Transport for London (TFL) `_ +* `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ * `U.S. 
Domestic Flights 1990 to 2009 `_ From 863c2c831100a9d03eb6fba2b0644f068edf4d91 Mon Sep 17 00:00:00 2001 From: shagun Sodhani Date: Thu, 6 Apr 2017 14:00:41 +0530 Subject: [PATCH 266/276] Added webhose datasets - related to News/Blogs in multiple languages --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index b59c6c7..87071ab 100755 --- a/README.rst +++ b/README.rst @@ -372,6 +372,7 @@ Natural Language * `WordNet databases and tools `_ * `Open Multilingual Wordnet `_ * `Automatic Keyphrase Extracttion `_ +* `News/Blogs in multiple languages `_ Neuroscience From e53e99c4c468cb6528cc4993ba40cfaf58467114 Mon Sep 17 00:00:00 2001 From: Katherine Schinkel Date: Thu, 6 Apr 2017 21:09:07 -0700 Subject: [PATCH 267/276] Create PULL_REQUEST_TEMPLATE.md --- PULL_REQUEST_TEMPLATE.md | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 PULL_REQUEST_TEMPLATE.md diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..4690fa4 --- /dev/null +++ b/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,3 @@ +# Overview +Dataset Description:
+[link to dataset](putlinkhere.com) From f96c461782a6d899e21046de3d4a7b622b19e598 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 7 Apr 2017 16:47:40 +0800 Subject: [PATCH 268/276] Clear format and fix #291 --- README.rst | 69 +++++++++++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 35 deletions(-) diff --git a/README.rst b/README.rst index da3a2e7..4068950 100755 --- a/README.rst +++ b/README.rst @@ -25,8 +25,8 @@ Biology * `1000 Genomes `_ * `American Gut (Microbiome Project) `_ -* `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Broad Bioimage Benchmark Collection (BBBC) `_ +* `Broad Cancer Cell Line Encyclopedia (CCLE) `_ * `Cell Image Library `_ * `Complete Genomics Public Data `_ * `EBI ArrayExpress `_ @@ -64,12 +64,13 @@ Biology * `The Catalogue of Life `_ * `The Personal Genome Project `_ or `PGP `_ * `UCSC Public Data `_ -* `Universal Protein Resource (UnitProt) `_ * `UniGene `_ +* `Universal Protein Resource (UnitProt) `_ Climate/Weather --------------- + * `Actuaries Climate Index `_ * `Australian Weather `_ * `Aviation Weather Center - Consistent, timely and accurate weather information for the world airspace system `_ @@ -95,6 +96,7 @@ Complex Networks * `AMiner Citation Network Dataset `_ * `CrossRef DOI URLs `_ * `DBLP Citation dataset `_ +* `DIMACS Road Networks Collection `_ * `NBER Patent Citations `_ * `Network Repository with Interactive Exploratory Analysis Tools `_ * `NIST complex networks data collection `_ @@ -111,7 +113,7 @@ Complex Networks * `UCI Network Data Repository `_ * `UFL sparse matrix collection `_ * `WSU Graph Database `_ -* `DIMACS Road Networks Collection `_ + Computer Networks ----------------- @@ -130,15 +132,10 @@ Computer Networks * `UCSD Network Telescope, IPv4 /8 net `_ -Contextual Data ---------------- - -* `Context-aware data sets from five domains `_ - - Data Challenges --------------- +* `Bruteforce Database `_ * `Challenges in Machine Learning `_ * `CrowdANALYTIX dataX `_ * 
`D4D Challenge of Orange `_ @@ -150,9 +147,9 @@ Data Challenges * `Netflix Prize `_ * `Space Apps Challenge `_ * `Telecom Italia Big Data Challenge `_ -* `Yelp Dataset Challenge `_ -* `Bruteforce Database `_ * `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ +* `Yelp Dataset Challenge `_ + Earth Science ------------- @@ -216,7 +213,6 @@ Energy * `WHITED `_ - Finance ------- @@ -224,12 +220,12 @@ Finance * `Google Finance `_ * `Google Trends `_ * `NASDAQ `_ +* `NYSE Market Data `_ (see FTP link on `RAW `_) * `OANDA `_ * `OSU Financial data `_ * `Quandl `_ * `St Louis Federal `_ * `Yahoo Finance `_ -* `NYSE Market Data `_ (see FTP link on `RAW `_) GIS @@ -263,9 +259,9 @@ GIS Government ---------- -* `OpenDataSoft's list of 1,600 open data `_ -* `Open Data for Africa `_ * `A list of cities and countries contributed by community `_ +* `Open Data for Africa `_ +* `OpenDataSoft's list of 1,600 open data `_ Healthcare @@ -289,10 +285,13 @@ Image Processing * `10k US Adult Faces Database `_ * `2GB of Photos of Cats `_ or `Archive version `_ +* `Adience Unfiltered faces for gender and age classification `_ * `Affective Image Classification `_ * `Animals with attributes `_ +* `Caltech Pedestrian Detection Benchmark `_ * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ * `Face Recognition Benchmark `_ +* `GDXray: X-ray images for X-ray testing and Computer Vision `_ * `ImageNet (in WordNet hierarchy) `_ * `Indoor Scene Recognition `_ * `International Affective Picture System, UFL `_ @@ -301,17 +300,17 @@ Image Processing * `Several Shape-from-Silhouette Datasets `_ * `Stanford Dogs Dataset `_ * `SUN database, MIT `_ -* `The Oxford-IIIT Pet Dataset `_ -* `YouTube Faces Database `_ -* `Adience Unfiltered faces for gender and age classification `_ * `The Action Similarity Labeling (ASLAN) Challenge `_ +* `The Oxford-IIIT Pet Dataset `_ * `Violent-Flows - Crowd Violence \ Non-violence Database and benchmark `_ * `Visual 
genome `_ -* `Caltech Pedestrian Detection Benchmark `_ +* `YouTube Faces Database `_ + Machine Learning ---------------- +* `Context-aware data sets from five domains `_ * `Delve Datasets for classification and regression (Univ. of Toronto) `_ * `Discogs Monthly Data `_ * `eBay Online Auctions (2012) `_ @@ -322,8 +321,8 @@ Machine Learning * `Machine Learning Data Set Repository `_ * `Million Song Dataset `_ * `More Song Datasets `_ -* `New Yorker caption contest ratings `_ * `MovieLens Data Sets `_ +* `New Yorker caption contest ratings `_ * `RDataMining - "R and Data Mining" ebook data `_ * `Registered Meteorites on Earth `_ * `Restaurants Health Score Data in San Francisco `_ @@ -347,6 +346,7 @@ Museums Natural Language ---------------- +* `Automatic Keyphrase Extracttion `_ * `Blogger Corpus `_ * `CLiPS Stylometry Investigation Corpus `_ * `ClueWeb09 FACC `_ @@ -361,37 +361,36 @@ Natural Language * `Hansards text chunks of Canadian Parliament `_ * `Machine Comprehension Test (MCTest) of text from Microsoft Research `_ * `Machine Translation of European languages `_ -* `Multi-Domain Sentiment Dataset (version 2.0) `_ * `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) `_ +* `Multi-Domain Sentiment Dataset (version 2.0) `_ +* `Open Multilingual Wordnet `_ * `Personae Corpus `_ * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_ * `SMS Spam Collection in English `_ +* `Universal Dependencies `_ * `USENET postings corpus of 2005~2011 `_ +* `Webhose - News/Blogs in multiple languages `_ * `Wikidata - Wikipedia databases `_ * `Wikipedia Links data - 40 Million Entities in Context `_ -* `Universal Dependencies `_ * `WordNet databases and tools `_ -* `Open Multilingual Wordnet `_ -* `Automatic Keyphrase Extracttion `_ -* `News/Blogs in multiple languages `_ - + Neuroscience ------------- * `Allen Institute Datasets `_ * `Brain Catalogue `_ -* `Brainomics `_ -* `CodeNeuro Datasets `_ +* `Brainomics `_ +* `CodeNeuro Datasets `_ 
* `Collaborative Research in Computational Neuroscience (CRCNS) `_ * `FCP-INDI `_ -* `Human Connectome Project `_ +* `Human Connectome Project `_ * `NDAR `_ -* `NIMH Data Archive `_ * `NeuroData `_ +* `Neuroelectro `_ +* `NIMH Data Archive `_ * `OASIS `_ * `OpenfMRI `_ -* `Neuroelectro `_ * `Study Forrest `_ @@ -419,9 +418,9 @@ Public Domains * `Archive.org Datasets `_ * `CMU JASA data archive `_ * `CMU StatLab collections `_ +* `Data.World `_ * `Data360 `_ * `Datamob.org `_ -* `Data.World `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ @@ -477,8 +476,8 @@ Social Networks * `Skytrax' Air Travel Reviews Dataset `_ * `Social Twitter Data `_ * `SourceForge.net Research Data `_ -* `Twitter Data for Sentiment Analysis `_ * `Twitter Data for Online Reputation Management `_ +* `Twitter Data for Sentiment Analysis `_ * `Twitter Graph of entire Twitter site `_ * `Twitter Scrape Calufa May 2011 `_ * `UNIMI/LAW Social Network Datasets `_ @@ -523,11 +522,11 @@ Social Sciences * `Texas Inmates Executed Since 1984 `_ * `Titanic Survival Data Set `_ or `on Kaggle `_ * `UCB's Archive of Social Science Data (D-Lab) `_ -* `Uppsala Conflict Data Program `_ * `UCLA Social Sciences Data Archive `_ * `UN Civil Society Database `_ * `Universities Worldwide `_ * `UPJOHN for Labor Employment Research `_ +* `Uppsala Conflict Data Program `_ * `World Bank Open Data `_ * `WorldPop project - Worldwide human population distributions `_ @@ -594,8 +593,8 @@ Complementary Collections * `Data Packaged Core Datasets `_ * `Database of Scientific Code Contributions `_ -* DataWrangling: `Some Datasets Available on the Web `_ * A growing collection of public datasets: `CoolDatasets. `_ +* DataWrangling: `Some Datasets Available on the Web `_ * Inside-r: `Finding Data on the Internet `_ * OpenDataMonitor: `An overview of available open data resources in Europe `_ * Quora: `Where can I find large datasets open to the public? 
`_ From 68088197e998355435117ec3a660d8ad96bf4aad Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Fri, 7 Apr 2017 16:59:02 +0800 Subject: [PATCH 269/276] Modify pull_request_template --- PULL_REQUEST_TEMPLATE.md | 3 --- PULL_REQUEST_TEMPLATE.rst | 3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) delete mode 100644 PULL_REQUEST_TEMPLATE.md create mode 100644 PULL_REQUEST_TEMPLATE.rst diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md deleted file mode 100644 index 4690fa4..0000000 --- a/PULL_REQUEST_TEMPLATE.md +++ /dev/null @@ -1,3 +0,0 @@ -# Overview -Dataset Description:
-[link to dataset](putlinkhere.com) diff --git a/PULL_REQUEST_TEMPLATE.rst b/PULL_REQUEST_TEMPLATE.rst new file mode 100644 index 0000000..1014736 --- /dev/null +++ b/PULL_REQUEST_TEMPLATE.rst @@ -0,0 +1,3 @@ +# Overview + +* `Dataset Description `_ From e3dcb1c503e792d692f64a179f8ee1a81a75ce1b Mon Sep 17 00:00:00 2001 From: Cameron Date: Fri, 28 Apr 2017 15:00:28 -0700 Subject: [PATCH 270/276] add flickr logo dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 47e51b5..6e50e87 100755 --- a/README.rst +++ b/README.rst @@ -291,6 +291,7 @@ Image Processing * `Caltech Pedestrian Detection Benchmark `_ * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_ * `Face Recognition Benchmark `_ +* `Flickr: 32 Class Brand Logos `_ * `GDXray: X-ray images for X-ray testing and Computer Vision `_ * `ImageNet (in WordNet hierarchy) `_ * `Indoor Scene Recognition `_ From dac0811dc28755fa5101613f31bbcbf01f887d05 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C3=ABl=20Defferrard?= Date: Wed, 10 May 2017 15:54:12 +0200 Subject: [PATCH 271/276] Add Free Music Archive --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 47e51b5..761f7e2 100755 --- a/README.rst +++ b/README.rst @@ -319,6 +319,7 @@ Machine Learning * `Labeled Faces in the Wild (LFW) `_ * `Lending Club Loan Data `_ * `Machine Learning Data Set Repository `_ +* `Free Music Archive `_ * `Million Song Dataset `_ * `More Song Datasets `_ * `MovieLens Data Sets `_ From 0bde4fd8edcf044131d5669fd22a1ac10f1b2ee3 Mon Sep 17 00:00:00 2001 From: Ryan Barrett Date: Thu, 29 Jun 2017 07:36:48 -0700 Subject: [PATCH 272/276] Add Indie Map --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index edab464..b169fcb 100755 --- a/README.rst +++ b/README.rst @@ -472,6 +472,7 @@ Social Networks * `GitHub Collaboration Archive `_ * `Google Scholar citation relations 
`_ * `High-Resolution Contact Networks from Wearable Sensors `_ +* `Indie Map: social graph and crawl of top IndieWeb sites `_ * `Mobile Social Networks from UMASS `_ * `Network Twitter Data `_ * `Reddit Comments `_ From 1c57e245bd11f2f6d650ad07a4c3b4d92bc6d087 Mon Sep 17 00:00:00 2001 From: Tom Morris Date: Tue, 11 Jul 2017 10:37:39 -0400 Subject: [PATCH 273/276] Datamob is gone --- README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/README.rst b/README.rst index edab464..1a33385 100755 --- a/README.rst +++ b/README.rst @@ -422,7 +422,6 @@ Public Domains * `CMU StatLab collections `_ * `Data.World `_ * `Data360 `_ -* `Datamob.org `_ * `Google `_ * `Infochimps `_ * `KDNuggets Data Collections `_ From 76ee6a0012c8d5d835581928e15b3f8416b71383 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 10:54:22 +0800 Subject: [PATCH 274/276] Fix #308 --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 1a33385..f631ee5 100755 --- a/README.rst +++ b/README.rst @@ -269,6 +269,7 @@ Healthcare * `EHDP Large Health Data Sets `_ * `Gapminder World demographic databases `_ +* `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. `_ * `Medicare Coverage Database (MCD), U.S. 
`_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ @@ -276,7 +277,7 @@ Healthcare * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ * `Open-ODS (structure of the UK NHS) `_ * `OpenPaymentsData, Healthcare financial relationship data `_ -* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_ +* The Cancer Genome Atlas project (TCGA) (refer to `GDC `_ and `BigQuery table `_) * `World Health Organization Global Health Observatory `_ From a12a3b41693047128bda88552ad1543950c4bb32 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 10:55:40 +0800 Subject: [PATCH 275/276] Fix #307 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index f631ee5..8155e1e 100755 --- a/README.rst +++ b/README.rst @@ -349,7 +349,7 @@ Museums Natural Language ---------------- -* `Automatic Keyphrase Extracttion `_ +* `Automatic Keyphrase Extraction `_ * `Blogger Corpus `_ * `CLiPS Stylometry Investigation Corpus `_ * `ClueWeb09 FACC `_ From 853dbff93781b301cc4af8249927c505192d1d41 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Thu, 10 Aug 2017 11:06:01 +0800 Subject: [PATCH 276/276] #306 --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 8155e1e..9472dc3 100755 --- a/README.rst +++ b/README.rst @@ -4,7 +4,7 @@ Awesome Public Datasets :alt: Awesome :target: https://github.com/sindresorhus/awesome -`This list of public data sources `_ +`This list of a topic-centric public data sources `_ in high quality. They are collected and tidied from blogs, answers, and user responses. Most of the data sets listed below are free, however, some are not. Other amazingly awesome lists can be found in the @@ -270,6 +270,7 @@ Healthcare * `EHDP Large Health Data Sets `_ * `Gapminder World demographic databases `_ * `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. 
`_ +* `PhysioBank Databases - a large and growing archive of physiological data `_ * `Medicare Coverage Database (MCD), U.S. `_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_