From 44c58b64263dc8288f6d86fda491b70e6240d80a Mon Sep 17 00:00:00 2001 From: Xiaming Date: Fri, 30 Jan 2015 10:43:36 +0800 Subject: [PATCH 01/17] Add U.S. MassGIS data --- README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index ed560e6..1168426 100644 --- a/README.rst +++ b/README.rst @@ -158,7 +158,8 @@ Government * `Germany `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ -* `London Datastore, U.K `_ +* `London Datastore, UK `_ +* `MassGIS, Massachusetts, U.S. `_ * `Netherlands `_ * `New Zealand `_ * `NYC betanyc `_ From 43889987053539f58940af8f8ce3cc05ec19dc28 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 31 Jan 2015 17:18:37 +0800 Subject: [PATCH 02/17] Tidy data description --- README.rst | 221 ++++++++++++++++++++++++++--------------------------- 1 file changed, 108 insertions(+), 113 deletions(-) diff --git a/README.rst b/README.rst index 1168426..2a43faf 100644 --- a/README.rst +++ b/README.rst @@ -38,7 +38,7 @@ Climate/Weather * `Australian Weather `_ * `Canadian Meteorological Centre `_ -* `Climate Data from UEA (updated at roughly monthly intervals) `_ +* `Climate Data from UEA (updated monthly) `_ * `Global Climate Data Since 1929 `_ * `NOAA Bering Sea Climate `_ * `NOAA Climate Datasets `_ @@ -68,15 +68,15 @@ Complex Networks Computer Networks ----------------- -* `3.5B Web Pages - Web graph extracted from CommonCraw 2012 web corpus. `_ -* `53.5B Web clicks - Anonymized HTTP records from 100K users in Indiana Univ. `_ -* `CAIDA Internet Datasets - Network traces and topologies at geographically diverse locations. `_ -* `ClueWeb09 - About 1B web pages in ten languages that were collected in Jan. and Feb. 2009. `_ -* `ClueWeb12 - About 733M web pages collected between Feb. and May 2012. `_ -* `CommonCrawl Web Data - Petabytes of data collected over 7 years of web crawling. `_ -* `CRAWDAD Wireless datasets (Dartmouth) - A wireless network data resource for research communities. 
`_ -* `OpenMobileData (MobiPerf) - Mobile performance measurement data collected with active tests. `_ -* `UCSD Network Telescope - A passive traffic monitoring system covering IPv4 /8 net. `_ +* `3.5B Web Pages from CommonCrawl 2012 `_ +* `53.5B Web clicks of 100K users in Indiana Univ. `_ +* `CAIDA Internet Datasets `_ +* `ClueWeb09 - 1B web pages `_ +* `ClueWeb12 - 733M web pages `_ +* `CommonCrawl Web Data over 7 years `_ +* `CRAWDAD Wireless datasets from Dartmouth Univ. `_ +* `Open Mobile Data by MobiPerf `_ +* `UCSD Network Telescope, IPv4 /8 net `_ Data Challenges @@ -95,7 +95,7 @@ Data Challenges Economics --------- -* `American Economic Ass. (AEA) `_ +* `American Economic Ass (AEA) `_ * `EconData from UMD `_ * `Internet Product Code Database `_ @@ -133,24 +133,24 @@ Finance GeoSpace/GIS ------------ -* `BODC - Marine data of nearly 22,000 oceanographic vars. `_ -* `EOSDIS - A data collection of NASA's earth observing system data and information system. `_ -* `Factual Global Location Data - 65M POIs with extended attributes in 50 countries. `_ -* `Global Administrative Areas Database (GADM) - For countries and low-level subdivisions. `_ -* `Geo Spatial Data from ASU - Several small spatial or GIS datasets. `_ -* `GeoNames - Over eight million placenames (countries, city stat etc.) of the world. `_ -* `Natural Earth - Vectors and rasters of the world in multiple scales. `_ -* `OpenStreetMap - A free map worldwide maintained by the communities. `_ -* `TIGER/Line - Official United States boundaries and roads. `_ -* `TwoFishes - Foursquare's coarse geocoder. `_ -* `TZ Timezones - A shapefile of the TZ timezones of the world.
`_ +* `BODC - marine data of ~22K vars `_ +* `EOSDIS - NASA's earth observing system data `_ +* `Factual Global Location Data `_ +* `Global Administrative Areas Database (GADM) `_ +* `Geo Spatial Data from ASU `_ +* `GeoNames Worldwide `_ +* `Natural Earth - vectors and rasters of the world `_ +* `Open Street Map (OSM) `_ +* `TIGER/Line - U.S. boundaries and roads `_ +* `TwoFishes - Foursquare's coarse geocoder `_ +* `TZ Timezones shapefiles `_ Government ---------- -* `Australia `_ (abs.gov.au) -* `Australia `_ (data.gov.au) +* `Australia (abs.gov.au) `_ +* `Australia (data.gov.au) `_ * `Canada `_ * `Chicago `_ * `EuroStat `_ @@ -185,10 +185,10 @@ Government Healthcare ---------- -* `EHDP Large Health Data Sets - A collection of health datasets across domains and countries. `_ -* `Gapminder World - A collection of multi-domain, demographic databases for our world. `_ -* `Medicare Coverage Database (MCD) - Containing national and local Coverage Determinations. `_ -* `Medicare Data Engine - Download, explore, and visualize Medicare.gov Data. `_ +* `EHDP Large Health Data Sets `_ +* `Gapminder World, demographic databases `_ +* `Medicare Coverage Database (MCD), U.S. `_ +* `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ @@ -196,28 +196,29 @@ Healthcare Image Processing ---------------- -* `2GB of Photos of Cats - 10K cat images with basic annotations. `_ -* `Face Recognition Benchmark - A collection of face datasets for benchmarking algorithms. `_ -* `ImageNet - An image database organized according to the WordNet hierarchy. `_ +* `2GB of Photos of Cats `_ +* `Face Recognition Benchmark `_ +* `ImageNet - an image database in WordNet hierarchy `_ Machine Learning ---------------- -* `Delve Datasets (Univ. of Toronto) - Evaluating datasets for classification and regression. `_ -* `eBay Online Auctions (2012) - Seller-auction-bidder data with closing prices. `_ -* `IMDb Database - An online database of films, TB programs, and video games.
`_ -* `Keel Repository - Multiple datasets for classification, regression, time series. `_ -* `Lending Club Loan Data - Loan status (Current, Late, Fully Paid, etc.) and latest payment info. `_ -* `Machine Learning Data Set Repository - A data search engine for machine learning tasks. `_ -* `Million Song Dataset - Audio features and metadata for a million popular music tracks. `_ -* `More Song Datasets - Complementary data of cover songs, lyrics, user listening data. `_ -* `MovieLens Data Sets - Online movie recommendation including movie tags, user ratings. `_ +* `Delve Datasets for classification and regression (Univ. of Toronto) `_ +* `Discogs Monthly Data `_ +* `eBay Online Auctions (2012) `_ +* `IMDb Database `_ +* `Keel Repository for classification, regression and time series `_ +* `Lending Club Loan Data `_ +* `Machine Learning Data Set Repository `_ +* `Million Song Dataset `_ +* `More Song Datasets `_ +* `MovieLens Data Sets `_ * `RDataMining - "R and Data Mining" ebook data `_ -* `Registered Meteorites on Earth - 34,513 meteorites updated to 2012. `_ -* `Restaurants Health Score Data - Health status of restaurants in San Francisco. `_ -* `UCI Machine Learning Repository - One of most famous ML data repositories. `_ -* `Yahoo Ratings and Classification Data - About music, movies, user clicks, images etc. `_ +* `Registered Meteorites on Earth `_ +* `Restaurants Health Score Data in San Francisco `_ +* `UCI Machine Learning Repository `_ +* `Yahoo! Ratings and Classification Data `_ Museums @@ -229,36 +230,30 @@ Museums * `The Getty vocabularies `_ -Music ------ - -* `Discogs Data - Monthly dumps of Discogs Release, Artist and Label data. `_ - - Natural Language ---------------- -* `ClueWeb09 FACC - Annotated English-language Web pages from the ClueWeb09 corpora. `_ -* `ClueWeb12 FACC - Annotated English-language Web pages from the ClueWeb12 corpora. `_ -* `DBpedia - Multi-domain ontology describing 4.58M “things” with 583M “facts”. 
`_ -* `Flickr Personal Taxonomies - Personalized tagging pictures with descriptive labels. `_ -* `Google Books Ngrams (2.2TB) - N-gram corpuses extracted from Google Books. `_ -* `Google Web 5gram (1TB, 2006) - 5-gram corpuses extracted from Web pages. `_ -* `Gutenberg eBooks List - Basic information about each eBook from Project Gutenberg. `_ -* `Hansards - 1.3M aligned text chunks from official records of Canadian Parliament. `_ -* `Machine Translation - The recurring translation task focusing on European languages. `_ -* `SMS Spam Collection - 5,574 real English messages, labled as being ham or spam. `_ -* `USENET corpus - A collection of public USENET postings between Oct 2005 and Jan 2011. `_ -* `Wikidata - Wikipedia databases available in JSON and XML formats. `_ -* `Wikipedia Links data - 40 Million Entities in Context. `_ -* `WordNet - Databases, associated packages and tools. `_ +* `ClueWeb09 FACC `_ +* `ClueWeb12 FACC `_ +* `DBpedia - 4.58M “things” with 583M “facts”`_ +* `Flickr Personal Taxonomies `_ +* `Google Books Ngrams (2.2TB) `_ +* `Google Web 5gram (1TB, 2006) `_ +* `Gutenberg eBooks List `_ +* `Hansards text chunks of Canadian Parliament `_ +* `Machine Translation of European languages `_ +* `SMS Spam Collection in English `_ +* `USENET postings corpus of 2005~2011 `_ +* `Wikidata - Wikipedia databases `_ +* `Wikipedia Links data - 40 Million Entities in Context `_ +* `WordNet databases and tools `_ Physics ------- -* `CERN Open Data Portal - Experimental data of CMS experiment, ALICE, ATLAS and LHCb `_ -* `NSSDC (NASA) - More than 230 TB of data from about 550 space science spacecraft `_ +* `CERN Open Data Portal `_ +* `NSSDC (NASA) data of 550 spacecraft `_ Public Domains @@ -289,77 +289,77 @@ Public Domains Search Engines -------------- -* `Academic Torrents (UMB) - Sharing enormous datasets, for researchers, by researchers.
`_ -* `Archive-it - Web archiving service built at the Internet Archive `_ -* `Datahub.io - The easy way to get, use and share data `_ +* `Academic Torrents of data sharing from UMB `_ +* `Archive-it from Internet Archive `_ +* `Datahub.io `_ * `DataMarket (Qlik) `_ -* `Freebase.com - A community-curated database of well-known people, places, and things `_ -* `Harvard Dataverse Network - Scientific data for reproducible research `_ -* `ICPSR (UMICH) - Find and analyze data `_ -* `Statista.com - Statistics and Studies from more than 18,000 Sources `_ +* `Freebase.com of people, places, and things `_ +* `Harvard Dataverse Network of scientific data `_ +* `ICPSR (UMICH) `_ +* `Statista.com - statistics and Studies `_ Social Sciences --------------- -* `Ancestry.com Forum Dataset - Forum users and messages over ten years `_ -* `CMU Enron Email - 150 users, mostly senior management of Enron `_ -* `Facebook Data Scrape (2005) - 100 American colleges and univ. `_ +* `Ancestry.com Forum Dataset over 10 years `_ +* `CMU Enron Email of 150 users `_ +* `Facebook Data Scrape (2005) `_ * `Facebook Social Networks from LAW (since 2007) `_ -* `Foursquare (2010, 2011) - Social networks, check-in locations and categories `_ -* `Foursquare from UMN/Sarwat (2013) - Users, venues, check-ins, ratings etc. `_ -* `General Social Survey (GSS, since 1972) - Demographic and attitudinal questions, topics etc. `_ -* `GetGlue - Users rating TV shows `_ -* `GitHub Archive - Programmers collaboration, projects progress etc. `_ -* `Mobile Social Networks (UMASS) - Timestamped mote-to-mote (up to 27 subjects) connections `_ -* `PewResearch Internet Project - A wide range of surveys about library usage, online dating etc. 
`_ -* `SourceForge.net Research Data - Historic and status statistics of projects and users' activities `_ -* `Stack Exchange Data Explorer - User-contributed content on the Stack Exchange network `_ -* `Titanic Survival Data Set - Demographic information of Titanic passengers `_ -* `Twitter Graph - Crawled entire Twitter site including tweets, user profiles, relations `_ -* `UCB's Archive of Social Science Data (D-Lab) - Holdings of political, social and health areas `_ -* `UCLA Social Sciences Data Archive - A collection of social science data on the Web `_ -* `UNIMI/LAW Social Network Datasets - Social networks like amazon, LiveJournal, dblp and more `_ -* `Universities Worldwide - Links to 9307 Universities in 205 countries `_ -* `UPJOHN for Employment Research - Labor surveys, unemployment spells and more `_ -* `Yahoo Graph and Social Data - Web page graph, user-group membership, IM friends etc. `_ -* `Youtube Video Graph (2007,2008) - Video relations, uploaders, views, ratings and more `_ +* `Foursquare Social Network in 2010, 2011 `_ +* `Foursquare from UMN/Sarwat (2013) `_ +* `General Social Survey (GSS) since 1972 `_ +* `GetGlue - users rating TV shows `_ +* `GitHub Collaboration Archive `_ +* `Mobile Social Networks from UMASS `_ +* `PewResearch Internet Survey Project `_ +* `SourceForge.net Research Data `_ +* `StackExchange Data Explorer `_ +* `Titanic Survival Data Set `_ +* `Twitter Graph of entire Twitter site `_ +* `UCB's Archive of Social Science Data (D-Lab) `_ +* `UCLA Social Sciences Data Archive `_ +* `UNIMI/LAW Social Network Datasets `_ +* `Universities Worldwide `_ +* `UPJOHN for Labor Employment Research `_ +* `Yahoo! 
Graph and Social Data `_ +* `Youtube Video Social Graph in 2007,2008 `_ Sports ------ -* `Betfair Event Results - Fully time-stamped historical Betfair exchange data `_ -* `Cricsheet (baseball) - Thousands of Cricket matches `_ -* `Ergast Formula 1, from 1950 up to date (API available) `_ +* `Betfair Historical Exchange Data `_ +* `Cricsheet Matches (cricket) `_ +* `Ergast Formula 1, from 1950 up to date (API) `_ * `Football/Soccer resouces (data and APIs) `_ -* `Lahman's Baseball Database - Batting and pitching statistics, team stats etc. `_ -* `Retrosheet (baseball) - Play-by-Play files, game logs and schedules `_ +* `Lahman's Baseball Database `_ +* `Retrosheet Baseball Statistics `_ Time Series ----------- -* `Time Series data Library (TSDL), created by Rob Hyndman, MU `_ -* `UC Riverside Time Series, for classification and clustering. `_ +* `Time Series Data Library (TSDL) from MU `_ +* `UC Riverside Time Series Dataset `_ Transportation -------------- -* `Airlines OD Data 1987-2008, used by ASA Challenge 2009 `_ -* `Bike Share Data Systems - Trip histories, site maps etc.
`_ -* `Edge data for US domestic flights 1990 to 2009 `_ -* `Half a million Hubway rides in MA `_ -* `Marine Traffic - Ship tracks, port calls and more `_ -* `NYC Taxi Trip Data 2013 - FOIA/FOILed by Chris Whong `_ -* `OpenFlights - Airport, airline and route data `_ -* `RITA Airline On-Time Performance data of major air carriers in US `_ +* `Airlines OD Data 1987-2008 `_ +* `Bike Share Systems (BSS) collection `_ +* `Hubway Million Rides in MA `_ +* `Marine Traffic - ship tracks, port calls and more `_ +* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ +* `OpenFlights - airport, airline and route data `_ +* `RITA Airline On-Time Performance data `_ * `RITA/BTS transport data collection (TranStat) `_ -* `Transport for London (TFL) - Trip histories and networking statistics `_ -* `Travel Tracker Survey (TTS), Chicago, 1990, 2007-2008 `_ +* `Transport for London (TFL) `_ +* `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ -* `U.S. Freight Analysis Framework - Freight movement among states since 2007 `_ +* `U.S. Domestic Flights 1990 to 2009 `_ +* `U.S. Freight Analysis Framework since 2007 `_ Complementary Collections @@ -369,4 +364,4 @@ Complementary Collections * Inside-r: `Finding Data on the Internet `_ * Quora: `Where can I find large datasets open to the public? 
`_ * RS.io: `100+ Interesting Data Sets for Statistics `_ -* StaTrek: `Leveraging open data to understand urban lives `_ +* StaTrek: `Leveraging open data to understand urban lives `_ \ No newline at end of file From bcba2f0ccc36acf41c09084857d1b23c1b048088 Mon Sep 17 00:00:00 2001 From: Xiaming Chen Date: Sat, 31 Jan 2015 17:21:26 +0800 Subject: [PATCH 03/17] Remove format error --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 2a43faf..45e0fbd 100644 --- a/README.rst +++ b/README.rst @@ -235,7 +235,7 @@ Natural Language * `ClueWeb09 FACC `_ * `ClueWeb12 FACC `_ -* `DBpedia - 4.58M “things” with 583M “facts”`_ +* `DBpedia - 4.58M things with 583M facts `_ * `Flickr Personal Taxonomies `_ * `Google Books Ngrams (2.2TB) `_ * `Google Web 5gram (1TB, 2006) `_ From 93111c8fc5015d4423a38443acc7f72edf016569 Mon Sep 17 00:00:00 2001 From: "Dana \"Dani\"" Date: Tue, 3 Feb 2015 13:55:36 -0600 Subject: [PATCH 04/17] Update README.rst Added Dallas, Denver, Seattle city open data --- README.rst | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 45e0fbd..0367bda 100644 --- a/README.rst +++ b/README.rst @@ -153,6 +153,8 @@ Government * `Australia (data.gov.au) `_ * `Canada `_ * `Chicago `_ +* `Dallas Open Data `_ +* `Denver Open Data `_ * `EuroStat `_ * `FedStats `_ * `Germany `_ @@ -167,6 +169,7 @@ Government * `OECD `_ * `Open Government Data (OGD) Platform India `_ * `San Francisco Data sets `_ +* `Seattle `_ * `South Africa `_ * `The World Bank `_ * `U.K. Government Data `_ @@ -364,4 +367,4 @@ Complementary Collections * Inside-r: `Finding Data on the Internet `_ * Quora: `Where can I find large datasets open to the public? 
`_ * RS.io: `100+ Interesting Data Sets for Statistics `_ * StaTrek: `Leveraging open data to understand urban lives `_ \ No newline at end of file +* StaTrek: `Leveraging open data to understand urban lives `_ From 6608a4db4228e300e7110eeecfff443b657eb164 Mon Sep 17 00:00:00 2001 From: gwulfs Date: Wed, 4 Feb 2015 14:36:47 -0500 Subject: [PATCH 05/17] add dataset --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 0367bda..3d236b1 100644 --- a/README.rst +++ b/README.rst @@ -340,6 +340,7 @@ Time Series * `Time Series Data Library (TSDL) from MU `_ * `UC Riverside Time Series Dataset `_ +* `Hard Drive Failure Rates `_ Transportation From 854741f3668bdbe502d4050b9ece47a027dcf172 Mon Sep 17 00:00:00 2001 From: "Jamin X. Chen" Date: Thu, 5 Feb 2015 10:27:24 +0800 Subject: [PATCH 06/17] Add several datasets contributed by community --- README.rst | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/README.rst b/README.rst index 3d236b1..e3bfdb8 100644 --- a/README.rst +++ b/README.rst @@ -134,11 +134,12 @@ GeoSpace/GIS ------------ * `BODC - marine data of ~22K vars `_ +* `Cambridge, MA, US, GIS data on GitHub `_ * `EOSDIS - NASA's earth observing system data `_ * `Factual Global Location Data `_ -* `Global Administrative Areas Database (GADM) `_ * `Geo Spatial Data from ASU `_ * `GeoNames Worldwide `_ +* `Global Administrative Areas Database (GADM) `_ * `Natural Earth - vectors and rasters of the world `_ * `Open Street Map (OSM) `_ * `TIGER/Line - U.S.
boundaries and roads `_ @@ -151,12 +152,15 @@ Government * `Australia (abs.gov.au) `_ * `Australia (data.gov.au) `_ +* `Brazil `_ +* `Cambridge, MA, US `_ * `Canada `_ * `Chicago `_ * `Dallas Open Data `_ * `Denver Open Data `_ * `EuroStat `_ * `FedStats `_ +* `France `_ * `Germany `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ @@ -199,9 +203,14 @@ Healthcare Image Processing ---------------- +* `10k US Adult Faces Database `_ * `2GB of Photos of Cats `_ +* `Affective Image Classification `_ * `Face Recognition Benchmark `_ -* `ImageNet - an image database in WordNet hierarchy `_ +* `ImageNet (in WordNet hierarchy) `_ +* `International Affective Picture System, UFL `_ +* `Massive Visual Memory Stimuli, MIT `_ +* `SUN database, MIT `_ Machine Learning @@ -294,6 +303,7 @@ Search Engines * `Freebase.com of people, places, and things `_ * `Harvard Dataverse Network of scientific data `_ * `ICPSR (UMICH) `_ +* `Open Data Certificates (beta) `_ * `Statista.com - statistics and Studies `_ @@ -348,6 +358,7 @@ Transportation * `Airlines OD Data 1987-2008 `_ * `Bike Share Systems (BSS) collection `_ +* `Bay Area Bike Share Data `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ From e546c35955fe54a7c722c64fa86538a12a6b881d Mon Sep 17 00:00:00 2001 From: saurzcode Date: Mon, 23 Feb 2015 17:56:24 +0530 Subject: [PATCH 07/17] Adding Indian Government Data Set --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index e3bfdb8..ec33ccc 100644 --- a/README.rst +++ b/README.rst @@ -164,6 +164,7 @@ Government * `Germany `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ +* `Indian Government `_ * `London Datastore, UK `_ * `MassGIS, Massachusetts, U.S.
`_ * `Netherlands `_ From 0c70538c270b1ccffb4538106c8ac16ec99d7bd1 Mon Sep 17 00:00:00 2001 From: saurzcode Date: Mon, 23 Feb 2015 17:58:38 +0530 Subject: [PATCH 08/17] Adding Indian Government Data. --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index ec33ccc..4150713 100644 --- a/README.rst +++ b/README.rst @@ -164,7 +164,7 @@ Government * `Germany `_ * `Glasgow, Scotland, UK `_ * `Guardian world governments `_ -* `Indian Government `_ +* `Indian Government Data `_ * `London Datastore, UK `_ * `MassGIS, Massachusetts, U.S. `_ * `Netherlands `_ From 6de2446e68db11f2d73ddd470ad8f844a9e7a463 Mon Sep 17 00:00:00 2001 From: gwulfs Date: Wed, 25 Feb 2015 22:56:02 -0500 Subject: [PATCH 09/17] blogger corpus --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 4150713..945ece0 100644 --- a/README.rst +++ b/README.rst @@ -246,6 +246,7 @@ Museums Natural Language ---------------- +* `Blogger Corpus `_ * `ClueWeb09 FACC `_ * `ClueWeb12 FACC `_ * `DBpedia - 4.58M things with 583M facts `_ From 575032e57663261be6959ddcc70e752579f8843a Mon Sep 17 00:00:00 2001 From: bore3601 Date: Fri, 6 Mar 2015 17:26:38 +0800 Subject: [PATCH 10/17] Update README.rst --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index 945ece0..7250eee 100644 --- a/README.rst +++ b/README.rst @@ -334,6 +334,8 @@ Social Sciences * `UPJOHN for Labor Employment Research `_ * `Yahoo! 
Graph and Social Data `_ * `Youtube Video Social Graph in 2007,2008 `_ +* `Google Scholar citation relations `_ +* `Political Polarity Data `_ Sports From 24f4bdd48b5d979bd5cf53ab152c8394f51bfc20 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Wed, 18 Mar 2015 15:03:50 +0800 Subject: [PATCH 11/17] Fix the link of StaTrek --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 7250eee..2769b9e 100644 --- a/README.rst +++ b/README.rst @@ -383,4 +383,4 @@ Complementary Collections * Inside-r: `Finding Data on the Internet `_ * Quora: `Where can I find large datasets open to the public? `_ * RS.io: `100+ Interesting Data Sets for Statistics `_ -* StaTrek: `Leveraging open data to understand urban lives `_ +* StaTrek: `Leveraging open data to understand urban lives `_ From 68fbe5674e6046411fa9ebe15a129b7262b3f5ae Mon Sep 17 00:00:00 2001 From: Xiaming Date: Fri, 20 Mar 2015 10:41:24 +0800 Subject: [PATCH 12/17] Fix #21 --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 2769b9e..b2f8392 100644 --- a/README.rst +++ b/README.rst @@ -372,7 +372,7 @@ Transportation * `Transport for London (TFL) `_ * `Travel Tracker Survey (TTS) for Chicago `_ * `U.S. Bureau of Transportation Statistics (BTS) `_ -* `U.S. Domestic Flights 1990 to 2009 `_ +* `U.S. Domestic Flights 1990 to 2009 `_ * `U.S. Freight Analysis Framework since 2007 `_ From 4493cd0c447d0eda3631576165c8cd1ad78cbd48 Mon Sep 17 00:00:00 2001 From: Shaun Walbridge Date: Sun, 22 Mar 2015 22:30:09 -0400 Subject: [PATCH 13/17] Add Landsat 8 on Amazon AWS Landsat 8 access over AWS; includes instructions for access from 5 different providers. 
--- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index b2f8392..28f65bb 100644 --- a/README.rst +++ b/README.rst @@ -140,6 +140,7 @@ GeoSpace/GIS * `Geo Spatial Data from ASU `_ * `GeoNames Worldwide `_ * `Global Administrative Areas Database (GADM) `_ +* `Landsat 8 on AWS `_ * `Natural Earth - vectors and rasters of the world `_ * `Open Street Map (OSM) `_ * `TIGER/Line - U.S. boundaries and roads `_ From f52376d2707a2f76ea062e5659dae9ac34ca2ae3 Mon Sep 17 00:00:00 2001 From: Nilemar de Barcelos Date: Mon, 30 Mar 2015 13:06:33 -0300 Subject: [PATCH 14/17] Added dataset for "Number of Ebola Cases and Deaths in Affected Countries" --- README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.rst b/README.rst index 28f65bb..a892d91 100644 --- a/README.rst +++ b/README.rst @@ -199,7 +199,7 @@ Healthcare * `Medicare Coverage Database (MCD), U.S. `_ * `Medicare Data Engine of medicare.gov Data `_ * `Medicare Data File `_ - +* `Number of Ebola Cases and Deaths in Affected Countries (2014) `_ Image Processing From 1423edaa0b87299d573ec97d43554b4e4c338f91 Mon Sep 17 00:00:00 2001 From: gwulfs Date: Tue, 31 Mar 2015 17:48:15 -0400 Subject: [PATCH 15/17] criteo --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index 4150713..04b35f3 100644 --- a/README.rst +++ b/README.rst @@ -75,6 +75,7 @@ Computer Networks * `ClueWeb12 - 733M web pages `_ * `CommonCrawl Web Data over 7 years `_ * `CRAWDAD Wireless datasets from Dartmouth Univ. 
`_ +* `Criteo click-through data `_ * `Open Mobile Data by MobiPerf `_ * `UCSD Network Telescope, IPv4 /8 net `_ From 5c1a131a22966e10fdf4709bf39e42061cdfaaef Mon Sep 17 00:00:00 2001 From: Xiaming Date: Wed, 1 Apr 2015 12:56:22 +0800 Subject: [PATCH 16/17] Add GeoLife data and MIT reality mining --- README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.rst b/README.rst index d81bbb5..c6f59f5 100644 --- a/README.rst +++ b/README.rst @@ -323,6 +323,7 @@ Social Sciences * `General Social Survey (GSS) since 1972 `_ * `GetGlue - users rating TV shows `_ * `GitHub Collaboration Archive `_ +* `MIT Reality Mining Dataset `_ * `Mobile Social Networks from UMASS `_ * `PewResearch Internet Survey Project `_ * `SourceForge.net Research Data `_ @@ -365,6 +366,7 @@ Transportation * `Airlines OD Data 1987-2008 `_ * `Bike Share Systems (BSS) collection `_ * `Bay Area Bike Share Data `_ +* `GeoLife GPS Trajectory from Microsoft Research `_ * `Hubway Million Rides in MA `_ * `Marine Traffic - ship tracks, port calls and more `_ * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_ From 4ad9bb7e0d2805fe5165bf59e5442b06934343f7 Mon Sep 17 00:00:00 2001 From: Xiaming Date: Wed, 1 Apr 2015 13:09:03 +0800 Subject: [PATCH 17/17] Add D4D challenge --- README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/README.rst b/README.rst index c6f59f5..256eb58 100644 --- a/README.rst +++ b/README.rst @@ -84,6 +84,7 @@ Data Challenges --------------- * `Challenges in Machine Learning `_ +* `D4D Challenge of Orange `_ * `DrivenData Competitions for Social Good `_ * `ICWSM Data Challenge (since 2009) `_ * `Kaggle Competition Data `_