diff --git a/README.rst b/README.rst
index faad736..09fcbd7 100644
--- a/README.rst
+++ b/README.rst
@@ -34,12 +34,12 @@ Biology
 * `EBI ArrayExrepss `_
 * `ENCODE project `_
 * `Human Microbiome Project (HMP) `_
-* `ICOS PSP Benchmark `_
+* `ICOS PSP Benchmark `_
 * `MIT Cancer Genomics Data `_
 * `NIH Microarray data (FTP) `_
 * `OpenSNP genotypes data `_
 * `Pathguid: Protein-Protein Interactions Catalog `_
-* `Protein Data Bank `_
+* `Protein Data Bank `_
 * `PubChem Project `_
 * `PubGene (now Coremine Medical) `_
 * `Stanford Microarray Data `_
@@ -56,10 +56,10 @@ Climate/Weather
 * `Brazilian Weather - Historical data (In Portuguese) `_
 * `Canadian Meteorological Centre `_
 * `Climate Data from UEA (updated monthly) `_
-* `Global Climate Data Since 1929 `_
+* `Global Climate Data Since 1929 `_
 * `NASA Global Imagery Browse Services `_
 * `NOAA Bering Sea Climate `_
-* `NOAA Climate Datasets `_
+* `NOAA Climate Datasets `_
 * `NOAA Realtime Weather Models `_
 * `The World Bank Open Data Resources for Climate Change `_
 * `UEA Climatic Research Unit `_
@@ -74,8 +74,8 @@ Complex Networks
 * `NBER Patent Citations `_
 * `NIST complex networks data collection `_
 * `Protein-protein interaction network `_
-* `PyPI and Maven Dependency Network `_
-* `Scopus Citation Database `_
+* `PyPI and Maven Dependency Network `_
+* `Scopus Citation Database `_
 * `Small Network Data `_
 * `Stanford GraphBase (Steven Skiena) `_
 * `Stanford Large Network Dataset Collection `_
@@ -92,13 +92,13 @@ Computer Networks
 -----------------

 * `3.5B Web Pages from CommonCraw 2012 `_
-* `53.5B Web clicks of 100K users in Indiana Univ. `_
+* `53.5B Web clicks of 100K users in Indiana Univ. `_
 * `CAIDA Internet Datasets `_
 * `ClueWeb09 - 1B web pages `_
 * `ClueWeb12 - 733M web pages `_
 * `CommonCrawl Web Data over 7 years `_
-* `CRAWDAD Wireless datasets from Dartmouth Univ. `_
-* `Criteo click-through data `_
+* `CRAWDAD Wireless datasets from Dartmouth Univ. `_
+* `Criteo click-through data `_
 * `Open Mobile Data by MobiPerf `_
 * `UCSD Network Telescope, IPv4 /8 net `_

@@ -114,14 +114,14 @@ Data Challenges

 * `Challenges in Machine Learning `_
 * `D4D Challenge of Orange `_
-* `CrowdANALYTIX dataX `_
+* `CrowdANALYTIX dataX `_
 * `DrivenData Competitions for Social Good `_
 * `ICWSM Data Challenge (since 2009) `_
-* `Kaggle Competition Data `_
+* `Kaggle Competition Data `_
 * `KDD Cup by Tencent 2012 `_
 * `Localytics Data Visualization Challenge `_
 * `Netflix Prize `_
-* `Space Apps Challenge `_
+* `Space Apps Challenge `_
 * `Telecom Italia Big Data Challenge `_
 * `Yelp Dataset Challenge `_

@@ -129,7 +129,7 @@ Data Challenges
 Economics
 ---------

-* `American Economic Ass (AEA) `_
+* `American Economic Ass (AEA) `_
 * `EconData from UMD `_
 * `Internet Product Code Database `_

@@ -159,14 +159,14 @@ Finance
 * `NASDAQ `_
 * `OANDA `_
 * `OSU Financial data `_
-* `Quandl `_
-* `St Louis Federal `_
+* `Quandl `_
+* `St Louis Federal `_
 * `Yahoo Finance `_

 Geology
 -------
 * `USGS Earthquake Archives `_
-* `Smithsonian Institution Global Volcano and Eruption Database `_
+* `Smithsonian Institution Global Volcano and Eruption Database `_


 GeoSpace/GIS
@@ -175,7 +175,7 @@ GeoSpace/GIS
 * `BODC - marine data of ~22K vars `_
 * `Cambridge, MA, US, GIS data on GitHub `_
 * `EOSDIS - NASA's earth observing system data `_
-* `Factual Global Location Data `_
+* `Factual Global Location Data `_
 * `Geo Spatial Data from ASU `_
 * `GeoNames Worldwide `_
 * `Global Administrative Areas Database (GADM) `_
@@ -201,7 +201,7 @@ Government
 * `Belgium `_
 * `Brazil `_
 * `Cambridge, MA, US `_
-* `Canada `_
+* `Canada `_
 * `Chicago `_
 * `Dallas Open Data `_
 * `Denver Open Data `_
@@ -214,9 +214,9 @@ Government
 * `Germany `_
 * `Ghent, Belgium `_
 * `Glasgow, Scotland, UK `_
-* `Guardian world governments `_
+* `Guardian world governments `_
 * `Houston Open Data `_
-* `Indian Government Data `_
+* `Indian Government Data `_
 * `Indonesian Data Portal `_
 * `London Datastore, UK `_
 * `Los Angeles Open Data `_
@@ -225,17 +225,17 @@ Government
 * `Netherlands `_
 * `New Zealand `_
 * `NYC betanyc `_
-* `NYC Open Data `_
+* `NYC Open Data `_
 * `OECD `_
 * `Oklahoma `_
-* `Open Government Data (OGD) Platform India `_
+* `Open Government Data (OGD) Platform India `_
 * `Oregon `_
-* `Portland, Oregon `_
+* `Portland, Oregon `_
 * `Rio de Janeiro, Brazil `_
 * `Romania `_
 * `San Francisco Data sets `_
 * `Seattle `_
-* `Singapore Government Data `_
+* `Singapore Government Data `_
 * `South Africa `_
 * `Switzerland `_
 * `The World Bank `_
@@ -247,8 +247,8 @@ Government
 * `U.S. CDC Public Health datasets `_
 * `U.S. Census Bureau `_
 * `U.S. National Center for Education Statistics (NCES) `_
-* `U.S. Department of Housing and Urban Development (HUD) `_
-* `U.S. Federal Government Agencies `_
+* `U.S. Department of Housing and Urban Development (HUD) `_
+* `U.S. Federal Government Agencies `_
 * `U.S. Federal Government Data Catalog `_
 * `U.S. Food and Drug Administration (FDA) `_
 * `U.S. Open Government `_
@@ -262,7 +262,7 @@ Healthcare

 * `EHDP Large Health Data Sets `_
 * `Gapminder World, demographic databases `_
-* `Medicare Coverage Database (MCD), U.S. `_
+* `Medicare Coverage Database (MCD), U.S. `_
 * `Medicare Data Engine of medicare.gov Data `_
 * `Medicare Data File `_
 * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_
@@ -326,7 +326,7 @@ Natural Language
 * `ClueWeb12 FACC `_
 * `DBpedia - 4.58M things with 583M facts `_
 * `Flickr Personal Taxonomies `_
-* `Google Books Ngrams (2.2TB) `_
+* `Google Books Ngrams (2.2TB) `_
 * `Google Web 5gram (1TB, 2006) `_
 * `Gutenberg eBooks List `_
 * `Hansards text chunks of Canadian Parliament `_
@@ -356,7 +356,7 @@ Psychology/Cognition
 Public Domains
 --------------

-* `Amazon `_
+* `Amazon `_
 * `Archive.org Datasets `_
 * `CMU JASA data archive `_
 * `CMU StatLab collections `_
@@ -367,15 +367,15 @@ Public Domains
 * `KDNuggets Data Collections `_
 * `Microsoft Azure Data Market Free DataSets `_
 * `Numbray `_
-* `Reddit Datasets `_
-* `RevolutionAnalytics Collection `_
+* `Reddit Datasets `_
+* `RevolutionAnalytics Collection `_
 * `Sample R data sets `_
 * `Stats4Stem R data sets `_
 * `StatSci.org `_
 * `The Washington Post List `_
 * `UCLA SOCR data collection `_
 * `UFO Reports `_
-* `Wikileaks 911 pager intercepts `_
+* `Wikileaks 911 pager intercepts `_
 * `Yahoo Webscope `_


@@ -384,20 +384,20 @@ Search Engines

 * `Academic Torrents of data sharing from UMB `_
 * `Archive-it from Internet Archive `_
-* `Datahub.io `_
+* `Datahub.io `_
 * `DataMarket (Qlik) `_
 * `Freebase.com of people, places, and things `_
-* `Harvard Dataverse Network of scientific data `_
+* `Harvard Dataverse Network of scientific data `_
 * `ICPSR (UMICH) `_
-* `Open Data Certificates (beta) `_
+* `Open Data Certificates (beta) `_
 * `Statista.com - statistics and Studies `_

 Social Networks
 ---------------

 * `72 hours #gamergate scrape `_
-* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_
-* `May 2011 Calufa Twitter Scrape `_
+* `Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape `_
+* `May 2011 Calufa Twitter Scrape `_
 * `Network Twitter Data `_
 * `Social Twitter Data `_
 * `Twitter Data for Sentiment Analysis `_
@@ -407,7 +407,7 @@ Social Sciences

 * `Ancestry.com Forum Dataset over 10 years `_
 * `CMU Enron Email of 150 users `_
-* `EDRM Enron EMail of 151 users, hosted on S3 `_
+* `EDRM Enron EMail of 151 users, hosted on S3 `_
 * `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
 * `FBI Hate Crime 2013 - aggregated data `_
@@ -415,12 +415,12 @@ Social Sciences
 * `Foursquare from UMN/Sarwat (2013) `_
 * `General Social Survey (GSS) since 1972 `_
 * `GetGlue - users rating TV shows `_
-* `GitHub Collaboration Archive `_
+* `GitHub Collaboration Archive `_
 * `MIT Reality Mining Dataset `_
 * `Mobile Social Networks from UMASS `_
 * `PewResearch Internet Survey Project `_
 * `Reddit Comments `_
-* `SourceForge.net Research Data `_
+* `SourceForge.net Research Data `_
 * `StackExchange Data Explorer `_
 * `Titanic Survival Data Set `_
 * `Texas Inmates Executed Since 1984 `_
@@ -463,10 +463,10 @@ Transportation

 * `Airlines OD Data 1987-2008 `_
 * `Bike Share Systems (BSS) collection `_
-* `Bay Area Bike Share Data `_
+* `Bay Area Bike Share Data `_
 * `GeoLife GPS Trajectory from Microsoft Research `_
 * `Hubway Million Rides in MA `_
-* `Marine Traffic - ship tracks, port calls and more `_
+* `Marine Traffic - ship tracks, port calls and more `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
 * `NYC Taxi Trip Data 2009- `_
 * `OpenFlights - airport, airline and route data `_
@@ -487,8 +487,8 @@ Complementary Collections
 * DataWrangling: `Some Datasets Available on the Web `_
 * Inside-r: `Finding Data on the Internet `_
 * Quora: `Where can I find large datasets open to the public? `_
-* RS.io: `100+ Interesting Data Sets for Statistics `_
+* RS.io: `100+ Interesting Data Sets for Statistics `_
 * StaTrek: `Leveraging open data to understand urban lives `_
 * OpenDataMonitor: `An overview of available open data resources in Europe `_
-* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_
+* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_
 * Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_