From bf5e282f438f4f43af02f99db260f445d172acf1 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Tue, 8 Dec 2015 13:23:43 +0800
Subject: [PATCH] Add TCGA #126; Clear format.

---
 README.rst | 71 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/README.rst b/README.rst
index 7c94bde..a7aab67 100644
--- a/README.rst
+++ b/README.rst
@@ -5,7 +5,7 @@ Awesome Public Datasets
     :target: https://github.com/sindresorhus/awesome
 .. image:: https://travis-ci.org/caesar0301/awesome-public-datasets.svg
     :target: https://travis-ci.org/caesar0301/awesome-public-datasets
-
+
 `This list of public data sources `_ are collected and tidied from blogs, answers, and user reponses.
 Most of the data sets listed below are free, however, some are not.
 
@@ -27,12 +27,11 @@ Biology
 * `1000 Genomes `_
 * `American Gut (Microbiome Project) `_
 * `Collaborative Research in Computational Neuroscience (CRCNS) `_
+* `EBI ArrayExrepss `_
+* `ENCODE project `_
 * `Gene Expression Omnibus (GEO) `_
 * `Gene Ontology (GO) `_
 * `Global Biotic Interations (GloBI) `_
-* `Sequence Read Archive(SRA) `_
-* `EBI ArrayExrepss `_
-* `ENCODE project `_
 * `Human Microbiome Project (HMP) `_
 * `ICOS PSP Benchmark `_
 * `MIT Cancer Genomics Data `_
@@ -42,11 +41,12 @@ Biology
 * `Protein Data Bank `_
 * `PubChem Project `_
 * `PubGene (now Coremine Medical) `_
+* `Sequence Read Archive(SRA) `_
 * `Stanford Microarray Data `_
+* `The Catalogue of Life `_
 * `The Personal Genome Project `_ or `PGP `_
 * `UCSC Public Data `_
 * `UniGene `_
-* `The Catalogue of Life `_
 
 
 Climate/Weather
@@ -62,8 +62,8 @@ Climate/Weather
 * `NOAA Climate Datasets `_
 * `NOAA Realtime Weather Models `_
 * `The World Bank Open Data Resources for Climate Change `_
-* `WorldClim - Global Climate Data `_
 * `UEA Climatic Research Unit `_
+* `WorldClim - Global Climate Data `_
 * `WU Historical Weather Worldwide `_
 
 
@@ -114,8 +114,8 @@ Data Challenges
 ---------------
 
 * `Challenges in Machine Learning `_
-* `D4D Challenge of Orange `_
 * `CrowdANALYTIX dataX `_
+* `D4D Challenge of Orange `_
 * `DrivenData Competitions for Social Good `_
 * `ICWSM Data Challenge (since 2009) `_
 * `Kaggle Competition Data `_
@@ -166,8 +166,9 @@ Finance
 
 Geology
 -------
-* `USGS Earthquake Archives `_
+
 * `Smithsonian Institution Global Volcano and Eruption Database `_
+* `USGS Earthquake Archives `_
 
 
 GeoSpace/GIS
@@ -181,14 +182,14 @@ GeoSpace/GIS
 * `GeoNames Worldwide `_
 * `Global Administrative Areas Database (GADM) `_
 * `Landsat 8 on AWS `_
+* `List of all countries in all languages `_
 * `Natural Earth - vectors and rasters of the world `_
+* `OpenAddresses `_
 * `OpenStreetMap (OSM) `_
 * `TIGER/Line - U.S. boundaries and roads `_
 * `TwoFishes - Foursquare's coarse geocoder `_
 * `TZ Timezones shapfiles `_
 * `World countries in multiple formats `_
-* `List of all countries in all languages `_
-* `OpenAddresses `_
 
 
 Government
@@ -232,6 +233,7 @@ Government
 * `Open Government Data (OGD) Platform India `_
 * `Oregon `_
 * `Portland, Oregon `_
+* `Puerto Rico Government `_
 * `Rio de Janeiro, Brazil `_
 * `Romania `_
 * `Russia `_
@@ -240,22 +242,21 @@ Government
 * `Singapore Government Data `_
 * `South Africa `_
 * `Switzerland `_
-* `The World Bank `_
 * `Texas Open Data `_
-* `Puerto Rico Government `_
+* `The World Bank `_
 * `U.K. Government Data `_
-* `Uruguay `_
 * `U.S. American Community Survey `_
 * `U.S. CDC Public Health datasets `_
 * `U.S. Census Bureau `_
-* `U.S. National Center for Education Statistics (NCES) `_
 * `U.S. Department of Housing and Urban Development (HUD) `_
 * `U.S. Federal Government Agencies `_
 * `U.S. Federal Government Data Catalog `_
 * `U.S. Food and Drug Administration (FDA) `_
+* `U.S. National Center for Education Statistics (NCES) `_
 * `U.S. Open Government `_
 * `UK 2011 Census Open Atlas Project `_
 * `United Nations `_
+* `Uruguay `_
 * `Vancouver, BC Open Data Catalog `_
 
 
@@ -270,6 +271,7 @@ Healthcare
 * `MeSH, the vocabulary thesaurus used for indexing articles for PubMed `_
 * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_
 * `Open-ODS (structure of the UK NHS) `_
+* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_
 
 
 Image Processing
@@ -277,17 +279,17 @@ Image Processing
 
 * `10k US Adult Faces Database `_
 * `2GB of Photos of Cats (Original down - 20Agst2015) `_ or `Archive version `_
-* `Stanford Dogs Dataset `_
-* `The Oxford-IIIT Pet Dataset `_
-* `Animals with attributes `_
 * `Affective Image Classification `_
+* `Animals with attributes `_
 * `Face Recognition Benchmark `_
 * `ImageNet (in WordNet hierarchy) `_
+* `Indoor Scene Recognition `_
 * `International Affective Picture System, UFL `_
 * `Massive Visual Memory Stimuli, MIT `_
+* `Stanford Dogs Dataset `_
 * `SUN database, MIT `_
+* `The Oxford-IIIT Pet Dataset `_
 * `YouTube Faces Database `_
-* `Indoor Scene Recognition `_
 
 
 Machine Learning
@@ -334,8 +336,8 @@ Natural Language
 * `Gutenberg eBooks List `_
 * `Hansards text chunks of Canadian Parliament `_
 * `Machine Translation of European languages `_
-* `SMS Spam Collection in English `_
 * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_
+* `SMS Spam Collection in English `_
 * `USENET postings corpus of 2005~2011 `_
 * `Wikidata - Wikipedia databases `_
 * `Wikipedia Links data - 40 Million Entities in Context `_
@@ -346,10 +348,11 @@ Physics
 -------
 
 * `CERN Open Data Portal `_
-* `NSSDC (NASA) data of 550 space spacecraft `_
 * `NASA Exoplanet Archive `_
+* `NSSDC (NASA) data of 550 space spacecraft `_
 * `Sloan Digital Sky Survey (SDSS) - Mapping the Universe `_
 
+
 Psychology/Cognition
 --------------
@@ -395,6 +398,7 @@ Search Engines
 * `Open Data Certificates (beta) `_
 * `Statista.com - statistics and Studies `_
 
+
 Social Networks
 ---------------
 
@@ -405,6 +409,7 @@ Social Networks
 * `Social Twitter Data `_
 * `Twitter Data for Sentiment Analysis `_
 
+
 Social Sciences
 ---------------
 
@@ -414,19 +419,23 @@ Social Sciences
 * `Facebook Data Scrape (2005) `_
 * `Facebook Social Networks from LAW (since 2007) `_
 * `FBI Hate Crime 2013 - aggregated data `_
-* `Foursquare Social Network in 2010, 2011 `_
 * `Foursquare from UMN/Sarwat (2013) `_
+* `Foursquare Social Network in 2010, 2011 `_
+* `GDELT Global Events Database `_
 * `General Social Survey (GSS) since 1972 `_
 * `GetGlue - users rating TV shows `_
 * `GitHub Collaboration Archive `_
+* `Google Scholar citation relations `_
 * `MIT Reality Mining Dataset `_
 * `Mobile Social Networks from UMASS `_
 * `PewResearch Internet Survey Project `_
+* `Political Polarity Data `_
 * `Reddit Comments `_
+* `Skytrax' Air Travel Reviews Dataset `_
 * `SourceForge.net Research Data `_
 * `StackExchange Data Explorer `_
-* `Titanic Survival Data Set `_
 * `Texas Inmates Executed Since 1984 `_
+* `Titanic Survival Data Set `_
 * `Twitter Graph of entire Twitter site `_
 * `UCB's Archive of Social Science Data (D-Lab) `_
 * `UCLA Social Sciences Data Archive `_
@@ -435,10 +444,6 @@ Social Sciences
 * `UPJOHN for Labor Employment Research `_
 * `Yahoo! Graph and Social Data `_
 * `Youtube Video Social Graph in 2007,2008 `_
-* `Google Scholar citation relations `_
-* `Political Polarity Data `_
-* `GDELT Global Events Database `_
-* `Skytrax' Air Travel Reviews Dataset `_
 
 
 Sports
@@ -455,23 +460,24 @@ Sports
 
 Time Series
 -----------
-* `Time Series Data Library (TSDL) from MU `_
-* `UC Riverside Time Series Dataset `_
 * `Hard Drive Failure Rates `_
 * `Heart Rate Time Series from MIT `_
+* `Time Series Data Library (TSDL) from MU `_
+* `UC Riverside Time Series Dataset `_
 
 
 Transportation
 --------------
 
 * `Airlines OD Data 1987-2008 `_
-* `Bike Share Systems (BSS) collection `_
 * `Bay Area Bike Share Data `_
+* `Bike Share Systems (BSS) collection `_
 * `GeoLife GPS Trajectory from Microsoft Research `_
 * `Hubway Million Rides in MA `_
 * `Marine Traffic - ship tracks, port calls and more `_
-* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
 * `NYC Taxi Trip Data 2009- `_
+* `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_
+* `NYC Uber trip data April 2014 to September 2014 `_
 * `OpenFlights - airport, airline and route data `_
 * `Plane Crash Database, since 1920 `_
 * `RITA Airline On-Time Performance data `_
@@ -481,7 +487,6 @@ Transportation
 * `U.S. Bureau of Transportation Statistics (BTS) `_
 * `U.S. Domestic Flights 1990 to 2009 `_
 * `U.S. Freight Analysis Framework since 2007 `_
-* `NYC Uber trip data April 2014 to September 2014 `_
 
 
 Complementary Collections
@@ -489,9 +494,9 @@ Complementary Collections
 
 * DataWrangling: `Some Datasets Available on the Web `_
 * Inside-r: `Finding Data on the Internet `_
+* OpenDataMonitor: `An overview of available open data resources in Europe `_
+* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_
 * Quora: `Where can I find large datasets open to the public? `_
 * RS.io: `100+ Interesting Data Sets for Statistics `_
 * StaTrek: `Leveraging open data to understand urban lives `_
-* OpenDataMonitor: `An overview of available open data resources in Europe `_
-* OpenDataNetwork: `A search engine of all Socrata powered data portals ranging from small cities to federal agencies and non-profits `_
 * Zenodo: `An open dependable home for the long-tail of science, enabling researchers to share and preserve any research outputs in any size, any format and from any science. `_