mirror of
https://github.com/awesomedata/awesome-public-datasets.git
synced 2024-04-18 07:30:58 +08:00
Merge pull request #2 from caesar0301/master
Getting my Awesome Public Datasets up to date
commit
574397731e
@@ -51,20 +51,24 @@ Government
* `London, ON, Canada <http://www.london.ca/city-hall/open-data/Pages/default.aspx>`_
* `Los Angeles Open Data <https://data.lacity.org/>`_
* `MassGIS, Massachusetts, U.S. <http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/>`_
* `Metropolitan Transportation Commission (MTC), California, US <http://mtc.ca.gov/tools-resources/data-tools/open-data-library>`_
* `Mexico <http://catalogo.datos.gob.mx/dataset>`_
* `Mississauga, ON, Canada <http://www.mississauga.ca/portal/residents/publicationsopendatacatalogue>`_
* `Moldova <http://data.gov.md/>`_
* `Moncton, NB, Canada <http://www.moncton.ca/Government/Terms_of_use/Open_Data_Purpose/Data_Catalogue.htm>`_
* `Mountain View, California, US (GIS) <http://data-mountainview.opendata.arcgis.com/>`_
* `Montreal, QC, Canada <http://donnees.ville.montreal.qc.ca/>`_
* `Netherlands <https://data.overheid.nl/>`_
* `New Zealand <http://www.stats.govt.nz/browse_for_stats.aspx>`_
* `NYC betanyc <http://betanyc.us/>`_
* `NYC Open Data <https://nycplatform.socrata.com/>`_
* `Oakland, California, US <https://data.oaklandnet.com/>`_
* `OECD <https://data.oecd.org/>`_
* `Oklahoma <https://data.ok.gov/>`_
* `Open Government Data (OGD) Platform India <https://data.gov.in/>`_
* `Oregon <https://data.oregon.gov/>`_
* `Ottawa, ON, Canada <http://data.ottawa.ca/en/>`_
* `Palo Alto, California, US <http://data.cityofpaloalto.org/home>`_
* `Portland, Oregon <https://www.portlandoregon.gov/28130>`_
* `Portugal - Pordata organization <http://www.pordata.pt/en/Home>`_
* `Puerto Rico Government <https://data.pr.gov//>`_
@@ -75,6 +79,8 @@ Government
* `Romania <http://data.gov.ro/>`_
* `Russia <http://data.gov.ru>`_
* `San Francisco Data sets <http://datasf.org/>`_
* `San Jose, California, US <http://data.sanjoseca.gov/home/>`_
* `San Mateo County, California, US <https://data.smcgov.org/>`_
* `Saskatchewan, Province of Canada <http://opendatask.ca/data/>`_
* `Seattle <https://data.seattle.gov/>`_
* `Singapore Government Data <https://data.gov.sg/>`_
@@ -89,8 +95,8 @@ Government
* `Toronto, ON, Canada <http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=1a66e03bb8d1e310VgnVCM10000071d60f89RCRD>`_
* `Tunisia <http://www.data.gov.tn/>`_
* `U.K. Government Data <http://data.gov.uk/data>`_
* `U.S. American Community Survey <http://www.census.gov/acs/www/data_documentation/data_release_info/>`_
* `U.S. CDC Public Health datasets <http://www.cdc.gov/nchs/data_access/ftp_data.htm>`_
* `U.S. American Community Survey <https://www.census.gov/programs-surveys/acs/data.html/>`_
* `U.S. CDC Public Health datasets <https://www.cdc.gov/nchs/data_access/ftp_data.htm>`_
* `U.S. Census Bureau <http://www.census.gov/data.html>`_
* `U.S. Department of Housing and Urban Development (HUD) <http://www.huduser.gov/portal/datasets/pdrdatas.html>`_
* `U.S. Federal Government Agencies <http://www.data.gov/metrics>`_
@@ -102,6 +108,7 @@ Government
* `UK 2011 Census Open Atlas Project <http://www.alex-singleton.com/r/2014/02/05/2011-census-open-atlas-project-version-two/>`_
* `United Nations <http://data.un.org/>`_
* `Uruguay <https://catalogodatos.gub.uy/>`_
* `Valley Transportation Authority (VTA), California, US <https://data.vta.org/>`_
* `Vancouver, BC Open Data Catalog <http://data.vancouver.ca/datacatalogue/>`_
* `Victoria, BC, Canada <http://www.victoria.ca/EN/main/city/open-data-catalogue.html>`_
* `Vienna, Austria <https://open.wien.gv.at/site/open-data/>`_
21
README.rst
@@ -4,7 +4,7 @@ Awesome Public Datasets
:alt: Awesome
:target: https://github.com/sindresorhus/awesome

`This list of public data sources <https://github.com/caesar0301/awesome-public-datasets>`_
`This list of a topic-centric public data sources <https://github.com/caesar0301/awesome-public-datasets>`_ in high quality. They
are collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free, however, some are not.
Other amazingly awesome lists can be found in the
@@ -199,7 +199,6 @@ Energy
* `AMPds <http://ampds.org/>`_
* `BLUEd <http://nilm.cmubi.org/>`_
* `COMBED <http://combed.github.io/>`_
* `Dataport <https://dataport.pecanstreet.org/>`_
* `DRED <http://www.st.ewi.tudelft.nl/~akshay/dred/>`_
* `ECO <http://www.vs.inf.ethz.ch/res/show.html?what=eco-data>`_
* `EIA <http://www.eia.gov/electricity/data/eia923/>`_
@@ -269,6 +268,8 @@ Healthcare

* `EHDP Large Health Data Sets <http://www.ehdp.com/vitalnet/datasets.htm>`_
* `Gapminder World demographic databases <http://www.gapminder.org/data/>`_
* `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. <https://gdc.cancer.gov/>`_
* `PhysioBank Databases - a large and growing archive of physiological data <https://www.physionet.org/physiobank/database/>`_
* `Medicare Coverage Database (MCD), U.S. <https://www.cms.gov/medicare-coverage-database/>`_
* `Medicare Data Engine of medicare.gov Data <https://data.medicare.gov/>`_
* `Medicare Data File <http://go.cms.gov/19xxPN4>`_
@@ -276,7 +277,7 @@ Healthcare
* `Number of Ebola Cases and Deaths in Affected Countries (2014) <https://data.hdx.rwlabs.org/dataset/ebola-cases-2014>`_
* `Open-ODS (structure of the UK NHS) <http://www.openods.co.uk>`_
* `OpenPaymentsData, Healthcare financial relationship data <https://openpaymentsdata.cms.gov>`_
* `The Cancer Genome Atlas project (TCGA) <https://tcga-data.nci.nih.gov/tcga/tcgaDownload.jsp>`_ and `BigQuery table <http://google-genomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html>`_
* The Cancer Genome Atlas project (TCGA) (refer to `GDC <https://portal.gdc.cancer.gov/>`_ and `BigQuery table <http://google-genomics.readthedocs.org/en/latest/use_cases/discover_public_data/isb_cgc_data.html>`_)
* `World Health Organization Global Health Observatory <http://www.who.int/gho/en/>`_
@@ -288,7 +289,7 @@ Image Processing
* `Adience Unfiltered faces for gender and age classification <http://www.openu.ac.il/home/hassner/Adience/data.html>`_
* `Affective Image Classification <http://www.imageemotion.org/>`_
* `Animals with attributes <http://attributes.kyb.tuebingen.mpg.de/>`_
* `Caltech Pedestrian Detection Benchmark <https://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/>`_
* `Caltech Pedestrian Detection Benchmark <http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/>`_
* `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) <http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/>`_
* `Face Recognition Benchmark <http://www.face-rec.org/databases/>`_
* `Flickr: 32 Class Brand Logos <http://www.multimedia-computing.de/flickrlogos/>`_
@@ -326,7 +327,7 @@ Machine Learning
* `MovieLens Data Sets <http://grouplens.org/datasets/movielens/>`_
* `New Yorker caption contest ratings <https://github.com/nextml/caption-contest-data>`_
* `RDataMining - "R and Data Mining" ebook data <http://www.rdatamining.com/data>`_
* `Registered Meteorites on Earth <http://healthintelligence.drupalgardens.com/content/registered-meteorites-has-impacted-earth-visualized>`_
* `Registered Meteorites on Earth <http://publichealthintelligence.org/content/registered-meteorites-has-impacted-earth-visualized>`_
* `Restaurants Health Score Data in San Francisco <http://missionlocal.org/san-francisco-restaurant-health-inspections/>`_
* `UCI Machine Learning Repository <http://archive.ics.uci.edu/ml/>`_
* `Yahoo! Ratings and Classification Data <http://webscope.sandbox.yahoo.com/catalog.php?datatype=r>`_
@@ -348,7 +349,8 @@ Museums
Natural Language
----------------

* `Automatic Keyphrase Extracttion <https://github.com/snkim/AutomaticKeyphraseExtraction/>`_
* `POS/NER/Chunk annotated data <https://github.com/aritter/twitter_nlp/tree/master/data/annotated>`_
* `Automatic Keyphrase Extraction <https://github.com/snkim/AutomaticKeyphraseExtraction/>`_
* `Blogger Corpus <http://u.cs.biu.ac.il/~koppel/BlogCorpus.htm>`_
* `CLiPS Stylometry Investigation Corpus <http://www.clips.uantwerpen.be/datasets/csi-corpus>`_
* `ClueWeb09 FACC <http://lemurproject.org/clueweb09/FACC1/>`_
@@ -363,12 +365,15 @@ Natural Language
* `Hansards text chunks of Canadian Parliament <http://www.isi.edu/natural-language/download/hansard/>`_
* `Machine Comprehension Test (MCTest) of text from Microsoft Research <http://research.microsoft.com/en-us/um/redmond/projects/mctest/index.html>`_
* `Machine Translation of European languages <http://statmt.org/wmt11/translation-task.html#download>`_
* `Making Sense of Microposts 2013 - Concept Extraction <http://oak.dcs.shef.ac.uk/msm2013/challenge.html>`_
* `Making Sense of Microposts 2016 - Named Entity rEcognition and Linking <http://microposts2016.seas.upenn.edu/challenge.html>`_
* `Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) <http://www.msmarco.org/dataset.aspx>`_
* `Multi-Domain Sentiment Dataset (version 2.0) <http://www.cs.jhu.edu/~mdredze/datasets/sentiment/>`_
* `Open Multilingual Wordnet <http://compling.hss.ntu.edu.sg/omw/>`_
* `Personae Corpus <http://www.clips.uantwerpen.be/datasets/personae-corpus>`_
* `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) <https://github.com/ParallelMazen/SaudiNewsNet>`_
* `SMS Spam Collection in English <http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/>`_
* `Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer/>`_
* `Universal Dependencies <http://universaldependencies.org>`_
* `USENET postings corpus of 2005~2011 <http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html>`_
* `Webhose - News/Blogs in multiple languages <https://webhose.io/datasets>`_
@@ -422,7 +427,6 @@ Public Domains
* `CMU StatLab collections <http://lib.stat.cmu.edu/datasets/>`_
* `Data.World <https://data.world>`_
* `Data360 <http://www.data360.org/index.aspx>`_
* `Datamob.org <http://datamob.org/datasets>`_
* `Google <http://www.google.com/publicdata/directory>`_
* `Infochimps <http://www.infochimps.com/>`_
* `KDNuggets Data Collections <http://www.kdnuggets.com/datasets/index.html>`_
@@ -472,6 +476,7 @@ Social Networks
* `GitHub Collaboration Archive <https://www.githubarchive.org/>`_
* `Google Scholar citation relations <http://www3.cs.stonybrook.edu/~leman/data/gscholar.db>`_
* `High-Resolution Contact Networks from Wearable Sensors <http://www.sociopatterns.org/datasets/>`_
* `Indie Map: social graph and crawl of top IndieWeb sites <http://www.indiemap.org/>`_
* `Mobile Social Networks from UMASS <https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks>`_
* `Network Twitter Data <http://snap.stanford.edu/data/higgs-twitter.html>`_
* `Reddit Comments <https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/>`_
@@ -541,7 +546,6 @@ Software
Sports
------

* `Basketball (NBA/NCAA/Euro) Player Database and Statistics <http://www.draftexpress.com/stats.php>`_
* `Betfair Historical Exchange Data <http://data.betfair.com/>`_
* `Cricsheet Matches (cricket) <http://cricsheet.org/>`_
* `Ergast Formula 1, from 1950 up to date (API) <http://ergast.com/mrd/db>`_
@@ -571,7 +575,6 @@ Transportation
* `GeoLife GPS Trajectory from Microsoft Research <http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/>`_
* `German train system by Deutsche Bahn <http://data.deutschebahn.com/datasets/>`_
* `Hubway Million Rides in MA <http://hubwaydatachallenge.org/trip-history-data/>`_
* `Marine Traffic - ship tracks, port calls and more <http://www.marinetraffic.com/de/ais-api-services>`_
* `Montreal BIXI Bike Share <https://montreal.bixi.com/en/open-data>`_
* `NYC Taxi Trip Data 2009- <http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml>`_
* `NYC Taxi Trip Data 2013 (FOIA/FOILed) <https://archive.org/details/nycTaxiTripData2013>`_