The discovery by computer of new, previously unknown information, by automatically extracting information from a usually large number of different unstructured textual resources.
Open Data Census
- A community effort, the OCDF data census seeks to aggregate public datasets related to social media and online communities.
SNAP: Stanford Network Analysis Project
- Stanford Network Analysis Platform (SNAP) is a general-purpose network analysis and graph mining library.
Awesome Public Datasets
- An awesome list of high-quality open datasets (HQOD) in public domains (ongoing).
Common Crawl
- Open repository of web crawl data that can be accessed and analyzed by anyone.
the @unitedstates project
- @unitedstates is a shared commons of data and tools for the United States. Made by the public, used by the public.
American Presidency Project
- One of the most comprehensive collections of web resources on the American presidency, including documents, public papers, executive orders, addresses, press conferences, debates, election data, approval ratings, and much more.
University of Oxford Text Archive
- The University of Oxford Text Archive develops, collects, catalogues and preserves electronic literary and linguistic resources for use in Higher Education, in research, teaching and learning.
HathiTrust
- Non-Google digitized collection: approximately 550,000 public domain volumes as of March 2015, primarily, though not exclusively, English-language materials published prior to 1923.
JSTOR for Research
- Data for Research is a free service for researchers wishing to analyze content on JSTOR through a variety of lenses and perspectives.