English Corpora Documentation README
Created 7/1/2024 by Talya Cooper for NYU Data Services

English-corpora.org is a website that hosts a set of corpora for use in linguistics research, natural language processing, and language learning, among other fields. The corpora are largely in English, with some in Spanish and Portuguese. They were primarily compiled by Mark Davies, a retired Brigham Young University linguistics professor who continues to maintain the collections. The text of the corpora can be searched through the English-corpora.org web interface, which provides a number of tools for research and analysis. The site also hosts a number of instructional videos.

This full-text collection has been provided through an academic license for use at NYU. More information is available at https://ultraviolet.library.nyu.edu/records/516xh-p6627 and at https://guides.nyu.edu/tdm/english-corpora.

Because web resources are inherently unstable, NYU Data Services has created web archives in .wacz format of both English-corpora.org and corpusdata.org (the website that hosts the full-text downloadable datasets). This way, we can continue to provide contextual documentation for these corpora if the active websites change or become unavailable. We have also downloaded PDFs of the documentation available on English-corpora.org, again so that we can provide users with the original context and documentation if anything affects the English-corpora.org website.
Web archives were created on June 17, 2024, and PDFs were downloaded on June 26, 2024.

├── english_corpora_pdfs
│   ├── analyze-text.pdf
│   ├── architecture.pdf
│   ├── association-measures.pdf
│   ├── browse.pdf
│   ├── collocates-ec-se.pdf
│   ├── customized-word-lists.pdf
│   ├── external-resources.pdf
│   ├── fulltext_formats.pdf
│   ├── fulltext_overview.pdf
│   ├── fulltext_structures.pdf
│   ├── kwic-analyze.pdf
│   ├── kwic-saved.pdf
│   ├── now-monitor-corpus-english.pdf
│   ├── saved-word-phrase.pdf
│   ├── search-history.pdf
│   ├── topics-and-collocates.pdf
│   ├── virtual-corpora-quick-overview.pdf
│   ├── virtual-corpora.pdf
│   └── word-sketch.pdf
└── english_corpora_web_archives
    ├── english-corpora_webarchive_20240617.wacz
    └── corpus_data_webarchive_20240617.wacz
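A .wacz file is a standard ZIP package that bundles the captured WARC files together with page lists, indexes, and a datapackage.json manifest. That means its contents can be inspected without special software. The sketch below uses Python's standard-library zipfile module to list the members of one of the archives above. It is a minimal illustration, not a supported NYU tool. The helper names (list_wacz_contents, find_warc_files) are hypothetical, and the script assumes it is run from the directory that contains english_corpora_web_archives.

```python
import zipfile
from pathlib import Path


def list_wacz_contents(path):
    """Return the names of all members stored in a .wacz file.

    WACZ files are ZIP packages, so zipfile can open them directly.
    """
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()


def find_warc_files(path):
    """Return only the WARC members (the archived web captures)."""
    return [
        name
        for name in list_wacz_contents(path)
        if name.endswith((".warc", ".warc.gz"))
    ]


if __name__ == "__main__":
    # Assumed location, based on the directory tree in this README.
    archive = Path(
        "english_corpora_web_archives/english-corpora_webarchive_20240617.wacz"
    )
    if archive.exists():
        for name in list_wacz_contents(archive):
            print(name)
    else:
        print(f"Archive not found: {archive}")
```

Listing the members only shows what was captured. To browse the archived pages as websites, open the .wacz file in a WACZ-aware replay viewer such as Webrecorder's ReplayWeb.page.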