Linguistic DNA

The Linguistic DNA of Modern Western Thought uses digital methods and resources to analyse more than 5 million pages of printed texts. This data represents works printed in English, or in England, Ireland, Scotland, and Wales from 1473 to 1800.

The association between Reason and Nature over time

For the early part of the period, we are working with the ca. 58,000 texts digitised as part of the Early English Books Online Text Creation Partnership collaboration (EEBO-TCP). From 1700, our major source is Gale Cengage’s Eighteenth Century Collections Online (ECCO). These resources allow an unprecedented level of comprehensiveness in the analysis of language, semantics, and conceptual history in the Early Modern period. The project applies computational tools to these resources to analyse details of Early Modern English vocabulary and semantics, including instances of social and cultural keywords and their shifting frequencies, meanings, and uses in various contexts over time. The result will be a rigorous, systematic, and scientific account of Early Modern conceptual history via its linguistic data.

The project also incorporates the recently completed Historical Thesaurus of English. The thesaurus serves as a taxonomy of language history as it is captured in the Oxford English Dictionary; it organises the 793,000 word senses in the OED and other sources into semantic categories, which nest inside wider categories in a taxonomy up to twelve levels deep. As such, the architecture and database of the thesaurus are key to identifying concepts in the texts explored in this project.

Using the resources described above, it is possible to discern trends, relationships, and anomalies across an enormous amount of linguistic data, and so to identify the often surprising complexities, continuities, and discontinuities inherent in linguistic and conceptual change.

Project website: https://www.linguisticdna.org/


Main contact: Marc Alexander

Developer: Matthew Groves

Start year: 2015

End year: 2017

Funded by: AHRC

Subject area: English Language & Linguistics

Keywords: Corpus Linguistics, History of English, Visualisation

Record last updated 2020-01-24