CMSW: Corpus of Modern Scottish Writing (1700-1945)

An electronic corpus of written and printed texts from the period 1700-1945, featuring over 350 documents and containing approximately 5.5 million words of text overall.

Browsing the 356 documents contained in the corpus

The Corpus of Modern Scottish Writing (CMSW) is an electronic corpus of written and printed texts from the period 1700-1945, complementing the Helsinki Corpus of Older Scots (1450-1700) and the Scottish Corpus of Texts and Speech (1945-present day). CMSW contains over 350 documents, totalling approximately 5.5 million words of text.

CMSW’s documents range from printed novels, to written correspondence, to newspaper and magazine articles, to legal material such as wills and sasines. Our documents have been sourced from partners such as the Mitchell Library in Glasgow, the National Library of Scotland, and University of Glasgow Archive Services.

All our documents are assigned a year group (1700-1750, 1750-1800, 1800-1850, 1850-1900, and 1900-1945). A document’s year group is based on its date of publication (if printed) or its date of writing (if handwritten); where this date is unknown, we have assigned a year group based on our best estimate. Documents are also classified as belonging to one of nine genre groups (administrative prose, expository prose, personal writing, instructional prose, religious prose, verse/drama, imaginative prose, journalism, and orthoepists). More information is available from the corpus details page.

The linguistic research undertaken by the CMSW team as part of the project aims primarily to account for the structures of Modern Scots orthography, with a view to enhancing automatic identification of spelling variants. This groundwork will lead to further linguistic analysis, for example of the relationship between orthography and phonology, of the extent to which Modern Scots orthography is lexicalised in different phases within the period, and of how particular orthographies are motivated by stylistic or philological intentions on the part of the author.

Project website: https://www.scottishcorpus.ac.uk/cmsw/


Main contact: Wendy Anderson

Developers: Brian Aitken, David Beavan

Funded by: AHRC

Subject area: English Language & Linguistics

Keywords: Corpus Linguistics, Digital Edition, Digitisation, Scots

Record last updated 2020-01-13