I was on holiday last week, and for most of this week I attended the ‘English Historical Lexicography in the Digital Age’ conference in Bergamo, Italy. On Monday and Tuesday I prepared for the conference, at which I was speaking about the Bilingual Thesaurus. I also responded to a query regarding the SPADE project, had a further conversation with Ophira Gamliel about her project and did some app account management duties. I spent Wednesday travelling to the conference, which then ran from Thursday through to Saturday lunchtime. It was an excellent conference in a lovely setting. It opened with a keynote lecture by Wendy Anderson which focussed primarily on the Mapping Metaphor project. It was great to see the project’s visualisations again, and hear some of the research that can be carried out using the online resource, and the audience seemed interested in the project. Another couple of potential expansions to the resource might be to link through to citations in the OED to analyse the context of a metaphorical usage, and to label categories as ‘concrete’ or ‘abstract’, to enable analysis of different metaphorical connections, such as concrete to abstract, or concrete to concrete. One of the audience suggested showing clusters of words and their metaphorical connections using network diagrams, although I recall that we did look at such visualisations back in the early days of the project and decided against using them.
Wendy’s session was followed by a panel on historical thesauri. Marc gave a general introduction to historical thesauri, which was interesting and informative – apparently these aren’t just historical thesauri, they are ‘Kay-Samuels’ thesauri. Marc also suggested we write a sort of ‘best practice’ guide for creating historical thesauri, which I thought sounded like a very good idea. After that there was a paper about the Bilingual Thesaurus given by Louise Sylvester and me. I think this all went very well, but I can’t really comment further on something I presented. The next paper was given by Fraser and Rhona Alcorn about the new Historical Thesaurus of Scots project. It was good to hear their talk as I learnt some new details about the project, which I’m involved with. Rhona mentioned that the existing printed Scots Thesaurus doesn’t have any dates, and mostly focusses on rural life, so although it will be useful to a certain extent the project needs to be much broader in scope and more historically focussed. The project is due to end in January and I’ll be creating some sort of interface in December / January. Fraser mentioned that one possible idea is to look for words in the dictionary definitions that are also present in the HT category path in order to possibly put words into categories. Other plans are to look at cognate terms (e.g. ‘kirk’ and ‘church’), sound shifts (e.g. ‘stan’ to ‘stone’), variant spellings and expanded variant forms. We will also need to find a way to automatically extract the dates from the DSL data.
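The idea of matching definition words against HT category paths could be sketched roughly as below. This is only an illustration and not the project's actual method: the stopword list, the ` > ` path separator, the function names and the sample data are all invented, and a real approach would need stemming, weighting and manual checking.

```python
import re

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for"}

def tokens(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def rank_categories(definition, category_paths):
    """Rank category paths by how many words they share with the definition."""
    def_words = tokens(definition)
    scored = []
    for path in category_paths:
        shared = def_words & tokens(path)
        if shared:
            scored.append((len(shared), path, sorted(shared)))
    # Highest overlap first; ties broken alphabetically by path.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored

# Invented example data, not taken from the DSL or the HT.
definition = "a church; a place of worship"
paths = [
    "society > faith > worship > place of worship > church",
    "the world > food and drink > farming > farm",
]
results = rank_categories(definition, paths)
```

Here `results` holds only the first path, since the second shares no words with the definition. Anything ranked this way would only be a candidate placement for an editor to confirm.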
The final paper in the session was by Heather Pagan and was about using the semantic tags in the Anglo-Norman Dictionary to categorise entries. The AND uses a range of semantic tags (e.g. ‘bot.’, ‘law’), but these are not used in every sense – only when clarification is needed. The use of the tags is not consistent. Lots of forms are used but not documented, and lists of tags only include those that are abbreviations. The dictionary has been digitised and marked up in XML, with semantic tags marked as follows: <usage type="zool." />. Multiple types can be associated with an entry and different variants have now been rationalised. There are, however, some issues. For example, sometimes other words appear in a bracket where a tag might be, even though it’s not a semantic tag, and also tags are not used when things are obvious – e.g. ‘sword’ is not tagged as a weapon. There are also potential inconsistencies – ‘architecture’ vs ‘building’, ‘mathematical’ vs ‘arithmetic’, ‘maritime’ vs ‘naval’. The AHRC funded a project to redevelop the tags, and it was decided that tags in modern English would be used, as they are intended for a modern audience. The project decided to use OED subject categories and ended up using 105 different tags. These are not hierarchical, but allow for multiple tags to be applied to each word. It is possible to browse the website by tags (http://www.anglo-norman.net/label-search.shtml) and to limit this by POS. Heather ended by pointing out some of the biases that the use of tags has demonstrated – e.g. there is a tag for ‘female’ but not for ‘male’, and religion is considered ‘Christian’ by default.
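As a rough illustration of how tags marked up in this way could be gathered into a browsable index, here is a minimal Python sketch. Only the `<usage type="..." />` element comes from the talk. The surrounding `<entries>`, `<entry>` and `<sense>` structure, the `id` attribute and the sample data are assumptions made up for the example, and the real AND markup will differ.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample; only the <usage type="..." /> element comes from the talk.
SAMPLE = """
<entries>
  <entry id="e1"><sense><usage type="zool." /><usage type="bot." /></sense></entry>
  <entry id="e2"><sense><usage type="law" /></sense></entry>
  <entry id="e3"><sense><usage type="zool." /></sense></entry>
</entries>
"""

def index_by_tag(xml_text):
    """Map each semantic tag to the sorted IDs of the entries that carry it."""
    root = ET.fromstring(xml_text.strip())
    index = defaultdict(set)
    for entry in root.iter("entry"):
        for usage in entry.iter("usage"):
            tag = usage.get("type")
            if tag:
                # An entry can appear under several tags (non-hierarchical).
                index[tag].add(entry.get("id"))
    return {tag: sorted(ids) for tag, ids in index.items()}

tag_index = index_by_tag(SAMPLE)
```

In this sample `tag_index["zool."]` lists entries `e1` and `e3`, while `e1` also appears under `bot.`, which mirrors the point that tags are not hierarchical and several can apply to one entry.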
The next panel was on semantic change in lexicography and consisted of three papers. The first was about the use of the term ‘court language’ in different periods during 17th century revolutionary England. The speaker discussed ‘lexical priming’, when words are primed for collocational use through encounters in speech and writing, and also ‘priming drift’ when the meaning of the words changes. The source data was taken from EEBO and powered by CQPWeb and an initial search was on the collocations of ‘language’. There were lots of negative adjective collocates due to the polemic nature of the texts. ‘Smooth Language’ was looked at, and how its use changed from being associated negatively with the court and monarchy (meaning falsehood and fake) to being viewed as positive (e.g. sophisticated, elegant). The term ‘court language’ followed a similar path.
The next speaker looked at the use of Indian keywords by English women travel writers in the 19th century. The speaker talked of ‘languaging’ – the changes within a language with a focus on the language activity of speakers rather than on the language system. The speaker looked at the ‘Hobson-Jobson’ Anglo-Indian Dictionary and noticed there were no references to women travel writers as sources. The speaker created a corpus of travel books by women (about 1.3 million words) consisting of letters, recollections and narratives, but no literary texts. These were all taken from Google Books and Project Gutenberg, and analysis of the corpus was undertaken using Wordsmith, comparing results to the Corpus of Later Modern English (15m tokens) as a reference corpus. This included the Victorian Women Writers project. Results were analysed using concordances, clusters and n-grams.
The last speaker of the day discussed semantic variation in the use of words to refer to North American ‘Indians’ from 1584 to 1724. The speaker suggested there was ‘overlexicalisation’ during this period – many quasi-synonymous terms. The speaker created a corpus based on the Jamestown Digital Archive, consisting of 650,000 words over 6 subcorpora of 25 years. Analysis was done using Sketchengine. The 5 most frequent terms were Indian, savage, inhabitant, heathen and native and the speaker showed graphs of frequency. The use of words was compared to quotations in the OED and the speaker categorised use of the terms in the corpus as more ‘positive’, ‘neutral’ or ‘negative’. E.g. the use of ‘Indian’ is generally more neutral than negative, but there are peaks of more negative uses during periods of crisis, such as Bacon’s Rebellion in 1676. The use of ‘savage’ was mostly negative while ‘heathen’ was used mainly in a religious sense until 1676. The speaker also noted how ‘inhabitant’ and ‘native’ ended up shifting to refer to the European settlers in the late 1600s.
Day two of the conference began with a talk about the definition of a legal term that is currently in dispute in the US, tracing its usage back through the documentary evidence. The speaker used the Lexicons of Early Modern English, which looks to be a useful resource. The next speaker was Rachel Fletcher, a PhD student at Glasgow, who discussed how to deal with texts on the boundary between the Old English and Middle English periods. This is a fundamental issue for a period dictionary, but it is difficult to decide what is OE and what isn’t. The Dictionary of Old English uses evidence from manuscripts after 1150, e.g. attestation of spellings, and it is up to the user to decide which words they want to consider as OE. DOE links through to the Corpus of Old English so you can look at dates and authors and see all usage. The speaker stated that now that many of the resources are available digitally it’s easier to switch from one resource to another, and track entries between dictionaries. Boundaries can be more fuzzy and period changes are more of a continuum than previously, which is a good thing.
The next talk was a keynote lecture by Susan Rennie about the annotated Jamieson. Susan wasn’t at the conference in person but gave her talk via Skype, which mostly went ok, although there were times when it was difficult to hear properly. Susan discussed Jamieson’s dictionary of the Scottish Language, completed in 1808. It was the first completed dictionary of Scots and was a landmark in historical lexicography. Susan discussed her ‘Annotated Jamieson’ project and the impact Jamieson’s dictionary had on later dictionaries such as the DSL.
The next speaker was the conference organiser, Marina Dossena, who gave a paper about the lexicography of Scots. She pointed out that in the late 19th century Scots was seen as dying out, and that this view had in fact been around for much longer; she traced it back to Pinkerton in 1786, who considered Scots good in poetry but unacceptable in general use. She also pointed out that Scots is at the intersection of monolingual and bilingual lexicography, and that Scots has no dictionary where both headwords and definitions are in Scots. The final speaker of the morning session looked at the stigmatisation of phonological change in 19th century newspapers, and the role newspapers and ‘letters to the editor’ played in stigmatising certain pronunciations. The speaker used the Eighteenth-Century English Phonology Database (ECEP) as a source.
After lunch there were six half-hour papers without a break, including an hour-long keynote lecture, which was a pretty intense and exhausting afternoon. The first speaker in the session discussed letters written by women who wished to give up their babies in 18th century England. These letters were sent to a ‘foundling’ hospital in London, and were sent (mainly) by young, lower class, unmarried women living in London, but who may have come from elsewhere. Most letters were not written directly by the women, but were signed (often with a cross) by them, and they differed in formality and length. The speaker analysed 63 such petitions signed by single mothers from 1773 to 1799 that were sent to the governors of the hospital. There were around 100 women a week trying to give their children to the hospital. The speaker discussed some of the terms used for a baby being born, and how these were frequently in the passive voice, e.g. ‘be delivered of child’ appeared 18 times. The speaker also showed screenshots of the Historical Thesaurus timeline, which was good to see.
The following speaker looked at how childhood became a culturally constructed life stage during 16th and 17th century England. The speaker used the OED and HT for data, showing how in the 16th century children became thought of as autonomous human beings for the first time. Different categories for child were analysed, including foetus, infant, child, boy and girl. Some 101 senses over eight 25-year periods were looked at. From OE up to the 15th century words for child were more limited, and exhibited no emotion. Children were seen as offspring or were defined by their role, e.g. ‘page’, ‘groom’. During the 16th and 17th centuries substages come in and there is more emotional colouring to the language, including lots of animal metaphors and some plant ones.
The next speaker discussed a dictionary of homonymic proper names that is in production, focussing on some examples from British and American English, using data from the English Pronouncing Dictionary and the Longman Pronunciation Dictionary. After this speaker there followed a keynote lecture about the Salamanca Corpus. This talk looked specifically at 18th century Northern English, but gave an introduction to the Salamanca Corpus too. It is a collection of regional texts from the early modern period to the 20th century, covering the years 1500 to 1950. It consists of manuscripts, comments of contemporary individuals, dictionaries and glossaries, the literary use of dialect, dialectal literature and (from the 19th century onwards) philological studies. The speaker pointed out how the literary use of dialect starts with Chaucer and the Master of Wakefield in the 14th century, and at this time it wasn’t used for humour. It became more common in the 16th century as a means of characterisation, generally for humorous intent, with the main dialect forms being Kentish, Devonshire, Lancashire and Yorkshire. The speaker then looked at the 18th century Northern section of the corpus, looking at some specific texts and giving some examples, noting that the section is quite small (about 160,000 words) and is almost all Yorkshire and Lancashire.
The following speaker introduced the online English Dialect Dictionary. The printed version was released in 6 volumes from 1898-1905, and the online version has digitised this and made it searchable. The period covered is 1700-1900 and there are about 70,000 entries. To be included, a word must have been in use after 1700 and there must be some written evidence of its use. The final speaker looked at how some of the data from the EDD had been originally compiled, specifically Oxfordshire dialect words, with the speaker pointing out that the Oxfordshire dialect is one of the least researched dialects in Britain. The speaker discussed the role of correspondents in compiling the material. There are 750 listed in the dictionary, but there were likely many more than this. They answered questions about usage and were recruited via newspapers and local dialect societies. The distribution of correspondents varies across the country, with Yorkshire best represented (167) followed by Lancashire (62). Oxfordshire only had 28.
On the third day there was a single keynote lecture about the historical lexicography of Canadian English, looking at the second edition of the Dictionary of Canadianisms on Historical Principles (DCHP-2), which is available online. The speaker noted that it was only in the 1920s and 30s that the first native born generation of people in Vancouver appeared, and contrasted this to the history of Europe. The sheer size of Canada as opposed to Europe was also shown. The speaker discussed the geographical spread of dialect terms, both in the provinces of Canada and across the world. The speaker used Google’s data to look at usage in different geographical areas based on the top-level domains of sites. After this keynote there were some final remarks and discussions and the conference drew to a close.
There were some very interesting papers at the conference, and it was particularly nice to see how the Historical Thesaurus and the Dictionary of the Scots Language are being used for research.