Week Beginning 31st August 2020

I worked on many different projects this week, and the largest amount of my time went into the redevelopment of the Anglo-Norman Dictionary.  I processed a lot of the data this week and have created database tables and written extraction scripts to export labels, parts of speech, forms and cross references from the XML.  The extracted data will be used for search purposes, for display on the website in places such as the search results, and for navigating between entries.  The scripts will also be used when updating data in the new content management system for the dictionary when I write it.  I have extracted 85,397 parts of speech, 31,213 cross references, 150,077 forms and their types (lemma / variant / deviant) and 86,269 labels, each of which corresponds to one of 157 unique labels (usage or semantic), which I also extracted.
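To give a rough idea of what this kind of extraction involves, here is a minimal sketch in Python.  The element and attribute names (entry, form, pos, label, xref, type, kind, target) are purely hypothetical, as is the sample data; the real dictionary XML and the actual extraction scripts are not shown in this post and are likely structured differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup for illustration only; not the dictionary's real schema.
SAMPLE_XML = """
<entries>
  <entry id="e1">
    <form type="lemma">from</form>
    <form type="variant">frome</form>
    <pos>prep.</pos>
    <label kind="usage">fig.</label>
    <xref target="e2"/>
  </entry>
  <entry id="e2">
    <form type="lemma">fromage</form>
    <form type="deviant">formage</form>
    <pos>s.</pos>
    <label kind="semantic">culin.</label>
  </entry>
</entries>
"""


def extract_rows(xml_text):
    """Collect one list of tuples per (assumed) database table."""
    root = ET.fromstring(xml_text)
    rows = {"forms": [], "pos": [], "labels": [], "xrefs": []}
    for entry in root.iter("entry"):
        entry_id = entry.get("id")
        for form in entry.iter("form"):
            # Each form keeps its type: lemma / variant / deviant.
            rows["forms"].append((entry_id, form.text, form.get("type")))
        for pos in entry.iter("pos"):
            rows["pos"].append((entry_id, pos.text))
        for label in entry.iter("label"):
            rows["labels"].append((entry_id, label.text, label.get("kind")))
        for xref in entry.iter("xref"):
            rows["xrefs"].append((entry_id, xref.get("target")))
    return rows


rows = extract_rows(SAMPLE_XML)
# Tally forms by type and gather the set of distinct labels.
form_type_counts = Counter(form_type for _, _, form_type in rows["forms"])
unique_labels = {text for _, text, _ in rows["labels"]}
```

Each list of tuples could then be inserted into its own database table, keyed by entry, so that searches and the results page can work from the tables rather than re-reading the XML.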

I have also finished work on the quick search feature, which is now fully operational.  This involved creating a new endpoint in the API for processing the search.  The endpoint handles two queries.  The first is the query for the predictive search (i.e. the drop-down list of possible options that appears as you type), which returns any forms that match what you’re typing in.  The second is the query for the full quick search, which allows you to use ‘?’ and ‘*’ wildcards (and also “” for an exact match) and returns all of the data about each entry that is needed for the search results page.  For example, if you type in ‘from’ in the ‘Quick Search’ box a drop-down list containing all matching forms will appear.  Note that these are forms, not only headwords, so they include lemmas but also variants and deviants.  If you select a form that is associated with one single entry then the entry’s page will load.  If you select a form that is associated with more than one entry then the search results page will load.  You can also choose not to select an item from the drop-down list and instead search for whatever you’re interested in.  For example, enter ‘*ment’ and press enter or the search button to view all of the forms ending in ‘ment’, as the following screenshot demonstrates (note that this is not the final user interface but one purely for test purposes):

With this example you’ll see that the results are paginated, with 100 results per page.  You can browse through the pages using the next and previous buttons or select one of the pages to jump directly to it.  You can bookmark specific results pages too.  Currently the search results display the lemma and homonym number (if applicable) and display whether the entry is an xref or not.  Associated parts of speech appear after the lemma.  Each one currently has a tooltip and we can add in descriptions of what each POS abbreviation means, although these might not be needed.  All of the variant / deviant forms are also displayed as otherwise it can be quite confusing for users if the lemma does not match the term the user entered but a form does.  All associated semantic / usage labels are also displayed.  I’m also intending to add in earliest citation date and possibly translations to the results as well, but I haven’t extracted them yet.
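As an illustration of how the wildcard search and the 100-per-page pagination might fit together, here is a minimal sketch in Python using SQLite.  It is not the actual API code: the forms table, its columns, the function names and the demo data are all assumptions made for the example, and it assumes ‘*’ stands for any run of characters and ‘?’ for a single character.

```python
import sqlite3

PAGE_SIZE = 100  # results per page, as described above


def wildcard_to_like(term):
    """Translate a quick-search term into an (operator, pattern) pair.

    Assumed semantics: '*' matches any run of characters, '?' matches one
    character, and a term wrapped in double quotes is an exact match.
    """
    if len(term) >= 2 and term[0] in '"“' and term[-1] in '"”':
        return "=", term[1:-1]
    # Escape LIKE metacharacters that are already in the user's input.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "LIKE", escaped.replace("*", "%").replace("?", "_")


def search_forms(conn, term, page=1):
    """Return (rows for the requested page, total matches, total pages)."""
    op, pattern = wildcard_to_like(term)
    where = "form = ?" if op == "=" else "form LIKE ? ESCAPE '\\'"
    total = conn.execute(
        f"SELECT COUNT(*) FROM forms WHERE {where}", (pattern,)
    ).fetchone()[0]
    offset = (page - 1) * PAGE_SIZE
    rows = conn.execute(
        f"SELECT form, entry_id FROM forms WHERE {where} "
        "ORDER BY form, entry_id LIMIT ? OFFSET ?",
        (pattern, PAGE_SIZE, offset),
    ).fetchall()
    total_pages = -(-total // PAGE_SIZE)  # ceiling division
    return rows, total, total_pages


# Hypothetical demo data: 250 made-up forms ending in 'ment' plus two others.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (form TEXT, entry_id INTEGER)")
demo = [(f"w{i:03d}ment", i) for i in range(250)] + [("from", 900), ("frome", 901)]
conn.executemany("INSERT INTO forms VALUES (?, ?)", demo)

page3, total, pages = search_forms(conn, "*ment", page=3)
```

Because the page number is just a parameter of the query, putting it in the URL is enough to make individual results pages bookmarkable.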

When you click on an entry from the search results this loads the corresponding entry page.  I have updated this to add in tabs to the left-hand column.  In addition to the ‘Browse’ tab there is a ‘Results’ tab and a ‘Log’ tab.  The latter doesn’t contain anything yet, but the former contains the search results.  This allows you to browse up and down the search results in the same way as the regular ‘browse’ feature, selecting another entry.  You can also return to the full results page.  I still need to do some tweaking to this feature, such as ensuring the ‘Results’ tab loads by default if coming from a search result.  The ‘clear’ option also doesn’t currently work properly.  I’ll continue with this next week.

For the Books and Borrowing project I spent a bit of time getting the page images for the Westerkirk library uploaded to the server and the page records created for each corresponding page image.  I also made some final tweaks to the Glasgow Students pilot website that Matthew Sangster and I worked on and this is now live and available here: https://18c-borrowing.glasgow.ac.uk/.

There are three new place-name related projects starting up at the moment and I spent some time creating initial websites for all of these.  I still need to add in the place-name content management systems for two of them, and I’m hoping to find some time to work on this next week.  I also spoke to Joanna Kopaczyk about a website for an RSE proposal she’s currently putting together and gave some advice to some people in Special Collections about a project that they are planning.

On Tuesday I had a Zoom call with the ‘Editing Robert Burns’ people to discuss developing the website for phase two of the Editing Robert Burns project.  We talked about how the new website would integrate with the existing website (https://burnsc21.glasgow.ac.uk/) and about some of the features that would be present on the new site, such as an interactive map of Burns’ correspondence and a database of forged items.

I also had a meeting with the Historical Thesaurus people on Tuesday and spent some time this week continuing to work on the extraction of dates from the OED data, which will feed into a new second edition of the HT.  I fixed all of the ‘dot’ dates in the HT data.  These are dates where dots are used in place of a specific date (e.g. 14..).  Sometimes a specific year is given in the year attribute (e.g. 1432), while at other times a more general year is given (e.g. 1400).  We worked out a set of rules for dealing with these and I created a script to process them.  I then reworked my script that extracts dates for all lexemes that match a specific date pattern (YYYY-YYYY, where the first year might be Old English and the last year might be ‘Current’) and sent this to Fraser so that the team can decide which of these dates should be used in the new version of the HT.  Next week I’ll begin work on a new version of the HT website that uses an updated dataset so we can compare the original dates with the newly updated ones.
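The two checks described above could be sketched roughly as follows in Python.  This is only an illustration under stated assumptions: it assumes Old English is written as ‘OE’ and that ‘Current’ appears literally in the data, the function names are hypothetical, and it only detects whether a ‘dot’ date carries a specific or a general year.  It does not reproduce the rules the team actually agreed on, which are not described here.

```python
import re

# Assumed tokens: 'OE' for an Old English start, 'Current' for an open end.
RANGE_RE = re.compile(r"^(OE|\d{4})\s*-\s*(\d{4}|Current)$")

# A 'dot' date is four characters: leading digits padded out with dots.
DOT_RE = re.compile(r"\d\.{3}|\d{2}\.{2}|\d{3}\.")


def parse_range(text):
    """Return (start, end) if text matches the YYYY-YYYY pattern, else None."""
    match = RANGE_RE.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def classify_dot_date(display, year):
    """Compare a displayed 'dot' date with its year attribute.

    Returns 'specific' when the year pins down a point inside the dotted
    span (14.. with 1432), 'general' when it is just the start of the span
    (14.. with 1400), 'mismatch' when the year lies outside the span, and
    'not-dot' when the display value is not a dot date at all.
    """
    if DOT_RE.fullmatch(display) is None:
        return "not-dot"
    floor = int(display.replace(".", "0"))
    ceiling = int(display.replace(".", "9"))
    if not floor <= year <= ceiling:
        return "mismatch"
    return "general" if year == floor else "specific"
```

One limitation of this sketch is that a year attribute of 1400 is always reported as ‘general’, even if 1400 happened to be the intended specific year, which is the sort of ambiguity a set of agreed rules would need to settle.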