Week Beginning 7th March 2022

This was my first five-day week after the recent UCU strike action and it was pretty full-on, involving many different projects.  I spent about a day working on the Speak For Yersel project.  I added in the content for all 32 ‘I would never say that’ questions and completed work on the new ‘Give your word’ lexical activity, which features a further 30 questions of several types.  These include questions that have associated images and questions where multiple answers can be selected.  The latter question type needs to be handled differently, as we don’t want the map to load as soon as one answer is selected.  Instead the user can select and deselect answers, and if at least one answer is selected a ‘Continue’ button appears under the question.  When you press this the answers become read-only and the map appears.  I made it so that no more than three options can be selected: you need to deselect one before you can add another.  I think we’ll need to look into the styling of the buttons, though, as currently ‘active’ (when a button is hovered over, or has been pressed and nothing else has yet been pressed) is the same colour as ‘selected’.  So if you select ‘ginger’ then deselect it, the button still looks selected until you press somewhere else, which is confusing.  Also, if you press a fourth button it looks like it has been selected when in fact it’s just ‘active’ and isn’t really selected.
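The selection rules described above can be modelled roughly as follows.  This is only a sketch of the logic in Python rather than the site's actual front-end code, and the class name, method names and all of the answer options other than ‘ginger’ are made up for illustration.

```python
class MultiSelectQuestion:
    """Models the select / deselect behaviour of a multiple-answer question."""

    def __init__(self, options, max_selected=3):
        self.options = list(options)
        self.max_selected = max_selected
        self.selected = []    # answers currently chosen, in the order pressed
        self.locked = False   # becomes True once 'Continue' has been pressed

    def toggle(self, option):
        """Handle a press on an answer button; return True if the state changed."""
        if self.locked or option not in self.options:
            return False
        if option in self.selected:
            self.selected.remove(option)
            return True
        if len(self.selected) >= self.max_selected:
            # A fourth answer is ignored until one of the three is deselected.
            return False
        self.selected.append(option)
        return True

    @property
    def show_continue(self):
        """'Continue' is shown only while at least one answer is selected."""
        return bool(self.selected) and not self.locked

    def press_continue(self):
        """Make the answers read-only; return True when the map should load."""
        if not self.show_continue:
            return False
        self.locked = True
        return True


# Hypothetical answer options; only 'ginger' is mentioned in the post.
question = MultiSelectQuestion(["ginger", "red", "carrot-top", "copper", "auburn"])
for answer in ["ginger", "red", "copper", "auburn"]:
    question.toggle(answer)
print(question.selected)  # ['ginger', 'red', 'copper']
```

Keeping ‘selected’ as explicit state like this, separate from whatever the browser considers ‘active’, is also what the button styling would need to reflect.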

I also spent about a day continuing to work on the requirements document for the Books and Borrowing project.  I haven’t quite finished this initial version of the document but I’ve made good progress and I aim to have it completed next week.  Also for the project I participated in a Zoom call with RA Alex Deans and NLS Maps expert Chris Fleet about a subproject we’re going to develop for B&B for the Chambers Library in Edinburgh.  This will feature a map-based interface showing where the borrowers lived and will use a historical map layer for the centre of Edinburgh.

Chris also talked about a couple of projects at the NLS that were very useful to see.  The first was the Jamaica journal of Alexander Innes (https://geo.nls.uk/maps/innes/), which features journal entries plotted on a historical map and a slider allowing you to quickly move through the entries.  The second was the Stevenson maps of Scotland (https://maps.nls.uk/projects/stevenson/), which provides options to select different subjects and date periods.  He also mentioned a new crowdsourcing project to transcribe all of the names on the Roy Military Survey of Scotland (1747-55) maps, which launched in February and already has 31,000 first transcriptions in place, which is great.  As with the GB1900 project, the data produced here will be hugely useful for things like place-name projects.

I also participated in a Zoom call with the Historical Thesaurus team where we discussed ongoing work.  This mainly involves a lot of manual linking of the remaining unlinked categories and looking at sensitive words and categories so there’s not much for me to do at this stage, but it was good to be kept up to date.

I continued to work on the new extIPA charts for the Speech Star project, which I had started on last week.  Last week I had some difficulties replicating the required phonetic symbols but this week Eleanor directed me to an existing site that features the extIPA chart (https://teaching.ncl.ac.uk/ipa/consonants-extra.html).  This site uses standard Unicode characters in combinations that work nicely, without requiring any additional fonts to be used.  I’ve therefore copied the relevant codes from there (this is just character codes like b̪ – I haven’t copied anything other than this from the site).   With the symbols in place I managed to complete an initial version of the chart, including pop-ups featuring all of the videos, but unfortunately the videos seem to have been encoded with an encoder that requires QuickTime for playback.  So although the videos are MP4 they’re not playing properly in browsers on my Windows PC – instead all I can hear is the audio.  It’s very odd as the videos play fine directly from Windows Explorer, but in Firefox, Chrome or MS Edge I just get audio and the static ‘poster’ image.  When I access the site on my iPad the videos play fine (as QuickTime is an Apple product).  Eleanor is still looking into re-encoding the videos and will hopefully get updated versions to me next week.

I also did a bit more work for the Anglo-Norman Dictionary this week.  I fixed a couple of minor issues with the DTD, for example the ‘protect’ attribute was an enumerated list that could either be ‘yes’ or ‘no’ but for some entries the attribute was present but empty, and this was against the rules.  I looked into whether an enumerated list could also include an empty option (as opposed to not being present, which is a different matter) but it looks like this is not possible (see for example http://lists.xml.org/archives/xml-dev/200309/msg00129.html).  What I did instead was to change the ‘protect’ attribute from an enumerated list with options ‘yes’ and ‘no’ to a regular data field, meaning the attribute can now include anything (including being empty).  The ‘protect’ attribute is a hangover from the old system and doesn’t do anything whatsoever in the new system so it shouldn’t really matter.  And it does mean that the XML files should now validate.

The AND people also noticed that some entries that are present in the old version of the site are missing from the new version.  I looked through the database and also older versions of the data from the new site and it looks like these entries have never been present in the new site.  The script I originally ran to export the entries from the old site used a list of headwords taken from another dataset (I can’t remember where from exactly), and I can only assume that this list was missing some headwords and this is why these entries are not in the new site.  This is a bit concerning, but thankfully the old site is still accessible.  I managed to write a little script that grabs the entire contents of the browse list from the old website, separating it into two lists, one for main entries and one for xrefs.  I then ran each headword against a local version of the current AND database, separating out the homonym number and then comparing the headword with the ‘lemma’ field in the DB and the homonym number with the ‘hom’ field.  Initially I ran main and xref queries separately, comparing main to main and xref to xref, but I realised that some entries had changed types (legitimately so, I guess) so I stopped making a distinction.
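The comparison step works along the following lines.  This is a minimal sketch in Python and not the script I actually ran: it assumes the homonym number appears as trailing digits on the headword and that entries without a homonym number have an empty ‘hom’ value, and the function names and sample headwords (apart from ‘nun⋮abilité’) are hypothetical.

```python
import re


def split_homonym(headword):
    """Split a browse-list headword into (lemma, homonym number).

    Assumes the homonym number is a run of trailing digits, e.g. 'abatre1'.
    A headword with no trailing digits gets a homonym number of None.
    """
    headword = headword.strip()
    match = re.fullmatch(r"(.*\D)(\d+)", headword)
    if match:
        return match.group(1), int(match.group(2))
    return headword, None


def find_missing(old_headwords, db_entries):
    """Return the old-site headwords with no matching (lemma, hom) in the DB.

    Main entries and xrefs are deliberately not distinguished, as some
    entries have changed type between the old and new sites.
    """
    known = set(db_entries)
    return [h for h in old_headwords if split_homonym(h) not in known]


# Hypothetical sample data standing in for the browse list and the database.
old_site = ["abatre1", "abatre2", "sabre", "nun⋮abilité"]
database = [("abatre", 1), ("sabre", None)]
print(find_missing(old_site, database))  # ['abatre2', 'nun⋮abilité']
```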

The script outputted 1540 missing entries.  This initially looks pretty horrifying, but I’m fairly certain most of them are legitimate.  There are a whole bunch of weird ‘n’ forms in the old site that have a strange character (e.g. ‘nun⋮abilité’) that are not found in the new site, I guess intentionally so.  Also, there are lots of ‘S’ and ‘R’ words but I think most of these are because of joining or splitting homonyms.  Geert, the editor, looked through the output and thankfully it turns out that only a handful of entries are missing, and that these were also missing from the old DMS version of the data, so their omission occurred before I became involved in the project.

Finally this week I worked with a new dataset of the Dictionaries of the Scots Language.  I successfully imported the new data and have set up a new ‘dps-v2’ API.  There are 80,319 entries in the new data compared to 80,432 in the previous output from DPS.  I have updated our test site to use the new API and its new data, although I have not been able to set up the free-text data in Solr yet, so the advanced search for full text / quotations only will not work yet.  Everything else should, though.

Also today I began to work on the layout of the bibliography page.  I have completed the display of DOST bibs but haven’t started on SND yet.  This includes the ‘style guide’ link when a note is present.  I think we may still need to tweak the layout, however.  I’ll continue to work with the new data next week.