Week Beginning 29th October 2018

This was a slightly unusual week for me, as I don’t often speak at events, but I had sessions at workshops on both Tuesday and Wednesday.  The first was an ArtsLab event about AHRC Data Management Plans, while the second was a workshop organised by Bryony Randall about digital editions.  I think both workshops went well and my sessions ran pretty smoothly.  It does take time to prepare for these sorts of things, though, especially when the material needs to be written from scratch, so most of the start of the week was spent preparing for and attending these events.

I also had a REELS project meeting on Tuesday morning, where we discussed the feedback we’d received about the online resource and made a plan for what still needs to be finalised before the resource goes live at an event on the 17th of November.  There are 23 items on the plan I drew up, so there’s rather a lot to get sorted in the next couple of weeks.  Also relating to place-name studies, I made the new, Leaflet-powered maps for Thomas Clancy’s Saints Places website live this week.  These replace the older Google-based maps on this legacy resource, which had stopped working because Google now requires credit card details in order to use its mapping services.  An example of one of the new maps can be found here: https://saintsplaces.gla.ac.uk/saint.php?id=64.

Also this week I updated the ‘support us’ page of the DSL to include new information and a new structure (http://dsl.ac.uk/support-us/), arranged to meet Matthew Creasy to discuss future work on his Decadence and Translation project, and responded to a few more requests from Jeremy Smith about the last-minute bid he was putting together, which he managed to submit on Tuesday.  I also spoke to Scott Spurlock about his crowdsourcing project and to Jane Stuart-Smith about the questionnaire for the new Seeing Speech / Dynamic Dialects websites, which are nearing completion.  I set up a Google Play / App Store account for someone in MVLS who wanted to keep track of the stats for one of their apps, and I spoke to Kirsteen McCue about timelines for her RNSN project.

By Thursday I had settled back into my more regular work routine, returning to the Bilingual Thesaurus for the first time in a few weeks.  Louise Sylvester had supplied me with some text for the homepage and the about page, so I added that in.  I also fixed the date for ‘Galiot’, which was previously recorded with only an end date, and changed the ‘there are no words in this category’ text to ‘there are no words at this level of the hierarchy’, which is hopefully less confusing.

I also split the list of words for each category into two separate lists, one for Anglo-Norman and one for Middle English.  Originally I was thinking of having these as separate tabs, but as there are generally not very many words in a category this seemed a little unnecessary, and it would have made it harder for a user to compare AN and ME words at the same time.  So instead the words are split into two sections of one list.  I also added in the language of origin and language of citation text, which currently appears underneath the line containing the headword, POS and dates.  Finally, I added in the links to the source dictionaries.  To retain the look of the HT site and to reduce clutter, these appear in a pop-up that opens when you click on a ‘search’ icon to the right of the word (tooltip text also appears if you hover over the icon).  These might be replaced with in-page links for each word instead, though.  Here’s a screenshot of how things currently look, but note that the colour scheme is likely to change, as Louise has specified a preference for blue and red.  I’ll probably reuse the colours below for the main ‘Thesaurus’ portal page.

I spent the rest of the week working through the HT / OED category linking issues.  This included ticking off 6,621 matches that were identified by the lexeme / first date matching script, ticking off a further 78 matches that Fraser had checked manually, and creating a script that matches up 1,424 categories within the category ‘Thing heard’ whose category numbers had been altered in ways that prevented previous scripts from pairing them up.  I haven’t ticked these off yet as Marc wanted to QA them first, so I created a further script to help with this process.  I also wrote a script to fix the category numbers of some of the HT categories where an erroneous zero appears in the number – e.g. ‘016’ is used rather than ‘16’.  There were 1,355 of these errors, all now fixed, which should allow the previous matching scripts to match up at least some of these categories.  Marc, Fraser and I met on Friday to discuss the process, and unfortunately one of the scripts we looked at still had its ‘update’ code active, meaning the newly fixed ‘erroneous zero’ categories were passed through it and ticked off.  After the meeting I deactivated the ‘update’ code, identified which rows had been ticked off, and created a script to help QA these, so no real damage was done.
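To give a flavour of the zero-fixing logic, here’s a minimal Python sketch (the actual script runs against the database; the assumption that each tier of a dotted category number should be two digits is mine, not something stated above):

def fix_catnum(catnum):
    """Remove the erroneous extra zero from a dotted category number,
    e.g. '01.016.04' becomes '01.16.04'.  Assumes (my assumption) that
    each tier should be two digits, so a three-digit tier starting
    with '0' contains the spurious zero described above."""
    parts = catnum.split('.')
    fixed = [p[1:] if len(p) == 3 and p.startswith('0') else p for p in parts]
    return '.'.join(fixed)

print(fix_catnum('01.016.04'))  # prints '01.16.04'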

I also realised that the page I’d created to list statistics about matched / unmatched categories was showing an incorrect figure for unmatched categories that are not empty.  Rather than there being 2,931 unmatched OED categories that have a POS and are not empty, the figure is actually 10,594.  The stats page was subtracting the total matched figure (currently 213,553) from the total number of categories that have a POS and are not empty (216,484).  What it should have used instead of the total matched figure is the count of matched categories that have a POS and are not empty (currently 205,890), which I’m afraid I hadn’t included.  So unfortunately we have more matches to deal with than we thought.
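To make the slip concrete, here’s the arithmetic as a small Python sketch (variable names are mine; the figures are those given above):

total_with_pos_not_empty   = 216_484  # all OED categories with a POS that aren't empty
total_matched              = 213_553  # all matched categories, whatever their POS / emptiness
matched_with_pos_not_empty = 205_890  # matched categories with a POS that aren't empty

# What the stats page was doing – subtracting across two different populations:
wrong = total_with_pos_not_empty - total_matched               # 2,931
# What it should do – subtracting like from like:
right = total_with_pos_not_empty - matched_with_pos_not_empty  # 10,594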

I also made a tweak to the lexeme / first date matching script, removing ‘to ’ from the start of lexemes before matching them.  This helped bump a number of categories up into our thresholds for potential matches.  I also changed the thresholds and added in a new grouping.  The criteria for potential matches have been reduced by one word, to 5 matching words and a total of 80% matching words, and the new grouping holds categories that don’t meet this threshold but still have 4 matching words.  I’ll continue with this next week.
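As a rough sketch of how I understand the matching criteria (illustrative Python only, not the production script, which also compares first dates; exactly what the 80% is measured against is my assumption):

def normalise(lexeme):
    """Strip a leading 'to ' (e.g. from infinitive verb forms) before comparing."""
    return lexeme[3:] if lexeme.lower().startswith('to ') else lexeme

def classify(ht_lexemes, oed_lexemes):
    """Bucket a pair of categories by how many of their lexemes coincide."""
    ht = {normalise(l) for l in ht_lexemes}
    oed = {normalise(l) for l in oed_lexemes}
    matching = len(ht & oed)
    # My assumption: the 80% is the share of the smaller category's words that match.
    proportion = matching / min(len(ht), len(oed)) if ht and oed else 0
    if matching >= 5 and proportion >= 0.8:
        return 'potential match'
    if matching >= 4:
        return 'new grouping (4 matching words)'
    return 'unmatched'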