Week Beginning 6th December 2021

I spent a bit of time this week writing a second draft of a paper for DH2022 after receiving feedback from Marc.  This one targets ‘short papers’ (500-750 words) and I managed to get it submitted before the deadline on Friday.  Now I’ll just need to see if it gets accepted – I should find out one way or the other in February.  I also made some further tweaks to the locution search for the Anglo-Norman Dictionary, ensuring that when a term matches more than one word in a locution the result is repeated for each occurrence, appearing in the results grouped by each word that matches the term.  So for example ‘quatre tempres, tens’ now appears twice, once amongst the ‘tempres’ results and once amongst the ‘tens’ results.
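To illustrate the grouping behaviour described above, here is a minimal sketch in Python.  It is not the AND's actual code: the function name `group_locutions`, the simple word-splitting and the use of a shell-style wildcard pattern such as ‘te*’ are all assumptions made purely for the example.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def group_locutions(locutions, pattern):
    """Group locutions under every word in them that matches the pattern.

    A locution is listed once per matching word, so it can appear in
    several groups. The wildcard syntax is an assumption for this sketch.
    """
    groups = defaultdict(list)
    pattern = pattern.lower()
    for locution in locutions:
        # Split into lower-cased words, ignoring punctuation and digits.
        words = re.findall(r"[^\W\d_]+", locution.lower())
        # dict.fromkeys removes repeated words while keeping their order.
        for word in dict.fromkeys(words):
            if fnmatchcase(word, pattern):
                groups[word].append(locution)
    return dict(groups)


results = group_locutions(["quatre tempres, tens", "en tens"], "te*")
```

With this hypothetical input, ‘quatre tempres, tens’ ends up in both the ‘tempres’ group and the ‘tens’ group.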

I also had a chat with Heather Pagan about the Irish dictionary eDIL (http://www.dil.ie/), whose team are hoping to rework the way they handle dates in a similar way to the AND.  I said that it would be difficult to estimate how much time it would take without seeing their current data structure and getting more of an idea of how they intend to update it.  The estimate would also depend on what updates would be required to their online resource to incorporate the updated date structure, such as enhanced search facilities, and on whether further updates to their resource would be part of the process.  It would also depend on whether any back-end systems would need to be updated to manage the new data (e.g. if they have a DMS like the AND).

Also this week I helped out with some issues with the Iona place-names website just before their conference started on Thursday.  Someone had reported that the videos of the sessions were only playing briefly and then cutting out, but they all seemed to work for me, having tried them on my PC in Firefox and Edge and on my iPad in Safari.  Eventually I managed to replicate the issue in Chrome on my desktop and in Chrome on my phone.  It seemed to be an issue specifically related to Chrome, and it didn’t affect Edge, even though Edge is built on Chromium, the same codebase as Chrome.  The video file plays and then cuts out due to the file being blocked on the server.  I can only assume that the way Chrome accesses the file is different to other browsers and that it’s sending multiple requests to the server, which is then blocking access due to too many requests being sent (the console in the browser shows a 403 Forbidden error).  Thankfully Raymond at Arts IT Support was able to increase the number of connections allowed per browser and this fixed the issue.  It’s still a bit of a strange one, though.

I also had a chat with the DSL people about when we might be able to replace the current live DSL site with the ‘new’ site, as the server the live site is on will need to be decommissioned soon.  I also had a bit of a catch-up with Stevie Barrett, the developer in Celtic and Gaelic, and had a video call with Luca and his line-manager Kirstie Wild to discuss the current state of Digital Humanities across the College of Arts.  Luca does a similar job to me at college-level and it was good to meet him and Kirstie to see what’s been going on outside of Critical Studies.  I also spoke to Jennifer Smith about the Speak For Yersel project, as I’d not heard anything about it for a couple of weeks.  We’re going to meet on Monday to take things further.

I spent the rest of the week working on the radar diagram visualisations for the Historical Thesaurus, completing an initial version.  I’d previously created a tree browser for the thematic headings, as I discussed last week.  This week I completed work on the processing of data for categories that are selected via the tree browser.  After the data is returned the script works out which lexemes have dates that fall into each of the four periods (e.g. a word with dates 650-9999 needs to appear in all four periods).  Words are split by part of speech, and I’ve arranged the axes so that N, V, Aj and Av appear first (if present), with any others following on.  All verb categories have also been merged.
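The processing described above could be sketched roughly as follows.  This is an illustration in Python and not the script itself: the cut-off years for the four periods, the assumption that every verb label begins with ‘V’, and all of the function names are hypothetical.

```python
from collections import defaultdict

# Assumed period boundaries (inclusive years); the real cut-offs may differ.
PERIODS = {
    "OE": (0, 1149),
    "ME": (1150, 1499),
    "EModE": (1500, 1699),
    "ModE": (1700, 9999),
}

# Axes that should come first when present, in this order.
PREFERRED_POS = ["N", "V", "Aj", "Av"]


def periods_for(first, last):
    """Return every period that overlaps the date range first..last."""
    return [name for name, (start, end) in PERIODS.items()
            if first <= end and last >= start]


def merge_pos(pos):
    """Merge all verb categories into 'V' (assumes verb labels start with 'V')."""
    return "V" if pos.startswith("V") else pos


def count_by_period(lexemes):
    """Count lexemes per period and part of speech.

    Each lexeme is a (pos, first_date, last_date) tuple.
    """
    counts = {name: defaultdict(int) for name in PERIODS}
    for pos, first, last in lexemes:
        for period in periods_for(first, last):
            counts[period][merge_pos(pos)] += 1
    return {name: dict(by_pos) for name, by_pos in counts.items()}


def order_axes(pos_labels):
    """Put N, V, Aj and Av first (if present), then any others alphabetically."""
    present = set(pos_labels)
    first = [p for p in PREFERRED_POS if p in present]
    rest = sorted(present - set(PREFERRED_POS))
    return first + rest


lexemes = [("N", 650, 9999), ("Vt", 1200, 1400), ("Vi", 1600, 9999), ("Prep", 1750, 1900)]
counts = count_by_period(lexemes)
```

In this sketch a word dated 650-9999 is counted in all four periods, while the two verb subtypes are both counted on a single ‘V’ axis.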

I’m still not sure how widely useful these visualisations will be as they only really work for categories that have several parts of speech.  But there are some nice ones.  See for example a visualisation of ‘Badness/evil’, ‘Goodness, acceptability’ and ‘Mediocrity’ which shows words for ‘Badness/evil’ being much more prevalent in OE and ME while ‘Mediocrity’ barely registers, only for it and ‘Goodness, acceptability’ to grow in relative size in EModE and ModE:

I also added in an option to switch between visualisations which use total counts of words in each selected category’s parts of speech and visualisations that use percentages.  With the latter the scale is fixed at a maximum of 100% across all periods and the points on the axes represent the percentage of the total words in a category that are in a part of speech in your chosen period.  This means categories of different sizes are easier to compare, but does of course mean that the relative sizes of categories are not visualised.  I could also add a further option that fixes the scale at the maximum number of words in the largest POS, so the visualisation still represents relative sizes of categories but the scale doesn’t fluctuate between periods (e.g. if there are 363 nouns for a category across all periods then the maximum on the scale would stay fixed at 363 across all periods, even if the maximum number of nouns in OE (for example) is 128).  Here’s the above visualisation using the percentage scale:
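The difference between the scale options can be sketched as follows.  This is a simplified, single-category illustration in Python and not the actual visualisation code: the mode names ‘count’, ‘percent’ and ‘fixed’, the function names, and the reading of the percentage as a share of the category’s words within the chosen period are all assumptions.

```python
def to_percentages(pos_counts):
    """Convert per-POS counts for one period into percentages of that period's total."""
    total = sum(pos_counts.values())
    if total == 0:
        return {pos: 0.0 for pos in pos_counts}
    return {pos: 100.0 * n / total for pos, n in pos_counts.items()}


def scale_max(mode, period_counts, overall_counts):
    """Return the maximum value for the radar axes.

    period_counts:  per-POS counts for the period being viewed.
    overall_counts: per-POS counts across all periods.
    """
    if mode == "percent":
        return 100  # fixed at 100% in every period
    if mode == "fixed":
        # Possible further option: largest POS across all periods,
        # so the scale does not change as you move between periods.
        return max(overall_counts.values(), default=0)
    # "count": the scale follows the largest POS in the current period.
    return max(period_counts.values(), default=0)


oe_counts = {"N": 128, "V": 10}       # illustrative figures
overall_counts = {"N": 363, "V": 40}  # illustrative figures
```

Using the figures from the example above, the ‘count’ scale for OE would top out at 128, while the ‘fixed’ scale would stay at 363 in every period.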

The other thing I did was to add in a facility to select a specific category and turn off the others.  So for example if you’ve selected three categories you can press on a category to make it appear bold in the visualisation and to hide the other categories.  Pressing on the category a second time reverts to displaying all of the categories.  Your selection is remembered if you change the scale type or navigate through the periods.  I may not have much more time to work on this before Christmas, but the next thing I’ll do is to add in access to the lexeme data behind the visualisation.  I also need to fix a bug that sometimes causes the ModE period to be missing a word in its counts.