Week Beginning 24th July 2023

We made a big update to the content of the Anglo-Norman Dictionary this week, replacing all entries in the letter ‘U’, plus a number of other entries elsewhere in the dictionary.  I’d documented the processes that I needed to follow during previous updates to the dictionary, so on the whole the update was straightforward.  However, I’d previously performed updates by downloading the database to my local PC, running all required processes there, then uploading the new version of the database to the server.  The good thing about this approach is that the updates are applied on my local PC, so if anything goes wrong the live site is not affected.  The downside is that the AND database is over 2GB in size and it’s not possible to upload data of this size via phpMyAdmin.  Instead I need to ask Arts IT Support to run the import at the command line.  I wanted to be able to manage the whole process myself this time, so I investigated running the update on the server.  I did of course first download the live database and run the update scripts on my local PC to ensure no issues would be encountered.  However, the version of PHP on the server is newer and stricter than the version on my local PC, so I needed to make some modifications to the processing scripts that I usually run locally.  With the modifications in place I was able to run the scripts on the server and replace the letter ‘U’ entries on the live site.  The update also required some entries elsewhere in the dictionary to be replaced, and this is handled via the dictionary’s online content management system: a zip file containing entries is uploaded and a script then processes it.  Last week I had to replace the library for handling zip files, and this was the first time the updated ‘upload entries’ script had been used.  Unfortunately some errors were encountered, but I managed to sort these and after that the update was complete.

I did some other work for the AND this week.  The editor Geert had noticed that the initials of the author, which should appear at the bottom of the entry, were sometimes appearing at the top.  After a bit of investigation it turned out that a JavaScript error caused by the structure of the FEW external reference was preventing the editor initials from being relocated.  Once I’d identified this I managed to resolve the issue and the initials returned to where they should be.  There was also an issue with the ‘cogrefs’ (i.e. the external references to other dictionaries).  An ‘∅’ character should appear where there isn’t a reference, but for many entries no such character was appearing for the MED.  It turned out that for entries that didn’t have an MED reference, the <link_target> element contained the text ‘MED’ instead of being empty.  The XSLT script wasn’t set up to handle this situation, so I updated it and thankfully the ‘∅’ character then started appearing.
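
To illustrate the logic of the cogref fix, here is a minimal sketch in Python rather than the XSLT the site actually uses.  It assumes a simplified, hypothetical element shape (a <cogref> containing a <link_target>), and the function name render_link_target is purely illustrative.  The key point is that a reference counts as missing if the target is absent, empty, or contains nothing but the dictionary’s own name.

```python
import xml.etree.ElementTree as ET

EMPTY_MARKER = "\u2205"  # the '∅' character shown when there is no reference


def render_link_target(cogref: ET.Element, dictionary: str) -> str:
    """Return the text to display for one external reference (hypothetical helper).

    A reference is treated as missing if <link_target> is absent, empty,
    or contains only the dictionary's own name (e.g. just 'MED').
    """
    target = cogref.find("link_target")
    text = (target.text or "").strip() if target is not None else ""
    if not text or text == dictionary:
        return EMPTY_MARKER
    return text


# Illustrative usage with made-up XML in the assumed shape
xml = "<cogref><link_target>MED</link_target></cogref>"
print(render_link_target(ET.fromstring(xml), "MED"))
```

Checking for the dictionary name as well as for emptiness means the same rule covers both ways the data can signal ‘no reference’.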

I also checked how large the AND’s XML dataset is, which the editors need to know for some additional work they’re doing.  I’d estimated it to be around 25-35MB, whereas they’d estimated it to be around 2GB.  I’d previously written a little script that exports the XML data from the database and saves each entry as an individual XML file, so I ran this again for the entire database.  The resulting dataset of just over 60,000 files totals 139MB, although it takes up 180MB on disk, because each small file carries a storage overhead that one large file doesn’t have.
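
The gap between 139MB and 180MB comes from the way filesystems allocate space in whole blocks, so every file is rounded up to a multiple of the block size.  The sketch below models this, assuming a 4,096-byte block size; the real allocation unit depends on the filesystem, so it won’t reproduce my exact figures.

```python
import math


def allocated_size(file_size: int, block_size: int = 4096) -> int:
    """Bytes a file occupies on disk when space is allocated in whole blocks.

    The 4096-byte default is an assumption; real block sizes vary by filesystem.
    """
    if file_size == 0:
        return 0
    return math.ceil(file_size / block_size) * block_size


def totals(sizes, block_size: int = 4096):
    """Return (logical bytes, bytes allocated on disk) for a list of file sizes."""
    logical = sum(sizes)
    on_disk = sum(allocated_size(s, block_size) for s in sizes)
    return logical, on_disk


# Three small files: 6,100 bytes of data occupy four 4KB blocks on disk
print(totals([1000, 5000, 100]))
```

With many files of a few kilobytes each, the partly filled final block of every file adds up, which is why one large file of the same content would be smaller on disk.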

Also this week I had a video call with Michelle Anjirbag-Reeve, a researcher who is applying for ERC funding and would be based in the School of Critical Studies if her application is successful.  Her proposal sounds really interesting, and if it’s funded I’ll be involved in creating a CMS, website and visualisations.  Fingers crossed.

I also spent a day or so this week working for the Dictionaries of the Scots Language.  Before my recent holiday I’d been working on a requirements document for new search facilities for dates and part of speech, plus search result filter options and new sparklines.  I completed a first draft of this document this week.  It took quite a bit of thinking through, the document is rather long, and the updates have implications for other parts of the site.  Whilst writing the sections on filtering I realised that I’ll need to change the way headword searches function: they do not currently use Solr, but the filter options will rely on it, so I added a section explaining this.  Also, whilst thinking through the sparklines I realised we might want to cluster the data into continuous blocks rather than using the individual citation dates, and I included a discussion of this in the document too.
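
To make the clustering idea more concrete, here is a minimal sketch of one possible approach (not necessarily the one the document will settle on): sort the citation years and merge any that fall within a maximum gap of the previous one into a single block.  The function name and the max_gap value of 25 years are hypothetical placeholders for illustration.

```python
def cluster_dates(years, max_gap=25):
    """Group citation years into continuous (start, end) blocks.

    After sorting, a year that is no more than max_gap years after the end
    of the current block extends that block; otherwise it starts a new one.
    max_gap=25 is an illustrative assumption, not a decided value.
    """
    blocks = []
    for year in sorted(set(years)):
        if blocks and year - blocks[-1][1] <= max_gap:
            blocks[-1][1] = year  # extend the current block
        else:
            blocks.append([year, year])  # start a new block
    return [tuple(b) for b in blocks]


print(cluster_dates([1500, 1510, 1600, 1520, 1780]))
```

A sparkline drawn from these blocks shows periods of continuous use rather than isolated spikes, and the gap threshold controls how readily nearby citations are joined.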

I also made a few more updates to the SpeechStar website this week and managed to find some time to return to working on the Books and Borrowing front-end.  For this I implemented a first version of the ‘On this day’ feature, which I’ve currently added to the homepage of the dev site.  The feature picks out a random borrowing for the current day and displays information about it, for example:

“On this day in 1829, Mr Robert Allan, a borrower at Advocates Library borrowed 2 volumes of Histoire de la Vie et de la Mort des deux illustres fréres Corneille et Jean de Witt. by Cornelis de Witt.”

The feature displays the borrower, the library, the number of volumes borrowed (if this is more than one), plus the title and the author of the book borrowed.  The borrower, title and author are links that perform a search, while the library links to the library page.  There is a ‘reload’ button in the bottom left, and when you press this the area scrolls up and then scrolls down again with a new randomly selected borrowing from the day.  There is also a link in the bottom right to view all of the borrowings for the current day.  These are presented on a new page that also features options to select a different day and month, in case people want to see what was borrowed on their birthday, for example.  Items on this page are listed in date order and then by library.  It’s maybe not the most serious and academic of features, but I think it’s a nice addition and makes the data feel more alive.
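
The selection and ordering logic behind ‘On this day’ can be sketched as follows.  This is a minimal in-memory illustration in Python, not the site’s actual implementation (which works against the project database); the Borrowing class and the function names are hypothetical.  It matches borrowings on month and day in any year, sorts them by date and then library for the listing page, picks one at random for the homepage, and only mentions the number of volumes when it is more than one.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Borrowing:  # hypothetical record shape for illustration
    borrowed_on: date
    library: str
    borrower: str
    title: str
    author: str
    volumes: int = 1


def borrowings_on_day(records: List[Borrowing], month: int, day: int) -> List[Borrowing]:
    """All borrowings on a given day and month (any year), by date then library."""
    matches = [r for r in records
               if r.borrowed_on.month == month and r.borrowed_on.day == day]
    return sorted(matches, key=lambda r: (r.borrowed_on, r.library))


def random_borrowing(records: List[Borrowing], month: int, day: int,
                     rng: Optional[random.Random] = None) -> Optional[Borrowing]:
    """Pick one borrowing at random for the given day, or None if there are none."""
    matches = borrowings_on_day(records, month, day)
    chooser = rng or random  # the random module also provides choice()
    return chooser.choice(matches) if matches else None


def describe(b: Borrowing) -> str:
    """Build the homepage sentence, mentioning volumes only when more than one."""
    volumes = f"{b.volumes} volumes of " if b.volumes > 1 else ""
    return (f"On this day in {b.borrowed_on.year}, {b.borrower}, a borrower at "
            f"{b.library}, borrowed {volumes}{b.title} by {b.author}.")
```

On the homepage this would be called with today’s date, e.g. random_borrowing(records, date.today().month, date.today().day), while the listing page would call borrowings_on_day with whichever day and month the user selects.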

I’m going to be on holiday again for the next two weeks so there won’t be another report from me until the week of the 14th of August.