Week Beginning 5th April 2021

This week began with Easter Monday, which was a holiday.  I’d also taken Tuesday and Thursday off to cover some of the Easter school holidays, so it was a two-day working week for me.  I spent some of this time continuing to download and process images of library register books for the Books and Borrowing project, including 14 from St Andrews and several further books from Edinburgh.  I was also in communication with one of the people responsible for the Dictionary of the Scots Language’s new editor interface regarding exporting new data from this interface and importing it into the DSL’s website.  I was sent a ZIP file containing a sample of the data for SND and DOST, plus a sample of the bibliographical data, with some information on the structure of the files and some points for discussion.

I looked through all of the files and considered how I might be able to incorporate the data into the systems that I created for the DSL’s website.  I should be able to run the new dictionary XML files through my upload script with only a few minor modifications.  It’s also really great that the bibliographies and cross references are getting sorted via the new Editor interface.  One point of discussion is that the new editor interface has generated new IDs for the entries, and the old IDs are not included.  I reckoned that it would be good if the old IDs were included in the XML as well, just in case we ever need to match up the current data with older datasets.  I did notice that the old IDs already appeared to be included in the <url> fields, but after discussion we decided that it would be safer to include them as an attribute of the <entry> tag, e.g. <entry oldid="snd848"> or something similar, and this is what the full dataset will contain when I receive it.
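Once the old ID is carried as an attribute of <entry>, matching new entries against older datasets becomes a simple lookup.  The Python sketch below is purely illustrative: it assumes the new ID sits in an attribute called id (the real attribute name isn’t given here) and the sample entries are invented; only the oldid attribute follows the form agreed above.

```python
import xml.etree.ElementTree as ET

# Invented sample data. The "id" attribute name is an assumption; only
# "oldid" follows the form agreed with the editor interface team.
SAMPLE = """<entries>
  <entry id="new-0001" oldid="snd848"><form>example</form></entry>
  <entry id="new-0002"><form>entry with no old ID</form></entry>
</entries>"""

def old_id_map(xml_text: str) -> dict[str, str]:
    """Map each entry's new ID to its old ID, skipping entries that lack one."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for entry in root.iter("entry"):
        old_id = entry.get("oldid")
        if old_id is not None:
            mapping[entry.get("id")] = old_id
    return mapping

print(old_id_map(SAMPLE))  # {'new-0001': 'snd848'}
```

Entries created from scratch in the new interface would simply have no oldid, which is why the lookup skips them rather than failing.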

There are also new labels for entries, stating when and how each entry was prepared.  The actual labels are stored in a spreadsheet and a numerical ID appears in the XML to reference a row in the spreadsheet.  This method of dealing with labels seems fine to me – I can update my system to use the labels from the spreadsheet and display the relevant labels depending on the numerical codes in the entry XML.  I reckon it’s probably better not to store the actual labels in the XML, as this saves space and makes it easier to change the label text if required, because the text is then only stored in a single place.
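Resolving the numerical codes could work roughly as in the Python sketch below.  It assumes the spreadsheet has been exported to CSV with columns named id and label, and that each code appears in the entry XML as <label ref="N"/>; both are hypothetical, and the label wording is placeholder text rather than the DSL’s real labels.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the labels spreadsheet; column names and
# label wording are placeholders.
LABELS_CSV = """id,label
1,Example label one
2,Example label two
"""

def load_labels(csv_text: str) -> dict[int, str]:
    """Build a lookup from numerical label ID to label text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["id"]): row["label"] for row in reader}

def labels_for_entry(entry_xml: str, labels: dict[int, str]) -> list[str]:
    """Resolve the label codes in one entry to their display text.

    Assumes each code appears as <label ref="N"/>; the real markup may differ.
    Unknown codes are skipped rather than raising an error.
    """
    entry = ET.fromstring(entry_xml)
    texts = []
    for el in entry.iter("label"):
        code = int(el.get("ref"))
        if code in labels:
            texts.append(labels[code])
    return texts

labels = load_labels(LABELS_CSV)
print(labels_for_entry('<entry><label ref="2"/></entry>', labels))  # ['Example label two']
```

Changing the wording of a label then only means editing one row of the spreadsheet; none of the entry XML needs to be touched.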

The bibliographies are looking good in the sample data, but I pointed out that it might be handy to include the old bibliographical IDs in the XML as well, if that’s possible.  There were also spurious xmlns="" attributes in the new XML, but these shouldn’t pose any problems and I said that it’s OK to leave them in.  Once I receive the full dataset with some tweaks (e.g. the inclusion of old IDs) I will do some further work on this.
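As a quick check of why the xmlns="" attributes are harmless, the Python sketch below parses an invented fragment with and without the attribute using the standard library’s ElementTree.  This only holds where no default namespace has been declared on an ancestor element; if one had been, xmlns="" would take the element out of that namespace and change how it has to be matched.

```python
import xml.etree.ElementTree as ET

# Invented fragments; the real DSL element names may differ.
WITH_XMLNS = '<entry><bib xmlns="">Ref</bib></entry>'
WITHOUT_XMLNS = '<entry><bib>Ref</bib></entry>'

def element_names(xml_text: str) -> list[str]:
    """Return the tag of every element in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

print(element_names(WITH_XMLNS))     # ['entry', 'bib']
print(element_names(WITHOUT_XMLNS))  # ['entry', 'bib']
# ElementTree does not keep namespace declarations as ordinary attributes.
print(ET.fromstring(WITH_XMLNS).find("bib").attrib)  # {}
```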

I spent most of the rest of my available time working on the new Comparative Kingship place-names systems.  I completed work on the Scotland CMS, including adding in the required parishes and former parishes.  This means my place-name system has now been fully modernised and uses the Bootstrap framework throughout, which looks a lot better and works more effectively on all screen dimensions.

I also imported the data from GB1900 for the relevant parishes.  There are more than 10,000 names, although a lot of these could be trimmed out – lots of ‘F.P.’ for footpath etc.  It’s likely that the parishes listed cover a rather broader area than the study will.  All the names in and around St Andrews are in there, for example.  To generate an altitude for each of the names imported from GB1900 I ran a script I’d written that passes the latitude and longitude for each name in turn to Google Maps, which then returns elevation data.  I had to limit the frequency of submissions to one every few seconds, otherwise Google blocks access, so it took rather a long time to gather the altitudes for more than 10,000 names, but the process completed successfully.
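The throttled lookup can be sketched in Python as below.  This is not the script described above: fetch_elevation is a hypothetical stand-in for the actual Google Maps request (which needs an API key and is not shown), the record IDs are invented, and a three-second pause is assumed to illustrate "one every few seconds".  At that rate, 10,000 names take about 30,000 seconds, a little over eight hours.

```python
import time
from typing import Callable, Iterable

# Hypothetical stand-in for the real elevation lookup: takes (latitude, longitude)
# and returns an altitude in metres. The actual Google request is not shown.
ElevationFetcher = Callable[[float, float], float]

def gather_altitudes(
    names: Iterable[tuple[int, float, float]],
    fetch_elevation: ElevationFetcher,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[int, float]:
    """Look up an altitude for each (record_id, lat, lng), one request at a time.

    A pause of delay_seconds is inserted between consecutive requests so the
    service is not flooded; sleep is injectable so the throttle can be tested.
    """
    altitudes: dict[int, float] = {}
    for position, (record_id, lat, lng) in enumerate(names):
        if position > 0:
            sleep(delay_seconds)
        altitudes[record_id] = fetch_elevation(lat, lng)
    return altitudes

def fake_fetch(lat: float, lng: float) -> float:
    """Pretend every point is 42 m above sea level."""
    return 42.0

print(gather_altitudes([(1, 56.34, -2.79), (2, 56.33, -2.80)], fake_fetch, delay_seconds=0))
# {1: 42.0, 2: 42.0}
```

Keying results by record ID rather than by name matters here, as many GB1900 names (such as ‘F.P.’) repeat.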

Also this week I dealt with an issue with the SCOTS corpus, which had broken because its database had gone offline.  I also helped Raymond at Arts IT Support to investigate why the Anglo-Norman Dictionary server had been blocking uploads to the dictionary management system when thousands of files were added to the upload form.  It turns out that while the Glasgow IP address range had been added to the whitelist, the VPN’s IP address range hadn’t, which is why uploads were being blocked.
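The effect of a missing range can be illustrated with Python’s ipaddress module.  How the server actually applies its whitelist isn’t described here, so this is only a sketch of the idea, and the ranges are taken from the reserved documentation blocks rather than being the real Glasgow or VPN ranges.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks,
# NOT the real Glasgow or VPN address ranges.
CAMPUS_RANGE = "192.0.2.0/24"
VPN_RANGE = "198.51.100.0/24"

def is_whitelisted(ip: str, ranges: list[str]) -> bool:
    """True if the address falls inside any of the whitelisted networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)

whitelist = [CAMPUS_RANGE]  # VPN range missing, so VPN users are not exempt
print(is_whitelisted("192.0.2.10", whitelist))    # True
print(is_whitelisted("198.51.100.7", whitelist))  # False
whitelist.append(VPN_RANGE)
print(is_whitelisted("198.51.100.7", whitelist))  # True
```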

Next week I’m also taking a couple of days off to cover the Easter school holidays, and will no doubt continue with the DSL and Comparative Kingship projects then.

Week Beginning 29th March 2021

This was a four-day week due to Good Friday.  I spent a couple of these days working on a new place-names project called Comparative Kingship that involves Aberdeen University.  I had several email exchanges with members of the project team about how the website and content management systems for the project should be structured, and I set up the subdomain where everything will reside.  This is a slightly different project as it will involve place-name surveys in Scotland and Ireland that will be recorded in separate systems.  This is because slightly different data needs to be recorded for each survey, and Ireland has a different grid reference system to Scotland.  For these reasons I’ll need to adapt the CMS that I’ve used on several other place-name projects, which will take a little time.  I decided to take the opportunity to modernise the CMS whilst redeveloping it.  I created the original version of the CMS back in 2016, with elements of the interface based on even older projects, and the interface now looks pretty dated and doesn’t work so well on touchscreens.  I’m migrating the user interface to the Bootstrap framework, which looks more modern and works a lot better on a variety of screen sizes.  It is going to take some time to complete this migration, as I need to update all of the forms used in the CMS, but I made good progress this week and I’m probably about halfway through the process.  After this I’ll still need to update the systems to reflect the differences between the Scottish and Irish data, which will probably take several more days, especially if I need to adapt the system that automatically generates latitude, longitude and altitude from a grid reference to work with Irish grid references.
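One visible difference between the two systems is the letter prefix: a British (Ordnance Survey) reference such as NN 166 712 starts with two letters, while an Irish Grid reference such as O 15 34 starts with one.  The Python sketch below is an illustration, not the CMS’s existing code, and it only converts the letters and digits into numeric eastings and northings on each grid.  Getting from there to latitude and longitude still needs a proper projection and datum transformation (for example via a library such as pyproj), which is not shown.

```python
import re

def _letter_index(letter: str) -> int:
    """Position of a grid letter in the 25-letter alphabet that omits I."""
    idx = ord(letter) - ord("A")
    return idx - 1 if idx > 7 else idx

def grid_ref_to_metres(ref: str) -> tuple[int, int]:
    """Convert a letter-based grid reference to (easting, northing) in metres.

    Two letters are read as a British (Ordnance Survey) reference and one
    letter as an Irish Grid reference. The result is the south-west corner
    of the square the reference describes.
    """
    compact = ref.replace(" ", "").upper()
    match = re.fullmatch(r"([A-HJ-Z]{1,2})(\d*)", compact)
    if match is None or len(match.group(2)) % 2 or len(match.group(2)) > 10:
        raise ValueError(f"Unrecognised grid reference: {ref!r}")
    letters, digits = match.groups()
    half = len(digits) // 2
    # Pad each half on the right to 5 digits, giving metres within the 100 km square.
    east_in_square = int(digits[:half].ljust(5, "0"))
    north_in_square = int(digits[half:].ljust(5, "0"))

    if len(letters) == 2:
        # British grid: first letter is a 500 km square, second a 100 km square within it.
        first, second = _letter_index(letters[0]), _letter_index(letters[1])
        east_100km = ((first - 2) % 5) * 5 + second % 5
        north_100km = (19 - (first // 5) * 5) - second // 5
    else:
        # Irish grid: one letter picks a 100 km square in a 5 x 5 block with V at the origin.
        index = _letter_index(letters[0])
        east_100km = index % 5
        north_100km = 4 - index // 5

    return east_100km * 100_000 + east_in_square, north_100km * 100_000 + north_in_square

print(grid_ref_to_metres("NN 166 712"))  # (216600, 771200)
print(grid_ref_to_metres("O 15 34"))     # (315000, 234000)
```

Telling the two apart by the number of leading letters works here because each survey has its own system anyway; a shared form would more safely record which grid a reference belongs to.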

I also continued with the development of the Dictionary Management System for the Anglo-Norman Dictionary, fixing some issues relating to how sense numbers are generated (but uncovering further issues that still need to be addressed) and fixing a bug whereby older ‘history’ entries were not getting associated with new versions of entries that were uploaded.  I also created a simple XML preview facility, which allows the editor to paste their entry XML into a text area and see it rendered as it would appear on the live site.  I also made a large change to how the ‘upload XML entries’ feature works.  Previously editors could attach any number of individual XML files to the form (even thousands) and these would then get uploaded.  However, I encountered an issue with the server rejecting so many file uploads in such a short period of time and blocking access to the PC that sent the files.  To get around this I investigated allowing a ZIP file containing XML files to be uploaded instead.  Upon upload my script then extracts the ZIP and processes all of the XML files contained therein.  This approach turned out to work very well – no more issues with the server rejecting files, and the processing is much speedier as it all happens in a batch rather than the script being called each time a single file is uploaded.  I tested the ZIP approach by zipping up all 3,179 XML files from the recent R data update, and the ZIP file was uploaded and processed in a few seconds, with all entries making their way into the holding area.  However, with this approach there is no feedback in the ‘Upload Log’ until the server-side script has finished processing all of the files in the ZIP, at which point all updates appear in the log at the same time, so there may be a wait of perhaps 20-30 seconds (if it’s a big ZIP file) before it looks like anything has happened.  Despite this I’d say that with this update the DMS should now be able to handle full letter updates.
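The ZIP-based upload could look something like the Python sketch below, which reads every .xml file from the uploaded archive in a single pass.  It is not the DMS’s actual code: handle_entry is a hypothetical stand-in for whatever places an entry in the holding area, and the log wording is invented.  Because the log is only returned once the loop finishes, the sketch also reproduces the behaviour described above, where nothing appears in the Upload Log until the whole ZIP has been processed.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Callable

def process_zip_upload(
    zip_bytes: bytes, handle_entry: Callable[[str, ET.Element], None]
) -> list[str]:
    """Parse every .xml file in an uploaded ZIP in one batch.

    handle_entry is a hypothetical callback standing in for the step that adds
    an entry to the holding area. Log lines are only available once the whole
    archive has been processed.
    """
    log = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip folders, macOS metadata copies and anything that isn't XML.
            if name.startswith("__MACOSX/") or not name.lower().endswith(".xml"):
                continue
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                log.append(f"{name}: could not parse ({err})")
                continue
            handle_entry(name, root)
            log.append(f"{name}: added to holding area")
    return log
```

Reporting progress while the batch runs would mean streaming log lines back to the browser as each file is handled, at the cost of a more complicated upload page.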

Also this week I added a ‘name of the month’ feature to the homepage of the Iona place-names project (https://iona-placenames.glasgow.ac.uk/) and continued to process the register images for the Books and Borrowing project.  I also spoke to Marc Alexander about Data Management Plans for a new project he’s involved with.