Week Beginning 16th November 2020

I spent two days each working on updates to the Dictionary of the Scots Language and the redevelopment of the Anglo-Norman Dictionary this week, with the remaining day spent on tasks for a few projects.  For the DSL I fixed an issue with the way a selected item was remembered from the search results.  If you performed a search, navigated to a page of results other than the first, and then clicked on one of the results to view an entry, a ‘current entry’ variable was set.  This variable was used when loading the results from the ‘return to results’ button on the entry page, to ensure that the page of results featuring the entry you were looking at would be displayed.  However, if you then clicked the ‘refine search’ button to return to the advanced search page, this ‘current entry’ value was retained, meaning that when you clicked the search button to perform a new search the results would load at whatever page the ‘current entry’ was located on, if it appeared in the new search results.  As the ‘current entry’ would not necessarily be in the new results set, the issue only cropped up every now and then.  Thankfully, having identified the issue it was easy to fix: whenever the search form loads as a result of a ‘refine search’ selection the ‘current entry’ variable is now cleared.  It is still retained and used when you return to the search results from an entry page.
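
In essence the fix amounts to something like this (a minimal sketch, assuming the ‘current entry’ is held in sessionStorage; the helper functions are hypothetical stand-ins for the site’s real search code):

```javascript
// Sketch only: assumes sessionStorage holds the 'current entry'
function onRefineSearch() {
  // Starting a fresh search from 'refine search': forget the previously
  // viewed entry so the new results always open at the first page
  sessionStorage.removeItem('currentEntry');
  showAdvancedSearchForm(); // hypothetical helper
}

function onReturnToResults() {
  // Returning from an entry page: the stored entry is kept so the results
  // open at the page that contains it
  const entryId = sessionStorage.getItem('currentEntry');
  loadResultsPageContaining(entryId); // hypothetical helper
}
```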

I then investigated an issue with citation numbers that will need further input from the editors, before moving on to implementing an option that allows you to show or hide the ‘browse’ panel on the right of entries.  I removed the animations when you show and hide either the ‘search’ or ‘browse’ columns, as these were proving to be a bit clunky with so much content being shown or hidden.  I also had to rework how the hide button in the ‘search’ column functions in order to get the new options working, but hopefully this hasn’t introduced any issues.  Additionally, I had to make quite a few updates to the stylesheet and the JavaScript to get this to work, as the width of the ‘entry’ column now has double the number of possible values.  I also needed to ensure that the entry width with and without the ‘browse’ column worked at all screen dimensions, as different styles are called at different screen widths.  Currently the choice of which columns are visible resets every time the entry page loads, but I may make the system remember your choice during a session, meaning that if you hide the browse column once it will stay hidden.  The change isn’t live yet and currently only works on one of our test sites; once I get feedback from the editors I’ll apply it to the live site.
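
In outline, the show/hide toggle works along these lines (the class names and storage key are illustrative assumptions; the stylesheet does the real per-breakpoint width work):

```javascript
// Sketch of the 'browse' column toggle, including the possible session-level
// memory mentioned above. Class names and the storage key are assumptions.
function setBrowseVisible(visible) {
  document.querySelector('.browse-column').classList.toggle('hidden', !visible);
  // The entry column picks up the freed-up width via a stylesheet class
  document.querySelector('.entry-column').classList.toggle('full-width', !visible);
  sessionStorage.setItem('browseVisible', visible ? '1' : '0');
}

// On entry page load, reapply the remembered choice (defaulting to visible)
document.addEventListener('DOMContentLoaded', function () {
  setBrowseVisible(sessionStorage.getItem('browseVisible') !== '0');
});
```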

I also had a discussion with the editors about removing duplicate child entries from the data and grabbing an updated dataset from the old editing system one last time.  Removing duplicate child entries will mean deleting several thousand entries, and I was reluctant to do so on the test systems we already have in case anything goes wrong, so I’ve decided to set up a new test version, which will be available via a new version of the API and accessible via one of the existing test front-ends I’ve created.  I’ll be tackling this next week.

For the AND I focussed on importing more than 4,500 new or updated entries into the dictionary.  In order to do this I needed to look at the new data and figure out how it differed structurally from the existing data, as previously the editors’ XML was passed through an online system that made changes to it before it was published on the old website.  I discovered that there were five main differences:

  1. <main_entry> does not have an ID, a ‘lead’ or a ‘unikey’. I’ll need to add in the ‘lead’ attribute, as this is what controls the editor’s initials on the entry page, and I decided to add in an ID as a means of uniquely identifying the XML, although there is already a new unique ID for each entry in the database.  ‘unikey’ doesn’t seem to be used anywhere, so I decided not to do anything about it.
  2. <sense> and <subsense> do not have IDs or ‘n’ numbers. I had already set up a script to generate the latter, so I could reuse that.  The former aren’t used anywhere, as each sense also has a <senseInfo> with another ID associated with it.
  3. <senseInfo> does not have IDs, ‘seq’ or ‘pos’. IDs here are important, as they are used to identify senses and subsenses for the searches, so I needed to develop a way of adding these in.  ‘seq’ does not appear to be used, but ‘pos’ is: it’s used to generate the different part of speech sections between senses and in the ‘summary’, so it will need to be added in.
  4. <attestation> does not have IDs, and these are needed as they are recorded in the translation data.
  5. <dateInfo> has an ID, but the existing data from the DMS (the old editing system) does not have IDs for <dateInfo>. I decided to retain the new IDs, although they won’t be used anywhere.

I wrote a script that processed the XML to add in entry IDs, editor initials, sense numbers, senseInfo IDs, parts of speech and attestation IDs.  This took a while to implement and test but it seemed to work successfully.  After that I needed to work on a script that would import the data into my online system, which included regenerating all of the data used for search purposes, such as extracting cross references, forms, labels, citations and their dates, translations and earliest dates for entries.  All seemed to work well and I made the new data available via the front-end for the editors to test out.
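
The real script is rather more involved, but in outline the augmentation step does something like this (a simplified sketch using the @xmldom/xmldom package; the ID scheme shown is illustrative only):

```javascript
// Simplified sketch of the augmentation step. @xmldom/xmldom is assumed here;
// any DOM-capable XML parser would do, and the ID scheme is illustrative only.
const { DOMParser, XMLSerializer } = require('@xmldom/xmldom');

function augmentEntry(xmlString, entryId, leadInitials) {
  const doc = new DOMParser().parseFromString(xmlString, 'text/xml');

  const main = doc.getElementsByTagName('main_entry').item(0);
  main.setAttribute('id', entryId);        // unique ID for the XML itself
  main.setAttribute('lead', leadInitials); // controls the editor's initials on the entry page

  // Number the senses sequentially, as the existing 'n' generation script did
  const senses = doc.getElementsByTagName('sense');
  for (let i = 0; i < senses.length; i++) {
    senses.item(i).setAttribute('n', String(i + 1));
  }

  // Give each <senseInfo> and <attestation> an ID so that searches and the
  // translation data can reference them ('pos' is added in a similar pass)
  ['senseInfo', 'attestation'].forEach(function (tag) {
    const els = doc.getElementsByTagName(tag);
    for (let i = 0; i < els.length; i++) {
      els.item(i).setAttribute('id', entryId + '-' + tag + '-' + (i + 1));
    }
  });

  return new XMLSerializer().serializeToString(doc);
}
```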

I also asked the editors about how to track different versions of the data, so we know which entries are new or updated as a result of the recent update.  It turned out that there are six different statements that need to be displayed underneath entries depending on when the entries were published, so I spent a bit of time applying these to entries and updating the entry page to display the notices.
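
The display side of this is simple enough: a lookup from a stored publication value to the right statement, something like the following (the field name and statement texts are placeholders rather than the editors’ actual wording):

```javascript
// Hypothetical sketch: choose the notice shown under an entry based on a
// stored publication-batch value. Texts are placeholders, not the real wording.
const NOTICES = {
  1: 'Placeholder: entry from the original published edition.',
  2: 'Placeholder: entry revised as part of a later update.',
  // ...and so on for the remaining four statements
};

function noticeFor(entry) {
  // 'publicationBatch' is an assumed field recording when the entry was published
  return NOTICES[entry.publicationBatch] || '';
}
```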

After that I made a couple of tweaks to the new data (e.g. links to the MED were sometimes missing information needed for them to work) and discussed adding in commentaries with Geert.  I then went through all of the emails to and from the editors in order to compile a big list of items that I still needed to tackle before we can launch the site.  The list totals some 51 items, so I’m going to have my work cut out for me, especially as it is vital that the new site launches before Christmas.

The other projects I worked on this week included the interactive map of Burns Suppers for Paul Malgrati in Scottish Literature.  Last week I’d managed to import the core fields from his gigantic spreadsheet, and this week I made some corrections to the data and created the structure to hold all of the filters that are in the data.

I wrote a script that goes through all of the records in the spreadsheet and stores the filters in the database where required.  Note that a filter is only stored for a record when it’s not set to ‘NA’, which cuts down on the clutter.  There are a total of 24,046 filters now stored in the system.  The next stage will be to update the front-end to add in the options to select any combination of filters and to build the queries necessary to process the selection and return the relevant locations, which is something I aim to tackle next week.
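
The import pass boils down to something like the following sketch (the table and column names are assumptions, and better-sqlite3 stands in for whatever database client is actually used):

```javascript
// Sketch of the filter-import pass; schema and naming are assumptions.
const Database = require('better-sqlite3');
const db = new Database('burns.db');
const insert = db.prepare(
  'INSERT INTO filters (record_id, filter_name, filter_value) VALUES (?, ?, ?)'
);

function storeFilters(record, filterColumns) {
  for (const name of filterColumns) {
    const value = record[name];
    // A filter is only stored when it carries information: 'NA' cells are skipped
    if (value && value !== 'NA') {
      insert.run(record.id, name, value);
    }
  }
}
```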

Also this week I participated in the weekly Zoom call for the Iona place-names project, updated the Iona CMS to strip out all of the non-Iona names and gave the team access to the project’s content management system.  The project website also went live this week, although as yet there is still not much content on it.  It can be found here, though: https://iona-placenames.glasgow.ac.uk/

Also this week I helped to export some data for a member of the Berwickshire place-names project team, I responded to a query from Gerry McKeever about the St. Andrews data in the Books and Borrowing project’s database and I fixed an issue with Rob Maslen’s City of Lost Books site, which had somehow managed to lose its nice header font.