Week Beginning 15th June 2020

This was week 13 of Lockdown, with still no end in sight.  I spent most of my time on the Books and Borrowing project, as there is still a huge amount to do to get the project’s systems set up.  Last week I’d imported several thousand records into the database and had given the team access to the Content Management System to test things out.  One thing that cropped up was that the autocomplete used for selecting existing books, borrowers and authors was sometimes not working, or if it did work, the script that then populates all of the fields about the selected book, borrower or author was failing.  I realised that this was because there were invisible line break characters (\n or \r) in the imported data, and the data is passed to the autocomplete via a JSON file.  Unescaped line break characters are not valid within JSON strings, so the autocomplete couldn’t parse the data.  I spent some time writing a script that cleaned the data of all offending characters, and after running this the autocomplete and pre-population scripts worked fine.  However, a further issue cropped up with the text editors in the various forms in the CMS.  These use the TinyMCE widget to allow formatting to be added to the text area, which works great.  However, whenever a new line is created the editor adds in HTML paragraphs (‘<p></p>’, which is good) but also a hidden line break character (‘\r’ or ‘\n’, which is bad).  When such a field is then used to populate a form via the selection of an autocomplete value, the line break makes the JSON invalid and the form fails to populate.  After identifying this issue I ensured all such characters are stripped out of any uploaded data, and that fixed the problem.
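The cleaning step itself isn’t shown in the post, but the idea can be sketched as follows.  This is an illustrative Python version with invented field names (the CMS itself is written in PHP): strip \r and \n from every value before the data is serialised as JSON, so the autocomplete always receives valid JSON.

```python
import json
import re

def strip_line_breaks(value: str) -> str:
    """Replace runs of carriage returns and line feeds with a single space,
    so no raw control characters end up inside the generated JSON."""
    return re.sub(r"[\r\n]+", " ", value).strip()

# Hypothetical record resembling the imported borrower data
record = {"surname": "Smith\r\n", "occupation": "Student\nof Divinity"}
cleaned = {key: strip_line_breaks(val) for key, val in record.items()}

# The cleaned data now serialises to JSON that a client-side
# autocomplete can safely parse
print(json.dumps(cleaned))
```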

I had to spend some time fixing a few more bugs that the team had uncovered during the week.  The ‘delete borrower’ option was not appearing, even when a borrower was associated with no records, and I fixed this.  There was also an issue with autocompletes not working in certain situations (e.g. when trying to add an existing borrower to a borrowing record that was initially created without a borrower).  Another issue involved the record page order incrementing whenever the record was edited, even when it had not been manually changed, while another involved book edition data not getting saved in some cases when a borrowing record was created.  I tracked down and fixed all of these issues.

With these fixes in place I then moved on to adding new features to the CMS, specifically facilities to add and browse the book works, editions and authors that are used across the project.  Pressing on the ‘Add Book’ menu item now loads a page through which you can choose to add a Book Work or a Book Edition (with associated Work, if required).  You can also associate authors with the Works and Editions.  Pressing on the ‘Browse Books’ option now loads a page that lists all of the Book Works in a table, with counts of the number of editions and borrowing records associated with each.  There’s also a row for all editions that don’t currently have a work.  There are currently 1925 such editions, so most of the data appears in this section, but this will change.

Through the page you can edit a work (including associating authors) by pressing on the ‘edit’ button.  You can delete a work so long as it isn’t associated with an Edition.  You can bring up a list of all editions in the work by pressing on the eye icon.  Once loaded, the editions are displayed in a table.  I may need to change this as there are so many fields relating to editions that the table is very wide.  It’s usable if I make my browser take up the full width of my widescreen monitor, but for people using a smaller screen it’s probably going to be a bit unwieldy.  From the list of editions you can press the ‘edit’ button to edit one of them – for example assigning one of the ‘no work’ editions to a work (existing or newly created via the edit form).  You can also delete an edition if it’s not associated with anything.  The Edition table includes a count of borrowing records, but I’ll also need to find a way to add an option to display a list of all of the associated records for each, as I imagine this will be useful.

Pressing on the ‘Add Author’ menu item brings up a form allowing a new author to be added, which will then be available to associate with books throughout the CMS, while pressing on the ‘Browse Authors’ menu item brings up a list of authors.  At the moment this table (and the book tables) can’t be reordered by their various columns.  This is something else I still need to implement.  You can delete an author if they’re not associated with anything and also edit the author details.  As with the book tables I also need to add in a facility to bring up a list of all records the author is associated with, in addition to just displaying counts.  I also noticed that there seems to be a bug somewhere that is occasionally generating blank authors, and I’ll need to look into this.

I then spent some time setting up the project’s server, which is hosted at Stirling University.  I was given access details by Stirling’s IT Support people and managed to sign into the Stirling VPN and get access to the server and the database.  There was an issue getting write access to the server, but after that was resolved I was able to upload all of the CMS files, set up the WordPress instance that will be the main project website and migrate the database.

I was hoping I’d be able to get the CMS up and running on the new server without issue, but unfortunately this did not prove to be the case.  It turns out that the Stirling server uses a different (and newer) version of the PHP scripting language than the Glasgow server, and some of the functionality differs.  For example, on the Glasgow server you can call a function with fewer parameters than it is defined to require (e.g. calling addAuthor(1) when the function is set up to take two parameters, addAuthor(1,2)).  The PHP version on the Stirling server doesn’t allow this; instead the script breaks and a blank page is displayed.  It took a bit of time to figure out what was going on, and now that I know what the issue is I’m going to have to go through every script and check how every function is called.  This is going to be my priority next week.
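The usual fix for this kind of parameter-count strictness (newer PHP raises an error for missing arguments where older versions only warned) is to give trailing parameters default values, so existing shorter calls keep working.  Here is a minimal sketch of that pattern in Python rather than PHP, using a hypothetical add_author function:

```python
def add_author(author_id, book_id=None):
    """book_id defaults to None, so both the one-argument and
    two-argument call styles are valid."""
    return (author_id, book_id)

print(add_author(1))     # (1, None)
print(add_author(1, 2))  # (1, 2)
```

The alternative, of course, is to update every call site to pass all the required arguments explicitly, which is the script-by-script audit described above.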

I also spent a bit of time finalising the website for the project’s pilot project, which deals with borrowing records at Glasgow.  This was managed by Matt Sangster, and he’d sent me a list of things we wanted to sort; I spent a few hours going through this, and we’re just about at the point where the website can be made publicly available.

I had intended to spend Friday working on the new way of managing dates for the Historical Thesaurus.  The script I’d created to generate the dates for all 790,000-odd lexemes finished running last Friday night, and over the weekend I wrote another script that shifts the connectors up one (so a dash would be associated with the date before the dash rather than the one after it, for example).  This script then took many hours to run.  Unfortunately I didn’t get a chance to look further into this until Thursday, when I found a bit of time to analyse the output, at which point I realised that while the generation of the new fulldate field had worked successfully, the insertion of bracketed dates into the new dates table had failed: the column was set as an integer and I’d forgotten to strip out the brackets.  Due to this problem I had to set my scripts running all over again.  The first one completed at lunchtime on Friday, but the second didn’t complete until Saturday, so I didn’t manage to work on the HT this week.  However, this did mean that I was able to return to a Scots Thesaurus data processing task that Fraser asked me to look into at the start of May, so it’s not all bad news.
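Neither script is shown here, but the two transformations can be illustrated with a toy Python sketch using invented data (the real scripts run against the HT database): shifting each connector from the date after it to the date before it, and stripping brackets so a bracketed date fits an integer column.

```python
def shift_connectors_up(rows):
    """Given (date, connector) pairs where each connector was stored
    with the date *after* it, move it to the preceding date."""
    shifted = []
    for i, (date, _) in enumerate(rows):
        # take the connector that originally belonged to the next row
        next_conn = rows[i + 1][1] if i + 1 < len(rows) else None
        shifted.append((date, next_conn))
    return shifted

# Hypothetical sequence: the dash originally sat with 1600, the plus with 1700
rows = [("1500", None), ("1600", "-"), ("1700", "+")]
print(shift_connectors_up(rows))
# [('1500', '-'), ('1600', '+'), ('1700', None)]

# The bracketed-date failure: the target column is an integer,
# so the brackets must be stripped before insertion
raw = "(1500)"
year = int(raw.strip("()"))  # 1500
```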

Fraser’s task required me to set up the Stanford Part of Speech tagger on my computer, which meant configuring Java and other such tasks that took a bit of time.  I then wrote a script that took the output of a script I’d written over a year ago (a list of monosemous headwords in the DOST data), ran their definitions through the Part of Speech tagger and outputted the results to a new table.  This may sound straightforward, but it took quite some time to get everything working, and then another couple of hours for the script to process around 3,000 definitions.  But I was able to send the output to Fraser on Friday evening.

Also this week I gave advice to a few members of staff, such as speaking to Matthew Creasy about his new Scottish Cosmopolitanism project, Jane Stuart-Smith about a new project that she’s putting together with QMU, Heather Pagan of the Anglo-Norman Dictionary about a proposal she’s putting together, Rhona Alcorn about the Scots School Dictionary app and Gerry McKeever about publicising his interactive map.