Week Beginning 27th April 2020

The sixth week of lockdown continued in much the same manner as the previous ones, with my time divided between working and home-schooling my son.  I spent the majority of the week continuing to work on the requirements document for the Books and Borrowing project.  As I worked through this I returned to the database design and made some changes as my understanding of the system increased.  This included adding a new field for the original transcription and a new ‘order on page’ field to the borrowing table, as I realised that without such a column it wouldn’t be possible for an RA to add a new record anywhere other than after the last record on a page.  It’s quite likely that an RA will accidentally skip a record and need to reinstate it, or intentionally leave out some records to return to later.  The ‘order on page’ column (which will be automatically generated but can be manually edited) will ensure these situations can be handled.
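To illustrate the idea, here’s a minimal sketch of how such a column might behave; the record shape and function below are hypothetical and much simpler than the project’s actual schema.

```typescript
// Hypothetical, simplified shape of a borrowing record; the real
// Books and Borrowing schema will have many more fields.
interface BorrowingRecord {
  id: number;
  orderOnPage: number; // auto-generated, but manually editable
  transcription: string;
}

// Insert a record at a given position on a page and renumber the
// records that follow it, so a skipped record can be reinstated
// anywhere rather than only after the last record on the page.
function insertAtPosition(
  page: BorrowingRecord[],
  record: BorrowingRecord,
  position: number
): BorrowingRecord[] {
  const updated = [...page];
  updated.splice(position - 1, 0, record);
  // Regenerate the order values so they remain contiguous.
  updated.forEach((r, i) => (r.orderOnPage = i + 1));
  return updated;
}
```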

As I worked through the requirements I began to realise that the amount of data the RAs may have to compile for each borrowing record could prove overwhelming.  Much of it is optional, but completing all the information could take a long time: creating a new Book Holding and Item record, linking it to an Edition and Work or creating new records for these, associating authors or creating new authors, adding in genre information, creating a new borrower record or associating an existing one, adding in occupations, adding in cross references to other borrowers, writing out a diplomatic transcription, and filling in all of the core and additional fields.  That’s a huge amount to do for each record, and we may need to consider what the RAs can realistically manage in the available time.

By the end of Tuesday I had finished working on a first version of the requirements document, weighing in at more than 11,000 words, and sent it on to Katie and Matt for feedback.  We have agreed to meet (via Zoom) next Tuesday to discuss any changes to the document.  During the rest of the week I began to develop the systems for the project.  This included implementing the database (creating each of the 23 tables that will be needed to store the project’s data) and installing and configuring WordPress, which will be used to power the simple parts of the project website.

I also continued to work with the data for the Anglo-Norman Dictionary this week.  I managed to download all of the files from the server over the weekend.  I now have two versions of the ‘entry_hash’ database file plus many thousands of XML files that were in the ‘commit-out’ directory.  I extracted the newer version of the ‘entry_hash’ table using the method I figured out last week and discovered that it was somewhat larger than the version I had been working with, containing 4,556,011 lines as opposed to 3,969,350 and 54,025 entries as opposed to 53,945.  I sent this extracted file to Heather for her to have a look at.
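For the comparison itself, a few lines along these lines would produce the counts; the filename and the way entries are tagged are assumptions about the extracted output rather than the actual format.

```typescript
import { readFileSync } from "fs";

// Count the lines and <entry> elements in an extracted 'entry_hash'
// dump.  'entry_hash_new.xml' is a placeholder filename, and matching
// on an <entry> tag is an assumption about how entries are marked up.
const text = readFileSync("entry_hash_new.xml", "utf8");
const lineCount = text.split("\n").length;
const entryCount = (text.match(/<entry[\s>]/g) ?? []).length;
console.log(`${lineCount} lines, ${entryCount} entries`);
```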

I then decided to update the test website I’d made a few months ago.  This version used the ‘all.xml’ file as a data source and allowed a user to browse the dictionary entries using a list of all the headwords and to view each entry (well, certain aspects of the entry that I’d formatted from the XML).  Thankfully I managed to locate the scripts I’d used to extract the XML and migrate it to a database on the server, and I ran both versions of the ‘entry_hash’ output through them, resulting in three different dictionary data sources.  I then updated the simple website to add in a switcher to swap between data sources.  Extracting the data also led me to realise that there was a ‘deleted’ record type in the entry table; disregarding records of this type, the older ‘entry_hash’ data had 53,925 entries and the newer one had 54,002, a difference of 77, while the old ‘all.xml’ data from 2015 had 50,403 entries that aren’t set to ‘deleted’.  In looking through the files from the server I had also managed to track down the XSLT file used to transform the XML into HTML for the existing website, so I added this to my test website, together with the CSS file from the existing website.  This meant the full entries could now be displayed in a fully formatted manner, which is useful.
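As a rough sketch of what the rendering step involves, applying an XSLT stylesheet to an entry’s XML looks something like the following (shown here with the browser’s built-in XSLTProcessor; the URLs are placeholders, and this doesn’t claim to reproduce how the existing site actually invokes the stylesheet):

```typescript
// A sketch of transforming a dictionary entry's XML into HTML using
// an XSLT stylesheet in the browser.  Both URLs are placeholders.
async function renderEntry(
  entryXmlUrl: string,
  xsltUrl: string
): Promise<DocumentFragment> {
  const [entryText, xsltText] = await Promise.all([
    fetch(entryXmlUrl).then((r) => r.text()),
    fetch(xsltUrl).then((r) => r.text()),
  ]);
  const parser = new DOMParser();
  const entryDoc = parser.parseFromString(entryText, "application/xml");
  const xsltDoc = parser.parseFromString(xsltText, "application/xml");

  const processor = new XSLTProcessor();
  processor.importStylesheet(xsltDoc);
  // The resulting fragment can be appended to the page, where the
  // existing site's CSS will style it.
  return processor.transformToFragment(entryDoc, document);
}
```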

Heather had a chance to look through and compare the three test website versions and discovered that the new ‘entry_hash’ version contained all of the data in their data management system, including material that had yet to be published on the public website.  This was really good news as it meant that we now have a complete dataset without needing to integrate individual XML files.  With a full dataset secured I am now in a position to move on to the requirements for the new dictionary website.

Also this week I made some further tweaks to the ‘guess the category’ quiz for the Historical Thesaurus.  The layout now works better on a phone in portrait mode (the choices now take up the full width of the area).  I also fixed the line-height of the quiz word, as lines were previously overlapping when the word ran over more than one line.  I updated things so that when pressing ‘next’ or restarting the quiz the page automatically scrolls so that the quiz word is in view.  I fixed the layout to ensure that there should always be enough space for the ticks and crosses (they should no longer end up dropping down below the box if the category text is long).  Also, any very long words in the category that previously ended up breaking out of their boxes are now cut off when they reach the edge of the box.  I could auto-hyphenate long words to split them over multiple lines, and I might investigate this next week.  I also fixed an issue with the ‘Next’ button: when restarting a quiz after reaching the summary the ‘Next’ button was still labelled ‘Summary’.  Finally, I’ve added a timer to the quiz so you can see how long it took you and try to beat your fastest time.  When you reach the summary page your time is now displayed in the top bar along with your score.  I’ve also added some text above the ‘Try again’ button.  If you get less than full marks it says “Can you do better next time?” and if you got full marks it says “You got them all right, but can you beat your time of x”.
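The timer and scrolling behaviour boil down to something like this sketch; the messages match the description above, but the element IDs and function names are hypothetical rather than the quiz’s actual code:

```typescript
// Hypothetical element IDs for the quiz word and the summary bar.
const quizWord = document.getElementById("quiz-word")!;
const summaryBar = document.getElementById("summary-bar")!;

let startTime = 0;

function startQuiz(): void {
  startTime = Date.now();
  // Scroll so the quiz word is in view when the quiz (re)starts
  // or 'next' is pressed.
  quizWord.scrollIntoView({ behavior: "smooth" });
}

function showSummary(score: number, total: number): void {
  const seconds = Math.round((Date.now() - startTime) / 1000);
  summaryBar.textContent = `Score: ${score}/${total}, time: ${seconds}s`;
  const message =
    score === total
      ? `You got them all right, but can you beat your time of ${seconds}s?`
      : "Can you do better next time?";
  // The message would be displayed above the 'Try again' button.
  console.log(message);
}
```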

Finally this week I helped Roslyn Potter of the DSL to get a random image loading on a new page.  This page asks people to record themselves saying a selection of Scots words, and there are 10 different images featuring words.  The random feature displays one image and provides a button to load another random image, plus an option to view all of the images.  The page can be viewed here: https://dsl.ac.uk/scotsvoices/
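The random image feature amounts to something like the following; the image paths and element IDs here are placeholders rather than the actual markup of the DSL page:

```typescript
// Placeholder image paths; the real page has ten images of Scots words.
const images = ["words/word01.jpg", "words/word02.jpg", "words/word03.jpg"];

const img = document.getElementById("scots-word") as HTMLImageElement;
const button = document.getElementById("another-word")!;

// Show a randomly chosen image, avoiding an immediate repeat of the
// one currently displayed.
function showRandomImage(): void {
  let next: string;
  do {
    next = images[Math.floor(Math.random() * images.length)];
  } while (images.length > 1 && next === img.getAttribute("src"));
  img.src = next;
}

button.addEventListener("click", showRandomImage);
showRandomImage();
```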