Week Beginning 13th July 2020

This was week 18 of Lockdown, which is now definitely easing here.  I’m still working from home, though, and will be for the foreseeable future.  I took Friday off this week, so it was a four-day week for me.  I spent about half of this time on the Books and Borrowing project, returning to adding features to the content management system after spending recent weeks importing datasets.  I added a number of indexes to the underlying database, which should considerably speed up the loading of certain pages, e.g. the browse books, borrowers and author pages.  I then updated the ‘Books’ tab when viewing a library (i.e. the page that lists all of the book holdings in the library) so that it now lists the number of book holdings in the library above the table.  The table itself now has separate columns for all additional fields that have been created for book holdings in the library and it is now possible to order the table by any of the headings (pressing on a heading a second time reverses the ordering).  The count of ‘Borrowing records’ for each book in the table is now a button and pressing on it brings up a pop-up listing all of the borrowing records that are associated with the book holding record, and from this pop-up you can then follow a link to view the borrowing record you’re interested in.  I then made similar changes to the ‘Borrowers’ tab when viewing a library (i.e. the page that lists all of the borrowers the library has).  It also now displays the total number of borrowers at the top.  This table already allowed reordering by any column, so that’s not new, but as above, the ‘Borrowing records’ count is now a link that, when clicked on, opens a list of all of the borrowing records the borrower is associated with.

The big new feature I implemented this week was borrower cross references.  These can be added via the ‘Borrowers’ tab within a library when adding or editing a borrower on this page.  When adding or editing a borrower there is now a section of the form labelled ‘Cross-references to other borrowers’.  If there are any existing cross references these will appear here, with a checkbox beside each that you can tick if you want to delete the cross reference (the user can tick the box then press ‘Edit’ to edit the borrower and the reference will be deleted).  Any number of new cross references can be added by pressing on the ‘Add a cross-reference’ button (multiple times, if required).  Doing so adds two fields to the form, one for a ‘description’, which is the text that shows how the current borrower links to the referenced borrower, and one for ‘referenced borrower’, which is an auto-complete.  Type in a name or part of a name and any borrower that matches in any library will be listed.  The library appears in brackets after the borrower’s name to help differentiate records.  Select a borrower and then when the ‘Add’ or ‘Edit’ button is pressed for the borrower the cross reference will be made.

Cross-references work in both directions – if you add a cross reference from Borrower A to Borrower B you don’t then need to load up the record for Borrower B to add a reference back to Borrower A.  The description text will sit between the borrower whose form you make the cross reference on and the referenced borrower you select, so if you’re on the edit form for Borrower A and link to Borrower B and the description is ‘is the son of’ then the cross reference will appear as ‘Borrower A is the son of Borrower B’.  If you then view Borrower B the cross reference will still be written in this order.  I also updated the table of borrowers to add in a new ‘X-Refs’ column that lists all cross-references for a borrower.
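To illustrate how a single stored reference can show up on both borrowers’ records, here is a minimal Python sketch.  It is not the code used in the content management system: the `CrossReference` class, its field names and the `cross_references_for` function are all hypothetical, and the sketch simply assumes that each reference is stored once, in the direction it was created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossReference:
    # All field names here are hypothetical, chosen for illustration only.
    source_id: int    # borrower whose form the reference was created on
    description: str  # e.g. "is the son of"
    target_id: int    # the referenced borrower


def cross_references_for(borrower_id, references, names):
    """Return display strings for every reference involving the borrower.

    The wording always runs source -> description -> target, no matter
    which of the two borrowers is being viewed.
    """
    return [
        f"{names[ref.source_id]} {ref.description} {names[ref.target_id]}"
        for ref in references
        if borrower_id in (ref.source_id, ref.target_id)
    ]


names = {1: "Borrower A", 2: "Borrower B"}
references = [CrossReference(1, "is the son of", 2)]

# The same stored reference is listed for both borrowers, in the same order.
print(cross_references_for(1, references, names))
print(cross_references_for(2, references, names))
```

Storing the reference once and looking it up from either side means there is nothing to keep in sync when a reference is deleted.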

I spent the remainder of my working week completing smaller tasks for a variety of projects, such as updating the spreadsheet output of duplicate child entries for the DSL people, getting an output of the latest version of the Thesaurus of Old English data for Fraser, advising Eleanor Lawson on ‘.ac.uk’ domain names and having a chat with Simon Taylor about the pilot Place-names of Fife project that I worked on with him several years ago.  I also wrote a Data Management Plan for a new AHRC proposal the Anglo-Norman Dictionary people are putting together, which involved a lengthy email correspondence with Heather Pagan at Aberystwyth.

Finally, I returned to the ongoing task of merging data from the Oxford English Dictionary with the Historical Thesaurus.  We are currently attempting to extract citation dates from OED entries in order to update the dates of usage that we have in the HT.  This process uses the new table I recently generated from the OED XML dataset which contains every citation date for every word in the OED (more than 3 million dates).  Fraser had prepared a document listing how he and Marc would like the HT dates to be updated (e.g. if the first OED citation date is earlier than the HT start date by 140 years or more then use the OED citation date as the suggested change).  Each rule was to be given its own type, so that we could check through each type individually to make sure the rules were working ok.
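As a rough illustration of what one of these rules looks like in code, the following Python sketch implements only the example rule mentioned above (first OED citation date earlier than the HT start date by 140 years or more).  The function name, the type number and the dates in the usage lines are all assumptions made for the sake of the example and are not taken from Fraser’s document or from the actual script.

```python
from typing import List, Optional, Tuple

EARLIER_START_THRESHOLD = 140  # years, from the example rule in the text
EARLIER_START_TYPE = 1         # hypothetical type number for this rule


def suggest_start_date(
    ht_start: int, oed_dates: List[int]
) -> Tuple[Optional[int], Optional[int]]:
    """Return (type, suggested start date), or (None, None) if no match."""
    if not oed_dates:
        return None, None
    first_oed = min(oed_dates)
    # The rule fires when the OED's first citation predates the HT start
    # date by the threshold or more.
    if ht_start - first_oed >= EARLIER_START_THRESHOLD:
        return EARLIER_START_TYPE, first_oed
    return None, None


# Invented dates, purely for illustration.
print(suggest_start_date(1800, [1700, 1650]))  # 150 years earlier: matches
print(suggest_start_date(1700, [1650, 1690]))  # 50 years earlier: no match
```

Returning the type alongside the suggested date makes it straightforward to group the output by rule when checking each one individually.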

It took about a day to write an initial version of the script, which I ran on the first 10,000 HT lexemes as a test.  I didn’t split the output into different tables depending on the type, but instead exported everything to a spreadsheet so Marc and Fraser could look through it.

In the spreadsheet, if there is no ‘type’ for a row it means it didn’t match any of the criteria, but I included these rows anyway so we can check whether there are any other criteria the rows should match.  I also included all the OED citation dates (rather than just the first and last) for reference.  I noted that Fraser’s document doesn’t seem to take labels into consideration.  There are some labels in the data, and sometimes there’s a new label for an OED start or end date when nothing else is different, e.g. htid 1479 ‘Shore-going’, a row which has no ‘type’ but does have new data from the OED.

Another issue I spotted is that the same ‘type’ variable is set both when a start date matches the criteria and when an end date matches the criteria, so the ‘type’ set for the start date is then replaced by the ‘type’ for the end date.  I think, therefore, that we might have to split the start and end processes up, or append the end process type to the start process type rather than replacing it (so e.g. type 2-13 rather than type 2 being replaced by type 13).  I also noticed that there are some lexemes where the HT has ‘current’ but the OED has a much earlier last citation date (e.g. htid 73 ‘temporal’ has 9999 in the HT but 1832 in the OED).  Such cases are not currently considered.
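The ‘append rather than replace’ option could look something like the Python sketch below.  The `combine_types` function is hypothetical and only shows the idea of keeping the start and end types in separate variables and joining them at the end; it is not how the script currently works.

```python
def combine_types(start_type, end_type):
    """Join the start and end rule types instead of overwriting one.

    Either argument may be None when no rule matched for that date.
    """
    parts = [str(t) for t in (start_type, end_type) if t is not None]
    return "-".join(parts) if parts else None


print(combine_types(2, 13))       # both a start rule and an end rule matched
print(combine_types(2, None))     # only a start rule matched
print(combine_types(None, None))  # nothing matched
```

One thing to bear in mind with this approach is that a combined value such as ‘2-13’ is a string, so any filtering by type in the spreadsheet would need to allow for it.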

Finally, according to the document, Antes and Circas are only considered for update if the OED and HT dates are the same, but there are many cases where the start / end OED date is picked to replace the HT date (because it’s different) and it has an ‘a’ or ‘c’ and this would then be lost.  Currently I’m including the ‘a’ or ‘c’ in such cases, but I can remove this if needs be (e.g. HT 37 ‘orb’ has HT start date 1601 (no ‘a’ or ‘c’) but this is to be replaced with OED 1550 that has an ‘a’).  Clearly the script will need to be tweaked based on feedback from Marc and Fraser, but I feel like we’re finally making some decent progress with this after all of the preparatory work that was required to get to this point.

Next Monday is the Glasgow Fair holiday, so I won’t be back to work until the Tuesday.