Week Beginning 10th January 2022

I continued to work on the Books and Borrowing project for a lot of this week, completing some of the tasks I began last week and working on some others.  We ran out of server space for digitised page images last week, and although I freed up some space by deleting a bunch of images that were no longer required, we still have a lot of images to come.  The team estimates that a further 11,575 images will be required.  If the images we receive for these pages are comparable to the ones from the NLS, which average around 1.5MB each, then 30GB should give us plenty of space.  However, after checking through the images we’ve received from other digitisation units it turns out that the NLS images are a bit of an outlier in terms of file size, and 8-10MB is more usual.  If we use this as an estimate then we would maybe require 120-130GB of additional space.  I did some experiments with resizing and changing the image quality of one of the larger images, managing to bring an 8.4MB image down to 2.4MB while still retaining its legibility.  If we applied this approach to the tens of thousands of larger images we have, it would result in a considerable saving of storage.  However, Stirling’s IT people very kindly offered to give us a further 150GB of space for the images, so this resampling process shouldn’t be needed, for now at least.
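The arithmetic behind these estimates can be checked with a few lines of Python. This is a minimal sketch that multiplies the expected image count by the average file sizes quoted above (treating 1GB as 1,000MB, which is an assumption) and works out the proportion saved by the 8.4MB to 2.4MB resampling test:

```python
# Rough storage estimate for the remaining page images, using the figures
# quoted above. Assumes decimal units (1 GB = 1,000 MB).
REMAINING_IMAGES = 11_575

def estimate_gb(num_images: int, avg_mb: float) -> float:
    """Approximate total size in GB for num_images files of avg_mb each."""
    return num_images * avg_mb / 1000

nls_estimate = estimate_gb(REMAINING_IMAGES, 1.5)  # NLS-sized images
low = estimate_gb(REMAINING_IMAGES, 8)             # typical images, low end
high = estimate_gb(REMAINING_IMAGES, 10)           # typical images, high end

# Proportion saved by the resampling test (8.4 MB reduced to 2.4 MB).
reduction = 1 - 2.4 / 8.4

print(f"NLS-sized images: {nls_estimate:.1f} GB")
print(f"8-10 MB images: {low:.1f}-{high:.1f} GB")
print(f"Resampling saving per image: {reduction:.0%}")
```

At 8-10MB per image this comes to roughly 93-116GB, so the 120-130GB figure leaves some headroom, and the resampling test works out at a saving of about 71% for that image.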

Another task for the project this week was to write a script to renumber the folio numbers for the 14 volumes from the Advocates Library that I noticed had irregular numbering.  Each of the 14 volumes had different issues with their handwritten numbering, so I had to tailor my script to each volume in turn, and once the process was complete the folio numbers used to identify page images in the CMS (and eventually in the front-end) entirely matched the handwritten numbers for each volume.
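Because each volume had its own quirks, one way to structure a script like this is a shared labelling function driven by per-volume settings. The sketch below is purely illustrative: it assumes two page images (recto then verso) per folio and a handwritten sequence that simply skips some numbers, and the volume IDs, start numbers and skipped numbers in `VOLUME_RULES` are invented rather than taken from the Advocates Library volumes.

```python
# Hypothetical per-volume settings: the first folio number written in the
# volume and any numbers the scribe skipped. These values are invented.
VOLUME_RULES = {
    "volume_a": {"start": 1, "skipped": {2}},
    "volume_b": {"start": 5, "skipped": {9, 10}},
}

def renumber(num_images: int, start: int, skipped: set[int]) -> list[str]:
    """Label page images in order as recto/verso of the handwritten folios.

    Assumes two images (recto then verso) per folio, no blank or
    unnumbered leaves, and that numbers in `skipped` were never used.
    """
    labels = []
    folio = start
    for index in range(num_images):
        recto = index % 2 == 0
        if recto:
            # Jump over any numbers missing from the handwritten sequence.
            while folio in skipped:
                folio += 1
        labels.append(f"{folio}{'r' if recto else 'v'}")
        if not recto:
            folio += 1  # the verso completes this folio
    return labels

for volume, rule in VOLUME_RULES.items():
    print(volume, renumber(10, rule["start"], rule["skipped"]))
```

Other kinds of irregularity, such as repeated numbers or unnumbered leaves, would need their own rules added to the same per-volume table.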

My next task for the project was to import the records for several volumes from the Royal High School of Edinburgh, but I ran into a bit of an issue.  I had previously been intending to extract the ‘item’ column and create a book holding record and a single book item record for each distinct entry in the column.  This would then be associated with all borrowing records in RHS that also feature this exact ‘item’.  However, this is going to result in a lot of duplicate holding records, because the ‘item’ column often includes information about different volumes of a book and/or uses different spellings.

For example, in SL137142 the book ‘Banier’s Mythology’ appears four times as follows (assuming ‘Banier’ and ‘Bannier’ are the same):

  1. Banier’s Mythology v. 1, 2
  2. Banier’s Mythology v. 1, 2
  3. Bannier’s Myth 4 vols
  4. Bannier’s Myth. Vol 3 & 4

My script would create one holding and item record for ‘Banier’s Mythology v. 1, 2’ and associate it with the first two borrowing records but the 3rd and 4th items above would end up generating two additional holding / item records which would then be associated with the 3rd and 4th borrowing records.
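The duplication is easy to reproduce. This sketch groups the four SL137142 records by their exact ‘item’ string, in the way described above; the record IDs 1-4 simply follow the order of the list and are not real database IDs:

```python
from collections import defaultdict

# The four borrowing records from SL137142 (IDs are illustrative only).
records = [
    (1, "Banier’s Mythology v. 1, 2"),
    (2, "Banier’s Mythology v. 1, 2"),
    (3, "Bannier’s Myth 4 vols"),
    (4, "Bannier’s Myth. Vol 3 & 4"),
]

# Naive approach: one holding per distinct 'item' string.
holdings: dict[str, list[int]] = defaultdict(list)
for record_id, item in records:
    holdings[item].append(record_id)

for item, record_ids in holdings.items():
    print(f"{item!r}: records {record_ids}")
print(f"{len(holdings)} holdings created for what is really one book")
```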

No script I can write (at least not without a huge amount of work) would be able to figure out that all four of these entries refer to the same book, or that there are actually 4 volumes for the one book, each requiring its own book item record, and that volumes 1 & 2 need to be associated with borrowing records 1 & 2 while all 4 volumes need to be associated with borrowing record 3 and volumes 3 & 4 need to be associated with borrowing record 4.  I did wonder whether I might be able to automatically extract volume data from the ‘item’ column, but there is just too much variation.

We’re going to have to tackle the normalisation of book holding names and the generation of all required book items for volumes at some point and this either needs to be done prior to ingest via the spreadsheets or after ingest via the CMS.

My feeling is that it might be simpler to do it via the spreadsheets before I import the data.  If we were to do this, the ‘item’ column would become the ‘original title’ and we’d need two further columns: one for the ‘standardised title’ and one listing the number of each volume, separated by commas.  With the above examples we would end up with the following (with a | representing a column division):

  1. Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
  2. Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
  3. Bannier’s Myth 4 vols | Banier’s Mythology | 1,2,3,4
  4. Bannier’s Myth. Vol 3 & 4 | Banier’s Mythology | 3,4

If each sheet of the spreadsheet is ordered alphabetically by the ‘item’ column it might not take too long to add in this information.  The additional fields could also be omitted where an ‘item’ has no volumes and no spelling variants.  E.g. ‘Hederici Lexicon’ may be fine as it is.  If the ‘standardised title’ and ‘volumes’ columns are left blank in this case, then when my script reaches the record it will know to use ‘Hederici Lexicon’ as both the original and standardised title and to generate one single unnumbered book item record for it.  We agreed that normalising the data prior to ingest would be the best approach, and I will therefore wait until I receive updated data before I proceed further with this.
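The ingest rules described above can be sketched as follows. This is a hypothetical Python version, not the actual import script: it assumes each spreadsheet row has already been read into a tuple of (record ID, original ‘item’, standardised title, volumes), and the `Holding` class and `ingest` function are invented names. Rows with a blank standardised title fall back to the original ‘item’, and rows with no volumes get a single unnumbered item (represented here as `None`):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Holding:
    standardised_title: str
    original_titles: set[str] = field(default_factory=set)
    # Volume number (None = single unnumbered item) -> borrowing record IDs.
    items: dict[Optional[int], list[int]] = field(default_factory=dict)

def ingest(rows):
    """rows: (record_id, item, standardised_title, volumes) tuples."""
    holdings: dict[str, Holding] = {}
    for record_id, item, standardised, volumes in rows:
        # Blank standardised title: use the original 'item' for both.
        title = standardised.strip() or item.strip()
        # Blank volumes column: generate one unnumbered book item.
        vols = [int(v) for v in volumes.split(",") if v.strip()] or [None]
        if title not in holdings:
            holdings[title] = Holding(title)
        holding = holdings[title]
        holding.original_titles.add(item)
        for vol in vols:
            holding.items.setdefault(vol, []).append(record_id)
    return holdings

# Example rows; record ID 5 for 'Hederici Lexicon' is hypothetical.
rows = [
    (1, "Banier’s Mythology v. 1, 2", "Banier’s Mythology", "1,2"),
    (2, "Banier’s Mythology v. 1, 2", "Banier’s Mythology", "1,2"),
    (3, "Bannier’s Myth 4 vols", "Banier’s Mythology", "1,2,3,4"),
    (4, "Bannier’s Myth. Vol 3 & 4", "Banier’s Mythology", "3,4"),
    (5, "Hederici Lexicon", "", ""),
]
holdings = ingest(rows)
for title, holding in holdings.items():
    print(title, holding.items)
```

Here all four Banier entries end up under a single holding, with records 1-3 linked to volumes 1 and 2 and records 3-4 linked to volumes 3 and 4, while ‘Hederici Lexicon’ gets one unnumbered item.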

Also this week I generated a new version of a spreadsheet containing the records for one register for Gerry McKeever, who wanted borrowers, book items and book holding details to be included in addition to the main borrowing record.  I also made a pretty major update to the CMS to enable the books and borrowers listings for a library to be filtered by year of borrowing in addition to filtering by register.  Users can limit the data by either register or year (not both).  They need to ensure the register drop-down is empty for the year filter to work, otherwise the selected register will be used as the filter.  On either the ‘books’ or ‘borrowers’ tab they can enter either a single year (e.g. 1774) or a range (e.g. 1770-1779) in the year box.  When ‘Go’ is pressed, the data displayed is limited to the year or years entered.  This also includes the figures in the ‘borrowing records’ and ‘Total borrowed items’ columns.  Also, the borrowing records listed when a related pop-up is opened will only feature those in the selected years.
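The filtering rules above amount to a small piece of input handling. The CMS code itself isn't shown here, so the following is only a Python sketch of the described behaviour: a selected register always wins, and otherwise the year box accepts a single year or a hyphenated range. The function names, the sample records, and the choice to treat unparseable input as "no filter" are all assumptions:

```python
import re
from typing import Optional

# A four-digit year, optionally followed by '-' and a second year.
YEAR_PATTERN = re.compile(r"\s*(\d{4})\s*(?:-\s*(\d{4}))?\s*")

def parse_year_filter(text: str) -> Optional[tuple[int, int]]:
    """'1774' -> (1774, 1774); '1770-1779' -> (1770, 1779); else None."""
    match = YEAR_PATTERN.fullmatch(text)
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return (start, end) if start <= end else None

def choose_filter(register_id: Optional[int], year_text: str):
    """A selected register always wins; years apply only if none is chosen."""
    if register_id is not None:
        return ("register", register_id)
    years = parse_year_filter(year_text)
    return ("years", years) if years else ("none", None)

def records_in_years(records, years):
    """Keep only borrowing records whose year falls inside the range."""
    start, end = years
    return [r for r in records if start <= r["year"] <= end]

# Hypothetical borrowing records for illustration.
records = [{"id": 1, "year": 1768}, {"id": 2, "year": 1774}, {"id": 3, "year": 1779}]
kind, value = choose_filter(None, "1770-1779")
if kind == "years":
    print(len(records_in_years(records, value)))
```

Applying the same year range to the counts and to the pop-up record lists keeps all of the figures consistent with the filter.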

I also worked with Raymond in Arts IT Support and Geert, the editor of the Anglo-Norman Dictionary, to complete the process of migrating the AND website to the new server.  The website (https://anglo-norman.net/) is now hosted on the new server and is considerably faster than it was previously.  We also took the opportunity to launch the Anglo-Norman Textbase, which I had developed extensively a few months ago.  Its search and browse facilities can be found here: https://anglo-norman.net/textbase/ and this marks the final major item in my overhaul of the AND resource.

My last major task of the week was to start work on a database of ultrasound video files for the Speech Star project.  I received a spreadsheet of metadata and the video files from Eleanor this week and began processing everything.  I wrote a script to import the metadata into a three-table relational database (speakers, prompts and individual videos of speakers saying the prompts) and began work on the front-end through which this database and the associated video files will be accessed.  I’ll be continuing with this next week.
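A three-table structure like this might look roughly as follows. This sketch uses Python's built-in `sqlite3` module so that it is self-contained; the database engine actually used isn't stated, and the table names, column names and sample rows are all hypothetical rather than taken from Eleanor's spreadsheet:

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE speakers (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE prompts (
    id INTEGER PRIMARY KEY,
    prompt TEXT NOT NULL UNIQUE
);
CREATE TABLE videos (
    id INTEGER PRIMARY KEY,
    speaker_id INTEGER NOT NULL REFERENCES speakers(id),
    prompt_id INTEGER NOT NULL REFERENCES prompts(id),
    filename TEXT NOT NULL UNIQUE
);
"""

def load(conn, rows):
    """rows: (speaker_code, prompt, filename) tuples from the spreadsheet."""
    for speaker, prompt, filename in rows:
        # Each speaker and prompt is stored once, however many videos use it.
        conn.execute("INSERT OR IGNORE INTO speakers (code) VALUES (?)", (speaker,))
        conn.execute("INSERT OR IGNORE INTO prompts (prompt) VALUES (?)", (prompt,))
        conn.execute(
            """INSERT INTO videos (speaker_id, prompt_id, filename)
               VALUES ((SELECT id FROM speakers WHERE code = ?),
                       (SELECT id FROM prompts WHERE prompt = ?), ?)""",
            (speaker, prompt, filename),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, [
    ("spk01", "prompt A", "spk01_promptA.mp4"),
    ("spk01", "prompt B", "spk01_promptB.mp4"),
    ("spk02", "prompt A", "spk02_promptA.mp4"),
])

# Number of videos recorded for each prompt.
for prompt, count in conn.execute(
    "SELECT p.prompt, COUNT(*) FROM videos v "
    "JOIN prompts p ON p.id = v.prompt_id GROUP BY p.prompt ORDER BY p.prompt"
):
    print(prompt, count)
```

Keeping speakers and prompts in their own tables means each video row only stores two foreign keys and a filename, so a front-end can list all the videos for a speaker or a prompt with a simple join.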

In addition to the above I also gave some advice to the students who are migrating the IJOSTS journal over to WordPress, had a chat with the DSL people about when we’ll make the switch to the new API and data, set up a WordPress site for Joanna Kopaczyk for the International Conference on Middle English, upgraded all of the WordPress sites I manage to the latest version of WordPress, made a few tweaks to the 17th Century Symposium website for Roslyn Potter, spoke to Kate Simpson in Information Studies about giving a talk to her Digital Humanities students about what I do, and arranged server space to be set up for the Speak For Yersel project website and the Speech Star project website.  I also helped launch the new Burns website: https://burnsc21-letters-poems.glasgow.ac.uk/ and updated the existing Burns website to link into it via new top-level tabs.  So a pretty busy week!