This was my first week back after the Christmas holidays, and it was a three-day week. I spent the days almost exclusively on the Books and Borrowing project. We had received a further batch of images for 23 library registers from the NLS, which I needed to download from the NLS’s server and process. This involved renaming many thousands of images via a little script I’d written in order to give the images more meaningful filenames and stripping out several thousand images of blank pages that had been included but are not needed by the project. I then needed to upload the images to the project’s web server and then generate all of the necessary register and page records in the CMS for each page image.
I also needed to update the way folio numbers were generated for the registers. For the previous batch of images from the NLS I had just assigned the numerical part of the image’s filename as the folio number, but it turns out that most of the images have a hand-written page number in the top-right corner which starts at 1 for the first actual page of borrowing records. There are usually a few pages before this, and these need to be given Roman numerals as folio numbers. I therefore had to write another script that would take into consideration the number of front-matter pages in each register, assign Roman numerals as folio numbers to them and then begin the numbering of borrowing record pages from 1 after that, incrementing through the rest of the volume.
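To illustrate the numbering scheme described above, here is a minimal sketch in Python. It is not the script used on the project: the function names `to_roman` and `folio_numbers` are made up for the example, and the use of lowercase Roman numerals is an assumption. It only shows the idea of giving the front-matter pages Roman numerals and then restarting the count at 1.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals are only defined for positive integers")
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, numeral in pairs:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)


def folio_numbers(total_pages: int, front_matter_pages: int) -> list[str]:
    """Return one folio label per page image, in image order.

    The first `front_matter_pages` images get Roman numerals; the
    remaining images are numbered from 1 upwards.
    """
    folios = []
    for position in range(1, total_pages + 1):
        if position <= front_matter_pages:
            folios.append(to_roman(position))
        else:
            folios.append(str(position - front_matter_pages))
    return folios


# A register with 6 images, the first 3 of which are front matter.
print(folio_numbers(6, 3))
```

In practice the number of front-matter pages would have to be supplied for each register, since it differs from volume to volume.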
I guess it was inevitable with data of this sort, but I ran into some difficulties whilst processing it. Firstly, there were some problems with the JPEG images the NLS had sent for two of the volumes. These didn’t match the TIFF images for the volumes, with each volume having an incorrect number of files. Thankfully the NLS were able to quickly figure out what had gone wrong and supply updated images.
The next issue to crop up occurred when I began to upload the images to the server. After uploading about 5GB of images the upload terminated, and soon after that I received emails from the project team saying they were unable to log into the CMS. It turns out that the server had run out of storage. Each time someone logs into the CMS the server needs a tiny amount of space to store a session variable, but there wasn’t enough space to store this, meaning it was impossible to log in successfully. I emailed the IT people at Stirling (where the project server is located) to enquire about getting some further space allocated but I haven’t heard anything back yet. In the meantime I deleted the images from the partially uploaded volume, which freed up enough space to enable the CMS to function again. I also figured out a way to free up some further space: the first batch of images from the NLS also included images of blank pages across 13 volumes – several thousand images. It was only after uploading these and generating page records that we had decided to remove the blank pages, but I only removed the CMS records for these pages – the image files were still stored on the server. I therefore wrote another script to identify and delete all of the blank page images from the first batch that was uploaded. This freed up 4-5GB of space on the server, which was enough to complete the upload of the second batch of registers from the NLS. We will still need more space, though, as there are still many thousands of images left to add.
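The clean-up of leftover blank page images could be sketched roughly as follows. This is a hypothetical Python example rather than the actual script: it assumes that a blank page image is simply one that no longer has a page record in the CMS, that the images are `.jpg` files in a single directory, and that the list of filenames still referenced by the CMS has already been exported. The function names and the `dry_run` option are invented for illustration.

```python
from pathlib import Path


def find_orphaned_images(image_dir, referenced_filenames):
    """Return image files in `image_dir` that no CMS page record refers to."""
    referenced = set(referenced_filenames)
    return sorted(
        path for path in Path(image_dir).glob("*.jpg")
        if path.name not in referenced
    )


def delete_files(paths, dry_run=True):
    """Delete the given files and return the number of bytes they occupy.

    With dry_run=True (the default) nothing is deleted, so the amount of
    space that would be freed can be checked first.
    """
    freed = 0
    for path in paths:
        freed += path.stat().st_size
        if not dry_run:
            path.unlink()
    return freed
```

Running with `dry_run=True` first gives a chance to check the list of files and the space that would be freed before anything is actually removed.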
I also took the opportunity to update the folio numbers of the first batch of NLS registers to bring them into line with the updated method we’d decided on for the second batch (Roman numerals for front-matter and then incrementing page numbers from the first page of borrowing records). I wrote a script to renumber all of the required volumes, which was mostly a success.
However, I also noticed that the automatically generated folio numbers often became out of step with the hand-written folio numbers found in the top-right corner of the images. I decided to go through each of the volumes to identify all that became unaligned and to pinpoint on exactly which page or pages the misalignment occurred. This took some time as there were 32 volumes that needed checked, and each time an issue was spotted I needed to look back through the pages and associated images from the last page until I found the point where the page numbers correctly aligned. I discovered that there were numbering issues with 14 of the 32 volumes, mainly due to whoever wrote the numbers in getting muddled. There are occasions where a number is missed, or a number is repeated. In one volume the page numbers advance by 100 from one page to the next. It should be possible for me to write a script that will update the folio numbers to bring them into alignment with the erroneous handwritten numbers (for example where a number is repeated these will be given ‘a’ and ‘b’ suffixes). I didn’t have time to write the script this week but will do so next week.
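As the realignment script has not been written yet, the following is only a sketch of one way it might work, in Python. It assumes the anomalies found for a volume can be recorded as two things: a set of handwritten numbers that appear on two consecutive pages (`repeated`), and a mapping from a handwritten number to the number that follows it where the sequence skips or jumps (`jumps`). These names, and the function `aligned_folios`, are hypothetical. Front-matter pages are left out for simplicity.

```python
def aligned_folios(page_count, repeated=frozenset(), jumps=None):
    """Generate folio labels that follow the handwritten numbering.

    repeated: handwritten numbers that appear on two consecutive pages;
              these pages are labelled with 'a' and 'b' suffixes.
    jumps:    maps a handwritten number to the number written on the
              next page, where that is not simply the number plus one.
    """
    jumps = jumps or {}
    labels = []
    current = 1
    while len(labels) < page_count:
        if current in repeated:
            labels.extend([f"{current}a", f"{current}b"])
        else:
            labels.append(str(current))
        # Follow a recorded skip or jump, otherwise count up by one.
        current = jumps.get(current, current + 1)
    # Trim in case a repeated number fell on the final page.
    return labels[:page_count]


# Seven pages where 3 is written twice and 5 was missed out.
print(aligned_folios(7, repeated={3}, jumps={4: 6}))
```

A volume where the numbers advance by 100 from one page to the next could be handled with a single entry in `jumps`, for example `{2: 102}`.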
Also for the project this week I looked through the spreadsheet of borrowing records from the Royal High School of Edinburgh that one of the RAs has been preparing. I had a couple of questions about the spreadsheet, and I’m hoping to be able to process it next week. I also exported the records from one register for Gerry McKeever to work on, as these records now need to be split across two volumes rather than one.
Also this week I had an email conversation with Marc Alexander about a few issues, during which he noted that the Historical Thesaurus website was offline. Further investigation revealed that the entire server was offline, meaning several other websites were down too. I asked Arts IT Support to look into this, which took a little time as it was a physical issue with the hardware and they were all still working remotely. However, the following day they were able to investigate and address the issue, which they reckon was caused by a faulty network port.