This was week 14 of Lockdown and I spent most of it continuing to work on the Books and Borrowing project. Last week I’d planned to migrate the CMS from my test server at Glasgow to the official project server at Stirling, but during the process some discrepancies between PHP versions on the servers meant that the code which worked fine at Glasgow was giving errors at Stirling. As mentioned in last week’s post, on the Stirling server calling a function with fewer than its required number of arguments resulted in a fatal error, plus database ‘warnings’ (e.g. an empty string rather than a numeric zero being inserted into an integer field) were being treated as fatal errors too. It took most of Monday to go through my scripts and identify all the places such issues cropped up, but by the end of the day I had the CMS set up and fully usable at Stirling and had asked the team to start using it.
I then spent some further time working on the public website for the project, installing a theme, working with fonts and colour schemes, selecting header images, adding logos to the footer and other such matters. I made six different versions of the interface and emailed screenshots to the team for comment. We all agreed on the interface and I then made some further tweaks to it, during which time team member Kit Baston was adding content to the pages. On Thursday the website went live and you can access it here: https://borrowing.stir.ac.uk/. Here’s a screenshot too:
I also continued to make improvements to the CMS this week, adding new functionality to the pages for browsing book editions, book works and authors. The table of Book Works now includes a column listing the number of Holdings each Work is associated with, and now includes the option of ordering the listed Works by any of the columns in the table. When a book work row is expanded and its associated editions load in, this table also now features the number of holdings an edition is associated with and allows the table to be ordered by any of the columns. I then made the number of holdings and records listed for each Work and Edition a link (so long as the number is greater than 0). Pressing on the link brings up a popup that lists the holdings and records. Each item in the list features an ‘eye’ icon and pressing on this will take you to the record in question (either in the library’s list of holdings or the page that the borrowing record appears on), with the page opening at the item in question.
On Friday I had a Zoom call with Project PI Katie Halsey and Co-I Matt Sangster to discuss my work on the project and to decide where I should focus my attention next. We agreed that it would be good to get all of the sample data into the system now, so that the team can see what’s already there and begin the process of merging records and rationalising the data. Therefore I’ll be spending a lot of next week writing import scripts for the remaining datasets.
I worked on a number of additional projects this week as well. On Tuesday I had a Zoom call with Jane Stuart-Smith, Eleanor Lawson of QMU and Joanne Cleland of Strathclyde to discuss a new project that they’re putting together. I can’t say too much about it at this stage, but I’ll probably be doing the technical work for the project, if it gets funding. I also spoke with Thomas Clancy about another place-names project that has been funded, and for which I’ll need to adapt my existing place-names system. This will probably be starting in September and involves a part of East Ayrshire. I also added some forum software to Matthew Creasy’s new project website that I recently put together for him. He’s hoping to launch this next week so will probably add in a link to it then.
I also managed to spend some time this week looking into the Historical Thesaurus’s new dates system. My scripts to generate the new HT date structure completed over the weekend and I then had to manually fix the 60 or so label errors that Fraser had previously identified in his spreadsheet. I then wrote a further script to check that the original fulldate, the new fulldate and a fulldate generated on the fly from the new date table all matched for each lexeme. This brought up about a thousand lexemes where the match wasn’t identical. Most of these were due to ‘b’ dates not being recorded in a consistent manner in the original data (sometimes two digits e.g. 1781/86 and sometimes one digit e.g. 1781/6). There were some other issues with dates that had both labels and slashes as connectors, whereby the label ended up associated with both dates rather than just one. There were also some issues with bracketed dates sometimes being recorded with the brackets and sometimes not, plus a few that had a dash before the date instead. I went through the 1000 or so rows and fixed the ones that actually needed fixing (maybe about 50). I then imported the new lexeme_dates table into the online database. There are 1,381,772 rows in it. I also attempted to import the updated lexeme table (which includes a new fulldate column plus new firstdate and lastdate fields). Unfortunately the file contained too much data to be uploaded and the process timed out. I contacted Arts IT Support and they managed to increase the execution time on the server and I was then able to get this second table uploaded too.
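The ‘b’ date mismatch is mostly a formatting difference rather than a real data error, so it can be checked mechanically. Here’s a minimal sketch (in Python, rather than the PHP the actual scripts use) of normalising one-digit ‘b’ dates to two digits before comparison; the helper name is hypothetical:

```python
import re

def normalise_b_date(fulldate):
    """Expand a one-digit 'b' date after a slash to two digits,
    e.g. '1781/6' -> '1781/86', so old and new fulldates can be
    compared like-for-like. Illustrative sketch only; the real
    data has more connector types than just the slash."""
    def expand(match):
        main, b = match.group(1), match.group(2)
        if len(b) == 1:
            # Borrow the decade digit from the main date.
            b = main[2] + b
        return f"{main}/{b}"
    return re.sub(r"\b(\d{4})/(\d{1,2})\b", expand, fulldate)
```

Running both forms of a date through a normaliser like this before comparing means only the genuine mismatches are flagged for manual fixing.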
Fraser had sent around a document listing the next steps in the data update process and I read through this and began to think things through. Fraser noted that the unique date types list didn’t appear to include ‘a’ and ‘c’ for firstdates. I checked my script that generated the date types (way back in April last year) and spotted an error – the script was looking for a column called ‘oefirstdac’ where it should have been looking for ‘firstdac’. What this means is any lexeme that has an ‘a’ or ‘c’ with its first date has been rolled into the count for regular first dates, but it turns out that this is what Fraser wanted to happen anyway, so no harm was done there.
Before I can make a start on getting all HT lexemes that are XXXX-XXXX, OE-XXXX and XXXX-Current and are matched to an OED lexeme, and grabbing the OED date information for them, I’ll need to find a way to actually get the new OED date information. Fraser noted that we can’t just use the OED ‘sortdate’ and ‘enddate’ fields but instead need to use the first and last citation dates, as these have ‘a’ and ‘c’ prefixes. I’m going to need to get access to the most recent version of all of the OED XML files and to write a script that goes through all of the quotations data, such as:
<quotations><q year="1200"><date>?c1200</date></q><q year="1392"><date>a1393</date></q><q year="1450"><date>c1450</date></q><q year="1481"><date>1481</date></q><q year="1520"><date>?1520</date></q><q year="1530"><date>1530</date></q><q year="1556"><date>1556</date></q><q year="1608"><date>1608</date></q><q year="1647"><date>1647</date></q><q year="1690"><date>1690</date></q><q year="1709"><date>1709</date></q><q year="1728"><date>1728</date></q><q year="1755"><date>1755</date></q><q year="1804"><date>1804</date></q><q year="1882"><date>1882</date></q><q year="1967"><date>1967</date></q><q year="2007"><date>2007</date></q></quotations>
And then picks out the first date and the last date, plus any ‘a’, ‘c’ and ‘?’ value. This is going to be another long process, but I can’t begin it until I can get my hands on the full OED dataset, which I don’t have with me at home.
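A quotations block like the one above can be handled with a standard XML parser. Here’s a rough Python sketch of the kind of script I have in mind, pulling out the first and last citation dates along with any ‘a’, ‘c’ or ‘?’ prefix (the real OED XML is of course more complex than this fragment, so this is an illustration rather than the actual processing code):

```python
import re
import xml.etree.ElementTree as ET

def citation_range(quotations_xml):
    """Return ((first_year, prefix), (last_year, prefix)) for an
    OED-style <quotations> block, where prefix is any '?', 'a' or
    'c' marker from the human-readable <date> element."""
    root = ET.fromstring(quotations_xml)
    quotes = []
    for q in root.findall("q"):
        year = int(q.get("year"))
        raw = q.findtext("date", default="")
        # Capture optional '?' and optional 'a'/'c' before the digits.
        m = re.match(r"(\?)?([ac])?(\d+)", raw)
        prefix = ((m.group(1) or "") + (m.group(2) or "")) if m else ""
        quotes.append((year, prefix))
    quotes.sort()
    return quotes[0], quotes[-1]
```

Run over the sample block, this would give a first citation of 1200 with prefix ‘?c’ and a last citation of 2007 with no prefix.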
This was week 13 of Lockdown, with still no end in sight. I spent most of my time on the Books and Borrowing project, as there is still a huge amount to do to get the project’s systems set up. Last week I’d imported several thousand records into the database and had given the team access to the Content Management System to test things out. One thing that cropped up was that the autocomplete that is used for selecting existing books, borrowers and authors was sometimes not working, or if it did work then on selection of an item the script that populates all of the fields about the book, borrower or author was not working. I’d realised that this was because there were invisible line break characters (\n or \r) in the imported data and the data is passed to the autocomplete via a JSON file. Line break characters are not allowed in a JSON file and therefore the autocomplete couldn’t access the data. I spent some time writing a script that would clean the data of all offending characters and after running this the autocomplete and pre-population scripts worked fine. However, a further issue cropped up with the text editors in the various forms in the CMS. These use the TinyMCE widget to allow formatting to be added to the text area, which works great. However, whenever a new line is created this adds in HTML paragraphs (‘<p></p>’, which is good) but the editor also adds a hidden line break character (‘\r’ or ‘\n’, which is bad). When this field is then used to populate a form via the selection of an autocomplete value the line break makes the data invalid and the form fails to populate. After identifying this issue I ensured that all such characters are stripped out of any uploaded data, and that fixed the issue.
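The root cause here is that a raw \r or \n inside a JSON string makes the whole file invalid. The clean-up can be sketched in a few lines; this Python version (the strip_line_breaks helper is hypothetical and stands in for the actual PHP in the CMS) replaces any run of line break characters with a single space before the data is encoded:

```python
import json
import re

def strip_line_breaks(value):
    """Collapse any run of \r and \n characters into a single space
    and trim the result, so the value can safely appear in the JSON
    that feeds the autocomplete."""
    return re.sub(r"[\r\n]+", " ", value).strip()

# A record with a hidden Windows-style line break, as TinyMCE might produce.
record = {"title": "An Essay on\r\nCriticism", "author": "Pope, Alexander"}
cleaned = {key: strip_line_breaks(val) for key, val in record.items()}
payload = json.dumps(cleaned)
```

Stripping the characters at upload time, rather than at JSON-generation time, means the stored data itself stays clean for every script that later reads it.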
I had to spend some time fixing a few more bugs that the team had uncovered during the week. The ‘delete borrower’ option was not appearing, even when a borrower was associated with no records, and I fixed this. There was also an issue with autocompletes not working in certain situations (e.g. when trying to add an existing borrower to a borrowing record that was initially created without a borrower). I tracked down and fixed these. Another issue involved the record page order incrementing whenever the record was edited, even when this had not been manually changed, while another involved book edition data not getting saved in some cases when a borrowing record was created. I tracked down and fixed these issues too.
With these fixes in place I then moved on to adding new features to the CMS, specifically facilities to add and browse the book works, editions and authors that are used across the project. Pressing on the ‘Add Book’ menu item now loads a page through which you can choose to add a Book Work or a Book Edition (with associated Work, if required). You can also associate authors with the Works and Editions too. Pressing on the ‘Browse Books’ option now loads a page that lists all of the Book Works in a table, with counts of the number of editions and borrowing records associated with each. There’s also a row for all editions that don’t currently have a work. There are currently 1925 such editions so most of the data appears in this section, but this will change.
Through the page you can edit a work (including associating authors) by pressing on the ‘edit’ button. You can delete a work so long as it isn’t associated with an Edition. You can bring up a list of all editions in the work by pressing on the eye icon. Once loaded, the editions are displayed in a table. I may need to change this as there are so many fields relating to editions that the table is very wide. It’s usable if I make my browser take up the full width of my widescreen monitor, but for people using a smaller screen it’s probably going to be a bit unwieldy. From the list of editions you can press the ‘edit’ button to edit one of them – for example assigning one of the ‘no work’ editions to a work (existing or newly created via the edit form). You can also delete an edition if it’s not associated with anything. The Edition table includes a count of borrowing records, but I’ll also need to find a way to add in an option to display a list of all of the associated records for each, as I imagine this will be useful.
Pressing on the ‘Add Author’ menu item brings up a form allowing a new author to be added, which will then be available to associate with books throughout the CMS, while pressing on the ‘Browse Authors’ menu item brings up a list of authors. At the moment this table (and the book tables) can’t be reordered by their various columns. This is something else I still need to implement. You can delete an author if it’s not associated with anything and also edit the author details. As with the book tables I also need to add in a facility to bring up a list of all records the author is associated with, in addition to just displaying counts. I also noticed that there seems to be a bug somewhere that is resulting in blank authors occasionally being generated, and I’ll need to look into this.
I then spent some time setting up the project’s server, which is hosted at Stirling University. I was given access details by Stirling’s IT Support people and managed to sign into the Stirling VPN and get access to the server and the database. There was an issue getting write access to the server, but after that was resolved I was able to upload all of the CMS files, set up the WordPress instance that will be the main project website and migrate the database.
I was hoping I’d be able to get the CMS up and running on the new server without issue, but unfortunately this did not prove to be the case. It turns out that the Stirling server uses a different (and newer) version of the PHP scripting language than the Glasgow server and some of the functionality is different, for example on the Glasgow server you can call a function with fewer parameters than it is set up to require (e.g. addAuthor(1) when the function is set up to take two parameters, as in addAuthor(1,2)). The version on the Stirling server doesn’t allow this and instead the script breaks and a blank page is displayed. It took a bit of time to figure out what was going on, and now I know what the issue is I’m going to have to go through every script and check how every function is called, and this is going to be my priority next week.
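For reference, since PHP 7.1 calling a user-defined function with fewer arguments than it declares throws a fatal ArgumentCountError, where older versions only raised a warning and treated the missing arguments as NULL. One defensive fix is to give trailing parameters default values so older call sites keep working. Here’s a Python sketch of that same pattern (the add_author function here is a hypothetical stand-in for the blog’s addAuthor):

```python
def add_author(author_id, book_id=None):
    """Giving the trailing parameter a default value means existing
    call sites that pass only one argument keep working, instead of
    triggering a fatal missing-argument error."""
    if book_id is None:
        # Old-style call: only the author was supplied.
        return ("author-only", author_id)
    return ("author-and-book", author_id, book_id)
```

The alternative, of course, is to update every call site to pass the full argument list, which is what going through each script will involve.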
I also spent a bit of time finalising the website for the project’s pilot project, which deals with borrowing records at Glasgow. This was managed by Matt Sangster, and he’d sent me a list of things we wanted to sort; I spent a few hours going through this, and we’re just about at the point where the website can be made publicly available.
I had intended to spend Friday working on the new way of managing dates for the Historical Thesaurus. The script I’d created to generate the dates for all 790,000-odd lexemes completed during last Friday night and over the weekend I wrote another script that would then shift the connectors up one (so a dash would be associated with the date before the dash rather than the one after it, for example). This script then took many hours to run. Unfortunately I didn’t get a chance to look further into this until Thursday, when I found a bit of time to analyse the output, at which point I realised that while the generation of the new fulldate field had worked successfully, the insertion of bracketed dates into the new dates table had failed, as the column was set as an integer and I’d forgotten to strip out the brackets. Due to this problem I had to set my scripts running all over again. The first one completed at lunchtime on Friday, but the second didn’t complete until Saturday so I didn’t manage to work on the HT this week. However, this did mean that I was able to return to a Scots Thesaurus data processing task that Fraser asked me to look into at the start of May, so it’s not all bad news.
Fraser’s task required me to set up the Stanford Part of Speech tagger on my computer, which meant configuring Java and other such tasks that took a bit of time. I then wrote a script that took the output of a script I’d written over a year ago containing monosemous headwords in the DOST data, ran their definitions through the Part of Speech tagger and then outputted the results to a new table. This may sound straightforward, but it took quite some time to get everything working, and then another couple of hours for the script to process around 3,000 definitions. But I was able to send the output to Fraser on Friday evening.
Also this week I gave advice to a few members of staff, such as speaking to Matthew Creasy about his new Scottish Cosmopolitanism project, Jane Stuart-Smith about a new project that she’s putting together with QMU, Heather Pagan of the Anglo-Norman Dictionary about a proposal she’s putting together, Rhona Alcorn about the Scots School Dictionary app and Gerry McKeever about publicising his interactive map.
This was week 12 of Lockdown and on Monday I arranged to get access to my office at work in order to copy some files from my work PC. There were some scripts that I needed for the Historical Thesaurus, Fraser’s Scots Thesaurus and the Books and Borrowing projects so I reckoned it was about time to get access. It all went pretty smoothly, thankfully. My train into Central was very quiet – I think there were only about five people in my carriage, and none of them were near me. I walked to the West End and called security to let them know I’d arrived, then got into my office and spent about an hour and a half copying files and doing some work tasks. It was a bit strange to be back in my office after so long, with my calendar still showing March. Once the files were all copied I left the building, checked out with security and walked back through a still deserted town to Central. My train carriage was completely empty on the way back home.
I spent most of the rest of the week continuing with my work on the Books and Borrowing project. My main task was importing sample data into the content management system. Matt had sent me the latest copy of the Glasgow Student data over the weekend, and once I had the data processing scripts from the PC at work I could then process his spreadsheet and upload it to the pilot project database. Processing the Glasgow Student data was not entirely straightforward as the transcriber had used Microsoft Office formatting in the spreadsheet cells to replicate features such as superscript text and strikethroughs. It is a bit of a pain to export an Excel spreadsheet as plain text while retaining such formatting, but thankfully I’d solved that issue previously and my script was able to take an Excel file that had been saved as HTML and then pick out the formatting to keep whilst ditching all of the horrible HTML formatting that Microsoft adds in to Office files that are saved in that format.
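The approach boils down to a small whitelist parser: strip every tag from Excel’s ‘save as HTML’ output except the handful of formatting tags worth keeping. This Python sketch (using the standard library’s HTMLParser, not the actual project code, which is PHP) illustrates the idea for superscript and strikethrough:

```python
from html.parser import HTMLParser

# Formatting tags to keep; everything else is discarded.
KEEP = {"sup", "s", "strike"}

class FormattingKeeper(HTMLParser):
    """Strip all markup from an Excel-exported HTML cell except
    a small whitelist of formatting tags."""
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KEEP:
            self.out.append(f"<{tag}>")  # drop Office's attributes

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

def clean_cell(html):
    parser = FormattingKeeper()
    parser.feed(html)
    return "".join(parser.out)
```

So a cell Office exports as something like <td class="xl65">M<sup>r</sup> Smith</td> comes out as M<sup>r</sup> Smith, keeping the superscript while losing the styling clutter.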
Once the Glasgow Student data had been uploaded to the pilot project website I could then migrate it to the Books and Borrowing data structure. It took the best part of a day to write a script that processed the data, dealing with issues like multiple book levels, additional fields and generating ledgers and pages. After the migration there were 3 ledgers, 403 pages and 8191 borrowing records, with associations to 832 borrowers and 1080 books. With this in place I then began to import sample data from a previous study of Innerpeffray library. This was also in a spreadsheet, but was structured very differently and I needed to write a separate data import script to process it. There were some additional complications due to the character encoding the spreadsheet uses, which resulted in lots of hidden special characters being embedded in the text when the spreadsheet was converted to a plain text file for upload. This really messed up the upload process and took some time to get to the bottom of. Also, there is variation in page numbering (e.g. sometimes ‘3r’, sometimes ‘3 r’) and this resulted in multiple pages being created for each variation before I spotted the issue. Also, the spreadsheet rows are not always in page order – there were records from earlier pages added in amongst later pages. This also messed up the upload process before I spotted the issue and updated my script to take this into consideration. There were also some issues of data failing to upload when it contained accented characters, but I think I got to the bottom of that.
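The duplicate-page problem can be avoided by normalising page labels before creating pages, so that ‘3r’ and ‘3 r’ resolve to the same page. A hypothetical Python sketch of that normalisation:

```python
import re

def normalise_page_label(label):
    """Collapse variants like '3 r', '3r' and '12 V' to a single
    canonical form (digits plus lower-case recto/verso letter) so
    each page is only created once. Labels that don't fit the
    pattern are just trimmed and returned as-is."""
    m = re.match(r"^\s*(\d+)\s*([rv])?\s*$", label, re.IGNORECASE)
    if not m:
        return label.strip()
    number, side = m.group(1), (m.group(2) or "")
    return number + side.lower()
```

Looking pages up by their normalised label during the import means out-of-order rows also land on the right existing page rather than spawning a new one.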
As with the Glasgow data, I created editions from holdings. I did add in a check to see whether any of the Glasgow editions matched the titles of the Innerpeffray titles, and used the existing Glasgow edition if this situation arose, but due to the differences in transcription I don’t think any existing editions have been used. This will need some manual correction at some point. Similarly, there may be some existing Glasgow authors that might be used rather than repeating the same information from Innerpeffray, but due to differences in transcription I don’t think this will have happened either. As before, author data has for now just been uploaded into the ‘surname’ field and will need to be manually split up further, and some Glasgow and Innerpeffray authors will need to be merged. For example, in the Glasgow data we have ‘Cave, William, 1637-1713.’ whereas in Innerpeffray we have ‘Cave, William, 1637-1713’. Because of the full stop at the end of the Glasgow author these have ended up being inserted as separate authors. After the upload process was complete there were 6550 borrowing records for Innerpeffray, split over 340 pages in one ledger. A total of 1017 unique borrowers and 840 unique book holdings were added to the library.
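A simple normalisation key would catch the trailing full stop case when matching authors, though genuinely different transcriptions would still need manual merging. A Python sketch (the author_key helper is hypothetical, not part of the actual import script):

```python
def author_key(name):
    """Produce a key for matching author strings across datasets:
    collapse runs of whitespace and drop a trailing full stop, so
    'Cave, William, 1637-1713.' and 'Cave, William, 1637-1713'
    resolve to the same author."""
    return " ".join(name.split()).rstrip(".")
```

Comparing keys rather than raw strings at import time would have merged the two Cave entries automatically.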
I created user accounts for the rest of the team to access the CMS and test things out once the sample data for these two libraries was in place. The project PI, Katie Halsey, spotted an issue with the autocomplete for selecting an existing edition not working, so I spent some time investigating this. It turns out that there are more character encoding issues with the data that are resulting in the JSON file that is generated for use in the autocomplete failing to be valid. This is also happening with the AJAX script that populates the fields once an autocomplete option is selected. I only investigated this on Friday afternoon and didn’t have time to fix it, but I’m hoping that next week if I fix the character encoding issues and ensure all line break characters are removed from the data then things will be ok.
Other than the Books and Borrowing project, I spoke to Rhona Alcorn of the DSL this week to discuss timescales for DSL developments. I also fixed an issue with the Android version of the Scots School Dictionary app. I gave some advice to Cris Sarg, who is managing the data for the Glasgow Medical Humanities project, and I made some further tweaks to the ‘export data for publication’ facilities for Carole Hough’s REELS project.
I rounded off the week by working on sorting out the new way of storing dates for the Historical Thesaurus. Although we’d previously decided on a structure for the new dates system (which is much more rational and will allow labels to be associated with specific dates rather than the lexeme as a whole) I hadn’t generated the actual new date data. My earlier script (which I retrieved from my office on Monday) instead iterated through each lexeme, generated the new date information and only outputted data if the generated full date did not match the original full date. I’d saved this output as a spreadsheet and Fraser had gone through the rows and had identified any that needed to be fixed, updating the spreadsheet as required. I then wrote a script to fix the date columns that needed fixing in order for the new fulldate to be properly generated.
With that in place I then wrote a script to generate the new date information for each of the more than 700,000 lexemes in the system. I tried running this on the server initially, but it quickly timed out, meaning I had to run the script locally and import the resulting table into the online database afterwards. The script took about 20 hours to run, but seems to have worked successfully, with almost 1.4 million date rows generated for the lexemes. Hopefully next week I’ll find the time to work on this some more.
During week 11 of Lockdown I continued to work on the Books and Borrowing project, but also spent a fair amount of time catching up with other projects that I’d had to put to one side due to the development of the Books and Borrowing content management system. This included reading through the proposal documentation for Jennifer Smith’s follow-on funding application for SCOSYA, writing a new version of the Data Management Plan based on this updated documentation, and making some changes to the ‘export data for print publication’ facility for Carole Hough’s REELS project. I also spent some time creating a new export facility to format the place-name elements and any associated place-names for print publication too.
During this week a number of SSL certificates expired for a bunch of websites, which meant browsers were displaying scary warning messages when people visited the sites. I had to spend a bit of time tracking these down and passing the details over to Arts IT Support for them to fix as it is not something I have access rights to do myself. I also liaised with Mike Black to migrate some websites over from the server that houses many project websites to a new server. This is because the old server is running out of space and is getting rather temperamental and freeing up some space should address the issue.
I also made some further tweaks to Paul Malgrati’s interactive map of Burns’ Suppers and created a new WordPress-powered project website for Matthew Creasy’s new ‘Scottish Cosmopolitanism at the Fin de Siècle’ project. This included the usual choosing a theme, colour schemes and fonts, adding in header images and footer logos and creating initial versions of the main pages of the site. I’d also received a query from Jane Stuart-Smith about the audio recordings in the SCOTS Corpus so I did a bit of investigation about that.
Fraser Dallachy had got back to me with some further tasks for me to carry out on the processing of dates for the Historical Thesaurus, and I had intended to spend some time on this towards the end of the week, but when I began to look into this I realised that the scripts I’d written to process the old HT dates (comprising 23 different fields) and to generate the new, streamlined date system that uses a related table with just 6 fields were sitting on my PC in my office at work. Usually all the scripts I work on are located on a server, meaning I can easily access them from anywhere by connecting to the server and downloading them. However, sometimes I can’t run the scripts on the server as they may need to be left running for hours (or sometimes days) if they’re processing large amounts of data or performing intensive tasks on the data. In these cases the scripts run directly on my office PC, and this was the situation with the dates script. I realised I would need to get into my office at work to retrieve the scripts, so I put in a request to be allowed into work. Staff are not currently allowed to just go into work – instead you need to get approval from your Head of School and then arrange a time that suits security. Thankfully it looks like I’ll be able to go in early next week.
Other than these issues, I spent my time continuing to work for the Books and Borrowing project. On Tuesday we had a Zoom call with all six members of the core project team, during which I demonstrated the CMS as it currently stands. This gave me an opportunity to demonstrate the new Author association facilities I had created last week. The demonstration all went very smoothly and I think the team are happy with how the system works, although no doubt once they actually begin to use it there will be bugs to fix and workflows to tweak. I also spent some time before the meeting testing the system again, and fixing some issues that were not quite right with the author system.
I spent the remainder of my time on the project completing work on the facility to add, edit and view book holding records directly via the library page, as opposed to doing so whilst adding / editing a borrowing record. I also implemented a similar facility for borrowers as well. Next week I will begin to import some of the sample data from various libraries into the system and will allow the team to access the system to test it out.
We’ve now reached week 10 of Lockdown, and I spent it in much the same way as previous weeks, dividing my time between work and homeschooling my son. This week I continued to focus on the development of the content management system for the Books and Borrowing project. On Tuesday I had a Zoom meeting to demonstrate the system as it currently stands to the project PI Katie Halsey and Co-I Matt Sangster. Monday was a bank holiday but I decided to work it and take the day off at a later date in order to prepare a walkthrough and undertake a detailed testing of the system, which uncovered a number of bugs that I then tracked down and fixed. My walkthrough went through all of the features that are so far in place: creating, editing and deleting libraries, viewing libraries, adding ledgers and additional fields to libraries, viewing, editing and deleting these ledgers and additional fields, adding pages to ledgers, editing and deleting them, viewing a page, the automated approach to constructing navigation between pages, viewing records on pages and then the big thing: adding and editing borrowing records. This latter process can involve adding data about the borrowing (e.g. lending date), one or more borrowers (which may be new borrowers or ones already in the system), a new or existing book holding, which may consist of one or more book items (e.g. volumes 1 and 3 of a book) and may be connected to one or more new or existing project-wide book edition records which may have a new or existing top-level book work record.
The walkthrough via Zoom went well, with me sharing my screen with Katie and Matt so they could follow my actions as I used the CMS. I was a bit worried they would think the add / edit borrowing record form would be too complicated but although it does look rather intimidating, most of the information is optional and many parts of it will be automatically populated by linking to existing records via autocomplete drop-downs, so once there is a critical mass of existing data in the system (e.g. existing book and borrower records) the process of adding new borrowing records will be much quicker and easier.
The only major change that I needed to make following the walkthrough was to add a new ‘publication end date’ field to book edition and book work records as some books are published in parts over multiple years (especially books comprised of multiple volumes). I implemented this after the meeting and then spent most of the remainder of the week continuing to implement further aspects of the CMS. I made a start on the facility to view a list of all book holding records that have been created for a library, through which the project team will be able to bring up a list of all borrowing records that involve the book. I got as far as getting a table listing the book holdings in place, but as the project team will be starting next week I figured it would make more sense to try and tackle the last major part of the system that still needed to be implemented: creating and associating author records with the four levels of book record.
A book may have any number of authors and their associations with a book record cascades down through the levels. For example, if an author is associated with a book via its top-level ‘book work’ record then the author will automatically be associated with a related ‘book edition’ record, any ‘book holding’ records this edition is connected to and any ‘book item’ records belonging to the book holding. But we need to be able to associate an author not just with ‘book works’ but with any level of book record, as a book may have a different author at one of these levels (e.g. a particular volume may be attributed to a different author) or the same author may be referred to by a different alias in a particular edition. Therefore I had to update the already complicated add / edit borrowing record form to enable authors to be created, associated and disassociated with any book level. Plus I needed to add in an autocomplete facility to enable authors already in the system to be attached to records and to ensure that the author sections clear and reset themselves if the user removes the book from the borrowing record. It took a long time to implement this system, but by the end of the week I’d got an initial version working. It will need a lot of testing and no doubt some fixing next week, but it’s a relief to get this major part of the system in place. I also added in a little feature that keeps the user’s CMS session going for as long as the browser is on a page of the CMS, which is very important as the complicated forms may take a long time to complete and it would be horrible if the sessions timed out before the user was able to submit the form.
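One way to picture the cascade is that each book level inherits its authors from the level above unless it defines its own. A simplified Python sketch of that resolution logic (the dict shapes here are purely illustrative, not the actual database schema, and this shows just one plausible interpretation of how an override would work):

```python
def effective_authors(record):
    """Walk up the book hierarchy (item -> holding -> edition -> work)
    and return the authors from the lowest level that defines any,
    so authors attached higher up cascade down unless a lower level
    overrides them with its own list."""
    level = record
    while level is not None:
        if level.get("authors"):
            return level["authors"]
        level = level.get("parent")
    return []

# Illustrative records: the work defines an author, one edition overrides it.
work = {"authors": ["Pope, Alexander"], "parent": None}
edition = {"authors": [], "parent": work}
item = {"parent": edition}
variant_edition = {"authors": ["A Gentleman of Quality"], "parent": work}
```

With this shape, the plain item inherits the work’s author while the variant edition keeps its own alias, which matches the behaviour described above.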
I didn’t have time to do much else this week. I was supposed to have a Zoom call about the Historical Thesaurus on Friday but this has been postponed as we’re all pretty busy with other things. One of the servers that hosts a lot of project websites has been experiencing difficulties this week, so I had to deal with emails from staff about this and contact Arts IT Support to ask them to fix things, as it’s not something I have access to myself. The server appears to be down again as I’m writing this, unfortunately.
The interactive map I’d created for Gerry McKeever’s Regional Romanticism project was launched this week, and can now be accessed here: https://regionalromanticism.glasgow.ac.uk/paul-jones/ but be aware that this is one of the sites currently affected by the server issue, so the map, or parts of the site in general, may be unavailable.
Next week the project team for the Books and Borrowing project start work and I will be giving them a demonstration of the CMS on Tuesday, so no doubt I will be spending a lot of time continuing to work on this then.