
Month: May 2020
Week Beginning 18th May 2020
I spent week 9 of Lockdown continuing to implement the content management system for the Books and Borrowing project. I was originally hoping to have completed an initial version of the system by the end of this week, but this was unfortunately not possible due to having to juggle work and home-schooling, commitments to other projects and the complexity of the project’s data. It took several days to complete the scripts for uploading a new borrowing record due to the interrelated nature of the data structure. A borrowing record can be associated with one or more borrowers, and each of these may be new borrower records or existing ones, meaning data needs to be pulled in via an autocomplete to prepopulate the section of the form. Books can also be new or existing records but can also have one or more new or existing book item records (as a book may have multiple volumes) and may be linked to one or more project-wide book edition records which may already exist or may need to be created as part of the upload process, and each of these may be associated with a new or existing top-level book work record. Therefore the script for uploading a new borrowing record needs to incorporate the ‘add’ and ‘edit’ functionality for a lot of associated data as well. However, as I have implemented all of these aspects of the system now it will make it quicker and easier to develop the dedicated pages for adding and editing borrowers and the various book levels once I move onto this. I still haven’t worked on the facilities to add in book authors, genres or borrower occupations, which I intend to move onto once the main parts of the system are in place.
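To illustrate the ‘new or existing’ logic described above, here is a minimal Python sketch using an in-memory SQLite database. It is not the project’s actual code: the table names, column names and function names are all hypothetical, and it only covers borrowers (books, items, editions and works would follow the same pattern).

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical, much-simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE borrower (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE borrowing (id INTEGER PRIMARY KEY, transcription TEXT);
        CREATE TABLE borrowing_borrower (borrowing_id INTEGER, borrower_id INTEGER);
    """)
    return conn


def resolve_borrower(conn, submitted):
    """Reuse the borrower chosen via autocomplete, or create a new borrower row."""
    if submitted.get("id") is not None:
        row = conn.execute(
            "SELECT id FROM borrower WHERE id = ?", (submitted["id"],)
        ).fetchone()
        if row is None:
            raise ValueError("unknown borrower id")
        return row[0]
    cur = conn.execute("INSERT INTO borrower (name) VALUES (?)", (submitted["name"],))
    return cur.lastrowid


def add_borrowing(conn, transcription, borrowers):
    """Store a borrowing record and link it to each new or existing borrower."""
    with conn:  # commit everything together, or roll back if anything fails
        cur = conn.execute(
            "INSERT INTO borrowing (transcription) VALUES (?)", (transcription,)
        )
        borrowing_id = cur.lastrowid
        for submitted in borrowers:
            borrower_id = resolve_borrower(conn, submitted)
            conn.execute(
                "INSERT INTO borrowing_borrower (borrowing_id, borrower_id) VALUES (?, ?)",
                (borrowing_id, borrower_id),
            )
    return borrowing_id
```

In this sketch a submitted borrower with an `id` is treated as an existing record and one without an `id` is created on the fly, which is roughly what the form has to decide for every associated record.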
After completing the scripts for processing the display of the ‘add borrowing’ form and the storing of all of the uploaded data I moved onto the script for viewing all of the borrowing records on a page. Due to the huge number of potential fields I’ve had to experiment with various layouts, but I think I’ve got one that works pretty well, which displays all of the data about each record in a table split into four main columns (Borrowing, Borrower, Book Holding / Items, Book Edition / Works). I’ve also added in a facility to delete a record from the page. I then moved on to the facility to edit a borrowing record, which I’ve added to the ‘view’ page rather than linking out to a separate page. When the ‘edit’ button is pressed for a record, its row in the table is replaced with the ‘edit’ form, which is identical in style and functionality to the ‘add’ form, but is prepopulated with all of the record’s data. As with the ‘add’ form, it’s possible to associate multiple borrowers and book items and editions, and also to manage the existing associations using this script. The processing of the form uses the same logic as the ‘add’ script so thankfully didn’t require much time to implement.
What I still need to do is add authors and borrower occupations to the ‘view page’, ‘add record’ and ‘edit record’ facilities, add the options to view / edit / add / delete a library’s book holdings and borrowers independently of the borrowing records, plus facilities to manage book editions / works, authors, genres and occupations at the top level as opposed to when working on a record. I also still need to add in the facilities to view / zoom / pan a page image and add in facilities to manage borrower cross-references. This is clearly quite a lot, but the core facilities of adding, editing and deleting borrowing, borrower and book records are now in place, which I’m happy about. Next week I’ll continue to work on the system ahead of the project’s official start date at the beginning of June.
Also this week I made a few tweaks to the interface for the Place-names of Mull and Ulva project, spoke to Matthew Creasy some more about the website for his new project, spoke to Jennifer Smith about the follow-on funding proposal for the SCOSYA project and investigated an issue that was affecting the server that hosts several project websites (basically it turned out that the server had run out of disk space).
I also spent some time working on scripts to process data from the OED for the Historical Thesaurus. Fraser is working on incorporating new dates from the OED and needs to work out which dates in the HT data we want to replace and which should be retained. The script makes groups of all of the distinct lexemes in the OED data. If the group has two or more lexemes it then checks that at least one of them is revised. It then makes subgroups of all of the lexemes that have the same date (so for example all the ‘Strike’ words with the same ‘sortdate’ and ‘lastdate’ are grouped together). If one word in the whole group is ‘revised’ and at least two words have the same date then the words with the same dates are displayed in the table. The script also checks for matches in the HT lexemes (based on catid, refentry, refid and lemmaid fields). If there is a match this data is also displayed. I then further refined the output based on feedback from Fraser, firstly highlighting in green those rows where at least two of the HT dates match, and secondly splitting the table into three separate tables: one with the green rows, one containing all other OED lexemes that have a matching HT lexeme and a third containing OED lexemes that (as yet) do not have a matching HT lexeme.
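The grouping steps described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the `lemma` and `revised` keys and the function name are hypothetical (only ‘sortdate’ and ‘lastdate’ are named above), and the matching against HT lexemes is left out.

```python
from collections import defaultdict


def find_shared_dates(lexemes):
    """Group OED lexemes by word form, then by identical (sortdate, lastdate).

    Each lexeme is a dict with 'lemma', 'sortdate', 'lastdate' and 'revised' keys.
    Returns {lemma: {(sortdate, lastdate): [lexemes sharing those dates]}}.
    """
    groups = defaultdict(list)
    for lex in lexemes:
        groups[lex["lemma"]].append(lex)

    results = {}
    for lemma, group in groups.items():
        # Only consider groups of two or more lexemes where at least one is revised.
        if len(group) < 2 or not any(lex["revised"] for lex in group):
            continue
        by_date = defaultdict(list)
        for lex in group:
            by_date[(lex["sortdate"], lex["lastdate"])].append(lex)
        # Keep only the subgroups where at least two lexemes share the same dates.
        shared = {dates: members for dates, members in by_date.items() if len(members) >= 2}
        if shared:
            results[lemma] = shared
    return results
```

Each entry in the returned dictionary corresponds to a set of rows that would be displayed in the output table.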
Week Beginning 11th May 2020
This was week 8 of Lockdown and I spent the majority of it working on the content management system for the Books and Borrowing project. The project is due to begin at the start of June and I’m hoping to have the CMS completed and ready to use by the project team by then, although there is an awful lot to try and get into place. I can’t really go into too much detail about the CMS, but I have completed the pages to add a library and to browse a list of libraries with the option of deleting a library if it doesn’t have any ledgers. I’ve also done quite a lot with the ‘View library’ page. It’s possible to edit a library record, add a ledger and add / edit / delete additional fields for a library. You can also list all of the ledgers in a library with options to edit the ledger, delete it (if it contains no pages) and add a new page to it. You can also display a list of pages in a ledger, with options to edit the page or delete it (if it contains no records). You can also open a page in the ledger and browse through the next and previous pages.
I’ve been trying a new approach with the CMS for this project, involving more in-page editing. For example, the list of ledgers is table-based with fields for things like the number of pages, the ledger name and its start and end dates. When the ‘edit’ button is pressed, rather than taking the user away from this page to a separate page, the row in the table becomes editable. This approach is rather more complicated to develop and relies a lot more on JavaScript, but it seems to be working pretty well. It was further complicated by having textareas that use the TinyMCE text editing tool, which then needs to be reinitialised when the editable boxes load in. Also, HTML doesn’t allow a form to be wrapped around an individual table row, meaning there can be only one form wrapped around the whole table. Initially I was thinking that when the row became editable the JavaScript would add in form tags in the row too, but this approach doesn’t work properly so instead I’ve just had to implement a single form with its type controlled by hidden inputs that change when a row is selected. The situation is complicated as it’s not just the ledger record that needs to be edited from within the table, but there are also facilities to add and edit ledger pages, which also need to use the same form.
At the moment I’m in the middle of creating the facility to add a new borrowing record to the page. This is the most complex part of the system as a record may have multiple borrowers, each of which may have multiple occupations, and multiple books, each of which may be associated with higher level book records. Plus the additional fields for the library need to be taken into consideration too. By the end of the week I was at the point of adding in an auto-complete to select an existing borrower record and I’ll continue with this on Monday.
In addition to the B&B project I did some work for other projects as well. For Thomas Clancy’s Place-names of Kirkcudbrightshire project (now renamed Place-names of the Galloway Glens) I had a few tweaks and updates to put in place before Thomas launched the site on Tuesday. I added a ‘Search place-names’ box to the right-hand column of every non-place-names page which takes you to the quick search results page and I added a ‘Place-names’ menu item to the site menu, so users can access the place-names part of the site. Every place-names page now features a sub-menu with access to the place-names pages (Browse, element glossary, advanced search, API, quick search). To return to the place-name introductory page you can press on the ‘Place-names’ link in the main menu bar. I had unfortunately introduced a bug to the ‘edit place-name’ page in the CMS when I changed the ordering of parishes to make KCB parishes appear first. This was preventing any place-names in BMC from having their cross references, feature type and parishes saved when the form was submitted. This has now been fixed. I also added Google Analytics to the site. The virtual launch on Tuesday went well and the site can now be accessed here: https://kcb-placenames.glasgow.ac.uk/.
I also added in links to the DSL’s email and Instagram accounts to the footer of the DSL site and added some new fields to the database and CMS of the Place-names of Mull and Ulva site. I also created a new version of the Burns Supper map for Paul Malgrati that included more data and a new field for video dimensions that the video overlay now uses. I also replied to Matthew Creasy about a query regarding the website for his new Scottish Cosmopolitanism project and a query from Jane Roberts about the Thesaurus of Old English and made a small tweak to the data of Gerry McKeever’s interactive map for Regional Romanticism.
Week Beginning 4th May 2020
Week seven of lockdown continued in much the same fashion as the preceding weeks, the only difference being Friday was a holiday to mark the 75th anniversary of VE day. I spent much of the four working days on the development of the content management system for the Books and Borrowing project. The project RAs will start using the system in June and I’m aiming to get everything up and running before then so this is my main focus at the moment. I also had a Zoom meeting with project PI Katie Halsey and Co-I Matt Sangster on Tuesday to discuss the requirements document I’d completed last week and the underlying data structures I’d defined in the weeks before. Both Katie and Matt were very happy with the document, although Matt had a few changes he wanted made to the underlying data structures and the CMS. I made the necessary changes to the data design / requirements document and the project’s database that I’d set up last week. The changes were:
Borrowing spans have now been removed from libraries and these will instead be automatically inferred based on the start and end dates of ledger records held in these libraries. Ledgers now have a new ‘ledger type’ field which currently allows the choice of ‘Professorial’, ‘Student’ or ‘Town’. This field will allow borrowing spans for libraries to be altered based on a selected ledger type. The way occupations for borrowers are recorded has been updated to enable both original occupations from the records and a normalised list of occupations to be recorded. Borrowers may not have an original occupation but still might have a standardised occupation so I’ve decided to use the occupations table as previously designed to hold information about standardised occupations. A borrower may have multiple standardised occupations. I have also added a new ‘original occupation’ field to the borrower record where any number of occupations found for the borrower in the original documentation (e.g. river watcher) can be added if necessary. The book edition table now has an ‘other authority URL’ field and an ‘other authority type’ field which can be used if ESTC is not appropriate. The ‘type’ currently features ‘Worldcat’, ‘CERL’ and ‘Other’ and ‘Language’ has been moved from Holding to Edition. Finally, in Book Holding the short title is now original title and long title is now standardised title while the place and date of publication fields have been removed as the comparable fields at Edition level will be sufficient.
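As an illustration of the first change in that list, a library’s borrowing span could be inferred from its ledgers along the following lines. This is a minimal Python sketch, not the project’s code: the dictionary keys and function name are hypothetical, and it assumes the span is simply the earliest ledger start date to the latest ledger end date, ignoring any gaps between ledgers.

```python
from datetime import date


def borrowing_span(ledgers, ledger_type=None):
    """Return (earliest start, latest end) for a library's ledgers.

    Each ledger is a dict with 'type', 'start' and 'end' keys. If ledger_type
    is given, only ledgers of that type are considered. Returns None when no
    ledgers match.
    """
    selected = [
        ledger for ledger in ledgers
        if ledger_type is None or ledger["type"] == ledger_type
    ]
    if not selected:
        return None
    earliest = min(ledger["start"] for ledger in selected)
    latest = max(ledger["end"] for ledger in selected)
    return (earliest, latest)
```

Calling `borrowing_span(ledgers, "Student")` would then give the span covered by student ledgers only, which is the kind of filtering the new ‘ledger type’ field is intended to allow.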
In terms of the development of the CMS, I created a Bootstrap-based interface for the system, which currently just uses the colour scheme I used for Matt’s pilot 18th Century Borrowing project. I created the user authentication scripts and the menu structure and then started to create the actual pages. So far I’ve created a page to add a new library record and all of the information associated with a library, such as any number of sources. I then created the facility to browse and delete libraries and the main ‘view library’ page, which will act as a hub through which all book and borrowing records associated with the library will be managed. This page has a further tab-based menu with options to allow the RA to view / add ledgers, additional fields, books and borrowers, plus the option to edit the main library information. So far I’ve completed the page to edit the library information and have started work on the page to add a ledger. I’m making pretty good progress with the CMS, but there is still a lot left to do. Here’s a screenshot of the CMS if you’re interested in how it looks:
Also this week I had a Zoom meeting with Marc Alexander and Fraser Dallachy to discuss updates to the Historical Thesaurus as we head towards a second edition. This will include adding in new words from the OED and new dates for existing words. My new date structure will also go live, so there will need to be changes to how the timelines work. Marc is hoping to go live with new updates in August. We also discussed the ‘guess the category’ quiz, with Marc and Fraser having some ideas about limiting the quiz to certain categories, or excluding other categories that might feature inappropriate content. We may also introduce a difficulty level based on date, with an ‘easy’ version only containing words that were in use for a decent span of time in the past 200 years.
Other work I did this week included making some tweaks to the data for Gerry McKeever’s interactive map, fixing an issue with videos continuing to play after the video overlay was closed for Paul Malgrati’s Burns Supper map, replying to a query from Alasdair Whyte about his Place-names of Mull and Ulva project and looking into an issue for Fraser’s Scots Thesaurus project which unfortunately I can’t do anything about as the scripts I’d created for this (which needed to be left running for several days) are on the computer in my office. If this lockdown ever ends I’ll need to tackle this issue then.
Week Beginning 27th April 2020
The sixth week of lockdown continued in much the same manner as the previous ones, dividing my time between working and home-schooling my son. I spent the majority of the week continuing to work on the requirements document for the Books and Borrowing project. As I worked through this I returned to the database design and made some changes as my understanding of the system increased. This included adding in a new field for the original transcription and a new ‘order on page’ field to the borrowing table as I realised that without such a column it wouldn’t be possible for an RA to add a new record anywhere other than after the last record on a page. It’s quite likely that an RA will accidentally skip a record and might need to reinstate it, or an RA might intentionally want to leave out some records to return to later. The ‘order on page’ column (which will be automatically generated but can be manually edited) will ensure these situations can be handled.
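To show why an ‘order on page’ column makes this possible, here is a minimal Python sketch of inserting a skipped record at a chosen position and renumbering the rest. The `order_on_page` key and the function name are hypothetical, and this is just one way the automatic numbering could work.

```python
def insert_at(records, new_record, position):
    """Insert new_record at a 1-based position and renumber 'order_on_page'.

    records is a list of dicts, each with an 'order_on_page' key. The dicts are
    updated in place and the newly ordered list is returned.
    """
    ordered = sorted(records, key=lambda rec: rec["order_on_page"])
    # Clamp the requested position to the valid range (1 .. number of records + 1).
    position = max(1, min(position, len(ordered) + 1))
    ordered.insert(position - 1, new_record)
    for index, rec in enumerate(ordered, start=1):
        rec["order_on_page"] = index
    return ordered
```

Without such a column the only ordering available would be the order in which records were created, so a record could only ever be appended to the end of the page.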
As I worked through the requirements I began to realise that the amount of data the RAs may have to compile for each borrowing record is possibly going to be somewhat overwhelming. Much of it is optional, but completing all the information could take a long time: creating a new Book Holding and Item record, linking it to an Edition and Work or creating new records for these, associating authors or creating new authors, adding in genre information, creating a new borrower record or associating an existing one, adding in occupations, adding in cross references to other borrowers, writing out a diplomatic transcription, filling in all of the core fields and additional fields. That’s a huge amount to do for each record and we may need to consider what is going to be possible for the RAs to do in the available time.
By the end of Tuesday I had finished working on a first version of the requirements document, weighing in at more than 11,000 words, and sent it on to Katie and Matt for feedback. We have agreed to meet (via Zoom) next Tuesday to discuss any changes to the document. During the rest of the week I began to develop the systems for the project. This included implementing the database (creating each of the 23 tables that will be needed to store the project’s data) and installing and configuring WordPress, which will be used to power the simple parts of the project website.
I also continued to work with the data for the Anglo-Norman Dictionary this week. I managed to download all of the files from the server over the weekend. I now have two versions of the ‘entry_hash’ database file plus many thousand XML files that were in the ‘commit-out’ directory. I extracted the different version of the ‘entry_hash’ table using the method I figured out last week and discovered that it was somewhat larger than the version I was working with last week, containing 4,556,011 lines as opposed to 3,969,350 and 54,025 entries as opposed to 53,945. I sent this extracted file to Heather for her to have a look at.
I then decided to update the test website I’d made a few months ago. This version used the ‘all.xml’ file as a data source and allowed a user to browse the dictionary entries using a list of all the headwords and to view each entry (well, certain aspects of the entry that I’d formatted from the XML). Thankfully I managed to locate the scripts I’d used to extract the XML and migrate it to a database on the server and I ran both versions of the ‘entry_hash’ output through this script, resulting in three different dictionary data sources. I then updated the simple website to add in a switcher to swap between data sources. Extracting the data also led to me realising that there was a ‘deleted’ record type in the entry table and if I disregarded records of this type the older entry_hash data had 53,925 entries and the newer one had 54,002, so a difference of 77. The old ‘all.xml’ data from 2015 had 50,403 entries that aren’t set to ‘deleted’. In looking through the files from the server I had also managed to track down the XSLT file used to transform the XML into HTML for the existing website, so I added this to my test website, together with the CSS file from the existing website. This meant the full entries could now be displayed in a fully formatted manner, which is useful.
Heather had a chance to look through and compare the three test website versions and discovered that the new ‘entry_hash’ version contained all of the data that was in their data management system that had yet to be published on the public website. This was really good news as it meant that we now have a complete dataset without needing to integrate individual XML files. With a full dataset now secured I am now in a position to move on to the requirements for the new dictionary website.
Also this week I made some further tweaks to the ‘guess the category’ quiz for the Historical Thesaurus. The layout now works better on a phone in portrait mode (the choices now take up the full width of the area). I also fixed the line-height of the quiz word, which previously overlapped itself when the word ran over more than one line. I updated things so that when pressing ‘next’ or restarting the quiz the page automatically scrolls so that the quiz word is in view. I fixed the layout to ensure that there should always be enough space for the ticks and crosses now (they should no longer end up dropping down below the box if the category text is long). Also, any very long words in the category that previously ended up breaking out of their boxes are now cut off when they reach the edge of the box. I could auto-hyphenate long words to split them over multiple lines, and I might investigate this next week. I also fixed an issue with the ‘Next’ button: when restarting a quiz after reaching the summary the ‘Next’ button was still labelled ‘Summary’. Finally, I’ve added a timer to the quiz so you can see how long it took you and try to beat your fastest time. When you reach the summary page your time is now displayed in the top bar along with your score. I’ve also added some text above the ‘Try again’ button. If you get less than full marks it says “Can you do better next time?” and if you did get full marks it says “You got them all right, but can you beat your time of x”.
Finally this week I helped Roslyn Potter of the DSL to get a random image loading into a new page. This page asks people to record themselves saying a selection of Scots words and there are 10 different images featuring words. The random feature displays one image and provides a button to load another random image, plus a feature to view all of the images. The page can be viewed here: https://dsl.ac.uk/scotsvoices/