I was off on Monday this week for the September Weekend holiday. My four working days were split across many different projects, but the main ones were the Historical Thesaurus and the Anglo-Norman Dictionary.
For the HT I continued with the preparations for the second edition. I updated the front-end so that multiple changelog items are now checked for and displayed (these are the little tooltips that say whether a lexeme’s dates have been updated in the second edition). Previously only one changelog item was being displayed, but this approach wasn’t sufficient as a lexeme may have both a changed start and a changed end date. I also fixed a bug in the assigning of the ‘end date verified as after 1945’ code, which was being applied to some lexemes with much earlier end dates. My script had set the type to 3 in all cases where the last HT date was 9999, when it should only have done so if the last HT date was 9999 and the last OED date was after 1945. I wrote a little script to fix this, which affected about 7,400 lexemes.
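The corrected rule is simple enough to sketch. This is an illustrative reconstruction rather than the actual script (the function name is mine, but the type code 3 and the 9999/1945 thresholds are as described above):

```python
def end_date_code(last_ht_year, last_oed_year):
    """Return type 3 ('end date verified as after 1945') only when the
    lexeme is still current in the HT (last date 9999) AND the OED's
    last date is after 1945; otherwise leave the code unset."""
    if last_ht_year == 9999 and last_oed_year is not None and last_oed_year > 1945:
        return 3
    return None
```

The original bug was, in effect, dropping the second half of the condition.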
I also wrote a script to check off a bunch of HT and OED categories that had been manually matched by an RA. I needed to make a few tweaks to the script after testing it out, but after running it on the data we had a further 846 categories matched up, which is great. Fraser had previously worked on a document listing a set of criteria for working out whether an OED lexeme was ‘new’ or not (i.e. unlinked to an HT lexeme). This was a pretty complicated document with many different stages, the output of which needed to be written to seven different spreadsheets, and it took quite a long time to write and test a script that would handle all of these stages. However, I managed to complete work on it, and after a while it finished executing, resulting in seven CSV files, one for each code mentioned in the document. I was very glad that I had my new PC as I’m not sure my old one could have coped with it – for the Levenshtein tests, every word in the HT had to be stored in memory throughout the script’s execution, for example. On Friday I had a meeting with Marc and Fraser where we discussed the progress we’d been making and further tweaks to the script were proposed that I’ll need to implement next week.
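For reference, the Levenshtein comparison at the heart of those tests is the standard dynamic-programming edit distance; a minimal version (not the project’s actual code) looks like this:

```python
def levenshtein(a, b):
    """Classic edit distance: minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```

Comparing every OED form against every HT word this way is a lot of pairwise work, which is why the whole HT wordlist had to sit in memory for the duration of the run.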
For the Anglo-Norman Dictionary I continued to work on the ‘Entry’ page, implementing a mixture of major features and minor tweaks. I updated the way the editor’s initials are displayed, as previously these were the initials of the editor who made the most recent update in the changelog, whereas what was needed were the initials of the person who created the record, contained in the ‘lead’ attribute of the main entry. I also attempted to fix an issue with references in the entry that were set to ‘YBB’. Unlike other references, these were not in the data I had as they were handled differently. I thought I’d managed to fix this, but it looks like ‘YBB’ is used to refer to many different sources so can’t be trusted to be a unique identifier. This is going to need further work.
Minor tweaks included changing the font colour of labels, making the ‘See Also’ header bigger and clearer, removing the final semi-colon from lists of items, adding in line breaks between parts of speech in the summary and other such things. I then spent quite a while integrating the commentaries. These were another thing that weren’t properly integrated with the entries but were added in as some sort of hack. I decided it would be better to have them as part of the editors’ XML rather than attempting to inject them into the entries when they were requested for display. I managed to find the commentaries in another hash file and thankfully managed to extract the XML from this using the Python script I’d previously written for the main entry hash file. I then wrote a script that identified which entry the commentary referred to, retrieved the entry and then inserted the commentary XML into the middle of it (immediately after the closing </head> tag).
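The insertion itself boils down to a string splice on the entry XML; a simplified sketch of the approach (the real script does rather more checking than this):

```python
def insert_commentary(entry_xml, commentary_xml):
    """Splice the commentary fragment into the entry immediately
    after the closing </head> tag."""
    marker = "</head>"
    pos = entry_xml.find(marker)
    if pos == -1:
        raise ValueError("entry has no closing </head> tag")
    pos += len(marker)
    return entry_xml[:pos] + commentary_xml + entry_xml[pos:]
```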
It took somewhat longer than I expected to integrate the data as some of the commentaries contained Greek, and the underlying database was not set up to handle multi-byte UTF-8 characters (which Greek characters are), meaning these commentaries could not be added to the database. I needed to change the structure of the database and re-import all of the data, as simply changing the character encoding of the columns gave errors. I managed to complete this process and import the commentaries and then begin the process of making them appear in the front-end. I still haven’t completely finished this (no formatting or links in the commentaries are working yet) and I’ll need to continue with this next week.
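For anyone facing the same issue: MySQL’s legacy character sets can’t store characters outside their range, and the fix is to move the affected table (or the whole database) to a full Unicode character set. The statement involved looks roughly like this (the table name here is hypothetical, and as noted above the conversion alone errored in my case, so the data had to be re-imported into a restructured table):

```sql
-- Hypothetical table; utf8mb4 covers the full Unicode range,
-- including Greek and characters outside the Basic Multilingual Plane.
ALTER TABLE commentaries
  CONVERT TO CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```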
Also this week I added numbers to the senses. This also involved updating the editor’s XML to add a new ‘n’ attribute to the <sense> tag, e.g. <sense id="AND-201-47B626E6-486659E6-805E33CE-A914EB1F-S001" n="1">. As with the current site, the senses reset to 1 when a new part of speech begins. I also ensured that [sic] now appears, as does the language tag, with a question mark if the ‘cert’ attribute is present and not 100. Uncertain parts of speech are now visible too (again if ‘cert’ is present and not 100), I have increased the font size of the variant forms, and citation dates are now displayed. There is still a huge amount of work to do, but progress is definitely being made.
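The numbering rule – restart at 1 whenever a new part of speech begins – can be sketched like this (an illustrative reconstruction, not the script itself):

```python
def number_senses(senses):
    """senses: list of (part_of_speech, sense_id) tuples in document
    order. Returns (sense_id, n) pairs where n restarts at 1 each
    time the part of speech changes."""
    numbered, current_pos, n = [], None, 0
    for pos, sense_id in senses:
        if pos != current_pos:
            current_pos, n = pos, 0
        n += 1
        numbered.append((sense_id, n))
    return numbered
```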
Also this week I reviewed the transcriptions from a private library that we are hoping to incorporate into the Books and Borrowing project and tweaked the way ‘additional fields’ are stored to enable the RAs to enter HTML characters into them. I also created a spreadsheet template for recording the correspondence of Robert Burns for Craig Lamont and spoke to Eila Williamson about the design of the new Names Studies website. I updated the text on the homepage of this site, which Lorna Hughes sent me, and gave some advice to Luis Gomes about a data management plan he is preparing. I also updated the wording on the search results page for ‘V3’ of the DSL to bring it into line with ‘V2’ and participated in a Zoom call for the Iona project where we discussed the new website and images that might be used in the design.
This was a pretty busy week, involving lots of different projects. I set up the systems for a new place-name project focusing on Ayrshire this week, based on the system that I initially developed for the Berwickshire project and which has subsequently been used for Kirkcudbrightshire and Mull. It didn’t take too long to port the system over, but the PI also wanted the system to be populated with data from the GB1900 crowdsourcing project. This project has transcribed every place-name on the GB1900 Ordnance Survey maps across the whole of the UK and is an amazing collection of data totalling some 2.5 million names. I had previously extracted a subset of names for the Mull and Ulva project so thankfully had all of the scripts needed to get the information for Ayrshire. Unfortunately what I didn’t have was the data in a database, as I’d previously extracted it to my PC at work. This meant that I had to run the extraction script again on my home PC, which took about three days to work through all of the rows in the monstrous CSV file. Once this was complete I could then extract the names found in the Ayrshire parishes that the project will be dealing with, resulting in almost 4,000 place-names. However, this wasn’t the end of the process, as while the extracted place-names had latitude and longitude they didn’t have grid references or altitude. My place-names system is set up to automatically generate these values and I could customise the scripts to automatically apply the generated data to each of the 4,000 places. Generating the grid reference was pretty straightforward but grabbing the altitude was less so, as it involved submitting a query to Google Maps and then inserting the returned value into my system using an AJAX call.
I ran into difficulties with my script exceeding the allowed number of Google Map queries and also the maximum number of page requests on our server, resulting in my PC getting blocked by the server and a ‘Forbidden’ error being displayed instead, but with some tweaking I managed to get everything working within the allowed limits.
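In case it’s useful to anyone doing something similar, the altitude lookup boils down to building a request to Google’s Elevation API and reading one number out of the JSON response. This sketch covers the request/response handling only (my system actually fired the requests via AJAX from the browser, and you would need your own API key):

```python
import urllib.parse

ELEVATION_ENDPOINT = "https://maps.googleapis.com/maps/api/elevation/json"

def build_elevation_url(lat, lng, api_key):
    """URL for a single-point elevation query."""
    params = urllib.parse.urlencode({"locations": f"{lat},{lng}",
                                     "key": api_key})
    return f"{ELEVATION_ENDPOINT}?{params}"

def parse_elevation(payload):
    """Pull the elevation (metres) out of the API's JSON response;
    returns None on error statuses such as OVER_QUERY_LIMIT."""
    if payload.get("status") == "OK" and payload.get("results"):
        return payload["results"][0]["elevation"]
    return None
```

Spacing the calls out (and caching results where possible) is what kept the script within the allowed limits.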
I also continued to work on the Second Edition of the Historical Thesaurus. I set up a new version of the website that we will work on for the Second Edition, and created new versions of the database tables that this new site connects to. I also spent some time thinking about how we will implement some kind of changelog or ‘history’ feature to track changes to the lexemes, their dates and corresponding categories. I had a Zoom call with Marc and Fraser on Wednesday to discuss the developments and we realised that the date matching spreadsheets I’d generated last week could do with some additional columns from the OED data, namely links through to the entries on the OED website and also a note to say whether the definition contains ‘(a)’ or ‘(also’ as these would suggest the entry has multiple senses that may need a closer analysis of the dates.
I then started to update the new front-end to use the new date structure that we will use for the Second Edition (with dates stored in a separate date table rather than split across almost 20 different date fields in the lexeme table). I updated the timeline visualisations (mini and full) to use this new date table, and although this took quite some time to get my head around the resulting code is MUCH less complicated than the horrible code I had to write to deal with the old 20-odd date columns. For example, the code to generate the data for the mini timelines is about 70 lines long now as opposed to over 400 previously.
The timelines use the new date tables in the category browse and the search results. I also spotted some dates that weren’t working with the old system but are handled correctly now. I then updated the ‘label’ autocomplete in the advanced search to use the labels in the new date table. What I still need to do is update the search to actually search for the new labels and also to search the new date tables for both ‘simple’ and ‘complex’ year searches. This might be a little tricky, and I will continue on this next week.
Also this week I gave Gerry McKeever some advice about preserving the data of his Regional Romanticism project, spoke to the DSL people about the wording of the search results page, gave feedback on and wrote some sections for Matthew Creasy’s Chancellor’s Fund proposal, gave feedback to Craig Lamont regarding the structure of a spreadsheet for holding data about the correspondence of Robert Burns and gave some advice to Rob Maslen about the stats for his ‘City of Lost Books’ blog. I also made a couple of tweaks to the content management system for the Books and Borrowers project based on feedback from the team.
I spent the remainder of the week working on the redevelopment of the Anglo-Norman dictionary. I updated the search results page to style the parts of speech to make it clearer where one ends and the next begins. I also reworked the ‘forms’ section to add in a cut-off point for entries that have a huge number of forms. In such cases the long list is cut off and an ellipsis is added, together with an ‘expand’ button. Pressing this displays the full list of forms and the button is replaced with a ‘collapse’ button. I also updated the search so that it no longer includes cross references (these are to be used for the ‘Browse’ list only) and the quick search now defaults to an exact match search whether you select an item from the auto-complete or not. Previously it performed an exact match if you selected an item but defaulted to a partial match if you didn’t. Now if you search for ‘mes’ (for example) and press enter or the search button your results are for “mes” (exactly). I suspect most people will select ‘mes’ from the list of options, which already did this, though. It is also still possible to use the question mark wildcard with an ‘exact’ search, e.g. “m?s” will find 14 entries that have three-letter forms beginning with ‘m’ and ending in ‘s’.
I also updated the display of the parts of speech so that they are in order of appearance in the XML rather than alphabetically and I’ve updated the ‘v.a.’ and ‘v.n.’ labels as the editor requested. I also updated the ‘entry’ page to make the ‘results’ tab load by default when reaching an entry from the search results page or when choosing a different entry in the search results tab. In addition, the search result navigation buttons no longer appear in the search tab if all the results fit on the page and the ‘clear search’ button now works properly. Also, on the search results page the pagination options now only appear if there is more than one page of results.
On Friday I began to process the entry XML for display on the entry page, which was pretty slow going, wading through the XSLT file that is used to transform the XML to HTML for display. Unfortunately I can’t just use the existing XSLT file from the old site because we’re using the editor’s version of the XML and not the system version, and the two are structurally very different in places.
So far I’ve been dealing with forms and have managed to get the forms listed, with grammatical labels displayed where available and commas separating forms and semi-colons separating groups of forms. Deviant forms are surrounded by brackets. Where there are lots of forms the area is cut off as with the search results. I still need to add in references where these appear, which is what I’ll tackle next week. Hopefully now I’ve started to get my head around the XML a bit progress with the rest of the page will be a little speedier, but there will undoubtedly be many more complexities that will need to be dealt with.
I worked on many different projects this week, and the largest amount of my time went into the redevelopment of the Anglo-Norman Dictionary. I processed a lot of the data this week and have created database tables and written extraction scripts to export labels, parts of speech, forms and cross references from the XML. The data extracted will be used for search purposes, for display on the website in places such as the search results or will be used to navigate between entries. The scripts will also be used when updating data in the new content management system for the dictionary when I write it. I have extracted 85,397 parts of speech, 31,213 cross references, 150,077 forms and their types (lemma / variant / deviant) and 86,269 labels which correspond to one of 157 unique labels (usage or semantic), which I also extracted.
I have also finished work on the quick search feature, which is now fully operational. This involved creating a new endpoint in the API for processing the search. This includes the query for the predictive search (i.e. the drop-down list of possible options that appears as you type), which returns any forms that match what you’re typing in, and the query for the full quick search, which allows you to use ‘?’ and ‘*’ wildcards (and also “” for an exact match) and returns all of the data about each entry that is needed for the search results page. For example, if you type in ‘from’ in the ‘Quick Search’ box a drop-down list containing all matching forms will appear. Note that these are forms rather than just headwords, so they include not only lemmas but also variants and deviants. If you select a form that is associated with one single entry then the entry’s page will load. If you select a form that is associated with more than one entry then the search results page will load. You can also choose not to select an item from the drop-down list and search for whatever you’re interested in. For example, enter ‘*ment’ and press enter or the search button to view all of the forms ending in ‘ment’, as the following screenshot demonstrates (note that this is not the final user interface but one purely for test purposes):
With this example you’ll see that the results are paginated, with 100 results per page. You can browse through the pages using the next and previous buttons or select one of the pages to jump directly to it. You can bookmark specific results pages too. Currently the search results display the lemma and homonym number (if applicable) and display whether the entry is an xref or not. Associated parts of speech appear after the lemma. Each one currently has a tooltip and we can add in descriptions of what each POS abbreviation means, although these might not be needed. All of the variant / deviant forms are also displayed as otherwise it can be quite confusing for users if the lemma does not match the term the user entered but a form does. All associated semantic / usage labels are also displayed. I’m also intending to add in earliest citation date and possibly translations to the results as well, but I haven’t extracted them yet.
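The ‘?’ and ‘*’ wildcards map naturally onto SQL’s LIKE placeholders (‘_’ and ‘%’); a minimal sketch of the translation step involved (illustrative only – not the API’s actual code):

```python
def wildcard_to_like(term):
    """Convert the quick-search wildcards to a SQL LIKE pattern:
    '?' -> '_' (exactly one character), '*' -> '%' (any run of
    characters). Literal '%' and '_' in the input are escaped."""
    out = []
    for ch in term:
        if ch == "?":
            out.append("_")
        elif ch == "*":
            out.append("%")
        elif ch in ("%", "_"):
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)
```

The resulting pattern would then be passed to the database as a bound parameter in a LIKE clause.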
When you click on an entry from the search results this loads the corresponding entry page. I have updated this to add in tabs to the left-hand column. In addition to the ‘Browse’ tab there is a ‘Results’ tab and a ‘Log’ tab. The latter doesn’t contain anything yet, but the former contains the search results. This allows you to browse up and down the search results in the same way as the regular ‘browse’ feature, selecting another entry. You can also return to the full results page. I still need to do some tweaking to this feature, such as ensuring the ‘Results’ tab loads by default if coming from a search result. The ‘clear’ option also doesn’t currently work properly. I’ll continue with this next week.
For the Books and Borrowing project I spent a bit of time getting the page images for the Westerkirk library uploaded to the server and the page records created for each corresponding page image. I also made some final tweaks to the Glasgow Students pilot website that Matthew Sangster and I worked on and this is now live and available here: https://18c-borrowing.glasgow.ac.uk/.
There are three new place-name related projects starting up at the moment and I spent some time creating initial websites for all of these. I still need to add in the place-name content management systems for two of them, and I’m hoping to find some time to work on this next week. I also spoke to Joanna Kopaczyk about a website for an RSE proposal she’s currently putting together and gave some advice to some people in Special Collections about a project that they are planning.
On Tuesday I had a Zoom call with the ‘Editing Robert Burns’ people to discuss developing the website for phase two of the Editing Robert Burns project. We discussed how the website would integrate with the existing website (https://burnsc21.glasgow.ac.uk/) and discussed some of the features that would be present on the new site, such as an interactive map of Burns’ correspondence and a database of forged items.
I also had a meeting with the Historical Thesaurus people on Tuesday and spent some time this week continuing to work on the extraction of dates from the OED data, which will feed into a new second edition of the HT. I fixed all of the ‘dot’ dates in the HT data. This is where there isn’t a specific date and dots are used instead (e.g. ‘14..’); sometimes a specific year is given in the year attribute (e.g. 1432), but at other times a more general year is given (e.g. 1400). We worked out a set of rules for dealing with these and I created a script to process them. I then reworked my script that extracts dates for all lexemes that match a specific date pattern (YYYY-YYYY, where the first year might be Old English and the last year might be ‘Current’) and sent this to Fraser so that the team can decide which of these dates should be used in the new version of the HT. Next week I’ll begin work on a new version of the HT website that uses an updated dataset so we can compare the original dates with the newly updated ones.
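As a flavour of the kind of check involved (the actual rules the team agreed are more involved than this; the function below is purely illustrative):

```python
def dot_date_is_specific(display_date, year_attr):
    """A '14..' display date paired with year=1432 carries a specific
    year; paired with the round century year=1400 it is only a general
    century date. (Illustrative simplification of the project's rules.)"""
    century = int(display_date.replace(".", "0"))
    return year_attr != century
```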
Week 16 of Lockdown and still working from home. I continued working on the data import for the Books and Borrowers project this week. I wrote a script to import data from Haddington, which took some time due to the large number of additional fields in the data (15 across Borrowers, Holdings and Borrowings), but executing it resulted in a further 5,163 borrowing records across 2 ledgers and 494 pages being added, including 1,399 book holding records and 717 borrowers.
I then moved onto the datasets from Leighton and Wigtown. Leighton was a much smaller dataset, with just 193 borrowing records over 18 pages in one ledger and involving 18 borrowers and 71 books. As before, I have just created book holding records for these (rather than project-wide edition records), although in this case there are authors for books too, which I have also created. Wigtown was another smaller dataset. The spreadsheet has three sheets: the first is a list of borrowers, the second a list of borrowings and the third a list of books. However, no unique identifiers are used to connect the borrowers and books to the information in the borrowings sheet and there’s no other field that matches across the sheets to allow the data to be automatically connected up. For example, in the Books sheet there is the book ‘History of Edinburgh’ by author ‘Arnot, Hugo’, but in the borrowings tab author surname and forename are split into different columns (so ‘Arnot’ and ‘Hugo’) and book titles don’t match (in this case the book appears as simply ‘Edinburgh’ in the borrowings). Therefore I’ve not been able to automatically pull in the information from the books sheet. However, as there are only 59 books in the books sheet it shouldn’t take too much time to manually add the necessary data when creating Edition records. It’s a similar issue with Borrowers in the first sheet – they appear with name in one column (e.g. ‘Douglas, Andrew’) but in the Borrowings sheet the names are split into separate forename and surname columns. There are also instances of people with the same name (e.g. ‘Stewart, John’) but without unique identifiers there’s no way to differentiate these. There are only 110 people listed in the Borrowers sheet, and only 43 in the actual borrowing data, so again, it’s probably better if any details that are required are added in manually.
I imported a total of 898 borrowing records for Wigtown. As there is no page or ledger information in the data I just added these all to one page in a made-up ledger. It does however mean that the page can take quite a while to load in the CMS. There are 43 associated borrowers and 53 associated books, which again have been created as Holding records only and have associated authors. However, there are multiple Book Items created for many of these 53 books – there are actually 224 book items. This is because the spreadsheet contains a separate ‘Volume’ column and a book may be listed with the same title but a different volume. In such cases a Holding record is made for the book (e.g. ‘Decline and Fall of Rome’) and an Item is made for each Volume that appears (in this case 12 items for the listed volumes 1-12 across the dataset). With these datasets imported I have now processed all of the existing data I have access to, other than the Glasgow Professors borrowing records, but these are still being worked on.
I did some other tasks for the project this week as well, including reviewing the digitisation policy document for the project, which lists guidelines for the team to follow when they have to take photos of ledger pages themselves in libraries where no professional digitisation service is available. I also discussed how borrower occupations will be handled in the system with Katie.
In addition to the Books and Borrowers project I found time to work on a number of other projects this week too. I wrote a Data Management Plan for an AHRC Networking proposal that Carolyn Jess-Cooke in English Literature is putting together and I had an email conversation with Heather Pagan of the Anglo-Norman Dictionary about the Data Management Plan she wants me to write for a new AHRC proposal that Glasgow will be involved with. I responded to a query about a place-names project from Thomas Clancy, a query about App certification from Brian McKenna in IT Services and a query about domain name registration from Eleanor Lawson at QMU. Also (outside of work time) I’ve been helping my brother-in-law set up Beacon Genealogy, through which he offers genealogy and family history research services.
Also this week I worked with Jennifer Smith to make a number of changes to the content of the SCOSYA website (https://scotssyntaxatlas.ac.uk/) to provide more information about the project for REF purposes and I added a new dataset to the interactive map of Burns Suppers that I’m creating for Paul Malgrati in Scottish Literature. I also went through all of the WordPress sites I manage and upgraded them to the most recent version of WordPress.
Finally, I spent some time writing scripts for the DSL people to help identify child entries in the DOST and SND datasets that haven’t been properly merged with main entries when exported from their editing software. In such cases the child entries have been added to the main entries, but then they haven’t been removed as separate entries in the output data, meaning the child entries appear twice. When attempting to process the SND data I discovered there were some errors in the XML file (mismatched tags) that prevented my script from processing the file, so I had to spend some time tracking these down and fixing them. But once this had been done my script could go through the entire dataset, look for an ID that appeared as a URL in one entry and as an ID of another entry and in such cases pull out the IDs and the full XML of each entry and export it into an HTML table. There were about 180 duplicate child entries in DOST but a lot more in SND (the DOST file is about 1.5MB, the SND one is about 50MB). Hopefully once the DSL people have analysed the data we can then strip out the unnecessary child entries and have a better dataset to import into the new editing system the DSL is going to be using.
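The core of the duplicate check is just two passes over the XML with regular expressions; a simplified sketch (the attribute names ‘id’ and ‘url’ here are assumptions – the real DOST/SND markup differs, and my script also exported the full XML of each matching entry):

```python
import re

def find_duplicate_children(xml_text):
    """Report entry ids that also appear as a URL reference inside
    another entry, i.e. child entries present twice in the output."""
    ids = set(re.findall(r'<entry[^>]*\bid="([^"]+)"', xml_text))
    referenced = set(re.findall(r'\burl="([^"]+)"', xml_text))
    return sorted(ids & referenced)
```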
During week 11 of Lockdown I continued to work on the Books and Borrowing project, but also spent a fair amount of time catching up with other projects that I’d had to put to one side due to the development of the Books and Borrowing content management system. This included reading through the proposal documentation for Jennifer Smith’s follow-on funding application for SCOSYA, writing a new version of the Data Management Plan based on this updated documentation and making some changes to the ‘export data for print publication’ facility for Carole Hough’s REELS project. I also spent some time creating a new export facility to format the place-name elements and any associated place-names for print publication too.
During this week a number of SSL certificates expired for a bunch of websites, which meant browsers were displaying scary warning messages when people visited the sites. I had to spend a bit of time tracking these down and passing the details over to Arts IT Support for them to fix as it is not something I have access rights to do myself. I also liaised with Mike Black to migrate some websites over from the server that houses many project websites to a new server. This is because the old server is running out of space and is getting rather temperamental and freeing up some space should address the issue.
I also made some further tweaks to Paul Malgrati’s interactive map of Burns’ Suppers and created a new WordPress-powered project website for Matthew Creasy’s new ‘Scottish Cosmopolitanism at the Fin de Siècle’ project. This included the usual choosing a theme, colour schemes and fonts, adding in header images and footer logos and creating initial versions of the main pages of the site. I’d also received a query from Jane Stuart-Smith about the audio recordings in the SCOTS Corpus so I did a bit of investigation about that.
Fraser Dallachy had got back to me with some further tasks for me to carry out on the processing of dates for the Historical Thesaurus, and I had intended to spend some time on this towards the end of the week, but when I began to look into it I realised that the scripts I’d written to process the old HT dates (comprising 23 different fields) and to generate the new, streamlined date system that uses a related table with just 6 fields were sitting on my PC in my office at work. Usually all the scripts I work on are located on a server, meaning I can easily access them from anywhere by connecting to the server and downloading them. However, sometimes I can’t run the scripts on the server as they may need to be left running for hours (or sometimes days) if they’re processing large amounts of data or performing intensive tasks on the data. In these cases the scripts run directly on my office PC, and this was the situation with the dates script. I realised I would need to get into my office at work to retrieve the scripts, so I put in a request to be allowed into work. Staff are not currently allowed to just go into work – instead you need to get approval from your Head of School and then arrange a time that suits security. Thankfully it looks like I’ll be able to go in early next week.
Other than these issues, I spent my time continuing to work for the Books and Borrowing project. On Tuesday we had a Zoom call with all six members of the core project team, during which I demonstrated the CMS as it currently stands. This gave me an opportunity to demonstrate the new Author association facilities I had created last week. The demonstration all went very smoothly and I think the team are happy with how the system works, although no doubt once they actually begin to use it there will be bugs to fix and workflows to tweak. I also spent some time before the meeting testing the system again, and fixing some issues that were not quite right with the author system.
I spent the remainder of my time on the project completing work on the facility to add, edit and view book holding records directly via the library page, as opposed to doing so whilst adding / editing a borrowing record. I also implemented a similar facility for borrowers as well. Next week I will begin to import some of the sample data from various libraries into the system and will allow the team to access the system to test it out.
This was week 8 of Lockdown and I spent the majority of it working on the content management system for the Books and Borrowing project. The project is due to begin at the start of June and I’m hoping to have the CMS completed and ready to use by the project team by then, although there is an awful lot to try and get into place. I can’t really go into too much detail about the CMS, but I have completed the pages to add a library and to browse a list of libraries with the option of deleting a library if it doesn’t have any ledgers. I’ve also done quite a lot with the ‘View library’ page. It’s possible to edit a library record, add a ledger and add / edit / delete additional fields for a library. You can also list all of the ledgers in a library with options to edit the ledger, delete it (if it contains no pages) and add a new page to it. You can also display a list of pages in a ledger, with options to edit the page or delete it (if it contains no records). You can also open a page in the ledger and browse through the next and previous pages.
At the moment I’m in the middle of creating the facility to add a new borrowing record to the page. This is the most complex part of the system as a record may have multiple borrowers, each of which may have multiple occupations, and multiple books, each of which may be associated with higher level book records. Plus the additional fields for the library need to be taken into consideration too. By the end of the week I was at the point of adding in an auto-complete to select an existing borrower record and I’ll continue with this on Monday.
In addition to the B&B project I did some work for other projects as well. For Thomas Clancy’s Place-names of Kirkcudbrightshire project (now renamed Place-names of the Galloway Glens) I had a few tweaks and updates to put in place before Thomas launched the site on Tuesday. I added a ‘Search place-names’ box to the right-hand column of every non-place-names page, which takes you to the quick search results page, and a ‘Place-names’ menu item to the site menu so users can access the place-names part of the site. Every place-names page now features a sub-menu with access to the place-names pages (Browse, element glossary, advanced search, API, quick search), and to return to the place-names introductory page you can click on the ‘Place-names’ link in the main menu bar. I had unfortunately introduced a bug to the ‘edit place-name’ page in the CMS when I changed the ordering of parishes to make KCB parishes appear first. This was preventing any place-names in BMC from having their cross references, feature type and parishes saved when the form was submitted. This has now been fixed. I also added Google Analytics to the site. The virtual launch on Tuesday went well and the site can now be accessed here: https://kcb-placenames.glasgow.ac.uk/.
I also added links to the DSL’s email and Instagram accounts to the footer of the DSL site and added some new fields to the database and CMS of the Place-names of Mull and Ulva site. I created a new version of the Burns Supper map for Paul Malgrati that includes more data and a new field for video dimensions, which the video overlay now uses. Finally, I replied to Matthew Creasy about a query regarding the website for his new Scottish Cosmopolitanism project and to a query from Jane Roberts about the Thesaurus of Old English, and made a small tweak to the data of Gerry McKeever’s interactive map for Regional Romanticism.
Week seven of lockdown continued in much the same fashion as the preceding weeks, the only difference being that Friday was a holiday to mark the 75th anniversary of VE Day. I spent much of the four working days on the development of the content management system for the Books and Borrowing project. The project RAs will start using the system in June and I’m aiming to get everything up and running before then, so this is my main focus at the moment. I also had a Zoom meeting with project PI Katie Halsey and Co-I Matt Sangster on Tuesday to discuss the requirements document I’d completed last week and the underlying data structures I’d defined in the weeks before. Both Katie and Matt were very happy with the document, although Matt had a few changes he wanted made to the underlying data structures and the CMS. I made the necessary changes to the data design / requirements document and the project’s database that I’d set up last week. The changes were:
- Borrowing spans have been removed from libraries and will instead be inferred automatically from the start and end dates of the ledger records held in each library.
- Ledgers now have a new ‘ledger type’ field, which currently allows a choice of ‘Professorial’, ‘Student’ or ‘Town’. This field will allow borrowing spans for libraries to be altered based on a selected ledger type.
- The way occupations for borrowers are recorded has been updated to enable both original occupations from the records and a normalised list of occupations to be recorded. Borrowers may not have an original occupation but might still have a standardised occupation, so the occupations table as previously designed will hold information about standardised occupations, and a borrower may have multiple standardised occupations. A new ‘original occupation’ field on the borrower record can hold any number of occupations found for the borrower in the original documentation (e.g. river watcher).
- The Book Edition table now has an ‘other authority URL’ field and an ‘other authority type’ field, which can be used if ESTC is not appropriate. The ‘type’ currently features ‘Worldcat’, ‘CERL’ and ‘Other’.
- ‘Language’ has been moved from Holding to Edition.
- In Book Holding, the short title is now the original title and the long title is now the standardised title, while the place and date of publication fields have been removed as the comparable fields at Edition level will be sufficient.
In terms of the development of the CMS, I created a Bootstrap-based interface for the system, which currently just uses the colour scheme I used for Matt’s pilot 18th Century Borrowing project. I created the user authentication scripts and the menu structure and then started to create the actual pages. So far I’ve created a page to add a new library record and all of the information associated with a library, such as any number of sources. I then created the facility to browse and delete libraries and the main ‘view library’ page, which will act as a hub through which all book and borrowing records associated with the library will be managed. This page has a further tab-based menu with options to allow the RA to view / add ledgers, additional fields, books and borrowers, plus the option to edit the main library information. So far I’ve completed the page to edit the library information and have started work on the page to add a ledger. I’m making pretty good progress with the CMS, but there is still a lot left to do. Here’s a screenshot of the CMS if you’re interested in how it looks:
Also this week I had a Zoom meeting with Marc Alexander and Fraser Dallachy to discuss updates to the Historical Thesaurus as we head towards a second edition. This will include adding in new words from the OED and new dates for existing words. My new date structure will also go live, so there will need to be changes to how the timelines work. Marc is hoping to go live with the new updates in August. We also discussed the ‘guess the category’ quiz, with Marc and Fraser having some ideas about limiting the quiz to certain categories, or excluding categories that might feature inappropriate content. We may also introduce a difficulty level based on date, with an ‘easy’ version only containing words that were in use for a decent span of time in the past 200 years.
Other work I did this week included making some tweaks to the data for Gerry McKeever’s interactive map, fixing an issue with videos continuing to play after the video overlay was closed for Paul Malgrati’s Burns Supper map, replying to a query from Alasdair Whyte about his Place-names of Mull and Ulva project and looking into an issue for Fraser’s Scots Thesaurus project, which unfortunately I can’t do anything about as the scripts I’d created for this (which needed to be left running for several days) are on the computer in my office. If this lockdown ever ends I’ll need to tackle the issue then.
I was on holiday for all of last week and Monday and Tuesday this week. My son and I were supposed to be visiting my parents for Easter, but we were unable to do so due to the lockdown and instead had to find things to amuse ourselves with around the house. I answered a few work emails during this time, including alerting Arts IT Support to some issues with the WordPress server and responding to a query from Ann Fergusson at the DSL. I returned to work (from home, of course) on Wednesday and spent the three days working on various projects.
For the Books and Borrowers project I spent some time downloading and looking through the digitised and transcribed borrowing registers of St. Andrews. They have made three registers from the second half of the 18th century available via a Wiki interface (see https://arts.st-andrews.ac.uk/transcribe/index.php?title=Main_Page) and we were given access to all of these materials that had been extracted and processed by Patrick McCann, who I used to work very closely with back when we were both based at HATII and worked for the Digital Curation Centre. Having looked through the materials it’s clear that we will be able to use the transcriptions, which will be a big help. The dates will probably need to be manually normalised, though, and we will need access to higher resolution images than the ones we have been given in order to make a zoom and pan interface using them.
I also updated the introductory text for Gerry McKeever’s interactive map of the novel Paul Jones, and I think this feature is now ready to go live whenever Gerry wants to launch it. I also fixed an issue with the Editing Robert Burns website that was preventing the site editors (namely Craig Lamont) from editing blog posts. I also created a further new version of the Burns Supper map for Paul Malgrati. This version incorporates updated data, which has greatly increased the number of Suppers that appear on the map, and I also changed the way videos work. Previously, if an entry had a link to a video then a button was added to the entry that linked through to the externally hosted video site (which could be YouTube, Facebook, Twitter or some other site). Instead, the code now identifies the origin of the video and I’ve managed to embed players from YouTube, Facebook and Twitter. These now open the videos in the same drop-down overlay as the images. The YouTube and Facebook players are centre aligned but unfortunately Twitter’s player displays to the left and can’t be altered. Also, the YouTube and Facebook players expect the width and height of the player to be specified. I’ve taken these from the available videos, but ideally the desired height and width should be stored as separate columns in the spreadsheet so these can be applied to each video as required. Currently all YouTube and all Facebook videos have the same width and height, which can mean landscape-oriented Facebook videos appear rather small, for example. Also, some videos can’t be embedded due to their settings (e.g. the Singapore Facebook video). However, I’ve added a ‘watch video’ button underneath the player so people can always click through to the original posting.
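Identifying the origin of a video can be as simple as inspecting the host part of its URL and mapping it to the appropriate embedded player. The sketch below shows one way to do this in Python; it is an assumption about the approach, not the actual map code, which runs in JavaScript on the page.

```python
from urllib.parse import urlparse

def video_origin(url):
    """Classify a video link by its host so the matching embedded player
    (YouTube, Facebook or Twitter) can be used; anything unrecognised
    falls back to a plain 'watch video' link."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ("youtube.com", "youtu.be"):
        return "youtube"
    if host == "facebook.com":
        return "facebook"
    if host == "twitter.com":
        return "twitter"
    return "other"

print(video_origin("https://www.youtube.com/watch?v=abc123"))  # youtube
```

The 'other' case is what drives the fall-back 'watch video' button mentioned above: anything the code can't embed still gets a working link to the original posting.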
I also responded to a query from Rhona Alcorn about how DSL data exported from their new editing system will be incorporated into the live DSL site, responded to a query from Thomas Clancy about making updates to the Place-names of Kirkcudbrightshire website and responded to a query from Kirsteen McCue about an AHRC proposal she’s putting together.
I returned to looking at the ‘guess the category’ quiz that I’d created for the Historical Thesaurus before the Easter holidays and updated the way it worked. I reworked the way the database is queried so as to make things more efficient, to ensure the same category isn’t picked as more than one of the four options and to ensure that the selected word isn’t also found in one of the three ‘wrong’ category choices. I also decided to update the category table to include two new columns, one that holds a count of the number of lexemes that have a ‘wordoed’ and the other that holds a count of the number of lexemes that have a ‘wordoe’ in each category. I then ran a script that generated these figures for all 250,000 or so categories. This is really just caching information that can be gleaned from a query anyway, but it makes querying a lot faster and makes it easier to pinpoint categories of a particular size, and I think these columns will be useful for tasks beyond the quiz (e.g. show me the 10 largest Aj categories). I then created a new script that queries the database using these columns and returns data for the quiz.
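The caching idea can be sketched with SQLite standing in for the real database. The table and column names below (`oedcount`, `oecount`) follow the description above, but the schema is heavily simplified and the sample data is invented for illustration.

```python
import sqlite3

# Simplified stand-ins for the HT category and lexeme tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE category (id INTEGER PRIMARY KEY, "
            "oedcount INTEGER, oecount INTEGER)")
cur.execute("CREATE TABLE lexeme (catid INTEGER, wordoed TEXT, wordoe TEXT)")
cur.executemany("INSERT INTO category VALUES (?, 0, 0)", [(1,), (2,)])
cur.executemany("INSERT INTO lexeme VALUES (?, ?, ?)", [
    (1, "sempiternal", None),
    (1, "everlasting", "ece"),
    (2, None, "eall"),
])

# Cache the per-category counts once, so later queries need no joins
# or aggregate scans over the lexeme table.
cur.execute("""UPDATE category SET
    oedcount = (SELECT COUNT(*) FROM lexeme
                WHERE catid = category.id AND wordoed IS NOT NULL),
    oecount  = (SELECT COUNT(*) FROM lexeme
                WHERE catid = category.id AND wordoe IS NOT NULL)""")

# A quiz question can now pick a random category with at least two OED
# words using the cached column alone.
rows = cur.execute("SELECT id FROM category WHERE oedcount >= 2 "
                   "ORDER BY RANDOM() LIMIT 1").fetchall()
print(rows)
```

The trade-off is the usual one with denormalised counts: they must be regenerated (or incrementally updated) whenever lexemes change, but reads become cheap enough for size-based queries like 'the 10 largest Aj categories'.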
This script is much more streamlined and considerably less prone to getting stuck in loops of finding nothing but unsuitable categories. Currently the script is set to only bring back categories that have at least two OED words in them, but this could easily be changed to target larger categories only (which would presumably make the quiz more of a challenge). I could also add in a check to exclude any words that are also found in the category name to increase the challenge further. The actual quiz page itself was pretty much unaltered during these updates, but I did add in a ‘loading’ spinner, which helps the transition between questions.
I’ve also created an Old English version of the quiz which works in the same way except the date of the word isn’t displayed and the ‘wordoe’ column is used. Getting 5/5 on this one is definitely more of a challenge! Here’s an example question:
I spent the rest of the week upgrading all of the WordPress sites I manage to the latest WordPress release. This took quite a bit of time as I had to track down the credentials for each site, many of which I didn’t already have a note of at home. There were also some issues with some of the sites that I needed to get Arts IT Support to sort out (e.g. broken SSL certificates, sites with the login page blocked even when using the VPN). By the end of the week all of the sites were sorted.
This was the second week of the Coronavirus lockdown and I followed a similar arrangement to last week, managing to get a pretty decent amount of work done in between home-schooling sessions for my son. I spent most of my time working for the Books and Borrowing project. I had a useful conference call with the PI Katie Halsey and Co-I Matt Sangster last week, and the main outcome of that meeting for me was that I’d further expand upon the data design document I’d previously started in order to bring it into line with our understanding of the project’s requirements. This involved some major reworking of the entity-relationship diagram I had previously designed based on my work with the sample datasets, with the database structure increasing from 11 related tables to 21, incorporating a new system to trace books and their authors across different libraries, to include borrower cross-references and to greatly increase the data recorded about libraries. I engaged in many email conversations with Katie and Matt over the course of the week as I worked on the document, and on Friday I sent them a finalised version consisting of 34 pages and more than 7,000 words. This is still an ‘in progress’ version and will no doubt need further tweaks based on feedback and also as I build the system, but I’d say it’s a pretty solid starting point. My next step will be to add a new section to the document that describes the various features of the content management system that will connect to the database and enable the project’s RAs to add and edit data in a streamlined and efficient way.
Also this week I did some further work for the DSL people, who have noticed some inconsistencies with the way their data is stored in their own records compared to how it appears in the new editing system that they are using. I wasn’t directly involved in the process of getting their data into the new editing system but spent some time going through old emails, looking at the data and trying to figure out what might have happened. I also had a conference call with Marc Alexander and the Anglo-Norman Dictionary people to discuss the redevelopment of their website. It looks like this will be going ahead and I will be doing the redevelopment work. I’ll try to start on this after Easter, with my first task being the creation of a design document that will map out exactly what features the new site will include and how these relate to the existing site. I also need to help the AND people to try and export the most recent version of their data from the server as the version they have access to is more than a year old. We’re going to aim to relaunch the site in November, all being well.
I also had a chat with Fraser Dallachy about the new quiz I’m developing for the Historical Thesaurus. Fraser had a couple of good ideas about the quiz (e.g. making versions for Old and Middle English) that I’ll need to see about implementing in the coming weeks. I also had an email conversation with the other developers in the College of Arts about documenting the technologies that we use or have used in the past for projects and made a couple of further tweaks to the Burns Supper map based on feedback from Paul Malgrati.
I’m going to be on holiday next week and won’t be back to work until Wednesday the 15th of April so there won’t be any further updates from me for a while.
This was the first full week of the Coronavirus lockdown and as such I was working from home while also having to look after my nine-year-old son, who is also at home on lockdown. My wife and I have arranged to split the days into morning and afternoon shifts, with one of us home-schooling our son while the other works during each shift, and extra work squeezed in before and after these shifts. The arrangement has worked pretty well for all of us this week and I’ve managed to get a fair amount of work done.
This included spotting and requesting fixes for a number of other sites that had started to display scary warnings about their SSL certificates, working on an updated version of the Data Management Plan for the SCOSYA follow-on proposal, fixing some log-in and account related issues for the DSL people and helping Carolyn Jess-Cooke in English Literature with some technical issues relating to a WordPress blog she has set up for a ‘Stay at home’ literary festival (https://stayathomefest.wordpress.com/). I also had a conference call with Katie Halsey and Matt Sangster about the Books and Borrowers project, which is due to start at the beginning of June. It was my first time using the Zoom videoconferencing software and it worked very well, other than my cat trying to participate several times. We had a good call and made some plans for the coming weeks and months. I’m going to try and get an initial version of the content management system and database for the project in place before the official start of the project so that the RAs will be able to use this straight away. This is of even greater importance now as they are likely to be limited in the kinds of research activities they can do at the start of the project because of travel restrictions and will need to work with digital materials.
Other than these issues I divided my time between three projects. The first was the Burns Supper map for Paul Malgrati in Scottish Literature. Paul had sent me some images that are to be used in the map and I spent some time integrating these. The image appears as a thumbnail with credit text (if available) appearing underneath. If there is a link to the place the image was taken from, the credit text appears as a link. Clicking on the image thumbnail opens the full image in a new tab. I also added links to the videos where applicable, but I decided not to embed the videos in the page as I think these would be too small and there would be just too much going on for locations that have both videos and an image. Paul also wanted clusters to be limited by area (e.g. a cluster for Scotland rather than these just being amalgamated with a big cluster for Europe when zooming out) and I investigated this. I discovered that it is possible to create groups of locations: for example, a new column in the spreadsheet named ‘cluster’ could hold ‘Scotland’ for all the Scottish locations, or ‘South America’ for all the South American ones. These groups would then be the top-level clusters and would not be further amalgamated on zoom out. Once Paul gets back to me with the clusters he would like for the data I’ll update things further. Below is an image of the map with the photos embedded:
The second major project I worked on was the interactive map for Gerry McKeever’s Regional Romanticism project. Gerry had got back to me with a new version of the data he’d been working on and some feedback from other people he’d sent the map to. I created a new version of the map featuring the new data and incorporated some changes to how the map worked based on feedback, namely I moved the navigation buttons to the top of the story pane and have made them bigger, with a new white dividing line between the buttons and the rest of the pane. This hopefully makes them more obvious to people and means the buttons are immediately visible rather than people potentially having to scroll to see them. I’ve also replaced the directional arrows with thicker chevron icons and have changed the ‘Return to start’ button to ‘Restart’. I’ve also made the ‘Next’ button on both the overview and the first slide blink every few seconds, at Gerry’s request. Hopefully this won’t be too annoying for people. Finally I made the slide number bigger too. Here’s a screenshot of how things currently look:
I then decided to chain several questions together to make the quiz more fun. Once the correct answer is given a ‘Next’ button appears, leading to a new question. I set up a ‘max questions’ variable that controls how many questions there are (e.g. 3, 5 or 10) and the questions keep coming until this number is reached. When the number is reached the user can then view a summary that tells them which words and (correct) categories were included, provides links to the categories and gives the user an overall score. I decided that if the user guesses correctly the first time they should get one star. If they guess correctly a second time they get half a star and any more guesses get no stars. The summary and star ratings for each question are also displayed as the following screenshot shows:
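The scoring scheme described above is easy to express as a small function. This is a sketch of the logic in Python rather than the quiz's own front-end code, and the function names are illustrative.

```python
def stars_for_question(guesses):
    """One star for a correct first guess, half a star for a correct
    second guess, and nothing for any later guesses."""
    if guesses == 1:
        return 1.0
    if guesses == 2:
        return 0.5
    return 0.0

def quiz_score(guess_counts):
    """Overall quiz score: the sum of per-question star ratings, shown
    alongside the summary of words and correct categories."""
    return sum(stars_for_question(g) for g in guess_counts)

# Five questions: three first-time answers, one second-time, one slower.
print(quiz_score([1, 2, 1, 4, 1]))  # 3.5
```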
It’s shaping up pretty nicely, but I still need to work on the script that exports data from the database. Identifying random categories that contain at least one non-OE word and are of the same part of speech as the first randomly chosen category currently means hundreds or even thousands of database calls before a suitable category is returned. This is inefficient and occasionally the script was getting caught in a loop and timing out before it found a suitable category. I managed to catch this by having some sample data that loads if a suitable category isn’t found after 1000 attempts, but it’s not ideal. I’ll need to work on this some more over the next few weeks as time allows.
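The safeguard described above, capping the random search at 1,000 attempts and falling back to known-good sample data, looks roughly like this. The field names (`pos`, `oedcount`) and the function itself are hypothetical stand-ins for the actual export script.

```python
import random

MAX_ATTEMPTS = 1000  # give up after this many random picks

def pick_wrong_category(categories, pos, exclude_ids, fallback):
    """Repeatedly pick a random category until one matches the required
    part of speech, contains at least one non-OE word and isn't already
    in use as an option; return a known-good fallback rather than
    looping until the script times out."""
    for _ in range(MAX_ATTEMPTS):
        cat = random.choice(categories)
        if (cat["pos"] == pos and cat["oedcount"] > 0
                and cat["id"] not in exclude_ids):
            return cat
    return fallback
```

A better long-term fix, as noted, is to filter the candidate pool in the database query itself so that every random pick is already suitable, which removes the need for the retry loop entirely.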