Week Beginning 21st November 2022

I participated in the UCU strike action on Thursday and Friday this week, so it was a three-day week for me.  I spent much of this time researching RDF, SKOS and OWL semantic web technologies in an attempt to understand them better in the hope that they might be of some use for future thesaurus projects.  I’ll be giving a paper about this at a workshop in January so I’m not going to say too much about my investigations here.  There is a lot to learn about, however, and I can see me spending quite a lot more time on this in the coming weeks.

Other than this I returned to working on the Anglo-Norman Dictionary.  I added in a line of text that had somehow been omitted from one of the Textbase XML files and added in a facility to enable project staff to delete an entry from the dictionary.  In reality this just deactivates the entry, removing it from the front-end but still keeping a record of it in the database in case the entry needs to be reinstated.  I also spoke to the editor about some proposed changes to the dictionary management system and begin to think about how these new features will function and how they will be developed.

For the Books and Borrowing project I had a chat with IT Services at Stirling about setting up an Apache Solr system for the project.  It’s looking like we will be able to proceed with this option, which will be great.  I also had a chat with Jennifer Smith about the new Speak For Yersel project areas.  It looks like I’ll be creating the new resources around February next year.  I also fixed an issue with the Place-names of Iona data export tool and discussed a new label that will be applied to data for the ‘About’ box for entries in the Dictionaries of the Scots Language.

I also prepared for next week’s interview panel and engaged in a discussion with IT Services about the future of the servers that are hosted in the College of Arts.

Week Beginning 7th November 2022

I participated in an event about Digital Humanities in the College of Arts that Luca had organised on Monday, at which I discussed the Books and Borrowing project.  It was a good event and I hope there will be more like it in future.  Luca also discussed a couple of his projects and mentioned that the new Curious Travellers project is using Transkribus (https://readcoop.eu/transkribus/) which is an OCR / text recognition tool for both printed text and handwriting that I’ve been interested in for a while but haven’t yet needed to use for a project.  I will be very interested to hear how Curious Travellers gets on with the tool in future.  Luca also mentioned a tool called Voyant (https://voyant-tools.org/) that I’d never heard of before that allows you to upload a text and then access many analysis and visualisation tools.  It looks like it has a lot of potential and I’ll need to investigate it more thoroughly in future.

Also this week I had to prepare for and participate a candidate shortlisting session for a new systems developer post in the College of Arts and Luca and I had a further meeting with Liz Broe of College of Arts admin about security issues relating to the servers and websites we host.  We need to improve the chain of communication from Central IT Services to people like me and Luca so that security issues that are identified can be addressed speedily.  As of yet we’ve still not heard anything further from IT Services so I have no idea what these security issues are, whether they actually relate to any websites I’m in charge of and whether these issues relate to the code or the underlying server infrastructure.  Hopefully we’ll hear more soon.

The above took a fair bit of time out of my week and I spent most of the remainder of the week working on the Books and Borrowing project.  One of the project RAs had spotted an issue with a library register page appearing out of sequence so I spent a little time rectifying that.  Other than that I continued to develop the front-end, working on the quick search that I had begun last week and by the end of the week I was still very much in the middle of working through the quick search and the presentation of the search results.

I have an initial version of the search working now and I created an index page on the test site I’m working on that features a quick search box.  This is just a temporary page for test purposes – eventually the quick search box will appear in the header of every page.  The quick search does now work for both dates using the pattern matching I discussed last week and for all other fields that the quick search needs to cover.  For example, you can now view all of the borrowing records with a borrowed date between February 1790 and September 1792 (1790/02-1792/09) which returns 3426 borrowing records.  Results are paginated with 100 records per page and options to navigate between pages appear at the top and bottom of each results page.

The search results currently display the complete borrowing record for each result, which is the same layout as you find for borrowing records on a page.  The only difference is additional information about the library, register and page the borrowing record appears on can be found at the top of the record.  These appear as links and if you press on the page link this will open the page centred on the selected borrowing record.  For date searches the borrowing date for each record is highlighted in yellow, as you can see in the screenshot below:

The non-date search also works, but is currently a bit too slow.  For example a search for all borrowing records that mention ‘Xenophon’ takes a few seconds to load, which is too long.  Currently non-date quick searches do a very simple find and replace to highlight the matched text in all relevant fields.  This currently makes the matched text upper case, but I don’t intend to leave it like this.  You can also search for things like the ESTC too.

However, there are several things I’m not especially happy about:

  1. The speed issue: the current approach is just too slow
  2. Ordering the results: currently there are no ordering options because the non-date quick search performs five different queries that return borrowing IDs and these are then just bundled together. To work out the ordering (such as by date borrowed, by borrower name)  many more fields in addition to borrowing ID would need to be returned, potentially for thousands of records and this is going to be too slow with the current data structure
  3. The search results themselves are a bit overwhelming for users, as you can see from the above screenshot. There is so much data it’s a bit hard to figure out what you’re interested in and I will need input from the project team as to what we should do about this.  Should we have a more compact view of results?  If so what data should be displayed?  The difficulty is if we omit a field that is the only field that includes the user’s search term it’s potentially going to be very confusing
  4. This wasn’t mentioned in the requirements document I wrote for the front-end, but perhaps we should provide more options for filtering the search results. I’m thinking of facetted searching like you get in online stores:  You see the search results and then there are checkboxes that allow you to narrow down the results.  For example, we could have checkboxes containing all occupations in the results allowing the user to select one or more.  Or we have checkboxes for ‘place of publication’ allowing the user to select ‘London’, or everywhere except ‘London’.
  5. Also not mentioned, but perhaps we should add some visualisations to the search results too. For example, a bar graph showing the distribution of all borrowing records in the search results over time, or another showing occupations or gender of the borrowings in the search results etc.  I feel that we need some sort of summary information as the results themselves are just too detailed to easily get an overall picture of.

I came across the Universal Short Title Catalogue website this week (e.g. https://www.ustc.ac.uk/explore?q=xenophon) it does a lot of the things I’d like to implement (graphs, facetted search results) and it does it all very speedily with a pleasing interface and I think we could learn a lot from this.

Whilst thinking about the speed issues I began experimenting with Apache Solr (https://solr.apache.org/) which is a free search platform that is much faster than a traditional relational database and provides options for facetted searching.  We use Solr for the advanced search on the DSL website so I’ve had a bit of experience with it.  Next week I’m going to continue to investigate whether we might be better off using it, or whether creating cached tables in our database might be simpler and work just as well for our data.  But if we are potentially going to use Solr then we would need to install it on a server at Stirling.  Stirling’s IT people might be ok with this (they did allow us to set up a IIIF server for our images, after all) but we’d need to check.  I should have a better idea as to whether Solr is what we need by the end of next week, all being well.

Also this week I spent some time working on the Speech Star project.  I updated the database to highlight key segments in the ‘target’ field which had been highlighted in the original spreadsheet version of the data by surrounding the segment with bar characters.  I’d suggested this as when exporting data from Excel to a CSV file all Excel formatting such as bold text is lost, but unfortunately I hadn’t realised that there may be more than one highlighted segment in the ‘target’ field.  This made figuring out how to split the field and apply a CSS style to the necessary characters a little trickier but I got there in the end.  After adding in the new extraction code I reprocessed the data, and currently the key segment appears in bold red text, as you can see in the following screenshot:

I also spent some time adding text to several of the ancillary pages of the site, such as the homepage and the ‘about’ page and restructured the menus, grouping the four database pages together under one menu item.

Also this week I tweaked the help text that appears alongside the advanced search on the DSL website and fixed an error with the data of the Thesaurus of Old English website that Jane Roberts had accidentally introduced.

Week Beginning 31st October 2022

I spent a lot of the week continuing to work on the Books and Borrowing front end.  To begin with I worked on the ‘borrowers’ tab in the ‘library’ page and created an initial version of it.  Here’s an example of how it looks:

As with books, the page lists borrowers alphabetically, in this case by borrower surname.  Letter tabs and counts of the number of borrowers with surnames beginning with the letter appear at the top and you can select a letter to view all borrowers with surnames beginning with the letter.  I had to create a couple of new fields in the borrower table to speed the querying up, saving the initial letter of each borrower’s surname and a count of their borrowings.

The display of borrowers is similar to the display of books, with each borrower given a box that you can press on to highlight.  Borrower ID appears in the top right and each borrower’s full name appears as a green title.  The name is listed as it would be read, but this could be updated if required.  I’m not sure where the ‘other title’ field would go if we did this, though – presumably something like ‘Macdonald, Mr Archibald of Sanda’.

The full information about a borrower is listed in the box, including additional fields and normalised occupations.  Cross references to other borrowers also appear.  As with the ‘Books’ tab, much of this data will be linked to search results once I’ve created the search options (e.g. press on an occupation to view all borrowers with this occupation, press on the number of borrowings to view the borrowings) but this is not in place yet.  You can also change the view from ‘surname’ to ‘top 100 borrowers’, which lists the top 100 most prolific borrowers (or less if there are less than 100 borrowers in the library).  As with the book tab, a number appears at the top left of each record to show the borrower’s place on the ‘hitlist’ and the number of borrowings is highlighted in red to make it easier to spot.

I also fixed some issues with the book and author caches that were being caused by spaces at the start of fields and author surnames beginning with a non-capitalised letter (e.g. ‘von’) which was messing things up as the cache generation script was previously only matching upper case, meaning ‘v’ wasn’t getting added to ‘V’.  I’ve regenerated the cache to fix this.

I then decided to move onto the search rather than the ‘Facts & figures’ tab as I reckoned this should be prioritised.  I began work on the quick search initially, and I’m still very much in the middle of this.  The quick search has to search an awful lot of and to do this several different queries need to be run.  I’ll need to see how this works in terms of performance as I fear the ‘quick’ search risks being better named the ‘slow’ search.

We’ve stated that users will be able to search for dates in the quick search and these need to be handled differently.  For now the API checks to see whether the passed search string is a date by running a pattern match on the string.  This converts all numbers in the string into an ‘X’ character and then checks to see whether the resulting string matches a valid date form.  For the API I’m using a bar character (|) to designate a ranged date and a dash to designate a division between day, month and year.  I can’t use a slash (/) as the search string is passed in the URL and slashes have meaning in URLs.  For info, here are the valid date string patterns:

“XXXX”,”XXXX-XX”,”XXXX-XX-XX”,”XXXX|XXXX”,”XXXX|XXXX-XX”,”XXXX|XXXX-XX-XX”,”XXXX-XX|XXXX”,”XXXX-XX|XXXX-XX”,”XXXX-XX|XXXX-XX-XX”,”XXXX-XX-XX|XXXX”,”XXXX-XX-XX|XXXX-XX”,”XXXX-XX-XX|XXXX-XX-XX”

So for example, if someone searches for ‘1752’ or ‘1752-03’ or ‘1752-02|1755-07-22’ the system will recognise these as a date search and process them accordingly.  I should point out that I can and probably will get people to enter dates in a more typical way in the front-end, using slashes between day, month and year and a dash between ranged dates (e.g. ‘1752/02-1755/07/22’) but I’ll convert these before passing the string to the API in the URL.

I have the query running to search the dates, and this in itself was a bit complicated to generate as including a month or a month and a day in a ranged query changes the way the query needs to work.  E.g. if the user searches for ‘1752-1755’ then we need to return all borrowing records with a borrowed year of ‘1752’ or later and ‘1755’ or earlier.  However, if the query is ‘1752/06-1755-03’ then the query can’t just be ‘all borrowed records with a borrowed year of ‘1752’ or later and a borrowed month of ‘06’ or later and a borrowed year of ‘1755’ or earlier and a borrowed month of ‘03’ or earlier as this would return no results.  This is because the query is looking to return borrowings with a borrowed month of ‘06’ or later and also ‘03’ or earlier.  Instead the query needs to find borrowing records that have a borrowed year of 1752 AND a borrowed month of ‘06’ or later OR have a borrowed year later than 1752 AND have a borrowed year of 1755 AND a borrowed month of ‘03’ or earlier OR have a borrowed year earlier than 1755.

I also have the queries running that search for all necessary fields that aren’t dates.  This currently requires five separate queries to be run to check fields like author names, borrower occupations, book edition fields such as ESTC etc.  The queries currently return a list of borrowing IDs, and this is as far as I’ve got.  I’m wondering now whether I should create a cached table for the non-date data queried by the quick search, consisting of a field for the borrowing ID and a field for the term that needs to be searched, with each borrowing having many rows depending on the number of terms they have (e.g. a row for each occupation of every borrower associated with the borrowing, a row for each author surname, a row for each forename, a row for ESTC).  This should make things much speedier to search, but will take some time to generate.  I’ll continue to investigate this next week.

Also this week I updated the structure of the Speech Star database to enable each prompt to have multiple sounds etc.  I had to update the non-disordered page and the child error page to work with the new structure, but it seems to be working.  I also had to update the ‘By word’ view as previously sound, articulation and position were listed underneath the word and above the table.  As these fields may now be different for each record in the table I’ve removed the list and have instead added the data as columns to the table.  This does however mean that the table contains a lot of identical data for many of the rows now.

I then added in tooptip / help text containing information about what the error types mean in the child speech error database.  On the ‘By Error Type’ page the descriptions currently appear as small text to the right of the error type title.  On the ‘By Word’ page  the error type column has an ‘i’ icon after the error type.  Hovering over or pressing on this displays a tooltip with the error description, as you can see in the following screenshot:

I also updated the layout of the video popups to split the metadata across two columns and also changed the order of the errors on the ‘By error type’ page so that the /r/ errors appear in the correct alphabetical order for ‘r’ rather than appearing first due to the text beginning with a slash.  With this all in place I then replicated the changes on the version of the site that is going to be available via the Seeing Speech URL.

Kirsteen McCue contacted me last week to ask for advice on a British Academy proposal she’s putting together and after asking some questions about the project I wrote a bit of text about the data and its management for her.  I also sorted out my flights and accommodation for the workshop I’m attending in Zurich in January and did a little bit of preparation for a session on Digital Humanities that Luca Guariento has organised for next Monday.  I’ll be discussing a couple of projects at this event.  I also exported all of the Speak For Yersel survey data and sent this to a researcher who is going to do some work with the data and fixed an issue that had cropped up with the Place-names of Kirkcudbright website.  I also spent a bit of time on DSL duties this week, helping with some user account access issues and discussing how links will be added from entries to the lengthy essay on the Scots Language that we host on the site.

 

Week Beginning 3rd October 2022

The Speak For Yersel project launched this week and is now available to use here: https://speakforyersel.ac.uk/.  It’s been a pretty intense project to work on and has required much more work than I’d expected, but I’m very happy with the end result.  We didn’t get as much media attention as we were hoping for, but social media worked out very well for the project and in the space of a week we’d had more than 5,000 registered users completing thousands of survey questions.  I spent some time this week tweaking things after the launch.  For example, I hadn’t added the metadata tags required by Twitter and Facebook / WhatsApp to nicely format links to the website (for example the information detailed here https://developers.facebook.com/docs/sharing/webmasters/) and it took a bit of time to add these in with the correct content.

I also gave some advice to Anja Kuschmann at Strathclyde about applying for a domain for the new VARICS project I’m involved with and investigated a replacement batch of videos that Eleanor had created for the Seeing Speech website.  I’ll need to wait until she gets back to me with files that match the filenames used on the existing site before I can take this further, though.  I also fixed an issue with the Berwickshire place-names website which has lost its additional CSS and investigated a problem with the domain for the Uist Saints website that has still unfortunately not been resolved.

Other than these tasks I spent the rest of the week continuing to develop the front-end for the Books and Borrowing project.  I completed an initial version of the ‘page’ view, including all three views (image, text and image and text).  I added in a ‘jump to page’ feature, allowing you (as you might expect) to jump directly to any page in the register when viewing a page.  I also completed the ‘text’ view of the page, which now features all of the publicly accessible data relating to the records – borrowing records, borrowers, book holding and item records and any associated book editions and book works, plus associated authors.  There’s an awful lot of data and it took quite a lot of time to think about how best to lay it all out (especially taking into consideration screens of different sizes), but I’m pretty happy with how this first version looks.

Currently the first thing you see for a record is the transcribed text, which is big and green.  Then all fields relating to the borrowing appear under this.  The record number as it appears on the page plus the record’s unique ID are displayed in the top right for reference (and citation).  Then follows a section about the borrower, with the borrower’s name in green (I’ve used this green to make all of the most important bits of text stand out from the rest of the record but the colour may be changed in future).  Then follows the information about the book holding and any specific volumes that were borrowed.  If there is an associated site-wide book edition record (or records) these appear in a dark grey box, together with any associated book work record (although there aren’t many of these associations yet).  If there is a link to a library record this appears as a button on the right of the record.  Similarly, if there’s an ESTC and / or other authority link for the edition these appear to the right of the edition section.

Authors now cascade down through the data as we initially planned.  If there’s an author associated with a work it is automatically associated with and displayed alongside the edition and holding.  If there’s an author associated with an editon but not a work it is then associated with the holding.  If a book at a specific level has an author specified then this replaces any cascading author from this point downwards in the sequence.  Something that isn’t in place yet are the links from information to search results, as I haven’t developed the search yet.  But eventually things like borrower name, author, book title etc will be links allowing you to search directly for the items.

One other thing I’ve added in is the option to highlight a record.  Press anywhere in a record and it is highlighted in yellow.  Press again to reset it.  This can be quite useful as you’re scrolling through a page with lots of records on if there are certain records you’re interested in.  You can highlight as many records as you want.  It’s possible that we may add other functionality to this, e.g. the option to download the data for selected records.  Here’s a screenshot of the text view of the page:

I also completed the ‘image and text’ view.  This works best on a large screen (i.e. not a mobile phone, although it is just about possible to use it on one, as I did test this out).  The image takes up about 60% of the screen width and the text takes up the remaining 40%.   The height of the records section is fixed to the height of the image area and is scrollable, so you can scroll down the records whilst still viewing the image (rather than the whole page scrolling and the image disappearing off the screen).  I think this view works really well and the records are still perfectly usable in the more confined area and it’s great to be able to compare the image and the text side by side.  Here’s a screenshot of the same page when viewing both text and image:

I tested the new interface out with registers from all of our available libraries and everything is looking good to me.  Some registers don’t have images yet, so I added in a check for this to ensure that the image views and page thumbnails don’t appear for such registers.  After that I moved onto developing the interface to browse book holdings when viewing a library.  I created an API endpoint for returning all of the data associated with holding records for a specified library.  This includes all of the book holding data, information about each of the book items associated with the holding record (including the number of borrowing records for each), the total number of borrowing records for the holding, any associated book edition and book work records (and there may be multiple editions associated with each holding) plus any authors associated with the book.  Authors cascade down through the record as they do when viewing borrowing records in the page.  This is a gigantic amount of information, especially as libraries may have many thousands of book holding records.  The API call loads pretty rapidly for smaller libraries (e.g. Chambers Library with 961 book holding records) but for larger ones (e.g. St Andrews with over 8,500 book holding records) the API call takes too long to return the data (in the latter case it takes about a minute and returns a JSON file that’s over 6Mb in size).  The problem is the data needs to be returned in full in order to do things like order it by largest number of borrowings.  Clearly dynamically generating the data each time is going to be too slow so instead I am going to investigate caching the data.  For example, that 6Mb JSON file can just site there as an actual file rather than being generated each time.  Instead I will write a script to regenerate the cached files and I can run this whenever data gets updated (or maybe once a week whilst the project is still active).  I’ll continue to work on this next week.

Week Beginning 19th September 2022

It was a four-day week this week due to the Queen’s funeral on Monday.  I divided my time for the remaining four days over several projects.  For Speak For Yersel I finally tackled the issue of the way maps are loaded.  The system had been developed for a map to be loaded afresh every time data is requested, with any existing map destroyed in the process.  This worked fine when the maps didn’t contain demographic filters as generally each map only needed to be loaded once and then never changed until an entirely new map was needed (e.g. for the next survey question).  However, I was then asked to incorporate demographic filters (age groups, gender, education level), with new data requested based on the option the user selected.  This all went through the same map loading function, which still destroyed and reinitiated the entire map on each request.  This worked, but wasn’t ideal, as it meant the map reset to its default view and zoom level whenever you changed an option, map tiles were reloaded from the server unnecessarily and if the user was in ‘full screen’ mode they were booted out of this as the full screen map no longer existed.  For some time I’ve been meaning to redevelop this to address these issues, but I’ve held off as there were always other things to tackled and I was worried about essentially ripping apart the code and having to rebuilt fundamental aspects of it.  This week I finally plucked up the courage to delve into the code.

I created a test version of the site so as to not risk messing up the live version and managed to develop an updated method of loading the maps.  This method initiates the map only once when a page is first loaded rather than destroying and regenerating the map every time a new question is loaded or demographic data is changed.  This means the number of map tile loads is greatly reduced as the base map doesn’t change until the user zooms or pans.  It also means the location and zoom level a user has left the map on stays the same when the data is changed.  For example, if they’re interested in Glasgow and are zoomed in on it they can quickly flick between different demographic settings and the map will stay zoomed in on Glasgow rather than resetting each time.  Also, if you’re viewing the map in full-screen mode you can now change the demographic settings without the resource exiting out of full screen mode.

All worked very well, with the only issues being that the transitions between survey questions and quiz questions weren’t as smooth as the with older method.  Previously the map scrolled up and was then destroyed, then a new map was created and the data was loaded into the area before it smoothly scrolled down again.  For various technical reasons this no longer worked quite as well any more.  The map area still scrolls up and down, but the new data only populates the map as the map area scrolls down, meaning for a brief second you can still see the data and legend for the previous question before it switches to the new data.  However, I spent some further time investigating this issue and managed to fix it, with different fixes required for the survey and the quiz.  I also noticed a bug whereby the map would increase in size to fit the available space but the map layers and data were not extending properly into the newly expanded area.  This is a known issue with Leaflet maps that have their size changed dynamically and there’s actually a Leaflet function that sorts it – I just needed to call map.invalidateSize(); and the map worked properly again.  Of course it took a bit of time to figure this simple fix out.

I also made some further updates to the site.  Based on feedback about the difficulty some people are having about which surveys they’ve done, I updated the site to log when the user completes a survey.  Now when the user goes to the survey index page a count of the number of surveys they’ve completed is displayed in the top right and a green tick has been added to the button of each survey they have completed.  Also, when they reach the ‘what next’ page for a survey a count of their completed survey is also shown.  This should make it much easier for people to track what they’ve done.  I also made a few small tweaks to the data at the request of Jennifer, and create a new version of the animated GIF that has speech bubbles, as the bubble for Shetland needed its text changed.  As I didn’t have the files available I took the opportunity regenerate the GIF, using a larger map, as the older version looked quite fuzzy on a high definition screen like an iPad.  I kept the region outlines on as well to tie it in better with our interactive maps.  Also the font used in the new version is now the ‘Baloo’ font we use for the site.  I stored all of the individual frames both as images and as powerpoint slides so I can change them if required.  For future reference, I created the animated GIF using https://ezgif.com/maker with a 150 second delay between slides, crossfade on and a fader delay of 8.

Also this week I researched an issue with the Scots Thesaurus that was causing the site to fail to load.  The WordPress options table had become corrupted and unreadable and needed to be replaced with a version from the backups, which thankfully fixed things.  I also did my expenses from the DHC in Sheffield, which took longer than I thought it would, and made some further tweaks to the Kozeluch mini-site on the Burns C21 website.  This included regenerating the data from a spreadsheet via a script I’d written and tweaking the introductory text.  I also responded to a request from Fraser Dallachy to regenerate some data that a script Id’ previously written had outputted.  I also began writing a requirements document for the redevelopment of the place-names project front-ends to make them more ‘map first’.

I also did a bit more work for Speech Star, making some changes to the database of non-disordered speech and moving the ‘child speech error database’ to a new location.  I also met with Luca to have a chat about the BOSLIT project, its data, the interface and future plans.  We had a great chat and I then spent a lot of Friday thinking about the project and formulating some feedback that I sent in a lengthy email to Luca, Lorna Hughes and Kirsteen McCue on Friday afternoon.

Week Beginning 12th September 2022

I spent a bit of time this week going through my notes from the Digital Humanities Congress last week and writing last week’s lengthy post.  I also had my PDR session on Friday and I needed to spend some time preparing for this, writing all of the necessary text and then attending the session.  It was all very positive and it was a good opportunity to talk to my line manager about my role.  I’ve been in this job for ten years this month and have been writing these blog posts every working week for those ten years, which I think is quite an achievement.

In terms of actual work on projects, it was rather a bitty week, with my time spread across lots of different projects.  On Monday I had a Zoom call for the VariCS project, a phonetics project in collaboration with Strathclyde that I’m involved with.  The project is just starting up and this was the first time the team had all met.  We mainly discussed setting up a web presence for the project and I gave some advice on how we could set up the website, the URL and such things.  In the coming weeks I’ll probably get something set up for the project.

I then moved onto another Burns-related mini-project that I worked on with Kirsteen McCue many months ago – a digital edition of Koželuch’s settings of Robert Burns’s Songs for George Thomson.  We’re almost ready to launch this now and this week I created a page for an introductory essay, migrated a Word document to WordPress to fill the page, including adding in links and tweaking the layout to ensure things like quotes displayed properly.  There are still some further tweaks that I’ll need to implement next week, but we’re almost there.

I also spent some time tweaking the Speak For Yersel website, which is now publicly accessible (https://speakforyersel.ac.uk/) but still not quite finished.  I created a page for a video tour of the resource and made a few tweaks to the layout, such as checking the consistency of font sizes used throughout the site.  I also made some updates to the site text and added in some lengthy static content to the site in the form or a teachers’ FAQ and a ‘more information’ page.  I also changed the order of some of the buttons shown after a survey is completed to hopefully make it clearer that other surveys are available.

I also did a bit of work for the Speech Star project.  There had been some issues with the Central Scottish Phonetic Features MP4s playing audio only on some operating systems and the replacements that Eleanor had generated worked for her but not for me.  I therefore tried uploading them to and re-downloading them from YouTube, which thankfully seemed to fix the issue for everyone.  I then made some tweaks to the interfaces to the two project websites.  For the public site I made some updates to ensure the interface looked better on narrow screens, ensuring changing the appearance of the ‘menu’ button and making the logo and site header font smaller to they take up less space.  I also added an introductory video to the homepage too.

For the Books and Borrowing project I processed the images for another library register.  This didn’t go entirely smoothly.  I had been sent 73 images and these were all upside down so needed rotating.  It then transpired that I should have been sent 273 images so needed to chase up the missing ones.  Once I’d been sent the full set I was then able to generate the page images for the register, upload the images and associate them with the records.

I then moved on to setting up the front-end for the Ayr Place-names website.  In the process of doing so I became aware that one of the NLS map layers that all of our place-name projects use had stopped working.  It turned out that the NLS had migrated this map layer to a third party map tile service (https://www.maptiler.com/nls/) and the old URLs these sites were still using no longer worked.  I had a very helpful chat with Chris Fleet at NLS Maps about this and he explained the situation.  I was able to set up a free account with the maptiler service and update the URLS in four place-names websites that referenced the layer (https://berwickshire-placenames.glasgow.ac.uk/, https://kcb-placenames.glasgow.ac.uk/, https://ayr-placenames.glasgow.ac.uk and https://comparative-kingship.glasgow.ac.uk/scotland/).  I’ll need to ensure this is also done for the two further place-names projects that are still in development (https://mull-ulva-placenames.glasgow.ac.uk and https://iona-placenames.glasgow.ac.uk/).

I managed to complete the work on the front-end for the Ayr project, which was mostly straightforward as it was just adapting what I’d previously developed for other projects.  The thing that took the longest was getting the parish data and the locations where the parish three-letter acronyms should appear, but I was able to get this working thanks to the notes I’d made the last time I needed to deal with parish boundaries (as documented here: https://digital-humanities.glasgow.ac.uk/2021-07-05/.  After discussions with Thomas Clancy about the front-end I decided that it would be a good idea to redevelop the map-based interface to display al of the data on the map by default and to incorporate all of the search and browse options within the map itself.  This would be a big change, and it’s one I had been thinking of implementing anyway for the Iona project, but I’ll try and find some time to work on this for all of the place-name sites over the coming months.

Finally, I had a chat with Kirsteen McCue and Luca Guariento about the BOSLIT project.  This project is taking the existing data for the Bibliography of Scottish Literature in Translation (available on the NLS website here: https://data.nls.uk/data/metadata-collections/boslit/) and creating a new resource from it, including visualisations.  I offered to help out with this and will be meeting with Luca to discuss things further, probably next week.

 

Week Beginning 29th August 2022

I divided my time between a number of different projects this week.  For Speak For Yersel I replaced the ‘click’ transcripts with new versions that incorporated shorter segments and more highlighted words.  As the segments were now different I also needed to delete all existing responses to the ‘click’ activity.  I then completed the activity once for each speaker to test things out, and all seems to work fine with the new data.  I also changed the pop-up ‘percentage clicks’ text to ‘% clicks occurred here’, which is more accurate than the previous text which suggested it was the percentage of respondents.  I also fixed an issue with the map height being too small on the ‘where do you think this speaker is from’ quiz and ensured the page scrolls to the correct place when a new question is loaded.  I also removed the ‘tip’ text from the quiz intros and renamed the ‘where do you think this speaker is from’ map buttons on the map intro page.  I’d also been asked to trim down the number of ‘translation’ questions from the ‘I would never say that’ activity so I removed some of those.  I then changed and relocated the ‘heard in films and TV’ explanatory text removed the question mark from the ‘where do you think the speaker is from’ quiz intro page.

Mary had encountered a glitch with the transcription popups, whereby the page would flicker and jump about when certain popups were hovered over.  This was caused by the page height increasing to accommodate the pop-up, causing a scrollbar to appear in the browser, which changed the position of the cursor and made the pop-up disappear, making the scrollbar go and causing a glitchy loop.  I increased the height of the page for this activity so the scrollbar issue is no longer encountered, and I also made the popups a bit wider so they don’t need to be as long.  Mary also noticed that some of the ‘all over Scotland’ dynamically generated map answers seemed to be incorrect.  After some investigation I realised that this was a bug that had been introduced when I added in the ‘I would never say that’ quizzes on Friday.  A typo in the code meant that the 60% threshold for correct answers in each region was being used rather than ‘100 divided by the number of answer options’.  Thankfully once identified this was easy to fix.

I also participated in a Zoom call for the project this week to discuss the launch of the resource with the University’s media people.  It was agreed that the launch will be pushed back to the beginning of October as this should be a good time to get publicity.  Finally for the project this week I updated the structure of the site so that the ‘About’ menu item could become a drop-down menu, and I created placeholder pages for three new pages that will be added to this menu for things like FAQs.

I also continued to work on the Books and Borrowing project this week.  On Friday last week I didn’t quite get to finish a script to merge page records for one of the St Andrews registers as it needed further testing on my local PC before I ran it on the live data.  I tackled this issue first thing on Monday and it was a task I had hoped would only take half an hour or so.  Unfortunately things did not go well and it took most of the morning to sort out.  I initially attempted to run things on my local PC to test everything out, but I forgot to update the database connection details.  Usually this wouldn’t be an issue as generally the databases I work with use ‘localhost’ as a connection URL, so the Stirling credentials would have been wrong for my local DB and the script would have just quit, but Stirling (where the system is hosted) uses a full URL instead of ‘localhost’.  This meant that even though I had a local copy of the database on my PC and the scripts were running on a local server set up on my PC the scripts were in fact connecting to the real database at Stirling.  This meant the live data was being changed.  I didn’t realise this as the script was running and as it was taking some time I cancelled it, meaning the update quit halfway through changing borrowing records and deleting page records in the CMS.

 

I then had to write a further script to delete all of the page and borrowing records for this register from the Stirling server and reinstate the data from my local database.  Thankfully this worked ok.  I then ran my test script on the actual local database on my PC and the script did exactly what I wanted it to do, namely:

Iterate through the pages and for each odd numbered page move the records on these to the preceding even numbered page, and at the same time regenerate the ‘page order’ for each record so they follow on from the existing records.  Then the even page needs its folio number updated to add in the odd number (e.g. so folio number 2 becomes ‘2-3’) and generate an image reference based on this (e.g. UYLY207-2_2-3).  Then delete the odd page record and after all that is done regenerate the ‘next’ and ‘previous’ page links for all pages.

This all worked so I ran the script on the server and updated the live data.  However, I then noticed that there are gaps in the folio numbers and this has messed everything up.  For example, folio number 314 isn’t followed by 315 but 320.  320 isn’t an odd number so it doesn’t get joined to 314.  All subsequent page joins are then messed up.  There are also two ‘350’ pages in the CMS and two images that reference 350.  We have UYLY207-2_349-350 and also UYLY207-2_350-351.  There might be other situations where the data isn’t uniform too.

I therefore had to use my ‘delete and reinsert’ script again to revert to the data prior to the update as my script wasn’t set up to work with pages that don’t just increment their folio number by 1 each time.  After some discussion with the RA I updated the script again so that it would work with the non-uniform data and thankfully all worked fine after that.  Later in the week I also found some time to process two further St Andrews registers that needed their pages and records merged, and thankfully these went much smoother.

I also worked on the Speech Star project this week.  I created a new page on both of the project’s sites (which are not live yet) for viewing videos of Central Scottish phonetic features.  I also replaced the temporary logos used on the sites with the finalised logos that had been designed by a graphic designer.  However, the new logo only really works well on a white background as the white cut-out round the speech bubble into the star becomes the background colour of the header.  The blue that we’re currently using for the site header doesn’t work so well with the logo colours.  Also, the graphic designer had proposed using a different font for the site and I decided to make a new interface for the site, which you can see below.  I’m still waiting for feedback to see whether the team prefer this to the old interface (a screenshot of which you can see on this page: https://digital-humanities.glasgow.ac.uk/2022-01-17/) but I personally think it looks a lot better.

I also returned to the Burns Manuscript database that I’d begun last week.  I added a ‘view record’ icon to each row which if pressed on opens a ‘card’ view of the record on its own page.  I also added in the search options, which appear in a section above the table.  By default, the section is hidden and you can show/hide it by pressing on a button.  Above this I’ve also added in a placeholder where some introductory text can go.  If you open the ‘Search options’ section you’ll find text boxes where you can enter text for year, content, properties and notes.  For year you can either enter a specific year or a range.  The other text fields are purely free-text at the moment, so no wildcards.  I can add these in but I think it would just complicate things unnecessarily.  On the second row are checkboxes for type, location name and condition.  You can select one or more of each of these.

The search options are linked by AND, and the checkbox options are linked internally by OR.  For example, filling in ‘1780-1783’ for year and ‘wrapper’ for properties will find all rows with a date between 1780 and 1783 that also have ‘wrapper’ somewhere in their properties.  If you enter ‘work’ in content and select ‘Deed’ and ‘Fragment’ as types you will find all rows that are either ‘Deed’ or ‘Wrapper’ and have ‘work’ in their content.

If a search option is entered and you press the ‘Search’ button the page will reload with the search options open, and the page will scroll down to this section.  Any rows matching your criteria will be displayed in the table below this.  You can also clear the search by pressing on the ‘Clear search options’ button.  In addition, if you’re looking at search results and you press on the ‘view record’ button the ‘Return to table’ button on the ‘card’ view will reload the search results.  That’s this mini-site completed now, pending feedback from the project team, and you can see a screenshot of the site with the search box open below:

Also this week I’d arranged an in-person coffee and catch up with the other College of Arts developers.  We used to have these meetings regularly before Covid but this was the first time since then that we’d all met up.  It was really great to chat with Luca Guariento, Stevie Barrett and David Wilson again and to share the work we’d been doing since we last met.  Hopefully we can meet again soon.

Finally this week I helped out with a few WordPress questions from a couple of projects and I also had a chance to update all of the WordPress sites I manage (more than 50) to the most recent version.

 

Week Beginning 25th July 2022

I was on holiday for most of the previous two weeks, working two days during this period.  I’ll also be on holiday again next week, so I’ve had quite a busy time getting things done.  Whilst I was away I dealt with some queries from Joanna Kopaczyk about the Future of Scots website.  I also had to investigate a request to fill in timesheets for my work on the Speak For Yersel project, as apparently I’d been assigned to the project as ‘Directly incurred’ when I should have been ‘Directly allocated’.  Hopefully we’ll be able to get me reclassified but this is still in-progress.  I also fixed a couple of issues with the facility to export data for publication for the Berwickshire place-name project for Carole Hough, and fixed an issue with an entry in the DSL, which was appearing in the wrong place in the dictionary.  It turned out that the wrong ‘url’ tag had been added to the entry’s XML several years ago and since then the entry was wrongly positioned.  I fixed the XML and this sorted things.  I also responded to a query from Geert of the Anglo-Norman Dictionary about Aberystwyth’s new VPN and whether this would affect his access to the AND.  I also investigated an issue Simon Taylor was having when logging into a couple of our place-names systems.

On the Monday I returned to work I launched two new resources for different projects.  For the Books and Borrowing project I published the Chambers Library Map (https://borrowing.stir.ac.uk/chambers-library-map/) and reorganised the site menu to make space for the new page link.  The resource has been very well received and I’m pretty pleased with how it’s turned out.  For the Seeing Speech project I launched the new Gaelic Tongues resource (https://www.seeingspeech.ac.uk/gaelic-tongues/) which has received a lot of press coverage, which is great for all involved.

I spent the rest of the week dividing my time primarily between three projects:  Speak For Yersel, Books and Borrowing and Speech Star.  For Books and Borrowing I continued processing the backlog of library register image files that has built up.  There were about 15 registers that needed to be processed, and each needed to be handled in a different way.  This included nine registers from Advocates Library that had been digitised by the NLS, for which I needed to batch process the images to rename them, delete blank pages, create page records in the CMS and then tweak the automatically generated folio numbers to account for discrepancies in the handwritten page number in the images.  I also processed a register for the Royal High School, which involved renaming the images so they match up with image numbers already assigned to page records in the CMS, inserting new page records and updating the ‘next’ and ‘previous’ links for pages for which new images had been uncovered and generating new page records for many tens of new pages that follow on from the ones that have already been created in the CMS.  I also uploaded new images for the Craigston register and created a new register including all page records and associated image URLs for a further register for Aberdeen.  I still have some further RHS registers to do and a few from St Andrews, but these will need to wait until I’m back from my holiday.

For Speech Star I downloaded a ZIP containing 500 new ultrasound MP4 videos.  I then had to process them to generate ‘poster’ images for each video (these are images that get displayed before the user chooses to play the video).  I then had to replace the existing normalised speech database with data from a new spreadsheet that included these new videos plus updates to some of the existing data.  This included adding a few new fields and changing the way the age filter works, as much of the new data is for child speakers who have specific ages in months and years, and these all need to be added to a new ‘under 18’ age group.

For Speak For Yersel I had an awful lot to do.  I started with a further large-scale restructuring of the website following feedback from the rest of the team.  This included changing the site menu order, adding in new final pages to the end of surveys and quizzes and changing the text of buttons that appear when displaying the final question.

I then developed the map filter options for age and education for all of the main maps.  This was a major overhaul of the maps.  I removed the slide up / slide down of the map area when an option is selected as this was a bit long and distracting.  Now the map area just updates (although there is a bit of a flicker as the data gets replaced).  The filter options unfortunately make the options section rather big, which is going to be an issue on a small screen.  On my mobile phone the options section takes up 100% of the width and 80% of the height of the map area unless I press the ‘full screen’ button.  However, I figured out a way to ensure that the filter options section scrolls if the content extends beyond the bottom of the map.

I also realised that if you’re in full screen mode and you select a filter option the map exits full screen as the map section of the page reloads.  This is very annoying, but I may not be able to fix it as it would mean completely changing how the maps are loaded.  This is because such filters and options were never intended to be included in the maps and the system was never developed to allow for this.  I’ve had to somewhat shoehorn in the filter options and it’s not how I would have done things had I known from the beginning that these options were required.  However, the filters work and I’m sure they will be useful.  I’ve added in filters for age, education and gender, as you can see in the following screenshot:

I also updated the ‘Give your word’ activity that asks to identify younger and older speakers to use the new filters too.  The map defaults to showing ‘all’ and the user then needs to choose an age.  I’m still not sure how useful this activity will be as the total number of dots for each speaker group varies considerably, which can easily give the impression that more of one age group use a form compared to another age group purely because one age group has more dots overall.  The questions don’t actually ask anything about geographical distribution so having the map doesn’t really serve much purpose when it comes to answering the question.  I can’t help but think that just presenting people with percentages would work better, or some other sort of visualisation like a bar graph or something.

I then moved on to working on the quiz for ‘she sounds really clever’ and so far I have completed both the first part of the quiz (questions about ratings in general) and the second part (questions about listeners from a specific region and their ratings of speakers from regions).  It’s taken a lot of brain-power to get this working as I decided to make the system work out the correct answer and to present it as an option alongside randomly selected wrong answers.  This has been pretty tricky to implement (especially as depending on the question the ‘correct’ answer is either the highest or the lowest) but will make the quiz much more flexible – as the data changes so will the quiz.

Part one of the quiz page itself is pretty simple.  There is the usual section on the left with the question and the possible answers.  On the right is a section containing a box to select a speaker and the rating sliders (readonly).  When you select a speaker the sliders animate to their appropriate location.  I decided to not include the map or the audio file as these didn’t really seem necessary for answering the questions, they would clutter up the screen and people can access them via the maps page anyway (well, once I move things from the ‘activities’ section).  Note that the user’s answers are stored in the database (the region selected and whether this was the correct answer at the time).  Part two of the quiz features speaker/listener true/false questions and this also automatically works out the correct answer (currently based on the 50% threshold).  Note that where there is no data for a listener rating a speaker from a region the rating defaults to 50.  We should ensure that we have at least one rating for a listener in each region before we let people answer these questions.  Here is a screenshot of part one of the quiz in action, with randomly selected ‘wrong’ answers and a dynamically outputted ‘right’ answer:

I also wrote a little script to identify duplicate lexemes in categories in the Historical Thesaurus as it turns out there are some occasions where a lexeme appears more than once (with different dates) and this shouldn’t happen.  These will need to be investigated and the correct dates will need to be established.

I will be on holiday again next week so there won’t be another post until the week after I’m back.

 

Week Beginning 4th July 2022

I had a lovely week’s holiday last week and returned to work for one week only before I head off for a further two weeks.  I spent most of my time this week working on the Speak For Yersel project implementing a huge array of changes that the team wanted to make following the periods of testing in schools a couple of weeks ago.  There were also some new sections of the resource to work on as well.

By Tuesday I had completed the restructuring of the site as detailed in the ‘Roadmap’ document, meaning the survey and quizzes have been separated, as have the ‘activities’ and ‘explore maps’.  This has required quite a lot of restructuring of the code, but I think all is working as it should.  I also updated the homepage text.  One thing I wasn’t sure about is what should happen when the user reaches the end of the survey.  Previously this led into the quiz, but for now I’ve created a page that provides links to the quiz, the ‘more activities’ and the ‘explore maps’ options for the survey in question.

The quizzes should work as they did before, but they now have their own progress bar.  Currently at the end of the quiz the only link offered is to explore the maps, but we should perhaps change this.  The ‘more activities’ work slightly differently to how these were laid out in the roadmap.  Previously a user selected an activity then it loaded an index page with links to the activities and the maps.  As the maps are now separated this index page was pretty pointless, so instead when you select an activity it launches straight into it.  The only one that still has an index page is the ‘Clever’ one as this has multiple options.  However, thinking about this activity:  it’s really just an ‘explore’ like the ‘explore maps’ rather than an actual interactive activity per se, so we should perhaps move this to the ‘explore’ page.

I also made all of the changes to the ‘sounds about right’ survey including replacing sound files and adding / removing questions.  I ended up adding a new ‘question order’ field to the database and questions are now ordered using this, as previously the order was just set by the auto-incrementing database ID which meant inserting a new question to appear midway through the survey was very tricky.  Hopefully this change of ordering hasn’t had any knock-on effects elsewhere.

I then made all of the changes to two other activities:  the ‘lexical’ one and the ‘grammatical’ one.  These included quite a lot of tweaks to questions, question options, question orders and the number of answers that could be selected for questions.  With all of this in place I moved onto the ‘Where do you think this speaker is from’ sections.  The ‘survey’ now only consists of the click map and when you press the ‘Check Answers’ button some text appears under the buttons with links through to where the user can go next.

For the ‘more activities’ section the main click activity is now located here.  It took quite a while to get this to work, as moving sections introduced some conflicts in the code that were a bit tricky to identify.  I replaced the explanatory text and I also added in the limit to the number of presses.  I’ve added a section to the right of the buttons that displays the number of presses the user has left.  Once there are no presses left the ‘Press’ button gets disabled.  I still think people are going to reach the 5 click limit too soon and will get annoyed when they realise they can’t add further clicks and they can’t reset the exercise to give it another go.  After you’ve listened to the four speakers a page is displayed saying you’ve completed the activity and giving links to other parts.  Below is a screenshot of the new ‘click’ activity with the limit in place (and also the new site menu):

 

The ’Quiz’ has taken quite some time to implement but is now fully operational.  I had to do a lot of work behind the scenes to get the percentages figured out and to get the quiz to automatically work out which answer should be the correct one, but it all works now.  The map displays the ‘Play’ icons as I figured people would want to be able to hear the clips as well as just see the percentages.  Beside each clip icon the percentage of respondents who correctly identified the location of the speaker is displayed.  The markers are placed at the ‘correct’ points on the map, as shown when you view the correct locations in the survey activities.  Question 1 asks you to identify the most recognised, question 2 the least recognised.  Quiz answers are logged in the database so we’ll be able to track answers.  Here’s a screenshot of the quiz:

I also added the percentage map to the ‘explore maps’ page too, and I gave people the option of focussing on the answers submitted from specific regions.  An ‘All regions’ map displays the same data as the quiz map, but then the user can choose (for example) Glasgow and view the percentages of correctly identified speakers that respondents from the Glasgow area identified, thus allowing them to compare how people in each area managed to identify speakers in the areas.  I decided to add a count of the number of people that have responded too.

The ‘explore maps’ for ‘guess the region’ has a familiar layout – buttons on the left that when pressed on load a map on the right.  The buttons correspond to the region of people who completed the ‘guess the region’ survey.  The first option shows the answers of all respondents from all regions.  This is exactly the same as the map in the quiz, except I’ve also displayed the number of respondents above the map.  Two things to be aware of:

Firstly, a respondent can complete the quiz as many times as they want, so each respondent may have multiple datasets.  Secondly, the click map (both quiz and ‘explore maps’) currently includes people from outside of Scotland as well as people who selected an area when registering.  There are currently 18 respondents and 3 of these are outside of Scotland.

When you click on a specific region button in the left-hand column the results of respondents from that specific region only are displayed on the map.  The number of respondents is also listed above the map.  Most of the regions currently have no respondents, meaning an empty map is displayed and a note above the map explains why.  Ayrshire has one respondent.  Glasgow has two.  Note that the reason there are such varied percentages in Glasgow from just two respondents (rather than just 100%, 50% and 0%) is because one or more of the respondents has completed the quiz more than once.  Lothian has two respondents.  North East has 10.  Here’s how the maps look:

On Friday I began to work on the ‘click transcription’ visualisations, which will display how many times speakers have clicked in each of the sections of the transcriptions they listen to in the ‘click’ activity.  I only managed to get as far as writing the queries and scripts to generate the data, rather than any actual visualisation of the data.  When looking at the aggregated data for the four speakers I discovered that the distribution of clicks across sections was a bit more uniform that I thought it might be.  We might need to consider how we’re going to work out the thresholds for different sizes.  I was going to base it purely on the number of clicks, but I realised that this would not work as the more responses we get the more clicks there will be.  Instead I decided to use percentages of the total number of clicks for a speaker.  E.g. for speaker 4 there are currently a total of 65 clicks so the percentages for each section would be:

 

11% Have you seen the TikTok vids with the illusions?
6% They’re brilliant!
9% I just watched the glass one.
17% The guy’s got this big glass full of water in his hands.
8% He then puts it down,
8% takes out one of those big knives
6% and slices right through it.
6% I sometimes get so fed up with Tiktok
8% – really does my head in –
8% but I’m not joking,
14% I want to see more and more of this guy.

 

(which adds up to 101% with rounding).  But what should the thresholds be?  E.g. 0-6% = regular, 7-10% = bigger, 11-15% even bigger, 16%+ biggest?  I’ll need input from the team about this.  I’m not a statistician but there may be better approaches, such as using standard deviation and such things.

I still have quite a lot of work to do for the project, namely:  Completing the ‘where do you think the speaker is from’ as detailed above; implementing the ‘she sounds really clever’ updates; adding in filter options to the map (age ranges and education levels); investigating dynamically working out the correct answers to map-based quizzes.

In addition to my Speak For Yersel work I participated in an interview with the AHRC about the role of technicians in research projects.  I’d participated in a focus group a few weeks ago and this was a one-on-one follow-up video call to discuss in greater detail some of the points I’d raised in the focus group.  It was a good opportunity to discuss y role and some of the issues I’ve encountered over the years.

I also installed some new themes for the OHOS project website and fixed an issue with the Anglo-Norman Dictionary website, as the editor had noticed that cognate references were not always working.  After some investigation I realised that this was happening when the references for a cognate dictionary included empty tags as well as completed tags.  I had to significantly change how this section of the entry is generated in the XSLT from the XML, which took some time to implement and test.  All seems to be working, though.

I also did some work for the Books and Borrowing project.  Whilst I’d been on holiday I’d been sent page images for a further ten library registers and I needed to process these.  This can be something of a time-consuming process as each set of images needs to be processed in a different way, such as renaming images, removing unnecessary images at the start and end, uploading the images to the server, generating the page images for each register and then bringing the automatically generated page numbers into line with any handwritten page numbers on the images, which may not always be sequentially numbered.  I processed two registers for the Advocates library from the NLS and three registers from Aberdeen library.  I looked into processing the images for a register from the High School of Edinburgh, but I had some questions about the images and didn’t hear back from the researcher before the end of the week, so I needed to leave these.  The remaining registers were from St Andrews and I had further questions about these, as the images are double-page spreads but existing page records in the CMS treat each page separately.  As the researcher dealing with St Andrews was on holiday I’ll need to wait until I’m back to deal with these too.

Also this week I completed the two mandatory Moodle courses about computer security and GDPR, which took a bit longer that I thought they might.

Week Beginning 20th June 2022

I completed an initial version of the Chambers Library map for the Books and Borrowing project this week.  It took quite a lot of time and effort to implement the subscription period range slider.  Searching for a range when the data also has a range of dates rather than a single date means we needed to make a decision about what data gets returned and what doesn’t.  This is because the two ranges (the one chosen as a filter by the user and the one denoting the start and end periods of subscription for each borrower) can overlap in many different ways.  For example, the period chosen by the user is 05 1828 to 06 1829.  Which of the following borrowers should therefore be returned?

  1. Borrowers range is 06 1828 to 02 1829: Borrower’s range is fully within the period so should definitely be included
  2. Borrowers range is 01 1828 to 07 1828: Borrower’s range extends beyond the selected period at the start and ends within the selected period.  Presumably should be included.
  3. Borrowers range is 01 1828 to 09 1829: Borrower’s range extends beyond the selected period in both directions.  Presumably should be included.
  4. Borrowers range is 05 1829 to 09 1829: Borrower’s range begins during the selected period and ends beyond the selected period. Presumably should be included.
  5. Borrowers range is 01 1828 to 04 1828: Borrower’s range is entirely before the selected period. Should not be included
  6. Borrowers range is 07 1829 to 10 1829: Borrower’s range is entirely after the selected period. Should not be included.

Basically if there is any overlap between the selected period and the borrower’s subscription period the borrower will be returned.  But this means most borrowers will always be returned a lot of the time.  It’s a very different sort of filter to one that purely focuses on a single date – e.g. filtering the data to only those borrowers whose subscription periods *begins* between 05 1828 and 06 1829.

Based on the above assumptions I began to write the logic that would decide which borrowers to include when the range slider is altered.  It was further complicated by having to deal with months as well as years.  Here’s the logic in full if you fancy getting a headache:

if(((mapData[i].sYear>startYear || (mapData[i].sYear==startYear && mapData[i].sMonth>=startMonth)) && ((mapData[i].eYear==endYear && mapData[i].eMonth <=endMonth) || mapData[i].eYear<endYear)) || ((mapData[i].sYear<startYear ||(mapData[i].sYear==startYear && mapData[i].sMonth<=startMonth)) && ((mapData[i].eYear==endYear && mapData[i].eMonth >=endMonth) || mapData[i].eYear>endYear)) || ((mapData[i].sYear==startYear && mapData[i].sMonth<=startMonth || mapData[i].sYear>startYear) && ((mapData[i].eYear==endYear && mapData[i].eMonth <=endMonth) || mapData[i].eYear<endYear) && ((mapData[i].eYear==startYear && mapData[i].eMonth >=startMonth) || mapData[i].eYear>startYear)) || (((mapData[i].sYear==startYear && mapData[i].sMonth>=startMonth) || mapData[i].sYear>startYear) && ((mapData[i].sYear==endYear && mapData[i].sMonth <=endMonth) || mapData[i].sYear<endYear) && ((mapData[i].eYear==endYear && mapData[i].eMonth >=endMonth) || mapData[i].eYear>endYear)) || ((mapData[i].sYear<startYear ||(mapData[i].sYear==startYear && mapData[i].sMonth<=startMonth)) && ((mapData[i].eYear==startYear && mapData[i].eMonth >=startMonth) || mapData[i].eYear>startYear)))

I also added the subscription period to the popups.  The only downside to the range slider is that the occupation marker colours change depending on how many occupations are present during a period, so you can’t always tell an occupation by its colour. I might see if I can fix the colours in place, but it might not be possible.

I also noticed that the jQuery UI sliders weren’t working very well on touchscreens so installed the jQuery TouchPunch library to fix that (https://github.com/furf/jquery-ui-touch-punch).  I also made the library marker bigger and gave it a white border to more easily differentiate it from the borrower markers.

I then moved onto incorporating page images in the resource too.  Where a borrower has borrower records the relevant pages where these borrowing records are found now appear as thumbnails in the borrower popup.  These are generated by the IIIF server based on dimensions passed to it, which is much nicer than having to generate and store thumbnails directly.  I also updated the popup to make it wider when required to give more space for the thumbnails.  Here’s a screenshot of the new thumbnails in action:

Clicking on a thumbnail opens a further popup containing a zoomable / pannable image of the page.  This proved to be rather tricky to implement.  Initially I was going to open a popup in the page (outside of the map container) using a jQuery UI Dialog.  However, I realised that this wouldn’t work when the map was being viewed in full-screen mode, as nothing beyond the map container is visible in such circumstances.  I then considered opening the image in the borrower popup but this wasn’t really big enough.  I then wondered about extending the ‘Map options’ section and replacing the contents of this with the image, but this then caused issues for the contents of the ‘Map options’ section, which didn’t reinitialise properly when the contents were reinstated.  I then found a plugin for the Leaflet mapping library that provides a popup within the map interface (https://github.com/w8r/Leaflet.Modal) and decided to use this.  However, it’s all a little complex as the popup then has to include another mapping library called OpenLayers that enables the zooming and panning of the page image, all within the framework of the overall interactive map.  It is all working and I think it works pretty well, although I guess the map interface is a little cluttered, what with the ‘Map Options’ section, the map legend, the borrower popup and then the page image popup as well.  Here’s a screenshot with the page image open:

All that’s left to do now is add in the introductory text once Alex has prepared it and then make the map live.  We might need to rearrange the site’s menu to add in a link to the Chambers Map as it’s already a bit cluttered.

Also for the project I downloaded images for two further library registers for St Andrews that had previously been missed.  However, there are already records for the registers and pages in the CMS so we’re going to have to figure out a way to work out which image corresponds to which page in the CMS.  One register has a different number of pages in the CMS compared to the image files so we need to work out how to align the start and end and if there are any gaps or issues in the middle.  The other register is more complicated because the images are double pages whereas it looks like the page records in the CMS are for individual pages.  I’m not sure how best to handle this.  I could either try and batch process the images to chop them up or batch process the page records to join them together.  I’ll need to discuss this further with Gerry, who is dealing with the data for St Andrews.

Also this week I prepared for and gave a talk to a group of students from Michigan State University who were learning about digital humanities.  I talked to them for about an hour about a number of projects, such as the Burns Supper map (https://burnsc21.glasgow.ac.uk/supper-map/), the digital edition I’d created for New Modernist Editing (https://nme-digital-ode.glasgow.ac.uk/), the Historical Thesaurus (https://ht.ac.uk/), Books and Borrowing (https://borrowing.stir.ac.uk/) and TheGlasgowStory (https://theglasgowstory.com/).  It went pretty and it was nice to be able to talk about some of the projects I’ve been involved with for a change.

I also made some further tweaks to the Gentle Shepherd Performances page which is now ready to launch, and helped Geert out with a few changes to the WordPress pages of the Anglo-Norman Dictionary.  I also made a few tweaks to the WordPress pages of the DSL website and finally managed to get a hotel room booked for the DHC conference in Sheffield in September.  I also made a couple of changes to the new Gaelic Tongues section of the Seeing Speech website and had a discussion with Eleanor about the filters for Speech Star.  Fraser had been in touch with about 500 Historical Thesaurus categories that had been newly matched to OED categories so I created a little script to add these connections to the online database.

I also had a Zoom call with the Speak For Yersel team.  They had been testing out the resource at secondary schools in the North East and have come away with lots of suggested changes to the content and structure of the resource.  We discussed all of these and agreed that I would work on implementing the changes the week after next.

Next week I’m going to be on holiday, which I have to say I’m quite looking forward to.