Week Beginning 24th January 2022

I had a very busy week, working on several different projects.  For the Books and Borrowing project I participated in the team Zoom call on Monday to discuss the upcoming development of the front-end and API for the project, which will include many different search and browse facilities, graphs and visualisations.  I followed this up with a lengthy email to the PI and Co-I where I listed some previous work I’ve done and discussed some visualisation libraries we could use.  In the coming weeks I’ll need to work with them to write a requirements document for the front-end.  I also downloaded images from Orkney library, uploaded all of them to the server and generated the necessary register and page records.  One register with 7 pages already existed in the system and I ensured that page images were associated with these and that the remaining pages of the register fit in with the existing ones.  I also processed the Wigtown data that Gerry McKeever had been working on, splitting the data associated with one register into two distinct registers, uploading page images and generating the necessary page records.  This was a pretty complicated process, and I still need to complete the work on it next week, as there are several borrowing records listed as separate rows when in fact they are merely further volumes of the same book borrowed at the same time.  These records will need to be amalgamated.
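As a rough illustration of the amalgamation step, here is a minimal Python sketch that merges rows sharing the same borrower, book and borrowing date into a single record holding a list of volumes.  This is not the code used on the project: the field names (`borrower`, `book`, `borrowed_on`, `volume`) are hypothetical, and the real records will have more fields and messier matching rules.

```python
def amalgamate_borrowings(rows):
    """Merge rows that share borrower, book and date into one record.

    Each row is a dict with the hypothetical keys 'borrower', 'book',
    'borrowed_on' and 'volume'. The original row order is preserved.
    """
    grouped = {}
    for row in rows:
        key = (row["borrower"], row["book"], row["borrowed_on"])
        if key not in grouped:
            # First time this borrower/book/date combination is seen
            grouped[key] = {
                "borrower": row["borrower"],
                "book": row["book"],
                "borrowed_on": row["borrowed_on"],
                "volumes": [],
            }
        # Later rows for the same key only contribute their volume number
        grouped[key]["volumes"].append(row["volume"])
    return list(grouped.values())
```

Grouping on an explicit key keeps the rule for what counts as 'the same borrowing' in one place, so it is easy to adjust if the matching needs to be stricter.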

For the Speak For Yersel project I had a meeting with the PI and RA on Monday to discuss updates to the interface I’ve been working on, new data for the ‘click’ exercise and a new type of exercise that will precede the ‘click’ exercise and will involve users listening to sound clips then dragging and dropping them onto areas of a map to see whether they can guess where the speaker is from.  I spent some time later in the week making all of the required changes to the interface and the grammar exercise, including updating the style used for the interactive map and using different marker colours.

I also continued to work on the speech database for the Speech Star project based on feedback I received about the first version I completed last week.  I added in some new introductory text and changed the order of the filter options.  I also made the filter option section hidden by default as it takes up quite a lot of space, especially on narrow screens.  There’s now a button to show / hide the filters, with the section sliding down or up.  If a filter option is selected the section remains visible by default.  I also changed the colour of the filter option section to a grey with a subtle gradient (it gets lighter towards the right) and added a similar gradient to the header, just to see how it looks.

The biggest update was to the filter options, which I overhauled so that instead of a drop-down list where one option in each filter type can be selected there are checkboxes for each filter option, allowing multiple items of any type to be selected.  This was a fairly large change to implement as the way selected options are passed to the script and the way the database is queried needed to be completely changed.  When an option is selected the page immediately reloads to display the results of the selection and this can also change the contents of the other filter option boxes – e.g. selecting ‘alveolar’ limits the options in the ‘sound’ section.  I also removed the ‘All’ option and left all checkboxes unselected by default.  This is how filters on clothes shopping sites do it – ‘all’ is the default and a limit is only applied if an option is ticked.
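To show the general idea of querying with multi-select filters, here is a minimal Python sketch that turns the ticked checkboxes into a parameterised query.  It is an illustration only and not the script behind the site: the table name `videos` and the use of `accent`, `sound` and `position` as column names are assumptions, as is the `?` placeholder style.

```python
def build_filter_query(selected):
    """Return (sql, params) for a dict mapping filter name -> ticked values."""
    # Hypothetical whitelist of filterable columns. Column names cannot be
    # bound as parameters, so only these fixed names are ever put in the SQL.
    allowed = ("accent", "sound", "position")
    clauses = []
    params = []
    for column in allowed:
        values = selected.get(column) or []
        if not values:
            continue  # nothing ticked means 'all', so no limit is applied
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    sql = "SELECT * FROM videos"  # 'videos' is an assumed table name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Options within one filter are combined with `IN` (any of the ticked values), while the different filters are combined with `AND`, which matches the shopping-site behaviour described above.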

I also changed the ‘accent’ labels as requested, changed the ‘By Prompt’ header to ‘By Word’ and updated the order of items in the ‘position’ filter.  I also fixed an issue where ‘cheap’ and ‘choose’ were appearing in a column instead of the real data.  Finally, I made the overlay that appears when a video is clicked on darker so it’s more obvious that you can’t click on the buttons.  I did investigate whether it was possible to have the popup open while other page elements were still accessible but this is not something that the Bootstrap interface framework that I’m using supports, at least not without a lot of hacking about with its source code.  I don’t think it’s worth pursuing this as the popup will cover much of the screen on tablets / phones anyway, and when I add in the option to view multiple videos the popup will be even larger.

Also this week I made some minor tweaks to the Burns mini-project I was working on last week and had a chat with the DSL people about a few items, such as the data import process that we will be going through again in the next month or so and some of the outstanding tasks that I still need to tackle with the DSL’s interface.

I also did some work for the AND this week, investigating a weird timeout error that cropped up on the new server and discussing how best to tackle a major update to the AND’s data.  The team have finished working on a major overhaul of the letter S and this is now ready to go live.  We have decided that I will ask for a test instance of the AND to be set up so I can work with the new data, testing out how the DMS runs on the new server and how it will cope with such a large update.

The editor, Geert, had also spotted an issue with the textbase search, which didn’t seem to include one of the texts (Fabliaux) he was searching for.  I investigated the issue and it looked like the script that extracted words from pages may have silently failed in some cases.  There are 12,633 page records in the textbase, each of which has a word count.  When the word count is greater than zero my script processes the contents of the page to generate the data for searching.  However, there appear to be 1,889 pages in the system that have a word count of zero, including all of Fabliaux.  Further investigation revealed that my scripts expect the XML to be structured with the main content in a <body> tag.  This cuts out all of the front matter and back matter from the searches, which is what we’d agreed should happen and thankfully accounts for many of the supposedly ‘blank’ pages listed above as they’re not the actual body of the text.

However, Fabliaux doesn’t include the <body> tag in the standard way.  In fact, the XML file consists of multiple individual texts, each of which has a separate <body> tag.  As my script didn’t find a <body> in the expected place no content was processed.  I ran a script to check the other texts and the following have a similar issue: gaunt1372 (710 pages) and polsongs (111 pages), in addition to the 37 pages of Fabliaux.  Having identified these, I updated the script that generates search words and re-ran it for these texts, which fixed the issue.
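As a minimal sketch of the kind of fix involved (not the actual textbase script), the Python below collects words from every <body> element wherever it sits in the document, rather than expecting a single one in a fixed place.  It assumes the XML has no namespaces, that <body> elements are not nested inside one another, and it ignores the page structure entirely; the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def extract_body_words(xml_text):
    """Return the lower-cased words found inside every <body> element."""
    root = ET.fromstring(xml_text)
    words = []
    # iter() walks the whole tree, so a <body> is found at any depth.
    # Front and back matter are skipped because they sit outside <body>.
    for body in root.iter("body"):
        text = " ".join(body.itertext())
        words.extend(re.findall(r"\w+", text.lower()))
    return words
```

Searching the whole tree handles both the standard single-<body> texts and files made up of several individual texts, so no special case is needed for either.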

Also this week I attended a Zoom-based seminar on ‘Digitally Exhibiting Textual Heritage’ that was being run by Information Studies.  This featured four speakers from archives, libraries and museums discussing how digital versions of texts can be exhibited, both in galleries and online.  Some really interesting projects were discussed, both past and present.  These included the BL’s ‘Turning the Pages’ system (http://www.bl.uk/turning-the-pages/) and some really cool transparent LCD display cases (https://crystal-display.com/transparent-displays-and-showcases/) that allow images to be projected on clear glass while objects behind the panel are still visible.  3D representations of gallery spaces were discussed (e.g. https://www.lib.cam.ac.uk/ghostwords), as were ‘long form narrative scrolls’ such as https://www.nytimes.com/projects/2012/snow-fall/index.html#/?part=tunnel-creek, http://www.wolseymanuscripts.ac.uk/ and https://stories.durham.ac.uk/journeys-prologue/.  There is a tool that can be used to create these here: https://shorthand.com/.  It was a very interesting session!