Week Beginning 1st July 2024

I was back at work this week after a lovely week’s holiday in Mexico.  My biggest task of the week was to migrate the ‘map first’ place-names interface I’ve been working on for the Iona place-names project to the Ayr place-names resource.  The version I’ve created for the Iona project has not yet launched and the Ayr project is nearing completion so it looks like this might be the project that gets to launch the new interface first, but we’ll see.

It took a bit of time to migrate the interface as the Iona project is slightly different to the previous place-names projects I’ve worked on (Ayr, Galloway Glens and Berwickshire).  The resource has many bilingual fields to record Gaelic versions, it has a ‘flattened’ altitude range due to Iona being so close to the sea and it doesn’t feature parish boundaries as Iona is all in one parish.  To get the resource to work with the Ayr data I had to address all of these issues, with the parish boundaries being the trickiest to get working.  In order to integrate these I needed to update the map’s ‘Display Options’ to add the option in and then update my code to incorporate the new display option.  As this needed to be represented in the page URL I needed to shift all parts of the URL relating to the search, browse and viewing of records along by one, and my code then needed to ensure this shift was represented throughout.  It was a bit of a pain to sort out but I got there.
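The URL shift can be pictured with a minimal sketch (the segment names and positions here are hypothetical, not the site’s actual routing scheme):

```python
# Minimal sketch of the URL-segment shift, assuming a hash-style URL
# where display options occupy fixed positions (the real routing
# scheme may differ).  Inserting a new option at a known index pushes
# all later segments (search, browse, record view) along by one.

def shift_url_segments(hash_fragment, new_option, insert_at):
    """Insert a new display option into a '/'-delimited hash fragment."""
    parts = hash_fragment.split('/')
    parts.insert(insert_at, new_option)
    return '/'.join(parts)

# A hypothetical URL before and after adding a 'parishes' option:
old = "map/os1881/labels/quicksearch/burn"
new = shift_url_segments(old, "parishes", 3)
# every segment from index 3 onwards has moved along by one
```

Because every segment after the insertion point moves along by one, any code that reads segments by index has to be updated in step, which is why the change rippled through the search, browse and record-view handling.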

I also made a new colour scheme for the Ayr map interface to differentiate it from the Iona one, but this is only temporary and we might change this before we go live.  I also added in a logo for the map (an ‘A’ for ‘Ayr’ taken from the background map of the website) but again this might be changed to something else (e.g. a ‘C’ or the Coalfield Communities icon if this is allowed), or removed entirely before the resource launches.  Below is a screenshot of the new map interface showing the results of a quick search for ‘Burn’ with place-name labels and parish boundaries (the orange lines) turned on using the OS1881 map:

I had two meetings this week, both on Tuesday.  The first was with my line manager Marc Alexander to discuss the logistics of the new project the Books and Borrowing team are putting a proposal together for.  I can’t say too much about this for the moment.  The second was with Deven Parker to discuss her Playbills project with a couple of people from Computing Science who are researching AI.  It was interesting to hear the possibilities that might be offered by AI in terms of extracting data from the Playbill images, although I think that we’ll need to discuss things with them in more detail if we are to ensure they fully understand the data and what we need to get out of it.  More discussions will no doubt follow.

I had an email discussion with Sofia from the Iona project about further updates to the map interface and to explain how to successfully import the data exported from the resource into Excel in a way that would ensure accented characters are not garbled.  I also gave some advice to Pauline Graham of the DSL about creating user accounts and helped Pauline MacKay of Scottish Literature with some issues she’d been having accessing the content management system for the Burns correspondence.  This project also encountered an issue later in the week whereby the scripts weren’t executing but were instead downloading.  This was very concerning and turned out to be a problem with our hosting company that we were thankfully able to fix.
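For the record, the Excel garbling is the classic UTF-8 byte-order mark issue: Excel assumes a legacy encoding for CSV files unless a BOM is present.  A minimal sketch of a BOM-friendly export (illustrative only, with made-up data, not the project’s actual export code):

```python
import csv

# Writing the export with the 'utf-8-sig' codec prepends a byte-order
# mark, which tells Excel to read the file as UTF-8 so accented and
# Gaelic characters are not garbled on import.
rows = [["Place-name", "Gaelic form"], ["Iona", "Ì Chaluim Chille"]]

with open("export.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)
```

The same effect can be achieved on the Excel side by using ‘Data > From Text/CSV’ and choosing UTF-8 explicitly, but emitting the BOM from the export means double-clicking the file just works.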

Also this week I added feedback questionnaire popups, pages and menu items to the https://www.seeingspeech.ac.uk/speechstar/ and https://speechstar.ac.uk/ websites, made a few further tweaks to the text of each resource and gave some advice to Eleanor about accessing the Google Analytics stats for each site.

I was also asked to add a further library register to the Books and Borrowing resource.  This consisted of around 190 images, which were supplied as PNG files.  Unfortunately we need the images to be JPEGs to be consistent with all of our other images so I needed to figure out a way to convert the images.  Batch converting images from PNG to JPEG seems like the sort of thing that should be straightforward to do using Photoshop or even just in Windows, but despite trying several methods I didn’t find anything that worked.  Eventually I installed the ImageMagick command-line tool and used a single command:

magick mogrify -format jpg *.png

That converted all of the files in one go, as detailed here: https://imagemagick.org/script/mogrify.php.  Unfortunately I then discovered that my access to the Stirling VPN had been blocked so I was unable to access the project’s server.  Unblocking my access required authorisation from a few people and by the end of the week the process had still not been completed, so I haven’t yet been able to complete this task.

Finally this week we received some feedback from the testing of the new Wales ‘Speak For Yersel’ resource, which required me to make many changes to the project’s data.  This included replacing existing sound files and adding new ones, adding new answer options, and updating existing questions and adding new ones.  I also fixed a bug that had been caused by a difference in the way blank fields were stored in the database for new questions that I’d added a while back.  These were classed as ‘empty’ for previously created data but ‘null’ for the new data (two different things in databases) and my code wasn’t dealing with the ‘null’ values properly.  I made the fields blank instead, which has thankfully sorted things.  Note that the data was all recorded successfully for survey responses – the issue was purely with their display on the maps.  I also spotted that some map markers for the ‘Mate’ question were not displaying properly because the number of possible options was greater than my code was set up to work with.  I added in some new marker colours and this addressed the issue.
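The empty-versus-null distinction behind the bug can be demonstrated with a small sketch (SQLite standing in for the project’s real database, and hypothetical table and field names):

```python
import sqlite3

# SQLite here stands in for the project's database; the point holds
# for MySQL too.  NULL and '' are distinct values, so a check written
# only against '' silently misses the NULL rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE answers (id INTEGER, extra TEXT)")
con.execute("INSERT INTO answers VALUES (1, ''), (2, NULL)")

empty_only = con.execute(
    "SELECT COUNT(*) FROM answers WHERE extra = ''").fetchone()[0]
empty_or_null = con.execute(
    "SELECT COUNT(*) FROM answers WHERE extra = '' OR extra IS NULL"
).fetchone()[0]
# empty_only finds one row; empty_or_null finds both -- the NULL row
# is the one the display code was mishandling.
```

Normalising the fields to empty strings, as described above, sidesteps the problem entirely because there is then only one kind of ‘blank’ for the display code to check.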

Week Beginning 17th June 2024

I had a couple of meetings this week, the first of which was with the Books and Borrowing team to discuss a potential new project.  I had to do a fair bit of preparation for this and had several discussions following the meeting too.  It all looks really great, but I can’t go into any more details at this stage.  My second meeting was with Deven Parker to discuss her Playbills project.  This was also a very productive meeting and plans are beginning to come together.  I’ve been invited to participate in a meeting Deven is having with some AI people (erm, that’s people studying AI rather than ‘AI people’… at least I hope so) in Computing Science next month, which I’m pretty excited about.

I spent quite a lot of time this week continuing with my mockups of a new interface for the Dictionaries of the Scots Language website.  I can’t share any screenshots of my work yet, but this week I added in the ‘Add yogh’ button to the ‘Older Scots’ search bar (both index page and entry page).  This appears as a button to the left of the bar with ‘What is yogh’ added as a link underneath.  Pressing on the ‘Add’ button adds the yogh to the input.  Pressing on the ‘What…’ link opens a modal overlay featuring the explanatory text.  I also created a mockup of the ‘advanced search’ form.  As with the live site, the advanced search features a tab for entries and another for bibliography.  Each has a search form section and a help section and the layout is pretty similar to the live site, but has been modernised and tidied up.  As with the live site, pressing on the ‘In’ buttons in the entry search changes which search options are visible and the layout works a lot better on mobile screens than the live site does.

Finally I completed a mockup of the ‘About’ page.  The page features boxes for the ‘top level’ information, each of which appears as a link (not currently linking to anywhere).  Where an information type has subpages these appear in a list within the box.  Initially I wasn’t going to add a quick search box to the ‘About’ pages, but decided it would be better to do so.  However, as these pages are not within a selected dictionary section the search bar needs to include a dictionary selector.  I’ve added this to the left of the search and hopefully it should be intuitive to use.  At the moment the yogh information only appears when ‘Older Scots’ is selected and is hidden when ‘Modern Scots’ is selected.  When Older Scots is selected the search bar text input does get a bit small on mobile screens, but it’s still perfectly usable.  I’d envisage such a bar appearing on all of the ancillary pages, but we would retain the specific dictionary searches when in a dictionary.

I also continued with updates to the Anglo-Norman Dictionary this week, using my newly created workflow to add a further six texts to the site’s Textbase.  I also rearranged the ‘browse’ page so that the texts are now arranged by genre, and there are buttons to jump straight to a genre you’re interested in.  You can view the updated feature here: https://anglo-norman.net/textbase-browse/ and below is a screenshot:

Also this week I added further content to the Speechstar website (more videos and an additional ‘Phonemic target’ metadata field for all records listed here: https://www.seeingspeech.ac.uk/speechstar/disordered-child-speech-sentences-database/).  I also went through the other site (https://speechstar.ac.uk/) to rename the project from ‘Speech Star’ to ‘SpeechSTAR’ wherever this text appears.  I also had to spend rather a lot of time creating timesheets for every month I’ve worked on the project since July 2021 due to my time having been costed incorrectly.  As you can imagine, this was quite a long and tedious task, but thankfully it was made easier by having this blog to consult.

I’ll be on holiday next week so there will be no more from me until the start of July.

Week Beginning 27th May 2024

Monday was the late May bank holiday this week, so it was a four-day week for me.  On Tuesday I had an online meeting with Tony Harris, a developer at Cambridge who will be working on a project about Middle English Lexicon.  The project involves Louise Sylvester, who is in charge of the Bilingual Thesaurus of Everyday Life in Medieval England (https://thesaurus.ac.uk/bth/) which I developed, and this new project is going to expand upon the data held in this resource.  We had a good chat about the Bilingual Thesaurus and the technologies I’d used to put it together, and discussed some ways in which the new project might function from a technical perspective.  It’s likely that we’ll meet again in the coming months to expand upon our ideas and I’ll probably be involved with the project in some small capacity.

Also on Tuesday fellow Arts developer Stevie Barrett and I met with the ‘Technical Champion’ for the College of Arts and Humanities, Aris Palyvos who is a technician in Archaeology.  We had a good chat about the role of technicians in the College and how we can improve our visibility.  We now have a Teams group for technicians and hopefully we’ll be able to meet up with some of the others in the College in the coming months.

Last week I’d started work on a content management system for Burns correspondence and I spent a bit of time this week finishing things off.  I’ve given Craig and Pauline in Scottish Literature access to the CMS now and they have someone starting next week who will be using the system so I’ll just need to see how they get on and if they request any changes.

Also this week I made a few further tweaks to the new Speak For Yersel survey regions, fixed a couple of typos on the Speech Star website and helped to resolve a fairly serious issue with the Books and Borrowing website.  The IIIF server that the website uses had gone offline, meaning none of the images of register pages were loading.  Our usual IT guy at Stirling was out of office, but thankfully someone else there was able to get things up and running again.

I also completed my work on the migration of the British Association for Romantic Studies’ journal the BARS Review (https://www.bars.ac.uk/review).  This has taken quite some time over the past few weeks to get sorted but it’s now up and running.  To get it working I needed to switch the PHP version the site was using from a rather ancient version to the current version, and thankfully the other parts of the site that don’t use the OJS system were not adversely affected by this change.  I also took the opportunity to add Google reCAPTCHA to the site for registration and login, which should hopefully stop the spam registrations.  Registration also now requires the user to verify their registration via an email.  I also made a few additional security updates that I’d better not discuss here.

I spent the rest of the week working for the Dictionaries of the Scots Language, making a few tweaks to the advanced search on our test server, investigating some issues and replying to emails.  I also made a fairly major change to the sparkline data for entries so that dates of attestation beyond the period of each dictionary are handled in a different manner.  Previously all such dates were bundled together as the start or end date and then this date was used to generate blocks for the sparkline visualisation.  For example, ‘Abeich’ in SND has a first date of attestation of 1568, a long time before the official start date of the dictionary, which is 1700.  Previously this start date was being converted to 1700.  Our ‘cut off’ point for generating blocks of continuous attestation is now set to 50 years, meaning that if there are two or more attestations 50 years or less from each other this results in a block of continuous usage in the sparkline visualisation.  As the next date of attestation for the entry was 1721 the resulting sparkline therefore gave a continuous block from 1700 to 1721, which did not affect the underlying data, plus the sparkline text then included ‘1700-1721’ which was not at all accurate.  See the following screenshot to see what I mean:

I updated the code that generates the data for the sparklines so that any dates prior to 1700 result in the text ‘<1700’ appearing and the code no longer uses such dates as a starting point for a ‘block’ in the visualisation.  After the update we’re now presented with the following sparkline, which has a line at the start of the visualisation representing ‘<1700’ and then a gap from this point until 1721, which is the first attestation in the dictionary’s official period:
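The updated block-building logic can be sketched roughly as follows (an illustrative reconstruction of the rule described above, not the site’s actual code):

```python
DICT_START = 1700  # SND's official start date
CUTOFF = 50        # max gap (years) within a block of continuous use

def sparkline_blocks(dates):
    """Group attestation dates into blocks of continuous usage.

    Dates before the dictionary's official period are reported as
    '<1700' and no longer seed a block, matching the fix described
    above.  Attestations 50 years or less apart merge into one block.
    """
    pre = any(d < DICT_START for d in dates)
    in_period = sorted(d for d in dates if d >= DICT_START)
    blocks = []
    for d in in_period:
        if blocks and d - blocks[-1][1] <= CUTOFF:
            blocks[-1][1] = d          # extend the current block
        else:
            blocks.append([d, d])      # start a new block
    return pre, [tuple(b) for b in blocks]

# 'Abeich': first attested 1568, then 1721 within the period.
pre, blocks = sparkline_blocks([1568, 1721])
# pre is True (rendered as '<1700'); the first block now starts at
# 1721 rather than stretching back to 1700
```

The key difference from the old behaviour is that the pre-1700 date contributes the ‘<1700’ marker but never acts as the left edge of a block, so no spurious 1700-1721 run is generated.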

In order to get this working I needed to regenerate the data for Solr and then update the Solr core with the new data.  For now this is only running on my laptop and I have put in a request with our IT people to update the online Solr cores, as I don’t have access to do this myself.  Once the change has been made our online test site will be updated and hopefully it won’t be too much longer before we can actually update the live site and make this new feature available to everyone.

Week Beginning 20th May 2024

This was a week of working on many different projects for me.  I spent some time making final tweaks to the Speech Star websites (https://speechstar.ac.uk/ and https://www.seeingspeech.ac.uk/speechstar/) as the project officially came to an end this week.  This included adding new videos, replacing existing videos, updating the metadata, updating the site text and adding in a few new images.  Both resources have come together really well and I’m sure they will be hugely useful for speech therapy for a long time to come.

I also made a further update to the Anglo-Norman Dictionary resource related to the new ‘language’ search I added in last week.  After this had gone live I had intended to make the languages in the entries link to the search results and I spent a bit of time getting this working.  This involved updating the XSLT, the CSS and the JavaScript used to display the dictionary entries and the update is not live yet, but it’s in place on a test page and works pretty well.  See the ‘(loanword….’ section in the following screenshot to see how it will work:

I also replied to a number of emails that had been sent to me from Ann Fergusson of the DSL regarding the new date search and sparklines that I developed for the project many months ago but are still to go live.  Ann gave me some feedback on a number of issues and I spent some time making updates.  The big change will be to the way dates of attestation that fall before or after the active period of the dictionary in question are handled.  I’m going to have to update the way the sparkline data is generated, which may take some time.  I’ll hopefully be able to look into this next week.

For the new regions of the Speak For Yersel project I updated the privacy policies and now everything is ready for us to begin sending out the URLs for the new areas for test purposes.  For the Books and Borrowing project I wrote a couple of scripts to find and then generate some missing pages from the registers of Selkirk library.  The first script identified page images that we have on the server for the two registers that do not have corresponding page records in our database.  The script was set to output a 300px wide version of each missing image plus its filename and it was also possible to click on the image to view a full-size version.  It turned out that the majority of the missing images were omitted for a reason – they were either blank or didn’t contain borrowing records.  We decided that 70 images needed to be added in and 117 could be safely ignored.  My second script then added in the missing pages.  This took a bit of time to get working as not only did I need to create the pages, I also needed to ensure they were slotted into the correct place, updating the ‘next’ and ‘previous’ links and the page order as required.  But the pages are now available and someone can begin the process of transcribing the borrowing records found on the pages.
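In outline, the two scripts did something like the following (a simplified sketch with hypothetical field names; the real scripts also generated thumbnails and repaired the ‘next’ and ‘previous’ links between page records):

```python
# Step one diffs the image files on the server against the page
# records in the database; step two slots new pages into the ordered
# sequence and renumbers the page order as it goes.

def find_missing(image_files, page_records):
    """Return image filenames with no corresponding page record."""
    recorded = {p["image"] for p in page_records}
    return sorted(f for f in image_files if f not in recorded)

def insert_pages(pages, new_pages):
    """Merge new pages into the ordered list and renumber page_order.

    Sorting by image filename is an assumption for this sketch; the
    real ordering follows the register's folio sequence.
    """
    merged = sorted(pages + new_pages, key=lambda p: p["image"])
    for order, page in enumerate(merged, start=1):
        page["page_order"] = order
    return merged
```

Renumbering the whole sequence rather than patching individual records is the simpler approach when dozens of pages are being slotted in at once.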

I also spent about a day or so this week creating a content management system for the Burns Letter Writing Trail, which will be an interactive map of Burns’ correspondence that will eventually be added to https://www.burnsc21-letters-poems.glasgow.ac.uk/.  I can’t really say much more about it at this stage, but the CMS is about 80% complete and I hope to finish the rest next week.

Finally, I spent about a day looking through the images and YAML files of playbills from the 19th century for Deven Parker.  The images are generally a relatively small file size but they are actually pretty high resolution and I think they will work in a ‘zoom and pan’ interface.  As an experiment I set up such an interface for one image, using the same JavaScript library I used for the Books and Borrowing project.  I spent the rest of the day getting to grips with the YAML files, building up my understanding of the textual data, sketching out an initial relational database structure that could be used for the project and compiling a list of questions for Deven.  I’ll probably have to meet with her to go through these and see if or how my involvement with the project will develop.


Week Beginning 13th May 2024

I continued with the upgrade of the BARS Review this week.  I’d managed to complete the upgrade process last week, but discovered that the links to the actual text of journal articles as HTML and PDFs were not working.  Further investigation this week revealed that this was a major problem.  The way files are stored and referenced in the new version of the Open Journal System is entirely different to how things were in the original version.  Previously there was an ‘article_files’ table where files associated with articles were located but the new version instead features a ‘files’ table that contains paths to files that are entirely different to the earlier version and the actual directory structure of the system.  I realised that it was likely that the upgrade process not only upgraded the database but also moved files around, and as I never got the path to the files right on my Windows PC any upgrades that should have been applied to the files would have failed (although having said that I never saw any errors).  I therefore had to begin the upgrade process again so that the files would actually get moved / renamed to match the updates to the database.

After further investigation it appeared that several people have experienced the same issue whilst upgrading, for example https://forum.pkp.sfu.ca/t/none-of-the-pdf-files-can-be-viewed-or-downloaded-after-upgraded-to-ojs-3/27381 and https://forum.pkp.sfu.ca/t/solved-cant-see-pdfs-after-upgrade-to-3-1-1-4/49382 and https://forum.pkp.sfu.ca/t/after-upgrading-to-ojs-3-1-1-4-files-are-not-found/48938/23.

I fired up the 2.4.8.5 version of the site that I had running on my old PC and this time ensured the config file included the correct path to the files.  After doing so the instance managed to find the files, meaning I could restart the upgrade from this version.  The first time I upgraded to 3.2.1 with the correct path the files were successfully indexed and added to the database, but no corresponding files were actually generated.  It turned out that this was because my ‘files’ directory had somehow been set to read only.  I fixed this and re-ran the upgrade to 3.2.1 and thankfully the files were all renamed and moved successfully.  After that the upgrade to 3.4 worked fine.

However, the HTML versions of the articles were still not opening in the browser but were instead getting downloaded.  I found a forum post about this too: https://forum.pkp.sfu.ca/t/ojs-3-1-0-1-html-downloaded-it-automatically-from-articles/36024/8 which suggests that a plugin needs to be activated.  After managing to log in as an admin user I managed to find and activate the required plugin (HTML Article Galley).  I also activated the ‘PDF.js PDF viewer’ plugin.  This now means you can view the HTML and PDF versions of the article in your browser, as with the old site.  I also needed to update the permissions of a couple of directories and now the upgrade process is complete.  For now the new version of the site is still running on a test server, so I’ll still need to replace the live site with the newly upgraded version once Matt is ready.

Also this week I investigated an issue with the Books and Borrowing server, as staff were unable to log into the CMS.  As I suspected, this was because the server had run out of storage, meaning there was no space for a session variable to be stored.  I contacted Stirling IT about this and thankfully they were able to free up some space.  I also investigated some missing images and register pages from Selkirk library.  It turns out that we only included pages that had been transcribed by a previous researcher, which meant that many of the register pages have been missed out.  Thankfully we have the full set of digitised images and I’m therefore going to have to write a script that creates the missing pages, something I’ll try to tackle next week.

This week I finally moved back to my office in 13 University Gardens from the nicer office I’d been squatting in on Professors Square for the past year and a half.  All my stuff had been moved over for me and when I was on campus on Tuesday I therefore had to spend a bit of time getting everything set up and in order.

Also on Tuesday, I had a meeting with Deven Parker, who has a Leverhulme funded position in English Literature and is working with Matt Sangster.  We discussed a prospective project involving the digitisation of playbills from UK theatre from 1750 to 1843.  There are currently around 300,000 digitised images and there may be up to 500,000 in the end.  Deven wants an online database to be made for these, featuring searchable text and the images, and we discussed some of the possibilities.  I’m going to have a look at some of the data next week and will think about what can be done.

Also this week we went live with the language search for the Anglo-Norman Dictionary.  This is now available as a tab on the advanced search page (https://anglo-norman.net/search/) and allows users to find dictionary entries that have been tagged as loanwords.  It’s great to have this feature live.  I also needed to update the bibliographical links to the DEAF website as they had changed their site, breaking all of the links we previously had.  This required me to update the links in a few places, but all is working again now – for example see the DEAF links on this page: https://anglo-norman.net/bibliography/

I also made a couple of small fixes to the Emblems website (https://www.emblems.arts.gla.ac.uk/french/), ensured project images are not too large on the Glasgow Medical Humanities website (https://glasgowmedhums.ac.uk/projects/) and fixed a typo that had existed for many years on the New Modernist Editing ‘Digital Ode’ site (https://nme-digital-ode.glasgow.ac.uk).  I also had a chat with Craig Lamont about the interactive map of Burns correspondence that I will be developing.  There will soon be an RA who will begin to compile the data and I will need to start creating a content management system for this in the next week or so.

I also continued to make updates to the new Speak For Yersel surveys (which are not yet quite ready to launch).  This included adding in animated ‘bubble’ maps to the homepage and updating the survey tool to add an optional question about bilingualism to the registration page, which I then incorporated into the surveys for Wales and the Republic of Ireland.

I then rounded off the week by making further updates to the Speech Star website, including adding text to the ‘About’ page, adding in some new videos, replacing some existing ones, updating metadata and ensuring all video popups are bookmarkable and have full citation information.  For example, here’s a direct link to the ‘Lip consonants’ video in ‘Sound groups’: https://speechstar.ac.uk/speech-sound-animations/#location=65.

Week Beginning 30th April 2024

I worked on several different projects this week.  On Monday I had a meeting with the Books and Borrowing PI and Co-I Katie and Matt and some researchers at UCL who are working on a project that deals with similar sorts of data to Books and Borrowing.  They were interested in how we developed the B&B resource and we had a good discussion about various technical and organisational issues for about an hour.  As a result of this meeting I decided to publish the development documentation for the project, which is now available here: https://borrowing.stir.ac.uk/documents/.  This includes the data description document, the requirements documents for the project’s content management system, front-end and API, plus my ‘developer diary’, consisting of all of the sections relating to B&B taken from this blog.  Hopefully the documents will be of some use to future projects.

Also for the Books and Borrowing project this week, the server that hosts the site at Stirling underwent a major upgrade and I therefore had to check through the site and the CMS to ensure everything still worked properly.  There were a few issues with the CMS that I needed to fix, but other than that all would appear to be fine.

I also began the process of updating the BARS review site for Matt.  This site is powered by the Open Journal System, but the version in use is now extremely out of date and needs an upgrade – a process that is not at all straightforward.  I spent about a day working on the upgrade and the process is not even close to completion yet.  It would appear that no 2.X versions of OJS will run on PHP 7, never mind the current PHP 8 and unfortunately my local server setup doesn’t function with anything before PHP 7.  I managed to get a version of PHP5 running on an old PC I’ve got but after copying the OJS files and database I’d taken from the live hosting onto this PC (and updating the database connection settings) I couldn’t get the site to load – all I got was a strange browser error that I’d never encountered before.

This current live version of the site is running version 2.4.3 of OJS, and the upgrade pathway states that this would need to be upgraded to 2.4.8.5 (which will then need to be upgraded to 3.2.1, which can then be upgraded to the most recent 3.4 release).  I therefore decided to try installing a fresh version of 2.4.8.5 to see if I had any better luck with that.  Initially things didn’t look good as when attempting to access the locally hosted site all I got was a fatal PHP error.  I managed to track this down and hacked about with the code to stop the error cropping up (I figured that since this version is only an interim installation it wouldn’t really matter).  After that the fresh install (with no BARS data) appeared to run successfully.

The upgrade path documentation (https://docs.pkp.sfu.ca/dev/upgrade-guide/en/) states that to upgrade to a newer version you should set up the newer version and then point it at the older version’s database and reinstate the directories holding the old site’s files, so this is what I did.  The upgrade script successfully noted the older database version and the upgrade process seemed to run successfully.  At this point I then had what appeared to be the BARS site running on my old PC.  However, there were some issues.  The colour scheme appears to have been lost and there are some other differences in the layout.  But more importantly, while the journals and articles all seem to be present and correct, pressing on the ‘HTML’ and ‘PDF’ links to access the content doesn’t actually load any content.  Thankfully Matt suggested that my local version of the site may need its config file updated to point to the correct files directory, and this was indeed the case, so progress is being made.

I then migrated everything onto my current PC, which currently has the PHP version set to 7, which OJS 2.4.8.5 should support.  Unfortunately it doesn’t and when I try to access the site I just get fatal PHP errors relating to functions called in the code that have been removed from PHP 7.  This is not necessarily a big issue, though, as now I have the site on my current PC I should be able to upgrade to OJS 3.X, a process I will continue with next week.

Also this week I made a few more tweaks to Matthew Creasy’s conference website and responded to a few emails that came in requesting my help with various things.  I also made several more updates to the Speech Star website, which is now actually live (https://speechstar.ac.uk/).  This included renaming the site and updating the homepage text, updating the favicon on the other Speech Star website to cut down on tab confusion (see https://www.seeingspeech.ac.uk/speechstar/) plus adding in a new menu item and link from this site to the other one.  I also replaced the vocal tract video found here: https://speechstar.ac.uk/speech-sound-animations/ with a newer version.

I also returned to my work on the new language search for the Anglo-Norman Dictionary, completing an initial version of it.  As with the label search, the languages are listed down the left and you can add or remove them from the panel on the right by clicking on them.  Boolean options appear when multiple languages are selected and there’s an option to limit the search to compound words, exclude compound words or search for all.  Below is a screenshot of how the search form currently looks:

The search is fully operational, but is not yet live as I need to hear back from the editor, Geert, about when he’d like the new feature to launch.

Finally this week I made some further updates to the Speak For Yersel follow-on projects.  This included making a number of changes to the Welsh survey, adding in the site logos that Mary had created using ArcGIS and adding in the top-level tabs that will enable users to switch between survey areas.  I also created a new ‘speech bubbles’ animated GIF for Northern Ireland based on images Mary sent me.  The new survey areas are not yet publicly available but below is a screenshot of the Northern Ireland area, showing the logo, the top-level tabs and the animated GIF.  We’re getting there!

Week Beginning 22nd April 2024

The Books and Borrowing project (https://borrowing.stir.ac.uk/) had its official launch on Friday this week, and it was great to celebrate the completion of a project that has been such a major part of my working life for these past four years.  I spent a lot of this week preparing for the launch, at which I gave a talk about the creation of the resource.  This covered the definition of the data structures, the creation of the database, the planning and development of the content management system and then the front-end and API for the project.  There was a lot more I could have said about our use of technologies such as the IIIF server for images and Apache Solr for search facilities, but I had to keep the talk relatively brief and couldn’t include everything.  I think my talk went pretty well, and it was also really great to hear from the other members of the project team, many of whom also gave talks about the research they had undertaken using the resource.

As part of my preparations I also got the site running on my laptop in case there were any issues with the server or general internet connection during the launch.  I also spotted and fixed a small bug with the search results filter options.  The filter by place of publication was working, but when a place was selected it was not getting ‘ticked’ in the left-hand filter options.  This meant it was rather difficult to unselect the filter.  The issue must have been introduced when I updated how publication places were stored in the Solr index a few months ago and thankfully I was able to fix it without having to regenerate the data.

Also this week I made a small tweak to the website for the International James Joyce Symposium (https://ijjf2024.glasgow.ac.uk/), adding a further logo and link to the footer.  I also made a few more minor updates to the Speech Star website.

I also continued to work on the new language search for the Anglo-Norman Dictionary.  I updated my data import scripts so that language data would be extracted during batch imports, and ensured existing language data was deleted during the process of deleting older data.  I then updated the dictionary’s content management system to ensure that language data is properly dealt with when entries are added or edited through the system, and updated the ‘view entry’ page in the system so that language data for each entry is now visible on the page, in the same way as parts of speech, labels and other data.

I spent the remainder of my week working on the Speak For Yersel follow-on projects, updating several of the questions for the Republic of Ireland, adding in introductory text for this survey area and updating all three survey areas to add some explanatory text to the start of each survey question.

 

Week Beginning 15th April 2024

The Books and Borrowing project has its official launch next week, and I spent some time this week preparing for it.  This included fixing a speed issue with the site-wide fact page (https://borrowing.stir.ac.uk/facts/) that was taking far too long to load its default view, something that has somehow got worse since I originally created the feature.  After undertaking some investigation into what was causing the slow loading time it turned out that the main sticking point was the loading of data for the two ‘Borrowings through time’ visualisations.  This was taking quite a long time to calculate for all libraries.  I therefore created cached files for the data used for all libraries and have set the API to call on these rather than querying the database whenever specific libraries are not requested.  This has greatly speeded up the loading of the page, which (for me at least) is now practically instantaneous, while before it was taking up to 30 seconds.
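
The caching approach is simple: the first request without a library filter generates the data and writes it to a file, and subsequent requests just read that file.  Below is a minimal Python sketch of the idea (the site itself is PHP, and the cache path and query function here are invented for illustration):

```python
import json
import os

CACHE_DIR = "cache"  # hypothetical cache location


def expensive_query():
    # Stand-in for the slow database aggregation across all libraries.
    return {"borrowings_by_year": {"1750": 120, "1751": 98}}


def borrowings_through_time(library_id=None):
    """Return visualisation data, using a cached file whenever no
    specific library is requested."""
    if library_id is not None:
        # Library-specific requests are fast enough to query directly.
        return expensive_query()
    cache_file = os.path.join(CACHE_DIR, "all_libraries.json")
    if os.path.exists(cache_file):
        # Serve the pre-generated data rather than hitting the database.
        with open(cache_file) as f:
            return json.load(f)
    data = expensive_query()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_file, "w") as f:
        json.dump(data, f)
    return data
```

The cached file only needs regenerating when the underlying data changes, which for a completed project is rare.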

I also went through all of my blog posts to extract the text relating to the project and bring it together into a single document.  It’s 130 pages long and contains more than 56,000 words, covering all of the work I did for the project over the course of four years.  I then spent some time preparing a talk I’ll give at the launch about the development of the resource.  I haven’t finished working on this yet and will continue with it next week.  I also met with Matt to discuss the British Association for Romantic Studies journal (https://www.bars.ac.uk/review/index.php/barsreview) that needs an overhaul.  I’m going to work on this for Matt and hopefully get an updated version in place before the end of May.

Also this week I continued to work for the Speech Star project, adding another batch of videos to the website.  This took a fair amount of time to implement as the videos needed to be added to a variety of different pages.

I then returned to working on the Speak For Yersel follow-on project, focussing on Wales this time.  I needed to generate a top-level ‘regions’ GeoJSON file that amalgamated the area polygons into 22 larger regions.  I did this in QGIS, manually selecting and merging the areas.  With the regions file in place I could then set up the Wales survey using my survey setup script.  This all went pretty smoothly, although there were some inconsistencies between the area GeoJSON file and the settlements spreadsheet that I needed to fix before I could get the survey to set up successfully.
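
The manual QGIS work could in principle be scripted.  Below is a hedged Python sketch of the grouping step, assuming each area feature carries a hypothetical ‘region’ property; it simply collects each region’s polygons into a MultiPolygon rather than performing a true geometric dissolve:

```python
import json
from collections import defaultdict


def group_areas_into_regions(area_geojson):
    """Group area polygons into larger regions using a 'region' property
    (hypothetical name), combining each group's polygons into a single
    MultiPolygon feature."""
    groups = defaultdict(list)
    for feature in area_geojson["features"]:
        region = feature["properties"]["region"]
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            groups[region].append(geom["coordinates"])
        elif geom["type"] == "MultiPolygon":
            groups[region].extend(geom["coordinates"])
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"region": name},
                "geometry": {"type": "MultiPolygon", "coordinates": coords},
            }
            for name, coords in groups.items()
        ],
    }
```

A proper dissolve (merging shared boundaries) would still need QGIS or a geometry library, but for hover regions a MultiPolygon per region is often enough.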

I’d also received feedback about the Republic of Ireland and Northern Ireland surveys, including many changes that needed to be made to the survey questions.  I decided that the easiest way to handle this would be to delete the surveys (which were still only test versions with no real data) and start again with updated question / answer spreadsheets.  There were some other updates that had been suggested that would apply to all surveys and I updated all three sites to incorporate these.  I then spent a bit of time investigating QGIS and whether it could be used to create simpler versions of the region polygons in order to generate a logo for each of our new sites that would be analogous to the Scottish survey.  After discussing this with Jennifer and Mary we agreed that Mary would take this forward, so we’ll hopefully see the outcome next week.

Week Beginning 8th April 2024

I’d taken a day off this week, so only worked four days.  I began the week continuing to work on the Anglo-Norman Dictionary, making a tweak to the publications scripts I was working on last week and then planning a new search for language tags that the editor wanted added.  Language tags are at entry level (i.e. they apply to the whole entry) and are used to denote loanwords.

There are only 2,660 entries that currently feature the language tag (and 24,762 that don’t), so the search is going to be fairly limited.  I explored two possible developments.  Firstly, we could have a separate tab on the ‘Advanced Search’ page for ‘Language’, as we do for ‘Semantic & Usage Labels’.  The new tab would work in a similar way to this (see https://anglo-norman.net/search/), with a list of languages and a count of the number of associated entries.  We could either make a search run as soon as a language is clicked on, or we could allow multiple languages to be selected and then joined with Booleans, as with the ‘Label’ search.  The latter would be more consistent, but I’m not sure how useful it would be as there aren’t many entries that have multiple language tags (so ‘AND’ and ‘NOT’ would not be so helpful).  I guess ‘OR’ would be more useful, but the user could just perform separate searches as there won’t be a huge number of results anyway.
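
The behaviour of the Boolean options is easy to picture as set operations over entry IDs.  A small Python sketch with invented data:

```python
def combine_language_results(sets_and_ops):
    """Combine per-language sets of entry IDs left to right using
    Boolean operators, as the existing Label search does."""
    current = set(sets_and_ops[0])
    for op, ids in sets_and_ops[1:]:
        if op == "AND":
            current &= ids  # intersection
        elif op == "OR":
            current |= ids  # union
        elif op == "NOT":
            current -= ids  # difference
    return current
```

Because few entries carry more than one language tag, ‘AND’ intersections will usually be empty and ‘NOT’ differences will barely differ from the first set, which is why ‘OR’ is the operator most likely to be useful here.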

Secondly, we could add a language selector to the ‘Headwords & Forms’ search tab, underneath ‘Citation date’.  We could provide a list of languages (including a note explaining how the languages are used and that they are not widely applied), with each language appearing as a checkbox (checking multiple will act as ‘OR’).  The language search could then be used on its own (leaving ‘Headword’ and ‘Citation date’ blank) or in conjunction with the other search options.

So for example:

  1. Retrieve all of the words of Scandinavian origin
  2. Retrieve all of the words of Scandinavian origin that have a headword / form beginning ‘sc’
  3. Retrieve all of the words of Scandinavian origin that have a headword / form beginning ‘sc’ whose entries feature a citation with a date between 1400 and 1450
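
Example 3 combines all three filters in one query.  Below is a hedged sketch of how such a query might be assembled (the table and column names are invented for illustration, not the dictionary’s actual schema):

```python
def build_headword_language_query(languages=None, headword_prefix=None,
                                  date_from=None, date_to=None):
    """Build a parameterised SQL query combining the optional filters.
    Table and column names are invented for illustration."""
    sql = ["SELECT DISTINCT e.id, e.headword FROM entries e"]
    where, params = [], []
    if languages:
        # Checkboxes act as OR: the entry has any of the chosen languages.
        placeholders = ", ".join("?" for _ in languages)
        sql.append("JOIN entry_languages el ON el.entry_id = e.id")
        where.append(f"el.language IN ({placeholders})")
        params.extend(languages)
    if headword_prefix:
        where.append("e.headword LIKE ?")
        params.append(headword_prefix + "%")
    if date_from is not None and date_to is not None:
        sql.append("JOIN citations c ON c.entry_id = e.id")
        where.append("c.year BETWEEN ? AND ?")
        params.extend([date_from, date_to])
    if where:
        sql.append("WHERE " + " AND ".join(where))
    return " ".join(sql), params
```

Leaving any argument blank simply drops that filter, so the language selector works on its own or alongside the headword and citation date options.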

After further consultation with the editor, Geert, we decided that I’d start by developing the separate tab option, and we may expand the headword search to incorporate language at a later date, if it’s still considered necessary.

Incorporating a language search is going to mean updating the database and the entry publication scripts (both in the management system and my batch scripts) to extract language data from the entries when they are edited or created. I’ll also need to update the ‘view entry’ page in the DMS so the language data is listed.

My plan of action is to do the following:

  1. Create a new database table that will hold entry IDs, language IDs and whether the entry is a compound.  Where an entry has multiple languages it will have multiple rows in this table.
  2. Write a script that will iterate through the entries, will extract language data and will populate this table
  3. Incorporate the script into the publication workflow and ensure an entry’s language listing is cleared when an entry is deleted prior to a major batch upload.
  4. Update the ‘entry search’ facility in the site’s API to add a new search type for language.  This will accept similar arguments to the existing ‘label’ search type: one or more language IDs, Booleans to be used between the language IDs and whether the search should be limited to compound words
  5. Add a further endpoint that will return a list of all languages together with a count of the number of entries that feature each language
  6. Update the advanced search page to add in a new ‘Language’ tab.  This will have a similar structure to the ‘Labels’ tab and will feature a list of languages together with counts of associated entries in a scrollable area on the left.  It will be possible to click on a language to add or remove it from a further section on the right of the page where selected languages will be listed.  If multiple languages are selected a drop-down list of Boolean options will appear between each language.  Pressing on the ‘Search’ button after selecting one or more languages will perform a language search.  This will list all corresponding entries in the same way as a Headword search.
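
Steps 1 and 2 might look roughly like this (a Python sketch using SQLite and an invented entry XML shape; the real system uses its own database and markup, and language names stand in here for language IDs):

```python
import sqlite3
import xml.etree.ElementTree as ET


def setup_language_table(db):
    # One row per entry/language pair, as in step 1 of the plan.
    db.execute("""CREATE TABLE IF NOT EXISTS entry_languages (
        entry_id INTEGER,
        language TEXT,
        is_compound INTEGER
    )""")


def extract_languages(entry_id, entry_xml, db):
    """Pull language tags out of an entry's XML and store one row
    per entry/language pair.  The <language> tag name and its
    attributes are hypothetical."""
    root = ET.fromstring(entry_xml)
    for lang in root.iter("language"):
        db.execute(
            "INSERT INTO entry_languages VALUES (?, ?, ?)",
            (entry_id, lang.get("name"), int(lang.get("compound", "0"))),
        )


db = sqlite3.connect(":memory:")
setup_language_table(db)
extract_languages(1, '<entry><language name="Scandinavian"/></entry>', db)
```

Deleting an entry’s rows from this table before a batch upload (step 3) then keeps the listing in sync with the published entries.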

I managed to complete the first two tasks, extracting 3,097 language tags and adding these to the database.

Also this week I had discussions with the Books and Borrowing people about the official launch of the resource that’s taking place in a couple of weeks.  I’m going to be speaking at the launch so I needed to figure out what I should be talking about.  I also returned to the ‘Browse book editions’ page on the website (https://borrowing.stir.ac.uk/books/), which was at this point taking a long time to load.  This is because the page defaults to displaying all book editions in the system that have a title beginning with ‘A’: almost 3000 books.  I did consider adding pagination to the facility, but I personally find it easier to scroll through a long page rather than flicking between many smaller pages, plus it means a user can use ‘Find’ in their browser to search the listing.  Another option I considered was to limit the default display to a particular genre of book rather than all genres, but I decided that this might confuse people if they don’t notice the limit has been applied.  Instead I set the page to not load a specific letter tab by default.  The tabs load, but to view the content of one of them the user actually has to select one.  This means the page now loads instantaneously and people get to choose what options they want to view without having a long wait.

Also this week I made some further updates to the Speech Star websites, adding in new ExtIPA animation videos to both the ‘pre’ and ‘post’ 2015 charts.  This was a bit fiddly and took some time but we now have most of the animations in place.  I also exported all of the Historical Thesaurus data for Fraser, as a project needs an up to date copy of it.

 

Week Beginning 11th March 2024

I spent some time this week further tweaking the Speak For Yersel survey tool I’ve been working on recently.  I completed an initial version of the tool last week, using it to publish a test version of the linguistic survey for the Republic of Ireland (not yet publicly available) and this week I ran the data for Northern Ireland through the tool.  As I did so I began to think about the instructions that would be needed at each stage and I also reworked the final stages of the tool.

The final stage previously involved importing the area GeoJSON file and the settlement CSV file in order to associate actual settlements with larger geographical areas and to generate the areas within which survey responses will be randomly assigned a location.  This was actually a two-step process and I therefore decided to split the stage in two, firstly ensuring the GeoJSON file is successfully parsed and imported and only then importing the settlement CSV file.  I also created a final ‘setup complete’ stage, as previously the tool didn’t give any feedback that the process had completed successfully.
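
The two-step process can be sketched as a pair of functions, each of which fails early with a clear error (a Python sketch; the property and column names are illustrative):

```python
import csv
import io
import json


def import_regions(geojson_text):
    """Step one: parse and validate the area GeoJSON before anything
    else is imported, so problems surface early."""
    data = json.loads(geojson_text)
    if data.get("type") != "FeatureCollection":
        raise ValueError("Expected a GeoJSON FeatureCollection")
    return {f["properties"]["name"]: f["geometry"]
            for f in data["features"]}


def import_settlements(csv_text, areas):
    """Step two: read the settlement CSV and link each settlement to an
    already-imported area, flagging any mismatches."""
    settlements = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        area = row["area"]  # column names are illustrative
        if area not in areas:
            raise ValueError(f"Unknown area: {area}")
        settlements[row["settlement"]] = area
    return settlements
```

Running the GeoJSON import on its own first means a malformed file is caught before any settlement data has been touched, which is exactly the feedback the single combined stage lacked.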

With the updates in place, creating the Northern Ireland survey using the tool was a pretty straightforward process, taking a matter of minutes.  I then moved on to creating our third and final survey for Wales, but unfortunately I soon realised that we didn’t have a top-level ‘regions’ GeoJSON file for this survey area.  The ‘regions’ file provides the broadest level of geographical areas, which are visible on the maps when you hover over them.  For example, in the original SFY resource for Scotland there are 14 top-level regions such as ‘Fife’ or ‘Borders’, with labels that are visible when using the map, such as the ones here: https://speakforyersel.ac.uk/explore-maps/lexical/.

Initially I tried creating my own regions in QGIS using the area GeoJSON file to group areas by the region contained in the settlement CSV files (e.g. ‘Anglesey’).  However, this resulted in around 22 regions, which I think is too many for a survey area the size of Wales – for the Republic of Ireland we have 8 and for Northern Ireland we have 11.  I asked the team about this and they are going to do some investigation, so Wales is on hold until I hear back from them.

I also spent quite a bit of time this week continuing to migrate the Anthology of 16th and Early 17th Century Scots Poetry from ancient HTML to TEI XML.  Previously the poems I’ve been migrating have varied from 14-line sonnets to poems up to around 200 lines in length.  I’ve been manually copying each line into the Oxygen XML editor as I needed to check and replace ‘3’ characters that had been used to represent yoghs, add in the glosses and check for other issues.  This week I reached the King’s Quair, which compared to the other poems is a bit of an epic, weighing in at over 1300 lines.  I realised manually pasting in each line wasn’t an option if I wanted to keep my sanity, and therefore I wrote a little jQuery script that extracted the text from the HTML table cell and generated the necessary XML line syntax.  I was then able to run the script, make a few further tweaks to line groups and then paste the whole poem into Oxygen.  This was significantly quicker than manual migration, but I did still need to add in the glosses, of which there were over 200, so it still took some time.  I continued to import other poems using my new method and I really feel like I’ve broken the back of the anthology now – by the end of the week I had completed the migration of 114 poems.  Hopefully I’ll be able to launch the new site before Easter.
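
The original script was a few lines of jQuery; the same transformation can be sketched in Python (the ‘l’ line markup is a guess at typical TEI verse encoding rather than the project’s exact schema):

```python
def html_lines_to_tei(cell_text):
    """Convert plain poem lines (as extracted from an HTML table cell)
    into TEI <l> elements, replacing '3' characters with a real yogh.
    Assumes, as in this anthology, that '3' only ever stands for a yogh."""
    xml_lines = []
    for line in cell_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between stanzas
        # Escape XML special characters before wrapping in markup.
        line = line.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
        line = line.replace("3", "\u021d")  # U+021D LATIN SMALL LETTER YOGH
        xml_lines.append(f"<l>{line}</l>")
    return "\n".join(xml_lines)
```

Glosses still have to be added by hand afterwards, but the line-by-line conversion itself becomes a single paste.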

Also this week I began investigating the WCAG accessibility guidelines (https://www.w3.org/WAI/WCAG22/quickref/?versions=2.1) after we received a request about an accessibility statement for the Anglo-Norman Dictionary.  I spoke to a few people in the University who have used accessibility tools to validate websites and managed to perform an initial check of the AND website, which held up pretty well.  I’m intending to look through the guidelines and tools in greater detail and hopefully update the sites I manage to make them more accessible after Easter.

Also this week I spoke to Susan Rennie about transferring ownership of the Scots Thesaurus domain to her after the DNS registration expires in April, added some statements to a couple of pages of the Books and Borrowing website referencing the project’s API and giving some information about it, and spoke to B&B project PI Katie Halsey about creating a preservation dataset for the project and depositing it with a research repository.