Week Beginning 21st August 2023

I spent most of this week working for the Dictionaries of the Scots Language on the new quotation date search.  I decided to work on the update using a version of the site and its data running on my laptop initially, as I have direct control over the Solr instance on my laptop – something I don’t have on the server.  My first task was to create a new Solr index for the quotations and to write a script to export data from the database in a format that Solr could then index.  With over 700,000 quotations this took a bit of time, and I did encounter some issues, such as several tens of thousands of quotations not having date tags, meaning dates for these quotations could not be extracted.  I had a lengthy email conversation with the DSL team about this and thankfully it looks like the issue is not something I need to deal with: the data is being worked on in their editing system and the vast majority of the dating issues I’d encountered will be fixed the next time the data is exported for me to use.  I also encountered some further issues that needed to be addressed as I worked with the data.  For example, I realised I needed to add a count of the total number of quotes for an entry to each quote item in Solr in order to work out the ranking algorithm for entries, and this meant updating the export script, the structure of the Solr index and then re-exporting all 700,000 quotations.  Below is a screenshot of the Solr admin interface, showing a query of the new quotation index – a search for ‘barrow’.
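The per-entry quote count is the piece that drives the ranking described below, so for illustration here is a minimal sketch of that step of the export script (the field names are my own placeholders rather than the actual database or Solr schema):

```typescript
// Hypothetical row and document shapes – the real schema is not reproduced here.
interface QuotationRow { quoteId: string; entryId: string; quote: string; yearFrom?: number; yearTo?: number; }
interface SolrQuoteDoc extends QuotationRow { totalQuotes: number; }

// Attach a per-entry total to every quotation document so the ranking
// algorithm can later compare 'quotes matched' against 'quotes in the entry'.
function buildSolrDocs(rows: QuotationRow[]): SolrQuoteDoc[] {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.entryId, (totals.get(row.entryId) ?? 0) + 1);
  }
  return rows.map(row => ({ ...row, totalQuotes: totals.get(row.entryId)! }));
}
```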

With this in place I then needed to update the API that processes search requests, connects to Solr and spits out the search results in a suitable format for use on the website.  This meant completely separating out and overhauling the quotation search, as it needed to connect to a different Solr index featuring data with a very different structure.  I needed to ensure quotations could be grouped by their entries and then subjected to the same ‘max results’ limitations as other searches.  I also needed to create the ranking algorithm for entries based on the number of returned quotes versus the total number of quotes, sort the entries by this ranking and ensure a maximum of 10 quotes per entry were displayed.  In addition, I had to add a further search option for dates, as detailed in the requirements document I’d previously written.  The screenshot below is of the new quotation endpoint in the API, showing a section of the results for ‘barrow’ in ‘snd’ between 1800 and 1900.
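As a rough illustration of the grouping and ranking logic, here is a minimal sketch (names such as entryId and totalQuotes are assumptions rather than the API’s actual field names):

```typescript
interface QuoteHit { entryId: string; headword: string; totalQuotes: number; snippets: string[]; }
interface EntryResult { entryId: string; headword: string; score: number; quotes: QuoteHit[]; }

// Group quote hits by entry, rank entries by the proportion of their quotes
// that matched, and cap the number of quotes displayed per entry at ten.
function groupAndRank(hits: QuoteHit[], maxQuotesPerEntry = 10): EntryResult[] {
  const byEntry = new Map<string, QuoteHit[]>();
  for (const hit of hits) {
    const list = byEntry.get(hit.entryId) ?? [];
    list.push(hit);
    byEntry.set(hit.entryId, list);
  }
  const results: EntryResult[] = [];
  for (const [entryId, quotes] of byEntry) {
    const score = quotes.length / quotes[0].totalQuotes; // matched quotes / total quotes
    results.push({ entryId, headword: quotes[0].headword, score, quotes: quotes.slice(0, maxQuotesPerEntry) });
  }
  return results.sort((a, b) => b.score - a.score);
}
```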

The next step was to update the front-end to add in the new ‘date’ drop-down when quotations are selected and then to ensure the new quotation search information could be properly extracted, formatted and passed to the API to return the relevant data.  The following screenshot shows the search form.  The explanatory text still needs some work as it currently doesn’t feel very elegant – I think there’s a ‘to’ missing somewhere.

The final step for the week was to deal with the actual results themselves, which are rather different in structure from the previous results: entries now potentially have multiple quotes, each of which contains information relating to the quote (e.g. dates, bib ID), and each of which may feature multiple snippets if the term appears several times within a single quote.  I’ve managed to get the results to display correctly and the screenshot below shows the results of a search for ‘barrow’ in ‘snd’ between 1800 and 1900.

The new search also now lets you perform a Boolean search on the contents of individual quotations rather than across all quotations in an entry.  So, for example, you can search for ‘Messages AND Wean’ in quotes from 1980-1999 and only find quotations that contain both terms, whereas previously an entry featuring one quote with ‘messages’ and another with ‘wean’ would also have been returned.  The screenshot below shows the new results.
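In terms of the underlying query, this amounts to something along the lines of the following sketch, which builds the Solr request parameters (the field names ‘quote’, ‘year_from’ and ‘year_to’ are placeholders, not the actual schema):

```typescript
// Build a Solr query that matches all terms within a single quotation and
// restricts results to a date range. Field names here are hypothetical.
function buildQuoteQuery(terms: string[], yearFrom: number, yearTo: number): URLSearchParams {
  const textClause = terms.map(t => `"${t}"`).join(' AND ');
  return new URLSearchParams({
    q: `quote:(${textClause})`,
    fq: `year_from:[${yearFrom} TO ${yearTo}]`,
    rows: '1000',
    wt: 'json',
  });
}

// e.g. buildQuoteQuery(['messages', 'wean'], 1980, 1999).toString()
```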

There are a few things that I need to discuss with the team, though.  The first is the ranking system.  As previously agreed, entries are ranked based on the proportion of quotes that contain the search term, but this possibly ranks entries that only have one quote too highly.  If there is only one quote and it features the term then 100% of quotes feature the term and the entry is highly ranked, while longer, possibly more important entries are ranked lower because (for example) only 40 of their 50 quotes feature the term.  We might want to look into weighting entries that have more quotes overall.  Take, for example, an SND quotation search for ‘prince’ (see below).  ‘Prince’ is ranked first, but results 2-6 only appear so high because they each have a single quote, which happens to feature ‘prince’.
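One possible weighting – purely a suggestion for discussion, not anything the team has agreed – would be to scale the proportion by the (logged) total number of quotes, so that single-quote entries no longer automatically outrank larger ones:

```typescript
// Candidate scoring function for discussion: the proportion of matching
// quotes, boosted for entries with more quotes overall.
function weightedScore(matchedQuotes: number, totalQuotes: number): number {
  const proportion = matchedQuotes / totalQuotes;
  return proportion * Math.log2(1 + totalQuotes);
}

// weightedScore(1, 1)   ≈ 1.00  (entry with a single matching quote)
// weightedScore(40, 50) ≈ 4.54  (a larger entry now ranks above it)
```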

The second issue is that the new system cuts off quotations for entries after the tenth (as you can see for ‘Prince’, above).  We’d agreed on this approach to stop entries with lots of quotes swamping the results, but currently nothing is displayed to say that the results have been truncated.  We might want to add a note under the tenth quote.

The third issue is that the quote field in Solr is currently stemmed, meaning the stems of words are stored and Solr can then match alternative forms.  This can work well – for example the ‘messages AND wean’ results include results for ‘message’ and ‘weans’ too.  But it can also be a bit too broad.  See for example the screenshot below, which shows a quotation search for ‘aggressive’.  As you can see, it has returned quotations that feature ‘aggression’, ‘aggressively’ and ‘aggress’ in addition to ‘aggressive’.  This might be useful, but it might cause confusion and we’ll need to discuss this further at some point.

Next week I’ll hopefully start work on the filtering of search results for all search types, which will involve a major change to the way headword searches work and more big changes to the Solr indexes.

Also this week I investigated applying OED DOIs to the OED lexemes we link to in the Historical Thesaurus.  Each OED sense now has its own DOI that we can get access to, and I was sent a spreadsheet containing several thousand as an example.  The idea is that links from the HT’s lexemes to the OED would be updated to use these DOIs rather than performing a search of the OED for the word, which is what currently happens.

After a few hours of research I reckoned it would be possible to apply the DOIs to the HT data, but there are some things that we’ll need to consider.  The OED spreadsheet looks like it will contain every sense while the HT data does not, so much of the spreadsheet will likely not match anything in our system.  I wrote a little script to check the spreadsheet against the HT’s OED lexeme table: 6,186 rows in the spreadsheet match one (or more) lexeme in the database table while 7,256 don’t.  I also noted that the combination of entry_id and element_id (in our database called refentry and refid) is not necessarily unique in the HT’s OED lexeme table.  This can happen if a word appears in multiple categories, plus there is a further ID called ‘lemmaid’ that was sometimes used in combination with the other two IDs to differentiate specific lexemes.  In the spreadsheet there are 1,180 rows that match multiple rows in the HT’s OED lexeme table.  However, this isn’t a problem either – it usually just means a word appears in multiple categories, so the same DOI would simply apply to multiple lexemes.
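For what it’s worth, the check itself is straightforward – something like the following sketch, with made-up row shapes standing in for the spreadsheet and database:

```typescript
interface SpreadsheetRow { entryId: number; elementId: number; doi: string; }
interface OedLexeme { refentry: number; refid: number; lemmaid?: number; }

// Count how many spreadsheet rows match at least one OED lexeme record on the
// (entry_id, element_id) = (refentry, refid) key, and how many match several.
function matchReport(rows: SpreadsheetRow[], lexemes: OedLexeme[]) {
  const index = new Map<string, number>();
  for (const lex of lexemes) {
    const key = `${lex.refentry}:${lex.refid}`;
    index.set(key, (index.get(key) ?? 0) + 1);
  }
  let matched = 0, unmatched = 0, multiple = 0;
  for (const row of rows) {
    const count = index.get(`${row.entryId}:${row.elementId}`) ?? 0;
    if (count === 0) unmatched++;
    else {
      matched++;
      if (count > 1) multiple++;
    }
  }
  return { matched, unmatched, multiple };
}
```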

What is potentially a problem is that we haven’t matched up all of the OED lexeme records with the HT lexeme records.  While 6,186 rows in the spreadsheet match one or more rows in the OED lexeme table, only 4,425 rows in the spreadsheet match one or more rows in the HT’s lexeme table.  We will not be able to update the links to switch to DOIs for any HT lexemes that aren’t matched to an OED lexeme.  After checking I discovered that there are 87,713 non-OE lexemes in the HT lexeme table that are not linked to an OED lexeme.  None of these will be able to have a DOI (and neither will the OE words, presumably).

Another potential problem is that the sense an HT lexeme is linked to is not necessarily the main sense of the OED lexeme.  In such cases the DOI leads to a section of the OED entry that is only accessible to logged-in users of the OED site.  An example from the spreadsheet is ‘aardvark’.  Our HT lexeme links to entry_id 22, element_id 16201412, which has the DOI https://doi.org/10.1093/OED/1516256385; when you’re not logged in this displays a ‘Please purchase a subscription’ page.  The other entry for ‘aardvark’ in the spreadsheet has entry_id 22 and element_id 16201390, with the DOI https://doi.org/10.1093/OED/9531538482, which leads to the summary page – but the HT’s link would be the first DOI above, not the second.  Note that currently we link to the search results on the OED site, which might actually be more useful for many people.  ‘Aardvark’ as found here: https://ht.ac.uk/category/?type=search&qsearch=aardvark&page=1#id=39313 currently links to this OED page: https://www.oed.com/search/dictionary/?q=aard-vark

To summarise: I can update all lexemes in the HT’s OED lexeme table that match the entry_id and element_id columns in the spreadsheet to add in the relevant DOI.  I can then ensure that any HT lexeme records linked to these OED lexemes also feature the DOI, although this will apply to fewer lexemes as there are still many HT lexemes that are not linked.  I could then update the links through to the OED for these lexemes, but this might not actually work as well as the current link to search results, given that many OED DOIs lead to restricted pages.  I’ll need to hear back from the rest of the team before I can take this further.

Also this week I had a meeting with Pauline Mackay and Craig Lamont to discuss an interactive map of Burns’ correspondents.  We’d discussed this about three years ago and they are now reaching a point where they would like to develop the map.  We discussed various options for base maps, data categorisation and time sliders and I gave them a demonstration of the Books and Borrowing project’s Chambers library map, which I’d previously developed (https://borrowing.stir.ac.uk/chambers-library-map/).  They were pretty impressed with this and thought it would be a good model for their map.  Pauline and Craig are now going to work on some sample data to get me started, and once I receive this I’ll be able to begin development.  We had our meeting in the café of the new ARC building, which I’d never been to before, so it was a good opportunity to see the place.

Also this week I fixed some issues with images for one of the library registers for the Royal High School for the Books and Borrowing project.  These had been assigned the wrong ID in the spreadsheet I’d initially used to generate the data and I needed to write a little script to rectify this.

Finally, I had a chat with Joanna Kopaczyk about a potential project she’s putting together.  I can’t say much about it at this stage, but I’ll probably be able to use the systems I developed last year for the Anglo-Norman Dictionary’s Textbase (see https://anglo-norman.net/textbase-browse/ and https://anglo-norman.net/textbase-search/).  I’m meeting with Joanna to discuss this further next week.

 

Week Beginning 14th August 2023

I was back at work this week after a lovely two-week holiday (although I did spend a couple of hours making updates to the Speech Star website whilst I was away).  After catching up with emails, getting back up to speed with where I’d left off and making a helpful new ‘to do’ list I got stuck into fixing the language tags in the Anglo-Norman Dictionary.

In June the editor Geert noticed that language tags had disappeared from the XML files of many entries.  Further investigation by me revealed that this probably happened during the import of data into the new AND system and had affected entries up to and including the import of R; entries that were part of the subsequent import of S had their language tags intact.  It is likely that the issue was caused by the script that assigns IDs and numbers to <sense> and <senseInfo> tags as part of the import process, as this script edits the XML.  Further testing revealed that the updated import workflow that was developed for S retained all language tags, as does the script that processes single and batch XML uploads as part of the DMS.  This means the error has been rectified, but we still need to fix the entries that have lost their language tags.

I was able to retrieve a version of the data as it existed prior to batch updates being applied to entry senses and from this I was able to extract the missing language tags for these entries.  I was also able to run this extraction process on the R data as it existed prior to upload.  I then ran the process on the live database to extract language tags from entries that featured them, for example entries uploaded during the import of S.  The script was also adapted to extract the ‘certainty’ attribute from the tags if present.  This was represented in the output as the number 50, separated from the language by a bar character (e.g. ‘Arabic|50’).  Where an entry featured multiple language tags these were separated by a comma (e.g. ‘Latin,Hebrew’).

Geert made the decision that language tags, which were previously associated with specific senses or subsenses, should instead be associated with entries as a whole.  This structural change will greatly simplify the reinstatement of missing tags and it will also make it easier to add language tags to entries that do not already feature them.

The language data that I compiled was stored in a spreadsheet featuring three columns: ‘Slug’ (the unique form of a headword used in entry URLs), ‘Live Langs’ (language tags extracted from the live database) and ‘Old Langs’ (language tags extracted from the data prior to processing).  A fourth column was also added where manual overrides to the preceding two columns could be entered by Geert.  This column could also be used to add entries that did not previously have a language tag but needed one.

Two further issues were addressed at this stage.  The first related to compound words, where the language applies to only one part of the word.  In the original data these were represented by combining the language with ‘A.F.’, for example ‘M.E._and_A.F.’.  Continuing with this approach would make it more difficult to search for specific languages, so the decision was made to store only the non-A.F. language, along with a note that the word is a compound.  This was encoded in the spreadsheet with a bar character followed by ‘Y’.  To keep the data easily machine-readable the compound marker is always the third part of the language data, whether or not certainty is present in the second part.  For example ‘M.E.|50|Y’ represents a word that is possibly from M.E. and is a compound, while ‘M.E.||Y’ represents a word that is definitely from M.E. and is a compound.

The second issue to be addressed was how to handle entries that featured languages but whose language tags were not needed.  In such cases Geert added the characters ‘$$’ to the fourth column.

The spreadsheet was edited by Geert and currently features 2741 entries that are to be updated.  Each entry in the spreadsheet will be edited using the following workflow:

  1. All existing language tags in the entry will be deleted. These generally occur in senses or subsenses, but some entries feature them in the <head> element.
  2. If the entry has ‘$$’ in column 4 then no further updates will be made
  3. If there is other data in column 4 this will be used
  4. If there is no data in column 4 then data from column 2 will be used
  5. If there is no data in columns 4 or 2 then data from column 3 will be used. Where there are multiple languages separated by a comma these will be split and treated separately.
  6. For each language the presence of a certainty value and / or a compound will be ascertained
  7. In the XML the new language tags will appear below the <head> tag.
  8. An entry will feature one language tag for each language specified
  9. The specific language will be stored in the ‘lang’ attribute
  10. Certainty (if present) will be stored in the ‘cert’ attribute which may only contain ‘50’ to represent ‘uncertain’.
  11. Compound (if present) will be stored in a new ‘compound’ attribute which may only contain ‘true’ to denote the word is a compound.
  12. For example, ‘Latin|50,Greek|50’ will be stored as two <language> tags beneath the <head> tag as follows: <language lang="Latin" cert="50" /><language lang="Greek" cert="50" /> while ‘M.E.||Y’ will be stored as: <language lang="M.E." compound="true" /> (see the sketch after this list)
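Purely to illustrate how I expect the update script to interpret the spreadsheet, here is a minimal sketch of the column-precedence and parsing logic (the real script works directly on the database and the XML files, and the function names here are my own):

```typescript
interface LanguageTag { lang: string; cert?: '50'; compound?: true; }

// Parse one spreadsheet language value (e.g. 'Latin|50,Greek|50' or 'M.E.||Y')
// into the attributes needed for the new <language> elements.
function parseLanguages(value: string): LanguageTag[] {
  return value.split(',').map(part => {
    const [lang, cert, compound] = part.split('|');
    const tag: LanguageTag = { lang: lang.trim() };
    if (cert === '50') tag.cert = '50';
    if (compound === 'Y') tag.compound = true;
    return tag;
  });
}

// Pick the value to use for an entry, following the order of precedence in the
// workflow above: column 4 (manual), then column 2 (live), then column 3 (old).
// '$$' in column 4 means the entry should be left with no language tags.
function chooseValue(manual: string, live: string, old: string): string | null {
  if (manual === '$$') return null;
  return manual || live || old || null;
}

function buildLanguageXml(manual: string, live: string, old: string): string {
  const value = chooseValue(manual, live, old);
  if (!value) return '';
  return parseLanguages(value)
    .map(t => `<language lang="${t.lang}"${t.cert ? ' cert="50"' : ''}${t.compound ? ' compound="true"' : ''} />`)
    .join('');
}

// buildLanguageXml('', 'Latin|50,Greek|50', '')
//   -> '<language lang="Latin" cert="50" /><language lang="Greek" cert="50" />'
```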

I ran and tested the update on a local version of the data and the output was checked by Geert and me.  After backing up the live database I then ran the update on it and all went well.  The dictionary’s DTD also needed to be updated to ensure the new language tag can be positioned as an optional child element of the ‘main_entry’ element.  The DTD was also updated to remove language as a child of ‘sense’, ‘subsense’ and ‘head’.

Previously the DTD had a limited list of languages that can appear in the ‘lang’ attribute, but I’m uncertain whether this ever worked as the XML definitely included languages that were not in the list.  Instead I created a ‘picklist’ for languages that pulls its data from a list of languages stored in the online database.  We use this approach for other things such as semantic labels so it was pretty easy to set up.  I also added in the new optional ‘compound’ attribute.

With all of this in place I then updated the XSLT and some of the CSS in order to display the new language tags, which now appear as italicised text above any part of speech.  For example, an entry with multiple languages, one of which is uncertain: https://anglo-norman.net/entry/ris_3 and an entry that’s a compound with another language: https://anglo-norman.net/entry/rofgable.  I will also update the site to enable searches for language tags, but this will come at a later date.

Also this week I spent a bit of time in email conversations with the Dictionaries of the Scots Language people, discussing updates to bibliographical entries, the new part of speech system, DOST citation dates that were later than 1700 and making further tweaks to my requirements document for the date and part of speech searches based on feedback received from the team.  We’re all in agreement about how the new feature will work now, which means I’ll be able to get started on the development next week, all being well.

I also gave some advice to Gavin Miller about a new proposal he’s currently putting together, helped out Matthew Creasy with the website for his James Joyce Symposium, spoke to Craig Lamont about the Burns correspondents project and checked how the stats are working on sites that were moved to our newer server a while back (all thankfully seems to be working fine).

I spent the remainder of the week implementing a ‘cite this page’ feature for the Books and Borrowing project, and the feature now appears on every page that features data.  A ‘Cite this page’ button appears in the right-hand corner of the page title.  Pressing the button brings up a pop-up containing citation options in a variety of styles.  I’ve taken this from other projects I’ve been involved with (e.g. the Historical Thesaurus) and we might want to tweak it, but at the moment something along the lines of the following is displayed (full URL crudely ‘redacted’ as the site isn’t live yet):

Developing this feature has taken a bit of time due to the huge variation in the text that describes the page.  This can also make the citation rather long, for example:

Advanced search for ‘Borrower occupation: Arts and Letters, Borrower occupation: Author, Borrower occupation: Curator, Borrower occupation: Librarian, Borrower occupation: Musician, Borrower occupation: Painter/Limner, Borrower occupation: Poet, Borrower gender: Female, Author gender: Female’. 2023. In Books and Borrowing: An Analysis of Scottish Borrowers’ Registers, 1750-1830. University of Stirling. Retrieved 18 August 2023, from [very long URL goes here]

I haven’t included a description of selected filters and ‘order by’ options, but these are present in the URL.  I may add filters and orders to the description, or we can just leave it as it is and let people tweak their citation text if they want.

The ‘cite this page’ button appears on all pages that feature data, not just the search results – for example, register pages and the list of book editions.  Hopefully the feature will be useful once the site goes live.

Week Beginning 6th February 2023

I tested positive for Covid on Saturday, which I think is the fourth time I’ve had it now.  However, this time it hit me a bit harder than previously and I was off work with it on Monday and Tuesday.  I was still feeling rather less than 100% on Wednesday but I decided to work (from home) anyway as Thursday and Friday this week were strike days so it would be the only day I would be able to work.  I managed to make it through the day but by the end I was really struggling and I was glad for the strike days as I was then quite unwell for the rest of the week.

I spent the day continuing to work on the advanced search interface for the Books and Borrowing project.  Whilst doing so I noticed that the borrower title data contained lots of variants that will probably need to be standardised (e.g. ‘capt’, ‘capt.’ and ‘captain’) so I emailed Katie and Matt a list of these.  I also spotted a problem with the occupations that had occurred during the batch import of data for Innerpeffray.  There are 62 borrowers that have been assigned to the occupation category ‘Minister/Priest’, but this occupation is the only one with three hierarchical levels: ‘Minister/Priest’ is the second level and should therefore not be assignable.  Only endpoints of the hierarchy, such as ‘Catholic’ and ‘Church of Scotland’, should be assignable.  Hopefully this will be a fairly simple thing to fix, though.

For the Advanced Search form the requirements document stated that there will be an option for selecting libraries and a further one for selecting registers that will dynamically update depending on the libraries that are selected.  As I worked on this I realised it would be simpler to use if I just amalgamated the two choices, so instead I created one area that lists libraries and the registers contained in each.  From this you can select / deselect entire libraries and/or registers within libraries.  It does mean the area is rather large and I may update the interface to hide the registers unless you manually choose to view them.  But for now the listed registers are all displayed and include the number of borrowings in each.  There are several that have no borrowings and if these continue to have no borrowings I should probably remove them from the list as they would never feature in the search results anyway.

There is also a section for fields relating to the borrowing and a further one for fields relating to the borrower.  This includes a list of borrower titles, with the option of selecting / deselecting any of these.  Beside each one is a count of the number of borrowers that have each title.  I’m still working on borrower occupation, which features another area with checkboxes for each level of occupation, along with counts of the number of borrowers in each.  The select / deselect options are not yet working at all levels.  I had hoped to finish this on Wednesday but my brain had turned to mush by the end of the day and I just couldn’t get it working.

Also this week I investigated an issue with Google Analytics for the Dictionaries of the Scots Language and responded to a couple of emails from Ann and Rhona.  I also spoke to Jennifer Smith about her extension of the Speak For Yersel project and exported some statistics about the number of questions answered.  I also responded to a query from Craig Lamont about the Edinburgh Enlightenment map we’d put together several years ago and spoke to Pauline Mackay about the Burns letter writing trail that I’ll be working on in the coming months.

Next week is the school half-term and I’m either on annual leave or on strike until next Friday.

Week Beginning 26th September 2022

I spent most of my time this week getting back into the development of the front-end for the Books and Borrowing project.  It’s been a long time since I was able to work on this due to commitments to other projects and also due to there being a lot more for me to do than I was expecting regarding processing images and generating associated data in the project’s content management system over the summer.  This week, however, I managed to make some pretty good progress.  The first thing I did was to make some changes to the ‘libraries’ page based on feedback I received ages ago from the project’s Co-I Matt Sangster.  The map of libraries used clustering to group libraries that are close together when the map is zoomed out, but Matt didn’t like this.  I therefore removed the clusters and turned the library locations back into regular individual markers.  However, it is now rather difficult to distinguish the markers for a number of libraries.  For example, the markers for Glasgow and the Hunterian libraries (back when the University was still on the High Street) are on top of each other and you have to zoom in a very long way before you can even tell there are two markers there.

I also updated the tabular view of libraries.  Previously the library name was a button that when clicked on opened the library’s page.  Now the name is text and there are two buttons underneath.  The first one opens the library page while the second pans and zooms the map to the selected library, whilst also scrolling the page to the top of the map.  This uses Leaflet’s ‘flyTo’ function which works pretty well, although the map tiles don’t quite load in fast enough for the automatic ‘zoom out, pan and zoom in’ to proceed as smoothly as it ought to.

After that I moved onto the library page, which previously just displayed the map and the library name. I updated the tabs for the various sections to display the number of registers, books and borrowers that are associated with the library.  The Introduction page also now features the information recorded about the library that has been entered into the CMS.  This includes location information, dates, links to the library etc.  Beneath the summary info there is the map, and beneath this is a bar chart showing the number of borrowings per year at the library.  Beneath the bar chart you can find the longer textual fields about the library such as descriptions and sources.  Here’s a screenshot of the page for St Andrews:

I also worked on the ‘Registers’ tab, which now displays a tabular list of the selected library’s registers, and I also ensured that when you select one of the tabs other than ‘Introduction’ the page automatically scrolls down to the top of the tabs to avoid the need to manually scroll past the header image (but we still may make this narrower eventually).  The tabular list of registers can be ordered by any of the columns and includes data on the number of pages, borrowers, books and borrowing records featured in each.

When you open a register the information about it is displayed (e.g. descriptions, dates, stats about the number of books etc referenced in the register) and large thumbnails of each page together with page numbers and the number of records on each page are displayed.  The thumbnails are rather large and I could make them smaller, but doing so would mean that all the pages end up looking the same – beige rectangles.  The thumbnails are generated on the fly by the IIIF server and the first time a register is loaded it can take a while for the thumbnails to load in.  However, generated thumbnails are then cached on the server so subsequent page loads are a lot quicker.  Here’s a screenshot of a register page for St Andrews:

One thing I also did was write a script to add a new ‘pageorder’ field to the ‘page’ database table.  I then wrote a script that generated the page order for every page in every register in the system.  This picks out the page that has no preceding page and iterates through the pages based on the ‘next page’ ID.  Previously pages in lists were ordered by their auto-incrementing ID, but this meant that if new pages needed to be inserted for a register they ended up stuck at the end of the list, even though the ‘next’ and ‘previous’ links worked successfully.  This new ‘pageorder’ field ensures lists of pages are displayed in the proper order.  I’ve updated the CMS to ensure this new field is used when viewing a register, although I haven’t yet updated the CMS to regenerate the ‘pageorder’ for a register if new pages are added out of sequence.  For now, if this happens I’ll need to manually run my script again to update things.
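The ordering logic is essentially a linked-list walk – here is a minimal sketch of the idea, with hypothetical field names (the real script runs against the database rather than in-memory objects):

```typescript
interface RegisterPage { id: number; prevPageId: number | null; nextPageId: number | null; pageOrder?: number; }

// Start from the page with no preceding page, then follow the 'next page'
// IDs, assigning an incrementing pageorder value as we go.
function assignPageOrder(pages: RegisterPage[]): RegisterPage[] {
  const byId = new Map<number, RegisterPage>();
  for (const page of pages) byId.set(page.id, page);
  let current = pages.find(p => p.prevPageId === null) ?? null;
  let order = 1;
  while (current) {
    current.pageOrder = order++;
    current = current.nextPageId !== null ? byId.get(current.nextPageId) ?? null : null;
  }
  return pages;
}
```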

Anyway, back to the front-end: the new ‘pageorder’ is used in the list of pages mentioned above so the thumbnails are displayed in the correct order.  I may add pagination to this page, as all of the thumbnails are currently on one page and it can take a while to load, although these days people seem to prefer having long pages rather than having data split over multiple pages.

The final section I worked on was the page for viewing an actual page of the register, and this is still very much in progress.  You can open a register page by pressing on its thumbnail and currently you can navigate through the register using the ‘next’ and ‘previous’ buttons or return to the list of pages.  I still need to add in a ‘jump to page’ feature here too.  As discussed in the requirements document, there will be three views of the page: Text, Image and Text and Image side-by-side.  Currently I have implemented the image view only.  Pressing on the ‘Image view’ tab opens a zoomable / pannable interface through which the image of the register page can be viewed.  You can also make this interface full screen by pressing on the button in the top right.  Also, if you’re viewing the image and you use the ‘next’ and ‘previous’ navigation links you will stay on the ‘image’ tab when other pages load.  Here’s a screenshot of the ‘image view’ of the page:

Also this week I wrote a three-page requirements document for the redevelopment of the front-ends for the various place-names projects I’ve created using the system originally developed for the Berwickshire place-names project which launched back in 2018.  The requirements document proposes some major changes to the front-end, moving to an interface that operates almost entirely within the map and enabling users to search and browse all data from within the map view rather than having to navigate to other pages.  I sent the document off to Thomas Clancy, for whom I’m currently developing the systems for two place-names projects (Ayr and Iona) and I’ll just need to wait to hear back from him before I take things further.

I also responded to a query from Marc Alexander about the number of categories in the Thesaurus of Old English, investigated a couple of server issues that were affecting the Glasgow Medical Humanities site, removed all existing place-name elements from the Iona place-names CMS so that the team can start afresh and responded to a query from Eleanor Lawson about the filenames of video files on the Seeing Speech site.  I also made some further tweaks to the Speak For Yersel resource ahead of its launch next week.  This included adding survey numbers to the survey page and updating the navigation links and writing a script that purges a user and all related data from the system.  I ran this to remove all of my test data from the system.  If we do need to delete a user in future (either because their data is clearly spam or a malicious attempt to skew the results, or because a user has asked us to remove their data) I can run this script again.  I also ran through every single activity on the site to check everything was working correctly.  The only thing I noticed is that I hadn’t updated the script to remove the flags for completed surveys when a user logs out, meaning after logging out and creating a new user the ticks for completed surveys were still displaying.  I fixed this.

I also fixed a few issues with the Burns mini-site about Kozeluch, including updating the table sort options which had stopped working correctly when I added a new column to the table last week and fixing some typos with the introductory text.  I also had a chat with the editor of the Anglo-Norman Dictionary about future developments and responded to a query from Ann Ferguson about the DSL bibliographies.  Next week I will continue with the B&B developments.

Week Beginning 19th September 2022

It was a four-day week this week due to the Queen’s funeral on Monday.  I divided my time for the remaining four days over several projects.  For Speak For Yersel I finally tackled the issue of the way maps are loaded.  The system had been developed for a map to be loaded afresh every time data is requested, with any existing map destroyed in the process.  This worked fine when the maps didn’t contain demographic filters as generally each map only needed to be loaded once and then never changed until an entirely new map was needed (e.g. for the next survey question).  However, I was then asked to incorporate demographic filters (age groups, gender, education level), with new data requested based on the option the user selected.  This all went through the same map loading function, which still destroyed and reinitiated the entire map on each request.  This worked, but wasn’t ideal, as it meant the map reset to its default view and zoom level whenever you changed an option, map tiles were reloaded from the server unnecessarily and if the user was in ‘full screen’ mode they were booted out of this as the full screen map no longer existed.  For some time I’ve been meaning to redevelop this to address these issues, but I’ve held off as there were always other things to tackle and I was worried about essentially ripping apart the code and having to rebuild fundamental aspects of it.  This week I finally plucked up the courage to delve into the code.

I created a test version of the site so as to not risk messing up the live version and managed to develop an updated method of loading the maps.  This method initiates the map only once when a page is first loaded rather than destroying and regenerating the map every time a new question is loaded or demographic data is changed.  This means the number of map tile loads is greatly reduced as the base map doesn’t change until the user zooms or pans.  It also means the location and zoom level a user has left the map on stays the same when the data is changed.  For example, if they’re interested in Glasgow and are zoomed in on it they can quickly flick between different demographic settings and the map will stay zoomed in on Glasgow rather than resetting each time.  Also, if you’re viewing the map in full-screen mode you can now change the demographic settings without the resource exiting out of full screen mode.

All worked very well, with the only issue being that the transitions between survey questions and quiz questions weren’t as smooth as with the older method.  Previously the map scrolled up and was then destroyed, then a new map was created and the data was loaded into the area before it smoothly scrolled down again.  For various technical reasons this no longer worked quite as well.  The map area still scrolls up and down, but the new data only populates the map as the map area scrolls down, meaning for a brief second you can still see the data and legend for the previous question before it switches to the new data.  However, I spent some further time investigating this issue and managed to fix it, with different fixes required for the survey and the quiz.  I also noticed a bug whereby the map would increase in size to fit the available space but the map layers and data were not extending properly into the newly expanded area.  This is a known issue with Leaflet maps that have their size changed dynamically and there’s actually a Leaflet function that sorts it – I just needed to call map.invalidateSize(); and the map worked properly again.  Of course it took a bit of time to figure this simple fix out.
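The overall pattern is roughly the following (a much simplified sketch – the actual Speak For Yersel code is more involved, and the layer and data names here are made up):

```typescript
import * as L from 'leaflet';

// Create the map once, when the page first loads.
const map = L.map('map').setView([57.5, -4.5], 6);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);

// Keep the answer data in a single layer group that can be refreshed in place.
const answersLayer = L.layerGroup().addTo(map);

// Replace only the data layer when the question or demographic filter changes,
// leaving the zoom level, position and full-screen state untouched.
function showAnswers(points: { lat: number; lng: number; colour: string }[]): void {
  answersLayer.clearLayers();
  for (const p of points) {
    L.circleMarker([p.lat, p.lng], { radius: 5, color: p.colour }).addTo(answersLayer);
  }
}

// If the map container's size changes dynamically, tell Leaflet to
// recalculate its dimensions so layers fill the expanded area.
function onMapResize(): void {
  map.invalidateSize();
}
```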

I also made some further updates to the site.  Based on feedback about the difficulty some people are having in telling which surveys they’ve done, I updated the site to log when the user completes a survey.  Now when the user goes to the survey index page a count of the number of surveys they’ve completed is displayed in the top right and a green tick has been added to the button of each survey they have completed.  Also, when they reach the ‘what next’ page for a survey a count of their completed surveys is shown.  This should make it much easier for people to track what they’ve done.  I also made a few small tweaks to the data at the request of Jennifer, and created a new version of the animated GIF that has speech bubbles, as the bubble for Shetland needed its text changed.  As I didn’t have the files available I took the opportunity to regenerate the GIF using a larger map, as the older version looked quite fuzzy on a high definition screen like an iPad.  I kept the region outlines on as well to tie it in better with our interactive maps.  Also, the font used in the new version is now the ‘Baloo’ font we use for the site.  I stored all of the individual frames both as images and as PowerPoint slides so I can change them if required.  For future reference, I created the animated GIF using https://ezgif.com/maker with a 150 second delay between slides, crossfade on and a fader delay of 8.

Also this week I researched an issue with the Scots Thesaurus that was causing the site to fail to load.  The WordPress options table had become corrupted and unreadable and needed to be replaced with a version from the backups, which thankfully fixed things.  I also did my expenses from the DHC in Sheffield, which took longer than I thought it would, and made some further tweaks to the Kozeluch mini-site on the Burns C21 website.  This included regenerating the data from a spreadsheet via a script I’d written and tweaking the introductory text.  I also responded to a request from Fraser Dallachy to regenerate some data that a script I’d previously written had output.  I also began writing a requirements document for the redevelopment of the place-names project front-ends to make them more ‘map first’.

I also did a bit more work for Speech Star, making some changes to the database of non-disordered speech and moving the ‘child speech error database’ to a new location.  I also met with Luca to have a chat about the BOSLIT project, its data, the interface and future plans.  We had a great chat and I then spent a lot of Friday thinking about the project and formulating some feedback that I sent in a lengthy email to Luca, Lorna Hughes and Kirsteen McCue on Friday afternoon.

Week Beginning 12th September 2022

I spent a bit of time this week going through my notes from the Digital Humanities Congress last week and writing last week’s lengthy post.  I also had my PDR session on Friday and I needed to spend some time preparing for this, writing all of the necessary text and then attending the session.  It was all very positive and it was a good opportunity to talk to my line manager about my role.  I’ve been in this job for ten years this month and have been writing these blog posts every working week for those ten years, which I think is quite an achievement.

In terms of actual work on projects, it was rather a bitty week, with my time spread across lots of different projects.  On Monday I had a Zoom call for the VariCS project, a phonetics project in collaboration with Strathclyde that I’m involved with.  The project is just starting up and this was the first time the team had all met.  We mainly discussed setting up a web presence for the project and I gave some advice on how we could set up the website, the URL and such things.  In the coming weeks I’ll probably get something set up for the project.

I then moved onto another Burns-related mini-project that I worked on with Kirsteen McCue many months ago – a digital edition of Koželuch’s settings of Robert Burns’s Songs for George Thomson.  We’re almost ready to launch this now and this week I created a page for an introductory essay, migrated a Word document to WordPress to fill the page, including adding in links and tweaking the layout to ensure things like quotes displayed properly.  There are still some further tweaks that I’ll need to implement next week, but we’re almost there.

I also spent some time tweaking the Speak For Yersel website, which is now publicly accessible (https://speakforyersel.ac.uk/) but still not quite finished.  I created a page for a video tour of the resource and made a few tweaks to the layout, such as checking the consistency of font sizes used throughout the site.  I also made some updates to the site text and added some lengthy static content to the site in the form of a teachers’ FAQ and a ‘more information’ page.  I also changed the order of some of the buttons shown after a survey is completed to hopefully make it clearer that other surveys are available.

I also did a bit of work for the Speech Star project.  There had been some issues with the Central Scottish Phonetic Features MP4s playing audio only on some operating systems and the replacements that Eleanor had generated worked for her but not for me.  I therefore tried uploading them to and re-downloading them from YouTube, which thankfully seemed to fix the issue for everyone.  I then made some tweaks to the interfaces of the two project websites.  For the public site I made some updates to ensure the interface looked better on narrow screens, changing the appearance of the ‘menu’ button and making the logo and site header font smaller so they take up less space.  I also added an introductory video to the homepage.

For the Books and Borrowing project I processed the images for another library register.  This didn’t go entirely smoothly.  I had been sent 73 images and these were all upside down so needed rotating.  It then transpired that I should have been sent 273 images so needed to chase up the missing ones.  Once I’d been sent the full set I was then able to generate the page images for the register, upload the images and associate them with the records.

I then moved on to setting up the front-end for the Ayr Place-names website.  In the process of doing so I became aware that one of the NLS map layers that all of our place-name projects use had stopped working.  It turned out that the NLS had migrated this map layer to a third-party map tile service (https://www.maptiler.com/nls/) and the old URLs these sites were still using no longer worked.  I had a very helpful chat with Chris Fleet at NLS Maps about this and he explained the situation.  I was able to set up a free account with the maptiler service and update the URLs in the four place-names websites that referenced the layer (https://berwickshire-placenames.glasgow.ac.uk/, https://kcb-placenames.glasgow.ac.uk/, https://ayr-placenames.glasgow.ac.uk and https://comparative-kingship.glasgow.ac.uk/scotland/).  I’ll need to ensure this is also done for the two further place-names projects that are still in development (https://mull-ulva-placenames.glasgow.ac.uk and https://iona-placenames.glasgow.ac.uk/).

I managed to complete the work on the front-end for the Ayr project, which was mostly straightforward as it was just adapting what I’d previously developed for other projects.  The thing that took the longest was getting the parish data and the locations where the parish three-letter acronyms should appear, but I was able to get this working thanks to the notes I’d made the last time I needed to deal with parish boundaries (as documented here: https://digital-humanities.glasgow.ac.uk/2021-07-05/).  After discussions with Thomas Clancy about the front-end I decided that it would be a good idea to redevelop the map-based interface to display all of the data on the map by default and to incorporate all of the search and browse options within the map itself.  This would be a big change, and it’s one I had been thinking of implementing anyway for the Iona project, but I’ll try and find some time to work on this for all of the place-name sites over the coming months.

Finally, I had a chat with Kirsteen McCue and Luca Guariento about the BOSLIT project.  This project is taking the existing data for the Bibliography of Scottish Literature in Translation (available on the NLS website here: https://data.nls.uk/data/metadata-collections/boslit/) and creating a new resource from it, including visualisations.  I offered to help out with this and will be meeting with Luca to discuss things further, probably next week.

 

Week Beginning 29th August 2022

I divided my time between a number of different projects this week.  For Speak For Yersel I replaced the ‘click’ transcripts with new versions that incorporated shorter segments and more highlighted words.  As the segments were now different I also needed to delete all existing responses to the ‘click’ activity.  I then completed the activity once for each speaker to test things out, and all seems to work fine with the new data.  I also changed the pop-up ‘percentage clicks’ text to ‘% clicks occurred here’, which is more accurate than the previous text, which suggested it was the percentage of respondents.  I also fixed an issue with the map height being too small on the ‘where do you think this speaker is from’ quiz and ensured the page scrolls to the correct place when a new question is loaded.  I also removed the ‘tip’ text from the quiz intros and renamed the ‘where do you think this speaker is from’ map buttons on the map intro page.  I’d also been asked to trim down the number of ‘translation’ questions from the ‘I would never say that’ activity so I removed some of those.  I then changed and relocated the ‘heard in films and TV’ explanatory text and removed the question mark from the ‘where do you think the speaker is from’ quiz intro page.

Mary had encountered a glitch with the transcription popups, whereby the page would flicker and jump about when certain popups were hovered over.  This was caused by the page height increasing to accommodate the pop-up, causing a scrollbar to appear in the browser, which changed the position of the cursor and made the pop-up disappear, making the scrollbar disappear again and causing a glitchy loop.  I increased the height of the page for this activity so the scrollbar issue is no longer encountered, and I also made the popups a bit wider so they don’t need to be as long.  Mary also noticed that some of the ‘all over Scotland’ dynamically generated map answers seemed to be incorrect.  After some investigation I realised that this was a bug that had been introduced when I added in the ‘I would never say that’ quizzes on Friday.  A typo in the code meant that the 60% threshold for correct answers in each region was being used rather than ‘100 divided by the number of answer options’.  Thankfully once identified this was easy to fix.

I also participated in a Zoom call for the project this week to discuss the launch of the resource with the University’s media people.  It was agreed that the launch will be pushed back to the beginning of October as this should be a good time to get publicity.  Finally for the project this week I updated the structure of the site so that the ‘About’ menu item could become a drop-down menu, and I created placeholder pages for three new pages that will be added to this menu for things like FAQs.

I also continued to work on the Books and Borrowing project this week.  On Friday last week I didn’t quite get to finish a script to merge page records for one of the St Andrews registers as it needed further testing on my local PC before I ran it on the live data.  I tackled this issue first thing on Monday and it was a task I had hoped would only take half an hour or so.  Unfortunately things did not go well and it took most of the morning to sort out.  I initially attempted to run things on my local PC to test everything out, but I forgot to update the database connection details.  Usually this wouldn’t be an issue as generally the databases I work with use ‘localhost’ as a connection URL, so the Stirling credentials would have been wrong for my local DB and the script would have just quit, but Stirling (where the system is hosted) uses a full URL instead of ‘localhost’.  This meant that even though I had a local copy of the database on my PC and the scripts were running on a local server set up on my PC the scripts were in fact connecting to the real database at Stirling.  This meant the live data was being changed.  I didn’t realise this as the script was running and as it was taking some time I cancelled it, meaning the update quit halfway through changing borrowing records and deleting page records in the CMS.


I then had to write a further script to delete all of the page and borrowing records for this register from the Stirling server and reinstate the data from my local database.  Thankfully this worked ok.  I then ran my test script on the actual local database on my PC and the script did exactly what I wanted it to do, namely:

Iterate through the pages and for each odd-numbered page move its records to the preceding even-numbered page, at the same time regenerating the ‘page order’ for each record so they follow on from the existing records.  The even page then needs its folio number updated to include the odd number (e.g. folio number 2 becomes ‘2-3’) and an image reference generated based on this (e.g. UYLY207-2_2-3).  Finally, the odd page record is deleted and, after all that is done, the ‘next’ and ‘previous’ page links are regenerated for all pages.
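As a rough illustration of this merge logic, here is a sketch using made-up field names, and assuming (as I did at the time) that folio numbers increment by one – an assumption that, as described below, turned out not to hold:

```typescript
interface Page { id: number; folio: number; folioLabel: string; imageRef: string | null; }
interface Borrowing { id: number; pageId: number; pageOrder: number; }

// Merge each odd-folio page into the preceding even-folio page: move its
// borrowing records across (renumbering them to follow on from the existing
// records), relabel the even page's folio as a range, and drop the odd page.
// Regenerating the 'next'/'previous' links is omitted here for brevity.
function mergePages(pages: Page[], borrowings: Borrowing[], imagePrefix: string): { pages: Page[]; borrowings: Borrowing[] } {
  const sorted = [...pages].sort((a, b) => a.folio - b.folio);
  const removed = new Set<number>();
  for (const even of sorted) {
    if (even.folio % 2 !== 0) continue;
    const odd = sorted.find(p => p.folio === even.folio + 1);
    if (!odd) continue;
    let nextOrder = borrowings.filter(b => b.pageId === even.id).length + 1;
    const oddRecords = borrowings
      .filter(b => b.pageId === odd.id)
      .sort((a, b) => a.pageOrder - b.pageOrder);
    for (const record of oddRecords) {
      record.pageId = even.id;
      record.pageOrder = nextOrder++;
    }
    even.folioLabel = `${even.folio}-${odd.folio}`;
    even.imageRef = `${imagePrefix}_${even.folio}-${odd.folio}`;
    removed.add(odd.id);
  }
  return { pages: sorted.filter(p => !removed.has(p.id)), borrowings };
}
```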

This all worked so I ran the script on the server and updated the live data.  However, I then noticed that there are gaps in the folio numbers and this messed everything up.  For example, folio number 314 isn’t followed by 315 but by 320.  320 isn’t an odd number so it doesn’t get joined to 314, and all subsequent page joins are then thrown off.  There are also two ‘350’ pages in the CMS and two images that reference 350: we have UYLY207-2_349-350 and also UYLY207-2_350-351.  There might be other situations where the data isn’t uniform too.

I therefore had to use my ‘delete and reinsert’ script again to revert to the data prior to the update as my script wasn’t set up to work with pages that don’t just increment their folio number by 1 each time.  After some discussion with the RA I updated the script again so that it would work with the non-uniform data and thankfully all worked fine after that.  Later in the week I also found some time to process two further St Andrews registers that needed their pages and records merged, and thankfully these went much smoother.

I also worked on the Speech Star project this week.  I created a new page on both of the project’s sites (which are not live yet) for viewing videos of Central Scottish phonetic features.  I also replaced the temporary logos used on the sites with the finalised logos that had been designed by a graphic designer.  However, the new logo only really works well on a white background as the white cut-out round the speech bubble into the star becomes the background colour of the header.  The blue that we’re currently using for the site header doesn’t work so well with the logo colours.  Also, the graphic designer had proposed using a different font for the site and I decided to make a new interface for the site, which you can see below.  I’m still waiting for feedback to see whether the team prefer this to the old interface (a screenshot of which you can see on this page: https://digital-humanities.glasgow.ac.uk/2022-01-17/) but I personally think it looks a lot better.

I also returned to the Burns Manuscript database that I’d begun last week.  I added a ‘view record’ icon to each row which if pressed on opens a ‘card’ view of the record on its own page.  I also added in the search options, which appear in a section above the table.  By default, the section is hidden and you can show/hide it by pressing on a button.  Above this I’ve also added in a placeholder where some introductory text can go.  If you open the ‘Search options’ section you’ll find text boxes where you can enter text for year, content, properties and notes.  For year you can either enter a specific year or a range.  The other text fields are purely free-text at the moment, so no wildcards.  I can add these in but I think it would just complicate things unnecessarily.  On the second row are checkboxes for type, location name and condition.  You can select one or more of each of these.

The search options are linked by AND, and the checkbox options are linked internally by OR.  For example, filling in ‘1780-1783’ for year and ‘wrapper’ for properties will find all rows with a date between 1780 and 1783 that also have ‘wrapper’ somewhere in their properties.  If you enter ‘work’ in content and select ‘Deed’ and ‘Fragment’ as types you will find all rows that are either ‘Deed’ or ‘Fragment’ and have ‘work’ in their content.

If a search option is entered and you press the ‘Search’ button the page will reload with the search options open, and the page will scroll down to this section.  Any rows matching your criteria will be displayed in the table below this.  You can also clear the search by pressing on the ‘Clear search options’ button.  In addition, if you’re looking at search results and you press on the ‘view record’ button the ‘Return to table’ button on the ‘card’ view will reload the search results.  That’s this mini-site completed now, pending feedback from the project team, and you can see a screenshot of the site with the search box open below:

Also this week I’d arranged an in-person coffee and catch up with the other College of Arts developers.  We used to have these meetings regularly before Covid but this was the first time since then that we’d all met up.  It was really great to chat with Luca Guariento, Stevie Barrett and David Wilson again and to share the work we’d been doing since we last met.  Hopefully we can meet again soon.

Finally this week I helped out with a few WordPress questions from a couple of projects and I also had a chance to update all of the WordPress sites I manage (more than 50) to the most recent version.

 

Week Beginning 22nd August 2022

I continued to spend a lot of my time working on the Speak For Yersel project this week.  We had a team meeting on Monday at which we discussed the outstanding tasks and particularly how I was going to tackle making the quiz question answers dynamic.  Previously the quiz question answers were static, which will not work well as the maps the users will reference in order to answer a question are dynamic, meaning the correct answer may evolve over time.  I had proposed a couple of methods that we could use to ensure that the answers are dynamically generated based on the currently available data and we finalised our approach today.

Although I’d already made quite a bit of progress with my previous test scripts, there was still a lot to do to actually update the site.  I needed to update the structure of the database, the script that outputs the data for use in the site, the scripts that handle the display of questions and the evaluation of answers, and the scripts that store a user’s selected answers.

Changes to the database allow for dynamic quiz questions to be stored (non-dynamic ones have fixed ‘answer options’ but dynamic ones don’t).  They also allow a reference to the relevant answer option of the survey question the quiz question is about to be stored (e.g. that the quiz is about the ‘mother’ map and specifically about the use of ‘mam’).  I made significant updates to the script that outputs data for use in the site, integrating the functions from my earlier test script that calculated the correct answer.  I updated these functions to change the logic somewhat: they now only use ‘method 1’ as mentioned in an earlier post.  This method also now has a built-in check to filter out regions that have the highest percentage of usage but only a limited amount of data.  Currently this is set to a minimum of 10 answers for the option in question (e.g. ‘mam’) rather than the total number of answers in a region.  Regions are ordered by their percentage usage (highest first) and the script iterates down through the regions and picks as ‘correct’ the first one that has at least 10 answers.  I’ve also added in a contingency for cases where none of the regions have at least 10 answers (currently the case for the ‘rocket’ question).  In such cases the region marked as ‘correct’ is the one that has the highest raw count of answers for the answer option rather than the highest percentage.

With the ‘correct’ region picked out, the script then selects all other regions where the usage percentage is at least 10% lower than the correct percentage.  This ensures there isn’t an ‘incorrect’ answer that is too similar to the ‘correct’ one.  If this results in fewer than three regions (as regions are only returned if they have clicks for the answer option) then the system goes through the remaining regions and adds these in with a zero percentage.  These ‘incorrect’ regions are then shuffled and three are picked out at random.  The ‘correct’ answer is added to these three and the options are shuffled again to ensure the ‘correct’ option is randomly positioned.  The dynamically generated output is then plugged into the output script that the website uses.
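
To give a flavour of how the answer generation works, here’s a rough TypeScript sketch of the selection logic.  This isn’t the project’s actual code (which lives in the site’s server-side scripts) and the names, structures and helper details here are purely illustrative:

```typescript
// Illustrative sketch only – not the project's actual code.
interface RegionStat {
  region: string;
  optionCount: number;  // answers for the option in question (e.g. 'mam') in this region
  percentage: number;   // optionCount as a percentage of all answers in the region
}

const MIN_ANSWERS = 10; // minimum answers for the option before a region can be 'correct'
const MIN_GAP = 10;     // 'incorrect' regions must be at least this many percentage points lower

function pickCorrectRegion(stats: RegionStat[]): RegionStat {
  const byPercentage = [...stats].sort((a, b) => b.percentage - a.percentage);
  // Take the highest-percentage region that also has enough data...
  const withEnoughData = byPercentage.find(s => s.optionCount >= MIN_ANSWERS);
  if (withEnoughData) return withEnoughData;
  // ...otherwise fall back to the highest raw count (the 'rocket' contingency).
  return [...stats].sort((a, b) => b.optionCount - a.optionCount)[0];
}

function pickIncorrectRegions(stats: RegionStat[], correct: RegionStat, allRegions: string[]): string[] {
  // Regions whose usage is at least MIN_GAP points below the correct answer.
  let pool = stats
    .filter(s => s.region !== correct.region && s.percentage <= correct.percentage - MIN_GAP)
    .map(s => s.region);
  // Pad with zero-percentage regions (those with no clicks for the option) if needed.
  if (pool.length < 3) {
    const withData = new Set(stats.map(s => s.region));
    pool = pool.concat(allRegions.filter(r => !withData.has(r) && r !== correct.region));
  }
  // Crude shuffle for illustration, then take three; the correct region is added and shuffled again later.
  return [...pool].sort(() => Math.random() - 0.5).slice(0, 3);
}
```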

I then updated the front-end to work with this new data.  This also required me to create a new database table to hold the user’s answers, storing the region the user presses on and whether their selection was correct, along with the question ID and the person ID.  Non-dynamic answers store the ID of the ‘answer option’ that the user selected, but these dynamic questions don’t have static ‘answer options’ so the structure needed to be different.

I then implemented the dynamic answers for the ‘most of Scotland’ questions.  For these questions the script needs to evaluate whether a form is used throughout Scotland or not.  The algorithm gets all of the answer options for the survey question (e.g. ‘crying’ and ‘greetin’) and for each region works out the percentage of responses for each option.  The team had previously suggested a fixed percentage threshold of 60%, but I reckoned it might be better for the threshold to change depending on how many answer options there are.  Currently I’ve set the threshold to be 100 divided by the number of options.  So where there are two options the threshold is 50%.  Where there are three options (e.g. ‘clap’) the threshold is 33%.  Where there are four options (e.g. the ‘wean’ question) the threshold is 25% (i.e. if 25% or more of the answers in a region are for ‘wean’ it is classed as present in the region).  Where there are five options (e.g. ‘clarty’) the threshold is 20%.

The algorithm counts the number of regions that meet the threshold, and if the number is 8 or more then the term is considered to be found throughout Scotland and ‘Yes’ is the correct answer.  If not then ‘No’ is the correct answer.  I also had to update the way answers are stored in the database so these yes/no answers can be saved (as they have no associated region like the other questions).
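
As a rough illustration, the ‘throughout Scotland’ check boils down to something like the following TypeScript sketch (again, the real code differs and the names here are just for illustration):

```typescript
// Illustrative sketch: threshold scales with the number of answer options.
function isFoundThroughoutScotland(
  regionPercentages: number[], // for each region, % of its answers that are for the target option
  numAnswerOptions: number,
  minRegions = 8               // 8 or more qualifying regions means 'Yes' is the correct answer
): boolean {
  const threshold = 100 / numAnswerOptions; // 2 options -> 50%, 3 -> 33%, 4 -> 25%, 5 -> 20%
  const qualifying = regionPercentages.filter(pct => pct >= threshold).length;
  return qualifying >= minRegions;
}
```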

I then moved on to tackling the non-standard (in terms of structure) questions to ensure they are dynamically generated as well.  These were rather tricky, as each had to be handled differently because they ask different things of the data (e.g. a question like ‘What are you likely to call the evening meal if you live in Tayside and Angus (Dundee) and didn’t go to Uni?’).  I also made the ‘Sounds about right’ quiz dynamic.

I then moved on to the ‘I would never say that’ quiz, which has been somewhat tricky to get working as the structure of the survey questions and answers is very different.  Quizzes for the other surveys involved looking at a specific answer option, but for this survey the answer options are different rating levels that each need to be processed separately.

For this quiz, for each region the system returns the number of times each rating level has been selected and works out the percentages for each.  It then adds the ‘I’ve never heard this’ and ‘people elsewhere say this’ percentages together as a ‘no’ percentage and the ‘people around me say this’ and ‘I’d say this myself’ percentages together as a ‘yes’ percentage.  Currently there is no weighting, but we may want to consider this (e.g. ‘I’d say this myself’ could be worth more than ‘people around me say this’).
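
Sketched in TypeScript, the collapsing of the four rating levels into ‘yes’/‘no’ percentages looks roughly like this (the field names are illustrative, and the weighting mentioned above would slot in here if we decide to add it):

```typescript
// Illustrative sketch: collapse the four rating levels into 'yes'/'no' percentages for a region.
interface RatingCounts {
  neverHeard: number; // 'I've never heard this'
  elsewhere: number;  // 'people elsewhere say this'
  aroundMe: number;   // 'people around me say this'
  sayMyself: number;  // 'I'd say this myself'
}

function yesNoPercentages(c: RatingCounts): { yes: number; no: number; total: number } {
  const total = c.neverHeard + c.elsewhere + c.aroundMe + c.sayMyself;
  if (total === 0) return { yes: 0, no: 0, total: 0 };
  // No weighting yet – e.g. 'I'd say this myself' could later count for more than 'people around me say this'.
  return {
    yes: ((c.aroundMe + c.sayMyself) / total) * 100,
    no: ((c.neverHeard + c.elsewhere) / total) * 100,
    total,
  };
}
```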

With these percentages in place the script handles the question types differently.  For the ‘select a region’ type of question the system works in a similar way to the other quizzes: it sorts the regions by ‘yes’ percentage with the biggest first, then iterates through the regions and picks as the correct answer the first it comes to where the total number of responses for the region is the same as or greater than the minimum allowed (currently set to 10).  Note that this differs from the other quizzes, where the check for 10 is made against the specific answer option rather than the number of responses in the region as a whole.

If no region passes the above check then the region with the highest ‘yes’ percentage is chosen as the correct answer, with no minimum applied.  The system then picks out all other regions with data where the ‘yes’ percentage is at least 10% lower than the correct answer, adds in regions with no data if fewer than three have data, shuffles the regions and picks out three.  These are then added to the ‘correct’ region and the answers are shuffled again.
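
The corresponding region-picking logic, sketched in TypeScript under the same caveats as the earlier examples (illustrative names only; note that the minimum here is checked against the whole region rather than a single answer option):

```typescript
// Illustrative sketch: pick the 'correct' region for a 'select a region' question in this quiz.
interface RegionYesNo {
  region: string;
  yesPercentage: number;  // combined 'people around me' + 'I'd say this myself' percentage
  totalResponses: number; // all responses in the region, across every rating level
}

const MIN_RESPONSES = 10;

function pickCorrectRegionForRatings(stats: RegionYesNo[]): RegionYesNo {
  const sorted = [...stats].sort((a, b) => b.yesPercentage - a.yesPercentage);
  // First region (highest 'yes' percentage) with at least the minimum number of responses...
  const candidate = sorted.find(s => s.totalResponses >= MIN_RESPONSES);
  // ...otherwise simply the highest 'yes' percentage with no minimum applied.
  return candidate ?? sorted[0];
}
```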

I changed the questions that had an ‘all over Scotland’ answer option so that these are now ‘yes/no’ questions, e.g. ‘Is “Are you wanting to come with me?” heard throughout most of Scotland?’.  For these questions the system uses 8 regions as the threshold, as with the other quizzes.  However, the percentage threshold for ‘yes’ is fixed.  I’ve currently set this to 60% (i.e. at least 60% of all answers in a region are either ‘people around me say this’ or ‘I’d say this myself’).  There is currently no minimum number of responses for this question type, so a region with a single answer of ‘people around me say this’ will have a 100% ‘yes’ and will be included.  This is also the case for the ‘most of Scotland’ questions in the other quizzes, so we may need to tweak this.

As we’re using percentages rather than the exact number of dots the questions can sometimes be a bit tricky.  For example, the first question currently has Glasgow as the correct answer because all but two of the markers in this region are ‘people around me say this’ or ‘I’d say this myself’.  But if you turn off the other two categories and just look at the number of dots you might surmise that the North East is the correct answer, as there are more dots there, even though proportionally fewer of them are the high ratings.  I don’t know if we can make it clearer that we’re asking which region has proportionally more of the higher ratings without confusing people further, though.

I also spent some time this week working on the Books and Borrowing project.  I had to make a few tweaks to the Chambers map of borrowers so that it works better on smaller screens.  Both the ‘Map options’ section on the left and the ‘map legend’ on the right are now given a fixed height that is shorter than the map, and the areas become scrollable, as I’d noticed that on short screens both areas could end up longer than the map, leaving their lower parts inaccessible.  I’ve also added a ‘show/hide’ button to the map legend, enabling people to hide the area if it obscures their view of the map.

I also sent on some renamed library register files from St Andrews to Gerry for him to align with existing pages in the CMS, replaced some of the page images for the Dumfries register and renamed and uploaded images for a further St Andrews register that already existed in the CMS, ensuring the images became associated with the existing pages.

I started to work on the images for another St Andrews register that already exists in the system, but for this one the images are double-page spreads, so I need to merge two pages into one in the CMS.  The script needs to find all odd-numbered pages and move the records on these to the preceding even-numbered page, at the same time regenerating the ‘page order’ for each record so they follow on from the existing records.  The even page then needs its folio number updated to add in the odd number (e.g. folio number 2 becomes ‘2-3’).  Then I need to delete the odd page record, and after all that is done I need to regenerate the ‘next’ and ‘previous’ page links for all pages.  I completed everything except the final task, but I really need to test the script out on a version of the database running on my local PC first, as if anything goes wrong data could very easily be lost.  I’ll need to tackle this next week as I ran out of time this week.
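
To make sure I get the logic right before running it on live data, I’ve been thinking of the merge along the lines of the following TypeScript sketch.  It works on hypothetical in-memory structures rather than the project’s actual database tables, so the field names are assumptions, but the steps mirror what the script needs to do (the ‘next’/‘previous’ link regeneration is the part I’ve left for next week):

```typescript
// Illustrative sketch of the double-page merge; structures and field names are hypothetical.
interface PageRecord { id: number; pageOrder: number; }
interface Page { folioNumber: string; records: PageRecord[]; }

// Pages keyed by their numeric folio number.
function mergeSpreads(pages: Map<number, Page>): void {
  const oddNumbers = [...pages.keys()].filter(n => n % 2 === 1).sort((a, b) => a - b);
  for (const odd of oddNumbers) {
    const evenPage = pages.get(odd - 1);
    const oddPage = pages.get(odd);
    if (!evenPage || !oddPage) continue;
    // Move the odd page's records onto the preceding even page, renumbering them
    // so they follow on from the records already there.
    let nextOrder = Math.max(0, ...evenPage.records.map(r => r.pageOrder)) + 1;
    for (const record of oddPage.records) {
      record.pageOrder = nextOrder++;
      evenPage.records.push(record);
    }
    // Folio number 2 becomes '2-3', and so on.
    evenPage.folioNumber = `${evenPage.folioNumber}-${oddPage.folioNumber}`;
    // Finally delete the odd page ('next'/'previous' links would be regenerated afterwards).
    pages.delete(odd);
  }
}
```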

I also participated in our six-monthly formal review meeting for the Dictionaries of the Scots Language, where we discussed our achievements in the past six months and our plans for the next.  I also made some tweaks to the DSL website, such as splitting up the ‘Abbreviations and symbols’ button into two separate links, updating the text found on a couple of the old maps pages and considering future changes to the bibliography XSLT to allow links in the ‘oral sources’.

Finally this week I made a start on the Burns manuscript database for Craig Lamont.  I wrote a script that extracts the data from Craig’s spreadsheet and imports it into an online database, and we will be able to rerun this whenever I’m given a new version of the spreadsheet.  I then created an initial version of a front-end for the database within the layout of the Burns Correspondence and Poetry site.  Currently the front-end only displays the data in one table, with columns for type, date, content, physical properties, additional notes and locations.  The latter contains the location name, shelfmark (if applicable) and condition (if applicable) for all locations associated with a record, each on a separate line with the location name in bold.  It’s possible to order the table by clicking on a column heading; clicking a second time reverses the order.  I haven’t had a chance to create any search or filter options yet but I’m intending to continue with this next week.

Week Beginning 13th June 2022

I worked on several different projects this week.  For the Books and Borrowing project I processed and imported a further register for the Advocates Library that had been digitised by the NLS.  I also continued with the interactive map of Chambers Library borrowers, although I couldn’t spend as much time on this as I’d hoped because my access to Stirling University’s VPN had stopped working, and without VPN access I can’t connect to the database and the project server.  It took a while to resolve the issue as access needs to be approved by some manager or other, but once it was sorted I got to work on some updates.

One thing I’d noticed last week was that when zooming and panning the historical map layer was throwing out hundreds of 403 Forbidden errors to the browser console.  This was not having any impact on the user experience, but was still a bit messy and I wanted to get to the bottom of the issue.  I had a very helpful (as always) chat with Chris Fleet at NLS Maps, who provided the historical map layer and he reckoned it was because the historical map only covers a certain area and moving beyond this was still sending requests for map tiles that didn’t exist.  Thankfully an option exists in Leaflet that allows you to set the boundaries for a map layer (https://leafletjs.com/reference.html#latlngbounds) and I updated the code to do just that, which seems to have stopped the errors.
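
For reference, the relevant change boils down to passing a bounds option when the tile layer is created.  The snippet below is a simplified TypeScript sketch rather than the project’s actual code, and the tile URL and coordinates are placeholders:

```typescript
import * as L from "leaflet";

// Placeholder coordinates – the real layer uses the extent of the NLS historical map.
const historicalBounds = L.latLngBounds(
  L.latLng(55.90, -3.35), // south-west corner (hypothetical)
  L.latLng(55.99, -3.05)  // north-east corner (hypothetical)
);

const historicalLayer = L.tileLayer("https://example.org/tiles/{z}/{x}/{y}.png", {
  bounds: historicalBounds, // tiles outside this box are never requested, so no more 403 errors
  maxZoom: 18,
});
```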

I then returned to the occupations categorisation, which was including far too many options.  I therefore streamlined the occupations, displaying the top-level occupation only.  I think this works a lot better (although I need to change the icon colour for ‘unknown’).  Full occupation information is still available for each borrower via the popup.

I also had to change the range slider for opacity, as standard HTML range sliders don’t allow for double-ended ranges.  We require a double-ended range for the subscription period and I didn’t want two range sliders that looked different on the one page, so I switched to the range slider offered by the jQuery UI library (https://jqueryui.com/slider/#range).  The opacity slider still works as before, it just looks a little different.  Actually, it works better than before, as the opacity now changes as you slide rather than only updating on mouse-up.

I then began to implement the subscription period slider.  This does not yet update the data.  It’s been pretty tricky to implement: the range needs to be dynamically generated based on the earliest and latest dates in the data, and dates consist of both a year and a month, which need to be converted into plain integers for the slider and then reinterpreted as years and months when the user updates the end positions.  I think I’ve got this working as it should, though.  When you update the ends of the slider, the text above that lists the months and years updates to reflect this.  The next step will be to actually filter the data based on the chosen period.  Here’s a screenshot of the map featuring data categorised by the new streamlined occupations, with the new sliders displayed:
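
The conversion itself is straightforward once you treat each date as a number of months from the earliest date in the data.  Here’s a small TypeScript sketch of the idea (illustrative only; the actual page does this in its own JavaScript):

```typescript
// Illustrative sketch: map year/month dates onto plain integers for the double-ended slider.
interface YearMonth { year: number; month: number; } // month: 1-12

// Slider value = number of months after the earliest date in the data.
function toSliderValue(d: YearMonth, earliest: YearMonth): number {
  return (d.year - earliest.year) * 12 + (d.month - earliest.month);
}

// Reinterpret a slider position as a year and month for display.
function fromSliderValue(value: number, earliest: YearMonth): YearMonth {
  const totalMonths = earliest.year * 12 + (earliest.month - 1) + value;
  return { year: Math.floor(totalMonths / 12), month: (totalMonths % 12) + 1 };
}
```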

For the Speak For Yersel project I made a number of tweaks to the resource, which Jennifer and Mary are piloting with school children in the North East this week.  I added in a new grammatical question and seven grammatical quiz questions.  I tweaked the homepage text and updated the structure of questions 27-29 of the ‘sound about right’ activity.  I ensured that ‘Dumfries’ always appears as ‘Dumfries and Galloway’ in the ‘clever’ activity and follow-on and updated the ‘clever’ activity to remove the stereotype questions.  These were the ones where users had to rate the speakers from a region without first listening to any audio clips and Jennifer reckoned these were taking too long to complete.  I also updated the ‘clever’ follow-on to hide the stereotype options and switched the order of the listener and speaker options in the other follow-on activity for this type.

For the Speech Star project I replaced the data for the child speech error database with a new, expanded dataset and added in ‘Speaker Code’ as a filter option.  I also replicated the child speech and normalised speech databases from the clinical website we’re creating on the more academic teaching site, and pulled the IPA chart from Seeing Speech into this resource too.  Here’s a screenshot of how the child speech error database looks with the new ‘speaker code’ filter with ‘vowel disorder’ selected:

I also responded to Craig Lamont in Scottish Literature with some further feedback on the structure of his Burns Manuscript Database spreadsheet, which is now shaping up nicely.  Craig had also sent me an updated spreadsheet with data for the Ramsay Gentle Shepherd performances project.  I’d set this up (interactive map, timeline and filterable tabular data) a few weeks ago, migrating it to the University’s T4 website management system.  All had worked then, but when I logged into T4 and previewed the page I’d previously created I discovered it no longer worked.  The page hadn’t been updated since the end of May and I had no idea what had gone wrong.  I can only assume that the linked content (i.e. the links to the JavaScript files) had somehow become unlinked.  I decided, therefore, that it would be easier to just host the JavaScript files on another server I have direct access to rather than having to shoehorn it all into T4.  I made an updated version with the new dataset and this is working well.

I also made a couple of tweaks to the DSL this week, installing the TablePress plugin for the ancillary pages and creating a further alternative logo for the DSL’s Facebook posts.  I also returned to doing some work for the Anglo-Norman Dictionary, offering some advice to the editor Geert about incorporating publications and overhauling how cross references are displayed in the Dictionary Management System.

I updated the ‘View Entry’ page in the DMS.  Previously it only included cross references FROM the entry you’re looking at TO any other entries.  I.e. it only displayed content when the entry was of type ‘xref’ rather than ‘main’.  Now in addition to this there’s a further section listing all cross references TO the entry you’re looking at from any entry of type ‘xref’ that links to it.

In addition, there is a button allowing you to view all entries that include a cross reference to the current entry anywhere in their XML – i.e. where an <xref> tag featuring the current entry’s slug is found at any level in any other main entry’s XML.  This code is hugely memory intensive to run, as basically all 27,464 main entries need to be pulled into the script, with the full XML contents of each checked for matching xrefs.  For this reason the code doesn’t run each time the ‘view entry’ page is loaded, but only when you actively press the button.  It takes a few seconds for the script to process, but after it does the cross references are listed in the same manner as the ‘pure’ xrefs in the preceding sections.
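
Conceptually the check is just a scan over every main entry’s XML for an <xref> that points at the current entry’s slug.  The TypeScript sketch below shows the idea only; the DMS itself is written differently, and the attribute name used for the xref target here is an assumption rather than the actual schema:

```typescript
// Illustrative sketch: find every main entry whose XML contains an <xref> pointing at a given slug.
interface Entry { slug: string; type: "main" | "xref"; xml: string; }

function entriesReferencing(targetSlug: string, entries: Entry[]): Entry[] {
  // 'target' is a hypothetical attribute name; a real implementation would also escape the slug.
  const pattern = new RegExp(`<xref\\b[^>]*\\btarget="${targetSlug}"`, "i");
  return entries.filter(e => e.type === "main" && pattern.test(e.xml));
}
```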

Finally, I participated in a Zoom-based focus group for the AHRC about the role of technicians in research projects this week.  It was great to take part, sharing my views on my role and hearing from other people with similar roles at other organisations.

Week Beginning 6th June 2022

I’d taken Monday off this week to have an extra-long weekend following the jubilee holidays on Thursday and Friday last week.  On Tuesday I returned to another meeting for Speak For Yersel and a list of further tweaks to the site, including many changes to three of the five activities and a new set of colours for the map marker icons, which makes the markers much easier to differentiate.

I spent most of the week working on the Books and Borrowing project.  We’d been sent a new library register from the NLS and I spent a bit of time downloading the 700 or so images, processing them and uploading them into our system.  As usual, page numbers go a bit weird.  Page 632 is written as 634 and then after page 669 comes not 670 but 700!  I ran my script to bring the page numbers in the system into line with the oddities of the written numbers.  On Friday I downloaded a further library register which I’ll need to process next week.

My main focus for the project was the Chambers Library interactive map sub-site.  The map features the John Ainslie 1804 map from the NLS and currently uses the same modern map as I’ve used elsewhere in the front-end for consistency, although this may change.  The map defaults to having a ‘Map options’ pane open on the left, and you can open and close this using the button above it.  I also added a ‘Full screen’ button beneath the zoom buttons in the bottom right, and added this to the other maps in the front-end too.  Borrower markers have a ‘person’ icon and the library itself has the ‘open book’ icon as found on other maps.

By default the data is categorised by borrower gender, with somewhat stereotypical (but possibly helpful) blue and pink colours differentiating the two.  There is one borrower with an ‘unknown’ gender and this is set to green.  The map legend in the top right allows you to turn on and off specific data groups.  The screenshot below shows this categorisation:

The next categorisation option is occupation, and this has some problems.  The first is that there are almost 30 different occupations, meaning the legend is awfully long and so many different marker colours are needed that some of them are difficult to differentiate.  Secondly, most occupations only have a handful of people.  Thirdly, some people have multiple occupations, and if so these are treated as one long occupation, so we have both ‘Independent Means > Gentleman’ and then ‘Independent Means > Gentleman, Politics/Office Holders > MP (Britain)’.  It would be tricky to separate these out, as the marker would then need to belong to two sets with two colours, plus what happens if you hide one set?  I wonder if we should just use the top-level categorisation for the groupings instead?  This would result in 12 groupings plus ‘unknown’, meaning the legend would be both shorter and narrower.  Below is a screenshot of the occupation categorisation as it currently stands:

The next categorisation is subscription type, which I don’t think needs any explanation.  I then decided to add in a further categorisation for number of borrowings, which wasn’t originally discussed, but as I used the page I found myself looking for an option to see who borrowed the most, or didn’t borrow anything.  I added the following groupings, although these may change: 0, 1-10, 11-20, 21-50, 51-70 and 70+, and have used a sequential colour scale (darker = more borrowings).  We might want to tweak this, though, as some of the colours are a bit too similar.  I haven’t added in the filter to select subscription period yet, but will look into this next week.
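
Banding the counts is simple enough; the TypeScript sketch below shows the sort of mapping I have in mind, though the colours here are placeholders from a generic sequential scale rather than the ones currently on the map:

```typescript
// Illustrative sketch: band borrowers by number of borrowings and assign a sequential colour
// (darker = more borrowings). Colours are placeholders, not the map's actual palette.
const BANDS = [
  { label: "0",     min: 0,  max: 0,        colour: "#f7fbff" },
  { label: "1-10",  min: 1,  max: 10,       colour: "#c6dbef" },
  { label: "11-20", min: 11, max: 20,       colour: "#9ecae1" },
  { label: "21-50", min: 21, max: 50,       colour: "#6baed6" },
  { label: "51-70", min: 51, max: 70,       colour: "#3182bd" },
  { label: "70+",   min: 71, max: Infinity, colour: "#08519c" },
];

function bandFor(borrowings: number) {
  return BANDS.find(b => borrowings >= b.min && borrowings <= b.max) ?? BANDS[BANDS.length - 1];
}
```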

At the bottom of the map options is a facility to change the opacity of the historical map so you can see the modern street layout.  This is handy, for example, for figuring out why there is a cluster of markers in what appears to be a field – ‘Ainslie Place’ was presumably built there after the historical map was produced.

I decided not to include the marker clustering option in this map for now, as clustering would make it more difficult to analyse the categorisation: markers from multiple groupings would end up clustered together and lose their individual colours until the cluster is split.  Marker hover-overs display the borrower name and the pop-ups contain information about the borrower.  I still need to add in the borrowing period data, and also figure out how best to link out to information about the borrowings or page images.  The Chambers Library pin displays the same information as found on the ‘libraries’ page you’ve previously seen.

Also this week I responded to a couple of queries from the DSL people about Google Analytics and the icon that gets used for the site when posting on Facebook.  Facebook was picking out the University of Glasgow logo rather than the DSL one, which wasn’t ideal.  Apparently there’s a ‘meta’ tag that you need to add to the site header in order for Facebook to pick up the correct logo, as discussed here: https://stackoverflow.com/questions/7836753/how-to-customize-the-icon-displayed-on-facebook-when-posting-a-url-onto-wall

I also created a new user for the Ayr place-names project and dealt with a couple of minor issues with the CMS that Simon Taylor had encountered.  I also investigated a certificate error with the ohos.ac.uk website and responded to a query about QR codes from fellow developer David Wilson.  Also, Craig Lamont in Scottish Literature got in touch about a spreadsheet listing Burns manuscripts that he’s been working on, with a view to turning it into a searchable online resource, and I gave him some feedback about the structure of the spreadsheet.

Finally, I did a bit of work for the Historical Thesaurus, working on a further script to match up HT and OED categories based on suggestions by researcher Beth Beattie.  I found a script I’d produced in 2018 that ran pattern matching on headings and I adapted this to only look at subcats within 02.02 and 02.03, picking out all unmatched OED subcats from these (there are 627) and then finding all unmatched HT categories where our ‘t’ numbers match the OED path.  Previously the script used the HT oedmaincat column to link up OED and HT, but this no longer matches (e.g. HT ‘smarten up’ has ‘t’ nums 02.02.16.02, which matches OED 02.02.16.02 ‘to smarten up’, whereas the HT ‘oedmaincat’ is ’02.04.05.02’).

The script lists the various pattern matches at the top of the page and the output is displayed in a table that can be copied and pasted into Excel.  Of the 627 OED subcats there are 528 that match an HT category.  However, some of them potentially match multiple HT categories.  These appear in red, while one-to-one matches appear in green.  Some of the multiple matches are due to Levenshtein matches (e.g. ‘sadism’ and ‘sadist’) but most are due to there being multiple subcats at different levels with the exact same heading.  These can be manually tweaked in Excel and then I can run the updated spreadsheet through a script to insert the connections.  We also had an HT team meeting this week, which I attended.
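
The Levenshtein side of the matching is the only mildly fiddly part.  The TypeScript sketch below is purely illustrative (the actual script uses several pattern-matching rules and the distance threshold here is an assumption), but it shows how near-matches like ‘sadism’/‘sadist’ get caught and how multiple candidates end up flagged for manual checking:

```typescript
// Illustrative sketch: classify an OED subcat by how many unmatched HT categories its heading matches.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

interface Category { path: string; heading: string; }

// 'green' = exactly one candidate, 'red' = several candidates needing manual tweaking in Excel.
function matchStatus(oed: Category, htCandidates: Category[]): "green" | "red" | "none" {
  const matches = htCandidates.filter(ht => {
    const a = ht.heading.toLowerCase();
    const b = oed.heading.toLowerCase();
    return a === b || levenshtein(a, b) <= 2; // distance threshold is an assumption
  });
  return matches.length === 1 ? "green" : matches.length > 1 ? "red" : "none";
}
```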