Week Beginning 29th January 2024

I spent a lot of time this week working on the Books and Borrowing project, making final preparations for the launch of the full website.  By the end of the week the website was mostly publicly available (see https://borrowing.stir.ac.uk) but it wasn’t as smooth a process as I was hoping for and there are still things I need to finish next week.

My first task of the week was to write a script that identified borrowers who have at least one active borrowing record, but none of whose records appear on active pages in active registers.  This should then bring back a list of borrowers who are only associated with inactive registers.  It took quite some time to get my head around this after several earlier attempts didn’t do exactly what was requested, and my final script identified 128 borrowers who were then deactivated, all from St Andrews.
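Below is a minimal sketch of the sort of query involved, with hypothetical table and column names (the real schema differs):

$sql = "SELECT DISTINCT br.borrowerid
        FROM borrowings br
        WHERE br.isactive = 1
        AND br.borrowerid NOT IN (
            SELECT br2.borrowerid
            FROM borrowings br2
            JOIN pages p ON p.pageid = br2.pageid AND p.isactive = 1
            JOIN registers r ON r.registerid = p.registerid AND r.isactive = 1
            WHERE br2.isactive = 1
        )";
// The outer clause finds borrowers with at least one active borrowing
// record; the subquery removes anyone who has an active record on an
// active page in an active register, leaving only borrowers whose
// records sit entirely in inactive contexts.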

I then moved on to an issue that had been noticed with a search for author names.  A search for ‘Keats’ was bringing back matches for ‘Keating’, which was clearly not very helpful.  The cause of this was Solr trying to be too helpful.  The author name fields were stored as ‘text_en’ and this field type has stemming applied to it, whereby stems of words are identified (e.g. ‘Keat’) and a search for a stem plus a known suffix (e.g. ‘s’, ‘ing’) will bring back other forms of the same stem.  For names this is hopeless as ‘Keating’ is in no way connected to ‘Keats’.

It turned out that this issue was affecting many other fields as well, such as book titles.  A search for ‘excellencies’ was finding book titles containing the forms ‘excellency’ and also ‘excellence’, which again was pretty unhelpful.  I did some investigation into stemming and whether a Solr query could be set to ignore it, but this did not seem to be possible.  For a while I thought I’d have to change all of the fields to strings, which would have been awful as strings in Solr are case sensitive and do not get split into tokens, meaning wildcards would need to be used and the search scripts I’d created would need to be rewritten.

Thankfully I discovered that if I stored the text in the field type ‘text_general’ then stemming would be ignored but the text would still be split into tokens.  I created a test Solr index on my laptop with all of the ‘text_en’ fields set to ‘text_general’ and searching this index for author surname ‘Keats’ only brought back ‘Keats’, while a book title search for ‘excellencies’ only brought back ‘excellencies’.  This is exactly what we wanted and with the change in place I was able to fully regenerate the cached data and the JSON files for import into Solr (a process that takes a couple of hours to complete) and ask for the online Solr index to be updated.
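For illustration, the change amounts to swapping the field type in the Solr schema.  A minimal sketch, with a hypothetical field name and Solr’s standard bundled field types:

<!-- Before: the 'text_en' analyzer chain includes a stemming filter -->
<field name="author_surname" type="text_en" indexed="true" stored="true"/>
<!-- After: 'text_general' still tokenises and lower-cases, but does not stem -->
<field name="author_surname" type="text_general" indexed="true" stored="true"/>

A change of field type requires a full reindex, hence regenerating the cached data and JSON files and asking for the online index to be updated.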

I also updated the help text on the website, adding in some new tooltips about editors and translators to the advanced search form.  With no other issues reported by the team I then began the process of making the site live, including adding a ‘quick search’ bar to the header of all pages, adding in menu items to access the data and adding the ‘on this day’ feature to the homepage.

However, I noticed one fairly sizeable issue, unfortunately.  The stats on the ‘facts’ page do not correspond to the results you get when you click through to the search results.  I therefore removed the top-level ‘Facts & figures’ menu item until I can figure out what’s going on here.  I first noticed an issue with the top ten genre lists.  The number of borrowings for the most popular genre (history) was crazy.  Overall borrowings of ‘history’ were listed as ‘104849’, which is almost as many as the total number of borrowing records we have in the system.  Also the link to view borrowings in the search was limiting the search to Haddington, which is clearly wrong.  Other numbers in the page just weren’t matching up to the search results either.  For example, for Chambers the most prolific borrower is Mrs Thomas Hutchings with 274 listed borrowings, but following the link to the search results gives 282.

It took several hours of going through the code to ascertain what was going on with the top ten genre issue and it was all down to a missing equals sign in an ‘if’ statement.  Where there should have been two (==) there was only one, and this was stopping the following genre code from processing successfully.
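A minimal PHP illustration of this class of bug (the variable names are hypothetical, not the actual project code):

// Buggy: '=' assigns rather than compares, so the condition is evaluated
// on the assigned value and the genre data gets clobbered.
if ($row['genreid'] = $currentGenreId) { $genreTotal++; }
// Intended: '==' compares, leaving $row untouched.
if ($row['genreid'] == $currentGenreId) { $genreTotal++; }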

There is unfortunately still the issue of the figures on the facts pages not exactly matching up with the number of results returned by the search.  I had noticed this before but I’d hoped it was caused by the database still being updated by the team and being slightly out of sync with the Solr index.  Alas, it’s looking like this is not the case.  There must instead be discrepancies between the queries used to generate the Solr index and those used to generate the facts data.  I fear that it might be the case that in the chain of related data some checks for ‘isactive’ have been omitted.  Each register, page, borrowing record, borrower, author, and book at each level has its own ‘isactive’ flag.  If (for example) a book item is set to inactive but a query fails to check the book item ‘isactive’ flag then any associated borrowing records will still be returned (unless they have each been set to inactive).  I’m going to have to check every query to ensure the flags are always present.  And it’s even more complicated than that because queries don’t necessarily always include the same data types.  E.g. a borrower is related to a book via a borrowing record and if you’re only interested in borrowers and books the query doesn’t need to include the associated page or register.  But of course if the page or register where the borrowing record is located is inactive then this does become important.  I might actually overhaul the ‘facts’ so they are generated directly from the Solr index.  This would mean things should remain consistent, even when updates are made to the data in the CMS (these would not be reflected until the Solr index is regenerated).  Something I’ll need to work on next week.
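To make the pitfall concrete, here is a sketch of a borrower-to-book query (hypothetical schema again).  A version that only joins borrowings, borrowers and books looks complete, but without the page and register joins shown below, records on inactive pages or in inactive registers leak through:

$sql = "SELECT bo.borrowerid, bk.bookid
        FROM borrowings br
        JOIN borrowers bo ON bo.borrowerid = br.borrowerid AND bo.isactive = 1
        JOIN books bk ON bk.bookid = br.bookid AND bk.isactive = 1
        JOIN pages p ON p.pageid = br.pageid AND p.isactive = 1
        JOIN registers r ON r.registerid = p.registerid AND r.isactive = 1
        WHERE br.isactive = 1";
// Dropping the pages/registers joins is easy to do when you only care
// about borrowers and books, and is exactly the sort of omission that
// makes the facts figures drift from the Solr-backed search results.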

Also this week I had a Zoom call with Jennifer Smith and Mary Robinson to discuss expanding the Speak For Yersel project to other areas, namely Wales, Ireland and Northern Ireland.  The data for these areas is now ready to use and we agreed that I would start working with it next week.

I also fixed a couple of issues with the DSL website.  The first was an easy one – the team wanted the boxes on the homepage to be updated to include a ‘Donate’ box.  It didn’t take long to get this done.  The second was an issue with Boolean searches on our test site.  Performing a fulltext search for ‘letters and dunb’ on the live site was returning 192 results while the same search on our test site was returning 54,466 results (limited to 500).

It turned out that the search string was being converted to lower case before it was being processed by the API.  I must have added this in to ignore case in the headword search, but an unintended consequence was that the Booleans were also converted to lower case and were therefore not getting picked up as Booleans.  I updated the API so that the search string is not converted to lower case, but where a headword search is to be performed the string is converted to lower case after the Boolean logic is executed.  With the update in place the searches started working properly again.
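A minimal sketch of the shape of the fix (the function and variable names here are hypothetical, not the actual API code):

// Before: lower-casing the whole string turned 'AND' into 'and', so the
// Boolean parser no longer recognised it as an operator.
// After: parse the Boolean structure first, then lower-case only the
// individual terms when a headword search is being performed.
$parsed = parse_boolean_query($searchString);   // operators stay upper case
if ($searchType === 'headword') {
    $parsed['terms'] = array_map('strtolower', $parsed['terms']);
}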

I also made a minor tweak to the advanced search on both live and new sites so that query strings with quotation marks in them no longer lose those quotation marks when returning to the advanced search form.  This was an issue that was identified at our in-person meeting a couple of weeks ago.

I found a bit of time to continue working on the new map for the Place-names of Iona project this week too, during which I completed work on the ‘Table view’ feature.  I updated the table’s popup so that it never becomes taller than the visible map.  If the table is longer than this then the popup now features a scrollbar, which works a lot better than the previous method, whereby pressing on the browser’s scrollbar closed the popup.

I have also now completed the option to press on a place-name in the table to view the place-name on the map.  It took a lot of experimentation to figure out how to get the relevant marker popup to open and also populate it with the correct data.  Things went a little crazy during testing when opening a popup was somehow getting chained to any subsequent popup openings, resulting in the map getting stuck in an endless loop of opening and closing different popups and attempting to autoscroll between them all.  Thankfully I figured out what was causing this and sorted it.  Now when you press on a place-name in the table view any open popups are closed, the map navigates to the appropriate location and the relevant popup opens.  Originally I had the full record popup opening as well as the smaller marker popup, but I began to find this quite annoying as when I pressed on a place-name in the table what I generally wanted to see was where the place-name was on the map, and the full record obscured this.  Instead I’ve set it to open the marker popup, which provides the user with an option to open the full record without obscuring things.  I’ve also now made it possible to cite / bookmark / share the table in a URL.

I then moved onto developing the ‘Download CSV’ option, which took quite some time.  The link at the bottom of the left-hand menu updates every time the map data changes and pressing on the button now downloads the map data as a CSV.
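The CSV generation itself is simple once the current map data is to hand.  A minimal server-side sketch in PHP with hypothetical column names (the real feature may well assemble the file in the browser instead):

header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="iona-placenames.csv"');
$out = fopen('php://output', 'w');
fputcsv($out, ['Place-name', 'Classification code', 'Altitude', 'Earliest date']);
foreach ($mapData as $row) {
    fputcsv($out, [$row['name'], $row['code'], $row['altitude'], $row['date']]);
}
fclose($out);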

Also this week I had discussions with the Anglo-Norman Dictionary team about a new proposal they are putting together and I updated the Fife Place-names website to fix a couple of issues that had been introduced when the site was migrated to a new server.  I also continued to migrate the collection of Scots poems discussed in previous weeks from ancient HTML to XML, although I didn’t have much time available to spend on this.

Week Beginning 22nd January 2024

I’d hoped that we’d be able to go live with the full Books and Borrowing site this week, but unfortunately there are still some final tweaks to be made so we’re not live yet.  I did manage to get everything ready to go, but then some final checking by the team uncovered some additional issues that needed sorting.

Migrating all of the pages in the development site to their final URLs proved to be quite challenging as it was not simply a matter of copying the pages.  Instead I needed to amalgamate three different sets of scripts and stylesheets (the WordPress site, the dev site and the Chambers map), which involved some rewriting and quite a lot of checking.  However, by the end of Monday I had completed the process and I sent the URLs to the team for a final round of checking.

The team uncovered a few issues that I managed to address, including some problems with the encoding of ampersands and books and borrowers without associated dates not getting returned in the site-wide browse options.  I needed to regenerate the cache files after sorting some of these, which also took a bit of time.

I also realised that the default view of the ‘Browse books’ page was taking far too long to load, with load times of more than 20 seconds.  I therefore decided to create a cache for the standard view of books, with separate cache files for books by first letter of their title and their author.  These caches would then be used whenever such lists are requested rather than querying everything each time.  When filters are applied (e.g. date ranges, genres) the cache would be ignored, but such filtered views generally bring back fewer books and are returned more quickly anyway.  It took a while to write a script to generate the cache, to update the API to use the cache and then add the cache generation script to the documentation, but once all was in place the page was much quicker to load.  It does still take a few seconds to load in as there are almost 3000 books beginning with the letter ‘A’ in our system and even processing a static file containing this much data takes time.  But it’s a marked improvement.
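The lookup logic in the API is simple enough.  A minimal sketch (paths and function names hypothetical):

$cacheFile = 'cache/books_' . $orderBy . '_' . strtolower($letter) . '.json';
if (empty($filters) && file_exists($cacheFile)) {
    // Unfiltered default view: serve the pre-generated list.
    $books = json_decode(file_get_contents($cacheFile), true);
} else {
    // Filtered views bypass the cache and query the database directly.
    $books = query_books($letter, $orderBy, $filters);
}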

On Friday afternoon the project PI Katie Halsey got back to me with a list of further updates that I need to make before the launch, one of which will require the Solr index structure to be updated and all of the Solr data to be regenerated, which I’ll tackle next week.  Hopefully after that we’ll finally be able to launch the site.

Also this week I fixed a small bug that had been introduced to the DSL sites when we migrated to a new server before Christmas.  This was causing an error when the ‘clear results’ option was selected when viewing the entry page and I’ve sorted it now.  I also investigated an issue when including URLs in the description of place-names on the Ayrshire place-names site.  It turned out that relative URLs were getting blocked as a security risk, but a switch to absolute URLs sorted this.

I also made some fixes to some of the data stored in the Speech Star databases that had been uploaded with certain features missing and provided some CV-like information about the projects I’ve worked on for Clara Cohen in English Language to include in a proposal I’m involved with.

I then investigated a couple of issues with the Anglo-Norman Dictionary.  The first was that the Editor Geert needed a new ‘usage label’ added to the XML for the site, which would then need to be searchable in the front-end.  After spending some time refamiliarising myself with how things worked I realised it should be very easy to add new labels.  All I needed to do was to add the label to the database and then it was automatically picked up by the DTD and given as an option in Oxygen.  Then when an entry is uploaded to our dictionary management system it will be processed and should (hopefully) automatically appear in the advanced search label options (once it has at least one active association to an entry).  Geert has begun using the new label and we’ll see if the process works as I think it will.

The second issue was one that Geert had noticed in the Dictionary Management System when manually editing the citations.  The ‘Edit citations’ feature allows you to select a source and then view every time this source is cited in all of the entries.  The list then lets you manually edit each of these.  The issue was cropping up when an entry featured more than one citation for a source and more than one of these needed to be edited.  In such cases some of the edits were being lost.  The problem was that when multiple citations were being edited at the same time the code was re-extracting the original XML from the database for each one, so each edit overwrote the last and only the final change was being stored.  I’ve sorted this now.
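A minimal sketch of the bug and the fix (function names hypothetical):

// Buggy: each edit starts again from the stored XML, discarding the
// edits already applied in this batch, so only the last one survives.
foreach ($edits as $edit) {
    $xml = fetch_entry_xml($entryId);
    save_entry_xml($entryId, apply_citation_edit($xml, $edit));
}
// Fixed: load the XML once, apply every edit to it, then save once.
$xml = fetch_entry_xml($entryId);
foreach ($edits as $edit) {
    $xml = apply_citation_edit($xml, $edit);
}
save_entry_xml($entryId, $xml);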

I also spent some time continuing to develop the new place-names map for the Iona project.  I updated the elements glossary to add a new hidden column to the data that stores the plain alphanumeric value of the name (i.e. all weird characters removed).  This is now used for ordering purposes, meaning any elements beginning with a character such as an asterisk or a bracket now appear in the correct place.

I also began working on the ‘Table view’ and for now I’ve set this to appear in a pop-up, as with the place-name record.  Pressing on the ‘Table view’ button brings up the table, which contains all of the data currently displayed on the map.  You can reorder the table by any of the headings by pressing on them (press a second time to reverse the order).  Below is a screenshot demonstrating how this currently looks:

You can also press on a place-name to close the table and open the place-name’s full record.  The map also pans and zooms to the relevant position.  What I haven’t yet managed is to get the marker pop-up to also appear, so it’s currently still rather difficult to tell which marker is the one you’re interested in.  I’m working on this, but it’s surprisingly difficult to achieve due to the Leaflet mapping library not allowing you to assign IDs to markers.

Using the pop-up for the table is good in some ways as it keeps everything within the map interface, but it does have some issues.  I’ve had to make the pop-up wider to accommodate the table, so it’s not going to work great on mobile phones.  Also when the table is very long and you try to use the browser’s scrollbar the pop-up closes, which is pretty annoying.  I think I might have found a solution to this (adding a scrollbar to the pop-up content itself), but I haven’t had time to implement it yet.  An alternative may be to just have the table opening in a new page, but it seems a shame to not have everything contained in the map.

Finally this week I spent some time working through the poems on the old ‘Anthology of 16th and Early 17th Century Scots Poetry’ site.  I created a new interface for this last week (which is not yet live) and decided it would make sense to migrate the poems from ancient HTML to TEI XML.  This week I began this process, manually converting the first 13 poems (out of about 130) to XML.  With these in place I then worked on an XSLT stylesheet that would transform this XML into the required HTML.  It was here that I ran into an issue.  I absolutely hate working with XSLT – it’s really old fashioned and brittle and cumbersome to use.  Things I can literally achieve in seconds for HTML using jQuery can take half a day of struggling with XSLT.  This time I couldn’t even get the XSLT script to output anything.  I was using similar code to that which I’d created for the DSL and AND, but nothing was happening.  I spent hours trying to get the XSLT to match any element in the XML but nothing I tried worked.  Eventually I asked Luca, who has more experience with XML than I do, for advice and thankfully he managed to identify the issue.  I hadn’t included a namespace for TEI in the XSLT file.  I needed to include the following reference:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">

And then each time I wanted to match an XML element such as <text> I couldn’t just use ‘text’ but instead needed to add a tei prefix ‘tei:text’.  Without this the XSLT wouldn’t do anything.  Yep, XSLT is brittle and cumbersome and an absolute pain to use.  But with Luca’s help I managed to get it working and other than transcribing the remaining texts (a big job I’ll tackle bit by bit over many weeks) all is pretty much in place.
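For anyone hitting the same wall, here is a minimal template showing the prefix in use (the output wrapper is a hypothetical example, not my actual stylesheet):

<xsl:template match="tei:text">
  <div class="poem">
    <xsl:apply-templates/>
  </div>
</xsl:template>

Without the ‘tei:’ prefix the match pattern silently matches nothing in a default-namespaced TEI document, which is why the stylesheet appeared to do nothing at all.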

 

Week Beginning 8th January 2024

This was my first week back after the Christmas holidays and after catching up with emails I spent the best part of two days fixing the content management system of one of the resources that had been migrated towards the end of last year.  The Saints Places resource (https://saintsplaces.gla.ac.uk/) is not one I created but I’ve taken on responsibility for it due to my involvement with other place-names resources.  The front-end was migrated by Luca and was working perfectly, but he hadn’t touched the CMS, which is understandable given that the project launched more than ten years ago.  However, I was contacted during the holidays by one of the project team who said that the resource is still regularly updated and therefore I needed to get the CMS up and running again.  This required updates to database query calls and session management and it took quite some time to update and test everything.  I also lost an hour or so with a script that was failing to initiate a session, even though the session start code looked identical to other scripts that worked.  It turned out that this was due to the character encoding of the script, which had been set to UTF-8 BOM, meaning hidden characters were being outputted to the browser by PHP before the session was instantiated, which then made the session fail.  Thankfully once I realised this it was straightforward to convert the script from UTF-8 BOM to regular UTF-8, which solved the problem.
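The underlying cause is that session_start() needs to send a cookie header, and the three invisible BOM bytes (EF BB BF) at the top of the file count as output sent before any code runs, after which the header can no longer be sent.  A quick sketch of how to check a file for the problem:

<?php
// Report whether a script starts with a UTF-8 byte order mark.
$bytes = substr(file_get_contents('suspect-script.php'), 0, 3);
if ($bytes === "\xEF\xBB\xBF") {
    echo "BOM present - re-save as UTF-8 without BOM\n";
}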

With this unexpected task out of the way I then returned to my work on the new map interface for the Place-names of Iona project, working through the ‘to do’ list I’d created after our last project meeting just before Christmas.  I updated the map legend filter list to add in a ‘select all’ option.  This took some time to implement but I think it will be really useful.  You can now deselect the ‘select all’ to be left with an empty map, allowing you to start adding in the data you’re interested in rather than having to manually remove all of the uninteresting categories.  You can also reselect ‘select all’ to add everything back in again.

I did a bit of work on the altitude search, making it possible to search for an altitude of zero (either on its own or with a range starting at zero such as ‘0-10’).  This was not previously working as zero was being treated as empty, meaning the search didn’t run.  I’ve also fixed an issue with the display of place-names with a zero altitude – previously these displayed an altitude of ‘nullm’ but they now display ‘0m’.  I also updated the altitude filter groups to make them more fine-grained and updated the colours to make them more varied rather than the shades of green we previously had.  Now 0-24m is a sandy yellow, 25-49m is light green, 50-74m is dark green, 75-99m is brown and anything over 99m is dark grey (currently no matching data).
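The classic PHP pitfall here, as a sketch (not the actual project code):

// empty() treats 0, '0', '' and null all as 'empty', so a legitimate
// zero altitude looked as if no value had been entered.
if (!empty($altitude)) { run_altitude_search($altitude); }
// A stricter check lets zero through while still catching blank input:
if ($altitude !== '' && $altitude !== null) { run_altitude_search($altitude); }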

I also made the satellite view the default map tileset, with the previous default moved to third in the list and labelled ‘Relief’.  This proved to be trickier to update than I thought it would be (e.g. pressing the ‘reset map’ button was still loading the old default even though it shouldn’t have) but I managed to get it sorted.  I also updated the map popups so they have a white background and a blue header to match the look of the full record and removed all references to Landranger maps in the popup as these were not relevant.  Below is a screenshot showing these changes:

I then moved onto the development of the elements glossary, which I completed this week.  This can now be accessed from the ‘Element glossary’ menu item and opens in a pop-up the same as the advanced search and the record.  By default elements across all languages are loaded but you can select a specific language from the drop-down list.  It’s also possible to cite or bookmark a specific view of the glossary, which will load the map with the glossary open at the required place.

I’ve tried to make better use of space than similar pages on the old place-names sites by using three columns.  The place-name elements are links and pressing on one performs a search for the element in question.  I also updated the full record popup to link the elements listed in it to the search results.  I had intended to link to the glossary rather than the search results, which is what happens in the other place-names sites, but I thought it would be more useful and less confusing to link directly to the search results instead.  Below is a screenshot showing the glossary open and displaying elements in Scottish Standard English:

I also think I’ve sorted out the issue with in-record links not working as they should in Chrome and other issues involving bar characters.  I’ve done quite a bit of testing with Chrome and all seems fine to me, but I’ll need to wait and see if other members of the team encounter any issues.  I also added in the ‘translation’ field to the popup and full record, although there are only a few records that currently have this field populated, relabelled the historical OS maps and fixed a bug in the CMS that was resulting in multiple ampersands being generated when an ampersand was used in certain fields.

My final update for the project this week was to change the historical forms in the full record to hide the source information by default.  You now need to press a ‘show sources’ checkbox above the historical forms to turn these on.  I think having the sources turned off really helps to make the historical forms easier to understand.

I also spent a bit of time this week on the Books and Borrowing project, including participating in a project team Zoom call on Monday.  I had thought that we’d be ready for a final cache generation and the launch of the full website this week, but the team are still making final tweaks to the data and this had therefore been pushed back to Wednesday next week.  But this week I updated the ‘genre through time’ visualisation as it turned out that the query that returned the number of borrowing records per genre per year wasn’t quite right and this was giving somewhat inflated figures, which I managed to resolve.  I also created records for the first volume of the Leighton Library Minute Books.  There will be three such volumes in total, all of which will feature digitised images only (no transcriptions).  I processed the images and generated page records for the first volume and will tackle the other two once the images are ready.

Also this week I made a few visual tweaks to the Erskine project website (https://erskine.glasgow.ac.uk/) and I fixed a misplaced map marker in the Place-names of Berwickshire resource (https://berwickshire-placenames.glasgow.ac.uk/).  For some reason the longitude was incorrect for the place-name, even though the latitude was fine, which resulted in the marker displaying in Wales.  I also fixed a couple of issues with the Old English Thesaurus for Jane Roberts and responded to a query from Jennifer Smith regarding the Speak For Yersel resource.

Finally, I investigated an issue with the Anglo-Norman Dictionary.  An entry was displaying what appeared to be an erroneous first date so I investigated what was going on.  The earliest date for the entry was being generated from this attestation:

<attestation id="C-e055cdb1">
  <dateInfo>
    <text_date post="1390" pre="1314" cert="">1390-1412</text_date>
    <ms_date post="1400" pre="1449" cert="">s.xv<sup>1</sup></ms_date>
  </dateInfo>
  <quotation>luy donantz aussi congié et eleccion d’estudier en divinitee ou en loy canoun a son plesir, et ce le plus favorablement a cause de nous</quotation>
  <reference><source siglum="Lett_and_Pet" target=""><loc>412.19</loc></source></reference>
</attestation>

Specifically the text date:

<text_date post="1390" pre="1314" cert="">1390-1412</text_date>

This particular attestation was being picked as the earliest due to a typo in the ‘pre’ date which is 1314 when it should be 1412.  Where there is a range of dates the code generates a single year at the midpoint that is used as a hidden first date for ordering purposes (this was agreed upon back when we were first adding in first dates of attestation).  The code to do this subtracts the ‘post’ date from the ‘pre’ date, divides this in two and then adds it to the ‘post’ date, which finds the middle point.  With the typo the code therefore subtracts 1390 from 1314, giving -76.  This is divided in two giving -38.  This is then added onto the ‘post’ date of 1390, which gives 1352.  1352 is the earliest date for any of the entry’s attestations and therefore the earliest display date is set to ‘1390-1412’.  Fixing the typo in the XML and processing the file would therefore rectify the issue.
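As a sketch of the calculation in PHP (not the actual project code):

// Midpoint year used as the hidden ordering date for a date range.
$midpoint = $post + (int)(($pre - $post) / 2);
// Correct data:  1390 + (1412 - 1390) / 2 = 1401
// With the typo: 1390 + (1314 - 1390) / 2 = 1390 + (-38) = 1352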

Week Beginning 18th December 2023

This was the last working week before Christmas, and it was a four-day week due to Friday being given in lieu of Christmas Eve, which is on Sunday this year.  I’ll be off for the next two weeks, which I’m really looking forward to.

The Books and Borrowing project officially comes to an end on the 31st of December, although we’re not going to launch the front-end until sometime in January.  I still had rather a lot of things to do for the project and therefore spent the entirety of my four days this week working on this project.  Of course, the other team members were also frantically trying to get things finished off, which often led to them spotting something they needed me to sort out, so I found myself even busier than I was expecting.  However, by Thursday I had managed to complete all of the tasks I’d hoped to finish, plus many more that were sent my way as the week progressed.

At the end of last week I’d begun updating the site text that the project PI Katie had supplied me with, and I completed this task, finally banishing all of the placeholder text.  This also involved much discussion about the genre visualisations and what they actually represent, which we thankfully reached agreement on.  I also added in some further images of a register at Leighton library and wrote a script to batch update changes to the publication places, dates of publication and formats of many book edition records.  One of the researchers also spotted that the ‘next’ and ‘previous’ links for the two Selkirk registers were not working, due to an earlier amalgamation of page records into ‘double spread’ records.  I therefore wrote another script to sort these out.

I then added new ‘Top ten book work’ lists to the site-wide ‘Facts’ page (overall and by the gender of borrowers).  This required me to update the script that generated the cache that I developed last week, to rerun the script to generate fresh data, to update the API to ensure that works were incorporated into the output and to update the front-end to add in the data.  Hopefully the information will be of interest to people.

I then overhauled the highlighting of search terms in the search results.  This was previously only working with the quick search, and only when no wildcards were used in the search.  Instead I used a nice JavaScript library called mark.js (https://markjs.io/) that I’d previously used for the DSL website to add in highlighting on the client-side.  Now the values in any search fields that are searched for will be highlighted in the record, including when wildcards are used.  I also updated the highlight style to make it a bit less harsh.

It should be noted that highlighting is still a bit of a blunt tool – any search terms will be highlighted throughout the entire record where the term is found.  So if you search for the occupation ‘farmer’ then wherever ‘farmer’ is found in the record it will be highlighted, not just in the normalised occupation list.  Similarly, if you search for ‘born’ then the ‘born’ text in the author information will be highlighted.  It’s not feasible to make the highlighting more nuanced in the time we have left, but despite this I think that on the whole the highlighting is useful.

I reckoned that the highlighting could end up being a bit distracting so I added in an option to turn results highlighting on or off.  I added a button to toggle this to the search results page, as part of the set of buttons that includes the ‘Cite’ and ‘Download’ options.  The user’s choice is remembered by the site, so if you turn highlighting off and then navigate through the pages of results or perform a filter the highlights stay off.  They will stay off until you turn them on again, even if you return to the site after closing your browser.

One of the researchers noticed that an unnecessary near-duplicate genre had somehow been introduced into the system (‘Fine Art’ instead of ‘Fine Arts’) so I removed it and reassigned any records that were assigned to the erroneous version.  The PI Katie also spotted some odd behaviour with the search form boxes.  When using the browser’s ‘back’ button search data was being added to the wrong search boxes.  This took quite some time to investigate and I couldn’t replicate the issue in Firefox (the browser I use by default), but when using a Chrome-based browser (MS Edge) I experienced the issue.  It turns out it’s nothing to do with my code but a bug in Chrome (see https://github.com/vuejs/vue/issues/11165).  The fix mentioned on this page was to add ‘autocomplete=”off”’ to the form and this seems to have sorted the problem.  It’s crazy that this issue with Chrome hasn’t been fixed as the posts on the page identifying the issue started in 2020.

Katie also spotted another issue when using Chrome.  Applying multiple filters to the search results wasn’t working in Chrome, even though it worked fine in Firefox.  This time it was caused by Chrome encoding the bar character to %7C while Firefox keeps it as ‘|’.  My filter script was splitting up filters on the actual bar character and as this wasn’t present in Chrome multiple filters were not working (even though they were working fine in Firefox).  Thankfully once identified this was relatively easy to fix.
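The fix was small once identified.  A sketch, assuming the splitting happens server-side in PHP (variable names hypothetical):

// Chrome sends %7C where Firefox sends a literal '|', so decode the
// filter string before splitting it on the bar character.
$filters = explode('|', urldecode($filterString));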

I also managed to implement a ‘compact’ view of borrowing records this week, something that had been on my ‘to do’ list for a while.  Borrowing records can be extremely verbose and rather overwhelming so we decided to give the option to view compact versions of the records that contain a narrower set of fields.  I added a compact / full record view switcher to the bar of options in the top right of the search results and library register page, beside the ‘Cite’ option.  As with the highlighting feature I previously discussed, the choice is remembered in your browser, even if you return to the site in a later session (so long as you’re using the same device and browser, of course).

For the compact view I decided to retain the links to the library, register and page as I figured it would be useful to be able to see these.  Also included are the borrowed and returned dates, the borrowers (names only), the title of the Book Work (or Works) if the record has such an association and the title of the Holding if not, any associated authors and genres, plus a list of the volumes borrowed (if applicable).  The following screenshot shows what the compact view looks like:

My final tasks of the week were to add in a cookie banner for the site and install Google Analytics.  In the New Year I’ll need to regenerate the Solr index and then integrate the development site with the live site.  This will include making updates to paths throughout the code, ensuring the existing Chambers Maps continues to function, adding links to the pages of the development site to the site menu and adding the quick search option to the site header.  It will be great once the site is fully accessible.

Also this week I created a new Google Analytics property for a site the DSL launched a month or two ago and spoke to Geert, the editor of the AND, about an issue he’d spotted with entry dates (that I’ll investigate after Christmas).  I finished off my work for the year by removing the Twitter widget from every site I’m responsible for.  Twitter blocked access to the widget that allows a feed to be embedded in a website a few months ago and it looks like this is a permanent change.  It meant instead of a nice Twitter feed an empty feed and a ‘nothing to see here’ message was displayed on all of my sites, which was obviously no good.  It feels quite liberating to drop Twitter (or X as it is currently called).

That’s all from me for this year.  If anyone is reading this I wish you all the best for Christmas and 2024!

Week Beginning 11th December 2023

I devoted about three days of this week to developing the new place-names map for the Iona project.  My major task was to make the resource ‘remember’ things, which has taken a long time to implement as basically everything I’d developed so far needed to be extended and reworked.  Now as you scroll around the map and zoom in and out the hash in the URL in the address bar updates.  Your selected ‘Display options’ also appear in the hash.  What this means is you can bookmark or share a specific view.  I can’t share the full URL of the map yet as it’s not publicly available, but for example the hash ‘#15.5/56.3490/-6.4676/code/tileNLS1/labelsOn’ provides a URL that loads the map zoomed in on the islands of Stac MhicMhurchaidh and Rèidh-Eilean, with the OS 1840-1880 base map selected, categorised by classification code and labels always on.  Any search or browse options you’ve entered or selected are also remembered in this way, for example the hash ‘#16/56.3305/-6.3967/language/tileSat/labelsOff/browse/nameCurrent|C’ gives a URL for a ‘browse’ showing current names beginning with ‘C’ on a satellite map categorised by element language focussed on the main settlement.

The same approach also works for the search facility, with search options separated by a double bar and the search term and its value separated by a single bar.  For example ‘#13.75/56.3287/-6.4139/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G’ gives a URL for an advanced search showing results where the element language is Gaelic and the element in question is ‘dùn’, categorised by earliest recorded date.  With an advanced search you can now press on the ‘Refine your advanced search’ button and any advanced search options previously entered will appear in the search boxes.

You can also now bookmark and share a record when you open the ‘full record’, as the action of opening a record also adds information to the URL in the address bar.  So, for example ‘#14/56.3293/-6.4125/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G/record/4262’ is the hash of the record for ‘Dùn nam Manach’.  Note that this also preserves whatever search or browse led to the place-name being opened too.

I also made some updates to the ‘full record’, the most important being that the text appearing as links can now be clicked on to perform a search for that information.  So for example if you press on the classification code in the record this will close the record and perform a search for the code.  Place-name element links are not yet operational, though, as these will link to the element glossary, which I still need to create.  I have however created a new menu item in the left-hand menu for the glossary and have figured out how it will work.  I’m intending to make it a modal pop-up like the record and advanced search.  I did consider adding it into the left-hand menu like the ‘browse’ but there’s just too much information for it to work there.

I also added in a new ‘Cite this record’ tab to the ‘full record’ which will display citation information for the specific record, although I still need to add the citation content itself.  Also new is a bar of icons underneath the left-hand menu options.  This contains buttons for citing the map view, viewing the data displayed on the map as a table and downloading the map data as a CSV file, but none of these are operational yet.

On Thursday I had a meeting with the Iona team to discuss the map.  They are pretty pleased with how it’s turning out, but they did notice a few bugs and things they would like done differently.  I made a sizeable ‘to do’ list and I will tackle this in the new year.

I spent most of the remainder of the week working on the Books and Borrowing project.  I updated the languages assigned to a list of book editions that had been given to me in a spreadsheet and added a few extra pages and page images to one of the registers.   I then returned to my ‘to do’ list for the project and worked through some of the outstanding items.  I moved the treemaps on the library and site-wide ‘facts’ pages to separate tabs and I went through the code to ensure that the data for all visualisations only uses borrowing records set to ‘in stats’.  This wasn’t done before so many of the visualisations and data summaries will have changed slightly.  I also removed the non-male/female ‘top ten’ lists in the library facts page, as requested.

I then moved on to creating a cache for the facts page data, which took about a day to implement.  I firstly generated static data for each library and stored this as JSON in the database.  This is then used for the library facts page rather than processing the data each time.  However, the site-wide facts page lets the user select any combination of libraries (or select all libraries) and the ‘top ten’ lists therefore have to dynamically reflect the chosen libraries.  This meant updating the API to pull in the ‘facts’ JSON files for each selected library and then analyse them in order to generate new ‘top tens’ for the chosen libraries.  For example, working out the top ten genres for all selected libraries meant going through the individual top ten genre lists for each library, working out the total number of borrowings for each genre and then reordering things after this merging of data was complete.
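A sketch of the merging step (the data structure is hypothetical):

// Combine the cached per-library 'top ten genres' lists into a single
// ranking for whatever set of libraries the user has selected.
$totals = [];
foreach ($selectedLibraries as $libId) {
    $facts = json_decode(load_library_facts_cache($libId), true);
    foreach ($facts['topGenres'] as $genre) {
        $totals[$genre['name']] = ($totals[$genre['name']] ?? 0) + $genre['borrowings'];
    }
}
arsort($totals);                                // largest totals first
$topTen = array_slice($totals, 0, 10, true);    // keep genre names as keys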

Despite still requiring this processing the new method of using the cached data is considerably faster than querying and generating the data afresh each time the user requests it.  Previously displaying the site-wide ‘facts’ page for all libraries was taking up to a minute to complete whereas now it takes just a few seconds.  I also made a start on updating the site text that Katie had sent me earlier in the week.  A large number of tweaks and changes are required and this is likely to take quite a long time, but I hope to have it finished next week.

Towards the start of the week I also spent some time in discussions about what should become of the Pilot Scots Thesaurus website.  The project ended in 2015, the PI moved on from the University several years ago and the domain name will expire in April 2024.  We eventually decided that the site will be archived, with the data added to the Enlighten research data repository and the main pages of the site archived and made available via the University’s web archive partner.

Towards the end of the week I did some further work for the Anglo-Norman Dictionary, including replacing the semi-colon with a diamond in the entry summary box (e.g. see https://anglo-norman.net/entry/colur_1) and discussing whether labels should also appear in the box (we decided against it).  I also had a discussion with the editor Geert about adding new texts to the Textbase and thought a little about the implications of this, given that the texts are not marked up as TEI XML and the Textbase was developed around TEI XML texts.  I’ll probably do some further investigation in the new year once Geert sends me on some sample files to work with.

 

Week Beginning 4th December 2023

After spending much of my time over the past three weeks adding genre to the Books and Borrowing project I turned my attention to other projects for most of this week.  One of my main tasks was to go through the feedback from the Dictionaries of the Scots Language people regarding the new date and quotation searches I’d developed back in September.  There was quite a lot to go through, fixing bugs and updating the functionality and layout of the new features.  This included fixing a bug with the full text Boolean search, which was querying the headword field rather than the full text and changing the way quotation search ranking works.  Previously quotation search results were ranked by the percentage of matching quotes, and if this was the same then the entry with the largest number of quotes would appear higher.  Unfortunately this meant that entries with only one quote ended up ranked higher than entries with large numbers of quotes, not all of which contained the term.  I updated this so that the algorithm now counts the number of matching quotes and ranks primarily on this, only using the percentage of matching quotes when two entries have the same number of matching quotes.  So now a quotation search for ‘dreich’ ranks what are hopefully the most important entries first.
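A sketch of the revised ranking comparator (field names hypothetical):

// Rank primarily by the number of matching quotes; fall back to the
// percentage of an entry's quotes that match only to break ties.
usort($results, function ($a, $b) {
    if ($a['matchingQuotes'] !== $b['matchingQuotes']) {
        return $b['matchingQuotes'] - $a['matchingQuotes'];
    }
    return $b['matchPercentage'] <=> $a['matchPercentage'];
});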

I also updated the display of dates in quotations to make them bold and updated the CSV download option to limit the number of fields that get returned.  I also noticed that when a quotation search exceeded the maximum number of allowed results (e.g. ‘heid’) it was returning no results due to a bug in the code, which I fixed.  I also fixed a bug that was stopping wildcards in quick searches from working as intended and fixed an issue with the question mark wildcard in the advanced headword search.

I then made updates to the layout of the advanced search page, including adding placeholder ‘YYYY’ text to the year boxes, adding a warning about the date range when dates provided are beyond the scope of the dictionaries and overhauling the search help layout down the right of the search form.  The help text scroll down/up was always a bit clunky so I’ve replaced it with what I think is a neater version.  You can see this, and the year warning in the following screenshot:

I also tweaked the layout of the search results page, including updating the way the information about what was searched for is displayed, moving some text to a tooltip, moving the ‘hide snippets’ option to the top menu bar and ensuring the warning that is displayed when too many results are returned appears directly above the results.  You can see all of this in the following screenshot:

I then moved onto updates to the sparklines.  The team decided they wanted the gap length between attestations to be increased from 25 to 50 years.  This would mean individual narrow lines would then be grouped into thicker blocks.  They also wanted the SND sparkline to extend to 2005, whereas previously it was cut off at 2000 (with any attestations after this point given the year 2000 in the visualisation).  These updates required me to make changes to the scripts that generate the Solr data and to then regenerate the data and import it into Solr.  This took some time to develop and process, and currently the results are only running on my laptop as it’s likely the team will want further changes made to the data.  The following screenshot shows a sparkline when the gap length was set to 25 years:

And the following screenshot shows the same sparkline with the gap length set to 50 years:

I also updated the dates that are displayed in an entry beside the sparkline to include the full dates of attestation as found in the sparkline tooltip rather than just displaying the first and last dates of attestation.

I completed going through the feedback and making updates on Wednesday and now I need to wait and see whether further updates are required before we go live with the new date and quotation search facilities.

I spent the rest of the week working on various projects.  I made a small tweak to remove an erroneous category from the Old English Thesaurus and dealt with a few data issues for the Books and Borrowing project too, including generating spreadsheets of data for checking (e.g. list of all of the distinct borrower titles) and then making updates to the online database after these spreadsheets had been checked.  I also fixed a bug with the genre search, which was joining multiple genre selections with Boolean AND when it should have been joining them with Boolean OR.

I also returned to working for the Anglo-Norman Dictionary.  This included updating the XSLT so that legiturs in variant lists displayed properly (see ‘la noitement (l. l’anoitement)’ here: https://anglo-norman.net/entry/anoitement).  Whilst sorting this out I noticed that some entries would appear to have multiple ‘active’ records in the database – a situation that should not have happened.  After spotting this I did some frantic investigation to understand what was going on.  Thankfully it turned out that the issue has only affected 23 entries, with all but two of them having two active records.  I’m not sure what happened with ‘bland’ to result in 36 active records, and ‘anoitement’ with 9, but I figured out a way to resolve the issue and ensure it doesn’t happen again in future.  I updated the script that publishes holding area entries to ensure any existing ‘active’ records are removed when the new record is published.  Previously the script was only dealing with one ‘active’ entry (as that is all there should have been), which I think may have been how the issue cropped up.  In future the duplicate issue will rectify itself whenever one of the records with duplicate active records is edited – at the point of publication all existing ‘active’ records will be moved to the ‘history’ table.
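A sketch of the publication fix (SQL via PDO; the table and column names are hypothetical):

// Retire ALL currently-active records for the entry, however many have
// accumulated, before inserting the newly published version.
$stmt = $db->prepare("UPDATE entries SET status = 'history'
                      WHERE slug = :slug AND status = 'active'");
$stmt->execute(['slug' => $slug]);
insert_active_entry($db, $slug, $newXml);   // the single new active record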

Also for the AND this week I updated the DTD to ensure that superscript text is allowed in commentaries.  I also removed the embedded Twitter feed from the homepage as it looks like this facility has been permanently removed by Twitter / X.  I’ve also tweaked the logo on narrow screens so it doesn’t display so large, which should make the site better to use on mobile phones and I fixed an issue with the entry proofreader which was referencing an older version of jQuery that no longer existed.  I also fixed the dictionary’s ‘browse up’ facility, which had broken.

I also found some time to return to working on the new map interface for the Iona place-names project and have now added in the full record details.  When you press on a marker to open the popup there is now a ‘View full record’ button.  Pressing on this opens an overlay the same as the ‘Advanced search’ that contains all of the information about the record, in the same way as the record page on the other place-name resources.  This is divided into a tab for general information and another for historical forms as you can see from the following screenshot:

Finally this week I kept project teams updated on another server move that took place overnight on Thursday.  This resulted in downtime for all affected websites, but all was working again the next morning.  I needed to go through all of the websites to ensure they were working as intended after the move, and thankfully all was well.

Week Beginning 27th November 2023

I completed work on the integration of genre into the Books and Borrowing systems this week.  It took a considerable portion of the week to finalise the updates but it’s really great to have it done, as it’s the last major update to the project.

My first task was to add genre selection to the top-level ‘Browse Editions’ page, which I’m sure will be very useful.  As you can see in the following screenshot, genres now appear as checkboxes as with the search form, allowing users to select one or more they’re interested in.  This can be done in combination with publication date too.  The screenshot shows the book editions that are either ‘Fiction’ or ‘Travel’ that were published between 1625 and 1740.  The selection is remembered when the user changes to a different view (i.e. authors or ‘top 100’) and when they select a different letter from the tabs.

It proved to be pretty tricky and time-consuming to implement.  I realised that not only did the data that is displayed need to be updated to reflect the genre selection, but the counts in the letter tabs needed to be updated too.  This may not seem like a big thing, but the queries behind it took a great deal of thought.  I also realised whilst working on the book counts that the counts in the author tabs were wrong – they were only counting direct author associations at edition level rather than taking higher level associations from works into consideration.  Thankfully this was not affecting the actual data that was displayed, just the counts in the tabs.  I’ve sorted this too now, which also took some time.

With this in place I then added a similar option to the in-library ‘Book’ page.  This works in the same way as the top-level ‘Editions’ page, allowing you to select one or more genres to limit the list of books that are displayed, for example only books in the genres of ‘Belles Lettres’ and ‘Fiction’ at Chambers, ordered by title or the most popular ‘Travel’ books at Chambers.  This did unfortunately take some time to implement as Book Holdings are not exactly the same as Editions in terms of their structure and connections so even though I could use much of the same code that I’d written for Editions many changes needed to be made.

The new Solr core was also created and populated at Stirling this week, after which I was able to migrate my development code from my laptop to the project server, which meant I could share my work with others, which was good.

I then moved onto adding genre to the in-library ‘facts’ page and the top-level ‘facts’ page.  Below is a very long screenshot of the entire ‘facts’ page for Haddington library and I’ll discuss the new additions below:

The number of genres found at the library is now mentioned in the ‘Summary’ section and there is now a ‘Most popular genres’ section, which is split by gender as with the other lists.  I also added in pie charts showing the book genres represented at the library and the percentage of borrowings of each genre.  Unfortunately these can get a bit cluttered due to there being up to 20-odd genres present, so I’ve added in a legend showing which colour is which genre.  You can hover over a slice to view the genre name and its figures and you can click on a slice to perform a search for borrowing records featuring a book of the genre in the library.  Despite being a bit cluttered I think the pies can be useful, especially when comparing the two charts – for example at Haddington ‘Theology’ books make up more than 36% of the library but only 8% of the borrowings.

Due to the somewhat cluttered nature of the pie charts I also experimented with a treemap view of Genre.  I had stated we would include such a view in the requirements document, but at that time I had thought genre would be hierarchical, and a treemap would display the top-level genres and the division of lower level genres within these.  Whilst developing the genre features I realised that without this hierarchy the treemap would merely replicate the pie chart and wouldn’t be worth including.

However, when the pie charts turned out to be so cluttered I decided to experiment with treemaps as an alternative.  The results currently appear after the pie charts in the page.  I initially liked how they looked – the big blocks look vaguely ‘bookish’ and having the labels in the blocks makes it easier to see what’s what.  However, there are downsides.  Firstly, it can be rather difficult to tell which genre is the biggest, due to the blocks having different dimensions – does a tall, thin block have a larger area than a shorter, fatter one, for example?  It’s also much more difficult to compare two treemaps as the position of the genres changes depending on their relative size.  Thankfully the colour stays the same, but it takes longer than it should to ascertain where a genre has moved to in the other treemap and how its size compares.  I met with the team on Friday to discuss the new additions and we agreed that we could keep the treemaps, but that I’d add them to a separate tab, with only the pie charts visible by default.

I then added in the ‘borrowings over time by genre’ visualisation to the in-library and top level ‘facts’ pages.  As you can see from the above screenshot, these divide the borrowings in a stacked bar chart per year (or per month if a year is clicked on) into genre, much in the same way as the preceding ‘occupations’ chart.  Note however that the total numbers for each year are not the same as for the occupations through time visualisation, as books may have multiple genres and borrowers may have multiple occupations and the counts reflect the number of times a genre / occupation is associated with a borrowing record each year (or month if you drill down into a year).  We might need to explain this somewhere.

We met on Friday to discuss the outstanding tasks.  We’ll probably go live with the resource in January, but I will try to get as many of my outstanding tasks completed before Christmas as possible.

Also this week I fixed another couple of minor issues with the Dictionaries of the Scots Language.  The WordPress part of the site had defaulted to using the new, horrible blocks interface for widgets after a recent update, meaning the widgets I’d created for the site no longer worked.  Thankfully installing the ‘Classic Widgets’ plugin fixed the issue.  I also needed to tweak the CSS for one of the pages where the layout was slightly wonky.

I also made a minor update to the Speech Star site and made a few more changes to the new Robert Fergusson site, which has now gone live (see https://robert-fergusson.glasgow.ac.uk/). I also had a chat with our IT people about a further server switch that is going to take place next week and responded to some feedback about the new interactive map of Iona placenames I’m developing.

Also this week I updated the links to one of the cognate reference websites (FEW) from entries in the Anglo-Norman Dictionary, as the website had changed its URL and site structure.  After some initial investigation it appeared that the new FEW website made it impossible to link to a specific page, which is not great for an academic resource that people will want to bookmark and cite.  Ideally the owners of the site should have placed redirects from the pages of the old resource to the corresponding page on the new resource (as I did for the AND).

The old links to the FEW as found in the AND (e.g. the FEW link that before the update was on this page: https://anglo-norman.net/entry/poer_1) were formatted like so: https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231 which now gives a 'not found' error.  The above URL has the volume number (9, which for reasons unknown to me was specified as '90') and the page number (231).  The new resource is found here: https://lecteur-few.atilf.fr/ and it lets you select a volume (e.g. 9: Placabilis-Pyxis) and enter a page (e.g. 231), which then updates the data on the page (e.g. showing 'posse' as the original link from AND 'poer 1' used to do).  But crucially, their system does not update the URL in the address bar, meaning no-one can cite or bookmark their updated view, and it looked like we couldn't link to a specific view.

Their website makes it possible to click on a form to load a page (e.g. https://lecteur-few.atilf.fr/index.php/page/lire/e/198595) but the ID in the resulting page URL is an autogenerated ID that bears no relation to the volume or page number and couldn't possibly be ascertained by the AND (or any other system), so is of no use to us.  Also, the 'links' that users click on to load the above URL are not HTML links at all but are generated in JavaScript after the user clicks on them.  This means it wouldn't be possible for me to write a script that would grab each link for each matching form.  It also means a user can't hover over the link to see where it leads or open the link in a new tab or window, which is also not ideal.  In addition, once you're on a page like the one linked to above, navigating between pages doesn't update the URL in the address bar, so a user who loads the page then navigates to different pages and finds something of interest can't bookmark or cite the correct page, as the URL is still for the first page they loaded, which again is not very good.

Thankfully Geert noticed that another cognate reference site (the DMF) had updated their links to use new URLs that are not documented on the FEW site but do appear to work (e.g. https://lecteur-few.atilf.fr/lire/90/231).  This was quite a relief to discover, as otherwise we would not have been able to link to specific FEW pages.  Once I knew this URL structure was available, updating the URLs across the site was a quick job.
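The rewrite itself boils down to a simple pattern replacement.  The following is a minimal sketch rather than the actual AND code (the function name is mine):

<?php
// Rewrite an old-style FEW URL to the new undocumented format, e.g.
// https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231
// becomes
// https://lecteur-few.atilf.fr/lire/90/231
function rewriteFewUrl(string $url): string
{
    return preg_replace(
        '#^https?://apps\.atilf\.fr/lecteurFEW/lire/volume/(\d+)/page/(\d+)$#',
        'https://lecteur-few.atilf.fr/lire/$1/$2',
        $url
    );
}

echo rewriteFewUrl('https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231');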

Finally this week, I had a meeting with Clara Cohen and Maria Dokovova to discuss a possible new project that they are putting together.  This will involve developing a language game aimed at primary school kids and we discussed some possible options for this during our meeting.  Afterwards I wrote up my notes and gave the matter some further thought.

Week Beginning 20th November 2023

I spent most of this week working towards adding genre to the Books and Borrowing front-end, working on a version running on my laptop.  My initial task was to update the Solr index to add in additional fields for genre.  With the new fields added I then had to update my script that generates the data for Solr to incorporate the fields.  The Solr index is of borrowing records, so as with authors I needed to extract all genre associations at all book levels (work, edition, holding, item) for each book that was associated with a borrowing record, ensuring lower-level associations replaced any higher-level associations and removing any duplicates.  This is all academic for now as all genre associations are at Work level, but this may not always be the case.  It took a few attempts to get the data just right (e.g. after one export I realised it would be good to have genre IDs in the index as well as their names) and each run-through took about an hour or so to process, but all is looking good now.  I'll need to ask Stirling IT to create a new Solr core and ingest the new data on the server at Stirling, as this is not something I have access to do myself, and I'll do this next week.  The screenshot below shows one of the records in Solr with the new genre fields present.
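The cascading logic is straightforward in outline.  Below is a rough PHP sketch, assuming a hypothetical $book array keyed by level; the real script works directly against the database:

<?php
// Work from the highest book level down to the lowest; any genres
// found at a lower level replace those inherited from above.
function genresForBook(array $book): array
{
    $genres = [];
    foreach (['work', 'edition', 'holding', 'item'] as $level) {
        $levelGenres = $book[$level]['genres'] ?? [];
        if (!empty($levelGenres)) {
            $genres = $levelGenres;
        }
    }
    // Remove any duplicates before the record is written out for Solr.
    return array_values(array_unique($genres));
}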

With Solr updated I then began updating the front-end, in a version of the site running on my laptop.  This required making significant updates to the API that generates all of the data for the front-end by connecting to both Solr and the database, as well as updating the actual output to ensure genre is displayed.  I updated the search forms (simple and advanced) to add in a list of genres from which you can select any you're interested in (see the following two screenshots) and updated the search facilities to enable the selected genres to be searched, either on their own or in combination with the other search options.

On the search results page any genres associated with a matching record are displayed, with associations at higher book levels cascading down to lower book levels (unless the lower book level has its own genre records).  Genres appear in the records as clickable items, allowing you to perform a search for a genre you’re interested in by clicking on it.  I’ve also added in genre as a filter option down the left of the results page.  Any genres present in the results are listed, together with a count of the number of associated records, and you can filter the results by pressing on a genre, as you can see in the following screenshot, which shows the results of a quick search for ‘Egypt’, displaying the genre filter options and showing the appearance of genre in the records.
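Behind the scenes, applying a genre filter amounts to adding a filter query and requesting facet counts from Solr.  The following sketch is illustrative only – the field names ('genre_name'), core name ('borrowings') and endpoint are my assumptions, not necessarily those used in the live index:

<?php
// A sketch of a filtered, faceted Solr query.
$params = [
    'q'              => 'Egypt',                 // the quick search term
    'fq'             => 'genre_name:"History"',  // limit to a selected genre
    'facet'          => 'true',
    'facet.field'    => 'genre_name',            // counts for the filter list
    'facet.mincount' => '1',
    'wt'             => 'json',
];
$url = 'http://localhost:8983/solr/borrowings/select?' . http_build_query($params);
$response = json_decode(file_get_contents($url), true);

// The facet counts drive the genre filter list down the left of the
// results page; the docs themselves carry the genre fields to display.
$genreCounts = $response['facet_counts']['facet_fields']['genre_name'];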

Genre is displayed in a similar way wherever book records appear elsewhere in the site, for example the lists of books for a library, the top-level ‘book editions’ page and when viewing a specific page in a library register.

There is still more to be done with genre, which I’ll continue with next week.  This includes adding in new visualisations for genre, adding in new ‘facts and figures’ relating to genre and adding in facilities to limit the ‘browse books’ pages to specific genres.  I’ll keep you posted next week.

I also spent some time going through the API and front-end fixing any notices and warnings given by the PHP scripting language.  These are not errors as such, just messages that PHP logs when it thinks there might be an issue, for example if a variable is referenced without being explicitly initialised first.  These messages get added to a log file and are never publicly displayed (unless the server is set to display them), but it's better to address them to avoid cluttering up the log files, so I've (hopefully) sorted them all now. Also for the project this week I generated a list of all book editions that currently have no associated book work.  There are currently 2474 of these and they will need to be investigated by the team.
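Most of the fixes were of the same shape: giving a variable or array key a sensible default before it's used.  A typical example (the variable names are illustrative):

<?php
// Before: PHP logs a notice/warning if 'genre' was never submitted.
// $genre = $_GET['genre'];

// After: the null coalescing operator supplies a default instead,
// so nothing gets written to the log file.
$genre = $_GET['genre'] ?? '';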

I also met with Luca Guariento and Stevie Barret to have a catch-up and also to compile a list of key responsibilities for a server administrator who would manage the Arts servers.  We discovered this week that Arts IT Support is no longer continuing, with all support being moved to central IT Services.  We still have our own servers and require someone to manage them so hopefully our list will be taken into consideration and we will be kept informed of any future developments.

Also this week I created a new blog for a project Gavin Miller is setting up, fixed an issue that took down every dictionary entry in the Anglo-Norman Dictionary (caused by one of the project staff adding an invalid ID to the system) and completed the migration of the old Arts server to our third-party supplier.

I also investigated an issue with the Place-names of Mull and Ulva CMS that was causing source details to be wiped.  The script that populates the source fields when an existing source is selected from the autocomplete list was failing to load in data.  This meant that all other fields for the source were left blank, so when the 'Add' button was pressed the script assumed the user wanted all of the other fields to be blank and therefore wiped them.  This situation was only happening very infrequently, and what I reckon happened is that the data for the source that failed included a character that breaks JSON when not escaped (maybe a double quote or a tab), meaning that when the script tried to grab the data it failed to parse it and silently failed to populate the required fields.  I therefore updated the script that returns the source fields so that double quotes and tab characters are stripped out of the fields before the data is returned.  I also created a script based on this that outputs all sources as JSON data to check for errors, and thankfully the output is valid JSON.
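The fix amounts to a small sanitising step before the data is returned.  A minimal sketch, assuming a $fields array of source values (the real script's structure will differ):

<?php
// Strip out the characters that were the likely culprits for the
// unparseable output: double quotes and tabs.
function sanitiseSourceFields(array $fields): array
{
    return array_map(function ($value) {
        return str_replace(['"', "\t"], '', (string) $value);
    }, $fields);
}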

I also made a couple of minor tweaks to the Dictionaries of the Scots Language site, fixing an issue with the display of the advanced search results that had been introduced when I updated the code prior to the site’s recent migration to a new server and updating the wording of the ‘About this entry’ box.  I also had an email conversation with Craig Lamont about a potential new project and spoke to Clara Cohen about a project she’s putting together.

Week Beginning 11th September 2023

I spent a fair amount of time this week preparing for my PDR session – bringing together information about what I’ve done over the past year and filling out the necessary form.  I also had a meeting with Jennifer Smith to discuss an ESRC proposal she’s putting together using some of the data from the SCOSYA project and then spent some further time after the meeting researching some tools the project might use and reading the Case for Support.

I also spent a bit of time working for the Anglo-Norman Dictionary, updating the XSLT file to better handle varlists in citations.  So for example instead of:

( MS: s.xiiiex )  Satureia: (A6) gallice savoroye (var. saveray  (A9) MS: c.1300 ;  saveroy  (A12) MS: s.xiii4/4 ;  savoreie  (B3) MS: s.xiv4/4 ;  savoré  (C35) MS: s.xv )  Plant Names 230

we’d have:

( MS: s.xiiiex )  Satureia: (A6) gallice savoroye (var.  (A9: c.1300) saveray; (A12: s.xiii4/4) saveroy; (B3: s.xiv4/4) savoreie;  (C35: s.xv) savoré)  Plant Names 230

I completed an initial version of the update using test files and after discussions with the editor Geert and a few minor tweaks the update went live on Wednesday.

I also spent a bit of time working to fix the House of Fraser Archive website, which I created with Graeme Cannon many moons ago.  It uses an eXist XML database but needed to be migrated to a new server with more modern versions due to security issues.  I spent some time figuring out how to connect to the new eXist database and had just managed to find a solution when the server went down and I was unable to access it.  It was still offline at the end of the week, which is a bit frustrating.

I also made a couple of minor tweaks to a conference website for Matthew Creasy and gave some advice to Ewan Hannaford about adding people to a mailing list.  My updates to the DSL also went live this week on the DSL’s test server, and I emailed the team a detailed report of the changes, highlighting points for discussion.  I’m sure I’ll need to make a number of changes to the features I’ve developed over the past few weeks once the team have had a chance to test things out.  We’ll see what they say once they get back to me.

I was also contacted this week by Eleanor Lawson with a long list of changes she wanted me to make to the two Speech Star websites.  Many of these were minor tweaks to text, but there were some larger issues too.  I needed to update the way sound filters appear on the website in order to group different sounds together and to ensure the sounds always appear in the correct order.  This was a pretty tricky thing to accomplish as the filters are automatically generated and vary depending on what other filter options the user has selected.  It took a while to get working, but I got there in the end, thankfully.  Eleanor had also sent me a new set of videos that needed to be added to the Edinburgh MRI Modelled Speech Corpus.  These were chunks of some of the existing videos as a decision had been made that splitting them up would be more useful for users.  I therefore had to process the videos and add all of the required data for them to the database.  All is looking good now, though.

Next week I’ll be participating in the UCU strike action on Monday and Tuesday so it’s going to be a short week for me.

Week Beginning 4th September 2023

I continued with the new developments for the Dictionaries of the Scots Language for most of this week, focussing primarily on implementing the sparklines for dates of attestation.  I decided to use the same JavaScript library as I used for the Historical Thesaurus (https://omnipotent.net/jquery.sparkline) to produce a mini bar chart for the date range, with a 1 where a date is present and a 0 where it is not.  In order to create the ranges for an entry, all of the citations that have a date for the entry are returned in date order.  For SND the sparkline range is 1700 to 2000 and for DOST the range is 1050 to 1700.  Any citations with dates beyond this are given a date of the start or end as applicable.  Each year in the range is created with a zero assigned by default and then my script iterates through the citations to figure out which of the years needs to be assigned a 1, taking into consideration citations that have a date range in addition to ones that have a single year.  After that my script iterates through the years to generate blocks of 1 values where individual 1s are found 25 years or less from each other, as I'd agreed with the team, in order to make continuous periods of usage.  My script also generates a textual representation of the blocks and individual years that is then used as a tooltip for the sparkline.
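In case it's useful, here is a simplified PHP sketch of the block-building and tooltip steps, assuming a plain array of citation years; the real script also handles citation date ranges and the SND/DOST period boundaries:

<?php
// Merge individual attestation years into blocks where consecutive
// years are 25 years or less apart.
function buildAttestationBlocks(array $years, int $gap = 25): array
{
    if (empty($years)) {
        return [];
    }
    sort($years);
    $blocks = [];
    $start = $end = array_shift($years);
    foreach ($years as $year) {
        if ($year - $end <= $gap) {
            $end = $year;          // close enough: extend the current block
        } else {
            $blocks[] = [$start, $end];
            $start = $end = $year; // too far apart: start a new block
        }
    }
    $blocks[] = [$start, $end];
    return $blocks;
}

// The textual representation used as the tooltip; a single year is
// output on its own rather than as a range like '1710-1710'.
function blocksToText(array $blocks): string
{
    $parts = array_map(function ($block) {
        return $block[0] === $block[1]
            ? (string) $block[0]
            : $block[0] . '-' . $block[1];
    }, $blocks);
    return implode(', ', $parts);
}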

I’d originally intended each year in the range to then appear as a bar in the sparkline, with no gaps between the bars in order to make larger blocks, but the bar chart sparkline that the library offers has a minimum bar width of 1 pixel.  As the DOST period is 650 years this meant the sparkline would be 650 pixels wide.  The screenshot below shows how this would have looked (note that in this and the following two screenshots the data represented in the sparklines is test data and doesn’t correspond to the individual entries):

I then tried grouping the individual years into bars representing five years instead.  If a 1 was present in a five-year period then the value for that five-year block was given a 1, otherwise it was given a 0.  As you can see in the following screenshot, this worked pretty well, giving the same overall view of the data but in a smaller space.  However, the sparklines were still a bit too long.  I also added in the first attested date for the entry to the left of the sparkline here, as was specified in the requirements document:

As a further experiment I grouped the individual years into bars representing a decade, and again if a year in that decade featured a 1 the decade was assigned a 1, otherwise it was assigned a 0.  This resulted in a sparkline that I reckon is about the right size, as you can see in the screenshot below:

With this in place I then updated the Solr indexes for entries and quotations to add in fields for the sparkline data and the sparkline tooltip text.  I then updated my scripts that generate entry and quotation data for Solr to incorporate the code for generating the sparklines, first creating blocks of attestation where individual citation dates were separated by 25 years or less and then further grouping the data into decades.  It took some time to get this working just right.  For example, on my first attempt, when encountering individual years the textual version was outputting a range with the start and end year the same (e.g. 1710-1710) when it should have just output a single year.  But after a few iterations the data was output successfully and I imported the new data into Solr.
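The decade grouping itself is a simple reduction over the year values.  A sketch, assuming a year-indexed array of 0/1 values covering the dictionary's period (e.g. 1050 to 1700 for DOST):

<?php
// A decade gets a 1 if any year within it has a 1, otherwise 0.
function binByDecade(array $yearValues): array
{
    $decades = [];
    foreach ($yearValues as $year => $value) {
        $decade = intdiv($year, 10);
        $decades[$decade] = max($decades[$decade] ?? 0, $value);
    }
    ksort($decades);
    return array_values($decades);
}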

With the sparkline data in Solr I then needed to update the API to retrieve the data alongside other data types, and after that I could work with the data in the front-end, populating the sparklines for each result with the data for each entry and adding in the textual representation as a tooltip.  Having previously worked with a DOST entry as a sample, I realised at this point that as the SND period is much shorter (300 years as opposed to 650) the SND sparklines would be a lot shorter (30 pixels as opposed to 65).  Thankfully the sparkline library allows you to specify the width of the bars as each sparkline is generated, and I set the width of SND bars to two pixels as opposed to the one pixel for DOST, making the SND sparklines a comparable 60 pixels wide.  It does mean that the visualisation of the SND data is not exactly the same as for DOST (e.g. an individual year is represented as 2 pixels as opposed to 1) but I think the overall picture given is comparable and I don't think this is a problem – we are just giving an overall impression of periods of attestation after all.  The screenshot below shows the search results with the sparklines working with actual data, and also demonstrates a tooltip that displays the actual periods of attestation:

At this point I spotted another couple of quirks that needed to be dealt with.  Firstly, we have some entries that don’t feature any citations that include dates.  These understandably displayed a blank sparkline.  In such cases I have updated the tooltip text to display ‘No dates of attestation currently available’.  Secondly, there is a bug in the sparkline library that means an empty sparkline is displayed if all data values are identical.  Having spotted this I updated my code to ensure a full block of colour was displayed in the sparkline instead of white.

With the sparklines in the search results now working, I then moved onto the display of sparklines in the entry page.  I wasn't entirely sure where the best place to put the sparkline was, so for now I've added it to the 'About this entry' section.  I've also added the dates of attestation to this section.  This is a simplified version showing the start and end dates.  I've used 'to' to separate the start and end date rather than a dash because both the start and end dates can in themselves be ranges.  This is because here I'm using the display version of the first date of the earliest citation and the last date of the latest citation (or first date if there is no last date).  Note that this includes prefixes and representations such as '15..'.  The sparkline tooltip uses the raw years only.  You can see an entry with the new dates and sparkline below:

The design of the sparklines isn’t finalised yet and we may choose to display them differently.  For example, we don’t need to use the purple I’ve chosen and we could have rounded ends.  The following screenshot shows the sparklines with the blue from the site header as a bar colour and rounded ends.  This looks quite pleasing, but rounded ends do make it a little more difficult to see the data at the ends of the sparkline.  See for example DOST ‘scunner n.’ where the two lines at the very right of the sparkline are a bit hard to see.

I also managed to complete the final task in this block of work for the DSL, which was to add in links to the search results to download the data as a CSV.  The API already has facilities to output data as a CSV, but I needed to tweak this a bit to ensure the data was exported as we needed it.  Fields that were arrays were not displaying properly and certain fields needed to be suppressed.  For other sites I've developed I was able to link directly to the API's CSV output from the front-end, but the DSL's API is not publicly accessible so I had to do things a little differently here.  Instead, pressing on the 'download' link fires an AJAX call to a PHP script that passes the query string to the API without exposing the URL of the API, then takes the CSV data and presents it as a downloadable file.  This took a bit of time to sort out, as the API was in itself offering the CSV as a downloadable file and this wasn't working when being passed to another script.  Instead I had to set the API to output the CSV data on screen, meaning the scripts called via AJAX could then grab this data and process it.
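The proxy idea is simple enough to sketch in a few lines of PHP.  The internal API address here is a placeholder, not the real one; the point is that the API's URL is never exposed to the browser and the CSV text is relayed as a downloadable file:

<?php
// Pass the user's query string through to the (non-public) API.
$apiUrl = 'http://internal-api.example/search?' . $_SERVER['QUERY_STRING'];

// The API outputs the CSV data on screen rather than as a download,
// so this script can simply grab the text...
$csv = file_get_contents($apiUrl);

// ...and present it to the browser as a file.
header('Content-Type: text/csv; charset=utf-8');
header('Content-Disposition: attachment; filename="results.csv"');
echo $csv;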

With all of this working I put in a Helpdesk request to get the Solr instances set up and populated on the server and I then copied all of the updated files to the DSL’s test instance.  As of Friday the new Solr indexes don’t seem to be working but hopefully early next week everything will be operational.  I’ll then just need to tweak the search strings of the headword search so that the new Solr headword search matches the existing search.

Also this week I had a chat with Thomas Clancy about the development of the front-end for the Iona place-names project.  About a year ago I wrote a specification for the front-end but heard nothing further about it; it now looks like development will be starting soon.  I also had a chat with Jennifer Smith about the data for the Speak For Yersel spin-off projects, and it looks like this will be coming together in the next few weeks too.  We also discussed another project that may use the data from SCOSYA, and I might have some involvement in this.

Other than that I spent a bit of time on the Anglo-Norman Dictionary, creating a CSS file to style the entry XML in the Oxygen XML editor’s ‘Author’ view.  The team are intending to use this view to collaborate on the entries and previously we hadn’t created any styles for it.  I had to generate styles that replicated the look of the online dictionary as much as possible, which took some time to get right.  I’m pretty happy with the end result, though, which you can see in the following screenshot: