Week Beginning 12th February 2024

I’d taken Monday off this week and on Tuesday I continued to work on the Speak For Yersel follow-on projects.  Last week I started working with the data and discovered that it wasn’t stored in a particularly consistent manner.  I overhauled the ‘lexis’ data and this week I performed a similar task for the ‘morphology’ and ‘phonology’ data.  I also engaged in email conversations with Jennifer and Mary about the data and how it will eventually be accessed by researchers, in addition to the general public.

I then moved on to looking at the GeoJSON data that will be used to ascertain where a user is located and which area the marker representing their answer should be randomly positioned in.  Wales was missing its area data, but thankfully Mary was able to track it down.

For Speak For Yersel we had three levels of locations:

Location:  The individual places that people can select when they register (e.g. ‘Hillhead, Glasgow’).

Area: The wider area that the location is found in.  We store GeoJSON coordinates for these areas and they are then used as the boundaries for placing a random marker to represent the answer of a person who selected a specific location in the area when they registered.  So for example we have a GeoJSON shape for ‘Glasgow Kelvin’ that Hillhead is located in.  Note that these shapes are never displayed on any maps.

Region: The broader geographical region that the area is located in.  These are the areas that appear on the maps (e.g. ‘Glasgow’ or ‘Fife’) and they are stored as GeoJSON files.

For the new areas we didn’t have the ‘region’ data.  I therefore did some experimenting with the QGIS package and I found a way of merging areas to form regions, as the following screenshot demonstrates:

I was therefore able to create the necessary region shapes myself using the following method (a scripted equivalent is sketched after the list):

  1. I opened the GeoJSON file in QGIS via the file browser and added the OpenStreetMap XYZ layer in ‘XYZ Tiles’, ensuring this was the bottom layer in the layer browser
  2. In the layer styling right-hand panel I selected the ‘ABC’ labels icon and chose ‘County’ as the value, meaning the county names are displayed on the map
  3. In the top row of icons I selected the ‘Select Features by area or single click’ icon (the 23rd icon along in my version of QGIS)
  4. I could then ‘Ctrl+click’ to select multiple areas
  5. I then selected the ‘Vector’ menu, then ‘Geoprocessing’ and ‘Dissolve’
  6. In the dialog box I had to press the green ‘reload’ icon to make the ‘Selected features only’ checkbox clickable, then I ticked it
  7. I then pressed ‘Run’, which created a new, merged shape.
  8. The layer then needed to be saved using the layer browser in the left panel.
  9. This gave me separate GeoJSON files for each region, but I was then able to merge them into one file by opening the ‘Toolbox’ via the cog icon in the top menu bar, searching for ‘merge’, opening ‘Vector general’ -> ‘Merge vector layers’, selecting the input layers, ensuring the destination CRS is WGS84, then entering a filename and running the script to merge all layers.
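
The same process can also be scripted from the QGIS Python console.  Below is a minimal sketch of the idea; the layer name and output paths are hypothetical and this isn’t the exact code I used:

import processing
from qgis.core import QgsProcessingFeatureSourceDefinition, QgsCoordinateReferenceSystem

# Dissolve only the currently selected counties into a single region shape
processing.run("native:dissolve", {
    "INPUT": QgsProcessingFeatureSourceDefinition("counties", True),  # selected features only
    "OUTPUT": "/tmp/glasgow_region.geojson"
})

# Merge the per-region files into one GeoJSON, forcing the destination CRS to WGS84
processing.run("native:mergevectorlayers", {
    "LAYERS": ["/tmp/glasgow_region.geojson", "/tmp/fife_region.geojson"],
    "CRS": QgsCoordinateReferenceSystem("EPSG:4326"),
    "OUTPUT": "/tmp/regions.geojson"
})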

I was then able to edit / create / delete attributes for each region area by pressing on the ‘open attribute table’ icon in the top menu bar.  It’s been a good opportunity to learn more about QGIS and next week I’ll begin updating the code to import the data and set up the systems.

Also this week I created an entry for the Books and Borrowing project on this site (see https://digital-humanities.glasgow.ac.uk/project/?id=160).  On Friday afternoon I also investigated a couple of issues with the search that Matt Sangster had spotted.  He noticed that an author surname search for ‘Byron’ wasn’t finding Lord Byron, and entering ‘Lord Byron’ into the surname search was bringing back lots of results that didn’t have this text in the author surname.

It turned out that Byron hadn’t been entered into the system correctly and was in as forename ‘George’, surname ‘Gordon’ with ‘Lord Byron’ as ‘othername’.  I’ll need to regenerate the data once this error has been fixed.  But the second issue, whereby an author surname search for ‘Lord Byron’ was returning lots of records, is a strange one.  This would appear to be an issue with searches for multiple words and unfortunately it’s something that will need a major reworking.  I hadn’t noticed previously, but if you search for multiple words without surrounding them with quotes Solr searches the first word against the field and the remaining words against all fields: so “surname ‘Lord’ OR any field ‘Byron’”, whereas what the query should be doing is “surname ‘Lord’ AND surname ‘Byron’”.  This will probably affect all free-text fields.  I’m going to have to update the search to ensure multi-word searches without quotes are processed correctly, which will take some time; I’ll try to tackle it next week.  I also need to create a ‘copy’ field for place of publication as this is being tokenised in the search facet options.  So much for thinking my work on this project was at an end!
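
The fix will essentially need to rewrite unquoted multi-word input so that every token is scoped to the field before the query reaches Solr.  A minimal sketch of the idea in Python (the function and field names are my own illustration, not the actual search code):

def scope_to_field(field, value):
    # Quoted phrases already behave correctly, so pass them through untouched
    if value.startswith('"') and value.endswith('"'):
        return f"{field}:{value}"
    tokens = value.split()
    if len(tokens) == 1:
        return f"{field}:{value}"
    # AND the tokens together within the field: surname 'Lord' AND surname 'Byron'
    return f"{field}:({' AND '.join(tokens)})"

print(scope_to_field("author_surname", "Lord Byron"))
# author_surname:(Lord AND Byron)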

Also this week I spent many hours going through the Iona map site to compile a spreadsheet listing all of the text that appears in English in order to make the site multilingual.  There is a Gaelic column in the spreadsheet and the plan is that someone will supply the appropriate forms.  There are 157 separate bits of text, with some being individual words and others being somewhat longer.  By far the longest is the content of the copyright and attribution popup, although we might also want to change this as it references the API which might not be made public.  We might also want to change some of the other English text, such as the ‘grid ref’ tooltip that gives as an example a grid reference that isn’t relevant to Iona.  I’ll hold off on developing the multilingual interface until I’m sure the team definitely want to proceed with this.

Finally this week I continued to migrate some of the poems from the Anthology of 16th and Early 17th Century Scots Poetry to TEI XML.  It’s going to take a long time to get through all of them, but progress is being made.

Week Beginning 5th February 2024

I continued to make updates to the Books and Borrowing website this week after the soft launch last week.  I had noticed last week that the figures appearing on the ‘Facts’ page didn’t correlate with the number of results returned through the search facilities.  I reckoned this was because the ‘Facts’ page, which queries the database directly, was not necessarily including the full chain of interrelated tables and their individual ‘isactive’ flags when returning figures, whereas the search facilities use the Solr index, and the data stored within this were generated using the full chain.  So, for example, counts of borrowing records for books may not incorporate the register page record, but if the register page is set to ‘inactive’ then it is important to factor this in.

Another issue was the ‘in stats’ flag.  We have this flag for borrowing records to decide whether the record should appear in the stats or not, so for example a duplicate record could be omitted from the stats, but would still be findable when viewing the register page.  The search results were finding records with ‘instats’ set to ‘no’ but these were not included in the ‘Facts’ page, meaning the numbers would never match up.

To try and make the figures more consistent I decided to update the ‘Facts’ page to use the Solr index for calculating borrowing records.  This slightly changed the figures that appear in the summary section on this page.  Note that as a borrowing record can involve multiple borrowers, the total borrowings broken down by gender may not equal the total borrowings: if a borrowing has two borrowers, one male and one female, then it will count as one borrowing in the total borrowings and one each in the totals per gender.

Further investigation into the discrepancies between the figures on the ‘Facts’ page and the number of records returned in a related search did seem to suggest that these are caused by the ‘instats’ flag.  For example, for Chambers library, Fiction is the most popular genre with 4417 borrowings listed.  This is the figure for borrowings with ‘instats’ set to ‘Y’.  But following the link through to the search results listing borrowing records involving the genre ‘Fiction’ at Chambers displays 4420 borrowings.  This is because the search does not take the ‘instats’ field into consideration.  I tried running these queries (and others) through Solr and this does appear to be the reason for the discrepancies.  After discussion with the team we decided therefore to update the search page so that it never returns records with ‘instats’ set to ‘no’.  These records will still be findable when viewing register pages, but will not now appear in any search results.  With this update in place the figures on the ‘Facts’ page began to match up with the search results.
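
Applying the restriction at query time is just a matter of adding a Solr filter query.  A rough sketch (the core name and URL are hypothetical; ‘instats’ is the real field discussed above, but the other field names are illustrative):

import requests

params = {
    "q": "genre:Fiction AND library:Chambers",  # illustrative query
    "fq": "instats:Y",  # exclude records flagged as out of the stats
    "rows": 0,          # only the count is needed here
}
response = requests.get("https://example.org/solr/borrowing/select", params=params)
print(response.json()["response"]["numFound"])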

However, after further testing I noticed that the stats for authors and genres in the site-wide facts page were still not aligned with the number of search results returned and I spent most of Wednesday investigating this.  It was a very long and complex process but I finally managed to sort it.  I updated the ‘top ten’ cache files for each library to use data from the Solr index to calculate the number of borrowings, but this alone wasn’t giving the correct figures.  The reason was that each library had its own ‘top ten’ for authors and genres, both overall and for each gender.  When the facts page was being generated for multiple libraries these lists were brought together to formulate overall top tens.  If an author or genre was not in the top ten for a library the data for this item was not included in the calculations, as only the top ten were being stored.  For example, if ‘Fiction’ was the eleventh most borrowed genre at a library then its borrowings were not found and were therefore not getting added to the total.  What I’ve had to do instead is store all genres and authors for each library in the cache rather than just the top tens, thus ensuring that all items are joined and their numbers of borrowings are compared.  Unfortunately this does mean the cache files are now a lot bigger and more processing needs to be done, but at least the figures do now match up.  Also this week I updated the Chambers Library Map to add a link to view the relevant borrowing records to the borrower popups (see https://borrowing.stir.ac.uk/chambers-library-map).
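
Going back to the top-ten problem: the fix is easy to demonstrate with a toy example (the figures below are invented):

from collections import Counter

# Complete per-library genre counts (invented figures)
library_a = Counter({"Fiction": 4417, "History": 3000, "Poetry": 12})
library_b = Counter({"History": 2500, "Fiction": 120, "Poetry": 900})

# Summing the complete counts and truncating at the end gives correct totals;
# summing lists already truncated to ten entries per library would silently
# drop anything that ranked eleventh or lower in a given library.
overall = library_a + library_b
print(overall.most_common(10))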

I then moved onto making further updates to the Place-names of Iona map.  Sofia had spotted two minor issues with the map and I fixed them.  Cross references were only set to be included in the data if both records had ‘landranger’ set to ‘Y’.  I’m not sure why this was – it must have been something decided in an earlier project.  I removed this requirement and the cross references now display in the full map record.  However, pressing on the links wasn’t actually doing anything so I also fixed this.  Pressing on a cross reference now works in the same way as the links in the tabular view of the data – the popup is closed, the map focusses on the location of the cross reference and the marker pop-up opens.  The other issue was that the glossary pop-up could end up longer than the page and wasn’t using the updated in-popup scrollbar that I’d developed for the tabular view, so I updated this.

I also continued working through my ‘to do’ list for the project, which is now almost entirely ticked off.  I updated the searches and browses so that they now shift the map to accommodate all of the returned markers.  I wasn’t sure about doing this as I worried it might be annoying, but I think it works pretty well.  For example, if you browse for place-names beginning with ‘U’ the map now zooms into the part of the island where all of these names are found rather than leaving you looking at a section of the island that may not feature any of the markers.

I also updated the citations to use shorter links.  Whenever you press on the ‘cite’ button or open a record the system grabs the hash part of the URL (the part after ‘#’) and generates a random string of characters as a citation ID.  This random string is then checked against a new citation table I’ve created to ensure it’s not already in use (a new string is generated if so) and then the hash plus the citation ID are stored in the database.  This citation ID is then displayed in the citation links.  Loading a URL with a citation ID then queries the database to find the relevant hash and then redirects the browser to this URL.  I’ll need to keep an eye on how this works in practice as we may end up with huge numbers of citation URLs in the database.
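
A rough sketch of the ID-generation logic, using SQLite as a stand-in for the project’s actual database (the table and function names are my own):

import secrets
import sqlite3
import string

conn = sqlite3.connect("citations.db")
conn.execute("CREATE TABLE IF NOT EXISTS citation (id TEXT PRIMARY KEY, hash TEXT)")

def new_citation_id(url_hash, length=8):
    alphabet = string.ascii_letters + string.digits
    while True:
        cid = "".join(secrets.choice(alphabet) for _ in range(length))
        try:
            # The primary key constraint rejects an ID that is already in use
            conn.execute("INSERT INTO citation VALUES (?, ?)", (cid, url_hash))
            conn.commit()
            return cid
        except sqlite3.IntegrityError:
            continue  # collision: generate a new string and try again

print(new_citation_id("#record/123?lang=en"))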

I also updated the maximum zoom of the map layers to be whatever each layer supports.  The satellite with labels view supports the highest zoom of the layers (20) and if you’re on this then switch to a different layer (e.g. Modern OS with a max zoom of 16.25) then the map will automatically zoom out to this level.

The only major thing left to tackle is the bilingual view.  My first step in implementing this will be to go through the site and document each and every label and bit of text.  I’ll create a spreadsheet containing these and then someone will have to fill in the corresponding Gaelic text for each.  I’ll then have to update the site so that the code references a language file for every bit of text rather than having English hard-coded.  I’ll begin investigating this next week.

Also this week I spent a bit of time working for the DSL.  They wanted the Twitter logo that is displayed on the site to be replaced with the ‘Artist formerly known as Twitter’ logo.  Unfortunately this was a bigger job than expected as the logos are part of a font package called ‘Font Awesome’ that provides icons across the site (e.g. the magnifying glass for the search button).  The version of the font used on the website was a couple of years old so obviously didn’t include the new ‘X’ logo.  I therefore had to replace the package with a newer version, but the structure of the package and the way in which icons are called and included had changed, which meant I needed to make several updates across the site.  I also replaced the Twitter logo on the entry page in the ‘Share’ section of the right-hand panel.

I spent most of Thursday working on the migration of the Anthology of 16th and Early 17th Century Scots Poetry.  I’ve now migrated more than 25 poems to TEI XML and I’ve completed the interface for displaying these.  I still need to make some tweaks to the XSLT (e.g. to decide how line indents should be handled) but I have now managed to sort other issues (e.g. poems with line numbers that start at a number other than 1).  I should now be able to continue manually migrating the poems to XML when I have a few spare minutes between other commitments and hopefully in a couple of months at most I’ll be able to launch the new site.

On Friday I began looking at developing the Speak For Yersel tool (https://speakforyersel.ac.uk/) for new geographical areas.  We now have data for Wales, the Republic of Ireland and Northern Ireland and I began going through this.  I had hoped I’d just be able to use the spreadsheets I’d been given access to, but after looking at them I realised I would not be able to use them as they currently are.  I’m hoping to reshape the code into a tool that can easily be applied to other areas.  What I want to do is create the tool for one of our three areas; once it’s done it should then be possible to simply ‘plug in’ the data from the other two areas and everything will just work.  Unfortunately the spreadsheets for the three areas are all structured differently and for this approach to work the data needs to be structured absolutely identically in each region, i.e. every column in every tab of every region must be labelled exactly the same and all columns must be in exactly the same location in each spreadsheet.

For example, the ‘Lexis’ tab is vastly different in each of the three areas.  ROI has an ID field, part of speech, a ‘direction of change’ column, 6 ‘variant’ columns, a ‘notes’ column and an ‘asking in NI’ column.  It has a total of 14 columns.  NI doesn’t have ID, PoS or direction of change but has 7 ‘Choice’ columns, a ‘potential photos’ column, no ‘notes’ column and then two columns for cross references to Scotland and ROI.  It has a total of 12 columns.  Wales has ‘Feature type’ and ‘collection method’ columns that aren’t found in the other two areas, it has no ‘Variable’ column, but does have something called ‘Feature’ (although this is much longer than ‘Variable’ in the other two areas).  It has an ‘if oral – who’ column that is entirely empty, six ‘Option’ columns plus a further unlabelled column that only contains the word ‘mush’ in it and four unlabelled columns at the end before a final ‘Picture’ column.  It has a total of 17 columns.  In addition, the data for the feature ‘butt, boyo, mun, mate, dude, boi, lad, la, bruh, bro, fella’ in ‘Option F’ states ‘Lad ++ la/bruh/bro/fella (these should be separate options)’.  As you can probably appreciate, there’s no way that any of this can be automatically extracted.

I therefore worked on reformatting the spreadsheets to be consistent, starting with the ‘lexis’ spreadsheets for the three areas, which took most of the day to complete.  Hopefully the other tabs won’t take as long.  The spreadsheet has the following column structure (a consistency check is sketched after the list):

  1. ID: this takes the form ‘area-lexis-threeDigitNumber’, e.g. ‘roi-lexis-001’. This allows rows in different areas to be cross referenced and differentiates between survey types in a readable manner
  2. Order: a numeric field specifying the order the question will appear in. For now this is sequential, but if anyone ever wants to change the order (as happened with SFY) they can do so via this field.
  3. Variable: The textual identifier for the question. Note that there are a few questions in the Wales data that don’t have one of these
  4. Question: The question to be asked
  5. Picture filename: the filename of the picture (if applicable)
  6. Picture alt text: Alt text to be used for the picture (if applicable). Note that I haven’t filled in this field
  7. Picture credit: The credit to appear somewhere near the picture (if applicable)
  8. Xrefs: If this question is linked to others (e.g. in different areas) any number of cross references can be added here.  I haven’t completed this field but the format should be the ID of the cross-referenced question.  If there are multiple then separate with a bar character.  E.g. ‘wales-lexis-001|roi-lexis-006’
  9. Notes: Any notes (currently only relevant to ROI)
  10. POS: Part of speech (as above)
  11. Change: (as above)
  12. Options: Any number of columns containing the answer options.  The reason these appear last is that the number of columns is not fixed.
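
Below is a minimal sketch of the kind of consistency check this structure makes possible, assuming each tab has been exported to CSV (the filenames are hypothetical):

import csv

EXPECTED = ["ID", "Order", "Variable", "Question", "Picture filename",
            "Picture alt text", "Picture credit", "Xrefs", "Notes", "POS", "Change"]
# The 'Options' columns follow and vary in number, so only the fixed prefix is checked

def header_problems(path):
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f))
    return [f"column {i + 1}: got {got!r}, expected {want!r}"
            for i, (got, want) in enumerate(zip(header, EXPECTED)) if got != want]

for area in ("wales", "roi", "ni"):
    print(area, header_problems(f"{area}-lexis.csv") or "OK")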

Next week I’ll work on the other two survey question types and once the data is in order I’ll be able to start developing the tool itself.  I’ve taken Monday next week off, though, so I won’t be starting on this until Tuesday, all being well.

Week Beginning 29th January 2024

I spent a lot of time this week working on the Books and Borrowing project, making final preparations for the launch of the full website.  By the end of the week the website was mostly publicly available (see https://borrowing.stir.ac.uk) but it wasn’t as smooth a process as I was hoping for and there are still things I need to finish next week.

My first task of the week was to write a script that identified borrowers who have at least one active borrowing record, but none of those records is on an active page in an active register.  This should then bring back a list of borrowers who are only associated with inactive registers.  It took quite some time to get my head around this after several earlier attempts didn’t do exactly what was requested, and my final script identified 128 borrowers that were then deactivated, all from St Andrews.

I then moved on to an issue that had been noticed with a search for author names.  A search for ‘Keats’ was bringing back matches for ‘Keating’, which was clearly not very helpful.  The cause of this was Solr trying to be too helpful.  The author name fields were stored as ‘text_en’ and this field type has stemming applied to it, whereby stems of words are identified (e.g. ‘Keat’) and a search for a stem plus a known suffix (e.g. ‘s’, ‘ing’) will bring back other forms of the same stem.  For names this is hopeless as ‘Keating’ is in no way connected to ‘Keats’.

It turned out that this issue was affecting many other fields as well, such as book titles.  A search for ‘excellencies’ was finding book titles containing the forms ‘excellency’ and also ‘excellence’, which again was pretty unhelpful.  I did some investigation into stemming and whether a Solr query could be set to ignore it, but this did not seem to be possible.  For a while I thought I’d have to change all of the fields to strings, which would have been awful as strings in Solr are case sensitive and do not get split into tokens, meaning wildcards would need to be used and the search scripts I’d created would need to be rewritten.

Thankfully I discovered that if I stored the text in the field type ‘text_general’ then stemming would be ignored but the text would still be split into tokens.  I created a test Solr index on my laptop with all of the ‘text_en’ fields set to ‘text_general’: searching this index for author surname ‘Keats’ only brought back ‘Keats’, and a book title search for ‘excellencies’ only brought back ‘excellencies’.  This is exactly what we wanted and with the change in place I was able to fully regenerate the cached data and the JSON files for import into Solr (a process that takes a couple of hours to complete) and ask for the online Solr index to be updated.

I also updated the help text on the website, adding in some new tooltips about editors and translators to the advanced search form.  With no other issues reported by the team I then began the process of making the site live, including adding a ‘quick search’ bar to the header of all pages, adding in menu items to access the data and adding the ‘on this day’ feature to the homepage.

However, I noticed one fairly sizeable issue, unfortunately.  The stats on the ‘Facts’ page do not correspond to the results you get when you click through to the search results.  I therefore removed the top-level ‘Facts & figures’ menu item until I can figure out what’s going on here.  I first noticed an issue with the top ten genre lists.  The number of borrowings for the most popular genre (history) was crazy: overall borrowings of ‘history’ were listed as ‘104849’, which is almost as many as the total number of borrowing records we have in the system.  Also, the link to view borrowings in the search was limiting the search to Haddington, which is clearly wrong.  Other numbers on the page just weren’t matching up to the search results either.  For example, for Chambers the most prolific borrower is Mrs Thomas Hutchings with 274 listed borrowings, but following the link to the search results gives 282.

It took several hours of going through the code to ascertain what was going on with the top ten genre issue and it was all down to a missing equals sign in an ‘if’ statement.  Where there should have been two (==) there was only one, meaning the condition was assigning a value rather than comparing, and this was stopping the following genre code from processing successfully.

There is unfortunately still the issue of the figures on the facts pages not exactly matching up with the number of results returned by the search.  I had noticed this before but I’d hoped it was caused by the database still being updated by the team and being slightly out of sync with the Solr index.  Alas, it’s looking like this is not the case.  There must instead be discrepancies between the queries used to generate the Solr index and those used to generate the facts data.  I fear that it might be the case that in the chain of related data some checks for ‘isactive’ have been omitted.  Each register, page, borrowing record, borrower, author, and book at each level has its own ‘isactive’ flag.  If (for example) a book item is set to inactive but a query fails to check the book item ‘isactive’ flag then any associated borrowing records will still be returned (unless they have each been set to inactive).  I’m going to have to check every query to ensure the flags are always present.  And it’s even more complicated than that because queries don’t necessarily always include the same data types.  E.g. a borrower is related to a book via a borrowing record and if you’re only interested in borrowers and books the query doesn’t need to include the associated page or register.  But of course if the page or register where the borrowing record is located is inactive then this does become important.  I might actually overhaul the ‘facts’ so they are generated directly from the Solr index.  This would mean things should remain consistent, even when updates are made to the data in the CMS (these would not be reflected until the Solr index is regenerated).  Something I’ll need to work on next week.
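
To illustrate the ‘isactive’ chain, here is the shape of query the counts need, sketched with simplified table and column names (not the project’s actual schema):

count_borrowings_sql = """
SELECT COUNT(*)
FROM borrowing b
JOIN page     p ON p.id = b.page_id     AND p.isactive = 1
JOIN register r ON r.id = p.register_id AND r.isactive = 1
WHERE b.isactive = 1
"""
# Dropping any one of the three 'isactive' checks silently inflates the count,
# because borrowings on inactive pages or in inactive registers sneak back in.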

Also this week I had a Zoom call with Jennifer Smith and Mary Robinson to discuss expanding the Speak For Yersel project to other areas, namely Wales, the Republic of Ireland and Northern Ireland.  The data for these areas is now ready to use and we agreed that I would start working with it next week.

I also fixed a couple of issues with the DSL website.  The first was an easy one – the team wanted the boxes on the homepage to be updated to include a ‘Donate’ box.  It didn’t take long to get this done.  The second was an issue with Boolean searches on our test site.  Performing a fulltext search for ‘letters and dunb’ on the live site was returning 192 results while the same search on our test site was returning 54,466 results (limited to 500).

It turned out that the search string was being converted to lower case before it was being processed by the API.  I must have added this in to ignore case in the headword search, but an unintended consequence was that the Booleans were also converted to lower case and were therefore not getting picked up as Booleans.  I updated the API so that the search string is not converted to lower case, but where a headword search is to be performed the string is converted to lower case after the Boolean logic is executed.  With the update in place the searches started working properly again.
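
The essence of the fix, sketched in Python (the site’s API isn’t Python, and I’m assuming upper-case operator names, so treat this purely as an illustration):

def lowercase_terms(query):
    # Lowercase the search terms but leave Boolean operators intact, so the
    # Boolean parsing that runs later still recognises them
    operators = {"AND", "OR", "NOT"}
    return " ".join(tok if tok in operators else tok.lower()
                    for tok in query.split())

print(lowercase_terms("Letters AND Dunb"))  # letters AND dunb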

I also made a minor tweak to the advanced search on both live and new sites so that query strings with quotations in them no longer lose their quotation marks when returning to the advanced search form.  This was an issue that was identified at our in-person meeting a couple of weeks ago.

I found a bit of time to continue working on the new map for the Place-names of Iona project this week too, during which I completed work on the ‘Table view’ feature.  I updated the table’s popup so that it never becomes taller than the visible map.  If the table is longer than this then the popup now features a scrollbar, which works a lot better than the previous method, whereby pressing on the browser’s scrollbar closed the popup.

I have also now completed the option to press on a place-name in the table to view the place-name on the map.  It took a lot of experimentation to figure out how to get the relevant marker popup to open and also populate it with the correct data.  Things went a little crazy during testing when opening a popup was somehow getting chained to any subsequent popup openings, resulting in the map getting stuck in an endless loop of opening and closing different popups and attempting to autoscroll between them all.  Thankfully I figured out what was causing this and sorted it.  Now when you press on a place-name in the table view any open popups are closed, the map navigates to the appropriate location and the relevant popup opens.  Originally I had the full record popup opening as well as the smaller marker popup, but I began to find this quite annoying: when I pressed on a place-name in the table what I generally wanted to see was where the place-name was on the map, and the full record obscured this.  Instead I’ve set it to open the marker popup, which provides the user with an option to open the full record without obscuring things.  I’ve also now made it possible to cite / bookmark / share the table in a URL.

I then moved onto developing the ‘Download CSV’ option, which took quite some time.  The link at the bottom of the left-hand menu updates every time the map data changes and pressing on the button now downloads the map data as a CSV.

Also this week I had discussions with the Anglo-Norman Dictionary team about a new proposal they are putting together and I updated the Fife Place-names website to fix a couple of issues that had been introduced when the site was migrated to a new server.  I also continued to migrate the collection of Scots poems discussed in previous weeks from ancient HTML to XML, although I didn’t have much time available to spend on this.

Week Beginning 22nd January 2024

I’d hoped that we’d be able to go live with the full Books and Borrowing site this week, but unfortunately there are still some final tweaks to be made so we’re not live yet.  I did manage to get everything ready to go, but then some final checking by the team uncovered some additional issues that needed sorting.

Migrating all of the pages in the development site to their final URLs proved to be quite challenging as it was not simply a matter of copying the pages.  Instead I needed to amalgamate three different sets of scripts and stylesheets (the WordPress site, the dev site and the Chambers map), which involved some rewriting and quite a lot of checking.  However, by the end of Monday I had completed the process and I sent the URLs to the team for a final round of checking.

The team uncovered a few issues that I managed to address, including some problems with the encoding of ampersands and books and borrowers without associated dates not getting returned in the site-wide browse options.  I needed to regenerate the cache files after sorting some of these, which also took a bit of time.

I also realised that the default view of the ‘Browse books’ page was taking far too long to load, with load times of more than 20 seconds.  I therefore decided to create a cache for the standard view of books, with separate cache files for books by first letter of their title and their author.  These caches are then used whenever such lists are requested rather than querying everything each time.  When filters are applied (e.g. date ranges, genres) the cache is ignored, but such filtered views generally bring back fewer books and are returned more quickly anyway.  It took a while to write a script to generate the cache, to update the API to use the cache and then to add the cache generation script to the documentation, but once all was in place the page was much quicker to load.  It does still take a few seconds to load as there are almost 3000 books beginning with the letter ‘A’ in our system and even processing a static file containing this much data takes time.  But it’s a marked improvement.
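
The caching logic itself is straightforward.  A sketch of the idea (the paths and function names are illustrative, not the API’s actual code):

import json
import os

CACHE_DIR = "cache/books"  # hypothetical location

def query_database(letter, filters):
    # Stand-in for the real (slow) database query
    return []

def browse_books(letter, filters=None):
    cache_file = os.path.join(CACHE_DIR, f"title-{letter.lower()}.json")
    # Unfiltered views come from the cache; filtered views return fewer books
    # and are quick enough to query directly
    if not filters and os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            return json.load(f)
    return query_database(letter, filters)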

On Friday afternoon the project PI Katie Halsey got back to me with a list of further updates that I need to make before the launch, one of which will require the Solr index structure to be updated and all of the Solr data to be regenerated, which I’ll tackle next week.  Hopefully after that we’ll finally be able to launch the site.

Also this week I fixed a small bug that had been introduced to the DSL sites when we migrated to a new server before Christmas.  This was causing an error when the ‘clear results’ option was selected when viewing the entry page and I’ve sorted it now.  I also investigated an issue with including URLs in the description of place-names on the Ayrshire place-names site.  It turned out that relative URLs were getting blocked as a security risk, but a switch to absolute URLs sorted this.

I also made some fixes to some of the data stored in the Speech Star databases that had been uploaded with certain features missing and provided some CV-like information about the projects I’ve worked on for Clara Cohen in English Language to include in a proposal I’m involved with.

I then investigated a couple of issues with the Anglo-Norman Dictionary.  The first was that the Editor Geert needed a new ‘usage label’ added to the XML for the site, which would then need to be searchable in the front-end.  After spending some time refamiliarising myself with how things worked I realised it should be very easy to add new labels.  All I needed to do was to add the label to the database and then it was automatically picked up by the DTD and given as an option in Oxygen.  Then when an entry is uploaded to our dictionary management system it will be processed and should (hopefully) automatically appear in the advanced search label options (once it has at least one active association to an entry).  Geert has begun using the new label and we’ll see if the process works as I think it will.

The second issue was one that Geert had noticed in the Dictionary Management System when manually editing the citations.  The ‘Edit citations’ feature allows you to select a source and then view every time this source is cited in all of the entries.  The list then lets you manually edit each of these.  The issue was cropping up when an entry featured more than one citation for a source and more than one of these needed to be edited.  In such cases some of the edits were being lost.  The problem was that when multiple citations were being edited at the same time the code was extracting the XML from the database and editing it afresh for each one, meaning only the changes for the last were being stored.  I’ve sorted this now.
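
The bug boils down to a classic read-modify-write mistake, easy to show with plain strings:

entry = "<cit>A</cit><cit>B</cit>"
edits = [("A", "A2"), ("B", "B2")]

# Wrong: every edit starts again from the stored copy, discarding earlier edits
result = entry
for old, new in edits:
    result = entry.replace(old, new)
print(result)  # <cit>A</cit><cit>B2</cit> -- the first edit has been lost

# Right: carry the edited text forward through each iteration, saving once
result = entry
for old, new in edits:
    result = result.replace(old, new)
print(result)  # <cit>A2</cit><cit>B2</cit>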

I also spent some time continuing to develop the new place-names map for the Iona project.  I updated the elements glossary to add a new hidden column to the data that stores the plain alphanumeric value of the name (i.e. all weird characters removed).  This is now used for ordering purposes, meaning any elements beginning with a character such as an asterisk or a bracket now appear in the correct place.
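
Something along these lines generates the hidden sort value (a sketch of the approach, not the CMS’s actual code):

import re
import unicodedata

def sort_key(name):
    # Strip accents, then drop anything that isn't a letter or a digit,
    # so '*dun' files under 'dun' rather than under '*'
    ascii_form = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^A-Za-z0-9]", "", ascii_form).lower()

print(sorted(["*dùn", "(beag)", "achadh"], key=sort_key))
# ['achadh', '(beag)', '*dùn'], i.e. ordered as achadh, beag, dun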

I also began working on the ‘Table view’ and for now I’ve set this to appear in a pop-up, as with the place-name record.  Pressing on the ‘Table view’ button brings up the table, which contains all of the data currently displayed on the map.  You can reorder the table by any of the headings by pressing on them (press a second time to reverse the order).  Below is a screenshot demonstrating how this currently looks:

You can also press on a place-name to close the table and open the place-name’s full record.  The map also pans and zooms to the relevant position.  What I haven’t done yet is managed to get the marker pop-up to also appear, so it’s currently still rather difficult to tell which marker is the one you’re interested in.  I’m working on this, but it’s surprisingly difficult to achieve due to the leaflet mapping library not allowing you to assign IDs to markers.

Using the pop-up for the table is good in some ways as it keeps everything within the map interface, but it does have some issues.  I’ve had to make the pop-up wider to accommodate the table, so it’s not going to work great on mobile phones.  Also when the table is very long and you try to use the browser’s scrollbar the pop-up closes, which is pretty annoying.  I think I might have found a solution to this (adding a scrollbar to the pop-up content itself), but I haven’t had time to implement it yet.  An alternative may be to just have the table opening in a new page, but it seems a shame to not have everything contained in the map.

Finally this week I spent some time working through the poems on the old ‘Anthology of 16th and Early 17th Century Scots Poetry’ site.  I created a new interface for this last week (which is not yet live) and decided it would make sense to migrate the poems from ancient HTML to TEI XML.  This week I began this process, manually converting the first 13 poems (out of about 130) to XML.  With these in place I then worked on an XSLT stylesheet that would transform this XML into the required HTML.  It was here that I ran into an issue.  I absolutely hate working with XSLT – it’s really old fashioned and brittle and cumbersome to use.  Things I can literally achieve in seconds for HTML using jQuery can take half a day of struggling with XSLT.  This time I couldn’t even get the XSLT script to output anything.  I was using similar code to that which I’d created for the DSL and AND, but nothing was happening.  I spent hours trying to get the XSLT to match any element in the XML but nothing I tried worked.  Eventually I asked Luca, who has had more experience with XML than I have, for advice and thankfully he managed to identify the issue.  I hadn’t included a namespace for TEI in the XSLT file.  I needed to include the following reference:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">

And then each time I wanted to match an XML element such as <text> I couldn’t just use ‘text’ but instead needed to add a tei prefix ‘tei:text’.  Without this the XSLT wouldn’t do anything.  Yep, XSLT is brittle and cumbersome and an absolute pain to use.  But with Luca’s help I managed to get it working and other than transcribing the remaining texts (a big job I’ll tackle bit by bit over many weeks) all is pretty much in place.


Week Beginning 15th January 2024

I received an email from Theo Van Heijnsberg in Scottish Literature over the weekend alerting me to the fact that one of the old STELLA resources that is still used for teaching was no longer available.  At the end of last year we’d shut down the server that was hosting the old STELLA resources, and at the time we thought we’d migrated all of the resources that were still being used but I guess this one was overlooked.  We had of course taken backups of the server before we shut it down so it was relatively easy to find and restore the resource, especially as it was an old plain HTML website that was last updated way back in 1999.  The website (An Anthology of 16th and Early 17th Century Scots Poetry) can now be found here: https://stella.glasgow.ac.uk/anthol/title.html.

As the website needs to continue to be hosted I decided that I should probably update the interface to bring it into the 21st century.  I therefore spent a bit of time creating a Bootstrap based interface, which is fairly unadorned and retains the logo.  I’ve added in a footer with links that University websites are now required to feature.  There is also a ‘Top’ button that appears as you scroll down the page, allowing you to quickly return to the top.  The homepage features two columns, one with the contents and the other containing the ‘Editors’ text.  So far I’ve only added content to the ‘Alexander Scott’ page.  This currently features the list of poems and the first two poems.  Pressing on the title of one of these in the list scrolls the page to the relevant point (you can use the ‘Top’ button to return to the list).  There is a button for returning to the contents page at the top (pressing on the site title or logo does the same).  Here’s a screenshot of how the new site currently looks:

The biggest issue is that the poems are all marked up in very old HTML using tables for layout and tags that were fine in 1999 but are not great to use now.  In order to extract the text I need to copy and paste each individual line, line number and gloss into a new HTML structure.  After doing this for two poems I decided that it would be better to migrate the text to TEI XML and to store the data separately from the presentation.  That way it will be much easier to reuse the poems in future.  I would then write a script to transform the XML into the required HTML for display on the site.  However, after deciding this Theo got back to me to say that another project at St. Andrews is going to use the data from the resource and incorporate it into a larger online resource.  This project is still just at the planning stages, but it’s likely the Anthology website will not be needed for much longer.  I’ll probably still explore the XML pathway, though, as it’s a useful exercise to hone my XML skills.  It will take me some time to go through more than 100 poems, though, and I’ll need to do this between my commitments to funded research projects.

Also this week I did some further work for the Books and Borrowing project.  I fixed an image that wasn’t working in one of the Royal High School registers and reassigned images to one of the Orkney registers due to a duplicate image throwing everything from that point onwards off by one.  I also updated the website interface to ensure that tables work better on narrow screens.  I applied a minimum width of 800px to tables (e.g. the tabular list of libraries and the lists of registers), meaning on very narrow screens the tables no longer get squashed into spaces that make them entirely unusable but instead extend beyond the width of the screen and you need to scroll to view their full contents.  It seems to work ok on my rather ancient phone.

On Wednesday we were finally ready to generate the final pre-publication cache files, so I completed this task, updating all of the caches and generating a new Solr index file.  Once this had been set up for us by Stirling’s IT people I then updated the connection details in the API.  Also in this update I fixed an issue with the place of publication in the advanced search.  Previously if you typed in ‘london’ (lower case) and didn’t select ‘London’ from the drop-down list and then pressed the ‘search’ button you found no results, as publication place was case sensitive.  The correct results are now returned.  I wasn’t able to ‘go live’ with the new site this week, however, due to other commitments.  I’ll start on this first thing next week.

On Thursday this week I attended a staff meeting in Edinburgh for the DSL.  Several new members of staff have joined the DSL recently and a meeting was arranged so everyone could get together, which is quite a rare occurrence, especially as the DSL do not have offices since the Lockdown happened.  I gave a little talk about the DSL’s website and my role in the DSL, and gave a brief history of the development of the DSL’s website.  It will be ten years this September since we launched the modern DSL website, which is quite a milestone.  It was a really great opportunity to meet everyone and we had a lovely meal out afterwards in a private dining room at the Scottish Arts Club.

I also continued with the new map of Iona place-names this week, ticking some more items off my ‘to do’ list.  I made some further updates to the map legend.  When categorising markers by code or element language the marker colours are now fixed.  I’ve gone with shades of purple for ‘Gaelic’ and shades of blue for ‘SSE’ in the languages, while in the classification codes coastal features are sandy colours, water is blue, vegetation is green etc.

I also worked on the ordering of the items in the legends for classification codes and languages.  Previously the order was pretty arbitrary and depended on the order the items were found in the results.  Now the groupings are ordered alphabetically, except for ‘Gaelic’ groups, which appear first.  I’m afraid ‘en’ appears last, though, as ordering is case sensitive.  I did attempt to negate this, but this caused other problems so I’ve kept it as it is.  I also ensured that ‘Gaelic’ and ‘SSE’ always come first in the order of languages (e.g. ‘Gaelic & en’ and not ‘en & Gaelic’) and I added tooltips for language groupings that need them.  Here’s a screenshot showing the new legend and colours:
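
For the record, the ordering I’d have liked is easy to express in isolation, though in the actual code it caused other problems.  A sketch with made-up groupings:

labels = ["en", "SSE", "Gaelic & en", "Gaelic", "en & Norse"]
# Case-insensitive alphabetical order, with 'Gaelic' groups sorted first
ordered = sorted(labels, key=lambda s: (not s.startswith("Gaelic"), s.lower()))
print(ordered)  # ['Gaelic', 'Gaelic & en', 'en', 'en & Norse', 'SSE']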

I also added in the ‘Cite’ text to the record, so when opening a full record if you select the ‘cite this record’ tab you’ll see the citation options.  I also added in the citation option for the map as a whole (the ‘cite’ button in the left-hand menu).  This has taken some time to implement as the citation text needed to include descriptive text to reflect the options that have been selected, including the base map, the classification type and whether the map is displaying a browse, a quick search or an advanced search, plus which exact options have been selected.

So, for example, if you browse for all current names starting with ‘S’ using the modern OS map, categorised by language, and press the ‘Cite’ button, the APA style of citation is formatted to display the following:

Map of Iona’s place-names: browse for place-names starting with ‘S’, classified by language, base map: modern OS. 2024. In Iona’s Namescape. Glasgow: University of Glasgow. Retrieved 19 January 2024, from [Map URL here].

Next week I’ll begin on the tabular view of the data, as well as going live with the Books and Borrowing site.

Week Beginning 8th January 2024

This was my first week back after the Christmas holidays and after catching up with emails I spent the best part of two days fixing the content management system of one of the resources that had been migrated at the end of last year.  The Saints Places resource (https://saintsplaces.gla.ac.uk/) is not one I created but I’ve taken on responsibility for it due to my involvement with other place-names resources.  The front-end was migrated by Luca and was working perfectly, but he hadn’t touched the CMS, which is understandable given that the project launched more than ten years ago.  However, I was contacted during the holidays by one of the project team who said that the resource is still regularly updated and therefore I needed to get the CMS up and running again.  This required updates to database query calls and session management and it took quite some time to update and test everything.  I also lost an hour or so with a script that was failing to initiate a session, even though the session start code looked identical to other scripts that worked.  It turned out that this was due to the character encoding of the script, which had been saved as UTF-8 with BOM.  This meant that hidden bytes were being output to the browser by PHP before the session was instantiated, which made the session fail.  Thankfully once I realised this it was straightforward to convert the script from UTF-8 with BOM to regular UTF-8, which solved the problem.
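
For future reference, a BOM is easy to detect and strip programmatically.  A small sketch (the filename is hypothetical):

BOM = b"\xef\xbb\xbf"

with open("script.php", "rb") as f:
    data = f.read()

if data.startswith(BOM):
    # These three bytes are what PHP sends to the browser before
    # session_start() runs, which is what breaks the session
    with open("script.php", "wb") as f:
        f.write(data[len(BOM):])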

With this unexpected task out of the way I then returned to my work on the new map interface for the Place-names of Iona project, working through the ‘to do’ list I’d created after our last project meeting just before Christmas.  I updated the map legend filter list to add in a ‘select all’ option.  This took some time to implement but I think it will be really useful.  You can now deselect the ‘select all’ to be left with an empty map, allowing you to start adding in the data you’re interested in rather than having to manually remove all of the uninteresting categories.  You can also reselect ‘select all’ to add everything back in again.

I did a bit of work on the altitude search, making it possible to search for an altitude of zero (either on its own or with a range starting at zero such as ‘0-10’).  This was not previously working as zero was being treated as empty (presumably PHP’s empty() check, which considers the string ‘0’ to be empty), meaning the search didn’t run.  I’ve also fixed an issue with the display of place-names with a zero altitude – previously these displayed an altitude of ‘nullm’ but they now display ‘0m’.  I also updated the altitude filter groups to make them more fine-grained and updated the colours to make them more varied rather than the shades of green we previously had.  Now 0-24m is a sandy yellow, 25-49m is light green, 50-74m is dark green, 75-99m is brown and anything over 99m is dark grey (currently no matching data).

I also made the satellite view the default map tileset, with the previous default moved to third in the list and labelled ‘Relief’.  This proved to be trickier to update than I thought it would be (e.g. pressing the ‘reset map’ button was still loading the old default even though it shouldn’t have) but I managed to get it sorted.  I also updated the map popups so they have a white background and a blue header to match the look of the full record and removed all references to Landranger maps in the popup as these were not relevant.  Below is a screenshot showing these changes:

I then moved onto the development of the elements glossary, which I completed this week.  This can now be accessed from the ‘Element glossary’ menu item and opens in a pop-up the same as the advanced search and the record.  By default elements across all languages are loaded but you can select a specific language from the drop-down list.  It’s also possible to cite or bookmark a specific view of the glossary, which will load the map with the glossary open at the required place.

I’ve tried to make better use of space than similar pages on the old place-names sites by using three columns.  The place-name elements are links and pressing on one performs a search for the element in question.  I also updated the full record popup to link the elements listed in it to the search results.  I had intended to link to the glossary rather than the search results, which is what happens in the other place-names sites, but I thought it would be more useful and less confusing to link directly to the search results instead.  Below is a screenshot showing the glossary open and displaying elements in Scottish Standard English:

I also think I’ve sorted out the issue with in-record links not working as they should in Chrome and other issues involving bar characters.  I’ve done quite a bit of testing with Chrome and all seems fine to me, but I’ll need to wait and see if other members of the team encounter any issues.  I also added in the ‘translation’ field to the popup and full record (although there are only a few records that currently have this field populated), relabelled the historical OS maps and fixed a bug in the CMS that was resulting in multiple ampersands being generated when an ampersand was used in certain fields.

My final update for the project this week was to change the historical forms in the full record to hide the source information by default.  You now need to press a ‘show sources’ checkbox above the historical forms to turn these on.  I think having the sources turned off really helps to make the historical forms easier to understand.

I also spent a bit of time this week on the Books and Borrowing project, including participating in a project team Zoom call on Monday.  I had thought that we’d be ready for a final cache generation and the launch of the full website this week, but the team are still making final tweaks to the data and this had therefore been pushed back to Wednesday next week.  But this week I updated the ‘genre through time’ visualisation as it turned out that the query that returned the number of borrowing records per genre per year wasn’t quite right and this was giving somewhat inflated figures, which I managed to resolve.  I also created records for the first volume of the Leighton Library Minute Books.  There will be three such volumes in total, all of which will feature digitised images only (no transcriptions).  I processed the images and generated page records for the first volume and will tackle the other two once the images are ready.

Also this week I made a few visual tweaks to the Erskine project website (https://erskine.glasgow.ac.uk/) and I fixed a misplaced map marker in the Place-names of Berwickshire resource (https://berwickshire-placenames.glasgow.ac.uk/).  For some reason the longitude was incorrect for the place-name, even though the latitude was fine, which resulted in the marker displaying in Wales.  I also fixed a couple of issues with the Old English Thesaurus for Jane Roberts and responded to a query from Jennifer Smith regarding the Speak For Yersel resource.

Finally, I investigated an issue with the Anglo-Norman Dictionary.  An entry was displaying what appeared to be an erroneous first date so I investigated what was going on.  The earliest date for the entry was being generated from this attestation:

<attestation id="C-e055cdb1">
  <dateInfo>
    <text_date post="1390" pre="1314" cert="">1390-1412</text_date>
    <ms_date post="1400" pre="1449" cert="">s.xv<sup>1</sup></ms_date>
  </dateInfo>
  <quotation>luy donantz aussi congié et eleccion d’estudier en divinitee ou en loy canoun a son plesir, et ce le plus favorablement a cause de nous</quotation>
  <reference><source siglum="Lett_and_Pet" target=""><loc>412.19</loc></source></reference>
</attestation>

Specifically the text date:

<text_date post="1390" pre="1314" cert="">1390-1412</text_date>

This particular attestation was being picked as the earliest due to a typo in the ‘pre’ date, which is 1314 when it should be 1412.  Where there is a range of dates the code generates a single year at the midpoint that is used as a hidden first date for ordering purposes (this was agreed upon back when we were first adding in first dates of attestation).  The code to do this subtracts the ‘post’ date from the ‘pre’ date, divides this in two and then adds the result to the ‘post’ date, which finds the middle point.  With the typo the code therefore subtracts 1390 from 1314, giving -76.  This is divided in two giving -38.  This is then added onto the ‘post’ date of 1390, which gives 1352.  1352 is the earliest such date for any of the entry’s attestations and therefore the earliest display date is set to ‘1390-1412’.  Fixing the typo in the XML and reprocessing the file should therefore rectify the issue.
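
The midpoint rule is trivial to express; with the corrected dates the hidden sort year becomes 1401 rather than 1352:

def midpoint_year(post, pre):
    # Midpoint of a date range, used as the hidden year for ordering purposes
    return post + (pre - post) // 2

print(midpoint_year(1390, 1412))  # 1401 -- with the corrected 'pre' date
print(midpoint_year(1390, 1314))  # 1352 -- the erroneous sort date from the typo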

Week Beginning 18th December 2023

This was the last working week before Christmas, and it was a four-day week due to Friday being given in lieu of Christmas Eve, which is on Sunday this year.  I’ll be off for the next two weeks, which I’m really looking forward to.

The Books and Borrowing project officially comes to an end on the 31st of December, although we’re not going to launch the front-end until sometime in January.  I still had rather a lot of things to do for the project and therefore spent the entirety of my four days this week working for this project.  Of course, the other team members were also frantically trying to get things finished off, which often led to them spotting something they needed me to sort out, so I found myself even busier than I was expecting.  However, by Thursday I had managed to complete all of the tasks I’d hoped to finish, plus many more that were sent my way as the week progressed.

At the end of last week I’d begun updating the site text that the project PI Katie had supplied me with, and I completed this task, finally banishing all of the placeholder text.  This also involved much discussion about the genre visualisations and what they actually represent, which we thankfully reached agreement on.  I also added in some further images of a register at Leighton library and wrote a script to batch update changes to the publication places, dates of publication and formats of many book edition records.  One of the researchers also spotted that the ‘next’ and ‘previous’ links for the two Selkirk registers were not working, due to an earlier amalgamation of page records into ‘double spread’ records.  I therefore wrote another script to sort these out.

I then added new ‘Top ten book work’ lists to the site-wide ‘Facts’ page (overall and by the gender of borrowers).  This required me to update the script that generated the cache that I developed last week, to rerun the script to generate fresh data, to update the API to ensure that works were incorporated into the output and to update the front-end to add in the data.  Hopefully the information will be of interest to people.

I then overhauled the highlighting of search terms in the search results.  This was previously only working with the quick search, and only when no wildcards were used in the search.  Instead I used a nice JavaScript library called mark.js (https://markjs.io/) that I’d previously used for the DSL website to add in highlighting on the client-side.  Now the values in any search fields that are searched for will be highlighted in the record, including when wildcards are used.  I also updated the highlight style to make it a bit less harsh.

It should be noted that highlighting is still a bit of a blunt tool – any search terms will be highlighted throughout the entire record where the term is found.  So if you search for the occupation ‘farmer’ then wherever ‘farmer’ is found in the record it will be highlighted, not just in the normalised occupation list.  Similarly, if you search for ‘born’ then the ‘born’ text in the author information will be highlighted.  It’s not feasible to make the highlighting more nuanced in the time we have left, but despite this I think that on the whole the highlighting is useful.

I reckoned that the highlighting could end up being a bit distracting so I added in an option to turn results highlighting on or off.  I added a button to toggle this to the search results page, as part of the bar of buttons that includes the ‘Cite’ and ‘Download’ options.  The user’s choice is remembered by the site, so if you turn highlighting off and then navigate through the pages of results or perform a filter the highlights stay off.  They will stay off until you turn them on again, even if you return to the site after closing your browser.
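
This sort of persistence can be handled with localStorage; the following is a minimal sketch, with a hypothetical storage key and CSS class rather than the site’s actual code:

    const KEY = 'bnb-highlighting'; // hypothetical storage key

    function setHighlighting(on: boolean): void {
      localStorage.setItem(KEY, on ? 'on' : 'off');
      // 'no-highlights' is a hypothetical CSS class that hides the marks
      document.body.classList.toggle('no-highlights', !on);
    }

    // On page load, default to 'on' unless the user previously switched it off
    setHighlighting(localStorage.getItem(KEY) !== 'off');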

One of the researchers noticed that an unnecessary near-duplicate genre had somehow been introduced into the system (‘Fine Art’ instead of ‘Fine Arts’) so I removed it and reassigned any records that had been assigned to the erroneous version.  The PI Katie also spotted some odd behaviour with the search form boxes: when using the browser’s ‘back’ button search data was being added to the wrong search boxes.  This took quite some time to investigate and I couldn’t replicate the issue in Firefox (the browser I use by default), but when using a Chrome-based browser (MS Edge) I experienced it.  It turns out it’s nothing to do with my code but is a bug in Chrome (see https://github.com/vuejs/vue/issues/11165).  The fix mentioned on this page was to add ‘autocomplete=”off”’ to the form and this seems to have sorted the problem.  It’s crazy that this issue with Chrome hasn’t been fixed, as the posts on the page identifying it started in 2020.

Katie also spotted another issue when using Chrome: applying multiple filters to the search results wasn’t working, even though it worked fine in Firefox.  This time it was caused by Chrome encoding the bar character as %7C while Firefox keeps it as ‘|’.  My filter script was splitting up filters on the literal bar character, and as this wasn’t present in Chrome’s encoded version multiple filters were never applied.  Thankfully, once identified this was relatively easy to fix.
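
The essence of the fix is to decode the query string before splitting it, so that Chrome’s ‘%7C’ and Firefox’s literal bar behave identically; a sketch (the parameter and example values are illustrative):

    function parseFilters(raw: string): string[] {
      // Decode first so '%7C' becomes '|' before we split
      return decodeURIComponent(raw)
        .split('|')
        .filter((f) => f.length > 0);
    }

    parseFilters('genre:Fiction%7Cgenre:Travel'); // ['genre:Fiction', 'genre:Travel']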

I also managed to implement a ‘compact’ view of borrowing records this week, something that had been on my ‘to do’ list for a while.  Borrowing records can be extremely verbose and rather overwhelming so we decided to give the option to view compact versions of the records that contain a narrower set of fields.  I added a compact / full record view switcher to the bar of options in the top right of the search results and library register pages, beside the ‘Cite’ option.  As with the highlighting feature discussed above, the choice is remembered in your browser, even if you return to the site in a later session (so long as you’re using the same device and browser, of course).

For the compact view I decided to retain the links to the library, register and page as I figured it would be useful to be able to see these.  Also included are the borrowed and returned dates, the borrowers (names only), the title of the Book Work (or Works) if the record has such an association and the title of the Holding if not, any associated authors and genres, plus a list of the volumes borrowed (if applicable).  The following screenshot shows what the compact view looks like:

My final tasks of the week were to add in a cookie banner for the site and install Google Analytics.  In the New Year I’ll need to regenerate the Solr index and then integrate the development site with the live site.  This will include making updates to paths throughout the code, ensuring the existing Chambers Maps continues to function, adding links to the pages of the development site to the site menu and adding the quick search option to the site header.  It will be great once the site is fully accessible.

Also this week I created a new Google Analytics property for a site the DSL launched a month or two ago and spoke to Geert, the editor of the AND, about an issue he’d spotted with entry dates (which I’ll investigate after Christmas).  I finished off my work for the year by removing the Twitter widget from every site I’m responsible for.  Twitter blocked access to the widget that allows a feed to be embedded in a website a few months ago and it looks like this is a permanent change.  It meant that instead of a nice Twitter feed, an empty box and a ‘nothing to see here’ message were displayed on all of my sites, which was obviously no good.  It feels quite liberating to drop Twitter (or X as it is currently called).

That’s all from me for this year.  If anyone is reading this I wish you all the best for Christmas and 2024!

Week Beginning 11th December 2023

I devoted about three days of this week to developing the new place-names map for the Iona project.  My major task was to make the resource ‘remember’ things, which has taken a long time to implement as basically everything I’d developed so far needed to be extended and reworked.  Now as you scroll around the map and zoom in and out the hash in the URL in the address bar updates.  Your selected ‘Display options’ also appear in the hash.  What this means is you can bookmark or share a specific view.  I can’t share the full URL of the map yet as it’s not publicly available, but for example the hash ‘#15.5/56.3490/-6.4676/code/tileNLS1/labelsOn’ provides a URL that loads the map zoomed in on the islands of Stac MhicMhurchaidh and Rèidh-Eilean, with the OS 1840-1880 base map selected, categorised by classification code and labels always on.  Any search or browse options you’ve entered or selected are also remembered in this way, for example the hash ‘#16/56.3305/-6.3967/language/tileSat/labelsOff/browse/nameCurrent|C’ gives a URL for a ‘browse’ showing current names beginning with ‘C’ on a satellite map categorised by element language focussed on the main settlement.

The same approach also works for the search facility, with search options separated by a double bar and the search term and its value separated by a single bar.  For example ‘#13.75/56.3287/-6.4139/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G’ gives a URL for an advanced search showing results where the element language is Gaelic and the element in question is ‘dùn’, categorised by earliest recorded date.  With an advanced search you can now press on the ‘Refine your advanced search’ button and any advanced search options previously entered will appear in the search boxes.
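
As an illustration, serialising the basic map state into a hash of this form might look something like the following sketch (the interface and function names are mine, and the real code also encodes searches, browses and records):

    interface MapState {
      zoom: number;
      lat: number;
      lng: number;
      categorise: string;               // e.g. 'code', 'language', 'date'
      tile: string;                     // e.g. 'tileSat', 'tileNLS1'
      labels: 'labelsOn' | 'labelsOff';
    }

    function writeHash(s: MapState): void {
      window.location.hash = [s.zoom, s.lat.toFixed(4), s.lng.toFixed(4),
        s.categorise, s.tile, s.labels].join('/');
    }

    function readHash(): string[] {
      return window.location.hash.replace(/^#/, '').split('/');
    }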

You can also now bookmark and share a record when you open the ‘full record’, as the action of opening a record also adds information to the URL in the address bar.  So, for example ‘#14/56.3293/-6.4125/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G/record/4262’ is the hash of the record for ‘Dùn nam Manach’.  Note that this also preserves whatever search or browse led to the place-name being opened.

I also made some updates to the ‘full record’, the most important being that the text appearing as links can now be clicked on to perform a search for that information.  So for example if you press on the classification code in the record this will close the record and perform a search for the code.  Place-name element links are not yet operational, though, as these will link to the element glossary, which I still need to create.  I have however created a new menu item in the left-hand menu for the glossary and have figured out how it will work.  I’m intending to make it a modal pop-up like the record and advanced search.  I did consider adding it into the left-hand menu like the ‘browse’ but there’s just too much information for it to work there.

I also added a new ‘Cite this record’ tab to the ‘full record’, which will display citation information for the specific record; the citation text itself still needs to be added.  Also new is a bar of icons underneath the left-hand menu options.  This contains buttons for citing the map view, viewing the data displayed on the map as a table and downloading the map data as a CSV file, but none of these are operational yet.

On Thursday I had a meeting with the Iona team to discuss the map.  They are pretty pleased with how it’s turning out, but they did notice a few bugs and things they would like done differently.  I made a sizeable ‘to do’ list and I will tackle this in the new year.

I spent most of the remainder of the week working on the Books and Borrowing project.  I updated the languages assigned to a list of book editions that had been given to me in a spreadsheet and added a few extra pages and page images to one of the registers.   I then returned to my ‘to do’ list for the project and worked through some of the outstanding items.  I moved the treemaps on the library and site-wide ‘facts’ pages to separate tabs and I went through the code to ensure that the data for all visualisations only uses borrowing records set to ‘in stats’.  This wasn’t done before so many of the visualisations and data summaries will have changed slightly.  I also removed the non-male/female ‘top ten’ lists in the library facts page, as requested.

I then moved on to creating a cache for the facts page data, which took about a day to implement.  I firstly generated static data for each library and stored this as JSON in the database.  This is then used for the library facts page rather than processing the data each time.  However, the site-wide facts page lets the user select any combination of libraries (or select all libraries) and the ‘top ten’ lists therefore have to dynamically reflect the chosen libraries.  This meant updating the API to pull in the ‘facts’ JSON files for each selected library and then analyse them in order to generate new ‘top tens’ for the chosen libraries.  For example, working out the top ten genres for all selected libraries meant going through the individual top ten genre lists for each library, working out the total number of borrowings for each genre and then reordering things after this merging of data was complete.
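
A stripped-down sketch of that merging step is below; the types and field names are hypothetical rather than the project’s actual structures:

    interface GenreCount { genre: string; borrowings: number; }

    function mergeTopTens(libraries: GenreCount[][]): GenreCount[] {
      const totals = new Map<string, number>();
      for (const lib of libraries) {
        for (const { genre, borrowings } of lib) {
          totals.set(genre, (totals.get(genre) ?? 0) + borrowings);
        }
      }
      // Reorder after merging and keep the new top ten
      return [...totals.entries()]
        .map(([genre, borrowings]) => ({ genre, borrowings }))
        .sort((a, b) => b.borrowings - a.borrowings)
        .slice(0, 10);
    }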

Despite still requiring this processing the new method of using the cached data is considerably faster than querying and generating the data afresh each time the user requests it.  Previously displaying the site-wide ‘facts’ page for all libraries was taking up to a minute to complete whereas now it takes just a few seconds.  I also made a start on updating the site text that Katie had sent me earlier in the week.  A large number of tweaks and changes are required and this is likely to take quite a long time, but I hope to have it finished next week.

Towards the start of the week I also spent some time in discussions about what should become of the Pilot Scots Thesaurus website.  The project ended in 2015, the PI moved on from the University several years ago and the domain name will expire in April 2024.  We eventually decided that the site will be archived, with the data added to the Enlighten research data repository and the main pages of the site archived and made available via the University’s web archive partner.

Towards the end of the week I did some further work for the Anglo-Norman Dictionary, including replacing the semi-colon with a diamond in the entry summary box (e.g. see https://anglo-norman.net/entry/colur_1) and discussing whether labels should also appear in the box (we decided against it).  I also had a discussion with the editor Geert about adding new texts to the Textbase and thought a little about the implications of this, given that the texts are not marked up as TEI XML and the Textbase was developed around TEI XML texts.  I’ll probably do some further investigation in the new year once Geert sends me on some sample files to work with.

Week Beginning 4th December 2023

After spending much of my time over the past three weeks adding genre to the Books and Borrowing project I turned my attention to other projects for most of this week.  One of my main tasks was to go through the feedback from the Dictionaries of the Scots Language people regarding the new date and quotation searches I’d developed back in September.  There was quite a lot to go through, fixing bugs and updating the functionality and layout of the new features.  This included fixing a bug with the full-text Boolean search, which was querying the headword field rather than the full text, and changing the way quotation search ranking works.  Previously quotation search results were ranked by the percentage of matching quotes, and if this was the same then the entry with the larger number of quotes would appear higher.  Unfortunately this meant that entries with only one quote ended up ranked higher than entries with large numbers of quotes, not all of which contained the term.  I updated this so that the algorithm now counts the number of matching quotes and ranks primarily on this, only using the percentage of matching quotes when two entries have the same number of matching quotes.  So now a quotation search for ‘dreich’ ranks what are hopefully the most important entries first.
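
Expressed as a comparator, the revised ranking looks something like the sketch below (field names are illustrative):

    interface EntryResult { matching: number; total: number; }

    function compareResults(a: EntryResult, b: EntryResult): number {
      // Primary: number of matching quotes (descending)
      if (b.matching !== a.matching) return b.matching - a.matching;
      // Tie-break: percentage of quotes that match
      return b.matching / b.total - a.matching / a.total;
    }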

I also updated the display of dates in quotations to make them bold and updated the CSV download option to limit the number of fields that get returned.  I also noticed that when a quotation search exceeded the maximum number of allowed results (e.g. ‘heid’) it was returning no results due to a bug in the code, which I fixed.  I also fixed a bug that was stopping wildcards in quick searches from working as intended and fixed an issue with the question mark wildcard in the advanced headword search.

I then made updates to the layout of the advanced search page, including adding placeholder ‘YYYY’ text to the year boxes, adding a warning about the date range when the dates provided are beyond the scope of the dictionaries and overhauling the search help layout down the right of the search form.  The help text scroll down/up was always a bit clunky so I’ve replaced it with what I think is a neater version.  You can see this, and the year warning, in the following screenshot:

I also tweaked the layout of the search results page, including updating the way the information about what was searched for is displayed, moving some text to a tooltip, moving the ‘hide snippets’ option to the top menu bar and ensuring the warning that is displayed when too many results are returned appears directly above the results.  You can see all of this in the following screenshot:

I then moved onto updates to the sparklines.  The team decided they wanted the gap length between attestations to be increased from 25 to 50 years.  This would mean individual narrow lines would then be grouped into thicker blocks.  They also wanted the SND sparkline to extend to 2005, whereas previously it was cut off at 2000 (with any attestations after this point given the year 2000 in the visualisation).  These updates required me to make changes to the scripts that generate the Solr data and to then regenerate the data and import it into Solr.  This took some time to develop and process, and currently the results are only running on my laptop as it’s likely the team will want further changes made to the data.  The following screenshot shows a sparkline when the gap length was set to 25 years:

And the following screenshot shows the same sparkline with the gap length set to 50 years:

I also updated the dates that are displayed in an entry beside the sparkline to include the full dates of attestation as found in the sparkline tooltip rather than just displaying the first and last dates of attestation.
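
As an aside, the block-grouping logic behind the sparklines is simple enough to sketch; the following (with my own function name) merges attestation years into blocks given a maximum gap:

    function groupAttestations(years: number[], gap = 50): Array<[number, number]> {
      const sorted = [...years].sort((a, b) => a - b);
      const blocks: Array<[number, number]> = [];
      for (const y of sorted) {
        const last = blocks[blocks.length - 1];
        if (last && y - last[1] <= gap) last[1] = y; // extend the current block
        else blocks.push([y, y]);                    // start a new block
      }
      return blocks;
    }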

I completed going through the feedback and making updates on Wednesday and now I need to wait and see whether further updates are required before we go live with the new date and quotation search facilities.

I spent the rest of the week working on various projects.  I made a small tweak to remove an erroneous category from the Old English Thesaurus and dealt with a few data issues for the Books and Borrowing project too, including generating spreadsheets of data for checking (e.g. a list of all of the distinct borrower titles) and then making updates to the online database after these spreadsheets had been checked.  I also fixed a bug with the genre search, which was joining multiple genre selections with Boolean AND when it should have been joining them with Boolean OR.
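
The genre search fix boils down to joining the selected genres with OR when the query is built; a sketch, assuming a Solr-style ‘genre’ field (the field name is my assumption):

    function genreQuery(genres: string[]): string {
      return genres.map((g) => `genre:"${g}"`).join(' OR ');
    }

    genreQuery(['Fiction', 'Travel']); // 'genre:"Fiction" OR genre:"Travel"'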

I also returned to working for the Anglo-Norman Dictionary.  This included updating the XSLT so that legiturs in variant lists display properly (see ‘la noitement (l. l’anoitement)’ here: https://anglo-norman.net/entry/anoitement).  Whilst sorting this out I noticed that some entries appeared to have multiple ‘active’ records in the database – a situation that should never have happened.  After spotting this I did some frantic investigation to understand what was going on.  Thankfully it turned out that the issue had only affected 23 entries, with all but two of them having two active records.  I’m not sure what happened with ‘bland’ to result in 36 active records, or ‘anoitement’ with 9, but I figured out a way to resolve the issue and ensure it doesn’t happen again in future.  I updated the script that publishes holding area entries to ensure any existing ‘active’ records are removed when the new record is published.  Previously the script was only dealing with one ‘active’ entry (as that is all there should have been), which I think may have been how the issue cropped up.  In future the duplicate issue will rectify itself whenever one of the records with duplicate active records is edited – at the point of publication all existing ‘active’ records will be moved to the ‘history’ table.
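
In outline the publication fix works as in the sketch below; the table and column names are hypothetical, not the AND’s actual schema:

    interface Db { query(sql: string, params?: unknown[]): Promise<void>; }

    async function publishEntry(db: Db, entryId: number, xml: string): Promise<void> {
      // Move *all* existing active records to the history table, not just one
      await db.query(
        'INSERT INTO entry_history SELECT * FROM entries WHERE entry_id = ? AND active = 1',
        [entryId]);
      await db.query('DELETE FROM entries WHERE entry_id = ? AND active = 1', [entryId]);
      // Publish the new record as the single active version
      await db.query('INSERT INTO entries (entry_id, xml, active) VALUES (?, ?, 1)',
        [entryId, xml]);
    }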

Also for the AND this week I updated the DTD to ensure that superscript text is allowed in commentaries.  I also removed the embedded Twitter feed from the homepage as it looks like this facility has been permanently removed by Twitter / X.  I’ve also tweaked the logo on narrow screens so it doesn’t display so large, which should make the site better to use on mobile phones, and I fixed an issue with the entry proofreader, which was referencing an older version of jQuery that no longer existed.  I also fixed the dictionary’s ‘browse up’ facility, which had broken.

I also found some time to return to working on the new map interface for the Iona place-names project and have now added in the full record details.  When you press on a marker to open the popup there is now a ‘View full record’ button.  Pressing on this opens an overlay, similar to the ‘Advanced search’, that contains all of the information about the record, in the same way as the record page on the other place-name resources.  This is divided into a tab for general information and another for historical forms, as you can see from the following screenshot:

Finally this week I kept project teams updated on another server move that took place overnight on Thursday.  This resulted in downtime for all affected websites, but all was working again the next morning.  I needed to go through all of the websites to ensure they were working as intended after the move, and thankfully all was well.

Week Beginning 27th November 2023

I completed work on the integration of genre into the Books and Borrowing systems this week.  It took a considerable portion of the week to finalise the updates but it’s really great to have it done, as it’s the last major update to the project.

My first task was to add genre selection to the top-level ‘Browse Editions’ page, which I’m sure will be very useful.  As you can see in the following screenshot, genres now appear as checkboxes as with the search form, allowing users to select one or more they’re interested in.  This can be done in combination with publication date too.  The screenshot shows the book editions that are either ‘Fiction’ or ‘Travel’ that were published between 1625 and 1740.  The selection is remembered when the user changes to a different view (i.e. authors or ‘top 100’) and when they select a different letter from the tabs.

It proved to be pretty tricky and time-consuming to implement.  I realised that not only did the data that is displayed need to be updated to reflect the genre selection, but the counts in the letter tabs needed to be updated too.  This may not seem like a big thing, but the queries behind it took a great deal of thought.  I also realised whilst working on the book counts that the counts in the author tabs were wrong – they were only counting direct author associations at edition level rather than taking higher level associations from works into consideration.  Thankfully this was not affecting the actual data that was displayed, just the counts in the tabs.  I’ve sorted this too now, which also took some time.

With this in place I then added a similar option to the in-library ‘Book’ page.  This works in the same way as the top-level ‘Editions’ page, allowing you to select one or more genres to limit the list of books that are displayed, for example only books in the genres of ‘Belles Lettres’ and ‘Fiction’ at Chambers, ordered by title or the most popular ‘Travel’ books at Chambers.  This did unfortunately take some time to implement as Book Holdings are not exactly the same as Editions in terms of their structure and connections so even though I could use much of the same code that I’d written for Editions many changes needed to be made.

The new Solr core was also created and populated at Stirling this week, after which I was able to migrate my development code from my laptop to the project server, meaning I could finally share my work with the rest of the team.

I then moved onto adding genre to the in-library ‘facts’ page and the top-level ‘facts’ page.  Below is a very long screenshot of the entire ‘facts’ page for Haddington library and I’ll discuss the new additions below:

The number of genres found at the library is now mentioned in the ‘Summary’ section and there is now a ‘Most popular genres’ section, which is split by gender as with the other lists.  I also added in pie charts showing the book genres represented at the library and the percentage of borrowings of each genre.  Unfortunately these can get a bit cluttered due to there being up to 20-odd genres present, so I’ve added in a legend showing which colour is which genre.  You can hover over a slice to view the genre title and its value, and you can click on a slice to perform a search for borrowing records featuring a book of that genre in the library.  Despite being a bit cluttered I think the pies can be useful, especially when comparing the two charts – for example at Haddington ‘Theology’ books make up more than 36% of the library but only 8% of the borrowings.

Due to the somewhat cluttered nature of the pie charts I also experimented with a treemap view of Genre.  I had stated we would include such a view in the requirements document, but at that time I had thought genre would be hierarchical, and a treemap would display the top-level genres and the division of lower level genres within these.  Whilst developing the genre features I realised that without this hierarchy the treemap would merely replicate the pie chart and wouldn’t be worth including.

However, when the pie charts turned out to be so cluttered I decided to experiment with treemaps as an alternative.  The results currently appear after the pie charts in the page.  I initially liked how they looked – the big blocks look vaguely ‘bookish’ and having the labels in the blocks makes it easier to see what’s what.  However, there are downsides.  Firstly, it can be rather difficult to tell which genre is the biggest due to the blocks having different dimensions – does a tall, thin block have a larger area than a shorter, fatter block, for example?  It’s also much more difficult to compare two treemaps, as the position of the genres changes depending on their relative size.  Thankfully the colour stays the same, but it takes longer than it should to ascertain where a genre has moved to in the other treemap and how its size compares.  I met with the team on Friday to discuss the new additions and we agreed that we would keep the treemaps, but that I’d add them to a separate tab, with only the pie charts visible by default.

I then added in the ‘borrowings over time by genre’ visualisation to the in-library and top-level ‘facts’ pages.  As you can see from the above screenshot, these divide the borrowings into genres in a stacked bar chart per year (or per month if a year is clicked on), much in the same way as the preceding ‘occupations’ chart.  Note however that the total numbers for each year are not the same as for the occupations through time visualisation: books may have multiple genres and borrowers may have multiple occupations, and the counts reflect the number of times a genre / occupation is associated with a borrowing record each year (or month if you drill down into a year).  We might need to explain this somewhere.
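
To make the counting explicit, a record with two genres adds two to that year’s stacked total, as in this sketch (the types are illustrative):

    interface Borrowing { year: number; genres: string[]; }

    function genreCountsByYear(records: Borrowing[]): Map<number, Map<string, number>> {
      const byYear = new Map<number, Map<string, number>>();
      for (const r of records) {
        const counts = byYear.get(r.year) ?? new Map<string, number>();
        for (const g of r.genres) counts.set(g, (counts.get(g) ?? 0) + 1);
        byYear.set(r.year, counts);
      }
      return byYear;
    }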

We met on Friday to discuss the outstanding tasks.  We’ll probably go live with the resource in January, but I will try to get as many of my outstanding tasks completed before Christmas as possible.

Also this week I fixed another couple of minor issues with the Dictionaries of the Scots Language.  The WordPress part of the site had defaulted to using the new, horrible blocks interface for widgets after a recent update, meaning the widgets I’d created for the site no longer worked.  Thankfully installing the ‘Classic Widgets’ plugin fixed the issue.  I also needed to tweak the CSS for one of the pages where the layout was slightly wonky.

I also made a minor update to the Speech Star site and made a few more changes to the new Robert Fergusson site, which has now gone live (see https://robert-fergusson.glasgow.ac.uk/). I also had a chat with our IT people about a further server switch that is going to take place next week and responded to some feedback about the new interactive map of Iona placenames I’m developing.

Also this week I updated the links to one of the cognate reference websites (FEW) from entries in the Anglo-Norman Dictionary, as the website had changed its URL and site structure.  After some initial investigation it appeared that the new FEW website made it impossible to link to a specific page, which is not great for an academic resource that people will want to bookmark and cite.  Ideally the owners of the site should have placed redirects from the pages of the old resource to the corresponding page on the new resource (as I did for the AND).

The old links to the FEW as found in the AND (e.g. the FEW link that before the update was on this page: https://anglo-norman.net/entry/poer_1) were formatted like so: https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231, which now gives a ‘not found’ error.  This URL contains the volume number (9, which for reasons unknown to me was specified as ‘90’) and the page number (231).  The new resource is found here: https://lecteur-few.atilf.fr/ and it lets you select a volume (e.g. 9: Placabilis-Pyxis) and enter a page (e.g. 231), which then updates the data on the page (e.g. showing ‘posse’ as the original link from AND ‘poer 1’ used to do).  But crucially, their system does not update the URL in the address bar, meaning no-one can cite or bookmark their updated view, and it looked like we couldn’t link to a specific view.

Their website makes it possible to click on a form to load a page (e.g. https://lecteur-few.atilf.fr/index.php/page/lire/e/198595), but the ID in the resulting page URL is an autogenerated ID that bears no relation to the volume or page number and couldn’t possibly be ascertained by the AND (or any other system), so it is of no use to us.  Also, the ‘links’ that users click on to load the above URL are not HTML links at all but are generated in JavaScript after the user clicks on them.  This means it wouldn’t be possible for me to write a script that would grab each link for each matching form.  It also means a user can’t hover over the link to see where it leads, or open the link in a new tab or window, which is also not ideal.  In addition, once you’re on a page like the one linked to above, navigating between pages doesn’t update the URL in the address bar, so a user who loads the page, then navigates to different pages and finds something of interest, can’t bookmark or cite the correct page as the URL is still for the first page they loaded.  Again, this is not very good.

Thankfully Geert noticed that another cognate reference site (the DMF) had updated their links to use new URLs that are not documented on the FEW site, but do appear to work (e.g. https://lecteur-few.atilf.fr/lire/90/231).  This was quite a relief to discover as otherwise we would not have been able to link to specific FEW pages.  Once I knew this URL structure was available updating the URLs across the site was a quick update.
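
The rewrite itself was then a simple pattern substitution, along the lines of this sketch (not the exact script):

    function rewriteFewUrl(oldUrl: string): string | null {
      const m = oldUrl.match(/\/lire\/volume\/(\d+)\/page\/(\d+)/);
      return m ? `https://lecteur-few.atilf.fr/lire/${m[1]}/${m[2]}` : null;
    }

    rewriteFewUrl('https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231');
    // 'https://lecteur-few.atilf.fr/lire/90/231'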

Finally this week, I had a meeting with Clara Cohen and Maria Dokovova to discuss a possible new project that they are putting together.  This will involve developing a language game aimed at primary school kids and we discussed some possible options for this during our meeting.  Afterwards I wrote up my notes and gave the matter some further thought.