Week Beginning 19th June 2023

I continued to work for the Books and Borrowing project this week, switching the search facilities over to use a new Solr index that includes author gender.  It is now possible to incorporate author gender into searches, for example bringing back all borrowing records involving books written by women.  This will be a hugely useful feature.  I also fixed an issue with a couple of page images of a register at Leighton Library that weren’t displaying.

The rest of my time this week was spent developing a new Bootstrap-powered interface for the project’s website, which is now live (https://borrowing.stir.ac.uk/).  You’d struggle to notice any difference between this new version and the old one, as the point of creating this new theme was not to change the look of the website but to make Bootstrap (https://getbootstrap.com/) layout options available to the dev site.  This will allow me to improve the layout of things like the advanced search forms.  I haven’t made any such updates yet, but that is what I’ll focus on next.

It has taken quite a bit of time to get the new theme working properly (blog posts with ‘featured images’ that replace the site’s header image proved particularly troublesome), but I think everything is functioning as it should now.  There are a few minor differences between the new theme and the old one.  The new theme has a ‘Top’ button that appears in the bottom right when you scroll down a long page, which I find useful.  The drop-down menus in the navbar look a bit different, as does the compact navbar shown on narrow screens.  All pages now feature the sidebar, whereas previously some (e.g. https://borrowing.stir.ac.uk/libraries/) didn’t show it.  Slightly more text is shown in the snippets on https://borrowing.stir.ac.uk/project-news/ and the other blog index pages.  Our title font is now used for more titles throughout the site.  I’ve also added a ‘favicon’ for the site, which appears in the browser tab.  It’s the head of the woman second from the right in the site banner, although it is a bit indistinct.  My first attempt used the book held by the woman in the middle of the banner, but this just ended up as a beige blob.

Next week I’ll update the layout of the dev site pages to use Bootstrap.  I’m going to be on holiday the week after next and at a conference the week after that, so this might be a good time to share the URL for feedback: other than adding book genre once this is available, everything else should be pretty much finished.

For the Anglo-Norman Dictionary this week I participated in a conference call to discuss collaborative XML editing environments.  The team want to work together directly on the XML files and to have a live preview of how their changes will appear.  They are investigating Fonto (https://www.fontoxml.com/), Paligo (https://paligo.net/) and Xeditor (https://www.xpublisher.com/en/xeditor).  However, none of these solutions makes any mention whatsoever of pricing on its website, which is incredibly frustrating and off-putting.  I also mentioned the DPS system that the DSL uses (https://www.idmgroup.com/content-management/dps-info.html).  We’ll need to give this some further thought.

I also spent a bit of time writing a script to extract language tags from the data.  The script goes through each ‘active’ entry in the online database, picks out all of the language tags from the live entry XML and stores each language along with the number of times it appears in the entry (across all senses, subsenses and locutions).  It does the same for the ‘dms_entry_old’ XML data (i.e. the data that was originally stored in the current system before any transformations or edits were made) for each of these ‘active’ entries (where this XML data exists), storing each language and its frequency as an ‘old’ language.  In addition, the script goes through each of the ‘R’ XML files and picks out all of the language tags they contain, adding these to the list of ‘old’ languages.  For each ‘active’ entry that has at least one ‘live’ or ‘old’ language the script exports the slug and the ‘live’ and ‘old’ languages, consisting of each language found and the number of times it was found in the entry.  This data is then saved in a spreadsheet.
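As a rough illustration of the counting step, here is a minimal Python sketch.  It assumes, hypothetically, that each language tag looks like <language lang="..."/> (the real AND element and attribute names may differ), that the live, ‘dms_entry_old’ and relevant ‘R’ XML for an entry are already available as strings, and that a CSV file is an acceptable stand-in for the spreadsheet.  The names count_languages, format_counts, entry_row and export_rows are made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tag/attribute names: adjust to the real AND schema.
LANG_TAG = "language"
LANG_ATTR = "lang"


def count_languages(xml_text):
    """Count language tags anywhere in an entry (senses, subsenses, locutions)."""
    if not xml_text:
        return Counter()  # e.g. no 'dms_entry_old' data for this entry
    root = ET.fromstring(xml_text)
    return Counter(
        el.get(LANG_ATTR) for el in root.iter(LANG_TAG) if el.get(LANG_ATTR)
    )


def format_counts(counts):
    """Render a Counter as 'Language (n); ...' for a spreadsheet cell."""
    return "; ".join(f"{lang} ({n})" for lang, n in sorted(counts.items()))


def entry_row(slug, live_xml, old_xml=None, r_xml_files=()):
    """Build one spreadsheet row, or return None if no languages were found."""
    live = count_languages(live_xml)
    old = count_languages(old_xml)
    for r_xml in r_xml_files:  # 'R' file tags are added to the 'old' counts
        old.update(count_languages(r_xml))
    if not live and not old:
        return None
    return {"slug": slug, "live": format_counts(live), "old": format_counts(old)}


def export_rows(rows):
    """Write the non-empty rows as CSV text (stand-in for the real spreadsheet)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["slug", "live", "old"])
    writer.writeheader()
    writer.writerows(row for row in rows if row is not None)
    return buf.getvalue()
```

For example, entry_row("abbey", '<entry><sense><language lang="Latin"/></sense></entry>') would return {"slug": "abbey", "live": "Latin (1)", "old": ""}, where "abbey" is just a placeholder slug.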

There are 1908 entries that will need to be updated.  For each listed entry the update will consist of removing all language tags from each sense / subsense and adding a new language tag at entry level (probably below the <pos> tag) for each distinct language found.  The DTD will also need to be updated to make the newly positioned tags valid, and the XSLT updated to ensure the new tags get displayed properly in the web pages.
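The per-entry part of that update could look something like the following minimal Python sketch.  It again assumes hypothetical <language lang="..."/> tags, and also that <pos> is a direct child of the entry element; neither has been checked against the real AND DTD.  It removes every language tag from anywhere in the entry, keeping any text that followed a removed tag, and then inserts one entry-level tag per distinct language, in order of first appearance, immediately after <pos>.  The DTD and XSLT changes are not covered here.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag/attribute names: adjust to the real AND schema.
LANG_TAG = "language"
LANG_ATTR = "lang"


def _remove_keep_tail(parent, child):
    """Remove child from parent without losing the text that followed it."""
    if child.tail:
        idx = list(parent).index(child)
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def move_languages_to_entry(xml_text):
    """Strip sense-level language tags and re-add them once each at entry level."""
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}

    distinct = []  # distinct languages in order of first appearance
    for el in list(root.iter(LANG_TAG)):
        lang = el.get(LANG_ATTR)
        if lang and lang not in distinct:
            distinct.append(lang)
        _remove_keep_tail(parents[el], el)

    # Insert directly after <pos> if present, otherwise at the end of the entry.
    pos = root.find("pos")
    insert_at = list(root).index(pos) + 1 if pos is not None else len(root)
    for offset, lang in enumerate(distinct):
        root.insert(insert_at + offset, ET.Element(LANG_TAG, {LANG_ATTR: lang}))
    return ET.tostring(root, encoding="unicode")
```

One caveat with this approach: the standard library's ElementTree does not preserve comments or the DOCTYPE declaration when parsing and re-serialising, so a production script would need to handle those separately (or use a library that keeps them).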

I also began to think about how I’ll implement date / part of speech searches and sparklines in the Dictionaries of the Scots Language, and have started writing a requirements document for the new features.  We had previously discussed adding the date search and filter options to quotation searches only, but subsequent emails from the editor suggest that these should be offered for other searches too, and that we should also add a ‘first attested’ search / filter option.

The quotation search will look for a term in the quotations, with each quotation having an associated date or date range.  Filtering the results by date will then narrow them down to only those quotations that have a date within the period specified by the filter.  For example, a quotation search limited to SND for the term ‘dreich’ finds 113 quotations.  Someone could then enter the years 1900-1950 in the date filter, which would limit the results to the 26 quotations dated within this period.
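A date filter of this kind boils down to a range-overlap test.  The Python sketch below is only an illustration: the Quotation type, its field names and the sample years are hypothetical, and it assumes that a quotation with a date range should match if any part of that range overlaps the filter period (a stricter ‘wholly within the period’ rule would be a one-line change).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quotation:
    entry: str
    start: int                 # year of the quotation (or start of its date range)
    end: Optional[int] = None  # end of the date range, if there is one

    @property
    def last_year(self):
        return self.end if self.end is not None else self.start


def filter_by_date(quotations, year_from, year_to):
    """Keep quotations whose date or date range overlaps [year_from, year_to]."""
    return [q for q in quotations if q.start <= year_to and q.last_year >= year_from]


# Made-up sample hits; the 1898-1902 range is invented for illustration.
hits = [
    Quotation("dreich", 1721),
    Quotation("dreich", 1898, 1902),
    Quotation("dreich", 1954),
    Quotation("dreich", 1986),
]
print(filter_by_date(hits, 1900, 1950))  # only the 1898-1902 quotation overlaps
```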

At the moment I’m not sure how a ‘first attested’ filter would work for a quotation search.  A ‘first attested’ date is something that would be stored at entry rather than quotation level and would presumably be the start date of the earliest quotation.  So, for example, the SND entry for ‘Dreich’ has an earliest quotation date of 1721 and we would therefore store this as the ‘first attested’ date for this entry.

This could be a very useful filter for entry searches, and although it could perhaps be useful in a quotation search it might just confuse users.  For example, the above search for the term ‘dreich’ in SND finds 113 quotations.  A ‘first attested’ filter would then limit these to quotations associated with entries that have a first attested date in the period selected by the user.  So if the user enters 1700-1750 in the ‘dreich’ results, the 113 quotations would be limited to those belonging to entries that were first attested in this period, which would include the entry ‘Dreich’.  But the listed quotations would still include all of those in the entry ‘Dreich’ that feature the search term ‘dreich’, not just those from 1700-1750, because the limit was placed on entries with a first attested date in that period and not on quotations found in that period.

In addition, the term searched for would not necessarily appear in the quotation that gave the entry its first attested date.  An entry can only have one first attested date and (in the case of a quotation search) the results will only display quotations that feature the search term, which will quite possibly not include the earliest quotation.  A search for quotations featuring ‘dreich’ in SND will not return the earliest quotation for the entry SND ‘Dreich’ as the form in this quotation is actually ‘dreigh’.
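To make the distinction concrete, here is a small Python sketch with made-up data: the quotation text is placeholder rather than real DSL material, and the years other than 1721 are invented.  The ‘first attested’ date is computed from all of an entry’s quotations, so the earliest one counts even though it uses the form ‘dreigh’ and therefore never appears in the search results.  The filter then keeps every matching quotation whose entry was first attested in the chosen period, regardless of the quotation’s own date.  All function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    entry: str
    text: str
    year: int


def first_attested(all_quotations):
    """Map each entry to the year of its earliest quotation (all forms, not just hits)."""
    firsts = {}
    for q in all_quotations:
        firsts[q.entry] = min(q.year, firsts.get(q.entry, q.year))
    return firsts


def quotation_search(all_quotations, term):
    """Naive case-insensitive substring match on the quotation text."""
    return [q for q in all_quotations if term.lower() in q.text.lower()]


def filter_by_first_attested(hits, firsts, year_from, year_to):
    """Keep every hit whose *entry* was first attested in range, whatever its own date."""
    return [q for q in hits if year_from <= firsts[q.entry] <= year_to]


# Made-up sample: only the 1721 date and the 'dreigh' form come from the text above.
data = [
    Quotation("Dreich", "placeholder quotation using the form dreigh", 1721),
    Quotation("Dreich", "placeholder quotation using the form dreich", 1900),
    Quotation("Dreich", "another placeholder using dreich", 1986),
]
hits = quotation_search(data, "dreich")
kept = filter_by_first_attested(hits, first_attested(data), 1700, 1750)
print([q.year for q in kept])  # [1900, 1986]: neither quotation is from 1700-1750
```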

If we do want to offer date searching / filtering for all entry searches and not just quotation searches, we would also have to consider whether we would just store the dates of the earliest and last quotations to denote the ‘active’ period for the entry, or whether we would need to take into account any gaps in this period, as the sparklines will show.  If it’s the former then the ‘active’ period for SND ‘Dreich’ would be 1721-2000, so someone searching the full text of entries for the term ‘dreich’ and then entering ‘1960-1980’ as a ‘use’ filter would still find this entry.  If it’s the latter then this filter would not find the entry, as we don’t have any quotations between 1954 and 1986.
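The two options for the ‘active’ period can be compared with a couple of one-line checks.  In this Python sketch the function names are hypothetical, and the year list is an illustrative subset containing only the dates mentioned above (1721, 1954, 1986 and 2000), not the entry’s full set of quotation dates.  The gap-aware version simply asks whether any quotation falls within the filter period; if the sparklines end up using fixed periods (e.g. decades), the same test could instead be run against those precomputed buckets.

```python
def active_by_span(years, year_from, year_to):
    """Option 1: the entry counts as active anywhere between its first and last quotation."""
    return min(years) <= year_to and max(years) >= year_from


def active_by_quotations(years, year_from, year_to):
    """Option 2 (gap-aware): the entry matches only if a quotation falls in the period."""
    return any(year_from <= y <= year_to for y in years)


# Illustrative subset of quotation years for SND 'Dreich' (dates from the text only).
dreich_years = [1721, 1954, 1986, 2000]
print(active_by_span(dreich_years, 1960, 1980))        # True: 1960-1980 lies within 1721-2000
print(active_by_quotations(dreich_years, 1960, 1980))  # False: no quotation between 1954 and 1986
```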

Also this week I had to spend some time fixing several sites after a server upgrade stopped a number of scripts from working.  It took a while to track all of these down and fix them.  I also responded to a couple of questions from Dimitra Fimi of the Centre for Fantasy and the Fantastic regarding WordPress stats and mailing list software, and discussed a new conference website with Matthew Creasy in English Literature.