Week Beginning 18th December 2023

This was the last working week before Christmas, and it was a four-day week due to Friday being given in lieu of Christmas Eve, which is on Sunday this year.  I’ll be off for the next two weeks, which I’m really looking forward to.

The Books and Borrowing project officially comes to an end on the 31st of December, although we’re not going to launch the front-end until sometime in January.  I still had rather a lot of things to do for the project and therefore spent the entirety of my four days this week working on it.  Of course, the other team members were also frantically trying to get things finished off, which often led to them spotting something they needed me to sort out, so I found myself even busier than I was expecting.  However, by Thursday I had managed to complete all of the tasks I’d hoped to finish, plus many more that were sent my way as the week progressed.

At the end of last week I’d begun updating the site text that the project PI Katie had supplied me with, and I completed this task, finally banishing all of the placeholder text.  This also involved much discussion about the genre visualisations and what they actually represent, which we thankfully reached agreement on.  I also added in some further images of a register at Leighton library and wrote a script to batch update changes to the publication places, dates of publication and formats of many book edition records.  One of the researchers also spotted that the ‘next’ and ‘previous’ links for the two Selkirk registers were not working, due to an earlier amalgamation of page records into ‘double spread’ records.  I therefore wrote another script to sort these out.

I then added new ‘Top ten book work’ lists to the site-wide ‘Facts’ page (overall and by the gender of borrowers).  This required me to update the cache-generating script I developed last week, rerun it to produce fresh data, update the API to ensure that works were incorporated into the output and update the front-end to display the new data.  Hopefully the information will be of interest to people.

I then overhauled the highlighting of search terms in the search results.  This was previously only working with the quick search, and only when no wildcards were used in the search.  Instead I used a nice JavaScript library called mark.js (https://markjs.io/) that I’d previously used for the DSL website to add in highlighting on the client-side.  Now the values entered in any of the search fields will be highlighted in the record, including when wildcards are used.  I also updated the highlight style to make it a bit less harsh.
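As a rough illustration, the client-side highlighting boils down to a mark.js call along these lines (the selector, class name and example terms here are illustrative rather than the project’s actual code):

```javascript
// Minimal client-side highlighting sketch using mark.js (https://markjs.io/).
// The selector, class name and example terms are illustrative only.
var searchTerms = ['farmer', 'edinb*'];   // values taken from the search form
var instance = new Mark(document.querySelectorAll('.borrowing-record'));

// Remove any existing highlights before applying the new ones.
instance.unmark({
  done: function () {
    instance.mark(searchTerms, {
      separateWordSearch: false,    // treat multi-word values as phrases
      wildcards: 'enabled',         // allow '*' and '?' wildcards in terms
      className: 'search-highlight' // custom class for a softer highlight style
    });
  }
});
```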

It should be noted that highlighting is still a bit of a blunt tool – any search terms will be highlighted throughout the entire record where the term is found.  So if you search for the occupation ‘farmer’ then wherever ‘farmer’ is found in the record it will be highlighted, not just in the normalised occupation list.  Similarly, if you search for ‘born’ then the ‘born’ text in the author information will be highlighted.  It’s not feasible to make the highlighting more nuanced in the time we have left, but despite this I think that on the whole the highlighting is useful.

I reckoned that the highlighting could end up being a bit distracting so I added in an option to turn results highlighting on or off.  I added a button for this to the search results page, alongside the buttons for the ‘Cite’ and ‘Download’ options.  The user’s choice is remembered by the site, so if you turn highlighting off and then navigate through the pages of results or perform a filter the highlights stay off.  They will stay off until you turn them on again, even if you return to the site after closing your browser.
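A minimal sketch of how a preference like this can be remembered across sessions using localStorage follows; the storage key, button ID and CSS class are hypothetical rather than the site’s actual code:

```javascript
// Sketch of remembering the highlight on/off choice with localStorage.
var HIGHLIGHT_KEY = 'bnb-highlight-results';

function highlightsEnabled() {
  // Default to 'on' when the user has never made a choice.
  return localStorage.getItem(HIGHLIGHT_KEY) !== 'off';
}

// Apply the saved preference when the page loads.
document.body.classList.toggle('highlights-off', !highlightsEnabled());

var toggle = document.getElementById('highlight-toggle');
if (toggle) {
  toggle.addEventListener('click', function () {
    // Flip the stored value, then reapply it to the page.
    localStorage.setItem(HIGHLIGHT_KEY, highlightsEnabled() ? 'off' : 'on');
    document.body.classList.toggle('highlights-off', !highlightsEnabled());
  });
}
```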

One of the researchers noticed that an unnecessary near-duplicate genre had somehow been introduced into the system (‘Fine Art’ instead of ‘Fine Arts’) so I removed it and reassigned any records that were assigned to the erroneous version.  The PI Katie also spotted some odd behaviour with the search form boxes.  When using the browser’s ‘back’ button, search data was being added to the wrong search boxes.  This took quite some time to investigate and I couldn’t replicate the issue in Firefox (the browser I use by default), but when using a Chrome-based browser (MS Edge) I experienced the issue.  It turns out it’s nothing to do with my code but a bug in Chrome (see https://github.com/vuejs/vue/issues/11165).  The fix mentioned on this page was to add autocomplete="off" to the form and this seems to have sorted the problem.  It’s crazy that this issue with Chrome hasn’t been fixed, as the posts on the page identifying it date back to 2020.
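For reference, the fix amounts to nothing more than switching autocomplete off on the affected form.  A sketch of doing this in JavaScript is below (the selector is hypothetical; in practice the attribute can simply be added to the form’s HTML):

```javascript
// Turn autocomplete off so Chrome's back/forward cache doesn't repopulate
// the wrong search boxes.
document.querySelectorAll('form.advanced-search').forEach(function (form) {
  form.setAttribute('autocomplete', 'off');
});
```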

Katie also spotted another issue when using Chrome.  Applying multiple filters to the search results wasn’t working in Chrome, even though it worked fine in Firefox.  This time it was caused by Chrome encoding the bar character as %7C while Firefox keeps it as ‘|’.  My filter script was splitting the filters on the literal bar character, and as this wasn’t present in the Chrome version of the query multiple filters were not being applied.  Thankfully, once identified this was relatively easy to fix.
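A minimal sketch of the kind of fix involved is below; the parameter name and filter values are purely illustrative:

```javascript
// Make sure the filter string is URL-decoded before splitting on the bar
// character, so '%7C' (Chrome) and '|' (Firefox) both work.
function parseFilters(queryString) {
  var match = /(?:^|[?&])filters=([^&]*)/.exec(queryString);
  if (!match) return [];
  // decodeURIComponent turns '%7C' back into '|' before the split.
  return decodeURIComponent(match[1]).split('|').filter(Boolean);
}

// Both of these now return ['genre:Fiction', 'library:Chambers']:
parseFilters('?filters=genre:Fiction|library:Chambers');
parseFilters('?filters=genre:Fiction%7Clibrary:Chambers');
```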

I also managed to implement a ‘compact’ view of borrowing records this week, something that had been on my ‘to do’ list for a while.  Borrowing records can be extremely verbose and rather overwhelming so we decided to give the option to view compact versions of the records that contain a narrower set of fields.  I added a compact / full record view switcher to the bar of options in the top right of the search results and library register pages, beside the ‘Cite’ option.  As with the highlighting feature discussed above, the choice is remembered in your browser, even if you return to the site in a later session (so long as you’re using the same device and browser, of course).

For the compact view I decided to retain the links to the library, register and page as I figured it would be useful to be able to see these.  Also included are the borrowed and returned dates, the borrowers (names only), the title of the Book Work (or Works) if the record has such an association and the title of the Holding if not, any associated authors and genres, plus a list of the volumes borrowed (if applicable).  The following screenshot shows what the compact view looks like:

My final tasks of the week were to add in a cookie banner for the site and install Google Analytics.  In the New Year I’ll need to regenerate the Solr index and then integrate the development site with the live site.  This will include making updates to paths throughout the code, ensuring the existing Chambers Maps continues to function, adding links to the pages of the development site to the site menu and adding the quick search option to the site header.  It will be great once the site is fully accessible.

Also this week I created a new Google Analytics property for a site the DSL launched a month or two ago and spoke to Geert, the editor of the AND, about an issue he’d spotted with entry dates (which I’ll investigate after Christmas).  I finished off my work for the year by removing the Twitter widget from every site I’m responsible for.  Twitter blocked access to the widget that allows a feed to be embedded in a website a few months ago and it looks like this is a permanent change.  It meant that instead of a nice Twitter feed, an empty feed with a ‘nothing to see here’ message was displayed on all of my sites, which was obviously no good.  It feels quite liberating to drop Twitter (or X as it is currently called).

That’s all from me for this year.  If anyone is reading this I wish you all the best for Christmas and 2024!

Week Beginning 11th December 2023

I devoted about three days of this week to developing the new place-names map for the Iona project.  My major task was to make the resource ‘remember’ things, which took a long time to implement as basically everything I’d developed so far needed to be extended and reworked.  Now as you scroll around the map and zoom in and out, the hash in the URL in the address bar updates.  Your selected ‘Display options’ also appear in the hash.  What this means is you can bookmark or share a specific view.  I can’t share the full URL of the map yet as it’s not publicly available, but for example the hash ‘#15.5/56.3490/-6.4676/code/tileNLS1/labelsOn’ provides a URL that loads the map zoomed in on the islands of Stac MhicMhurchaidh and Rèidh-Eilean, with the OS 1840-1880 base map selected, categorised by classification code and with labels always on.  Any search or browse options you’ve entered or selected are also remembered in this way, for example the hash ‘#16/56.3305/-6.3967/language/tileSat/labelsOff/browse/nameCurrent|C’ gives a URL for a ‘browse’ showing current names beginning with ‘C’ on a satellite map categorised by element language, focussed on the main settlement.

The same approach also works for the search facility, with search options separated by a double bar and the search term and its value separated by a single bar.  For example ‘#13.75/56.3287/-6.4139/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G’ gives a URL for an advanced search showing results where the element language is Gaelic and the element in question is ‘dùn’, categorised by earliest recorded date.  With an advanced search you can now press on the ‘Refine your advanced search’ button and any advanced search options previously entered will appear in the search boxes.

You can also now bookmark and share a record when you open the ‘full record’, as the action of opening a record also adds information to the URL in the address bar.  So, for example ‘#14/56.3293/-6.4125/date/tileSat/labelsOn/advancedSearch/eid|181||elementText|d%C3%B9n%20(G)||lang|G/record/4262’ is the hash of the record for ‘Dùn nam Manach’.  Note that this also preserves whatever search or browse led to the place-name being opened.
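All of this is driven by the hash alone.  The following is a minimal sketch of how a hash in the format described above might be parsed, assuming illustrative field names rather than the project’s actual code:

```javascript
// Parse a state hash such as
// '#16/56.3305/-6.3967/language/tileSat/labelsOff/browse/nameCurrent|C'
// into an object: six positional parts first (zoom, lat, lng,
// categorisation, base map, labels), then optional browse / search /
// record segments.
function parseMapHash(hash) {
  var parts = hash.replace(/^#/, '').split('/');
  var state = {
    zoom: parseFloat(parts[0]),
    lat: parseFloat(parts[1]),
    lng: parseFloat(parts[2]),
    categorise: parts[3],   // e.g. 'code', 'language', 'date'
    tile: parts[4],         // e.g. 'tileNLS1', 'tileSat'
    labels: parts[5]        // 'labelsOn' or 'labelsOff'
  };
  // Remaining parts describe a browse, search or open record, if present.
  for (var i = 6; i < parts.length; i += 2) {
    if (parts[i] === 'advancedSearch') {
      // Search criteria are separated by '||', each name from its value by '|'.
      state.search = {};
      parts[i + 1].split('||').forEach(function (pair) {
        var kv = pair.split('|');
        state.search[kv[0]] = decodeURIComponent(kv[1] || '');
      });
    } else if (parts[i] === 'browse') {
      state.browse = parts[i + 1];
    } else if (parts[i] === 'record') {
      state.record = parts[i + 1];
    }
  }
  return state;
}
```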

I also made some updates to the ‘full record’, the most important being that the text appearing as links can now be clicked on to perform a search for that information.  So for example if you press on the classification code in the record this will close the record and perform a search for the code.  Place-name element links are not yet operational, though, as these will link to the element glossary, which I still need to create.  I have however created a new menu item in the left-hand menu for the glossary and have figured out how it will work.  I’m intending to make it a modal pop-up like the record and advanced search.  I did consider adding it into the left-hand menu like the ‘browse’ but there’s just too much information for it to work there.

I also added in a new ‘Cite this record’ tab to the ‘full record’ which will display citation information for the specific record, although I still need to add the citation text itself.  Also new is a bar of icons underneath the left-hand menu options.  This contains buttons for citing the map view, viewing the data displayed on the map as a table and downloading the map data as a CSV file, but none of these are operational yet.

On Thursday I had a meeting with the Iona team to discuss the map.  They are pretty pleased with how it’s turning out, but they did notice a few bugs and things they would like done differently.  I made a sizeable ‘to do’ list and I will tackle this in the new year.

I spent most of the remainder of the week working on the Books and Borrowing project.  I updated the languages assigned to a list of book editions that had been given to me in a spreadsheet and added a few extra pages and page images to one of the registers.   I then returned to my ‘to do’ list for the project and worked through some of the outstanding items.  I moved the treemaps on the library and site-wide ‘facts’ pages to separate tabs and I went through the code to ensure that the data for all visualisations only uses borrowing records set to ‘in stats’.  This wasn’t done before so many of the visualisations and data summaries will have changed slightly.  I also removed the non-male/female ‘top ten’ lists in the library facts page, as requested.

I then moved on to creating a cache for the facts page data, which took about a day to implement.  I firstly generated static data for each library and stored this as JSON in the database.  This is then used for the library facts page rather than processing the data each time.  However, the site-wide facts page lets the user select any combination of libraries (or select all libraries) and the ‘top ten’ lists therefore have to dynamically reflect the chosen libraries.  This meant updating the API to pull in the ‘facts’ JSON files for each selected library and then analyse them in order to generate new ‘top tens’ for the chosen libraries.  For example, working out the top ten genres for all selected libraries meant going through the individual top ten genre lists for each library, working out the total number of borrowings for each genre and then reordering things after this merging of data was complete.
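To illustrate the merging step, here is a minimal sketch in JavaScript (the project’s API actually does this server-side, and the data structure and field names are assumed):

```javascript
// Combine the cached top-ten genre lists of the selected libraries into a
// single ranked list: total the borrowings per genre, then reorder.
function mergeTopGenres(libraryCaches, limit) {
  var totals = {};
  libraryCaches.forEach(function (cache) {
    cache.topGenres.forEach(function (g) {
      totals[g.genre] = (totals[g.genre] || 0) + g.borrowings;
    });
  });
  return Object.keys(totals)
    .map(function (genre) { return { genre: genre, borrowings: totals[genre] }; })
    .sort(function (a, b) { return b.borrowings - a.borrowings; })
    .slice(0, limit || 10);
}
```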

Despite still requiring this processing, the new method of using the cached data is considerably faster than querying and generating the data afresh each time the user requests it.  Previously, displaying the site-wide ‘facts’ page for all libraries was taking up to a minute to complete, whereas now it takes just a few seconds.  I also made a start on updating the site text that Katie had sent me earlier in the week.  A large number of tweaks and changes are required and this is likely to take quite a long time, but I hope to have it finished next week.

Towards the start of the week I also spent some time in discussions about what should become of the Pilot Scots Thesaurus website.  The project ended in 2015, the PI moved on from the University several years ago and the domain name will expire in April 2024.  We eventually decided that the site will be archived, with the data added to the Enlighten research data repository and the main pages of the site archived and made available via the University’s web archive partner.

Towards the end of the week I did some further work for the Anglo-Norman Dictionary, including replacing the semi-colon with a diamond in the entry summary box (e.g. see https://anglo-norman.net/entry/colur_1) and discussing whether labels should also appear in the box (we decided against it).  I also had a discussion with the editor Geert about adding new texts to the Textbase and thought a little about the implications of this, given that the texts are not marked up as TEI XML and the Textbase was developed around TEI XML texts.  I’ll probably do some further investigation in the new year once Geert sends me on some sample files to work with.

 

Week Beginning 4th December 2023

After spending much of my time over the past three weeks adding genre to the Books and Borrowing project, I turned my attention to other projects for most of this week.  One of my main tasks was to go through the feedback from the Dictionaries of the Scots Language people regarding the new date and quotation searches I’d developed back in September.  There was quite a lot to go through, fixing bugs and updating the functionality and layout of the new features.  This included fixing a bug with the full text Boolean search, which was querying the headword field rather than the full text, and changing the way quotation search ranking works.  Previously quotation search results were ranked by the percentage of matching quotes, and if this was the same then the entry with the largest number of quotes would appear higher.  Unfortunately this meant that entries with only one quote ended up ranked higher than entries with large numbers of quotes, not all of which contained the term.  I updated this so that the algorithm now counts the number of matching quotes and ranks primarily on this, only using the percentage of matching quotes when two entries have the same number of matching quotes.  So now a quotation search for ‘dreich’ ranks what are hopefully the most important entries first.
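As a rough illustration of the revised ranking (the field names are assumed and this is not the DSL’s actual code):

```javascript
// Sort primarily by the number of matching quotes; only fall back to the
// percentage of matching quotes when two entries tie.
function rankQuotationResults(results) {
  return results.slice().sort(function (a, b) {
    if (b.matchingQuotes !== a.matchingQuotes) {
      return b.matchingQuotes - a.matchingQuotes;   // more matching quotes first
    }
    var aPct = a.matchingQuotes / a.totalQuotes;
    var bPct = b.matchingQuotes / b.totalQuotes;
    return bPct - aPct;                             // tie-break on percentage
  });
}

// An entry with 40 of 120 quotes matching now outranks one whose single
// quote matches (1 of 1 = 100%), which the old percentage-first ranking
// placed at the top.
```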

I also updated the display of dates in quotations to make them bold and updated the CSV download option to limit the number of fields that get returned.  I also noticed that when a quotation search exceeded the maximum number of allowed results (e.g. ‘heid’) it was returning no results due to a bug in the code, which I fixed.  I also fixed a bug that was stopping wildcards in quick searches from working as intended and fixed an issue with the question mark wildcard in the advanced headword search.

I then made updates to the layout of the advanced search page, including adding placeholder ‘YYYY’ text to the year boxes, adding a warning about the date range when dates provided are beyond the scope of the dictionaries and overhauling the search help layout down the right of the search form.  The help text scroll down/up was always a bit clunky so I’ve replaced it with what I think is a neater version.  You can see this, and the year warning in the following screenshot:

I also tweaked the layout of the search results page, including updating the way the information about what was searched for is displayed, moving some text to a tooltip, moving the ‘hide snippets’ option to the top menu bar and ensuring the warning that is displayed when too many results are returned appears directly above the results.  You can see all of this in the following screenshot:

I then moved onto updates to the sparklines.  The team decided they wanted the gap length between attestations to be increased from 25 to 50 years.  This would mean individual narrow lines would then be grouped into thicker blocks.  They also wanted the SND sparkline to extend to 2005, whereas previously it was cut off at 2000 (with any attestations after this point given the year 2000 in the visualisation).  These updates required me to make changes to the scripts that generate the Solr data and to then regenerate the data and import it into Solr.  This took some time to develop and process, and currently the results are only running on my laptop as it’s likely the team will want further changes made to the data.  The following screenshot shows a sparkline when the gap length was set to 25 years:

And the following screenshot shows the same sparkline with the gap length set to 50 years:

I also updated the dates that are displayed in an entry beside the sparkline to include the full dates of attestation as found in the sparkline tooltip rather than just displaying the first and last dates of attestation.
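To illustrate the effect of the gap setting, below is a hedged sketch of how attestation years might be grouped into sparkline blocks; it is written in JavaScript for illustration and is not the actual script that generates the Solr data:

```javascript
// Group attestation years into blocks: consecutive attestations no more
// than 'maxGap' years apart are merged into one block, so raising the gap
// from 25 to 50 fuses many previously separate thin lines into thicker blocks.
function groupAttestations(years, maxGap) {
  var sorted = years.slice().sort(function (a, b) { return a - b; });
  var blocks = [];
  sorted.forEach(function (year) {
    var last = blocks[blocks.length - 1];
    if (last && year - last.end <= maxGap) {
      last.end = year;                           // extend the current block
    } else {
      blocks.push({ start: year, end: year });   // start a new block
    }
  });
  return blocks;
}

// groupAttestations([1508, 1530, 1575, 1600], 25) -> two blocks (1508-1530, 1575-1600)
// groupAttestations([1508, 1530, 1575, 1600], 50) -> one block (1508-1600)
```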

I completed going through the feedback and making updates on Wednesday and now I need to wait and see whether further updates are required before we go live with the new date and quotation search facilities.

I spent the rest of the week working on various projects.  I made a small tweak to remove an erroneous category from the Old English Thesaurus and dealt with a few data issues for the Books and Borrowing project too, including generating spreadsheets of data for checking (e.g. a list of all of the distinct borrower titles) and then making updates to the online database after these spreadsheets had been checked.  I also fixed a bug with the genre search, which was joining multiple genre selections with Boolean AND when it should have been joining them with Boolean OR.

I also returned to working for the Anglo-Norman Dictionary.  This included updating the XSLT so that legiturs in variant lists displayed properly (see ‘la noitement (l. l’anoitement)’ here: https://anglo-norman.net/entry/anoitement).  Whilst sorting this out I noticed that some entries appeared to have multiple ‘active’ records in the database – a situation that should never have arisen.  After spotting this I did some frantic investigation to understand what was going on.  Thankfully it turned out that the issue had only affected 23 entries, with all but two of them having two active records.  I’m not sure what happened with ‘bland’ to result in 36 active records, and ‘anoitement’ with 9, but I figured out a way to resolve the issue and ensure it doesn’t happen again in future.  I updated the script that publishes holding area entries to ensure any existing ‘active’ records are removed when the new record is published.  Previously the script was only dealing with one ‘active’ entry (as that is all there should have been), which I think may have been how the issue cropped up.  In future the duplicate issue will rectify itself whenever one of the entries with duplicate active records is edited – at the point of publication all existing ‘active’ records will be moved to the ‘history’ table.

Also for the AND this week I updated the DTD to ensure that superscript text is allowed in commentaries.  I also removed the embedded Twitter feed from the homepage as it looks like this facility has been permanently removed by Twitter / X.  I also tweaked the logo on narrow screens so it doesn’t display so large, which should make the site better to use on mobile phones, and I fixed an issue with the entry proofreader, which was referencing an older version of jQuery that no longer existed.  I also fixed the dictionary’s ‘browse up’ facility, which had broken.

I also found some time to return to working on the new map interface for the Iona place-names project and have now added in the full record details.  When you press on a marker to open the popup there is now a ‘View full record’ button.  Pressing on this opens an overlay, similar to the ‘Advanced search’, that contains all of the information about the record, in the same way as the record page on the other place-name resources.  This is divided into a tab for general information and another for historical forms, as you can see from the following screenshot:

Finally this week I kept project teams updated on another server move that took place overnight on Thursday.  This resulted in downtime for all affected websites, but all was working again the next morning.  I needed to go through all of the websites to ensure they were working as intended after the move, and thankfully all was well.

Week Beginning 27th November 2023

I completed work on the integration of genre into the Books and Borrowing systems this week.  It took a considerable portion of the week to finalise the updates but it’s really great to have it done, as it’s the last major update to the project.

My first task was to add genre selection to the top-level ‘Browse Editions’ page, which I’m sure will be very useful.  As you can see in the following screenshot, genres now appear as checkboxes as with the search form, allowing users to select one or more they’re interested in.  This can be done in combination with publication date too.  The screenshot shows the book editions that are either ‘Fiction’ or ‘Travel’ that were published between 1625 and 1740.  The selection is remembered when the user changes to a different view (i.e. authors or ‘top 100’) and when they select a different letter from the tabs.

It proved to be pretty tricky and time-consuming to implement.  I realised that not only did the data that is displayed need to be updated to reflect the genre selection, but the counts in the letter tabs needed to be updated too.  This may not seem like a big thing, but the queries behind it took a great deal of thought.  I also realised whilst working on the book counts that the counts in the author tabs were wrong – they were only counting direct author associations at edition level rather than taking higher level associations from works into consideration.  Thankfully this was not affecting the actual data that was displayed, just the counts in the tabs.  I’ve sorted this too now, which also took some time.

With this in place I then added a similar option to the in-library ‘Book’ page.  This works in the same way as the top-level ‘Editions’ page, allowing you to select one or more genres to limit the list of books that are displayed, for example only books in the genres of ‘Belles Lettres’ and ‘Fiction’ at Chambers, ordered by title, or the most popular ‘Travel’ books at Chambers.  This did unfortunately take some time to implement, as Book Holdings are not exactly the same as Editions in terms of their structure and connections, so even though I could use much of the same code that I’d written for Editions many changes needed to be made.

The new Solr core was also created and populated at Stirling this week, after which I was able to migrate my development code from my laptop to the project server, meaning I could finally share my work with the rest of the team.

I then moved on to adding genre to the in-library ‘facts’ page and the top-level ‘facts’ page.  Below is a very long screenshot of the entire ‘facts’ page for Haddington library, and I’ll discuss the new additions in turn:

The number of genres found at the library is now mentioned in the ‘Summary’ section and there is now a ‘Most popular genres’ section, which is split by gender as with the other lists.  I also added in pie charts showing book genres represented at the library and the percentage of borrowings of each genre.  Unfortunately these can get a bit cluttered due to there being up to 20-odd genres present, so I’ve added in a legend showing which colour is which genre.  You can hover over a slice to view the genre title and name and you can click on a slice to perform a search for borrowing records featuring a book of the genre in the library.  Despite being a bit cluttered I think the pies can be useful, especially when comparing the two charts – for example at Haddington ‘Theology’ books make up more than 36% of the library but only 8% of the borrowings.

Due to the somewhat cluttered nature of the pie charts I also experimented with a treemap view of Genre.  I had stated we would include such a view in the requirements document, but at that time I had thought genre would be hierarchical, and a treemap would display the top-level genres and the division of lower level genres within these.  Whilst developing the genre features I realised that without this hierarchy the treemap would merely replicate the pie chart and wouldn’t be worth including.

However, when the pie charts turned out to be so cluttered I decided to experiment with treemaps as an alternative.  The results currently appear after the pie charts in the page.  I initially liked how they looked – the big blocks look vaguely ‘bookish’ and having the labels in the blocks makes it easier to see what’s what.  However, there are downsides.  Firstly, it can be rather difficult to tell which genre is the biggest, due to the blocks having different dimensions – does a tall, thin block have a larger area than a shorter, fatter block, for example?  It’s also much more difficult to compare two treemaps as the position of the genres changes depending on their relative size.  Thankfully the colour stays the same, but it takes longer than it should to ascertain where a genre has moved to in the other treemap and how its size compares.  I met with the team on Friday to discuss the new additions and we agreed that we could keep the treemaps, but that I’d add them to a separate tab, with only the pie charts visible by default.

I then added in the ‘borrowings over time by genre’ visualisation to the in-library and top level ‘facts’ pages.  As you can see from the above screenshot, these divide the borrowings in a stacked bar chart per year (or per month if a year is clicked on) into genre, much in the same way as the preceding ‘occupations’ chart.  Note however that the total numbers for each year are not the same as for the occupations through time visualisation, as books may have multiple genres and borrowers may have multiple occupations, and the counts reflect the number of times a genre / occupation is associated with a borrowing record each year (or month if you drill down into a year).  We might need to explain this somewhere.
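As a rough illustration of why the totals differ, the counting works along these lines (a sketch with assumed field names, not the project’s actual code):

```javascript
// Every genre attached to a borrowing adds one to that year's stack, so a
// record whose book has two genres contributes two genre associations but
// only one borrowing; the same logic applies to occupations.
function countGenresByYear(borrowings) {
  var counts = {};   // counts[year][genre] = number of associations
  borrowings.forEach(function (b) {
    var year = b.borrowedYear;
    counts[year] = counts[year] || {};
    b.genres.forEach(function (genre) {
      counts[year][genre] = (counts[year][genre] || 0) + 1;
    });
  });
  return counts;
}
```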

We met on Friday to discuss the outstanding tasks.  We’ll probably go live with the resource in January, but I will try to get as many of my outstanding tasks completed before Christmas as possible.

Also this week I fixed another couple of minor issues with the Dictionaries of the Scots Language.  The WordPress part of the site had defaulted to using the new, horrible blocks interface for widgets after a recent update, meaning the widgets I’d created for the site no longer worked.  Thankfully installing the ‘Classic Widgets’ plugin fixed the issue.  I also needed to tweak the CSS for one of the pages where the layout was slightly wonky.

I also made a minor update to the Speech Star site and made a few more changes to the new Robert Fergusson site, which has now gone live (see https://robert-fergusson.glasgow.ac.uk/). I also had a chat with our IT people about a further server switch that is going to take place next week and responded to some feedback about the new interactive map of Iona placenames I’m developing.

Also this week I updated the links to one of the cognate reference websites (FEW) from entries in the Anglo-Norman Dictionary, as the website had changed its URL and site structure.  After some initial investigation it appeared that the new FEW website made it impossible to link to a specific page, which is not great for an academic resource that people will want to bookmark and cite.  Ideally the owners of the site should have placed redirects from the pages of the old resource to the corresponding page on the new resource (as I did for the AND).

The old links to the FEW as found in the AND (e.g. the FEW link that before the update was on this page: https://anglo-norman.net/entry/poer_1) were formatted like so: https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231 which now gives a ‘not found’ error.  The above URL has the volume number (9, which for reasons unknown to me was specified as ‘90’) and the page number (231).  The new resource can be found here: https://lecteur-few.atilf.fr/ and it lets you select a volume (e.g. 9: Placabilis-Pyxis) and enter a page (e.g. 231), which then updates the data on the page (e.g. showing ‘posse’ as the original link from AND ‘poer 1’ used to do).  But crucially, their system does not update the URL in the address bar, meaning no-one can cite or bookmark their updated view and it looked like we couldn’t link to a specific view.

Their website makes it possible to click on a form to load a page (e.g. https://lecteur-few.atilf.fr/index.php/page/lire/e/198595), but the ID in the resulting page URL is an autogenerated ID that bears no relation to the volume or page number and couldn’t possibly be ascertained by the AND (or any other system), so is of no use to us.  Also, the ‘links’ that users click on to load the above URL are not HTML links at all but are generated in JavaScript after the user clicks on them.  This means it wouldn’t be possible for me to write a script that would grab each link for each matching form.  It also means a user can’t hover over the link to see where it leads or open the link in a new tab or window, which is also not ideal.  In addition, once you’re on a page like the one linked to above, navigating between pages doesn’t update the URL in the address bar, so a user who loads the page, then navigates to different pages and finds something of interest, can’t then bookmark or cite the correct page as the URL is still for the first page they loaded, which again is not very good.

Thankfully Geert noticed that another cognate reference site (the DMF) had updated their links to use new URLs that are not documented on the FEW site, but do appear to work (e.g. https://lecteur-few.atilf.fr/lire/90/231).  This was quite a relief to discover as otherwise we would not have been able to link to specific FEW pages.  Once I knew this URL structure was available, updating the links across the site was a quick job.
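The rewrite itself was then just a pattern substitution, along these lines (a sketch rather than the actual update script):

```javascript
// Rewrite an old FEW link to the new URL pattern, e.g.
// https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231
//   -> https://lecteur-few.atilf.fr/lire/90/231
function rewriteFewUrl(oldUrl) {
  var match = /lecteurFEW\/lire\/volume\/(\d+)\/page\/(\d+)/.exec(oldUrl);
  if (!match) return oldUrl;   // leave anything unexpected untouched
  return 'https://lecteur-few.atilf.fr/lire/' + match[1] + '/' + match[2];
}
```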

Finally this week, I had a meeting with Clara Cohen and Maria Dokovova to discuss a possible new project that they are putting together.  This will involve developing a language game aimed at primary school kids and we discussed some possible options for this during our meeting.  Afterwards I wrote up my notes and gave the matter some further thought.