Week Beginning 31st October 2022

I spent a lot of the week continuing to work on the Books and Borrowing front end.  To begin with I worked on the ‘borrowers’ tab in the ‘library’ page and created an initial version of it.  Here’s an example of how it looks:

As with books, the page lists borrowers alphabetically, in this case by borrower surname.  Letter tabs appear at the top, each showing the number of borrowers whose surnames begin with that letter, and selecting a letter displays all of those borrowers.  To speed up the querying I had to create a couple of new fields in the borrower table, storing the initial letter of each borrower’s surname and a count of their borrowings.
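The sketch below shows roughly why the two stored fields help. It uses Python’s built-in sqlite3 module. The table and column names (borrower, borrowing, surname_initial, borrowing_count) are hypothetical, and the real Books and Borrowing database is not SQLite and will differ. Because the values are stored, the letter tabs and the per-letter listings become simple lookups on precomputed columns. Nothing has to trim, slice or count on every page load.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE borrower (id INTEGER PRIMARY KEY, surname TEXT,
                       surname_initial TEXT, borrowing_count INTEGER);
CREATE TABLE borrowing (id INTEGER PRIMARY KEY, borrower_id INTEGER);
""")
conn.executemany("INSERT INTO borrower (id, surname) VALUES (?, ?)",
                 [(1, "Macdonald"), (2, "Murray"), (3, "Bell")])
conn.executemany("INSERT INTO borrowing (borrower_id) VALUES (?)",
                 [(1,), (1,), (2,), (3,), (1,)])

# Precompute both fields once, so page loads don't have to derive them.
conn.execute("UPDATE borrower SET surname_initial = UPPER(SUBSTR(TRIM(surname), 1, 1))")
conn.execute("""UPDATE borrower SET borrowing_count =
    (SELECT COUNT(*) FROM borrowing WHERE borrowing.borrower_id = borrower.id)""")

# Letter tabs: each initial with its number of borrowers.
tabs = conn.execute("""SELECT surname_initial, COUNT(*) FROM borrower
                       GROUP BY surname_initial ORDER BY surname_initial""").fetchall()

# All borrowers for the selected letter, alphabetically by surname.
selected = conn.execute("""SELECT surname, borrowing_count FROM borrower
                           WHERE surname_initial = ? ORDER BY surname""", ("M",)).fetchall()
print(tabs)      # [('B', 1), ('M', 2)]
print(selected)  # [('Macdonald', 3), ('Murray', 1)]
```

The stored values need refreshing whenever borrowings are added or edited. A regeneration script run after data changes is one simple way to keep them in sync.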

The display of borrowers is similar to the display of books, with each borrower given a box that you can press on to highlight.  The borrower ID appears in the top right and each borrower’s full name appears as a green title.  The name is listed as it would be read (forename before surname), but this could be changed to surname first if required.  I’m not sure where the ‘other title’ field would go if we did this, though – presumably something like ‘Macdonald, Mr Archibald of Sanda’.

The full information about a borrower is listed in the box, including additional fields and normalised occupations.  Cross references to other borrowers also appear.  As with the ‘Books’ tab, much of this data will be linked to search results once I’ve created the search options (e.g. press on an occupation to view all borrowers with this occupation, or press on the number of borrowings to view the borrowings), but this is not in place yet.  You can also change the view from ‘surname’ to ‘top 100 borrowers’, which lists the 100 most prolific borrowers (or fewer if the library has fewer than 100 borrowers).  As with the ‘Books’ tab, a number appears at the top left of each record to show the borrower’s place on the ‘hitlist’, and the number of borrowings is highlighted in red to make it easier to spot.
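The ‘top 100 borrowers’ view is essentially a sort on the borrowing count followed by a cut-off. This minimal Python sketch assumes each borrower is a dict with hypothetical name and borrowings keys. In practice the database would more likely do this with ORDER BY and LIMIT. Slicing copes with libraries that have fewer than 100 borrowers, and enumerate supplies the hitlist position. The alphabetical tie-break is an assumption.

```python
from typing import TypedDict

class Borrower(TypedDict):
    id: int
    name: str
    borrowings: int

def top_borrowers(borrowers: list[Borrower], limit: int = 100) -> list[tuple[int, Borrower]]:
    """Return (hitlist position, borrower) pairs for the most prolific borrowers.

    Slicing means a library with fewer than `limit` borrowers just returns them all.
    Ties on borrowing count are broken by name (an assumed rule).
    """
    ranked = sorted(borrowers, key=lambda b: (-b["borrowings"], b["name"]))
    return list(enumerate(ranked[:limit], start=1))

borrowers: list[Borrower] = [
    {"id": 1, "name": "Bell", "borrowings": 4},
    {"id": 2, "name": "Adam", "borrowings": 9},
    {"id": 3, "name": "Cole", "borrowings": 4},
]
for position, b in top_borrowers(borrowers):
    print(position, b["name"], b["borrowings"])
# 1 Adam 9
# 2 Bell 4
# 3 Cole 4
```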

I also fixed some issues with the book and author caches.  These were caused by spaces at the start of fields and by author surnames beginning with a lower-case letter (e.g. ‘von’): the cache generation script was previously only matching upper-case letters, meaning ‘v’ wasn’t getting added to ‘V’.  I’ve regenerated the caches to fix this.
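The sketch below illustrates the cache problem with two versions of a bucketing function. The hypothetical ‘before’ version only files names whose first character is already an upper-case letter. The ‘after’ version trims leading spaces and upper-cases the initial. Both functions are Python sketches of the idea, not the actual cache generation script.

```python
from collections import defaultdict

def bucket_by_initial_old(names):
    """Hypothetical 'before' behaviour: only an upper-case first character counts."""
    buckets = defaultdict(list)
    for name in names:
        first = name[:1]
        if first.isupper():  # ' Scott' (leading space) and 'von Haller' are missed
            buckets[first].append(name)
    return dict(buckets)

def bucket_by_initial(names):
    """'After' behaviour: trim leading spaces and upper-case the initial."""
    buckets = defaultdict(list)
    for name in names:
        cleaned = name.strip()
        first = cleaned[:1].upper()
        if first.isalpha():  # skips blank values
            buckets[first].append(cleaned)
    return dict(buckets)

names = ["Smith", " Scott", "von Haller"]
print(bucket_by_initial_old(names))  # {'S': ['Smith']}
print(bucket_by_initial(names))      # {'S': ['Smith', 'Scott'], 'V': ['von Haller']}
```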

I then decided to move on to the search rather than the ‘Facts & figures’ tab, as I reckoned this should be prioritised.  I began work on the quick search initially, and I’m still very much in the middle of this.  The quick search has to search an awful lot of data, and to do this several different queries need to be run.  I’ll need to see how this works in terms of performance, as I fear the ‘quick’ search risks being better named the ‘slow’ search.

We’ve stated that users will be able to search for dates in the quick search, and these need to be handled differently.  For now the API checks whether the passed search string is a date by running a pattern match on it.  Every digit in the string is converted into an ‘X’ character, and the resulting string is then checked against the list of valid date forms.  For the API I’m using a bar character (|) to designate a ranged date and a dash to separate the year, month and day.  I can’t use a slash (/), because the search string is passed in the URL and slashes have meaning in URLs.  For info, here are the valid date string patterns:

“XXXX”, “XXXX-XX”, “XXXX-XX-XX”, “XXXX|XXXX”, “XXXX|XXXX-XX”, “XXXX|XXXX-XX-XX”, “XXXX-XX|XXXX”, “XXXX-XX|XXXX-XX”, “XXXX-XX|XXXX-XX-XX”, “XXXX-XX-XX|XXXX”, “XXXX-XX-XX|XXXX-XX”, “XXXX-XX-XX|XXXX-XX-XX”
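Here is a minimal Python sketch of the pattern-matching check described above. The 12 valid patterns are built from the three single-date forms plus every pairing of them around a bar. The function name is_date_search is hypothetical. The sketch also first checks that the string contains only digits, dashes and bars. That safeguard is not part of the approach described here; without it, a literal search for ‘XXXX’ would be mistaken for a date.

```python
import re

SINGLE_FORMS = ["XXXX", "XXXX-XX", "XXXX-XX-XX"]
# Valid patterns: each single date form, plus every 'start|end' pairing of them (12 total).
VALID_DATE_PATTERNS = set(SINGLE_FORMS) | {f"{a}|{b}" for a in SINGLE_FORMS for b in SINGLE_FORMS}

def is_date_search(term: str) -> bool:
    term = term.strip()
    # Added safeguard (assumption, not in the described API): only digits, dashes and bars.
    if not re.fullmatch(r"[0-9|-]+", term):
        return False
    # Convert every digit to 'X' and compare against the valid forms.
    return re.sub(r"[0-9]", "X", term) in VALID_DATE_PATTERNS

for example in ["1752", "1752-03", "1752-02|1755-07-22", "Macdonald", "17520"]:
    print(example, is_date_search(example))
```

Like the check described above, this only validates the shape of the string, so something like ‘1752-13’ would still be treated as a date.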

So for example, if someone searches for ‘1752’ or ‘1752-03’ or ‘1752-02|1755-07-22’ the system will recognise these as a date search and process them accordingly.  I should point out that I can and probably will get people to enter dates in a more typical way in the front-end, using slashes between day, month and year and a dash between ranged dates (e.g. ‘1752/02-1755/07/22’) but I’ll convert these before passing the string to the API in the URL.
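The front-end conversion could be as simple as two string replacements, sketched here in Python with hypothetical function names. The order matters: the range dash has to become a bar before the slashes become dashes. A bar (|) is not a character that may appear unencoded in a URL under RFC 3986. The sketch therefore also percent-encodes the result with urllib.parse.quote.

```python
from urllib.parse import quote

def to_api_date(front_end: str) -> str:
    """Convert front-end style '1752/02-1755/07/22' to API style '1752-02|1755-07-22'.

    The range dash must become a bar before the slashes become dashes.
    """
    return front_end.strip().replace("-", "|").replace("/", "-")

def api_path_segment(front_end: str) -> str:
    """Percent-encode the API date for use in the URL ('|' becomes '%7C')."""
    return quote(to_api_date(front_end), safe="")

print(to_api_date("1752/02-1755/07/22"))       # 1752-02|1755-07-22
print(api_path_segment("1752/02-1755/07/22"))  # 1752-02%7C1755-07-22
```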

I have the query for searching dates running, and this in itself was a bit complicated to generate, because including a month (or a month and a day) in a ranged query changes the way the query needs to work.  E.g. if the user searches for ‘1752-1755’ then we need to return all borrowing records with a borrowed year of ‘1752’ or later and ‘1755’ or earlier.  However, if the query is ‘1752/06-1755/03’ then the query can’t just be ‘all borrowing records with a borrowed year of 1752 or later and a borrowed month of 06 or later and a borrowed year of 1755 or earlier and a borrowed month of 03 or earlier’, as this would return no results.  This is because no borrowing can have a borrowed month that is both ‘06’ or later and ‘03’ or earlier.  Instead, the query needs to find borrowing records that (have a borrowed year of 1752 AND a borrowed month of ‘06’ or later, OR have a borrowed year later than 1752) AND (have a borrowed year of 1755 AND a borrowed month of ‘03’ or earlier, OR have a borrowed year earlier than 1755).
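The nested logic above generalises to a lexicographic comparison on (year, month, day), and that comparison can be built recursively. The following Python sketch generates a parameterised WHERE clause from an API date string. It then checks the clause against a small in-memory SQLite table with made-up rows. The column names and function names are hypothetical. Only the fixed column names are interpolated into the SQL; values are passed as parameters. The sketch does not handle records with missing months or days.

```python
import sqlite3

# Hypothetical column names; the real borrowing table may differ.
DATE_COLUMNS = ("borrowed_year", "borrowed_month", "borrowed_day")

def bound_condition(values, strict, inclusive, cols=DATE_COLUMNS):
    """Build SQL comparing (year[, month[, day]]) with one end of a range.

    Lower bound: strict='>', inclusive='>='. Upper bound: strict='<', inclusive='<='.
    A lower bound of 1752-06 is equivalent to
    (borrowed_year > 1752 OR (borrowed_year = 1752 AND borrowed_month >= 6)).
    """
    col, value = cols[0], values[0]
    if len(values) == 1:
        return f"{col} {inclusive} ?", [value]
    rest_sql, rest_params = bound_condition(values[1:], strict, inclusive, cols[1:])
    return f"({col} {strict} ? OR ({col} = ? AND {rest_sql}))", [value, value, *rest_params]

def date_where_clause(api_date):
    """Turn an API date such as '1752-06|1755-03' into a WHERE clause and parameters.

    A single date (no bar) is treated as a range from that date to itself.
    """
    start, _, end = api_date.partition("|")
    lower = [int(part) for part in start.split("-")]
    upper = [int(part) for part in (end or start).split("-")]
    low_sql, low_params = bound_condition(lower, ">", ">=")
    high_sql, high_params = bound_condition(upper, "<", "<=")
    return f"{low_sql} AND {high_sql}", low_params + high_params

# Quick check against a throwaway in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrowing (id INTEGER, borrowed_year INTEGER, "
             "borrowed_month INTEGER, borrowed_day INTEGER)")
conn.executemany("INSERT INTO borrowing VALUES (?, ?, ?, ?)", [
    (1, 1752, 5, 1), (2, 1752, 6, 15), (3, 1753, 1, 2), (4, 1755, 3, 31), (5, 1755, 4, 1)])
sql, params = date_where_clause("1752-06|1755-03")
ids = [row[0] for row in conn.execute(f"SELECT id FROM borrowing WHERE {sql} ORDER BY id", params)]
print(ids)  # [2, 3, 4]
```

sqlite3 uses ? placeholders; common MySQL drivers for Python use %s instead.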

I also have the queries running that search for all necessary fields that aren’t dates.  This currently requires five separate queries to be run to check fields like author names, borrower occupations, book edition fields such as ESTC etc.  The queries currently return a list of borrowing IDs, and this is as far as I’ve got.  I’m wondering now whether I should create a cached table for the non-date data queried by the quick search, consisting of a field for the borrowing ID and a field for the term that needs to be searched, with each borrowing having many rows depending on the number of terms they have (e.g. a row for each occupation of every borrower associated with the borrowing, a row for each author surname, a row for each forename, a row for ESTC).  This should make things much speedier to search, but will take some time to generate.  I’ll continue to investigate this next week.
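Here is a minimal sketch of the cached-table idea, again using Python’s sqlite3. The table, column and field names are hypothetical and the sample rows are made up. Each borrowing contributes one (borrowing_id, term) row per searchable term, so the quick search becomes a single indexed query rather than five. The sketch assumes exact, case-insensitive term matching. A substring search with a leading wildcard can’t use an ordinary index, so it would need a different approach such as a full-text index.

```python
import sqlite3

# Made-up sample data: the searchable terms linked to each borrowing (hypothetical fields).
borrowings = {
    101: {"occupations": ["Minister", "Schoolmaster"], "author_surnames": ["Hume"],
          "author_forenames": ["David"], "estc": ["T123"]},
    102: {"occupations": ["Merchant", "Minister"], "author_surnames": ["Smith"],
          "author_forenames": ["Adam"], "estc": []},
}

def cache_rows(data):
    """Yield one (borrowing_id, term) row per searchable term, lower-cased for matching."""
    for borrowing_id, fields in data.items():
        for terms in fields.values():
            for term in terms:
                yield borrowing_id, term.strip().lower()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quick_search_cache (borrowing_id INTEGER, term TEXT)")
conn.execute("CREATE INDEX idx_quick_search_term ON quick_search_cache (term)")
conn.executemany("INSERT INTO quick_search_cache VALUES (?, ?)", cache_rows(borrowings))

def quick_search(term):
    """One indexed lookup instead of several queries across the linked tables."""
    rows = conn.execute(
        "SELECT DISTINCT borrowing_id FROM quick_search_cache WHERE term = ? ORDER BY borrowing_id",
        (term.strip().lower(),))
    return [row[0] for row in rows]

print(quick_search("Minister"))  # [101, 102]
print(quick_search("hume"))      # [101]
```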

Also this week I updated the structure of the Speech Star database to enable each prompt to have multiple sounds etc.  I had to update the non-disordered page and the child error page to work with the new structure, but it seems to be working.  I also had to update the ‘By word’ view, as previously sound, articulation and position were listed underneath the word and above the table.  As these fields may now differ for each record in the table, I’ve removed the list and have instead added the data as columns in the table.  This does, however, mean that many of the rows in the table now contain identical data.

I then added tooltip / help text explaining what the error types mean in the child speech error database.  On the ‘By Error Type’ page the descriptions currently appear as small text to the right of the error type title.  On the ‘By Word’ page the error type column has an ‘i’ icon after the error type.  Hovering over or pressing on this displays a tooltip with the error description, as you can see in the following screenshot:

I also updated the layout of the video popups to split the metadata across two columns.  I also changed the order of the errors on the ‘By Error Type’ page so that the /r/ errors appear in the correct alphabetical position under ‘r’, rather than appearing first because the text begins with a slash.  With this all in place I then replicated the changes on the version of the site that is going to be available via the Seeing Speech URL.
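The /r/ ordering fix amounts to sorting on a key that ignores leading slashes. Here is a Python sketch with made-up error type labels and a hypothetical error_sort_key function. The real site’s labels and implementation will differ.

```python
def error_sort_key(label: str) -> str:
    """Sort key that ignores leading slashes, so '/r/ ...' files under 'r'."""
    return label.lstrip("/").casefold()

# Made-up labels for illustration only.
errors = ["Vowel error", "/r/ gliding", "Deletion", "Stopping"]

print(sorted(errors))                      # '/r/ gliding' comes first: '/' sorts before letters
print(sorted(errors, key=error_sort_key))  # ['Deletion', '/r/ gliding', 'Stopping', 'Vowel error']
```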

Kirsteen McCue contacted me last week to ask for advice on a British Academy proposal she’s putting together, and after asking some questions about the project I wrote a bit of text about the data and its management for her.  I also sorted out my flights and accommodation for the workshop I’m attending in Zurich in January.  I also did a little bit of preparation for a session on Digital Humanities that Luca Guariento has organised for next Monday, at which I’ll be discussing a couple of projects.  I also exported all of the Speak For Yersel survey data and sent it to a researcher who is going to do some work with it, and I fixed an issue that had cropped up with the Place-names of Kirkcudbright website.  I also spent a bit of time on DSL duties this week, helping with some user account access issues and discussing how links will be added from entries to the lengthy essay on the Scots Language that we host on the site.