Week Beginning 13th February 2023

This was a one-day week for me as I’d taken the Monday off to cover the school half-term holiday while Tuesday to Thursday were strike days.  Thankfully the strike days scheduled for the next two weeks have now been called off so I should be able to get a little more done.

I spent my one day of work continuing with the development of the front-end for the Books and Borrowing project, and managed to complete an initial version of the advanced search form.  In my previous update I said I hadn’t quite managed to complete the selection options for the borrower occupation section.  This is complete now – you can select and deselect occupations at any level and corresponding occupations at higher or lower levels of the hierarchy will also select / deselect as required.  I have also added in autocompletes for borrower settlement and street.  If you start typing into one of these boxes (e.g. ‘black’ in settlement) a list of matching options will appear from which you can select one.  Note that the entered text can appear anywhere in the settlement name (e.g. ‘parish of Blackford’ is brought back) and we might want to change this to just match the beginning of settlements.

Street works in the same way (e.g. type in ‘king’).  A couple of things to point out, though.  Firstly: the selection of settlement and of street are currently in no way connected.  E.g. if you select ‘Blackford’ as a settlement and then attempt to type in a street the system doesn’t limit this to just streets within Blackford.  I could update things to connect the two search boxes in such a way, though.  Secondly:  I think we’ll have to give people freedom to ignore the autocomplete if they want.  For example, if you enter ‘king’ in ‘street’ you’ll see lots of very specific addresses (e.g. ’15 great king street’).  If you select one of those you’re obviously limiting your search quite considerably.  Whereas if we allow people to enter ‘great king street’ to bring back all borrowings at all addresses on this street the search might be more useful.  I’ve also added in borrower gender which (as specified in the requirements document) allows one single gender to be selected.  Thinking about it, we might want to make this a multi-select like other things instead.

The book author section is exactly the same as the ‘simple’ search and the book work section is pretty straightforward (and still awaits the addition of genre).  In the book edition section the ESTC field is an autocomplete.  This works slightly differently in that it matches the beginning of the ESTC only (e.g. ‘T1001’ matches IDs beginning with that text) and three characters rather than two need to be entered.  Even three gives a very long list and I may make it four characters before the list appears.  The last autocomplete field is place of publication.  This matches text anywhere in the place.  For example, type in ‘lon’ and you’ll see all of the places involving London, but also places like Bouillon.  I did wonder about making this a multi-select instead, but there are possibly too many to list all at once.
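The difference between the two matching modes described above (text anywhere versus the beginning only, plus a minimum number of characters before a list appears) can be sketched as follows.  This is only an illustration in Python, not the project's actual code: the function name `autocomplete`, its parameters and the sample ESTC-style IDs are all assumptions made for the example.

```python
def autocomplete(term, options, min_chars=2, prefix_only=False, limit=20):
    """Return up to `limit` options matching `term`, ignoring case.

    Hypothetical helper: `min_chars` is how many characters must be typed
    before any list is returned, and `prefix_only` switches between
    'match the beginning' and 'match anywhere'.
    """
    needle = term.strip().casefold()
    if len(needle) < min_chars:
        return []  # too little text entered, so no list appears yet
    matches = []
    for option in options:
        haystack = option.casefold()
        if prefix_only:
            hit = haystack.startswith(needle)
        else:
            hit = needle in haystack
        if hit:
            matches.append(option)
    return matches[:limit]


# Place of publication: matching anywhere also brings back Bouillon.
places = ["London", "Bouillon", "Edinburgh"]
anywhere = autocomplete("lon", places)
beginning = autocomplete("lon", places, prefix_only=True)

# ESTC: beginning-only matching with a three character minimum.
# These IDs are made up for the example.
estc_ids = ["T100123", "T100200", "N100100"]
estc_hits = autocomplete("T1001", estc_ids, min_chars=3, prefix_only=True)
```

Raising `min_chars` to four for ESTC, or switching settlements to `prefix_only=True`, would then be a one-argument change, which is roughly the kind of decision still to be made for these fields.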

There are also two further multi-select areas for language and format.  Format wasn’t listed in the requirements document as being a multi-select but I think it makes sense for it to be one.  Each of these areas lists the number of book editions that have the language / format and (as previously discussed) the data needs tidying up as it’s a bit messy.  So, that’s the ‘advanced’ form complete, although the layout is not finalised so it will eventually look a lot nicer (I hope).  But there’s no getting around the fact that there are an intimidatingly large number of search options listed and we might need to think some more about this.

Also this week I inserted a missing page into the records for a register for Leighton library and sent the data for the 62 borrowers that have been erroneously assigned the mid-tier ‘Minister/Priest’ occupation to Katie to be assigned a final tier occupation instead.

I also spoke to Matthew Creasy about a conference website he would like to put together, and to Ophira Gamliel about some T4 issues and about an AHRC proposal she submitted a while back, for which I wrote the Data Management Plan.  The proposal has been awarded funding and the project will begin sometime over the summer.

Next week I will continue to implement the advanced search for Books and Borrowing.

Week Beginning 6th February 2023

I tested positive for Covid on Saturday, which I think is the fourth time I’ve had it now.  However, this time it hit me a bit harder than previously and I was off work with it on Monday and Tuesday.  I was still feeling rather less than 100% on Wednesday but I decided to work (from home) anyway as Thursday and Friday this week were strike days so it would be the only day I would be able to work.  I managed to make it through the day but by the end I was really struggling and I was glad for the strike days as I was then quite unwell for the rest of the week.

I spent the day continuing to work on the advanced search interface for the Books and Borrowing project.  Whilst doing so I noticed that the borrower title data contained lots of variants that will probably need to be standardised (e.g. ‘capt’, ‘capt.’ and ‘captain’) so I emailed Katie and Matt a list of these.  I also spotted a problem with the occupations that had occurred during batch import of data for Innerpeffray.  There are 62 borrowers that have been assigned to the occupation category ‘Minister/Priest’ but this occupation is the only one where there are three hierarchical levels and ‘Minister/Priest’ is the second level and should therefore not be assignable.  Only endpoints of the hierarchy, such as ‘Catholic’ and ‘Church of Scotland’ should be assignable.  Hopefully this will be a fairly simple thing to fix, though.
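A check for the occupation problem could look something like the sketch below, which lists borrowers whose assigned occupation still has child occupations beneath it and is therefore not an endpoint of the hierarchy.  It is written in Python purely for illustration; the data structures, the borrower IDs and the top-level category name ‘Religion’ are assumptions, not the project's real schema.

```python
def non_endpoint_borrowers(borrower_occupations, parent_of):
    """Return sorted borrower IDs assigned to an occupation that has children.

    borrower_occupations: {borrower_id: occupation_name}  (hypothetical shape)
    parent_of: {occupation_name: parent_name or None}     (hypothetical shape)
    """
    # Any occupation that appears as somebody's parent is not an endpoint.
    has_children = {p for p in parent_of.values() if p is not None}
    return sorted(
        borrower_id
        for borrower_id, occupation in borrower_occupations.items()
        if occupation in has_children
    )


# Illustrative data only.
parent_of = {
    "Religion": None,                      # hypothetical top level name
    "Minister/Priest": "Religion",
    "Catholic": "Minister/Priest",
    "Church of Scotland": "Minister/Priest",
}
borrower_occupations = {
    101: "Church of Scotland",
    102: "Minister/Priest",   # second level, should not be assignable
    103: "Catholic",
    104: "Minister/Priest",
}
to_fix = non_endpoint_borrowers(borrower_occupations, parent_of)
```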

For the Advanced Search form the requirements document stated that there will be an option for selecting libraries and a further one for selecting registers that will dynamically update depending on the libraries that are selected.  As I worked on this I realised it would be simpler to use if I just amalgamated the two choices, so instead I created one area that lists libraries and the registers contained in each.  From this you can select / deselect entire libraries and/or registers within libraries.  It does mean the area is rather large and I may update the interface to hide the registers unless you manually choose to view them.  But for now the listed registers are all displayed and include the number of borrowings in each.  There are several that have no borrowings and if these continue to have no borrowings I should probably remove them from the list as they would never feature in the search results anyway.

There is also a section for fields relating to the borrowing and a further one for fields relating to borrower.  This includes a list of borrower titles, with the option of selecting / deselecting any of these.  Beside each one is a count of the number of borrowers that have each title.  I’m currently still working on borrower occupation.  This currently features another area with checkboxes for each level of occupation, with counts of the number of borrowers in each occupation.  I’m still working on the select / deselect options so these are not all working at all levels yet.  I had hoped to finish this on Wednesday but my brain had turned to mush by the end of the day and I just couldn’t get it working.
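One way to model the select / deselect behaviour for a hierarchy of checkboxes is sketched below in Python.  It is not the code used on the site, and it rests on an assumption about the intended behaviour: ticking or unticking an occupation cascades down to everything beneath it, and a parent is treated as ticked only when all of its children are ticked.  The class name and the top-level names ‘Religion’ and ‘Other clergy’ are made up for the example.

```python
from collections import defaultdict


class OccupationTree:
    """Tracks which occupations are selected in a parent/child hierarchy."""

    def __init__(self, parent_of):
        # parent_of maps each occupation to its parent (None for top level).
        self.parent_of = dict(parent_of)
        self.children = defaultdict(list)
        for node, parent in self.parent_of.items():
            if parent is not None:
                self.children[parent].append(node)
        self.selected = set()

    def set_selected(self, node, value):
        # Cascade downwards: the node and all of its descendants take `value`.
        stack = [node]
        while stack:
            current = stack.pop()
            if value:
                self.selected.add(current)
            else:
                self.selected.discard(current)
            stack.extend(self.children.get(current, []))
        # Cascade upwards: each ancestor is selected only if all of its
        # children are selected (an assumption about the desired behaviour).
        parent = self.parent_of[node]
        while parent is not None:
            if all(child in self.selected for child in self.children[parent]):
                self.selected.add(parent)
            else:
                self.selected.discard(parent)
            parent = self.parent_of[parent]


tree = OccupationTree({
    "Religion": None,                 # hypothetical top level name
    "Minister/Priest": "Religion",
    "Catholic": "Minister/Priest",
    "Church of Scotland": "Minister/Priest",
    "Other clergy": "Religion",       # hypothetical sibling category
})
tree.set_selected("Minister/Priest", True)
after_select = set(tree.selected)
tree.set_selected("Catholic", False)
after_deselect = set(tree.selected)
```

Keeping the state in one place like this means the checkboxes only need to be redrawn from `selected` after each click, rather than each level trying to update its neighbours directly.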

Also this week I investigated an issue with Google Analytics for the Dictionaries of the Scots Language and responded to a couple of emails from Ann and Rhona.  I also spoke to Jennifer Smith about her extension of the Speak For Yersel project and exported some statistics about the number of questions answered.  I also responded to a query from Craig Lamont about the Edinburgh Enlightenment map we’d put together several years ago and spoke to Pauline Mackay about the Burns letter writing trail that I’ll be working on in the coming months.

Next week is the school half-term and I’m either on annual leave or on strike until next Friday.

Week Beginning 30th January 2023

This was a four-day week as the latest round of UCU strike action began on Wednesday.  Strike action is going to continue for the next two months, which is going to have a major impact on what I can achieve each week.

I spent almost all of this week working on the Books and Borrowing project.  The first two days were mainly spent dealing with data-related issues.  This included writing a script to merge duplicate editions based on a spreadsheet of editions that I’d previously sent Matt, to which he had added a column to denote which duplicate should be merged with which.  It took quite some time to write the script due to having to deal with associated book works and authors.  Some of the duplicates that were to be deleted had book work associations whilst the edition to keep didn’t.  These cases had to be checked for and the book work association had to be transferred over.

Authors were a little more complicated as both the duplicate to be deleted and the one to keep may have multiple associated authors.  If the duplicate edition to keep had no authors but the one to be deleted did then each of these had to be associated with the edition to keep.  But if both the edition to delete and the one to keep had authors only those authors from the ‘to delete’ edition that were not already represented in the ‘to keep’ edition’s author list had to be associated.  In such cases where an author did need to be associated with the ‘to keep’ edition I also added in a further check to ensure the author being associated didn’t have the same name (but different ID) as one already associated, as there are duplicate authors in the system.

With all of this done the script then had to reassign the holding records from the ‘to delete’ edition to the ‘to keep’ one and then finally delete the relevant edition.  As the script makes significant changes to the data I first ran it on a version of the data I had running on my laptop to check that the script worked as intended, which thankfully it did.  After completing the test I then (after taking another backup of the database in case of problems) ran the script on the live data.  The process resulted in 541 duplicate editions being deleted from the system and as far as I can tell all is well.  We now have 13,086 editions in the system and 13,014 of these do not have an associated book work.  We only have 75 book works in the system.
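The merge rules described above (transfer the book work if the edition to keep lacks one, carry over only authors not already represented by ID or by name, reassign holdings, then delete) can be summarised in a small in-memory sketch.  This is Python for illustration only and works on plain objects rather than the real database; the `Edition` class, its field names and the sample IDs are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Edition:
    edition_id: int
    work_id: Optional[int] = None
    author_ids: List[int] = field(default_factory=list)
    holding_ids: List[int] = field(default_factory=list)


def merge_edition(editions: Dict[int, Edition], author_names: Dict[int, str],
                  delete_id: int, keep_id: int) -> None:
    """Merge the 'to delete' edition into the 'to keep' edition in place."""
    duplicate = editions[delete_id]
    keep = editions[keep_id]

    # Transfer the book work association only if the keeper has none.
    if keep.work_id is None and duplicate.work_id is not None:
        keep.work_id = duplicate.work_id

    # Carry over authors that are not already present, either by ID or by
    # name (duplicate author records share a name but have different IDs).
    kept_names = {author_names[a] for a in keep.author_ids}
    for author_id in duplicate.author_ids:
        name = author_names[author_id]
        if author_id in keep.author_ids or name in kept_names:
            continue
        keep.author_ids.append(author_id)
        kept_names.add(name)

    # Reassign holdings, then remove the duplicate edition.
    keep.holding_ids.extend(duplicate.holding_ids)
    del editions[delete_id]


# Illustrative data: author 11 duplicates author 10 by name.
author_names = {10: "Smith, Adam", 11: "Smith, Adam", 12: "Hume, David"}
editions = {
    1: Edition(1, None, [10], [100]),
    2: Edition(2, 5, [11, 12], [200, 201]),
}
merge_edition(editions, author_names, delete_id=2, keep_id=1)
```

In a real run the same steps would need to happen against the database inside a transaction (or after a backup, as was done here), since the deletion is not reversible.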

The next step is to assign book works to editions and add in book genres.  In order to do this I created a further spreadsheet containing the editions with columns for book work, authors and three columns which can be used to record up to three genres.  I also sent Matt and Katie a further spreadsheet containing the details of the 75 existing book works in our system.  It’s going to be rather complicated to fill in the spreadsheet as there’s a lot going on and it took me quite a while to figure out a workflow for filling it in.  Hopefully with that in place filling it in should be straightforward, if time-consuming.

I also ran some queries, did some checks and generated some spreadsheets for the Wigtown data for Gerry McKeever.  With these data-related issues out of the way I then returned to developing the front-end.  Whilst working on an issue relating to ordering the results by date I noticed that we have quite a lot of borrowing records in the system that have no dates.  There are almost 12,000 that don’t have a ‘borrowed year’.  There’s possibly a good reason for this, but 2,376 of these have a borrowed day and a borrowed month but no year, which seems stranger.  I emailed Katie and Matt about this and they’re going to investigate.
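The check on undated records amounts to two counts, sketched here in Python over a list of dictionaries.  The field names `borrowed_year`, `borrowed_month` and `borrowed_day` are assumptions for the example, as is the idea that a missing value is stored as `None`.

```python
def date_gap_counts(records):
    """Return (records with no year, of those: records with day and month)."""
    no_year = [r for r in records if r.get("borrowed_year") is None]
    day_and_month = [
        r for r in no_year
        if r.get("borrowed_day") is not None
        and r.get("borrowed_month") is not None
    ]
    return len(no_year), len(day_and_month)


# Made-up sample rows.
sample = [
    {"borrowed_year": 1788, "borrowed_month": 3, "borrowed_day": 14},
    {"borrowed_year": None, "borrowed_month": 3, "borrowed_day": 14},
    {"borrowed_year": None, "borrowed_month": None, "borrowed_day": None},
    {"borrowed_year": None, "borrowed_month": 7, "borrowed_day": None},
]
counts = date_gap_counts(sample)
```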

I managed to finish work on the ‘Year borrowed’ bar chart this week.  Without providing a year filter the bar chart shows the distribution of borrowing records divided into decades, for example this search for ‘rome’, ordered by date borrowed:

You can then click on one of the decade bars to limit the results to just those in the chosen decade, for example clicking on the ‘1780’ bar:

This then displays a bar chart showing a breakdown of borrowing records per year within the selected decade.  You are given the option of clearing the year filter to return to the full view and you can also click on an individual year bar to limit the results to just that year, for example limiting to the year 1788:

When you reach this level no bar chart is displayed as year is the unit that’s filtered and there is only one year selected.  But options are given to return to the decade view or clear the year filter.  You can of course combine the year filter with any of the other filter options.  I guess at year level we could display a similar bar chart for borrowings per month, but this might be too fine-grained and confusing (plus would be a lot more work as everything is currently set up to work with year only).  It’s something to consider, though.
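The data behind the two levels of the bar chart is essentially a pair of counts: borrowings per decade, and borrowings per year within a chosen decade.  A minimal sketch in Python follows; the function names are made up and it assumes the borrowed year is available as an integer (or `None` where it is missing).

```python
from collections import Counter


def decade_counts(years):
    """Count borrowings per decade, e.g. 1788 is counted under 1780."""
    counts = Counter((y // 10) * 10 for y in years if y is not None)
    return dict(sorted(counts.items()))


def year_counts(years, decade):
    """Count borrowings per year within one decade (decade .. decade + 9)."""
    counts = Counter(
        y for y in years if y is not None and decade <= y < decade + 10
    )
    return dict(sorted(counts.items()))


# Made-up borrowed years, including one undated record.
years = [1781, 1788, 1788, 1792, None, 1779]
by_decade = decade_counts(years)
within_1780s = year_counts(years, 1780)
```

A month-level chart would follow the same pattern with one more grouping key, though as noted above everything else is currently set up around the year only.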

I did spot a problem with the bar chart:  I realised that when you searched for an individual year or a range within an individual year the results were still showing the options to view the decade and clear the year filter, both of which then gave errors.  This has now been sorted – no year filter options should be shown when the main search is only for a year.

For the remainder of the week I began working on the advanced search.  As specified in the requirements document, currently the advanced search page features two tabs, one for a ‘simple’ advanced search and one for an ‘advanced’ advanced search.  So far I’ve just been working on the forms, which in turn has necessitated making some changes to the API (to bring back a simple list of all libraries and to enable an entire list of registers to be returned).  The forms allow you to select / deselect libraries and select / deselect all.  In the ‘Simple’ tab there are also textboxes for entering date of borrowing, author forename and surname, year of birth / death and book title, plus a placeholder for genre.  The requirements document stated that date of borrowing would have boxes for entering years and days and a drop-down list for selecting month, with two sets to be used for range dates.  I’ve decided that since the quick search already allows dates to be entered directly as text it would make sense to just follow the same method for the advanced search.

Author dates as currently specified are going to be a bit messy for BC dates, where people need to enter a negative value.  This is messy because a dash is used for date ranges so we may end up with something like ‘-1000--200’ (that’s two dashes in the middle).  I’m not sure what we can do about this, though.  I guess having different boxes for ‘from’ and ‘to’ for ranged dates would avoid the issue.  For the ‘advanced’ advanced search lists of selectable registers will appear depending on the libraries that are selected.  This is what I’m still in the middle of working on.
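On the BC date ranges mentioned above: a single text box can at least be parsed unambiguously, even if it still looks odd to the person typing.  The sketch below is Python for illustration only; the function name and the exact input format (a plain hyphen as both the minus sign and the range separator) are assumptions, and it does nothing to address the usability concern.

```python
import re

# An optional minus sign on each year, with a single hyphen between them.
_YEAR_RANGE = re.compile(r"^\s*(-?\d+)\s*(?:-\s*(-?\d+))?\s*$")


def parse_year_range(text):
    """Parse '1700-1750', '-1000--200' or a single year into (start, end)."""
    match = _YEAR_RANGE.match(text)
    if match is None:
        raise ValueError(f"Not a year or year range: {text!r}")
    start = int(match.group(1))
    # A single year is treated as a range of one.
    end = int(match.group(2)) if match.group(2) is not None else start
    if end < start:
        raise ValueError(f"Range ends before it starts: {text!r}")
    return start, end


bc_range = parse_year_range("-1000--200")   # 1000 BC to 200 BC
ad_range = parse_year_range("1700-1750")
bc_to_ad = parse_year_range("-1000-200")    # 1000 BC to AD 200
```

Separate ‘from’ and ‘to’ boxes would make this parsing unnecessary, which is probably the cleaner option from the user's point of view.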

If I have the time I would like to create a new theme for the website that will look pretty similar but will use the Bootstrap front-end toolkit (https://getbootstrap.com/).  The current WordPress theme doesn’t use this which means creating complex layouts is more difficult and messy.  I created a Bootstrap based WordPress theme for the Anglo-Norman Dictionary (e.g. this search form: https://anglo-norman.net/textbase-search/) but I’ll just have to see how much time I have as I think it’s better to get the essentials in place first.  But what it means is in the meantime things like the search form layout will possibly not be finalised (but will be functional).

In addition to the above I fixed an issue with the Thesaurus of Old English data for Jane Roberts and I completed setting up an initial WordPress site for the VARICS project.  I also did a little work for the Dictionaries of the Scots language, fixing a broken link from entries to DOST abbreviations, replying to an email from Rhona about a cookie policy for the website and investigating an issue with text in italics in quotations not being found when a ‘quotations only’ advanced search is performed.

It turns out that the code I’d written to generate the data for the quotations was only set to pick up the direct contents of <q> and to ignore the contents of any child elements such as <i>.  This is not the case with the full text and ‘exclude quotations’ data.  I identified the issue and updated the code, running a test entry through it to test that the italicised text in quotes is now getting indexed properly.  It may well be that there was a reason why the code was set up in this way, though, as Ann mentioned that there are other tags within quotes whose content should be ignored.  I’ll need further input from the team before I do anything further about this.
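The difference between picking up only the direct contents of an element and including its children is easy to show with a small example.  This sketch uses Python's standard `xml.etree.ElementTree` rather than the project's actual code, and the `skip_tags` option is just one possible way of handling the tags Ann mentioned; the `<ref>` tag and the sample entry are invented for illustration.

```python
import xml.etree.ElementTree as ET


def _collect(element, skip_tags):
    """Join an element's text with that of its children, except skipped tags."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in skip_tags:
            parts.append(_collect(child, skip_tags))
        # Text following a child belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


def quotation_text(entry_xml, skip_tags=frozenset()):
    """Return the text of every <q> in an entry, including child elements."""
    root = ET.fromstring(entry_xml)
    return [_collect(q, skip_tags) for q in root.iter("q")]


# Invented sample entry; <ref> stands in for a tag whose content is unwanted.
entry = "<entry><q>He gaed <i>hame</i> again<ref>X1</ref>.</q></entry>"

direct_only = ET.fromstring(entry).find("q").text   # stops at the first child
with_children = quotation_text(entry)
without_refs = quotation_text(entry, skip_tags={"ref"})
```

Which tags should go in `skip_tags`, if any, is exactly the question that needs input from the team.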