On Monday this week I attended the Books and Borrowing conference in Stirling. I gave a demonstration of the front-end I’ve been developing which I think went pretty well. Everyone seems very pleased with the site and its features.
I also spent a bit of time working for the DSL. I investigated an issue with square brackets in the middle of words which was causing the search to fail. It would appear that Solr will happily ignore square brackets in the data when they are at the beginning or end of a word, but when they’re in the middle the word is then split into two. We decided that in future we will strip square brackets out of the data before it gets ingested into Solr. I also updated the site so that the show/hide quotations and etymology setting is no longer ‘remembered’ by the site when the user navigates between entries. This was added in as a means of enabling users to tailor how they viewed the entries without having to change the view on every entry they looked at, but I can appreciate it might have been a bit confusing for people so it’s now been removed. I also implemented a simple Cookie banner for the site and after consultation with the team this went live.
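As a rough illustration of the bracket stripping described above, here is a minimal Python sketch. It assumes that only the bracket characters themselves should be dropped and the letters inside them kept; the function name and the sample words are hypothetical and are not taken from the DSL data.

```python
def strip_square_brackets(text: str) -> str:
    """Remove '[' and ']' while keeping the characters between them."""
    return text.replace("[", "").replace("]", "")


# A bracket in the middle of a word no longer splits it into two tokens.
print(strip_square_brackets("wa[l]k"))
print(strip_square_brackets("[a]bune"))
```

Doing this before the data is sent to Solr means the index never sees the brackets, so no change to the tokeniser settings would be needed.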
Also this week I responded to a query from Piotr Wegorowski in English Language and Linguistics about hosting podcasts and had an email chat with Mike Irwin, who started this week as head of IT for the College of Arts. I spent the rest of the week working through the WordPress sites I manage and ensuring they all work via our new external host. This involved fixing a number of issues with content management systems and APIs, updating plugins, creating child themes and installing new security plugins. By the end of the week around 20 websites that had been offline since early February were back online again, including this one!
I was off last week for Easter and was only working two days this week. I regenerated the Solr data for the Books and Borrowing project ahead of a demonstration of the front-end in Liverpool at the end of the week. I wasn’t at this event but apparently the demonstration went very well. I also tracked down and processed missing images for three library registers from Glasgow that had somehow not been incorporated into the system and I spent some time going through the site and making tweaks and fixes. These included adding a link to the borrowers page from the dev site homepage and adding a ‘download full image’ option when viewing a register page, which opens the full image in a new browser tab from where you can save it. I also fixed an issue with book item part number ordering: previously the volumes were ordered messily and now they are in numeric order wherever volumes are listed. I also removed author years of birth and death from the ‘simple’ advanced search, as requested, and, also as requested, added borrower forename and surname to it. I also fixed the search results ‘sort by date’ option, which was not working. This was caused by a bug that took a while to track down. All of the search results ordering options should now be working. I then fixed an issue with empty tabs appearing in the borrower list’s alphabetical tabs. This was being caused by new borrowers having been added since the cached data was generated. I’ve fixed the issue now so the empty tab should no longer appear. I also managed to fix the bar chart legend being cut off at the bottom of the visualisations on the ‘libraries’ and ‘library intro’ pages and updated the colours used in the bar chart to differentiate them a bit better. I also fixed the date filter in the search results so it no longer displays the bar chart or ‘view decade’ button if a single year is searched for. It was doing so previously and pressing on these options then broke things.
Finally, I fixed an issue with the apostrophe in the occupation “Advocate’s Clerk” breaking the search. I also spent some time preparing a demonstration of the site that I’ll be giving at the project’s conference in Stirling next week.
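The numeric ordering of volumes mentioned above is a common stumbling block, because part numbers held as text sort as ‘1’, ‘10’, ‘2’. The Python sketch below shows one way to get a numeric order; it assumes part labels are strings that may mix text and digits, and the function name and sample labels are hypothetical.

```python
import re


def volume_sort_key(part: str):
    """Split a label into text and digit chunks so digit runs compare as numbers."""
    chunks = [c for c in re.split(r"(\d+)", part) if c]
    # Each chunk becomes a tuple of the same shape so mixed labels stay comparable.
    return [(1, int(c), "") if c.isdigit() else (0, 0, c.lower()) for c in chunks]


volumes = ["10", "2", "1", "Vol. 12", "Vol. 3"]
print(sorted(volumes, key=volume_sort_key))
```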
Other than Books and Borrowing work, I spent some time dealing with the migration of WordPress sites to external hosting and ensuring they were working.
I was off on Thursday and Friday this week. I spent some of the remaining time getting some of the migrated WordPress sites up and running again. I spent the remainder of my time continuing with the Books and Borrowing front-end and pretty much completed the library borrowers page. As with the list of borrowers within a library, you can view the list by borrower surname or view the ‘top 100’ most prolific borrowers. In addition to this, there are a couple of limiting options. As with book editions, you can limit the list of borrowers by a date range. Here the date range is ‘active period’. I’ve created new fields to store the year of the first and last borrowing for each borrower and this is the borrower’s ‘active period’. Note that this is something different to limiting the list to borrowers who actually borrowed a book within the selected period. If your selected range is ‘1780-1790’ and a borrower’s first borrowing was in 1777 and their last borrowing was in 1799 then the borrower will be returned as they are considered ‘active’ in your chosen period. It may well be that they didn’t actually borrow anything in the period 1780-1790 but they’ll still be returned. If instead of ‘active period’ we wanted to query exact years of borrowing this would be a much more complex query as each individual borrowing record for every borrower would need to be queried. I could create another Solr index for borrowers that would allow this, but I’m hoping that ‘active period’ is sufficient for the borrowers page. In addition to the borrower’s active period you can also limit by gender, and you can combine the two limits. For example, listing all of the female borrowers by surname that were active between 1750 and 1800, or viewing the ‘top 100’ female borrowers that were active between 1800 and 1850.
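The ‘active period’ check can be sketched as a simple overlap test. This is only an illustration in Python, assuming each borrower carries the two new year fields; the class, field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Borrower:
    surname: str
    first_year: int  # year of the borrower's first recorded borrowing
    last_year: int   # year of the borrower's last recorded borrowing


def is_active(borrower: Borrower, start: int, end: int) -> bool:
    """True when the active period overlaps the selected range [start, end]."""
    return borrower.first_year <= end and borrower.last_year >= start


# First borrowing 1777, last 1799: returned for 1780-1790 even if nothing
# was actually borrowed within those years.
print(is_active(Borrower("Smith", 1777, 1799), 1780, 1790))
```

Because only two stored years per borrower are compared, the individual borrowing records never need to be consulted, which is what keeps this cheaper than querying exact years of borrowing.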
I also added in a ‘limit by occupation’ feature to the borrowers page. Occupations are listed in the same way as they are on the advanced search form – you can select and deselect checkboxes. Note that I’ve also removed the unassigned occupations from both this list and the advanced search list. Unlike the advanced search list, the occupations that are displayed vary depending on the other limit options you have selected. For example, if you limit the list of borrowers to ‘Female’ then only those occupations that have been assigned to female borrowers are displayed, together with the number of assigned borrowers. When you select one or more occupations the returned borrowers and the tabs update to reflect your choice, as they do with gender and borrowing period. I haven’t had time to add in a limit by library yet, but I will do so eventually.
I am off work next week for the Easter holiday and for all but two days of the following week too.
Also this week I completed an initial version of the ‘Browse book’ page in the Books and Borrowing front-end. The page works pretty much exactly as it was specified in the requirements document, apart from a limit by genre not being in place yet. The page presents all of the book editions in the system and works in a similar way to the ‘browse book holdings’ tab when viewing a library. By default the book editions are displayed by title and there are tabs for each initial letter and a count of the number of book editions with titles beginning with the letter. The data for each book edition (and associated book work where available) is listed, with searchable items clickable (e.g. click on an edition title to search for all borrowing records involving this edition). The number of borrowings is also listed as are all of the individual holding records associated with the edition.
You can change the way the book edition data is displayed to select ordering by author or the ‘top 100’ view in addition to title. There is also a date slider that allows you to limit the view to books published within a specific period. There is currently one book in the system with a publication date of 1968 which is why the range currently goes that far – once this has been fixed the range will automatically fix itself too. The range slider is double-ended – drag each end to match the period you’re interested in then press ‘Update’ and the page will reload and will only display book editions published in your chosen range. So for example you can view the ‘top 100’ for books published in different periods and compare them.
Next week I will begin and hopefully finish the ‘Browse borrowers’ page. That then leaves the ‘facts and figures’ (both library and system-wide) plus the integration of genre to tackle. Plus working on the user interface, migrating to Bootstrap and fixing a number of little issues I’ve spotted.
This was a two-day week due to UCU strike action from Wednesday to Friday. I spent some of this time sorting out travel and accommodation for the DH2023 conference in Graz. I spent the remainder of my time on the Books and Borrowing project. I continued to add ‘click to search’ options to all of the data. In the search results the format, editors and translators can all now be clicked on. I’ve also separated out the ESTC. The button linking to the BL site is still present on the right, but ESTC is also now listed like other data types on the left, and as with other data types you can click on it to perform a search in our site for the ESTC.
I also updated everywhere else that the data is displayed to incorporate the ‘click to search’ options too. This includes the register pages and the ‘books’ and ‘borrowers’ tabs when you’re looking at a library. I’ve also set the ‘page’ view to default to the ‘image and text’ view, which was requested a while back.
I then began working on Section 5 of the requirements document, which is the ‘Browse Books’ feature through which book editions will be listed. I needed to make some changes to the database to implement this but I encountered a problem with the online database – when I look at the structure of a table it should list the columns, but for each table this list is blank. I was a little worried about adding new columns so I contacted Mike at Stirling but unfortunately he was out of office. Therefore I decided to work on things on my laptop instead of the live database. I made a good start with the API endpoints that will be required for this feature, and have written new scripts to generate cached data. I’ve not completed the feature yet but hopefully it won’t take too much longer to implement. I just hope I can remember where I left things once I’m back to work again on Thursday next week.
This week I continued to focus primarily on the development of the Books and Borrowing front-end, making various updates to the search and browse facilities. I investigated an issue with authors not appearing in the Solr data and located the cache generation scripts I’d previously written but hadn’t re-run recently. These include generating cached data for authors and also things like number of borrowings. I have now updated my ‘Solr data generation how to’ document to note that these cache generation scripts need to be re-run so hopefully this shouldn’t be an issue in future. I’ve also re-run all of the scripts now but this won’t affect anything until I send new data to the Stirling IT people.
I also updated my Solr generation scripts to split up edition languages and place of publication on the bar character (and also separating out any places in square brackets). This means each language and place will be independently searchable in future. I have also updated the advanced search form so that languages and places are split by the bar character. For language this means that where a language had a bar (e.g. ‘Latin | Arabic’) the associated count then gets added to each of the individual occurrences of the language. The list of languages on the advanced search form is now much more reasonable and less cluttered.
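A minimal Python sketch of this kind of splitting and counting is given below. It assumes values are separated by ‘|’ and that bracketed places should become values in their own right; the function names and sample data are hypothetical.

```python
import re
from collections import Counter


def split_values(raw: str) -> list[str]:
    """Split on the bar character, then separate out anything in square brackets."""
    values = []
    for piece in raw.split("|"):
        for part in re.split(r"[\[\]]", piece):
            part = part.strip()
            if part:
                values.append(part)
    return values


def count_values(records: list[str]) -> Counter:
    """Add each record's count to every individual value it contains."""
    counts = Counter()
    for raw in records:
        counts.update(split_values(raw))
    return counts


print(split_values("London [Edinburgh] | Glasgow"))
print(count_values(["Latin | Arabic", "Latin", "Arabic"]))
```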
I was considering replacing the auto-complete for place of publication with a multi-select, but I’ve decided against this as even with the bar splitting there are still about 350 different places, which is too many. Instead I’ve updated the autocomplete to bring back individual places. For example previously ‘Stirling’ appeared in a long list of places all separated by bars and this is what was returned when you typed ‘Stirling’ in. Now only ‘Stirling’ appears.
I then created a new API endpoint to retrieve the data for a single borrower and I now use this call when a user presses on a borrower name in the search results, enabling me to display the borrower name in the ‘You searched for’ section rather than just the borrower ID. I have also updated things so that if you press on a borrower name to search for all records involving the borrower you can then press ‘refine your search’ and the borrower’s details (title, forename, surname, othernames) will populate the advanced search form, which is quite nice. Note however that if you do this and then press the ‘Search’ button the search is performed on these fields and not the specific borrower ID, so the results may be very different (e.g. multiple John Smiths).
I also realised that I will need to rethink how the pubdate search works as I’d forgotten that we have both ‘pubdate’ and ‘pubdateend’ fields. Currently I have just been using ‘pubdate’ but this is not going to give accurate results where we have a range of years. Instead I’m generating and saving each year in the range. This allows the search to work without updating the code, and after testing it out all would appear to work very well.
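Expanding a publication date range into individual years can be sketched as follows. This is an illustrative Python version only, assuming both values are integers and that the end year may be missing; the function name is hypothetical.

```python
def publication_years(pubdate, pubdateend=None):
    """Return every year covered by an edition's publication date(s)."""
    if pubdate is None:
        return []
    if pubdateend is None or pubdateend < pubdate:
        # No usable end year: treat the edition as published in a single year.
        return [pubdate]
    return list(range(pubdate, pubdateend + 1))


print(publication_years(1770, 1773))
print(publication_years(1770))
```

Storing the expanded list in a multi-valued field means a search for any single year in the range matches the record without the search code needing to know about ranges.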
I then worked on adding in more ‘click to search’ options to the search results page. I added in a search option for ‘Holding title’ that when clicked on searches for the holding title’s ID (and populates the search form with the holding title if you choose to refine your search). Unfortunately I realised that I had somehow not included the book holding ID as a field in the Solr data. I therefore updated the schema on my laptop and regenerated and ingested the data into Solr on my laptop to check that the process worked. Thankfully all went smoothly, but it does mean that until I next update the live Solr data the ‘click to search’ for holding title doesn’t actually provide any results.
I also added in ‘click to search’ for book edition and book work title. I had included the book edition and work IDs in the Solr data so thankfully I was able to get these searches working fully. As with book holding, when you click on a title to perform the search and then choose to refine your search the edition / work title appears in the advanced search form.
I also added in a similar option for authors. You can now click on an author’s name at any of the levels of association and a search for author ID will be performed, with author forename and surname appearing in the advanced search form if you ‘refine’.
I then moved on to adding in ‘click to search’ for edition language and publication place. These also had their own challenges as I needed to split up languages / publication places into individual clickable areas based on the bar character and also the square brackets for publication place. I managed to get it all working, but of course this search won’t work properly until the Solr data is updated anyway.
Also this week I had an email conversation with Ophira Gamliel about a new project that will be starting soon. We discussed how the project’s website will function, interactive maps, URLs, page design and other such matters. I’ll be starting on this sometime after Easter.
I continued to develop the front-end for the Books and Borrowing project for most of this week, completing work on an initial version of the advanced search facility. Last week I decided to change the way the API is referenced for the search. Previously there was going to be one endpoint for the quick search, which would accept one search parameter, and another for the advanced search, which would accept multiple parameters. I decided instead to amalgamate the two into one single search endpoint as in reality both search facilities will need to do the same things: format the search options for Solr, work out the pagination, deal with ordering options and work out which filters need to be applied.
In order to amalgamate the endpoints I needed to rework the quick search facility that I had already created, and this meant breaking the quick search for a while. Thankfully I managed to put it all back together again with the quick search working once more, but with slightly different URLs and a differently structured API call. With this in place I began to add the advanced search data types to the API so as to construct the query that will be passed to Solr to return the advanced search results. This basically allows specific fields in the Solr data (e.g. author names, library names, dates) to be queried rather than querying all fields, which the quick search does.
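To show how one endpoint might serve both kinds of search, here is a rough Python sketch that builds Solr request parameters. `q`, `fq`, `start`, `rows` and `sort` are standard Solr parameters, but the function name, the field names and the rule that a plain string means a quick search are assumptions made for illustration, not the project’s actual API.

```python
def build_solr_params(criteria, page=1, rows=100, sort=None, filters=None):
    """Build Solr parameters for either a quick or an advanced search."""
    if isinstance(criteria, str):
        # Quick search: one term run against the default (all fields) search.
        query = criteria
    else:
        # Advanced search: each criterion targets a specific field.
        query = " AND ".join(f'{field}:"{value}"' for field, value in criteria.items())
    params = {"q": query or "*:*", "start": (page - 1) * rows, "rows": rows}
    if sort:
        params["sort"] = sort
    if filters:
        params["fq"] = [f'{field}:"{value}"' for field, value in filters]
    return params


print(build_solr_params("smith", page=2, rows=50))
print(build_solr_params({"surname": "smith"}, sort="date asc", filters=[("gender", "Female")]))
```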
As I worked on this I ran into a spot of bother with author years of birth and death that were negative (i.e. BC). They just weren’t working as they should have done and a bit of investigation revealed that this was because I was storing the years as strings rather than integers. I regenerated the data on my laptop, saving the years as integers, and after that negative dates worked. However, I soon realised why I hadn’t been saving the years as integers: some author dates are not integers but are things like ‘1650?’ or ‘16__’. When I tried ingesting these into Solr the records gave errors and failed to get added. I therefore had to add a further check to avoid any non-integer dates getting added to Solr. This means the associated records now get added but don’t have the offending dates. This isn’t a huge issue as the dates would never have been searchable anyway. For now this update is not present on the live site as I will wait until the next data export to add this, so in the meantime negative author dates will not work but the issue has been sorted.
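The extra check for non-integer dates could look something like the Python sketch below. It assumes the raw values arrive as strings and that anything other than an optionally negative whole number should be left out of the Solr record; the function name is hypothetical.

```python
import re

_YEAR = re.compile(r"-?\d+")


def parse_year(value):
    """Return the year as an int, or None when it is not a clean integer."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if _YEAR.fullmatch(value) else None


for raw in ["1650", "-384", "1650?", "16__"]:
    print(raw, parse_year(raw))
```

A record whose year comes back as `None` would simply be indexed without that field, so the record is still added but the offending date is dropped.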
I also ran into another issue with how I was structuring the URLs for the advanced search. Short URLs as I’ve previously used work fine, but the advanced search is going to potentially result in some very long URLs, with advanced search fields and values stored in a specific section of the URL between slashes, for example:
However, such URLs were resulting in a 403 forbidden error on the server. I contacted Stirling IT Services to enquire about this and discovered that the issue wasn’t the length of the URL but the length of the text between slashes in the URL. The file system only allows filenames to be 255 characters in length and even though the above URL isn’t actually referencing filenames but is split up into variables by my script, the server first has to treat the URL as if it contained filenames (well, folder names) and it’s the server at a very fundamental level that is preventing things from working.
Unfortunately this meant I had to go back to the drawing board regarding how the search URLs would work. Previously (as shown above) the search variables appeared first, with variable names and values separated by a bar and each pairing separated by a double bar. After that things like filter queries, pagination and ordering options are included in the URL. I needed to split the variable pairings up with a slash instead to avoid the lengthy text between slashes, but this would mean I could no longer be certain where in the URL things like pagination would appear. Instead I needed to switch around the order of things in the URL, ensuring pagination, filter queries and sorting options appear first and then all of the search criteria follow, as many as are required. This took quite some time to implement and does unfortunately mean that none of the existing links I’ve sent the team will work any more, but we are now in a better place and the search’s lengthy URLs will now work.
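The reordered URL structure can be illustrated with a small Python sketch. The exact layout here (three fixed segments for page, filters and sort, followed by one `field|value` segment per criterion, with ‘none’ as a placeholder for no filters) is an assumption for illustration and is not the site’s real URL scheme.

```python
from urllib.parse import quote, unquote

FIXED = ("page", "filters", "sort")  # hypothetical fixed leading segments


def build_path(page, filters, sort, criteria):
    """Put the fixed options first, then one short segment per search criterion."""
    segments = [str(page), filters or "none", sort]
    segments += [f"{field}|{value}" for field, value in criteria.items()]
    return "/".join(quote(segment, safe="|") for segment in segments)


def parse_path(path):
    """Read the fixed options by position, then treat the rest as criteria."""
    segments = [unquote(segment) for segment in path.strip("/").split("/")]
    options = dict(zip(FIXED, segments[:len(FIXED)]))
    criteria = dict(segment.split("|", 1) for segment in segments[len(FIXED):])
    return options, criteria


path = build_path(2, "", "date", {"surname": "smith", "language": "english"})
print(path)
print(parse_path(path))
```

With the fixed options in known positions, any number of criteria can follow, and each one sits in its own segment rather than in one long run of text between slashes.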
So for example an advanced search for borrowing records in two Edinburgh registers (Da.2.10 and Da.2.11) where the borrower surname is ‘smith’, the book edition language is ‘english’ and the format is ‘8v0’ gives results, and if you press ‘refine search’ the ‘advanced’ tab is displayed with these options already filled in, allowing you to update them as required.
I also investigated and fixed an issue with selecting / deselecting the third level religion occupations and I began to make some of the information in the search results searchable. I’ve currently added in ‘click to search’ options for the borrowed date and the borrower name. These now appear with a dotted line under them and if you press on one you will immediately see all of the associated borrowings. The click through for borrower name still needs a bit of work as it is actually a new search option not present in the search form: a search for borrower ID. At the moment it’s only the ID that appears in the ‘you searched for’ section but I will fix this.
There are still some things that are not yet working properly. Search filters that feature slashes or bars or ampersands currently cause things to break. Also Solr is sometimes being too clever for its own good in bringing back records that are of relevance but don’t actually match the search criteria. For example a search for the transcription field containing ‘betsy thoughtless’ currently brings back records that don’t include this text in the transcription field but in other fields, meaning Solr returns the records because it thinks they might be of interest. This can be avoided by using quotes but I need to investigate whether there is a better way to deal with this.
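The quoting workaround can be sketched in Python as below. One likely explanation for the stray results is that with Solr’s standard query parser an unquoted `field:betsy thoughtless` applies the field only to the first word, leaving the second to be searched in the default field. The field name `transcription` and the function name are assumptions for illustration.

```python
def phrase_query(field: str, text: str) -> str:
    """Build a field-scoped phrase query, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(phrase_query("transcription", "betsy thoughtless"))
```

Wrapping the whole value in quotes keeps both words tied to the named field, at the cost of requiring them to appear together as a phrase.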
On Friday I dealt with the normalisation of data that Matt and the team had been working on. This rationalises the data in fields such as borrower title and book edition language so that the same form is always used for the same thing. For example there is just one form for ‘captain’ rather than there being ‘capt’, ‘capt.’, ‘captain’ etc. I wrote scripts to process all of these updates and further scripts to pick out forms that need additional checking. I also regenerated the distinct forms for things like borrower title so Matt could check that no further unwanted forms had been added since I last exported the forms.
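As an illustration of this sort of normalisation script, here is a minimal Python sketch. The mapping holds only the ‘captain’ example from above; the preferred form ‘Captain’, the function name and the flagging of unknown forms for checking are assumptions.

```python
# Hypothetical mapping from variant forms (lower-cased) to the preferred form.
NORMALISED_TITLES = {
    "capt": "Captain",
    "capt.": "Captain",
    "captain": "Captain",
}


def normalise_title(raw: str):
    """Return (normalised form, needs_checking) for a borrower title."""
    key = raw.strip().lower()
    if key in NORMALISED_TITLES:
        return NORMALISED_TITLES[key], False
    # Unknown forms are left unchanged and flagged for a person to review.
    return raw.strip(), True


for title in ["capt", "Capt.", "Captain", "Revd"]:
    print(title, normalise_title(title))
```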
Also this week I made some further updates to the Edinburgh’s Enlightenment map and investigated an issue Ann was having with entries that featured slashes in the Dictionaries of the Scots Language.
This was a week of server woes. Two servers that host many of our most important sites went offline last Friday and our IT Support people weren’t able to get them back online again until the following Tuesday. Then just as things were getting back to normal on Tuesday an issue was spotted with another of our servers that meant it needed to be taken offline, resulting in at least 30-40 websites going down, including this one. As I’m typing this the following Monday the server is still offline and I’ve not heard anything about a timescale for getting it back up again.
Thankfully I’m currently working for the Books and Borrowing project, which is based on servers hosted at Stirling University, so my ability to work was not affected by the outages, but it’s really bad news for active research projects that require the online resources to function and it reflects really badly on both the College of Arts and the University of Glasgow as a whole.
For the Books and Borrowing project I dealt with some data correction issues for Haddington library that one of the researchers had spotted, including swapping page images around and moving borrowing records to different pages. I also corrected the occupation errors that we’d spotted with some borrowers from Innerpeffray library using a spreadsheet that Katie sent me. She had noticed that there were also several duplicate borrowers in the data and had noted which records needed to be amalgamated so I dealt with these as well.
My main task for the week was to update the Solr data we use for the quick search to incorporate all of the data that we will also need to use for the advanced search. On Monday and Tuesday I spent some time reworking the Solr instance running on my laptop so as to get it ready to handle the advanced search. This involved adding new fields for all of the types of data the advanced search needs to query.
I also figured out how to get around the stemming issue for fields like occupation. For text fields Solr creates stemmed versions of all recognisable words in the fields, so for example ‘searching’, ‘searched’, ‘searches’ etc all have the stemmed form ‘search’. This then allows free-text searches to find data that’s of relevance, which can be really useful. Unfortunately when displaying search filters it’s the stemmed forms that get returned and displayed beside the checkboxes and these can be a bit confusing. I figured out that you can create copy fields for these text fields in Solr where the text is stored as strings rather than text, and strings do not get stemmed. The search can then use the text field and the search filters can then use the string field. Pressing on a search filter then searches the string field, which is case sensitive, but this isn’t an issue as what’s being searched is the full text of the checkbox label (e.g. ‘Religion and Clergy’) which will always match the string form Solr stores. This means that the search filters now say something like ‘Education’ rather than ‘educat’ and full author names now get displayed, which is great.
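The copy field approach can be sketched as a Solr Schema API payload built in Python. `add-field` and `add-copy-field` are real Schema API commands, and `text_en` and `string` are field types in Solr’s default configset, but the field names `occupation` and `occupation_str` and the helper function are hypothetical, and the actual schema may be set up differently.

```python
import json

# Commands that could be POSTed to a collection's /schema endpoint.
schema_commands = {
    "add-field": [
        # Stemmed text field used for free-text searching.
        {"name": "occupation", "type": "text_en", "stored": True, "multiValued": True},
        # Unstemmed string copy used for filter labels.
        {"name": "occupation_str", "type": "string", "stored": True, "multiValued": True},
    ],
    "add-copy-field": {"source": "occupation", "dest": ["occupation_str"]},
}


def facet_params(search_text, selected_label=None):
    """Search the text field but build and apply filters on the string copy."""
    params = {
        "q": f"occupation:({search_text})",
        "facet": "true",
        "facet.field": "occupation_str",
    }
    if selected_label is not None:
        # Exact, case-sensitive match against the full checkbox label.
        params["fq"] = f'occupation_str:"{selected_label}"'
    return params


print(json.dumps(schema_commands, indent=2))
print(facet_params("clergy", "Religion and Clergy"))
```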
I also added in borrower title and ESTC as search filters as I thought these might be useful. Plus I’ve fixed the issue of fields that hold multiple values not being sortable. For example, a borrowing record may have multiple occupations associated with it as there may be multiple borrowers and each borrower may have several occupations. Because of this it was not possible to sort the search results by borrower occupation. The fix for this was to generate a further field for each that stores all of the multiple values as a single string. For borrower occupation for sorting purposes the occupation at the bottom of the hierarchy appears first, so if a borrowing record features a borrower with occupation ‘Law -> Advocate’ the record will be sorted under ‘Advocate’ then ‘Law’. For borrower names and author names the ordering is surname then forename.
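Generating the single sortable string might look like the Python sketch below. It assumes occupations are held as paths from the top of the hierarchy down and names as (forename, surname) pairs; the function names and the choice of a space as the separator are hypothetical.

```python
def occupation_sort_value(paths):
    """Join all occupation paths into one string, lowest level first."""
    return " ".join(" ".join(reversed(path)) for path in paths)


def name_sort_value(people):
    """Join all names into one string, surname before forename."""
    return " ".join(f"{surname} {forename}" for forename, surname in people)


# 'Law -> Advocate' is sorted under 'Advocate' then 'Law'.
print(occupation_sort_value([["Law", "Advocate"]]))
print(name_sort_value([("John", "Smith"), ("Mary", "Brown")]))
```

Because the result is one string per record, it can be stored in a single-valued field, which is what Solr needs in order to sort on it.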
With all of these changes in place I took a copy of the live database (also taking the opportunity to deactivate all of the test libraries in the system), regenerated the JSON files that Solr indexes and then ingested them into my updated Solr instance on my laptop. I then ran some tests to check all was working fine. After that I sent the data to the IT people at Stirling (I need to get them to import the data into the Solr instance on the server) and on Wednesday morning they imported it all and thankfully everything went smoothly.
With the new data in place I updated the API and the search results page to add in the new filters (Borrower title and ESTC) and to switch the filters over to using the string versions for display so we now have full occupation and author names displayed. I also updated the ‘Order by’ facility to allow all sorting options to work. Unfortunately whilst doing so I spotted that I’d forgotten to add in the code to populate the book edition title single field so I’m afraid sorting by this field doesn’t work yet, but other options such as borrower occupation and author and borrower name are now working. I updated my Solr data generation script to add in the book edition title now so next time I regenerate the data this will work.
I then started to work on implementing the advanced search. I decided to change the way the API is referenced for the search. Previously there was going to be one endpoint for the quick search, which would accept one search parameter, and another for the advanced search, which would accept multiple parameters. I decided instead to amalgamate the two into one single search endpoint as in reality both search facilities will need to do the same things: format the search options for Solr, work out the pagination, deal with ordering options and work out which filters need to be applied.
In order to amalgamate the endpoints I needed to rework the quick search facility that I had already created, and this meant breaking the quick search for a while. Thankfully I managed to put it all back together again with the quick search working once more, but with slightly different URLs and a differently structured API call. With this in place I began to add the advanced search data types to the API so as to construct the query that will be passed to Solr to return the advanced search results. This basically allows specific fields in the Solr data (e.g. author names, library names, dates) to be queried rather than querying all fields, which the quick search does. As I left things off on Friday I was in the middle of adding in the option of searching author birth and death years, but I’d run into a little difficulty when processing negative years (i.e. BC years) that I’m going to have to investigate further next week.
Also this week I made some changes to an old interactive map I’d made back in 2015 showing important places relating to Edinburgh’s enlightenment. This is hosted on the University’s T4 system and the T4 people were keen for alt tags to be added to the image map tiles. Thankfully I found an answer for this on Stack Overflow (https://stackoverflow.com/a/27606381) whereby attributes can be set each time a tile is loaded. The alt tag text is an empty string so I’m uncertain whether this will actually help anyone, but it pleases the validators, anyway. There were some other issues with the site that had been caused by the University website changing its styles since the map was published, and I fixed these too. As of yet the changes have not been approved, even after several days, so I’m not sure what’s going on there.
My other task this week was to create an initial interface for the VARICS project website, using the logo, fonts and colour scheme that the designer had created for the project. I spent a bit of time customising the theme to incorporate these and have emailed the PI to let her know that things are ready to add content to. It’s possible I’ll need to make further changes to the interface, but it’s a good starting point at least.