This was my first week back after the Christmas holidays, and after catching up with emails I spent the best part of two days fixing the content management system of one of the resources that was migrated at the end of last year. The Saints Places resource (https://saintsplaces.gla.ac.uk/) is not one I created, but I’ve taken on responsibility for it due to my involvement with other place-names resources. The front-end was migrated by Luca and was working perfectly, but he hadn’t touched the CMS, which is understandable given that the project launched more than ten years ago. However, I was contacted during the holidays by one of the project team, who said that the resource is still regularly updated, and I therefore needed to get the CMS up and running again. This required updates to database query calls and session management, and it took quite some time to update and test everything. I also lost an hour or so with a script that was failing to initiate a session, even though the session start code looked identical to other scripts that worked. It turned out that this was due to the character encoding of the script, which had been set to UTF-8 with a byte order mark (BOM): PHP was outputting the hidden BOM characters to the browser before the session was instantiated, which made the session fail. Thankfully, once I realised this it was straightforward to convert the script from UTF-8 BOM to regular UTF-8, which solved the problem.
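As a minimal sketch of the problem (the actual fix was simply re-saving the file in a text editor): a UTF-8 BOM is the three bytes EF BB BF at the start of a file, and because PHP sends anything outside its script tags straight to the browser, those bytes count as output and prevent the session cookie header from being sent. Detecting and stripping the BOM looks like this:

```python
# The UTF-8 byte order mark: three bytes that some editors prepend to files.
BOM = b"\xef\xbb\xbf"

def has_bom(raw: bytes) -> bool:
    """True if the file content starts with a UTF-8 BOM."""
    return raw.startswith(BOM)

def strip_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if present; everything else is unchanged."""
    return raw[len(BOM):] if raw.startswith(BOM) else raw
```

Run over a PHP source file before it is served (or once, to fix the file on disk), this removes the stray bytes that were being emitted before session_start() could run.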
With this unexpected task out of the way I then returned to my work on the new map interface for the Place-names of Iona project, working through the ‘to do’ list I’d created after our last project meeting just before Christmas. I updated the map legend filter list to add in a ‘select all’ option. This took some time to implement but I think it will be really useful. You can now deselect the ‘select all’ to be left with an empty map, allowing you to start adding in the data you’re interested in rather than having to manually remove all of the uninteresting categories. You can also reselect ‘select all’ to add everything back in again.
I did a bit of work on the altitude search, making it possible to search for an altitude of zero (either on its own or with a range starting at zero such as ‘0-10’). This was not previously working as zero was being treated as empty, meaning the search didn’t run. I’ve also fixed an issue with the display of place-names with a zero altitude – previously these displayed an altitude of ‘nullm’ but they now display ‘0m’. I also updated the altitude filter groups to make them more fine-grained and updated the colours to make them more varied rather than the shades of green we previously had. Now 0-24m is a sandy yellow, 25-49m is light green, 50-74m is dark green, 75-99m is brown and anything over 99m is dark grey (currently no matching data).
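The zero-altitude bug is a classic truthiness mistake, sketched below in Python (the function names are hypothetical, for illustration only): zero is a perfectly valid value but fails a simple emptiness test, so the search silently never ran.

```python
def altitude_filter_buggy(value):
    """The bug: 0 is falsy, so it is treated the same as no value at all."""
    if not value:                      # 0 fails this test
        return None                    # ...so the search never runs
    return ("altitude", int(value))

def altitude_filter_fixed(value):
    """The fix: only genuinely empty input is skipped; 0 is a real altitude."""
    if value is None or value == "":
        return None
    return ("altitude", int(value))
```

The same distinction explains the ‘nullm’ display bug: a zero value was falling through a falsiness check and rendering as null rather than 0.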
I also made the satellite view the default map tileset, with the previous default moved to third in the list and labelled ‘Relief’. This proved to be trickier to update than I thought it would be (e.g. pressing the ‘reset map’ button was still loading the old default even though it shouldn’t have) but I managed to get it sorted. I also updated the map popups so they have a white background and a blue header to match the look of the full record and removed all references to Landranger maps in the popup as these were not relevant. Below is a screenshot showing these changes:
I then moved onto the development of the elements glossary, which I completed this week. This can now be accessed from the ‘Element glossary’ menu item and opens in a pop-up the same as the advanced search and the record. By default elements across all languages are loaded but you can select a specific language from the drop-down list. It’s also possible to cite or bookmark a specific view of the glossary, which will load the map with the glossary open at the required place.
I’ve tried to make better use of space than similar pages on the old place-names sites by using three columns. The place-name elements are links and pressing on one performs a search for the element in question. I also updated the full record popup to link the elements listed in it to the search results. I had intended to link to the glossary rather than the search results, which is what happens in the other place-names sites, but I thought it would be more useful and less confusing to link directly to the search results instead. Below is a screenshot showing the glossary open and displaying elements in Scottish Standard English:
I also think I’ve sorted out the issue with in-record links not working as they should in Chrome and other issues involving bar characters. I’ve done quite a bit of testing with Chrome and all seems fine to me, but I’ll need to wait and see if other members of the team encounter any issues. I also added in the ‘translation’ field to the popup and full record, although there are only a few records that currently have this field populated, relabelled the historical OS maps and fixed a bug in the CMS that was resulting in multiple ampersands being generated when an ampersand was used in certain fields.
My final update for the project this week was to change the historical forms in the full record to hide the source information by default. You now need to press a ‘show sources’ checkbox above the historical forms to turn these on. I think having the sources turned off really helps to make the historical forms easier to understand.
I also spent a bit of time this week on the Books and Borrowing project, including participating in a project team Zoom call on Monday. I had thought that we’d be ready for a final cache generation and the launch of the full website this week, but the team are still making final tweaks to the data and this had therefore been pushed back to Wednesday next week. But this week I updated the ‘genre through time’ visualisation as it turned out that the query that returned the number of borrowing records per genre per year wasn’t quite right and this was giving somewhat inflated figures, which I managed to resolve. I also created records for the first volume of the Leighton Library Minute Books. There will be three such volumes in total, all of which will feature digitised images only (no transcriptions). I processed the images and generated page records for the first volume and will tackle the other two once the images are ready.
Also this week I made a few visual tweaks to the Erskine project website (https://erskine.glasgow.ac.uk/) and I fixed a misplaced map marker in the Place-names of Berwickshire resource (https://berwickshire-placenames.glasgow.ac.uk/). For some reason the longitude was incorrect for the place-name, even though the latitude was fine, which resulted in the marker displaying in Wales. I also fixed a couple of issues with the Old English Thesaurus for Jane Roberts and responded to a query from Jennifer Smith regarding the Speak For Yersel resource.
Finally, I investigated an issue with the Anglo-Norman Dictionary. An entry was displaying what appeared to be an erroneous first date so I investigated what was going on. The earliest date for the entry was being generated from this attestation:
<attestation id="C-e055cdb1"><dateInfo> <text_date post="1390" pre="1314" cert="">1390-1412</text_date> <ms_date post="1400" pre="1449" cert="">s.xv<sup>1</sup></ms_date> </dateInfo> <quotation>luy donantz aussi congié et eleccion d’estudier en divinitee ou en loy canoun a son plesir, et ce le plus favorablement a cause de nous</quotation> <reference><source siglum="Lett_and_Pet" target=""><loc>412.19</loc></source></reference> </attestation>
Specifically the text date:
<text_date post="1390" pre="1314" cert="">1390-1412</text_date>
This particular attestation was being picked as the earliest due to a typo in the ‘pre’ date which is 1314 when it should be 1412. Where there is a range of dates the code generates a single year at the midpoint that is used as a hidden first date for ordering purposes (this was agreed upon back when we were first adding in first dates of attestation). The code to do this subtracts the ‘post’ date from the ‘pre’ date, divides this in two and then adds it to the ‘post’ date, which finds the middle point. With the typo the code therefore subtracts 1390 from 1314, giving -76. This is divided in two giving -38. This is then added onto the ‘post’ date of 1390, which gives 1352. 1352 is the earliest date for any of the entry’s attestations and therefore the earliest display date is set to ‘1390-1412’. Fixing the typo in the XML and processing the file would therefore rectify the issue.
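The midpoint calculation described above can be sketched as follows (a hypothetical function name; the real code runs in the AND's processing scripts):

```python
def hidden_sort_date(post: int, pre: int) -> int:
    """Midpoint of a date range, used as a hidden year for ordering
    attestations: subtract 'post' from 'pre', halve, add to 'post'."""
    return post + (pre - post) // 2

# With the typo (pre=1314) the midpoint lands before the range even begins:
typo_midpoint = hidden_sort_date(1390, 1314)       # 1390 + (-76 // 2) = 1352

# With the corrected value it falls inside the range, as intended:
fixed_midpoint = hidden_sort_date(1390, 1412)      # 1390 + (22 // 2) = 1401
```

With valid data (pre ≥ post) the midpoint always falls within the range, which is why the typo went unnoticed until this entry surfaced with an implausibly early date.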
I completed work on the integration of genre into the Books and Borrowing systems this week. It took a considerable portion of the week to finalise the updates but it’s really great to have it done, as it’s the last major update to the project.
My first task was to add genre selection to the top-level ‘Browse Editions’ page, which I’m sure will be very useful. As you can see in the following screenshot, genres now appear as checkboxes as with the search form, allowing users to select one or more they’re interested in. This can be done in combination with publication date too. The screenshot shows the book editions that are either ‘Fiction’ or ‘Travel’ that were published between 1625 and 1740. The selection is remembered when the user changes to a different view (i.e. authors or ‘top 100’) and when they select a different letter from the tabs.
It proved to be pretty tricky and time-consuming to implement. I realised that not only did the data that is displayed need to be updated to reflect the genre selection, but the counts in the letter tabs needed to be updated too. This may not seem like a big thing, but the queries behind it took a great deal of thought. I also realised whilst working on the book counts that the counts in the author tabs were wrong – they were only counting direct author associations at edition level rather than taking higher level associations from works into consideration. Thankfully this was not affecting the actual data that was displayed, just the counts in the tabs. I’ve sorted this too now, which also took some time.
With this in place I then added a similar option to the in-library ‘Book’ page. This works in the same way as the top-level ‘Editions’ page, allowing you to select one or more genres to limit the list of books that are displayed, for example only books in the genres of ‘Belles Lettres’ and ‘Fiction’ at Chambers, ordered by title or the most popular ‘Travel’ books at Chambers. This did unfortunately take some time to implement as Book Holdings are not exactly the same as Editions in terms of their structure and connections so even though I could use much of the same code that I’d written for Editions many changes needed to be made.
The new Solr core was also created and populated at Stirling this week, after which I was able to migrate my development code from my laptop to the project server, which meant I could share my work with others, which was good.
I then moved onto adding genre to the in-library ‘facts’ page and the top-level ‘facts’ page. Below is a very long screenshot of the entire ‘facts’ page for Haddington library and I’ll discuss the new additions below:
The number of genres found at the library is now mentioned in the ‘Summary’ section and there is now a ‘Most popular genres’ section, which is split by gender as with the other lists. I also added in pie charts showing book genres represented at the library and the percentage of borrowings of each genre. Unfortunately these can get a bit cluttered due to there being up to 20-odd genres present, so I’ve added in a legend showing which colour is which genre. You can hover over a slice to view the genre title and name and you can click on a slice to perform a search for borrowing records featuring a book of the genre in the library. Despite being a bit cluttered I think the pies can be useful, especially when comparing the two charts – for example at Haddington ‘Theology’ books make up more than 36% of the library but only 8% of the borrowings.
Due to the somewhat cluttered nature of the pie charts I also experimented with a treemap view of Genre. I had stated we would include such a view in the requirements document, but at that time I had thought genre would be hierarchical, and a treemap would display the top-level genres and the division of lower level genres within these. Whilst developing the genre features I realised that without this hierarchy the treemap would merely replicate the pie chart and wouldn’t be worth including.
However, when the pie charts turned out to be so cluttered I decided to experiment with treemaps as an alternative. The results currently appear after the pie charts in the page. I initially liked how they looked – the big blocks look vaguely ‘bookish’ and having the labels in the blocks makes it easier to see what’s what. However, there are downsides. Firstly, it can be rather difficult to tell which genre is the biggest, due to the blocks having different dimensions – does a tall, thin block have a larger area than a shorter, fatter block, for example? It’s also much more difficult to compare two treemaps as the position of the genres changes depending on their relative size. Thankfully the colour stays the same, but it takes longer than it should to ascertain where a genre has moved to in the other treemap and how its size compares. I met with the team on Friday to discuss the new additions and we agreed that we could keep the treemaps, but that I’d add them to a separate tab, with only the pie charts visible by default.
I then added in the ‘borrowings over time by genre’ visualisation to the in-library and top-level ‘facts’ pages. As you can see from the above screenshot, these divide the borrowings in a stacked bar chart per year (or per month if a year is clicked on) into genre, much in the same way as the preceding ‘occupations’ chart. Note however that the total numbers for each year are not the same as for the occupations through time visualisation, as books may have multiple genres and borrowers may have multiple occupations, and the counts reflect the number of times a genre / occupation is associated with a borrowing record each year (or month if you drill down into a year). We might need to explain this somewhere.
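The counting behaviour described above can be sketched like this (the data and function name are hypothetical, for illustration): each borrowing contributes one count per associated genre, so yearly totals can exceed the number of borrowing records.

```python
from collections import Counter

# Hypothetical borrowing records: a year plus the genres of the borrowed book.
borrowings = [
    {"year": 1750, "genres": ["Fiction"]},
    {"year": 1750, "genres": ["Theology", "History"]},  # two genres, two counts
    {"year": 1751, "genres": ["Fiction"]},
]

def genre_counts_by_year(records):
    """Count genre associations per year: a borrowing of a two-genre book
    adds one to each of its genres, so totals can exceed record counts."""
    counts = Counter()
    for rec in records:
        for genre in rec["genres"]:
            counts[(rec["year"], genre)] += 1
    return counts
```

Here 1750 has two borrowing records but three genre counts, which is exactly why the genre and occupation charts show different yearly totals.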
We met on Friday to discuss the outstanding tasks. We’ll probably go live with the resource in January, but I will try to get as many of my outstanding tasks completed before Christmas as possible.
Also this week I fixed another couple of minor issues with the Dictionaries of the Scots Language. The WordPress part of the site had defaulted to using the new, horrible blocks interface for widgets after a recent update, meaning the widgets I’d created for the site no longer worked. Thankfully installing the ‘Classic Widgets’ plugin fixed the issue. I also needed to tweak the CSS for one of the pages where the layout was slightly wonky.
I also made a minor update to the Speech Star site and made a few more changes to the new Robert Fergusson site, which has now gone live (see https://robert-fergusson.glasgow.ac.uk/). I also had a chat with our IT people about a further server switch that is going to take place next week and responded to some feedback about the new interactive map of Iona placenames I’m developing.
Also this week I updated the links to one of the cognate reference websites (FEW) from entries in the Anglo-Norman Dictionary, as the website had changed its URL and site structure. After some initial investigation it appeared that the new FEW website made it impossible to link to a specific page, which is not great for an academic resource that people will want to bookmark and cite. Ideally the owners of the site should have placed redirects from the pages of the old resource to the corresponding page on the new resource (as I did for the AND).
The old links to the FEW as found in the AND (e.g. the FEW link that before the update was on this page: https://anglo-norman.net/entry/poer_1) were formatted like so: https://apps.atilf.fr/lecteurFEW/lire/volume/90/page/231 which now gives a ‘not found’ error. The above URL has the volume number (9, which for reasons unknown to me was specified as ‘90’) and the page number (231). The new resource is found here: https://lecteur-few.atilf.fr/ and it lets you select a volume (e.g. 9: Placabilis-Pyxis) and enter a page (e.g. 231), which then updates the data on the page (e.g. showing ‘posse’ as the original link from AND ‘poer 1’ used to do). But crucially, their system does not update the URL in the address bar, meaning no-one can cite or bookmark their updated view and it looked like we couldn’t link to a specific view.
Thankfully Geert noticed that another cognate reference site (the DMF) had updated their links to use new URLs that are not documented on the FEW site, but do appear to work (e.g. https://lecteur-few.atilf.fr/lire/90/231). This was quite a relief to discover as otherwise we would not have been able to link to specific FEW pages. Once I knew this URL structure was available updating the URLs across the site was a quick update.
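Because both the old and new URL schemes encode the same volume and page numbers, rewriting the links across the site was a one-line substitution. A sketch of the mapping, based on the example URLs above (the function name is my own):

```python
import re

# Old FEW reader URL scheme, capturing the volume and page numbers.
OLD_FEW = re.compile(
    r"https://apps\.atilf\.fr/lecteurFEW/lire/volume/(\d+)/page/(\d+)"
)

def rewrite_few_url(url: str) -> str:
    """Map an old FEW reader URL onto the undocumented new scheme,
    preserving the volume and page numbers."""
    return OLD_FEW.sub(r"https://lecteur-few.atilf.fr/lire/\1/\2", url)
```

Run over the stored entry data, this converts every dead link to its working equivalent in one pass.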
Finally this week, I had a meeting with Clara Cohen and Maria Dokovova to discuss a possible new project that they are putting together. This will involve developing a language game aimed at primary school kids and we discussed some possible options for this during our meeting. Afterwards I wrote up my notes and gave the matter some further thought.
I spent most of this week working towards adding genre to the Books and Borrowing front-end, working on a version running on my laptop. My initial task was to update the Solr index to add in additional fields for genre. With the new fields added I then had to update my script that generates the data for Solr to incorporate the fields. The Solr index is of borrowing records so as with authors, I needed to extract all genre associations at all book levels (work, edition, holding, item) for each book that was associated with a borrowing record, ensuring lower level associations replaced any higher level associations and removing any duplicates. This is all academic for now as all genre associations are at Work level, but this may not always be the case. It took a few attempts to get the data just right (e.g. after one export I realised it would be good to have genre IDs in the index as well as their names) and each run-through took about an hour or so to process, but all is looking good now. I’ll need to ask Stirling IT to create a new Solr core and ingest the new data on the server at Stirling as this is not something I have the access to do myself, and I’ll do this next week. The screenshot below shows one of the records in Solr with the new genre fields present.
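The cascade of genre associations across book levels can be sketched as follows (level names are from the project's data model; the function is a hypothetical illustration, not the actual export script): the most specific level that has any genres wins, and duplicates are removed.

```python
# The four book levels, from most general to most specific.
LEVELS = ["work", "edition", "holding", "item"]

def effective_genres(associations: dict) -> list:
    """associations maps a level name to its list of genres (possibly empty).
    A lower (more specific) level's genres replace any higher level's."""
    result = []
    for level in LEVELS:
        genres = associations.get(level, [])
        if genres:
            result = genres            # lower level replaces higher level
    # Remove duplicates while preserving order.
    seen, out = set(), []
    for g in result:
        if g not in seen:
            seen.add(g)
            out.append(g)
    return out
```

As noted above, this is currently academic since all genres sit at Work level, but it means holding- or item-level overrides will behave correctly if they are ever added.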
With Solr updated I then began updating the front-end, in a version of the site running on my laptop. This required making significant updates to the API that generates all of the data for the front-end by connecting to both Solr and the database as well as updating the actual output to ensure genre is displayed. I updated the Advanced Search forms (simple and advanced) to add in a list of genres from which you can select any you’re interested in (see the following two screenshots) and updated the search facilities to enable the selected genres to be searched, either on their own or in combination with the other search options.
On the search results page any genres associated with a matching record are displayed, with associations at higher book levels cascading down to lower book levels (unless the lower book level has its own genre records). Genres appear in the records as clickable items, allowing you to perform a search for a genre you’re interested in by clicking on it. I’ve also added in genre as a filter option down the left of the results page. Any genres present in the results are listed, together with a count of the number of associated records, and you can filter the results by pressing on a genre, as you can see in the following screenshot, which shows the results of a quick search for ‘Egypt’, displaying the genre filter options and showing the appearance of genre in the records.
Genre is displayed in a similar way wherever book records appear elsewhere in the site, for example the lists of books for a library, the top-level ‘book editions’ page and when viewing a specific page in a library register.
There is still more to be done with genre, which I’ll continue with next week. This includes adding in new visualisations for genre, adding in new ‘facts and figures’ relating to genre and adding in facilities to limit the ‘browse books’ pages to specific genres. I’ll keep you posted next week.
I also spent some time going through the API and front-end fixing any notifications and warnings given by the PHP scripting language. These are not errors as such, just messages that PHP logs when it thinks there might be an issue, for example if a variable is referenced without it being explicitly instantiated first. These messages get added to a log file and are never publicly displayed (unless the server is set to display them) but it’s better to address them to avoid cluttering up the log files so I’ve (hopefully) sorted them all now. Also for the project this week I generated a list of all book editions that currently have no associated book work. There are currently 2474 of these and they will need to be investigated by the team.
I also met with Luca Guariento and Stevie Barret to have a catch-up and also to compile a list of key responsibilities for a server administrator who would manage the Arts servers. We discovered this week that Arts IT Support is no longer continuing, with all support being moved to central IT Services. We still have our own servers and require someone to manage them so hopefully our list will be taken into consideration and we will be kept informed of any future developments.
Also this week I created a new blog for a project Gavin Miller is setting up, fixed an issue that took down every dictionary entry in the Anglo-Norman Dictionary (caused by one of the project staff adding an invalid ID to the system) and completed the migration of the old Arts server to our third-party supplier.
I also investigated an issue with the Place-names of Mull and Ulva CMS that was causing source details to be wiped. The script that populates the source fields when an existing source is selected from the autocomplete list was failing to load in data. This meant that all other fields for the source were left blank, so when the ‘Add’ button was pressed the script assumed the user wanted all of the other fields to be blank and therefore wiped them. This situation was only happening very infrequently and what I reckon happened is that the data for the source that failed included a character that is not permitted in JSON data (maybe a double quote or a tab), meaning when the script tried to grab the data it failed to parse it and silently failed to populate the required fields. I therefore updated the script that returns the source fields so that double quotes and tab characters are stripped out of the fields before the data is returned. I also created a script based on this that outputs all sources as JSON data to check for errors and thankfully the output is valid JSON.
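A sketch of the server-side fix (in Python rather than the actual PHP, and with a hypothetical function name): the offending characters are stripped before the field values are placed into the JSON response.

```python
def sanitise_field(value: str) -> str:
    """Strip characters that were breaking the hand-built JSON output:
    double quotes are removed and tabs become spaces, mirroring the fix
    described above."""
    return value.replace('"', '').replace('\t', ' ')
```

An alternative design would be to build the response with a proper JSON encoder (PHP's json_encode, or Python's json.dumps), which escapes these characters instead of requiring them to be stripped, but stripping was the minimal change for the existing script.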
I also made a couple of minor tweaks to the Dictionaries of the Scots Language site, fixing an issue with the display of the advanced search results that had been introduced when I updated the code prior to the site’s recent migration to a new server and updating the wording of the ‘About this entry’ box. I also had an email conversation with Craig Lamont about a potential new project and spoke to Clara Cohen about a project she’s putting together.
I continued with the new ‘map-first’ front-end for the Iona project this week, adding in an option to reset the map to the default view on the ‘Home’ section and adding the scale of the map to the bottom left. I then spent quite a bit of time working on a first version of the ‘Browse’ option. Pressing on the ‘Browse’ accordion header now displays a drop-down list through which you can select what type of browse you want to do. This defaults to ‘Current place-name’ and displays the letters that place-names begin with, together with a count of the number of places beginning with each letter. The browse accordion replicates all that is found on the browse pages of the place-names resources I’ve previously developed (e.g. https://berwickshire-placenames.glasgow.ac.uk/place-names/?p=browse), but is much more efficient with the use of space. Pressing on a letter updates the map markers to only show those of places beginning with the letter, as the following screenshot demonstrates (with a pop-up open):
I haven’t implemented the automatic zooming to show all markers yet, and selected display options such as categorisation are not yet remembered, but I will be adding these features in. The other browse options are also all fully functioning, although ‘Parish’ might not be much use as currently every place-name is found in the same parish. ‘Source’ has a lot of data, but the area scrolls so I think it works ok. The screenshot below shows a browse for sources with ‘Admiralty Chart no. 2617’ selected:
I also implemented the quick search this week, which as with the other place-name resources searches current names, historical forms and elements. The screenshot below shows a search for ‘cnoc’, zoomed in on the north of the island with labels turned on:
Note that, in addition to pressing the ‘reset map’ button on the ‘Home’ section, if you’ve performed a quick search and want to reset things you can just delete the text from the search box and press the search button. I hope to be able to continue with this next week. The big things remaining are the advanced search, the full place-name record popup and also what to do about the elements glossary. There are lots of smaller things still to do too, such as allowing specific views to be bookmarked, shared and cited.
I spent most of the remainder of the week preparing the sites on our internally hosted server for an upgrade to PHP 8. This involved getting PHP 8 working on my local PC and then setting up each site locally with errors turned on in order to test each script. I’ve also been taking the opportunity to deal with any warnings and notices, plus upgrading jQuery where applicable. The only fatal issue I’ve encountered so far has been the use of the ‘usort’ function with a Boolean return value, which is deprecated in PHP 8 (see https://stackoverflow.com/questions/65382799/what-happened-in-php8-0-0-to-break-usort-intstrlenastrlenb). This has affected a few sites. Other than that I’ve not found anything that breaks in PHP 8. I haven’t finished checking all sites yet, but will hopefully finish this tedious but necessary task next week.
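The usort issue comes down to the comparator contract: it must return a negative, zero or positive integer, whereas the common shorthand of returning the result of a comparison (e.g. strlen($a) > strlen($b)) yields a Boolean and loses the ‘equal’ and ‘less than’ cases. A Python analogue of the broken and fixed patterns (Python's sorted with cmp_to_key standing in for PHP's usort):

```python
from functools import cmp_to_key

def by_length_bad(a, b):
    """The deprecated PHP pattern: returns True/False, which cannot
    distinguish 'less than' from 'equal'."""
    return len(a) > len(b)

def by_length_good(a, b):
    """The correct contract: negative, zero or positive integer."""
    return len(a) - len(b)

words = ["three", "by", "word"]
ordered = sorted(words, key=cmp_to_key(by_length_good))
```

In PHP the fix is usually to switch the comparator to the spaceship operator (`$a <=> $b`), which returns -1, 0 or 1 as required.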
Also this week I helped Petra Poncarova with an issue with her project website. She was wanting to add accordions to pages and I found and installed a plugin that works with the Classic WordPress editor (https://en-gb.wordpress.org/plugins/easy-accordion-free/). It’s very easy to use and I added an accordion section to one of the pages on her site to show how it works: https://erskine.glasgow.ac.uk/people/donald-sinclair/.
I also gave some advice to Craig Lamont as he works on the banner for Rhona Brown’s new website, spoke to Thomas Clancy about another new place-names project and arranged to have a new WordPress site set up for a project Gavin Miller is setting up.
After a delightful holiday last week I was back at work again this week. This involved spending quite a bit of time catching up with emails and dealing with the ongoing issue of migrating sites from old servers to either our new external supplier or a newer server hosted internally. I was involved with the migration of the SCOTS Corpus to a new server, with my work including fixing a few PHP errors that were cropping up on the more up-to-date server. There were also some issues relating to database connections as the original code (which I didn’t write) uses rather a lot of connections – more than the new server was set to allow. We had thought we’d fixed the issue but it looks like further investigation will be required.
We also migrated the thesaurus.ac.uk site and the Bilingual Thesaurus of Everyday Life in Medieval England (https://thesaurus.ac.uk/bth/) to a new server, which also required tweaking some of the code. The new server was caching scripts that generated different output each time they were run (e.g. to generate the random category on the homepage), meaning the category wasn’t random but was constantly stuck on ‘Lard a roast’, which wasn’t very helpful. Thankfully we managed to unstick the cache.
Also this week I investigated an issue with the advanced search of the Dictionaries of the Scots Language as the full-text search had stopped working. It turned out that the Solr index that powers this search had entirely disappeared from the server, which is more than a little concerning. It wasn’t a huge issue to rectify as I had the configuration scripts and the data on my PC, but we’re in the dark as to how the index could have been removed. It had also been brought to my attention that some of the video files I’d uploaded for the Speech Star project before I went on holiday had also disappeared and I’ve reuploaded them too. Our IT people are investigating what might have caused these issues and if they are linked, but it is concerning.
I also spent a bit of time looking through the old arts.gla.ac.uk server to try and figure out what needed to be retained from it. It’s mostly old subject area sites that were long ago superseded by T4, plus old conference sites that are no longer needed. A few of the other sites I’ve already previously moved to T4 myself (e.g. https://www.gla.ac.uk/schools/critical/aboutus/resources/stella/projects/starn/ and https://www.gla.ac.uk/schools/critical/aboutus/resources/stella/projects/bibliography-of-scottish-literature/). The only sites that I think need to be retained are the STELLA apps that I developed from old teaching resources in around 2015. I therefore requested a new subdomain be set up to host them and migrated them over. I’ve also requested we set up external hosting for arts.gla.ac.uk, purely to host redirects from old URLs so we don’t end up with broken links. The new sites are now available (see https://stella.glasgow.ac.uk/aries/, https://stella.glasgow.ac.uk/grammar/, https://stella.glasgow.ac.uk/eoe/, https://stella.glasgow.ac.uk/metre/ and https://stella.glasgow.ac.uk/readings/) but the redirects from the old URLs are not yet in place. I’d really like to spend some time redeveloping all of these old apps (apart from ARIES, which has already been redeveloped). Maybe next year I’ll find some time.
I also set up a new project website for Rhona Brown in Scottish Literature. I’ve created a bare-bones website at the moment and I’m awaiting further instruction from her on things like themes, colour schemes, site structure and logos. I also tweaked the project website I’d set up a couple of weeks ago for Petra Poncarova in Scottish Literature to improve the URLs for the Gaelic version of the homepage and helped a project team member get access to a site I’d set up for Matthew Creasy in English Literature.
On Wednesday morning this week I participated in a networking event for the new Research Professional Staff Network. The event went well and it was very interesting to find out more about other people involved in research support across the University.
For the remainder of the week I began work on the development of the new ‘map first’ interface for the place-names projects, which I’m developing initially for the Iona project. Below is a screenshot of how things look so far:
At the moment the interface consists of a narrow bar at the top of the browser window with the site’s icon, title and subtitle using the blue colour of the site’s banner as a background. You can press on the logo or site title to navigate to the main site. The rest of the browser is taken up with the map. On the left is the side menu. As discussed in the requirements document I previously wrote, it consists of four collapsible sections, with ‘Home’ open by default. I haven’t had the time to implement the search and browse options yet, but the ‘Display options’ section is operational, as you can see above. Pressing on the section’s title will open the section and you can access the various options. You can show or hide the side menu by pressing on the button above it.
For the moment the map displays all data that has been marked as ‘on web’ in the CMS (362 records, I think). By default these are colour-coded by classification code. The legend is displayed in the top right, allowing you to turn specific features on or off. You can also show or hide the legend to free up space. In the bottom right are zoom options plus a ‘full screen’ button that does what you’d expect. You can press on a map marker to open up the pop-up. As yet there is no link through to the full record, and some Gaelic fields may be visible; these will be removed at some point.
Using the ‘Display options’ in the side menu you can change how the map markers are classified. We may need to be a little more fine-grained with start date and especially altitude. Also colours for classification codes are currently arbitrarily assigned but we might want to change this – having blue for ‘field’ seems a bit daft, for example. You can also change the base map and these options are currently the same as for the other place-name sites. We still need to figure out if / how we can integrate another map of Iona that we discussed at a meeting before I went on holiday. There is also an option to turn labels on or off.
That’s as far as I’ve got this week. There’s still a lot to do but I’ve made pretty good progress. I’ll hopefully find some time to continue with this next week. I also discovered that the Leaflet mapping library has a method to set the map view so as to show all markers at the closest zoom possible so I’ll ensure I use this when I develop the search and the browse. I’m currently already using it when the map is first opened to ensure that all of Iona, Soa in the south-west and Eilean Annraidh in the north-east are always visible, no matter what dimensions your screen / browser window are.
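For anyone curious, the bounding box that Leaflet needs for this is easy to compute by hand. Below is a minimal sketch, assuming a `points` array of `{lat, lng}` objects (a made-up shape for illustration); in Leaflet itself the equivalent is `map.fitBounds(L.featureGroup(markers).getBounds())`.

```javascript
// Compute the [[south, west], [north, east]] bounding box of a set of points.
// Leaflet's map.fitBounds() accepts this array form directly.
function boundsOf(points) {
  const lats = points.map(p => p.lat);
  const lngs = points.map(p => p.lng);
  return [
    [Math.min(...lats), Math.min(...lngs)], // south-west corner
    [Math.max(...lats), Math.max(...lngs)], // north-east corner
  ];
}

// With Leaflet loaded this would be used as:
//   map.fitBounds(boundsOf(markerPoints));
```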
This was a week of many different projects. On Monday I completed work on a new project website for Petra Poncarova in Scottish Literature, and it is now publicly accessible (see https://erskine.glasgow.ac.uk/). I also added a blog page to Ophira Gamliel’s project website, created a page for their first blog post (now available here: https://himuje-malabar.glasgow.ac.uk/reconnecting-the-split-moon/) and updated the site to include a link to the blog in the site menu. This required shifting a few things around to make room for the new menu item. I also investigated an issue Luca was having in migrating one of Graeme Cannon’s old websites which was similarly structured to the House of Fraser Archive site and managed to find the section of code that was causing the problem (a flag in a regular expression that has since been deprecated).
On Tuesday I completed my work on the CSV endpoints for the Books and Borrowing project, ensuring all nested arrays are ‘flattened’ when producing the two-dimensional CSV file. This has been a lengthy and tedious task, but it’s good that it’s done, and it should mean that future researchers will be able to extract and reuse the data in a relatively straightforward manner.
On Wednesday I met Luca and Stevie, two of my fellow College of Arts developers, to have a catch-up, which was hugely useful as always. We’ll hopefully meet up again in the next couple of months. I also responded to a request from Luca to help get some screenshots ready for print publication. Screenshots are generally 72 DPI but this is too low for print. I’ve previously got around this in Photoshop by loading the image and going to Image -> Image Size. In the options you can then untick ‘Resample Image’ and update the ‘Resolution’ to whatever you want. I’ve never actually printed the resulting images to check the difference, but I’ve never had anyone come back and ask for better versions. I guess another option would be to take the screenshots on something like an iPad that natively runs at a higher DPI.
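The reason this trick works is that unticking ‘Resample Image’ keeps the pixel data untouched and only changes the declared physical size, which is simply pixels divided by DPI. A quick sketch of the arithmetic:

```javascript
// With resampling off, pixel dimensions are fixed; raising the DPI just
// shrinks the physical print size: inches = pixels / DPI.
function printSizeInches(widthPx, heightPx, dpi) {
  return { width: widthPx / dpi, height: heightPx / dpi };
}
```

So a 1440×900 screenshot tagged at 72 DPI would print at 20×12.5 inches; retagged at 300 DPI the very same pixels print at a crisp 4.8×3 inches.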
Also on Wednesday I spent some time on the DSL, investigating an issue with Google Analytics for Pauline Graham and then investigating a problem with phrase searching and highlighting that Pauline had also noticed on both the live and test sites. When a phrase was searched for, each individual word in the phrase was being highlighted in the entry, and then if you returned to the search results and went back to an entry from there, no highlighting worked at all. Also, some search results were not featuring snippets. This turned out to be three separate issues that needed to be investigated and fixed:
- Separate word highlighting: The default setting in the highlighting library I installed a few months ago highlighted each word in a string. If there were multiple words separated by spaces then all matching words would be highlighted. Thankfully the library (https://markjs.io/) has a setting that only matches the entire string and I’ve activated this now. Now if you perform a search for ‘off or on’ or something and navigate to a result only the exact term will be highlighted.
- Losing the highlighting when navigating back to the results and then to an entry: This was a problem with spaces getting encoded between pages. They were becoming the URL-encoded equivalent ‘%20’ or ‘+’, and after that the string no longer matched. I’ve sorted this.
- Lack of snippets: The issue was down to the length of the entry. In Solr, snippet generation is a separate process from the search matching. While the search checks the entire entry, snippet generation by default only looks at the first 51,200 characters. ‘Mak’, for example, is a very long entry, and if the search term only matches text far down the entry no snippet gets created. After discovering this I updated the setting (hl.maxAnalyzedChars) so that 100,000 characters are analysed instead, and this has fixed the issue. More information about this can be found at https://stackoverflow.com/questions/52511154/solr-empty-highlight-entry-on-match.
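For the record, here are sketches of all three fixes. The mark.js option and the Solr `hl.maxAnalyzedChars` parameter (default 51200) are real; the normalisation helper and the Solr field names are illustrative assumptions, not the DSL’s actual code.

```javascript
// 1) Phrase highlighting: mark.js highlights each word separately by default;
//    its 'separateWordSearch' option switches to whole-phrase matching:
//      new Mark(entryElement).mark(phrase, { separateWordSearch: false });

// 2) Spaces coming back from the URL as '+' or '%20' meant the stored phrase
//    no longer matched; normalising before highlighting fixes that.
function normaliseTerm(raw) {
  return decodeURIComponent(raw.replace(/\+/g, " "));
}

// 3) Snippets: raise the highlighter's analysis window beyond the 51200-char
//    default so matches deep in long entries still produce snippets.
const solrParams = new URLSearchParams({
  q: "content:mak",            // field name is an assumption
  hl: "true",
  "hl.fl": "content",
  "hl.maxAnalyzedChars": "100000",
});
```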
This investigation took some of Thursday as well, after which I moved back to the Books and Borrowing project, for which I spent some time generating data relating to the Royal High School for checking purposes. I also received some bid documentation for a proposal Gavin Miller is putting together. Gavin wanted me to read through the documentation and add in some further sections relating to the data. The data will consist of a directory of projects and resources which will be available to search and browse, plus will be visualised on an interactive map. I added in some information and hopefully the proposal is a success.
On Friday I made some further updates to the Speech Star websites, adding in some new videos to the Edinburgh MRI Modelled Speech Corpus (https://www.seeingspeech.ac.uk/speechstar/edinburgh-mri-modelled-speech-corpus/) and arranging their layout a bit better. I also replied to a request from Rhona Brown, who would like a website to be set up for a new project she’s starting work on soon. I listed a few options we could pursue and I need to wait to hear more from her now.
I also spent quite a bit of time investigating some minor issues Ann Ferguson had spotted with the predictive search on the DSL website, most of which will thankfully be sorted when the new Solr based headword search goes live.
Finally, I had a meeting with the Place-names of Iona project to discuss the development of a new ‘map first’ interface for the data. I met with Thomas, Sofia and Alasdair and it was really great to actually have an in-person meeting with them, having never done so before. We discussed many aspects of the interface and had some really useful discussions. I’ll be starting on the development of the front-end in the coming weeks.
I had my PDR session on Monday this week, which was all very positive. There was also one further UCU strike day on Wednesday this week, cutting my working days down to four. The project I devoted the most of the available time to was Books and Borrowing. Last week I had begun reworking the API to make it more usable and this week I completed this task, adding in a few endpoints that I’d created but hadn’t added to the documentation. I then moved onto the task of adding ‘Download data’ links to the front-end. These links now appear as buttons beside the ‘Cite’ button on any page that displays data, as you can see in the following screenshot:
Pressing on the button loads the API endpoint used to return the data found on the page, with ‘CSV’ rather than ‘JSON’ selected as the file type. This prompts the browser to download the file rather than loading the data into the browser tab. It took a bit of time to add these links to every required page on the site, but I think I’ve got them all. However, the CSV downloads still needed quite a lot of work. When formatted as JSON, data held in nested arrays is properly structured and usable, but a CSV is a flat file consisting of columns and rows, and the data has a more complicated structure than this. For example, if we have one row in the CSV file for each borrowing record on a register page, the record may have multiple associated borrowers, each with any number of occupations consisting of multiple fields. The record’s book holding may have any number of book items and may be associated with multiple book editions, and there may be multiple authors associated with any level of book record (item, holding, edition and work). Representing this structure in a simple two-dimensional spreadsheet is very tricky and requires the data to be ‘flattened’: a script needs to work out the maximum number of each variable item any record in the returned data has, create the required columns (with heading labels), and pad out any records that have fewer items with empty columns so that the columns of all records line up.
So, for example, when looking at borrowers: If borrowing row number 16 out of 20 has a borrower with five occupations then column headings need to be added for five sets of occupation columns and the data for the remaining 19 rows needs to be padded out with empty data to ensure any columns that appear after occupations continue to line up. As a borrowing may involve multiple borrowers this then becomes even more complicated.
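A minimal sketch of this padding logic for a single nested field; the field and column names are illustrative, not the project’s real schema:

```javascript
// Flatten one nested-array field across all records: find the maximum number
// of items any record has, emit that many numbered column groups, and pad
// shorter records with empty strings so every row has the same width.
function flattenField(records, field, subfields) {
  const max = Math.max(0, ...records.map(r => (r[field] || []).length));
  const headers = [];
  for (let i = 1; i <= max; i++) {
    for (const s of subfields) headers.push(`${field}_${i}_${s}`);
  }
  const rows = records.map(r => {
    const row = [];
    for (let i = 0; i < max; i++) {
      const item = (r[field] || [])[i] || {};
      for (const s of subfields) row.push(item[s] || ""); // pad missing items
    }
    return row;
  });
  return { headers, rows };
}
```

Repeating this for borrowers, occupations, book items, editions and authors is what makes the real files balloon to hundreds of columns.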
I managed to update the API to ensure nested arrays were flattened for several of the most complicated endpoints, such as a page of records and the search results. The resulting CSV files can become quite monstrously large, with over 200 columns of data a regular occurrence. However, with the data properly structured and labelled it should hopefully make it easier for users who are interested in the data to download the CSV and then delete the columns they are not interested in, resulting in a more manageable file. I still need to complete the ‘flattening’ of CSV data for a few other endpoints, which I hope to tackle next week.
Also this week I had an email discussion with Petra Poncarova, a researcher in Scottish Literature who is beginning a research project and requires a project website. I’ve arranged for hosting to be set up for this and by the end of the week we had the desired subdomain and WordPress installation. I spent a bit of time on Friday afternoon getting the structure and plugins in place and next week I’ll work on the interface for the website.
I also made a couple of further updates to the House of Fraser Archive this week. I’d completed most of the work last week but hadn’t managed to get the search facility working. After some suggestions from Luca I managed to figure out what the problem was (it turned out to be the date search part of the query that was broken) and the search is now operational. We even managed to get results highlighting in the records working again, which is something I wasn’t sure we’d be able to do.
The rest of my time was spent making updates to the Speech Star websites (and Seeing Speech). Eleanor had noticed some errors in the metadata for a couple of the videos in the IPA charts so I fixed these. There were also some better quality videos to add to the ExtIPA charts and some further updates to the metadata here too. Also for this project Jane Stuart-Smith contacted me to say that I had been erroneously categorised as ‘Directly Incurred’ rather than ‘Directly Allocated’ when the grant application had been processed, which is now causing some bother. I may have to create timesheets for my work on the project, but we’ll see what transpires.
I spent a fair amount of time this week preparing for my PDR session – bringing together information about what I’ve done over the past year and filling out the necessary form. I also had a meeting with Jennifer Smith to discuss an ESRC proposal she’s putting together using some of the data from the SCOSYA project and then spent some further time after the meeting researching some tools the project might use and reading the Case for Support.
I also spent a bit of time working for the Anglo-Norman Dictionary, updating the XSLT file to better handle varlists in citations. So for example instead of:
( MS: s.xiiiex ) Satureia: (A6) gallice savoroye (var. saveray (A9) MS: c.1300 ; saveroy (A12) MS: s.xiii4/4 ; savoreie (B3) MS: s.xiv4/4 ; savoré (C35) MS: s.xv ) Plant Names 230
the varlist is now displayed as:
( MS: s.xiiiex ) Satureia: (A6) gallice savoroye (var. (A9: c.1300) saveray; (A12: s.xiii4/4) saveroy; (B3: s.xiv4/4) savoreie; (C35: s.xv) savoré) Plant Names 230
I completed an initial version of the update using test files and after discussions with the editor Geert and a few minor tweaks the update went live on Wednesday.
I also spent a bit of time working to fix the House of Fraser Archive website, which I created with Graeme Cannon many moons ago. It uses an eXist XML database but needed to be migrated to a new server with more modern versions due to security issues. I spent some time figuring out how to connect to the new eXist database and had just managed to find a solution when the server went down and I was unable to access it. It was still offline at the end of the week, which is a bit frustrating.
I also made a couple of minor tweaks to a conference website for Matthew Creasy and gave some advice to Ewan Hannaford about adding people to a mailing list. My updates to the DSL also went live this week on the DSL’s test server, and I emailed the team a detailed report of the changes, highlighting points for discussion. I’m sure I’ll need to make a number of changes to the features I’ve developed over the past few weeks once the team have had a chance to test things out. We’ll see what they say once they get back to me.
I was also contacted this week by Eleanor Lawson with a long list of changes she wanted me to make to the two Speech Star websites. Many of these were minor tweaks to text, but there were some larger issues too. I needed to update the way sound filters appear on the website in order to group different sounds together and to ensure the sounds always appear in the correct order. This was a pretty tricky thing to accomplish as the filters are automatically generated and vary depending on what other filter options the user has selected. It took a while to get working, but I got there in the end, thankfully. Eleanor had also sent me a new set of videos that needed to be added to the Edinburgh MRI Modelled Speech Corpus. These were chunks of some of the existing videos as a decision had been made that splitting them up would be more useful for users. I therefore had to process the videos and add all of the required data for them to the database. All is looking good now, though.
Next week I’ll be participating in the UCU strike action on Monday and Tuesday so it’s going to be a short week for me.
I’d originally intended each year in the range to then appear as a bar in the sparkline, with no gaps between the bars in order to make larger blocks, but the bar chart sparkline that the library offers has a minimum bar width of 1 pixel. As the DOST period is 650 years this meant the sparkline would be 650 pixels wide. The screenshot below shows how this would have looked (note that in this and the following two screenshots the data represented in the sparklines is test data and doesn’t correspond to the individual entries):
I then tried grouping the individual years into bars representing five years instead. If a 1 was present in a five-year period then the value for that five year block was given a 1, otherwise it was given a 0. As you can see in the following screenshot, this worked pretty well, giving the same overall view of the data but in a smaller space. However, the sparklines were still a bit too long. I also added in the first attested date for the entry to the left of the sparkline here, as was specified in the requirements document:
As a further experiment I grouped the individual years into bars representing a decade, and again if a year in that decade featured a 1 the decade was assigned a 1, otherwise it was assigned a 0. This resulted in a sparkline that I reckon is about the right size, as you can see in the screenshot below:
With this in place I then updated the Solr indexes for entries and quotations to add in fields for the sparkline data and the sparkline tooltip text. I then updated my scripts that generated entry and quotation data for Solr to incorporate the code for generating the sparklines, first creating blocks of attestation where individual citation dates were separated by 25 years or less and then further grouping the data into decades. It took some time to get this working just right. For example, on my first attempt when encountering individual years the textual version was outputting a range with the start and end year the same (e.g. 1710-1710) when it should have just outputted a single year. But after a few iterations the data outputted successfully and I imported the new data into Solr.
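The two generation steps described above (merging citation years into blocks where gaps are 25 years or less, then collapsing per-year flags into decade bars) can be sketched roughly as follows; all names are illustrative, not the actual generation script:

```javascript
// Merge sorted citation years into attestation blocks where consecutive dates
// are 25 years or less apart; a block covering a single year is output as
// '1710' rather than '1710-1710'.
function attestationBlocks(years) {
  const sorted = [...new Set(years)].sort((a, b) => a - b);
  const blocks = [];
  for (const y of sorted) {
    const last = blocks[blocks.length - 1];
    if (last && y - last.end <= 25) last.end = y;
    else blocks.push({ start: y, end: y });
  }
  return blocks.map(b => (b.start === b.end ? `${b.start}` : `${b.start}-${b.end}`));
}

// Collapse per-year attestation flags (1 = attested in that year) into one
// bar per decade; a decade is 1 if any of its years is attested.
function groupByDecade(yearFlags) {
  const decades = [];
  for (let i = 0; i < yearFlags.length; i += 10) {
    decades.push(yearFlags.slice(i, i + 10).some(v => v === 1) ? 1 : 0);
  }
  return decades;
}
```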
With the sparkline data in Solr I then needed to update the API to retrieve the data alongside other data types, and after that I could work with the data in the front-end, populating the sparklines for each result with the data for each entry and adding in the textual representation as a tooltip. Having previously worked with a DOST entry as a sample, I realised at this point that as the SND period is much shorter (300 years as opposed to 650) the SND sparklines would be a lot shorter (30 pixels as opposed to 65). Thankfully the sparkline library allows you to specify the width of the bars as each sparkline is generated, and I set the width of SND bars to two pixels as opposed to the one pixel for DOST, making the SND sparklines a comparable 60 pixels wide. It does mean that the visualisation of the SND data is not exactly the same as for DOST (e.g. an individual year is represented as 2 pixels as opposed to 1) but I think the overall picture given is comparable and I don’t think this is a problem – we are just giving an overall impression of periods of attestation after all. The screenshot below shows the search results with the sparklines working with actual data, and also demonstrates a tooltip that displays the actual periods of attestation:
At this point I spotted another couple of quirks that needed to be dealt with. Firstly, we have some entries that don’t feature any citations that include dates. These understandably displayed a blank sparkline. In such cases I have updated the tooltip text to display ‘No dates of attestation currently available’. Secondly, there is a bug in the sparkline library that means an empty sparkline is displayed if all data values are identical. Having spotted this I updated my code to ensure a full block of colour was displayed in the sparkline instead of white.
With the sparklines in the search results now working I then moved onto the display of sparklines in the entry page. I wasn’t entirely sure where the best place to put the sparkline was, so for now I’ve added it to the ‘About this entry’ section, along with the dates of attestation. This is a simplified version showing the start and end dates. I’ve used ‘to’ to separate the start and end date rather than a dash because both the start and end dates can in themselves be ranges. This is because here I’m using the display version of the first date of the earliest citation and the last date of the latest citation (or the first date if there is no last date). Note that this includes prefixes and representations such as ’15..’. The sparkline tooltip uses the raw years only. You can see an entry with the new dates and sparkline below:
The design of the sparklines isn’t finalised yet and we may choose to display them differently. For example, we don’t need to use the purple I’ve chosen and we could have rounded ends. The following screenshot shows the sparklines with the blue from the site header as a bar colour and rounded ends. This looks quite pleasing, but rounded ends do make it a little more difficult to see the data at the ends of the sparkline. See for example DOST ‘scunner n.’ where the two lines at the very right of the sparkline are a bit hard to see.
I also managed to complete the final task in this block of work for the DSL, which was to add links to the search results to download the data as a CSV. The API already has facilities to output data as a CSV, but I needed to tweak this a bit to ensure the data was exported as we needed it. Fields that were arrays were not displaying properly and certain fields needed to be suppressed. For other sites I’ve developed I was able to link directly to the API’s CSV output from the front-end, but the DSL’s API is not publicly accessible so I had to do things a little differently here. Instead, pressing on the ‘download’ link fires an AJAX call to a PHP script that passes the query string to the API without exposing the URL of the API, then takes the CSV data and presents it as a downloadable file. This took a bit of time to sort out, as the API was itself offering the CSV as a downloadable file and this wasn’t working when being passed to another script. Instead I had to set the API to output the CSV data on screen, meaning the scripts called via AJAX could then grab this data and process it.
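The general shape of this proxy pattern can be sketched as below; the `/csv-proxy.php` path and the file name are invented for illustration, and the real implementation uses jQuery-era AJAX rather than `fetch`:

```javascript
// The browser never sees the private API URL; a same-site server script
// (hypothetically at /csv-proxy.php) forwards the query string to the API
// and echoes the raw CSV text back.
function proxyUrl(queryString) {
  return "/csv-proxy.php?" + queryString;
}

// Grab the CSV text and wrap it as a client-side download.
async function downloadCsv(queryString) {
  const res = await fetch(proxyUrl(queryString));
  const csv = await res.text();
  const blob = new Blob([csv], { type: "text/csv" });
  const link = document.createElement("a");
  link.href = URL.createObjectURL(blob);
  link.download = "results.csv"; // illustrative file name
  link.click();
}
```

The key point is the one described above: the proxy must return the CSV as plain text rather than as an attachment, so the client-side script can capture and repackage it.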
With all of this working I put in a Helpdesk request to get the Solr instances set up and populated on the server and I then copied all of the updated files to the DSL’s test instance. As of Friday the new Solr indexes don’t seem to be working but hopefully early next week everything will be operational. I’ll then just need to tweak the search strings of the headword search so that the new Solr headword search matches the existing search.
Also this week I had a chat with Thomas Clancy about the development of the front-end for the Iona place-names project. About a year ago I wrote a specification for the front-end but never heard anything further about it, but it looks like development will be starting soon. I also had a chat with Jennifer Smith about the data for the Speak For Yersel spin-off projects and it looks like this will be coming together in the next few weeks too. We also discussed another project that may use the data from SCOSYA and I might have some involvement in this.
Other than that I spent a bit of time on the Anglo-Norman Dictionary, creating a CSS file to style the entry XML in the Oxygen XML editor’s ‘Author’ view. The team are intending to use this view to collaborate on the entries and previously we hadn’t created any styles for it. I had to generate styles that replicated the look of the online dictionary as much as possible, which took some time to get right. I’m pretty happy with the end result, though, which you can see in the following screenshot:
I spent pretty much the whole week working on the new date facilities for the Dictionaries of the Scots Language. I have now migrated the headword search to Solr, which was a fairly major undertaking, but was necessary to allow headword searches to be filtered. I decided to create a new Solr core for DSL entries that would be used by both the headword search and the fulltext / fulltext with no quotes searches. This made sense because I would otherwise have needed to update the Solr core for fulltext to add the additional fields needed for filtering anyway. With the new core in place I then updated the script I wrote to generate the Solr index to include the new fields (e.g. headword, forms, dates) and generated the data, which I then imported into Solr.
With the new Solr core populated with data I then updated the API to work with it, replacing the existing headword search, which queried the database, with a new search that instead connects to Solr. As all of the fields that get returned by a search are now stored in Solr, the database no longer needs to be queried at all, which should make things faster. Previously the fulltext search queried the Solr index and then, once entries were returned, the database was queried for each of them to add in the other necessary fields, which was a bit inefficient.
With the API updated, the website (still only the version running on my laptop) then automatically used the new Solr index for headword searches: the quick search, the predictive search and the headword search in the advanced search. I did some tests comparing the site on my laptop to the live site and things are looking good, but I will need to tweak the default use of wildcards. The new headword search matches exact terms by default, which is the equivalent on the live site of surrounding the term with quotes (something the quick search does by default anyway). I can’t really tweak this easily until I move the new site to our test server, though, as Windows (which my laptop uses) can’t cope with asterisks and quotes in filenames, which means the website on my laptop breaks if a URL includes these characters.
With the new headword and fulltext index in place I then moved on to implementing the date filtering options. In order to do so I realised I would also have to add the entry attestation dates (from and to) to the quotation index as well, as a ‘first attested’ filter on a quotation search will use these dates. This meant updating the quotation Solr index structure, tweaking my script that generates the quotation data for Solr, running the script to output the data and then ingesting this into Solr, all of which took some time.
I then worked with the Solr admin interface to figure out how to perform a filter query for both first attestation and the ‘in use’ period. ‘First attested’ was pretty straightforward, as a single year is queried. Say the year is 1658: a filter would return the entry if the filter was a single year that matched (i.e. 1658) or a range that contained the year (e.g. 1650-1700). The ‘in use’ filter was much more complex to figure out, as the data to be queried is itself a range. If the attestation period is 1658-1900 and a single year filter is given (e.g. 1700) then we need to check whether this year is within the range. What is more complicated is when the filter is also a range. E.g. 1600-1700 needs to return the entry even though 1600 is less than 1658, and 1850-2000 needs to return the entry even though 2000 is greater than 1900. 1600-2000 also needs to return the entry even though both ends extend beyond the period, and 1660-1670 needs to return the entry as both ends are entirely within the period.
The answer to this headache-inducing problem was to run a query that checked whether the start date of the filter range was less than the end date of the attestation range and the end date of the filter range was greater than the start date of the attestation range. So for example the attestation range is 1658-1900. Filter range 1 is 1600-1700: 1600 is less than 1900 and 1700 is greater than 1658, so the entry is returned. Filter range 2 is 1850-2000: 1850 is less than 1900 and 2000 is greater than 1658, so the entry is returned. Filter range 3 is 1600-2000: 1600 is less than 1900 and 2000 is greater than 1658, so the entry is returned. Filter range 4 is 1660-1670: 1660 is less than 1900 and 1670 is greater than 1658, so the entry is returned. Filter range 5 is 1600-1650: 1600 is less than 1900 but 1650 is not greater than 1658, so the entry is not returned. Filter range 6 is 1901-1950: 1901 is not less than 1900 (even though 1950 is greater than 1658), so the entry is not returned.
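The two checks reduce to a couple of one-line predicates; this sketch (names invented, using the same example numbers as above) shows the logic outside Solr:

```javascript
// 'First attested': a single year must fall inside the filter range.
function firstAttested(year, filterFrom, filterTo) {
  return year >= filterFrom && year <= filterTo;
}

// 'In use': the filter range overlaps the attestation range iff the filter
// starts no later than the period ends AND ends no earlier than it starts.
function inUse(attFrom, attTo, filterFrom, filterTo) {
  return filterFrom <= attTo && filterTo >= attFrom;
}
```

Running the six filter ranges above against the attestation period 1658-1900 reproduces exactly the returned / not-returned results worked through in the prose.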
Having figured out how to implement the filter query in Solr I then needed to update the API to take filter query requests, process them, format the query and then pass this to Solr. This was a pretty major update and took quite some time to implement, especially as the quotation search needed to be handled differently for the ‘in use’ search, which as agreed with the team was to query the dates of the individual quotations rather than the overall period of attestation for an entry. I managed to get it all working, though, allowing me to pass searches to the API by changing variables in a URL and filter the results by passing further variables.
With this in place I could then update the front-end to add in the option of filtering the results. I decided to add the option as a box above the search results. Originally I was going to place it down the left-hand side, but space is rather limited due to the two-column layout of a search result that covers both SND and DOST. The new ‘Filter the results’ box consists of buttons for choosing between ‘First attested’ and ‘In use’ and ‘from’ and ‘to’ boxes where years can be entered. There is also an ‘Update’ button and a ‘clear’ button. Supplying a filter and pressing ‘Update’ reloads the page with the results filtered based on your criteria. It will be possible to bookmark or cite a filtered search as the filters are added to the page URL. The filter box appears on all search results pages, including the quick search, and seems to be working as intended.
So for example the screenshot below shows a filtered fulltext search for ‘burn’. Without the filter this brings back more results than we allow, but if you filter the results to only those that are first attested between 1600 and 1700 a more reasonable number is returned, as the screenshot shows:
The second screenshot shows the entries that were in use during this period rather than first attested, which as you can see gives a larger number of results:
As mentioned, the ‘in use’ filter works differently for quotations, limiting those that are displayed to ones in the filter period. The screenshot below shows an ‘in use’ filter of 1650 to 1750 for a quotation search for ‘burn’:
The filter is ‘remembered’ when you navigate to an entry and then use the ‘back to search results’ button. You can clear the filter by pressing on the ‘clear’ button or by deleting the years in the ‘from’ and ‘to’ boxes and pressing ‘update’. Previously, if there was only one search result the results page would automatically redirect to the entry. This was also happening when an applied filter gave only one result, which I found very annoying, so now if a filter is present and only one result is returned the results page is still displayed. Next week I’ll work on adding the first attested dates to the search results and I’ll also begin to develop the sparklines.
Other than this I had a meeting with Joanna Kopaczyk to further discuss a project she’s putting together. It looks like it will be a fairly small pilot project to begin with and I’ll only be involved in a limited capacity, but it has great potential.