I spent most of my time this week getting back into the development of the front-end for the Books and Borrowing project. It’s been a long time since I was able to work on this due to commitments to other projects and also due to there being a lot more for me to do than I was expecting regarding processing images and generating associated data in the project’s content management system over the summer. However, I have been able to get back into the development of the front-end this week and managed to make some pretty good progress. The first thing I did was to make some changes to the ‘libraries’ page based on feedback I received ages ago from the project’s Co-I Matt Sangster. The map of libraries used clustering to group libraries that are close together when the map is zoomed out, but Matt didn’t like this. I therefore removed the clusters and turned the library locations back into regular individual markers. However, it is now rather difficult to distinguish the markers for a number of libraries. For example, the markers for Glasgow and the Hunterian libraries (back when the University was still on the High Street) are on top of each other and you have to zoom in a very long way before you can even tell there are two markers there.
I also updated the tabular view of libraries. Previously the library name was a button that when clicked on opened the library’s page. Now the name is text and there are two buttons underneath. The first one opens the library page while the second pans and zooms the map to the selected library, whilst also scrolling the page to the top of the map. This uses Leaflet’s ‘flyTo’ function which works pretty well, although the map tiles don’t quite load in fast enough for the automatic ‘zoom out, pan and zoom in’ to proceed as smoothly as it ought to.
After that I moved onto the library page, which previously just displayed the map and the library name. I updated the tabs for the various sections to display the number of registers, books and borrowers that are associated with the library. The Introduction page also now features the information recorded about the library that has been entered into the CMS. This includes location information, dates, links to the library etc. Beneath the summary info there is the map, and beneath this is a bar chart showing the number of borrowings per year at the library. Beneath the bar chart you can find the longer textual fields about the library such as descriptions and sources. Here’s a screenshot of the page for St Andrews:
I also worked on the ‘Registers’ tab, which now displays a tabular list of the selected library’s registers, and I also ensured that when you select one of the tabs other than ‘Introduction’ the page automatically scrolls down to the top of the tabs to avoid the need to manually scroll past the header image (but we still may make this narrower eventually). The tabular list of registers can be ordered by any of the columns and includes data on the number of pages, borrowers, books and borrowing records featured in each.
When you open a register the information about it is displayed (e.g. descriptions, dates, stats about the number of books etc referenced in the register) and large thumbnails of each page together with page numbers and the number of records on each page are displayed. The thumbnails are rather large and I could make them smaller, but doing so would mean that all the pages end up looking the same – beige rectangles. The thumbnails are generated on the fly by the IIIF server and the first time a register is loaded it can take a while for the thumbnails to load in. However, generated thumbnails are then cached on the server so subsequent page loads are a lot quicker. Here’s a screenshot of a register page for St Andrews:
One thing I also did was write a script to add in a new ‘pageorder’ field to the ‘page’ database table. I then wrote a script that generated the page order for every page in every register in the system. This picks out the page that has no preceding page and iterates through pages based on the ‘next page’ ID. Previously pages in lists were ordered by their auto-incrementing ID, but this meant that if new pages needed to be inserted for a register they ended up stuck at the end of the list, even though the ‘next’ and ‘previous’ links worked successfully. This new ‘pageorder’ field ensures lists of pages are displayed in the proper order. I’ve updated the CMS to ensure this new field is used when viewing a register, although I haven’t as of yet updated the CMS to regenerate the ‘pageorder’ for a register if new pages are added out of sequence. For now if this happens I’ll need to manually run my script again to update things.
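As a rough illustration of the ordering logic described above, here is a minimal Python sketch. It is not the actual script: the `next_page` dictionary is a hypothetical stand-in for the ‘next page’ IDs held in the ‘page’ table, and the function name is made up for the example. It finds the one page that no other page points to, then follows the ‘next page’ links and numbers the pages as it goes.

```python
def generate_page_order(next_page):
    """Return {page_id: position} for the pages of one register.

    next_page maps each page ID to the ID of the page that follows it,
    or to None for the last page (a hypothetical structure for this sketch).
    """
    # The first page is the only one that no other page links to.
    followed = {nid for nid in next_page.values() if nid is not None}
    starts = [pid for pid in next_page if pid not in followed]
    if len(starts) != 1:
        raise ValueError(f"expected exactly one first page, found {len(starts)}")

    order = {}
    current = starts[0]
    while current is not None:
        if current not in next_page:
            raise ValueError(f"page {current} is linked to but does not exist")
        if current in order:
            raise ValueError(f"cycle detected at page {current}")
        order[current] = len(order) + 1  # positions start at 1
        current = next_page[current]
    return order


# Page 42 was inserted later, so its auto-incrementing ID is out of sequence.
example = generate_page_order({10: 11, 11: 42, 42: 12, 12: None})
```

In this example page 42 ends up third in the list, even though ordering by ID would have put it last.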
Anyway, back to the front-end: The new ‘pageorder’ is used in the list of pages mentioned above so the thumbnails get displayed in the correct order. I may add pagination to this page, as all of the thumbnails are currently on one page and it can take a while to load, although these days people seem to prefer having long pages rather than having data split over multiple pages.
The final section I worked on was the page for viewing an actual page of the register, and this is still very much in progress. You can open a register page by pressing on its thumbnail and currently you can navigate through the register using the ‘next’ and ‘previous’ buttons or return to the list of pages. I still need to add in a ‘jump to page’ feature here too. As discussed in the requirements document, there will be three views of the page: Text, Image and Text and Image side-by-side. Currently I have implemented the image view only. Pressing on the ‘Image view’ tab opens a zoomable / pannable interface through which the image of the register page can be viewed. You can also make this interface full screen by pressing on the button in the top right. Also, if you’re viewing the image and you use the ‘next’ and ‘previous’ navigation links you will stay on the ‘image’ tab when other pages load. Here’s a screenshot of the ‘image view’ of the page:
Also this week I wrote a three-page requirements document for the redevelopment of the front-ends for the various place-names projects I’ve created using the system originally developed for the Berwickshire place-names project which launched back in 2018. The requirements document proposes some major changes to the front-end, moving to an interface that operates almost entirely within the map and enabling users to search and browse all data from within the map view rather than having to navigate to other pages. I sent the document off to Thomas Clancy, for whom I’m currently developing the systems for two place-names projects (Ayr and Iona) and I’ll just need to wait to hear back from him before I take things further.
I also responded to a query from Marc Alexander about the number of categories in the Thesaurus of Old English, investigated a couple of server issues that were affecting the Glasgow Medical Humanities site, removed all existing place-name elements from the Iona place-names CMS so that the team can start afresh and responded to a query from Eleanor Lawson about the filenames of video files on the Seeing Speech site. I also made some further tweaks to the Speak For Yersel resource ahead of its launch next week. This included adding survey numbers to the survey page and updating the navigation links and writing a script that purges a user and all related data from the system. I ran this to remove all of my test data from the system. If we do need to delete a user in future (either because their data is clearly spam or a malicious attempt to skew the results, or because a user has asked us to remove their data) I can run this script again. I also ran through every single activity on the site to check everything was working correctly. The only thing I noticed is that I hadn’t updated the script to remove the flags for completed surveys when a user logs out, meaning after logging out and creating a new user the ticks for completed surveys were still displaying. I fixed this.
I also fixed a few issues with the Burns mini-site about Kozeluch, including updating the table sort options which had stopped working correctly when I added a new column to the table last week and fixing some typos with the introductory text. I also had a chat with the editor of the Anglo-Norman Dictionary about future developments and responded to a query from Ann Ferguson about the DSL bibliographies. Next week I will continue with the B&B developments.
It was a four-day week this week due to the Queen’s funeral on Monday. I divided my time for the remaining four days over several projects. For Speak For Yersel I finally tackled the issue of the way maps are loaded. The system had been developed for a map to be loaded afresh every time data is requested, with any existing map destroyed in the process. This worked fine when the maps didn’t contain demographic filters as generally each map only needed to be loaded once and then never changed until an entirely new map was needed (e.g. for the next survey question). However, I was then asked to incorporate demographic filters (age groups, gender, education level), with new data requested based on the option the user selected. This all went through the same map loading function, which still destroyed and reinitialised the entire map on each request. This worked, but wasn’t ideal, as it meant the map reset to its default view and zoom level whenever you changed an option, map tiles were reloaded from the server unnecessarily and if the user was in ‘full screen’ mode they were booted out of this as the full screen map no longer existed. For some time I’ve been meaning to redevelop this to address these issues, but I’ve held off as there were always other things to tackle and I was worried about essentially ripping apart the code and having to rebuild fundamental aspects of it. This week I finally plucked up the courage to delve into the code.
I created a test version of the site so as to not risk messing up the live version and managed to develop an updated method of loading the maps. This method initiates the map only once when a page is first loaded rather than destroying and regenerating the map every time a new question is loaded or demographic data is changed. This means the number of map tile loads is greatly reduced as the base map doesn’t change until the user zooms or pans. It also means the location and zoom level a user has left the map on stays the same when the data is changed. For example, if they’re interested in Glasgow and are zoomed in on it they can quickly flick between different demographic settings and the map will stay zoomed in on Glasgow rather than resetting each time. Also, if you’re viewing the map in full-screen mode you can now change the demographic settings without the resource exiting out of full screen mode.
All worked very well, with the only issue being that the transitions between survey questions and quiz questions weren’t as smooth as with the older method. Previously the map scrolled up and was then destroyed, then a new map was created and the data was loaded into the area before it smoothly scrolled down again. For various technical reasons this no longer worked quite as well. The map area still scrolled up and down, but the new data only populated the map as the map area scrolled down, meaning for a brief second you could still see the data and legend for the previous question before it switched to the new data. However, I spent some further time investigating this issue and managed to fix it, with different fixes required for the survey and the quiz. I also noticed a bug whereby the map would increase in size to fit the available space but the map layers and data were not extending properly into the newly expanded area. This is a known issue with Leaflet maps that have their size changed dynamically and there’s actually a Leaflet function that sorts it: I just needed to call map.invalidateSize(); and the map worked properly again. Of course it took a bit of time to figure this simple fix out.
I also made some further updates to the site. Based on feedback about the difficulty some people are having keeping track of which surveys they’ve done, I updated the site to log when the user completes a survey. Now when the user goes to the survey index page a count of the number of surveys they’ve completed is displayed in the top right and a green tick has been added to the button of each survey they have completed. Also, when they reach the ‘what next’ page for a survey a count of their completed surveys is also shown. This should make it much easier for people to track what they’ve done. I also made a few small tweaks to the data at the request of Jennifer, and created a new version of the animated GIF that has speech bubbles, as the bubble for Shetland needed its text changed. As I didn’t have the files available I took the opportunity to regenerate the GIF, using a larger map, as the older version looked quite fuzzy on a high definition screen like an iPad. I kept the region outlines on as well to tie it in better with our interactive maps. Also the font used in the new version is now the ‘Baloo’ font we use for the site. I stored all of the individual frames both as images and as PowerPoint slides so I can change them if required. For future reference, I created the animated GIF using https://ezgif.com/maker with a 150 second delay between slides, crossfade on and a fader delay of 8.
Also this week I researched an issue with the Scots Thesaurus that was causing the site to fail to load. The WordPress options table had become corrupted and unreadable and needed to be replaced with a version from the backups, which thankfully fixed things. I also did my expenses from the DHC in Sheffield, which took longer than I thought it would, and made some further tweaks to the Kozeluch mini-site on the Burns C21 website. This included regenerating the data from a spreadsheet via a script I’d written and tweaking the introductory text. I also responded to a request from Fraser Dallachy to regenerate some data that a script I’d previously written had outputted. I also began writing a requirements document for the redevelopment of the place-names project front-ends to make them more ‘map first’.
I also did a bit more work for Speech Star, making some changes to the database of non-disordered speech and moving the ‘child speech error database’ to a new location. I also met with Luca to have a chat about the BOSLIT project, its data, the interface and future plans. We had a great chat and I then spent a lot of Friday thinking about the project and formulating some feedback that I sent in a lengthy email to Luca, Lorna Hughes and Kirsteen McCue on Friday afternoon.
I spent a bit of time this week going through my notes from the Digital Humanities Congress last week and writing last week’s lengthy post. I also had my PDR session on Friday and I needed to spend some time preparing for this, writing all of the necessary text and then attending the session. It was all very positive and it was a good opportunity to talk to my line manager about my role. I’ve been in this job for ten years this month and have been writing these blog posts every working week for those ten years, which I think is quite an achievement.
In terms of actual work on projects, it was rather a bitty week, with my time spread across lots of different projects. On Monday I had a Zoom call for the VariCS project, a phonetics project in collaboration with Strathclyde that I’m involved with. The project is just starting up and this was the first time the team had all met. We mainly discussed setting up a web presence for the project and I gave some advice on how we could set up the website, the URL and such things. In the coming weeks I’ll probably get something set up for the project.
I then moved onto another Burns-related mini-project that I worked on with Kirsteen McCue many months ago – a digital edition of Koželuch’s settings of Robert Burns’s Songs for George Thomson. We’re almost ready to launch this now and this week I created a page for an introductory essay, migrated a Word document to WordPress to fill the page, including adding in links and tweaking the layout to ensure things like quotes displayed properly. There are still some further tweaks that I’ll need to implement next week, but we’re almost there.
I also spent some time tweaking the Speak For Yersel website, which is now publicly accessible (https://speakforyersel.ac.uk/) but still not quite finished. I created a page for a video tour of the resource and made a few tweaks to the layout, such as checking the consistency of font sizes used throughout the site. I also made some updates to the site text and added in some lengthy static content to the site in the form of a teachers’ FAQ and a ‘more information’ page. I also changed the order of some of the buttons shown after a survey is completed to hopefully make it clearer that other surveys are available.
I also did a bit of work for the Speech Star project. There had been some issues with the Central Scottish Phonetic Features MP4s playing audio only on some operating systems and the replacements that Eleanor had generated worked for her but not for me. I therefore tried uploading them to and re-downloading them from YouTube, which thankfully seemed to fix the issue for everyone. I then made some tweaks to the interfaces of the two project websites. For the public site I made some updates to ensure the interface looked better on narrow screens, changing the appearance of the ‘menu’ button and making the logo and site header font smaller so they take up less space. I also added an introductory video to the homepage.
For the Books and Borrowing project I processed the images for another library register. This didn’t go entirely smoothly. I had been sent 73 images and these were all upside down so needed rotating. It then transpired that I should have been sent 273 images so needed to chase up the missing ones. Once I’d been sent the full set I was then able to generate the page images for the register, upload the images and associate them with the records.
I then moved on to setting up the front-end for the Ayr Place-names website. In the process of doing so I became aware that one of the NLS map layers that all of our place-name projects use had stopped working. It turned out that the NLS had migrated this map layer to a third party map tile service (https://www.maptiler.com/nls/) and the old URLs these sites were still using no longer worked. I had a very helpful chat with Chris Fleet at NLS Maps about this and he explained the situation. I was able to set up a free account with the maptiler service and update the URLs in four place-names websites that referenced the layer (https://berwickshire-placenames.glasgow.ac.uk/, https://kcb-placenames.glasgow.ac.uk/, https://ayr-placenames.glasgow.ac.uk and https://comparative-kingship.glasgow.ac.uk/scotland/). I’ll need to ensure this is also done for the two further place-names projects that are still in development (https://mull-ulva-placenames.glasgow.ac.uk and https://iona-placenames.glasgow.ac.uk/).
I managed to complete the work on the front-end for the Ayr project, which was mostly straightforward as it was just adapting what I’d previously developed for other projects. The thing that took the longest was getting the parish data and the locations where the parish three-letter acronyms should appear, but I was able to get this working thanks to the notes I’d made the last time I needed to deal with parish boundaries (as documented here: https://digital-humanities.glasgow.ac.uk/2021-07-05/). After discussions with Thomas Clancy about the front-end I decided that it would be a good idea to redevelop the map-based interface to display all of the data on the map by default and to incorporate all of the search and browse options within the map itself. This would be a big change, and it’s one I had been thinking of implementing anyway for the Iona project, but I’ll try and find some time to work on this for all of the place-name sites over the coming months.
Finally, I had a chat with Kirsteen McCue and Luca Guariento about the BOSLIT project. This project is taking the existing data for the Bibliography of Scottish Literature in Translation (available on the NLS website here: https://data.nls.uk/data/metadata-collections/boslit/) and creating a new resource from it, including visualisations. I offered to help out with this and will be meeting with Luca to discuss things further, probably next week.
I divided my time between a number of different projects this week. For Speak For Yersel I replaced the ‘click’ transcripts with new versions that incorporated shorter segments and more highlighted words. As the segments were now different I also needed to delete all existing responses to the ‘click’ activity. I then completed the activity once for each speaker to test things out, and all seems to work fine with the new data. I also changed the pop-up ‘percentage clicks’ text to ‘% clicks occurred here’, which is more accurate than the previous text which suggested it was the percentage of respondents. I also fixed an issue with the map height being too small on the ‘where do you think this speaker is from’ quiz and ensured the page scrolls to the correct place when a new question is loaded. I also removed the ‘tip’ text from the quiz intros and renamed the ‘where do you think this speaker is from’ map buttons on the map intro page. I’d also been asked to trim down the number of ‘translation’ questions from the ‘I would never say that’ activity so I removed some of those. I then changed and relocated the ‘heard in films and TV’ explanatory text and removed the question mark from the ‘where do you think the speaker is from’ quiz intro page.
Mary had encountered a glitch with the transcription popups, whereby the page would flicker and jump about when certain popups were hovered over. This was caused by the page height increasing to accommodate the pop-up, causing a scrollbar to appear in the browser, which changed the position of the cursor and made the pop-up disappear, making the scrollbar go and causing a glitchy loop. I increased the height of the page for this activity so the scrollbar issue is no longer encountered, and I also made the popups a bit wider so they don’t need to be as long. Mary also noticed that some of the ‘all over Scotland’ dynamically generated map answers seemed to be incorrect. After some investigation I realised that this was a bug that had been introduced when I added in the ‘I would never say that’ quizzes on Friday. A typo in the code meant that the 60% threshold for correct answers in each region was being used rather than ‘100 divided by the number of answer options’. Thankfully once identified this was easy to fix.
I also participated in a Zoom call for the project this week to discuss the launch of the resource with the University’s media people. It was agreed that the launch will be pushed back to the beginning of October as this should be a good time to get publicity. Finally for the project this week I updated the structure of the site so that the ‘About’ menu item could become a drop-down menu, and I created placeholder pages for three new pages that will be added to this menu for things like FAQs.
I also continued to work on the Books and Borrowing project this week. On Friday last week I didn’t quite get to finish a script to merge page records for one of the St Andrews registers as it needed further testing on my local PC before I ran it on the live data. I tackled this issue first thing on Monday and it was a task I had hoped would only take half an hour or so. Unfortunately things did not go well and it took most of the morning to sort out. I initially attempted to run things on my local PC to test everything out, but I forgot to update the database connection details. Usually this wouldn’t be an issue as generally the databases I work with use ‘localhost’ as a connection URL, so the Stirling credentials would have been wrong for my local DB and the script would have just quit, but Stirling (where the system is hosted) uses a full URL instead of ‘localhost’. This meant that even though I had a local copy of the database on my PC and the scripts were running on a local server set up on my PC the scripts were in fact connecting to the real database at Stirling. This meant the live data was being changed. I didn’t realise this as the script was running and as it was taking some time I cancelled it, meaning the update quit halfway through changing borrowing records and deleting page records in the CMS.
I then had to write a further script to delete all of the page and borrowing records for this register from the Stirling server and reinstate the data from my local database. Thankfully this worked ok. I then ran my test script on the actual local database on my PC and the script did exactly what I wanted it to do, namely:
Iterate through the pages and for each odd numbered page move the records on these to the preceding even numbered page, and at the same time regenerate the ‘page order’ for each record so they follow on from the existing records. Then the even page needs its folio number updated to add in the odd number (e.g. so folio number 2 becomes ‘2-3’) and generate an image reference based on this (e.g. UYLY207-2_2-3). Then delete the odd page record and after all that is done regenerate the ‘next’ and ‘previous’ page links for all pages.
This all worked so I ran the script on the server and updated the live data. However, I then noticed that there are gaps in the folio numbers and this has messed everything up. For example, folio number 314 isn’t followed by 315 but 320. 320 isn’t an odd number so it doesn’t get joined to 314. All subsequent page joins are then messed up. There are also two ‘350’ pages in the CMS and two images that reference 350. We have UYLY207-2_349-350 and also UYLY207-2_350-351. There might be other situations where the data isn’t uniform too.
I therefore had to use my ‘delete and reinsert’ script again to revert to the data prior to the update as my script wasn’t set up to work with pages that don’t just increment their folio number by 1 each time. After some discussion with the RA I updated the script again so that it would work with the non-uniform data and thankfully all worked fine after that. Later in the week I also found some time to process two further St Andrews registers that needed their pages and records merged, and thankfully these went much smoother.
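The merge steps listed above, plus one possible way of coping with gaps in the folio numbers, can be sketched in Python like this. It is an illustration only and not the script I ran: the dictionary keys, the function name and the image names for unmerged pages are all hypothetical, and the rule of only joining an odd page to an even page when the folio numbers are consecutive is an assumption about how non-uniform data might be handled.

```python
def merge_facing_pages(pages, image_prefix):
    """Merge each odd-numbered page into the even-numbered page before it.

    pages is a list of dicts in page order, each with an integer 'folio',
    an 'image' reference and a list of 'records' (hypothetical structure).
    """
    merged = []
    i = 0
    while i < len(pages):
        page = pages[i]
        following = pages[i + 1] if i + 1 < len(pages) else None
        # Assumption: only join when the next folio is exactly even + 1,
        # so a jump such as 314 -> 320 leaves both pages untouched.
        is_pair = (
            page["folio"] % 2 == 0
            and following is not None
            and following["folio"] == page["folio"] + 1
        )
        if is_pair:
            folio = f"{page['folio']}-{following['folio']}"
            records = page["records"] + following["records"]
            image = f"{image_prefix}_{folio}"  # e.g. UYLY207-2_2-3
            i += 2  # the odd page is dropped as a separate page
        else:
            folio = str(page["folio"])
            records = list(page["records"])
            image = page["image"]
            i += 1
        merged.append({
            "folio": folio,
            "image": image,
            # Regenerate the order so moved records follow the existing ones.
            "records": [(n, rec) for n, rec in enumerate(records, start=1)],
        })
    # Regenerate the 'next' and 'previous' links. Folio labels are used here
    # for readability; a real system would link by page ID instead.
    for idx, page in enumerate(merged):
        page["previous"] = merged[idx - 1]["folio"] if idx > 0 else None
        page["next"] = merged[idx + 1]["folio"] if idx + 1 < len(merged) else None
    return merged


sample = [
    {"folio": 2, "image": "UYLY207-2_2", "records": ["a", "b"]},
    {"folio": 3, "image": "UYLY207-2_3", "records": ["c"]},
    {"folio": 314, "image": "UYLY207-2_314", "records": ["d"]},
    {"folio": 320, "image": "UYLY207-2_320", "records": ["e"]},
]
result = merge_facing_pages(sample, "UYLY207-2")
```

With this sample data folios 2 and 3 become a single ‘2-3’ page, while 314 and 320 are left as separate pages because 320 does not directly follow 314.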
I also worked on the Speech Star project this week. I created a new page on both of the project’s sites (which are not live yet) for viewing videos of Central Scottish phonetic features. I also replaced the temporary logos used on the sites with the finalised logos that had been designed by a graphic designer. However, the new logo only really works well on a white background as the white cut-out round the speech bubble into the star becomes the background colour of the header. The blue that we’re currently using for the site header doesn’t work so well with the logo colours. Also, the graphic designer had proposed using a different font for the site and I decided to make a new interface for the site, which you can see below. I’m still waiting for feedback to see whether the team prefer this to the old interface (a screenshot of which you can see on this page: https://digital-humanities.glasgow.ac.uk/2022-01-17/) but I personally think it looks a lot better.
I also returned to the Burns Manuscript database that I’d begun last week. I added a ‘view record’ icon to each row which if pressed on opens a ‘card’ view of the record on its own page. I also added in the search options, which appear in a section above the table. By default, the section is hidden and you can show/hide it by pressing on a button. Above this I’ve also added in a placeholder where some introductory text can go. If you open the ‘Search options’ section you’ll find text boxes where you can enter text for year, content, properties and notes. For year you can either enter a specific year or a range. The other text fields are purely free-text at the moment, so no wildcards. I can add these in but I think it would just complicate things unnecessarily. On the second row are checkboxes for type, location name and condition. You can select one or more of each of these.
The search options are linked by AND, and the checkbox options are linked internally by OR. For example, filling in ‘1780-1783’ for year and ‘wrapper’ for properties will find all rows with a date between 1780 and 1783 that also have ‘wrapper’ somewhere in their properties. If you enter ‘work’ in content and select ‘Deed’ and ‘Fragment’ as types you will find all rows that are either ‘Deed’ or ‘Fragment’ and have ‘work’ in their content.
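To make the AND / OR behaviour concrete, here is a small Python sketch of the matching logic. It is only an illustration: the row keys and function names are hypothetical, and treating the free-text fields as case-insensitive substring matches is an assumption rather than something the real search necessarily does.

```python
def parse_year(text):
    """Turn '1781' or '1780-1783' into an inclusive (start, end) pair."""
    if "-" in text:
        start, end = text.split("-", 1)
        return int(start), int(end)
    return int(text), int(text)


def row_matches(row, year="", content="", properties="", notes="",
                types=(), locations=(), conditions=()):
    """Every supplied option must match (AND); within a checkbox group
    any one of the ticked values is enough (OR)."""
    if year:
        start, end = parse_year(year.strip())
        if not start <= row["year"] <= end:
            return False
    # Free-text fields: plain substring match, no wildcards.
    for field, term in (("content", content), ("properties", properties),
                        ("notes", notes)):
        if term and term.lower() not in row[field].lower():
            return False
    # Checkbox groups: the row's value must be one of the ticked options.
    for field, ticked in (("type", types), ("location", locations),
                          ("condition", conditions)):
        if ticked and row[field] not in ticked:
            return False
    return True


def search(rows, **options):
    return [row for row in rows if row_matches(row, **options)]


# Made-up rows purely for demonstration.
rows = [
    {"year": 1781, "content": "Work on the farm", "properties": "Paper wrapper",
     "notes": "", "type": "Deed", "location": "Edinburgh", "condition": "Good"},
    {"year": 1790, "content": "Unfinished work", "properties": "Wrapper",
     "notes": "", "type": "Fragment", "location": "Glasgow", "condition": "Poor"},
    {"year": 1782, "content": "A letter", "properties": "None",
     "notes": "", "type": "Letter", "location": "Glasgow", "condition": "Good"},
]
by_year = search(rows, year="1780-1783", properties="wrapper")
by_type = search(rows, content="work", types=("Deed", "Fragment"))
```

Here `by_year` holds only the 1781 row, while `by_type` holds both the ‘Deed’ and the ‘Fragment’ rows.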
If a search option is entered and you press the ‘Search’ button the page will reload with the search options open, and the page will scroll down to this section. Any rows matching your criteria will be displayed in the table below this. You can also clear the search by pressing on the ‘Clear search options’ button. In addition, if you’re looking at search results and you press on the ‘view record’ button the ‘Return to table’ button on the ‘card’ view will reload the search results. That’s this mini-site completed now, pending feedback from the project team, and you can see a screenshot of the site with the search box open below:
Also this week I’d arranged an in-person coffee and catch up with the other College of Arts developers. We used to have these meetings regularly before Covid but this was the first time since then that we’d all met up. It was really great to chat with Luca Guariento, Stevie Barrett and David Wilson again and to share the work we’d been doing since we last met. Hopefully we can meet again soon.
Finally this week I helped out with a few WordPress questions from a couple of projects and I also had a chance to update all of the WordPress sites I manage (more than 50) to the most recent version.
I continued to spend a lot of my time working on the Speak For Yersel project this week. We had a team meeting on Monday at which we discussed the outstanding tasks and particularly how I was going to tackle converting the quiz questions into dynamic answers. Previously the quiz question answers were static, which will not work well as the maps the users will reference in order to answer a question are dynamic, meaning the correct answer may evolve over time. I had proposed a couple of methods that we could use to ensure that the answers were dynamically generated based on the currently available data and we finalised our approach today.
Although I’d already made quite a bit of progress with my previous test scripts, there was still a lot to do to actually update the site. I needed to update the structure of the database, the script that outputs the data for use in the site, the scripts that handle the display of questions and the evaluation of answers, and the scripts that store a user’s selected answers.
Changes to the database allow for dynamic quiz questions to be stored (non-dynamic ones have fixed ‘answer options’ but dynamic ones don’t). Changes also allow for references to the relevant answer option of the survey question the quiz question is about to be stored (e.g. that the quiz is about the ‘mother’ map and specifically about the use of ‘mam’). I made significant updates to the script that outputs data for use in the site to integrate the functions from my earlier test script that calculated the correct answer. I updated these functions to change the logic somewhat. They now only use ‘method 1’ as mentioned in an earlier post. This method also now has a built-in check to filter out regions that have the highest percentage of usage but only a limited amount of data. Currently this is set to a minimum of 10 answers for the option in question (e.g. ‘mam’) rather than total number of answers in a region. Regions are ordered by their percentage usage (highest first) and the script iterates down through the regions and will pick as ‘correct’ the first one that has at least 10 answers. I’ve also added in a contingency in cases where none of the regions have at least 10 answers (currently the case for the ‘rocket’ question). In such cases the region marked as ‘correct’ will be the one that has the highest raw count of answers for the answer option rather than the highest percentage.
With the ‘correct’ region picked out the script then picks out all other regions where the usage percentage is at least 10% lower than the correct percentage. This is to ensure that there isn’t an ‘incorrect’ answer that is too similar to the ‘correct’ one. If this results in less than three regions (as regions are only returned if they have clicks for the answer option) then the system goes through the remaining regions and adds these in with a zero percentage. These ‘incorrect’ regions are then shuffled and three are picked out at random. The ‘correct’ answer is then added to these three and the options are shuffled again to ensure the ‘correct’ option is randomly positioned. The dynamically generated output is then plugged into the output script that the website uses.
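To make the selection logic above a bit more concrete, here is a minimal Python sketch of it. This is not the project's actual code: the function name, the data structure and the constants are all hypothetical, and I've assumed that 'at least 10% lower' means ten percentage points lower.

```python
import random

MIN_ANSWERS = 10   # minimum answers for the option before a region can be 'correct'
MIN_GAP = 10.0     # assumption: 'at least 10% lower' is read as percentage points


def pick_quiz_options(option_counts, all_regions, rng=random):
    """Return (correct_region, shuffled_options).

    option_counts maps region -> (answers_for_option, total_answers) and only
    holds regions with at least one answer for the option in question.
    all_regions lists every region name, including those with no answers.
    """
    stats = [(region, count, 100.0 * count / total)
             for region, (count, total) in option_counts.items()]

    # Highest percentage first; take the first region with enough answers.
    by_percentage = sorted(stats, key=lambda s: s[2], reverse=True)
    correct = next((s for s in by_percentage if s[1] >= MIN_ANSWERS), None)
    if correct is None:
        # Contingency: no region has enough answers, so use the highest raw count.
        correct = max(stats, key=lambda s: s[1])
    name, _, pct = correct

    # 'Incorrect' candidates must be clearly lower than the correct region.
    incorrect = [s[0] for s in stats if s[0] != name and s[2] <= pct - MIN_GAP]
    if len(incorrect) < 3:
        # Top up with regions that have no answers for the option (zero percent).
        incorrect += [r for r in all_regions if r not in option_counts]

    rng.shuffle(incorrect)
    options = incorrect[:3] + [name]
    rng.shuffle(options)  # so the correct option is randomly positioned
    return name, options
```

The sketch assumes at least one region has data for the option. If there are not enough regions to fill three 'incorrect' slots it simply returns fewer options.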
I then updated the front-end to work with this new data. This also required me to create a new database table to hold the user’s answers, storing the region the user presses on and whether their selection was correct, along with the question ID and the person ID. Non-dynamic answers store the ID of the ‘answer option’ that the user selected, but these dynamic questions don’t have static ‘answer options’ so the structure needed to be different.
I then implemented the dynamic answers for the ‘most of Scotland’ questions. For these questions the script needs to evaluate whether a form is used throughout Scotland or not. The algorithm gets all of the answer options for the survey question (e.g. ‘crying’ and ‘greetin’) and for each region works out the percentage of responses for each option. The team had previously suggested a fixed percentage threshold of 60%, but I reckoned it might be better for the threshold to change depending on how many answer options there are. Currently I’ve set the threshold to be 100 divided by the number of options. So where there are two options the threshold is 50%. Where there are four options (e.g. the ‘wean’ question) the threshold is 25% (i.e. if 25% or more of the answers in a region are for ‘wean’ it is classed as present in the region). Where there are three options (e.g. ‘clap’) the threshold is 33%. Where there are 5 options (e.g. ‘clarty’) the threshold is 20%.
The algorithm counts the number of regions that meet the threshold, and if the number is 8 or more then the term is considered to be found throughout Scotland and ‘Yes’ is the correct answer. If not then ‘No’ is the correct answer. I also had to update the way answers are stored in the database so these yes/no answers can be saved (as they have no associated region like the other questions).
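The 'most of Scotland' check described above could be sketched roughly as follows. Again this is only an illustration in Python with made-up names and a made-up data structure, not the code the site runs.

```python
def is_used_throughout_scotland(region_counts, options, option, min_regions=8):
    """Return True if `option` meets the threshold in at least `min_regions` regions.

    region_counts maps region -> {answer option: number of responses}.
    options is the full list of answer options for the survey question.
    """
    # The threshold depends on the number of options, e.g. two options -> 50%.
    threshold = 100.0 / len(options)

    qualifying = 0
    for counts in region_counts.values():
        total = sum(counts.get(o, 0) for o in options)
        if total == 0:
            continue  # a region with no responses cannot meet the threshold
        percentage = 100.0 * counts.get(option, 0) / total
        if percentage >= threshold:
            qualifying += 1

    return qualifying >= min_regions
```

A return value of True corresponds to 'Yes' being the correct answer and False to 'No'.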
I then moved onto tackling the non-standard (in terms of structure) questions to ensure they are dynamically generated as well. These were rather tricky to do as they each had to be handled differently as they were asking different things of the data (e.g. a question like ‘What are you likely to call the evening meal if you live in Tayside and Angus (Dundee) and didn’t go to Uni?’). I also made the ‘Sounds about right’ quiz dynamic.
I then moved onto tackling the ‘I would never say that’ quiz, which has been somewhat tricky to get working as the structure of the survey questions and answers is very different. Quizzes for the other surveys involved looking at a specific answer option but for this survey the answer options are different rating levels that each need to be processed and handled differently.
For this quiz for each region the system returns the number of times each rating level has been selected and works out the percentages for each. It then adds the ‘I’ve never heard this’ and ‘people elsewhere say this’ percentages together as a ‘no’ percentage and adds the ‘people around me say this’ and ‘I’d say this myself’ percentages together as a ‘yes’ percentage. Currently there is no weighting but we may want to consider this (e.g. ‘I’d say this’ would be worth more than ‘people around me’).
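As a small illustration of how the rating levels are collapsed, here is a Python sketch. The label strings are stand-ins for however the ratings are actually stored, and the function name is hypothetical.

```python
# Placeholder labels; the real rating levels may be stored as IDs instead.
NO_LEVELS = ("I've never heard this", "people elsewhere say this")
YES_LEVELS = ("people around me say this", "I'd say this myself")


def yes_no_percentages(rating_counts):
    """Collapse per-rating counts for one region into (yes %, no %).

    rating_counts maps a rating label to the number of times it was selected.
    Returns None when the region has no responses at all.
    """
    yes = sum(rating_counts.get(level, 0) for level in YES_LEVELS)
    no = sum(rating_counts.get(level, 0) for level in NO_LEVELS)
    total = yes + no
    if total == 0:
        return None
    # No weighting is applied: every rating level counts equally.
    return 100.0 * yes / total, 100.0 * no / total
```

If weighting were introduced later it would most likely go into the two `sum` lines, for example by multiplying each count by a per-level weight.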
With these ratings stored the script handles question types differently. For the ‘select a region’ type of question the system works in a similar way to the other quizzes: it sorts the regions by ‘yes’ percentage with the biggest first. It then iterates through the regions and picks as the correct answer the first it comes to where the total number of responses for the region is equal to or greater than the minimum allowed (currently set to 10). Note that this is different to the other quizzes where this check for 10 is made against the specific answer option rather than the number of responses in the region as a whole.
If no region passes the above check then the region with the highest ‘yes’ percentage without a minimum allowed check is chosen as the correct answer. The system then picks out all other regions with data where the ‘yes’ percentage is at least 10% lower than the correct answer, adds in regions with no data if less than three have data, shuffles the regions and picks out three. These are then added to the ‘correct’ region and the answers are shuffled again.
I changed the questions that had an ‘all over Scotland’ answer option so that these are now ‘yes/no’ questions, e.g. ‘Is ‘Are you wanting to come with me?’ heard throughout most of Scotland?’. For these questions the system uses 8 regions as the threshold, as with the other quizzes. However, the percentage threshold for ‘yes’ is fixed. I’ve currently set this to 60% (i.e. at least 60% of all answers in a region are either ‘people around me say this’ or ‘I’d say this myself’). There is currently no minimum number of responses limit for this question type, so a region with a single answer that’s ‘people around me say this’ will have a 100% ‘yes’ and the region will be included. This is also the case for the ‘most of Scotland’ questions in the other quizzes, so we may need to tweak this.
As we’re using percentages rather than exact number of dots the questions can sometimes be a bit tricky. For example the first question currently has Glasgow as the correct answer because all but two of the markers in this region are ‘people around me say this’ or ‘I’d say this myself’. But if you turn off the other two categories and just look at the number of dots you might surmise that the North East is the correct answer as there are more dots there, even though proportionally fewer of them are the high ratings. I don’t know if we can make it clearer that we’re asking which region has proportionally more higher ratings without confusing people further, though.
I also spent some time this week working on the Books and Borrowing project. I had to make a few tweaks to the Chambers map of borrowers to make the map work better on smaller screens. I ensured that both the ‘Map options’ section on the left and the ‘map legend’ on the right are given a fixed height that is shorter than the map and the areas become scrollable, as I’d noticed that on short screens both these areas could end up longer than the map and therefore their lower parts were inaccessible. I’ve also added a ‘show/hide’ button to the map legend, enabling people to hide the area if it obscures their view of the map.
I also sent on some renamed library register files from St Andrews to Gerry for him to align with existing pages in the CMS, replaced some of the page images for the Dumfries register and renamed and uploaded images for a further St Andrews register that already existed in the CMS, ensuring the images became associated with the existing pages.
I started to work on the images for another St Andrews register that already exists in the system, but for this one the images are a double page spread so I need to merge two pages into one in the CMS. The script needs to find all odd numbered pages then move the records on these to the preceding even numbered page, and at the same time regenerate the ‘page order’ for each record so they follow on from the existing records. Then the even page needs its folio number updated to add in the odd number (e.g. so folio number 2 becomes ‘2-3’). Then I need to delete the odd page record and after all that is done I need to regenerate the ‘next’ and ‘previous’ page links for all pages. I completed everything except the final task, but I really need to test the script out on a version of the database running on my local PC first, as if anything goes wrong data could very easily be lost. I’ll need to tackle this next week as I ran out of time this week.
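The merging steps described above can be sketched in Python on plain in-memory data. This is purely illustrative: the real work happens against the CMS database, and the field names used here ('folio', 'label', 'page_order', 'prev', 'next') are assumptions rather than the actual column names.

```python
def merge_spreads(pages):
    """Merge each odd-numbered page into the preceding even-numbered page.

    pages is a list of dicts: {'id', 'folio' (int), 'records': [{'id', 'page_order'}]}.
    Returns new page dicts with 'label', 'prev' and 'next' added; inputs are untouched.
    """
    # Work on copies so a failed run cannot damage the original data.
    copies = {
        p["folio"]: {
            "id": p["id"],
            "folio": p["folio"],
            "label": str(p["folio"]),
            "records": [dict(r) for r in p["records"]],
        }
        for p in pages
    }

    merged = []
    for folio in sorted(copies):
        page = copies[folio]
        even = copies.get(folio - 1)
        if folio % 2 == 1 and even is not None:
            # Move the odd page's records so they follow on from the even page's.
            next_order = max((r["page_order"] for r in even["records"]), default=0) + 1
            for record in sorted(page["records"], key=lambda r: r["page_order"]):
                record["page_order"] = next_order
                even["records"].append(record)
                next_order += 1
            even["label"] = f"{folio - 1}-{folio}"  # e.g. folio 2 becomes '2-3'
            continue  # the odd page itself is dropped
        merged.append(page)

    # Regenerate the 'previous' and 'next' links across the remaining pages.
    for i, page in enumerate(merged):
        page["prev"] = merged[i - 1]["id"] if i > 0 else None
        page["next"] = merged[i + 1]["id"] if i < len(merged) - 1 else None
    return merged
```

Because the function returns new structures rather than changing its input, the result can be inspected before anything is written back, which is the same reason for wanting to test on a local copy of the database first.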
I also participated in our six-monthly formal review meeting for the Dictionaries of the Scots Language where we discussed our achievements in the past six months and our plans for the next. I also made some tweaks to the DSL website, such as splitting up the ‘Abbreviations and symbols’ buttons into two separate links, updating the text found on a couple of the old maps pages and considering future changes to the bibliography XSLT to allow links in the ‘oral sources’
Finally this week I made a start on the Burns manuscript database for Craig Lamont. I wrote a script that extracts the data from Craig’s spreadsheet and imports it into an online database. We will be able to rerun this whenever I’m given a new version of the spreadsheet. I then created an initial version of a front-end for the database within the layout for the Burns Correspondence and Poetry site. Currently the front-end only displays the data in one table with columns for type, date, content, physical properties, additional notes and locations. The latter contains the location name, shelfmark (if applicable) and condition (if applicable) for all locations associated with a record, each on a separate line with the location name in bold. Currently it’s possible to order the columns by clicking on them. Clicking a second time reverses the order. I haven’t had a chance to create any search or filter options yet but I’m intending to continue with this next week.
I worked for several different projects this week. For the Books and Borrowing project I processed and imported a further register for the Advocates library that had been digitised by the NLS. I also continued with the interactive map of Chambers library borrowers, although I couldn’t spend as much time on this as I’d hoped as my access to Stirling University’s VPN had stopped working and without VPN access I can’t connect to the database and the project server. It took a while to resolve the issue as access needs to be approved by some manager or other, but once it was sorted I got to work on some updates.
One thing I’d noticed last week was that when zooming and panning the historical map layer was throwing out hundreds of 403 Forbidden errors to the browser console. This was not having any impact on the user experience, but was still a bit messy and I wanted to get to the bottom of the issue. I had a very helpful (as always) chat with Chris Fleet at NLS Maps, who provided the historical map layer and he reckoned it was because the historical map only covers a certain area and moving beyond this was still sending requests for map tiles that didn’t exist. Thankfully an option exists in Leaflet that allows you to set the boundaries for a map layer (https://leafletjs.com/reference.html#latlngbounds) and I updated the code to do just that, which seems to have stopped the errors.
I then returned to the occupations categorisation, which was including far too many options. I therefore streamlined the occupations, displaying the top-level occupation only. I think this works a lot better (although I need to change the icon colour for ‘unknown’). Full occupation information is still available for each borrower via the popup.
I also had to change the range slider for opacity as standard HTML range sliders don’t allow for double-ended ranges. We require a double-ended range for the subscription period and I didn’t want to have two range sliders that looked different on one page. I therefore switched to a range slider offered by the jQuery UI interface library (https://jqueryui.com/slider/#range). The opacity slider still works as before, it just looks a little different. Actually, it works better than before, as the opacity now changes as you slide rather than only updating after you mouse-up.
I then began to implement the subscription period slider. This does not yet update the data. It’s been pretty tricky to implement this. The range needs to be dynamically generated based on the earliest and latest dates in the data, and dates are both year and month, which need to be converted into plain integers for the slider and then reinterpreted as years and months when the user updates the end positions. I think I’ve got this working as it should, though. When you update the ends of the slider the text above that lists the months and years updates to reflect this. The next step will be to actually filter the data based on the chosen period. Here’s a screenshot of the map featuring data categorised by the new streamlined occupations and the new sliders displayed:
For the Speak For Yersel project I made a number of tweaks to the resource, which Jennifer and Mary are piloting with school children in the North East this week. I added in a new grammatical question and seven grammatical quiz questions. I tweaked the homepage text and updated the structure of questions 27-29 of the ‘Sounds about right’ activity. I ensured that ‘Dumfries’ always appears as ‘Dumfries and Galloway’ in the ‘clever’ activity and follow-on and updated the ‘clever’ activity to remove the stereotype questions. These were the ones where users had to rate the speakers from a region without first listening to any audio clips and Jennifer reckoned these were taking too long to complete. I also updated the ‘clever’ follow-on to hide the stereotype options and switched the order of the listener and speaker options in the other follow-on activity for this type.
For the Speech Star project I replaced the data for the child speech error database with a new, expanded dataset and added in ‘Speaker Code’ as a filter option. I also replicated the child speech and normalised speech databases from the clinical website we’re creating on the more academic teaching site we’re creating and also pulled in the IPA chart from Seeing Speech into this resource too. Here’s a screenshot of how the child speech error database looks with the new ‘speaker code’ filter with ‘vowel disorder’ selected:
I also made a couple of tweaks to the DSL this week, installing the TablePress plugin for the ancillary pages and creating a further alternative logo for the DSL’s Facebook posts. I also returned to doing some work for the Anglo-Norman Dictionary, offering some advice to the editor Geert about incorporating publications and overhauling how cross references are displayed in the Dictionary Management System.
I updated the ‘View Entry’ page in the DMS. Previously it only included cross references FROM the entry you’re looking at TO any other entries. I.e. it only displayed content when the entry was of type ‘xref’ rather than ‘main’. Now in addition to this there’s a further section listing all cross references TO the entry you’re looking at from any entry of type ‘xref’ that links to it.
In addition there is a button allowing you to view all entries that include a cross reference to the current entry anywhere in their XML – i.e. where an <xref> tag that features the current entry’s slug is found at any level in any other main entry’s XML. This code is hugely memory intensive to run, as basically all 27,464 main entries need to be pulled into the script, with the full XML contents of each checked for matching xrefs. For this reason the page doesn’t run the code each time the ‘view entry’ page is loaded but instead only runs when you actively press the button. It takes a few seconds for the script to process, but after it does the cross references are listed in the same manner as the ‘pure’ xrefs in the preceding sections.
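To illustrate the kind of scan involved, here is a minimal Python sketch. It is not the DMS code: I don't show how an `<xref>` actually records its target, so the sketch assumes the slug appears either as an attribute value or as the tag's text, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def entries_referencing(slug, entries):
    """Return the slugs of entries whose XML contains an <xref> to `slug`.

    entries is an iterable of (entry_slug, xml_string) pairs for main entries.
    """
    hits = []
    for entry_slug, xml_text in entries:
        if entry_slug == slug:
            continue  # an entry does not count as referencing itself
        root = ET.fromstring(xml_text)
        # iter() walks the whole tree, so xrefs at any depth are found.
        for xref in root.iter("xref"):
            text = (xref.text or "").strip()
            if slug in xref.attrib.values() or text == slug:
                hits.append(entry_slug)
                break  # one match is enough for this entry
    return hits
```

Every entry's XML still has to be parsed in full, which is why a check like this only makes sense to run on demand rather than on every page load.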
Finally I participated in a Zoom-based focus group for the AHRC about the role of technicians in research projects this week. It was great to participate to share my views on my role and to hear from other people with similar roles at other organisations.
I’d taken Monday off this week to have an extra-long weekend following the jubilee holidays on Thursday and Friday last week. On Tuesday I returned to another meeting for Speak For Yersel and a list of further tweaks to the site, including many changes to three of the five activities and a new set of colours for the map marker icons, which make the markers much easier to differentiate.
I spent most of the week working on the Books and Borrowing project. We’d been sent a new library register from the NLS and I spent a bit of time downloading the 700 or so images, processing them and uploading them into our system. As usual, page numbers go a bit weird. Page 632 is written as 634 and then after page 669 comes not 670 but 700! I ran my script to bring the page numbers in the system into line with the oddities of the written numbers. On Friday I downloaded a further library register which I’ll need to process next week.
My main focus for the project was the Chambers Library interactive map sub-site. The map features the John Ainslie 1804 map from the NLS, and currently it uses the same modern map as I’ve used elsewhere in the front-end for consistency, although this may change. The map defaults to having a ‘Map options’ pane open on the left, and you can open and close this using the button above it. I also added a ‘Full screen’ button beneath the zoom buttons in the bottom right. I also added this to the other maps in the front-end too. Borrower markers have a ‘person’ icon and the library itself has the ‘open book’ icon as found on other maps.
By default the data is categorised by borrower gender, with somewhat stereotypical (but possibly helpful) blue and pink colours differentiating the two. There is one borrower with an ‘unknown’ gender and this is set to green. The map legend in the top right allows you to turn on and off specific data groups. The screenshot below shows this categorisation:
The next categorisation option is occupation, and this has some problems. The first is there are almost 30 different occupations, meaning the legend is awfully long and so many different marker colours are needed that some of them are difficult to differentiate. Secondly, most occupations only have a handful of people. Thirdly, some people have multiple occupations, and if so these are treated as one long occupation, so we have both ‘Independent Means > Gentleman’ and then ‘Independent Means > Gentleman, Politics/Office Holders > MP (Britain)’. It would be tricky to separate these out as the marker would then need to belong to two sets with two colours, plus what happens if you hide one set? I wonder if we should just use the top-level categorisation for the groupings instead? This would result in 12 groupings plus ‘unknown’, meaning the legend would be both shorter and narrower. Below is a screenshot of the occupation categorisation as it currently stands:
The next categorisation is subscription type, which I don’t think needs any explanation. I then decided to add in a further categorisation for number of borrowings, which wasn’t originally discussed but as I used the page I found myself looking for an option to see who borrowed the most, or didn’t borrow anything. I added the following groupings, but these may change: 0, 1-10, 11-20, 21-50, 51-70, 70+ and have used a sequential colour scale (darker = more borrowings). We might want to tweak this, though, as some of the colours are a bit too similar. I haven’t added in the filter to select subscription period yet, but will look into this next week.
At the bottom of the map options is a facility to change the opacity of the historical map so you can see the modern street layout. This is handy for example for figuring out why there is a cluster of markers in a field where ‘Ainslie Place’ was presumably built after the historical map was produced.
I decided to not include the marker clustering option in this map for now as clustering would make it more difficult to analyse the categorisation as markers from multiple groupings would end up clustered together and lose their individual colours until the cluster is split. Marker hover-overs display the borrower name and the pop-ups contain information about the borrower. I still need to add in the borrowing period data, and also figure out how best to link out to information about the borrowings or page images. The Chambers Library pin displays the same information as found in the ‘libraries’ page you’ve previously seen.
Also this week I responded to a couple of queries from the DSL people about Google Analytics and the icon that gets used for the site when posting on Facebook. Facebook was picking out the University of Glasgow logo rather than the DSL one, which wasn’t ideal. Apparently there’s a ‘meta’ tag that you need to add to the site header in order for Facebook to pick up the correct logo, as discussed here: https://stackoverflow.com/questions/7836753/how-to-customize-the-icon-displayed-on-facebook-when-posting-a-url-onto-wall
I also created a new user for the Ayr place-names project and dealt with a couple of minor issues with the CMS that Simon Taylor had encountered. I also investigated a certificate error with the ohos.ac.uk website and responded to a query about QR codes from fellow developer David Wilson. Also, Craig Lamont in Scottish Literature got in touch about a spreadsheet listing Burns manuscripts that he’s been working on with a view to turning it into a searchable online resource and I gave him some feedback about the structure of the spreadsheet.
Finally, I did a bit of work for the Historical Thesaurus, working on a further script to match up HT and OED categories based on suggestions by researcher Beth Beattie. I found a script I’d produced in 2018 that ran pattern matching on headings and I adapted this to only look at subcats within 02.02 and 02.03, picking out all unmatched OED subcats from these (there are 627) and then finding all unmatched HT categories where our ‘t’ numbers match the OED path. Previously the script used the HT oedmaincat column to link up OED and HT but this no longer matches (e.g. HT ‘smarten up’ has ‘t’ nums 02.02.16.02 which matches OED 02.02.16.02 ‘to smarten up’ whereas HT ‘oedmaincat’ is ’02.04.05.02’).
The script lists the various pattern matches at the top of the page and the output is displayed in a table that can be copied and pasted into Excel. Of the 627 OED subcats there are 528 that match an HT category. However, some of them potentially match multiple HT categories. These appear in red while one to one matches appear in green. Some of these multiple matches are due to Levenshtein matches (e.g. ‘sadism’ and ‘sadist’) but most are due to there being multiple subcats at different levels with the exact same heading. These can be manually tweaked in Excel and then I could run the updated spreadsheet through a script to insert the connections. We also had an HT team meeting this week that I attended.
I had a very busy week this week, working on several different projects. For the Books and Borrowing project I participated in the team Zoom call on Monday to discuss the upcoming development of the front-end and API for the project, which will include many different search and browse facilities, graphs and visualisations. I followed this up with a lengthy email to the PI and Co-I where I listed some previous work I’ve done and discussed some visualisation libraries we could use. In the coming weeks I’ll need to work with them to write a requirements document for the front-end. I also downloaded images from Orkney library, uploaded all of them to the server and generated the necessary register and page records. One register with 7 pages already existed in the system and I ensured that page images were associated with these and the remaining pages of the register fit in with the existing ones. I also processed the Wigtown data that Gerry McKeever had been working on, splitting the data associated with one register into two distinct registers, uploading page images and generating the necessary page records. This was a pretty complicated process, and I still need to complete the work on it next week, as there are several borrowing records listed as separate rows when in actual fact they are merely another volume of the same book borrowed at the same time. These records will need to be amalgamated.
For the Speak For Yersel project I had a meeting with the PI and RA on Monday to discuss updates to the interface I’ve been working on, new data for the ‘click’ exercise and a new type of exercise that will precede the ‘click’ exercise and will involve users listening to sound clips then dragging and dropping them onto areas of a map to see whether they can guess where the speaker is from. I spent some time later in the week making all of the required changes to the interface and the grammar exercise, including updating the style used for the interactive map and using different marker colours.
I also continued to work on the speech database for the Speech Star project based on feedback I received about the first version I completed last week. I added in some new introductory text and changed the order of the filter options. I also made the filter option section hidden by default as it takes up quite a lot of space, especially on narrow screens. There’s now a button to show / hide the filters, with the section sliding down or up. If a filter option is selected the section remains visible by default. I also changed the colour of the filter option section to a grey with a subtle gradient (it gets lighter towards the right) and added a similar gradient to the header, just to see how it looks.
The biggest update was to the filter options, which I overhauled so that instead of a drop-down list where one option in each filter type can be selected there are checkboxes for each filter option, allowing multiple items of any type to be selected. This was a fairly large change to implement as the way selected options are passed to the script and the way the database is queried needed to be completely changed. When an option is selected the page immediately reloads to display the results of the selection and this can also change the contents of the other filter option boxes – e.g. selecting ‘alveolar’ limits the options in the ‘sound’ section. I also removed the ‘All’ option and left all checkboxes unselected by default. This is how filters on clothes shopping sites do it – ‘all’ is the default and a limit is only applied if an option is ticked.
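The change to how the database is queried can be illustrated with a short Python sketch that turns ticked checkboxes into a parameterised query. The table and column names here are invented for the example, and the real script may well be structured differently.

```python
# Hypothetical column names for the six filter types.
FILTER_COLUMNS = ("accent", "sex", "age_range", "sound", "articulation", "position")


def build_filter_query(selected):
    """Build (sql, params) from the ticked checkboxes.

    selected maps a filter name to the list of ticked values. A filter with
    nothing ticked applies no limit, which is the implicit 'all'.
    """
    clauses = []
    params = []
    for column in FILTER_COLUMNS:  # fixed whitelist, never user-supplied names
        values = selected.get(column) or []
        if not values:
            continue
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)

    sql = "SELECT * FROM speech_videos"  # hypothetical table name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Values within one filter are combined with OR (via `IN`) while the different filters are combined with AND, which matches the shopping-site behaviour: ticking more boxes in one section widens the results, while ticking boxes in another section narrows them.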
I also changed the ‘accent’ labels as requested, changed the ‘By Prompt’ header to ‘By Word’ and updated the order of items in the ‘position’ filter. I also fixed an issue where ‘cheap’ and ‘choose’ were appearing in a column instead of the real data. Finally, I made the overlay that appears when a video is clicked on darker so it’s more obvious that you can’t click on the buttons. I did investigate whether it was possible to have the popup open while other page elements were still accessible but this is not something that the Bootstrap interface framework that I’m using supports, at least not without a lot of hacking about with its source code. I don’t think it’s worth pursuing this as the popup will cover much of the screen on tablets / phones anyway, and when I add in the option to view multiple videos the popup will be even larger.
Also this week I made some minor tweaks to the Burns mini-project I was working on last week and had a chat with the DSL people about a few items, such as the data import process that we will be going through again in the next month or so and some of the outstanding tasks that I still need to tackle with the DSL’s interface.
I also did some work for the AND this week, investigating a weird timeout error that cropped up on the new server and discussing how best to tackle a major update to the AND’s data. The team have finished working on a major overhaul of the letter S and this is now ready to go live. We have decided that I will ask for a test instance of the AND to be set up so I can work with the new data, testing out how the DMS runs on the new server and how it will cope with such a large update.
The editor, Geert, had also spotted an issue with the textbase search, which didn’t seem to include one of the texts (Fabliaux) he was searching for. I investigated the issue and it looked like the script that extracted words from pages may have silently failed in some cases. There are 12,633 page records in the textbase, each of which has a word count. When the word count is greater than zero my script processes the contents of the page to generate the data for searching. However, there appear to be 1889 pages in the system that have a word count of zero, including all of Fabliaux. Further investigation revealed that my scripts expect the XML to be structured with the main content in a <body> tag. This cuts out all of the front matter and back matter from the searches, which is what we’d agreed should happen and thankfully accounts for many of the supposedly ‘blank’ pages listed above as they’re not the actual body of the text.
However, Fabliaux doesn’t include the <body> tag in the standard way. In fact, the XML file consists of multiple individual texts, each of which has a separate <body> tag. As my script didn’t find a <body> in the expected place no content was processed. I ran a script to check the other texts and the following also have a similar issue: gaunt1372 (710 pages) and polsongs (111 pages), in addition to the 37 pages of Fabliaux. Having identified these I could update my script that generates search words and re-ran it for these texts, fixing the issue.
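As a rough illustration of the fix, here is a Python sketch that collects words from every `<body>` in a file rather than expecting a single one in a fixed place. It is not the script I actually ran: it works on a whole file rather than individual page records, and it assumes un-namespaced tags and no `<body>` nested inside another `<body>`.

```python
import re
import xml.etree.ElementTree as ET


def extract_body_words(xml_text):
    """Return the lower-cased words found inside every <body> element.

    Front and back matter are skipped because only <body> content is read.
    """
    root = ET.fromstring(xml_text)
    words = []
    # iter() finds <body> at any depth, so files made up of several
    # individual texts (each with its own <body>) are handled too.
    for body in root.iter("body"):
        text = " ".join(body.itertext())
        words.extend(re.findall(r"\w+", text.lower()))
    return words
```

A file with a single `<body>` behaves exactly as before, so the same function covers both the standard texts and the multi-text ones.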
Also this week I attended a Zoom-based seminar on ‘Digitally Exhibiting Textual Heritage’ that was being run by Information Studies. This featured four speakers from archives, libraries and museums discussing how digital versions of texts can be exhibited, both in galleries and online. Some really interesting projects were discussed, both past and present. These included the BL’s ‘Turning the Pages’ system (http://www.bl.uk/turning-the-pages/) and some really cool transparent LCD display cases (https://crystal-display.com/transparent-displays-and-showcases/) that allow images to be projected on clear glass while objects behind the panel are still visible. 3D representations of gallery spaces were discussed (e.g. https://www.lib.cam.ac.uk/ghostwords), as were ‘long form narrative scrolls’ such as https://www.nytimes.com/projects/2012/snow-fall/index.html#/?part=tunnel-creek, http://www.wolseymanuscripts.ac.uk/ and https://stories.durham.ac.uk/journeys-prologue/. There is a tool that can be used to create these here: https://shorthand.com/. It was a very interesting session!
I divided my time mostly between three projects this week: Speech Star, Speak For Yersel and a Burns mini-project for Kirsteen McCue. For Speech Star I set up the project’s website, based on our mockup number 9 (which still needs work) and completed an initial version of the speech database. As with the Dynamic Dialects accent chart (https://www.dynamicdialects.ac.uk/accent-chart/), there are limiting options and any combination of these can be selected. The page refreshes after each selection is made and the contents of the other drop-down lists vary depending on the option that is selected. As requested, there are 6 limiting options (accent, sex, age range, sound, articulation and position).
I created two ‘views’ of the data that are available in different tabs on the page. The first is ‘By Accent’ which lists all data by region. Within each region there is a table for each speaker with columns for the word that’s spoken and its corresponding sound, articulation and position. Users can press on a column heading to order the table by that column. Pressing again reverses the order. Note that this only affects the current table and not those of other speakers. Users can also press on the button in the ‘Word’ column to open a popup containing the video, which automatically plays. Pressing any part of the browser window outside of the popup closes the popup and stops the video, as does pressing on the ‘X’ icon in the top-right of the popup.
The ‘By Prompt’ tab presents exactly the same data, but arranged by the word that’s spoken rather than by accent. This allows you to quickly access the videos for all speakers if you’re interested in hearing a particular sound. Note that the limit options apply equally to both tabs and are ‘remembered’ if you switch from one tab to the other.
The main reason I created the two-tab layout is to give users the bi-directional access to video clips that the Dynamic Dialects Accent Chart offers without ending up with a table that is far too long for most screens, especially mobile screens. One thing I haven’t included yet is the option to view multiple video clips side by side. I remember this was discussed as a possibility some time ago but I need to discuss this further with the rest of the team to understand how they would like it to function. Below is a screenshot of the database, but note that the interface is still just a mockup and all elements such as the logo, fonts and colours will likely change before the site launches:
For the Speak For Yersel project I also created an initial project website using our latest mockup template and I migrated both sample activities over to the new site. At the moment the ‘Home’ and ‘About’ pages just have some sample blocks of text I’ve taken from SCOSYA. The ‘Activities’ page provides links to the ‘grammar’ and ‘click’ exercises which mostly work in the same way as in the old mockups with a couple of differences that took some time to implement.
Firstly, the ‘grammar’ exercise now features actual interactive maps throughout the various stages. These are the sample maps I created previously that feature large numbers of randomly positioned markers and local authority boundaries. I also added a ‘fullscreen’ option to the bottom-right of each map (the same as SCOSYA) to give people the option of viewing a larger version of the map. Here’s an example of how the grammar exercise now looks:
Also this week I gave some further advice to the students who are migrating the IJOSTS journal, fixed an issue with some data in the Old English Thesaurus for Jane Roberts and responded to an enquiry about the English Language Twitter account.
I continued to work on the Books and Borrowing project for a lot of this week, completing some of the tasks I began last week and working on some others. We ran out of server space for digitised page images last week, and although I freed up some space by deleting a bunch of images that were no longer required, we still have a lot of images to come. The team estimates that a further 11,575 images will be required. If the images we receive for these pages are comparable to the ones from the NLS, which average around 1.5Mb each, then 30Gb should give us plenty of space. However, after checking through the images we’ve received from other digitisation units it turns out that the NLS images are a bit of an outlier in terms of file size and generally 8-10Mb is more usual. If we use this as an estimate then we would maybe require 120Gb-130Gb of additional space. I did some experiments with resizing and changing the image quality of one of the larger images, managing to bring an 8.4Mb image down to 2.4Mb while still retaining its legibility. If we apply this approach to the tens of thousands of larger images we have then this would result in a considerable saving of storage. However, Stirling’s IT people very kindly offered to give us a further 150Gb of space for the images so this resampling process shouldn’t be needed for now at least.
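As a rough check on these figures, here is a minimal Python sketch of the storage arithmetic. It assumes 1Gb = 1000Mb and simply multiplies the image count by the average file sizes quoted above. The raw products come out a little below the 120Gb-130Gb mentioned, which presumably leaves some headroom, although that is not stated.

```python
IMAGES_EXPECTED = 11_575  # further images the team estimates will be required


def estimate_gb(image_count: int, mb_per_image: float) -> float:
    """Storage needed in Gb, assuming 1 Gb = 1000 Mb."""
    return image_count * mb_per_image / 1000


nls_sized = estimate_gb(IMAGES_EXPECTED, 1.5)    # NLS-sized images
typical_low = estimate_gb(IMAGES_EXPECTED, 8)    # lower end of the 8-10Mb range
typical_high = estimate_gb(IMAGES_EXPECTED, 10)  # upper end of the 8-10Mb range

# Ratio achieved in the single resampling experiment (8.4Mb down to 2.4Mb).
resample_ratio = 2.4 / 8.4

print(f"NLS-sized images: {nls_sized:.1f} Gb")
print(f"8-10Mb images: {typical_low:.1f} to {typical_high:.1f} Gb")
print(f"Resampled size as a share of the original: {resample_ratio:.0%}")
```

The resampling ratio comes from one test image only, so it should not be read as a reliable average for the whole collection.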
Another task for the project this week was to write a script to renumber the folio numbers for the 14 volumes from the Advocates Library that I noticed had irregular numbering. Each of the 14 volumes had different issues with their handwritten numbering, so I had to tailor my script to each volume in turn, and once the process was complete the folio numbers used to identify page images in the CMS (and eventually in the front-end) entirely matched the handwritten numbers for each volume.
My next task for the project was to import the records for several volumes from the Royal High School of Edinburgh but I ran into a bit of an issue. I had previously been intending to extract the ‘item’ column and create a book holding record and a single book item record for each distinct entry in the column. This would then be associated with all borrowing records in RHS that also feature this exact ‘item’. However, this is going to result in a lot of duplicate holding records due to the contents of the ‘item’ column including information about different volumes of a book and/or sometimes using different spellings.
For example, in SL137142 the book ‘Banier’s Mythology’ appears four times as follows (assuming ‘Banier’ and ‘Bannier’ are the same):
- Banier’s Mythology v. 1, 2
- Banier’s Mythology v. 1, 2
- Bannier’s Myth 4 vols
- Bannier’s Myth. Vol 3 & 4
My script would create one holding and item record for ‘Banier’s Mythology v. 1, 2’ and associate it with the first two borrowing records but the 3rd and 4th items above would end up generating two additional holding / item records which would then be associated with the 3rd and 4th borrowing records.
No script I can write (at least not without a huge amount of work) would be able to figure out that all four of these books are actually the same, or that there are actually 4 volumes for the one book, each requiring its own book item record, and that volumes 1 & 2 need to be associated with borrowing records 1 & 2 while all 4 volumes need to be associated with borrowing record 3 and volumes 3 & 4 need to be associated with borrowing record 4. I did wonder whether I might be able to automatically extract volume data from the ‘item’ column but there is just too much variation.
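The duplication problem can be reproduced with a minimal Python sketch that groups borrowing records by the exact text of the ‘item’ column. It is an illustration only, not the import script itself, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# The four 'item' values from the example above, in borrowing record order.
ITEMS = [
    "Banier’s Mythology v. 1, 2",
    "Banier’s Mythology v. 1, 2",
    "Bannier’s Myth 4 vols",
    "Bannier’s Myth. Vol 3 & 4",
]


def group_by_exact_item(items: List[str]) -> Dict[str, List[int]]:
    """Map each distinct 'item' string to the (1-based) borrowing
    records that contain exactly that string."""
    holdings: Dict[str, List[int]] = defaultdict(list)
    for record_number, item in enumerate(items, start=1):
        holdings[item].append(record_number)
    return dict(holdings)


holdings = group_by_exact_item(ITEMS)
for title, records in holdings.items():
    print(f"{title!r} -> borrowing records {records}")
```

Exact matching yields three separate holdings for what is really one book, and nothing in the output records which volumes each borrowing involved.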
We’re going to have to tackle the normalisation of book holding names and the generation of all required book items for volumes at some point and this either needs to be done prior to ingest via the spreadsheets or after ingest via the CMS.
My feeling is that it might be simpler to do it via the spreadsheets before I import the data. If we were to do this then the ‘Item’ column would become the ‘original title’ and we’d need two further columns, one for the ‘standardised title’ and one listing the volumes, consisting of the number of each volume separated by commas. With the above examples we would end up with the following (with a | representing a column division):
- Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
- Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
- Bannier’s Myth 4 vols | Banier’s Mythology | 1,2,3,4
- Bannier’s Myth. Vol 3 & 4 | Banier’s Mythology | 3,4
If each sheet of the spreadsheet is ordered alphabetically by the ‘item’ column it might not take too long to add in this information. The additional fields could also be omitted where the ‘item’ column has no volumes or different spellings. E.g. ‘Hederici Lexicon’ may be fine as it is. If the ‘standardised title’ and ‘volumes’ columns are left blank in this case then when my script reaches the record it will know to use ‘Hederici Lexicon’ as both original and standardised titles and to generate one single unnumbered book item record for it. We agreed that normalising the data prior to ingest would be the best approach and I will therefore wait until I receive updated data before I proceed further with this.
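The proposed layout and its fallback rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: rows are written with the | notation used above rather than read from a real spreadsheet, and the names `ParsedRow`, `parse_row` and `build_holdings` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ParsedRow:
    original_title: str
    standardised_title: str
    volumes: List[Optional[int]]  # [None] stands for one unnumbered book item


def parse_row(line: str) -> ParsedRow:
    """Split 'original | standardised | volumes', applying the fallbacks
    described above when the last two columns are blank or absent."""
    cells = [cell.strip() for cell in line.split("|")]
    cells += [""] * (3 - len(cells))  # pad rows that omit the extra columns
    original, standardised, volumes = cells[:3]
    volume_numbers: List[Optional[int]] = [
        int(v) for v in volumes.split(",") if v.strip()
    ]
    return ParsedRow(
        original_title=original,
        standardised_title=standardised or original,
        volumes=volume_numbers or [None],
    )


def build_holdings(lines: List[str]) -> Dict[str, Set[Optional[int]]]:
    """One holding per standardised title, with the set of book items
    (volumes) needed across all of its borrowing records."""
    holdings: Dict[str, Set[Optional[int]]] = {}
    for line in lines:
        row = parse_row(line)
        holdings.setdefault(row.standardised_title, set()).update(row.volumes)
    return holdings


ROWS = [
    "Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2",
    "Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2",
    "Bannier’s Myth 4 vols | Banier’s Mythology | 1,2,3,4",
    "Bannier’s Myth. Vol 3 & 4 | Banier’s Mythology | 3,4",
    "Hederici Lexicon",
]

holdings = build_holdings(ROWS)
```

With the standardised columns filled in, the four ‘Banier’ rows collapse into one holding with four book items, while ‘Hederici Lexicon’ gets a single unnumbered item. Each parsed row still carries its own volume list, which is what would link a borrowing record to the right book items.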
Also this week I generated a new version of a spreadsheet containing the records for one register for Gerry McKeever, who wanted borrowers, book items and book holding details to be included in addition to the main borrowing record. I also made a pretty major update to the CMS to enable books and borrower listings for a library to be filtered by year of borrowing in addition to filtering by register. Users can either limit the data by register or year (not both). They need to ensure the register drop-down is empty for the year filter to work, otherwise the selected register will be used as the filter. On either the ‘books’ or ‘borrowers’ tab in the year box they can add either a single year (e.g. 1774) or a range (e.g. 1770-1779). Then when ‘Go’ is pressed the data displayed is limited to the year or years entered. This also includes the figures in the ‘borrowing records’ and ‘Total borrowed items’ columns. Also, the borrowing records listed when a related pop-up is opened will only feature those in the selected years.
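The filtering rules can be summarised with a minimal Python sketch. The function names and the handling of invalid input (malformed text or a reversed range is ignored) are assumptions, as the CMS code itself is not shown here.

```python
import re
from typing import Optional, Tuple

# Accepts a single year ("1774") or a range ("1770-1779").
YEAR_FILTER = re.compile(r"^\s*(\d{4})\s*(?:-\s*(\d{4})\s*)?$")


def parse_year_filter(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for a valid year or range, else None."""
    match = YEAR_FILTER.match(text)
    if not match:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    if end < start:  # assumption: a reversed range is treated as invalid
        return None
    return start, end


def choose_filter(register_id: Optional[str], year_text: str):
    """A selected register takes precedence, so the year box only
    applies when the register drop-down is empty."""
    if register_id:
        return ("register", register_id)
    years = parse_year_filter(year_text)
    if years:
        return ("years", years)
    return ("none", None)


print(choose_filter("", "1774"))
print(choose_filter("", "1770-1779"))
print(choose_filter("12", "1770-1779"))
```

Treating a single year as a range with equal start and end means the same comparison can be used for both cases when limiting the borrowing records.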
I also worked with Raymond in Arts IT Support and Geert, the editor of the Anglo-Norman Dictionary, to complete the process of migrating the AND website to the new server. The website (https://anglo-norman.net/) is now hosted on the new server and is considerably faster than it was previously. We also took the opportunity to launch the Anglo-Norman Textbase, which I had developed extensively a few months ago. Searching and browsing can be found here: https://anglo-norman.net/textbase/ and this marks the final major item in my overhaul of the AND resource.
My last major task of the week was to start work on a database of ultrasound video files for the Speech Star project. I received a spreadsheet of metadata and the video files from Eleanor this week and began processing everything. I wrote a script to export the metadata into a three-table related database (speakers, prompts and individual videos of speakers saying the prompts) and began work on the front-end through which this database and the associated video files will be accessed. I’ll be continuing with this next week.
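A three-table structure of this kind can be sketched with Python's built-in sqlite3 module. Only the three entities (speakers, prompts and videos) come from the description above. The column names, sample values and database engine are assumptions for illustration.

```python
import sqlite3

# Hypothetical columns: the source only names the three tables.
SCHEMA = """
CREATE TABLE speakers (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE prompts (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL
);
CREATE TABLE videos (
    id         INTEGER PRIMARY KEY,
    speaker_id INTEGER NOT NULL REFERENCES speakers(id),
    prompt_id  INTEGER NOT NULL REFERENCES prompts(id),
    filename   TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one invented video record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO speakers (id, label) VALUES (1, 'speaker-01')")
    conn.execute("INSERT INTO prompts (id, word) VALUES (1, 'example')")
    conn.execute(
        "INSERT INTO videos (speaker_id, prompt_id, filename) "
        "VALUES (1, 1, 'speaker-01_example.mp4')"
    )
    return conn


def videos_for_prompt(conn: sqlite3.Connection, word: str):
    """List (speaker label, filename) pairs for every video of a prompt."""
    return conn.execute(
        """
        SELECT speakers.label, videos.filename
        FROM videos
        JOIN speakers ON speakers.id = videos.speaker_id
        JOIN prompts ON prompts.id = videos.prompt_id
        WHERE prompts.word = ?
        ORDER BY speakers.label
        """,
        (word,),
    ).fetchall()


conn = build_demo_db()
print(videos_for_prompt(conn, "example"))
```

Keeping each video in its own row, linked to one speaker and one prompt, lets the same data be listed either by speaker or by prompt without duplicating the metadata.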
In addition to the above I also gave some advice to the students who are migrating the IJOSTS journal over to WordPress, had a chat with the DSL people about when we’ll make the switch to the new API and data, set up a WordPress site for Joanna Kopaczyk for the International Conference on Middle English, upgraded all of the WordPress sites I manage to the latest version of WordPress, made a few tweaks to the 17th Century Symposium website for Roslyn Potter, spoke to Kate Simpson in Information Studies about speaking to her Digital Humanities students about what I do, and arranged for server space to be set up for the Speak For Yersel project website and the Speech Star project website. I also helped launch the new Burns website: https://burnsc21-letters-poems.glasgow.ac.uk/ and updated the existing Burns website to link into it via new top-level tabs. So a pretty busy week!