Week Beginning 13th June 2022

I worked on several different projects this week.  For the Books and Borrowing project I processed and imported a further register for the Advocates Library that had been digitised by the NLS.  I also continued with the interactive map of Chambers library borrowers, although I couldn’t spend as much time on this as I’d hoped as my access to Stirling University’s VPN had stopped working, and without VPN access I can’t connect to the database and the project server.  It took a while to resolve the issue as access needs to be approved by some manager or other, but once it was sorted I got to work on some updates.

One thing I’d noticed last week was that when zooming and panning, the historical map layer was throwing out hundreds of 403 Forbidden errors to the browser console.  This was not having any impact on the user experience, but it was still a bit messy and I wanted to get to the bottom of the issue.  I had a very helpful (as always) chat with Chris Fleet at NLS Maps, who provided the historical map layer, and he reckoned it was because the historical map only covers a certain area and moving beyond this was still sending requests for map tiles that didn’t exist.  Thankfully an option exists in Leaflet that allows you to set the boundaries for a map layer (https://leafletjs.com/reference.html#latlngbounds) and I updated the code to do just that, which seems to have stopped the errors.
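
For reference, restricting a tile layer to its bounds in Leaflet looks roughly like the following sketch (the tile URL and coordinates here are placeholders rather than the real NLS values):

  var historicalBounds = L.latLngBounds(
      L.latLng(54.6, -7.9),   // south-west corner (placeholder values)
      L.latLng(60.9, -0.7)    // north-east corner (placeholder values)
  );
  var historicalLayer = L.tileLayer('https://example.org/tiles/{z}/{x}/{y}.png', {
      bounds: historicalBounds,  // Leaflet only requests tiles inside this box
      maxZoom: 18
  });
  historicalLayer.addTo(map);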

I then returned to the occupations categorisation, which included far too many options.  I therefore streamlined the occupations, displaying the top-level occupation only.  I think this works a lot better (although I need to change the icon colour for ‘unknown’).  Full occupation information is still available for each borrower via the popup.

I also had to change the range slider for opacity, as standard HTML range sliders don’t allow for double-ended ranges.  We require a double-ended range for the subscription period and I didn’t want to have two range sliders that looked different on one page.  I therefore switched to a range slider offered by the jQuery UI library (https://jqueryui.com/slider/#range).  The opacity slider still works as before, it just looks a little different.  Actually, it works better than before, as the opacity now changes as you slide rather than only updating after you mouse-up.
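
For the opacity control, a single-handled jQuery UI slider that updates the layer as the handle moves looks something like this sketch (the element ID is made up and the real code will differ):

  $('#opacity-slider').slider({
      min: 0,
      max: 100,
      value: 80,
      slide: function (event, ui) {
          // update the opacity as the handle moves rather than waiting for mouse-up
          historicalLayer.setOpacity(ui.value / 100);
      }
  });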

I then began to implement the subscription period slider, although it does not yet update the data.  It’s been pretty tricky to implement.  The range needs to be dynamically generated based on the earliest and latest dates in the data, and dates consist of both a year and a month, which need to be converted into plain integers for the slider and then reinterpreted as years and months when the user updates the end positions.  I think I’ve got this working as it should, though.  When you update the ends of the slider the text above that lists the months and years updates to reflect this.  The next step will be to actually filter the data based on the chosen period.  Here’s a screenshot of the map featuring data categorised by the new streamlined occupations, with the new sliders displayed:
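
The year/month conversion mentioned above essentially treats each month after the earliest date as one slider step, along these lines (a simplified sketch; the start date here is invented):

  // assume (for illustration) the earliest date in the data is January 1828
  var startYear = 1828, startMonth = 1;

  function dateToStep(year, month) {
      return (year - startYear) * 12 + (month - startMonth);
  }

  function stepToDate(step) {
      var months = step + (startMonth - 1);
      return { year: startYear + Math.floor(months / 12), month: (months % 12) + 1 };
  }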

For the Speak For Yersel project I made a number of tweaks to the resource, which Jennifer and Mary are piloting with school children in the North East this week.  I added in a new grammatical question and seven grammatical quiz questions.  I tweaked the homepage text and updated the structure of questions 27-29 of the ‘sound about right’ activity.  I ensured that ‘Dumfries’ always appears as ‘Dumfries and Galloway’ in the ‘clever’ activity and follow-on and updated the ‘clever’ activity to remove the stereotype questions.  These were the ones where users had to rate the speakers from a region without first listening to any audio clips and Jennifer reckoned these were taking too long to complete.  I also updated the ‘clever’ follow-on to hide the stereotype options and switched the order of the listener and speaker options in the other follow-on activity for this type.

For the Speech Star project I replaced the data for the child speech error database with a new, expanded dataset and added in ‘Speaker Code’ as a filter option.  I also replicated the child speech and normalised speech databases from the clinical website we’re creating on the more academic teaching site, and pulled the IPA chart from Seeing Speech into this resource too.  Here’s a screenshot of how the child speech error database looks with the new ‘speaker code’ filter and ‘vowel disorder’ selected:

I also responded to Craig Lamont in Scottish literature with some further feedback on the structure of his Burns Manuscript Database spreadsheet, which is now shaping up nicely.  Craig had also sent me an updated spreadsheet with data for the Ramsay Gentle Shepherd performances project.  I’d set this up (interactive map, timeline and filterable tabular data) a few weeks ago, migrating it to the University’s T4 website management system.  All had worked then, but when I logged into T4 and previewed the page I’d previously created I discovered it no longer worked.  The page hadn’t been updated since the end of May and I had no idea what had gone wrong.  I can only assume that the linked content (i.e. the links to the JavaScript files) had somehow become unlinked.  I decided, therefore, that it would be easier to just host the JavaScript files on another server I have direct access to rather than having to shoehorn it all into T4.  I made an updated version with the new dataset and this is working well.

I also made a couple of tweaks to the DSL this week, installing the TablePress plugin for the ancillary pages and creating a further alternative logo for the DSL’s Facebook posts.  I also returned to doing some work for the Anglo-Norman Dictionary, offering some advice to the editor Geert about incorporating publications and overhauling how cross references are displayed in the Dictionary Management System.

I updated the ‘View Entry’ page in the DMS.  Previously it only included cross references FROM the entry you’re looking at TO any other entries.  I.e. it only displayed content when the entry was of type ‘xref’ rather than ‘main’.  Now in addition to this there’s a further section listing all cross references TO the entry you’re looking at from any entry of type ‘xref’ that links to it.

In addition there is a button allowing you to view all entries that include a cross reference to the current entry anywhere in their XML – i.e. where an <xref> tag that features the current entry’s slug is found at any level in any other main entry’s XML.  This code is hugely memory intensive to run, as basically all 27,464 main entries need to be pulled into the script, with the full XML contents of each checked for matching xrefs.  For this reason the page doesn’t run the code each time the ‘view entry’ page is loaded but instead only runs when you actively press the button.  It takes a few seconds for the script to process, but after it does the cross references are listed in the same manner as the ‘pure’ xrefs in the preceding sections.
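
In essence the button runs a check along these lines over every main entry’s XML (a rough JavaScript sketch of the idea only; the real DMS code is server-side and works directly against the database):

  // return every main entry whose XML contains an <xref> pointing at the current slug
  // (assumes the slug contains no regex special characters)
  function findReferringEntries(mainEntries, currentSlug) {
      return mainEntries.filter(function (entry) {
          // crude test: an <xref> tag with the current slug quoted inside it
          var pattern = new RegExp('<xref[^>]*["\']' + currentSlug + '["\']');
          return pattern.test(entry.xml);
      });
  }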

Finally, I participated in a Zoom-based focus group for the AHRC about the role of technicians in research projects this week.  It was great to take part, to share my views on my role and to hear from other people with similar roles at other organisations.

Week Beginning 30th May 2022

It was a three-day week as Thursday and Friday were bank holidays for the Queen’s Platinum Jubilee.  I spent most of the available time working on the Books and Borrowing project.  I had a chat with RA Alex Deans about the data for the Chambers Library sub-project that we’re hoping to launch in July.  Although this data is already in the system it needs additional latitude and longitude data so we can position borrowers on an interactive map.  We decided to add this data and some other data using the ‘additional fields’ system in the CMS and Alex is hopefully going to get this done by next week.

I’d made a start on the API for the project last week, and this week I completed the endpoint that displays all of the data that will be needed for the ‘Browse Libraries’ page, which can be accessed as JSON or CSV data.  This includes counts of registers, borrowing records, books and borrowers plus a breakdown of the number of borrowings per year at each library that will be used for the stacked column chart.  The systems reside on servers at Stirling University, and their setup has the database on a different server to the code.  This means there is an overhead when sending queries to the database as each one needs to be sent as an HTTP request rather than dealt with locally.  This has led me to be a bit more efficient when constructing queries.  For example, rather than running individual ‘count’ queries for each library after running an initial query to retrieve all library details I’ve instead used subqueries as part of the initial query so all the data including the counts gets processed and returned by the database via one HTTP request.
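
As a rough illustration of the approach (with invented table and column names rather than the project’s real schema), the per-library counts are folded into the main query as subqueries so that everything comes back in a single request:

  // one round trip to the database instead of a separate 'count' query per library
  const sql = `
      SELECT l.library_id, l.library_name,
          (SELECT COUNT(*) FROM registers r WHERE r.library_id = l.library_id) AS register_count,
          (SELECT COUNT(*) FROM borrowings b WHERE b.library_id = l.library_id) AS borrowing_count
      FROM libraries l`;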

With the data retrieval aspects of the ‘browse libraries’ page completed I then moved on to developing the page itself.  It has an introductory section (with placeholder text for now) then a map showing the locations of the libraries.  Any libraries that currently have lat/lng data appear on this map.  The markers are clustered when zoomed out, with the number referring to the number of libraries in the cluster.  I selected a map design that I thought fitted in with the site, but this might change, and I used an open book icon for the library map marker on a red background (to match the site’s header text colour) and again this may change.  You can hover over a marker to see the library name and press on a marker to open a popup containing a link to the library, the library name and alternative names, location, foundation date, type and statistics about registers, books, borrowers and records.
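
The map itself follows the standard Leaflet.markercluster pattern, roughly as in this sketch (the data source, icon and popup-building function are placeholders):

  var clusterGroup = L.markerClusterGroup();
  libraries.forEach(function (lib) {
      if (lib.lat && lib.lng) {
          var marker = L.marker([lib.lat, lib.lng], { icon: bookIcon });
          marker.bindTooltip(lib.name);              // library name on hover
          marker.bindPopup(buildLibraryPopup(lib));  // link, alternative names, stats etc.
          clusterGroup.addLayer(marker);
      }
  });
  map.addLayer(clusterGroup);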

Beneath the map is a tabular view of the data.  This is the exact same data as is found on the map.  Library names are buttons leading to the library’s page.  You can change the order of the table by pressing on a heading (e.g. to see which library has the most books).  Pressing a second time reverses the order.  Below is a screenshot showing the map and the table, with the table ordered by number of borrowing records:

Beneath the table is a stacked column chart showing borrowings at the libraries over time that I created using the extremely useful HighCharts JavaScript library (see https://www.highcharts.com/demo).  At the moment the borrowing records start somewhere between 1700 and 1710 and end somewhere between 1890 and 1899.  Actually, there are some borrowing records beyond even this, but these are presumably mistakes (e.g. one had a year of ‘179’ or something like that).  As generating a graph with a bar for each year would result in about 200 bars I decided this wasn’t feasible and instead grouped borrowings into decades.  This sort of works, but we still have many decades at the start and end that only have a few records, though we may limit the decades we focus on.  We’re also visualising the data from 18 libraries in the chart, which is a lot, and the legend for these takes up a lot of space under the chart (where you can hover over a name to highlight the data in the bars).  However, you can open the menu to view the chart full screen, which makes it more legible.  You can also view the year data in a table by selecting the ‘data table’ option.  Below is a screenshot of the bar chart:

There are a couple of things I could do to make this more legible if required.  Firstly, we could use a stacked bar chart instead (https://www.highcharts.com/demo/bar-stacked).  The years would then be on the y-axis and we could have a very long chart with all of the years in place rather than aggregating to decades.  This would make it more difficult to view the legend and the x-axis tick marks, as you would need to scroll down to see them.  Secondly, we could stick with the decade view but then give the user the option of selecting a decade to view a new chart featuring the individual years in that decade.  This would make it harder for users to get the big picture all at once, although I guess the decade view would give that.
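
For reference, the decade-based chart described above is a fairly standard HighCharts stacked column configuration, roughly like this (a sketch with made-up figures and a placeholder container ID; the real chart is built from the API output):

  Highcharts.chart('borrowings-chart', {
      chart: { type: 'column' },
      title: { text: 'Borrowings per decade' },
      xAxis: { categories: ['1750s', '1760s', '1770s'] },  // decades from the data
      yAxis: { min: 0, title: { text: 'Borrowing records' } },
      plotOptions: { column: { stacking: 'normal' } },     // stack the libraries in each column
      series: [
          { name: 'Library A', data: [120, 340, 510] },
          { name: 'Library B', data: [80, 150, 290] }
      ]
  });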

Also this week I checked up on the Speak For Yersel website, as we had sent the URL out to people with an interest in the Scots language at the end of last week.  When I checked on Wednesday we’d had 168 registered users.  These users had submitted 8,110 answers for the main questions plus 85 for the ‘drag onto map’ and 85 for the transcript.  606 of those main answers are from people who have chosen ‘outside Scotland’.  I also realised that I’d set the markers to be smaller if there were more than 100 answers on a map but the markers looked too small so I’ve updated things to make them the same size no matter how many answers there are.

My other main task for the week was to finalise the transfer of the Uist Saints website.  We managed to get the domain name ownership transferred over to Glasgow and paid the subscription fee for the next nine years and the version of the site hosted at Glasgow can now be found here: https://uistsaints.co.uk/

Week Beginning 23rd May 2022

I’d completed all of the outstanding tasks for ‘Speak For Yersel’ last week so this week I turned my attention to several other projects.  For the Books and Borrowing project I wrote a script to strip out duplicate author records from the data and reassign any books associated with the duplicates to the genuine author records.  The script iterated through each author in the ‘duplicates’ spreadsheet, found all rows where the ‘AID’ did not match the ‘AID to keep’ column, reassigned any book author records from the former to the latter and then deleted the author record.  The script deleted 310 duplicate authors and reassigned 735 books to other authors, making the data in the content management system a lot cleaner.

I then migrated the Uist Saints website to a server at Glasgow and got everything working at a temporary URL.  All looked fine to me, although there was an issue with the homepage that needed investigating.  This issue was present on the live site too, resulting in the page content cutting off and displaying a lot of blank space and no footer, with lots of errors being displayed in the console.  I did some investigation into the errors and discovered that these were being caused by some JavaScript embedded in the homepage that had been treated like HTML by WordPress, which had added HTML line breaks (<br>) wherever there was a line break in the code, thereby breaking the JavaScript.  I updated the page to strip out all of the <br> tags and it now loads without any errors in the console, but whatever the JavaScript is supposed to be doing still isn’t working and there’s still a huge expanse of empty space and then no footer.

The JavaScript appears to be attempting to display a map using the Leaflet mapping library, but using some sort of WordPress plugin to do so.  There are over 3000 lines of JavaScript code in the page, which is really crazy.  Every single marker on the map (e.g. “Cladh Choinnich (burial ground and site of chapel)” at [57.157715,-7.301283]) has its own script comprising around 70 lines of code.  Sofia, the project RA, looked at the page and decided to try deleting the blocks of JavaScript, and this then seemed to solve the problem, which was great, as I was thinking I’d need to create a new map after somehow extracting all of the data.

I then moved on to the Ramsay ‘Gentle Shepherd’ data, and this week tackled the issue of importing the code I’d written into the University website’s T4 content management system.  I created a ‘one file’ version of the page that has everything incorporated in one single file – all the scripts, the data and the styles.  I was hoping I’d then be able to just upload this to T4 but I ran into a problem:

I selected the ‘Standard plain’ content type as I did for the Enlightenment map I created in T4 many years ago, but the ‘content’ box can only accept a maximum of 80,000 characters.  My ‘one file’ approach is around 404,000 characters so I can’t upload it.  I then wondered about using separate files, as I had done with the Enlightenment map, but the JSON data for the performances on its own is over 227,000 characters.  This data needs to be a single thing and can’t be split up into smaller chunks (at least not without having to stitch the data back together in the JavaScript before it can be used every time someone loads the page, which would have an impact on the speed of the page).

I noticed that the Enlightenment map has a further content type called ‘_blank’ that isn’t available to me in the section where the performance data is to go.  This type allows up to 150,000 characters.  Unfortunately this is still not big enough.  The Leaflet JavaScript library, which I also need to upload, is 141,000 characters so currently can’t be uploaded either.  I then looked into uploading the JSON data as a media file and I managed to upload it, but apparently media files only become active in the system when they are linked to from a T4 page using T4’s method of linking to a file.  The JSON file would only ever be loaded in via an AJAX call from the JavaScript code so would never work.  However, I did realise that I could upload the JavaScript file with the JSON data stored directly within it as a media file and then link to this (and also the Leaflet JavaScript file and the CSS files) from the T4 HTML file.  However, this wouldn’t work when using regular HTML tags to link to scripts and CSS files as T4 only activates media files when linked to using its own special way of inserting links.

A helpful guy called Rick in the Web Team suggested using the ‘standard’ content type and T4’s way of linking to files to get things working, and this did sort of work, but while the ‘standard’ content type allows you to manually edit the HTML, T4 then processes any HTML you enter, which included stripping out a lot of tags my code needed and overwriting other HTML tags, which was very frustrating.

However, I was able to view the source for the embedded media files in this template and then copy this into my ‘standard plain’ section, and this seems to have worked.  There were other issues, though, such as T4 applying its CSS styles AFTER any locally created styles, meaning a lot of my custom styles were being overwritten.  I managed to find a way around this and the section of the page is now working if you preview it in T4.

Unfortunately, to get this to work the JSON data needed to be embedded in the JavaScript file rather than loaded in as a separate file.  This is going to make it more difficult for non-technical people to edit the data directly in T4.  In order to do so someone would need to:

  1. Download the ‘gspCode’ file in the Media Library, which T4 unhelpfully converts into a .txt file.
  2. Rename the file to remove the .txt extension (so it ends in .js instead).
  3. Find the data array in the file and make the changes to it.
  4. Validate the data in the handy JSON validator https://jsonlint.com/.
  5. Save the JS file and upload it as a replacement for the item in the Media Library.

With all of this out of the way I was hoping to begin work on the API and front-end for the Books and Borrowing project, and I did manage to make a start on this.  However, many further tweaks and updates came through from Jennifer Smith for the Speak For Yersel system, which we’re intending to send out to selected people next week, and I ended up spending most of the rest of the week on this project instead.  This included several Zoom calls and implementing countless minor tweaks to the website content, including homepage text, updating quiz questions and answer options, help text, summary text, replacing images, changing styles and other such things.  I also updated the maps to set their height dynamically based on the height of the browser window, ensuring that the map and the button beneath it are visible without scrolling (but also including a minimum height so the map never gets too small).  I also made the maps wider and the question area narrower as there was previously quite a lot of wasted space when there was a 50/50 split between the two.
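
Setting the map height from the browser window amounts to a small calculation on load and resize, something like this sketch (the element ID, offset and minimum height are illustrative only):

  function resizeMap() {
      // leave room for the header and the button beneath the map, but never go below 400px
      var available = window.innerHeight - 250;
      $('#map').height(Math.max(available, 400));
      map.invalidateSize();  // tell Leaflet the container size has changed
  }
  $(window).on('resize', resizeMap);
  resizeMap();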

I also fixed a bug with the slider-based questions that was only affecting Safari and prevented the ‘next’ button from activating.  This was because the code listening for the slider changing was triggered when a slider was clicked on, but for it to work in Safari the event needed to be ‘change’ rather than ‘click’.  I also added in the new dictionary-based question type and added in the questions, although we then took these out again for now as we’d promised the DSL that the embedded school dictionary would only be used by the school children in our pilot.  I also added a question about whether the user has been to university to the registration page and then cleared out all of the sample data and users that we’d created during our testing before actual users begin using the resource next week.
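
The Safari fix mentioned above was essentially just a change of event name, i.e. something like the following (selectors here are illustrative):

  // 'click' never fires reliably on a range input in Safari when the handle is dragged,
  // so listen for 'change' instead
  $('.slider-question input[type="range"]').on('change', function () {
      $(this).closest('.question').find('.next-button').prop('disabled', false);
  });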

Week Beginning 16th May 2022

This week I finished off all of the outstanding work for the Speak For Yersel project.  The other members of the team (Jennifer and Mary) are both on holiday so I finished off all of the tasks I had on my ‘to do’ list, although there will certainly be more to do once they are both back at work again.  The tasks I completed were a mixture of small tweaks and larger implementations.  I made tweaks to the ‘About’ page text and changed the intro text to the ‘more give your word’ exercise.  I then updated the age maps for this exercise, which proved to be pretty tricky and time-consuming to implement as I needed to pull apart a lot of the existing code.  Previously these maps showed ‘60+’ and ‘under 19’ data for a question, with different colour markers for each age group showing those who would say a term (e.g. ‘Scunnered’) and grey markers for each age group showing those who didn’t say the term.  We have completely changed the approach now.  The maps now default to showing ‘under 19’ data only, with different colours for each different term.  There is now an option in the map legend to switch to viewing the ‘60+’ data instead.  I added in the text ‘press to view’ to try and make it clearer that you can change the map.  Here’s a screenshot:

I also updated the ‘give your word’ follow-on questions so that they are now rated in a new final page that works the same way as the main quiz.  In the main ‘give your word’ exercise I updated the quiz intro text and I ensured that the ‘darker dots’ explanatory text has now been removed for all maps.  I tweaked a few questions to change their text or the number of answers that are selectable and I changed the ‘sounds about right’ follow-on ‘rule’ text and made all of the ‘rule’ words lower case.  I also made it so that when the user presses ‘check answers’ for this exercise a score is displayed to the right and the user is able to proceed directly to the next section without having to correct their answers.  They still can correct their answers if they want.

I then made some changes to the ‘She sounds really clever’ follow-on.  The index for this is now split into two sections, one for ‘stereotype’ data and one for ‘rating speaker’ data and you can view the speaker and speaker/listener results for both types of data.  I added in the option of having different explanatory text for each of the four perception pages (or maybe just two – one for stereotype data, one for speaker ratings) and when viewing the speaker rating data the speaker sound clips now appear beneath the map.  When viewing the speaker rating data the titles above the sliders are slightly different.  Currently when selecting the ‘speaker’ view the title is “This speaker from X sounds…” as opposed to “People from X sound…”.  When selecting the ‘speaker/listener’ view the title is “People from Y think this speaker from X sounds…” as opposed to “People from Y think people from X sound…”.  I also added a ‘back’ button to these perception follow-on pages so it’s easier to choose a different page.  Finally, I added some missing HTML <title> tags to pages (e.g. ‘Register’ and ‘Privacy’) and fixed a bug whereby the ‘explore more’ map sound clips weren’t working.

With my ‘Speak For Yersel’ tasks out of the way I could spend some time looking at other projects that I’d put on hold for a while.  A while back Eleanor Lawson contacted me about adding a new section to the Seeing Speech website where Gaelic speaker videos and data will be accessible, and I completed a first version this week.  I replicated the Speech Star layout rather than the /r/ & /l/ page layout as it seemed more suitable: the latter only really works for a limited number of records while the former works well with lots more (there are about 150 Gaelic records).  What this means is the data has a tabular layout and filter options.  As with Speech Star you can apply multiple filters and you can order the table by a column by clicking on its header (clicking a second time reverses the order).  I’ve also included the option to open multiple videos in the same window.  I haven’t included the playback speed options as the videos already include the clip at different speeds.  Here’s a screenshot of how the feature looks:

On Thursday I had a Zoom call with Laura Rattray and Ailsa Boyd to discuss a new digital edition project they are in the process of planning.  We had a really great meeting and their project has a lot of potential.  I’ve offered to give technical advice and write any technical aspects of the proposal as and when required, and their plan is to submit the proposal in the autumn.

My final major task for the week was to continue to work on the Ramsay ‘Gentle Shepherd’ data.  I overhauled the filter options that I implemented last week so they now work in a less confusing way when multiple types are selected.  I’ve also imported the updated spreadsheet, taking the opportunity to trim whitespace to cut down on strange duplicates in the filter options.  There are some typos that will need to be fixed in the spreadsheet, though (e.g. we have ‘Glagsgow’ and ‘Glagsow’), plus some dates still need to be fixed.

I then created an interactive map for the project and have incorporated the data for which there are latitude and longitude values.  As with the Edinburgh Gazetteer map of reform societies (https://edinburghgazetteer.glasgow.ac.uk/map-of-reform-societies/) the number of performances at a venue is displayed in the map marker.  Hover over a marker to see info about the venue.  Click on it to open a list of performances.  Note that when zoomed out it can be difficult to make out individual markers, but we can’t really use clustering as on the Burns Supper map (https://burnsc21.glasgow.ac.uk/supper-map/) because this would get confusing:  we’d have clustered numbers representing the number of markers in a cluster and then individual markers with a number representing the number of performances.  I guess we could remove the number of performances from the marker and just have this in the tooltip and / or popup, but it is quite useful to see all the numbers on the map.  Here’s a screenshot of how the map currently looks:

I still need to migrate all of this to the University’s T4 system, which I aim to tackle next week.

Also this week I had discussions about migrating an externally hosted project website to Glasgow for Thomas Clancy.  I received a copy of the files and database for the website and have checked over things and all is looking good.  I also submitted a request for a temporary domain and I should be able to get a version of the site up and running next week.  I also regenerated a list of possible duplicate authors in the Books and Borrowing system after the team had carried out some work to remove duplicates.  I will be able to use the spreadsheet I have now to amalgamate duplicate authors, a task which I will tackle next week.

Week Beginning 9th May 2022

I spent most of the week continuing with the Speak For Yersel website, which is now nearing completion.  A lot of my time was spent tweaking things that were already in place, and we had a Zoom call on Wednesday to discuss various matters too.  I updated the ‘explore more’ age maps so they now include markers for young and old who didn’t select ‘scunnered’, meaning people can get an idea of the totals.  I also changed the labels slightly and the new data types have been given two shades of grey and smaller markers, so the data is there but doesn’t catch the eye as much as the data for the selected term.  I’ve updated the lexical ‘explore more’ maps so they now actually have labels and the ‘darker dots’ text (which didn’t make much sense for many maps) has been removed.  Kinship terms now allow for two answers rather than one, which took some time to implement in order to differentiate this question type from the existing ‘up to 3 terms’ option.  I also updated some of the pictures that are used and added in an ‘other’ option to some questions.  I also updated the ‘Sounds about right’ quiz maps so that they display different legends that match the question words rather than the original questionnaire options.  I needed to add in some manual overrides to the scripts that generate the data for use in the site for this to work.

I also added in proper text to the homepage and ‘about’ page.  The former included a series of quotes above some paragraphs of text and I wrote a little script that highlighted each quote in turn, which looked rather nice.  This then led onto the idea of having the quotes positioned on a map on the homepage instead, with different quotes in different places around Scotland.  I therefore created an animated GIF based on some static map images that Mary had created and this looks pretty good.

I then spent some time researching geographical word clouds, which we had been hoping to incorporate into the site.  After much Googling it would appear that there is no existing solution that does what we want, i.e. take a geographical area and use this as the boundaries for a word cloud, featuring different coloured words arranged at various angles and sizes to cover the area.  One potential solution that I was pinning my hopes on was this one: https://github.com/JohnHenryEden/MapToWordCloud which promisingly states “Turn GeoJson polygon data into wordcloud picture of similar shape.”.  I managed to get the demo code to run, but I can’t get it to actually display a word cloud, even though the specifications for one are in the code.  I’ve tried investigating the code but I can’t figure out what’s going wrong.  No errors are thrown and there’s very little documentation.  All that happens is a map with a polygon area is displayed – no word cloud.

The word cloud aspects of the above are based on another package here: https://npm.io/package/wordcloud and this package allows you to specify a shape to use as an outline for the cloud, and one of the examples shows words taking up the shape of Taiwan: https://wordcloud2-js.timdream.org/#taiwan  However, this is a static image, not an interactive map – you can’t zoom into it or pan around it.  One possible solution may be to create images of our regions, generate static word cloud images as with the above and then stitch the images together to form a single static map of Scotland.  This would be a static image, though, and not comparable to the interactive maps we use elsewhere in the website.  Programmatically stitching the individual region images together might also be quite tricky.  I guess another option would be to just allow users to select an individual region and view the static word cloud (dynamically generated based on the data available when the user selects to view it) for the selected region, rather than joining them all together.

I also looked at some further options that Mary had tracked down.  The word cloud on a leaflet map (http://hourann.com/2014/js-devs-dont-get-lost/leaflet-wordcloud.html?sydney) only uses a circle for the boundaries of the word cloud.  All of the code is written around the use of a circle (e.g. using diameters to work out placement) so couldn’t really be adapted to work with a complex polygon.  We could work out a central point for each region and have a circular word cloud positioned at that point, but we wouldn’t be able to make the words fill the entire region.  The second of Mary’s links (https://www.jasondavies.com/wordcloud/) as far as I can tell is just a standard word cloud generator with no geographical options.  The third option (https://github.com/peterschretlen/leaflet-wordcloud) has no demo or screenshot or much information about it and I’m afraid I can’t get it to work.

The final option (https://dagjomar.github.io/Leaflet.ParallaxMarker/) is pretty cool but it’s not really a word cloud as such.  Instead it’s a bunch of labels set to specific lat/lng points and given different levels which sets their size and behaviour on scroll.  We could use this to set the highest rated words to the largest level with lower rated words at lower level and position each randomly in a region, but it’s not really a word cloud and it would be likely that words would spill over into neighbouring regions.

Based on the limited options that appear to be out there, I think creating a working, interactive map-based word cloud would be a research project in itself and would take far more time than we have available.

Later on in the week Mary sent me the spreadsheet she’d been working on to list settlements found in postcode areas and to link these areas to the larger geographical regions we use.  This is exactly what we needed to fill in the missing piece in our system and I wrote a script that successfully imported the data.  For our 411 areas we now have 957 postcode records and 1638 settlement records.  After that I needed to make some major updates to the system.  Previously a person was associated with an area (e.g. ‘Aberdeen Southwest’) but I needed to update this so that a person is associated with a specific settlement (e.g. ‘Ferryhill, Aberdeen’), which is then connected to the area and from the area to one of our 14 regions (e.g. ‘North East (Aberdeen)’).

I updated the system to make these changes and updated the ‘register’ form, which now features an autocomplete for the location – start typing a place and all matches appear.  Behind the scenes the location is saved and connected up to areas and regions, meaning we can now start generating real data, rather than a person being assigned a random area.  The perception follow-on now connects the respondent up with the larger region when selecting ‘listener is from’, although for now some of this data is not working.
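
The location field uses a standard jQuery UI autocomplete wired up to the settlement list, along these lines (a sketch; the endpoint and field names are invented):

  $('#location').autocomplete({
      minLength: 2,
      source: 'api/settlements',  // returns settlements matching the typed text
      select: function (event, ui) {
          // store the settlement ID; the area and region are resolved from it on the server
          $('#settlement-id').val(ui.item.id);
      }
  });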

I then needed to further update the registration page to add in an ‘outside Scotland’ option so people who did not grow up in Scotland can use the site.  Adding in this option actually broke much of the site: registration requires an area with a geoJSON shape associated with the selected location, otherwise it fails, and the submission of answers requires this shape in order to generate a random marker point, so this also failed when the shape wasn’t present.  I updated the scripts to fix these issues, meaning an answer submitted by an ‘outside’ person has a zero for both latitude and longitude, but then I also needed to update the script that gets the map data to ensure that none of these ‘outside’ answers were returned in any of the data used in the site (both for maps and for non-map visualisations such as the sliders).  So, much has changed and hopefully I haven’t broken anything whilst implementing these changes.  It does now mean that ‘outside’ people can be included and we can export and use their data in future, even though it is not used in the current site.

Further tweaks I implemented this week included: changing the font sizes of some headings and buttons; renaming the ‘activities’ and ‘more’ pages as requested; adding ‘back’ buttons from all ‘activity’ and ‘more’ pages back to the index pages; adding an intro page to the click exercise as previously it just launched into the exercise whereas all others have an intro.  I also added summary pages to the end of the click and perception activities with links through to the ‘more’ pages and removed the temporary ‘skip to quiz’ option.  I also added progress bars to the click and perception activities.  Finally, I switched the location of the map legend from top right to top left as I realised when it was in the top right it was always obscuring Shetland whereas there’s nothing in the top left.  This has meant I’ve had to move the region label to the top right instead.

Also this week I continued to work on the Allan Ramsay ‘Gentle Shepherd’ performance data.  I added in faceted browsing to the tabular view, adding in a series of filter options for location, venue, adaptor and such things.  You can select any combination of filters (e.g. multiple locations and multiple years in combination).  When you select an item of one sort the limit options of other sorts update to only display those relevant to the limited data.  However, the display of limiting options can get a bit confusing once multiple limiting types have been selected.  I will try and sort this out next week.  There are also multiple occurrences of items in the limiting options (e.g. two Glasgows) because the data has spaces in some rows (‘Glasgow’ vs ‘Glasgow ‘) and I’ll need to see about trimming these out next time I import the data.

Also this week I arranged for the old DSL server to be taken offline, as the new website has now been operating successfully for two weeks.  I also had a chat with Katie Halsey about timescales for the development of the Books and Borrowing front-end.  I then imported a new disordered paediatric speech dataset into the Speech Star website.  This included around double the number of records, new video files and a new ‘speaker code’ column.  Finally, I participated in a Zoom call for the Scottish Place-Names database where we discussed the various place-names surveys that are in progress and the possibility of creating an overarching search across all systems.

Week Beginning 11th April 2022

I was back at work on Monday this week after a lovely week off last week.  It was only a four-day week, however, as the week ended with the Good Friday holiday.  I’ll also be off next Monday too.  I had rather a lot to squeeze into the four working days.  For the DSL I did some further troubleshooting for integrating Google Analytics with the DSL’s new https://macwordle.co.uk/ site.  I also had discussions about the upcoming switchover to the new DSL website, which we scheduled in for the week after next, although later in the week it turned out that all of the data has already been finalised so I’ll begin processing it next week.

I participated in a meeting for the Historical Thesaurus on Tuesday, after which I investigated the server stats for the site, which needed fixing.  I also enquired about setting up a domain URL for one of the ‘ac.uk’ sites we host, and it turned out to be something that IT Support could set up really quickly, which is good to know for future reference.  I also had a chat with Craig Lamont about a database / timeline / map interface for some data for the Allan Ramsay project that he would like me to put together to coincide with a book launch at the end of May.  Unfortunately they want this to be part of the University’s T4 website, which makes development somewhat tricky but not impossible.  I had to spend some time familiarising myself with T4 again and arranging for access to the part of the system where the Ramsay content resides.  Now I have this sorted I’ve agreed to look into developing this in early May.  I also deleted a couple of unnecessary entries from the Anglo-Norman Dictionary after the editor requested their removal and created a new version of the requirements document for the front-end of the Books and Borrowing project following feedback from the project team on the previous version.

The rest of my week was spent on the Speak For Yersel project, for which I still have an awful lot to do and not much time to do it in.  I had a meeting with the team on Monday to go over some recent developments, and following that I tracked down a few bugs in the existing code (e.g. a couple of ‘undefined’ buttons in the ‘explore’ maps).  I then replaced all of the audio files in the ‘click’ exercise as the team had decided to use a standardised sentence spoken by many different regional speakers rather than having different speakers saying different things.  As the speakers were not always from the same region as the previous audio clips I needed to change the ‘correct’ regions and also regenerated the MP3 files and transcript data.

I then moved onto a major update to the system: working on the back end.  This took up the rest of the week and although in terms of the interface nothing much should have changed, behind the scenes things are very different.  I designed and implemented the database that will hold all of the data for the project, including information on respondents, answers and geographical areas.  I also migrated all of the activity and question data to this database too.  This was a somewhat time consuming and tedious task as I needed to input every question and every answer option into the database, but it needed to be done.  If we didn’t have the questions and answer options in the database alongside the answers then it would be rather tricky to analyse the data when the time comes, and this way everything is stored in one place and is all interconnected.  Previously the questions were held as JSON data within the JavaScript code for the site, but this was not ideal for the above reason and also because it made updating and manually accessing the question data a bit tricky.

With the new, much tidier arrangement all of the data is stored in a database on the server and the JavaScript code requests the data for an activity when the user loads the activity’s page.  All answer choices and transcript sections also now have their own IDs, which is what we need for recording which specific answer a user has selected.  For example, for the question with the ID 10 if the user selects ‘bairn’ the answer ID 36 will be logged for that user.  I’ve set up the database structure to hold these answers and have populated the postcode area table with all of the GeoJSON data for each area.

The next step will be to populate the table holding specific locations within a postcode area once this data is available.  After that I’ll be able to create the user information form and then I’ll need to update the activities so the selected options are actually saved.  In the meantime I began to implement the user management system.  A user icon now appears in the top right of every page, either with a green background and a tick if you’ve registered or a red background and a cross if you haven’t.  I haven’t created the registration form yet, but have just included a button to register, and when you press this you’ll be registered and this will be remembered in your browser even if you close your browser or turn your device off.  Press on the green tick user icon to view the details recorded about the registered person (none yet) and find an option to sign out if this isn’t you or you want to clear your details.  If you’re not registered and you try to access the activities the page will redirect you to the registration form as we don’t want unregistered people completing the activities.  I’ll continue with this next week, hopefully getting to the point where the choices a user makes are actually logged in the database.  After that I’ll be able to generate maps with real data, which will be an important step.

Week Beginning 31st January 2022

I split my time over many different projects this week.  For the Books and Borrowing project I completed the work I started last week on processing the Wigtown data, writing a little script that amalgamated borrowing records that had the same page order number on any page.  These occurrences arose when multiple volumes of a book were borrowed by a person at the same time and each volume was recorded separately.  My script worked perfectly and many such records were amalgamated.

I then moved onto incorporating images of register pages from Leighton into the CMS.  This proved to be a rather complicated process for one of the four registers as around 30 pages for the register had already been manually created in the CMS and had borrowing records associated with them.  However, these pages had been created in a somewhat random order, starting at folio number 25 and mostly being in order down to 43, at which point the numbers are all over the place, presumably because the pages were created in the order that they were transcribed.    As it stands the CMS relies on the ‘page ID’ order when generating lists of pages as ‘Folio Number’ isn’t necessarily in numerical order (e.g. front / back matter with Roman numerals).  If out of sequence pages crop up a lot we may have to think about adding a new ‘page order’ column, or possibly use the ‘previous’ and ‘next’ IDs to ascertain the order pages should be displayed.  After some discussion with the team it looks like pages are usually created in page order and Leighton is an unusual case, so we can keep using the auto-incrementing page ID for listing pages in the contents page.  I therefore generated a fresh batch of pages for the Leighton register then moved the borrowing records from the existing mixed up pages to the appropriate new page, then deleted the existing pages so everything is all in order.

For the Speak For Yersel project I created a new exercise whereby users are presented with a map of Scotland divided into 12 geographical areas and there are eight map markers in a box in the sea to the east of Scotland.  Each marker is clickable, and clicking on it plays a sound file.  Each marker is also draggable and after listening to the sound file the user should then drag the marker to whichever area they think the speaker in the sound file is from.  After dragging all of the markers the user can then press a ‘check answers’ button to see which they got right, and press a ‘view correct locations’ button which animates the markers to their correct locations on the map.  It was a lot of fun making the exercise and I think it works pretty well.  It’s still just an initial version and no doubt we will be changing it, but here’s a screenshot of how it currently looks (with one answer correct and the rest incorrect):
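
The draggable sound markers use Leaflet’s built-in dragging, in essence like the sketch below (the audio handling, area lookup and answer checking are simplified and the helper names are invented):

  var marker = L.marker(startLatLng, { draggable: true });
  marker.on('click', function () {
      clip.play();  // each marker has an associated sound file
  });
  marker.on('dragend', function () {
      // record which of the 12 areas the marker was dropped in
      droppedArea[clipId] = areaContaining(marker.getLatLng());
  });
  marker.addTo(map);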

For the Speech Star project I made some further changes to the speech database.  Videos no longer autoplay, as requested.  Also, the tables now feature checkboxes beside them.  You can select up to four videos by pressing on these checkboxes.  If you select more than four the earliest one you pressed is deselected, keeping a maximum of four no matter how many checkboxes you try to click on.  When at least one checkbox is pressed the tab contents will slide down and a button labelled ‘Open selected videos’ will appear.  If you press on this a wider popup will open, containing all of your chosen videos and the metadata about each.  This has required quite a lot of reworking to implement, but it seemed to be working well, until I realised that while the multiple videos load and play successfully in Firefox, in Chrome and MS Edge (which is based on Chrome) only the final video loads in properly, with only audio playing on the other videos.  I’ll need to investigate this further next week.  But here’s a screenshot of how things look in Firefox:
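
The ‘maximum of four’ behaviour is just a small queue of ticked checkboxes, something like this sketch (class names and IDs are illustrative):

  var selected = [];  // IDs of currently ticked videos, oldest first
  $('.video-checkbox').on('change', function () {
      var id = $(this).val();
      if (this.checked) {
          selected.push(id);
          if (selected.length > 4) {
              // untick the earliest choice so no more than four stay selected
              var oldest = selected.shift();
              $('.video-checkbox[value="' + oldest + '"]').prop('checked', false);
          }
      } else {
          selected = selected.filter(function (v) { return v !== id; });
      }
      $('#open-selected').toggle(selected.length > 0);
  });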

Also this week I spoke to Thomas Clancy about the Place-names of Iona project, including discussing how the front-end map will function (Thomas wants an option to view all data on a single map, which should work, although we may need to add in clustering at higher zoom levels).  We also discussed how to handle external links and what to do about the elements database, which includes a lot of irrelevant elements from other projects.

I also had an email conversation with Ophira Gamliel in Theology about a proposal she’s putting together that will involve an interactive map, gave some advice to Diane Scott about cookie policy pages, worked with Raymond in Arts IT Support to fix an issue with a server update that was affecting the playback of videos on the Seeing Speech and Dynamic Dialects websites and updated a script that Fraser Dallachy needed access to for his work on a Scots Thesaurus.

Finally, I had some email conversations with the DSL people and made an update to the interface of the new DSL website to incorporate an ‘abbreviations’ button, which links to the appropriate DOST or SND abbreviations page.

Week Beginning 24th January 2022

I had a very busy week this week, working on several different projects.  For the Books and Borrowing project I participated in the team Zoom call on Monday to discuss the upcoming development of the front-end and API for the project, which will include many different search and browse facilities, graphs and visualisations.  I followed this up with a lengthy email to the PI and Co-I where I listed some previous work I’ve done and discussed some visualisation libraries we could use.  In the coming weeks I’ll need to work with them to write a requirements document for the front-end.  I also downloaded images from Orkney library, uploaded all of them to the server and generated the necessary register and page records.  One register with 7 pages already existed in the system and I ensured that page images were associated with these and the remaining pages of the register fit in with the existing ones.  I also processed the Wigtown data that Gerry McKeever had been working on, splitting the data associated with one register into two distinct registers, uploading page images and generating the necessary page records.  This was a pretty complicated process, and I still need to complete the work on it next week, as there are several borrowing records listed as separate rows when in actual fact they are merely another volume of the same book borrowed at the same time.  These records will need to be amalgamated.

For the Speak For Yersel project I had a meeting with the PI and RA on Monday to discuss updates to the interface I’ve been working on, new data for the ‘click’ exercise and a new type of exercise that will precede the ‘click’ exercise and will involve users listening to sound clips then dragging and dropping them onto areas of a map to see whether they can guess where the speaker is from.  I spent some time later in the week making all of the required changes to the interface and the grammar exercise, including updating the style used for the interactive map and using different marker colours.

I also continued to work on the speech database for the Speech Star project based on feedback I received about the first version I completed last week.  I added in some new introductory text and changed the order of the filter options.  I also made the filter option section hidden by default as it takes up quite a lot of space, especially on narrow screens.  There’s now a button to show / hide the filters, with the section sliding down or up.  If a filter option is selected the section remains visible by default.  I also changed the colour of the filter option section to a grey with a subtle gradient (it gets lighter towards the right) and added a similar gradient to the header, just to see how it looks.

The biggest update was to the filter options, which I overhauled so that instead of a drop-down list where one option in each filter type can be selected there are checkboxes for each filter option, allowing multiple items of any type to be selected.  This was a fairly large change to implement as the way selected options are passed to the script and the way the database is queried needed to be completely changed.  When an option is selected the page immediately reloads to display the results of the selection and this can also change the contents of the other filter option boxes – e.g. selecting ‘alveolar’ limits the options in the ‘sound’ section.  I also removed the ‘All’ option and left all checkboxes unselected by default.  This is how filters on clothes shopping sites do it – ‘all’ is the default and a limit is only applied if an option is ticked.

I also changed the ‘accent’ labels as requested, changed the ‘By Prompt’ header to ‘By Word’ and updated the order of items in the ‘position’ filter.  I also fixed an issue where ‘cheap’ and ‘choose’ were appearing in a column instead of the real data.  Finally, I made the overlay that appears when a video is clicked on darker so it’s more obvious that you can’t click on the buttons.  I did investigate whether it was possible to have the popup open while other page elements were still accessible but this is not something that the Bootstrap interface framework that I’m using supports, at least not without a lot of hacking about with its source code.  I don’t think it’s worth pursuing this as the popup will cover much of the screen on tablets / phones anyway, and when I add in the option to view multiple videos the popup will be even larger.

Also this week I made some minor tweaks to the Burns mini-project I was working on last week and had a chat with the DSL people about a few items, such as the data import process that we will be going through again in the next month or so and some of the outstanding tasks that I still need to tackle with the DSL’s interface.

I also did some work for the AND this week, investigating a weird timeout error that cropped up on the new server and discussing how best to tackle a major update to the AND’s data.  The team have finished working on a major overhaul of the letter S and this is now ready to go live.  We have decided that I will ask for a test instance of the AND to be set up so I can work with the new data, testing out how the DMS runs on the new server and how it will cope with such a large update.

The editor, Geert, had also spotted an issue with the textbase search, which didn’t seem to include one of the texts (Fabliaux) he was searching for.  I investigated the issue and it looked like the script that extracted words from pages may have silently failed in some cases.  There are 12,633 page records in the textbase, each of which has a word count.  When the word count is greater than zero my script processes the contents of the page to generate the data for searching.  However, there appear to be 1889 pages in the system that have a word count of zero, including all of Fabliaux.  Further investigation revealed that my scripts expect the XML to be structured with the main content in a <body> tag.  This cuts out all of the front matter and back matter from the searches, which is what we’d agreed should happen and thankfully accounts for many of the supposedly ‘blank’ pages listed above as they’re not the actual body of the text.

However, Fabliaux doesn’t include the <body> tag in the standard way.  In fact, the XML file consists of multiple individual texts, each of which has a separate <body> tag.  As my script didn’t find a <body> in the expected place no content was processed.  I ran a script to check the other texts and the following also have a similar issue: gaunt1372 (710 pages) and polsongs (111 pages), in addition to the 37 pages of Fabliaux.  Having identified these I updated my script that generates search words and re-ran it for these texts, fixing the issue.
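
The fix amounted to gathering every <body> section in a file rather than assuming there is exactly one; here is the idea sketched in JavaScript with a simple regular expression (the real script is server-side):

  // collect the contents of every <body>...</body> block in the XML
  function getBodySections(xml) {
      var sections = [];
      var pattern = /<body[^>]*>([\s\S]*?)<\/body>/g;
      var match;
      while ((match = pattern.exec(xml)) !== null) {
          sections.push(match[1]);
      }
      return sections;  // Fabliaux yields several sections, most texts just one
  }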

Also this week I attended a Zoom-based seminar on ‘Digitally Exhibiting Textual Heritage’ that was being run by Information Studies.  This featured four speakers from archives, libraries and museums discussing how digital versions of texts can be exhibited, both in galleries and online.  Some really interesting projects were discussed, both past and present.  These included the BL’s ‘Turning the Pages’ system (http://www.bl.uk/turning-the-pages/) and some really cool transparent LCD display cases (https://crystal-display.com/transparent-displays-and-showcases/) that allow images to be projected on clear glass while objects behind the panel are still visible.  3d representations of gallery spaces were discussed (e.g. https://www.lib.cam.ac.uk/ghostwords), as were ‘long form narrative scrolls’ such as https://www.nytimes.com/projects/2012/snow-fall/index.html#/?part=tunnel-creek, http://www.wolseymanuscripts.ac.uk/ and https://stories.durham.ac.uk/journeys-prologue/.  There is a tool that can be used to create these here: https://shorthand.com/.  It was a very interesting session!

Week Beginning 10th January 2022

I continued to work on the Books and Borrowing project for a lot of this week, completing some of the tasks I began last week and working on some others.  We ran out of server space for digitised page images last week, and although I freed up some space by deleting a bunch of images that were no longer required, we still have a lot of images to come.  The team estimates that a further 11,575 images will be required.  If the images we receive for these pages are comparable to the ones from the NLS, which average around 1.5MB each, then 30GB should give us plenty of space.  However, after checking through the images we’ve received from other digitisation units it turns out that the NLS images are a bit of an outlier in terms of file size and 8-10MB per image is more usual.  If we use this as an estimate then we would maybe require 120GB-130GB of additional space.  I did some experiments with resizing and changing the image quality of one of the larger images, managing to bring an 8.4MB image down to 2.4MB while still retaining its legibility.  If we apply this approach to the tens of thousands of larger images we have then this would result in a considerable saving of storage.  However, Stirling’s IT people very kindly offered to give us a further 150GB of space for the images, so this resampling process shouldn’t be needed for now at least.
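
I didn’t record above exactly how I did the resampling test, but the general approach looks something like this sketch using Python’s Pillow library; the width limit and JPEG quality here are illustrative values rather than the settings I actually used:

    from PIL import Image

    def resample_page_image(src_path, dest_path, max_width=2400, quality=80):
        """Shrink an oversized page scan and re-save it as a more heavily
        compressed JPEG.  max_width and quality are illustrative values."""
        img = Image.open(src_path)
        if img.width > max_width:
            new_height = round(img.height * max_width / img.width)
            img = img.resize((max_width, new_height), Image.LANCZOS)
        # Convert to RGB in case the source uses a mode JPEG can't store.
        img.convert('RGB').save(dest_path, 'JPEG', quality=quality, optimize=True)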

Another task for the project this week was to write a script to renumber the folio numbers for the 14 volumes from the Advocates Library that I noticed had irregular numbering.  Each of the 14 volumes had different issues with their handwritten numbering, so I had to tailor my script to each volume in turn, and once the process was complete the folio numbers used to identify page images in the CMS (and eventually in the front-end) entirely matched the handwritten numbers for each volume.
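
The corrections themselves were different for every volume, but each run of the script followed the same general shape, which the sketch below illustrates with hypothetical volume identifiers and rules (the real CMS records and per-volume fixes were more involved):

    # Hypothetical per-volume rules mapping a page's position in the register
    # to the handwritten folio number; each of the 14 volumes had its own rule.
    RENUMBERING_RULES = {
        'advocates_vol_a': lambda position: position + 1,             # numbering starts at 2
        'advocates_vol_b': lambda position: (position - 1) // 2 + 1,  # recto/verso share a folio
    }

    def renumber(pages):
        """pages is a list of dicts with 'volume', 'position' and 'folio' keys,
        standing in for the CMS page records.  Returns the corrected records."""
        for page in pages:
            rule = RENUMBERING_RULES.get(page['volume'])
            if rule:
                page['folio'] = rule(page['position'])
        return pages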

My next task for the project was to import the records for several volumes from the Royal High School of Edinburgh, but I ran into a bit of an issue.  I had previously been intending to extract the ‘item’ column and create a book holding record and a single book item record for each distinct entry in the column.  This would then be associated with all borrowing records in the RHS data that also feature this exact ‘item’.  However, this is going to result in a lot of duplicate holding records, because entries in the ‘item’ column often include information about different volumes of a book and/or use different spellings for the same book.

For example, in SL137142 the book ‘Banier’s Mythology’ appears four times as follows (assuming ‘Banier’ and ‘Bannier’ are the same):

  1. Banier’s Mythology v. 1, 2
  2. Banier’s Mythology v. 1, 2
  3. Bannier’s Myth 4 vols
  4. Bannier’s Myth. Vol 3 & 4

My script would create one holding and item record for ‘Banier’s Mythology v. 1, 2’ and associate it with the first two borrowing records, but the 3rd and 4th items above would end up generating two additional holding / item records, which would then be associated with the 3rd and 4th borrowing records.

No script I can write (at least not without a huge amount of work) would be able to figure out that all four of these entries refer to the same book, or that the book actually has four volumes, each requiring its own book item record.  Nor could it work out that volumes 1 & 2 need to be associated with borrowing records 1 and 2, all four volumes with borrowing record 3, and volumes 3 & 4 with borrowing record 4.  I did wonder whether I might be able to automatically extract volume data from the ‘item’ column, but there is just too much variation.

We’re going to have to tackle the normalisation of book holding names and the generation of all required book items for volumes at some point and this either needs to be done prior to ingest via the spreadsheets or after ingest via the CMS.

My feeling is that it might be simpler to do it via the spreadsheets before I import the data.  If we were to do this then the ‘Item’ column would become the ‘original title’ and we’d need two further columns: one for the ‘standardised title’ and one listing the volumes, given as the volume numbers separated by commas.  With the above examples we would end up with the following (with a | representing a column division):

  1. Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
  2. Banier’s Mythology v. 1, 2 | Banier’s Mythology | 1,2
  3. Bannier’s Myth 4 vols | Banier’s Mythology | 1,2,3,4
  4. Bannier’s Myth. Vol 3 & 4 | Banier’s Mythology | 3,4

If each sheet of the spreadsheet is ordered alphabetically by the ‘item’ column it might not take too long to add in this information.  The additional fields could also be omitted where the ‘item’ column has no volumes or alternative spellings; e.g. ‘Hederici Lexicon’ may be fine as it is.  If the ‘standardised title’ and ‘volumes’ columns are left blank in such cases then when my script reaches the record it will know to use ‘Hederici Lexicon’ as both the original and standardised title and to generate a single unnumbered book item record for it.  We agreed that normalising the data prior to ingest would be the best approach, and I will therefore wait until I receive updated data before I proceed further with this.
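
To make the proposal concrete, here’s a rough Python sketch of how the import script could interpret the extra columns; the column names (‘Item’, ‘Standardised Title’, ‘Volumes’) are assumptions based on the suggestion above rather than the project’s actual spreadsheet headers:

    def build_holding(row):
        """Work out the holding title and the book item records for one
        spreadsheet row.  Blank extra columns mean: reuse the original item
        text and create a single unnumbered book item."""
        original = row['Item'].strip()
        standardised = (row.get('Standardised Title') or '').strip() or original
        volumes_field = (row.get('Volumes') or '').strip()
        if volumes_field:
            items = [{'title': standardised, 'volume': int(v.strip())}
                     for v in volumes_field.split(',')]
        else:
            items = [{'title': standardised, 'volume': None}]
        return {'original_title': original,
                'standardised_title': standardised,
                'items': items}

In practice rows sharing the same standardised title would be merged into a single holding record, with the union of their volume numbers generating the book item records, so ‘Bannier’s Myth 4 vols | Banier’s Mythology | 1,2,3,4’ would create four items under one ‘Banier’s Mythology’ holding and its borrowing record would be linked to all four.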

Also this week I generated a new version of a spreadsheet containing the records for one register for Gerry McKeever, who wanted borrowers, book items and book holding details to be included in addition to the main borrowing record.  I also made a pretty major update to the CMS to enable the books and borrower listings for a library to be filtered by year of borrowing in addition to filtering by register.  Users can limit the data by register or by year, but not both: the register drop-down needs to be empty for the year filter to work, otherwise the selected register will be used as the filter.  On either the ‘books’ or ‘borrowers’ tab they can enter a single year (e.g. 1774) or a range (e.g. 1770-1779) in the year box, and when ‘Go’ is pressed the data displayed is limited to the year or years entered.  This also applies to the figures in the ‘borrowing records’ and ‘Total borrowed items’ columns, and the borrowing records listed when a related pop-up is opened will only feature those from the selected years.
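
The year filter itself is a small piece of logic; here’s a sketch of how the single-year and range cases can be handled, assuming each borrowing record carries a numeric year (this Python version is purely illustrative of the logic, not the CMS’s actual code):

    def parse_year_filter(value):
        """Turn '1774' into (1774, 1774) and '1770-1779' into (1770, 1779);
        returns None if the box is empty or can't be parsed."""
        value = value.strip()
        if not value:
            return None
        parts = value.split('-')
        try:
            if len(parts) == 1:
                year = int(parts[0])
                return (year, year)
            if len(parts) == 2:
                return (int(parts[0]), int(parts[1]))
        except ValueError:
            pass
        return None

    def filter_borrowings(borrowings, year_filter):
        """borrowings stands in for the CMS records, each with a 'year' key;
        only rows inside the inclusive range are kept."""
        if year_filter is None:
            return borrowings
        start, end = year_filter
        return [b for b in borrowings if start <= b['year'] <= end]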

I also worked with Raymond in Arts IT Support and Geert, the editor of the Anglo-Norman Dictionary, to complete the process of migrating the AND website to the new server.  The website (https://anglo-norman.net/) is now hosted on the new server and is considerably faster than it was previously.  We also took the opportunity to launch the Anglo-Norman Textbase, which I had developed extensively a few months ago.  Searching and browsing can be found here: https://anglo-norman.net/textbase/ and this marks the final major item in my overhaul of the AND resource.

My last major task of the week was to start work on a database of ultrasound video files for the Speech Star project.  I received a spreadsheet of metadata and the video files from Eleanor this week and began processing everything.  I wrote a script to export the metadata into a three-table related database (speakers, prompts and individual videos of speakers saying the prompts) and began work on the front-end through which this database and the associated video files will be accessed.  I’ll be continuing with this next week.
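
The database itself lives in the project’s own system, but the three-table shape is roughly as sketched below (the table and column names are my assumptions for illustration, and SQLite is used here just to keep the example self-contained):

    import sqlite3

    # One row per speaker, per prompt, and per video of a speaker saying a prompt.
    SCHEMA = """
    CREATE TABLE speakers (
        id      INTEGER PRIMARY KEY,
        code    TEXT,
        accent  TEXT
    );
    CREATE TABLE prompts (
        id        INTEGER PRIMARY KEY,
        prompt    TEXT,
        sound     TEXT,
        position  TEXT
    );
    CREATE TABLE videos (
        id          INTEGER PRIMARY KEY,
        speaker_id  INTEGER REFERENCES speakers(id),
        prompt_id   INTEGER REFERENCES prompts(id),
        filename    TEXT
    );
    """

    conn = sqlite3.connect('speechstar_sketch.db')
    conn.executescript(SCHEMA)
    conn.close()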

In addition to the above I also gave some advice to the students who are migrating the IJOSTS journal over to WordPress, had a chat with the DSL people about when we’ll make the switch to the new API and data, set up a WordPress site for Joanna Kopaczyk for the International Conference on Middle English, and upgraded all of the WordPress sites I manage to the latest version of WordPress.  I also made a few tweaks to the 17th Century Symposium website for Roslyn Potter, spoke to Kate Simpson in Information Studies about talking to her Digital Humanities students about what I do, and arranged for server space to be set up for the Speak For Yersel and Speech Star project websites.  I also helped launch the new Burns website: https://burnsc21-letters-poems.glasgow.ac.uk/ and updated the existing Burns website to link into it via new top-level tabs.  So a pretty busy week!

Week Beginning 6th December 2021

I spent a bit of time this week writing a second draft of a paper for DH2022 after receiving feedback from Marc.  This one targets ‘short papers’ (500-750 words) and I managed to get it submitted before the deadline on Friday.  Now I’ll just need to see if it gets accepted – I should find out one way or the other in February.  I also made some further tweaks to the locution search for the Anglo-Norman Dictionary, ensuring that when a term appears more than once the result is repeated for each occurrence, appearing in the results grouped under each word that matches the term.  So for example ‘quatre tempres, tens’ now appears twice, once amongst the ‘tempres’ results and once amongst the ‘tens’ results.
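
A small sketch of the grouping behaviour, with made-up data structures: a locution that contains more than one of the matched words is emitted once under each of those words, which is what produces the repeated ‘quatre tempres, tens’ result:

    from collections import defaultdict

    def group_locutions(locutions, matched_words):
        """locutions: list of (locution_text, words_it_contains) pairs;
        matched_words: the words that matched the search term.  A locution
        containing several matched words appears once under each of them."""
        grouped = defaultdict(list)
        for text, contained in locutions:
            for word in contained:
                if word in matched_words:
                    grouped[word].append(text)
        return grouped

    # 'quatre tempres, tens' matches both 'tempres' and 'tens', so it is
    # listed under each of those words, mirroring the behaviour described above.
    example = group_locutions(
        [('quatre tempres, tens', ['quatre', 'tempres', 'tens'])],
        {'tempres', 'tens'})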

I also had a chat with Heather Pagan about the Irish Dictionary eDIL (http://www.dil.ie/), whose team are hoping to rework the way they handle dates in a similar way to the AND.  I said that it would be difficult to estimate how much time this would take without seeing their current data structure and getting more of an idea of how they intend to update it, what updates would be required to their online resource to incorporate the new date structure (such as enhanced search facilities), whether further updates to their resource would also be part of the process, and whether any back-end systems would need to be updated to manage the new data (e.g. if they have a DMS like the AND).

Also this week I helped out with some issues with the Iona place-names website just before their conference started on Thursday.  Someone had reported that the videos of the sessions were only playing briefly and then cutting out, but they all seemed to work for me when I tried them on my PC in Firefox and Edge and on my iPad in Safari.  Eventually I managed to replicate the issue in Chrome on my desktop and in Chrome on my phone, so it seemed to be an issue specifically related to Chrome (although strangely it didn’t affect Edge, which is based on Chromium).  The video file plays and then cuts out due to the file being blocked on the server.  I can only assume that Chrome accesses the file differently to other browsers, sending multiple requests to the server, which then blocks access because too many requests have been sent (the browser console shows a 403 Forbidden error).  Thankfully Raymond at Arts IT Support was able to increase the number of connections allowed per browser and this fixed the issue.  It’s still a bit of a strange one, though.

I also had a chat with the DSL people about when we might be able to replace the current live DSL site with the ‘new’ site, as the server the live site is on will need to be decommissioned soon.  I also had a bit of a catch-up with Stevie Barrett, the developer in Celtic and Gaelic, and had a video call with Luca and his line-manager Kirstie Wild to discuss the current state of Digital Humanities across the College of Arts.  Luca does a similar job to me at college-level and it was good to meet him and Kirstie to see what’s been going on outside of Critical Studies.  I also spoke to Jennifer Smith about the Speak For Yersel project, as I’d not heard anything about it for a couple of weeks.  We’re going to meet on Monday to take things further.

I spent the rest of the week working on the radar diagram visualisations for the Historical Thesaurus, completing an initial version.  I’d previously created a tree browser for the thematic headings, as I discussed last week.  This week I completed work on the processing of data for categories that are selected via the tree browser.  After the data is returned the script works out which lexemes have dates that fall into the four periods (e.g. a word with dates 650-9999 needs to appear in all four periods).  Words are split by part of speech, and I’ve arranged the axes so that N, V, Aj and Av appear first (if present), with any others following on.  All verb categories have also been merged.
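
Here’s a sketch of the period assignment logic: the cut-off years below are my assumptions rather than the Historical Thesaurus’ exact divisions, but the principle is that a lexeme is counted in every period its date range overlaps, so a word dated 650-9999 lands in all four:

    # Assumed period boundaries (start year, end year); the project's own
    # cut-off years may differ from these.
    PERIODS = {
        'OE':    (600, 1149),
        'ME':    (1150, 1499),
        'EModE': (1500, 1699),
        'ModE':  (1700, 9999),
    }

    def periods_for(start, end):
        """Return the periods a lexeme's date range overlaps; e.g.
        periods_for(650, 9999) returns all four."""
        return [name for name, (p_start, p_end) in PERIODS.items()
                if start <= p_end and end >= p_start]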

I’m still not sure how widely useful these visualisations will be as they only really work for categories that have several parts of speech, but there are some nice ones.  See for example a visualisation of ‘Badness/evil’, ‘Goodness, acceptability’ and ‘Mediocrity’, which shows words for ‘Badness/evil’ being much more prevalent in OE and ME while ‘Mediocrity’ barely registers, only for it and ‘Goodness, acceptability’ to grow in relative size in EModE and ModE:

I also added in an option to switch between visualisations that use total counts of words in each selected category’s parts of speech and visualisations that use percentages.  With the latter the scale is fixed at a maximum of 100% across all periods and the points on the axes represent the percentage of a category’s total words that are in a part of speech in your chosen period.  This means categories of different sizes are easier to compare, but it does of course mean that the relative sizes of categories are not visualised.  I could also add a further option that fixes the scale at the maximum number of words in the largest POS, so the visualisation still represents the relative sizes of categories but the scale doesn’t fluctuate between periods (e.g. if there are 363 nouns for a category across all periods then the maximum on the scale would stay fixed at 363 across all periods, even if the maximum number of nouns in OE, for example, is 128).  Here’s the above visualisation using the percentage scale:
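
For clarity, here’s a small sketch of the two scale types (and where the possible third option would fit), with made-up counts; ‘counts’ maps part of speech to the number of words for one category in one period:

    def axis_values(counts, mode):
        """counts, e.g. {'N': 120, 'V': 40, 'Aj': 25}, gives the words per part
        of speech for one category in one period.  'count' plots raw totals;
        'percent' plots each POS as a share of that period's total, so the
        scale is fixed at 100 across all periods."""
        if mode == 'percent':
            total = sum(counts.values())
            return {pos: (100 * n / total if total else 0)
                    for pos, n in counts.items()}
        return dict(counts)

The further option floated above would keep the raw counts but pin the chart’s axis maximum to the largest POS total found across all periods (363 in the example), rather than recalculating the maximum for each period.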

The other thing I did was to add in a facility to select a specific category and turn off the others.  So for example if you’ve selected three categories you can press on a category to make it appear bold in the visualisation and to hide the other categories.  Pressing on a category a second time reverts to displaying all of them.  Your selection is remembered if you change the scale type or navigate through the periods.  I may not have much more time to work on this before Christmas, but the next thing I’ll do is to add in access to the lexeme data behind the visualisation.  I also need to fix a bug that sometimes causes the ModE period to be missing a word from its counts.