Week seven of lockdown continued in much the same fashion as the preceding weeks, the only difference being Friday was a holiday to mark the 75th anniversary of VE day. I spent much of the four working days on the development of the content management system for the Books and Borrowing project. The project RAs will start using the system in June and I’m aiming to get everything up and running before then so this is my main focus at the moment. I also had a Zoom meeting with project PI Katie Halsey and Co-I Matt Sangster on Tuesday to discuss the requirements document I’d completed last week and the underlying data structures I’d defined in the weeks before. Both Katie and Matt were very happy with the document, although Matt had a few changes he wanted made to the underlying data structures and the CMS. I made the necessary changes to the data design / requirements document and the project’s database that I’d set up last week. The changes were:
Borrowing spans have now been removed from libraries and these will instead be automatically inferred based on the start and end dates of ledger records held in these libraries. Ledgers now have a new ‘ledger type’ field which currently allows the choice of ‘Professorial’, ‘Student’ or ‘Town’. This field will allow borrowing spans for libraries to be altered based on a selected ledger type. The way occupations for borrowers are recorded has been updated to enable both original occupations from the records and a normalised list of occupations to be recorded. Borrowers may not have an original occupation but still might have a standardised occupation so I’ve decided to use the occupations table as previously designed to hold information about standardised occupations. A borrower may have multiple standardised occupations. I have also added a new ‘original occupation’ field to the borrower record where any number of occupations found for the borrower in the original documentation (e.g. river watcher) can be added if necessary. The book edition table now has an ‘other authority URL’ field and an ‘other authority type’ field which can be used if ESTC is not appropriate. The ‘type’ currently features ‘Worldcat’, ‘CERL’ and ‘Other’. ‘Language’ has been moved from Holding to Edition. Finally, in Book Holding the short title is now original title and long title is now standardised title while the place and date of publication fields have been removed as the comparable fields at Edition level will be sufficient.
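To illustrate how a library's borrowing span could be inferred from its ledgers, here is a minimal Python sketch. It is not the CMS code: the `Ledger` class, its field names and the sample dates are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Tuple


@dataclass
class Ledger:
    # Hypothetical fields; the real ledger table will hold more than this.
    library_id: int
    ledger_type: str  # 'Professorial', 'Student' or 'Town'
    start_date: date
    end_date: date


def borrowing_span(
    ledgers: Iterable[Ledger],
    library_id: int,
    ledger_type: Optional[str] = None,
) -> Optional[Tuple[date, date]]:
    """Return (earliest start, latest end) across a library's ledgers.

    If ledger_type is given, only ledgers of that type are considered.
    Returns None when no ledger matches.
    """
    matching = [
        ledger for ledger in ledgers
        if ledger.library_id == library_id
        and (ledger_type is None or ledger.ledger_type == ledger_type)
    ]
    if not matching:
        return None
    return (
        min(ledger.start_date for ledger in matching),
        max(ledger.end_date for ledger in matching),
    )


# Invented sample data, purely for illustration.
ledgers = [
    Ledger(1, "Student", date(1750, 1, 1), date(1760, 12, 31)),
    Ledger(1, "Professorial", date(1755, 1, 1), date(1790, 12, 31)),
    Ledger(2, "Town", date(1800, 1, 1), date(1810, 12, 31)),
]
```

Calling `borrowing_span(ledgers, 1)` gives the overall span for library 1, while passing `"Student"` as the third argument narrows it to the student ledgers only.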
In terms of the development of the CMS, I created a Bootstrap-based interface for the system, which currently just uses the colour scheme I used for Matt’s pilot 18th Century Borrowing project. I created the user authentication scripts and the menu structure and then started to create the actual pages. So far I’ve created a page to add a new library record and all of the information associated with a library, such as any number of sources. I then created the facility to browse and delete libraries and the main ‘view library’ page, which will act as a hub through which all book and borrowing records associated with the library will be managed. This page has a further tab-based menu with options to allow the RA to view / add ledgers, additional fields, books and borrowers, plus the option to edit the main library information. So far I’ve completed the page to edit the library information and have started work on the page to add a ledger. I’m making pretty good progress with the CMS, but there is still a lot left to do. Here’s a screenshot of the CMS if you’re interested in how it looks:
Also this week I had a Zoom meeting with Marc Alexander and Fraser Dallachy to discuss updates to the Historical Thesaurus as we head towards a second edition. This will include adding in new words from the OED and new dates for existing words. My new date structure will also go live, so there will need to be changes to how the timelines work. Marc is hoping to go live with new updates in August. We also discussed the ‘guess the category’ quiz, with Marc and Fraser having some ideas about limiting the quiz to certain categories, or excluding other categories that might feature inappropriate content. We may also introduce a difficulty level based on date, with an ‘easy’ version only containing words that were in use for a decent span of time in the past 200 years.
Other work I did this week included making some tweaks to the data for Gerry McKeever’s interactive map, fixing an issue with videos continuing to play after the video overlay was closed for Paul Malgrati’s Burns Supper map, replying to a query from Alasdair Whyte about his Place-names of Mull and Ulva project and looking into an issue for Fraser’s Scots Thesaurus project which unfortunately I can’t do anything about as the scripts I’d created for this (which needed to be left running for several days) are on the computer in my office. If this lockdown ever ends I’ll need to tackle this issue then.
This was the fifth week of lockdown and my first full week back after the Easter holidays, which as with previous weeks I needed to split between working and home-schooling my son. There was some issue with the database on the server that powers many of the project websites this week, meaning all of those websites stopped working. I had to spend some time liaising with Arts IT Support to get the issue sorted (as I don’t have the necessary server-level access to fix such matters) and replying to the many emails from the PIs of projects who understandably wanted to know why their website was offline. The server was unstable for about 24 hours, but has thankfully been working without issue since then.
Alison Wiggins got in touch with me this week to discuss the content management system for the Mary, Queen of Scots Letters sub-project, which I set up for her last year as part of her Archives and Writing Lives project. There is now lots of data in the system and Alison is editing it and wanted me to make some changes to the interface to make the process a little swifter. I changed how sorting works on the ‘browse documents’ and ‘browse parties’ pages. The pages are paginated and previously the sorting only affected the subset of records found on the current page rather than sorting the whole dataset. I updated this so that sorting now reorganises everything, and I also updated the ‘date sorting’ column so that it now uses the ‘sort_date’ field rather than the ‘display_date’ field. Alison had also noticed that the ‘edit party’ page wasn’t working and I discovered that there was a bug on this page that was preventing updates being saved in the database, which I fixed. I also created a new ‘Browse Collections’ page and added it to the top menu. This is a pretty simple page that lists the distinct collections alphabetically and for each lists their associated archives and documents, each with a link through to the relevant ‘edit’ page. Finally, I gave Alison some advice on editing the free-text fields, which use the TinyMCE editor, to strip out unwanted HTML that has been pasted into them and thought about how we might present this data to the public.
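The sorting fix boils down to ordering the whole dataset before slicing out a page, rather than the other way round. Here is a small Python sketch of that idea. The function name and sample records are hypothetical, and in a database-backed CMS the same effect would more likely come from an ORDER BY clause applied before the LIMIT / OFFSET.

```python
def sorted_page(records, sort_key, page, per_page=20, descending=False):
    """Sort the full list of records first, then return one page of it."""
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=descending)
    start = (page - 1) * per_page  # pages are numbered from 1
    return ordered[start:start + per_page]


# Invented sample documents; 'sort_date' holds a sortable ISO-style string.
docs = [
    {"id": 1, "sort_date": "1570-03-01"},
    {"id": 2, "sort_date": "1568-05-13"},
    {"id": 3, "sort_date": "1586-10-14"},
]

first_page = sorted_page(docs, "sort_date", page=1, per_page=2)
second_page = sorted_page(docs, "sort_date", page=2, per_page=2)
```

Sorting on a dedicated sortable field such as ‘sort_date’ matters here because a free-text display date will not generally sort chronologically.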
I also responded to a query from Matthew Creasy about the website for a new project he is working on. I set up a placeholder website for this project a couple of months ago and Matthew is now getting close to the point where he wants the website to go live. Gerry McKeever also got in touch with me to ask whether I would write some sections about the technology behind the interactive map I created for his Regional Romanticism project for a paper he is putting together. We talked a little about the structure of this and I’ll write the required sections when he needs them.
Other than these issues I spent the bulk of the week working on the Books and Borrowing project. Katie and Matt got back with some feedback on the data description document that I completed and sent to them before Easter and I spent some time going through this feedback and making an updated version of the document. After sending the document off I started working on a description of the content management system. This required a lot of thought and planning as I needed to consider how all of the data as defined in the document would be added, edited and deleted in the most efficient and easy to use manner. By the end of the week I’d written some 2,500 words about the various features of the CMS, but there is still a lot to do. I’m hoping to have a version completed and sent off to Katie and Matt early next week.
My other big task of the week was to work with the data for the Anglo-Norman Dictionary again. As mentioned previously, this project’s data and systems are in a real mess and I’m trying to get it all sorted along with the project’s Editor, Heather Pagan. Previously we’d figured out that there was a version of the AND data in a file called ‘all.xml’, but that it did not contain the updates to the online data from the project’s data management system and we instead needed to somehow extract the data relating to entries from the online database.
Back in February when looking through the documentation again I discovered where the entry data was located in a file called ‘entry_hash’ within the directory ‘/var/data’. I stated that the data was stored in a Berkeley DB and gave Heather a link to a place where the database files could be downloaded (https://www.oracle.com/database/technologies/related/berkeleydb-downloads.html)
I spent some time this week trying to access the data. It turns out that the download is not for a nice, standalone database program but is instead a collection of files that only seem to do anything when they are called from your own program written in something like C or Java. There was a command called ‘db_dump’ that would supposedly take the binary hash file and export it as plain text. This did work, in that the file could then be read in a text editor, but it was unfortunately still a hash file – just a series of massively long lines of numbers.
What I needed was just some way to view the contents of the file, and thankfully I came across this answer: https://stackoverflow.com/a/19793412 which suggests using the Python programming language to export the contents from a hash file. However, Python dropped the ‘dbhash’ module years ago (it is not part of Python 3) so I had to track down and install a version of Python from 2010 for this to work. Also, I’ve not really used Python before so that took some getting used to. Thankfully with a bit of tweaking I was able to write a short Python script that appears to read through the ‘entry_hash’ file and output each line as plain text. I’m including the script here for future reference:
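import dbhash  # Python 2 only; the module was removed in Python 3
f = open("testoutput.txt", "w+")
for k, v in dbhash.open("entry_hash").iteritems():
    f.write(v + "\n")  # loop body is missing from this post; writing out each value is an assumed reconstruction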
The resulting file includes XML entries and is 3,969,350 lines long. I sent this to Heather for her to have a look at, but Heather reckoned that some updates to the data were not present in this output. I wrote a little script that counts the number of <entry> elements in each of the XML files (all.xml and entry_hash.xml) and the former has 50,426 entries while the latter has 53,945. So the data extracted from the ‘entry_hash’ file is definitely not the same. The former file is about 111MB in size while the latter is 133MB so there’s definitely a lot more content, which I think is encouraging. Further investigation showed that the ‘all.xml’ file was actually generated in 2015 while the data I’ve exported is the ‘live’ data as it is now, which is good news. However, it would appear that data in the data management system that has not yet been published is stored somewhere else. As this represents two years of work it is data that we really need to track down.
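For reference, counting the entries can be done along the following lines. This is a Python 3 sketch rather than the script I actually ran, and it assumes the dumped file may not be a single well-formed XML document, so it scans for opening <entry> tags line by line instead of parsing the XML. The function names are my own.

```python
import re

# Matches '<entry>' or '<entry ' followed by attributes, but not tags that
# merely start with the same letters (e.g. '<entryfree>').
ENTRY_OPEN = re.compile(r"<entry[\s>]")


def count_entries_in_lines(lines):
    """Count opening <entry> tags in an iterable of text lines."""
    return sum(len(ENTRY_OPEN.findall(line)) for line in lines)


def count_entries(path):
    """Count opening <entry> tags in a file, reading it one line at a time."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_entries_in_lines(handle)
```

It would be called as `count_entries("all.xml")` and `count_entries("entry_hash.xml")`. Reading line by line keeps memory use low, which matters for files of more than 100MB.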
I went back through the documentation of the old system, which really is pretty horrible to read and unnecessarily complicated. Plus there are multiple versions of the documentation without any version control stating which is the most up to date version. I have a version in Word and a PDF containing images of scanned pages of a printout. Both differ massively without it being clear which superseded which. It turns out the scanned version is likely to be the most up to date, but of course being badly scanned images that are all wonky it’s not possible to search the text and attempting OCR didn’t work. After a lengthy email conversation with Heather we realised we would need to get back into the server to try and figure out where the data for the DMS was located. Heather needed to be in her office at work to do this and on Friday she managed to get access and via a Zoom call I was able to see the server and discuss the potential data locations with her. It looks like all of the data that has been worked on in the DMS but has yet to be integrated with the public site is located in a directory called ‘commit-out’. This contains more than 13,000 XML files, which I now have access to. If we can combine this with the data from ‘entry_hash’ and the data Heather and her colleague Geert have been working on for the letters R and S then we should have the complete dataset. Of course it’s not quite so simple as whilst looking through the server we realised that there are many different locations where a file called ‘entry_hash’ is found and no clear way of knowing which is the current version of the public data and which is just some old version that is not in use. What a mess. Anyway, progress has been made and our next step is to check that the ‘commit-out’ files do actually represent all of the changes made in the DMS and that the version of ‘entry_hash’ that I have so far extracted is the most up to date version.
I was on holiday for all of last week and Monday and Tuesday this week. My son and I were supposed to be visiting my parents for Easter, but we were unable to do so due to the lockdown and instead had to find things to amuse ourselves with around the house. I answered a few work emails during this time, including alerting Arts IT Support to some issues with the WordPress server and responding to a query from Ann Fergusson at the DSL. I returned to work (from home, of course) on Wednesday and spent the three days working on various projects.
For the Books and Borrowers project I spent some time downloading and looking through the digitised and transcribed borrowing registers of St. Andrews. They have made three registers from the second half of the 18th century available via a Wiki interface (see https://arts.st-andrews.ac.uk/transcribe/index.php?title=Main_Page) and we were given access to all of these materials that had been extracted and processed by Patrick McCann, who I used to work very closely with back when we were both based at HATII and worked for the Digital Curation Centre. Having looked through the materials it’s clear that we will be able to use the transcriptions, which will be a big help. The dates will probably need to be manually normalised, though, and we will need access to higher resolution images than the ones we have been given in order to make a zoom and pan interface using them.
I also updated the introductory text for Gerry McKeever’s interactive map of the novel Paul Jones, and I think this feature is now ready to go live, once Gerry wants to launch it. I also fixed an issue with the Editing Robert Burns website that was preventing the site editors (namely Craig Lamont) from editing blog posts. I also created a further new version of the Burns Supper map for Paul Malgrati. This version incorporates updated data, which has greatly increased the number of Suppers that appear on the map and I also changed the way videos work. Previously if an entry had a link to a video then a button was added to the entry that linked through to the externally hosted video site (which could be YouTube, Facebook, Twitter or some other site). Instead, the code now identifies the origin of the video and I’ve managed to embed players from YouTube, Facebook and Twitter. These now open the videos in the same drop-down overlay as the images. The YouTube and Facebook players are centre aligned but unfortunately Twitter’s player displays to the left and can’t be altered. Also, the YouTube and Facebook players expect the width and height of the player to be specified. I’ve taken these from the available videos, but ideally the desired height and width should be stored as separate columns in the spreadsheet so these can be applied to each video as required. Currently all YouTube and all Facebook videos have the same width and height, which can mean landscape oriented Facebook videos appear rather small, for example. Also, some videos can’t be embedded due to their settings (e.g. the Singapore Facebook video). However, I’ve added a ‘watch video’ button underneath the player so people can always click through to the original posting.
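Working out where a video comes from is essentially a matter of inspecting the hostname of its URL. The map itself runs in the browser, so the following Python sketch only illustrates the logic. The function name and the list of hostnames are assumptions, and the list is unlikely to be exhaustive.

```python
from urllib.parse import urlparse


def video_origin(url):
    """Classify a video URL as 'youtube', 'facebook', 'twitter' or 'other'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in ("youtube.com", "m.youtube.com", "youtu.be"):
        return "youtube"
    if host in ("facebook.com", "m.facebook.com", "fb.watch"):
        return "facebook"
    if host in ("twitter.com", "mobile.twitter.com"):
        return "twitter"
    # Anything unrecognised falls back to a plain link rather than a player.
    return "other"
```

Entries classed as ‘other’ (or ones whose settings block embedding) can then simply get the ‘watch video’ button rather than an embedded player.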
I also responded to a query from Rhona Alcorn about how DSL data exported from their new editing system will be incorporated into the live DSL site, responded to a query from Thomas Clancy about making updates to the Place-names of Kirkcudbrightshire website and responded to a query from Kirsteen McCue about an AHRC proposal she’s putting together.
I returned to looking at the ‘guess the category quiz’ that I’d created for the Historical Thesaurus before the Easter holidays and updated the way it worked. I reworked the way the database is queried so as to make things more efficient, to ensure the same category isn’t picked as more than one of the four options and to ensure that the selected word isn’t also found in one of the three ‘wrong’ category choices. I also decided to update the category table to include two new columns, one that holds a count of the number of lexemes that have a ‘wordoed’ and the other that holds a count of the number of lexemes that have a ‘wordoe’ in each category. I then ran a script that generated these figures for all 250,000 or so categories. This is really just caching information that can be gleaned from a query anyway, but it makes querying a lot faster and makes it easier to pinpoint categories of a particular size and I think these columns will be useful for tasks beyond the quiz (e.g. show me the 10 largest Aj categories). I then created a new script that queries the database using these columns and returns data for the quiz.
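As an illustration of the caching idea, here is a self-contained Python sketch using an in-memory SQLite database. The table layout, the column names other than ‘wordoed’ and ‘wordoe’, and the placeholder words are all assumptions for the example and do not reflect the real Historical Thesaurus schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    pos TEXT,
    oed_count INTEGER DEFAULT 0,  -- cached count of lexemes with a wordoed
    oe_count INTEGER DEFAULT 0    -- cached count of lexemes with a wordoe
);
CREATE TABLE lexeme (
    id INTEGER PRIMARY KEY,
    catid INTEGER,
    wordoed TEXT,
    wordoe TEXT
);
INSERT INTO category (id, pos) VALUES (1, 'n'), (2, 'aj'), (3, 'aj');
INSERT INTO lexeme (catid, wordoed, wordoe) VALUES
    (1, 'word_a', NULL), (1, 'word_b', 'oe_b'),
    (2, 'word_c', 'oe_c'), (3, NULL, 'oe_d');
""")

# One pass to fill in the cached counts for every category.
conn.execute("""
UPDATE category SET
    oed_count = (SELECT COUNT(*) FROM lexeme
                 WHERE lexeme.catid = category.id
                   AND lexeme.wordoed IS NOT NULL AND lexeme.wordoed <> ''),
    oe_count = (SELECT COUNT(*) FROM lexeme
                WHERE lexeme.catid = category.id
                  AND lexeme.wordoe IS NOT NULL AND lexeme.wordoe <> '')
""")
conn.commit()

# With the counts cached, 'the 10 largest Aj categories' is a cheap query.
largest_aj = conn.execute(
    "SELECT id, oed_count FROM category "
    "WHERE pos = 'aj' ORDER BY oed_count DESC LIMIT 10"
).fetchall()
```

The trade-off is the usual one with cached values: the counts need regenerating whenever lexemes are added, removed or moved between categories.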
This script is much more streamlined and considerably less prone to getting stuck in loops of finding nothing but unsuitable categories. Currently the script is set to only bring back categories that have at least two OED words in them, but this could easily be changed to target larger categories only (which would presumably make the quiz more of a challenge). I could also add in a check to exclude any words that are also found in the category name to increase the challenge further. The actual quiz page itself was pretty much unaltered during these updates, but I did add in a ‘loading’ spinner, which helps the transition between questions.
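The selection rules for a single question can be sketched as follows in Python. This works on an in-memory dictionary rather than the database, and the function name, data shape and placeholder words are all hypothetical. It assumes every category passed in shares the same part of speech.

```python
import random


def build_question(categories, rng, min_words=2, n_options=4):
    """Build one quiz question, or return None if the data can't support one.

    categories maps a category name to the set of words it contains.
    """
    # Only categories with enough words are eligible (at least two by default).
    eligible = sorted(name for name, words in categories.items()
                      if len(words) >= min_words)
    if not eligible:
        return None
    correct = rng.choice(eligible)
    word = rng.choice(sorted(categories[correct]))
    # A 'wrong' option must not be the correct category and must not
    # contain the chosen word.
    wrong_pool = [name for name in eligible
                  if name != correct and word not in categories[name]]
    if len(wrong_pool) < n_options - 1:
        return None
    # Sampling without replacement guarantees the options are all distinct.
    choices = rng.sample(wrong_pool, n_options - 1) + [correct]
    rng.shuffle(choices)
    return {"word": word, "answer": correct, "choices": choices}


# Placeholder data; 'w2' deliberately appears in two categories.
sample_categories = {
    "Cat A": {"w1", "w2"},
    "Cat B": {"w2", "w3"},
    "Cat C": {"w4", "w5"},
    "Cat D": {"w6", "w7"},
    "Cat E": {"w8", "w9"},
}
question = build_question(sample_categories, random.Random(0))
```

Raising `min_words` is the equivalent of targeting larger categories only, and a further filter on the candidate words could exclude any that also appear in the category name.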
I’ve also created an Old English version of the quiz which works in the same way except the date of the word isn’t displayed and the ‘wordoe’ column is used. Getting 5/5 on this one is definitely more of a challenge! Here’s an example question:
I spent the rest of the week upgrading all of the WordPress sites I manage to the latest WordPress release. This took quite a bit of time as I had to track down the credentials for each site, many of which I didn’t already have a note of at home. There were also some issues with some of the sites that I needed to get Arts IT Support to sort out (e.g. broken SSL certificates, sites with the login page blocked even when using the VPN). By the end of the week all of the sites were sorted.
This was the first full week of the Coronavirus lockdown and as such I was working from home and also having to look after my nine year-old son who is also at home on lockdown. My wife and I have arranged to split the days into morning and afternoon shifts, with one of us home-schooling our son while the other works during each shift and extra work squeezed in before and after these shifts. The arrangement has worked pretty well for all of us this week and I’ve managed to get a fair amount of work done.
This included spotting and requesting fixes for a number of other sites that had started to display scary warnings about their SSL certificates, working on an updated version of the Data Management Plan for the SCOSYA follow-on proposal, fixing some log-in and account related issues for the DSL people and helping Carolyn Jess-Cooke in English Literature with some technical issues relating to a WordPress blog she has set up for a ‘Stay at home’ literary festival (https://stayathomefest.wordpress.com/). I also had a conference call with Katie Halsey and Matt Sangster about the Books and Borrowers project, which is due to start at the beginning of June. It was my first time using the Zoom videoconferencing software and it worked very well, other than my cat trying to participate several times. We had a good call and made some plans for the coming weeks and months. I’m going to try and get an initial version of the content management system and database for the project in place before the official start of the project so that the RAs will be able to use this straight away. This is of even greater importance now as they are likely to be limited in the kinds of research activities they can do at the start of the project because of travel restrictions and will need to work with digital materials.
Other than these issues I divided my time between three projects. The first was the Burns Supper map for Paul Malgrati in Scottish Literature. Paul had sent me some images that are to be used in the map and I spent some time integrating these. The image appears as a thumbnail with credit text (if available) appearing underneath. If there is a link to the place the image was taken from the credit text appears as a link. Clicking on the image thumbnail opens the full image in a new tab. I also added links to the videos where applicable, but I decided not to embed the videos in the page as I think these would be too small and there would be just too much going on for locations that have both videos and an image. Paul also wanted clusters to be limited by areas (e.g. a cluster for Scotland rather than these just being amalgamated with a big cluster for Europe when zooming out) and I investigated this. I discovered that it is possible to create groups of locations. E.g. have a new column in the spreadsheet named ‘cluster’ or something like that and all the ‘Scotland’ ones could have ‘Scotland’ here, or all the South American ones could have ‘South America’ here. These will then be the top level clusters and they will not be further amalgamated on zoom out. Once Paul gets back to me with the clusters he would like for the data I’ll update things further. Below is an image of the map with the photos embedded:
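The grouping step itself is straightforward. A Python sketch of splitting the spreadsheet rows by a ‘cluster’ column might look like the following, where the function name, the ‘Other’ default and the sample rows are assumptions. On the map side, each group would then presumably be given its own cluster layer so that the groups are never merged with one another.

```python
from collections import defaultdict


def group_by_cluster(rows, column="cluster", default="Other"):
    """Group location rows by the value in their 'cluster' column.

    Rows with a missing or blank value end up in the default group.
    """
    groups = defaultdict(list)
    for row in rows:
        name = (row.get(column) or "").strip() or default
        groups[name].append(row)
    return dict(groups)


# Invented sample rows.
rows = [
    {"name": "Supper A", "cluster": "Scotland"},
    {"name": "Supper B", "cluster": "South America"},
    {"name": "Supper C", "cluster": "Scotland"},
    {"name": "Supper D", "cluster": ""},
]
groups = group_by_cluster(rows)
```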
The second major project I worked on was the interactive map for Gerry McKeever’s Regional Romanticism project. Gerry had got back to me with a new version of the data he’d been working on and some feedback from other people he’d sent the map to. I created a new version of the map featuring the new data and incorporated some changes to how the map worked based on feedback, namely I moved the navigation buttons to the top of the story pane and have made them bigger, with a new white dividing line between the buttons and the rest of the pane. This hopefully makes them more obvious to people and means the buttons are immediately visible rather than people potentially having to scroll to see them. I’ve also replaced the directional arrows with thicker chevron icons and have changed the ‘Return to start’ button to ‘Restart’. I’ve also made the ‘Next’ button on both the overview and the first slide blink every few seconds, at Gerry’s request. Hopefully this won’t be too annoying for people. Finally I made the slide number bigger too. Here’s a screenshot of how things currently look:
I then decided to chain several questions together to make the quiz more fun. Once the correct answer is given a ‘Next’ button appears, leading to a new question. I set up a ‘max questions’ variable that controls how many questions there are (e.g. 3, 5 or 10) and the questions keep coming until this number is reached. When the number is reached the user can then view a summary that tells them which words and (correct) categories were included, provides links to the categories and gives the user an overall score. I decided that if the user guesses correctly the first time they should get one star. If they guess correctly a second time they get half a star and any more guesses get no stars. The summary and star ratings for each question are also displayed as the following screenshot shows:
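The star rule is simple enough to show in a few lines of Python. This is only a sketch of the scoring logic described above, with hypothetical function names and placeholder words and categories.

```python
def stars_for(guesses_taken):
    """One star for a first-time correct guess, half for the second, else none."""
    if guesses_taken == 1:
        return 1.0
    if guesses_taken == 2:
        return 0.5
    return 0.0


def summarise(results):
    """Build the end-of-quiz summary.

    results is a list of (word, correct_category, guesses_taken) tuples,
    one per question.
    """
    questions = [
        {"word": word, "category": category, "stars": stars_for(guesses)}
        for word, category, guesses in results
    ]
    return {
        "questions": questions,
        "score": sum(q["stars"] for q in questions),
        "out_of": len(questions),
    }


summary = summarise([("w1", "Cat A", 1), ("w2", "Cat B", 2), ("w3", "Cat C", 4)])
```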
It’s shaping up pretty nicely, but I still need to work on the script that exports data from the database. Identifying random categories that contain at least one non-OE word and are of the same part of speech as the first randomly chosen category currently means hundreds or even thousands of database calls before a suitable category is returned. This is inefficient and occasionally the script was getting caught in a loop and timing out before it found a suitable category. I managed to catch this by having some sample data that loads if a suitable category isn’t found after 1000 attempts, but it’s not ideal. I’ll need to work on this some more over the next few weeks as time allows.
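The safety net amounts to a bounded retry loop with a fallback. Here is a generic Python sketch of that pattern. The function and parameter names are my own invention and the real script queries the database rather than calling Python functions.

```python
def find_with_fallback(pick_candidate, is_suitable, fallback, max_attempts=1000):
    """Keep picking candidates until one is suitable or attempts run out.

    Returns (value, used_fallback) so the caller can tell when the
    sample data had to be used instead of a real result.
    """
    for _ in range(max_attempts):
        candidate = pick_candidate()
        if is_suitable(candidate):
            return candidate, False
    return fallback, True
```

Capping the number of attempts guarantees the script finishes, but it doesn’t address the underlying inefficiency of making a database call per attempt.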
Last week was a full five-day strike and the end of the current period of UCU strike action. This week I returned to work, but the Coronavirus situation, which has been gradually getting worse over the past few weeks, ramped up considerably, with the University closed for teaching and many staff working from home. I came into work from Monday to Wednesday but the West End was deserted and there didn’t seem much point in me using public transport to come into my office when there was no-one else around so from Thursday onwards I began to work from home, as I will be doing for the foreseeable future.
Despite all of these upheavals and also suffering from a pretty horrible cold I managed to get a lot done this week. Some of Monday was spent catching up with emails that had come in whilst I had been on strike last week, including a request from Rhona Alcorn of SLD to send her the data and sound files from the Scots School Dictionary and responding to Alan Riach from Scottish Literature about some web pages he wanted updated (these were on the main University site and this is not something I am involved with updating). I also noticed that the version of this site that was being served up was the version on the old server, meaning my most recent blog posts were not appearing. Thankfully Raymond Brasas in Arts IT Support was able to sort this out. Raymond had also emailed me about some WordPress sites I manage that had out of date versions of the software installed. There were a couple of sites that I’d forgotten about, a couple that were no longer operational and a couple that had legitimate reasons for being out of date, so I got back to him about those, and also updated my spreadsheet of WordPress sites I manage to ensure the ones I’d forgotten about would not be overlooked again. I also became aware of SSL certificate errors on a couple of websites that were causing the sites to display scary warning messages before anyone could reach the sites, so asked Raymond to fix these. Finally, Fraser Dallachy, who is working on a pilot for a new Scots Thesaurus, contacted me to see if he could get access to the files that were used to put together the first version of the Concise Scots Dictionary. We had previously established that any electronic files relating to the printed Scots Thesaurus have been lost and he was hoping that these old dictionary files may contain data that was used in this old thesaurus. I managed to track the files down, but alas there appeared to be no semantic data in the entries found therein.
I also had a chat with Marc Alexander about a little quiz he would like to develop for the Historical Thesaurus.
I spoke to Jennifer Smith on Monday about the follow-on funding application for her SCOSYA project and spent a bit of time during the week writing a first draft of a Data Management Plan for the application, after reviewing all of the proposal materials she had sent me. Writing the plan raised some questions and I will no doubt have to revise the plan before the proposal is finalised, but it was good to get a first version completed and sent off.
I also finished work on the interactive map for Gerry McKeever’s Regional Romanticism project this week. Previously I’d started to use a new plugin to get nice curved lines between markers and all appeared to be working well. This week I began to integrate the plugin with my map, but unfortunately I’m still encountering unusable slowdown with the new plugin. Everything works fine to begin with, but after a bit of scrolling and zooming, especially round an area with lots of lines, the page becomes unresponsive. I wondered whether the issue might be related to the midpoint of the curve being dynamically generated from a function I took from another plugin so instead made a version that generated and then saved these midpoints that could then be used without needing to be calculated each time. This would also have meant that we could have manually tweaked the curves to position them as desired, which would have been great as some lines were not ideally positioned (e.g. from Scotland to the US via the North Pole), but even this seems to have made little impact on the performance issues. I even tried turning everything else off (e.g. icons, popups, the NLS map) to see if I could identify another cause of the slowdown but nothing has worked. I unfortunately had to admit defeat and resort to using straight lines after all. These are somewhat less visually appealing, but they result in no performance issues. Here’s a screenshot of this new version:
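For anyone curious about the ‘generate once and save’ approach to the curve midpoints, here is a rough Python sketch. It is not the function I took from the other plugin: it is a generic calculation that offsets the midpoint of each line at right angles to it, treating latitude and longitude as flat coordinates, and all of the names and sample coordinates are assumptions.

```python
def control_point(start, end, bend=0.2):
    """Return a control point for a curve between two (lat, lng) pairs.

    The point is the midpoint of the straight line, pushed sideways by a
    fraction ('bend') of the line's length, perpendicular to the line.
    """
    (lat1, lng1), (lat2, lng2) = start, end
    mid_lat, mid_lng = (lat1 + lat2) / 2, (lng1 + lng2) / 2
    d_lat, d_lng = lat2 - lat1, lng2 - lng1
    # (-d_lng, d_lat) is perpendicular to the direction (d_lat, d_lng).
    return (mid_lat - d_lng * bend, mid_lng + d_lat * bend)


def precompute_midpoints(lines, bend=0.2, overrides=None):
    """Calculate a control point for every line once, so it can be saved.

    lines maps a line id to a (start, end) pair. overrides lets hand-tuned
    points replace the calculated ones for awkward lines.
    """
    overrides = overrides or {}
    return {
        line_id: overrides.get(line_id, control_point(start, end, bend))
        for line_id, (start, end) in lines.items()
    }


# Invented sample lines; 'b' has a manually positioned control point.
lines = {"a": ((0.0, 0.0), (0.0, 10.0)), "b": ((10.0, 0.0), (20.0, 0.0))}
midpoints = precompute_midpoints(lines, bend=0.5, overrides={"b": (15.0, 3.0)})
```

The `overrides` argument is where manual tweaks for badly placed curves would go.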
With these updates in place I made a version of the map that would run directly on the desktop and sent Gerry some instructions on how to update the data, meaning he can continue to work on it and see how it looks. But my work on this is now complete for the time being.
I was supposed to meet with Paul Malgrati from Scottish Literature on Wednesday to discuss an interactive map of Burns Suppers he would like me to create. We decided to cancel our meeting due to the Coronavirus, but continued to communicate via email. Paul had sent me a spreadsheet containing data relating to the Burns Suppers and I spent some time working on some initial versions of the map, reusing some of the code from the Regional Romanticism map, which in turn used code from the SCOSYA map.
I migrated the spreadsheet to an online database and then wrote a script that exports this data in the JSON format that can be easily read into the map. The initial version uses OpenStreetMap.HOT as a basemap rather than the .DE one that Paul had selected as the latter displays all place-names in German where these are available (e.g. Großbritannien). The .HOT map is fairly similar, although for some reason parts of South America look like they’re underwater. We can easily change to an alternative basemap in future if required. In my initial version all locations are marked with red icons displaying a knife and fork. We can use other colours or icons to differentiate types if or when these are available. The map is full screen with an introductory panel in the top right. Hovering over an icon displays the title of the event while clicking on it replaces the introductory panel with a panel containing the information about the supper. The content is generated dynamically and only displays fields that contain data (e.g. very few include ‘Dress Code’). You can always return to the intro by clicking on the ‘Introduction’ button at the top.
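The ‘only display fields that contain data’ behaviour can be handled at export time by leaving empty fields out of the JSON altogether. A minimal Python sketch of that idea follows. The real export script works from the online database, and the function names, field names and sample rows here are invented for illustration.

```python
import json


def has_data(value):
    """Treat None and blank or whitespace-only strings as 'no data'."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def export_suppers(rows):
    """Serialise rows to JSON, dropping any field that has no data."""
    cleaned = [{key: value for key, value in row.items() if has_data(value)}
               for row in rows]
    return json.dumps(cleaned, ensure_ascii=False)


# Invented sample rows; coordinates are illustrative only.
rows = [
    {"title": "Example Supper A", "lat": 55.86, "lng": -4.25, "Dress Code": ""},
    {"title": "Example Supper B", "lat": 64.15, "lng": -21.94, "Dress Code": "Formal"},
]
exported = export_suppers(rows)
```

The map code then only needs to loop over whatever keys are present for a supper, rather than checking each possible field for emptiness.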
I spotted a few issues with the latitude and longitude of some locations that will need fixed. E.g. St Petersburg has Russia as the country but it is positioned in St Petersburg in Florida while Bogota Burns night in Colombia is positioned in South Sudan. I also realised that we might want to think about grouping icons as when zoomed out it’s difficult to tell where there are multiple closely positioned icons – e.g. the two in Reykjavik and the two in Glasgow. However, grouping may be tricky if different locations are assigned different icons / types.
After further email discussions with Paul (and being sent a new version of the spreadsheet) I created an updated version of my initial map. This version uses the data from the new spreadsheet and adds the new ‘Attendance’ field to the pop-up where applicable. It is also now possible to zoom further out, and to scroll past the international dateline and still see the data (in the previous version if you did this the data would not appear). I also integrated the Leaflet plugin MarkerCluster (see https://github.com/Leaflet/Leaflet.markercluster), which very nicely handles clustering of markers. In this new version of my map markers are now grouped into clusters that split apart as you zoom in. I also added in an option to hide and show the pop-up area, as on small screens (e.g. mobile phones) the area takes up a lot of space, and if you click on a marker that is already highlighted this now deselects the marker and closes the popup. Finally, I added a new ‘Filters’ section in the introduction that you can show or hide. This contains options to filter the data by period. The three periods are listed (all ‘on’ by default) and you can deselect or select any of them. Doing so automatically updates the map to limit the markers to those that meet the criteria. This is ‘remembered’ as you click on other markers and you can update your criteria by returning to the introduction. I did wonder about adding a summary of the selected filters to the popup of every marker, but I think this will just add too much clutter, especially when viewing the map on smaller screens such as tablets or phones. Here is an example of the map as it currently looks:
The main things left to do are adding more filters and adding in images and videos, but I’ll wait until Paul sends me more data before I do anything further. That’s all for this week. I’ll just need to see how work progresses over the next few weeks as with the schools now shut I’ll need to spend time looking after my son in addition to tackling my usual work.
This was another strike week, and I only worked on Friday. Next week will be a full five-day strike. On Friday I caught up with a few emails about future projects from Paul Malgrati and Sourit Bhattacharya, read through some of the documentation for the SCOSYA follow-on funding proposal that is currently in development and had a further email conversation with Heather Pagan regarding the redevelopment of the Anglo-Norman Dictionary. Other than that I focussed on the interactive map for Gerry McKeever’s Regional Romanticism project. Last week I’d imported the new data and made a number of changes to the map, but had discovered that the plugin I was using to give nice Bezier curve lines between markers was not scaling well, and was making the map unusably slow. This week I experimented a bit more with the plugin, trying to find a more efficient means of making the lines appear, but although the alternative I came up with was slightly less inefficient it was still pretty much unusable. I therefore began looking into alternative libraries. There is another called leaflet.curve (https://github.com/elfalem/Leaflet.curve) that is looking promising, but the only downside is you need to specify a latitude and longitude for the midpoint of the curve in addition to the start and end points. I didn’t want to do this for all 80-odd locations without seeing if the result would be usable so instead I created a test map that uses the same midpoint for all lines, which you can view below:
Even with all of the lines displayed the map loads without any noticeable lag, so I’d say this is looking promising. After doing this I looked at the first plugin I’d used again and realised that it included a function that automatically calculates the latitude and longitude of the midpoint of a curve between two lat/lon pairings that are passed to it. I wondered whether I could just rip this function out of the first plugin and incorporate it into the second to avoid having to manually work out where the midpoint of each line should go. I wasn’t sure if this would work but it looks like it has, as you can see here:
This second map does seem to have the occasional performance issues and I’ll need to test it out more fully, but it is a considerable improvement on the original map. If it does prove to be too laggy once I update the actual interface I can still use the function to output the midpoint values and save them as static lat/lon values. I’m still going to have to change quite a bit of the map code to replace the first plugin with this new version, and I didn’t have time to do so this week, but I’d say things are looking very promising.
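The fallback of saving midpoints as static values could look something like the sketch below. It is written in Python purely for illustration and is not the function taken from the first plugin: it simply offsets the midpoint perpendicular to the straight line, treating latitude and longitude as flat coordinates. The `bend` parameter and the example coordinates are assumptions.

```python
def curve_control_point(start, end, bend=0.25):
    """Return a (lat, lng) control point for a curve between two points.

    The point is the straight-line midpoint pushed sideways by a
    fraction (`bend`) of the line's length. Latitude and longitude are
    treated as flat x/y values, which is only a rough approximation.
    """
    lat1, lng1 = start
    lat2, lng2 = end
    mid_lat = (lat1 + lat2) / 2
    mid_lng = (lng1 + lng2) / 2
    d_lat = lat2 - lat1
    d_lng = lng2 - lng1
    # The perpendicular of (d_lng, d_lat) in x/y terms is (-d_lat, d_lng).
    return (mid_lat + d_lng * bend, mid_lng - d_lat * bend)


def precompute_midpoints(lines, bend=0.25):
    """Calculate every control point once so the results can be saved."""
    return [
        {"start": s, "end": e, "mid": curve_control_point(s, e, bend)}
        for s, e in lines
    ]


# Hypothetical pair of locations, not taken from the project data.
example_lines = [((55.0, -3.6), (51.5, -0.1))]
print(precompute_midpoints(example_lines))
```

Because the output is just a list of coordinates, individual midpoints could be adjusted by hand after they are generated. Lines that cross the dateline or span very large distances would need extra handling that this sketch does not attempt.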
It was another strike week, and this time I only worked on Thursday and Friday. I spent a lot of Thursday catching up with emails and thinking about the technical implications of potential projects that people had contacted me about. I responded to a PhD student who had asked me for advice about a Wellcome Trust application and I spent quite a bit of time going through the existing Anglo-Norman Dictionary website to get a better understanding of how it works, the features it offers and how it could be improved. I’d also been contacted by two members of Critical Studies staff who wanted some technical work doing. The first was Paul Malgrati, who is wanting to put together an interactive map of Burns Suppers across the world. He’d sent me a spreadsheet containing the data he’s so far compiled and I spent some time going through this and replying to his email with some suggestions. The second was Sourit Bhattacharya, who is submitting a Carnegie application to develop an online bibliography. I considered his requirements and replied to him with some ideas.
I spent most of Friday working on the interactive map that plots important locations in a novel for Gerry McKeever’s Regional Romanticism project. I created an initial version of this back in January and since then Gerry has been working on his data, extending it to cover all three volumes of the novel and locations across the globe. He’d also made some changes to the structure of the data (e.g. page numbers had been separated out from the extracts to a new column in the spreadsheet) and had greatly enhanced the annotations made for each item. He had also written some new introductory text. The new data consisted of 120 entries across 88 distinct locations, and I needed to convert this from an Excel spreadsheet into a JSON file, with locations separated out and linked to multiple entries to enable map markers to be associated with multiple items. It took several hours to do this. I did consider writing a script to handle the conversion but that would have taken some time in itself and this is the last time the entire dataset will be migrated so the script would never be needed again. Plus undertaking the task manually gave me the opportunity to check the data and to ensure that the same locations mentioned in different entries all matched up.
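For comparison, the target structure (locations separated out, each linked to multiple entries) can be sketched in a few lines of Python. The conversion described above was done by hand, so this only illustrates the shape of the output; the field names and sample entries are hypothetical.

```python
import json


def group_by_location(entries):
    """Group flat spreadsheet-style entries under their shared location.

    Each distinct (place, lat, lng) becomes one location record holding
    a list of the items that mention it.
    """
    locations = {}
    for entry in entries:
        key = (entry["place"], entry["lat"], entry["lng"])
        location = locations.setdefault(
            key,
            {"place": entry["place"], "lat": entry["lat"], "lng": entry["lng"], "items": []},
        )
        location["items"].append(
            {
                "volume": entry["volume"],
                "page": entry["page"],
                "extract": entry["extract"],
                "note": entry["note"],
            }
        )
    return list(locations.values())


# Hypothetical entries; the real data has 120 entries across 88 locations.
sample_entries = [
    {"place": "Place A", "lat": 55.0, "lng": -3.6, "volume": 1, "page": "12",
     "extract": "First extract", "note": "First note"},
    {"place": "Place A", "lat": 55.0, "lng": -3.6, "volume": 2, "page": "40",
     "extract": "Second extract", "note": "Second note"},
    {"place": "Place B", "lat": 40.7, "lng": -74.0, "volume": 3, "page": "7",
     "extract": "Third extract", "note": "Third note"},
]

print(json.dumps(group_by_location(sample_entries), indent=2))
```

A script like this only matches locations whose name and coordinates are identical, so it would not catch the inconsistencies that a manual check picks up.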
I also created a new version of the interactive map that used the new data and incorporated some other updates too. This version has the new intro slide and hopefully has all of the updates to the existing data plus all of the new data for all three volumes. I changed the display slightly to include page numbers as a separate field (the information now appears in the ‘volume’ section) and also to include the notes. To differentiate these from the extracts I’ve enclosed the extracts in large speech marks.
One issue I encountered is that with all of the new data in place the loading of new items and panning to them slowed to a crawl. The map unfortunately became absolutely unusable. After some investigation I realised that the plugin that makes the nice curved lines between locations was to blame. It appears to not be at all scalable and I had to disable the grey dotted lines on the map as with all of the markers visible the map was unusable. I may be able to fix this, or I may have to switch to an alternative plugin for the lines as the one that’s currently used appears to be horribly inefficient and unscalable. However, the yellow line connecting the current marker with the previous one is still visible as you scroll around the locations and I think this is the most important thing. Below is a screenshot of the map as it currently stands. Next week I will be continuing with the UCU strike action and will therefore only be working on the Friday.
I met with Fraser Dallachy on Monday to discuss his ongoing pilot Scots Thesaurus project. It’s been a while since I’ve been asked to do anything for this project and it was good to meet with Fraser and talk through some of the new automated processes he wanted me to try out. One thing he wanted to try was to tag the DSL dictionary definitions for part of speech to see if we could then automatically pick out word forms that we could query against the Historical Thesaurus to try and place the headword within a category. I adapted a previous script I’d created that picked out random DSL entries. This script targeted main entries (i.e. not supplements) that were nouns, were monosemous and had one sense, had fewer than 5 variant spellings, single word headwords and ‘short’ definitions, with the option to specify what is meant by ‘short’ in terms of the number of characters. I updated the script to bring back all DOST entries that met these criteria and had definitions that were less than 1000 characters in length, which resulted in just under 18,000 rows being returned (but I will rerun the script with a smaller character count if Fraser wants to focus on shorter entries). The script also stripped out all citations and tags from the definition to prepare it for POS tagging. With this dataset exported as a CSV I then began experimenting with a POS Tagger. I decided to use the Stanford POS Tagger (https://nlp.stanford.edu/software/tagger.html) which can be run at the command line, and I created a PHP script that went through each row of the CSV, passed the prepared definition text to the Tagger, pulled in the output and stored it in a database. I left the process running overnight and it had completed the following morning. I then outputted the rows as a spreadsheet and sent them on to Fraser for feedback. Fraser also wanted to see about using the data from the Scots School Dictionary so I sent that on to him too.
I also did a little bit of work for the DSL, investigating why some geographically tagged information was not being displayed in the citations, and replied to a few emails from Heather Pagan of the Anglo-Norman Dictionary as she began to look into uploading new data to their existing and no longer supported dictionary management system. I also gave some feedback on a proposal written by Rachel Douglas, a lecturer in French. Although this is not within Critical Studies and should be something Luca Guariento looks at, he is currently on holiday so I offered to help out. I also set up an initial WordPress site for Matthew Creasey’s new project. This still needs some further work, but I’ll need further information from Matthew before I can proceed. On Wednesday I met with Jennifer Smith and E Jamieson to discuss a possible follow-on project for the Scots Syntax Atlas. We talked through some of the possibilities and I think the project has huge potential. I’ll be helping to write the Data Management Plan and other such technical things for the proposal in due course.
I met with Marc and Fraser on Friday to discuss our plans for updating the way dates are stored in the Historical Thesaurus, which will make it much easier to associate labels with specific dates and to update the dates in future as we align the data with revisions from the OED. I’d previously written a script that generated the new dates and from these generated a new ‘full date’ field which I then matched against the original ‘full date’ to spot errors. The script identified 1,116 errors, but this week I updated my script to change the way it handled ‘b’ dates. These are the dates that appear after a slash. Where the date after the slash is in the same decade as the main date only one digit should be displayed (e.g. 1975/6), but this is not done consistently, with dates sometimes appearing as 1975/76. Where this happened my script was noting the row as an error, but Marc wanted these to be ignored. I updated my script to take this into consideration, and this has greatly reduced the number of rows that will need to be manually checked, reducing the output to just 284 rows.
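The ‘b’ date comparison can be sketched as below. This is a Python illustration of the idea rather than the script itself, and it assumes that a two-digit ‘b’ date counts as ‘same decade’ when its first digit equals the decade digit of the main date.

```python
import re

# A main date of three or four digits, a slash, then a two-digit 'b' date.
B_DATE = re.compile(r"(\d{3,4})/(\d{2})(?!\d)")


def normalise_b_dates(full_date):
    """Rewrite same-decade 'b' dates in their one-digit form.

    '1975/76' becomes '1975/6'; '1975/86' is left alone because the
    'b' date falls in a different decade.
    """
    def shorten(match):
        main, b = match.group(1), match.group(2)
        if main[-2] == b[0]:
            return f"{main}/{b[1]}"
        return match.group(0)

    return B_DATE.sub(shorten, full_date)


def dates_match(generated, original):
    """Compare two 'full date' strings, ignoring 'b' date formatting."""
    return normalise_b_dates(generated) == normalise_b_dates(original)


print(normalise_b_dates("1975/76"))
print(dates_match("1975/6", "1975/76"))
```

Normalising both sides before comparing means rows that differ only in this formatting are no longer flagged, while genuine mismatches still are.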
I spent the rest of my time this week working on the Books and Borrowers project. Although this doesn’t officially begin until the summer I’m designing the data structure at the moment (as time allows) so that when the project does start the RAs will have a system to work with sooner rather than later. I mapped out all of the fields in the various sample datasets in order to create a set of ‘core’ fields, mapping the fields from the various locations to these ‘core’ fields. I also designed a system for storing additional fields that may only be found at one or two locations, are not ‘core’ but still need to be recorded. I then created the database schema needed to store the data in this format and wrote a document that details all of this which I sent to Katie Halsey and Matt Sangster for feedback.
Matt also sent me a new version of the Glasgow Student borrowings spreadsheet he had been working on, and I spent several hours on Friday getting this uploaded to the pilot online resource I’m working on. I experimented with a new method of extracting the data from Excel to try and minimise the number of rows that were getting garbled due to Excel’s horrible attempts to save files as HTML. As previously documented, the spreadsheet uses formatting in a number of columns (e.g. superscript, strikethrough). This formatting is lost if the contents of the spreadsheet are copied in a plain text way (so no saving as a CSV, or opening the file in Google Docs or just copying the contents). The only way to extract the formatting in a way that can be used is to save the file as HTML in Excel and then work with that. But the resulting HTML produced by Excel is awful, with hundreds of tags and attributes scattered across the file used in an inconsistent and seemingly arbitrary way.
For example, this is the HTML for one row:
<tr height=23 style='height:17.25pt'>
<td height=23 width=64 style='height:17.25pt;width:48pt'></td>
<td width=143 style='width:107pt'>Charles Wilson</td>
<td width=187 style='width:140pt'>Charles Wilson</td>
<td width=86 style='width:65pt'>Charles</td>
<td width=158 style='width:119pt'>Wilson<span
<td width=88 style='width:66pt'>Nat. Phil.</td>
<td width=129 style='width:97pt'>Natural Philosophy</td>
<td width=64 style='width:48pt'>B</td>
<td class=xl69 width=81 style='width:61pt'>10</td>
<td class=xl70 width=81 style='width:61pt'>3</td>
<td width=250 style='width:188pt'>Wells Xenophon vol. 3<font class="font6"><sup>d</sup></font></td>
<td width=125 style='width:94pt'>Mr Smith</td>
<td width=124 style='width:93pt'></td>
<td width=124 style='width:93pt'></td>
<td width=124 style='width:93pt'>Adam Smith</td>
<td width=124 style='width:93pt'></td>
<td width=124 style='width:93pt'></td>
<td width=89 style='width:67pt'>22 Mar 1757</td>
<td width=89 style='width:67pt'>10 May 1757</td>
<td align=right width=56 style='width:42pt'>2</td>
<td width=64 style='width:48pt'>4r</td>
<td class=xl71 width=64 style='width:48pt'>1</td>
<td class=xl70 width=64 style='width:48pt'>007</td>
<td class=xl65 width=325 style='width:244pt'><a
<td width=293 style='width:220pt'>Xenophon.</td>
<td width=392 style='width:294pt'>Opera quae extant omnia; unà cum
chronologiâa Xenophonteâ <span style='display:none'>cl. Dodwelli, et quatuor
tabulis geographicis. [Edidit Eduardus Wells] / [Xenophon].</span></td>
<td width=110 style='width:83pt'>Wells, Edward, 16<span style='display:none'>67-1727.</span></td>
<td colspan=2 width=174 style='mso-ignore:colspan;width:131pt'>Sp Coll
<td align=right width=64 style='width:48pt'>1</td>
<td align=right width=121 style='width:91pt'>1</td>
<td width=64 style='width:48pt'>T111427</td>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
<td width=64 style='width:48pt'></td>
Previously I tried to fix this by running through several ‘find and replace’ passes to try and strip out all of the rubbish, while retaining what I needed, which was <tr>, <td> and some formatting tags such as <sup> for superscript.
This time I found a regular expression that removes all attributes from HTML tags, so for example <td width=64 style='width:48pt'> becomes <td> (see it here: https://stackoverflow.com/questions/3026096/remove-all-attributes-from-an-html-tag). I could then pass the resulting contents of every <td> through PHP’s strip_tags function to remove any remaining tags that were not required (e.g. <span>) while specifying the tags to retain (e.g. <sup>).
This approach seemed to work very well until I analysed the resulting rows and realised that the columns of many rows were all out of synchronisation, meaning any attempt at programmatically extracting the data and inserting it into the correct field in the database would fail. After some further research I realised that Excel’s save as HTML feature was to blame yet again. Without there being any clear reason, Excel sometimes expands a cell into the next cell or cells if these cells are empty. An example of this can be found above and I’ve extracted it here:
<td colspan=2 width=174 style='mso-ignore:colspan;width:131pt'>Sp Coll Bi2-g.19-23</td>
The ‘colspan’ attribute means that the cell will stretch over multiple columns, in this case 2 columns, but elsewhere in the output file it was 3 and sometimes 4 columns. Where this happens the following cells simply don’t appear in the HTML. As my regular expression removed all attributes this ‘colspan’ was lost and the row ended up with subsequent cells in the wrong place.
Once I’d identified this I could update my script to check for the existence of ‘colspan’ before removing attributes, adding in the required additional empty cells as needed (so in the above case an extra <td></td>).
With all of this in place the resulting HTML was much cleaner. Here is the above row after my script had finished:
<td>Wells Xenophon vol. 3<sup>d</sup></td>
<td>22 Mar 1757</td>
<td>10 May 1757</td>
<td>Opera quae extant omnia; unà cum
chronologiâa Xenophonteâ cl. Dodwelli, et quatuor
tabulis geographicis. [Edidit Eduardus Wells] / [Xenophon].</td>
<td>Wells, Edward, 1667-1727.</td>
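The clean-up steps described above (strip attributes, keep only wanted tags, pad out ‘colspan’ cells) can be sketched in Python as follows. This is not the PHP script used for the import and not the regular expression from the linked Stack Overflow answer. The patterns here are assumptions that suit well-formed cells like the sample ones, which are taken from the row shown earlier.

```python
import re

CELL = re.compile(r"<td([^>]*)>(.*?)</td>", re.DOTALL | re.IGNORECASE)
COLSPAN = re.compile(r"colspan\s*=\s*['\"]?(\d+)", re.IGNORECASE)
# Any opening or closing tag other than <sup> / </sup>.
UNWANTED_TAG = re.compile(r"</?(?!sup\b)[a-zA-Z][^>]*>")
# A retained tag that still carries attributes.
ATTRIBUTES = re.compile(r"<(\w+)\s[^>]*>")


def clean_row(row_html):
    """Return the cell contents of one table row as a list of strings.

    Attributes and unwanted tags are removed, <sup> is kept, and a cell
    with colspan=N is followed by N-1 empty cells so the columns stay
    aligned with the rest of the table.
    """
    cells = []
    for attrs, content in CELL.findall(row_html):
        content = UNWANTED_TAG.sub("", content)
        content = ATTRIBUTES.sub(r"<\1>", content)
        cells.append(" ".join(content.split()))
        span = COLSPAN.search(attrs)
        if span:
            cells.extend([""] * (int(span.group(1)) - 1))
    return cells


sample = (
    "<tr><td width=250 style='width:188pt'>Wells Xenophon vol. 3"
    "<font class=\"font6\"><sup>d</sup></font></td>"
    "<td colspan=2 width=174 style='mso-ignore:colspan;width:131pt'>"
    "Sp Coll Bi2-g.19-23</td>"
    "<td align=right width=64 style='width:48pt'>1</td></tr>"
)

print(clean_row(sample))
```

Reading the ‘colspan’ value before discarding the attributes is the important step: the padding has to happen while the information is still available.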
I then updated my import script to pull in the new fields (e.g. normalised class and professor names), set it up so that it would not import any rows that had ‘Yes’ in the first column, and updated the database structure to accommodate the new fields too. The upload process then ran pretty smoothly and there are now 8145 records in the system. After that I ran the further scripts to generate dates, students, professors, authors, book names, book titles and classes and updated the front end as previously discussed. I still have the old data stored in separate database tables as well, just in case we need it, but I’ve tested out the front-end and it all seems to be working fine to me.
I divided most of my time between three projects this week. For the Place-Names of Mull and Ulva my time was spent working with the GB1900 dataset. On Friday last week I’d created a script that would go through the entire 2.5 million row CSV file and extract each entry, adding it to a database for easier querying. This process had finished on Monday, but unfortunately things had gone wrong during the processing. I was using the PHP function ‘fgetcsv’ to extract the data a line at a time. This splits the CSV up based on a delimiting character (in this case a comma) and adds each part into an array, thus allowing the data to be inserted into my database. Unfortunately some of the data contained commas. In such cases the data was enclosed in double quotes, which is the standard way of handling such things, and I had thought the PHP function would automatically handle this, but alas it didn’t, meaning whenever a comma appeared in the data the row was split up into incorrect chunks and the data was inserted incorrectly into the database. After realising this I added another option to the ‘fgetcsv’ command to specify a character to be identified as the ‘enclosure’ character and set the script off running again. It had completed the insertion by Wednesday morning, but when I came to query the database again I realised that the process had still gone wrong. Further investigation revealed the cause to be the GB1900 CSV file itself, which was encoded with UCS-2 character encoding rather than the more usual UTF-8. I’m not sure why the data was encoded in this way, as it’s not a current standard and it results in a much larger file size than using UTF-8. It also meant that my script was not properly identifying the double quote characters, which is why my script failed a second time.
However, after identifying this issue I converted the CSV to UTF-8, picked out a section with commas in the data, tested my script, discovered things were working this time, and let the script loose on the full dataset yet again. Thankfully it proved to be ‘third time lucky’ and all 2.5 million rows had been successfully inserted by Friday morning.
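The combination of problems here (quoted commas plus a two-byte encoding) can be illustrated with a short Python sketch. It is not the PHP script used for the import. It assumes Python’s ‘utf-16’ codec is an acceptable stand-in for UCS-2, and the sample row and its column layout are hypothetical rather than taken from GB1900.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-16"):
    """Yield rows from a CSV file, decoding it before parsing.

    Once the text is decoded correctly, the csv module's default
    quote character (a double quote) keeps quoted commas inside a
    single field.
    """
    with open(path, newline="", encoding=encoding) as handle:
        yield from csv.reader(handle)


# Build a tiny two-byte-encoded file containing a quoted comma.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    with open(path, "w", newline="", encoding="utf-16") as handle:
        handle.write('1,"Hypothetical Farm, Upper",56.4,-6.0\r\n')
    rows = list(read_rows(path))

print(rows)
```

Declaring the encoding when opening the file avoids the need for a separate conversion pass, although converting to UTF-8 once, as described above, works just as well.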
After that I was then able to extract all of the place-names for the three parishes we’re interested in, which is a rather more modest 3,908 rows. I then wrote another script that would take this data and insert it into the project’s place-name table. The place-names are a mixture of Gaelic and English (e.g. ‘A’ Mhaol Mhòr’ is pretty clearly Gaelic while ‘Argyll Terrace’ is not) and for now I set the script to just add all place-names to the ‘English’ rather than ‘Gaelic’ field. The script also inserts the latitude and longitude values from the GB1900 data, and associates the appropriate parish. I also found a bit of code that takes latitude and longitude figures and generates a 6 figure OS grid reference from them. I tested this out and it seemed pretty accurate, so I also added this to my script, meaning all names also have the grid reference field populated.
The other thing I tried to do was to grab the altitude for each name via the Google Maps service. This proved to be a little tricky as the service blocks you if you make too many requests all at once. Also, our server was blacklisting my computer for making too many requests in a short space of time too, meaning for a while afterwards I was unable to access any page on the site or the database. Thankfully Arts IT Support managed to stop me getting blocked and I managed to set the script to query Google Maps at a rate that was acceptable to it, so I was able to grab the altitudes for all 3908 place-names (although 16 of them are at 0m so may look like it’s not worked for these). I also added in a facility to upload, edit and delete one or more sound files for each place-name, together with optional captions for them in English and Gaelic. Sound files must be in the MP3 format.
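Pacing the requests can be sketched as below. This is a generic Python illustration, not the script used here, and it deliberately contains no real Google Maps call: `fetch` is a hypothetical function supplied by the caller, and the interval value is an assumption.

```python
import time


def throttled(items, fetch, min_interval=0.5, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(item) for each item, leaving a gap between calls.

    `fetch` is supplied by the caller (for example a function that asks
    a web service for an altitude). `sleep` and `clock` are parameters
    so the pacing can be tested without actually waiting.
    """
    results = []
    last_call = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch(item))
    return results


def fake_altitude(place):
    """Placeholder lookup used only for this demonstration."""
    return 0


print(throttled(["Place A", "Place B"], fake_altitude, min_interval=0.1))
```

A fixed gap is the simplest approach. A service that reports its own limits, or returns explicit ‘too many requests’ errors, might call for retries with a growing delay instead.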
The second project I worked on this week was my redevelopment of the ‘Digital Humanities at Glasgow’ site. I have now finished going through the database of DH projects, trimming away the irrelevant or broken ones and creating new banners, icons, screenshots, keywords and descriptions for the rest. There are now 75 projects listed, including 15 that are currently set as ‘Showcase’ projects, meaning they appear in the banner slideshow and on the ‘showcase’ page. I also changed the site header font and fixed an issue with the banner slideshow and images getting too small on narrow screens. I’ve asked Marc Alexander and Lorna Hughes to give me some feedback on the new site and I hope to be able to launch it in two weeks or so.
My third major project of the week was the Historical Thesaurus. Marc, Fraser and I met last Friday to discuss a new way of storing dates that I’ve been wanting to implement for a while, and this week I began sorting this out. I managed to create a script that can process any date, including associating labels with the appropriate date. Currently the script allows you to specify a category (or to load a random category) and the dates for all lexemes therein are then processed and displayed on screen. As of yet nothing is inserted into the database. I have also updated the structure of the (currently empty) dates table to remove the ‘date order’ field. I have also changed all date fields to integers rather than varchars to ensure that ordering of the columns is handled correctly. At last Friday’s meeting we discussed replacing ‘OE’ and ‘_’ with numerical values. We had mentioned using ‘0000’ for OE, but I’ve realised this isn’t a good idea as ‘0’ can easily be confused with null. Instead I’m using ‘1100’ for OE and ‘9999’ for ‘current’. I’ve also updated the lexeme table to add in new fields for ‘firstdate’ and ‘lastdate’ that will be the cached values of the first and last dates stored in the new dates table.
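The reasoning behind the integer fields and the ‘1100’ / ‘9999’ values can be shown with a small Python sketch. The function names are hypothetical and this is not the processing script itself. It only illustrates how the sentinel values behave when cached first and last dates are derived.

```python
OE_YEAR = 1100       # stands in for 'OE'
CURRENT_YEAR = 9999  # stands in for '_' (still current)


def to_year(value):
    """Convert a stored date value to an integer year."""
    if value == "OE":
        return OE_YEAR
    if value == "_":
        return CURRENT_YEAR
    return int(value)


def first_and_last(date_values):
    """Return the (firstdate, lastdate) pair to cache for a lexeme."""
    years = [to_year(value) for value in date_values]
    return min(years), max(years)


# Sorting as text puts '1100' before '999'; sorting as integers does not.
print(sorted(["999", "1100"]))
print(sorted([999, 1100]))
print(first_and_last(["OE", "1450", "_"]))
```

Using non-zero sentinels keeps ‘OE’ and ‘current’ distinguishable from a missing value, at the cost of needing to remember that 1100 and 9999 are not literal years.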
The script displays each lexeme in a category with its ‘full date’ column. It then displays what each individual entry in the new ‘date’ table would hold for the lexeme in boxes beneath this, and then finishes off with displaying what the new ‘firstdate’ and ‘lastdate’ fields would contain. Processing all of the date variations turned out to be somewhat easier than it was for generating timeline visualisations, as the former can be treated as individual dates (an OE, a first, a mid, a last, a current) while the latter needed to transform the dates into ranges, meaning the script had to check how each individual date connected to the next, had to possibly use ‘b’ dates etc.
I’ve tested the script out and I have so far only encountered one issue, and that is there are 10 rows that have first dates and mid dates and last dates but instead of the ‘firmidcon’ field joining the first and the mid dates together the ‘firlastcon’ field is used instead. Then the ‘midlascon’ field is used to join the mid date to the last. This is an error as ‘firlastcon’ should not be used to join first and mid dates. An example of this happening is htid 28903 in catid 8880 where the ‘full date’ is ‘1459–1642/7 + 1856’. There may be other occasions where the wrong joining column has been used, but I haven’t checked for these so far.
After getting the script to sort out the dates I then began to look at labels. I started off using the ‘label’ field in order to figure out where in the ‘full date’ the label appeared. However, I noticed that where there are multiple labels these appear all joined together in the label field, meaning in such cases the contents of the label field will never be matched to any text in the ‘full date’ field. E.g. htid 6463 has the full date ‘1611 Dict. + 1808 poet. + 1826 Dict.’ and the label field is ‘Dict. poet. Dict.’ which is no help at all.
Instead I abandoned the ‘label’ field and just used the ‘full date’ field. Actually, I still use the ‘label’ field to check whether the script needs to process labels or not. Here’s a description of the logic for working out where a label should be added:
The dates are first split up into their individual boxes. Then, if there is a label for the lexeme I go through each date in turn. I split the full date field and look at the part after the date. I go through each character of this in turn. If the character is a ‘+’ then I stop. If I have yet to find label text (they all start with an a-z character) and the character is a ‘-’ and the following character is a number then I stop. Otherwise if the character is a-z I note that I’ve started the label. If I’ve started the label and the current character is a number then I stop. Otherwise I add the current character to the label and proceed to the next character until all remaining characters are processed or a ‘stop’ criterion is reached. After that if there is any label text it’s added to the date. This process seems to work. I did, however, have to fix how labels applied to ‘current’ dates are processed. For a current date my algorithm was adding the label to the final year and not the current date (stored as 9999) as the label is found after the final year and ‘9999’ isn’t found in the full date string. I added a further check for ‘current’ dates after the initial label processing that moves labels from the penultimate date to the current date in such cases.
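The character-by-character logic above can be sketched in Python as follows. This is an interpretation, not the actual script: it treats any letter as the start of a label, accepts either a hyphen or an en dash as the range marker, ignores characters that appear before the label starts, and does not cover the extra step for ‘current’ dates.

```python
def label_after(remainder):
    """Extract the label from the text that follows a date.

    Stops at '+', at a dash followed by a digit (before any label text
    has been seen), or at a digit once the label has started.
    """
    label = ""
    started = False
    for i, ch in enumerate(remainder):
        following = remainder[i + 1] if i + 1 < len(remainder) else ""
        if ch == "+":
            break
        if not started and ch in "-–" and following.isdigit():
            break
        if not started and ch.isalpha():
            started = True
        if started and ch.isdigit():
            break
        if started:
            label += ch
    return label.strip()


def labels_for(full_date, years):
    """Pair each year with the label (if any) that follows it."""
    pairs = []
    position = 0
    for year in years:
        start = full_date.index(str(year), position)
        position = start + len(str(year))
        pairs.append((year, label_after(full_date[position:])))
    return pairs


print(labels_for("1611 Dict. + 1808 poet. + 1826 Dict.", [1611, 1808, 1826]))
```

Working from the ‘full date’ string in this way means each label stays attached to the date it follows, which the joined-up ‘label’ field cannot provide.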
In addition to these three big projects I also had an email conversation with Jane Roberts about some issues she’d been having with labels in the Thesaurus of Old English, I liaised with Arts IT Support to get some server space set up for Rachel Smith and Ewa Wanat’s project, I gave some feedback on a job description for an RA for the Books and Borrowing project, helped Carole Hough with an issue with a presentation of the Berwickshire Place-names resource, gave the PI response for Thomas’s Iona project a final once-over, gave a report on the cost of Mapbox to Jennifer Smith for the SCOSYA project and arranged to meet Matthew Creasey next week to discuss his Decadence and Translation project.
I spent quite a bit of time this week continuing to work on the systems for the Place-names of Mull and Ulva project. The first thing I did was to figure out how WordPress sets the language code. It has a function called ‘get_locale()’ that brings back the locale code (e.g. ‘en_GB’ or ‘gd’). Once I knew this I could update the site’s footer to display a different logo and text depending on the language the page is in. So now if the page is in English the regular UoG logo and English text crediting the map and photo are displayed, whereas if the page is in Gaelic the Gaelic UoG logo and credit text are displayed. I think this is working rather well.
I managed to get all of the new Gaelic fields added into the CMS and fully tested by Thursday and asked Alasdair to test things out. I also had a discussion with Rachel Opitz in Archaeology about incorporating LIDAR data into the maps and started to look at how to incorporate data from the GB1900 project for the parishes we are covering. GB1900 (http://www.gb1900.org/) was a crowdsourced project to transcribe every place-name that appears on OS maps from 1888-1914, which resulted in more than 2.5 million transcriptions. The dataset is available to download as a massive CSV file (more than 600MB). It includes place-names for the three parishes on Mull and Ulva and Alasdair wanted to populate the CMS with this data as a starting point. On Friday I started to investigate how to access the information. Extracting the data manually from such a large CSV file wasn’t feasible so instead I created a MySQL database and wrote a little PHP script that iterated through each line of the CSV and added it to the database. I left this running over the weekend and will continue to work with it next week.
Also this week I continued to add new project records to the new Digital Humanities at Glasgow site. I only have about 30 more sites to add now, and I think it’s shaping up to be a really great resource that we will hopefully be able to launch in the next month or so.
I also spent a bit of further time on the SCOSYA project. I’d asked the university’s research data management people whether they had any advice on how we could share our audio recording data with other researchers around the world. The dataset we have is about 117GB, and originally we’d planned to use the University’s file transfer system to share the files. However, this can only handle files that are up to 20GB in size, which meant splitting things up. It also turned out to take an awfully long time to upload the files, a process we would have to repeat each time the data was requested. The RDM people suggested we use the University’s OneDrive system instead. This is part of Office365 and gives each member of staff 1TB of space, and it’s possible to share uploaded files with others. I tried this out and the upload process was very swift. It was also possible to share the files with users based on their email addresses, and to set expiration dates and passwords for file access. It looks like this new method is going to be much better for the project and for any researchers who want to access our data. We also set up a record about the dataset in the Enlighten Research Data repository: http://researchdata.gla.ac.uk/951/ which should help people find the data.
Also for SCOSYA we ran into some difficulties with Google’s reCAPTCHA service, which we were using to protect the contact forms on our site from spam submissions. There was an issue with version 3 of Google’s reCAPTCHA system when integrated with the contact form plugin. It works fine if Google thinks you’re not a spammer, but if you somehow fail its checks it doesn’t give you the option of proving you’re a real person; it just blocks the submission of the form. I haven’t been able to find a solution for this using v3, but thankfully there is a plugin that allows the contact form plugin to revert to using reCAPTCHA v2 (the ‘I am not a robot’ tickbox). I got this working and have applied it to both the contact form and the spoken corpus form. It works for me both as someone Google somehow seems to trust and when I use IE via remote desktop, where Google makes me select features in images before the form submits.
Also this week I met with Marc and Fraser to discuss further developments for the Historical Thesaurus. We’re going to look at implementing the new way of storing and managing dates that I originally mapped out last summer and so we met on Friday to discuss some of the implications of this. I’m hoping to find some time next week to start looking into this.
We received the reviews for the Iona place-name project this week and I spent some time during the week and over the weekend going through the reviews, responding to any technical matters that were raised and helping Thomas Clancy with the overall response, which needed to be submitted the following Monday. I also spoke to Ronnie Young about the Burns Paper Database, which we may now be able to make publicly available, and made some updates to the NME digital ode site for Bryony Randall.