This was a three-day week for me as I was participating in the UCU strike action on Thursday and Friday. I spent the majority of Monday to Wednesday tying up loose ends and finishing off items on my ‘to do’ list. The biggest task I tackled was relaunching the Digital Humanities at Glasgow site. This involved removing the existing site entirely and moving the new site from its test location to the main URL. I had to write a little script that changed all of the image URLs so they would work in the new location, and I needed to update WordPress so it knew where to find all of the required files. Most of the migration process went pretty smoothly, but there were some slightly tricky things, such as ensuring the banner slideshow continued to work. I also needed to tweak some of the static pages (e.g. the ‘Team’ and ‘About’ pages) and I added in a ‘contact us’ form. I also put in redirects from all of the old pages so that any bookmarks or Google links will continue to work. As yet I’ve had no feedback from Marc or Lorna about the redesign of the site, so I can only assume that they are happy with how things are looking.

The final task in setting up the new site was to migrate my blog over from its old standalone URL so that it’s integrated into the new DH site. I exported all of my blog posts and categories and imported them into the new site using WordPress’s easy-to-use tools, and that all went very smoothly. The only thing that didn’t transfer over using this method was the media: all of the images embedded in my blog posts still pointed to image files located on the old server, so I had to manually copy these images over and then write a script that went through every blog post and found and replaced all image URLs. All now appears to be working, and as this is the first blog post I’m making using the new site I’ll know for definite once I add this post to the site.
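At their core, both URL-rewriting scripts boil down to a simple find-and-replace over each post’s content. Here’s a minimal sketch of the idea in Python (the real scripts ran against the WordPress database, and the URLs below are made up for illustration):

```python
# Hypothetical old and new upload locations -- the real URLs differed.
OLD_BASE = "https://example.org/dh/test/wp-content/uploads/"
NEW_BASE = "https://example.org/dh/wp-content/uploads/"

def rewrite_image_urls(post_html: str) -> str:
    """Point every embedded image at the new upload directory."""
    return post_html.replace(OLD_BASE, NEW_BASE)

# In practice this runs over the content of every post in the
# database, writing the rewritten HTML back in place.
```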
Other than setting up this new resource I made a further tweak to the new data for Matthew Sangster’s 18th-century student borrowing records that I was working on last week. I had excluded rows from his spreadsheet that had ‘Yes’ in the ‘omit’ column, assuming that these were not to be processed by my upload script, but actually Matt wanted these to be in the system and displayed when browsing pages, just omitted from any searches. I therefore added a new ‘omit’ column to the online database, updated my upload script to process these previously excluded rows and flag them accordingly, and then changed the search facilities to ignore any rows that have a ‘Y’ in the ‘omit’ column.
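The split between browsing and searching comes down to one extra condition in the search queries. A simplified sketch using SQLite (the real table structure and column names differ; ‘records’ and ‘borrower’ here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical, much-simplified version of the borrowing-records table.
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, borrower TEXT, omit TEXT)")
cur.executemany("INSERT INTO records (borrower, omit) VALUES (?, ?)",
                [("Adam Smith", ""), ("John Anderson", "Y")])
conn.commit()

# Browse pages show everything, flagged rows included.
browse = cur.execute("SELECT borrower FROM records ORDER BY id").fetchall()

# Searches skip any row with 'Y' in the omit column.
search = cur.execute("SELECT borrower FROM records "
                     "WHERE omit != 'Y' ORDER BY id").fetchall()
```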
I also responded to a query from Alasdair Whyte regarding parish boundaries for his Place-names of Mull and Ulva project, and investigated an issue that Marc Alexander was experiencing with one of the statistics scripts for the HT / OED matching process (it turned out to be an issue with a bad internet connection rather than an issue with the script). I’d had a request from Paul Malgrati in Scottish Literature about creating an online resource that maps Burns Suppers so I wrote a detailed reply discussing the various options.
I also spent some time fixing the issue of citation links not displaying exactly as the DSL people wanted on the DSL website. This was surprisingly difficult to implement because the structure can vary quite a bit. The ID needed for the link is associated with the ‘cref’ that wraps the whole reference, but the link can’t be applied to the full contents as only authors and titles should be links, not geo and date tags or other non-tagged text that appears in the element. There may be multiple authors or no author, so sometimes the link needs to start before the first (and only the first) author, whereas other times the link needs to start before the title. As there is often text after the last element that needs to be linked, the closing tag of the link can’t simply be appended to the text; instead the script needs to find where this last element ends. However, it looks like I’ve figured out a way to do it that appears to work.
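The general shape of the approach can be sketched at the string level in Python (the real DSL markup and script differ; the flat ‘author’/‘title’ element names here are assumptions, and this simple regex won’t cope with nested elements):

```python
import re

def wrap_citation_link(cref_html: str, url: str) -> str:
    """Wrap a link around the linkable span of a citation.

    The <a> opens before the first <author> (or before the <title> if
    there is no author) and closes where the last <author> or <title>
    ends, so dates, geo tags and trailing plain text stay outside it.
    """
    matches = list(re.finditer(r"<(author|title)>.*?</\1>", cref_html))
    if not matches:
        return cref_html  # nothing linkable in this citation
    start = matches[0].start()   # just before the first linkable element
    end = matches[-1].end()      # just after the last one ends
    return (cref_html[:start] + f'<a href="{url}">' +
            cref_html[start:end] + "</a>" + cref_html[end:])
```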
I devoted a few spare hours on Wednesday to investigating the Anglo-Norman Dictionary. Heather was trying to figure out where on the server the dataset that the website uses is located, and after reading through the documentation I managed to figure out that the data is stored in a Berkeley DB; it looks like the data the system uses is stored in a file called ‘entry_hash’. There is a file with this name in ‘/var/data’ and it’s just over 200MB in size, which suggests it contains a lot of data. Software that can read this file can be downloaded from here: https://www.oracle.com/database/technologies/related/berkeleydb-downloads.html, and the Java edition should work on a Windows PC if it has Java installed. Unfortunately you have to register to access the files and I haven’t done so yet, but I’ve let Heather know about this.
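In the meantime, there’s a quick way to check that ‘entry_hash’ really is a Berkeley DB file without registering for Oracle’s tools: native Berkeley DB files carry a 4-byte magic number at byte offset 12 of the header (0x061561 for Hash databases, 0x053162 for Btree). A small Python sketch:

```python
import struct
from typing import Optional

# Magic numbers for the two most common Berkeley DB access methods,
# stored as a 4-byte value at byte offset 12 of the file header.
MAGICS = {
    0x00053162: "Btree",
    0x00061561: "Hash",
}

def berkeley_db_type(path: str) -> Optional[str]:
    """Return the Berkeley DB access method of a file, or None.

    The magic number is written in the byte order of the machine that
    created the file, so both byte orders are tried.
    """
    with open(path, "rb") as f:
        header = f.read(16)
    if len(header) < 16:
        return None
    for fmt in ("<I", ">I"):  # little-endian, then big-endian
        magic = struct.unpack_from(fmt, header, 12)[0]
        if magic in MAGICS:
            return MAGICS[magic]
    return None
```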
I then experimented with setting up a simple AND website using the data from ‘all.xml’, which (more or less) contains all of the data for the dictionary. My test website consists of a database, one script that goes through the XML file and inserts each entry into the database, and one script that displays a page allowing you to browse and view all entries. My AND test is really very simple (and purely for test purposes): it replicates the ‘browse’ list from the live site (which can be viewed here: http://www.anglo-norman.net/gate/), only mine currently displays in its entirety, which makes the page a little slow to load. Cross references are in yellow, main entries in white. Click on a cross reference and it loads the corresponding main entry; click on a regular headword to load its entry. Currently only some of the entry page is formatted, and some elements don’t always display correctly (e.g. the position of some notes). The full XML is displayed below the formatted text. Here’s an example:
Clearly there would still be a lot to do to develop a fully functioning replacement for AND, but it wouldn’t take a huge amount of time (I’d say it could be measured in days not weeks). It just depends whether the AND people want to replace their old system.
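For anyone curious, the script that loads the XML into the database amounts to very little. Here’s a minimal sketch in Python with SQLite (not necessarily what the test site itself uses, and the ‘entry’ and ‘lemma’ element names are assumptions about the structure of ‘all.xml’):

```python
import sqlite3
import xml.etree.ElementTree as ET

def load_entries(xml_path: str, db_path: str) -> int:
    """Insert every dictionary entry from the XML file into SQLite.

    Assumes each entry is an <entry> element with its headword in a
    <lemma> child -- the real all.xml schema may well differ.
    Returns the number of entries inserted.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS entries "
                 "(id INTEGER PRIMARY KEY, headword TEXT, xml TEXT)")
    count = 0
    for entry in ET.parse(xml_path).getroot().iter("entry"):
        lemma = entry.findtext("lemma", default="")
        # Keep the full entry XML so the display script can format it.
        conn.execute("INSERT INTO entries (headword, xml) VALUES (?, ?)",
                     (lemma, ET.tostring(entry, encoding="unicode")))
        count += 1
    conn.commit()
    conn.close()
    return count
```

The display script is then essentially a SELECT over the entries table, formatting each stored XML blob as it goes.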
I’ll be on strike again next week from Monday to Wednesday.