I was off on Tuesday this week to attend my uncle’s funeral. I spent the rest of the week working on a number of relatively small tasks for a variety of different projects. The Dictionary of Old English people got back to me on Monday to say they had updated their search system to allow our Thesaurus of Old English site to link directly from our word records to a search for that word on their site. This was really great news, and I updated our site to add in the direct links. This is going to be very useful for users of both sites. I spent a bit more time on AHRC review duties this week, and I also had an email discussion with Joanna Kopaczyk in English Language about a proposal she is putting together. She sent me the materials she is working on and I read through them all and gave some feedback on the technical aspects. I’m going to help her write the Technical Plan for her project soon too. I also met with Rachel Douglas from the School of Modern Languages to offer some advice on technical matters relating to a project she’s putting together. Although Rachel is not in my School and I therefore can’t be involved in her project, it was still good to be able to give her a bit of help and show her some examples of digital outputs similar to the sorts of thing she is hoping to produce.
I also spent some further time working with Fraser on the integration of OED data with the Historical Thesaurus data. Fraser had sent me some further categories that he and a student had manually matched up, and had also asked me to write another script that picks out all of the unmatched HT categories and all of the unmatched OED categories and, for each HT category, goes through all of the OED categories and finds the one with the lowest Levenshtein distance (the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other). My initial version of this script wasn’t ideal, as it included all unmatched OED categories and I’d forgotten that these included several thousand ‘top level’ categories that don’t have a part of speech and shouldn’t be matched with our categories at all. I also realised that the script should only compare categories that have the same part of speech, as my first version was ending up with (for example) a noun category being matched up with an adjective. I updated the script to take these things into account, but unfortunately the output still doesn’t look all that useful. However, there are definitely some real matches that can be manually picked out from the list, e.g. 31890 ‘locustana pardalina or rooibaadjie’ and ‘locustana pardalina (rooibaadjie)’ and some others around there. Also 14149 ‘applied to weapon etc’ and ‘applied to weapon, etc’. It’s over to Fraser again to continue with this.
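The matching approach described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the actual script (which was presumably written in whatever language the HT database tools use), and the record fields `heading` and `pos` are hypothetical names standing in for however the category data is actually structured:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance: the number of
    single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def best_matches(ht_cats, oed_cats):
    """For each unmatched HT category, find the unmatched OED category
    with the lowest Levenshtein distance, comparing only categories
    that share a part of speech (so a noun is never paired with an
    adjective, and POS-less 'top level' OED categories are skipped)."""
    results = []
    for ht in ht_cats:
        candidates = [o for o in oed_cats
                      if o["pos"] and o["pos"] == ht["pos"]]
        if not candidates:
            continue
        best = min(candidates,
                   key=lambda o: levenshtein(ht["heading"], o["heading"]))
        results.append((ht["heading"], best["heading"],
                        levenshtein(ht["heading"], best["heading"])))
    return results
```

For instance, the pair mentioned above, ‘applied to weapon etc’ and ‘applied to weapon, etc’, differ only by an inserted comma, so their distance is 1, which is why they float to the top of the candidate list.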
I mentioned last week that I’d updated all of our WordPress sites to version 4.9, but that 4.9.1 would no doubt soon be released. And in fact it was released this week, so I had to update all of the sites once more. It’s a bit of a tedious task but it doesn’t really take too long – maybe about half an hour in total. I also decided to tick an item off my long-term ‘to do’ list as I had a bit of time available. The Mapping Metaphor site had a project blog, located at a different URL from the main site. As the project has now ended there are no more blog posts being made, so it seems a bit pointless hosting this WordPress site, and having to keep it maintained, when I could just migrate the content to the main MM website as static HTML and delete the WordPress site. I spent some time investigating WordPress plugins that can export entire sites as static HTML, for example https://en-gb.wordpress.org/plugins/static-html-output-plugin/ and https://wordpress.org/plugins/simply-static/. These plugins go through a WordPress site, convert all pages and posts to static HTML, pull in the WordPress file uploads folder and wrap everything up as a ZIP file. This seemed ideal, and both tools worked very well, but I realised they weren’t exactly what I needed. Firstly, the Metaphor blog (which was set up before I was involved with the project) just uses page IDs in the URLs, not other sorts of permalinks. Neither plugin works with this default URL style in place, so I’d need to change the link type, meaning the new pages would have different URLs from the old pages, which would be a problem for redirects. Secondly, both plugins pull in all of the page elements, including the page design, the header and all the rest. I didn’t actually want all of this stuff but just the actual body of the posts (plus titles and a few other details) so I could slot this into the main MM website template.
So instead of using a plugin I realised it was probably simpler and easier if I just wrote my own little export script that grabbed just the published posts (not pages), for each getting the ID, the title, the main body, the author and the date of creation. My script hooked into the WordPress functions to make use of the ‘wpautop’ function, which adds paragraph markup to texts, and I also replaced absolute URLs with relative ones. I then created a temporary table to hold just this data, set my script to insert into it and then I exported this table. I imported this into the main MM site’s database and wrote a very simple script to pull out the correct post based on the passed ID and that was that. Oh, I also copied the WordPress uploads directory across too, so images and PDFs and such things embedded in posts would continue to work. Finally, I created a simple list of posts. It’s exactly what was required and was actually pretty simple to implement, which is a good combination.
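The export logic above can be sketched in rough outline as follows. The real script was hooked into WordPress itself (and so presumably written in PHP, with the real `wpautop()` function and the live MySQL database); this Python sketch is only an approximation of the same steps, with `sqlite3` standing in for MySQL, a crude stand-in for `wpautop()`, and a made-up blog URL:

```python
import sqlite3

BLOG_URL = "https://blog.example.org"  # hypothetical old blog URL


def autop(text):
    """Very rough stand-in for WordPress's wpautop(): wrap
    double-newline-separated blocks of text in <p> tags."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    return "\n".join(f"<p>{b}</p>" for b in blocks)


def relativise(html):
    """Turn absolute links to the old blog into relative ones."""
    return html.replace(BLOG_URL, "")


# sqlite3 stands in for the real WordPress MySQL database;
# the wp_posts columns below follow the standard WordPress schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE wp_posts (
    ID INTEGER PRIMARY KEY, post_title TEXT, post_content TEXT,
    post_author TEXT, post_date TEXT, post_status TEXT, post_type TEXT)""")
cur.execute("INSERT INTO wp_posts VALUES (?,?,?,?,?,?,?)",
            (1, "Hello", f"First paragraph.\n\nSee {BLOG_URL}/about",
             "admin", "2014-01-01", "publish", "post"))

# Temporary table holding just the fields the static version needs:
# ID, title, body, author and date of creation.
cur.execute("""CREATE TABLE blog_export (
    ID INTEGER PRIMARY KEY, title TEXT, body TEXT,
    author TEXT, created TEXT)""")

# Grab only published posts (not pages), add paragraph markup
# and make the links relative, then insert into the export table.
cur.execute("SELECT ID, post_title, post_content, post_author, post_date "
            "FROM wp_posts WHERE post_status='publish' AND post_type='post'")
for pid, title, content, author, date in cur.fetchall():
    cur.execute("INSERT INTO blog_export VALUES (?,?,?,?,?)",
                (pid, title, autop(relativise(content)), author, date))
conn.commit()
```

The `blog_export` table is then all that needs importing into the main site’s database, where a simple script can pull out the correct post body by its passed ID.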
On Thursday I heard that the Historical Thesaurus had been awarded the ‘Queen’s Anniversary Prize for Higher Education’, which is a wonderful achievement for the project. Marc had arranged a champagne reception on Friday afternoon to celebrate the announcement, so I spent most of the afternoon sipping champagne and eating chocolates, which was a nice way to end the week.