I was struck down by a rather unpleasant, feverish throat infection this week. I managed to struggle through Wednesday, even though I should really have been in bed, but then was off sick on Thursday and Friday. It was very frustrating as I am really quite horribly busy at the moment with so many projects on the go and so many people needing advice, and I had to postpone three meetings I’d arranged for Thursday. But it can’t be helped.
I had a couple of meetings this week, one with Carole Hough to help her out with her Cogtop.org site. Whilst I was away on holiday a few weeks ago there were some problems with a multilingual plugin that we use on this site to provide content in English and Danish and the plugin had to be deactivated in order to get content added to the site. I met with Carole to discuss who should be responsible for updating the content of the site and what should be done about the multilingual feature. It turns out Carole will be updating the content herself so I gave her a quick tutorial on managing a WordPress site. I also replaced the multilingual plugin with a newer version that works very well. This plugin is called qTranslate X: https://wordpress.org/plugins/qtranslate-x/ and I would definitely recommend it.
My other meeting was with Gavin Miller, and we discussed the requirements for his bibliography of text relating to Medical Humanities and Science Fiction. I’m going to be creating a little WordPress plugin that he can use to populate the bibliography. We talked through the sorts of data that will need to be managed and Gavin is going to write a document listing the fields and some examples and we’ll take it from there.
I had hoped to be able to continue with the Hansard visualisation stuff on Wednesday this week but I just wasn’t feeling well enough to tackle it. My data extraction script had at least managed to extract frequency data for two whole years of the Commons by Wednesday, though. This may not seem like a lot of data when we have over 200 years to deal with, but it will be enough to test out how the Bookworm system will work with the data. Once I have this test data working and I’m sure that the structure I’ve extracted the data into can be used with Bookworm, we can then think about using Cloud or Grid computing to extract chunks of the data in parallel. If we don’t take this approach it will take another two years to complete the extraction of the data!
Instead of working with Hansard, I spent most of Wednesday working with the Thesaurus of Old English data that Fraser had given to me earlier in the week. I’ll be overhauling the old ‘TOE’ website and database and Fraser has been working to get the data into a consistent format. He gave me the data as a spreadsheet and I spent some time on Wednesday creating the necessary database structure for the data and writing scripts that would be able to process and upload the data. I managed to get all of the data uploaded into the new online database, consisting of almost 22,500 categories and 51,500 lexemes. I still need to do some work on the data, specifically fixing length symbols, which currently appear in the data as underscores after the letter (e.g. eorþri_ce) when what is needed is the proper Unicode (UTF-8) character with a macron (e.g. eorþrīce). I also need to create the search terms for variant forms in the data, which could prove to be a little tricky.
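As an illustration of the length symbol fix, here is a minimal sketch in Python of one way the conversion could be done. It is not the script used for the TOE data. The function name add_macrons is made up for this example, and it assumes that every underscore directly following a letter is a length mark and that no other underscores in the data need to be kept.

```python
import re
import unicodedata

# Assumption: an underscore directly after a letter always marks a long vowel.
COMBINING_MACRON = "\u0304"
LETTER_THEN_UNDERSCORE = re.compile(r"([^\W\d_])_")


def add_macrons(text: str) -> str:
    """Replace 'letter + underscore' with the letter carrying a macron."""
    # Swap the underscore for a combining macron placed after the letter.
    marked = LETTER_THEN_UNDERSCORE.sub(
        lambda m: m.group(1) + COMBINING_MACRON, text
    )
    # NFC composes letter + combining macron into one character where
    # Unicode defines one (e.g. i + macron becomes ī).
    return unicodedata.normalize("NFC", marked)


print(add_macrons("eorþri_ce"))  # eorþrīce
```

Normalising to NFC means the stored forms use single precomposed characters where they exist, which should make later searching and comparison more predictable. Underscores that do not follow a letter are left untouched.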
Other tasks I carried out this week included completing the upload of all of the student-created data for the Scots Thesaurus project, investigating the creation of the Google Play account for the STELLA apps, and updating a lot of the ancillary content for the Mapping Metaphor website ahead of next week’s launch, a task which took a fair amount of time.