I continued working on the new website for the Thesaurus of Old English (TOE) this week, which took up a couple of days in total. Whilst working on the front end I noticed that the structure of TOE is different to that of the Historical Thesaurus of English (HTE) in an important way: There are never any categories with the same number but a different part of speech. With HTE the user can jump ‘sideways’ in the hierarchy from one part of speech to another and then browse up and down the hierarchy for that part of speech, but with TOE there is no ‘sideways’ – for example if there is an adjective category that could be seen as related to a noun category at the same level these categories are given different numbers. This difference meant that plugging the TOE data into the functions I’d created for the HTE website just didn’t work very well as there were just too many holes in the hierarchy when part of speech was taken into consideration.
The solution to the problem was to update the code to ignore part of speech. I checked that there were indeed no main categories with the same number but a different part of speech (a little script I wrote confirmed this to be the case) and then updated all of the functions that generated the hierarchy, the subcategories and other search and browse features to ignore part of speech, but instead to place the part of speech beside the category heading wherever category headings appear (e.g. in the ‘browse down’ section or the list of subcategories). This approach seems to have worked out rather well and the thesaurus hierarchy is now considerably more traversable.
I managed to complete a first version of the new website for TOE, with all required functionality in place. This includes both quick and advanced searches, category selection, the view category page and some placeholder ancillary pages. At Fraser’s request I also added in the facility to search with vowel length marks. This required creating another column in the ‘lexeme search words’ table with a stricter collation setting that ensures a search involving a length mark (e.g. ‘sǣd’) only finds words that feature the length mark (e.g. ‘sæd’ would not be found). I added an option to the advanced search field allowing the user to say whether they cared about length marks or not. The default is not, but I’m sure a certain kind of individual will be very keen on searching with length marks. If this option is selected the ‘special characters’ buttons expand to include all of the vowels with length marks, thus enabling the user to construct the required form. It will be useful for people who want to find out (for example) all of the words in the thesaurus that end in ‘*ēn’ (41) as opposed to all of those words that end in ‘*en’ disregarding length marks (1546).
I think we’re well on track to have the new TOE launched before the ISAS conference at the beginning of next month, which is great.
I continued working on the Scots Thesaurus project this week as well. I met with Susan and Magda on Tuesday to talk them through using the WordPress plugin I’d created for managing thesaurus categories and lexemes. Before this meeting I ran through the plugin a few times myself and noted a number of things that needed updating or improving so I spent some time sorting those things out. The meeting itself went well and I think both Susan and Magda are now familiar enough with the interface to use it. I created a ‘to do’ list containing outstanding technical tasks for the project and I’ll need to work through all of these. For example, a big thing to add will be facilities to enable staff to upload lexemes to a cateogory through the WordPress interface via a spreadsheet. This will really help to populate the thesaurus.
I also spent a little time contributing to a Leverhulme bid application for Carole Hough and did a tiny amount of DSL work as well. I’m still no further with the Hansard visualisations though. Arts Support are going to supply me with a test server on which I should be able to install Bookworm, but I’m not sure when this is going to happen yet. I’ll chase this up on Monday.