I was back at work this week after having a lovely holiday the previous week. It was a pretty busy week, mostly spent continuing to work on the preparations for the second edition of the Historical Thesaurus, which needs to be launched before the end of the month. I updated the OED date extraction script that formats all of the OED dates as we need them in the HT, including making full entries, in the HT dates table, generating the ‘full date’ text string that gets displayed on the website and generating cached first and last dates that are used for searching. I’d somehow managed to get the plus and dash connectors the wrong way round in my previous version of the script (a plus should be used where there is a gap of more than 150 years, otherwise it’s a dash) so I fixed this. I also stripped out dates that were within a 150 year time span, which really helped to make the full date text more readable. I also updated the category browser so that the category’s thematic heading is displayed in the drop-down section.
Fraser had made some suggested changes to the script I’d written to figure out whether an OED lexeme was new or already in the system so I made some changes to this and regenerated the output. I also made further tweaks to the date extraction script so that we record the actual final date in the system rather than converting it to ‘9999’ and losing this information that will no doubt be useful in future. I then worked on the post-1999 lexemes, which followed a similar set of processes.
With this all in place I could then run a script that would actually import the new lexemes and their associated dates into the HT database. This included changelog codes, new search terms and new dates (cached firstdate and lastdate, fulldate and individual entries in the dates table). A total of 11116 new words were added, although I subsequently noticed there were a few duplicates that had slipped through the net. With these stripped out we had a total of 804,830 lexemes in the HT, and it’s great to have broken through the 800,000 mark. Next week I need to fix a few things (e.g. the fullsize timelines aren’t set up to cope with post-1945 dates that don’t end in ‘9999’ if they’re current) but we’re mostly now good to launch the second edition.
Also this week I worked on setting up a website for the ‘Symposium for Seventeenth-Century Scottish Literature’ for Roslyn Potter in Scottish Literature and set up a subdomain for an art catalogue website for Bryony Randall’s ‘Imprints of the New Modernist Editing’ project. I also helped Megan Coyer out with an issue she was having in transcribing multi-line brackets in Word and travelled to the University to collect a new, higher-resolution monitor and some other peripherals to make working from home more pleasant. I also fixed a couple of bugs in the Books and Borrowing CMS, including one that was resulting in BC dates of birth and death for authors being lost when data was edited. I also spent some time thinking about the structure for the Burns Correspondence system for Pauline Mackay, resulting in a long email with a proposed database structure. I met with Thomas Clancy and Alasdair Whyte to discuss the CMS for the Iona place-names project (it now looks like this is going to have to be a completely separate system from Alasdair’s existing Mull / Ulva system) and replied to Simon Taylor about a query he had regarding the Place-names of Fife data.
I also found some time to continue with the redevelopment of the Anglo-Norman Dictionary website. I updated the way cognate references were processed to enable multiple links to be displayed for each dictionary. I also added in a ‘Cite this entry’ button, which now appears in the top right of the entry that when clicked on opens a pop-up where citation styles will appear (they’re not there yet). I updated the left-hand panel to make it ‘sticky’: If you scroll down a long entry the panel stays visible on screen (unless you’re viewing on a narrow screen like a mobile phone in which case the left-hand panel appears full-width before the entry). I also added in a top bar that appears when you scroll down the screen that contains the site title, the entry headword and the ‘cite’ button. I then began working on extracting the citations, including their dates and text, which will be used for search purposes. I ran an extraction script that extracted about 60,0000 citations, but I released that this was not extracting all of the citations and further work will be required to get this right next week.