I was off on Monday this week for the September Weekend holiday. My four working days were split across many different projects, but the main ones were the Historical Thesaurus and the Anglo-Norman Dictionary.
For the HT I continued with the preparations for the second edition. I updated the front-end so that multiple changelog items are now checked for and displayed (these are the little tooltips that say whether a lexeme’s dates have been updated in the second edition). Previously only one changelog was being displayed but this approach wasn’t sufficient as a lexeme may have a changed start and end date. I also fixed a bug in the assigning of the ‘end date verified as after 1945’ code, which was being applied to some lexemes with much earlier end dates. My script set the type to 3 in all cases where the last HT date was 9999. What it needed to do was to only set it to type 3 if the last HT date was 9999 and the last OED date was after 1945. I wrote a little script to fix this, which affected about 7,400 lexemes.
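The corrected rule for the ‘end date verified as after 1945’ code can be sketched roughly as follows. This is only an illustration of the condition described above, not the actual fix script: the field names (`id`, `type`, `last_ht_date`, `last_oed_date`) and the sample rows are hypothetical.

```python
OPEN_ENDED = 9999        # last HT date value used for lexemes with no end date
VERIFIED_AFTER_1945 = 3  # the changelog type code mentioned in the post


def qualifies_for_type_3(last_ht_date, last_oed_date):
    """Type 3 applies only if the HT date is 9999 AND the OED date is after 1945."""
    return (
        last_ht_date == OPEN_ENDED
        and last_oed_date is not None
        and last_oed_date > 1945
    )


def wrongly_flagged(lexemes):
    """Return ids of lexemes that carry type 3 but fail the corrected rule.

    Each lexeme is a dict with hypothetical keys: id, type,
    last_ht_date, last_oed_date.
    """
    return [
        lex["id"]
        for lex in lexemes
        if lex["type"] == VERIFIED_AFTER_1945
        and not qualifies_for_type_3(lex["last_ht_date"], lex["last_oed_date"])
    ]


# Made-up sample rows, purely for illustration
sample = [
    {"id": 1, "type": 3, "last_ht_date": 9999, "last_oed_date": 1987},
    {"id": 2, "type": 3, "last_ht_date": 9999, "last_oed_date": 1650},
    {"id": 3, "type": 1, "last_ht_date": 1800, "last_oed_date": 1800},
]
to_fix = wrongly_flagged(sample)
```

What the affected lexemes should be reset to is not covered here, as that depends on the rest of the changelog logic.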
I also wrote a script to check off a bunch of HT and OED categories that had been manually matched by an RA. I needed to make a few tweaks to the script after testing it out, but after running it on the data we had a further 846 categories matched up, which is great. Fraser had previously worked on a document listing a set of criteria for working out whether an OED lexeme was ‘new’ or not (i.e. unlinked to an HT lexeme). This was a pretty complicated document with many different stages, the output of which needed to go into seven different spreadsheets, and it took quite a long time to write and test a script that would handle all of these stages. However, I managed to complete work on it and after a while it finished executing and produced the seven CSV files, one for each code mentioned in the document. I was very glad that I had my new PC as I’m not sure my old one could have coped with it – for the Levenshtein tests, for example, every word in the HT had to be stored in memory throughout the script’s execution. On Friday I had a meeting with Marc and Fraser where we discussed the progress we’d been making, and further tweaks to the script were proposed that I’ll need to implement next week.
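To give an idea of why the Levenshtein tests need the whole HT word list in memory, here is a minimal sketch of comparing one OED word against an in-memory collection of HT words. The function names, the `max_distance` threshold and the sample words are all assumptions for illustration; the actual criteria in Fraser’s document are not reproduced here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def near_matches(oed_word, ht_words, max_distance=1):
    """Return HT words within max_distance edits of oed_word (hypothetical threshold)."""
    return sorted(w for w in ht_words if levenshtein(oed_word, w) <= max_distance)


# Made-up word list standing in for the full HT vocabulary held in memory
ht_words = {"color", "colour", "collar"}
matches = near_matches("colour", ht_words)
```

Every OED word has to be compared against the full set in this naive form, which is why the memory and processing requirements grow so quickly.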
For the Anglo-Norman Dictionary I continued to work on the ‘Entry’ page, implementing a mixture of major features and minor tweaks. I updated the way the editor’s initials were being displayed: previously these were the initials of the editor who made the most recent update in the changelog, whereas what was needed were the initials of the person who created the record, contained in the ‘lead’ attribute of the main entry. I also attempted to fix an issue with references in the entry that were set to ‘YBB’. Unlike other references, these were not in the data I had as they were handled differently. I thought I’d managed to fix this, but it looks like ‘YBB’ is used to refer to many different sources so can’t be trusted to be a unique identifier. This is going to need further work.
Minor tweaks included changing the font colour of labels, making the ‘See Also’ header bigger and clearer, removing the final semi-colon from lists of items, adding in line breaks between parts of speech in the summary and other such things. I then spent quite a while integrating the commentaries. These were another feature that wasn’t properly integrated with the entries but had been added in as some sort of hack. I decided it would be better to have them as part of the editors’ XML rather than attempting to inject them into the entries when they were requested for display. I managed to find the commentaries in another hash file and thankfully managed to extract the XML from this using the Python script I’d previously written for the main entry hash file. I then wrote a script that identified which entry the commentary referred to, retrieved the entry and then inserted the commentary XML into the middle of it (immediately after the closing </head> tag).
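The insertion step can be sketched as simple string splicing, which leaves the rest of the editors’ XML byte-for-byte unchanged. This is a minimal sketch rather than the script I actually ran: the function name, the `<commentary>` element name and the sample entry are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def insert_commentary(entry_xml, commentary_xml):
    """Insert commentary_xml directly after the first closing </head> tag."""
    marker = "</head>"
    position = entry_xml.find(marker)
    if position == -1:
        raise ValueError("entry has no closing </head> tag")
    cut = position + len(marker)
    return entry_xml[:cut] + commentary_xml + entry_xml[cut:]


# Made-up entry and commentary, purely for illustration
entry = "<entry><head><lemma>abc</lemma></head><sense/></entry>"
commentary = "<commentary>A note on the etymology.</commentary>"
merged = insert_commentary(entry, commentary)

# Parsing the result is a cheap check that it is still well-formed XML
ET.fromstring(merged)
```

Splicing on the string rather than re-serialising the parsed tree avoids accidentally changing whitespace or attribute order elsewhere in the entry.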
It took somewhat longer than I expected to integrate the data as some of the commentaries contained Greek, and the underlying database was not set up to handle multi-byte UTF-8 characters (which Greek characters are), meaning these commentaries could not be added to the database. I needed to change the structure of the database and re-import all of the data, as simply changing the character encoding of the columns gave errors. I managed to complete this process and import the commentaries and then begin the process of making them appear in the front-end. I still haven’t completely finished this (no formatting or links in the commentaries are working yet) and I’ll need to continue with this next week.
Also this week I added numbers to the senses. This also involved updating the editor’s XML to add a new ‘n’ attribute to the <sense> tag, e.g. <sense id="AND-201-47B626E6-486659E6-805E33CE-A914EB1F-S001" n="1">. As with the current site, the sense numbers reset to 1 when a new part of speech begins. I also ensured that [sic] now appears, as does the language tag, with a question mark if the ‘cert’ attribute is present and not 100. Uncertain parts of speech are now visible too (again if ‘cert’ is present and not 100). I also increased the font size of the variant forms, and citation dates are now visible. There is still a huge amount of work to do, but progress is definitely being made.
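The sense numbering can be sketched along the following lines. The ‘n’ attribute and the <sense> tag are as described above, but the way a new part of speech is marked in the XML is an assumption here: the sketch uses a hypothetical `<pos>` element appearing before each group of senses, and the sample entry and ids are made up.

```python
import xml.etree.ElementTree as ET


def number_senses(entry_xml):
    """Add an 'n' attribute to every <sense>, restarting at 1 after each <pos>."""
    root = ET.fromstring(entry_xml)
    counter = 0
    # iter() walks the tree in document order
    for element in root.iter():
        if element.tag == "pos":      # hypothetical part-of-speech marker
            counter = 0
        elif element.tag == "sense":
            counter += 1
            element.set("n", str(counter))
    return ET.tostring(root, encoding="unicode")


# Made-up entry with two parts of speech
entry = (
    '<entry><pos>s.</pos><sense id="S001"/><sense id="S002"/>'
    '<pos>v.</pos><sense id="S003"/></entry>'
)
numbered = number_senses(entry)
sense_numbers = [s.get("n") for s in ET.fromstring(numbered).iter("sense")]
```

With this sample the two senses under the first part of speech are numbered 1 and 2, and the sense under the second part of speech starts again at 1.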
Also this week I reviewed the transcriptions from a private library that we are hoping to incorporate into the Books and Borrowing project and tweaked the way ‘additional fields’ are stored to enable the RAs to enter HTML characters into them. I also created a spreadsheet template for recording the correspondence of Robert Burns for Craig Lamont and spoke to Eila Williamson about the design of the new Names Studies website. I updated the text on the homepage of this site using text that Lorna Hughes sent me, and gave some advice to Luis Gomes about a data management plan he is preparing. I also updated the wording on the search results page for ‘V3’ of the DSL to bring it into line with ‘V2’ and participated in a Zoom call for the Iona project where we discussed the new website and images that might be used in the design.