This was another week of doing lots of fairly small bits of work for many different projects. I was involved in some discussions about possible updates to websites with Scottish Language Dictionaries, and created a new version of a page for the Concise Scots Dictionary for them. I also made a couple of minor tweaks to a DSL page for them.
For the Edinburgh Gazetteer project I added in all of the ancillary material that Rhona Brown had sent me, added in some new logos, set up a couple of new pages and made a couple of final tweaks to the Gazetteer and reform societies map pages. The site is now live and can be accessed here: http://edinburghgazetteer.glasgow.ac.uk/
I also read through the Case for Support for Thomas Clancy’s project proposal and made a couple of updates to the Technical Plan based on this, and I spent some time reading over the applications for a post that I’m on the interview panel for. I also spent a bit more time on the Burns Paper Database project. There were some issues with the filenames of the images used: some included apostrophes and ampersands, which meant the images wouldn’t load on the server. I decided to write a little script to rename all of the images in a more uniform way, while keeping a reference to the original filenames in the database for display and for future imports. It took a bit of time to get this sorted, but the images work a lot better now.
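The renaming approach could be sketched roughly as follows. This is a minimal illustration in Python, not the actual script: the directory layout, the `images` table and its columns are all assumptions made for the example.

```python
import os
import re
import sqlite3

def sanitise(filename):
    """Replace problem characters (apostrophes, ampersands, spaces and so on)
    in the filename stem with underscores, collapsing runs of them into one."""
    stem, ext = os.path.splitext(filename)
    clean = re.sub(r'[^A-Za-z0-9_-]+', '_', stem).strip('_')
    return clean + ext.lower()

def rename_images(image_dir, db_path):
    """Rename every image in image_dir to a server-safe name, recording the
    original name alongside the new one for display and future imports."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS images
                    (original_name TEXT, server_name TEXT)""")
    for name in sorted(os.listdir(image_dir)):
        new_name = sanitise(name)
        if new_name != name:
            os.rename(os.path.join(image_dir, name),
                      os.path.join(image_dir, new_name))
        # Keep the mapping so the original filename can still be shown
        conn.execute("INSERT INTO images VALUES (?, ?)", (name, new_name))
    conn.commit()
    conn.close()
```

Keeping the original filename in the database means the display layer and any future imports can still refer to the names the researchers know, while the server only ever sees safe ones.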
I met with Fraser on Wednesday to get back into the whole issue of merging the new OED data with the HT data. It had been a few months since either of us had looked at the issues relating to this, so it took a bit of time to get back up to speed with things. The outcome of our meeting was that I would create three new scripts. The first would find all of the categories where there was no ‘oedmaincat’ and the part of speech was not a noun. The script would then check to see whether there was a noun at the same level and if so grab its ‘oedmaincat’ and then see if this matched anything in the OED data for the given part of speech. This managed to match up a further 183 categories that weren’t previously matched, so we could tick these off. The second script generated a CSV for Fraser to use that ordered unmatched categories by size. This is going to be helpful for manual checking and it thankfully demonstrated that of the more than 12,000 unmatched categories only about 750 have more than 5 words in them. The final script was an update to the ‘all the non-matches’ script that added in counts of the number of words within the non-matching HT and OED categories. It’s now down to Fraser and some assistants to manually go through things.
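The logic of that first script could be sketched something like this. The field names (`catid`, `tier`, `pos`, `oedmaincat`) and the in-memory data structures are illustrative assumptions, not the actual database schema, and the real script runs against the database rather than Python dicts.

```python
# Sketch of the noun-borrowing match: for each non-noun HT category with no
# 'oedmaincat', borrow the 'oedmaincat' of the noun at the same level and
# check whether the OED data has that category for the original part of speech.

def match_via_noun(ht_cats, oed_cats):
    """ht_cats: list of dicts with 'catid', 'tier', 'pos', 'oedmaincat'.
    oed_cats: dict mapping (oedmaincat, pos) to an OED category dict.
    Returns (HT catid, OED catid) pairs for the newly matched categories."""
    # Index the noun categories by level for quick lookup
    nouns = {c['tier']: c for c in ht_cats if c['pos'] == 'n'}
    matched = []
    for cat in ht_cats:
        if cat['oedmaincat'] or cat['pos'] == 'n':
            continue  # already matched, or is itself the noun
        noun = nouns.get(cat['tier'])
        if noun and noun['oedmaincat']:
            oed = oed_cats.get((noun['oedmaincat'], cat['pos']))
            if oed:
                matched.append((cat['catid'], oed['catid']))
    return matched
```

The same pass over the data also makes it easy to see which categories remain unmatched, which is what the CSV for manual checking is built from.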
I did some further work for the SPADE project this week, extracting some information about the SCOTS corpus. I wrote a script that queries the SCOTS database and pulls out some summary information about the audio recordings. For each audio recording the ID, title, year recorded and duration in minutes are listed. Details for each participant (there are between 1 and 6) are also listed: ID, gender, decade of birth (the only age data the corpus holds), place of birth and occupation (there is no data about ‘class’). This information appears in a table. Beneath this I also added some tallies: the total number of recordings, the total duration, the number of unique speakers (as a speaker can appear in multiple recordings) and a breakdown of how many of these are male, female or not specified. Hopefully this will be of use to the project.
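The tallies beneath the table could be computed along these lines. This is a simplified sketch: the real script queries the SCOTS database directly, and the `recordings` structure here (dicts with `duration` in minutes and a `participants` list carrying `id` and `gender`) is an assumption made for illustration.

```python
def summarise(recordings):
    """Compute the summary tallies: recording count, total duration,
    unique speaker count and a breakdown of speakers by gender."""
    total_duration = sum(r['duration'] for r in recordings)
    # A speaker can appear in several recordings, so key by participant ID
    # to count each person only once
    speakers = {}
    for r in recordings:
        for p in r['participants']:
            speakers[p['id']] = p.get('gender') or 'Not specified'
    genders = {}
    for g in speakers.values():
        genders[g] = genders.get(g, 0) + 1
    return {
        'recordings': len(recordings),
        'total_duration': total_duration,
        'unique_speakers': len(speakers),
        'by_gender': genders,
    }
```

Deduplicating by participant ID before tallying genders is what keeps a speaker who appears in several recordings from being counted more than once.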
Finally, I had a meeting with Kirsteen McCue and project RA Brianna Robertson-Kirkland about the Romantic National Song Network project. We discussed potential updates to the project website, how it would be structured, how the song features might work and other such matters. I’m intending to produce a new version of the website next week.