I continued to work with the Hansard dataset this week, working with Chris McGlashan to get the dataset onto a server. Once it was there I could access the data, but as there are more than 682 million rows of frequency data things were a little slow to query, especially as no indexes were included in the dump. Each index takes several hours to compile, and as I don’t have command-line access to the server I needed to ask Chris to run the commands to create them. He set one going that indexed the data by year, and after a few hours it had completed, resulting in an 11GB index file. With that in place I could much more swiftly retrieve the data for each year. I’ve let Marc know that this data is now available again, and I just need to wait to hear back from him to see exactly what he wants to do with the dataset.
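To illustrate why the year index makes such a difference, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are all made up for the example, and the real server may well run a different database engine, so treat this as an illustration of the general idea rather than the commands Chris actually ran.

```python
import sqlite3

# Hypothetical stand-in for the frequency table; the real schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frequency (word TEXT, year INTEGER, freq INTEGER)")
conn.executemany(
    "INSERT INTO frequency VALUES (?, ?, ?)",
    [("parliament", 1803, 12), ("parliament", 1804, 15), ("reform", 1804, 7)],
)

def query_plan(connection, sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " ".join(row[-1] for row in rows)

sql = "SELECT word, freq FROM frequency WHERE year = ?"

# Without an index every row has to be scanned to find one year's data.
plan_before = query_plan(conn, sql, (1804,))

# Index the data by year (the index name is an assumption for this sketch).
conn.execute("CREATE INDEX idx_frequency_year ON frequency (year)")

# With the index in place the rows for a year can be looked up directly.
plan_after = query_plan(conn, sql, (1804,))

rows_for_year = sorted(conn.execute(sql, (1804,)).fetchall())

print(plan_before)
print(plan_after)
```

On a three-row table the difference is invisible, but with hundreds of millions of rows a full scan for every query is what makes things slow, which is why building the index is worth the several hours it takes.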
I spent a fair amount of time this week advising staff on technical aspects of research proposals. It’s the time of year when the students are all away and staff have time to think about such proposals, meaning things get rather busy for me. I created a Data Management Plan for a follow-on project that Bryony Randall in English Literature is putting together. I also started to migrate a project website she had previously set up through WordPress.com onto an instance of WordPress hosted at Glasgow. Her site on WordPress.com was full of horribly intrusive adverts that did not give a good impression and really got in the way, and moving to hosting at Glasgow will stop this, and give the site a more official-looking URL. It will also ensure the site can continue to be hosted in future, as free commercial hosting is generally not very reliable. I hope to finish the migration next week. I also responded to a query about equipment from Joanna Kopaczyk, discussed a couple of timescale issues with Thomas Clancy and gave some advice to Karen Lury from TFTS about video formats and storage requirements. I also met with Clara Cohen to discuss her Data Management Plan.
Also this week I sorted out my travel arrangements for the DH2019 conference and updated the site layout slightly for the DSL website, and on Wednesday I attended the English Language and Linguistics Christmas lunch, which was lovely. I also continued with my work on the HT / OED category linking, ticking off another batch of matches, which takes us down to 1894 unmatched OED categories that have words and a part of speech.
I also spent about a day continuing to work on the Bilingual Thesaurus. Last week I’d updated the ‘category’ box on the search page to make it an ‘autocomplete’ box that lists matching categories as you type. However, I’d noticed that this was often not helpful as the same title is used for multiple categories (like the three ‘used in buildings’ categories mentioned in last week’s post). I therefore implemented a solution that I think works pretty well. When you type into the ‘category’ box the top two levels of the hierarchy to which a matching category belongs now appear in addition to the category name. If the category is more than two hierarchical levels down this is represented by an ellipsis. Listed categories are now ordered by their ID rather than alphabetically too, so categories in the same part of the tree appear together. So now, for example, if you type in ‘processes’ the list contains ‘Building > Processes’, ‘Building > Processes > … > Other processes’ etc. Hopefully this will make the search much easier to use. I also updated the search results page so the hierarchy is shown in the ‘you searched for’ box too, and I fixed a bug that was preventing the search results page from displaying results if you searched for a category, followed a link through to the category page and then pressed the ‘Back to search results’ button.
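The labelling logic for the autocomplete list can be sketched roughly as follows. This is not the site's actual code: the `Category` structure, the function names and the sample data (including the ‘Construction’ category) are invented for illustration, and I've assumed that the ellipsis only appears when one or more intermediate levels are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Category:
    id: int
    name: str
    parent_id: Optional[int]  # None for a top-level category

def path_names(cat_id: int, cats: Dict[int, Category]) -> List[str]:
    """Walk up the tree and return category names from the top level down."""
    names = []
    current = cats.get(cat_id)
    while current is not None:
        names.append(current.name)
        if current.parent_id is None:
            break
        current = cats.get(current.parent_id)
    return list(reversed(names))

def label(cat_id: int, cats: Dict[int, Category]) -> str:
    """Top two levels plus the category name, with an ellipsis for skipped levels."""
    names = path_names(cat_id, cats)
    if len(names) <= 3:
        # Nothing is omitted, so show the full path.
        return " > ".join(names)
    return " > ".join(names[:2] + ["…", names[-1]])

def suggest(term: str, cats: Dict[int, Category]) -> List[str]:
    """Labels for categories whose name contains the term, ordered by ID."""
    term = term.lower()
    matches = [c for c in cats.values() if term in c.name.lower()]
    matches.sort(key=lambda c: c.id)  # ID order keeps tree neighbours together
    return [label(c.id, cats) for c in matches]

# Made-up sample hierarchy for the example.
CATS = {
    c.id: c
    for c in [
        Category(1, "Building", None),
        Category(2, "Processes", 1),
        Category(3, "Construction", 2),
        Category(4, "Other processes", 3),
        Category(5, "Materials", 1),
    ]
}

suggestions = suggest("processes", CATS)
print(suggestions)
```

Ordering by ID relies on IDs having been assigned in tree order, which is an assumption in this sketch; if they were not, sorting by the full path would give a similar grouping.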
Louise had noticed that there were two ‘processes’ categories within ‘Building’ so I amalgamated these. I also changed ‘Advanced Search’ back to plain old ‘Search’ again in all locations, and I created a new menu item and page for ‘How to use the thesaurus’.
As the Bilingual Thesaurus is almost ready to go live and it ‘hangs off’ the thesaurus.ac.uk domain I added some content to the homepage of the domain, as you can see in the screenshot below:
It currently just has boxes for the three thesauruses featuring a blurb and a link, with box colours taken from each site’s colour scheme. I did think about adding in the ‘sample category’ feature for each thesaurus here too, but as it might make the top row boxes rather long (if it’s a big category) I decided to keep things simple. I added the tagline ‘The home of academic thesauri’ (‘thesauruses’ seemed a bit clumsy here) just to give visitors a sense of what the site is. I’ll need some feedback from Marc and Fraser before this officially goes live.
Finally this week I spent some time working on some new song stories for the Romantic National Song Network. I managed to create about one and a half, which took several hours to do. I’ll hopefully manage to get the remaining half and maybe even a third one done next week.