This was a short week for me, as I only worked from Monday to Wednesday with Christmas coming up. I spent most of Monday and Tuesday continuing to work on the Technical Plan for Joanna Kopaczyk’s proposal. As it’s a project with quite a large technical component there was a lot to think about and lots of detail to try and squeeze into the maximum of four pages allowed for a Plan. My first draft was five pages long, so I had to cut some information out and reformat things to bring the length down, but thankfully I managed to get it within the limit whilst still making sense and retaining the important points. I also chatted some more with Graeme about some of the XML aspects of the project and had an email conversation with Luca about it too. It was good to get the Plan sent on to Joanna, although it’s still very much a first draft that will need some further tweaking as other aspects of the proposal are firmed up.
I had to fix an issue with the Thesaurus of Old English staff pages on Monday. The ‘edit lexemes’ form was set to reject words longer than 21 characters. Jane Roberts had been trying to update the positioning of the word ‘(ge)mearcian mid . . . rōde’, and as this is more than 21 characters any changes made to this row were being rejected. I’m not sure why I’d set the maximum word length to 21, as the database allows up to 60 characters in this field, but I updated the check to allow up to 60 characters and that fixed the problem. I also spent a bit of time on Tuesday gathering some stats for Wendy about the various Mapping Metaphor resources (i.e. the main website, the blog, the iOS app and the Android app). I also had a chat with Jane Stuart-Smith about an older but still very important site that she would like me to redesign at some point next year, and I started looking through this and thinking how it could be improved.
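The length-check fix at the start of this entry boils down to validating against the real column size rather than a hard-coded limit. A minimal sketch (the function name is hypothetical, since the actual form code isn’t shown here):

```python
# Hypothetical sketch: validate a lexeme against the database column
# limit rather than a hard-coded value like 21.

MAX_LEXEME_LENGTH = 60  # the field's actual limit in the database

def validate_lexeme(word):
    """Return True if the word fits in the database field."""
    return 0 < len(word) <= MAX_LEXEME_LENGTH

# The 27-character form that was previously rejected now passes:
print(validate_lexeme("(ge)mearcian mid . . . rōde"))  # True
```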
On Wednesday, as it was my last day before the hols, I decided to focus on something from my ‘to do’ list that would be fun. I’d been wanting to make a timeline for the Historical Thesaurus for a while so I thought I’d look into that. What I’ve created so far is a page through which you can pass a category ID and then see all of the words in the category in a visualisation that shows when the word was used, based on the ‘apps’ and ‘appe’ fields in the database. When a word’s ‘apps’ and ‘appe’ fields are the same it appears as a dot in the timeline, and where the fields are different the word appears as a coloured bar showing the extent of the attested usage. Note that more complicated date structures such as ‘a1700 + 1850–’ are not visualised yet, but could be incorporated (e.g. a dot for 1700 then a bar from 1850 to 2000).
When you hover over a dot or bar the word and its dates appear below the visualisation. Eventually (if we’re going to use this anywhere) I would instead have this as a tool-tip pop-up sort of thing.
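The dot-versus-bar logic described above can be sketched as follows. The field names ‘apps’ and ‘appe’ come from the database as mentioned; the function and return structure are just illustrative:

```python
# A minimal sketch of the timeline's dot-vs-bar logic. 'apps' is a
# word's first attested date and 'appe' its last; the record shape
# returned here is hypothetical, not the actual D3 data format.

def timeline_shape(word, apps, appe):
    """Decide how a word should be drawn on the timeline."""
    if apps == appe:
        # Single attestation date: draw a dot.
        return {"word": word, "type": "dot", "year": apps}
    # A span of attested usage: draw a bar from apps to appe.
    return {"word": word, "type": "bar", "start": apps, "end": appe}

print(timeline_shape("gluttony", 1303, 1303))
print(timeline_shape("swill", 1530, 1882))
```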
Here are a couple of screenshots of fitting examples for the festive season. First up is words for ‘Be gluttonous’:
And here are words for ‘Excess in drinking’:
The next step with this would be to incorporate all subcategories for a category, with different shaded backgrounds for the sections for each subcategory and a subcategory heading added in. I’m not entirely sure where we’d link to this, though. We could allow people to view the timeline by clicking on a button in the category browse page. Or we might not want to incorporate it at all, as it might just clutter things up. BTW, this is a D3-based visualisation created by adapting this code: https://github.com/denisemauldin/d3-timeline
That’s all from me for 2017. Best wishes for Christmas and the New Year to one and all!
On Monday this week I attended the Corpus Linguistics in Scotland event, which took place in the STELLA lab on the ground floor of the building I work in, which was very handy. It was a useful event to attend, as the keynote talk was about the SPADE project, which I’m involved with, and it was very helpful to listen to an overview of the project and also to get a glimpse of some of the interesting research that is already going on in the project. The rest of the day was split into two short paper sessions, one about corpus linguistics and the arts and humanities and the other about medical humanities. There were some interesting talks in both sessions and it was great to hear a little about some of the research that’s going on at the moment. It was also good to speak to some of the other attendees at the event, including Rhona Alcorn from SLD, Joanna Kopaczyk, who I’m currently helping out with a research proposal, and Stevie Barrett from DASG.
I spent a lot of the rest of the week on research proposals. I’d been given another AHRC technical review to do, and this one was particularly tricky to get right, which took rather a lot of time. I also started working on a first draft of a Technical Plan for Joanna’s project. I read through all of the materials she’d previously sent me and spent some time thinking through some of the technical implications. I started to write the plan and completed the first couple of sections, by which point I had another series of questions to fire off to Joanna. I also spoke to Graeme about some TEI XML and OCR issues, wondering if he had encountered a similar sort of workflow in a previous project. Graeme’s advice was very helpful, as it usually is. I hope to get a first draft of the plan completed early next week.

On Friday I had an email from the AHRC to say that my time as a technical reviewer would end at the end of the month. I have apparently been a technical reviewer for three years now. They also stated that from February next year the AHRC will be dropping Technical Plans and the technical review process. Instead they will just have a data management plan and will integrate the reviewing of technical details within the more general review process. I’m in two minds about this. On the one hand, it is clear that the AHRC just don’t have enough technical reviewers, which makes such a distinct focus on the technical aspects of reviews difficult to sustain. But on the other hand, I worry that Arts and Humanities reviewers who are experts in a particular research area may lack the technical knowledge to ascertain whether a research proposal is technically feasible at all, which will almost certainly result in projects getting funded that are simply not viable. It’s going to be interesting to see how this all works out, and also to see how the new data management plans will be structured.
On Friday afternoon I attended a Skype call for the Linguistic DNA project, along with Marc and Fraser. It was good to hear a bit more about how the project is progressing, and to be taken through a presentation about some of the project’s research that the team in Sheffield are putting together. I’m afraid I didn’t have anything to add to the proceedings, though, as I haven’t done any work for the project since the last Skype meeting. There doesn’t really seem to be anything anyone wants me to do for the project at this stage.
Also this week I had a phone conversation with Pauline Mackay about collaborative tools that might be useful for the Burns project to use. I suggested Basecamp as a possible tool, as I know this has been used successfully by others in the School, for example Jane Stuart-Smith has used it to keep track of several projects. I also had a chat with Luca about some work he’s been doing on topic modelling and sentiment analysis, which all sounds really interesting. Hopefully I can arrange another meeting of Arts developers in the new year and Luca can tell us all a bit more about this. I also spent a bit of time updating the Romantic National Song Network website for Kirsteen McCue, and I read through an abstract for a paper Fraser is submitting that will involve the sparklines we created for the Historical Thesaurus.
I was struck down with some sort of tummy bug at the weekend and wasn’t well enough to come into work on Monday, but I worked from home instead. Unfortunately, although I struggled through the day, I was absolutely wiped out by the end of it and ended up being off work sick on Tuesday and Wednesday. I was mostly back to full health on Thursday, which is the day I normally work from home anyway, so I made it through that day and was back to completely full health on Friday, thankfully. So I only managed to work for three days this week, and for two of those I wasn’t exactly firing on all cylinders. However, I still managed to get a few things done.
Last week I’d migrated the Mapping Metaphor blog site and after getting approval from Wendy I deleted the old site on Monday. I took a backup of the database and files before I did so, and then I wrote a little redirect that ensures Google links and bookmarks to specific blog pages point to the correct page on the main Metaphor site. I also had some further AHRC review duties to take care of, plus I spent some time reading through the Case for Support for Joanna Kopaczyk’s project and thinking about some of the technical implications. Pauline Mackay also sent me a sample of an Access database she’s put together for her Scots Bawdry project. I’m going to create an online version of this so I spent a bit of time going through it and thinking about how it would work.
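The blog redirect mentioned above essentially maps an old post URL to its archived copy on the main site. A rough sketch, assuming the old blog used WordPress’s default ‘?p=ID’ URLs (the domain and target path here are placeholders, not the real ones):

```python
# Hypothetical sketch of the redirect logic: old blog URLs carried a
# numeric post ID (e.g. /?p=42), and each should point to the archived
# copy of that post on the main site. Domain and paths are placeholders.

from urllib.parse import urlparse, parse_qs

MAIN_SITE = "https://mainsite.example"  # placeholder for the main MM site

def redirect_target(old_url):
    """Map an old blog URL to the corresponding page on the main site."""
    query = parse_qs(urlparse(old_url).query)
    post_id = query.get("p", [None])[0]
    if post_id and post_id.isdigit():
        return f"{MAIN_SITE}/blog/?id={post_id}"
    # Anything unrecognised falls back to the blog index.
    return f"{MAIN_SITE}/blog/"

print(redirect_target("http://oldblog.example.com/?p=42"))
```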
I spent most of Thursday and Friday working on this new system for Pauline, and by the end of the week I had created an initial structure for the online database, some initial search and browse facilities, and some management pages that allow Pauline to add, edit and delete records. The search page allows users to search for any combination of the following fields:
Verse title, first line, language, theme, type, ms title, publication year, place, publisher and location. Verse title, first line and ms title are free text and will bring back any records with matching text – e.g. if you enter ‘the’ into ‘verse title’ you can find all records where these three characters appear together in a title. Publication year allows users to search for an individual year or a range of years (e.g. 1820-1840 brings back everything that has a date between and including these years). Language, place, publisher and location are drop-down lists that allow you to select one option. Theme and type are checkboxes allowing you to select any number of options, with each joined by an ‘or’ (e.g. all the records that have a theme of ‘illicit love’ or ‘marriage’). I can change any of the single selection drop-downs to multiple options (or vice versa) if required. If multiple boxes are filled in these are joined by ‘and’ – e.g. publication place is Glasgow AND publication year is 1820.
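A hedged sketch of how these criteria could be combined into a single query – free-text fields matched with LIKE, multiple filled-in fields joined with AND, and the theme checkboxes joined with OR. The field and table names here are illustrative, not the actual schema:

```python
# Sketch of the search logic: each filled-in field contributes one
# clause, clauses are joined with AND, and the theme checkboxes form
# one bracketed OR group. Names are illustrative, not the real schema.

def build_search(verse_title=None, year_from=None, year_to=None, themes=None):
    clauses, params = [], []
    if verse_title:
        # Free-text match anywhere in the title, e.g. 'the' matches 'weather'.
        clauses.append("verse_title LIKE ?")
        params.append(f"%{verse_title}%")
    if year_from and year_to:
        # Inclusive year range, e.g. 1820-1840.
        clauses.append("publication_year BETWEEN ? AND ?")
        params.extend([year_from, year_to])
    if themes:
        # Checkboxes joined with 'or' inside one bracketed group.
        ors = " OR ".join(["theme = ?"] * len(themes))
        clauses.append(f"({ors})")
        params.extend(themes)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT * FROM verses WHERE {where}", params

sql, params = build_search(verse_title="the", year_from=1820, year_to=1840,
                           themes=["illicit love", "marriage"])
print(sql)
```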
The browse page presents all of the options in the search form as clickable lists, with each entry having a count to show you how many records match. For ‘publication year’ only those records with a year supplied are included. Clicking on a search or browse result displays the full record. Any content that can be searched for (e.g. publication type) is a link and clicking on it performs a search for that thing.
For the management pages, once logged in, a staff user can browse the data, which displays all of the records in one big table. From here the user can access options to edit or delete a record. Deleting a record simply deactivates it in the database and I can retrieve it again if required. Users can also add new records by clicking on the ‘add new row’ link. I also created a script for importing all of the data from the Access database and I will run this again on a more complete version of the database when Pauline is ready to import everything. This is all just an initial version, and there will no doubt be a few changes required, but I think it’s all come together pretty well so far.
I was off on Tuesday this week to attend my uncle’s funeral. I spent the rest of the week working on a number of relatively small tasks for a variety of different projects. The Dictionary of Old English people got back to me on Monday to say they had updated their search system to allow our Thesaurus of Old English site to link directly from our word records to a search for that word on their site. This was really great news, and I updated our site to add in the direct links. This is going to be very useful for users of both sites. I spent a bit more time on AHRC review duties this week, and I also had an email discussion with Joanna Kopaczyk in English Language about a proposal she is putting together. She sent me the materials she is working on and I read through them all and gave some feedback about the technical aspects. I’m going to help her to write the Technical Plan for her project soon too. I also met with Rachel Douglas from the School of Modern Languages to offer some advice on technical matters relating to a project she’s putting together. Although Rachel is not in my School and I therefore can’t be involved in her project, it was still good to be able to give her a bit of help and show her some examples of digital outputs similar to the sorts of thing she is hoping to produce.
I also spent some further time working with Fraser on the integration of OED data with the Historical Thesaurus data. Fraser had sent me some further categories that he and a student had manually matched up, and had also asked me to write another script that picks out all of the unmatched HT categories and all of the unmatched OED categories and, for each HT category, goes through all of the OED categories and finds the one with the lowest Levenshtein distance (the number of single-character edits it would take to turn one string into another). My initial version of this script wasn’t ideal, as it included all unmatched OED categories and I’d forgotten that this included several thousand that are ‘top level’ categories that don’t have a part of speech and shouldn’t be matched with our categories at all. I also realised that the script should only compare categories that have the same part of speech, as my first version was ending up with (for example) a noun category being matched up with an adjective. I updated the script to take these issues into account, but unfortunately the output still doesn’t look all that useful. However, there are definitely some real matches that can be manually picked out from the list, e.g. 31890 ‘locustana pardalina or rooibaadjie’ and ‘locustana pardalina (rooibaadjie)’ and some others around there. Also 14149 ‘applied to weapon etc’ and ‘applied to weapon, etc’. It’s over to Fraser again to continue with this.
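The matching script described above boils down to: for each unmatched HT category, scan the unmatched OED categories with the same part of speech (skipping the ‘top level’ OED categories, which have none) and keep the one with the lowest Levenshtein distance. A minimal sketch with illustrative category data:

```python
# Sketch of the category-matching approach. The Levenshtein function is
# a standard textbook implementation; the category records and 'pos'
# values are illustrative, not the actual HT/OED data structures.

def levenshtein(a, b):
    """Number of single-character edits needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def best_match(ht_cat, oed_cats):
    """Closest OED category with the same part of speech, or None."""
    candidates = [o for o in oed_cats
                  if o["pos"] and o["pos"] == ht_cat["pos"]]
    if not candidates:
        return None
    return min(candidates, key=lambda o: levenshtein(ht_cat["name"], o["name"]))

ht = {"name": "applied to weapon etc", "pos": "aj"}
oed = [{"name": "applied to weapon, etc", "pos": "aj"},
       {"name": "weaponry", "pos": None}]  # top-level: no part of speech
print(best_match(ht, oed)["name"])
```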
I mentioned last week that I’d updated all of our WordPress sites to version 4.9, but that 4.9.1 would no doubt soon be released. And in fact it was released this week, so I had to update all of the sites once more. It’s a bit of a tedious task but it doesn’t really take too long – maybe about half an hour in total. I also decided to tick an item off my long-term ‘to do’ list as I had a bit of time available. The Mapping Metaphor site had a project blog, located at a different URL from the main site. As the project has now ended there are no more blog posts being made, so it seemed a bit pointless hosting this WordPress site, and having to keep it maintained, when I could just migrate the content to the main MM website as static HTML and delete the WordPress site. I spent some time investigating WordPress plugins that can export entire sites as static HTML, for example https://en-gb.wordpress.org/plugins/static-html-output-plugin/ and https://wordpress.org/plugins/simply-static/. These plugins go through a WordPress site, convert all pages and posts to static HTML, pull in the WordPress file uploads folder and wrap everything up as a ZIP file. This seemed ideal, and both tools worked very well, but I realised they weren’t exactly what I needed. Firstly, the Metaphor blog (which was set up before I was involved with the project) just uses page IDs in the URLs, not other sorts of permalinks. Neither plugin works with this default URL style in place, so I’d need to change the link type, meaning the new pages would have different URLs from the old pages, which would be a problem for redirects. Secondly, both plugins pull in all of the page elements, including the page design, the header and all the rest. I didn’t actually want all of this stuff, just the actual body of the posts (plus titles and a few other details), so I could slot this into the main MM website template.
So instead of using a plugin I realised it was probably simpler and easier if I just wrote my own little export script that grabbed just the published posts (not pages), for each getting the ID, the title, the main body, the author and the date of creation. My script hooked into the WordPress functions to make use of the ‘wpautop’ function, which adds paragraph markup to texts, and I also replaced absolute URLs with relative ones. I then created a temporary table to hold just this data, set my script to insert into it and then I exported this table. I imported this into the main MM site’s database and wrote a very simple script to pull out the correct post based on the passed ID and that was that. Oh, I also copied the WordPress uploads directory across too, so images and PDFs and such things embedded in posts would continue to work. Finally, I created a simple list of posts. It’s exactly what was required and was actually pretty simple to implement, which is a good combination.
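The two text transformations in the export script can be sketched roughly as follows: a much-simplified stand-in for WordPress’s wpautop (which wraps blank-line-separated chunks of text in paragraph tags) plus the absolute-to-relative URL replacement. The blog domain is a placeholder, and the real script did this via WordPress’s own PHP functions:

```python
# Rough sketches of the export script's two text transformations.
# autop() is a heavily simplified imitation of WordPress's wpautop;
# relativise() strips the old blog's domain so links keep working on
# the new site. The domain below is a placeholder.

OLD_BLOG = "http://oldblog.example.com"

def autop(text):
    """Very simplified wpautop: one <p> per blank-line-separated chunk."""
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    return "\n".join(f"<p>{c}</p>" for c in chunks)

def relativise(html):
    """Turn absolute links to the old blog into relative paths."""
    return html.replace(OLD_BLOG, "")

post = "First paragraph.\n\nSee " + OLD_BLOG + "/uploads/report.pdf"
print(relativise(autop(post)))
```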
On Thursday I heard that the Historical Thesaurus had been awarded the ‘Queen’s Anniversary Prize for Higher Education’, which is a wonderful achievement for the project. Marc had arranged a champagne reception on Friday afternoon to celebrate the announcement, so I spent most of the afternoon sipping champagne and eating chocolates, which was a nice way to end the week.