I was on holiday last week and had quite a stack of things to do when I got back in on Monday. This included setting up a new project website for a student in Scottish Literature who had received some Carnegie funding for a project and preparing for an interview panel that I was on, with the interview taking place on Friday. I also responded to Alison Wiggins about the content management system I’d created for her Mary Queen of Scots letters project and had a discussion with someone in Central IT Services about the App and Play Store accounts that I’ve been managing for several years now. It’s looking like responsibility for this might be moving to IT services, which I think makes a lot of sense. I also gave some advice to a PhD student about archiving and preserving her website and engaged in a long email discussion with Heather Pagan of the Anglo-Norman Dictionary about sorting out their data and possibly migrating it to a new system. On Wednesday I had a meeting with the SCOSYA team about further developments of the public atlas. We decided on another few requirements and discussed timescales for the completion of the work. They’re hoping to be able to engage in some user testing in the middle of September, so I need to try and get everything completed before then. I had hoped to start on some of this on Thursday, but I was struck down by a really nasty cold that I’ve still not shaken yet, which made focussing on such tricky tasks as getting questionnaire areas to highlight when clicked on rather difficult.
I spent most of the rest of the week working for DSL in various capacities. I’d put in a request to get Apache Solr installed on a new server, so we could use this for free-text searching and thankfully Arts IT Support agreed to do this. A lot of my week was spent preparing the data from both the ‘v2’ version of the DSL (the data outputted from the original API, but with full quotes and everything pre-generated rather than being created on the fly every time an entry is requested) and the ‘v3’ API (data taken from the editing server and outputted by a script written by Thomas Widmann) so that it could be indexed by Solr. Raymond from Arts IT Support set up an instance of Solr on a new server and I created scripts that went through all 90,000 DSL entries in both versions and generated full-text versions of the entries that stripped out the XML tags. For each set I created three versions – on that was ‘full text’, one that was full text without the quotations and the other that was just the quotations. The script outputted this data in a format that Solr could work with and I sent this on to Raymond for indexing. The first test version I sent Raymond was just the full text, and Solr managed to index this without incident. However, the other views of the text required working with the XML a bit, and this appears to have brought in some issues with special characters that Solr is not liking. I’m still in the middle of sorting this out and will continue to look into it next week, but progress with the free-text searching is definitely being made and it looks like the new API will be able to offer the same level of functionality as the existing API. I also ensured I documented the process of generating all of the data from the XML files outputted by the editing system through to preparing the full-text for indexing by Solr, so next time we come to update the data we will know exactly what to do. This is much better than how things previously stood, as the original API is entirely a ‘black box’ with no documentation whatsoever as to how to update the data contained therein.
Also during this time I engaged in an email conversation about managing the dictionary entries and things like cross references with Ann Ferguson and the people who will be handling the new editor software for the dictionary, and helped to migrate control for the email part of the DSL domain to the control of the DSL’s IT people. We’re definitely making progress with sorting out the DSL’s systems, which is really great.
I’m going to be working for just three days over the next two weeks, and all of these days will be out of the office, so I’ll just need to see how much time I have to continue with the DSL tasks, especially as work for the SCOSYA project is getting rather urgent.