I returned to work after the Christmas holidays on Thursday this week and spent the day dealing with a few issues that had cropped up whilst I'd been away. The DSL Advanced Search had stopped working on Wednesday. I remembered that this had happened a few years ago, when it was caused by a problem with the Apache Solr search engine that the advanced search uses. Previously, restarting the server had fixed the issue, but this time it didn't help. Thankfully, after speaking to Chris we realised that Solr runs on Apache Tomcat rather than on the main Apache server software, and Tomcat had been updated the day before. It appears that the update had stopped Solr from working, and restarting Tomcat got things running again. I also made a minor tweak to the Scots Corpus for Wendy.
After that, and after dealing with a few emails, I returned to the Historical Thesaurus timeline visualisations I'd created the day before the Christmas holidays. I'd emailed Marc and Fraser about these before the break and they'd got back to me with some encouraging comments. The initial visualisations only worked with the approximate start and end dates from the Thesaurus database: the 'apps' and 'appe' fields, which give a single start and end date for each word. However, the actual dates for lexemes are considerably more complicated than this. In fact, there are 18 date-related fields in the underlying database, which allow different ranges of dates to be recorded, for example 'OE + a1400/50–c1475 + a1746– now History'. Writing an algorithm that could handle every possible permutation of these fields proved rather tricky and took quite a while to get my head around. I had an algorithm working by mid-morning on Friday, and although it still needs a lot of detailed testing it does at least seem to work (including with the example above), giving a nice series of dots and dashes along a timeline.
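To illustrate how a date string might be turned into dots and dashes, here is a minimal Python sketch. It is not the algorithm described above: it works from the full date text rather than the 18 underlying database fields (an assumption made for brevity), and the function names, the year assigned to 'OE' and the year used for 'now' are all hypothetical placeholders. Each '+'-separated segment becomes a (start, end) pair, where a single date gives a dot (start equals end) and a range gives a dash.

```python
import re

# Hypothetical placeholder years, not the Thesaurus's actual conventions.
OE_YEAR = 1150   # nominal year used to plot "OE" (Old English)
NOW_YEAR = 2000  # year used to plot "now" or an open-ended range

DASH = "–"  # ranges in the date text use an en dash

# Optional 'a' (ante) or 'c' (circa) qualifier, a year, and an optional
# "/50"-style alternative, e.g. "a1400/50". Qualifiers are discarded here.
YEAR_RE = re.compile(r"^[ac]?(\d{3,4})(?:/\d{2,4})?$")


def parse_point(token: str) -> int:
    """Convert one date token such as 'OE', 'c1475', 'a1400/50' or 'now' to a year."""
    token = token.strip()
    if token == "OE":
        return OE_YEAR
    if token == "now":
        return NOW_YEAR
    match = YEAR_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognised date token: {token!r}")
    return int(match.group(1))  # keep only the first year of "1400/50"


def parse_segment(segment: str) -> tuple[int, int]:
    """Turn one '+'-separated segment into a (start, end) pair."""
    parts = [part.strip() for part in segment.split(DASH)]
    if len(parts) == 1:
        year = parse_point(parts[0])
        return (year, year)  # single date: drawn as a dot
    if len(parts) == 2:
        start = parse_point(parts[0])
        # An empty end, as in "a1746–", is treated as open-ended.
        end = parse_point(parts[1]) if parts[1] else NOW_YEAR
        return (start, end)  # range: drawn as a dash
    raise ValueError(f"chained range is ambiguous: {segment!r}")


def parse_dates(text: str) -> list[tuple[int, int]]:
    """Split the full date text on '+' and parse each segment."""
    return [parse_segment(segment) for segment in text.split("+")]
```

Using the date portion of the example above, `parse_dates("OE + a1400/50–c1475 + a1746– now")` returns `[(1150, 1150), (1400, 1475), (1746, 2000)]` under these placeholder values. A real implementation would presumably keep the 'a' and 'c' qualifiers and the alternative years for display rather than discarding them.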
Marc, Fraser and I met on Friday to discuss the timeline, how we might improve it and how to integrate it into the site. Our meeting lasted almost three hours and was very useful. It looks like the feature I created simply because I had some free time and wanted to experiment is going to be fully integrated with many aspects of the site. The only downside is that there is now a massive amount of additional functionality we want to implement, and I know I'm going to be pretty busy with other projects once the new year properly gets under way, so it might take quite a while to get it all up and running. Still, it's exciting. Whilst working through my algorithm I had also spotted some rows where the dates were wrong, in that they contained a range of dates where a range made no sense, e.g. 'OE–c1200–a1500'. I generated a few CSV files listing such rows (there are a couple of hundred) and Marc and Fraser are going to try to sort them out. After our lengthy meeting I started adding pop-ups to the timeline, so that clicking on an item opens a pop-up displaying information about the word (e.g. the word and its full date text). I still need to do some work on this, but it's good to get the basics in place. Here's a screenshot showing the timeline using the full date fields and with a pop-up open:
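Problem rows like 'OE–c1200–a1500' can be spotted mechanically, because a single segment contains more than one range dash. The following minimal Python sketch shows one way such rows might be exported to CSV. It assumes the rows are available as hypothetical (id, word, date_text) tuples built from the full date text; the real export worked from the database, and the function names and sample words here are illustrative only.

```python
import csv
import io
from typing import Iterable, TextIO

DASH = "–"  # ranges in the date text use an en dash


def has_chained_range(date_text: str) -> bool:
    """True if any '+'-separated segment contains more than one range dash."""
    return any(segment.count(DASH) > 1 for segment in date_text.split("+"))


def write_problem_rows(rows: Iterable[tuple[int, str, str]], out: TextIO) -> int:
    """Write rows with chained ranges to a CSV file-like object; return how many."""
    writer = csv.writer(out)
    writer.writerow(["id", "word", "date_text"])
    count = 0
    for row_id, word, date_text in rows:
        if has_chained_range(date_text):
            writer.writerow([row_id, word, date_text])
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample rows; only the first has a chained range.
    sample = [
        (1, "wordA", "OE–c1200–a1500"),
        (2, "wordB", "OE + a1400/50–c1475 + a1746– now"),
    ]
    buffer = io.StringIO()  # for a real file, open it with newline=""
    print(write_problem_rows(sample, buffer))
    print(buffer.getvalue())
```

Writing to any file-like object keeps the check independent of where the output goes, so the same function could write one CSV file per category of problem.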