Week Beginning 1st May 2017

This week was a shorter one than usual as Monday was the May Day holiday and I was off work on Wednesday afternoon to attend a funeral.  I worked on a variety of different tasks during the time available.  Wendy is continuing to work on the data for Mapping Metaphor and had another batch of it for me to process this week.  After dealing with the upload we now have a further nine categories marked off and a total of 12,938 metaphorical connections and 25,129 sample lexemes.  I also returned to integrating the new OED data into the Historical Thesaurus.  Fraser had enlisted the help of some students to manually check connections between HT and OED categories, and I set up a script that will allow us to mark off a few thousand more categories as ‘checked’.  Before that can happen Fraser needs to QA the students’ selections, and I wrote a further script that will help with this.  Hopefully next week I’ll be able to actually mark off the selections.

I also returned to SCOSYA for the first time since before Easter and managed to track down and fix a few bugs that Gary had identified.  Firstly, Gary was running into difficulties when importing and displaying data using the ‘my map data’ feature.  The imported data simply wouldn’t display at all in the Safari browser, and after a bit of investigation I figured out why: there was a missing square bracket in my code, which rather strangely was being silently fixed by other browsers but was causing issues in Safari.  Adding in the missing bracket fixed the issue straight away.  The other issue arose when Gary did some work on the CSV file exported from the Atlas and then reimported it, at which point the import failed to upload any ratings.  It turned out that Excel had added some extra columns to the CSV file while Gary was working with it, and this change to the structure meant that every row failed the validation checks I had put in place.  I rectified this in two ways: firstly, the upload no longer checks the number of columns; secondly, I added more informative error messages.  It’s all working a lot better now.
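The import handling itself lives server-side rather than in the code shown here, but as a rough illustration of the new approach, the sketch below (with hypothetical column names and a 1-5 rating scale) ignores any extra columns Excel has tacked on and reports a specific message for each row that genuinely has a problem, rather than silently rejecting everything.

```typescript
// A minimal sketch of the more forgiving row validation (hypothetical
// column names and rating scale; the real SCOSYA import differs).
interface RowError {
  row: number;
  message: string;
}

function validateRows(rows: string[][], expectedColumns: number): RowError[] {
  const errors: RowError[] = [];
  rows.forEach((row, i) => {
    // Ignore any extra columns Excel has appended rather than rejecting
    // the row outright for having the wrong column count.
    const cells = row.slice(0, expectedColumns).map(c => c.trim());
    if (cells.length < expectedColumns || cells.some(c => c === "")) {
      errors.push({ row: i + 1, message: `Row ${i + 1} is missing a value` });
      return;
    }
    // Assume the last expected column holds the rating, on a 1-5 scale.
    const rating = Number(cells[expectedColumns - 1]);
    if (!Number.isInteger(rating) || rating < 1 || rating > 5) {
      errors.push({
        row: i + 1,
        message: `Row ${i + 1}: rating '${cells[expectedColumns - 1]}' should be a whole number from 1 to 5`,
      });
    }
  });
  return errors;
}
```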

With these things out of the way I set to work on a larger update to the map.  Previously an ‘AND’ search limited results by location rather than by participant.  For example, if you did a search that said ‘show me attributes D19 AND D30, all age groups with a rating of 4-5’, a spot for a location would be returned if any combination of participants matched this.  As there are up to four participants per location, a location could be returned as meeting the criteria even if no individual participant actually met them.  For example, participants A and B give D19 a score of 5 but only give D30 a score of 3, while participants C and D only give D19 a score of 3 and give D30 a score of 5.  In combination the location meets the criteria even though none of the participants actually do.  Gary reckoned this wasn’t the best way to handle the search and I agreed, so I updated the ‘AND’ search to check whether individual participants meet the criteria.  This meant a fairly large reworking of the API and a fair amount of testing, but the ‘AND’ search now works at participant level.  The ‘OR’ search doesn’t need to be updated because, by its very nature, it is looking for any combination.
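To make the change concrete, here is a small sketch of the participant-level check.  The data shape and field names are simplified assumptions rather than the actual SCOSYA API structures: a location only matches when at least one participant meets the rating threshold for every requested attribute.

```typescript
// Sketch of the participant-level 'AND' check. The data shape and names
// here are illustrative assumptions, not the actual SCOSYA structures.
interface Participant {
  id: string;
  ratings: Record<string, number>; // attribute code -> rating (1-5)
}

interface AtlasLocation {
  name: string;
  participants: Participant[];
}

function matchesAndSearch(
  loc: AtlasLocation,
  attributes: string[],
  minRating: number
): boolean {
  // At least one participant must meet the threshold for every attribute.
  return loc.participants.some(p =>
    attributes.every(attr => (p.ratings[attr] ?? 0) >= minRating)
  );
}

// The example from above: no single participant rates both D19 and D30 at
// 4 or 5, so the location no longer appears in a D19 AND D30 search.
const example: AtlasLocation = {
  name: "Example location",
  participants: [
    { id: "A", ratings: { D19: 5, D30: 3 } },
    { id: "B", ratings: { D19: 5, D30: 3 } },
    { id: "C", ratings: { D19: 3, D30: 5 } },
    { id: "D", ratings: { D19: 3, D30: 5 } },
  ],
};
matchesAndSearch(example, ["D19", "D30"], 4); // false
```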

I spent the remainder of the week on LDNA duties, continuing to work on the ‘sparkline’ visualisations for thematic heading categories.  Most of the time was actually spent creating a new API for the Historical Thesaurus, which at this stage is used solely to output data for the visualisations.  It took a fair amount of time to get the required endpoints working, and to create a nice index page that lists the endpoints with examples of how each can be used.  It seems to be working pretty well now, though, including facilities to output the data in JSON or CSV format.  The latter proved to be slightly tricky to implement due to the way that the data for each decade was formatted.  I wanted each decade to appear in its own column, so as to roughly match the format of Marc’s original Excel spreadsheet, and this meant having to rework how the multi-level associative array was processed.
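The API itself runs server-side and isn’t reproduced here, but the CSV reshaping essentially boils down to pivoting the per-decade values into one column per decade.  The sketch below shows the general idea, with an assumed nested structure keyed by thematic heading and then by decade.

```typescript
// Sketch of pivoting per-decade values into one column per decade, roughly
// matching the layout of the original spreadsheet. The nested structure and
// key names are assumptions for illustration only.
type DecadeValues = Record<string, number>;        // e.g. { "1700": 12, "1710": 15 }
type HeadingData = Record<string, DecadeValues>;   // keyed by thematic heading

function toCsv(data: HeadingData): string {
  // Collect and sort every decade that appears anywhere, so each row has
  // the same columns even where a heading has gaps.
  const decades = Array.from(
    new Set(Object.values(data).flatMap(d => Object.keys(d)))
  ).sort((a, b) => Number(a) - Number(b));

  const header = ["heading", ...decades].join(",");
  const rows = Object.entries(data).map(([heading, values]) =>
    [heading, ...decades.map(dec => String(values[dec] ?? 0))].join(",")
  );
  return [header, ...rows].join("\n");
}
```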

With the API in place I then set to work on creating a new version of the visualisation that actually used it.  This also took a fair amount of time, as I had to deconstruct my single-file test script: splitting the JavaScript processing out into its own file, handling the form submission in JavaScript, connecting to the API and all those sorts of things.  By the end of the week I had a set of visualisations that looked and functioned identically to the previous set, but behind the scenes they were being processed very differently, and in a much more robust and manageable way.  Next week I’ll continue with these, as there are still lots of enhancements to add.
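As a rough illustration of the client-side part of this, the form submission is intercepted in JavaScript and the data is fetched from the API rather than the page reloading.  The endpoint path, form id and drawSparklines() function in the sketch below are placeholder names, not the real ones.

```typescript
// Rough sketch of the client-side flow: intercept the form submission,
// query the API and hand the JSON to the rendering code. The endpoint
// path, form id and drawSparklines() are placeholder names.
declare function drawSparklines(data: unknown): void;

const form = document.querySelector<HTMLFormElement>("#sparkline-form");

if (form) {
  form.addEventListener("submit", async (event) => {
    event.preventDefault(); // stay on the page rather than doing a full reload

    // Turn the form fields into a query string for the API call.
    const params = new URLSearchParams();
    new FormData(form).forEach((value, key) => params.append(key, String(value)));

    const response = await fetch(`/api/visualisation?${params.toString()}`);
    if (!response.ok) {
      console.error(`API request failed with status ${response.status}`);
      return;
    }
    drawSparklines(await response.json());
  });
}
```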