Week Beginning 5th September 2016

It was a four-day week for me this week as I’d taken Friday off.  I spent a fair amount of time continuing to work on the Atlas interface for the SCOSYA project, in preparation for Wednesday, when Gary was going to demo the Atlas to other project members at a meeting in York.  I spent most of Monday and Tuesday working on the facilities to display multiple attributes through the Atlas.  This has been quite a tricky task, as it has meant massively overhauling the API as well as the front end to allow multiple attribute IDs and Boolean joining types to be processed.

In the ‘Attribute locations’ section of the ‘Atlas Display Options’ menu, underneath the select box, there is now an ‘Add another’ button.  Pressing this slides down a new select box, together with options for how the previous select box should be ‘joined’ with the new one (either ‘and’, ‘or’ or ‘not’).  Users can add as many attribute boxes as they want, and can remove a box by pressing the ‘Remove’ button underneath it.  This smoothly slides the box up and removes it from the page, using the always excellent jQuery library.
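For anyone interested in the mechanics, the add / remove behaviour boils down to something like the following rough sketch.  The element IDs, class names and markup here are made up for illustration and aren’t the actual atlas code:

```javascript
// Minimal sketch of the add / remove behaviour, using jQuery.
// The selectors and markup below are invented for illustration;
// they are not the real SCOSYA atlas markup.
$('#add-another').on('click', function () {
  // Build a new block containing a join-type selector and an attribute selector
  var block = $(
    '<div class="attribute-block" style="display:none">' +
      '<select class="join-type">' +
        '<option value="and">and</option>' +
        '<option value="or">or</option>' +
        '<option value="not">not</option>' +
      '</select>' +
      '<select class="attribute-id"><!-- options filled from the attribute list --></select>' +
      '<button class="remove-attribute">Remove</button>' +
    '</div>'
  );
  $('#attribute-locations').append(block);
  block.slideDown();           // smoothly reveal the new box
});

// Delegated handler so it also works for boxes added after page load
$('#attribute-locations').on('click', '.remove-attribute', function () {
  $(this).closest('.attribute-block').slideUp(function () {
    $(this).remove();          // take it out of the page once the animation finishes
  });
});
```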

The Boolean operators (‘and’, ‘or’ and ‘not’) can be quite confusing to use in combination, so we’ll have to make sure we explain how we are using them.  E.g. ‘A AND B OR C’ could mean ‘(A AND B) OR C’ or ‘A AND (B OR C)’, and these could give massively different results.  The way I’ve set things up is to go through the attributes and operators sequentially.  So for ‘A AND B OR C’ the API gets the dataset for A, checks this against the dataset for B and makes a new dataset containing only those locations that appear in both, then adds all of dataset C to this.  So this is ‘(A AND B) OR C’.  It is still possible to do the ‘A AND (B OR C)’ search; you’d just have to rearrange the order so the select boxes read ‘B OR C AND A’.
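A rough sketch of this left-to-right evaluation, treating each attribute’s dataset as a simple set of location IDs (the real API works on the full rating records rather than bare IDs, so this is illustrative only):

```javascript
// Illustrative sketch of the sequential left-to-right combination.
// Each "dataset" is just an array of location IDs here; the real API
// deals with full rating records, not bare IDs.
function combineSequentially(datasets, operators) {
  // datasets: [A, B, C, ...]; operators[i] joins the running result with datasets[i + 1]
  var result = new Set(datasets[0]);
  for (var i = 0; i < operators.length; i++) {
    var next = new Set(datasets[i + 1]);
    if (operators[i] === 'and') {
      // keep only the locations that appear in both sets
      result = new Set(Array.from(result).filter(function (loc) { return next.has(loc); }));
    } else if (operators[i] === 'or') {
      // add everything from the next dataset
      next.forEach(function (loc) { result.add(loc); });
    } else if (operators[i] === 'not') {
      // remove anything that also appears in the next dataset
      next.forEach(function (loc) { result.delete(loc); });
    }
  }
  return result;
}

// 'A AND B OR C' is therefore evaluated as '(A AND B) OR C':
// combineSequentially([A, B, C], ['and', 'or']);
```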

Adding in ‘not’ works in the same sequential way, so ‘A NOT B OR C’ gets dataset A, removes from it those places found in dataset B, then adds all of the places found in dataset C.  I would hope people would always put a ‘not’ as the last part of their search, but as the above example shows, they don’t have to.  Multiple ‘not’s are allowed too – e.g. ‘A NOT B NOT C’ will get the dataset for A, remove those places found in dataset B and then remove any further places found in dataset C.
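To make that concrete using the sketch above (the location names are made up purely for the example):

```javascript
// Continuing the sketch above with some invented location IDs:
var A = ['Aberdeen', 'Ayr', 'Oban'];
var B = ['Ayr', 'Oban', 'Wick'];
var C = ['Oban', 'Stornoway'];

// 'A NOT B NOT C': start from A, strip out B's locations, then strip out C's
combineSequentially([A, B, C], ['not', 'not']);   // -> Set {'Aberdeen'}

// 'A NOT B OR C': remove B's locations from A, then add all of C back in,
// which is why putting the 'not' last usually makes more sense
combineSequentially([A, B, C], ['not', 'or']);    // -> Set {'Aberdeen', 'Oban', 'Stornoway'}
```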

Another thing to note is that the ‘limits’ are currently applied to the dataset for each attribute independently.  E.g. for a search for ‘A AND B OR C’ with the limits set to ‘Present’ and age group ‘60+’, each of datasets A, B and C will have these limits applied BEFORE the Boolean operators are processed.  So the ratings in dataset A will only contain those that are ‘Present’ and ‘60+’; these will then be reduced to include only those locations that are also in dataset B (which itself only includes ratings that are ‘Present’ and ‘60+’); and then all of the ratings from dataset C (which again only includes those that are ‘Present’ and ‘60+’) will be added to this.
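In terms of the sketch above, the limits act as a filter on each attribute’s ratings before anything is combined.  Something like the following, where the field names (‘judgement’, ‘ageGroup’) are invented for illustration rather than being the actual data structure:

```javascript
// Illustrative only: apply the display limits to each attribute's ratings
// first, then hand the filtered location lists to the sequential combiner.
function applyLimits(ratings, limits) {
  return ratings
    .filter(function (r) { return r.judgement === limits.judgement; })              // e.g. 'present'
    .filter(function (r) { return limits.ageGroups.indexOf(r.ageGroup) !== -1; })   // e.g. '60+'
    .map(function (r) { return r.location; });
}

// Filter A, B and C independently, *then* combine:
// combineSequentially(
//   [applyLimits(ratingsA, limits), applyLimits(ratingsB, limits), applyLimits(ratingsC, limits)],
//   ['and', 'or']
// );
```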

If the limits weren’t imposed until after the Boolean operators had been applied then the results could possibly be different – especially for the ‘present’ / ‘absent’ limits, as there would be more ratings for them to be applied to.

I met with Gary a couple of times to discuss the above as these were quite significant additions to the Atlas.  It will be good to hear the feedback he gets from the meeting this week and we can then refine the browse facilities accordingly.

I spent some further time this week on AHRC review duties, and Scott Spurlock sent me a proposal document to review, so I spent a bit of time on that as well.  I also spent a bit of time on Mapping Metaphor, as Wendy had uncovered a problem with the Old English data: for some reason an empty category labelled ‘0’ was appearing on the Old English visualisations.  After a bit of investigation it turned out this had been caused by a category that had been removed from the system (B71) still being present in the last batch of OE data that I uploaded last week.  After a bit of discussion with Wendy and Carole I removed the connections that were linking to this non-existent category and all was fine again.

I met with Luca this week to discuss content management systems for transcription projects, and I also had a chat with Gareth Roy about getting a copy of the Hansard frequencies database from him.  As I mentioned last week, the insertion of the data has now been completed, and I wanted to grab a copy of the MySQL data tables so we don’t have to go through all of this again if anything should happen to the test server that Gareth very kindly set up for the database.  Gareth stopped the database and added all of the necessary files to a tar.gz file for me.  The file was 13GB in size and I managed to copy it fairly quickly across the University network.

I also began trying to add some new indexes to the data to speed up querying, but so far I’ve not had much luck with this.  I tried adding an index to the data on my local PC, but after several hours the process was still running and I needed to turn off my PC.  I also tried adding an index to the database on Gareth’s server whilst I was working from home on Thursday, but after leaving it running for several hours the remote connection timed out and left me with a partial index.  I’m going to have to have another go at this next week.