Week Beginning 29th August 2016

It’s now been four years since I started this job, so there are now four years’ worth of these weekly posts up here.  I have to say I’m still really enjoying the work I’m doing here, and it’s still really rewarding to be working on all of these different research projects.  Another milestone was reached this week too: the Hansard semantic category dataset that I’ve been running through the grid in batches over the past few months, in order to insert it into a MySQL database, has finally completed!  The database now has 682,327,045 rows in it, which makes it by some considerable margin the largest database I’ve ever worked with.  Unfortunately, as it currently stands it’s not going to be possible to use the database as a data source for web-based visualisations, as a simple ‘Select count(*)’ to return the number of rows took just over 35 minutes to execute!  I will see what can be done to speed things up over the next few weeks, though.  At the moment I believe the database is sitting on what used to be a desktop PC, so moving it to a meatier machine with lots of memory might speed things up considerably.  We’ll see how that goes.

I met with Scott Spurlock on Tuesday to discuss his potential Kirk Sessions crowdsourcing project.  It was good to catch up with Scott again, and we’ve made the beginnings of a plan for how to proceed with a funding application and what software infrastructure we’re going to try.  We’re hoping to use the Scripto tool (http://scripto.org/), which is itself built around MediaWiki, in combination with the Omeka content management system (https://omeka.org/), which is a tool I’ve been keen to try out for some time.  This is the approach that was used by the ‘Letters of 1916’ project (http://letters1916.maynoothuniversity.ie/), whose talk at DH2016 I found so useful.  We’ll see how the funding application goes and whether we can proceed with this.

I also had my PDR session this week, which took up a fair amount of my time on Wednesday.  It was all very positive, and it was a good opportunity to catch up with Marc (my line manager) as I don’t see him very often.  Also on Wednesday I had some communication with Thomas Widmann of the SLD, as the DSL website had gone offline.  Thankfully Arts IT Support got it back up and running a matter of minutes after I alerted them.  Thomas also asked me about the datafiles for the Scots School Dictionary app, and I was happy to send these on to him.

I gave some advice to Graeme Cannon this week about a project he has been asked to provide technical input costings for, and I also spent some time on AHRC review duties.  Wendy also contacted me about updating the data for the main map and OE maps for Mapping Metaphor, so I spent some time running through the data update processes.  For the main dataset the number of connections has gone down from 15301 to 13932 (due to some connections being reclassified as ‘noise’ or ‘relevant’ rather than ‘metaphor’), while the number of lexemes has gone up from 10715 to 13037.  For the OE data the number of metaphorical connections has gone down from 2662 to 2488 and the number of lexemes has gone up from 3031 to 4654.

The rest of my week was spent on the SCOSYA project, for which I continued to develop the prototype Atlas interface and the API.  By Tuesday I had finished an initial version of the ‘attribute’ map (i.e. one that allows you to plot the ratings for a specific feature as noted in the questionnaires).  This version allowed users to select one attribute and see the dots on a map of Scotland, with different colours representing the rating scores of 1-5 (the system calculates the average of all the ratings at a given location).  I met with Gary and he pointed out that the questionnaire data in the system currently only has latitude / longitude figures for each speaker’s current address, so we’ve got too many spots on the map.  These need to be grouped more broadly by town for the figures to really make sense.  Settlement names are contained in the questionnaire filenames, and I figured out a way of automatically querying Google Maps for the settlement name (plus ‘Scotland’ to disambiguate places) in order to grab a more generic latitude / longitude value for the place, e.g. http://maps.googleapis.com/maps/api/geocode/json?sensor=false&address=oxgangs+scotland
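The geocoding lookup described above could be sketched in Python roughly as follows.  This is a minimal illustration under stated assumptions rather than the actual upload script: the function names `geocode_url` and `first_location` are hypothetical, the endpoint and parameters are copied from the example URL above, and the parsing assumes the Geocoding API’s JSON layout (a `status` field and a `results` list whose entries carry `geometry.location.lat` / `lng`).  Google’s Geocoding API nowadays also expects an API key over HTTPS, which this sketch leaves out.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode

# Endpoint and parameters copied from the example query URL.
# Assumption: the current Geocoding API also wants HTTPS and a `key`
# parameter, which this sketch deliberately leaves out.
GEOCODE_ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(settlement: str) -> str:
    """Build a geocoding query URL for a settlement name.

    'scotland' is appended to help disambiguate place names; urlencode
    turns spaces into '+', matching the example URL.
    """
    params = {"sensor": "false", "address": f"{settlement} scotland"}
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


def first_location(response_text: str) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result in a geocoding JSON response.

    Ambiguous names simply take the first match (it can be corrected by
    hand later); returns None when the lookup found nothing.
    """
    data = json.loads(response_text)
    results = data.get("results", [])
    if data.get("status") != "OK" or not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

For example, `geocode_url("oxgangs")` reproduces the Oxgangs query URL above; the response body fetched from that URL (e.g. with `urllib.request.urlopen`) would then be passed to `first_location`.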

There will be some situations where there is ambiguity and multiple places are returned, but I just grab the first, and the locations can then be ‘fine tuned’ by Gary via the CMS.  In fact, I updated the CMS to incorporate just such facilities, and I also updated the questionnaire upload scripts so that the Google Maps data is incorporated automatically from now on.  With this data in place I then updated the API so that it spits out the new data rather than the speaker-specific data, and updated the atlas interface to use the new values too.  The result was a much better map, with fewer dots and better grouping.
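Once each settlement has a coordinate, producing one dot per place amounts to collecting every speaker’s rating under its settlement and averaging them.  Here is a minimal Python sketch of that grouping, assuming ratings arrive as hypothetical `(settlement, score)` pairs and that `locations` maps each settlement to its geocoded `(lat, lng)`; the real API’s data structures aren’t described here, so `average_by_location` and its output shape are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def average_by_location(ratings, locations):
    """Group per-speaker ratings by settlement and average them.

    ratings   -- iterable of (settlement, score) pairs, score in 1..5
    locations -- dict mapping settlement -> (lat, lng), e.g. from geocoding
    Returns a dict: settlement -> {"lat", "lng", "average", "count"}.
    """
    grouped = defaultdict(list)
    for settlement, score in ratings:
        grouped[settlement].append(score)

    markers = {}
    for settlement, scores in grouped.items():
        lat, lng = locations[settlement]
        markers[settlement] = {
            "lat": lat,
            "lng": lng,
            "average": mean(scores),  # shown in the pop-up and used for colour
            "count": len(scores),
        }
    return markers
```

Keeping the `count` alongside the average is a design choice: it lets a pop-up show how many speakers a dot represents, not just their mean score.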

I also updated the atlas interface so that it uses Leaflet ‘circleMarkers’ rather than just ‘circles’, as this allows the markers to stay the same size at all map zoom levels; previously they looked tiny when zoomed out but far too big when zoomed in.  I added a thin black stroke around the markers too, to make the lighter coloured circles stand out a bit more on the map.  Oh, I also changed the colour gradient to a more gradual ‘yellow to red’ approach, which works much better than the colours I was using before.  Another small tweak was to move the atlas’s zoom in and out buttons to the bottom right rather than the top left, as the ‘Atlas Display Options’ slide-out menu was obscuring them.  I never noticed, as I never use these buttons (I just zoom in and out with the mouse scrollwheel), but Gary pointed out that it was annoying to have them covered up.  I also prevented the map from resetting its location and zoom level every time a new search is performed, which makes it easier to compare search results.  And I prevented the scrollwheel from zooming in and out when the mouse is over the attribute drop-down list.  I haven’t yet figured out a way to make the scrollwheel actually scroll the drop-down list as it really ought to, though.
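A ‘yellow to red’ gradient of this kind can be produced by interpolating linearly between the two colours according to a location’s average rating.  The sketch below is an assumption about how such a scale could work, not the atlas’s actual code: `rating_colour` is a hypothetical helper that keeps the red channel at full strength and fades green from 255 (yellow, at a rating of 1) down to 0 (red, at a rating of 5).

```python
def rating_colour(average: float) -> str:
    """Map an average rating onto a linear yellow-to-red hex colour.

    Ratings are clamped to the 1..5 scale; 1 gives yellow (#ffff00),
    5 gives red (#ff0000), and values in between fade the green channel.
    """
    clamped = min(max(average, 1.0), 5.0)
    fraction = (clamped - 1.0) / 4.0        # 0.0 at rating 1, 1.0 at rating 5
    green = round(255 * (1.0 - fraction))   # red stays at 255, green fades out
    return f"#ff{green:02x}00"
```

The resulting hex string is the kind of value that could be passed as a Leaflet circleMarker’s `fillColor`, with a black `color` and a small `weight` giving the thin outline stroke.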

I made a few visual tweaks to the map pop-up boxes, such as linking to the actual questionnaires from the ‘location’ atlas view (this will be for staff only) and including the actual average rating in the attribute view pop-up so you don’t have to guess what it is from the marker colour.  Adding in links to the questionnaires involved reworking the API somewhat, but it’s worked out ok.

One of the biggest features I implemented this week was the ‘limit’ options.  These allow you to focus on just those locations where two or more speakers rated the attribute at 4 or 5 (currently called ‘Present’), or where all speakers at a location rated it at 1 or 2 (currently called ‘Absent’), or just those locations that don’t meet either of these criteria (currently called ‘Unclear’).  I also added in limits by age group.  I implemented these queries via the API, which meant that only minimal changes to the actual JavaScript of the atlas were required (although it did mean quite a large amount of logic had to be added to the API!).
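The three ‘limit’ categories can be expressed as a small classification rule applied to each location’s ratings for the selected attribute.  The Python below is a hedged sketch of that logic rather than the actual API code: `classify_location` is a hypothetical name, ratings are assumed to be integer `(score, age_group)` pairs, and the age-group limit is modelled simply as a filter applied before classifying.

```python
from typing import Iterable, Optional, Tuple


def classify_location(ratings: Iterable[Tuple[int, str]],
                      age_group: Optional[str] = None) -> Optional[str]:
    """Classify one location's ratings for one attribute.

    ratings   -- (score, age_group) pairs, one per speaker, score in 1..5
    age_group -- if given, only ratings from that age group are considered
    Returns 'Present', 'Absent', 'Unclear', or None if no ratings remain.
    """
    scores = [score for score, group in ratings
              if age_group is None or group == age_group]
    if not scores:
        return None
    # Present: two or more speakers rated the attribute at 4 or 5.
    if sum(1 for s in scores if s >= 4) >= 2:
        return "Present"
    # Absent: every speaker rated the attribute at 1 or 2.
    if all(s <= 2 for s in scores):
        return "Absent"
    # Unclear: neither of the above.
    return "Unclear"
```

Because a location where every speaker gave 1 or 2 cannot contain any 4 or 5 ratings, the ‘Present’ and ‘Absent’ rules never overlap, so checking them in this order is safe.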

The prototype is working very nicely so far.  What I’m going to try to do next week is allow for multiple attributes to be selected, with Boolean operators between them.  This might be rather tricky, but we’ll see.  I’ll finish off with a screenshot of the ‘attribute’ search, so you can compare how it looks now to the screenshot I posted last week: