Week Beginning 7th May 2018

Monday this week was the May Day holiday, so it was a four-day week for me. I divided my time primarily among REELS, Linguistic DNA and updates to the Historical Thesaurus timeline interface. For REELS I contacted Chris Fleet at the NLS about using one of their base maps in our map interface. I’d found one that was apparently free to use and I wanted to check we had the details and attribution right. Thankfully we did, and Chris very helpfully suggested another base map of theirs that we might be able to incorporate too. He also pointed me towards an amazing crowdsourced resource they have set up, which has gathered more than 2 million map labels from the OS six-inch-to-the-mile maps of 1888-1913 (see http://geo.nls.uk/maps/gb1900/). It’s very impressive.

I also tackled the issue of adding icons to the map for classification codes rather than just having coloured spots. This is something I’d had in mind from the very start of the project, but I wasn’t sure how feasible it would be to incorporate. I started off by trying to add in Font Awesome icons, which is pretty easy to do with a Leaflet plugin. However, I soon realised that Font Awesome just didn’t have the range of icons I required for things like ‘coastal’, ‘antiquity’, ‘ecclesiastical’ and the like. Instead I found a more useful set of icons: https://mapicons.mapsmarker.com/category/markers/. The icons are released under a Creative Commons licence and are free to use. Unfortunately they are PNG rather than SVG icons, so they won’t scale quite as nicely, but they don’t look too bad on an iPad’s ‘retina’ display, so I think they’ll do. I created custom markers for each icon and gave them additional styling with CSS. I updated the map legend to incorporate them as well, and I think they’re looking pretty good. It’s certainly easier to tell at a glance what each marker represents. Here’s a screenshot of how things currently look (but this of course still might change):

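For anyone curious about the mechanics, this is roughly what a custom PNG marker looks like in Leaflet. The icon file, coordinates and popup text below are made up for illustration rather than taken from the actual REELS code, and ‘map’ is assumed to be the Leaflet map instance.

```javascript
// Rough sketch of a custom PNG marker in Leaflet; the icon file, coordinates
// and popup text are illustrative, not the actual REELS code.
var ecclesiasticalIcon = L.icon({
    iconUrl: 'icons/ecclesiastical.png', // PNG from the Map Icons Collection
    iconSize: [32, 37],                  // dimensions of the icon in pixels
    iconAnchor: [16, 37],                // the point that sits on the place-name's location
    popupAnchor: [0, -37]                // where the popup opens relative to the anchor
});

L.marker([55.77, -2.35], { icon: ecclesiasticalIcon })
    .bindPopup('Example place-name record')
    .addTo(map);
```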
I also gave all of the regular coloured dots on the map a dark grey border, which helps them stand out a bit more, and I have updated the way marker colours are used for the ‘start date’ and ‘altitude’ maps. If you categorise the map by start date the marker colours now follow a fixed gradient, ranging from dark blue for 1000-1099 to red for after 1900 (the idea being that things in the distant past are ‘cold’ while more recent things are still ‘hot’). Hopefully this will make it easier to tell at a glance which names are older and which are more recent. Here’s an example:

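As a rough illustration of how the fixed gradient works in code: only the overall dark-blue-to-red range is taken from the actual map, while the intermediate colours and cut-off points below are made up.

```javascript
// Illustrative sketch of the 'start date' gradient: older names get 'cold'
// colours, more recent ones get 'hot' colours. Only the dark-blue to red
// range reflects the real map; the intermediate values are made up.
function startDateColour(year) {
    if (year < 1100) return '#08306b'; // 1000-1099: dark blue
    if (year < 1300) return '#2b8cbe'; // 1100-1299: mid blue
    if (year < 1500) return '#41ab5d'; // 1300-1499: green
    if (year < 1700) return '#fd8d3c'; // 1500-1699: orange
    if (year < 1900) return '#e31a1c'; // 1700-1899: red-orange
    return '#99000d';                  // after 1900: red
}
```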
For the ‘categorised by altitude’ view I made the fixed gradient use the standard way of representing altitude on maps, ranging from dark green for low altitude through browns to dark reds for high altitude, as this screenshot shows:

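The same approach applies for the altitude colours, along these lines (the thresholds and exact colour values here are illustrative only):

```javascript
// Illustrative sketch of the altitude gradient: dark green for low-lying
// ground, through browns to dark red for the highest ground.
function altitudeColour(metres) {
    if (metres < 50)  return '#00441b'; // dark green: low altitude
    if (metres < 150) return '#74c476'; // lighter green
    if (metres < 300) return '#8c6d31'; // brown
    return '#67000d';                   // dark red: high altitude
}
```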
From the above screenshots you can see that I’ve also updated the map legend so that the coloured areas match the map markers, and I added a scale to the map, with both metric and imperial units shown, which is what the team wanted (a one-line sketch of this is below). There are still some further changes to be made, such as updating the base maps, and I’ll continue with this next week.
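Adding the scale was straightforward, as Leaflet has a built-in scale control that handles both unit systems (this assumes the Leaflet map instance is called ‘map’):

```javascript
// Leaflet's built-in scale control with both metric and imperial units
// (assumes the Leaflet map instance is called 'map').
L.control.scale({ metric: true, imperial: true }).addTo(map);
```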

For Linguistic DNA and the Historical Thesaurus I met with Marc and Fraser on Wednesday morning to discuss updates. We agreed that I would return to working on the sparklines in the next few weeks, and I received a few further suggestions regarding the Historical Thesaurus timeline feature. Marc had noticed that if your cursor was over the timeline it wasn’t possible to scroll the page, even though a long timeline might go off the bottom of the screen. If you moved your cursor to the sides of the timeline graphic, scrolling worked normally. It turned out that the SVG image was grabbing all of the pointer events, so the HTML in the background never knew the scroll event was happening. Setting the SVG to ‘pointer-events: none’ in the CSS lets the scroll events cascade down to the HTML so scrolling can take place. However, this then stops the SVG from processing click events, meaning the tooltips break. Thankfully adding ‘pointer-events: all’ to the bars, spots and OE label fixes this, apart from one oddity: if your cursor is positioned over a bar, spot or the OE label and you try to scroll then nothing happens. This is a relatively minor thing, though. I also updated the timeline font so that it uses the font we use elsewhere on the site.
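Expressed as a small JavaScript sketch, the pointer-events fix amounts to the following (the container id and class names are stand-ins for whatever the timeline actually uses):

```javascript
// Sketch of the pointer-events fix; the selector names are hypothetical.
// The SVG as a whole ignores pointer events so the page behind it can scroll,
// while the interactive elements opt back in so clicks and tooltips still work.
var svg = document.querySelector('#timeline svg');
svg.style.pointerEvents = 'none';

svg.querySelectorAll('.bar, .spot, .oe-label').forEach(function (el) {
    el.style.pointerEvents = 'all';
});
```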

I also made the part of speech in the timeline heading lower-case to match the rest of the site, and I realised that the timeline wasn’t using the newer versions of the abbreviations we’d decided upon (e.g. ‘adj.’ rather than ‘aj.’), so I updated this and added in the tooltip too. Finally, I addressed another bug whereby very short timelines were getting cut off. I added extra height to the timeline when there are only a few rows, which stops this happening.
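The cut-off fix is essentially a minimum chart height, something along these lines (the numbers here are illustrative rather than the actual values used):

```javascript
// Illustrative sketch: enforce a minimum chart height so that timelines with
// only one or two rows aren't cut off. Row height and minimum are made up.
function timelineHeight(numRows) {
    var rowHeight = 30;
    var minHeight = 150;
    return Math.max(numRows * rowHeight, minHeight);
}
```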

I had a Skype meeting with Mike Pidd and his team at DHI about the EEBO frequency data for Linguistic DNA on Wednesday afternoon. We agreed that I would write a script that would output the frequency data for each Thematic Heading per decade as a series of CSV files that I would then send on to the team. We also discussed the Sparkline interface and the HT’s API a bit more, and I gave some further explanation as to how the sparklines work. After the meeting I started work on the export script, which does the following (a rough sketch of the output logic follows the list):

  1. It goes through every thematic heading down to the third level of the hierarchy.
  2. If the heading in question is a third-level one then all lexemes from any lower levels are added into the output for this level.
  3. Each CSV is given the heading number as its filename, but with dashes instead of colons, as colons are not valid characters in filenames.
  4. Columns 1 and 2 are the heading and title of the thematic heading. This is the same for every row in a file – i.e. words from lower down the hierarchy do not display their actual heading. E.g. words from ‘AA:03:e:01 Volcano’ will display ‘AA:03:e High/rising ground’.
  5. Column 3 contains the word and column 4 the part of speech.
  6. Column 5 contains the number of senses in the HT. I had considered excluding words that have zero senses in the HT as a means of cutting out a lot of noise from the data, but decided against this in the end, as it would also remove a lot of variant spellings and proper names, which might turn out to be useful at some point. It will be possible to filter the data to remove all zero-sense rows at a later date.
  7. The next 24 columns contain the data per decade, starting at 1470-1479 and ending with 1700-1709.
  8. The final column contains the total frequency count.
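Here is a rough sketch of that output logic. The data structures and field names are hypothetical (and the real script may well be written in a different language entirely), but it shows the intended column layout: heading number, heading title, word, part of speech, HT sense count, 24 decade columns and a total.

```javascript
// Rough Node.js sketch of the CSV layout described above. Field names and the
// data source are hypothetical; quoting of commas within titles is omitted.
const fs = require('fs');

// The 24 decade columns, 1470-1479 through to 1700-1709.
const decades = [];
for (let d = 1470; d <= 1700; d += 10) {
    decades.push(d);
}

function buildRow(heading, word) {
    const counts = decades.map(dec => word.frequencies[dec] || 0);
    const total = counts.reduce((sum, n) => sum + n, 0);
    return [heading.number, heading.title, word.form, word.pos, word.htSenses]
        .concat(counts, [total])
        .join(',');
}

// One file per heading, with dashes instead of colons in the filename.
function writeHeadingFile(heading, words) {
    const rows = words.map(word => buildRow(heading, word));
    fs.writeFileSync(heading.number.replace(/:/g, '-') + '.csv', rows.join('\n'));
}
```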

I started my script running on Thursday and left my PC on overnight to try to get the processing complete. I left it running when I went home at 5pm, expecting to find several hundred CSV files had been generated. Instead, Windows had automatically installed an update and restarted my PC at 5:30, thus cancelling the script, which was seriously annoying. It does not seem to be possible to stop Windows doing such things: although there are plenty of Google results about how to stop Windows automatically restarting when installing updates, Microsoft changes Windows so often that none of the listed methods I’ve looked at still work. It’s absolutely ridiculous, as it means running batch processes that might take a few days is basically impossible to do with any reliability on a Windows machine.

Moving on to other tasks I undertook this week: I sorted out payment for the annual Apple Developer subscription, which is necessary for our apps to continue to be listed on the App Store. I also responded to a couple of app-related queries from an external developer who is making an app for the University. Finally, after Marc asked me to look into it, I sorted out the retention period for user statistics in Google Analytics for all of the sites we host.