Week Beginning 4th February 2019

Everyone in the College of Arts had their emails migrated to a new system this week, so I had to spend a little bit of time getting all of my various devices working properly. Rather worryingly, the default Android mail client told me I couldn’t access my emails until I allowed outlook.office365.com to remotely control my device, which included giving permissions to erase all data from my phone, control screen locks and control cameras. It seemed like a lot of control to be giving a third party when this is my own personal device and all I want to do is read and send emails. After some investigation it would appear that the Outlook app for Android doesn’t require permission to erase all data or control the camera, just less horrible permissions involving setting password types and storage encryption. It’s only the default Android mail app that asks for the more horrible permissions. I therefore switched to using the Outlook app, although I realised that the default Android calendar app asks for the same permissions too, so I’ve had to switch to using the calendar in the Outlook app as well.

With that issue out of the way, I divided my time this week primarily between three projects. The first of these was SCOSYA. On Wednesday I met with E and Jennifer to discuss the ‘story atlas’ interface I’d created previously. Jennifer found the Voronoi cells rather hard to read: the cells are overlaid on the map, meaning the cell colour obscures features such as place-names and rivers, and the cells extend beyond the edges of the coastline, which makes it hard to see exactly what part of the country each cell corresponds to. Unfortunately the map and all its features (e.g. place-names, rivers) are served up together as tiles. It’s not possible to (for example) have the base map, then our own polygons, then place-names, rivers etc. on top; coloured polygons will always obscure the map underneath because they are always added on top of the base tiles. Voronoi diagrams automatically generate cells based on the proximity of points, and this doesn’t necessarily work so well with a coastline such as Scotland’s, with its countless islands and other features. Some cells extend across bodies of water and give the impression that features are found in areas where they wouldn’t necessarily be found. For example, North Berwick appears in the cell generated by Anstruther, on the other side of the Firth of Forth. We decided, therefore, to abandon Voronoi diagrams and instead make our own cells that would more accurately reflect our questionnaire locations. This does mean ‘hard coding’ the areas, but we decided this wasn’t too much of a problem as our questionnaire locations are all now in place and are fixed. It will mean that someone will have to manually trace out the coordinates for each cell, following the coastline and islands, which will take some time, but we reckoned the end result would be much easier to understand. I found a very handy online tool that can be used to trace polygons on a map and then download the shapes as GeoJSON files: https://geoman.io/studio. I also investigated whether it might be possible to export the polygons generated by my existing Voronoi diagram to use as a starting point, rather than having to generate the shapes manually from scratch.

I spent some time trying to extract the shapes, but I was unable to do so using the technologies used to generate the map, as the polygons are not geolocational shapes (i.e. with latitude / longitude pairs) but are instead SVG shapes with coordinates that relate to the screen, which then get recalculated and moved every time the underlying map moves. However, I then investigated alternative libraries and came across one called turf.js (http://turfjs.org/) that can generate Voronoi cells that are actual geolocational shapes. The Voronoi part of the library can be found here: https://github.com/Turfjs/turf-voronoi and although it is rather worryingly plastered with messages from four years ago saying ‘Under development’, ‘not ready for use!’ and ‘build failing’, I’ve managed to get it to work. By passing it our questionnaire locations as lat/lng coordinates I got it to output Voronoi polygons as a series of lat/lng coordinates. These can be uploaded to the mapping service linked to above, resulting in polygons like those shown in the following diagram:
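For anyone curious, the turf.js step boils down to something like the following sketch. It is written against the current combined @turf/turf package rather than the old turf-voronoi repository I actually used, and the locations and bounding box are invented placeholders rather than our real questionnaire data, so treat it as an illustration of the approach rather than the project’s actual code.

const turf = require('@turf/turf');

// Hypothetical questionnaire locations (the real list comes from our database)
const locations = [
  { name: 'Anstruther', lat: 56.222, lng: -2.702 },
  { name: 'North Berwick', lat: 56.058, lng: -2.718 },
  { name: 'Eyemouth', lat: 55.872, lng: -2.089 }
];

// Build a GeoJSON FeatureCollection of points; GeoJSON expects [lng, lat] order
const points = turf.featureCollection(
  locations.map(loc => turf.point([loc.lng, loc.lat], { name: loc.name }))
);

// A bounding box roughly covering Scotland: [minLng, minLat, maxLng, maxLat]
const bbox = [-8.5, 54.5, -0.5, 61.0];

// Generate Voronoi cells clipped to the bounding box. The result is a
// FeatureCollection of Polygon features whose coordinates are real lng/lat
// pairs, so it can be saved as a GeoJSON file and uploaded to geoman.io.
const cells = turf.voronoi(points, { bbox });

// Note that the 'name' properties on the input points are not carried over
// to the output polygons, which is why the place-names have to be added
// back in by hand (see below).
console.log(JSON.stringify(cells, null, 2));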

However, the Voronoi shapes generated by this library are not the same dimensions as those generated by the other library (see an earlier post for an image of this). They are a lot spikier somehow; I guess the turf.js Voronoi algorithm is rather different to the d3.js one. Also, each boundary between cells consists of separate lines belonging to each adjacent polygon rather than a single shared edge, meaning that when dragging a boundary you’ll have to drag two or possibly three or more lines to fully update the positions of each cell. Finally, despite my including the names of each location in the data passed into the turf.js Voronoi processor, these names have been ignored, meaning the polygon shapes have no place-name associated with them. There doesn’t seem to be a way of getting these added back in automatically, so at some point I’m going to have to manually add place-names (and unique IDs) to the data. This is going to be pretty horrible, but I would have had to do that with any manually created shapes too. It’s now over to other members of the team to tweak the polygons to get them to fit the coastline better.
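To give a sense of what that manual annotation will involve, here is a hypothetical hand-edited cell with a place-name and ID added to its properties; the coordinates are invented and each real cell will obviously have far more vertices.

const exampleCell = {
  type: 'Feature',
  // Manually added identifiers so the atlas can tie the cell to a location
  properties: { id: 1, name: 'Anstruther' },
  geometry: {
    type: 'Polygon',
    coordinates: [[
      [-2.75, 56.25],
      [-2.60, 56.25],
      [-2.60, 56.18],
      [-2.75, 56.18],
      [-2.75, 56.25] // the first and last positions must match to close the ring
    ]]
  }
};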

Also for SCOSYA this week, the project’s previous RA, Gary Thoms, got in touch to ask about generating views of the atlas for publication. He was concerned about copyright, about the resolution of the images, and about the fact that the publication would prefer images in greyscale rather than colour. I investigated each of these issues:

Regarding copyright: The map imagery we use is generated using the MapBox service. According to their terms of service (see the ‘static images for print’ section here: https://docs.mapbox.com/help/how-mapbox-works/static-maps/) we are allowed to use them in academic publications: “You may make static exports and prints for non-commercial purposes such as flyers, posters, or other short publications for academic, non-profit, or personal use.” I’m not sure what their definition of ‘short’ is, though. Attribution needs to be supplied (see https://docs.mapbox.com/help/how-mapbox-works/attribution/). Map data (roads, place-names etc.) comes from OpenStreetMap and is released under the Open Database License. This should also appear in the attribution.

Regarding resolution: The SCOSYA atlas maps are raster images rather than scalable vector images, so generating images at higher than screen resolution is going to be tricky. There’s not much we can do about it without generating maps in a desktop GIS package or some other such software. All the online map packages I’ve used (Google Maps, Leaflet, MapBox) use raster image tiles (e.g. PNG, JPEG) rather than vector images (e.g. SVG). The page linked to above states “With the Mapbox Static API, image exports can be up to 1,280 px x 1,280 px in size. While enabling retina may improve the quality of the image, you cannot export at a higher resolution using the Static API, and we do not support vector image formats.” And later on: “The following formats are not supported as a map export option and are not currently on our road map for integration: SVG, EPS, PDF”. The technologies we’re using were chosen to make an online, interactive atlas and I’m afraid they’re not ideally suited to producing static printed images. However, the ‘print map to A3 Portrait image’ option I added to the CMS version of the atlas several months ago does allow you to grab a map image that is larger than your screen. Positioning the map to get what you want is a bit hit-and-miss, and it can take a minute or so to process once you press the button, but it then generates an image of around 2440×3310 pixels, which might be good enough quality.
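For reference, a Static API request is just a URL with the style, centre point, dimensions and token baked in, which is where the 1,280px limit and the retina (‘@2x’) factor mentioned above come into play. The style ID and token below are placeholders, so this is only a sketch of the format rather than anything we’re actually using:

// Build a hypothetical Static API request (placeholders throughout)
const style = 'mapbox/streets-v11'; // any Mapbox style ID
const view = '-4.2,56.8,6';         // lng, lat, zoom
const size = '1280x1280@2x';        // maximum dimensions, with retina scaling
const url = 'https://api.mapbox.com/styles/v1/' + style + '/static/' + view + '/' + size
  + '?access_token=YOUR_MAPBOX_TOKEN';
console.log(url);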

Regarding greyscale images: I created an alternative version of the CMS atlas that uses a greyscale basemap and icons (see below for an example).  It is somewhat tricky to differentiate the shades of grey in the icons, though, so perhaps we’ll need to use different icon shapes as well.  I haven’t heard back from Gary yet, so will just need to see whether this is going to be good enough.
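In case it’s useful, swapping in a greyscale basemap is mostly just a matter of pointing the tile layer at a different Mapbox style. The sketch below assumes a Leaflet map and Mapbox’s own ‘light’ style, with a placeholder style ID and token; it doesn’t show the greyscale icons, which are handled separately in the atlas code.

// A minimal Leaflet map using a greyscale Mapbox style (placeholder values)
const map = L.map('map').setView([56.8, -4.2], 6);

L.tileLayer('https://api.mapbox.com/styles/v1/{id}/tiles/{z}/{x}/{y}?access_token={accessToken}', {
  id: 'mapbox/light-v10', // a greyscale-ish Mapbox style
  tileSize: 512,
  zoomOffset: -1,
  accessToken: 'YOUR_MAPBOX_TOKEN',
  attribution: '© Mapbox © OpenStreetMap contributors'
}).addTo(map);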

The next project I focussed on this week was the Historical Thesaurus, and the continuing task of linking up the HT and OED categories and lexemes. I updated one of the scripts I wrote last week so that the length of the subcat is compared rather than the actual subcat (so 01 and 02 now match, but 01 and 01.02 don’t). This has increased the matches from 110 to 209. I also needed to rewrite the script that outputted all of the matching lexemes in every matched category in the HT and OED datasets, as I’d realised that my previous script had silently failed to finish due to the size of its output – it just cut off somewhere, with no error given by Firefox. The same thing happened in Firefox again when I tried to generate a new output, and when I tried in Chrome it spent about half an hour processing things and then crashed. I’m not sure which browser comes out worse in this, but I’d have to say Firefox silently failing is probably worse, which pains me to say as Firefox is my browser of choice.
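The matching scripts themselves run server-side and aren’t reproduced here, but the revised rule amounts to something like the following sketch (in JavaScript for brevity, with invented function names), reading ‘length’ as the number of subcat levels; for the examples above this gives the same answer as comparing string lengths.

// Two subcats now match when they sit at the same depth, e.g. '01' and '02',
// whereas '01' and '01.02' do not because they are at different depths.
function subcatDepth(subcat) {
  return subcat.split('.').length;
}

function subcatsMatch(htSubcat, oedSubcat) {
  return subcatDepth(htSubcat) === subcatDepth(oedSubcat);
}

subcatsMatch('01', '02');    // true
subcatsMatch('01', '01.02'); // false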

Anyway, I have since split the output into three separate files – one each for ‘01’, ‘02’ and ‘03’ categories – and thankfully this has worked. There are a total of 223,182 categories in the three files, up from the 222,433 categories in the previous half-finished file. I have also changed the output so that OED lexemes that are marked as ‘revised’ in the database have a yellow [R] after them. This applies to both matched and unmatched lexemes, as I thought it might be useful to see both. I’ve also added a count of the number of revised forms that are matched and unmatched; these appear underneath the tables. It was adding this info underneath the tables that led me to realise the data had failed to fully display – although Firefox said the page had loaded, there was nothing displaying underneath the tables. So, for example, in the 114,872 ‘01’ matched categories there are 122,196 words that match and are revised, and 15,822 words that don’t match and are revised.

On Friday I met with Marc and Fraser to discuss the next steps for the linking process and I’ll be focussing on this for much of the next few weeks, all being well.  Also this week I finally managed to get my travel and accommodation for Bergamo booked.

The third main project I worked on this week was RNSN. For this project I updated our over-arching timeline to incorporate the new timeline I created last week and the major changes to an existing timeline. I also made a number of other edits to existing timelines. One of the project partners had been unable to access the timelines from her work: the timeline page was loading, but the Google Doc containing the data failed to load. It turned out that her work WiFi was blocking access to Google Docs, as when she checked via the mobile network the full timeline loaded without an issue. This got me thinking that hosting data for the timelines via Google Docs is probably a bad idea. The ‘storymap’ data is already stored in JSON files hosted on our own servers, but for the timelines I used the Google Docs approach as it was so easy to add and edit data. However, it does mean that we’re relying on a third-party service to publish our timelines (all other code for the timelines is hosted at Glasgow). If providers block access to spreadsheets hosted in Google Docs, or Google decides to remove free access to this data (as it recently did for Google Maps), then all our timelines will break. In addition, the data is currently tied to my Google account, meaning no-one else can access or edit it.

After a bit of investigation I discovered that you can instead store timeline data in locally hosted JSON files and read these into the timeline script in much the same way as a Google Doc. I therefore created a test timeline in the JSON format and everything worked perfectly. I have migrated two timelines to this format and will need to migrate the remainder in the coming weeks. It will be slightly time-consuming and may introduce errors, but I think it will be worth it.
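Assuming the timelines use Knight Lab’s TimelineJS (which is what the Google Doc workflow above suggests), the switch is just a matter of handing the constructor a JSON source instead of a spreadsheet URL. The container ID, file path and sample data below are placeholders:

// Point the timeline at a locally hosted JSON file instead of a Google Doc
const timeline = new TL.Timeline('timeline-embed', '/timelines/example-timeline.json');

// The JSON file itself holds the same information the spreadsheet did, e.g.:
const exampleData = {
  title: {
    text: { headline: 'Example timeline', text: 'Placeholder description.' }
  },
  events: [
    {
      start_date: { year: '1860' },
      text: { headline: 'Example event', text: 'Placeholder event text.' }
    }
  ]
};
// A JavaScript object like this can also be passed to the constructor directly
// in place of the file path.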

Also this week I made a couple of small tweaks to the Decadence and Translation transcription pages (including reordering the pages and updating notes and explanatory texts), upgraded WordPress to the latest version for all the sites I manage, and fixed the footer for the DSL WordPress site.