I probably spent the best part of a day on administrative tasks this week, including some relating to my role that I can’t really go into details about here. I also arranged to meet with Bryony Randall next week to discuss her text encoding project, and arranged a time for the Arts Developers to meet, which will be the week after next. I spoke with Gerry McKeever about a proposal he’s putting the finishing touches to and I helped Luca with a WordPress issue he wanted my advice with. I also had a chat with Graeme about a new proposal he’s helping to put together. I read through the materials he sent me and gave him some advice about technical approaches and costings. I also arranged a meeting with Catriona Macdonald next month regarding the People’s Voice project and helped Carole with a spam issue with one of her websites. I also spent about a further half a day on AHRC review duties, and will have to continue with this into next week too.
Other than the above I also spent a little bit of time on the Historical Thesaurus OED data import, ticking off another bunch of rows that Fraser had checked and speaking with Fraser about the next steps. I also launched this week’s Burns ‘song of the week’ (see http://burnsc21.glasgow.ac.uk/o-logan-sweetly-didst-thou-glide/). I had a hospital appointment on Friday morning so unfortunately lost a bit of time because of this. The remainder of my week was spent continuing with the migration of the ‘STARN’ resource to T4. As previously mentioned, it’s a pretty tedious task but it will be great to get it all done. I’m now about half-way through the final section: prose. This is a particularly long section, however, containing as it does a bunch of novels by Sir Walter Scott. It’s still going to be quite a while before I can get all of this finished as I am only doing a few pages here and there between other commitments.
My time this week was mostly spent on the SCOSYA Atlas again, continuing to work on the atlas search facilities, and specifically the Boolean ‘or’ search that has been proving rather tricky to get working properly. Last week I reworked my initial version of the ‘or’ search so that it would properly function when the same attribute with different limit options was selected, but during testing I realised that the search was behaving in unexpected ways when more than two attributes (or the same attribute with different limit options) were joined by an ‘or’: rather than having the expected range of icons representing the different combinations of attributes at each location on the atlas, many of the combinations were being given the same icon.
After quite a lot of head scratching I figured out what the problem was. The logic in the section of my code that loops through the locations and works out which attributes are present (or not present) at each had some flaws in it, which were causing combinations that should have had two ‘yeses’ in them to be incorrectly assigned as all ‘nos’. E.g. when three attributes are searched for and a location has the first two but not the third, the combination should be ‘YYN’ but instead the code was quitting out with an ‘NNN’. Once the problem was identified, a bit of tweaking and further testing corrected the issue, and now a much broader selection of icons gets displayed when three or more attributes are joined by an ‘or’, as the screenshot below demonstrates.
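As a rough illustration of the kind of check involved (this is a minimal sketch, not the actual atlas code), the combination for a location can be built by testing every searched-for attribute independently. The function name, the data shapes and the attribute code ‘B2’ are all assumptions made for the example.

```javascript
// Hypothetical helper: build a 'Y'/'N' string describing which of the
// searched-for attributes are present at a single location.
// `searched` is the ordered list of attribute codes in the 'or' search;
// `presentAtLocation` lists the codes actually found at the location.
function combinationKey(searched, presentAtLocation) {
  const present = new Set(presentAtLocation);
  // Test each attribute on its own and never stop early on a miss,
  // so a missing third attribute cannot wipe out two earlier 'Y's.
  return searched.map((code) => (present.has(code) ? 'Y' : 'N')).join('');
}

const searched = ['D3', 'A9', 'B2'];
console.log(combinationKey(searched, ['D3', 'A9'])); // YYN
console.log(combinationKey(searched, []));           // NNN
```

Building the whole string in one pass keeps each position independent of the others, which is the property that was missing when ‘YYN’ locations ended up as ‘NNN’.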
The next thing I tackled was the map legend. As you can see in the above screenshot, the legend that’s displayed on an ‘or’ search map bears no relation to the icons found on the map. The legend instead displays the average ratings (between 1 and 5) found at each location, which is appropriate for the ‘and’ search but not an ‘or’ search with all its different icons. Adding the various ‘or’ icons to the legend actually required rather a lot of reworking of the atlas code. The icons on the map are all grouped into layers and then each layer is added to the map. Each layer corresponds to an item in the legend. The ‘or’ map was still adding each location to a layer depending on its average rating rather than grouping locations based on which attribute combinations were present or absent. To have a legend that listed all the different icons and allowed the user to switch an icon on or off I needed to ensure that layers were set up for each of these attribute combinations.
It took quite a bit of reworking, but I managed to get the layer code updated so that a new layer was dynamically added for each present combination of attributes. So for example, with two attributes joined with an ‘or’ there could potentially be 4 layers (YN, YY, NY and NN), although in reality there may be fewer than this depending on the data. With this code in place I had a legend that listed the layers with their code combinations and the handy Leaflet checkboxes that allow you to show / hide each layer. This was all working great, but I then had to tackle my next problem: how to get the various shapes and colours of the icons represented in the legend.
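The grouping step can be sketched roughly as follows. This is plain JavaScript with no Leaflet dependency, and it assumes each location object already carries a precomputed `combo` string; the function and property names are hypothetical.

```javascript
// Group locations by attribute combination so that each group can become
// one map layer, and therefore one legend entry.
// Assumption: every location has a `combo` property such as 'YN'.
function groupByCombination(locations) {
  const layers = new Map();
  for (const location of locations) {
    if (!layers.has(location.combo)) {
      layers.set(location.combo, []); // create the layer the first time it is seen
    }
    layers.get(location.combo).push(location);
  }
  return layers;
}

const sample = [
  { name: 'Location 1', combo: 'YN' },
  { name: 'Location 2', combo: 'YY' },
  { name: 'Location 3', combo: 'YN' },
];
const layers = groupByCombination(sample);
console.log([...layers.keys()]); // [ 'YN', 'YY' ]
```

Only combinations that actually occur in the data get a layer, which is why a two-attribute search may produce fewer than four. In Leaflet, each group could then be wrapped in `L.layerGroup` and registered as an overlay with `L.control.layers` to get the show / hide checkboxes.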
The map icons I use are not actually ‘image’ files like PNGs or anything like that. They are actually markers that are dynamically generated as SVG images using the Leaflet DVF plugin. This works great as it allows me to create polygons or stars that have a random number of lines or points and can be assigned a random colour. However, replicating these dynamic shapes in the legend was not so easy. The Leaflet legend can only work with HTML by default so adding in SVG XML to make the shapes appear was not possible (or at least not possible without a massive amount of work). I spent some time trying to figure out how to get SVG shapes to appear in the legend but didn’t make much progress. I went to the gym at lunchtime and whilst jogging on the treadmill I had a brainwave: Could I not just make PNG versions of each possible shape, leaving the actual shape area transparent and then give the HTML image tag a background colour to match the required random colour?
My ‘random shape and colour’ generator only had a few possible shape options: Polygons with between 3 and 8 sides and stars with between 5 and 10 points. Any possible colour could then be applied to these. I created PNG files for each shape with the shape part transparent and the surrounding square white. I then updated the part of my code that generated the legend content to pull in the correct image (e.g. if the particular randomly generated marker was a polygon shape with three sides then it would reference the image file ‘poly-3.png’) and the randomly generated colour for the marker was then added to the HTML ‘img’ tag via CSS as a background colour. When displayed this then gave the effect of the shape being the background colour, with the white part of the PNG looking just like the white page background. It all worked very well, as the following screenshot demonstrates:
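To make the PNG trick concrete, here is a minimal sketch of how a legend entry could be assembled. The ‘poly-3.png’ naming comes from the description above; the ‘star-N.png’ pattern, the function name and the style object are assumptions for the sake of the example.

```javascript
// Build the legend markup for one marker style.
// Assumption: stars use 'star-N.png', mirroring the 'poly-N.png' pattern.
function legendEntry(style, label) {
  const prefix = style.shape === 'star' ? 'star' : 'poly';
  const file = `${prefix}-${style.points}.png`;
  // The PNG is white with a transparent shape cut out of it, so the CSS
  // background colour shows through and the shape appears in that colour.
  return `<img src="${file}" alt="" style="background-color: ${style.colour}"> ${label}`;
}

console.log(legendEntry({ shape: 'polygon', points: 3, colour: '#8e44ad' }, 'YN'));
// <img src="poly-3.png" alt="" style="background-color: #8e44ad"> YN
```

The resulting string is ordinary HTML, so it can be used wherever the legend expects an HTML label.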
There is still much work to do with the atlas, though. For a start there is still some weird behaviour when a search combines ‘and’ and ‘or’. I’ll tackle this another week, though.
Other than SCOSYA work I had a chat with Graeme about some Leaflet things he’s beginning to experiment with. I had a chat with Luca about his projects and I also emailed all the other developers in the College of Arts to see if they’d like to meet up some time. I also did some administrative work I can’t really divulge here and replied to a query from Marc about the Hansard data. I made this week’s Burns ‘Song of the Week’ live (http://burnsc21.glasgow.ac.uk/i-love-my-jean/) and helped Carole out with a couple of issues. Other than that I continued to migrate the old STARN resource to the University’s T4 system. It’s pretty tedious work but I’m making some good progress with it. I’ve got through the bulk of the ‘poetry’ section now, at least.
I continued working on the SCOSYA project this week, further refining the ‘or’ search that I spent much of last week working on. Gary had noted that the ‘or’ search wasn’t working as intended (i.e. different icons representing different combinations of attributes) when the same attribute was selected but with different limit options – e.g. Attribute D3 rated 1-3 by young speakers OR Attribute D3 rated 1-3 by old speakers. Unfortunately, this is because all of my code was written around each attribute in an ‘or’ search being different – if two selected attributes are the same the code for splitting things up and assigning different icons simply doesn’t trigger. To get the display to work for the same attribute required some fairly major reworking of the code, which took rather a long time to get working. By the end of the week I’d got something that was sort of working in place. Now the ‘or’ search checks every selected ‘limit’ option including the attribute selection to see whether the selected attribute is the same or not. This means the ‘or’ search also works for selecting different scores, ‘interviewed by’ options and other limit options in addition to the ‘age’ selection. However, I’m noticing some errors in the choice of icons when more than two attributes are chosen. Specifically, it would appear that different attribute combinations for locations are being assigned the same icon, which is quite clearly a bug, so this is going to need some further work next week I’m afraid. Below is a screenshot of an ‘or’ search for the same attribute with different combinations of limit options, so you can see that (for two selected items at least) the ‘or’ search is now working better than last week.
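One way to picture the comparison (a sketch only, with hypothetical property names rather than the real atlas data structures) is to build a signature for each selected item from its attribute code plus all of its limit options, and to treat two items as different whenever the signatures differ.

```javascript
// Build a signature from the attribute code plus every limit option.
// The property names are hypothetical, chosen for illustration.
function selectionKey(item) {
  return JSON.stringify([item.attribute, item.age, item.ratings, item.interviewedBy]);
}

// Two selections count as separate 'or' items if any part of the key differs.
function isSameSelection(a, b) {
  return selectionKey(a) === selectionKey(b);
}

const young = { attribute: 'D3', age: 'young', ratings: [1, 2, 3], interviewedBy: 'all' };
const old = { attribute: 'D3', age: 'old', ratings: [1, 2, 3], interviewedBy: 'all' };
console.log(isSameSelection(young, old));          // false
console.log(isSameSelection(young, { ...young })); // true
```

Because the attribute code is only one part of the key, the same attribute selected twice with different age, score or ‘interviewed by’ limits is handled in the same way as two different attributes.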
I continued with more AHRC review work for a day or so this week and I also spent about a day reworking an old resource that was in desperate need of attention. The Thomas Crawford’s Diary section of the Corpus of Modern Scottish Writing (http://www.scottishcorpus.ac.uk/thomascrawford/) is a WordPress-powered site that was set up a couple of years before I started in this post. The WordPress instance hasn’t been updated since the site launched in 2010 and is very out of date and almost certainly a security risk. Unfortunately, the software is so old that I can’t even upgrade it using the WordPress tools – I tried doing so before and the upgrade failed and the entire site broke. The site doesn’t really need to be a WordPress site, at least not now it’s launched and the day by day postings are well and truly over. Instead I’ve created a version that uses nothing more than a small amount of PHP scripting and stores all of the diary entries in a PHP array. I think the version I’ve created works a lot better than the existing version (navigating between diary entries is easier and the order they’re listed in makes more sense) and hopefully I’ll be able to replace the existing version with my new version soon. I’ve contacted Wendy so that she can approve things before I take down the old site. Hopefully the switchover will be able to take place in the next week or so and this ancient WordPress instance can be deleted.
I had two meetings with members of staff this week. The first was with Johanna Green, who now works in HATII but previously worked for the School of Critical Studies. Whilst she was still in SCS we submitted a Chancellor’s Fund proposal to develop a ‘web app’ based around an exhibition in Special Collections, and this was granted funding. We are now starting to think about developing the app and we met to discuss the options. Helpfully, Johanna had produced a series of PowerPoint-based mockups of how she would like the app to look and function. We went through these slides and we have a pretty good idea about how development should proceed. I’ve requested a subdomain for the site and once we have the space available, and Johanna has got back to me with some images and other content, I’ll start to develop an initial version of the app. Note that at this stage we are merely going to create a ‘web app’ – it will use purely client-side scripting and will be optimised for touchscreens but it won’t be ‘wrapped’ as an iOS or Android app. Instead it will be accessed via a web browser. However, I will ensure the code can be ‘wrapped’ at a later date if needs be.
My second meeting was with Hannah Tweed, who is in the process of submitting a proposal for funding for a project. I can’t really go into details about this here, but we met and discussed the data management aspects of her project and after the meeting I wrote a few paragraphs for her data management plan.
I also launched the second of our weekly Burns songs this week (see http://burnsc21.glasgow.ac.uk/contented-wi-little-c/) and I spent the remainder of the week continuing to work through the OED data import for the Historical Thesaurus with Fraser. I created a bunch of new scripts to process potential category matches that Fraser had identified. For example, where the HT has ‘something/something else’ and the OED has ‘something (something else)’ these categories aren’t being flagged as the same, so a little script to switch the formatting and then compare the category names helped to tick off a bunch of categories. At the start of the week we had 38,676 categories across the two datasets that were not marked as ‘checked’ and by the end of the week we had got this down to 23,595, which is pretty good going.
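The formatting switch for category names can be illustrated with a small sketch. This is not the script I actually ran; the function names are made up and the pattern only covers the single ‘something (something else)’ case described above.

```javascript
// Rewrite 'something (something else)' as 'something/something else' so the
// OED form can be compared directly with the HT form.
function toSlashForm(name) {
  const trimmed = name.trim();
  const match = trimmed.match(/^(.+?)\s*\((.+)\)$/);
  return match ? `${match[1]}/${match[2]}` : trimmed;
}

// Hypothetical comparison: an exact match once the formatting is aligned.
function sameCategory(htName, oedName) {
  return htName.trim() === toSlashForm(oedName);
}

console.log(toSlashForm('something (something else)'));
// something/something else
console.log(sameCategory('something/something else', 'something (something else)'));
// true
```

Names without a bracketed part pass through unchanged, so the same comparison can be run over every unchecked category without special-casing.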
I continued to work on the SCOSYA project this week, picking out a couple of big items from my ‘to do’ list and working through them. The first was adding in facilities to the Atlas attribute search to display only those ratings made in questionnaires where the interviewer was a fieldworker or a participant. I’ve added this option to the ‘consistency’ page as well. I used a drop-down list instead of the checkboxes that Gary initially suggested for this. This is because if there is a ‘participant’ and a ‘fieldworker’ checkbox it would be confusing as to what happens when both are deselected – ‘neither’ is not a valid option but it would appear to be one. In the atlas limit options the ‘interviewed by’ limit is available for each selected attribute. Implementing this meant I had to update the API to incorporate the new search options. There are now 7 possible arguments that can be passed in an ‘attribute’ search, including attribute IDs, Boolean join types, age limits, number of people, rating levels and whether spurious data is included or not. I updated the API documentation to take this into consideration too.
The second item I looked at was to update the Atlas display when multiple attributes are joined by ‘or’ in order to display a different marker shape / colour to represent each possible combination. Working out an algorithm that can generate every single ‘present / absent’ permutation for any number of selected attributes proved to be rather difficult and I spent rather a long time attempting to get something working.
Given a number of variables, I wanted to get back all possible permutations of them being true or false. For two or three variables it is really simple to work this out manually, but figuring out an algorithm that can do it automatically really stumped me. So, say you have two variables A and B that can each be either true or false: there would be four possible outcomes (both true, A true and B false, A false and B true, and both false).
It feels like it should be simple to write a script that can output that but I just couldn’t seem to get there and for once I was struggling to find anything on Stack Overflow that helped either. I found some information about calculating different permutations of an array, such as here: http://docstore.mik.ua/orelly/webprog/pcook/ch04_26.htm and here: http://stackoverflow.com/questions/18935813/efficiently-calculating-unique-permutations-in-a-set but this isn’t quite the same issue, as they are about switching the order of the elements rather than keeping the order the same but stating whether the elements are present or absent. I.e. the output of these scripts for an array containing A and B would be the two orderings ‘A, B’ and ‘B, A’.
Now when you perform a search joined with ‘or’ the script works out every possible combination and assigns a randomly selected colour and shape to each. For example, if you have two attributes D3 and A9 selected then there are 4 possible combinations:
Y, N = beige five pointed star
Y, Y = purple 5 sided polygon
N, Y = yellow 7 sided polygon
N, N = grey square (not present is always a grey square and ‘present but not meeting your criteria’ is always a grey circle)
When the script places the markers on the map it checks which attributes are present at that location, matches it to one of the above and displays the appropriately shaped and coloured marker. This means you can tell at a glance where different combinations of attributes are located. Note that this doesn’t take into consideration the rating level – i.e. you can’t tell from looking at the map where high or low scores are. Here’s a screenshot of a new ‘or’ search in action:
As the colours and shapes are randomly assigned whenever you perform a search, if you don’t like the ones currently displayed you can just press the ‘show’ button again and different ones will load in. Note also that this search works best when you supply some limit options. Without them, for a lot of attributes an ‘or’ search tends to just show every location as having both.
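For anyone facing the same ‘present / absent’ problem, one common approach is to count in binary: with n attributes there are 2^n combinations, and the binary digits of each number from 0 to 2^n - 1 say which attributes are present. The sketch below shows that technique along with a random style assignment. It is an illustration under assumptions rather than the code running in the atlas: the function names are invented, the shape ranges are illustrative, and the order of the output differs from the list above.

```javascript
// Generate every present ('Y') / absent ('N') combination for n attributes
// by counting from 0 to 2^n - 1 and reading off each number's binary digits.
function allCombinations(n) {
  const combos = [];
  for (let i = 0; i < 2 ** n; i++) {
    // padStart keeps the leading zeros, e.g. 1 becomes '01' when n is 2
    const bits = i.toString(2).padStart(n, '0');
    combos.push(bits.replace(/0/g, 'N').replace(/1/g, 'Y'));
  }
  return combos;
}

// Pick a random shape and colour (the ranges are illustrative assumptions).
function randomStyle() {
  const isStar = Math.random() < 0.5;
  const min = isStar ? 5 : 3;
  const max = isStar ? 10 : 8;
  const points = min + Math.floor(Math.random() * (max - min + 1));
  const colour = '#' + Math.floor(Math.random() * 0x1000000).toString(16).padStart(6, '0');
  return { shape: isStar ? 'star' : 'polygon', points, colour };
}

// The all-'N' combination is always a grey square; the rest are random.
function assignStyles(combos) {
  const styles = {};
  for (const combo of combos) {
    styles[combo] = /^N+$/.test(combo) ? { shape: 'square', colour: 'grey' } : randomStyle();
  }
  return styles;
}

const combos = allCombinations(2);
console.log(combos); // [ 'NN', 'NY', 'YN', 'YY' ]
console.log(assignStyles(combos));
```

Calling `assignStyles` again gives a fresh set of random shapes and colours, which mirrors pressing ‘show’ a second time.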
I’ve still not fully completed work on this as I still need to update the map legend. When I’ve done so the different colours and shapes and what they represent will be displayed in the legend and you’ll be able to show / hide one or more marker types using it (it might take some time to implement this, though). I also need to look into an issue with different colours / shapes not appearing when the same attribute with different limits is selected multiple times joined by ‘OR’.
Other than SCOSYA work I spent some time on AHRC review duties and also gave some feedback to Graeme about an AHRC Technical Plan he’s working on. I also responded to a researcher who was looking into ways of extracting text from eBook files, had a chat with Carole about some REELS related issues and responded to a query from Marc about new OED works in the Historical Thesaurus. I also had an email conversation with Hannah Tweed about a proposal she’s putting together (we’re going to meet to discuss it next week) and looked through some materials Johanna Green sent me relating to an exhibition website that we’re going to work on together (this was all arranged before she moved from English Language to HATII). I’ll be meeting with her next week too. I also updated all of my WordPress sites to version 4.7.2 and created a new ‘Song of the Week’ feature for the Burns website. For the next 10 weeks we’re going to be publishing a new song every Wednesday. You can find more information here: http://burnsc21.glasgow.ac.uk/