Week Beginning 8th May 2017

On Monday this week I had a Skype meeting with the HRI people in Sheffield (recently renamed the Digital Humanities Institute) about the Linguistic DNA project.  I demonstrated the sparklines I’ve been working on, showed them the API, and talked about the heatmap that I will also be developing, as well as the possibility of using the ‘highcharts’ view that I am hoping to offer in addition to the sparkline view.  Mike and Matt talked about the ‘workbench’ that they are going to create for the project, which will allow researchers to visualise the data.  They’re going to be creating a requirements document for this soon, and as part of this they will look at our API and visualisations and work out how these might also be incorporated, and whether further options need to be added to our API.

I was asked to review a paper this week so I spent a bit of time reading through it and writing a review.  I also set up an account on the Burns website for Craig Lamont, who will be working on this in future, and responded to a query about the SciFiMedHums bibliography database.  I also had to fix a few issues with some websites following their migration to a new server, and had to get some domain details to Chris to allow him to migrate some other sites.

I also spent a few hours getting back into the digital edition I’m making for Bryony Randall’s ‘New Modernist Editing’ project.  The last feature I need to add to my digital edition is editorial corrections.  I needed to mark up all of the typos and other errors in the original text and record the ‘corrected’ versions that Bryony had supplied me with.  I also then needed to update the website to allow users to switch between one view and the other using the ‘Edition Settings’ feature.  I used the TEI <choice> element with a <sic> tag for the original ‘erroneous’ text and a <corr> tag for the ‘corrected’ text.  This is the standard TEI way of handling such things and it works rather well.  I updated the section of jQuery that processes the XML and transforms it into HTML.  When the ‘Edition Settings’ has ‘sic’ turned on and ‘corr’ turned off, the original typo-filled text is displayed.  When ‘corr’ is turned on and ‘sic’ is turned off you get the edited text, and when both ‘sic’ and ‘corr’ are turned on the ‘sic’ text is given a red border while the ‘corr’ text is given a green border, so the user can see exactly where changes have been made and what text has been altered.  I think it works rather nicely.  See the following screenshot for an example.  I have so far only added in the markup for the first two pages but I hope to get the remaining four done next week.
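For anyone unfamiliar with the markup, the pattern looks something like this (the words here are invented for illustration rather than taken from the story):

```xml
<!-- Standard TEI: the original (erroneous) reading goes in <sic>,
     the corrected reading in <corr>, and both are wrapped in <choice>.
     The text itself is an invented example. -->
<p>She walked down <choice>
    <sic>teh</sic>
    <corr>the</corr>
  </choice> lane.</p>
```

The nice thing about this approach is that the display decision is left entirely to the processing layer, which is exactly what the ‘Edition Settings’ feature needs.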

For the rest of the week I focussed on the sparkline visualisations for the LDNA project.  Last week I created an API for the Historical Thesaurus that will allow the visualisations (or indeed anyone’s code) to pass query strings to it and receive JSON or CSV formatted data in return.  This week I created new versions of the sparkline visualisations that connected to this API.  I also had to update my ‘period cache’ data to include ‘minimum mode’ for each possible period, in addition to ‘minimum frequency of mode’.  This took quite a while to process as the script needed to generate the mode for every single possible combination of start and end decade over a thousand years.  It took a few hours to process but once it had completed I could update the ‘cache’ table in the online version of the database and update the sparkline search form so that it would pull in and display these.
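As a rough sketch of what the cache script works out for each period (the real script is PHP; the function here is mine, for illustration only): the mode is the most frequently occurring decade size, and the ‘frequency of mode’ is how many decades have that size.

```javascript
// Illustrative sketch of the mode calculation behind the period cache.
// Names are invented; the real implementation is a PHP script.
function modeAndFrequency(decadeCounts) {
	var tallies = {};
	decadeCounts.forEach(function(n) {
		tallies[n] = (tallies[n] || 0) + 1;
	});
	var mode = null, freq = 0;
	Object.keys(tallies).forEach(function(n) {
		if (tallies[n] > freq) {
			freq = tallies[n];
			mode = Number(n);
		}
	});
	return { mode: mode, frequency: freq };
}

// e.g. word counts per decade for one category over a short period:
var result = modeAndFrequency([12, 15, 15, 18, 15, 12]);
// result.mode is 15 and result.frequency is 3
```

Running something along these lines for every start/end decade pair is what made the cache generation take a few hours.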

I also started to make some further changes to the sparkline search form.  I updated the search type boxes so that the correct one is highlighted as soon as the user clicks anywhere in the section, rather than having to actually click on the radio button within the section.  This makes the form a lot easier to use, as previously it was possible to fill in some details for ‘peak’, for example, but then forget to click on the ‘peak’ radio button, meaning that a ‘peak’ search didn’t run.  I also updated the period selectors so that instead of using two drop-down lists, one for the ‘start’ decade and one for the ‘end’ decade, there is now a jQuery UI slider that allows a range to be selected.  I think this is nicer to use, although I have also started to employ sliders in other parts of the form too, and I worry that it’s just too many sliders.  I might have to rethink this.  It’s also possibly slightly confusing that updating the ‘period’ slider then updates the extent of the ‘peak decade’ slider, while the size of this slider remains the same as the full period range slider.  So we have one slider where the ends represent the 1010s and 2000s, but if you select the period 1200s to 1590s within this then the full extent of the ‘peak decade’ slider is 1200s to 1590s, even though it takes up the same width as the other slider.  See the screenshot below.  I’m thinking that this is just going to be too many sliders.

I also updated the search results page.  For ‘plateau’ searches I updated the sparklines so that the plateaus rather than peaks were highlighted with red spots.  I also increased the space devoted to each sparkline and added a border between each to make it easier to tell which text refers to which line.  I also added in an ‘info’ box that when clicked on gives you some statistics about the lines, such as the number of decades represented, the size of the largest decade and things like that.  I also added in a facility to download the CSV data for the sparklines you’re looking at.

I’ll continue with this next week.  What I really need to do is spend quite a bit of time testing out the search facilities to ensure they return the correct data.  I’m noticing some kind of quirk with the info box pop-ups, for example, which sometimes seem to display incorrect values for the lines.  There are also some issues relating to searches that do not cover the full period that I need to investigate.  And then after that I need to think about the heatmap, and possibly using HighCharts as an alternative to the D3 sparklines I’m currently using.  See the screenshot above for an example of the sparklines as they currently stand.



Week Beginning 17th April 2017

This was my first week back after the Easter holidays and was in fact a four-day week due to Monday being Easter Monday.  Despite being only four days long it was a pretty hectic week.  I had to travel to Edinburgh for a project meeting on Tuesday for Jane Stuart-Smith’s SPADE (SPeech Across Dialects of English) project, which will be officially starting in the next few weeks (or whenever the funding actually comes through).  This is a fairly large project involving partners in Canada and the US and our meeting was the first opportunity for members of the project to meet in person since the notification of award was given.  It was a useful meeting and we discussed what the initial stages for each project partner would be, how data would be collected and other such matters.  I’m not hugely involved in the project (4.5% of my time over 3 years) and will mainly be spending that time developing the project website, which will feature a map-based interface to summary information about the data the project will be dealing with.  The main focus of the project is the creation of an ‘integrated speech corpus analysis’ tool, which is being undertaken at McGill University in Canada, and it was interesting to learn more about this software.  I spent the bulk of Tuesday preparing for the meeting, travelling and attending the meeting.

On Thursday and Friday this week I attended two workshops that Bryony Randall had organised as part of her ‘New Modernist Editing’ AHRC Network project.  The first was for postgraduates who wanted to learn more about transcription and annotation, with specific emphasis on Modernist texts.  I was leading a two-hour hands-on session on TEI, XML and Transcription as part of the event, so I spent quite a bit of time preparing for this.  I’d started putting some materials together before my holiday and indeed I’d worked on the materials during my holiday too, but I still had quite a lot of preparation to do in the run-up to the event.  Thankfully the session went pretty well.  TEI and XML can be rather intimidating, especially for people with no previous experience of such matters, and I was really hoping to put a session together that managed to cover the basics without putting people off.  I think I managed to achieve this and by the end of the session all of the participants had managed to get a taste of TEI transcription.

The event on Friday was one of the main Network events for the project and as part of this I had a half-hour session where I was to demonstrate the digital edition I had created for a Virginia Woolf short story (see previous posts for lots more information about this).  I think the demonstration went ok, but I managed to mess things up at the start by having the wrong edition settings set up, which wasn’t so good.  I also fear that I went into too much technical detail for an audience that was not especially technically minded.  In fact at least some of them were rather against having digital editions at all.

I didn’t have time to do much other work this week, other than to catch up with emails and things like that.  I did do some further work during my holiday, however.  Firstly I had to respond to AHRC review feedback for Murray Pittock and secondly I had to give feedback on a technical plan that Meg MacDonald had asked for some help with.

With my speaking at events now complete for the foreseeable future I will be able to return to more usual work next week.  I have a lot still to do for the Historical Thesaurus visualisations for the Linguistic DNA project and a number of items still to sort out for the SCOSYA project, for a start.

Week Beginning 3rd April 2017

I split my time this week pretty evenly between two projects, the Historical Thesaurus visualisations for the Linguistic DNA project and the New Modernist Editing project.  For the Historical Thesaurus visualisations I continued with the new filter options I had started last week that would allow users to view sparklines for thematic categories that exhibited particular features, such as peaks and plateaus (as well as just viewing sparklines for all categories).  It’s been some pretty complicated work getting this operational as the filters often required a lot of calculations to be made on the fly in my PHP scripts, for example calculating standard deviations for categories within specified periods.

I corresponded with Fraser about various options throughout the week.  I have set up the filter page so that when a particular period is selected by means of select boxes, a variety of minimum and maximum values throughout the page are updated by means of an AJAX call.  For example if you select the period 1500-2000 then the minimum and maximum standard deviation values for this period are calculated and displayed, so you get an idea of what values you should be entering.  See the screenshot below for an idea of how this works:

Eventually I will replace these ‘min’ and ‘max’ values and a textbox for supplying a value with a much nicer slider widget, but that’s still to be done.  I also need to write a script that will cache the values generated by the AJAX call as currently the processing is far too slow.  I’ll write a script that will generate the values for every possible period selection and will store these in a database table, which should mean their retrieval will be pretty much instantaneous rather than taking several seconds as it currently does.
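The caching script is essentially a nested loop over every possible start and end decade, something like this minimal sketch (the real script is PHP and writes to a database table rather than an in-memory object; names here are invented):

```javascript
// Precompute a statistic for every possible period selection so that
// later lookups are instant. "cache" maps "start-end" keys to a value.
function buildPeriodCache(decades, statFn) {
	var cache = {};
	for (var i = 0; i < decades.length; i++) {
		for (var j = i; j < decades.length; j++) {
			cache[decades[i] + "-" + decades[j]] = statFn(decades[i], decades[j]);
		}
	}
	return cache;
}

// e.g. three decades give six possible period selections:
var cache = buildPeriodCache([1010, 1020, 1030], function(s, e) {
	return (e - s) / 10 + 1; // number of decades in the period
});
// cache["1010-1030"] is 3
```

With roughly a hundred decades there are several thousand possible periods, which is why generating the real cache takes a while but makes retrieval trivial afterwards.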

As you can see from the above screenshot, the user has to choose which type of filter they wish to apply by selecting a radio button.  This is to keep things simple, but we might change this in future.  During the week we decided to remove the ‘peak decade’ option as it seemed unnecessary to include this when the user could already select a period.

After removing the additional ‘peak decade’ selector I realised that it was actually needed.  The initial ‘select the period you’re interested in’ selector specifies the total range you’re interested in and therefore sets the start and end points of each sparkline’s x axis.  The ‘peak decade’ selector allows you to specify when in this period a peak must occur for the category to be returned.  So we need the two selectors to allow you to do something like “I want the sparklines to be from 1010s to 2000s and the peak decade to be between 1600 and 1700”.

Section 4 of the form has 5 possible options.  Currently only ‘All’, ‘peak’ and ‘plateau’ are operational and I’ll need to return to this after my Easter holiday.  ‘All’ brings back sparklines for your selected period and a selected average category size and / or minimum size of largest category.

‘Peaks’ allows you to specify a period (a subset of your initial period selection) within which a category must have its largest value for it to be returned.  You can also select a minimum percentage difference between the largest value and the end value.  The difference is a negative value, and if you select -50 the filter will currently only bring back categories where the percentage difference is between -50 and 0.  I was uncertain whether this was right and Fraser confirmed that it should be between -50 and -100 instead, so that’s another thing I’ll need to fix.
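The corrected filter logic boils down to something like this (a sketch with invented names; the real code is PHP):

```javascript
// Percentage difference between the largest decade and the final decade.
// A value of -50 means the category ends at half its peak size.
function percentDiff(largest, end) {
	return ((end - largest) / largest) * 100;
}

// Corrected behaviour: keep categories whose fall is -50 or steeper,
// i.e. the difference lies between -50 and -100.
function passesFallFilter(largest, end, minDiff) {
	return percentDiff(largest, end) <= minDiff;
}

// e.g. a category that peaks at 200 words and ends at 80:
// percentDiff(200, 80) is -60, so it passes a -50 filter
```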

You can also select a minimum standard deviation.  This is calculated based on your initial period selection.  E.g. if you say you’re interested in the period 1400-1600 then the standard deviation is calculated for each category based on the values for this period alone.
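The calculation is just the standard deviation over the decade values within the chosen period; a minimal sketch (whether the real script uses the population or sample formula is an assumption on my part — this uses the population version):

```javascript
// Standard deviation of a category's decade sizes within a chosen period.
function stdDev(values) {
	var mean = values.reduce(function(a, b) { return a + b; }, 0) / values.length;
	var variance = values.reduce(function(sum, v) {
		return sum + (v - mean) * (v - mean);
	}, 0) / values.length;
	return Math.sqrt(variance);
}

// A flat category has deviation 0, a spiky one a larger value:
// stdDev([10, 10, 10]) is 0
```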

‘Plateaus’ is something that needs some further tweaking.  The option is currently greyed out until you select a date range that is from 1500 or later.  Currently you can specify a minimum mode, and the script works out the mode for each category for the selected period and if the mode is less than the supplied minimum mode the category is not returned.  I think that specifying the minimum number of times the mode occurs would be a better indicator and will need to implement this.

You can also specify the ‘minimum frequency of 5% either way from mode’.  For your specified period this currently works out 5% under the mode as the largest number of words in a decade multiplied by 0.05, subtracted from the mode; 5% over is the same amount added onto the mode.  E.g. if the mode is 261 and the largest is 284 then 5% under is 246.8 and 5% over is 275.2.

For each category in your selected period the script counts the number of times the number of words in a decade falls within this range.  If the tally for a category is less than the supplied ‘minimum frequency of 5% either way from mode’ then the category is removed from the results.
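Putting the worked example from above into code (a sketch; the real implementation is PHP):

```javascript
// The 5% band either side of the mode is sized from the largest decade.
function modeBand(mode, largest) {
	var margin = largest * 0.05;
	return { under: mode - margin, over: mode + margin };
}

// Count how many decades fall inside the band; if this tally is below
// the supplied minimum frequency, the category is filtered out.
function frequencyInBand(decadeCounts, band) {
	return decadeCounts.filter(function(n) {
		return n >= band.under && n <= band.over;
	}).length;
}

// e.g. with a mode of 261 and a largest decade of 284:
var band = modeBand(261, 284);
// band.under is roughly 246.8 and band.over roughly 275.2
```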

I’ve updated the display of results to include some figures about each category as well as the sparkline and the ‘gloss’.  Information about the largest decade, the average decade, the mode, the standard deviation, the frequency of decades that are within 5% under / over the mode and what this 5% under / over range is are displayed when relevant to your search.

There are some issues with the above that I still need to address.  Sometimes the sparklines are not displaying the red dots representing the largest categories, and this is definitely a bug.  Another serious bug also exists in that when some combinations of options are selected PHP encounters an error and the script terminates.  This seems to be dependent on the data and as PHP errors are turned off on the server I can’t see what the problem is.  I’m guessing it’s a divide by zero error or something like that.  I hope to get to the bottom of this soon.

For the New Modernist Editing project I spent a lot of my time preparing materials for the upcoming workshop.  I’m going to be running a two-hour lab on transcription, TEI and XML for post-graduates and I also have a further half-hour session another day where I will be demonstrating the digital edition I’ve been working on.  It’s taken some time to prepare these materials but I feel I’ve made a good start on them now.

I also met with Bryony this week to discuss the upcoming workshop and also to discuss the digital edition as it currently stands.  It was a useful meeting and after it I made a few minor tweaks to the website and a couple of fixes to the XML transcription.  I still need to add in the edited version of the text but I’m afraid that is going to have to wait until after the workshop takes place.

I also spent a little bit of time on other projects, such as reading through the proposal documentation for Jane Stuart-Smith’s new project that I am going to be involved with, and publishing the final ‘song of the week’ for the Burns project (See http://burnsc21.glasgow.ac.uk/my-wifes-a-wanton-wee-thing/).  I also spoke to Alison Wiggins about some good news she had received regarding a funding application.  I can’t say much more about it just now, though.

Also this week I fixed an issue with the SCOSYA Atlas for Gary.  The Atlas suddenly stopped displaying any content, which was a bit strange as I hadn’t made any changes to it around the time it appears to have broken.  A bit of investigation uncovered the source of the problem.  Questionnaire participants are split into two age groups – ‘young’ and ‘old’.  This is based on the participant’s age.  However, two questionnaires had been uploaded for participants whose ages did not quite fit into our ‘young’ and ‘old’ categories, and they were therefore being given a null age group.  The Atlas didn’t like this and stopped working when it encountered data for these participants.  I have now updated the script to ensure participants fall within one of our age groups, and I’ve also updated things so that if any other people don’t fit in, the whole thing doesn’t come crashing down.
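The fix amounts to classifying each participant and skipping anyone who falls outside both groups rather than crashing; a sketch (the boundary ages here are invented and may not match the project’s actual cut-offs):

```javascript
// Assign a participant to an age group, or null if they fit neither.
// The boundary ages are illustrative; the real cut-offs may differ.
function ageGroup(age) {
	if (age >= 18 && age <= 25) return "young";
	if (age >= 60) return "old";
	return null;
}

// The atlas can then skip unclassified participants instead of failing:
function usableParticipants(participants) {
	return participants.filter(function(p) {
		return ageGroup(p.age) !== null;
	});
}
```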

I’m going to be on holiday all next week and will be back at work the Tuesday after that (as the Monday is Easter Monday).

Week Beginning 27th March 2017

I spent about a day this week continuing to tweak the digital edition system I’m creating for the ‘New Modernist Editing’ project.  My first task was to try and get my system working in Internet Explorer, as my current way of doing things produced nothing more than a blank section of the page when using this browser.  Even though IE is now obsolete it’s still used by a lot of people and I wanted to get to the bottom of the issue.  The problem was that jQuery’s find() function when executed in IE won’t parse an XMLDocument object.  I was loading in my XML file using jQuery’s ‘get’ method, e.g.:

$.get("xml/ode.xml", function( xmlFile ) {

	//do stuff with the XML file here

});
After doing some reading about XML files in jQuery it looked like you had to run a file through parseXML() in order to work with it (see http://api.jquery.com/jQuery.parseXML/) but when I did this after the ‘get’ I just got errors.  It turns out that the ‘get’ method automatically checks the file it’s getting, and if it’s an XML file it automatically runs it through parseXML() behind the scenes, so the text file is already an XMLDocument object by the time you get to play with it.

Information on this page (http://stackoverflow.com/questions/4998324/jquery-find-and-xml-does-not-work-in-ie) suggested an alternative way to load the XML file so that it could be read in IE, but I realised that in order to get this to work I’d need to get the plain text file rather than the XMLDocument object that jQuery had created.  I therefore used the ‘ajax’ method rather than the shorthand ‘get’ method, which allowed me to specify that the returned data was to be treated as plain text and not XML:


$.ajax({
	url: "xml/ode.xml",
	dataType: "text"
}).done(function(xmlFile){

	//do stuff with the XML file here

});


This meant that jQuery didn’t automatically convert the text into an XMLDocument object and I was intending to then manually call the parseXML method for non-IE browsers and do separate things just for IE.  But rather unexpectedly jQuery’s find() function and all other DOM traversal methods just worked with the plain text, in all browsers including IE!  I’m not really sure why this is, or why jQuery even needs to bother converting XML into an XMLDocument Object if it can just work with it as plain text.  But as it appears to just work I’m not complaining.

To sum up:  to use jQuery’s find() method on an XML file in IE (well, all browsers) ensure you pass plain text to the object and not an XMLDocument object.

With this issue out of the way I set to work on adding some further features to the system.  I’ve integrated editorial notes with the transcription view, using the very handy jQuery plugin Tooltipster (http://iamceege.github.io/tooltipster/).  Words or phrases that have associated notes appear with a dashed line under them and you can click on the word to view the note and click anywhere to hide the note again.  I decided to have notes appearing on click rather than on hover because I find hovering notes a bit annoying and clicking (or tapping) works better on touchscreens too.  The following screenshot shows how the notes work:

I’ve also added in an initial version of the ‘Edition Settings’ feature.  This allows the user to decide how they would like the transcription to be laid out.  If you press on the ‘Edition Settings’ button this opens a popup (well, a jQuery UI modal dialog box, to be precise) through which you can select or deselect a number of options, such as visible line breaks, whether notes are present or not etc.  Once you press the ‘save’ button your settings are remembered as you browse between pages (but resets if you close your browser or navigate somewhere else).  We’ll eventually use this feature to add in alternatively edited views of the text as well – e.g. one that corrects all of the typos. The screenshot below shows the ‘popup’ in action:

I spent about a day on AHRC duties this week and did a few other miscellaneous tasks, such as making the penultimate Burns ‘new song of the week’ live (see http://burnsc21.glasgow.ac.uk/when-oer-the-hill-the-eastern-star/) and giving some advice to Wendy Anderson about OCR software for one of her post-grad students.  I had a chat with Kirsteen McCue about a new project she is leading that’s starting up over the summer and I’ll need to give some input into.  I also made a couple of tweaks to the content management system for ‘The People’s Voice’ project following on from our meeting last week.  Firstly, I added a new field called ‘sound file’ to the poem table.  This can be used to add in the URL of a sound file for the poem.  I updated the ‘browse poems’ table to include a Y/N field for whether there is a sound file present, so that the project team can order the table by that column and easily find all of the poems that have sound files.  The second update I made was to the ‘edit’ pages for a person, publication or library.  These now list the poems that the selected item is associated with.  For people there are two lists, one for people associated as authors and another for people who feature in the poems.  For libraries there are two lists, one for associated poems and another for associated publications.  Items in the lists are links that take you to the ‘edit’ page for the listed poem / publication.  Hopefully this will make it easier for the team to keep track of which items are associated with which poems.

I also met with Gary this week to discuss the new ‘My Map Data’ feature I implemented last week for the SCOSYA project.  It turns out that display of uploaded user data isn’t working in the Safari browser that Gary tends to use, so he had been unable to see how the feature works.  I’m going to have to investigate this issue but haven’t done so yet.  It’s a bit of a strange one as the data all uploads fine – it’s there in the database and is spat out in a suitable manner by the API, but for some reason Safari just won’t stick the data on the map.  Hopefully it will be a simple bug to fix.  Gary was able to use the feature by switching to Chrome and is now trying it out and will let me know of any issues he encounters.  He did encounter one issue in that the atlas display is dependent on the order of the locations when grouping ratings into averages.  The file he uploaded had locations spread across the file and this meant there were several spots for certain locations, each with different average rating colours.  A simple reordering of his spreadsheet fixed this, but it may be something I need to ensure gets sorted programmatically in future.
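One way to sort this programmatically would be to group ratings by location before averaging, so row order in the file no longer matters; a minimal sketch (names are mine, not the project’s):

```javascript
// Average ratings per location, independent of row order in the file.
function averageByLocation(rows) {
	var groups = {};
	rows.forEach(function(row) {
		if (!groups[row.location]) groups[row.location] = [];
		groups[row.location].push(row.rating);
	});
	var averages = {};
	Object.keys(groups).forEach(function(loc) {
		var sum = groups[loc].reduce(function(a, b) { return a + b; }, 0);
		averages[loc] = sum / groups[loc].length;
	});
	return averages;
}

// Rows for "Ayr" scattered through the file still yield one average:
var avgs = averageByLocation([
	{ location: "Ayr", rating: 4 },
	{ location: "Skye", rating: 2 },
	{ location: "Ayr", rating: 2 }
]);
// avgs["Ayr"] is 3
```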

I also spent a bit of time this week trying to write down a description of how the advanced attribute search will work.  I emailed this document to Gary and he is going to speak to Jennifer about it.  Gary also mentioned a new search that will be required – a search by participant rather than by location.  E.g. show me the locations where ‘participant a’ has a score of 5 for both ‘attribute x’ and ‘attribute y’.  Currently the search is just location based rather than checking that individual participants exhibit multiple features.

There was also an issue with the questionnaire upload facility this week.  For some reason the questionnaire upload was failing to upload files, even though there were no errors in the files.  After a bit of investigation it turned out that the third party API I’m using to grab the latitude and longitude was down, and without this data the upload script gave an error.  The API is back up again now, but at the time I decided to add in a fallback.  If this first API is down my script now attempts to connect to a second API to get the data.
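The fallback pattern is simply to try the second service only when the first fails; a sketch with stand-in lookup functions (the real code is PHP and calls the actual geocoding APIs):

```javascript
// Try a primary lookup and fall back to a secondary one if it throws.
// "primary" and "fallback" are stand-ins for the real API calls.
function lookupLatLng(primary, fallback, place) {
	try {
		return primary(place);
	} catch (e) {
		return fallback(place);
	}
}

// e.g. with a primary service that is down:
var down = function() { throw new Error("service unavailable"); };
var backup = function() { return { lat: 55.86, lng: -4.25 }; };
var coords = lookupLatLng(down, backup, "Glasgow");
// coords.lat is 55.86
```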

I spent the rest of the week continuing to work on the new visualisations of the Historical Thesaurus data for the Linguistic DNA project.  Last week I managed to create ‘sparklines’ for the 4000 thematic headings.  This week I added red dots to the sparklines to mark where the peak values are.  I’ve also split the ‘experiments’ page into different pages as I’m going to be trying several different approaches.  I created an initial filter for the sparklines (as displaying all 4000 on one page is probably not very helpful).  This filter allows users to do any combination of the following:

- Select an average category size range (between average size ‘x’ and average size ‘y’)
- Select a period in which the peak decade is reached (between decade ‘x’ and decade ‘y’)
- Select a minimum percentage rise of average
- Select a minimum percentage fall of average (note that as these are negative values the search will bring back everything with a value less than or equal to the value you enter)

This works pretty nicely; for example, the following screenshot shows all headings that have an average size of 50 or more and have a peak between 1700 and 1799:

With this initial filter option in place I started work on more detailed options that can identify peaks and plateaus and things like that.  The user first selects a period in which they’re interested (which can be the full date range) and this then updates the values that are possible to enter in a variety of fields by means of an AJAX call.  This new feature isn’t operational yet and I will continue to work on it next week, so I’ll have more to say about it in the next report.


Week Beginning 20th March 2017

I managed to make a good deal of progress with a number of different projects this week, which I’m pretty pleased about.  First of all there is the digital edition that I’m putting together for Bryony Randall’s ‘New Modernist Editing’ project.  Last week I completed the initial transcript of the short story and created a zoomable interface for browsing through the facsimiles.  This week I completed the transcription view, which allows the user to view the XML text, converted into HTML and styled using CSS.  It includes the notes and gaps and deletions but doesn’t differentiate between pencil and ink notes as of yet.  It doesn’t include the options to turn on / off features such as line breaks at this stage either, but it’s a start at least.  Below is a screenshot so you can see how things currently look.

The way I’ve transformed and styled the XML for display is perhaps a little unusual.  I wanted the site to be purely JavaScript powered – no server-side scripts or anything like that.  This is because the site will eventually be hosted elsewhere.  My plan was to use jQuery to pull in and process the XML for display, probably by means of an XSLT file.  But as I began to work on this I realised there was an even simpler way to do this.  With jQuery you can traverse an XML file in exactly the same way as an HTML file, so I simply pulled in the XML file, found the content of the relevant page and spat it out on screen.  I was expecting this to result in some horrible errors but… it just worked.  The XML and its tags get loaded into the HTML5 document and I can just style these using my CSS file.

I tested the site out in a variety of browsers and it works fine in everything other than Internet Explorer (Edge works, though).  This is because of the way jQuery loads the XML file and I’m hoping to find a solution to this.  I did have some nagging doubts about displaying the text in this way because I know that even though it all works it’s not valid HTML5. Sticking a bunch of <lb>, <note> and other XML tags into an HTML page works now but there’s no guarantee this will continue to work and … well, it’s not ‘right’ is it.

I emailed the other Arts Developers to see what they thought of the situation and we discussed some possible ways of handling things.  I could leave things as they were; I could use jQuery to transform the XML tags into valid HTML5 tags; I could run my XML file through an XSLT stylesheet to convert it to HTML5 before adding it to the server, so no transformation needs to be done on the fly; or I could see if it’s possible to call an XSLT file from jQuery to transform the XML on the fly.  Graeme suggested that it would be possible to process an XSLT file using JavaScript (as is described here: https://www.w3schools.com/xml/xsl_client.asp) so I started to investigate this.

I managed to get something working, but… I was reminded just how much I really dislike XSLT.  Apologies to anyone who likes that kind of thing, but my brain just finds it practically incomprehensible.  Doing even the simplest of things seems far too convoluted.  So I decided to just transform the XML into HTML5 using jQuery.  There are only a handful of tags that I need to deal with anyway.  All I do is find each occurrence of an XML tag, grab its contents, add a span after the element and then remove the element, e.g.:




$(xmlFile).find("del").each(function() {
	var content = "<span class=\"del\">" + $(this).html() + "</span>";
	$(this).after(content);
	$(this).remove();
});





I can even create a generic function that takes the tag name and spits out a span with that class while removing the tag from the page.  When it comes to modifying the layout based on user preferences I’ll be able to handle that straightforwardly via jQuery too, e.g. whether line breaks are on or off:


//line breaks ('lineBreaksOn' stands in for whichever settings variable is used)
$(xmlFile).find("lb").each(function() {
	if (lineBreaksOn) {
		$(this).after("<br />");
	} else {
		$(this).after(" ");
	}
	$(this).remove();
});




For me at least this is a much easier approach than having to pass variables to an XSLT file.

I spent a day or so working on the SCOSYA atlas as well and I have now managed to complete work on an initial version of the ‘my map data’ feature.  This feature lets you upload previously downloaded files to visualise the data on the atlas.

When you download a file now there is a new row at the top that includes the URL of the query that generated the file and some explanatory text.  You can add a title and a description for your data in columns D and E of the first row as well.  You can make changes to the rating data, for example deleting rows or changing ratings and then after you’ve saved your file you can upload it to the system.

You can do this through the ‘My Map Data’ section in the ‘Atlas Display Options’.  You can either drag and drop your file into the area or click to open a file browser.  An ‘Upload log’ displays any issues with your file that the system may encounter.  After upload your file will appear in the ‘previously uploaded files’ section and the atlas will automatically be populated with your data.  You can re-download your file by pressing on the ‘download map data’ button again and you can delete your uploaded file by pressing on the appropriate ‘Remove’ button.  You can switch between viewing different datasets by pressing on the ‘view’ button next to the title.  The following screenshot shows how this works:

I tested the feature out with a few datasets, for example I swapped the latitude and longitude columns round and the atlas dutifully displayed all of the data in the sea just north of Madagascar, so things do seem to be working.  There are a couple of things to note, though.  Firstly, the CSV download files currently do not include data that is below the query threshold, so no grey spots appear on the user maps.  We made a conscious decision to exclude this data but we might now want to reinstate it.  Secondly, the display of the map is very much dependent on the URL contained in the CSV file in row 1 column B.  This is how the atlas knows whether to display an ‘or’ map or an ‘and’ map, and what other limits were placed on the data.  If the spreadsheet is altered so that the data contained does not conform to what is expected by the URL (e.g. different attributes are added or new ratings are given) then things might not display correctly.  Similarly, if anyone removes or alters that URL from the CSV files some unexpected behaviour might be encountered.

Note also that ‘my map data’ is private – you can only view your data if you’re logged in.  This means you can’t share a URL with someone.  I still need to add ‘my map data’ to the ‘history’ feature and do a few other tweaks.  I’ve just realised trying to upload ‘questionnaire locations’ data results in an error, but I don’t think we need to include the option to upload this data.

I also started working on the new visualisations for the Historical Thesaurus that will be used for the Linguistic DNA project, based on the spreadsheet data that Marc has been working on.  We have data about how many new words appeared in each thematic heading in every decade since 1000 and we're going to use this data to visualise changes in the language.  I started by reading through all of the documentation that Marc and Fraser had prepared about the data, and then I wrote some scripts to extract the data from Marc's spreadsheet and insert it into our online database.  Marc had incorporated some ‘sparklines’ into his spreadsheet and my first task after getting the data available was to figure out a method to replicate these sparklines using the D3.js library.  Thankfully, someone had already done this for stock price data and had created a handy walkthrough of how to do it (see http://www.tnoda.com/blog/2013-12-19).  I followed the tutorial and adapted it for our data, writing a script that created sparklines for each of the almost 4,000 thematic headings we have in the system and displaying these all on a page.  It's a lot of data (stored in a 14MB JSON file) and as yet it's static, so users can't tweak the settings to see how this affects things, but it's a good proof of concept.  You can see a small snippet from the gigantic list below:
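As a rough illustration of what each sparkline involves, the following sketch turns a heading's per-decade counts of new words into an SVG path string.  In the real version D3's line generator and scales do this work, and the sample counts here are invented:

```javascript
// Sketch of the core sparkline calculation: map a series of
// per-decade counts onto an SVG path string.  D3's line generator
// does this for real; the data below is invented for illustration.
function sparklinePath(counts, width, height) {
  var max = Math.max.apply(null, counts);
  var stepX = width / (counts.length - 1);
  return counts.map(function (c, i) {
    var x = i * stepX;
    var y = height - (c / max) * height; // SVG y increases downwards
    return (i === 0 ? "M" : "L") + x + "," + y;
  }).join("");
}

// e.g. invented new-word counts for one heading over five decades
console.log(sparklinePath([0, 2, 8, 4, 8], 100, 20));
// → M0,20L25,15L50,0L75,10L100,0
```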

Other than these tasks I published this week’s new Burns song (see http://burnsc21.glasgow.ac.uk/braw-lads-on-yarrow-braes/) and I had a meeting with The People’s Voice project team where we discussed how the database of poems will function, what we’ll be doing about the transcriptions, and when I will start work on things.  It was a useful meeting and in addition to these points we identified a few enhancements I am going to make to the project’s content management system.  I also answered a query about some App development issues from elsewhere in the University and worked with Chris McGlashan to implement an Apache module that limits access to the pages held on the Historical Thesaurus server so as to prevent people from grabbing too much data.


Week Beginning 13th March 2017

At the start of the week I had to spend a little time investigating some issues with a couple of my WordPress sites, which were failing to connect to external services such as the Akismet anti-spam service.  It turned out that there had been a hardware failure on one of the servers which had affected outgoing connections. With the help of Chris I managed to get this sorted again and also took the opportunity to upgrade all of the WordPress instances I manage to the latest incremental version.  Some further maintenance issues were encountered later in the week when Chris informed me that one of our old sites (the Old English Teaching site that was set up long before I started working in my current role) had some security issues, so I fixed these as soon as I could.

I spent about two days this week working on the New Modernist Editing project for Bryony Randall.  I'm creating a digital edition of a short story by Virginia Woolf that will include a facsimile view and a transcription view, with the transcription view offering several ways to view the text by turning on or off various features of the text.  One possibility I investigated was using the digital edition system that has been developed for the Digital Vercelli Book (see http://vbd.humnet.unipi.it/beta2/#).  This is a really lovely interface for displaying facsimiles and TEI texts and the tool is available to download and reuse.  However, it offers lots of functionality that we don't really need and it doesn't provide the facility to tailor the transcription view based on the selection of individual features.  While it would be possible to add this feature in, I decided it would be simpler to build a small system from scratch myself.

I used OpenLayers (http://openlayers.org/) to create a zoomable interface for the facsimile view and a few lines of jQuery handled the navigation, the display of a list of thumbnails and things like that.  I also added in a section for displaying the transcription and facilities to turn the facsimile and transcription view on or off.  I’m pretty happy with how things are progressing.  Here’s a screenshot of the facsimile view as it currently stands:

I also worked on the transcription itself, completing an initial transcription of the six manuscript pages as TEI XML.  This included marking deleted text, notes, illegible text, gaps in the text and things like that.  It’s only a first attempt and I still might change how certain aspects are marked up, but it’s good to have something to work with now.
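To give an idea of what this markup looks like, here's a made-up fragment using the standard TEI elements for deletions, notes, illegible text and gaps (the wording is invented; only the element names reflect the actual transcription):

```xml
<!-- Invented sample text; the elements are the standard TEI ones -->
<p>The morning was <del rend="strikethrough">quite</del> fine,
  <note resp="#transcriber">pencil annotation in the margin</note>
  and she walked towards the <unclear reason="illegible">garden</unclear>
  <gap reason="illegible" extent="1 word"/> gate.</p>
```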

I uploaded this week’s new Burns song of the week (See http://burnsc21.glasgow.ac.uk/lassie-wi-the-lintwhite-locks/) and corresponded with Kirsteen about a new web resource she requires, with Craig regarding his Burns bibliography database and with Ronnie about his Burns paper database.  I also spent a few hours on further AHRC review duties and had a meeting with Marc and Fraser about future Historical Thesaurus plans and the Linguistic DNA project.  I’m going to be producing some new visualisations of the thesaurus data to show the change in language over time.  I can’t really say more about them at this stage, but I’ll start investigating the possibilities next week.

For the remainder of the week I continued to work on the ‘upload my data’ facilities for the SCOSYA project.  Last week I completed the options for uploading data files and this week I set to work on actually allowing these files to be visualised through the atlas.  It's proving to be a tricky process to get sorted and thus far I haven't managed to visualise anything other than a JavaScript error.  The problem is I can't just take the ratings and data points and display them – instead I need to display them in a way that matches the selection options that the user chose when they downloaded the file – for example the Boolean operators used to join the attributes.  I'm definitely making progress with this, but it's pretty far from being finished as yet.  I'll continue with this next week, though.

Week Beginning 6th March 2017

This week was a pretty busy one, working on a number of projects and participating in a number of meetings.  I spent a bit of time working on Bryony Randall’s New Modernist Editing project.  This involved starting to plan the workshop on TEI and XML – sorting out who might be participating, where the workshop might take place, what it might actually involve and things like that.  We’re hoping it will be a hands-on session for postgrads with no previous technical experience of transcription, but we’ll need to see if we can get a lab booked that has Oxygen available first.  I also worked with the facsimile images of the Woolf short story that we’re going to make a digital edition of.  The Woolf estate wants a massive copyright statement to be plastered across the middle of every image, which is a little disappointing as it will definitely affect the usefulness of the images, but we can’t do anything about that.  I also started to work with Bryony’s initial Word based transcription of the short story, thinking how best to represent this in TEI.  It’s a good opportunity to build up my experience of Oxygen, TEI and XML.

I also updated the data for the Mapping Metaphor project, which Wendy has continued to work on over the past few months.  We now have 13,083 metaphorical connections (down from 13,931), 9,823 ‘first lexemes’ (up from 8,766) and 14,800 other lexemes (up from 13,035).  We also now have 300 categories completed, up from 256.  I also replaced the old ‘Thomas Crawford’ part of the Corpus of Modern Scottish Writing with my reworked version.  The old version was a WordPress site that hadn't been updated since 2010 and was a security risk.  The new version (http://www.scottishcorpus.ac.uk/thomascrawford/) consists of nothing more than three very simple PHP pages and is much easier to navigate and use.

I had a few Burns related tasks to take care of this week.  Firstly there was the usual ‘song of the week’ to upload, which I published on Wednesday as usual (see http://burnsc21.glasgow.ac.uk/ye-jacobites-by-name/).  I also had a chat with Craig Lamont about a Burns bibliography that he is compiling.  This is currently in a massive Word document but he wants to make it searchable online so we’re discussing the possibilities and also where the resource might be hosted.  On Friday I had a meeting with Ronnie Young to discuss a database of Burns paper that he has compiled.  The database currently exists as an Access database with a number of related images and he would like this to be published online as a searchable resource.  Ronnie is going to check where the resource should reside and what level of access should be given and we’ll take things from there.

I had been speaking to the other developers across the College about the possibility of meeting up semi-regularly to discuss what we’re all up to and where things are headed and we arranged to have a meeting on Tuesday this week.  It was a really useful meeting and we all got a chance to talk about our projects, the technologies we use, any cool developments or problems we’d encountered and future plans.  Hopefully we’ll have these meetings every couple of months or so.

We had a bit of a situation with the Historical Thesaurus this week relating to someone running a script to grab every page of the website in order to extract the data from it, which is in clear violation of our terms and conditions.  I can’t really go into any details here, but I had to spend some of the week identifying when and how this was done and speaking to Chris about ensuring that it can’t happen again.

The rest of my week was spent on the SCOSYA project.  Last week I updated the ‘Atlas Display Options’ to include accordion sections for ‘advanced attribute search’ and ‘my map data’.  I'm still waiting to hear back from Gary about how he would like the advanced search to work so instead I focussed on the ‘my map data’ section.  This section will allow people to upload their own map data using the same CSV format as the atlas download files in order to visualise this data on the map.  I managed to make some pretty good progress with this feature.  First of all I needed to create new database tables to house the uploaded data.  Then I needed to add in a facility to upload files.  I decided to use the ‘dropzone.js’ scripts that I had previously used for uploading the questionnaires to the CMS.  This allows the user to drag and drop one or more files into a section of the browser and for this data to then be processed in an AJAX kind of way.  This approach works very well for the atlas as we don't want the user to have to navigate away from the atlas in order to upload the data – everything needs to be managed from within the ‘display options’ slideout section.

I contemplated adding the facility to process the uploaded files to the API but decided against it as I wanted to keep the API ‘read only’ rather than also handling data uploads and deletions.  So instead I created a stand-alone PHP script that takes the uploaded CSV files and adds them to the database tables I had created.  This script then echoes out some log messages that then get pulled into a ‘log’ section of the display in an AJAX manner.
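As a sketch of the kind of processing involved (in JavaScript here rather than the PHP actually used, and with invented field contents), the script has to unpack the special first row of the file, where column B holds the query URL and columns D and E hold the user's title and description:

```javascript
// Sketch only: unpack the first row of an uploaded CSV.  Column B
// holds the query URL; columns D and E hold the title and
// description.  A naive split(",") is used here, so it would not
// cope with fields containing embedded commas.
function parseFirstRow(csvText) {
  var cells = csvText.split("\n")[0].split(",");
  return {
    url: cells[1],         // column B
    title: cells[3],       // column D
    description: cells[4]  // column E
  };
}

var row = "Query URL:,https://example.com/atlas?attribute=D3,," +
  "My dataset,Ratings with rows removed";
console.log(parseFirstRow(row));
```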

I then had to add in a facility to list previously uploaded files.  I decided the query for this should be part of the API as it is a ‘GET’ request.  However, I needed to ensure that only the currently logged in user was able to access their particular list of files.  I didn't want anyone to be able to pass a username to the API and then get that user's files – the passed username must also correspond to the currently logged in user.  I did some investigation into securing an API, using access tokens and things like that, but in the end I decided that accessing the user's data would only ever be something that we would want to offer through our website and we could therefore just use session authentication to ensure the correct user was logged in.  This doesn't really fit in with the ethos of a RESTful API, but it suits our purposes ok so it's not really an issue.
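In effect the check boils down to something like the following (pseudocode; the endpoint name is invented):

```
on GET /api/uploads/{username}:
    if session user is not logged in, or session username != {username}:
        return 403 Forbidden
    otherwise:
        return the list of active uploads for {username}
```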

With the API updated to be able to accept requests for listing a user’s data uploads I then created a facility in the front-end for listing these files, ensuring that the list automatically gets updated with each new file upload.  You can see the work in progress ‘my map data’ section in the following screenshot.

I also added in a facility to remove previously uploaded files.  This doesn’t actually delete the files but merely marks them as ‘inactive’ in the database.  What I haven’t done yet is figure out a way to actually display the data on the map.  This is possibly going to be a little tricky as I have to consider what sort of map it is, think about how to update the API to spit out the data in the correct format and update the JavaScript to deal with user specific data rather than the original data.  So still lots to do.

Week Beginning 27th February 2017

I wasn’t feeling very well at the start of the week, but instead of going home sick I managed to struggle through by focussing on some fairly unchallenging tasks, namely continuing to migrate the STARN materials to the University’s T4 system.  I’m still ploughing through the Walter Scott novels, but I made a bit of progress.  I also spent a little more time this week on AHRC duties.

I had a few meetings this week.  I met with the Heads of School Administration, Wendy Burt and Nikki Axford on Tuesday to discuss some potential changes to my job, and then had a meeting with Marc on Wednesday to discuss this further.  The outcome of these meetings is that actually there won’t be any changes after all, which is disappointing but at least after several months of the possibility hanging there it’s all decided now.

On Tuesday I also had a meeting with Bryony Randall to discuss her current AHRC project about editing modernist texts.  I have a few days of effort assigned to this project, to help create a digital edition of a short story by Virginia Woolf and to lead a session on transcribing texts at a workshop in April, so we met to discuss how all this will proceed.  We’ve agreed that I will create the digital edition, comprising facsimile images and multiple transcriptions with various features visible or hidden.  Users will be able to create their own edition by deciding which features to include or hide, thus making users the editors of their own edition.  Bryony is going to make various transcriptions in Word and I am then going to convert this into TEI text.  The short story is only 6 pages long so it’s not going to be too onerous a task and it will be good experience to use TEI and Oxygen for a real project.  I’ll get started on this next week.

I met with Fraser on Wednesday to discuss the OED updates for the Historical Thesaurus and also to talk about the Hansard texts again.  We returned to the visualisations I’d made for the frequency of Thematic headings in the two-year sample of Hansard that I was working with.  I should really try to find the time to return to this again as I had made some really good progress with the interface previously.  Also this week I arranged to meet with Catriona Macdonald about The People’s Voice project and published this week’s new song of the week on the Burns website (http://burnsc21.glasgow.ac.uk/robert-bruces-address-to-his-army-at-bannockburn/).

I spent the rest of the week working on the SCOSYA project.  Gary emailed me last week to say that he'd encountered some problems with the Atlas interface since I implemented my updated ‘or’ search a couple of weeks ago so I spent a few hours trying to get to the bottom of these issues.  The main issue was that some ‘or’ searches were displaying no points on the map, even though data was available and could be downloaded via the CSV download facility.  The problem was caused by a mistake in how I was pulling in the rating comments.  I updated how I was handling these but had forgotten to change a variable name in one part of the code, which was causing a JavaScript error only when certain combinations of factors occurred.  It should be sorted now, I hope.

Gary had also stated that some ‘or’ searches were not showing multiple icons when different attributes were selected.   However, after some investigation I think this may just be because without supplying limits an ‘or’ search for two attributes will often result in the attributes both being present at every location, therefore all markers will be the same.  E.g. a search for ‘D3 or A9’.  There are definitely some combinations of attributes that do give multiple markers, e.g. ‘Q6 or D32’.  And if you supply limits you generally get lots of different icons, e.g. ‘D3, young, 4-5 or A9, old, 4-5’.  Gary is going to check this again for any specific examples that don’t seem right.

After that I began to think about the new Atlas search options that Gary would like me to implement, such as being able to search for entire groups of attributes (e.g. an entire parent category) rather than individual ones.  At the moment I’m not entirely sure how this should work, specifically how the selected attributes should be joined.  For example, if I select the parent ‘AFTER’ with limits ‘old’ and ‘rating 4-5’ would the atlas only then show me those locations where all ‘AFTER’ attributes (D3 and D4) are present with these limits?  This would basically be the same as an individual attribute search for D3 and D4 joined by ‘and’.  Or would it be an ‘or’ search?  I’ve asked Gary for clarification but I haven’t heard back from him yet.

I also made a couple of minor cosmetic changes to the atlas.  Attributes within parent categories are now listed alphabetically by their code rather than their name, and selected buttons are now yellow to make it clearer which are selected and to differentiate from the ‘hover over’ purple colour.  I then further reworked the ‘Atlas display options’ so that the different search options are now housed in an ‘accordion’.  This hopefully helps to declutter the section a little.  As well as accordion sections for ‘Questionnaire Locations’ and ‘Attribute Search’ I have added in new sections for ‘Advanced Attribute Search’ and ‘My Map Data’.  These don’t have anything useful in them yet but eventually ‘Advanced Attribute Search’ will feature the more expanded options that are available via the ‘consistency data’ view – i.e. options to select groups of attributes and alternative ways to select ratings.  ‘My Map Data’ will be where users can upload their own CSV files and possibly access previously uploaded datasets.  See the following screenshot for an idea of how the new accordion works.

I also started to think about how to implement the upload and display of a user's CSV files and realised that a lot of the information about how points are displayed on the map is not included in the CSV file.  For example, there's no indication of the joins between attributes or the limiting factors that were used to generate the data contained in the file.  This would mean when uploading the data the system wouldn't be able to tell whether the points should be displayed as an ‘and’ map or an ‘or’ map.  I have therefore updated the ‘download map data’ facility to add in the URL used to generate the file in the first row.  This actually serves two useful purposes.  Firstly, it means that on re-uploading the file the system can tell which limits and Boolean joins were used and can display an appropriate map; secondly, it means there is a record in the CSV file of where the data came from and what it contains.  A user would be able to copy the URL into their browser to re-download the same dataset if (for example) they messed up their file.  I'll continue to think about the implementation of the CSV upload facility next week.