Week Beginning 5th June 2017

I spent quite a bit of time this week on the Historical Thesaurus.  A few tweaks ahead of Kay Day have now turned into a complete website redevelopment, so things are likely to get a little hectic over the next couple of weeks.  Last week I implemented an initial version of a new HT tree-based browse mechanism but at the start of this week I still wasn’t sure how best to handle different parts of speech and subcategories.  Originally I had thought we’d have a separate tree for each part of speech, but I came to realise that this was not going to work as the non-noun hierarchy has more gaps than actual content.  There are also issues with subcategories, as ones with the same number but different parts of speech have no direct connection.  Main categories with the same number but different parts of speech always refer to the same thing – e.g. 01.02.aj is the adjective version of 01.02.n.  But subcategories just fill out the numbers, meaning 01.02|01.aj can be something entirely different to 01.02|01.n.  This means providing an option to jump from a subcategory in one part of speech to another wouldn’t make sense.
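To make the distinction concrete, here is a minimal sketch of the rule in JavaScript.  It assumes category numbers are written as in the examples above (such as 01.02.n and 01.02|01.aj); the function names parseCatnum and refersToSameConcept are hypothetical and are not part of the HT code.

```javascript
// Hypothetical parser for the category number formats quoted above,
// e.g. "01.02.n" (main category) and "01.02|01.aj" (subcategory).
function parseCatnum(text) {
  const match = /^([\d.]+?)(?:\|([\d.]+?))?\.([a-z]+)$/.exec(text);
  if (!match) {
    return null;
  }
  return {
    main: match[1],         // e.g. "01.02"
    sub: match[2] || null,  // e.g. "01", or null for a main category
    pos: match[3]           // e.g. "n" or "aj"
  };
}

// Two category numbers are only guaranteed to refer to the same thing
// when both are main categories that share the same number.
function refersToSameConcept(a, b) {
  const first = parseCatnum(a);
  const second = parseCatnum(b);
  if (!first || !second) {
    return false;
  }
  if (first.sub !== null || second.sub !== null) {
    return false; // subcategory numbers carry no cross-PoS meaning
  }
  return first.main === second.main;
}
```

A part of speech switcher could use a check like this to decide whether offering a jump between two categories makes sense.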

Initially I went with the idea of having noun subcategories represented in the tree and the option to switch part of speech in the right-hand pane after the user selected a category in the tree (if a main category was selected).  When a non-noun main category was selected then the subcategories for this part of speech would then be displayed under the main category words.  This approach worked, but I felt that it was too inconsistent.  I didn’t like that subcategories were handled differently depending on their part of speech.  I therefore created two additional versions of the tree browser in addition to the one I created last week.

The second one has [+] and [-] instead of chevrons.  It has the catnum in grey before the heading.  The tree structure is the same as the first version (i.e. includes all noun categories and noun subcats).  When you open a category the different parts of speech now appear as tabs, with ‘noun’ open by default.  Hover over a tab to see the full part of speech and the heading for that part of speech.  The good thing about the tabs is the currently active PoS doesn’t disappear from the list, as happens with the other view.  When viewing a PoS that isn’t ‘noun’ and there are subcategories the full contents of these subcategories are visible underneath the maincat words.  Subcats are indented and coloured to reflect their level, as with the ‘live’ site’s subcats, but here all lexemes are also displayed.  As ‘noun’ subcats are handled differently and this could be confusing a line of text explains how to access these when viewing a non-noun category.

For the third version I removed all subcats from the tree and it only features noun maincats.  It is therefore considerably less extensive, and no doubt less intimidating.  In the category pane, the PoS selector is the same as the first version.  The full subcat contents as in v2 are displayed for every PoS including nouns.  This does make for some very long pages, but does at least mean all parts of speech are handled in the same way.

Marc, Fraser and I met to discuss the HT on Wednesday.  It was a very productive meeting and we formed a plan about how to proceed with the revamp of the site.  Marc showed us some new versions of the interface he has been working on too.  There is going to be a new colour scheme and new fonts will be used too.  Following on from the meeting I updated the navigation structure of the HT site, replaced all icons used in the site with Font Awesome icons, added in the facility to reload the ’random category’ that gets displayed on the homepage, moved the ‘quick search’ to the navigation bar of every page and made some other tweaks to the interface.

I spent more time towards the end of the week on the tree browser.  I’ve updated the ‘parts of speech’ section so that the current PoS is also included.  I’ve also updated the ordering to reflect the order in the printed HT and updated the abbreviations to match these too.  Tooltips now give text as found in the HT PDF.  The PoS beside the cat number is also now a tooltip.  I’ve updated the ‘random category’ to display the correct PoS abbreviation too.  I’ve also added in some default text that appears on the ‘browse’ page before you select a category.

A lot of my time was spent looking into how to handle loading the ‘browse’ page at a specific point in the hierarchy and ensuring that links to specific categories are possible via catIDs in the URL.  Now when you open a category the catid is appended to the page URL in the address bar.  This is a hash (#id=1) rather than a GET variable (?id=1) as updating hashes is much easier in JavaScript and works with older browsers.  This does mean old HT bookmarks and citations will point to the wrong place, but this isn’t a problem as I’ve updated the ‘category’ page so that it now takes an old style URL then redirects on to our new ‘browse’ page.  This brings us onto the next tricky issue:  Loading a category and the tree from a passed URL.  I am still in the middle of this as there are a couple of tricky things that need to be taken into consideration:

  1. If it’s a subcat we don’t just want to display this, we need to grab its maincat, all of the maincat’s subcats but then ensure the passed subcat is displayed on screen.
  2. We need to build up the tree hierarchy, which is for nouns, so if the passed catid is not a noun category we need to also then find the appropriate noun category
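The hash handling described above can be sketched roughly as follows.  This is only an illustration: the function names are hypothetical, and it assumes the ID is always numeric and that old-style URLs carry it as ?id=1, as in the example given earlier.

```javascript
// Read the category ID from a hash such as "#id=1".
// Returns null when the hash is missing or malformed.
function catIdFromHash(hash) {
  const match = /^#id=(\d+)$/.exec(hash);
  return match ? Number(match[1]) : null;
}

// Turn an old-style query string ("?id=1") into the new hash form ("#id=1").
// Returns an empty string when there is no usable ID.
function hashFromOldQuery(query) {
  const id = new URLSearchParams(query).get('id');
  return id !== null && /^\d+$/.test(id) ? '#id=' + id : '';
}
```

In the browser the first function would be given window.location.hash when the ‘browse’ page loads, and the second could be used to work out where an old bookmark should be redirected to.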

I have sorted out point 1 now.  If you pass a subcat ID to the page the maincat record is loaded and the page scrolls until the subcat is in view.  I will also highlight the subcat as well, but haven’t done this yet.  I’m still in the middle of addressing the second point.  I know where and how to add in the grabbing of the noun category, I just haven’t had the time to do it yet.  I also need to properly build up the tree structure and have the relevant parts open.  This is still to do as currently only the tree from the maincat downwards is loaded in.  It’s potentially going to be rather tricky to get the full tree represented and opened properly so I’ll be focussing on this next week.  Also, T7 categories are currently giving an error in the tree.  They all appear to have children and when you click on the [+] then an error occurs.  I’ll get this fixed next week too.  After that I’ll focus on integrating the search facilities with the tree view.  Here’s a screenshot of how the tree currently looks:

I was pretty busy with other projects this week as well.  I met with Thomas Clancy and Simon Taylor on Tuesday to discuss a new place-names project they are putting together.  I will hopefully be able to be involved in this in some capacity, despite it not being based in the School of Critical Studies.  I also helped Chris to migrate the SCOTS Corpus websites to a new server.  This caused some issues with the PostgreSQL database that took us several hours to get to the bottom of.  These were causing the search facilities to be completely broken, but thankfully I figured out what was causing this and by the end of the week the site was sitting on a new server.  I also had an AHRC review to undertake this week.

On Friday I met with Marc and the group of people who are working on a new version of the ARIES app.  I will be implementing their changes so it was good to speak to them and learn what they intend to do.  The timing of this is going to be pretty tight as they want to release a new version by the end of August, so we’ll just need to see how this goes.  I also made some updates to the ‘Burns and the Fiddle’ section of the Burns website.  It’s looking like this new section will now launch in July.

Finally, I spent several hours on The People’s Voice project, implementing the ‘browse’ functionality for the database of poems.  This includes a series of tabs for different ways of browsing the data.  E.g. you can browse the titles of poems by initial letter, you can browse a list of authors, years of publication etc.  Each list includes the items plus the number of poems that are associated with the item – so for example in the list of archives and libraries you can see that Aberdeen Central Library has 70 associated poems.  You can then click on an item and view a list of all of the matching poems.  I still need to create the page for actually viewing the poem record.  This is pretty much the last thing I need to implement for the public database and all being well I’ll get this finished next Friday.





Week Beginning 29th May 2017

Monday this week was the spring bank holiday so it was a four-day week for me.  I split my time this week over three main projects.  Firstly, I set up an initial project website for Jane Stuart-Smith’s SPADE project.  We’d had some difficulty in assigning the resources for this project but thankfully this week we were given some web space and I managed to get a website set up, create a skeleton structure for it and create the user accounts that will allow the project team to manage the content.  I also had some email discussions with the project partners about how best to handle ‘private’ pages that should be accessible to the team but no-one else.  There is still some work to be done on the website, but for the time being my work is done.

I also continued this week to work on the public interface for the database of poems for The People’s Voice project.  Last week I started on the search facility, but only progressed as far as allowing a search for a few fields, with the search results page displaying nothing more than the number of matching poems.  This week I managed to pretty much complete the search facility.  Users can now search for any combination of search boxes and on the search results page there is now a section above the results that lists what you’ve searched for.  This also includes a ‘refine your search’ button that takes the user back to the search page.  The previously selected options are now ‘remembered’ by the form, allowing the user to update what they’ve searched for.  There is also a ‘clear search boxes’ button so the user can start a fresh search.

Search results are now paginated.  Twenty results are displayed per page and if there are more results than this then ‘next’, ‘previous’ and ‘jump to page’ links are displayed above and below your search results.  If there are lots of pages some ‘jump to page’ links are omitted to stop things getting too cluttered.  Search results display the poem title, date (or ‘undated’ if there is no date), archive / library, franchise and author.  Clicking on a result will lead to the full record, but this is still to do.  I also haven’t added in the option to order the results by anything other than poem title, as I’m not sure whether this will really be of much use and it’s going to require a reworking of the way search results are queried if I am to implement it.  I still have the ‘browse’ interface to work on and the actual page that displays the poem details, and I’ll continue with this next week.
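As an illustration of the pagination logic, here is a small JavaScript sketch.  The twenty-per-page figure comes from the description above, but the rule for which ‘jump to page’ links to omit (keep the first page, the last page and a couple either side of the current page) is an assumption made for the example, as are the function names.

```javascript
const RESULTS_PER_PAGE = 20;

// Number of pages needed for a given number of results (at least one).
function pageCount(totalResults) {
  return Math.max(1, Math.ceil(totalResults / RESULTS_PER_PAGE));
}

// Page numbers to show as 'jump to page' links.  A null entry marks a
// run of omitted pages, which could be displayed as '...'.
function jumpLinks(current, total, radius = 2) {
  const links = [];
  for (let page = 1; page <= total; page++) {
    const atEdge = page === 1 || page === total;
    const nearCurrent = Math.abs(page - current) <= radius;
    if (atEdge || nearCurrent) {
      links.push(page);
    } else if (links[links.length - 1] !== null) {
      links.push(null); // only one marker per run of omitted pages
    }
  }
  return links;
}
```

For page 10 of 20 this gives links to pages 1, 8 to 12 and 20, with a gap marker on either side of the middle group.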

I met with Bryony Randall this week to discuss some final tweaks to the digital edition of the Virginia Woolf short story that I’ve been working on.  I made a few changes to the transcription, updated how we label ‘sic’ and ‘corr’ text in the ‘edition settings’ (these are now called ‘original’ and ‘edited’) and I changed which edition settings are selected by default.  Where previously the original text was displayed we now display the ‘edited’ text with only line breaks from the ‘original’ retained.  Bryony is going to ask for feedback from members of the Network and we’re going to aim to get things finalised by the end of the month.

I spent the rest of the week working on the Historical Thesaurus.  Last week I met with Marc and Fraser to discuss updates to the website that we were going to try and implement before ‘Kay Day’ at the end of the month.  One thing I’ve wanted to try to implement for a while now is a tree-based browse structure.  I created a visual tree browse structure using the D3.js library for the Scots Thesaurus project and doing so made me realise how useful having such a speedy way to browse the full thesaurus structure would be.

I tried a few jQuery ‘tree’ plugins and in the end I went with FancyTree (https://github.com/mar10/fancytree) because it clearly explained how to load data into nodes via AJAX when a user opens the node.  This is important for us as we can’t load all 235,000 categories into the tree at once (well, we could but it would be a bad idea).  I created a PHP script (that I will eventually integrate with the HT API) that you can pass a catid to and it will spit out a JSON file containing all of the categories and subcategories that are one level down from it.  It also checks whether each of these has child categories.  If there are child categories then the tree knows to place the little ‘expand’ icon next to the category.  When the user clicks on a category this fires off a request for the category’s children and these are then dynamically loaded into the tree.  Here’s a screenshot of my first attempt at using FancyTree:

Subcategories are highlighted with a grey background and in this version you can’t actually view the words in a category.  Also, only nouns are currently represented.  I thought at this stage that we might have to have separate trees for each part of speech, but then realised that the other parts of speech don’t have a full hierarchy so the tree would be missing lots of branches and would therefore not work.  In this version the labels only show the category heading and catnum / subcat, but I can update the labels to display additional information.  We could for example show the number of categories within each category, or somehow represent the number of words contained in the category so you can see where the big categories are.  I should also be able to override the arrow icons with Font Awesome icons.
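To give an idea of the data the tree works with, here is a rough JavaScript sketch of turning category rows into FancyTree node objects.  The real work is done by the PHP script mentioned above; the row fields used here (id, catnum, heading, isSubcat, hasChildren) and the ‘subcat’ class name are assumptions for illustration only.  The node properties key, title, lazy and extraClasses are standard FancyTree ones.

```javascript
// Convert category rows into FancyTree node objects.
// The row field names are hypothetical.
function toTreeNodes(rows) {
  return rows.map(function (row) {
    return {
      key: String(row.id),                       // used to request children later
      title: row.catnum + ' ' + row.heading,     // label shown in the tree
      lazy: row.hasChildren,                     // lazy nodes get an 'expand' icon
      extraClasses: row.isSubcat ? 'subcat' : '' // hook for the grey background
    };
  });
}

// In the browser the tree would be wired up roughly like this, where
// "children.php" is a placeholder for the script that returns the JSON:
//
// $("#tree").fancytree({
//   source: { url: "children.php", data: { id: 0 } },
//   lazyLoad: function (event, data) {
//     data.result = { url: "children.php", data: { id: data.node.key } };
//   }
// });
```

Only nodes flagged as lazy trigger a further request when opened, which is what keeps the full set of categories from being loaded at once.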

After creating this initial version I realised there was still a lot to be done.  For example, if we’re using this browser then we need to ensure that when you open a category the tree loads with the correct part opened.  This might be tricky to implement.  Also there’s the issue of dealing with different parts of speech.

After working on this initial version I then began to work on a version that was integrated into the HT website design.  I also followed the plugin instructions for using Font Awesome icons rather than the bitmap icons, although this took some working out.  In order to get this to work another JavaScript file was required (jquery.fancytree.glyph.js) but I just couldn’t get this to work.  It kept bringing up JavaScript errors about a variable called ‘span’ not existing.  Eventually I commented out the line of code (relating to the ‘loading’ section) and after that everything worked perfectly.  With this new version I also added in the facility to actually view the words when you open a category, and also to switch to different parts of speech.  It’s all working very nicely, apart from subcategories belonging to other parts of speech.  I’m wondering whether I should include subcategories in the tree or whether they should just be viewable through the ‘category’ pane.  If they only appear in the tree then it is never going to be possible to view subcategories that aren’t nouns whereas if they appear in a section as they do in the current category page then they will load whenever anyone selects that PoS.  It does mean we would lose a lot of content from the tree, though.  E.g. if you find ‘Beer’ all of those subcategories of beer would no longer be in the tree and you would no longer be able to flick through them all really quickly.  This is going to need some further thought.  But here’s a screenshot of the current work-in-progress version of the tree browser:

Week Beginning 22nd May 2017

I spent a fair amount of time this week on Historical Thesaurus related matters.  Towards the start of the week I continued with the mammoth task of integrating the new OED data with the HT data.  I created a few new checking and integration scripts to try and find patterns in the HT and OED category names in order to be able to match them up.  Out of a total of 235,249 categories we now have 221,336 that are marked as checked.

On Wednesday Fraser, Marc and I had a meeting to discuss how to proceed with the rest of the HT / OED linking and also to consider what updates to make to the HT website.  We came up with a few ideas that we are going to try and implement in the next few weeks.  I can’t really say much more about it yet, though.

I spent about a day this week working on the Burns project, creating a new subsection of the website about Burns and the Fiddle.  This included creating pages, positioning images, creating MP3 files from audio files in other formats, uploading everything and making it all look nice.  The section isn’t going live yet as there are further tweaks to be made, but most of the content is now in place.

I had a couple of meetings with Luca Guariento this week to discuss some of the technical issues he’s working through at the moment, including working with APIs with jQuery and some issues with some OpenLayers maps he’s working on.  I also helped Gary with a couple of minor SCOSYA issues, spoke to Ronnie Young about a Burns project he’s putting together, talked to Jane Stuart-Smith about the website for her SPADE project and had a chat with Bryony Randall about the digital edition we’re working on.  I also attended a college-wide meeting about critical editions that had been organised by Sheila Dickson from the School of Modern Languages and Cultures.  It was an interesting meeting to attend and it looks like I might be involved in setting up a website that will showcase the critical edition work that is based at the university.  I’ll just need to wait and see if anything comes from this, but hopefully it will.

On Friday I returned to The People’s Voice project and continued to work on the public interface to the poem database.  I reinstated the ‘publication’ field as a drop-down list, as Catriona requested that it was added back in.  I also added the ‘autocomplete’ feature to the required fields (author, set tune title, featured individual, publication name).  Now if you start typing into these fields anything that matches will be displayed in a list and can be selected.  I also included ‘pseudonym’ in the ‘author’ and ‘featured individual’ autocomplete search.  I then updated the form so that the publication country and city lists are now populated from the data in the database.  The ‘city’ list updates depending on the user’s choice of country.  I also added in a query to generate the Archive / Library multi-select based on the data and I started to work on the code that will take the user’s selected options, process them, build a query and display the results.  So far you can search for title, set tune, set tune title, audio, comments and franchise only (any combination of these).  Results aren’t displaying yet but the number of poems that match your search are.  There’s still lots to do here and I’ll hopefully be able to continue with this next Friday.


Week Beginning 15th May 2017

I was ill this week and was off work sick on Wednesday and Thursday and because of this I didn’t manage to get much done with regards to the LDNA visualisations or the SCOSYA atlas.  I had to spend about half a day upgrading the 20 WordPress sites I manage to the most recent release and I spoke to Jane about the problems we’re having getting some space set aside for the website for her SPADE project.  I also replied to Alex Benchimol about a project he is putting together and completed the review of the paper that I had started last week.

On Tuesday I completed an initial version of the digital edition I have been working on for Bryony Randall’s New Modernist Editing project.  This version now includes all of the features I had intended to implement and I completed the TEI transcription for the whole manuscript.  Using the resource it is now possible to tailor the view as you would like to see it, ranging from a version of the text that closely represents the original typewritten page, including line breaks, handwritten notes and typos, through to a fully edited and ‘corrected’ version of the text complete with explanatory notes that closely resembles the text you’d find in an edited and printed edition.  I think it works rather well, although there are still some aspects that will need tweaking, such as adding in an introduction, possibly updating the way some features are labelled and maybe changing the way some of the document is tagged.

One new feature I added in this week that I’m rather pleased with is a reworking of the edition settings function.  Previously a user’s selected settings were stored in a JavaScript object behind the scenes; the site ‘remembered’ the settings as the user navigated from page to page, but if the user bookmarked a page, or copied the URL to send to someone or for a citation the exact settings would not be included and the default view would instead be loaded.  I decided that this wasn’t the best way to go about things so instead updated the JavaScript so that settings are now incorporated into the page URL.  This does make the URL rather longer and messier, but it does mean that the exact view of the page can be passed between sessions, which I think is more important than cleaner URLs.
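The general idea can be sketched as follows.  This is not the edition’s actual code: the setting names (lineBreaks, notes, corrections), their defaults and the ‘1’ / ‘0’ encoding are all assumptions made for the example.

```javascript
// Hypothetical settings and their default values.
const DEFAULT_SETTINGS = { lineBreaks: true, notes: true, corrections: false };

// Serialise the settings so they can be appended to the page URL.
function settingsToQuery(settings) {
  const params = new URLSearchParams();
  Object.keys(DEFAULT_SETTINGS).forEach(function (name) {
    params.set(name, settings[name] ? '1' : '0');
  });
  return params.toString();
}

// Rebuild the settings from a URL, falling back to the defaults for
// anything that is missing (e.g. when an old bookmark is used).
function settingsFromQuery(query) {
  const params = new URLSearchParams(query);
  const settings = {};
  Object.keys(DEFAULT_SETTINGS).forEach(function (name) {
    const value = params.get(name);
    settings[name] = value === null ? DEFAULT_SETTINGS[name] : value === '1';
  });
  return settings;
}
```

Because every setting is written out explicitly, a copied URL reproduces the exact view even if the defaults are changed later.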

When I returned to work on Friday I decided to make a start on the public interface for the database of poems and songs for The People’s Voice project.  I spent the morning writing a specification document and thinking about how the search would work.  The project website is WordPress based and I did consider developing the search as a WordPress plugin, as I have done for other projects, such as the SciFiMedHums project (see http://scifimedhums.glasgow.ac.uk/the-database/).  However, I didn’t want the resource to be too tied into WordPress and instead wanted it to be useable (with minimal changes) independently of the WordPress system.  Having said that, I still wanted the resource to feel like a part of the main project website and to use the WordPress theme the rest of the site uses.  After a bit of investigation I found a way to create a PHP page that is not part of WordPress but ‘hooks’ into some WordPress functions in order to use the website’s theme.  Basically you add a ‘require’ statement that pulls in ‘wp-load.php’ and then you can call the functions that process the WordPress header, sidebar and footer (get_header(), get_sidebar(), get_footer()) wherever you want these to appear.  All the rest of your script can be as you want it.

I emailed my specification document to the project team and started to work on the search interface in the afternoon.  This is going to use jQuery UI components so I created a theme for this and set up the basic structure for the search form.  It’s not fully complete yet as I need to add in some ‘auto-complete’ functions and some content for drop-down lists, but the overall structure is there.  The project team wanted pretty much every field in the database to be searchable, which makes for a rather unwieldy and intimidating search form so I’m going to have to think of a way to make this more appealing.  I’ll try to continue with this next week, if I have the time.

Week Beginning 27th March 2017

I spent about a day this week continuing to tweak the digital edition system I’m creating for the ‘New Modernist Editing’ project.  My first task was to try and get my system working in Internet Explorer, as my current way of doing things produced nothing more than a blank section of the page when using this browser.  Even though IE is now obsolete it’s still used by a lot of people and I wanted to get to the bottom of the issue.  The problem was that jQuery’s find() function when executed in IE won’t parse an XMLDocument object.  I was loading in my XML file using jQuery’s ‘get’ method, e.g.:

$.get("xml/ode.xml", function( xmlFile ) {

//do stuff with xml file here

});

After doing some reading about XML files in jQuery it looked like you had to run a file through parseXML() in order to work with it (see http://api.jquery.com/jQuery.parseXML/) but when I did this after the ‘get’ I just got errors.  It turns out that the ‘get’ method automatically checks the file it’s getting and if it’s an XML file it automatically runs it through parseXML() behind the scenes so the text file is already an XMLDocument object by the time you get to play with it.

Information on this page (http://stackoverflow.com/questions/4998324/jquery-find-and-xml-does-not-work-in-ie) suggested an alternative way to load the XML file so that it could be read in IE but I realised in order to get this to work I’d need to get the plain text file rather than the XMLDocument object that jQuery had created.  I therefore used the ‘ajax’ method rather than the shorthand ‘get’ method, which allowed me to specify that the returned data was to be treated as plain text and not XML:


$.ajax({

url: "xml/ode.xml",

dataType: "text"}).done(function(xmlFile){

//do stuff with xml file here

});

This meant that jQuery didn’t automatically convert the text into an XMLDocument object and I was intending to then manually call the parseXML method for non-IE browsers and do separate things just for IE.  But rather unexpectedly jQuery’s find() function and all other DOM traversal methods just worked with the plain text, in all browsers including IE!  I’m not really sure why this is, or why jQuery even needs to bother converting XML into an XMLDocument Object if it can just work with it as plain text.  But as it appears to just work I’m not complaining.

To sum up:  to use jQuery’s find() method on an XML file in IE (well, all browsers) ensure you pass plain text to the object and not an XMLDocument object.

With this issue out of the way I set to work on adding some further features to the system.  I’ve integrated editorial notes with the transcription view, using the very handy jQuery plugin Tooltipster (http://iamceege.github.io/tooltipster/).  Words or phrases that have associated notes appear with a dashed line under them and you can click on the word to view the note and click anywhere to hide the note again.  I decided to have notes appearing on click rather than on hover because I find hovering notes a bit annoying and clicking (or tapping) works better on touchscreens too.  The following screenshot shows how the notes work:

I’ve also added in an initial version of the ‘Edition Settings’ feature.  This allows the user to decide how they would like the transcription to be laid out.  If you press on the ‘Edition Settings’ button this opens a popup (well, a jQuery UI modal dialog box, to be precise) through which you can select or deselect a number of options, such as visible line breaks, whether notes are present or not etc.  Once you press the ‘save’ button your settings are remembered as you browse between pages (but are reset if you close your browser or navigate somewhere else).  We’ll eventually use this feature to add in alternatively edited views of the text as well – e.g. one that corrects all of the typos.  The screenshot below shows the ‘popup’ in action:

I spent about a day on AHRC duties this week and did a few other miscellaneous tasks, such as making the penultimate Burns ‘new song of the week’ live (see http://burnsc21.glasgow.ac.uk/when-oer-the-hill-the-eastern-star/) and giving some advice to Wendy Anderson about OCR software for one of her post-grad students.  I had a chat with Kirsteen McCue about a new project she is leading that’s starting up over the summer and that I’ll need to give some input into.  I also made a couple of tweaks to the content management system for ‘The People’s Voice’ project following on from our meeting last week.  Firstly, I added a new field called ‘sound file’ to the poem table.  This can be used to add in the URL of a sound file for the poem.  I updated the ‘browse poems’ table to include a Y/N field for whether there is a sound file present so that the project team can order the table by this column and easily find all of the poems that have sound files.  The second update I made was to the ‘edit’ pages for a person, publication or library.  These now list the poems that the selected item is associated with.  For people there are two lists, one for people associated as authors and another for people who feature in the poems.  For libraries there are two lists, one for associated poems and another for associated publications.  Items in the lists are links that take you to the ‘edit’ page for the listed poem / publication.  Hopefully this will make it easier for the team to keep track of which items are associated with which poems.

I also met with Gary this week to discuss the new ‘My Map Data’ feature I implemented last week for the SCOSYA project.  It turns out that display of uploaded user data isn’t working in the Safari browser that Gary tends to use, so he had been unable to see how the feature works.  I’m going to have to investigate this issue but haven’t done so yet.  It’s a bit of a strange one as the data all uploads fine – it’s there in the database and is spat out in a suitable manner by the API, but for some reason Safari just won’t stick the data on the map.  Hopefully it will be a simple bug to fix.  Gary was able to use the feature by switching to Chrome and is now trying it out and will let me know of any issues he encounters.  He did encounter one issue in that the atlas display is dependent on the order of the locations when grouping ratings into averages.  The file he uploaded had locations spread across the file and this meant there were several spots for certain locations, each with different average rating colours.  A simple reordering of his spreadsheet fixed this, but it may be something I need to ensure gets sorted programmatically in future.
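One way to make the averaging independent of row order would be to total the ratings per location before calculating anything, as in the sketch below.  This is not the atlas code, and the row field names (location, rating) are assumptions for the purposes of the example.

```javascript
// Average the ratings for each location, regardless of where the
// rows for that location appear in the uploaded file.
function averageByLocation(rows) {
  const totals = new Map();
  rows.forEach(function (row) {
    const running = totals.get(row.location) || { sum: 0, count: 0 };
    running.sum += row.rating;
    running.count += 1;
    totals.set(row.location, running);
  });

  const averages = {};
  totals.forEach(function (running, location) {
    averages[location] = running.sum / running.count;
  });
  return averages;
}
```

With this approach each location ends up with exactly one average, so there would only ever be one spot per location on the map.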

I also spent a bit of time this week trying to write down a description of how the advanced attribute search will work.  I emailed this document to Gary and he is going to speak to Jennifer about it.  Gary also mentioned a new search that will be required – a search by participant rather than by location.  E.g. show me the locations where ‘participant a’ has a score of 5 for both ‘attribute x’ and ‘attribute y’.  Currently the search is just location based rather than checking that individual participants exhibit multiple features.
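The difference between the two kinds of search can be illustrated with a small sketch.  The data layout here (one row per participant, attribute and score, with the participant’s location attached) is an assumption for the example and is not how the SCOSYA data is necessarily stored.

```javascript
// Return the locations where at least one individual participant has
// the given score for every one of the listed attributes.
function locationsWithParticipantMatch(rows, attributes, score) {
  // Group the scores by participant first.
  const byParticipant = new Map();
  rows.forEach(function (row) {
    if (!byParticipant.has(row.participant)) {
      byParticipant.set(row.participant, { location: row.location, scores: {} });
    }
    byParticipant.get(row.participant).scores[row.attribute] = row.score;
  });

  // Keep the location of any participant who matches on all attributes.
  const locations = new Set();
  byParticipant.forEach(function (person) {
    const matchesAll = attributes.every(function (attribute) {
      return person.scores[attribute] === score;
    });
    if (matchesAll) {
      locations.add(person.location);
    }
  });
  return Array.from(locations).sort();
}
```

A location where one participant gives 5 for ‘attribute x’ and a different participant gives 5 for ‘attribute y’ would not be returned here, because no single participant has both.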

There was also an issue with the questionnaire upload facility this week.  For some reason the questionnaire upload was failing to upload files, even though there were no errors in the files.  After a bit of investigation it turned out that the third party API I’m using to grab the latitude and longitude was down, and without this data the upload script gave an error.  The API is back up again now, but at the time I decided to add in a fallback.  If this first API is down my script now attempts to connect to a second API to get the data.
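The fallback pattern itself is simple enough to sketch in JavaScript, whatever language the upload script is actually written in.  The ‘providers’ here are hypothetical lookup functions, each wrapping one of the third-party APIs and returning a promise; none of the names below come from the real script.

```javascript
// Try each lookup function in turn and return the first result.
// Only if every provider fails is an error passed back to the caller.
async function lookupWithFallback(place, providers) {
  let lastError = null;
  for (const provider of providers) {
    try {
      return await provider(place);
    } catch (error) {
      lastError = error; // remember the failure and try the next provider
    }
  }
  const reason = lastError ? lastError.message : 'no providers configured';
  throw new Error('All lookups failed: ' + reason);
}
```

The upload would then only fail if both services happened to be down at the same time.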

I spent the rest of the week continuing to work on the new visualisations of the Historical Thesaurus data for the Linguistic DNA project.  Last week I managed to create ‘sparklines’ for the 4000 thematic headings.  This week I added red dots to the sparklines to mark where the peak values are.  I’ve also split the ‘experiments’ page into different pages as I’m going to be trying several different approaches.  I created an initial filter for the sparklines (as displaying all 4000 on one page is probably not very helpful).  This filter allows users to do any combination of the following:

Select an average category size range (between average size ‘x’ and average size ‘y’)

Select a period in which the peak decade is reached (between decade ‘x’ and decade ‘y’)

Select a minimum percentage rise of average

Select a minimum percentage fall of average (note that as this is negative values the search will bring back everything with a value less than or equal to the value you enter).
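
As a rough illustration of how these four options can be combined, here is a minimal sketch of such a filter. The field names (`avgSize`, `peakDecade`, `rise`, `fall`) and the option names are assumptions made for the example and may not match the real data.

```javascript
// Keep the headings that satisfy every criterion that has been supplied.
// Any option left undefined is simply ignored.
function filterHeadings(headings, opts) {
  const { minSize, maxSize, peakFrom, peakTo, minRise, maxFall } = opts;
  return headings.filter((h) =>
    (minSize === undefined || h.avgSize >= minSize) &&
    (maxSize === undefined || h.avgSize <= maxSize) &&
    (peakFrom === undefined || h.peakDecade >= peakFrom) &&
    (peakTo === undefined || h.peakDecade <= peakTo) &&
    (minRise === undefined || h.rise >= minRise) &&
    // Falls are negative values, so the comparison is "less than or equal".
    (maxFall === undefined || h.fall <= maxFall)
  );
}

// Made-up sample headings, purely for illustration.
const sampleHeadings = [
  { name: "A", avgSize: 60, peakDecade: 1750, rise: 120, fall: -40 },
  { name: "B", avgSize: 40, peakDecade: 1720, rise: 80, fall: -10 },
  { name: "C", avgSize: 80, peakDecade: 1850, rise: 200, fall: -70 },
];
const filtered = filterHeadings(sampleHeadings, {
  minSize: 50,
  peakFrom: 1700,
  peakTo: 1799,
});
```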

This works pretty nicely.  For example, the following screenshot shows all headings that have an average size of 50 or more and a peak between 1700 and 1799:

With this initial filter option in place I started work on more detailed options that can identify peaks and plateaus and things like that.  The user first selects a period in which they’re interested (which can be the full date range) and this then updates the values that are possible to enter in a variety of fields by means of an AJAX call.  This new feature isn’t operational yet and I will continue to work on it next week, so I’ll have more to say about it in the next report.


Week Beginning 20th March 2017

I managed to make a good deal of progress with a number of different projects this week, which I’m pretty pleased about.  First of all there is the digital edition that I’m putting together for Bryony Randall’s ‘New Modernist Editing’ project.  Last week I completed the initial transcript of the short story and created a zoomable interface for browsing through the facsimiles.  This week I completed the transcription view, which allows the user to view the XML text, converted into HTML and styled using CSS.  It includes the notes and gaps and deletions but doesn’t differentiate between pencil and ink notes as of yet.  It doesn’t include the options to turn on / off features such as line breaks at this stage either, but it’s a start at least.  Below is a screenshot so you can see how things currently look.

The way I’ve transformed and styled the XML for display is perhaps a little unusual.  I wanted the site to be purely JavaScript powered – no server-side scripts or anything like that.  This is because the site will eventually be hosted elsewhere.  My plan was to use jQuery to pull in and process the XML for display, probably by means of an XSLT file.  But as I began to work on this I realised there was an even simpler way to do this.  With jQuery you can traverse an XML file in exactly the same way as an HTML file, so I simply pulled in the XML file, found the content of the relevant page and spat it out on screen.  I was expecting this to result in some horrible errors but… it just worked.  The XML and its tags get loaded into the HTML5 document and I can just style these using my CSS file.

I tested the site out in a variety of browsers and it works fine in everything other than Internet Explorer (Edge works, though).  This is because of the way jQuery loads the XML file and I’m hoping to find a solution to this.  I did have some nagging doubts about displaying the text in this way because I know that even though it all works it’s not valid HTML5. Sticking a bunch of <lb>, <note> and other XML tags into an HTML page works now but there’s no guarantee this will continue to work and … well, it’s not ‘right’ is it.

I emailed the other Arts Developers to see what they thought of the situation and discussed some other possible ways of handling things.  There were four options: I could leave things as they were; I could use jQuery to transform the XML tags into valid HTML5 tags; I could run my XML file through an XSLT file to convert it to HTML5 before adding it to the server, so no transformation needs to be done on the fly; or I could see if it’s possible to call an XSLT file from jQuery to transform the XML on the fly.  Graeme suggested that it would be possible to process an XSLT file using JavaScript (as is described here https://www.w3schools.com/xml/xsl_client.asp) so I started to investigate this.

I managed to get something working, but… I was reminded just how much I really dislike XSLT files.  Apologies to anyone who likes that kind of thing but my brain just finds them practically incomprehensible.  Doing even the most simple of things seems far too convoluted.  So I decided to just transform the XML into HTML5 using jQuery.  There are only a handful of tags that I need to deal with anyway.  All I do is find each occurrence of an XML tag, grab its contents, add a span after the element and then remove the element, e.g:




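//wrap the contents of each <del> in a span, then remove the original element
$("del").each(function(){
    var content = "<span class=\"del\">"+$(this).html()+"</span>";
    $(this).after(content);
    $(this).remove();
});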





I can even create a generic function that will pass the tag name and spit out a span with that tag name while removing the tag from the page.  When it comes to modifying the layout based on user preferences I’ll be able to handle that straightforwardly via jQuery too.  E.g. whether line breaks are on or off:


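//line breaks (lineBreaksOn is an assumed name for the user's preference)
$("lb").each(function(){
    if(lineBreaksOn){
        $(this).after("<br />");
    }
    else{
        $(this).after(" ");
    }
});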




For me at least this is a much easier approach than having to pass variables to an XSLT file.
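
The generic function mentioned above could look something like the following. Note that this sketch swaps in a different technique: it works on a string of markup with regular expressions rather than on the page via jQuery, purely so that it can run outside a browser. The function name is hypothetical, attributes on the original tags are dropped, and self-closing tags such as `<lb/>` are left untouched.

```javascript
// Replace every <tagName>...</tagName> pair in a markup string with
// <span class="tagName">...</span>. Assumes tagName is a plain XML name.
function tagToSpan(markup, tagName) {
  const open = new RegExp("<" + tagName + "(\\s[^>]*)?>", "g");
  const close = new RegExp("</" + tagName + "\\s*>", "g");
  return markup
    .replace(open, '<span class="' + tagName + '">')
    .replace(close, "</span>");
}

// Convert a handful of tags in one pass over a list of tag names.
const samplePage = "The <del>quick</del> brown <note>sic</note> fox<lb/>";
const convertedPage = ["del", "note"].reduce(tagToSpan, samplePage);
```

Passing the tag name in means only the list of names needs to change when a new tag has to be handled.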

I spent a day or so working on the SCOSYA atlas as well and I have now managed to complete work on an initial version of the ‘my map data’ feature.  This feature lets you upload previously downloaded files to visualise the data on the atlas.

When you download a file now there is a new row at the top that includes the URL of the query that generated the file and some explanatory text.  You can add a title and a description for your data in columns D and E of the first row as well.  You can make changes to the rating data, for example deleting rows or changing ratings and then after you’ve saved your file you can upload it to the system.

You can do this through the ‘My Map Data’ section in the ‘Atlas Display Options’.  You can either drag and drop your file into the area or click to open a file browser.  An ‘Upload log’ displays any issues with your file that the system may encounter.  After upload your file will appear in the ‘previously uploaded files’ section and the atlas will automatically be populated with your data.  You can re-download your file by pressing on the ‘download map data’ button again and you can delete your uploaded file by pressing on the appropriate ‘Remove’ button.  You can switch between viewing different datasets by pressing on the ‘view’ button next to the title.  The following screenshot shows how this works:

I tested the feature out with a few datasets, for example I swapped the latitude and longitude columns round and the atlas dutifully displayed all of the data in the sea just north of Madagascar, so things do seem to be working.  There are a couple of things to note, though.  Firstly, the CSV download files currently do not include data that is below the query threshold, so no grey spots appear on the user maps.  We made a conscious decision to exclude this data but we might now want to reinstate it.  Secondly, the display of the map is very much dependent on the URL contained in the CSV file in row 1 column B.  This is how the atlas knows whether to display an ‘or’ map or an ‘and’ map, and what other limits were placed on the data.  If the spreadsheet is altered so that the data contained does not conform to what is expected by the URL (e.g. different attributes are added or new ratings are given) then things might not display correctly.  Similarly, if anyone removes or alters that URL from the CSV files some unexpected behaviour might be encountered.

Note also that ‘my map data’ is private – you can only view your data if you’re logged in.  This means you can’t share a URL with someone.  I still need to add ‘my map data’ to the ‘history’ feature and do a few other tweaks.  I’ve just realised trying to upload ‘questionnaire locations’ data results in an error, but I don’t think we need to include the option to upload this data.

I also started working on the new visualisations for the Historical Thesaurus that will be used for the Linguistic DNA project, based on the spreadsheet data that Marc has been working on.  We have data about how many new words appeared in each thematic heading in every decade since 1000 and we’re going to use this data to visualise changes in the language.  I started by reading through all of the documentation that Marc and Fraser had prepared about the data, and then I wrote some scripts to extract the data from Marc’s spreadsheet and insert it into our online database.  Marc had incorporated some ‘sparklines’ into his spreadsheet and my first task after getting the data available was to figure out a method to replicate these sparklines using the D3.js library.  Thankfully, someone had already done this for stock price data and had created a handy walkthrough of how to do it (see http://www.tnoda.com/blog/2013-12-19).  I followed the tutorial and adapted it for our data, writing a script that created sparklines for each of the almost 4000 thematic headings we have in the system and displaying these all on a page.  It’s a lot of data (stored in a 14Mb JSON file) and as of yet it’s static, so users can’t tweak the settings to see how this affects things, but it’s a good proof of concept.  You can see a small snippet from the gigantic list below:

Other than these tasks I published this week’s new Burns song (see http://burnsc21.glasgow.ac.uk/braw-lads-on-yarrow-braes/) and I had a meeting with The People’s Voice project team where we discussed how the database of poems will function, what we’ll be doing about the transcriptions, and when I will start work on things.  It was a useful meeting and in addition to these points we identified a few enhancements I am going to make to the project’s content management system.  I also answered a query about some App development issues from elsewhere in the University and worked with Chris McGlashan to implement an Apache module that limits access to the pages held on the Historical Thesaurus server so as to prevent people from grabbing too much data.


Week Beginning 28th November 2016

I worked on rather a lot of different projects this week.  I made some updates to the WordPress site I set up last week for Carolyn Jess-Cooke’s project, such as fixing an issue with the domain’s email forwarding.  I replaced the website design of The People’s Voice project website with the new one I was working on last week, and this is now live: http://thepeoplesvoice.glasgow.ac.uk/.  I think it looks a lot more visually appealing than the previous design, and I also added in a twitter feed to the right-hand column.  I also had a phone conversation with Maria Dick about her research proposal and we have now agreed on the amount of technical effort she should budget for.

I received some further feedback about the Metre app this week from a colleague of Jean Anderson’s who very helpfully took the time to go through the resource.  As a result of this feedback I made the following changes to the app:

  1. I’ve made the ‘Home’ button bigger
  2. I’ve fixed the erroneous syllable boundary in ‘ivy’
  3. When you’re viewing the first or last page in a section a ‘home’ button now appears where otherwise there would be a ‘next’ or ‘previous’ button
  4. I’ve removed the ‘info’ icon from the start of the text.

Jean also tried to find some introductory text about the app but was unable to do so.  She’s asked if Marc can supply some, but I would imagine he’s probably too busy to do so.  I’ll have to chase this up or maybe write some text myself as it would be good to be able to get the app completed and published soon.

Also this week I had a phone conversation with Sarah Jones of the Digital Curation Centre about some help and documentation I’ve given her about AHRC Technical Plans for a workshop she’s running.  I also helped out with two other non-SCS issues that cropped up.  Firstly, the domain for TheGlasgowStory (http://theglasgowstory.com/), which was one of the first websites I worked on had expired and the website had therefore disappeared.  As it’s been more than 10 years since the project ended no-one was keeping track of the domain subscription, but thankfully after some chasing about we’ve managed to get the domain ownership managed by IT Services and the renewal fee has now been paid.  Secondly, IT Services were wanting to delete a database that belonged to Archive Services (who I used to work for) and I had to check on the status of this.

I also spent a little bit of time this week creating a few mock-up logos / banners for the Survey of Scottish Place-Names, which I’m probably going to be involved with in some capacity and I spoke to Carole about the redevelopment of the Thesaurus of Old English Teaching Package.

Also this week I finally got round to completing training in the University Web Site’s content management system, T4.  After completing training I was given access to the STELLA pages within T4 and I’ve started to rework these.  I went through the outdated list of links on the old STELLA site and have checked each one, updating URLs or removing links entirely where necessary.  I’ve added in a few new ones too (e.g. to the BYU Corpus in the ‘Corpora’ section).  This updated content now appears on the ‘English & Scots Links’ page in the University website (http://www.gla.ac.uk/schools/critical/aboutus/resources/stella/englishscotslinks/).

I also moved ‘Staff’ from its own page into a section on the STELLA T4 page to reduce left-hand navigation menu clutter.  For the same reason I’ve removed the left-hand links to SCOTS, STARN, The Glasgow Review and the Bibliography of Scottish Literature, as these are all linked to elsewhere.   I then renamed the ‘Teaching Packages’ page ‘Projects’ and have updated the content to provide direct links to the redeveloped resources first of all, and then links to all other ‘legacy’ resources, thus removing the need for the separate ‘STELLA digital resources’ page.  See here: http://www.gla.ac.uk/schools/critical/aboutus/resources/stella/projects/.  I updated the links to STELLA from my redeveloped resources so they go to this page now too.  With all of this done I decided to migrate the old ‘The Glasgow Review’ collection of papers to T4.  This was a long and tedious process, but it was satisfying to get it done.  The resource can now be found here: http://www.gla.ac.uk/schools/critical/aboutus/resources/stella/projects/glasgowreview/

In addition to the above I also worked on the SCOSYA project, looking into alternative map marker possibilities, specifically how we can show information about multiple attributes through markers on the map.  At our team meeting last week I mentioned the possibility of colour coding the selected attributes and representing them on the map using pie charts rather than circles for each map point and this week I found a library that will allow us to do such a thing, and also another form of marker called a ‘coxcomb chart’.  I created a test version with both forms, that you can see below:


Note that the map is dark because that’s how the library’s default base map looks.  Our map wouldn’t look like this.  The library is pretty extensive and has other marker types available too, as you can see from this example page: http://humangeo.github.io/leaflet-dvf/examples/html/markers.html.

So in the above example, there are four attributes selected, and these are displayed for four locations.  The coxcomb chart splits the circle into the number of attributes and then the depth of each segment reflects the average score for each attribute.  E.g. looking at ‘Arbroath’ you can see at a glance that the ‘red’ attribute has a much higher average score than the ‘green’ attribute, while the marker for ‘Airdrie’ (to the east of Glasgow) has an empty segment where ‘pink’ should be, indicating that this attribute is not present at this location.

The two pie chart examples (Barrhead and Dumbarton) are each handled differently.  For Barrhead the average score of each attribute is not taken into consideration at all.  The ‘pie’ simply shows which attributes are present.  All four are present so the ‘pie’ is split into quarters.  If one attribute wasn’t found at this location then it would be omitted and the ‘pie’ would be split into thirds.  For Dumbarton average scores are taken into consideration, which changes the size of each segment.  You can see that the ‘pink’ attribute has a higher average than the ‘red’ one.  However, I think this layout is rather confusing as at a glance it seems to suggest that there are more of the ‘pink’ attribute, rather than the average being higher.  It’s probably best not to go with this one.
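
The difference between the two pie layouts comes down to how the segment sizes are calculated. Here is a minimal sketch of both, independent of the mapping library; the data shape (an object mapping each attribute's colour to its average score, or `null` when the attribute is absent) and the function name are assumptions for illustration.

```javascript
// Build pie segments for one location.
// weighted = false: equal slices for every attribute present (Barrhead style)
// weighted = true: slice size proportional to average score (Dumbarton style)
// Assumes average scores are positive, as with ratings of 1-5.
function pieSegments(scores, weighted) {
  const present = Object.entries(scores).filter(([, avg]) => avg !== null);
  const total = weighted
    ? present.reduce((sum, [, avg]) => sum + avg, 0)
    : present.length;
  return present.map(([colour, avg]) => ({
    colour,
    degrees: (360 * (weighted ? avg : 1)) / total,
  }));
}

// Three of the four attributes are present at this made-up location.
const sampleScores = { red: 2, green: 4, blue: 4, pink: null };
const equalPie = pieSegments(sampleScores, false);
const weightedPie = pieSegments(sampleScores, true);
```

In the equal version the absent ‘pink’ attribute is omitted and the pie is split into thirds, while in the weighted version ‘red’ gets a smaller slice than ‘green’ and ‘blue’ because its average is lower.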

Both the pies and coxcombs are a fixed size no matter what the zoom level, so when you zoom far out they stay big rather than getting too small to make out.  On one hand this is good as it addresses a concern Gary raised about not being able to make out the circles when zoomed out.  However, when zoomed out the map is potentially going to get very cluttered, which will introduce new problems.  Towards the end of the week I heard back from Gary and Jennifer, and they wanted to meet with me to discuss the possibilities before I proceeded any further with this.  We have an all-day team meeting planned for next Friday, which seems like a good opportunity to discuss this.


Week Beginning 21st November 2016

This is my 200th weekly report since starting this job, which is something of a milestone!  I received emails from three members of staff who required my help this week.  Maria Dick in English Literature is putting together a project proposal and wanted some advice on a project website so I had an email conversation with her about this.  Carolyn Jess-Cooke in English Literature wanted my help to set up a website for a small unfunded project she is starting up, and I spent some of the week getting this set up.  Finally Michael Shaw of The People’s Voice project got in touch with me because he has created a new flyer for the project and wondered if the project website design could be updated to reflect some of the elements from the flyer.  I played around with some of the graphics he had sent me and came up with a new interface that I think looks a lot nicer than the current one.  I sent screenshots of this new interface to Michael and he is going to show them to the rest of the team to see whether we should replace the current website design.

On Monday this week I completed work on Android versions of the ‘ARIES’ and ‘English Grammar: An Introduction’ apps.  I took screenshots of the apps running on my Android tablet and completed the Play Store listing for the apps and then went through the various processes required to prepare the app package file for submission to the store.  This is a slightly cumbersome process, involving various command-line tools such as zipalign and certification and signing tools.  But by the afternoon I had submitted the apps to the Play Store and the following day the apps were available to download.  You can download ‘ARIES’ here: https://play.google.com/store/apps/details?id=com.gla.stella.aries  and ‘English Grammar: An Introduction’ here: https://play.google.com/store/apps/details?id=com.gla.stella.grammar.  They are free to use so give them a try.

On Tuesday I met with Jean Anderson to discuss the ‘Basics of English Metre’ app that I’m currently redeveloping.  Jean had tested out the app for me and had noted down a few areas where improvements could be made and a couple of places where there were bugs.  It was a really useful meeting and we spent some time thinking about how some aspects of the app could be made more intuitive.  I spent some time during the week implementing the changes Jean had suggested, namely:

  1. ‘Unit’ text in the header now appears on the second line. I’ve kept it the blue colour.
  2. Emphasised parts of the text are now a dark blue rather than pink. Pink is now only used for syllable boundaries.  Rhythm is now purple and foot boundaries are now the dark blue colour
  3. The incorrect ‘cross’ icon now fades away after a few seconds.
  4. When an answer is correct the ‘check answer’ button is removed
  5. The yellow highlight colour is a more muted shade
  6. Additional questions (when they appear after correctly answering the final question on some pages) now appear in their own box rather than within the box for the last question
  7. In the ‘app’ version I’ve added in an ‘information’ icon at the start of the introductory text, to try and emphasise that this text is important in case the user’s eye is drawn down to the exercises without reading the text.

Some of these changes took rather a long time to implement, especially the introduction of new colours for certain types of content as this meant updating the JSON source data file and some parts of the JavaScript code for both the ‘app’ and ‘web’ versions of the tool (these versions have some differences due to the different libraries that are used in each).  I am now just waiting for Jean to supply me with some ‘about’ text and then hopefully I’ll be able to start creating iOS and Android versions of the resource.  BTW, you can view the ‘web’ version of the resource here if you’re interested: http://arts.gla.ac.uk/stella/apps/web/metre/

On Wednesday I had a SCOSYA meeting with Jennifer and Gary to discuss where we’re at with the technical aspects of the project.  They both seem pretty satisfied with the progress I’ve made so far and our discussions mostly focussed on what developments I should focus on next.  This was good because I had pretty much implemented all of the items that I had on my to do list for the project.  After our meeting I had lots more things added, namely:

  • I’ll split the Atlas attribute drop-down lists sections by parent category
  • I will investigate making the atlas markers bigger when zoomed out or moving the location of Shetland (the latter is probably not going to be possible)
  • Average gradient will be used when one attribute alone is selected, or when multiple attributes are joined by ‘AND’ or ‘NOT’. When multiple attributes are joined with ‘OR’ we need instead to show which attribute is present or not at each location.  How to visualise this needs further investigation.  It might be possible to assign a colour to each attribute and then make each map marker a pie chart featuring the colours of each attribute that is present at each location.

E.g. Attributes A,B,C are selected, joined by ‘OR’.  These are given the colours Red, Green and Blue.  Location X has A and B so has a marker split in two – half red, half green.  Location Y has A, B and C so has a marker split into thirds, one red, one green, one blue.  Location Z only has attribute C so is coloured solid blue.  The ratings are not represented in the markers – just whether the attribute is present or not.  I’m going to see whether it might be possible to use D3.js for this.

  • I will create a new atlas search option that will allow one or more parent categories to be selected. The limit options will be the same as for individual categories but the results will be different.  It won’t be about the average ratings at each location, instead it will be a rating of the number of times each attribute within the parent category matches the specified criteria.

E.g. There are 6 categories within Negative Concord and the limit options are ‘rated by 2 or more people with a rating of 3-5’.  The atlas will count how many of these 6 categories meet the criteria at each location.  This will then be split into 5 grades (so as to be able to handle any number of categories within one or more parents).  If the 6 categories (100%) meet the criteria at Location X then the map marker will be dark (e.g. black).  If only 3 categories (50%) match then the marker will be lighter (e.g. grey).  If no categories (0%) match then the marker will be white.  This will allow us to show the density of particular forms by location.  Note that some categories may need to be excluded by the project team.  This will be handled manually once Gary knows which attributes might need taken out (e.g. because they are just not present anywhere).

  • Once I start to create the public atlas, we agreed that the ‘expert’ user interface will allow users to log in and make their own parent categories – select any number of categories then give this group a name and save it. Their created categories will then be available for them to use via the atlas interface.
  • I will create another atlas search option that will basically replicate the ‘consistency data’ search only plotting the data on the atlas. The limit options in the ‘consistency’ page will all be offered, and map markers will have 3 possible colours representing ‘low’, ‘high’ and ‘mixed’ (with what these are being set through the limit options).
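
The parent category search described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data shape (an object mapping each category to the list of ratings given at one location) is hypothetical, and rounding the proportion of matching categories onto five steps is just one possible way of producing the 5 grades.

```javascript
// Does one category meet the limit at a location?
// The defaults follow the example: rated 3-5 by 2 or more people.
function meetsCriteria(ratings, { minPeople = 2, minRating = 3, maxRating = 5 } = {}) {
  const matching = ratings.filter((r) => r >= minRating && r <= maxRating);
  return matching.length >= minPeople;
}

// Grade a location from 0 (white) to 4 (darkest).
// Rounding the proportion onto 5 steps is an assumption.
function densityGrade(categoryRatings) {
  const categories = Object.values(categoryRatings);
  if (categories.length === 0) return 0;
  const matched = categories.filter((ratings) => meetsCriteria(ratings)).length;
  return Math.round((matched / categories.length) * 4);
}

// Six made-up categories at one location; three of them meet the limit.
const locationX = {
  c1: [4, 5], c2: [3, 3, 1], c3: [5, 4, 4],
  c4: [1, 2], c5: [5], c6: [],
};
const gradeX = densityGrade(locationX);
```

With three of six categories matching the location gets the middle grade, which corresponds to the ‘grey’ marker in the example above.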

So, lots more to do for the project, and I’ll probably start on this next week.

Week Beginning 1st August 2016

This was a very short week for me as I was on holiday until Thursday.  I still managed to cram a fair amount into my two days of work, though.  On Thursday I spent quite a bit of time dealing with emails that had come in whilst I’d been away.  Carole Hough emailed me about a slight bug in the Old English version of the Mapping Metaphor website.  With the OE version all metaphorical connections are supposed to default to a strength of ‘both’ rather than ‘strong’ like with the main site.  However, when accessing data via the quick and advanced search the default was still set to ‘strong’, which was causing some confusion as this was obviously giving different results to the browse facilities, which defaulted to ‘both’.  Thankfully it didn’t take long to identify the problem and fix it.  I also had to update a logo for the ‘People’s Voice’ project website, which was another very quick fix.  Luca Guariento, who is the new developer for the Curious Travellers project, emailed me this week to ask for some advice on linking proper names in TEI documents to a database of names for search purposes and I explained to him how I am working with this for the ‘People’s Voice’ project, which has similar requirements.  I also spoke to Megan Coyer about the ongoing maintenance of her Medical Humanities Network website and fixed an issue with the MemNet blog, which I was previously struggling to update.  It would appear that the problem was being caused by an out of date version of the sFTP helper plugin, as once I updated that everything went smoothly.

I also set up a new blog for Rob Maslen, who wants to use it to allow postgrad students and others in the University to post articles about fantasy literature.  I also managed to get Rob’s Facebook group integrated with the blog for his fantasy MLitt course.  I’ve also got the web space set up for Rhona’s Edinburgh Gazetteer project, and extracted all of the images for this project too.  I spent about half of Friday working on the Technical Plan for the proposal Alison Wiggins is putting together and I now have a clearer picture of how the technical aspects of the project should fit together.  There is still quite a bit of work to do on this document, however, and a number of further questions I need to speak to Alison about before I can finish things off.  Hopefully I’ll get a first draft completed early next week, though.

The remainder of my short working week was spent on the SCOSYA project, working on updates to the CMS.  I added in facilities to create codes and attributes through the CMS, and also to browse these types of data.  This includes facilities to edit attributes and view which codes have which attributes and vice-versa.  I also began work on a new page for displaying data relating to each code – for example which questionnaires the code appears in.  There’s still work to be done here, however, and hopefully I’ll get a chance to continue with this next week.

Week Beginning 25th April 2016

I was struck down with a rather nasty cold this week, which unfortunately led to me being off sick on Wednesday and Thursday.  It hit me on Tuesday and although I struggled through the day it really affected my ability to work.  I somewhat foolishly struggled into work on the Wednesday but only lasted an hour before I had to go home.  I returned to work on the Friday but was still not feeling all that great, which did unfortunately limit what I was able to achieve.  However, I did manage to get a few things done this week.

On Monday I created ‘version 1.1’ of the Metaphoric app.  The biggest update in this version is the ‘loading’ icons that now appear on the top-level visualisation when the user presses on a category.  As detailed in previous posts, there can be a fairly lengthy delay between a user pressing on a category and the processing of yellow lines and circles completing, during which time the user has no feedback that anything is actually happening.  I had spent a long time trying to get to the bottom of this, but realised that without substantially redeveloping the way the data is processed I would be unable to speed things up.  Instead what I managed to do was add in the ‘loading’ icon to at least give a bit of feedback to users that something is going on.  I had added this to the web version of the resource before the launch last week, but I hadn’t had time to add the feature to the app versions due to the time it takes for changes to apps to be approved before they appear on the app stores.  I set to work adding this feature (plus a few other minor tweaks to the explanatory text) to the app code and then went through all of the stages that are required to build the iOS and Android versions of the apps and submit these updated builds to the App Store and the Play Store.  By lunchtime on Monday the new versions had been submitted.  By Tuesday morning version 1.1 for Android was available for download.  Apple’s approval process takes rather longer, but thankfully the iOS version was also available for download by Friday morning.  Other than updating the underlying data when the researchers have completed new batches of sample lexemes my work on the Metaphor projects is now complete.  The project celebrated this milestone with lunch in the Left Bank on Tuesday, which was very tasty, although I was already struggling with my cold by this point, alas.

Also this week I met with Michael McAuliffe, a researcher from McGill University in Canada who is working with Jane Stuart Smith to develop some speech corpus analysis tools.  Michael was hoping to get access to the SCOTS corpus files, specifically the original, uncompressed sound recordings and the accompanying transcriptions made using the PRAAT tool.  I managed to locate these files for him and he is going to try and use these files with a tool he has created in order to carry out automated analysis/extraction of vowel durations.  It’s not really an area I know much about but I’m sure it would be useful to add such data to the SCOTS materials for future research possibilities.

I also finalised my travel arrangements for the DH2016 conference and made a couple of cosmetic tweaks to the People’s Voice website interface.  Other than that I spent the rest of my remaining non-sick time this week working on the technical plan for Murray Pittock’s new project.  I’ve managed to get about a third of the way through a first draft of the plan so far, which has resulted in a number of questions that I sent on to the relevant people.  I can’t go into any detail here but the plan is shaping up pretty well and I aim to get a completed first draft to Murray next week.