Week Beginning 24th April 2017

Technology had it in for me this week.  On Monday I noticed that the SCOSYA project website and content management system was completely down and displaying a 503 error message (or in Chrome an ‘ERR_EMPTY_RESPONSE’ message).  Rather strangely, parts of the site that didn’t use WordPress were still working.  This was odd because I hadn’t updated WordPress on the site for a few weeks, hadn’t changed anything to do with the site for a while, and it had been working the previous week.  I emailed Chris about this and he reckoned it was a WordPress issue, because renaming the ‘plugins’ folder brought the site back online.  After a bit of testing I worked out that it was WordPress’s own Jetpack plugin that appeared to be causing the problem, as renaming the plugin’s directory brought the site back.  This was very odd because the plugin hadn’t been updated recently either.

More alarmingly, I began to realise that the problem was not limited to the SCOSYA website but had in actual fact affected the majority of the 20 WordPress websites that I currently manage.  Even more strangely, it was affecting different websites in different ways.  Some were completely down, others broke only when certain links were clicked on and one was working perfectly, even though it had the same version of Jetpack installed as all of the others.  I spent the majority of Monday trying to figure out what was going wrong.  Was it some change in the University firewall or server settings that was blocking access to the WordPress server, resulting in my sites being knocked offline?  Was it some change at the WordPress end that was to blame?  I was absolutely stumped and resorted to disabling Jetpack (and in some cases other plugins) simply to get the sites back online until some unknown fix could be found.

Thankfully Chris managed to find the cause of the problem.  It wasn’t an issue with WordPress, Jetpack or any other plugins at all.  The problem was being caused by a corrupt library file on the server that, for some unknown reason, was knocking out some sites.  So despite looking like an issue that was my responsibility to fix, it was in fact a server issue that I couldn’t possibly fix myself.  Thank goodness Chris managed to identify the problem and replace the library file with a fixed version.  It’s just a pity I spent the best part of a day looking for a solution that couldn’t possibly exist, but these things happen.

My technological woes continued thanks to the Windows 10 ‘Creators Update’ that chose to install itself on Monday.  I opted to postpone the installation until I shut down my PC, which is a really nice feature.  Unfortunately it only processes a tiny part of the update as you shut down your PC; the rest sits and waits until you next turn your PC on again.  So when I came to turn my PC on the next day I had a mandatory update that completely locked me out of my PC for an hour and a half.  Hugely frustrating!

And to round off my week of woes, I work from home on Thursdays and my broadband connection broke.  Well, it still works, but it’s losing packets.  At one point 86% of packets were being lost, resulting in connections quitting, things failing to upload and download and everything going very slowly.  I was on the phone to Virgin Media for an hour but the problem couldn’t be fixed and they’re going to have to send an engineer out.  Yay for technology.

Now to actually discuss the work I did this week.  I spent about a day on AHRC review duties and apart from that practically my entire week was spent working on the LDNA project, specifically on the Thematic Heading visualisations that I last worked on before Easter.  It took me a little while to get back up to speed with the visualisations and how they fitted together, but once I’d got over that hurdle I managed to make some good progress, as follows:

  1. I have identified the cause of the search results displaying a blank page: the script was using more memory than the server was allowing it to use, which happened when large numbers of results needed to be processed.  Chris increased the memory limit so the script should now always process the results, although it may take a while if there are thousands of sparklines.
  2. It is now possible to search for plateaus in any time period rather than just after 1500.
  3. I’ve created a database table that holds cached values of the minimums and maximums used in the search form for every possible combination of decades.  When you change the period these new values are now pulled in pretty much instantaneously, so no more splines need to be reticulated.  It took rather a long time to run the script that generated the cached data (several hours in fact) but it’s great to have it all in the database and it makes the form a LOT faster to use now.
  4. Min/max values for the rise and fall are in place, and I have also implemented the rise and fall searches.  These currently just use the end of the full period as the end date, so searches only really work properly for the full period.  I’ll need to properly update the searches so everything is calculated for the user’s selected period.  Something to look at next week.
  5. Plateau now uses ‘minimum frequency of mode’ rather than ‘minimum mode’, so you can say ‘show me only those categories where the mode appears 20 times’.  A rough sketch of how this check works follows the list.
  6. I also investigated why the red dots for the peaks were sometimes not appearing.  It turns out this occurs when you select a time period within which the category’s peak doesn’t fall.  Not showing the red dot when the peak lies outside the selected period is probably the best approach, so the missing red dots are actually a feature rather than a bug.
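
To give a rough idea of how the new plateau check works, here’s the logic sketched in JavaScript (the real implementation is a PHP script on the server, and the function and variable names below are made up for illustration):

function passesPlateauFilter(decadeCounts, minModeFrequency) {
    // decadeCounts: the number of words the category has in each decade of the selected period
    var tally = {};
    decadeCounts.forEach(function (n) { tally[n] = (tally[n] || 0) + 1; });

    // The mode is the count that occurs in the most decades
    var modeFrequency = 0;
    Object.keys(tally).forEach(function (n) {
        if (tally[n] > modeFrequency) { modeFrequency = tally[n]; }
    });

    // Keep the category only if its mode occurs at least the requested number of times
    return modeFrequency >= minModeFrequency;
}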

On Friday we had a big project meeting for LDNA, which involved Marc, Fraser and me Skyping from Marc’s office into a meeting being held at Sheffield.  It was interesting to hear about the progress the project has been making and its future plans, and also to see the visualisation that Matt from Sheffield’s DHI (the renamed HRI) has been working on.  I didn’t go into any details about the visualisations I’m working on as they’re not yet in a fit state to share.  But hopefully I’ll be able to send round the URL to the team soon.

It’s a bank holiday next Monday so it will be a four-day week.  I need to get back into the SCOSYA project next week and also to continue with these LDNA visualisations so it’s shaping up to be pretty busy.

Week Beginning 17th April 2017

This was my first week back after the Easter holidays and was in fact a four-day week due to Monday being Easter Monday.  Despite being only four days long it was a pretty hectic week.  I had to travel to Edinburgh for a project meeting on Tuesday for Jane Stuart-Smith’s SPADE (SPeech Across Dialects of English) project, which will be officially starting in the next few weeks (or whenever the funding actually comes through).  This is a fairly large project involving partners in Canada and the US and our meeting was the first opportunity for members of the project to meet in person since the notification of award was given.  It was a useful meeting and we discussed what the initial stages for each project partner would be, how data would be collected and other such matters.  I’m not hugely involved in the project (4.5% of my time over 3 years) and will mainly be spending that time developing the project website, which will feature a map-based interface to summary information about the data the project will be dealing with.  The main focus of the project is the creation of an ‘integrated speech corpus analysis’ tool, which is being undertaken at McGill University in Canada, and it was interesting to learn more about this software.  I spent the bulk of Tuesday preparing for the meeting, travelling and attending the meeting.

On Thursday and Friday this week I attended two workshops that Bryony Randall had organised as part of her ‘New Modernist Editing’ AHRC Network project.  The first was for postgraduates who wanted to learn more about transcription and annotation, with specific emphasis on Modernist texts.  I was leading a two-hour hands-on session on TEI, XML and Transcription as part of the event, so I spent quite a bit of time preparing for this.  I’d started putting some materials together before my holiday and indeed I’d worked on the materials during my holiday too, but I still had quite a lot of preparation to do in the run-up to the event.  Thankfully the session went pretty well.  TEI and XML can be rather intimidating, especially for people with no previous experience of such matters, and I was really hoping to put a session together that managed to cover the basics without putting people off.  I think I managed to achieve this and by the end of the session all of the participants had managed to get a taste of TEI transcription.

The event on Friday was one of the main Network events for the project and as part of this I had a half-hour session where I was to demonstrate the digital edition I had created for a Virginia Woolf short story (see previous posts for lots more information about this).  I think the demonstration went ok, but I managed to mess things up at the start by having the wrong edition settings set up, which wasn’t so good.  I also fear that I went into too much technical detail for an audience that was not especially technically minded.  In fact at least some of them were rather against having digital editions at all.

I didn’t have time to do much other work this week, other than to catch up with emails and things like that.  I did do some further work during my holiday, however.  Firstly I had to respond to AHRC review feedback for Murray Pittock and secondly I had to give feedback on a technical plan that Meg MacDonald had asked for some help with.

With my speaking at events now complete for the foreseeable future I will be able to return to more usual work next week.  I have a lot still to do for the Historical Thesaurus visualisations for the Linguistic DNA project and a number of items still to sort out for the SCOSYA project, for a start.

Week Beginning 3rd April 2017

I split my time this week pretty evenly between two projects, the Historical Thesaurus visualisations for the Linguistic DNA project and the New Modernist Editing project.  For the Historical Thesaurus visualisations I continued with the new filter options I had started last week that would allow users to view sparklines for thematic categories that exhibited particular features, such as peaks and plateaus (as well as just viewing sparklines for all categories).  It’s been some pretty complicated work getting this operational as the filters often required a lot of calculations to be made on the fly in my PHP scripts, for example calculating standard deviations for categories within specified periods.

I corresponded with Fraser about various options throughout the week.  I have set up the filter page so that when a particular period is selected by means of select boxes a variety of minimum and maximum values throughout the page are updated by means of an AJAX call.  For example if you select the period 1500-2000 then the minimum and maximum standard deviation values for this period are calculated and displayed, so you get an idea of what values you should be entering.  See the screenshot below for an idea of how this works:

Eventually I will replace these ‘min’ and ‘max’ values and the textbox for supplying a value with a much nicer slider widget, but that’s still to be done.  I also need to write a script that will cache the values generated by the AJAX call as currently the processing is far too slow.  I’ll write a script that will generate the values for every possible period selection and will store these in a database table, which should mean their retrieval will be pretty much instantaneous rather than taking several seconds as it currently does.
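
As a rough sketch of what that caching script might do (illustrative JavaScript only – the real script is PHP, and ‘headings’, ‘valuesForPeriod’ and ‘standardDeviation’ are assumed helpers rather than anything that actually exists in the code; a version of standardDeviation is sketched a little further down):

// Decades covered by the data, 1010s to 2000s
var decades = [];
for (var d = 1010; d <= 2000; d += 10) { decades.push(d); }

var cache = [];
decades.forEach(function (start) {
    decades.forEach(function (end) {
        if (end < start) { return; }
        // Work out the measure (here just standard deviation) for every heading in this period
        var stdDevs = headings.map(function (h) {
            return standardDeviation(valuesForPeriod(h, start, end));
        });
        cache.push({
            start: start,
            end: end,
            minStdDev: Math.min.apply(null, stdDevs),
            maxStdDev: Math.max.apply(null, stdDevs)
            // ...plus the min / max of the other measures shown on the form
        });
    });
});
// Each row of 'cache' would then be written to a database table keyed on (start, end),
// so the AJAX call only ever has to do a simple lookup.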

As you can see from the above screenshot, the user has to choose which type of filter they wish to apply by selecting a radio button.  This is to keep things simple, but we might change this in future.  During the week we decided to remove the ‘peak decade’ option as it seemed unnecessary to include this when the user could already select a period.

After removing the additional ‘peak decade’ selector I realised that it was actually needed.  The initial ‘select the period you’re interested in’ selector specifies the total range you’re interested in and therefore sets the start and end points of each sparkline’s x axis.  The ‘peak decade’ selector allows you to specify when in this period a peak must occur for the category to be returned.  So we need the two selectors to allow you to do something like “I want the sparklines to be from 1010s to 2000s and the peak decade to be between 1600 and 1700”.

Section 4 of the form has 5 possible options.  Currently only ‘All’, ‘peak’ and ‘plateau’ are operational and I’ll need to return to this after my Easter holiday.  ‘All’ brings back sparklines for your selected period and a selected average category size and / or minimum size of largest category.

‘Peaks’ allows you to specify a period (a subset of your initial period selection) within which a category must have its largest value for it to be returned.  You can also select a minimum percentage difference between largest and end.  The difference is a negative value, and if you select -50 the filter currently only brings back categories where the percentage difference is between -50 and 0.  I was uncertain whether this was right and Fraser confirmed that it should instead be between -50 and -100, so that’s another thing I’ll need to fix.

You can also select a minimum standard deviation.  This is calculated based on your initial period selection.  E.g. if you say you’re interested in the period 1400-1600 then the standard deviation is calculated for each category based on the values for this period alone.
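
To make these two checks concrete, here’s the logic sketched in JavaScript (the real code is PHP; the exact formula for the percentage difference isn’t spelled out here, and whether a population or sample standard deviation is used isn’t stated, so treat this as an illustration rather than the actual implementation):

// Population standard deviation of a category's decade values within the selected period
function standardDeviation(values) {
    var mean = values.reduce(function (a, b) { return a + b; }, 0) / values.length;
    var variance = values.reduce(function (sum, v) {
        return sum + Math.pow(v - mean, 2);
    }, 0) / values.length;
    return Math.sqrt(variance);
}

// Percentage difference between the largest decade and the final decade of the period.
// Following Fraser's correction, a supplied value of -50 should match anything from
// -50 down to -100, i.e. differences less than or equal to the supplied value.
function passesPeakDropCheck(largestValue, endValue, minPctDifference) {
    var pctDifference = ((endValue - largestValue) / largestValue) * 100; // negative when the category falls away
    return pctDifference <= minPctDifference;
}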

‘Plateaus’ is something that needs some further tweaking.  The option is currently greyed out until you select a date range that is from 1500 or later.  Currently you can specify a minimum mode: the script works out the mode for each category for the selected period, and if the mode is less than the supplied minimum mode the category is not returned.  I think that specifying the minimum number of times the mode occurs would be a better indicator and will need to implement this.

You can also specify the ‘minimum frequency of 5% either way from mode’.  For your specified period this currently works out ‘5% under’ the mode as the largest number of words in a decade multiplied by 0.05, subtracted from the mode; ‘5% over’ is the same value added onto the mode.  E.g. if the mode is 261 and the largest is 284 then 5% under is 246.8 and 5% over is 275.2.

For each category in your selected period the script counts the number of times the number of words in a decade falls within this range.  If the tally for a category is less than the supplied ‘minimum frequency of 5% either way from mode’ then the category is removed from the results.
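
Sketching that calculation out (illustrative JavaScript again rather than the actual PHP), and using the figures from the example above – a mode of 261 and a largest decade of 284 – gives a band of 246.8 to 275.2:

function frequencyNearMode(decadeCounts, mode, largest) {
    var margin = largest * 0.05;   // 5% of the largest number of words in a decade (284 * 0.05 = 14.2)
    var lower = mode - margin;     // '5% under' the mode (261 - 14.2 = 246.8)
    var upper = mode + margin;     // '5% over' the mode (261 + 14.2 = 275.2)
    // Count the decades whose word totals fall within this range
    return decadeCounts.filter(function (n) {
        return n >= lower && n <= upper;
    }).length;
}

// The category is kept only if this tally meets the supplied minimum, e.g.
// frequencyNearMode(counts, 261, 284) >= minimumFrequency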

I’ve updated the display of results to include some figures about each category as well as the sparkline and the ‘gloss’.  When relevant to your search, the results now display the largest decade, the average decade, the mode, the standard deviation, the number of decades that fall within 5% under / over the mode and what this 5% under / over range is.

There are some issues with the above that I still need to address.  Sometimes the sparklines are not displaying the red dots representing the largest categories, and this is definitely a bug.  Another serious bug also exists in that when some combinations of options are selected PHP encounters an error and the script terminates.  This seems to be dependent on the data and as PHP errors are turned off on the server I can’t see what the problem is.  I’m guessing it’s a divide by zero error or something like that.  I hope to get to the bottom of this soon.

For the New Modernist Editing project I spent a lot of my time preparing materials for the upcoming workshop.  I’m going to be running a two-hour lab on transcription, TEI and XML for post-graduates and I also have a further half-hour session another day where I will be demonstrating the digital edition I’ve been working on.  It’s taken some time to prepare these materials but I feel I’ve made a good start on them now.

I also met with Bryony this week to discuss the upcoming workshop and also to discuss the digital edition as it currently stands.  It was a useful meeting and after it I made a few minor tweaks to the website and a couple of fixes to the XML transcription.  I still need to add in the edited version of the text but I’m afraid that is going to have to wait until after the workshop takes place.

I also spent a little bit of time on other projects, such as reading through the proposal documentation for Jane Stuart-Smith’s new project that I am going to be involved with, and publishing the final ‘song of the week’ for the Burns project (See http://burnsc21.glasgow.ac.uk/my-wifes-a-wanton-wee-thing/).  I also spoke to Alison Wiggins about some good news she had received regarding a funding application.  I can’t say much more about it just now, though.

Also this week I fixed an issue with the SCOSYA Atlas for Gary.  The Atlas suddenly stopped displaying any content, which was a bit strange as I hadn’t made any changes to it around the time it appears to have broken.  A bit of investigation uncovered the source of the problem.  Questionnaire participants are split into two age groups – ‘young’ and ‘old’ – based on the participant’s age.  However, two questionnaires had been uploaded for participants whose ages did not quite fit into our ‘young’ and ‘old’ categories and they were therefore being given a null age group.  The Atlas didn’t like this and stopped working when it encountered data for these participants.  I have now updated the script to ensure the participants are within one of our age groups, and I’ve also updated things so that if any other participants don’t fit into a group the whole thing doesn’t come crashing down.

I’m going to be on holiday all next week and will be back at work the Tuesday after that (as the Monday is Easter Monday).

Week Beginning 27th March 2017

I spent about a day this week continuing to tweak the digital edition system I’m creating for the ‘New Modernist Editing’ project.  My first task was to try and get my system working in Internet Explorer, as my current way of doing things produced nothing more than a blank section of the page when using this browser.  Even though IE is now obsolete it’s still used by a lot of people and I wanted to get to the bottom of the issue.  The problem was that jQuery’s find() function, when executed in IE, won’t work with an XMLDocument object.  I was loading in my XML file using jQuery’s ‘get’ method, e.g.:

$.get("xml/ode.xml", function( xmlFile ) {
    // do stuff with the XML file here
});

After doing some reading about XML files in jQuery it looked like you had to run a file through parseXML() in order to work with it (see http://api.jquery.com/jQuery.parseXML/) but when I did this after the ‘get’ I just got errors.  It turns out that the ‘get’ method automatically checks the file it’s getting and, if it’s an XML file, automatically runs it through parseXML() behind the scenes, so the file is already an XMLDocument object by the time you get to play with it.

Information on this page (http://stackoverflow.com/questions/4998324/jquery-find-and-xml-does-not-work-in-ie) suggested an alternative way to load the XML file so that it could be read in IE, but I realised that in order to get this to work I’d need to get the plain text file rather than the XMLDocument object that jQuery had created.  I therefore used the ‘ajax’ method rather than the shorthand ‘get’ method, which allowed me to specify that the returned data was to be treated as plain text and not XML:

$.ajax({
    url: "xml/ode.xml",
    dataType: "text"
}).done(function( xmlFile ) {
    // do stuff with the XML file here
});

This meant that jQuery didn’t automatically convert the text into an XMLDocument object and I was intending to then manually call the parseXML method for non-IE browsers and do separate things just for IE.  But rather unexpectedly jQuery’s find() function and all other DOM traversal methods just worked with the plain text, in all browsers including IE!  I’m not really sure why this is, or why jQuery even needs to bother converting XML into an XMLDocument Object if it can just work with it as plain text.  But as it appears to just work I’m not complaining.

To sum up:  to use jQuery’s find() method on an XML file in IE (well, all browsers) ensure you pass plain text to the object and not an XMLDocument object.
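
For reference, the whole pattern boils down to something like the following (the element name ‘note’ is just a placeholder for illustration – substitute whatever your markup actually uses):

$.ajax({
    url: "xml/ode.xml",
    dataType: "text"
}).done(function (xmlFile) {
    // find() works on the returned text in all browsers, including IE
    var notes = $(xmlFile).find("note");
    console.log(notes.length + " notes found");
});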

With this issue out of the way I set to work on adding some further features to the system.  I’ve integrated editorial notes with the transcription view, using the very handy jQuery plugin Tooltipster (http://iamceege.github.io/tooltipster/).  Words or phrases that have associated notes appear with a dashed line under them and you can click on the word to view the note and click anywhere to hide the note again.  I decided to have notes appearing on click rather than on hover because I find hovering notes a bit annoying and clicking (or tapping) works better on touchscreens too.  The following screenshot shows how the notes work:
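
As a rough illustration of the setup (the selector below is hypothetical rather than the project’s actual code; by default Tooltipster reads the note text from each element’s title attribute):

$('.has-note').tooltipster({
    trigger: 'click'   // open the note on click / tap rather than on hover
});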

I’ve also added in an initial version of the ‘Edition Settings’ feature.  This allows the user to decide how they would like the transcription to be laid out.  If you press on the ‘Edition Settings’ button this opens a popup (well, a jQuery UI modal dialog box, to be precise) through which you can select or deselect a number of options, such as visible line breaks, whether notes are present or not etc.  Once you press the ‘save’ button your settings are remembered as you browse between pages (but they reset if you close your browser or navigate somewhere else).  We’ll eventually use this feature to add in alternatively edited views of the text as well – e.g. one that corrects all of the typos.  The screenshot below shows the ‘popup’ in action:
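
I won’t go into how the settings are stored here, but since they survive moving between pages of the edition yet reset when you leave the site or close the browser, one plausible approach (a sketch only, with made-up option names and selectors, not the actual implementation) is simply to hold them in an in-memory object while page content is swapped in via AJAX:

var editionSettings = { showLineBreaks: true, showNotes: true };

$('#edition-settings-save').on('click', function () {
    editionSettings.showLineBreaks = $('#opt-linebreaks').is(':checked');
    editionSettings.showNotes = $('#opt-notes').is(':checked');
    applySettings();
});

function applySettings() {
    // Show or hide the relevant parts of the transcription
    $('.linebreak').toggle(editionSettings.showLineBreaks);
    $('.note').toggle(editionSettings.showNotes);
}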

I spent about a day on AHRC duties this week and did a few other miscellaneous tasks, such as making the penultimate Burns ‘new song of the week’ live (see http://burnsc21.glasgow.ac.uk/when-oer-the-hill-the-eastern-star/) and giving some advice to Wendy Anderson about OCR software for one of her post-grad students.  I had a chat with Kirsteen McCue about a new project she is leading that’s starting up over the summer and that I’ll need to give some input into.  I also made a couple of tweaks to the content management system for ‘The People’s Voice’ project following on from our meeting last week.  Firstly, I added a new field called ‘sound file’ to the poem table.  This can be used to add in the URL of a sound file for the poem.  I updated the ‘browse poems’ table to include a Y/N field for whether there is a sound file present, so that the project team can order the table by that column and easily find all of the poems that have sound files.  The second update I made was to the ‘edit’ pages for a person, publication or library.  These now list the poems that the selected item is associated with.  For people there are two lists, one for people associated as authors and another for people who feature in the poems.  For libraries there are two lists, one for associated poems and another for associated publications.  Items in the lists are links that take you to the ‘edit’ page for the listed poem / publication.  Hopefully this will make it easier for the team to keep track of which items are associated with which poems.

I also met with Gary this week to discuss the new ‘My Map Data’ feature I implemented last week for the SCOSYA project.  It turns out that the display of uploaded user data wasn’t working in the Safari browser that Gary tends to use, so he had been unable to see how the feature works.  I’m going to have to investigate this issue but haven’t done so yet.  It’s a bit of a strange one as the data all uploads fine – it’s there in the database and is spat out in a suitable manner by the API, but for some reason Safari just won’t stick the data on the map.  Hopefully it will be a simple bug to fix.  Gary was able to use the feature by switching to Chrome and is now trying it out and will let me know of any issues he encounters.  He did encounter one issue in that the atlas display is dependent on the order of the locations when grouping ratings into averages.  The file he uploaded had locations spread across the file and this meant there were several spots for certain locations, each with different average rating colours.  A simple reordering of his spreadsheet fixed this, but it may be something I need to ensure gets sorted programmatically in future.
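
One way to make the grouping independent of row order would be to bucket the ratings by location before averaging, along these lines (a sketch only, with made-up field names rather than the actual SCOSYA data structure):

function averageByLocation(rows) {
    var groups = {};
    rows.forEach(function (row) {
        // Collect every rating under its location, regardless of where it appears in the file
        (groups[row.location] = groups[row.location] || []).push(row.rating);
    });
    return Object.keys(groups).map(function (loc) {
        var ratings = groups[loc];
        var avg = ratings.reduce(function (a, b) { return a + b; }, 0) / ratings.length;
        return { location: loc, average: avg };
    });
}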

I also spent a bit of time this week trying to write down a description of how the advanced attribute search will work.  I emailed this document to Gary and he is going to speak to Jennifer about it.  Gary also mentioned a new search that will be required – a search by participant rather than by location.  E.g. show me the locations where ‘participant a’ has a score of 5 for both ‘attribute x’ and ‘attribute y’.  Currently the search is just location based rather than checking that individual participants exhibit multiple features.

There was also an issue with the questionnaire upload facility this week.  For some reason the questionnaire upload was failing to upload files, even though there were no errors in the files.  After a bit of investigation it turned out that the third party API I’m using to grab the latitude and longitude was down, and without this data the upload script gave an error.  The API is back up again now, but at the time I decided to add in a fallback.  If this first API is down my script now attempts to connect to a second API to get the data.
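
The fallback logic itself is straightforward: try the first service and only call the second if that fails.  Sketched out below in jQuery-style JavaScript purely for illustration – the real lookup happens in the PHP upload script, and the endpoint URLs and field names here are placeholders, not the actual services used:

function geocode(location, onSuccess, onFailure) {
    $.getJSON('https://primary-geocoder.example/lookup', { q: location })
        .done(function (data) { onSuccess(data.latitude, data.longitude); })
        .fail(function () {
            // Primary API is down or returned an error - try the fallback service
            $.getJSON('https://fallback-geocoder.example/lookup', { q: location })
                .done(function (data) { onSuccess(data.latitude, data.longitude); })
                .fail(onFailure);
        });
}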

I spent the rest of the week continuing to work on the new visualisations of the Historical Thesaurus data for the Linguistic DNA project.  Last week I managed to create ‘sparklines’ for the 4000 thematic headings.  This week I added red dots to the sparklines to mark where the peak values are.  I’ve also split the ‘experiments’ page into different pages as I’m going to be trying several different approaches.  I created an initial filter for the sparklines (as displaying all 4000 on one page is probably not very helpful).  This filter allows users to do any combination of the following:

Select an average category size range (between average size ‘x’ and average size ‘y’)

Select a period in which the peak decade is reached (between decade ‘x’ and decade ‘y’)

Select a minimum percentage rise of average

Select a minimum percentage fall of average (note that as these are negative values the search will bring back everything with a value less than or equal to the value you enter).  A rough sketch of how these filters might combine is included below.
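
Sketched as code, the combined filter boils down to something like this (illustrative JavaScript only – the real filtering is done in PHP, the measures are assumed to be precomputed for each heading, the exact definitions of the rise and fall percentages aren’t spelled out here, and in practice any combination of the criteria can be left out):

function matchesFilter(heading, f) {
    return heading.averageSize >= f.minAvgSize &&
        heading.averageSize <= f.maxAvgSize &&
        heading.peakDecade >= f.peakFrom &&
        heading.peakDecade <= f.peakTo &&
        heading.rise >= f.minRise &&
        // fall values are negative, so 'minimum fall' matches anything
        // less than or equal to the value entered
        heading.fall <= f.minFall;
}

var results = headings.filter(function (h) { return matchesFilter(h, filterValues); });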

This works pretty nicely; for example the following screenshot shows all headings that have an average size of 50 or more and have a peak between 1700 and 1799:

With this initial filter option in place I started work on more detailed options that can identify peaks and plateaus and things like that.  The user first selects a period in which they’re interested (which can be the full date range) and this then updates the values that are possible to enter in a variety of fields by means of an AJAX call.  This new feature isn’t operational yet and I will continue to work on it next week, so I’ll have more to say about it in the next report.