Week Beginning 24th July 2017

I spent Monday this week creating the Android version of the ‘Basics of English Metre’ app, which took a little bit of time as the workflow for creating and signing apps for publication had completely changed since the last time I created an app.  The process now uses Android Studio, and once I figured out how it all worked it was actually a lot easier than the old way that used to involve several command-line tools such as zipalign.  By the end of the day I had submitted the app and on Tuesday both the iOS and the Android version had been approved and had been added to the respective stores.  You can see the iOS version here: https://itunes.apple.com/us/app/the-basics-of-english-metre/id1262414928?mt=8 and the Android version here: https://play.google.com/store/apps/details?id=com.gla.stella.metre&hl=en_GB&pcampaignid=MKT-Other-global-all-co-prtnr-py-PartBadge-Mar2515-1.  The Web version is also available here: http://www.arts.gla.ac.uk/stella/apps/web/metre/.

On Friday I met with Stuart Gillespie in English Literature to discuss a new website he requires.  He has a forthcoming monograph that will be published by OUP and he needs a website to host an ‘Annexe’ for this publication.  Initially it will just be a PDF but it might develop into something bigger with online searches later on.  Also this week I had a further email conversation with Thomas Clancy about some potential place-name projects that might use the same system as the REELS project and I had a chat with someone from the KEEP archive of Suffolk about hosting the Woolf short story digital edition I’ve created.

I spent the rest of the week getting back into the visualisations I’ve been making using data from the Historical Thesaurus for the Linguistic DNA project.  It took me some time to read through all of the documentation again, look through previously sent emails and my existing code and figure out where I’d left things off several weeks ago.  I created a little ‘to do’ list of things I need to do with the visualisations.  My ‘in progress’ versions of the sparklines went live when the new version of the Historical Thesaurus was launched at the end of June (see http://historicalthesaurus.arts.gla.ac.uk/sparklines/) but these still need quite a bit of work, firstly to speed up their generation and secondly to make sure the data actually makes sense when a period other than the full duration is specified.  The pop-ups that appear on the visualisations also need to be reworked for shorter periods, as the statistics they contain currently refer to the full duration.

I didn’t actually tackle any of the above during this week; instead I decided to look into creating a new set of visualisations for ‘Deviation in the LDNA period’.  Marc had created a sort of heatmap for this data in Excel and what I needed to do was create a dynamic, web-based version of this.  I decided to use the always useful D3.js library for these and rather handily I found an example heatmap that I could use as a basis for further work:  http://bl.ocks.org/tjdecke/5558084.  Via this I also found some very handy colour scales that I could use for the heatmap and will no doubt use for future visualisations: https://bl.ocks.org/mbostock/5577023
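As a rough illustration, the core of a D3 heatmap along the lines of that example might look something like the following.  This is just a sketch, assuming D3 v3, a hypothetical ‘data’ array of {heading, decade, value} objects and a ‘headings’ array giving the row order – not the actual project code.

// a minimal heatmap sketch: green for positive values, blue for negative values
var cellWidth = 18, cellHeight = 10;            // rectangles rather than squares
var greens = ["#e5f5e0", "#a1d99b", "#31a354"]; // light to dark for increasingly positive values
var blues  = ["#3182bd", "#9ecae1", "#deebf7"]; // dark to light: the most negative values get the darkest blue

var greenScale = d3.scale.quantize().domain([0, d3.max(data, function(d){ return d.value; })]).range(greens);
var blueScale  = d3.scale.quantize().domain([d3.min(data, function(d){ return d.value; }), 0]).range(blues);

var svg = d3.select("#heatmap").append("svg")
    .attr("width", 960)
    .attr("height", headings.length * cellHeight);

svg.selectAll(".cell")
    .data(data)
    .enter().append("rect")
    .attr("class", "cell")
    .attr("x", function(d){ return ((d.decade - 1500) / 10) * cellWidth; }) // 1500s assumed as the first decade shown
    .attr("y", function(d){ return headings.indexOf(d.heading) * cellHeight; })
    .attr("width", cellWidth - 1)
    .attr("height", cellHeight - 3) // leaves a gap between rows
    .style("fill", function(d){ return d.value >= 0 ? greenScale(d.value) : blueScale(d.value); });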

The visualisation I created is pretty much the same as the spreadsheet – increasingly darker shades of green representing positive numbers and increasingly darker shades of blue representing negative numbers.  There are columns for each decade in the LDNA period and rows for each Thematic Heading.

I’ve split the visualisation up based on the ‘S1’ code.  It defaults to just showing the ‘AA’ headings but using the drop-down list you can select another heading, e.g. ‘AB’, and the visualisation updates, replacing the data.  This calls a PHP script that generates new data from the database and formats it as a CSV file.  We could easily offer up the CSV files to people too if they want to reuse the data.
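Behind the scenes the reload is straightforward; roughly speaking it works like this (a sketch only – the endpoint name, parameter and redraw function here are hypothetical):

// when the 'S1' drop-down changes, fetch a freshly generated CSV from the
// PHP script and redraw the heatmap with the new rows
$("#s1-select").on("change", function(){
    var heading = $(this).val(); // e.g. "AB"
    d3.csv("generateHeatmapData.php?s1=" + encodeURIComponent(heading), function(error, rows){
        if(error){ console.log(error); return; }
        redrawHeatmap(rows); // hypothetical function that rebinds the data and updates the cells
    });
});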

Note that not all of the ‘S1’ Thematic Headings appear in the spreadsheet or have ‘glosses’.  E.g. ‘AJ Matter’ is not in the spreadsheet and has no ‘gloss’ so I’ve had to use ‘AJ01 Alchemy’ as the ‘group’ in the drop-down list, which is probably not right.  Where there is no ‘S1’ heading (or no ‘S1’ heading that has a ‘gloss’) the ‘S2’ heading appears instead.

Here’s a screenshot of the visualisation, showing Thematic Heading group ‘AE Animals’:

In the ‘live’ visualisation (which I can’t share the URL of yet) if you hover over a thematic heading code down the left-hand edge of the visualisation a pop-up appears containing the full ‘gloss’ so you can tell what it is you’re looking at.  Similarly, if you hover over one of the cells a pop-up appears, this time containing the decade (helpful if you’ve scrolled down and the column headings are not visible) and the actual value contained within the cell.
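The cell pop-ups themselves don’t need much code; something along these lines would do it (a sketch, using a simple absolutely positioned ‘#tooltip’ div and the hypothetical data fields from the earlier sketch):

// show the decade and value when hovering over a cell, hide it again on mouseout
d3.selectAll(".cell")
    .on("mouseover", function(d){
        d3.select("#tooltip")
            .style("left", (d3.event.pageX + 10) + "px")
            .style("top", (d3.event.pageY + 10) + "px")
            .html(d.decade + "s: " + d.value)
            .style("display", "block");
    })
    .on("mouseout", function(){
        d3.select("#tooltip").style("display", "none");
    });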

Rather than make the cells square boxes as in the example I started with, I’ve made the boxes rectangles, with the intention of giving more space between rows and hopefully making it clearer that the data should primarily be read across the way.  I have to say I rather like the look of the visualisation as it brings to mind DNA sequences, which is rather appropriate for the project.

I experimented with a version of the page that had a white background and another that had a black background.  I think the white background actually makes it easier to read the data, but the black background looks more striking and ‘DNA sequency’ so I’ve added in an option to switch from a ‘light’ theme to a ‘dark’ theme, with a nice little transition between the two.  Here’s the ‘dark’ theme selected for ‘AB Life’:
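The theme switch itself is only a few lines; something like this (a sketch, with a hypothetical ‘dark’ class and toggle button, and the actual colour transition handled in the CSS):

// toggle between the 'light' and 'dark' themes; the transition itself lives in the CSS,
// e.g. body { transition: background-color 0.5s ease, color 0.5s ease; }
$("#theme-toggle").on("click", function(){
    $("body").toggleClass("dark");
    $(this).text($("body").hasClass("dark") ? "Light theme" : "Dark theme");
});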

There’s still probably some further work to be done on this, e.g. allowing users to in some way alter the cell values based on limits applied, or allowing users to click through from a cell pop-up to some actual words in the HT or something.  I could also add in a legend that shows what values the different shades represent.  I wasn’t sure whether this was really needed as you can tell by hovering over the boxes anyway.  I’ll see what Marc and Fraser suggest when they have a chance to use the visualisations.

Week Beginning 8th May 2017

On Monday this week I had a Skype meeting with the HRI people in Sheffield (recently renamed the Digital Humanities Institute) about the Linguistic DNA project.  I demonstrated the sparklines I’ve been working on and showed them the API and talked about the heatmap that I will also be developing, and the possibility of using the ‘highcharts’ view that I am hoping to use in addition to the sparkline view.  Mike and Matt talked about the ‘workbench’ that they are going to create for the project that will allow researchers to visualise the data.  They’re going to be creating a requirements document for this soon and as part of this they will look at our API and visualisations and work out how these might also be incorporated, and if further options need to be added to our API.

I was asked to review a paper this week so I spent a bit of time reading through it and writing a review.  I also set up an account on the Burns website for Craig Lamont, who will be working on this in future and responded to a query about the SciFiMedHums bibliography database.  I also had to fix a few issues with some websites following their migration to a new server and had to get some domain details to Chris to allow him to migrate some other sites.

I also spent a few hours getting back into the digital edition I’m making for Bryony Randall’s ‘New Modernist Editing’ project.  The last feature I need to add to my digital edition is editorial corrections.  I needed to mark up all of the typos and other errors in the original text and record the ‘corrected’ versions that Bryony had supplied me with.  I also then needed to update the website to allow users to switch between one view and the other using the ‘Edition Settings’ feature.  I used the TEI <choice> element with a <sic> tag for the original ‘erroneous’ text and a <corr> tag for the ‘corrected’ text.  This is the standard TEI way of handling such things and it works rather well.  I updated the section of jQuery that processes the XML and transforms it into HTML.  When the ‘Edition Settings’ has ‘sic’ turned on and ‘corr’ turned off the original typo-filled text is displayed.  When ‘corr’ is turned on and ‘sic’ is turned off you get the edited text and when both ‘sic’ and ‘corr’ are turned on the ‘sic’ text is given a red border while the ‘corr’ text is given a green border so the user can see exactly where changes have been made and what text has been altered.  I think it works rather nicely.  See the following screenshot for an example.  I have so far only added in the markup for the first two pages but I hope to get the remaining four done next week.
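The jQuery that deals with the <choice> elements boils down to something like this (a sketch with hypothetical class names and flags, not the actual project code):

// showSic and showCorr come from the 'Edition Settings' feature
$("choice").each(function(){
    var sicText  = $(this).find("sic").html();
    var corrText = $(this).find("corr").html();
    var output;
    if(showSic && showCorr){
        // both on: show both readings, with the red / green borders applied via CSS classes
        output = "<span class=\"sic-marked\">" + sicText + "</span> <span class=\"corr-marked\">" + corrText + "</span>";
    } else if(showCorr){
        output = corrText;
    } else {
        output = sicText;
    }
    $(this).after(output);
    $(this).remove();
});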

For the rest of the week I focussed on the sparkline visualisations for the LDNA project.  Last week I created an API for the Historical Thesaurus that will allow the visualisations (or indeed anyone’s code) to pass query strings to it and receive JSON or CSV formatted data in return.  This week I created new versions of the sparkline visualisations that connected to this API.  I also had to update my ‘period cache’ data to include ‘minimum mode’ for each possible period, in addition to ‘minimum frequency of mode’.  This took quite a while to process as the script needed to generate the mode for every single possible combination of start and end decade over a thousand years.  It took a few hours to run but once it had completed I could update the ‘cache’ table in the online version of the database and update the sparkline search form so that it would pull in and display these.

I also started to make some further changes to the sparkline search form.  I updated the search type boxes so that the correct one is highlighted as soon as the user clicks anywhere in the section, rather than having to actually click on the radio button within the section.  This makes the form a lot easier to use as previously it was possible to fill in some details for ‘peak’, for example, but then forget to click on the ‘peak’ radio button, meaning that a ‘peak’ search didn’t run.  I also updated the period selectors so that instead of using two drop-down lists, one for ‘start’ decade and one for ‘end’ decade, there is now a jQuery UI slider that allows a range to be selected.  I think this is nicer to use, although I have also started to employ sliders in other parts of the form too and I worry that it’s just too many sliders.  I might have to rethink this.  It’s also possibly slightly confusing that updating the ‘period’ slider then updates the extent of the ‘peak decade’ slider, but the size of this slider remains the same as the full period range slider.  So we have one slider where the ends represent 1010s and 2000s but if you select the period 1200s to 1590s within this then the full extent of the ‘peak decade’ slider is 1200s to 1590s, even though it takes up the same width as the other slider.  See the screenshot below.  I’m thinking that this is just going to be too many sliders.
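For reference, the linked sliders are just the standard jQuery UI range slider; roughly like this (a sketch with hypothetical element IDs; the real form code is rather more involved):

// the main period slider constrains the extent of the 'peak decade' slider
$("#period-slider").slider({
    range: true, min: 1010, max: 2000, step: 10, values: [1010, 2000],
    slide: function(event, ui){
        $("#period-label").text(ui.values[0] + "s to " + ui.values[1] + "s");
        // narrow the peak decade slider to the selected period
        $("#peak-slider").slider("option", "min", ui.values[0]);
        $("#peak-slider").slider("option", "max", ui.values[1]);
        $("#peak-slider").slider("values", [ui.values[0], ui.values[1]]);
    }
});
$("#peak-slider").slider({ range: true, min: 1010, max: 2000, step: 10, values: [1010, 2000] });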

I also updated the search results page.  For ‘plateau’ searches I updated the sparklines so that the plateaus rather than peaks were highlighted with red spots.  I also increased the space devoted to each sparkline and added a border between each to make it easier to tell which text refers to which line.  I also added in an ‘info’ box that when clicked on gives you some statistics about the lines, such as the number of decades represented, the size of the largest decade and things like that.  I also added in a facility to download the CSV data for the sparklines you’re looking at.

I’ll continue with this next week.  What I really need to do is spend quite a bit of time testing out the search facilities to ensure they return the correct data.  I’m noticing some kind of quirk with the info box pop-ups, for example, that seems to sometimes display incorrect values for the lines.  There are also some issues relating to searches that do not cover the full period that I need to investigate.  And then after that I need to think about the heatmap and possibly using HighCharts as an alternative to the D3 sparklines I’m currently using.  See the screenshot above for an example of the sparklines as they currently stand.

 

 

Week Beginning 1st May 2017

This week was a shorter one than usual as Monday was the May Day holiday and I was off work on Wednesday afternoon to attend a funeral.  I worked on a variety of different tasks during the time available.  Wendy is continuing to work on the data for Mapping Metaphor and had another batch of it for me to process this week.  After dealing with the upload we now have a further nine categories marked off and a total of 12,938 metaphorical connections and 25,129 sample lexemes. I also returned to looking at integrating the new OED data into the Historical Thesaurus.  Fraser had enlisted the help of some students to manually check connections between HT and OED categories and I set up a script that will allow us to mark off a few thousand more categories as ‘checked’.  Before that Fraser needs to QA their selections and I wrote a further script that will help with this.  Hopefully next week I’ll be able to actually mark off the selections.

I also returned to SCOSYA for the first time since before Easter.  I managed to track down and fix a few bugs that Gary had identified.  Firstly, Gary was running into difficulties when importing and displaying data using the ‘my map data’ feature.  The imported data simply wouldn’t display at all in the Safari browser and after a bit of investigation I figured out why.  It turned out there was a missing square bracket in my code, which rather strangely was being silently fixed in other browsers but was causing issues in Safari.  Adding in the missing bracket fixed the issue straight away.  The other issue Gary had encountered arose when he did some work on the CSV file exported from the Atlas and then reimported it.  When he did so the import failed to upload any ratings.  It turned out that Excel had added in some extra columns to the CSV file whilst Gary was working with it and this change to the structure meant that each row failed the validation checks I had put in place.  I decided to rectify this in two ways – firstly the upload would no longer check the number of columns and secondly I added more informative error messages.  It’s all working a lot better now.

With these things out of the way I set to work on a larger update to the map.  Previously an ‘AND’ search limited results by location rather than by participant.  For example, if you did a search that said ‘show me attributes D19 AND D30, all age groups with a rating of 4-5’ a spot for a location would be returned if any combination of participants matched this.  As there are up to four participants per location it could mean that a location could be returned as meeting the criteria even if no individual participant actually met the criteria.  For example, participants A and B give D19 a score of 5 but only give D30 a score of 3, while Participants C and D only give D19 a score of 3 and give D30 a score of 5.  In combination, therefore, this location meets the criteria even though none of the participants actually do.  Gary reckoned this wasn’t the best way to handle the search and I agreed.  So, instead I updated the ‘AND’ search to check whether individuals met the criteria.  This meant a fairly large reworking of the API and a fair amount of testing, but it looks like the ‘AND’ search now works at a participant level.  And the ‘OR’ search doesn’t need to be updated because an ‘OR’ search by its very nature is looking for any combination.
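In essence the reworked ‘AND’ check boils down to the following logic (a JavaScript sketch of the idea only; the real implementation lives in the PHP API):

// a location matches only if at least one individual participant meets the
// rating criteria for every one of the selected attributes
function locationMatches(participants, attributes, minRating, maxRating){
    return participants.some(function(p){
        return attributes.every(function(attr){
            var rating = p.ratings[attr]; // e.g. p.ratings["D19"] = 5
            return rating !== undefined && rating >= minRating && rating <= maxRating;
        });
    });
}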

I spent the remainder of the week on LDNA duties, continuing to work on the ‘sparkline’ visualisations for thematic heading categories.  Most of the time was actually spent creating a new API for the Historical Thesaurus, which at this stage is used solely to output data for the visualisations.  It took a fair amount of time to get the required endpoints working, and to create a nice index page that lists the endpoints with examples of how each can be used.  It seems to be working pretty well now, though, including facilities to output the data in JSON or CSV format.  The latter proved to be slightly tricky to implement due to the way that the data for each decade was formatted.  I wanted each decade to appear in its own column, so as to roughly match the format of Marc’s original Excel spreadsheet, and this meant having to rework how the multi-level associative array was processed.

With the API in place I then set to work with creating a new version of the visualisation that actually used it.  This also took a fair amount of time as I had to deconstruct my single file test script, splitting out the JavaScript processing into its own file, handling the form submission in JavaScript, connecting to the API and all those sorts of things.  By the end of the week I had a set of visualisations that looked and functioned identically to the set I previously had, but behind the scenes they were being processed very differently, and in a much more robust and easy to manage way.  Next week I’ll continue with these as there are still lots of enhancements to add in.

Week Beginning 24th April 2017

Technology had it in for me this week.  On Monday I noticed that the SCOSYA project website and content management system was completely down and displaying a 503 error message (or in Chrome an ‘ERR_EMPTY_RESPONSE’ message).  Rather strangely parts of the site that didn’t use WordPress were still working.  This was odd because I hadn’t updated WordPress on the site for a few weeks, hadn’t changed anything to do with the site for a while and it was working the previous week.  I emailed Chris about this and he reckoned it was a WordPress issue because by renaming the ‘plugins’ folder the site came back online.  After a bit of testing I worked out that it was WordPress’s own Jetpack plugin that appeared to be causing the problem as renaming the plugin’s directory brought the site back.  This was very odd because the plugin hadn’t been updated recently either.

More alarmingly, I began to realise that the problem was not limited to the SCOSYA website but had in actual fact affected the majority of the 20 WordPress websites that I currently manage.  Even more strangely, it was affecting different websites in different ways.  Some were completely down, others broke only when certain links were clicked on and one was working perfectly, even though it had the same version of Jetpack installed as all of the others.  I spent the majority of Monday trying to figure out what was going wrong.  Was it some change in the University firewall or server settings that was blocking access to the WordPress server, resulting in my sites being knocked offline?  Was it some change at the WordPress end that was to blame?  I was absolutely stumped and resorted to disabling Jetpack (and in some cases other plugins) simply to get the sites back online until some unknown fix could be found.

Thankfully Chris managed to find the cause of the problem.  It wasn’t an issue with WordPress, Jetpack or any other plugins at all.  The problem was being caused by a corrupt library file on the server that for some unknown reason was knocking out some sites.  So despite looking like an issue that was my responsibility to fix it was in fact a server issue that I couldn’t possibly fix myself.  Thank goodness Chris managed to identify the problem and to replace the library file with a fixed version.  It’s just a pity I spent the best part of a day looking for a solution that couldn’t possibly exist, but these things happen.

My technological woes continued thanks to the Windows 10 ‘Creators Update’ that chose to install itself on Monday.  I opted to postpone the installation until I shut down my PC, which is a really nice feature.  Unfortunately it only processes a tiny part of the update as you shut down your PC and the rest sits and waits until you next turn your PC on again.  So when I came to turn my PC on again the next day I had a mandatory update that completely locked me out of my PC for 1 and a half hours.  Hugely frustrating!

And to round off my week of woes, I work from home on Thursdays and my broadband signal broke.  Well, it still works but it’s losing packets.  At one point 86% of packets were being lost, resulting in connections quitting, things failing to upload and download and everything going very slowly.  I was on the phone to Virgin Media for an hour but the problem couldn’t be fixed and they’re going to have to send an engineer out.  Yay for technology.

Now to actually discuss the work I did this week.  I spent about a day on AHRC review duties this week and apart from that practically my entire week was spent working on the LDNA project, specifically on the Thematic Heading visualisations that I last worked on before Easter.  It took me a little while to get back up to speed with the visualisations and how they fitted together but once I’d got over that hurdle I managed to make some good progress, as follows:

  1. I have identified the cause of the search results displaying a blank page.  This was caused by the script using more memory than the server was allowing it to use.  This occurred when large numbers of results needed to be processed.  Chris increased the memory limit so the script should always process the results, although it may take a while if there are thousands of sparklines.
  2. It is now possible to search for plateaus in any time period rather than just after 1500.
  3. I’ve created a database table that holds cached values of the minimums and maximums used in the search form for every possible combination of decade.  When you change the period these new values are now pulled in pretty much instantaneously so no more splines need to be reticulated.  It took rather a long time to run the script that generated the cached data (several hours in fact) but it’s great to have it all in the database and it makes the form a LOT faster to use now.
  4. Min/Max values for the rise and fall are in place, and I have also implemented the rise and fall searches. These currently just use the end of the full period as the end date so searches only really work properly for the full period.  I’ll need to properly update the searches so everything is calculated for the user’s selected period.  Something to look at next week.
  5. Plateau now uses ‘minimum frequency of mode’ rather than ‘minimum mode’ so you can say ‘show me only those categories where the mode appears 20 times’.
  6. I also investigated why red spots for the peaks were sometimes not appearing. It turns out this occurs when you select a time period within which the category’s peak doesn’t appear.  Not having the red dot when the peak is beyond the selected period is probably the best approach, so the missing red dots are actually a feature rather than a bug.

On Friday we had a big project meeting for LDNA, which involved Marc, Fraser and me Skyping from Marc’s office into a meeting being held at Sheffield.  It was interesting to hear about the progress the project has been making and its future plans, and also to see the visualisation that Matt from Sheffield’s DHI (the renamed HRI) has been working on.  I didn’t go into any details about the visualisations I’m working on as they’re not yet in a fit state to share.  But hopefully I’ll be able to send round the URL to the team soon.

It’s a bank holiday next Monday so it will be a four-day week.  I need to get back into the SCOSYA project next week and also to continue with these LDNA visualisations so it’s shaping up to be pretty busy.

Week Beginning 3rd April 2017

I split my time this week pretty evenly between two projects, the Historical Thesaurus visualisations for the Linguistic DNA project and the New Modernist Editing project.  For the Historical Thesaurus visualisations I continued with the new filter options I had started last week that would allow users to view sparklines for thematic categories that exhibited particular features, such as peaks and plateaus (as well as just viewing sparklines for all categories).  It’s been some pretty complicated work getting this operational as the filters often required a lot of calculations to be made on the fly in my PHP scripts, for example calculating standard deviations for categories within specified periods.

I corresponded with Fraser about various options throughout the week.  I have set up the filter page so that when a particular period is selected by means of select boxes a variety of minimum and maximum values throughout the page are updated by means of an AJAX call.  For example if you select the period 1500-2000 then the minimum and maximum standard deviation values for this period are calculated and displayed, so you get an idea of what values you should be entering.  See the screenshot below for an idea of how this works:
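The AJAX update is along these lines (a sketch only, with a hypothetical endpoint name and response fields):

// when the period selects change, ask the server for the min / max values
// for that period and update the labels on the form
$("#period-start, #period-end").on("change", function(){
    var start = $("#period-start").val(), end = $("#period-end").val();
    $.getJSON("getPeriodStats.php", { start: start, end: end }, function(data){
        $("#sd-min").text(data.minStandardDeviation);
        $("#sd-max").text(data.maxStandardDeviation);
    });
});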

Eventually I will replace these ‘min’ and ‘max’ values and a textbox for supplying a value with a much nicer slider widget, but that’s still to be done.  I also need to write a script that will cache the values generated by the AJAX call as currently the processing is far too slow.  I’ll write a script that will generate the values for every possible period selection and will store these in a database table, which should mean their retrieval will be pretty much instantaneous rather than taking several seconds as it currently does.

As you can see from the above screenshot, the user has to choose which type of filter they wish to apply by selecting a radio button.  This is to keep things simple, but we might change this in future.  During the week we decided to remove the ‘peak decade’ option as it seemed unnecessary to include this when the user could already select a period.

After removing the additional ‘peak decade’ selector I realised that it was actually needed.  The initial ‘select the period you’re interested in’ selector specifies the total range you’re interested in and therefore sets the start and end points of each sparkline’s x axis.  The ‘peak decade’ selector allows you to specify when in this period a peak must occur for the category to be returned.  So we need the two selectors to allow you to do something like “I want the sparklines to be from 1010s to 2000s and the peak decade to be between 1600 and 1700”.

Section 4 of the form has 5 possible options.  Currently only ‘All’, ‘peak’ and ‘plateau’ are operational and I’ll need to return to this after my Easter holiday.  ‘All’ brings back sparklines for your selected period and a selected average category size and / or minimum size of largest category.

‘Peaks’ allows you to specify a period (a subset of your initial period selection) within which a category must have its largest value for it to be returned.  You can also select a minimum percentage difference between largest and end.  The difference is a negative value and if you select -50 the filter will currently bring back only those categories where the percentage difference is between -50 and 0.  I was uncertain whether this was right and Fraser confirmed that it should instead be between -50 and -100, so that’s another thing I’ll need to fix.

You can also select a minimum standard deviation.  This is calculated based on your initial period selection.  E.g. if you say you’re interested in the period 1400-1600 then the standard deviation is calculated for each category based on the values for this period alone.

‘Plateaus’ is something that needs some further tweaking.  The option is currently greyed out until you select a date range that is from 1500 or later.  Currently you can specify a minimum mode, and the script works out the mode for each category for the selected period and if the mode is less than the supplied minimum mode the category is not returned.  I think that specifying the minimum number of times the mode occurs would be a better indicator and will need to implement this.

You can also specify the ‘minimum frequency of 5% either way from mode’.  For your specified period, 5% under the mode is currently worked out as the mode minus (the largest number of words in a decade multiplied by 0.05), and 5% over is the mode plus the same amount.  E.g. if the mode is 261 and the largest is 284 then 5% under is 246.8 and 5% over is 275.2.

For each category in your selected period the script counts the number of times the number of words in a decade falls within this range.  If the tally for a category is less than the supplied ‘minimum frequency of 5% either way from mode’ then the category is removed from the results.
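As a worked sketch of that calculation (hypothetical variable names, mirroring the figures in the example above):

// the '5% either way from mode' check for one category and one selected period
var mode = 261, largest = 284;   // figures from the example above
var margin = largest * 0.05;     // 14.2
var lower = mode - margin;       // 246.8
var upper = mode + margin;       // 275.2
// count the decades whose word totals fall within the band; 'decadeCounts' is a
// hypothetical array of the number of words in each decade of the selected period
var frequency = decadeCounts.filter(function(count){
    return count >= lower && count <= upper;
}).length;
// the category is only returned if 'frequency' meets the user-supplied minimum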

I’ve updated the display of results to include some figures about each category as well as the sparkline and the ‘gloss’.  Information about the largest decade, the average decade, the mode, the standard deviation, the frequency of decades that are within 5% under / over the mode and what this 5% under / over range is are displayed when relevant to your search.

There are some issues with the above that I still need to address.  Sometimes the sparklines are not displaying the red dots representing the largest categories, and this is definitely a bug.  Another serious bug also exists in that when some combinations of options are selected PHP encounters an error and the script terminates.  This seems to be dependent on the data and as PHP errors are turned off on the server I can’t see what the problem is.  I’m guessing it’s a divide by zero error or something like that.  I hope to get to the bottom of this soon.

For the New Modernist Editing project I spent a lot of my time preparing materials for the upcoming workshop.  I’m going to be running a two-hour lab on transcription, TEI and XML for post-graduates and I also have a further half-hour session another day where I will be demonstrating the digital edition I’ve been working on.  It’s taken some time to prepare these materials but I feel I’ve made a good start on them now.

I also met with Bryony this week to discuss the upcoming workshop and also to discuss the digital edition as it currently stands.  It was a useful meeting and after it I made a few minor tweaks to the website and a couple of fixes to the XML transcription.  I still need to add in the edited version of the text but I’m afraid that is going to have to wait until after the workshop takes place.

I also spent a little bit of time on other projects, such as reading through the proposal documentation for Jane Stuart-Smith’s new project that I am going to be involved with, and publishing the final ‘song of the week’ for the Burns project (See http://burnsc21.glasgow.ac.uk/my-wifes-a-wanton-wee-thing/).  I also spoke to Alison Wiggins about some good news she had received regarding a funding application.  I can’t say much more about it just now, though.

Also this week I fixed an issue with the SCOSYA Atlas for Gary.  The Atlas suddenly stopped displaying any content, which was a bit strange as I hadn’t made any changes to it around the time it appears to have broken.  A bit of investigation uncovered the source of the problem.  Questionnaire participants are split into two age groups – ‘young’ and ‘old’.  This is based on the participant’s age.  However, two questionnaires had been uploaded for participants whose ages did not quite fit into our ‘young’ and ‘old’ categories and they were therefore being given a null age group.  The Atlas didn’t like this and stopped working when it encountered data for these participants.  I have now updated the script to ensure the participants are within one of our age groups and I’ve also updated things so that if any other people don’t fit in the whole thing doesn’t come crashing down.

I’m going to be on holiday all next week and will be back at work the Tuesday after that (as the Monday is Easter Monday).

Week Beginning 27th March 2017

I spent about a day this week continuing to tweak the digital edition system I’m creating for the ‘New Modernist Editing’ project.  My first task was to try and get my system working in Internet Explorer, as my current way of doing things produced nothing more than a blank section of the page when using this browser.  Even though IE is now obsolete it’s still used by a lot of people and I wanted to get to the bottom of the issue.  The problem was that jQuery’s find() function when executed in IE won’t parse an XMLDocument object.  I was loading in my XML file using jQuery’s ‘get’ method, e.g.:

$.get("xml/ode.xml", function( xmlFile ) {
    //do stuff with xml file here
});

After doing some reading about XML files in jQuery it looked like you had to run a file through parseXML() in order to work with it (see http://api.jquery.com/jQuery.parseXML/) but when I did this after the ‘get’ I just got errors.  It turns out that the ‘get’ method automatically checks the file it’s getting and if it’s an XML file it automatically runs it through parseXML() behind the scenes, so the file is already an XMLDocument object by the time you get to play with it.

Information on this page (http://stackoverflow.com/questions/4998324/jquery-find-and-xml-does-not-work-in-ie) suggested an alternative way to load the XML file so that it could be read in IE but I realised that in order to get this to work I’d need to get the plain text file rather than the XMLDocument object that jQuery had created.  I therefore used the ‘ajax’ method rather than the shorthand ‘get’ method, which allowed me to specify that the returned data was to be treated as plain text and not XML:

$.ajax({
    url: "xml/ode.xml",
    dataType: "text"
}).done(function(xmlFile){
    //do stuff with xml file here
});

This meant that jQuery didn’t automatically convert the text into an XMLDocument object and I was intending to then manually call the parseXML method for non-IE browsers and do separate things just for IE.  But rather unexpectedly jQuery’s find() function and all other DOM traversal methods just worked with the plain text, in all browsers including IE!  I’m not really sure why this is, or why jQuery even needs to bother converting XML into an XMLDocument Object if it can just work with it as plain text.  But as it appears to just work I’m not complaining.

To sum up:  to use jQuery’s find() method on an XML file in IE (well, all browsers) ensure you pass plain text to the object and not an XMLDocument object.

With this issue out of the way I set to work on adding some further features to the system.  I’ve integrated editorial notes with the transcription view, using the very handy jQuery plugin Tooltipster (http://iamceege.github.io/tooltipster/).  Words or phrases that have associated notes appear with a dashed line under them and you can click on the word to view the note and click anywhere to hide the note again.  I decided to have notes appearing on click rather than on hover because I find hovering notes a bit annoying and clicking (or tapping) works better on touchscreens too.  The following screenshot shows how the notes work:
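The Tooltipster set-up itself is only a couple of lines; roughly as follows (a sketch, assuming the note text has been placed in each marked element’s title attribute when the XML is transformed):

// elements with editorial notes get a 'note' class and their note text in the
// title attribute; notes then open on click / tap rather than hover
$(".note").tooltipster({
    trigger: "click",
    contentAsHTML: true
});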

I’ve also added in an initial version of the ‘Edition Settings’ feature.  This allows the user to decide how they would like the transcription to be laid out.  If you press on the ‘Edition Settings’ button this opens a popup (well, a jQuery UI modal dialog box, to be precise) through which you can select or deselect a number of options, such as visible line breaks, whether notes are present or not etc.  Once you press the ‘save’ button your settings are remembered as you browse between pages (but resets if you close your browser or navigate somewhere else).  We’ll eventually use this feature to add in alternatively edited views of the text as well – e.g. one that corrects all of the typos. The screenshot below shows the ‘popup’ in action:
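Since browsing between pages happens within the same page via jQuery, the settings can simply live in a JavaScript object for the duration of the visit.  Here’s a sketch of the idea only (the option names, dialog IDs and renderPage() function are all hypothetical, not necessarily how the edition actually does it):

// settings persist while you browse between pages, but vanish if you leave or close the page
var editionSettings = { lineBreaks: true, notes: true, sic: true, corr: false };

$("#settings-save").on("click", function(){
    editionSettings.lineBreaks = $("#opt-linebreaks").is(":checked");
    editionSettings.notes = $("#opt-notes").is(":checked");
    $("#settings-dialog").dialog("close");     // the jQuery UI modal dialog
    renderPage(currentPage, editionSettings);  // hypothetical re-render of the current page
});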

I spent about a day on AHRC duties this week and did a few other miscellaneous tasks, such as making the penultimate Burns ‘new song of the week’ live (see http://burnsc21.glasgow.ac.uk/when-oer-the-hill-the-eastern-star/) and giving some advice to Wendy Anderson about OCR software for one of her post-grad students.  I had a chat with Kirsteen McCue about a new project she is leading that’s starting up over the summer and that I’ll need to give some input into.  I also made a couple of tweaks to the content management system for ‘The People’s Voice’ project following on from our meeting last week.  Firstly, I added a new field called ‘sound file’ to the poem table.  This can be used to add in the URL of a sound file for the poem.  I updated the ‘browse poems’ table to include a Y/N field for whether there is a sound file present so that the project team can order the table by this column and easily find all of the poems that have sound files.  The second update I made was to the ‘edit’ pages for a person, publication or library.  These now list the poems that the selected item is associated with.  For people there are two lists, one for people associated as authors and another for people who feature in the poems.  For libraries there are two lists, one for associated poems and another for associated publications.  Items in the lists are links that take you to the ‘edit’ page for the listed poem / publication.  Hopefully this will make it easier for the team to keep track of which items are associated with which poems.

I also met with Gary this week to discuss the new ‘My Map Data’ feature I implemented last week for the SCOSYA project.  It turns out that display of uploaded user data isn’t working in the Safari browser that Gary tends to use, so he had been unable to see how the feature works.  I’m going to have to investigate this issue but haven’t done so yet.  It’s a bit of a strange one as the data all uploads fine – it’s there in the database and is spat out in a suitable manner by the API, but for some reason Safari just won’t stick the data on the map.  Hopefully it will be a simple bug to fix.  Gary was able to use the feature by switching to Chrome and is now trying it out and will let me know of any issues he encounters.  He did encounter one issue in that the atlas display is dependent on the order of the locations when grouping ratings into averages.  The file he uploaded had locations spread across the file and this meant there were several spots for certain locations, each with different average rating colours.  A simple reordering of his spreadsheet fixed this, but it may be something I need to ensure gets sorted programmatically in future.

I also spent a bit of time this week trying to write down a description of how the advanced attribute search will work.  I emailed this document to Gary and he is going to speak to Jennifer about it.  Gary also mentioned a new search that will be required – a search by participant rather than by location.  E.g. show me the locations where ‘participant a’ has a score of 5 for both ‘attribute x’ and ‘attribute y’.  Currently the search is just location based rather than checking that individual participants exhibit multiple features.

There was also an issue with the questionnaire upload facility this week.  For some reason the questionnaire upload was failing to upload files, even though there were no errors in the files.  After a bit of investigation it turned out that the third party API I’m using to grab the latitude and longitude was down, and without this data the upload script gave an error.  The API is back up again now, but at the time I decided to add in a fallback.  If this first API is down my script now attempts to connect to a second API to get the data.

I spent the rest of the week continuing to work on the new visualisations of the Historical Thesaurus data for the Linguistic DNA project.  Last week I managed to create ‘sparklines’ for the 4000 thematic headings.  This week I added red dots to the sparklines to mark where the peak values are.  I’ve also split the ‘experiments’ page into different pages as I’m going to be trying several different approaches.  I created an initial filter for the sparklines (as displaying all 4000 on one page is probably not very helpful).  This filter allows users to do any combination of the following:

  1. Select an average category size range (between average size ‘x’ and average size ‘y’)
  2. Select a period in which the peak decade is reached (between decade ‘x’ and decade ‘y’)
  3. Select a minimum percentage rise of average
  4. Select a minimum percentage fall of average (note that as these are negative values the search will bring back everything with a value less than or equal to the value you enter).

This works pretty nicely; for example the following screenshot shows all headings that have an average size of 50 or more and have a peak between 1700 and 1799:

With this initial filter option in place I started work on more detailed options that can identify peaks and plateaus and things like that.  The user first selects a period in which they’re interested (which can be the full date range) and this then updates the values that are possible to enter in a variety of fields by means of an AJAX call.  This new feature isn’t operational yet and I will continue to work on it next week, so I’ll have more to say about it in the next report.

 

Week Beginning 20th March 2017

I managed to make a good deal of progress with a number of different projects this week, which I’m pretty pleased about.  First of all there is the digital edition that I’m putting together for Bryony Randall’s ‘New Modernist Editing’ project.  Last week I completed the initial transcript of the short story and created a zoomable interface for browsing through the facsimiles.  This week I completed the transcription view, which allows the user to view the XML text, converted into HTML and styled using CSS.  It includes the notes and gaps and deletions but doesn’t differentiate between pencil and ink notes as of yet.  It doesn’t include the options to turn on / off features such as line breaks at this stage either, but it’s a start at least.  Below is a screenshot so you can see how things currently look.

The way I’ve transformed and styled the XML for display is perhaps a little unusual.  I wanted the site to be purely JavaScript powered – no server-side scripts or anything like that.  This is because the site will eventually be hosted elsewhere.  My plan was to use jQuery to pull in and process the XML for display, probably by means of an XSLT file.  But as I began to work on this I realised there was an even simpler way to do this.  With jQuery you can traverse an XML file in exactly the same way as an HTML file, so I simply pulled in the XML file, found the content of the relevant page and spat it out on screen.  I was expecting this to result in some horrible errors but… it just worked.  The XML and its tags get loaded into the HTML5 document and I can just style these using my CSS file.

I tested the site out in a variety of browsers and it works fine in everything other than Internet Explorer (Edge works, though).  This is because of the way jQuery loads the XML file and I’m hoping to find a solution to this.  I did have some nagging doubts about displaying the text in this way because I know that even though it all works it’s not valid HTML5. Sticking a bunch of <lb>, <note> and other XML tags into an HTML page works now but there’s no guarantee this will continue to work and … well, it’s not ‘right’ is it.

I emailed the other Arts Developers to see what they thought of the situation and discussed some other possible ways of handling things.  I could leave things as they were; I could use jQuery to transform the XML tags into valid HTML5 tags; I could run my XML file through an XSLT file to convert it to HTML5 before adding it to the server so no transformation needs to be done on the fly; or I could see if it’s possible to call an XSLT file from jQuery to transform the XML on the fly.  Graeme suggested that it would be possible to process an XSLT file using JavaScript (as is described here https://www.w3schools.com/xml/xsl_client.asp) so I started to investigate this.

I managed to get something working, but… I was reminded just how much I really dislike XSLT files.  Apologies to anyone who likes that kind of thing but my brain just finds them practically incomprehensible.  Doing even the most simple of things seems far too convoluted.  So I decided to just transform the XML into HTML5 using jQuery.  There are only a handful of tags that I need to deal with anyway.  All I do is find each occurrence of an XML tag, grab its contents, add a span after the element and then remove the element, e.g.:

 

//deletions
$("del").each(function(){
    var content = "<span class=\"del\">" + $(this).html() + "</span>";
    $(this).after(content);
    $(this).remove();
});

 

I can even create a generic function that takes the tag name and spits out a span with that tag name while removing the tag from the page (there’s a sketch of this below, after the next example).  When it comes to modifying the layout based on user preferences I’ll be able to handle that straightforwardly via jQuery too.  E.g. whether line breaks are on or off:

 

//line breaks
$("lb").each(function(){
    if(lineBreaks == true)
        $(this).after("<br />");
    else
        $(this).after(" ");
    $(this).remove();
});

 

For me at least this is a much easier approach than having to pass variables to an XSLT file.
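The generic function mentioned above could be as small as this (a sketch):

// replace every occurrence of an XML tag with a span carrying the tag name as
// its class, so it can be styled from the CSS file
function tagToSpan(tagName){
    $(tagName).each(function(){
        $(this).after("<span class=\"" + tagName + "\">" + $(this).html() + "</span>");
        $(this).remove();
    });
}
// e.g. tagToSpan("del"); tagToSpan("note"); tagToSpan("gap");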

I spent a day or so working on the SCOSYA atlas as well and I have now managed to complete work on an initial version of the ‘my map data’ feature.  This feature lets you upload previously downloaded files to visualise the data on the atlas.

When you download a file now there is a new row at the top that includes the URL of the query that generated the file and some explanatory text.  You can add a title and a description for your data in columns D and E of the first row as well.  You can make changes to the rating data, for example deleting rows or changing ratings and then after you’ve saved your file you can upload it to the system.

You can do this through the ‘My Map Data’ section in the ‘Atlas Display Options’.  You can either drag and drop your file into the area or click to open a file browser.  An ‘Upload log’ displays any issues with your file that the system may encounter.  After upload your file will appear in the ‘previously uploaded files’ section and the atlas will automatically be populated with your data.  You can re-download your file by pressing on the ‘download map data’ button again and you can delete your uploaded file by pressing on the appropriate ‘Remove’ button.  You can switch between viewing different datasets by pressing on the ‘view’ button next to the title.  The following screenshot shows how this works:
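Behind the scenes the drag-and-drop handling uses the standard HTML5 File API; here’s a rough sketch (the element IDs and upload endpoint are hypothetical):

// read the dropped CSV file and send its contents to the server, then report
// any issues in the upload log
$("#upload-area").on("dragover", function(e){ e.preventDefault(); });
$("#upload-area").on("drop", function(e){
    e.preventDefault();
    var file = e.originalEvent.dataTransfer.files[0];
    var reader = new FileReader();
    reader.onload = function(){
        $.post("uploadMapData.php", { csv: reader.result }, function(response){
            $("#upload-log").append("<p>" + response.message + "</p>");
        }, "json");
    };
    reader.readAsText(file);
});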

I tested the feature out with a few datasets, for example I swapped the latitude and longitude columns round and the atlas dutifully displayed all of the data in the sea just north of Madagascar, so things do seem to be working.  There are a couple of things to note, though.  Firstly, the CSV download files currently do not include data that is below the query threshold, so no grey spots appear on the user maps.  We made a conscious decision to exclude this data but we might now want to reinstate it.  Secondly, the display of the map is very much dependent on the URL contained in the CSV file in row 1 column B.  This is how the atlas knows whether to display an ‘or’ map or an ‘and’ map, and what other limits were placed on the data.  If the spreadsheet is altered so that the data contained does not conform to what is expected by the URL (e.g. different attributes are added or new ratings are given) then things might not display correctly.  Similarly, if anyone removes or alters that URL from the CSV files some unexpected behaviour might be encountered.

Note also that ‘my map data’ is private – you can only view your data if you’re logged in.  This means you can’t share a URL with someone.  I still need to add ‘my map data’ to the ‘history’ feature and do a few other tweaks.  I’ve just realised trying to upload ‘questionnaire locations’ data results in an error, but I don’t think we need to include the option to upload this data.

I also started working on the new visualisations for the Historical Thesaurus that will be used for the Linguistic DNA project, based on the spreadsheet data that Marc has been working on.  We have data about how many new words appeared in each thematic heading in every decade since 1000 and we’re going to use this data to visualise changes in the language.  I started by reading through all of the documentation that Marc and Fraser had prepared about the data, and then I wrote some scripts to extract the data from Marc’s spreadsheet and insert it into our online database.  Marc had incorporated some ‘sparklines’ into his spreadsheet and my first task after getting the data available was to figure out a method to replicate these sparklines using the D3.js library.  Thankfully, someone had already done this for stock price data and had created a handy walkthrough of how to do it (see http://www.tnoda.com/blog/2013-12-19).  I followed the tutorial and adapted it for our data, writing a script that created sparklines for each of the almost 4000 thematic headings we have in the system and displaying these all on a page.  It’s a lot of data (stored in a 14Mb JSON file) and as of yet it’s static, so users can’t tweak the settings to see how this affects things, but it’s a good proof of concept.  You can see a small snippet from the gigantic list below:
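The basic sparkline drawing, adapted from that walkthrough, boils down to something like this (a sketch, assuming D3 v3 and a hypothetical ‘values’ array of {decade, count} objects for one heading, with ‘headingId’ identifying its container element):

// draw a sparkline for one thematic heading's decade counts
var width = 100, height = 20;
var x = d3.scale.linear().domain([1010, 2000]).range([0, width]);
var y = d3.scale.linear().domain([0, d3.max(values, function(d){ return d.count; })]).range([height, 0]);

var line = d3.svg.line()
    .x(function(d){ return x(d.decade); })
    .y(function(d){ return y(d.count); });

var svg = d3.select("#sparkline-" + headingId).append("svg")
    .attr("width", width).attr("height", height);

svg.append("path").datum(values).attr("class", "sparkline").attr("d", line);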

Other than these tasks I published this week’s new Burns song (see http://burnsc21.glasgow.ac.uk/braw-lads-on-yarrow-braes/) and I had a meeting with The People’s Voice project team where we discussed how the database of poems will function, what we’ll be doing about the transcriptions, and when I will start work on things.  It was a useful meeting and in addition to these points we identified a few enhancements I am going to make to the project’s content management system.  I also answered a query about some App development issues from elsewhere in the University and worked with Chris McGlashan to implement an Apache module that limits access to the pages held on the Historical Thesaurus server so as to prevent people from grabbing too much data.

 

Week Beginning 13th March 2017

At the start of the week I had to spend a little time investigating some issues with a couple of my WordPress sites, which were failing to connect to external services such as the Akismet anti-spam service.  It turned out that there had been a hardware failure on one of the servers which had affected outgoing connections. With the help of Chris I managed to get this sorted again and also took the opportunity to upgrade all of the WordPress instances I manage to the latest incremental version.  Some further maintenance issues were encountered later in the week when Chris informed me that one of our old sites (the Old English Teaching site that was set up long before I started working in my current role) had some security issues, so I fixed these as soon as I could.

I spent about two days this week working on the New Modernist Editing project for Bryony Randall.  I’m creating a digital edition of a short story by Virginia Woolf that will include a facsimile view and a transcription view, with the transcription view offering several ways to view the text by turning on or off various features of the text.  One possibility I investigated was using the digital edition system that has been developed for the Digital Vercelli Book (see http://vbd.humnet.unipi.it/beta2/#).  This is a really lovely interface for displaying facsimiles and TEI texts and the tool is available to download and reuse.  However, it offers lots of functionality that we don’t really need and it doesn’t provide the facility to tailor the transcription view based on the selection of individual features.  While it would be possible to add this feature in, I decided that it would be simpler if I just made a simple system from scratch myself.

I used OpenLayers (http://openlayers.org/) to create a zoomable interface for the facsimile view and a few lines of jQuery handled the navigation, the display of a list of thumbnails and things like that.  I also added in a section for displaying the transcription and facilities to turn the facsimile and transcription view on or off.  I’m pretty happy with how things are progressing.  Here’s a screenshot of the facsimile view as it currently stands:
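The zoomable facsimile view is essentially OpenLayers’ static image approach; a sketch of the set-up follows (the image path and pixel dimensions here are hypothetical, and I’m assuming OpenLayers 3/4 style):

// display one facsimile page as a zoomable static image in pixel coordinates
var extent = [0, 0, 2000, 2800]; // hypothetical image dimensions in pixels
var projection = new ol.proj.Projection({ code: 'facsimile', units: 'pixels', extent: extent });
var map = new ol.Map({
    target: 'facsimile-view',
    layers: [ new ol.layer.Image({
        source: new ol.source.ImageStatic({
            url: 'images/page1.jpg', // hypothetical path to the page image
            projection: projection,
            imageExtent: extent
        })
    })],
    view: new ol.View({
        projection: projection,
        center: ol.extent.getCenter(extent),
        zoom: 1,
        maxZoom: 5
    })
});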

I also worked on the transcription itself, completing an initial transcription of the six manuscript pages as TEI XML.  This included marking deleted text, notes, illegible text, gaps in the text and things like that.  It’s only a first attempt and I still might change how certain aspects are marked up, but it’s good to have something to work with now.

I uploaded this week’s new Burns song of the week (See http://burnsc21.glasgow.ac.uk/lassie-wi-the-lintwhite-locks/) and corresponded with Kirsteen about a new web resource she requires, with Craig regarding his Burns bibliography database and with Ronnie about his Burns paper database.  I also spent a few hours on further AHRC review duties and had a meeting with Marc and Fraser about future Historical Thesaurus plans and the Linguistic DNA project.  I’m going to be producing some new visualisations of the thesaurus data to show the change in language over time.  I can’t really say more about them at this stage, but I’ll start investigating the possibilities next week.

For the remainder of the week I continued to work on the ‘upload my data’ facilities for the SCOSYA project.  Last week I completed the options for uploading data files and this week I set to work on actually allowing these files to be visualised through the atlas.  It’s proving to be a tricky process to get sorted and thus far I haven’t managed to visualise anything other than a JavaScript error.  The problem is I can’t just take the ratings and data points and display them – instead I need to display them in a way that matches the selection options that the user chose when they downloaded the file – for example the Boolean operators used to join the attributes.  I’m definitely making progress with this, but it’s pretty far from being finished as of yet.  I’ll continue with this next week, though.

Week Beginning 3rd October 2016

I spent most of my time this week split between two projects:  SCOSYA and REELS.  I’ll discuss REELS first.  On Wednesday I attended a project meeting for the REELS project, the first one I’ve attended for several months as I had previously finished work on the content management system for the project and didn’t have anything left to do for the project for a while.  The project has recently appointed their PhD student so it seemed like a good time to have a full project meeting and it was good to catch up with the project again and meet the new member of staff.  A few updates and fixes to the content management system were requested at the meeting, so I spent some time this week working on these, specifically:

  1. I added a new field to the place-name table for recording whether the place-name is ‘non-core’ or not. This is to allow names like ‘Edrington’, that appear in names like ‘Edrington Castle’ but don’t appear to exist as names in their own right to be recorded.  It’s a ‘yes/no’ field as with ‘Obsolete’ and ‘Linear’ and appears on the ‘add’ and ‘edit’ place-name page underneath the ‘Linear’ option.
  2. I fixed the issue caused when selecting the same parish as both ‘current’ and ‘former’. The system was giving an error when this situation arose and I realised this is because the primary key for the table connecting place-name and parish was composed of the IDs for the relevant place-name and parish – i.e. only one join was possible for each place-name / parish pairing.  I fixed this by adding the ‘type’ field (current or former) to the primary key, thus allowing one of each type to appear for each pairing.
  3. I updated the column sorting in the ‘browse place-names’ page so that pressing on a column heading sorts the complete dataset on this column rather than just the 50 rows that are displayed at any one time. Pressing on the column header once orders it ascending and a second time orders it descending.  This required a pretty major overhaul of the ‘browse’ page as sorting had to be done on the server side rather than the client side.  Still, it works a lot better now.
  4. I added a rudimentary search facility to the ‘browse place-names’ page, which replaces the ‘select parish’ facility. The search facility allows you to select a parish and/or a code and/or supply some text that may be found in the place-name field.  All three search options may be combined – e.g. list all place-names that include the text ‘point’ that are coastal in EYM.  The text search is currently pretty basic: it matches any part of the place-name text and no wildcards can be used.  E.g. a search for ‘rock’ finds ‘Brockholes’.  Hopefully this will suffice until we’re thinking about the public website.
  5. I tested adding IPA characters to the ‘pronunciation’ field and this appears to work fine (I’m sure I would have tested this out when I originally created the CMS anyway but just thought I’d check again).

I also met separately with the project’s PhD student to go over the content management system with him.  That’s probably all I will need to do for the project until we come to develop the front end, which I’ll make a start on sometime next year.

For the SCOSYA project this week I finished work on a table in the CMS that shows consistent / conflicted data.  This can be displayed as a table in your browser or saved as a CSV to open in Excel.  The structure of the table is as Gary suggested to me last week:

One row per attribute (e.g. ‘A1’) and one column per location (e.g. Airdrie).  If all of the ratings for an attribute at a location are 4 or 5 then the cell contains ‘High’; if all of the ratings are 1 or 2 then the cell contains ‘Low’; otherwise the cell contains ‘Mixed’.  Note that if the attribute was not recorded for a location the cell is left blank.  In the browser-based table I’ve given the ‘Mixed’ cells a yellow border so you can more easily see where these appear.

I have also added in a row at the top of the table that contains the percentage of attributes for each location that are ‘Mixed’.  Note that this percentage does not take into consideration any attributes that are not recorded for a location.  E.g. if ‘Location A’ has ‘High’ for attribute A1, ‘Mixed’ for A2 and blank for A3 then the percentage mixed will be 50%.  I have also added in facilities to limit the data to the young or old age groups.  Towards the end of the week I met with Gary again and he suggested some further updates to the table, which I will hopefully implement next week.
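For what it’s worth, the classification rule and the ‘percentage mixed’ figure described above boil down to something like the following – a rough sketch in JavaScript purely for illustration (the real CMS code runs server-side and is structured differently):

    // Classify the set of ratings for one attribute at one location.
    function classify(ratings) {
      if (ratings.length === 0) return '';                               // not recorded: blank cell
      if (ratings.every(function (r) { return r >= 4; })) return 'High'; // all ratings 4 or 5
      if (ratings.every(function (r) { return r <= 2; })) return 'Low';  // all ratings 1 or 2
      return 'Mixed';
    }

    // Percentage of recorded attributes at a location that are 'Mixed';
    // blank cells (attributes not recorded) are not counted.
    function percentageMixed(cells) {
      var recorded = cells.filter(function (c) { return c !== ''; });
      if (recorded.length === 0) return 0;
      var mixed = recorded.filter(function (c) { return c === 'Mixed'; }).length;
      return Math.round((mixed / recorded.length) * 100);
    }

    // e.g. percentageMixed(['High', 'Mixed', '']) returns 50 – the blank cell is ignored.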

I also met with Gary and Jennifer this week to discuss the tricky situation with grey squares vs grey circles on the map, as discussed in last week’s post.  We decided to include grey circles (i.e. there is data but it doesn’t meet your criteria) for all locations where there is data for the specified attributes, so long as the attribute is not included with a ‘NOT’ joiner.  After the meeting I updated the map to include such grey circles and it appears to be working pretty well.  I also updated the map pop-ups to include more information about each rating, specifically the text of the attribute (as opposed to just the ID) and the age group.
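As a rough, simplified sketch (again with made-up names rather than the real atlas code), the marker decision now works along these lines:

    // Decide how a location should be drawn: a coloured circle if it meets the
    // user's criteria, a grey circle if it has data for the selected attributes
    // that doesn't meet the criteria, or nothing at all otherwise.
    // 'meetsCriteria' is assumed to have been worked out already.
    function markerStyle(location, selectedAttributes, meetsCriteria) {
      if (meetsCriteria) {
        return { shape: 'circle', colour: 'green' };
      }
      // Attributes joined with 'NOT' are ignored when deciding whether
      // 'there is data' for the location.
      var hasData = selectedAttributes.some(function (attr) {
        return attr.joiner !== 'NOT' &&
               location.ratings.some(function (r) { return r.attribute === attr.id; });
      });
      return hasData ? { shape: 'circle', colour: 'grey' } : null;   // null: draw no marker
    }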

The last big thing I did for the project was to add a ‘save image’ facility to the atlas, which allows you to save an image of the map you’re viewing, complete with all markers.  This was a pretty tricky thing to implement as the image needs to be generated in the browser by pulling in and stitching together the visible map tiles, incorporating all of the vector-based map marker data and then converting all of this into a raster image.  Thankfully I found a plugin that handled most of this (https://github.com/mapbox/leaflet-image), although it required some tweaking and customisation to get it working.  The PNG data is created as Base64-encoded text, which can then be appended to an image tag’s ‘src’ attribute.  What I really wanted was to have the image automatically work as a download rather than get displayed in the browser.  Unfortunately I didn’t manage to get this working.  I know it is possible if the Base64 data is posted to a server which then fires it back as a file for download (I did this with Mapping Metaphor), but for some reason the server was refusing to accept the data.  Also, I wanted something that worked on the client side rather than posting and then retrieving data to / from the server, which seems rather wasteful.  I managed to get the image to open in a new window, but this meant the full image data appeared in the browser’s address bar, which was horribly messy.  It also meant the user still had to manually select ‘save’.  So instead I decided to have the image open in-page, in an overlay.  The user still has to manually save the image, but it looks neater and it allows the information about image attribution to be displayed too.  The only further issue was that this didn’t work if the atlas was being viewed in ‘full screen’ mode, so I had to figure out a way of programmatically exiting full screen mode if the user pressed the ‘save image’ button when in this view.  Thankfully I found a handy function call that did just this: fullScreenApi.cancelFullScreen();
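The basic approach looks roughly like the sketch below, which is based on leaflet-image’s documented usage.  The overlay element is a hypothetical stand-in, and I’ve used the standard fullscreen API here for brevity rather than the shim’s fullScreenApi.cancelFullScreen() that the atlas actually calls:

    // Rough sketch based on leaflet-image's documented usage
    // (https://github.com/mapbox/leaflet-image); element IDs are hypothetical.
    function saveMapImage(map) {
      // Leave full screen mode first, otherwise the overlay can't be shown.
      if (document.fullscreenElement && document.exitFullscreen) {
        document.exitFullscreen();
      }
      leafletImage(map, function (err, canvas) {
        if (err) { return console.error(err); }
        var img = document.createElement('img');
        img.src = canvas.toDataURL();                     // Base64-encoded PNG data
        var overlay = document.getElementById('save-image-overlay');
        overlay.innerHTML = '';
        overlay.appendChild(img);                         // attribution text can be added here too
        overlay.style.display = 'block';
      });
    }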

Fraser contacted me on Monday to say that Lancaster have finished tagging the EEBO dataset for the Linguistic DNA project and were looking to hand this over to us.  On Tuesday Lancaster placed the zipped data on a server and I managed to grab it, extracting the 11GB zip file into 25,368 XML files (although upon closer inspection the contents aren’t really XML at all, but just tab-delimited text with a couple of wrapper tags).  I copied this to the J: drive for Fraser and Marc to look at.

I also had an email chat with Thomas Widmann at SLD about the font used for the DSL website.  Apparently this doesn’t include IPA characters, which is causing the ‘pronunciation’ field to display inconsistently (i.e. the characters the font does include are displayed in it, while the ones it doesn’t are displayed in the computer’s default sans serif font).  On my PC the difference in character size is minimal, but I think it looks worse on Thomas’s computer.  We discussed possible solutions, the easiest of which would be to simply ensure that the ‘pronunciation’ field is displayed entirely in the default sans serif font.  He said he’d get back to me about this.  I also gave a little bit of WordPress help to Maria Economou in HATII, who had an issue with a drop-down menu not working in iOS.  We upgraded her theme to one that supported responsive menus and fixed that issue pretty quickly.

I also met with Fraser and Marc on Friday to discuss the new Historical Thesaurus data that we had received from the OED people.  We are going to want to incorporate words and dates from this data into our database, which is going to involve several potentially tricky stages.  The OED data is in XML and, as I mentioned in a previous week, there is no obvious ID that can be used to link their data to ours.  Thankfully during the SAMUELS project someone had figured out how to take data in one of our columns and rework it so it matches up with the OED category IDs.  My first step will be to extract the OED data from XML and convert it into a format similar to our structure, and then create a script that will allow categories in the two datasets to be aligned.  After that we’ll need to compare the contents of the categories (i.e. the words) and work out which are new, plus which dates don’t match up.  It’s going to be a fairly tricky process but it should be fun.  On Friday afternoon I decided to tidy up the HT database by removing all of the unnecessary backup tables that I had created over the years.  I did a dump of the database before I did this in case I messed up.  It turns out I did mess up, as I accidentally deleted the ‘lexeme_search_terms’ table, which broke the HT search facility.  I then discovered that my SQL dump was incomplete and had quit downloading mid-way through without telling me.  Thankfully Chris managed to get the database from a backup file and I’ve reinstated the required table, but it was a rather stressful way to end the week!

Week Beginning 14th September 2015

I attended a project meeting and workshop for the Linguistic DNA project this week (see http://www.linguisticdna.org/ for more information and some very helpful blog posts about the project). I’m involved with the project for half a day a week over the next three years, but that effort will be bundled up into much larger chunks. At the moment there are no tasks assigned to me so I was attending the meeting mainly to meet the other participants and to hear what has been going on so far. It was really useful to meet the project team and to hear about their experiences with the data and tools that they’re working with so far. The day after the project meeting there was a workshop about the project’s methodological approach, and this also featured a variety of external speakers who are dealing or who have previously dealt with some of the same sorts of issues that the project will be facing, so it was hugely informative to hear these speakers too.

Preparing for, travelling to and attending the project meeting and workshop took up a fair chunk of my working week, but I did also manage to squeeze in some work on other projects as well. I spent about a day continuing to work on the Medical Humanities Network website, adding in the teaching materials section and facilities to manage teaching materials and the images that appear in the carousel on the homepage.  I’ve also updated the ‘spotlight on’ feature so that collections and teaching materials can appear in this section in addition to projects. That just leaves keyword management, the browse keywords feature, and organisation / unit management to complete. I also spent a small amount of time updating the registration form for Sean’s Academic Publishing event. There were a couple of issues with it that needed tweaking, such as sending users a notification email – all fairly minor things that didn’t take long to fix.

I also gave advice to a couple of members of staff on projects they are putting together: firstly Katherine Heavey and secondly Alice Jenkins. I can’t really go into any detail about their projects at this stage, but I managed to give them some (hopefully helpful) advice. I met with Fraser on Monday to collect my tickets for the project meeting and also to show him developments on the Hansard visualisations. This week I added a couple of further enhancements which enable users to add up to seven different lines to the graph. So for example you can compare ‘love’ and ‘hate’ and ‘war’ and ‘peace’ over time all on the same graph. It’s really quite a fascinating little tool to use already, but of course there’s still a lot more to implement. I had a meeting with Marc on Wednesday to discuss Hansard and a variety of other issues. Marc made some very good suggestions about the types of data that it should be possible to view on the graph (e.g. not just simple counts of terms but normalised figures too).

I also met with Susan and Magda on Monday to discuss the upcoming Scots Thesaurus launch. There are a few further enhancements I need to make before next Wednesday, such as adding in a search term variant table for search purposes. I also need to prepare a little 10 minute talk about the implementation of the Scots Thesaurus, which I will be giving at the colloquium. There’s actually quite a lot that needs to be finished off before next Wednesday and a few other tasks I need to focus on before then as well, so it could all get slightly rushed next week.