Week Beginning 8th January 2018

This was my first full, five-day week back after the Christmas holidays, and I spent the majority of it continuing to work on the new timeline visualisation for the Historical Thesaurus, plus some other interface updates that were proposed during the meeting Marc, Fraser and I had last week.  I managed to make quite a bit of progress on the visualisation and also on the way in which dates are stored in the underlying database.  The HT has many different date fields, but the main ones are ‘firstd’, ‘midd’, and ‘lastd’.  Each of these has a second ‘b’ field where a potential second, later date can be added, which gives (for example) ‘1400/50’ as a date.  These ‘b’ fields generally (but not always) contain dates as two-digit or even one-digit numbers, so in the previous example the ‘b’ field just holds ‘50’ and not ‘1450’.  If a date was ‘1400/6’ the ‘b’ field might just have a ‘6’ in it, while if a date was ‘1395/1410’ all four digits would be stored in the ‘b’ field.  The current setup is therefore inconsistent, which makes it difficult for scripts to work with, so we decided to update the ‘b’ fields to always use four digits.  I wrote a script to do this, and successfully updated all of the ‘b’ dates.  I then updated the timeline visualisation to always use the ‘b’ date for the end date of a timeline, if one exists.  I also wrote two further scripts: one to check that all ‘b’ dates are actually after the main dates (it turns out there are a handful that aren’t, or are identical to the main date), and the other to list all of the words that have a ‘b’ date that is less than five years away from the main date, as in such cases it is likely that the date should actually just be a ‘circa’ instead.
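To illustrate the ‘b’ date clean-up described above, here is a minimal sketch in Python.  It is not the script I actually ran, and the function names are made up for the example.  It assumes the abbreviated ‘b’ value simply replaces the trailing digits of the main date, and it keeps the ‘b’ value as a string so that a leading zero (e.g. ‘05’) is not lost.

```python
from typing import Optional


def expand_b_date(main: int, b: str) -> Optional[int]:
    """Expand an abbreviated 'b' date to a full year using the main date.

    Assumption: a short 'b' value replaces the trailing digits of the
    main date, so 1400 with '50' becomes 1450.
    """
    b = b.strip()
    if not b:
        return None  # no 'b' date recorded
    if len(b) >= len(str(main)):
        return int(b)  # already a full year, e.g. 1395/1410
    scale = 10 ** len(b)
    return (main // scale) * scale + int(b)


def classify_b_date(main: int, b_full: int, circa_window: int = 5) -> str:
    """Flag 'b' dates that look wrong or that may really be 'circa' dates."""
    if b_full <= main:
        return "not_after_main"  # earlier than, or identical to, the main date
    if b_full - main < circa_window:
        return "possible_circa"  # less than five years after the main date
    return "ok"
```

With these definitions `expand_b_date(1400, "50")` gives 1450 and `expand_b_date(1400, "6")` gives 1406.  A pair such as 1398 with ‘02’ would expand to 1302 under this simple assumption, which is exactly the sort of row the `not_after_main` check would then pick up for manual review.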

I also wrote some further checking scripts for dates, including one to pull out all occasions where the fields connecting dates together (which can either be a dash to indicate a range or a plus to indicate separate occurrences) have two dashes in a row, or where there is a final dash where the word is set as ‘current’.  These are probably errors as it means two ranges are next to each other, which shouldn’t happen.  E.g. ‘1200-1400-1600’ or ‘1600-1800-’ don’t make much sense.  Another date checking script I wrote was to find all words that have a ‘plus’ connecting dates together (e.g. ‘1400 + 1800’) where the amount of time between the two dates is less than 150 years.  There was a rule when compiling the HT that if there were fewer than 150 years between dates these shouldn’t be treated as a ‘plus’ gap.  There were quite a few words that had a gap of less than 150 years and I sent the resulting output of my script to Fraser and Marc for them to check through.
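The two connector checks can be sketched roughly as follows.  This is only an illustration: it assumes each word’s dates have already been pulled into a list of years plus a list of connectors, where `connectors[i]` is the connector that follows `dates[i]` (so a trailing dash is just a final connector with no date after it).  The function names and this data layout are hypothetical, not how the HT database is actually structured.

```python
from typing import List, Tuple


def has_adjacent_ranges(connectors: List[str]) -> bool:
    """True if two dashes appear in a row, e.g. '1200-1400-1600' or '1600-1800-'."""
    return any(a == "-" and b == "-" for a, b in zip(connectors, connectors[1:]))


def short_plus_gaps(
    dates: List[int], connectors: List[str], min_gap: int = 150
) -> List[Tuple[int, int]]:
    """Return the date pairs joined by '+' that are less than min_gap years apart."""
    gaps = []
    for i, connector in enumerate(connectors):
        if connector == "+" and i + 1 < len(dates):
            if dates[i + 1] - dates[i] < min_gap:
                gaps.append((dates[i], dates[i + 1]))
    return gaps
```

For example, `has_adjacent_ranges(["-", "-"])` flags both ‘1200-1400-1600’ and ‘1600-1800-’, while `short_plus_gaps([1400, 1500], ["+"])` returns the pair (1400, 1500) because the gap is only 100 years.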

Turning to the timeline script itself, I fixed a couple of outstanding issues from last week.  The first was that the pop-ups were not appearing in the right place for words that had multiple date periods.  This was because I had assigned an ID to the word row rather than to each individual block of time.  I had to update the way in which I was generating the data for the timeline, and tweak the timeline JavaScript a bit, but thankfully I got the pop-ups working properly.  I had also noticed that some ‘dot’ end dates were extending up to ‘current’, which meant something was wrong with my date processing algorithm.  It turned out I’d missed out an equals sign in my code, and adding this in sorted the issue.
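The idea behind the pop-up fix, giving every block of time its own ID rather than one ID per word row, can be sketched like this.  The field names (`label`, `times`, `starting_time`, `ending_time`) and the ID format are assumptions made for the example rather than the actual structure my script outputs.

```python
from typing import Dict, List, Tuple


def build_timeline_row(word_id: int, word: str, periods: List[Tuple[int, int]]) -> Dict:
    """Build one timeline row in which each date period has a unique ID.

    A pop-up can then be anchored to the specific block that was clicked,
    rather than to the row as a whole.
    """
    return {
        "label": word,
        "times": [
            {
                "id": f"w{word_id}-b{index}",  # hypothetical ID scheme: word ID plus block index
                "starting_time": start,
                "ending_time": end,
            }
            for index, (start, end) in enumerate(periods)
        ],
    }
```

A word with two periods, e.g. `build_timeline_row(42, "example", [(1400, 1450), (1700, 1800)])`, ends up with two separately addressable blocks.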

An update to the HT website that Marc was keen to implement in addition to the timeline visualisations is a ‘fixed’ header for the category browse page.  Such a header would appear ‘fixed’ at the top of the screen as the user scrolls down the page, thus enabling the user to tell at a glance what category they are looking at, even when far down the page.  I’d implemented something similar to this for the DSL website a few years ago (e.g. go here and start scrolling down the page: http://dsl.ac.uk/entry/snd/dreich) so reckoned it would be pretty straightforward to do something similar for the HT.  It took a bit of time to get a test version working, as I had to create new, test versions of several files (e.g. JavaScript, CSS, API, PHP) in order to be able to play about without breaking the live site.

In the test version, when the top of the category heading section scrolls off the page the fixed header fades in, and when it scrolls into view again the fixed header fades out.  Currently the header takes up the full width of the screen and has the same background colour as the main HT banner.  I’ve also added in the HT logo, which you can click to return to the homepage.  It’s a bit fuzzy looking in Chrome (but not other browsers), though.  The heading displays the noun hierarchy for the current category, which reflects the tree structure that is currently open on the page.  You can click on any level in the hierarchy to jump to it.  The current category’s Catnum, PoS and Heading are also displayed.  After some helpful feedback from Fraser I also added in a means of selecting a subcategory and for the subcategory hierarchy to be added to the fixed header too, which works as follows:

  1. Clicking on a subcategory gives its box a yellow border, which I think is pretty useful as you can then scroll about the page and quickly find the thing you’re interested in again.
  2. Clicking on the box also replaces the ID in the URL with the subcat URL, so you can now much more easily bookmark a subcat, or share the URL.  Previously you had to open the ‘cite’ box for the subcat to get the URL for a specific subcat.
  3. Clicking on a highlighted subcat removes the highlighting, in case you don’t like the yellow.  Note that this does not currently reset the ID in the URL to the maincat URL, but I think I will update this.
  4. Highlighting a category adds the subcat hierarchy to the fixed header so you can see at a glance the pathway from the very top of the HT to the subcat you’re looking at.
  5. When you follow a URL to a subcat ID the subcat is automatically highlighted and the subcat hierarchy is automatically added to the fixed header, in addition to the page scrolling to the subcat (as it previously did).

I think this will all be very helpful to users, and although it is not currently live, here is a screenshot showing how it works:

Returning to the timeline, I have changed the x axis so that it now starts at 1100 rather than 1000.  The 1100 label now displays as ‘OE*’ and if you click on it you now get the same message that is displayed on the MM timeline, namely “The English spoken by the Anglo-Saxons before c.1150, with the earliest written sources c.700”.  OE words on the timeline are no longer displayed as dots but instead have rectangles starting at the left edge of the visualisation and ending at 1150.  Once I figure out how to add in curved and pointy ends these will be given a pointy arrow on the left and a curve on the right.  I also added in faint horizontal lines between the individual timelines, to help keep your eye in a line.  Here’s an example of how things currently look:

I also started to investigate how to add in these ‘curved’ and ‘pointy’ ends to the rectangles in the timeline.  This is going to be rather tricky to implement as it means reverse engineering and then extending the timeline library I’m using, and also trying to figure out just how to give rectangles curved edges in D3, or how to append an arrow to a rectangle.  I’ll also need to find a way to pass data about ‘circa’ and ‘ante’ dates to the timeline library.  Thankfully I made a bit of progress on all of this.  It turns out I can add any additional fields that I want to the timeline’s JSON structure, so adding in ‘circa’ fields etc. will not be a problem.  Also, the timeline library’s code is pretty well structured and easy to follow.  I’ve managed to update it so that it checks for my ‘circa’ fields (but doesn’t actually do anything about them yet).  Also, there are ways of giving rectangles rounded corners in D3 (e.g. https://bl.ocks.org/mbostock/3468167) so this might work ok (although it’s not quite so simple as I will need to extend the rectangle beyond its allotted space in the timeline before the curves start).  Arrows still might prove tricky, though.  I’ll continue with this next week.
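As a note to self on the rounded corners, the general technique is to draw the block as an SVG path rather than a plain rectangle, using arc commands for the corners.  Below is a small Python sketch of a function that builds such a path string with rounded corners on the right-hand side only.  It is written in the spirit of the example linked above, is not code from the timeline library, and does not yet deal with extending the block beyond its allotted space.

```python
def right_rounded_rect(x: float, y: float, width: float, height: float, radius: float) -> str:
    """Return SVG path data for a rectangle whose right-hand corners are rounded."""
    # Keep the radius small enough to fit inside the rectangle.
    r = min(radius, height / 2, width)
    return (
        f"M{x},{y}"                   # start at the top-left corner
        f"h{width - r}"               # top edge, stopping short of the corner
        f"a{r},{r} 0 0 1 {r},{r}"     # top-right corner (clockwise arc)
        f"v{height - 2 * r}"          # right-hand edge
        f"a{r},{r} 0 0 1 {-r},{r}"    # bottom-right corner (clockwise arc)
        f"h{r - width}"               # bottom edge back to the left
        "z"                           # close the path up the left-hand edge
    )
```

The resulting string would be used as the `d` attribute of a `path` element in place of a `rect`.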

Other than HT-related work I did a few other bits and bobs.  I met with Graeme to discuss a UTF8 issue he was experiencing with a database of his.  I also met with Megan Coyer to discuss an upcoming project that will involve OCR, had a chat with Luca about a Technical Plan he is putting together, and responded to a request from Stuart Gillespie about a URL he needs to incorporate into a printed volume.  I helped Craig Lamont out with an issue relating to Google Analytics for the ‘Edinburgh’s Enlightenment’ site we put together a while back, tracked down some missing sound files for the SPADE project, and read through and gave feedback on a document Rachel had written about setting up Polyglot.  I also had a conversation with Eleanor Lawson and Jane Stuart-Smith about future updates to the Seeing Speech website.  All in all it’s been a pretty busy week.