On Monday this week I had a Skype meeting with the HRI people in Sheffield (recently renamed the Digital Humanities Institute) about the Linguistic DNA project. I demonstrated the sparklines I’ve been working on, showed them the API, and talked about the heatmap that I will also be developing and the ‘highcharts’ view that I am hoping to offer in addition to the sparkline view. Mike and Matt talked about the ‘workbench’ that they are going to create for the project, which will allow researchers to visualise the data. They’re going to create a requirements document for this soon, and as part of this they will look at our API and visualisations and work out how these might be incorporated, and whether further options need to be added to our API.
I was asked to review a paper this week so I spent a bit of time reading through it and writing a review. I also set up an account on the Burns website for Craig Lamont, who will be working on this in future, and responded to a query about the SciFiMedHums bibliography database. I also had to fix a few issues with some websites following their migration to a new server and had to get some domain details to Chris to allow him to migrate some other sites.
I also spent a few hours getting back into the digital edition I’m making for Bryony Randall’s ‘New Modernist Editing’ project. The last feature I need to add to my digital edition is editorial corrections. I needed to mark up all of the typos and other errors in the original text and record the ‘corrected’ versions that Bryony had supplied me with. I then needed to update the website to allow users to switch between one view and the other using the ‘Edition Settings’ feature. I used the TEI <choice> element with a <sic> tag for the original ‘erroneous’ text and a <corr> tag for the ‘corrected’ text. This is the standard TEI way of handling such things and it works rather well. I updated the section of jQuery that processes the XML and transforms it into HTML. When the ‘Edition Settings’ has ‘sic’ turned on and ‘corr’ turned off, the original typo-filled text is displayed. When ‘corr’ is turned on and ‘sic’ is turned off you get the edited text, and when both ‘sic’ and ‘corr’ are turned on the ‘sic’ text is given a red border while the ‘corr’ text is given a green border, so the user can see exactly where changes have been made and what text has been altered. I think it works rather nicely. See the following screenshot for an example. I have so far only added in the markup for the first two pages but I hope to get the remaining four done next week.
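The switching logic described above could be sketched roughly as follows. This is a minimal illustration rather than the edition’s actual jQuery code: the function name renderChoices, the settings object and the ‘sic’ and ‘corr’ CSS class names are all assumptions, it works on a string rather than a parsed XML document, and it only handles simple, non-nested markup. What should happen when both options are turned off is not stated above, so the sketch falls back to the original text.

```javascript
// Hypothetical helper: renders TEI <choice> elements as HTML according to the
// two 'Edition Settings' toggles. Assumes each choice contains exactly one
// <sic> followed by one <corr>, with no nesting.
function renderChoices(tei, settings) {
  const pattern =
    /<choice>\s*<sic>([\s\S]*?)<\/sic>\s*<corr>([\s\S]*?)<\/corr>\s*<\/choice>/g;
  return tei.replace(pattern, (match, sic, corr) => {
    if (settings.sic && settings.corr) {
      // Both on: show both readings; the classes would be styled with a red
      // border (sic) and a green border (corr) in the stylesheet.
      return '<span class="sic">' + sic + '</span> ' +
             '<span class="corr">' + corr + '</span>';
    }
    if (settings.corr) {
      return corr; // edited text only
    }
    return sic; // original text (also the assumed fallback when both are off)
  });
}

const sample =
  'She <choice><sic>recieved</sic><corr>received</corr></choice> it.';
console.log(renderChoices(sample, { sic: true, corr: false }));
console.log(renderChoices(sample, { sic: false, corr: true }));
console.log(renderChoices(sample, { sic: true, corr: true }));
```

In a browser the same decision could be made per element after parsing the XML, which would cope better with nested markup than a regular expression does.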
For the rest of the week I focussed on the sparkline visualisations for the LDNA project. Last week I created an API for the Historical Thesaurus that will allow the visualisations (or indeed anyone’s code) to pass query strings to it and receive JSON or CSV formatted data in return. This week I created new versions of the sparkline visualisations that connect to this API. I also had to update my ‘period cache’ data to include ‘minimum mode’ for each possible period, in addition to ‘minimum frequency of mode’. The script needed to generate the mode for every single possible combination of start and end decade over a thousand years, so it took a few hours to run, but once it had completed I could update the ‘cache’ table in the online version of the database and update the sparkline search form so that it would pull in and display these values.
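To give a sense of why the cache takes so long to build, here is a rough sketch of the kind of calculation involved. It is not the actual script: the names modeOf and buildPeriodCache, the input format and the tie-breaking rule are all assumptions, and it reads ‘minimum mode’ as the smallest mode found across all series for a given period, which is one plausible interpretation. With 100 decades there are 5,050 start and end combinations, and each one has to be evaluated for every series.

```javascript
// Returns the most common value in `values` and how often it occurs.
// Ties are broken in favour of the smaller value (an assumption).
function modeOf(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  let mode = null;
  let frequency = 0;
  for (const [value, count] of counts) {
    if (count > frequency || (count === frequency && value < mode)) {
      mode = value;
      frequency = count;
    }
  }
  return { mode, frequency };
}

// Builds one cache row per (start, end) pair of decade indexes.
// `seriesList` is a hypothetical array of arrays, one per sparkline, each
// holding one count per decade.
function buildPeriodCache(seriesList, decadeCount) {
  const rows = [];
  for (let start = 0; start < decadeCount; start++) {
    for (let end = start; end < decadeCount; end++) {
      let minMode = Infinity;
      let minModeFrequency = Infinity;
      for (const series of seriesList) {
        const { mode, frequency } = modeOf(series.slice(start, end + 1));
        minMode = Math.min(minMode, mode);
        minModeFrequency = Math.min(minModeFrequency, frequency);
      }
      rows.push({ start, end, minMode, minModeFrequency });
    }
  }
  return rows;
}
```

Storing the results in a table means the search form only has to look up one row for the selected period rather than repeating the calculation on every request.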
I also started to make some further changes to the sparkline search form. I updated the search type boxes so that the correct one is highlighted as soon as the user clicks anywhere in the section, rather than having to click on the radio button within the section. This makes the form a lot easier to use, as previously it was possible to fill in some details for ‘peak’, for example, but then forget to click on the ‘peak’ radio button, meaning that a ‘peak’ search didn’t run. I also updated the period selectors so that instead of two drop-down lists, one for ‘start’ decade and one for ‘end’ decade, there is now a jQuery UI slider that allows a range to be selected. I think this is nicer to use, although I have also started to employ sliders in other parts of the form too, and I worry that it’s just too many sliders. I might have to rethink this. It’s also possibly slightly confusing that updating the ‘period’ slider then updates the extent of the ‘peak decade’ slider, while the width of this slider remains the same as that of the full period range slider. So we have one slider where the ends represent the 1010s and 2000s, but if you select the period 1200s to 1590s within this then the full extent of the ‘peak decade’ slider is 1200s to 1590s, even though it takes up the same width as the other slider. See the screenshot below.
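The way the ‘period’ slider constrains the ‘peak decade’ slider can be sketched as a small function that works out the new slider options. This is only an illustration under some assumptions: the function name peakSliderOptions is made up, decades are represented by their starting year (1200 for the 1200s), and the ‘peak decade’ slider is treated as having a single handle. The returned property names (min, max, step, value) are standard jQuery UI slider options.

```javascript
// Works out the options for the 'peak decade' slider after the 'period'
// slider has changed. `period` is a hypothetical { start, end } object and
// `currentPeak` is the decade the peak handle currently sits on.
function peakSliderOptions(period, currentPeak) {
  // Keep the handle inside the newly selected period.
  const value = Math.min(Math.max(currentPeak, period.start), period.end);
  return { min: period.start, max: period.end, step: 10, value: value };
}

// Selecting 1200s to 1590s moves a handle left at the 1010s up to the 1200s.
console.log(peakSliderOptions({ start: 1200, end: 1590 }, 1010));
```

The resulting object could then be handed to the slider widget’s option setter inside the period slider’s change handler. The slider keeps its on-screen width whatever its min and max are, which is the source of the mismatch described above.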
I also updated the search results page. For ‘plateau’ searches I updated the sparklines so that the plateaus rather than peaks were highlighted with red spots. I also increased the space devoted to each sparkline and added a border between each to make it easier to tell which text refers to which line. I also added in an ‘info’ box that when clicked on gives you some statistics about the lines, such as the number of decades represented, the size of the largest decade and things like that. I also added in a facility to download the CSV data for the sparklines you’re looking at.
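As a rough idea of what sits behind the ‘info’ box and the CSV download, the sketch below computes a couple of the statistics mentioned and turns the same data into CSV text. The function names lineStats and toCsv, the { decade, count } data format and the CSV column names are assumptions for illustration, not the actual code or the API’s real output.

```javascript
// `points` is a hypothetical array of { decade, count } objects for one line.
function lineStats(points) {
  // A decade counts as 'represented' if it has at least one entry.
  const represented = points.filter((p) => p.count > 0);
  let largest = null;
  for (const p of represented) {
    // On a tie the earliest decade is kept.
    if (largest === null || p.count > largest.count) largest = p;
  }
  return {
    decadesRepresented: represented.length,
    largestDecade: largest ? largest.decade : null,
    largestCount: largest ? largest.count : 0,
  };
}

// Serialises the same points as CSV text with a header row.
function toCsv(points) {
  const lines = ['decade,count'];
  for (const p of points) lines.push(p.decade + ',' + p.count);
  return lines.join('\n');
}

const points = [
  { decade: 1200, count: 0 },
  { decade: 1210, count: 3 },
  { decade: 1220, count: 5 },
];
console.log(lineStats(points));
console.log(toCsv(points));
```

Computing the statistics from exactly the data used to draw each line is one way to make sure the pop-up and the sparkline cannot disagree.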
I’ll continue with this next week. What I really need to do is spend quite a bit of time testing out the search facilities to ensure they return the correct data. I’m noticing some kind of quirk with the info box pop-ups, for example, which sometimes seem to display incorrect values for the lines. There are also some issues relating to searches that do not cover the full period that I need to investigate. After that I need to think about the heatmap and possibly using HighCharts as an alternative to the D3 sparklines I’m currently using. See the screenshot above for an example of the sparklines as they currently stand.