I spent a fair amount of time this week on Historical Thesaurus related matters. Towards the start of the week I continued with the mammoth task of integrating the new OED data with the HT data. I created a few new checking and integration scripts to try and find patterns in the HT and OED category names in order to be able to match them up. Out of a total of 235,249 categories we now have 221,336 that are marked as checked.
On Wednesday Fraser, Marc and I had a meeting to discuss how to proceed with the rest of the HT / OED linking and also to consider what updates to make to the HT website. We came up with a few ideas that we are going to try and implement in the next few weeks. I can’t really say much more about it yet, though.
I spent about a day this week working on the Burns project, creating a new subsection of the website about Burns and the Fiddle. This included creating pages, positioning images, creating MP3 files from audio files in other formats, uploading everything and making it all look nice. The section isn’t going live yet as there are further tweaks to be made, but most of the content is now in place.
I had a couple of meetings with Luca Guariento this week to discuss some of the technical issues he’s working through at the moment, including working with APIs with jQuery and some issues with some OpenLayers maps he’s working on. I also helped Gary with a couple of minor SCOSYA issues, spoke to Ronnie Young about a Burns project he’s putting together, talked to Jane Stuart-Smith about the website for her SPADE project and had a chat with Bryony Randall about the digital edition we’re working on. I also attended a college-wide meeting about critical editions that had been organised by Sheila Dickson from the School of Modern Languages and Cultures. It was an interesting meeting to attend and it looks like I might be involved in setting up a website that will showcase the critical edition work that is based at the university. I’ll just need to wait and see if anything comes from this, but hopefully it will.
On Friday I returned to The People’s Voice project and continued to work on the public interface to the poem database. I reinstated the ‘publication’ field as a drop-down list, as Catriona requested that it be added back in. I also added the ‘autocomplete’ feature to the required fields (author, set tune title, featured individual, publication name). Now if you start typing into these fields anything that matches will be displayed in a list and can be selected. I also included ‘pseudonym’ in the ‘author’ and ‘featured individual’ autocomplete search. I then updated the form so that the publication country and city lists are now populated from the data in the database. The ‘city’ list updates depending on the user’s choice of country. I also added in a query to generate the Archive / Library multi-select based on the data and I started to work on the code that will take the user’s selected options, process them, build a query and display the results. So far you can search for title, set tune, set tune title, audio, comments and franchise only (any combination of these). Results aren’t displaying yet but the number of poems that match your search is. There’s still lots to do here and I’ll hopefully be able to continue with this next Friday.
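The step of taking the selected options and building a query from them can be sketched roughly as follows. This is a Python illustration only, not the project's actual code: the `poems` table, the column names and the `build_query` helper are all hypothetical, and treating set tune, audio and franchise as simple yes/no flags is an assumption.

```python
# Hypothetical mapping from search-form fields to database columns.
TEXT_FIELDS = {
    "title": "title",
    "set_tune_title": "set_tune_title",
    "comments": "comments",
}
# Assumed to be yes/no options on the form.
FLAG_FIELDS = {
    "set_tune": "has_set_tune",
    "audio": "has_audio",
    "franchise": "franchise",
}


def build_query(options):
    """Build a parameterised COUNT query from the non-empty form options."""
    clauses, params = [], []
    for field, column in TEXT_FIELDS.items():
        value = (options.get(field) or "").strip()
        if value:
            # Placeholders keep user input out of the SQL string itself.
            clauses.append(f"{column} LIKE ?")
            params.append(f"%{value}%")
    for field, column in FLAG_FIELDS.items():
        if options.get(field):
            clauses.append(f"{column} = 1")
    sql = "SELECT COUNT(*) FROM poems"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Any combination of options can be supplied; fields left empty simply contribute no clause, so an empty form counts every poem.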
I was ill this week and was off work sick on Wednesday and Thursday, and because of this I didn’t manage to get much done with regard to the LDNA visualisations or the SCOSYA atlas. I had to spend about half a day upgrading the 20 WordPress sites I manage to the most recent release and I spoke to Jane about the problems we’re having getting some space set aside for the website for her SPADE project. I also replied to Alex Benchimol about a project he is putting together and completed the review of the paper that I had started last week.
On Tuesday I completed an initial version of the digital edition I have been working on for Bryony Randall’s New Modernist Editing project. This version now includes all of the features I had intended to implement and I completed the TEI transcription for the whole manuscript. Using the resource it is now possible to tailor the view as you would like to see it, ranging from a version of the text that closely represents the original typewritten page, including line breaks, handwritten notes and typos, through to a fully edited and ‘corrected’ version of the text complete with explanatory notes that closely resembles the text you’d find in an edited and printed edition. I think it works rather well, although there are still some aspects that will need tweaking, such as adding in an introduction, possibly updating the way some features are labelled and maybe changing the way some of the document is tagged.
When I returned to work on Friday I decided to make a start on the public interface for the database of poems and songs for The People’s Voice project. I spent the morning writing a specification document and thinking about how the search would work. The project website is WordPress based and I did consider developing the search as a WordPress plugin, as I have done for other projects, such as the SciFiMedHums project (see http://scifimedhums.glasgow.ac.uk/the-database/). However, I didn’t want the resource to be too tied into WordPress and instead wanted it to be useable (with minimal changes) independently of the WordPress system. Having said that, I still wanted the resource to feel like a part of the main project website and to use the WordPress theme the rest of the site uses. After a bit of investigation I found a way to create a PHP page that is not part of WordPress but ‘hooks’ into some WordPress functions in order to use the website’s theme. Basically you add a ‘require’ statement that pulls in ‘wp-load.php’ and then you can call the functions that process the WordPress header, sidebar and footer (get_header(), get_sidebar(), get_footer()) wherever you want these to appear. All the rest of your script can be as you want it.
I emailed my specification document to the project team and started to work on the search interface in the afternoon. This is going to use jQuery UI components so I created a theme for this and set up the basic structure for the search form. It’s not fully complete yet as I need to add in some ‘auto-complete’ functions and some content for drop-down lists, but the overall structure is there. The project team wanted pretty much every field in the database to be searchable, which makes for a rather unwieldy and intimidating search form so I’m going to have to think of a way to make this more appealing. I’ll try to continue with this next week, if I have the time.
On Monday this week I had a Skype meeting with the HRI people in Sheffield (recently renamed the Digital Humanities Institute) about the Linguistic DNA project. I demonstrated the sparklines I’ve been working on and showed them the API and talked about the heatmap that I will also be developing, and the possibility of using the ‘highcharts’ view that I am hoping to use in addition to the sparkline view. Mike and Matt talked about the ‘workbench’ that they are going to create for the project that will allow researchers to visualise the data. They’re going to be creating a requirements document for this soon and as part of this they will look at our API and visualisations and work out how these might also be incorporated, and if further options need to be added to our API.
I was asked to review a paper this week so I spent a bit of time reading through it and writing a review. I also set up an account on the Burns website for Craig Lamont, who will be working on this in future and responded to a query about the SciFiMedHums bibliography database. I also had to fix a few issues with some websites following their migration to a new server and had to get some domain details to Chris to allow him to migrate some other sites.
I also spent a few hours getting back into the digital edition I’m making for Bryony Randall’s ‘New Modernist Editing’ project. The last feature I need to add to my digital edition is editorial corrections. I needed to mark up all of the typos and other errors in the original text and record the ‘corrected’ versions that Bryony had supplied me with. I also then needed to update the website to allow users to switch between one view and the other using the ‘Edition Settings’ feature. I used the TEI <choice> element with a <sic> tag for the original ‘erroneous’ text and a <corr> tag for the ‘corrected’ text. This is the standard TEI way of handling such things and it works rather well. I updated the section of jQuery that processes the XML and transforms it into HTML. When ‘Edition Settings’ has ‘sic’ turned on and ‘corr’ turned off, the original typo-filled text is displayed. When ‘corr’ is turned on and ‘sic’ is turned off you get the edited text, and when both ‘sic’ and ‘corr’ are turned on the ‘sic’ text is given a red border while the ‘corr’ text is given a green border so the user can see exactly where changes have been made and what text has been altered. I think it works rather nicely. See the following screenshot for an example. I have so far only added in the markup for the first two pages but I hope to get the remaining four done next week.
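The way the two settings select between <sic> and <corr> can be illustrated with a small sketch. It is written in Python rather than the jQuery the edition actually uses, the sample sentence is made up, the `render` helper is hypothetical, and bracketed labels stand in for the red and green borders. It also assumes a fragment without the TEI namespace, which a real TEI file would normally carry.

```python
import xml.etree.ElementTree as ET

# Made-up fragment, without the TEI namespace, purely for illustration.
SAMPLE = "<p>The <choice><sic>quikc</sic><corr>quick</corr></choice> brown fox</p>"


def render(xml_text, show_sic, show_corr):
    """Flatten a fragment to plain text according to the two settings."""
    root = ET.fromstring(xml_text)
    both = show_sic and show_corr

    def walk(node):
        if node.tag in ("sic", "corr"):
            visible = show_sic if node.tag == "sic" else show_corr
            if not visible:
                return ""
            text = "".join(node.itertext())
            # With both views on, label each reading so the change is visible.
            return f"[{node.tag}: {text}]" if both else text
        parts = [node.text or ""]
        for child in node:
            parts.append(walk(child))
            parts.append(child.tail or "")
        return "".join(parts)

    return walk(root)
```

Calling `render(SAMPLE, True, False)` gives the uncorrected reading, `render(SAMPLE, False, True)` the corrected one, and turning both on shows the two readings side by side.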
For the rest of the week I focussed on the sparkline visualisations for the LDNA project. Last week I created an API for the Historical Thesaurus that will allow the visualisations (or indeed anyone’s code) to pass query strings to it and receive JSON or CSV formatted data in return. This week I created new versions of the sparkline visualisations that connected to this API. I also had to update my ‘period cache’ data to include ‘minimum mode’ for each possible period, in addition to ‘minimum frequency of mode’. This took quite a while to process as the script needed to generate the mode for every single possible combination of start and end decade over a thousand years. It took a few hours to process but once it had completed I could update the ‘cache’ table in the online version of the database and update the sparkline search form so that it would pull in and display these.
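To give a sense of why this takes a while, here is a rough Python sketch of enumerating every start and end decade combination and computing a value for each. It is not the script described above: the helper names are hypothetical, and since the exact statistic stored in the cache isn't spelled out here, the sketch simply uses the most common value in each window as a stand-in.

```python
from collections import Counter


def all_periods(first_decade=1010, last_decade=2000):
    """List every (start, end) pair of decades with start <= end."""
    decades = range(first_decade, last_decade + 1, 10)
    return [(s, e) for i, s in enumerate(decades) for e in decades[i:]]


def most_common_value(values):
    """Stand-in statistic: the value that occurs most often."""
    return Counter(values).most_common(1)[0][0]


def build_period_cache(values_by_decade, statistic=most_common_value):
    """Compute the statistic for every possible period in the data."""
    cache = {}
    first, last = min(values_by_decade), max(values_by_decade)
    for start, end in all_periods(first, last):
        window = [v for d, v in values_by_decade.items() if start <= d <= end]
        if window:  # skip periods with no data at all
            cache[(start, end)] = statistic(window)
    return cache
```

With 100 decades from the 1010s to the 2000s, `all_periods()` returns 5,050 periods, and the statistic has to be recomputed for each one.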
I also started to make some further changes to the sparkline search form. I updated the search type boxes so that the correct one is highlighted as soon as the user clicks anywhere in the section, rather than having to actually click on the radio button within the section. This makes the form a lot easier to use as previously it was possible to fill in some details for ‘peak’, for example, but then forget to click on the ‘peak’ radio button, meaning that a ‘peak’ search didn’t run. I also updated the period selectors so that instead of using two drop-down lists, one for ‘start’ decade and one for ‘end’ decade, there is now a jQuery UI slider that allows a range to be selected. I think this is nicer to use, although I have also started to employ sliders in other parts of the form too, and I worry that it’s just too many sliders. I might have to rethink this. It’s also possibly slightly confusing that updating the ‘period’ slider then updates the extent of the ‘peak decade’ slider, but the size of this slider remains the same as the full period range slider. So we have one slider where the ends represent 1010s and 2000s but if you select the period 1200s to 1590s within this then the full extent of the ‘peak decade’ slider is 1200s to 1590s, even though it takes up the same width as the other slider. See the screenshot below. I’m thinking that this is just going to be too many sliders.
I also updated the search results page. For ‘plateau’ searches I updated the sparklines so that the plateaus rather than peaks were highlighted with red spots. I also increased the space devoted to each sparkline and added a border between each to make it easier to tell which text refers to which line. I also added in an ‘info’ box that when clicked on gives you some statistics about the lines, such as the number of decades represented, the size of the largest decade and things like that. I also added in a facility to download the CSV data for the sparklines you’re looking at.
I’ll continue with this next week. What I really need to do is spend quite a bit of time testing out the search facilities to ensure they return the correct data. I’m noticing some kind of quirk with the info box pop-ups, for example, that seems to sometimes display incorrect values for the lines. There are also some issues relating to searches that do not cover the full period that I need to investigate. And then after that I need to think about the heatmap and possibly using HighCharts as an alternative to the D3 sparklines I’m currently using. See the screenshot above for an example of the sparklines as they currently stand.
This week was a shorter one than usual as Monday was the May Day holiday and I was off work on Wednesday afternoon to attend a funeral. I worked on a variety of different tasks during the time available. Wendy is continuing to work on the data for Mapping Metaphor and had another batch of it for me to process this week. After dealing with the upload we now have a further nine categories marked off and a total of 12,938 metaphorical connections and 25,129 sample lexemes. I also returned to looking at integrating the new OED data into the Historical Thesaurus. Fraser had enlisted the help of some students to manually check connections between HT and OED categories and I set up a script that will allow us to mark off a few thousand more categories as ‘checked’. Before that Fraser needs to QA their selections and I wrote a further script that will help with this. Hopefully next week I’ll be able to actually mark off the selections.
I also returned to SCOSYA for the first time since before Easter. I managed to track down and fix a few bugs that Gary had identified. Firstly, Gary was running into difficulties when importing and displaying data using the ‘my map data’ feature. The imported data simply wouldn’t display at all in the Safari browser and after a bit of investigation I figured out why. It turned out there was a missing square bracket in my code, which rather strangely was being silently fixed in other browsers but was causing issues in Safari. Adding in the missing bracket fixed the issue straight away. The other issue Gary had encountered was when Gary did some work on the CSV file exported from the Atlas and then reimported it. When he did so the import failed to upload any ratings. It turned out that Excel had added in some extra columns to the CSV file whilst Gary was working with it and this change to the structure meant that each row failed the validation checks I had put in place. I decided to rectify this in two ways – firstly the upload would no longer check the number of columns and secondly I added more informative error messages. It’s all working a lot better now.
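The two changes to the import (ignoring extra columns and reporting clearer errors) might look something like the following. This is a Python sketch rather than the Atlas's own upload code; the column names, the 1 to 5 rating scale, the sample values and the `read_ratings` helper are assumptions made for the example.

```python
import csv
import io

# Hypothetical names for the columns the import actually needs.
REQUIRED = ("location", "attribute", "rating")


def read_ratings(csv_text):
    """Return (rows, errors), ignoring any columns beyond the required ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"Missing required column(s): {', '.join(missing)}"]
    rows, errors = [], []
    for record in reader:
        raw = (record["rating"] or "").strip()
        try:
            rating = int(raw)
        except ValueError:
            rating = None
        # Assumed scale: whole-number ratings from 1 to 5.
        if rating is None or not 1 <= rating <= 5:
            errors.append(
                f"Line {reader.line_num}: rating '{raw}' is not a whole number from 1 to 5"
            )
            continue
        rows.append(
            {
                "location": record["location"],
                "attribute": record["attribute"],
                "rating": rating,
            }
        )
    return rows, errors
```

Because rows are read by column name rather than by position or count, the empty trailing columns that Excel tends to add no longer cause every row to be rejected.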
With these things out of the way I set to work on a larger update to the map. Previously an ‘AND’ search limited results by location rather than by participant. For example, if you did a search that said ‘show me attributes D19 AND D30, all age groups with a rating of 4-5’ a spot for a location would be returned if any combination of participants matched this. As there are up to four participants per location it could mean that a location could be returned as meeting the criteria even if no individual participant actually met the criteria. For example, participants A and B give D19 a score of 5 but only give D30 a score of 3, while Participants C and D only give D19 a score of 3 and give D30 a score of 5. In combination, therefore, this location meets the criteria even though none of the participants actually do. Gary reckoned this wasn’t the best way to handle the search and I agreed. So, instead I updated the ‘AND’ search to check whether individuals met the criteria. This meant a fairly large reworking of the API and a fair amount of testing, but it looks like the ‘AND’ search now works at a participant level. And the ‘OR’ search doesn’t need to be updated because an ‘OR’ search by its very nature is looking for any combination.
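The difference between the two kinds of ‘AND’ search can be shown with the example above. The following is a minimal Python sketch, not the API's real code; the function names and the dictionary layout are made up for illustration.

```python
def location_level_and(ratings, attributes, lo, hi):
    """Old behaviour: each attribute is matched by *some* participant."""
    return all(
        any(attr in p and lo <= p[attr] <= hi for p in ratings.values())
        for attr in attributes
    )


def participant_level_and(ratings, attributes, lo, hi):
    """New behaviour: one participant must match *every* attribute."""
    return any(
        all(attr in p and lo <= p[attr] <= hi for attr in attributes)
        for p in ratings.values()
    )


# The four participants from the example, for a single location.
location = {
    "A": {"D19": 5, "D30": 3},
    "B": {"D19": 5, "D30": 3},
    "C": {"D19": 3, "D30": 5},
    "D": {"D19": 3, "D30": 5},
}
```

For a rating range of 4-5, `location_level_and(location, ["D19", "D30"], 4, 5)` is true while `participant_level_and(location, ["D19", "D30"], 4, 5)` is false. The only change is swapping which of `any` and `all` sits on the outside.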
I spent the remainder of the week on LDNA duties, continuing to work on the ‘sparkline’ visualisations for thematic heading categories. Most of the time was actually spent creating a new API for the Historical Thesaurus, which at this stage is used solely to output data for the visualisations. It took a fair amount of time to get the required endpoints working, and to create a nice index page that lists the endpoints with examples of how each can be used. It seems to be working pretty well now, though, including facilities to output the data in JSON or CSV format. The latter proved to be slightly tricky to implement due to the way that the data for each decade was formatted. I wanted each decade to appear in its own column, so as to roughly match the format of Marc’s original Excel spreadsheet, and this meant having to rework how the multi-level associative array was processed.
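Getting each decade into its own column essentially means pivoting the nested data. Here is a rough Python sketch of the idea, not the API's own code: the `to_csv` helper, the category codes and the shape of the input (category, then decade, then count) are assumptions.

```python
import csv
import io


def to_csv(data):
    """Flatten {category: {decade: count}} into one row per category,
    with one column per decade."""
    # Collect every decade that appears anywhere, so all rows share columns.
    decades = sorted({d for counts in data.values() for d in counts})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["category"] + [f"{d}s" for d in decades])
    for category, counts in data.items():
        # Decades with no data for this category are written as 0.
        writer.writerow([category] + [counts.get(d, 0) for d in decades])
    return out.getvalue()
```

The column list has to be worked out before any rows are written, which is why the nested array can't simply be output one level at a time.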