I managed to make a good deal of progress with a number of different projects this week, which I’m pretty pleased about. First of all there is the digital edition that I’m putting together for Bryony Randall’s ‘New Modernist Editing’ project. Last week I completed the initial transcript of the short story and created a zoomable interface for browsing through the facsimiles. This week I completed the transcription view, which allows the user to view the XML text, converted into HTML and styled using CSS. It includes the notes and gaps and deletions but doesn’t differentiate between pencil and ink notes as of yet. It doesn’t include the options to turn on / off features such as line breaks at this stage either, but it’s a start at least. Below is a screenshot so you can see how things currently look.
I tested the site out in a variety of browsers and it works fine in everything other than Internet Explorer (Edge works, though). This is because of the way jQuery loads the XML file and I’m hoping to find a solution to this. I did have some nagging doubts about displaying the text in this way because I know that even though it all works it’s not valid HTML5. Sticking a bunch of <lb>, <note> and other XML tags into an HTML page works now but there’s no guarantee this will continue to work and … well, it’s not ‘right’ is it.
I managed to get something working, but… I was reminded just how much I really dislike XSLT files. Apologies to anyone who likes that kind of thing but my brain just finds them practically incomprehensible. Doing even the simplest of things seems far too convoluted. So I decided to just transform the XML into HTML5 using jQuery. There are only a handful of tags that I need to deal with anyway. All I do is find each occurrence of an XML tag, grab its contents, add a span after the element and then remove the element, e.g.:
var content = "<span class=\"del\">" + $(this).html() + "</span>";
I can even create a generic function that takes the tag name and spits out a span with that tag name as its class while removing the original tag from the page. When it comes to modifying the layout based on user preferences I’ll be able to handle that straightforwardly via jQuery too, for example switching line breaks on or off.
For me at least this is a much easier approach than having to pass variables to an XSLT file.
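A minimal sketch of the generic tag-to-span idea and the line break toggle, written as plain string handling rather than jQuery so that it runs outside a browser. The function names (`tagToSpan`, `renderLineBreaks`, `toHtml`) and the choice of `<br class="lb">` are assumptions made for illustration, not the code used in the edition, and the regular expressions assume simple elements that are not nested inside elements of the same name.

```javascript
// Hypothetical helper: rewrite every <tag>...</tag> pair as <span class="tag">...</span>.
// Attributes on the source tag are dropped; self-closing tags are left untouched.
function tagToSpan(markup, tagName) {
  const open = new RegExp("<" + tagName + "(\\s[^>]*)?>", "g");
  const close = new RegExp("</" + tagName + "\\s*>", "g");
  return markup
    .replace(open, '<span class="' + tagName + '">')
    .replace(close, "</span>");
}

// Hypothetical helper: show each <lb/> as an HTML line break, or as a space when breaks are off.
function renderLineBreaks(markup, showBreaks) {
  return markup.replace(/<lb\s*\/>/g, showBreaks ? '<br class="lb">' : " ");
}

// Convert a small fragment, handling only the tags listed here.
function toHtml(fragment, showBreaks) {
  let html = fragment;
  for (const tag of ["del", "note"]) {
    html = tagToSpan(html, tag);
  }
  return renderLineBreaks(html, showBreaks);
}
```

Calling `toHtml(fragment, false)` on the same fragment gives the running-text view, which is the kind of user preference that would otherwise need a parameter passed to an XSLT file.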
I spent a day or so working on the SCOSYA atlas as well and I have now managed to complete work on an initial version of the ‘my map data’ feature. This feature lets you upload previously downloaded files to visualise the data on the atlas.
When you download a file now there is a new row at the top that includes the URL of the query that generated the file and some explanatory text. You can add a title and a description for your data in columns D and E of the first row as well. You can make changes to the rating data, for example deleting rows or changing ratings and then after you’ve saved your file you can upload it to the system.
You can do this through the ‘My Map Data’ section in the ‘Atlas Display Options’. You can either drag and drop your file into the area or click to open a file browser. An ‘Upload log’ displays any issues with your file that the system may encounter. After upload your file will appear in the ‘previously uploaded files’ section and the atlas will automatically be populated with your data. You can re-download your file by pressing on the ‘download map data’ button again and you can delete your uploaded file by pressing on the appropriate ‘Remove’ button. You can switch between viewing different datasets by pressing on the ‘view’ button next to the title. The following screenshot shows how this works:
I tested the feature out with a few datasets. For example, I swapped the latitude and longitude columns round and the atlas dutifully displayed all of the data in the sea just north of Madagascar, so things do seem to be working. There are a couple of things to note, though. Firstly, the CSV download files currently do not include data that is below the query threshold, so no grey spots appear on the user maps. We made a conscious decision to exclude this data but we might now want to reinstate it. Secondly, the display of the map is very much dependent on the URL contained in the CSV file in row 1 column B. This is how the atlas knows whether to display an ‘or’ map or an ‘and’ map, and what other limits were placed on the data. If the spreadsheet is altered so that the data contained does not conform to what is expected by the URL (e.g. different attributes are added or new ratings are given) then things might not display correctly. Similarly, if anyone removes or alters that URL in the CSV files some unexpected behaviour might be encountered.
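To illustrate the dependency on the first row, here is a rough sketch of reading the query URL from column B and the title and description from columns D and E, and logging a problem if the URL has gone missing. The function names and the log wording are hypothetical, and the real upload script is not written in JavaScript; this only shows the shape of the check.

```javascript
// Split one CSV line into cells, honouring double-quoted cells and "" escapes.
function splitCsvLine(line) {
  const cells = [];
  let cell = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        cell += '"'; // escaped quote inside a quoted cell
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        cell += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      cells.push(cell);
      cell = "";
    } else {
      cell += ch;
    }
  }
  cells.push(cell);
  return cells;
}

// Hypothetical helper: pull the query URL (column B), title (D) and description (E) from row 1.
function readUploadHeader(csvText) {
  const cells = splitCsvLine(csvText.split(/\r?\n/)[0]);
  const queryUrl = (cells[1] || "").trim();
  const log = [];
  if (queryUrl === "") {
    log.push("Row 1, column B has no query URL, so the map type cannot be determined.");
  }
  return {
    queryUrl: queryUrl,
    title: (cells[3] || "").trim(),
    description: (cells[4] || "").trim(),
    log: log,
  };
}
```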
Note also that ‘my map data’ is private – you can only view your data if you’re logged in. This means you can’t share a URL with someone. I still need to add ‘my map data’ to the ‘history’ feature and do a few other tweaks. I’ve just realised trying to upload ‘questionnaire locations’ data results in an error, but I don’t think we need to include the option to upload this data.
I also started working on the new visualisations for the Historical Thesaurus that will be used for the Linguistic DNA project, based on the spreadsheet data that Marc has been working on. We have data about how many new words appeared in each thematic heading in every decade since 1000 and we’re going to use this data to visualise changes in the language. I started by reading through all of the documentation that Marc and Fraser had prepared about the data, and then I wrote some scripts to extract the data from Marc’s spreadsheet and insert it into our online database. Marc had incorporated some ‘sparklines’ into his spreadsheet and my first task after getting the data available was to figure out a method to replicate these sparklines using the D3.js library. Thankfully, someone had already done this for stock price data and had created a handy walkthrough of how to do it (see http://www.tnoda.com/blog/2013-12-19). I followed the tutorial and adapted it for our data, writing a script that creates sparklines for each of the almost 4000 thematic headings we have in the system and displays them all on a page. It’s a lot of data (stored in a 14MB JSON file) and as yet it’s static, so users can’t tweak the settings to see how this affects things, but it’s a good proof of concept. You can see a small snippet from the gigantic list below:
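For anyone curious what a sparkline boils down to, here is a dependency-free sketch of the scaling that D3 would normally handle: each decade’s count is mapped to an x position across the width and a y position within the height, and the points are joined into an SVG path. This is not the D3 code from the tutorial, and `sparklinePath` is a hypothetical name.

```javascript
// Hypothetical helper: turn a series of counts into an SVG path "d" string.
function sparklinePath(values, width, height) {
  if (values.length === 0) return "";
  const min = Math.min(...values);
  const max = Math.max(...values);
  // Spread the points evenly across the width; a single point sits at x = 0.
  const stepX = values.length > 1 ? width / (values.length - 1) : 0;
  const round = (n) => Math.round(n * 100) / 100;
  return values
    .map((v, i) => {
      const x = i * stepX;
      // SVG y grows downwards, so the largest value maps to y = 0.
      // A completely flat series is drawn along the vertical middle.
      const y = max === min ? height / 2 : height - ((v - min) / (max - min)) * height;
      return (i === 0 ? "M" : "L") + round(x) + "," + round(y);
    })
    .join("");
}
```

The returned string can be dropped into the `d` attribute of an SVG `path` element, one per thematic heading.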
Other than these tasks I published this week’s new Burns song (see http://burnsc21.glasgow.ac.uk/braw-lads-on-yarrow-braes/) and I had a meeting with The People’s Voice project team where we discussed how the database of poems will function, what we’ll be doing about the transcriptions, and when I will start work on things. It was a useful meeting and in addition to these points we identified a few enhancements I am going to make to the project’s content management system. I also answered a query about some App development issues from elsewhere in the University and worked with Chris McGlashan to implement an Apache module that limits access to the pages held on the Historical Thesaurus server so as to prevent people from grabbing too much data.
At the start of the week I had to spend a little time investigating some issues with a couple of my WordPress sites, which were failing to connect to external services such as the Akismet anti-spam service. It turned out that there had been a hardware failure on one of the servers which had affected outgoing connections. With the help of Chris I managed to get this sorted again and also took the opportunity to upgrade all of the WordPress instances I manage to the latest incremental version. Some further maintenance issues were encountered later in the week when Chris informed me that one of our old sites (the Old English Teaching site that was set up long before I started working in my current role) had some security issues, so I fixed these as soon as I could.
I spent about two days this week working on the New Modernist Editing project for Bryony Randall. I’m creating a digital edition of a short story by Virginia Woolf that will include a facsimile view and a transcription view, with the transcription view offering several ways to view the text by turning on or off various features of the text. One possibility I investigated was using the digital edition system that has been developed for the Digital Vercelli Book (see http://vbd.humnet.unipi.it/beta2/#). This is a really lovely interface for displaying facsimiles and TEI texts and the tool is available to download and reuse. However, it offers lots of functionality that we don’t really need and it doesn’t provide the facility to tailor the transcription view based on the selection or individual features. While it would be possible to add this feature in, I decided that it would be simpler if I just made a simple system from scratch myself.
I used OpenLayers (http://openlayers.org/) to create a zoomable interface for the facsimile view and a few lines of jQuery handled the navigation, the display of a list of thumbnails and things like that. I also added in a section for displaying the transcription and facilities to turn the facsimile and transcription view on or off. I’m pretty happy with how things are progressing. Here’s a screenshot of the facsimile view as it currently stands:
I also worked on the transcription itself, completing an initial transcription of the six manuscript pages as TEI XML. This included marking deleted text, notes, illegible text, gaps in the text and things like that. It’s only a first attempt and I still might change how certain aspects are marked up, but it’s good to have something to work with now.
I uploaded this week’s new Burns song of the week (see http://burnsc21.glasgow.ac.uk/lassie-wi-the-lintwhite-locks/) and corresponded with Kirsteen about a new web resource she requires, with Craig regarding his Burns bibliography database and with Ronnie about his Burns paper database. I also spent a few hours on further AHRC review duties and had a meeting with Marc and Fraser about future Historical Thesaurus plans and the Linguistic DNA project. I’m going to be producing some new visualisations of the thesaurus data to show the change in language over time. I can’t really say more about them at this stage, but I’ll start investigating the possibilities next week.
This week was a pretty busy one, working on a number of projects and participating in a number of meetings. I spent a bit of time working on Bryony Randall’s New Modernist Editing project. This involved starting to plan the workshop on TEI and XML – sorting out who might be participating, where the workshop might take place, what it might actually involve and things like that. We’re hoping it will be a hands-on session for postgrads with no previous technical experience of transcription, but we’ll need to see if we can get a lab booked that has Oxygen available first. I also worked with the facsimile images of the Woolf short story that we’re going to make a digital edition of. The Woolf estate wants a massive copyright statement to be plastered across the middle of every image, which is a little disappointing as it will definitely affect the usefulness of the images, but we can’t do anything about that. I also started to work with Bryony’s initial Word based transcription of the short story, thinking how best to represent this in TEI. It’s a good opportunity to build up my experience of Oxygen, TEI and XML.
I also updated the data for the Mapping Metaphor project, which Wendy has continued to work on over the past few months. We now have 13,083 metaphorical connections (down from 13,931), 9,823 ‘first lexemes’ (up from 8,766) and 14,800 other lexemes (up from 13,035). We also now have 300 categories completed, up from 256. I also replaced the old ‘Thomas Crawford’ part of the Corpus of Modern Scottish Writing with my reworked version. The old version was a WordPress site that hadn’t been updated since 2010 and was a security risk. The new version (http://www.scottishcorpus.ac.uk/thomascrawford/) consists of nothing more than three very simple PHP pages and is much easier to navigate and use.
I had a few Burns related tasks to take care of this week. Firstly there was the usual ‘song of the week’ to upload, which I published on Wednesday as usual (see http://burnsc21.glasgow.ac.uk/ye-jacobites-by-name/). I also had a chat with Craig Lamont about a Burns bibliography that he is compiling. This is currently in a massive Word document but he wants to make it searchable online so we’re discussing the possibilities and also where the resource might be hosted. On Friday I had a meeting with Ronnie Young to discuss a database of Burns paper that he has compiled. The database currently exists as an Access database with a number of related images and he would like this to be published online as a searchable resource. Ronnie is going to check where the resource should reside and what level of access should be given and we’ll take things from there.
I had been speaking to the other developers across the College about the possibility of meeting up semi-regularly to discuss what we’re all up to and where things are headed and we arranged to have a meeting on Tuesday this week. It was a really useful meeting and we all got a chance to talk about our projects, the technologies we use, any cool developments or problems we’d encountered and future plans. Hopefully we’ll have these meetings every couple of months or so.
We had a bit of a situation with the Historical Thesaurus this week relating to someone running a script to grab every page of the website in order to extract the data from it, which is in clear violation of our terms and conditions. I can’t really go into any details here, but I had to spend some of the week identifying when and how this was done and speaking to Chris about ensuring that it can’t happen again.
The rest of my week was spent on the SCOSYA project. Last week I updated the ‘Atlas Display Options’ to include accordion sections for ‘advanced attribute search’ and ‘my map data’. I’m still waiting to hear back from Gary about how he would like the advanced search to work so instead I focussed on the ‘my map data’ section. This section will allow people to upload their own map data using the same CSV format as the atlas download files in order to visualise this data on the map. I managed to make some pretty good progress with this feature. First of all I needed to create new database tables to house the uploaded data. Then I needed to add in a facility to upload files. I decided to use the ‘dropzone.js’ scripts that I had previously used for uploading the questionnaires to the CMS. This allows the user to drag and drop one or more files into a section of the browser and for this data to then be processed in an AJAX kind of way. This approach works very well for the atlas as we don’t want the user to have to navigate away from the atlas in order to upload the data – all needs to be managed from within the ‘display options’ slideout section.
I contemplated adding the facility to process the uploaded files to the API but decided against it as I wanted to keep the API ‘read only’ rather than also handling data uploads and deletions. So instead I created a stand-alone PHP script that takes the uploaded CSV files and adds them to the database tables I had created. This script then echoes out some log messages that then get pulled into a ‘log’ section of the display in an AJAX manner.
I then had to add in a facility to list previously uploaded files. I decided the query for this should be part of the API as it is a ‘GET’ request. However, I needed to ensure that only the currently logged in user was able to access their particular list of files. I didn’t want anyone to be able to pass a username to the API and then get that user’s files – the passed username must also correspond to the currently logged in user. I did some investigation about securing an API, using access tokens and things like that but in the end I decided that accessing the user’s data would only ever be something that we would want to offer through our website and we could therefore just use session authentication to ensure the correct user was logged in. This doesn’t really fit in with the ethos of a RESTful API, but it suits our purposes ok so it’s not really an issue.
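The check described above can be sketched roughly as follows. The real API is not written in JavaScript and its session handling isn’t shown here, so `authoriseFileList`, the `session` object and the status codes are all assumptions for illustration; the point is simply that the username passed in the request is only honoured when it matches the logged-in user.

```javascript
// Hypothetical check run before the API returns a user's list of uploaded files.
// `session` stands in for whatever server-side session data the site keeps.
function authoriseFileList(session, requestedUsername) {
  if (!session || !session.username) {
    // Nobody is logged in at all.
    return { status: 401, allowed: false };
  }
  if (session.username !== requestedUsername) {
    // Logged in, but asking for someone else's files.
    return { status: 403, allowed: false };
  }
  return { status: 200, allowed: true };
}
```

Because the session lives on the server, a visitor can change the username in the request as much as they like without ever seeing another user’s files.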
With the API updated to be able to accept requests for listing a user’s data uploads I then created a facility in the front-end for listing these files, ensuring that the list automatically gets updated with each new file upload. You can see the work in progress ‘my map data’ section in the following screenshot.
I wasn’t feeling very well at the start of the week, but instead of going home sick I managed to struggle through by focussing on some fairly unchallenging tasks, namely continuing to migrate the STARN materials to the University’s T4 system. I’m still ploughing through the Walter Scott novels, but I made a bit of progress. I also spent a little more time this week on AHRC duties.
I had a few meetings this week. I met with the Heads of School Administration, Wendy Burt and Nikki Axford on Tuesday to discuss some potential changes to my job, and then had a meeting with Marc on Wednesday to discuss this further. The outcome of these meetings is that actually there won’t be any changes after all, which is disappointing but at least after several months of the possibility hanging there it’s all decided now.
On Tuesday I also had a meeting with Bryony Randall to discuss her current AHRC project about editing modernist texts. I have a few days of effort assigned to this project, to help create a digital edition of a short story by Virginia Woolf and to lead a session on transcribing texts at a workshop in April, so we met to discuss how all this will proceed. We’ve agreed that I will create the digital edition, comprising facsimile images and multiple transcriptions with various features visible or hidden. Users will be able to create their own edition by deciding which features to include or hide, thus making users the editors of their own edition. Bryony is going to make various transcriptions in Word and I am then going to convert these into TEI text. The short story is only 6 pages long so it’s not going to be too onerous a task and it will be good experience to use TEI and Oxygen for a real project. I’ll get started on this next week.
I met with Fraser on Wednesday to discuss the OED updates for the Historical Thesaurus and also to talk about the Hansard texts again. We returned to the visualisations I’d made for the frequency of Thematic headings in the two-year sample of Hansard that I was working with. I should really try to find the time to return to this again as I had made some really good progress with the interface previously. Also this week I arranged to meet with Catriona Macdonald about The People’s Voice project and published this week’s new song of the week on the Burns website (http://burnsc21.glasgow.ac.uk/robert-bruces-address-to-his-army-at-bannockburn/).
Gary had also stated that some ‘or’ searches were not showing multiple icons when different attributes were selected. However, after some investigation I think this may just be because without supplying limits an ‘or’ search for two attributes will often result in the attributes both being present at every location, therefore all markers will be the same. E.g. a search for ‘D3 or A9’. There are definitely some combinations of attributes that do give multiple markers, e.g. ‘Q6 or D32’. And if you supply limits you generally get lots of different icons, e.g. ‘D3, young, 4-5 or A9, old, 4-5’. Gary is going to check this again for any specific examples that don’t seem right.
After that I began to think about the new Atlas search options that Gary would like me to implement, such as being able to search for entire groups of attributes (e.g. an entire parent category) rather than individual ones. At the moment I’m not entirely sure how this should work, specifically how the selected attributes should be joined. For example, if I select the parent ‘AFTER’ with limits ‘old’ and ‘rating 4-5’ would the atlas only then show me those locations where all ‘AFTER’ attributes (D3 and D4) are present with these limits? This would basically be the same as an individual attribute search for D3 and D4 joined by ‘and’. Or would it be an ‘or’ search? I’ve asked Gary for clarification but I haven’t heard back from him yet.
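The difference between the two interpretations can be shown with a small sketch. The locations below are made up, the limits (age group and rating) are left out entirely, and `matchLocations` is a hypothetical name; it only illustrates how an ‘and’ join and an ‘or’ join over a parent category’s attributes would select different locations.

```javascript
// Made-up example data: which attribute codes were recorded at each location.
const exampleLocations = [
  { name: "Location A", attributes: ["D3", "D4"] },
  { name: "Location B", attributes: ["D3"] },
  { name: "Location C", attributes: ["A9"] },
];

// Return the names of locations matching the given codes.
// join "and": every code must be present; anything else is treated as "or": at least one code.
function matchLocations(locations, codes, join) {
  return locations
    .filter((loc) => {
      const has = (code) => loc.attributes.includes(code);
      return join === "and" ? codes.every(has) : codes.some(has);
    })
    .map((loc) => loc.name);
}
```

With the codes `["D3", "D4"]`, the ‘and’ join keeps only Location A, while the ‘or’ join keeps Locations A and B, which is exactly the ambiguity that needs clarifying for group searches.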
I also made a couple of minor cosmetic changes to the atlas. Attributes within parent categories are now listed alphabetically by their code rather than their name, and selected buttons are now yellow to make it clearer which are selected and to differentiate from the ‘hover over’ purple colour. I then further reworked the ‘Atlas display options’ so that the different search options are now housed in an ‘accordion’. This hopefully helps to declutter the section a little. As well as accordion sections for ‘Questionnaire Locations’ and ‘Attribute Search’ I have added in new sections for ‘Advanced Attribute Search’ and ‘My Map Data’. These don’t have anything useful in them yet but eventually ‘Advanced Attribute Search’ will feature the more expanded options that are available via the ‘consistency data’ view – i.e. options to select groups of attributes and alternative ways to select ratings. ‘My Map Data’ will be where users can upload their own CSV files and possibly access previously uploaded datasets. See the following screenshot for an idea of how the new accordion works.
I also started to think about how to implement the upload and display of a user’s CSV files and realised that a lot of the information about how points are displayed on the map is not included in the CSV file. For example, there’s no indication of the joins between attributes or the limiting factors that were used to generate the data contained in the file. This would mean when uploading the data the system wouldn’t be able to tell whether the points should be displayed as an ‘and’ map or an ‘or’ map. I have therefore updated the ‘download map data’ facility to add in the URL used to generate the file in the first row. This actually serves two useful purposes. Firstly it means on re-uploading the file the system can tell which limits and Boolean joins were used and display an appropriate map and secondly it means there is a record in the CSV file of where the data came from and what it contains. A user would be able to copy the URL into their browser to re-download the same dataset if (for example) they messed up their file. I’ll continue to think about the implementation of the CSV upload facility next week.
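As a rough illustration of the download change, the sketch below adds a new first row carrying the query URL in its second column, quoting the URL in case it contains commas. The helper names and the label in the first column are placeholders rather than what the atlas actually writes, and the real download code is not in JavaScript.

```javascript
// Quote a cell if it contains a comma, a double quote or a line break (the usual CSV convention).
function csvCell(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: put the query URL in column B of a new first row.
// "Query URL" in column A is a placeholder label, not the real explanatory text.
function prependQueryRow(csvText, queryUrl) {
  const firstRow = [csvCell("Query URL"), csvCell(queryUrl)].join(",");
  return firstRow + "\n" + csvText;
}
```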