I had rather a lot of different things to get through this week, which made the week fly past alarmingly quickly. On Monday I attended an all-day meeting for the Scots Syntax Atlas project. The meeting was primarily focussed on the online resources that the project intends to produce, so it was obviously quite important for me to attend, seeing as I’ll be creating the resources. We discussed some existing online linguistic atlas projects, looking at some in detail and figuring out aspects of each that we liked or didn’t like. We tried out a few online language quizzes too, as the project is hoping to incorporate such a feature. We also discussed some of the requirements of our intended online resource, and how the ‘general public’ interface that the project intends to create should differ from the ‘academic’ interface. I have a much clearer idea now of what it is the project would like to develop, although we still need to have a more detailed discussion about the underlying data itself and how this will be presented on the map-based interface. Things are beginning to take shape, however.
Gareth Roy from Physics and Astronomy got back to me this week to say that one of his colleagues had set up a database on a server for me to use for the Hansard data, which was excellent news. I didn’t have much time to test this out this week, but did manage to confirm that I can access it from both my desktop machine and the little test server located in my office. Hopefully I will find the time next week to migrate some of the existing data to this database and start to write a script that can process the thousands of SQL files that contain the actual data.
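The batch loader for those thousands of SQL files will presumably just walk the directory and execute each file in turn, committing as it goes. As a minimal sketch of that approach (hypothetical function name, and using Python’s built-in sqlite3 purely for illustration rather than the actual database server):

```python
import glob
import os
import sqlite3


def load_sql_files(directory, conn):
    """Execute every .sql file in a directory against an open connection,
    committing after each file so a failure doesn't lose earlier work.
    Returns the number of files processed."""
    loaded = 0
    for path in sorted(glob.glob(os.path.join(directory, "*.sql"))):
        with open(path, encoding="utf-8") as f:
            # executescript runs all the statements in the file in one go
            conn.executescript(f.read())
        conn.commit()
        loaded += 1
    return loaded
```

Sorting the filenames keeps the run order predictable, which matters if some files create tables that later files insert into; a real run against the new server would swap sqlite3 for the appropriate MySQL client library.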
I also had to spend a bit of time this week doing AHRC review duties, and I attended a Digital Humanities talk on Friday afternoon, which was interesting but perhaps lacking a bit in detail. I spent some further time on the Mapping Metaphor project, as Wendy and Carole had prepared new batches of both OE and non-OE data that I needed to upload to the database. This all seemed to go smoothly enough, although as a result of the upload Wendy noticed that the data from certain OE categories appears to have been overlooked. It would appear that several of the categories that were split off from category H27 were omitted from the spreadsheets at some earlier stage in the data processing, and I had to spend some time trying to figure out when and how this happened. This was not easy, as the omission must have occurred more than a year ago and may have been caused either by one of my scripts or by some of the processes Ellen and Flora conducted. We still hadn’t managed to get to the bottom of the issue by the end of the week and I’ll have to return to this next week.
I also spent some time this week helping out Vivien Williams of the Burns project, who is putting the finishing touches to lots of new content for the website. She needed my help to get some parts of her pages laid out as she required them, and in converting some WAV files and other such tasks. I also had a detailed email conversation with Alison Wiggins about the proposal she’s putting together, and I think we now have a much clearer idea of how the technical aspects of the proposal will fit together. I also set up a ‘bare bones’ WordPress installation for Christine Ferguson’s new project and sent some code on to some researchers that were interested in how the SciFiMedHums bibliographic database was put together. Oh, I also published the Android version of ‘Readings in Early English’, which can be found here: https://play.google.com/store/apps/details?id=com.gla.stella.readings
I had quite a lot of meetings this week and I’ll just run through these first of all. I met with Scott Spurlock on Monday to discuss a proposal he is hoping to submit over the summer. It’s the same project that we discussed last year, which will involve crowdsourcing the transcription of millions of old handwritten documents. Scott was previously hoping to work with the Zooniverse people and had made some progress with them, but unfortunately they have decided to take on no more humanities projects for the next 18 months, which has left Scott looking for an alternative. It looks like we are going to try and adapt and host an existing free and open source crowdsourcing tool, although we still need to decide how detailed the transcriptions should be and how the workflow would need to be managed. It was a useful meeting and I hope we can take the project further.
On Tuesday I met with Gareth Roy from Physics and Astronomy. Gareth had previously helped me to use the ScotGrid infrastructure to process the Hansard data and he had also mentioned the possibility of being able to host the rather large database that the dataset requires after Arts IT Support said they wouldn’t be able to host it. I met Gareth at his office in the Kelvin building, which took some finding as the Kelvin building is quite a tricky place to navigate around. We discussed the requirements for the database and how I would need to use ScotGrid again to process all of the SQL files that my previous scripts had generated. Gareth and his colleagues were very helpful in discussing the possibilities and have agreed to set up a server where the data can be hosted, which should be accessible from the ScotGrid infrastructure so as to allow the insertion of data from scripts running through the SQL files. Hopefully I should have access to the database next week, which is brilliant.
On Thursday some of the developers and more technical academics in the College of Arts met with Arts IT Support to discuss the sorts of software that can be put on Arts servers. Arts IT are wanting to lock down the servers to only allow standard LAMP technologies to be used (that’s Linux, Apache, MySQL and PHP). This suits the needs of (I would estimate) about 90% of the online resources that developers in Arts are creating and it makes a lot of sense to put these limits in place, both in terms of long-term support (e.g. if a project is written in Ruby on Rails and the only person who knows that language leaves then ensuring the project continues to work in years to come as operating systems change and new software versions replace older versions is going to be tricky) and in terms of security (it’s trickier to ensure a wider variety of technologies are patched and up to date, especially over the longer term). However, there are cases when non-LAMP tools are simply the better solution to a research project’s needs and my worry was that these restrictions may hamper research and innovation. I think the meeting was really useful. We all got to discuss our concerns and we reached some useful conclusions about software, hosting issues and ensuring effective communication between the people writing bid proposals and the people who will have to support whatever is promised in these proposals. There was general agreement that LAMP should be used wherever possible (e.g. a developer shouldn’t use another technology if this setup will work just as well) but when there is a real need for another technology (for example the eXist XML database for working with XML datasets) then this should be discussed with Arts IT Support who will try to support its deployment. Hopefully this arrangement will work well in future.
On Friday I met with Christine Ferguson to discuss the requirements for her recently funded AHRC project. This is going to be fairly simple from a technical point of view (just a project website and some hosted videos) but it was good to talk to Christine about how she wanted the website to look and function and to get things started. I also met with Eva Moreda Rodriguez, a researcher from Music who is wanting to create a map-based system looking at early musical recordings in Spain. I gave her some advice, which was hopefully helpful, but as Music is beyond my remit I can’t really do anything more than that.
Other than meetings, I spent the rest of the week returning to the STELLA apps. Since publishing the Essentials of Old English app last year a new version of the Apache Cordova ‘wrapping’ tool had been released and Android apps created with the old version were being flagged up as being a possible security risk. I managed to upgrade the app code to the new version of Cordova and to publish version 1.1 of the app, both on the Google Play store and the Apple App store. I also took the opportunity to fix a couple of other issues with this app, namely a broken link when navigating between sections of the manual and some other small tweaks to the app text. After this was out of the way I decided to revisit the three other STELLA apps that I had created. I had only ever created iOS versions of these apps and I’ve always meant to create Android versions as well. I started with ‘Readings in Early English’. I upgraded Cordova and also upgraded the jQueryMobile framework for this app as well. I also made some updates to the interface, using buttons rather than text links to navigate between sections, adding in a higher resolution logo and things like that. Previously it had not been possible to use the HTML5 Audio tag in apps on Android devices – the player simply wouldn’t appear. Instead I had to use Cordova’s media plugin, which works fine but is somewhat rudimentary. I spent quite a bit of time trying to get the media plugin working, which I eventually did. But then I decided to just see if the HTML5 Audio tag might work in Android now (it has been a couple of years since I last tried) and it now seems to be working. This would appear to be a much nicer approach as the player features a duration bar, muting and other such functionality that I would have to manually create with the Media plugin. By the end of the week I had submitted an updated version of the app to the Apple App store and had a test version of the app running on my Android phone. 
I had also made a start on the Google Play store information for the app and had started creating screenshots and things like that, but I will need to finish this off next week.
It was another four-day week this week due to the May Day holiday on Monday. I spent most of Tuesday finishing off a first version of the Technical Plan for the proposal Murray Pittock is putting together and sent it off to him for comment on Wednesday morning. There will no doubt be some refinements to make to it before it can be integrated with the rest of the bid documentation but I think overall the technical side of the potential project has been given sufficient consideration. On Wednesday I heard from Christine Ferguson that a proposal she had submitted which I had given technical advice on had been awarded funding, which is great news. I’m going to meet with her next week to discuss the technical requirements in more detail.
Also on Wednesday we had a College of Arts developers meeting. This consisted of Matt Barr, Graeme Cannon, Neil McDermott and me so not a huge gathering, but it was very useful to catch up with other developers and discuss some of the important issues for developers in the College.
I also spent some time on Wednesday investigating a bug in the Mapping Metaphor tabular view. Wendy had noticed that ordering the table by category two was not working as it should have done. It looks like I introduced this bug when I allowed the table to be properly ordered by the ‘direction’ column a few weeks ago. For the table to be ordered by the direction column I needed to look at the full HTML of the data in the columns to get some info from within the ‘image’ tag of the arrow. But for some reason the reordering isn’t working properly for the other columns when the full HTML rather than the plain text is used. I’ve updated things so that the full HTML is used only when the ‘direction’ column is clicked on and the plain text is used for all other columns. I’ve fixed this in the main site and the Metaphoric site. I’m afraid the problem also exists in the App, but I’ll wait to fix this until we have the next batch of data to add later on this month.
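The actual fix lives in the site’s JavaScript, but the underlying problem is easy to demonstrate in any language: sorting cells by their full HTML compares the markup (tag attributes, class names) rather than the visible label, so the order can come out wrong whenever the markup disagrees with the text. A small Python illustration (hypothetical helper, not the project’s code):

```python
import re


def cell_sort_key(cell_html, use_html=False):
    """Sort key for a table cell: either the raw HTML (needed when the
    sortable value lives inside a tag, as with the 'direction' arrows),
    or just the visible text with the tags stripped (the right choice
    for ordinary text columns)."""
    if use_html:
        return cell_html
    return re.sub(r"<[^>]+>", "", cell_html).strip().lower()


# The class names here deliberately disagree with the alphabetical
# order of the visible labels, which is what exposes the bug.
cells = ['<span class="b">Animals</span>', '<span class="a">Light</span>']

# Sorting on raw HTML compares 'class="a"' vs 'class="b"' first,
# so 'Light' wrongly comes before 'Animals'.
by_html = sorted(cells, key=lambda c: cell_sort_key(c, use_html=True))

# Stripping the tags sorts on the visible text, as the user expects.
by_text = sorted(cells, key=cell_sort_key)
```

This mirrors the fix described above: use the full HTML only for the ‘direction’ column, and the plain text everywhere else.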
On Thursday I contacted Gareth Roy about the possibility of the Hansard data being hosted on one of the servers managed by Physics and Astronomy. Gareth had previously been hugely helpful in giving advice on how to use the ScotGrid infrastructure to extract all of the Hansard data and to convert it into millions of SQL insert statements, but without a server to host the data I couldn’t proceed any further. In March Gareth and I attended the same ‘big data’ meeting and afterwards he suggested that there might be the possibility of getting the data hosted on one of the servers he has access to, and now that Metaphoric is out of the way I have a bit of time to return to the Hansard data and consider how it can be used. I’m going to meet with Gareth next week to consider the options. In the meantime I tried to access the test server that Chris had set up for me last year, on which I had a subset of the Hansard data running through a graph-based front end that I’d created. Unfortunately when I tried to connect to it nothing happened. As the box is physically in my office (it’s just an old desktop PC set up as a server) I tried manually turning it off and on again but it just made a worrying series of low-pitched beeps and then nothing happened. I tried plugging a monitor and keyboard into it and there was no display and the caps lock key wouldn’t light up or anything. I took out the hard drive and put it in my old desktop PC, and that made the same worrying beeps and then did nothing when I started it too. I then attached the drive in place of the secondary hard drive in my old PC and after it had successfully booted into Windows it didn’t find the drive. So I guess there’s been some sort of hard drive failure! Thankfully I have all the code I was working with on my desktop PC anyway. However, I’ve realised that the version of the database I have on my desktop PC is not the most recent version.
I’ll be able to reconstruct the database from the original data I have but it’s a shame I’ve lost the working version. It’s completely my own fault for not backing things up. Chris is going to try and see if he can access the hard drive too, but I’m not holding out much hope.
On Friday I had a meeting with Alison Wiggins to discuss a proposal she is putting together that will involve crowdsourcing. I did a bit of research into the Zooniverse Scribe tool (http://scribeproject.github.io/), which she was keen to use. I also tried installing the tool in OSX but despite following their detailed installation instructions all I ended up with was a bunch of errors. Nevertheless, it was useful to learn about the tool and also to try out some online examples of the tool as well. I think it has potential, if we can only get it working. The meeting with Alison went well and we discussed the technical issues relating to her project and how crowdsourcing might fit in with it. She is still at the planning stages at the moment and we’ll need to see if the project will feature a technical aspect or if it will involve some kind of event discussing issues relating to crowdsourcing instead. I think either might work pretty well.
Also this week I received an external hard drive in the mail from the Royal College of Physicians in Edinburgh, who want access to the raw TIFF images that were produced for the Cullen project. Project PI David Shuttleton had previously agreed to this so I started the process off. Copying half a terabyte of images from a Network drive to an external hard drive takes rather a long time, but by Friday morning I had completed the process and had mailed the hard drive off. Hopefully it will arrive in one piece.
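The post doesn’t say how the copy was verified before the drive went in the post, but with half a terabyte of TIFFs the usual safeguard is a checksum manifest built over both the source and the destination, so any file that was corrupted in transit shows up as a mismatch. A sketch of that idea (hypothetical helper name, plain Python):

```python
import hashlib
import os


def checksum_manifest(root):
    """Build {relative path: SHA-256 hex digest} for every file under root.
    Running this over the network drive and again over the external drive,
    then comparing the two dicts, verifies the copy file by file."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MB chunks so large TIFFs never sit in memory whole
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest
```

Comparing `checksum_manifest(source) == checksum_manifest(copy)` is slow over a network drive, but far quicker than discovering a damaged image months later.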
I was struck down with a rather nasty cold this week, which unfortunately led to me being off sick on Wednesday and Thursday. It hit me on Tuesday and although I struggled through the day it really affected my ability to work. I somewhat foolishly struggled into work on the Wednesday but only lasted an hour before I had to go home. I returned to work on the Friday but was still not feeling all that great, which did unfortunately limit what I was able to achieve. However, I did manage to get a few things done this week.
On Monday I created ‘version 1.1’ of the Metaphoric app. The biggest update in this version is the ‘loading’ icons that now appear on the top-level visualisation when the user presses on a category. As detailed in previous posts, there can be a fairly lengthy delay between a user pressing on a category and the processing of yellow lines and circles completing, during which time the user has no feedback that anything is actually happening. I had spent a long time trying to get to the bottom of this, but realised that without substantially redeveloping the way the data is processed I would be unable to speed things up. Instead what I managed to do was add in the ‘loading’ icon to at least give a bit of feedback to users that something is going on. I had added this to the web version of the resource before the launch last week, but I hadn’t had time to add the feature to the app versions due to the time it takes for changes to apps to be approved before they appear on the app stores. I set to work adding this feature (plus a few other minor tweaks to the explanatory text) to the app code and then went through all of the stages that are required to build the iOS and Android versions of the apps and submit these updated builds to the App Store and the Play Store. By lunchtime on Monday the new versions had been submitted. By Tuesday morning version 1.1 for Android was available for download. Apple’s approval process takes rather longer, but thankfully the iOS version was also available for download by Friday morning. Other than updating the underlying data when the researchers have completed new batches of sample lexemes my work on the Metaphor projects is now complete. The project celebrated this milestone with lunch in the Left Bank on Tuesday, which was very tasty, although I was already struggling with my cold by this point, alas.
Also this week I met with Michael McAuliffe, a researcher from McGill University in Canada who is working with Jane Stuart Smith to develop some speech corpus analysis tools. Michael was hoping to get access to the SCOTS corpus files, specifically the original, uncompressed sound recordings and the accompanying transcriptions made using the PRAAT tool. I managed to locate these files for him and he is going to try and use these files with a tool he has created in order to carry out automated analysis/extraction of vowel durations. It’s not really an area I know much about but I’m sure it would be useful to add such data to the SCOTS materials for future research possibilities.
I also finalised my travel arrangements for the DH2016 conference and made a couple of cosmetic tweaks to the People’s Voice website interface. Other than that I spent the rest of my remaining non-sick time this week working on the technical plan for Murray Pittock’s new project. I’ve managed to get about a third of the way through a first draft of the plan so far, which has resulted in a number of questions that I sent on to the relevant people. I can’t go into any detail here but the plan is shaping up pretty well and I aim to get a completed first draft to Murray next week.