These weeks seem to be zipping by at an alarming rate! I split most of my time this week between three projects and tackled a few bits and bobs for other projects along the way too. First up is Metaphor in the Curriculum. Last week I created a fully functioning mockup of a metaphor quiz and three basic interface designs. This week I created a fourth design, an adaptation of the third that incorporates some fairly significant changes. The biggest change is the introduction of a background image – a stock image from the very handy free resource http://www.freeimages.com/. The background image really brightens up the interface, and transparency on some of the interface elements helps to make it look appealing without making the actual content difficult to read. I also reworked the ‘MetaphorIC’ header text so that the ‘IC’ is in a different, more cursive font, and added a ‘home’ button to the header. I think it’s coming together quite nicely. We have another project meeting next week so I’ll probably have a better idea of where to focus next on this project after that.
My next project was the Burns project. Last month Pauline sent round a document listing some fairly major changes to the project website – restructuring sections, changing navigation, layout and page content etc. I set up a test version of the live site and set about implementing all of the changes that I could make without further input from project people. After getting it all working pretty well I contacted Pauline and we arranged to meet on Monday next week to go through everything and (hopefully) make all of the changes live.
The third project I worked on this week was the SCOSYA project, and this took up the bulk of my time. Last week Gary had sent me a template of the spreadsheet that the project fieldworkers will fill in and email to Gary. Gary will then need to upload these spreadsheets to an online database through a content management system that I need to create. This week I began working on the database structure and the content management system. The project also wants the usual sort of project website and blog, so first of all I set up WordPress on the project’s domain. I toyed with the idea of making the content management system a WordPress ‘plugin’, but as I want the eventual front end to be non-WordPress I decided against this. I also looked into using Drupal for the content management system, as Drupal is a tool I feel I ought to learn more about. However, the content management system is going to be very straightforward – just file upload plus data browse, edit and delete – and using Drupal or other such tools seemed like overkill to me. I was also reluctant to use a system such as Drupal because they seem to change so rapidly. SCOSYA is a five-year project (I think!) and my worry is that by the end of the project the version of Drupal I used would have been superseded, no longer supported and seen as a bad thing to have running on a server. So I decided just to create the CMS myself.
I decided that rather than write all of the user authentication and management stuff myself I would tie this in with the WordPress system that I’d set up to power the project website and blog. After a bit of research I figured out that it is remarkably easy for non-WordPress scripts to access the WordPress authentication methods so I set up the CMS to use these, following the instructions I found here: http://skookum.com/blog/using-wordpress-as-a-user-and-authentication-database. With this in place SCOSYA staff can manage their user accounts via WordPress and use the same details to access the CMS, which seems very neat.
By the end of the week I had created an upload script that allows you to drag and drop multiple files into the upload pane; the files are checked on both the client and server side, and a log of uploads is built up dynamically in a scrolling section beneath the pane as each file is processed. I still need to do quite a lot of work on the server-side script in order to extract the actual data from the uploaded files and insert it into the relevant tables, but I feel that I have made very good progress with the system so far. I’ll continue with this next week.
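The server-side checks described above might look something like the following minimal sketch. The file types, size limit, function names and log messages are all illustrative assumptions, not the actual SCOSYA implementation.

```python
import os

# Illustrative limits - the real system's rules may differ
ALLOWED_EXTENSIONS = {".xls", ".xlsx", ".csv"}
MAX_SIZE_BYTES = 5 * 1024 * 1024  # arbitrary 5 MB cap

def validate_upload(filename, size_bytes):
    """Return (ok, message) for one uploaded file; the message would be
    appended to the scrolling upload log beneath the upload pane."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"{filename}: rejected (unsupported type {ext})"
    if size_bytes > MAX_SIZE_BYTES:
        return False, f"{filename}: rejected (file too large)"
    return True, f"{filename}: accepted, queued for import"

# Build up a log as each dropped file is processed
log = [validate_upload(name, size) for name, size in
       [("loc01.xlsx", 120_000), ("notes.pdf", 90_000)]]
```

Duplicating the checks on the server matters because client-side validation can always be bypassed; the server-side pass is the authoritative one.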
Other than these three main projects I was involved with some others. I fixed a few bugs that had crept into the SciFiMedHums bibliographic search facility when I’d updated its functionality last week. I slightly tweaked the Medical Humanities Network system to give a user feedback if they try to log in with incorrect details (previously no feedback was given at all). I also contacted John Watt at NeSC to see whether he might be able to help with the extraction of the Hansard data and he suggested I try to do this on the University’s High Performance Compute Cluster. I’ll need to speak with the HPCC people to see how I might be able to use their facilities for the task that needs to be performed.
I met with Gary Thoms this week to discuss the content management system I’m going to build for the SCOSYA project. Gary has prepared a spreadsheet template that the fieldworkers are going to be using when conducting their interviews and we talked through how this was structured. After that I began to think about the database structure that will be used to store the spreadsheet uploads. I also had a meeting with the Burns people to discuss the new bid that they are putting together, which is getting very close to completion now. I also talked to Pauline about the restructuring of the Burns online resource and we agreed that I would begin work on this at a temporary URL before moving everything across to replace the existing site. I’ll need to start work on this in the next week or so. I also updated the search functionality of the SciFiMedHums bibliography system to enable users to search for multiple themes and mediums (once the system goes live). I also made a few tweaks to the Medical Humanities Network website, mainly adding in the site text and helping out with some video uploads. I made a couple of small tweaks to the new Thesaurus of Old English content management system and set up some user accounts for people too.
My major focus of the week was the Metaphor in the Curriculum project. At our last project meeting Ellen had given me some sample questions to show how the metaphor quizzes that will be found in the apps and the website will be structured. I spent a lot of this week creating working digital prototypes of these, using a similar approach to the one I’d taken for the interactive STELLA apps I’d previously produced: there is a question, some possible answers, a ‘check answer’ button, a ‘restart’ button and next and previous question buttons (where applicable). The question content itself is pulled in from a JSON file and there are three question types, although really types 1 and 3 are handled in the same way in the current version. Types 1 and 3 (questions 1-3 and 6-7) present possible answers, the user can click on one and then press the ‘check answer’ button. A tick will be placed beside their answer if it was correct, a cross if incorrect. No other feedback is currently offered and there is no tracking of right / wrong answers (this is something that might be changed).
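The question-and-answer flow described above could be sketched as follows. The JSON field names and question content are invented for illustration; the prototype’s actual file format may differ.

```python
import json

# Hypothetical shape of the JSON file the quiz pulls questions from
questions_json = """
[
  {"type": 1,
   "question": "Which sentence uses a metaphor?",
   "answers": ["Time is money.", "The cat sat on the mat."],
   "correct": 0}
]
"""

def check_answer(question, chosen_index):
    """Mimic the 'check answer' button: a tick beside a correct choice,
    a cross otherwise. No score is tracked, matching the prototype."""
    return "tick" if chosen_index == question["correct"] else "cross"

questions = json.loads(questions_json)
check_answer(questions[0], 0)  # "tick"
check_answer(questions[0], 1)  # "cross"
```

Keeping the question content in JSON rather than hard-coding it means the same quiz engine can serve both the website and the apps.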
Question type 2 (questions 4-5) allows the user to ‘drag and drop’ an answer into the dotted box in the question. Once an answer is dropped into place the user can press the ‘check answer’ button and a tick or cross will be displayed beside the sentence and also beside the item they dragged. I’ve tested this drag and drop functionality out on iOS and Android devices and it works fine, so hopefully we’ll be able to include such functionality in the final version of the quizzes.
The prototypes I’d created focused entirely on the functionality of the quiz itself and not on the overall design of the interface, and once I’d completed the prototypes I set to work on some possible interface designs. So far I have created three, but I need to work on these some more before I let anyone see them. Having said that, I think the technical side of the project is currently on schedule, which is encouraging.
On Friday there was a corpus linguistics workshop at the University, which was being organised by Wendy. I had been invited to be part of the panel for a roundtable discussion session at the end of the event so I spent some time this week preparing for this. Unfortunately my son was ill on Friday and I had to leave work to pick him up from school, which meant I had to miss the event which was a real shame.
This week my time was mostly split across three projects. Firstly, I returned to finish off some work for the Thesaurus of Old English. I had been asked to create a content management system that would allow staff to edit categories and words when I redeveloped the website a couple of months ago, but due to other work commitments I hadn’t got round to implementing it. This week I decided that the time had come. I had initially been planning on using the WordPress based Thesaurus management system that I had created for the Scots Thesaurus project, but I realised that this was a bit unnecessary for the task in hand. The WordPress based system is configured to manage every aspect of a thesaurus website – not just adding and editing categories and words but also the front end, the search facilities, handling users and user submitted content and more. TOE already has a front end and doesn’t need WordPress to manage all of these aspects. Instead I decided to take the approach I’d previously taken with the Mapping Metaphor project: have a very simple script that displays an edit form for a category and processes any updates (with user authentication, of course). It took about a day to get this set up and tested for the TOE data. The resulting script allows all the thesaurus category information (e.g. the heading, part of speech and number) to be edited and category cross references to be added, edited and deleted. Associated lexemes can also be added, edited and deleted and all the lexeme data, including the associated search terms, can be updated. I also updated the database so that whenever information is deleted it’s not really deleted but moved to a different ‘deleted’ table.
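The ‘move rather than delete’ approach at the end of that paragraph is a common soft-delete pattern, and can be sketched as below. The table and column names are invented for illustration and are not the real TOE schema.

```python
import sqlite3

# In-memory stand-in for the thesaurus database
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lexeme (id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE lexeme_deleted (id INTEGER, word TEXT);
    INSERT INTO lexeme VALUES (1, 'hus'), (2, 'burg');
""")

def soft_delete_lexeme(conn, lexeme_id):
    """Copy the row into the parallel 'deleted' table, then remove it
    from the live table - nothing is ever truly lost."""
    conn.execute(
        "INSERT INTO lexeme_deleted SELECT * FROM lexeme WHERE id = ?",
        (lexeme_id,))
    conn.execute("DELETE FROM lexeme WHERE id = ?", (lexeme_id,))

soft_delete_lexeme(conn, 1)
# 'hus' is now gone from lexeme but preserved in lexeme_deleted
```

The advantage is that accidental deletions through the CMS can be reversed simply by copying the row back.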
My second project of the week was Mapping Metaphor. Last week I had begun to update the advanced search and the quick search to enable searches for the sample lexemes. This week I updated the Old English version of the site to also include these facilities. This wasn’t as straightforward as copying the code across, as the OE data differs from the main data in some respects – for example there are no dates or ‘first lexemes’. This meant updating the code I’d written for the main site to take this into consideration. I also had to ensure that the buttons for adding ashes and thorns worked with the new sample lexeme search box. With all this implemented and then tested by Wendy and Ellen I made the new versions live and they are now available through the Mapping Metaphor website.
My third major project of the week was the Hansard visualisations for the Samuels project. My first big task was to finish off the ‘limit by member’ feature. Last week I had created the user interface components for this, but the database query running behind it just wasn’t working. A bit of further investigation this week uncovered some problems with the way in which the SQL queries were being dynamically generated and I managed to fix these, and also to add some additional indices to the tables to speed up data retrieval. I also ensured that returned data was cached in another table, which greatly improves the speed of subsequent queries for the same member. The limit by member feature is now working rather well, although there are still some improvements that I need to make to the user interface. We had an XML file containing more information about members from the ‘Digging into Linked Parliamentary Data’ project. This included information on members’ party affiliations and also their gender, both of which will be very useful to limit the display of thematic headings by. I managed to extract party information from the XML file and have uploaded it to our Hansard database now, associating it with members (and through members to speeches and frequencies). Some members have multiple parties and I managed to get them all out too, including dates for these where available. We have 9704 party affiliations for the 9575 members. I’ve also extracted all of the parties too – there are 54 of these, which is more than I was expecting. This data will mean that it will eventually be possible to select a party and see the frequency data for that party.
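The caching idea mentioned above – run the expensive aggregation once per member, store the result, and serve repeat requests from the cache table – can be sketched like this. The schema and names are illustrative only, not the actual Hansard database.

```python
import sqlite3

# Tiny stand-in for the frequency data and its cache table
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speech_freq (member_id INT, heading TEXT, freq INT);
    CREATE TABLE member_cache (member_id INT, heading TEXT, freq INT);
    INSERT INTO speech_freq VALUES
        (1, 'Love', 3), (1, 'Love', 2), (1, 'Liking', 1);
""")

def freq_for_member(conn, member_id):
    """Return cached totals if present; otherwise aggregate,
    store the result, and return it."""
    cached = conn.execute(
        "SELECT heading, freq FROM member_cache WHERE member_id = ?",
        (member_id,)).fetchall()
    if cached:
        return cached
    rows = conn.execute(
        """SELECT heading, SUM(freq) FROM speech_freq
           WHERE member_id = ? GROUP BY heading""",
        (member_id,)).fetchall()
    conn.executemany("INSERT INTO member_cache VALUES (?, ?, ?)",
                     [(member_id, h, f) for h, f in rows])
    return rows

freq_for_member(conn, 1)  # first call aggregates and fills the cache
```

A cache like this trades storage for speed: the first query for a member is still slow, but every subsequent one is a simple indexed lookup.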
I also took the opportunity to add the gender data to our member database as well, as I thought a search for gender might interest people (although we’ll definitely need to normalise this due to the massive gender imbalance, and even then it might not be considered advisable to compare thematic heading use by gender – we’ll need to see). I had a bit of trouble with the import of the gender data as there are two ID fields in the people database – ‘ID’ and ‘import_ID’. I initially used the first one but spotted something was wrong when it told me that Paddy Ashdown was a woman! All is fixed now, though, and I’ll try to update the visualisation to include limit options for party and gender next week.
Also this week I had a catch-up meeting with Marc where we discussed the various projects I’m involved with and where things are headed. As always, it was a very useful meeting. I also had a couple of other university related tasks that I had to take care of this week that I can’t really go into too much detail about here. That’s all for this week.
My first task this week was to finish off the AHRC duties I’d started last Friday, and with that out of the way I set about trying to fix a small bug with the Scots Corpus that I’d been meaning to try and sort for some time. The concordance feature of the advanced search allows users to order the sentences alphabetically by words to the left or right of the node word but the ordering was treating upper and lower case words separately, e.g. ABC and then abc, rather than AaBbCc. This was rather confusing for users and was being caused by the XSLT file that transforms the XML data into HTML. This is processed dynamically via PHP and unfortunately PHP doesn’t support XPath 2, which provides some handy functions for ignoring case. There is a hack to make XPath 1 ignore case by transforming the data before it is ordered but last time I tried to get this to work I just couldn’t figure out how to integrate it. Thankfully when I looked at the XSLT this time I realised where the transformation needed to go and we have nicely ordered results in the concordance at last.
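The XPath 1.0 workaround mentioned above is the classic `translate()` trick: since XPath 1.0 has no `lower-case()` function, you sort on `translate(., 'ABC…', 'abc…')` (i.e. `<xsl:sort select="translate(., $upper, $lower)"/>`) so that case is folded before ordering. The same idea, shown in Python for illustration:

```python
# XPath 1.0's translate() maps characters positionally from one
# alphabet string to another - the standard case-folding hack.
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

def xpath1_lower(s):
    """Equivalent of translate(s, UPPER, LOWER) in XPath 1.0."""
    return s.translate(str.maketrans(UPPER, LOWER))

words = ["Banana", "apple", "Apple", "banana"]
sorted(words)                    # ['Apple', 'Banana', 'apple', 'banana']
sorted(words, key=xpath1_lower)  # apples before bananas, case ignored
```

The first sort shows the old, confusing behaviour (all capitals before all lowercase); the second is the AaBbCc ordering the concordance now produces.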
On Tuesday we had a team meeting for the Metaphor in the Curriculum project. One of the metaphor related items on my ‘to do’ list was to integrate a search for sample lexemes with the Mapping Metaphor search facilities, so in preparation for this meeting I tried to get such a feature working. I managed to get an updated version of the Advanced Search working before the meeting, and this allows users to supply some text (with wildcards if required) into a textbox and for this to search the sample lexemes we have recorded about each metaphorical link. I also updated the way advanced search results are displayed. Previously, upon completing an advanced search, users were taken to a page where they could decide which of the returned categories they actually wanted to view the data for. This was put in place to avoid the visualisation getting swamped with data, but I always found it a rather confusing feature. What I’ve done instead is to present a summary of the user’s search, the number of returned metaphorical connections, an option to refine the search and then buttons leading to the four data views for the results (visualisation, table etc). I think this works a lot better and makes a lot more sense. I also updated the quick search to incorporate a search for sample lexemes. The quick search is actually rather different to the advanced search in that the former searches categories while the latter searches metaphorical connections. Sample lexemes are an attribute of a metaphorical connection rather than a category so it took me a while to decide how to integrate the sample lexeme search with the quick search. In the end I realised that both categories in a metaphorical connection share the same pool of sample lexemes – that is how we know there is overlap between the two categories. So I could just assume that if a search term appeared in the sample lexemes then both categories in the associated connection should be returned in the quick search results.
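That final piece of reasoning – a lexeme match on a connection returns both of the connection’s categories – can be sketched as follows, with invented data purely for illustration:

```python
# Each metaphorical connection links two categories and carries a
# shared pool of sample lexemes (invented examples below).
connections = [
    {"cat1": "Love",  "cat2": "Fire", "lexemes": ["burning", "flame", "spark"]},
    {"cat1": "Anger", "cat2": "Heat", "lexemes": ["boil", "simmer"]},
]

def quick_search(term):
    """Return every category whose connection has a sample lexeme
    containing the search term - both ends of each matching link."""
    matches = set()
    for c in connections:
        if any(term in lex for lex in c["lexemes"]):
            matches.update([c["cat1"], c["cat2"]])
    return sorted(matches)

quick_search("flame")  # ['Fire', 'Love']
```

Because the lexeme pool belongs to the connection rather than to either category, there is no way (and no need) to attribute a match to only one side.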
The actual project meeting on Tuesday went well and Ellen has supplied me with some example questions for which I need to make some mock-ups to test out the functionality of the interactive exercises. Ellen had also noticed that some of the OE data was missing from the online database and after she provided it to me I incorporated it into the system.
On Thursday morning I had a meeting with Jennifer and Gary to discuss the SCOSYA project. I’m going to be developing the database, content management system and frontend for the project, and my involvement will now be starting in the next week or so. The project will be looking at about 200 locations, with student fieldworkers filling in about 1000 questionnaires and a similar number of recordings. To keep things simple the questionnaires will be paper based and the students will then transfer their answers to an Excel spreadsheet afterwards. They will then email these to Gary. So rather than having to make a CMS that needs to keep track of at least 40 users and their various data I thankfully just need to provide facilities for Gary and one or two others to use, and the data will be uploaded into the database via Gary attaching the spreadsheets to a web form. Gary is going to provide me with a first version of the spreadsheet structure, with all data / metadata fields present and I will begin working on the database after that.
I spent most of the rest of the week on tasks relating to the Hansard data for the Samuels project. On Thursday afternoon I attended the launch of the Hansard Corpus, and that all went very well. My own Hansard related work (visualising the frequency of thematic headings throughout the Hansard texts) is also progressing, albeit slowly. This week I incorporated a facility to allow the user’s search to be limited to a specific category that they have chosen, or for the returned data to include the chosen category and every other category below this point in the hierarchy. So, for example, a search for ‘Love’ (AU:27) can either show results for this category specifically or also include all categories lower down, e.g. ‘Liking’ (AU:27:a). I created cached versions of the data for the ‘cascading’ searches too, and it’s working pretty well. I then began to tackle limiting searches by speaker. I’ve now got a ‘limit by’ box that users can open up for each row that is displayed. This box currently features a text box where a speaker’s name can be entered. This box has an ‘autocomplete’ function so for example searching for ‘Tony B’ will display people like ‘Blair’ and ‘Benn’. Clicking on a person adds them to the limit option and updates the query. And this is as far as I’ve got, because trying to run the query on the fly even for the two years of sample data causes my little server to stop working. I’m going to need to figure out a way to optimise the queries if this feature is going to be at all usable. This will be a task for next week.
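Since the thematic heading codes are hierarchical (‘AU:27:a’ sits beneath ‘AU:27’), the ‘cascading’ selection can be implemented as a simple prefix match on the code. A minimal sketch, with an invented list of headings:

```python
# A tiny invented slice of the thematic heading hierarchy
headings = ["AU:27", "AU:27:a", "AU:27:b", "AU:28"]

def select_headings(code, cascade):
    """With cascade off, match the chosen category alone; with it on,
    also match every descendant, identified by the code prefix."""
    if not cascade:
        return [code] if code in headings else []
    return [h for h in headings if h == code or h.startswith(code + ":")]

select_headings("AU:27", cascade=True)   # ['AU:27', 'AU:27:a', 'AU:27:b']
select_headings("AU:27", cascade=False)  # ['AU:27']
```

Note the `code + ":"` guard: matching on the bare prefix alone would wrongly pull in sibling codes such as ‘AU:270’ if they existed.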
I returned to a more normal working week this week, after having spent the previous one at a conference and the one before that on holiday. I probably spent about a day catching up with emails, submitting my expenses claim and writing last week’s rather extensive conference report / blog post. I also decided it was about time that I gathered all of my outstanding tasks together into one long ‘to do’ list as I seem to have a lot going on at the moment. The list currently has 47 items on it split across more than 12 different projects, not including other projects that will be starting up in the next month or two. There’s rather a lot going on at the moment and it is good to have everything written down in one place so I don’t forget anything. I also had some AHRC review duties to perform this week as well, which took up some further time.
With these tasks out of the way I could get stuck into working on some of my outstanding projects again. I met with Hannah Tweed on Tuesday to go through the Medical Humanities Network website with her. She had begun to populate the content management system with projects and people and had encountered a few bugs and areas of confusion, so we went through the system and I made a note of things that needed to be fixed. These were all thankfully small issues and all easily fixable, such as suppressing the display of fields when the information isn’t available, and it was good to get things working properly. I also returned to the SciFiMedHums bibliographical database. I updated the layout of the ‘associated information’ section of the ‘view item’ page to make it look nicer and I created the ‘advanced search’ form, which enables users to search for things like themes, mediums, dates, people and places. I also reworked the search results page to add in pagination, with results currently getting split over multiple pages when more than 10 items are returned. I’ve pretty much finished all I can do on this project now until I get some feedback from Gavin. I also helped Zanne to get some videos reformatted and uploaded to the Academic Publishing website, which will probably be my final task for this project.
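The pagination added to the search results page amounts to slicing the result set into pages of ten. A minimal sketch (the page size matches the behaviour described; everything else is illustrative):

```python
PER_PAGE = 10  # results per page, as described above

def paginate(items, page):
    """Return the slice of items for a 1-indexed page number."""
    start = (page - 1) * PER_PAGE
    return items[start:start + PER_PAGE]

results = list(range(23))  # pretend search returned 23 items
paginate(results, 1)       # first 10 items
paginate(results, 3)       # [20, 21, 22] - the final, partial page
```

In practice the same arithmetic would be pushed into the SQL query as `LIMIT`/`OFFSET` so that only one page of rows is fetched at a time.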
Wendy contacted me this week to say that she’d spotted some slightly odd behaviour with the Scots Corpus website. The advanced search was saying that there were 1317 documents in the system but a search returning all of them was saying that it matched 99.92% of the corpus. The regular search stated that there were 1316 documents. We figured out that this was being caused by a request we had earlier this year to remove a document from the corpus. I had figured out a way to delete it but evidently there was some data somewhere that hadn’t been successfully updated. I managed to track this down: it turned out that the number of documents and the total number of words was being stored statically in a database table, and the advanced search was referencing this. Having discovered this I updated the static table and everything was sorted. Wendy also asked me about further updates to the Corpus that she would like to see in place before a new edition of a book goes to the printers in January. We agreed that it would be good to rework the advanced search criteria selection as the options are just too confusing as they stand. There is also a slight issue with the concordance ordering that I need to get sorted too.
At the conference last week Marc, Fraser and I met with Terttu Nevalainen and Matti Rissanen to discuss Glasgow hosting the Helsinki Corpus, which is currently only available on CD. This week I spent some time looking through the source code and getting a bit of server space set aside for hosting the resource. The scripts that power the corpus are Python based and I’ve not had a massive amount of experience with Python, but looking through the source code it all seemed fairly easy to understand. I managed to get the necessary scripts and the data (mostly XML and some plain text) uploaded to the server and the scripts executing. The only change I have so far made to the code is to remove the ‘Exit’ tab as this is no longer applicable. We will need to update some of the site text and also add in a ‘hosted by Glasgow’ link somewhere. The corpus all seems to work online in the same way as it does on the CD now, which is great. The only problem is the speed of the search facilities. The search is very slow, and can take up to 30 seconds to run. Without delving into the code I can’t say why this is the case, but I would suspect it is because the script has to run through every XML file in the system each time the search runs. There doesn’t appear to be any caching or indexing of the data (e.g. using an XML database) and I would imagine that without using such facilities we won’t be able to do much to improve the speed. The test site isn’t publicly accessible yet as I need to speak to Marc about it before we take things further.