Week Beginning 9th November 2015

This week my time was mostly split across three projects. Firstly, I returned to finish off some work for the Thesaurus of Old English. I had been asked to create a content management system that would allow staff to edit categories and words when I redeveloped the website a couple of months ago, but due to other work commitments I hadn’t got round to implementing it. This week I decided that the time had come. I had initially been planning on using the WordPress based Thesaurus management system that I had created for the Scots Thesaurus project, but I realised that this was a bit unnecessary for the task in hand. The WordPress based system is configured to manage every aspect a thesaurus website – not just adding and editing categories and words but also the front end, the search facilities, handling users and user submitted content and more. TOE already has a front end and doesn’t need WordPress to manage all of these aspects. Instead I decided to take the approach I’d previously taken with the Mapping Metaphor project: Have a very simple script that displays an edit form for a category and processes and updates (with user authentication, of course). It took about a day to get this set up and tested for the TOE data. The resulting script allows all the thesaurus category information (e.g. the heading, part of speech and number) to be edited and category cross references to be added, edited and deleted. Associated lexemes can also be added, edited and deleted and all the lexeme data, including the associated search terms can be updated. I also updated the database so that whenever information is deleted it’s not really deleted but moved to a different ‘deleted’ table.

My second project of the week was Mapping Metaphor. Last week I had begun to update the advanced search and the quick search to enable searches for the sample lexemes. This week I updated the Old English version of the site to also include these facilities. This wasn’t as straightforward as copying the code across as the OE data has some differences to the main data – for example there are no dates or ‘first lexemes’. This meant updating the code I’d written for the main site to take this into consideration. I also had to ensure that the buttons for adding ashes and thorns worked with the new sample lexeme search box. With all this implemented and then tested by Wendy and Ellen I made the new versions live and they are now available through the Mapping Metaphor Website.

My third major project of the week was the Hansard visualisations for the Samuels project. My first big task was to finish off the ‘limit by member’ feature.  Last week I had created the user interface components for this, but the database query running behind it just wasn’t working.  A bit of further investigation this week uncovered some problems with the way in which the SQL queries were being dynamically generated and I managed to fix these, and also to add some additional indices to the tables to speed up data retrieval.  I also ensured that returned data was cached in another table which great improves the speed of subsequent queries for the same member.  The limit by member feature is now working rather well, although there are still some improvements that I need to make to the user interface.  We had an XML file containing more information about members from the ’Digging into Linked Parliamentary Data’ project. This included information on members’ party affiliations and also their gender, both of which will be very useful to limit the display of thematic headings by. I managed to extract party information from the XML file and have uploaded it to our Hansard database now, associating it with members (and through members to speeches and frequencies). Some people have multiple parties and I managed to get them all out too, including dates for these where available. We have 9704 party affiliations for the 9575 members. I’ve also extracted all of the parties too – there are 54 of these, which is more than I was expecting. This data will mean that it will eventually be possible to select a party and see the frequency data for that party.

I also took the opportunity to add the gender data this to our member database as well as I thought a search for gender might interest people (although we’ll definitely need to normalise this due to the massive gender imbalance and even then it might not be considered advisable to compare thematic heading use by gender – we’ll need to see). I had a bit of trouble with the import of the gender data as there are two ID fields in the people database – ‘ID’ and ‘import_ID’. I initially used the first one but spotted something was wrong when it told me that Paddy Ashdown was a woman! All is fixed now, though, and I’ll try to update the visualisation to include limit options for party and gender next week.

Also this week I had a catch-up meeting with Marc where we discussed the various projects I’m involved with and where things are headed. As always, it was a very useful meeting. I also had a couple of other university related tasks that I had to take care of this week that I can’t really go into too much detail about here. That’s all for this week.

Week Beginning 2nd November 2015

My first task this week was to finish off the AHRC duties I’d started last Friday, and with that out of the way I set about trying to fix a small bug with the Scots Corpus that I’d been meaning to try and sort for some time. The concordance feature of the advanced search allows users to order the sentences alphabetically by words to the left or right of the node word but the ordering was treating upper and lower case words separately, e.g. ABC and then abc, rather than AaBbCc. This was rather confusing for users and was being caused by the XSLT file that transforms the XML data into HTML. This is processed dynamically via PHP and unfortunately PHP doesn’t support XPath 2, which provides some handy functions for ignoring case. There is a hack to make XPath 1 ignore case by transforming the data before it is ordered but last time I tried to get this to work I just couldn’t figure out how to integrate it. Thankfully when I looked at the XSLT this time I realised where the transformation needed to go and we have nicely ordered results in the concordance at last.

On Tuesday we had a team meeting for the Metaphor in the Curriculum project. One of the metaphor related items on my ‘to do’ list was to integrate a search for sample lexemes with the Mapping Metaphor search facilities so In preparation for this meeting I tried to get such a feature working. I managed to get an updated version of the Advanced Search working before the meeting, and this allows users to supply some text (with wildcards if required) into a textbox and for this to search the sample lexemes we have recorded about each metaphorical link. I also updated the way advanced search results are displayed. Previously upon completing an advanced search users were then taken to a page where they could then decide which of the returned categories they actually wanted to view the data for. This was put in place to avoid the visualisation getting swamped with data, but I always found it a rather confusing feature. What I’ve done instead is to present a summary of the user’s search, the number of returned metaphorical connections, an option to refine the search and then buttons leading to the four data views for the results (visualisation, table etc). I think this works a lot better and makes a lot more sense. I also updated the quick search to incorporate a search for sample lexemes. The quick search is actually rather different to the advanced search in that the former searches categories while the latter searches metaphorical connections. Sample lexemes are an attribute of a metaphorical connection rather than a category so it took me a while to decide how to integrate the sample lexeme search with the quick search. In the end I realised that both categories in a metaphorical connection share the same pool of sample lexemes – that is how we know there is overlap between the two categories. So I could just assume that if a search term appeared in the sample lexemes then both categories in the associated connection should be returned in the quick search results.

The actual project meeting on Tuesday went well and Ellen has supplied me with some example questions for which I need to make some mock-ups to test out the functionality of the interactive exercises. Ellen had also noticed that some of the OE data was missing from the online database and after she provided it to me I incorporated it into the system.

On Thursday morning I had a meeting with Jennifer and Gary to discuss the SCOSYA project. I’m going to be developing the database, content management system and frontend for the project, and my involvement will now be starting in the next week or so. The project will be looking at about 200 locations, with student fieldworkers filling in about 1000 questionnaires and a similar number of recordings. To keep things simple the questionnaires will be paper based and the students will then transfer their answers to an Excel spreadsheet afterwards. They will then email these to Gary. So rather than having to make a CMS that needs to keep track of at least 40 users and their various data I thankfully just need to provide facilities for Gary and one or two others to use, and the data will be uploaded into the database via Gary attaching the spreadsheets to a web form. Gary is going to provide me with a first version of the spreadsheet structure, with all data / metadata fields present and I will begin working on the database after that.

I spent most of the rest of the week on tasks relating to the Hansard data for the Samuels project. On Thursday afternoon I attended the launch of the Hansard Corpus, and that all went very well. My own Hansard related work (visualising the frequency of thematic headings throughout the Hansard texts) is also progressing, albeit slowly. This week I incorporated a facility to allow the user’s search to be limited to a specific category that they have chosen, or for the returned data to include the chosen category and every other category below this point in the hierarchy. So for example whether ‘Love’ (AU:27) should only show results for this category specifically or whether it should also include all categories lower down, e.g. ‘Liking’ (AU:27:a). I created cached versions of the data for the ‘cascading’ searches too, and it’s working pretty well. I then began to tackle limiting searches by speaker. I’ve now got a ‘limit by’ box that users can open up for each row that is displayed. This box currently features a text box where a speaker’s name can be entered. This box has an ‘autocomplete’ function so for example searching for ‘Tony B’ will display people like ‘Blair’ and ‘Benn’. Clicking on a person adds them to the limit option and updates the query. And this is as far as I’ve got, because trying to run the query on the fly even for the two years of sample data causes my little server to stop working. I’m going to need to figure out a way to optimise the queries if this feature is going to be at all usable. This will be a task for next week.

Week Beginning 14th September 2015

I attended a project meeting and workshop for the Linguistic DNA project this week (see http://www.linguisticdna.org/ for more information and some very helpful blog posts about the project). I’m involved with the project for half a day a week over the next three years, but that effort will be bundled up into much larger chunks. At the moment there are no tasks assigned to me so I was attending the meeting mainly to meet the other participants and to hear what has been going on so far. It was really useful to meet the project team and to hear about their experiences with the data and tools that they’re working with so far. The day after the project meeting there was a workshop about the project’s methodological approach, and this also featured a variety of external speakers who are dealing or who have previously dealt with some of the same sorts of issues that the project will be facing, so it was hugely informative to hear these speakers too.

Preparing for, travelling to and attending the project meeting and workshop took up a fair chunk of my working week, but I did also manage to squeeze in some work on other projects as well. I spent about a day continuing to work on the Medical Humanities Network website, adding in the teaching materials section and facilities to manage teaching materials and the images that appear in the carousel on the homepage.  I’ve also updated the ‘spotlight on’ feature so that collections and teaching materials can appear in this section in addition to projects. That just leaves keyword management, the browse keywords feature, and organisation / unit management to complete. I also spent a small amount of time updating the registration form for Sean’s Academic Publishing event. There were a couple of issues with it that needed tweaking, for example sending users a notification email and things like that. All fairly minor things that didn’t take long to fix.

I also gave advice to a couple of members of staff on projects they are putting together. Firstly Katherine Heavey and secondly Alice Jenkins. I can’t really go into any detail about their projects at this stage, but I managed to give them some (hopefully helpful) advice. I met with Fraser on Monday to collect my tickets for the project meeting and also to show him developments on the Hansard visualisations. This week I added a couple of further enhancements which enable users to add up to seven different lines on the graph. So for example you can compare ‘love’ and ‘hate’ and ‘war’ and ‘peace’ over time all on the same graph. It’s really quite a fascinating little tool to use already, but of course there’s still a lot more to implement. I had a meeting with Marc on Wednesday to discuss Hansard and a variety of other issues. Marc made some very good suggestions about the types of data that it should be possible to view on the graph (e.g. not just simple counts of terms but normalised figures too).

I also met with Susan and Magda on Monday to discuss the upcoming Scots Thesaurus launch. There are a few further enhancements I need to make before next Wednesday, such as adding in a search term variant table for search purposes. I also need to prepare a little 10 minute talk about the implementation of the Scots Thesaurus, which I will be giving at the colloquium. There’s actually quite a lot that needs to be finished off before next Wednesday and a few other tasks I need to focus on before then as well, so it could all get slightly rushed next week.

Week Beginning 7th September 2015

My time this week was mostly divided between three projects: the Scots Thesaurus, the Hansard work for SAMUELS and the Medical Humanities Network. For the Scots Thesaurus I managed to tick off all of the outstanding items on my ‘to do’ list (although there are still a number of refinements and tweaks that still need to be made). I updated the ‘Thesaurus Browse’ page so that it shows all of the main categories that are available in the system. These are split into different tabs for each part of speech and I’ve added in a little feature that allows the user to select whether the categories are ordered by category number or heading. I also completed a first version of the search facilities.  There is a ‘quick search’ box that appears in the right-hand column of every page, which searches category headings, words and definitions.  By default it performs an exact match search, but you can specify an asterisk character at the beginning and / or end. There’s also an advanced search that allows you to search for word, parts of speech, definition and category.  Asterisk wildcards can be used in the word, definition and category text boxes here too.  The ‘Jump to category’ feature is also now working. I’ve also added the ‘Thesaurus Search’ and ‘Browse Thesaurus Categories’ pages as menu items so people can find the content and I’ve also reinstated the ‘random category’ feature widget in the right-hand column.

Another feature that had been requested was to provide a way for people to click on a word in a category to find out which other categories it appears in. To achieve this I added in a little magnifying glass icon beside each word, and clicking on this performs a quick search for the word. I also made some further refinements to the visualisation as follows:

  1. The visualisation can now be ‘zoomed and panned’ like Google Maps.  Click, hold and drag any white space in the visualisation and you can move the contents, meaning if you open lots of stuff that gets lost off the right-hand edge you can simply drag the visualisation to move this area to the middle.  You can also zoom in and out using the scroll wheel on your mouse.  The zoom functionality isn’t really all that important, but it can help if you want to focus in on a cluttered part of the visualisation.
  2. Category labels in the visualisation are now ‘clickable’ again, as they used to be with the previous visualisation style. This makes it easier to follow links as previously only the dots representing categories were clickable.
  3. The buttons for ‘browsing up’ or ‘centring on a category’ in the visualisation are now working properly again.  If you click on the root node and this has a parent in the database the ‘browse up’ button appears in the infobox.  If you click on any other node a button displays in the infobox that allows you to make this node the root.
  4. In the visualisation I’ve added [+] and [-] signs to the labels of categories that have child categories.  As you’d probably expect, if the child categories are hidden a [+] is displayed and when clicked on this expands the categories and changes to a [-].

I’m meeting again with Susan and Magda next Monday to discuss the website and which areas (if any) still need further work. I think it’s all come together very well.

For Hansard I made some very promising progress with the visualisations. Last week I’d begun to look into ways of making the subject of the two thematic heading lines dynamic – i.e. allowing users to enter a thematic heading code into a box and for the graph to dynamically update to display this content. I hadn’t quite managed to get it working last week but I did manage to get it working this week. I had encountered a rather annoying problem whereby the AJAX request for data was not bringing back data for the second line but was instead quitting out before data was returned. To get around this I updated the way requests for data were being made. Previously each line in the graph made its own AJAX call, but this didn’t seem very efficient to me so instead I’ve changed things so the script only makes one AJAX call that can include any number of thematic codes or other search strings. The PHP script on the server then queries the database and puts the data into the required JSON format that the Javascript in the browser can then work with. This seems to work a lot better. I also added in a handy little ‘autocomplete’ feature for selecting thematic headings. Rather than having to select a code (e.g. ‘BA:01’) a user can start typing in a heading (e.g. ‘War’), select the required heading from the list and then use this. Users can still start entering codes as well and this works too. The script I started running on Friday to extract all of the information about speeches from the ‘.idx’ file supplied by Lancaster finally finished running on Tuesday this week, having extracted metadata about more than 6 million speeches.

I had quite a long but useful meeting with Fraser to discuss the Hansard data on Wednesday this week. We went through all of the options that should be available to limit what data gets displayed in the graph and have agreed to try and provide facilities to limit the data by:

  1. Speaker’s name
  2. House (commons or lords)
  3. Speaker’s party (commons only, and probably only possible from the 1920s onwards)
  4. Office (commons only)
  5. Constituency (commons only)
  6. Title (lords only)

We spent quite a lot of time looking through the metadata we have about speeches, which is split across many different SQL dumps and XML files, and it’s looking like it will be possible to get all of these options working. It’s all looking very promising.

For the Medical Humanities Network I continued working on the site and the contement management system. I realised I hadn’t added in options to record organisational units for projects or people, or to associate keywords with people. I’ve added in these facilities now. I still need to add in options to allow staff to manage organisational units. ‘Organisation’ is currently hidden as it defaults to ‘University of Glasgow’ for now and only ‘Unit’ (e.g. College of Arts, School of Critical Studies) appears anywhere. If we add an organisation that isn’t University of Glasgow this will appear, though.

I’ve also completed a first draft of the ‘collections’ section of the site, including scripts for adding, editing and listing collections. As agreed in the original project documentation, collections can only be added by admin users. We could potentially change this at some point, though. One thing that wasn’t stated in the documentation is whether collections should have a relationship with organisational units. It seemed sensible to me to be able to record who owns the collection (e.g. The Hunterian) so I’ve added in the relationship type.

It’s possible to make a collection a ‘spotlight’ feature through the collection edit page, but I still need to update the homepage so that it checks the collections as well as just projects. I’ll do this next time I’m working on the project. After that I still need to add in the teaching materials pages and complete work on the keywords section and then all of the main parts of the system should be in place.

I also spent a little time this week working on the map for Murray Pittock’s Ramsay and the Enlightenment project. I’ve been helping Craig Lamont with this, with Craig working on the data while I develop the map. Craig has ‘pinned’ quite a lot of data to the map now and was wanting me to add in the facility to enable markers of a certain type to be switched on or off. I’d never done this before using Leaflet.js so it was fun to figure out how it could work. I managed to get a very nice little list of checkboxes that when clicked on automatically turn marker types on or off. It is working very well. The next challenge will be to get it all working properly within T4.

Other than meeting with Fraser and Craig, I had a few other meetings this week. On Monday I attended a project meeting with the ‘Metaphor in the Curriculum’ project. It was good to catch up with developments on this project. It’s looking like I’ll start doing development work on this project in October, which should hopefully fit in with my other work. I also had two meetings with the Burns people this week. The first was with Kirsteen and Vivien to discuss the George Thomson part of the Burns project. There are going to be some events for this and some audio, video and textual material that they would like to be nicely packaged up and we discussed some of the possibilities. I also met with Gerry and Pauline on Friday to discuss the next big Burns project, specifically some of the technical aspects of the proposal that I will be working on. I think we all have a clearer idea of what is involved now and I’m going to start writing the technical aspects in the next week or so.

Week Beginning 31st August 2015

This week I returned to working a full five days, after the previous two part-time weeks. It was good to have a bit more time to work on the various projects I’m involved with, and to be able to actually get stuck into some development work again. On Monday and Tuesday and a bit of Thursday this week I focussed on the Scots Thesaurus project. The project is ending at the end of September so there’s going to be a bit of a final push over the coming weeks to get all of the outstanding tasks completed.

I spent quite a bit of time continuing to try to get an option to enable multiple parts of speech represented in the visualisations at the same time, but unfortunately I had to abandon this due to the limitations of my available time. It’s quite difficult to explain why allowing multiple parts of speech to appear in the same visualisation is tricky, but I’ll try. The difficulty is caused by the way parts of speech and categories are handled in the thesaurus database. A category for each part of speech is considered to be a completely separate entity, with a different unique identifier, different lexemes and subcategories. For example there isn’t just one category ‘ Rain’, and then certain lexemes within it that are nouns and others that are verbs. Instead, ‘ Rain’ is one category (ID 398) and ‘ Rain’ is another, different category (ID 401). This is useful because categories of different parts of speech can then have different names (e.g. ‘Dew'(n) and ‘Cover with dew'(v)), but it also means building a multiple part of speech visualisation is tricky because the system is based around the IDs.

The tree based visualisations we’re using expect every element to have one parent category and if we try to include multiple parts of speech things get a bit confused as we no longer have a single top-level parent category as the noun categories have a different parent from the verbs etc. I thought of trying to get around this by just taking the category for one part of speech to be the top category but this is a little confusing if the multiple top categories have different names. It also makes it confusing to know where the ‘browse up’ link goes to if multiple parts of speech are displayed.

There is also the potential for confusion relating to the display of categories that are at the same level but with a different part of speech. It’s not currently possible to tell by looking at the visualisation which category ‘belongs’ to which part of speech when multiple parts of speech are selected, so for example if looking at both ‘n’ and ‘v’ we end up with two circles for ‘Rain’ but no way of telling which is ‘n’ and which is ‘v’. We could amalgamate these into one circle but that brings other problems if the categories have different names, like the ‘Dew’ example. Also, what then should happen with subcategories? If an ‘n’ category has 3 subcategories and a ‘v’ category has 2 subcategories and these are amalgamated it’s not possible to tell which main category the subcategories belong to. Also, subcategory numbers can be the same in different categories, so the ‘n’ category may have a subcategory ’01’ and a further one ‘01.01’ while the ‘v’ category may also have ones with the same numbers and it would be difficult to get these to display as separate subcategories.

There is also a further issue with us ending up with too much information in the right-hand column, where the lexemes in each category are displayed. If the user selects 2 or 3 parts of speech we then have to display the category headings and the words for each of these in the right-hand column, which can result in far too much data being displayed.


None of these issues are completely insurmountable, but I decided that given the limited amount of time I have left on the project it would be risky to continue to pursue this approach for the time being. Instead what I implemented is a feature that allows users to select a single part of speech to view from a list of available options. Users are able to, for example, switch from viewing ‘n’ to viewing ‘v’ and back again, but can’t to view both ‘n’ and ‘v’ at the same time. I think this facility works well enough and considerably cuts down on the potential for confusion.

After completing the part of speech facility I moved onto some of the other outstanding, ono-visualisation tasks I still have to tackle, namely a ‘browse’ facility and the search facilities. Using WordPress shortcodes I created an option that lists all of the top level main categories in the system – i.e. those categories that have no parent category. This option provides a pathway into the thesaurus data and is a handy reference showing which semantic areas the project has so far tackled. I also began work on the search facilities, which will work in a very similar manner to those offered by the Historical Thesaurus of English. So far I’ve managed to create the required search forms but not the search that this needs to connect to.

After making this progress with non-visualisation features I returned to the visualisations. The visualisation style we had adopted was a radial tree, based on this example: http://bl.ocks.org/mbostock/4063550. This approach worked well for representing the hierarchical nature of the thesaurus, but it was quite hard to read the labels. I decided instead to investigate a more traditional tree approach, initially hoping to get a workable vertical tree, with the parent node at the top and levels down the hierarchy from this expanding down the page. Unfortunately our labels are rather long and this approach meant that there were a lot of categories on the same horizontal line of the visualisation, leading to a massive amount of overlap of labels. So instead I went for a horizontal tree approach, and adapted a very nice collapsible tree style similar to the one found here: http://mbostock.github.io/d3/talk/20111018/tree.html. I continued to work on this on Thursday and I have managed to get a first version integrated with the WordPress plugin I’m developing.

Also on Thursday I met with Susan and Magda to discuss the project and the technical tasks that are still outstanding. We agreed on what I should focus in my remaining time and we also discussed the launch at the end of the month. We also had a further meeting with Wendy, as a representative of the steering group, and showed her what we’d been working on.

On Wednesday this week I focussed on Medical Humanities. I spent a few hours adding a new facility to the SciFiMedHums database and WordPress plugin to enable bibliographical items to cross reference any number of other items. This facility adds such a connection in both directions, allowing (for example) Blade Runner to have an ‘adapted from’ relationship with ‘Do androids dream of electric sheep’ and for the relationship in the other direction to then automatically be recorded with an ‘adapted into’ relationship.

I spent the remainder of Wednesday and some other bits of free time continuing to work on the Medical Humanities Network website and CMS. I have now completed the pages and the management scripts for managing people and projects and have begun work on Keywords. There should be enough in place now to enable the project staff to start uploading content and I will continue to add in the other features (e.g. collections, teaching materials) over the next few weeks.

On Friday I met with Stuart Gillespie to discuss some possibilities for developing an online resource out of a research project he is currently in the middle of. We had a useful discussion and hopefully this will develop into a great resource if funding can be secured. The rest of my available time this week was spent on the Hansard materials again. After discussions with Fraser I think I now have a firmer grasp on where the metadata that we require for search purposes is located. I managed to get access to information about speeches from one of the files supplied by Lancaster and also access to the metadata used in the Millbanksystems website relating to constituencies, offices and things like that. The only thing we don’t seem to have access to is which party a member belonged to, which is a shame is this would be hugely useful information. Fraser is going to chase this up, but in the meantime I have the bulk of the required information. On Friday I wrote a script to extract the information relating to speeches from the file sent by Lancaster. This will allow us to limit the visualisations by speaker, and also hopefully by constituency and office too. I also worked some more with the visualisation, writing a script that created output files for each thematic heading in the two-year sample data I’m using, to enable these to be plugged into the visualisation. I also started to work on facilities to allow a user to specify which thematic headings to search for, but I didn’t quite manage to get this working before the end of the day. I’ll continue with this next week.


Week Beginning 17th August 2015

I had taken two days of annual leave this week so it was a three-day week for me. I still managed to pack quite a lot into three days, however. I had a long meeting with Fraser on Monday to discuss future updates to the HTE using new data from the OED. We went through some sample data that the OED people had sent and figured out how we would go about checking which parts of our data would need updating (mostly dates and some new words added to categories). We also discussed the Hansard visualisations and the highcharts example I thought I would be able to get working with the data. I spent about a day working on the highcharts visualisations, which included creating a PHP script that queried my two years of sample data for any thematic code passed to it, bundling up usage of the code by day and then spitting out the data in the JSON format. This sort of got the information into the format highcharts required (date / frequency key / value pairs) but the script itself was horribly slow due to the volume of data that is being queried. My two year sample has more than 13 million rows. So rather than connecting the chart directly to my PHP script I decided to cache the output as static JSON files. I think this is the approach we will have to take with the final visualisations too if we want them to be usable. Plugging the data into highcharts didn’t work initially, and I finally realised that this was because the timestamps highcharts uses are not just standard unix timestamps (the number of seconds since 1970) but Javascript timestamps, which use milliseconds instead of seconds. Adding three zeros onto the end of my timestamps did the trick and after some tweaking of axes and layout options I managed to get a nice time-based graph that plotted the usage of two thematic categories over two years. It worked very well and I’m confident I’ll be able to extend this out both to the full dataset and with limiting options (e.g. speaker).

I had to deal with some further Apple Developer Program issues this week, which took up a little time. I also continued to work on the Scots Thesaurus project. First up was investigating why the visualisations weren’t working in IE9, which is what Magda has on her office PC. I had thought that this might be caused by compatibility mode being turned on for University sites, but this wasn’t actually the case. I was rather stumped for a while as to what the problem was but I managed to find a solution. The problem seems to be with how older versions of IE pull in data from a server after a page has loaded. When the visualisation loads, Javascript is connecting to the server to pull in data behind the scenes. The method I was using to do this should wait until it receives the data before it processes things, but in older versions of IE it doesn’t wait, meaning that the script attempts to generate the visualisation before it has any data to visualise! I switched to an alternative method that does wait properly in older versions of IE. I’ve tested this out in IE on my PC, which I’ve figured out I can set to emulate IE9. Before the change, with it set to IE9 I was getting the ‘no data’ error. After changing the method the visualisation loads successfully.

After fixing this issue I continued to work with the visualisations. I added in an option to show or hide the words in a category, as the ‘infobox’ was taking up quite a lot of space when viewing a category that contains a lot of words. I also developed a first version of the part of speech selector. This displays the available parts of speech as checkboxes above the visualization and allows the user to select which parts to view. Ticking or unticking a box automatically updates the visualization. The feature is still unfinished and there are some aspects that need sorted, for example the listed parts of speech only show those that are present at the current level in the hierarchy but as things stand there are sometimes a broader range of parts lower down the hierarchy and these are not available to choose until the user browses down to the lower level. I’m still uncertain as to whether multiple parts of speech in one visualisation is going to work very well and whether a simpler switch from one part to another might work better, but we’ll see how it goes.

I also spent a bit of time on the Medical Humanities Network website, continuing to add new features to it and I set up a conference website for Sean Adams in Theology. This is another WordPress powered site but Sean wanted it to look like the University website. A UoG-esque theme for WordPress had been created a few years ago by Dave Beavan and then subsequently tweaked by Matt Barr, but the theme was rather out of date and didn’t look exactly like the current University website so I spent some time updating the theme, which will probably see some use on other websites too. This one, for example.



Week Beginning 10th August 2015

This week I continued to work on the projects I’d started work on again last week after launching the three Old English resources. For the Science Fiction and the Medical Humanities project I completed a first draft of all of the management scripts that are required for managing the bibliographic data that will be published through the website. It is now possible to manage all of the information relating to bibliographical items through the WordPress interface, including adding, editing and deleting mediums, themes, people, places and organisations. The only thing it isn’t possible to do is to update the list of options that appear in the ‘connection’ drop-down lists when associating people, places and organisations.  But I can very easily update these lists directly through the database and the new information then appears wherever it is required so this isn’t going to be a problem.

Continuing on the Medical Humanities theme, I spent about a day this week starting work on the new Medical Humanities network website and content management system for Megan Coyer. This system is going to be an adaptation of the existing Digital Humanities network system. Most of my time was spent on ‘back end’ stuff like setting up the underlying database, password protecting the subdomain until we’re ready to ‘go live’ and configuring the template scripts. The homepage is in place (but without any content), it is possible to log into the system and the navigation menu is set up, but no other pages are currently in place. I spent a bit of time tidying up the interface, for example adding in more modern looking up and down arrows to the ‘log in’ box, tweaking the breadcrumb layout and updating the way links are styled to bring things more into line with the main University site.

I also spent a bit of time advising staff and undertaking some administrative work. Rhona Brown asked me for some advice on the project she is putting together and it took a little time to formulate a response to her. I was also asked by Wendy and Nikki to complete a staff time allocation survey for them, which also took a bit of time to go through. I also had an email from Adam Zachary Wyner in Aberdeen about a workshop he is putting together and I gave him a couple of suggestions about possible Glasgow participants. I’m also in the process of setting up a conference website for Sean Adams in Theology and have been liaising with the RA who is working on this with him.

Other than these matters the rest of my week was spent on two projects, the Scots Thesaurus and SAMUELS. For the Scots Thesaurus I continued to work on the visualizations. Last week I adapted an earlier visualization I had created to make it ‘dynamic’ – i.e the contents change depending on variables passed to it by the user. This week I set about integrating this with the WordPress interface. I had initially intended to make the visualisations available as a separate tab within the main page. E.g. the standard ‘browse’ interface would be available and by clicking on the visualization tab this would be replaced in page by the visualization interface. However, I realized that this approach wasn’t really going to work due to the limited screen space that we have available within the WordPress interface. As we are using a side panel the amount of usable space is actually quite limited and for the visualizations we need as much screen width as possible. I decided therefore to place the visualizations within a jQuery modal dialog box which takes up 90% of the screen width and height and have provided a button from the normal browse view to open this. When clicked on the visualization now loads in the dialog box, showing the current category in the centre and the full hierarchy from this point downwards spreading out around it. Previously the contents of a category were displayed in a pop-up when the user clicked on a category in the visualization, but this wasn’t ideal as it obscured the visualization itself. Instead I created an ‘infobox’ that appears to the right of the visualization and I’ve set this up so that it lists the contents of the selected category, including words, sources, links through to the DSL and facilities to centre the visualization on the currently selected category or to browse up the hierarchy if the central node is selected. The final thing I added in was highlighting of the currently selected node in the visualization and facilities to switch back to the textual browse option at the point at which the user is viewing the visualization. There is still some work to be done on the visualizations, for example adding in the part of speech browser, sorting out the layout and ideally providing some sort of animations between views, but things are coming along nicely.

For SAMUELS I continued to work on the visualizations of the Hansard data. Unfortunately it looks like I’m unable to make any further progress with Bookworm. I’ve spent several days trying to get the various parts of the Bookworm system to communicate with each other using the sample ‘congress’ data but the API component is returning errors that I just can’t get to the bottom of. I have BookwormDB (https://github.com/Bookworm-project/BookwormDB) set up and the congress data appears to have been successfully ingested. I have installed the API (https://github.com/Bookworm-project/BookwormAPI) and it is executing and apparently successfully connecting to the database. This page http://bookworm-project.github.io/Docs/API.html says the API should be able to query the database to return the possible fields I can run such a query successfully on my test server. I have installed the BookwormGUI (https://github.com/Bookworm-project/BookwormGUI), but the javascript in the front end just doesn’t seem to be able to pass a valid query to the API. I added in an ‘alert’ that pops up to display the query that gets passed to the API, but running this through the API just gives Python errors. I’ve tried following the API guidelines on query structure (http://bookworm-project.github.io/Docs/query_structure.html) in order to create a simple, valid query but nothing I’ve tried has worked. The Python errors seem to suggest that the API is having some difficulty connecting to the database (there’s an error ‘connect() argument 12 must be string, not None’) but I don’t know enough about Python to debug this problem. Plus I don’t understand how the API can connect to the database to successfully query the possible fields but then fail to connect for other query types. It’s infuriating. Without access to a friendly Python expert I’m afraid it’s looking like we’re stuck.

However, I have figured out that BookwormGUI is based around the Highcharts.js library (see http://www.highcharts.com/demo/line-ajax) and I’m wondering now whether I can just use this library to connect to the Hansard data instead of trying to get Bookworm working, possibly borrowing some of the BookwormGUI code for handling the ‘limit by’ options and the ‘zoom in’ functionality (which I haven’t been able to find in the highcharts examples). I’m going to try this with the two years of Hansard data that I previously managed to extract, specifically this visualisation style: http://www.highcharts.com/stock/demo/compare. If I can get it to work the timeslider along the bottom would work really nicely.



Week Beginning 3rd August 2015

The ISAS (International Society of Anglo-Saxonists) conference took place this week and two projects I have been working on over the past few weeks were launched at this event. The first was A Thesaurus of Old English (http://oldenglishthesaurus.arts.gla.ac.uk/), which went live on Monday. As is usual with these things there were some last minute changes and additions that needed to be made, but overall the launch went very smoothly and I’m particularly pleased with how the ‘search for word in other online resources’ feature works.

The second project that launched was the Old English Metaphor Map (http://mappingmetaphor.arts.gla.ac.uk/old-english/). We were due to launch this on Thursday but due to illness the launch was bumped up to Tuesday instead. Thankfully I had completed everything that needed sorting out before Tuesday so making the resource live was a very straightforward process. I think the map is looking pretty good and it complements the main site nicely.

With these two projects out of the way I had to spend about a day this week on AHRC duties, but once all that was done I could breathe a bit of a sigh of relief and get on with some other projects that I haven’t been able to devote much time to recently due to other commitments. The first of these was Gavin Miller’s Science Fiction and the Medical Humanities project. I’m developing a WordPress based tool for his project to manage a database of sources and this week I continued adding functionality to this tool as follows:

  1. I removed the error messages that were appearing when there weren’t any errors
  2. I’ve replaced ‘publisher’ with a new entity named ‘organisation’.  This allows the connection the organisation has with the item (e.g. Publisher, Studio) to be selected in the same way as connections to items from places and people are handled.
  3. I’ve updated the way in which these connections are pulled out of the database to make it much easier to add new connection types.  After adding a new connection type to the database this then immediately appears as a selectable option in all relevant places in the system.
  4. I’ve updated the underlying database so that data can have an ‘active’ or ‘deleted’ state, which will allow entities like people and places to be ‘deleted’ via WordPress but still retained in the underlying database in case they need to be reinstated.
  5. I’ve begun work on the pages that will allow the management of types and mediums, themes, people, places and organisations.  Currently there are new menu items that provide options to list these data types.  The lists also include counts of the number of bibliographic items each row is connected to.  The next step will be to add in facilities to allow admin users to edit, delete and create types, mediums, themes, people, places and organisations.

The next project I worked on was the Scots Thesaurus project. Magda has emailed me stating she was having problems uploading words via CSV files and also assigning category numbers. I met with Magda on Thursday to discuss these issues and to try and figure out what was going wrong. The CSV issue was being caused by the CSV files created by Excel on Magda’s PC being given a rather unexpected MIME type. The upload script was checking the uploaded file for specific CSV MIME types but Excel was giving them a MIME type of ‘application/vnd.ms-excel’. I have no idea why this was happening, and even more strangely, when Magda emailed me one of her files and I uploaded it on my PC (without re-saving the file) it uploaded fine. I didn’t really get to the bottom of this problem, but instead I simply fixed it by allowing files of MIME type ‘application/vnd.ms-excel’ to be accepted.

The issue with certain category numbers not saving was being caused by deleted rows in the system. When creating a new category the system checks to see if there is already a row with the supplied number and part of speech in the system. If there is then the upload fails. However, the check wasn’t taking into consideration categories that had been deleted from within WordPress. These rows were being marked as ‘trash’ in WordPress but still existed in our non-Wordpress ‘category’ table. I updated the check to link up the category table to WordPress’s posts table to check the status of the category there. Now if a category number exists but it’s associated with a WordPress post that is marked as deleted then the upload of a new row can proceed without any problems.

In addition to fixing these issues I also continued working on the visualisations for the Scots Thesaurus. Magda will be presenting the thesaurus at a conference next week and she was hoping to be able to show some visualisations of the weather data. We had previously agreed at a meeting with Susan that I would continue to work on the static visualisation I had made for the ‘Golf’ data using the d3.js ‘node-link tree’ diagram type (see http://bl.ocks.org/mbostock/4063550). I would make this ‘dynamic’ (i.e. it would work with any data passed to it from the database and it would be possible to update the central node). Eventually we may choose a completely different visualisation approach but this is the one we will focus on for now. I spent some time adapting my ‘Golf’ visualisation to work with any thesaurus data passed to it – simply give it a category ID and a part of speech and the thesaurus structure (including subcategories) from this point downwards gets displayed. There’s still a lot of work to do on this (e.g. integrating it within WordPress) but I happy with the progress I’m making with it.

The last project I worked on this week was the SAMUELS Hansard data, or more specifically trying to get Bookworm set up on the test server I have access to. Previously I had managed to get the underlying database working and the test data (US Congress) installed. I had then installed the Bookworm API but I was having difficulty getting Python scripts to execute. I’m happy to report that I got to the bottom of this. After reading this post (https://www.linux.com/community/blogs/129-servers/757148-configuring-apache2-to-run-python-scripts) I realised that I had not enabled the CGI module of Apache, so even though the cgi-bin directory was now web accessible nothing was getting executed there. The second thing I realised was that I’d installed the API in a subdirectory within cg-bin and I needed to add privileges in the Apache configuration file for this subdirectory as well as the parent directory. With that out of the way I could query the API from a web browser, which was quite a relief.

After this I installed the Bookworm GUI code, which connects to the API in order to retrieve data from the database. I still haven’t managed to get this working successfully. The page surroundings load but the connection to the data just isn’t working. One reason why this is the case is because I’d installed the API in a subdirectory of the cgi-bin, but even after updating every place in the Javascript where the API is called I was still getting errors. The AJAX call is definitely connecting to the API as I’m getting a bunch of Python errors returned instead of data. I’ll need to further investigate this next week.

Also this week I had a meeting with Gary Thomas about Jennifer Smith’s Syntactic Atlas of Scots project. Gary is the RA on the project and we met on Thursday to discuss how we should get the technical aspects of the project off the ground. It was a really useful meeting and we already have some ideas about how things will be managed. We’re not going to get started on this until next month, though, due to the availability of the project staff.


Week Beginning 20th July 2015

It was a four-day week this week due to the Glasgow Fair holiday. I actually worked the Monday and took the Friday off instead, and this worked out quite well as it gave me a chance to continue development of the Scots Thesaurus before we had our team meeting on the Tuesday morning. I had previously circulated a ‘to do’ list that brought together all of the outstanding technical tasks for the project, with 5 items specifically to do with the management of thesaurus data via the WordPress interface. I’m happy to report that I managed to complete all of these items. This included adding facilities to enable words associated with a category to be deleted from the system (in actual fact the word records are simply marked as ‘inactive’ in the underlying database). This option makes it a lot easier for Magda to manage the category information. I also redeveloped the way sources and URLs as stored in the system. Previously each word could have one single source (either DOST or SND) and a single URL. I’ve updated this to enable a word to have any number of associated sources and URLs, and I’ve expanded the possible source list to include the paper Scots Thesaurus too. I could have updated the system to incorporate any number of sources but Susan thinks these three will be sufficient. Allowing multiple sources per word actually meant quite a lot of reworking of both the underlying database and the WordPress plugin I’m developing for the project, but it is all now working fine. I also updated the way connections to existing Historical Thesaurus of English categories are handled and added in an option that allows a CSV file containing words to be uploaded to a category via the WordPress admin interface. This last update should prove very useful to the people working on the project as it will enable them to compile lists of words in Excel and then upload them directly from this to a category in the online database. On Tuesday we had a team meeting for the project and I gave a demonstration of these new features and Magda is going to start using the system and will let me know if anything needs updated.

I spent a small amount of time this week updating the Burns website to incorporate new features that launched on the anniversary of Burns’ death on the 21st. These are an audio play about Burns forgeries (http://burnsc21.glasgow.ac.uk/burns-forgery/) and an online exhibition about the illustrations to George Thomson’s collections of songs (http://burnsc21.glasgow.ac.uk/the-illustrations-to-george-thomsons-collections/).

I continued working on the SAMUELS project this week, again trying to figure out how to get the Bookworm system working on the test server that Chris has set up for me. The script that imports the congress data into Bookworm that I left running last week successfully completed this week. The amount of data generated for this textual resource is rather large, with one of the tables consisting of over 42 million rows and another one taking up 22 million rows. I still need to figure out how this data is actually queried and used to power the Bookworm visualisations and the next step was to get the Bookworm API installed and running. The API connects to the database and allows the visualisation to query it. It’s written in Python and I spent rather a lot of time just trying to get Python scripts to execute via Apache on the server. This involved setting up a cgi-bin, ensuring Apache knows about it, where it is and it has the permissions to execute scripts stored there. I spent a rather frustrating few hours getting nothing but 403 Forbidden errors before realising that you had to explicitly give Apache rights to do things with the directory in the apache configuration file as well as updating file permissions. By the end of the week I still hadn’t managed to get Python files actually running – instead the browser just attempts to download the files. I need to continue with this next week, hopefully with the help of Chris McGlashan who was on holiday this week.

I spent the majority of the rest of the week working on the Old English version of the Metaphor Map, which we are intending to launch at the ISAS conference. This is a version of the Metaphor Map that features purely Old English related data and will sit alongside the main Mapping Metaphor website as a stand-alone interface. Here’s a summary of what I managed to complete this week:

  1. I’ve uploaded OE stage 5 and stage 4 data to new OE specific tables
  2. I identified some rows that included categories that no longer exist and following feedback from Ellen I deleted these (I think there were only 3 in total)
  3. I’ve replicated the existing site structure at the new OE URL and I’ve updated how the text for the ancillary pages is stored: It’s all now stored in one single PHP file which is then referenced by both the main and the OE ancillary pages. I’ve also put a check in all of the OE pages to see if OE specific text has been supplied and if so this is used instead of the main text. This should make it easier to manage all of the text.
  4. I’ve created a new purple colour scheme for the OE site, plus a new purple ‘M’ favicon (unfortunately it isn’t exactly the same as the green one so I might update this)
  5. I’ve expanded the top bar to incorporate tabs for switching from the OE map to the main one. These are currently positioned to the left of the bar in a similar way to how the Scots Corpus links to CMSW and back work.
  6. The visualisation / table / card views are all now working with the OE data. Timeline has been removed as this is no longer applicable (all metaphors are OE with no specific date).
  7. Search and browse are also now working with the OE data.
  8. All reference to first dates and first lexemes has been removed, e.g. from metaphor cards, columns in the tabular view, the search options
  9. The metaphor card heading now says ‘OE Metaphor’ and then a number, just in case people notice the same number is used for different connections in the OE / non-OE sites.
  10. The text ‘(from OE to present day)’ has been added to the lexeme info in the metaphor cards.
  11. Where a metaphorical connection between two categories also exists in the non-OE data, a link is added to the bottom of the metaphor card with text ‘View this connection in the main metaphor map’. Pressing on this opens the nonOE map in a new tab with the visualisation showing category 1 and the connection to category 2 highlighted. The check for the existence of the connection in the non-OE data ignores strength and presents the nonOE map with both strong and weak visible. This is so that if (for example) the OE connection is weak but the main connection is strong you can still jump from one to the other.
  12. I’ve updated the category database to add a new column ‘OE categories completed’. The OE categories completed page will list all categories where this is set to ‘y’ (none currently)
  13. I’ve created staff pages to allow OE data to be managed by project staff.

Next week I’ll receive some further data to upload and after that we should be pretty much ready to launch.

Week Beginning 13th July 2015

Last week I returned to the development of the Essentials of Old English app and website, which we’re hoping to make available before the ISAS conference at the start of August. Christian had previously sent me a list of things to change or fix and I managed to get through all but one of the items. The item that I still needed to tackle was a major reordering of the exercises, including the creation of new sections and moving exercises all over the place. I had feared that this would be a tricky task as the loading of the next and previous exercise is handled programmatically via Javascript and I had thought I’d just set this up so the script took the current ID and added or subtracted 1 to create the ‘next’ and ‘previous’ links. This would have meant real trouble if the order of the exercises was changed. Thankfully when I looked at the data I realised I’d structured things better than I’d remembered and had actually included ‘next’ and ‘previous’ fields within each exercise record which contained the IDs of whichever exercise should come before or after the current one. So all I had to do was switch everything around and then update these fields to ensure that the correct exercises loaded. There was a little more to it than that due to sections changing and exercise titles changing, and it was a time-consuming process, but it wasn’t too tricky to sort out. The new structure makes a lot more sense than the old one so it’s definitely been worthwhile making the changes.

After making all of the required updates to the pages the next step was to actually create an app from them. It’s been a while since I last made an app and in that time there has been a new version of Apache Cordova (the very handy wrapping tool that generates apps from HTML, JavaScript and CSS files) so I had to spend some time upgrading the software and all of the platform specific tools as well, such as XCode for iOS and the Android developer tools for Android. Once this was complete I managed to get versions of the app working for iOS and Android and I tested these out both using emulators and on actual hardware. I had to rebuild the code a few times before everything was exactly as I wanted it, and I had to include some platform specific CSS styles, for example to ensure the app header didn’t obscure the iOS status bar. It also took a horribly long time to create all of the icons and splash screens that are required for iOS and Android, and then a horribly long time to create the store pages via iTunes Connect and the Google Play developer interface. And then I needed to generate seemingly thousands of screenshots at different screen sizes from phones to mini tablets to full-size tablets. And then I had to go through the certification process for both platforms.

As usual, when it came to verify the iOS app using distribution certificates and provisioning profiles I got a bunch of errors in XCode. It took ages to get to the bottom of these (it was because the distribution certificate that existed for the University of Glasgow account had been generated by someone in MVLS in May and had been downloaded onto her Mac and I didn’t have a copy of it and you can’t just re-download the certificate, you have to make a new one and then associate it with the provisioning profile and download and install the certificates on your Mac and then ensure you close and reopen XCode for the changes to be registered. Ugh). However, I did finally manage to get the app uploaded to iTunes Connect and I have now submitted it for review. Hopefully it will be approved within the next two weeks.

The process of signing the Android version of the app was less arduous but still took a fair amount of time to finally get right. I must remember to follow these instructions next time: http://developer.android.com/tools/publishing/app-signing.html#signing-manually (although it seems as if help pages relating to app development stop working after a few months when new OS versions get released).

Now I’ve completed the process of creating a STELLA app for Android I really need to get around to updating the three existing apps for iOS and creating Android versions of them too. It shouldn’t be too difficult to do so, but it will take some time and I’m afraid I just don’t have the time at the moment due to all of the other work I need to do.

One of the other major outstanding projects I’m currently working on is the Samuels project, specifically trying to get some visualisations of the semantically tagged Hansard data working. We are using the Bookworm interface for this (see http://bookworm.culturomics.org/) and I’ve been trying to get the code for this working with our data for a while now. It turned out that I needed access to a Linux box to get the software installed and last week Chris McGlashan helpfully said he’d be able to set up a test server for me. On Monday this week he popped by with the box, which is now plugged in and working away in my office. After some initial problems getting through the University proxy I managed to download a handy script that installs all of the components that Bookworm requires (see https://github.com/Bookworm-project/FreshInstallationScript). I then downloaded the Congress data that is used as test data in the documentation (see https://github.com/econpy/congress_api) and followed the steps required to set this up. There were a couple of problems with this that were caused by my test server not having some required zip software installed, but after getting over this I had access to the data. The script that then imports this data into Bookworm is currently running, so I will need to wait and see how that works out before I can proceed further.

Another ongoing project is the Thesaurus of Old English. I’d managed to complete a first version of the new website for this project last week and I’d received some feedback from a couple of people since then and I updated the interface and functionality as a result of this as follows:

  1. I’ve added some explanatory text to the ‘quick search’ so it’s clearer that you can search for either Old English words or Modern English category headings using the facility. I’ve also included a sentence about wildcards too. This appears both on the homepage and the ‘quick search’ tab of the search page.
  2. I’ve updated the labels in the ‘advanced search’ tab so ‘word’ now says ‘Old English word’ and ‘category’ now says ‘Modern English words in Category Heading’ to make things clearer.
  3. I’ve updated the way the search queries for category headings when asterisks are not supplied, both in the ‘quick search’ and in the ‘category’ box of the ‘advanced search’. Now if a user enters a word (e.g. ‘love’) the script searches for all categories where this full word appears – either as the full category name (‘love’) or beginning the category name (‘love from the heart’), the end of the category name (‘great love’) or somewhere in the middle (no example in ‘love’ but e.g. search for ‘fall’: ‘Shower, fall of rain’). It doesn’t bring back any categories where the string is part of a longer word (e.g. ‘lovely, beautiful, fair’ is not found). If the user supplies asterisks at the beginning and/or end of the search term then the search just matches the characters as before.
  4. I’ve also updated the appearance of the flags that appear next to words. These are now smaller and less bold and I’ve also added in tool-tip help for when a user hovers over them.

What with all of the above going on I didn’t get a chance to do any further work on the Scots Thesaurus or the Sci-Fi Med Hums database, but hopefully I’ll be able to work on these next week. Having said that, I’ve also now received the first set of data for the Old English Metaphor Map, and this also has to ‘go live’ before ISAS at the start of August, so I may have to divert a lot of time to this over the next couple of weeks.