Week Beginning 21st September 2015

A lot of this week was devoted to the Scots Thesaurus project, which we launched on Wednesday. You can now access the website here: http://scotsthesaurus.org/. I spent quite a bit of time on Monday and Tuesday making some last minute updates to the website and visualisations and also preparing my session for Wednesday’s colloquium. The colloquium went well, as did the launch itself. We had considerable media attention and many thousands of page hits and thankfully the website coped with all of this admirably. I still have a number of additional features to implement now the launch is out of the way, and I’ll hopefully get a chance to implement these in the coming weeks.

Other than Scots Thesaurus stuff I had to spend a day or so this week doing my Performance and Development review exercise. This involved preparing materials, having my meeting and then updating the materials. It has been a very successful year for me so the process all went fine.

I spent some of the remainder of the week working on the front end for the bibliographical database for Gavin Miller’s SciFiMedHums project. This involved updating my WordPress plugin to incorporate some functions that could then be called in the front-end template page using WordPress shortcodes. This is a really handy way to add custom content to the WordPress front end and I managed to get a first draft of the bibliographical entry page completed, including lists of associated people, places, organisations and other items, themes, excerpts and other information. It’s all looking pretty good so far, but there is still a lot of functionality to add, for example search and browse facilities and the interlinking of data such as themes (e.g. click on a theme listed in one entry to view all other entries that have been classified with it).

I also spent some time this week starting on the Technical Plan for a new project for the Burns people, but I haven’t got very far with it yet. I’ll be continuing with this on Monday. So, a very short report this week, even though the week itself was really rather hectic.

Week Beginning 14th September 2015

I attended a project meeting and workshop for the Linguistic DNA project this week (see http://www.linguisticdna.org/ for more information and some very helpful blog posts about the project). I’m involved with the project for half a day a week over the next three years, but that effort will be bundled up into much larger chunks. At the moment there are no tasks assigned to me so I was attending the meeting mainly to meet the other participants and to hear what has been going on so far. It was really useful to meet the project team and to hear about their experiences with the data and tools that they’re working with so far. The day after the project meeting there was a workshop about the project’s methodological approach, and this also featured a variety of external speakers who are dealing or who have previously dealt with some of the same sorts of issues that the project will be facing, so it was hugely informative to hear these speakers too.

Preparing for, travelling to and attending the project meeting and workshop took up a fair chunk of my working week, but I did also manage to squeeze in some work on other projects as well. I spent about a day continuing to work on the Medical Humanities Network website, adding in the teaching materials section and facilities to manage teaching materials and the images that appear in the carousel on the homepage.  I’ve also updated the ‘spotlight on’ feature so that collections and teaching materials can appear in this section in addition to projects. That just leaves keyword management, the browse keywords feature, and organisation / unit management to complete. I also spent a small amount of time updating the registration form for Sean’s Academic Publishing event. There were a couple of issues with it that needed tweaking, for example sending users a notification email and things like that. All fairly minor things that didn’t take long to fix.

I also gave advice to a couple of members of staff on projects they are putting together. Firstly Katherine Heavey and secondly Alice Jenkins. I can’t really go into any detail about their projects at this stage, but I managed to give them some (hopefully helpful) advice. I met with Fraser on Monday to collect my tickets for the project meeting and also to show him developments on the Hansard visualisations. This week I added a couple of further enhancements which enable users to add up to seven different lines on the graph. So for example you can compare ‘love’ and ‘hate’ and ‘war’ and ‘peace’ over time all on the same graph. It’s really quite a fascinating little tool to use already, but of course there’s still a lot more to implement. I had a meeting with Marc on Wednesday to discuss Hansard and a variety of other issues. Marc made some very good suggestions about the types of data that it should be possible to view on the graph (e.g. not just simple counts of terms but normalised figures too).
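Marc’s suggestion about normalised figures could be sketched roughly as follows. This is only an illustration in Python: the per-million-words scale, the function name and the sample numbers are all assumptions rather than anything taken from the Hansard data or the actual scripts.

```python
def normalise_counts(term_counts, total_words, per=1_000_000):
    """Convert raw yearly counts of a term into a rate per `per` words.

    term_counts: {year: number of occurrences of the term}
    total_words: {year: total number of words recorded for that year}
    Years with no recorded words are skipped to avoid dividing by zero.
    """
    normalised = {}
    for year, count in term_counts.items():
        total = total_words.get(year, 0)
        if total > 0:
            normalised[year] = count * per / total
    return normalised


# Hypothetical figures, purely for illustration
raw = {1850: 120, 1950: 300}
totals = {1850: 2_000_000, 1950: 10_000_000}
rates = normalise_counts(raw, totals)
```

With these made-up numbers the later year has more raw hits but a lower rate, which is the kind of difference a simple count would hide.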

I also met with Susan and Magda on Monday to discuss the upcoming Scots Thesaurus launch. There are a few further enhancements I need to make before next Wednesday, such as adding in a search term variant table for search purposes. I also need to prepare a little 10 minute talk about the implementation of the Scots Thesaurus, which I will be giving at the colloquium. There’s actually quite a lot that needs to be finished off before next Wednesday and a few other tasks I need to focus on before then as well, so it could all get slightly rushed next week.

Week Beginning 7th September 2015

My time this week was mostly divided between three projects: the Scots Thesaurus, the Hansard work for SAMUELS and the Medical Humanities Network. For the Scots Thesaurus I managed to tick off all of the outstanding items on my ‘to do’ list (although a number of refinements and tweaks still need to be made). I updated the ‘Thesaurus Browse’ page so that it shows all of the main categories that are available in the system. These are split into different tabs for each part of speech and I’ve added in a little feature that allows the user to select whether the categories are ordered by category number or heading. I also completed a first version of the search facilities. There is a ‘quick search’ box that appears in the right-hand column of every page, which searches category headings, words and definitions. By default it performs an exact match search, but you can add an asterisk wildcard at the beginning and / or end of the search term. There’s also an advanced search that allows you to search by word, part of speech, definition and category. Asterisk wildcards can be used in the word, definition and category text boxes here too. The ‘Jump to category’ feature is also now working. I’ve also added the ‘Thesaurus Search’ and ‘Browse Thesaurus Categories’ pages as menu items so people can find the content and I’ve reinstated the ‘random category’ feature widget in the right-hand column.
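The asterisk handling described above might look something like the following. It is a minimal sketch in Python rather than the plugin’s actual code, and it assumes the search ends up as a parameterised SQL query using either ‘=’ or ‘LIKE’; the function name is hypothetical.

```python
def wildcard_to_like(term):
    """Turn a search term with an optional leading / trailing '*' into
    a (value, use_like) pair for a parameterised SQL query.

    Without asterisks the term is returned unchanged for an exact match.
    With asterisks they become '%' and any LIKE special characters in
    the rest of the term are escaped with a backslash.
    """
    term = term.strip()
    starts = term.startswith("*")
    ends = term.endswith("*") and len(term) > 1
    if not (starts or ends):
        return term, False
    core = term.strip("*")
    core = core.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return ("%" if starts else "") + core + ("%" if ends else ""), True


exact = wildcard_to_like("rain")
prefix = wildcard_to_like("rain*")
anywhere = wildcard_to_like("*rain*")
```

The returned value would then be bound as a query parameter rather than pasted into the SQL string.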

Another feature that had been requested was to provide a way for people to click on a word in a category to find out which other categories it appears in. To achieve this I added in a little magnifying glass icon beside each word, and clicking on this performs a quick search for the word. I also made some further refinements to the visualisation as follows:

  1. The visualisation can now be ‘zoomed and panned’ like Google Maps.  Click, hold and drag any white space in the visualisation and you can move the contents, meaning if you open lots of stuff that gets lost off the right-hand edge you can simply drag the visualisation to move this area to the middle.  You can also zoom in and out using the scroll wheel on your mouse.  The zoom functionality isn’t really all that important, but it can help if you want to focus in on a cluttered part of the visualisation.
  2. Category labels in the visualisation are now ‘clickable’ again, as they used to be with the previous visualisation style. This makes it easier to follow links as previously only the dots representing categories were clickable.
  3. The buttons for ‘browsing up’ or ‘centring on a category’ in the visualisation are now working properly again.  If you click on the root node and this has a parent in the database the ‘browse up’ button appears in the infobox.  If you click on any other node a button displays in the infobox that allows you to make this node the root.
  4. In the visualisation I’ve added [+] and [-] signs to the labels of categories that have child categories.  As you’d probably expect, if the child categories are hidden a [+] is displayed and when clicked on this expands the categories and changes to a [-].
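
The [+] / [-] behaviour in the last point can be modelled quite simply. The sketch below is in Python purely for illustration (the real visualisation runs in the browser) and the class, method and category names are hypothetical.

```python
class Node:
    """A category in the tree, with children hidden until expanded."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.expanded = False

    def label(self):
        # Only categories with child categories get a [+] or [-] sign
        if not self.children:
            return self.name
        return f"{self.name} [{'-' if self.expanded else '+'}]"

    def toggle(self):
        # Clicking a category with children shows or hides them
        if self.children:
            self.expanded = not self.expanded

    def visible_labels(self):
        """Labels of this node and every descendant currently shown."""
        labels = [self.label()]
        if self.expanded:
            for child in self.children:
                labels.extend(child.visible_labels())
        return labels


root = Node("Weather", [Node("Rain", [Node("Dew")]), Node("Snow")])
collapsed = root.visible_labels()
root.toggle()
expanded = root.visible_labels()
```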

I’m meeting again with Susan and Magda next Monday to discuss the website and which areas (if any) still need further work. I think it’s all come together very well.

For Hansard I made some very promising progress with the visualisations. Last week I’d begun to look into ways of making the subject of the two thematic heading lines dynamic – i.e. allowing users to enter a thematic heading code into a box and for the graph to dynamically update to display this content. I hadn’t quite managed to get this working last week but I did this week. I had encountered a rather annoying problem whereby the AJAX request for data was not bringing back data for the second line but was instead quitting out before data was returned. To get around this I updated the way requests for data were being made. Previously each line in the graph made its own AJAX call, but this didn’t seem very efficient to me so instead I’ve changed things so the script only makes one AJAX call that can include any number of thematic codes or other search strings. The PHP script on the server then queries the database and puts the data into the required JSON format that the JavaScript in the browser can then work with. This seems to work a lot better. I also added in a handy little ‘autocomplete’ feature for selecting thematic headings. Rather than having to enter a code (e.g. ‘BA:01’) a user can start typing in a heading (e.g. ‘War’), select the required heading from the list and then use this. Users can still start entering codes as well and this works too. The script I started running on Friday to extract all of the information about speeches from the ‘.idx’ file supplied by Lancaster finally finished running on Tuesday this week, having extracted metadata about more than 6 million speeches.
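The ‘one request, many lines’ approach can be sketched as below. The real server script is PHP and queries the database; this Python version uses a small in-memory dictionary instead, and the JSON layout, the function name and the sample counts (including the code ‘BB:02’) are assumptions made for illustration.

```python
import json

# Hypothetical sample data standing in for the database: {code: {year: count}}
SAMPLE_COUNTS = {
    "BA:01": {1900: 12, 1901: 15},
    "BB:02": {1900: 7},
}


def build_graph_response(codes, counts=SAMPLE_COUNTS):
    """Return one JSON document holding a series for every requested code,
    so the browser only needs to make a single request."""
    series = []
    for code in codes:
        by_year = counts.get(code, {})
        series.append({
            "code": code,
            # Points are sorted by year so each line can be drawn in order
            "points": [{"year": y, "count": by_year[y]} for y in sorted(by_year)],
        })
    return json.dumps({"series": series})


response = json.loads(build_graph_response(["BA:01", "ZZ:99"]))
```

A code with no data still gets an (empty) series in the response, so the browser can tell ‘nothing found’ apart from a request that failed part-way through.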

I had quite a long but useful meeting with Fraser to discuss the Hansard data on Wednesday this week. We went through all of the options that should be available to limit what data gets displayed in the graph and have agreed to try and provide facilities to limit the data by:

  1. Speaker’s name
  2. House (commons or lords)
  3. Speaker’s party (commons only, and probably only possible from the 1920s onwards)
  4. Office (commons only)
  5. Constituency (commons only)
  6. Title (lords only)

We spent quite a lot of time looking through the metadata we have about speeches, which is split across many different SQL dumps and XML files, and it’s looking like it will be possible to get all of these options working. It’s all looking very promising.
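
One way to keep the ‘commons only’ and ‘lords only’ restrictions in the list of options above honest is a small check before any query is run. This is a sketch in Python with hypothetical filter keys and messages, not something we have agreed on or built.

```python
COMMONS_ONLY = {"party", "office", "constituency"}
LORDS_ONLY = {"title"}


def check_filters(filters):
    """Return a list of problems with a dict of requested filters.

    `filters` may contain 'speaker', 'house' ('commons' or 'lords')
    and any of the house-specific keys listed above.
    """
    problems = []
    house = filters.get("house")
    for key in filters:
        if key in COMMONS_ONLY and house == "lords":
            problems.append(f"'{key}' applies to the commons only")
        if key in LORDS_ONLY and house == "commons":
            problems.append(f"'{key}' applies to the lords only")
    return problems


bad = check_filters({"house": "lords", "party": "Liberal"})
good = check_filters({"speaker": "A. Member", "house": "commons", "office": "Chancellor"})
```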

For the Medical Humanities Network I continued working on the site and the content management system. I realised I hadn’t added in options to record organisational units for projects or people, or to associate keywords with people. I’ve added in these facilities now. I still need to add in options to allow staff to manage organisational units. ‘Organisation’ is currently hidden as it defaults to ‘University of Glasgow’ for now and only ‘Unit’ (e.g. College of Arts, School of Critical Studies) appears anywhere. If we add an organisation that isn’t University of Glasgow this will appear, though.

I’ve also completed a first draft of the ‘collections’ section of the site, including scripts for adding, editing and listing collections. As agreed in the original project documentation, collections can only be added by admin users. We could potentially change this at some point, though. One thing that wasn’t stated in the documentation is whether collections should have a relationship with organisational units. It seemed sensible to me to be able to record who owns the collection (e.g. The Hunterian) so I’ve added in the relationship type.

It’s possible to make a collection a ‘spotlight’ feature through the collection edit page, but I still need to update the homepage so that it checks the collections as well as just projects. I’ll do this next time I’m working on the project. After that I still need to add in the teaching materials pages and complete work on the keywords section and then all of the main parts of the system should be in place.

I also spent a little time this week working on the map for Murray Pittock’s Ramsay and the Enlightenment project. I’ve been helping Craig Lamont with this, with Craig working on the data while I develop the map. Craig has ‘pinned’ quite a lot of data to the map now and was wanting me to add in the facility to enable markers of a certain type to be switched on or off. I’d never done this before using Leaflet.js so it was fun to figure out how it could work. I managed to get a very nice little list of checkboxes that when clicked on automatically turn marker types on or off. It is working very well. The next challenge will be to get it all working properly within T4.

Other than meeting with Fraser and Craig, I had a few other meetings this week. On Monday I attended a project meeting with the ‘Metaphor in the Curriculum’ project. It was good to catch up with developments on this project. It’s looking like I’ll start doing development work on this project in October, which should hopefully fit in with my other work. I also had two meetings with the Burns people this week. The first was with Kirsteen and Vivien to discuss the George Thomson part of the Burns project. There are going to be some events for this and some audio, video and textual material that they would like to be nicely packaged up and we discussed some of the possibilities. I also met with Gerry and Pauline on Friday to discuss the next big Burns project, specifically some of the technical aspects of the proposal that I will be working on. I think we all have a clearer idea of what is involved now and I’m going to start writing the technical aspects in the next week or so.

Week Beginning 31st August 2015

This week I returned to working a full five days, after the previous two part-time weeks. It was good to have a bit more time to work on the various projects I’m involved with, and to be able to actually get stuck into some development work again. On Monday and Tuesday and a bit of Thursday this week I focussed on the Scots Thesaurus project. The project is ending at the end of September so there’s going to be a bit of a final push over the coming weeks to get all of the outstanding tasks completed.

I spent quite a bit of time continuing to try to get an option working that would allow multiple parts of speech to be represented in the visualisations at the same time, but unfortunately I had to abandon this due to the limitations of my available time. It’s quite difficult to explain why allowing multiple parts of speech to appear in the same visualisation is tricky, but I’ll try. The difficulty is caused by the way parts of speech and categories are handled in the thesaurus database. A category for each part of speech is considered to be a completely separate entity, with a different unique identifier, different lexemes and subcategories. For example there isn’t just one category ‘Rain’, with certain lexemes within it that are nouns and others that are verbs. Instead, ‘Rain’ for one part of speech is one category (ID 398) and ‘Rain’ for another part of speech is another, different category (ID 401). This is useful because categories of different parts of speech can then have different names (e.g. ‘Dew’ (n) and ‘Cover with dew’ (v)), but it also means building a multiple part of speech visualisation is tricky because the system is based around the IDs.
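To make the ‘one category per part of speech’ point more concrete, here is a rough sketch in Python. The IDs 398 and 401 are the ones from the example above, but which of them is the noun and which is the verb is an assumption made for the sketch, as are the field and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    id: int
    heading: str
    pos: str  # part of speech, e.g. 'n' or 'v'


# Assumption: 398 is treated as the noun and 401 as the verb here
categories = [
    Category(398, "Rain", "n"),
    Category(401, "Rain", "v"),
]


def ids_by_heading(cats):
    """Group (part of speech, ID) pairs under each shared heading."""
    grouped = {}
    for cat in cats:
        grouped.setdefault(cat.heading, []).append((cat.pos, cat.id))
    return grouped


grouped = ids_by_heading(categories)
```

The heading is the only thing the two records share, so anything keyed on ID (as the visualisation is) sees two unrelated categories.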

The tree based visualisations we’re using expect every element to have one parent category and if we try to include multiple parts of speech things get a bit confused as we no longer have a single top-level parent category as the noun categories have a different parent from the verbs etc. I thought of trying to get around this by just taking the category for one part of speech to be the top category but this is a little confusing if the multiple top categories have different names. It also makes it confusing to know where the ‘browse up’ link goes to if multiple parts of speech are displayed.

There is also the potential for confusion relating to the display of categories that are at the same level but with a different part of speech. It’s not currently possible to tell by looking at the visualisation which category ‘belongs’ to which part of speech when multiple parts of speech are selected, so for example if looking at both ‘n’ and ‘v’ we end up with two circles for ‘Rain’ but no way of telling which is ‘n’ and which is ‘v’. We could amalgamate these into one circle but that brings other problems if the categories have different names, like the ‘Dew’ example. Also, what then should happen with subcategories? If an ‘n’ category has 3 subcategories and a ‘v’ category has 2 subcategories and these are amalgamated it’s not possible to tell which main category the subcategories belong to. Also, subcategory numbers can be the same in different categories, so the ‘n’ category may have a subcategory ‘01’ and a further one ‘01.01’ while the ‘v’ category may also have ones with the same numbers and it would be difficult to get these to display as separate subcategories.

There is also a further issue with us ending up with too much information in the right-hand column, where the lexemes in each category are displayed. If the user selects 2 or 3 parts of speech we then have to display the category headings and the words for each of these in the right-hand column, which can result in far too much data being displayed.


None of these issues are completely insurmountable, but I decided that given the limited amount of time I have left on the project it would be risky to continue to pursue this approach for the time being. Instead what I implemented is a feature that allows users to select a single part of speech to view from a list of available options. Users are able to, for example, switch from viewing ‘n’ to viewing ‘v’ and back again, but can’t view both ‘n’ and ‘v’ at the same time. I think this facility works well enough and considerably cuts down on the potential for confusion.

After completing the part of speech facility I moved onto some of the other outstanding, non-visualisation tasks I still have to tackle, namely a ‘browse’ facility and the search facilities. Using WordPress shortcodes I created an option that lists all of the top level main categories in the system – i.e. those categories that have no parent category. This option provides a pathway into the thesaurus data and is a handy reference showing which semantic areas the project has so far tackled. I also began work on the search facilities, which will work in a very similar manner to those offered by the Historical Thesaurus of English. So far I’ve managed to create the required search forms but not the search that these need to connect to.
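The ‘no parent category’ rule behind the browse list is simple enough to sketch. The version below is Python rather than the WordPress shortcode itself, and the row layout, function name and sample categories are hypothetical.

```python
def top_level_categories(rows):
    """Return the rows that have no parent category, ordered by heading.

    rows: iterable of dicts with 'id', 'heading' and 'parent_id' keys,
    where 'parent_id' is None for a top level category.
    """
    tops = [row for row in rows if row["parent_id"] is None]
    return sorted(tops, key=lambda row: row["heading"])


# Hypothetical sample rows
rows = [
    {"id": 1, "heading": "Weather", "parent_id": None},
    {"id": 2, "heading": "Rain", "parent_id": 1},
    {"id": 3, "heading": "Sport", "parent_id": None},
]
headings = [row["heading"] for row in top_level_categories(rows)]
```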

After making this progress with non-visualisation features I returned to the visualisations. The visualisation style we had adopted was a radial tree, based on this example: http://bl.ocks.org/mbostock/4063550. This approach worked well for representing the hierarchical nature of the thesaurus, but it was quite hard to read the labels. I decided instead to investigate a more traditional tree approach, initially hoping to get a workable vertical tree, with the parent node at the top and levels down the hierarchy from this expanding down the page. Unfortunately our labels are rather long and this approach meant that there were a lot of categories on the same horizontal line of the visualisation, leading to a massive amount of overlap of labels. So instead I went for a horizontal tree approach, and adapted a very nice collapsible tree style similar to the one found here: http://mbostock.github.io/d3/talk/20111018/tree.html. I continued to work on this on Thursday and I have managed to get a first version integrated with the WordPress plugin I’m developing.
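Tree visualisations like these generally want nested data, whereas categories tend to be stored as flat rows that each point at a parent. A minimal sketch of the conversion is below. It is written in Python for illustration, the row layout and sample categories are hypothetical, and the ‘name’ / ‘children’ shape is the one commonly used in D3 tree examples rather than anything the plugin is confirmed to use.

```python
def build_tree(rows, root_id):
    """Nest flat category rows beneath the category with ID root_id.

    rows: iterable of dicts with 'id', 'heading' and 'parent_id' keys.
    Returns nested dicts with 'name' and, where present, 'children'.
    """
    by_id = {}
    children_of = {}
    for row in rows:
        by_id[row["id"]] = row
        children_of.setdefault(row["parent_id"], []).append(row["id"])

    def nest(cat_id):
        node = {"name": by_id[cat_id]["heading"]}
        kids = [nest(child_id) for child_id in children_of.get(cat_id, [])]
        if kids:
            node["children"] = kids
        return node

    return nest(root_id)


# Hypothetical sample rows
rows = [
    {"id": 1, "heading": "Weather", "parent_id": None},
    {"id": 2, "heading": "Rain", "parent_id": 1},
    {"id": 3, "heading": "Dew", "parent_id": 2},
]
tree = build_tree(rows, 1)
```

Because the function takes the root as an argument, re-centring the visualisation on a different category is just a matter of calling it with a different ID.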

Also on Thursday I met with Susan and Magda to discuss the project and the technical tasks that are still outstanding. We agreed on what I should focus on in my remaining time and we also discussed the launch at the end of the month. We also had a further meeting with Wendy, as a representative of the steering group, and showed her what we’d been working on.

On Wednesday this week I focussed on Medical Humanities. I spent a few hours adding a new facility to the SciFiMedHums database and WordPress plugin to enable bibliographical items to cross reference any number of other items. This facility adds such a connection in both directions, allowing (for example) Blade Runner to have an ‘adapted from’ relationship with ‘Do androids dream of electric sheep’ and for the relationship in the other direction to then automatically be recorded with an ‘adapted into’ relationship.
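The two-way cross referencing can be sketched as follows. This is Python rather than the plugin’s own code, it stores the links in a plain set instead of database tables, and the function name and lookup table are hypothetical. Only the ‘adapted from’ / ‘adapted into’ pair comes from the example above.

```python
# Each relationship type is mapped to the type recorded in the other direction
INVERSE = {
    "adapted from": "adapted into",
    "adapted into": "adapted from",
}


def add_cross_reference(links, source, relation, target):
    """Record a relationship and, automatically, its inverse.

    links: a set of (item, relation, other item) tuples.
    """
    links.add((source, relation, target))
    links.add((target, INVERSE[relation], source))


links = set()
add_cross_reference(
    links, "Blade Runner", "adapted from", "Do androids dream of electric sheep"
)
```

Using a set means that adding the same relationship from the other side later on doesn’t create duplicates.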

I spent the remainder of Wednesday and some other bits of free time continuing to work on the Medical Humanities Network website and CMS. I have now completed the pages and the management scripts for managing people and projects and have begun work on Keywords. There should be enough in place now to enable the project staff to start uploading content and I will continue to add in the other features (e.g. collections, teaching materials) over the next few weeks.

On Friday I met with Stuart Gillespie to discuss some possibilities for developing an online resource out of a research project he is currently in the middle of. We had a useful discussion and hopefully this will develop into a great resource if funding can be secured. The rest of my available time this week was spent on the Hansard materials again. After discussions with Fraser I think I now have a firmer grasp on where the metadata that we require for search purposes is located. I managed to get access to information about speeches from one of the files supplied by Lancaster and also access to the metadata used in the Millbanksystems website relating to constituencies, offices and things like that. The only thing we don’t seem to have access to is which party a member belonged to, which is a shame as this would be hugely useful information. Fraser is going to chase this up, but in the meantime I have the bulk of the required information. On Friday I wrote a script to extract the information relating to speeches from the file sent by Lancaster. This will allow us to limit the visualisations by speaker, and also hopefully by constituency and office too. I also worked some more with the visualisation, writing a script that created output files for each thematic heading in the two-year sample data I’m using, to enable these to be plugged into the visualisation. I also started to work on facilities to allow a user to specify which thematic headings to search for, but I didn’t quite manage to get this working before the end of the day. I’ll continue with this next week.