Week Beginning 19th June 2017

I decided this week to devote some time to redevelop the Thesaurus of Old English, to bring it into line with the work I’ve been doing to redevelop the main Historical Thesaurus website.  I had thought I wouldn’t have time to do this before next week’s ‘Kay Day’ event but I decided that it would be better to tackle the redevelopment whilst the changes I’d made for the main site were still fresh in my mind, rather than coming back to it in possibly a few months’ time, having forgotten how I implemented the tree browse and things like that.  It actually took me less time than I had anticipated to get the new version up and running, and by the end of Tuesday I had a new version in place that was structurally similar to the new HT site.  We will hopefully be able to launch this alongside the new HT site towards the end of next week.

I sent the new URL to Carole Hough for feedback as I was aware that she had some issues with the existing TOE website.  Carole sent me some useful feedback, which led to me making some additional changes to the site – mainly to the tree browse structure.  The biggest issue is that the hierarchical structure of TOE doesn’t quite make sense.  There are 18 top-level categories, but for a reason I am not at all clear about, each top-level category isn’t a ‘parent’ category but is in fact a sibling of the categories one level down.  E.g. logically ’04 Consumption of food/drink’ would be the parent category of ’04.01’, ’04.02’ etc., but in the TOE this isn’t the case; rather, ’04.01’, ’04.02’ etc. sit alongside ‘04’.  This really confuses both me and my tree browse code, which expects categories ‘xx.yy’ to be child categories of ‘xx’.  It led to the tree browse putting categories where they logically belong but where, within the confines of the TOE, they make no sense – e.g. we ended up with ’04.04 Weaving’ within ’04 Consumption of food/drink’!

To confuse matters further, there are some additional ‘super categories’ that I didn’t have in my TOE database but that apparently should be used as the real 18 top-level categories.  Rather confusingly, these have the same numbers as the other top-level categories.  So we now have ’04 Material Needs’, which has a child category ’04 Consumption of food/drink’, which in turn has ’04.04 Weaving’ as a sibling and not as a child as the number would suggest.  This situation is a horrible mess that makes little sense to a user and is even harder for a computer program to deal with.  Ideally we would renumber the categories in a more logical manner, but apparently this isn’t an option.  Therefore I had to hack about with my code to allow it to cope with these weird anomalies.  I just about managed to get it all working by the end of the week, but there are a few issues that I still need to clear up next week.  The biggest one is that all of the ‘xx.yy’ categories and their child categories currently appear in two places – within ‘xx’, where they logically belong, and beside ‘xx’, where this crazy structure says they should be placed.
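To make the anomaly concrete, here is a minimal sketch of the parent-finding logic, not the real implementation: the actual code works against database ids rather than number strings, and the ‘S’ prefix is purely my invention here to keep the two ‘04’s apart.

```javascript
// Category numbers are strings like "04", "04.04", "04.04.01"; the 18
// super categories reuse top-level numbers, so they get an "S" prefix
// here (hypothetical) to distinguish them.
function toeParent(catnum) {
  if (catnum.startsWith("S")) return null;      // super categories are the real roots
  const parts = catnum.split(".");
  // Both the old top-level "04" and the "04.xx" categories are siblings
  // under super category "04" - this is the TOE anomaly.
  if (parts.length <= 2) return "S" + parts[0];
  // Deeper levels nest normally: "04.04.01" -> "04.04".
  return parts.slice(0, -1).join(".");
}
```

With this in place, ’04.04 Weaving’ correctly ends up beside ’04 Consumption of food/drink’ rather than inside it.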

In addition to all this TOE madness I also spent some further time tweaking the new HT website, including updating the quick search box so the display doesn’t mess up on narrow screens, making some further tweaks to the photo gallery and making alterations to the interface.  I also responded to a request from Fraser to update one of the scripts I’d written for the HT OED data migration that we’re still in the process of working through.

In terms of non-thesaurus related tasks this week, I was involved in a few other projects.  I had to spend some time on some AHRC review duties.  I also fixed an issue that had crept into the SCOTS and CMSW Corpus websites since their migration: the ‘download corpus as a zip’ facility was no longer working because the PHP code used an old class to create the zip that was not compatible with the new server.  I spent some time investigating this and finding a new way of using PHP to create zip files.  I also locked down the SPADE website admin interface to the IP address ranges of our partner institutions and fixed an issue with the SCOSYA questionnaire upload facility.  I also responded to a request for information about TEI XML training from a PhD student and made a tweak to a page of the DSL website.

I spent the remainder of my week looking at some app issues.  We are hopefully going to be releasing a new and completely overhauled version of the ARIES app by the end of the summer, and I had been sent a document detailing the overall structure of the new site.  I spent a bit of time creating a new version of the web-based ARIES app that reflects this structure, in preparation for receiving content.  I also returned to the Metre app, which I’ve not done anything with since last year.  I added in some explanatory text and I am hopefully going to be able to start wrapping this app up and deploying it to the App and Play stores soon.  But possibly not until after my summer holiday, which starts the week after next.


Week Beginning 5th June 2017

I spent quite a bit of time this week on the Historical Thesaurus.  What began as a few tweaks ahead of Kay Day has now turned into a complete website redevelopment, so things are likely to get a little hectic over the next couple of weeks.  Last week I implemented an initial version of a new HT tree-based browse mechanism, but at the start of this week I still wasn’t sure how best to handle different parts of speech and subcategories.  Originally I had thought we’d have a separate tree for each part of speech, but I came to realise that this was not going to work, as the non-noun hierarchy has more gaps than actual content.  There are also issues with subcategories, as ones with the same number but different parts of speech have no direct connection.  Main categories with the same number but different parts of speech always refer to the same thing – e.g. 01.02.aj is the adjective version of 01.02.n.  But subcategories just fill out the numbers, meaning 01.02|01.aj can be something entirely different to 01.02|01.n.  This means providing an option to jump from a subcategory in one part of speech to another wouldn’t make sense.

Initially I went with the idea of having noun subcategories represented in the tree, with the option to switch part of speech in the right-hand pane after the user selected a category in the tree (if a main category was selected).  When a non-noun main category was selected, the subcategories for that part of speech would then be displayed under the main category words.  This approach worked, but I felt it was too inconsistent: I didn’t like that subcategories were handled differently depending on their part of speech.  I therefore created two further versions of the tree browser alongside the one I created last week.

The second one has [+] and [-] instead of chevrons.  It has the catnum in grey before the heading.  The tree structure is the same as the first version (i.e. it includes all noun categories and noun subcats).  When you open a category, the different parts of speech now appear as tabs, with ‘noun’ open by default.  Hover over a tab to see the full part of speech and the heading for that part of speech.  The good thing about the tabs is that the currently active PoS doesn’t disappear from the list, as happens with the other view.  When viewing a PoS that isn’t ‘noun’ and there are subcategories, the full contents of these subcategories are visible underneath the maincat words.  Subcats are indented and coloured to reflect their level, as with the ‘live’ site’s subcats, but here all lexemes are also displayed.  As ‘noun’ subcats are handled differently and this could be confusing, a line of text explains how to access these when viewing a non-noun category.

For the third version I removed all subcats from the tree, so it only features noun maincats.  It is therefore considerably less extensive, and no doubt less intimidating.  In the category pane, the PoS selector is the same as in the first version.  The full subcat contents, as in v2, are displayed for every PoS, including nouns.  This does make for some very long pages, but it does at least mean all parts of speech are handled in the same way.

Marc, Fraser and I met to discuss the HT on Wednesday.  It was a very productive meeting and we formed a plan about how to proceed with the revamp of the site.  Marc showed us some new versions of the interface he has been working on too.  There is going to be a new colour scheme and new fonts will be used too.  Following on from the meeting I updated the navigation structure of the HT site, replaced all icons used in the site with Font Awesome icons, added in the facility to reload the ’random category’ that gets displayed on the homepage, moved the ‘quick search’ to the navigation bar of every page and made some other tweaks to the interface.

I spent more time towards the end of the week on the tree browser.  I’ve updated the ‘parts of speech’ section so that the current PoS is also included.  I’ve also updated the ordering to reflect the order in the printed HT and updated the abbreviations to match these too.  Tooltips now give text as found in the HT PDF.  The PoS beside the cat number is also now a tooltip.  I’ve updated the ‘random category’ to display the correct PoS abbreviation too.  I’ve also added in some default text that appears on the ‘browse’ page before you select a category.

A lot of my time was spent looking into how to handle loading the ‘browse’ page at a specific point in the hierarchy and ensuring that links to specific categories are possible via catIDs in the URL.  Now when you open a category the catid is appended to the page URL in the address bar.  This is a hash (#id=1) rather than a GET variable (?id=1), as updating hashes is much easier in JavaScript and works with older browsers.  This does mean old HT bookmarks and citations will point to the wrong place, but this isn’t a problem as I’ve updated the ‘category’ page so that it now takes an old-style URL and redirects on to our new ‘browse’ page.  This brings us onto the next tricky issue: loading a category and the tree from a passed URL.  I am still in the middle of this, as there are a couple of tricky things that need to be taken into consideration:
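The hash handling amounts to something like the following sketch (function names are illustrative, not the live code): the hash is written when a category opens and parsed again on page load or hashchange, which is what makes category views bookmarkable.

```javascript
// Parse a "#id=1" style hash into a plain object of key/value pairs.
function parseHash(hash) {
  const params = {};
  hash.replace(/^#/, "").split("&").forEach(pair => {
    const [key, value] = pair.split("=");
    if (key) params[key] = decodeURIComponent(value || "");
  });
  return params;
}

// Build the hash back up from an object, e.g. { id: "1" } -> "#id=1".
function buildHash(params) {
  return "#" + Object.entries(params)
    .map(([k, v]) => k + "=" + encodeURIComponent(v))
    .join("&");
}

// In the browser this would be wired up roughly as:
//   window.location.hash = buildHash({ id: catid });
//   window.addEventListener("hashchange", () =>
//     loadCategory(parseHash(window.location.hash).id));
```

Unlike changing the query string, assigning to `location.hash` doesn’t trigger a page reload, which is why it suits this kind of in-page navigation.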

  1. If it’s a subcat we don’t just want to display this; we need to grab its maincat and all of the maincat’s subcats, then ensure the passed subcat is displayed on screen.
  2. We need to build up the tree hierarchy, which is for nouns, so if the passed catid is not a noun category we also need to find the corresponding noun category.
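The two resolution steps could be handled along these lines – a hypothetical sketch in which `catLookup` stands in for the real database calls and the record fields are invented:

```javascript
// A category record here is assumed to look like:
//   { id, maincatId (null if it IS a maincat), pos, number }
function resolveBrowseTarget(catid, catLookup) {
  let cat = catLookup.byId(catid);
  // Point 1: if we were passed a subcat, remember it so the page can
  // scroll to it, then load its maincat instead.
  const scrollTo = cat.maincatId ? cat.id : null;
  if (cat.maincatId) cat = catLookup.byId(cat.maincatId);
  // Point 2: the tree only contains noun categories, so a non-noun
  // maincat is swapped for the noun category with the same number.
  const treeCat = cat.pos === "n" ? cat : catLookup.byNumberAndPos(cat.number, "n");
  return { display: cat, tree: treeCat, scrollTo };
}
```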

I have sorted out point 1 now.  If you pass a subcat ID to the page, the maincat record is loaded and the page scrolls until the subcat is in view.  I will also highlight the subcat, but haven’t done this yet.  I’m still in the middle of addressing the second point.  I know where and how to add in the grabbing of the noun category; I just haven’t had the time to do it yet.  I also need to properly build up the tree structure and have the relevant parts open.  This is still to do, as currently only the tree from the maincat downwards is loaded in.  It’s potentially going to be rather tricky to get the full tree represented and opened properly, so I’ll be focussing on this next week.  Also, T7 categories are currently giving an error in the tree: they all appear to have children, and when you click on the [+] an error occurs.  I’ll get this fixed next week too.  After that I’ll focus on integrating the search facilities with the tree view.  Here’s a screenshot of how the tree currently looks:

I was pretty busy with other projects this week as well.  I met with Thomas Clancy and Simon Taylor on Tuesday to discuss a new place-names project they are putting together.  I will hopefully be able to be involved in this in some capacity, despite it not being based in the School of Critical Studies.  I also helped Chris to migrate the SCOTS Corpus websites to a new server.  This caused some issues with the PostgreSQL database that took us several hours to get to the bottom of.  They had left the search facilities completely broken, but thankfully I figured out what was causing this, and by the end of the week the site was sitting on a new server.  I also had an AHRC review to undertake this week.

On Friday I met with Marc and the group of people who are working on a new version of the ARIES app.  I will be implementing their changes so it was good to speak to them and learn what they intend to do.  The timing of this is going to be pretty tight as they want to release a new version by the end of August, so we’ll just need to see how this goes.  I also made some updates to the ‘Burns and the Fiddle’ section of the Burns website.  It’s looking like this new section will now launch in July.

Finally, I spent several hours on The People’s Voice project, implementing the ‘browse’ functionality for the database of poems.  This includes a series of tabs for different ways of browsing the data.  E.g. you can browse the titles of poems by initial letter, you can browse a list of authors, years of publication etc.  Each list includes the items plus the number of poems that are associated with the item – so for example in the list of archives and libraries you can see that Aberdeen Central Library has 70 associated poems.  You can then click on an item and view a list of all of the matching poems.  I still need to create the page for actually viewing the poem record.  This is pretty much the last thing I need to implement for the public database and all being well I’ll get this finished next Friday.





Week Beginning 25th April 2016

I was struck down with a rather nasty cold this week, which unfortunately led to me being off sick on Wednesday and Thursday.  It hit me on Tuesday and although I struggled through the day it really affected my ability to work.  I somewhat foolishly struggled into work on the Wednesday but only lasted an hour before I had to go home.  I returned to work on the Friday but was still not feeling all that great, which did unfortunately limit what I was able to achieve.  However, I did manage to get a few things done this week.

On Monday I created ‘version 1.1’ of the Metaphoric app.  The biggest update in this version is the ‘loading’ icons that now appear on the top-level visualisation when the user presses on a category.  As detailed in previous posts, there can be a fairly lengthy delay between a user pressing on a category and the processing of yellow lines and circles completing, during which time the user has no feedback that anything is actually happening.  I had spent a long time trying to get to the bottom of this, but realised that without substantially redeveloping the way the data is processed I would be unable to speed things up.  Instead what I managed to do was add in the ‘loading’ icon to at least give a bit of feedback to users that something is going on.  I had added this to the web version of the resource before the launch last week, but I hadn’t had time to add the feature to the app versions due to the time it takes for changes to apps to be approved before they appear on the app stores.  I set to work adding this feature (plus a few other minor tweaks to the explanatory text) to the app code and then went through all of the stages that are required to build the iOS and Android versions of the apps and submit these updated builds to the App Store and the Play Store.  By lunchtime on Monday the new versions had been submitted.  By Tuesday morning version 1.1 for Android was available for download.  Apple’s approval process takes rather longer, but thankfully the iOS version was also available for download by Friday morning.  Other than updating the underlying data when the researchers have completed new batches of sample lexemes my work on the Metaphor projects is now complete.  The project celebrated this milestone with lunch in the Left Bank on Tuesday, which was very tasty, although I was already struggling with my cold by this point, alas.

Also this week I met with Michael McAuliffe, a researcher from McGill University in Canada who is working with Jane Stuart Smith to develop some speech corpus analysis tools.  Michael was hoping to get access to the SCOTS corpus files, specifically the original, uncompressed sound recordings and the accompanying transcriptions made using the PRAAT tool.  I managed to locate these files for him and he is going to try and use these files with a tool he has created in order to carry out automated analysis/extraction of vowel durations.  It’s not really an area I know much about but I’m sure it would be useful to add such data to the SCOTS materials for future research possibilities.

I also finalised my travel arrangements for the DH2016 conference and made a couple of cosmetic tweaks to the People’s Voice website interface.  Other than that I spent the rest of my remaining non-sick time this week working on the technical plan for Murray Pittock’s new project.  I’ve managed to get about a third of the way through a first draft of the plan so far, which has resulted in a number of questions that I sent on to the relevant people.  I can’t go into any detail here but the plan is shaping up pretty well and I aim to get a completed first draft to Murray next week.

Week Beginning 2nd November 2015

My first task this week was to finish off the AHRC duties I’d started last Friday, and with that out of the way I set about trying to fix a small bug with the Scots Corpus that I’d been meaning to try and sort for some time. The concordance feature of the advanced search allows users to order the sentences alphabetically by words to the left or right of the node word but the ordering was treating upper and lower case words separately, e.g. ABC and then abc, rather than AaBbCc. This was rather confusing for users and was being caused by the XSLT file that transforms the XML data into HTML. This is processed dynamically via PHP and unfortunately PHP doesn’t support XPath 2, which provides some handy functions for ignoring case. There is a hack to make XPath 1 ignore case by transforming the data before it is ordered but last time I tried to get this to work I just couldn’t figure out how to integrate it. Thankfully when I looked at the XSLT this time I realised where the transformation needed to go and we have nicely ordered results in the concordance at last.
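The idea behind the fix can be shown in JavaScript for illustration (the actual change lives in the XSLT): fold case before comparing.  In XPath 1.0 the standard hack is `translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')` inside `<xsl:sort select="…">`, since `lower-case()` only arrives with XPath 2.0.

```javascript
// Sort case-insensitively by comparing lower-cased keys, so that
// "apple" sorts between "Aardvark" and "Banana" rather than after all
// capitalised words.  Mirrors the translate() trick used in the XSLT.
function caseInsensitiveSort(words) {
  return [...words].sort((a, b) => {
    const la = a.toLowerCase(), lb = b.toLowerCase();
    return la < lb ? -1 : la > lb ? 1 : 0;
  });
}
```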

On Tuesday we had a team meeting for the Metaphor in the Curriculum project. One of the metaphor related items on my ‘to do’ list was to integrate a search for sample lexemes with the Mapping Metaphor search facilities, so in preparation for this meeting I tried to get such a feature working. I managed to get an updated version of the Advanced Search working before the meeting, and this allows users to supply some text (with wildcards if required) into a textbox and for this to search the sample lexemes we have recorded about each metaphorical link. I also updated the way advanced search results are displayed. Previously, upon completing an advanced search users were taken to a page where they could then decide which of the returned categories they actually wanted to view the data for. This was put in place to avoid the visualisation getting swamped with data, but I always found it a rather confusing feature. What I’ve done instead is to present a summary of the user’s search, the number of returned metaphorical connections, an option to refine the search and then buttons leading to the four data views for the results (visualisation, table etc). I think this works a lot better and makes a lot more sense. I also updated the quick search to incorporate a search for sample lexemes. The quick search is actually rather different to the advanced search in that the former searches categories while the latter searches metaphorical connections. Sample lexemes are an attribute of a metaphorical connection rather than a category, so it took me a while to decide how to integrate the sample lexeme search with the quick search. In the end I realised that both categories in a metaphorical connection share the same pool of sample lexemes – that is how we know there is overlap between the two categories. So I could just assume that if a search term appeared in the sample lexemes then both categories in the associated connection should be returned in the quick search results.
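That reasoning can be sketched as follows (field names are made up for illustration): sample lexemes hang off a metaphorical connection, and both categories in a connection share that pool, so a lexeme hit returns both ends of the connection.

```javascript
// Return every category involved in a connection whose sample lexemes
// match the search term.  A Set de-duplicates categories that appear
// in more than one matching connection.
function quickSearchLexemes(connections, term) {
  const t = term.toLowerCase();
  const categories = new Set();
  for (const conn of connections) {
    if (conn.sampleLexemes.some(lex => lex.toLowerCase().includes(t))) {
      categories.add(conn.category1);
      categories.add(conn.category2);
    }
  }
  return [...categories];
}
```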

The actual project meeting on Tuesday went well and Ellen has supplied me with some example questions for which I need to make some mock-ups to test out the functionality of the interactive exercises. Ellen had also noticed that some of the OE data was missing from the online database and after she provided it to me I incorporated it into the system.

On Thursday morning I had a meeting with Jennifer and Gary to discuss the SCOSYA project. I’m going to be developing the database, content management system and frontend for the project, and my involvement will now be starting in the next week or so. The project will be looking at about 200 locations, with student fieldworkers filling in about 1000 questionnaires and a similar number of recordings. To keep things simple the questionnaires will be paper based and the students will then transfer their answers to an Excel spreadsheet afterwards. They will then email these to Gary. So rather than having to make a CMS that needs to keep track of at least 40 users and their various data I thankfully just need to provide facilities for Gary and one or two others to use, and the data will be uploaded into the database via Gary attaching the spreadsheets to a web form. Gary is going to provide me with a first version of the spreadsheet structure, with all data / metadata fields present and I will begin working on the database after that.

I spent most of the rest of the week on tasks relating to the Hansard data for the Samuels project. On Thursday afternoon I attended the launch of the Hansard Corpus, and that all went very well. My own Hansard related work (visualising the frequency of thematic headings throughout the Hansard texts) is also progressing, albeit slowly. This week I incorporated a facility to allow the user’s search to be limited to a specific category that they have chosen, or for the returned data to include the chosen category and every other category below this point in the hierarchy. So, for example, a search on ‘Love’ (AU:27) can either show results for this category alone or also include every category lower down, e.g. ‘Liking’ (AU:27:a). I created cached versions of the data for the ‘cascading’ searches too, and it’s working pretty well. I then began to tackle limiting searches by speaker. I’ve now got a ‘limit by’ box that users can open up for each row that is displayed. This box currently features a text box where a speaker’s name can be entered. This box has an ‘autocomplete’ function, so for example searching for ‘Tony B’ will display people like ‘Blair’ and ‘Benn’. Clicking on a person adds them to the limit option and updates the query. And this is as far as I’ve got, because trying to run the query on the fly even for the two years of sample data causes my little server to stop working. I’m going to need to figure out a way to optimise the queries if this feature is going to be at all usable. This will be a task for next week.
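The cascading test itself is simple, because heading codes such as ‘AU:27:a’ contain their ancestry.  A sketch (illustrative, not the production query, which does this in SQL against the cached tables):

```javascript
// Keep a heading code if it equals the chosen category, or - when
// cascading - if it starts with the chosen code plus the ":" separator.
// The separator check stops "AU:271" matching a search on "AU:27".
function inCategory(code, chosen, cascade) {
  if (code === chosen) return true;
  return cascade && code.startsWith(chosen + ":");
}
```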

Week Beginning 26th October 2015

I returned to a more normal working week this week, after having spent the previous one at a conference and the one before that on holiday. I probably spent about a day catching up with emails, submitting my expenses claim and writing last week’s rather extensive conference report / blog post. I also decided it was about time that I gathered all of my outstanding tasks together into one long ‘to do’ list as I seem to have a lot going on at the moment. The list currently has 47 items on it split across more than 12 different projects, not including other projects that will be starting up in the next month or two. There’s rather a lot going on at the moment and it is good to have everything written down in one place so I don’t forget anything. I also had some AHRC review duties to perform this week as well, which took up some further time.

With these tasks out of the way I could get stuck into working on some of my outstanding projects again. I met with Hannah Tweed on Tuesday to go through the Medical Humanities Network website with her. She had begun to populate the content management system with projects and people and had encountered a few bugs and areas of confusion, so we went through the system and I made a note of things that needed fixing. These were all thankfully small, easily fixable issues, such as suppressing the display of fields when the information isn’t available, and it was good to get things working properly. I also returned to the SciFiMedHums bibliographical database. I updated the layout of the ‘associated information’ section of the ‘view item’ page to make it look nicer and I created the ‘advanced search’ form, which enables users to search for things like themes, mediums, dates, people and places. I also reworked the search results page to add in pagination, with results currently getting split over multiple pages when more than 10 items are returned. I’ve pretty much finished all I can do on this project now until I get some feedback from Gavin. I also helped Zanne to get some videos reformatted and uploaded to the Academic Publishing website, which will probably be my final task for this project.

Wendy contacted me this week to say that she’d spotted some slightly odd behaviour with the Scots Corpus website. The advanced search was saying that there were 1317 documents in the system but a search returning all of them was saying that it matched 99.92% of the corpus. The regular search stated that there were 1316 documents. We figured out that this was being caused by a request we had earlier this year to remove a document from the corpus. I had figured out a way to delete it but evidently there was some data somewhere that hadn’t been successfully updated. I managed to track this down: it turned out that the number of documents and the total number of words was being stored statically in a database table, and the advanced search was referencing this. Having discovered this I updated the static table and everything was sorted. Wendy also asked me about further updates to the Corpus that she would like to see in place before a new edition of a book goes to the printers in January. We agreed that it would be good to rework the advanced search criteria selection as the options are just too confusing as they stand. There is also a slight issue with the concordance ordering that I need to get sorted too.

At the conference last week Marc, Fraser and I met with Terttu Nevalainen and Matti Rissanen to discuss Glasgow hosting the Helsinki Corpus, which is currently only available on CD. This week I spent some time looking through the source code and getting a bit of server space set aside for hosting the resource. The scripts that power the corpus are Python based and I’ve not had a massive amount of experience with Python, but looking through the source code it all seemed fairly easy to understand. I managed to get the necessary scripts and the data (mostly XML and some plain text) uploaded to the server and the scripts executing. The only change I have so far made to the code is to remove the ‘Exit’ tab as this is no longer applicable. We will need to update some of the site text and also add in a ‘hosted by Glasgow’ link somewhere. The corpus all seems to work online in the same way as it does on the CD now, which is great. The only problem is the speed of the search facilities. The search is very slow, and can take up to 30 seconds to run. Without delving into the code I can’t say why this is the case, but I would suspect it is because the script has to run through every XML file in the system each time the search runs. There doesn’t appear to be any caching or indexing of the data (e.g. using an XML database) and I would imagine that without using such facilities we won’t be able to do much to improve the speed. The test site isn’t publicly accessible yet as I need to speak to Marc about it before we take things further.



Week Beginning 27th April 2015

I spent most of this week continuing with the App version of the old STELLA teaching resource Essentials of Old English and I have now completed the first version of it, which you can currently view here: http://www.arts.gla.ac.uk/stella/apps/eoe/ (note however that this URL may stop working at any time). Pretty much all of the time devoted to the App was spent developing the ‘Plus’ exercises, of which there are 50. In the original resource these were actually split into almost 80 exercises, but I’ve amalgamated some of these for the new App. Most of the exercises followed the same general pattern as the ‘basic’ exercises, which meant I generally just had to extract the content and plug it into the structure I’d already established. I’m using a JSON file to store the exercise contents and this is then pulled into the exercises’ JavaScript file, with the appropriate content extracted based on the exercise ID that is passed. This approach works pretty well and it also ensures that new exercise types can be plugged into the code relatively easily. I should really have an ‘exercise type’ field in the JSON file that specifies which bit of logic runs, but for the time being at least this is based on exercise ID instead. Some of the ‘Plus’ exercises were rather different in structure to the ‘Basic’ exercises, but thankfully these had the same sort of structure as some of the exercises I’d previously developed for the ‘Grammar’ app, for example exercises where you have to assign function labels (SPOCA) to phrases or label phrase forms (e.g. Noun Phrase). This meant I could take my existing code and plug it into the new app, a process that was relatively straightforward.
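The approach boils down to something like this cut-down sketch (the real JSON is much richer, and the ids, fields and sample content here are invented), including the ‘exercise type’ field that would let the code dispatch on type rather than on individual ids:

```javascript
// Exercise content keyed by id, as it might appear after loading the
// JSON file.  The "type" field selects which bit of exercise logic runs.
const exercises = {
  "plus-01": { type: "gap-fill", title: "Weak verbs", questions: ["..."] },
  "plus-02": { type: "spoca", title: "Function labels", questions: ["..."] }
};

// Look up one exercise by the id passed to the exercise page.
function getExercise(id) {
  const ex = exercises[id];
  if (!ex) throw new Error("Unknown exercise: " + id);
  return ex;
}
```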

After formatting the questions and answers for all 50 exercises a first version of the app was ready for testing. This version isn’t a proper ‘app’ as such in that it just runs in a web browser rather than being installed on a device, but in terms of functionality it reflects what will be included. I’ve arranged to meet Ger Malcolm, the guy in communications who currently manages the University’s Apple developer account in order to see about STELLA taking over the payment and management of the account. It’s a good time to meet with him firstly because the annual payment to Apple is due soon and secondly because our fourth app is just about ready to launch. I’ll also need to sort out a Google Play developer account so we can release Android versions of all of our apps too.

After completing the EOE app I started working on the ‘web’ version of the resource. For each of the apps I’ve created a version that fits within the University website design and is aimed at people using PCs rather than mobile devices so I needed to make such a version for EOE too. The web versions for the three previous apps I’d created were all still using the slightly older University web layout (the one with the nice big background image down the right hand side of the screen) so I decided to update these three to bring them into line with the current University website. You can view these here:

  1. ARIES: http://www.arts.gla.ac.uk/stella/apps/web/aries/
  2. English Grammar: http://www.arts.gla.ac.uk/stella/apps/web/grammar/
  3. Readings in Early English: http://www.arts.gla.ac.uk/STELLA/apps/web/readings/

As you can probably deduce from the above URLs, the web version of the EOE app will be found here:


I haven’t completed work on this yet though. I’m currently still getting the navigation structure sorted out. Migrating the app to the University layout is slightly tricky in some respects because I can no longer rely on the widgets provided by the jQuery Mobile library that I use for the app. In general the logic behind the exercises can all be carried over without a problem, but certain things like popups, collapsible sections of the screen and buttons need to be reworked. I hope to get this completed next week.

Other than App stuff I worked on a few other tasks this week. I had a meeting with Susan Rennie and Magda to discuss the Scots Thesaurus project on Thursday. I went through the tool I’d created for managing metaphor categories and showed Magda how it might be used. She is going to send me the data she has worked on since I was last given any and I’ll import this into my system. I also need to start work on some of the front-end aspects of the project, as Susan wants to demonstrate some of this at a conference in August.

I also made a little tweak to the Historical Thesaurus of English this week. Fraser had received a request that we display subcategory numbers in the subcategory list within the category page, as people sometimes want to be able to pinpoint a subcategory based on its number. This made total sense, and it’s a feature I wish we had included from the start. Thankfully it was very easy to incorporate the update, and the numbers now appear to the right in the subcategory list.

I had a meeting with Craig Lamont, a researcher who is working on a project with Murray Pittock. They have a historical map of Edinburgh that the NLS has made available to them in geocoded format and they are wanting to pin markers on it. Craig didn’t really know how to proceed with this so I met with him, showed him some of the previous work I’d done for the Burns people and we managed to get a mockup working with two markers pinned to it. Craig is now going to go away and compile the data in a spreadsheet and we’ll meet again at some point and think about how to transform this into map markers.

Wendy Anderson contacted me this week with a request to remove a record from the SCOTS Corpus. I’ve never had access to the underlying database for SCOTS, having only worked on the front-end design for the site. This was a good opportunity for me to get access to the database and see how it all fits together. With the help of Chris McGlashan I got access and managed to delete the necessary information. It will be useful to have this access in future.

I also made a couple of small tweaks to the DSL website (well, the development version – we still need to ‘go live’ with a variety of updates).  I added in a ‘cite’ popup to the ‘Scots Language’ page, so that’s task number 4 ticked off our list of things to do.


Week Beginning 13th April 2015

I was off for Easter last week and spent a lovely, sunny week visiting family in Yorkshire. Upon returning from this relaxing week I got stuck into a few projects, the first of which was SAMUELS. At the final project meeting before Easter Fraser was given a hard drive with the complete Hansard data on it – a 40Gb tar.gz file. I got this off Fraser with a view to extracting the data and figuring out exactly what it contained and just what I would need to do with it. Unzipping the file took many hours and resulted in a tar file that was approaching 200Gb in size. Unfortunately, although the unzipping process appeared to complete successfully when I attempted to ‘de-tar’ the file (i.e. split it up into its individual files) my zip program just gave an error message about the archive being unreadable. I repeated the extraction process, which took many more hours, but alas, the same error was given. I had a meeting with Marc and Fraser on Tuesday and Marc said he’d try to extract the files on his computer so I handed the hard drive over. I haven’t heard anything back from Marc yet but fingers crossed he has managed to make some progress. What I really need is a new desktop PC that has more storage and processing power as I’m currently rather hampered by the hardware I have access to.
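For what it’s worth, extracting a .tar.gz in two passes (gunzip first, then untar) needs disk space for the enormous intermediate tar file and gives a GUI zip tool two chances to trip up; on the command line the decompress and extract steps can be streamed in a single pass, and the listing flag offers a quick integrity check beforehand. A sketch, with the archive name and output path as placeholders:

```shell
# Check the archive is readable without extracting anything
tar -tzf hansard.tar.gz > /dev/null && echo "archive OK"

# Decompress and extract in one streamed pass (no intermediate .tar file)
tar -xzf hansard.tar.gz -C /path/to/output
```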

The Tuesday meeting with Marc and Fraser was primarily to discuss the Thesaurus of Old English (TOE). There is an online version of this resource which is hosted at Glasgow, but it really needs to be redeveloped along the lines of the main HT website and we discussed how we might proceed with this. I would very much like to get a reworked TOE website up and available as soon as possible to complement the HT website and Marc is of the same opinion. As there is a big Anglo-Saxon conference being held in Glasgow in August (http://www.isas2015.com/) Marc would really like the new TOE to be available for this, alongside the Old English metaphor map which I will be working on in June. We agreed that Marc and Fraser would work on the underlying data and will try to get it to me in the next week or so and I will then adapt the scripts I’ve already created for the HT to work with this data. Structurally the data from each thesaurus are very similar so it shouldn’t be too tricky a task.

One item that has been sitting on my ‘to do’ list for a long time is to redevelop the map interface of the SCOTS corpus website. This was an aspect of the site that I didn’t update significantly when I revamped the SCOTS website previously, but I always intended to return to it. I’ve updated the map to use the current version of the Google Maps API (version 3). The old map used version 2, which Google no longer supports. Google still allows access to version 2 (calls to version 2 are actually migrated to version 3 at their end), but this facility could be switched off at any time, so it was important that we moved to version 3. I updated the map so that it displays a map rather than satellite images – I decided that being able to see placenames and locations would be more useful than seeing the geography.  I’ve also removed the options to switch from map to satellite and to view street view as these don’t really seem necessary.

I’ve styled the map to make it different from a standard Google map. The map is coloured so that water is the same colour as the website header and land masses are grey. I’ve also set it so that road markings, stations and businesses are not labelled to avoid clutter. I’ve also added a ‘key’ to the info-box on the right by default so people can tell what the icons mean. This gets replaced by document details when a ‘show details’ link is pressed. I was originally intending to replace the icons used on the map with new ones but I think on the grey map the icons still look pretty good. The new version of the map can be found here: http://www.scottishcorpus.ac.uk/advanced-search/
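For anyone curious, this kind of restyling is done by passing a styles array when the map is created. A minimal sketch of the approach – the colour values here are placeholders, not the site’s actual ones:

```javascript
// Google Maps API v3 styled-map sketch: tinted water, grey land, and
// road, station and business labels switched off to reduce clutter.
var mapStyles = [
  { featureType: 'water',     stylers: [{ color: '#2b5797' }] }, // placeholder header colour
  { featureType: 'landscape', stylers: [{ color: '#cccccc' }] },
  { featureType: 'road',            elementType: 'labels', stylers: [{ visibility: 'off' }] },
  { featureType: 'transit.station',                        stylers: [{ visibility: 'off' }] },
  { featureType: 'poi.business',                           stylers: [{ visibility: 'off' }] }
];

// In the page itself this array would be passed to the map constructor:
// var map = new google.maps.Map(el, { mapTypeId: 'roadmap', styles: mapStyles });
```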

I also updated some favicons (the little icons used in browser tabs) used by a few sites this week. I’d noticed that the Mapping Metaphor icon I had created looked really blocky and horrible on my iPad and realised that higher resolution favicons were required for this site, plus SCOTS and CMSW. I found a website that can create such icons rather nicely (http://www.xiconeditor.com/) and created some new and considerably less pixelated favicons. Much better!

I also spent a bit of time on DSL duties, updating the front end of the ‘dev’ version so that it worked nicely with Peter’s newly released Boolean search functionality. It is now possible to use Boolean keywords AND, OR and NOT, but if these words were found at the beginning or the end of a search string they resulted in an HTTP error being returned. I’ve now added in a check that strips out such words. I also made another couple of tweaks to the search results browser. Once these updates have been approved by Ann I will update the ‘live’ site.
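The check is simple enough to sketch. A hypothetical version of the stripping logic – the function and variable names are mine, not the DSL code’s:

```javascript
// Strip Boolean operators that dangle at the start or end of a search
// string, where they have no operand and would otherwise cause an error.
function stripDanglingBooleans(query) {
  var booleans = ['AND', 'OR', 'NOT'];
  var terms = query.trim().split(/\s+/);
  // remove operators dangling at the start of the query...
  while (terms.length && booleans.indexOf(terms[0]) !== -1) terms.shift();
  // ...and at the end; operators in the middle are left for the search engine
  while (terms.length && booleans.indexOf(terms[terms.length - 1]) !== -1) terms.pop();
  return terms.join(' ');
}
```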

The remainder of the week was mostly spent with Essentials of Old English (EOE). I’ve been meaning to update this ageing and slightly broken website (see http://www.arts.gla.ac.uk/stella/OE/HomePage.html) for some time but other work commitments have taken priority. As I’m awaiting the Hansard data for SAMUELS it seemed like a good opportunity to make a start, plus I think it would be great to have this resource available before ISAS in August too. The old website uses Java applets for the exercises, which is a bit of a pain as most modern browsers treat Java applets as major security risks these days and refuse to run them without a lot of customisation. It took an hour or so just to get my browser to open the exercises, and even then I’m having trouble getting some of the ‘Plus’ exercises to display. However, I came across the uncompiled Java source files in a directory on the STELLA server so these should be of some help.

I’m creating an ‘app’ version of EOE that will sit alongside the three other STELLA apps I’ve previously created, so visually this new app fits in with the previous ones. So far I’ve managed to complete the ‘Basic’ book, the glossary and the ‘about’ pages, leaving the ‘Plus’ book and all of the exercises still to do. You can view a work in progress version here: http://www.arts.gla.ac.uk/STELLA/apps/eoe/ (Note that this URL may cease to function at any time).

I hope to be able to find the time to continue with this app next week, although I have a few meetings and other commitments that might limit how much I can do.


Week Beginning 30th March 2015

A brief report this week as I’m off for my Easter hols soon and I don’t have much time to write. I will be off all of next week. It was a four-day week this week as Friday is Good Friday. Last week was rather hectic with project launches and the like but this week was thankfully a little calmer. I spent some time helping Chris out with an old site that urgently needed fixing and I spent about a day on AHRC duties, which I can’t go into here. Other than that I helped Jane with the data management plan for her ESRC bid, which was submitted this week. I also had a meeting with Gavin Miller and Jenny Eklöf to discuss potential collaboration tools for medical humanities people. This was a really interesting meeting and we had a great discussion about the various possible technical solutions for the project they are hoping to put together. I also spoke to Fraser about the Hansard data for SAMUELS but there wasn’t enough time to work through it this week. We are going to get stuck into it after Easter.

In terms of actual development work, I did a little bit for three projects this week. Firstly the SCOTS corpus: after I’d added the ‘cite this page’ feature to the Mapping Metaphor website last week Wendy thought it would be a good idea if I could add a similar feature to the SCOTS corpus. I added this in for both SCOTS and CMSW and I’m sure it will be a very useful feature. I also got back into DSL development duties, finishing off the reworking of the advanced search results page. I had previously split this into two tabs and had added improved pagination facilities but Ann wanted the results for DOST and SND to appear side by side rather than in separate tabs. She wanted a few other changes made too and I think I managed to get all of this in place this week. Hopefully we’ll be able to ‘go live’ with the changes once I’m back from my holiday. I also spent a bit of time with the Digital Humanities Network website. This website looks like the University website but doesn’t run within its content management system. It does however rely on a lot of the scripts from their system and some changes had been made to these scripts this week which were unfortunately causing the digital humanities website’s JavaScript to completely break. I managed to find a workaround for this but there are still some aspects that are not quite right and I’ll need to have a further look into this once I’m back. That’s all for now though.

Week Beginning 12th January 2015

I completed the migration of the audio and video files for the Scottish Corpus website this week.  The Quicktime plugin is no longer used and instead audio uses the standard HTML5 Audio tag while video uses the HTML5 Video tag.  It took a while to get all of the media files migrated to newer formats and to get all of the Javascript updated so things like synchronisation and ‘jump to’ continued to work, but that’s it all done now.  It now means that the media files can be heard in all modern browsers on all devices (including smartphones and tablets), plus the server no longer needs to run the Darwin Streaming Server, which simplifies the server side of things and should make migrating the system to a new operating system (which Chris intends to do later this year) a lot easier.  You can see an example of the new interface here: http://www.scottishcorpus.ac.uk/document/?documentid=805

I had thought I would be spending some of this week working on Burns related things, but unfortunately Pauline is still not well so this has been pushed back to next week.  We should hopefully still be able to get the timeline and the tour maps available in time for Burns’ Night though.

The rest of my week was spent on Mapping Metaphor duties.  I completed a first draft of the advanced search last week and I received some feedback from Ellen.  She suggested that rather than jumping directly from the search form to the visualisation it might instead be better to present users with a list of categories that match the user’s search criteria to enable them to select which of these they’re interested in.  This would get around the problem of someone searching for a concept such as ‘light’ being faced with hundreds of categories when the majority of them might not be relevant.  I added in such an intermediary page, and it works a lot better.  Matching categories are listed with a checkbox beside them and by default all categories are ticked.  An option is available enabling a user to select or deselect all, in addition to allowing individual categories to be ticked / unticked and after pressing the ‘continue’ button the search runs.

After completing this task I began to investigate how the timeline might operate.  I decided against using an off-the-peg timeline solution (such as Timeglider, which I had previously used for a mock-up).  This was because all the points plotted on the timeline will be rounded into 50-year chunks, so there will be stacks of points, plus we don’t need to be able to zoom in on specific periods – 50-year chunks is as granular as we are going to get.  Instead I decided to implement something using d3.js, based on some code from a scatterplot chart (http://bl.ocks.org/mbostock/3887118).  Our timeline will have time as 50-year chunks across the bottom, points of two sizes (representing strong and weak metaphors) and 6 colours representing which top-level categories the connections belong to: 1 (External World) is green, 2 (Mental World) is blue and 3 (Social World) is red, so 1-1 is green, 2-2 is blue, 3-3 is red, 1-2 is cyan, 1-3 is yellow and 2-3 is magenta.
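The colour assignment boils down to a lookup keyed on the ordered pair of top-level categories. A sketch – the function and variable names are mine, not the project’s:

```javascript
// Map each pair of top-level categories (1 External World, 2 Mental
// World, 3 Social World) to the timeline colour described above.
var pairColours = {
  '1-1': 'green', '2-2': 'blue',   '3-3': 'red',
  '1-2': 'cyan',  '1-3': 'yellow', '2-3': 'magenta'
};

function connectionColour(catA, catB) {
  // order the pair so that 2→1 and 1→2 both resolve to the '1-2' colour
  var lo = Math.min(catA, catB), hi = Math.max(catA, catB);
  return pairColours[lo + '-' + hi];
}
```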

For my first attempt I didn’t round the dates to 50 year chunks and made the position of the dots on the vertical axis random, as you can see from the following screenshot:


I then made two further tests where the dates were rounded and the points were ‘stacked’, one with dots, and the other with rectangles, as you can see below:

[Screenshots: metaphor-timeline-test-2, metaphor-timeline-test-3]

Ellen preferred the dots to the rectangles so that’s what I focussed on.  The next stage was to make the diagram dynamic (i.e. working with any data based on the user’s selection) and to integrate it with the other data view options.  Ellen also informed me that there would be no 50 year chunks before 1150 and everything before this point should just be grouped into an Old English ‘bucket’.  I also realised that in order to deal with the datasets as used in the other data views I would have to be able to plot aggregated points as well as individual metaphorical connections.  This meant updating the data output files so that aggregated data (e.g. that there are connections from 1B03 Age and the level 2 category 2A Mental Capacity) also returned a start date (e.g. the earliest date when 1B03 had a connection to any level 3 category within 2A).
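The rounding rule itself is straightforward to sketch. A hypothetical version – the 'OE' label is my placeholder for however the real bucket is identified:

```javascript
// Round a year of metaphor inception into the timeline's buckets:
// everything before 1150 falls into a single Old English bucket,
// and later dates round down to 50-year chunks (e.g. 1387 → 1350).
function timelineBucket(year) {
  if (year < 1150) return 'OE';
  return Math.floor(year / 50) * 50;
}
```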

I decided that aggregated points should be displayed as rectangles on the visualisation.  Having two different shapes actually proved to be rather tricky to implement as previously the entire dataset was cycled through from within the ‘add circle’ function and the way the code was set up meant that adding a little ‘if’ statement to position a different shape instead was not particularly straightforward.  However, I managed to figure out a way to do it and by the end of the week the ‘Change View’ option for the timeline was operational for the ‘drilldown’ data, producing something like the following:


There is still a lot to be done, though.  For example, sometimes there is too much data for a column and it gets cut off the top of the visualisation; I need a way to manage this.  I also need to implement the key, facilities to open the metaphor card when a data point is clicked on, the download option and the top-level view.  I’ll continue with this next week.

Other than the above I also had a meeting with Marc this week, which was mainly an opportunity to talk about projects and my work levels and things like that.

Week beginning 5th January 2015

This was my first week back after my Christmas holiday.  The holiday was lovely but despite being two whole weeks it did seem to zip by rather too speedily.  This week I split my time primarily over three projects – Mapping Metaphor, Burns and the Scottish Corpus.  For Mapping Metaphor I returned to the task I had begun before Christmas but hadn’t quite managed to complete – the updating of all of the database queries in the site from dedicated MySQL functions to PDO style functions.  This was a rather tedious task, but I managed to get it completed and thoroughly tested this week.  Using PDO will make the site more future-proof and should also help boost security too.

The next task I completed this week was the Advanced Search functionality.  The Quick Search performs a search for categories, while the advanced search is intended to search for metaphor connections.  A document stating what search facilities were required had been prepared a while ago and I worked through this, implementing all of the desired search options.  I also implemented wildcard searching for both the quick and advanced searches.  This facility is the same as is offered on the Historical Thesaurus website, enabling a user to place an asterisk at the start and/or end of their search term in order to run a partial match search for the beginning, middle or end of words.  The advanced search options that are now available enable a user to search for category names or keywords, select one or more level 2 categories in which to search (with an option to state whether the search should be limited to connections purely within these categories or just involving one of the categories), and specify a date of metaphor inception, metaphor strength and directionality.  I think this covers all of the search options that the project requires, and will allow users to perform searches such as ‘show all of the bi-directional, strong metaphors that began in the 19th century’ or ‘show all strong metaphor connections within 1E Animals and 2D Emotion that began before 1200’.  The advanced search results are displayed using the visualisation interface, and as with all of the metaphor browse options, users can also then choose to display the results as a table or cards.  I think it’s working rather well at the moment, but I’m still awaiting feedback from the team.
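One common way to implement this style of wildcard search is to translate the user’s term into an SQL LIKE pattern on the server. A sketch of that translation – this is an illustration of the technique, not the actual Mapping Metaphor code:

```javascript
// Convert a user-facing wildcard term ('*' at start and/or end) into
// an SQL LIKE pattern, escaping LIKE's own wildcard characters first
// so that literal % and _ in the term don't match unexpectedly.
function wildcardToLike(term) {
  var escaped = term.replace(/[\\%_]/g, function (m) { return '\\' + m; });
  return escaped.replace(/\*/g, '%');
}
```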

After completing the advanced search I made another few tweaks to the site, for example I added in a print CSS file so the pages now print a bit more nicely.  I also made the interface ‘responsive’ – i.e. it resizes to fit different screen widths.  This works ok, but the visualisation itself still doesn’t resize.  I’m not sure I am going to make it resize, as squashing it down to fit on a small screen is going to impede the usability of the visualisation.  I think it might be preferable to just have the visualisation scroll off the user’s screen and make them move their view.  The only major tasks I have left to develop for the project now are the timeline view of the data and the content management system.

For Burns I began to work with the leaflet.js based historical maps of Burns’s tours that the NLS people had very kindly prepared for us last year.  Just before Christmas Pauline had sent me a document containing a few of the items that we will ‘pin’ to the map so I added these to see how they will look.  I’ve made two styles of ‘pop-up’ for the map – a default one that is used for shorter items and a bigger one that is used for longer items.  The latter features scrollbars so that long items can be read without the pop-up spilling out over the map.  It’s all looking promising but I need to get more content before I can proceed further.  Pauline was off sick this week so couldn’t get this to me, but hopefully all can be completed next week.  I also tweaked the timeline, removing the image bar at the top as this was taking up too much space and didn’t really add much to the look of the timeline (images of old pages tend to look the same as thumbnails – boring beige squares).  We hope to launch both the timeline and the interactive tour maps next week.

My final task of the week was to begin phase two of the redevelopment of the Scottish Corpus website.  This phase involves migrating the audio and video of the site away from using the proprietary Quicktime plugin and instead using standard HTML5 audio and video.  Currently audio and video files open up in a new browser window and the files only work if the user’s computer has the Quicktime plugin installed.  This is causing problems for some users, plus it makes accessing the files on tablets and smartphones rather tricky.  In migrating the files to HTML5 I had a few tasks to consider:

  1. Migrating the audio files to MP3
  2. Replacing the ‘new window’ with a jQuery UI in-page dialog box
  3. Adding in the HTML5 audio / video player
  4. Updating the site’s Javascript so that the synchronisation of audio and the highlighting of transcript sections continues to function.

I wasn’t sure how long these tasks might take, or even if I’d be able to complete them, but thankfully my experiments were rather successful.  I set up a test version of the ‘document’ page for one of the documents that had audio and a synchronised transcription and then added the necessary jQuery UI code to enable a dialog box to appear when the ‘play audio’ link is pressed.  Within this I added the HTML audio tag, linking to my newly migrated MP3 file.  I did have to do some fairly major reworking of the Javascript that synchronises the audio to the text, but I managed to get this working.  I’m hoping to be able to replace the old Quicktime version with the new HTML5 version next week, all being well.
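The synchronisation logic essentially boils down to mapping the player’s current time to a transcript segment on each ‘timeupdate’ event the audio element fires. A DOM-free sketch of the lookup – the segment structure here is my assumption, not the SCOTS data format:

```javascript
// Given transcript segments with start/end times in seconds, find the
// one that should be highlighted at the player's current position.
// In the page this would run inside the <audio> element's 'timeupdate'
// handler, adding a highlight class to the matching transcript span.
function activeSegment(segments, currentTime) {
  for (var i = 0; i < segments.length; i++) {
    if (currentTime >= segments[i].start && currentTime < segments[i].end) {
      return segments[i];
    }
  }
  return null; // playback position falls outside every segment
}
```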