Week Beginning 20th July 2015

It was a four-day week this week due to the Glasgow Fair holiday. I actually worked the Monday and took the Friday off instead, and this worked out quite well as it gave me a chance to continue development of the Scots Thesaurus before we had our team meeting on the Tuesday morning. I had previously circulated a ‘to do’ list that brought together all of the outstanding technical tasks for the project, with five items specifically to do with the management of thesaurus data via the WordPress interface. I’m happy to report that I managed to complete all of these items. This included adding facilities to enable words associated with a category to be deleted from the system (in actual fact the word records are simply marked as ‘inactive’ in the underlying database). This option makes it a lot easier for Magda to manage the category information. I also redeveloped the way sources and URLs are stored in the system. Previously each word could have one single source (either DOST or SND) and a single URL. I’ve updated this to enable a word to have any number of associated sources and URLs, and I’ve expanded the list of possible sources to include the paper Scots Thesaurus too. I could have allowed further source types to be added, but Susan thinks these three will be sufficient. Allowing multiple sources per word actually meant quite a lot of reworking of both the underlying database and the WordPress plugin I’m developing for the project, but it is all now working fine. I also updated the way connections to existing Historical Thesaurus of English categories are handled and added an option that allows a CSV file containing words to be uploaded to a category via the WordPress admin interface. This last update should prove very useful to the people working on the project, as it will enable them to compile lists of words in Excel and then upload them directly to a category in the online database.
On Tuesday we had a team meeting for the project and I gave a demonstration of these new features. Magda is going to start using the system and will let me know if anything needs updating.
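As a rough illustration of how the CSV upload hangs together, here is a Python sketch (the real facility is a PHP WordPress plugin, and the column names ‘word’, ‘source’ and ‘url’ are invented): each row becomes a word record attached to the chosen category, with a semicolon-separated source list and an ‘active’ flag standing in for the soft-delete behaviour.

```python
import csv
import io

def parse_word_upload(csv_text, category_id):
    """Turn an uploaded CSV of words into records for one thesaurus
    category. Column names ('word', 'source', 'url') are invented for
    illustration; the real spreadsheet layout may differ."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            'category_id': category_id,
            'word': row['word'].strip(),
            # A word can now cite several sources (DOST, SND, the paper
            # Scots Thesaurus); allow a semicolon-separated list here.
            'sources': [s.strip() for s in row.get('source', '').split(';') if s.strip()],
            'url': row.get('url', '').strip(),
            # 'Deleting' a word just flips this flag to False.
            'active': True,
        })
    return records

sample = "word,source,url\nbonnie,SND;DOST,http://example.com/bonnie\n"
records = parse_word_upload(sample, 42)
```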

I spent a small amount of time this week updating the Burns website to incorporate new features that launched on the anniversary of Burns’ death on the 21st. These are an audio play about Burns forgeries (http://burnsc21.glasgow.ac.uk/burns-forgery/) and an online exhibition about the illustrations to George Thomson’s collections of songs (http://burnsc21.glasgow.ac.uk/the-illustrations-to-george-thomsons-collections/).

I continued working on the SAMUELS project this week, again trying to figure out how to get the Bookworm system working on the test server that Chris has set up for me. The script that imports the congress data into Bookworm, which I left running last week, successfully completed this week. The amount of data generated for this textual resource is rather large, with one of the tables consisting of over 42 million rows and another taking up 22 million rows. I still need to figure out how this data is actually queried and used to power the Bookworm visualisations, and the next step was to get the Bookworm API installed and running. The API connects to the database and allows the visualisation to query it. It’s written in Python and I spent rather a lot of time just trying to get Python scripts to execute via Apache on the server. This involved setting up a cgi-bin and ensuring Apache knows where it is and has permission to execute scripts stored there. I spent a rather frustrating few hours getting nothing but 403 Forbidden errors before realising that you have to explicitly give Apache rights to the directory in the Apache configuration file as well as updating the file permissions. By the end of the week I still hadn’t managed to get Python files actually running – instead the browser just attempts to download them. I need to continue with this next week, hopefully with the help of Chris McGlashan, who was on holiday this week.
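For anyone wanting to replicate the setup, a minimal sanity-check script of this kind is useful for verifying that Apache is executing Python at all. The paths and directives in the comments are illustrative (Apache 2.4 syntax), not the actual server configuration:

```python
#!/usr/bin/env python
# Minimal CGI sanity-check script. It assumes a cgi-bin that Apache has
# been told about, e.g. (illustrative paths, Apache 2.4 syntax):
#
#   ScriptAlias /cgi-bin/ /var/www/cgi-bin/
#   <Directory "/var/www/cgi-bin">
#       Options +ExecCGI
#       Require all granted
#   </Directory>
#
# and that the file itself is executable (chmod 755). Without the
# <Directory> block Apache answers 403 Forbidden even when the file
# permissions look right.

def build_response(body):
    """Assemble a bare CGI response: header block, blank line, body."""
    return "Content-Type: text/plain\r\n\r\n" + body

if __name__ == "__main__":
    import sys
    sys.stdout.write(build_response("Python CGI is working\n"))
```

If the browser downloads the file instead of showing the text, Apache is serving the script rather than executing it, which points at the handler or ExecCGI configuration rather than file permissions.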

I spent the majority of the rest of the week working on the Old English version of the Metaphor Map, which we are intending to launch at the ISAS conference. This is a version of the Metaphor Map that features purely Old English related data and will sit alongside the main Mapping Metaphor website as a stand-alone interface. Here’s a summary of what I managed to complete this week:

  1. I’ve uploaded the OE stage 5 and stage 4 data to new OE-specific tables
  2. I identified some rows that included categories that no longer exist and following feedback from Ellen I deleted these (I think there were only 3 in total)
  3. I’ve replicated the existing site structure at the new OE URL and I’ve updated how the text for the ancillary pages is stored: it’s all now in one single PHP file which is then referenced by both the main and the OE ancillary pages. I’ve also put a check in all of the OE pages to see if OE-specific text has been supplied, and if so this is used instead of the main text. This should make it easier to manage all of the text.
  4. I’ve created a new purple colour scheme for the OE site, plus a new purple ‘M’ favicon (unfortunately it isn’t exactly the same as the green one so I might update this)
  5. I’ve expanded the top bar to incorporate tabs for switching from the OE map to the main one. These are currently positioned to the left of the bar in a similar way to how the Scots Corpus links to CMSW and back work.
  6. The visualisation / table / card views are all now working with the OE data. Timeline has been removed as this is no longer applicable (all metaphors are OE with no specific date).
  7. Search and browse are also now working with the OE data.
  8. All reference to first dates and first lexemes has been removed, e.g. from metaphor cards, columns in the tabular view, the search options
  9. The metaphor card heading now says ‘OE Metaphor’ and then a number, just in case people notice the same number is used for different connections in the OE / non-OE sites.
  10. The text ‘(from OE to present day)’ has been added to the lexeme info in the metaphor cards.
  11. Where a metaphorical connection between two categories also exists in the non-OE data, a link is added to the bottom of the metaphor card with the text ‘View this connection in the main metaphor map’. Pressing this opens the non-OE map in a new tab with the visualisation showing category 1 and the connection to category 2 highlighted. The check for the existence of the connection in the non-OE data ignores strength and presents the non-OE map with both strong and weak visible. This is so that if (for example) the OE connection is weak but the main connection is strong you can still jump from one to the other.
  12. I’ve updated the category database to add a new column ‘OE categories completed’. The OE categories completed page will list all categories where this is set to ‘y’ (none currently)
  13. I’ve created staff pages to allow OE data to be managed by project staff.
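The ancillary text fallback in point 3 boils down to a simple ‘use the OE version if one exists’ check. Here is a Python sketch with invented page keys and wording (the real site does this in PHP):

```python
# Invented page keys and wording; the real check lives in PHP.
MAIN_TEXT = {
    'about': 'About the Metaphor Map of English...',
    'how-to-use': 'How to use the Metaphor Map...',
}

# The OE pages only override entries that need OE-specific wording.
OE_TEXT = {
    'about': 'About the Metaphor Map of Old English...',
}

def page_text(page, oe=False):
    """Return OE-specific text if it has been supplied, otherwise fall
    back to the main site's text for the same page."""
    if oe and page in OE_TEXT:
        return OE_TEXT[page]
    return MAIN_TEXT[page]
```

The advantage is that shared wording only ever needs updating in one place, and an OE page silently picks up the main text until someone supplies an override.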

Next week I’ll receive some further data to upload and after that we should be pretty much ready to launch.

Week Beginning 13th July 2015

Last week I returned to the development of the Essentials of Old English app and website, which we’re hoping to make available before the ISAS conference at the start of August. Christian had previously sent me a list of things to change or fix and I managed to get through all but one of the items. The item that I still needed to tackle was a major reordering of the exercises, including the creation of new sections and moving exercises all over the place. I had feared that this would be a tricky task as the loading of the next and previous exercise is handled programmatically via JavaScript, and I had thought I’d set this up so the script took the current ID and added or subtracted 1 to create the ‘next’ and ‘previous’ links. This would have meant real trouble if the order of the exercises was changed. Thankfully when I looked at the data I realised I’d structured things better than I’d remembered and had actually included ‘next’ and ‘previous’ fields within each exercise record, which contained the IDs of whichever exercise should come before or after the current one. So all I had to do was switch everything around and then update these fields to ensure that the correct exercises loaded. There was a little more to it than that due to sections changing and exercise titles changing, and it was a time-consuming process, but it wasn’t too tricky to sort out. The new structure makes a lot more sense than the old one so it’s definitely been worthwhile making the changes.
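The structure that saved me is essentially a doubly linked list: each exercise record carries the IDs of its neighbours, so reordering only means updating those fields rather than renumbering anything. A small Python sketch with made-up exercise data:

```python
# Made-up exercise records: the numeric IDs no longer dictate the
# order, because each record stores the IDs of its neighbours.
exercises = {
    1: {'title': 'Nouns 1', 'prev': None, 'next': 3},
    3: {'title': 'Nouns 2', 'prev': 1, 'next': 2},
    2: {'title': 'Verbs 1', 'prev': 3, 'next': None},
}

def walk(exercises, start):
    """Follow the 'next' links from a starting exercise, returning the
    titles in presentation order regardless of the numeric IDs."""
    order, current = [], start
    while current is not None:
        order.append(exercises[current]['title'])
        current = exercises[current]['next']
    return order
```

Here exercise 2 comes last even though its ID is lower than 3, which is exactly the property that made the reordering manageable.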

After making all of the required updates to the pages the next step was to actually create an app from them. It’s been a while since I last made an app and in that time there has been a new version of Apache Cordova (the very handy wrapping tool that generates apps from HTML, JavaScript and CSS files) so I had to spend some time upgrading the software and all of the platform-specific tools as well, such as Xcode for iOS and the Android developer tools for Android. Once this was complete I managed to get versions of the app working for iOS and Android and I tested these out both using emulators and on actual hardware. I had to rebuild the code a few times before everything was exactly as I wanted it, and I had to include some platform-specific CSS styles, for example to ensure the app header didn’t obscure the iOS status bar. It also took a horribly long time to create all of the icons and splash screens that are required for iOS and Android, and then a horribly long time to create the store pages via iTunes Connect and the Google Play developer interface. And then I needed to generate seemingly thousands of screenshots at different screen sizes from phones to mini tablets to full-size tablets. And then I had to go through the certification process for both platforms.

As usual, when it came to verifying the iOS app using distribution certificates and provisioning profiles I got a bunch of errors in Xcode. It took ages to get to the bottom of these (it was because the distribution certificate for the University of Glasgow account had been generated by someone in MVLS in May and downloaded onto her Mac, and I didn’t have a copy of it; you can’t just re-download the certificate – you have to make a new one, associate it with the provisioning profile, download and install the certificates on your Mac, and then close and reopen Xcode for the changes to be registered. Ugh). However, I did finally manage to get the app uploaded to iTunes Connect and I have now submitted it for review. Hopefully it will be approved within the next two weeks.

The process of signing the Android version of the app was less arduous but still took a fair amount of time to finally get right. I must remember to follow these instructions next time: http://developer.android.com/tools/publishing/app-signing.html#signing-manually (although it seems as if help pages relating to app development stop working after a few months when new OS versions get released).

Now I’ve completed the process of creating a STELLA app for Android I really need to get around to updating the three existing apps for iOS and creating Android versions of them too. It shouldn’t be too difficult to do so, but it will take some time and I’m afraid I just don’t have the time at the moment due to all of the other work I need to do.

One of the other major outstanding projects I’m currently working on is the SAMUELS project, specifically trying to get some visualisations of the semantically tagged Hansard data working. We are using the Bookworm interface for this (see http://bookworm.culturomics.org/) and I’ve been trying to get the code for this working with our data for a while now. It turned out that I needed access to a Linux box to get the software installed and last week Chris McGlashan helpfully said he’d be able to set up a test server for me. On Monday this week he popped by with the box, which is now plugged in and working away in my office. After some initial problems getting through the University proxy I managed to download a handy script that installs all of the components that Bookworm requires (see https://github.com/Bookworm-project/FreshInstallationScript). I then downloaded the Congress data that is used as test data in the documentation (see https://github.com/econpy/congress_api) and followed the steps required to set this up. There were a couple of problems caused by my test server not having some required zip software installed, but after getting over this I had access to the data. The script that then imports this data into Bookworm is currently running, so I will need to wait and see how that works out before I can proceed further.

Another ongoing project is the Thesaurus of Old English. I’d managed to complete a first version of the new website for this project last week, and since then I’d received feedback from a couple of people, so I updated the interface and functionality as follows:

  1. I’ve added some explanatory text to the ‘quick search’ so it’s clearer that you can search for either Old English words or Modern English category headings using the facility. I’ve also included a sentence about wildcards too. This appears both on the homepage and the ‘quick search’ tab of the search page.
  2. I’ve updated the labels in the ‘advanced search’ tab so ‘word’ now says ‘Old English word’ and ‘category’ now says ‘Modern English words in Category Heading’ to make things clearer.
  3. I’ve updated the way the search queries for category headings when asterisks are not supplied, both in the ‘quick search’ and in the ‘category’ box of the ‘advanced search’. Now if a user enters a word (e.g. ‘love’) the script searches for all categories where this full word appears – either as the full category name (‘love’) or beginning the category name (‘love from the heart’), the end of the category name (‘great love’) or somewhere in the middle (no example in ‘love’ but e.g. search for ‘fall’: ‘Shower, fall of rain’). It doesn’t bring back any categories where the string is part of a longer word (e.g. ‘lovely, beautiful, fair’ is not found). If the user supplies asterisks at the beginning and/or end of the search term then the search just matches the characters as before.
  4. I’ve also updated the appearance of the flags that appear next to words. These are now smaller and less bold and I’ve also added in tool-tip help for when a user hovers over them.
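The whole-word matching described in point 3 can be sketched with a regular expression. This is an illustrative Python version with an invented function name, not the actual site code (which does the work in PHP and MySQL):

```python
import re

def heading_matches(term, heading):
    """Match a search term against a category heading. Without
    asterisks the term must appear as a whole word anywhere in the
    heading; with asterisks the term is treated as a simple wildcard
    pattern over the whole heading. Name and shape are illustrative."""
    if '*' in term:
        # Translate each '*' into '.*' and anchor the whole heading.
        pattern = '^' + '.*'.join(re.escape(p) for p in term.split('*')) + '$'
        return re.search(pattern, heading, re.IGNORECASE) is not None
    # Whole-word: 'love' finds 'love from the heart' but not 'lovely'.
    return re.search(r'\b' + re.escape(term) + r'\b', heading,
                     re.IGNORECASE) is not None
```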

What with all of the above going on I didn’t get a chance to do any further work on the Scots Thesaurus or the Sci-Fi Med Hums database, but hopefully I’ll be able to work on these next week. Having said that, I’ve also now received the first set of data for the Old English Metaphor Map, and this also has to ‘go live’ before ISAS at the start of August, so I may have to divert a lot of time to this over the next couple of weeks.



Week Beginning 6th July 2015

I continued working on the new website for the Thesaurus of Old English (TOE) this week, which took up a couple of days in total. Whilst working on the front end I noticed that the structure of TOE is different to that of the Historical Thesaurus of English (HTE) in an important way: There are never any categories with the same number but a different part of speech. With HTE the user can jump ‘sideways’ in the hierarchy from one part of speech to another and then browse up and down the hierarchy for that part of speech, but with TOE there is no ‘sideways’ – for example if there is an adjective category that could be seen as related to a noun category at the same level these categories are given different numbers. This difference meant that plugging the TOE data into the functions I’d created for the HTE website just didn’t work very well as there were just too many holes in the hierarchy when part of speech was taken into consideration.

The solution to the problem was to update the code to ignore part of speech. I checked that there were indeed no main categories with the same number but a different part of speech (a little script I wrote confirmed this to be the case) and then updated all of the functions that generated the hierarchy, the subcategories and other search and browse features to ignore part of speech, but instead to place the part of speech beside the category heading wherever category headings appear (e.g. in the ‘browse down’ section or the list of subcategories). This approach seems to have worked out rather well and the thesaurus hierarchy is now considerably more traversable.
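The little check script boiled down to grouping category numbers and flagging any number that appears with more than one part of speech. A Python sketch with invented sample data (the real check ran against the database):

```python
from collections import defaultdict

def duplicate_numbers(categories):
    """Given (category number, part of speech) pairs, return the set
    of numbers that appear with more than one part of speech."""
    pos_by_number = defaultdict(set)
    for number, pos in categories:
        pos_by_number[number].add(pos)
    return {n for n, ps in pos_by_number.items() if len(ps) > 1}

# Invented sample data: TOE-style numbering never reuses a number
# across parts of speech, whereas HTE-style numbering does.
toe_style = [('01.01', 'n'), ('01.02', 'aj'), ('01.03', 'v')]
hte_style = [('01.01', 'n'), ('01.01', 'aj')]
```

An empty result set for the real data is what confirmed it was safe to ignore part of speech when building the hierarchy.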

I managed to complete a first version of the new website for TOE, with all required functionality in place. This includes both quick and advanced searches, category selection, the view category page and some placeholder ancillary pages. At Fraser’s request I also added in the facility to search with vowel length marks. This required creating another column in the ‘lexeme search words’ table with a stricter collation setting that ensures a search involving a length mark (e.g. ‘sǣd’) only finds words that feature the length mark (e.g. ‘sæd’ would not be found). I added an option to the advanced search field allowing the user to say whether they cared about length marks or not. The default is not, but I’m sure a certain kind of individual will be very keen on searching with length marks. If this option is selected the ‘special characters’ buttons expand to include all of the vowels with length marks, thus enabling the user to construct the required form. It will be useful for people who want to find out (for example) all of the words in the thesaurus that end in ‘*ēn’ (41) as opposed to all of those words that end in ‘*en’ disregarding length marks (1546).
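The database handles this with two differently collated columns, but the effect can be sketched in Python (function names and examples are my own): the loose match folds length marks away before comparing, while the strict match compares the strings exactly as typed.

```python
import unicodedata

def strip_length_marks(word):
    """Fold macron-marked vowels down to their plain forms: decompose,
    drop combining macrons (U+0304), then recompose."""
    decomposed = unicodedata.normalize('NFD', word)
    return unicodedata.normalize(
        'NFC', ''.join(ch for ch in decomposed if ch != '\u0304'))

def matches(search_term, word, respect_length_marks):
    """Loose matching ignores length marks; strict matching compares
    the strings exactly as typed, so 'sǣd' finds only 'sǣd'."""
    if respect_length_marks:
        return search_term == word
    return strip_length_marks(search_term) == strip_length_marks(word)
```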

I think we’re well on track to have the new TOE launched before the ISAS conference at the beginning of next month, which is great.

I would also like to have the new ‘Essentials of Old English’ app and website available before the conference as well, but I haven’t had a chance to act on Christian’s extensive feedback since I got back from my holiday a number of weeks ago. I’m happy to say that I managed to find a little bit of time to work on the updates this week, and have so far managed to implement most of the required updates. The only thing left to update is the reordering of the ‘plus’ exercises, which I’m afraid is going to cause me some serious problems due to the way the JavaScript that processes the exercises works. I’ll try and get this sorted next week though. I also need to ‘wrap’ the code into an app, test it on iOS and Android devices, create store records for the apps, generate screenshots and icons and all those other tedious tasks associated with actually getting an app published. I’m hoping to get all of this done next week but I’ll just need to see if anything else crops up that eats into my time. I did finally manage to set up a Google Play account for the University this week, thanks to James Matthew and his purchasing credit card. I should now be able to use this to create and publish Android versions of all of the STELLA apps. It will also be useful for the Mapping Metaphor app too.

I spent some further time this week on Gavin Miller’s Science Fiction and the Medical Humanities project, continuing to work on the WordPress plugin I’ve been developing for creating and managing a bibliography. Last week I created the form that will be used to upload bibliographical items, while this week I created the code that handles the actual upload and editing of the items, and also the tabular view of items in the WordPress admin interface. It seems to be working pretty well, although facilities to handle error checking for custom post types seem to be a bit limited. I found a way to do this on this Stackoverflow page: http://stackoverflow.com/questions/13216633/field-validation-and-displaying-error-in-wordpress-custom-post-type, although I had to update it to store the actual error messages relating to individual fields as a session variable in order to display these messages after the form has submitted. It’s still not ideal as the pesky error messages are being displayed as soon as I open an item to edit, even before I’ve submitted any updates. I might have to resort to JavaScript error messages, which might not be so bad as it’s only project staff who will be uploading the data.

I continued working on the Scots Thesaurus project this week as well. I met with Susan and Magda on Tuesday to talk them through using the WordPress plugin I’d created for managing thesaurus categories and lexemes. Before this meeting I ran through the plugin a few times myself and noted a number of things that needed updating or improving so I spent some time sorting those things out. The meeting itself went well and I think both Susan and Magda are now familiar enough with the interface to use it. I created a ‘to do’ list containing outstanding technical tasks for the project and I’ll need to work through all of these. For example, a big thing to add will be facilities to enable staff to upload lexemes to a category through the WordPress interface via a spreadsheet. This will really help to populate the thesaurus.

So, that’s four major areas of work that I covered this week. I also met with Craig Lamont again to try once more to get a historical map working within the University’s T4 system. A helpful T4 expert in the University had increased the maximum number of characters a section can contain, which allowed us to upload all of the necessary JavaScript to power the map (e.g. the leaflet.js library). After that we had a working map (of sorts) in the preview pane! However, none of the markers were appearing and the formatting was completely off. Thankfully, with a further bit of tweaking I managed to fix the formatting and get the markers to appear, so all is looking good!

I also spent a little time contributing to a Leverhulme bid application for Carole Hough and did a tiny amount of DSL work as well. I’m still no further with the Hansard visualisations though. Arts Support are going to supply me with a test server on which I should be able to install Bookworm, but I’m not sure when this is going to happen yet. I’ll chase this up on Monday.

Week Beginning 29th June 2015

I returned to work on Monday this week after being off sick on Thursday and Friday last week. It has been yet another busy week, the highlight of which was undoubtedly the launch of the Mapping Metaphor website. After many long but enjoyable months working on the project it is really wonderful to finally be able to link to the site. So here it is: http://www.glasgow.ac.uk/metaphor

I moved the site to its ‘live’ location on Monday and made a lot of last minute tweaks to the content over the course of the day, with everything done and dusted before the press release went out at midnight. We’ve had lots of great feedback about the site. There was a really great article on the Guardian website (which can currently be found here: http://www.theguardian.com/books/2015/jun/30/metaphor-map-charts-the-images-that-structure-our-thinking) plus we made the front page of the Herald. A couple of (thankfully minor) bugs were spotted after the launch but I managed to get those sorted on Wednesday. It’s been a very successful launch and it has been a wonderful project to have been a part of. I’m really pleased with how everything has turned out.

Other than Mapping Metaphor duties I split my time across a number of different projects. I continued working with the Thesaurus of Old English data and managed to get everything that I needed to do to the data completed. This included writing and executing a nice little script that added in the required UTF-8 length markers over vowels. Previously the data used an underscore after the vowel to note that it was a long one but with UTF-8 we can use proper length marks, so my script found words like ‘sæ_d’ and converted them all to words like ‘sǣd’. Much nicer.
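The conversion boils down to replacing each ‘character plus underscore’ pair with the character plus a combining macron (U+0304) and then normalising, which recomposes precomposed characters like ‘ǣ’. A Python sketch of the idea (the function name is my own):

```python
import re
import unicodedata

def add_length_marks(word):
    """Replace each 'character plus underscore' pair with the
    character plus a combining macron, then recompose to the
    precomposed form: 'sæ_d' becomes 'sǣd'."""
    marked = re.sub(r'(.)_', '\\1\u0304', word)
    return unicodedata.normalize('NFC', marked)
```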

I wrote and executed another script that added in all of the category cross references, and another one that checked all of the words with a ‘ge’ prefix. My final data processing script generated the search terms for the words, for example it identified word forms with brackets such as ‘eorþe(r)n’ and then generated multiple variant search words, in this case two – ‘eorþen’ and ‘eorþern’. This has resulted in a total of 57,067 search terms for the 51,470 words we have in the database.
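In outline the bracket expansion works like this (a Python sketch, not the actual script): split the form on bracketed groups, let each group be either present or absent, and take the product of the choices.

```python
import itertools
import re

def search_variants(form):
    """Expand bracketed optional letters into all variant spellings:
    'eorþe(r)n' yields ['eorþen', 'eorþern']."""
    parts = re.split(r'\(([^)]*)\)', form)
    # Even-indexed parts are fixed text; odd-indexed parts are the
    # bracketed groups, each of which may be omitted or included.
    choices = [[p] if i % 2 == 0 else ['', p]
               for i, p in enumerate(parts)]
    return [''.join(combo) for combo in itertools.product(*choices)]
```

Forms with several bracketed groups multiply combinatorially, which is why 51,470 words yield 57,067 search terms.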

Once I’d completed work on the data, I spent a little bit of time on the front end for the new Thesaurus of Old English website. This is going to be structurally the same as the Historical Thesaurus of English website, just with a different colour scheme and logos. I created three different colour scheme mockups and have sent these to Fraser and Marc for consideration, plus I got the homepage working for the new site (still to be kept under wraps for now). This homepage has a working ‘random category’ feature, which shows that the underlying data is working very nicely. Next week I’ll continue with the site and hopefully will get the search and browse facilities completed.

I also returned to working on the Scots Thesaurus project this week. I spent about a day on a number of tasks, including separating out the data that originated in the paper Scots Thesaurus from the data that has been gathered directly from the DSL. I also finally got round to amalgamating the ‘tools’ database with the ‘WordPress’ database. When I began working for the project I created a tool that enables researchers to bring together the data for the Historical Thesaurus of English and the data from the Dictionary of the Scots Language in order to populate Scots Thesaurus categories with words uncovered from the DSL. Following on from this I made a WordPress plugin through which thesaurus data could be managed and published. But until this week the two systems were using separate databases, which required me to periodically migrate the data from the tools database to the WordPress one by hand. I have now brought the two systems together, so it should now be possible to edit categories through the WordPress admin interface and for these updates to be reflected in the ‘tools’ interface. Similarly, any words added to categories through the ‘tools’ interface now automatically appear through the WordPress interface. I still need to fully integrate the ‘tools’ functionality with WordPress so we can get rid of the ‘tools’ system altogether, but it is much better having a unified database, even if there are still two interfaces on top of it. Other than these updates I also made a few tweaks to the public facing Scots Thesaurus website – adding in logos and things like that.

I also spent some time this week working on another WordPress plugin – this time for Gavin Miller’s Science Fiction and the Medical Humanities project. I’m creating the management scripts to allow him and his researchers to assemble a bibliographic database of materials relating to both Science Fiction and the Medical Humanities. I’ve got the underlying database created and the upload form completed. Next week I’ll get the upload form to actually upload its data. One handy thing I figured out whilst developing this plugin is how you can have multiple text areas that have the nice ‘WYSIWYG’ tools above them to enable people to add in formatted text. After lots of hunting around it turned out to be remarkably simple to incorporate, as this page explains: http://codex.wordpress.org/Function_Reference/wp_editor

The ‘scifimedhums’ website itself went live this week, so I can link to it here: http://scifimedhums.glasgow.ac.uk/

I was intending to continue with the Hansard data work this week as well. I had moved my 2 years of sample data (some 13 million rows) to my work PC and was all ready to get Bookworm up and running when I happened to notice that the software will not run on Windows (“Windows is out of the question” says the documentation). I contacted IT support to see if I could get command-line access to a server to get things working but I’m still waiting to see what they might be able to offer me.

I also managed to squeeze in a couple of meetings this week, both of which were ones that I needed to reschedule after being ill last week. The first was with Pauline Mackay to discuss an upcoming Burns project that I’m going to help write the technical components of. I can’t really go into further details about this for the time being. My second meeting was with Craig Lamont to go through the historical map he is creating for Murray Pittock’s Ramsay project. We tried again to get my Leaflet-based sample map to work within the University’s T4 system, and although we made a bit of progress (we managed to get the CSS files uploaded as content blocks that could then be referenced by our page) we hit a brick wall when it came to the JavaScript files, which were too large to be accepted as content blocks. We even tried uploading the files to the T4 media library, which appeared to work, but we just couldn’t figure out how to reference the uploaded files. Craig is going to contact the T4 people for advice. One thing we did manage to do was to get different coloured icons working with the map. Craig should now be able to create all of the required data and we can return to hosting issues and further functionality later.