It was a four-day week this week due to the Glasgow Fair holiday. I actually worked the Monday and took the Friday off instead, and this worked out quite well as it gave me a chance to continue development of the Scots Thesaurus before we had our team meeting on the Tuesday morning. I had previously circulated a ‘to do’ list that brought together all of the outstanding technical tasks for the project, with 5 items specifically to do with the management of thesaurus data via the WordPress interface. I’m happy to report that I managed to complete all of these items. This included adding facilities to enable words associated with a category to be deleted from the system (in actual fact the word records are simply marked as ‘inactive’ in the underlying database). This option makes it a lot easier for Magda to manage the category information. I also redeveloped the way sources and URLs are stored in the system. Previously each word could have one single source (either DOST or SND) and a single URL. I’ve updated this to enable a word to have any number of associated sources and URLs, and I’ve expanded the possible source list to include the paper Scots Thesaurus too. I could have updated the system to allow any number of possible sources but Susan thinks these three will be sufficient. Allowing multiple sources per word actually meant quite a lot of reworking of both the underlying database and the WordPress plugin I’m developing for the project, but it is all now working fine. I also updated the way connections to existing Historical Thesaurus of English categories are handled and added in an option that allows a CSV file containing words to be uploaded to a category via the WordPress admin interface. This last update should prove very useful to the people working on the project as it will enable them to compile lists of words in Excel and then upload them directly to a category in the online database.
On Tuesday we had a team meeting for the project and I gave a demonstration of these new features. Magda is going to start using the system and will let me know if anything needs updating.
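As a rough illustration of the CSV upload and the multiple sources per word described above, here is a minimal Python sketch. The real plugin is written for WordPress, so this is not its code. The column names (`word`, `sources`), the `SOURCE|URL` pair format and the example.org URLs are all assumptions made for the example.

```python
import csv
import io


def parse_word_csv(csv_text, category_id):
    """Turn uploaded CSV text into word records for one category.

    Assumed (hypothetical) layout: a 'word' column and a 'sources' column
    holding semicolon-separated 'SOURCE|URL' pairs.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        word = (row.get("word") or "").strip()
        if not word:
            continue  # ignore rows that have no word
        sources = []
        for item in (row.get("sources") or "").split(";"):
            item = item.strip()
            if not item:
                continue
            name, _, url = item.partition("|")
            sources.append({"source": name.strip(), "url": url.strip()})
        records.append({
            "category_id": category_id,
            "word": word,
            "active": True,  # 'deleting' a word would flip this to False
            "sources": sources,
        })
    return records


# Hypothetical upload: two words, the first with two sources.
sample = (
    "word,sources\n"
    "braw,SND|http://example.org/snd/braw;DOST|http://example.org/dost/braw\n"
    "bonnie,SND|http://example.org/snd/bonnie\n"
)
for record in parse_word_csv(sample, category_id=12):
    print(record["word"], [s["source"] for s in record["sources"]])
```

Keeping the sources as a list on each word mirrors the move from one source and one URL per word to any number of them.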
I spent a small amount of time this week updating the Burns website to incorporate new features that launched on the anniversary of Burns’ death on the 21st. These are an audio play about Burns forgeries (http://burnsc21.glasgow.ac.uk/burns-forgery/) and an online exhibition about the illustrations to George Thomson’s collections of songs (http://burnsc21.glasgow.ac.uk/the-illustrations-to-george-thomsons-collections/).
I continued working on the SAMUELS project this week, again trying to figure out how to get the Bookworm system working on the test server that Chris has set up for me. The script that imports the Congress data into Bookworm that I left running last week successfully completed this week. The amount of data generated for this textual resource is rather large, with one of the tables consisting of over 42 million rows and another one taking up 22 million rows. I still need to figure out how this data is actually queried and used to power the Bookworm visualisations, and the next step was to get the Bookworm API installed and running. The API connects to the database and allows the visualisation to query it. It’s written in Python and I spent rather a lot of time just trying to get Python scripts to execute via Apache on the server. This involved setting up a cgi-bin and ensuring that Apache knows where it is and has permission to execute scripts stored there. I spent a rather frustrating few hours getting nothing but 403 Forbidden errors before realising that you have to explicitly give Apache rights to the directory in the Apache configuration file as well as updating file permissions. By the end of the week I still hadn’t managed to get Python files actually running – instead the browser just attempts to download the files. I need to continue with this next week, hopefully with the help of Chris McGlashan who was on holiday this week.
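For reference, a minimal Python CGI script of the kind that could be used to test such a setup is sketched below. The Apache directives in the comments are standard ones, but whether they apply to this particular server is an assumption. A browser offering a script for download usually suggests the server is treating it as a static file rather than running it, though that has not been confirmed as the cause here.

```python
#!/usr/bin/env python3
# Minimal CGI test script (a sketch, not part of Bookworm).
#
# Assumed Apache 2.4 setup for the directory holding this file:
#   Options +ExecCGI            (allow scripts to be executed)
#   AddHandler cgi-script .py   (treat .py files as CGI scripts)
#   Require all granted         (avoids 403 Forbidden on the directory)
# A CGI module (mod_cgi or mod_cgid) must also be enabled and the file
# itself must be executable (e.g. chmod 755).
import sys


def build_response(body):
    """Return a complete CGI response: header, blank line, then body."""
    return "Content-Type: text/plain; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response("Python is executing via CGI\n"))
```

If the script runs, the browser shows the single line of text. If the source code is displayed or downloaded instead, the handler is the first thing to check.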
I spent the majority of the rest of the week working on the Old English version of the Metaphor Map, which we are intending to launch at the ISAS conference. This is a version of the Metaphor Map that features purely Old English related data and will sit alongside the main Mapping Metaphor website as a stand-alone interface. Here’s a summary of what I managed to complete this week:
- I’ve uploaded OE stage 5 and stage 4 data to new OE specific tables
- I identified some rows that included categories that no longer exist and following feedback from Ellen I deleted these (I think there were only 3 in total)
- I’ve replicated the existing site structure at the new OE URL and I’ve updated how the text for the ancillary pages is stored: It’s all now stored in one single PHP file which is then referenced by both the main and the OE ancillary pages. I’ve also put a check in all of the OE pages to see if OE specific text has been supplied and if so this is used instead of the main text. This should make it easier to manage all of the text.
- I’ve created a new purple colour scheme for the OE site, plus a new purple ‘M’ favicon (unfortunately it isn’t exactly the same as the green one so I might update this)
- I’ve expanded the top bar to incorporate tabs for switching from the OE map to the main one. These are currently positioned to the left of the bar in a similar way to how the Scots Corpus links to CMSW and back work.
- The visualisation / table / card views are all now working with the OE data. Timeline has been removed as this is no longer applicable (all metaphors are OE with no specific date).
- Search and browse are also now working with the OE data.
- All references to first dates and first lexemes have been removed, e.g. from metaphor cards, columns in the tabular view and the search options.
- The metaphor card heading now says ‘OE Metaphor’ and then a number, just in case people notice the same number is used for different connections in the OE / non-OE sites.
- The text ‘(from OE to present day)’ has been added to the lexeme info in the metaphor cards.
- Where a metaphorical connection between two categories also exists in the non-OE data, a link is added to the bottom of the metaphor card with text ‘View this connection in the main metaphor map’. Pressing on this opens the non-OE map in a new tab with the visualisation showing category 1 and the connection to category 2 highlighted. The check for the existence of the connection in the non-OE data ignores strength and presents the non-OE map with both strong and weak visible. This is so that if (for example) the OE connection is weak but the main connection is strong you can still jump from one to the other.
- I’ve updated the category database to add a new column ‘OE categories completed’. The OE categories completed page will list all categories where this is set to ‘y’ (none currently)
- I’ve created staff pages to allow OE data to be managed by project staff.
Next week I’ll receive some further data to upload and after that we should be pretty much ready to launch.
As usual, when it came to verifying the iOS app using distribution certificates and provisioning profiles I got a bunch of errors in Xcode. It took ages to get to the bottom of these. The distribution certificate that existed for the University of Glasgow account had been generated by someone in MVLS in May and downloaded onto her Mac, and I didn’t have a copy of it. You can’t just re-download the certificate: you have to make a new one, associate it with the provisioning profile, download and install the certificates on your Mac and then close and reopen Xcode for the changes to be registered. Ugh. However, I did finally manage to get the app uploaded to iTunes Connect and I have now submitted it for review. Hopefully it will be approved within the next two weeks.
The process of signing the Android version of the app was less arduous but still took a fair amount of time to finally get right. I must remember to follow these instructions next time: http://developer.android.com/tools/publishing/app-signing.html#signing-manually (although it seems as if help pages relating to app development stop working after a few months when new OS versions get released).
Now I’ve completed the process of creating a STELLA app for Android I really need to get around to updating the three existing apps for iOS and creating Android versions of them too. It shouldn’t be too difficult to do so, but it will take some time and I’m afraid I just don’t have the time at the moment due to all of the other work I need to do.
One of the other major outstanding projects I’m currently working on is the Samuels project, specifically trying to get some visualisations of the semantically tagged Hansard data working. We are using the Bookworm interface for this (see http://bookworm.culturomics.org/) and I’ve been trying to get the code for this working with our data for a while now. It turned out that I needed access to a Linux box to get the software installed and last week Chris McGlashan helpfully said he’d be able to set up a test server for me. On Monday this week he popped by with the box, which is now plugged in and working away in my office. After some initial problems getting through the University proxy I managed to download a handy script that installs all of the components that Bookworm requires (see https://github.com/Bookworm-project/FreshInstallationScript). I then downloaded the Congress data that is used as test data in the documentation (see https://github.com/econpy/congress_api) and followed the steps required to set this up. There were a couple of problems with this that were caused by my test server not having some required zip software installed, but after getting over this I had access to the data. The script that then imports this data into Bookworm is currently running, so I will need to wait and see how that works out before I can proceed further.
Another ongoing project is the Thesaurus of Old English. I’d managed to complete a first version of the new website for this project last week and I’d received some feedback from a couple of people since then and I updated the interface and functionality as a result of this as follows:
- I’ve added some explanatory text to the ‘quick search’ so it’s clearer that you can search for either Old English words or Modern English category headings using the facility. I’ve also included a sentence about wildcards too. This appears both on the homepage and the ‘quick search’ tab of the search page.
- I’ve updated the labels in the ‘advanced search’ tab so ‘word’ now says ‘Old English word’ and ‘category’ now says ‘Modern English words in Category Heading’ to make things clearer.
- I’ve updated the way the search queries for category headings when asterisks are not supplied, both in the ‘quick search’ and in the ‘category’ box of the ‘advanced search’. Now if a user enters a word (e.g. ‘love’) the script searches for all categories where this full word appears – either as the full category name (‘love’) or beginning the category name (‘love from the heart’), the end of the category name (‘great love’) or somewhere in the middle (no example in ‘love’ but e.g. search for ‘fall’: ‘Shower, fall of rain’). It doesn’t bring back any categories where the string is part of a longer word (e.g. ‘lovely, beautiful, fair’ is not found). If the user supplies asterisks at the beginning and/or end of the search term then the search just matches the characters as before.
- I’ve also updated the appearance of the flags that appear next to words. These are now smaller and less bold and I’ve also added in tool-tip help for when a user hovers over them.
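The whole-word matching for category headings described in the list above can be sketched in Python as follows. This is an illustration using regular expressions, not the site’s actual search code, and the way asterisks are interpreted (a leading or trailing `*` leaves that end of the heading open) is an assumption about what ‘matches the characters as before’ means.

```python
import re


def build_heading_pattern(term):
    """Compile a case-insensitive pattern for a category heading search."""
    leading = term.startswith("*")
    trailing = term.endswith("*")
    core = re.escape(term.strip("*"))
    if leading or trailing:
        # Wildcard search (assumed behaviour): match raw characters,
        # anchored at whichever end has no asterisk.
        prefix = "" if leading else "^"
        suffix = "" if trailing else "$"
        return re.compile(prefix + core + suffix, re.IGNORECASE)
    # No asterisks: the term must appear as a complete word.
    return re.compile(r"\b" + core + r"\b", re.IGNORECASE)


def search_headings(headings, term):
    pattern = build_heading_pattern(term)
    return [h for h in headings if pattern.search(h)]


headings = [
    "love",
    "love from the heart",
    "great love",
    "Shower, fall of rain",
    "lovely, beautiful, fair",
]
print(search_headings(headings, "love"))
print(search_headings(headings, "fall"))
```

The word-boundary markers are what keep ‘lovely, beautiful, fair’ out of the results for ‘love’.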
What with all of the above going on I didn’t get a chance to do any further work on the Scots Thesaurus or the Sci-Fi Med Hums database, but hopefully I’ll be able to work on these next week. Having said that, I’ve also now received the first set of data for the Old English Metaphor Map, and this also has to ‘go live’ before ISAS at the start of August, so I may have to divert a lot of time to this over the next couple of weeks.
I continued working on the new website for the Thesaurus of Old English (TOE) this week, which took up a couple of days in total. Whilst working on the front end I noticed that the structure of TOE is different to that of the Historical Thesaurus of English (HTE) in an important way: There are never any categories with the same number but a different part of speech. With HTE the user can jump ‘sideways’ in the hierarchy from one part of speech to another and then browse up and down the hierarchy for that part of speech, but with TOE there is no ‘sideways’ – for example if there is an adjective category that could be seen as related to a noun category at the same level these categories are given different numbers. This difference meant that plugging the TOE data into the functions I’d created for the HTE website just didn’t work very well as there were just too many holes in the hierarchy when part of speech was taken into consideration.
The solution to the problem was to update the code to ignore part of speech. I checked that there were indeed no main categories with the same number but a different part of speech (a little script I wrote confirmed this to be the case) and then updated all of the functions that generated the hierarchy, the subcategories and other search and browse features to ignore part of speech, but instead to place the part of speech beside the category heading wherever category headings appear (e.g. in the ‘browse down’ section or the list of subcategories). This approach seems to have worked out rather well and the thesaurus hierarchy is now considerably more traversable.
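The check that no two main categories share a number but differ in part of speech could look something like the sketch below. The original script is not shown, so this is only one way of doing it, and the category numbers and part-of-speech codes are made-up examples.

```python
from collections import defaultdict


def numbers_with_multiple_pos(categories):
    """Return category numbers that occur with more than one part of speech.

    `categories` is an iterable of (number, part_of_speech) pairs.
    """
    pos_by_number = defaultdict(set)
    for number, pos in categories:
        pos_by_number[number].add(pos)
    return {
        number: sorted(pos_set)
        for number, pos_set in pos_by_number.items()
        if len(pos_set) > 1
    }


# Hypothetical rows: every number has exactly one part of speech.
rows = [("01.01", "n"), ("01.02", "aj"), ("01.03", "v")]
print(numbers_with_multiple_pos(rows))
```

An empty result means part of speech can safely be dropped from the hierarchy lookups and shown beside the heading instead.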
I managed to complete a first version of the new website for TOE, with all required functionality in place. This includes both quick and advanced searches, category selection, the view category page and some placeholder ancillary pages. At Fraser’s request I also added in the facility to search with vowel length marks. This required creating another column in the ‘lexeme search words’ table with a stricter collation setting that ensures a search involving a length mark (e.g. ‘sǣd’) only finds words that feature the length mark (e.g. ‘sæd’ would not be found). I added an option to the advanced search field allowing the user to say whether they cared about length marks or not. The default is not, but I’m sure a certain kind of individual will be very keen on searching with length marks. If this option is selected the ‘special characters’ buttons expand to include all of the vowels with length marks, thus enabling the user to construct the required form. It will be useful for people who want to find out (for example) all of the words in the thesaurus that end in ‘*ēn’ (41) as opposed to all of those words that end in ‘*en’ disregarding length marks (1546).
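The difference between the two search modes can be illustrated with a short Python sketch. The site itself relies on a stricter database collation on a second column. The version below uses Unicode normalisation instead to show the same idea, so the function names and the approach are illustrative only.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, the length mark


def strip_length_marks(text):
    """Remove length marks, leaving the base letters untouched."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


def search(words, term, respect_length_marks=False):
    """Exact-match search, with or without regard to length marks."""
    if respect_length_marks:
        target = unicodedata.normalize("NFC", term)
        return [w for w in words if unicodedata.normalize("NFC", w) == target]
    target = strip_length_marks(term)
    return [w for w in words if strip_length_marks(w) == target]


words = ["s\u01e3d", "s\u00e6d"]  # 'sǣd' and 'sæd'
print(search(words, "s\u01e3d", respect_length_marks=True))
print(search(words, "s\u01e3d"))
```

With length marks respected only the marked form is returned. With the default setting both forms are found.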
I think we’re well on track to have the new TOE launched before the ISAS conference at the beginning of next month, which is great.
I continued working on the Scots Thesaurus project this week as well. I met with Susan and Magda on Tuesday to talk them through using the WordPress plugin I’d created for managing thesaurus categories and lexemes. Before this meeting I ran through the plugin a few times myself and noted a number of things that needed updating or improving so I spent some time sorting those things out. The meeting itself went well and I think both Susan and Magda are now familiar enough with the interface to use it. I created a ‘to do’ list containing outstanding technical tasks for the project and I’ll need to work through all of these. For example, a big thing to add will be facilities to enable staff to upload lexemes to a category through the WordPress interface via a spreadsheet. This will really help to populate the thesaurus.
I also spent a little time contributing to a Leverhulme bid application for Carole Hough and did a tiny amount of DSL work as well. I’m still no further with the Hansard visualisations though. Arts Support are going to supply me with a test server on which I should be able to install Bookworm, but I’m not sure when this is going to happen yet. I’ll chase this up on Monday.
I returned to work on Monday this week after being off sick on Thursday and Friday last week. It has been yet another busy week, the highlight of which was undoubtedly the launch of the Mapping Metaphor website. After many long but enjoyable months working on the project it is really wonderful to finally be able to link to the site. So here it is: http://www.glasgow.ac.uk/metaphor
I moved the site to its ‘live’ location on Monday and made a lot of last minute tweaks to the content over the course of the day, with everything done and dusted before the press release went out at midnight. We’ve had lots of great feedback about the site. There was a really great article on the Guardian website (which can currently be found here: http://www.theguardian.com/books/2015/jun/30/metaphor-map-charts-the-images-that-structure-our-thinking) plus we made the front page of the Herald. A couple of (thankfully minor) bugs were spotted after the launch but I managed to get those sorted on Wednesday. It’s been a very successful launch and it has been a wonderful project to have been a part of. I’m really pleased with how everything has turned out.
Other than Mapping Metaphor duties I split my time across a number of different projects. I continued working with the Thesaurus of Old English data and managed to get everything that I needed to do to the data completed. This included writing and executing a nice little script that added in the required UTF-8 length markers over vowels. Previously the data used an underscore after the vowel to note that it was a long one but with UTF-8 we can use proper length marks, so my script found words like ‘sæ_d’ and converted them all to words like ‘sǣd’. Much nicer.
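The conversion from underscore notation to proper length marks can be sketched in a few lines of Python. This is a reconstruction of the idea rather than the script that was actually run, and the set of vowels it handles is an assumption.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron

# Assumed vowel set; an underscore directly after one marks it as long.
LONG_VOWEL = re.compile(r"([aeiouy\u00e6AEIOUY\u00c6])_")


def add_length_marks(word):
    """Replace 'vowel + underscore' with the vowel carrying a macron."""
    marked = LONG_VOWEL.sub(lambda m: m.group(1) + MACRON, word)
    # NFC turns 'vowel + combining macron' into one precomposed character.
    return unicodedata.normalize("NFC", marked)


print(add_length_marks("s\u00e6_d"))  # input is 'sæ_d'
```

Normalising to NFC at the end means the stored form uses a single character such as ‘ǣ’ rather than a letter followed by a separate combining mark.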
I wrote and executed another script that added in all of the category cross references, and another one that checked all of the words with a ‘ge’ prefix. My final data processing script generated the search terms for the words, for example it identified word forms with brackets such as ‘eorþe(r)n’ and then generated multiple variant search words, in this case two – ‘eorþen’ and ‘eorþern’. This has resulted in a total of 57,067 search terms for the 51,470 words we have in the database.
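Generating the variant search words from a bracketed form can be sketched in Python as below. Again this is an illustration of the approach, not the original script. It assumes brackets are never nested and treats every bracketed part as optional.

```python
import re
from itertools import product

OPTIONAL = re.compile(r"\(([^()]*)\)")  # one non-nested bracketed part


def search_variants(form):
    """Return every search word obtained by including or omitting each
    bracketed part of a word form."""
    # Splitting on a pattern with one capture group alternates
    # fixed text (even indexes) and bracket contents (odd indexes).
    parts = OPTIONAL.split(form)
    fixed, optional = parts[0::2], parts[1::2]
    variants = []
    for choice in product([False, True], repeat=len(optional)):
        pieces = [fixed[0]]
        for include, text, following in zip(choice, optional, fixed[1:]):
            if include:
                pieces.append(text)
            pieces.append(following)
        variant = "".join(pieces)
        if variant not in variants:
            variants.append(variant)
    return variants


print(search_variants("eor\u00fee(r)n"))  # input is 'eorþe(r)n'
```

A form with no brackets yields just itself, which is why the number of search terms is only a little higher than the number of words.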
Once I’d completed work on the data, I spent a little bit of time on the front end for the new Thesaurus of Old English website. This is going to be structurally the same as the Historical Thesaurus of English website, just with a different colour scheme and logos. I created three different colour scheme mockups and have sent these to Fraser and Marc for consideration, plus I got the homepage working for the new site (still to be kept under wraps for now). This homepage has a working ‘random category’ feature, which shows that the underlying data is working very nicely. Next week I’ll continue with the site and hopefully will get the search and browse facilities completed.
I also returned to working on the Scots Thesaurus project this week. I spent about a day on a number of tasks, including separating out the data that originated in the paper Scots Thesaurus from the data that has been gathered directly from the DSL. I also finally got round to amalgamating the ‘tools’ database with the ‘WordPress’ database. When I began working for the project I created a tool that enables researchers to bring together the data for the Historical Thesaurus of English and the data from the Dictionary of the Scots Language in order to populate Scots Thesaurus categories with words uncovered from the DSL. Following on from this I made a WordPress plugin through which thesaurus data could be managed and published. But until this week the two systems were using separate databases, which required me to periodically manually migrate the data from the tools database to the WordPress one. I have now brought the two systems together, so it should now be possible to edit categories through the WordPress admin interface and for these updates to be reflected in the ‘tools’ interface. Similarly, any words added to categories through the ‘tools’ interface now automatically appear through the WordPress interface. I still need to fully integrate the ‘tools’ functionality with WordPress so we can get rid of the ‘tools’ system altogether, but it is much better having a unified database, even if there are still two interfaces on top of it. Other than these updates I also made a few tweaks to the public facing Scots Thesaurus website – adding in logos and things like that.
I also spent some time this week working on another WordPress plugin – this time for Gavin Miller’s Science Fiction and the Medical Humanities project. I’m creating the management scripts to allow him and his researchers to assemble a bibliographic database of materials relating to both Science Fiction and the Medical Humanities. I’ve got the underlying database created and the upload form completed. Next week I’ll get the upload form to actually upload its data. One handy thing I figured out whilst developing this plugin is how you can have multiple text areas that have the nice ‘WYSIWYG’ tools above them to enable people to add in formatted text. After lots of hunting around it turned out to be remarkably simple to incorporate, as this page explains: http://codex.wordpress.org/Function_Reference/wp_editor
The ‘scifimedhums’ website itself went live this week, so I can link to it here: http://scifimedhums.glasgow.ac.uk/
I was intending to continue with the Hansard data work this week as well. I had moved my 2 years of sample data (some 13 million rows) to my work PC and was all ready to get Bookworm up and running when I happened to notice that the software will not run on Windows (“Windows is out of the question” says the documentation). I contacted IT support to see if I could get command-line access to a server to get things working but I’m still waiting to see what they might be able to offer me.