Week Beginning 23rd October 2017

After an enjoyable week’s holiday I returned to work on Monday, spending quite a bit of Monday catching up with some issues people had emailed me about whilst I was away, such as making further tweaks to the ‘Concise Scots Dictionary’ page on the DSL website for Rhona Alcorn (the page is now live if you’d like to order the book: http://dsl.ac.uk/concise-scots-dictionary/), speaking with Luca about a project he’s involved in the planning of that’s going to use some of the DSL data, helping Carolyn Jess-Cooke with some issues she was encountering when accessing one of her websites, giving some information to Brianna of the RNSN project about timeline tools we might use, and a few other such things.

I also spent some time adding paragraph IDs to the ‘Scots Language’ page of the DSL (http://dsl.ac.uk/about-scots/the-scots-language/) for Ann Fergusson to enable references to specific paragraphs to be embedded in other pages.  Implementing this was somewhat complicated by the ‘floating’ contents section on the left as when a ‘hash’ is included in a URL a browser automatically jumps to the ID of the element that has this value.  But for the contents section to float or be fixed to the top of the page depending on which section the user is viewing the page needs to load at the top for the position to be calculated.  If the page loads halfway down then the contents section remains fixed at the top of the page, which is not much use.  However, I managed to get the ‘jump to paragraph from a URL’ feature working with the floating contents section now with a bit of a hack.  Basically, I’ve made it so that the ‘hash’ that gets passed to the page doesn’t actually correspond to an element on the page, so the browser doesn’t jump anywhere.  But my JavaScript grabs the hash after the page has loaded, reworks it to a format that does match an actual element and then smoothly scrolls to this element.   I’ve tested this in Firefox, Chrome, Internet Explorer and Edge and it works pretty well.

I had a couple of queries from Wendy Anderson this week.  The first was for Mapping Metaphor.  Wendy wanted to grab all of the bidirectional metaphors in both the main and OE datasets, including all of their sample lexemes.  I wrote a script that extracted the required data and formatted it as a CSV file, which is just the sort of thing she wanted.  The second query was for all of the metadata associated with the Corpus of Modern Scots Writing texts.  A researcher had contacted Wendy to ask for a copy but although the metadata is in the database and can be viewed on a per text basis through the website, we didn’t have the complete dataset in an easy to share format.  I wrote a little script that queried the database and retrieved all of the data.  I had to do a little digging into how the database was structure in order to do this, as it is a system that wasn’t developed by me.  However, after a little bit of exploration I managed to write a script that grabbed the data about each text, including multiple authors that can be associated with each text.  I then formatted this as a CSV file and sent the outputted file to Wendy.

I met with Gary on Monday to discuss some changes to the SCOSYA atlas and CMS that he wanted me to implement ahead of an event the team are at next week.  This included adding Google Analytics to the website, updating the legend of the Atlas to make it clearer what the different rating levels meant, separating out the grey squares (which mean no data is present) and the grey circles (meaning data is present but doesn’t meet the specified criteria) into separate layers so they can be switched on and off independently of each other, making the map markers a little smaller, and adding in facilities to allow Gary to delete codes, attributes and code parents via the CMS.  This all took a fair amount of time to implement, and unfortunately I lost a lot of time on Thursday due to a very strange situation with my access to the server.

I work from home on Thursdays and I had intended to work on the ‘delete’ facilities that day, but when I came to log into the server the files and the database appeared to have reverted back to the state they were in in May – i.e. it looked like we had lost almost six months of data, plus all of the updates to the code I’d implemented during this time.  This was obviously rather worrying and I spent a lot of time toing and froing with Arts IT Support to try and figure out what had gone wrong.  This included restoring a backup from the weekend before, which strangely still seemed to reflect the state of things in May.  I was getting very concerned about this when Gary noted that he was seeing two different views of the data on his laptop.  In Safari on his laptop his view of the data appeared to have ‘stuck’ at May while in Chrome he could see the up to date dataset.  I then realised that perhaps the issue wasn’t with the server after all but instead the problem was my home PC (and Safari on Gary’s laptop) was connecting to the wrong server.  Arts IT Support’s Raymond Brasas suggested it might be an issue with my ‘hosts’ file and that’s when I realised what had happened.  As the SCOSYA domain is an ‘ac.uk’ domain and it takes a while for these domains to be set up, we had set up the server long before the domain was running, so to allow me to access the server I had added a line to the ‘hosts’ file on my PC to override what happens when the SCOSYA URL is requested.  Instead of it being resolved by a domain name service my PC pointed at the IP address of the server as I had entered it in my ‘hosts’ file.  Now in May, the SCOSYA site was moved to a new server, with a new IP address, but the old server had never been switched off, so my home PC was still connecting to this old server.  I had only encountered the issue this week because I hadn’t worked on SCOSYA from home since May.  So, it turned out there was no problem with the server, or the SCOSYA data.  I removed the line from my ‘hosts’ file, restarted my browser and immediately I could access the up to date site.  All this took several hours of worry and stress, but it was quite a relief to actually figure out what the issue was and to be able to sort it.

I had intended to start setting up the server for the SPADE project this week, but the machine has not yet been delivered, so I couldn’t work on this.  I did make a few further tweaks to the SPADE website, however, and responded to a couple of queries from Rachel about the SCOTS data and metadata, which the project will be using.

I also met with Fraser to discuss the ongoing issue of linking up the HT and OED data.  We’re at the stage now where we can think about linking up the actual words with categories.  I’d previously written a script that goes through each HT category that matches an OED category and compares the words in each, checking whether an HT word matches the next found in either the OED ‘ght_lemma’ or ‘lemma’ fields.  After our meeting I updated the HT lexeme table to include extra fields for the ID of a matching OED lexeme and whether the lexeme had been checked.  After that I updated the script to go through every matching category in order to ‘tick off’ the matching words within.  The first time I ran my script it crashed the browser, but with a bit of tweaking I got it to successfully complete the second time.  Here are some stats:

There are 655513 HT lexemes that are now matched up with an OED lexeme.  There are 47074 HT lexemes that only have OE forms, so with 793733 HT lexemes in total this means there are 91146 HT lexemes that should have an OED match but don’t.  Note, however, that we still have 12373 HT categories that don’t match OED categories and these categories contain a total of 25772 lexemes.

On the OED side of things, we have a total of 688817 lexemes, and of these 655513 now match an HT lexeme, meaning there are 33304 OED lexemes that don’t match anything.  At least some of these will also be cleared up by future HT / OED category matches.  Of the 655513 OED lexemes that now match, 243521 of them are ‘revised’.  There are 262453 ‘revised’ OED lexemes in total, meaning there are 18932 ‘revised’ lexemes that don’t currently match an HT lexeme.  I think this is all pretty encouraging as it looks like my script has managed to match up bulk of the data.  It’s just the several thousand edge cases that are going to be a bit more work.

On Wednesday I met with Thomas Widmann of Scots Language Dictionaries to discuss our plans to merge all three of the SLD websites (DSL, SLD and Scuilwab) into one resource that will have the DSL website’s overall look and feel.  We’re going to use WordPress as a CMS for all of the site other than the DSL’s dictionary pages, so as to allow SLD staff to very easily update the content of the site.  It’s going to take a bit of time to migrate things across (e.g. making a new WordPress theme based on the DSL website, create quick search widgets, updating the DSL dictionary pages to work with the WordPress theme), but we now have the basis of a plan.  I’ll try to get started on this before the year is out.

Finally this week, I responded to a request from Simon Taylor to make a few updates to the REELS system, and I replied to Thomas Clancy about how we might use existing Ordinance Survey data in the Scottish Place-Names survey.  All in all it has been a very busy week.

Week Beginning 9th October 2017

It was another week of working on fairly small tasks for lots of different projects.  I helped Gerry McKeever to put the finishing touches to his new project website, and this has now gone live and can be accessed here:  http://regionalromanticism.glasgow.ac.uk/.  I also spent some further time making updates to the Burns Paper Database website for Ronnie Young.  This included adding in a site menu to facilitate navigation, adding a subheader to the banner, creating new pages for ‘about’ and ‘contact’, adding some new content, making repositories appear with their full names rather than acronyms, updating the layout of the record page and tweaking how the image pop-up works. It’s all pretty much done and dusted now, although I can’t share the URL as the site is password protected due to the manuscript images being under copyright restrictions.

I spent about a day this week on AHRC review duties and also spent some time working on the new interface for Kirsteen McCue’s ‘Romantic National Song Network’ project website.  This took up a fair amount of time as I had to try out a few different designs, work with lots of potential images, set up a carousel, and experiment with fonts for the site header.  I’m pretty pleased with how things are looking now, although there are four different font styles that we still need to choose one from.

I had a couple of conference calls and a meeting with Marc and Fraser about the Linguistic DNA project.  I met with Marc and Fraser first, in order to go over the work Fraser is currently doing and how my involvement in the project might proceed.  Fraser and I then had a Skype call with Iona and Seth in Sheffield about the work the researchers are currently doing and some of the issues they are encountering when dealing with the massive dataset they’re working with.  After the call Fraser sent me a sample of the data, which really helped me to understand some of the technical issues that are cropping up.  On Friday afternoon the whole project had a Skype call.  This included the DHI people in Sheffield and it was useful to hear something about the technical work they are currently doing.

I had a couple of other meetings this week too.  On Wednesday morning I had a meeting with Jennifer Smith about a new pilot project she’s putting together in order to record Scots usage in schools.  We talked through a variety of technical solutions and I was able to give some advice on how the project might be managed from a technical point of view.  On Wednesday afternoon I had a meeting for The People’s Voice project, at which I met with new project RA, who has taken over from Michael Shaw as he’s now moved to a different institution.  I helped the new RA get up to speed with the database and how to update the front-end.

Also this week I had an email conversation with the SPADE people about how we will set up a server for the project’s infrastructure at Glasgow.  I’m going to be working on this the week after next.  I also made a few further updates to the DSL website and had a chat with Thomas Widmann about a potential reworking of some of the SLD’s websites.

There’s not a huge amount more to say about the work I did this week.  I was feeling rather unwell all week and it was a bit of a struggle getting through some days during the middle of the week, but I made it through to the end.  I’m on holiday all of next week so there won’t be an update from me until the week after.

Week Beginning 2nd October 2017

This was another week of doing lots of fairly small bits of work for many different projects.  I was involved in some discussions about some possible updates to websites with Scottish Language Dictionaries, and created a new version of a page for the Concise Scots Dictionary for them.  I also made a couple of minor tweaks to a DSL page for them as well.

For the Edinburgh Gazetteer project I added in all of the ancillary material that Rhona Brown had sent me, added in some new logos, set up a couple of new pages and made a couple of final tweaks to the Gazetteer and reform societies map pages.  The site is now live and can be accessed here: http://edinburghgazetteer.glasgow.ac.uk/

I also read through the Case for Support for Thomas Clancy’s project proposal and made a couple of updated to the Technical Plan based on this, and I spent some time reading over the applications for a post that I’m on the interview panel for.  I also spent a bit more time on the Burns Paper Database project.  There were some issues with the filenames of the images used.  Some included apostrophes and ampersands, which meant the images wouldn’t load on the server.  I decided to write a little script to rename all of the images in a more uniform way, while keeping a reference to the original filenames in the database for display and for future imports.  It took a bit of time to get this sorted but the images work a lot better now.

I also had a chat with Gary Thoms about the SCOSYA Atlas.  This is a project I’ve not worked on recently as the team are focussing on getting all of the data together and are not so bothered about the development of the Atlas.  However, Gary will be teaching a class in a few weeks and he wanted to let the students try out the Atlas.  As the only available version can be accessed by the project team once they have logged into the content management system he wondered whether I could make a limited access guest account for students.  I suggested that instead of this I could create a version that is publicly accessible and is not part of the CMS at all.  Gary agreed with this so I spent some time creating this version.  To do so I had to strip out the CMS specific bits of code from the Atlas (there weren’t many such bits as I’d designed it to be easily extractable) and create a new, publicly accessible page for it to reside in.  I also had to update some of the JavaScript that powers the Atlas to cut back on certain aspects of functionality – e.g. to disable the feature that allows users to upload their own map data for display on the map and to ensure that links through to the full questionnaire details don’t appear.  With this done the new version of the Atlas worked perfectly.  I can’t put the URL here, though, as it’s still just a work in progress and the URL will be taken down once the students have had a go with the Atlas.

I met with Fraser on Wednesday to get back into the whole issue of merging the new OED data with the HT data.  It had been a few months since either of us had looked at the issues relating to this, so it took a bit of time to get back up to speed with things.  The outcome of our meeting was that I would create three new scripts.  The first would find all of the categories where there was no ‘oedmaincat’ and the part of speech was not a noun.  The script would then check to see whether there was a noun at the same level and if so grab its ‘oedmaincat’ and then see if this matched anything in the OED data for the given part of speech.  This managed to match up a further 183 categories that weren’t previously matched so we could tick these off.  The second script generated a CSV for Fraser to use that ordered unmatched categories by size.  This is going to be helpful for manual checking and it thankfully demonstrated that of the more than 12,000 non-matched categories only about 750 have more than 5 words in them.  The final script was an update to the ‘all the non-matches’ script that added in counts of the number of words within the non-matching HT and OED categories.  It’s now down to Fraser and some assistants to manually go through things.

I did some further work for the SPADE project this week, extracting some information about the SCOTS corpus.  I wrote a script that queries the SCOTS database and pulls out some summary information about the audio recordings.  For each audio recording the ID, title, year recorded and duration in minutes are listed.  Details for each participant (there are between 1 and 6) are also listed:  ID, Gender, decade of birth (this is the only data about the age of the person that there is), place of birth and occupation (there is no data about ‘class’).  This information appears in a table.  Beneath this I also added some tallies:  the total number of recordings, the total duration, the number of unique speakers (as a speaker can appear in multiple recordings) and a breakdown of how many of these are male, female or not specified.  Hopefully this will be of use to the project.

Finally, I had a meeting with Kirsteen McCue and project RA Brianna Robertson-Kirkland about the Romantic National Song Network project.  We discussed potential updates to the project website, how it would be structured, how the song features might work and other such matters.  I’m intending to produce a new version of the website next week.


Week Beginning 25th September 2017

My time this week was split amongst many different projects.  I continued to work on the Burns Paper Database, setting up a proper subdomain for the project and creating a more unified interface for the site, which previously just used styles taken from pervious sites that I had borrowed functionality from.  I think my work on this website is now pretty much complete and it’s been a useful experience, especially working with image pan and zoom libraries, which I will no doubt make further use of in future projects.  It’s a shame I can’t share the URL, though, as the site needs to be password protected due to the use of high resolution copyrighted images.

I had a chat with Chris McGlashan this week about maybe migrating the project websites to HTTPS rather than just using HTTP.  This week the main University website was being migrated over to this more secure, encrypted protocol for accessing web pages, and I wondered whether all of the project websites that exist as subdomains of the main University URL could also maybe make use of the main site’s SSL certificate.  This would be good because we have lots of log-in forms for accessing content management systems and as these all submit data using non-encrypted HTTP that data could be intercepted.  Browsers these days are also flagging up ‘insecure’ forms, which makes our sites look bad.  However, the University’s IT people have advised against migrating to HTTPS for a couple of reasons.  Firstly, we couldn’t just use the certificate from the main site (well, technically we might have been able to but from a security point of view this would be a bad idea as this certificate is used for finance sites and such things).  This would mean we’d have to pay for our own certificates and to projects generally don’t have the funds for that.  Secondly, most of the data we deal with is considered ‘low risk’, and therefore doesn’t warrant an SSL certificate.  So, we’re keeping things as they are, for the time being at least.

I spent a bit of time this week reworking the site design for Gerry McKeever’s Regional Romanticism project, as he wasn’t too keen on the design I’d previously created.  The new design looks a lot better, and he seems happy with it, so all is well there.  I also spent quite a bit of time this week working with Rachel MacDonald on the interface for the SPADE website.  I created an initial website for the project many weeks ago but just left it with a placeholder interface, but this week I implemented a proper interface, which I think looks pretty good.  I just need to wait for feedback from the project team now, though.  Neither of these websites is officially live yet, so I can’t share the URLS for them.

Also this week I had a chat with the DSL people about a new page they want me to make on the website.  I also created a new version of the Technical Plan for Thomas Clancy’s Iona project, based on feedback, and also created the Technical Summary paragraph for the main part of the proposal.  I spent a bit of time following up on a task for my PDR too, and responded for a request for me to be on another interview panel.

I also returned to some Historical Thesaurus duties.  A couple of weeks ago I was alerted to the existence of a non-noun category that didn’t have a noun category at the same level.  This meant the category didn’t appear within the new ‘tree browse’ interface and neither it nor its child categories could be found by browsing.  This issue was fixed by creating a new empty noun category.  I wondered whether there might be any other similar categories in the database, so this week I wrote a little script to check.  It turns out that there are similar categories, but thankfully not too many – between 20 and 30, in fact.  After identifying these I asked Fraser to check what the headings of the empty noun categories should be, and once I heard back from him I created them, meaning all of the previously inaccessible categories can now be found.  There may be more HT stuff to come back to next week, but we’ll see what is sent my way.

Week Beginning 18th September 2017

On Monday this week I spent a bit of time creating a new version of the MetaphorIC app, featuring the ‘final’ dataset from Mapping Metaphor.  This new version features almost 12,000 metaphorical connections between categories and more than 30,000 examples of metaphor.  Although the creation of the iOS version went perfectly smoothly (this time), I ran into some difficulties updating the Android app as the build process started giving me some unexplained errors.  I eventually tried dropping the Android app in order to rebuild it, but that didn’t work either and unfortunately dropping the app also deleted its icon files.  After that I had to build the app in a new location, which thankfully worked.  Also thankfully I still had the source files for the icons so I could create them again.  There’s always something that doesn’t go smoothly when publishing apps.  The new version of the app was made available on the Apple App and Google Play stores by the end of the week and you can download either version by following the links here: http://mappingmetaphor.arts.gla.ac.uk/metaphoric/.  That’s Mapping Metaphor and its follow-on project MetaphorIC completely finished now, other than the occasional tweak that will no doubt be required.

I spent the bulk of the rest of the week working on the Burns Paper Database for Ronnie Young.  Last week I started looking at the Access version of the database that Ronnie had sent me, and I’m managed to make an initial version of a MySQL database to hold the data and I created an upload script that populated this table with the data via a CSV file.  This week I met with Ronnie to discuss how to take the project further.  We agreed that rather than having an online content management system through which Ronnie would continue to update the database, he would instead continue to use his Access version and I would then run this through my ‘import’ script to replace the old online version whenever updates are required.  This is a more efficient approach as I already have an upload script and Ronnie is already used to working with his Access database.

We went through the data together and worked out which fields would need to be searchable and browseable, and how the data should be presented.  This was particularly useful as there are some consistency issues with the data, for example in how uncertain dates are recorded, which may include square brackets, asterisks, question marks, the use of ‘or’ and also date ranges.

After the meeting I set to work creating an updated structure for the database and an updated ‘import’ script that would enable the extraction and storage of the data required for search purposes.  This included creating separate tables for year searches, manuscript types, watermarks and countermarks, and also images of both the documents and the watermarks.  It took quite some time to get the import script working properly, but now that it is in place I will be able to run any updated version of the data through this in order to create a new online version.  With this in place I set to work on the actual pages for searching and browsing, viewing results and viewing an individual record.  Much of this I managed to repurpose from my previous work on The People’s Voice database of poems, which helped speed things up considerably.  The biggest issue I encountered was with working with the images of the manuscript pages.  The project contains over 1200 high-resolution images that Ronnie wants users to be able to zoom into and pan around.  In order to work with these images I had to batch process the creation of thumbnails and also the renaming of the images, as they had a mixture of upper and lower case file extensions, which causes problems for case sensitive servers.  I then had to decide on a library that would provide the required zoom and pan functionality.  Previously I’ve used OpenLayers, but this requires large images to be split into tiles, and I didn’t want to have to do this.  Instead I looked at some other JavaScript libraries.  What I really wanted was a ‘google maps’ style interface that allowed multiple levels of zoom.  Unfortunately most libraries didn’t seem to offer this.  I found one called ‘jQuery Panzoom’ (http://timmywil.github.io/jquery.panzoom/demo/) that fitted the bill, and I tried working with this for a while.  Unfortunately, my images were all very large and the pane they will be viewed in is considerably smaller, and it didn’t seem very straightforward to reposition the zoomed image so that it actually appeared visible in the pane when zoomed out by default.  Instead I tried another library called magnifier.js (http://mark-rolich.github.io/Magnifier.js/) that can be set up to have a thumbnail navigation window and a larger main window.  I spent quite a bit of time working with this library and thought everything was going to work out perfectly, but then I encountered a bug:  If you manually set the dimensions of the pane in which the zoomed in image appears and these dimensions are different to the image then the zoomed in image is distorted to fit the pane.  After investigating this issue I discovered it had been raised by someone in 2014 and had not been addressed (see https://github.com/mark-rolich/Magnifier.js/issues/4).  As a distorted image was no good I had to look elsewhere once again.  My third attempt was using the ‘Elevate Zoom’ plugin (http://www.elevateweb.co.uk/image-zoom/examples).  Thankfully I managed to get this working.  It also can be set up to have a thumbnail navigation window and then a larger pane for viewing the zoomed in image.  It can also be set up to use the mouse wheel to zoom in and out, which is ideal.  The only downside is without physical zoom controls there’s no way to zoom in and out when using a touchscreen device.  But as it’s still possible to view the full image at one zoom level I think this is good enough.  By the end of the week I had pretty much completed the online database and I emailed the details to Ronnie for feedback.

Other than the above I also did a little bit of work for the SPADE project, beginning to create a proper interface for the website with Rachel MacDonald, and I had a further chat with Gerry McKeever regarding the website for his new project.

Week Beginning 4th September 2017

I spent a lot of this week continuing with the redevelopment of the ARIES app and thankfully after laying the groundwork last week (e.g. working out the styles and the structure, implementing a couple of exercise types) my rate of progress this week was considerably improved.  In fact, by the end of the week I had added in all of the content and had completed an initial version of the web version of the app.  This included adding in some new quiz types, such as one that allows the user to reorder the sentences in a paragraph by dragging and dropping them, and also a simple multiple choice style quiz.  I also received some very useful feedback from members of the project team and made a number of refinements to the content based on this.

This included updating the punctuation quiz so that if you get three incorrect answers in a quiz a ‘show answer’ button is displayed.  Clicking on this puts in all of the answers and shows the ‘well done’ box.  This was rather tricky to implement as the script needed to reset the question, including removing all previous answers, ticks, and resetting the initial letter case as if you select a full stop the following letter is automatically capitalised.  I also implemented a workaround for answers where a space is acceptable.  These no longer count towards the final tally of correct answers, so leaving a space rather than selecting a comma can now result in the ‘well done’ message being displayed.  Again, this was rather tricky to implement and it would be good if you could test out this quiz thoroughly to make sure there aren’t any occasions where the quiz breaks.

I also improved navigation throughout the app.  I added ‘next’ buttons to all of the quizzes, which either take you to the next section, or to the next part of the quiz, as applicable.  I think this works much better than just having the option to return to the page the quiz was linked from.  I also added in a ‘hamburger’ button to the footer of every page within a section.  Pressing on this takes you to the section’s contents page, and I added ‘next’ and ‘previous’ buttons to the contents pages too, so you can navigate between sections without having to go back to the homepage.

I spent a bit of time fixing the drag / drop quizzes so that the draggable boxes were constrained to each exercise’s boundaries.  This seemed to work great until I got to the references quiz, which has quite long sections of draggable text.  With the constraint in place it became impossible for the part of the draggable button that triggers the drop to reach the boxes nearest the boundaries of the question as none of the button could pass the borders.  So rather annoyingly I had to remove this feature and just allow people to drag the buttons all over the page.  But dropping a button from one question into another will always give you an incorrect answer now, so it’s not too big a problem.

With all of this in place I’ll start working on the app version of the resource next week and will hopefully be able to submit it to the app stores by the end of the week, all being well.

In addition to my work on ARIES, I completed some other tasks for a number of other projects.  For Mapping Metaphor I created a couple of scripts for Wendy that output some statistics about the metaphorical connections in the data.  For the Thesaurus of Old English I created a facility to enable staff to create new categories and subcatetories (previously it was only possible to edit existing categories or add / edit / remove words from existing categories).  I met with Nigel Leask and some of the Curious Travellers team on Friday to discuss some details for a new post associated with this project.  I had an email discussion with Ronnie Young about the Burns database he wants me to make an online version of.  I also met with Jane Stuart-Smith and Rachel MacDonald, who is the new project RA for the SPADE project, and set up a user account for Rachel to manage the project website.  I had a chat with Graeme Cannon about a potential project he’s helping put together that may need some further technical input and I updated the DSL website and responded to a query from Ann Ferguson regarding a new section of the site.

I also spent most of a day working on the Edinburgh Gazetteer project, during which I completed work on the new ‘keywords’ feature.  It was great to be able to do this as I had been intending to work on this last week but just didn’t have the time.  I took Rhona’s keywords spreadsheet, which had page ID in one column and keywords separated by a semi-colon in another and created two database tables to hold the information (one for information about keywords and a joining table to link keywords to individual pages).  I then wrote a little script that went through the spreadsheet, extracted the information and added it to my database.  I then set to work on adding the actual feature to the website.

The index page of the Gazetteer now has a section where all of the keywords are listed.  There are more than 200 keywords so it’s a lot of information.  Currently the keywords appear like ‘bricks’ in a scrollable section, but this might need to be updated as it’s maybe a bit much information.  If you click on a keyword a page loads that lists all of the pages that the keyword is associated with.  When you load a specific page, either from the keyword page or from the regular browse option, there’s now a section above the page image that lists the associated keywords.  Clicking on one of these loads the keyword’s page, allowing you to access any other pages that are associated with it.  It’s a pretty simple system but it works well enough.  The actual keywords need a bit of work, though, as some are too specific and there are some near duplications due to typos and things like that.  Rhona is going to send me an updated spreadsheet and I will hopefully upload this next week.

Oh yes, it was five years ago this week that I started in this post.  How time flies.

Week Beginning 14th August 2017

I was on holiday last week but was back to work on Monday this week.  I’d kept tabs on my emails whilst I was away but as usual there were a number of issues that had cropped up in my absence that I needed to sort out.  I spent some time on Monday going through emails and updating my ‘to do’ list and generally getting back up to speed again after a lazy week off.

I had rather a lot of meetings and other such things to prepare for and attend this week.  On Monday I met with Bryony Randall for a final ‘sign off’ meeting for the New Modernist Editing project.  I’ve really enjoyed working on this project, both the creation of the digital edition and taking part in the project workshop.  We have now moved the digital edition of Virginia Woolf’s short story ‘Ode written partly in prose on seeing the name of Cutbush above a butcher’s shop in Pentonville’ to what will hopefully be its final and official URL and you can now access it here: http://nme-digital-ode.glasgow.ac.uk

On Tuesday I was on the interview panel for Jane Stuart-Smith’s SPADE project, which I’m also working on for a small percentage of my time.  After the interviews I also had a further meeting with Jane to discuss some of the technical aspects of her project.  On Wednesday I met with Alison Wiggins to discuss her ‘Archives and Writing Lives’ project, which is due to begin next month.  This project will involve creating digital editions of several account books from the 16th century.  When we were putting the bid together I did quite a bit of work creating a possible TEI schema for the account books and working out how best to represent all of the various data contained within the account entries.  Although this approach would work perfectly well, now that Alison has started transcribing some entries herself we’ve realised that managing complex relational structures via taxonomies in TEI via the Oxygen editor is a bit of a cumbersome process.  Instead Alison herself investigated using a relational database structure and had created her own Access database.  We went through the structure when we met and everything seems to be pretty nicely organised.  It should be possible to record all of the types of data and the relationships between these types using the Access database and so we’ve decided that Alison should just continue to use this for her project.  I did suggest making a MySQL database and creating a PHP based content management system for the project, but as there’s only one member of staff doing the work and Alison is very happy using Access it seemed to make sense to just stick with this approach.  Later on in the project I will then extract the data from Access, create a MySQL database out of it and develop a nice website for searching, browsing and visualising the data.  I will also write a script to migrate the data to our original TEI XML structure as this might prove useful in other projects.

It’s Performance and Development Review time again, and I have my meeting with my line manager coming up, so I spent about a day this week reviewing last year’s objectives and writing all of the required sections for this year.  Thankfully having my weekly blog posts makes it easier to figure out exactly what I’ve been up to in the review period.

Other than these tasks I helped Jane Roberts out with an issue with the Thesaurus of Old English, I fixed an issue with the STARN website that Jean Anderson had alerted me to, I had an email conversation with Rhona Brown about her Edinburgh Gazetteer project and I discussed data management issues with Stuart Gillespie.  I also uploaded the final set of metaphor data to the Mapping Metaphor database.  That’s all of the data processing for this project now completed, which is absolutely brilliant.  All categories are now complete and the number of metaphors has gone down from 12938 to 11883, while the number of sample lexemes (including first lexemes) has gone up from 25129 to a whopping 45108.

Other than the above I attended the ‘Future proof IT’ event on Friday.  This was an all-day event organised by the University’s IT services and included speakers from JISC, Microsoft, Cisco and various IT related people across the University.  It was an interesting day with some excellent speakers, although the talks weren’t as relevant to my role as I’d hoped they would be.  I did get to see Microsoft’s HoloLens technology in action, which was great, although I didn’t personally get a chance to try the headset on, which was a little disappointing.


Week Beginning 26th June 2017

On Friday this week I attended the Kay Day event, a series of lectures to commemorate the work of Christian Kay.  It was a thoroughly interesting event with some wonderful talks and some lovely introductions where people spoke about the influence Christian had on their lives.  The main focus of the event was the Historical Thesaurus, and it was at this event that we officially launched the new versions of the main HT website and the Thesaurus of Old English website, which I have been working on over the past few weeks.  You can now see the new versions here http://historicalthesaurus.arts.gla.ac.uk/ and here: http://oldenglishthesaurus.arts.gla.ac.uk/.  We’ve had some really good feedback about the new versions and hopefully they will prove to be great research tools.

In the run-up to the even this week I spent some further time on last-minute tweaks to the websites. On Monday I finished my major reworking of the TOE browse structure, which I had spent quite a bit of time on towards the end of last week.  The ‘xx’ categories now all have no child categories.  This does look a little strange in some places as these categories are now sometimes the only ones at that level without child categories, and in some cases it’s fairly clear that they should have child categories (e.g. ’11 Action and Utility’ contains ’11 Action, operation’ that presumably then should contains ’11.01 Action, doing, performance’).  However, the structure generally makes a lot more sense now (no ‘weaving’ in ‘food and drink’!) and we can always work on further refinement of the tree structure at a later date.

I also updated the ‘jump to category’ section of the search page to hopefully make it clearer what these ‘t’ numbers are.  This text is also on the new HT website.  I also fixed the display of long category titles that have slashes in them.  In Firefox these were getting split up over multiple lines as you’d expect, but Chrome was keeping all of the text on one long line, thus breaking out of the box and looking a bit horrible.  I have added a little bit of code to the script that generates the category info to replace slashes with a slash followed by a zero-width space character (​).  This shouldn’t change the look of the titles, but means the line will break on the slashes if the text is too long for the box.  I also fixed the issue with subcategory ‘cite’ buttons being pushed out of the title section when the subcategory titles were of a certain long length.

I also noticed that the browser’s ‘back’ button wasn’t working when navigating the tree – e.g. if you click to load a new category or change the part of speech you can’t press the ‘back’ button to return to what you were looking at previously.  I’m not sure that this is a massive concern as I don’t think many people actually use the ‘back’ button much these days, but when you do press it the ‘back’ button the ‘hash’ in the URL changes, but the content of the page doesn’t update, unless you then press the browser’s ‘reload’ button.  I spent a bit of time investigating this and came up with a solution.  It’s not a perfect solution as all I’ve managed to do is to stop the browsing of the tree and parts of speech being added to the user’s history, therefore no matter how much clicking around the tree you do if you press ‘back’ you’ll just be taken to the last non-tree page you looked at.  I think this is acceptable as the URL in the address bar still gets updated when you click around, meaning you can still copy this and share the link, and clicking around the tree and parts of speech isn’t really reloading a new page anyway.  I’d say it’s better than the user pressing ‘back’ and nothing updating other than the ID in the URL, which is how it currently worked.

Marc also noted that our Google Analytics stats are not going to update now we’re using a new AJAX way to load category details.  Thankfully Google have thought about how to handle sites like ours and it looks like I followed some instructions to make my code submit a GA ‘hit’ when my ‘load category’ JavaScript runs, following the instructions here: https://developers.google.com/analytics/devguides/collection/analyticsjs/single-page-applications

There are still further things I want to do with the HT and TOE sites- e.g. I never did have the time to properly overhaul the back-end and create one unified API for handling all data requests.  That side of things is still a bit of a mess of individual scripts and I’d really like to tidy it up at some point.  Also, the way I updated the ‘back button’ issue was to use the HTML5 ‘history’ interface to update the URL in the address bar without actually adding this change to the browser’s history (See https://developer.mozilla.org/en-US/docs/Web/API/History).  If I had the time I would investigate using this interface to use proper variables in the URL (e.g. ‘?id=1’) rather than a hash (e.g. ‘#id=1’) as hashes are only ever handled client side whereas variables can be processed on both client and server.  Before this HTML5 interface was created there was no reliable way for Javascript to update the page URL in the address bar, other than by changing the hash.

Other than Historical Thesaurus matters, I spent some time this week on other projects.  I read through the job applications for the SPADE RA post and met with Jane to discuss these.  I also fixed a couple of issues with the SCOSYA content management system that had crept in since the system was moved to a new server a while back.  I also got my MacOS system and XCode up to date in preparation for doing more app work in the near future.

I spent the remainder of my week updating the digital edition of the Woolf short story that I’ve been working on for Bryony Randall’s ‘New Modernist Editing’ project.  Bryony had sent the URL out for feedback and we’d received quite a lot of useful suggestions.  Bryony herself had also provided me with some updated text for the explanatory notes and some additional pages about the project, such as a bibliography.

I made some tweaks to the XML transcription to fix a few issues that people had noticed.  I added in ‘Index’ as a title to the index page and I’ve added in Bryony’s explanatory text.

I relabelled ‘Edition Settings’ to ‘Create your own view’ to make it clearer what this option is.  I moved the ‘next’ and ‘previous’ buttons to midway down the left and right edges of the page, and I think this works really well as when you’re looking at the text it feels more intuitive to ‘turn the page’ at the edges of what you’re looking at.  It also frees up space for additional buttons in the top navigation bar.

I made the ‘explanatory notes’ a dotted orange line rather than blue and I removed the OpenLayers blue dot and link from the facsimile view to reduce confusion.  In the ‘create your own view’ facility I made it so that if you select ‘original text’ this automatically selects all of the options within it.  If you deselect ‘original text’ the options within are all deselected.  If ‘Edited text’ is not selected when you do this then it becomes selected.  If ‘Original text’ is deselected and you deselect ‘Edited text’ then ‘Original text’ and the options within all become selected.  This should hopefully make it more difficult to create a view of the text that doesn’t make sense.

I also added in some new interpretations to the first handwritten note, as this is still rather undecipherable.  I created new pages for the ‘further information’, ‘how to use’ and ‘bibliography’.  These are linked to from the navigation bar of the pages of the manuscript, in addition to being linked to from the index page text.  A link appears allowing you to return to the page you were looking at if you access one of these pages from a manuscript page.  I think the digital edition is looking rather good now, and it was good to get the work on this completed before my holiday.  I can’t share the URL yet as we’re still waiting on some web space for the resource at The KEEP archives.  Hopefully this will happen by the end of July.

I will be on holiday for the next two weeks now so no further updates from me until later on in the summer.


Week Beginning 19th June 2017

I decided this week to devote some time to redevelop the Thesaurus of Old English, to bring it into line with the work I’ve been doing to redevelop the main Historical Thesaurus website.  I had thought I wouldn’t have time to do this before next week’s ‘Kay Day’ event but I decided that it would be better to tackle the redevelopment whilst the changes I’d made for the main site were still fresh in my mind, rather than coming back to it in possibly a few months’ time, having forgotten how I implemented the tree browse and things like that.  It actually took me less time than I had anticipated to get the new version up and running, and by the end of Tuesday I had a new version in place that was structurally similar to the new HT site.  We will hopefully be able to launch this alongside the new HT site towards the end of next week.

I sent the new URL to Carole Hough for feedback as I was aware that she had some issues with the existing TOE website.  Carole sent me some useful feedback, which led to me making some additional changes to the site – mainly to the tree browse structure.  The biggest issue is that the hierarchical structure of TOE doesn’t quite make sense.  There are 18 top-level categories, but for some reason I am not at all clear about each top-level category isn’t a ‘parent’ category but is in fact a sibling category to the ones that are one level down.  E.g, logically ’04 Consumption of food/drink’ would be the parent category of ’04.01’, ’04.02’ etc but in the TOE this isn’t the case, rather ’04.01’, ’04.02’ should sit alongside ‘04’.  This really confuses both me and my tree browse code, which expects categories ‘xx.yy’ to be child categories of ‘xx’.  This led to the tree browse putting categories where logically they belong, but within the confines of the TOE make no sense – e.g. we ended up with ’04.04 Weaving’ within ’04 Consumption of food/drink’!

To confuse matters further, there are some additional ‘super categories’ that I didn’t have in my TOE database but apparently should be used as the real 18 top-level categories.  Rather confusingly these have the same numbers as the other top-level categories.  So we now have ’04 Material Needs’ that has a child category ’04 Consumption of food/drink’ that then has ’04.04 Weaving’ as a sibling and not as a child as the number would suggest.  This situation is a horrible mess that makes little sense to a user, but is even harder for a computer program to make sense of.  Ideally we should renumber the categories in a more logical manner, but apparently this isn’t an option.  Therefore I had to hack about with my code to try and allow it to cope with these weird anomalies.  I just about managed to get it all working by the end of the week but there are a few issues that I still need to clear up next week.  The biggest one is that all of the ‘xx.yy’ categories and their child categories are currently appearing in two places – within ‘xx’ where they logically belong and beside ‘xx’ where this crazy structure says they should be placed.

In addition to all this TOE madness I also spent some further time tweaking the new HT website, including updating the quick search box so the display doesn’t mess up on narrow screens, making some further tweaks to the photo gallery and making alterations to the interface.  I also responded to a request from Fraser to update one of the scripts I’d written for the HT OED data migration that we’re still in the process of working through.

In terms of non-thesaurus related tasks this week, I was involved in a few other projects.  I had to spend some time on some AHRC review duties.  I also fixed an issue that had crept into the SCOTS and CMSW Corpus websites since their migration:  the ‘download corpus as a zip’ issue was no longer working due to the PHP code using an old class to create the zip that was not compatible with the new server.  I spent some time investigating this and finding a new way of using PHP to create zip files.  I also locked down the SPADE website admin interface to IP address ranges of our partner institutions and fixed an issue with the SCOSYA questionnaire upload facility.  I also responded to a request for information about TEI XML training from a PhD student and made a tweak to a page of the DSL website.

I spent the remainder of my week looking at some app issues.  We are hopefully going to be releasing a new and completely overhauled version of the ARIES app by the end of the summer and I had been sent a document detailing the overall structure of the new site.  I spent a bit of time creating a new version of the web-based ARIES app that reflected this structure, in preparation for receiving content.  I also returned to the Metre app, that I’ve not done anything about since last year.  I added in some explanatory text and I am hopefully going to be able to start wrapping this app up and deploying it to the App and Play stores soon.  But possibly not until after my summer holiday, which starts the week after next.


Week Beginning 29th May 2017

Monday this week was the spring bank holiday so it was a four-day week for me.  I split my time this week over three main projects.  Firstly, I set up an initial project website for Jane Stuart-Smith’s SPADE project.  We’d had some difficulty in assigning the resources for this project but thankfully this week we were given some web space and I managed to get a website set up, create a skeleton structure for it and create the user accounts that will allow the project team to manage the content.  I also had some email discussions with the project partners about how best to handle ‘private’ pages that should be accessible to the team but no-one else.  There is still some work to be done on the website, but for the time being my work is done.

I also continued this week to work on the public interface for the database of poems for The People’s Voice project.  Last week I started on the search facility, but only progressed as far as allowing a search for a few fields, with the search results page displaying nothing more than the number of matching poems.  This week I managed to pretty much complete the search facility.  Users can now search for any combination of search boxes and on the search results page there is now a section above the results that lists what you’ve searched for.  This also includes a ‘refine your search’ button that takes the user back to the search page.  The previously selected options are now ‘remembered’ by the form, allowing the user to update what they’ve searched for.  There is also a ‘clear search boxes’ button so the user can start a fresh search.

Search results are now paginated.  Twenty results are displayed per page and if there are more results than this then ‘next’, ‘previous’ and ‘jump to page’ links are displayed above and below your search results.  If there are lots of pages some ‘jump to page’ links are omitted to stop things getting too cluttered.  Search results display the poem title, date (or ‘undated’ if there is no date), archive / library, franchise and author.  Clicking on a result will lead to the full record, but this is still to do.  I also haven’t added in the option to order the results by anything other than poem title, as I’m not sure whether this will really be of much use and it’s going to require a reworking of the way search results are queried if I am to implement it.  I still have the ‘browse’ interface to work on and the actual page that displays the poem details, and I’ll continue with this next week.

I met with Bryony Randall this week to discuss some final tweaks to the digital edition of the Virginia Woolf short story that I’ve been working on.  I made a few changes to the transcription, updated how we label ‘sic’ and ‘corr’ text in the ‘edition settings’ (these are now called ‘original’ and ‘edited’) and I changed which edition settings are selected by default.  Where previously the original text was displayed we now display the ‘edited’ text with only line breaks from the ‘original’ retained.  Bryony is going to ask for feedback from members of the Network and we’re going to aim to get things finalised by the end of the month.

I spent the rest of the week working on the Historical Thesaurus.  Last week I met with Marc and Fraser to discuss updates to the website that we were going to try and implement before ‘Kay Day’ at the end of the month.  One thing I’ve wanted to try to implement for a while now is a tree-based browse structure.  I created a visual tree browse structure using the D3.js library for the Scots Thesaurus project and doing so made me realise how useful having such a speedy way to browse the full thesaurus structure would be.

I tried a few jQuery ‘tree’ plugins and in the end I went with FancyTree (https://github.com/mar10/fancytree) because it clearly explained how to load data into nodes via AJAX when a user opens the node.  This is important for us as we can’t load all 235,000 categories into the tree at once (well, we could but it would be a bad idea).  I created a PHP script (that I will eventually integrate with the HT API) that you can pass a catid to and it will spit out a JSON file containing all of the categories and subcategories that are one level down from it.  It also checks whether each of these have child categories.  If there are child categories then the tree knows to place the little ‘expand’ icon next to the category.  When the user clicks on a category this fires off a request for the category’s children and these are then dynamically loaded into the tree.  Here’s a screenshot of my first attempt at using FancyTree:

Subcategories are highlighted with a grey background and in this version you can’t actually view the words in a category.  Also, only nouns are currently represented.  I thought at this stage that we might have to have separate trees for each part of speech, but then realised that the other parts of speech don’t have a full hierarchy so the tree would be missing lots of branches and would therefore not work.  In this version the labels only show the category heading and catnum / subcat, but I can update the labels to display additional information.  We could for example show the number of categories within each category, or somehow represent the number of words contained in the category so you can see where the big categories are. I should also be able to override the arrow icons with font awesome icons.

After creating this initial version I realised there was still a lot to be done.  For example, if we’re using this browser then we need to ensure that when you open a category the tree loads with the correct part opened.  This might be tricky to implement.  Also there’s the issue of dealing with different parts of speech.

After working on this initial version I then began to work on a version that was integrated into the HT website design.  I also followed the plugin instructions for using Font Awsome icons rather than the bitmap icons, although this took some working out.  In order to get this to work another Javascript file was required (jquery.fancytree.glyph.js) but I just couldn’t get this to work.  It kept bringing up javascript errors about a variable called ‘span’ not existing.  Eventually I commented out the line of code (relating to the ‘loading’ section) and after that everything worked perfectly.  With this new version I also added in the facility to actually view the words when you open a category, and also to switch to different parts of speech.  It’s all working very nicely, apart from subcategories belonging to other parts of speech.  I’m wondering whether I should include subcategories in the tree or whether they should just be viewable through the ‘category’ pane.  If they only appear in the tree then it is never going to be possible to view subcategories that aren’t nouns whereas if they appear in a section as they do in the current category page then they will load whenever anyone selects that PoS.  It does mean we would lose a lot of content from the tree, though.  E.g. if you find ‘Beer’ all of those subcategories of beer would no longer be in the tree and you would no longer be able to flick through them all really quickly.  This is going to need some further though.  But here’s a screenshot of the current work-in-progress version of the tree browser: