Week Beginning 14th January 2019

I worked on a lot of different projects this week, which was a bit of a change from last week’s focus on a single project.  For the RNSN project I investigated how to create an over-arching timeline that would pull in data from all of the individual timelines I’d created previously.  I had investigated a variety of timeline libraries, but none of them really offered as good a level of functionality as the timeline.js library I was already using for the individual timelines, so I decided to focus on adapting this library to work with multiple timelines.  Thankfully the library already included support for ‘groups’, and by assigning slides to different groups they would appear on different ‘tracks’ in the timeline summary section.  For my initial version I created a ‘group’ for each of the individual timelines, and a further ‘group’ for the spreadsheet of ‘major events’ that Kirsteen had previously sent me to give context to the individual stories.  This initial version mostly worked pretty well, although I needed to tweak the CSS and some of the library’s JavaScript somewhat in order to make the group names more clearly visible in the summary section and to make the group names appear within the main body of the slides too.  On Friday I met with Kirsteen and Brianna, the project RA, to discuss our future plans for the project and also to discuss the new timeline.  Kirsteen was very happy with how the timeline was coming along, and during the meeting I proposed some further changes.  In order to reduce the amount of space taken up by the summary section I suggested we have a ‘track’ for each nation rather than each song, and I suggested we ensure there is a link to each song’s full timeline from every slide as well.  Kirsteen suggested changing the colour of the contextual slides to more clearly differentiate them from the regular slides too.  I spent some further time implementing these updates and by the end of the week things were looking good.
I’ll still need to add in more content and to check how the interface works on narrower screens, but I’m pretty happy with how things are working out.  Here’s a screenshot of the timeline as it currently looks:
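The grouping approach described above can be sketched roughly as follows.  This is only an illustration in Python, assuming the library is Knight Lab’s TimelineJS3 (whose JSON events accept a ‘group’ property); the group names and events in the usage example are made up, and the real timelines may be structured differently.

```python
def merge_timelines(timelines):
    """Combine several timelines into one, tagging each event with its source as 'group'.

    'timelines' maps a group name (hypothetical, e.g. a nation or 'Major events')
    to a list of event dicts in the TimelineJS3 JSON shape.
    """
    events = []
    for group_name, group_events in timelines.items():
        for event in group_events:
            tagged = dict(event)           # copy so the original timeline is untouched
            tagged["group"] = group_name   # events sharing a group share a track
            events.append(tagged)
    # Sort chronologically; years may be stored as strings or numbers.
    events.sort(key=lambda e: int(e["start_date"]["year"]))
    return {"events": events}


if __name__ == "__main__":
    merged = merge_timelines({
        "Song A": [{"start_date": {"year": "1850"}, "text": {"headline": "Later"}}],
        "Major events": [{"start_date": {"year": 1800}, "text": {"headline": "Earlier"}}],
    })
    for event in merged["events"]:
        print(event["group"], event["text"]["headline"])
```

Copying each event before tagging it means the individual timelines can keep using their own unmodified data.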

I also spent a bit of time working on Matthew Creasy’s Decadence and Translation project.  Matthew had digitised a number of pages of short French poems that he wanted to make available as a sort of mini digital edition that would involve a zoomable / pannable interface to the images, plus textual transcriptions and translations with embedded notes.  Thankfully I’ve had quite a bit of previous experience with such interfaces (e.g. the ‘Digital Ode’ for the New Modernist Editing project) and could use a lot of existing code.  I couldn’t make things exactly the same, as Matthew wanted people to be able to comment on each page, and as the rest of the project website is a WordPress site, it made sense to use WordPress’s commenting features.  For this reason I created a WordPress page for each digitised image, and embedded a small amount of code within each page.  The actual content was stored in a JSON file I created, and the only change needed between individual WordPress pages was to change a numerical ID.  My JavaScript code would then grab this ID, pull in the JSON file and extract the entry with the corresponding number, find the image tiles and set up an OpenLayers interface to allow the image to be zoomed, pull in the translations, transcriptions, notes and other things as HTML text and format everything for display.  WordPress would then add in its regular comment boxes and that would be that.  Matthew had sent me test information about six images and I added all of this to my JSON file.  Everything seems to be working pretty well so far, and I just need further input from Matthew now.  Here’s how things currently look:

Also this week I created an updated version of a Data Management Plan for Thomas Clancy (the fourth version and possibly the last), updated the test version of the SCOSYA atlas to limit the attributes contained in it before a class uses the interface next week, made some further (and possibly final) tweaks to the Bilingual Thesaurus and migrated the Thesaurus of Old English to our new Thesaurus domain and ensured the site and its data all worked in the new location, which thankfully it did.  I’ll need to hear back from Marc and Fraser before I make this new version ‘live’, though.  I also made a small tweak to the DSL website, started to think about the next batch of updates that will be required to link up the HT and OED data, and did some App store related duties for someone elsewhere in the University.  So all in all a pretty busy week.

Week Beginning 19th February 2018

This week marked the start of the UCU’s strike action, which I am participating in.  This meant that I only worked from Monday to Wednesday.  It was quite a horribly busy week as I tried to complete some of the more urgent things on my ‘to do’ list before the start of the strike, while other things I had intended to complete unfortunately had to be postponed.  I spent some time on Monday writing a section containing details about the technical methodology for a proposal Scott Spurlock is intending to submit to the HLF.  I can’t really say too much about it here, but it will involve crowd sourcing and I therefore had to spend time researching the technologies and workflows that might work best for the project and then writing the required text.  Also on Monday I discovered that the AHRC does now have some guidance on its website about the switchover from Technical Plans to Data Management Plans.  There are some sample materials and accompanying support documentation, which is very helpful.  This can currently be found here:  http://www.ahrc.ac.uk/peerreview/peer-review-updates-and-guidance/ although this doesn’t look like it will be a very permanent URL.  Thankfully there will be a transition period up to the 29th of March when proposals can be submitted with either a Technical Plan or a DMP.  This will make things easier for a few projects I’m involved with.

Also on Monday Gary Thoms contacted me to say there were some problems with the upload facilities for the SCOSYA project, so I spent some time trying to figure out what was going on there.  What has happened is that Google seem to have restricted access to their geocoding API, which the upload script connects to in order to get the latitude and longitude of the ‘display town’.  Instead of returning data, Google was returning an error saying we had exceeded our quota of requests.  This was because previously I was just connecting to their API without registering for an API key, which used to work just fine but now is intermittent.  Keep refreshing this page: https://maps.googleapis.com/maps/api/geocode/json?address=Aberdour+scotland and you’ll see it returns data sometimes and an error about exceeding the quota other times.

After figuring this out I created an API account for the project with Google.  If I pass the key they gave me in the URL this now bypasses the restrictions.  We are allowed up to 2,500 requests a day and up to 5000 requests in 100 seconds (that’s what they say – not sure how that works if you’re limited to 2,500 a day) so we shouldn’t encounter a quota error again.
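As a rough illustration of passing the key in the URL, here is a minimal Python sketch.  The endpoint and the ‘address’ parameter come from the URL quoted above; the ‘key’ parameter and the ‘status’ / ‘results’ response fields are how I understand Google’s geocoding API to work, and the function names and the ‘DEMO_KEY’ value are placeholders.  The real upload script is not shown here and may well be written differently.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(place, api_key):
    """Return a geocoding request URL that includes the project's API key."""
    # 'address' is the parameter used in the URL quoted above; 'key' carries the API key.
    params = {"address": place, "key": api_key}
    return GEOCODE_ENDPOINT + "?" + urlencode(params)


def extract_lat_lng(response):
    """Pull (latitude, longitude) out of a decoded response, or return None.

    Anything other than an 'OK' status (a quota error, for example) is
    treated as 'no coordinates available'.
    """
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


if __name__ == "__main__":
    print(build_geocode_url("Aberdour scotland", "DEMO_KEY"))
```

Checking the status before reading the results means a quota error is reported as a missing location rather than causing the script to fall over.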

Thankfully the errors Gary was encountering with a second file turned out to be caused by typos in the questionnaire – an invalid postcode was given.  There were issues with a third questionnaire, which was giving an error on upload without stating what the error was, which was odd as I’d added in some fairly comprehensive error handling.  After some further investigation it turned out to be caused by the questionnaire containing a postcode that didn’t actually exist.  In order to get the latitude and longitude for a postcode my scripts connect to an external API which then returns the data in the ever so handy JSON format.  However, a while ago the API I was connecting to started to go a bit flaky and for this reason I added in a connection to a second external API if the first one gave a 404.  But now the initial API I used has completely gone offline, and is taking ages to even return a 404, which was really slowing down the upload script.  Not only that but the second API didn’t handle ‘unknown’ postcode errors in the same way.  The first API returned a nice error message but the second one just returned an empty JSON file.  This meant my error handler wasn’t picking up that there was a postcode error and was thus giving no feedback.  I have now completely dropped the first API and connect directly to the second one, which speeds up the upload script dramatically.  I have also updated my error handlers so they know how to handle an empty JSON file from this API.
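The empty-JSON case can be handled along the following lines.  This is a minimal Python sketch rather than the actual upload script: the API isn’t named above, so the ‘latitude’ and ‘longitude’ field names are assumptions, as are the function and exception names.

```python
import json


class PostcodeError(Exception):
    """Raised when a postcode cannot be resolved to coordinates."""


def parse_postcode_response(body):
    """Turn a raw API response body into (latitude, longitude).

    The 'latitude' and 'longitude' keys are hypothetical; the real API's
    field names are not given in the text.
    """
    try:
        data = json.loads(body) if body.strip() else None
    except json.JSONDecodeError as exc:
        raise PostcodeError("Response was not valid JSON") from exc
    # An empty body, {} or [] all count as 'postcode not found'.
    if not data:
        raise PostcodeError("Postcode not recognised")
    try:
        return float(data["latitude"]), float(data["longitude"])
    except (KeyError, TypeError, ValueError) as exc:
        raise PostcodeError("Response did not contain coordinates") from exc


if __name__ == "__main__":
    try:
        parse_postcode_response("{}")
    except PostcodeError as error:
        print("Upload feedback:", error)
```

Raising one specific exception for every failure mode gives the caller a single place to produce feedback for the person uploading the questionnaire.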

On Tuesday I fixed a data upload error with the Thesaurus of Old English, spoke to Graeme about the AHRC’s DMPs and spent the morning working on the Advanced Search for the REELS project.  Last week I had completed the API for the advanced search and had started on the front end, and this week I managed to complete the front end for the search, including auto-complete fields where required, and supplying facilities to export the search results in CSV and JSON format.  There was a lot more to this task than I’m saying here but the upshot is that we have a search facility that can be used to build up some pretty complex queries.

On Tuesday afternoon we had a project meeting for the REELS project where I demonstrated the front end facilities and we discussed some further updates that would be required for the content management system.  I tackled some of these on Wednesday.  The biggest issue was with adding place-name elements to historical forms.  If you created a new element through the page where elements are associated with historical forms an error was encountered that caused the entire script to break and display a blank page.  Thankfully after a bit of investigation I figured out what was causing this and fixed it.  I also implemented the following:

  1. Added gender to elements
  2. Added ‘Epexegetic’ to the ‘role’ list
  3. When adding new elements to a place-name or historical form no language is selected by default, meaning entering text into the ‘element’ field searches all languages. The language appears in brackets after each element in the returned list.  Once selected the element’s language is then selected in the ‘language’ list.  You can still select a language before typing in an element to limit the search to that specific language
  4. All traces of ‘ScEng’ have been removed
  5. I’d noticed that when no element order was specified when you returned to the ‘manage elements’ page the various elements would sometimes just appear in a random order. I’ve made it so that if no element order is entered the order is always the order in which the elements were originally added.
  6. When a historical form has been given elements these now appear in the table of historical forms on the ‘edit place’ page, so you can tell which forms already have elements (and what they are) without needing to load the ‘edit a historical form’ page.
  7. Added an ‘unknown’ element. As all elements need a language I’ve assigned this to ‘Not applicable (na)’ for now.
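The ordering fallback in item 5 can be illustrated with a short Python sketch.  The field names are hypothetical, and it assumes each element has an ID that increases as elements are added; the real CMS probably handles this in its database query.

```python
def order_elements(elements):
    """Sort by explicit order where one exists, otherwise by insertion order.

    Each element is a dict with a hypothetical 'id' (assumed to increase as
    rows are added) and an 'order' that may be None.
    """
    def sort_key(element):
        has_no_order = element["order"] is None
        # Elements with an explicit order come first; the ID breaks ties,
        # so elements without an order keep the sequence they were added in.
        return (has_no_order, element["order"] or 0, element["id"])

    return sorted(elements, key=sort_key)


if __name__ == "__main__":
    unordered = [
        {"id": 3, "order": None},
        {"id": 1, "order": None},
        {"id": 2, "order": None},
    ]
    print([element["id"] for element in order_elements(unordered)])
```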

Also on Wednesday I had to spend some time investigating why an old website of mine wasn’t displaying characters properly.  This was caused by the site being moved to a new server a couple of weeks ago.  It turned out to be caused by the page fragments (of which there are several thousand) being encoded as ANSI when they need to be UTF8.  I thought it would be a simple task to batch process the files to convert them but I’m afraid doing something as simple as batch converting from ANSI to UTF8 is proving to be stupidly difficult.  I still haven’t found a way to do it.  I tried following the example in Powershell here: https://superuser.com/questions/113394/free-ansi-to-utf8-multiple-files-converter

But it turns out you can only convert to UTF8 with BOM, which adds in bad characters to the start of the file as displayed on the website.  And there’s no easy way to get it without BOM, as discussed here: https://stackoverflow.com/questions/5596982/using-powershell-to-write-a-file-in-utf-8-without-the-bom

I then followed some of the possible methods listed here: https://gist.github.com/dogancelik/2a88c81d309a753cecd8b8460d3098bc UTFCast used to offer a ‘lite’ version for free that would have worked, but now they only offer the paid version, plus a demo.  I’ve installed the demo but it only allows conversion to UTF8 with BOM as well.  I got a macro working in Notepad++ but it turns out macros are utterly pointless as you can’t set them to run on multiple files at once – you need to open each file and then play the macro each time.  I also installed the python script plugin for Notepad++ and tried to run the script listed on the above page but nothing happens at all – not even an error message.  It was all very frustrating and I had to give up due to a lack of time.  Graeme (who was also involved in this project back in the day) had an old program that can do the batch converting and he gave me a copy so I’ll try this when I get the chance.
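For what it’s worth, a standalone Python script is one possible way of doing the batch conversion.  The sketch below is not something that was tried at the time.  It assumes ‘ANSI’ here means Windows-1252 (cp1252) and that the page fragments match ‘*.html’; both are guesses, as is the folder name in the usage example.  It would be wise to run it on a copy of the files first.

```python
from pathlib import Path


def convert_to_utf8(folder, pattern="*.html", source_encoding="cp1252"):
    """Re-encode every matching file under 'folder' as UTF-8 without a BOM.

    Returns the list of files that were actually rewritten.
    """
    converted = []
    for path in sorted(Path(folder).rglob(pattern)):
        raw = path.read_bytes()
        try:
            # Skip files that are already valid UTF-8 (including plain ASCII)
            # so that they are not double-encoded.
            raw.decode("utf-8")
            continue
        except UnicodeDecodeError:
            pass
        # Raises UnicodeDecodeError if a byte is undefined in the source encoding.
        text = raw.decode(source_encoding)
        # Python's 'utf-8' codec never writes a BOM ('utf-8-sig' is the one that does).
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted


if __name__ == "__main__":
    # 'fragments' is a placeholder folder name.
    for changed in convert_to_utf8("fragments"):
        print("Converted", changed)
```

Working on raw bytes rather than opening the files in text mode leaves line endings untouched, so only the character encoding changes.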

So that was my three-day week.  Next week I’ll be on strike from Monday to Wednesday so will be back at work on Thursday.

Week Beginning 18th December 2017

This was a short week for me as I only worked from Monday to Wednesday due to Christmas coming along.  I spent most of Monday and Tuesday continuing to work on the Technical Plan for Joanna Kopaczyk’s proposal.  As it’s a project with quite a large technical component there was a lot to think about and lots of detail to try and squeeze into the maximum of four pages allowed for a Plan.  My first draft was five pages long, so I had to chop some information out and reformat things to try and bring the length down a bit, but thankfully I managed to get it within the limit whilst still making sense and retaining the important points.  I also chatted with Graeme some more about some of the XML aspects of the project and had an email conversation with Luca about it too.  It was good to get the Plan sent on to Joanna, although it’s still very much a first draft that will need some further tweaking as other aspects of the proposal are firmed up.

I had to fix an issue with the Thesaurus of Old English staff pages on Monday.  The ‘edit lexemes’ form was set to not allow words to be more than 21 characters long.  Jane Roberts had been trying to update the positioning of the word ‘(ge)mearcian mid . . . rōde’, and as this is more than 21 characters any changes made to this row were being rejected.  I’m not sure why I’d set the maximum word length to 21 as the database allows up to 60 characters in this field.  But I updated the check to allow up to 60 characters and that fixed the problem.  I also spent a bit of time on Tuesday gathering some stats for Wendy about the various Mapping Metaphor resources (i.e. the main website, the blog, the iOS app and the Android app).  I also had a chat with Jane Stuart Smith about an older but still very important site that she would like me to redesign at some point next year, and I started looking through this and thinking how it could be improved.

On Wednesday, as it was my last day before the hols, I decided to focus on something from my ‘to do’ list that would be fun.  I’d been wanting to make a timeline for the Historical Thesaurus for a while so I thought I’d look into that.  What I’ve created so far is a page through which you can pass a category ID and then see all of the words in the category in a visualisation that shows when the word was used, based on the ‘apps’ and ‘appe’ fields in the database.  When a word’s ‘apps’ and ‘appe’ fields are the same it appears as a dot in the timeline, and where the fields are different the word appears as a coloured bar showing the extent of the attested usage.  Note that more complicated date structures such as ‘a1700 + 1850–‘ are not visualised yet, but could be incorporated (e.g. a dot for 1700 then a bar from 1850 to 2000).
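The dot-or-bar rule boils down to a single comparison.  The actual visualisation is D3-based JavaScript; the Python sketch below only illustrates the decision, using the ‘apps’ and ‘appe’ field names from the database and made-up dates.

```python
def timeline_shape(apps, appe):
    """Describe how one word is drawn: a dot for a single date, a bar for a range."""
    if apps == appe:
        return {"shape": "dot", "at": apps}
    return {"shape": "bar", "from": apps, "to": appe}


if __name__ == "__main__":
    # Illustrative dates only, not real Historical Thesaurus data.
    print(timeline_shape(1600, 1600))
    print(timeline_shape(1450, 1720))
```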

When you hover over a dot or bar the word and its dates appear below the visualisation.  Eventually (if we’re going to use this anywhere) I would instead have this as a tool-tip pop-up sort of thing.

Here are a couple of screenshots of fitting examples for the festive season.  First up is words for ‘Be gluttonous’:

And here are words for ‘Excess in drinking’:

The next step with this would be to incorporate all subcategories for a category, with different shaded backgrounds for sections for each subcategory and a subcategory heading added in.  I’m not entirely sure where we’d link to this, though.  We could allow people to view the timeline by clicking on a button in the category browse page.  Or we might not want to incorporate it at all, as it might just clutter things up.  BTW, this is a D3 based visualisation created by adapting this code: https://github.com/denisemauldin/d3-timeline

That’s all from me for 2017.  Best wishes for Christmas and the New Year to one and all!

Week Beginning 27th November 2017

I was off on Tuesday this week to attend my uncle’s funeral.  I spent the rest of the week working on a number of relatively small tasks for a variety of different projects.  The Dictionary of Old English people got back to me on Monday to say they had updated their search system to allow our Thesaurus of Old English site to link directly from our word records to a search for that word on their site.  This was really great news, and I updated our site to add in the direct links.  This is going to be very useful for users of both sites.  I spent a bit more time on AHRC review duties this week, and I also had an email discussion with Joanna Kopaczyk in English Language about a proposal she is putting together.  She sent me on the materials she is working on and I read through them all and gave some feedback about the technical aspects.  I’m going to help her to write the Technical Plan for her project soon too.  I also met with Rachel Douglas from the School of Modern Languages to offer some advice on technical matters relating to a project she’s putting together.  Although Rachel is not in my School and I therefore can’t be involved in her project it was still good to be able to give her a bit of help and show her some examples of digital outputs similar to the sorts of thing she is hoping to produce.

I also spent some further time working on the integration of OED data with the Historical Thesaurus data with Fraser.  Fraser had sent me some further categories that he and a student had manually matched up, and had also asked me to write another script that picks out all of the unmatched HT categories and all of the unmatched OED categories and for each HT category goes through all of the OED categories and finds the one with the lowest Levenshtein score (an algorithm that returns a number showing how many steps it would take to turn one string into another).  My initial version of this script wasn’t ideal, as it included all unmatched OED categories and I’d forgotten that this included several thousand that are ‘top level’ categories that don’t have a part of speech and shouldn’t be matched with our categories at all.  I also realised that the script should only compare categories that have the same part of speech, as my first version was ending up with (for example) a noun category being matched up with an adjective.  I updated the script to bear these things in mind, but unfortunately the output still doesn’t look all that useful.  However, there are definitely some real matches that can be manually picked out from the list, e.g. 31890 ‘locustana pardalina or rooibaadjie’ and ‘locustana pardalina (rooibaadjie)’ and some others around there.  Also 14149 ‘applied to weapon etc’ and ‘applied to weapon, etc’.  It’s over to Fraser again to continue with this.
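The matching logic described above (lowest Levenshtein score, same part of speech only, skipping OED categories with no part of speech) can be sketched in Python as follows.  This is an illustration rather than the script itself: the dictionary keys, the OED IDs and the ‘n’ part of speech in the usage example are all made up, and the example headings are simply the pair quoted above.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_matches(ht_categories, oed_categories):
    """For each HT category, find the same-POS OED category with the lowest score."""
    matches = {}
    for ht in ht_categories:
        # Ignore OED categories without a part of speech (the 'top level' ones)
        # and those whose part of speech differs from the HT category's.
        candidates = [oed for oed in oed_categories
                      if oed["pos"] and oed["pos"] == ht["pos"]]
        if not candidates:
            continue
        best = min(candidates, key=lambda oed: levenshtein(ht["heading"], oed["heading"]))
        matches[ht["id"]] = (best["id"], levenshtein(ht["heading"], best["heading"]))
    return matches


if __name__ == "__main__":
    ht = [{"id": 14149, "pos": "n", "heading": "applied to weapon etc"}]
    oed = [
        {"id": 1, "pos": "n", "heading": "applied to weapon, etc"},
        {"id": 2, "pos": "aj", "heading": "applied to weapon etc"},
        {"id": 3, "pos": None, "heading": "applied to weapon etc"},
    ]
    print(closest_matches(ht, oed))
```

Comparing every unmatched HT category against every candidate OED category is slow for tens of thousands of rows, which is acceptable for a one-off script but would need rethinking for anything run regularly.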

I mentioned last week that I’d updated all of our WordPress sites to version 4.9, but that 4.9.1 would no doubt soon be released.  And in fact it was released this week, so I had to update all of the sites once more.  It’s a bit of a tedious task but it doesn’t really take too long – maybe about half an hour in total.  I also decided to tick an item off my long-term ‘to do’ list as I had a bit of time available.  The Mapping Metaphor site had a project blog, located at a different URL from the main site.  As the project has now ended there are no more blog posts being made so it seems a bit pointless hosting this WordPress site, and having to keep it maintained, when I could just migrate the content to the main MM website as static HTML and delete the WordPress site.  I spent some time investigating WordPress plugins that could export entire sites as static HTML, for example https://en-gb.wordpress.org/plugins/static-html-output-plugin/ and https://wordpress.org/plugins/simply-static/.  These plugins go through a WordPress site, convert all pages and posts to static HTML, pull in the WordPress file uploads folder and wrap everything up as a ZIP file.  This seemed ideal, and the tools both worked very well, but I realised they weren’t exactly what I needed.  Firstly, the Metaphor blog (which was set up before I was involved with the project) just uses page IDs in the URLs, not other sorts of permalinks.  Both the plugins don’t work with the default URL style in place, so I’d need to change the link type, meaning the new pages would have different URLs to the old pages which would be a problem for redirects.  Secondly, both plugins pull in all of the page elements, including the page design, the header and all the rest.  I didn’t actually want all of this stuff but just the actual body of the posts (plus titles and a few other details) so I could slot this into the main MM website template.  
So instead of using a plugin I realised it was probably simpler and easier if I just wrote my own little export script that grabbed just the published posts (not pages), for each getting the ID, the title, the main body, the author and the date of creation.  My script hooked into the WordPress functions to make use of the ‘wpautop’ function, which adds paragraph markup to texts, and I also replaced absolute URLs with relative ones.  I then created a temporary table to hold just this data, set my script to insert into it and then I exported this table.  I imported this into the main MM site’s database and wrote a very simple script to pull out the correct post based on the passed ID and that was that. Oh, I also copied the WordPress uploads directory across too, so images and PDFs and such things embedded in posts would continue to work.  Finally, I created a simple list of posts.  It’s exactly what was required and was actually pretty simple to implement, which is a good combination.

On Thursday I heard that the Historical Thesaurus had been awarded the ‘Queen’s Anniversary Prize for Higher Education’, which is a wonderful achievement for the project.  Marc had arranged a champagne reception on Friday afternoon to celebrate the announcement, so I spent most of the afternoon sipping champagne and eating chocolates, which was a nice way to end the week.

Week Beginning 20th November 2017

It was another unsettled week for me, as on Tuesday my uncle died.  He’d been suffering from cancer for a while and had been steadily getting worse, but we hoped he would make it until after Christmas, so it was a bit of a shock.  I was very close to him, and he introduced me to hillwalking, which is one of my favourite pastimes, so it’s all very sad.

However, I still managed to work just about a full week and get a lot done.  I’d met with Jennifer Smith several weeks ago to discuss a pilot project she was putting together that would require an online questionnaire that schoolchildren would fill in.  On Monday Jennifer phoned me up to say that the pilot school had been in touch and she would need the website in place for next Monday.  Thankfully I had intended to spend a fair amount of time this week on the SCOSYA project (another of Jennifer’s projects) so I could simply divert my time without causing too much inconvenience to anyone.

I met with Jennifer and her RA Derek Henderson on Monday to discuss what needed to be done.  When we’d met previously we’d decided that a simple Google Form would probably work well enough as the questionnaire, but after further discussions it turned out this wasn’t going to work so well.  Derek had been playing about with Google Forms but it just wasn’t flexible enough to get things laid out exactly as he wanted.  Plus, we needed to password protect the form and create different user accounts for different schools, and as a Google form is hosted on a Google server at a Google URL it’s not possible to do such a thing (I did some investigation and it is possible to set up a password, but only in JavaScript on the client side, which means anyone looking at the page source can see how to bypass it).  So for these reasons I decided I’d just set up the questionnaire myself at a subdomain on one of our servers.

As the questionnaire is to be filled in on mobile devices as much as traditional PCs I used the jQuery Mobile framework to set up the user interface.  This worked very nicely as it provides lots of great widgets for form elements.  It’s also possible (and indeed recommended) to use one single page to process multiple jQuery Mobile powered pages, which allows for nice transitions between pages, and also works very nicely with forms split over multiple pages (the questionnaire was in four parts) as it means the previous stages aren’t actually submitted but are just ‘hidden’ by the framework.  This allows the user to navigate back to the earlier stages and for all the content they’ve entered to still be there.

So, I set up a database structure for questions, answers, user groups and such things.  I designed a simple but pleasing user interface, aided by the jQuery Mobile theme roller, I picked out a couple of nice fonts for the header and the main site text, I added in some validation of certain boxes, made some boxes hidden until certain options had been selected, and wrote a small amount of PHP to handle the submission of the form and the inserting of the data into the database.  I also created a script that exports all of the data as a CSV file for Jennifer and Derek to use.

Other than this I had a further email conversation with the DSL people about future updates and made a couple of tweaks to the live website.  I also participated in an email discussion about the way the Thesaurus of Old English links out to the Dictionary of Old English.  Previously, we were able to link from a word on our site directly into the search facility on the DOE website, allowing users to go from an OE word on our site to entries about the word on the DOE website.  However, a while back the DOE unveiled a new website, which was much improved but unfortunately changed the way their search worked.  All searches are now handled by AJAX in the client’s browser and it’s not possible for us to hook into this.  However, DOE are keen on allowing us to hook in as before so I made some suggestions as to how this might be possible, and explained exactly how the old system worked.  It looks like we might be able to get something working again soon.

On Wednesday I met with Pauline Mackay to discuss a few upcoming issues.  Firstly, we talked about a new database she’s been putting together that I’m going to create an online version of.  Pauline has created this in Access and I should be able to migrate this to an online version and create a nice search / browse interface fairly easily.  Secondly, we talked about updating the ‘Editing Burns’ website to incorporate the new ‘Phase 2’.  We’re going to have some sort of top level of tabs allowing users to choose two different sites that have different colour schemes.  There’s a project meeting in December that I’ll be going to where this will be discussed further.  Thirdly, we discussed a new project Pauline is in the process of putting together that will have a small technical component.  I can’t say much more about this for now, but I’m going to help out with the Technical Plan.

I also spent a bit of time this week on AHRC review duties and I had a brief chat with Scott Spurlock about a possible way of getting his parish records project funded.  And I updated Gavin Miller’s SciFiMedHums project website to disable user registrations as these had started to attract spammers and the project doesn’t require user registrations at the moment anyway.  Oh, on Monday I was on an interview panel for a technical job in another part of the College.  Finally, I spent a bit of time upgrading all of the WordPress instances I manage as a new version of WordPress (version 4.9) was recently released.  No doubt there will be a 4.9.1 to install before too long.

Week Beginning 4th September 2017

I spent a lot of this week continuing with the redevelopment of the ARIES app and thankfully after laying the groundwork last week (e.g. working out the styles and the structure, implementing a couple of exercise types) my rate of progress this week was considerably improved.  In fact, by the end of the week I had added in all of the content and had completed an initial version of the web version of the app.  This included adding in some new quiz types, such as one that allows the user to reorder the sentences in a paragraph by dragging and dropping them, and also a simple multiple choice style quiz.  I also received some very useful feedback from members of the project team and made a number of refinements to the content based on this.

This included updating the punctuation quiz so that if you get three incorrect answers in a quiz a ‘show answer’ button is displayed.  Clicking on this puts in all of the answers and shows the ‘well done’ box.  This was rather tricky to implement as the script needed to reset the question, including removing all previous answers, ticks, and resetting the initial letter case as if you select a full stop the following letter is automatically capitalised.  I also implemented a workaround for answers where a space is acceptable.  These no longer count towards the final tally of correct answers, so leaving a space rather than selecting a comma can now result in the ‘well done’ message being displayed.  Again, this was rather tricky to implement and it would be good if you could test out this quiz thoroughly to make sure there aren’t any occasions where the quiz breaks.

I also improved navigation throughout the app.  I added ‘next’ buttons to all of the quizzes, which either take you to the next section, or to the next part of the quiz, as applicable.  I think this works much better than just having the option to return to the page the quiz was linked from.  I also added in a ‘hamburger’ button to the footer of every page within a section.  Pressing on this takes you to the section’s contents page, and I added ‘next’ and ‘previous’ buttons to the contents pages too, so you can navigate between sections without having to go back to the homepage.

I spent a bit of time fixing the drag / drop quizzes so that the draggable boxes were constrained to each exercise’s boundaries.  This seemed to work great until I got to the references quiz, which has quite long sections of draggable text.  With the constraint in place it became impossible for the part of the draggable button that triggers the drop to reach the boxes nearest the boundaries of the question as none of the button could pass the borders.  So rather annoyingly I had to remove this feature and just allow people to drag the buttons all over the page.  But dropping a button from one question into another will always give you an incorrect answer now, so it’s not too big a problem.

With all of this in place I’ll start working on the app version of the resource next week and will hopefully be able to submit it to the app stores by the end of the week, all being well.

In addition to my work on ARIES, I completed some other tasks for a number of other projects.  For Mapping Metaphor I created a couple of scripts for Wendy that output some statistics about the metaphorical connections in the data.  For the Thesaurus of Old English I created a facility to enable staff to create new categories and subcategories (previously it was only possible to edit existing categories or add / edit / remove words from existing categories).  I met with Nigel Leask and some of the Curious Travellers team on Friday to discuss some details for a new post associated with this project.  I had an email discussion with Ronnie Young about the Burns database he wants me to make an online version of.  I also met with Jane Stuart-Smith and Rachel MacDonald, who is the new project RA for the SPADE project, and set up a user account for Rachel to manage the project website.  I had a chat with Graeme Cannon about a potential project he’s helping put together that may need some further technical input and I updated the DSL website and responded to a query from Ann Ferguson regarding a new section of the site.

I also spent most of a day working on the Edinburgh Gazetteer project, during which I completed work on the new ‘keywords’ feature.  It was great to be able to do this as I had been intending to work on this last week but just didn’t have the time.  I took Rhona’s keywords spreadsheet, which had page ID in one column and keywords separated by a semi-colon in another and created two database tables to hold the information (one for information about keywords and a joining table to link keywords to individual pages).  I then wrote a little script that went through the spreadsheet, extracted the information and added it to my database.  I then set to work on adding the actual feature to the website.
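The import step could be sketched roughly as follows.  This is only an illustration in TypeScript rather than the script that was actually run: the row and table shapes (`SheetRow`, `Keyword`, `PageKeyword`) are hypothetical stand-ins for the spreadsheet columns and the two database tables.

```typescript
// One spreadsheet row: a page ID and its semi-colon separated keywords.
interface SheetRow { pageId: number; keywords: string; }
// Row of the hypothetical keyword table.
interface Keyword { id: number; name: string; }
// Row of the hypothetical joining table linking keywords to pages.
interface PageKeyword { pageId: number; keywordId: number; }

function importKeywords(rows: SheetRow[]): { keywords: Keyword[]; links: PageKeyword[] } {
  // Lookup of keyword name to ID; no prototype, so any name is a safe key.
  const idsByName: Record<string, number> = Object.create(null);
  const keywords: Keyword[] = [];
  const links: PageKeyword[] = [];
  for (const row of rows) {
    for (const raw of row.keywords.split(";")) {
      const name = raw.trim();
      if (name === "") continue; // skip empties left by trailing semi-colons
      if (!(name in idsByName)) {
        // First time this keyword is seen: give it the next ID.
        idsByName[name] = keywords.length + 1;
        keywords.push({ id: idsByName[name], name });
      }
      links.push({ pageId: row.pageId, keywordId: idsByName[name] });
    }
  }
  return { keywords, links };
}
```

Keeping the keyword text in one table and only IDs in the joining table means a typo in a keyword needs fixing in one place, which is relevant given the near-duplicate keywords mentioned below.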

The index page of the Gazetteer now has a section where all of the keywords are listed.  There are more than 200 keywords so it’s a lot of information.  Currently the keywords appear like ‘bricks’ in a scrollable section, but this might need to be updated as it’s maybe a bit much information.  If you click on a keyword a page loads that lists all of the pages that the keyword is associated with.  When you load a specific page, either from the keyword page or from the regular browse option, there’s now a section above the page image that lists the associated keywords.  Clicking on one of these loads the keyword’s page, allowing you to access any other pages that are associated with it.  It’s a pretty simple system but it works well enough.  The actual keywords need a bit of work, though, as some are too specific and there are some near duplications due to typos and things like that.  Rhona is going to send me an updated spreadsheet and I will hopefully upload this next week.

Oh yes, it was five years ago this week that I started in this post.  How time flies.

Week Beginning 26th June 2017

On Friday this week I attended the Kay Day event, a series of lectures to commemorate the work of Christian Kay.  It was a thoroughly interesting event with some wonderful talks and some lovely introductions where people spoke about the influence Christian had on their lives.  The main focus of the event was the Historical Thesaurus, and it was at this event that we officially launched the new versions of the main HT website and the Thesaurus of Old English website, which I have been working on over the past few weeks.  You can now see the new versions here http://historicalthesaurus.arts.gla.ac.uk/ and here: http://oldenglishthesaurus.arts.gla.ac.uk/.  We’ve had some really good feedback about the new versions and hopefully they will prove to be great research tools.

In the run-up to the event this week I spent some further time on last-minute tweaks to the websites. On Monday I finished my major reworking of the TOE browse structure, which I had spent quite a bit of time on towards the end of last week.  The ‘xx’ categories now all have no child categories.  This does look a little strange in some places as these categories are now sometimes the only ones at that level without child categories, and in some cases it’s fairly clear that they should have child categories (e.g. ’11 Action and Utility’ contains ’11 Action, operation’ that presumably then should contain ’11.01 Action, doing, performance’).  However, the structure generally makes a lot more sense now (no ‘weaving’ in ‘food and drink’!) and we can always work on further refinement of the tree structure at a later date.

I also updated the ‘jump to category’ section of the search page to hopefully make it clearer what these ‘t’ numbers are.  This text is also on the new HT website.  I also fixed the display of long category titles that have slashes in them.  In Firefox these were getting split up over multiple lines as you’d expect, but Chrome was keeping all of the text on one long line, thus breaking out of the box and looking a bit horrible.  I have added a little bit of code to the script that generates the category info to replace slashes with a slash followed by a zero-width space character (U+200B).  This shouldn’t change the look of the titles, but means the line will break on the slashes if the text is too long for the box.  I also fixed the issue with subcategory ‘cite’ buttons being pushed out of the title section when the subcategory titles were of a certain long length.
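The slash fix amounts to a one-line string replacement.  A minimal sketch in TypeScript (the function name is made up for the example, and the real script may well do this server-side in another language):

```typescript
// U+200B is invisible but gives the browser a permitted line-break point.
const ZERO_WIDTH_SPACE = "\u200B";

// Insert a zero-width space after every slash so long titles can wrap there.
function allowBreaksAfterSlashes(title: string): string {
  return title.replace(/\//g, "/" + ZERO_WIDTH_SPACE);
}
```

The visible text is unchanged, but a title such as 'food/drink' gains a break opportunity after the slash.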

I also noticed that the browser’s ‘back’ button wasn’t working when navigating the tree – e.g. if you click to load a new category or change the part of speech you can’t press the ‘back’ button to return to what you were looking at previously.  I’m not sure that this is a massive concern as I don’t think many people actually use the ‘back’ button much these days, but when you do press the ‘back’ button the ‘hash’ in the URL changes while the content of the page doesn’t update, unless you then press the browser’s ‘reload’ button.  I spent a bit of time investigating this and came up with a solution.  It’s not a perfect solution as all I’ve managed to do is to stop the browsing of the tree and parts of speech being added to the user’s history, so no matter how much clicking around the tree you do, if you press ‘back’ you’ll just be taken to the last non-tree page you looked at.  I think this is acceptable as the URL in the address bar still gets updated when you click around, meaning you can still copy this and share the link, and clicking around the tree and parts of speech isn’t really loading a new page anyway.  I’d say it’s better than the user pressing ‘back’ and nothing updating other than the ID in the URL, which is how it previously worked.

Marc also noted that our Google Analytics stats are not going to update now we’re using a new AJAX way to load category details.  Thankfully Google have thought about how to handle sites like ours, and I updated my code so that it submits a GA ‘hit’ whenever my ‘load category’ JavaScript runs, following the instructions here: https://developers.google.com/analytics/devguides/collection/analyticsjs/single-page-applications
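For reference, the approach in that guide boils down to two analytics.js commands: set the 'page' field to the new virtual path, then send a pageview.  The sketch below is illustrative TypeScript only; the wrapper function and the example path are assumptions, and the `ga` function is passed in as a parameter so the snippet does not depend on the analytics.js script being loaded.

```typescript
// Minimal type for the analytics.js command queue function.
type GaFunction = (command: string, ...fields: string[]) => void;

// Record a virtual pageview for content loaded via AJAX.
function trackVirtualPageview(ga: GaFunction, path: string): void {
  ga("set", "page", path);  // update the page the tracker reports
  ga("send", "pageview");   // submit the hit for that page
}

// In the browser this would be called from the 'load category' code with
// the global ga function, e.g. trackVirtualPageview(ga, "/category/?id=1"),
// where the path is a hypothetical one.
```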

There are still further things I want to do with the HT and TOE sites, e.g. I never did have the time to properly overhaul the back-end and create one unified API for handling all data requests.  That side of things is still a bit of a mess of individual scripts and I’d really like to tidy it up at some point.  Also, the way I fixed the ‘back button’ issue was to use the HTML5 ‘history’ interface to update the URL in the address bar without actually adding this change to the browser’s history (see https://developer.mozilla.org/en-US/docs/Web/API/History).  If I had the time I would investigate using this interface to use proper variables in the URL (e.g. ‘?id=1’) rather than a hash (e.g. ‘#id=1’) as hashes are only ever handled client side whereas variables can be processed on both client and server.  Before this HTML5 interface was created there was no reliable way for JavaScript to update the page URL in the address bar, other than by changing the hash.
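The History method that updates the address bar without adding a history entry is `replaceState` (as opposed to `pushState`, which does add one).  The sketch below assumes that is the call in question; the function name and the `#id=` format are illustrative, and the history object is passed in so the example also runs outside a browser.

```typescript
// The part of the browser's History interface this sketch relies on.
interface HistoryLike {
  replaceState(data: unknown, unused: string, url?: string | null): void;
}

// Show the current category in the URL without creating a new history
// entry, so 'back' returns to the last non-tree page.
function showCategoryInUrl(history: HistoryLike, categoryId: number): void {
  history.replaceState({ categoryId }, "", `#id=${categoryId}`);
}

// In the browser: showCategoryInUrl(window.history, 1);
// The same call with "?id=1" instead of "#id=1" would give a URL that the
// server can also read, as discussed above.
```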

Other than Historical Thesaurus matters, I spent some time this week on other projects.  I read through the job applications for the SPADE RA post and met with Jane to discuss these.  I also fixed a couple of issues with the SCOSYA content management system that had crept in since the system was moved to a new server a while back.  I also got my MacOS system and XCode up to date in preparation for doing more app work in the near future.

I spent the remainder of my week updating the digital edition of the Woolf short story that I’ve been working on for Bryony Randall’s ‘New Modernist Editing’ project.  Bryony had sent the URL out for feedback and we’d received quite a lot of useful suggestions.  Bryony herself had also provided me with some updated text for the explanatory notes and some additional pages about the project, such as a bibliography.

I made some tweaks to the XML transcription to fix a few issues that people had noticed.  I added in ‘Index’ as a title to the index page and I’ve added in Bryony’s explanatory text.

I relabelled ‘Edition Settings’ to ‘Create your own view’ to make it clearer what this option is.  I moved the ‘next’ and ‘previous’ buttons to midway down the left and right edges of the page, and I think this works really well as when you’re looking at the text it feels more intuitive to ‘turn the page’ at the edges of what you’re looking at.  It also frees up space for additional buttons in the top navigation bar.

I made the ‘explanatory notes’ a dotted orange line rather than blue and I removed the OpenLayers blue dot and link from the facsimile view to reduce confusion.  In the ‘create your own view’ facility I made it so that if you select ‘original text’ this automatically selects all of the options within it.  If you deselect ‘original text’ the options within are all deselected.  If ‘Edited text’ is not selected when you do this then it becomes selected.  If ‘Original text’ is deselected and you deselect ‘Edited text’ then ‘Original text’ and the options within all become selected.  This should hopefully make it more difficult to create a view of the text that doesn’t make sense.
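The selection rules above can be written down as two small state transitions.  This is a sketch in TypeScript under assumed names (`ViewState`, `toggleOriginal`, `toggleEdited` and the option names in the usage note are all hypothetical), not the edition's actual code.

```typescript
// Hypothetical state of the 'create your own view' checkboxes.
interface ViewState {
  original: boolean;                        // 'Original text'
  originalOptions: Record<string, boolean>; // the options within it
  edited: boolean;                          // 'Edited text'
}

// Return a copy of the options with every one set to the same value.
function setAllOptions(options: Record<string, boolean>, value: boolean): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  for (const name of Object.keys(options)) result[name] = value;
  return result;
}

// (De)selecting 'Original text' cascades to its options; deselecting it
// forces 'Edited text' on so that something is always displayed.
function toggleOriginal(state: ViewState, selected: boolean): ViewState {
  return {
    original: selected,
    originalOptions: setAllOptions(state.originalOptions, selected),
    edited: selected ? state.edited : true,
  };
}

// Deselecting 'Edited text' while 'Original text' is off switches
// 'Original text' and all of its options back on.
function toggleEdited(state: ViewState, selected: boolean): ViewState {
  if (!selected && !state.original) {
    return {
      original: true,
      originalOptions: setAllOptions(state.originalOptions, true),
      edited: false,
    };
  }
  return { ...state, edited: selected };
}
```

For example, starting with only 'Original text' selected and then deselecting it leaves 'Edited text' on; deselecting 'Edited text' in turn brings 'Original text' and its options back, so neither sequence ends with nothing to show.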

I also added in some new interpretations to the first handwritten note, as this is still rather undecipherable.  I created new pages for the ‘further information’, ‘how to use’ and ‘bibliography’.  These are linked to from the navigation bar of the pages of the manuscript, in addition to being linked to from the index page text.  A link appears allowing you to return to the page you were looking at if you access one of these pages from a manuscript page.  I think the digital edition is looking rather good now, and it was good to get the work on this completed before my holiday.  I can’t share the URL yet as we’re still waiting on some web space for the resource at The KEEP archives.  Hopefully this will happen by the end of July.

I will be on holiday for the next two weeks now so no further updates from me until later on in the summer.


Week Beginning 19th June 2017

I decided this week to devote some time to redevelop the Thesaurus of Old English, to bring it into line with the work I’ve been doing to redevelop the main Historical Thesaurus website.  I had thought I wouldn’t have time to do this before next week’s ‘Kay Day’ event but I decided that it would be better to tackle the redevelopment whilst the changes I’d made for the main site were still fresh in my mind, rather than coming back to it in possibly a few months’ time, having forgotten how I implemented the tree browse and things like that.  It actually took me less time than I had anticipated to get the new version up and running, and by the end of Tuesday I had a new version in place that was structurally similar to the new HT site.  We will hopefully be able to launch this alongside the new HT site towards the end of next week.

I sent the new URL to Carole Hough for feedback as I was aware that she had some issues with the existing TOE website.  Carole sent me some useful feedback, which led to me making some additional changes to the site – mainly to the tree browse structure.  The biggest issue is that the hierarchical structure of TOE doesn’t quite make sense.  There are 18 top-level categories, but for some reason that I am not at all clear about, each top-level category isn’t a ‘parent’ category but is in fact a sibling category to the ones that are one level down.  E.g. logically ’04 Consumption of food/drink’ would be the parent category of ’04.01’, ’04.02’ etc but in the TOE this isn’t the case, rather ’04.01’, ’04.02’ should sit alongside ‘04’.  This really confuses both me and my tree browse code, which expects categories ‘xx.yy’ to be child categories of ‘xx’.  This led to the tree browse putting categories where logically they belong, but within the confines of the TOE make no sense – e.g. we ended up with ’04.04 Weaving’ within ’04 Consumption of food/drink’!

To confuse matters further, there are some additional ‘super categories’ that I didn’t have in my TOE database but apparently should be used as the real 18 top-level categories.  Rather confusingly these have the same numbers as the other top-level categories.  So we now have ’04 Material Needs’ that has a child category ’04 Consumption of food/drink’ that then has ’04.04 Weaving’ as a sibling and not as a child as the number would suggest.  This situation is a horrible mess that makes little sense to a user, but is even harder for a computer program to make sense of.  Ideally we should renumber the categories in a more logical manner, but apparently this isn’t an option.  Therefore I had to hack about with my code to try and allow it to cope with these weird anomalies.  I just about managed to get it all working by the end of the week but there are a few issues that I still need to clear up next week.  The biggest one is that all of the ‘xx.yy’ categories and their child categories are currently appearing in two places – within ‘xx’ where they logically belong and beside ‘xx’ where this crazy structure says they should be placed.
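To make the mismatch concrete, here is a small TypeScript sketch of the two parent rules.  It is a guess at the logic based on the description above rather than the real tree browse code: `naiveParent` is the number-based rule the code expected, `toeParent` is one possible reading of the TOE rule, and the `super:` prefix is an invented way of telling a super category apart from the ordinary category that shares its number.

```typescript
// Number-based rule: the parent of 'xx.yy' is 'xx' (drop the last segment).
function naiveParent(catNum: string): string | null {
  const cut = catNum.lastIndexOf(".");
  return cut === -1 ? null : catNum.slice(0, cut);
}

// Assumed TOE rule: 'xx' and 'xx.yy' are siblings that both sit under the
// super category numbered 'xx'; deeper levels follow the number as usual.
function toeParent(catNum: string): string {
  const segments = catNum.split(".");
  if (segments.length <= 2) return "super:" + segments[0];
  return segments.slice(0, -1).join(".");
}
```

Under the first rule '04.04 Weaving' lands inside '04 Consumption of food/drink'; under the second both hang off the '04 Material Needs' super category.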

In addition to all this TOE madness I also spent some further time tweaking the new HT website, including updating the quick search box so the display doesn’t mess up on narrow screens, making some further tweaks to the photo gallery and making alterations to the interface.  I also responded to a request from Fraser to update one of the scripts I’d written for the HT OED data migration that we’re still in the process of working through.

In terms of non-thesaurus related tasks this week, I was involved in a few other projects.  I had to spend some time on some AHRC review duties.  I also fixed an issue that had crept into the SCOTS and CMSW Corpus websites since their migration:  the ‘download corpus as a zip’ facility was no longer working due to the PHP code using an old class to create the zip that was not compatible with the new server.  I spent some time investigating this and finding a new way of using PHP to create zip files.  I also locked down the SPADE website admin interface to IP address ranges of our partner institutions and fixed an issue with the SCOSYA questionnaire upload facility.  I also responded to a request for information about TEI XML training from a PhD student and made a tweak to a page of the DSL website.

I spent the remainder of my week looking at some app issues.  We are hopefully going to be releasing a new and completely overhauled version of the ARIES app by the end of the summer and I had been sent a document detailing the overall structure of the new site.  I spent a bit of time creating a new version of the web-based ARIES app that reflected this structure, in preparation for receiving content.  I also returned to the Metre app, which I’ve not done anything with since last year.  I added in some explanatory text and I am hopefully going to be able to start wrapping this app up and deploying it to the App and Play stores soon, although possibly not until after my summer holiday, which starts the week after next.


Week Beginning 2nd January 2017

I had a fairly easy first week back after the Christmas holidays as I was only working on the Thursday and Friday.  On Thursday I spent some time catching up with emails and other such administrative tasks.  I also spent some time preparing for a meeting I had on Friday with Alice Jenkins.  She is putting together a proposal for a project that has a rather large and complicated digital component and before the meeting I read through the materials she had sent me and wrote a few pages of notes about how the technical aspects might be tackled.  We then had a good meeting on Friday and we will be taking the proposal forward during the New Year, all being well.  I can’t say much more about it here at this stage, though.

I spent some further time on Thursday and on Friday updating the content of the rather ancient ‘Learning with the Thesaurus of Old English’ website for Carole Hough.  The whole website needs a complete overhaul but its exercises are built around an old version of the thesaurus that forms part of the resource and is quite different in its functionality from the new TOE online resource.  So for now Carole just wanted some of the content of the existing website updated and we’ll leave the full redesign for later.  This meant going through a list of changes Carole had compiled and making the necessary updates, which took a bit of time but wasn’t particularly challenging to do – so a good way to start back after the hols.

Other than these tasks I spent the remainder of the week going through the old STELLA resource STARN and migrating it to T4.  Before Christmas I had completed ‘Criticism and commentary’ and this week I completed ‘Journalism’ and made a start on ‘Language’.  However, this latter section actually has a massive amount of content tucked away in subsections and it is going to take rather a long time to get this all moved over.  Luckily there’s no rush to get this done and I’ll just keep pegging away at it whenever I have a free moment or two over the next few months.

Week Beginning 5th December 2016

I spent the majority of this week working for the SCOSYA project, in advance of our all-day meeting on Friday.  I met with Gary on Monday to discuss some additional changes he wanted made to the ‘consistency data’ view and other parts of the content management system.  The biggest update was to add a new search facility to the ‘consistency data’ page that allows you to select whether data is ‘consistent’ or ‘mixed’ based on the distance between the ratings.  Previously to work out ‘mixed’ scores you specified which scores were considered ‘low’ and which were considered ‘high’ and everything else was ‘mixed’, but this new way provides a more useful means of grouping the scores.  E.g. you can specify that a ‘mixed’ score is anything where the ratings for a location are separated by 3 or more points.  So ratings of 1 and 2 are consistent but ratings of 1 and 4 are mixed.  In addition users can state whether a pairing of ‘2’ and ‘4’ is always considered ‘mixed’.  This is because ‘2’ is generally always a ‘low’ score and ‘4’ is always a ‘high’ score, even though there are only two rating points between the scores.
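The distance-based rule can be expressed in a few lines.  The TypeScript below is a sketch of the rule as described, not the CMS code itself; the function and parameter names are made up, and it assumes each location has at least one rating.

```typescript
type Consistency = "consistent" | "mixed";

// Classify one location's ratings for an attribute.
// mixedDistance: ratings this many points apart (or more) count as mixed.
// twoAndFourAlwaysMixed: treat a 2 alongside a 4 as mixed regardless.
function classifyRatings(
  ratings: number[],
  mixedDistance: number,
  twoAndFourAlwaysMixed: boolean
): Consistency {
  const spread = Math.max(...ratings) - Math.min(...ratings);
  if (spread >= mixedDistance) return "mixed";
  const hasTwo = ratings.indexOf(2) !== -1;
  const hasFour = ratings.indexOf(4) !== -1;
  if (twoAndFourAlwaysMixed && hasTwo && hasFour) return "mixed";
  return "consistent";
}
```

With a distance of 3, ratings of 1 and 2 come out as consistent and 1 and 4 as mixed, as in the example above, while 2 and 4 are only mixed when the extra option is switched on.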

I also updated the system to allow users to focus on locations and attributes where a specific rating has been given.  Users can select a rating (e.g. 2) and the table of results only shows which attributes at each location have one or more rating of 2.  The matching cells just say ‘present’ while other attributes at each location have blank cells in the table.  Instead of %mixed, %high etc there is %present – the percentage of each location and attribute where this rating is found.

I also added in the option to view all of the ‘score groups’ for ratings – i.e. the percentage of each combination of scores for each attribute.  E.g. 10% of the ratings for Attribute A are ‘1 and 2’, 50% are ‘4 and 5’.
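One way to compute such score groups is sketched below in TypeScript.  It assumes a 'group' is the set of distinct ratings given at one location for the attribute, which is an interpretation of the description rather than something confirmed by it; the function name and input shape are likewise hypothetical.

```typescript
// ratingsPerLocation: for one attribute, the list of ratings at each location.
// Returns the percentage of locations falling into each score group,
// keyed by a label such as "4 and 5".
function scoreGroupPercentages(ratingsPerLocation: number[][]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ratings of ratingsPerLocation) {
    // Reduce the ratings to their distinct values in ascending order.
    const distinct: number[] = [];
    for (const rating of ratings) {
      if (distinct.indexOf(rating) === -1) distinct.push(rating);
    }
    distinct.sort((a, b) => a - b);
    const label = distinct.join(" and ");
    counts[label] = (counts[label] || 0) + 1;
  }
  const percentages: Record<string, number> = {};
  for (const label of Object.keys(counts)) {
    percentages[label] = (counts[label] / ratingsPerLocation.length) * 100;
  }
  return percentages;
}
```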

With these changes in place I then updated the narrowing of a consistency data search to specific attributes.  Previously the search facility allowed staff to select one or more ‘code parents’ to focus on rather than viewing the data for all attributes at once.  I’ve now extended this so that users can open up each code parent and select / deselect the individual attributes contained within.  This greatly extends the usefulness of the search tool.  I also added in another limiting facility, this time allowing the user to select or deselect questionnaires.  This can be used to focus on specific locations or to exclude certain questionnaires from a query if these are considered problematic questionnaires.

When I met with Gary on Monday he was keen to have access to the underlying SCOSYA database to maybe try running some queries directly on the SQL himself.  We agreed that I would give him an SQL dump of the database and will help him get this set up on his laptop.  I realised that we don’t have a document that describes the structure of the project database, which is not very good as without such a document it would be rather difficult for someone else to work with the system.  I therefore spent a bit of time creating an entity-relationship diagram showing the structure of the database and writing a document that describes each table, the fields contained in them and the relationships between them.  I feel much better knowing this document exists now.

On Friday we had a team meeting, involving the Co-Is for the project:  David Adger and Caroline Heycock, in addition to Jennifer and Gary.  It was a good meeting, and from a technical point of view it was particularly good to be able to demonstrate the atlas to David and Caroline and receive their feedback on it.  For example, it wasn’t clear to either of them whether the ‘select rating’ buttons were selected or deselected, which led to confusing results (e.g. thinking 4-5 was selected but actually having 1-3 selected).  This is something I will have to make a lot clearer.  We also discussed alternative visualisation styles and the ‘pie chart’ map markers I mentioned in last week’s post.  Jennifer thinks these will be just too cluttered on the map so we’re going to have to think of alternative ways of displaying the data – e.g. have a different icon for each combination of selected attribute, or have different layers that allow you to transition between different views of attributes so you can see what changes are introduced.

Other than SCOSYA related activities I completed a number of other tasks this week.  I had an email chat with Carole about the Thesaurus of Old English teaching resource.  I have now fixed the broken links in the existing version of the resource.  However, it looks like there isn’t going to be an updated version any time soon as I pointed out that the resource would have to work with the new TOE website and not the old search options that appear in a frameset in the resource.  As the new TOE functions quite differently from the old resource this would mean a complete rewrite of the exercises, which Carole understandably doesn’t want to do.  Carole also mentioned that she and others find the new TOE website difficult to use, so we’ll have to see what we can do about that too.

I also spent a bit more time working through the STELLA resources.  I spoke to Marc about the changes I’ve been making and we agreed that I should be added to the list of STELLA staff too.  I’m going to be ‘STELLA Resources Director’ now, which sounds rather grand.  I made a start on migrating the old ‘Bibliography of Scottish Literature’ website to T4, and also Jane’s ‘Accent change in Glaswegian’ resource.  I’ll try and get these completed next week.

I also completed work on the project website for Carolyn Jess-Cooke, and I’m very pleased with how this is looking now.  It’s not live yet so I can’t link to it from here at the moment.  I also spoke with Fraser about a further script he would like me to write to attempt to match up the historical thesaurus categories and the new data we received from the OED people.  I’m going to try to create the script next week and we’re going to meet to discuss it.