Week Beginning 3rd September 2018

It was back to normality this week after last week’s ICEHL conference.  I had rather a lot to catch up with after being out of the office for four days last week and spending the fifth writing up my notes.  I spent about a day thinking through the technical issues for an AHRC proposal Matthew Sangster is putting together and then writing a first version of the Data Management Plan.  I also had email conversations with Bryony Randall and Dauvit Broun about workshops they’re putting together that they each want me to participate in, and I responded to a query from Richard Coates at Bristol, who is involved with the English Place-Name Society, about a database issue the project is experiencing.

I also met with Luca a couple of times to help him with an issue related to using OpenStreetMap maps offline.  Luca needed to set up a version of a map-based interface he has created so that it works offline, which meant downloading the map tiles for offline use.  He figured out that this is possible with the Marble desktop mapping application (https://marble.kde.org/) but couldn’t figure out where the map tiles were stored.  I helped him to track this down, and also to fix a couple of JavaScript issues he was encountering.  I was concerned that he’d have to set up a locally hosted map server for his JavaScript to connect to, but thankfully it turns out that all of the processing is done at the JavaScript end: all you need is the required directory/subdirectory structure for the map tiles and the PNG images themselves stored in this structure.  It’s good to know for future use.
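
For future reference, the standard OpenStreetMap “slippy map” scheme determines which {z}/{x}/{y}.png file in that directory structure covers a given point, which is handy when working out which tiles need downloading.  A minimal sketch (the function name is my own, not anything from Luca’s code):

```javascript
// Standard OpenStreetMap "slippy map" tile numbering: given a
// latitude/longitude and a zoom level, work out which {z}/{x}/{y}.png
// file in the offline tile directory covers that point.
function tileForLatLon(lat, lon, zoom) {
  const n = Math.pow(2, zoom);
  const x = Math.floor(((lon + 180) / 360) * n);
  const latRad = (lat * Math.PI) / 180;
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { z: zoom, x: x, y: y, path: zoom + '/' + x + '/' + y + '.png' };
}

console.log(tileForLatLon(0, 0, 1).path); // → 1/1/1.png
```

A client-side library such as Leaflet can then simply be pointed at the local directory rather than a remote tile server, e.g. `L.tileLayer('tiles/{z}/{x}/{y}.png')`, which is why no locally hosted map server is needed.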

I also responded to queries from Sarah Phelan regarding the Medical Humanities Network and from Kirsteen McCue about her Romantic National Song Network.  Eleanor Lawson also got in touch with some text for one of the redesigned Seeing Speech website pages, so I added that.  It also transpired that she had sent me a document containing lots of other updates in June, but I’d never received the email.  It turns out she had sent it to a Brian Aitken at her own institution (QMU) rather than me.  She sent the document on to me again and I’ll hopefully have some time to implement all of the required changes next week.

I also investigated an issue Thomas Clancy is having with his Saints Places website: the Google Maps used throughout the site are no longer working.  After some investigation it would appear that Google is now charging for the use of its maps service (see https://cloud.google.com/maps-platform/user-guide/), so you now have to set up an account with a credit card associated with it in order to use Google Maps on your website.  Google offers $200 worth of free usage, and I believe you can set a limit so that if usage goes over that amount the service is blocked until the next monthly period.  Pricing information can be found here: https://cloud.google.com/maps-platform/pricing/sheet/.  The maps on the Saints website are ‘Dynamic Maps’, and although the information is pretty confusing, I think the table on the above page says that the $200 of free credit would cover 28,000 loads of a map on the Saints website per month (the cost is $7 per 1,000 loads).  Every time a user loads a page with a map on it this counts as one load, so one user looking at several records will log multiple map loads.
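
As a sanity check on that figure, the arithmetic is straightforward (rates as quoted above, which may of course change; the function is purely illustrative):

```javascript
// Rough estimate of how many Dynamic Maps page loads the monthly free
// credit covers, given the rates quoted above: $200 of free credit and
// $7 per 1,000 map loads.
function freeMapLoads(creditDollars, dollarsPer1000Loads) {
  return Math.floor((creditDollars / dollarsPer1000Loads) * 1000);
}

console.log(freeMapLoads(200, 7)); // a bit over 28,500 loads per month
```

So the ~28,000 figure in Google’s table looks like a rounded-down version of this.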

This isn’t something I can fix and it has worrying implications for projects that have fixed periods of funding but need to continue to be live for years or decades after the period of funding.  It feels like a very long time since Google’s motto was “Don’t be evil” and I’m very glad I moved over to using the Leaflet mapping library rather than Google a few years ago now.

I also spent a bit of time making further updates to the new Place-names of Kirkcudbrightshire website, creating some placeholder pages for the public website, adding in the necessary logos and a background map image, updating the parish three-letter acronyms in the database and updating the map in the front-end so that it defaults to showing the right part of Scotland.

I was engaged in some App related duties this week too, communicating with Valentina Busin in MVLS about publishing a student-created app.  Pamela Scott in MVLS also contacted me to say that her ‘Molecular Methods’ app had been taken off the Android App store.  After logging into the UoG Android account I found a bunch of emails from Google saying that about 6 of our apps had been taken down because they didn’t include a ‘child-directed declaration’.  Apparently this is a new requirement: you have to tick a checkbox in the Developer console to say whether your app is primarily aimed at under-13s.  Once that’s done your app gets added back to the store.  I did this for the required apps and all was put right again about an hour later.

I spent about a day this week working on Historical Thesaurus duties.  I set up a new ‘colophon’ page that will list all of the technologies we use on the HT website, and I also returned to the ongoing task of aligning the HT and OED data.  I created new fields for the HT and OED category and word tables to contain headings / words that are stripped of all non-alphanumeric characters (including spaces) and also all occurrences of ‘ and ’ and ‘ or ’ (with spaces round them).  I also converted the text into all lower case.  This means a word such as “in spite of/unþonc/maugre/despite one’s teeth” will be stored in the field as “inspiteofunþoncmaugredespiteonesteeth”.  The idea is that it will be easier to compare HT and OED data with such extraneous information stripped out.  With this in place I then ran a script that goes through all of the unmatched categories and finds any where the oedmaincat matches the OED path, the subcat matches the OED sub, the part of speech matches and the ‘stripped’ headings match.  This has identified 1556 new matches, which I’ve now logged in the database.  This brings the total unmatched HT categories down to 10,478 (of which 1679 have no oedmaincat and presumably can’t be matched).  The total unmatched OED categories is 13,498 (of which 8406 have no pos and so will probably never match an HT category).  There are also a further 920 potential matches where the oedmaincat matches the path, the pos matches and the ‘stripped’ headings match, but the subcat numbers are different.  I’ll need to speak to Marc and Fraser about these next week.
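
The stripping itself was done in the database, but the transformation can be sketched in JavaScript (the function name is mine; the regular expressions are one way of expressing the rules described above):

```javascript
// Normalise a heading or word form for HT/OED matching: lower-case it,
// drop occurrences of " and " and " or " (with the surrounding spaces),
// then strip every character that isn't a letter or digit.  The \p{L}
// Unicode property keeps non-ASCII letters such as "þ" intact.
function stripForMatching(text) {
  return text
    .toLowerCase()
    .replace(/ and | or /g, ' ')
    .replace(/[^\p{L}\p{N}]/gu, '');
}

console.log(stripForMatching('in spite of/unþonc/maugre/despite one’s teeth'));
// → inspiteofunþoncmaugredespiteonesteeth
```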

I spent most of Friday working on setting up the system for the ‘Records of Govan Old’ crowdsourcing site for Scott Spurlock.  Although it’s not completely finished, things are beginning to come together.  It’s a system based on the ‘Scripto’ crowdsourcing tool (http://scripto.org/), which uses Omeka and MediaWiki to manage data and versioning.  The interface I’ve set up is pretty plain at the moment, but I’ve set up a couple of sample pages with placeholder text (Home and About).  It’s also possible to browse collections – currently there is only one collection (Govan Old images), but this could be used to have different collections for different manuscripts, for example.  You can then view items in the collection, or from the menu choose ‘Browse items’ to access all of them.

For now there are only two sample images in the system, taken from a related manuscript that Scott previously gave me.  Users can create a user account via MediaWiki.  If you then go to the ‘Browse items’ page and select one of the images to transcribe, you can view the image in a zoomable / pannable image viewer, view any existing transcription that’s been made and view the history of changes made, and if you press the ‘edit’ link a section opens that allows you to edit the transcription and add your own.

I’ve added in a bunch of buttons that place tags in the transcription area when they’re clicked on.  They’re TEI tags so eventually (hopefully) we’ll be able to shape the texts into valid TEI XML documents.  All updates made by users are tracked and you can view all previous versions of the transcriptions, so if anyone comes along and messes things up it’s easy to revert to an earlier version.  There’s also an admin interface where you can view the pages and ‘protect’ them, which prevents future edits being made by anyone other than admin users.
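
Each button essentially splices an opening and closing TEI tag around the current selection in the transcription box.  A simplified sketch of that splicing (the function name and the example tag are illustrative, not the actual site code):

```javascript
// Wrap the selected portion of a transcription string in a TEI tag.
// start and end are the selection offsets, as you'd get from a
// textarea's selectionStart / selectionEnd properties.
function wrapInTeiTag(text, start, end, tagName) {
  return (
    text.slice(0, start) +
    '<' + tagName + '>' +
    text.slice(start, end) +
    '</' + tagName + '>' +
    text.slice(end)
  );
}

console.log(wrapInTeiTag('John Smith was here', 0, 10, 'persName'));
// → <persName>John Smith</persName> was here
```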

There’s still a lot to be done with this.  For example, at the moment it’s possible to add any tags and HTML to the transcription, which we want to prevent for security reasons as much as anything else.  The ‘wiki’ that sits behind the transcription interface (which you see when creating an account) is also open for users to edit and mess up, so that needs to be locked down too.  I also want to update the item lists so that they display which items have not been transcribed, which have been started and which have been ‘protected’, to make it easier for users to find something to work on.  I need to get the actual images that we’ll use in the tool before I do much more with this, I reckon.
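
One possible first pass at locking the markup down would be to strip any tag that isn’t in an approved TEI set.  This is a naive regex sketch under my own assumptions (the allowed-tag list is illustrative, and a regex is no substitute for proper server-side sanitisation):

```javascript
// Remove any tag whose name isn't in the allowed TEI set.  Note that
// this only strips the tags themselves, not the text between them,
// and it is a rough client-side first pass rather than real security.
const ALLOWED_TAGS = ['persName', 'placeName', 'del', 'add', 'unclear'];

function stripDisallowedTags(text) {
  return text.replace(/<\/?([a-zA-Z]+)[^>]*>/g, function (match, name) {
    return ALLOWED_TAGS.indexOf(name) !== -1 ? match : '';
  });
}

console.log(stripDisallowedTags('<script>alert(1)</script><persName>Jo</persName>'));
// → alert(1)<persName>Jo</persName>
```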