Week Beginning 19th February 2018

This week marked the start of the UCU’s strike action, which I am participating in.  This meant that I only worked from Monday to Wednesday.  It was a horribly busy week as I tried to complete some of the more urgent things on my ‘to do’ list before the start of the strike, while other things I had intended to complete unfortunately had to be postponed.  I spent some time on Monday writing a section containing details of the technical methodology for a proposal Scott Spurlock is intending to submit to the HLF.  I can’t really say too much about it here, but it will involve crowdsourcing, so I had to spend time researching the technologies and workflows that might work best for the project and then writing the required text.  Also on Monday I discovered that the AHRC now has some guidance on its website about the switchover from Technical Plans to Data Management Plans.  There are some sample materials and accompanying support documentation, which is very helpful.  This can currently be found here:  http://www.ahrc.ac.uk/peerreview/peer-review-updates-and-guidance/ although this doesn’t look like it will be a very permanent URL.  Thankfully there will be a transition period up to the 29th of March during which proposals can be submitted with either a Technical Plan or a DMP.  This will make things easier for a few projects I’m involved with.

Also on Monday Gary Thoms contacted me to say there were some problems with the upload facilities for the SCOSYA project, so I spent some time trying to figure out what was going on there.  What has happened is that Google seem to have restricted access to their geocoding API, which the upload script connects to in order to get the latitude and longitude of the ‘display town’.  Instead of returning data, Google was returning an error saying we had exceeded our quota of requests.  This was because I was previously connecting to their API without registering for an API key, which used to work just fine but now only works intermittently.  Keep refreshing this page: https://maps.googleapis.com/maps/api/geocode/json?address=Aberdour+scotland and you’ll see it returns data sometimes and an error about exceeding the quota other times.

After figuring this out I created an API account for the project with Google.  If I pass the key they gave me in the URL this now bypasses the restrictions.  We are allowed up to 2,500 requests a day and up to 5,000 requests per 100 seconds (that’s what they say – not sure how that works if you’re limited to 2,500 a day), so we shouldn’t encounter a quota error again.
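
For the record, the fix amounts to attaching the key to each geocoding request and checking the status the API sends back.  Here’s a rough sketch of the idea in Python (the actual upload script isn’t written in Python, and the key below is just a placeholder for the one Google issued to the project):

    import requests

    GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
    GEOCODE_KEY = "OUR-PROJECT-KEY"  # placeholder for the project's API key

    def lookup_town(town):
        """Return the (lat, lng) of a display town, or None if geocoding fails."""
        params = {"address": town + ", Scotland", "key": GEOCODE_KEY}
        data = requests.get(GEOCODE_URL, params=params, timeout=10).json()
        # Without the key the API intermittently returns 'OVER_QUERY_LIMIT';
        # with it attached we stay within the project's daily quota.
        if data.get("status") != "OK" or not data.get("results"):
            return None
        location = data["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]

    print(lookup_town("Aberdour"))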

Thankfully the errors Gary was encountering with a second file turned out to be caused by typos in the questionnaire – an invalid postcode was given.  There were issues with a third questionnaire, which was giving an error on upload without stating what the error was, which was odd as I’d added in some fairly comprehensive error handling.  After some further investigation it turned out to be caused by the questionnaire containing a postcode that didn’t actually exist.  In order to get the latitude and longitude for a postcode my scripts connect to an external API which then returns the data in the ever so handy JSON format.  However, a while ago the API I was connecting to started to go a bit flaky, and for this reason I added in a connection to a second external API if the first one gave a 404.  But now the initial API has gone offline completely and takes ages to even return a 404, which was really slowing down the upload script.  Not only that, but the second API doesn’t handle ‘unknown’ postcode errors in the same way: the first API returned a nice error message, whereas the second one just returns an empty JSON file.  This meant my error handler wasn’t picking up that there was a postcode error and was therefore giving no feedback.  I have now completely dropped the first API and connect directly to the second one, which speeds up the upload script dramatically.  I have also updated my error handler so it knows how to handle an empty JSON file from this API.
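
To give a rough idea of what the updated handling now does, here’s a minimal sketch in Python (not the real upload script – the postcode service’s URL and field names are placeholders, as I haven’t named the actual API here):

    import requests

    # Placeholder URL: stands in for the external postcode lookup service.
    POSTCODE_API = "https://example.org/postcodes/"

    def lookup_postcode(postcode):
        """Return (lat, lng) for a postcode, or raise an error with a usable message."""
        response = requests.get(POSTCODE_API + postcode, timeout=10)
        try:
            data = response.json()
        except ValueError:
            data = None
        # The API signals an unknown postcode with an empty JSON response rather
        # than an explicit error message, so treat 'no data' as a postcode error
        # and feed that back to the person uploading the questionnaire.
        if not data or "latitude" not in data or "longitude" not in data:
            raise ValueError("Postcode '" + postcode + "' could not be found - please check the questionnaire")
        return data["latitude"], data["longitude"]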

On Tuesday I fixed a data upload error with the Thesaurus of Old English, spoke to Graeme about the AHRC’s DMPs and spent the morning working on the Advanced Search for the REELS project.  Last week I had completed the API for the advanced search and had started on the front end, and this week I managed to complete the front end for the search, including auto-complete fields where required and facilities to export the search results in CSV and JSON formats.  There was a lot more to this task than I’m saying here, but the upshot is that we have a search facility that can be used to build up some pretty complex queries.

On Tuesday afternoon we had a project meeting for the REELS project, where I demonstrated the front-end facilities and we discussed some further updates that would be required for the content management system.  I tackled some of these on Wednesday.  The biggest issue was with adding place-name elements to historical forms: if you created a new element through the page where elements are associated with historical forms, an error was encountered that caused the entire script to break and display a blank page.  Thankfully after a bit of investigation I figured out what was causing this and fixed it.  I also implemented the following:

  1. Added gender to elements
  2. Added ‘Epexegetic’ to the ‘role’ list
  3. When adding new elements to a place-name or historical form no language is selected by default, meaning entering text into the ‘element’ field searches all languages. The language appears in brackets after each element in the returned list. Once an element is selected, its language is then selected in the ‘language’ list. You can still select a language before typing in an element to limit the search to that specific language.
  4. All traces of ‘ScEng’ have been removed
  5. I’d noticed that when no element order was specified, the various elements would sometimes just appear in a random order when you returned to the ‘manage elements’ page. I’ve made it so that if no element order is entered the order is always the order in which the elements were originally added (see the sketch after this list).
  6. When a historical form has been given elements these now appear in the table of historical forms on the ‘edit place’ page, so you can tell which forms already have elements (and what they are) without needing to load the ‘edit a historical form’ page.
  7. Added an ‘unknown’ element. As all elements need a language I’ve assigned this to ‘Not applicable (na)’ for now.
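
To illustrate point 5, the ordering now behaves roughly like the sketch below.  This is illustrative Python rather than the CMS’s actual code, and it assumes each element association has an auto-incrementing ‘id’ and an optional editor-supplied ‘element_order’ – the real field names may well differ:

    def sort_elements(elements):
        """Use the editor-supplied order where one has been entered; otherwise
        fall back to the order in which the elements were originally added."""
        if all(e["element_order"] is None for e in elements):
            # No explicit order entered: use the ids assigned when the elements
            # were first added rather than whatever order the database returns.
            return sorted(elements, key=lambda e: e["id"])
        # Elements with an explicit order come first, in that order; any without
        # one keep their existing relative position at the end.
        return sorted(elements, key=lambda e: (e["element_order"] is None,
                                               e["element_order"] or 0))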

Also on Wednesday I had to spend some time investigating why an old website of mine wasn’t displaying characters properly.  This started when the site was moved to a new server a couple of weeks ago, and it turned out to be caused by the page fragments (of which there are several thousand) being encoded as ANSI when they need to be UTF-8.  I thought it would be a simple task to batch process the files to convert them, but I’m afraid doing something as seemingly simple as batch converting from ANSI to UTF-8 is proving to be stupidly difficult.  I still haven’t found a way to do it.  I tried following the PowerShell example here: https://superuser.com/questions/113394/free-ansi-to-utf8-multiple-files-converter

But it turns out you can only convert to UTF-8 with BOM, which adds bad characters to the start of each file as displayed on the website.  And there’s no easy way to output UTF-8 without the BOM, as discussed here: https://stackoverflow.com/questions/5596982/using-powershell-to-write-a-file-in-utf-8-without-the-bom

I then followed some of the possible methods listed here: https://gist.github.com/dogancelik/2a88c81d309a753cecd8b8460d3098bc  UTFCast used to offer a ‘lite’ version for free that would have worked, but now they only offer the paid version, plus a demo.  I’ve installed the demo but it only allows conversion to UTF-8 with BOM as well.  I got a macro working in Notepad++ but it turns out macros are utterly pointless for this as you can’t set them to run on multiple files at once – you need to open each file and play the macro each time.  I also installed the Python Script plugin for Notepad++ and tried to run the script listed on the above page, but nothing happens at all – not even an error message.  It was all very frustrating and I had to give up due to a lack of time.  Graeme (who was also involved in this project back in the day) had an old program that can do the batch converting and he gave me a copy, so I’ll try this when I get the chance.
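
If I do end up having to script the conversion myself, something along the lines of the sketch below should do it.  This is just a sketch: it assumes the fragments are .html files encoded as Windows-1252 (‘ANSI’) in a ‘fragments’ folder, and both the folder name and the extension are placeholders rather than the site’s actual structure:

    import pathlib

    src = pathlib.Path("fragments")  # placeholder for the folder holding the page fragments

    for path in src.rglob("*.html"):
        text = path.read_text(encoding="cp1252")
        # Python's plain 'utf-8' codec writes no byte order mark, which avoids
        # the stray characters a UTF-8-with-BOM conversion adds to each page.
        path.write_text(text, encoding="utf-8")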

So that was my three-day week.  Next week I’ll be on strike from Monday to Wednesday, so will be back at work on Thursday.