Week Beginning 24th October 2016

I had a very relaxing holiday last week and returned to work on Monday.  When I got back I spent a bit of time catching up with things, going through my emails and writing ‘to do’ lists, and once that was out of the way I settled back down into some actual work.

I started off with Mapping Metaphor.  Wendy had noticed a bug with the drop-down ‘show descriptors’ buttons in the search results page, which I swiftly fixed.  Carole had also completed work on all of the Old English metaphor data, so I uploaded this to the database.  Unfortunately, this process didn’t go as smoothly as previous data uploads due to some earlier troubles with this final set of data (this data was the dreaded ‘H27’ data, which originally was one category but which was split into several smaller categories, which caused problems for Flora’s Access database that the researchers were using).

Rather than updating rows the data upload added new ones, and this was because the ordering of cat1 and cat2 appeared to have been reversed since stage 4 of the data processing.  For example, in the database cat1 is ‘1A16’ and cat2 is ‘2C01’ but in the spreadsheet these are the other way round.  Thankfully this was consistently the case, so once identified it was easy to rectify the problem.  For Old English we now have a complete set of metaphorical connections, consisting of 2488 connections and 4985 example words.  I also fixed a slight bug in the category ordering for OE categories and replied to Wendy about a query she had received regarding access to the underlying metaphor data.  After that I updated a few FAQs and all was complete.
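
One way to guard against this kind of reversal is to treat the category pair as unordered when looking up existing rows.  The following is only a minimal sketch in Python, not the actual upload script: the function names and the ‘strength’ field are hypothetical, and it assumes a connection is identified by its pair of categories regardless of which one is listed first.

```python
def pair_key(cat1, cat2):
    """Return an order-independent key for a pair of category codes."""
    return tuple(sorted((cat1, cat2)))


def upsert_connections(existing, incoming):
    """Update rows whose category pair already exists (in either order), add the rest.

    `existing` maps pair keys to row dicts; `incoming` is a list of row dicts
    with 'cat1' and 'cat2' fields.  Returns (updated_count, inserted_count).
    """
    updated = inserted = 0
    for row in incoming:
        key = pair_key(row["cat1"], row["cat2"])
        if key in existing:
            # Keep the stored cat1/cat2 order and refresh every other field.
            stored = existing[key]
            for field, value in row.items():
                if field not in ("cat1", "cat2"):
                    stored[field] = value
            updated += 1
        else:
            existing[key] = dict(row)
            inserted += 1
    return updated, inserted
```

With a lookup like this a spreadsheet row listing ‘2C01’ then ‘1A16’ would update the stored ‘1A16’ / ‘2C01’ row rather than create a duplicate.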

Also this week I undertook some more AHRC work, which took up the best part of a day, and I replied to a request from Gavin Miller about a Medical Humanities Network mailing list.  We’ve agreed a strategy to implement such a thing, which I hope to undertake next week.  I also chatted to Chris about migrating the Scots Corpus website to a new server.  The fact that the underlying database is PostgreSQL rather than MySQL is causing a few issues here, but we’ve come up with a solution to this.

I spent a couple of days this week working on the SCOSYA project, continuing with the updates to the ‘consistency data’ views that I had begun before I went away.  I added an option to the page that allows staff to select which ‘code parents’ they want to include in the output.  This defaults to ‘all’ but you can narrow this down to any of them as required.  You can also select or deselect ‘all’ which ticks / unticks all the boxes.  The ‘in-browser table’ view now colour codes the codes based on their parent, with the colour assigned to a parent listed across the top of the page.  The colours are randomly assigned each time the page loads so if two colours are too similar a user can reload the page and different ones will take their place.
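
The random colour assignment can be illustrated with a short sketch.  This is Python rather than the code behind the page, and the function name is hypothetical; it simply picks a random hex colour for each parent, so two parents may occasionally end up with similar colours, which is why reloading is useful.

```python
import random


def assign_parent_colours(parents, rng=None):
    """Give each code parent a random hex colour such as '#3fa2c8'."""
    rng = rng or random.Random()
    return {parent: "#{:06x}".format(rng.randrange(0x1000000)) for parent in parents}
```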

Colour coding is not possible in the CSV view as CSV files are plain text and can’t have any formatting.  However, I have added the colour coding to the chart view, which colours both the bars and the code text based on each code’s parent.  I’ve also added in a little feature that allows staff to save the charts.

I then added in the last remaining required feature to the consistency data page, namely making the figures for ‘percentage high’ and ‘percentage low’ available in addition to ‘percentage mixed’.  In the ‘in-browser’ and CSV table views these appear as new rows and columns alongside ‘% mixed’, giving you the figures for each location and for each code.  In the Chart view I’ve updated the layout to make a ‘stacked percentage’ bar chart.  Each bar is the same height (100) but is split into differently coloured sections to reflect the parts that are high, low and mixed.  I’ve made ‘mixed’ appear at the bottom rather than between high and low as mixed is most important and it’s easier to track whatever is at the bottom.  This change in chart style does mean that the bars are no longer colour coded to match the parent code (as three colours are now needed per bar), but the x-axis label still has the parent code colour so you can still see which code belongs to each parent.
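
The arithmetic behind a ‘stacked percentage’ bar can be sketched as follows.  This is an illustrative Python function rather than the project’s chart code: the function name is hypothetical, and the stacking order of ‘low’ and ‘high’ above ‘mixed’ is an assumption, as only the position of ‘mixed’ at the bottom is fixed.

```python
def stacked_percentages(counts, order=("mixed", "low", "high")):
    """Convert raw counts into bar segments stacked from the bottom up.

    Returns a list of (label, bottom, height) tuples.  The heights sum to 100
    unless every count is zero, in which case all heights are 0.
    """
    total = sum(counts.get(label, 0) for label in order)
    if total == 0:
        return [(label, 0.0, 0.0) for label in order]
    segments = []
    bottom = 0.0
    for label in order:
        height = 100.0 * counts.get(label, 0) / total
        segments.append((label, bottom, height))
        bottom += height  # the next segment starts where this one ends
    return segments
```

For example, counts of 5 high, 3 low and 2 mixed give a ‘mixed’ segment from 0 to 20, ‘low’ from 20 to 50 and ‘high’ from 50 to 100.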

I spent most of the rest of the week working with the new OED data for the Historical Thesaurus.  I had previously made a page that lists all of the HT categories and notes which OED categories match up, or if there are no matching OED categories.  Fraser had suggested that it would be good to be able to approach this from the other side – starting with OED categories and finding which ones have a matching HT category and which ones don’t.  I created such a script, and I also updated both this and the other script so that the output would either display all of the categories or just those that don’t have matches (as these are the ones we need to focus on).

I then focussed on creating a script that matches up HT and OED words for each category where the HT and OED categories match up.  What the script does is as follows:

  1. Finds each HT category that has a matching OED category
  2. Retrieves the lists of HT and OED words in each
  3. For each HT word displays it and the HT ‘fulldate’ field
  4. For each HT word it then checks to see if an OED word matches.  This checks the HT’s ‘wordoed’ column against the OED’s ‘ght_lemma’ column and also the OED’s ‘lemma’ column (as I noticed sometimes the ‘ght_lemma’ column is blank but the ‘lemma’ column matches)
  5. If an OED word matches the script displays it and its dates (OED ‘sortdate’ (=start) and ‘enddate’)
  6. If there are any additional OED words in the category that haven’t been matched to an HT word these are then displayed
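
The matching in steps 4 to 6 can be sketched roughly as below.  This is a Python illustration and not the script itself: the function name is hypothetical, the words for one matched category are assumed to arrive as lists of dictionaries keyed by the column names mentioned above, and the rule that each OED word can only be matched once is my assumption.

```python
def match_category_words(ht_words, oed_words):
    """Pair HT words with OED words within one matched category.

    ht_words: dicts with 'wordoed' and 'fulldate'.
    oed_words: dicts with 'ght_lemma', 'lemma', 'sortdate' and 'enddate'.
    Returns (rows, unmatched_oed), where each row is (ht_word, oed_word_or_None).
    """
    used = set()  # indexes of OED words that have already been matched
    rows = []
    for ht in ht_words:
        target = ht["wordoed"]
        match = None
        if target:  # never match on a blank HT form
            for index, oed in enumerate(oed_words):
                if index in used:
                    continue
                # Compare against 'ght_lemma' and also 'lemma', as the former can be blank.
                if target in (oed.get("ght_lemma"), oed.get("lemma")):
                    match = oed
                    used.add(index)
                    break
        rows.append((ht, match))
    unmatched_oed = [oed for index, oed in enumerate(oed_words) if index not in used]
    return rows, unmatched_oed
```

Because the comparison is an exact string match, near misses with slightly different spellings end up in the unmatched list and would need to be handled separately.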


Note that this script has to process every word in the HT and every word in the OED thesaurus data so it’s rather a lot of data.  I tried running it on the full dataset but this resulted in Firefox crashing.  And Chrome too.  For this reason I’ve added a limit on the number of categories that are processed.  By default the script starts at 0 and processes 10,000 categories.  ‘Data processing complete’ appears at the bottom of the output so you can tell it’s finished, as sometimes a browser will just silently stop processing.  You can look at a different section of the data by passing parameters to it – ‘start’ (the row to start at) and ‘rows’ (the number of rows to process).  I’ve tried it with 50,000 categories and it worked for me, but any more than that may result in a crashed browser.  I think the output is pretty encouraging.  The majority of OED words appear to match up, and for the OED words that don’t I could create a script that lists these and we could manually decide what to do with them – or we could just automatically add them, but there are definitely some in there that should match – such as HT ‘Roche(‘s) limit’ and OED ‘Roche limit’.  After that I guess we just need to figure out how we handle the OED dates.  Fraser, Marc and I are meeting next week to discuss how to take this further.
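
The ‘start’ and ‘rows’ windowing, together with the completion marker, could look something like this.  Again this is only a Python sketch with hypothetical function names, assuming the parameters arrive as strings in a dictionary and that the categories are available as a list.

```python
def parse_window(params, default_start=0, default_rows=10000):
    """Read 'start' and 'rows' from request parameters, falling back to the defaults."""
    start = max(0, int(params.get("start", default_start)))
    rows = max(0, int(params.get("rows", default_rows)))
    return start, rows


def process_categories(categories, params, handle):
    """Process one window of categories and finish with a completion marker."""
    start, rows = parse_window(params)
    output = [handle(category) for category in categories[start:start + rows]]
    # The marker shows the run really finished rather than silently stopping.
    output.append("Data processing complete")
    return output
```

Passing ‘start’ as 10000 with the default ‘rows’ would then process the second block of 10,000 categories.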

I was left with a bit of time on Friday afternoon which I spent attempting to get the ‘Essentials of Old English’ app updated.  A few weeks ago it was brought to my attention that some of the ‘C’ entries in the glossary were missing their initial letters.  I’ve fixed this issue in the source code (it took a matter of minutes) but updating the app even for such a small fix takes rather a lot longer than this.  First of all I had to wait until gigabytes of updates had been installed for macOS and Xcode, as I hadn’t used either for a while.  After that I had to update Cordova, and then I had to update the Android developer tools.  Cordova kept failing to build my updated app because it said I hadn’t accepted the license agreement for Android, even though I had!  This was hugely frustrating, and eventually I figured out the problem was that I had the Android tools installed in two different locations.  I’d updated Android (and accepted the license agreements) in one place, but Cordova uses the tools installed in the other location.  After realising this I made the necessary updates and finally my project built successfully.  Unfortunately about three hours of my Friday afternoon had by that point been used up and it was time to leave.  I’ll try to get the app updated next week, but I know there are more tedious hoops to jump through before this tiny fix is reflected in the app stores.