Week beginning 1st September 2014

This week marks my second anniversary as the School of Critical Studies’ Digital Humanities Research Officer.  I have to say that it has been a really excellent year and I’ve been involved in some amazing projects over the course of it.  Redesigning the Historical Thesaurus was a great opportunity, as was creating a new interface for the Scottish Corpus.  Working towards the relaunch of the Dictionary of the Scots Language (which takes place next Friday) has been really rewarding too.  It has been wonderful to gain experience developing apps, and it is very satisfying to see the three STELLA apps I created available through the Apple App Store.  I have also really got to grips with technical plans and AHRC bids in the past year, having created plans for numerous projects, and becoming a technical reviewer for the AHRC is something I’ve been hoping to do for a long time.  Developing visualisations for the Mapping Metaphor project has also been a tremendous opportunity to learn more about data visualisation techniques, an area that is increasingly important.  Tons more than this has happened over the course of the year, but I would say the above are the highlights.  I’ve probably forgotten to mention some other things that happened this year, but that’s what these weekly reports are for after all.  Here’s hoping the coming year has lots more exciting projects in store for me!

Anyway, back to the current week.  This was mostly split between two projects: DSL and the Historical Thesaurus.  For the HT I had to return once again to the problem of duplicate categories.  I spent a large amount of time trying to figure out how best to fix the issue programmatically, including writing numerous little scripts that listed some of the troublesome rows, but I just couldn’t find the key that would let me fix things.  It looked for a long time like manual intervention would be required to fix the several hundred erroneous categories.  Marc, Fraser and I met on Monday afternoon for a bit of a brainstorming session and we finally figured out how to pick out the erroneous rows.  Many of the problematic categories were empty and incorrect main categories that had the same heading and number as a valid subcategory, but a different part of speech.  I created a little script that would pick out all main categories that had the same number and heading as a subcategory but a different part of speech and was able to identify and delete several hundred dodgy main categories (checking first that they were empty).  What a relief!  I then wrote another script to identify any subcategories that didn’t have a main category, with the bulk of these getting fixed by the removal of a duplicate t7 number.
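As a rough illustration of the logic behind those two checks (not the actual scripts, which aren't reproduced here), a sketch in Python might look like the following.  The field names (`id`, `number`, `heading`, `pos`, `is_sub`, `word_count`) are made up for the example and are not the real HT schema, and the idea that a subcategory's parent is the main category with the same number and part of speech is also an assumption.

```python
from collections import defaultdict


def find_bogus_main_categories(categories):
    """Return ids of empty main categories that mirror a subcategory.

    A main category is flagged when a subcategory exists with the same
    number and heading but a different part of speech, and the main
    category itself holds no words.  All field names are hypothetical.
    """
    # Index the parts of speech of subcategories by (number, heading).
    sub_pos = defaultdict(set)
    for cat in categories:
        if cat["is_sub"]:
            sub_pos[(cat["number"], cat["heading"])].add(cat["pos"])

    bogus = []
    for cat in categories:
        if cat["is_sub"]:
            continue
        pos_set = sub_pos.get((cat["number"], cat["heading"]))
        if not pos_set:
            continue  # no subcategory shares this number and heading
        differs = any(pos != cat["pos"] for pos in pos_set)
        if differs and cat["word_count"] == 0:  # only flag empty rows
            bogus.append(cat["id"])
    return bogus


def find_orphan_subcategories(categories):
    """Return ids of subcategories with no matching main category.

    Assumes (hypothetically) that a subcategory's parent is the main
    category with the same number and part of speech.
    """
    mains = {(c["number"], c["pos"]) for c in categories if not c["is_sub"]}
    return [
        c["id"]
        for c in categories
        if c["is_sub"] and (c["number"], c["pos"]) not in mains
    ]
```

Only returning ids, rather than deleting anything, leaves room for the "check they are empty first" step to be reviewed by eye before rows are removed.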

For future reference here is a summary of what caused the errors we have been dealing with over the past week or so:

1.  At some point during the initial redevelopment of the HT database I imported a bunch of empty categories to fill the gaps in the hierarchy.  During this process something went wrong and some subcategories ended up appearing as main categories with erroneous parts of speech.  These new main categories were all empty and were duplicates of valid subcategories.  We have hopefully now tracked all of these down (there are certainly no duplicate category numbers in the database any more).

2.  During the recent renumbering process some categories were shifted up a number and any that had a t7 number should have had the t7 column cleared of data.  Unfortunately I overlooked this, leading to the t7 column containing the same value as the t6 column.  As far as I can tell these have now all been sorted.
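The second cause could be checked for with something like the sketch below, which lists rows whose t7 value repeats the t6 value.  The dictionary keys are hypothetical stand-ins for the real columns, and since a genuine category could in principle have the same number at both levels, the output should be treated as candidates for review rather than confirmed errors.

```python
def rows_with_repeated_t7(rows):
    """Return ids of rows whose t7 value is set and equals their t6 value.

    Each row is a dict with hypothetical keys "id", "t6" and "t7";
    an unset t7 is represented as None.
    """
    return [
        row["id"]
        for row in rows
        if row["t7"] is not None and row["t7"] == row["t6"]
    ]
```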

After getting the data in order I proceeded to integrate the thematic headings that Fraser has been preparing for the SAMUELS project.  Rather than add these as new columns to the category table, I have created a new table for this data.  This is to avoid duplication of data.  There are just over 4000 thematic headings and these are applied to almost a quarter of a million categories, so storing the 7 new thematic heading columns in every category row would have resulted in a lot of duplication.  I managed to get the 4000 headings uploaded and add in the links to the categories, but as always there were some issues.  I’ve emailed Fraser about these and I should be able to get the data all finalised next week.
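To show the general shape of this kind of design, here is a minimal sketch using Python's built-in `sqlite3` module.  The table and column names, the sample rows and the use of SQLite are all assumptions made for the example; the real HT tables will differ.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a separate thematic heading table.

    Each category row stores only a reference to its thematic heading,
    so the heading text is held once however many categories use it.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE thematic_heading (
            id      INTEGER PRIMARY KEY,
            code    TEXT NOT NULL,
            heading TEXT NOT NULL
        );
        CREATE TABLE category (
            id                  INTEGER PRIMARY KEY,
            heading             TEXT NOT NULL,
            thematic_heading_id INTEGER REFERENCES thematic_heading(id)
        );
        """
    )
    # Made-up sample data, purely for illustration.
    conn.execute(
        "INSERT INTO thematic_heading VALUES (1, 'T1', 'Example theme')"
    )
    conn.executemany(
        "INSERT INTO category VALUES (?, ?, ?)",
        [
            (10, "First category", 1),
            (11, "Second category", 1),
            (12, "Unlinked category", None),
        ],
    )
    return conn


def categories_for_theme(conn, theme_id):
    """Return the headings of all categories linked to one thematic heading."""
    rows = conn.execute(
        "SELECT c.heading FROM category AS c "
        "JOIN thematic_heading AS t ON t.id = c.thematic_heading_id "
        "WHERE t.id = ? ORDER BY c.id",
        (theme_id,),
    )
    return [row[0] for row in rows]
```

With this layout, correcting a thematic heading means updating one row rather than every category that uses it.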

On to DSL now.  This week I managed to get through pretty much all of the outstanding items on my ‘to do’ list and the site is just about ready for its launch next week.  I had another sizable chunk of text to migrate from the old site to the new one this week – ‘The Scots Language’.  Thankfully the markup of this section of the old site was nowhere near as bad as the ‘History of Scots’ section and didn’t require a lengthy and tedious complete retagging, just some tweaking with regards to paragraphs, headings and footnotes.  It still took some time but it’s all done now.  I also added in maps of Scotland and England from the original printed dictionaries as pannable, scrollable OpenLayers.js map images, which I think is rather nice.  I added in the content of the ‘cite’ boxes too.  The last bits of content were also added, for example the abbreviations page.

I also completed some more interesting, more ‘developery’ tasks this week, such as updating the ‘add yogh’ feature so that it adds the yogh character at the cursor position in the text box rather than at the end of the text, and adding in the new ‘supplemental’ information that now gets returned for certain entries via the API.  I added in some error messages and warnings too.  If a search reaches the maximum number of results permissible by the API, the API now returns an HTTP ‘partial content’ status code.  My scripts now recognise this and display a warning.  Similarly, if the API is offline it will send a 503 server error and in such an event the front end now displays a warning message (here’s hoping this one won’t be seen very often!).  I think we’re pretty much all ready to switch the DNS to point to the new site, which will take place some time next week.
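The logic of both changes is simple enough to sketch.  The example below is in Python purely for illustration; the real feature runs in the site's front end, and the function names, the warning wording and the use of the lowercase yogh (U+021D) are all assumptions.  206 is the standard HTTP code for ‘partial content’ and 503 is ‘service unavailable’.

```python
YOGH = "\u021d"  # lowercase yogh; which form the site inserts is assumed


def insert_at_cursor(text, cursor, char=YOGH):
    """Insert char into text at the cursor position.

    Returns the new text and the new cursor position, which sits just
    after the inserted character.  Out-of-range cursors are clamped.
    """
    cursor = max(0, min(cursor, len(text)))
    return text[:cursor] + char + text[cursor:], cursor + len(char)


def warning_for_status(status):
    """Map an HTTP status code from the API to a warning, or None.

    The message wording here is hypothetical.
    """
    if status == 206:
        return "Your search hit the maximum number of results; only some are shown."
    if status == 503:
        return "The dictionary service is currently unavailable."
    return None
```

Returning the new cursor position alongside the text lets the caller put the cursor back where the user expects it, so typing can carry on straight after the inserted character.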

Some other tasks I carried out this week included finally getting round to completing my PDR form and sending it to Jennifer, speaking to Pauline Mackay about another couple of projects she’s hoping to get funding for, agreeing to review yet another AHRC bid and fixing the maps on the Burns website, which had stopped working due to OpenLayers changing the location of their scripts (I’m using local versions now so that won’t happen again).