Week Beginning 15th August 2016

I was expecting to spend a fair amount of this week working on the Scots Syntax Atlas project, beginning to develop the atlas interface following a meeting with Gary on Monday.  Unfortunately, Gary was off sick and we had to postpone our meeting, which meant I couldn’t get started on the Atlas, as Gary and I need to meet to agree just how it should work.  However, I did meet with Flora, who is working as an administrator for the project.  I created a user account for her and showed her how to use the content management system.  I spent most of the rest of the week improving my XML and TEI skills, which is something I’ve been meaning to do for a long time.  This was prompted by the proposal that Alison Wiggins is putting together, for which I would be doing the technical work and which would involve a lot of TEI stuff.

Alison had previously given me three pages of transcriptions of a late 16th century account book that she had created using rudimentary XML.  What I wanted to do was figure out how best to convert this to TEI XML – what elements and attributes should be used, what TEI modules would be required, and other such things.  I had previously got to grips with the basics of TEI, XML and the Oxygen XML editor in the winter, whilst doing some work for the People’s Voice project, so I had the materials from this to get me started.

Initially I just played around with elements and attributes using Notepad++ and referencing the TEI documentation (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/), but I had a chat with Graeme Cannon (HATII’s XML expert) and he reminded me that if I used Oxygen I could start a new document and select a TEI P5 template and the editor would then give helpful warnings and other feedback about which elements and attributes could go where.  This was hugely useful advice.  Graeme also provided some very helpful advice on structuring the document, which was much appreciated.

After playing around with the structure of the text and figuring out which elements and attributes might suit, I used the TEI Roma tool (http://tei.oucs.ox.ac.uk/Roma/) to generate a bespoke RELAX NG schema for the transcription, which contained just the TEI modules that were required.  I say ‘bespoke’ but actually the ‘TEI for Manuscript Description’ template that Roma provided was pretty much exactly what I required.  After generating the RNG file I associated it with my XML file in place of the more generic (and externally hosted) TEI schema.  I also created a nice little CSS stylesheet for use in Oxygen’s ‘Author view’, which I set up to display separate pages, individual account entries with their own border, proper names in bold and things like that.

Going through the original transcript brought up lots of questions and I had an email conversation with Alison over the course of the week where we discussed these.  The transcription included the noting of scribal hands, and I included this in the TEI transcription using the TEI ‘hand’ attribute, which in turn links to a section of the TEI header called ‘handNotes’, where each hand and information about it is defined.  The transcription included currency values in the form of pounds, shillings and pence so I used a ‘num’ element for this, using the attribute ‘type’ to specify ‘LSD’ (there will be other currency types such as crowns and groats) with the actual value contained in the ‘n’ attribute, with the three parts separated by full stops, e.g. n="6.5.2" is six pounds, five shillings and two pence.
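As a rough illustration of why the ‘n’ attribute is handy, here is a minimal Python sketch that reads ‘num’ elements of type ‘LSD’ and converts each value to a total in pence.  The sample fragment is invented for illustration (the wording of the entry is not from the account book), the TEI namespace is left out for brevity, and the function name lsd_to_pence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element and attribute names follow the encoding
# described above, but the entry text is invented and the TEI namespace is
# omitted to keep the example short.
SAMPLE = '<p>Paid to the carrier <num type="LSD" n="6.5.2">vi li v s ii d</num></p>'


def lsd_to_pence(value):
    """Convert a 'pounds.shillings.pence' string into a total number of pence."""
    pounds, shillings, pence = (int(part) for part in value.split("."))
    # Pre-decimal sterling: 20 shillings to the pound, 12 pence to the shilling.
    return (pounds * 20 + shillings) * 12 + pence


root = ET.fromstring(SAMPLE)
totals = [
    lsd_to_pence(num.get("n"))
    for num in root.iter("num")
    if num.get("type") == "LSD"
]
print(totals)  # [1502]
```

Keeping the machine-readable value in ‘n’ means the original spelling of the sum can stay untouched in the element content, while totals can still be calculated from the attribute.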

Alison wanted to categorise each entry into one or more categories and for this I created a simple taxonomy using the TEI ‘taxonomy’ tag.  The taxonomy was located in the TEI header, within the ‘classDecl’ element, which in turn was within the ‘encodingDesc’ element.  I made a very simple, non-hierarchical taxonomy, consisting of a list of categories, which in turn had IDs and names.  I then linked to one or more of these categories from the entry ‘div’ elements.  The most obvious attribute to link from the ‘div’ to the taxonomy seemed to me to be the ‘ref’ attribute, or possibly the ‘target’ attribute.  However, neither of these attributes can be used with a ‘div’ element by default in TEI.  Instead I chose to use the ‘ana’ attribute (which is for denoting analysis of text).  It didn’t feel quite right to use this attribute, but it does sort of make sense as placing entries in categories is analysis of some sort and the attribute did allow for multiple IDs to be specified in it.
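To show how the link between an entry and the taxonomy might be followed, here is a small Python sketch that builds a lookup of categories from the header and resolves the pointers in an entry’s ‘ana’ attribute.  The category IDs and names, the type="entry" attribute on the ‘div’ and the helper name categories_for are all assumptions made for the example, and the TEI namespace is again omitted.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: the nesting of taxonomy, classDecl and encodingDesc
# follows the description above; the categories and the entry are invented.
SAMPLE = """
<TEI>
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <taxonomy>
          <category xml:id="food"><catDesc>Food and drink</catDesc></category>
          <category xml:id="travel"><catDesc>Travel</catDesc></category>
        </taxonomy>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text><body>
    <div type="entry" ana="#food #travel"><p>An example entry</p></div>
  </body></text>
</TEI>
"""

root = ET.fromstring(SAMPLE)

# Map each category ID to its human-readable name.
categories = {
    cat.get(XML_ID): cat.findtext("catDesc") for cat in root.iter("category")
}


def categories_for(div):
    """Return the category names referenced by a div's 'ana' attribute."""
    pointers = div.get("ana", "").split()  # space-separated pointers
    return [categories[pointer.lstrip("#")] for pointer in pointers]


entry = root.find(".//div[@type='entry']")
print(categories_for(entry))  # ['Food and drink', 'Travel']
```

Because ‘ana’ takes a space-separated list of pointers, a single entry can sit in several categories without any extra elements being needed.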

The other main aspect of the mark-up was proper names – people and places.  I decided to tag these using the ‘name’ element, with the ‘type’ attribute used to state whether the name is a person or a place.  There are more specific elements, such as ‘persName’, but you need to use a specific TEI module to incorporate these and I wanted to keep things simple.  There is quite a lot of information associated with names, such as titles, forenames, surnames, gender etc, so I decided that rather than store all of this in the XML file I would create the information as an Excel spreadsheet, with an ‘ID’ column in this linking in to the ‘ref’ attribute of the ‘name’ element.  This worked out pretty well, resulting in a spreadsheet containing about 50 names for the three-page transcription, some of which appeared multiple times in the text.
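One way the spreadsheet and the transcription could be checked against each other is sketched below in Python.  It assumes the spreadsheet has been exported to CSV, that the ‘ref’ values are written as pointers with a leading ‘#’, and that the column headings are ‘ID’, ‘Type’ and ‘Label’; the rows and the sample text are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical CSV export of the names spreadsheet (column names are assumed).
NAMES_CSV = """\
ID,Type,Label
n001,person,John Smith
n002,place,London
"""

# Hypothetical fragment; TEI namespace omitted for brevity.
SAMPLE = (
    '<p>Paid to <name type="person" ref="#n001">John Smith</name> at '
    '<name type="place" ref="#n002">London</name> and again to '
    '<name type="person" ref="#n001">Smith</name></p>'
)

# Index the spreadsheet rows by their ID column.
records = {row["ID"]: row for row in csv.DictReader(io.StringIO(NAMES_CSV))}

# Count how often each ID is referenced in the transcription.
root = ET.fromstring(SAMPLE)
counts = Counter(name.get("ref", "").lstrip("#") for name in root.iter("name"))

# Any ID used in the text but absent from the spreadsheet needs attention.
missing = sorted(ref for ref in counts if ref not in records)

for ref, total in counts.items():
    label = records[ref]["Label"] if ref in records else "(not in spreadsheet)"
    print(ref, label, total)
print("Missing:", missing)
```

A check like this would flag any name tagged in the text whose ID has no matching row, which becomes more useful as further pages are transcribed.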

Working with these transcriptions has been a really useful experience.  I’ve wanted to gain more experience with TEI XML for a number of years now, and while working on The People’s Voice project last winter was a great start, working on the account book transcriptions has really improved my understanding of how TEI and mark-up in general work.

I met with Alison on Friday and we went through the transcriptions, following a set of guidelines that I’d put together as a Word document over the course of the week.  She seems pretty happy with how things are working out and will now go off and transcribe more pages using the schema and guidelines that I have created.  No doubt she will encounter things that I haven’t covered this week, but when she does, a brief meeting should hopefully allow us to decide what course of action to take.

Other than this main task I spent a bit of time this week compiling a list of all of the WordPress sites I’ve set up for staff over the years.  It turns out there are 22 of them, which is rather a lot!  I’m going to use this list to ensure that I regularly upgrade the WordPress versions for each site and ensure all is well with the sites.  I began doing this on Friday, getting through about half of them.  I noticed that one site’s database was much larger than it should have been, which led me to discover that one staff member’s account had been hijacked (almost certainly due to an easy-to-crack password being used) and obfuscated JavaScript had been added to all of the pages, which seemed to be attempting to redirect the pages to malicious servers.  I cleaned up the site and reset the user’s password.  I’ll keep a more regular check on these kinds of things in future, I reckon.