Week Beginning 16th June 2014

Back to five days a week this week, and a pretty full-on week it was too.  On Monday I had requests for updates from three separate people about different WordPress websites (Choral Burns, CogTop and ISAS 2015).  Thankfully they were all straightforward requests and I got them all sorted quickly.  For the rest of the day I continued with DSL work, focussing this time on implementing the Bibliography search.  I’ve added this as a separate search tab in the ‘Advanced Search’ page and it allows users to search the bibliography for authors, titles and full text.  I’m not entirely sure why people would want to search the bibliography as it currently stands, as there is no way to get from a bibliographical record to the dictionary entries that reference it.  However, Ann is on the case and will be giving me some feedback in the next few weeks about how the search and display options might need to be updated.  Also this week I added another of the ancillary texts, this time the ‘History of DOST’ page.  I don’t really have much more to do for DSL now until I get content and feedback from the DSL people, so I’m going to be focussing on other projects instead.

I spent some time this week working with the Scots School Dictionary data, specifically extracting it from the HTML table layout into something more usable.  This involved several stages.  First I copied and pasted the text from my web browser into Excel.  Next I saved the data as a CSV file.  I then created a MySQL table with the correct structure for the data and wrote a little PHP script to iterate over the CSV file, uploading each record into this database.  I then wrote another PHP script that generated a JSON file from this data, which I will use as the data source for the app I’ll be developing.  This process may sound fairly straightforward, but I ran into some difficulties with character encoding along the way that took some time to sort out.  For example, the space characters weren’t simple space characters but had been encoded as some kind of odd UTF-8 non-breaking space.  These looked just like normal spaces (all documents and the database were set to UTF-8 encoding), but running a query on the database that involved a space (e.g. “where word like ‘%sample word%’”) found no results, as the space supplied didn’t match the odd space that was recorded, which was very frustrating.  I eventually had to go back to the HTML data and do a find and replace on the spaces to get round the problem.
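The whitespace problem above can be sketched in a few lines (here in Python rather than the PHP I actually used, with invented field names for illustration): a no-break space looks identical to an ordinary space on screen, so a pattern built with a normal space silently fails to match, and normalising the text before it goes into the database avoids the issue.

```python
import csv
import io
import json

# A row scraped from HTML, where the "space" is actually U+00A0
# (a no-break space), not an ordinary U+0020 space.
csv_text = "word,definition\r\nsample\u00a0word,a test entry\r\n"

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # A pattern built with a normal space does not match the raw value...
    assert "sample word" not in row["word"]
    # ...so replace the look-alike no-break spaces before loading the data.
    rows.append({k: v.replace("\u00a0", " ") for k, v in row.items()})

# The cleaned rows can then be dumped as JSON for the app's data source.
print(json.dumps(rows, ensure_ascii=False))
# [{"word": "sample word", "definition": "a test entry"}]
```

The same replacement could equally be done with a find-and-replace on the source HTML, which is what I ended up doing.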

I had a couple of meetings with people this week.  The first was with Pauline and Nigel regarding maps for the Burns tours.  We want to plot the tour routes in a Google Maps-style interface and to have these available for launch alongside the prose volume.  We made a bit of progress deciding how the maps should look and who will gather the data.  The next stage will be to speak to Chris Fleet at NLS to see what can be done with the maps.  We already have static versions of the maps with the tours on them as very large TIFF files, but we’ll need geocoded versions of the maps if we want to be able to pin boxes on them and allow people to zoom in and out of them.

My second meeting was with Susan regarding her Scots Thesaurus project.  This has hit something of a snag due to the availability of technical staff, but she now has a clearer course of action.  Hopefully she will get someone to work on the project in the next couple of months, but in the meantime I’m going to help her set up the project website and some example visualisations for use at conferences and the like.

The remainder of my week was spent working on updates to the Historical Thesaurus of English.  Christian and Fraser have been working through the data, identifying sections that need to be renumbered (e.g. any sections that have a ‘00’ number).  These changes then have knock-on effects for other sections of the data (e.g. all categories at a certain level shifting their level 3 number by one).  I was given ten pages’ worth of updates to make, covering many thousands of category records.  It took the best part of two days to write the scripts and database queries that could handle the updates, test these, make the changes and then test the system after the updates were made.  We ran into a couple of problems along the way with duplicate numbers, but for the most part everything went smoothly.  I’m very glad it’s all been completed as it was a pretty tricky task.  Of course these updates do mean that the numbering of the online version no longer corresponds to the printed version, but the HT people seem OK with that.
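To give a flavour of the two kinds of change involved, here is a minimal sketch (the dotted category-number format and function names are invented for illustration; the real HT schema and my actual scripts differ): one operation drops a spurious ‘00’ part from a category number, and the other shifts the number at a given level, which is the sort of knock-on renumbering that had to be applied to whole runs of sibling categories.

```python
def drop_zero_part(catnum: str) -> str:
    """Remove any '00' components from a dotted category number."""
    return ".".join(p for p in catnum.split(".") if p != "00")


def shift_level(catnum: str, level: int, delta: int) -> str:
    """Add delta to the number at the given (1-based) level."""
    parts = catnum.split(".")
    parts[level - 1] = f"{int(parts[level - 1]) + delta:02d}"
    return ".".join(parts)


print(drop_zero_part("01.00.02"))     # "01.02"
print(shift_level("01.02.03", 3, 1))  # "01.02.04"
```

The tricky part in practice was ordering the updates so that a shifted number never collided with a number that had not yet been moved, which is where the duplicate-number problems came from.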

Next week I have some further AHRC duties to perform, and I also hope to publish updated versions of the STELLA apps.