Week Beginning 15th February 2021

I spent quite a bit of time this week continuing to work on the Anglo-Norman Dictionary, creating a new ‘bibliography’ page that will replace the existing ‘source texts’ page and uses the new source text management scripts that I added to the new content management system recently.  This required rather a lot of updates as I needed to update the API to use the new source texts table and also to incorporate source text items as required, which took some time.  I then created the new ‘bibliography’ page which uses the new source text data.  There is new introductory text and each item features the new fields as requested by the editors.  ‘Dean’ references always appear, the title and author are in bold and ‘details’ and ‘notes’ appear when present.  If a source text has one or more items these are listed in numeric order, in a slightly smaller font and indented.  Brackets for page numbers are added in.  I also had to change the way the source texts were ordered as previously the list was ordered by the ‘slug’ but with the updates to the data it sometimes happens that the ‘slug’ doesn’t begin with the same letter as the siglum text and this was messing up the order and the alphabetical buttons.  Now the list is ordered by the siglum text stripped of any tags and all seems to be working fine.  I will still need to update the links from dictionary items to the bibliography when the new page goes live, and update the search facilities too, but I’ll leave this until we’re ready to launch the new page.

During the week I made a number of further tweaks to the new bibliography page based on feedback from the editors.  One big thing was to change the way the page was split up in order to allow links to specific bibliographical items to be added.  Previously the selection of a section of the bibliography based on the initial letter was handled in the browser via JavaScript.  This made it fast to switch between letters, but meant that it was not possible to easily link to a specific section of the bibliography.  I changed this so that the selection was handled on the server side.  This does mean that each time a letter is pressed on the whole page needs to reload, which is a bit slower, but it also means you can bookmark a specific letter, e.g. bibliographies beginning with ‘T’.  It also means it’s possible to link to a specific item within a page.  Each item in the page has an ID in the HTML consisting of ‘bib-‘ plus the item’s slug.  To link to this section of the page you can add a link consisting of the page URL, a hash, and then this ID.  Then when the page loads it will jump down to the relevant section.

I also had to change the way items within bibliographical entries were ordered.  These were previously ordered on the ‘numeral’ field, which contained a Roman numeral.  I’d written a bit of a hack to ensure that these were ordered correctly up to 20, but it turns out that there are some entries with more than 60 items, and some of them have non-standard numerals, such as ‘IXa’.  I decided that it would be too complicated to use the ‘numeral’ field for ordering as the contents are likely to be too inconsistent for a computer to automatically order successfully.  I therefore created a new ‘itemorder’ column in the database that holds a numerical value that decides the order of the items.  I wrote a little script that populates this field for the items already in the system and for any bibliographical entry with 20 or fewer items the order should be correct without manual intervention.  For the handful of entries with more than 20 items the editors will have to manually update the order.  I updated the DMS so that the new ‘item order’ field appears when you add or edit items, and this will need to be used for each item to rearrange the items into the order they should be in.  The new bibliography page uses the new itemorder field so updates are reflected on this page.

I also needed to update the system to correctly process multiple DEAF links, which I’d forgotten to do previously, made some changes to the ordering of items (e.g. so that entries with a number appear before entries with the same text but without a number) and added in an option to hide certain fields by adding a special character into the field.  Also for the AND I updated the XML of an entry and continued to migrate blog posts from the old blog to our new system.

I then began work on the pages of the CMS that will be used for uploading, viewing and downloading entries.  I added an option to the CMS that allows the editors to choose an entry to view all of the data stored about it and to download its XML for editing.  This consists of a textbox into which an entry’s slug can be entered.  After entering the slug and pressing ‘Go’ a page loads that lists all of the data stored about the entry in the system, such as its ID, the ID from the old system, last editor and date of last edit.  You can also access the XML of the entry if you scroll down to the ‘XML’ section of the page.  The XML is hidden in a collapsed section of the page and if you click on the header it expands.  I’ve added in styles to make it easier to read, using a very nice JavaScript library called prism.js (https://prismjs.com/).  There is also a button to download the XML.  Pressing on this prompts you to save the file, and the filename consists of the entryorder plus the entry ID.  This section of the page will also keep a record of all previous versions of the XML when a new version is uploaded into the system (once I develop the upload feature).  This will allow you to access, check and download older versions of the XML, if some mistake has been made when uploading a new version.

Beneath the XML section you can view all of the information that is extracted from the XML and used in the system for search and display purposes: forms, parts of speech, cross references, labels, citations and translations.  This is to enable the editors to check that the data extracted and used by the system is correct.  I could possibly add in options for you to edit this data, but any edits made would then be overwritten the next time an XML file is uploaded for the entry, so I’m not sure how useful this would be.  I think it would be better to limit the editing of this information to via a new XML file upload only.

However, we may want to make some of the information in this page directly editable, specifically some of the fields in the first table on the page.  The editors may want to change the lemma or homonym number, or the slug or entry order.  Similarly the editors may want to manually override the earliest date for the entry (although this would then be overwritten when a new XML version is uploaded) or change the ‘phase’ information.

The scripts to upload a new XML entry are going to take some time to get working, but at least for now you can view and download entries as required. Here’s a screenshot of how the facility works:

Also this week I dealt with a few queries about the Symposium for Seventeenth-Century Scottish Literature, which was taking place online this week and for which I had set up the website.  I also spoke to Arts IT Support about getting a test server set up for the Historical Thesaurus.  I spent a bit of time working for the Books and Borrowing project, processing images for a ledger from Edinburgh University Library, uploading these to the server and generating page records and links between pages for the ledger.  I also gave some advice to the Scots Language Policy RA about how to use the University’s VPN, spoke to Jennifer Smith about her SCOSYA follow-on funding proposal and had a chat with Thomas Clancy about how we will use GIS systems in the Iona project.

Week Beginning 8th February 2021

I was on holiday from Monday to Wednesday this week to cover the school half-term, so only worked on Thursday and Friday.  On Thursday I had a Zoom call with the Historical Thesaurus team to discuss further imports of new data from the OED and how to export our data (such as the revised category hierarchy) in a format that the OED team would be able to use.  We have a meeting with the OED the week after next so it was good to go over some of the issues and refresh my memory about where things were left off as it’s been several months since I last did any major work on the HT.  As a result of the meeting I also did some further work, namely exporting the current version of the online database and making it available for Fraser to download and access on his own PC, and updating some of the earlier scripts I’d created to generate statistics about the unmatched categories and words so that they used the most recent versions of the database.

Also this week I made some further tweaks to the SCOSYA website and created a user account for a researcher who is going to work with some of the data that is only available in the project’s CMS rather than the public website.  I also read through a new funding proposal that Wendy Anderson is involved with and have her some feedback on that and reported a couple of issues with expired SSL certificates that were affecting some websites.

I spent some time on the Books and Borrowing project on two data-related tasks.  First was to look through the new set of digitised images from Edinburgh University Library and decide what we should do with them.  Each image is of an open book, featuring both recto and verso pages in one image.  We may need to split these up into individual images, or we may just create page records that cover both pages.  I alerted the project PI Katie Halsey to the issue and the team will make a decision about which approach to take next week.  The second task was to look through the data from Selkirk library that another project had generated.  We had previously imported data for Selkirk that another researcher had compiled a few years before our project began, but recently discovered that this data did not include several thousand borrowing records of French prisoners of war, as the focus of the researcher was on Scottish borrowers.  We need these missing records and another project has agreed to let us use their data.  I had intended to completely replace the database I’d previously ingested with this new data, but on closer inspection of the new data I have a number of reservations about doing so.

The data from the other project has been compiled in an Excel spreadsheet and as far as I can tell there is no record of the ledger volume or page that each borrowing record was originally located on.  In the data we already have there is a column for ‘source ref’, containing the ledger volume (e.g. ‘volume 1’) and a column for ‘page number’, containing a unique ID for each page in the spreadsheet (e.g. ‘1010159r’).  Looking through the various sheets in the new spreadsheet there is nothing comparable to this, which is vital for our project, as borrowing records must be associated with page records, which in turn must be associated with a ledger.  It also would make it extremely difficult to trace a record back to the original physical record.

Another issue is that in our existing data the researcher has very handily used unique identifiers for readers (e.g. ‘brodie_james’), borrowing records (e.g. ‘1’) and books (e.g. ‘adam_view_religion’) that tie the various records together very nicely.  The new project’s data does not appear to use any unique identifiers to connect bits of data together.  For example, there are three ‘John Anderson’ borrowers and in the data we’re currently using these are differentiated by their IDs as ‘anderson_john’, ‘anderson_john2’ and ‘anderson_john3’.  This means it’s easy to tell which borrower appears in the borrowing records.  In the new project’s data three different fields are required to identify the borrower:  surname, forename and residence.  This data is stored in separate columns in the ‘All loans’ sheet (e.g. ‘Anderson’, ‘John’, ‘Cramalt’), but in the ‘Members’ sheet everything is joined together in one ‘Name’ field, e.g. ‘Anderson, John (Cramalt)’.  This lack of unique identifiers combined with the inconsistent manner of recording name and place will make it very difficult to automatically join up records and I’ve flagged this up with Katie for further discussion with the team.  It’s looking like we may want to try and identify the POW records from the new project’s data and amalgamate these with the data we already have, rather than replacing everything.

I also spent a bit of time on the Anglo-Norman Dictionary this week, making some changes to homonym numbers for a few entries and manually updating a couple of commentaries.  I also worked for the Dictionary of the Scots Language, preparing the SND and DOST datasets for import into the new editing system that the project is now going to use.  This was a little trickier than anticipated as initially I zipped up the data that I’d exported from the old editing system in November when I worked on the new ‘V4’ version of the online API, but we realised that this still contained duplicates that I’d stripped out when uploading the data into the new online database.  So instead I exported the XML from the online database, but it turned out that during the upload process a section of the entry XML was being removed.  This section (<meta>) contained all of the forms and URLs and my upload process exported these to a separate table and reformatted the XML so that it matched the structure that was defined during the creation of the first version of the API.  However, the new editing system requires this <meta> section so that data I’d prepared was not usable.  Instead I took the XML exported from the old editing system back in November and ran it through the script I’d written to strip out duplicates, then prepared the resulting XML dataset for transfer.  It looks like this approach has worked, but I’ll find out more next week.

Week Beginning 1st February 2021

I had two Zoom calls this week, the first on Wednesday with Kirsteen McCue to discuss a new, small project to publish a selection of musical settings to Burns poems and the second on Friday with Joanna Kopaczyk and her RA on the Scots Language Policy project to give a tutorial on how to use WordPress.

The majority of my week was divided between the Anglo-Norman Dictionary, the Dictionary of the Scots Language and the Place-names of Iona projects.  For the AND I made a few tweaks to the static content of the site and migrated some more blog posts across to the new site (these are not live yet).  I also added commentaries to more than 260 entries, which took some time to test.  I also worked on the DTD file that the editors reference from their XML editing software to ensure that all of the elements and attributes found within commentaries are ‘allowed’ in the XML.  Without doing this it was possible to add the tags in, but this would give errors in the editing software.  I also batch updated all of the entries on the site to reference the new DTD and exported all of the files, zipped them up and sent them to the editors so they can work on them as required.  I also began to think about migrating the TextBase from the old site to the new one, and managed to source the XML files that comprise this system.  It looks like it may be quite tricky to work with these as there are more than 70 book-length XML files to deal with and so far I have not managed to locate the XSLT that was originally used to process these files.

For the DSL I completed work on the new bibliography search pages that use the new ‘V4’ data.  These pages allow the authors and titles of bibliographical items to be searched, results to be viewed and individual items to be displayed.  I also made some minor tweaks to the live site and had a discussion with Ann Fergusson about transferring the project’s data to the people who have set up a new editing interface for them, something I’m hoping to be able to tackle next week.

For the Place-names of Iona project I had a discussion about implementing a new ‘work of the month’ feature and spent quite a bit of time investigating using 10-digit OS grid references in the project’s CMS.  The team need to use up to 10-digit grid references to get 1m accuracy for individual monuments, but the library I use in the CMS to automatically generate latitude and longitude from the supplied grid reference will only work with a 6-digit NGR.  The automatically generated latitude and longitude are then automatically passed to Google Maps to ascertain the altitude of the location and all of this information is stored in the database whenever a new place-name record is created or an existing record is edited.

As the library currently in use will only accept 6-digit NGRs I had to do a bit of research into alternative libraries, and I managed to find one that can accept NGRs of 2,4,6,8 or 10 digits.  Information about the library, including text boxes where you can enter an NGR and see the results can be found here: http://www.movable-type.co.uk/scripts/latlong-os-gridref.html along with an awful lot of description about the calculations and some pretty scary looking formulae.

The library is written in JavaScript, which runs in the client’s browser, whereas the previous library was written in PHP, which runs on the server.  This means I needed to change the way the CMS works – previously you’d enter an NGR and then when the form was submitted to the server the PHP library would generate the latitude and longitude whereas now the latitude and longitude need to be generated in the browser as soon as the NGR is entered into the textbox, and two further textboxes for latitude and longitude will appear in the form and will then be automatically populated with the results.

 

This does mean the person filling out the form can see the generated latitude and longitude and also tweak it if required before submitting the form, which is a potentially useful thing.  I may even be able to add a Google Map to the form so you can see (and possibly tweak) the point before submitting the form, but I’ll need to look into this further.  I also still need to work on the format of the latitude and longitude as the new library generates them with a compass point (e.g. 6.420848° W) and we need to store them as a purely decimal value (e.g. -6.420848) with ‘W’ and ‘S’ figures being negatives.

However, whilst researching this I discovered a potentially worrying thing that needs discussion with the wider team.  The way the Ordnance Survey generates latitude and longitude from their grid references was changed in 2014.  Information about this can be found in the page linked to above in the ‘Latitude/longitudes require a datum’ section.  Previously the OS used ‘OSGB-36’ to generate latitude and longitude, but in 2014 this was changed to ‘WGS84’, which is used by GPS systems.  The difference in the latitude / longitude figures generated by the two systems is about 100 metres, which is quite a lot if you’re intending to pinpoint individual monuments.

The new library has facilities to generate latitude and longitude using either the new or old systems, but defaults to the new system.  I’ve checked the output of the library we currently use and it uses the old ‘OSGB-36’ system.  This means all of the place-names in the system so far (and all those for the previous projects) have latitudes and longitudes generated using the now obsolete (since 2014) system. To give an example of the difference, the place-name A’ Mhachair in the CMS has this location: https://www.google.com/maps/place/56%C2%B019’33.2%22N+6%C2%B025’11.4%22W/@56.3258889,-6.422022,582m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d56.325885!4d-6.419828 and with the newer ‘WGS84’ system it would have this location: https://www.google.com/maps/place/56%C2%B019’32.7%22N+6%C2%B025’15.1%22W/@56.325744,-6.4230367,582m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d56.325744!4d-6.420848

So what we need to decide before I replace the old library with the new one in the CMS is whether we switch to using ‘WGS84’ or we keep using ‘OSGB-36’.  As I say, this will need further discussion before I implement any changes.

Also this week I responded to a query from Cris Sarg of the Medical Humanities Network project, spoke to Fraser Dallachy about future updates to the HT’s data from the OED, made some tweaks to the structure of the SCOSYA website for Jennifer Smith, added a plugin to the Editing Burns site for Craig Lamont and had a chat with the Books and Borrowing people about cleaning the authors data, importing the Craigston data and how to deal with a lot of borrowers that were excluded from the Selkirk data that I previously imported.

Next week I’ll be on holiday from Monday to Wednesday to cover the school half term.

 

Week Beginning 25 January 2021

I headed into the University for the first time this year on Wednesday this week to collect a new iPad that I’d ordered and to get some files from my office.  It was great to see the old place again, but it did take quite a chunk out of my day to travel there and back, especially as I’m still home-schooling either a morning or an afternoon each day at the moment too.

As with last week, I mainly divided my time this week between the Dictionary of the Scots Language, the Anglo-Norman Dictionary and the Books and Borrowing project, with a few other bits and bobs added in as well.  For the DSL I retrieved the source code for my original Scots School Dictionary app from my office so we can host this somewhere on the DSL website.  This is because the DSL have commissioned someone else to make a new School Dictionary app, which launched this week, but doesn’t include an ‘English to Scots’ feature as the old app does, so we’re going to make the old app available as a website for those people who miss the feature.  I also made a few minor tweaks to the main DSL site, and then focussed on adding bibliography search facilities to the new version of the API, a task that I’d begun last week.

I created a new table for the bibliographical data that includes the various fields used for DOST (note, author, editor, date, longtitle etc) and a field for the XML data used for SND.  I then created two further tables for searching, one that contains every author and editor name for each item (for DOST there may be different names in the author, editor, longauthor and longeditor fields while for SND there may be any number of <author> tags) and the other containing every title for each item (DOST may have different text in title and longtitle while SND items can have any number of <title> tags).  These tables allow you to search for any variant author, editor or title and find the item.

I also created two additional fields in the bibliography table that contain the ‘display author’ and ‘display title’.  These are the forms that get displayed in the search results before you click on an item to open the full bibliographical entry.  I then updated the V4 API to add in facilities to search and retrieve the bibliographies.  I didn’t have the time to connect to this API and to implement the search on the Sienna test site, which is something I hope to do next week, but the logic behind the search and display of bibliographies is all there.  There is a predictive search that will be used to generate the autocomplete list, similar to how the live site currently works:  You will be able to select whether your search is for authors, titles or both and when you start typing in some text a list of matching items will appear, e.g. typing in ‘ham’ for authors in both dictionaries will display the following all items containing ‘ham’ and when you select an item this will then perform a search for the specific text.  You will then be able to click on an item to view the full bibliography.  This is a bit different to how the live site currently works, as with these if you enter ‘ham’ and select (for example) ‘Hamilton, J,’ from the autocomplete list you are taken directly to a page that lists all of the items for the author.  However, we can’t do that any more as we no longer have unique identifiers that group bibliographical items by author.  I may be able to do something similar with the page that comes up when you select an author, but this would have to rely on the name to group items together and a name may not be unique.

For the AND I made some tweaks to the website, such as adding a link to the search page if you type some text into the ‘jump to entry’ option and no matching entries are found.  I then spent the rest of my time continuing to develop the new content management system, specifically the pages for managing source texts.  I finished work on this, adding in facilities to add, edit, browse and delete source texts from the database.  I then migrated the DTD to the new site, which is referenced by the editors’ XML editor when they work on the entry XML files.  The DTD on the old server referenced several lists of things that are then used to populate drop-down lists of options in the XML editor.  I migrated these too, making them dynamically generated from the underlying database rather than statis lists, meaning when (for example) new source texts are added to the CMS these will automatically become available when using the XML editor.

For the Books and Borrowing project I participated in the project’s Zoom call on Monday to discuss the project’s CMS and how to amalgamate the various duplicate author records that resulted from data uploads from different libraries.   After the call I made some required changes to the CMS, such as making the editor’s notes fields visible by default again, and worked on the duplicate authors matching script to add in further outputs when comparing the author names with Levenshtein ratings of 1 and 2.  I also reviewed some content that was sent to us from another library.

Also this week I responded to an email from James Caudle in Scottish Literature about a potential project he’s setting up, made a couple of changes to the Scots Language Policy website, made some tweaks to the menu structure for the Scots Syntax Atlas project and gave some advice to a post-grad student who had contacted me about setting up a corpus.