Week Beginning 5th April 2021

This week began with Easter Monday, which was a holiday.  I’d also taken Tuesday and Thursday off to cover some of the Easter school holidays, so it was a two-day working week for me.  I spent some of this time continuing to download and process images of library register books for the Books and Borrowing project, including 14 from St Andrews and several further books from Edinburgh.  I was also in communication with one of the people responsible for the Dictionary of the Scots Language’s new editor interface regarding exporting new data from this interface and importing it into the DSL’s website.  I was sent a ZIP file containing a sample of the data for SND and DOST, plus a sample of the bibliographical data, with some information on the structure of the files and some points for discussion.

I looked through all of the files and considered how I might be able to incorporate the data into the systems that I created for the DSL’s website.  I should be able to run the new dictionary XML files through my upload script with only a few minor modifications required.  It’s also really great that the bibliographies and cross references are getting sorted via the new Editor interface.  One point of discussion is that the new editor interface has generated new IDs for the entries, and the old IDs are not included.  I reckoned that it would be good if the old IDs were included in the XML as well, just in case we ever need to match up the current data with older datasets.  I did notice that the old IDs already appeared to be included in the <url> fields, but after discussion we decided that it would be safer to include them as an attribute of the <entry> tag, e.g. <entry oldid="snd848"> or something like that, which is what will happen when I receive the full dataset.

There are also new labels for entries, stating when and how the entry was prepared.  The actual labels are stored in a spreadsheet and a numerical ID appears in the XML to reference a row in the spreadsheet.  This method of dealing with labels seems fine to me – I can update my system to use the labels from the spreadsheet and display the relevant labels depending on the numerical codes in the entry XML.  I reckon it’s probably better not to store the actual labels in the XML as this saves space and makes it easier to change the label text, if required, as it’s then only stored in a single place.
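In outline, the lookup could work something like this (a JavaScript sketch with entirely hypothetical label text and function names, not the actual spreadsheet contents):

```javascript
// Sketch of resolving the numeric label codes found in entry XML to display
// text, assuming the spreadsheet has been exported to an id => label lookup.
// The label wording here is invented purely for illustration.
const labelLookup = new Map([
  [1, 'Prepared by the original editors'],
  [2, 'Revised for the online edition'],
]);

// Given the numeric codes from an entry, return the display labels, skipping
// any code with no matching row in the spreadsheet.
function resolveLabels(codes) {
  return codes
    .filter((code) => labelLookup.has(code))
    .map((code) => labelLookup.get(code));
}
```

Changing a label's wording then only means editing the spreadsheet row, with no need to touch the entry XML.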

The bibliographies are looking good in the sample data, but I pointed out that it might be handy to have a reference of the old bibliographical IDs in the XML, if that’s possible.  There were also spurious xmlns="" attributes in the new XML, but these shouldn’t pose any problems and I said that it’s ok to leave them in.  Once I receive the full dataset with some tweaks (e.g. the inclusion of old IDs) then I will do some further work on this.

I spent most of the rest of my available time working on the new Comparative Kingship place-names systems.  I completed work on the Scotland CMS, including adding in the required parishes and former parishes.  This means my place-name system has now been fully modernised and uses the Bootstrap framework throughout, which looks a lot better and works more effectively on all screen dimensions.

I also imported the data from GB1900 for the relevant parishes.  There are more than 10,000 names, although a lot of these could be trimmed out – lots of ‘F.P.’ for footpath etc.  It’s likely that the parishes listed are rather broader than the study will be.  All the names in and around St Andrews are in there, for example.  In order to generate an altitude for each of the names imported from GB1900 I had to run a script I’d written that passes the latitude and longitude for each name in turn to Google Maps, which then returns elevation data.  I had to limit the frequency of submissions to one every few seconds, otherwise Google blocks access, so it took rather a long time for the altitudes of more than 10,000 names to be gathered, but the process completed successfully.
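The throttling approach can be sketched like this (a hypothetical JavaScript rendering, not my actual script; the Elevation API endpoint is Google’s documented one, but the key and field names are placeholders):

```javascript
// Sketch of gathering altitudes one name at a time, pausing between requests
// so Google doesn't block access. API_KEY and the name/id fields are
// placeholders, not real values.
const API_KEY = 'YOUR_KEY_HERE';

function elevationUrl(lat, lng) {
  return 'https://maps.googleapis.com/maps/api/elevation/json' +
    `?locations=${lat},${lng}&key=${API_KEY}`;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function gatherAltitudes(names, delayMs = 3000) {
  const results = [];
  for (const name of names) {
    const res = await fetch(elevationUrl(name.lat, name.lng));
    const data = await res.json();
    results.push({ id: name.id, altitude: data.results?.[0]?.elevation ?? null });
    await sleep(delayMs); // throttle: one request every few seconds
  }
  return results;
}
```

At one request every three seconds, 10,000 names takes over eight hours, which is why the process ran for so long.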

Also this week I dealt with an issue with the SCOTS corpus, which had broken (the database had gone offline), and helped Raymond at Arts IT Support to investigate why the Anglo-Norman Dictionary server had been blocking uploads to the dictionary management system when thousands of files were added to the upload form.  It turns out that while the Glasgow IP address range had been added to the whitelist, the VPN’s IP address range hadn’t, which is why uploads were being blocked.

Next week I’m also taking a couple of days off to cover the Easter School holidays, and will no doubt continue with the DSL and Comparative Kingship projects then.

Week Beginning 22nd March 2021

I continued to develop the ‘Dictionary Management System’ for the Anglo-Norman Dictionary this week, following on with the work I began last week to allow the editors to drag and drop sets of entry XML files into the system.  I updated the form to add in another option underneath the selection of phase statement called ‘Phase Statements for existing records’.  Here the editor can choose whether to retain existing statements or replace them.  If ‘retain’ is selected then any XML entries attached to the form that either have an existing entry ID in their filename or have a slug that matches an existing entry in the system will retain whatever phase statement the existing entry has, no matter what phase statement is selected in the form.  The phase statement selected in the form will still be applied to any XML entries attached to the form that don’t have an existing entry in the system.  Selecting ‘replace existing statements’ will ignore all phase statements of existing entries and will overwrite them with whatever phase statement is selected in the form.  I also updated the system so that it extracts the earliest date for an entry at the point of upload.  I added two new columns to the holding area (for earliest date and the date that is displayed for this) and have ensured that the display date appears on the ‘review’ page too.  In addition, I added in an option to download the XML of an entry in the holding area, if it needs further work.
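The retain / replace logic boils down to something like this (a sketch with hypothetical function names and phase statement values):

```javascript
// Sketch of the phase-statement decision described above. 'retain' and
// 'replace' mirror the two form options; the statement values here are
// invented for illustration.
function phaseStatementFor(formStatement, mode, existingEntry) {
  // Entries with no existing record always receive the form's statement.
  if (!existingEntry) return formStatement;
  // 'retain' keeps whatever statement the existing entry already has,
  // no matter what is selected in the form.
  if (mode === 'retain') return existingEntry.phaseStatement;
  // 'replace' overwrites the existing statement with the form's selection.
  return formStatement;
}
```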

I ran a large-scale upload test, comprising around 3,200 XML files from the ‘R’ data, to see how the system would cope with this, but unfortunately I ran into difficulties with the server rejecting too many requests in a short space of time and only about 600 of the files made it through.  I asked Arts IT Support to see whether the server limits can be removed for this script, but haven’t heard anything back yet.  I ran into a similar issue when processing files for the Mull and Ulva place-names project in January last year and Raymond was able to update the whitelist for the Apache module mod_evasive that was blocking such uploads, and I’m hoping he’ll be able to do something similar this time.  Alternatively, I’ll need to try and throttle the speed of uploads in the browser.

In the meantime, I continued with the scripts for publishing entries that had been uploaded to the holding area, using a test version of the site that I set up on my local PC to avoid messing up the live database.  I updated the ‘holding area’ page quite significantly.  At the top of the page is a box for publishing selected items, and beneath this is the table containing the holding items.  Each row now features a checkbox, and there is an option above the table to select / deselect all rows on the page (so currently up to 200 entries can be published in one batch as 200 is the page limit).    The ‘preview’ button has been replaced with an ‘eye’ icon but the preview page works in the same way as before.  I was intending to add the ‘publish’ options to this page but I’ve moved this to the holding area page instead to allow multiple entries to be selected for publication at any one time.

Selecting one or more items for publication and then pressing the ‘publish selected holdings’ button runs some JavaScript that grabs the ID of each holding item and then submits this to a script on the server via AJAX, and the server-side script then processes each selected item for publication in turn.  I limited the processing of this to one item per second to hopefully avoid the server rejecting requests.  Rather a lot happens when an item is published: the holding item is copied to the live entry table and then its XML is analysed to extract and store data for search purposes: citations, attestation dates and word counts of every word in each citation; translations and word counts of every word in each translation; semantic and usage labels (including adding new labels to the system if the XML contains new ones); word forms and their types (lemma, variant, deviant); parts of speech; and cross references in xref entries.

If there is an existing live entry that matches the current entry (either because of the stored ‘Existing ID’ or because it has the same slug as the holding item) then this entry is deactivated in the database, its XML is copied to the ‘history’ table and associated with the new item record and all search data for the live entry as mentioned above is deleted.  At this point the holding item record is deleted and the server-side script finishes executing, returning its output to the JavaScript, which then adds a row to the ‘publication log’ on the holding entries page; decreases the count of the number of holding entries on the page by one and removes the row containing the holding item from the table on the page.

Once all of the selected items are published there is one final task that the page performs, which is to completely regenerate the cross references data.  This is something that unfortunately needs to be done after each batch (even if it’s only one record) because cross references rely on database IDs and when a new version of an existing entry is published it receives a new ID.  This means any existing cross references to that item will no longer work.  The publication log will state that the regeneration is taking place and then after about 30 seconds another statement will say it is complete.  I tested this process on my local PC, publishing single items, a few items and entire pages (200 items) at a time and all seemed to be working fine so I then copied the new scripts to the server.

Also this week I continued with the processing of library registers for the Books and Borrowing project.  These are coming in rather quickly now and I’m getting a bit of a backlog.  This is because I have to download the image files, then process them to generate tilesets, and then upload all of the images and their tilesets to the server.  It’s the tilesets that are the real sticking point, as these consist of thousands of small files.  I’m only getting an upload speed of about 70KB/s and I’m having to upload many gigabytes of data.  I did a test where I zipped up some of the images and uploaded this zip file instead and was getting a speed of around 900KB/s, and as it looks like I can get command-line access to the server I’m going to investigate whether zipping up the files, uploading them and then unzipping them on the server will be a quicker process.  I also had to spend some time sorting out connection issues to the server as the Stirling VPN wasn’t letting me connect.  It turned out that they had switched to multi-factor authentication and I needed to set this up before I could continue.

Also this week I wrote a summary of the work I’ve done so far for the Place-names of Iona project for a newsletter they’re putting together, spoke to people about the new ‘Comparative Kingship’ place-names project I’m going to be involved with, spoke to the Scots Language Policy people about setting up a mailing list for the project (it turns out that the University has software to handle this, available here: https://www.gla.ac.uk/myglasgow/it/emaillists/) and fixed an issue relating to the display of citations that have multiple dates for the DSL.

Week Beginning 1st March 2021

There was quite a bit of work to be done for the Books and Borrowing project this week.  Several more ledgers had been digitised and needed tilesets and page records generated for them.  The former requires the processing and upload of many gigabytes of image files, which takes quite some time to complete, especially as the upload speed from my home computer to the server never gets beyond about 56KB per second.  However, I just end up leaving my PC on overnight and generally the upload has completed by the morning.  Generating page records generally involves me updating a script to change image filename parts and page numbers, and to specify the first and last page, and the script does the rest, but there are some quirks that need to be sorted out manually.  For the Wigtown data some of the images were not sequentially numbered, which meant I couldn’t rely on my script to generate the correct page structure.  For one of the Edinburgh ledgers the RA had already manually created some pages and had added more than a hundred borrowing records to them, so I had to figure out a way to incorporate these.  The page images are double spreads (so two pages per image) but the pages the RA had made were individual, so what I needed to do was to remove the manual pages, generate a new set and then update the page references for each of the borrowing records so they appeared on the correct new page.
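The page-generation part of the script can be sketched as follows (hypothetical names and filename pattern, not the actual script, which is PHP-based server-side):

```javascript
// Sketch of generating sequential page records from a filename pattern, a
// first page and a last page, as described above. The prefix and zero-padding
// are placeholders for whatever a given ledger's images actually use.
function generatePages(prefix, firstPage, lastPage, pad = 3) {
  const pages = [];
  for (let n = firstPage; n <= lastPage; n++) {
    pages.push({
      pageNumber: n,
      image: `${prefix}${String(n).padStart(pad, '0')}.jpg`,
    });
  }
  return pages;
}
```

This is exactly the sort of approach that breaks down when images are not sequentially numbered, as with the Wigtown data, which is why those quirks have to be handled manually.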

Also this week I continued to migrate the blogs over to the new Anglo-Norman Dictionary website, a process which I managed to complete.  The new blog isn’t live yet, as I asked for feedback from the Editors before I replaced the link to the old blog site, and there are a couple of potential tweaks that I need to make before we’re ready to go.  I also had a chat with the Dictionary of the Scots Language people about migrating to a new DNS provider and the impact this might have on email addresses.

The rest of my week was spent working on proposals for two new projects, one for Kirsteen McCue and the other for Wendy Anderson.  This involved reading through all of the documentation, making notes and beginning to write the required Data Management Plans.  For Wendy’s proposal we also had a Zoom meeting with partners in Dundee and for Kirsteen’s proposal I had an email discussion with partners at the British Library.  There’s not really much more I can say about the work I’m doing for these projects, but I’ll be continuing to work on the respective DMPs next week.

Week Beginning 22nd February 2021

I had a couple of Zoom meetings this week; the first, on Monday, was with the Historical Thesaurus team and members of the Oxford English Dictionary’s team to discuss how our two datasets will be aligned and updated in future.  It was an interesting meeting, but there’s still a lot of uncertainty regarding how the datasets can be tracked and connected as future updates are made, at least some of which will probably only become apparent when we get new data to integrate.

My second Zoom meeting was on Tuesday with the Place-Names of Iona project to discuss how we will be working with the QGIS package that team members will be using to access some of the archaeological data and Lidar maps, and also to discuss the issue of 10-digit grid references and the potential change from the old OSGB-36 means of generating latitude and longitude from grid references to the new WGS84 method.  It was a productive meeting and we decided that we would switch over to WGS84 and I would update the CMS to incorporate the new library for generating latitude and longitude from grid references.

I spent some time later in the week implementing this change, meaning that when a member of the project team adds or edits a place-name and supplies a grid reference the latitude and longitude generated use the new system.  As I mentioned a couple of weeks ago, the new library (see  http://www.movable-type.co.uk/scripts/latlong-os-gridref.html) allows 6, 8 or 10 digit grid references to be used and is JavaScript based, meaning as soon as the user enters the grid reference the latitude and longitude are generated.  I updated my scripts so that these values immediately appear in the relevant boxes in the form, and also integrated the Google Maps service that generates altitude data from the latitude and longitude, populating the altitude box in the form and also displaying a Google Map showing the exact location that the entered grid reference has produced if further tweaks are required.  I’m pretty happy with how the new system is working out.

Also this week I continued to work on the Books and Borrowing project, generating image tilesets for the scans of several volumes of ledgers from Edinburgh University Library and writing scripts to generate pages in the Content Management System, creating ‘next’ and ‘previous’ links as required and associating the relevant images.  I also had an email correspondence about some of the querying methods we will develop for the data, such as collocation information.

I also gave some feedback on a data management plan for a project I’m involved with, had a chat with Wendy Anderson about a possible future project she’s trying to set up and spent some time making updates to the underlying data of the Interactive Map of Burns Suppers that launched last month.  I didn’t have the time to do a huge amount of work on the Anglo-Norman Dictionary this week, but I still managed to migrate some of the project’s old blog posts to our new site over the course of the week.

Finally, I made some updates to the bibliography system for the Dictionary of the Scots Language, updating the new system so it works in a similar manner to the live site.  I added ‘Author’ and ‘Title’ to the drop-down items when searching for both to help differentiate them and a search for an item when the user ignores the drop-down options and manually submits the search now works as it does in the live site.  I also fixed the issue with selecting ‘Montgomerie, Norah & William’ resulting in a 404 error.  This was caused by the ampersand.  There were some issues with other non-alphanumeric characters that I’ve fixed too, including slashes and apostrophes.

Week Beginning 8th February 2021

I was on holiday from Monday to Wednesday this week to cover the school half-term, so only worked on Thursday and Friday.  On Thursday I had a Zoom call with the Historical Thesaurus team to discuss further imports of new data from the OED and how to export our data (such as the revised category hierarchy) in a format that the OED team would be able to use.  We have a meeting with the OED the week after next so it was good to go over some of the issues and refresh my memory about where things were left off as it’s been several months since I last did any major work on the HT.  As a result of the meeting I also did some further work, namely exporting the current version of the online database and making it available for Fraser to download and access on his own PC, and updating some of the earlier scripts I’d created to generate statistics about the unmatched categories and words so that they used the most recent versions of the database.

Also this week I made some further tweaks to the SCOSYA website and created a user account for a researcher who is going to work with some of the data that is only available in the project’s CMS rather than the public website.  I also read through a new funding proposal that Wendy Anderson is involved with and gave her some feedback on that, and reported a couple of issues with expired SSL certificates that were affecting some websites.

I spent some time on the Books and Borrowing project on two data-related tasks.  First was to look through the new set of digitised images from Edinburgh University Library and decide what we should do with them.  Each image is of an open book, featuring both recto and verso pages in one image.  We may need to split these up into individual images, or we may just create page records that cover both pages.  I alerted the project PI Katie Halsey to the issue and the team will make a decision about which approach to take next week.  The second task was to look through the data from Selkirk library that another project had generated.  We had previously imported data for Selkirk that another researcher had compiled a few years before our project began, but recently discovered that this data did not include several thousand borrowing records of French prisoners of war, as the focus of the researcher was on Scottish borrowers.  We need these missing records and another project has agreed to let us use their data.  I had intended to completely replace the database I’d previously ingested with this new data, but on closer inspection of the new data I have a number of reservations about doing so.

The data from the other project has been compiled in an Excel spreadsheet and as far as I can tell there is no record of the ledger volume or page that each borrowing record was originally located on.  In the data we already have there is a column for ‘source ref’, containing the ledger volume (e.g. ‘volume 1’) and a column for ‘page number’, containing a unique ID for each page in the spreadsheet (e.g. ‘1010159r’).  Looking through the various sheets in the new spreadsheet there is nothing comparable to this, which is vital for our project, as borrowing records must be associated with page records, which in turn must be associated with a ledger.  It also would make it extremely difficult to trace a record back to the original physical record.

Another issue is that in our existing data the researcher has very handily used unique identifiers for readers (e.g. ‘brodie_james’), borrowing records (e.g. ‘1’) and books (e.g. ‘adam_view_religion’) that tie the various records together very nicely.  The new project’s data does not appear to use any unique identifiers to connect bits of data together.  For example, there are three ‘John Anderson’ borrowers and in the data we’re currently using these are differentiated by their IDs as ‘anderson_john’, ‘anderson_john2’ and ‘anderson_john3’.  This means it’s easy to tell which borrower appears in the borrowing records.  In the new project’s data three different fields are required to identify the borrower:  surname, forename and residence.  This data is stored in separate columns in the ‘All loans’ sheet (e.g. ‘Anderson’, ‘John’, ‘Cramalt’), but in the ‘Members’ sheet everything is joined together in one ‘Name’ field, e.g. ‘Anderson, John (Cramalt)’.  This lack of unique identifiers combined with the inconsistent manner of recording name and place will make it very difficult to automatically join up records and I’ve flagged this up with Katie for further discussion with the team.  It’s looking like we may want to try and identify the POW records from the new project’s data and amalgamate these with the data we already have, rather than replacing everything.
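For reference, the sort of identifier scheme the original researcher used could be generated along these lines (a hypothetical sketch of the convention, not their actual method):

```javascript
// Sketch of generating unique borrower identifiers like 'anderson_john',
// 'anderson_john2', 'anderson_john3' from surname and forename fields,
// appending a numeric suffix whenever the base identifier is already taken.
function makeBorrowerId(surname, forename, existingIds) {
  const base = `${surname}_${forename}`.toLowerCase().replace(/[^a-z_]/g, '');
  let id = base;
  let suffix = 2;
  while (existingIds.has(id)) {
    id = base + suffix;
    suffix++;
  }
  existingIds.add(id);
  return id;
}
```

Without identifiers like these, matching the new project’s ‘Anderson, John (Cramalt)’ style name strings against separate surname / forename / residence columns is much harder to automate.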

I also spent a bit of time on the Anglo-Norman Dictionary this week, making some changes to homonym numbers for a few entries and manually updating a couple of commentaries.  I also worked for the Dictionary of the Scots Language, preparing the SND and DOST datasets for import into the new editing system that the project is now going to use.  This was a little trickier than anticipated as initially I zipped up the data that I’d exported from the old editing system in November when I worked on the new ‘V4’ version of the online API, but we realised that this still contained duplicates that I’d stripped out when uploading the data into the new online database.  So instead I exported the XML from the online database, but it turned out that during the upload process a section of the entry XML was being removed.  This section (<meta>) contained all of the forms and URLs and my upload process exported these to a separate table and reformatted the XML so that it matched the structure that was defined during the creation of the first version of the API.  However, the new editing system requires this <meta> section so that data I’d prepared was not usable.  Instead I took the XML exported from the old editing system back in November and ran it through the script I’d written to strip out duplicates, then prepared the resulting XML dataset for transfer.  It looks like this approach has worked, but I’ll find out more next week.

Week Beginning 1st February 2021

I had two Zoom calls this week, the first on Wednesday with Kirsteen McCue to discuss a new, small project to publish a selection of musical settings to Burns poems and the second on Friday with Joanna Kopaczyk and her RA on the Scots Language Policy project to give a tutorial on how to use WordPress.

The majority of my week was divided between the Anglo-Norman Dictionary, the Dictionary of the Scots Language and the Place-names of Iona projects.  For the AND I made a few tweaks to the static content of the site and migrated some more blog posts across to the new site (these are not live yet).  I also added commentaries to more than 260 entries, which took some time to test.  I also worked on the DTD file that the editors reference from their XML editing software to ensure that all of the elements and attributes found within commentaries are ‘allowed’ in the XML.  Without doing this it was possible to add the tags in, but this would give errors in the editing software.  I also batch updated all of the entries on the site to reference the new DTD and exported all of the files, zipped them up and sent them to the editors so they can work on them as required.  I also began to think about migrating the TextBase from the old site to the new one, and managed to source the XML files that comprise this system.  It looks like it may be quite tricky to work with these as there are more than 70 book-length XML files to deal with and so far I have not managed to locate the XSLT that was originally used to process these files.

For the DSL I completed work on the new bibliography search pages that use the new ‘V4’ data.  These pages allow the authors and titles of bibliographical items to be searched, results to be viewed and individual items to be displayed.  I also made some minor tweaks to the live site and had a discussion with Ann Fergusson about transferring the project’s data to the people who have set up a new editing interface for them, something I’m hoping to be able to tackle next week.

For the Place-names of Iona project I had a discussion about implementing a new ‘work of the month’ feature and spent quite a bit of time investigating using 10-digit OS grid references in the project’s CMS.  The team need to use up to 10-digit grid references to get 1m accuracy for individual monuments, but the library I use in the CMS to automatically generate latitude and longitude from the supplied grid reference will only work with a 6-digit NGR.  The automatically generated latitude and longitude are then automatically passed to Google Maps to ascertain the altitude of the location and all of this information is stored in the database whenever a new place-name record is created or an existing record is edited.

As the library currently in use will only accept 6-digit NGRs I had to do a bit of research into alternative libraries, and I managed to find one that can accept NGRs of 2,4,6,8 or 10 digits.  Information about the library, including text boxes where you can enter an NGR and see the results can be found here: http://www.movable-type.co.uk/scripts/latlong-os-gridref.html along with an awful lot of description about the calculations and some pretty scary looking formulae.

The library is written in JavaScript, which runs in the client’s browser, whereas the previous library was written in PHP, which runs on the server.  This means I needed to change the way the CMS works – previously you’d enter an NGR and then when the form was submitted to the server the PHP library would generate the latitude and longitude whereas now the latitude and longitude need to be generated in the browser as soon as the NGR is entered into the textbox, and two further textboxes for latitude and longitude will appear in the form and will then be automatically populated with the results.

This does mean the person filling out the form can see the generated latitude and longitude and also tweak it if required before submitting the form, which is a potentially useful thing.  I may even be able to add a Google Map to the form so you can see (and possibly tweak) the point before submitting the form, but I’ll need to look into this further.  I also still need to work on the format of the latitude and longitude as the new library generates them with a compass point (e.g. 6.420848° W) and we need to store them as a purely decimal value (e.g. -6.420848) with ‘W’ and ‘S’ figures being negatives.
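The conversion I have in mind is simple enough (a sketch; the input format matches the library’s compass-point output as described above, and the helper name is hypothetical):

```javascript
// Sketch of converting the library's compass-point output (e.g. '6.420848° W')
// to the signed decimal value to be stored (e.g. -6.420848), with 'W' and 'S'
// treated as negative.
function toSignedDecimal(value) {
  const match = value.match(/^([\d.]+)°?\s*([NSEW])$/);
  if (!match) throw new Error('Unrecognised format: ' + value);
  const decimal = parseFloat(match[1]);
  return (match[2] === 'W' || match[2] === 'S') ? -decimal : decimal;
}
```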

However, whilst researching this I discovered a potentially worrying thing that needs discussion with the wider team.  The way the Ordnance Survey generates latitude and longitude from their grid references was changed in 2014.  Information about this can be found in the page linked to above in the ‘Latitude/longitudes require a datum’ section.  Previously the OS used ‘OSGB-36’ to generate latitude and longitude, but in 2014 this was changed to ‘WGS84’, which is used by GPS systems.  The difference in the latitude / longitude figures generated by the two systems is about 100 metres, which is quite a lot if you’re intending to pinpoint individual monuments.

The new library has facilities to generate latitude and longitude using either the new or old systems, but defaults to the new system.  I’ve checked the output of the library we currently use and it uses the old ‘OSGB-36’ system.  This means all of the place-names in the system so far (and all those for the previous projects) have latitudes and longitudes generated using the now obsolete (since 2014) system. To give an example of the difference, the place-name A’ Mhachair in the CMS has this location: https://www.google.com/maps/place/56%C2%B019’33.2%22N+6%C2%B025’11.4%22W/@56.3258889,-6.422022,582m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d56.325885!4d-6.419828 and with the newer ‘WGS84’ system it would have this location: https://www.google.com/maps/place/56%C2%B019’32.7%22N+6%C2%B025’15.1%22W/@56.325744,-6.4230367,582m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d56.325744!4d-6.420848

So what we need to decide before I replace the old library with the new one in the CMS is whether we switch to using ‘WGS84’ or we keep using ‘OSGB-36’.  As I say, this will need further discussion before I implement any changes.

Also this week I responded to a query from Cris Sarg of the Medical Humanities Network project, spoke to Fraser Dallachy about future updates to the HT’s data from the OED, made some tweaks to the structure of the SCOSYA website for Jennifer Smith, added a plugin to the Editing Burns site for Craig Lamont and had a chat with the Books and Borrowing people about cleaning the authors data, importing the Craigston data and how to deal with a lot of borrowers that were excluded from the Selkirk data that I previously imported.

Next week I’ll be on holiday from Monday to Wednesday to cover the school half term.

Week Beginning 25th January 2021

I headed into the University for the first time this year on Wednesday this week to collect a new iPad that I’d ordered and to get some files from my office.  It was great to see the old place again, but it did take quite a chunk out of my day to travel there and back, especially as I’m still home-schooling either a morning or an afternoon each day at the moment too.

As with last week, I mainly divided my time this week between the Dictionary of the Scots Language, the Anglo-Norman Dictionary and the Books and Borrowing project, with a few other bits and bobs added in as well.  For the DSL I retrieved the source code for my original Scots School Dictionary app from my office so we can host this somewhere on the DSL website.  This is because the DSL have commissioned someone else to make a new School Dictionary app, which launched this week, but doesn’t include an ‘English to Scots’ feature as the old app does, so we’re going to make the old app available as a website for those people who miss the feature.  I also made a few minor tweaks to the main DSL site, and then focussed on adding bibliography search facilities to the new version of the API, a task that I’d begun last week.

I created a new table for the bibliographical data that includes the various fields used for DOST (note, author, editor, date, longtitle etc) and a field for the XML data used for SND.  I then created two further tables for searching, one that contains every author and editor name for each item (for DOST there may be different names in the author, editor, longauthor and longeditor fields while for SND there may be any number of <author> tags) and the other containing every title for each item (DOST may have different text in title and longtitle while SND items can have any number of <title> tags).  These tables allow you to search for any variant author, editor or title and find the item.

I also created two additional fields in the bibliography table that contain the ‘display author’ and ‘display title’.  These are the forms that get displayed in the search results before you click on an item to open the full bibliographical entry.  I then updated the V4 API to add in facilities to search and retrieve the bibliographies.  I didn’t have the time to connect to this API and to implement the search on the Sienna test site, which is something I hope to do next week, but the logic behind the search and display of bibliographies is all there.  There is a predictive search that will be used to generate the autocomplete list, similar to how the live site currently works: you will be able to select whether your search is for authors, titles or both, and when you start typing in some text a list of matching items will appear, e.g. typing in ‘ham’ for authors in both dictionaries will display all items containing ‘ham’, and when you select an item this will then perform a search for that specific text.  You will then be able to click on an item to view the full bibliography.  This is a bit different to how the live site currently works, as there, if you enter ‘ham’ and select (for example) ‘Hamilton, J.’ from the autocomplete list you are taken directly to a page that lists all of the items for the author.  However, we can’t do that any more as we no longer have unique identifiers that group bibliographical items by author.  I may be able to do something similar with the page that comes up when you select an author, but this would have to rely on the name to group items together, and a name may not be unique.
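The predictive search over the variant author and title tables can be sketched as below.  The real system is PHP/MySQL; this uses an in-memory SQLite database for illustration, and the table and column names (`bib_authors`, `bib_id`, `name`) are assumptions rather than the actual schema.

```python
import sqlite3

# Stand-in for the real variant-names table (names and schema are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bib_authors (bib_id INTEGER, name TEXT)")
db.executemany(
    "INSERT INTO bib_authors VALUES (?, ?)",
    [(1, "Hamilton, J."), (2, "Graham, R."), (3, "Scott, W.")],
)

def autocomplete(term: str) -> list[str]:
    """Return distinct author names containing the typed text, for the list."""
    rows = db.execute(
        "SELECT DISTINCT name FROM bib_authors WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [name for (name,) in rows]

print(autocomplete("ham"))  # ['Graham, R.', 'Hamilton, J.']
```

Selecting one of the returned names would then trigger a second, exact search for that text, since there is no longer an author ID to look up directly.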

For the AND I made some tweaks to the website, such as adding a link to the search page if you type some text into the ‘jump to entry’ option and no matching entries are found.  I then spent the rest of my time continuing to develop the new content management system, specifically the pages for managing source texts.  I finished work on this, adding in facilities to add, edit, browse and delete source texts from the database.  I then migrated the DTD to the new site, which is referenced by the editors’ XML editor when they work on the entry XML files.  The DTD on the old server referenced several lists of things that are then used to populate drop-down lists of options in the XML editor.  I migrated these too, making them dynamically generated from the underlying database rather than static lists, meaning that when (for example) new source texts are added to the CMS these will automatically become available when using the XML editor.
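The dynamically generated lists can be sketched as follows.  This is an illustration only: the entity name, the sigla and the idea of serving a parameter entity built from a database query are my assumptions about how such a list might look, not the AND’s actual DTD.

```python
# Build a DTD parameter entity whose replacement text is a pipe-separated
# token list, of the kind an XML editor can use to populate a drop-down.
def build_dtd_entity(name: str, values: list[str]) -> str:
    tokens = "|".join(values)
    return f'<!ENTITY % {name} "{tokens}">'

# In the real system the values would come from a query over the
# source-texts table, so new texts appear automatically.
sigla = ["A-N_Falconry", "Rot_Parl1", "Secr_abern"]
print(build_dtd_entity("sources", sigla))
```

The point of generating this server-side on each request is that the DTD never goes stale: the editors’ XML editor always sees whatever is currently in the database.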

For the Books and Borrowing project I participated in the project’s Zoom call on Monday to discuss the CMS and how to amalgamate the various duplicate author records that resulted from data uploads from different libraries.  After the call I made some required changes to the CMS, such as making the editor’s notes fields visible by default again, and worked on the duplicate authors matching script to add in further outputs when comparing the author names with Levenshtein ratings of 1 and 2.  I also reviewed some content that was sent to us from another library.
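The Levenshtein comparison can be sketched like this.  The real script is PHP; the names below are invented, and treating distances of 1 and 2 as "likely duplicates" mirrors the thresholds mentioned above.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def likely_duplicates(names: list[str], max_dist: int = 2):
    """Flag pairs of names within max_dist edits of each other for review."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = levenshtein(a.lower(), b.lower())
            if 0 < d <= max_dist:
                pairs.append((a, b, d))
    return pairs

print(likely_duplicates(["Defoe, Daniel", "Defoe, Danial", "Swift, Jonathan"]))
```

The output is a list of candidate pairs with their distances, which an editor can then confirm or reject by hand rather than merging automatically.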

Also this week I responded to an email from James Caudle in Scottish Literature about a potential project he’s setting up, made a couple of changes to the Scots Language Policy website, made some tweaks to the menu structure for the Scots Syntax Atlas project and gave some advice to a post-grad student who had contacted me about setting up a corpus.

Week Beginning 18th January 2021

I worked on many different projects this week, with most of my time being split between the Dictionary of the Scots Language, the Anglo-Norman Dictionary, the Books and Borrowing project and the Scots Language Policy project.  For the DSL I began investigating adding the bibliographical data to the new API and developing bibliographical search facilities.  Ann Ferguson had sent me spreadsheets containing the current bibliographical data for DOST and SND and I migrated this data into a database and began to think about how the data needs to be processed in order to be used on the website.  At the moment links to bibliographies from SND entries are not appearing in the new version of the API, while DOST bibliographical links do appear but don’t lead anywhere.  Fixing the latter should be fairly straightforward but the former looks to be a bit trickier.

For SND on the live site, which uses the original V1 API, it looks like the bibliographical links are stored in a database table and then injected into the XML entries whenever an entry is displayed.  A column in the table records the order in which each citation appears in the entry, and this is how the system knows which bibliographical ID to assign to which link.  This raises some questions about what happens when an entry is edited.  If the order of the citations in the XML is changed, or a new citation is added, then all of the links to the bibliographies will be out of sync.  Plus, unless the database table is edited no new bibliographical links will ever display.  It is possible that the data in the bibliographical links table is already out of date, and we are going to need to try to find a way to add these bibliographical links into the actual XML entries rather than retaining the old system of storing them separately and injecting them each time the entry is requested.  I emailed Ann for further discussion about these points.  Also this week I made a few updates to the live DSL website, changing the logos that are used and making ‘Dictionary’ in the title plural.
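The fragility of that design can be illustrated with a small sketch.  The structures below are simplified assumptions, not the real schema, but they show why matching bibliographical IDs to citations purely by position breaks as soon as the citation order changes.

```python
def inject(citations: list[str], bib_links: dict[int, str]) -> list[str]:
    """Attach bib IDs to citations by position only, V1-style."""
    out = []
    for pos, cit in enumerate(citations, 1):
        bib_id = bib_links.get(pos)          # matched on order column alone
        out.append(cit.replace("/>", f' bib="{bib_id}"/>') if bib_id else cit)
    return out

citations = ['<cit n="1"/>', '<cit n="2"/>']
print(inject(citations, {1: "bib100", 2: "bib200"}))
# Inserting a new citation at the front would shift every later link by one,
# which is why writing the IDs into the entry XML itself would be safer.
```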

For the AND this week I added in the missing academic articles that Geert had managed to track down and then began focusing on updating the source texts and working with the commentaries for the R data.  The commentaries were sent to me in two Word files, and although we had hoped to be able to work out a mechanism for automatically extracting these and adding them to their corresponding entries it looks like this will be very difficult to achieve with any accuracy.  I concluded that I could split the entries up in Geert’s document based on the ‘**’ characters between commentaries and possibly split Heather’s up based on blank lines.  I could possibly retain the formatting (bold, italic, superscript text etc) and convert this to HTML, although even this would be tricky, time consuming and error-prone.  The commentaries include links to other entries in bold, and I would possibly be able to automatically add in links to other entries based on entries appearing in bold in the commentaries, but again this would be highly error-prone as bold text is used for things other than entries, and sometimes the entry number follows a hash while at other times it’s superscript.  It would also be difficult to automatically ascertain which entry a commentary belongs to as there is some inconsistency here too – e.g. the commentary for ‘remuement’ is listed as ‘[remuement]??’ and there are other occasions where the entry doesn’t appear on its own on a line – e.g. ‘Retaillement xref with recelement’ and ‘Reverdure—Geert says to omit’.  Then there are commentaries that are all crossed out, e.g. ‘resteot’.  We decided that attempting to automatically process the commentaries would not be feasible and instead the editors would add them to the entry XML files manually, adding the tags for bold, italic, superscript and other formatting as required.  Geert added commentaries to two entries to see how this would work and it worked very well.
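For what it’s worth, the splitting step we considered (before rejecting the automatic approach) would have looked something like the sketch below.  The sample text is invented, and the real files are Word documents whose formatting would also need preserving, which is where the approach falls down.

```python
import re

# Split extracted commentary text on lines containing only '**', the
# separator used between commentaries in one of the Word documents.
raw = ("[remuement]?? Commentary one...\n"
       "**\n"
       "resteot Commentary two...\n"
       "**\n"
       "Retaillement xref with recelement")
commentaries = [c.strip() for c in re.split(r"^\*\*$", raw, flags=re.M)]
print(len(commentaries))  # 3
```

Splitting is the easy part; reliably working out which entry each chunk belongs to, and which bold spans are genuine cross-references, is what made automation infeasible.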

For the source texts, we had originally discussed the editors editing these via a spreadsheet that I’d generated from the online data last year, but I decided it would be better if I just started work on the new online Dictionary Management System (DMS) and created the means of adding, listing and editing the source texts as the first thing that can be managed via the new DMS.  This seemed preferable to establishing a new, temporary workflow that may take some time to set up and may end up not being used for very long.  I therefore created the login and initial pages for the DMS (by repurposing earlier content management systems I’d created).  I then set up database tables for holding the new source text data, which includes multiple potential items for each source and a range of new fields that the original source text data does not contain.  With this in place I created the DMS pages for browsing the source texts and deleting them, and I’m midway through writing the scripts for editing existing and adding new source texts.  I aim to have this finished next week.

For the Books and Borrowing project I continued to make refinements to the CMS.  I reduced the number of books and borrowers displayed per page from 500 to 200 to speed up page loads, and added in the day of the week on which books were borrowed and returned, based on the date information already in the system.  I also removed tab characters from edition titles, as these were causing some issues for the system, and replaced the editor’s notes rich text box with a plain text area to save space on the edit page.  Finally, I added a new field to the borrowing record that allows the editor to note when certain items appear for display only and should otherwise be overlooked, for example when generating stats; this is to be used for duplicate lines and lines that are crossed out.  I also had a look through the new sample data from Craigston that was sent to us this week.

For the Scots Language Policy project I set up the project’s website, including the user interface, adding in fonts, plugins, initial page structure, site graphics, logos etc.  Also this week I fixed an issue with song downloads on the Burns website (the plugin that controls the song downloads is very old and had broken, so I needed to install a newer version and upgrade the song data for the downloads to work again).  I also continued my email conversation with Rachel Fletcher about a project she’s putting together and created a user account to allow Simon Taylor to access the Ayr Placenames CMS.

Week Beginning 11th January 2021

This was my first full week back of the year, although it was also the first week of a return to homeschooling, which made working a little trickier than usual.  I also had a dentist’s appointment on Tuesday and lost some time to that due to my dentist being near the University rather than where I live.  However, despite these challenges I was able to achieve quite a lot this week.  I had two Zoom calls, the first on Monday to discuss a new ESRC grant that Jane Stuart-Smith is putting together with colleagues at Strathclyde, while the second on Wednesday was with a partner in Joanna Kopaczyk’s new RSE-funded project about Scots Language Policy to discuss the project’s website and the survey they’re going to put out.  I also made a few tweaks to the DSL website, replied to Kirsteen McCue about the AHRC proposal she’s currently putting together, replied to a query regarding the technologies behind the Scots Syntax Atlas, made a few further updates to the Burns Supper map and replied to a query from Rachel Fletcher in English Language about lemmatising Old English.

Other than these various tasks I split my time between the Anglo-Norman Dictionary and the Books and Borrowing projects.  For the former I completed adding explanatory notes to all of the ‘Introducing the AND’ pages.  This was a very time consuming task as there were probably about 150 explanatory notes in total to add in, each appearing in a Bootstrap dialog box, and each requiring me to copy the note from the old website, add in any required HTML formatting, find and check all of the links to AND entries on the old site and add these in as required.  It was pretty tedious to do, but it feels great to get it done, as the notes were previously just giving 404 errors on the new live site, and I don’t like having such things on a site I’m responsible for.  I also migrated the academic articles from the old site to the new one (https://anglo-norman.net/articles/), which also required some manual formatting of the content.  There are five other articles that I haven’t managed to migrate yet as they are full of character encoding errors on the old site.  Geert is looking for copies of these articles that actually work and I’ll add them in once he’s able to get them to me.  I also began migrating the blog posts to the new site.  Currently the blog is hosted on Blogspot and there are 55 entries, but we’d like these to be an internal part of the new site.  Migrating these is going to take some time as it means copying the text (which thankfully retains formatting) and then manually saving and embedding any images in the posts.  I’m just going to do a few of these a week until they’re all done and so far I’ve migrated seven.  I also needed to look into how the blogs page works in the WordPress theme I created for the AND, as to start with the page was just listing the full text of every post rather than giving summaries and links through to the full text of each.
After some investigation I figured out that in my theme there is a script called ‘home.php’ and this is responsible for displaying all of the blog posts on the ‘blog’ page.  It in turn calls another template called ‘content-blog.php’, which was previously set to display the full content of each post.  Instead I set it to display the title as a link through to the full post, the date and then an excerpt from the full blog, which can be accessed through a handy WordPress function called ‘the_excerpt()’.

For the Books and Borrowing project I made some improvements and fixes to the Content Management System.  I’d been meaning to enhance the CMS for some time, but due to other commitments to other projects I didn’t have the time to delve into it.  It felt good to find the time to return to the project this week.

I updated the ‘Books’ and ‘Borrowers’ tabs when viewing a library in the CMS, adding in pagination to speed up the loading of the pages.  Pages are now split into 500-record blocks and you can navigate between pages using the links above and below the tables.  For some reason the loading of the page is still a bit slow on the Stirling server, whereas it was fine on the Glasgow server I was using for test purposes.  I’m not entirely sure why, as I’d copied the database over too – presumably the Stirling server is slower.  However, it is still a massive improvement on the speed of the page previously.
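The pagination itself is the standard LIMIT/OFFSET pattern, which can be sketched as below.  The real CMS is PHP/MySQL and the query fragment is illustrative.

```python
def page_clause(page: int, per_page: int = 500) -> str:
    """Build the LIMIT/OFFSET fragment for a given 1-indexed page."""
    offset = (page - 1) * per_page
    return f"LIMIT {per_page} OFFSET {offset}"

# e.g. appended to "SELECT ... FROM books WHERE library_id = ? ORDER BY title"
print(page_clause(3))  # LIMIT 500 OFFSET 1000
```

A separate COUNT(*) query then gives the total number of records, from which the navigation links above and below the table are generated.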

I also changed the way tables scroll horizontally.  Previously if a table was wider than the page a scrollbar appeared above and below the table, but this was rather awkward to use if you were looking at the middle of the table (you had to scroll up or down to the beginning or end of the table, then use the horizontal scrollbar to move the table along a bit, then navigate back to the section of the page you were interested in).  Now the scrollbar just appears at the bottom of the browser window and can always be accessed no matter where in the table you are.

I also removed the editorial notes from tables by default to reduce clutter, and added in a button for showing / hiding the editors’ notes near the top of each page.  I also added a limit option in the ‘Books’ and ‘Borrowers’ pages within a library to limit the displayed records to only those found in a specific ledger.  I added in a further option to display those records that are not currently associated with any ledgers too.

I then deleted the ‘original borrowed date’ and ‘original returned date’ fields from the St Andrews data as these were no longer required, removing both the fields themselves and all of the data they contained from the system.

It had been noted that the book part numbers were not being listed numerically.  As part numbers can contain text as well as numbers (e.g. ‘Vol. II’), this field in the database needed to be set as text rather than an integer.  Unfortunately the database doesn’t order numbers correctly when they are contained in a non-numerical field  – instead all the ones come first (1, 10, 11) then all the twos (2, 20, 22) etc.  However, I managed to find a way to ensure that the numbers are ordered correctly.
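One common way to achieve this is a "natural" sort that compares the numeric runs within the text as numbers rather than character by character; the sketch below shows the idea in Python (the actual fix was done in the CMS's PHP/MySQL code, which is not reproduced here).

```python
import re

def natural_key(value: str):
    """Split a value into text and digit runs, comparing digit runs as ints."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", value)]

parts = ["10", "2", "1", "11", "Vol. II", "Vol. I"]
print(sorted(parts, key=natural_key))
```

This orders the purely numeric part numbers 1, 2, 10, 11 as expected, while leaving textual values like ‘Vol. II’ in plain alphabetical order after them.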

I also fixed the ‘Add another Edition/Work to this holding’ button that was not working.  This was caused by the Stirling server running a different version of PHP that doesn’t allow functions to have variable numbers of arguments.  The autocomplete function was also not working at edition level and I investigated this.  The issue was being caused by tab characters appearing in edition titles, and I updated my script to ensure these characters are stripped out before the data is formatted as JSON.
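The tab-stripping fix can be sketched as follows.  The real script is PHP; the titles below are invented examples, and collapsing all runs of whitespace to single spaces is a slightly broader cleanup than strictly necessary.

```python
import json

def titles_as_json(titles: list[str]) -> str:
    """Normalise whitespace (including tabs) in titles, then encode as JSON."""
    cleaned = [" ".join(t.split()) for t in titles]
    return json.dumps(cleaned)

print(titles_as_json(["An\tEssay on Man",
                      "Poems, Chiefly in the Scottish Dialect"]))
```

Doing the cleanup before encoding means the autocomplete widget always receives well-formed JSON regardless of what characters have crept into the underlying titles.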

There may be further tweaks to be made – I’ll need to hear back from the rest of the team before I know more, but for now I’m up to date with the project.  Next week I intend to get back into some of the larger and trickier outstanding AND tasks (of which there are, alas, many) and to begin working towards adding the DSL bibliography data into the new version of the API.

Week Beginning 4th January 2021

This was my first week back after the Christmas holidays, and I only worked the Thursday and the Friday.  We’re back in full lockdown and homeschooling again now, so it’s not the best of starts to the new year.  I spent my two days this week catching up with emails and finishing off some outstanding tasks from last year.  I spoke to Joanna Kopaczyk about her new RSE funded project that I need to set up a website for, and I had a chat with the DSL people about the outstanding tasks that still need to be tackled for the Dictionary of the Scots Language.  I also added a few more Burns Suppers to the Supper Map that I created over the past year for Paul Malgrati in Scottish Literature, which was a little time consuming as the data is contained in a spreadsheet featuring more than 70 columns.

I spent the remainder of the week continuing to work on the new Anglo-Norman Dictionary site, which we launched just before Christmas.  The editors, Geert and Heather, had spotted some issues with the site whilst using it so I had a few more things to add to my ‘to do’ list, some of which I ticked off.  One such thing was that entries with headwords that consisted of multiple words weren’t loading.  This required an update to the way the API handles variables passed in URL strings, and after I implemented that such entries then loaded successfully.

A bigger issue was the fact that some citations were not appearing in the entries.  This took some time to investigate but I eventually tracked down the problem.  I’d needed to write a script that reordered all of the citations in every sense in every entry by date, as previously the citations were not in date order.  However, when looking at the entries that had missing citations it would appear that where a sense has more than one citation in the same year only one of these citations was appearing.  This is because within each sense I was placing the citations in an array with the year as the key, e.g.:

$citation[“1134”] = citation 1

$citation[“1362”] = citation 2

$citation[“1247”] = citation 3

I was then reordering the array based on the key to get things in year order.  But where there were multiple citations in a single year for a sense this approach wasn’t working as the array key needs to be unique.  So if there were two ‘1134’ citations only one was being retained.  To fix this I updated the reordering script to add a further incrementing number to the key, so if there are two ‘1134’ citations the key for the first is ‘1134-1’ and the second is ‘1134-2’.  This ensures all citations for a year are retained and the sorting by key still works.  After implementing the fix and rerunning the citation ordering script I updated the XML in the online database and the missing citations are now thankfully appearing online.
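The fix can be sketched like this.  The real script is PHP and the citation structures here are simplified (AND dates are not always plain four-digit years), but the key-suffixing logic is the same.

```python
def order_citations(citations: list[dict]) -> list[dict]:
    """Order citations by year, keeping all citations that share a year."""
    keyed, counts = {}, {}
    for cit in citations:
        year = cit["year"]
        counts[year] = counts.get(year, 0) + 1
        keyed[f"{year}-{counts[year]}"] = cit     # e.g. '1134-1', '1134-2'
    # Sort on (year, suffix) numerically so '1134-2' precedes '1134-10'.
    ordered_keys = sorted(keyed, key=lambda k: tuple(map(int, k.split("-"))))
    return [keyed[k] for k in ordered_keys]

cits = [{"year": "1362", "text": "citation 2"},
        {"year": "1134", "text": "citation 1"},
        {"year": "1134", "text": "citation 3"}]
print([c["text"] for c in order_citations(cits)])
# ['citation 1', 'citation 3', 'citation 2']
```

The suffix keeps every key unique, so both ‘1134’ citations survive the re-keying, and within a year the citations stay in their original relative order.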

I ended the week by continuing to work through the ancillary pages of the dictionary, focusing on the ‘Introducing the AND’ pages (https://anglo-norman.net/introducing-the-and/).  I’d managed to get the main content of the pages in place before Christmas, but explanatory notes and links were not working.  There are about 50 explanatory notes in the ‘Magna Carta’ page and I needed to copy all of these from the old site and add them to a Bootstrap dialog pop-up, which was rather time-consuming.  I also had to update the links through to the dictionary entries as although I’d added redirects to ensure the old links worked, some of the links in these pages didn’t feature an entry number where one was required.  For example on the page about food there was a link to ‘pere’ but the dictionary contains three ‘pere’ entries and the correct one is actually the third (the fruit pear).  I still need to fix links and explanatory notes in the two remaining pages of the introduction, which I will try to get sorted next week.