This week began with Easter Monday, which was a holiday. I’d also taken Tuesday and Thursday off to cover some of the Easter school holidays, so it was a two-day working week for me. I spent some of this time continuing to download and process images of library register books for the Books and Borrowing project, including 14 from St Andrews and several further books from Edinburgh. I was also in communication with one of the people responsible for the Dictionary of the Scots Language’s new editor interface regarding exporting new data from this interface and importing it into the DSL’s website. I was sent a ZIP file containing a sample of the data for SND and DOST, plus a sample of the bibliographical data, with some information on the structure of the files and some points for discussion.
I looked through all of the files and considered how I might be able to incorporate the data into the systems that I created for the DSL’s website. I should be able to run the new dictionary XML files through my upload script with only a few minor modifications required. It’s also really great that the bibliographies and cross references are getting sorted via the new editor interface. One point of discussion is that the new editor interface has generated new IDs for the entries, and the old IDs are not included. I reckoned that it would be good if the old IDs were included in the XML as well, just in case we ever need to match up the current data with older datasets. I did notice that the old IDs already appeared to be included in the <url> fields, but after discussion we decided that it would be safer to include them as an attribute of the <entry> tag, e.g. <entry oldid="snd848"> or something like that, which is what will happen when I receive the full dataset.
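To illustrate how an old-ID attribute could be used to match the new data against older datasets, here is a minimal Python sketch. Only the `oldid` attribute name and the value `snd848` come from the discussion above; the `<entries>` wrapper, the `id` attribute and the other values are assumptions made up for the example, not the real export format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the wrapper element, the id attribute and all values
# except snd848 are invented for illustration.
SAMPLE = """<entries>
  <entry id="new1" oldid="snd848"/>
  <entry id="new2" oldid="snd849"/>
  <entry id="new3"/>
</entries>"""


def old_to_new_ids(xml_text):
    """Map each old entry ID to its new ID, skipping entries with no oldid."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for entry in root.iter("entry"):
        old_id = entry.get("oldid")
        if old_id is not None:
            mapping[old_id] = entry.get("id")
    return mapping
```

A lookup table like this would let an older dataset keyed on the old IDs be joined to the new entries without having to parse the <url> fields.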
There are also new labels for entries, stating when and how the entry was prepared. The actual labels are stored in a spreadsheet and a numerical ID appears in the XML to reference a row in the spreadsheet. This method of dealing with labels seems fine to me: I can update my system to use the labels from the spreadsheet and display the relevant labels depending on the numerical codes in the entry XML. I reckon it’s probably better not to store the actual labels in the XML, as this saves space and makes it easier to change the label text if required, since each label is then stored in a single place.
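The label lookup described above might work along the following lines. This is only a sketch: it assumes the spreadsheet has been exported as CSV with `id` and `label` columns and that the entry XML carries the numerical code in a `ref` attribute on a `<label>` element. All of those names, and the label text, are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the labels spreadsheet.
LABELS_CSV = "id,label\n1,Hypothetical label one\n2,Hypothetical label two\n"

# Hypothetical entry XML that refers to labels by numerical code only.
ENTRY_XML = '<entry><label ref="2"/><label ref="1"/></entry>'


def load_labels(csv_text):
    """Read the spreadsheet export into a dict of code -> label text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["id"]: row["label"] for row in reader}


def labels_for_entry(entry_xml, labels):
    """Return the label text for each known code in the entry, in document order."""
    entry = ET.fromstring(entry_xml)
    codes = [element.get("ref") for element in entry.iter("label")]
    return [labels[code] for code in codes if code in labels]
```

Because the text lives only in the lookup table, changing a label means editing one spreadsheet row rather than every entry that uses it.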
The bibliographies are looking good in the sample data, but I pointed out that it might be handy to have a reference of the old bibliographical IDs in the XML, if that’s possible. There were also spurious xmlns="" attributes in the new XML, but these shouldn’t pose any problems and I said that it’s OK to leave them in. Once I receive the full dataset with some tweaks (e.g. the inclusion of old IDs) then I will do some further work on this.
I spent most of the rest of my available time working on the new Comparative Kingship place-names systems. I completed work on the Scotland CMS, including adding in the required parishes and former parishes. This means my place-name system has now been fully modernised and uses the Bootstrap framework throughout, which looks a lot better and works more effectively on all screen dimensions.
I also imported the data from GB1900 for the relevant parishes. There are more than 10,000 names, although a lot of these could be trimmed out – lots of ‘F.P.’ for footpath etc. It’s likely that the parishes listed are rather broader than the study will be. All the names in and around St Andrews are in there, for example. In order to generate an altitude for each of the names imported from GB1900 I had to run a script I’d written that passes the latitude and longitude for each name in turn to Google Maps, which then returns elevation data. I had to limit the frequency of submissions to one every few seconds, as otherwise Google blocks access, so it took rather a long time for the altitudes of more than 10,000 names to be gathered, but the process completed successfully.
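A throttled lookup of this kind could be sketched as follows. This is not the script that was actually run: it is a minimal Python illustration that assumes the Google Maps Elevation API web service is the endpoint used, and the three-second delay, the function names and the `(name, lat, lng)` input shape are all assumptions.

```python
import json
import time
import urllib.parse
import urllib.request

ELEVATION_URL = "https://maps.googleapis.com/maps/api/elevation/json"


def fetch_elevation(lat, lng, api_key):
    """Ask the Elevation API for one point; return metres, or None on failure."""
    query = urllib.parse.urlencode({"locations": f"{lat},{lng}", "key": api_key})
    with urllib.request.urlopen(f"{ELEVATION_URL}?{query}") as response:
        data = json.load(response)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    return data["results"][0]["elevation"]


def gather_altitudes(places, fetch, delay_seconds=3.0, sleep=time.sleep):
    """Look up each (name, lat, lng) in turn, pausing between requests.

    Returns a list of (name, elevation) pairs in input order. A list is used
    rather than a dict because names such as 'F.P.' repeat many times.
    """
    results = []
    for index, (name, lat, lng) in enumerate(places):
        if index > 0:
            sleep(delay_seconds)  # throttle so the service does not block us
        results.append((name, fetch(lat, lng)))
    return results
```

With a real key this would be called as `gather_altitudes(places, lambda lat, lng: fetch_elevation(lat, lng, "YOUR_KEY"))`. Passing the fetch and sleep functions in as parameters also makes it easy to try the loop on a handful of names without making any network requests.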
Also this week I dealt with an issue with the SCOTS corpus, which had broken (the database had gone offline). I also helped Raymond at Arts IT Support to investigate why the Anglo-Norman Dictionary server had been blocking uploads to the dictionary management system when thousands of files were added to the upload form. It turns out that while the Glasgow IP address range had been added to the whitelist, the VPN’s IP address range hadn’t, which is why uploads were being blocked.
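The whitelist problem can be illustrated with a small Python sketch using the standard `ipaddress` module. The ranges below are reserved documentation ranges standing in for the real campus and VPN ranges, which are not given here, and the function name is hypothetical.

```python
import ipaddress

# Stand-in ranges (RFC 5737 documentation blocks), not the real ones.
CAMPUS_RANGES = ["192.0.2.0/24"]
VPN_RANGE = "198.51.100.0/24"


def is_whitelisted(ip, allowed_ranges):
    """Return True if the address falls inside any of the allowed ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)
```

In this sketch a request from `198.51.100.7` is rejected while only `CAMPUS_RANGES` is allowed, and accepted once `VPN_RANGE` is added to the list, which mirrors the behaviour seen with the uploads.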
Next week I’m also taking a couple of days off to cover the Easter school holidays, and will no doubt continue with the DSL and Comparative Kingship projects then.