Week Beginning 7th September 2020

This was a pretty busy week, involving lots of different projects.  I set up the systems for a new place-name project focusing on Ayrshire, based on the system that I initially developed for the Berwickshire project and that has subsequently been used for Kirkcudbrightshire and Mull.  It didn’t take too long to port the system over, but the PI also wanted it to be populated with data from the GB1900 crowdsourcing project.  This project transcribed every place-name on the GB1900 Ordnance Survey maps across the whole of the UK and is an amazing collection of data totalling some 2.5 million names.  I had previously extracted a subset of names for the Mull and Ulva project, so thankfully I had all of the scripts needed to get the information for Ayrshire.  Unfortunately what I didn’t have was the data in a database, as I’d previously extracted it to my PC at work.  This meant that I had to run the extraction script again on my home PC, which took about three days to work through all of the rows in the monstrous CSV file.  Once this was complete I could extract the names found in the Ayrshire parishes that the project will be dealing with, resulting in almost 4,000 place-names.  However, this wasn’t the end of the process: while the extracted place-names had latitude and longitude, they didn’t have grid references or altitude.  My place-names system is set up to generate these values automatically, and I could customise the scripts to apply the generated data to each of the 4,000 places.  Generating the grid reference was pretty straightforward, but grabbing the altitude was less so, as it involved submitting a query to Google Maps and then inserting the returned value into my system using an AJAX call.  I ran into difficulties with my script exceeding the allowed number of Google Maps queries and also the maximum number of page requests on our server, resulting in my PC getting blocked by the server and a ‘Forbidden’ error being displayed instead, but with some tweaking I managed to get everything working within the allowed limits.
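To give a flavour of the altitude lookup, here’s a minimal sketch of a throttled version of the process.  The Google Elevation API call should be roughly right, but the delay, the endpoint in the place-names content management system and the field names are illustrative assumptions rather than the actual code.

```typescript
// Sketch only: the CMS endpoint, field names and delay value are assumptions.
const GOOGLE_KEY = "YOUR_API_KEY";   // assumes a valid Elevation API key
const DELAY_MS = 1500;               // pause between requests to stay within limits

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Ask the Google Elevation API for the altitude (in metres) at a point.
async function fetchAltitude(lat: number, lng: number): Promise<number> {
  const url = `https://maps.googleapis.com/maps/api/elevation/json?locations=${lat},${lng}&key=${GOOGLE_KEY}`;
  const data = await (await fetch(url)).json();
  return data.results[0].elevation;
}

// Work through the extracted places one at a time, saving each altitude
// back to the place-names system via a (hypothetical) AJAX endpoint.
async function populateAltitudes(places: { id: number; lat: number; lng: number }[]) {
  for (const place of places) {
    const altitude = await fetchAltitude(place.lat, place.lng);
    await fetch("/ayrshire/api/update-altitude", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ id: place.id, altitude }),
    });
    await sleep(DELAY_MS); // throttle so neither Google nor our own server blocks us
  }
}
```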

I also continued to work on the Second Edition of the Historical Thesaurus.  I set up a new version of the website that we will work on for the Second Edition, and created new versions of the database tables that this new site connects to.  I also spent some time thinking about how we will implement some kind of changelog or ‘history’ feature to track changes to the lexemes, their dates and corresponding categories.  I had a Zoom call with Marc and Fraser on Wednesday to discuss the developments, and we realised that the date matching spreadsheets I’d generated last week could do with some additional columns from the OED data, namely links through to the entries on the OED website and a note to say whether the definition contains ‘(a)’ or ‘(also’, as these would suggest that the entry has multiple senses that may need a closer analysis of the dates.
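As a rough illustration of those extra columns, something along the following lines would generate them.  The field names and the OED link pattern here are assumptions made for the sake of the example rather than the actual structure of our OED data.

```typescript
// Illustrative sketch only: field names and the OED URL pattern are assumptions.
interface OedRow {
  refentry: string;   // assumed identifier for the OED entry
  refid: string;      // assumed identifier for the sense within the entry
  definition: string;
}

function extraSpreadsheetColumns(row: OedRow) {
  return {
    // hypothetical link through to the entry on the OED website
    oedLink: `https://www.oed.com/view/Entry/${row.refentry}#eid${row.refid}`,
    // flag definitions containing '(a)' or '(also', which suggest multiple senses
    multipleSenses: row.definition.includes("(a)") || row.definition.includes("(also"),
  };
}
```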

I then started to update the new front-end to use the new date structure that we will use for the Second Edition (with dates stored in a separate date table rather than split across almost 20 different date fields in the lexeme table).  I updated the timeline visualisations (mini and full) to use this new date table, and although the new structure took quite some time to get my head around, the resulting code is MUCH less complicated than the horrible code I had to write to deal with the old 20-odd date columns.  For example, the code to generate the data for the mini timelines is about 70 lines long now as opposed to over 400 previously.
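To illustrate why the new structure is so much simpler: generating a mini timeline is now just a matter of collapsing a lexeme’s rows in the date table into spans, rather than checking 20-odd separate columns.  The sketch below uses assumed table and field names, not the real Historical Thesaurus schema.

```typescript
// Assumed, simplified shape for rows in the new date table.
interface DateRow {
  lexemeId: number;
  startYear: number;
  endYear: number | null; // null where the word is still in use
}

interface TimelineSpan { start: number; end: number }

// Build the spans for one lexeme's mini timeline.
function miniTimelineData(rows: DateRow[], lexemeId: number): TimelineSpan[] {
  const currentYear = new Date().getFullYear();
  return rows
    .filter((r) => r.lexemeId === lexemeId)
    .map((r) => ({ start: r.startYear, end: r.endYear ?? currentYear }))
    .sort((a, b) => a.start - b.start);
}
```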

The timelines now use the new date tables in both the category browse and the search results.  I also spotted some dates that weren’t working properly with the old system but are displaying correctly now.  I then updated the ‘label’ autocomplete in the advanced search to use the labels in the new date table.  What I still need to do is update the search to actually search for the new labels and also to search the new date tables for both ‘simple’ and ‘complex’ year searches.  This might be a little tricky, and I will continue with it next week.
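The logic I have in mind for the year searches is sketched below, assuming that a ‘simple’ search looks for lexemes current in a single year and a ‘complex’ search for dates overlapping a range; the field names are assumptions and this is the general idea rather than the implementation itself.

```typescript
// Assumed, simplified shape for rows in the new date table.
interface DateRow {
  lexemeId: number;
  startYear: number;
  endYear: number | null; // null where the word is still current
}

// 'Simple' search: lexemes in use in a single year.
function simpleYearSearch(rows: DateRow[], year: number): number[] {
  return rows
    .filter((r) => r.startYear <= year && (r.endYear ?? Infinity) >= year)
    .map((r) => r.lexemeId);
}

// 'Complex' search: lexemes whose dates overlap a year range.
function complexYearSearch(rows: DateRow[], from: number, to: number): number[] {
  return rows
    .filter((r) => r.startYear <= to && (r.endYear ?? Infinity) >= from)
    .map((r) => r.lexemeId);
}
```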

Also this week I gave Gerry McKeever some advice about preserving the data of his Regional Romanticism project, spoke to the DSL people about the wording of the search results page, gave feedback on and wrote some sections for Matthew Creasy’s Chancellor’s Fund proposal, gave feedback to Craig Lamont regarding the structure of a spreadsheet for holding data about the correspondence of Robert Burns and gave some advice to Rob Maslen about the stats for his ‘City of Lost Books’ blog.  I also made a couple of tweaks to the content management system for the Books and Borrowers project based on feedback from the team.

I spent the remainder of the week working on the redevelopment of the Anglo-Norman dictionary.  I updated the search results page to style the parts of speech to make it clearer where one ends and the next begins.  I also reworked the ‘forms’ section to add in a cut-off point for entries that have a huge number of forms.  In such cases the long list is cut off and an ellipsis is added in, together with an ‘expand’ button.  Pressing on this reveals the full list of forms and the button is replaced with a ‘collapse’ button.  I also updated the search so that it no longer includes cross references (these are to be used for the ‘Browse’ list only), and the quick search now defaults to an exact match search whether you select an item from the auto-complete or not.  Previously it performed an exact match if you selected an item but defaulted to a partial match if you didn’t.  Now if you search for ‘mes’ (for example) and press enter or the search button, your results are for “mes” (exactly).  I suspect most people will select ‘mes’ from the list of options anyway, which already behaved this way.  It is also still possible to use the question mark wildcard with an ‘exact’ search, e.g. “m?s” will find 14 entries that have three-letter forms beginning with ‘m’ and ending in ‘s’.
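The wildcard handling essentially just treats ‘?’ as a single-character placeholder while keeping the rest of the term anchored.  A minimal sketch of that conversion (not the actual site code) might look like this:

```typescript
// Sketch: convert a quick-search term with '?' wildcards into an exact-match
// pattern. Neither function is the actual site code.

// SQL LIKE version: '?' becomes the single-character wildcard '_'.
function toLikePattern(term: string): string {
  return term.replace(/[%_\\]/g, "\\$&").replace(/\?/g, "_");
}

// Regular expression version: '?' becomes '.', everything else matches exactly.
function toExactRegex(term: string): RegExp {
  const escaped = term.replace(/[.*+^${}()|[\]\\]/g, "\\$&").replace(/\?/g, ".");
  return new RegExp(`^${escaped}$`, "i");
}

// e.g. toExactRegex("m?s").test("mes") is true, but "mess" does not match
// because the pattern is anchored at both ends.
```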

I also updated the display of the parts of speech so that they appear in order of appearance in the XML rather than alphabetically, and updated the ‘v.a.’ and ‘v.n.’ labels as the editor requested.  I also updated the ‘entry’ page so that the ‘results’ tab loads by default when reaching an entry from the search results page or when choosing a different entry in the search results tab.  In addition, the search result navigation buttons no longer appear in the search tab if all the results fit on the page, and the ‘clear search’ button now works properly.  Also, on the search results page the pagination options now only appear if there is more than one page of results.

On Friday I began to process the entry XML for display on the entry page, which was pretty slow going as I waded through the XSLT file that is used to transform the XML to HTML for display.  Unfortunately I can’t just use the existing XSLT file from the old site because we’re using the editor’s version of the XML rather than the system version, and the two are structurally very different in places.

So far I’ve been dealing with forms and have managed to get them listed, with grammatical labels displayed where available, commas separating forms and semi-colons separating groups of forms.  Deviant forms are surrounded by brackets.  Where there are lots of forms the list is cut off, as with the search results.  I still need to add in references where these appear, which is what I’ll tackle next week.  Hopefully now that I’ve started to get my head around the XML a bit, progress with the rest of the page will be a little speedier, but there will undoubtedly be many more complexities that will need to be dealt with.
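As a very rough sketch of the forms rendering described above, assuming a simplified shape for the parsed XML (groups of forms, each group optionally carrying a grammatical label and each form possibly marked as deviant), the logic is along these lines; the type names, field names and label placement are all illustrative assumptions rather than the actual AND structure.

```typescript
// Sketch only: a simplified, assumed structure for the parsed forms data.
interface Form { text: string; deviant?: boolean }
interface FormGroup { label?: string; forms: Form[] }

function renderForms(groups: FormGroup[]): string {
  return groups
    .map((group) => {
      const forms = group.forms
        .map((f) => (f.deviant ? `(${f.text})` : f.text)) // deviant forms in brackets
        .join(", ");                                       // commas within a group
      // label placement is a guess; a grammatical label is shown where one is available
      return group.label ? `${group.label} ${forms}` : forms;
    })
    .join("; "); // semi-colons between groups of forms
}
```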