Week Beginning 9th November 2020

I took Friday off this week as I had a dentist’s appointment across town in the West End, and I decided to take the opportunity to do some Christmas shopping whilst all the shops in Glasgow are still open (there’s some talk of us having greater Covid restrictions imposed in the next week or so).  I spent a couple of days this week working on the Dictionary of the Scots Language, a project I’ve been meaning to return to for many months but have been too busy with other work to really focus on.  Thankfully, with the launch of the second edition of the Historical Thesaurus now out of the way, I have a bit of time in November to get back into the outstanding DSL issues.

Rhona Alcorn had sent a list of outstanding tasks a while back and I spent some time going through this and commenting on each item.  I then began to work through the items, starting with fixing cross references in our ‘V3’ test site (which features data that the editors have been working on in recent years).  Cross references appear differently in the XML for this version so I needed to update the XSLT in order to make them work correctly.  I then updated the full-text extraction script that prepares data for inclusion in the Solr search engine.  Previously this stripped out all of the XML tags in order to leave the plain text, but unfortunately there were occasions where an entry contained words separated by tags but no spaces, meaning that when the tags were removed the words ended up joined together.  I fixed this by inserting a space character before every XML tag prior to stripping the tags out.  This resulted in plain text that often contained multiple spaces between words, but thankfully Solr ignores these when it indexes the text.  I asked Raymond of Arts IT Support to upload the new text to the server, then tested things out, and all worked perfectly.
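The tag-stripping fix can be sketched as follows.  This is a minimal illustration in Python, not the actual extraction script: the function name, the regular expression and the sample entry are all assumptions, and the simple pattern would not cope with comments or CDATA sections that contain ‘>’.

```python
import re

# Simplified pattern for an XML tag (an assumption; it ignores comments and CDATA).
TAG_PATTERN = re.compile(r"<[^>]+>")


def extract_plain_text(xml_fragment: str) -> str:
    """Return the text of an entry with all tags removed.

    A space is inserted before every tag first, so that words separated
    only by tags do not end up joined together.
    """
    spaced = TAG_PATTERN.sub(lambda match: " " + match.group(0), xml_fragment)
    return TAG_PATTERN.sub("", spaced)


if __name__ == "__main__":
    sample = "<form>abune</form><pos>adv</pos>"  # hypothetical entry fragment
    print(repr(TAG_PATTERN.sub("", sample)))     # naive stripping joins the words
    print(repr(extract_plain_text(sample)))      # words stay apart, with extra spaces
```

The output contains runs of spaces, which matches the behaviour described above and is harmless as long as the indexer ignores repeated whitespace.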

After this I moved on to creating a new ordering for the ‘browse’ feature.  This new ordering takes into consideration parts of speech and ensures that supplemental entries appear below main entries.  It also correctly positions entries beginning with a yogh.  I’d created a script to generate the new browse order many months ago, so I could just tweak this and then use it to update the database.  After that I needed to make some updates to the V2 and V3 front-ends to use the new ordering fields, which took a little time, but it seems to have worked successfully.  I may need to tweak the ordering further, but will await feedback before I make any changes.
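As a rough sketch of how such a browse order might be generated, the Python below builds a sort key from the headword, a supplemental flag and the part of speech.  Everything specific here is an assumption made for illustration: the field names, the ranking of the parts of speech, the precedence of the supplemental flag over part of speech, and in particular the choice to sort yogh alongside ‘y’.  The real script may well order things differently.

```python
from dataclasses import dataclass

# Hypothetical ranking of parts of speech; unknown ones sort last.
POS_RANK = {"n.": 0, "v.": 1, "adj.": 2, "adv.": 3}

# Assumption: headwords containing a yogh are sorted as if it were 'y'.
YOGH_SORTS_AS = "y"


@dataclass
class Entry:
    headword: str
    pos: str
    supplemental: bool


def browse_key(entry: Entry):
    """Sort by normalised headword, then main before supplemental, then POS."""
    normalised = entry.headword.lower().replace("\u021d", YOGH_SORTS_AS)
    return (normalised, entry.supplemental, POS_RANK.get(entry.pos, len(POS_RANK)))


def assign_browse_order(entries):
    """Return (position, entry) pairs suitable for storing in an ordering field."""
    return list(enumerate(sorted(entries, key=browse_key), start=1))
```

Storing the resulting position in a dedicated column means the front-ends only need to order by that column rather than repeat the logic.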

I then moved on to investigating searches for accented characters, which were apparently not working correctly.  I noticed that the htaccess file was not set up to accept accented characters so I updated this.  However, the advanced headword search itself was already finding forms with accented characters in them if the non-accented version was passed.  The ‘privace’ example was redirecting to the entry page as only one result was matched, but if you perform a search for ‘*vace’ it finds and displays the accented headword in both V2 and V3, though not on the live site.  Therefore I think this issue is now sorted.  However, we should perhaps strip out accents from any submitted search terms.  Allowing accented characters to be submitted (e.g. for *vacé) gives the impression that accented characters can be searched for distinctly from their unaccented versions, and results that include both accented and unaccented forms might confuse people.
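Stripping accents from a submitted search term could be done along the following lines.  This is only a sketch in Python of one common approach (Unicode decomposition followed by removal of combining marks); the function name is hypothetical and it is not what the DSL site currently does.

```python
import unicodedata


def strip_accents(term: str) -> str:
    """Remove combining accents from a search term, leaving wildcards intact."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("*vacé"))
```

Letters that have no decomposition, such as yogh, pass through unchanged, so this only affects characters that are a base letter plus an accent.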

The last DSL issue I looked at involved hiding superscript characters in certain circumstances (after ‘geo’ tags in ‘cref’ tags).  There are 3093 SND entries that include the text ‘</geo><su>’ or ‘</geo> <su>’ and I updated the XSLT file that transforms the XML into HTML to deal with these.  Previously it transformed the <su> tag into the HTML superscript tag <sup>.  I’ve updated it so that it now checks to see what the tag’s preceding sibling is.  If it’s a <geo> tag it now adds the class ‘noSup’ to the generated <sup>.  Currently I’ve set <sup> elements with this class to have a pink background so the editors can check to see how the match is performing, and once they’re happy with it I can update the CSS to hide ‘noSup’ elements.
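The real change was made in XSLT, but the logic of the preceding-sibling check can be illustrated in Python.  The sketch below is an approximation under stated assumptions: the function name and sample fragments are invented, and it additionally requires that only whitespace sits between the <geo> and the <su>, to mirror the ‘</geo><su>’ and ‘</geo> <su>’ patterns mentioned above.

```python
import xml.etree.ElementTree as ET


def convert_su_tags(xml_fragment: str) -> str:
    """Turn <su> into <sup>, adding class 'noSup' when it directly follows <geo>."""
    root = ET.fromstring(xml_fragment)
    for parent in root.iter():
        previous = None
        for child in parent:
            if child.tag == "su":
                child.tag = "sup"
                follows_geo = (
                    previous is not None
                    and previous.tag == "geo"
                    # Only whitespace (or nothing) between </geo> and <su>.
                    and not (previous.tail or "").strip()
                )
                if follows_geo:
                    child.set("class", "noSup")
            previous = child
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical fragments, not taken from the SND data.
    print(convert_su_tags("<cref><geo>Abd.</geo><su>1</su></cref>"))
    print(convert_su_tags("<cref><b>x</b><su>1</su></cref>"))
```

Marking the elements with a class rather than dropping them outright keeps the decision in the CSS, so the matches can be highlighted for checking and later hidden without touching the transformation again.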

Other than DSL work I also spent some time continuing to work on the redevelopment of the Anglo-Norman Dictionary and completed an initial version of the label search that I began working on last week.  The search form as discussed last week hasn’t changed, but it’s now possible to submit the search, navigate through the search results, return to the search form to make changes to your selection and view entries.  I needed to overhaul how the search page works to accommodate the label search, which required some pretty major changes behind the scenes, but hopefully none of the other searches have been affected by this.  You can select a single label and search for that, e.g. ‘archit.’.  If you then refine your search you will see that the label is ‘remembered’ in the form so you can add to it or remove it, for example if you’re interested in all of the entries that are labelled ‘archit.’ and ‘mil’.  As mentioned last week, adding or changing a citation year resets the boxes as different labels are displayed depending on the years chosen.  The chosen year is remembered by the form if you choose to refine your search, and the labels, the selected labels and the Booleans are pulled in alongside the remembered year.  So for example if you want to find entries that feature a sense labelled ‘agricultural’ or ‘bot.’ that have a citation between 1400 and 1410 you can do this.  On the entry page both semantic and usage labels are now links that lead through to the search results for the label in question.  I’ve currently given both label types a somewhat garish pink colour, but this can be changed, or we could use two different colours for the two types.

Other than these projects, I fixed an issue with the 18th century Glasgow borrowers site (https://18c-borrowing.glasgow.ac.uk/) and made some tweaks to the place-names of Iona site, fixing the banner and creating Gaelic versions of the pages and menu items.  The site is not live yet, but I’m pretty happy with how it’s looking.  Here’s an image of the banner I created:

Also this week I spoke to Kirsteen McCue about the project she’s currently preparing a proposal for and I created a new version of the Burns Suppers map for Paul Malgrati.  This was rather tricky as his data is contained in a spreadsheet that has more than 2,500 rows and more than 90 columns, and it took some time to process this in a way that worked, especially as some fields contained carriage returns which resulted in lines being split where they shouldn’t be when the data was exported.  However, I got there in the end, and next week I hope to develop the filters for the data.
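One possible way of coping with carriage returns inside fields, assuming the spreadsheet is exported as CSV with such fields wrapped in quotes, is to let a CSV parser handle the quoting rather than splitting the export on line breaks.  The sketch below uses Python’s standard csv module; the function name and sample data are made up and this is not necessarily how the map data was actually processed.

```python
import csv
import io
import re

# Any run of line breaks (with surrounding whitespace) inside a cell.
LINE_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def read_rows(csv_text: str):
    """Parse CSV text, keeping quoted multi-line cells in a single row."""
    # newline="" leaves line endings untouched so the csv module can
    # tell a row terminator from a line break inside a quoted field.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [[LINE_BREAKS.sub(" ", cell) for cell in row] for row in reader]


if __name__ == "__main__":
    sample = 'id,notes\r\n1,"first line\rsecond line"\r\n2,plain\r\n'
    for row in read_rows(sample):
        print(row)
```

When reading from a file instead, the equivalent is to open it with newline="" before passing it to csv.reader.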