Week Beginning 4th September 2023

I continued with the new developments for the Dictionaries of the Scots Language for most of this week, focussing primarily on implementing the sparklines for dates of attestation.  I decided to use the same JavaScript library as I used for the Historical Thesaurus (https://omnipotent.net/jquery.sparkline) to produce a mini bar chart for the date range, with a 1 where a date is present and a 0 where it is not.  In order to create the ranges for an entry, all of the citations that have a date for the entry are returned in date order.  For SND the sparkline range is 1700 to 2000 and for DOST the range is 1050 to 1700.  Any citation with a date outside this range is given the start or end date of the range, as applicable.  Each year in the range is created with a 0 assigned by default and then my script iterates through the citations to figure out which years need to be assigned a 1, taking into consideration citations that have a date range in addition to ones that have a single year.  After that my script iterates through the years to generate blocks of 1 values where individual 1s are found 25 years or less from each other, as I’d agreed with the team, in order to make continuous periods of usage.  My script also generates a textual representation of the blocks and individual years that is then used as a tooltip for the sparkline.
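
The steps above can be sketched roughly as follows.  This is only an illustration and not the script I actually used: the citation shape (`from` and an optional `to`), the names `RANGES`, `MAX_GAP` and `buildYearFlags`, and the decision to treat the end year of each range as exclusive (so that SND has 300 year slots and DOST 650) are all assumptions made for the example.

```javascript
// Assumed ranges: start year inclusive, end year exclusive (an assumption,
// chosen so that SND covers 300 year slots and DOST covers 650).
const RANGES = { snd: [1700, 2000], dost: [1050, 1700] };

// Attested years this close together (or closer) are joined into one block.
const MAX_GAP = 25;

function clamp(year, min, max) {
  return Math.min(Math.max(year, min), max);
}

// citations: hypothetical objects such as { from: 1710 } or { from: 1710, to: 1725 }.
// Returns an array holding one 0/1 flag per year in the dictionary's range.
function buildYearFlags(citations, dictionary) {
  const [start, end] = RANGES[dictionary];
  const lastYear = end - 1;
  const flags = new Array(end - start).fill(0);

  // Mark every year covered by a citation, pulling out-of-range dates
  // back to the first or last year of the range.
  for (const c of citations) {
    const from = clamp(c.from, start, lastYear);
    const to = clamp(c.to ?? c.from, start, lastYear);
    for (let y = from; y <= to; y++) {
      flags[y - start] = 1;
    }
  }

  // Fill the gap between two attested years when they are MAX_GAP years
  // or less apart, producing continuous periods of usage.
  let previous = -1;
  for (let i = 0; i < flags.length; i++) {
    if (flags[i] !== 1) continue;
    if (previous >= 0 && i - previous <= MAX_GAP) {
      for (let j = previous + 1; j < i; j++) {
        flags[j] = 1;
      }
    }
    previous = i;
  }
  return flags;
}
```

For example, `buildYearFlags([{ from: 1710 }, { from: 1730 }], 'snd')` would mark every year from 1710 to 1730, because the two citations are only 20 years apart.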

I’d originally intended each year in the range to then appear as a bar in the sparkline, with no gaps between the bars in order to make larger blocks, but the bar chart sparkline that the library offers has a minimum bar width of 1 pixel.  As the DOST period is 650 years this meant the sparkline would be 650 pixels wide.  The screenshot below shows how this would have looked (note that in this and the following two screenshots the data represented in the sparklines is test data and doesn’t correspond to the individual entries):

I then tried grouping the individual years into bars representing five years instead.  If a 1 was present in a five-year period then the value for that five-year block was given a 1, otherwise it was given a 0.  As you can see in the following screenshot, this worked pretty well, giving the same overall view of the data but in a smaller space.  However, the sparklines were still a bit too long.  I also added in the first attested date for the entry to the left of the sparkline here, as was specified in the requirements document:

As a further experiment I grouped the individual years into bars representing a decade, and again if a year in that decade featured a 1 the decade was assigned a 1, otherwise it was assigned a 0.  This resulted in a sparkline that I reckon is about the right size, as you can see in the screenshot below:

With this in place I then updated the Solr indexes for entries and quotations to add in fields for the sparkline data and the sparkline tooltip text.  I then updated my scripts that generate entry and quotation data for Solr to incorporate the code for generating the sparklines, first creating blocks of attestation where individual citation dates were separated by 25 years or less and then further grouping the data into decades.  It took some time to get this working just right.  For example, on my first attempt the textual version output a range with the same start and end year (e.g. 1710-1710) whenever it encountered an individual year, when it should have just output the single year.  But after a few iterations the data was output correctly and I imported the new data into Solr.
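
As a rough sketch of the two derived values described above (the decade bars and the tooltip text), assuming the per-year data is held as an array of 0/1 flags: the function names `groupIntoBins` and `describeBlocks` are made up for this example and the real scripts may be structured quite differently.

```javascript
// Collapse per-year flags into bars of binSize years each.
// A bar is 1 if any year inside it is 1, otherwise 0.
function groupIntoBins(yearFlags, binSize) {
  const bins = [];
  for (let i = 0; i < yearFlags.length; i += binSize) {
    bins.push(yearFlags.slice(i, i + binSize).includes(1) ? 1 : 0);
  }
  return bins;
}

// Build tooltip text from per-year flags, where startYear is the year
// that index 0 represents.  A block covering a single year is written
// as that year alone rather than as a range such as 1710-1710.
function describeBlocks(yearFlags, startYear) {
  const parts = [];
  let blockStart = null;
  // Run one step past the end so a block touching the final year is closed.
  for (let i = 0; i <= yearFlags.length; i++) {
    const attested = i < yearFlags.length && yearFlags[i] === 1;
    if (attested && blockStart === null) {
      blockStart = i;
    } else if (!attested && blockStart !== null) {
      const first = startYear + blockStart;
      const last = startYear + i - 1;
      parts.push(first === last ? String(first) : `${first}-${last}`);
      blockStart = null;
    }
  }
  return parts.join(', ');
}

// Usage: 30 years starting at 1700, attested in 1703 and 1725 to 1727.
const sample = new Array(30).fill(0);
sample[3] = 1;
sample[25] = sample[26] = sample[27] = 1;
const decadeBars = groupIntoBins(sample, 10);   // [1, 0, 1]
const tooltip = describeBlocks(sample, 1700);   // "1703, 1725-1727"
```

Passing 5 rather than 10 as the bin size gives the five-year grouping I tried first.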

With the sparkline data in Solr I then needed to update the API to retrieve the data alongside other data types and after that I could work with the data in the front-end, populating the sparklines for each result with the data for each entry and adding in the textual representation as a tooltip.  Having previously worked with a DOST entry as a sample, I realised at this point that as the SND period is much shorter (300 years as opposed to 650) the SND sparklines would be a lot shorter (30 pixels as opposed to 65).  Thankfully the sparkline library allows you to specify the width of the bars as each sparkline is generated and I set the width of SND bars to two pixels as opposed to the one pixel for DOST, making the SND sparklines a comparable 60 pixels wide.  It does mean that the visualisation of the SND data is not exactly the same as for DOST (e.g. an individual year appears as a bar 2 pixels wide as opposed to 1) but I think the overall picture given is comparable and I don’t think this is a problem, as we are just giving an overall impression of periods of attestation after all.  The screenshot below shows the search results with the sparklines working with actual data, and also demonstrates a tooltip that displays the actual periods of attestation:

At this point I spotted another couple of quirks that needed to be dealt with.  Firstly, we have some entries that don’t feature any citations that include dates.  These understandably displayed a blank sparkline.  In such cases I have updated the tooltip text to display ‘No dates of attestation currently available’.  Secondly, there is a bug in the sparkline library that means an empty sparkline is displayed if all data values are identical.  Having spotted this I updated my code to ensure a full block of colour was displayed in the sparkline instead of white.
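
The per-dictionary bar widths and the two quirks above could be handled along these lines.  This is a hedged sketch rather than the code on the site: `sparklineSettings` is a made-up helper, using a `title` attribute for the tooltip is an assumption, and setting the library's `chartRangeMin` option to 0 is just one plausible workaround for the identical-values problem, as I haven't recorded here exactly how I fixed it.

```javascript
// Bar widths in pixels: DOST has 65 decade bars and SND has 30, so a
// wider SND bar gives sparklines of a comparable overall width.
const BAR_WIDTH = { dost: 1, snd: 2 };

const NO_DATES_TEXT = 'No dates of attestation currently available';

// bins: array of 0/1 values, one per decade.
// tooltipText: the textual periods of attestation for the entry.
function sparklineSettings(dictionary, bins, tooltipText) {
  const hasDates = bins.includes(1);
  return {
    values: bins,
    // Entries without any dated citations get a fixed message instead.
    title: hasDates ? tooltipText : NO_DATES_TEXT,
    // Overall width, e.g. 65 x 1 = 65px for DOST and 30 x 2 = 60px for SND.
    widthPx: bins.length * BAR_WIDTH[dictionary],
    options: {
      type: 'bar',
      barWidth: BAR_WIDTH[dictionary],
      barSpacing: 0,
      // Assumed workaround: pin the bottom of the value range to 0 so that
      // a sparkline consisting entirely of 1s still has a non-zero range.
      chartRangeMin: 0
    }
  };
}

// Assumed browser usage with jQuery and the sparkline plugin loaded:
//   const s = sparklineSettings('snd', bins, text);
//   $(element).attr('title', s.title).sparkline(s.values, s.options);
```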

With the sparklines in the search results now working I then moved onto the display of sparklines in the entry page.  I wasn’t entirely sure where the best place to put the sparkline was, so for now I’ve added it to the ‘About this entry’ section.  I’ve also added the dates of attestation to this section.  This is a simplified version showing the start and end dates.  I’ve used ‘to’ to separate the start and end date rather than a dash because both the start and end dates can in themselves be ranges.  This is because here I’m using the display version of the first date of the earliest citation and the last date of the latest citation (or first date if there is no last date).  Note that this includes prefixes and representations such as ’15..’.  The sparkline tooltip uses the raw years only.  You can see an entry with the new dates and sparkline below:

The design of the sparklines isn’t finalised yet and we may choose to display them differently.  For example, we don’t need to use the purple I’ve chosen and we could have rounded ends.  The following screenshot shows the sparklines with the blue from the site header as a bar colour and rounded ends.  This looks quite pleasing, but rounded ends do make it a little more difficult to see the data at the ends of the sparkline.  See for example DOST ‘scunner n.’ where the two lines at the very right of the sparkline are a bit hard to see.

I also managed to complete the final task in this block of work for the DSL, which was to add in links to the search results to download the data as a CSV.  The API already has facilities to output data as a CSV, but I needed to tweak this a bit to ensure the data was exported as we needed it.  Fields that were arrays were not displaying properly and certain fields needed to be suppressed.  For other sites I’ve developed I was able to link directly to the API’s CSV output from the front-end but the DSL’s API is not publicly accessible so I had to do things a little differently here.  Instead, pressing on the ‘download’ link fires an AJAX call to a PHP script that passes the query string to the API without exposing the URL of the API, then takes the CSV data and presents it as a downloadable file.  This took a bit of time to sort out as the API was in itself offering the CSV as a downloadable file and this wasn’t working when being passed to another script.  Instead I had to set the API to output the CSV data on screen, meaning the script called via AJAX could then grab this data and process it.
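
To illustrate the kind of CSV tweaks mentioned above (flattening array fields and suppressing certain fields), here is a minimal sketch.  It is written in JavaScript purely for illustration and is not the API’s actual export code; the field name in `SUPPRESSED`, the ‘; ’ separator for array values and the function names are all hypothetical.

```javascript
// Hypothetical list of fields that should not appear in the export.
const SUPPRESSED = new Set(['internal_id']);

// Convert one value to a CSV cell.  Arrays are joined into a single
// string, and cells containing commas, quotes or line breaks are quoted
// with any embedded quotes doubled.
function csvCell(value) {
  const text = Array.isArray(value) ? value.join('; ') : String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// rows: array of plain objects that all share the same keys.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const columns = Object.keys(rows[0]).filter((name) => !SUPPRESSED.has(name));
  const lines = [columns.map(csvCell).join(',')];
  for (const row of rows) {
    lines.push(columns.map((name) => csvCell(row[name])).join(','));
  }
  return lines.join('\r\n');
}
```

With a helper like this, a row whose part-of-speech field is the array `['n.', 'v.']` would be exported as the single cell `n.; v.` rather than as an unreadable array value.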

With all of this working I put in a Helpdesk request to get the Solr instances set up and populated on the server and I then copied all of the updated files to the DSL’s test instance.  As of Friday the new Solr indexes don’t seem to be working but hopefully early next week everything will be operational.  I’ll then just need to tweak the search strings of the headword search so that the new Solr headword search matches the existing search.

Also this week I had a chat with Thomas Clancy about the development of the front-end for the Iona place-names project.  About a year ago I wrote a specification for the front-end and never heard anything further about it, but it looks like development will be starting soon.  I also had a chat with Jennifer Smith about the data for the Speak For Yersel spin-off projects and it looks like this will be coming together in the next few weeks too.  We also discussed another project that may use the data from SCOSYA and I might have some involvement in this.

Other than that I spent a bit of time on the Anglo-Norman Dictionary, creating a CSS file to style the entry XML in the Oxygen XML editor’s ‘Author’ view.  The team are intending to use this view to collaborate on the entries and previously we hadn’t created any styles for it.  I had to generate styles that replicated the look of the online dictionary as much as possible, which took some time to get right.  I’m pretty happy with the end result, though, which you can see in the following screenshot: