Week Beginning 19th August 2019

After meeting with Fraser to discuss his Scots Thesaurus project last Friday I spent some time on Monday this week writing a script that returns some random SND or DOST entries that meet certain criteria, so as to allow him to figure out how these might be placed into HT categories.  The script brings back main entries (as opposed to supplements) that are nouns, are monosemous (i.e. no other noun entries with the same headword), have only one sense (i.e. not multiple meanings within the entry), have fewer than 5 variant spellings, have single word headwords and have definitions that are relatively short (100 characters or fewer).  Whilst writing the script I realised that database queries are somewhat limited on the server and if I try to extract the full SND or DOST dataset to then select rows that meet the criteria in my script these limits are reached and the script just displays a blank page.  So what I had to do was set the script up to bring back a random sample of 5000 main entry nouns that don’t have multiple words in their headword in the selected dictionary.  I then have to apply the other checks on this set of 5000 random entries.  This can mean that the number of outputted entries ends up being fewer than the 200 that Fraser was hoping for, but it still provides a good selection of data.  The output is currently an HTML table, with IDs linking through to the DSL website and I’ve given the option of setting the desired number of returned rows (up to 1000) and the number of characters that should be considered a ‘short’ definition (up to 5000).  Fraser seemed pretty happy with how the script is working.
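
As a rough illustration of the sample-then-filter approach described above (this is not the actual script, which isn’t reproduced here), the logic could be sketched in Python as follows.  The Entry fields, the function name and the default values are all assumptions made for the example, and the in-memory list stands in for what would really be database queries.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entry:
    # All field names are hypothetical, chosen for this example only.
    entry_id: str
    headword: str
    pos: str             # part of speech, e.g. "n" for noun
    is_supplement: bool
    sense_count: int
    variant_count: int
    definition: str


def select_entries(entries, limit=200, max_def_len=100, sample_size=5000, rng=None):
    """Take a random sample of candidate nouns, then apply the stricter checks."""
    rng = rng or random.Random()

    # Step 1: cheap checks that define the initial pool
    # (main-entry nouns with single-word headwords).
    pool = [
        e for e in entries
        if e.pos == "n" and not e.is_supplement and " " not in e.headword
    ]
    sample = rng.sample(pool, min(sample_size, len(pool)))

    # Count noun entries per headword so that homographs can be excluded.
    noun_counts = Counter(e.headword for e in entries if e.pos == "n")

    # Step 2: the remaining checks, applied only to the sample.
    selected = [
        e for e in sample
        if noun_counts[e.headword] == 1
        and e.sense_count == 1
        and e.variant_count < 5
        and len(e.definition) <= max_def_len
    ]
    # The result can be shorter than `limit` if few sampled entries pass.
    return selected[:limit]
```

Because the stricter checks only run on the sample, the number of rows returned depends on how many of the sampled entries happen to pass, which is why the output can fall short of the requested number.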

Also this week I made some further updates to the new song story for RNSN and I spent a large amount of time on Friday preparing for my upcoming PDR session.  On Tuesday I met with Luca to have a bit of a catch-up, which was great.  I also fixed a few issues with the Thesaurus of Old English data for Jane Roberts and responded to a request for developer effort from a member of staff who is not in the College of Arts.  I also returned to working on the Books and Borrowing pilot system for Matthew Sangster, going through the data I’d uploaded in June, exporting rows with errors and sending these to Matthew for further checking.  Although there are still quite a lot of issues with the data, in terms of its structure things are pretty fixed, so I’m going to begin work on the front-end for the data next week, the plan being that I will work with the sample data as it currently stands and then replace it with a cleaner version once Matthew has finished working with it.

I divided the rest of my time this week between DSL and SCOSYA.  For the DSL I integrated the new APIs that I was working on last week with the ‘advanced search’ facilities on both the ‘new’ (v2 data) and ‘sienna’ (v3 data) test sites.  As previously discussed, the ‘headword match type’ from the live site has been removed in favour of just using wildcard characters (*?”).  Full-text searches, quotation searches and snippets should all be working, in addition to headword searches.  I’ve increased the maximum number of full-text / quotation results from 400 to 500 and I’ve updated the warning messages so they tell you how many results your query would have returned if the total number is greater than this.  I’ve tested both new versions out quite a bit and things are looking good to me, and I’ve contacted Ann and Rhona to let them know about my progress.  I think that’s all the DSL work I can do for now, until the bibliography data is made available.
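
To give an idea of how the result cap and warning message fit together, here is a minimal Python sketch.  The function name and the wording of the message are assumptions for the example, not what the DSL test sites actually use.

```python
MAX_RESULTS = 500  # cap on full-text / quotation results, as described above


def cap_results(results, max_results=MAX_RESULTS):
    """Return at most `max_results` items plus a warning if any were cut."""
    total = len(results)
    if total > max_results:
        # Hypothetical wording; the point is that the full total is reported.
        warning = (
            f"Your query would have returned {total} results; "
            f"only the first {max_results} are shown."
        )
        return results[:max_results], warning
    return results, None
```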

For SCOSYA I engaged in an email conversation with Jennifer and others about how to cover the costs of MapBox in the event of users getting through the free provision of 200,000 map loads a month after the site launches next month.  I also continued to work on the public atlas interface based on discussions we had at a team meeting last Wednesday.  The main thing was replacing the ‘Home’ map, which previously just displayed the questionnaire locations, with a new map that highlights certain locations that have sound clips that demonstrate an interesting feature.  The plan is that this will then lead users on to finding out more about these features in the stories, whilst also showing people where some of the locations the project visited are.  This meant creating facilities in the CMS to manage this data, updating the database, updating the API and updating the front-end, so a fairly major thing.

I updated the CMS to include a page to manage the markers that appear on the new ‘Home’ map.  Once logged into the CMS click on the ‘Browse Home Map Clips’ menu item to load the page.  From here staff can see all of the locations and add / edit the information for a location (adding an MP3 file and the text for the popup).  I added the data for a couple of sample locations that E had sent me.  I then added a new endpoint to the API that brings back the information about the Home clips and updated the public atlas to replace the old ‘Home’ map with the new one.  Markers are still the bright blue colour and drop into the map.  I haven’t included the markers for locations that don’t have clips.  We did talk at the meeting about including these, but I think they might just clutter the map up and confuse people.

Getting the links to the stories to work turned out to be unexpectedly tricky, as the only information that changes in the link is after the hash sign, and browsers treat such links as referring to a different point on the same page rather than doing a full page reload.  So I’ve had to handle all of the loading of the story and updating the menu in JavaScript rather than it all reloading and just working, as would happen on a full page reload.  Below is a screenshot of how the ‘Home’ map currently looks, with a pop-up open:

I also reordered and relabelled the menu, and have changed things so that you can now click on an open section to close it.  Currently doing so still triggers the map reload for certain menu items (e.g. Home).  I’ll try to stop it doing so, but I haven’t managed to yet.

I also implemented the ‘Full screen’ slide type, although I think we might need to change the style of this.  Currently it takes up about 80% of the map width, pinned to the right hand edge (which it needs to be for the animated transitions between slides to work).  It’s only as tall as the content of the slide needs it to be, though, so the map is not really being obscured, which is what Jennifer was wanting.  Although I could set it so that the slide is taller, this would then shift the navigation buttons down to the bottom of the map and if people haven’t scrolled the map fully into view they might not notice the buttons.  I’m not sure what the best approach here might be, and this needs further discussion.

I also changed the way location data is returned from the API this week, to ensure that the GeoJSON area data is only returned from the API when it is specifically asked for, rather than by default.  This means such data is only requested and used in the front-end when a user selects the ‘area’ map in the ‘Explore’ menu.  The reason for doing this is to make things load quicker and to reduce the amount of data that was being downloaded unnecessarily.  The GeoJSON data was rather large (several megabytes) and requesting this each time a map loaded meant the maps took some time to load on slower connections.  With the areas removed the stories and ‘explore’ maps that are point based are much quicker to load.  I did have to update a lot of code so that things still work without the area data being present, and I also needed to update all API URLs contained in the stories to specifically exclude GeoJSON data, but I think it’s been worth spending the time doing this.
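
The idea of only including the area data on request can be sketched like this in Python.  It is an illustration only: the function name, the ‘geojson’ key and the include_geojson flag are assumptions made for the example rather than the real SCOSYA API.

```python
def build_location_payload(locations, include_geojson=False):
    """Build the API response, leaving out area data unless it is requested."""
    payload = []
    for loc in locations:
        # Copy everything except the (large) area geometry.
        item = {key: value for key, value in loc.items() if key != "geojson"}
        if include_geojson and "geojson" in loc:
            item["geojson"] = loc["geojson"]
        payload.append(item)
    return payload
```

With a default of False, the point-based maps never pay the cost of the area geometry, and only the ‘area’ map has to ask for it.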

Week Beginning 12th August 2019

I’d taken Tuesday off this week to cover the last day of the school holidays so it was a four-day week for me.  It was a pretty busy four days, though, involving many projects.  I had some app related duties to attend to, including setting up a Google Play developer account for people in Sports and Recreation and meeting with Adam Majumdar from Research and Innovation about plans for commercialising apps in future.  I also did some further investigation into locating the Anglo-Norman Dictionary data, created a new song story for RNSN and read over Thomas Clancy’s Iona proposal materials one last time before the documents are submitted.  I also met with Fraser Dallachy to discuss his Scots Thesaurus plans and will spend a bit of time next week preparing some data for him.

Other than these tasks I split my remaining time between SCOSYA and DSL.  For SCOSYA we had a team meeting on Wednesday to discuss the public atlas.  There is only about a month left to complete all development work on the project and I was hoping that the public atlas that I’d been working on recently was more or less complete, which would then enable me to move on to the other tasks that still need to be completed, such as the experts interface and the facilities to manage access to the full dataset.  However, the team have once again changed their minds about how they want the public atlas to function and I’m therefore going to have to devote more time to this task than I had anticipated, which is rather frustrating at this late stage.  I made a start on some of the updates towards the end of the week, but there is still a lot to be done.

For DSL we finally managed to sort out the @dsl.ac.uk email addresses, meaning the DSL people can now use their email accounts again.  I also investigated and fixed an issue with the ‘v3’ version of the API which Ann Ferguson had spotted.  This version was not working with exact searches, which use speech marks.  After some investigation I discovered that the problem was being caused by the ‘v3’ API code missing a line that was present in the ‘v2’ API code.  The server automatically escapes quotes in URLs by adding a preceding backslash (\).  The ‘v2’ code was stripping this backslash before processing the query, meaning it correctly identified exact searches.  As the ‘v3’ code didn’t get rid of the backslashes it wasn’t finding the quotation mark and was not treating it as an exact search.
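
The effect of the missing line can be shown with a small Python sketch.  This is not the API code itself; the function name and the way an exact search is detected are assumptions for the example.

```python
def parse_search_term(raw):
    """Undo the server's quote escaping, then detect an exact search."""
    # Remove the escape character the server adds before each quote.
    cleaned = raw.replace('\\"', '"')
    # An exact search is taken to be a term wrapped in speech marks.
    exact = len(cleaned) >= 2 and cleaned.startswith('"') and cleaned.endswith('"')
    term = cleaned[1:-1] if exact else cleaned
    return term, exact
```

Without the replace step the term still begins with the escape character rather than the quotation mark, so the check for an exact search fails.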

I also investigated why some DSL entries were missing from the output of my script that prepared data for Solr.  I’d previously run the script on my laptop, but running it on my desktop instead seemed to output the full dataset including the rows I’d identified as being missing from the previous execution of the script.  Once I’d outputted the new dataset I sent it on to Raymond for import into Solr and then I set about integrating full-text searching into both ‘v2’ and ‘v3’ versions of the API.  This involved learning how Solr uses wildcard characters and Boolean searches, running some sample queries via the Solr interface and then updating my API scripts to connect to the Solr interface, format queries in a way that Solr could work with, submit the query and then deal with the results that Solr outputs, integrating these with fields taken from the database as required.
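
The general shape of this (build a query, then combine Solr’s results with database fields) might look something like the following Python sketch.  The base URL, the field name and the structure of the database rows are all hypothetical; only the ‘q’, ‘rows’ and ‘wt’ parameters are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_solr_url(base_url, field, term, rows=500):
    """Format a query for Solr's select handler."""
    params = {
        "q": f"{field}:{term}",  # e.g. fulltext:bairn* for a wildcard search
        "rows": rows,            # maximum number of documents to return
        "wt": "json",            # ask for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


def merge_with_db(solr_docs, db_rows):
    """Add database fields to each Solr document, matched on its id."""
    merged = []
    for doc in solr_docs:
        extra = db_rows.get(doc["id"], {})
        merged.append({**doc, **extra})
    return merged
```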

Other than the bibliography side of things I think that’s the work on the API more or less complete now (I still need to reorder the ‘browse’ output).  What I haven’t done yet is to work on the advanced search pages of the ‘new’ and ‘sienna’ versions of the website to actually work with the new APIs, so as of yet you can’t perform any free-text searches through these interfaces but only directly through the APIs.  Working to connect the front-ends fully to the APIs is my next task, which I will try to start on next week.

Week Beginning 29th July 2019

I split my time this week mostly between DSL and SCOSYA, with the bulk of my time spent making further updates to the SCOSYA public atlas based on feedback from a meeting I had last week with Jennifer and E, and also from a document that I had been given based on another project meeting which I hadn’t been invited to where the interface was discussed.  Updates included changing the ‘story’ view so that the rating data was displayed as points only rather than as areas and points.  It would appear that the team is going off the idea of using areas, which is a shame in some ways as a lot of work went into the development of the area display, but in other ways it is better as the GeoJSON data for each area is rather large and it makes the download of the rating data take quite some time over a slow internet connection.  The areas are still available in the ‘examples’ view, so are still included in the API, but it’s possible that this might be dropped too in future.  I removed the option to display both point and area data together on the map too, at the request of the team.  I also updated the rating colours with some slightly tweaked ones that E had sent me, although I still think it is far too difficult to differentiate the rating levels, especially levels 1 and 2, and 4 and 5, which are really hard to tell apart, and I’m not entirely pleased with how things are looking from the point of view of wanting a usable and easy to understand visualisation.  On a purely aesthetic level the shades look nice, but I don’t think that’s really good enough, personally.

I also removed the group selection entirely from the public atlas at the request of the team, which again I was happy to get rid of as it was only half implemented and still needed a lot of work.  I will still need to ensure that this feature works in the experts interface once I move on to creating that, though.   Other changes implemented included ensuring that the left-hand panel height resizes to fit the screen height.  This was sort of working before but there were some instances where the panel wasn’t resizing.  I think I’ve caught all these now.  Also, when a story is selected the left-hand panel now slides away.  The first story slide now features a ‘choose another story’ button which makes the left-hand panel appear again.  The story panel also now scrolls if the content is longer than the panel.  The panel also now resizes every time a slide loads so should adjust to screen dimensions a bit better.  Hopefully with these changes the story view is a bit more usable now on mobile devices.

Further updates to the atlas that I made this week included introducing a new fractional zoom level.  It’s now possible to zoom in and out at 0.25 increments rather than increments of 1.  This works for scroll-zooming, pinch zooming on touchscreens and when using the ‘+/-‘ buttons.  This more granular approach makes it a lot easier to display just the data you’re interested in and reduces the difficulty some people had of accidentally zooming in on the map and ending up at a very high zoom level in the middle of the sea very quickly.  It also means it’s possible to get all of Scotland including Shetland positioned on a screen at once, as the following map demonstrates:

It also demonstrates how difficult it currently is to differentiate between rating level colours.  You can also see that the ‘examples’ tab has now been renamed ‘Who says what’, and that there is another new tab labelled ‘Community Voices’.  This latter tab is the feature that was previously being called the ‘listening atlas’, which is going to present sound clips and transcriptions for all the questionnaire locations.  I spent about a day working on this feature.  When the section is expanded any locations that have community voices data specified get displayed.  Currently locations are displayed as green circles, the green being taken from the logo and used to differentiate the markers from other marker types.  Clicking on a marker opens a pop-up containing links to the sound files and the transcriptions.  I’ve used the full HTML5 Audio player here because the clips are longer and people may wish to jump to points in the recordings.  Transcriptions are hidden and scroll down when you click on the link to open them.  The supplied transcription text was just plain text, and I’ve had to add some formatting to it.  I’ve made the transcriptions into tables with a different shade behind the speaker column.  The screenshot below shows how the feature currently works:

The Community Voices data is stored in the database and I’ve updated the API to provide access to the data.  There are two new endpoints, one for listing all locations that have data, and one that outputs all data for a given location.  I’ve also updated the CMS to provide facilities to enable the team to upload this information to the database.  Within the CMS there is now a new ‘Browse community voices’ menu item.  For each location there are columns noting whether there are young and old soundfiles and transcriptions present.  If you press the ‘Edit’ button for a location you can supply new information or edit the existing information.  Note that you don’t have to supply all the information for a location at once – you could provide soundfiles first and transcriptions later, or provide both for ‘young’ and supply ‘old’ later.  As soon as there is information for ‘young’ or ‘old’ for a location it will automatically be added to the ‘community voices’ map in the front-end.
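
The rule for when a location appears on the map can be expressed quite simply.  The following Python sketch uses made-up field names for the four pieces of information a location can have, so it shows the logic only and not the actual database structure.

```python
# Hypothetical field names for the soundfiles and transcriptions.
VOICE_FIELDS = ("young_sound", "young_transcript", "old_sound", "old_transcript")


def locations_for_map(records):
    """List the locations that have any community voices data at all."""
    return [
        record["location"]
        for record in records
        if any(record.get(field_name) for field_name in VOICE_FIELDS)
    ]
```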

I also spent some time going through the document produced after the team meeting I hadn’t been invited to, responding to some of the suggestions that I didn’t think were a good idea (e.g. disabling zooming by scroll-wheel or pinch gesture and forcing people to use the ‘+/-‘ buttons), discussing the options for large changes that had been proposed (e.g. amalgamating the ‘stories’ and ‘examples’ rather than having them as separate tabs) and implementing things that were requested that didn’t raise any other issues (e.g. removing the ‘feature’ text from the story slide display and ensuring the story pane doesn’t overlap with the ‘+/-‘ buttons).

For the DSL I continued to work with processing the data and preparing it for ingest into Solr.  There were a few issues with the data, which were being caused by unescaped characters appearing within the XML files (e.g. ‘<’ or ‘&’).  I updated my script to add htmlspecialchars to the output and when I regenerated the data and passed it on to Raymond he successfully managed to add it to Solr.  This was both for the ‘V2’ data and the ‘V3’ data.  I did unfortunately notice that some rows in the ‘V2’ data seem to be missing from my output script and I’m not sure why.  E.g. in the Solr browser ‘snd7232’ and ‘snds4956’ only have one search field from an earlier upload and these IDs are not found in my output file, even though these rows are in the database the output script connects to.  I’ll need to investigate this once I get back to working with the data.
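
My script uses PHP’s htmlspecialchars, but the same idea can be illustrated in Python, where html.escape plays a similar role.  The sketch below is an assumption-laden example: the function names and field names are invented, and the add / doc / field wrapper follows Solr’s XML update format rather than necessarily matching the files I actually generated.

```python
from html import escape


def solr_field(name, value):
    """Build one <field> element, escaping characters that would break the XML."""
    return f'<field name="{escape(name)}">{escape(value)}</field>'


def solr_doc(fields):
    """Wrap a dict of field names and values in Solr's add/doc structure."""
    body = "".join(solr_field(name, value) for name, value in fields.items())
    return f"<add><doc>{body}</doc></add>"
```

Without the escaping step a stray ‘<’ or ‘&’ in an entry’s text would make the whole file invalid XML, which matches the kind of problem that stopped the earlier import.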

Also for DSL I engaged in an email discussion with the DSL’s new IT people and UoG IT people about the new dsl.ac.uk email addresses.  Although we had updated things last week that should have ensured that the emails worked no emails were getting through.  It turned out that this was being caused by a pointer record in the DNS that was pointing to a subdomain rather than the main domain, which was confusing the email system.  Hopefully this issue should now be sorted, though.

Also this week I heard from Bryony Randall in English Literature about an AHRC follow-on funding project that I’d helped her write the proposal for.  The proposal was accepted, which is really great news, and I’ll be helping Bryony with the technical aspects of her project later on this year.  I also had a further discussion with Thomas Clancy about his Iona project that is inching closer to submission.  I also set up new App and Play Store developer accounts for a new app that people in Sport and Recreation are putting together with an external developer, fixed a bug in the Mary Queen of Scots’ Letters CMS I’d created for Alison Wiggins and tweaked one of the Levenshtein scripts I’d created for the HT / OED data linking for Fraser.  All in all it was a pretty full-on week.