Week Beginning 9th May 2022

I spent most of the week continuing with the Speak For Yersel website, which is now nearing completion.  A lot of my time was spent tweaking things that were already in place, and we had a Zoom call on Wednesday to discuss various matters too.  I updated the ‘explore more’ age maps so they now include markers for young and old who didn’t select ‘scunnered’, meaning people can get an idea of the totals.  I also changed the labels slightly and the new data types have been given two shades of grey and smaller markers, so the data is there but doesn’t catch the eye as much as the data for the selected term.  I’ve updated the lexical ‘explore more’ maps so they now actually have labels and the ‘darker dots’ text (which didn’t make much sense for many maps) has been removed.  Kinship terms now allow for two answers rather than one, which took some time to implement in order to differentiate this question type from the existing ‘up to 3 terms’ option.  I also updated some of the pictures that are used and added in an ‘other’ option to some questions.  I also updated the ‘Sounds about right’ quiz maps so that they display different legends that match the question words rather than the original questionnaire options.  I needed to add in some manual overrides to the scripts that generate the data for use in the site for this to work.

I also added in proper text to the homepage and ‘about’ page.  The former included a series of quotes above some paragraphs of text and I wrote a little script that highlighted each quote in turn, which looked rather nice.  This then led onto the idea of having the quotes positioned on a map on the homepage instead, with different quotes in different places around Scotland.  I therefore created an animated GIF based on some static map images that Mary had created and this looks pretty good.

I then spent some time researching geographical word clouds, which we had been hoping to incorporate into the site.  After much Googling it would appear that there is no existing solution that does what we want, i.e. take a geographical area and use this as the boundaries for a word cloud, featuring different coloured words arranged at various angles and sizes to cover the area.  One potential solution that I was pinning my hopes on was this one: https://github.com/JohnHenryEden/MapToWordCloud which promisingly states “Turn GeoJson polygon data into wordcloud picture of similar shape”.  I managed to get the demo code to run, but I can’t get it to actually display a word cloud, even though the specifications for one are in the code.  I’ve tried investigating the code but I can’t figure out what’s going wrong.  No errors are thrown and there’s very little documentation.  All that happens is a map with a polygon area is displayed – no word cloud.

The word cloud aspects of the above are based on another package here: https://npm.io/package/wordcloud and this package allows you to specify a shape to use as an outline for the cloud, and one of the examples shows words taking up the shape of Taiwan (https://wordcloud2-js.timdream.org/#taiwan).  However, this is a static image not an interactive map – you can’t zoom into it or pan around it.  One possible solution may be to create images of our regions, generate static word cloud images as with the above and then stitch the images together to form a single static map of Scotland.  This would be a static image, though, and not comparable to the interactive maps we use elsewhere in the website.  Programmatically stitching the individual region images together might also be quite tricky.  I guess another option would be to just allow users to select an individual region and view the static word cloud (dynamically generated based on the data available when the user selects to view it) for the selected region, rather than joining them all together.

I also looked at some further options that Mary had tracked down.  The word cloud on a leaflet map (http://hourann.com/2014/js-devs-dont-get-lost/leaflet-wordcloud.html?sydney) only uses a circle for the boundaries of the word cloud.  All of the code is written around the use of a circle (e.g. using diameters to work out placement) so it couldn’t really be adapted to work with a complex polygon.  We could work out a central point for each region and have a circular word cloud positioned at that point, but we wouldn’t be able to make the words fill the entire region.  The second of Mary’s links (https://www.jasondavies.com/wordcloud/) is, as far as I can tell, just a standard word cloud generator with no geographical options.  The third option (https://github.com/peterschretlen/leaflet-wordcloud) has no demo or screenshot or much information about it and I’m afraid I can’t get it to work.
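For the circular option, one way of working out a central point for a region would be the area-weighted centroid of its boundary polygon.  The following is a minimal Python sketch, assuming each region is a single closed ring of (longitude, latitude) pairs and treating those coordinates as planar, which is a reasonable approximation at the scale of a Scottish region but not exact.  Holes and multi-part regions such as island groups are not handled, and the centroid of a concave region can fall outside the region itself.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Uses the shoelace formula; the ring may or may not repeat its first point.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * area2
    return cx / (3 * area2), cy / (3 * area2)
```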

The final option (https://dagjomar.github.io/Leaflet.ParallaxMarker/) is pretty cool but it’s not really a word cloud as such.  Instead it’s a bunch of labels set to specific lat/lng points and given different levels, which set their size and behaviour on scroll.  We could use this to set the highest-rated words to the largest level, with lower-rated words at lower levels, and position each randomly in a region, but it’s not really a word cloud and words would be likely to spill over into neighbouring regions.

Based on the limited options that appear to be out there, I think creating a working, interactive map-based word cloud would be a research project in itself and would take far more time than we have available.

Later on in the week Mary sent me the spreadsheet she’d been working on to list settlements found in postcode areas and to link these areas to the larger geographical regions we use.  This is exactly what we needed to fill in the missing piece in our system and I wrote a script that successfully imported the data.  For our 411 areas we now have 957 postcode records and 1638 settlement records.  After that I needed to make some major updates to the system.  Currently a person is associated with an area (e.g. ‘Aberdeen Southwest’) but I need to update this so that a person is associated with a specific settlement (e.g. ‘Ferryhill, Aberdeen’), which is then connected to the area and from the area to one of our 14 regions (e.g. ‘North East (Aberdeen)’).
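The settlement to area to region chain amounts to two lookups.  Here’s a minimal sketch using Python dictionaries as stand-ins for the database tables; the table layout and the single sample row (taken from the examples above) are illustrative assumptions rather than the actual schema.

```python
# Stand-ins for the imported tables: settlement -> area, and area -> region.
SETTLEMENT_AREA = {"Ferryhill, Aberdeen": "Aberdeen Southwest"}
AREA_REGION = {"Aberdeen Southwest": "North East (Aberdeen)"}

def locate(settlement):
    """Return (area, region) for a settlement, raising KeyError if it is unknown."""
    area = SETTLEMENT_AREA[settlement]
    return area, AREA_REGION[area]
```

Storing only the settlement against the person and deriving the area and region from it means any later correction to an area’s region would apply to existing respondents automatically.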

I updated the system to make these changes and updated the ‘register’ form, which now features an autocomplete for the location – start typing a place and all matches appear.  Behind the scenes the location is saved and connected up to areas and regions, meaning we can now start generating real data, rather than a person being assigned a random area.  The perception follow-on now connects the respondent up with the larger region when selecting ‘listener is from’, although for now some of this data is not working.
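I haven’t described how matches are found, but a simple case-insensitive version might list places starting with the typed text first, followed by places that merely contain it.  A Python sketch, with the ordering rule being an assumption for illustration:

```python
def autocomplete(typed, places):
    """All places matching the typed text, case-insensitively; prefix matches come first."""
    q = typed.strip().casefold()
    if not q:
        return []
    starts = sorted(p for p in places if p.casefold().startswith(q))
    contains = sorted(
        p for p in places
        if q in p.casefold() and not p.casefold().startswith(q)
    )
    return starts + contains
```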

I then needed to further update the registration page to add in an ‘outside Scotland’ option so people who did not grow up in Scotland can use the site.  Adding in this option actually broke much of the site.  Registration requires the selected location to be linked to an area with a geoJSON shape, otherwise it fails, and the submission of answers uses this shape to generate a random marker point, which also failed when no shape was present.  I updated the scripts to fix these issues, meaning an answer submitted by an ‘outside’ person now has a zero for both latitude and longitude.  I then also needed to update the script that gets the map data to ensure that none of these ‘outside’ answers are returned in any of the data used in the site (both for maps and for non-map visualisations such as the sliders).  So, much has changed and hopefully I haven’t broken anything whilst implementing these changes.  It does mean that ‘outside’ people can now be included and we can export and use their data in future, even though it is not used in the current site.
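I haven’t gone into how the random marker point is generated.  One common approach is rejection sampling: pick random points in the shape’s bounding box until one falls inside the polygon (tested here with ray casting).  The Python sketch below also returns (0, 0) when there is no shape, mirroring the ‘outside’ behaviour described above, and filters those answers out of the map data.  The function and field names are hypothetical, and keying the filter on a 0/0 position is an assumption; a dedicated ‘outside’ flag would be more robust.

```python
import random

def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal line through y
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

def random_marker(shape, rng=random):
    """Random (lng, lat) inside shape, or (0.0, 0.0) when there is no shape ('outside')."""
    if shape is None:
        return (0.0, 0.0)
    xs = [p[0] for p in shape]
    ys = [p[1] for p in shape]
    while True:  # keep sampling the bounding box until a point lands inside
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, shape):
            return (x, y)

def map_answers(answers):
    """Drop 'outside' answers, which are stored with a 0/0 position."""
    return [a for a in answers if (a["lng"], a["lat"]) != (0.0, 0.0)]
```

Latitude and longitude of zero lie in the Atlantic off West Africa, so they can never clash with a genuine Scottish marker.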

Further tweaks I implemented this week included: changing the font sizes of some headings and buttons; renaming the ‘activities’ and ‘more’ pages as requested; adding ‘back’ buttons from all ‘activity’ and ‘more’ pages back to the index pages; adding an intro page to the click exercise as previously it just launched into the exercise whereas all others have an intro.  I also added summary pages to the end of the click and perception activities with links through to the ‘more’ pages and removed the temporary ‘skip to quiz’ option.  I also added progress bars to the click and perception activities.  Finally, I switched the location of the map legend from top right to top left as I realised when it was in the top right it was always obscuring Shetland whereas there’s nothing in the top left.  This has meant I’ve had to move the region label to the top right instead.

Also this week I continued to work on the Allan Ramsay ‘Gentle Shepherd’ performance data.  I added in faceted browsing to the tabular view, adding in a series of filter options for location, venue, adaptor and such things.  You can select any combination of filters (e.g. multiple locations and multiple years in combination).  When you select an option in one filter type, the options in the other filter types update to display only those relevant to the filtered data.  However, the display of filter options can get a bit confusing once multiple filter types have been selected.  I will try and sort this out next week.  There are also multiple occurrences of items in the filter options (e.g. two Glasgows) because the data has trailing spaces in some rows (‘Glasgow’ vs ‘Glasgow ’) and I’ll need to see about trimming these out next time I import the data.
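A minimal Python sketch of this kind of faceted filtering, assuming each performance is a dictionary of field values.  Values are trimmed on import so ‘Glasgow’ and ‘Glasgow ’ collapse into one option.  Within a filter type any selected value matches, and across types all must match.  When counting the options for one filter type the sketch ignores that type’s own selection, so the other values in it stay available for multiple selection.  This is one possible design rather than how the site currently works, and the field names are hypothetical.

```python
from collections import Counter

def clean(row):
    """Trim stray whitespace so 'Glasgow ' and 'Glasgow' become one value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

def apply_filters(rows, selected):
    """Keep rows matching every filter type; within a type any selected value matches."""
    return [
        r for r in rows
        if all(r.get(field) in values for field, values in selected.items() if values)
    ]

def facet_options(rows, selected, facets):
    """For each filter type, count its values among rows matching the *other* selections."""
    options = {}
    for field in facets:
        others = {k: v for k, v in selected.items() if k != field}
        options[field] = Counter(r.get(field) for r in apply_filters(rows, others))
    return options
```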

Also this week I arranged for the old DSL server to be taken offline, as the new website has now been operating successfully for two weeks.  I also had a chat with Katie Halsey about timescales for the development of the Books and Borrowing front-end.  In addition, I imported a new disordered paediatric speech dataset into the Speech Star website.  This included around double the number of records, new video files and a new ‘speaker code’ column.  Finally, I participated in a Zoom call for the Scottish Place-Names database where we discussed the various place-names surveys that are in progress and the possibility of creating an overarching search across all systems.

Week Beginning 2nd May 2022

Monday was the May Day holiday so it was a four-day week.  I spent three of the available days working on the Speak For Yersel project.  I completed work on the age-based questions for the lexical follow-on section.  We wanted to split responses based on the age of the respondent, but I had a question about this:  Should the age filters be fixed or dynamic?  We say 18 and younger / 60 and older but we don’t register ages for users, we register dates of birth.  I can therefore make the age filters fixed (i.e. birth >=2004 for 18, birth <=1962 for 60) or dynamic (e.g. birth >= currentyear-18 and birth <= currentyear -60).  However, each of these approaches has issues.  With the former, the meaning of the boundaries shifts with each passing year (someone born in 2004 will turn 19 next year but will still fall into the ‘18 and younger’ group).  With the latter we end up losing data with each passing year (if someone is 18 when they submitted their data in 2022 then their data will be automatically excluded next year).  I realised that there is a third way:  When a person registers I log the exact time of registration so I can ascertain their age at the point when they registered and this will never change.  I decided to do this instead, although it does mean that the answers of someone who is 18 today will be lumped in with the answers of someone who is 18 in 10 years’ time, which might cause issues.  However, we can always change how the age boundaries work at a later date.  Below is a screenshot of one of the age questions (more data is obviously still needed):
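Working out age at the point of registration only needs the date of birth and the stored registration timestamp.  A minimal Python sketch, using the 18-and-younger and 60-and-older boundaries mentioned above (the function names and group labels are my own, for illustration):

```python
from datetime import date, datetime

def age_at(dob, when):
    """Whole years between a date of birth and a given date or datetime."""
    if isinstance(when, datetime):
        when = when.date()
    had_birthday = (when.month, when.day) >= (dob.month, dob.day)
    return when.year - dob.year - (0 if had_birthday else 1)

def age_group(dob, registered_at):
    """'younger' (18 or under), 'older' (60 or over) or None, fixed at registration."""
    age = age_at(dob, registered_at)
    if age <= 18:
        return "younger"
    if age >= 60:
        return "older"
    return None
```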

Whilst working on this I realised there is another problem with this type of question:  Unless we have equal numbers of young and old respondents, isn’t it likely that the data visualised on the map will be misleading?  Say we have 100 ‘older’ respondents but 1000 ‘younger’ ones due to us targeting school children.  If 50% of the older respondents say ‘scunnered’ then there will be 50 ‘older’ markers on the map.  If 10% of the younger respondents say ‘scunnered’ then there will be 100 ‘younger’ markers on the map, meaning our answer ‘older’ (which is marked as ‘correct’) will look wrong even though statistically it is correct.  I’m not sure how we can get around this unless we also plot markers for the people in each age group who don’t use the form, so as to let people see the total number of people in each group, perhaps using a smaller marker and / or a lighter shade for the people who didn’t use the form.  I raised this issue with the team and this is the approach we will probably take.
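The arithmetic behind the worry, using the hypothetical numbers above: raw marker counts favour the younger group while the proportion using the form favours the older group.  A trivial Python sketch:

```python
def compare_groups(counts):
    """counts: group -> (respondents who used the form, total respondents in the group)."""
    return {
        group: {"markers": used, "share": used / total}
        for group, (used, total) in counts.items()
    }

example = compare_groups({"older": (50, 100), "younger": (100, 1000)})
# 'younger' has more markers (100 vs 50), but 'older' has the higher share (0.5 vs 0.1).
```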

I then moved onto the follow-on activities for the ‘Sounds about right’ section.  This involved creating a ‘drag and drop’ feature where possible answers need to be dropped into boxes.  The mockup suggested that the draggable boxes should disappear from the list of options when dropped elsewhere, but I’ve changed it so that the choices don’t disappear from the list; instead, dropping a selection copies its contents to the dotted area.  The reason I’ve done it this way is that if the entire contents moved over we could end up with someone dropping several options into one box, or if they dropped an option into the wrong box they would have to drag it from the wrong box into the right one before they could try another word in the same box, and it can all get very messy (e.g. if there are several words dropped into one box, do we consider this ‘correct’ if one of the words is the right one?).  This way keeps things a lot simpler.  However, it does mean the words the user has already successfully dropped still appear as selectable in the list, which might confuse people, so I could disable or remove an option once it’s been correctly placed.  Below is a screenshot of the activity with one of the options dropped:
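The copy-on-drop behaviour keeps each box to a single value.  A rough Python model of the logic (the box identifiers and function names are hypothetical, and the real thing runs in the browser): dropping a word overwrites whatever the box held, the option list is never modified, and each box is checked independently.

```python
def drop(boxes, box_id, word):
    """Copy a word into a box, replacing any earlier word; the option list is left alone."""
    boxes[box_id] = word

def check_boxes(boxes, answers):
    """True/False per box; an empty box counts as incorrect."""
    return {box: boxes.get(box) == correct for box, correct in answers.items()}

def placed_correctly(boxes, answers):
    """Words already in their right box, e.g. so they could be disabled in the list."""
    return {word for box, word in boxes.items() if answers.get(box) == word}
```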

The next activity asks people to see whether rules apply to all words with the same sounds by selecting ‘yes’ or ‘no’ for each.  I set it up so that the ‘Check answers’ button only appears once the user has selected ‘yes’ or ‘no’ for all of the words, and on checking the answers a tick or a cross is added to the right of the ‘yes’ and ‘no’ options.  The user must correct any wrong answers and select ‘Check answers’ again before the ‘Check answers’ button is replaced with a ‘Next’ button.  See a screenshot below:
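The button logic can be modelled as a small function of the current selections, the correct answers and the selections as they stood when ‘Check answers’ was last pressed.  A Python sketch with hypothetical names; the actual implementation is in the browser:

```python
def visible_button(selections, answers, last_checked):
    """Return None, 'check' or 'next'.

    selections:   word -> 'yes' or 'no' as currently selected
    answers:      word -> the correct 'yes' or 'no'
    last_checked: copy of selections when 'Check answers' was last pressed, or None
    """
    if set(selections) != set(answers):
        return None  # some words still have no yes/no selected
    all_correct = all(selections[w] == answers[w] for w in answers)
    if all_correct and last_checked == selections:
        return "next"  # answers were checked and every one was right
    return "check"

def marks(selections, answers):
    """Tick (True) or cross (False) shown beside each word after checking."""
    return {w: selections[w] == answers[w] for w in answers}
```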

With these in place I then moved onto the ‘perception’ activity, which I’d started to look into last week.  I completed stages 1 and 2 of this activity, allowing the user to rate how they think a person from a region sounds using the seven sliding scales as criteria, as you can see below:

And then rating actual sound clips of speakers from certain areas using the same seven criteria, as the screenshot below shows:

Finally, I created the ‘explore more’ option for the perception activity, which consists of two sections.  The first allows the user to select a region and view the average rating given by all respondents for that region, plotted on ‘read only’ versions of the same sliding scales.  The team had requested that the scales animate to their new positions when a new region is selected and although it took me a little bit of time to implement this I got it working in the end and I think it works really well.  The second option is very similar only it allows the user to select both the speaker and the listener, so you can see (for example) how people from Glasgow rate people from Edinburgh.  At the moment we don’t have information in the system that links up a user and the broader region, so for now this option is using sample data, but the actual system is fully operational.  Below is a screenshot of the first ‘explore’ option:
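Behind the read-only scales, each region’s value is the mean of every rating given for it on each of the seven scales.  A minimal Python sketch, assuming each stored rating records the speaker’s region, the listener’s region and one score per scale (the field names are hypothetical).  Passing a listener region gives the second ‘explore’ option, e.g. how people from Glasgow rate people from Edinburgh.

```python
from collections import defaultdict
from statistics import mean

def average_ratings(ratings, listener_region=None):
    """Mean score per scale for each speaker region, optionally for one listener region.

    ratings: dicts with 'speaker_region', 'listener_region' and 'scores' (one per scale).
    """
    grouped = defaultdict(list)
    for r in ratings:
        if listener_region is None or r["listener_region"] == listener_region:
            grouped[r["speaker_region"]].append(r["scores"])
    # zip(*rows) turns a list of score rows into one tuple per scale
    return {region: [mean(scale) for scale in zip(*rows)] for region, rows in grouped.items()}
```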

I feel like I’ve made really good progress with the project this week, but there is still a lot more to implement and I’ll continue with this next week.

I spent Friday working on another project, generating some views of performance data relating to performances of The Gentle Shepherd by Allan Ramsay ahead of a project launch at the end of the month.  I’d been given a spreadsheet of the data so my first step was to write a little script to extract the data, format it (e.g. extracting years from the dates) and save it as JSON, which I would then use to generate a timeline, a table view and a map-based view.  On Friday I completed an initial version of the timeline view and the table view.
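The year extraction step might look something like the following Python sketch, assuming the spreadsheet rows have already been read in as dictionaries (for example via csv.DictReader on a CSV export) and that the free-text date lives in a column called ‘date’.  Both the column name and the regular expression (any four-digit year from 1500 to 2099) are assumptions.

```python
import json
import re

YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")

def extract_year(date_text):
    """First four-digit year in a free-text date such as '12 March 1790', or None."""
    m = YEAR.search(date_text or "")
    return int(m.group(1)) if m else None

def rows_to_json(rows):
    """Add a 'year' field to each row and serialise the list as JSON."""
    records = []
    for row in rows:
        record = dict(row)  # copy so the input rows are left untouched
        record["year"] = extract_year(record.get("date"))
        records.append(record)
    return json.dumps(records, ensure_ascii=False, indent=2)
```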

I made the timeline vertical rather than horizontal as there are so many years and so much data that a horizontal timeline would be very long, and these days most people use touchscreens and are more used to scrolling down a page than along a page.  I added a ‘jump to year’ feature that lists all of the years as buttons.  Pressing on one of these scrolls to the appropriate year.  There are rather a lot of years so I’ve hidden them in a ‘Jump to Year’ section.  It may be better to have a drop-down list of options instead and I’ll maybe change this.  Each year has a header and a dividing line and a ‘top’ button that allows you to quickly scroll back to the top of the timeline.  Each item in the timeline is listed in a fixed-width box, with multiple boxes per row depending on your screen width and the data available.  Currently all fields are displayed, but this can be changed.

The table view displays all of the data in a table.  You can click on a column heading to sort the data by that heading.  Pressing a heading a second time reverses the order.  I still need to add in the filter options to the table view and then work on the map view once I’m given the latitude and longitude data that is still needed for this view to work.  I’ll continue with this next week.
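The click-to-sort behaviour boils down to remembering the current sort column and flipping the direction when the same heading is pressed again.  A Python model of that logic, assuming every row has a comparable value for the column being sorted (the class and method names are hypothetical):

```python
class SortableTable:
    def __init__(self, rows):
        self.rows = list(rows)
        self.sort_column = None
        self.descending = False

    def click_heading(self, column):
        """Sort by column; clicking the same heading again reverses the order."""
        if column == self.sort_column:
            self.descending = not self.descending
        else:
            self.sort_column, self.descending = column, False
        self.rows.sort(key=lambda r: r[column], reverse=self.descending)
        return self.rows
```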

Also this week I made a couple of minor tweaks to the DSL website and had some discussions with the DSL people about the SLD data and the fate of the old DSL website.  I also updated some of the data for the Books and Borrowing project and had a chat with Thomas Clancy about hosting an external website that is in danger of disappearing.

Week Beginning 25th April 2022

We launched the new version of the DSL website on Tuesday this week, which involved switching the domain name to point at the new server where I’d been developing the new site.  When we’ve done this previously (e.g. for the Anglo-Norman Dictionary) the switchover has been pretty speedy, but this time it took about 24 hours for the DNS updates to propagate, during which time the site was working for some people and not for others.  This is because there is a single SSL certificate for the dsl.ac.uk domain and as it was moved to the new server, the site on the old server (which was still being accessed by people whose ISPs had not updated their domain name servers) was displaying a certificate error.  This was all a bit frustrating as the problem was out of our hands, but thankfully everything was working normally again by Wednesday.

I made a few final tweaks to the site this week too, including updating the text that is displayed when too many results are returned, updating the ‘cite this entry’ text, fixing a few broken links and fixing the directory permissions on the new site to allow file uploads.  I also gave some advice about the layout of a page for a new Scots / Polish app that the DSL people are going to publish.

I spent almost all of the rest of the week working on the Speak For Yersel project, for which I still have an awful lot to do in a pretty short period of time, as we need to pilot the resource in schools during the week of the 13th of June and need to send it out to other people for testing and to populate it with initial data before then.  We had a team meeting on Thursday to go through some of the outstanding tasks, which was helpful.

This week I worked on the maps quite a bit, making the markers smaller and giving them a white border to help them stand out a bit.  I updated the rating colours as suggested, although I think we might need to change some of the shades used for ratings and words as after using the maps quite a bit I personally find it almost impossible to differentiate some of the shades, as you can see in the screenshot below.  We have all the colours of the rainbow at our disposal and while I can appreciate why shades are preferred from an aesthetic point of view, in terms of usability it seems a bit silly to me.  I remember having this discussion with SCOSYA too.  I think it is MUCH easier to read the maps when different colours are used, as with Our Dialects (e.g. https://www.ourdialects.uk/maps/bread/).

As you can also see from the above screenshot, I implemented the map legends as well, with only the options that have been chosen and have data appearing in the legend.  Options appear with their text, a coloured spot so you can tell which option is which, and a checkbox that allows you to turn on / off a particular answer, which I think will be helpful in differentiating the data once the map fills up.  For the ‘sound choice’ questions a ‘play’ button appears next to each option in the legend.  I then ensured that the maps work for the quiz questions too: rather than showing a map of answers submitted for the quiz question the maps now display the data for the associated questionnaire (e.g. the ‘Gonnae you’ map).  Maps are also now working for the ‘Explore more’ section too.  I also added in the pop-up for ‘Attribution and Copyright’ (the link in the bottom right of the map).

I then added further quiz questions to the ‘Give your word’ exercise, but the final quiz question in the document I was referencing had a very different structure, with multiple specific answer options from several different questions on the same map.  I spent about half a day making updates to the system to allow for such a question structure.  I needed to update the database structure, the way data is pulled into the website, the way maps are generated, how quiz questions are displayed and how they are processed.

The multi-choice quiz works in a similar way to the multi-choice questionnaire in that you can select more than one answer.  Unlike the questionnaire there is no limit to the number of options you can select.  When at least one choice is selected a ‘check your answers’ button appears.  The map displays all of the data for each of the listed words, even though these come from different questionnaires (this took some figuring out).  There are 9 words here and we only have 8 shades so the ninth is currently appearing as red.  The map legend lists the words alphabetically, which doesn’t match up with the quiz option order, but I can’t do anything about this (at least not without a lot of hacking about).  You can turn off/on map layers to help see the coverage.

When you press on the ‘Check your answers’ button all quiz options are greyed out and your selection is compared to the correct answers.  You get a tick for a correct one and a cross for an incorrect one.  In addition, any options you didn’t select that are correct are given a tick (in the greyed out button) so you can see what was correct that you missed.  If you selected all of the correct answers and didn’t select any incorrect answers then the overall question is marked as correct in the tally that gives your final score.  If you missed any correct answers or selected any incorrect ones then this question is not counted as correct overall.  Below is a screenshot showing how this type of question works:
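The marking rules above can be expressed compactly: each option gets a tick, a cross or a ‘missed’ marker, and the question only counts towards the score when the selected set exactly matches the correct set.  A Python sketch with hypothetical names:

```python
def score_multi_choice(selected, correct, options):
    """Per-option marks, plus whether the whole question counts as correct."""
    selected, correct = set(selected), set(correct)
    marks = {}
    for opt in options:
        if opt in selected:
            marks[opt] = "tick" if opt in correct else "cross"
        elif opt in correct:
            marks[opt] = "missed"  # greyed out, but shown with a tick
        else:
            marks[opt] = None
    # Correct overall only if nothing was missed and nothing wrong was chosen
    return marks, selected == correct
```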

Unfortunately, when we met on Thursday it turned out that Jennifer and Mary were not wanting this question to be presented on one single map, but instead for each answer option to have its own map, meaning the time I spent developing the above was wasted.  However, it does mean the question is much simpler, which is probably a good thing.  We decided to split the question up into individual questions to make things more straightforward for users and to ensure that getting one of the options incorrect didn’t mean they were marked as getting the entire multi-part question wrong.

Also this week I began implementing the perception questionnaire with seven interactive sliders allowing the user to rate an accent.  Styling the sliders was initially rather tricky but thankfully I found a handy resource that allows you to customise a slider and generates the CSS for you (https://www.cssportal.com/style-input-range/).  Below is a screenshot of the perception activity as it currently stands:

I also replaced one of the sound recordings and fixed the perception activity layout on narrow screens (as previously on narrow screens the labels ended up positioned at the wrong ends of the slider).  I added a ‘continue’ button under the perception activity that is greyed out and added a check to see whether the user has pressed on every slider.  If they have then the ‘continue’ button text changes and is no longer greyed out.  I also added area names to the top-left corner of the map when you hover over an area, so now no-one will confuse Orkney and Shetland!

We had also agreed to create a ‘more activities’ page and to have follow-on activities and the ‘explore more’ maps situated there.  I created a new top-level menu item currently labelled ‘More’.  If you click on this you find an index page similar to the ‘Activities’ page.  Press on an option (only the first page options work so far) and you’re given the choice to either start the further activities (not functioning yet) or explore the maps.  The latter is fully functional.  In the regular activities page I then removed the ‘explore more’ stage so that now when you finish the quiz the button underneath your score leads you to the ‘More’ page for the exercise in question.  Finally, I began working on the follow-on activities that display age-based maps, but I’ll discuss these in more detail next week.

I also spoke to Laura Rattray and Ailsa Boyd about a proposal they are putting together and arranged a Zoom meeting with them in a couple of weeks and spoke to Craig Lamont about the Ramsay project I’m hopefully going to be able to start working on next week.