Week Beginning 27th May 2019

Monday was a holiday this week, so Tuesday was the start of my working week.  I spent about half the day completing work on the Data Management Plan that I had been asked to write by the College of Arts research people, and the remainder of the day continuing to write scripts to help with the linkup of HT and OED lexeme data.  The latest script gets all unmatched HT words that are monosemous within part of speech in the unmatched dataset.  For each of these the script then retrieves all OED words where both the stripped form and the POS match, but where the OED word is already matched to a different HT lexeme.  If there is more than one OED lexeme matched to the HT lexeme I’ve added the information on subsequent rows in the table, so that full OED category information can more easily be read.  I’m not entirely sure what this script will be used for, but Fraser seems to think it will be useful in automatically pinpointing certain words that the OED are currently trying to manually track down.
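The matching step described above can be sketched roughly as follows.  This is only an illustration under assumptions: the record types, field names and the `candidate_rows` function are hypothetical and are not taken from the actual script or database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class HTLexeme:
    # Hypothetical shape of an unmatched HT lexeme record.
    id: int
    stripped: str
    pos: str


@dataclass(frozen=True)
class OEDLexeme:
    # Hypothetical shape of an OED lexeme record; matched_ht_id is None
    # when the OED lexeme has not been matched to any HT lexeme.
    id: int
    stripped: str
    pos: str
    matched_ht_id: Optional[int]


def candidate_rows(unmatched_ht: List[HTLexeme],
                   oed: List[OEDLexeme]) -> List[Tuple[int, int, int]]:
    """Return (ht_id, oed_id, other_ht_id) rows, one row per OED lexeme."""
    # A word counts as monosemous here if its (stripped form, POS) pair
    # occurs exactly once among the unmatched HT lexemes.
    counts = Counter((h.stripped, h.pos) for h in unmatched_ht)

    # Index only those OED lexemes that are already matched to some HT lexeme.
    matched_oed = {}
    for o in oed:
        if o.matched_ht_id is not None:
            matched_oed.setdefault((o.stripped, o.pos), []).append(o)

    rows = []
    for h in unmatched_ht:
        key = (h.stripped, h.pos)
        if counts[key] != 1:
            continue
        for o in matched_oed.get(key, []):
            if o.matched_ht_id != h.id:
                rows.append((h.id, o.id, o.matched_ht_id))
    return rows
```

Emitting one row per OED lexeme mirrors the idea of putting additional matches on subsequent rows of the table.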

During the week I also made some further updates to a couple of song stories for the RNSN project and had a meeting over the phone with Kirsteen McCue about a new project that she is currently planning, and for which I will be helping with the technical aspects.  I also had a meeting with PhD student Ewa Wanat about a website she’s putting together and gave her some advice.

The rest of my week was split between DSL and SCOSYA.  For DSL I spent time answering a number of emails.  I then went through the SND data that had been output by Thomas Widmann’s scripts on the DSL server.  I had tried running this data through the script I’d written to take the XML output and insert it into our online MySQL database, but my script was giving errors, stating that the input file wasn’t valid XML.  I loaded the file (a text file of almost 90MB) into Oxygen and asked it to validate the XML.  It took a while, but it managed to find one easily identifiable error, and one error that was trickier to track down.

In the entry for snd22907 there was a closing </sense> tag in the ‘History’ that had no corresponding opening tag.  This was easy to track down and manually fix.  Entry snd12737 had two opening tags (<entry id="snd12737">), one of them below the <meta> tag.  This was trickier to find as I needed to manually track it down by chopping the file in half, checking which half the error was in, chopping this bit in half and so on until I ended up with a very small file in which it was easy to locate the problem.
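The halving approach described above could be automated along the following lines.  This is a sketch under assumptions rather than what was actually done: it assumes the file has already been split into a list of per-entry text fragments (how to split is left open), the wrapper element name `root` and both function names are made up, and it only finds the first fragment that breaks well-formedness.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def is_well_formed(fragments: List[str]) -> bool:
    """Parse the fragments inside a temporary wrapper element."""
    try:
        ET.fromstring("<root>" + "".join(fragments) + "</root>")
        return True
    except ET.ParseError:
        return False


def first_bad_fragment(fragments: List[str]) -> Optional[int]:
    """Binary search for the index of the first fragment that breaks parsing.

    Returns None if everything parses.  Invariant: fragments[:lo] parses
    and fragments[:hi + 1] does not.
    """
    if is_well_formed(fragments):
        return None
    lo, hi = 0, len(fragments) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_well_formed(fragments[:mid + 1]):
            lo = mid + 1   # the problem lies after this prefix
        else:
            hi = mid       # the problem lies within this prefix
    return lo
```

Each round halves the search space, so even a file with tens of thousands of entries needs fewer than twenty parses.  After fixing the reported fragment the function can be run again to look for any further problems.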

With the SND data fixed I could then run it through my script.  However, I wanted to change the way the script worked based on feedback from Ann last week.  Previously I had added new fields to a test version of the main database, and the script found the matching row and inserted new data.  I decided instead to create an entirely new table for the new data, to keep things more cleanly divided, and to handle the possibility of there being new entries in the data that were not present in the existing database.  I also needed to update the way in which the URL tag was handled, as Ann had explained that there could be any number of URL tags, with them referencing other entries that have been merged with the current one.  After updating my test version of the database to make new tables and fields, and updating my script to take these changes into consideration, I ran both the DOST and the SND data through the script, resulting in 50,373 entries for DOST and 34,184 entries for SND.  This is actually fewer entries than in the old database.  There are 3,023 missing SND entries and 1,994 missing DOST entries.  They are all supplemental entries (with IDs starting ‘snds’ in SND and ‘adds’ in DOST).  This leaves just 24 DOST ‘adds’ entries in the Sienna data and 2,730 SND ‘snds’ entries.  I’m not sure what’s going on with the output: whether the omission of these entries is intentional (because the entries have been merged with regular entries) or whether this is an error, but I have exported information about the missing rows and have sent these on to Ann for further investigation.
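A comparison of this kind, finding which IDs are in the old database but not in the new output and then grouping them by ID prefix, might look something like the sketch below.  The function names and the way the IDs are supplied (plain lists of strings) are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List


def missing_ids(old_ids: Iterable[str], new_ids: Iterable[str]) -> List[str]:
    """IDs present in the old database but absent from the new output."""
    return sorted(set(old_ids) - set(new_ids))


def count_by_prefix(ids: Iterable[str], prefixes: Iterable[str]) -> Dict[str, int]:
    """Count IDs per prefix, trying longer prefixes first.

    Checking longest-first matters because an ID beginning 'snds'
    also begins 'snd'.  IDs matching no prefix are counted as 'other'.
    """
    ordered = sorted(prefixes, key=len, reverse=True)
    counts = Counter()
    for entry_id in ids:
        for prefix in ordered:
            if entry_id.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return dict(counts)
```

If every missing ID lands under a supplemental prefix, that matches the pattern described above, where only ‘snds’ and ‘adds’ entries were absent.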

For SCOSYA I focussed on adding in the sample sound clips and groupings for location markers.  I also engaged with some preparations for the project’s ‘data hack’ that will be taking place in mid June.  Adding in sound clips took quite a bit of time, as I needed to update both the Content Management System to allow sound clips to be uploaded and managed, and the API to incorporate links to the uploaded sound clips.  This is in addition to incorporating the feature into the front-end.

Now if a member of staff logs into the CMS and goes to the ‘Browse codes’ page they will see a new column that lists the number of sound clips associated with a code.  I’ve currently uploaded the four for E1 ‘This needs washed’ for test purposes.  From the table, clicking on a code loads its page, which now includes a new section for sound clips.  Any previously uploaded ones are listed and can be played or deleted.  New clips in MP3 format can also be uploaded here, with files being renamed upon upload, based on the code and the next free auto-incrementing number in the database.
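The renaming-on-upload idea can be sketched as follows.  Everything specific here is an assumption: the sketch uses Python's built-in sqlite3 module as a stand-in for the real database, and the table name, column names, function names and the `CODE_NUMBER.mp3` filename pattern are all hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per uploaded clip, with an auto-incrementing id.
    conn.execute(
        "CREATE TABLE sound_clips ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "code TEXT NOT NULL, "
        "filename TEXT NOT NULL)"
    )


def register_clip(conn: sqlite3.Connection, code: str, original_name: str) -> str:
    """Record an upload and return the new filename for the stored file."""
    if not original_name.lower().endswith(".mp3"):
        raise ValueError("only MP3 uploads are accepted")
    # Insert first so the database hands out the next free number...
    cur = conn.execute(
        "INSERT INTO sound_clips (code, filename) VALUES (?, '')", (code,)
    )
    clip_id = cur.lastrowid
    # ...then build the filename from the code and that number.
    filename = f"{code}_{clip_id}.mp3"
    conn.execute(
        "UPDATE sound_clips SET filename = ? WHERE id = ?", (filename, clip_id)
    )
    conn.commit()
    return filename
```

Letting the database allocate the number avoids two simultaneous uploads being given the same filename, and discarding the original name means awkward characters in uploaded filenames never reach the server's file system.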

In the API all soundfiles are included in the ‘attributes’ endpoint, which is used by the drop-down list in the atlas.  The public atlas has also been updated to include buttons to play any sound clips that are available, as the screenshot towards the end of this post demonstrates.

There is now a new section labelled ‘Listen’ with four ‘Play’ icons.  Pressing on one of these plays a different sound clip.  Getting these icons to work has been trickier than you might expect.  HTML5 has a tag called <audio> that browsers can interpret in order to create their own audio player which is then embedded in the page.  This is what happens in the CMS.  Unfortunately an interface designer has no control over the display of the player: it’s different in every browser and generally takes up a lot of room, which we don’t really have.  I initially just used the HTML5 audio player but each sound clip then had to appear on a new row and the player in Chrome was too wide for the side panel.

Instead I’ve had to create my own audio player in JavaScript.  It still uses HTML5 audio so it works in all browsers, but it allows me to have complete control over the styling, which in this case means just a single button with a ‘Play’ icon on it that changes to a ‘Pause’ icon when the clip is playing.  But it also meant that I had to implement functionality you might take for granted, such as ensuring an already playing clip is stopped when a different ‘play’ button is pressed, and resetting a ‘pause’ button back to a ‘play’ button when a clip ends.  I think it’s all working now, but there may still be some bugs.
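The button behaviour described above boils down to a small piece of state tracking.  The sketch below models just that logic in Python; it is not the JavaScript used in the atlas, it plays no audio, and the class and method names are invented for illustration.

```python
from typing import Dict, Hashable, Iterable, Optional


class ClipButtons:
    """Tracks which clip is playing and which icon each button shows."""

    def __init__(self, clip_ids: Iterable[Hashable]) -> None:
        self.icons: Dict[Hashable, str] = {cid: "play" for cid in clip_ids}
        self.playing: Optional[Hashable] = None

    def press(self, clip_id: Hashable) -> None:
        """Handle a press on the button for clip_id."""
        if self.playing == clip_id:
            # Pressing the button of the clip that is playing pauses it.
            self.icons[clip_id] = "play"
            self.playing = None
            return
        if self.playing is not None:
            # Stop whichever other clip is playing and reset its button.
            self.icons[self.playing] = "play"
        self.playing = clip_id
        self.icons[clip_id] = "pause"

    def ended(self, clip_id: Hashable) -> None:
        """Handle a clip reaching its end: reset its button to 'play'."""
        if self.playing == clip_id:
            self.playing = None
        self.icons[clip_id] = "play"
```

In a browser the same three cases would be wired to click handlers and to the audio element's ‘ended’ event, with calls to actually start and stop playback added alongside the icon changes.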

I then moved on to looking at the groups for locations.  This will be a fixed list of groups, taken from ones that the team has already created.  I copied these groups to a new location in the database and updated the API to create new endpoints for listing groups and bringing back the IDs of all locations that are contained within a specified group.  I haven’t managed to get the ‘groups’ feature fully working yet, but the selection options are now in place.  There’s a ‘groups’ button in the ‘Examples’ section, and when you click on this a section appears listing the groups.  Each group appears as a button.  When you click on one a ‘tick’ is added to the button and currently the background turns a highlighted green colour.  I’m going to include several different highlighted colours so the buttons light up differently.  Although it doesn’t work yet, these colours will then be applied to the appropriate group of points / areas on the map.  You can see an example of the buttons below:

The only slight reservation I have is that this option does make the public atlas more complicated to use and more cluttered.  I guess it’s ok as the groups are hidden by default, though.  There also may be an issue with the sidebar getting too long for narrow screens that I’ll need to investigate.
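For the group endpoints and highlight colours mentioned earlier, the underlying lookups are simple.  The following is a rough sketch with invented data: the group names, location IDs, colour values and function names are all hypothetical, and the real API returns its data over HTTP rather than from an in-memory dictionary.

```python
from typing import Dict, List

# Hypothetical stand-in for the group data held in the database.
GROUPS = {
    1: {"name": "Group A", "locations": [12, 15, 40]},
    2: {"name": "Group B", "locations": [7, 15]},
}


def list_groups(groups: Dict[int, dict]) -> List[dict]:
    """What a 'list groups' endpoint might return: IDs and names only."""
    return [{"id": gid, "name": g["name"]} for gid, g in sorted(groups.items())]


def group_location_ids(groups: Dict[int, dict], group_id: int) -> List[int]:
    """What a 'locations in group' endpoint might return."""
    return list(groups[group_id]["locations"])


def assign_colours(selected_ids: List[int], palette: List[str]) -> Dict[int, str]:
    """Give each selected group a highlight colour, cycling if the palette runs out."""
    return {gid: palette[i % len(palette)] for i, gid in enumerate(selected_ids)}
```

Keeping the colour assignment separate from the data lookups means the same mapping could drive both the button highlight and the styling of the matching points or areas on the map.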