Week Beginning 19th October 2020

I was back at work this week after having a lovely holiday the previous week.  It was a pretty busy week, mostly spent continuing to work on the preparations for the second edition of the Historical Thesaurus, which needs to be launched before the end of the month.  I updated the OED date extraction script that formats all of the OED dates as we need them in the HT.  This includes creating full entries in the HT dates table, generating the ‘full date’ text string that gets displayed on the website and generating the cached first and last dates that are used for searching.  I’d somehow managed to get the plus and dash connectors the wrong way round in my previous version of the script (a plus should be used where there is a gap of more than 150 years, otherwise it’s a dash) so I fixed this.  I also stripped out dates that were within a 150 year time span, which really helped to make the full date text more readable.  I also updated the category browser so that the category’s thematic heading is displayed in the drop-down section.
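The connector rule can be sketched as follows.  This is only a minimal illustration, not the actual extraction script: the function name, the list-of-years input and the exact spacing and dash character are all assumptions, and only the 150-year threshold comes from the description above.

```python
GAP_THRESHOLD = 150  # years; the threshold described in the post


def full_date(years):
    """Join attestation years into a display string (hypothetical helper).

    A plus is used where the gap to the previous year is more than
    150 years, otherwise a dash. Spacing and dash glyph are assumptions.
    """
    if not years:
        return ""
    parts = [str(years[0])]
    for prev, cur in zip(years, years[1:]):
        connector = " + " if cur - prev > GAP_THRESHOLD else "-"
        parts.append(connector + str(cur))
    return "".join(parts)


if __name__ == "__main__":
    print(full_date([1303, 1839]))  # gap of 536 years, so a plus
    print(full_date([1646, 1745]))  # gap of 99 years, so a dash
```

A gap of exactly 150 years gets a dash here, following the ‘more than 150 years’ wording.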

Fraser had made some suggested changes to the script I’d written to figure out whether an OED lexeme was new or already in the system so I made some changes to this and regenerated the output.  I also made further tweaks to the date extraction script so that we record the actual final date in the system rather than converting it to ‘9999’ and losing this information that will no doubt be useful in future.  I then worked on the post-1999 lexemes, which followed a similar set of processes.

With this all in place I could then run a script that would actually import the new lexemes and their associated dates into the HT database.  This included changelog codes, new search terms and new dates (cached firstdate and lastdate, fulldate and individual entries in the dates table).  A total of 11116 new words were added, although I subsequently noticed there were a few duplicates that had slipped through the net.  With these stripped out we had a total of 804,830 lexemes in the HT, and it’s great to have broken through the 800,000 mark.  Next week I need to fix a few things (e.g. the fullsize timelines aren’t set up to cope with post-1945 dates that don’t end in ‘9999’ if they’re current) but we’re mostly now good to launch the second edition.

Also this week I worked on setting up a website for the ‘Symposium for Seventeenth-Century Scottish Literature’ for Roslyn Potter in Scottish Literature and set up a subdomain for an art catalogue website for Bryony Randall’s ‘Imprints of the New Modernist Editing’ project.  I also helped Megan Coyer out with an issue she was having in transcribing multi-line brackets in Word and travelled to the University to collect a new, higher-resolution monitor and some other peripherals to make working from home more pleasant.  I also fixed a couple of bugs in the Books and Borrowing CMS, including one that was resulting in BC dates of birth and death for authors being lost when data was edited.  I also spent some time thinking about the structure for the Burns Correspondence system for Pauline Mackay, resulting in a long email with a proposed database structure.  I met with Thomas Clancy and Alasdair Whyte to discuss the CMS for the Iona place-names project (it now looks like this is going to have to be a completely separate system from Alasdair’s existing Mull / Ulva system) and replied to Simon Taylor about a query he had regarding the Place-names of Fife data.

I also found some time to continue with the redevelopment of the Anglo-Norman Dictionary website.  I updated the way cognate references were processed to enable multiple links to be displayed for each dictionary.  I also added in a ‘Cite this entry’ button, which now appears in the top right of the entry.  When clicked on it opens a pop-up where citation styles will appear (they’re not there yet).  I updated the left-hand panel to make it ‘sticky’: if you scroll down a long entry the panel stays visible on screen (unless you’re viewing on a narrow screen like a mobile phone, in which case the left-hand panel appears full-width before the entry).  I also added in a top bar that appears when you scroll down the screen and contains the site title, the entry headword and the ‘cite’ button.  I then began working on extracting the citations, including their dates and text, which will be used for search purposes.  I ran an extraction script that extracted about 60,0000 citations, but I realised that this was not extracting all of the citations and further work will be required to get this right next week.


Week Beginning 2nd September 2019

I had my PDR session this week, so I needed to spend some time preparing for it, attending it, and reworking some of my PDR sections after it.  I think it all went pretty well, though, and it’s good to get it over with for another year.  I had one other meeting this week, with Sophie Vlacos from English Literature.  She is putting a proposal together and I gave her some advice on setting up a website and other technical matters.

My main project of the week once again was SCOSYA, and this week I was able to really get stuck into the Experts Atlas interface, which I began work on last week.  I’ve set up the Experts Atlas to use the same grey map as the Public Atlas, but it currently retains the red to yellow markers of the CMS Atlas.  The side panel is slightly wider than the Public Atlas and uses different colours, taken from the logo.  The fractional zoom from the Public Atlas is also included, as is the left-hand menu style (i.e. not taking the full height of the Atlas).  The ‘Home’ map shows the interview locations, with each appearing as a red circle.  There are no pop-ups on this map, but the location name appears as a tooltip when hovered over.

The ‘Search Attributes’ option is mostly the same as the ‘Attribute Search’ option in the CMS Atlas.  I’ve not yet updated the display of the attributes to allow grouping at three as opposed to two levels, probably using a tree-based approach.  This is something I’ll need to tackle next week.  I have removed the ‘interviewed by’ option, but as of yet I haven’t changed the Boolean display.  At a team meeting we had discussed making the joining of multiple attributes default to ‘And’ and to hide ‘Or’ and ‘Not’ but I just can’t think of a way of doing this without ending up with more clutter and complexity.  ‘And’ is already the default option and I personally don’t think it’s too bad to just see the other options, even if they’re not used.

The searches all work in the same way as in the CMS Atlas, but I did need to change the API a little, as when multiple attributes were selected these weren’t being ordered by location (e.g. all the D3 data would display then all the D4 data rather than all the data for both attributes for Aberdeen etc).  This meant the full information was not getting displayed in the pop-ups.  I’ve also completely changed the content of the pop-ups so as to present the data in a tabular format.  The average rating appears in a circle to the right of the pop-up, with a background colour reflecting the average rating.  The individual ratings also appear in coloured circles, which I personally think works rather well.  Changing the layout of the pop-up was a fairly major undertaking as I had to change the way in which the data was processed, but I’d say it’s a marked improvement on the pop-ups in the CMS Atlas.  I removed the descriptions from the pop-ups as these were taking up a lot of space and they can be viewed in the left-hand menu anyway.  Currently if a location doesn’t meet the search criteria and is given a grey marker the pop-up still lists all of the data that is found for the selected attributes at that location.  I did try removing this and just displaying the ‘did not meet criteria’ message, but figured it would be more interesting for users to see what data there is and how it doesn’t meet the criteria.  Below is a screenshot of the Experts Atlas with an ‘AND’ search selected:

Popups for ‘Or’ and ‘Not’ searches are identical, but for an ‘Or’ search I’ve updated the legend to try and make it more obvious what the different colours and shapes refer to.  In the CMS Atlas the combinations appear as ‘Y/N’ values.  E.g. if you have selected ‘D3 ratings 3-5’ OR ‘Q14 ratings 3-5’ then locations where neither was found were identified as ‘NN’, locations where D3 was present at these ratings but Q14 wasn’t were identified as ‘YN’, locations without D3 but with Q14 were ‘NY’ and locations with both were ‘YY’.  This wasn’t very easy to understand, so now the legend includes the codes, as the following screenshot demonstrates:

I think this works a lot better, but there is a slight issue in that if someone chooses the same code but with different criteria (e.g. ‘D3 rated 4-5 by Older speakers’ OR ‘D3 rated 4-5 by Younger speakers’) the legend doesn’t differentiate between the different ‘D3’s.  Hopefully anyone doing such a search would realise the first ‘D3’ relates to their first search selection while the second refers to their second selection.
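As a rough illustration of how the ‘Y/N’ combinations map onto attribute codes, here is a small sketch.  The function name and the ‘present’ / ‘absent’ wording are assumptions; the legend text actually used in the Experts Atlas is not reproduced here.

```python
from itertools import product


def or_legend(codes):
    """Map each Y/N combination to a label that names the attribute codes.

    `codes` lists the attribute codes in the order they were selected.
    The label wording is hypothetical.
    """
    legend = {}
    for flags in product("YN", repeat=len(codes)):
        key = "".join(flags)
        legend[key] = ", ".join(
            f"{code}: {'present' if flag == 'Y' else 'absent'}"
            for code, flag in zip(codes, flags)
        )
    return legend


if __name__ == "__main__":
    for key, label in or_legend(["D3", "Q14"]).items():
        print(key, "=", label)
```

Because the labels are built purely from the codes, selecting ‘D3’ twice with different criteria would give two identical-looking ‘D3’ entries, which is the limitation described above.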

I have omitted the ‘spurious’ tags from the ratings in the popups and also the comments.  I wasn’t sure whether these should be included, and if so how best to incorporate them.  I’ve also not included the animated dropping down of markers in the Experts Atlas as firstly it’s supposed to be more serious and secondly the drop down effect won’t work with the types of markers used for the ‘Or’ search.  I have also not currently incorporated the areas.  We had originally decided to include these, but they’ve fallen out of favour somewhat, plus they won’t work with ‘Or’ searches, which rely on differently shaped markers as well as colour, and they don’t work so well with group highlighting either.

The next menu item is ‘My search log’, which is what I’ve renamed the ‘History’ feature from the CMS Atlas.  This now appears in the general menu structure rather than replacing the left-hand menu contents.  Previously the rating levels just ran together (e.g. 1234), which wasn’t very clear so I’ve split these up so the description reads something like:

“D3: I’m just after, age: all, rated by 1 or more people giving it at score of 3, 4 or 5 Or Q14: Baremeasurepint, age: all, rated by 1 or more people giving it at score of 3, 4 or 5 viewed at 15:41:00”
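Splitting the run-together rating levels into readable text could be done along these lines.  This is a minimal sketch with a hypothetical function name, assuming the levels are stored as a string of single digits such as ‘345’.

```python
def describe_ratings(levels):
    """Turn a run-together string of rating digits into readable text."""
    digits = list(levels)
    if not digits:
        return ""
    if len(digits) == 1:
        return digits[0]
    # Comma-separate all but the last level, then join the last with 'or'.
    return ", ".join(digits[:-1]) + " or " + digits[-1]


if __name__ == "__main__":
    print(describe_ratings("345"))   # 3, 4 or 5
    print(describe_ratings("1234"))  # 1, 2, 3 or 4
```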

As with the CMS Atlas, pressing on a ‘Load’ button loads the search back into the map.  The data download option has also been given its own menu item, and pressing on this downloads the CSV version of the data that’s displayed on the map.  And that’s as far as I’ve got.  The main things still to do are replacing the attribute drop-down list with a three-level tree-based approach and adding in the group statistics feature.  Plus I still need to create the facility for managing users who have been authorised to download the full dataset and creating the login / download options for this.

Also this week I made some changes to the still-to-launch Glasgow Medical Humanities Network website for Gavin Miller.  I made some minor tweaks, such as adding in the Twitter feed and links to subscribe to the blog, and updated the site text on pages that are not part of the WordPress interface.  Gavin also wanted me to grab a copy of all the blogs on another of his sites (http://mhrc.academicblogs.co.uk/) and migrate this to the new site.  However, getting access to this site has proved to be tricky.  Gavin reckoned the domain was set up by UoG, but I submitted a Helpdesk query about it and no-one in IT knows anything about the site.  Eventually someone in the Web Team got back to me to say that the site had been set up by someone in Research Strategy and Innovation and they’d try to get me access, but despite the best efforts of a number of people I spoke to I haven’t managed to get access yet.  Hopefully next week, though.

Also this week I continued to work on the 18th Century Borrowing site for Matthew Sangster.  I have now fixed the issue with the zoomable images that were landscape being displayed on their side, as demonstrated in last week’s post.  All zoomable images should now display properly, although there are a few missing images at the start or end of the registers.  I also developed all of the ‘browse’ options for the site.  It’s now possible to browse a list of all student borrower names.  This page displays a list of all initial letters of the surnames, with a count of the number of students with surnames beginning with the letter.  Clicking on a letter displays a list of all students with surnames beginning with the letter, and a count of the number of records associated with each student.  Clicking on a student brings up the results page, which lists all of the associated records in a tabular format.  This is pretty much identical to the tabular view offered when looking at a page, only the records can come from any page.  As such there is an additional column displaying the register and page number of each record, and clicking on this takes you to the page view, so you can see the record in context and view the record in the zoomable image if you so wish.  There are links back to the results page, and also links back from the results page to the student page.  Here’s an example of the list of students with surnames beginning with ‘C’:

The ‘browse professors’ page does something similar, only all professors are listed on one page rather than being split into different pages for each initial letter of the surname.  This is due to there being a more limited number of professors.  Note that there are some issues with the data, which is why we have professors listed with names like ‘, &’.  There are what look like duplicates listed as separate professors (e.g. ‘Traill, Dr’) because the surname and / or title fields must have contained additional spaces or carriage returns so the scripts considered the contents to be different.  Clicking on a professor loads the results page in the same way as the students page.  Note that currently there is no pagination of results, so for example clicking on ‘Mr Anderson’ will display all 1034 associated records in one long table.  I might split this up, although in these days of touchscreens people tend to prefer scrolling through long pages rather than clicking links to browse through multiple smaller pages.
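If the apparent duplicates really are caused by stray spaces or carriage returns, collapsing whitespace before comparing would merge them.  The following is only a sketch under that assumption, with hypothetical function names; it is not something that has been run against the register data.

```python
import re
from collections import defaultdict


def normalise(value):
    """Collapse runs of whitespace (including carriage returns) and trim."""
    return re.sub(r"\s+", " ", value).strip()


def merge_variants(names):
    """Group raw name strings under their whitespace-normalised form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalise(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    raw = ["Traill, Dr", "Traill, Dr \r\n", " Traill,  Dr"]
    print(merge_variants(raw))  # one key, three raw variants
```

This would only catch whitespace differences.  Genuinely different spellings or abbreviations would still need a manually normalised field.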

‘Browse Classes’ does the same for classes.  I also created two new related tables to hold details of the classes, which enables me to pass a numerical ‘class ID’ in the URL rather than the full class text, which is tidier and easier to control.  Again, there are issues with the data that result in multiple entries for what is presumably the same class – e.g. ‘Anat, Anat., Anat:, Anato., Anatom, Anatomy’.  Matthew is still working on the data and it might be that creating a ‘normalised’ text field for class is something that we should do.

‘Book Names’ does the same thing for book names.  Again, I’ve written a script that extracts all of the unique book names and stores them once, allowing me to pass a ‘book name ID’ in the URL rather than the full text.  As with ‘students’ an alphabetical list of book names is presented initially due to the number of different books.  And as with other data types, a normalised book name should ideally be recorded as there are countless duplicates with slight variations here, making the browse feature pretty unhelpful as it currently stands.  I’ve taken the same approach with book titles, although surprisingly there is less variation here, even though the titles are considerably longer.  One thing to note is that any book with a title that doesn’t start with an a-z character is currently not included.  There are several that start with ‘….’ and some with ‘[’ that are therefore omitted.  This is because the initial letter is passed in the URL and for security reasons there are checks in place to stop characters other than a-z being passed.  ‘Browse Authors’ works in the same way, and generally there don’t appear to be too many duplicate variants, although there are some (e.g. ‘Aeschylus’ and ‘Aeschylus.’).  Finally, there is browse by lending date, which groups records by month of lending.
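The a-z check and its side effect can be illustrated with a short sketch.  The function names and the in-memory list of titles are assumptions made for the example; the site itself works from database tables.

```python
import re

SINGLE_LETTER = re.compile(r"[a-z]")


def valid_initial(param):
    """Accept only a single lowercase a-z character from the URL."""
    return bool(SINGLE_LETTER.fullmatch(param))


def titles_for_letter(titles, letter):
    """Return the titles starting with `letter`, rejecting anything that is not a-z."""
    if not valid_initial(letter):
        raise ValueError("initial letter must be a single a-z character")
    return sorted(t for t in titles if t[:1].lower() == letter)


if __name__ == "__main__":
    titles = ["Aeschylus", "….Sermons", "[Anon] Tracts", "Anatomy"]
    print(titles_for_letter(titles, "a"))  # the '….' and '[' titles never appear
```

Titles beginning with ‘….’ or ‘[’ can never be reached this way, since no permitted letter matches their first character.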

Also this week I added a new section to Bryony Randall’s New Modernist Editing site for her AHRC follow-on funding project: https://newmodernistediting.glasgow.ac.uk/the-imprints-of-the-new-modernist-editing/ and I spent a bit of time on DSL duties too.  I responded to a long email from Rhona Alcorn about the data and scripts that Thomas Widmann had been working on before he left, and I looked at some bibliographical data that Ann Ferguson had sent me last week, investigating what the files contained and how the data might be used.

Next week I will continue to focus on the SCOSYA project and try to get the Experts Atlas finished.

Week Beginning 11th March 2019

I mainly worked on three projects this week:  SCOSYA, the Historical Thesaurus and the DSL.  For SCOSYA I continued with the new version of my interactive ‘story map’ using Leaflet’s choropleth example and the geographical areas that have been created by the project’s RAs.  Last week I’d managed to get the areas working and colour coded based on the results from the project’s questionnaires.  This week I needed to make the interactive aspects, such as being able to navigate through slides, load in new data, click on areas to view details etc.  After a couple of days I had managed to get a version of the interface that did everything my earlier, more crudely laid out Voronoi diagram did, but using the much more pleasing (and useful) geographical areas and more up to date underlying technologies.  Here’s a screenshot of how things currently look:

If you look at the screenshot from last week’s post you’ll notice that one location (Whithorn) wasn’t getting displayed.  This was because the script iterates through the locations with data first, then the other locations, and the code to take the ratings from the locations with data and to add them to the map was only triggering when the next location also had data.  After figuring this out I fixed it.  I also figured out why ‘Airdrie’ had been given a darker colour.  This was because of a typo in the data.  We had both ‘Airdrie’ and ‘Ardrie’, so two layers were being generated, one on top of the other.  I’ve fixed ‘Ardrie’ now.  I also updated the styles of the layers to make the colours more transparent and the borders less thick and added in circles representing the actual questionnaire locations.  An area now gets a black border when you move the mouse over it, reverting to the dotted white border on mouse out.  When you click on an area (as with Harthill in the above screenshot) it is given a thicker black border and the area becomes more opaque.  The name of the location and its rating level appear in the box in the bottom left.  Clicking on another area, or clicking a second time on the selected area, deselects the current area.  Also, the pan and zoom between slides is now working, and this uses Leaflet’s ‘flyTo’ method, which is a bit smoother than the older method used in the Voronoi storymap.  Similarly, switching from one dataset to another is also smoother.  Finally, the ‘Full screen’ option in the bottom right of the map works, although I might need to work on the positioning of the ‘slide’ box when in this view.  I haven’t implemented the ‘transparency slider’ feature that was present in the Voronoi version, as I’m not sure it’s really necessary any more.

The underlying data is exactly the same as for the Voronoi example, and is contained in a pretty simple JSON file.  So long as the project RAs stick to the same format they should be able to make new stories for different features, and I should be able to just plug a new file into the atlas and display it without any further work.  I think this new story map interface is working really well now and I’m very glad we took the time to manually plot out the geographical areas.

Also for SCOSYA this week, E contacted me to say that the team hadn’t been keeping a consistent record of all of the submitted questionnaires over the years, and wondered whether I might be able to write an export script that generated questionnaires in the same format as they were initially uploaded.  I spent a few hours creating such a feature, which at the click of a button iterates through the questionnaires in the database, formats all of the data, generates CSV files for each, adds them to a ZIP file and presents this for download.  I also added a facility to download an individual CSV when looking at a questionnaire’s page.
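The export process (format the data, generate a CSV for each questionnaire, add them all to a ZIP) can be sketched roughly as below.  This is an illustration in Python rather than the project’s own code, and the function name, the file naming and the row layout are all assumptions.

```python
import csv
import io
import zipfile


def build_zip(questionnaires):
    """Return ZIP bytes containing one CSV per questionnaire.

    `questionnaires` maps a (hypothetical) file stem to a list of rows,
    each row being a list of values.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, rows in questionnaires.items():
            text = io.StringIO(newline="")
            csv.writer(text).writerows(rows)
            archive.writestr(f"{name}.csv", text.getvalue())
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_zip({"questionnaire-1": [["code", "rating"], ["D3", "4"]]})
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        print(archive.namelist())
```

Building the archive in memory means nothing needs to be written to disk before the file is offered for download.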

For the HT I continued with the seemingly endless task of matching up the HT and OED data.  Last week Fraser had sent me some category matches he’d manually approved that had been outputted by my gap matching script.  I ran these through a further script that ticked these matches off.  There were 154 matches, bringing the number of unmatched OED categories that have a POS and are not empty down to 995.  It feels like something of a milestone to get this figure under a thousand.

Last week we’d realised that using category ID to uniquely identify OED lexemes (as they don’t have a primary key) is not going to work in the long term as during the editing process the OED people can move lexemes between categories.  I’d agreed to write a script that identifies all of the OED lexemes that cannot be uniquely identified when disregarding category ID (i.e. all those OED lexemes that appear in more than one category).  Figuring this out proved to be rather tricky as the script I wrote takes up more memory than the server will allow me to use.  I had to run things on my desktop PC instead, but to do this I needed to export tables from the online database, and these were bigger than the server would let me export too.  So I had to process the XML on my desktop and generate fresh copies of the table that way. Ugh.

Anyway, the script I wrote goes through the new OED lexeme data and counts all the times a specific combination of refid, refentry and lemmaid appears (disregarding the catid).  As I expected, the figure is rather large.  There are 115,550 times when a combination of refid, refentry and lemmaid appears more than once.  Generally the number of times is 2, but looking through the data I’ve seen one combination that appears 7 times.  The total number of words with a non-unique combination is 261,028, which is about 35% of the entire dataset.  We clearly need some other way of uniquely identifying OED lexemes.  Marc’s suggestion last week was to ask the OED to create a legacy ‘catid’ field that is retained in the data as it is now and is never updated in future.  That would be sufficient to uniquely identify everything in a (hopefully) persistent way.  However, we would still need to deal with new lexemes added in future, which might be an issue.
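The counting step amounts to something like the following sketch.  The dictionary keys mirror the field names mentioned above, but the function name and the sample rows are made up for illustration, and the real script works on the full lexeme table rather than an in-memory list.

```python
from collections import Counter


def non_unique(lexemes):
    """Count (refentry, refid, lemmaid) combinations, ignoring catid,
    and return only those that occur more than once."""
    counts = Counter(
        (lex["refentry"], lex["refid"], lex["lemmaid"]) for lex in lexemes
    )
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical rows: the same lexeme identifiers in different categories.
    rows = [
        {"refentry": 1, "refid": 100, "lemmaid": 0, "catid": 10},
        {"refentry": 1, "refid": 100, "lemmaid": 0, "catid": 11},
        {"refentry": 2, "refid": 200, "lemmaid": 0, "catid": 12},
    ]
    duplicates = non_unique(rows)
    print(len(duplicates))           # number of non-unique combinations
    print(sum(duplicates.values()))  # number of words sharing one
```

The two printed figures correspond to the two kinds of total quoted above: how many combinations repeat, and how many words are involved in a repeat.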

I then decided to generate a list of all of the OED words where the refentry, refid and lemmaid are the same.  Most of the time the word has the same date in each category, but not always.  For example, see:


654    5154310   0  180932  absenteeism  1957  2005
654    5154310   0  210756  absenteeism  1850  2005

3366   9275424   0  92850   affectuous   1441  1888
3366   9275424   0  136581  affectuous   1441  1888
3366   9275424   0  136701  affectuous   1566  1888

10058  40557440  0  25985   aquiline     1646  1855
10058  40557440  0  39861   aquiline     1745  1855
10058  40557440  0  65014   aquiline     1791  1855

I then updated the script to output data only when refentry, refid, lemmaid, lemma, sortdate and enddate are all the same.  There are 97927 times when a combination of all of these fields appears more than once, and the total number of words where this happens is 213,692 (about 28% of all of the OED lemmas).  Note that the output here will include the first two ‘affectuous’ lines listed above while omitting the third. After that I created a script that brings back all HT lexemes that appear in multiple categories but have the same word form (the ‘word’ column), ‘startd’ and ‘endd’ (non OE words only).  There are 71,934 times when a combination of these fields is not unique, and the total number of words where this happens is 154,335.  We have 746,658 non-OE lexemes, so this is about 21% of all the HT’s non-OE words.  Again, most of these appear in two categories, but not all of them.  See for example:

529752  138922  abridge of/from/in  1303  1839
532961  139700  abridge of/from/in  1303  1839
613480  164006  abridge of/from/in  1303  1839

328700  91512   abridged            1370  0
401949  111637  abridged            1370  0
779122  220350  abridged            1370  0

542289  142249  abridgedly          1801  0
774041  218654  abridgedly          1801  0
779129  220352  abridgedly          1801  0

I also created a script that attempted to identify whether the OED categories that had been deleted in their new version of the data, but we had connected up to one of the HT’s categories, had possibly been moved elsewhere rather than being deleted outright.  There were 42 such categories and I created two checks to try and find whether the categories had just been moved.  The first looks for a category in the new data that has the same path, sub and pos while the second looks for a category with the same heading and pos and the highest number of words (looking at the stripped form) that are identical to the deleted category.  Unfortunately neither approach has been very successful.  Check number 1 has identified a few categories, but all are clearly wrong.  It looks very much like where a category has been deleted things lower down the hierarchy have been shifted up.  Check number 2 has identified two possible matches but nothing more.  And unfortunately both of these OED categories are already matched to HT categories and are present in the new OED data too, so perhaps these are simply duplicate categories that have been removed from the new data.
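The second check can be sketched as follows.  The dictionary structure and function name are hypothetical, and the sketch assumes the stripped word forms have already been gathered for each category.

```python
def best_moved_match(deleted, candidates):
    """Find the candidate with the same heading and pos that shares
    the most (stripped) words with the deleted category.

    Returns (category, overlap), or (None, 0) if nothing shares a word.
    """
    best, best_overlap = None, 0
    deleted_words = set(deleted["words"])
    for cat in candidates:
        if cat["heading"] != deleted["heading"] or cat["pos"] != deleted["pos"]:
            continue
        overlap = len(deleted_words & set(cat["words"]))
        if overlap > best_overlap:
            best, best_overlap = cat, overlap
    return best, best_overlap


if __name__ == "__main__":
    # Made-up categories purely for illustration.
    deleted = {"heading": "example", "pos": "n", "words": ["alpha", "beta", "gamma"]}
    candidates = [
        {"catid": 1, "heading": "example", "pos": "n", "words": ["alpha"]},
        {"catid": 2, "heading": "example", "pos": "n", "words": ["alpha", "beta"]},
        {"catid": 3, "heading": "example", "pos": "vt", "words": ["alpha", "beta", "gamma"]},
    ]
    match, overlap = best_moved_match(deleted, candidates)
    print(match["catid"], overlap)
```

A match found this way is only a candidate for manual checking.  As noted above, the categories it turns up may simply be duplicates that were already matched.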

I then began to use the new OED category table rather than the old one.  As expected, when using the new data the number of unmatched not empty OED categories with POS has increased, from 995 to 1952.  In order to check how the new OED category data compares to the old data I wrote a script that brings back 100 random matched categories and their words for spot checking.  This displays the category and word details for the new OED data, the old OED data and the HT data.  I’ve looked through a few output screens and haven’t spotted any issues with the matching yet.  However, it’s interesting to note how the path field in the new OED data differs from the old, and from the HT.  In many cases the new path is completely different to the old one.  In the HT data we use the ‘oedmaincat’ field, which (generally) matches the path in the old data.  I added in a new field ‘HT current Tnum’ that displays the current HT catnum and sub, just to see if this matches up with the new OED path.  It is generally pretty similar but frequently slightly different.  Here are some examples:

OED catid 47373 (HT 42865) ‘Turtle-soup’ is ‘|03 (n)’ in the old data and in the HT’s ‘oedmaincat’ field.  In the new OED data it’s ‘|03 (n)’ while the HT’s current catnum is ‘|03’.

OED catid 98467 (HT 91922) ‘place off centre’ is ‘|04.01 (vt)’ in the old data and oedmaincat.  In the new OED data it’s ‘|04.01 (vt)’ and HT catnum is ‘|04.01’.

OED catid 202508 (HT 192468) ‘Miniature vehicle for use in experiments’ is ‘|13 (n)’ in the old data and oedmaincat.  In the new data it’s ‘|13 (n)’ and the HT catnum (as you probably guessed) is ‘|13’.

As we’re linking categories on the catid it doesn’t really have any bearing on the matching process, but it’s possibly less than ideal that we have three different hierarchical structures on the go.

For the DSL I spent some time this week analysing the DSL API in order to try and figure out why the XML outputted by the API is different to the XML stored in the underlying database that the API apparently uses.  I wasn’t sure whether there was another database on the server that I was unaware of, or whether Peter’s API code was dynamically changing the XML each time it was requested.  It turns out it’s the latter.  As far as I can tell, every time a request for an entry is sent to the API, it grabs the XML in the database, plus some other information stored in other tables relating to citations and bibliographical entries, and then it dynamically updates sections of the XML (e.g. <cit>) to replace sections, adding in IDs, quotes and other such things.  It’s a bit of an odd system, but presumably there was a reason why Peter set it up like this.

Anyway, after figuring out that the API is behaving this way I could then work out a method to grab all of the fully formed XML that the API generates.  Basically I’ve written a little script that requests the full details for every word in the dictionary and then saves this information in a new version of the database.  It took several hours for the script to complete, but it has now done so.  I would appear to have the fully formed XML details for 89,574 entries, and with access to this data I should be able to start working on a new version of the API that will hopefully give us something identical in functionality and content to the old API.
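In outline the harvesting script does something like the following.  This is a sketch only: the function names are hypothetical, the API request is passed in as a function (the real endpoint is not shown here), and a plain dictionary stands in for the new database table.

```python
def harvest(entry_ids, fetch, store):
    """Request the fully formed XML for each entry and save it.

    `fetch` is any callable that takes an entry ID and returns XML text
    (in practice an HTTP request to the API); `store` is a dict-like
    stand-in for the new database. Returns the IDs that failed.
    """
    failed = []
    for entry_id in entry_ids:
        try:
            store[entry_id] = fetch(entry_id)
        except Exception:
            # Keep going so one bad entry doesn't halt a run lasting hours.
            failed.append(entry_id)
    return failed


if __name__ == "__main__":
    def fake_fetch(entry_id):
        # Stand-in for the API call, purely for demonstration.
        return f"<entry id='{entry_id}'/>"

    saved = {}
    print(harvest(["a1", "a2"], fake_fetch, saved), len(saved))
```

Recording failures rather than stopping makes it possible to re-run the script for just the entries that were missed.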

Also this week I moved offices, which took most of Friday morning to sort out.  I also helped Bryony Randall to get some stats for the New Modernist Editing websites, created a further ‘song story’ for the RNSN project and updated all of the WordPress sites I manage to the latest version of WordPress.

Week Beginning 17th December 2018

I took Friday off this week, in the run-up to Christmas, and spent the remaining four days trying to finish some of my outstanding tasks before the holidays begin.  This included finishing the ‘song story’ I’d left half finished last week for the RNSN project, starting and completing the other ‘song story’ that I had in my ‘to do’ list and updating two other stories to add in audio files.  All this required a lot of adapting images, uploading and linking to files, writing HTML and other such trivial but rather time-consuming tasks.  It probably took the best part of two days to get it all done, but things are looking great and I reckon the bulk of the work on these song stories is now complete.  We’re hoping to launch them early next year, at which point I’ll be able to share the URLs.

I also continued to talk to Joanna Kopacyk about the proposal she’s putting together.  This included having a long email conversation with both her and Luca, plus meeting in person with Joanna to go through some of the technical aspects that still needed a bit of thought.  Things seem to be coming together well now and hopefully Joanna will be able to submit the proposal in the new year.

Bryony Randall is also working on a proposal, this time a follow-on funding bid.  She’d set up a project website on WordPress.com, but it was filled with horribly intrusive adverts, and I thought it would give a better impression to reviewers if we migrated the site to a Glasgow server.  I started this process last week and completed it this week.  The new website can be found here: https://newmodernistediting.glasgow.ac.uk/

I also spent a couple of hours on the Bilingual Thesaurus, changing the way the selection of languages of origin and citation are handled on the search page.  There are so many languages and it had been suggested that higher-level groupings could help ensure users selected all of the appropriate options.  So, for example, a new top level group would be ‘Celtic’ and then within this there would be Irish, Old Irish, Scots Gaelic etc.  Each group has a checkbox and if you click on it then everything within the group is checked.  Clicking again deselects everything, as the screenshot below demonstrates.  I think it works pretty well.

I could hardly let a week pass without continuing to work on the HT / OED category linking task, and therefore I spent several further hours working on this.  I completed a script that compares all lexemes and their search terms to try and find matches.  For this to work I also had to execute a script to generate suitable search terms for the OED data (e.g. variants with / without brackets).  The comparison script takes ages to run as it has to compare every word in every unmatched OED category to every word in every unmatched HT category.  The script has identified a number of new potential matches that will hopefully be of some use.  It also unfortunately identified many OED lexemes that just don’t have any match in the HT data, despite having a ‘GHT date’, which means there should be a matching HT word somewhere.  It looks like some erroneous matches might have crept into our matching processes.  In some cases the issue is merely that the OED have changed the lexeme so it no longer matches (e.g. making a word plural).  But in other cases things look a little weird.

For example, OED 231036 ‘relating to palindrome’ isn’t matched and contains 3 words, none of which are found in the remaining unmatched HT categories (palindrome, palindromic, palindromical).  I double-checked this in the database.  The corresponding HT category is 219358 ‘pertaining to palindrome’, which contains four words (palinedrome, cancrine, palindromic, palindromical).  This has been matched to OED category 194476 ‘crab-like, can be read backwards and forwards’, which contains the words ‘palinedrome, cancrine, palindromic, palindromical’.  On further investigation I’d say OED category 194476 ‘crab-like, can be read backwards and forwards’ should actually match HT category 91942 ‘having same form in both directions’ which contains a single word ‘palindromic’.  I get the feeling the final matching stages are going to get messy.  But this is something to think about next year.  That’s all from me for 2018.  I wish anyone who is reading this a very merry Christmas.

Week Beginning 18th June 2018

This week I continued with the new ‘Search Results’ page for the Historical Thesaurus.  Last week I added in the mini-timelines for search results, but I wanted to bring some of the other updated functionality from the ‘browse’ page to the ‘search results’ page too.

There is now a new section on the search results page where sort options and such things are located.  This is currently always visible, as it didn’t seem necessary to hide it away in a hamburger menu as there’s plenty of space at the top of the page.  Options include the facility to open the full timeline visualisation, turn the mini-timelines on or off and set the sorting options.  These all tie into the options on the ‘browse’ page too, and are therefore ‘remembered’.

It took some time to get this working as the search results page is rather different to the browse page.  I also had to introduce a new sort option (by ‘Thesaurus Category’) as this is how things are laid out by default.  It’s also been a bit of a pain as both word and category results are lumped together, but the category results always need to appear after the ‘word’ results, and the sort options don’t really apply to them.  Also, I had to make sure the ‘recommended’ section got ordered as well as the main results.  This wasn’t so important for searches with few recommended categories, such as ‘sausage’ but for things like ‘strike’ that have lots of ‘recommendeds’ I figured ordering them would be useful.  I also had to factor the pagination of results into the ordering options too.  It’s also now possible to bookmark / share the results page with a specific order set, allowing users to save or share a link to a page with results ordered by length of attestation, for example.  Here’s a screenshot showing the results ordered by length of attestation:

I then set about implementing the full timeline visualisation for the search results.  As with the other updates to the search results page, this proved to be rather tricky to implement as I had to pull apart the timeline visualisation code I’d made for the ‘browse’ page and reformat it so that it would work with results from different categories.  This introduced a number of weird edge cases and bugs that took me a long time to track down.  One example that I’ve only just fixed and has taken about two hours to get to the bottom of:

When ordering by length of attestation all OE dates were appearing at the end, even though many were clearly the longest attested words.  Why was this happening?  It turns out that elsewhere in the code where ‘OE’ appears in the ‘fulldate’ I was replacing this with an HTML ‘span’ to give the letters the smallcaps font.  But having HTML in this field was messing up the ‘order by duration’ code, buried deep within a function called within a function within a function.  Ugh, getting to the bottom of that almost had me at my wit’s end.
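
To illustrate the kind of fix involved, here's a minimal Python sketch (the site itself is PHP and JavaScript, so this is not the actual code).  It strips any markup from the date string before working out the duration used for sorting.  The ‘first–last’ date format, the `OE_YEAR` stand-in value and the function names are all assumptions made for the example.

```python
import re

OE_YEAR = 1100  # assumed stand-in year for 'OE'; not taken from the HT code


def strip_tags(text):
    """Remove any HTML tags, leaving only the visible text."""
    return re.sub(r"<[^>]+>", "", text)


def duration(fulldate):
    """Years between the first and last date in a 'first–last' string (assumed format)."""
    first, last = strip_tags(fulldate).split("–")
    start = OE_YEAR if first.strip() == "OE" else int(first)
    return int(last) - start


rows = [
    "1500–1600",
    '<span class="smallcaps">OE</span>–1700',  # markup added for display elsewhere
    "1200–1300",
]

# Longest attested first; the OE row sorts correctly despite the span
longest_first = sorted(rows, key=duration, reverse=True)
```

The general point is that anything used as a sort key is safer computed from the raw data (or a cleaned copy of it) rather than from a string that has already been decorated for display.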

But I got it all working in the end, and the visualisation popup is now working properly on the search results page, including a new ‘Thesaurus Category’ ordering option.  I’ve also made the row label field wider and have incorporated the heading and PoS as well as the word.  ‘Thesaurus Category’ ordering might seem a little odd as the catnum doesn’t appear on the visualisation, but adding this in would make the row label very long.  Here’s how the timeline visualisation for results looks:

Note that this new search results page isn’t ‘live’ yet.  Fraser also wanted me to update how the search works to enable an ‘exact’ search to be performed, as currently a search for ‘set’ (for example) brings back things like ‘set (of teeth)’, which Fraser didn’t want included.  I did a little further digging into this as I had thought we once allowed exact searches to be performed, and I was right.  Actually, when you search for ‘set’ you are doing an exact search.  If it was a partial match search you’d use wildcards at the beginning and end and you’d end up with more results than the website allows you to see.


Maybe an example with fewer results would work better.  E.g. ‘wolf’.  Using the new results page, here’s an exact search: https://ht.ac.uk/category-selection/index-test.php?qsearch=wolf with 36 results, and here’s a wildcard search: https://ht.ac.uk/category-selection/index-test.php?qsearch=*wolf* with 240 results.

Several years ago (back when Christian was still involved) we set about splitting up results to allow multiple forms to be returned, and I wrote some convoluted code to extract possible permutations so that things like ‘wolf/wolf of hell/devil’s wolf < (deofles) wulf’ would be found when doing an exact search for ‘wolf’ or ‘wolf of hell’ or whatever.

When you do an exact search it searches the ‘searchterms’ table that contains all of these permutations.  One permutation type was to ignore stuff in brackets, which is why things like ‘set (about)’ are returned when you do an exact search for ‘set’, but handily keeps other stuff like ‘Wolffian ridge’ and ‘rauwolfia (serpentina)’ out of exact searches for ‘wolf’.  A search for ‘set’ is an exact match for one permutation of ‘set (about)’ that we have logged, so the result is returned.
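
As a rough illustration of how such permutations might be generated, here's a short Python sketch.  It is not the original (rather convoluted) code: the function name and the exact splitting rules are assumptions, covering only the slash-separated forms, the OE form after ‘<’ and the ‘ignore stuff in brackets’ variant described above.

```python
import re


def search_terms(word_field):
    """Split a combined word field into the forms an exact search should match."""
    terms = set()
    # Separate the modern form(s) from the OE form after '<' (assumed separator)
    for part in word_field.split("<"):
        # Alternative forms are separated by slashes
        for form in part.split("/"):
            form = form.strip()
            if not form:
                continue
            terms.add(form)
            # Also log the form with any bracketed material ignored
            without = " ".join(re.sub(r"\([^)]*\)", " ", form).split())
            if without:
                terms.add(without)
    return terms


set_terms = search_terms("set (about)")
wolf_terms = search_terms("wolf/wolf of hell/devil’s wolf < (deofles) wulf")
rauwolfia_terms = search_terms("rauwolfia (serpentina)")
```

With these rules ‘set’ is one of the terms logged for ‘set (about)’, so an exact search for ‘set’ finds it, while ‘wolf’ is never logged as a term for ‘rauwolfia (serpentina)’.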

To implement a proper ‘exact’ search I decided to allow users to surround a term with double quotes.  I updated my search code so that when double quotes are supplied the search code disregards the ‘search terms’ table and instead only searches for exact matches in the ‘wordoe’ and ‘wordoed’ fields.  This then strips out things like ‘set (about)’ but still ensures that words with OE forms, such as ‘set < (ge)settan’ are returned, as such words have ‘set’ in the ‘wordoed’ field and ‘(ge)settan’ in the ‘wordoe’ field.  This new search seems to be working very well but as with the other updates I haven’t made it live yet.
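
Here's a minimal Python sketch of that branching logic, using in-memory data in place of the database.  The ‘wordoe’ and ‘wordoed’ field names come from the HT data, but the function name, the sample rows and the shape of the search terms lookup are all made up for illustration.

```python
def search(query, words, searchterms):
    """Return matching word ids.

    words: list of dicts with 'id', 'wordoe' and 'wordoed' keys (sample shape).
    searchterms: dict mapping a logged search term to a set of word ids.
    """
    query = query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted: bypass the search terms and compare against the word fields only
        term = query[1:-1].strip().lower()
        if not term:
            return []
        return [
            w["id"]
            for w in words
            if term in (w["wordoe"].lower(), w["wordoed"].lower())
        ]
    # Unquoted: use the logged permutations, as before
    return sorted(searchterms.get(query.lower(), set()))


# Hypothetical sample rows
words = [
    {"id": 1, "wordoe": "", "wordoed": "set"},
    {"id": 2, "wordoe": "", "wordoed": "set (about)"},
    {"id": 3, "wordoe": "(ge)settan", "wordoed": "set"},
]
searchterms = {"set": {1, 2, 3}}

unquoted = search("set", words, searchterms)
quoted = search('"set"', words, searchterms)
```

In this sample the unquoted search returns all three rows, while the quoted search drops ‘set (about)’ but keeps the row with an OE form.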

One thing I noticed with the search results timeline visualisation that might need fixed:  The contents of the visualisation are limited to a single results page, so reordering the visualisation will only reorder a subset of the results.  E.g. if the results go over two pages and are ordered by ‘Thesaurus Category’ and you open the visualisation, if you then reorder the visualisation by ‘length of attestation’ you’re only ordering those results that appeared on the first page of results when ordered by ‘Thesaurus Category’.

So to see the visualisation with the longest period of attestation you first need to reorder the results page by this option and then open the visualisation.  This is possibly a bit clunky and might lead to confusion.  I can change how things work if required.  The simplest way around this might be to just display all results in the visualisation, rather than just one page of results.  That might lead to some rather lengthy visualisations, though.  I’ve asked Marc and Fraser what they think about this.
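
The difference between the two behaviours is easy to show with a toy example in Python.  The data, field names and page size below are entirely hypothetical; the point is only that reordering one page of results is not the same as reordering everything and then taking a page.

```python
def page(results, page_number, per_page):
    """Return one page of an already-ordered list of results."""
    start = (page_number - 1) * per_page
    return results[start:start + per_page]


# Hypothetical results: 'years' stands in for length of attestation
results = [
    {"word": "alpha", "catnum": 1, "years": 50},
    {"word": "beta", "catnum": 2, "years": 300},
    {"word": "gamma", "catnum": 3, "years": 900},
    {"word": "delta", "catnum": 4, "years": 10},
]

# Current behaviour: take page one by category, then reorder just that page
by_category = sorted(results, key=lambda r: r["catnum"])
reordered_page = sorted(page(by_category, 1, 2), key=lambda r: r["years"], reverse=True)

# Alternative: reorder all results first, then take page one
by_length = sorted(results, key=lambda r: r["years"], reverse=True)
first_page_by_length = page(by_length, 1, 2)
```

Here the first approach shows ‘beta’ then ‘alpha’ and never surfaces ‘gamma’, even though it is the longest attested word overall.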

I’m finding the search results visualisation to be quite fun to use.  It’s rather pleasing to search for a word (e.g. the old favourite ‘sausage’) and then have a visualisation showing when this word was used in all its various senses, or which sense has been used for longer, or which sense came first etc.

Also this week the Historical Thesaurus website moved to its new URL.  The site can now be accessed at https://ht.ac.uk/, which is much snappier than the old https://historicalthesaurus.arts.gla.ac.uk URL.  I updated the ‘cite’ options to reflect this change in URL and everything seems to be working very well.  I also discussed some further possible uses for the HT with Fraser and Marc, but I can’t really go into too many details at this point.

Also this week I fixed a minor issue on a page for The People’s Voice project, and a further one for the Woolf short story site, and gave some further feedback about a couple of further updates for the Data Management Plan for Faye Hammill.  I also had some App duties to take care of and gave some feedback on a Data Management Plan that Graeme had written for a project for someone in History.  I also created the interface for the project website for Matthew Creasy’s Decadence and Translation Network project, which I think is looking rather good (but isn’t live yet).  I had a chat with Scott Spurlock about his crowdsourcing project, which it looks like I’m going to start working on later in the summer, I spoke to David Wilson, a developer elsewhere in the college, about some WordPress issues, and I gave some feedback to Kirsteen McCue on a new timeline feature she is hoping to add to the RNSN website.  I also received an email from the AHRC this week thanking me for acting as a Technical Reviewer for them.  It turns out that I completed 39 reviews during my time as a reviewer, which I think is pretty good going!

Also this week I had a meeting with Megan Coyer where we discussed a project she’s putting together that’s related to Scottish Medicine in history.  We discussed possible visualisation techniques, bibliographical databases and other such things.  I can’t go into any further details just now, but it’s a project that I’ll probably be involved with writing the technical parts for in the coming months.

Eleanor Lawson sent me some feedback about my reworking of the Seeing Speech and Dynamic Dialects websites, so I acted on this feedback.  I updated the map so that the default zoom level is one level further out than before, centred on somewhere in Nigeria.  This means on most widths of screen it is possible to see all of North America and Australia.  New Zealand might still get cut off, though.  Obviously on narrower screens less of the world will be shown.

On the chart page I updated the way the scrollbar works.  It now appears at both the top and the bottom of the table.  This should hopefully make it clearer to people that they need to scroll horizontally to see additional content, and also make it easier to do the scrolling – no need to go all the way to the bottom of the table to scroll horizontally.  I also replaced one of the videos with an updated one that Eleanor sent me.

On Friday I began to think about how to implement a new feature for the SCOSYA atlas.  Gary had previously mentioned that he would like to be able to create groups of locations and for the system to then be able to generate summary statistics about the group for a particular search that a user has selected.  I wrote a mini ‘requirements’ document detailing how this might work, and it took some time to think through all of the various possibilities and to decide how the feature might work best.  By the end of the day I had a plan and had emailed the document to Gary for feedback.  I’m hoping to get started on this new feature next week.


Week Beginning 14th August 2017

I was on holiday last week but was back to work on Monday this week.  I’d kept tabs on my emails whilst I was away but as usual there were a number of issues that had cropped up in my absence that I needed to sort out.  I spent some time on Monday going through emails and updating my ‘to do’ list and generally getting back up to speed again after a lazy week off.

I had rather a lot of meetings and other such things to prepare for and attend this week.  On Monday I met with Bryony Randall for a final ‘sign off’ meeting for the New Modernist Editing project.  I’ve really enjoyed working on this project, both the creation of the digital edition and taking part in the project workshop.  We have now moved the digital edition of Virginia Woolf’s short story ‘Ode written partly in prose on seeing the name of Cutbush above a butcher’s shop in Pentonville’ to what will hopefully be its final and official URL and you can now access it here: http://nme-digital-ode.glasgow.ac.uk

On Tuesday I was on the interview panel for Jane Stuart-Smith’s SPADE project, which I’m also working on for a small percentage of my time.  After the interviews I also had a further meeting with Jane to discuss some of the technical aspects of her project.  On Wednesday I met with Alison Wiggins to discuss her ‘Archives and Writing Lives’ project, which is due to begin next month.  This project will involve creating digital editions of several account books from the 16th century.  When we were putting the bid together I did quite a bit of work creating a possible TEI schema for the account books and working out how best to represent all of the various data contained within the account entries.  Although this approach would work perfectly well, now that Alison has started transcribing some entries herself we’ve realised that managing complex relational structures via taxonomies in TEI via the Oxygen editor is a bit of a cumbersome process.  Instead Alison herself investigated using a relational database structure and had created her own Access database.  We went through the structure when we met and everything seems to be pretty nicely organised.  It should be possible to record all of the types of data and the relationships between these types using the Access database and so we’ve decided that Alison should just continue to use this for her project.  I did suggest making a MySQL database and creating a PHP based content management system for the project, but as there’s only one member of staff doing the work and Alison is very happy using Access it seemed to make sense to just stick with this approach.  Later on in the project I will then extract the data from Access, create a MySQL database out of it and develop a nice website for searching, browsing and visualising the data.  I will also write a script to migrate the data to our original TEI XML structure as this might prove useful in other projects.

It’s Performance and Development Review time again, and I have my meeting with my line manager coming up, so I spent about a day this week reviewing last year’s objectives and writing all of the required sections for this year.  Thankfully having my weekly blog posts makes it easier to figure out exactly what I’ve been up to in the review period.

Other than these tasks I helped Jane Roberts out with an issue with the Thesaurus of Old English, I fixed an issue with the STARN website that Jean Anderson had alerted me to, I had an email conversation with Rhona Brown about her Edinburgh Gazetteer project and I discussed data management issues with Stuart Gillespie.  I also uploaded the final set of metaphor data to the Mapping Metaphor database.  That’s all of the data processing for this project now completed, which is absolutely brilliant.  All categories are now complete and the number of metaphors has gone down from 12938 to 11883, while the number of sample lexemes (including first lexemes) has gone up from 25129 to a whopping 45108.

Other than the above I attended the ‘Future proof IT’ event on Friday.  This was an all-day event organised by the University’s IT services and included speakers from JISC, Microsoft, Cisco and various IT related people across the University.  It was an interesting day with some excellent speakers, although the talks weren’t as relevant to my role as I’d hoped they would be.  I did get to see Microsoft’s HoloLens technology in action, which was great, although I didn’t personally get a chance to try the headset on, which was a little disappointing.


Week Beginning 17th July 2017

This was my first week back after a very relaxing two weeks of holiday, and in fact Monday this week was a holiday too so it was a bit of a short week.  I spent some time doing the usual catching up with emails and issues that had accumulated in my absence, including updating access rights for the SciFiMedHums site, investigating an issue with some markers not appearing on the atlas for SCOSYA, looking into an issue that Luca had emailed me about, fixing some typos on the Woolf short story site and speaking to the people at the KEEP archive about hosting the site, and giving some feedback on the new ARIES publicity materials.  I also spent the best part of a day on AHRC review duties.

On Tuesday I met with Kirsteen McCue and her RA Brianna Robertson about a new project that is starting up about romantic national song.  The project will need a website and some sort of interactive map so we met to discuss this.  Kirsteen was hoping I’d be able to set up a site similar to the Burns sites I’ve done, but as I’m no longer allowed to use WordPress this is going to be a little difficult.  We’re going to try and set something up within the University’s T4 structure, but it might not be possible to get something working the way Kirsteen was hoping for.  I sent a long email to the University’s Web Team asking for advice and hopefully they’ll get back to me soon.

I spent the rest of the week returning to App development.  I’m going to be working on a new version of the ARIES app soon, so I thought it would be good to get everything up to date before this all starts up.  As I expected, since I last did any app development all the technical stuff has changed – a new version of Apache Cordova, new dependencies, new software to install, updates to XCode, a new requirement to install Android Studio etc etc.  Getting all of this infrastructure set up took quite a bit of time, especially the installation of a new piece of required software called ‘Cocoabeans’ that took an eternity to set up.

With all this in place I then focussed on creating the app version of ‘The Basics of English Metre’, which is a task that has been sitting in my ‘to do’ list for many months now.  I managed to create the required iOS and Android versions and installed them on my devices for testing.  All appeared to be working fine so I then set to work creating all of the files that are necessary to actually publish the App online.  I started with the iOS version.  This required the creation of 14 icon files and 10 launch screen images, which was a horribly tedious task.  I then needed to create several screenshots for the App Store, which required getting screenshots from an iPad Pro (which I don’t have).  Thankfully XCode has an iOS simulator, which you can use to boot up your app and get screenshots.  However, although the simulator was working for the app earlier in the week, when I came to take the screenshots the app build just kept on failing when deploying to the simulator.  Rather strangely, the app would build just fine when deploying to my actual iPad, and also when building to an Archive file for submission to the store.  I spent ages trying to figure out what the problem was, but just couldn’t get to the bottom of it.  In the end I had to create a new version of the app, and this thankfully worked, so I guess there was some sort of conflict or corruption in the code for the first version.  With this out of the way I was able to take the screenshots, complete the App Store listings, upload my app file and submit the app for inclusion.  I managed to get this done on Friday afternoon so hopefully sometime next week the app will be available for download.  I didn’t have time to complete and submit the Android version of the app, so this is what I’ll focus on at the start of next week.

Week Beginning 26th June 2017

On Friday this week I attended the Kay Day event, a series of lectures to commemorate the work of Christian Kay.  It was a thoroughly interesting event with some wonderful talks and some lovely introductions where people spoke about the influence Christian had on their lives.  The main focus of the event was the Historical Thesaurus, and it was at this event that we officially launched the new versions of the main HT website and the Thesaurus of Old English website, which I have been working on over the past few weeks.  You can now see the new versions here http://historicalthesaurus.arts.gla.ac.uk/ and here: http://oldenglishthesaurus.arts.gla.ac.uk/.  We’ve had some really good feedback about the new versions and hopefully they will prove to be great research tools.

In the run-up to the event this week I spent some further time on last-minute tweaks to the websites.  On Monday I finished my major reworking of the TOE browse structure, which I had spent quite a bit of time on towards the end of last week.  The ‘xx’ categories now all have no child categories.  This does look a little strange in some places as these categories are now sometimes the only ones at that level without child categories, and in some cases it’s fairly clear that they should have child categories (e.g. ’11 Action and Utility’ contains ’11 Action, operation’ that presumably then should contain ’11.01 Action, doing, performance’).  However, the structure generally makes a lot more sense now (no ‘weaving’ in ‘food and drink’!) and we can always work on further refinement of the tree structure at a later date.

I also updated the ‘jump to category’ section of the search page to hopefully make it clearer what these ‘t’ numbers are.  This text is also on the new HT website.  I also fixed the display of long category titles that have slashes in them.  In Firefox these were getting split up over multiple lines as you’d expect, but Chrome was keeping all of the text on one long line, thus breaking out of the box and looking a bit horrible.  I have added a little bit of code to the script that generates the category info to replace slashes with a slash followed by a zero-width space character (U+200B).  This shouldn’t change the look of the titles, but means the line will break on the slashes if the text is too long for the box.  I also fixed the issue with subcategory ‘cite’ buttons being pushed out of the title section when the subcategory titles were of a certain long length.
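
The slash replacement itself is a one-liner.  Here it is as a small Python sketch (the real change is in the script that generates the category info, so the function name and the sample title here are just for illustration):

```python
ZERO_WIDTH_SPACE = "\u200b"  # invisible, but gives the browser a place to break the line


def breakable_title(title):
    """Follow every slash with a zero-width space so long titles can wrap."""
    return title.replace("/", "/" + ZERO_WIDTH_SPACE)


# Hypothetical category title
example = breakable_title("action/operation/doing")
```

The visible text is unchanged, but the string gains one invisible character per slash, which is worth remembering if the title is ever compared against the original or copied into a search box.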

I also noticed that the browser’s ‘back’ button wasn’t working when navigating the tree – e.g. if you click to load a new category or change the part of speech you can’t press the ‘back’ button to return to what you were looking at previously.  I’m not sure that this is a massive concern as I don’t think many people actually use the ‘back’ button much these days, but when you do press the ‘back’ button the ‘hash’ in the URL changes, but the content of the page doesn’t update, unless you then press the browser’s ‘reload’ button.  I spent a bit of time investigating this and came up with a solution.  It’s not a perfect solution as all I’ve managed to do is to stop the browsing of the tree and parts of speech being added to the user’s history, therefore no matter how much clicking around the tree you do if you press ‘back’ you’ll just be taken to the last non-tree page you looked at.  I think this is acceptable as the URL in the address bar still gets updated when you click around, meaning you can still copy this and share the link, and clicking around the tree and parts of speech isn’t really reloading a new page anyway.  I’d say it’s better than the user pressing ‘back’ and nothing updating other than the ID in the URL, which is how it previously worked.

Marc also noted that our Google Analytics stats are not going to update now we’re using a new AJAX way to load category details.  Thankfully Google have thought about how to handle sites like ours, and I updated my code to submit a GA ‘hit’ whenever my ‘load category’ JavaScript runs, following the instructions here: https://developers.google.com/analytics/devguides/collection/analyticsjs/single-page-applications

There are still further things I want to do with the HT and TOE sites, e.g. I never did have the time to properly overhaul the back-end and create one unified API for handling all data requests.  That side of things is still a bit of a mess of individual scripts and I’d really like to tidy it up at some point.  Also, the way I fixed the ‘back button’ issue was to use the HTML5 ‘history’ interface to update the URL in the address bar without actually adding this change to the browser’s history (See https://developer.mozilla.org/en-US/docs/Web/API/History).  If I had the time I would investigate using this interface to use proper variables in the URL (e.g. ‘?id=1’) rather than a hash (e.g. ‘#id=1’) as hashes are only ever handled client side whereas variables can be processed on both client and server.  Before this HTML5 interface was created there was no reliable way for JavaScript to update the page URL in the address bar, other than by changing the hash.

Other than Historical Thesaurus matters, I spent some time this week on other projects.  I read through the job applications for the SPADE RA post and met with Jane to discuss these.  I also fixed a couple of issues with the SCOSYA content management system that had crept in since the system was moved to a new server a while back.  I also got my MacOS system and XCode up to date in preparation for doing more app work in the near future.

I spent the remainder of my week updating the digital edition of the Woolf short story that I’ve been working on for Bryony Randall’s ‘New Modernist Editing’ project.  Bryony had sent the URL out for feedback and we’d received quite a lot of useful suggestions.  Bryony herself had also provided me with some updated text for the explanatory notes and some additional pages about the project, such as a bibliography.

I made some tweaks to the XML transcription to fix a few issues that people had noticed.  I added in ‘Index’ as a title to the index page and I’ve added in Bryony’s explanatory text.

I relabelled ‘Edition Settings’ to ‘Create your own view’ to make it clearer what this option is.  I moved the ‘next’ and ‘previous’ buttons to midway down the left and right edges of the page, and I think this works really well as when you’re looking at the text it feels more intuitive to ‘turn the page’ at the edges of what you’re looking at.  It also frees up space for additional buttons in the top navigation bar.

I made the ‘explanatory notes’ a dotted orange line rather than blue and I removed the OpenLayers blue dot and link from the facsimile view to reduce confusion.  In the ‘create your own view’ facility I made it so that if you select ‘original text’ this automatically selects all of the options within it.  If you deselect ‘original text’ the options within are all deselected.  If ‘Edited text’ is not selected when you do this then it becomes selected.  If ‘Original text’ is deselected and you deselect ‘Edited text’ then ‘Original text’ and the options within all become selected.  This should hopefully make it more difficult to create a view of the text that doesn’t make sense.
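
The rules for the two top-level checkboxes can be written down as a small state function.  This is only a Python sketch of the logic as described above (the edition itself is JavaScript), and the names of the options within ‘original text’ are placeholders.

```python
# Placeholder names for the options nested under 'original text'
ORIGINAL_OPTIONS = ("option_a", "option_b")


def toggle(state, key):
    """Return a new settings dict after clicking the 'original' or 'edited' checkbox."""
    new = dict(state)
    new[key] = not new[key]
    if key == "original":
        # The nested options always follow the 'original text' checkbox
        for option in ORIGINAL_OPTIONS:
            new[option] = new["original"]
        # Never leave both texts deselected
        if not new["original"] and not new["edited"]:
            new["edited"] = True
    elif key == "edited":
        # Deselecting 'edited' with nothing else selected falls back to the original
        if not new["edited"] and not new["original"]:
            new["original"] = True
            for option in ORIGINAL_OPTIONS:
                new[option] = True
    return new


start = {"original": False, "edited": True, "option_a": False, "option_b": False}
after_edited_off = toggle(start, "edited")
after_original_off = toggle(after_edited_off, "original")
```

Whichever of the two checkboxes is clicked, at least one of ‘original’ and ‘edited’ ends up selected, which is what stops a view with no text at all being created.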

I also added in some new interpretations to the first handwritten note, as this is still rather undecipherable.  I created new pages for the ‘further information’, ‘how to use’ and ‘bibliography’.  These are linked to from the navigation bar of the pages of the manuscript, in addition to being linked to from the index page text.  A link appears allowing you to return to the page you were looking at if you access one of these pages from a manuscript page.  I think the digital edition is looking rather good now, and it was good to get the work on this completed before my holiday.  I can’t share the URL yet as we’re still waiting on some web space for the resource at The KEEP archives.  Hopefully this will happen by the end of July.

I will be on holiday for the next two weeks now so no further updates from me until later on in the summer.


Week Beginning 29th May 2017

Monday this week was the spring bank holiday so it was a four-day week for me.  I split my time this week over three main projects.  Firstly, I set up an initial project website for Jane Stuart-Smith’s SPADE project.  We’d had some difficulty in assigning the resources for this project but thankfully this week we were given some web space and I managed to get a website set up, create a skeleton structure for it and create the user accounts that will allow the project team to manage the content.  I also had some email discussions with the project partners about how best to handle ‘private’ pages that should be accessible to the team but no-one else.  There is still some work to be done on the website, but for the time being my work is done.

I also continued this week to work on the public interface for the database of poems for The People’s Voice project.  Last week I started on the search facility, but only progressed as far as allowing a search for a few fields, with the search results page displaying nothing more than the number of matching poems.  This week I managed to pretty much complete the search facility.  Users can now search for any combination of search boxes and on the search results page there is now a section above the results that lists what you’ve searched for.  This also includes a ‘refine your search’ button that takes the user back to the search page.  The previously selected options are now ‘remembered’ by the form, allowing the user to update what they’ve searched for.  There is also a ‘clear search boxes’ button so the user can start a fresh search.

Search results are now paginated.  Twenty results are displayed per page and if there are more results than this then ‘next’, ‘previous’ and ‘jump to page’ links are displayed above and below your search results.  If there are lots of pages some ‘jump to page’ links are omitted to stop things getting too cluttered.  Search results display the poem title, date (or ‘undated’ if there is no date), archive / library, franchise and author.  Clicking on a result will lead to the full record, but this is still to do.  I also haven’t added in the option to order the results by anything other than poem title, as I’m not sure whether this will really be of much use and it’s going to require a reworking of the way search results are queried if I am to implement it.  I still have the ‘browse’ interface to work on and the actual page that displays the poem details, and I’ll continue with this next week.
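
For anyone curious how the ‘jump to page’ links might be thinned out, here's a minimal Python sketch.  The twenty-per-page figure is from the site, but the rule used here (always keep the first and last page plus a small window around the current page) is an assumption for the example, as are the function and parameter names.

```python
import math


def page_links(total_results, current, per_page=20, window=2):
    """Return the page numbers to link to, with '…' marking omitted runs."""
    pages = math.ceil(total_results / per_page)
    if pages <= 1:
        return []  # everything fits on one page, so no links are needed
    keep = {1, pages}
    keep.update(p for p in range(current - window, current + window + 1) if 1 <= p <= pages)
    links, previous = [], 0
    for p in sorted(keep):
        if p - previous > 1:
            links.append("…")  # one marker for each gap of skipped pages
        links.append(p)
        previous = p
    return links


many_pages = page_links(400, current=10)
few_pages = page_links(45, current=1)
```

With 400 results and page 10 selected this gives links for pages 1, 8 to 12 and 20, with markers for the gaps, while a three-page result set simply lists all three pages.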

I met with Bryony Randall this week to discuss some final tweaks to the digital edition of the Virginia Woolf short story that I’ve been working on.  I made a few changes to the transcription, updated how we label ‘sic’ and ‘corr’ text in the ‘edition settings’ (these are now called ‘original’ and ‘edited’) and I changed which edition settings are selected by default.  Where previously the original text was displayed we now display the ‘edited’ text with only line breaks from the ‘original’ retained.  Bryony is going to ask for feedback from members of the Network and we’re going to aim to get things finalised by the end of the month.

I spent the rest of the week working on the Historical Thesaurus.  Last week I met with Marc and Fraser to discuss updates to the website that we were going to try and implement before ‘Kay Day’ at the end of the month.  One thing I’ve wanted to try to implement for a while now is a tree-based browse structure.  I created a visual tree browse structure using the D3.js library for the Scots Thesaurus project and doing so made me realise how useful having such a speedy way to browse the full thesaurus structure would be.

I tried a few jQuery ‘tree’ plugins and in the end I went with FancyTree (https://github.com/mar10/fancytree) because it clearly explained how to load data into nodes via AJAX when a user opens the node.  This is important for us as we can’t load all 235,000 categories into the tree at once (well, we could but it would be a bad idea).  I created a PHP script (that I will eventually integrate with the HT API) that you can pass a catid to and it will spit out a JSON file containing all of the categories and subcategories that are one level down from it.  It also checks whether each of these has child categories.  If there are child categories then the tree knows to place the little ‘expand’ icon next to the category.  When the user opens a category this fires off a request for the category’s children and these are then dynamically loaded into the tree.  Here’s a screenshot of my first attempt at using FancyTree:
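To give an idea of the shape of the data involved, here is a minimal sketch of the ‘one level down’ lookup.  The real endpoint is a PHP script querying the HT database; this version is JavaScript over a made-up in-memory list purely for illustration, and the row fields (id, parentId, heading), the function name childNodesFor and the filename children.php are all hypothetical.  The key, title and lazy properties are the ones FancyTree uses for node data, with lazy telling the tree to show the ‘expand’ icon and fetch children on demand.

```javascript
// Hypothetical rows standing in for the category table.
const categories = [
  { id: 1, parentId: null, heading: "Example A" },
  { id: 2, parentId: 1, heading: "Example B" },
  { id: 3, parentId: 1, heading: "Example C" },
  { id: 4, parentId: 2, heading: "Example D" }
];

// Returns the nodes one level below catid, flagging those with children.
function childNodesFor(catid, rows) {
  // Every id that appears as somebody's parent, for a quick 'has children' check
  const parentIds = new Set(rows.map(r => r.parentId));
  return rows
    .filter(r => r.parentId === catid)
    .map(r => ({
      key: String(r.id),        // FancyTree node key
      title: r.heading,         // label shown in the tree
      lazy: parentIds.has(r.id) // true => expandable, children loaded on demand
    }));
}

// Browser-side wiring (not run here), assuming a hypothetical children.php:
// $("#tree").fancytree({
//   source: { url: "children.php", data: { catid: 0 } },
//   lazyLoad: function (event, data) {
//     data.result = { url: "children.php", data: { catid: data.node.key } };
//   }
// });
```

Calling childNodesFor(1, categories) gives two nodes, of which only the first is marked as lazy because it is the only one with a child of its own.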

Subcategories are highlighted with a grey background and in this version you can’t actually view the words in a category.  Also, only nouns are currently represented.  I thought at this stage that we might have to have separate trees for each part of speech, but then realised that the other parts of speech don’t have a full hierarchy so the tree would be missing lots of branches and would therefore not work.  In this version the labels only show the category heading and catnum / subcat, but I can update the labels to display additional information.  We could for example show the number of categories within each category, or somehow represent the number of words contained in the category so you can see where the big categories are.  I should also be able to override the arrow icons with Font Awesome icons.

After creating this initial version I realised there was still a lot to be done.  For example, if we’re using this browser then we need to ensure that when you open a category the tree loads with the correct part opened.  This might be tricky to implement.  Also there’s the issue of dealing with different parts of speech.
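One possible way to approach the ‘open the tree at the right place’ problem is to work out the chain of ancestors for the category and then ask the tree to load each level in turn.  The sketch below only covers the first half of that, and it assumes a hypothetical lookup from each category id to its parent id; the function name keyPathFor is also made up.  FancyTree documents a loadKeyPath() method that appears to take a ‘/’-separated path of node keys like the one produced here, but this would need checking against the plugin’s documentation before relying on it.

```javascript
// Builds a '/'-separated path of keys from the top of the tree down to catid.
// parentOf is a hypothetical Map of category id => parent id (null at the top).
function keyPathFor(catid, parentOf) {
  const keys = [];
  const seen = new Set();
  let current = catid;
  while (current !== null && current !== undefined) {
    // Guard against bad data sending us round in circles
    if (seen.has(current)) throw new Error("Cycle in category hierarchy");
    seen.add(current);
    keys.unshift(String(current)); // ancestors end up in front of descendants
    current = parentOf.get(current);
  }
  return "/" + keys.join("/");
}

// Example with made-up ids: 4 sits under 2, which sits under 1.
const parentOf = new Map([[1, null], [2, 1], [3, 1], [4, 2]]);
const path = keyPathFor(4, parentOf);
```

With lazy loading each level has to be requested before the next can be opened, which is probably where most of the trickiness would lie.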

After working on this initial version I then began to work on a version that was integrated into the HT website design.  I also followed the plugin instructions for using Font Awesome icons rather than the bitmap icons, although this took some working out.  In order to get this to work another JavaScript file was required (jquery.fancytree.glyph.js) but I just couldn’t get this to work.  It kept bringing up JavaScript errors about a variable called ‘span’ not existing.  Eventually I commented out the offending line of code (relating to the ‘loading’ section) and after that everything worked perfectly.  With this new version I also added in the facility to actually view the words when you open a category, and also to switch to different parts of speech.  It’s all working very nicely, apart from subcategories belonging to other parts of speech.  I’m wondering whether I should include subcategories in the tree or whether they should just be viewable through the ‘category’ pane.  If they only appear in the tree then it is never going to be possible to view subcategories that aren’t nouns, whereas if they appear in a section as they do in the current category page then they will load whenever anyone selects that PoS.  It does mean we would lose a lot of content from the tree, though.  E.g. if you find ‘Beer’ all of those subcategories of beer would no longer be in the tree and you would no longer be able to flick through them all really quickly.  This is going to need some further thought.  But here’s a screenshot of the current work-in-progress version of the tree browser:

Week Beginning 15th May 2017

I was ill this week and was off sick on Wednesday and Thursday, so I didn’t manage to get much done with regard to the LDNA visualisations or the SCOSYA atlas.  I had to spend about half a day upgrading the 20 WordPress sites I manage to the most recent release and I spoke to Jane about the problems we’re having getting some space set aside for the website for her SPADE project.  I also replied to Alex Benchimol about a project he is putting together and completed the review of the paper that I had started last week.

On Tuesday I completed an initial version of the digital edition I have been working on for Bryony Randall’s New Modernist Editing project.  This version now includes all of the features I had intended to implement and I completed the TEI transcription for the whole manuscript.  Using the resource it is now possible to tailor the view as you would like to see it, ranging from a version of the text that closely represents the original typewritten page, including line breaks, handwritten notes and typos, through to a fully edited and ‘corrected’ version of the text complete with explanatory notes that closely resembles the text you’d find in an edited and printed edition.  I think it works rather well, although there are still some aspects that will need tweaking, such as adding in an introduction, possibly updating the way some features are labelled and maybe changing the way some of the document is tagged.

One new feature I added in this week that I’m rather pleased with is a reworking of the edition settings function.  Previously a user’s selected settings were stored in a JavaScript object behind the scenes; the site ‘remembered’ the settings as the user navigated from page to page, but if the user bookmarked a page, or copied the URL to send to someone or to use in a citation, the exact settings would not be included and the default view would instead be loaded.  I decided that this wasn’t the best way to go about things so instead updated the JavaScript so that settings are now incorporated into the page URL.  This does make the URL rather longer and messier, but it does mean that the exact view of the page can be passed between sessions, which I think is more important than cleaner URLs.
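The general idea of keeping settings in the URL can be sketched as follows.  This is not the edition’s actual code: the setting names (view, lineBreaks, notes), their defaults and both function names are hypothetical, and the sketch simply uses the standard URLSearchParams interface to turn a settings object into a query string and back again.

```javascript
// Hypothetical setting names and defaults, for illustration only.
const DEFAULTS = { view: "original", lineBreaks: true, notes: false };

// Turns a settings object into a query string such as 'view=edited&notes=true'.
function settingsToQuery(settings) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(settings)) {
    params.set(name, String(value));
  }
  return params.toString();
}

// Reads settings back out of a query string, falling back to the defaults
// for anything missing and ignoring any parameters it doesn't recognise.
function settingsFromQuery(query, defaults = DEFAULTS) {
  const params = new URLSearchParams(query);
  const settings = { ...defaults };
  for (const name of Object.keys(defaults)) {
    if (!params.has(name)) continue;
    const raw = params.get(name);
    // Booleans travel as the strings 'true' / 'false'
    settings[name] = typeof defaults[name] === "boolean" ? raw === "true" : raw;
  }
  return settings;
}
```

In the browser the query string would come from window.location.search when the page loads, and something like history.replaceState(null, "", "?" + settingsToQuery(settings)) could be used to update the address bar whenever a setting changes, so that a bookmarked or copied URL always reflects the current view.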

When I returned to work on Friday I decided to make a start on the public interface for the database of poems and songs for The People’s Voice project.  I spent the morning writing a specification document and thinking about how the search would work.  The project website is WordPress based and I did consider developing the search as a WordPress plugin, as I have done for other projects, such as the SciFiMedHums project (see http://scifimedhums.glasgow.ac.uk/the-database/).  However, I didn’t want the resource to be too tied into WordPress and instead wanted it to be useable (with minimal changes) independently of the WordPress system.  Having said that, I still wanted the resource to feel like a part of the main project website and to use the WordPress theme the rest of the site uses.  After a bit of investigation I found a way to create a PHP page that is not part of WordPress but ‘hooks’ into some WordPress functions in order to use the website’s theme.  Basically you add a ‘require’ statement that pulls in ‘wp-load.php’ and then you can call the functions that process the WordPress header, sidebar and footer (get_header(), get_sidebar(), get_footer()) wherever you want these to appear.  All the rest of your script can be as you want it.

I emailed my specification document to the project team and started to work on the search interface in the afternoon.  This is going to use jQuery UI components so I created a theme for this and set up the basic structure for the search form.  It’s not fully complete yet as I need to add in some ‘auto-complete’ functions and some content for drop-down lists, but the overall structure is there.  The project team wanted pretty much every field in the database to be searchable, which makes for a rather unwieldy and intimidating search form so I’m going to have to think of a way to make this more appealing.  I’ll try to continue with this next week, if I have the time.