Week Beginning 23rd April 2018

I worked on a number of different projects this week.  Jane Stuart-Smith has secured some funding for me to redevelop a couple of old websites for her (Seeing Speech and Dynamic Dialects) so I spent some time working on this.  My first task was to add Google Analytics to the old pages so we can more easily track current usage and how things might change when we eventually launch the new sites.  It was a bit of a pain to add the code in as the current sites consist of flat HTML files (as opposed to using one template file), so every page had to be updated separately.  But with that in place I could begin to think about developing the new interface.  I decided that it would be a good opportunity to try out the Bootstrap front-end library (https://getbootstrap.com/).  I’d been meaning to look into this for a while and I spent a bit of time experimenting with it.  I like the way it provides a very straightforward grid-based layout system for responsive design, and some of the components such as tabs and drop-down menus are excellent too.  I do find that the documentation is a little haphazard, though.  I found myself having to track down third-party tutorials and explanations, as a lot of what you can do with the library just isn’t very well explained in the documentation section.  The jQuery UI site, with its clear and extensive examples plus API documentation, seems a lot better to me.  However, I got my head round how Bootstrap works and managed to create a number of different mock-up interfaces for the new websites.  I can’t really share them here, or even show screenshots yet, but I created six mock-ups and sent the URLs off to Jane and Eleanor Lawson at QMU for feedback.  Once I hear back from them I will be able to take things further.

I also continued with the new timeline feature for the Historical Thesaurus.  Last week I pretty much completed a version of the timeline that was ready to integrate with the live site, and this week I set about integrating it with a test version of the live site.  In this version a ‘Timeline’ button appears next to the ‘Cite’ button for each category / subcategory on the category browse page.  Pressing this button opens up a jQuery UI modal popup containing the timeline, the sort options, the ‘save SVG’ option and the category heading / catnum and part of speech, as you can see in the following screenshot:

It took quite a bit of effort to implement this, as the timeline uses D3.js version 4 while the sparklines that are used elsewhere in the site (see http://historicalthesaurus.arts.gla.ac.uk/sparklines/) use version 3.  Rather than have two different versions I thought it would be better to upgrade the sparklines to version 4.  This took a couple of hours to sort out as the changes between versions are rather extensive, but thankfully I managed to get the sparklines working in the end.  Getting the timeline to work within a jQuery dialog box was also rather tricky.  For ages the timeline just kept giving error messages, until I worked out that the dialog box needs to be open and visible before the timeline loads, otherwise the timeline code falls over.  Previously all of my timeline code was contained in one PHP file, including database queries, working with the data in PHP, CSS styles, HTML and the actual processing of the final data outputted by PHP in JavaScript.  This all needed to be split up, with the PHP going into the API, the JavaScript getting added to the existing HT tree JavaScript file, the CSS into the existing stylesheet and the HTML into the existing category page.  However, it’s all in place now.  I still need to create mini inline timelines for the category page, and I’m hoping to get some time to look into this next week.

Also this week I met with Luca to discuss changes to Joanna Kopaczyk’s project, and I had to spend some time deleting lots of old emails as I received an actual legitimate “your mailbox is almost full” message.  I was also sent a bunch of tweaks and some new song materials for The People’s Voice website by Catriona MacDonald.  The project has now ended and these are the final updates.  You can view the website, the songs and the database of poems here: https://thepeoplesvoice.glasgow.ac.uk/.

My final project of the week was the REELS project, for which I added in the final bits of functionality for the front-end.  This included adding a ‘cite’ option to the record page and updating the API so that the formatting of the CSV for a record is more usable.  Rather than all of the data appearing on one very long row, the file now places each bit of data on its own row, with the item label in the first column and the corresponding data in the second.  Historical forms and sources now have separate rows for each of their bits of data rather than having everything joined in one field.
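The reshaping is simple enough to sketch.  Here’s a rough illustration of the idea (the function and field names are my own, not the project’s actual code):

```javascript
// Turn a flat record object into label/value rows, so each bit of data
// sits on its own CSV row instead of everything sharing one long row.
function recordToRows(record) {
  const rows = [];
  for (const [label, value] of Object.entries(record)) {
    if (label === 'historicalForms') continue; // handled separately below
    rows.push([label, String(value)]);
  }
  // Each historical form gets its own block of rows, one per field,
  // rather than everything joined together in a single cell.
  (record.historicalForms || []).forEach((form, i) => {
    for (const [label, value] of Object.entries(form)) {
      rows.push([`Historical form ${i + 1}: ${label}`, String(value)]);
    }
  });
  return rows;
}

// Quote each value and join the rows into CSV text.
function rowsToCsv(rows) {
  return rows
    .map(r => r.map(v => `"${v.replace(/"/g, '""')}"`).join(','))
    .join('\n');
}
```

Sources attached to a historical form could be flattened in exactly the same way.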

I’d also noticed that the ‘show on map’ link in the text list of results was sometimes failing to pan and zoom.  This was an intermittent fault that never produced any error messages, and although I spent some time trying to figure out what caused it I didn’t manage to find an answer.  For this reason I decided to replace the animated version of the feature, which zooms out from the current map location, pans to the new location and zooms back in.  Instead, when you click on the ‘show on map’ link the map now loads directly at the point.  It’s not as pleasing to look at as the animated version, but at least it works consistently.  I also updated the layout of the record page, splitting historical forms off from the rest of the record and placing them in their own tab, which I think works pretty well.  Other than some further formatting of pages, changing colour schemes and some other design tweaks, I think that’s the front-end for the project pretty much sorted now.

Week Beginning 16th April 2018

I continued to work on the public interface for the REELS project for a lot of this week, and by the end of the week I had managed to implement pretty much all of the features that I’d hoped to create.  The first thing I tackled was adding in different base map options.  I picked out a few other MapBox tilesets that I thought would be useful, including a map that shows relief features and contour lines, a satellite map with labels included (e.g. town and road names) and a ‘clean’ satellite map that doesn’t have any labels.  I also discovered that the NLS offers free access to OS maps from the 1920s and 30s for use as a base map, so I added that in as an option too.  By default the base map switcher gets added to the ‘legend’ box in the top right of my map, which for REELS already handles turning layers on and off.  This made the box rather long, so instead I wanted the feature to appear in the ‘Display Options’ menu that slides out from the left, where the marker categorisation options are already located.  I figured out how to make buttons to handle base map switching and it’s all working pretty well.

Previously I’d set up the map so that the tooltip labels for map markers automatically appeared at a certain zoom level, and then hid themselves again when the user zoomed out.  This worked ok, but sometimes I found the labels would get in the way when zoomed in, while at other times I wished the labels would appear when zoomed out further.  For these reasons I decided to add in an option to manually choose when labels appeared or disappeared.  I added this option as a couple of buttons to the ‘Display Options’ menu and I think it works nicely.  Here’s a screenshot of how things currently look:

During this week I encountered something that is possibly going to be an issue.  The map layers (apart from the NLS one) all use a service called mapbox.com.  This allows you to customise maps, publish them and use them in your own websites.  It’s a commercial enterprise and they offer a free tier, which allows up to 50,000 map views each month (see https://www.mapbox.com/pricing/).  I had thought this would be ample, but looking at the statistics I’ve managed to accrue almost 8,000 map views so far this month just while developing the site.  This is rather worrying as it suggests we’ll very quickly exceed the 50,000 limit.  If we do there’s a cost of $0.50 per 1,000 additional map views.  Although this doesn’t seem like a lot, we’ve no way of knowing how many users and map views we might end up getting, especially in the first few months after publication.  The project has no funds to pay for this and it’s not great for long-term sustainability either.  It means we’re probably going to have to look at alternative free map suppliers and drop Mapbox, which is a shame.  There is a list of free map suppliers here: https://leaflet-extras.github.io/leaflet-providers/preview/ and I guess we’ll have to pick a few from this.  We still might be able to include one Mapbox layer, but probably not four different ones as we currently have.
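The overage arithmetic is easy to sketch, using the free-tier figures quoted above:

```javascript
// Estimated monthly Mapbox cost: the first 50,000 map views are free,
// then $0.50 per 1,000 views beyond that (figures as of April 2018).
function mapboxMonthlyCost(views, freeTier = 50000, ratePerThousand = 0.5) {
  const overage = Math.max(0, views - freeTier);
  return (overage / 1000) * ratePerThousand;
}
```

So 100,000 views in a month would come to $25 – not a huge sum, but unbudgeted and open-ended, which is the real problem.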

Carole emailed me this week to say that the ‘browse historical forms by initial letter’ feature was bringing back some unexpected results.  This is because we have historical forms such as ‘6 mercatas terrarum ville de Mersintoun’, which therefore appeared under ‘6’.  What Carole wanted was for the initial letter to be taken from the italicised text, so in the previous example the name would appear under ‘M’.  I updated the database to add a new field for storing the initial letter of the italicised text, and updated the CMS so that this field is automatically populated when a historical form is added or edited.  The browse feature now works as Carole expected.  I had also noticed that the place-name element glossary wasn’t being ordered in a properly alphabetical manner: any elements that were only associated with historical forms and not current place-names were appearing below the list of current elements.  Also, where an element was associated with both current and historical forms it was appearing twice, rather than the historical forms being added to the existing entry.  Some tweaking of the API code sorted these issues out.
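The letter-extraction logic is straightforward, assuming the italicised name is stored as markup within the form text.  The function below is an illustrative sketch, not the actual CMS code:

```javascript
// Derive the browse letter for a historical form: use the first letter
// of the italicised name rather than the start of the whole string,
// so '6 mercatas terrarum ville de <i>Mersintoun</i>' files under 'M'.
// Assumes italics are stored as <i>...</i> markup in the form text.
function browseLetter(formText) {
  const match = formText.match(/<i>\s*(\S)/);
  // Fall back to the first character if nothing is italicised.
  const first = match ? match[1] : formText.trim().charAt(0);
  return first.toUpperCase();
}
```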

The next thing I tackled was allowing specific map views to be bookmarked, shared and cited.  It’s really annoying with online map resources when you can’t ‘save’ a particular map you’re interested in, because the URL in the address bar is just for the page with the map on it – saving the URL simply returns you to the default view of the map.  Thankfully there’s a way to solve this issue, using a neat little Leaflet plugin called Leaflet Hash (https://github.com/mlevans/leaflet-hash) that adds the latitude, longitude and zoom level to the page URL as part of the hash value (the part of the URL after the ‘#’ sign).  The plugin updates this value every time the user pans or zooms the map, and when a page loads with a value after the hash the plugin automatically grabs the passed data and positions the map accordingly.  It’s all very nicely done.  However, I needed to pass more than just the latitude, longitude and zoom level in the hash as I wanted other things to be ‘remembered’.  The page has a text list of results in a separate tab so I needed to keep track of whether the user was looking at the map or the text list.  I also wanted the user’s choice of base map, marker classification type and whether the labels were on or off to be tracked as well.  In order to add these things to the hash, and to ensure the plugin didn’t remove them, I needed to update the plugin’s code, which took a bit of time to do.  I also had to write some new sections that handled setting the map up with the right view, based on the contents of the hash.  
But with the code in place, when a hash like ‘#14/55.8405/-2.1199/resultsTabs-0/altitude/tileRelief/labelsOn’ is passed to my code the plugin can tell that the map needs to zoom to level 14 and centre on latitude / longitude 55.8405, -2.1199, and it can also tell that the map (rather than the list of results) should be visible, the markers should be classified by their altitude, the relief base map should be selected and the marker labels should be visible.
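To give a flavour of how the extended hash works, here’s a rough sketch of the parsing and rebuilding logic, using my own field names – the actual plugin code is structured rather differently:

```javascript
// Parse an extended Leaflet Hash value like
// '#14/55.8405/-2.1199/resultsTabs-0/altitude/tileRelief/labelsOn'
// into its component map-state parts.
function parseMapHash(hash) {
  const [zoom, lat, lng, tab, classify, baseMap, labels] =
    hash.replace(/^#/, '').split('/');
  return {
    zoom: parseInt(zoom, 10),
    lat: parseFloat(lat),
    lng: parseFloat(lng),
    tab,            // e.g. 'resultsTabs-0' = the map tab is visible
    classify,       // marker classification, e.g. 'altitude'
    baseMap,        // e.g. 'tileRelief'
    labelsOn: labels === 'labelsOn'
  };
}

// Rebuild the hash, so every pan, zoom or option change can update
// the URL and the view remains bookmarkable.
function buildMapHash(s) {
  return '#' + [s.zoom, s.lat, s.lng, s.tab, s.classify, s.baseMap,
    s.labelsOn ? 'labelsOn' : 'labelsOff'].join('/');
}
```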

With this in place I could then create a ‘cite this view’ option, which works in the same way as the ‘cite’ option I created for the Historical Thesaurus: the user presses a button and a pop-up opens listing the citation in three styles (APA, MLA and Chicago), plus a citation for the whole resource.  The links included will return the user (or anyone else) to the specific view that the user had on screen when they pressed the ‘cite’ button, with information in the cite text about what it is that the view contains, e.g. ‘Map of Berwickshire place-names: Quick search for ‘h_ll’, classified by start date, base map: ‘relief’, labels visible. 2018. In Recovering the Earliest English Language in Scotland: evidence from place-names. Glasgow: University of Glasgow. Retrieved 23 April 2018, from <URL Here>’.
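The descriptive part of the cite text can be assembled from the same map state that the hash tracks.  A hypothetical sketch (not the actual site code):

```javascript
// Build the descriptive opening of a citation from the current map
// state, e.g. "Map of Berwickshire place-names: Quick search for
// 'h_ll', classified by start date, base map: 'relief', labels visible."
function citeDescription(s) {
  const details = [];
  if (s.search) details.push(`Quick search for '${s.search}'`);
  details.push(`classified by ${s.classify}`);
  details.push(`base map: '${s.baseMap}'`);
  details.push(s.labelsOn ? 'labels visible' : 'labels hidden');
  return `Map of Berwickshire place-names: ${details.join(', ')}.`;
}
```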

Other than REELS work I did some other smaller bits of work.  A new version of WordPress was released this week, so I upgraded all of the WordPress sites I manage to this version.  Another bunch of University-hosted sites was migrated from HTTP to HTTPS this week, so I spent some time testing out the sites I had created that had been affected.  I needed to tweak a couple, but it didn’t take long to sort things.  I also received a request to update one of the Scottish Literature bibliography pages I’d created in T4 a couple of years ago so did that, and I spent a bit of time on App account management duties, removing the details of a user who is retiring and creating a user account for some external developers that are making a ‘university guide’ app.

I also spent about a day working some more on the timelines for the Historical Thesaurus.  I think I’m just about ready to add the feature to the site (well, as a test version – not actually making it live until it’s been properly tested and approved).  In this version the colours have been replaced by three shades of green taken from the site banner, as Marc pointed out that people might expect the different colours to mean something rather than them being arbitrary.  We’ll add colours back in once we have something to categorise, such as language of origin.  I also added in a facility to sort the timeline by length of use in addition to date and alphabetically, and changing the sort option no longer breaks the tooltips.  Also, changing back to sorting by date now orders the data as it was originally ordered rather than being slightly different (where two categories have the same start date the end date is now taken into consideration).  Finally, there is an option to save the timeline as an SVG image for use elsewhere (e.g. presentations).
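The three sort options boil down to three comparators.  Here’s a sketch, assuming each timeline item has ‘label’, ‘start’ and ‘end’ fields (my own names, not the actual data structure):

```javascript
// Sort by start date, with end date breaking ties so the ordering
// is stable and matches the original date-sorted view.
const byDate = (a, b) => a.start - b.start || a.end - b.end;

// Sort by length of use, longest-used categories first.
const byLengthOfUse = (a, b) => (b.end - b.start) - (a.end - a.start);

// Sort alphabetically by category label.
const alphabetical = (a, b) => a.label.localeCompare(b.label);
```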

I had hoped to include a nice, swish animation when reordering the timeline, with the rows moving about the place but I’ve given up on this idea for now.  This is because the timeline library I’m using (which I’ve already had to adapt quite a bit) doesn’t group the labels and the data together as one entity so it’s not possible to grab an entire row and manipulate it.  I’d have to make some big changes to the library to add this structure and it’s more trouble than it’s worth, really.

Here’s an example of how the timeline looks for the category ‘The World’, ordered by length of use, with a tooltip visible:

My next step will be to create a test version of the HT ‘category’ page with a new button next to the ‘cite’ buttons for opening the timeline, which will probably be some kind of modal dialog box.  I’ll also need to investigate the mini-timelines Marc wanted to appear beside each word.



Week Beginning 9th April 2018

I returned to work after my Easter holiday on Tuesday this week, making it another four-day week for me.  On Tuesday I spent some time going through my emails and dealing with some issues that had arisen whilst I’d been away.  This included sorting out why plain text versions of the texts in the Corpus of Modern Scottish Writing were giving 403 errors (it turned out the server was set up to not allow plain text files to be accessed and an email to Chris got this sorted).  I also spent some time going through the Mapping Metaphor data for Wendy.  She wanted me to structure the data to allow her to easily see which metaphors continued from Old English times and I wrote a script that gave a nice colour-coded output to show those that continued or didn’t.  I also created another script that lists the number (and the details of) metaphors that begin in each 50-year period across the full range.  In addition, I spoke to Gavin Miller about an estimate of my time for a potential follow-on project he’s putting together.
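The bucketing behind the second of those Mapping Metaphor scripts is simple enough to sketch (the field name here is my own, not the actual data structure):

```javascript
// Count records per 50-year period, keyed by a readable period label.
// Each metaphor record is assumed to have a numeric 'startYear' field.
function countByPeriod(records, periodLength = 50) {
  const counts = {};
  for (const r of records) {
    const periodStart = Math.floor(r.startYear / periodLength) * periodLength;
    const key = `${periodStart}-${periodStart + periodLength - 1}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```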

The rest of my week was split between two projects: Linguistic DNA and REELS.  For Linguistic DNA I continued to work on the search facilities for the semantically tagged EEBO dataset.  Chris gave me a test server on Tuesday (just an old desktop PC to add to the several others I now have in my office) and I managed to get the database and the scripts I’d started working on before Easter transferred onto it.  With everything set up I continued to add new features to the search facility.  I completed the second search option (‘Choose a Thematic Heading and a specific book to view the most frequent words’), which allows you to specify a Thematic Heading, a book, a maximum number of returned words and whether the theme selection includes lower levels.  I also made it so that you can omit the thematic heading selection to bring back all of the words in the specified book listed by frequency.  If you do this each word’s thematic heading is also listed in the output, which is a useful way of figuring out which thematic headings you might want to focus on.

I also added a new option to both searches 1 and 2 that allows you to amalgamate the different noun and verb types.  There are several different types (e.g. NN1 and NN2 for singular and plural forms of nouns) and it’s useful to join these together into a single frequency count rather than having them listed separately.
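A rough sketch of the amalgamation, assuming CLAWS-style tags where the first two characters give the base category (the function and field names are my own, not the actual query code):

```javascript
// Merge fine-grained part-of-speech tags into a single base tag and
// sum the frequencies, so e.g. NN1 and NN2 counts for a word combine
// into one NN count.  Rows are assumed to be { word, pos, freq }.
function amalgamatePos(rows) {
  const totals = {};
  for (const { word, pos, freq } of rows) {
    const base = pos.slice(0, 2); // NN1 -> NN, VVZ -> VV, etc.
    const key = `${word}|${base}`;
    totals[key] = (totals[key] || 0) + freq;
  }
  return totals;
}
```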

I also completed search option 3 (Choose a specific book to view the most frequent Thematic Headings).  This allows the user to select a book from an autocomplete list and optionally provide a limit to the returned headings.  The results display the thematic headings found in the book listed in order of frequency.  The returned headings are displayed as links that perform a ‘search 2’ for the heading in the book, allowing you to more easily ‘drill down’ into the data.  For all results I have added in a count column, so you can easily see how many results are returned or reference a specific result, and I also added titles to the search results pages that tell you exactly what it is you’ve searched for.  I also created a list of all thematic headings, as I thought it might be handy to be able to see what’s what.  When looking at this list you can perform a ‘search 1’ for any of the headings by clicking on one, and similarly, I created an option to list all of the books that form the dataset.  This list displays each book’s ID, author, title, terms and number of pages, and you can perform a ‘search 3’ for a book by clicking on its ID.

On Friday I participated in the Linguistic DNA project conference call, following which I wrote a document describing the EEBO search facilities, as project members outside of Glasgow can’t currently access the site I’ve put together.

For REELS I continued to work on the public interface for the place-name data, which included the following:

  1. The number of returned place-names is now displayed in the ‘you searched for…’ box
  2. The textual list of results now features two buttons for each result, one to view the record and one to view the place-name on the map.  I’m hoping the latter might be quite useful as I often find an interesting name in the textual list and wonder which dot on the map it actually corresponds to.  Now with one click I can find it.
  3. Place-name labels on the map now appear when you zoom in past a certain level (currently set to zoom level 12).  Note that only result markers (rather than the grey spots) get visible labels, as otherwise there’s too much clutter and the map takes ages to load.
  4. The record page now features a map with the place-name at the centre, and all other place-names as grey dots.  The marker label is automatically visible.
  5. Returning back to the search results from a record when you’ve done a quick search now works – previously this was broken.
  6. The map zoom controls have been moved to the bottom right, and underneath them is a new icon for making the map ‘full screen’.  Pressing on this will make the map take up the whole of your screen.  Press ‘Esc’ or on the icon again to return to the regular view.  Note that this feature requires a modern web browser, although I’ve just tested it in IE on Windows 10 and it works.  Using full screen mode makes working with the map much more pleasant.  Note, however, that navigating away from the map (e.g. if you click a ‘view record’ button) will return you to the regular view.
  7. There is a new ‘menu’ icon in the top-left of the map.  Press on this and a menu slides out from the left.  This presents you with options to change how the results are categorised on the map.  In addition to the ‘by classification code’ option that has always been there, you can now categorise and colour code the markers by start date, altitude and element language.  As with code, you can turn particular levels on and off using the legend in the top right – e.g. if you only want to display markers that have an altitude of 300m or more.



Week Beginning 26th March 2018

As Friday this week was Good Friday this was a four-day week for me.  I’ll be on holiday all next week too, so I won’t be posting for a while.  I focussed on two projects this week: REELS and Linguistic DNA.  For REELS I continued to implement features for the front-end of the website, as I had defined in the specification document I wrote a few months ago.  I spent about a day working on the Element Glossary feature.  First of all I had to update the API in order to add in the queries required to bring back the place-name element data in a format that the glossary required.  This included not just bringing back information about the elements (e.g. language, part of speech) but also adding in queries that brought back the number of available current place-names and historical forms that the element appears in.  This was slightly tricky, but I managed to get the queries working in the end, and my API now spits out some nicely formatted JSON data for the elements that the front-end can use.  With this in place I could create the front-end functionality.  The element glossary functions as described in my specification document, displaying all available information about the element, including the number of place-names and historical forms it has been associated with.  There’s an option to limit the list of elements by language and clicking on an entry in the glossary performs a search for the item, leading through to the map / textual list of place-names.  I also embedded IDs in the list entries that allow the list to be loaded at a specific element, which will be useful for other parts of the site, such as the full place-name record.

The full place-name record page was the other major feature I implemented this week, and is really the final big piece of the front-end that needed to be implemented (but having said that, there are still many other smaller pieces to tackle).  First of all I updated the API to add in an endpoint that allows you to pass a place-name ID and to return all of the data about the place-name as JSON or CSV data (I still need to update the CSV output to make it a bit more usable, though – currently all data is presented on one long row, with headings in the row above, and having this vertically rather than horizontally arranged would make more sense).  With the API endpoint in place I then created the page to display all of this data.  This included adding in links to allow users to download the data as CSV or JSON, making searchable parts of the data links that lead through to the search results (e.g. parish, classification codes), adding in the place-name elements and links through to the glossary, and adding in all of the historical forms, together with their sources and elements.  It’s coming along pretty well, but I still need to work a bit more on the layout (e.g. maybe moving the historical forms to another tab and adding in a map showing the location of the place-name).

For Linguistic DNA I continued to work on the EEBO thematic heading frequency data.  Chris is going to set me up with access to a temporary server for my database and queries for this, but didn’t manage to make it available this week, so I continued to work on my own desktop PC.  I added in the thematic heading metadata to make the outputted spreadsheets easier to understand: instead of just displaying a thematic heading code such as ‘AA’, the spreadsheet can now include the full heading name too, such as ‘The World’.  I also noticed that we have some duplicate heading codes in the system, which was causing problems when I tried to use the concatenated codes as a primary key.  I notified Fraser about this and we’ll have to fix it later.  I also integrated all of the TCP metadata, and then stripped out all of the records for books that are not in our dataset, leaving about 25,000 book records.  With this in place I will be able to join the records up to the frequency data in order to limit the queries, e.g. based on the year the books were published, or limiting to specific book titles.

I then created a search facility that lets a user query the full 103 million row dataset in order to bring back frequency data for specific thematic headings, years or books.  I created the search form (as you can see below), with certain fields such as thematic heading and book title being ‘autocomplete’ fields, bringing up a list of matching items as you type.  You can also choose whether to focus on a specific thematic heading, or to include all lower levels in the hierarchy as well as the one you enter, so for example ‘AA’ will also bring back the data for AA:01, AA:02 etc.
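The ‘include lower levels’ option boils down to a prefix match on the heading code.  A minimal sketch (the function name is my own; the real check happens in the SQL):

```javascript
// Does a word's thematic heading fall under the selected heading?
// With includeLower set, 'AA' also matches 'AA:01', 'AA:02' and so on,
// but not sibling codes like 'AB'.
function headingMatches(selected, heading, includeLower) {
  if (!includeLower) return heading === selected;
  return heading === selected || heading.startsWith(selected + ':');
}
```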

With the form in place I set to work on the queries that will run when the form is submitted.  At this stage I still wasn’t sure whether it would be feasible to run the queries in a browser or if they might take hours to execute.  By the end of the week I had completed the first query option, and thankfully the query only took a few seconds to execute so it will be possible to make the query interface available to researchers to use themselves via their browser.  It’s now possible to do things like find the top 20 most common words within a specific thematic heading for one decade and then compare these results with the output for another decade, which I think will be hugely useful.  I still need to implement the other two search types as shown in the above screenshot, and get all of this working on a server rather than my own desktop PC, but it’s all looking rather promising.