Week Beginning 8th November 2021

I spent a bit of time this week working for the DSL.  I needed to act as the go-between for the DSL’s new IT people, who are updating their email system, and the University’s IT people, who manage the DNS record on behalf of the DSL.  It took a few attempts before the required changes were successfully in place.  I also read through a document that had been prepared about automatically ‘fixing’ the DSL’s dates to make them machine readable, and gave some feedback on the many different procedures that will need to be performed on the various date forms to produce the desired structure.

I also looked into an issue with cross references within citations that work in the live site but are not functioning in the new site or in the DSL’s editing system.  After some investigation it seems like it’s another case of the original API ‘fixing’ the XML in some way each time it’s processed in order for these links to work.  The XML for ‘put_v’ stored in the original API is as follows:

<cit><cref><date>1591</date> <title>Edinb. B. Rec.</title> V 41 (see <ref>Putting</ref> <i>vbl. n.</i> 1 (1)).</cref></cit>

There is a <ref> tag but no other information in this tag.  This is the same in the XML exported from DPS and used in the new DSL site (which includes an additional bibliographical reference):

<cit><cref refid="bib013153"><date>1591</date> <title>Edinb. B. Rec.</title> V 41 (see <ref>Putting</ref> <i>vbl. n.</i> 1 (1)).</cref></cit>

The XSLT for both the live and new sites doesn’t include anything to process a <ref> that has no attributes, so neither site should be displaying a link through to ‘putting’.  But of course the live site does.  I had previously generated and stored the XML that the original API (which I did not develop) outputs whenever the live site asks for an entry.  When looking at this I found the following:

<cit><cref ref="db674"><date>1591</date> <title>Edinb. B. Rec.</title> V 41 (see <ref action="link" href="dost/putting">Putting</ref> <i>vbl. n.</i> 1 (1)).</cref></cit>

You can see that the original API is injecting both a bibliographical cross-reference and the ‘putting’ reference.  We had previously identified and sorted the former, but unfortunately not the latter, although references that are not in citations do seem to have been fixed.  I updated the XSLT on the new DSL site to process the <ref> so the link now works, however this is not an approach that can be relied upon, as all the XSLT is currently doing is taking the contents of the tag (‘Putting’) and making a link out of it.  If the ‘slug’ of the entry doesn’t match the display form then the link is not going to work.  The original API includes a table containing cross references, but this doesn’t differentiate ones in citations from regular ones, and as the ‘putting_v’ entry contains 83 references it’s not going to be easy to pick out the ones that still need to be added.  This will need further discussion with the editors.

Continuing on a dictionary theme, I also did some further work for the Anglo-Norman Dictionary.  Last week I processed entries where a varlist date needed to be used as the citation date, but we noticed that the earliest date for entries hadn’t been updated in many cases where it should have been.  This week I figured out what went wrong.  My script only updated the entry’s date if the new date from the varlist was earlier than the existing earliest date for the entry.  This is not what we want, as in the majority of cases the varlist date will be later and should replace the earlier, erroneous date.  Thankfully it was easy to pick out all of the entries that have a ‘usevardate’ and I then reran a corrected version of the script that checks and replaces an entry’s earliest date.
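To give a rough idea of the corrected logic, here’s a minimal sketch.  The table and column names (‘entries’, ‘citations’, ‘earliest_date’, ‘citation_date’, ‘usevardate’) and the connection details are illustrative assumptions rather than the real AND schema: for each entry flagged with a ‘usevardate’, the earliest date is simply recalculated from the stored citation dates and written back unconditionally, rather than only when the new date happens to be earlier.

$pdo = new PDO('mysql:host=localhost;dbname=and_dictionary', 'user', 'password');
// Entries that have at least one citation using a varlist date
$entries = $pdo->query("SELECT DISTINCT entry_id FROM citations WHERE usevardate = 1");
foreach ($entries as $row) {
    // Recalculate the earliest date from the (already corrected) citation dates
    $stmt = $pdo->prepare("SELECT MIN(citation_date) FROM citations WHERE entry_id = ?");
    $stmt->execute([$row['entry_id']]);
    $earliest = $stmt->fetchColumn();
    // Replace the stored earliest date unconditionally - the original script only
    // replaced it when the new date was earlier, which is what caused the missed entries
    $update = $pdo->prepare("UPDATE entries SET earliest_date = ? WHERE id = ?");
    $update->execute([$earliest, $row['entry_id']]);
}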

The editor spotted a couple of entries that still hadn’t been updated after this process and I then had to investigate them.  One of them had an error in the edited markup that was preventing the update from being applied.  For the other I realised that my code to update the XML wasn’t looking at all senses, just the first in each entry.  My script was attempting to loop through all senses as follows:

foreach($xml->main_entry->sense->attestation as $a){
    //process here
}

Unfortunately this only loops through the attestations in the first sense.  What I needed instead was:

foreach($xml->main_entry->sense as $s){
    foreach($s->attestation as $a){
        //process here
    }
}

As the sense that needed updating for ‘aspreté’ was the last one in the entry, the XML wasn’t getting changed.  This meant ‘usevardate’ wasn’t present in the XML, and therefore my update to regenerate the earliest dates didn’t catch this entry (despite all of its citation dates having been successfully updated in the database).  I then fixed my script and regenerated all the data again, including fixing the data so that the entries with XML errors were updated.  I then ran a further spreadsheet of entries that needed updating through the fixed script, resulting in a further 257 citations having their dates updated.

Finally, I updated the Dictionary Management System so that ‘usevardate’ dates are taken into consideration when processing and publishing uploaded XML files.  If a ‘usevardate’ is found then this date is used for the attestation, which automatically affects the earliest date that is generated for the entry and also the dates used for attestations for search purposes.  I tried this out by downloading the XML for ‘admirable’, which features a ‘usevardate’.  I then edited the XML to remove the ‘usevardate’ before uploading and publishing this version.  As expected the dates for the attestation and the entry’s earliest date were affected by this change.  I then edited the XML to reinstate the ‘usevardate’ and uploaded and published this version, which took into consideration the ‘usevardate’ when generating the entry’s earliest date and attestation dates and returned the entry to the way it was before the test.
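As a rough illustration of the date-selection rule, here’s a hedged sketch.  The element and attribute names (<attestation>, <varlist>, <variant>, ‘usevardate’, ‘date’) are assumptions made for the purposes of this sketch and won’t necessarily match the exact AND markup:

$xml = simplexml_load_file('admirable.xml');
foreach ($xml->main_entry->sense as $s) {
    foreach ($s->attestation as $a) {
        // Default to the attestation's own citation date
        $date = (string)$a->date;
        // If a variant in the varlist is flagged with 'usevardate', use its date instead
        if (isset($a->varlist)) {
            foreach ($a->varlist->variant as $v) {
                if (isset($v['usevardate'])) {
                    $date = (string)$v['date'];
                }
            }
        }
        // $date then feeds both the attestation dates used for searching and
        // the entry's earliest-date calculation
    }
}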

Also this week I set up a WordPress site that will be used for the archive of the International Journal of Scottish Theatre and Screen and migrated one of the issues to WordPress, which required me to do the following:

  1. Open the file in a PDF viewer for reference (e.g. Adobe Acrobat)
  2. Open the file in MS Word, which converts it into an editable format
  3. Create a WordPress page for the article with the article’s title as the page title and setting the page ‘parent’ as Volume 1
  4. Copy and paste the article contents from Word into WordPress
  5. Go through the article in WordPress, referencing the file in Acrobat, and manually fix any issues that I spotted (e.g. fixing the display of headings and removing line breaks that were erroneously added). Footnotes proved to be particularly tricky as their layout was not handled very well by Word.  It’s possible that some footnotes are not quite right, especially in the ‘Trainspotting’ article, which has more than 70(!) footnotes.
  6. Publish the WordPress page and update the ‘Volume 1’ page to add a link to it.

None of this was particularly difficult to do, but it was somewhat time-consuming.  There are a further 18 issues left to do (as far as I can tell), although some of these will take longer as they contain more articles, and some of these are more structurally complicated (e.g. including images).  Gerry Carruthers is getting a couple of students to do the rest and we have a meeting scheduled next week where I’ll talk through the process.

I also made some further tweaks to the WordPress site for the ‘Our Heritage, Our Stories’ project and dealt with renewing the domain for TheGlasgowStory.com, which is now safe for a further nine years.  I also generated an Excel spreadsheet of the full lexical dataset from Mapping Metaphor for Wendy Anderson after she had a request for the data from some researchers in Germany.

I spent the rest of the week working for the Speak For Yersel project, continuing to generate mockups of the interactive exercises.  I completed an initial version of the overall structure for both the accessibility and word choice question types for the grammar exercise, so it will be possible to just ‘plug in’ any number of other questions that fit these templates.  What I haven’t done yet is incorporate the maps, the post-questionnaire ‘explore’ or the final quiz, as these need more content.  Here’s how things currently look:

I used another different font for the heading (Slackey), with the same one used for the ‘Question x of y’ text too.  I also used CSS gradients quite a bit in this version, as the team seemed quite keen on these.  There’s a subtle diagonal gradient in the header and footer backgrounds, and a more obvious top-to-bottom one in the answer buttons.  I used different combinations of colours too.  I created a progress bar, which works, but with only two questions in the system it’s not especially obvious what it does.  Rather than having people click an answer and then click a ‘next’ button to continue, I’ve made it so that clicking an answer automatically loads the next step: a panel containing a ‘map’ (just a static image for now), together with a ‘next’ button if there is a next question.  Clicking the ‘next’ button slides up the map panel, loads in the next question and advances the progress bar.  Users will be accessing this on many different screen sizes; I’ve tested it out on my Android phone and my iPad in both portrait and landscape orientations and all seems to work well.  However, the map panel will be displayed below rather than beside the questions on narrower screens.

I then began experimenting with randomly positioned markers in polygonal areas.  Initially I wanted to see whether this would be possible in ArcGIS, and a bit of Googling suggested it would be; see for example this post: http://gis.mtu.edu/?p=127.  It’s ten years old, so the instructions don’t in any way match up to how things work in the current version of ArcGIS, but it at least showed it should be possible.  I loaded the desktop version of ArcGIS up via Glasgow Anywhere and, after some experimentation and a fair bit of exasperation, I managed to create a polygon shape and add 100 randomly placed marker points to it, which you can see here:

Something we will have to bear in mind is how such points will look when zoomed:

This is just 100 points over a pretty large geographical area.  We might end up with thousands of points, which might make this approach unusable.  Another issue is that it took ArcGIS more than a minute to generate and process these 100 random points.  I don’t know how much of this is down to running the software via Glasgow Anywhere, but if we’re dealing with tens of polygons and hundreds or thousands of data points this is just not going to be feasible.

An issue of greater concern is that, as far as I can tell (after more than an hour of investigation), the ‘create random points’ option is not available via ArcGIS Online, which is the tool we would need to use to generate maps to share online (if we choose to use ArcGIS).  The online version seems to be really pared back in terms of functionality compared to the desktop version and I just couldn’t see any way of incorporating the random points system.  However, I discovered a way of generating random points using Leaflet and another JavaScript-based geospatial library called turf.js (http://turfjs.org/).  The information about how to go about it is here:  https://gis.stackexchange.com/questions/163044/mapbox-how-to-generate-a-random-coordinate-inside-a-polygon

I created a test map using the SCOSYA area for Campbeltown and the SCOSYA base map.  As a solution I’d say it’s working pretty well – it’s very fast and seems to do what we want it to.  You can view an example of the script output here:

The script generates 100 randomly placed markers each time you load the page.  At zoomed-out levels the markers are too big, but I can make them smaller – this is just an initial test.  There is unfortunately going to be some clustering of markers as well, due to the nature of the random number generator, which may give people the wrong impression.  I could maybe update the code to reject markers that are too close to an existing one, but I’d need to look into that.  I’d say it’s looking promising, anyway!

Week Beginning 7th December 2020

I spent most of the week working on the Anglo-Norman Dictionary as we’re planning on launching this next week and there was still much to be done before then.  One of the big outstanding tasks was to reorder all of the citations in all senses within all entries so they are listed by their date.  This was a pretty complex task as each entry may contain any number of up to four different types of sense:  main senses, subsenses and then main senses and subsenses within locutions.  My script needed to be able to extract the dates for each citation within each of these blocks, figure out their date order, rearrange the citations by this order and then overwrite the XML section with the reordered data.  Any loss of or mangling of the data would be disastrous, and with almost 60,000 entries being updated it would not be possible to manually check that everything worked in all circumstances.

Updating the XML proved to be a little tricky as I had been manipulating the data with PHP’s simplexml functions, which don’t include a facility to replace a child node.  This meant that I couldn’t tell the script to identify a sense and replace its citations with a new block.  In addition, the XML was not structured to include a ‘citations’ element that contained all of the individual citations for an entry, but instead just listed each citation as an ‘attestation’ element within the sense, therefore it wasn’t straightforwardly possible to replace the block of citations with an updated block.  Instead I needed to reconstruct the sense XML in its entirety, including both the complete set of citations and all other elements and attributes contained within the sense, such as IDs, categories and labels.  With a completely new version of the sense XML stored in memory by the script, I then needed to write this to the XML, and for this I needed to use PHP’s DOM manipulation functions because (as mentioned earlier) simplexml has no means of identifying and replacing a child node.

I managed to get a version of my script working and all seemed to be well with the entries I was using for test purposes, so I ran the script on the full dataset and replaced the data on the website (ensuring that I kept a record of the pre-reordered data handy in case of any problems).  When the editors reviewed the data they noticed that while the reordering had worked successfully for some senses, it had not reordered others.  This was a bit strange and I therefore had to return to my script to figure out what had gone wrong.  I noticed that only the citations in the first sense / subsense / locution sense / locution subsense had been reordered, with others being skipped.  But when I commented out the part of the script that updated the XML, all senses were successfully being picked out.  This seemed strange to me as I didn’t see why the act of identifying senses should be affected by the writing of data.  After some investigation I discovered that with PHP’s simplexml implementation, if you iterate through nodes using a ‘foreach’ and then update the item picked out by the loop (so, for example, in ‘foreach($sense as $s)’ updating $s) then subsequent iterations fail.  It would appear that updating $s in this example changes the XML string that’s loaded into memory, which then means the loop reckons it’s reached the end of the matching elements and stops.  My script had different loops for going through senses / subsenses / locution senses / locution subsenses, which is why the first of each type was being updated while others weren’t.  After I figured this out I updated my script to use a ‘for’ loop instead of a ‘foreach’, storing $s within the scope of the loop only, and this worked.  With the change in place I reran the script on the full dataset and uploaded it to the website, and thankfully all appears to have worked.
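In case it’s useful to anyone else who runs into this, here’s a stripped-down illustration of the behaviour (heavily simplified from the real script, which rebuilds the whole sense element rather than just writing to it; $entryXml is assumed to hold the entry’s XML as a string):

$xml = simplexml_load_string($entryXml);

// Problematic version: writing back to $s changes the underlying document,
// and the loop stops after the first sense
foreach ($xml->sense as $s) {
    // ...replace the citations in $s here... (subsequent senses are never reached)
}

// Working version: iterate by index with a 'for' loop and only reference the
// sense inside the body of the loop
$total = count($xml->sense);
for ($i = 0; $i < $total; $i++) {
    $s = $xml->sense[$i];
    // ...replace the citations in $s here... (all senses are now processed)
}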

For the rest of the week I worked through my ‘to do’ list, ticking items off. I updated the ‘Blog’ menu item to point to the existing blog site (this will eventually be migrated across).  The ‘Textbase’ menu item now loads a page stating that this feature will be added in 2021.  I managed to implement the ‘source texts’ page, as it turns out that I’d already developed much of the underpinnings for this page whilst developing other features.  As with citation popups, it links into the advanced search and also to the DEAF website.  I figured out how to ensure that words with accented characters in citation searches now appear separately in the list from their non-accented versions, e.g. a search for ‘apres*’ now has ‘apres (28)’ separate from ‘après (4)’ and ‘aprés (2229)’.  We may need to think about the ordering, though, as accented characters currently appear at the end of the list.  I also made the words lower case here – they were previously being transformed into upper case.  Exact searches (surrounded by quotes) are still accent-sensitive.  This is required so that the link through the list of forms to the search results works (otherwise the results would display all accented and non-accented forms).  I also ensured that word highlighting in snippets in results now works as it should with accented characters, and upper case initial letters are now retained too.

I added in an option to return to the list of forms (i.e. the intermediate page) from the search results.  In addition to ‘Refine your search’ there is also a ‘Select another form’ button, and I ensured that the search results page still appears when there is only one result for citation and translation searches.  I also figured out why multiple words were sometimes being returned in the citation and translation searches.  This was because what looked like spaces between words in the XML were sometimes not regular spaces but non-breaking space characters (\u00a0).  As my script split up citations and translations on spaces, these were not being picked up as divisions between words.  I needed to update my script to deal with these characters and then regenerate all of the citation and translation data in order to fix this.
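The fix itself is tiny – a hedged sketch, assuming the citation or translation text is already held in a string:

// Split on ordinary whitespace and on non-breaking spaces (U+00A0), which \s alone does not match
$words = preg_split('/[\s\x{00A0}]+/u', $citationText, -1, PREG_SPLIT_NO_EMPTY);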

I also ensured that when conducting a label search the matching labels in an entry page are now highlighted and the page automatically scrolls down to the first matching label.  I also made several tweaks to the XSLT, ensuring that where there are no dates for citations the text ‘TBD’ appears instead and ensuring a number of tags that were not getting properly transformed were handled.

Also this week I made some final changes to the interactive map of Burns Suppers, including tweaking the site icon so it looks a bit nicer, adding a ‘read more’ button to the intro text, fixing the scrolling issue on small screens and updating the text to show 17 filters.  I also fixed the issue with the attendance filter and updated the layout of the filters so they look better on both monitors and mobile devices.

My other main task of the week was to restructure the Mapping Metaphor website based on suggestions for REF from Wendy and Carole.  This required a lot of work as the visualisations needed to be moved to different URLs and the Old English map, which was previously a separate site in a subdirectory, needed to be amalgamated with the main site.

I removed the top-level tabs that linked between MM, MMOE and MetaphorIC and also the ‘quick search’ box.  The ‘metaphor of the day’ page now displays both a main and an OE connection and the ‘Metaphor Map of English’ / ‘Metaphor Map of Old English’ text in the header has been removed.  I reworked the navigation bar in order to allow a sub-navigation bar to appear.  It is now positioned within the header and is centre-aligned.  ‘Home’ now features introductory text rather than the visualisation.  ‘About the project’ now has the new secondary menu rather than the old left-panel menu.  This is because the map pages couldn’t have menu links in the left-hand panel, as it’s already used for something else, and it’s better to have the sub-menu displaying consistently across different sections of the site.  I updated the text within several ‘About’ pages and ‘How to Use’, which also now has the new secondary menu.  The main metaphor map is now in the ‘Metaphor Map of English’ menu item, which has sub-menu items for ‘search’ and ‘browse’.  The OE metaphor map is now in the ‘Metaphor Map of Old English’ menu item, which also has sub-menu items for ‘search’ and ‘browse’.  The OE pages retain their purple colour to make a clear distinction between the OE map and the main one.  MetaphorIC retains the top-level navigation bar but now only features one link back to the main MM site.  This is right-aligned to avoid getting in the way of the ‘Home’ icon that appears in the top left of sub-pages.  The new site replaced the old one on Friday and I also ensured that all of the old URLs continue to work (e.g. the ‘cite this’ links will still resolve correctly).

Week Beginning 18th March 2019

This week I spent a lot of time continuing with the HT/OED linking task, tackling the outstanding items on my ‘to do’ list before I met with Marc and Fraser on Friday.  This included the following:

Re-running category pattern matching scripts on the new OED categories:  The bulk of the category matching scripts rely on matching the HT’s oedmaincat field against the OED’s path field (and then doing other things like comparing category contents).  However, these scripts aren’t really very helpful with the new OED category table as the path has changed for a lot of the categories.  The script that seemed the most promising was number 17 in our workflow document, which compares the first dates of all lexemes in all unmatched OED and HT categories and doesn’t check anything else.  I’ve created an updated version of this that uses the new OED data, and the script only brings back unmatched categories that have at least one word with a GHT date.  Interestingly, the new data has fewer unmatched categories featuring GHT dates than the old data (591 as opposed to 794).  I’m not really sure why this is, or what might have happened to the GHT dates.  The script brings back five 100% matches (only 3 more than the old data, all but one containing just one word) and 52 matches that don’t meet our criteria (down from 56 with the old data), so it was not massively successful.

Ticking off all matching HT/OED lexemes rather than just those within completely matched categories: 627863 lexemes are now matched.  There are 731307 non-OE words in the HT, so about 86% of these are ticked off.  There are 751156 lexemes in the new OED data, so about 84% of these are ticked off.  Whilst doing this task I noticed another unexpected thing about the new OED data:  the number of categories in ’01’ and ‘02’ have decreased while the number in ‘03’ has increased.  In the old OED data we have the following number of matched categories:

01: 114968
02: 29077
03: 79282

In the new OED data we have the following number of matched categories:

01: 109956
02: 29069
03: 84260

The totals match up, other than the 42 matched categories that have been deleted in the new data, so (presumably) some categories have changed their top level.  Matching up the HT and OED lexemes has introduced a few additional duplicates, caused when a ‘stripped’ form means multiple words within a category match.  There aren’t too many, but they will need to be fixed manually.

Identifying all words in matched categories that have no GHT dates and see which of these can be matched on stripped form alone: I created a script to do this, which lists every unmatched OED word that doesn’t have a GHT date in every matched OED category and then tries to find a matching HT word from the remaining unmatched words within the matched HT category.  Perhaps I misunderstood what was being requested because there are no matches returned in any of the top-level categories.  But then maybe OED words that don’t have a GHT date are likely to be new words that aren’t in the HT data anyway?

Create a monosemous script that finds all unmatched HT words that are monosemous and sees whether there are any matching OED words that are also monosemous: Again, I think the script I created will need more work.  It is currently set to only look at lexemes within matched categories.  It finds all the unmatched HT words that are in matched categories, then checks how many times each word appears amongst the unmatched HT words in matched categories of the same POS. If the word only appears once then the script looks within the matched OED category to find a currently unmatched word that matches.  At the moment the script does not check to see if this word is monosemous as I figured that if the word matches and is in a matched category it’s probably a correct match.  Of the 108212 unmatched HT words in matched categories, 70916 are monosemous within their POS and of these 14474 can be matched to an OED lexeme in the corresponding OED category.

Deciding which OED dates to use: I created a script that gets all of the matched HT and OED lexemes in one of the top-level categories (e.g. 01) and then for each matched lexeme works out the largest difference between OED sortdate and HT firstd (if sortdate is later then sortdate-firstd, otherwise firstd-sortdate); works out the largest difference between OED enddate and HT lastd in the same way; and adds these two differences together to work out the largest overall difference.  It then sorts the data on the largest difference and displays all lexemes in a table ordered by it, with additional fields containing the start difference, end difference and total difference for info.  I did, however, encounter a potential issue:  not all HT lexemes have a firstd and lastd.  E.g. words that are ‘OE-‘ have nothing in firstd and lastd but instead have ‘OE’ in the ‘oe’ column and ‘_’ in the ‘current’ column.  In such cases the difference between the HT and OED dates is massive, but not accurate.  I wonder whether using the HT’s apps and appe columns might work better.
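A minimal sketch of the comparison, assuming each matched pair is held in an array with the OED ‘sortdate’/‘enddate’ and HT ‘firstd’/‘lastd’ fields (the array structure is an assumption, not the actual script):

foreach ($matchedLexemes as &$lex) {
    // Absolute difference between the start dates and between the end dates
    $lex['start_diff'] = abs($lex['sortdate'] - $lex['firstd']);
    $lex['end_diff']   = abs($lex['enddate'] - $lex['lastd']);
    $lex['total_diff'] = $lex['start_diff'] + $lex['end_diff'];
}
unset($lex);
// Order the table so the largest overall discrepancies appear first
usort($matchedLexemes, function ($a, $b) {
    return $b['total_diff'] <=> $a['total_diff'];
});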

Looking at lexemes that have an OED citation after 1945, which should be marked as ‘current’:  I created a script that goes through all of the matched lexemes and lists all of the ones that either have an OED sortdate greater than 1945 or an OED enddate greater than 1945 where the matched HT lexeme does not have the ‘current’ flag set to ‘_’.  There are 73919 such lexemes.

On Friday afternoon I had a meeting with Marc and Fraser where we discussed the above and our next steps.  I now have a further long ‘to do’ list, which I will no doubt give more information about next week.

Other than HT duties I helped out with some research proposals this week.  Jane Stuart-Smith and Eleanor Lawson are currently putting a new proposal together and I helped to write the data management plan for this.  I also met with Ophira Gamliel in Theology to discuss a proposal she’s putting together.  This involved reading through a lot of materials and considering all the various aspects of the project and the data requirements of each, as it is a highly multifaceted project.  I’ll need to spend some further time next week writing a plan for the project.

I also had a chat to Wendy Anderson about updating the Mapping Metaphor database, and also the possibility of moving the site to a different domain.  I also met with Gavin Miller to discuss the new website I’ll be setting up for his new Glasgow-wide Medical Humanities Network, and I ran some queries on the DSL database in order to extract entries that reference the OED for some work Fraser is doing.

Finally, I had to make some changes to the links from the Bilingual Thesaurus to the Middle English Dictionary website.  The site has had a makeover, and is looking great, but unfortunately when they redeveloped the site they didn’t put redirects in place from the old URLs to the new ones.  This is pretty bad as it means anyone who has cited or bookmarked a page will end up with broken links, not just BTh.  I would imagine entries have been cited in countless academic papers and all these citations will now be broken, which is not good.  Anyway, I’ve fixed the MED links in BTh now.  Unfortunately there are two forms of link in the database, for example: http://quod.lib.umich.edu/cgi/m/mec/med-idx?type=id&id=MED6466 and http://quod.lib.umich.edu/cgi/m/mec/med-idx?type=byte&byte=24476400&egdisplay=compact.  I’m not sure why this is the case and I’ve no idea what the ‘byte’ number refers to in the second link type.  The first type includes the entry ID, which is still used in the new MED URLs.  This means I can get my script to extract the ID from the URL in the database and then replace the rest with the new URL, so the above becomes https://quod.lib.umich.edu/m/middle-english-dictionary/dictionary/MED6466 as the target for our MED button and links directly through to the relevant entry page on their new site.
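The rewrite for this first link type amounts to something roughly along these lines (a sketch using the example URL above; variable names are mine):

$oldUrl = 'http://quod.lib.umich.edu/cgi/m/mec/med-idx?type=id&id=MED6466';
if (preg_match('/id=(MED\d+)/', $oldUrl, $matches)) {
    // The entry ID is still used by the new MED site, so we can link straight to the entry page
    $newUrl = 'https://quod.lib.umich.edu/m/middle-english-dictionary/dictionary/' . $matches[1];
}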

Unfortunately there doesn’t seem to be any way to identify an individual entry page for the second type of link.  This means there is no way to link directly to the relevant entry page.  However, I can link to the search results page by passing the headword, and this works pretty well.  So, for example the three words on this page: https://thesaurus.ac.uk/bth/category/?type=search&hw=2&qsearch=catourer&page=1#id=1393 have the second type of link, but if you press on one of the buttons you’ll find yourself at the search results page for that word on the MED website, e.g. https://quod.lib.umich.edu/m/middle-english-dictionary/dictionary?utf8=%E2%9C%93&search_field=hnf&q=Catourer.


Week Beginning 9th April 2018

I returned to work after my Easter holiday on Tuesday this week, making it another four-day week for me.  On Tuesday I spent some time going through my emails and dealing with some issues that had arisen whilst I’d been away.  This included sorting out why plain text versions of the texts in the Corpus of Modern Scottish Writing were giving 403 errors (it turned out the server was set up to not allow plain text files to be accessed and an email to Chris got this sorted).  I also spent some time going through the Mapping Metaphor data for Wendy.  She wanted me to structure the data to allow her to easily see which metaphors continued from Old English times and I wrote a script that gave a nice colour-coded output to show those that continued or didn’t.  I also created another script that lists the number (and the details of) metaphors that begin in each 50-year period across the full range.  In addition, I spoke to Gavin Miller about an estimate of my time for a potential follow-on project he’s putting together.

The rest of my week was split between two projects:  LinguisticDNA and REELS.  For LinguisticDNA I continued to work on the search facilities for the semantically tagged EEBO dataset.  Chris gave me a test server on Tuesday (just an old desktop PC to add to the several others I now have in my office) and I managed to get the database and the scripts I’d started working on before Easter transferred onto it.  With everything set up I continued to add new features to the search facility.  I completed the second search option (Choose a Thematic Heading and a specific book to view the most frequent words), which allows you to specify a Thematic Heading, a book, a maximum number of returned words and whether the theme selection includes lower levels.  I also made it so that you can miss out the selection of a thematic heading to bring back all of the words in the specified book listed by frequency.  If you do this each word’s thematic heading is also listed in the output, and it’s a useful way of figuring out which thematic headings you might want to focus on.

I also added a new option to both searches 1 and 2 that allows you to amalgamate the different noun and verb types.  There are several different types (e.g. NN1 and NN2 for singular and plural forms of nouns) and it’s useful to join these together as single frequency counts  rather than having them listed separately.

I also completed search option 3 (Choose a specific book to view the most frequent Thematic Headings).  This allows the user to select a book from an autocomplete list and optionally provide a limit to the returned headings.  The results display the thematic headings found in the book listed in order of frequency.  The returned headings are displayed as links that perform a ‘search 2’ for the heading in the book, allowing you to more easily ‘drill down’ into the data.  For all results I have added in a count column, so you can easily see how many results are returned or reference a specific result, and I also added titles to the search results pages that tell you exactly what it is you’ve searched for.  I also created a list of all thematic headings, as I thought it might be handy to be able to see what’s what.  When looking at this list you can perform a ‘search 1’ for any of the headings by clicking on one, and similarly, I created an option to list all of the books that form the dataset.  This list displays each book’s ID, author, title, terms and number of pages, and you can perform a ‘search 3’ for a book by clicking on its ID.

On Friday I participated in the Linguistic DNA project conference call, following which I wrote a document describing the EEBO search facilities, as project members outside of Glasgow can’t currently access the site I’ve put together.

For REELS I continued to work on the public interface for the place-name data, which included the following:

  1. The number of returned place-names is now displayed in the ‘you searched for…’ box
  2. The textual list of results now features two buttons for each result, one to view the record and one to view the place-name on the map.  I’m hoping the latter might be quite useful as I often find an interesting name in the textual list and wonder which dot on the map it actually corresponds to.  Now with one click I can find it.
  3. Place-name labels on the map now appear when you zoom in past a certain level (currently set to zoom level 12).  Note that only results rather than grey spots get the visible labels as otherwise there’s too much clutter and the map takes ages to load too.
  4. The record page now features a map with the place-name at the centre, and all other place-names as grey dots.  The marker label is automatically visible.
  5. Returning back to the search results from a record when you’ve done a quick search now works – previously this was broken.
  6. The map zoom controls have been moved to the bottom right, and underneath them is a new icon for making the map ‘full screen’.  Pressing on this will make the map take up the whole of your screen.  Press ‘Esc’ or the icon again to return to the regular view.  Note that this feature requires a modern web browser, although I’ve just tested it in IE on Windows 10 and it works.  Using full screen mode makes working with the map much more pleasant.  Note, however, that navigating away from the map (e.g. if you click a ‘view record’ button) will return you to the regular view.
  7. There is a new ‘menu’ icon in the top-left of the map.  Press on this and a menu slides out from the left.  This presents you with options to change how the results are categorised on the map.  In addition to the ‘by classification code’ option that has always been there, you can now categorise and colour-code the markers by start date, altitude and element language.  As with classification code, you can turn particular levels on and off using the legend in the top right, e.g. if you only want to display markers that have an altitude of 300m or more.


Week Beginning 18th December 2017

This was a short week for me as I only worked from Monday to Wednesday due to Christmas coming along.  I spent most of Monday and Tuesday continuing to work on the Technical Plan for Joanna Kopaczyk’s proposal.  As it’s a project with quite a large technical component there was a lot to think about and lots of detail to try and squeeze into the maximum of four pages allowed for a Plan.  My first draft was five pages long, so I had to chop some information out and reformat things to try and bring the length down a bit, but thankfully I managed to get it within the limit whilst still making sense and retaining the important points.  I also chatted with Graeme some more about some of the XML aspects of the project and had an email conversation with Luca about it too.  It was good to get the Plan sent on to Joanna, although it’s still very much a first draft that will need some further tweaking as other aspects of the proposal are firmed up.

I had to fix an issue with the Thesaurus of Old English staff pages on Monday.  The ‘edit lexemes’ form was set to not allow words to be more than 21 characters long.  Jane Roberts had been trying to update the positioning of the word ‘(ge)mearcian mid . . . rōde’, and as  this is more than 21 characters any changes made to this row were being rejected.  I’m not sure why I’d set the maximum word length to 21 as the database allows up to 60 characters in this field.  But I updated the check to allow up to 60 characters and that fixed the problem.  I also spent a bit of time on Tuesday gathering some stats for Wendy about the various Mapping Metaphor resources (i.e. the main website, the blog, the iOS app and the Android app).  I also had a chat with Jane Stuart Smith about an older but still very important site that she would like me to redesign at some point next year, and I started looking through this and thinking how it could be improved.

On Wednesday, as it was my last day before the hols, I decided to focus on something from my ‘to do’ list that would be fun.  I’d been wanting to make a timeline for the Historical Thesaurus for a while so I thought I’d look into that.  What I’ve created so far is a page through which you can pass a category ID and then see all of the words in the category in a visualisation that shows when the word was used, based on the ‘apps’ and ‘appe’ fields in the database.  When a word’s ‘apps’ and ‘appe’ fields are the same it appears as a dot in the timeline, and where the fields are different the word appears as a coloured bar showing the extent of the attested usage.  Note that more complicated date structures such as ‘a1700 + 1850–‘ are not visualised yet, but could be incorporated (e.g. a dot for 1700 then a bar from 1850 to 2000).
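To give an idea of how the words are prepared for the timeline, here’s a hedged sketch.  The ‘apps’ and ‘appe’ field names come from the description above, but the output structure is an assumption for illustration rather than the exact format expected by the d3-timeline code mentioned below:

$items = array();
foreach ($lexemes as $lex) {
    if ($lex['apps'] == $lex['appe']) {
        // Same start and end date: render as a single dot
        $items[] = array('word' => $lex['word'], 'start' => (int)$lex['apps'], 'end' => (int)$lex['apps'], 'type' => 'dot');
    } else {
        // Different dates: render as a bar spanning the attested usage
        $items[] = array('word' => $lex['word'], 'start' => (int)$lex['apps'], 'end' => (int)$lex['appe'], 'type' => 'bar');
    }
}
echo json_encode($items); // consumed by the D3 visualisation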

When you hover over a dot or bar the word and its dates appear below the visualisation.  Eventually (if we’re going to use this anywhere) I would instead have this as a tool-tip pop-up sort of thing.

Here are a couple of screenshots of fitting examples for the festive season.  First up is words for ‘Be gluttonous’:

And here are words for ‘Excess in drinking’:

The next step with this would be to incorporate all subcategories for a category, with different shaded backgrounds for sections for each subcategory and a subcategory heading added in.  I’m not entirely sure where we’d link to this, though.  We could allow people to view the timeline by clicking on a button in the category browse page.  Or we might not want to incorporate it at all, as it might just clutter things up.  BTW, this is a D3 based visualisation created by adapting this code: https://github.com/denisemauldin/d3-timeline

That’s all from me for 2017.  Best wishes for Christmas and the New Year to one and all!

Week Beginning 4th December 2017

I was struck down with some sort of tummy bug at the weekend and wasn’t well enough to come into work on Monday, but I worked from home instead.  Unfortunately although I struggled through the day I was absolutely wiped out by the end of it and ended up being off work sick on Tuesday and Wednesday.  I was mostly back to full health on Thursday, which is the day I normally work from home anyway, so I made it through that day and was back to completely full health on Friday, thankfully.  So I only managed to work for three days this week, and for two of those I wasn’t exactly firing on all cylinders.  However, I still managed to get a few things done this week.

Last week I’d migrated the Mapping Metaphor blog site and after getting approval from Wendy I deleted the old site on Monday.  I took a backup of the database and files before I did so, and then I wrote a little redirect that ensures Google links and bookmarks to specific blog pages point to the correct page on the main Metaphor site.  I also had some further AHRC review duties to take care of, plus I spent some time reading through the Case for Support for Joanna Kopaczyk’s project and thinking about some of the technical implications.  Pauline Mackay also sent me a sample of an Access database she’s put together for her Scots Bawdry project.  I’m going to create an online version of this so I spent a bit of time going through it and thinking about how it would work.

I spent most of Thursday and Friday working on this new system for Pauline, and by the end of the week I had created an initial structure for the online database, some initial search and browse facilities, and some management pages to allow Pauline to add / edit / delete records.  The search page allows users to search for any combination of the following fields:

Verse title, first line, language, theme, type, ms title, publication year, place, publisher and location.  Verse title, first line and ms title are free text and will bring back any records with matching text – e.g. if you enter ‘the’ into ‘verse title’ you can find all records where these three characters appear together in a title.  Publication year allows users to search for an individual year or a range of years (e.g. 1820-1840 brings back everything that has a date between and including these years).  Language, place, publisher and location are drop-down lists that allow you to select one option.  Themes and type are checkboxes allowing you to select any number of options, with each joined by an ‘or’ (e.g. all the records that have a theme of ‘illicit love’ or ‘marriage’).  I can change any of the single selection drop-downs to multiple options (or vice versa) if required.  If multiple boxes are filled in these are joined by ‘and’ – e.g. publication place is Glasgow AND publication year is 1820.
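To illustrate how the publication year box is interpreted, here’s a sketch only (‘publication_year’ is an assumed column name): a single year or a ‘YYYY-YYYY’ range, inclusive at both ends, becomes one condition among those joined with ‘AND’.

if (preg_match('/^(\d{4})\s*-\s*(\d{4})$/', $yearInput, $m)) {
    // A range such as '1820-1840', inclusive at both ends
    $where[] = 'publication_year BETWEEN ' . (int)$m[1] . ' AND ' . (int)$m[2];
} elseif (preg_match('/^\d{4}$/', $yearInput)) {
    // A single year
    $where[] = 'publication_year = ' . (int)$yearInput;
}
// ...all populated conditions are then joined with ' AND ' to build the query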

The browse page presents all of the options in the search form as clickable lists, with each entry having a count to show you how many records match.  For ‘publication year’ only those records with a year supplied are included.  Clicking on a search or browse result displays the full record.  Any content that can be searched for (e.g. publication type) is a link and clicking on it performs a search for that thing.

For the management pages, once logged in a staff user can browse the data, which displays all of the records in one big table.  From here the user can access options to edit or delete a record.  Deleting a record simply deactivates it in the database and I can retrieve it again if required.  Users can also add new records by clicking on the ‘add new row’ link.  I also created a script for importing all of the data from the Access database and I will run this again on a more complete version of the database when Pauline is ready to import everything.  This is all just an initial version, and there will no doubt be a few changes required, but I think it’s all come together pretty well so far.

Week Beginning 27th November 2017

I was off on Tuesday this week to attend my uncle’s funeral.  I spent the rest of the week working on a number of relatively small tasks for a variety of different projects.  The Dictionary of Old English people got back to me on Monday to say they had updated their search system to allow our Thesaurus of Old English site to link directly from our word records to a search for that word on their site.  This was really great news, and I updated our site to add in the direct links.  This is going to be very useful for users of both sites.  I spent a bit more time on AHRC review duties this week, and I also had an email discussion with Joanna Kopaczyk in English Language about a proposal she is putting together.  She sent me the materials she is working on and I read through them all and gave some feedback about the technical aspects.  I’m going to help her to write the Technical Plan for her project soon too.  I also met with Rachel Douglas from the School of Modern Languages to offer some advice on technical matters relating to a project she’s putting together.  Although Rachel is not in my School and I therefore can’t be involved in her project, it was still good to be able to give her a bit of help and show her some examples of digital outputs similar to the sorts of thing she is hoping to produce.

I also spent some further time working on the integration of OED data with the Historical Thesaurus data with Fraser.  Fraser had sent me some further categories that he and a student had manually matched up, and had also asked me to write another script that picks out all of the unmatched HT categories and all of the unmatched OED categories and for each HT category goes through all of the OED categories and finds the one with the lowest Levenshtein score (an algorithm that returns a number showing how many steps it would take to turn one string into another).  My initial version of this script wasn’t ideal, as it included all unmatched OED categories and I’d forgotten that this included several thousand that are ‘top level’ categories that don’t have a part of speech and shouldn’t be matched with our categories at all.  I also realised that the script should only compare categories that have the same part of speech, as my first version was ending up with (for example) a noun category being matched up with an adjective.  I updated the script to bear these things in mind, but unfortunately the output still doesn’t look all that useful.  However, there are definitely some real matches that can be manually picked out from the list, e.g. 31890 ‘locustana pardalina or rooibaadjie’ and ‘locustana pardalina (rooibaadjie)’ and some others around there.  Also 14149 ‘applied to weapon etc’ and ‘applied to weapon, etc’.  It’s over to Fraser again to continue with this.
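The comparison itself can be done with PHP’s built-in levenshtein() function (whether the actual script used it is an assumption); a rough sketch of the updated, part-of-speech-aware version, with the array structures purely illustrative:

foreach ($unmatchedHT as $ht) {
    $best = null;
    $bestScore = PHP_INT_MAX;
    foreach ($unmatchedOED as $oed) {
        // Only compare categories with the same part of speech, and skip the
        // OED 'top level' categories that have no part of speech at all
        if ($oed['pos'] == '' || $oed['pos'] != $ht['pos']) {
            continue;
        }
        $score = levenshtein($ht['heading'], $oed['heading']);
        if ($score < $bestScore) {
            $bestScore = $score;
            $best = $oed;
        }
    }
    // $best now holds the closest OED category heading for manual review
}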

I mentioned last week that I’d updated all of our WordPress sites to version 4.9, but that 4.9.1 would no doubt soon be released.  And in fact it was released this week, so I had to update all of the sites once more.  It’s a bit of a tedious task but it doesn’t really take too long – maybe about half an hour in total.  I also decided to tick an item off my long-term ‘to do’ list as I had a bit of time available.  The Mapping Metaphor site had a project blog, located at a different URL from the main site.  As the project has now ended there are no more blog posts being made so it seems a bit pointless hosting this WordPress site, and having to keep it maintained, when I could just migrate the content to the main MM website as static HTML and delete the WordPress site.  I spent some time investigating WordPress plugins that could export entire sites as static HTML, for example https://en-gb.wordpress.org/plugins/static-html-output-plugin/ and https://wordpress.org/plugins/simply-static/.  These plugins go through a WordPress site, convert all pages and posts to static HTML, pull in the WordPress file uploads folder and wrap everything up as a ZIP file.  This seemed ideal, and the tools both worked very well, but I realised they weren’t exactly what I needed.  Firstly, the Metaphor blog (which was set up before I was involved with the project) just uses page IDs in the URLs, not other sorts of permalinks.  Neither plugin works with the default URL style in place, so I’d need to change the link type, meaning the new pages would have different URLs from the old pages, which would be a problem for redirects.  Secondly, both plugins pull in all of the page elements, including the page design, the header and all the rest.  I didn’t actually want all of this stuff but just the actual body of the posts (plus titles and a few other details) so I could slot this into the main MM website template.  So instead of using a plugin I realised it was probably simpler and easier if I just wrote my own little export script that grabbed just the published posts (not pages), for each getting the ID, the title, the main body, the author and the date of creation.  My script hooked into the WordPress functions to make use of the ‘wpautop’ function, which adds paragraph markup to texts, and I also replaced absolute URLs with relative ones.  I then created a temporary table to hold just this data, set my script to insert into it and then I exported this table.  I imported this into the main MM site’s database and wrote a very simple script to pull out the correct post based on the passed ID and that was that. Oh, I also copied the WordPress uploads directory across too, so images and PDFs and such things embedded in posts would continue to work.  Finally, I created a simple list of posts.  It’s exactly what was required and was actually pretty simple to implement, which is a good combination.
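For anyone curious, the core of the export amounted to something like the following.  This is a sketch rather than the exact script: the temporary table name ‘mm_blog_posts’ and the placeholder blog URL are assumptions, and it would be run from within the old blog’s WordPress installation at the top level so that get_posts(), wpautop() and $wpdb are available.

require_once 'wp-load.php'; // load WordPress so its functions and $wpdb are in scope

$posts = get_posts(array('post_type' => 'post', 'post_status' => 'publish', 'numberposts' => -1));
foreach ($posts as $post) {
    $body = wpautop($post->post_content);                    // add paragraph markup to the post body
    $body = str_replace('http://OLD-BLOG-URL/', '/', $body); // placeholder: make absolute URLs relative
    $wpdb->insert('mm_blog_posts', array(                    // temporary table created beforehand
        'post_id' => $post->ID,
        'title'   => $post->post_title,
        'author'  => get_the_author_meta('display_name', $post->post_author),
        'created' => $post->post_date,
        'body'    => $body,
    ));
}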

On Thursday I heard that the Historical Thesaurus had been awarded the ‘Queen’s Anniversary Prize for Higher Education’, which is a wonderful achievement for the project.  Marc had arranged a champagne reception on Friday afternoon to celebrate the announcement, so I spent most of the afternoon sipping champagne and eating chocolates, which was a nice way to end the week.

Week Beginning 23rd October 2017

After an enjoyable week’s holiday I returned to work on Monday, spending quite a bit of Monday catching up with some issues people had emailed me about whilst I was away, such as making further tweaks to the ‘Concise Scots Dictionary’ page on the DSL website for Rhona Alcorn (the page is now live if you’d like to order the book: http://dsl.ac.uk/concise-scots-dictionary/), speaking with Luca about a project he’s involved in the planning of that’s going to use some of the DSL data, helping Carolyn Jess-Cooke with some issues she was encountering when accessing one of her websites, giving some information to Brianna of the RNSN project about timeline tools we might use, and a few other such things.

I also spent some time adding paragraph IDs to the ‘Scots Language’ page of the DSL (http://dsl.ac.uk/about-scots/the-scots-language/) for Ann Fergusson to enable references to specific paragraphs to be embedded in other pages.  Implementing this was somewhat complicated by the ‘floating’ contents section on the left as when a ‘hash’ is included in a URL a browser automatically jumps to the ID of the element that has this value.  But for the contents section to float or be fixed to the top of the page depending on which section the user is viewing the page needs to load at the top for the position to be calculated.  If the page loads halfway down then the contents section remains fixed at the top of the page, which is not much use.  However, I managed to get the ‘jump to paragraph from a URL’ feature working with the floating contents section now with a bit of a hack.  Basically, I’ve made it so that the ‘hash’ that gets passed to the page doesn’t actually correspond to an element on the page, so the browser doesn’t jump anywhere.  But my JavaScript grabs the hash after the page has loaded, reworks it to a format that does match an actual element and then smoothly scrolls to this element.   I’ve tested this in Firefox, Chrome, Internet Explorer and Edge and it works pretty well.

I had a couple of queries from Wendy Anderson this week.  The first was for Mapping Metaphor.  Wendy wanted to grab all of the bidirectional metaphors in both the main and OE datasets, including all of their sample lexemes.  I wrote a script that extracted the required data and formatted it as a CSV file, which is just the sort of thing she wanted.  The second query was for all of the metadata associated with the Corpus of Modern Scottish Writing texts.  A researcher had contacted Wendy to ask for a copy but although the metadata is in the database and can be viewed on a per text basis through the website, we didn’t have the complete dataset in an easy to share format.  I wrote a little script that queried the database and retrieved all of the data.  I had to do a little digging into how the database was structured in order to do this, as it is a system that wasn’t developed by me.  However, after a little bit of exploration I managed to write a script that grabbed the data about each text, including the multiple authors that can be associated with each text.  I then formatted this as a CSV file and sent the outputted file to Wendy.

I met with Gary on Monday to discuss some changes to the SCOSYA atlas and CMS that he wanted me to implement ahead of an event the team are at next week.  This included adding Google Analytics to the website, updating the legend of the Atlas to make it clearer what the different rating levels meant, separating out the grey squares (which mean no data is present) and the grey circles (meaning data is present but doesn’t meet the specified criteria) into separate layers so they can be switched on and off independently of each other, making the map markers a little smaller, and adding in facilities to allow Gary to delete codes, attributes and code parents via the CMS.  This all took a fair amount of time to implement, and unfortunately I lost a lot of time on Thursday due to a very strange situation with my access to the server.

I work from home on Thursdays and I had intended to work on the ‘delete’ facilities that day, but when I came to log into the server the files and the database appeared to have reverted back to the state they were in in May – i.e. it looked like we had lost almost six months of data, plus all of the updates to the code I’d implemented during this time.  This was obviously rather worrying and I spent a lot of time toing and froing with Arts IT Support to try and figure out what had gone wrong.  This included restoring a backup from the weekend before, which strangely still seemed to reflect the state of things in May.  I was getting very concerned about this when Gary noted that he was seeing two different views of the data on his laptop.  In Safari on his laptop his view of the data appeared to have ‘stuck’ at May while in Chrome he could see the up to date dataset.  I then realised that perhaps the issue wasn’t with the server after all but instead the problem was my home PC (and Safari on Gary’s laptop) was connecting to the wrong server.  Arts IT Support’s Raymond Brasas suggested it might be an issue with my ‘hosts’ file and that’s when I realised what had happened.  As the SCOSYA domain is an ‘ac.uk’ domain and it takes a while for these domains to be set up, we had set up the server long before the domain was running, so to allow me to access the server I had added a line to the ‘hosts’ file on my PC to override what happens when the SCOSYA URL is requested.  Instead of it being resolved by a domain name service my PC pointed at the IP address of the server as I had entered it in my ‘hosts’ file.  Now in May, the SCOSYA site was moved to a new server, with a new IP address, but the old server had never been switched off, so my home PC was still connecting to this old server.  I had only encountered the issue this week because I hadn’t worked on SCOSYA from home since May.  So, it turned out there was no problem with the server, or the SCOSYA data.  I removed the line from my ‘hosts’ file, restarted my browser and immediately I could access the up to date site.  All this took several hours of worry and stress, but it was quite a relief to actually figure out what the issue was and to be able to sort it.

I had intended to start setting up the server for the SPADE project this week, but the machine has not yet been delivered, so I couldn’t work on this.  I did make a few further tweaks to the SPADE website, however, and responded to a couple of queries from Rachel about the SCOTS data and metadata, which the project will be using.

I also met with Fraser to discuss the ongoing issue of linking up the HT and OED data.  We’re at the stage now where we can think about linking up the actual words with categories.  I’d previously written a script that goes through each HT category that matches an OED category and compares the words in each, checking whether an HT word matches the text found in either the OED ‘ght_lemma’ or ‘lemma’ fields.  After our meeting I updated the HT lexeme table to include extra fields for the ID of a matching OED lexeme and whether the lexeme had been checked.  After that I updated the script to go through every matching category in order to ‘tick off’ the matching words within.  The first time I ran my script it crashed the browser, but with a bit of tweaking I got it to successfully complete the second time.  Here are some stats:

There are 655513 HT lexemes that are now matched up with an OED lexeme.  There are 47074 HT lexemes that only have OE forms, so with 793733 HT lexemes in total this means there are 91146 HT lexemes that should have an OED match but don’t.  Note, however, that we still have 12373 HT categories that don’t match OED categories and these categories contain a total of 25772 lexemes.

On the OED side of things, we have a total of 688817 lexemes, and of these 655513 now match an HT lexeme, meaning there are 33304 OED lexemes that don’t match anything.  At least some of these will also be cleared up by future HT / OED category matches.  Of the 655513 OED lexemes that now match, 243521 are ‘revised’.  There are 262453 ‘revised’ OED lexemes in total, meaning there are 18932 ‘revised’ lexemes that don’t currently match an HT lexeme.  I think this is all pretty encouraging, as it looks like my script has managed to match up the bulk of the data; it’s just the several thousand edge cases that are going to be a bit more work.
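
For anyone interested in the mechanics, the word-level matching itself is just a string comparison against the two OED lemma fields.  Something like the sketch below, although the structures and function here are hypothetical and the real script works against the database rather than in-memory arrays:

// Hypothetical structures standing in for rows of the HT and OED lexeme tables
interface HtLexeme { id: number; word: string; oedLexemeId?: number; checked: boolean; }
interface OedLexeme { id: number; lemma: string; ght_lemma: string; }
// For one matched pair of categories, tick off every HT word with a matching OED lemma
function matchCategory(htWords: HtLexeme[], oedWords: OedLexeme[]): void {
  for (const ht of htWords) {
    const match = oedWords.find(o => o.ght_lemma === ht.word || o.lemma === ht.word);
    if (match) {
      ht.oedLexemeId = match.id; // stored in the new field added to the HT lexeme table
      ht.checked = true;         // and flagged as checked
    }
  }
}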

On Wednesday I met with Thomas Widmann of Scots Language Dictionaries to discuss our plans to merge all three of the SLD websites (DSL, SLD and Scuilwab) into one resource that will have the DSL website’s overall look and feel.  We’re going to use WordPress as a CMS for everything other than the DSL’s dictionary pages, so as to allow SLD staff to easily update the content of the site themselves.  It’s going to take a bit of time to migrate things across (e.g. making a new WordPress theme based on the DSL website, creating quick search widgets, and updating the DSL dictionary pages to work with the WordPress theme), but we now have the basis of a plan.  I’ll try to get started on this before the year is out.

Finally this week, I responded to a request from Simon Taylor to make a few updates to the REELS system, and I replied to Thomas Clancy about how we might use existing Ordnance Survey data in the Scottish Place-Names survey.  All in all it has been a very busy week.

Week Beginning 18th September 2017

On Monday this week I spent a bit of time creating a new version of the MetaphorIC app, featuring the ‘final’ dataset from Mapping Metaphor.  This new version features almost 12,000 metaphorical connections between categories and more than 30,000 examples of metaphor.  Although the creation of the iOS version went perfectly smoothly (this time), I ran into some difficulties updating the Android app, as the build process started giving me some unexplained errors.  I eventually tried dropping the Android app in order to rebuild it, but that didn’t work either, and unfortunately dropping the app also deleted its icon files.  After that I had to build the app in a new location, which thankfully worked, and thankfully I still had the source files for the icons so I could create them again.  There’s always something that doesn’t go smoothly when publishing apps.  The new version was made available on the Apple App Store and Google Play by the end of the week, and you can download either version by following the links here: http://mappingmetaphor.arts.gla.ac.uk/metaphoric/.  That’s Mapping Metaphor and its follow-on project MetaphorIC completely finished now, other than the occasional tweak that will no doubt be required.

I spent the bulk of the rest of the week working on the Burns Paper Database for Ronnie Young.  Last week I started looking at the Access version of the database that Ronnie had sent me; I managed to make an initial version of a MySQL database to hold the data and created an upload script that populated it via a CSV export.  This week I met with Ronnie to discuss how to take the project further.  We agreed that rather than having an online content management system through which Ronnie would continue to update the database, he would instead continue to use his Access version and I would run this through my ‘import’ script to replace the old online version whenever updates are required.  This is a more efficient approach, as I already have an upload script and Ronnie is already used to working with his Access database.

We went through the data together and worked out which fields would need to be searchable and browseable, and how the data should be presented.  This was particularly useful as there are some consistency issues with the data, for example in how uncertain dates are recorded, which may include square brackets, asterisks, question marks, the use of ‘or’ and also date ranges.

After the meeting I set to work creating an updated structure for the database and an updated ‘import’ script that would enable the extraction and storage of the data required for search purposes.  This included creating separate tables for year searches, manuscript types, watermarks and countermarks, and also for images of both the documents and the watermarks.  It took quite some time to get the import script working properly, but now that it is in place I will be able to run any updated version of the data through it in order to create a new online version.  With this done I set to work on the actual pages for searching and browsing, viewing results and viewing an individual record.  Much of this I managed to repurpose from my previous work on The People’s Voice database of poems, which helped speed things up considerably.

The biggest issue I encountered was working with the images of the manuscript pages.  The project contains over 1200 high-resolution images that Ronnie wants users to be able to zoom into and pan around.  In order to work with these I had to batch process the creation of thumbnails and also the renaming of the images, as they had a mixture of upper and lower case file extensions, which causes problems on case-sensitive servers.

I then had to decide on a library that would provide the required zoom and pan functionality.  Previously I’ve used OpenLayers, but this requires large images to be split into tiles and I didn’t want to have to do that, so I looked at some other JavaScript libraries instead.  What I really wanted was a ‘Google Maps’ style interface that allowed multiple levels of zoom, but unfortunately most libraries didn’t seem to offer this.  I found one called ‘jQuery Panzoom’ (http://timmywil.github.io/jquery.panzoom/demo/) that seemed to fit the bill and I tried working with it for a while.  Unfortunately my images are all very large while the pane they are viewed in is considerably smaller, and it didn’t seem very straightforward to reposition the zoomed image so that it was actually visible in the pane when zoomed out by default.  I then tried another library called magnifier.js (http://mark-rolich.github.io/Magnifier.js/), which can be set up with a thumbnail navigation window and a larger main window.  I spent quite a bit of time working with this library and thought everything was going to work out perfectly, but then I encountered a bug: if you manually set the dimensions of the pane in which the zoomed-in image appears and these dimensions differ from those of the image, the zoomed-in image is distorted to fit the pane.  After investigating this issue I discovered it had been raised by someone in 2014 and had not been addressed (see https://github.com/mark-rolich/Magnifier.js/issues/4).  As a distorted image was no good I had to look elsewhere once again.  My third attempt was the ‘Elevate Zoom’ plugin (http://www.elevateweb.co.uk/image-zoom/examples), and thankfully I managed to get this working.  Like magnifier.js it can be set up with a thumbnail navigation window and a larger pane for viewing the zoomed-in image, and it can also be configured to use the mouse wheel to zoom in and out, which is ideal.  The only downside is that without physical zoom controls there’s no way to zoom in and out on a touchscreen device, but as it’s still possible to view the full image at one zoom level I think this is good enough.
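
For reference, the Elevate Zoom setup I ended up with is essentially the plugin’s standard thumbnail-plus-zoom-window configuration with mouse-wheel zooming switched on.  A rough sketch is below – the element ID, image paths and option values are placeholders, and the option names should be double-checked against the plugin’s documentation:

declare const $: any; // jQuery, with the Elevate Zoom plugin loaded via a script tag
// The small image on the page acts as the navigation thumbnail, while data-zoom-image
// points at the full-resolution scan shown in the zoom window, e.g.
// <img id="ms-page" src="thumbs/page001.jpg" data-zoom-image="images/page001.jpg">
$('#ms-page').elevateZoom({
  zoomWindowWidth: 500,  // size of the pane showing the zoomed-in image
  zoomWindowHeight: 500,
  scrollZoom: true       // zoom in and out with the mouse wheel
});
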
By the end of the week I had pretty much completed the online database and I emailed the details to Ronnie for feedback.

Other than the above I also did a little bit of work for the SPADE project, beginning to create a proper interface for the website with Rachel MacDonald, and I had a further chat with Gerry McKeever regarding the website for his new project.

Week Beginning 4th September 2017

I spent a lot of this week continuing with the redevelopment of the ARIES app and thankfully after laying the groundwork last week (e.g. working out the styles and the structure, implementing a couple of exercise types) my rate of progress this week was considerably improved.  In fact, by the end of the week I had added in all of the content and had completed an initial version of the web version of the app.  This included adding in some new quiz types, such as one that allows the user to reorder the sentences in a paragraph by dragging and dropping them, and also a simple multiple choice style quiz.  I also received some very useful feedback from members of the project team and made a number of refinements to the content based on this.
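
The sentence-reordering quiz is conceptually very simple: with jQuery UI it boils down to making the list sortable and then comparing the resulting order against the correct one, roughly as in the sketch below (the markup, IDs and data attributes are invented for the example rather than taken from the actual ARIES code):

declare const $: any; // jQuery and jQuery UI assumed to be loaded on the page
// Each sentence is an <li> with a data-order attribute giving its correct position
$('#sentence-quiz').sortable(); // lets the user drag the sentences into a new order
$('#check-order').on('click', function () {
  const current = $('#sentence-quiz li').map(function () {
    return $(this).data('order');
  }).get();
  const correct = current.every((value: number, i: number) => value === i + 1);
  $('#feedback').text(correct ? 'Well done!' : 'Not quite - try again.');
});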

This included updating the punctuation quiz so that if you give three incorrect answers a ‘show answer’ button is displayed.  Clicking on this fills in all of the answers and shows the ‘well done’ box.  This was rather tricky to implement, as the script needed to reset the question, removing all previous answers and ticks and resetting the initial letter case (selecting a full stop automatically capitalises the following letter).  I also implemented a workaround for answers where a space is acceptable: these no longer count towards the final tally of correct answers, so leaving a space rather than selecting a comma can now still result in the ‘well done’ message being displayed.  Again, this was rather tricky to implement, and it would be good if the team could test this quiz thoroughly to make sure there aren’t any occasions where it breaks.
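
Stripped of the ARIES-specific details, the ‘show answer’ behaviour amounts to counting wrong attempts and, once there have been three, offering a button that resets the question and fills in the correct answers.  A very rough sketch, with invented selectors and data attributes rather than the real ARIES markup:

declare const $: any; // jQuery assumed to be loaded on the page
let wrongAttempts = 0;
// Called whenever the user selects the wrong punctuation mark for a slot
function onWrongAnswer(): void {
  wrongAttempts++;
  if (wrongAttempts >= 3) {
    $('#show-answer').show(); // only offered after three incorrect attempts
  }
}
$('#show-answer').on('click', function () {
  // Reset the question: clear earlier answers and ticks (capitalisation handling omitted)
  $('.answer-slot').empty().removeClass('correct incorrect');
  // Fill in the correct mark for each slot, then declare the question complete
  $('.answer-slot').each(function () { $(this).text($(this).data('answer')); });
  $('#well-done').show();
});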

I also improved navigation throughout the app.  I added ‘next’ buttons to all of the quizzes, which either take you to the next section, or to the next part of the quiz, as applicable.  I think this works much better than just having the option to return to the page the quiz was linked from.  I also added in a ‘hamburger’ button to the footer of every page within a section.  Pressing on this takes you to the section’s contents page, and I added ‘next’ and ‘previous’ buttons to the contents pages too, so you can navigate between sections without having to go back to the homepage.

I spent a bit of time fixing the drag / drop quizzes so that the draggable boxes were constrained to each exercise’s boundaries.  This seemed to work well until I got to the references quiz, which has quite long sections of draggable text.  With the constraint in place it became impossible for the part of the draggable button that triggers the drop to reach the boxes nearest the edges of the question, as no part of the button could cross the boundary.  So, rather annoyingly, I had to remove this feature and just allow people to drag the buttons all over the page.  Dropping a button from one question into another now always gives an incorrect answer, though, so it’s not too big a problem.
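
For context, the constraint in question was jQuery UI’s ‘containment’ option, which keeps the whole draggable element inside a named container – exactly why the long reference buttons could no longer reach drop targets at the edges.  The replacement behaviour is simply unconstrained dragging, with the drop handler rejecting answers that belong to a different question (the selectors here are invented for illustration):

declare const $: any; // jQuery and jQuery UI assumed to be loaded on the page
// The abandoned approach: every part of the button had to stay inside its own exercise
// $('.draggable-answer').draggable({ containment: '.exercise' });
// The current approach: free dragging, with cross-question drops marked as incorrect
$('.draggable-answer').draggable();
$('.drop-target').droppable({
  drop: function (_event: any, ui: any) {
    const sameQuestion = $(this).closest('.exercise').is(ui.draggable.closest('.exercise'));
    $(this).toggleClass('incorrect', !sameQuestion);
  }
});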

With all of this in place I’ll start working on the app version of the resource next week and will hopefully be able to submit it to the app stores by the end of the week, all being well.

In addition to my work on ARIES, I completed some other tasks for a number of other projects.  For Mapping Metaphor I created a couple of scripts for Wendy that output some statistics about the metaphorical connections in the data.  For the Thesaurus of Old English I created a facility to enable staff to create new categories and subcategories (previously it was only possible to edit existing categories or add / edit / remove words from existing categories).  I met with Nigel Leask and some of the Curious Travellers team on Friday to discuss some details for a new post associated with this project.  I had an email discussion with Ronnie Young about the Burns database he wants me to make an online version of.  I also met with Jane Stuart-Smith and Rachel MacDonald, who is the new RA for the SPADE project, and set up a user account for Rachel to manage the project website.  I had a chat with Graeme Cannon about a potential project he’s helping put together that may need some further technical input, and I also updated the DSL website and responded to a query from Ann Ferguson regarding a new section of the site.

I also spent most of a day working on the Edinburgh Gazetteer project, during which I completed work on the new ‘keywords’ feature.  It was great to be able to do this, as I had been intending to work on it last week but just didn’t have the time.  I took Rhona’s keywords spreadsheet, which had the page ID in one column and keywords separated by semi-colons in another, and created two database tables to hold the information (one for the keywords themselves and a joining table to link keywords to individual pages).  I then wrote a little script that went through the spreadsheet, extracted the information and added it to the database, and then set to work on adding the actual feature to the website.
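
The import itself was trivial: in essence the script just splits each keyword cell on the semi-colons and builds rows for the two tables.  A sketch with made-up structures is below; the real script runs against the database rather than returning in-memory lists:

// One spreadsheet row: a page ID and a semi-colon-separated list of keywords
interface Row { pageId: number; keywords: string; }
// Build a de-duplicated keyword list plus the page-to-keyword links for the joining table
function buildTables(rows: Row[]) {
  const keywordIds = new Map<string, number>();               // keyword table
  const links: { pageId: number; keywordId: number }[] = [];  // joining table
  for (const row of rows) {
    for (const raw of row.keywords.split(';')) {
      const keyword = raw.trim();
      if (!keyword) continue;
      if (!keywordIds.has(keyword)) keywordIds.set(keyword, keywordIds.size + 1);
      links.push({ pageId: row.pageId, keywordId: keywordIds.get(keyword)! });
    }
  }
  return { keywordIds, links };
}
// e.g. buildTables([{ pageId: 12, keywords: 'sedition; trial; Edinburgh' }])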

The index page of the Gazetteer now has a section where all of the keywords are listed.  There are more than 200 keywords, so it’s a lot of information; currently they appear as ‘bricks’ in a scrollable section, but this might need to be updated as it’s perhaps a bit overwhelming.  If you click on a keyword a page loads that lists all of the pages the keyword is associated with.  When you load a specific page, either from a keyword page or from the regular browse option, there’s now a section above the page image that lists the associated keywords, and clicking on one of these loads the keyword’s page, allowing you to access any other pages associated with it.  It’s a pretty simple system but it works well enough.  The actual keywords need a bit of work, though, as some are too specific and there are some near duplicates caused by typos and the like.  Rhona is going to send me an updated spreadsheet and I will hopefully upload this next week.

Oh yes, it was five years ago this week that I started in this post.  How time flies.