Week Beginning 14th December 2020

This was my last week before the Christmas holidays, and it was a four-day week as I’d taken Friday off to use up some unspent holidays.  Despite only being four days long it was a very hectic week, as I had lots of loose ends to tie up before the launch of the new Anglo-Norman Dictionary website on Wednesday.  This included tweaking the appearance of ‘Edgloss’ tags to ensure they always have brackets (even if they don’t in the XML), updating the forms to add line breaks between parts of speech and updating the source texts pop-ups and source texts page to move the information about the DEAF website.

I also added in a lot of the ancillary page data, including the help text, various essays, the ‘history’ page, copyright and privacy pages, the memorial lectures and the multi-section ‘introduction to the AND’.  I didn’t quite manage to get all of the links working in the latter and I’ll need to return to this next year.  I also overhauled the homepage and footer, adding in the project’s Twitter feed, a new introduction and adding links to Twitter and Facebook to the footer.

I also identified and fixed an error with the label translations, which were sometimes displaying the wrong translation.  My script that extracted the labels was failing to grab the sense ID for subsenses.  This ID is only used to pull out the appropriate translation, but because of the failure the ID of the last main sense was being used instead.  I therefore had to update my script and regenerate the translation data.  I also updated the label search to add in citations as well as translations.  This means the search results page can get very long as both labels and translations are applied at sense level, so we end up with every citation in a matching sense listed, but apparently this is what’s wanted.

I also fixed the display of ‘YBB’ sources, which for some unknown reason are handled differently to all other sources in the system and fixed the issue with deviant forms and their references and parts of speech.

On Wednesday we made the site live, replacing the old site with the new one, which you can now access here:  https://anglo-norman.net/.  It wasn’t entirely straightforward to get the DNS update working, but we got there in the end, and after making some tweaks to paths and adding in Google Analytics the site was ready to use, which is quite a relief.  There is still a lot of work to do on the site, but I’m very happy with the progress I’ve made since I began the redevelopment in October.

Also this week I set up a new website for phase two of the ‘Editing Burns for the 21st Century’ project and upgraded all of the WordPress sites I manage to the most recent version.  I also arranged a meeting with Jane Stuart-Smith to discuss a new project in the New Year, replied to Kirsteen McCue about a proposal she’s finishing off, replied to Simon Taylor about a new place-name project he wants me to be involved with and replied to Carolyn Jess-Cooke about a project of hers that will be starting next year.

That’s all for 2020.  Here’s hoping 2021 is not going to be quite so crazy!

Week Beginning 7th December 2020

I spent most of the week working on the Anglo-Norman Dictionary as we’re planning on launching this next week and there was still much to be done before that.  One of the big outstanding tasks was to reorder all of the citations in all senses within all entries so they are listed by their date.  This was a pretty complex task as each entry may contain any number of senses of up to four different types:  main senses, subsenses, and main senses and subsenses within locutions.  My script needed to be able to extract the dates for each citation within each of these blocks, figure out their date order, rearrange the citations by this order and then overwrite the XML section with the reordered data.  Any loss or mangling of the data would be disastrous, and with almost 60,000 entries being updated it would not be possible to manually check that everything worked in all circumstances.
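The reordering step can be sketched in miniature like this (a hypothetical example in Python rather than the site’s PHP, assuming each ‘attestation’ element carries a simple numeric ‘date’ attribute – the real AND data needs proper date extraction, and a real sense contains other elements that must be preserved):

```python
import xml.etree.ElementTree as ET

def reorder_citations(sense: ET.Element) -> None:
    """Sort the <attestation> children of a sense by date, in place.
    Assumes a numeric 'date' attribute on each attestation."""
    attestations = sense.findall('attestation')
    ordered = sorted(attestations, key=lambda a: int(a.get('date', '0')))
    # remove the unordered attestations, then re-append them in date order
    for a in attestations:
        sense.remove(a)
    for a in ordered:
        sense.append(a)

sense = ET.fromstring(
    '<sense><attestation date="1300"/><attestation date="1250"/>'
    '<attestation date="1280"/></sense>'
)
reorder_citations(sense)
print([a.get('date') for a in sense.findall('attestation')])
# ['1250', '1280', '1300']
```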

Updating the XML proved to be a little tricky as I had been manipulating the data with PHP’s simplexml functions and these don’t include a facility to replace a child node.  This meant that I couldn’t tell the script to identify a sense and replace its citations with a new block.  In addition, the XML was not structured to include a ‘citations’ element containing all of the individual citations for an entry but instead just listed each citation as an ‘attestation’ element within the sense, so it wasn’t straightforwardly possible to replace the block of citations with an updated one.  Instead I needed to reconstruct the sense XML in its entirety, including both the complete set of citations and all other elements and attributes contained within the sense, such as IDs, categories and labels.  With a completely new version of the sense XML stored in memory by the script I then needed to write this to the XML, and for this I needed to use PHP’s DOM manipulation functions because (as mentioned earlier) simplexml has no means of identifying and replacing a child node.

I managed to get a version of my script working and all seemed to be well with the entries I was using for test purposes, so I ran the script on the full dataset and replaced the data on the website (ensuring that I kept a record of the pre-reordered data handy in case of any problems).  When the editors reviewed the data they noticed that while the reordering had worked successfully for some senses, it had not reordered others.  This was a bit strange and I therefore had to return to my script to figure out what had gone wrong.  I noticed that only the citations in the first sense / subsense / locution sense / locution subsense had been reordered, with the others being skipped.  But when I commented out the part of the script that updated the XML, all senses were successfully being picked out.  This seemed strange to me as I didn’t see why the act of identifying senses should be affected by the writing of data.  After some investigation I discovered that with PHP’s simplexml implementation, if you iterate through nodes using a ‘foreach’ and then update the item picked out by the loop (so, for example, in ‘foreach($sense as $s)’, updating $s) then subsequent iterations fail.  It would appear that updating $s in this example changes the XML string that’s loaded into memory, which then means the loop reckons it’s reached the end of the matching elements and stops.  My script had different loops for going through senses / subsenses / locution senses / locution subsenses, which is why the first of each type was being updated while the others weren’t.  After I figured this out I updated my script to use a ‘for’ loop instead of a ‘foreach’, storing $s within the scope of the loop only, and this worked.  With the change in place I reran the script on the full dataset and uploaded it to the website, and thankfully all appears to have worked.
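The underlying pitfall – mutating a collection while you’re iterating over it – isn’t unique to PHP’s simplexml.  Python lists show exactly the same skipping behaviour, which makes for a quick illustration:

```python
items = [1, 2, 3, 4]
for x in items:
    # removing the current item shifts everything left,
    # so the loop's internal index skips the next item
    items.remove(x)
print(items)  # [2, 4] -- only every other item was removed
```

The fix is the same in both languages: iterate over a stable snapshot (or by index over a fixed range) rather than over the live collection you’re modifying.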

For the rest of the week I worked through my ‘to do’ list, ticking items off. I updated the ‘Blog’ menu item to point to the existing blog site (this will eventually be migrated across).  The ‘Textbase’ menu item now loads a page stating that this feature will be added in 2021.  I managed to implement the ‘source texts’ page as it turns out that I’d already developed much of the underpinnings for this page whilst developing other features.  As with citation pop-ups, it links into the advanced search and also to the DEAF website.  I figured out how to ensure that words with accented characters in citation searches now appear separately in the list from their non-accented versions.  E.g. a search for ‘apres*’ now has ‘apres (28)’ separate from ‘après (4)’ and ‘aprés (2229)’.  We may need to think about the ordering, though, as accented characters are currently appearing at the end of the list.  I also made the words lower case here – they were previously being transformed into upper case.  Exact searches (surrounded by quotes) are still accent-sensitive.  This is required so that the link through from the list of forms to the search results works (otherwise the results would display all accented and non-accented forms).  I also ensured that word highlighting in snippets in results now works as it should with accented characters, and upper case initial letters are now retained too.
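The combination of accent-insensitive matching and accent-separated listing can be sketched using Unicode normalisation (a minimal illustration – the word list here is made up, and the real counts come from the citation search tables):

```python
import unicodedata
from collections import Counter

def fold(s: str) -> str:
    """Strip combining accents so 'aprés' matches a search for 'apres'."""
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c)).lower()

# hypothetical citation words
words = ['apres', 'aprés', 'aprés', 'après', 'Apres']
query = 'apres'

# match accent-insensitively, but group by the original (lower-cased)
# form so each accented variant is listed separately with its own count
counts = Counter(w.lower() for w in words if fold(w) == query)
print(counts)
```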

I added in an option to return to the list of forms (i.e. the intermediate page) from the search results.  In addition to ‘Refine your search’ there is also a ‘Select another form’ button, and I ensured that the search results page now still appears when a citation or translation search returns only one result.  I also figured out why multiple words were sometimes being returned in the citation and translation searches.  This was because what looked like spaces between words in the XML were sometimes not regular spaces but non-breaking space characters (\u00a0).  As my script split up citations and translations on spaces, these were not being picked up as divisions between words.  I needed to update my script to deal with these characters and then regenerate all of the citation and translation data in order to fix this.
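The non-breaking space problem is easy to reproduce – a quick sketch (in Python rather than the PHP the site actually uses):

```python
import re

citation = 'Le\u00a0roi est mort'  # \u00a0 is a non-breaking space
# splitting on a literal space leaves the first two words stuck together
print(citation.split(' '))         # ['Le\xa0roi', 'est', 'mort']
# splitting on any whitespace handles the NBSP as a word divider too
print(re.split(r'\s+', citation))  # ['Le', 'roi', 'est', 'mort']
```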

I also ensured that when conducting a label search the matching labels in an entry page are now highlighted and the page automatically scrolls down to the first matching label.  I also made several tweaks to the XSLT, ensuring that where there are no dates for citations the text ‘TBD’ appears instead and ensuring a number of tags that were not getting properly transformed were handled.

Also this week I made some final changes to the interactive map of Burns Suppers, including tweaking the site icon so it looks a bit nicer, adding a ‘read more’ button to the intro text, fixing the scrolling issue on small screens and updating the text to show 17 filters.  I fixed the issue with the attendance filter and also updated the layout of the filters so they look better on both monitors and mobile devices.

My other main task of the week was to restructure the Mapping Metaphor website based on suggestions for REF from Wendy and Carole.  This required a lot of work as the visualisations needed to be moved to different URLs and the Old English map, which was previously a separate site in a subdirectory, needed to be amalgamated with the main site.

I removed the top-level tabs that linked between MM, MMOE and MetaphorIC and also the ‘quick search’ box.  The ‘metaphor of the day’ page now displays both a main and an OE connection and the ‘Metaphor Map of English’ / ‘Metaphor Map of Old English’ text in the header has been removed.  I reworked the navigation bar in order to allow a sub-navigation bar to appear.  It is now positioned within the header and is centre-aligned.  ‘Home’ now features introductory text rather than the visualisation.  ‘About the project’ now has the new secondary menu rather than the old left-panel menu.  This is because the map pages couldn’t have links in the left-hand panel as it’s already used for something else, and it’s better to have the sub-menu displaying consistently across different sections of the site.  I updated the text within several ‘About’ pages and ‘How to Use’, which also now has the new secondary menu.

The main metaphor map is now in the ‘Metaphor Map of English’ menu item, with sub-menu items for ‘search’ and ‘browse’.  The OE metaphor map is now in the ‘Metaphor Map of Old English’ menu item, also with sub-menu items for ‘search’ and ‘browse’.  The OE pages retain their purple colour to make a clear distinction between the OE map and the main one.  MetaphorIC retains the top-level navigation bar but now only features one link back to the main MM site.  This is right-aligned to avoid getting in the way of the ‘Home’ icon that appears in the top left of sub-pages.  The new site replaced the old one on Friday and I also ensured that all of the old URLs continue to work (e.g. the ‘cite this’ links will continue to work).

Week Beginning 30th November 2020

I took Friday off again this week as I needed to go and collect a new pair of glasses from my opticians in the West End, which is quite a trek from my house.  Although I’d taken the day off I ended up working for about three hours, as on Thursday Fraser Dallachy emailed me to ask about the location of the semantically tagged EEBO dataset that we’d worked on a couple of years ago.  I didn’t have this at home but I was fairly certain I had it on a computer in my office so I decided to take the opportunity to pop in and locate the data.  I managed to find a 10Gb tar.gz file containing the data on my desktop PC, along with the unzipped contents (more than 25,000 files) in another folder.  I’d taken an empty external hard drive with me and began the process of copying the data, which took hours.  I’d also remembered that I’d developed a website where the tagged data could be searched and that this was on the old Historical Thesaurus server, but unfortunately it no longer seemed to be accessible.  I also couldn’t seem to find the code or data for it on my desktop PC, but I remembered that I’d previously set up one of the four old desktop PCs I have sitting in my office as a server and the system was running on this.  It took me a while to get the old PC connected and working, but I managed to get it to boot up.  It didn’t have a GUI installed so everything needed to be done at the command line, but I located the code and the database.  I had planned to copy this to a USB stick, but the server wasn’t recognising USB drives (in either NTFS or FAT format) so I couldn’t actually get the data off the machine.  I decided therefore to install Ubuntu Linux on a bootable USB stick and to get the old machine to boot into this rather than run the operating system on the hard drive.  Thankfully this worked and I could then access the PC’s hard drive from the GUI that ran from the USB stick.  I was able to locate the code and the data and copy them onto the external hard drive, which I then left somewhere that Fraser would be able to access it.  Not a bad bit of work for a supposed holiday.

As with previous weeks, I split my time mostly between the Anglo-Norman Dictionary and the Dictionary of the Scots Language.  For the AND I finally updated the user interface.  I added in the AND logo and updated the colour schemes to reflect the colours used in the logo.  I’m afraid the colours used in the logo seem to be straight out of a late 1990s website so unfortunately the new interface has that sort of feel about it too.  The header area now has a white background as the logo needs a white background to work.  The ‘quick search’ option is now better positioned and there is a new bar for the navigation buttons.  The active navigation button and other site buttons are now the ‘A’ red, panels are generally the ‘N’ blue and the footer is the ‘D’ green.  The main body is now slightly grey so that the entry area stands out from it.  I replaced the header font (Cinzel) with Cormorant Garamond as this more closely resembles the font used in the logo.

The left-hand panel has been reworked so that entries are smaller and their dates are right-aligned.  I also added stripes to make it easier to keep your eye on an entry and its date.  The fixed header that appears when you scroll down a longer entry now features the AND logo.  The ‘Top’ button that appears when you scroll down a long entry now appears to the right so it doesn’t interfere with the left-hand panel.  The footer now only features the logos for Aberystwyth and the AHRC and these appear on the right, with links to some pages on the left.

I have also updated the appearance of the ‘Try an Advanced Search’ button so it only appears on the ‘quick search’ results page (which is what should have happened originally).  I also removed the display of the semantic tags that are present in the XML but still need to be edited out of it.  I have also ticked a few more things off my ‘to do’ list, including replacing underscores with spaces in parts of speech and language tags and replacing ‘v.a.’ and ‘v.n.’ as requested.  I also updated the autocomplete styles (when you type into the quick search box) so they fit in with the site a bit better.

I then began looking into reordering the citations in the entries so they appear in date order within their senses, but I remembered that Geert wanted some dates to be batch processed and realised that this should be attempted first.  I had a conversation with Geert about this, but the information he sent wasn’t well structured enough to be used and it looks like the batch updating of dates will need to wait until after the launch.  Instead I moved on to updating the source text pop-ups in the entry.  These feature links to the DEAF website and a link to search the AND entries for all others that feature the source.

On the old site the DEAF links linked through to another page on the old site that included the DEAF text and then linked through to the DEAF website.  I figured it would be better to cut out this middle stage and link directly through to DEAF.  This meant figuring out which DEAF page should be linked to and formatting the link so their page jumps to the right place.  I also added in a note about the link under it.

This was pretty straightforward, but the ‘AND Citations’ link was not.  On the old site clicking on this link ran a search that displayed the citations, and we had nothing comparable to this developed for the new site, so I needed to update the citation search to allow the user to search based on the sigla (source texts).  This in turn meant updating my citations table to add a field for holding the citation siglum, regenerating the citations and citation search words, and then updating the API to allow a citation search to be limited by a siglum ID.  I then updated the ‘Citations’ tab of the ‘Advanced Search’ page to add a new box for ‘citation siglum’.  This is an autocomplete box – you type some text and a list of matching sigla is displayed, from which you can select one.  This in turn meant updating the API to allow the sigla to be queried for this autocomplete.  For example, type ‘a-n’ into the box and a list of all sigla containing this text is displayed.  Select ‘A-N Falconry’ and you can then find all entries where this siglum appears.  You can also combine this with citation text and date (although the latter won’t be much use).
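The autocomplete itself boils down to a case-insensitive substring match over the list of sigla.  A minimal sketch (the sigla list and function are made up for illustration – the real lookup is a database query behind the API):

```python
# hypothetical sigla; the real list comes from the AND database via the API
sigla = ['A-N Falconry', 'A-N Med', 'YBB Ed I', 'Rot Parl']

def autocomplete(term: str, limit: int = 10) -> list:
    """Return sigla containing the typed text, matched case-insensitively."""
    t = term.lower()
    return [s for s in sigla if t in s.lower()][:limit]

print(autocomplete('a-n'))  # ['A-N Falconry', 'A-N Med']
```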

I’ve also tweaked the search results tab on the entry page so that the up and down buttons don’t appear if you’re at the top or bottom of the results, and I’ve ensured that if you’re looking at an entry towards the end of the results a sufficient number of results before the one you’re looking at appear.  I’ve also ensured that the entry lemma and hom appear in the <title> of the web page (in the browser tab) so you can easily tell which tab contains which entry.

For the DSL I spent some time answering emails about a variety of issues.  I also completed my work on the issue of accents in the search, updating the search forms so that any accented characters a user types are converted to their non-accented versions before the search runs, ensuring that someone searching for ‘Privacé’ will find all instances of ‘privace’ in the full text.  I also tweaked the wording of the search results to remove the ‘supplementary’ text from it, as all supplementary items have now either been amalgamated or turned into main entries.  I also put in redirects from all of the URLs for the deleted child entries to their corresponding main entries.  This was rather time consuming to do, as I needed to go through each deleted child entry, get each of its URLs, then get the main URL of the corresponding main entry, and add these to a new database table.  I then added a new endpoint to the V4 API that accepts a child URL, checks the database for any main URL and returns this.  Finally, I updated the entry page so that the URL is passed to this new redirect-checking endpoint, and if it matches a deleted item the page redirects to the proper URL.
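The redirect flow reduces to a simple lookup.  A sketch with hypothetical URL slugs (the real check is the V4 API endpoint backed by the new database table):

```python
# hypothetical mapping of deleted child-entry URLs to their main entries
redirects = {
    'snd/child_old': 'snd/main_entry',
    'dost/child_x':  'dost/main_y',
}

def redirect_target(slug: str):
    """Return the main-entry URL if this slug belonged to a deleted
    child entry, or None so the entry page loads normally."""
    return redirects.get(slug)

print(redirect_target('snd/child_old'))   # snd/main_entry
print(redirect_target('snd/main_entry'))  # None -- not a deleted child
```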

Also this week I had a conversation with Wendy Anderson about updates to the Mapping Metaphor website.  I had thought these would just be some simple tweaks to the text of existing pages, but instead the site structure needs to be updated, which might prove to be tricky.  I’m hoping to be able to find the time to do this next week.

Finally, I continued to work on the Burns Supper map, adding in the remaining filters.  I also fixed a few dates, added in the introductory text and a ‘favicon’.  I still need to work on the layout a bit, which I’ll hopefully do next week, but the bulk of the work for the map is now complete.