Week Beginning 6th September 2021

I spent more than a day this week preparing my performance and development review form.  It’s the first time there’s been a PDR since before Covid, and it took some time to prepare everything.  Thankfully this blog provides a good record of everything I’ve done, so I could base my form almost entirely on the material found here, which helped considerably.

Also this week I investigated and fixed an issue with the SCOTS corpus for Wendy Anderson.  One of the transcriptions featuring two speakers had the speaker IDs the wrong way round compared to the IDs in the metadata.  This was slightly complicated to sort out, as I wasn’t sure whether it was better to change the participant metadata to match the IDs used in the text or vice versa.  It turned out to be very difficult to change the IDs in the metadata as they are used to link numerous tables in the database, so instead I updated the text that’s displayed.  Rather strangely, the ‘download plain text’ file contained different incorrect IDs.  I fixed this as well, but it does make me worry that the IDs might be off in other plain text transcriptions too.  I looked at a couple of others and they seem fine, though, so perhaps it’s an isolated case.
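Swapping the IDs over in the displayed text is a slightly fiddlier job than it sounds, because a naive pair of find-and-replace passes would turn both IDs into the same one. Here’s a minimal sketch of the general approach, using made-up IDs and a hypothetical function name rather than the actual SCOTS code:

```typescript
// Swap two speaker IDs throughout a plain text transcription. Replacing idA with
// idB and then idB with idA would clobber the first pass, so idA is routed
// through a placeholder that is assumed not to occur in the text.
function swapSpeakerIds(text: string, idA: string, idB: string): string {
  const placeholder = "\u0000SWAP\u0000";
  return text
    .split(idA).join(placeholder)  // idA -> placeholder
    .split(idB).join(idA)          // idB -> idA
    .split(placeholder).join(idB); // placeholder -> idB
}

// Purely illustrative IDs, not the real SCOTS participant identifiers:
// swapSpeakerIds(transcription, "F1189", "M1190");
```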

I was contacted this week by a lecturer in English Literature who is intending to put together a proposal for a project to transcribe an author’s correspondence, and I spent some time writing a lengthy email with some helpful advice.  I also spoke to Jennifer Smith about her ‘Speak for Yersel’ project that’s starting this month, and we arranged to have a meeting the week after next.  I spent quite a bit of time continuing to work on mockups for the STAR project’s websites based on the feedback I’d received on the mockups I completed last week, creating another four with different colours, fonts and layouts, which should give the team plenty of options to choose from.  I also received more than a thousand new page images of library registers for the Books and Borrowing project, which I processed and uploaded to the server.  I’ll need to generate page records for them next week.

Finally, I continued to make updates to the Textbase search facilities for the Anglo-Norman Dictionary.  I updated the genre headings to make them bigger and bolder, with more of a gap between each heading and the preceding items, added a larger indent to the items within a genre and reordered the genres based on a new suggested order.  For each book I included the siglum as a link through to the book’s entry on the bibliography page.  In the search results, where a result’s page has an underscore in it the reference now displays the volume and page number (e.g. 3_801 displays as ‘Volume 3, page 801’), and I updated the textbase text page so that page dividers in the continuous text also display volume and page in such cases.
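The volume and page handling is straightforward to sketch; the function name and the fallback wording for references without an underscore are just illustrative here, not the code actually used on the site:

```typescript
// Turn a page reference such as "3_801" into "Volume 3, page 801".
// References without an underscore are treated as a plain page number.
function formatPageReference(ref: string): string {
  const parts = ref.split("_");
  if (parts.length === 2) {
    return `Volume ${parts[0]}, page ${parts[1]}`;
  }
  return `Page ${ref}`;
}

// formatPageReference("3_801") -> "Volume 3, page 801"
// formatPageReference("45")    -> "Page 45"
```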

Highlighted terms in the textbase text page no longer have padding around them (which was causing what looked like spaces when a term appears mid-word).  The text highlighting is unfortunately a bit of a blunt instrument, as one of the editors discovered by searching for the terms ‘le’ and ‘fable’: the first term is located and highlighted before the second, so the ‘le’ in ‘fable’ is highlighted during the first sweep, and then ‘fable’ itself isn’t highlighted because the markup added for the ‘le’ highlighting means the text no longer matches the string ‘fable’.  In addition, ‘le’ matches some HTML buried in the text (such as ‘style’), which breaks the markup and is why some raw HTML is getting displayed.  I’m not sure much can be done about any of this without a massive reworking of things, but it’s only an issue when searching for things like ‘le’ rather than actual content words, so hopefully it’s not such a big deal.
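To make the problem concrete, here’s a stripped-down sketch of how this kind of sequential highlighting goes wrong; the function name and the highlighting markup are illustrative rather than the actual code used on the site:

```typescript
// Naive sequential highlighting: each term is wrapped in a <span> in turn, so
// the markup added for an earlier term can stop later terms from matching, and
// a short term such as "le" will also match inside HTML (e.g. "style").
function highlightTerms(html: string, terms: string[]): string {
  let result = html;
  for (const term of terms) {
    result = result.split(term).join(`<span class="highlighted">${term}</span>`);
  }
  return result;
}

// highlightTerms("une fable", ["le", "fable"]) produces
// 'une fab<span class="highlighted">le</span>'
// The first pass wraps the "le" inside "fable", so the second pass no longer
// finds the plain string "fable" and it is never highlighted.
```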

The editor also wondered whether it would be possible to add an option for searching and viewing multiple terms all together, but this would require me to rework the entire search and it’s not something I want to tackle if I can avoid it.  If a user wants to view the search results for different terms they can select two terms and open the full results in a new tab, repeating the process for each pair of terms they’re interested in and switching from tab to tab as required.  Next week I’ll need to rename some of the textbase texts and split one of the texts into two separate texts, which is going to require me to regenerate the entire dataset.