Week Beginning 20th September 2021

This was a four-day week for me as I’d taken Friday off.  I went into my office at the University on Tuesday to have my Performance and Development Review with my line manager Marc Alexander.  It was the first time I’d been at the University since before the summer and it felt really different to the last time – much busier and more back to normal, with lots of people in the building and a real bustle to the West End.  My PDR session was very positive and it was great to actually meet a colleague in person again – the first time I’d done so since the first lockdown began.  I spent the rest of the day trying to get my office PC up to date after months of inaction.  One of the STELLA apps (the Grammar one) had stopped working on iOS devices, seemingly because it was still a 32-bit app, and I wanted to generate a new version of it.  This meant upgrading macOS on my dual-boot PC, which I hadn’t used for years and was very out of date.  I’m still not sure whether the Mac I’ve got will support a version of macOS that allows app development, as I need to upgrade macOS incrementally, which takes quite some time; by the end of the day there were still further updates required.  I’ll need to continue with this another time.

I spent quite a bit of the remainder of the week working on the new ‘Speak for Yersel’ project.  We had a team meeting on Monday and a follow-up meeting on Wednesday with one of the researchers involved in the Manchester Voices project (https://www.manchestervoices.org/), who very helpfully showed us some of the data collection apps they use and some of the maps that they generate.  It gave us a lot to think about, which was great.  I spent some further time looking through other online map examples, such as the New York Times dialect quiz (https://www.nytimes.com/interactive/2014/upshot/dialect-quiz-map.html), and investigating how we might generate the maps we’d like to see.  It’s going to take quite a bit more research to figure out how all of this is going to work.

Also this week I spoke to the Iona place-names people about how their conference in December might be moved online, fixed a permissions issue with the Imprints of New Modernist Editing website and discussed the domain name for the STAR project with Eleanor Lawson.  I also had a chat with Luca Guariento about the restrictions we have on using technologies on the servers in the College of Arts and how these might be addressed.

I also received a spreadsheet of borrowing records covering five registers for the Books and Borrowing project and went through it to figure out how the data might be integrated with our system.  The biggest issue is figuring out which page each record is on.  In the B&B system each borrowing record must ‘belong’ to a page, which in turn ‘belongs’ to a register; if a borrowing record has no page it can’t exist in the system.  In this new data only three registers have a ‘Page No.’ column, and not every record in these registers has a value in this column.  We’ll need to figure out what can be done about this because, as I say, having a page is mandatory in the B&B system.  We could use the ‘photo’ column, as this is present in every register and every row.  However, I noticed that there are multiple photos per page (e.g. for SL137144, page 2 has two photos, 4538 and 4539), so photo IDs don’t have a 1:1 relationship with pages.  If we can think of a way to address the page issue then I should be able to import the data.
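
Just to get a feel for the problem, here’s a minimal sketch of how rows might be grouped by photo ahead of any import.  The column headings (‘Register’, ‘Photo’) and the CSV export are assumptions for illustration, not the real spreadsheet’s details:

```python
import csv
from collections import OrderedDict

def group_rows_by_photo(path):
    """Group borrowing rows into provisional 'pages' keyed on the
    photo column, the only value present in every register and row.
    The column names here are assumptions for illustration."""
    groups = OrderedDict()
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            groups.setdefault((row['Register'], row['Photo']), []).append(row)
    return groups

# Each group could become a provisional page record, but as photo IDs
# aren't 1:1 with pages (photos 4538 and 4539 both show page 2 of
# SL137144), neighbouring groups would still need merging before any
# real page records were created.
```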

Finally, I continued to work on the Anglo-Norman Dictionary project, fixing some issues relating to yoghs in the entries and researching a potentially large issue relating to the extraction of earliest citation dates.  Apparently there are a number of cases where the date that should be used for a citation is not the date coded in the date section of the citation’s XML, but rather a date taken from a manuscript containing a variant form within the citation.  The problem is that there is no flag to indicate when this situation occurs; instead, it applies whenever the form of the word is markedly different within the citation but similar in the variant text.  It seems unlikely that an automated script would be able to ascertain when to use the variant date, as there is just so much variation between the forms.  This will need some further investigation, which I hope to be able to do next week.
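
Just to illustrate the sort of heuristic that might flag candidate citations for manual checking (rather than deciding anything automatically), here’s a rough sketch using simple string similarity; all of the names and the threshold are hypothetical:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude 0-1 similarity between two word forms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_for_review(headword_form, citation_form, variant_form, margin=0.2):
    """Flag a citation when the form found in the citation looks much
    closer to the variant manuscript's form than to the headword,
    which is roughly the situation described above. A heuristic for
    triage only; given the amount of variation between forms it
    could never decide the date by itself."""
    return (similarity(citation_form, variant_form)
            - similarity(citation_form, headword_form)) > margin
```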

Week Beginning 13th September 2021

This week I attended a team meeting for the STAR project (via Teams) where we discussed how the project is progressing in these early weeks, including the various mockups I’d made for the main and academic sites.  We decided to mix and match various elements from the different mockups and I spent a bit of time making a further two possibilities based on our discussions.  I also had an email conversation with Jennifer Smith and E Jamieson about some tweaks to how data are accessed in the Scots Syntax Atlas, but during our discussions it turned out that what they wanted was already possible with the existing systems, so I didn’t need to do anything, which was great.  I also had a chat with Marc Alexander about my PDR, which I will be having next week – actually in person, which will be the first time I’ve seen anyone from work in the flesh since the first lockdown began.  Also this week I read through all of the documentation for the ‘Speak for Yersel’ project, which begins this month.  We have a project meeting via Zoom next Monday, and I’ll be developing the systems and website for the project later on this year.

I spent the rest of the week continuing to work on the Anglo-Norman Dictionary site.  There was an issue with the server during the week and it turned out that database requests from the site to the AND API were being blocked at server level.  I had to speak to Arts IT Support about this and thankfully all was reinstated.  It also transpired that the new server we’d ordered for the AND had been delivered in August and I had to do some investigation to figure out what had happened to it.  Hopefully Arts IT Support will be able to set it up in the next couple of weeks and we’ll be able to migrate the AND site and all of its systems over to the new hardware soon after.  This should hopefully help with both the stability of the site and its performance.

I also made a number of updates to both the systems and the content of the Textbase this week, based on feedback I received last week.  Geert wanted one of the texts to be split into two individual texts so that they aligned with the entries in the bibliography, and it took a bit of time to separate them out.  This required splitting the XML files and also updating all of the page records and search data relating to the pages.  Upon completion it all seems to be working fine, both for viewing and searching.  I also updated a number of the titles of the Textbase texts, removed the DEAF link sections from the various Textbase pages and ensured that the ‘siglum’ (the link through to the AND bibliography with an abbreviated form of the text’s title) appeared in both the list of texts within the search form and in the text names that appear in the search results.  I also changed all occurrences of ‘AND source’ to ‘AND bibliography’ for consistency and removed one of the texts that had somehow been added to the Textbase when it shouldn’t have been (probably because someone many moons ago had uploaded its XML file to the text directory temporarily and had then forgotten to remove it).
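
For what it’s worth, the splitting step looks conceptually like the sketch below.  It assumes the text’s top-level divisions are <div> elements, which is a guess at the structure for illustration rather than a description of the real files, and the real job also involved regenerating all of the page and search records:

```python
from lxml import etree

def split_text(path, split_index, out_a, out_b):
    """Write two new XML files from one source: the first keeps the
    divisions before split_index, the second keeps the rest."""
    for out_path, keep_first in ((out_a, True), (out_b, False)):
        tree = etree.parse(path)
        divs = tree.getroot().findall('.//div')
        drop = divs[split_index:] if keep_first else divs[:split_index]
        for div in drop:
            div.getparent().remove(div)
        tree.write(out_path, encoding='utf-8', xml_declaration=True)
```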

Week Beginning 6th September 2021

I spent more than a day this week preparing my performance and development review form.  It’s the first time there’s been a PDR since before covid and it took some time to prepare everything.  Thankfully this blog provides a good record of everything I’ve done so I could base my form almost entirely on the material found here, which helped considerably.

Also this week I investigated and fixed an issue with the SCOTS corpus for Wendy Anderson.  One of the transcriptions featuring two speakers had the speaker IDs the wrong way round compared to the IDs in the metadata.  This was slightly complicated to sort out as I wasn’t sure whether it was better to change the participant metadata to match the IDs used in the text or vice-versa.  It turned out to be very difficult to change the IDs in the metadata as they are used to link numerous tables in the database, so instead I updated the text that’s displayed.  Rather strangely, the ‘download plain text’ file contained different incorrect IDs.  I fixed this as well, but it does make me worry that the IDs might be off in other plain text transcriptions too.  However, I looked at a couple of others and they seem OK, so perhaps it’s an isolated case.
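
Swapping two IDs throughout a text is one of those jobs where a naive pair of find-and-replace operations goes wrong, as the second replacement clobbers the first.  A minimal sketch of the safe version, with the ID values purely hypothetical:

```python
def swap_speaker_ids(text, id_a, id_b):
    """Swap every occurrence of two speaker IDs in a transcription.
    Replacing id_a with id_b and then id_b with id_a would turn every
    ID into id_a, so a placeholder that cannot occur in the text is
    used as an intermediate step."""
    placeholder = '\x00SWAP\x00'
    return (text.replace(id_a, placeholder)
                .replace(id_b, id_a)
                .replace(placeholder, id_b))

# e.g. swap_speaker_ids(transcript, 'F1234', 'M5678')
```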

I was contacted this week by a lecturer in English Literature who is intending to put a proposal together for a project to transcribe an author’s correspondence, and I spent some time writing a lengthy email with some helpful advice.  I also spoke to Jennifer Smith about her ‘Speak for Yersel’ project that’s starting this month, and we arranged to have a meeting the week after next.  I also spent quite a bit of time continuing to work on mockups for the STAR project’s websites, based on feedback I’d received on the mockups I completed last week.  I created another four mockups with different colours, fonts and layouts, which should give the team plenty of options to choose from.  I also received more than a thousand new page images of library registers for the Books and Borrowing project, processed these and uploaded them to the server.  I’ll need to generate page records for them next week.

Finally, I continued to make updates to the Textbase search facilities for the Anglo-Norman Dictionary.  I updated the genre headings to make them bigger and bolder, with more of a gap between each heading and the preceding items.  I also added a larger indent to the items within a genre and reordered the genres based on a new suggested order.  For each book I included the siglum as a link through to the book’s entry on the bibliography page.  In the search results, where a result’s page reference has an underscore in it the reference now displays volume and page number (e.g. 3_801 displays as ‘Volume 3, page 801’).  I updated the textbase text page so that page dividers in the continuous text also display volume and page in such cases.
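
The volume-and-page display is just a small formatting rule; a sketch of the logic, assuming the page references are stored as plain strings:

```python
def format_page_reference(page):
    """Turn a stored reference like '3_801' into 'Volume 3, page 801';
    references without an underscore are returned unchanged."""
    if '_' in page:
        volume, number = page.split('_', 1)
        return f'Volume {volume}, page {number}'
    return page
```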

Highlighted terms in the textbase text page no longer have padding around them (which was causing what looked like spaces when a term appears mid-word).  The text highlighting is unfortunately a bit of a blunt instrument, as one of the editors discovered by searching for the terms ‘le’ and ‘fable’:  term 1 is located and highlighted first, then term 2.  Here the first term is ‘le’ and the second is ‘fable’, so the ‘le’ in ‘fable’ is highlighted during the first sweep, and then ‘fable’ itself isn’t highlighted because it has already had the markup for the ‘le’ highlighting added to it and no longer matches ‘fable’.  Also, ‘le’ matches some HTML tags buried in the text (‘style’), which then breaks the markup, which is why some raw HTML is getting displayed.  I’m not sure much can be done about any of this without a massive reworking of things, but it’s only an issue when searching for things like ‘le’ rather than actual content words, so hopefully it’s not such a big deal.
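
If it ever did need tackling, one approach (a sketch only, not what the site currently does) would be to match all of the terms in a single pass, longest first and on word boundaries, after splitting the HTML tags out so they can’t be matched:

```python
import re

def highlight(html, terms):
    """Wrap each search term in a highlight span. Matching in one
    pass, longest term first and on word boundaries, stops 'le' from
    swallowing the 'le' inside 'fable'; splitting on tags first means
    nothing inside the HTML markup itself gets touched."""
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r'\b(' + '|'.join(map(re.escape, ordered)) + r')\b',
                         re.IGNORECASE)
    parts = re.split(r'(<[^>]+>)', html)  # keeps tags as separate chunks
    return ''.join(part if part.startswith('<')
                   else pattern.sub(r'<span class="highlight">\1</span>', part)
                   for part in parts)

# highlight('le conte de la <i>fable</i>', ['le', 'fable'])
```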

The editor also wondered whether it would be possible to add an option for searching for and viewing multiple terms all together, but this would require me to rework the entire search and it’s not something I want to tackle if I can avoid it.  If a user wants to view the search results for different terms they can select two terms and open the full results in a new tab, repeating the process for each pair of terms they’re interested in and switching from tab to tab as required.  Next week I’ll need to rename some of the textbase texts and split one of the texts into two separate texts, which is going to require me to regenerate the entire dataset.

Week Beginning 30th August 2021

This week I completed work on the proximity search of the Anglo-Norman textbase.  Thankfully the performance issues I’d feared might crop up haven’t occurred at all.  The proximity search allows you to search for term 1 up to 10 words to the left or right of term 2 using ‘after’ or ‘before’.  If you select ‘after or before’ then (as you might expect) the search looks 10 words in each direction.  This ties in nicely with the KWIC display, which displays 10 words either side of your term.  As mentioned last week, unless you search for exact terms (surrounded by double quotes) you’ll reach an intermediary page that lists all possible matching forms for terms 1 and 2.  Select one of each and you can press the ‘Continue’ button to perform the actual search.  What this does is find all occurrences of term 2 (term 2 is the fixed anchor point; it’s term 1 that can be variable in position), then for each one it checks the necessary words before or after (or before and after) the term for the presence of term 1.  When generating the search words I generated and stored the position at which each word appears on its page, which made it relatively easy to pinpoint nearby words.  What is trickier is dealing with words near the beginning or the end of a page, as in such cases the next or previous page must also be looked at.  I hadn’t previously generated a total count of the number of words on a page, which was needed to ascertain whether a word was close to the end of the page, so I ran a script that generated and stored the word count for each page.  The search seems to be working as it should for words near the beginning and end of a page.
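
In outline, the boundary handling works something like the sketch below, where get_word(page, position) and word_count(page) stand in for lookups against the stored search data (the names are hypothetical, and word positions are assumed to be 1-based):

```python
def window_words(get_word, word_count, page, pos, distance, direction):
    """Collect up to `distance` words before (direction -1) or after
    (direction +1) the word at `pos` on `page`, spilling into the
    previous or next page where necessary. The first and last pages
    of a text would need an extra guard."""
    words = []
    for step in range(1, distance + 1):
        p, i = page, pos + step * direction
        if i < 1:                        # ran off the start of the page
            p = page - 1
            i += word_count(p)
        elif i > word_count(p):          # ran off the end of the page
            i -= word_count(p)
            p = page + 1
        words.append(get_word(p, i))
    return words

def term1_nearby(get_word, word_count, page, pos, term1, distance, mode):
    """Check for term 1 around one occurrence of term 2, the anchor."""
    directions = {'before': [-1], 'after': [1], 'after or before': [-1, 1]}
    return any(term1 in window_words(get_word, word_count, page, pos, distance, d)
               for d in directions[mode])
```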

The results page is displayed in the same way as the regular search, complete with KWIC and sorting options.  Both terms 1 and 2 are bold, and if you sort the results the relevant numbered word left or right of term 2 is highlighted, as with the regular search.  When you click through to the actual text all occurrences of both term 1 and term 2 are highlighted (not just those in close proximity), but the page centres on the part of the text that meets the criteria, so hopefully this isn’t a problem – it is quite useful to see other occurrences of the terms after all.  There are still some tweaks I need to make to the search based on feedback I received during the week, and I’ll look at these next week, but on the whole the search facility (and the textbase facility in general) is just about ready to launch, which is great as it’s the last big publicly facing feature of the AND that I needed to develop.

Also this week I spent some time working on the Books and Borrowing project.  I created a new user account for someone who will be working for the project and I also received the digitised images for another library register, this time from the NLS.  I downloaded these and then uploaded them to the server, associating the images with the page records that were already in the system.  The process was a little more complicated and time-consuming than I’d anticipated, as the register has several blank pages in it that are not in our records but have been digitised.  Therefore the number of page images didn’t match up with the number of pages, and page images were getting associated with the wrong pages.  I had to manually look through the page images and delete the blanks, but I was still off by one image.  I then had to manually check the contents of the images against the transcribed text to see where the missing image should have gone.  Thankfully I managed to track it down and reinstate it (it had one very faint record on it, which I hadn’t noticed when viewing and deleting blank thumbnails).  With that in place all images and page records aligned and I could make the associations in the database.  I also sent Gerry McKeever the zipped-up images (several gigabytes) for a couple of the St Andrews registers, as he prefers to have the complete set when working on the transcriptions.

I had a meeting with Gerry Carruthers and Pauline McKay this week to discuss further developments of the ‘phase 2’ Burns website, which they are hoping to launch in the new year, and also to discuss the hosting of the Scottish theatre studies journal that Gerry is sorting out.

I spent the rest of the week working on mockups for the two websites for the STAR speech and language therapy project.  Firstly there’s the academic site, which is going to sit alongside Seeing Speech and Dynamic Dialects, and as such it should have the same interface as these sites.  Therefore I’ve made a site that is pretty much identical in terms of the overall theme.  I added in a new ‘site tab’ for the site that sits at the top of the page, and added the temporary logo as a site logo and favicon (the latter may need a dark background to make it stand out).  I created menu items for all of the items in Eleanor Lawson’s original mockup image.  These all work, leading to empty pages for now, and I added the star logo to the ‘Star in-clinic’ menu item as in the mockup too.  In the footer I made a couple of tweaks to the layout – the logos are all centre aligned and have a white border.  I added in the logo for Strathclyde and have only included the ESRC logo, but can add others in if required.  The actual content of the homepage is identical to Seeing Speech for now – I haven’t changed any images or text.

For the clinic website I’ve taken Eleanor’s mockup as a starting point again and have so far made two variations.  I will probably work on at least one more different version (with multiple variations) next week.  I haven’t added the ‘site tabs’ to either version as I didn’t want to clutter things up; I’m imagining that there will be a link somewhere to the STAR academic site for those that want it, and from there people would be able to find Seeing Speech and Dynamic Dialects.  The first version of the mockup has a top-level menu bar (we will need such a menu listing the pages the site features, otherwise people may get confused) and the main body of the page is blue, as in the mockup.  I used the same logo, and the font for the header is this Google font: https://fonts.google.com/?query=rampart+one&preview.text=STAR%20Speech%20and%20Language%20Therapy&preview.text_type=custom.  Other headers on the page use this font: https://fonts.google.com/specimen/Annie+Use+Your+Telescope?query=annie&preview.text=STAR%20Speech%20and%20Language%20Therapy&preview.text_type=custom.  I added in a thick dashed border under the header.  The intro text is just some text I’ve taken from one of the Seeing Speech pages, and the images are still currently just the ones in the mockup.  Hovering over an image causes the same dashed border to appear.  The footer is a kind of pink colour, which is supposed to suggest those blue and pink rubbers you used to get in schools.

The second version uses the ‘Rampart One’ font just for ‘STAR’ in the header, with the other font used for the rest of the text.  The menu bar has moved to underneath the header and the dashed line is gone.  The main body of the page is white rather than continuing the blue of the header, and ‘Rampart One’ is used for the in-page headers.  The images now have rounded edges, as do the text blocks in the images.  Hovering over an image brings up a red border, the same shade as used for the active menu item.  The pink footer has been replaced with the blue from the navbar.  Both versions are ‘responsive’ and work on all screen sizes.

I’ll be continuing to work on the mockups next week.