My son returned to school on Monday this week, marking an end to the home-schooling that began after the Christmas holidays. It’s quite a relief to no longer have to split my day between working and home-schooling after so long. This week I continued with some Data Management Plan related activities, completing a DMP for the metaphor project involving Duncan of Jordanstone College of Art and Design in Dundee and drafting a third version of the DMP for Kirsteen McCue’s proposal following a Zoom call with her on Wednesday.
I also spent some further time on the Books and Borrowing project, creating tilesets and page records for several new volumes. In fact, we ran out of space on the server. The project is digitising around 20,000 pages of library records from 1750-1830 and we’re approaching 5,000 pages so far. I’d originally suggested that we’d need about 60GB of server space for the images (3MB per image x 20,000). However, the JPEGs we’ve been receiving from the digitisation units have been generated at maximum quality / minimum compression and are around 9MB each, so my estimates were out. Dropping the JPEG quality setting down from 12 to 10 would result in 3MB files so I could do this to save space if required. However, there is another issue. The tilesets I’m generating for each image so that they can be zoomed and panned like a Google Map are taking up as much as 18MB per image. So we may need a minimum of 540GB of space (possibly 600GB to be safe): 9MB x 20,000 for the JPEGs plus 18MB x 20,000 for the tilesets. This is an awful lot of space, and storing image tilesets isn’t actually necessary these days if an IIIF server (https://iiif.io/about/) could be set up. IIIF is now well established as the best means of hosting images online and it would be hugely useful to use. Rather than generating and hosting thousands of tilesets at different zoom levels we could store just one image per page on the server and it would serve up the necessary subsection at the required zoom level based on the request from the client. The issue is that people in charge of servers don’t like having to support new software. I entered into discussions with Stirling’s IT people about the possibility of setting up an IIIF server, and these talks are currently ongoing, so in the meantime I still need to generate the tilesets.
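As a rough sketch of the storage arithmetic above, here it is in Python. The page count and per-image sizes are the figures quoted in this post; the function name and the use of decimal gigabytes (1GB = 1,000MB, which is what the 60GB estimate implies) are my own assumptions for illustration.

```python
# Back-of-envelope storage estimate using the figures quoted in the post.
PAGES = 20_000
MB_PER_GB = 1_000  # assumption: decimal gigabytes, as implied by 3MB x 20,000 = 60GB


def total_gb(pages: int, mb_per_page: float) -> float:
    """Return the space needed, in GB, for `pages` files of `mb_per_page` MB each."""
    return pages * mb_per_page / MB_PER_GB


original_estimate = total_gb(PAGES, 3)   # the initial 3MB-per-image guess
jpegs = total_gb(PAGES, 9)               # JPEGs as actually received
tilesets = total_gb(PAGES, 18)           # worst-case tileset size per image
combined = jpegs + tilesets              # JPEGs and tilesets stored side by side

print(f"Original estimate: {original_estimate:.0f}GB")
print(f"JPEGs: {jpegs:.0f}GB, tilesets: {tilesets:.0f}GB, combined: {combined:.0f}GB")
```

The tilesets account for two thirds of the combined figure, which is why storing a single image per page and letting an image server cut the regions on request is so attractive.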
Also this week I discussed a couple of issues with the Thesaurus of Old English with Jane Roberts. A search was bringing back some word results but when loading the category browser no content was being displayed. Some investigations uncovered that these words were in subcategories of ’02.03.03.03.01’ but there was no main category with that number in the system. A subcategory needs a main category in order to display in the tree browser and as none was available nothing was displaying. Looking at the underlying database I discovered that while there was no ’02.03.03.03.01’ main category there were two ’02.03.03.03.01|01’ subcategories: ‘A native people’ and ‘Natives of a country’. I bumped the former up from subcategory to main category and the search results then worked.
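The kind of check that would flag this problem can be sketched in a few lines of Python. This is only an illustration: the `Category` structure and its field names are hypothetical and don't reflect the actual database schema, and the ‘01.01’ row is a made-up example of a healthy main category. Only the two subcategory headings come from the real data.

```python
from typing import List, NamedTuple, Optional


class Category(NamedTuple):
    number: str          # category number, e.g. "02.03.03.03.01"
    sub: Optional[str]   # subcategory number, or None for a main category
    heading: str


def orphaned_subcategories(categories: List[Category]) -> List[Category]:
    """Return subcategories whose category number has no main category."""
    main_numbers = {c.number for c in categories if c.sub is None}
    return [
        c for c in categories
        if c.sub is not None and c.number not in main_numbers
    ]


rows = [
    Category("01.01", None, "Example main category"),      # made-up row
    Category("01.01", "01", "Example subcategory"),        # made-up row
    Category("02.03.03.03.01", "01", "A native people"),
    Category("02.03.03.03.01", "01", "Natives of a country"),
]

for orphan in orphaned_subcategories(rows):
    print(f"{orphan.number}|{orphan.sub} has no main category: {orphan.heading}")
```

Run against the rows above, this reports both ’02.03.03.03.01|01’ subcategories, which matches what I found by looking at the database directly.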
I spent the rest of the week continuing with the development of the Anglo-Norman Dictionary. I made the new bibliography pages live this week (https://anglo-norman.net/bibliography/), which also involved updating the ‘cited source’ popup in the entry page so that it displays all of the new information. For example, go to this page: https://anglo-norman.net/entry/abanduner and click on the ‘A-N Med’ link to see a record with multiple items in it. I also updated the advanced search for citations so that the ‘Citation siglum’ drop-down list uses the new data too.
After that I continued to update the Dictionary Management System. I updated the ‘View / Download Entry’ page so that the ‘Phase’ of the entry can be updated if necessary. In the ‘Phase’ section of the page all of the phases are now listed as radio buttons, with the entry’s phase checked. If you need to change the entry’s phase you can select a different radio button and press the ‘Update Phase’ button. I also added facilities to manage phase statements via the DMS. In the menu there’s now an ‘Add Phase’ button, through which you can add a new phase, and a ‘Browse Phases’ button which lists all of the active phases, the number of entries assigned to each, and an option to edit the phase statement. If there’s a phase statement that has no associated entries you’ll find an option to delete it here too.
I’m still working on the facilities to upload and manage XML entry files via the DMS. I’ve added in a new menu item labelled ‘Upload Entries’ that, when pressed, loads a page through which you can upload entry XML files. There’s a text box where you can supply the lead editor initials to be added to the batch of files you upload (any files that already have a ‘lead’ attribute will not be affected) and an option to select the phase statement that should be applied to the batch of files. Below this area is a section where you can either click to open a file browser and select files to upload or drag and drop files from Windows Explorer (or another file browser). When files are attached they will be processed, with the results shown in the ‘Update log’ section below the upload area. Uploaded files are kept entirely separate from the live dictionary until they’ve been reviewed and approved (I haven’t written these sections yet). The upload process will generate all of the missing attributes I mentioned last week – ‘lead’ initials, the various ID fields, POS, sense numbers etc. If any of these are already present the system won’t overwrite them, so it should be able to handle various versions of files. The system does not validate the XML files – the editors will need to ensure that the XML is valid before it is uploaded. However, the ‘preview’ option (see below) will quickly let you know if your file is invalid as the entry won’t display properly. Note also that you can change the ‘lead’ and the phase statement between batches – you can drag and drop a set of files with one lead and statement selected, then change these and upload another batch. You can of course choose to upload a single file too.
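The ‘fill in what’s missing, never overwrite’ rule can be illustrated with a short Python sketch. This is not the DMS code: the function name, the sample XML and the initials are made up for illustration, and only the ‘lead’ attribute name comes from the description above. It also assumes the file is at least well-formed XML, as the parser will raise an error otherwise.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def fill_missing_attributes(xml_text: str, defaults: Dict[str, str]) -> str:
    """Add each default attribute to the root element only if it is absent."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    for name, value in defaults.items():
        if root.get(name) is None:
            root.set(name, value)   # attribute missing: apply the batch value
        # attribute already present: leave the file's own value untouched
    return ET.tostring(root, encoding="unicode")


batch_defaults = {"lead": "AB"}  # hypothetical initials chosen for the batch

with_lead = '<entry lead="XY"><lemma>abanduner</lemma></entry>'
without_lead = '<entry><lemma>abanduner</lemma></entry>'

print(fill_missing_attributes(with_lead, batch_defaults))
print(fill_missing_attributes(without_lead, batch_defaults))
```

The first file keeps its existing ‘lead’ value while the second picks up the batch value, which is the behaviour that lets the same upload page cope with files at different stages of completeness.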
When XML files are uploaded, in the ‘update log’ there will be links directly through to a preview of the entry, but you can also find all entries that have been uploaded but not yet published on the website in the ‘Holding Area’, which is linked to in the DMS menu. There are currently two test files in this. The holding area lists the information about the XML entries that have been uploaded but not yet published, such as the IDs, the slug, the phase statement etc. There is also an option to delete the holding entry. The last two columns in the table link to any live entry. The first links to the entry as specified by the numerical ID in the XML filename, which will be present in the filename of all XML files exported via the DMS’s ‘Download Entry’ option. This is the ‘existing ID’ column in the table. The second linking column is based on the ‘slug’ of the holding entry (generated from the ‘lemma’ in the XML). The ‘slug’ is unique in the data so if a holding entry has a link in this column it means it will overwrite this entry if it’s made live. For XML files exported via the DMS and then uploaded both ‘live entry’ links should be the same, unless the editor has changed the lemma. For new entries both these columns should be blank.
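To make the logic behind the two ‘live entry’ columns a bit more concrete, here’s a minimal Python sketch. Everything in it is an assumption for illustration: the filename pattern (the first run of digits is treated as the ID), the function names, the ID 101, the example filenames and the two lookup dictionaries standing in for the live entries are all hypothetical and are not how the DMS stores things.

```python
import re
from typing import Dict, Optional, Tuple


def existing_id_from_filename(filename: str) -> Optional[int]:
    """Assumption: the numerical ID is the first run of digits in the filename."""
    match = re.search(r"\d+", filename)
    return int(match.group()) if match else None


def live_links(
    filename: str,
    slug: str,
    live_by_id: Dict[int, str],    # hypothetical: live entry ID -> slug
    live_by_slug: Dict[str, int],  # hypothetical: live slug -> entry ID
) -> Tuple[Optional[int], Optional[int]]:
    """Return (entry linked via filename ID, entry linked via slug)."""
    existing_id = existing_id_from_filename(filename)
    by_id = existing_id if existing_id in live_by_id else None
    by_slug = live_by_slug.get(slug)  # the live entry that would be overwritten
    return by_id, by_slug


live_by_id = {101: "abanduner"}
live_by_slug = {"abanduner": 101}

# Exported via the DMS and re-uploaded unchanged: both links agree.
print(live_links("101.xml", "abanduner", live_by_id, live_by_slug))
# Same file, but the editor changed the lemma: only the ID link remains.
print(live_links("101.xml", "abandoner", live_by_id, live_by_slug))
# Brand new entry: both columns are blank.
print(live_links("new-entry.xml", "newword", live_by_id, live_by_slug))
```

The three calls correspond to the three situations described above: a round-tripped file, a file whose lemma has been edited, and a completely new entry.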
The ‘Review’ button opens up a preview of the uploaded holding entry in the interface of the live site. This allows the editors to proofread the new entry to ensure that the XML is valid and that everything looks right. You can return to the holding area from this page by pressing on the button in the left-hand column. Note that this is just a preview – it’s not ‘live’ and no-one else can see it.
There’s still a lot I need to do. I’ll be adding in an option to publish an entry in the holding area, at which point all of the data needed for searching will be generated and stored and the existing live entry (if there is one) will be moved to the ‘history’ table. I may also need to extract the earliest date information to display in the preview and in the holding area. This information is currently only extracted when the data for searching is generated, but I guess it would be good to see it in the holding area / preview too. I also need to add in a preview of cross reference entries as these don’t display yet. I should probably also add in an option to allow the editors to view / download the holding entry XML as they might want to check how the upload process has changed it. So still lots to tackle over the coming weeks.