Week Beginning 25th September 2023

I had my PDR session on Monday this week, which was all very positive.  There was also one further UCU strike day on Wednesday this week, cutting my working days down to four.  The project I devoted most of the available time to was Books and Borrowing.  Last week I had begun reworking the API to make it more usable and this week I completed this task, adding in a few endpoints that I’d created but hadn’t added to the documentation.  I then moved on to the task of adding ‘Download data’ links to the front-end.  These links now appear as buttons beside the ‘Cite’ button on any page that displays data, as you can see in the following screenshot:

Pressing the button loads the API endpoint used to return the data found on the page, with ‘CSV’ rather than ‘JSON’ selected as the file type.  This prompts the browser to download the file rather than loading the data into the browser tab.  It took a bit of time to add these links to every required page on the site, but I think I’ve got them all.  However, the CSV downloads still needed quite a lot of further work.  When formatted as JSON, any data held in nested arrays is properly transformed and usable, but a CSV is a flat file consisting of columns and rows, and the data has a more complicated structure than this.  For example, if we have one row in the CSV file for each borrowing record on a register page, the record may have multiple associated borrowers, each with any number of occupations, each occupation consisting of multiple fields.  The record’s book holding may have any number of book items and may be associated with multiple book editions, and there may be multiple authors associated with any level of book record (item, holding, edition and work).  Representing this structure in a simple two-dimensional spreadsheet is very tricky and requires the data to be ‘flattened’.  To do so, a script needs to work out the maximum number of each repeatable item a record in the returned data has, in order to create the required columns (with heading labels), and to pad out any records that don’t have the maximum number of items with empty columns so that the columns of all records line up.
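For anyone curious about the mechanics of the download itself, it’s the response headers that prompt it: serving the data with a ‘text/csv’ content type and an ‘attachment’ Content-Disposition tells the browser to save the file rather than render it.  Below is a minimal sketch of the pattern in Python using Flask, purely for illustration; the endpoint path, the ‘type’ parameter and the helper functions are hypothetical stand-ins rather than the project’s actual API.

```python
# Minimal sketch of a JSON/CSV endpoint -- all names are hypothetical.
import csv
import io

from flask import Flask, Response, jsonify, request

app = Flask(__name__)

def fetch_page_records(page_id):
    # Hypothetical stand-in for the real database lookup.
    return [{"borrower": "J. Smith", "book": "Iliad"}]

def to_csv(rows):
    # Simplest case: uniform, flat rows.  Nested data needs the
    # flattening described below.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

@app.route("/page/<int:page_id>")
def page_records(page_id):
    data = fetch_page_records(page_id)
    if request.args.get("type") == "csv":
        # The attachment disposition is what makes the browser save
        # the file rather than rendering it in the tab.
        return Response(
            to_csv(data),
            mimetype="text/csv",
            headers={"Content-Disposition": 'attachment; filename="page.csv"'},
        )
    return jsonify(data)
```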

To return to the flattening, consider the borrowers: if borrowing row number 16 out of 20 has a borrower with five occupations, then column headings need to be added for five sets of occupation columns, and the data for the remaining 19 rows needs to be padded out with empty cells to ensure any columns that appear after the occupations continue to line up.  As a borrowing may involve multiple borrowers, this becomes even more complicated.
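To make that concrete, here’s a small Python sketch of the two-pass approach, using simplified, hypothetical field names (the real records also carry the book holding, item, edition and work levels):

```python
# Sketch of the 'flattening' step: find the maximum count of each
# repeatable item, build one group of columns per occurrence, and pad
# shorter records so every row lines up.  Field names are hypothetical.
import csv
import io

def flatten_borrowings(records):
    # Pass 1: scan the whole result set for the widest record.
    max_borrowers = max((len(r["borrowers"]) for r in records), default=0)
    max_occupations = max(
        (len(b.get("occupations", [])) for r in records for b in r["borrowers"]),
        default=0,
    )

    # Build the header: one group of columns per possible borrower,
    # each with one column per possible occupation.
    header = ["borrowing_id"]
    for b in range(1, max_borrowers + 1):
        header.append(f"borrower_{b}_name")
        for o in range(1, max_occupations + 1):
            header.append(f"borrower_{b}_occupation_{o}")

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)

    # Pass 2: write each record, padding missing borrowers and
    # occupations with empty strings so later columns stay aligned.
    for r in records:
        row = [r["id"]]
        for b in range(max_borrowers):
            borrower = r["borrowers"][b] if b < len(r["borrowers"]) else {}
            row.append(borrower.get("name", ""))
            occupations = borrower.get("occupations", [])
            for o in range(max_occupations):
                row.append(occupations[o] if o < len(occupations) else "")
        writer.writerow(row)
    return out.getvalue()

records = [
    {"id": 1, "borrowers": [{"name": "A", "occupations": ["weaver"]}]},
    {"id": 2, "borrowers": [{"name": "B", "occupations": ["smith", "elder"]},
                            {"name": "C", "occupations": []}]},
]
print(flatten_borrowings(records))
```

The two passes are the important detail: the header row has to be written before any data rows, so the maximum counts must be known up front, which is why the script scans the entire result set first.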

I managed to update the API to ensure nested arrays were flattened for several of the most complicated endpoints, such as a page of records and the search results.  The resulting CSV files can become quite monstrously large, with over 200 columns of data a regular occurrence.  However, with the data properly structured and labelled, it should be easier for users who are interested in the data to download the CSV and then delete the columns they don’t need, resulting in a more manageable file.  I still need to complete the ‘flattening’ of CSV data for a few other endpoints, which I hope to tackle next week.

Also this week I had an email discussion with Petra Poncarova, a researcher in Scottish Literature who is beginning a research project and requires a project website.  I’ve arranged for hosting to be set up for this, and by the end of the week we had the desired subdomain and WordPress installation.  I spent a bit of time on Friday afternoon getting the structure and plugins in place, and next week I’ll work on the interface for the website.

I also made a couple of further updates to the House of Fraser Archive this week.  I’d completed most of the work last week but hadn’t managed to get the search facility working.  After some suggestions from Luca, I figured out what the problem was (it turned out to be the date search part of the query that was broken) and the search is now operational.  We even managed to get the highlighting of search results in the records working again, which is something I wasn’t sure we’d be able to do.

The rest of my time was spent making updates to the Speech Star websites (and Seeing Speech).  Eleanor had noticed some errors in the metadata for a couple of the videos in the IPA charts, so I fixed these.  There were also some better-quality videos to add to the ExtIPA charts, plus some further updates to the metadata there too.  Also for this project, Jane Stuart-Smith contacted me to say that I had been erroneously categorised as ‘Directly Incurred’ rather than ‘Directly Allocated’ when the grant application was processed, which is now causing some bother.  I may have to create timesheets for my work on the project, but we’ll see what transpires.