This was my first full week back of the new year and I spent quite a bit of it working on the REELS project. We had a project meeting on Tuesday, at which we discussed a data export facility that had been requested. Such a facility would allow place-name data to be exported as an Excel-compatible CSV file. Exactly how this should be formatted took some thinking through and discussion, as the place-name data is split across 13 related tables and many different approaches could be taken to ‘flatten’ this out into a single two-dimensional spreadsheet. We decided to make two different export types to suit different requirements.
The first would list each place-name with one name per row and all of the related data stored in separate columns. A place-name may have any number of historical forms, classifications, parishes, sources and other such data, and wherever these ‘one to many’ or ‘many to many’ relationships were encountered the data would be added as additional columns. For example, if a place-name has 3 historical forms and each historical form consists of data in 7 columns, then there will be 21 columns of historical form data for this place-name. Of course, different place-names have different numbers of historical forms, so in order for all of the columns to match up I also needed to work out which place-name had the most historical forms and ‘pad out’ the rows that had fewer with blank fields (well, actually fields with an ‘x’ in them) to keep all of the data aligned. This proved slightly complicated to get working, but I got there in the end.
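The padding approach described above can be sketched roughly as follows. This is a minimal illustration with made-up sample data and field names, not the project's actual code: it finds the largest number of historical forms across all place-names and pads shorter rows with ‘x’ placeholders so every row has the same number of columns.

```python
import csv
import io

# Hypothetical sample data: each place-name has a variable number of
# historical forms, each a fixed-width tuple of fields.
places = {
    "Aldcambus": [("Aldcambus", "1212"), ("Aldkambhous", "1300")],
    "Birgham":   [("Brygham", "1095")],
}

FORM_FIELDS = 2  # fields per historical form in this toy example

# Work out which place-name has the most historical forms, so that
# every row can be padded to the same width.
max_forms = max(len(forms) for forms in places.values())

rows = []
for name, forms in places.items():
    row = [name]
    for form in forms:
        row.extend(form)
    # Pad rows with fewer historical forms using 'x' placeholder
    # fields to keep all of the columns aligned.
    row.extend(["x"] * ((max_forms - len(forms)) * FORM_FIELDS))
    rows.append(row)

# Write the flattened rows out as CSV.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
```

With the sample data above, the Birgham row gains two ‘x’ fields so that both rows end up five columns wide.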
The second export option would group place-names by their source, so the researchers could see which names were found in which source. The structure of the resulting file is slightly different because a ‘source’ is related to a specific historical form for a place-name rather than directly to the place-name itself. Each row in the resulting file has one historical form, relating to one place-name, and place-names will appear in multiple rows across the export file – once for each ‘source’ one of their historical forms is linked to. Also, any place-name that doesn’t currently have a historical form will not appear in the file, as such place-names will not have sources yet.
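The grouping for this second export type can be sketched like this. Again the records and field names are hypothetical stand-ins for the real database tables: each record links a historical form (and its source) to a place-name, and a place-name with no historical forms simply never appears in the grouping.

```python
# Hypothetical flat records joining historical forms and their sources
# to place-names; the real data lives across related database tables.
historical_forms = [
    {"place": "Aldcambus", "form": "Aldkambhous", "source": "Kelso Liber"},
    {"place": "Aldcambus", "form": "Aldcambus",   "source": "Dryburgh Liber"},
    {"place": "Birgham",   "form": "Brygham",     "source": "Kelso Liber"},
]

# Group by source: each output row is one historical form together
# with the place-name it belongs to, so a place-name appears once for
# every source one of its historical forms is linked to.
by_source = {}
for rec in historical_forms:
    by_source.setdefault(rec["source"], []).append((rec["place"], rec["form"]))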
I created an ‘export’ page in the CMS that allows the researchers to select the export type, and also to optionally select a start and end date for their export. This allows them to export just a subset of the data that was created or edited during a specific time period rather than for the whole duration of the project. Leaving the date fields blank returns the full dataset. I also updated the system so that the ‘date of last edit’ field now gets updated when data relating to the place-name is changed. Previously this field was only updated when the ‘core’ record for the place-name was edited (e.g. the name or the grid reference), whereas now it also gets updated when other information such as the place-name’s elements and historical forms is changed, for example when new elements are added or a historical form is deleted.
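The optional date-range filter behaves roughly like the following sketch (the function name and signature are illustrative, not the CMS’s actual code): a blank start or end date places no limit on that side, so leaving both blank returns everything.

```python
from datetime import date

def in_export_range(last_edit, start=None, end=None):
    """Return True if a record's 'date of last edit' falls within the
    optional export window; a blank (None) bound means no limit on
    that side, so two blank bounds match the full dataset."""
    if start is not None and last_edit < start:
        return False
    if end is not None and last_edit > end:
        return False
    return True
```

For example, a record last edited in June 2016 falls inside a March–June 2016 window, while one last edited in January 2017 does not.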
I also had to make it clear to the team that when a record is edited the existing ‘date of last edit’ is replaced. So if a record was created in January 2016 and last edited in June 2016 it will be found if you limit the period to between March and June 2016, but if the record is then edited again in January 2017 it will no longer be returned in the March to June 2016 export. My final task relating to the export facility was to figure out how to make Excel open a CSV file that uses Unicode text. By default Excel doesn’t display Unicode characters properly, and this is something of an issue for us as we use Unicode characters in the pronunciation field and elsewhere. Rather than opening the CSV file directly in Excel (e.g. by double-clicking on the file icon), researchers will have to import the data into an existing blank Excel file using Excel’s ‘Get External Data’ option. It’s a bit of a pain, but worth it to see the text as it’s supposed to look rather than all garbled.
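For what it’s worth, a commonly suggested alternative for this problem is to write the CSV file with a UTF-8 byte-order mark, which Excel uses as a hint to detect the encoding when the file is opened directly. A minimal sketch in Python (the rows here are made-up sample data; the ‘Get External Data’ route described above works regardless):

```python
import csv

# Writing with Python's 'utf-8-sig' codec prepends a UTF-8 byte-order
# mark, which hints the encoding to Excel when the CSV is opened by
# double-clicking rather than imported.
rows = [["name", "pronunciation"], ["Aldcambus", "ˈɔːldkæmbəs"]]
with open("export.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)
```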
The rest of my week was spent on a few other tasks. I had some further AHRC review duties, which I took care of towards the end of the week. I also had a phone call with Marc about a couple of new projects that will be starting up in the coming months. One is to further redevelop ARIES and the Grammar app with new content, which will be great to do. The other involves setting up a small corpus of legal documents. I had an email conversation with Fraser about some LDNA tasks I had been assigned and also the redevelopment of the HT data. I made a couple of further tweaks to the ‘Learning with the Old English Thesaurus’ resource for Carole. I responded to a request from Thomas Widmann of SLD about how to edit some of the ancillary sections of the DSL resource, and to a request from elsewhere in the University about the University app accounts, which I currently manage. I also met with Gary to discuss some further updates to the SCOSYA atlas, and fixed a bug he’d spotted with it, whereby the first attribute in each parent category was being omitted.