I participated in the University and College Union’s strike action this week, so I didn’t work on Wednesday and Thursday. I spent the rest of the week on lots of relatively small bits of work. I had some more AHRC review duties to carry out, so I spent some time on that. I also spent a bit of time with the Hansard data and the new database that I have access to on a server in Physics and Astronomy. I migrated all of my existing tables, including data about speeches, members, constituencies and the like, to the new database. I didn’t copy my two-year sample of the frequency data across, but I did copy the structure of the table over, so it is now ready to accept the full 200-year dataset once I get things going.

I looked into the possibility of using Python to run through the SQL files that had previously been generated via the Grid, as Python is available on the Grid nodes. However, using Python with MySQL requires an additional library to be installed, and I wasn’t sure whether this would be possible on the Grid. I reached the conclusion that writing another shell script to run the MySQL command to process each SQL file probably made more sense than pulling each SQL file into Python and processing it line by line. The only problem is that MySQL expects each of the insert statements within an SQL file to be terminated by a semi-colon, which is standard SQL syntax. Unfortunately, I was so used to processing SQL commands based on line breaks in PHP that I omitted the semi-colon from the shell script that generated the SQL files, leaving me with over 80GB of SQL files that MySQL itself was unable to process. I could have fixed the shell script and re-run it on the Grid, but I must admit I felt a little foolish for having missed off the semi-colon and decided to just fix the files on my own PC instead. I used Notepad++’s ‘find and replace in files’ feature to replace each line break with a semi-colon followed by a line break.
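As an aside, the same semi-colon fix could be scripted rather than done through an editor. Here is a minimal sketch assuming GNU sed is available, using a throwaway sample file (the file and table names here are made up for illustration):

```shell
# Create a small sample file mimicking the generated INSERT statements,
# which lacked terminating semi-colons (file and table names are invented).
printf 'INSERT INTO freq VALUES (1)\nINSERT INTO freq VALUES (2)\n' > sample.sql

# Append a semi-colon to every line that does not already end in one
# (GNU sed; -i edits the file in place).
sed -i '/;$/! s/$/;/' sample.sql

cat sample.sql
```

On 80GB of files a scripted approach would likely run a good deal faster than an editor, though the Notepad++ approach got the job done.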
It took a few days for Notepad++ to complete the task, but that was fine as I just left it running in the background whilst I got on with other things. I now have a set of SQL files that the MySQL client will be able to process. The next step will be to write a shell script that connects to the MySQL server and runs a single SQL file. It should hopefully not be too tricky to create such a script.
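The planned script could look something like the following minimal sketch. The host, user and database names are placeholders I have invented, and in practice the password would come from a MySQL option file rather than being typed on the command line:

```shell
#!/bin/sh
# Sketch of the planned runner: feed one SQL file to the MySQL server.
# Host, user and database names below are placeholders, not the real ones.
DB_HOST="${DB_HOST:-dbserver.example.ac.uk}"
DB_USER="${DB_USER:-hansard}"
DB_NAME="${DB_NAME:-hansard}"

run_sql_file() {
  # The mysql client reads statements from stdin, executing each one as it
  # reaches the terminating semi-colon -- hence the fix described above.
  mysql --host="$DB_HOST" --user="$DB_USER" "$DB_NAME" < "$1"
}
```

Running the whole set would then just be a loop, e.g. `for f in *.sql; do run_sql_file "$f"; done`.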
I also spent a fair amount of time this week trying to investigate why the H27 categories had somehow been omitted from the Old English Mapping Metaphor data. I started looking into this last week but hadn’t managed to find the cause. This week I did some more investigation, both on my own and with the help of Wendy. We eventually figured out that although the H27 data had been present in the spreadsheet I had generated from the individual category spreadsheets last year, it didn’t appear in the Access database that Flora had created and which was being used by Carole and others to process the ‘Stage 5’ data for the metaphorical connections (i.e. directionality and sample lexemes). I had a long but useful meeting with Wendy, Flora and Carole on Friday where we went through all of this, trying to work out where the omission had occurred. It looks very much like the H27 categories were always processed independently of the main OE dataset, which is why they never appeared in the Access table. We agreed that Flora would update the Access form that Carole was using to incorporate the H27 categories, which will hopefully allow these categories to be processed without affecting the other categories. Flora is going to work on this next week, all being well.
During the rest of the week I wrote up my notes from the SCOSYA meeting last week and continued to help Vivien out with some updates to the new section of the Burns website she is putting together. I also gave some advice to Luca Guariento, who has this week taken over the management of the Curious Travellers website.