I had rather a lot of different things to get through this week, which made the week fly past alarmingly quickly. On Monday I attended an all-day meeting for the Scots Syntax Atlas project. The meeting was primarily focussed on the online resources that the project intends to produce, so it was obviously quite important for me to attend, seeing as I’ll be creating the resources. We discussed some existing online linguistic atlas projects, looking at some in detail and figuring out aspects of each that we liked or didn’t like. We also tried out a few online language quizzes, as the project is hoping to incorporate such a feature. We also discussed some of the requirements of our intended online resource, and how the ‘general public’ interface that the project intends to create should differ from the ‘academic’ interface. I have a much clearer idea now of what it is the project would like to develop, although we still need to have a more detailed discussion about the underlying data itself and how this will be presented on the map-based interface. Things are beginning to take shape, however.
Gareth Roy from Physics and Astronomy got back to me this week to say that one of his colleagues had set up a database on a server for me to use for the Hansard data, which was excellent news. I didn’t have much time to test this out this week, but did manage to confirm that I can access it from both my desktop machine and the little test server located in my office. Hopefully I will find the time next week to migrate some of the existing data to this database and start to write a script that can process the thousands of SQL files that contain the actual data.
I also had to spend a bit of time this week doing AHRC review duties, and I attended a Digital Humanities talk on Friday afternoon, which was interesting but perhaps lacking a bit in detail. I spent some further time on the Mapping Metaphor project, as Wendy and Carole had prepared new batches of both OE and non-OE data that I needed to upload to the database. This all seemed to go smoothly enough, although as a result of the upload Wendy noticed that the data from certain OE categories appears to have been overlooked. It would appear that several of the categories that were split off from category H27 were omitted from the spreadsheets at some earlier stage in the data processing. I had to spend some time trying to figure out when and how this happened, which was not easy: it must have happened more than a year ago, and it may have been caused by one of my scripts or by some of the processes Ellen and Flora conducted. We still hadn’t managed to get to the bottom of the issue by the end of the week and I’ll have to return to this next week.
I also spent some time this week helping out Vivien Williams of the Burns project, who is putting the finishing touches to lots of new content for the website. She needed my help to get some parts of her pages laid out as she required them, and in converting some WAV files and other such tasks. I also had a detailed email conversation with Alison Wiggins about the proposal she’s putting together, and I think we now have a much clearer idea of how the technical aspects of the proposal will fit together. I also set up a ‘bare bones’ WordPress installation for Christine Ferguson’s new project and sent some code on to some researchers that were interested in how the SciFiMedHums bibliographic database was put together. Oh, I also published the Android version of ‘Readings in Early English’, which can be found here: https://play.google.com/store/apps/details?id=com.gla.stella.readings
It was another four-day week this week due to the May Day holiday on Monday. I spent most of Tuesday finishing off a first version of the Technical Plan for the proposal Murray Pittock is putting together and sent it off to him for comment on Wednesday morning. There will no doubt be some refinements to make to it before it can be integrated with the rest of the bid documentation but I think overall the technical side of the potential project has been given sufficient consideration. On Wednesday I heard from Christine Ferguson that a proposal she had submitted which I had given technical advice on had been awarded funding, which is great news. I’m going to meet with her next week to discuss the technical requirements in more detail.
Also on Wednesday we had a College of Arts developers meeting. This consisted of Matt Barr, Graeme Cannon, Neil McDermott and me so not a huge gathering, but it was very useful to catch up with other developers and discuss some of the important issues for developers in the College.
I also spent some time on Wednesday investigating a bug in the Mapping Metaphor tabular view. Wendy had noticed that ordering the table by category two was not working as it should have done. It looks like I introduced this bug when I allowed the table to be properly ordered by the ‘direction’ column a few weeks ago. For the table to be ordered by the direction column I needed to look at the full HTML of the data in the columns to get some info from within the ‘image’ tag of the arrow. But for some reason the reordering isn’t working properly for the other columns when the full HTML rather than the plain text is used. I’ve updated things so that the full HTML is used only when the ‘direction’ column is clicked on and the plain text is used for all other columns. I’ve fixed this in the main site and the Metaphoric site. I’m afraid the problem also exists in the App, but I’ll wait to fix this until we have the next batch of data to add later on this month.
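The fix described above comes down to choosing a different sort key depending on which column was clicked. The following is only a minimal Python sketch of that idea, not the site's actual code: the column names, the use of the image's `alt` attribute to carry the direction, and the simple regular expressions are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: strip any tag, or pull the alt text out of an <img> tag.
TAG_RE = re.compile(r"<[^>]+>")
IMG_ALT_RE = re.compile(r'<img[^>]*\balt="([^"]*)"', re.IGNORECASE)


def sort_key(cell_html: str, column: str) -> str:
    """Return the value a table cell should be ordered by.

    Only the 'direction' column looks inside the full HTML (here, the
    assumed alt text of the arrow image). Every other column is ordered
    by its plain text with the markup removed.
    """
    if column == "direction":
        match = IMG_ALT_RE.search(cell_html)
        return match.group(1) if match else ""
    return TAG_RE.sub("", cell_html).strip().lower()
```

Keeping the HTML-aware branch limited to the one column that needs it means the other columns are unaffected by whatever markup their cells happen to contain.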
On Thursday I contacted Gareth Roy about the possibility of the Hansard data being hosted on one of the servers managed by Physics and Astronomy. Gareth had previously been hugely helpful in giving advice on how to use the ScotGrid infrastructure to extract all of the Hansard data and to convert it into millions of SQL insert statements, but without a server to host the data I couldn’t proceed any further. In March Gareth and I attended the same ‘big data’ meeting and afterwards he suggested that there might be the possibility of getting the data hosted on one of the servers he has access to. Now that Metaphoric is out of the way I have a bit of time to return to the Hansard data and consider how it can be used. I’m going to meet with Gareth next week to consider the options. In the meantime I tried to access the test server that Chris had set up for me last year, on which I had a subset of the Hansard data running through a graph-based front end that I’d created. Unfortunately when I tried to connect to it nothing happened. As the box is physically in my office (it’s just an old desktop PC set up as a server) I tried manually turning it off and on again but it just made a worrying series of low-pitched beeps and then nothing happened. I tried plugging a monitor and keyboard into it and there was no display and the caps lock key wouldn’t light up or anything. I took out the hard drive and put it in my old desktop PC and that made the same worrying beeps and then did nothing when I started it too. I then attached the drive in place of the secondary hard drive in my old PC and after it had successfully booted into Windows it didn’t find the drive. So I guess there’s been some sort of hard drive failure! Thankfully I have all the code I was working with on my desktop PC anyway. However, I’ve realised that the version of the database I have on my desktop PC is not the most recent version.
I’ll be able to reconstruct the database from the original data I have but it’s a shame I’ve lost the working version. It’s completely my own fault for not backing things up. Chris is going to try and see if he can access the hard drive too, but I’m not holding out much hope.
On Friday I had a meeting with Alison Wiggins to discuss a proposal she is putting together that will involve crowdsourcing. I did a bit of research into the Zooniverse Scribe tool (http://scribeproject.github.io/), which she was keen to use. I also tried installing the tool in OSX but despite following their detailed installation instructions all I ended up with was a bunch of errors. Nevertheless, it was useful to learn about the tool and also to try out some online examples of the tool as well. I think it has potential, if we can only get it working. The meeting with Alison went well and we discussed the technical issues relating to her project and how crowdsourcing might fit in with it. She is still at the planning stages at the moment and we’ll need to see if the project will feature a technical aspect or if it will involve some kind of event discussing issues relating to crowdsourcing instead. I think either might work pretty well.
Also this week I received an external hard drive in the mail from the Royal College of Physicians in Edinburgh, who want access to the raw TIFF images that were produced for the Cullen project. Project PI David Shuttleton had previously agreed to this so I started the process off. Copying half a terabyte of images from a Network drive to an external hard drive takes rather a long time, but by Friday morning I had completed the process and had mailed the hard drive off. Hopefully it will arrive in one piece.
It’s been another busy week, but I have to keep this report brief as I’m running short of time and I’m off next Monday. I came into work on Monday to find that the script I had left executing on the Grid to extract all of the Hansard data had finished working successfully! It left me with a nice pile of text files containing SQL insert statements – about 10Gb of them. As we don’t currently have a server on which to store the data I instead started a script executing that runs each SQL insert command on my desktop PC and puts the data into a local MySQL database. Unfortunately it looks like it’s going to take a horribly long time to process the data. I’m putting the estimate at about 229 days.
My arithmetic skills are sometimes rather flaky so here’s how I’m working out the estimate. My script is performing about 2000 inserts a minute. There are about 1200 output files and based on the ones I’ve looked at they contain about 550,000 lines each. 550,000 x 1200 = 660,000,000 lines in total. This figure divided by 2000 gives the number of minutes it would take (330,000). Dividing this by 60 gives the number of hours (5,500). Dividing this by 24 gives the number of days (229). My previous estimate for doing all of the processing and uploading on my desktop PC was more than 2 years, so using the Grid has speeded things up enormously, but we’re going to need something more than my desktop PC to get all of the data into a usable form any time soon. Until we get a server for the database there’s not much more I can do.
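The estimate above is easy to double-check in a few lines of Python. This is just a sketch of the arithmetic, using the approximate figures quoted in the paragraph; the function and constant names are made up for illustration.

```python
# Approximate figures from the estimate above.
INSERTS_PER_MINUTE = 2000
OUTPUT_FILES = 1200
LINES_PER_FILE = 550_000


def estimate_days(files: int, lines_per_file: int, inserts_per_minute: int) -> float:
    """Estimate how many days it takes to run every insert statement."""
    total_lines = files * lines_per_file        # 660,000,000
    minutes = total_lines / inserts_per_minute  # 330,000
    hours = minutes / 60                        # 5,500
    return hours / 24


if __name__ == "__main__":
    days = estimate_days(OUTPUT_FILES, LINES_PER_FILE, INSERTS_PER_MINUTE)
    print(f"{days:.1f} days")  # prints "229.2 days"
```

The figure scales inversely with the insert rate, so doubling the number of inserts per minute would halve the estimate.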
On Tuesday this week we had a REELS team meeting where we discussed some of the outstanding issues relating to the structure of the database (amongst other things). This was very useful and I think we all now have a clear idea of how the database will be structured and what it will be able to do. After the meeting I wrote up and distributed an updated version of my database specification document and I also worked with some map images to create a more pleasing interface for the project website (it’s not live yet though, so no URL). Later in the week I also created the first version of the database for the project, based on the specification document I’d written. Things are progressing rather nicely at this stage.
I spent a bit of time fixing some issues that had cropped up with other projects. The Medical Humanities Network people wanted a feature of the site tweaked a little bit, so I did this. I also fixed an issue with the lexeme upload facility of the Scots Corpus, which was running into some maximum form size limits. I had a funeral to attend on Thursday afternoon so I was away from work for that.
I worked on several projects this week. I continued to refine the database and content management system specification document for the REELS project. Last week I had sent an initial version out to the members of the team, who each responded with useful comments. I spent some time considering their comments and replying to each in turn. The structure of the database is shaping up pretty nicely now and I should have a mostly completed version of the specification document written next week before our next project meeting.
I also met with Gary Thoms of the SCOSYA project to discuss some unusual behaviour he had encountered with the data upload form I had created. Using the form Gary is able to drag and drop CSV files containing survey data, which then pass through some error checking and are uploaded. Rather strangely, some files were passing through the error checks but were uploading blank data, even though the files themselves appeared to be in the correct format and well structured. Even more strangely, when Gary emailed one of the files to me and I tried to upload it (without even opening the file) it uploaded successfully. We also worked out that if Gary opened the file and then saved it on his computer (without changing anything) the file also uploaded successfully. Helpfully, the offending CSV files don’t display with the correct CSV icon on Gary’s Macbook so it’s easy to identify them. There must be some kind of file encoding issue here, possibly caused by passing the file from Windows to Mac. We haven’t exactly got to the bottom of this, but at least we’ve figured out how to avoid it happening in future.
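Since the cause was never pinned down, a small diagnostic that looks at the raw bytes of a problem file might help narrow things down if it crops up again. The sketch below is a guess at what would be worth checking (a byte order mark and the style of line endings), not a confirmed explanation of the problem; the function name and the returned keys are hypothetical.

```python
import codecs


def describe_csv_bytes(raw: bytes) -> dict:
    """Report the byte order mark and line-ending style of raw file content."""
    boms = {
        "utf-8-sig": codecs.BOM_UTF8,
        "utf-16-le": codecs.BOM_UTF16_LE,
        "utf-16-be": codecs.BOM_UTF16_BE,
    }
    # Name of the first matching BOM, or None if the file has none.
    bom = next((name for name, mark in boms.items() if raw.startswith(mark)), None)
    crlf = raw.count(b"\r\n")            # Windows-style line endings
    cr_only = raw.count(b"\r") - crlf    # carriage returns with no line feed
    lf_only = raw.count(b"\n") - crlf    # Unix-style line endings
    return {"bom": bom, "crlf": crlf, "cr_only": cr_only, "lf_only": lf_only}
```

Running this over a file that uploads blank and over the same file after it has been re-saved would show whether the two differ at the byte level, for example `describe_csv_bytes(open("survey.csv", "rb").read())`, where `survey.csv` is a placeholder file name.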
On Friday I had a final project meeting for the Medical Humanities Network project. The meeting was really just to go over who will be responsible for what after the project officially ends, in order to ensure new content can continue to be added to the site. There shouldn’t really be too much for me to do, but I will help out when required. I also continued with some outstanding tasks for the SciFiMedHums project on Friday too. Gavin wants visitors to the site to be able to suggest new bibliographical items for the database and we’ve decided that asking them to fill out the entire form would be too cumbersome. Instead we will provide a slimmed down form (item title, medium, themes and comments) and upon submission an Editor will then be able to decide if the item should be added to the main system and if so manage this through the facilities I’ll develop. On Friday I figured out how the system will function and began implementing things on a test server I have access to. So far I’ve updated the database with the new fields that are required, added in the facilities to enable visitors to the site to log in and register and I’ve created the form that users will fill in. I still need to write the logic that will process the form and all of the scripts the editor will use to process things, which hopefully I’ll find time to tackle next week.
I continued to work with the Hansard data for the SAMUELS project this week as well. I managed to finish the shell script for processing one text file, which I had started work on last week. I managed to figure out how to process the base64 decoded chunk of data that featured line breaks, allowing me to extract and process an individual code / frequency pairing. I then figured out a way to write each line of data to an output text file. The script now takes one of the input text files that contain 5000 lines of base64 encoded code / frequency data and for each code / frequency pair it writes an SQL statement to a text file. I tested the script out and ensured that the resulting SQL statements worked with my database and after that I contacted Gareth Roy in Physics, who has been helping to guide me through the workings of the Grid. Gareth provided a great deal of invaluable help here, including setting up space on the Grid, writing a script that would send jobs for each text file to the nodes for processing, updating my shell script so that the output text file location could be specified and testing things out for me. I really couldn’t have got this done without his help. On Friday Gareth submitted an initial test batch of 5 jobs, and these were all processed successfully. As all was looking good I then submitted a further batch of jobs for scripts 6 to 200. These all completed successfully by late on Friday afternoon. Gareth then suggested I submit the remaining files to be processed over the weekend so I did. It’s all looking very promising indeed. The only possible downside is that as things currently stand we have no server on which to store the database for all of this data. This is why we’re outputting SQL statements in text files rather than writing directly to a database. 
As there will likely be more than 100 million SQL insert statements to process we are probably going to face another bottleneck when we do actually have a database in which to house the data. I need to meet with Marc to discuss this issue.
I spent a fair amount of time this week working on the REELS project, which began last week. I set up a basic WordPress powered project website and got some network drive space set up and then on Wednesday we had a long meeting where we went over some of the technical aspects of the project. We discussed the structure of the project website and also the structure of the database that the project will require in order to record the required place-name data. I spent the best part of Thursday writing a specification document for the database and content management system which I sent to the rest of the project team for comment on Thursday evening. Next week I will update this document based on the team’s comments and will hopefully find the time to start working on the database itself.
I met with a PhD student this week to discuss online survey tools that might be suitable for the research that she was hoping to gather. I heard this week from Bryony Randall in English Literature that an AHRC proposal that I’d given her some technical advice on had been granted funding, which is great news. I had a brief meeting with the SCOSYA team this week too, mainly to discuss development of the project website. We’re still waiting on the domain being activated, but we’re also waiting for a designer to finish work on a logo for the project so we can’t do much about the interface for the project website until we get this anyway.
I also attended the ‘showcase’ session for the Digging into Data conference that was taking place at Glasgow this week. The showcase was an evening session where projects had stalls and could speak to attendees about their work. I was there with the Mapping Metaphor project, along with Wendy, Ellen and Rachael. We had some interesting and at times pretty in-depth discussions with some of the attendees and it was a good opportunity to see the sorts of outputs other projects have created with their data.
Before the event I went through the website to remind myself of how it all worked and managed to uncover a bug in the top-level visualisation. When you click on a category, yellow circles appear at the categories that the one you’ve clicked on has a connection to. These circles represent the number of metaphorical connections between the two categories. What I noticed was that the size of the circles was not taking into consideration the metaphor strength that had been selected, which was giving confusing results. E.g. if there are 14 connections but only one of these is ‘strong’ and you’ve selected to view only strong metaphors the circle size was still being based on 14 connections rather than one. Thankfully I managed to track down the cause of the error and I fixed it before the event.
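The corrected behaviour can be illustrated with a small Python sketch. This is not the visualisation's real code: the `Connection` class, the strength labels and the function name are assumptions, and the only point being shown is that the count driving the circle size should be taken after the strength filter is applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Connection:
    strength: str  # assumed labels: "strong" or "weak"


def visible_count(connections: List[Connection],
                  selected_strength: Optional[str] = None) -> int:
    """Count only the connections that match the selected strength.

    With no strength selected, every connection counts. The circle size
    would then be derived from this filtered count, not the full total.
    """
    if selected_strength is None:
        return len(connections)
    return sum(1 for c in connections if c.strength == selected_strength)
```

Using the example from the text, 14 connections of which one is strong gives a count of 1 when only strong metaphors are selected, and 14 when no filter is applied.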
I also spent a little bit of time further investigating the problems with the Curious Travellers server, which for some reason is blocking external network connections. I was hoping to install a ‘captcha’ on the contact form to cut down on the amount of spam that was being submitted and the Contact Form 7 plugin has a facility to integrate Google’s ‘reCaptcha’ service. This looked like it was working very well, but for some reason when ‘reCaptcha’ was added to forms these forms failed to submit, instead giving error messages in a yellow box. The Contact Form 7 documentation suggests that a yellow box means the content has been marked as spam and therefore won’t send, but my message wasn’t spam. Removing ‘reCaptcha’ from the form allowed it to submit without any issue. I tried to find out what was causing this but have been unable to find an answer. I can only assume it is something to do with the server blocking external connections and somehow failing to receive a ‘message is not spam’ notification from the service. I think we’re going to have to look at moving the site to a different server unless Chris can figure out what’s different about the settings on the current one.
My final project this week was SAMUELS, for which I am continuing to work on the extraction of the Hansard data. Last week I figured out how to run a test job on the Grid and I split the gigantic Hansard text file into 5000 line chunks for processing. This week I started writing a shell script that will be able to process these chunks. The script needs to do the same tasks as my initial PHP script, but because of the setup of the Grid I need to write a script that will run directly in the Bash shell. I’ve never done much with shell scripting so it’s taken me some time to figure out how to write such a script. So far I have managed to write a script that takes a file as an input, goes through each line at a time, splits the line up into two sections based on the tab character, base64 decodes each section and then extracts the parts of the first section into variables. The second section is proving to be a little trickier as the decoded content includes line breaks which seem to be ignored. Once I’ve figured out how to work with the line breaks I should then be able to isolate each tag / frequency pair, write the necessary SQL insert statement and then write this to an output file. Hopefully I’ll get this sorted next week.
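The steps described above can be sketched in Python for illustration (the real script was written for the Bash shell, and this is not it). Several details are assumptions: the decoded first section is treated as a single identifier instead of being split into its parts, each tag / frequency pair is assumed to be separated by whitespace, and the table and column names are hypothetical.

```python
import base64
from typing import List


def sql_quote(value: str) -> str:
    """Quote a string for SQL by doubling single quotes (simplified)."""
    return "'" + value.replace("'", "''") + "'"


def line_to_sql(line: str) -> List[str]:
    """Turn one tab-separated, base64-encoded input line into insert statements."""
    # Split the line into its two base64-encoded sections on the tab character.
    first_b64, second_b64 = line.rstrip("\n").split("\t", 1)
    record_id = base64.b64decode(first_b64).decode("utf-8").strip()
    pairs_text = base64.b64decode(second_b64).decode("utf-8")

    statements = []
    # The decoded second section contains line breaks: one pair per line.
    for pair in pairs_text.splitlines():
        if not pair.strip():
            continue
        tag, frequency = pair.split()  # assumed format: "<tag> <frequency>"
        statements.append(
            "INSERT INTO tag_frequency (record_id, tag, frequency) "
            f"VALUES ({sql_quote(record_id)}, {sql_quote(tag)}, {int(frequency)});"
        )
    return statements
```

Splitting the decoded text with `splitlines()` is what keeps the line breaks meaningful here. In a shell script the equivalent difficulty usually comes from unquoted variable expansion collapsing the line breaks into spaces.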
Two new projects that I will be involved with over the coming months and years started up this week. The first one was the People’s Voice project for Catriona MacDonald and Gerry Carruthers. I will be developing a database of poems and establishing a means of enabling the team to transcribe poems using the TEI guidelines. This is a good opportunity for me to learn more about text encoding as although I’ve been involved in some text encoding projects before I’ve never had sole responsibility for such aspects. Since starting back after Christmas I’ve been getting up to speed with TEI and the Oxygen text editing tool and this week I met with the team and we had a two hour introductory session to transcription using TEI and Oxygen. I spent quite a bit of time before the session preparing a worksheet for them and getting my head around the workings of Oxygen and the workshop went very well. It was the first time some of the people had ever written any XML and everyone did very well. It will obviously take a bit of practice for them to be able to transcribe poems rapidly, but hopefully the worksheet I prepared together with the template files I’d made will allow them to get started. There is still lots to do to get the project up and running and we will be meeting again in the next few weeks to get started on the database and the website, but so far things are progressing well.
The second new project I was involved with this week was Carole Hough’s REELS project (Recovering the Earliest English Language in Scotland). This project will be analysing the placenames of Berwickshire and I’ll be developing a content management system to enable the team to record all of the data. We had a project meeting this week where we went over the project timetable and discussed how and when certain tasks would start up. It was a useful meeting and a good opportunity to meet the rest of the team and we have now arranged a further, more technical meeting for next week where we will think in more detail about the requirements for the database and the CMS and things like that. I also put a request in for a subdomain for the project website and got some shared drive space set up for the project too.
As well as these two projects beginning, another project I’ve been involved with launched this week. Over the past few months I’ve been developing the technical infrastructure for a Medical Humanities Network website for Megan Coyer. This project had its official launch on Friday evening, and all went very well. The project can be accessed here: http://medical-humanities.glasgow.ac.uk/
In addition to these projects I also spent a bit of time trying to figure out what was preventing the Curious Travellers WordPress installation from connecting to external services. Using my test server I managed to fix one of the issues (the instance will now connect to the WordPress RSS feeds) but other services such as Akismet and the searching of new plugins fail to connect. The strange thing is if I copy both the database and the files for the site onto my test server external connections work, which would suggest that the problem is with the server configuration. I’ve spoken to Chris about this and he has done some investigation but as far as he can tell there is nothing at server level that is any different to other server setups. It’s all very odd and we may have to consider moving the site to a different server to see if this fixes the problem.
I also spent some time working with the Grid and preparing the Hansard data for processing on the Grid. Gareth Roy from Physics has been helping me with this and he’d sent me some instructions on how to submit a test job to the Grid before Christmas. This week I managed to successfully submit, process and extract the output for my test script, which is encouraging. Gareth thought that splitting my 10Gb text file into small chunks for processing would make the most sense so I wrote a little script that split the file up, with 5000 lines per file. This resulted in about 1200 files with sizes varying from 5Mb to 16Mb, which should hopefully be relatively easy to load and process. I now have to figure out how to write a shell script that will load a file, process it and export SQL statements to another text file. I’ve never written a shell script that does anywhere near as much as this before, so it’s going to take a bit of time to get the hang of things.
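Splitting a large text file into fixed-size chunks of lines can be sketched in Python as follows. This is a minimal illustration, not the script actually used: the function name, the output file naming scheme and the choice to read in binary mode (so that no bytes are altered) are all assumptions.

```python
from itertools import islice
from pathlib import Path


def split_file(source: Path, out_dir: Path, lines_per_chunk: int = 5000) -> int:
    """Split source into numbered files of at most lines_per_chunk lines.

    Returns the number of chunk files written.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with source.open("rb") as src:
        while True:
            # Read the next block of lines without loading the whole file.
            chunk = list(islice(src, lines_per_chunk))
            if not chunk:
                break
            count += 1
            (out_dir / f"chunk_{count:05d}.txt").write_bytes(b"".join(chunk))
    return count
```

Only one chunk is held in memory at a time, so the size of the source file should not matter much. A call might look like `split_file(Path("hansard.txt"), Path("chunks"))`, where both paths are placeholders.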
My final project of the week was Metaphor in the Curriculum. We had another project meeting this week and as a result of this I added a new feature to the Mapping Metaphor website and did some further work on our prototype app. The new feature is a ‘Metaphor of the Day’ page that does what you’d expect it to: displaying a different metaphorical connection each day. You can view the feature here: http://mappingmetaphor.arts.gla.ac.uk/metaphor-of-the-day/
The Medical Humanities Network ‘soft launched’ on Friday this week so I had quite a bit of last minute tweaking and adding of features to manage before this happened. This included updating the structure to allow people and collections to be associated with each other, fixing a number of bugs, ensuring ‘deleted’ content could no longer be accessed through the site (it was previously available for test purposes), adding a new ‘contact’ section and adding a feature that ensures people agree that they have the rights to upload images. It’s all looking pretty good and as far as I’m aware it’s going to be officially launched in a week’s time.
I also received the final pieces of information I required this week to allow paid apps to be published through the Apple App Store and the Google Play Store. This is something that has been dragging on for a while and it is really good to get it out of the way. It’s actually something that’s required by people outside of Critical Studies and getting it sorted took quite a bit of effort, so it’s especially good to have it done.
I also took ownership of the Pennant project’s technical stuff this week. This is a temporary arrangement until a new developer is found, but in the meantime I noticed some problems with the project’s WordPress installation. There is some sort of issue that is stopping WordPress connecting to external servers. It’s being blocked somehow and as this is affecting things such as the Akismet anti-spam plugin I thought I’d better try and investigate. I had thought it was some kind of server setting, but I installed the site on my test server and it gave the same errors, even though another WordPress site I had on the server worked fine. I tried a variety of approaches, such as updating the version of WordPress, replacing the data with data from a different instance and deactivating each plugin in turn, and I eventually figured out that it’s something within the wordpress options table that’s causing the problem. If I replace this with a default version the connection works. However, this table contains a lot of information about valid WordPress plugins so I’ll have to carefully go through it to identify what has caused the problem. I’m fairly certain it’s one of the plugins that has somehow managed to block external connections. I’ll need to continue with this next week.
I met with Gary Thoms this week to discuss the technical aspects of the SCOSYA project. The .ac.uk domain name has still not come through yet so I contacted Derek Higgins, who is the person who deals with JANET, to ask him what’s going on. Happily he said JANET have now approved the domain and are awaiting payment, so we should be able to get a project website set up in the next few weeks at least. In the meantime I set Gary’s laptop up so that it could access the test version of the site I developed last year. This now means that he can use the content management system to upload and edit survey data and things like that.
I also tried to help Fraser Dallachy out with a problem he was encountering when using the command line version of the SAMUELS tagger. When running the script on his laptop he was just getting memory errors. I updated the command he was running and this at least got the script to start off, but unfortunately it got stuck loading the HT data and didn’t proceed any further. Fraser spoke to Scott at Lancaster about this and he thought it was a memory issue – apparently the script requires a minimum of 2Gb of RAM. Fraser’s laptop had 4Gb of RAM so I wasn’t convinced this was the problem, but we agreed to try running it on my new desktop PC (with 16Gb of RAM) to see what would happen. Surprisingly, the script ran successfully, so it would appear that 4Gb of RAM is insufficient. I say it ran successfully, which it did with a test file that only included ‘the cat sat on the mat’. Unfortunately, no matter what we did by way of changing the input file, the output resolutely produced the output for ‘the cat sat on the mat’! It was most infuriating and after trying everything I could think of I’m afraid I was stumped. Fraser is going to speak to Scott again to see what the problem might be.
I spent the rest of the week getting up to speed with TEI and Oxygen for the People’s Voice project. Although I have a bit of experience with TEI and XML technologies I have never been responsible for these aspects on any project I’ve been involved with before, and to become the resident expert is going to take some time. Thankfully I found some very handy online tutorials (http://tei.it.ox.ac.uk/Talks/2009-04-galway/) aimed at complete beginners, which I found to be a very useful starting point, despite being a few years old now. With a sample poem from the project in hand and Oxygen opened on my computer I managed to make some pretty good progress with transcribing and figuring out how to cope with the variety of content that needed to be marked up in some way. The thing about TEI is there are often several different ways something could be encoded and it’s difficult to know which is ‘right’, or perhaps most suitable. Thankfully I had Graeme Cannon around to offer advice, which was hugely helpful as Graeme has been using these technologies for a long time now and knows them all inside out. By the end of the week I had familiarised myself with Oxygen as a tool, had created a RelaxNG schema using the TEI P5 Roma tool, had created a stylesheet for the Author View, had transcribed the poem to a standard that Graeme was happy with and had begun work preparing the workshop I’m going to be leading for the project team next Wednesday.
So, here we are in the last full working week before the Christmas holidays. It’s certainly sneaked up quickly this year. I was sort of expecting work to be calming down in the run-up to Christmas but somehow the opposite has happened and there has been a lot going on this week, although I can’t really go into too much detail about at least some of it. On Monday I had a meeting with Gerry Carruthers and Catriona MacDonald about the People’s Voice project, which will be starting in January and for which I will be creating an online resource and giving advice on TEI markup and the like. We had a useful meeting where we discussed some possible technical approaches to the issues the project will be tackling and discussed the sorts of materials that the project will be transcribing. We arranged a time for a training session on Oxygen, TEI and XML in January, so I’ll need to ensure I get some materials ready for this. A lot of Monday and Tuesday was spent going through the documentation for the new Burns bid that Gerry is putting together and preparing feedback on this. Gerry is hoping to get the bid submitted soon so fingers crossed that it will be a success.
I spent a fair amount of time this week setting things up to allow me to access the ScotGrid computing resource in order to process the Hansard data for the Samuels project. This included getting my Grid certificate from John Watt and then running through quite a few steps that were required in order to get me SSH access to the Grid. Thankfully Gareth Roy had sent me some useful documentation that I followed and the process all went pretty smoothly. I have now managed to run a test script on the grid, and in the new year I will hopefully be able to set up some scripts to process chunks of the massive text file that I need to work with. On Wednesday I met with Chris McGlashan and Mike Black from Arts IT Support to discuss the possibility of me getting at least 300Gb of server space for a database in which to store all of the data I hope to extract. Unfortunately they are not currently able to offer this space as the only servers that are available host live sites and they fear that having a Grid-based process inserting data into one of them might be too much load for the server. 300Gb of data is a lot – it’s probably more than all the other Arts hosted databases put together, so I can appreciate why they are reluctant to get involved. I’ll just need to see what we can do about this once I manage to get in touch with Marc Alexander. I believe there were funds in the project budget for server costs, but I’ll need to speak to Marc to make sure.
Also this week I helped Carole Hough out with some issues she’s been having with the CogTop website and Twitter, and spoke further with Pauline about the restructuring of the Burns website. She is now hoping to have this done next Monday so hopefully we can still launch the new version before Christmas. I also spent some time finishing off the final outstanding items on my Medical Humanities Network ‘to do’ list. This included allowing project members to be associated with teaching materials and updating the system so that the different types of data (projects, people, teaching materials, collections, keywords) can be ‘deleted’ by admin users as well as just being ‘deactivated’. Note that ‘deleted’ records do still exist in the underlying database so I can always retrieve these if needs be.
I was also involved in a lot of App based stuff this week. Some people in MVSL have been trying to get a paid app published via the University account for some time now, but there have been many hurdles on the way, such as the need for the University to approve the paid app contract, filling in of tax forms and bank account details, creating of custom EULAs and a seemingly endless stream of other tasks that need to be completed. I’ve been working with various people across the University to try and get this process completed, and this has taken quite a bit of time this week. We’re almost there now and I really hope that everything will be ready next week. However, even if it is, the App Store will not be accepting app submissions over the Christmas holidays anyway so things are going to be delayed a little longer at least. I was also involved in a lengthy email discussion with Fraser Rowan about app development in the University. There is something of a push for app development and approval to be more formally arranged in the University, which I think is a good thing. There are lots of things that need to be considered relating to this, but I can’t really go into any detail about them here at this stage.
I will be working on Monday and Tuesday next week and then that is me off until the New Year.
It was a week of many projects this week, mostly working on smallish tasks that still managed to take up some time. I was involved in an email discussion this week with some of the University’s data centre people, who would like to see more Arts projects using some of the spare capacity on the ScotGrid infrastructure. This seemed pretty encouraging for the ongoing Hansard work and it culminated in a meeting on Friday with Gareth Roy, who works with the Grid for Physics. This was a very useful meeting, during which I talked through our requirements for data extraction and showed Gareth my existing scripts. Gareth gave some really helpful advice on how to tackle the extraction, such as splitting the file up into 5Mb chunks before processing and getting nodes on the Grid to tackle these chunks one at a time. At this stage we still need to see whether Arts Support will be able to provide us with the database space we require (at least 300Gb) and allow external servers (with specified IP addresses) to insert data. I’m going to meet with Chris next week to discuss this matter. At this stage things are definitely looking encouraging and hopefully some time early in the new year we’ll actually have all of the frequency data extracted.
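To give a rough idea of the chunking approach Gareth suggested, here is a minimal Python sketch. It is only an illustration and not one of the actual extraction scripts: the function name `iter_chunks` and the choice to end each chunk on a line break (so that no line is cut in half) are my own assumptions.

```python
import io
from typing import BinaryIO, Iterator

# Roughly 5Mb, the chunk size suggested at the meeting.
CHUNK_BYTES = 5 * 1024 * 1024


def iter_chunks(handle: BinaryIO, chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield successive chunks of about chunk_bytes, each ending on a line break."""
    while True:
        block = handle.read(chunk_bytes)
        if not block:
            return
        # Read on to the end of the current line so no line is split
        # across two chunks (chunks may therefore be slightly larger).
        block += handle.readline()
        yield block


if __name__ == "__main__":
    # Tiny in-memory demonstration with a deliberately small chunk size.
    sample = io.BytesIO(b"first line\nsecond line\nthird line\n")
    for number, chunk in enumerate(iter_chunks(sample, chunk_bytes=8)):
        print(number, chunk)
```

Each chunk could then be written to its own file and handed to a separate Grid node, although how the jobs are actually submitted depends on the Grid setup and is not covered here.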
For the Metaphor in the Curriculum project we had a little Christmas lunch out for the team on Tuesday, which was nice. On Friday Ellen and Rachael had organised a testing session for undergraduates to test out the prototype quiz that we have created, and I met with them afterwards to discuss how it went. The feedback they received was very positive and no-one encountered any problems with the interface. A few useful suggestions were made – for example that only the first answer given should be registered for the overall score, and that questions should be checked as soon as an answer is selected rather than having a separate ‘check answer’ button. I’ll create a new version of the prototype with these suggestions in place.
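The two suggestions can be sketched in a few lines of Python. This is purely illustrative: the prototype itself is not written this way, and the class name `QuizScore` and the question identifiers are hypothetical. Every answer is checked as soon as it is given, but only the first attempt at each question counts towards the overall score.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QuizScore:
    """Tracks a quiz score where only the first answer per question counts."""

    correct_answers: Dict[str, str]
    first_results: Dict[str, bool] = field(default_factory=dict)

    def answer(self, question_id: str, choice: str) -> bool:
        """Check an answer immediately; record it only if it is the first attempt."""
        is_correct = self.correct_answers[question_id] == choice
        # setdefault leaves any earlier result for this question untouched.
        self.first_results.setdefault(question_id, is_correct)
        return is_correct

    @property
    def score(self) -> int:
        """Number of questions answered correctly at the first attempt."""
        return sum(self.first_results.values())
```

With this arrangement a user can keep trying a question until they get it right (and get feedback each time) without inflating their final score.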
Hannah Tweed contacted me this week with some further suggestions for the Medical Humanities Network website, including adding facilities to allow non-admin users to upload keywords and some tweaks to the site text. I still need to implement some of the other requests she made, such as associating members with teaching materials. I should be able to get this done before Christmas, though.
Magda also contacted me about updating the Scots Thesaurus search facility to allow variants of words to be searched for. Many words have multiple forms divided with a slash, or alternative spellings laid out with brackets, for example ‘swing(e)’. Other forms were split with hyphens or included apostrophes, and Magda wanted to be able to search for these with or without the hyphens. I created a script that generated such variant forms and stored them in a ‘search terms’ database table, much in the same way as I had done for the Historical Thesaurus of English. I then updated the search facilities so that they checked the contents of this new table and I also updated the WordPress plugin so that whenever words are added, edited or deleted the search variants are updated to reflect this. Magda tested everything out and all seems to be working well.
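The sort of variant generation described above might look something like the following Python sketch. It is not the actual script: the function name `search_variants` is hypothetical, and I am assuming that dropping a hyphen joins the two parts together and that apostrophes get the same with-or-without treatment. Where a form has more than one bracketed section, the sketch only produces the 'all included' and 'all removed' versions.

```python
import re
from typing import Set

# Matches an optional bracketed section such as the "(e)" in "swing(e)".
BRACKETED = re.compile(r"\(([^)]*)\)")


def search_variants(form: str) -> Set[str]:
    """Return the set of searchable variants for one thesaurus word form."""
    variants: Set[str] = set()
    # Forms divided with a slash are treated as separate words.
    for part in form.split("/"):
        part = part.strip()
        if not part:
            continue
        # Bracketed letters are optional: keep one version without and one with them.
        expanded = {BRACKETED.sub("", part), BRACKETED.sub(r"\1", part)}
        for word in expanded:
            no_hyphen = word.replace("-", "")
            variants.add(word)
            variants.add(no_hyphen)
            # Handle both straight and curly apostrophes.
            variants.add(word.replace("'", "").replace("’", ""))
            variants.add(no_hyphen.replace("'", "").replace("’", ""))
    variants.discard("")
    return variants
```

Each variant returned would then be stored as a row in the 'search terms' table alongside the ID of the word it belongs to.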
For the SCOSYA project Gary sent me the first real questionnaire to test out the upload system with. My error checking scripts picked up a couple of problems with the contents (a typo in the codes, plus some other codes that hadn’t been entered into my database yet) but after these were addressed the upload went very smoothly. I also completed work on the facilities for editing and deleting uploaded data.
During the week there were times when the majority of internet access was cut off due to some issues with JANET. Unfortunately this had a bit of an impact on the work I could do as I do kind of need internet access to do pretty much everything I’m involved with. However, I made use of the time with some tasks I’d been meaning to tackle for a while. I installed Windows 10 on my MacBook and then reinstalled all of the software I use. I also copied all of my app development stuff from my MacBook onto my desktop computer in preparation for creating the Metaphor in the Curriculum app and also for creating new Android versions of the STELLA apps that still don’t have Android versions available.
I also spent some time this week getting up to speed on the use of Oxygen, XML and TEI in preparation for the ‘People’s Voice’ project that starts in January. I also went through all of the bid documentation for this project and began to consider how the other technical parts of the project might fit together. I have a meeting with Gerry and Catriona next week where we will talk about this further.
These weeks seem to be zipping by at an alarming rate! I split most of my time this week between three projects and tackled a few bits and bobs for other projects along the way too. First up is Metaphor in the Curriculum. Last week I created a fully functioning mockup of a metaphor quiz and I’d created three basic interface designs. This week I created a fourth design that is an adaptation of the third design, but incorporates some fairly significant changes. The biggest change is the introduction of a background image – a stock image from the very handy free resource http://www.freeimages.com/. The use of a background image really brightens up the interface and some transparency features on some of the interface elements help to make the interface look appealing without making it difficult to read the actual content. I also reworked the ‘MetaphorIC’ header text so that the ‘IC’ is in a different, more cursive font and added a ‘home’ button to the header. I think it’s coming together quite nicely. We have another project meeting next week so I’ll probably have a better idea about where to focus next on this project after that.
My next project was the Burns project. Last month Pauline sent round a document listing some fairly major changes to the project website – restructuring sections, changing navigation, layout and page content etc. I set up a test version of the live site and set about implementing all of the changes that I could make without further input from project people. After getting it all working pretty well I contacted Pauline and we arranged to meet on Monday next week to go through everything and (hopefully) make all of the changes live.
The third project I worked on this week was the SCOSYA project and this took up the bulk of my time. Last week Gary had sent me a template of the spreadsheet that the project fieldworkers will fill in and email to Gary. Gary will then need to upload these spreadsheets to an online database through a content management system that I need to create. This week I began working on the database structure and the content management system. The project also wants the usual sort of project website and blog, so first of all I set up WordPress on the project’s domain. I toyed with the idea of making the content management system a WordPress ‘plugin’, but as I want the eventual front-end to be non-WordPress I decided against this. I also looked into using Drupal for the content management system as Drupal is a tool I feel I ought to learn more about. However, the content management system is going to be very straightforward – just file upload plus data browse, edit and delete – and using Drupal or other such tools seemed like overkill to me. I was also reluctant to use a system such as Drupal because such systems seem to change so rapidly. SCOSYA is a 5 year project (I think!) and my worry is that by the end of the project the version of Drupal that I use would have been superseded, no longer supported and seen as a bad thing to have running on a server. So I decided just to create the CMS myself.
I decided that rather than write all of the user authentication and management stuff myself I would tie this in with the WordPress system that I’d set up to power the project website and blog. After a bit of research I figured out that it is remarkably easy for non-WordPress scripts to access the WordPress authentication methods so I set up the CMS to use these, following the instructions I found here: http://skookum.com/blog/using-wordpress-as-a-user-and-authentication-database. With this in place SCOSYA staff can manage their user accounts via WordPress and use the same details to access the CMS, which seems very neat.
By the end of the week I had created an upload script that allows you to drag and drop multiple files into the upload pane, for these to be checked both on the client and server side and for a log of uploads to be built up dynamically as each file is processed in a scrolling section beneath the upload pane. I still need to do quite a lot of work on the server-side script in order to extract the actual data from the uploaded files and to insert this data into the relevant tables, but I feel that I have made very good progress with the system so far. I’ll continue with this next week.
Other than these three main projects I was involved with some others. I fixed a few bugs that had crept into the SciFiMedHums bibliographic search facility when I’d updated the functionality of it last week. I slightly tweaked the Medical Humanities Network system to give a user feedback if they try to log in with incorrect details (previously no feedback was given at all). I also contacted John Watt at NeSC to see whether he might be able to help with the extraction of the Hansard data and he suggested I try to do this on the University’s High Performance Compute Cluster. I’ll need to speak with the HPCC people to see how I might be able to use their facilities for the task that needs performed.