Week Beginning 2nd November 2015

My first task this week was to finish off the AHRC duties I’d started last Friday, and with that out of the way I set about trying to fix a small bug with the Scots Corpus that I’d been meaning to sort for some time. The concordance feature of the advanced search allows users to order the sentences alphabetically by words to the left or right of the node word, but the ordering was treating upper- and lower-case words separately (e.g. ABC and then abc, rather than AaBbCc). This was rather confusing for users and was caused by the XSLT file that transforms the XML data into HTML. This is processed dynamically via PHP, and unfortunately PHP doesn’t support XPath 2, which provides some handy functions for ignoring case. There is a hack to make XPath 1 ignore case by transforming the data before it is ordered, but the last time I tried to get this to work I just couldn’t figure out how to integrate it. Thankfully, when I looked at the XSLT this time I realised where the transformation needed to go, and we have nicely ordered results in the concordance at last.
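
For anyone curious, here’s a minimal sketch of the set-up, with assumed file names: PHP’s XSLT 1.0 processor applies the concordance stylesheet, and the case folding happens via a translate() call in the stylesheet’s xsl:sort (indicated in the comments below) rather than in the PHP itself.

```php
<?php
// Minimal sketch (assumed file names) of how the concordance XSLT is applied
// in PHP. The fix itself lives in the stylesheet: PHP's processor only
// supports XSLT/XPath 1, so the <xsl:sort> uses the translate() hack to fold
// the sort key to one case before ordering, along the lines of:
//
//   <xsl:sort select="translate(., 'abcdefghijklmnopqrstuvwxyz',
//                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>

$xml = new DOMDocument();
$xml->load('concordance.xml');    // hypothetical search results in XML

$xsl = new DOMDocument();
$xsl->load('concordance.xsl');    // hypothetical stylesheet containing the sort

$proc = new XSLTProcessor();      // XSLT 1.0 only, hence the translate() hack
$proc->importStylesheet($xsl);

echo $proc->transformToXML($xml); // HTML for the concordance page
```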

On Tuesday we had a team meeting for the Metaphor in the Curriculum project. One of the metaphor-related items on my ‘to do’ list was to integrate a search for sample lexemes with the Mapping Metaphor search facilities, so in preparation for this meeting I tried to get such a feature working. I managed to get an updated version of the Advanced Search working before the meeting, which allows users to enter some text (with wildcards if required) into a textbox and search the sample lexemes we have recorded for each metaphorical link. I also updated the way advanced search results are displayed. Previously, upon completing an advanced search, users were taken to a page where they had to decide which of the returned categories they actually wanted to view the data for. This was put in place to avoid the visualisation getting swamped with data, but I always found it a rather confusing feature. What I’ve done instead is to present a summary of the user’s search, the number of returned metaphorical connections, an option to refine the search and then buttons leading to the four data views for the results (visualisation, table etc.). I think this works a lot better and makes a lot more sense. I also updated the quick search to incorporate a search for sample lexemes. The quick search is actually rather different to the advanced search in that the former searches categories while the latter searches metaphorical connections. Sample lexemes are an attribute of a metaphorical connection rather than a category, so it took me a while to decide how to integrate the sample lexeme search with the quick search. In the end I realised that both categories in a metaphorical connection share the same pool of sample lexemes; that is how we know there is overlap between the two categories. So I could simply assume that if a search term appears in the sample lexemes then both categories in the associated connection should be returned in the quick search results.
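
To illustrate that last point, here’s a minimal sketch of the quick search logic using PDO; the table and column names (connections, category1, category2, sample_lexemes) are assumptions rather than the actual Mapping Metaphor schema.

```php
<?php
// Minimal sketch of the quick search logic described above, using PDO.
// The schema is assumed: a `connections` table holding the two categories in
// a metaphorical connection and a text field of their shared sample lexemes.

$pdo = new PDO('mysql:host=localhost;dbname=metaphor', 'user', 'password');

function quickSearchByLexeme(PDO $pdo, string $term): array
{
    $stmt = $pdo->prepare(
        'SELECT category1, category2
         FROM connections
         WHERE sample_lexemes LIKE :term'
    );
    $stmt->execute([':term' => '%' . $term . '%']);

    $categories = [];
    foreach ($stmt as $row) {
        // The lexemes are shared between the two categories in a connection,
        // so a match means both categories belong in the quick search results.
        $categories[] = $row['category1'];
        $categories[] = $row['category2'];
    }
    return array_values(array_unique($categories));
}
```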

The actual project meeting on Tuesday went well and Ellen has supplied me with some example questions for which I need to make some mock-ups to test out the functionality of the interactive exercises. Ellen had also noticed that some of the OE data was missing from the online database and after she provided it to me I incorporated it into the system.

On Thursday morning I had a meeting with Jennifer and Gary to discuss the SCOSYA project. I’m going to be developing the database, content management system and frontend for the project, and my involvement will now be starting in the next week or so. The project will be looking at about 200 locations, with student fieldworkers filling in about 1000 questionnaires and making a similar number of recordings. To keep things simple the questionnaires will be paper-based, and the students will transfer their answers to an Excel spreadsheet afterwards, which they will then email to Gary. So rather than having to make a CMS that keeps track of at least 40 users and their various data, I thankfully just need to provide facilities for Gary and one or two others to use, and the data will be uploaded into the database via Gary attaching the spreadsheets to a web form. Gary is going to provide me with a first version of the spreadsheet structure, with all data / metadata fields present, and I will begin working on the database after that.
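
Just to give a flavour of the kind of upload facility this implies, here’s a minimal sketch of a web form handler; the form field name, file handling and storage path are all assumptions, and the actual import step will have to wait until the spreadsheet structure is finalised.

```php
<?php
// Minimal sketch of a web form handler for the questionnaire spreadsheets.
// The field name, allowed extensions, storage path and the eventual import
// step are all assumptions until the spreadsheet structure is agreed.

if (isset($_FILES['questionnaire']) && $_FILES['questionnaire']['error'] === UPLOAD_ERR_OK) {
    $name = basename($_FILES['questionnaire']['name']);
    $ext  = strtolower(pathinfo($name, PATHINFO_EXTENSION));

    if (in_array($ext, ['xls', 'xlsx', 'csv'], true)) {
        // Keep the original file so an import can be re-run or checked later.
        move_uploaded_file($_FILES['questionnaire']['tmp_name'], '/data/scosya/uploads/' . $name);
        // A separate step would then read the rows and write them to the
        // SCOSYA database once the data / metadata fields are finalised.
        echo 'Spreadsheet received.';
    } else {
        echo 'Please attach an Excel spreadsheet.';
    }
}
```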

I spent most of the rest of the week on tasks relating to the Hansard data for the Samuels project. On Thursday afternoon I attended the launch of the Hansard Corpus, which all went very well. My own Hansard-related work (visualising the frequency of thematic headings throughout the Hansard texts) is also progressing, albeit slowly. This week I incorporated a facility that allows the user’s search to be limited to a specific category that they have chosen, or for the returned data to include the chosen category and every other category below this point in the hierarchy. So, for example, ‘Love’ (AU:27) can either show results for this category specifically or also include all categories lower down, e.g. ‘Liking’ (AU:27:a). I created cached versions of the data for the ‘cascading’ searches too, and it’s working pretty well. I then began to tackle limiting searches by speaker. I’ve now got a ‘limit by’ box that users can open up for each row that is displayed. This box currently features a text field where a speaker’s name can be entered, with an ‘autocomplete’ function, so for example searching for ‘Tony B’ will display people like ‘Blair’ and ‘Benn’. Clicking on a person adds them to the limit option and updates the query. And this is as far as I’ve got, because trying to run the query on the fly even for the two years of sample data causes my little server to stop working. I’m going to need to figure out a way to optimise the queries if this feature is going to be at all usable. This will be a task for next week.
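
For what it’s worth, here is a rough sketch of the sort of query logic involved in the category limiting and the speaker autocomplete; the table and column names (tagged_words, heading, speakers) are assumptions rather than the real Hansard database, and as noted these naive queries are exactly what will need optimising.

```php
<?php
// Rough sketch of the two limiting options and the speaker autocomplete, with
// an assumed schema (`tagged_words` with a `heading` code per word, plus a
// `speakers` table) rather than the real Hansard one.

$pdo = new PDO('mysql:host=localhost;dbname=hansard', 'user', 'password');

// Frequencies for a thematic heading, either on its own or cascading to every
// category beneath it (e.g. 'AU:27' also pulling in 'AU:27:a' and deeper).
function headingFrequencies(PDO $pdo, string $code, bool $cascade): array
{
    if ($cascade) {
        $stmt = $pdo->prepare(
            'SELECT year, COUNT(*) AS freq FROM tagged_words
             WHERE heading = :code OR heading LIKE :prefix
             GROUP BY year ORDER BY year'
        );
        $stmt->execute([':code' => $code, ':prefix' => $code . ':%']);
    } else {
        $stmt = $pdo->prepare(
            'SELECT year, COUNT(*) AS freq FROM tagged_words
             WHERE heading = :code GROUP BY year ORDER BY year'
        );
        $stmt->execute([':code' => $code]);
    }
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Autocomplete suggestions for the 'limit by speaker' box, so that typing
// 'Tony B' brings back the matching members' names.
function speakerSuggestions(PDO $pdo, string $partial): array
{
    $stmt = $pdo->prepare(
        'SELECT DISTINCT name FROM speakers
         WHERE name LIKE :p ORDER BY name LIMIT 10'
    );
    $stmt->execute([':p' => '%' . $partial . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```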