Week Beginning 13th May 2013

A very late post for this week as I was working 8-4 on the Friday and ran out of time, then I was ill the Monday to Wednesday of the following week.  I continued with the redevelopment of the Historical Thesaurus website for the majority of this week.  The main achievement this week was the creation of the ‘category’ page, reached via the search results page.  This page displays the words found within a particular category, and also provides extensive browse options to related material, for example any categories that have the same number but a different part of speech, any subcategories of the category in question, and any parent and child categories in the overall HT hierarchy.  It took quite a long time to implement all these browse options, but I think it’s working rather nicely.  There are still some problems with traversing the hierarchy caused by issues with the data, but hopefully the bulk of these will be resolved.

I also added search term highlighting to the category page – with the user’s original search term (minus wildcards) highlighted wherever the term appears (including within longer words).  This works throughout the hierarchy – so if the user searches for ‘*sausage*’ and accesses the subcategory ‘types of sausage’ then the term is highlighted wherever it appears within the words, and if the user then browses up the hierarchy to the main category ‘Sausage’ any occurrences of the term will be highlighted here too.
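
As a rough illustration of the highlighting described above, here is a minimal Python sketch.  It is not the code used on the HT site: the function name `highlight_term`, the `highlight` CSS class and the assumption that `*` is the only wildcard character are all made up for the example.  It strips the wildcards from the search term and wraps every case-insensitive occurrence (including those inside longer words) in a span.

```python
import html
import re


def highlight_term(text: str, search_term: str) -> str:
    """Wrap each occurrence of the search term in a highlight span.

    Assumptions (not from the HT site itself): '*' is the only wildcard
    character, and matching is case-insensitive.
    """
    # Remove wildcards so '*sausage*' becomes 'sausage'.
    term = search_term.replace("*", "").strip()
    if not term:
        # Nothing left to highlight, so just return the escaped text.
        return html.escape(text)

    # re.escape ensures the term is matched literally, not as a regex.
    pattern = re.compile(re.escape(term), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches, then wrap the match itself.
        parts.append(html.escape(text[last:match.start()]))
        parts.append(
            '<span class="highlight">' + html.escape(match.group(0)) + "</span>"
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

Because the function only needs the original search term and the text to display, the same call can be made on any category page the user browses to, which is one way the highlighting could carry on up and down the hierarchy.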

Also this week I reworked the search results (category selection) page with the aim of speeding up the queries.  Previously queries were taking far too long to run – sometimes as much as 30 seconds, which is completely unacceptable.  Thankfully after reworking things the search is significantly faster, generally loading the search results in less than a second for non-wildcard searches and only taking a little longer than this for wildcard searches.  I think the search is now as fast as it needs to be.

Also this week I began work on the ‘advanced search’ page.  I now have all of the required search options within a form on the search page, although for now none of the options actually work – that is still to be tackled next week.  I have also added in the option of jumping straight to a specific category if you know the category number, and this option is fully operational.

I had a further meeting with Marc and Christian on Thursday this week, which was another useful opportunity to go through some of the outstanding tasks and make some decisions.  We are still dealing with a number of issues with the HT data, some having come from the Access database, some introduced during the migration process and others resulting from the original format of the data.  For example, there are some problems with empty categories.  Categories that have no words were not part of the Access database, but are needed to properly enable the traversal of the hierarchy.  Previously Marc gave me a spreadsheet containing a lot of the empty categories, but it turns out this list wasn’t up to date and there are some problems with category numbers having been changed.  Marc is going to get the up-to-date categories to me from the XML file that was submitted to the OED people, which will help greatly with this.

Other than HT work I met with Jean this week to discuss finalising the redevelopment of the Digital Humanities Network website.  We had a very useful meeting and we came up with a list of outstanding tasks that needed to be tackled.  I spent most of Friday working through the list and managed to get most items implemented.  The website is looking much better now, and I also made it live this week, replacing the older version.  You can access it here: