I decided this week to devote some time to redevelop the Thesaurus of Old English, to bring it into line with the work I’ve been doing to redevelop the main Historical Thesaurus website. I had thought I wouldn’t have time to do this before next week’s ‘Kay Day’ event but I decided that it would be better to tackle the redevelopment whilst the changes I’d made for the main site were still fresh in my mind, rather than coming back to it in possibly a few months’ time, having forgotten how I implemented the tree browse and things like that. It actually took me less time than I had anticipated to get the new version up and running, and by the end of Tuesday I had a new version in place that was structurally similar to the new HT site. We will hopefully be able to launch this alongside the new HT site towards the end of next week.
I sent the new URL to Carole Hough for feedback as I was aware that she had some issues with the existing TOE website. Carole sent me some useful feedback, which led to me making some additional changes to the site – mainly to the tree browse structure. The biggest issue is that the hierarchical structure of TOE doesn’t quite make sense. There are 18 top-level categories but, for reasons that are not at all clear to me, each top-level category isn’t a ‘parent’ category but is in fact a sibling of the categories that are one level down. E.g. logically ‘04 Consumption of food/drink’ would be the parent category of ‘04.01’, ‘04.02’ etc., but in the TOE this isn’t the case: ‘04.01’, ‘04.02’ etc. sit alongside ‘04’. This really confuses both me and my tree browse code, which expects categories ‘xx.yy’ to be child categories of ‘xx’. This led to the tree browse putting categories where they logically belong, but in places that make no sense within the confines of the TOE – e.g. we ended up with ‘04.04 Weaving’ within ‘04 Consumption of food/drink’!
To confuse matters further, there are some additional ‘super categories’ that I didn’t have in my TOE database but apparently should be used as the real 18 top-level categories. Rather confusingly these have the same numbers as the other top-level categories. So we now have ‘04 Material Needs’ that has a child category ‘04 Consumption of food/drink’ that then has ‘04.04 Weaving’ as a sibling and not as a child as the number would suggest. This situation is a horrible mess that makes little sense to a user, but is even harder for a computer program to make sense of. Ideally we should renumber the categories in a more logical manner, but apparently this isn’t an option. Therefore I had to hack about with my code to try and allow it to cope with these weird anomalies. I just about managed to get it all working by the end of the week but there are a few issues that I still need to clear up next week. The biggest one is that all of the ‘xx.yy’ categories and their child categories are currently appearing in two places – within ‘xx’ where they logically belong and beside ‘xx’ where this crazy structure says they should be placed.
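To make the numbering problem concrete, here is a minimal sketch in Python (not the PHP the site actually uses) contrasting the parent that the category number implies with the parent that the TOE layout described above requires. The `super:` prefix for the same-numbered super categories is a hypothetical naming device, and the assumption that categories three or more levels deep nest normally is mine rather than something the TOE data confirms.

```python
from collections import defaultdict


def naive_parent(catnum):
    """Parent implied by the number alone: '04.04' belongs under '04'."""
    parts = catnum.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None


def toe_parent(catnum):
    """Parent under the TOE layout described in the post.

    Both 'xx' and 'xx.yy' hang off the same-numbered super category, so
    they end up as siblings. Deeper levels are ASSUMED to nest normally.
    """
    parts = catnum.split(".")
    if len(parts) <= 2:
        return "super:" + parts[0]  # hypothetical key for the super category
    return ".".join(parts[:-1])


def group_by_parent(catnums, parent_of):
    """Map each parent key to the sorted list of its child category numbers."""
    children = defaultdict(list)
    for catnum in sorted(catnums):
        children[parent_of(catnum)].append(catnum)
    return dict(children)


cats = ["04", "04.01", "04.04", "04.04.01"]
print(group_by_parent(cats, naive_parent))
print(group_by_parent(cats, toe_parent))
```

With the naive rule ‘04.04’ lands inside ‘04’, which is exactly the ‘Weaving within Consumption of food/drink’ problem. With the TOE rule it sits beside ‘04’ under the super category. Applying only one of the two rules to each category should also avoid a category appearing in both places.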
In addition to all this TOE madness I also spent some further time tweaking the new HT website, including updating the quick search box so the display doesn’t mess up on narrow screens, making some further tweaks to the photo gallery and making alterations to the interface. I also responded to a request from Fraser to update one of the scripts I’d written for the HT OED data migration that we’re still in the process of working through.
In terms of non-thesaurus related tasks this week, I was involved in a few other projects. I had to spend some time on some AHRC review duties. I also fixed an issue that had crept into the SCOTS and CMSW Corpus websites since their migration: the ‘download corpus as a zip’ feature was no longer working because the PHP code used an old class to create the zip that was not compatible with the new server. I spent some time investigating this and finding a new way of using PHP to create zip files. I also locked down the SPADE website admin interface to IP address ranges of our partner institutions and fixed an issue with the SCOSYA questionnaire upload facility. I also responded to a request for information about TEI XML training from a PhD student and made a tweak to a page of the DSL website.
I spent the remainder of my week looking at some app issues. We are hopefully going to be releasing a new and completely overhauled version of the ARIES app by the end of the summer and I had been sent a document detailing the overall structure of the new site. I spent a bit of time creating a new version of the web-based ARIES app that reflected this structure, in preparation for receiving content. I also returned to the Metre app, which I’ve not done anything with since last year. I added in some explanatory text and I am hopefully going to be able to start wrapping this app up and deploying it to the App and Play stores soon, but possibly not until after my summer holiday, which starts the week after next.
I continued to work on the redevelopment of the Historical Thesaurus website this week, which took up the bulk of my time. Marc, Fraser and I have made some really good progress on this and by the end of the week we had most of the new version complete, apart from adding in new content to some of the ‘About’ pages. I think it’s looking really great and the tree browse in particular makes accessing the content so much quicker and easier.
The first major item I attempted to implement was the facility to open the tree at a specific category rather than at the top level. Being able to do this was absolutely vital for the new design as if I couldn’t figure out a way to do it we wouldn’t be able to go from a search result or citation to a specific category and the tree view would therefore be pretty useless.
Initially I attempted to generate the full, open tree structure on the server side and to return this to the ‘fancytree’ plugin, but a script that would generate the full, nested JSON structure, where children arrays are children of children of children etc., was taking me too long to figure out.
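For illustration only, the nested structure described above can be sketched as a short recursive function. This is Python rather than the site’s PHP, the flat `CATEGORIES` table and its headings are invented sample data, and the approach shown (expand only the ancestors of the requested category and leave every other branch lazy) is one possible way to do it, not necessarily the solution eventually adopted. The `key`, `title`, `expanded`, `lazy` and `children` properties are the node fields FancyTree reads from its JSON source.

```python
import json

# Invented sample data: category id -> (parent id, heading).
CATEGORIES = {
    1: (None, "The world"),
    2: (1, "The earth"),
    3: (1, "Life"),
    4: (2, "Land"),
    5: (2, "Water"),
    6: (4, "Island"),
}


def children_of(parent_id):
    """Ids of the categories directly below parent_id (None means top level)."""
    return [cid for cid, (pid, _) in CATEGORIES.items() if pid == parent_id]


def ancestors_of(cat_id):
    """Set of ids on the path from the root down to, but excluding, cat_id."""
    found = set()
    parent = CATEGORIES[cat_id][0]
    while parent is not None:
        found.add(parent)
        parent = CATEGORIES[parent][0]
    return found


def build_nodes(parent_id, open_ids):
    """Recursively build nodes, descending only into branches in open_ids."""
    nodes = []
    for cid in children_of(parent_id):
        node = {"key": str(cid), "title": CATEGORIES[cid][1]}
        if children_of(cid):
            if cid in open_ids:
                node["expanded"] = True
                node["children"] = build_nodes(cid, open_ids)
            else:
                node["lazy"] = True  # children are fetched when the node is opened
        nodes.append(node)
    return nodes


def open_tree_at(cat_id):
    """Nested tree with every ancestor of cat_id expanded."""
    return build_nodes(None, ancestors_of(cat_id))


print(json.dumps(open_tree_at(4), indent=2))
```

Because the recursion only descends into ancestors of the target, the output stays small however large the full hierarchy is.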
I also made lots of other, smaller updates, namely: I fixed a gap that appeared under the arrow showing the selected nav menu item; I ensured that the resizing of the tree / category section on narrow screens worked; I fixed the formatting of the random category ‘reload’ button; I updated the search page to remove the tabs and to add in proper PoS abbreviations to the parts of speech buttons; I reimplemented the ‘Jump to category number’ feature, and gave it the proper PoS abbreviations; I created one specific page for the tree, rather than there being a separate ‘browse’ and ‘search’ page. The system highlights the appropriate nav menu item based on what the user is looking at. I fully integrated the search with the tree view – you can go back to search results or clear your search, and breadcrumbs now appear on the tree page. The ‘Select category’ search results now display the proper PoS abbreviations and search word highlighting is now working in the tree.
I also added an ‘autocomplete’ function for the ‘label’ search – so now if you start typing in a label all of the labels in the system that match your text appear in a selectable list. I also fixed the issue I mentioned last week whereby ‘T7’ categories looked like they had child categories and then gave an error when you tried to expand them. I added the ‘cite’ popup to the tree view too, appearing beside each maincat and subcat. I also upgraded the version of jQuery UI that the site uses and implemented a new ‘search’ option beside each word that opens a pop-up that allows the user to search for the word in the HT, the OED and TOE.
I added in a little function to display the HT version and added a call to this wherever the version is displayed, so that in future we will only have to update the version in one place. I updated the tooltip styles to match the new colours Marc has been working on. I fixed a bug whereby if a search word had a space in it the category pane failed to load in the tree view. I slightly altered the category pane so there is no padding between the heading and the border, tweaked the appearance of the PoS boxes on the search page and made the ‘clear search boxes’ link into a button. I also implemented a completely new structure for the ‘About’ pages, which included incorporating my Sparklines and also embedding Flickr photo galleries in the ‘photo gallery’ page rather than just linking out to that site. I also added in a ‘top’ button that appears on screen when the user is mid-way down a page and updated the way dates are searched for in the advanced search. There is now a ‘standard’ and ‘advanced’ date search option. It has certainly been a productive week!
In addition to the above I spent about a day working on the poem database for The People’s Voice project, which I have now completed. This included making some updates to the search facility and implementing the ‘view record’ page. You can now click on a search / browse result to view the information about a poem. I think all of the information about each poem is displayed, other than sound files as there are none in the system yet. Currently the page is split into two panes. The left-hand pane includes all of the information about the poem (authors, publication details, library details etc) while the right-hand pane contains the researcher’s comments. Poem title and franchise appear above these panes. Information that can be searched for appears as blue links (e.g. author, year of publication). Clicking on one of these links performs a search for this information.
I also upgraded all of my WordPress sites to the latest version of WordPress that was recently released, fixed an issue Rob Maslen was having with his blog and met with Jennifer Smith, Gary Thoms and Niels Cadee from the library to discuss managing the research data for the SCOSYA project.
Next week I will continue to tweak the new HT website and I might have a go at updating the Thesaurus of Old English site as well, depending on what other things crop up.
I spent quite a bit of time this week on the Historical Thesaurus. What began as a few tweaks ahead of Kay Day has now turned into a complete website redevelopment, so things are likely to get a little hectic over the next couple of weeks. Last week I implemented an initial version of a new HT tree-based browse mechanism but at the start of this week I still wasn’t sure how best to handle different parts of speech and subcategories. Originally I had thought we’d have a separate tree for each part of speech, but I came to realise that this was not going to work as the non-noun hierarchy has more gaps than actual content. There are also issues with subcategories, as ones with the same number but different parts of speech have no direct connection. Main categories with the same number but different parts of speech always refer to the same thing – e.g. 01.02aj is the adjective version of 01.02.n. But subcategories just fill out the numbers, meaning 01.02|01.aj can be something entirely different to 01.02|01.n. This means providing an option to jump from a subcategory in one part of speech to another wouldn’t make sense.
Initially I went with the idea of having noun subcategories represented in the tree and the option to switch part of speech in the right-hand pane after the user selected a category in the tree (if a main category was selected). When a non-noun main category was selected then the subcategories for this part of speech would then be displayed under the main category words. This approach worked, but I felt that it was too inconsistent. I didn’t like that subcategories were handled differently depending on their part of speech. I therefore created two additional versions of the tree browser in addition to the one I created last week.
The second one has [+] and [-] instead of chevrons. It has the catnum in grey before the heading. The tree structure is the same as the first version (i.e. includes all noun categories and noun subcats). When you open a category the different parts of speech now appear as tabs, with ‘noun’ open by default. Hover over a tab to see the full part of speech and the heading for that part of speech. The good thing about the tabs is the currently active PoS doesn’t disappear from the list, as happens with the other view. When viewing a PoS that isn’t ‘noun’ and there are subcategories the full contents of these subcategories are visible underneath the maincat words. Subcats are indented and coloured to reflect their level, as with the ‘live’ site’s subcats, but here all lexemes are also displayed. As ‘noun’ subcats are handled differently and this could be confusing a line of text explains how to access these when viewing a non-noun category.
For the third version I removed all subcats from the tree and it only features noun maincats. It is therefore considerably less extensive, and no doubt less intimidating. In the category pane, the PoS selector is the same as the first version. The full subcat contents as in v2 are displayed for every PoS including nouns. This does make for some very long pages, but does at least mean all parts of speech are handled in the same way.
Marc, Fraser and I met to discuss the HT on Wednesday. It was a very productive meeting and we formed a plan about how to proceed with the revamp of the site. Marc showed us some new versions of the interface he has been working on too. There is going to be a new colour scheme and new fonts will be used too. Following on from the meeting I updated the navigation structure of the HT site, replaced all icons used in the site with Font Awesome icons, added in the facility to reload the ‘random category’ that gets displayed on the homepage, moved the ‘quick search’ to the navigation bar of every page and made some other tweaks to the interface.
I spent more time towards the end of the week on the tree browser. I’ve updated the ‘parts of speech’ section so that the current PoS is also included. I’ve also updated the ordering to reflect the order in the printed HT and updated the abbreviations to match these too. Tooltips now give text as found in the HT PDF. The PoS beside the cat number is also now a tooltip. I’ve updated the ‘random category’ to display the correct PoS abbreviation too. I’ve also added in some default text that appears on the ‘browse’ page before you select a category.
- If it’s a subcat we don’t just want to display this, we need to grab its maincat, all of the maincat’s subcats but then ensure the passed subcat is displayed on screen.
- We need to build up the tree hierarchy, which is for nouns, so if the passed catid is not a noun category we need to also then find the appropriate noun category
I have sorted out point 1 now. If you pass a subcat ID to the page the maincat record is loaded and the page scrolls until the subcat is in view. I will also highlight the subcat, but haven’t done this yet. I’m still in the middle of addressing the second point. I know where and how to add in the grabbing of the noun category, I just haven’t had the time to do it yet. I also need to properly build up the tree structure and have the relevant parts open. This is still to do as currently only the tree from the maincat downwards is loaded in. It’s potentially going to be rather tricky to get the full tree represented and opened properly so I’ll be focussing on this next week. Also, T7 categories are currently giving an error in the tree. They all appear to have children, and when you click on the [+] an error occurs. I’ll get this fixed next week too. After that I’ll focus on integrating the search facilities with the tree view. Here’s a screenshot of how the tree currently looks:
I was pretty busy with other projects this week as well. I met with Thomas Clancy and Simon Taylor on Tuesday to discuss a new place-names project they are putting together. I will hopefully be able to be involved in this in some capacity, despite it not being based in the School of Critical Studies. I also helped Chris to migrate the SCOTS Corpus websites to a new server. This caused some issues with the PostgreSQL database that took us several hours to get to the bottom of. These were causing the search facilities to be completely broken, but thankfully I figured out what was causing this and by the end of the week the site was sitting on a new server. I also had an AHRC review to undertake this week.
On Friday I met with Marc and the group of people who are working on a new version of the ARIES app. I will be implementing their changes so it was good to speak to them and learn what they intend to do. The timing of this is going to be pretty tight as they want to release a new version by the end of August, so we’ll just need to see how this goes. I also made some updates to the ‘Burns and the Fiddle’ section of the Burns website. It’s looking like this new section will now launch in July.
Finally, I spent several hours on The People’s Voice project, implementing the ‘browse’ functionality for the database of poems. This includes a series of tabs for different ways of browsing the data. E.g. you can browse the titles of poems by initial letter, you can browse a list of authors, years of publication etc. Each list includes the items plus the number of poems that are associated with the item – so for example in the list of archives and libraries you can see that Aberdeen Central Library has 70 associated poems. You can then click on an item and view a list of all of the matching poems. I still need to create the page for actually viewing the poem record. This is pretty much the last thing I need to implement for the public database and all being well I’ll get this finished next Friday.
Monday this week was the spring bank holiday so it was a four-day week for me. I split my time this week over three main projects. Firstly, I set up an initial project website for Jane Stuart-Smith’s SPADE project. We’d had some difficulty in assigning the resources for this project but thankfully this week we were given some web space and I managed to get a website set up, create a skeleton structure for it and create the user accounts that will allow the project team to manage the content. I also had some email discussions with the project partners about how best to handle ‘private’ pages that should be accessible to the team but no-one else. There is still some work to be done on the website, but for the time being my work is done.
I also continued this week to work on the public interface for the database of poems for The People’s Voice project. Last week I started on the search facility, but only progressed as far as allowing a search for a few fields, with the search results page displaying nothing more than the number of matching poems. This week I managed to pretty much complete the search facility. Users can now search for any combination of search boxes and on the search results page there is now a section above the results that lists what you’ve searched for. This also includes a ‘refine your search’ button that takes the user back to the search page. The previously selected options are now ‘remembered’ by the form, allowing the user to update what they’ve searched for. There is also a ‘clear search boxes’ button so the user can start a fresh search.
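A search over ‘any combination of search boxes’ usually comes down to building the WHERE clause from whichever fields were filled in. The sketch below shows that idea in Python with an in-memory SQLite database. It is an assumption-laden illustration and not the project’s code: the `poems` table, its column names and the sample rows are all invented.

```python
import sqlite3

# Hypothetical searchable columns; the real database schema is not known.
TEXT_FIELDS = ("title", "author", "library")


def build_query(criteria):
    """Return (sql, params) using only the search boxes that were filled in."""
    clauses, params = [], []
    for field in TEXT_FIELDS:
        value = criteria.get(field)
        if value:
            clauses.append(field + " LIKE ?")
            params.append("%" + value + "%")
    if criteria.get("year"):
        clauses.append("year = ?")
        params.append(int(criteria["year"]))
    sql = "SELECT title FROM poems"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY title", params


def search(conn, criteria):
    """Run the search and return the matching titles in title order."""
    sql, params = build_query(criteria)
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poems (title TEXT, author TEXT, year INTEGER, library TEXT)")
conn.executemany(
    "INSERT INTO poems VALUES (?, ?, ?, ?)",
    [  # made-up rows, purely for the demonstration
        ("A Song", "J. Smith", 1832, "Library One"),
        ("B Song", "A. Jones", 1832, "Library Two"),
        ("C Song", "J. Smith", 1867, "Library Two"),
    ],
)
print(search(conn, {"author": "smith", "year": "1832", "title": ""}))
```

Keeping the submitted criteria in a single dictionary also makes it straightforward to list what was searched for above the results and to refill the form for ‘refine your search’. The values are passed as bound parameters rather than spliced into the SQL string.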
Search results are now paginated. Twenty results are displayed per page and if there are more results than this then ‘next’, ‘previous’ and ‘jump to page’ links are displayed above and below your search results. If there are lots of pages some ‘jump to page’ links are omitted to stop things getting too cluttered. Search results display the poem title, date (or ‘undated’ if there is no date), archive / library, franchise and author. Clicking on a result will lead to the full record, but this is still to do. I also haven’t added in the option to order the results by anything other than poem title, as I’m not sure whether this will really be of much use and it’s going to require a reworking of the way search results are queried if I am to implement it. I still have the ‘browse’ interface to work on and the actual page that displays the poem details, and I’ll continue with this next week.
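The pagination logic described above can be sketched in a few lines. This is a Python illustration rather than the site’s own code: the twenty-per-page figure comes from the post, while the rule for which ‘jump to page’ links to omit (keep the first page, the last page and a window of two either side of the current page) is an assumption chosen for the example.

```python
import math

PER_PAGE = 20  # results per page, as stated in the post


def page_count(total_results, per_page=PER_PAGE):
    """Number of pages needed; an empty result set still has one page."""
    return max(1, math.ceil(total_results / per_page))


def page_slice(results, page, per_page=PER_PAGE):
    """The results shown on a given 1-based page."""
    start = (page - 1) * per_page
    return results[start:start + per_page]


def jump_links(current, last, window=2):
    """Page numbers to link to, with None marking each omitted run of pages."""
    keep = {1, last} | set(range(current - window, current + window + 1))
    pages = sorted(p for p in keep if 1 <= p <= last)
    links = []
    previous = 0
    for page in pages:
        if page - previous > 1:
            links.append(None)  # rendered as an ellipsis, say
        links.append(page)
        previous = page
    return links


print(page_count(70))
print(jump_links(10, 30))
```

The ‘previous’ and ‘next’ links are then just `current - 1` and `current + 1`, shown only when they fall between 1 and the last page.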
I met with Bryony Randall this week to discuss some final tweaks to the digital edition of the Virginia Woolf short story that I’ve been working on. I made a few changes to the transcription, updated how we label ‘sic’ and ‘corr’ text in the ‘edition settings’ (these are now called ‘original’ and ‘edited’) and I changed which edition settings are selected by default. Where previously the original text was displayed we now display the ‘edited’ text with only line breaks from the ‘original’ retained. Bryony is going to ask for feedback from members of the Network and we’re going to aim to get things finalised by the end of the month.
I spent the rest of the week working on the Historical Thesaurus. Last week I met with Marc and Fraser to discuss updates to the website that we were going to try and implement before ‘Kay Day’ at the end of the month. One thing I’ve wanted to try to implement for a while now is a tree-based browse structure. I created a visual tree browse structure using the D3.js library for the Scots Thesaurus project and doing so made me realise how useful having such a speedy way to browse the full thesaurus structure would be.
I tried a few jQuery ‘tree’ plugins and in the end I went with FancyTree (https://github.com/mar10/fancytree) because it clearly explained how to load data into nodes via AJAX when a user opens the node. This is important for us as we can’t load all 235,000 categories into the tree at once (well, we could but it would be a bad idea). I created a PHP script (that I will eventually integrate with the HT API) that you can pass a catid to and it will spit out a JSON file containing all of the categories and subcategories that are one level down from it. It also checks whether each of these has child categories. If there are child categories then the tree knows to place the little ‘expand’ icon next to the category. When the user clicks on a category this fires off a request for the category’s children and these are then dynamically loaded into the tree. Here’s a screenshot of my first attempt at using FancyTree:
Subcategories are highlighted with a grey background and in this version you can’t actually view the words in a category. Also, only nouns are currently represented. I thought at this stage that we might have to have separate trees for each part of speech, but then realised that the other parts of speech don’t have a full hierarchy so the tree would be missing lots of branches and would therefore not work. In this version the labels only show the category heading and catnum / subcat, but I can update the labels to display additional information. We could for example show the number of categories within each category, or somehow represent the number of words contained in the category so you can see where the big categories are. I should also be able to override the arrow icons with Font Awesome icons.
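The ‘one level down’ script described above has a simple shape, sketched here in Python rather than PHP. Everything in it is illustrative: the `ROWS` table is invented sample data and the function name is hypothetical. The `key`, `title` and `lazy` properties are the ones FancyTree uses to decide whether to show an expand icon and request children when a node is opened.

```python
import json

# Invented sample data: catid -> (parent catid, heading).
ROWS = {
    1: (None, "Root heading"),
    2: (1, "First child"),
    3: (1, "Second child"),
    4: (2, "Grandchild"),
}


def children_json(catid):
    """JSON list of the categories directly below catid.

    Each node is flagged lazy when it has children of its own, so the
    tree can draw the expand icon without loading the grandchildren.
    """
    parents_with_children = {parent for parent, _ in ROWS.values()}
    nodes = [
        {"key": str(cid), "title": heading, "lazy": cid in parents_with_children}
        for cid, (parent, heading) in ROWS.items()
        if parent == catid
    ]
    return json.dumps(nodes)


print(children_json(1))
```

Each response contains a single level, so its size depends on the number of siblings rather than on the 235,000 categories in the whole thesaurus.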
After creating this initial version I realised there was still a lot to be done. For example, if we’re using this browser then we need to ensure that when you open a category the tree loads with the correct part opened. This might be tricky to implement. Also there’s the issue of dealing with different parts of speech.