Another primarily DSL-related week, with pretty much four out of my five days spent working on the redevelopment of the new website. Last week I started migrating the ‘History of Scots’ section from the old website to the new one, and thankfully I managed to complete this task this week. I seem to have given myself a repetitive strain injury whilst getting it done, but at least it’s completed now. There were times when I thought I’d never get through it, so it is quite a relief to see it all there. All this text retagging took up rather a large percentage of the week, but by midway through Wednesday I was able to move on to other, more interesting tasks. I fixed the issues relating to scrolling to highlighted search terms on the entry page. Previously we’d set it up so that the entry page scrolled down to the first instance of the search term, but occasionally this would then be obscured by the ‘fixed’ heading that appears once the page has scrolled down a bit. Adding in a margin ensured that there would be enough space to prevent this from happening. I also updated the highlighting script so that it only activates when the search type is ‘full text’ or ‘quotations’, meaning a headword search now loads the page at the very top.
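The scroll fix can be sketched as a couple of plain functions. The names here are illustrative rather than the production code (the real site uses jQuery), but the logic is the same: gate the behaviour on search type, and offset the scroll target by the fixed header’s height plus a margin.

```javascript
// Only scroll to the highlighted term for search types that match against
// entry text; a headword search should load the page at the very top.
function shouldScrollToHighlight(searchType) {
  return searchType === 'full text' || searchType === 'quotations';
}

// Compute the vertical scroll target for the first highlighted term,
// leaving a margin so the fixed heading does not obscure it.
function scrollTargetY(termOffsetTop, fixedHeaderHeight, margin) {
  return Math.max(0, termOffsetTop - fixedHeaderHeight - margin);
}
```

The computed value would then be handed to something like jQuery’s `scrollTop` animation when the entry page is reached via the search results.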
Other DSL tasks this week included updating the ‘News’ box on the homepage in preparation for the launch and adding in an error message to be displayed in the (hopefully unlikely) event of the entry XML failing to pass through the XSLT. I also tackled two of the larger outstanding items on my ‘to do’ list, namely adding in facilities to enable users to perform an advanced search in the full text while excluding quotations, and adding a predictive text search to the bibliography search form. The latter required some extending of the jQuery UI autocomplete widget so that I could pass multiple variables to my AJAX script (to enable users’ selections of source type and search type to be passed) and also some reworking of the AJAX script I had previously created for the quick search predictive drop-down so as to enable both text and an ID to be passed. This allows the correct bibliography page to be loaded immediately, as soon as the user selects a bibliography item from the list of possibilities. It all seems to be working pretty well, although I’ve had to strip out the XML tags that were being returned via the API, as the autocomplete widget can only display plain text without further extending.
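A minimal sketch of the two autocomplete changes, with assumed function and field names (the production code extends the jQuery UI autocomplete widget’s `source` callback rather than using standalone functions like these):

```javascript
// Bundle the typed term together with the user's source-type and
// search-type selections so all three reach the AJAX script in one request.
function buildAutocompleteRequest(term, sourceType, searchType) {
  return { term: term, sourceType: sourceType, searchType: searchType };
}

// The API returns labels containing XML tags, but the stock autocomplete
// widget renders plain text only, so the tags get stripped before display.
function stripXmlTags(label) {
  return label.replace(/<[^>]+>/g, '');
}
```

Each suggestion returned to the widget would carry both the stripped display text and the bibliography ID, so selecting an item can load the correct bibliography page directly.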
On Friday I devoted quite a bit of time to continuing updates to the Historical Thesaurus database. I thought I had solved the ‘duplicate t7’ issue last week but alas it turned out that there were lots of other subcategories that needed to be updated. I managed to fix the bulk of these but there are unfortunately still some issues with the data that may need manual intervention to sort out. Thankfully it’s not a massive number of categories (a few hundred out of 230,000) but it’s still disappointing that the renumbering process didn’t proceed as smoothly as I’d hoped. Also this week I updated the search so that categories are now returned even if they don’t contain any words, which is something Marc noticed it wasn’t doing last week.
There are still quite a few tweaks I would like to make to the HT website, but these will have to wait for another time. I didn’t have enough time this week to rewrite my PDR form, which is something I really must tackle next week.
I gave a bit of advice to a couple of people this week, firstly Wendy Anderson on a new possible project she’s putting together (I can’t really say much more at this stage) and secondly Stevie Barrett in Celtic who wanted me to give some feedback on the technology they are proposing to use for an upcoming project.
I got back into the redevelopment of the DSL website again this week, which occupied a large chunk of my available time. This included adding in some new features, such as making the entry page automatically scroll to the first highlighted item if reached via the search results page, fixing some layout issues and bugs (such as ‘hide snippets’ sometimes appearing twice on the results page), updating some content such as the ‘support DSL’ page and fixing a number of typos in the content that people had noted. Peter migrated the API to the server at Glasgow this week as well, and I spent a bit of time updating the way the front end connects to the API (as the API and the front end are now running on the same server there is no need to use PHP’s cURL library to route requests through the university’s proxy – instead we can just use a direct localhost URL, which will no doubt give a performance boost). Peter had also implemented the ‘word of the day’ feature this week so I got that working on the homepage too.
I also started migrating the ‘History of Scots’ section of the old website to the new one. I was hoping I’d be able to save the HTML from the old site and simply slot this into the page structure of the new site but upon looking at the source of the old pages I realised this would not be a good idea. The HTML of the pages was generated by MS Word, which produces an absolute mess of non-standard tags, fixed font styles and sizes and other horrible issues. Not only that, but the HTML was generated by a version of Word from more than 10 years ago when Microsoft was particularly bad about even remotely following HTML standards. I just couldn’t bear the thought of all that horrible mess going into the new site so I decided to recode the whole lot myself. This is a pretty major and utterly tedious task, but at least it means the pages will have decent HTML, plus it’s a process that will only ever have to be done once. By the end of the week I’d managed to complete the first 5 sections (out of 9) so hopefully I’ll get it all completed next week, leaving enough time to get the other outstanding tasks completed before the launch on the 12th of September.
Other than DSL tasks, I spent some further time working on the reviewer’s response for Carole’s AHRC project. We submitted the response on Tuesday and here’s hoping the review panel goes well. I also spent a little more time on the response for Jennifer’s project. I had my PDR with Jennifer on Thursday, which went OK. My PDR form was considered to be very good except for one thing: the words, most of which I’ll be changing next week.
I had two other meetings this week, the first one with Marc and Fraser to discuss the Hansard part of the SAMUELS project, which I will be working on before Christmas. The people at Lancaster will be semantically tagging the Hansard texts and then I will be getting all of this data in a database and will be creating a front end through which people will be able to query things like the most common semantic groupings. We’ll provide facilities to allow users to generate some nice graphs and things from the data too.
My other meeting was with Susan Rennie to discuss her Scots Thesaurus project. She’s hoping to get a developer employed in the next couple of months, and in the meantime I’m going to be helping out with some visualisations based on sample data and giving her advice on technical matters.
I spent most of Friday working on Historical Thesaurus matters, mainly fixing some bugs that had crept in. Firstly there was an issue with duplicate ‘t7’ numbers. Some categories that had a ‘t7’ number (e.g. 01.05.19.16.02.03.03) had a duplicate number left in the t7 column – e.g. where the real number should be 01.05.19.16.02.03. I figured out what had caused this problem, which was my fault. When I renumbered the categories last week some categories shifted up one, but in these cases I’d forgotten to clear the contents of the t7 column, resulting in the duplication. Fixing this was no small matter, as there are many instances in the data where it is perfectly legitimate for the t7 column to be the same as the t6 column. I think I managed to sort the issue and implemented the fix. I also investigated why certain categories had parts of speech in one of the ‘t’ columns, and it turned out these categories were duplicates that could safely be deleted. I also added many more alternative forms to the lexeme search term table to allow variant spellings to work – e.g. the HT uses ‘ize’ for words like ‘civilize’, so if you search for ‘civilise’ you don’t find anything. Several thousand new variants have now been added to address this.
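The ‘ise’/‘ize’ variant generation could look something like this sketch (function name and scope are illustrative – the real batch process covered several thousand variants and more patterns than just this one):

```javascript
// Generate '-ise'/'-ize' variant spellings for a headword so that a search
// for 'civilise' also finds 'civilize' and vice versa. A real implementation
// would need an exception list (e.g. 'rise', 'wise' have no '-ize' variant).
function iseIzeVariants(word) {
  const variants = [];
  if (word.includes('ize')) variants.push(word.replace(/ize/g, 'ise'));
  if (word.includes('ise')) variants.push(word.replace(/ise/g, 'ize'));
  return variants;
}
```

Each generated variant would then be inserted into the lexeme search term table alongside the original form.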
I was on holiday for the first three days of this week (a very pleasant but somewhat wet trip to Skye) and was back to work on the Thursday. Despite it being a short working week I managed to cram quite a lot in, most of which revolved around responses to the reviews of two AHRC bids that had come back. The first was Carole’s place-name project. I read the reviews, formulated some responses to some of the technical matters that were raised and participated in a meeting with the prospective project team on the Friday morning. A response is now being written by Thomas Clancy and hopefully the review panel will go well. The other project was Jennifer’s dialect syntax project. There were fewer issues raised with this project and I managed to supply the feedback required from me in email form. Fingers crossed for this one too.
Other than bid related stuff and the usual catching up with emails that always seem to be waiting on return from a holiday, I continued with the updates to the Historical Thesaurus that I had begun before I went away. The HT is undergoing a large-scale renumbering of categories (with tens of thousands having their numbers updated). I had previously completed work on two of the three sections and this week I completed the renumbering of the third section (The Social World). It was a slightly laborious and tricky process but it’s all done now and would appear to have been a success. I also made some further updates to the HT categories that Fraser had emailed me about separately.
I also had a meeting on Friday with a guy called Adam Wyner from Aberdeen University. He is a computer scientist who is involved in a number of Digital Humanities projects (and potential projects) at Aberdeen and he wanted to discuss his research and the research that is being undertaken at Glasgow. One of the main areas he is currently investigating is the creation of a sort of European-wide ontology of machine readable dictionaries and he was really hoping to get access to the DSL data. I showed him the new DSL website, some of the source XML files plus gave him a glimpse of the API but pointed him in the direction of the SLD people for any access to this data. He was also interested in the Historical Thesaurus and the SAMUELS project.
I was hoping to get back into the outstanding DSL redevelopment tasks this week but what with everything else that was going on I simply ran out of time. However, I will be focussing on the DSL almost exclusively next week, all being well.
A very late report this week as I was on holiday on the Friday afternoon and Monday to Wednesday of the following week. Despite being a very busy week this is going to be a bit of a short report due to the time that has elapsed! I spent the majority of the week continuing to go through the list of outstanding tasks for the DSL redevelopment following on from last week’s meeting – ticking off 16 of the 33 items, plus making some further tweaks that Ann had identified. These included fixing some further XSLT errors, ‘unfixing’ the footer so it’s not always visible on screen, fixing the fixed header print bug, fixing the jittery page error when scrolling down entries that are only just longer than the page, tweaking the layout of a number of elements such as the position of the ‘cite’ button and the advanced search page and reducing the number of browse results that are displayed when the browser window is below a certain height.
In addition to DSL work, I also spent some time working with the Burns people. On Tuesday I went to Edinburgh with Pauline to meet with Chris Fleet at the NLS regarding the Burns tour maps. It was a really useful meeting and it was very helpful indeed to talk with Chris and his colleague Daniel. I showed them some mock-ups of the tour maps I’d previously made using OpenLayers and we explained the sort of thing we were after (a pannable, zoomable map with icons that you could click on to open up pop-ups containing HTML). I had thought that OpenLayers required latitude and longitude values in order to ‘pin’ markers on an image but Chris said that this wasn’t the case and that pixel coordinates could be used instead. Pauline and I were expecting to get some advice from Chris, after which I would develop my existing mock-ups further, but instead Chris and Daniel agreed to make initial interactive versions of the tour maps for us, which was absolutely brilliant. A couple of days later Chris emailed an initial version of one of the tour maps, complete with the tour route plotted as a vector layer and markers pinned to the route. It’s all looking very promising indeed. I placed my simple mock-up versions of the three tour maps on the website for now, with the plan being we will replace these with the fancier, more interactive ones in the next couple of months once they are available. Also this week I uploaded a new batch of prose recordings.
My final major task of the week was to tackle the renumbering of the Historical Thesaurus. Christian and Fraser have been hard at work rationalising and correcting the HT category hierarchy and provided me with a rather large document of required changes. This was quite a large task to tackle, involving moving around tens of thousands of HT categories, which obviously required a great deal of care to ensure that data wasn’t lost, incorrectly amalgamated or mangled in some other way. I managed to complete the necessary changes to ‘The External World’ and ‘The Mental World’ this week, and will tackle the remaining section once I’m back from my holiday on Skye next week.
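One step of the renumbering can be sketched like this, assuming tier columns named t1–t7 (the column names and row shape are assumptions, not the actual database schema). The detail that needs care is that tiers beyond the new number’s depth must be cleared rather than left holding old values:

```javascript
// Write a category's new dotted number back into its tier columns t1..t7.
// Any tier column deeper than the new number's depth is blanked, so a
// category moving up a level does not retain a stale deeper-tier value.
function applyNewNumber(row, newNumber) {
  const tiers = newNumber.split('.');
  for (let i = 0; i < 7; i++) {
    row['t' + (i + 1)] = i < tiers.length ? tiers[i] : '';
  }
  return row;
}
```

Running something along these lines over each category listed in the change document, inside a transaction and with before/after counts checked, is one way to keep a bulk move of this size from mangling the data.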
My time this week was mostly split between four projects. First and foremost was DSL redevelopment work. I went across to Edinburgh for a meeting with Ann and Peter on the Thursday and we spent the morning going through the outstanding tasks, looking at the website and the API and deciding what else needs to be done before the launch and how we should proceed. It was a very useful meeting and my ‘to do’ list (which had been getting rather short) was extended greatly as a result of the meeting. The website will be launched on September the 12th and we should have everything done and dusted before that date. In addition to attending the meeting I also continued with the development tasks, including fixing some issues with cross-references and updating the way the XSLT handles certain elements.
The search facility currently replicates (as far as is possible) the functionality of the CD-ROM search facility, although I’ve updated it to allow you to combine a region with a word search rather than the two being separate. It’s possible that the search will need some further work to make it easier to understand. Currently a search has to contain text as well – e.g. you can’t just select a region and view all of its words. This might need updating too. After performing a search the results are loaded as buttons. Clicking on one of these opens the word record, and from here you can navigate through the results using the ‘Previous Result’ and ‘Next Result’ options.
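The combined search might be sketched like this, under the assumption that each word record carries ‘word’ and ‘region’ fields (the names are illustrative, not the app’s actual data structure):

```javascript
// Filter word records by a text term and, optionally, a region. A text
// term is currently required, matching the behaviour described above;
// allowing region-only browsing would mean relaxing the text match when
// the search box is empty.
function searchWords(records, text, region) {
  const needle = text.toLowerCase();
  return records.filter(function (r) {
    const matchesText = r.word.toLowerCase().includes(needle);
    const matchesRegion = !region || r.region === region;
    return matchesText && matchesRegion;
  });
}
```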
I haven’t included any of the additional functionality of the CD-ROM (e.g. the Grammar Broonie or the games) as I believe these aren’t to be included in the initial release. The size of the app is about 15MB (with sound clips taking up about 10MB and data about 5MB). This shouldn’t pose any problems.
My third project of the week was Katie’s Aelfric bid that she’s putting together. I can’t go into too much detail about it at this stage, but I’ve been helping her get the technical side of the project in order and I participated in a conference call with her and the Zooniverse people. This was a useful call and helped define the sorts of input, output and workflows that would be possible. The project is shaping up really nicely and hopefully we’ll be able to submit it by the end of the summer.
My final project this week was the Historical Thesaurus. Fraser had emailed me a list of updates to be made to the data and I spent some time going through these and making the necessary changes. Fraser and Christian also wanted me to create a little script that would pinpoint those categories of any part of speech that don’t have a parent category. A lot of categories that aren’t nouns don’t have parent categories so users can’t navigate up the hierarchy within the part of speech – instead they would have to select the corresponding noun category. We’re going to update this in some way – either creating empty categories to facilitate browsing or updating the interface to point out to people that they need to select the noun to navigate up further. Whilst creating the little script I came across a bit of an error with some rows in the category database. There are about a thousand categories that have a part of speech recorded in their final category number column and this is wrong. Rather strangely, the part of speech found in this field generally isn’t the same as the value found in the actual part of speech column for the category. It’s most perplexing and I really don’t know how these errors have crept in. Hopefully it won’t take too long to identify what the real figures should be and to fix the data.
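The check the little script performs can be sketched as follows, assuming each category is stored with a dotted number string and a part-of-speech code (the data shapes here are assumptions made for illustration): a category’s parent should be the category one tier up with the same part of speech, and the script flags those for which no such category exists.

```javascript
// Return every category whose parent (one tier up, same part of speech)
// is missing from the data, i.e. the categories a user cannot navigate
// up from within that part of speech.
function categoriesWithoutParent(categories) {
  const byKey = new Set(categories.map(c => c.number + '|' + c.pos));
  return categories.filter(function (c) {
    const tiers = c.number.split('.');
    if (tiers.length < 2) return false; // top-level categories have no parent
    const parentNumber = tiers.slice(0, -1).join('.');
    return !byKey.has(parentNumber + '|' + c.pos);
  });
}
```

The output of a check like this is what would drive either the creation of empty categories to facilitate browsing or an interface note telling users to select the corresponding noun category instead.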