I returned to the redevelopment of the DSL website this week, which took up a large chunk of my available time. This included adding some new features, such as making the entry page automatically scroll to the first highlighted item when it is reached via the search results page; fixing some layout issues and bugs (such as ‘hide snippets’ sometimes appearing twice on the results page); updating some content, such as the ‘support DSL’ page; and fixing a number of typos in the content that people had noted. Peter also migrated the API to the server at Glasgow this week, and I spent a bit of time updating the way the front end connects to the API. As the API and the front end are now running on the same server, there is no longer any need to use PHP's cURL library to make a pathway through the university’s proxy. Instead we can simply use a direct localhost URL, which should give a performance boost. Peter had also implemented the ‘word of the day’ feature this week, so I got that working on the homepage too.
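The API connection change can be sketched roughly as follows. This is written in Python for illustration only (the DSL front end itself is PHP). The base URL, port and path are hypothetical, since the post only says a "direct localhost URL" is now used. The key idea is that a request to a co-located API can skip the proxy entirely.

```python
import json
import urllib.request

# Hypothetical base URL: the post only says the front end now calls the API
# via "a direct localhost URL"; the port and path are assumptions.
API_BASE = "http://localhost:8080/api"

# An opener built with an empty ProxyHandler ignores any proxy settings in the
# environment, so requests go straight to the co-located API server.
_direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_json(path, base=API_BASE, timeout=5.0):
    """GET base/path from the local API and decode the JSON response body."""
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    with _direct_opener.open(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json("wordoftheday")` would request `http://localhost:8080/api/wordoftheday`. That endpoint name is also hypothetical.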
I also started migrating the ‘History of Scots’ section of the old website to the new one. I had hoped to save the HTML from the old site and simply slot it into the page structure of the new site, but on looking at the source of the old pages I realised this would not be a good idea. The HTML had been generated by MS Word, which produces an absolute mess of non-standard tags, hard-coded font styles and sizes, and other horrible issues. Not only that, but it was generated by a version of Word from more than 10 years ago, when Microsoft was particularly bad at following HTML standards even remotely. I just couldn’t bear the thought of all that mess going into the new site, so I decided to recode the whole lot myself. This is a pretty major and utterly tedious task, but at least it means the pages will have decent HTML, and it’s a process that will only ever have to be done once. By the end of the week I’d managed to complete the first 5 of the 9 sections, so hopefully I’ll get the rest completed next week. That should leave enough time to finish the other outstanding tasks before the launch on the 12th of September.
Outside of DSL tasks, I spent some further time working on the response to the reviews for Carole’s AHRC project. We submitted the response on Tuesday, so here’s hoping the review panel goes well. I also spent a little more time on the response for Jennifer’s project. I had my PDR with Jennifer on Thursday, which went OK. My PDR form was considered to be very good except for one thing: the words, most of which I’ll be changing next week.
I had two other meetings this week. The first was with Marc and Fraser to discuss the Hansard part of the SAMUELS project, which I will be working on before Christmas. The people at Lancaster will be semantically tagging the Hansard texts. I will then load all of this data into a database and create a front end through which people will be able to run queries, such as finding the most common semantic groupings. We’ll also provide facilities that allow users to generate some nice graphs and things from the data.
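The "most common semantic groupings" query can be sketched as a simple GROUP BY over the tagged tokens. This version uses an in-memory SQLite database. The table name, the columns (speech_id, word, sem_tag) and the tag labels in the sample rows are all hypothetical placeholders, because the post doesn't describe the schema or the tag set the Lancaster team will use.

```python
import sqlite3


def build_db(rows):
    """Load hypothetical (speech_id, word, sem_tag) rows into an in-memory DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tagged_token (speech_id INTEGER, word TEXT, sem_tag TEXT)"
    )
    conn.executemany("INSERT INTO tagged_token VALUES (?, ?, ?)", rows)
    # An index on the tag column can speed up grouping on a large corpus.
    conn.execute("CREATE INDEX idx_sem_tag ON tagged_token (sem_tag)")
    return conn


def most_common_tags(conn, limit=10):
    """Return (sem_tag, count) pairs, most frequent first; ties sorted by tag."""
    return conn.execute(
        "SELECT sem_tag, COUNT(*) AS n FROM tagged_token "
        "GROUP BY sem_tag ORDER BY n DESC, sem_tag LIMIT ?",
        (limit,),
    ).fetchall()


# Placeholder sample data: the tag labels are invented for illustration.
rows = [
    (1, "parliament", "government"),
    (1, "tax", "money"),
    (2, "budget", "money"),
    (2, "minister", "government"),
    (3, "pound", "money"),
]
conn = build_db(rows)
print(most_common_tags(conn, limit=2))  # [('money', 3), ('government', 2)]
```

The same pattern extends to graphs: grouping by another column as well (for example a date, if the data includes one) would give per-period counts that could be plotted.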
My other meeting was with Susan Rennie to discuss her Scots Thesaurus project. She’s hoping to get a developer employed in the next couple of months, and in the meantime I’m going to be helping out with some visualisations based on sample data and will also be giving her advice on technical matters.
I spent most of Friday working on Historical Thesaurus matters, mainly fixing some bugs that had crept in. Firstly, there was an issue with duplicate ‘t7’ numbers. Some categories had a number in the t7 column that duplicated the t6 number. For example, a category appeared as 01.05.19.16.02.03.03 when its real number should have been 01.05.19.16.02.03. I worked out what had caused this problem, and it was my fault. When I renumbered the categories last week some categories shifted up one, but in these cases I’d forgotten to clear the contents of the t7 column, resulting in the duplication. Fixing this was no small matter, as there are many instances in the data where it is perfectly legitimate for the t7 column to be the same as the t6 column. I think I’ve managed to sort out the issue, and I have implemented the fix. I also investigated why certain categories had parts of speech in one of the ‘t’ columns, and it turned out these categories were duplicates that could safely be deleted. Finally, I added many more alternative forms to the lexeme search term table so that variant spellings work. For example, the HT uses ‘ize’ for words like ‘civilize’, so a search for ‘civilise’ previously found nothing. Several thousand new variants have now been added to address this.
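The variant-spelling fix can be illustrated with a rough sketch. It assumes the search term table is a list of (lexeme_id, search_term) rows. It derives British ‘-ise’ forms from the HT’s ‘-ize’ forms with a suffix rule. The function names and the exclusion list are hypothetical, and the post doesn’t say how the several thousand variants were actually produced.

```python
import re

# Matches the -ize family at the end of a word (civilize, civilized,
# civilizing, civilization, ...). Group 1 captures what follows "iz".
IZE_SUFFIX = re.compile(r"iz(e|es|ed|ing|er|ers|ation|ations)$")

# Words where "-ize" is not the verb suffix, so swapping in "-ise" would give
# a bogus form such as "sise". Illustrative only, not an exhaustive list.
NOT_IZE_SUFFIX = {
    "size", "sizes", "sized", "sizing",
    "prize", "prizes", "prized",
    "seize", "seizes", "seized", "seizing",
    "maize", "capsize", "capsized",
}


def ise_variant(word):
    """Return the -ise spelling of an -ize word, or None if there isn't one."""
    w = word.lower()
    if w in NOT_IZE_SUFFIX:
        return None
    match = IZE_SUFFIX.search(w)
    if match is None:
        return None
    return w[: match.start()] + "is" + match.group(1)


def search_term_rows(lexemes):
    """Turn (lexeme_id, form) pairs into (lexeme_id, search_term) rows,
    adding an -ise variant wherever one can be derived."""
    rows = []
    for lexeme_id, form in lexemes:
        rows.append((lexeme_id, form.lower()))
        variant = ise_variant(form)
        if variant is not None:
            rows.append((lexeme_id, variant))
    return rows


print(search_term_rows([(1, "civilize"), (2, "civilization"), (3, "size")]))
# [(1, 'civilize'), (1, 'civilise'), (2, 'civilization'), (2, 'civilisation'), (3, 'size')]
```

A search then matches the user’s query against the search_term column and returns the lexeme_id, so ‘civilise’ finds ‘civilize’. Any rule-based generator like this will misfire on some words, which is why the sketch keeps an explicit exclusion list.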