I also made some further browse options for the ‘drilldown’ view of the data, which are now accessible from the ‘metaphor card’. When viewing the connections between two major categories, clicking on the line joining them now gives you the option of viewing all the other categories that are connected to either one or both of the categories in question. It’s a nice little feature that allows visualisations comparable to Ellen’s previous ‘Beauty and Light’ diagrams to be viewed. I have also implemented the option to select the type of metaphor to view, which now appears as an option in the left-hand box. You can select whether to view strong, weak or both types of metaphor and the visualisation updates immediately to reflect your choice, in both the aggregated diagram and the drill-down one. I also moved the colour-code ‘key’ information to its own little box, as the left-hand box was starting to get a bit cluttered.
The final update I made was to begin working on the timeline view of the data. I decided to use the timeglider API for this, and created a PHP script that would automatically spit out data in the correct JSON format depending on user selections. At the moment the script only spits out data for three ‘streams’ (Society-World, Mind-Society, World-Mind) and you can’t provide different options, but this is just a start. The metaphor data gets displayed on the timeline as an exact point in time (i.e. not rounded to 50 year chunks) with different coloured icons for the three streams and different sizes of icon depending on whether the metaphor is strong, weak or unclassified. Even with the test data it is obvious that there is too much data to be properly displayed. If you zoom out so you can view 100 years on screen at one time the screen is swamped in different coloured dots. I’m going to have to think about how this data could be displayed a little more effectively.
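To give a flavour of what that PHP script does, here’s a rough JavaScript sketch of the same transformation – metaphor rows into timeline event objects, with a different coloured icon per stream and a different size per strength. The row fields, icon file names and event structure are all illustrative rather than timeglider’s exact format.

```javascript
// Icon colour per stream – the three streams follow the post,
// the file names are invented for illustration.
const ICONS = {
  'Society-World': 'red_circle.png',
  'Mind-Society': 'blue_circle.png',
  'World-Mind': 'green_circle.png'
};

// Larger icons for strong metaphors, smaller for weak/unclassified.
const SIZES = { strong: 32, weak: 24, unclassified: 16 };

function toTimelineEvents(rows) {
  return rows.map((row, i) => ({
    id: 'metaphor-' + i,
    title: row.label,
    // An exact point in time, not rounded to 50-year chunks.
    startdate: String(row.year) + '-01-01',
    icon: ICONS[row.stream],
    importance: SIZES[row.strength] || SIZES.unclassified
  }));
}
```

The real script would of course pull the rows from the database according to the user’s selections before serialising the result as JSON.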
As previously mentioned, I attended a Digital Humanities event in Edinburgh on Friday. It was an all-day event and on the whole it was very useful. Andrew Prescott gave a hugely interesting talk about the current state of digitisation and digital editions and there was a great session where attendees could discuss their project in 5 minutes. I gave a talk about the Mapping Metaphor visualisations, and it was great to hear the other speakers too, especially those from other projects that were also dealing with visualisations. The organisers are hoping to launch a Scotland-wide Digital Humanities Network soon, which I think has a lot of potential.
Other than the above I did a few other tasks this week, including a little work on the Choral Burns project website, a little work on the Cognitive Toponymy website and I also was in contact with Peter at SLD about the API he has been working on. He has now reached a point where he is ready to give me access to the API. I’ve had a quick look at it and it seems pretty straightforward to follow, which is very good. I aim to start working on connecting the front-end to the API next week, all being well.
This week has been gloriously hot and sunny, not exactly ideal conditions in which to be cooped up inside hunched over a keyboard. However, I have made some good progress this week. The biggest achievement has been completing development work on the new Historical Thesaurus website! Woo! Of course it is highly likely that further tweaks and refinements will be required over the coming weeks and months, but everything that was on my ‘to do’ list has now been done (other than tasks where I’m awaiting content from others). You can view the new website here:
But note that this is a purely temporary URL and once the site goes live this URL will in all likelihood stop working. The final tasks included creating a print CSS file so that when you print a page (e.g. a category) you don’t get all of the non-relevant parts of the web page such as navigation items. I also updated the way subcategories appear on the site following a suggestion from Marc. I had managed to omit the subcategory number from the lists of categories and the category page, so that all subcategories shared the main category number, so instead of ‘01.02.08.01.15.08|01’ it was only displaying ‘01.02.08.01.15.08’. Marc also wanted the full hierarchy of subcategories to be represented in the titles too, which makes perfect sense. I made these changes plus a bunch of others too, but it’s too hot on a Friday afternoon before a long weekend to go into too much detail about them!
Also this week I completed a first draft of the Grammar app, which is something of a relief. It will require further tweaking but all of the interactive exercises are now in place, including the tortuous ‘general exercise’, which can be found here:
Good luck getting top marks in those!
Also this week I met with Nigel Leask to discuss a large AHRC bid that he’s putting together with Aberystwyth. I’ll be contributing to the technical plan for this project and will hopefully continue to be involved with it if it gets funded. I also updated the Burns website to finally fix the Twitter integration error, and also to add two PhD students as blog authors so they can post updates about their Highland Tour in a couple of weeks’ time. I also made some further updates to the Digital Humanities Network website and spent a bit of time giving some advice to a postgrad student who had an idea for an app.
In addition to the above I also spent the best part of a day working on further mock-ups for the Dictionary of the Scots Language website, and these are coming along very nicely. Hopefully in a couple of weeks’ time I’ll be able to share them with the other members of the team and get some feedback.
It was a busy old week this week, starting on Monday with the Digital Humanities Network launch event. The day-long event was a huge success and everything went very smoothly. There were two half-hour talks in the morning, given by Marilyn Deegan and Bill Kretzschmar, each of which was very interesting and very different: the first was a general talk about Digital Humanities while the second looked in more detail at a specific digital humanities project. The demo of the Digital Humanities Network website (http://www.digital-humanities.glasgow.ac.uk/) went well, and I think it’s looking pretty impressive now. In the afternoon there was a series of five-minute talks about a wide range of Digital Humanities projects within the University, and the format of these talks worked very well indeed – each speaker kept within their allotted time and it was a great way to learn more about the projects without getting too bogged down in detailed information about them. I did a five-minute talk about the redevelopment of the STELLA learning and teaching applications, which went fine too.
On Tuesday there was a DROG meeting – the first in fact since February! As with previous meetings it was a good opportunity to discuss the work I have been undertaking, plus upcoming projects and priorities. It looks very much like it will be possible for me to redevelop the SCOTS corpus website at some point, which will be a nice big project to sink my teeth into. Dave Beavan, who was responsible for developing the SCOTS website, was at the DHN event on Monday and I had a brief chat about it and the possibility of it being redeveloped, which was useful.
On Thursday I had arranged to meet the Scottish Language Dictionaries people in Edinburgh to discuss the redevelopment of the Dictionary of the Scots Language website. Ann Ferguson had previously sent me through some requirements documents and Word-based mock-ups of the desired user interface elements, plus access to two test versions of the website, so I spent a good deal of the remainder of Tuesday and also the Wednesday going through all of these and preparing a document of discussion points for the meeting. The meeting itself went very well. All of my questions were answered and it was great to meet Peter Bell, the developer of the test versions and the guy who is going to develop the API for the new version of the website. The meeting lasted almost 3 hours but I think we all had a clearer idea of what would be happening with the redevelopment of the website afterwards. It was agreed that I would design the front end and Peter would develop the API. I will get some server space set up in Glasgow and will then develop some static HTML mock-ups of possible interfaces for discussion and further refinement.
On Friday I wrote up my notes from the meeting and did some investigation about server possibilities. I spent the rest of the day further tweaking the HT website, and next week I will settle down to the big task of re-importing all of the data from Access to MySQL. It’s going to take quite a while and possibly be quite tricky to get everything working as it should do.
This week I continued to work through my ‘to do’ list for the Historical Thesaurus website. The biggest thing I tackled was to make subcategories properly hierarchical within the category page. I had previously implemented a nice little indented list of subcategories, with a greater indentation representing a lower level subcategory, and this appears (when the user asks for it) on the category page when viewing a main category that has one or more subcategories. But if the user clicks to view a subcategory and it contains lower level subcategories or is a child of a higher level subcategory none of this information was being represented – instead all subcategories were flattened and being treated as if they were all one level down from the main category. After quite a bit of reworking of the category page I managed to sort this. Now when viewing a subcategory any parent subcategories appear in the ‘Up hierarchy to’ bar (with a darker background colour to differentiate these from main categories) and any child categories appear in the subcategory section, as happens with a main category. An example of this can be seen here:
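The parent and child relationships above all fall out of the subcategory numbering (e.g. ‘01.02.08.01.15.08|01’ from an earlier post). Here’s a little JavaScript sketch of how a subcategory’s depth and its chain of parent subcategories can be derived – the ‘|’ format follows the numbers quoted in these posts, but the helper names and exact semantics are my own assumptions.

```javascript
// Depth of a subcategory: the number of dot-separated segments
// after the '|'; a main category (no '|') has depth 0.
function subcatDepth(catNum) {
  const parts = catNum.split('|');
  return parts.length < 2 ? 0 : parts[1].split('.').length;
}

// All parent subcategory numbers of a given subcategory, from
// the top level down – these would feed the 'Up hierarchy to' bar.
function parentSubcats(catNum) {
  const [main, sub] = catNum.split('|');
  if (!sub) return [];
  const segs = sub.split('.');
  const parents = [];
  for (let i = 1; i < segs.length; i++) {
    parents.push(main + '|' + segs.slice(0, i).join('.'));
  }
  return parents; // e.g. '01.02|01' is the parent of '01.02|01.02'
}
```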
I made some further updates to the user interface as well this week, including adding an option to select all verb forms to the Part of Speech selection section of the Advanced Search page, and a facility to allow users to easily enter ashes, thorns and yoghs into the word search box by simply clicking on a button. I also looked some more into optimising the advanced search and I have succeeded in speeding up the search algorithm significantly in most cases.
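The special-character buttons boil down to inserting the chosen character at the cursor position in the search box. A minimal sketch – the character list follows the post, while `insertAt` is a plain-string helper standing in for the real DOM manipulation:

```javascript
// The three Old/Middle English characters offered as buttons.
const SPECIAL_CHARS = { ash: 'æ', thorn: 'þ', yogh: 'ȝ' };

// Insert a character into the search box's value at the given
// cursor position (in the browser this would read and write
// the input element's value and selectionStart).
function insertAt(text, ch, pos) {
  return text.slice(0, pos) + ch + text.slice(pos);
}
```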
I also spent quite a bit of time this week finalising the Digital Humanities website before next Monday’s launch. The university web design was altered slightly this week so I had to update the DHN template to accurately reflect this. I also made some further tweaks to the layout and added some further content. I also had to prepare my 5 minute talk on the redevelopment of the STELLA teaching applications, which I will give at the event, and I met with Jeremy to discuss how we will jointly demo the content of the website. Hopefully all will go smoothly.
I met with Jean on Tuesday to discuss the Digital Humanities event and to finalise a few details. I also had a phone conversation with Ann Ferguson at Scottish Dictionaries about the redevelopment of the Dictionary of the Scots Language, which I will be involved with in the coming months. We will be having a meeting in Edinburgh to discuss things further next Thursday, and Ann sent me some mock-ups and requirements information about the proposed redesign, which was very useful.
I also had a chat with Pauline MacKay about the Burns timeline that we were hoping to publish this month. It would appear that there have been some delays with the Prose Volume and it now looks like the timeline won’t need to go live until January instead, but we had a useful chat about some of the other outstanding issues, including maps and the podcasts that need to go live next month.
Not a massive amount to report this week due to illness. I was struck down with a nasty feverish throat infection on Sunday and only came back to work on Thursday. Even then I was still feeling a bit wobbly. I spent Thursday morning catching up with the backlog of emails from the week and dealing with issues relating to them. In the afternoon I met with Johanna Green, who will be updating the content of the Digital Humanities Network website in preparation for the official launch in June. I went through the system with her and showed her how everything worked and she should be able to use the system without any problems. I also had a chat with Wendy about possible Scottish Corpus redevelopment and helped get access to usage stats for the site too, all in preparation for the REF.
I spent the remainder of the week working through the outstanding redevelopment tasks for the Digital Humanities Network website, including adding more filter options to the project page, ensuring the site works sufficiently well in old versions of IE and implementing some other visual tweaks and improvements.
The new and hopefully completed Digital Humanities at Glasgow website can now be found here:
A very late post for this week as I was working 8-4 on the Friday and ran out of time, then I was ill from Monday to Wednesday of the following week. I continued with the redevelopment of the Historical Thesaurus website for the majority of this week. The main achievement this week was the creation of the ‘category’ page, reached via the search results page. This page displays the words found within a particular category, and also provides extensive browse options to related material, for example any categories that have the same number but a different part of speech, any subcategories of the category in question, plus also any parent and child categories in the overall HT hierarchy. It took quite a long time to implement all these browse options, but I think it’s working rather nicely. There are some issues related to traversing the hierarchy due to issues with the data, but hopefully the bulk of these will be resolved.
I also added search term highlighting to the category page – with the user’s original search term (minus wildcards) highlighted wherever the term appears (including within longer words). This works throughout the hierarchy – so if the user searches for ‘*sausage*’ and accesses the subcategory ‘types of sausage’ then the term is highlighted wherever it appears within the words, and if the user then browses up the hierarchy to the main category ‘Sausage’ any occurrences of the term will be highlighted here too.
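The highlighting itself is straightforward: strip the wildcard asterisks from the user’s original term, then wrap every occurrence (including inside longer words) in a marker element. A sketch in JavaScript – the regex-escaping helper and the class name are my own additions:

```javascript
// Escape characters that have special meaning in a regular
// expression, so the user's term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Highlight every occurrence of the search term (wildcards
// removed), case-insensitively, including within longer words.
function highlight(text, searchTerm) {
  const term = searchTerm.replace(/\*/g, '');
  if (!term) return text;
  const re = new RegExp(escapeRegExp(term), 'gi');
  return text.replace(re, m => '<span class="hl">' + m + '</span>');
}
```

So a search for ‘*sausage*’ highlights the ‘sausage’ inside ‘sausages’ too, which matches the behaviour described above.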
Also this week I reworked the search results (category selection) page with the aim of speeding up the queries. Previously queries were taking far too long to run – sometimes as much as 30 seconds, which is completely unacceptable. Thankfully after reworking things the search is significantly faster, generally loading the search results in less than a second for non-wildcard searches and only taking a little longer than this for wildcard searches. I think the search is now as fast as it needs to be.
Also this week I began work on the ‘advanced search’ page. I now have all of the required search options within a form on the search page, although for now none of the options actually work – that is still to be tackled next week. I have also added in the option of jumping straight to a specific category if you know the category number, and this option is fully operational.
I had a further meeting with Marc and Christian on Thursday this week, which was another useful opportunity to go through some of the outstanding tasks and make some decisions. We are still dealing with a number of issues with the HT data, some having come from the Access database, some introduced during the migration process and others resulting from the original format of the data. For example, there are some problems with empty categories. Categories that have no words were not part of the Access database, but are needed to properly enable the traversal of the hierarchy. Previously Marc gave me a spreadsheet containing a lot of the empty categories, but it turns out this list wasn’t up to date and there are some problems with category numbers having been changed. Marc is going to get the up to date categories to me from the XML file that was submitted to the OED people, which will help greatly with this.
Other than HT work I met with Jean this week to discuss finalising the redevelopment of the Digital Humanities Network website. We had a very useful meeting and we came up with a list of outstanding tasks that needed to be tackled. I spent most of Friday this week working through the list and managed to get most items implemented. The website is looking much better now, and I also made it live this week, replacing the older version. You can access it here:
As with last week, I spent most of this week working on the Historical Thesaurus redevelopment. The focus this week was on the search options, firstly generating scripts that would be able to extract all individual word variants and store these as separate entries in a database for search purposes, and secondly working on the search front end.
In addition to extracting forms separated by a slash the script also looks for brackets and generates versions of words with and without brackets – so for example hono(u)r results in two variants – honour and honor. This would then allow exact words to be matched as well as allow for wildcard searches. The script works well in most instances, but there are some situations where the way in which the information has been stored makes automated extraction difficult, for example ‘weather-gleam/glim’, ‘champian/-ion’, ‘(ge)hawian (on/to)’. In these cases the full version of the word / phrase is not repeated after the slash, and it would be very difficult to establish rules to determine what the script would do with the part after the slash. Christian, Marc and I met on Thursday to discuss what might be done about this, including using a list of ‘stop words’ that the search script would ignore (e.g. prepositions). I will also look into situations where hyphens appear after a slash to see if there is a way to automate what happens to these words. It is looking like at least some manual editing of words will be required at some point, however.
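The two straightforward extraction rules described above can be sketched as follows – slash-separated alternatives become separate entries, and bracketed letters yield with- and without-bracket forms, so ‘hono(u)r’ gives ‘honour’ and ‘honor’. This is purely illustrative of the approach, not the actual script; and the awkward cases (‘champian/-ion’ and friends) are precisely the ones this simple version gets wrong, since it assumes each slash-separated part is a full word.

```javascript
// Generate the with-brackets and without-brackets forms of a word,
// e.g. 'hono(u)r' -> ['honour', 'honor'].
function bracketVariants(word) {
  if (!/[()]/.test(word)) return [word];
  const withLetters = word.replace(/[()]/g, '');         // keep bracketed letters
  const withoutLetters = word.replace(/\([^)]*\)/g, ''); // drop them entirely
  return withLetters === withoutLetters
    ? [withLetters]
    : [withLetters, withoutLetters];
}

// Split an entry on slashes, then expand brackets in each part.
// Only safe when each slash-separated part is a complete word.
function extractVariants(entry) {
  return entry.split('/').flatMap(part => bracketVariants(part.trim()));
}
```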
During the week I ran my script to generate search terms, resulting in 855,810 forms. The majority of these will have been extracted successfully, and I estimate that there are maybe 3-4000 words that might need to be manually fixed at some point. However, even with these words it is likely that a wildcard search would still successfully retrieve the word in question.
I spent most of my remaining time on HT matters working on the category selection page and the quick search. I have now managed to get a quick search up and running that searches words and category headings and uses asterisks for wildcards at the beginning and end of a search term. The quick search leads to the category selection page which pulls out all matching categories and lexemes. It creates a ‘recommended’ section which includes lexemes where the search term appears in both the lexeme and the category heading, and a big long list of all other returned hits underneath. I have also added in pagination for results too. Marc and Christian want the results list to be split into sections where the search term appears in the lexeme and then where it appears only in the category, which I will do next week. The search is still a bit slow at the moment and I’ll need to look into optimising it soon, either by using more indexes or by generating cached versions of search results.
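The wildcard handling amounts to translating the quick search’s leading and trailing asterisks into SQL `LIKE` patterns (assuming a MySQL back end, as used elsewhere in the redevelopment). A minimal sketch, with the escaping of `LIKE`’s own wildcard characters being my own addition:

```javascript
// Convert a quick-search term into a LIKE pattern:
// '*sausage*' -> '%sausage%', 'sausage' -> 'sausage'.
// Only leading/trailing asterisks are supported, as in the
// quick search described above.
function toLikePattern(term) {
  // Strip the asterisks, then escape LIKE's own wildcards
  // ('%' and '_') so they match literally...
  const inner = term.replace(/^\*|\*$/g, '')
                    .replace(/[%_]/g, c => '\\' + c);
  // ...then translate the asterisks to '%'.
  return (term.startsWith('*') ? '%' : '') + inner +
         (term.endsWith('*') ? '%' : '');
}
```

The pattern would then be bound as a parameter in the query, e.g. `WHERE word LIKE ?`.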
In addition to this I responded to a query about developing a project website that was sent to me by Charlotte Metheun in Theology and I provided some advice to someone in another part of the university who was wanting to develop a Google Maps style interface similar to the one I made for Archive Services. I also made some further updates to the ICOS 2014 website, adding in the banner and logo images and making a few other visual tweaks. My input into this website is now pretty much complete. I also arranged to meet Jean to discuss finalising the Digital Humanities Network website, and I signed up as a ‘five minute speaker’ for the Digital Humanities website launch. I’ll be talking about the redevelopment of the STELLA Teaching resources.
This week was predominantly spent working on the Historical Thesaurus redesign, both the database and the page design for the front end. For the database I created a bunch of upload and data processing scripts to get the almost 800,000 rows of data from the Access database into the new MySQL database that will be used to power the website. Despite stating last week that I wouldn’t change the structure of the data, this week I decided to do just that by moving the 13 fields that make up category information to a dedicated category table rather than having this information as part of the lexeme table. Splitting the information up reduces the amount of needlessly repeated data – for example there are up to 50 lexemes in each category and previously all 13 category fields were being repeated up to 50 times whereas now the information is stored once and then linked to the lexeme table, which is much neater.
By the end of the week I had all of the data migrated and moved into the new database structure, with a number of indices in place to make data retrieval speedier too. One slight issue with the data was that ‘empty’ categories in the hierarchy (i.e. ones that don’t have any associated lexemes) are not present in the Access database. This makes sense when you’re focussing on lexemes, but in order to develop a browse option or present breadcrumbs the full hierarchy is needed. For example 01.01.06.01 is ‘Europe’ and its parent category is ’01.01.06’, regions of the earth. But as this category has no lexemes of its own it isn’t represented in the database. I met with Marc on Thursday and he managed to get a complete list of the categories to me, including the ‘empty’ ones and I spent some time working on a script that would pull out the empty ones and add them to my new ‘category’ database. While doing this I came across a few errors in the data, where the full combination of headings and part of speech was not unique. I also noticed that I had somehow made an error in my database structure, missing out three parts of speech types. Rectifying this will mean reuploading all the data, which I will do next week.
In terms of front end work, I made some further possible interface designs, all of which are ‘responsive’ designs (they automatically change with screen size, meaning no separate mobile / tablet interface needs to be developed). It was a good opportunity to learn more about responsive web design. My second possible interface can be found here http://historicalthesaurus.arts.gla.ac.uk/new-design-2/ and possibly looks a bit too ‘bloggy’. I further adapted this design to use a horizontal navigation section, which you can view here: http://historicalthesaurus.arts.gla.ac.uk/new-design-3/. At the meeting with Marc on Thursday I received some feedback from him and the other people involved with the project regarding colour schemes and fonts, and as a result of this I came up with a fourth design, which will probably end up being used and can be viewed here: http://historicalthesaurus.arts.gla.ac.uk/new-design-4/. This combines the horizontal navigation of the previous design with the left-hand navigation of design number 2, and I think it looks quite appropriate.
Also this week I helped to set up the domain and provided some feedback to Daria for the ICOS2014 conference website and did some Digital Humanities Network related tasks.
This was my first week in my new office and while the room is lovely the heating has been underperforming somewhat, meaning I’ve been chilled to the bone by lunchtime most days. Thankfully some heating engineers worked their magic on Thursday and after that the office has been nice and toasty.
I spent a lot of time this week continuing to develop the ‘Readings’ app that I started last week. In fact, I have completed this app now, in standard HTML5. This version can be accessed here: http://www.arts.gla.ac.uk/STELLA/briantest/readings/ (but note that this is a test URL and will probably be taken down at some point). All the content for Old, Middle and Early Modern English is present, including all sound files, texts, translations and notes. After completing this work I started to look into wrapping the website up as an app and deploying it. Unfortunately I haven’t quite managed to get Phonegap (or Apache Cordova, as the open source version is properly known: http://docs.phonegap.com/en/2.2.0/index.html) working on my PC yet. I spent a frustrating couple of hours on Friday afternoon trying to set it up but by the end of the day I was still getting errors. Next week I will continue with this task.
One limitation to app development will be that developing apps for iOS requires not only a Mac but also paying Apple $99 per year for a developer certificate. I’ll have to see whether this is going to be feasible. It might be possible to arrange something through STELLA and Marc.
Also this week I continued to develop the Digital Humanities Network website, fixing a few issues, such as ‘subjects’ not working properly. I also created a new way of recording project PIs as the current system was a bit inefficient and led to people being recorded with different names (e.g. sometimes with ‘Professor’, other times without). Now PIs are only recorded in the system once and then linked to as many projects as required. I also updated the ‘projects’ page so that it is possible to view projects linked to a specific PI. And finally, I asked some people to sign up with the site and we now have a decent selection of people represented. More would still be good though!
My other major task this week was to work some more with the Burns website. I started last week to look into having sub-pages for each song, and this week I found a solution which I have now implemented on my local test installation of the website. I reached the solution in a bit of a roundabout way, unfortunately. I initially intended song ‘pages’ to be blog posts and to have a category listing in the menu to enable drop-down access to the individual song ‘pages’. I thought this would work quite nicely as it would allow commenting on the song pages, and it would still also allow an HTML5 player to be embedded within the blog content. However, the more I looked into this solution the more I realised it was far from ideal. You can’t have a drop-down list of blog posts from a menu in WordPress (which is understandable as there could be thousands of blog posts) so I had to create subcategories that would only be used for one single post. Plus when viewing the blog archives or other blog views the song pages would be all mixed in with the proper blog posts. Instead I found a much easier way of having sub-pages represented in the menu bar as drop-down items and added these instead. At the moment I’ve had to activate commenting on all pages in order for users to be able to post comments about songs. There will be a way to state that comments should not be possible on certain pages, but I still need to find it.
Also this week I attended a further meeting of the Corpus Workgroup, which was useful. We are all very happy with the way the test server is working out and we now need to get a dedicated server for the Corpus software. The next step of development will be to try and get multiple front-ends working with the data, which should be an interesting task.
I am writing this week’s post from the delightful surroundings of my new office. It’s been almost three months since I started the job, and although it has been great spending that time with my old HATII colleagues it feels very pleasant to finally be in my own office!
I began this week by completing work on the revamped Digital Humanities Network pages that I was working on last week. I spent most of Monday tweaking the pages, adding sample content and fixing a few bugs that had reared their heads. By the end of the day I had emailed Ann, Jeremy, Marc and Graeme about the pages and received favourable feedback during the course of the week. On Friday Marc, Ann, Graeme and I met to discuss the pages and to decide who should write the site text that still needs to be supplied.
I decided to start developing the ‘Readings in Early English’ app as I figured this would be the simplest to tackle, seeing as it has no exercises built into it. I familiarised myself with the jQuery Mobile framework and built some test pages, and by the end of the week I had managed to put together an interface that was pretty much identical to the Powerpoint-based mock-ups that I had made previously. Currently only the ‘Old English’ section contains content, but within this section you can open a ‘reading’ and play the sound clip using HTML5’s <audio> tag, through which the user’s browser embeds an audio player within the page. It works really smoothly and requires absolutely no plug-in to work. The ‘reading’ pages also feature original texts and translations / notes. I created a little bit of adaptive CSS using jQuery to position the translation to the right of the original text if the browser’s window is over 500px wide, or underneath the original text if the window is smaller than this. It works really well and allows the original text and the translation to be displayed side by side when the user has their phone in landscape mode, automatically switching to displaying the translation beneath the original text when they flip their phone to portrait mode. I’m really happy with how things are working out so far, although I still need to see about wrapping the website as an app. Plus the websites that have a lot of user interaction (i.e. exercises) are going to be a lot more challenging to implement.
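The decision behind that adaptive layout is just a width check against the 500px threshold, re-run on resize. A tiny sketch of the logic – the function names are my own, and the real version moves the translation element around with jQuery rather than returning a label:

```javascript
// Side-by-side when the window is over 500px wide (landscape
// phone, tablet, desktop), stacked otherwise (portrait phone).
function translationLayout(windowWidth) {
  return windowWidth > 500 ? 'side-by-side' : 'stacked';
}

// In the browser this would be wired up roughly like:
//   window.addEventListener('resize', () =>
//     applyLayout(translationLayout(window.innerWidth)));
// where applyLayout repositions the translation element.
```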
The test version of the site can be found here: http://www.arts.gla.ac.uk/STELLA/briantest/readings/ although you should note that this is a test URL and content is liable to be removed or broken in future.
Also this week I met with Marc to discuss the Hansard texts and the Test Corpus Server. Although I managed to get over 400 texts imported into the corpus this really is just a drop in the ocean as there are more than 2.3 million pages of text in the full body. It’s going to be a massive undertaking to get all these texts and their metadata formatted for display and searching, and we are thinking of developing a Chancellor’s Fund bid to get some dedicated funds to tackle the issue. There may be as many as 2 billion words in the corpus!
I also found some time this week to look into some of the outstanding issues with the Burns website. I set up a local instance of the website so I could work on things without messing up the live content. What I’m trying to do at the moment is make individual pages for each song that is listed in the ‘Song & Music’ page. It sounds like a simple task but it’s taking a little bit of work to get right. I will continue with this task on Monday next week and will hopefully get something ready to deploy on the main site next week.