I spent a couple of days this week reworking the Scots School dictionary app that I’m developing for Scottish Language Dictionaries. Over the past couple of weeks I have received feedback from a number of people about the first version I put together over the summer and last week created a ‘to do’ list of things that needed tweaked. I’m happy to say that I managed to complete all of these items this week and by Friday I had sent out the URL for a new version for people to try out. I can’t really share the URL here, but hopefully it won’t be long now until the app is available through the iOS App Store and the Android Play Store. The main tasks I managed to tick off were:
1. Updating the interface slightly, mainly just tweaking the colours a bit
2. Creating an introduction page with some sample text (to be replaced with real text). This is now accessible by clicking on the ‘Scots Dictionary for Schools’ header at the top of every page.
3. Creating a ‘random word’ feature for the welcome page, with a button allowing users to load a new word.
4. Creating a placeholder ‘help’ page
5. Updating the footer to include the SLD logo on the right and some text on the left. The text changes each time the page fully reloads, which happens only when you navigate between the intro page, browse page, search page and help page, not when you navigate between words on a page.
6. Ensuring the ‘back to search results’ and ‘back to letter’ links when viewing a word now take you to the relevant part of the list rather than dumping you at the top of the page
7. Adding in form numbers, which now appear in grey in the word page (e.g. ‘wey (1)’)
8. Ensuring that the ‘navigate between results’ buttons are hidden if there’s only one search result
9. Adding in search term highlighting, with the search term highlighted in yellow in the word page
10. Making all words in the ‘related words’ and ‘meaning’ sections clickable. Clicking on one brings up a popup where you can select ‘Scots’ or ‘English’ and then press ‘search’ to search the full text for that word.
11. Ensuring the ‘search’ button no longer gets hidden by the footer. You still have to scroll down the page to see it, but at least it should be visible on all screens now.
12. Removing the confusing ‘<<’ and ‘>>’ symbols and instead styling the text within these as dark blue and bold to make it stand out.
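As an illustration of the search term highlighting in item 9, here is a minimal sketch of one way it could be done. This is not the app's actual code: the function names and the use of a `<mark>` element (which browsers typically render with a yellow background) are assumptions made for the example. It escapes the text for HTML, then wraps every case-insensitive occurrence of the search term.

```typescript
// Hypothetical helpers: the names and the <mark> wrapper are illustrative
// assumptions, not taken from the dictionary app itself.

/** Escape characters that have special meaning in HTML. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Escape characters that have special meaning in a regular expression. */
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/**
 * Return `text` as HTML with every case-insensitive occurrence of `term`
 * wrapped in <mark>...</mark>. An empty term leaves the text unhighlighted.
 */
function highlightTerm(text: string, term: string): string {
  const trimmed: string = term.trim();
  if (trimmed === "") {
    return escapeHtml(text);
  }
  // Splitting on a capturing group keeps the matches in the result:
  // even indexes hold the text between matches, odd indexes hold the matches.
  const pattern: RegExp = new RegExp("(" + escapeRegExp(trimmed) + ")", "gi");
  const parts: string[] = text.split(pattern);
  return parts
    .map((part: string, index: number): string =>
      index % 2 === 1 ? "<mark>" + escapeHtml(part) + "</mark>" : escapeHtml(part)
    )
    .join("");
}
```

Calling `highlightTerm("Wey and wey", "wey")` would wrap both occurrences while preserving their original capitalisation. Escaping before output matters here because the search term comes straight from the user.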
A further day was mostly taken up with AHRC duties, reviewing yet another technical plan. I’ve got another one to do next week too. I also met with Scott Spurlock from Theology to discuss a project he is putting together. He was interested in the possibilities of performing the equivalent of OCR on handwritten texts, more specifically on cursive, historical texts written by many different hands. I spent a bit of time researching this possibility and speaking to other developers in the College about it but it’s not looking massively promising. There is a Wikipedia page on handwriting recognition (http://en.wikipedia.org/wiki/Handwriting_recognition) but it rather unpromisingly states “There is no OCR/ICR engine that supports handwriting recognition as of today.” The current state of the art seems to be tools that can either:
1. Recognise individual handwritten printed characters in separate boxes (e.g. in forms or post codes)
2. Be trained to recognise the handwriting of one individual, converting this automatically to machine readable text on the fly as a user writes on a touchscreen.
Neither of these approaches would suit the project, which has cursive texts written by numerous hands. There is a concept called ‘Intelligent Word Recognition’ (http://en.wikipedia.org/wiki/Intelligent_word_recognition) and there may be tools in this field that are worth pursuing. These tools aim to extract and pattern match whole words rather than attempting to split words into individual characters. Unfortunately, information about products that claim to be able to achieve this is rather vague. I’ve found this product, http://www.a2ia.com/en/handwriting-recognition, and their ‘white paper’ (http://www.a2ia.com/sites/default/files/industry_solutions/a2ia-using_iwr_to_cut_labor_costs_without_outsourcing.pdf) provides quite a lot of information about how it works. It does look sort of promising, but I think it is intended to work on free-text boxes in forms rather than page after page of cursive text.
I also found this blog post: http://blog.parascript.com/icr-software-101-handprint-recognition that discusses ‘handprint recognition’ and links to another ‘white paper’ at the bottom (you need to subscribe to receive it and I haven’t done this, but the blurb does state ‘Advanced ICR technology thinks like a human to process documents that include any type of handwriting, including unconstrained handprint, cursive and more’). However, I’m a little sceptical, as ‘handprint’ means individually printed characters.
Other developers in the College of Arts that I have spoken to thought that current technology would not be able to automatically extract text from the sorts of handwritten historical documents the project will likely be dealing with and the general consensus was that crowdsourcing or outsourcing transcription would be more suitable. However, it’s an emerging technology that is worth keeping track of.
Other than fixing a bug with the Scottish Corpus (a server setting was limiting the number of documents that could be downloaded at once to 1000) and dealing with the DSL website stopping working briefly, I spent the rest of the week on Mapping Metaphor duties. I completed the updates to the way the ‘centre on category’ feature works, as discussed last week. I also completed the reformatting of the ‘info box’, adding the ‘key’ information to the bottom of it and reducing the amount of space taken up by the links to other categories, the ‘view info’ button, the ‘download’ button etc.
The big task I worked on for the project this week was to try and get some sort of background colour for the visualisation labels. Why is this required? Mainly to allow L3 categories in the hybrid view to stand out from the L2 categories – so that the L3 categories within the L2 category you’re looking at can have one background colour and the L3 categories that these link to in the L2 category you’ve opened can have another colour.
It is not as straightforward as you might think to get a text background colour, as SVG does not allow text elements to have background colour styling. Instead, what you need to do is create a new rectangle object with the right dimensions and position and then place this behind the text item. I found a possible way of doing this here: http://stackoverflow.com/questions/15500894/background-color-of-text-in-svg and after a lot of tweaking I managed to get all of the L3 categories in the selected L2 category to have a red background colour (for test purposes). I wasn’t really very satisfied with how this looked though. The labels are positioned round a circle, so the background colours jutted out as individual spokes. I decided that having a small rectangle on the side of the label nearest the inner circle looked a lot better and implemented this instead. These small rectangles aren’t ‘text background’ but are just added to the SVG group in the same way as the node circles I added a few weeks ago. I think this looks OK, and it allows the selected or linked-to category to be given a differently coloured rectangle, something I will try to implement next week.
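To make the two approaches concrete, here is a rough sketch of the geometry involved. It is not the project's actual code: the `Box` type, the function names and the padding, width and gap parameters are all assumptions made for illustration. It also assumes each label sits in its own rotated group with its text running outwards from the circle, so that the left edge of the label's bounding box is the side nearest the inner circle; labels that are flipped for readability would need the marker on the other edge.

```typescript
// Hypothetical sketch: all names and parameters here are illustrative.

/** A rectangle in the label's local coordinate system. */
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

/**
 * Full 'text background': a rectangle covering the label's bounding box,
 * grown by `padding` on every side.
 */
function backgroundRect(bbox: Box, padding: number): Box {
  return {
    x: bbox.x - padding,
    y: bbox.y - padding,
    width: bbox.width + 2 * padding,
    height: bbox.height + 2 * padding,
  };
}

/**
 * Small marker rectangle placed just before the start of the label,
 * i.e. on the side assumed to be nearest the inner circle.
 */
function markerRect(bbox: Box, markerWidth: number, gap: number): Box {
  return {
    x: bbox.x - gap - markerWidth,
    y: bbox.y,
    width: markerWidth,
    height: bbox.height,
  };
}

// In a browser the bounding box would come from the text element itself,
// and the rectangle would be inserted before the text so it is drawn behind:
//
//   const bbox = textElement.getBBox();
//   const r = markerRect(bbox, 4, 2);
//   const rect = document.createElementNS("http://www.w3.org/2000/svg", "rect");
//   rect.setAttribute("x", String(r.x));
//   rect.setAttribute("y", String(r.y));
//   rect.setAttribute("width", String(r.width));
//   rect.setAttribute("height", String(r.height));
//   rect.setAttribute("fill", "red");
//   textElement.parentNode.insertBefore(rect, textElement);
```

Keeping the geometry in plain functions means the same code can produce either the full background or the small marker, and the fill colour can be chosen separately depending on whether a category is selected or linked to.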