Week Beginning 2nd October 2017

This was another week of doing lots of fairly small bits of work for many different projects.  I was involved in some discussions with Scottish Language Dictionaries about possible updates to their websites, and created a new version of a page for the Concise Scots Dictionary for them.  I also made a couple of minor tweaks to a DSL page.

For the Edinburgh Gazetteer project I added in all of the ancillary material that Rhona Brown had sent me, added in some new logos, set up a couple of new pages and made a couple of final tweaks to the Gazetteer and reform societies map pages.  The site is now live and can be accessed here: http://edinburghgazetteer.glasgow.ac.uk/

I also read through the Case for Support for Thomas Clancy’s project proposal and made a couple of updates to the Technical Plan based on this, and I spent some time reading over the applications for a post that I’m on the interview panel for.  I also spent a bit more time on the Burns Paper Database project.  There were some issues with the filenames of the images used.  Some included apostrophes and ampersands, which meant the images wouldn’t load on the server.  I decided to write a little script to rename all of the images in a more uniform way, while keeping a reference to the original filenames in the database for display and for future imports.  It took a bit of time to get this sorted but the images work a lot better now.
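A renaming step of this kind could look something like the following minimal Python sketch. The naming scheme (a zero-padded number plus a lowercased, hyphenated stem) and the function names are hypothetical rather than the project’s actual script; the point is that each new, URL-safe name is paired with its original so the original can still be stored in the database.

```python
import re
from pathlib import Path


def safe_name(original: str, index: int) -> str:
    """Build a uniform, URL-safe filename (hypothetical naming scheme)."""
    path = Path(original)
    # Collapse anything that isn't a lowercase letter or digit (spaces,
    # apostrophes, ampersands...) into a single hyphen.
    stem = re.sub(r"[^a-z0-9]+", "-", path.stem.lower()).strip("-")
    return f"{index:04d}-{stem or 'image'}{path.suffix.lower()}"


def plan_renames(originals: list[str]) -> dict[str, str]:
    """Map each new filename to its original, for storing in the database."""
    return {safe_name(name, i): name for i, name in enumerate(originals, start=1)}


def apply_renames(folder: Path, mapping: dict[str, str]) -> None:
    """Rename the files on disk according to the plan."""
    for new, old in mapping.items():
        (folder / old).rename(folder / new)
```

For example, `plan_renames(["Burns & Co's Mill.JPG"])` maps `0001-burns-co-s-mill.jpg` back to the original name. The numeric prefix keeps names unique even if two originals differ only in punctuation.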

I also had a chat with Gary Thoms about the SCOSYA Atlas.  This is a project I’ve not worked on recently as the team are focussing on getting all of the data together and are not so bothered about the development of the Atlas.  However, Gary will be teaching a class in a few weeks and he wanted to let the students try out the Atlas.  As the only available version can only be accessed by the project team once they have logged into the content management system, he wondered whether I could make a limited-access guest account for the students.  I suggested that instead of this I could create a version that is publicly accessible and is not part of the CMS at all.  Gary agreed with this so I spent some time creating this version.  To do so I had to strip out the CMS-specific bits of code from the Atlas (there weren’t many such bits as I’d designed it to be easily extractable) and create a new, publicly accessible page for it to reside in.  I also had to update some of the JavaScript that powers the Atlas to cut back on certain aspects of functionality – e.g. to disable the feature that allows users to upload their own map data for display on the map and to ensure that links through to the full questionnaire details don’t appear.  With this done the new version of the Atlas worked perfectly.  I can’t put the URL here, though, as it’s still just a work in progress and the URL will be taken down once the students have had a go with the Atlas.

I met with Fraser on Wednesday to get back into the whole issue of merging the new OED data with the HT data.  It had been a few months since either of us had looked at the issues relating to this, so it took a bit of time to get back up to speed with things.  The outcome of our meeting was that I would create three new scripts.  The first would find all of the categories where there was no ‘oedmaincat’ and the part of speech was not a noun.  The script would then check to see whether there was a noun at the same level and if so grab its ‘oedmaincat’ and then see if this matched anything in the OED data for the given part of speech.  This managed to match up a further 183 categories that weren’t previously matched so we could tick these off.  The second script generated a CSV for Fraser to use that ordered unmatched categories by size.  This is going to be helpful for manual checking and it thankfully demonstrated that of the more than 12,000 non-matched categories only about 750 have more than 5 words in them.  The final script was an update to the ‘all the non-matches’ script that added in counts of the number of words within the non-matching HT and OED categories.  It’s now down to Fraser and some assistants to manually go through things.
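The logic of the first two scripts can be sketched in Python as follows. This is a minimal sketch under assumed data structures: HT categories are dicts with hypothetical `id`, `path` (position in the hierarchy), `pos`, `oedmaincat` and `word_count` keys, "n" stands for noun, and the OED data is reduced to a lookup keyed on (`oedmaincat`, part of speech). The real scripts worked against the project databases.

```python
import csv
import io


def match_via_noun(categories, oed_index):
    """Return {ht_id: oed_id} for non-noun HT categories matched via their noun."""
    # Nouns indexed by their position in the hierarchy (hypothetical 'path' key).
    nouns = {c["path"]: c for c in categories if c["pos"] == "n"}
    matches = {}
    for cat in categories:
        if cat["pos"] == "n" or cat["oedmaincat"]:
            continue  # only non-nouns that lack an 'oedmaincat'
        noun = nouns.get(cat["path"])
        if noun is None or not noun["oedmaincat"]:
            continue  # no noun at the same level to borrow from
        # Look the noun's 'oedmaincat' up in the OED data for *this* part of speech.
        oed_id = oed_index.get((noun["oedmaincat"], cat["pos"]))
        if oed_id is not None:
            matches[cat["id"]] = oed_id
    return matches


def unmatched_by_size_csv(unmatched):
    """CSV of unmatched categories, largest (most words) first, for manual checking."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "path", "pos", "word_count"])
    for c in sorted(unmatched, key=lambda c: c["word_count"], reverse=True):
        writer.writerow([c["id"], c["path"], c["pos"], c["word_count"]])
    return buf.getvalue()
```

Sorting by word count puts the categories most worth checking by hand at the top, which is how the roughly 750 categories with more than 5 words stand out from the 12,000 or so unmatched ones.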

I did some further work for the SPADE project this week, extracting some information about the SCOTS corpus.  I wrote a script that queries the SCOTS database and pulls out some summary information about the audio recordings.  For each audio recording the ID, title, year recorded and duration in minutes are listed.  Details for each participant (there are between 1 and 6 per recording) are also listed:  ID, gender, decade of birth (this is the only age information available), place of birth and occupation (there is no data about ‘class’).  This information appears in a table.  Beneath this I also added some tallies:  the total number of recordings, the total duration, the number of unique speakers (as a speaker can appear in multiple recordings) and a breakdown of how many of these are male, female or not specified.  Hopefully this will be of use to the project.
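The tallies beneath the table can be computed along these lines. This is a minimal Python sketch assuming the recordings have already been fetched into dicts with hypothetical `duration_minutes` and `participants` keys; the actual script queried the SCOTS database directly. Speakers are keyed on their ID so that someone appearing in several recordings is only counted once.

```python
from collections import Counter


def summarise(recordings):
    """Totals for a list of recordings (hypothetical field names)."""
    speakers = {}  # speaker ID -> gender, so repeat appearances count once
    for rec in recordings:
        for p in rec["participants"]:
            speakers[p["id"]] = p.get("gender") or "not specified"
    return {
        "recordings": len(recordings),
        "total_minutes": sum(r["duration_minutes"] for r in recordings),
        "unique_speakers": len(speakers),
        "gender": dict(Counter(speakers.values())),
    }
```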

Finally, I had a meeting with Kirsteen McCue and project RA Brianna Robertson-Kirkland about the Romantic National Song Network project.  We discussed potential updates to the project website, how it would be structured, how the song features might work and other such matters.  I’m intending to produce a new version of the website next week.

 

Week Beginning 11th September 2017

I spent more than half of this week continuing to work on the new ARIES app.  Last week I finished work on an initial, plain HTML and JavaScript version of the app, and I received another couple of bits of feedback this week that I implemented.  The bulk of my time, however, was spent using Apache Cordova to ‘wrap’ the HTML and JavaScript version, converting it into actual iOS and Android apps, then testing these apps on my iOS and Android devices, and then making all of the media files that an app needs, such as icon files, screenshots, app loading screens, app store graphics and things like that.  This process always takes longer than I think it should.  For example, I have to make more than 20 different icon files at varying resolutions, and I need to grab multiple screenshots from at least four different devices.  This latter process is made trickier because my Android Nexus 7 tablet no longer connects properly to my PC – the ‘photos’ folder appears blank when I connect for photo transfer and doesn’t contain the actual updated contents when I connect for file transfer, so I have to use a third-party file explorer app to move the screenshots to a different folder on the device that somehow does get updated when viewing on my PC.  Regarding the icons, I came up with a few alternatives, based on the header image for the app, and we finally agreed on a sort of ‘marble’ effect circle on a white background.  I think it looks pretty good, and is certainly better than the old ARIES logo.  The app publication process was also complicated by two new issues that have emerged since I last made an app.  Firstly, Apple have updated the build process to disallow any extended image metadata, I guess as a security precaution.  I created my app icon PNG files in Photoshop, which added in such metadata.  When I then built my iOS app in Xcode I received some rather unhelpful errors.
Thankfully Stack Overflow had the answer (see https://stackoverflow.com/questions/39652867/code-sign-error-in-macos-sierra-xcode-8-3-3-resource-fork-finder-information) and after running a couple of command-line scripts this metadata was stripped out and the build succeeded.  My second issue related to the app name on the App Store.  Apple has decided to limit app names to 30 characters, meaning we could no longer call our app ‘ARIES: Assisted Revision in English Style’.  And as there is already an app named ‘ARIES’ we couldn’t call it that either.  This is a real pain, and seems like a completely unnecessary restriction to me.  In the end we called the app “ARIES: English Academic Style”.  I managed to submit the app to Apple and Google on Wednesday, and thankfully by the end of the week the new version was available on both the App Store and the Play Store.  I also made the ‘web’ version available, replacing the old ARIES site.  You can access this, and link through to the app versions from here: http://www.arts.gla.ac.uk/stella/apps/web/aries/
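A quick way to check candidate names against the 30-character limit is sketched below in Python. It simply counts characters with `len()`; whether Apple counts exactly the same way for every name (for example with non-ASCII characters) is an assumption here.

```python
APP_STORE_NAME_LIMIT = 30  # the limit described above


def fits_app_store(name: str) -> bool:
    """True if the name is within the assumed character limit."""
    return len(name) <= APP_STORE_NAME_LIMIT


for candidate in ["ARIES: Assisted Revision in English Style",
                  "ARIES: English Academic Style"]:
    print(len(candidate), fits_app_store(candidate), candidate)
```

The old name comes to 41 characters, while “ARIES: English Academic Style” is 29, just inside the limit.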

Other than ARIES work, I made some further changes to the Edinburgh Gazetteer keywords, replacing the old list of keywords with a much trimmed down list that Rhona supplied.  I think this works much better than the previous list, and things are looking good.  I also helped Alison Wiggins with some information she wanted to add to the Digital Humanities website, and I spent about half a day working with the Mapping Metaphor data, generating new versions of all of the JSON files that are required for the ‘Metaphoric’ app and testing the web version of this out.  It looks like everything is working fine with the full dataset, so next week I’ll hopefully publish a new version of the app that contains this data.  I also started working on the database of Burns’ paper for Ronnie Young, firstly converting his Access database into an online MySQL version and then creating a simple browse interface for it.  There’s still lots more to be done for this but I need to meet with Ronnie before I can take this further.

The rest of my week was taken up with meetings.  On Wednesday morning I was on an interview panel for a developer post in another part of the college.  I also met with Gerry McKeever in the afternoon to discuss his new British Academy funded ‘Regional Romanticism’ project.  I’ll be working with him to set up a website for this, with some sort of interactive map being added in sometime down the road.  I spent Friday morning attending a network meeting for Kirsteen McCue’s Romantic National Song Network.  It was interesting to hear more about the project and to participate in the discussions about how the web resource for this project will work.  There were several ideas for where the focus for the online aspect of the project should lie, and thankfully by lunchtime we’d reached a consensus about this.  I can’t say much more about it now, but it’s going to be using some software I’ve not used before but am keen to try out, which is great.

Week Beginning 4th September 2017

I spent a lot of this week continuing with the redevelopment of the ARIES app and thankfully after laying the groundwork last week (e.g. working out the styles and the structure, implementing a couple of exercise types) my rate of progress this week was considerably improved.  In fact, by the end of the week I had added in all of the content and had completed an initial web version of the app.  This included adding in some new quiz types, such as one that allows the user to reorder the sentences in a paragraph by dragging and dropping them, and also a simple multiple choice style quiz.  I also received some very useful feedback from members of the project team and made a number of refinements to the content based on this.

This included updating the punctuation quiz so that if you get three incorrect answers in a quiz a ‘show answer’ button is displayed.  Clicking on this puts in all of the answers and shows the ‘well done’ box.  This was rather tricky to implement as the script needed to reset the question first, removing all previous answers and ticks and restoring the original letter case (selecting a full stop automatically capitalises the following letter).  I also implemented a workaround for answers where a space is acceptable.  These no longer count towards the final tally of correct answers, so leaving a space rather than selecting a comma can now result in the ‘well done’ message being displayed.  Again, this was rather tricky to implement and it would be good if you could test out this quiz thoroughly to make sure there aren’t any occasions where the quiz breaks.
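The quiz behaviour described above can be modelled roughly as follows. The app itself is written in JavaScript; this is a minimal Python sketch with hypothetical class and method names, not the app’s code. One design choice worth noting: capitalisation after a full stop is derived from the current choices each time the text is rendered, so resetting the question never has to undo an earlier automatic capital letter.

```python
from dataclasses import dataclass

MAX_WRONG = 3  # wrong answers before the 'show answer' button appears


@dataclass
class Gap:
    answer: str              # correct mark, e.g. "," or "."; "" means no mark
    space_ok: bool = False   # leaving the gap as a plain space is also accepted
    chosen: str = ""         # the learner's current choice ("" = just a space)

    def accepts(self, mark: str) -> bool:
        return mark == self.answer or (self.space_ok and mark == "")


@dataclass
class PunctuationQuiz:
    chunks: list[str]        # text between gaps; gaps[i] follows chunks[i]
    gaps: list[Gap]
    wrong: int = 0

    def choose(self, i: int, mark: str) -> None:
        self.gaps[i].chosen = mark
        if not self.gaps[i].accepts(mark):
            self.wrong += 1

    def show_answer_available(self) -> bool:
        return self.wrong >= MAX_WRONG

    def is_complete(self) -> bool:
        # Gaps where a space is acceptable can be left blank without penalty.
        return all(g.accepts(g.chosen) for g in self.gaps)

    def reset(self) -> None:
        for g in self.gaps:
            g.chosen = ""
        self.wrong = 0

    def show_answer(self) -> None:
        self.reset()
        for g in self.gaps:
            g.chosen = g.answer

    def render(self) -> str:
        # Capitalisation is recomputed from the choices on every render.
        parts, capitalise = [], False
        for chunk, gap in zip(self.chunks, self.gaps):
            if capitalise:
                chunk = chunk[:1].upper() + chunk[1:]
            parts.append(chunk + gap.chosen)
            capitalise = gap.chosen == "."
        return " ".join(parts)
```

For example, with the chunks “It rained”, “so we stayed in” and “we read”, where the first gap takes an optional comma, choosing full stops for the last two gaps renders “It rained so we stayed in. We read.” and counts as complete.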

I also improved navigation throughout the app.  I added ‘next’ buttons to all of the quizzes, which either take you to the next section, or to the next part of the quiz, as applicable.  I think this works much better than just having the option to return to the page the quiz was linked from.  I also added in a ‘hamburger’ button to the footer of every page within a section.  Pressing on this takes you to the section’s contents page, and I added ‘next’ and ‘previous’ buttons to the contents pages too, so you can navigate between sections without having to go back to the homepage.

I spent a bit of time fixing the drag / drop quizzes so that the draggable boxes were constrained to each exercise’s boundaries.  This seemed to work great until I got to the references quiz, which has quite long sections of draggable text.  With the constraint in place, the part of each draggable button that triggers the drop could no longer reach the boxes nearest the edges of the question, as no part of the button could cross the boundary.  So rather annoyingly I had to remove this feature and just allow people to drag the buttons all over the page.  But dropping a button from one question into another will always give you an incorrect answer now, so it’s not too big a problem.

With all of this in place I’ll start working on the app version of the resource next week and will hopefully be able to submit it to the app stores by the end of the week, all being well.

In addition to my work on ARIES, I completed some other tasks for a number of other projects.  For Mapping Metaphor I created a couple of scripts for Wendy that output some statistics about the metaphorical connections in the data.  For the Thesaurus of Old English I created a facility to enable staff to create new categories and subcategories (previously it was only possible to edit existing categories or add / edit / remove words from existing categories).  I met with Nigel Leask and some of the Curious Travellers team on Friday to discuss some details for a new post associated with this project.  I had an email discussion with Ronnie Young about the Burns database he wants me to make an online version of.  I also met with Jane Stuart-Smith and Rachel MacDonald, who is the new project RA for the SPADE project, and set up a user account for Rachel to manage the project website.  I had a chat with Graeme Cannon about a potential project he’s helping put together that may need some further technical input and I updated the DSL website and responded to a query from Ann Ferguson regarding a new section of the site.

I also spent most of a day working on the Edinburgh Gazetteer project, during which I completed work on the new ‘keywords’ feature.  It was great to be able to do this as I had been intending to work on this last week but just didn’t have the time.  I took Rhona’s keywords spreadsheet, which had the page ID in one column and keywords separated by semicolons in another, and created two database tables to hold the information (one for information about keywords and a joining table to link keywords to individual pages).  I then wrote a little script that went through the spreadsheet, extracted the information and added it to my database.  I then set to work on adding the actual feature to the website.
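The two-table structure and the import step can be sketched like this. The Gazetteer uses MySQL; this minimal Python sketch uses an in-memory SQLite database instead so that it is self-contained, and the table, column and function names are hypothetical. The CSV is assumed to have a header row followed by rows of page ID and semicolon-separated keywords.

```python
import csv
import io
import sqlite3

SCHEMA = """
CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE page_keyword (
    page_id    INTEGER NOT NULL,
    keyword_id INTEGER NOT NULL REFERENCES keyword(id),
    PRIMARY KEY (page_id, keyword_id)
);
"""


def load_sheet(csv_text):
    """Build both tables from 'page ID, semicolon-separated keywords' rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = csv.reader(io.StringIO(csv_text))
    next(rows)  # skip the header row
    for page_id, cell in rows:
        for name in (k.strip() for k in cell.split(";")):
            if not name:
                continue  # tolerate trailing or doubled semicolons
            conn.execute("INSERT OR IGNORE INTO keyword (name) VALUES (?)", (name,))
            (kid,) = conn.execute(
                "SELECT id FROM keyword WHERE name = ?", (name,)).fetchone()
            conn.execute("INSERT OR IGNORE INTO page_keyword VALUES (?, ?)",
                         (int(page_id), kid))
    conn.commit()
    return conn


def pages_for(conn, name):
    """Page IDs linked to a keyword (for the keyword's own page)."""
    return [row[0] for row in conn.execute(
        "SELECT pk.page_id FROM page_keyword pk "
        "JOIN keyword k ON k.id = pk.keyword_id "
        "WHERE k.name = ? ORDER BY pk.page_id", (name,))]


def keywords_for(conn, page_id):
    """Keywords linked to a page (for the list above the page image)."""
    return [row[0] for row in conn.execute(
        "SELECT k.name FROM keyword k "
        "JOIN page_keyword pk ON pk.keyword_id = k.id "
        "WHERE pk.page_id = ? ORDER BY k.name", (page_id,))]
```

Keeping each keyword once in its own table, with the joining table holding the links, means the same two queries serve both directions: from a keyword to its pages, and from a page to its keywords.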

The index page of the Gazetteer now has a section where all of the keywords are listed.  There are more than 200 keywords so it’s a lot of information.  Currently the keywords appear like ‘bricks’ in a scrollable section, but this might need to be updated as it may be too much to take in at once.  If you click on a keyword a page loads that lists all of the pages that the keyword is associated with.  When you load a specific page, either from the keyword page or from the regular browse option, there’s now a section above the page image that lists the associated keywords.  Clicking on one of these loads the keyword’s page, allowing you to access any other pages that are associated with it.  It’s a pretty simple system but it works well enough.  The actual keywords need a bit of work, though, as some are too specific and there are some near duplications due to typos and things like that.  Rhona is going to send me an updated spreadsheet and I will hopefully upload this next week.

Oh yes, it was five years ago this week that I started in this post.  How time flies.

Week Beginning 21st August 2017

I worked on quite a number of different projects this week, mostly lots of little bits of work rather than major things.  I set up an initial website for Kirsteen McCue’s Romantic National Song Network project, which involved trying out different themes, preparing background images and the like.  I also upgraded all of the WordPress instances I manage to the latest release and spoke to Chris McGlashan about the possibility of moving all our sites from HTTP to HTTPS.  This would be great from a security point of view and as the majority of our sites are just subdomains of the main University domain I’m hoping we can just use the existing certificate with our sites.

I replied to Gavin Miller, who wanted my input into a new Wellcome Trust bid he is putting together and I continued an email discussion with Alison Wiggins about her new project.  I also updated the Digital Humanities at Glasgow website to add several new projects to the resource and to update the records of some existing projects, such as ‘Basics of English Metre’, which now contains information about the app rather than the ancient web resource.  See all of the projects here: http://digital-humanities.glasgow.ac.uk/projects/.

On Thursday I attended the ‘SICSA Digital Humanities meets Computer Science Workshop’ at the University of Strathclyde.  It was a very interesting event with lots of opportunities to talk to other digital humanities and computing specialists and to learn more about other projects.  Unfortunately I had to leave early due to childcare obligations, but I found the parts I was able to attend to be very useful.

The biggest chunk of work I did this week was to develop a map of reform societies for Rhona Brown’s Edinburgh Gazetteer project.  Rhona had prepared a Word document that listed about 90 reform societies that were mentioned across all of the pages of the Gazetteer and I had to convert this into data that could then be plugged into a map interface.  We had previously arranged with the NLS to use one of their geocoded historical maps as a base map – John Thomson’s map of Scotland from 1815, which is the same base map I’d previously used for the Robert Burns walking tours feature (see http://burnsc21.glasgow.ac.uk/highland-tour-interactive/) so I got to work setting this up.  I decided to structure the data using JSON, as this could very easily be plugged into the map but also then reused for a textual list of the societies.  I had to manually grab the latitude and longitude values for each location using Google Maps, which was a bit of a pain, but thankfully although there were about 90 records many of these were at the same location, which cut down on the required work slightly.  For example, there are 13 reform societies in Edinburgh and 9 in Glasgow.  In the end I had a JSON structure for each record as follows:

{"id":61, "latLng": [55.941855, -3.054019], "toolTip": "Musselburgh", "title": "Friends of Reform, Musselburgh ","people":"Preses: Colen Clerk<br />Secretary: William Wilson","pageID":92,"linkText":"19 February 1793, p.4"}

This provided the information for the location on the map, the tooltip that appears when you hover over a point and the contents of the popup, including a link through to the actual page of the Gazetteer where the society is mentioned.  I spent a bit of time thinking about the best way to represent multiple records at a single point.  I considered using circles of different sizes to let people see at a glance where the largest number of societies were, but realised this actually made it look like a larger geographical area was being covered instead.  I then decided to have a number in the marker to show how many societies were there.  I was using Leaflet circle markers rather than pins, as I didn’t want to give the impression that the societies were associated with an exact point on the map, but unfortunately Leaflet circle markers can’t display text.  Instead I switched to using Leaflet’s divIcon (see http://leafletjs.com/reference-1.2.0.html#divicon).  This marker type allows you to specify HTML to appear on the map and to then style the marker with regular CSS.  It took a bit of experimentation to get the style looking as I wanted – positioning the text was especially tricky – but in the end I had a map featuring circles with numbers in the middle, which I think works rather well.  Another issue is that the old map is not completely accurate, meaning the real latitude and longitude values for a place may result in a marker some way off on the historical map.  However, I spoke to Rhona about this and she said it didn’t really matter too much.  I also added a ‘full screen’ option for the map, and for good measure I added the same feature to the Gazetteer page too, for browsing round the large Gazetteer page images.  It all seems to be working pretty well.  The site isn’t live yet so I can’t include the URL.
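Working out the number for each circle amounts to grouping the records by coordinates. The map itself is built with Leaflet in JavaScript; this is a minimal Python sketch of the preprocessing idea, using the record keys from the JSON structure above. The output shape, the `society-marker` CSS class and the popup layout are hypothetical.

```python
from collections import defaultdict


def group_by_location(records):
    """One marker per distinct point, carrying the number of societies there."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec["latLng"])].append(rec)
    markers = []
    for (lat, lng), recs in groups.items():
        markers.append({
            "latLng": [lat, lng],
            "count": len(recs),
            "toolTip": recs[0]["toolTip"],
            # HTML for a divIcon-style marker; CSS turns the div into a
            # numbered circle (the class name is hypothetical).
            "html": f'<div class="society-marker">{len(recs)}</div>',
            # Popup listing every society at this point.
            "popup": "<hr />".join(
                f'{r["title"].strip()}<br />{r["people"]}' for r in recs),
        })
    # Largest groups first, e.g. Edinburgh (13) before Glasgow (9).
    return sorted(markers, key=lambda m: m["count"], reverse=True)
```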

Also this week I helped Michael Shaw of The People’s Voice project with a file upload issue he was experiencing.  I created a CSV upload facility for adding data to the project’s database but his file just wouldn’t upload.  It turned out to be an issue with CSVs created on a Mac, but we implemented a workaround for this.  I also had an email conversation with Joanna Kopaczyk, who will be starting in English Language next month.  She has an idea for a project and wanted to ask for my advice on some technical matters.
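The post doesn’t say exactly what the Mac problem was. A common culprit with CSVs saved on a Mac is line endings (a bare carriage return rather than a newline), sometimes together with a byte-order mark at the start of the file. Assuming that was the cause, a workaround along these lines normalises the file before parsing. This is a Python sketch, not the project’s upload code, and it also assumes the file is UTF-8 encoded.

```python
import csv
import io


def normalise_csv_text(data: bytes) -> str:
    """Decode an uploaded CSV and normalise its line endings (assumed cause)."""
    # 'utf-8-sig' also strips a UTF-8 byte-order mark if one is present.
    text = data.decode("utf-8-sig")
    # Turn Windows (\r\n) and classic Mac (\r) line endings into \n.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def read_rows(data: bytes) -> list[list[str]]:
    """Parse the normalised text into rows of strings."""
    return list(csv.reader(io.StringIO(normalise_csv_text(data))))
```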

Finally this week I started working on the Technical Plan for a project Thomas Clancy is putting together.  It’s another place-name project and it will use a lot of the same technologies as the REELS project so I’m helping out with this.  I should hopefully get a first draft of the Technical Plan together during next week, although this depends on when some of the questions I’ve asked can be answered.

Week Beginning 14th August 2017

I was on holiday last week but was back to work on Monday this week.  I’d kept tabs on my emails whilst I was away but as usual there were a number of issues that had cropped up in my absence that I needed to sort out.  I spent some time on Monday going through emails and updating my ‘to do’ list and generally getting back up to speed again after a lazy week off.

I had rather a lot of meetings and other such things to prepare for and attend this week.  On Monday I met with Bryony Randall for a final ‘sign off’ meeting for the New Modernist Editing project.  I’ve really enjoyed working on this project, both the creation of the digital edition and taking part in the project workshop.  We have now moved the digital edition of Virginia Woolf’s short story ‘Ode written partly in prose on seeing the name of Cutbush above a butcher’s shop in Pentonville’ to what will hopefully be its final and official URL and you can now access it here: http://nme-digital-ode.glasgow.ac.uk

On Tuesday I was on the interview panel for Jane Stuart-Smith’s SPADE project, which I’m also working on for a small percentage of my time.  After the interviews I also had a further meeting with Jane to discuss some of the technical aspects of her project.  On Wednesday I met with Alison Wiggins to discuss her ‘Archives and Writing Lives’ project, which is due to begin next month.  This project will involve creating digital editions of several account books from the 16th century.  When we were putting the bid together I did quite a bit of work creating a possible TEI schema for the account books and working out how best to represent all of the various data contained within the account entries.  Although this approach would work perfectly well, now that Alison has started transcribing some entries herself we’ve realised that managing complex relational structures via taxonomies in TEI using the Oxygen editor is a rather cumbersome process.  Instead Alison investigated using a relational database structure and has created her own Access database.  We went through the structure when we met and everything seems to be pretty nicely organised.  It should be possible to record all of the types of data and the relationships between these types using the Access database and so we’ve decided that Alison should just continue to use this for her project.  I did suggest making a MySQL database and creating a PHP-based content management system for the project, but as there’s only one member of staff doing the work and Alison is very happy using Access it seemed to make sense to just stick with this approach.  Later on in the project I will then extract the data from Access, create a MySQL database out of it and develop a nice website for searching, browsing and visualising the data.  I will also write a script to migrate the data to our original TEI XML structure as this might prove useful in other projects.

It’s Performance and Development Review time again, and I have my meeting with my line manager coming up, so I spent about a day this week reviewing last year’s objectives and writing all of the required sections for this year.  Thankfully having my weekly blog posts makes it easier to figure out exactly what I’ve been up to in the review period.

Other than these tasks I helped Jane Roberts out with an issue with the Thesaurus of Old English, I fixed an issue with the STARN website that Jean Anderson had alerted me to, I had an email conversation with Rhona Brown about her Edinburgh Gazetteer project and I discussed data management issues with Stuart Gillespie.  I also uploaded the final set of metaphor data to the Mapping Metaphor database.  That’s all of the data processing for this project now completed, which is absolutely brilliant.  All categories are now complete and the number of metaphors has gone down from 12938 to 11883, while the number of sample lexemes (including first lexemes) has gone up from 25129 to a whopping 45108.

Other than the above I attended the ‘Future proof IT’ event on Friday.  This was an all-day event organised by the University’s IT services and included speakers from JISC, Microsoft, Cisco and various IT related people across the University.  It was an interesting day with some excellent speakers, although the talks weren’t as relevant to my role as I’d hoped they would be.  I did get to see Microsoft’s HoloLens technology in action, which was great, although I didn’t personally get a chance to try the headset on, which was a little disappointing.

 

Week Beginning 12th September 2016

I didn’t have any pressing deadlines for any particular projects this week so I took the opportunity to return to some tasks that had been sitting on my ‘to do’ list for a while.  I made some further changes to the Edinburgh Gazetteer manuscript interface:  Previously the width of the interface had a maximum value applied to it, meaning that on widescreen monitors the area available to pan and zoom around the newspaper image was much narrower than the screen and there was lots of empty, wasted white space on either side.  I’ve now removed the maximum width restriction, thus making the page much more usable.

I also continued to work with the Hansard data.  Although the data entry processes have now completed it is still terribly slow to query the data, due to both the size of the data and the fact that I haven’t added in any indexes yet.  I tried creating an index when I was working from home last week but the operation timed out before it completed.  This week I tried from my office and managed to get a few indexes created.  It took an awfully long time to generate each one, though – between 5 and 10 hours per index.  However, now that the indexes are in place, a query that can utilise an index is much speedier.  I created a little script on my test server that connects to the database, grabs the data for a specified year and outputs this as a CSV file, and the script only takes a couple of minutes to process.  I’m hoping I’ll be able to get a working version of the visualisation interface for the data up and running, although this will have to be a proof of concept as it will likely still take several minutes for the data to process and display until we can get a heftier database server.
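The idea behind the year export can be sketched as follows. This minimal Python sketch uses SQLite rather than the project’s database server, and the `speech` table and its columns are hypothetical. It shows an index on the year column, an export query filtered on that column, and a check with `EXPLAIN QUERY PLAN` that the query actually uses the index rather than scanning the whole table.

```python
import csv
import io
import sqlite3

# Hypothetical table and columns; the real Hansard schema isn't described here.
YEAR_QUERY = "SELECT id, year, speaker, word_count FROM speech WHERE year = ? ORDER BY id"


def create_year_index(conn):
    # Slow to build on a very large table, but only needs doing once.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_speech_year ON speech (year)")


def uses_year_index(conn):
    """True if the planner will use the index for the year query."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + YEAR_QUERY, (0,)).fetchall()
    return any("idx_speech_year" in str(row[-1]) for row in plan)


def export_year_csv(conn, year):
    """All rows for one year as CSV text, with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    cur = conn.execute(YEAR_QUERY, (year,))
    writer.writerow([d[0] for d in cur.description])
    writer.writerows(cur)
    return buf.getvalue()
```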

I had a task to perform for the Burns people this week – launching a new section of the website, which can be found here: http://burnsc21.glasgow.ac.uk/performing-burnss-songs-in-his-own-day/.  This section includes performances of many songs, including both audio and video.  I also spent a fair amount of time this week giving advice to staff.  I helped Matt Barr out with a jQuery issue, I advised the MVLS people on some app development issues, I discussed a few server access issues with Chris McGlashan, I responded to an email from Adrian Chapman about a proposal he is hoping to put together, I gave some advice to fellow Arts developer Kirsty Bell who is having some issues with a website she is putting together, I spoke to Andrew Roach from History about web development effort and I spoke to Carolyn Jess-Cooke about a proposal she is putting together.  Wendy also contacted me about an issue with the Mapping Metaphor Staff pages, but thankfully this turned out to be a small matter that I will fix at a later date.  I also met separately with both Gary and Jennifer to discuss the Atlas interface for the SCOSYA project.

Also this week I returned to the ‘Basics of English Metre’ app that I started developing earlier in the year.  I hadn’t had time to work on this since early June so it took quite a bit of time to get back up to speed with things, especially as I’d left off in the middle of a particularly tricky four-stage exercise.  It took a little bit of time to think things through but I managed to get it all working and began dealing with the next exercise, which is unlike any previous exercise type I’ve dealt with as it requires an entire foot to be selected.  I didn’t have the time to complete this exercise so to remind myself for when I next get a chance to work on this:  Next I need to allow the user to click on a foot or feet to select it, which should highlight the foot.  Clicking a second time should deselect it.  Then I need to handle the checking of the answer and the ‘show answer’ option.
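The remaining steps for the foot-selection exercise (click to select, click again to deselect, check the answer, show the answer) can be sketched as a small state object. The app is written in JavaScript; this is a minimal Python sketch with hypothetical names, and the line and the expected answer in the usage note are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FootExercise:
    feet: list[str]              # the line split into feet
    correct: frozenset[int]      # indices of the feet the exercise expects
    selected: set[int] = field(default_factory=set)

    def toggle(self, index: int) -> None:
        # First click selects (highlights) a foot; a second click deselects it.
        self.selected ^= {index}

    def check(self) -> bool:
        return self.selected == self.correct

    def show_answer(self) -> None:
        self.selected = set(self.correct)
```

For example, `FootExercise(["The cur", "few tolls", "the knell"], frozenset({1}))` is answered correctly after `toggle(1)`, and toggling the same foot again clears the selection.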

On Friday I was due to take part in a conference call about Jane’s big EPSRC proposal, but unfortunately my son was sick during Thursday night and then I caught whatever he had and had to be off work on Friday, both to look after my son and myself.  This was not ideal, but thankfully it only lasted a day and I am going to meet with Jane next week to discuss the technical issues of her project.

Week Beginning 8th August 2016

This was my first five-day week in the office for rather a long time, what with holidays and conferences.  I spent pretty much all of Monday and some of Tuesday working on the Technical Plan for a proposal Alison Wiggins is putting together.  I can’t really go into any details here at this stage, but the proposal is shaping up nicely and the relatively small technical component is now fairly clearly mapped out.  Fingers crossed that it receives funding.  I spent a small amount of time on a number of small-scale tasks for different projects, such as getting some content from the DSL server for Ann Ferguson and fixing a couple of issues with the Glasgow University Guardian that central IT services had contacted me about.  I also emailed Scott Spurlock in Theology to pass on my notes from the crowdsourcing sessions of DH2016, as I thought they might be of some use to him, and I had an email conversation with Gerard McKeever in Scottish Literature about a proposal he is putting together that has a small technical component he wanted advice on.  I also had an email conversation with Megan Coyer about the issues relating to her Medical Humanities Network site.

The remainder of the week was split between two projects.  First up is the Scots Syntax Atlas project.  Last week I began working through a series of updates to the content management system for the project.  This week I completed the list of items that I’d agreed to implement for Gary when we met a few weeks ago.  This consisted of the following:

  1. Codes can now be added via ‘Add Code’. This now includes an option to select attributes for the new code too.
  2. Attributes can now be added via ‘Add Attribute’.  This allows you to select the codes to apply the attribute to.
  3. There is a ‘Browse attributes’ page which lists all attributes and the number of codes associated with each.
  4. Clicking on an attribute in this list displays the code associations and allows you to edit the attribute (both its name and associated codes).
  5. There is a ‘Browse codes’ page that lists the codes, the number of questionnaires each code appears in, the attributes associated with each code and the example sentences for each code.
  6. Clicking on a code in this list brings up a page for the code that features a list of its attributes and example sentences, plus a table containing the data for every occurrence of this code in a questionnaire, including some information about each questionnaire, a link through to the full questionnaire page, plus the rating information.  You can order the table by clicking on the headings.
  7. Through this page you can edit the attributes associated with the code.
  8. Through this page you can also add / edit example sentences for the code.  This allows you to supply both the ‘Q code’ and the sentence for as many sentences as are required.
  9. I’ve also updated the ‘browse questionnaires’ page to make the ‘interview date’ the same ‘yyyy-mm-dd’ format as the upload date, to make it easier to order the table by this column in a meaningful way.
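
The ‘yyyy-mm-dd’ format helps because such strings sort alphabetically in the same order as they sort chronologically, so a table can order them with a plain text sort. The following Python snippet illustrates this. It assumes the interview dates were previously stored as ‘dd/mm/yyyy’; the original format isn’t stated, and the dates are made up.

```python
from datetime import datetime


def to_iso(date_str: str, fmt: str = "%d/%m/%Y") -> str:
    """Convert a date in the given (assumed) format to 'yyyy-mm-dd'."""
    return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")


uk_dates = ["03/02/2016", "21/11/2015", "15/01/2016"]

# A text sort on dd/mm/yyyy compares day first, so the order is wrong:
print(sorted(uk_dates))                     # ['03/02/2016', '15/01/2016', '21/11/2015']

# The same text sort on yyyy-mm-dd is also chronological:
print(sorted(to_iso(d) for d in uk_dates))  # ['2015-11-21', '2016-01-15', '2016-02-03']
```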

With all of this out of the way I can now start on developing the actual atlas interface for the project, although I need to meet with Gary to discuss exactly what this will involve. I’ve arranged to meet with him next Monday.

The second project I worked on was the Edinburgh Gazetteer project for Rhona Brown.  I set up the WordPress site for the project website, through which the issues of the Gazetteer will be accessible, as will the interactive map of ‘reform societies’.  I’ve decided to publish these via a WordPress plugin that I’ll create for the project, as it seemed the easiest way to integrate the content with the rest of the WordPress site.  The plugin won’t have any admin interface component, but will instead focus on providing the search and browse interface for the issues of the Gazetteer and the map, via a WordPress shortcode.

I tackled the thorny issue of OCR for the Gazetteer’s badly printed pages again this week. I’m afraid it’s looking like this is going to be hopeless. I should really have looked at some of the pages whilst we were preparing the proposal, because if I’d seen the print quality I would never have suggested OCR as a possibility. I think the only way to extract the text in a usable way will be manual transcription. We might be able to get the images online and then set up some kind of rudimentary crowdsourcing approach. There aren’t that many pages (325 broadsheet pages in total), so it might be possible.

I tried three different OCR packages: Tesseract (which Google uses for Google Books), ABBYY FineReader and OmniPage Pro. These are considered to be the best OCR packages available. I’m afraid none of them gives usable results. The ABBYY output looks to me to be the best, but I would still consider it unusable, even for background search purposes. It would probably take more time to correct it manually than to just transcribe the page.

Here is one of the better sections that was produced by ABBYY:

“PETioN^c^itlzensy your intentiofofoubtlefs SS^Q c.bferve a dignified con du& iti this important Caiife. You wife to difcuft and to decide with deliberation; My opinion refpe&ing inviolability is well known. I declare^ my principled &t a time when a kind Of fu- perftitious refpcftjVfiasgenerallyentetfoinedforthisin¬violability, yet I think .that you ought to treat a qtief- tion of fo much’magnitude diftin&ly -from all ..flfoers. i A number of writings had already appeared, all. of ’ which are eagerly read and -compared  */;.,- France, *”1t”

Here is the same section in Tesseract:

“Pz”‘-rzo,\1.—a“.’€:@i1;izens, your iiitenziogzcloubtlefs is to
c:‘oferv’e_ a dig1]lfiQia-COI1£‘l_lX€,l’.l_l) this important ‘ca_ufe.
You with to ‘clil’cii’fs_and to decide with deliberation‘.
My opinion refpeéling inviolability is Well l”l°“’“–
red my principles atra, tiine when a kind of in-
‘us refpc&_jw:as gener-allAy_ Efained for tl1isin-
.3331’ y, yet–E tllivllkrtllgt .y'{ou_6,ugl1l’— l° ‘Feat ‘$1_Fl”e{‘
t¢o;aof_fo‘inuch magnitude diitinélly from all ‘filters-
, X number of wiitiiigs had already nap” ared, all. of
‘ill tell’ are eagerly read and compared Fl‘,-“‘“-“ea “=1”
“Europe haveitl-ieir eyesup 0 i g 53‘. “Ure-”

And here is the same section from OmniPage:

“PETIet\-.” Citizens, your intention doubtlefs is to cufeive a dignified conduct it, this important eaufe. You wifil to cffcufs and to decide with deliberation. fly opinion rcfncaing inviolability is well known. I declared my principles it a time when a kind of fu¬

i               tcitained for this in¬Pcrftitioas tc.pc~t tva,gcncrilly en

vioiabilit)•, yet I tlftok:that you ought to treata quef¬tic»t of fo much magnitude d!Stin4ly from all others. A number of writings had already appeared, all of whidi are eagerly read anti compared,     France, ail Europe I:ave their eyes Upon- you m this great ca ufe.”

As an experiment I manually transcribed the page myself, timing how long it took. Here is how the section should read:

“Petition- “Citizens, your intention doubtless is to observe a dignified conduct in this important cause.  You wish to discuss and to decide with deliberation.  My opinion respecting inviolability is well known.  I declared my principles at a time when a kind of superstitious respect was generally entertained for this inviolability, yet I think that you ought to treat a question of so much magnitude distinctly from all others. A number of writings had already appeared, all of which are eagerly read and compared.  France, all Europe have their eyes upon you in this great cause.”
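
One way to put a number on ‘unusable’ would be a character error rate (CER). CER is the minimum number of single-character insertions, deletions and substitutions needed to turn an OCR output into the manual transcription, divided by the length of the transcription. This wasn’t measured here. The following is a minimal Python sketch of the standard Levenshtein-based calculation. It collapses whitespace first, since the packages break lines differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edits needed to turn the OCR text into the reference, per reference character."""
    ocr_text = " ".join(ocr_text.split())    # collapse line breaks and runs of spaces
    reference = " ".join(reference.split())
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ocr_text, reference) / len(reference)
```

The higher the rate, the closer the correction effort gets to that of transcribing from scratch. That is the trade-off described above.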

It took about 100 minutes to transcribe the full page. As there are 325 images, full transcription would take 32,500 minutes, which is about 542 hours. Working solidly for 7 hours a day, it would take one person about 77 and a half days, which is rather a long time. I wonder if there might be members of the public who would be interested enough in this to transcribe a page or two? It might be more trouble than it’s worth to pursue this, though. I will return to the issue of OCR and see if anything further can be done, for example training the software to recognise the long ‘s’. However, I decided to spend the rest of the week working on the browse facility for the images instead.
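
The estimate above is simple arithmetic. Here it is spelled out so the figures can be rerun with different inputs (the 7-hour day is the assumption used above):

```python
MINUTES_PER_PAGE = 100   # timed manual transcription of one page
PAGES = 325              # broadsheet pages in the Gazetteer
HOURS_PER_DAY = 7        # assumed length of a working day

total_minutes = MINUTES_PER_PAGE * PAGES    # 32,500 minutes
total_hours = total_minutes / 60            # about 541.7 hours
working_days = total_hours / HOURS_PER_DAY  # about 77.4 days

print(f"{total_minutes:,} minutes = {total_hours:.1f} hours = {working_days:.1f} working days")
# 32,500 minutes = 541.7 hours = 77.4 working days
```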

I created three possible interfaces for the project website, and after consulting Rhona I completed an initial version of the interface, which incorporates the ‘Edinburgh Gazetteer’ logo with inverted colours (to get away from all that beige that you end up with so much of when dealing with digitising old books and manuscripts).  Rhona and I also agreed that I would create a system for associating keywords with each page, and I created an Excel spreadsheet through which Rhona could compile these.
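
Once the spreadsheet is filled in, one way to use it would be to export it as CSV. The export could then be turned into an inverted index from keyword to pages, which a search or ‘browse by keyword’ feature could query. The following is a minimal Python sketch. The column names (`page`, `keywords`), the semicolon separator and the page identifiers in the example are all hypothetical, as the spreadsheet’s layout isn’t described.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict


def build_keyword_index(csv_text: str) -> dict[str, list[str]]:
    """Map each keyword (lower-cased) to the sorted list of pages tagged with it.

    Assumes hypothetical columns 'page' and 'keywords', keywords separated by ';'.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = row["page"].strip()
        for keyword in row["keywords"].split(";"):
            keyword = keyword.strip().lower()
            if keyword:                     # skip empty entries such as a trailing ';'
                index[keyword].add(page)
    return {kw: sorted(pages) for kw, pages in index.items()}


sample = "page,keywords\np001,inviolability;reform societies\np002,Reform Societies\n"
print(build_keyword_index(sample)["reform societies"])  # ['p001', 'p002']
```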

I also created an initial interface for the ‘browse issues’ part of the site.  I based this around the OpenLayers library, which I configured to use tiled versions of the scanned images that I created using an old version of Zoomify that I had kicking around.  This allows users to pan around the large images of each broadsheet page and zoom in on specific sections to enable reading.
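
Zoomify-style tiles form a pyramid. The full-resolution image is cut into 256-pixel tiles. The image is then halved and tiled again, repeatedly, until it fits in a single tile. The tiles are stored in folders of 256 (TileGroup0, TileGroup1 and so on) and named tier-column-row. The sketch below works out that layout for an image of a given size. It is a Python illustration of the convention as I understand it, modelled on the default tier-size calculation in OpenLayers’ Zoomify source. Converters differ slightly in how they round when halving (OpenLayers offers more than one tier-size calculation for this reason), so treat the details as assumptions.

```python
from __future__ import annotations

import math

TILE_SIZE = 256  # standard Zoomify tile edge in pixels


def tier_sizes(width: int, height: int, tile_size: int = TILE_SIZE) -> list[tuple[int, int]]:
    """Tiles across and down for each tier, smallest tier (tier 0) first."""
    tiers = []
    w, h = width, height
    while w > tile_size or h > tile_size:
        tiers.append((math.ceil(w / tile_size), math.ceil(h / tile_size)))
        w, h = math.ceil(w / 2), math.ceil(h / 2)  # halve, rounding up
    tiers.append((1, 1))  # the whole image now fits in a single tile
    tiers.reverse()
    return tiers


def tile_path(tiers: list[tuple[int, int]], tier: int, x: int, y: int,
              tiles_per_group: int = 256) -> str:
    """Relative path of tile (x, y) in a tier, e.g. 'TileGroup0/2-3-2.jpg'."""
    cols, rows = tiers[tier]
    if not (0 <= x < cols and 0 <= y < rows):
        raise IndexError("tile outside this tier")
    # Tiles are numbered across all smaller tiers first, then row by row.
    tiles_before = sum(c * r for c, r in tiers[:tier])
    index = tiles_before + y * cols + x
    return f"TileGroup{index // tiles_per_group}/{tier}-{x}-{y}.jpg"
```

For example, `tier_sizes(1000, 600)` gives three tiers of 1×1, 2×2 and 4×3 tiles. Large broadsheet scans spill into several tile groups, which is why the group number has to be computed rather than assumed.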

I created a ‘browse’ page for the issues, split by month. There are thumbnails of the first page of each issue, which I generated using ImageMagick and a little PHP script. Further PHP scripts extracted dates from the image filenames, created database records, renamed the images, grouped images into issues and things like that.
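
The filename-processing step can be sketched as follows. The sketch pulls an issue date and page number out of each image filename, groups pages into issues, and groups issues by month for the ‘browse’ page. The first page of each issue is the one that gets a thumbnail. The original scripts were in PHP and the real filename pattern isn’t given here. This Python version therefore assumes a hypothetical `gazetteer_yyyy-mm-dd_pN.jpg` naming scheme, and the example filenames are made up.

```python
from __future__ import annotations

import re
from collections import defaultdict
from datetime import date

# Hypothetical naming scheme, e.g. "gazetteer_1793-01-08_p2.jpg"
FILENAME_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})_p(\d+)\.jpg$")


def parse_filename(name: str) -> tuple[date, int]:
    """Return (issue date, page number) extracted from an image filename."""
    match = FILENAME_RE.search(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    year, month, day, page = map(int, match.groups())
    return date(year, month, day), page


def group_into_issues(filenames: list[str]) -> dict[str, dict[date, list[str]]]:
    """Group images by month ('yyyy-mm'), then by issue date, with pages in order."""
    issues: dict[date, list[tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        issue_date, page = parse_filename(name)
        issues[issue_date].append((page, name))

    by_month: dict[str, dict[date, list[str]]] = defaultdict(dict)
    for issue_date in sorted(issues):
        # Sort on the numeric page number, so p10 comes after p2.
        pages = [name for _, name in sorted(issues[issue_date])]
        by_month[issue_date.strftime("%Y-%m")][issue_date] = pages
    return dict(by_month)
```

The first entry of each issue’s list is the page to pass to ImageMagick for its thumbnail.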

You can jump to a specific month by clicking the buttons at the top of the ‘browse’ page, and clicking on a thumbnail opens that issue at its first page.

When you’ve loaded a page, the image is displayed in the ‘zoom and pan’ interface. I might still rework this so that it uses the full page width and height, as on wide monitors there’s an awful lot of unused white space at the moment. The options above the image allow you to navigate between pages. If you’re on page one of an issue, the ‘previous’ button takes you to the last page of the previous issue. If you’re on the last page of an issue, the ‘next’ button takes you to page one of the next issue. I also added buttons that allow you to load the full image and to return to the Gazetteer index page.
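
Navigation crosses issue boundaries, so it is simplest to treat all the pages as one flat sequence ordered by issue date and then page number. ‘Previous’ and ‘next’ are then just the neighbouring entries. The following is a minimal Python sketch under that assumption, an illustration rather than the site’s code. Issue keys are assumed to be ‘yyyy-mm-dd’ strings and the example dates are made up.

```python
from __future__ import annotations


def build_page_sequence(issues: dict[str, list[int]]) -> list[tuple[str, int]]:
    """Flatten issues into one reading order: issues by date, then pages in order.

    'yyyy-mm-dd' keys sort chronologically as plain text.
    """
    return [(issue, page) for issue in sorted(issues) for page in sorted(issues[issue])]


def neighbours(sequence: list[tuple[str, int]], current: tuple[str, int]):
    """Return (previous, next) pages, with None at the very start or end.

    Because the sequence is flat, 'previous' from page 1 of an issue is the last
    page of the preceding issue, and 'next' from an issue's last page is page 1
    of the following issue.
    """
    i = sequence.index(current)  # a linear scan is fine for a few hundred pages
    prev_page = sequence[i - 1] if i > 0 else None
    next_page = sequence[i + 1] if i + 1 < len(sequence) else None
    return prev_page, next_page
```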

All in all it’s been a very productive week.

Week Beginning 1st August 2016

This was a very short week for me as I was on holiday until Thursday. I still managed to cram a fair amount into my two days of work, though. On Thursday I spent quite a bit of time dealing with emails that had come in whilst I’d been away. Carole Hough emailed me about a slight bug in the Old English version of the Mapping Metaphor website. In the OE version all metaphorical connections are supposed to default to a strength of ‘both’ rather than ‘strong’ as on the main site. However, when accessing data via the quick and advanced search the default was still set to ‘strong’. This was causing some confusion, as it gave different results from the browse facilities, which defaulted to ‘both’. Thankfully it didn’t take long to identify the problem and fix it. I also had to update a logo for the ‘People’s Voice’ project website, which was another very quick fix. Luca Guariento, who is the new developer for the Curious Travellers project, emailed me this week to ask for some advice on linking proper names in TEI documents to a database of names for search purposes. I explained to him how I am handling this for the ‘People’s Voice’ project, which has similar requirements. I also spoke to Megan Coyer about the ongoing maintenance of her Medical Humanities Network website. I fixed an issue with the MemNet blog, which I had previously been struggling to update. It would appear that the problem was being caused by an out-of-date version of the sFTP helper plugin, as once I updated that everything went smoothly.

I also set up a new blog for Rob Maslen, who wants to use it to allow postgrad students and others in the University to post articles about fantasy literature.  I also managed to get Rob’s Facebook group integrated with the blog for his fantasy MLitt course.  I’ve also got the web space set up for Rhona’s Edinburgh Gazetteer project, and extracted all of the images for this project too.  I spent about half of Friday working on the Technical Plan for the proposal Alison Wiggins is putting together and I now have a clearer picture of how the technical aspects of the project should fit together.  There is still quite a bit of work to do on this document, however, and a number of further questions I need to speak to Alison about before I can finish things off.  Hopefully I’ll get a first draft completed early next week, though.

The remainder of my short working week was spent on the SCOSYA project, working on updates to the CMS.  I added in facilities to create codes and attributes through the CMS, and also to browse these types of data.  This includes facilities to edit attributes and view which codes have which attributes and vice-versa.  I also began work on a new page for displaying data relating to each code – for example which questionnaires the code appears in.  There’s still work to be done here, however, and hopefully I’ll get a chance to continue with this next week.
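
Codes and attributes have a many-to-many relationship: a code can carry several attributes and an attribute can apply to many codes. The usual way to model this is a link table. With one, both directions (‘which attributes does this code have?’ and ‘which codes have this attribute?’) and the per-attribute counts on the browse page become simple queries. The following is a minimal sketch using Python’s built-in sqlite3 as a stand-in. The project’s actual database, table and column names aren’t given here, so all of them, and the example codes and attributes, are hypothetical.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE codes (id INTEGER PRIMARY KEY, code TEXT UNIQUE NOT NULL);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- one row per (code, attribute) pairing
CREATE TABLE code_attributes (
    code_id INTEGER NOT NULL REFERENCES codes(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    PRIMARY KEY (code_id, attribute_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def link(conn: sqlite3.Connection, code: str, attribute: str) -> None:
    """Associate a code with an attribute, creating either if needed."""
    conn.execute("INSERT OR IGNORE INTO codes (code) VALUES (?)", (code,))
    conn.execute("INSERT OR IGNORE INTO attributes (name) VALUES (?)", (attribute,))
    conn.execute(
        """INSERT OR IGNORE INTO code_attributes (code_id, attribute_id)
           SELECT c.id, a.id FROM codes c, attributes a WHERE c.code = ? AND a.name = ?""",
        (code, attribute))


def attribute_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Every attribute with its number of codes; LEFT JOIN keeps unused ones at 0."""
    return conn.execute(
        """SELECT a.name, COUNT(ca.code_id) FROM attributes a
           LEFT JOIN code_attributes ca ON ca.attribute_id = a.id
           GROUP BY a.id ORDER BY a.name""").fetchall()


def attributes_for_code(conn: sqlite3.Connection, code: str) -> list[str]:
    return [row[0] for row in conn.execute(
        """SELECT a.name FROM attributes a
           JOIN code_attributes ca ON ca.attribute_id = a.id
           JOIN codes c ON c.id = ca.code_id
           WHERE c.code = ? ORDER BY a.name""", (code,))]


def codes_for_attribute(conn: sqlite3.Connection, attribute: str) -> list[str]:
    return [row[0] for row in conn.execute(
        """SELECT c.code FROM codes c
           JOIN code_attributes ca ON ca.code_id = c.id
           JOIN attributes a ON a.id = ca.attribute_id
           WHERE a.name = ? ORDER BY c.code""", (attribute,))]
```

Editing an attribute’s associated codes then amounts to deleting and inserting rows in `code_attributes`, without touching the codes or attributes themselves.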