This was my third four-day week in a row as I’ve taken Friday off (and will be off all next week too). It was another mostly Mapping Metaphor week, with a few other bits of work thrown in as well. Last week I managed to get the ‘Metaphoric’ app submitted to both the Apple App Store and the Google Play Store, which was very satisfying. Over the weekend the Android app was approved for sale and appeared on the Google Play Store (the project doesn’t have its official launch until later this month but you can try the app out here: https://play.google.com/store/apps/details?id=com.gla.metaphoric&hl=en_GB). On Thursday the app was approved for sale on the Apple App Store as well. Sometimes the Apple approval process can take a while to complete but thankfully it was rather swift this time. The iOS version of the app can be found here: https://itunes.apple.com/gb/app/metaphoric/id1095336949?mt=8.
I also spent about a day trying to make the loading of the data into the visualisation a bit quicker, especially the top-level visualisation. I’m afraid I have been unable to speed up the process, but I have at least got to the bottom of why it’s taking so long. Basically when you press on a category in the top-level visualisation the code has to go through each of the grey lines in order to work out which should be yellow. It has to take the ID of the pressed category and see if it appears as either a source or a target for each line. There are 37 categories, each of which may be connected to any of the remaining 36 categories, either as source or target. This means the code has to run 2664 times to find all of the potential matches each time a category is pressed on. I’ve been trying to figure out how I might be able to cache this data so it doesn’t have to be processed each time, but unfortunately this would mean picking apart the visualisation code and doing some major reworking, as the whole thing is based on the concept of processing data on a node by node basis. I spent a few hours today trying to do this but I’m afraid it would likely take me a long time to rework it (if I could get it working again at all).
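For what it’s worth, the caching idea can be sketched like this (the names and data shapes are illustrative, not the actual visualisation code): build the source/target lookup once when the data loads, so each press becomes a dictionary lookup rather than a scan over every line.

```python
# A sketch of the caching idea: map each category ID to the lines that
# touch it, computed once up front. The names and data shapes here are
# illustrative, not the actual visualisation code.

def build_connection_index(lines):
    """Map each category ID to the lines where it appears as source or target."""
    index = {}
    for line in lines:
        for cat_id in (line["source"], line["target"]):
            index.setdefault(cat_id, []).append(line)
    return index

# Three categories, two connecting lines.
lines = [
    {"source": 1, "target": 2},
    {"source": 3, "target": 1},
]
index = build_connection_index(lines)
to_highlight = index[1]  # both lines touch category 1
```

The catch, as described above, is that the existing code processes data node by node, so there is no obvious single place to slot a precomputed index in without major rework.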
However, there is a small silver lining, in that I’ve figured out how to get a ‘loading’ message to appear when a user presses on a category and then stay on screen until the last node has been processed. Although this sounds like a simple thing, I have spent many fruitless hours over the past few weeks trying to get such a message to appear, due to a combination of the node update code operating asynchronously (so the main function can’t tell when the nodes have finished updating) and swamping the processor (resulting in the interface locking up and blocking the appearance of any ‘loading’ message). But now when you press a category on the top-level visualisation a ‘loading’ message is displayed, which I think will be a great help.
I also spent some time this week on The People’s Voice project. I started working on the CSV Upload script last week and I finished working on it this week. The ‘import CSV’ page now displays an area where you can drag and drop CSV files (or click in it to open the ‘attach file’ box). I also updated the CSV Template and the Guidelines, and provided links to these files from this page. The template was missing a field for recording page numbers in the publication so I added that on the end. The guidelines now include information about the publication types and also a warning about the publication date column. Excel seems to want to reformat the ‘yyyy-mm-dd’ dates as ‘dd/mm/yyyy’, which then causes them to fail to be uploaded. I’ve added an explanation of how you can stop Excel from doing this.
I also noticed a small problem with the pseudonyms: I was converting characters like ampersands into their HTML equivalents before splitting the data up into individual names based on a semi-colon. Unfortunately the HTML entity for an ampersand is ‘&amp;’, which itself ends in a semi-colon, so my script interpreted that semi-colon as a division between two names. I’ve updated things so that the text is split by semi-colon before the HTML conversion is done, which should solve the problem.
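The fix is essentially a question of ordering, which a short sketch makes clear (using Python’s html.escape as a stand-in for my conversion routine; the real script is part of the upload code):

```python
import html

def split_pseudonyms_wrong(raw):
    # Escaping first turns '&' into '&amp;', whose trailing ';'
    # then gets treated as a divider between names.
    return [name.strip() for name in html.escape(raw).split(";")]

def split_pseudonyms_fixed(raw):
    # Split on the real semi-colons first, then escape each name.
    return [html.escape(name.strip()) for name in raw.split(";")]

raw = "Smith & Son; A Wanderer"
print(split_pseudonyms_wrong(raw))  # ['Smith &amp', 'Son', 'A Wanderer'] (three 'names')
print(split_pseudonyms_fixed(raw))  # ['Smith &amp; Son', 'A Wanderer']
```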
I spent a further bit of time on the Scots Thesaurus project. Magda had encountered an issue with the search facility not working for some words that had apostrophes. It turned out that this was being caused by slashes being added to the lexeme data whenever a category was updated, with these slashes then preventing the search from working properly. Thankfully it was a relatively easy thing to fix once identified.
My final project of the week was the REELS project. I set up a handy short URL for the project (www.gla.ac.uk/reels) and also addressed a number of issues that Eila had spotted when using the content management system. This included adding buttons to navigate straight to the next and previous place-name record when editing a record, and fixing a few bugs such as buttons not working. There are also some problems relating to accuracy when entering four-figure grid references (the latitude and longitude values that then get generated are sometimes very far away from where they should be). As the figures are generated by third-party code that I’m just making use of I’m not sure I can really fix this, but as the CMS has options to manually override the latitude and longitude values I’ve suggested that when the map point appears to be off, project staff can quite easily find a better value and enter it manually instead. There are a few further tweaks to the CMS that I still need to make (e.g. adding filters and pagination to the ‘browse place-names’ page) but I’ll have to do these after I’m back from my holidays. I will be on holiday all of next week and will return to work on Monday the 18th.
It’s been another busy week, but I have to keep this report brief as I’m running short of time and I’m off next Monday. I came into work on Monday to find that the script I had left executing on the Grid to extract all of the Hansard data had finished working successfully! It left me with a nice pile of text files containing SQL insert statements – about 10Gb of them. As we don’t currently have a server on which to store the data I instead started a script executing that runs each SQL insert command on my desktop PC and puts the data into a local MySQL database. Unfortunately it looks like it’s going to take a horribly long time to process the data. I’m putting the estimate at about 229 days.
My arithmetic skills are sometimes rather flaky so here’s how I’m working out the estimate. My script is performing about 2000 inserts a minute. There are about 1200 output files and based on the ones I’ve looked at they contain about 550,000 lines each. 550,000 x 1200 = 660,000,000 lines in total. Dividing this figure by 2000 gives the number of minutes it would take (330,000). Dividing this by 60 gives the number of hours (5,500), and dividing by 24 gives the number of days (229). My previous estimate for doing all of the processing and uploading on my desktop PC was more than 2 years, so using the Grid has sped things up enormously, but we’re going to need something more than my desktop PC to get all of the data into a usable form any time soon. Until we get a server for the database there’s not much more I can do.
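Expressed as a quick script, using the same figures:

```python
# The estimate, step by step, using the figures above.
inserts_per_minute = 2000
files = 1200
lines_per_file = 550_000

total_lines = files * lines_per_file          # 660,000,000
minutes = total_lines / inserts_per_minute    # 330,000
hours = minutes / 60                          # 5,500
days = hours / 24                             # ~229

print(round(days))  # 229
```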
On Tuesday this week we had a REELS team meeting where we discussed some of the outstanding issues relating to the structure of the database (amongst other things). This was very useful and I think we all now have a clear idea of how the database will be structured and what it will be able to do. After the meeting I wrote up and distributed an updated version of my database specification document and I also worked with some map images to create a more pleasing interface for the project website (it’s not live yet though, so no URL). Later in the week I also created the first version of the database for the project, based on the specification document I’d written. Things are progressing rather nicely at this stage.
I spent a bit of time fixing some issues that had cropped up with other projects. The Medical Humanities Network people wanted a feature of the site tweaked a little bit, so I did this. I also fixed an issue with the lexeme upload facility of the Scots Corpus, which was running into some maximum form size limits. I had a funeral to attend on Thursday afternoon so I was away from work for that.
It was a week of many projects, mostly smallish tasks that still managed to take up some time. I was involved in an email discussion this week with some of the University’s data centre people, who would like to see more Arts projects using some of the spare capacity on the ScotGrid infrastructure. This seemed pretty encouraging for the ongoing Hansard work and it culminated in a meeting on Friday with Gareth Roy, who works with the Grid for Physics. This was a very useful meeting, during which I talked through our requirements for data extraction and showed Gareth my existing scripts. Gareth gave some really helpful advice on how to tackle the extraction, such as splitting the file up into 5Mb chunks before processing and getting nodes on the Grid to tackle these chunks one at a time. We still need to see whether Arts Support will be able to provide us with the database space we require (at least 300Gb) and allow external servers (with specified IP addresses) to insert data. I’m going to meet with Chris next week to discuss this matter. At this stage things are definitely looking encouraging and hopefully some time early in the new year we’ll actually have all of the frequency data extracted.
For the Metaphor in the Curriculum project we had a little Christmas lunch out for the team on Tuesday, which was nice. On Friday Ellen and Rachael had organised a testing session for undergraduates to test out the prototype quiz that we have created, and I met with them afterwards to discuss how it went. The feedback they received was very positive and no-one encountered any problems with the interface. A few useful suggestions were made – for example that only the first answer given should be registered for the overall score, and that questions should be checked as soon as an answer is selected rather than having a separate ‘check answer’ button. I’ll create a new version of the prototype with these suggestions in place.
Hannah Tweed contacted me this week with some further suggestions for the Medical Humanities Network website, including adding facilities to allow non-admin users to upload keywords and some tweaks to the site text. I still need to implement some of the other requests she made, such as associating members with teaching materials. I should be able to get this done before Christmas, though.
Magda also contacted me about updating the Scots Thesaurus search facility to allow variants of words to be searched for. Many words have multiple forms divided with a slash, or alternative spellings laid out with brackets, for example ‘swing(e)’. Other forms were split with hyphens or included apostrophes, and Magda wanted to be able to search for these with or without the hyphens. I created a script that generated such variant forms and stored them in a ‘search terms’ database table, much in the same way as I had done for the Historical Thesaurus of English. I then updated the search facilities so that they checked the contents of this new table and I also updated the WordPress plugin so that whenever words are added, edited or deleted the search variants are updated to reflect this. Magda tested everything out and all seems to be working well.
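A simplified sketch of the sort of variant generation involved (the real plugin handles more patterns than this, and the function name here is just illustrative):

```python
import re

def search_variants(word):
    """Generate search variants for a headword.

    Handles two of the patterns mentioned above: optional letters in
    brackets, e.g. 'swing(e)', and hyphenated forms searchable with
    or without the hyphen. A simplified sketch, not the plugin code.
    """
    variants = {word}
    match = re.match(r"^(.*)\((.*?)\)(.*)$", word)
    if match:
        before, optional, after = match.groups()
        variants = {before + after, before + optional + after}
    expanded = set()
    for variant in variants:
        expanded.add(variant)
        if "-" in variant:
            expanded.add(variant.replace("-", ""))
    return sorted(expanded)

print(search_variants("swing(e)"))  # ['swing', 'swinge']
print(search_variants("sea-maw"))   # ['sea-maw', 'seamaw']
```

Each variant gets stored in the ‘search terms’ table alongside the ID of its lexeme, so the search simply matches against the variants and pulls back the associated lexemes.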
For the SCOSYA project Gary sent me the first real questionnaire to test out the upload system with. My error checking scripts picked up a couple of problems with the contents (a typo in the codes, plus some other codes that hadn’t been entered into my database yet) but after these were addressed the upload went very smoothly. I also completed work on the facilities for editing and deleting uploaded data.
During the week there were times when the majority of internet access was cut off due to some issues with JANET. Unfortunately this had a bit of an impact on the work I could do, as I do kind of need internet access for pretty much everything I’m involved with. However, I made use of the time to tackle some tasks I’d been meaning to get to for a while. I installed Windows 10 on my MacBook and then reinstalled all of the software I use. I also copied all of my app development stuff from my MacBook onto my desktop computer in preparation for creating the Metaphor in the Curriculum app and for creating Android versions of the STELLA apps that don’t yet have them.
I also spent some time this week getting up to speed on the use of Oxygen, XML and TEI in preparation for the ‘People’s Voice’ project that starts in January. I also went through all of the bid documentation for this project and began to consider how the other technical parts of the project might fit together. I have a meeting with Gerry and Catriona next week where we will talk about this further.
My time this week was mostly spent on the same three projects as last week. On Monday I met with Pauline Mackay to run through the updates I’d made to my ‘test version’ of the Burns website based on her document of suggested changes. All of the updates have come together remarkably quickly and easily (so far!) and at the meeting we just confirmed what still needed to be done (mostly by Pauline) and when it would be done (ideally we’ll be launching the new version of the website early next week). There are a few further tweaks I’ll need to make, but other than replacing the live version of the site with the new version my work is pretty much done.
For ‘Metaphor in the Curriculum’ we had a further project meeting on Tuesday, where we talked about the mock-up metaphor quizzes that I’d previously produced. Everyone seems very happy with how these are turning out, and Ellen and Rachael showed them to some school children whilst they were visiting a school recently and the feedback was positive, which is encouraging. After the meeting Ellen sent me some updated and extended quiz questions and I set to work on creating a more extensive prototype based on these questions. Ellen and Rachael are hopefully going to be able to test this version out on some undergraduates next week, so it was important that I got this new version completed. This version now ‘tracks’ the user’s answers during a session using HTML5 sessionStorage. This allows us to give the user a final score at the end of the quiz and also allows the user to return to previously answered questions and look at the results again. It took a fair amount of time to get these (and other) updates in place, but I think the quiz is looking pretty good now and once we have further quizzes it should be possible to just plug them into the structure I’ve set up.
Most of the remainder of my week was spent on the content management system for the SCOSYA project. Last week I’d created a nice little drag and drop feature that enables a logged in user to upload CSV files. This week I needed to extend this so that the data contained within the CSV files could be extracted and added into the relevant tables (if it passed a series of validation checks, of course). During the course of developing the upload script I spotted a few possible shortcomings with the way the questionnaire template was structured and Gary and I had a few chats about this, which resulted in a third version of the template being created. Hopefully this version will be the final one. As the data will all be plotted on maps, storing location data for the questionnaires is pretty important. The questionnaire includes the postcode of the interviewee, and also the postcodes of the fieldworker and the interviewer. I found a very handy site called http://uk-postcodes.com/ that gives data for a place based on a postcode that is passed to it. In addition to providing a web form the site also has an API that can spit out data in the JSON format – for example here is the data for the University’s postcode: http://uk-postcodes.com/postcode/G128QQ.json
This data includes latitude and longitude values for the postcode, which will be vital for pinning questionnaire results on a map, and I managed to get my upload script to connect to the postcode API, check the supplied postcode and return the lat/long values for insertion into our database. It seems to work very well. It does mean that we’re dependent on a third party API for our data to be uploaded successfully and I’ll just have to keep an eye on how this works out, but if the API proves to be reliable it will really help with the data management process.
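The relevant part of the upload script boils down to pulling two fields out of the returned JSON. Here’s a sketch of just that step; the payload shape shown is an assumption based on the fields we use (not the API’s full schema), and the real script fetches the document over HTTP first:

```python
import json

# A hypothetical response document: the 'geo' lat/lng fields are the
# ones the upload script actually uses; the rest of the real schema
# isn't shown here.
sample = json.loads("""
{
    "postcode": "G12 8QQ",
    "geo": {"lat": 55.872, "lng": -4.288}
}
""")

def latlong_for(payload):
    """Pull the latitude/longitude pair out of a postcode lookup result."""
    geo = payload["geo"]
    return float(geo["lat"]), float(geo["lng"])

lat, lng = latlong_for(sample)  # (55.872, -4.288)
```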
There was a major powercut at the university on Wednesday that knocked out all power to one of the buildings where our servers are located, including the server I’m using for SCOSYA, which cut down on the amount of time I could spend on the project. Despite this, by the end of the week I had managed to complete the upload script, a page for browsing uploaded data and a page for viewing a complete record. Next week I’ll create facilities to allow uploaded data to be edited or deleted, and after that I’ll need to meet with Gary again to discuss what other features are required of the CMS at this stage.
The powercut also took down the DSL website, amongst others (this blog included) so I spent some time on Wednesday evening and Thursday morning ensuring everything was back online again. I also spent a bit of time this week on the Scots Thesaurus project. Magda was having problems uploading new lexemes to a particularly large category, even though new lexemes could still be added with no problems to a smaller category. This was a very odd error that seemed to be caused by the number of lexemes in a category. After a bit of investigation I figured out what was causing the problem. The ‘edit category’ page is an absolutely ginormous form, made even larger because it’s within WordPress and it adds even more form elements to a page. PHP has a limit of 1000 form elements in a POST form and rather astoundingly the ‘edit’ page for the category in question had more than 1000 elements. With this figured out I asked Chris to update PHP to increase the number of elements and that solved the problem. Magda has also been working on updated word forms and I need to create a new ‘search term’ table that allows words with multiple variants to be properly searched. I’ll need to try and find the time to do this next week.
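For reference, the limit in question is PHP’s max_input_vars setting, which defaults to 1000 (from PHP 5.3.9 onwards); raising it is a one-line configuration change, although the exact value shown here is illustrative:

```ini
; php.ini: raise the cap on submitted form fields
; (the default is 1000; the value chosen here is illustrative)
max_input_vars = 3000
```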
I worked on quite a number of different projects this week. My first task was to set up a discussion forum for Sean Adams’ Academic Publishing conference website. The website is another WordPress powered site and I hadn’t worked with any forum plugins before so it was interesting to learn about this. I settled for the widely adopted ‘bbpress’ plugin, which turned out to be very straightforward to set up and integrates nicely with WordPress. I had to tweak the University theme I’d created a little so that the various sections displayed properly, but after that all appeared to be working well. I also spent some time continuing to contribute to the new Burns bid for Gerry Carruthers. I’d received some feedback on my first version of the Technical Plan and based on this and some updated bid documentation I created a second version. I also participated in some email discussions about other parts of the bid too. It seems to be shaping up very well. Ann Fergusson from SND contacted me this week as someone had spotted a missing section of text in one of the explanatory pages. I swiftly integrated the missing text and all is now well again.
On Tuesday I had a meeting with Susan and Magda about the Scots Thesaurus. We went through some of the outstanding tasks and figured out how and when these would be implemented. The biggest one is the creation of a search variants table, which will allow any number of different spellings to be associated with a lexeme, enabling it to be found by the search option. However, Magda is going to rework a lot of the lexemes over the coming weeks so I’m going to hold off on implementing this feature until this work has been completed.
I also had a Mapping Metaphor task to do this week: updating the database with new data. Wendy has been continuing to work with the data, adding in directionality, dates and sample lexemes and Ellen sent me a new batch. It has been a while since I’d last uploaded new data and it took me a while to remember how my upload script worked, but after I’d figured that out everything went smoothly. We now have information about 16,378 metaphorical connections in the system and 12,845 sample lexemes linked into the Historical Thesaurus.
I’m going to be on holiday next week and the following week I’m going to be at a conference so there will be no more from me until after I return.
This week was a return to something like normality after the somewhat hectic time I had in the run-up to the launch of the Scots Thesaurus website last week. I spent a bit of further time on the Scots Thesaurus project, making some tweaks to things that were noticed last week and adding in some functionality that I didn’t have time to implement before the launch. This included differentiating between regular SND entries and supplemental entries in the ‘source’ links and updating the advanced search functionality to enable users to limit their search by source. I also spent the best part of a day working on the Technical Plan for the Burns people, submitting a first draft and a long list of questions to them on Monday. Gerry and Pauline got back to me with some replies by the end of the week and I’ll be writing a second version of the plan next week.
On Friday we had a team meeting for the Metaphor in the Curriculum project. We spent a couple of hours going over the intended outputs of the project and getting some more concrete ideas about how they might be structured and interconnected, and also about timescales for development. It’s looking like I will be creating some mockups of possible exercise interfaces in early November, based on content that Ellen is going to send to me this month. I will then start to develop the app and the website in December with testing and refinement in January, or thereabouts.
I also spent some time this week working on the Medical Humanities Network website for Megan Coyer. I have now completed the keywords page and the ‘add and edit keywords’ facilities, and I’ve added in options to add and edit organisations and units. I think that means all the development work is now complete! I’ll still need to add in any site text when this has been prepared and I’ll need to remove the ‘log in’ pop-up when the site is ready to go live, but other than that my work on this project is done.
Continuing on a Medical Humanities theme, I spent a few hours this week working on some of the front end features for the SciFiMedHums website, specifically features that will allow users to browse the bibliographical items by things like years and themes. There’s still a lot to implement but it’s coming along quite nicely. I also helped Alison Wiggins out with a new website she’s wanting to set up. It’s another WordPress based site and the bare-bones site is now up and running and ready for her to work with when she has the time available.
On Friday afternoon I received my new desktop PC for my office and I spent quite a bit of the afternoon getting it set up, installing software, copying files across from my old PC and things like that. It’s going to be so good to have a PC that doesn’t crash if you tell it to open Excel in the afternoons!
A lot of this week was devoted to the Scots Thesaurus project, which we launched on Wednesday. You can now access the website here: http://scotsthesaurus.org/. I spent quite a bit of time on Monday and Tuesday making some last minute updates to the website and visualisations and also preparing my session for Wednesday’s colloquium. The colloquium went well, as did the launch itself. We had considerable media attention and many thousands of page hits and thankfully the website coped with all of this admirably. I still have a number of additional features to implement now the launch is out of the way, and I’ll hopefully get a chance to implement these in the coming weeks.
Other than Scots Thesaurus stuff I had to spend a day or so this week doing my Performance and Development review exercise. This involved preparing materials, having my meeting and then updating the materials. It has been a very successful year for me so the process all went fine.
I spent some of the remainder of the week working on the front end for the bibliographical database for Gavin Miller’s SciFiMedHums project. This involved updating my WordPress plugin to incorporate some functions that could then be called in the front-end template page using WordPress shortcodes. This is a really handy way to add custom content to the WordPress front end and I managed to get a first draft of the bibliographical entry page completed, including lists of associated people, places, organisations and other items, themes, excerpts and other information. It’s all looking pretty good so far, but there is still a lot of functionality to add, for example search and browse facilities and the interlinking of data such as themes (e.g. click on a theme listed in one entry to view all other entries that have been classified with it).
I also spent some time this week starting on the Technical Plan for a new project for the Burns people, but I haven’t got very far with it yet. I’ll be continuing with this on Monday. So, a very short report this week, even though the week itself was really rather hectic.
I attended a project meeting and workshop for the Linguistic DNA project this week (see http://www.linguisticdna.org/ for more information and some very helpful blog posts about the project). I’m involved with the project for half a day a week over the next three years, but that effort will be bundled up into much larger chunks. At the moment there are no tasks assigned to me so I was attending the meeting mainly to meet the other participants and to hear what has been going on so far. It was really useful to meet the project team and to hear about their experiences with the data and tools that they’re working with so far. The day after the project meeting there was a workshop about the project’s methodological approach, which also featured a variety of external speakers who are dealing, or have previously dealt, with some of the same sorts of issues that the project will be facing, so it was hugely informative to hear these speakers too.
Preparing for, travelling to and attending the project meeting and workshop took up a fair chunk of my working week, but I did also manage to squeeze in some work on other projects as well. I spent about a day continuing to work on the Medical Humanities Network website, adding in the teaching materials section and facilities to manage teaching materials and the images that appear in the carousel on the homepage. I’ve also updated the ‘spotlight on’ feature so that collections and teaching materials can appear in this section in addition to projects. That just leaves keyword management, the browse keywords feature, and organisation / unit management to complete. I also spent a small amount of time updating the registration form for Sean’s Academic Publishing event. There were a couple of issues with it that needed tweaking, for example sending users a notification email and things like that. All fairly minor things that didn’t take long to fix.
I also gave advice to a couple of members of staff on projects they are putting together: firstly Katherine Heavey and secondly Alice Jenkins. I can’t really go into any detail about their projects at this stage, but I managed to give them some (hopefully helpful) advice. I met with Fraser on Monday to collect my tickets for the project meeting and also to show him developments on the Hansard visualisations. This week I added a couple of further enhancements which enable users to add up to seven different lines to the graph. So for example you can compare ‘love’, ‘hate’, ‘war’ and ‘peace’ over time all on the same graph. It’s really quite a fascinating little tool to use already, but of course there’s still a lot more to implement. I had a meeting with Marc on Wednesday to discuss Hansard and a variety of other issues. Marc made some very good suggestions about the types of data it should be possible to view on the graph (e.g. not just simple counts of terms but normalised figures too).
I also met with Susan and Magda on Monday to discuss the upcoming Scots Thesaurus launch. There are a few further enhancements I need to make before next Wednesday, such as adding in a search term variant table for search purposes. I also need to prepare a little 10 minute talk about the implementation of the Scots Thesaurus, which I will be giving at the colloquium. There’s actually quite a lot that needs to be finished off before next Wednesday and a few other tasks I need to focus on before then as well, so it could all get slightly rushed next week.
My time this week was mostly divided between three projects: the Scots Thesaurus, the Hansard work for SAMUELS and the Medical Humanities Network. For the Scots Thesaurus I managed to tick off all of the outstanding items on my ‘to do’ list (although there are still a number of refinements and tweaks to be made). I updated the ‘Thesaurus Browse’ page so that it shows all of the main categories that are available in the system. These are split into different tabs for each part of speech and I’ve added in a little feature that allows the user to select whether the categories are ordered by category number or heading. I also completed a first version of the search facilities. There is a ‘quick search’ box that appears in the right-hand column of every page, which searches category headings, words and definitions. By default it performs an exact match search, but you can add an asterisk wildcard at the beginning and / or end of the term for partial matches. There’s also an advanced search that allows you to search by word, part of speech, definition and category. Asterisk wildcards can be used in the word, definition and category text boxes here too. The ‘Jump to category’ feature is also now working. I’ve added the ‘Thesaurus Search’ and ‘Browse Thesaurus Categories’ pages as menu items so people can find the content and I’ve also reinstated the ‘random category’ feature widget in the right-hand column.
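The asterisk convention maps straightforwardly onto an SQL LIKE pattern; a sketch of the idea (not the site’s actual code):

```python
def wildcard_to_like(term):
    """Convert the search-box convention to an SQL LIKE pattern.

    No asterisks means an exact match; a leading and/or trailing '*'
    becomes a '%' wildcard. A sketch of the idea, not the site's code.
    """
    # Escape LIKE's own wildcard characters in the user's term first.
    escaped = term.replace("%", r"\%").replace("_", r"\_")
    if escaped.startswith("*"):
        escaped = "%" + escaped[1:]
    if escaped.endswith("*"):
        escaped = escaped[:-1] + "%"
    return escaped

print(wildcard_to_like("burn"))    # 'burn'   (exact match)
print(wildcard_to_like("burn*"))   # 'burn%'  (words beginning with 'burn')
print(wildcard_to_like("*burn*"))  # '%burn%' (words containing 'burn')
```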
Another feature that had been requested was to provide a way for people to click on a word in a category to find out which other categories it appears in. To achieve this I added in a little magnifying glass icon beside each word, and clicking on this performs a quick search for the word. I also made some further refinements to the visualisation as follows:
- The visualisation can now be ‘zoomed and panned’ like Google Maps. Click, hold and drag any white space in the visualisation and you can move the contents, meaning if you open lots of stuff that gets lost off the right-hand edge you can simply drag the visualisation to move this area to the middle. You can also zoom in and out using the scroll wheel on your mouse. The zoom functionality isn’t really all that important, but it can help if you want to focus in on a cluttered part of the visualisation.
- Category labels in the visualisation are now ‘clickable’ again, as they used to be with the previous visualisation style. This makes it easier to follow links as previously only the dots representing categories were clickable.
- The buttons for ‘browsing up’ or ‘centring on a category’ in the visualisation are now working properly again. If you click on the root node and this has a parent in the database the ‘browse up’ button appears in the infobox. If you click on any other node a button displays in the infobox that allows you to make this node the root.
- In the visualisation I’ve added [+] and [-] signs to the labels of categories that have child categories. As you’d probably expect, if the child categories are hidden a [+] is displayed and when clicked on this expands the categories and changes to a [-].
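The [+] / [-] behaviour follows the standard d3 collapsible-tree pattern, where a node's hidden children are parked on a `_children` property so they can be restored later. A simplified sketch (not the actual visualisation code):

```javascript
// Toggle a tree node between expanded and collapsed, in the style of
// d3's collapsible tree examples: hidden children are parked on
// `_children`. Returns the sign to show beside the node's label
// after the toggle ('' for leaf nodes with no children at all).
function toggle(node) {
  if (node.children) {
    // Currently expanded: collapse by hiding the children.
    node._children = node.children;
    node.children = null;
  } else if (node._children) {
    // Currently collapsed: restore the hidden children.
    node.children = node._children;
    node._children = null;
  }
  return node.children ? '[-]' : node._children ? '[+]' : '';
}
```

After the toggle the visualisation is simply redrawn, so collapsed branches (and their labels) disappear from the layout.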
I’m meeting again with Susan and Magda next Monday to discuss the website and which areas (if any) still need further work. I think it’s all come together very well.
I had quite a long but useful meeting with Fraser to discuss the Hansard data on Wednesday this week. We went through all of the options that should be available to limit what data gets displayed in the graph and have agreed to try and provide facilities to limit the data by:
- Speaker’s name
- House (Commons or Lords)
- Speaker’s party (Commons only, and probably only possible from the 1920s onwards)
- Office (Commons only)
- Constituency (Commons only)
- Title (Lords only)
We spent quite a lot of time looking through the metadata we have about speeches, which is split across many different SQL dumps and XML files, and it looks like it will be possible to get all of these options working. It’s all very promising.
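Once the metadata is pulled together, the agreed limit options could be combined into a single parameterised query along these lines. This is only a sketch: the column names (`speaker`, `house`, `party`, `office`, `constituency`, `title`) are assumptions for illustration, since the real metadata is spread across several SQL dumps and XML files:

```javascript
// Build a parameterised WHERE clause from whichever limit options the
// user has selected. Column names are assumed for illustration.
function buildSpeechFilter(limits) {
  const clauses = [];
  const params = [];
  for (const col of ['speaker', 'house', 'party', 'office',
                     'constituency', 'title']) {
    if (limits[col]) {
      clauses.push(col + ' = ?');
      params.push(limits[col]);
    }
  }
  const where = clauses.length ? 'WHERE ' + clauses.join(' AND ') : '';
  return { where, params };
}
```

Using placeholders rather than interpolating the values keeps the query safe however the limits are combined.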
For the Medical Humanities Network I continued working on the site and the content management system. I realised I hadn’t added in options to record organisational units for projects or people, or to associate keywords with people. I’ve added in these facilities now. I still need to add in options to allow staff to manage organisational units. ‘Organisation’ is currently hidden as it defaults to ‘University of Glasgow’ for now and only ‘Unit’ (e.g. College of Arts, School of Critical Studies) appears anywhere. If we add an organisation that isn’t University of Glasgow this will appear, though.
I’ve also completed a first draft of the ‘collections’ section of the site, including scripts for adding, editing and listing collections. As agreed in the original project documentation, collections can only be added by admin users. We could potentially change this at some point, though. One thing that wasn’t stated in the documentation is whether collections should have a relationship with organisational units. It seemed sensible to me to be able to record who owns the collection (e.g. The Hunterian) so I’ve added in the relationship type.
It’s possible to make a collection a ‘spotlight’ feature through the collection edit page, but I still need to update the homepage so that it checks the collections as well as just projects. I’ll do this next time I’m working on the project. After that I still need to add in the teaching materials pages and complete work on the keywords section and then all of the main parts of the system should be in place.
I also spent a little time this week working on the map for Murray Pittock’s Ramsay and the Enlightenment project. I’ve been helping Craig Lamont with this, with Craig working on the data while I develop the map. Craig has ‘pinned’ quite a lot of data to the map now and wanted me to add in the facility to enable markers of a certain type to be switched on or off. I’d never done this before using Leaflet.js so it was fun to figure out how it could work. I managed to get a very nice little list of checkboxes that, when clicked, automatically turn marker types on or off, and it’s working very well. The next challenge will be to get it all working properly within T4.
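The checkbox approach comes down to keeping one Leaflet layer group per marker type and adding or removing the whole group from the map. A sketch of the idea, with made-up type names and field names rather than Craig's actual data:

```javascript
// Group marker data by type so each type can get its own Leaflet
// layer group. ('type' field and type names are illustrative.)
function groupByType(markers) {
  const groups = {};
  for (const m of markers) {
    (groups[m.type] = groups[m.type] || []).push(m);
  }
  return groups;
}

// In the browser, each group then becomes an L.layerGroup whose
// visibility one checkbox toggles, roughly:
//
//   const layers = {};
//   for (const [type, items] of Object.entries(groupByType(data))) {
//     layers[type] = L.layerGroup(
//       items.map(m => L.marker([m.lat, m.lng]).bindPopup(m.label)));
//     layers[type].addTo(map);
//   }
//   // per-type checkbox handler:
//   checkbox.addEventListener('change', e =>
//     e.target.checked ? layers[type].addTo(map)
//                      : map.removeLayer(layers[type]));
```

Leaflet also ships a built-in overlay switcher (`L.control.layers`) that does something similar, but a hand-rolled checkbox list gives full control over the styling.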
Other than meeting with Fraser and Craig, I had a few other meetings this week. On Monday I attended a project meeting with the ‘Metaphor in the Curriculum’ project. It was good to catch up with developments on this project. It’s looking like I’ll start doing development work on this project in October, which should hopefully fit in with my other work. I also had two meetings with the Burns people this week. The first was with Kirsteen and Vivien to discuss the George Thomson part of the Burns project. There are going to be some events for this and some audio, video and textual material that they would like to be nicely packaged up and we discussed some of the possibilities. I also met with Gerry and Pauline on Friday to discuss the next big Burns project, specifically some of the technical aspects of the proposal that I will be working on. I think we all have a clearer idea of what is involved now and I’m going to start writing the technical aspects in the next week or so.
This week I returned to working a full five days, after the previous two part-time weeks. It was good to have a bit more time to work on the various projects I’m involved with, and to be able to actually get stuck into some development work again. On Monday and Tuesday and a bit of Thursday this week I focussed on the Scots Thesaurus project. The project is ending at the end of September so there’s going to be a bit of a final push over the coming weeks to get all of the outstanding tasks completed.
I spent quite a bit of time continuing to try to enable multiple parts of speech to be represented in the visualisations at the same time, but unfortunately I had to abandon this due to the limitations of my available time. It’s quite difficult to explain why allowing multiple parts of speech to appear in the same visualisation is tricky, but I’ll try. The difficulty is caused by the way parts of speech and categories are handled in the thesaurus database. A category for each part of speech is considered to be a completely separate entity, with a different unique identifier, different lexemes and subcategories. For example, there isn’t just one category ‘01.01.11.02.08.02.02 Rain’ containing certain lexemes that are nouns and others that are verbs. Instead, ‘01.01.11.02.08.02.02n Rain’ is one category (ID 398) and ‘01.01.11.02.08.02.02v Rain’ is another, different category (ID 401). This is useful because categories of different parts of speech can then have different names (e.g. ‘Dew’ (n) and ‘Cover with dew’ (v)), but it also means building a multiple part of speech visualisation is tricky because the system is based around the IDs.
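The ‘Rain’ example can be made concrete: the same category number exists once per part of speech, each with its own ID, so any lookup has to key on the number plus the part of speech. The IDs and headings below are the ones quoted above; the field names are assumptions for illustration:

```javascript
// Each (category number, part of speech) pair is a completely
// separate record with its own ID -- there is no single 'Rain'
// category. Field names are illustrative, not the real schema.
const categories = [
  { id: 398, number: '01.01.11.02.08.02.02', pos: 'n', heading: 'Rain' },
  { id: 401, number: '01.01.11.02.08.02.02', pos: 'v', heading: 'Rain' },
];

// A lookup therefore needs both the number and the part of speech;
// the number alone is ambiguous.
function findCategory(number, pos) {
  return categories.find(c => c.number === number && c.pos === pos);
}
```

It's this extra key that the visualisation code, which is built around single category IDs, has no natural place for.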
The tree based visualisations we’re using expect every element to have one parent category and if we try to include multiple parts of speech things get a bit confused as we no longer have a single top-level parent category as the noun categories have a different parent from the verbs etc. I thought of trying to get around this by just taking the category for one part of speech to be the top category but this is a little confusing if the multiple top categories have different names. It also makes it confusing to know where the ‘browse up’ link goes to if multiple parts of speech are displayed.
There is also the potential for confusion relating to the display of categories that are at the same level but with a different part of speech. It’s not currently possible to tell by looking at the visualisation which category ‘belongs’ to which part of speech when multiple parts of speech are selected, so for example if looking at both ‘n’ and ‘v’ we end up with two circles for ‘Rain’ but no way of telling which is ‘n’ and which is ‘v’. We could amalgamate these into one circle but that brings other problems if the categories have different names, like the ‘Dew’ example. Also, what then should happen with subcategories? If an ‘n’ category has 3 subcategories and a ‘v’ category has 2 subcategories and these are amalgamated it’s not possible to tell which main category the subcategories belong to. Also, subcategory numbers can be the same in different categories, so the ‘n’ category may have a subcategory ‘01’ and a further one ‘01.01’ while the ‘v’ category may also have ones with the same numbers and it would be difficult to get these to display as separate subcategories.
There is also a further issue with us ending up with too much information in the right-hand column, where the lexemes in each category are displayed. If the user selects 2 or 3 parts of speech we then have to display the category headings and the words for each of these in the right-hand column, which can result in far too much data being displayed.
None of these issues are completely insurmountable, but I decided that given the limited amount of time I have left on the project it would be risky to continue to pursue this approach for the time being. Instead I implemented a feature that allows users to select a single part of speech to view from a list of available options. Users are able to, for example, switch from viewing ‘n’ to viewing ‘v’ and back again, but can’t view both ‘n’ and ‘v’ at the same time. I think this facility works well enough and considerably cuts down on the potential for confusion.
After completing the part of speech facility I moved onto some of the other outstanding non-visualisation tasks I still have to tackle, namely a ‘browse’ facility and the search facilities. Using WordPress shortcodes I created an option that lists all of the top-level main categories in the system – i.e. those categories that have no parent category. This option provides a pathway into the thesaurus data and is a handy reference showing which semantic areas the project has so far tackled. I also began work on the search facilities, which will work in a very similar manner to those offered by the Historical Thesaurus of English. So far I’ve managed to create the required search forms, but not yet the search that these need to connect to.
After making this progress with non-visualisation features I returned to the visualisations. The visualisation style we had adopted was a radial tree, based on this example: http://bl.ocks.org/mbostock/4063550. This approach worked well for representing the hierarchical nature of the thesaurus, but it was quite hard to read the labels. I decided instead to investigate a more traditional tree approach, initially hoping to get a workable vertical tree, with the parent node at the top and levels down the hierarchy from this expanding down the page. Unfortunately our labels are rather long and this approach meant that there were a lot of categories on the same horizontal line of the visualisation, leading to a massive amount of overlap of labels. So instead I went for a horizontal tree approach, and adapted a very nice collapsible tree style similar to the one found here: http://mbostock.github.io/d3/talk/20111018/tree.html. I continued to work on this on Thursday and I have managed to get a first version integrated with the WordPress plugin I’m developing.
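Tree layouts like the collapsible example expect nested data, so the flat category table has to be folded into a hierarchy before it can be drawn. A sketch of that step, with `id`, `heading` and `parent` fields assumed for illustration:

```javascript
// Fold a flat list of categories (each with an id and a parent id;
// field names assumed for illustration) into the nested
// { name, children: [...] } shape that d3 tree layouts expect.
function buildTree(rows, rootId) {
  const byId = {};
  for (const r of rows) {
    byId[r.id] = { name: r.heading, children: [] };
  }
  let root = null;
  for (const r of rows) {
    if (r.id === rootId) {
      root = byId[r.id];
    } else if (byId[r.parent]) {
      byId[r.parent].children.push(byId[r.id]);
    }
  }
  return root;
}
```

The resulting object can be handed straight to the tree layout, with each `children` array becoming a branch of the horizontal tree.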
Also on Thursday I met with Susan and Magda to discuss the project and the technical tasks that are still outstanding. We agreed on what I should focus on in my remaining time and we also discussed the launch at the end of the month. We also had a further meeting with Wendy, as a representative of the steering group, and showed her what we’d been working on.
On Wednesday this week I focussed on Medical Humanities. I spent a few hours adding a new facility to the SciFiMedHums database and WordPress plugin to enable bibliographical items to cross reference any number of other items. This facility adds such a connection in both directions, allowing (for example) Blade Runner to have an ‘adapted from’ relationship with ‘Do androids dream of electric sheep’ and for the relationship in the other direction to then automatically be recorded with an ‘adapted into’ relationship.
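The two-way cross-referencing amounts to storing the chosen relationship together with its inverse, one record per direction. A sketch of the idea; only the ‘adapted from’ / ‘adapted into’ pair comes from the example above, so treat everything else as an assumption:

```javascript
// Map each relationship type to its inverse, so recording one
// direction automatically determines the other. Only the 'adapted'
// pair is from the real system; extend the map per relationship type.
const inverseOf = {
  'adapted from': 'adapted into',
  'adapted into': 'adapted from',
};

// Return the two rows to store for a cross-reference, one per
// direction. Relationships without a listed inverse are treated as
// symmetrical (the same label in both directions).
function crossReference(itemA, itemB, relation) {
  const inverse = inverseOf[relation] || relation;
  return [
    { from: itemA, to: itemB, relation },
    { from: itemB, to: itemA, relation: inverse },
  ];
}
```

So recording that Blade Runner was ‘adapted from’ the novel also yields the ‘adapted into’ row in the opposite direction, without the user entering it twice.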
I spent the remainder of Wednesday and some other bits of free time continuing to work on the Medical Humanities Network website and CMS. I have now completed the pages and the management scripts for managing people and projects and have begun work on Keywords. There should be enough in place now to enable the project staff to start uploading content and I will continue to add in the other features (e.g. collections, teaching materials) over the next few weeks.
On Friday I met with Stuart Gillespie to discuss some possibilities for developing an online resource out of a research project he is currently in the middle of. We had a useful discussion and hopefully this will develop into a great resource if funding can be secured. The rest of my available time this week was spent on the Hansard materials again. After discussions with Fraser I think I now have a firmer grasp on where the metadata that we require for search purposes is located. I managed to get access to information about speeches from one of the files supplied by Lancaster, and also to the metadata used on the Millbank Systems website relating to constituencies, offices and things like that. The only thing we don’t seem to have access to is which party a member belonged to, which is a shame as this would be hugely useful information. Fraser is going to chase this up, but in the meantime I have the bulk of the required information. On Friday I wrote a script to extract the information relating to speeches from the file sent by Lancaster. This will allow us to limit the visualisations by speaker, and hopefully by constituency and office too. I also worked some more with the visualisation, writing a script that created output files for each thematic heading in the two-year sample data I’m using, to enable these to be plugged into the visualisation. I also started to work on facilities to allow a user to specify which thematic headings to search for, but I didn’t quite manage to get this working before the end of the day. I’ll continue with this next week.