Week Beginning 19th February 2024

On Monday this week I addressed a couple of issues with the Books and Borrowing search that had been identified last Friday.  Multi-word searches were not working as intended and were returning far too many results.  The reason, as mentioned last week, was that a search for ‘Lord Byron’ (without quotes) was searching the specified field for ‘Lord’ and then all fields for ‘Byron’.  It was rather tricky to think through this issue, as multi-word searches surrounded by quotes need to be treated differently, as do multi-word searches that contain a Boolean.  We don’t actually mention Booleans in the search help, but AND, OR and NOT (which must be upper-case) can be used in the search fields.

I wrote a new function that hopefully sorts out the search strings as required, but note that search strings containing multiple sets of quotes are not supported as this would be much more complicated to sort out and it seemed like a bit of an edge case.  This new function has been applied to all free-text search fields other than the quick search, which is set to search all fields anyway.  After running several tests I made the update live, and now searching author surnames for ‘Lord Byron’ finds no results, which is as it should be.
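
The logic of the new function is roughly as follows (this is just a simplified sketch in Python rather than the site’s actual code, and the field and function names are purely illustrative): a quoted string becomes a phrase query against the field, a string containing upper-case Booleans is grouped so the Booleans still apply within the field, and anything else has its words joined with AND.

# Simplified illustration of the multi-word search handling (not the site's actual code).
def build_field_query(field, search_string):
    s = search_string.strip()
    # A quoted string is treated as a single phrase against the field.
    if s.startswith('"') and s.endswith('"') and len(s) > 1:
        return f'{field}:{s}'
    words = s.split()
    # If the user has supplied upper-case Booleans, keep them, but group the
    # whole expression so every term is matched against the specified field.
    if any(w in ('AND', 'OR', 'NOT') for w in words):
        return f'{field}:({s})'
    # Otherwise require every word, so 'Lord Byron' becomes
    # field:(Lord AND Byron) rather than field:Lord plus 'Byron' in any field.
    return f'{field}:({" AND ".join(words)})'

# e.g. build_field_query('author_surname', 'Lord Byron') -> 'author_surname:(Lord AND Byron)'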

Here are some examples that do return content.

  1. If you search book titles for ‘rome’ you currently find 2046 records:

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|rome

  2. If you search book titles for ‘rome popes’ you currently find 100 records (as this is the equivalent of searching book titles for ‘rome’ AND ‘popes’):

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|rome%20popes

  3. Using Boolean ‘AND’ gives the same results:

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|rome%20AND%20popes

  4. A search for ‘rome OR popes’ currently returns 2046 records, presumably because all book titles containing ‘popes’ also contain ‘rome’ (at least I hope that’s the case):

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|rome%20OR%20popes

  5. A search for ‘rome NOT popes’ currently brings back 1946 records:

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|rome%20NOT%20popes

  6. And searches for a full string also work as intended, for example a search for “see of rome”:

https://borrowing.stir.ac.uk/search/0/0/0/simple/bookname|%22see%20of%20rome%22

With this update in place I then slightly changed the structure of the Solr index to add a new ‘copy’ field that stores publication place as a string, rather than text.  This is then used in the facets, ensuring the full text of the place is displayed rather than being split into tokens.  I then regenerated the cache and asked the helpful IT people in Stirling to update this on the project’s server.  Once the update had been made everything worked as it should.
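
For anyone unfamiliar with Solr, a ‘copy’ field of this sort is defined in the schema along the following lines (the field names here are illustrative rather than the project’s actual schema).  The facets then use the string version, so each place appears as a single untokenised value:

<field name="publication_place" type="text_general" indexed="true" stored="true"/>
<field name="publication_place_str" type="string" indexed="true" stored="true"/>
<copyField source="publication_place" dest="publication_place_str"/>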

Also this week I exported all of the dictionary entries beginning with ‘Y’ for the Anglo-Norman Dictionary as these are now being overhauled.  I also fixed an issue with the Saints Places website – a site I didn’t develop but am now responsible for.  A broken query was causing various errors to appear.  The strange thing is that the broken query must have been present for years; presumably it was previously failing silently, whereas the server on which the site now resides is stricter.

I spent the rest of the week developing the tool for publishing linguistic surveys for the Speak For Yersel project.  I’d spent a lot of time previously working with the data for the three new linguistic areas, ensuring it was consistently stored so that a tool could be generated to publish the data for all three areas (and more in future).  This week I began developing the tool, starting with a ‘setup’ script that runs in a web browser and allows someone to create a new survey website – specifying one or more survey types (e.g. Phonology, Morphology) and uploading sets of questions and answer options for each survey type.  The setup script then provides facilities to integrate the places that the survey area will use to ascertain where a respondent is from and where a marker corresponding to their answer should be located.  This data includes both GeoJSON and CSV data, both of which need to be analysed by the script and brought together in a relational database.  It took most of the week to create all of the logic for processing the above; you can see a screenshot of one of the stages below:

As I developed the script I realised further tweaks needed to be made to the data, including the addition of a ‘map title’ field that will appear on the map page.  Other fields such as the full question were too verbose to be used here.  I therefore had to update the spreadsheets for all three survey types in all three areas to add this field in.  Similarly, one survey allowed more than one answer to be selected for a few questions, with differing numbers of answers being permitted.  I therefore had to update the spreadsheets to add in a new ‘max answers’ column that the system will then use to ascertain how the answer options should be processed.
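
As a rough illustration of the kind of processing the setup script performs when bringing the spreadsheet and GeoJSON data together, here is a minimal sketch, assuming an SQLite database and made-up file, table and column names (the real tool is rather more involved):

import csv, json, sqlite3

# Illustrative sketch only: load survey questions from a CSV export and
# area shapes from a GeoJSON file into a simple relational structure.
conn = sqlite3.connect('survey.db')
conn.execute('CREATE TABLE IF NOT EXISTS questions '
             '(id TEXT PRIMARY KEY, map_title TEXT, question TEXT, max_answers INTEGER)')
conn.execute('CREATE TABLE IF NOT EXISTS options (question_id TEXT, option_text TEXT)')
conn.execute('CREATE TABLE IF NOT EXISTS areas (name TEXT PRIMARY KEY, geojson TEXT)')

with open('roi-lexis.csv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        conn.execute('INSERT INTO questions VALUES (?, ?, ?, ?)',
                     (row['ID'], row['Map title'], row['Question'],
                      int(row.get('Max answers') or 1)))
        # Any number of trailing 'Option' columns become answer options.
        for key, value in row.items():
            if key.startswith('Option') and value:
                conn.execute('INSERT INTO options VALUES (?, ?)', (row['ID'], value))

with open('roi-areas.geojson', encoding='utf-8') as f:
    for feature in json.load(f)['features']:
        conn.execute('INSERT INTO areas VALUES (?, ?)',
                     (feature['properties']['name'], json.dumps(feature['geometry'])))

conn.commit()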

I also needed to generate new regions for the Republic of Ireland survey.  Last week I’d created regions using QGIS based on the ‘county’ stored in the GeoJSON data.  This resulted in 26 regions, which is rather more than we used for Scotland.  The team reckoned this was too many and decided to use a different grouping called NUTS regions (see https://en.wikipedia.org/wiki/NUTS_statistical_regions_of_Ireland).  A member of the team updated the spreadsheet to include these regions for all locations and I was then able to generate GeoJSON data for these new regions using QGIS, following the same method as I documented last week.

When processing the data for the Republic of Ireland (which I’m using as my first test area) my setup script spotted some inconsistencies between the data in the GeoJSON files and the spreadsheets.  I’ve passed these on to the team, who are going to investigate next week.  I’ll also continue to develop the tool next week, moving on to the development of the front-end now that the script for importing the data is more or less complete.

Week Beginning 12th February 2024

I’d taken Monday off this week and on Tuesday I continued to work on the Speak For Yersel follow-on projects.  Last week I started working with the data and discovered that it wasn’t stored in a particularly consistent manner.  I overhauled the ‘lexis’ data and this week I performed a similar task for the ‘morphology’ and ‘phonology’ data.  I also engaged in email conversations with Jennifer and Mary about the data and how it will eventually be accessed by researchers, in addition to the general public.

I then moved on to looking at the GeoJSON data that will be used to ascertain where a user is located and which area the marker representing their answer should be randomly positioned in.  Wales was missing its area data, but thankfully Mary was able to track it down.

For Speak For Yersel we had three levels of locations:

Location:  The individual places that people can select when they register (e.g. ‘Hillhead, Glasgow’).

Area: The wider area that the location is found in.  We store GeoJSON coordinates for these areas and they are then used as the boundaries for placing a random marker to represent the answer of a person who selected a specific location in the area when they registered.  So for example we have a GeoJSON shape for ‘Glasgow Kelvin’ that Hillhead is located in.  Note that these shapes are never displayed on any maps.

Region: The broader geographical region that the area is located in.  These are the areas that appear on the maps (e.g. ‘Glasgow’ or ‘Fife’) and they are stored as GeoJSON files.
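
Incidentally, placing a random marker within an area’s GeoJSON shape (as described for ‘Area’ above) can be done with simple rejection sampling: pick random points within the shape’s bounding box until one falls inside the polygon.  Here is a minimal sketch using the shapely library – it illustrates the idea only and is not the code the site itself uses:

import random
from shapely.geometry import shape, Point

def random_point_in_area(geojson_geometry):
    """Return a (longitude, latitude) pair inside the given GeoJSON geometry."""
    polygon = shape(geojson_geometry)
    min_x, min_y, max_x, max_y = polygon.bounds
    while True:
        point = Point(random.uniform(min_x, max_x), random.uniform(min_y, max_y))
        if polygon.contains(point):
            return point.x, point.y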

For the new areas we didn’t have the ‘region’ data.  I therefore did some experimenting with the QGIS package and I found a way of merging areas to form regions, as the following screenshot demonstrates:

I was therefore able to create the necessary region shapes myself using the following method:

  1. I opened the GeoJSON file in QGIS via the file browser and added the OpenStreetMap XYZ layer in ‘XYZ Tiles’, ensuring this was the bottom layer in the layer browser.
  2. In the layer styling right-hand panel I selected the ‘ABC’ labels icon and chose ‘County’ as the value, meaning the county names are displayed on the map.
  3. In the top row of icons I selected the ‘Select Features by area or single click’ icon (the 23rd icon along in my version of QGIS).
  4. I could then ‘Ctrl+click’ to select multiple areas.
  5. Then I selected the ‘Vector’ menu item and chose ‘Geoprocessing’ then ‘Dissolve’.
  6. In the dialog box I had to press the green ‘reload’ icon to make the ‘Selected features only’ checkbox clickable, then I ticked it.
  7. Then I pressed ‘Run’, which created a new, merged shape.
  8. The layer then needed to be saved using the layer browser in the left panel.
  9. This gave me separate GeoJSON files for each region, but I was then able to merge them into one file by opening the ‘Toolbox’ by pressing on the cog icon in the top menu bar, searching for ‘merge’, then opening ‘Vector general’ -> ‘Merge vector layers’, selecting the input layers, ensuring the destination CRS is WGS84, then entering a filename and running the script to merge all layers.

I was then able to edit / create / delete attributes for each region area by pressing on the ‘open attribute table’ icon in the top menu bar.  It’s been a good opportunity to learn more about QGIS and next week I’ll begin updating the code to import the data and set up the systems.

Also this week I created an entry for the Books and Borrowing project on this site (see https://digital-humanities.glasgow.ac.uk/project/?id=160).  On Friday afternoon I also investigated a couple of issues with the search that Matt Sangster had spotted.  He noticed that an author surname search for ‘Byron’ wasn’t finding Lord Byron, and entering ‘Lord Byron’ into the surname search was bringing back lots of results that didn’t have this text in the author surname.

It turned out that Byron hadn’t been entered into the system correctly and was in as forename ‘George’, surname ‘Gordon’ with ‘Lord Byron’ as ‘othername’.  I’ll need to regenerate the data once this error has been fixed.  But the second issue, whereby an author surname search for ‘Lord Byron’ was returning lots of records, is a strange one.  This would appear to be an issue with searches for multiple words and unfortunately it’s something that will need a major reworking.  I hadn’t noticed previously, but if you search for multiple words without surrounding them with quotes, Solr matches the first word against the specified field and the remaining words against all fields.  So the query becomes “surname ‘Lord’ OR any field ‘Byron’”, whereas it should be “surname ‘Lord’ AND surname ‘Byron’”.  This will probably affect all free-text fields.  I’m going to have to update the search to ensure multi-word searches without quotes are processed correctly, which will take some time; I’ll try to tackle it next week.  I also need to create a ‘copy’ field for place of publication as this is being tokenised in the search facet options.  So much for thinking my work on this project was at an end!
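
To illustrate the problem, with Solr’s standard query parser the field name only applies up to the first space, so (using an illustrative field name) the following behave quite differently:

surname:Lord Byron        – only ‘Lord’ is matched against surname; ‘Byron’ falls back to the default field
surname:(Lord AND Byron)  – both terms must match the surname field
surname:"Lord Byron"      – the exact phrase must match the surname field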

Also this week I spent many hours going through the Iona map site to compile a spreadsheet listing all of the text that appears in English in order to make the site multilingual.  There is a Gaelic column in the spreadsheet and the plan is that someone will supply the appropriate forms.  There are 157 separate bits of text, with some being individual words and others being somewhat longer.  By far the longest is the content of the copyright and attribution popup, although we might also want to change this as it references the API which might not be made public.  We might also want to change some of the other English text, such as the ‘grid ref’ tooltip that gives as an example a grid reference that isn’t relevant to Iona.  I’ll hold off on developing the multilingual interface until I’m sure the team definitely want to proceed with this.

Finally this week I continued to migrate some of the poems from the Anthology of 16th and Early 17th Century Scots Poetry to TEI XML.  It’s going to take a long time to get through all of them, but progress is being made.

Week Beginning 5th February 2024

I continued to make updates to the Books and Borrowing website this week after the soft launch last week.  I had noticed last week that the figures appearing on the ‘Facts’ page didn’t correlate with the number of results returned through the search facilities.  I reckoned this was because the ‘Facts’ page, which queries the database directly, was not necessarily including the full chain of interrelated tables and their individual ‘isactive’ flags when returning figures, whereas the search facilities use the Solr index, whose data was generated using the full chain.  So, for example, counts of borrowing records for books may not incorporate the register page record, but if the register page is set to ‘inactive’ then it is important to factor this in.

Another issue was the ‘in stats’ flag.  We have this flag for borrowing records to decide whether the record should appear in the stats or not, so for example a duplicate record could be omitted from the stats, but would still be findable when viewing the register page.  The search results were finding records with ‘instats’ set to ‘no’ but these were not included in the ‘Facts’ page, meaning the numbers would never match up.

To try and make the figures more consistent I decided to update the ‘Facts’ page to use the Solr index for calculating borrowing records.  This slightly changed the figures that appear in the summary section on this page, but as a borrowing record can involve multiple borrowers, the total borrowings broken down by gender may not equal the total borrowings.  If a borrowing has two borrowers, one male and one female, then it will count as one borrowing in the total borrowings and one each in the totals per gender.

Further investigation into the discrepancies between the figures on the ‘Facts’ page and the number of records returned in a related search suggested that these are caused by the ‘instats’ flag.  For example, for Chambers library, Fiction is the most popular genre with 4417 borrowings listed.  This is the figure for borrowings with ‘instats’ set to ‘Y’.  But following the link through to the search results listing borrowing records involving the genre ‘Fiction’ at Chambers displays 4420 borrowings.  This is because the search does not take the ‘instats’ field into consideration.  I tried running these queries (and others) through Solr and this does appear to be the reason for the discrepancies.  After discussion with the team we therefore decided to update the search so that records with ‘instats’ set to ‘no’ are never returned.  These records will still be findable when viewing register pages, but will no longer appear in any search results.  With this update in place the figures on the ‘Facts’ page began to match up with the search results.

However, after further testing I noticed that the stats for authors and genres in the site-wide facts page were still not aligned with the number of search results returned, and I spent most of Wednesday investigating this.  It was a very long and complex process but I finally managed to sort it.  I updated the ‘top ten’ cache files for each library to use data from the Solr index to calculate the number of borrowings, but this alone wasn’t giving the correct figures.  The reason was that each library had its own ‘top ten’ for authors and genres, both overall and for each gender.  When the facts page was then generated for multiple libraries, these per-library lists were brought together to formulate overall top tens.  If an author or genre was not in the top ten for a library, the data for this item was not included in the calculations, as only the top ten were being stored.  For example, if ‘Fiction’ was the eleventh most borrowed genre at a library then its borrowings were not found and were therefore not getting added to the total.  What I’ve had to do instead is store all genres and authors for each library in the cache rather than just the top tens, thus ensuring that all items are joined and their numbers of borrowings are compared.  Unfortunately this does mean the cache files are now a lot bigger and more processing needs to be done, but at least the figures do now match up.

Also this week I updated the Chambers Library Map to add a link to view the relevant borrowing records to the borrower popups (see https://borrowing.stir.ac.uk/chambers-library-map).
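
Going back to the caching issue, the merging step is essentially just a matter of summing the per-library counts before ranking, which only works if every genre (or author) is present in every library’s cache.  A minimal sketch of the corrected approach, with made-up figures:

from collections import Counter

# Each library's cache now stores counts for ALL genres, not just its top ten.
library_genre_counts = {
    'Library A': {'Fiction': 400, 'History': 220, 'Poetry': 85},
    'Library B': {'Fiction': 120, 'History': 330, 'Theology': 94},
}

# Sum the counts across the selected libraries, then take the overall top ten.
overall = Counter()
for counts in library_genre_counts.values():
    overall.update(counts)

top_ten = overall.most_common(10)
# [('History', 550), ('Fiction', 520), ('Theology', 94), ('Poetry', 85)]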

I then moved onto making further updates to the Place-names of Iona map.  Sofia had spotted two minor issues with the map and I fixed them.  Cross references were only set to be included in the data if both records had ‘landranger’ set to ‘Y’.  I’m not sure why this was – it must have been something decided in an earlier project.  I removed this requirement and the cross references now display in the full map record.  However, pressing on the links wasn’t actually doing anything so I also fixed this.  Pressing on a cross reference now works in the same way as the links in the tabular view of the data – the popup is closed, the map focusses on the location of the cross reference and the marker pop-up opens.  The other issue was that the glossary pop-up could end up longer than the page and wasn’t using the updated in-popup scrollbar that I’d developed for the tabular view, so I updated this.

I also continued working through my ‘to do’ list for the project, which is now almost entirely ticked off.  I updated the searches and browses so that they now shift the map to accommodate all of the returned markers.  I wasn’t sure about doing this as I worried it might be annoying, but I think it works pretty well.  For example, if you browse for place-names beginning with ‘U’ the map now zooms into the part of the island where all of these names are found rather than leaving you looking at a section of the island that may not feature any of the markers.

I also updated the citations to use shorter links.  Whenever you press on the ‘cite’ button or open a record the system grabs the hash part of the URL (the part after ‘#’) and generates a random string of characters as a citation ID.  This random string is then checked against a new citation table I’ve created to ensure it’s not already in use (a new string is generated if so) and then the hash plus the citation ID are stored in the database.  This citation ID is then displayed in the citation links.  Loading a URL with a citation ID then queries the database to find the relevant hash and then redirects the browser to this URL.  I’ll need to keep an eye on how this works in practice as we may end up with huge numbers of citation URLs in the database.
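
In outline the citation handling works like this (a simplified sketch with made-up table and function names rather than the site’s actual code):

import random, sqlite3, string

conn = sqlite3.connect('citations.db')
conn.execute('CREATE TABLE IF NOT EXISTS citations (cite_id TEXT PRIMARY KEY, hash TEXT)')

def create_citation(map_hash, length=8):
    """Store the map state hash against a new random citation ID and return the ID."""
    while True:
        cite_id = ''.join(random.choices(string.ascii_lowercase + string.digits, k=length))
        # Regenerate if the random ID is already in use.
        if conn.execute('SELECT 1 FROM citations WHERE cite_id = ?', (cite_id,)).fetchone() is None:
            conn.execute('INSERT INTO citations VALUES (?, ?)', (cite_id, map_hash))
            conn.commit()
            return cite_id

def resolve_citation(cite_id):
    """Look up the stored hash for a citation ID (to redirect the browser to it)."""
    row = conn.execute('SELECT hash FROM citations WHERE cite_id = ?', (cite_id,)).fetchone()
    return row[0] if row else None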

I also updated the maximum zoom of the map layers to be whatever each layer supports.  The satellite with labels view supports the highest zoom of the layers (20), and if you’re at this zoom level and then switch to a different layer (e.g. Modern OS with a max zoom of 16.25) the map will automatically zoom out to that layer’s maximum.

The only major thing left to tackle is the bilingual view.  My first step in implementing this will be to go through the site and document each and every label and bit of text.  I’ll create a spreadsheet containing these and then someone will have to fill in the corresponding Gaelic text for each.  I’ll then have to update the site so that the code references a language file for every bit of text rather than having English hard-coded.  I’ll begin investigating this next week.
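
The language file will probably end up as something along these lines, with a key for every bit of interface text and the Gaelic forms added once they’ve been supplied (the keys and structure here are purely illustrative):

{
  "cite_button":      { "en": "Cite", "gd": "" },
  "table_view":       { "en": "Table view", "gd": "" },
  "grid_ref_tooltip": { "en": "Grid reference", "gd": "" }
}

The display code would then look each string up by key and current language instead of having the English hard-coded.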

Also this week I spent a bit of time working for the DSL.  They wanted the Twitter logo that is displayed on the site to be replaced with the ‘Artist formerly known as Twitter’ logo.  Unfortunately this was a bigger job than expected as the logos are part of a font package called ‘Font Awesome’ that provides icons across the site (e.g. the magnifying glass for the search button).  The version of the font used on the website was a couple of years old so obviously didn’t include the new ‘X’ logo.  I therefore had to replace the package with a newer version, but the structure of the package and the way in which icons are called and included had changed, which meant I needed to make several updates across the site.  I also replaced the Twitter logo on the entry page in the ‘Share’ section of the right-hand panel.

I spent most of Thursday working on the migration of the Anthology of 16th and Early 17th Century Scots Poetry.  I’ve now migrated more than 25 poems to TEI XML and I’ve completed the interface for displaying these.  I still need to make some tweaks to the XSLT (e.g. to decide how line indents should be handled) but I have now managed to sort other issues (e.g. poems with line numbers that start at a number other than 1).  I should now be able to continue manually migrating the poems to XML when I have a few spare minutes between other commitments and hopefully in a couple of months at most I’ll be able to launch the new site.

On Friday I began looking at developing the Speak For Yersel tool (https://speakforyersel.ac.uk/) for new geographical areas.  We now have data for Wales, the Republic of Ireland and Northern Ireland and I began going through this.  I had hoped I’d just be able to use the spreadsheets I’d been given access to, but after looking at them I realised I would not be able to use them as they currently are.  I’m hoping to reshape the code into a tool that can easily be applied to other areas.  What I want to do is create the tool for one of our three areas and once it’s done it should then be possible to simply ‘plug in’ the data from the other two areas and everything will just work.  Unfortunately the spreadsheets for the three areas are all differently structured, and for this approach to work the data needs to be structured absolutely identically in each region: every column in every tab for every region must be labelled exactly the same and all columns must be in exactly the same location in each spreadsheet.

For example, the ‘Lexis’ tab is vastly different in each of the three areas.  ROI has an ID field, part of speech, a ‘direction of change’ column, 6 ‘variant’ columns, a ‘notes’ column and an ‘asking in NI’ column.  It has a total of 14 columns.  NI doesn’t have ID, PoS or direction of change but has 7 ‘Choice’ columns, a ‘potential photos’ column, no ‘notes’ column and then two columns for cross references to Scotland and ROI.  It has a total of 12 columns.  Wales has ‘Feature type’ and ‘collection method’ columns that aren’t found in the other two areas; it has no ‘Variable’ column, but does have something called ‘Feature’ (although this is much longer than ‘Variable’ in the other two areas).  It has an ‘if oral – who’ column that is entirely empty, six ‘Option’ columns plus a further unlabelled column that only contains the word ‘mush’ in it, and four unlabelled columns at the end before a final ‘Picture’ column.  It has a total of 17 columns.  In addition, the data for the feature ‘butt, boyo, mun, mate, dude, boi, lad, la, bruh, bro, fella’ in ‘Option F’ states ‘Lad ++ la/bruh/bro/fella (these should be separate options)’.  As you can probably appreciate, there’s no way that any of this can be automatically extracted.

I therefore worked on reformatting the spreadsheets to be consistent, starting with the ‘lexis’ spreadsheets for the three areas, which took most of the day to complete.  Hopefully the other tabs won’t take as long.  The reformatted spreadsheet has the following column structure (an example row is given after the list):

  1. ID: this takes the form ‘area-lexis-threeDigitNumber’, e.g. ‘roi-lexis-001’. This allows rows in different areas to be cross referenced and differentiates between survey types in a readable manner
  2. Order: a numeric field specifying the order the question will appear in. For now this is sequential, but if anyone ever wants to change the order (as happened with SFY) they can do so via this field.
  3. Variable: The textual identifier for the question. Note that there are a few questions in the Wales data that don’t have one of these
  4. Question: The question to be asked
  5. Picture filename: the filename of the picture (if applicable)
  6. Picture alt text: Alt text to be used for the picture (if applicable). Note that I haven’t filled in this field
  7. Picture credit: The credit to appear somewhere near the picture (if applicable)
  8. Xrefs: If this question is linked to others (e.g. in different areas) any number of cross references can be added here.  I haven’t completed this field but the format should be the ID of the cross-referenced question.  If there are multiple then separate with a bar character.  E.g. ‘wales-lexis-001|roi-lexis-006’
  9. Notes: Any notes (currently only relevant to ROI)
  10. POS: Part of speech (as above)
  11. Change: (as above)
  12. Options: Any number of columns containing the answer options.  The reason these appear last is that the number of columns is not fixed.
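
To give a concrete example, a single reformatted ‘lexis’ row might end up looking something like this (the values are invented for illustration, and the ‘ni-’ prefix is just an assumed equivalent of the ‘roi-’ and ‘wales-’ ones mentioned above):

ID: roi-lexis-001
Order: 1
Variable: splinter
Question: What do you call a small piece of wood that gets stuck under your skin?
Picture filename: splinter.jpg
Picture alt text:
Picture credit:
Xrefs: wales-lexis-004|ni-lexis-002
Notes:
POS: noun
Change:
Option A: splinter
Option B: skelf
Option C: spile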

Next week I’ll work on the other two survey question types and once the data is in order I’ll be able to start developing the tool itself.  I’ve taken Monday next week off, though, so I won’t be starting on this until Tuesday, all being well.

Week Beginning 29th January 2024

I spent a lot of time this week working on the Books and Borrowing project, making final preparations for the launch of the full website.  By the end of the week the website was mostly publicly available (see https://borrowing.stir.ac.uk) but it wasn’t as smooth a process as I was hoping for and there are still things I need to finish next week.

My first task of the week was to write a script that identified borrowers who have at least one active borrowing record, but where none of these records are on active pages in active registers.  This should bring back a list of borrowers who are only associated with inactive pages or registers.  It took quite some time to get my head around this, as several earlier attempts didn’t do exactly what was requested, but my final script identified 128 borrowers, all from St Andrews, which were then deactivated.
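
The logic of the final script boils down to a query along these lines (a sketch only, with made-up table and column names rather than the project’s actual schema):

-- Borrowers with at least one active borrowing record, none of which sit
-- on an active page within an active register.
SELECT b.borrower_id
FROM borrowers b
WHERE EXISTS (
    SELECT 1 FROM borrowings br
    WHERE br.borrower_id = b.borrower_id AND br.isactive = 1
)
AND NOT EXISTS (
    SELECT 1
    FROM borrowings br
    JOIN pages p ON p.page_id = br.page_id AND p.isactive = 1
    JOIN registers r ON r.register_id = p.register_id AND r.isactive = 1
    WHERE br.borrower_id = b.borrower_id AND br.isactive = 1
);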

I then moved on to an issue that had been noticed with a search for author names.  A search for ‘Keats’ was bringing back matches for ‘Keating’, which was clearly not very helpful.  The cause of this was Solr trying to be too helpful.  The author name fields were stored as ‘text_en’ and this field type has stemming applied to it, whereby stems of words are identified (e.g. ‘Keat’) and a search for a stem plus a known suffix (e.g. ‘s’, ‘ing’) will bring back other forms of the same stem.  For names this is hopeless as ‘Keating’ is in no way connected to ‘Keats’.

It turned out that this issue was affecting many other fields as well, such as book titles.  A search for ‘excellencies’ was finding book titles containing the forms ‘excellency’ and also ‘excellence’, which again was pretty unhelpful.  I did some investigation into stemming and whether a Solr query could be set to ignore it, but this did not seem to be possible.  For a while I thought I’d have to change all of the fields to strings, which would have been awful, as strings in Solr are case sensitive and do not get split into tokens, meaning wildcards would need to be used and the search scripts I’d created would need to be rewritten.

Thankfully I discovered that if I stored the text in the field type ‘text_general’ then stemming would be ignored but the text would still be split into tokens.  I created a test Solr index on my laptop with all of the ‘text_en’ fields set to ‘text_general’, and searching this index for author surname ‘Keats’ only brought back ‘Keats’, while a book title search for ‘excellencies’ only brought back ‘excellencies’.  This is exactly what we wanted, and with the change in place I was able to fully regenerate the cached data and the JSON files for import into Solr (a process that takes a couple of hours to complete) and ask for the online Solr index to be updated.

I also updated the help text on the website, adding in some new tooltips about editors and translators to the advanced search form.  With no other issues reported by the team I then began the process of making the site live, including adding a ‘quick search’ bar to the header of all pages, adding in menu items to access the data and adding the ‘on this day’ feature to the homepage.

However, I unfortunately noticed one fairly sizeable issue: the stats on the ‘facts’ page do not correspond to the results you get when you click through to the search results.  I therefore removed the top-level ‘Facts & figures’ menu item until I can figure out what’s going on here.  I first noticed an issue with the top ten genre lists.  The number of borrowings for the most popular genre (history) was crazy: overall borrowings of ‘history’ were listed as ‘104849’, which is almost as many as the total number of borrowing records we have in the system.  Also, the link to view borrowings in the search was limiting the search to Haddington, which is clearly wrong.  Other numbers in the page just weren’t matching up to the search results either.  For example, for Chambers the most prolific borrower is Mrs Thomas Hutchings with 274 listed borrowings, but following the link to the search results gives 282.

It took several hours of going through the code to ascertain what was going on with the top ten genre issue and it was all down to a missing equals sign in an ‘if’ statement.  Where there should have been two (==) there was only one, and this was stopping the following genre code from processing successfully.

There is unfortunately still the issue of the figures on the facts pages not exactly matching up with the number of results returned by the search.  I had noticed this before but I’d hoped it was caused by the database still being updated by the team and being slightly out of sync with the Solr index.  Alas, it’s looking like this is not the case.  There must instead be discrepancies between the queries used to generate the Solr index and those used to generate the facts data.  I fear that it might be the case that in the chain of related data some checks for ‘isactive’ have been omitted.  Each register, page, borrowing record, borrower, author, and book at each level has its own ‘isactive’ flag.  If (for example) a book item is set to inactive but a query fails to check the book item ‘isactive’ flag then any associated borrowing records will still be returned (unless they have each been set to inactive).  I’m going to have to check every query to ensure the flags are always present.  And it’s even more complicated than that, because queries don’t necessarily always include the same data types.  E.g. a borrower is related to a book via a borrowing record, and if you’re only interested in borrowers and books the query doesn’t need to include the associated page or register.  But of course if the page or register where the borrowing record is located is inactive then this does become important.  I might actually overhaul the ‘facts’ so they are generated directly from the Solr index.  This would mean things should remain consistent, even when updates are made to the data in the CMS (these would not be reflected until the Solr index is regenerated).  Something I’ll need to work on next week.

Also this week I had a Zoom call with Jennifer Smith and Mary Robinson to discuss expanding the Speak For Yersel project to other areas, namely Wales, Ireland and Northern Ireland.  The data for these areas is now ready to use and we agreed that I would start working with it next week.

I also fixed a couple of issues with the DSL website.  The first was an easy one – the team wanted the boxes on the homepage to be updated to include a ‘Donate’ box.  It didn’t take long to get this done.  The second was an issue with Boolean searches on our test site.  Performing a fulltext search for ‘letters and dunb’ on the live site was returning 192 results while the same search on our test site was returning 54,466 results (limited to 500).

It turned out that the search string was being converted to lower case before it was being processed by the API.  I must have added this in to ignore case in the headword search, but an unintended consequence was that the Booleans were also converted to lower case and were therefore not getting picked up as Booleans.  I updated the API so that the search string is not converted to lower case, but where a headword search is to be performed the string is converted to lower case after the Boolean logic is executed.  With the update in place the searches started working properly again.
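
The fix is essentially a question of ordering: the Boolean operators have to be identified while the string still has its original case, and only the individual search terms are lower-cased afterwards.  A minimal sketch of the idea (not the API’s actual code, and assuming upper-case operators purely for the purposes of the example):

BOOLEANS = {'AND', 'OR', 'NOT'}  # assumed operator forms; the real API may differ

def prepare_headword_terms(search_string):
    """Keep Boolean operators intact but lower-case the search terms themselves."""
    prepared = []
    for token in search_string.split():
        prepared.append(token if token in BOOLEANS else token.lower())
    return ' '.join(prepared)

# prepare_headword_terms('Letters AND Dunb') -> 'letters AND dunb'
# Lower-casing the whole string first would turn 'AND' into an ordinary search term.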

I also made a minor tweak to the advanced search on both live and new sites so that query strings with quotations in them no longer lose their quotation marks when returning to the advanced search form.  This was an issue that was identified at our in-person meeting a couple of weeks ago.

I found a bit of time to continue working on the new map for the Place-names of Iona project this week too, during which I completed work on the ‘Table view’ feature.  I updated the table’s popup so that it never becomes taller than the visible map.  If the table is longer than this then the popup now features a scrollbar, which works a lot better than the previous method, whereby pressing on the browser’s scrollbar closed the popup.

I have also now completed the option to press on a place-name in the table to view the place-name on the map.  It took a lot of experimentation to figure out how to get the relevant marker popup to open and also populate it with the correct data.  Things went a little crazy during testing when opening a popup was somehow getting chained to any subsequent popup openings, resulting in the map getting stuck in an endless loop of opening and closing different popups and attempting to autoscroll between them all.  Thankfully I figured out what was causing this and sorted it.  Now when you press on a place-name in the table view any open popups are closed, the map navigates to the appropriate location and the relevant popup opens.  Originally I had the full record popup opening as well as the smaller marker popup, but I began to find this quite annoying, as when I pressed on a place-name in the table what I generally wanted to see was where the place-name was on the map, and the full record obscured this.  Instead I’ve set it to open the marker popup, which provides the user with an option to open the full record without obscuring things.  I’ve also now made it possible to cite / bookmark / share the table in a URL.

I then moved onto developing the ‘Download CSV’ option, which took quite some time.  The link at the bottom of the left-hand menu updates every time the map data changes and pressing on the button now downloads the map data as a CSV.

Also this week I had discussions with the Anglo-Norman Dictionary team about a new proposal they are putting together and I updated the Fife Place-names website to fix a couple of issues that had been introduced when the site was migrated to a new server.  I also continued to migrate the collection of Scots poems discussed in previous weeks from ancient HTML to XML, although I didn’t have much time available to spend on this.