The Royal Botanic Garden Edinburgh's extensive Herbarium collection is estimated to hold three million specimens, representing half to two thirds of the world's flora. The specimens, together with their collection labels, can be used to answer many research questions, including changes in species distribution and flowering times linked to habitat loss and climate change. Specimen data thus help to inform and target conservation and climate change mitigation efforts, protecting plant species for future generations.
Our digitisation programme seeks to make these data as widely accessible as possible. Full label transcription would take around 40 years for our team of three digitisers. With the rate of species loss estimated at 1,000 to 10,000 times the naturally expected rate, we cannot afford to take decades, so we had to look for another way to speed up the process. Our approach combines minimal data entry, which is 14 times faster than full transcription, with citizen science label transcription.
The Royal Botanic Garden Edinburgh will be launching a citizen science project on Zooniverse that aims to transcribe the data on the collection labels of plant specimens collected from across the world over a period of more than 200 years. The specimens in this project all belong to the plant family Gesneriaceae, which will be familiar to many through the widely cultivated houseplants known as African Violets. All cultivated African Violets are derived from a handful of species that are found only in Tanzania and Kenya and are under threat of extinction. As a whole, the family has about 2,500 species, mostly from tropical environments, and a large number of these species are known from only one or two specimens. The project is part of Engaging Crowds: Citizen research and heritage data at scale, which is funded by the Arts & Humanities Research Council (AHRC). Within the AHRC programme, we want to analyse how increasing the agency of the citizen scientist affects data quality.
We will create two projects, a baseline project and an indexed project, splitting our digital specimen images of the plant family Gesneriaceae into two randomised datasets. These will build upon previous analyses of data quality from citizen science projects carried out at RBGE. We will use dropdown lists where possible, as these have been shown to increase transcription accuracy. We will also give volunteers the ability to indicate where data are either missing or illegible on the specimen labels. The baseline project will present specimens to volunteers in a random order. The indexed project will use output from Optical Character Recognition (OCR) software, generated from our specimen images, to pull out batches associated with specific collectors and geographical locations. Used in conjunction with the Zooniverse indexing tool, this will allow volunteers to choose a subset of transcription tasks. It will be interesting to see how this affects both transcriber behaviour and data quality. Experience from previous projects indicates that different transcribers have strengths in transcribing different aspects of specimen data.
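As a rough illustration of the dataset preparation described above, the following Python sketch shows one way the specimen images could be split into two randomised datasets and the indexed half grouped into collector batches from OCR text. It is not the project's actual pipeline. The function names, the specimen identifiers, the collector names and the simple substring matching are all assumptions made for the example, and real OCR output would need far more tolerant matching.

```python
import random
from collections import defaultdict


def split_into_two_projects(specimen_ids, seed=0):
    """Shuffle specimen IDs and split them into two near-equal halves.

    Hypothetical helper: returns (baseline_ids, indexed_ids).
    A fixed seed keeps the split reproducible.
    """
    ids = list(specimen_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    midpoint = len(ids) // 2
    return ids[:midpoint], ids[midpoint:]


def batch_by_collector(ocr_text_by_id, known_collectors):
    """Group specimens by the first known collector name found in their OCR text.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is an assumption; specimens with no match go to "unassigned".
    """
    batches = defaultdict(list)
    for specimen_id, text in ocr_text_by_id.items():
        lowered = text.lower()
        for name in known_collectors:
            if name.lower() in lowered:
                batches[name].append(specimen_id)
                break
        else:
            # No collector name recognised in the OCR text.
            batches["unassigned"].append(specimen_id)
    return dict(batches)


if __name__ == "__main__":
    # Made-up identifiers and label text, for illustration only.
    ocr_text = {
        "S001": "Coll. J. Smith, Tanzania",
        "S002": "leg. Jones 1932",
        "S003": "(illegible)",
        "S004": "Smith s.n., Kenya",
    }
    baseline, indexed = split_into_two_projects(ocr_text, seed=1)
    print("baseline:", baseline)
    indexed_text = {sid: ocr_text[sid] for sid in indexed}
    print("indexed batches:", batch_by_collector(indexed_text, ["Smith", "Jones"]))
```

Batches by geographical location could be built the same way from a list of place names. Specimens that end up in the unassigned group could still be offered to volunteers in random order.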