Scenario

Data Mobilization Project from Literature “Birds fallen at Danish Lighthouses, 1883–1939”

High resolution scanner for book digitization project by Heiko Hornig (licensed under CC BY-SA 2.5)

This narrative was developed as a basis for practical exercises in the biodiversity data mobilization course. The exercise concept and content were developed by Alberto González-Talaván, Andrea Hahn, Laura Russell and Sharon Grant, based on a previous adaptation by Alberto González-Talaván, Danny Vélez, Larissa Smirnova, Laura Russell, Mélianie Raymond and Nicolas Noé.

It is a fictionalized scenario based on a real project and dataset and is meant only for instructional purposes. The original project and the original dataset are attributed to the Danish GBIF Node, DanBIF.

Description

The Natural History Museum of Denmark (NHM-DK) is a research centre associated with the University of Copenhagen. Its library is a member of the national library association, which recently received state funding to make the resources held by its members available online. The NHM-DK would like to begin digitizing the field notebooks, journal publications and books held in its library, some of which have significant historic value.

After a short consultation with its regular partners, NHM-DK received a suggestion from the Head of the management office of the Nordjylland National Park. The park would like the contents of a particular classic literature compilation digitized for a project it is running: ‘Birds at the Danish Lighthouses, 1883–1939’ (in Danish, ‘Fuglene ved de danske Fyr, 1883–1939’). The park wants to use any occurrence data recorded in those books from two lighthouses (Lodbjerg Fyr and Hanstholm Fyr) for an on-site exhibition project.

The NHM-DK has started discussions with its national GBIF node, DanBIF, about the mobilization of the information contained in these volumes, namely to preserve their contents for the future and provide online access for everyone. With the involvement of DanBIF, the intent is to publish the extracted data and register it with GBIF. As GBIF requires that a license be applied to all published data, the museum has decided to publish the data under a Creative Commons license allowing use of the data with attribution (CC BY).

The IT services required are provided by the Technology Unit of the University of Copenhagen, as for all museum digital projects.

The NHM-DK deputy director, who is coordinating this piece of work, has developed a general outline for it:

  1. The museum will carry out the digitization of the literature using two library staff members trained in the use of the library scanner to scan delicate volumes. They will also extract text from the scans through OCR (Optical Character Recognition) software.

  2. Three volunteers from the Copenhagen Ornithological Society (COS), who regularly collaborate with the museum and are familiar with the birds of the region, have been enlisted to transfer the data from the scanned PDFs into spreadsheet format. They will need to go to the museum and use the computers available in the library to access the files stored on the museum intranet (private network).

  3. The Ornithology Curator in the NHM-DK Bird Department will lead the team responsible for taxonomic checking, data curation, cleaning, formatting and transformation, and will oversee the entry of metadata for the published dataset. The team includes a collaborating researcher from Sweden and two postdoctoral students. They have been selected for this task because they are used to working with digital biodiversity data. They will all use their own work computers.

  4. The DanBIF Node Manager will ensure that the institution is adequately registered in GBIF as a data publisher and that the deputy director and the ornithology curator have the proper credentials and access to DanBIF’s IPT instance to upload and publish the data.

Original data collection

In the period 1883–1939, there were 45 lighthouses and lightships functioning in Denmark. Several species of birds used these lighthouses at night during the migration periods of the years 1886 through 1939. The presence and activities of these birds were recorded, especially by the keepers of these lighthouses, who also collected specimens that were sent to the museum in Copenhagen. These birds were carefully preserved and catalogued by collection managers at the museum, and the specimens can still be found there today. The keepers also documented the weather conditions on the nights when they observed birds.

Analogue data description

This is an example of the description of a series of species observations from one of the books (in German, except for the common name of the species, which is given in Danish).

[Image: use case 3, analogue page from the book]

Scanned and translated data description

This is an example of the scanned and translated output from the analogue example above.

[Image: use case 3, scanned and translated text]

Digital data description

After studying the extract from the book, the volunteers from the Copenhagen Ornithological Society suggest extracting the following data from the scanned and translated text:

  • Scientific name as appearing in the book

  • Common name(s) in Danish as appearing in the book

  • Locality

  • Year/month/day

  • Observed number of individuals

  • Sex

  • Life stage

  • Remarks

  • URL of the digitized book page in which the occurrence is provided
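
The field list above could be captured as a simple record structure before it is loaded into a spreadsheet or the IPT. The following Python sketch is only an illustration under stated assumptions: the class name, the helper functions and the mapping to Darwin Core column names (scientificName, vernacularName, locality, eventDate, individualCount, sex, lifeStage, occurrenceRemarks, references) are one plausible choice and are not prescribed by the scenario. The example record is invented and is not taken from the book.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LighthouseRecord:
    """One occurrence as the volunteers would transcribe it (hypothetical structure)."""
    scientific_name: str          # as appearing in the book
    danish_common_name: str       # as appearing in the book
    locality: str
    year: int
    month: Optional[int] = None   # month or day may be missing in the text
    day: Optional[int] = None
    individual_count: Optional[int] = None
    sex: str = ""
    life_stage: str = ""
    remarks: str = ""
    page_url: str = ""            # URL of the digitized book page


# Assumed Darwin Core column names for the output file.
DWC_FIELDS = [
    "scientificName", "vernacularName", "locality", "eventDate",
    "individualCount", "sex", "lifeStage", "occurrenceRemarks", "references",
]


def event_date(rec: LighthouseRecord) -> str:
    """Build an ISO 8601 date, keeping only the parts that are known."""
    parts = [f"{rec.year:04d}"]
    if rec.month is not None:
        parts.append(f"{rec.month:02d}")
        if rec.day is not None:
            parts.append(f"{rec.day:02d}")
    return "-".join(parts)


def to_dwc_row(rec: LighthouseRecord) -> dict:
    """Map one transcribed record to the assumed Darwin Core columns."""
    count = "" if rec.individual_count is None else str(rec.individual_count)
    return {
        "scientificName": rec.scientific_name,
        "vernacularName": rec.danish_common_name,
        "locality": rec.locality,
        "eventDate": event_date(rec),
        "individualCount": count,
        "sex": rec.sex,
        "lifeStage": rec.life_stage,
        "occurrenceRemarks": rec.remarks,
        "references": rec.page_url,
    }


def write_csv(records: List[LighthouseRecord]) -> str:
    """Return the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(to_dwc_row(rec))
    return buf.getvalue()


# Invented example values, for illustration only.
example = LighthouseRecord(
    scientific_name="Alauda arvensis",
    danish_common_name="Sanglærke",
    locality="Hanstholm Fyr",
    year=1887, month=10, day=3,
    individual_count=2,
    page_url="https://example.org/book/page/1",
)
print(write_csv([example]))
```

Keeping year, month and day as separate optional values lets a volunteer record a partially dated observation without guessing the missing parts; the combined date is only built when the row is written out.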