
Colophon

Suggested citation

Ingenloff K (2025) Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt Extension. GBIF Secretariat: Copenhagen. https://doi.org/10.35035/doc-7t3p-ve38

Authors

Licence

The document Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt Extension is licensed under Creative Commons Attribution-ShareAlike 4.0 Unported License.

Acknowledgement

Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt extension was produced under the BioDT project, which received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101057437.

Document control

v2.0, September 2025

Cover image

Illustration by Javier Gamboa, GBIF Secretariat 2025. Licensed under Creative Commons Attribution-ShareAlike 4.0 Unported License.

Summary

The details about a biological survey (how it was carried out, the spatio-temporal scope, the taxonomic groups targeted, who was involved, etc.) are important to properly understand the structure of the survey and how the published data can be reused. The Humboldt Extension for Ecological Inventories (HE), a vocabulary extension to the Darwin Core (DwC) Event Class, provides a means by which to explicitly report the context in which species occurrence data and/or material specimens were collected. The extension includes 55 terms to capture critical facets of survey design including protocol, scope, and sampling effort in a structured manner, thus enhancing overall FAIRness (specifically findability and interoperability) of biological survey data.

This document will guide GBIF data publishers who (a) already have data formatted as a Darwin Core (DwC) Event dataset through the process of updating their dataset with the Humboldt extension or (b) are comfortable with the DwC Event core and wish to map a new dataset to DwC Event class and Humboldt extension terms.

If you are not yet comfortable with Darwin Core Event datasets and are looking for more in-depth guidance on structuring your data following Darwin Core guidelines, refer to the more comprehensive document, Guide for publishing biological survey and monitoring data to GBIF.

1. Getting started

The process of updating your Darwin Core Archive (DwC-A) Event dataset with the Humboldt extension will likely involve moving some information from the existing DwC-A metadata to the event table; referring to existing documentation, publications, or associated weblinks related to the dataset; and, when possible, conferring with the original data collectors or individuals involved in the design and oversight of the project resulting in the dataset. The data republication efforts described here are expected to increase the value and usefulness of existing event datasets in GBIF and to broaden their application, and therefore data citation, across science and policy reuse scenarios.

Before you get started, we recommend that you prepare by taking the following steps:

  • Review the information already published in your DwC-A for the Event dataset, focusing specifically on the Event, metadata, and extended measurement or facts tables, noting where key information about survey design, sampling protocol, scope, and effort are available.

  • Identify additional dataset resources that can be referred to including supplementary documentation, publications, websites, and dataset contacts and people involved in the data collection or oversight of the project or survey.

  • Review the reported data structure to determine if the existing event hierarchy accurately reflects the data and the level of complexity desired. Make necessary changes.

Now, you’re ready to capture survey design data using the Humboldt extension following the recommendations below.

2. Data mapping template

A basic biodiversity survey data template is available to facilitate mapping of survey data and preparation for formatting of a DwC-A. The template can be accessed as a single .xlsx file or as three separate .csv files.

Table Description

event

Terms in the event table are used to capture survey Event information (i.e. information that applies to the observations of all taxa) including survey design, protocol(s), scopes, and effort. Terms from the Darwin Core Event class and the Humboldt extension are included.

Column heads are populated with the DwC Event core and Humboldt extension terms referenced in this guide. The rows beneath each term include term definitions, comments, recommended usage for publication in GBIF, and additional comments or usage guidance.

occurrence

Terms in the occurrence table should be used to capture information about the occurrence of a single taxon with terms from the Darwin Core Occurrence extension.

Column heads are populated with the DwC Occurrence extension terms referenced in this guide. The rows beneath each term include term definitions, comments, and recommended usage for publication in GBIF. Additional Occurrence extension terms should be added to your own data occurrence table as appropriate for your dataset.

README

The README table provides additional information about the structure and information included in each data table.

2.1. Example datasets

  • Faveyts W and Cooleman S (2025). Bird census counts at the Zwin Nature Park. Version 1.5. Belgian Biodiversity Platform. Sampling event dataset https://doi.org/10.15468/saesvn.

  • Palpurina S (2025). Vegetation plots collected in dry grasslands throughout Bulgaria and Romanian Dobrudzha. Version 1.12. Masaryk University, Department of Botany and Zoology. Sampling event dataset https://doi.org/10.15468/pkx4tg.

  • Piesschaert F, Vermeersch G, Brosens D, Westra T, Desmet P, Feys S, Van de Poel S, Pollet M, and Cooleman S (2025). ABV - Common breeding birds in Flanders, Belgium (post 2016). Version 1.14. Research Institute for Nature and Forest (INBO). Sampling event dataset https://doi.org/10.15468/pj2v6h.

  • van Klink R and Gerrits G (2025). Biological Station Wijster standard trapping program: Sampling event data for ground beetles (Coleoptera: Carabidae). Version 1.3. WBBS foundation. Sampling event dataset https://doi.org/10.15468/3mcqja.

3. Updating your DwC Event dataset with the Humboldt extension

The contextual information about survey Events should be saved to the DwC-A event table.

This section will guide you through the process of mapping Event-level data specifically related to survey structure, location, protocols, scopes, and effort.

About DwC terms in this document

Each term in this document is linked with its respective term internationalized resource identifier (IRI) alias (e.g., eco:protocolNames). Always use these links to refer to the definition, comments, and examples provided when populating a term.

The terms to be used to describe Event-level information are a combination of Darwin Core Event class and Humboldt Extension terms:

Data mapping tips

Survey Event data should be saved to the DwC-A event table. This includes DwC Event terms (any term preceded by dwc, e.g., dwc:eventID) and Humboldt extension terms (any term preceded by eco, e.g., eco:protocolNames).

Populate all terms for which information is available.

Paired terms must be populated together. These terms are designed to offer data publishers some level of flexibility in reporting data. Paired terms are most common in terms available for reporting a variable value and associated unit of measure (for example, dwc:sampleSizeValue and dwc:sampleSizeUnit).
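
The paired-term rule lends itself to an automated check before publication. The following is a minimal sketch, not part of the Humboldt extension or any GBIF tool: the function name, the unprefixed column headers, and the two example rows are assumptions made for illustration.

```python
# Value/unit pairs mentioned in this guide, written without the dwc:/eco: prefixes
# (an assumption about how the column headers are spelled in your event table).
PAIRED_TERMS = [
    ("sampleSizeValue", "sampleSizeUnit"),
    ("geospatialScopeAreaValue", "geospatialScopeAreaUnit"),
    ("totalAreaSampledValue", "totalAreaSampledUnit"),
]


def unpaired_terms(row):
    """Return every pair in which exactly one of the two terms is populated."""
    problems = []
    for value_term, unit_term in PAIRED_TERMS:
        has_value = row.get(value_term, "") != ""
        has_unit = row.get(unit_term, "") != ""
        if has_value != has_unit:
            problems.append((value_term, unit_term))
    return problems


# Hypothetical rows, as they might be read from an event table with csv.DictReader.
complete_row = {"eventID": "survey2022_a-2", "sampleSizeValue": "200", "sampleSizeUnit": "metre"}
incomplete_row = {"eventID": "survey2022_a-3", "sampleSizeValue": "200", "sampleSizeUnit": ""}

print(unpaired_terms(complete_row))
print(unpaired_terms(incomplete_row))
```

Because the check compares against the empty string only, a value of 0 still counts as populated.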

No data, missing data, and data values of 0

  • Cells with a value of 0 (zero) should be explicitly populated as 0.

  • Cells with missing data or NULL values should be left empty.

  • Terms for which there is no data to share at any hierarchical level can be excluded from the data table.
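
The difference between a true zero and missing data is easy to lose when exporting a table. As a hedged illustration, the sketch below writes a small event table with Python's standard csv module, keeping 0 as 0 and writing missing values as empty cells. The column exampleMeasurement and the row contents are placeholders, not Darwin Core or Humboldt terms.

```python
import csv
import io

# Hypothetical event rows: None marks missing data, 0 is a real recorded value.
rows = [
    {"eventID": "visit-1", "exampleMeasurement": 0},
    {"eventID": "visit-2", "exampleMeasurement": None},
]


def to_cell(value):
    """Write missing data as an empty cell; keep everything else, including 0."""
    return "" if value is None else str(value)


def write_event_csv(rows, fieldnames):
    """Serialize rows to CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({name: to_cell(row.get(name)) for name in fieldnames})
    return buffer.getvalue()


event_csv = write_event_csv(rows, ["eventID", "exampleMeasurement"])
print(event_csv)
```

Testing for None explicitly, rather than for truthiness, is what prevents a 0 from being silently written as an empty cell.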

Populating terms across Event levels (e.g., from parent Event to child Event)

  • Each Event can have its own set of attributes and measurements which can be captured using the Humboldt and/or other extension(s) and be unambiguously linked to the corresponding Event through the appropriate dwc:eventIDs.

  • Terms should contain data clearly (explicitly) reported at every Event level in the hierarchy to which they directly apply. This means that when publishing a data export,

Refer to Properties of hierarchical events in the Humboldt Extension for Ecological Inventories for more guidance in populating Humboldt extension terms across Event levels.

3.1. Survey sampling design and event hierarchy

The first step in the process of updating your Darwin Core dataset is to check that the existing dataset structure reflects the survey design implemented to capture the data reported in your dataset. Biological survey design, the sampling structure of a biological survey, varies widely. Identifying how to best translate survey design to the DwC Event core is the most difficult part of mapping a survey dataset. DwC defines an Event as ‘an action that occurs at some location during some time’, such as a specimen collection expedition, a camera trap image capture, or a marine trawl. This broad definition of Event means biological surveys can be framed as a single Event or as a series of Events nested within Events using a parent-child relationship as necessary. The sampling Event hierarchy is the translation of survey design into an Event-based perspective using Darwin Core.

Sharing biodiversity data in a way that clearly and accurately reflects survey design helps ensure accurate understanding and interpretation of the information contained in a dataset, enabling potential data users to more readily assess the appropriateness of the data for inclusion in their own analyses.

3.1.1. Non-nested datasets

The simplest Event data structure is a non-nested dataset. Non-nested datasets reflect a simple or flat survey design structure (Figure 1). These are typically simple datasets consisting of:

  • a single sampling Event occurring at a particular place and time and conducted using a single standardized sampling protocol that is not repeated and is not necessarily part of a larger sampling schema (Figure 1a), or

  • a series of single sampling Events that are not joined by a larger parent Event (Figure 1b). A compilation (e.g., a combination of unrelated surveys, compiled data sources and/or literature searches, see the Biological survey data section) could be a special case of non-nested dataset where there is a unique Event level that describes the compilation itself (e.g., the broad area where multiple surveys are aggregated), which results in one or more Occurrences.

Figure 1. A simple schematic of a non-nested Event dataset (a) consisting of a single Event (purple box) with associated Occurrences related to the Event via the Occurrence extension (blue box) and (b) a series of individual Events (purple boxes) with associated Occurrences related to the appropriate Event via the Occurrence extension (blue boxes).

3.1.2. Nested datasets

More complex survey designs will require implementation of a nested dataset structure. Nested datasets use parent-child relationships to capture information collected through more complex survey design, such as datasets resulting from repeated sampling events and/or multiple sampling protocols. Creating nested Event levels may be important to relating the full story a dataset has to tell and to facilitating downstream analysis of the data.

There is no single correct dataset structure. Identifying the data structure most appropriate for a dataset may not always be a straightforward process; however, structure is most commonly defined as a function of sampling location, protocol, and date.

In a nested dataset:

  • The top-most Event level does not have a parent Event but is parent to all Events beneath it.

  • An Event may be parent to multiple child Events.

  • All Events except those at the lowest Event level are considered the parent Event to any Event(s) beneath it.

    • A parent Event must fully encompass its child Events spatially and temporally. Specifically, the spatial extent and temporal interval of a parent Event must contain the spatial extents and temporal intervals of all of its children (see Section 3.2.1 Principle of spatiotemporal coverage in Properties of hierarchical events in the Humboldt Extension for Ecological Inventories).

    • A child Event (an Event that is contained entirely within a single parent Event) may represent one of several sampling sites, one of several protocols, or a repeated sampling at the same locality using the same protocol.

  • Events at the lowest hierarchical level are never a parent Event.
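
The temporal half of the containment rule can be tested mechanically. The sketch below assumes that Event dates are written as full ISO 8601 dates or as start/end intervals separated by a slash; it does not handle partial dates or times, and it does not cover the spatial half of the rule. The function names and example dates are illustrative only.

```python
from datetime import date


def parse_interval(event_date):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/YYYY-MM-DD' into (start, end) dates."""
    start_text, _, end_text = event_date.partition("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text) if end_text else start
    return start, end


def parent_contains_child(parent_event_date, child_event_date):
    """True if the child's interval lies entirely within the parent's interval."""
    parent_start, parent_end = parse_interval(parent_event_date)
    child_start, child_end = parse_interval(child_event_date)
    return parent_start <= child_start and child_end <= parent_end


# Hypothetical dates for a parent survey and two candidate child Events.
print(parent_contains_child("2022-05-01/2022-09-30", "2022-06-14"))
print(parent_contains_child("2022-05-01/2022-09-30", "2022-09-28/2022-10-02"))
```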

Each Event level should reflect a meaningful ecological or operational unit (e.g., spatial, temporal, or ecological) in the survey design. An Event level should only be added if the addition of that Event level is necessary to facilitate data interpretation, downstream analysis, and/or linkage of information across data sources. Do not create Event levels that are not necessary.

Refer to Properties of hierarchical events in the Humboldt Extension for Ecological Inventories (TDWG Humboldt Extension Task Group, 2024) for more information about creating nested data structures for Darwin Core datasets.

The goal in establishing a dataset structure is to keep it as simple as possible while still accurately representing the survey design.

Consider a hypothetical survey where two sampling protocols (Protocol a and Protocol b) are implemented at two different sites (Site 1 and Site 2). Both sites are sampled (site visits) twice (t1 and t2) using each of the protocols.

This survey dataset could be structured with two Event levels as shown in Figure 2. Here, the highest Event level would consist of four Events representing each unique site-protocol combination: Site 1–Protocol a, Site 1–Protocol b, Site 2–Protocol a, Site 2–Protocol b. Events at the lowest Event level will represent site visits that occur on a particular date for each site-protocol combination. Organismal Occurrence information collected during each site visit is linked to the relevant site visit Event. This two Event level structure represents the simplest possible nested dataset structure with only a single level of nesting.

It is ideal to structure a dataset such that each implemented protocol and unique site location is represented as a specific Event so that information from the same pool of species (i.e. location) and likelihood of detecting these species (i.e. protocol) is joined together by being part of the same Event. However, it is not always possible to disentangle information collected using multiple protocols.

Figure 2. Simplified example schematic of a nested Event dataset consisting of a series of surveys conducted at two sites (Site 1 and Site 2) with two distinct sampling protocols (Protocol a, Protocol b) represented by the pink boxes. Surveys implementing each protocol are conducted at Sites 1 and 2 on two different dates (Site visit t1, Site visit t2; orange boxes). Associated Occurrences are related to the appropriate Event via the Occurrence extension (blue boxes).
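
To make the two-level structure concrete, the following sketch generates the Event identifiers for the hypothetical survey: four site-protocol parent Events and eight site visit child Events. The identifier pattern is invented for readability; the guide recommends persistent, globally unique identifiers for real datasets.

```python
from itertools import product

# Labels for the hypothetical survey described in the text.
sites = ["site1", "site2"]
protocols = ["protocolA", "protocolB"]
visits = ["t1", "t2"]

events = []
for site, protocol in product(sites, protocols):
    # Highest Event level: one Event per unique site-protocol combination.
    parent_id = f"{site}_{protocol}"
    events.append({"eventID": parent_id, "parentEventID": ""})
    for visit in visits:
        # Lowest Event level: one Event per site visit, pointing at its parent.
        events.append({"eventID": f"{parent_id}_{visit}", "parentEventID": parent_id})

for event in events:
    print(event["parentEventID"] or "(no parent)", "->", event["eventID"])
```

Occurrence records would then carry the dwc:eventID of the site visit Event during which they were recorded.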

3.1.3. Project information

Surveys conducted as part of a larger or established network or project should report as much contextual information as possible to capture information about the project or network. Project-level information will always be shared at the highest Event level. This can be achieved in one of two ways:

  • By embedding project-level information within the highest existing survey Event level. With the dataset presented in Figure 2, project-level information would be included with each of the four Site–Protocol Events.

  • By introducing a new parent Event level above all existing Events dedicated to capturing project-level information. In the context of the example dataset presented in Figure 2, this would mean adding a third Event level to the dataset structure that is parent to all four Site–Protocol Events (see Figure 3). Creating a single parent Event is a particularly useful option when a project will result in multiple, independent datasets. In this case, the Event identifier used for the project Event level can be used in all relevant datasets, providing a means of identifying related datasets.

Figure 3. Simple nested hierarchy as presented in Figure 2 with the addition of a single Event that is parent to all other Events to consolidate all survey Events under the context of the broader project (purple box).

3.1.4. Complex survey design

  • multiple protocols are implemented within the same survey design,

  • survey outputs include a mix of data types (e.g., specimen collections, field observations, observed co-occurrences),

  • collected material contributes to downstream products (e.g., trait data, lab measurements, voucher specimens, media representations), or

  • relationships among datasets need to be preserved or exposed (e.g., datasets resulting from different types of surveys within the same Project and/or at the same established survey sites).

For example, consider the dataset Krill along the 110°E meridian: Oceanographic influences on assemblages in the eastern Indian Ocean, RV Investigator voyage IN2019_V03 (2019), published by Ocean Biodiversity Information System (OBIS)-Australia. The dataset contains information about a zooplankton survey conducted by the CSIRO Marine National Facility in the eastern Indian Ocean in 2019. The survey consisted of daytime and nighttime sampling at 20 locations (stations) along an established transect. As illustrated in Figure 4, this dataset could be structured as a non-nested dataset (Figure 4a) or as nested dataset (Figures 4b-d); and, as a nested dataset, the structure could be simple (Figures 4b and c) or more deeply nested with more than two Event levels (Figure 4d).

  • Non-nested dataset structure (Figure 4a): As a non-nested dataset, each sampling at a given station at a particular date and time would be a unique Event with no obvious link to other Events in the dataset beyond being part of the same dataset. Implementing this structure is the simplest approach to sharing data from the survey; however, without any nesting of Events, it may be difficult for data users to understand the relationships between survey Events. Associated Occurrences are related to the appropriate Event via the Occurrence extension.

  • Simple nested dataset structure (Figure 4b): Alternatively, a simple nested dataset structure could consist of two Event levels. The highest Event level would capture information about the survey stations, where each of the 20 survey stations would be a unique, unrelated parent Event to the relevant daytime and nighttime sampling Events. Associated Occurrences would be related to the appropriate Event via the Occurrence extension.

  • Simple nested dataset structure (Figure 4c): As a simple nested dataset, the data structure would consist of two Event levels, with the highest Event level capturing information about the overall cruise or campaign and the second Event level representing the daytime and nighttime sampling Events at each station as a series of unique Events. Associated Occurrences are related to the appropriate Event via the Occurrence extension.

  • Deeply nested dataset structure (Figure 4d): As a more deeply nested dataset, the structure would consist of three Event levels: the highest Event level represents the Survey (that is, the overall cruise or campaign); the middle Event level represents each of the 20 survey stations; and, the lowest Event level represents the daytime and nighttime sampling Events at each station. Note that the child Events of each parent Event are used to report independent replicates of the same type within the same parent Event and/or to preserve individual sampling units. Associated Occurrences are related to the appropriate Event via the Occurrence extension.

If the survey itself was a unique Event, the simpler two Event level structure (e.g., Figures 4b and 4c) would likely suffice. However, the stations sampled during the survey are standard sampling locations used in other survey efforts not covered by this dataset. To make it easier to link information from this dataset to data from other surveys conducted at the same localities, a more complex nested structure was chosen by the data publisher.

Figure 4. Four potential dataset structures for a zooplankton survey conducted by CSIRO at 20 stations, each sampled once during the day and once at night: non-nested structure (a), simple nested structure (b and c), and complex or deeply nested structure (d).

3.1.5. Sampling Event hierarchy terms

Historically, only 2 terms were available to structure and relate different levels of survey design in a dataset: dwc:eventID and dwc:parentEventID. One additional Darwin Core Event term, dwc:fieldNumber, provided a means by which to relate a sampling Event with a dataset- or project-specific field number. The Humboldt extension provides an additional 2 terms (eco:siteCount and eco:siteNestingDescription) to better support complex or nested survey designs.

Event data in GBIF
  • Any dataset to be published using the DwC Event core must have at least one Event record.

  • Each dwc:eventID in a dataset must be unique within the dataset. Use of a persistent globally unique identifier (GUID) is recommended to ensure that the GUID is unique across all datasets. A unique dwc:eventID should be reused between datasets where appropriate (for example, where data collected during the same sampling event are published as multiple datasets). See A Beginner’s Guide to Persistent Identifiers for guidance in creating persistent identifiers. Note that your field numbers should be reported using dwc:fieldNumber.

  • An Event is not required to have associated organism Occurrence data. If organism Occurrence or non-detection data are available, they will be linked via the dwc:eventID in the occurrence table using the occurrence extension.

  • Other DwC Event extensions, including occurrence, extended measurement or fact, and relevé extensions, can be linked to any appropriate Event via the dwc:eventID.

Non-nested datasets

Nested datasets

Nested hierarchies are established by relating a child Event to a parent Event through the child Event’s dwc:parentEventID. As such, these more complex datasets require use of both dwc:eventID and dwc:parentEventID.

In practice, this means that the parent and the child will each have a unique dwc:eventID. To create the parent-child relationship, the parent Event’s dwc:eventID will also be reported as the child Event’s dwc:parentEventID.

Simple example illustrating how a parent-child relationship between two Events would look using Event identifiers.

dwc:parentEventID    dwc:eventID
(empty)              survey2022
survey2022           survey2022_a-2
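
The identifier rules described in this section (each dwc:eventID unique within the dataset, each dwc:parentEventID matching an existing dwc:eventID) can be verified with a short script. This is a sketch under two assumptions: all parent Events are published in the same event table, and column headers are written without the dwc: prefix. The function name is hypothetical.

```python
from collections import Counter


def check_event_ids(events):
    """Return (duplicate eventIDs, eventIDs whose parentEventID is unknown)."""
    event_ids = [event["eventID"] for event in events]
    duplicates = sorted(i for i, count in Counter(event_ids).items() if count > 1)
    known_ids = set(event_ids)
    orphans = sorted(
        event["eventID"]
        for event in events
        if event.get("parentEventID") and event["parentEventID"] not in known_ids
    )
    return duplicates, orphans


# The parent-child example from this section.
events = [
    {"eventID": "survey2022", "parentEventID": ""},
    {"eventID": "survey2022_a-2", "parentEventID": "survey2022"},
]

print(check_event_ids(events))
```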

In addition to Event and parent Event identifiers:

  • Site count and site nesting description: Nested datasets should include the total number of sites sampled in eco:siteCount and provide a textual description of the survey design or site sampling structure using eco:siteNestingDescription for each parent Event for which the information is available.

  • Field number: If the survey data include a field number for a specific Event, this should be shared using dwc:fieldNumber.

Event hierarchy terms, their recommended usage (status), and example data entries.

Status                        Term                        Example entry
Required                      dwc:eventID                 survey2022_a-2
Required for nested datasets  dwc:parentEventID           survey2022
Recommended                   eco:siteCount               75
Recommended                   eco:siteNestingDescription  25 survey sites each with 3 1m2 quadrats
Share if available            dwc:fieldNumber             RV Sol 87-03-08

Review your DwC Event dataset to ensure that the survey design is accurately reflected in the use of the five (5) available sampling event hierarchy terms. Where additional events or event levels must be created, be sure to reference A Beginner’s Guide to Persistent Identifiers for guidance in creating new persistent identifiers.

Do NOT change existing identifiers if it can be avoided!

3.2. Project information

If the survey(s) being reported were part of a larger Project, four terms are available to capture the project name(s) and funding institution(s).

  • Project title: The official name(s) of the project(s) that contributed to the creation of the dataset should be shared as a concatenated list with values separated using a pipe separator | in dwc:projectTitle.

  • Project ID: A list, concatenated and separated using a pipe separator |, of the globally unique identifiers for the project(s) that contributed to the creation of the dataset should be reported in dwc:projectID.

  • Funding attribution: The official name(s) of the funding body or bodies that provided funding for the survey(s) resulting in the creation of the dataset should be shared as a concatenated list with values separated using a pipe separator | in dwc:fundingAttribution.

  • Funding attribution ID: A list, concatenated and separated using a pipe separator |, of the globally unique identifiers for the funding organizations or agencies that supported the project can be provided in dwc:fundingAttributionID.
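
Building and reading pipe-separated lists is a small but error-prone step. The helper functions below are an illustrative sketch (their names are not part of any Darwin Core tooling). They join values with a spaced pipe, as in the example entries of this guide, and split on the bare pipe so that both spaced and unspaced lists are read back correctly.

```python
def join_values(values, separator=" | "):
    """Concatenate non-empty values into a single pipe-separated cell."""
    return separator.join(value.strip() for value in values if value.strip())


def split_values(cell, separator="|"):
    """Split a pipe-separated cell back into a list of trimmed values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


funding_ids = join_values(["https://ror.org/00epmv149", "https://ror.org/04jnzhb65"])
print(funding_ids)
print(split_values("RCN276730 | Artsproject_7-24"))
```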

Project terms, their recommended usage (status), and example data entries

Status              Term                      Example entry
Share if available  dwc:projectTitle          Scalidophora i Noreg, Biowide
Share if available  dwc:projectID             RCN276730 | Artsproject_7-24, https://arvenetternansen.com/
Share if available  dwc:fundingAttribution    Norges forskningsråd
Share if available  dwc:fundingAttributionID  https://ror.org/00epmv149 | https://ror.org/04jnzhb65

3.3. Survey event site

An Event site is the location at which observations are made or samples and/or measurements are taken. Sharing thorough information about a sampling Event site, including description, locality, and vegetative cover, provides critical context to potential data users about the conditions in which a survey was conducted. Information about the location of each survey site, such as best-practice georeferences, site description (locality name, habitat type, microhabitat), and environmental data (e.g., physical parameters, vegetation, water quality), should be populated for each Event for which the information is available.

The Darwin Core site terms listed in this section are not comprehensive. Explore all Darwin Core Location class terms and the Humboldt Extension site terms.

3.3.1. Site description

Additional context about a survey site can be reported through myriad terms for every Event for which the information is available, including:

  • Site names: survey site names can be reported using eco:verbatimSiteNames. A concatenated list of site names can be provided at higher Event levels with values separated using a pipe separator, |.

  • Habitat: reported habitat at a survey site should be recorded in dwc:habitat. A concatenated list of habitats can be provided at higher Event levels with values separated using a pipe separator, |. Use of a controlled vocabulary is recommended.

  • Weather: reported weather during a survey Event should be reported using eco:reportedWeather. If you have detailed weather data (e.g., weather station or data logger produced data) archived elsewhere, you may provide a link here.

  • Extreme conditions: reported extreme conditions at a site at the time of the survey should be recorded in eco:reportedExtremeConditions.

  • Verbatim site description: verbatim comments (e.g., the original textual description) about a site or sites should be recorded in eco:verbatimSiteDescriptions.

These terms should be populated for each individual Event for which the information is accurate.
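
The example entry for eco:reportedWeather in the table below is written as a set of key-value pairs in JSON. The text of this guide does not prescribe that format, so the following is only a sketch of one way to produce such an entry with Python's standard json module. The keys are copied from the example entry.

```python
import json

# Key names taken from the example entry for eco:reportedWeather in this guide.
weather = {
    "minimumTemperatureInDegreesFahrenheit": 18,
    "maximumTemperatureInDegreesFahrenheit": 32,
}

# Serialize to a single-line string suitable for one table cell.
reported_weather = json.dumps(weather)
print(reported_weather)

# Data users can parse the cell back into key-value pairs.
parsed = json.loads(reported_weather)
print(parsed["maximumTemperatureInDegreesFahrenheit"])
```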

General event site terms, their recommended usage (status), and example data entries
Status Term Example entry

Share if available

eco:verbatimSiteNames

Trap_18|Trap_27|Trap_54|Trap_96, Annala | Kumpula

dwc:habitat

Ephemeral wetland

eco:reportedWeather

{"minimumTemperatureInDegreesFahrenheit": 18, "maximumTemperatureInDegreesFahrenheit": 32}

eco:reportedExtremeConditions

Site flooded

eco:verbatimSiteDescriptions

Coastal sand dunes at dry oak forest edge. Vegetation: Ammophila arenaria, Betula pendula, Leymus arenarius, Pinus sylvestris

3.3.2. Site locality

The geographic location and extent of each survey site should be reported. Five terms are currently recommended for Event datasets:

  • Location ID: a unique identifier for each survey site should be shared in dwc:locationID. If a site is visited repeatedly (as in long-term monitoring and other repeated survey efforts), dwc:locationID should be consistent across Events within a dataset and across datasets in situations where the same survey sites are visited in other datasets.

  • Country code: the two-letter ISO 3166-1 alpha-2 code for the country, region, or economy in which a survey takes place should be provided in dwc:countryCode.

  • Latitude-longitude: The decimal latitude and longitude and geodetic datum location of each survey site should be reported in dwc:decimalLatitude, dwc:decimalLongitude, and dwc:geodeticDatum. All three terms should be populated together.

    • If the geographic coordinates of your dataset are not in decimal latitude and decimal longitude format, use the terms dwc:verbatimLatitude, dwc:verbatimLongitude, and dwc:verbatimCoordinateSystem to report geographic location instead.

    • Note that this is a minimum recommendation and does not make data fit for the maximum number of purposes. It is highly recommended to provide georeference information that follows best practices.
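
A simple pre-publication check can confirm that the three coordinate terms are populated together and that the coordinates fall within valid ranges. The sketch below is an assumption-laden illustration: the function name and message wording are invented, column headers are assumed to omit the dwc: prefix, and no attempt is made to validate the datum string itself.

```python
REQUIRED_TOGETHER = ("decimalLatitude", "decimalLongitude", "geodeticDatum")


def check_coordinates(row):
    """Return a list of problems found in the coordinate terms of one Event row."""
    problems = []
    populated = [term for term in REQUIRED_TOGETHER if row.get(term, "") != ""]
    if populated and len(populated) < len(REQUIRED_TOGETHER):
        missing = [term for term in REQUIRED_TOGETHER if term not in populated]
        problems.append("missing " + ", ".join(missing))
    latitude = row.get("decimalLatitude", "")
    if latitude != "" and not -90 <= float(latitude) <= 90:
        problems.append("decimalLatitude outside -90..90")
    longitude = row.get("decimalLongitude", "")
    if longitude != "" and not -180 <= float(longitude) <= 180:
        problems.append("decimalLongitude outside -180..180")
    return problems


# Values from the example entries in this guide, plus a row lacking the datum.
site = {"decimalLatitude": "59.3168", "decimalLongitude": "18.0627", "geodeticDatum": "epsg:4326"}
no_datum = {"decimalLatitude": "59.3168", "decimalLongitude": "18.0627", "geodeticDatum": ""}

print(check_coordinates(site))
print(check_coordinates(no_datum))
```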

Survey site area

Reporting additional information about the areas targeted for sampling and the area(s) actually sampled during a survey is recommended to provide greater context about the geospatial scope of a survey. The Humboldt extension includes two sets of paired terms to report the survey area of an Event: geospatial scope terms and total area sampled terms.

  • Geospatial scope terms (eco:geospatialScopeAreaValue and eco:geospatialScopeAreaUnit) define the geospatial scope or extent of a survey or sampling Event. Geospatial scope terms can be applied at any Event level and should report the entire area considered for the survey.

  • Total area sampled terms (eco:totalAreaSampledValue and eco:totalAreaSampledUnit) report the area actually sampled during an Event. Total area sampled terms can be populated at any Event level but are most commonly applied at lower Event levels to, for example, capture the survey extent of a single plot or (at higher Event levels) the cumulative area surveyed in a series of plots within a site.

In non-nested event datasets, geospatial scope terms and total area sampled terms may contain the same values.

In nested datasets, geospatial scope terms will be equal to or greater than the area values shared in total area sampled terms. See Box 1 for an example.

If the surveyed unit is not an area (e.g., km² or m²), dwc:sampleSizeValue and dwc:sampleSizeUnit should be used instead. Examples include:

  • point locations (such as a sensor or trap),

  • distances (such as transect lengths), and

  • volumetric measures (such as a filtered volume of water in a zooplankton haul).

Box 1. Biowide project example

Consider the Biowide project, which surveyed 130 plots of 40×40 m across Denmark.

Here, the project-level parent Event would report the full geographic extent of Denmark (eco:geospatialScopeAreaValue = 42934 and eco:geospatialScopeAreaUnit = km²) and the sum of the sampled areas, 130 plots × 1600 m² (eco:totalAreaSampledValue = 208000 and eco:totalAreaSampledUnit = m²).

For each of the 130 associated child Events representing the individual plots, eco:geospatialScopeAreaValue and eco:geospatialScopeAreaUnit would be left empty because geospatial scope was a characteristic of the higher-level survey design, not of the individual survey site visits. For the plots, the area of the site would be eco:totalAreaSampledValue = 1600 and eco:totalAreaSampledUnit = m².
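
As a rough illustration of the Box 1 arithmetic, the sketch below builds the parent and child Event rows as plain dictionaries. The eventID values, the function name, and the dictionary layout are hypothetical; the unit m² follows from the 40×40 m plot dimensions.

```python
PLOT_SIDE_M = 40   # plot side length in metres
N_PLOTS = 130      # number of surveyed plots


def biowide_area_rows():
    """Return (parent, children) Event rows with the survey area terms filled in."""
    plot_area = PLOT_SIDE_M * PLOT_SIDE_M  # area of one plot in m²
    parent = {
        "eventID": "biowide",  # hypothetical identifier
        "parentEventID": "",
        "eco:geospatialScopeAreaValue": 42934,
        "eco:geospatialScopeAreaUnit": "km²",
        "eco:totalAreaSampledValue": plot_area * N_PLOTS,  # sum over all plots
        "eco:totalAreaSampledUnit": "m²",
    }
    children = [
        {
            "eventID": f"biowide:plot{i:03d}",  # hypothetical identifier
            "parentEventID": parent["eventID"],
            # Scope terms stay empty: scope belongs to the survey design, not the plot.
            "eco:geospatialScopeAreaValue": "",
            "eco:geospatialScopeAreaUnit": "",
            "eco:totalAreaSampledValue": plot_area,
            "eco:totalAreaSampledUnit": "m²",
        }
        for i in range(1, N_PLOTS + 1)
    ]
    return parent, children


parent, children = biowide_area_rows()
print(parent["eco:totalAreaSampledValue"], len(children))  # 208000 130
```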

Additional survey site information
  • Survey site geometry: If available, the geometry of a survey site area should be shared using dwc:footprintWKT and dwc:footprintSRS.

  • Verbatim site location information: A more general text description of the site location, if available, can be shared using dwc:locality.

Event site geographic locality and scope terms and their recommended usage (status), namespace abbreviation, and example data entries.
Status              Term                          Example entry
Recommended         dwc:locationID                Trap_138
Recommended         dwc:countryCode               SE
Recommended         dwc:decimalLatitude           59.3168
Recommended         dwc:decimalLongitude          18.0627
Recommended         dwc:geodeticDatum             epsg:4326
Recommended         eco:geospatialScopeAreaValue  580000
Recommended         eco:geospatialScopeAreaUnit   km²
Recommended         eco:totalAreaSampledValue     1600
Recommended         eco:totalAreaSampledUnit      m²
Recommended         dwc:sampleSizeValue           200
Recommended         dwc:sampleSizeUnit
Share if available  dwc:footprintWKT              POLYGON ((10 20, 11 20, 11 21, 10 21, 10 20))
Share if available  dwc:footprintSRS              epsg:4326
Share if available  dwc:locality                  Agriculture site, Kongskilde Friluftsgård, Zealand

3.3.3. Vegetation cover

If vegetation cover data are available for a site (for example, if a relevé was conducted or if a textual site description was provided), it can be reported in three ways:

There is no single best method of reporting vegetation cover information for a site, although it is recommended to choose the most explicit method possible based on the type of information available.

If vegetation cover is reported using one of the three methods described above, then eco:isVegetationCoverReported = true; otherwise, eco:isVegetationCoverReported = false.

3.4. Survey date and time

Complete and accurate reporting of the temporal scope of a survey is crucial to asserting Event structure and providing key contextual information about sampling conditions.

Each Event should include a date or date range in dwc:eventDate. Nested datasets should, at the parent Event level, report a date range encompassing the dates of all relevant child Events.
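
The parent-level date range can be derived from the child Events rather than typed by hand. The sketch below is one possible approach under stated assumptions: every child dwc:eventDate is either a full ISO 8601 date (YYYY-MM-DD) or a start/end interval of such dates, and the function name is hypothetical.

```python
from datetime import date


def parent_event_date(child_dates):
    """Return a dwc:eventDate value spanning all child eventDate values."""
    starts, ends = [], []
    for value in child_dates:
        # "2018-09-02/2018-09-04" is an interval; a bare date is its own start and end.
        start, _, end = value.partition("/")
        starts.append(date.fromisoformat(start))
        ends.append(date.fromisoformat(end or start))
    first, last = min(starts), max(ends)
    if first == last:
        return first.isoformat()
    return f"{first.isoformat()}/{last.isoformat()}"


print(parent_event_date(["2018-08-29", "2018-09-02/2018-09-04", "2018-08-30"]))
# 2018-08-29/2018-09-04
```

Dates with reduced precision (for example, a year alone) would need additional handling before this function could be applied.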

The time and duration of each Event should be reported using dwc:eventTime and the paired terms eco:eventDurationValue and eco:eventDurationUnit, respectively.

Refer to GBIF’s technical documentation on date and time interpretation for more guidance on reporting Event dates and times.

Event date and temporal scope terms, their recommended usage (status), and example data entries.
Status       Term                    Example entry
Required     dwc:eventDate           2018-08-29
Recommended  dwc:eventTime           08:00Z
Recommended  eco:eventDurationValue  1
Recommended  eco:eventDurationUnit   hour

3.5. Sampling Event protocol

Sampling protocols provide the details of how a survey was conducted. Protocol information should be a detailed, step-wise description of the data collection process, covering everything necessary to ensure repeatability of the implemented methodology. Clear communication of a sampling protocol or the method(s) implemented during a survey or monitoring effort supports consistency, accuracy, and reliability in the data collected. This information further ensures reproducibility and reusability of a dataset, and facilitates data aggregation, integration, and subsequent analysis.

Sampling protocol terms should be populated for every Event regardless of hierarchical level as inheritance in either direction should not be assumed or inferred between Event levels.

3.5.1. Event type

Biological survey Event data can result from a wide variety of effort types (e.g., Bioblitzes, inventories, monitoring schemas, expeditions). The nature of the survey Event should be reported using dwc:eventType.

Identifying Event type

dwc:eventType should provide a high-level overview or broadly categorize the type of survey without being so specific as to overlap with sampling protocol. There is no single, standardized vocabulary for dwc:eventType. If your organization or community has a controlled vocabulary, it is recommended to apply terms from that. Otherwise, you can refer to the common Event types below for guidance. More than one term may apply to an Event; choose the term that fits most closely.

  • Project: Projects are structured initiatives with an explicitly stated objective or suite of objectives and with clear targets, timelines, and deliverables. Projects typically are linked to non-biological information identifying participating organizations and people (agents), funding agencies, and other high-level administrative information. Biological sampling may be only one facet of a project’s scope. Project as a dwc:eventType is typically most appropriate only at the highest Event level in a nested dataset.

  • Expedition: An expedition is an organized information gathering venture that inherently includes multiple sampling Events and Event types. Expeditions may include multiple taxonomic and/or organismal scopes, any number of documented sampling protocols, and varying degrees of complexity in survey design. Expedition as a dwc:eventType is typically most appropriate at higher Event levels in nested hierarchies.

  • Survey: A survey is a broad but systematic effort to collect information about the biological organisms in a specific area at a given time. Surveys typically include at least one documented protocol and may or may not have an explicitly defined taxonomic and/or organismal scope. Survey is the most general Event type term and can be applied as a dwc:eventType at any Event level.

  • Inventory: An inventory is a comprehensive, focused survey of the taxa present in a specific area over an explicit period of time. Inventories typically have an explicit taxonomic and/or organismal scope and a well-defined protocol. Inventory is typically most appropriate as a dwc:eventType at lower Event levels in nested hierarchies.

  • Bioblitz: A bioblitz is a survey Event aimed at finding and identifying as many species as possible in a specific area over a (typically) short, contiguous period of time. Bioblitzes often include participants (agents) with a wide range of backgrounds and levels of expertise in biodiversity sciences including formal biologists as well as the broader, general public. Bioblitz as a dwc:eventType is typically most appropriate at lower Event levels in nested hierarchies.

  • Site visit: A site visit is a single survey, inventory, or sampling at a pre-established geographic location at a discrete time. Site visit as a dwc:eventType is typically most appropriate at the lowest Event level in a nested hierarchy.

  • Sample: A survey Event denoted by the specific act of collecting physical samples resulting in material specimens. A sampling dwc:eventType is a specific implementation of a survey Event. 'Sample' as a dwc:eventType is typically most appropriate at lower (child) Event levels in a nested hierarchy.

  • Sensor: The detection of an Occurrence (or a group of related Occurrences such as a time series or group of organisms) by means of a sensor. A sensor may be static (e.g., camera traps) or mobile (e.g., drones) external to an organism, or it may be attached to an organism (e.g., a radio collar). 'Sensor' as a dwc:eventType is typically most appropriate at lower (child) Event levels in a nested hierarchy.

Inventory Event types

If dwc:eventType = inventory, the type(s) of search implemented (e.g., restricted search, open search, opportunistic search, trap or sample, compilation) must be reported in eco:inventoryTypes.

If eco:inventoryTypes = compilation, the compilation type should be reported using eco:compilationTypes and data sources listed in eco:compilationSourceTypes.

  • A compilation is a summary inventory resulting from the combination of multiple existing inventories (as described in [Guralnick2018]). Compilations are aggregates of multiple studies and may combine surveys employing different protocols, processes, and observers, often with variable reporting of the methods employed or other compiled data sources and literature searches.
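
The conditional rules for inventories can be expressed as a small validation step. This is a hedged sketch only: the function name is hypothetical, Event rows are assumed to be dictionaries keyed by prefixed term names, and multiple eco:inventoryTypes values are assumed to be separated by |.

```python
def missing_inventory_terms(event):
    """Return the inventory-related terms an Event row should have but lacks."""
    missing = []
    event_type = event.get("dwc:eventType", "").strip().lower()
    inventory_types = [
        value.strip().lower()
        for value in event.get("eco:inventoryTypes", "").split("|")
        if value.strip()
    ]
    # An inventory Event must state the type(s) of search implemented.
    if event_type == "inventory" and not inventory_types:
        missing.append("eco:inventoryTypes")
    # A compilation should also describe its type and its data sources.
    if "compilation" in inventory_types:
        for term in ("eco:compilationTypes", "eco:compilationSourceTypes"):
            if not event.get(term, "").strip():
                missing.append(term)
    return missing


print(missing_inventory_terms({"dwc:eventType": "Inventory"}))
# ['eco:inventoryTypes']
```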

Event type terms, their recommended usage (status), and example data entries
Status                     Term                        Example entry
Recommended                dwc:eventType               Inventory
Recommended if applicable  eco:inventoryTypes          Open search
Recommended if applicable  eco:compilationTypes        compilationOfExistingSourcesAndSamplingEvents
Recommended if applicable  eco:compilationSourceTypes  museumSpecimens | literature

3.5.2. Sampling protocol

Four protocol terms exist; however, only one term is currently required to publish an Event dataset in GBIF: dwc:samplingProtocol. This requirement exists because the initial Darwin Core Event classification included only the one term. The Humboldt extension introduced three additional terms to capture information about sampling protocol in a more explicit manner: eco:protocolNames, eco:protocolDescriptions, and eco:protocolReferences.

Survey Event protocol terms, their recommended usage (status), and example data entries.
Status       Term                      Example entry
Required     dwc:samplingProtocol      Visual survey
Recommended  eco:protocolNames         Visual survey
Recommended  eco:protocolDescriptions  For each site a total list of lichen species (lichenized fungi) was produced based on a careful examination of soil, wood, stone surfaces and bark of trees up to 2m at three time periods: October-November 2014, February-December 2015 and March and May 2016. Specimens that were not possible to identify with certainty in the field were sampled and subsequently identified in the laboratory. For each species the substrate, e.g. phorophyte (host) species was recorded. All records were registered in www.svampeatlas.dk, and the nomenclature used is in accordance with this database.
Recommended  eco:protocolReferences    See Appendix B of Brunbjerg, A.K., Bruun, H.H., Brøndum, L. et al. A systematic survey of regional multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecol 19, 43 (2019). https://doi.org/10.1186/s12898-019-0260-x | https://doi.org/10.17504/protocols.io.kxygx3jwkg8j/v1

3.5.3. Absences (non-detections)

Organismal absences are defined here as the lack of detection of organisms that are members of an explicitly stated target taxonomic scope. Absence information is critical to understanding species' biogeography, modeling species' responses to climate- and human-induced environmental change, conservation planning and resource management, monitoring and restoration efforts, eradications or reintroductions, and other aspects of biodiversity dynamics.

  • If the dataset includes absence information for one or more organisms (to be reported in the occurrence table as dwc:occurrenceStatus = absent), then eco:isAbsenceReported = true.

  • A list of absent taxa can be provided using eco:absentTaxa for all relevant Events. Best practice is to use scientific names to report absent taxa.

    • Absences should only be reported for taxa within the stated taxonomic and/or organismal scope of a survey and should use scientific nomenclature.

    • Absence cannot be asserted for bycatch.
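
Both absence terms can be derived from the occurrence table. The sketch below assumes occurrence rows are dictionaries with eventID, scientificName, and occurrenceStatus keys, and joins absent taxa with the | separator used for lists elsewhere in this guide; the function name is hypothetical. It does not check that the absent taxa fall within the stated scope, which remains the publisher's responsibility.

```python
from collections import defaultdict


def absence_terms_by_event(occurrences):
    """Map each eventID to its eco:isAbsenceReported and eco:absentTaxa values."""
    absent = defaultdict(set)
    event_ids = set()
    for row in occurrences:
        event_ids.add(row["eventID"])
        if row.get("occurrenceStatus", "").strip().lower() == "absent":
            absent[row["eventID"]].add(row["scientificName"])
    return {
        event_id: {
            "eco:isAbsenceReported": "true" if absent[event_id] else "false",
            "eco:absentTaxa": " | ".join(sorted(absent[event_id])),
        }
        for event_id in event_ids
    }


rows = [
    {"eventID": "site1", "scientificName": "Parabuteo unicinctus", "occurrenceStatus": "absent"},
    {"eventID": "site1", "scientificName": "Geranoaetus melanoleucus", "occurrenceStatus": "present"},
    {"eventID": "site2", "scientificName": "Geranoaetus melanoleucus", "occurrenceStatus": "present"},
]
terms = absence_terms_by_event(rows)
print(terms["site1"]["eco:absentTaxa"])  # Parabuteo unicinctus
```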

3.5.4. Abundance

Abundance is a quantitative measure of the organisms of the same taxonomic designation in a particular area at a specific time. Abundance data are a key indicator of ecological health. They are necessary for evaluating ecological patterns and dynamics, managing invasive species, informing effective habitat and ecosystem management, and for practical tasks such as quantifying existing resources.

Absence and abundance terms, their recommended usage (status), and example data entries.
Status              Term                        Example entry
Recommended         eco:isAbsenceReported       true or false
Recommended         eco:isAbundanceReported     true or false
Recommended         eco:isAbundanceCapReported  true or false
Share if available  eco:absentTaxa
Share if available  eco:abundanceCap            5

3.5.5. Material samples

A material sample is a physical entity '… that represents an entity of interest in whole or in part' (see dwc:MaterialSample). Essentially, material samples are specimens collected during a survey. A material sample may consist of an entire organism, part of an organism, a genetic sample, or even multiple organisms not necessarily of the same taxonomic designation.

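If the dataset includes at least one specimen from which a material sample was taken, then for each relevant Event eco:hasMaterialSamples = true, and the types of samples can be listed in eco:materialSampleTypes.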

If the dataset or Event does not include material samples, eco:hasMaterialSamples = false.

3.5.6. Vouchers

A voucher is a physical specimen or material sample collected and accessioned into a museum collection in support of a specific project or survey.

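If the dataset has vouchers, then for each relevant Event eco:hasVouchers = true, and the institutions holding the vouchers can be listed in eco:voucherInstitutions.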

If the dataset or sampling event does not include vouchers, eco:hasVouchers = false.

3.5.7. Least specific target category quantity inclusive

The term eco:isLeastSpecificTargetCategoryQuantityInclusive indicates whether the total number of organisms detected for a dwc:Taxon (including all its subgroups) is shown in one record in dwc:individualCount or the paired terms dwc:organismQuantity and dwc:organismQuantityType in the occurrence table. This true/false (Boolean) term helps data users know if the numbers given in these terms include all organisms of that dwc:Taxon.

3.5.8. Data generalizations & information withheld

Although the general recommendation is to share all biodiversity data available at its highest spatio-temporal resolution, situations exist where it is necessary to generalize data prior to sharing a dataset publicly or even withhold information completely. Two terms are available to communicate if data are generalized or withheld in a dataset: dwc:dataGeneralizations and dwc:informationWithheld.

While it is the responsibility of the publisher to protect sensitive species occurrence data, it is also the data publisher's responsibility to clearly communicate any action(s) taken and to indicate if the full data are available upon request. How you generalize sensitive data (for example, restricting the resolution of the data) depends on the species' category of sensitivity. Where there is low risk of adverse outcomes, unrestricted publication of sensitive species data may remain appropriate. See the published guide Current Best Practices for Generalizing Sensitive Species Occurrence Data for guidance on when and how to generalize or withhold sensitive biodiversity data [Chapman2020]. The guide is also available in French and Spanish.

Reporting data generalizations

When generalizing data you should try not to reduce the value of the data for analysis. A clear summary of the data generalization process should be reported for each relevant Event using dwc:dataGeneralizations.

For example, if the spatial resolution of locality data for an Event is reduced to the nearest half degree, then dwc:dataGeneralizations = Coordinates generalized from original GPS coordinates to the nearest half degree grid cell for each Event to which this treatment was applied. If the location information was generalized for every survey site in a nested hierarchy, then at the parent Event level dwc:dataGeneralizations = Coordinates for each event site generalized from original GPS coordinates to the nearest half degree grid cell.
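
The half-degree example can be sketched in code. This illustration makes a loud assumption: "nearest half degree grid cell" is interpreted here as rounding each coordinate to the nearest multiple of 0.5 degrees, which is only one possible reading, and the function names are hypothetical. The appropriate generalization method for a given dataset should follow the guidance in [Chapman2020].

```python
GENERALIZATION_NOTE = (
    "Coordinates generalized from original GPS coordinates "
    "to the nearest half degree grid cell"
)


def generalize_half_degree(latitude, longitude):
    """Round both coordinates to the nearest multiple of 0.5 degrees."""
    def to_half(value):
        return round(value * 2) / 2
    return to_half(latitude), to_half(longitude)


def generalized_event(event):
    """Return a copy of an Event row with generalized coordinates and a note."""
    lat, lon = generalize_half_degree(
        float(event["dwc:decimalLatitude"]), float(event["dwc:decimalLongitude"])
    )
    updated = dict(event)  # leave the original row untouched
    updated["dwc:decimalLatitude"] = lat
    updated["dwc:decimalLongitude"] = lon
    updated["dwc:dataGeneralizations"] = GENERALIZATION_NOTE
    return updated


print(generalize_half_degree(59.3168, 18.0627))  # (59.5, 18.0)
```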

Reporting information withheld

If specific data are not reported with the dataset, a clarifying statement should be provided with each affected Event using dwc:informationWithheld.

For example, if sensitive species data are purposefully excluded from the published data, dwc:informationWithheld should include a statement along the lines of Sensitive species occurrence information not reported.

3.5.9. Verbatim fields

Two verbatim fields are available to provide additional information about an Event.

  • Field notes: Field notes can be copied, transcribed verbatim, or linked into dwc:fieldNotes.

  • Event remarks: Additional comments about a particular Event that don’t fit in any other term can be shared using dwc:eventRemarks.

Both fields can be applied to any Event at any level.

Other survey protocol information and verbatim protocol terms, their recommended usage (status), and example data entries.
Status              Term                                                Example entry
Recommended         eco:hasMaterialSamples                              true or false
Recommended         eco:hasVouchers                                     true or false
Recommended         eco:isLeastSpecificTargetCategoryQuantityInclusive  true or false
Share if available  eco:materialSampleTypes                             wholeOrganism, blood
Share if available  eco:voucherInstitutions                             AMNH | KUNHM
Share if available  dwc:dataGeneralizations                             Coordinates generalized from original GPS coordinates to the nearest half degree grid cell, Coordinates for each event site generalized from original GPS coordinates to the nearest half degree grid cell
Share if available  dwc:informationWithheld                             Sensitive species occurrence information not reported
Share if available  dwc:fieldNotes                                      Notes available in the Grinnell-Miller Library
Share if available  dwc:eventRemarks

3.6. Scope and completeness

Survey scope identifies the organisms targeted (or not targeted) during a survey. Structured reporting of explicitly stated survey scopes is necessary for evaluating and reporting completeness and is critical to understanding if the data can be used to assert absences (non-detections) of taxa.

Completeness indicates the thoroughness of a survey relative to the stated scope. Reported scope and completeness information helps downstream data users interpret species populations and areas of occupancy, infer species absences, etc.

The 'target' and 'excluded' scope terms (e.g., eco:targetTaxonomicScope) presented in this section are the only Event terms designed to capture intent. That is, these terms capture the breadth of the information the biological survey intended to capture. All other terms should be used to report the actuality of the survey (e.g., what protocol was in practice implemented, what information was actually collected).

Implementing scope terms

  • Scope terms can be applied at any Event level.

  • Recommended best practice is to populate scope terms for every Event to which they apply. This information should be reported only at the Event levels for which the information is explicitly stated; information should not be inferred up or down an Event hierarchy.

  • Scope terms of an Event must be populated whenever the scope was in effect to be able to infer absence of detection within that Event whenever the Occurrences linked to that Event do not explicitly state zero counts or when there are no Occurrence records for a given taxon that fell within the taxonomic scope (see Section 3.2.4 Principle of inference in Properties of hierarchical events in the Humboldt Extension for Ecological Inventories).

  • Do not retrospectively infer scope terms.

3.6.1. Verbatim scope

The complete scope explicitly identifying the full suite of stated parameters defining the breadth of a sampling Event should be reported using eco:verbatimTargetScope. eco:verbatimTargetScope is particularly useful for capturing scope conditions not covered by existing taxonomic or organismal scope terms.

General scope terms, their recommended usage (status), and example data entries.
Status       Term                     Example entry
Recommended  eco:verbatimTargetScope  Adult flying insects

3.6.2. Taxonomic scope

Reporting taxonomic scope enables reliable, quantitative, and statistical interpretation of survey and monitoring data. Knowledge of taxonomic scope is essential to interpret local non-detection of taxa as local absences. The taxonomic scope, stated either as targeted or intentionally excluded taxa, should be reported using eco:targetTaxonomicScope and eco:excludedTaxonomicScope.

If every organism in the stated eco:targetTaxonomicScope that was observed during an Event was reported, then eco:isTaxonomicScopeFullyReported = true; if not, eco:isTaxonomicScopeFullyReported = false.

Knowledge about taxonomic completeness allows data users to determine how comprehensively an area was sampled.

If a specific person(s) or organization(s) are reported as making the taxonomic identifications relevant to the stated survey scope(s), they should be acknowledged in dwc:identifiedBy. A list of names can be shared with values separated by a |. It is not possible to share a list of unique identifiers such as ORCIDs at the Event level.

Taxonomic scope terms, their recommended usage (status), and example data entries.
Status              Term                               Example entry
Recommended         eco:targetTaxonomicScope           Arthropods
Recommended         eco:excludedTaxonomicScope         Aves
Share if available  eco:isTaxonomicScopeFullyReported  true or false
Share if available  eco:taxonCompletenessReported      reportedComplete, reportedIncomplete, or notReported
Share if available  eco:taxonCompletenessProtocols     Based on sampling effort
Share if available  dwc:identifiedBy

3.6.3. Organismal scope

Why are organismal scope terms important?

As with taxonomic scope, providing information about other organismal scopes when relevant enables reliable, quantitative interpretation of survey and monitoring data and can be essential to interpreting local non-detection as local absences.

Organismal scope terms

Three categories of terms are available with which to report an explicitly stated target or excluded organismal scope, and to state whether or not all target organisms observed were reported.

Any explicitly stated target or excluded organismal scopes, and clarification as to whether or not all target organisms observed were reported (true or false), should be indicated using the following terms:

Any additional organismal scopes should be reported using eco:verbatimTargetScope.

Organismal scope terms, their recommended usage (status), and example data entries.
Status              Term                                           Example entry
Share if available  eco:targetLifeStageScope                       larva
Share if available  eco:excludedLifeStageScope                     adult | juvenile
Share if available  eco:isLifeStageScopeFullyReported              true or false
Share if available  eco:targetDegreeOfEstablishmentScope           native
Share if available  eco:excludedDegreeOfEstablishmentScope         invasive
Share if available  eco:isDegreeOfEstablishmentScopeFullyReported  true or false
Share if available  eco:targetGrowthFormScope                      tree
Share if available  eco:excludedGrowthFormScope                    shrub

3.6.4. Bycatch

Bycatch are organisms detected during a survey that were not explicitly targeted in the scope of a study. Bycatch, or a lack thereof, in a dataset can be reported at the taxonomic and organismal levels.

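If taxonomic bycatch are reported: eco:hasNonTargetTaxa = true, the taxa can be listed in eco:nonTargetTaxa, and eco:areNonTargetTaxaFullyReported states (true or false) whether every non-target taxon detected was reported.

If organismal bycatch are reported: eco:hasNonTargetOrganisms = true.

If the dataset does NOT include taxonomic or organismal bycatch: eco:hasNonTargetTaxa = false and eco:hasNonTargetOrganisms = false.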

Bycatch terms, their recommended usage (status), and example data entries.
Status              Term                               Example entry
Share if available  eco:hasNonTargetTaxa               true or false
Share if available  eco:areNonTargetTaxaFullyReported  true or false
Share if available  eco:nonTargetTaxa                  Parabuteo unicinctus | Geranoaetus melanoleucus; Cetoniinae | Aclopinae | Cyclocephala modesta
Share if available  eco:hasNonTargetOrganisms          true or false

3.6.5. Habitat scope

If the survey includes an explicitly stated targeted or excluded habitat scope these can be reported in eco:targetHabitatScope and eco:excludedHabitatScope.

The actual habitat observed at a survey site during an Event should be reported in dwc:habitat.

Habitat scope terms, their recommended usage (status), and example data entries.
Status              Term                      Example entry
Share if available  eco:targetHabitatScope    deciduous forest
Share if available  eco:excludedHabitatScope  urban

3.7. Sampling Effort

Sampling effort communicates information about the likelihood that a type of organism would be detected: greater effort generally means a higher probability of detection. Clear reporting of sampling effort is necessary for interpretation of measures of completeness and calculation of abundance (relative or absolute) or biomass, and is critical in assessing the ability to compare information and aggregate data across studies.

The DwC Event term dwc:samplingEffort is currently a recommended field when publishing Event datasets to GBIF; however, this term captures sampling effort in an unstructured way. The Humboldt extension includes five terms to more explicitly capture different aspects of sampling effort, and the updated recommended best practice is to report sampling effort information as structured data using these terms. Through them, data providers may explicitly provide the following information:

  • Is sampling effort reported?: Indicate if sampling effort is reported (true or false) in eco:isSamplingEffortReported.

  • Sampling effort protocol: eco:samplingEffortProtocol should contain a textual description of the sampling effort protocol (e.g., number and arrangement of people or sensors deployed, whether or not sensors were mobile or stationary, how frequently observations, measurements, or samples were taken) and/or provide a link to the protocol used.

  • Sampling effort: report the sampling effort value (e.g., the total amount of time of the sampling Event, the total number of people involved) and units (e.g., trap nights, people) using the paired terms eco:samplingEffortValue and eco:samplingEffortUnit.

  • Sampling performed by: eco:samplingPerformedBy should be used to credit the people involved in the sampling Event. The names of one or more people can be reported, with individual names in a list separated with |. Best practice is to use a unique identifier (e.g., ORCID) if available.

    • NOTE: Because eco:samplingPerformedBy has an IRI (internationalized resource identifier) equivalent, only a single ORCID can be provided (the term cannot support a list). If more than one ORCID needs to be shared, a list of ORCIDs (using the pipe separator between values) can be supplied using the term dwc:recordedByID, but it must be applied to each relevant Occurrence in the occurrence table.
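
The single-ORCID restriction in the note above lends itself to a simple pre-publication check. The sketch below is an assumption-laden illustration: the function names are hypothetical, ORCIDs are detected with a pattern for the usual 16-character hyphenated form, and the identifiers in the usage example are placeholders used only for demonstration.

```python
import re

# Matches the common ORCID iD form, e.g. 0000-0002-1825-0097 (last character may be X).
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def count_orcids(value):
    """Count the ORCID iDs found in a term value."""
    return len(ORCID_PATTERN.findall(value))


def fits_sampling_performed_by(value):
    """Return True when the value holds at most one ORCID (names alone are fine)."""
    return count_orcids(value) <= 1


# Placeholder identifiers for illustration only.
single = "https://orcid.org/0000-0002-1825-0097"
several = "https://orcid.org/0000-0002-1825-0097 | https://orcid.org/0000-0001-5109-3700"

print(fits_sampling_performed_by(single))   # True
print(fits_sampling_performed_by(several))  # False
```

Values that fail the check are candidates for dwc:recordedByID on the relevant Occurrence records instead.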

Sampling effort terms, their recommended usage (status), and example data entries.
Status       Term                          Example entry
Recommended  eco:isSamplingEffortReported  true or false
Recommended  eco:samplingEffortProtocol    40 box traps deployed in the afternoon at even spacings along 4 parallel 100m transects placed 50m apart and visited after sunrise the next day
Recommended  eco:samplingEffortValue       40, 5
Recommended  eco:samplingEffortUnit        trap nights, person hours
Recommended  dwc:samplingEffort            40 trap nights, 5 person hours
Recommended  eco:samplingPerformedBy       A. Townsend Peterson

Appendix A: Additional guidance and seeking assistance

Additional DwC Event terms

While all Humboldt extension terms are covered in this guide, the Darwin Core Event terms included are not exhaustive. The full suite of available DwC Event terms that can be applied to a DwC-A Event dataset can be found in the GBIF Repository of Schemas Darwin Core Event page.

Need more information?

Check out the following documentation:

Or, reach out for assistance from:

  • Humboldt Extension GitHub repository: questions about usage, issues with the vocabulary, and recommendations for new terms should be reported as an Issue.

  • The GBIF community forum

  • The GBIF Node for your country or organization

    • If your country or organization is a member of GBIF and has an established node, you can reach out directly to your node.

      • If you’re uncertain if your country or organization is part of the GBIF network you can search here.

    • If your country or organization is not a member of GBIF, reach out to the GBIF helpdesk.

  • GBIF help desk