This document is also available in PDF format.

Colophon
Suggested citation
Ingenloff K (2025) Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt Extension. GBIF Secretariat: Copenhagen. https://doi.org/10.35035/doc-7t3p-ve38
Licence
The document Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt Extension is licensed under Creative Commons Attribution-ShareAlike 4.0 Unported License.
Acknowledgement
Survey and Monitoring Data Quick-Start Guide: A how-to for updating a Darwin Core dataset using the Humboldt extension was produced under the BioDT project, which received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101057437.
Cover image
Illustration by Javier Gamboa, GBIF Secretariat 2025. Licensed under Creative Commons Attribution-ShareAlike 4.0 Unported License.
Summary
The details about a biological survey (how it was carried out, the spatio-temporal scope, the taxonomic groups targeted, who was involved, etc.) are important to properly understand the structure of the survey and how the published data can be reused. The Humboldt Extension for Ecological Inventories (HE), a vocabulary extension to the Darwin Core (DwC) Event Class, provides a means by which to explicitly report the context in which species occurrence data and/or material specimens were collected. The extension includes 55 terms to capture critical facets of survey design including protocol, scope, and sampling effort in a structured manner, thus enhancing overall FAIRness (specifically findability and interoperability) of biological survey data.
This document will guide GBIF data publishers who (a) already have data formatted as a Darwin Core (DwC) Event dataset through the process of updating their dataset with the Humboldt extension, or (b) are comfortable with the DwC Event core and wish to map a new dataset to DwC Event class and Humboldt extension terms.
If you are not yet comfortable with Darwin Core Event datasets and are looking for more in-depth guidance in structuring your data following Darwin Core guidelines, refer to the more comprehensive document, Guide for publishing biological survey and monitoring data to GBIF.
1. Getting started
The process of updating your Darwin Core Archive (DwC-A) Event dataset with the Humboldt extension will likely involve moving some information from the existing DwC-A metadata to the event table; referring to existing documentation, publications, or associated weblinks related to the dataset; and, when possible, consulting the original data collectors or individuals involved in the design and oversight of the project that produced the dataset. The data republication efforts described here are expected to increase the value and usefulness of existing Event datasets in GBIF and to broaden their application, and therefore their citation, across science and policy reuse scenarios.
Before you get started, we recommend that you prepare by taking the following steps:
-
Review the information already published in your DwC-A for the Event dataset, focusing specifically on the Event, metadata, and extended measurement or facts tables, noting where key information about survey design, sampling protocol, scope, and effort are available.
-
Identify additional dataset resources that can be referred to including supplementary documentation, publications, websites, and dataset contacts and people involved in the data collection or oversight of the project or survey.
-
Review the reported data structure to determine if the existing Event hierarchy accurately reflects the data and the level of complexity desired. Make any necessary changes.
Now, you’re ready to capture survey design data using the Humboldt extension following the recommendations below.
2. Data mapping template
A basic biodiversity survey data template is available to facilitate mapping of survey data and preparation for formatting of a DwC-A. The template can be accessed as a single .xlsx file or as three separate .csv files.
-
Survey data template (.xlsx)
-
Survey event table template (.csv)
-
Survey template README (.csv)
Table | Description |
---|---|
event | Terms in the column heads are populated with the DwC Event core and Humboldt extension terms referenced in this guide. The rows beneath each term include term definitions, comments, recommended usage for publication in GBIF, and additional comments or usage guidance. |
occurrence | Terms in the column heads are populated with the DwC Occurrence extension terms referenced in this guide. The rows beneath each term include term definitions, comments, and recommended usage for publication in GBIF. Add any additional Occurrence extension terms needed for your own data. |
README | The README table provides additional information about the structure and information included in each data table. |
2.1. Example datasets
-
Faveyts W and Cooleman S (2025). Bird census counts at the Zwin Nature Park. Version 1.5. Belgian Biodiversity Platform. Sampling event dataset https://doi.org/10.15468/saesvn.
-
Palpurina S (2025). Vegetation plots collected in dry grasslands throughout Bulgaria and Romanian Dobrudzha. Version 1.12. Masaryk University, Department of Botany and Zoology. Sampling event dataset https://doi.org/10.15468/pkx4tg.
-
Piesschaert F, Vermeersch G, Brosens D, Westra T, Desmet P, Feys S, Van de Poel S, Pollet M, and Cooleman S (2025). ABV - Common breeding birds in Flanders, Belgium (post 2016). Version 1.14. Research Institute for Nature and Forest (INBO). Sampling event dataset https://doi.org/10.15468/pj2v6h.
-
van Klink R and Gerrits G (2025). Biological Station Wijster standard trapping program: Sampling event data for ground beetles (Coleoptera: Carabidae). Version 1.3. WBBS foundation. Sampling event dataset https://doi.org/10.15468/3mcqja.
3. Updating your DwC Event dataset with the Humboldt extension
The contextual information about survey Events should be saved to the DwC-A.
This section will guide you through the process of mapping Event-level data specifically related to survey structure, location, protocols, scopes, and effort.
About DwC terms in this document
Each term in this document is linked to its respective term internationalized resource identifier (IRI) alias (e.g., eco:protocolNames). Always use these links to refer to the definition, comments, and examples provided when populating a term. The terms used to describe Event-level information are a combination of Darwin Core Event class and Humboldt Extension terms.
3.1. Survey sampling design and event hierarchy
The first step in updating your Darwin Core dataset is to check that the existing dataset structure reflects the survey design implemented to capture the data reported in your dataset. Biological survey design, the sampling structure of a biological survey, varies widely, and identifying how best to translate survey design to the DwC Event core is the most difficult part of mapping a survey dataset. DwC defines an Event as 'an action that occurs at some location during some time', such as a specimen collection expedition, a camera trap image capture, or a marine trawl. This broad definition of Event means biological surveys can be framed as a single Event or as a series of Events nested within Events using parent-child relationships as necessary. The sampling Event hierarchy is the translation of survey design into an Event-based perspective using Darwin Core.
Sharing biodiversity data in a way that clearly and accurately reflects survey design helps ensure accurate understanding and interpretation of the information contained in a dataset, enabling potential data users to more readily assess whether the data are appropriate for inclusion in their own analyses.
3.1.1. Non-nested datasets
The simplest Event data structure is a non-nested dataset. Non-nested datasets reflect a simple or flat survey design structure (Figure 1). These are typically simple datasets consisting of:
-
a single sampling Event occurring at a particular place and time and conducted using a single standardized sampling protocol that is not repeated and is not necessarily part of a larger sampling schema (Figure 1a), or
-
a series of single sampling Events that are not joined by a larger parent Event (Figure 1b). A compilation (e.g., a combination of unrelated surveys, compiled data sources and/or literature searches, see the Biological survey data section) could be a special case of non-nested dataset where there is a unique Event level that describes the compilation itself (e.g., the broad area where multiple surveys are aggregated), which results in one or more Occurrences.

3.1.2. Nested datasets
More complex survey designs will require implementation of a nested dataset structure. Nested datasets use parent-child relationships to capture information collected through more complex survey designs, such as datasets resulting from repeated sampling events and/or multiple sampling protocols. Creating nested Event levels may be important for relating the full story a dataset has to tell and for facilitating downstream analysis of the data.
The goal in establishing a dataset structure is to keep it as simple as possible while still accurately representing the survey design. There may be multiple ways to structure a dataset, there is no single correct dataset structure, and identifying the most appropriate structure may not be a straightforward process. As a general guideline, dataset structure is most commonly defined as a function of sampling location, protocol, and date.
Consider a hypothetical survey where two sampling protocols (Protocol a and Protocol b) are implemented at two different sites (Site 1 and Site 2). Both sites are sampled (site visits) twice (t1 and t2) using each of the protocols.
This survey dataset could be structured with two Event levels as shown in Figure 2. Here, the highest Event level would consist of four Events representing each unique site-protocol combination: Site 1–Protocol a, Site 1–Protocol b, Site 2–Protocol a, Site 2–Protocol b. Events at the lowest Event level will represent site visits that occur on a particular date for each site-protocol combination. Organismal Occurrence information collected during each site visit is linked to the relevant site visit Event. This two Event level structure represents the simplest possible nested dataset structure with only a single level of nesting.
It is ideal to structure a dataset such that each implemented protocol and unique site location is represented as a specific Event, so that information from the same pool of species (i.e., location) and likelihood of detecting these species (i.e., protocol) is joined together by being part of the same Event. However, it is not always possible to disentangle information collected using multiple protocols.
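The two-level structure described above for the hypothetical survey (two sites, two protocols, two visits) can be sketched in Python as event-table rows. This is a minimal illustration, not a GBIF tool: the identifiers (site1_protocolA, site1_protocolA_t1, and so on) and visit dates are hypothetical, and the column headers are assumed to be the Darwin Core term names.

```python
import csv
import io

# Hypothetical identifiers and dates for the example survey in the text:
# two sites, two protocols, and two site visits (t1, t2).
SITES = ["site1", "site2"]
PROTOCOLS = ["protocolA", "protocolB"]
VISIT_DATES = {"t1": "2022-05-01", "t2": "2022-06-01"}


def build_event_rows():
    """Return event-table rows for a two-level Site–Protocol / site-visit hierarchy."""
    dates = sorted(VISIT_DATES.values())
    rows = []
    for site in SITES:
        for protocol in PROTOCOLS:
            parent_id = f"{site}_{protocol}"
            # Highest Event level: one Event per site-protocol combination,
            # dated with the range covering all of its visits.
            rows.append({
                "eventID": parent_id,
                "parentEventID": "",
                "eventDate": f"{dates[0]}/{dates[-1]}",
                "samplingProtocol": protocol,
            })
            for visit, visit_date in VISIT_DATES.items():
                # Lowest Event level: one Event per site visit, linked to its parent.
                rows.append({
                    "eventID": f"{parent_id}_{visit}",
                    "parentEventID": parent_id,
                    "eventDate": visit_date,
                    "samplingProtocol": protocol,
                })
    return rows


rows = build_event_rows()
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Each parent row carries a date range covering its two visits, and each visit row points to its parent through parentEventID; Occurrences would then reference the visit-level eventID values.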

3.1.3. Project information
Surveys conducted as part of a larger or established network or project should report as much contextual information as possible to capture information about the project or network. Project-level information will always be shared at the highest Event level. This can be achieved in one of two ways:
-
By embedding project-level information within the highest existing survey Event level. With the dataset presented in Figure 2, project-level information would be included with each of the four Site–Protocol Events.
-
By introducing a new parent Event level above all existing Events dedicated to capturing project-level information. In the context of the example dataset presented in Figure 2, this would mean adding a third Event level to the dataset structure that is parent to all four Site–Protocol Events (see Figure 3). Creating a single parent Event is a particularly useful option when a project will result in multiple, independent datasets. In this case, the Event identifier used for the project Event level can be used in all relevant datasets, providing a means of identifying related datasets.
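A sketch of the second option, assuming the event table is held as a list of Python dictionaries keyed by term name. The project identifier and title below are hypothetical. In line with the advice not to change existing identifiers, the helper only fills in parentEventID for Events that currently have none; it never rewrites an eventID.

```python
def add_project_event(events, project_event_id, project_fields):
    """Return a copy of `events` with a new project Event at the top of the hierarchy.

    Every Event that currently has no parentEventID becomes a child of the
    project Event. Existing eventID values are never modified.
    """
    project_row = {"eventID": project_event_id, "parentEventID": "", **project_fields}
    updated = [project_row]
    for row in events:
        new_row = dict(row)  # copy so the input rows are left untouched
        if not new_row.get("parentEventID"):
            new_row["parentEventID"] = project_event_id
        updated.append(new_row)
    return updated


# Hypothetical top-level Site–Protocol Events and project identifier.
site_protocol_events = [
    {"eventID": "site1_protocolA", "parentEventID": ""},
    {"eventID": "site1_protocolB", "parentEventID": ""},
    {"eventID": "site2_protocolA", "parentEventID": ""},
    {"eventID": "site2_protocolB", "parentEventID": ""},
]
with_project = add_project_event(
    site_protocol_events,
    "exampleProject2022",
    {"projectTitle": "Example monitoring project"},
)
```

Lower-level Events already carry a parentEventID, so they pass through unchanged and the existing nesting is preserved beneath the new project Event.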

3.1.4. Complex survey design
A more complex, deeply nested Event hierarchy may be warranted when:
-
multiple protocols are implemented within the same survey design,
-
survey outputs include a mix of data types (e.g., specimen collections, field observations, observed co-occurrences),
-
collected material contributes to downstream products (e.g., trait data, lab measurements, voucher specimens, media representations), or
-
relationships among datasets need to be preserved or exposed (e.g., datasets resulting from different types of surveys within the same Project and/or at the same established survey sites).
For example, consider the dataset Krill along the 110°E meridian: Oceanographic influences on assemblages in the eastern Indian Ocean, RV Investigator voyage IN2019_V03 (2019), published by the Ocean Biodiversity Information System (OBIS)-Australia. The dataset contains information about a zooplankton survey conducted by the CSIRO Marine National Facility in the eastern Indian Ocean in 2019. The survey consisted of daytime and nighttime sampling at 20 locations (stations) along an established transect. As illustrated in Figure 4, this dataset could be structured as a non-nested dataset (Figure 4a) or as a nested dataset (Figures 4b-d); as a nested dataset, the structure could be simple (Figures 4b and 4c) or more deeply nested with more than two Event levels (Figure 4d).
-
Non-nested dataset structure (Figure 4a): As a non-nested dataset, each sampling at a given station at a particular date and time would be a unique Event with no obvious link to other Events in the dataset beyond being part of the same dataset. Implementing this structure is the simplest approach to sharing data from the survey; however, without any nesting of Events, it may be difficult for data users to understand the relationships between survey Events. Associated Occurrences are related to the appropriate Event via the Occurrence extension.
-
Simple nested dataset structure (Figure 4b): Alternatively, a simple nested dataset structure could consist of two Event levels. The highest Event level would capture information about the survey stations, where each of the 20 survey stations would be a unique, unrelated parent Event to the relevant daytime and nighttime sampling Events. Associated Occurrences would be related to the appropriate Event via the Occurrence extension.
-
Simple nested dataset structure (Figure 4c): As a simple nested dataset, the data structure would consist of two Event levels, with the highest Event level capturing information about the overall cruise or campaign and the second Event level representing the daytime and nighttime sampling events at each station as a series of unique Events. Associated Occurrences are related to the appropriate Event via the Occurrence extension.
-
Deeply nested dataset structure (Figure 4d): As a more deeply nested dataset, the structure would consist of three Event levels: the highest Event level represents the Survey (that is, the overall cruise or campaign); the middle Event level represents each of the 20 survey stations; and, the lowest Event level represents the daytime and nighttime sampling Events at each station. Note that the child Events of each parent Event are used to report independent replicates of the same type within the same parent Event and/or to preserve individual sampling units. Associated Occurrences are related to the appropriate Event via the Occurrence extension.
If the survey itself was a unique Event, the simpler two Event level structure (e.g., Figures 4b and 4c) would likely suffice. However, the stations sampled during the survey are standard sampling locations used in other survey efforts not covered by this dataset. To make it easier to link information from this dataset to data from other surveys conducted at the same localities, a more complex nested structure was chosen by the data publisher.

3.1.5. Sampling Event hierarchy terms
Historically, only two terms were available to structure and relate different levels of survey design in a dataset: dwc:eventID and dwc:parentEventID. One additional Darwin Core Event term, dwc:fieldNumber, provided a means by which to relate a sampling Event with a dataset- or project-specific field number. The Humboldt extension provides two additional terms (eco:siteCount and eco:siteNestingDescription) to better support complex or nested survey designs.
Non-nested datasets
-
Each Event in a non-nested dataset must be assigned a unique dwc:eventID.
-
Non-nested datasets will not have a dwc:parentEventID.
Nested datasets
Nested hierarchies are established by relating a child Event to a parent Event through the child Event's dwc:parentEventID. As such, these more complex datasets require use of both dwc:eventID and dwc:parentEventID.
-
Each Event in a nested dataset must have a unique dwc:eventID.
-
Each child Event should include the dwc:eventID of its parent in dwc:parentEventID.
In practice, this means that the parent and the child will each have a unique dwc:eventID. To create the parent-child relationship, the parent Event's dwc:eventID will also be reported as the child Event's dwc:parentEventID.
dwc:eventID | dwc:parentEventID |
---|---|
survey2022 | |
survey2022_a-2 | survey2022 |
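Before republishing, it can help to check that every parent-child link in the event table resolves. The following sketch is a hypothetical helper, not part of any GBIF validator: it flags empty or duplicate eventID values, parentEventID values that do not match any eventID, and circular links. It assumes all parent Events are in the same table, so a project Event shared across datasets and published elsewhere would be reported as missing.

```python
from collections import Counter


def check_hierarchy(events):
    """Return a list of problems in the eventID/parentEventID links of an event table."""
    problems = []
    counts = Counter(e.get("eventID", "") for e in events)
    for event_id, n in counts.items():
        if not event_id:
            problems.append("empty eventID")
        elif n > 1:
            problems.append(f"duplicate eventID: {event_id}")

    parent_of = {e.get("eventID", ""): e.get("parentEventID", "") for e in events}
    for event_id, parent_id in parent_of.items():
        if parent_id and parent_id not in parent_of:
            problems.append(f"{event_id}: parentEventID {parent_id} not found")

    # Walk up from every Event; meeting an Event twice means the links form a cycle.
    for start in parent_of:
        seen = set()
        current = start
        while current and current in parent_of:
            if current in seen:
                problems.append(f"cycle through {start}")
                break
            seen.add(current)
            current = parent_of[current]
    return problems


print(check_hierarchy([
    {"eventID": "survey2022", "parentEventID": ""},
    {"eventID": "survey2022_a-2", "parentEventID": "survey2022"},
]))
```

An empty list means every child points to a parent that exists in the table and no Event is its own ancestor.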
In addition to Event and parent Event identifiers:
-
Site count and site nesting description: Nested datasets should include the total number of sites sampled in eco:siteCount and provide a textual description of the survey design or site sampling structure using eco:siteNestingDescription for each parent Event for which the information is available.
-
Field number: If the survey data include a field number for a specific Event, this should be shared using dwc:fieldNumber.
Status | Term |
---|---|
Required | dwc:eventID |
Required for nested datasets | dwc:parentEventID |
Recommended | eco:siteCount |
Recommended | eco:siteNestingDescription |
Share if available | dwc:fieldNumber |
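Where site visits carry a dwc:locationID, a first estimate of eco:siteCount for each parent Event can be derived by counting the distinct locationID values among its descendants. This is a sketch under that assumption (each distinct locationID is treated as one sampled site), with hypothetical identifiers; confirm the result against the survey documentation, and write the textual design description for eco:siteNestingDescription by hand.

```python
def site_counts(events):
    """Map each parent eventID to the number of distinct locationID values below it.

    Assumes the hierarchy has already been checked for missing parents and cycles.
    """
    children = {}
    by_id = {}
    for e in events:
        by_id[e["eventID"]] = e
        parent = e.get("parentEventID", "")
        if parent:
            children.setdefault(parent, []).append(e["eventID"])

    counts = {}
    for parent in children:
        locations = set()
        stack = list(children[parent])
        while stack:  # depth-first walk over all descendants
            child = stack.pop()
            location = by_id[child].get("locationID", "")
            if location:
                locations.add(location)
            stack.extend(children.get(child, []))
        counts[parent] = len(locations)
    return counts


# Hypothetical three-level example: one survey, two sites, one repeat visit.
example_events = [
    {"eventID": "survey", "parentEventID": ""},
    {"eventID": "survey_siteA", "parentEventID": "survey", "locationID": "A"},
    {"eventID": "survey_siteA_v1", "parentEventID": "survey_siteA", "locationID": "A"},
    {"eventID": "survey_siteB", "parentEventID": "survey", "locationID": "B"},
]
print(site_counts(example_events))
```

Repeat visits to the same locationID are counted once, which is why consistent locationID values across Events matter for this kind of roll-up.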
Review your DwC Event dataset to ensure that the survey design is accurately reflected in the use of the five (5) available sampling event hierarchy terms. Where additional events or event levels must be created, be sure to reference A Beginner’s Guide to Persistent Identifiers for guidance in creating new persistent identifiers.
Do NOT change existing identifiers if it can be avoided!
3.2. Project information
If the survey(s) being reported were part of a larger Project, four terms are available to capture the project name(s) and funding institution(s).
-
Project title: The official name(s) of the project(s) that contributed to the creation of the dataset should be shared in dwc:projectTitle as a concatenated list with values separated using a pipe separator (|).
-
Project ID: A list of the globally unique identifiers for the project(s) that contributed to the creation of the dataset, concatenated and separated using a pipe separator (|), should be reported in dwc:projectID.
-
Funding attribution: The official name(s) of the funding body or bodies that provided funding for the survey(s) resulting in the creation of the dataset should be shared in dwc:fundingAttribution as a concatenated list with values separated using a pipe separator (|).
-
Funding attribution ID: A list of the globally unique identifiers for the funding organizations or agencies that supported the project, concatenated and separated using a pipe separator (|), can be provided in dwc:fundingAttributionID.
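Status | Term |
---|---|
Share if available | dwc:projectTitle |
Share if available | dwc:projectID |
Share if available | dwc:fundingAttribution |
Share if available | dwc:fundingAttributionID |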
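The pipe-separated lists used for these terms can be built consistently with a small helper. This sketch assumes the Darwin Core recommendation of a space, vertical bar, space (" | ") between list values; the project titles are hypothetical. It drops blanks and repeats, and rejects values that themselves contain a vertical bar, since those could not be split back apart.

```python
SEPARATOR = " | "  # Darwin Core recommends space, vertical bar, space between list values


def join_values(values):
    """Concatenate values into one pipe-separated string, dropping blanks and repeats."""
    kept = []
    for value in values:
        value = value.strip()
        if "|" in value:
            raise ValueError(f"value contains the separator character: {value!r}")
        if value and value not in kept:
            kept.append(value)
    return SEPARATOR.join(kept)


def split_values(text):
    """Split a pipe-separated string back into a list of trimmed values."""
    return [part.strip() for part in text.split("|") if part.strip()]


# Hypothetical project titles for one Event row.
project_title = join_values(["Example Bird Census", "Example Habitat Monitoring", ""])
print(project_title)
```

The same helper applies to dwc:projectID, dwc:fundingAttribution, and dwc:fundingAttributionID, which keeps the separator identical across all four terms.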
3.3. Survey event site
An Event site is the location at which observations are made or samples and/or measurements are taken. Sharing thorough information about a sampling Event site, including description, locality, and vegetation cover, provides critical context to potential data users about the conditions in which a survey was conducted. Information about the location of each survey site, such as best-practice georeferences, site description (locality name, habitat type, microhabitat), and environmental data (e.g., physical parameters, vegetation, water quality), should be populated for each Event for which the information is available.
The Darwin Core site terms listed in this section are not comprehensive. Explore all Darwin Core Location class terms and the Humboldt Extension site terms.
3.3.1. Site description
Additional context about a survey site can be reported through a variety of terms for every Event for which the information is available, including:
-
Site names: survey site names can be reported using eco:verbatimSiteNames. A concatenated list of site names can be provided at higher Event levels with values separated using a pipe separator (|).
-
Habitat: reported habitat at a survey site should be recorded in dwc:habitat. A concatenated list of habitats can be provided at higher Event levels with values separated using a pipe separator (|). Use of a controlled vocabulary is recommended.
-
Weather: reported weather during a survey Event should be reported using eco:reportedWeather. If you have detailed weather data (e.g., weather station or data logger produced data) archived elsewhere, you may provide a link here.
-
Extreme conditions: reported extreme conditions at a site at the time of the survey should be recorded in eco:reportedExtremeConditions.
-
Verbatim site description: verbatim comments (e.g., the original textual description) about a site or sites should be recorded in eco:verbatimSiteDescriptions.
These terms should be populated for each individual Event for which the information is accurate.
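Status | Term |
---|---|
Share if available | eco:verbatimSiteNames |
Share if available | dwc:habitat |
Share if available | eco:reportedWeather |
Share if available | eco:reportedExtremeConditions |
Share if available | eco:verbatimSiteDescriptions |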
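The concatenated lists of site names or habitats at higher Event levels can be assembled from the values already recorded on the child Events. A minimal sketch, assuming only direct children are considered and using hypothetical plot identifiers and habitat values:

```python
def roll_up(events, term, separator=" | "):
    """Concatenate the distinct values of `term` from each parent's direct child Events."""
    collected = {}
    for event in events:
        parent = event.get("parentEventID", "")
        value = event.get(term, "").strip()
        if parent and value:
            values = collected.setdefault(parent, [])
            if value not in values:  # keep first-seen order, skip repeats
                values.append(value)
    return {parent: separator.join(values) for parent, values in collected.items()}


# Hypothetical plot Events nested under one site Event.
plots = [
    {"eventID": "siteX_plot1", "parentEventID": "siteX", "habitat": "dry grassland"},
    {"eventID": "siteX_plot2", "parentEventID": "siteX", "habitat": "scrub"},
    {"eventID": "siteX_plot3", "parentEventID": "siteX", "habitat": "dry grassland"},
]
site_habitats = roll_up(plots, "habitat")
print(site_habitats)
```

The result maps each parent eventID to the value to place in that parent's dwc:habitat (or eco:verbatimSiteNames) column; the child Events keep their own individual values.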
3.3.2. Site locality
The geographic location and extent of each survey site should be reported. Five terms are currently recommended for Event datasets:
-
Location ID: a unique identifier for each survey site should be shared in dwc:locationID. If a site is visited repeatedly (as in long-term monitoring and other repeated survey efforts), dwc:locationID should be consistent across Events within a dataset and across datasets in situations where the same survey sites are visited in other datasets.
-
Country code: the two-letter ISO 3166-1 code for the country, region, or economy in which a survey takes place should be provided in dwc:countryCode.
-
Latitude-longitude: The decimal latitude, decimal longitude, and geodetic datum of each survey site should be reported in dwc:decimalLatitude, dwc:decimalLongitude, and dwc:geodeticDatum. All three terms should be populated together.
-
If the geographic coordinates of your dataset are not in decimal latitude and decimal longitude format, use the terms dwc:verbatimLatitude, dwc:verbatimLongitude, and dwc:verbatimCoordinateSystem to report geographic location instead.
-
Note that this is a minimum recommendation and does not make data fit for the maximum number of purposes. It is highly recommended to provide georeference information that follows best practices.
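A simple consistency check for the three coordinate terms might look like the following. It is an illustrative sketch only: it tests that the terms are populated together and that values fall within the valid ranges for latitude and longitude, and it does not assess georeference quality. The coordinate values and the datum value EPSG:4326 (WGS84) used in the example are assumptions.

```python
def check_coordinates(row):
    """Return a list of problems with the decimal coordinates of one event row."""
    problems = []
    fields = ("decimalLatitude", "decimalLongitude", "geodeticDatum")
    filled = [bool(row.get(field, "").strip()) for field in fields]
    if any(filled) and not all(filled):
        problems.append("decimalLatitude, decimalLongitude and geodeticDatum should be populated together")
    if filled[0] and filled[1]:
        try:
            lat = float(row["decimalLatitude"])
            lon = float(row["decimalLongitude"])
        except ValueError:
            problems.append("coordinates are not decimal numbers")
        else:
            if not -90 <= lat <= 90:
                problems.append(f"decimalLatitude out of range: {lat}")
            if not -180 <= lon <= 180:
                problems.append(f"decimalLongitude out of range: {lon}")
    return problems


print(check_coordinates({"decimalLatitude": "51.36", "decimalLongitude": "3.36",
                         "geodeticDatum": "EPSG:4326"}))
```

Rows that only have verbatim coordinates are left to the verbatim terms described above and are not checked here.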
Survey site area
Reporting additional information about the areas targeted for sampling and the area(s) actually sampled during a survey is recommended to provide greater context about the geospatial scope of a survey. The Humboldt extension includes two sets of paired terms to report the survey area of an Event: geospatial scope terms and total area sampled terms.
-
Geospatial scope terms (eco:geospatialScopeAreaValue and eco:geospatialScopeAreaUnit) define the geospatial scope or extent of a survey or sampling Event. Geospatial scope terms can be applied at any Event level and should report the entire area considered for the survey.
-
Total area sampled terms (eco:totalAreaSampledValue and eco:totalAreaSampledUnit) report the area actually sampled during an Event. Total area sampled terms can be populated at any Event level but are most commonly applied at lower Event levels, for example to capture the survey extent of a single plot; at higher Event levels, they can capture the cumulative area surveyed in a series of plots within a site.
In non-nested event datasets, geospatial scope terms and total area sampled terms may contain the same values.
In nested datasets, geospatial scope terms will be equal to or greater than the area values shared in total area sampled terms. See Box 2 for an example.
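The relationship between the two pairs of area terms can be checked once both are converted to a common unit. In this sketch, the unit strings and conversion table are assumptions, because the guide does not prescribe a unit vocabulary; extend the mapping to match the units in your own data. The site values are hypothetical.

```python
# Conversion factors to square metres; the unit strings are assumptions, so
# extend this mapping to match the units used in your own dataset.
TO_SQUARE_METRES = {"m²": 1.0, "m2": 1.0, "ha": 10_000.0, "km²": 1_000_000.0, "km2": 1_000_000.0}


def area_m2(value, unit):
    """Convert an area value and unit to square metres."""
    return float(value) * TO_SQUARE_METRES[unit.strip()]


def sampled_within_scope(row):
    """True if the total area sampled does not exceed the geospatial scope area."""
    scope = area_m2(row["geospatialScopeAreaValue"], row["geospatialScopeAreaUnit"])
    sampled = area_m2(row["totalAreaSampledValue"], row["totalAreaSampledUnit"])
    return sampled <= scope


# Hypothetical parent Event: a 1 km² site in which 400 m² of plots were surveyed.
site_event = {
    "geospatialScopeAreaValue": "1", "geospatialScopeAreaUnit": "km²",
    "totalAreaSampledValue": "400", "totalAreaSampledUnit": "m²",
}
print(sampled_within_scope(site_event))
```

An unknown unit raises a KeyError rather than being silently ignored, which makes gaps in the conversion table visible.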
If the surveyed unit is not an area (e.g., km² or m²), dwc:sampleSizeValue and dwc:sampleSizeUnit should be used instead. Examples include:
-
point locations (such as a sensor or trap),
-
distances (such as transect lengths), and
-
volumetric measures (such as a filtered volume of water in a zooplankton haul).
Additional survey site information
-
Survey site geometry: If available, the geometry of a survey site area should be shared using dwc:footprintWKT and dwc:footprintSRS.
-
Verbatim site location information: A more general text description of the site location, if available, can be shared using dwc:locality.
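Status | Term |
---|---|
Recommended | dwc:locationID |
Recommended | dwc:countryCode |
Recommended | dwc:decimalLatitude |
Recommended | dwc:decimalLongitude |
Recommended | dwc:geodeticDatum |
Recommended | eco:geospatialScopeAreaValue |
Recommended | eco:geospatialScopeAreaUnit |
Recommended | eco:totalAreaSampledValue |
Recommended | eco:totalAreaSampledUnit |
Recommended | dwc:sampleSizeValue |
Recommended | dwc:sampleSizeUnit |
Share if available | dwc:footprintWKT |
Share if available | dwc:footprintSRS |
Share if available | dwc:locality |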
3.3.3. Vegetation cover
If vegetation cover data are available for a site (for example, if a relevé was conducted or if a textual site description was provided), it can be reported in three ways:
-
Verbatim vegetation cover: a verbatim or textual description of vegetation cover can be captured using eco:verbatimSiteDescriptions.
-
Percent vegetation cover: simple percent vegetation cover can be recorded as structured data using the Extended Measurement or Fact (eMoF) extension. Data reported using eMoF should be linked to the appropriate Event using dwc:eventID. See the 'Extended measurement or fact (eMoF) extension' section for details on using the extension.
-
Vegetation plot survey: vegetation plot survey information (that is, data collected during a relevé) can be reported using the relevé extension. Data from individual relevés should be linked to the appropriate Event using dwc:eventID. See the 'Relevé extension' section for details on using the extension.
There is no single best method of reporting vegetation cover information for a site, although it is recommended to choose the most explicit method possible based on the type of information available.
If vegetation cover is reported using one of the three methods described above, then eco:isVegetationCoverReported = true; otherwise, eco:isVegetationCoverReported = false.
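Where percent cover is shared through the eMoF extension or plot data through the relevé extension, eco:isVegetationCoverReported can be derived from the rows linked to each eventID. This sketch assumes eMoF cover measurements use a measurementType of "vegetation cover" (a hypothetical label; substitute the one your dataset uses) and that each relevé row carries an eventID. Verbatim descriptions in eco:verbatimSiteDescriptions are not covered and still need to be reviewed by hand.

```python
def vegetation_cover_reported(event_ids, emof_rows, releve_rows,
                              cover_types=("vegetation cover",)):
    """Return eco:isVegetationCoverReported ("true"/"false") for each eventID."""
    with_cover = {
        row["eventID"] for row in emof_rows
        if row.get("measurementType", "").strip().lower() in cover_types
    }
    with_cover |= {row["eventID"] for row in releve_rows}
    return {eid: "true" if eid in with_cover else "false" for eid in event_ids}


# Hypothetical linked rows: e1 has an eMoF cover value, e2 has a relevé, e3 has neither.
emof = [{"eventID": "e1", "measurementType": "Vegetation cover",
         "measurementValue": "65", "measurementUnit": "%"}]
releve = [{"eventID": "e2"}]
flags = vegetation_cover_reported(["e1", "e2", "e3"], emof, releve)
print(flags)
```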
3.4. Survey date and time
Complete and accurate reporting of the temporal scope of a survey is crucial to asserting Event structure and providing key contextual information about sampling conditions.
Each Event should include a date or date range in dwc:eventDate. Nested datasets should, at the parent Event level, report a date range encompassing the dates of all relevant child Events.
The time and duration of each Event should be reported using dwc:eventTime and the paired terms eco:eventDurationValue and eco:eventDurationUnit respectively.
Refer to GBIF’s technical documentation on date and time interpretation for more guidance on reporting Event dates and times.
Status | Term |
---|---|
Required | dwc:eventDate |
Recommended | dwc:eventTime |
Recommended | eco:eventDurationValue |
Recommended | eco:eventDurationUnit |
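The date range for a parent Event can be computed from the dwc:eventDate values of its child Events. A minimal sketch, assuming every child value is either a full date (YYYY-MM-DD) or an interval of two full dates separated by a slash; partial dates such as 2019-05 or abbreviated interval ends would need extra handling. The dates in the example are hypothetical.

```python
from datetime import date


def parent_date_range(child_dates):
    """Return an ISO 8601 date or date interval spanning all child eventDate values.

    Each value is assumed to be a full date (YYYY-MM-DD) or an interval written
    with two full dates (YYYY-MM-DD/YYYY-MM-DD).
    """
    starts, ends = [], []
    for value in child_dates:
        start, _, end = value.strip().partition("/")
        starts.append(date.fromisoformat(start))
        ends.append(date.fromisoformat(end or start))
    first, last = min(starts), max(ends)
    if first == last:
        return first.isoformat()
    return f"{first.isoformat()}/{last.isoformat()}"


print(parent_date_range(["2019-05-12", "2019-05-20/2019-05-22", "2019-05-14"]))
```

If all child Events fall on the same day, a single date is returned instead of an interval.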
3.5. Sampling Event protocol
Sampling protocols provide the details of how a survey was conducted. Protocol information should be a detailed, step-wise description of the data collection process, with all the details necessary to ensure repeatability of the implemented methodology. Clear communication of the sampling protocol or method(s) implemented during a survey or monitoring effort supports consistency, accuracy, and reliability in the data collected. This information further ensures reproducibility and reusability of a dataset, and facilitates data aggregation, integration, and subsequent analysis.
Sampling protocol terms should be populated for every Event regardless of hierarchical level, as inheritance in either direction should not be assumed or inferred between Event levels.
3.5.1. Event type
Biological survey Event data can result from a wide variety of effort types (e.g., Bioblitzes, inventories, monitoring schemes, expeditions). The nature of the survey Event should be reported using dwc:eventType.
dwc:eventType should provide a high-level overview of the survey type but should not be so specific as to overlap with the sampling protocol. There is no single, standardized vocabulary for dwc:eventType. If your organization or community has a controlled vocabulary, it is recommended to apply terms from it. Otherwise, you can refer to the common event types below for guidance.
Inventory Event types
If dwc:eventType = inventory, the type(s) of search implemented (e.g., restricted search, open search, opportunistic search, trap or sample, compilation) must be reported in eco:inventoryTypes.
If eco:inventoryTypes = compilation, the compilation type should be reported using eco:compilationTypes and data sources listed in eco:compilationSourceTypes.
A compilation is a summary inventory resulting from the combination of multiple existing inventories (as described in [Guralnick2018]). Compilations are aggregates of multiple studies, compiled data sources, and/or literature searches, and may combine surveys employing different protocols, processes, and observers, often with variable reporting of the methods employed.
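Status | Term |
---|---|
Recommended | dwc:eventType |
Recommended if applicable | eco:inventoryTypes |
Recommended if applicable | eco:compilationTypes |
Recommended if applicable | eco:compilationSourceTypes |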
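The conditional requirements above lend themselves to a simple row-level check. A sketch, assuming multiple inventory types are pipe-separated in eco:inventoryTypes and that eventType values are compared case-insensitively; the messages and example values are illustrative only.

```python
def check_inventory_terms(row):
    """Return problems with the inventory-related terms of one event row."""
    problems = []
    if row.get("eventType", "").strip().lower() != "inventory":
        return problems  # the conditions below only apply to inventory Events
    inventory_types = [
        t.strip().lower() for t in row.get("inventoryTypes", "").split("|") if t.strip()
    ]
    if not inventory_types:
        problems.append("eventType is inventory but eco:inventoryTypes is empty")
    if "compilation" in inventory_types:
        for term in ("compilationTypes", "compilationSourceTypes"):
            if not row.get(term, "").strip():
                problems.append(f"inventoryTypes includes compilation but eco:{term} is empty")
    return problems


print(check_inventory_terms({"eventType": "inventory",
                             "inventoryTypes": "open search | compilation"}))
```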
3.5.2. Sampling protocol
Four protocol terms exist; however, only one term, dwc:samplingProtocol, is currently required to publish an Event dataset in GBIF, because the Darwin Core Event class originally included only this one protocol term. The Humboldt extension introduced three additional terms (eco:protocolNames, eco:protocolDescriptions, and eco:protocolReferences) to capture information about sampling protocol in a more explicit manner:
Status | Term |
---|---|
Required | dwc:samplingProtocol |
Recommended | eco:protocolNames |
Recommended | eco:protocolDescriptions |
Recommended | eco:protocolReferences |
3.5.3. Absences (non-detections)
Organismal absences are defined here as the lack of detection of organisms that are members of an explicitly stated target taxonomic scope. Absence information is critical to understanding species' biogeography, modeling species' responses to climate- and human-induced environmental change, conservation planning and resource management, monitoring and restoration efforts, eradications or reintroductions, and other aspects of biodiversity dynamics.
-
If the dataset includes absence information for one or more organisms (to be reported in the occurrence table as dwc:occurrenceStatus = absent), then eco:isAbsenceReported = true.
-
A list of absent taxa can be provided using eco:absentTaxa for all relevant Events. Best practice is to use scientific names to report absent taxa.
-
Absences should only be reported for taxa within the stated taxonomic and/or organismal scope of a survey and should use scientific nomenclature.
-
Absence cannot be asserted for bycatch.
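When absences are already recorded as occurrence rows with dwc:occurrenceStatus = absent, Event-level absence terms can be derived from the occurrence table. This sketch assumes occurrence rows carry eventID and scientificName, and it sets the flags per Event from that Event's own rows. It cannot check whether names fall within the stated taxonomic scope or exclude bycatch, so review its output against the survey's target scope. The species and identifiers are hypothetical examples.

```python
def absence_terms(occurrences):
    """Derive eco:isAbsenceReported and eco:absentTaxa per eventID from occurrence rows."""
    absent = {}
    for occ in occurrences:
        names = absent.setdefault(occ["eventID"], [])
        if occ.get("occurrenceStatus", "").strip().lower() == "absent":
            name = occ.get("scientificName", "").strip()
            if name and name not in names:
                names.append(name)
    return {
        event_id: {
            "isAbsenceReported": "true" if names else "false",
            "absentTaxa": " | ".join(names),
        }
        for event_id, names in absent.items()
    }


# Hypothetical occurrence rows for two Events.
example_occurrences = [
    {"eventID": "e1", "scientificName": "Parus major", "occurrenceStatus": "present"},
    {"eventID": "e1", "scientificName": "Sitta europaea", "occurrenceStatus": "absent"},
    {"eventID": "e2", "scientificName": "Parus major", "occurrenceStatus": "present"},
]
summary = absence_terms(example_occurrences)
print(summary)
```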
3.5.4. Abundance
Abundance is a quantitative measure of organisms of the same taxonomic designation in a particular area at a specific time. Abundance data are a key indicator of ecological health. They are necessary for evaluating ecological patterns and dynamics, managing invasive species, informing effective habitat and ecosystem management, and for practical tasks such as quantifying existing resources.
-
If the dataset includes any abundance information, eco:isAbundanceReported = true for all appropriate Events. If there is an abundance cap (that is, if there was a designated maximum value at which abundance was no longer counted), then eco:isAbundanceCapReported = true and the value of the cap should be reported in eco:abundanceCap.
-
If there is no abundance cap, then eco:isAbundanceCapReported = false.
Status | Term | Example entry |
---|---|---|
Recommended |
|
|
|
||
|
||
Share if available |
||
|
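The abundance and abundance-cap rules can be sketched as follows. This is a hypothetical helper: `counts` (the counts recorded in one Event, with None where abundance was not counted) and `cap` are illustrative inputs, not terms from Darwin Core or the Humboldt extension. The check that no count exceeds the cap follows from the definition of a cap as the value at which counting stopped.

```python
def abundance_terms(counts, cap=None):
    """Derive abundance-related Humboldt terms for a single Event.

    Hypothetical helper: `counts` holds the counts recorded in the Event
    (None where abundance was not counted); `cap` is the designated
    maximum count, or None if there was no abundance cap.
    """
    reported = any(c is not None for c in counts)
    terms = {"eco:isAbundanceReported": "true" if reported else "false"}
    if cap is None:
        terms["eco:isAbundanceCapReported"] = "false"
        return terms
    # Counting stopped at the cap, so no recorded count should exceed it.
    if any(c is not None and c > cap for c in counts):
        raise ValueError("a recorded count exceeds the abundance cap")
    terms["eco:isAbundanceCapReported"] = "true"
    terms["eco:abundanceCap"] = str(cap)
    return terms


print(abundance_terms([3, 50, None], cap=50))
```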
3.5.5. Material samples
A material sample is a physical entity "… that represents an entity of interest in whole or in part" (see dwc:MaterialSample). Essentially, material samples are specimens collected during a survey. A material sample may consist of an entire organism, part of an organism, or a genetic sample, or even multiple organisms not necessarily of the same taxonomic designation.
If the dataset includes at least one specimen from which a material sample was taken, for each relevant Event:
- eco:hasMaterialSamples = true, and
- the type(s) of materials collected should be listed in eco:materialSampleTypes.
If the dataset or Event does not include material samples, eco:hasMaterialSamples = false.
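As a sketch of how these two terms might be derived from occurrence rows, the hypothetical helper below assumes that rows with dwc:basisOfRecord = MaterialSample represent material samples and that dwc:preparations describes the sample type. Both are mapping assumptions for illustration; your own data may record sample types elsewhere.

```python
def material_sample_terms(occurrences):
    """Derive eco:hasMaterialSamples and eco:materialSampleTypes for one Event.

    Hypothetical helper: treats rows with dwc:basisOfRecord == "MaterialSample"
    as material samples and uses dwc:preparations as the sample type.
    """
    samples = [o for o in occurrences if o.get("dwc:basisOfRecord") == "MaterialSample"]
    types = []
    for occ in samples:
        sample_type = occ.get("dwc:preparations")
        # Keep each type once, in the order first seen.
        if sample_type and sample_type not in types:
            types.append(sample_type)
    terms = {"eco:hasMaterialSamples": "true" if samples else "false"}
    if types:
        terms["eco:materialSampleTypes"] = " | ".join(types)
    return terms
```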
3.5.6. Vouchers
A voucher is a physical specimen or material sample collected and accessioned into a museum collection in support of a specific project or survey.
If the dataset has vouchers, for each relevant Event:
- eco:hasVouchers = true, and
- a list of institutions housing them should be shared in eco:voucherInstitutions.
If the dataset or sampling event does not include vouchers, eco:hasVouchers = false.
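A comparable sketch for vouchers follows. It is hypothetical and assumes that vouchers appear as rows with dwc:basisOfRecord = PreservedSpecimen and that dwc:institutionCode names the institution housing each one; the institution codes in the usage line are placeholders.

```python
def voucher_terms(occurrences):
    """Derive eco:hasVouchers and eco:voucherInstitutions for one Event.

    Hypothetical helper: rows with dwc:basisOfRecord == "PreservedSpecimen"
    are treated as vouchers; dwc:institutionCode names the housing institution.
    """
    vouchers = [o for o in occurrences if o.get("dwc:basisOfRecord") == "PreservedSpecimen"]
    terms = {"eco:hasVouchers": "true" if vouchers else "false"}
    # De-duplicate and sort so the list is stable between exports.
    institutions = sorted({o["dwc:institutionCode"] for o in vouchers if o.get("dwc:institutionCode")})
    if institutions:
        terms["eco:voucherInstitutions"] = " | ".join(institutions)
    return terms


print(voucher_terms([{"dwc:basisOfRecord": "PreservedSpecimen", "dwc:institutionCode": "MUSEUM-A"}]))
```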
3.5.7. Least specific target category quantity inclusive
The term eco:isLeastSpecificTargetCategoryQuantityInclusive indicates whether the total number of organisms detected for a dwc:Taxon (including all its subgroups) is given in a single record, either in dwc:individualCount or in the paired terms dwc:organismQuantity and dwc:organismQuantityType in the occurrence table. This true/false (Boolean) term helps data users know whether the numbers given in these terms include all organisms of that dwc:Taxon.
- For eco:isLeastSpecificTargetCategoryQuantityInclusive to be true, the values shared in dwc:individualCount or dwc:organismQuantity and dwc:organismQuantityType for a single Occurrence record are inclusive of all organisms of that dwc:Taxon detected during the Event.
- For eco:isLeastSpecificTargetCategoryQuantityInclusive to be false, the values shared in dwc:individualCount or dwc:organismQuantity and dwc:organismQuantityType for a single Occurrence record are not inclusive of all organisms of the dwc:Taxon detected during the survey Event. This means that to find the total number of organisms detected for a given dwc:Taxon, you need to add up the dwc:individualCount or dwc:organismQuantity values from multiple occurrence records within the Event.
See Guidelines for eco:isLeastSpecificTargetCategoryQuantityInclusive [tdwg2024a] for more information.
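How a data user would total organisms under each setting can be sketched as below. This hypothetical helper assumes counts are in dwc:individualCount and that dwc:higherClassification lists parent taxa separated by pipes; the records in the usage lines are invented for illustration.

```python
def total_count(occurrences, taxon, quantity_inclusive):
    """Total organisms of `taxon` (including its subgroups) detected in one Event.

    Hypothetical helper: assumes dwc:individualCount holds the counts and
    dwc:higherClassification lists parent taxa separated by pipes.
    """
    if quantity_inclusive:
        # The record for the taxon itself already includes all of its subgroups.
        return sum(int(o["dwc:individualCount"]) for o in occurrences
                   if o["dwc:scientificName"] == taxon)

    def in_taxon(occ):
        parents = [p.strip() for p in occ.get("dwc:higherClassification", "").split("|")]
        return occ["dwc:scientificName"] == taxon or taxon in parents

    # Counts are split across records, so add up the taxon and all its subgroups.
    return sum(int(o["dwc:individualCount"]) for o in occurrences if in_taxon(o))


split = [
    {"dwc:scientificName": "Aves", "dwc:individualCount": "7"},
    {"dwc:scientificName": "Turdus merula", "dwc:higherClassification": "Animalia | Chordata | Aves",
     "dwc:individualCount": "3"},
]
print(total_count(split, "Aves", quantity_inclusive=False))  # 10
```

Summing with the wrong setting double-counts (or under-counts) organisms, which is why the term matters to data users.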
3.5.8. Data generalizations & information withheld
Although the general recommendation is to share all biodiversity data available at its highest spatio-temporal resolution, situations exist where it is necessary to generalize data prior to sharing a dataset publicly or even withhold information completely. Two terms are available to communicate if data are generalized or withheld in a dataset: dwc:dataGeneralizations and dwc:informationWithheld.
While it is the responsibility of the publisher to protect sensitive species occurrence data, it is also the data publisher's responsibility to clearly communicate any action(s) taken and to indicate whether the full data are available upon request. How you generalize sensitive data (for example, by restricting the resolution of the data) depends on the species' category of sensitivity. Where there is a low risk of adverse outcomes, unrestricted publication of sensitive species data may remain appropriate. See the published guide Current Best Practices for Generalizing Sensitive Species Occurrence Data for guidance on when and how to generalize or withhold sensitive biodiversity data [Chapman2020]. The guide is also available in French and Spanish.
Reporting data generalizations
When generalizing data you should try not to reduce the value of the data for analysis. A clear summary of the data generalization process should be reported for each relevant Event using dwc:dataGeneralizations.
For example, if the spatial resolution of locality data for an Event is reduced to the nearest half degree, then dwc:dataGeneralizations = "Coordinates generalized from original GPS coordinates to the nearest half degree grid cell" for each Event to which this treatment was applied. If the location information was generalized for every survey site in a nested hierarchy, then at the parent Event level dwc:dataGeneralizations = "Coordinates for each event site generalized from original GPS coordinates to the nearest half degree grid cell".
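The half-degree example can be sketched in Python. This sketch assumes that "nearest half degree grid cell" means replacing each coordinate with the centre of the 0.5° cell that contains it; other conventions (such as rounding to the nearest 0.5°) are equally possible, so state whichever you use in dwc:dataGeneralizations. The helper names are hypothetical.

```python
import math

HALF_DEGREE = 0.5
GENERALIZATION_NOTE = (
    "Coordinates generalized from original GPS coordinates "
    "to the nearest half degree grid cell"
)


def to_half_degree_cell(value):
    """Return the centre of the 0.5-degree grid cell containing `value`."""
    return math.floor(value / HALF_DEGREE) * HALF_DEGREE + HALF_DEGREE / 2


def generalize_event(event):
    """Return a copy of `event` with generalized coordinates and a note
    recording the treatment in dwc:dataGeneralizations."""
    generalized = dict(event)
    generalized["dwc:decimalLatitude"] = to_half_degree_cell(event["dwc:decimalLatitude"])
    generalized["dwc:decimalLongitude"] = to_half_degree_cell(event["dwc:decimalLongitude"])
    generalized["dwc:dataGeneralizations"] = GENERALIZATION_NOTE
    return generalized


print(to_half_degree_cell(55.6761), to_half_degree_cell(12.5683))  # 55.75 12.75
```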
Reporting information withheld
If specific data are not reported with the dataset, a clarifying statement should be provided with each affected Event using dwc:informationWithheld.
For example, if sensitive species data are purposefully excluded from the published data, dwc:informationWithheld should include a statement along the lines of "Sensitive species occurrence information not reported".
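Withholding and documenting sensitive records can be sketched together. In this hypothetical helper, `sensitive_names` is a publisher-maintained set of scientific names (it is not supplied by Darwin Core or the Humboldt extension), and the note is only added when something was actually removed.

```python
WITHHELD_NOTE = "Sensitive species occurrence information not reported"


def withhold_sensitive(event, occurrences, sensitive_names):
    """Drop occurrences of sensitive taxa and note this on the Event.

    Hypothetical helper: `sensitive_names` is a publisher-maintained set of
    scientific names. The input event dict is not modified.
    """
    published = [o for o in occurrences if o["dwc:scientificName"] not in sensitive_names]
    if len(published) < len(occurrences):
        event = {**event, "dwc:informationWithheld": WITHHELD_NOTE}
    return event, published
```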
3.5.9. Verbatim fields
Two verbatim fields are available to provide additional information about an Event.
- Field notes: Field notes can be copied, transcribed verbatim, or linked into dwc:fieldNotes.
- Event remarks: Additional comments about a particular Event that don’t fit in any other term can be shared using dwc:eventRemarks.
Both fields can be applied to any Event at any level.
Status | Term | Example entry |
---|---|---|
Recommended |
|
|
|
||
|
||
Share if available |
|
|
|
||
|
||
|
||
|
||
3.6. Scope and completeness
Survey scope identifies the organisms targeted (or not targeted) during a survey. Structured reporting of explicitly stated survey scopes is necessary for evaluating and reporting completeness and is critical to understanding if the data can be used to assert absences (non-detections) of taxa.
Completeness indicates the thoroughness of a survey relative to the stated scope. Reported scope and completeness information enables downstream data users to interpret species populations and areas of occupancy, infer species absences, etc.
The 'target' and 'excluded' scope terms (e.g., eco:targetTaxonomicScope) presented in this section are the only Event terms designed to capture intent. That is, these terms capture the breadth of information the biological survey intended to capture. All other terms should be used to report what actually happened during the survey (e.g., what protocol was actually implemented, what information was actually collected).
3.6.1. Verbatim scope
The complete scope explicitly identifying the full suite of stated parameters defining the breadth of a sampling Event should be reported using eco:verbatimTargetScope. eco:verbatimTargetScope is particularly useful for capturing scope conditions not covered by existing taxonomic or organismal scope terms.
Status | Term | Example entry |
---|---|---|
Recommended |
|
3.6.2. Taxonomic scope
Reporting taxonomic scope enables reliable, quantitative, and statistical interpretation of survey and monitoring data. Knowledge of taxonomic scope is essential to interpret local non-detection of taxa as local absences. The taxonomic scope, stated either as targeted or intentionally excluded taxa, should be reported using eco:targetTaxonomicScope and eco:excludedTaxonomicScope.
If every organism within the stated eco:targetTaxonomicScope that was observed during an Event was reported, then eco:isTaxonomicScopeFullyReported = true; if not, eco:isTaxonomicScopeFullyReported = false.
Knowledge about taxonomic completeness allows data users to determine how comprehensively an area was sampled.
- If taxonomic completeness is reported, eco:taxonCompletenessReported = reportedComplete or reportedIncomplete as appropriate, and the method used to assess completeness should be reported in eco:taxonCompletenessProtocols.
- If taxonomic completeness is not reported, eco:taxonCompletenessReported = notReported.
If specific people or organizations are reported as having made the taxonomic identifications relevant to the stated survey scope(s), they should be acknowledged in dwc:identifiedBy. A list of names can be shared with values separated by a pipe (|). It is not possible to share a list of unique identifiers such as ORCIDs at the Event level.
Status | Term | Example entry |
---|---|---|
Recommended |
|
|
|
||
Share if available |
|
|
|
||
|
||
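The taxonomic scope and completeness rules in this section can be turned into a simple pre-publication check. The sketch below is a hypothetical checker that encodes only the rules described here (the three controlled values for eco:taxonCompletenessReported, a protocol whenever completeness is reported, and a true/false eco:isTaxonomicScopeFullyReported whenever a target taxonomic scope is stated); it is not a GBIF validator.

```python
COMPLETENESS_VALUES = {"reportedComplete", "reportedIncomplete", "notReported"}


def check_taxon_completeness(event):
    """Return a list of problems with the taxonomic scope/completeness terms of one Event.

    Hypothetical checker: it only encodes the rules described in this section.
    """
    problems = []
    status = event.get("eco:taxonCompletenessReported")
    if status not in COMPLETENESS_VALUES:
        problems.append(f"unexpected eco:taxonCompletenessReported value: {status!r}")
    elif status != "notReported" and not event.get("eco:taxonCompletenessProtocols"):
        problems.append("completeness is reported but eco:taxonCompletenessProtocols is empty")
    if event.get("eco:targetTaxonomicScope") and event.get("eco:isTaxonomicScopeFullyReported") not in {"true", "false"}:
        problems.append("eco:isTaxonomicScopeFullyReported should be true or false")
    return problems


print(check_taxon_completeness({"eco:taxonCompletenessReported": "reportedComplete"}))
```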
3.6.3. Organismal scope
Why are organismal scope terms important?
As with taxonomic scope, providing information about other organismal scopes when relevant enables reliable, quantitative interpretation of survey and monitoring data and can be essential to interpreting local non-detection as local absences.
Organismal scope terms
Three categories of terms are available with which to report an explicitly stated target or excluded organismal scope, and to state whether or not all target organisms observed were reported.
Any explicitly stated target or excluded organismal scopes, and clarification as to whether or not all target organisms observed were reported (true or false), should be indicated using the following terms:
- Life stage: eco:targetLifeStageScope, eco:excludedLifeStageScope, eco:isLifeStageScopeFullyReported
- Growth form: eco:targetGrowthFormScope, eco:excludedGrowthFormScope, eco:isGrowthFormScopeFullyReported
- Degree of establishment: eco:targetDegreeOfEstablishmentScope, eco:excludedDegreeOfEstablishmentScope, eco:isDegreeOfEstablishmentScopeFullyReported
Any additional organismal scopes should be reported using eco:verbatimTargetScope.
Status | Term | Example entry |
---|---|---|
Share if available |
|
|
|
||
|
||
|
||
|
||
|
||
|
||
|
3.6.4. Bycatch
Bycatch are organisms detected during a survey that were not explicitly targeted in the scope of a study. Bycatch, or a lack thereof, in a dataset can be reported at the taxonomic and organismal levels.
If taxonomic bycatch are reported:
- eco:hasNonTargetTaxa = true for all relevant Events.
- If all taxonomic bycatch (eco:hasNonTargetTaxa = true) captured or observed during an Event are reported in the dataset:
  - eco:areNonTargetTaxaFullyReported = true, and
  - a list of taxonomic bycatch should be shared in eco:nonTargetTaxa using scientific nomenclature. Entries in the list should be separated by a pipe (|).
If organismal bycatch are reported:
- eco:hasNonTargetOrganisms = true at all relevant Event levels.
If the dataset does NOT include taxonomic or organismal bycatch:
- eco:hasNonTargetTaxa = false for all relevant Events, and
- eco:hasNonTargetOrganisms = false for all relevant Events.
Status | Term | Example entry |
---|---|---|
Share if available |
|
|
|
||
|
||
|
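The taxonomic bycatch rules can be sketched with the same simplifying assumption used for absences: a hypothetical helper where the target taxonomic scope is a set of class names and occurrence rows are dicts keyed by Darwin Core term names. Whether all bycatch was reported is a publisher judgement, so it is passed in as `all_bycatch_reported`. Organismal bycatch (eco:hasNonTargetOrganisms) is not covered by this sketch.

```python
def bycatch_terms(occurrences, target_classes, all_bycatch_reported):
    """Derive taxonomic bycatch terms for one Event.

    Hypothetical helper: any occurrence whose dwc:class is outside the
    target scope is treated as taxonomic bycatch.
    """
    non_target = []
    for occ in occurrences:
        name = occ["dwc:scientificName"]
        if occ["dwc:class"] not in target_classes and name not in non_target:
            non_target.append(name)
    if not non_target:
        return {"eco:hasNonTargetTaxa": "false"}
    return {
        "eco:hasNonTargetTaxa": "true",
        "eco:areNonTargetTaxaFullyReported": "true" if all_bycatch_reported else "false",
        # Scientific names, separated by a pipe.
        "eco:nonTargetTaxa": " | ".join(non_target),
    }
```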
3.6.5. Habitat scope
If the survey has an explicitly stated target or excluded habitat scope, it can be reported in eco:targetHabitatScope and eco:excludedHabitatScope.
The actual habitat observed at a survey site during an Event should be reported in dwc:habitat.
Status | Term | Example entry |
---|---|---|
Share if available |
|
|
|
3.7. Sampling effort
Sampling effort communicates information about the likelihood that a type of organism would be detected: greater effort generally equals a higher probability of detection. Clear reporting of sampling effort is necessary for interpreting measures of completeness and calculating abundance (relative or absolute) or biomass, and is critical in assessing whether information can be compared and data aggregated across studies.
The DwC Event term dwc:samplingEffort is currently a recommended field when publishing Event datasets to GBIF; however, this term captures sampling effort in an unstructured way. The Humboldt extension includes five terms to capture different aspects of sampling effort more explicitly. The updated recommended best practice is to report sampling effort information as structured data using the Humboldt extension terms. Through these terms, data providers may explicitly provide the following information:
- Is sampling effort reported?: Indicate whether sampling effort is reported (true or false) in eco:isSamplingEffortReported.
- Sampling effort protocol: eco:samplingEffortProtocol should contain a textual description of the sampling effort protocol (e.g., number and arrangement of people or sensors deployed, whether sensors were mobile or stationary, how frequently observations, measurements, or samples were taken) and/or provide a link to the protocol used.
- Sampling effort: report the sampling effort value (e.g., the total amount of time of the sampling Event, the total number of people involved) and units (e.g., trap nights, people) using the paired terms eco:samplingEffortValue and eco:samplingEffortUnit.
- Sampling performed by: eco:samplingPerformedBy should be used to credit the people involved in the sampling Event. The names of one or more people can be reported, with individual names in a list separated by a pipe (|). Best practice is to use a unique identifier (e.g., ORCID) if available.
  - NOTE: Because eco:samplingPerformedBy has an IRI (internationalized resource identifier) equivalent, only a single ORCID can be provided (the term cannot support a list). If more than one ORCID needs to be shared, a list of ORCIDs (using the pipe separator between values) can be supplied using the term dwc:recordedByID, BUT it must be applied to each relevant Occurrence in the occurrence table.
Status | Term | Example entry |
---|---|---|
Recommended |
|
|
|
||
|
||
|
||
|
||
|
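Structured sampling effort for a simple trapping Event can be sketched as follows. This hypothetical helper assumes effort is expressed as trap nights (number of traps multiplied by number of nights); the observer names and protocol text in the usage line are placeholders.

```python
def sampling_effort_terms(traps, nights, people, protocol):
    """Build structured sampling-effort terms for a trapping Event.

    Hypothetical helper: effort is expressed as trap nights
    (number of traps multiplied by number of nights).
    """
    return {
        "eco:isSamplingEffortReported": "true",
        "eco:samplingEffortProtocol": protocol,
        "eco:samplingEffortValue": str(traps * nights),
        "eco:samplingEffortUnit": "trap nights",
        # Names are pipe-separated; multiple ORCIDs belong in dwc:recordedByID on Occurrences.
        "eco:samplingPerformedBy": " | ".join(people),
    }


terms = sampling_effort_terms(10, 3, ["Observer A", "Observer B"],
                              "10 live traps in a line, checked each morning")
print(terms["eco:samplingEffortValue"], terms["eco:samplingEffortUnit"])  # 30 trap nights
```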
Appendix A: Additional guidance and seeking assistance
Additional DwC Event terms
While all Humboldt extension terms are covered in this guide, the Darwin Core Event terms included are not exhaustive. The full suite of DwC Event terms that can be applied to a DwC-A Event dataset can be found on the Darwin Core Event page of the GBIF Repository of Schemas.
Need more information?
Check out the following documentation:
Or, reach out for assistance from:
- Humboldt Extension GitHub repository: questions about usage, issues with the vocabulary, and recommendations for new terms should be reported as an Issue.
- The GBIF Node for your country or organization:
  - If your country or organization is a member of GBIF and has an established node, you can reach out directly to your node.
  - If you’re uncertain whether your country or organization is part of the GBIF network, you can search here.
  - If your country or organization is not a member of GBIF, reach out to the GBIF helpdesk.
- GBIF helpdesk:
  - Create an issue on the GitHub tech-docs project.
  - Send an email to the GBIF helpdesk.