Handling data quality

Assessing the precision and accuracy of data is a key step in judging whether they are fit for an intended purpose. While GBIF can help identify some quality issues that arise within the data publishing workflow, handling others requires additional expert knowledge. The two most common issues of this kind are:

  • Data gaps - sampling effort is uneven across taxonomic groups and geographic regions, so users may need to account for sampling bias in their analyses before the data can be used effectively.

  • Taxonomic misidentification - for some taxonomic groups, confirming that taxa have been correctly identified may require additional information, such as images, video or audio recordings that accompany the data, or information about the collector.
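As a rough first look at geographic data gaps, you can count records per grid cell: cells with few or no records compared with their neighbours hint at under-sampling. The sketch below assumes each record is a dict using the Darwin Core terms `decimalLatitude` and `decimalLongitude`; the function name `records_per_cell`, the 1-degree cell size and the sample coordinates are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def records_per_cell(records, cell_size=1.0):
    """Count records per grid cell of `cell_size` degrees (illustrative default).

    Records without coordinates are skipped, since they cannot be placed on the grid.
    """
    counts = Counter()
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat is None or lon is None:
            continue
        # Flooring maps each coordinate to the lower-left corner of its cell.
        cell = (math.floor(lat / cell_size) * cell_size,
                math.floor(lon / cell_size) * cell_size)
        counts[cell] += 1
    return counts


# Hypothetical records: three in one cell, one elsewhere, one without coordinates.
sample = [
    {"decimalLatitude": 55.6, "decimalLongitude": 12.5},
    {"decimalLatitude": 55.1, "decimalLongitude": 12.9},
    {"decimalLatitude": 55.9, "decimalLongitude": 12.0},
    {"decimalLatitude": -3.4, "decimalLongitude": -60.2},
    {"decimalLatitude": None, "decimalLongitude": None},
]

# Print cells from most to least sampled.
for cell, n in sorted(records_per_cell(sample).items(), key=lambda kv: -kv[1]):
    print(cell, n)
```

Raw counts only show where sampling is uneven; they do not by themselves correct for sampling bias, which usually needs a method suited to the analysis at hand.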

GBIF flags for data quality issues

During the indexing process, GBIF flags records that show common data quality issues. These issues most often arise from data entry errors or missing fields, and their interpretation can be automated centrally by GBIF. These interpretations are classified as:

  • Excluded - the original value could not be interpreted, so it is left out of the interpreted fields.

  • Altered - the original value was modified during interpretation so that it could be indexed on GBIF.org.

  • Inferred - a value for an empty field was inferred from other information in the record.
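GBIF assigns these flags itself during indexing. If you compare a verbatim value with its interpreted counterpart yourself, the three categories can be expressed as a simple rule, sketched below. The function `classify_interpretation`, the treatment of blank strings as empty fields and the sample values are assumptions for illustration only, not GBIF's own implementation.

```python
from typing import Optional


def _is_empty(value: Optional[str]) -> bool:
    """Treat None and blank strings as an empty field (an assumption)."""
    return value is None or str(value).strip() == ""


def classify_interpretation(verbatim: Optional[str], interpreted: Optional[str]) -> str:
    """Label how one field changed between verbatim and interpreted data.

    excluded  - original value present, interpreted field empty
    inferred  - original field empty, interpreted field filled in
    altered   - both present, but the values differ
    unchanged - both present and identical, or both empty
    """
    v_empty, i_empty = _is_empty(verbatim), _is_empty(interpreted)
    if not v_empty and i_empty:
        return "excluded"
    if v_empty and not i_empty:
        return "inferred"
    if not v_empty and str(verbatim).strip() != str(interpreted).strip():
        return "altered"
    return "unchanged"


# Hypothetical field values.
print(classify_interpretation("31 Feb 2020", None))  # excluded
print(classify_interpretation("Denmark", "DK"))      # altered
print(classify_interpretation("", "DK"))             # inferred
```

Because the comparison is a plain string match, formatting-only changes (for example "12.50" becoming "12.5") also count as altered.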

Be aware that filtering on a data quality issue selects the records flagged with that issue; to exclude those records instead, you need to reverse the filter. If you would like to validate the interpretation process, you can also view the verbatim (non-interpreted) data in a Darwin Core Archive download.
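The difference between selecting and excluding flagged records can be sketched in a few lines. This assumes each record carries a list of issue codes, as in the `issues` field of occurrence records returned by the GBIF API; the helper names `only_flagged` and `exclude_flagged` and the sample records are hypothetical, while `ZERO_COORDINATE`, `COORDINATE_ROUNDED` and `COUNTRY_COORDINATE_MISMATCH` are GBIF issue codes.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def only_flagged(records: Iterable[Record], issue: str) -> List[Record]:
    """Plain filter: keep records that ARE flagged with `issue`."""
    return [r for r in records if issue in r.get("issues", [])]


def exclude_flagged(records: Iterable[Record], unwanted: Iterable[str]) -> List[Record]:
    """Reversed filter: keep records carrying NONE of the `unwanted` issues."""
    unwanted_set = set(unwanted)
    return [r for r in records if unwanted_set.isdisjoint(r.get("issues", []))]


# Hypothetical records with their issue codes.
records = [
    {"gbifID": "1", "issues": ["ZERO_COORDINATE"]},
    {"gbifID": "2", "issues": ["COORDINATE_ROUNDED"]},
    {"gbifID": "3", "issues": []},
    {"gbifID": "4", "issues": ["COUNTRY_COORDINATE_MISMATCH", "COORDINATE_ROUNDED"]},
]

print([r["gbifID"] for r in only_flagged(records, "ZERO_COORDINATE")])  # ['1']
print([r["gbifID"] for r in exclude_flagged(
    records, ["ZERO_COORDINATE", "COUNTRY_COORDINATE_MISMATCH"])])       # ['2', '3']
```

Which issues to exclude depends on the analysis: a flag such as `COORDINATE_ROUNDED` may be harmless for one purpose and disqualifying for another.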

How can I improve data quality?

Data publishers are responsible for improving the quality of their data, and as a user you play a key role in identifying errors. If you find an error in the data, contact the publisher directly using the contact details that GBIF provides on the publisher page. You can also report data quality issues to GBIF using the "Feedback and questions" button on the GBIF.org menu bar.