Exercise tips
Validation checks
Technical errors Relatively simple, often able to be automated, checks against the integrity of the data. These may indicate incorrect exports, data mapping, field slippage (e.g. moving 1 column to the right) or data missing at the source.
-
Completeness: Whether all the data and metadata is available – are all fields present, are all fields filled out?
-
Bounds: For example, are days given in the range 1-31 (depending on month)
-
Data type: For example, does the Date field contain a date or a number?
-
Data format: For example, are Dates provided as 01/01/2010 or 01/Jan/10?
Consistency errors
Application of real-world rules to the data. These may indicate incorrect data entry from older records, transcription errors or post processing. Some are complex to implement and require reference data sets to check against. E.g. a list of known collectors and collecting habits. These rules can be gathered from data users and analysts.
-
Taxonomic: For example, if identified to species level, have a binomial scientific name and entries in genus and species fields been provided?
-
Currency: Are dates of collection, identification, update and digitization consistent?
-
Outliers: Detect outliers, but remember that not all outliers are necessarily errors. For example, compare against a known species range, or known environmental range (but remember that outliers may be misidentifications, rather than incorrect coordinates).
-
Geographic: Are the coordinates within the identified locality or region? For example, are there any terrestrial occurrences in the sea or marine occurrences on land?
-
Collecting patterns: Does the occurrence detail match the known collecting patterns of the organization or collector? Do any records appear to have been created after a collector has died (could this possibly be a different collector with a similar name)? For example, are any mammal records attributed to a bird watching group?
-
Accuracy and precision: For example, are any georeferenced records indicating very high precision or accuracy from a pre-GPS (or pre-accurate GPS) collecting period?
-
Collecting methods: Different survey methods (e.g. transects and area surveys) have particular characteristics. Are the records consistent with the method provided?
Helpful tools
-
GBIF Name Parser: https://www.gbif.org/tools/name-parser
-
Global Names Resolver: http://resolver.globalnames.org
-
Catalogue of Life name match: https://data.catalogueoflife.org/tools/name-match
-
Georeferencing Calculator: http://georeferencing.org/georefcalculator/gc.html
-
Canadensys coordinate conversion: http://data.canadensys.net/tools/coordinates
-
Canadensys date parsing: http://data.canadensys.net/tools/dates
-
Google Maps: https://maps.google.com/