Geospatial Filters & Issues

The data can be filtered spatially in an occurrence search in one of three ways:

  • Country or area/Continent - data is filtered by country and will include data within the Exclusive Economic Zone (EEZ)

  • Administrative area - this filter uses the GADM database (https://gadm.org/data.html) of administrative areas for all countries in the world. GBIF removes common geospatial issues by default if you choose to have data with a location.

  • Location - this filter allows you to filter for data with coordinates and/or draw your own polygon shape filters or use a GeoJSON file to delimit your own shape filter. If you filter for those data with coordinates, a number of geospatial issues associated with the data publishing workflow will be eliminated. These are:

    • Zero coordinates - Coordinates are exactly (0,0), sometimes called "null island". A zero-zero coordinate is a very common geospatial issue. GBIF removes (0,0) records when hasGeospatialIssue is set to FALSE.

    • Country coordinate mismatch - Data publishers will often supply GBIF with a country code (US, TW, SE, JP…). GBIF uses the two-letter ISO 3166-1 alpha-2 coding system (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2). When a record states that it occurs within a country but its point does not fall within that country’s polygon or EEZ, it gets flagged as having a “country coordinate mismatch” and will be removed if data are filtered for locations.

    • Coordinate invalid - GBIF is unable to interpret the coordinates.

    • Coordinate out of range - The coordinates are outside of the range for decimal lat/lon values: (-90,90), (-180,180).
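The coordinate checks listed above can be approximated locally on downloaded data. The following is a minimal sketch, not GBIF's actual interpretation pipeline: the function name `coordinate_issues` is hypothetical, and the returned labels are borrowed from GBIF's issue vocabulary purely for readability. The country mismatch check is left out because it needs country polygons.

```python
import math


def coordinate_issues(lat, lon):
    """Return a list of issue labels for one raw latitude/longitude pair.

    Hypothetical helper for illustration only; it mimics the checks
    described in the text, not GBIF's real processing code.
    """
    # Values that cannot be read as numbers count as uninterpretable.
    try:
        lat_f = float(lat)
        lon_f = float(lon)
    except (TypeError, ValueError):
        return ["COORDINATE_INVALID"]
    if not (math.isfinite(lat_f) and math.isfinite(lon_f)):
        return ["COORDINATE_INVALID"]

    issues = []
    # Valid decimal ranges: latitude (-90, 90), longitude (-180, 180).
    if not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
        issues.append("COORDINATE_OUT_OF_RANGE")
    # Exactly (0, 0) is the "null island" case.
    if lat_f == 0 and lon_f == 0:
        issues.append("ZERO_COORDINATE")
    return issues
```

For example, `coordinate_issues(0, 0)` returns `["ZERO_COORDINATE"]`, while a plausible point such as `coordinate_issues(59.3, 18.1)` returns an empty list.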

Country centroids

Country centroids are records where the observation is pinned to the centre of the country instead of where the taxon was observed or recorded. They are usually records that have been retrospectively given a lat-lon value based on a textual description of where the original record was located. Geocoding software uses gazetteers (geographical dictionaries or directories used in conjunction with a map or atlas) to attribute coordinates to place names. So, if the record simply says “Brazil”, some publishers will put the record in the centre of Brazil. Similarly, if the record simply says “Texas” or “Paris”, the record will go in the centre of those regions. This is almost exclusively a feature of museum data (PRESERVED_SPECIMEN), but it can happen with other types of records as well.

Identifying country centroid data is currently not possible using GBIF filters. However, the R package CoordinateCleaner can be used to identify and filter out country centroids.
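The general idea behind centroid detection can be sketched in a few lines: measure the distance from each record to a known centroid and flag records that fall within a small buffer. This is an illustrative sketch only and does not reproduce CoordinateCleaner's implementation. The function names, the 1 km default buffer, and the centroid value used in the usage note are all assumptions; a real workflow would take centroids from a proper gazetteer.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a rough check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_centroid(lat, lon, centroid, buffer_km=1.0):
    """True if the point lies within buffer_km of the (lat, lon) centroid.

    buffer_km is an arbitrary illustrative default, not a recommendation.
    """
    return haversine_km(lat, lon, centroid[0], centroid[1]) <= buffer_km
```

As a usage example with a made-up placeholder centroid of `(-10.0, -53.0)`, `near_centroid(-10.0, -53.0, (-10.0, -53.0))` is true, whereas a point in São Paulo, `near_centroid(-23.55, -46.63, (-10.0, -53.0))`, is false.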

Points along the equator or prime meridian

Some publishers consider zero and NULL to be equivalent so that empty latitude and longitude fields for a record are given a zero value. As a result, records end up being plotted along the equator and prime meridian lines.
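Records of this kind can be flagged by checking whether either coordinate is exactly zero. The sketch below assumes records are plain dictionaries keyed by the Darwin Core terms decimalLatitude and decimalLongitude; the function name is hypothetical. A zero in only one field can occasionally be genuine, so flagged records are better reviewed than dropped blindly.

```python
def on_equator_or_prime_meridian(record):
    """True if latitude or longitude is exactly zero.

    Assumes the record has already-numeric decimalLatitude and
    decimalLongitude values (an assumption made for this sketch).
    """
    return record["decimalLatitude"] == 0 or record["decimalLongitude"] == 0


records = [
    {"decimalLatitude": 0.0, "decimalLongitude": 34.2},   # on the equator
    {"decimalLatitude": 51.5, "decimalLongitude": 0.0},   # on the prime meridian
    {"decimalLatitude": 51.5, "decimalLongitude": -0.12}, # ordinary point
]

# Keep only the records worth a closer look.
suspect = [r for r in records if on_equator_or_prime_meridian(r)]
```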

Uncertain location

Often you will want to be sure that the coordinates give a certain location and are not really 1000s of km away from where the organism was observed or collected. There are two Darwin Core fields that you get with a SIMPLE CSV download, coordinatePrecision and coordinateUncertaintyInMeters, that you can use to filter by “uncertainty”. However, these fields are often left empty by publishers who feel that their records are fairly certain (for example, from a GPS), so we would recommend not filtering out missing values.

There are also a few “fake” values for coordinate uncertainty that you should be aware of. These values are errors produced by geocoding software and do not represent real uncertainty values. These “fake” values are 301, 3036, 999 and 9999. In the case of the value 301, the real uncertainty is often much greater than 301 and the record actually represents a country centroid.
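One way to put this advice into practice is sketched below: keep records whose coordinateUncertaintyInMeters is missing, drop the known “fake” values, and apply a threshold to the rest. The function names and the 10,000 m default threshold are assumptions chosen for illustration, and dropping the fake values (rather than just flagging them) is one possible policy, not a GBIF rule. Records are assumed to be dictionaries such as those produced by Python's `csv.DictReader`.

```python
# "Fake" uncertainty values produced by geocoding software (from the text).
FAKE_UNCERTAINTY_VALUES = {301.0, 3036.0, 999.0, 9999.0}


def parse_uncertainty(raw):
    """Return the uncertainty as a float, or None when the field is empty."""
    if raw is None or str(raw).strip() == "":
        return None
    return float(raw)


def keep_record(record, max_uncertainty_m=10000.0):
    """Decide whether to keep a record based on its coordinate uncertainty.

    max_uncertainty_m is an arbitrary illustrative threshold.
    """
    value = parse_uncertainty(record.get("coordinateUncertaintyInMeters"))
    if value is None:
        return True   # missing values are kept, as recommended above
    if value in FAKE_UNCERTAINTY_VALUES:
        return False  # not a real uncertainty estimate
    return value <= max_uncertainty_m
```

With these defaults, a record with an empty uncertainty field or a value of 50 is kept, while records with 301 or 25000 are dropped.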