Duplicates

Duplicate records can arise when several records are made of the same individual. For instance, a researcher may deposit several specimens from an individual tree in herbaria around the world, all of which then publish these data on GBIF. Alternatively, an individual may be deposited in a natural history collection and also sampled for its DNA. In this instance, there will be one record for the specimen in the collection and another for the DNA sequence.

GBIF has recently introduced a clustering function in its advanced search that allows users to identify clusters of records, i.e. records that appear to be derived from the same source. This allows users to identify potentially duplicated data and filter them out of a download. Note that if you filter out the records that are in a cluster, you will lose every record in that cluster, not just the surplus copies, and so may lose useful data. The filter may be better used to indicate the extent of duplication in the dataset, or to make independent downloads of the clustered and non-clustered datasets for comparison.
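
As a rough illustration of using the filter to gauge duplication rather than to discard records, the sketch below builds two occurrence search requests, one for clustered and one for non-clustered records, and computes the clustered share from the two counts. It assumes the public GBIF occurrence search API and its isInCluster parameter (described by GBIF as experimental). The helper names and the taxon key in the usage note are made up for illustration, and the use of limit=0 to return only the count is an assumption worth checking against the current API documentation.

```python
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (assumed here; check the API docs).
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def cluster_count_url(taxon_key, in_cluster):
    """Build a search URL restricted to clustered or non-clustered records.

    Hypothetical helper: limit=0 is intended to return only the total
    count in the response, without any occurrence records.
    """
    params = {
        "taxonKey": taxon_key,
        "isInCluster": "true" if in_cluster else "false",
        "limit": 0,
    }
    return f"{GBIF_SEARCH}?{urlencode(params)}"


def clustered_share(n_clustered, n_unclustered):
    """Fraction of records that belong to a cluster (0.0 if there are none)."""
    total = n_clustered + n_unclustered
    return n_clustered / total if total else 0.0
```

Calling cluster_count_url(1234567, True) and cluster_count_url(1234567, False), where 1234567 is a placeholder taxon key, gives two URLs whose JSON responses should each carry a count field. Passing those two counts to clustered_share gives an indication of how much of the dataset is potentially duplicated, without removing anything from it.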