Selecting covariates (or environmental variables)

More environmental data isn’t always better. You want to strike a balance between the number of data points and the number of environmental variables so that you do not overfit your model. When selecting variables we want to be sure that:

  • our variables are biologically relevant - they should reflect the biology of the study species, e.g. solar radiation may not be a relevant environmental variable for soil-dwelling species

  • our variables are not highly correlated - for instance, take the two variables elevation and temperature. Temperature is not independent of elevation, so we may want to remove one of these variables. In this instance, elevation would preferably be removed as it is more accurately measured.

  • we do not use all 19 Bioclim variables
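
One way to check the correlation criterion above is to compute pairwise Pearson correlations between candidate covariates and flag the pairs that exceed a cut-off. The sketch below is a minimal illustration only: the covariate names and values are made-up toy data, and the 0.7 threshold is a commonly used rule of thumb that is treated here as an assumption, not a recommendation from this guide.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(covariates, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(covariates, 2):
        r = pearson(covariates[a], covariates[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Toy values at six hypothetical sites (not real data).
covariates = {
    "elevation_m": [100, 400, 800, 1200, 1600, 2000],
    "mean_temp_c": [18.0, 16.5, 14.0, 11.0, 9.5, 7.0],
    "annual_precip_mm": [900, 650, 1100, 700, 1000, 800],
}

flagged = correlated_pairs(covariates)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f} (consider dropping one)")
```

With these toy values only the elevation and temperature pair is flagged. Which variable of a flagged pair to drop remains a judgement call based on biological relevance and data quality.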

Importantly, spatio-temporal resolution and covariate data extent should align with:

  • the limitations of other input data (e.g., available usable occurrence data)

  • the scope of the base question(s)/hypotheses

For example, if your environmental data have a spatial resolution of 10 arc-minutes and cover the period 1955 to 2006, then the GBIF-mediated occurrence data you are going to use should match that spatial resolution and time period.
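
The alignment described above can be sketched as a simple filter over occurrence records. This is a minimal illustration under stated assumptions: records are plain dictionaries using the Darwin Core style field names `year`, `decimalLatitude` and `decimalLongitude`; the coordinates are toy values; and keeping one record per 10 arc-minute grid cell is only one possible way of matching the spatial resolution, not a method prescribed by this guide.

```python
from math import floor

CELLS_PER_DEGREE = 6            # 60 arc-minutes / 10 arc-minutes per cell
YEAR_MIN, YEAR_MAX = 1955, 2006  # period covered by the covariates


def cell_index(lat, lon):
    """Row/column of the 10 arc-minute grid cell containing a point."""
    row = floor((lat + 90) * CELLS_PER_DEGREE)
    col = floor((lon + 180) * CELLS_PER_DEGREE)
    return row, col


def align_occurrences(records):
    """Keep records inside the covariate period, one per grid cell."""
    kept = {}
    for rec in records:
        year = rec.get("year")
        if year is None or not (YEAR_MIN <= year <= YEAR_MAX):
            continue  # missing year or outside the covariate period
        cell = cell_index(rec["decimalLatitude"], rec["decimalLongitude"])
        kept.setdefault(cell, rec)  # first record in each cell wins
    return list(kept.values())


# Toy records (hypothetical, not downloaded from GBIF).
records = [
    {"year": 1990, "decimalLatitude": 51.51, "decimalLongitude": -0.12},
    {"year": 1995, "decimalLatitude": 51.55, "decimalLongitude": -0.10},
    {"year": 1930, "decimalLatitude": 40.42, "decimalLongitude": -3.70},
    {"year": 2000, "decimalLatitude": 48.85, "decimalLongitude": 2.35},
    {"year": None, "decimalLatitude": 52.52, "decimalLongitude": 13.40},
]

aligned = align_occurrences(records)
print(f"{len(aligned)} of {len(records)} records retained")
```

Here the 1930 record and the record without a year are dropped, and the two records that fall in the same grid cell are reduced to one.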