Data management rubric

Data management

Skills are described at four performance levels, listed in this order under each skill: Beginning performance (1), Developing performance (2), Accomplished performance (3), Outstanding performance (4).

A. Capacity to assess the quality of a biodiversity dataset (i.e. to identify issues and their types).

Beginning (1): Only uses visual checks to analyse quality. Cannot differentiate between types of errors. Can detect missing values in required fields and severe data inconsistencies.

Developing (2): Can only use very basic techniques (e.g. sorting) to analyse data quality. Can detect mismatches between field names and their content. Can consistently identify technical errors, but identifies only the most typical consistency errors in a dataset.

Accomplished (3): Can use specific tools and techniques to assess quality. Recognizes the minimum level of disaggregation/normalization needed for common use and publishing. Can consistently identify technical errors and most of the consistency errors in a dataset.

Outstanding (4): Uses a systematic approach to dataset analysis covering all major data domains. Can consistently identify both technical and consistency errors in a dataset. Can use other sources of data (e.g. metadata or other datasets) to identify or infer consistency errors in a dataset.
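As an illustration of two checks named under this skill, the Python sketch below flags blank required fields and duplicated identifiers in a list of records. The Darwin Core-style column names and the choice of required fields are assumptions made for the example; a real dataset's required fields come from its publishing standard or data model.

```python
# Assumed required fields, using Darwin Core-style column names for illustration.
REQUIRED_FIELDS = ("occurrenceID", "scientificName", "eventDate")


def find_missing_required(records: list[dict]) -> list[tuple[int, str]]:
    """Return (row index, field) for each required field that is absent or blank."""
    problems = []
    for row, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                problems.append((row, field))
    return problems


def find_duplicate_ids(records: list[dict], id_field: str = "occurrenceID") -> set[str]:
    """Consistency check: identifiers that should be unique but appear more than once."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for record in records:
        value = record.get(id_field)
        if not value:
            continue  # missing IDs are reported by find_missing_required instead
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates
```

Both functions take a list of dictionaries, such as list(csv.DictReader(f)); the list is needed because each check iterates over the records separately.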

B. Capacity to perform data format correction.

Beginning (1): Can only make corrections manually in the tables. Shows general knowledge of how format types (e.g. dates, strings, numbers) are used in digital data.

Developing (2): Can identify at least one specific tool to automatically correct format errors, but can only use it in specific cases. Otherwise, uses simple mechanisms (e.g. ‘find & replace’) to solve issues.

Accomplished (3): Can use at least one tool to automatically correct format errors.

Outstanding (4): Can use advanced features of more than one tool to correct format errors.
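A minimal sketch of automatic date-format correction, assuming the dataset mixes a handful of known date layouts and that slash- and dot-separated dates are day-first. Both the list of layouts and the day-first rule are assumptions; an ambiguous value such as 03/05/2021 can only be resolved from the dataset's documentation.

```python
from datetime import datetime
from typing import Optional

# Assumed layouts found in the source data, tried in order. Treating slash- and
# dot-separated dates as day-first is an assumption that must be checked.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%d %B %Y", "%Y%m%d")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date in ISO 8601 form (YYYY-MM-DD), or None if no layout matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None
```

Values that return None are left for manual review rather than guessed.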

C. Capacity to perform nomenclatural data correction.

Beginning (1): Can only make corrections manually in the tables. Relies only on personal knowledge of familiar taxonomic groups.

Developing (2): Can identify at least one specific tool to automatically correct nomenclatural errors, but can only use it in specific cases. Otherwise, uses simple mechanisms (e.g. ‘find & replace’) to solve issues.

Accomplished (3): Can use at least one tool to automatically correct nomenclatural errors. Can find and use suitable reference nomenclatural information for the taxonomic groups with which (s)he usually works.

Outstanding (4): Can use more than one tool to correct nomenclatural errors. Can find and use suitable reference nomenclatural information for taxonomic groups outside his/her areas of expertise.
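Tools that correct nomenclatural errors typically match each name against a reference list. The sketch below approximates that idea with fuzzy matching from Python's standard difflib module; the four-name reference list and the 0.85 similarity cutoff are hypothetical, and a real check would use an authoritative checklist for the group and have a person review every suggested match.

```python
import difflib
from typing import Optional

# Hypothetical reference list; in practice it would come from an authoritative
# checklist for the taxonomic group being cleaned.
REFERENCE_NAMES = ["Quercus robur", "Quercus petraea", "Fagus sylvatica", "Pinus sylvestris"]


def suggest_name(raw: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the closest reference name, or None if nothing is similar enough."""
    cleaned = " ".join(raw.split())  # collapse repeated and surrounding whitespace
    if cleaned in REFERENCE_NAMES:
        return cleaned
    # Binomial capitalisation: first letter upper-case, the rest lower-case.
    candidate = cleaned[:1].upper() + cleaned[1:].lower()
    matches = difflib.get_close_matches(candidate, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

The sketch handles plain binomials only; names with authorship strings or infraspecific ranks would need proper parsing first.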

D. Capacity to perform geographical data correction.

Beginning (1): Can only make corrections manually in the tables. Relies only on personal knowledge of familiar geographical areas.

Developing (2): Can identify at least one specific tool to map and/or automatically correct errors in geographical information, but can only use it in specific cases. Otherwise, uses simple mechanisms (e.g. ‘find & replace’) to solve issues.

Accomplished (3): Can use at least one tool to map and/or automatically correct errors in geographical information. Can find and use reference geographical information in a suitable format for the areas with which (s)he usually works.

Outstanding (4): Can use more than one tool to map and/or automatically correct errors in geographical information. Can find and use reference geographical information in a suitable format for areas outside his/her areas of expertise.
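Some common automated checks on decimal coordinates are sketched below: missing values, out-of-range values, possibly swapped latitude and longitude, (0, 0) placeholders, and points outside the expected area. The bounding box is a hypothetical stand-in for proper reference geographical information (e.g. country or province boundaries), and the swap and sign hints are heuristics that flag records for review rather than correct them.

```python
from typing import Optional


def coordinate_issues(lat: Optional[float], lon: Optional[float],
                      bbox: Optional[tuple[float, float, float, float]] = None) -> list[str]:
    """List likely problems with a decimal latitude/longitude pair.

    bbox is an optional (min_lat, min_lon, max_lat, max_lon) box for the area the
    record is supposed to come from; it is an illustrative simplification.
    """
    if lat is None or lon is None:
        return ["missing coordinate"]
    issues = []
    if not -90 <= lat <= 90:
        issues.append("latitude out of range")
        # A latitude that would be a valid longitude, paired with a longitude
        # that would be a valid latitude, suggests the columns were swapped.
        if -90 <= lon <= 90 and -180 <= lat <= 180:
            issues.append("latitude and longitude may be swapped")
    if not -180 <= lon <= 180:
        issues.append("longitude out of range")
    if lat == 0 and lon == 0:
        issues.append("zero coordinates (often a placeholder)")
    if bbox is not None and not issues:
        min_lat, min_lon, max_lat, max_lon = bbox

        def inside(la: float, lo: float) -> bool:
            return min_lat <= la <= max_lat and min_lon <= lo <= max_lon

        if not inside(lat, lon):
            issues.append("outside expected area")
            if inside(lat, -lon):
                issues.append("longitude sign may be inverted")
    return issues
```

An empty list means no issue was detected by these checks, not that the coordinates are correct.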

E. Capacity to use specific software (e.g. OpenRefine) for data cleaning.

Beginning (1): Can identify at least one data cleaning tool. Can identify the main features of a data cleaning tool (e.g. OpenRefine).

Developing (2): Can identify multiple data cleaning tools. Can use one or a few of the basic features of data cleaning software to clean a dataset (e.g. create an OpenRefine project, use faceting, filtering, clustering or reconciling).

Accomplished (3): Can use all the basic features of data cleaning software to clean a dataset (e.g. in OpenRefine: faceting, filtering, clustering, reconciling).

Outstanding (4): Can use the advanced features of one or more data cleaning software packages to clean datasets (e.g. in OpenRefine: using APIs, regular expressions and the General Refine Expression Language, GREL).
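Clustering in OpenRefine groups values that probably refer to the same thing. The sketch below is a simplified approximation of its 'fingerprint' key collision method (trim, lower-case, fold accents to ASCII, drop punctuation, then sort the unique tokens); it is not OpenRefine's own code, and its output may differ on edge cases.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: case, accents, punctuation and word order ignored."""
    text = value.strip().lower()
    # Fold accented characters to ASCII (e.g. "í" -> "i") by dropping combining marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, keeping letters, digits, underscores and whitespace.
    text = re.sub(r"[^\w\s]", "", text)
    # Unique tokens in sorted order, so repetition and word order do not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group distinct values sharing a fingerprint; keep only groups with 2+ members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in dict.fromkeys(values):  # distinct values, first-seen order
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]
```

As in OpenRefine, each cluster is only a suggestion: someone still decides which spelling to keep before values are merged.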

F. Capacity to document data transformation procedures.

Beginning (1): Seldom describes any changes made while curating, formatting, or transforming data.

Developing (2): Describes changes most of the time, but not consistently or fully (e.g. describes the change but not its author).

Accomplished (3): Always describes changes, and always does so consistently, so that all edits of the same type can be easily identified.

Outstanding (4): Can describe changes accurately and consistently, in a way that allows them to be repeated.
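One way to describe changes so that they can be identified and repeated is a structured change log. The sketch below records each edit with its author and reason, replays the log against the data while refusing entries whose old value no longer matches, and exports the log as CSV. The set of logged fields and the record layout are assumptions for illustration, not a standard.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class Change:
    """One documented edit; the choice of fields is an assumption."""
    record_id: str
    column: str
    old_value: str
    new_value: str
    author: str
    reason: str


def apply_changes(records: dict[str, dict[str, str]], changes: list[Change]) -> None:
    """Replay logged changes in order, stopping at any whose old value no longer matches.

    Changes before a failing entry remain applied.
    """
    for change in changes:
        current = records[change.record_id].get(change.column, "")
        if current != change.old_value:
            raise ValueError(f"{change.record_id}.{change.column}: "
                             f"expected {change.old_value!r}, found {current!r}")
        records[change.record_id][change.column] = change.new_value


def to_csv(changes: list[Change]) -> str:
    """Serialise the log as CSV so it can be stored alongside the dataset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Change)])
    writer.writeheader()
    for change in changes:
        writer.writerow(asdict(change))
    return buffer.getvalue()
```

Checking the old value before each edit makes a replay fail loudly if the data has drifted from the state the log was written against.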