
Human clinical and health data: Provenance


W3C PROV is a general-purpose standard for provenance information. It expresses provenance in terms of entities, activities, and agents, and the relations among them. The standard's data model is realized in several serializations, including the PROV-O ontology, which have been extended for various domains.

In addition to the PROV primer, the PROV Book gives a detailed introduction to using PROV.
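To make the entity, activity, and agent vocabulary concrete, the following is a minimal sketch that writes one analysis step in the layout of the PROV-JSON serialization using only the Python standard library. The `ex:` identifiers, the example.org namespace, and the helper names `build_prov_document` and `sources_of` are illustrative assumptions, not part of the standard.

```python
import json


def build_prov_document():
    """Return a PROV-JSON style dictionary describing one analysis step.

    All identifiers under the "ex" prefix are hypothetical examples.
    """
    return {
        "prefix": {"ex": "https://example.org/"},
        # Entities: the things whose provenance is described.
        "entity": {
            "ex:raw-measurements": {"prov:label": "Raw measurements"},
            "ex:summary-table": {"prov:label": "Summary table"},
        },
        # Activity: the process that acted on the entities.
        "activity": {
            "ex:summarise": {
                "prov:startTime": "2024-01-15T09:00:00",
                "prov:endTime": "2024-01-15T09:05:00",
            }
        },
        # Agent: who bears responsibility for the activity.
        "agent": {"ex:analyst": {"prov:label": "Data analyst"}},
        # Relations between the three kinds of records.
        "used": {
            "_:u1": {
                "prov:activity": "ex:summarise",
                "prov:entity": "ex:raw-measurements",
            }
        },
        "wasGeneratedBy": {
            "_:g1": {
                "prov:entity": "ex:summary-table",
                "prov:activity": "ex:summarise",
            }
        },
        "wasAssociatedWith": {
            "_:a1": {
                "prov:activity": "ex:summarise",
                "prov:agent": "ex:analyst",
            }
        },
    }


def sources_of(doc, entity_id):
    """List the entities used by the activities that generated entity_id."""
    generating = {
        rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
        if rel["prov:entity"] == entity_id
    }
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] in generating
    )


if __name__ == "__main__":
    document = build_prov_document()
    print(json.dumps(document, indent=2))
    print(sources_of(document, "ex:summary-table"))
```

Following the `wasGeneratedBy` and `used` relations backwards, as `sources_of` does, is the basic traversal behind most provenance queries. In practice a dedicated PROV library would likely be preferable to hand-built dictionaries.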

HL7 FHIR Provenance

HL7 FHIR is an interoperability standard for healthcare information exchange between systems. FHIR aims to define the key entities involved in healthcare information exchange as resources.

FHIR supports expressing the provenance of resources. Provenance of a resource is “a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource”, and “tracks information about the activity that created, revised, deleted, or signed a version of a resource, describing the entities and agents involved”.

The provenance part of HL7 FHIR extends W3C PROV.
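As an illustration, a FHIR Provenance resource can be assembled as plain JSON. The sketch below assumes the FHIR R4 shape of the resource, in which `target`, `recorded`, and `agent` are required; the helper names and the resource references (`Observation/obs-1` and similar) are hypothetical.

```python
import json
from datetime import datetime, timezone

# Elements that FHIR R4 marks as mandatory on a Provenance resource.
REQUIRED_ELEMENTS = ("target", "recorded", "agent")


def make_provenance(target_ref, agent_ref, source_ref=None, recorded=None):
    """Build a minimal Provenance resource as a dictionary (R4 assumed)."""
    recorded = recorded or datetime.now(timezone.utc)
    if recorded.tzinfo is None:
        # A FHIR instant must carry a time zone offset.
        raise ValueError("recorded must be timezone-aware")
    resource = {
        "resourceType": "Provenance",
        # The resource(s) whose provenance is being described.
        "target": [{"reference": target_ref}],
        "recorded": recorded.isoformat(timespec="seconds"),
        # Who took part in the activity.
        "agent": [{"who": {"reference": agent_ref}}],
    }
    if source_ref is not None:
        # An entity the activity drew on, here in the "source" role.
        resource["entity"] = [
            {"role": "source", "what": {"reference": source_ref}}
        ]
    return resource


def missing_required(resource):
    """Return the required elements that are absent or empty."""
    return [name for name in REQUIRED_ELEMENTS if not resource.get(name)]


if __name__ == "__main__":
    example = make_provenance(
        "Observation/obs-1",
        "Practitioner/pr-1",
        source_ref="DocumentReference/doc-1",
    )
    print(json.dumps(example, indent=2))
    print(missing_required(example))
```

The `target`, `agent`, and `entity` elements play roughly the roles of the entity, agent, and used entity in W3C PROV. The presence check is not a substitute for validation against the FHIR specification.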

The Common Provenance Model

The Common Provenance Model (CPM) is an extension of W3C PROV that aims to provide support for the integration of provenance information from heterogeneous environments. In particular, it provides guidelines for the representation of domain-independent provenance information (provenance backbone), to which domain-specific provenance information can be attached in a prescribed way.

The CPM forms a conceptual foundation for the ISO standard series ISO 23494 Provenance information model for biological specimen and data. The ISO standard is still in an early phase of its development.


RO-Crate is a lightweight implementation of a FAIR Digital Object that packages data together with its metadata into a Research Object. It is based on Linked Data standards, including schema.org and JSON-LD, but can be written and consumed as regular JSON.
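The "regular JSON" point can be shown with a short sketch that builds the content of an `ro-crate-metadata.json` file using only the standard library. It assumes the RO-Crate 1.1 layout (a metadata descriptor plus a root dataset in a flat `@graph`); the function name, file names, and license URL in the usage section are placeholders.

```python
import json

CONTEXT = "https://w3id.org/ro/crate/1.1/context"
SPEC = "https://w3id.org/ro/crate/1.1"


def build_crate_metadata(name, description, date_published, license_id, files):
    """Return RO-Crate metadata as a dict; files maps path -> display name."""
    graph = [
        # Metadata descriptor: identifies this file and the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": SPEC},
            "about": {"@id": "./"},
        },
        # Root dataset: describes the crate as a whole.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_id},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One data entity per packaged file.
    graph.extend(
        {"@id": path, "@type": "File", "name": label}
        for path, label in files.items()
    )
    return {"@context": CONTEXT, "@graph": graph}


def find_entity(crate, entity_id):
    """Look up an entity in the flat @graph by its @id, or return None."""
    return next((e for e in crate["@graph"] if e["@id"] == entity_id), None)


if __name__ == "__main__":
    crate = build_crate_metadata(
        name="Example cohort summary",
        description="Placeholder crate with one data file.",
        date_published="2024-01-15",
        license_id="https://creativecommons.org/licenses/by/4.0/",
        files={"summary.csv": "Summary table"},
    )
    text = json.dumps(crate, indent=2)
    # Consumers can read the crate back with an ordinary JSON parser.
    print(find_entity(json.loads(text), "./")["name"])
```

Because the graph is flat and every entity carries an `@id`, a plain dictionary lookup is enough for simple consumers; a JSON-LD processor is only needed when the Linked Data semantics matter.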

The RO-Crate specifications can be used to form different RO-Crate profiles, which are suitable for various domains and use cases. While the base specifications already contain some guidelines on representing the provenance of data entities included in the crate, some contexts require a more detailed description to enhance traceability and reproducibility. To meet this demand, several provenance-oriented RO-Crate profiles are being developed:

  • The Workflow Run RO-Crate working group is developing a collection of profiles to describe the execution of computational workflows. The profiles define provenance descriptions at different granularity levels, from “black box” (only workflow-level inputs, outputs and parameters are considered) to step-by-step rundown.

  • The CPM team, with the help of the RO-Crate community, is developing an RO-Crate profile for representing CPM-compliant provenance and meta-provenance in an RO-Crate.
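At the "black box" end of that range, a run can be recorded as a single action that links a tool to its inputs and outputs. The sketch below follows the general pattern used by the Workflow Run RO-Crate profiles, a schema.org `CreateAction` whose `instrument` is the executed software, whose `object` lists the inputs, and whose `result` lists the outputs. The identifiers, the function name, and the omission of profile `conformsTo` declarations are simplifying assumptions, so the output is not a complete profile-conformant crate.

```python
import json


def describe_run(run_id, tool_id, tool_name, inputs, outputs, agent_id,
                 started, ended):
    """Return the entities for one tool execution, ready to add to @graph."""
    tool = {
        "@id": tool_id,
        "@type": "SoftwareApplication",
        "name": tool_name,
    }
    action = {
        "@id": run_id,
        "@type": "CreateAction",
        "name": f"Run of {tool_name}",
        # What was executed.
        "instrument": {"@id": tool_id},
        # What the run consumed and produced.
        "object": [{"@id": path} for path in inputs],
        "result": [{"@id": path} for path in outputs],
        # Who launched the run, and when it started and ended.
        "agent": {"@id": agent_id},
        "startTime": started,
        "endTime": ended,
    }
    return [tool, action]


if __name__ == "__main__":
    entities = describe_run(
        run_id="#run-1",
        tool_id="#summarise-tool",
        tool_name="summarise",
        inputs=["raw.csv"],
        outputs=["summary.csv"],
        agent_id="https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
        started="2024-01-15T09:00:00+00:00",
        ended="2024-01-15T09:05:00+00:00",
    )
    print(json.dumps(entities, indent=2))
```

A step-by-step description would add one such action per workflow step, which is where the more detailed profiles come in.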

Support for RO-Crate provenance reporting is being added, or is planned, for several workflow engines, including Galaxy, CWL, Snakemake, StreamFlow, Sapporo WES, COMPSs, and WfExS.

More information


Relevant tools and resources

| Tool or resource | Description | Related pages | Registry |
|---|---|---|---|
| Galaxy | Open, web-based platform for data-intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses. | An automated SARS-CoV-2 genome surveillance system built around Galaxy | Tool info, Training |
| Common Workflow Language (CWL) | An open standard for describing workflows that are built from command-line tools. | | Standards/Databases, Training |
| Snakemake | A framework for data analysis workflow execution. | | Tool info, Training |
| StreamFlow | Container-native workflow manager for hybrid infrastructures. | | |
| Sapporo WES | Implementation of the Workflow Execution Service (WES), also called Workflow-as-a-Service. | | Tool info |
| COMPSs | COMP Superscalar (COMPSs) is a task-based programming model that aims to ease the development of applications for distributed infrastructures, such as large high-performance computing (HPC) clusters, clouds, and container-managed clusters. | | Tool info |