Human clinical and health data: Provenance
W3C PROV is a general-purpose standard for provenance information. It expresses provenance in terms of entities, activities, agents, and the relations between them. The standard's data model is realized in different serializations, including the PROV-O ontology, which have been extended for various domains.
In addition to the PROV primer, the PROV Book gives a detailed introduction to using PROV.
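The entity, activity, and agent pattern described above can be sketched as a small PROV-JSON-style document. This is a minimal illustration using only the Python standard library, not an official example: the `ex:` prefix, the identifiers, the labels, and the timestamps are all hypothetical, and the key names follow the PROV-JSON serialization as commonly used.

```python
import json


def build_prov_document():
    """Return a small PROV-JSON-style document as a plain dict.

    All identifiers under the hypothetical `ex:` prefix are made up
    for illustration only.
    """
    return {
        "prefix": {"ex": "https://example.org/"},
        # Entities: the things whose provenance is described.
        "entity": {
            "ex:raw-measurements": {"prov:label": "Raw measurements"},
            "ex:cleaned-dataset": {"prov:label": "Cleaned dataset"},
        },
        # Activities: processes that use and generate entities.
        "activity": {
            "ex:data-cleaning": {
                "prov:startTime": "2024-01-15T09:00:00Z",
                "prov:endTime": "2024-01-15T09:30:00Z",
            }
        },
        # Agents: who or what bears responsibility.
        "agent": {"ex:data-manager": {"prov:label": "Data manager"}},
        # Relations between the three kinds of node.
        "used": {
            "_:u1": {
                "prov:activity": "ex:data-cleaning",
                "prov:entity": "ex:raw-measurements",
            }
        },
        "wasGeneratedBy": {
            "_:g1": {
                "prov:entity": "ex:cleaned-dataset",
                "prov:activity": "ex:data-cleaning",
            }
        },
        "wasAssociatedWith": {
            "_:a1": {
                "prov:activity": "ex:data-cleaning",
                "prov:agent": "ex:data-manager",
            }
        },
    }


def relations_mentioning(doc, node_id):
    """List the relation kinds in which `node_id` takes part."""
    node_kinds = {"prefix", "entity", "activity", "agent"}
    found = []
    for kind, records in doc.items():
        if kind in node_kinds:
            continue
        if any(node_id in record.values() for record in records.values()):
            found.append(kind)
    return sorted(found)


doc = build_prov_document()
print(json.dumps(doc, indent=2))
print(relations_mentioning(doc, "ex:data-cleaning"))
```

Keeping the document as a plain dictionary makes the structure easy to inspect. A dedicated PROV library would additionally validate the document and convert between serializations.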
HL7 FHIR Provenance
HL7 FHIR is an interoperability standard for healthcare information exchange between systems. FHIR aims to define the key entities involved in healthcare information exchange as resources.
FHIR supports expressing the provenance of resources. Provenance of a resource is “a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource”, and “tracks information about the activity that created, revised, deleted, or signed a version of a resource, describing the entities and agents involved”.
The provenance part of HL7 FHIR extends W3C PROV.
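As a rough illustration of how such a record might look, the sketch below assembles a FHIR Provenance resource as a Python dictionary. It assumes the general shape of the R4 Provenance resource (`target`, `recorded`, `agent.who`, `entity.role`, `entity.what`). The function name and all resource references (`Observation/example-obs` and so on) are hypothetical, and the result is not validated against any FHIR profile.

```python
import json
from datetime import datetime, timezone


def build_fhir_provenance(target_ref, agent_ref, source_ref, recorded=None):
    """Assemble a minimal FHIR Provenance resource as a dict.

    Hypothetical helper: it only arranges the fields and performs no
    validation against the FHIR specification.
    """
    if recorded is None:
        # FHIR `instant` values carry a time zone offset.
        recorded = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "resourceType": "Provenance",
        # The resource(s) this provenance record is about.
        "target": [{"reference": target_ref}],
        # When the activity was recorded.
        "recorded": recorded,
        # Free-text description of the activity (CodeableConcept.text).
        "activity": {"text": "create"},
        # Who took part in the activity.
        "agent": [{"who": {"reference": agent_ref}}],
        # Which other resources were used as input.
        "entity": [{"role": "source", "what": {"reference": source_ref}}],
    }


provenance = build_fhir_provenance(
    target_ref="Observation/example-obs",        # hypothetical target
    agent_ref="Practitioner/example-clinician",  # hypothetical agent
    source_ref="DocumentReference/example-doc",  # hypothetical source
    recorded="2024-01-15T09:30:00+00:00",
)
print(json.dumps(provenance, indent=2))
```

The correspondence to W3C PROV is visible in the field names: `target` and `entity` play the role of PROV entities, `activity` of a PROV activity, and `agent` of a PROV agent.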
The Common Provenance Model
The Common Provenance Model (CPM) is an extension of W3C PROV that aims to provide support for the integration of provenance information from heterogeneous environments. In particular, it provides guidelines for the representation of domain-independent provenance information (provenance backbone), to which domain-specific provenance information can be attached in a prescribed way.
The CPM forms a conceptual foundation for the ISO standard series ISO 23494 Provenance information model for biological specimen and data. The ISO standard is still in an early phase of its development.
RO-Crate
RO-Crate is a lightweight implementation of a FAIR Digital Object, which is able to pack data together with its metadata into a Research Object. It is based on Linked Data standards including schema.org and JSON-LD, but can be written and consumed as regular JSON.
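Because an RO-Crate can be handled as regular JSON, a minimal metadata file can be produced with nothing more than the Python standard library. The sketch below assumes the RO-Crate 1.1 layout: a metadata descriptor plus a root dataset inside a flat `@graph`. The function name, file names, dataset name, and license URL are hypothetical placeholders.

```python
import json


def build_minimal_crate(name, description, date_published, license_url, files):
    """Return the content of an `ro-crate-metadata.json` file as a dict.

    Hypothetical helper following the RO-Crate 1.1 layout; it does not
    check that the listed files actually exist.
    """
    graph = [
        # Metadata descriptor: identifies this file and points to the root dataset.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: describes the crate as a whole and lists its parts.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One data entity per packaged file.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


crate = build_minimal_crate(
    name="Example study data",
    description="Hypothetical dataset used to illustrate the crate layout.",
    date_published="2024-01-15",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files=["raw.csv", "cleaned.csv"],
)
print(json.dumps(crate, indent=2))
```

Written to a file named `ro-crate-metadata.json` next to `raw.csv` and `cleaned.csv`, this dictionary would describe that directory as a crate.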
The RO-Crate specifications can be used to form different RO-Crate profiles, which are suitable for various domains and use cases. While the base specifications already contain some guidelines on representing the provenance of data entities included in the crate, some contexts require a more detailed description to enhance traceability and reproducibility. To meet this demand, several provenance-oriented RO-Crate profiles are being developed:
The Workflow Run RO-Crate working group is developing a collection of profiles to describe the execution of computational workflows. The profiles define provenance descriptions at different levels of granularity, from “black box” (only workflow-level inputs, outputs and parameters are considered) to a step-by-step rundown.
The CPM team, with the help of the RO-Crate community, is developing an RO-Crate profile for representing CPM-compliant provenance and meta-provenance in an RO-Crate.
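To give a feel for what a provenance description inside a crate can look like, the sketch below records a single tool execution as a schema.org `CreateAction`, linking the tool (`instrument`), its inputs (`object`), and its outputs (`result`). This is a simplified illustration of the general pattern rather than a complete, conformant instance of any of the profiles above. The helper name, identifiers, and file names are hypothetical.

```python
import json


def record_tool_run(graph, action_id, tool, inputs, outputs, end_time):
    """Append a SoftwareApplication and a CreateAction to a crate's @graph.

    Hypothetical helper: `graph` is the list stored under "@graph" in an
    RO-Crate metadata file; `tool` is a dict with "@id" and "name".
    """
    # The software that was executed.
    graph.append(
        {"@id": tool["@id"], "@type": "SoftwareApplication", "name": tool["name"]}
    )
    # The execution itself, connecting tool, inputs, and outputs.
    action = {
        "@id": action_id,
        "@type": "CreateAction",
        "name": f"Run of {tool['name']}",
        "endTime": end_time,
        "instrument": {"@id": tool["@id"]},
        "object": [{"@id": path} for path in inputs],
        "result": [{"@id": path} for path in outputs],
    }
    graph.append(action)
    return action


# Start from two data entities that are already part of a crate.
graph = [
    {"@id": "raw.csv", "@type": "File"},
    {"@id": "cleaned.csv", "@type": "File"},
]
action = record_tool_run(
    graph,
    action_id="#clean-run-1",
    tool={"@id": "#cleaner", "name": "Hypothetical cleaning script"},
    inputs=["raw.csv"],
    outputs=["cleaned.csv"],
    end_time="2024-01-15T09:30:00Z",
)
print(json.dumps(action, indent=2))
```

A “black box” description would record one such action for the whole workflow, while a step-by-step description would add one action per executed step.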
Support for RO-Crate provenance reporting is being added, or is planned, for several workflow engines, including Galaxy, CWL, Snakemake, StreamFlow, Sapporo WES, COMPSs, and WfExS.
Relevant tools and resources
|Tool or resource||Description||Related pages||Registry|
|Galaxy||Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses.||An automated SARS-CoV-2 genome surveillance system built around Galaxy||Tool info Training|
|Common Workflow Language (CWL)||An open standard for describing workflows that are built from command line tools|| ||Standards/Databases Training|
|Snakemake||Snakemake is a framework for data analysis workflow execution|| ||Tool info Training|
|StreamFlow||Container-native workflow manager for hybrid infrastructures|| || |
|Sapporo WES||Implementation of Workflow Execution Service (WES) or so-called Workflow-as-a-Service.|| ||Tool info|
|COMPSs||COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container managed clusters.|| ||Tool info|