
Provenance: Human clinical and health data

W3C PROV

W3C PROV is a general-purpose standard for provenance information. It expresses provenance in terms of entities, activities, agents, and the relations among them. The standard’s data model is realized in several serializations, including the PROV-O ontology, and these have been extended for various domains.

In addition to the PROV primer, the PROV Book gives a detailed introduction to using PROV.
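To make the entity/activity/agent vocabulary concrete, here is a minimal sketch in plain Python. It loosely follows the layout of the PROV-JSON serialization (top-level `entity`, `activity`, `agent` maps plus one map per relation type). All identifiers under the `ex:` prefix, and the helper `sources_of`, are hypothetical and exist only for illustration.

```python
# A tiny provenance record: one activity turns a raw entity into a cleaned one.
# Identifiers are invented; the dictionary layout loosely mirrors PROV-JSON.
doc = {
    "prefix": {"ex": "https://example.org/"},
    "entity": {
        "ex:raw-measurements": {},
        "ex:cleaned-dataset": {},
    },
    "activity": {
        "ex:data-cleaning": {
            "prov:startTime": "2024-01-15T09:00:00",
            "prov:endTime": "2024-01-15T09:30:00",
        }
    },
    "agent": {"ex:data-steward": {}},
    # Relations: each entry links two of the nodes declared above.
    "used": {
        "_:u1": {
            "prov:activity": "ex:data-cleaning",
            "prov:entity": "ex:raw-measurements",
        }
    },
    "wasGeneratedBy": {
        "_:g1": {
            "prov:entity": "ex:cleaned-dataset",
            "prov:activity": "ex:data-cleaning",
        }
    },
    "wasAssociatedWith": {
        "_:a1": {
            "prov:activity": "ex:data-cleaning",
            "prov:agent": "ex:data-steward",
        }
    },
}


def sources_of(doc, entity_id):
    """Return the entities used by the activities that generated entity_id."""
    activities = {
        rel["prov:activity"]
        for rel in doc.get("wasGeneratedBy", {}).values()
        if rel["prov:entity"] == entity_id
    }
    return sorted(
        rel["prov:entity"]
        for rel in doc.get("used", {}).values()
        if rel["prov:activity"] in activities
    )


print(sources_of(doc, "ex:cleaned-dataset"))
```

Walking `wasGeneratedBy` and then `used` is one step of lineage tracing; repeating it on each returned entity would follow the chain further back.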

HL7 FHIR Provenance

HL7 FHIR is an interoperability standard for healthcare information exchange between systems. FHIR aims to define the key entities involved in healthcare information exchange as resources.

FHIR provides support for expression of provenance information of resources. Provenance of a resource is “a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource”, and “tracks information about the activity that created, revised, deleted, or signed a version of a resource, describing the entities and agents involved”.

The provenance part of HL7 FHIR extends W3C PROV.
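As an illustration of what such a record can look like, the sketch below assembles a minimal Provenance resource as a Python dictionary. The element names (`target`, `recorded`, `agent.who`, `entity.role`, `entity.what`) are those of the FHIR R4 Provenance resource as best recalled here and should be checked against the specification; the helper `make_provenance` and all resource references are hypothetical.

```python
from datetime import datetime, timezone


def make_provenance(target_ref, agent_ref, source_ref, recorded=None):
    """Build a minimal FHIR-style Provenance resource as a plain dict.

    target_ref: the resource this provenance record is about
    agent_ref:  who was responsible for the activity
    source_ref: an entity the target was produced from
    """
    recorded = recorded or datetime.now(timezone.utc)
    return {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        # When the record was made, as an ISO 8601 timestamp with offset.
        "recorded": recorded.isoformat(timespec="seconds"),
        "agent": [{"who": {"reference": agent_ref}}],
        "entity": [{"role": "source", "what": {"reference": source_ref}}],
    }


# Hypothetical references, for illustration only.
prov = make_provenance(
    target_ref="Observation/example-obs",
    agent_ref="Practitioner/example-practitioner",
    source_ref="DocumentReference/example-lab-report",
    recorded=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(prov["recorded"])
```

The mapping to W3C PROV is visible in the shape of the record: `target` and `entity` play the role of PROV entities, and `agent` that of PROV agents.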

The Common Provenance Model

The Common Provenance Model (CPM) is an extension of W3C PROV that aims to provide support for the integration of provenance information from heterogeneous environments. In particular, it provides guidelines for the representation of domain-independent provenance information (provenance backbone), to which domain-specific provenance information can be attached in a prescribed way.

The CPM forms a conceptual foundation for the ISO standard series ISO 23494 Provenance information model for biological specimen and data. The ISO standard is still in an early phase of its development.

RO-Crate

Research Object Crate (RO-Crate) is a lightweight implementation of a FAIR Digital Object, which is able to pack data together with its metadata into a Research Object. It is based on Linked Data standards including Schema.org and JSON-LD, but can be written and consumed as regular JSON.
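Because an RO-Crate can be treated as regular JSON, a minimal metadata file can be sketched with nothing more than the Python standard library. The example below assumes the RO-Crate 1.1 layout (a metadata descriptor, a root dataset, and data entities in a flat `@graph`); the helper `build_crate`, the file path, and the descriptive values are hypothetical.

```python
import json


def build_crate(name, file_paths):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": "Illustrative crate; all values are placeholders.",
        "datePublished": "2024-01-15",
        "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
        # The root dataset lists the data entities packed in the crate.
        "hasPart": [{"@id": path} for path in file_paths],
    }
    files = [{"@id": path, "@type": "File"} for path in file_paths]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *files],
    }


crate = build_crate("Example clinical dataset", ["data/measurements.csv"])
print(json.dumps(crate, indent=2))
```

Saving the printed JSON as `ro-crate-metadata.json` next to the listed files would give the on-disk form of the crate. Dedicated libraries exist for this, but the plain-JSON route shows how little machinery the format needs.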

The RO-Crate specifications can be used to form different RO-Crate profiles, which are suitable for various domains and use cases. While the base specifications already contain some guidelines on representing the provenance of data entities included in the crate, some contexts require a more detailed description to enhance traceability and reproducibility. To meet this demand, several provenance-oriented RO-Crate profiles are being developed:

  • The Workflow Run RO-Crate working group is developing a collection of profiles to describe the execution of computational workflows. The profiles define provenance descriptions at different granularity levels, from “black box” (only workflow-level inputs, outputs and parameters are considered) to a step-by-step rundown.

  • The CPM team, with the help of the RO-Crate community, is developing an RO-Crate profile for representing CPM-compliant provenance and meta-provenance in an RO-Crate.

Support for RO-Crate provenance reporting is being added or is planned to be added to several workflow engines, including Galaxy, Common Workflow Language (CWL), Snakemake, StreamFlow, Sapporo WES, COMPSs, and WfExS.
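The “black box” level of run provenance can be sketched as a single action entity that ties a workflow to its overall inputs and outputs. The example assumes the Schema.org `CreateAction` pattern used by the Workflow Run RO-Crate profiles (`instrument` for what was executed, `object` for inputs, `result` for outputs); the helper `add_run` and every identifier are hypothetical, and the exact requirements should be taken from the profile specifications.

```python
def add_run(graph, run_id, workflow_id, input_ids, output_ids, end_time):
    """Append a black-box description of one workflow run to an @graph list."""
    action = {
        "@id": run_id,
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},           # what was executed
        "object": [{"@id": i} for i in input_ids],    # workflow-level inputs
        "result": [{"@id": o} for o in output_ids],   # workflow-level outputs
        "endTime": end_time,
    }
    graph.append(action)
    return action


# Hypothetical crate content: one workflow file, one input, one output.
graph = [
    {"@id": "workflow.cwl", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "input.csv", "@type": "File"},
    {"@id": "report.html", "@type": "File"},
]
run = add_run(
    graph,
    run_id="#run-1",
    workflow_id="workflow.cwl",
    input_ids=["input.csv"],
    output_ids=["report.html"],
    end_time="2024-01-15T10:00:00+00:00",
)
print(run["instrument"], run["result"])
```

A step-by-step description would add further action entities, one per executed step, instead of the single action shown here.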

More information

RDMkit is the Research Data Management toolkit for Life Sciences, describing best practices and guidelines to help you make your data FAIR (Findable, Accessible, Interoperable and Reusable).

Tool or resource | Description | Related pages | Registry
Common Workflow Language (CWL) | An open standard for describing workflows that are built from command line tools | | Standards/Databases, Training
COMPSs | COMP Superscalar (COMPSs) is a task-based programming model which aims to ease the development of applications for distributed infrastructures, such as large High-Performance clusters (HPC), clouds and container managed clusters. | | Tool info
Galaxy | Open, web-based platform for data intensive biomedical research. Whether on the free public server or your own instance, you can perform, reproduce, and share complete analyses. | Human biomolecular data | Tool info, Training
Research Object Crate (RO-Crate) | RO-Crate is a lightweight approach to packaging research data with their metadata, using schema.org. An RO-Crate is a structured archive of all the items that contributed to the research outcome, including their identifiers, provenance, relations and annotations. | | Standards/Databases
Sapporo WES | Implementation of Workflow Execution Service (WES) or so-called Workflow-as-a-Service. | | Tool info
Schema.org | Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. | | Standards/Databases, Training
Snakemake | Snakemake is a framework for data analysis workflow execution. | Human biomolecular data | Tool info, Training
StreamFlow | Container-native workflow manager for hybrid infrastructures | |
WfExS | Workflow Execution Service Backend (WfExS-backend) is a high-level orchestrator to run scientific workflows reproducibly. | |