Introduction
Before initiating your research, it is imperative to evaluate the ethical, legal, and social implications (ELSI), also known as ethical, legal, and social aspects (ELSA), of your work. Ethical and social considerations relate to whether the study should be done, whilst legal considerations relate to whether the study can be done. This page offers guidance on how to comprehensively assess the ELSI of your work throughout all stages of data management, from study planning and data collection to processing, analysis, sharing, and archiving. This guide focuses specifically on research involving humans. Infectious disease studies involving animals raise their own ethical and legal issues, which are not covered here; if you are interested in reading more about them, this European Commission guide is a good starting point.
General considerations
ELSI has implications for every step of the data lifecycle. This section briefly lists the implications that apply to all steps, and the following sections provide deeper considerations for each lifecycle step.
Considerations
- Personal data: It is important to determine whether the data you will collect is considered personal data. If it is, the General Data Protection Regulation (GDPR) applies in Europe. According to Article 4(1) of the GDPR, personal data is any information relating to an identified or identifiable natural person. This includes any person who could, directly or indirectly, be identified by an element of the data, such as their name, location, or online identifier, or by factors relating to, e.g., their health or socioeconomic status.
- If the study involves personal data, the relevant legal and ethical requirements for collecting and working with that data must be met. During planning, it is important to:
  - Identify the types of personal data
  - Determine the roles and responsibilities of the actors involved
  - Establish the legal basis for processing the data (e.g. consent, public interest)
  - Check whether any regulatory submissions and approvals are needed
  - Develop a Data Management Plan (DMP)
  - Consider contractual arrangements
  - Assess the risks associated with data processing, analysis, sharing and archiving activities
  - Discuss potential technical and organisational measures to protect the data with the Data Protection Officer (DPO)
  - Check whether any specific legislation or ethical recommendations apply to the field
Existing approaches
- Refer to RDMkit’s information on:
  - GDPR compliance when working with personal data
  - Data sensitivity, to assess the sensitivity of your data
  - Data security, to gain an understanding of how to handle data securely
- Develop a Data Management Plan. You can find examples in RDMkit or DMEG.
Planning a study
Considerations
- Informed consent: Obtaining informed consent from patients or other individuals included in the study is typically essential when handling personal data, and should therefore be standard procedure in such cases. Nonetheless, other legal bases, such as public interest in the area of public health, may apply instead of consent. The legal bases for processing personal data, including special categories such as health data, are set out in Articles 6 and 9 of the General Data Protection Regulation (GDPR) (gdpr.eu/article-9-processing-special-categories-of-personal-data-prohibited/).
- Obtain consent: When conducting research that requires informed consent, it is crucial to have a well-defined strategy for securing it. This plan should account for any circumstances specific to the study. For instance, one may need to determine whether broad consent is appropriate or feasible for the research objectives. In situations where participants are unable to provide consent themselves, such as patients in an induced coma, one should consider alternative approaches, which may include obtaining consent retrospectively. Ensuring that consent covers any secondary use of previously collected data safeguards against unauthorised use.
- Identify populations: Identifying the participants that will be involved in the study is a foundational step. This includes understanding the origin of the study population, the characteristics that define them, and the specific individuals who make up the cohort. One should pay particular attention to vulnerable populations; it’s important to clarify how consent will be obtained and to assess if any ethical, legal, and social implications (ELSI) will emerge, including potential conflicts of interest. In the context of infectious diseases, the possibility of stigmatisation, such as with HIV, must be carefully considered.
- Data management plan: A carefully formulated and comprehensive DMP is a critical instrument for outlining the whole data lifecycle, including processes, requirements, and roles. This living document, designed to be updated over time, should capture all pertinent details, such as data collection, processing, analysis, storage, and archiving methods. It should also outline the procedures for obtaining and withdrawing consent, data anonymisation strategies, data ownership and copyright considerations, and mechanisms for controlling data access, together with the ethical and legal rules for sharing personal data. Furthermore, the plan must set out the obligations associated with data stewardship, e.g. distinguishing between roles such as data controllers and data processors and their respective legal responsibilities. If needed, it should also specify the composition and mandates of ethics committees, detailing their essential functions and accountabilities.
- Types of data: One should consider the different data types in an infectious disease study and whether they actually fall under the GDPR. For instance, clinical-epidemiological data and human sequences are covered by the GDPR, whereas pathogen genomic sequences and serological data are generally not, provided they cannot be linked to an identifiable person.
- Distributed data: When dealing with data distributed across multiple countries or held outside the original jurisdiction, it is essential to consider the resulting limitations. The location of the data significantly influences the permissions required for access and sharing. Navigating international data laws requires a clear understanding of the regulatory environment in each country involved, and obtaining approval for data sharing often entails a thorough compliance process to ensure that all data handling practices meet the necessary legal standards.
- Legal basis: When the study deals with personal data, one should consider the legal basis for processing it. Data or aspects of processing that fall outside the GDPR may be governed by other legal frameworks and bodies. Under Article 168 of the Treaty on the Functioning of the European Union (TFEU), public health policy remains primarily a Member State responsibility, so there is no overarching EU-level approach to public health data. However, there are specific areas in which the European Commission or specific EU institutions have responsibility.
- Regulations: Different funding bodies, institutions, and countries can have varying regulations that researchers must adhere to. It is important to consult the checklists and guidelines relevant to one’s particular case.
- Regulatory evolution: As regulations and laws change over time, it is crucial to establish an ongoing monitoring and evaluation mechanism to ensure that the chosen strategies remain current and compliant. Some data (e.g. genomic data) may also be subject to intellectual property rights, which may restrict data access and subsequent analysis and management steps.
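Where consent is the legal basis, the requirement that consent covers any secondary use can be checked programmatically before data is released for a new purpose. The following is a minimal sketch, assuming a hypothetical consent register in which each participant's permitted purposes are stored as free-text labels; the `ConsentRecord` class, the `may_use` function and the purpose names are illustrative only. Studies relying on other legal bases, such as public interest, would need to record those separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical entry in a study's consent register."""
    participant_id: str
    permitted_purposes: frozenset  # e.g. {"clinical_care", "secondary_research"}
    withdrawn: bool = False


def may_use(record: ConsentRecord, purpose: str) -> bool:
    """True only if consent covers this purpose and has not been withdrawn."""
    return not record.withdrawn and purpose in record.permitted_purposes


register = [
    ConsentRecord("P001", frozenset({"clinical_care", "secondary_research"})),
    ConsentRecord("P002", frozenset({"clinical_care"})),
    ConsentRecord("P003", frozenset({"clinical_care", "secondary_research"}), withdrawn=True),
]

# Only participants whose consent explicitly covers secondary research are eligible.
eligible = [r.participant_id for r in register if may_use(r, "secondary_research")]
print(eligible)  # ['P001']
```

Checking eligibility per purpose, rather than per dataset, makes it visible when a new analysis falls outside what participants agreed to and re-consent may be needed.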
Existing approaches
- Generally, the principles of ELSI are anchored in four fundamental pillars, which serve as a constant reminder and guide for best practices in research centred on human data:
  - Autonomy – respect for the patient’s right to self-determination
  - Beneficence – the duty to ‘do good’
  - Non-maleficence – the duty to ‘do no harm’
  - Justice – to treat all people equally and equitably
- Creating a Data Management Plan that lists all the ELSI bodies whose approval is needed. This can be used as a checklist to determine whether everything that is needed has been considered, including how to collect consent. In any case, the institution’s Data Protection Officer (DPO) is the person to refer to when considering the ethical and legal aspects of data management.
- Checking the restrictions related to data movement (e.g. between entities, universities, and countries). This can be done by consulting the institution(s) that hold the data.
- Checklists and information related to ethics from different entities:
  - For ethical principles for medical research involving human subjects (including research on identifiable human material and data), see the WMA Declaration of Helsinki
  - For EU-funded projects, consult the European Commission ethics checklist
  - For information on access to genetic resources for biotechnology research, see the Nagoya Protocol
  - For information on health-related research involving humans, see the Council for International Organizations of Medical Sciences (CIOMS) guidance
  - For information on issues pertaining to working with data on Indigenous peoples, see the CARE principles
  - For information on digital sequence information (DSI), see Hartman Scholz et al. (2022)
- Institutional rules, institutional review boards, and ethics committees should be consulted on ethics-related regulations and for obtaining additional approvals, e.g. for secondary use, non-disclosure agreements (NDAs), dual use (i.e. potential for both civilian and military use), and other safeguards.
- Checking the legal frameworks of the institution/commission(s) for the area(s) of interest. While these might not contain directly applicable rules, it may be worth considering how the work relates to the following mandates:
  - The European Health Data Space (EHDS) is an EU Regulation approved by the European Parliament in April 2024. It aims to empower individuals through better access to their health data, to support the free movement of their health data, and to set rules for using health data for research, innovation and policy-making (the so-called secondary use of health data). These rules for secondary use are directed at Member State and EU levels rather than at individual researchers. Nevertheless, one might want to refer to the EHDS, since it is a central pillar of the European Health Union.
  - The European Centre for Disease Prevention and Control (ECDC) is an EU agency responsible for coordinating the collection, quality, analysis, and dissemination of EU-level data on infectious diseases.
  - The Health Emergency Preparedness and Response Authority (HERA) is responsible for improving preparedness and establishing countermeasures for cross-border health threats. Created in response to the COVID-19 pandemic, its mandate includes ensuring the availability of and access to key medical countermeasures.
- Ensuring that any restrictions on the data, e.g. intellectual property rights on genomic data that can limit data access and sharing, have been identified and complied with.
- Implementing a continuous monitoring system is imperative to ensure adherence to all relevant protocols and regulations. This includes regularly reviewing the terms of your grant agreement, consulting your institution’s designated ethics authority or advisor, coordinating with legal partners to stay abreast of any changes, and verifying compliance with your institution’s rules. These contacts can also advise you on harmonising legal requirements across jurisdictions.
- Emergency response protocols: processing of data in the public interest. Early in the COVID-19 pandemic, the European Data Protection Board (EDPB) released a statement on the processing of personal data in the context of the outbreak, setting out how data protection rules apply to data access, processing, and sharing. In summary, although the fight against infectious diseases should be supported in the best way possible, personal data must be protected even in times of crisis.
Data Collection
Considerations
- Assessing whether informed consent has been obtained for the initial aims of data collection and for any subsequent use of the data. In specific scenarios, obtaining re-consent from the subjects may be necessary. For instance, data collected in a clinical study for its primary purpose may later be reused in a research study (secondary or subsequent use of the data). Extracting further details from medical records and handling withdrawal requests from patients or other individuals are further examples of subsequent, often unforeseen, uses of collected data that the consent strategy needs to anticipate.
- Pathogen data are the least likely to include personal data. However, during processing, one needs to ensure that any personal data (e.g. human host sequences or identifying sample metadata) are removed (see the Data description - Pathogen characterisation page), and consent for any secondary use needs to be obtained as well.
- Carefully evaluating the volume of data being gathered. It is essential to collect all necessary data while avoiding superfluous information that is not pertinent to the study, in line with the data minimisation principle (Article 5(1)(c) of the GDPR).
- Using standardised data collection methods from the beginning, so that the data can be harmonised effectively, which facilitates integration and analysis and supports FAIR data.
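Data minimisation and standardised collection can both be enforced at the point where records enter the study dataset. The sketch below is illustrative only: `REQUIRED_FIELDS` stands in for whatever variables a hypothetical study protocol actually requires, the input date format is an assumption, and dates are normalised to ISO 8601 as one example of a standardised representation. Deciding what is "necessary" remains a protocol decision; the code only enforces it.

```python
from datetime import datetime

# Hypothetical: the variables the study protocol actually requires.
REQUIRED_FIELDS = ("participant_id", "age", "sample_date", "test_result")


def minimise_and_standardise(record: dict, date_format: str = "%d/%m/%Y") -> dict:
    """Keep only protocol-required fields and store the sample date as ISO 8601.

    Raises ValueError if a required field is missing, so gaps are caught at
    collection time rather than during analysis.
    """
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    kept = {f: record[f] for f in REQUIRED_FIELDS}
    parsed = datetime.strptime(kept["sample_date"], date_format)
    kept["sample_date"] = parsed.date().isoformat()
    return kept


raw = {
    "participant_id": "P001",
    "name": "Jane Doe",        # not needed for the study: dropped
    "phone": "+00 000 0000",   # not needed for the study: dropped
    "age": 34,
    "sample_date": "31/01/2024",
    "test_result": "positive",
}
print(minimise_and_standardise(raw))
# {'participant_id': 'P001', 'age': 34, 'sample_date': '2024-01-31', 'test_result': 'positive'}
```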
Existing approaches
- Refer to the Data Management Expert Guide (DMEG) for methods and examples on obtaining informed consent, handling the right to withdraw, etc.
- Refer to the following IDTk paragraphs for data harmonisation:
- When looking for standards, schemas, ontologies and vocabularies, you can check this documentation.
- FAIRsharing is also a good resource to find metadata standards that can be useful for your research.
Processing
Considerations
- Determining whether the data needs to be de-identified and, if so, to what level (e.g. pseudonymised vs fully anonymised).
  - Anonymisation means the complete and irreversible removal of the direct and indirect identifiers of data subjects.
  - Achieving absolute anonymisation, where there is no potential for future re-identification, is practically impossible. Even data that was initially anonymised may become susceptible to re-identification over time as technology evolves or through integration with other datasets.
  - Pseudonymisation involves assigning alternative identifiers to data subjects, with the original linking key stored securely and kept inaccessible to data users. While this method reduces the risk of direct identification, re-linkage remains possible, presenting a latent risk of subject identification.
- Considering also the European Health Data Space Regulation, which addresses anonymisation and pseudonymisation for the secondary use of health data. It foresees that health data should be made available for secondary use in anonymised form where the purpose can be achieved with such data, and in pseudonymised form only where it cannot (these provisions are directed primarily at Health Data Access Bodies).
- Synthetic data (i.e. artificially generated data based on actual data) can be used as an alternative in some cases:
  - The advantages of using synthetic data are that (a) it is seen as ‘more anonymous’, as there are no real data subjects in the synthetic dataset, and (b) it is relatively easy to generate, e.g. with statistical or machine-learning models, including Large Language Models.
  - The disadvantage is that it may not be fully representative of the original dataset.
- Keeping in mind that changes to the data processing workflow may change the roles and responsibilities under GDPR.
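The pseudonymisation approach described above (alternative identifiers, with the linking key held separately) can be sketched as follows. This is a minimal illustration, assuming records are Python dictionaries with a hypothetical `participant_id` field and hypothetical direct identifiers `name` and `email`. It uses random pseudonyms plus a lookup table; keyed hashing (e.g. HMAC) is a common alternative. Because the key table allows re-linkage, the output is still personal data under the GDPR.

```python
import secrets


def pseudonymise(records, id_field="participant_id",
                 direct_identifiers=("name", "email"), key_table=None):
    """Replace the ID field with a random pseudonym and drop other direct identifiers.

    Returns (pseudonymised_records, key_table), where key_table maps
    pseudonym -> original ID (updated in place if one is passed in). The key
    table must be stored separately, under restricted access, and never
    shared with data users.
    """
    key_table = {} if key_table is None else key_table
    reverse = {orig: pseudo for pseudo, orig in key_table.items()}
    output = []
    for rec in records:
        orig = rec[id_field]
        if orig not in reverse:
            pseudo = "PS-" + secrets.token_hex(8)
            while pseudo in key_table:  # guard against an (unlikely) collision
                pseudo = "PS-" + secrets.token_hex(8)
            key_table[pseudo] = orig
            reverse[orig] = pseudo
        new_rec = {k: v for k, v in rec.items() if k not in direct_identifiers}
        new_rec[id_field] = reverse[orig]
        output.append(new_rec)
    return output, key_table


data = [
    {"participant_id": "HOSP-123", "name": "Jane Doe", "age": 34},
    {"participant_id": "HOSP-456", "name": "John Roe", "age": 51},
    {"participant_id": "HOSP-123", "name": "Jane Doe", "age": 34},  # repeat visit
]
pseudo_data, key_table = pseudonymise(data)
print(pseudo_data[0])  # e.g. {'participant_id': 'PS-<16 random hex characters>', 'age': 34}
```

The same person always receives the same pseudonym, so their records stay linkable for analysis, while re-identification requires access to the separately held key table.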
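The representativeness trade-off of synthetic data can be illustrated with the simplest possible generator, which samples each variable independently from its observed values. This is a deliberately naive sketch, not how dedicated synthetic data tools work; the function name and toy variables are hypothetical. Because variables are sampled independently, relationships between them are lost, and because real values are copied verbatim, rare values can still leak, so output like this should not be assumed to be anonymous.

```python
import random


def synthesise_independent(records, n, seed=None):
    """Generate n synthetic records by sampling each column independently
    from the values observed in the real records (empirical marginals)."""
    rng = random.Random(seed)
    columns = list(records[0])
    pools = {col: [r[col] for r in records] for col in columns}
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]


# Toy "real" data in which age band and severity are perfectly associated.
real = (
    [{"age_band": "70+", "severity": "severe"}] * 50
    + [{"age_band": "0-17", "severity": "mild"}] * 50
)
synthetic = synthesise_independent(real, n=1000, seed=1)

# The marginal proportions are roughly preserved, but combinations such as
# ("70+", "mild") now appear, even though they never occur in the real data.
combos = {(r["age_band"], r["severity"]) for r in synthetic}
print(sorted(combos))
```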
Existing approaches
- Methods of anonymisation:
  - A number of anonymisation tools (e.g. Amnesia) based on anonymisation algorithms are available.
  - Most anonymisation algorithms rely on the principle of k-anonymity:
    - It transforms the data so that each data subject cannot be distinguished from at least k-1 other data subjects in the dataset.
    - The main advantages are that (a) there is a built-in re-identification threshold: at worst, a data subject can be identified as part of a group of at least k individuals, and (b) it is easy to understand and broadly used, with some proofs of concept available.
    - The main disadvantages are that (a) some data granularity is lost, as it involves generalising certain data points, (b) it is difficult to use for data with very diverse variables, as part of the dataset is lost for very unique profiles, (c) it is difficult to choose the value of k, as higher values are likely to lead to more data loss, and (d) an attacker who knows details about a given data subject may still be able to determine whether that person is included in the dataset.
- To determine whether your data is anonymised or pseudonymised, you need to assess how easily data subjects could be identified, and by whom. If unsure, it is typically safest to assume that the data is pseudonymised and therefore still subject to the GDPR. The following blog posts and publications explain judgements and implications regarding anonymisation vs pseudonymisation:
  - The Health and Ageing Law Lab (HALL) at Vrije Universiteit Brussel (VUB)
  - KU Leuven
  - Mourby et al. (2018)
  - Rumbold & Pierscionek (2017)
- Note that, so far, all anonymisation techniques reduce data quality, which might affect data usability.
- Guidance on how the size of a study affects the ability to maintain privacy is typically very specific to the country and institution in which the data was collected. One should contact the local Data Compliance Office or the Data Protection Officer (DPO) to discuss the specific case. To read more, refer to e.g. Last (1991) and Ursin et al. (2019).
- When the workflow changes, it is important to ensure that everyone knows exactly which role they play in data processing. This can be checked using, e.g., the European Data Protection Board (EDPB) guidelines on the concepts of controller and processor in the GDPR.
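The k-anonymity principle described above can be checked directly: group records by their quasi-identifiers (attributes that could identify someone in combination, such as age, postcode and sex) and take the size of the smallest group. The sketch below is an illustration only, not the algorithm used by Amnesia or any other tool; the quasi-identifiers, the generalisation rules (10-year age bands, truncated postcodes) and the toy records are all hypothetical.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("age", "postcode", "sex")  # hypothetical choice


def k_of(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return k: the size of the smallest group of records sharing
    identical values for all quasi-identifiers."""
    if not records:
        raise ValueError("no records")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalise(record):
    """Hypothetical generalisation: 10-year age bands and 2-digit postcode prefixes."""
    r = dict(record)
    lower = (r["age"] // 10) * 10
    r["age"] = f"{lower}-{lower + 9}"
    r["postcode"] = r["postcode"][:2] + "**"
    return r


patients = [
    {"age": 34, "postcode": "1050", "sex": "F", "diagnosis": "measles"},
    {"age": 37, "postcode": "1060", "sex": "F", "diagnosis": "influenza"},
    {"age": 52, "postcode": "2000", "sex": "M", "diagnosis": "measles"},
    {"age": 58, "postcode": "2018", "sex": "M", "diagnosis": "influenza"},
]

print(k_of(patients))                           # 1: every person is unique
print(k_of([generalise(p) for p in patients]))  # 2: each person hides among 2
```

Generalisation raised k from 1 to 2 at the cost of granularity. Note also that if every record in a group shared the same diagnosis, that diagnosis would be disclosed for anyone known to be in the group, despite k-anonymity.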
Data Analysis, Sharing, Storage and Archiving
Considerations
- When using and reusing personal data, it is imperative to have a valid legal basis, typically explicit consent, for any primary and secondary use of such data. As stipulated by Article 5(1) of the GDPR (notably the purpose limitation principle), and as noted in the sections above, it is not only necessary to have a legal basis for collecting the data, but also to ensure that one has permission to use the collected data for the specific analysis purpose (for instance, to ask targeted questions during the analysis phase).
- The analysis might involve data held across multiple repositories and countries. If the analysis must be done in a country or institution other than the one that collected the data, it is essential to request permission to transfer the data. It is advisable to put contracts or agreements covering data movement in place in advance.
- Once data has been collected, one should maximise its use (within the permissions attached to the data). This means making the data available for sharing and reuse whenever possible: data should be as open as possible, as closed as necessary.
- Personal data must be stored securely to remain compliant with GDPR. Thus one needs to ensure that the selected storage solution for the data is sufficiently secure.
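- Infrastructure providers offering storage and archiving solutions must ensure that those depositing data have the necessary permissions (e.g. consent) to store it. Under Article 32 of the GDPR, both controllers and processors are responsible for appropriate security measures. In the event of a personal data breach, Article 33 requires the controller to notify the competent data protection (supervisory) authority without undue delay and, where feasible, within 72 hours of becoming aware of it, while a processor, such as an infrastructure provider acting on the controller’s behalf, must notify the controller without undue delay. If the breach is likely to result in a high risk to the individuals concerned, the controller will also need to communicate it to the data subjects (Article 34).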
- It is necessary to consider where the data is sent, deletion policies (including in the event of consent withdrawal and when agreed storage periods expire), whether the data is stored in the cloud or locally, the access policy of the storage solution, and de-identification of the data prior to storage.
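Deletion policies for withdrawal and expired storage periods can be made auditable by routinely identifying which stored records are due for deletion. The following is a minimal sketch under stated assumptions: a hypothetical `StoredRecord` type, a single study-wide retention period expressed in days, and the simplification that withdrawal always triggers deletion. In practice, what a withdrawal requires depends on the legal basis and any applicable research exemptions, so the actual rule should be agreed with your DPO.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class StoredRecord:
    """Hypothetical metadata kept for each stored record."""
    record_id: str
    stored_on: date
    withdrawn_on: Optional[date] = None


def due_for_deletion(records: List[StoredRecord], retention_days: int, today: date) -> List[str]:
    """IDs of records whose consent was withdrawn or whose agreed
    storage period has expired as of `today`."""
    retention = timedelta(days=retention_days)
    return [
        r.record_id
        for r in records
        if r.withdrawn_on is not None or today - r.stored_on > retention
    ]


store = [
    StoredRecord("R1", stored_on=date(2019, 1, 1)),
    StoredRecord("R2", stored_on=date(2024, 6, 1)),
    StoredRecord("R3", stored_on=date(2024, 6, 1), withdrawn_on=date(2024, 9, 1)),
]
print(due_for_deletion(store, retention_days=5 * 365, today=date(2025, 1, 1)))
# ['R1', 'R3']
```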
Existing approaches
- Consent forms provided at the point of data collection should make clear the permissions related to the primary and secondary use of the data. Whilst the GDPR contains exceptions for use for research purposes, particular funders, institutes or countries may require that one asks for explicit consent. Researchers should check the requirements with the local Data Compliance Office or Data Protection Officer (DPO).
- If the data is held in various places, one should verify consent for any data movement, look for solutions that enable data use without relocation, and formalise agreements.
- Consult the EDPB guidance for more information on how to deal with personal data breaches.
- Information about how to appropriately store personal data is available on the RDMkit.
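Article 33(1) of the GDPR requires the controller to notify the supervisory authority of a personal data breach without undue delay and, where feasible, no later than 72 hours after becoming aware of it; a later notification must be accompanied by reasons for the delay. The sketch below only illustrates that deadline arithmetic for a hypothetical incident log (the function names are illustrative). It does not replace the EDPB guidance or your DPO's procedure, and timestamps should be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

# GDPR Art. 33(1): notify, where feasible, no later than 72 hours after awareness.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time (where feasible) to notify the supervisory authority."""
    return aware_at + NOTIFICATION_WINDOW


def needs_reasons_for_delay(aware_at: datetime, notified_at: datetime) -> bool:
    """True if the notification falls outside the 72-hour window and
    must therefore be accompanied by reasons for the delay."""
    return notified_at > notification_deadline(aware_at)


aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2024-03-04T09:30:00+00:00
print(needs_reasons_for_delay(aware, datetime(2024, 3, 5, 8, 0, tzinfo=timezone.utc)))  # True
```

The 72 hours are an outer limit "where feasible"; the primary obligation is to notify without undue delay.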
More information
Links to RDMkit
RDMkit is the Research Data Management toolkit for Life Sciences, describing best practices and guidelines to help you make your data FAIR (Findable, Accessible, Interoperable and Reusable).
Tools and resources on this page
Tool or resource | Description | Related pages | Registry |
---|---|---|---|
FAIRsharing | FAIRsharing is a FAIR-supporting resource that provides an informative and educational registry on data standards, databases, repositories and policy, alongside search and visualization tools and services that interoperate with other FAIR-enabling resources. FAIRsharing guides consumers to discover, select and use standards, databases, repositories and policy with confidence, and producers to make their resources more discoverable, more widely adopted and cited. Each record in FAIRsharing is curated in collaboration with the maintainers of the resource themselves, ensuring that the metadata in the FAIRsharing registry is accurate and timely. Every record is manually reviewed at least once a year. Records can be collated into collections, based on a project, society or organisation, or Recommendations, where they are collated around a policy, such as a journal or funder data policy. | Pathogen characterisation, Human biomolecular data | Standards/Databases, Training |