Transparency in the secondary use of health data: assessing the status quo of guidance and best practices

The Royal Society
Royal Society Open Science

Olmo R. van den Akker1, Robert T. Thibault2,4, John P. A.
Ioannidis2,3, Susanne G. Schorr1 and Daniel Strech1
1QUEST Center for Responsible Research, Berlin Institute of Health, Berlin, Germany
2Meta-Research Innovation Center at Stanford (METRICS), and 3Departments of Medicine and
of Epidemiology and Population Health, Stanford University, Stanford, CA, USA
4Coalition for Aligning Science, Chevy Chase, MD, USA
ORvdA,0000-0002-0712-3746; JPAI,0000-0003-3118-6859
We evaluated what guidance exists in the literature to improve
the transparency of studies that make secondary use of health
data. To find peer-reviewed papers, we searched PubMed and
Google Scholar. To find institutional documents, we used our
personal expertise to draft a list of health organizations and
searched their websites. We quantitatively and qualitatively
coded different types of research transparency: registration,
methods reporting, results reporting, data sharing and
code sharing. We found 56 documents that provide
recommendations to improve the transparency of studies
making secondary use of health data, mainly in relation
to study registration (n = 27) and/or methods reporting (n
= 39). Only three documents made recommendations on
data sharing or code sharing. Recommendations for study
registration and methods reporting mainly came in the
form of structured documents like registration templates
and reporting guidelines. Aside from the recommendations
aimed directly at researchers, we also found recommendations
aimed at the wider research community, typically on how to
improve research infrastructure. Limitations or challenges of
improving transparency were rarely mentioned, highlighting
the need for more nuance in providing transparency guidance
for studies that make secondary use of health data.
© 2025 The Authors. Published by the Royal Society under the terms of the Creative
Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits
unrestricted use, provided the original author and source are credited.
Research
Cite this article: van den Akker OR, Thibault RT,
Ioannidis JPA, Schorr SG, Strech D. 2025
Transparency in the secondary use of health data:
assessing the status quo of guidance and best
practices. R. Soc. Open Sci. 12: 241364.
https://doi.org/10.1098/rsos.241364
Received: 11 August 2024
Accepted: 31 December 2024
Subject Category:
science, society and policy
Subject Areas:
health and disease and epidemiology
Keywords:
registration, methods reporting, results reporting,
data sharing, code sharing
Author for correspondence:
Olmo R. van den Akker
e-mail: ovdakker@gmail.com
Electronic supplementary material is available
online at https://doi.org/10.6084/m9.figshare.c.7644199.
1. Introduction
Health data have become increasingly accessible to researchers with the advent of large databases
providing routine patient data from electronic health records (e.g. OpenSafely, OpenPrescribing,
Clinical Practice Research Datalink (CPRD), German Portal for Medical Research Data (FDPG)). The
secondary use of health data (SU/HD) for research purposes may yield valuable knowledge, but it can
also carry a high risk of bias or even fraud [1–6]. Because the datasets are not tailor-made to
research studies, researchers typically need to inspect the data before being able to develop a sensible
analysis plan. However, inspecting the data provides researchers with information about the variables
of interest, thereby potentially biasing the statistical analyses [7]. Aside from that, routinely collected
health data may be more prone than clinical trial data to selection bias because proper randomization
cannot typically be achieved [8,9], and to measurement error because of differences in the data entry
and classification procedures among health organizations [10]. Furthermore, it can be challenging
to identify all available SU/HD studies relevant to a given research question, making it difficult to
properly review and synthesize the literature.
The analytical complexity and the potential for bias in SU/HD studies highlight the need for more
transparency, which would benefit science in two ways. First, it becomes easier to identify SU/HD
studies and thus to reduce bias in the review and synthesis of such studies. Second, it becomes easier
to identify biases in individual studies, and to prevent (via researchers engaging less in questionable
research practices) and correct (via follow-up studies or correction notices) these biases. Moreover, as
secondary use of patient data increasingly works with broad consent or opt-out models, transparency
about the studies conducted also plays an important role in building and maintaining social trust in
this form of patient data use [11]. In a broad consent and opt-out model, patients no longer consent to
the individual secondary use studies with their patient data but are only informed about study-wide
objectives, risks and governance of secondary use. In these situations, it is even more important that
society is informed about which studies are being conducted.
Important pillars of research transparency are registration, methods and results reporting and data
and code sharing [12]. Registration (also called preregistration because it should take place before data
analysis; see [13,14]) refers to the documentation of research plans (e.g. hypotheses and/or analyses)
before research outcomes are known [15]. This documentation typically occurs in a specific registration
repository like https://clinicaltrials.gov or https://osf.io. Registration allows readers of a scientific paper
to assess what the research plan was and whether the author(s) conducted a study as planned. This
documentation can help identify potentially biased, data-driven decisions the authors might have
made during or after running the study. Moreover, registration of the existence of specific studies
has the advantage of making transparent what studies are out there, and thus informing the public,
informing researchers working on similar topics, and potentially preventing publication bias [16].
A recent study in the Swedish context shows that only 0.5% of SU/HD studies are prospectively
registered [17].
Methods reporting refers to the public documentation of the research design and methodology
of a scientific study once it is completed and the results are known. This typically occurs in the
methods section of a research paper. Transparent methods reporting allows readers of a paper to assess
whether the study was carried out in line with the registration (assuming a registration is available and
sufficiently clear), and potentially rerun and verify analyses or perform other replications [18–20]. In
the case of SU/HD studies, methods reporting typically involves a detailed description of the handling
of data and the statistical analyses performed.
Results reporting refers to the documentation of the outcomes of a scientific study. It is transparent
if a result is reported for all the planned analyses, and unplanned analyses are presented as unplanned.
Results transparency is important because omitting certain results (e.g. because they are not statistically
significant) biases the scientific literature [21,22].
Sharing refers to the distribution of the data and code of a study, which can be ‘open’ or ‘controlled’.
Open data and open code are available to anyone with access to the Internet. Controlled data and
controlled code are available to bona fide researchers but come with restrictions such as a confidentiality
agreement. Controlled sharing is customary for data that are sensitive, which typically is the
case for electronic health data. In general, sharing is transparent if it allows readers to redo the
study’s analyses on the original data. Data and code sharing are seen as some of the most important
transparency practices in biomedicine [23]. In the context of SU/HD, control over the data typically lies
with the registry or database that provides the data, not with the researchers themselves. Transparency
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 12: 241364
therefore does not necessarily mean providing access to the data but means providing information as
to how to access the data from the data provider (if access is possible at all). For example, researchers
could provide information about the specifics of the data use agreement they had in place with the
data provider, the use-and-access criteria applied by the patient registry or an explanation of why
access to the data to third parties is not feasible or allowed. Note that sharing is sometimes called
on-sharing in the context of secondary use of data because, by definition, the data have already been
shared before [24].
The importance of research transparency is already well acknowledged in the realm of clinical trials
[25], as is evidenced by the large collection of authoritative guidance documents regarding registration
(e.g. [26]; article III.L.1 of [27]; article 35 of [28]), reporting (e.g. [27,29,30]; article 36 of [28]) and
sharing (e.g. article III.L.2 of [27]), as well as legal requirements and infrastructure such as clinical trial
registries (e.g. https://clinicaltrials.gov, https://www.clinicaltrialsregister.eu).
While there is a large amount of guidance and infrastructure available from international organizations,
journals, funders and research institutions to improve the transparency of clinical trials,
the guidance and infrastructure in the area of SU/HD seem less developed. In recent years, some
important repositories have taken root that aim to provide health data to researchers (e.g. OpenSafely,
Clinical Practice Research Datalink, European Health Data Space). However, to our knowledge, there
has been no assessment of peer-reviewed literature or institutional documents regarding guidance
for improving the transparency of studies using such data. The current review involves such an
assessment and includes a quantitative and qualitative analysis. In the quantitative analysis, we count
the number of papers that include guidance and the prevalence of the different types of guidance
provided. This allows us to identify areas of focus and expose gaps in the existing literature. In the
qualitative analysis, we synthesize the content of the guidance, providing context about the robustness
and potential impact of the recommendations. Together, these analyses provide a complete picture of
the available guidance aimed at improving the transparency of SU/HD studies, making it possible to
prioritize the development and adjustments of specific types of guidance in the future.
1.1. Terminology used in this study
1.1.1. Health data
According to the European Data Protection Supervisor [31] health data refer to personal information
that relates to the health status of a person and includes medical data as well as administrative and
financial information about health. Health data can stem from routine clinical processes as well as from
patient-reported outcome (PRO) measures [32].
1.1.2. Secondary use of health data
Researchers have reported some confusion about what secondary use means (Joint Action Towards
the European Health Data Space [33]). We follow the World Health Organization [34] by defining
the secondary use of health data as the processing of health data for purposes other than the initial
purposes for which the data were collected. Even though health data can have many secondary uses
[35] we only focus on its use for biomedical research. A largely synonymous term that has gained
traction in recent years is ‘real-world data’. This term typically refers to health data
that are not derived from clinical trials but are collected during routine clinical practice [36].
1.1.3. Transparency
We use transparency in the context of scientific research, by focusing on registration, reporting and sharing.
However, transparency in the context of SU/HD is often also used to mean transparency with regard to the
patient (i.e. whether the patient knows what happens with their personal data) [37,38]. The ethical and legal
debate on whether patients should be informed about every secondary use project involving their patient
data to decide whether to give their consent is not addressed in this paper (but see [11]).
2. Methods
The study design was registered on 20 July 2023 on the Open Science Framework at https://osf.io/7864h.
The raw data and analysed data used in this study can be found at https://osf.io/2nup4. Note
that we also preregistered an assessment of the transparency guidance on patient registry websites,
but in hindsight we realized that providing guidance is not one of the main goals of such websites.
As such, we do not present results from that part of the preregistration in this paper. Additionally,
we decided not to pursue our preregistered analysis on dataset registration (i.e. publicly registering
that and how one is using a specific dataset). We did so because we could not find any references to
dataset registration in the first batch of about 30 documents and decided it would not be worth the
extra coding effort to further pursue it. An overview of all deviations from our preregistration can be
found at https://osf.io/m4ehx.
2.1. Sample selection
To find documents that potentially discuss transparency in SU/HD studies, we used the following
search term combination on PubMed: (‘guidance’ OR ‘best practice*’ OR ‘guideline*’ OR ‘recommendation*’
OR ‘road map’ OR ‘position paper’) AND (‘secondary use’ OR ‘secondary data’ OR ‘reuse’ OR
‘database stud*’ OR ‘real-world data’ OR ‘real-world evidence’ OR ‘registry data’) AND (‘transparen*’
OR ‘registration’ OR ‘reporting’ OR ‘sharing’). This PubMed search retrieved 954 documents (see
https://osf.io/bz9mr) on 20 July 2023, before our registration. We also did a search of these keyword
combinations on Google Scholar, where we added the term ‘health’ to restrict our search to the
secondary use of health data.
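The search string above has a simple structure: terms are joined by OR within each of three groups, and the groups are joined by AND. A minimal Python sketch of how such a string could be assembled follows. The helper name `build_query` and the use of straight double quotes are illustrative assumptions and are not taken from the registered protocol.

```python
# Illustrative sketch: assemble the boolean search string from its three term groups.
# build_query is a hypothetical helper, not code used in the study.

GUIDANCE_TERMS = ["guidance", "best practice*", "guideline*", "recommendation*",
                  "road map", "position paper"]
SECONDARY_USE_TERMS = ["secondary use", "secondary data", "reuse", "database stud*",
                       "real-world data", "real-world evidence", "registry data"]
TRANSPARENCY_TERMS = ["transparen*", "registration", "reporting", "sharing"]


def build_query(*groups):
    """Join terms with OR inside each group and join the groups with AND."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


query = build_query(GUIDANCE_TERMS, SECONDARY_USE_TERMS, TRANSPARENCY_TERMS)
print(query)
```

Keeping the three groups as separate lists makes it easy to add a further group, such as the extra term ‘health’ used for the Google Scholar search.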
Prior to registration, we conducted the search process using Google Scholar. Unfortunately, search
results for Google Scholar are not reproducible [39]. We decided to include Google Scholar despite the
disadvantage of irreproducibility because it is the most comprehensive [40,41] and most used [42,43]
source of scientific literature. Moreover, Google Scholar is especially useful for exploratory searches
like ours [44]. To identify any missed documents that may be relevant, we used the snowball method
and searched the references section of the documents included based on the initial screening.
We also wanted to include documents from health institutions with a relevance for SU/HD studies.
Based on our own expertise and an overview provided by Burns et al. [45] we selected a set of
(inter)national health institutions that had previously published transparency guidance in the context
of clinical trials (as these institutions often extend their guidance frameworks to cover secondary use
of health data), and a set of learned societies specifically revolving around SU/HD studies. We then
looked on their websites for any documents that may conceivably include transparency guidance for
SU/HD studies.
From all identified documents (both peer reviewed and institutional), we selected documents
relevant to our research question (post-registration) in the following way. First, we screened the title
and abstract (for documents found via Google Scholar and PubMed) or the title (for documents found
via the snowball method) of a document and assessed whether the document was likely to contain
guidance for any of the transparency aspects: registration, methods reporting, results reporting, data
sharing and code sharing. We assessed that this would be the case for documents that state:
(i) that they provide guidance for SU/HD studies on one or more transparency aspects;
(ii) that they provide general guidance for SU/HD studies;
(iii) that they discuss one or more transparency aspects in the context of SU/HD studies; and
(iv) that they discuss SU/HD studies generally.
In sum, our set of included documents involved documents that based on the title and/or abstract
potentially included transparency guidance for SU/HD. A PRISMA flow diagram of our search and
selection process can be found in electronic supplementary material, figure S1. The full set of peer-reviewed
papers and institutional documents can be found at https://osf.io/ednwx and https://osf.io/gajxt,
respectively.
2.2. Analysis
We employed both a qualitative and a quantitative approach. The qualitative approach consisted
of a thematic analysis [46] using MAXQDA [47] in which we retrieved relevant sections from the
peer-reviewed papers and institutional documents. In the first stage of coding, our approach was
primarily deductive as we screened documents for text relevant to one of the transparency themes:
registration, methods reporting, results reporting, data sharing and code sharing. We also extracted
texts that highlighted the main goal of the documents, and any additional texts that we deemed
potentially useful in writing our paper.
In the second, inductive stage of coding, we identified subthemes within the a priori selected themes.
We went over all the extracted texts from a given theme and categorized each text based on content.
This led to a more granular understanding of guidance in the main transparency areas.
The quantitative approach consisted of counting the number of documents that included one or
more texts regarding each of the themes we used in the deductive stage, and all the subthemes
identified in the inductive stage of our thematic analysis. We distinguished three ways in which the
themes were included in a document. We defined a piece of text as a ‘call’ if the authors claimed
that transparency should be improved in a particular area, without providing an argument for why
this should be the case. We defined a piece of text as a ‘justification’ if the authors did make an
argument supporting a statement for more transparency in a particular area. Finally, we defined a
piece of text as a ‘recommendation’ when the authors made a recommendation for how to improve the
transparency in a particular area. If a document contained multiple calls for a transparency practice,
we coded this as one call because calls do not qualitatively differ from one another like justifications
and recommendations do (e.g. there could be multiple justifications for why we need more registration
of SU/HD studies). As such, a document could have a maximum of five calls, one for each of the
transparency practices, but in theory unlimited justifications and recommendations.
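The counting rule described above can be illustrated with a short Python sketch. The excerpt records and the function name `count_excerpts` are hypothetical and serve only to show the rule: repeated calls for the same practice within one document collapse into a single call, whereas justifications and recommendations are counted individually.

```python
from collections import Counter

# Hypothetical coded excerpts: (document id, transparency practice, excerpt type).
excerpts = [
    ("doc_a", "registration", "call"),
    ("doc_a", "registration", "call"),  # second call for the same practice
    ("doc_a", "methods reporting", "recommendation"),
    ("doc_a", "methods reporting", "recommendation"),
    ("doc_b", "registration", "justification"),
    ("doc_b", "registration", "justification"),
]


def count_excerpts(coded):
    """Count calls once per document and practice; count other types individually."""
    counts = Counter()
    seen_calls = set()
    for doc, practice, kind in coded:
        if kind == "call":
            if (doc, practice) in seen_calls:
                continue  # repeated calls collapse into one
            seen_calls.add((doc, practice))
        counts[(practice, kind)] += 1
    return counts


counts = count_excerpts(excerpts)
for (practice, kind), n in sorted(counts.items()):
    print(practice, kind, n)
```

With five transparency practices, this rule caps the number of calls per document at five, while leaving the other two excerpt types unbounded.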
In some cases, we initially coded a text excerpt as a general call because authors did not directly
explain why more transparency would be beneficial. However, justifications were often provided
earlier or later in the text. In those cases, we recoded the ‘call’ to a ‘justification’. Consequently, our
dataset does not include documents with both a call and a justification. Note that a justification does not
automatically imply a call, and a recommendation does not automatically imply a justification or a call.
For example, it could be that a paper states that a health organization has called for more registration
and then provides recommendations, which does not mean that the authors in the paper call for more
registration.
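The recoding step can be sketched as follows. This is a minimal illustration under stated assumptions: the records and the function name `resolve_calls` are hypothetical, and applying the rule per transparency practice within a document is an assumption of the sketch rather than something the text specifies.

```python
# Hypothetical coded excerpts: (document id, transparency practice, excerpt type).
coded = [
    ("doc_a", "registration", "call"),
    ("doc_a", "registration", "justification"),  # argument given later in the text
    ("doc_b", "data sharing", "call"),
    ("doc_c", "code sharing", "recommendation"),
]


def resolve_calls(excerpts):
    """Label each (document, practice) pair as a call or a justification, never both."""
    status = {}
    for doc, practice, kind in excerpts:
        key = (doc, practice)
        if kind == "justification":
            status[key] = "justification"  # a justification overrides an earlier call
        elif kind == "call":
            status.setdefault(key, "call")  # keep an existing justification untouched
    return status


status = resolve_calls(coded)
print(status)
```

Recommendations are deliberately left out of the resulting labels, because a recommendation implies neither a call nor a justification.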
3. Results
Below we present the quantitative and qualitative results in narrative form. In addition, we present the
quantitative results in tabular form in table 1, and visually in figure 1, and the qualitative results in
tabular form in table 2 (registration), table 3 (reporting) and table 4 (sharing), and visually in figure 2
(justifications) and figure 3 (recommendations).
Our sample of peer-reviewed literature (which was slightly different from our preregistered sample;
see https://osf.io/m4ehx) that included guidance on transparency for SU/HD studies contained 116
papers (36 first found through Google Scholar, 57 first found through PubMed and 23 found through
the snowball method by checking the 4745 references of the papers found through Google Scholar
and PubMed). We extracted 606 text excerpts from the peer-reviewed papers. The quantitative data
are summarized per paper at https://osf.io/z7hvg, including links to the papers and their numbers of
citations so that readers can easily identify papers that might be relevant to them. An overview of all
the texts we extracted from our analyses can be found at https://osf.io/ednwx.
Our sample of 21 institutional documents was slightly different from our preregistered sample; see
https://osf.io/m4ehx. We extracted 130 text excerpts from these 21 documents. The quantitative data are
summarized per document at https://osf.io/rmvjp, including links to the documents. An overview of all the
texts we extracted from our analyses can be found at https://osf.io/gajxt.
3.1. Guidance in the peer-reviewed literature
3.1.1. Registration
Among the 116 papers, we found four papers with a call for more registrations of SU/HD studies, 18
with a justification for more registrations of SU/HD studies (of which four had multiple justifications)
and 19 with one or more recommendations on how to register SU/HD studies. In total, we found 112
recommendations, with three papers providing most (61/112) of those (25 in [52], 12 in [72] and 24
in [58]). These three papers involved structured templates that researchers can use to register SU/HD
studies. We counted each individual item of these templates as a separate recommendation, but we
excluded subitems. For example, for Wang et al. [59] we coded the reporting item ‘Methods used for
confounder adjustment’ as a separate recommendation but did not code the subitems in which the
method for confounder adjustment was discussed for different potential statistical analyses.
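The item-counting rule for structured templates can be made concrete with a small Python sketch. The template below is hypothetical: only the item ‘Methods used for confounder adjustment’ comes from the text, and the other items, all subitems and both function names are illustrative assumptions.

```python
# Hypothetical template; only the confounder-adjustment item is taken from the text.
template = [
    {"item": "Data source", "subitems": []},
    {"item": "Methods used for confounder adjustment",
     "subitems": ["option for analysis type A", "option for analysis type B"]},
    {"item": "Hypotheses", "subitems": ["direction", "rationale"]},
]


def count_recommendations(items):
    """Count one recommendation per top-level item and skip all subitems."""
    return sum(1 for _ in items)


def count_with_subitems(items):
    """Alternative rule, shown only for contrast: subitems count as well."""
    return sum(1 + len(entry["subitems"]) for entry in items)


n_recommendations = count_recommendations(template)
n_with_subitems = count_with_subitems(template)
print(n_recommendations, n_with_subitems)
```

Counting only top-level items keeps templates of different granularity more comparable, at the cost of underweighting templates that put most of their detail in subitems.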
3.1.1.1. Justifications
The main justification for registering SU/HD studies was that registration would prevent questionable
research practices like p-hacking, selective or retrospective reporting and HARKing (hypothesizing
after the results are known). Several authors argued that the large scale [52,73] and widespread
availability of health datasets [7,52] make analyses based on existing health data more susceptible
to biases because of the large number of analysis options and possible prior knowledge of the
data. Another commonly mentioned justification for registration was that it would allow someone
to identify publication bias or potentially prevent publication bias. This is because an overview of all
published and non-published studies allows the research community to identify non-published studies
and request the results to be made available. The authors of one study claimed that publication bias
might be more severe for SU/HD studies because journals may have less expertise in evaluating such
studies [7]. Finally, a paper mentioned that registration could be helpful in drafting ethical review board
submissions and informed consent forms [74] and justified more registrations of SU/HD studies on that
ground.
Figure 1. Number of papers with at least one call, justification or recommendation regarding the five transparency elements in the
peer-reviewed literature (top) and institutional documents (bottom).
Figure 2. Overview of the justifications in peer-reviewed literature and institutional documents for improving the transparency
in studies making secondary use of health data. Note that the sizes of the areas represent the number of times we found these
justifications in the literature.
Figure 3. Overview of the recommendations in peer-reviewed literature and institutional documents for improving the transparency
in studies making secondary use of health data. Note that the sizes of the areas represent the number of times we found these
recommendations in the literature.
3.1.1.2. Recommendations for researchers
Most often, recommendations on how to register SU/HD studies were about which study elements need to be
specified in a registration. Many elements were mentioned but in the context of SU/HD most emphasis
was placed on registering the data source and the statistical choices. Registering the authors’ prior
knowledge of the data was emphasized strongly by Baldwin et al. [49], Orsini et al. [7] and Van den
Akker et al. [52], but was not mentioned in two extensive papers presenting registration guidance
[72,75]. Several papers simply provided a list of platforms where authors can preregister their SU/HD
study, where references to clinicaltrials.gov and the electronic Register of Post-Authorization Studies
were most common [7,74,76,77]. Finally, several papers stated that deviations from registrations
should be transparently disclosed, preferably including the timing of and justification for the change
[7,49,51,53,67,70,78].
3.1.1.3. Recommendations for institutions
We also found several recommendations in the literature where guidance was provided to institutions
on how to improve the infrastructure surrounding registration. For example, Orsini et al. [7] discussed
that embargoing registrations could preserve intellectual property and prevent scooping. Zarin et al.
[73] and Wang et al. [72] focused more on registration templates, where the former argued for creating
balanced templates that take into account both comprehensiveness (providing more information about
the study) and simplicity (yielding higher uptake of registration of SU/HD studies), and the latter
argued for the integration of templates with HD registries.
3.1.1.4. Limitations
Some authors also provided points of concern in relation to the registration of SU/HD studies,
although these discussions were often limited. Orsini et al. [7] stated that registration does not
guarantee high-quality studies, and Zarin et al. [73] argued that registration of SU/HD studies likely
has limited impact unless any recommendations or policies can be legally enforced. Dhruva et al. [53]
agreed with the point about enforcement and called for a mandate for registration of SU/HD studies as
the 2004 ICMJE policy did for clinical trials [79].
3.1.2. Reporting
We found three papers with a call for methods reporting, four with a call for results reporting, 16
with a justification for methods reporting, six with a justification for results reporting, 32 with one
or more recommendations for how to best report the methods of a SU/HD study and 13 with one or
more recommendations for how to best report the results of a SU/HD study. In total, we found 147
recommendations for methods reporting, and 26 recommendations for results reporting. Five papers
provided more than 10 recommendations (13 in [61]; 15 in [62]; 13 in [63]; 50 in [59]; 11 in [66]).
Table 1. Total number of calls (C), justifications (J) and recommendations (R) in the peer-reviewed literature (n = 116) and institutional
documents (n = 23).
transparency practice | peer-reviewed C | peer-reviewed J | peer-reviewed R | institutional C | institutional J | institutional R
registration | 4 | 24 | 112 | 2 | 1 | 55
methods reporting | 3 | 24 | 147 | 1 | 1 | 19
results reporting | 4 | 6 | 26 | 0 | 1 | 1
data sharing | 2 | 9 | 1 | 1 | 0 | 1
code sharing | 2 | 9 | 4 | 2 | 0 | 1
total | 15 | 72 | 290 | 6 | 3 | 77
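As a consistency check, the totals row of table 1 can be recomputed from the five rows above it. The short Python sketch below uses only the numbers printed in the table; the variable names are illustrative.

```python
# Rows of table 1: (C, J, R) for peer-reviewed literature, then (C, J, R) for
# institutional documents.
rows = {
    "registration":      (4, 24, 112, 2, 1, 55),
    "methods reporting": (3, 24, 147, 1, 1, 19),
    "results reporting": (4, 6, 26, 0, 1, 1),
    "data sharing":      (2, 9, 1, 1, 0, 1),
    "code sharing":      (2, 9, 4, 2, 0, 1),
}
reported_total = (15, 72, 290, 6, 3, 77)

# Sum each column across the five transparency practices.
computed_total = tuple(sum(column) for column in zip(*rows.values()))
print(computed_total == reported_total)
```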
Table 2. Qualitative analysis of justifications and recommendations on registration.
explanation example quote
justifications
publication bias registration can prevent or at least allows
identification of publication bias because
it provides a record of studies that are not
published
’How can investigators presenting an RWE study dispel
the suspicion that they ran multiple similar studies
and analyses, but published only the one that gave
a positive result? One answer is to adopt institutional
and corporate policies on RWE studies that provide a
similar level of rigor to policies on the conduct of RCTs.
Such a policy may set the definition of RWE studies,
mandate posting an outline protocol for each study on
an appropriate forum’ [48]
questionable
research practices
registration can prevent or at least allows
identification of questionable research
practices because registration allows third
parties to compare the plans to the actual
study
‘Because research plans and hypotheses are specified
before the results are known, pre-registration reduces
the potential for cognitive biases to lead to p-hacking,
selective reporting, and HARK-ing’ [49]
informative
conclusions
registration allows third parties to better assess
the study and its results when the study has
been completed
‘[T]ransparency improves the ability of decision-makers
to assess the quality and validity of a study by
giving them a deeper understanding of why and how
the research was conducted and whether the results
reflect pre-established questions and methods’ [7]
prior knowledge registration makes clear what prior knowledge
authors had about the dataset that could
have biased their decisions
‘Bias resulting from retrospective selection can be serious,
especially when selecting external data and key
analysis features, when the external control results
are already known. Pre-specification is therefore an
essential pre-requisite when using external control
data’ [50]
recommendations
deviations transparently discuss any deviations made
from the registration in the final paper
‘We recommend that researchers be transparent about
their ex ante analytic plans, provide justification for
subsequent changes in analytic models, and report out
the results of their ex ante analytic plan as well as the
results from its modifications’ [51]
what to register:
— basic information provide basic information about the study,
like the study rationale and the main
hypotheses
‘Please provide the hypotheses of your secondary data
analysis. Make sure they are specific and testable, and
make it clear what your statistical framework is (e.g.,
Bayesian inference, NHST). In case your hypothesis is
directional, do not forget to state the direction. Please
also provide a rationale for each hypothesis’ [52]
— data source provide information about the data source/
study population
‘First, registration would require researchers to prespecify
their data source(s) along with sample inclusion and
exclusion criteria allowing preanalytic evaluation of
whether the study is representative of the patient
population using the medical product. For any data
source, quality-control measures to ensure data
integrity would be proactively described’ [53]
— prior knowledge provide information about any knowledge the
authors have about the data
‘To increase transparency about potential biases arising
from knowledge of the data, researchers could
routinely report all prior data access in a pre-
registration. This would ideally include evidence from
3.1.2.1. Justifications
Justifications for better methods reporting typically came in two shapes: (i) good reporting provides
the information necessary to reproduce (i.e. redo the analyses on the same dataset) or replicate (i.e.
do the same analyses on a different dataset) results from SU/HD studies [56,59,76,80] and (ii) good
reporting makes it easier to assess the results of the study itself by other researchers or peer reviewers
[48,55,60,74,81].
Several authors justified results reporting by claiming that it is a good way to provide informa-
tion to patients, enhancing trust and facilitating informed decisions [65,82,83]. Others mentioned that
reporting the results of all conducted analyses would decrease publication bias [48,65].
3.1.2.2. Recommendations for researchers
The main recommendations for improved methods reporting came in the shape of formal report-
ing guidelines, in which authors presented lists of relevant study elements that are important to
include in research reports. Some papers presented a guideline themselves [59,61,63,84], while others
merely advised to adhere to such guidelines [48,58,84–86]. The Reporting of Studies Conducted Using
Observational Routinely-Collected Health Data (RECORD) [84] and Wang et al. [59] had a general
goal to improve reporting in SU/HD studies. Other guidelines were more specific: RECORD-PE [63]
is dedicated to improving reporting in pharmacoepidemiologic research, Patorno et al. [87] discussed
RECORD in light of the reporting of diabetes research and Chai et al. [62] focused on traditional
Chinese medicine.
Recommendations for results reporting were limited because many authors simply stated that results
should be reported and why, not necessarily how (though an elaborate list of different elements to
consider can be found in the supplementary materials of [59]). Berger et al. [76] stated that not
only medical journals can be used to report results, but also publicly available websites. Hersh et
al. [88], Roche et al. [64] and Wang et al. [66] emphasized that sensitivity analyses can be useful when
presenting results because many analysis options in secondary health data give rise to many different
interpretations of the data.
3.1.2.3. Recommendations for institutions
We also found some institutional recommendations regarding reporting. For example, while the value
of reporting guidelines was echoed by Khachfe et al. [89], they emphasized that such guidelines should
be included in manuscript submission and editorial processes for them to be effective. In a similar
way as Dhruva et al. [53] did for registration, they stated that the ICMJE could play a mandating role
in this regard. In addition, Khachfe et al. [89] argued that more domain-specific checklists should be
drafted, and such checklists could already be integrated into educational modules. Finally, Bate [90]
mentioned that reporting guidelines for unstructured data like social media data are lacking and that
meta-research on the impact of the guidelines is desirable.
Table 2. (Continued.)
explanation example quote
an independent gatekeeper (e.g., a data guardian of the
study) stating whether data and relevant variables
were accessed by each co-author’ [49]
— analysis plan provide detailed information about the
planned statistical analyses
‘Describe details of sensitivity analysis that will be
performed to confirm the robustness of the results
of analysis. Sensitivity analysis is especially important
in pharmacoepidemiological studies with databases
because the results of analysis tend to vary
significantly depending on study design such as
definition of exposure, outcome, covariates, etc.
Describe all previously planned sensitivity analyses
and ensure that these are differentiated from
additional interim sensitivity analyses’ [54]
Table 3. Qualitative analysis of justifications and recommendations on reporting.
explanation example
methods reporting
justifications
informative conclusions methods reporting provides context with which third parties can interpret the drawn conclusions in a less
biased manner
‘Data users should agree to make the methods and results of their secondary analyses publicly available not only
through scientific publications (that may or may not be prepared and, if prepared, that may or may not be
accepted for publication) but also by depositing them in a repository and making them discoverable. This will be
important to provide further examples of effective data sharing and allow any conclusions from secondary use
to be examined by others’ [55]
reproducibility providing information about the methods/statistical analyses helps other researchers to redo the analysis using
the same dataset to see whether the results are consistent
‘Because of the lack of standardization in secondary data analytics, complete transparency is critically important
in the reporting of analytic approaches and all coding details. This will allow reproduction of analyses,
replication of findings using different data sources, and ultimately greater confidence in such analyses, possibly
approaching the trust we place in highly controlled clinical trials’ [56]
replicability providing information about the methods/statistical analyses helps other researchers to redo the analysis using
a different dataset to see whether the results are consistent
‘This system would enable regulators to repeat the exact same study and change assumptions or definitions in the
design and statistical analysis either through submission of data or by providing access to the data’ [57]
recommendations
guidelines use existing reporting guidelines to write the methods section of your research papers ‘Several guidelines have been developed to enhance reporting, such as Strengthening the Reporting of
Observational Studies in Epidemiology (STROBE), the Reporting of studies Conducted using Observational
Routinely-collected Data (RECORD) statement, and its extension for pharmacoepidemiology studies
(RECORD-PE). Interested researchers should always consult these guidelines for reporting of their studies’
[58]
what to report:
— overview study design provide a descriptive or visual overview of the basic design elements that make up your study ‘Reporting on overall study design should include a figure that contains 1st and 2nd order temporal anchors and
depicts their relation to each other’ [59]
— data source/study population provide any relevant details about where the data came from and how they were managed ‘Describe the nature of dataset(s) used. In particular: The purpose of the dataset—e.g. observational research
registry, national audit programme, administrative dataset (linked to financial remuneration or service
delivery). This should include details of the funding of the dataset, and the organization(s) responsible for
the administration and oversight’ [60]
— prior knowledge provide any prior knowledge about the data that the authors may have had before doing the statistical
analyses
‘Authors should describe the extent to which the investigators had access to the database population used to create
the study population’ [61]
— exposures/predictors provide information about the factor, condition or intervention that may impact the health outcomes of
interest
‘Reporting on exposure definition should include: The type of exposure that is captured or measured, e.g. drug
versus procedure, new use, incident, prevalent, cumulative, time-varying’ [59]
— comparators/control sample provide information about the groups or conditions against which the exposure group is compared to evaluate
the impact of the exposure on the health outcome of interest
‘When compared with another TCM exposure/intervention, it is necessary to evaluate whether the evidence base for
the efficacy and safety of the control is sufficient’ [62]
— confounders/covariates provide information about the variables that are (potentially) associated with both the exposure and the
health outcome of interest
‘Discuss the potential for confounding, both measured and unmeasured, and how this was assessed and addressed’
[51]
— outcomes provide information about health-related endpoints or events of interest ‘Discuss how outcomes were measured and how classification bias was addressed’ [51]
— statistical analysis provide detailed information about the statistical methods used to draw inferences about the variables of
interest
‘Describe the methods used to evaluate whether the assumptions have been met’ [63]
results reporting
justifications
informative conclusions effective reporting of results helps readers draw more accurate conclusions from a study ‘The results of all analyses that are conducted (e.g., matched, unmatched, adjusted, and unadjusted) results should
be reported. Presentation of the unadjusted results helps to demonstrate the robustness of the chosen method
of analysis; matched or adjusted results that differ substantially from the unmatched/unadjusted can reduce
confidence in the matched/adjusted trends observed’ [64]
publication bias also presenting non-statistically significant results helps to prevent the literature from being disproportionally
filled with ‘positive’ studies
‘The lack of RW study protocol registration and reporting of results can potentially lead to significant bias in
reporting positive/selective results, as studies that do not produce the expected data will probably not be
completed or submitted for peer review. RW studies, regardless of the origin of RW data, need to be registered
in a manner equivalent to that of clinical trials’ [65]
recommendations
patient characteristics provide detailed information about the characteristics of the patients in the sample ‘The rationale for reporting on characteristics of the study population is described in numerous other reporting
guidance documents. This includes items such as an attrition table (showing patient numbers as eligibility
criteria are applied), baseline characteristics of the derived population, as well as the number and timing of
outcomes of interest. It allows the investigator and reviewers to describe and assess whether the frequency of a
derived variable is consistent with expectation (e.g., that the outcome incidence or a covariate prevalence looks
approximately correct). The same rationale applies in studies that develop or use derived information from NLP
and ML algorithms’ [66]
sensitivity analyses provide more than one analysis so that others can assess the robustness of the results ‘Report univariate and multivariate results in an unbiased and complete fashion such that the benefits and risks of
all comparators reflect “fair balance” ’ [67]
effect sizes provide effect sizes alongside the statistical significance of the results ‘second, we recommend researchers report effect sizes as many associations may be statistically, but not practically,
significant when analyzing large sample sizes. In doing so, we may need to adjust our collective expectations of
what effect sizes to expect, and which ones to treat as substantial’ [68]
interpretation provide a cautious discussion of the results in light of the research questions and/or hypotheses ‘Discuss the potential for confounding by indication, contraindication or disease severity or selection bias (healthy
adherer/sick stopper) as alternative explanations for the study findings when relevant’ [63]
Table 4. Qualitative analysis of justifications and recommendations on sharing.
explanation example quote
data sharing
justifications
informative conclusions shared data, including metadata, allows third parties to make better assessments of the findings
based on the data
‘The disclosure of research-related information is also fundamental to improving the transparency of studies, especially the availability
of raw data, which enables readers to assess the authenticity and reliability of the findings’ [69]
computational reproducibility data sharing (and code sharing) allows third parties to redo the analyses in the study to see
whether the same results are found
‘For full analytic reproducibility, sharing of code and data is encouraged. However, there are often privacy and intellectual property
considerations that prevent sharing of data, data derivatives, or code’ [66]
sensitivity analyses data sharing allows third parties to do analyses that differ from the analyses in the original study,
allowing an assessment of the reliability of the results
‘Ideally reviewers of submitted evidence, including HTA bodies or independent review groups, would also have access to the data and
analytical code to ensure the replicability of the submitted results and assess the impact of alternative analytical decisions or data
on the resulting estimate(s). However, there remain substantial governance, technical and practical challenges to sharing data,
including a lack of in-house expertise in many HTA agencies’ [70]
code sharing
justifications
informative conclusions shared code allows third parties to see the exact analyses that were conducted, allowing a better
interpretation of the results
‘Finally, the analyses conducted on secondary data are commonly more complex than those applied to simpler experimental designs.
Methods sections in high-impact journals are often highly condensed or hidden at the end of an article, which can make it difficult
or even impossible to assess which analyses exactly were performed. To address this issue, we recommend authors always publish
the full analytic code, even when the raw data cannot be directly shared’ [68]
computational reproducibility shared code allows third parties to redo the analyses in the study to see whether the same results
are found
‘Irreproducibility can be mitigated by sharing raw and processed data and codes, assuming no privacy is compromised in this
process. For replicability, given that RWD are not generated from controlled trials and every data set may has its own unique
data characteristics, complete replicability can be difficult or even infeasible. Nevertheless, detailed documentation of data
characteristics and pre-processing, pre-registration of analysis procedures, and adherence to open science principles (e.g., code
repositories) are critical for replicating findings on different RWD datasets, assuming they come from the same underlying
population’ [71]
In contrast to tables 2 and 3, this table does not include recommendations because we could not find enough recommendations to do a thematic analysis.
3.1.2.4. Limitations
A limited number of concerns or critiques were raised. Both Kent et al. [70] and Orsini et al. [7] warned
that adhering to reporting guidelines does not necessarily support reproduction, and that such adherence
is not necessarily a sign of high research quality.
3.1.3. Sharing
Guidance on sharing the data and sharing the code of SU/HD studies was sparse in the peer-reviewed
literature. Two papers included a call for more data sharing and more code sharing, eight papers
included a justification for more data sharing and seven papers for more or better code sharing. We
found one recommendation in a paper with regard to data sharing [91], and four recommendations in
one paper [80] with regard to code sharing.
3.1.3.1. Justifications
As justifications for data and code sharing, we found that it would allow computational reproducibility
checks [68,71,72] and robustness checks [57,70]. Both involve redoing the analysis, but the goal
of a computational reproducibility check is to see whether one arrives at the same outcome using the
same parameters as in the original analysis, whereas the goal of a robustness check is to see whether one
arrives at the same outcome using slightly different parameters. Some authors [77] also mentioned
that access to data and code is necessary for third parties to redo the analysis on a different dataset.
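The difference between the two kinds of check can be sketched in code. The following Python example is a minimal illustration and is not taken from any of the reviewed papers: the toy records, the `estimate_effect` analysis and the `min_age` eligibility parameter are all hypothetical.

```python
import math
import statistics

# Hypothetical toy dataset for illustration only; not from any reviewed study.
RECORDS = [
    {"age": 17, "exposed": True, "outcome": 9.0},
    {"age": 25, "exposed": True, "outcome": 5.0},
    {"age": 40, "exposed": True, "outcome": 7.0},
    {"age": 30, "exposed": False, "outcome": 4.0},
    {"age": 66, "exposed": False, "outcome": 6.0},
]


def estimate_effect(records, min_age=18):
    """Mean outcome difference (exposed minus unexposed) among eligible records."""
    eligible = [r for r in records if r["age"] >= min_age]
    exposed = [r["outcome"] for r in eligible if r["exposed"]]
    unexposed = [r["outcome"] for r in eligible if not r["exposed"]]
    return statistics.mean(exposed) - statistics.mean(unexposed)


def reproducibility_check(records, reported, min_age=18, tol=1e-9):
    """Rerun the analysis with the ORIGINAL parameters and compare to the reported estimate."""
    return math.isclose(estimate_effect(records, min_age), reported, abs_tol=tol)


def robustness_check(records, alternative_min_ages):
    """Rerun the analysis with slightly DIFFERENT parameters and collect the estimates."""
    return {age: estimate_effect(records, age) for age in alternative_min_ages}
```

A reproducibility check therefore needs the data, the code and the original parameter values, whereas a robustness check additionally requires a judgement about which alternative parameter values are reasonable.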
3.1.3.2. Recommendations for researchers
Regarding code sharing, recommendations include a modular programming approach, where code is
separated into independent and interchangeable modules, version control systems and a standardiza-
tion of common analytical approaches [80]. Herrett et al. [92] point to code repositories for electronic
health record research. In these repositories, users can share their methods (metadata, code) so that
others can use or modify them. An example of such a repository is the HDR UK Phenotype Library
(https://phenotypes.healthdatagateway.org/).
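As a rough illustration of what a modular programming approach might look like in practice, the Python sketch below separates cohort selection and summarization into independent functions that can be swapped without touching the rest of the analysis. It is an assumption about how such a structure could be set up, not the approach described in [80]; all function names and record fields are hypothetical.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


# --- Cohort-selection modules (interchangeable) ---
def select_adults(records: List[Record]) -> List[Record]:
    """Keep records of patients aged 18 or older."""
    return [r for r in records if r["age"] >= 18]


def select_older_adults(records: List[Record]) -> List[Record]:
    """Keep records of patients aged 65 or older."""
    return [r for r in records if r["age"] >= 65]


# --- Summary modules (interchangeable) ---
def event_count(cohort: List[Record]) -> int:
    """Number of records in the cohort with an event."""
    return sum(1 for r in cohort if r["event"])


def event_proportion(cohort: List[Record]) -> float:
    """Share of records in the cohort with an event."""
    return event_count(cohort) / len(cohort)


def run_analysis(
    records: List[Record],
    select_cohort: Callable[[List[Record]], List[Record]],
    summarize: Callable[[List[Record]], Any],
) -> Any:
    """Compose one cohort-selection module with one summary module."""
    return summarize(select_cohort(records))
```

Because each module has a single responsibility, a shared cohort definition can be reused across studies, and a change to one module can be tracked in version control without affecting the others.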
3.1.3.3. Recommendations for institutions
Then et al. [24] called for the conditions of data sharing to be stated more clearly when data
users make agreements with data providers. Kent et al. [70] discussed a broader conception of data
sharing and argued that data and code availability would be useful not only for other researchers but
also for health technology assessment (HTA) bodies or independent review groups.
3.1.3.4. Limitations
Most discussions about re-sharing data came with caveats, which could be technical, practical or moral.
Many authors cautioned that data sharing is not always allowed because of privacy reasons embedded
in data transfer agreements [59,66]. Many added that information like codebooks or verbal descriptions of
the data is necessary for other researchers to effectively re-use the data [59,66]. To alleviate the concerns
surrounding privacy and data transfer agreements, one paper [91] argues in favour of broad consent,
where patients provide their consent not only for the original study but also for studies after that, or
dynamic consent, where patients can make granular decisions about who can use their data and when.
3.2. Guidance in institutional documents
3.2.1. Registration
Of the 21 institutional documents, two included a call for more or better registration of SU/HD
studies, one included a justification for more or better registration and seven included one or more
recommendations. In total, we found 55 recommendations for the registration of SU/HD studies. The
registration recommendations were mainly found in three papers: seven in [93]; 15 in [94]; and 24 in
[54]. Because we found hardly any justifications, recommendations for institutions and limitations in
our set of institutional documents, we only discuss the recommendations below.
3.2.1.1. Recommendations for researchers
The Japanese Pharmaceuticals and Medical Devices Agency (2014) established a ‘Committee for
preparation of guidelines on conducting pharmacoepidemiological studies’ to provide a list of
elements that would be good to include in registrations of SU/HD studies. Similarly, the German
Society for Epidemiology (2008) based their list of to-register elements on a working group, the
AGENS Working Group for the Survey and Utilization of Secondary Data, which in turn was strongly
based on the Good Epidemiological Practice report, which has been available since 2000 and has
seen many revisions, the most recent one in 2019 [95]. Other documents typically referred to external
sources when making recommendations. For example, Health Canada [94] listed elements that
researchers would do well to register, based on the European Network of Centres for
Pharmacoepidemiology and Pharmacovigilance [96] and the International Society of
Pharmacoepidemiology [97], and the Council for International Organizations of Medical Sciences [98]
referred to Berger et al. [76]. The specific
elements recommended to be included in registrations overlapped greatly with those specified in the
peer-reviewed literature.
3.2.2. Reporting
For reporting, we found one call and one justification for methods reporting, and one justification for
results reporting. We found 19 recommendations for methods reporting in seven documents, and just
one recommendation for results reporting. We only discuss the recommendations in more detail below.
3.2.2.1. Recommendations for researchers
The Japanese Pharmaceuticals and Medical Devices Agency (2014) and the German Society for
Epidemiology [93] provided several recommendations for methods reporting, most of which revolve
around specifying the data source and summarizing the study design. We did not locate recommenda-
tions for results reporting aside from a listing of possible publication sites [98].
3.2.3. Sharing
Guidance for data sharing and code sharing was scarce, with only one call for data sharing, two calls
for code sharing and one recommendation for both. We only discuss the recommendations in more
detail below.
3.2.3.1. Recommendations for researchers
While the Council for International Organizations of Medical Sciences (2024) called for more code
sharing and Health Canada [94] called for more data sharing, neither provided recommendations
on how to engage in these practices effectively. Only the German Society for Epidemiology (2008)
offered some advice: data sharing should only take place with the permission of the data owner, but
code sharing could take place independently.
4. Conclusions and discussion
In this study, we found that 44 (out of 116) peer-reviewed papers and 12 (out of 21) institutional
documents discussed transparency in studies that make secondary use of health data (SU/HD). The
justifications for increased transparency included the prevention of questionable research practices, the
facilitation of more informative conclusions and the enhancement of reproducibility and replicability.
Recommendations for increased transparency were primarily presented in structured documents like
registration templates and reporting guidelines. These documents provide guidance on the study
elements to describe in registrations and research papers. For registrations, guidance documents
mainly recommended providing detailed information about the data source, the planned statistical
analyses and any prior knowledge about the data that authors may have. For methods reporting,
guidance documents primarily recommended providing detailed descriptions of the study design,
data sources, the variables used in the analysis and the statistical analysis itself. Recommendations on
results reporting highlighted the importance of presenting all conducted analyses, including non-sig-
nificant results, providing detailed sample characteristics and running sensitivity analyses. Guidance
was limited in the context of data and code sharing. Instead, practical and privacy concerns that could
prevent sharing were noted frequently.
In the context of registration and reporting, we found that the available guidance often emphasized
that communication about the data source is vital. Indeed, some authors argued that existing data,
being more voluminous and accessible, provide more scope for researcher biases to creep in and
therefore create a greater need for transparency. Based on these arguments, we believe that
discussions about the transparency of SU/HD studies should not be limited to methods reporting and
results reporting, as is typical for clinical trials, but should also involve a separate category of
dataset reporting (i.e. providing detailed information about the data source and the variables in the
dataset). Formalizing these three separate categories of reporting could help researchers become more
aware of the need to be transparent in all three areas.
As data and code sharing are crucial for reproducibility and transparency, the limited guidance
in these areas represents a significant gap that needs to be addressed. In the case of data sharing,
researchers may not know how to be transparent because the data owner may have placed restrictions
on sharing, or it is unclear who is allowed to share and what is allowed to be shared. Further research
should prioritize developing detailed and actionable guidelines for data and code sharing in SU/HD
studies and how to align these steps with current consent processes.
When recommendations were provided for improving transparency, we often noted that little
attention was given to the caveats or concerns related to these recommendations. This is surprising
because improving the transparency of a research study is not as straightforward as it may seem.
For example, drafting a transparent manuscript typically requires more time and effort than drafting
a non-transparent one [99–101]. This can impact researchers at any career stage but is particularly
relevant for early career researchers, who rely heavily on producing output quickly to attain their
desired academic careers [102]. Future guidance on improving the transparency of SU/HD studies
could discuss both the benefits and the costs of transparency. When researchers are aware of the
complexities of transparent practices upfront, they may be more likely to continue to engage with these
practices in the future. If they instead encounter unexpected challenges during or after improving the
transparency of their papers, they may become disgruntled and steer clear of these practices from then on. Research
into the day-to-day work processes of researchers may shed more light on this.
Finally, the majority of transparency guidance we found came from papers in the scientific
literature. Guidance in institutional documents was relatively sparse, which is important to know
because researchers may be more likely to turn to organizations like the WHO or the EMA for
guidance than to the peer-reviewed literature. More institutional guidance would align with broader
trends in biomedical research that underscore the importance of clear, reproducible and robust study
methodologies to maintain public trust and scientific integrity. In March 2024, a new law came into
force in Germany, the Health Data Utilization Act (Gesundheitsdatennutzungsgesetz, GDNG), which
regulates the secondary use of health data and, in a separate paragraph, makes both the registration
of the corresponding studies in WHO-recognized registers and results reporting mandatory (Bundes-
ministerium für Gesundheit [103]). One month later, the European Parliament adopted the provisional
agreement on the European Health Data Space (EHDS) Regulation [104]. The EHDS will provide
researchers, innovators and industry with access to a large health dataset. These developments show
that the secondary use of health data is becoming more and more embedded into the scientific
ecosystem, highlighting the importance of guidance.
4.1. Limitations
Our review of institutional documents was less comprehensive than that of the peer-reviewed
literature. While we aimed to include a representative sample of institutional guidance documents,
the subjective selection process based on our own expertise may have inadvertently overlooked key
documents from health organizations, regulatory bodies or other relevant entities. This limitation
means that our findings might not fully capture the institutional perspective on transparency practi-
ces, potentially overlooking valuable insights and recommendations that could have influenced our
conclusions.
Relatedly, one could argue that our distinction between peer-reviewed literature and institutional
documents was somewhat arbitrary, potentially leading us to underestimate the availability of
guidance in institutional documents. Indeed, some peer-reviewed papers involved initiatives or
collaborations of formal organizations like the International Society for Pharmacoeconomics and
Outcomes Research (ISPOR) [59,76,105]. That being said, drawing a line between (more top-down)
institutional initiatives and (more bottom-up) initiatives by researchers is hard as it is often difficult
to assess the formality of scientific collaborations (e.g. in case of the RECORD initiative [61], the
RECORD-PE initiative [63] and the STaRT-RWE initiative [72]). To allow readers to draw their own
conclusions about our sample selection for both peer-reviewed literature and institutional documents,
a full list of documents in our sample can be found at https://osf.io/ednwx (peer-reviewed papers) and
https://osf.io/gajxt (institutional documents).
Another point to take into consideration when interpreting our results is the subjectivity of the
decisions we made throughout the research process. One example lies in our choice of inclusion criteria
for guidance documents. Although we chose these criteria carefully, we might have missed important papers that use
different terminology or focus on specific aspects of transparency not covered by our search strategy.
Another example of subjectivity in our research choices lies in the nature of our thematic analysis
and coding process. Because this process is inherently subjective, our prior knowledge and experience
could have introduced bias in our judgements. Although we used established qualitative analysis
methods to mitigate this risk, it remains a potential limitation that could affect the reliability of our
results. All our quantitative and qualitative codes can be found on the OSF repository of this project:
https://osf.io/2nup4. Readers interested in finding specific guidance on any of the transparency topics
can find a quantitative summary of the analysed articles at https://osf.io/ap7e8, including links to the
documents and the numbers of citations.
4.2. General conclusion
Our study highlights substantial efforts in the academic community to enhance transparency in SU/HD
studies. To bridge the gap between peer-reviewed recommendations and institutional practices, health
organizations could integrate the existing bottom-up initiatives into their formal guidelines. Future
research could focus on developing standardized, enforceable guidelines for data and code sharing,
while addressing practical and privacy concerns. Additionally, meta-research evaluating the imple-
mentation of transparency practices in SU/HD and the impact of transparency practices on research
quality, health outcomes and public trust would be desirable.
Ethics. This work did not require ethical approval from a human subject or animal welfare committee.
Data accessibility. All our data are available on the OSF repository of our project at https://osf.io/2nup4. The
preregistration is available at https://osf.io/7864h. A document with deviations from our preregistration is available
at https://osf.io/2nup4.
Supplementary material is available online [106].
Declaration of AI use. We have not used AI-assisted technologies in creating this article.
Authors’ contributions. O.R.v.d.A.: conceptualization, data curation, formal analysis, investigation, methodology,
project administration, writing—original draft; R.T.T.: methodology, writing—review and editing; J.P.A.I.:
methodology, writing—review and editing; S.G.S.: methodology, writing—review and editing; D.S.: conceptualization,
funding acquisition, methodology, supervision, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed
therein.
Conflict of interest declaration. We declare we have no competing interests.
Funding. This study is part of the project HiGHmed, which is funded by the Medical Informatics Initiative of the
German Federal Ministry of Education and Research (funding number 01ZZ2302E).
References
Note: * indicates references used in the analyses but not mentioned in the main text.
1. Mehra MR, Desai SS, Kuy S, Henry TD, Patel AN. 2020 Cardiovascular disease, drug therapy, and mortality in Covid-19. N. Engl. J. Med. 382,
e102. (doi:10.1056/NEJMoa2007621)
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 12: 241364
2. Mehra MR, Desai SS, Ruschitzka F, Patel AN. 2020 RETRACTED. Hydroxychloroquine or chloroquine with or without a macrolide for treatment
of COVID-19: a multinational registry analysis. Lancet. (doi:10.1016/s0140-6736(20)31180-6)
3. Patel AN, Desai SS, Grainger DW, Mehra MR. 2020 RETRACTED. Ivermectin in COVID-19 related critical illness. See https://www.isglobal.org/
documents/10179/6022921/Patel+et+al.+2020+version+1.pdf/fab19388-dc3e-4593-a075-db96f4536e9d.
4. Rubin EJ. 2020 Expression of concern: Mehra MR et al. Cardiovascular disease, drug therapy, and mortality in Covid-19. N Engl J Med. DOI:
10.1056/NEJMoa2007621. N. Engl. J. Med. 382, 2464–2464. (doi:10.1056/nejme2020822)
5. Piller C, Servick K. 2020 Two elite medical journals retract coronavirus papers over data integrity questions. Science. (doi:10.1126/science.
abd1697)
6. The Lancet Editors. 2020 Expression of concern. Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a
multinational registry analysis. Lancet 395, e102. (doi:10.1016/S0140-6736(20)31290-3)
7. Orsini LS etal. 2020 Improving transparency to build trust in real-world secondary sata studies for hypothesis testing—why, what, and how:
recommendations and a road map from the Real-World Evidence Transparency Initiative. Value Health 23, 1128–1136. (doi:10.1016/j.jval.
2020.04.002)
8. Beesley LJ, Mukherjee B. 2022 Statistical inference for association studies using electronic health records: handling both selection bias and
outcome misclassification. Biometrics 78, 214–226. (doi:10.1111/biom.13400)
9. Kundu R, Shi X, Morrison J, Mukherjee B. 2023 A framework for understanding selection bias in real-world healthcare data. (https://
arxiv.org/abs/2304.04652)
10. Young JC, Conover MM, Jonsson Funk M. 2018 Measurement error and misclassification in electronic medical records: methods to mitigate
bias. Curr. Epidemiol. Rep. 5, 343–356. (doi:10.1007/s40471-018-0164-x)
11. Zenker S etal. 2022 Data protection-compliant broad consent for secondary use of health care data and human biosamples for (bio)medical
research: towards a new German national standard. J. Biomed. Inform. 131, 104096. (doi:10.1016/j.jbi.2022.104096)
12. Westmore M, Bowdery M, Cody A, Dunham K, Goble D, van der Linden B, Whitlock E, Williams E, Lujan Barroso C. 2023 How an international
research funder’s forum developed guiding principles to ensure value and reduce waste in research. F1000Res. 12, 310. (doi:10.12688/
f1000research.128797.1)
13. Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. 2018 The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606. (doi:10.1073/
pnas.1708274114)
14. Rice DB, Moher D. 2019 Curtailing the use of preregistration: a misused term. Perspect. Psychol. Sci. 14, 1105–1108. (doi:10.1177/
1745691619858427)
15. Hardwicke TE, Wagenmakers EJ. 2023 Reducing bias, increasing transparency and calibrating confidence with preregistration. Nat. Hum.
Behav. 7, 15–26. (doi:10.1038/s41562-022-01497-2)
16. Dickersin K, Rennie D. 2003 Registering clinical trials. JAMA 290, 516–523. (doi:10.1001/jama.290.4.516)
17. Axfors C, Fröbert O, Janiaud P, Zavalis E, Hemkens LG, Ioannidis JP. 2024 Registrering av forskning baserad på nationella hälsoregister.
Lakartidningen 121, 15–16.
18. Errington TM, Mathur M, Soderberg CK, Denis A, Perfito N, Iorns E, Nosek BA. 2021 Investigating the replicability of preclinical cancer biology.
eLife 10, e71601. (doi:10.7554/elife.71601)
19. Ioannidis JPA. 2012 Why science is not necessarily self-correcting. Perspect. Psychol. Sci. 7, 645–654. (doi:10.1177/1745691612464056)
20. Nosek BA, Errington TM. 2020 What is replication? PLoS Biol. 18, e3000691. (doi:10.1371/journal.pbio.3000691)
21. Hart B, Lundh A, Bero L. 2012 Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ 344, d7202–d7202.
(doi:10.1136/bmj.d7202)
22. Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, Williamson PR. 2010 The impact of outcome reporting bias in randomised
controlled trials on a cohort of systematic reviews. BMJ 340, c365–c365. (doi:10.1136/bmj.c365)
23. Cobey KD etal. 2023 Community consensus on core open science practices to monitor in biomedicine. PLoS Biol. 21, e3001949. (doi:10.1371/
journal.pbio.3001949)
24. Then SN, Lipworth W, Stewart C, Kerridge I. 2021 A framework for ethics review of applications to store, reuse and share tissue samples.
Monash Bioeth. Rev. 39, 115–124. (doi:10.1007/s40592-021-00126-4)
25. Borysowski J, Wnukiewicz‐Kozłowska A, Górski A. 2020 Legal regulations, ethical guidelines and recent policies to increase transparency of
clinical trials. Br. J. Clin. Pharmacol. 86, 679–686. (doi:10.1111/bcp.14223)
26. Gamble C etal. 2017 Guidelines for the content of statistical analysis plans in clinical trials. JAMA 318, 2337–2343. (doi:10.1001/jama.2017.
18556)
27. International Committee of Medical Journal Editors, ICMJE. 2022 Recommendations for the conduct, reporting, editing, and publication of
scholarly work in medical journals. See https://www.icmje.org/icmje-recommendations.pdf.
28. World Medical Association. 2013 Declaration of Helsinki: ethical principles for medical research involving human subjects. See https://www.
wma.net/what-we-do/medical-ethics/declaration-of-helsinki/.
29. Schulz KF, Altman DG, Moher D. 2010 CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. J.
Pharmacol. Pharmacother. 1, 100–107. (doi:10.4103/0976-500x.72352)
30. World Health Organization. 2017 Joint statement on public disclosure of results from clinical trials. See https://www.who.int/news/item/18-
05-2017-joint-statement-on-registration.
31. European Data Protection Supervisor. 2023 Health data in the workplace. See https://edps.europa.eu/data-protection/data-protection/
reference-library/health-data-workplace_en.
32. Hjollund NHI, Valderas JM, Kyte D, Calvert MJ. 2019 Health data processes: a framework for analyzing and discussing efficient use and reuse
of health data with a focus on patient-reported outcome measures. J. Med. Internet Res. 21, e12412. (doi:10.2196/12412)
33. Towards the European Health Data Space. 2022 Report on secondary use of health data through European case studies. See https://tehdas.eu/
app/uploads/2022/08/tehdas-report-on-secondary-use-of-health-data-through-european-case-studies-.pdf.
34. World Health Organization. 2022 Meeting on secondary use of health data. See https://www.who.int/europe/news-room/events/item/2022/
12/13/default-calendar/meeting-on-secondary-use-of-health-data.
35. Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Detmer DE. 2007 Toward a national framework for the secondary
use of health data: an American Medical Informatics Association white paper. J. Am. Med. Informatics Assoc. 14, 1–9. (doi:10.1197/jamia.
m2273)
36. Makady A, de Boer A, Hillege H, Klungel O, Goettsch W. 2017 What is real-world data? A review of definitions based on literature and
stakeholder interviews. Value Health 20, 858–865. (doi:10.1016/j.jval.2017.03.008)
37. Geissbuhler A et al. 2013 Trustworthy reuse of health data: a transnational perspective. Int. J. Med. Informatics 82, 1–9. (doi:10.1016/j.
ijmedinf.2012.11.003)
38. Hripcsak G etal. 2014 Health data use, stewardship, and governance: ongoing gaps and challenges: a report from AMIA’s 2012 Health Policy
Meeting. J. Am. Med. Informatics Assoc. 21, 204–211. (doi:10.1136/amiajnl-2013-002117)
39. Gusenbauer M, Haddaway NR. 2020 Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating
retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11, 181–217. (doi:10.1002/jrsm.1378)
40. Gusenbauer M. 2019 Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic
databases. Scientometrics 118, 177–214. (doi:10.1007/s11192-018-2958-5)
41. Martín-Martín A, Thelwall M, Orduna-Malea E, Delgado López-Cózar E. 2021 Google Scholar, Microsoft Academic, Scopus, Dimensions, Web
of Science, and OpenCitations’ COCI: a multidisciplinary comparison of coverage via citations. Scientometrics 126, 871–906. (doi:10.1007/
s11192-020-03690-4)
42. Nicholas D, Boukacem‐Zeghmouri C, Rodríguez‐Bravo B, Xu J, Watkinson A, Abrizah A, Herman E, Świgoń M. 2017 Where and how early
career researchers find scholarly information. Learn. Publ. 30, 19–29. (doi:10.1002/leap.1087)
43. Van Noorden R. 2014 Online collaboration: scientists and the social network. Nature 512, 126–129. (doi:10.1038/512126a)
44. Athukorala K, Głowacka D, Jacucci G, Oulasvirta A, Vreeken J. 2016 Is exploratory search different? A comparison of information search
behavior for exploratory and lookup tasks. J. Assoc. Inf. Sci. Technol. 67, 2635–2651. (doi:10.1002/asi.23617)
45. Burns L, Roux NL, Kalesnik-Orszulak R, Christian J, Hukkelhoven M, Rockhold F, O’Donnell J. 2022 Real-world evidence for regulatory
decision-making: guidance from around the world. Clin. Ther. 44, 420–437. (doi:10.1016/j.clinthera.2022.01.012)
46. Braun V, Clarke V. 2006 Using thematic analysis in psychology. Qual. Res. Psychol. 3, 77–101. (doi:10.1191/1478088706qp063oa)
47. VERBI Software. 2021 MAXQDA 2022 [computer software]. Berlin, Germany. See https://www.maxqda.com/.
48. White R. 2017 Building trust in real-world evidence and comparative effectiveness research: the need for transparency. J. Comp. Eff. Res. 6,
5–7. (doi:10.2217/cer-2016-0070)
49. Baldwin JR, Pingault JB, Schoeler T, Sallis HM, Munafò MR. 2022 Protecting against researcher bias in secondary data analysis: challenges
and potential solutions. Eur. J. Epidemiol. 37, 1–10. (doi:10.1007/s10654-021-00839-0)
50. Burger HU, Gerlinger C, Harbron C, Koch A, Posch M, Rochon J, Schiel A. 2021 The use of external controls: to what extent can it currently be
recommended? Pharm. Stat. 20, 1002–1016. (doi:10.1002/pst.2120)
51. Berger ML, Mamdani M, Atkins D, Johnson ML. 2009 Good research practices for comparative effectiveness research: defining, reporting and
interpreting nonrandomized studies of treatment effects using secondary data sources: the ISPOR Good Research Practices for Retrospective
Database Analysis Task Force report—part I. Value Health 12, 1044–1052. (doi:10.1111/j.1524-4733.2009.00600.x)
52. Van den Akker OR et al. 2021 Preregistration of secondary data analysis: a template and tutorial. Meta Psychol. 5, 2625. (doi:10.15626/mp.
2020.2625)
53. Dhruva SS, Shah ND, Ross JS. 2020 Mandatory registration and results reporting of real-world evidence studies of FDA-regulated medical
products. Mayo Clin. Proc. 95, 2609–2611. (doi:10.1016/j.mayocp.2020.04.013)
54. Japanese Pharmaceuticals and Medical Devices Agency, PMDA. 2014 Guidelines for the conduct of pharmacoepidemiological studies in drug
safety assessment with medical information databases. See https://www.pmda.go.jp/files/000240951.pdf.
55. Ohmann C et al. 2017 Sharing and reuse of individual participant data from clinical trials: principles and recommendations. BMJ Open 7,
e018647. (doi:10.1136/bmjopen-2017-018647)
56. Schneeweiss S. 2016 Improving therapeutic effectiveness and safety through big healthcare data. Clin. Pharmacol. Ther. 99, 262–265. (doi:
10.1002/cpt.316)
57. Franklin JM, Glynn RJ, Martin D, Schneeweiss S. 2019 Evaluating the use of nonrandomized real-world data analyses for regulatory decision
making. Clin. Pharmacol. Therapeut. 105, 867–877. (doi:10.1002/cpt.1351)
58. Liu M, Qi Y, Wang W, Sun X. 2022 Toward a better understanding about real-world evidence. Eur. J. Hosp. Pharm. 29, 8–11. (doi:10.1136/
ejhpharm-2021-003081)
59. Wang SV et al. 2017 Reporting to improve reproducibility and facilitate validity assessment for healthcare database studies V1.0. Value
Health 20, 1009–1022. (doi:10.1016/j.jval.2017.08.3018)
60. Perry DC, Parsons N, Costa ML. 2014 Big data reporting guidelines: how to answer big questions, yet avoid big problems. Bone Joint J. 96B,
1575–1577. (doi:10.1302/0301-620X.96B12.35027)
61. Benchimol EI et al. 2015 The reporting of studies conducted using observational routinely-collected health Data (RECORD) statement. PLoS
Med. 12, e1001885. (doi:10.1371/journal.pmed.1001885)
62. Chai QY etal. 2022 Quality evaluation and reporting specification for real-world studies of traditional Chinese medicine. Chin. J. Integr. Med.
28, 1059–1062. (doi:10.1007/s11655-022-3583-y)
63. Langan SM et al. 2018 The reporting of studies conducted using observational routinely collected health data statement for
pharmacoepidemiology (RECORD-PE). BMJ 363, k3532. (doi:10.1136/bmj.k3532)
64. Roche N et al. 2014 Quality standards for real-world research. Focus on observational database studies of comparative effectiveness. Ann.
Am. Thorac. Soc. 11, S99–104. (doi:10.1513/AnnalsATS.201309-300RM)
65. van den Broek RWM, Matheis RJ, Bright JL, Hartog TE, Perfetto EM. 2022 Value-based evidence across health care sectors: a push for
transparent real-world studies, data, and evidence dissemination. Health Econ. Policy Law 17, 416–427. (doi:10.1017/s1744133122000056)
66. Wang SV etal. 2019 Transparent reporting on research using unstructured electronic health record data to generate ‘real world’ evidence of
comparative effectiveness and safety. Drug Saf. 42, 1297–1309. (doi:10.1007/s40264-019-00851-0)
67. Willke RJ, Mullins CD. 2011 ‘Ten commandments’ for conducting comparative effectiveness research using ‘real-world data’. (doi:10.18553/jmcp.2011.17.s9-a.S10)
68. Kievit RA, McCormick EM, Fuhrmann D, Deserno MK, Orben A. 2022 Using large, publicly available data sets to study adolescent
development: opportunities and challenges. Curr. Opin. Psychol. 44, 303–308. (doi:10.1016/j.copsyc.2021.10.003)
69. Zhao R, Zhang W, Zhang Z, He C, Xu R, Tang X, Wang B. 2023 Evaluation of reporting quality of cohort studies using real-world data based on
RECORD: systematic review. BMC Med. Res. Methodol. 23, 152. (doi:10.1186/s12874-023-01960-2)
70. Kent S et al. 2021 The use of nonrandomized evidence to estimate treatment effects in health technology assessment. J. Comp. Eff. Res. 10,
1035–1043. (doi:10.2217/cer-2021-0108)
71. Liu F, Panagiotakos D. 2022 Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med. Res.
Methodol. 22, 287. (doi:10.1186/s12874-022-01768-6)
72. Wang SV etal. 2021 STaRT-RWE: structured template for planning and reporting on the implementation of real world evidence studies. BMJ
372, m4856. (doi:10.1136/bmj.m4856)
73. Zarin DA, Crown WH, Bierer BE. 2020 Issues in the registration of database studies. J. Clin. Epidemiol. 121, 29–31. (doi:10.1016/j.jclinepi.
2020.01.007)
74. Santos J etal. 2017 ISPOR code of ethics 2017 (4th edition). Value Health 20, 1227–1242. (doi:10.1016/j.jval.2017.10.018)
75. Wang SV et al. 2022 Harmonized protocol template to enhance reproducibility of hypothesis evaluating real-world evidence studies on
treatment effects: a good practices report of a joint ISPE/ISPOR task force. Value Health 25, 1663–1672. (doi:10.1016/j.jval.2022.09.001)
76. Berger ML etal. 2017 Good practices for real‐world data studies of treatment and/or comparative effectiveness: recommendations from the
joint ISPOR‐ISPE Special Task Force on real‐world evidence in health care decision making. Pharmacoepidemiol. Drug Saf. 26, 1033–1039.
(doi:10.1002/pds.4297)
77. Moon RJ et al. 2023 Real-world evidence: new opportunities for osteoporosis research. Recommendations from a working group from the
European Society for Clinical and Economic Aspects of Osteoporosis, Osteoarthritis and Musculoskeletal Diseases (ESCEO). Osteoporos. Int. 34,
1283–1299. (doi:10.1007/s00198-023-06827-2)
78. Wang SV et al. 2022 Reproducibility of real-world evidence studies using clinical practice data to inform regulatory and coverage decisions.
Nat. Commun. 13, 5126. (doi:10.1038/s41467-022-32310-3)
79. De Angelis C etal. 2004 Clinical trial registration: a statement from the International Committee of Medical Journal Editors. Lancet 364, 911–
912. (doi:10.1016/s0140-6736(04)17034-7)
80. Denaxas S, Direk K, Gonzalez-Izquierdo A, Pikoula M, Cakiroglu A, Moore J, Hemingway H, Smeeth L. 2017 Methods for enhancing the
reproducibility of biomedical research findings using electronic health records. BioData Min. 10, 31. (doi:10.1186/s13040-017-0151-7)
81. Wang W etal. 2023 Data source profile reporting by studies that use routinely collected health data to explore the effec ts of drug treatment.
BMC Med. Res. Methodol. 23, 95. (doi:10.1186/s12874-023-01922-8)
82. Cave A, Kurz X, Arlett P. 2019 Real‐world data for regulatory decision making: challenges and possible solutions for Europe. Clin. Pharmacol.
Ther. 106, 36–39. (doi:10.1002/cpt.1426)
83. Cumyn A, Ménard JF, Barton A, Dault R, Lévesque F, Ethier JF. 2023 Patients’ and members of the public’s wishes regarding transparency in
the context of secondary use of health data: scoping review. J. Med. Internet Res. 25, e45002. (doi:10.2196/45002)
84. Nicholls SG et al. 2015 The REporting of Studies Conducted Using Observational Routinely-Collected Health Data (RECORD) statement:
methods for arriving at consensus and developing reporting guidelines. PLoS ONE 10, e0125620. (doi:10.1371/journal.pone.0125620)
85. Heikinheimo O, Bitzer J, García Rodríguez L. 2017 Real-world research and the role of observational data in the field of gynaecology: a
practical review. Eur. J. Contracept. Reprod. Health Care 22, 250–259. (doi:10.1080/13625187.2017.1361528)
86. Khosla S, White R, Medina J, Ouwens M, Emmas C, Koder T, Male G, Leonard S. 2018 Real world evidence (RWE)—a disruptive innovation or
the quiet evolution of medical evidence generation? F1000Research 7, 111. (doi:10.12688/f1000research.13585.2)
87. Patorno E, Schneeweiss S, Wang SV. 2020 Transparency in real‐world evidence (RWE) studies to build confidence for decision‐making:
reporting RWE research in diabetes. Diabetes Obes. Metab. 22, 45–59. (doi:10.1111/dom.13918)
88. Hersh WR etal. 2013 Recommendations for the use of operational electronic health record data in comparative effectiveness research. EGEMS
1, 1018. (doi:10.13063/2327-9214.1018)
89. Khachfe HH, Habib JR, Salhab HA, Fares MY, Chahrour MA, Jamali FR. 2022 American College of Surgeons NSQIP pancreatic surgery
publications: a critical appraisal of the quality of methodological reporting. Am. J. Surg. 223, 705–714. (doi:10.1016/j.amjsurg.2021.06.012)
90. Bate A. 2017 Guidance to reinforce the credibility of health care database studies and ensure their appropriate impact. Pharmacoepidemiol.
Drug Saf. 26, 1013–1017. (doi:10.1002/pds.4305)
91. Matandika L, Ngóngóla RT, Mita K, Manda-Taylor L, Gooding K, Mwale D, Masiye F, Mfutso-Bengo J. 2020 A qualitative study exploring
stakeholder perspectives on the use of biological samples for future unspecified research in Malawi. BMC Med. Ethics 21, 61. (doi:10.1186/
s12910-020-00503-4)
92. Herrett E, Gallagher AM, Bhaskaran K, Forbes H, Mathur R, van Staa T, Smeeth L. 2015 Data resource profile: Clinical Practice Research
Datalink (CPRD). Int. J. Epidemiol. 44, 827–836. (doi:10.1093/ije/dyv098)
93. German Society for Epidemiology, DGEpi. 2008 GPS—good practice in secondary data analysis: revision after fundamental reworking. See
https://www.dgepi.de/assets/Leitlinien-und-Empfehlungen/Practice-in-Secondary-Data-Analysis.pdf.
94. Health Canada, HC. 2019 Elements of real world data/evidence quality throughout the prescription drug product life cycle. See https://www.
canada.ca/en/services/health/publications/drugs-health-products/real-world-data-evidence-drug-lifecycle-report.html.
95. Hoffmann W et al. 2019 Guidelines and recommendations for ensuring good epidemiological practice (GEP): a guideline developed by the
German Society for Epidemiology. Eur. J. Epidemiol. 34, 301–317. (doi:10.1007/s10654-019-00500-x)
96. European Network of Centres for Pharmacoepidemiology and Pharmacovigilance, ENCePP. 2010 Guide on methodological standards in
pharmacoepidemiology (revision 11). EMA/95098/2010.
97. Public Policy Committee, International Society of Pharmacoepidemiology. 2016 Guidelines for good pharmacoepidemiology practice (GPP).
Pharmacoepidemiol. Drug Saf. 25, 2–10. (doi:10.1002/pds.3891)
98. Council for International Organizations of Medical Sciences, CIOMS. 2024 Real-world data and real world evidence in regulatory decision
making. See https://cioms.ch/working-groups/real-world-data-and-real-world-evidence-in-regulatory-decision-making.
99. Sarafoglou A, Kovacs M, Bakos B, Wagenmakers EJ, Aczel B. 2022 A survey on how preregistration affects the research workflow: better
science but more work. R. Soc. Open Sci. 9, 211997. (doi:10.1098/rsos.211997)
100. Spitzer L, Mueller S. 2023 Registered report: survey on attitudes and experiences regarding preregistration in psychological research. PLoS
ONE 18, e0281086. (doi:10.1371/journal.pone.0281086)
101. Tenopir C, Allard S, Douglass K, Aydinoglu AU, Wu L, Read E, Manoff M, Frame M. 2011 Data sharing by scientists: practices and perceptions.
PLoS ONE 6, e21101. (doi:10.1371/journal.pone.0021101)
102. Allen C, Mehler DMA. 2019 Open science challenges, benefits and tips in early career and beyond. PLoS Biol. 17, e3000246. (doi:10.1371/
journal.pbio.3000246)
103. Bundesministerium für Gesundheit. 2024 Gesetz zur verbesserten Nutzung von Gesundheitsdaten [Health Data Utilization Act]. See https://
www.bundesgesundheitsministerium.de/service/gesetze-und-verordnungen/detail/gesundheitsdatennutzungsgesetz.html.
104. Council European Union. 2024 Proposal for a regulation on the European Health Data Space. See https://www.consilium.europa.eu/media/
70909/st07553-en24.pdf.
105. Garrison LP, Neumann PJ, Erickson P, Marshall D, Mullins CD. 2007 Using real-world data for coverage and payment decisions: the ISPOR
Real-World Data Task Force report. Value Health 10, 326–335. (doi:10.1111/j.1524-4733.2007.00186.x)
106. Van den Akker OR, Thibault RT, Ioannidis JPA, Schorr SG, Strech D. 2025 Supplementary material from: Transparency in the secondary use of
health data: assessing the status quo of guidance and best practices. Figshare. doi:10.6084/m9.figshare.c.7644199
107*. Acha V, Barefoot B, Juarez Garcia A, Lehner V, Monno R, Sandler S, Spooner A, Verpillat P. 2023 Principles for good practice in the conduct of
non-interventional studies: the view of industry researchers. Ther. Innov. Regul. Sci. 57, 1199–1208. (doi:10.1007/s43441-023-00544-y)
108*. Alemayehu D, Cappelleri JC. 2013 Revisiting issues, drawbacks and opportunities with observational studies in comparative effectiveness
research. J. Eval. Clin. Pract. 19, 579–583. (doi:10.1111/j.1365-2753.2011.01802.x)
109*. Bahr A, Schlünder I. 2015 Code of practice on secondary use of medical data in European scientific research projects. Int. Data Priv. Law 5,
279–291. (doi:10.1093/idpl/ipv018)
110*. Bakker E, Plueschke K, Jonker CJ, Kurz X, Starokozhko V, Mol PGM. 2023 Contribution of real‐world evidence in European Medicines Agency’s
regulatory decision making. Clin. Pharmacol. Ther. 113, 135–151. (doi:10.1002/cpt.2766)
111*. Ballantyne A, Style R. 2017 Health data research in New Zealand: updating the ethical governance framework. N. Z. Med. J. 130, 1464.
112*. Banzi R, Canham S, Kuchinke W, Krleza-Jeric K, Demotes-Mainard J, Ohmann C. 2019 Evaluation of repositories for sharing individual-
participant data from clinical studies. Trials 20, 169. (doi:10.1186/s13063-019-3253-3)
113*. Baxter S, Franklin M, Haywood A, Stone T, Jones M, Mason S, Sterniczuk K. 2023 Sharing real-world data for public benefit: a qualitative
exploration of stakeholder views and perceptions. BMC Public Health 23, 133. (doi:10.1186/s12889-023-15035-w)
114*. Beaulieu‐Jones BK, Finlayson SG, Yuan W, Altman RB, Kohane IS, Prasad V, Yu K. 2020 Examining the use of real‐world evidence in the
regulatory process. Clin. Pharmacol. Ther. 107, 843–852. (doi:10.1002/cpt.1658)
115*. Bishop L, Kuula-Luumi A. 2017 Revisiting qualitative data reuse: a decade on. SAGE Open 7, 215824401668513. (doi:10.1177/
2158244016685136)
116*. Brakewood B, Poldrack RA. 2013 The ethics of secondary data analysis: considering the application of Belmont principles to the sharing of
neuroimaging data. NeuroImage 82, 671–676. (doi:10.1016/j.neuroimage.2013.02.040)
117*. Calvert M et al. 2018 Guidelines for inclusion of patient-reported outcomes in clinical trial protocols: the SPIRIT-PRO extension. JAMA 319,
483. (doi:10.1001/jama.2017.21903)
118*. Cambon-Thomsen A, Rial-Sebbag E, Knoppers BM. 2007 Trends in ethical and legal frameworks for the use of human biobanks. Eur. Respir. J.
30, 373–382. (doi:10.1183/09031936.00165006)
119*. Canaway R, Boyle DI, Manski‐Nankervis JE, Bell J, Hocking JS, Clarke K, Clark M, Gunn JM, Emery JD. 2019 Gathering data for decisions: best
practice use of primary care electronic records for research. Med. J. Aust. 210. (doi:10.5694/mja2.50026)
120*. Capkun G et al. 2022 Can we use existing guidance to support the development of robust real-world evidence for health technology
assessment/payer decision-making? Int. J. Technol. Assess. Health Care 38, e79. (doi:10.1017/s0266462322000605)
121*. Chinese Center for Drug Evaluation, NMPA. 2021 Guiding principles of real world data used to generate real world evidence (trial). See
https://redica.com/wp-content/uploads/NMPA_-Attachment_-_Guiding-Principles-of-Real-World-Data-Used-to-Generate-Real-World-Evidence-Trial_.pdf.
122*. Collins GS, Reitsma JB, Altman DG, Moons KGM. (n.d). Transparent reporting of a multivariable prediction model for individual prognosis or
diagnosis (TRIPOD)
123*. Cox E, Martin BC, Van Staa T, Garbe E, Siebert U, Johnson ML. 2009 Good research practices for comparative effectiveness research:
approaches to mitigate bias and confounding in the design of nonrandomized studies of treatment effects using secondary data sources: the
International Society for Pharmacoeconomics and Outcomes Research good research practices for retrospective database analysis task force
report—Part II. Value Health 12, 1053–1061. (doi:10.1111/j.1524-4733.2009.00601.x)
124*. Cragg L, Williams S, van der Molen T, Thomas M, Correia de Sousa J, Chavannes NH. 2018 Fostering the exchange of real world data across
different countries to answer primary care research questions: an UNLOCK study from the IPCRG. Npj Prim. Care Respir. Med. 28. (doi:10.1038/
s41533-018-0075-9)
125*. Cristea IA, Naudet F, Caquelin L. 2022 Meta-research studies should improve and evaluate their own data sharing practices. J. Clin. Epidemiol.
149, 183–189. (doi:10.1016/j.jclinepi.2022.05.007)
126*. Danciu I, Cowan JD, Basford M, Wang X, Saip A, Osgood S, Shirey-Rice J, Kirby J, Harris PA. 2014 Secondary use of clinical data: the Vanderbilt
approach. J. Biomed. Informatics 52, 28–35. (doi:10.1016/j.jbi.2014.02.003)
127*. de Vries J, Munung SN, Matimba A, McCurdy S, Ouwe Missi Oukem-Boyer O, Staunton C, Yakubu A, Tindana P, the H3Africa Consortium. 2017
Regulation of genomic and biobanking research in Africa: a content analysis of ethics guidelines, policies and procedures from 22 African
countries. BMC Med. Ethics 18, 8. (doi:10.1186/s12910-016-0165-6)
128*. Dhruva SS, Ross JS, Desai NR. 2018 Real-world evidence: promise and peril for medical product evaluation. Pharm. Ther. 43, 464.
129*. Dreyer NA. 2018 Advancing a framework for regulatory use of real-world evidence: when real is reliable. Ther. Innov. Regul. Sci. 52, 362–368.
(doi:10.1177/2168479018763591)
130*. Edmondson ME, Reimer AP. 2020 Challenges frequently encountered in the secondary use of electronic medical record data for research. CIN
38, 338–348. (doi:10.1097/cin.0000000000000609)
131*. Engel P, Almas MF, De Bruin ML, Starzyk K, Blackburn S, Dreyer NA. 2017 Lessons learned on the design and the conduct of post‐
authorization safety studies: review of 3 years of PRAC oversight. Br. J. Clin. Pharmacol. 83, 884–893. (doi:10.1111/bcp.13165)
132*. European Medicines Agency, EMA. 2018 Discussion paper: use of patient disease registries for regulatory purposes – methodological and
operational considerations. EMA/644749/2018
133*. European Medicines Agency, EMA. 2021 Guideline on registry-based studies. EMA/426390/2021
134*. European Network of Centres for Pharmacoepidemiology and Pharmacovigilance, ENCePP. 2018 ENCePP Checklist for Study Protocols
(Revision 4). EMA/540136/2009.
135*. Facey KM, Rannanheimo P, Batchelor L, Borchardt M, de Cock J. 2020 Real-world evidence to support payer/HTA decisions about highly
innovative technologies in the EU—actions for stakeholders. Int. J. Technol. Assess. Health Care 36, 459–468. (doi:10.1017/
s026646232000063x)
136*. Franklin JM, Glynn RJ, Martin D, Schneeweiss S. 2019 Evaluating the use of nonrandomized real‐world data analyses for regulatory decision
making. Clin. Pharmacol. Ther. 105, 867–877. (doi:10.1002/cpt.1351)
137*. Garrison LP, Neumann PJ, Erickson P, Marshall D, Mullins CD. 2007 Using real-world data for coverage and payment decisions: The ISPOR
real-world data task force report. Value Health 10, 326–335. (doi:10.1111/j.1524-4733.2007.00186.x)
138*. Gatto NM, Reynolds RF, Campbell UB. 2019 A structured preapproval and postapproval comparative study design framework to generate
valid and transparent real‐world evidence for regulatory decisions. Clin. Pharmacol. Ther. 106, 103–115. (doi:10.1002/cpt.1480)
139*. Gatto NM, Wang SV, Murk W, Mattox P, Brookhart MA, Bate A, Schneeweiss S, Rassen JA. 2022 Visualizations throughout
pharmacoepidemiology study planning, implementation, and reporting. Pharmacoepidemiol. Drug Saf. 31, 1140–1152. (doi:10.1002/pds.
5529)
140*. Gini R, Fournie X, Dolk H, Kurz X, Verpillat P, Simondon F, Strassmann V, Apostolidis K, Goedecke T. 2019 The ENCePP code of conduct: a best
practise for scientific independence and transparency in noninterventional postauthorisation studies. Pharmacoepidemiol. Drug Saf. 28,
422–433. (doi:10.1002/pds.4763)
141*. Gregg EW etal. 2023 Use of real-world data in population science to improve the prevention and care of diabetes-related outcomes. Diabetes
Care 46, 1316–1326. (doi:10.2337/dc22-1438)
142*. Hanzel J et al. 2023 Development of a core outcome set for real-world data in inflammatory bowel disease: a European Crohn’s and Colitis
Organisation [ECCO] position paper. J. Crohn’s Colitis 17, 311–317. (doi:10.1093/ecco-jcc/jjac136)
143*. Hayrinen K, Saranto K, Nykanen P. 2008 Definition, structure, content, use and impacts of electronic health records: a review of the research
literature. Int. J. Med. Informatics 77, 291–304. (doi:10.1016/j.ijmedinf.2007.09.001)
144*. Health Canada, HC. 2023 Guidance for reporting real-world evidence. See https://www.cadth.ca/sites/default/files/RWE/MG0020/MG0020-RWE-Guidance-Report-Secured.pdf.
145*. Honor LB, Haselgrove C, Frazier JA, Kennedy DN. 2016 Data citation in neuroimaging: proposed best practices for data identification and
attribution. Front. Neuroinformatics 10, 34. (doi:10.3389/fninf.2016.00034)
146*. Husereau D et al. 2013 Consolidated Health Economic Evaluation Reporting Standards (CHEERS)—explanation and elaboration: a report of
the ISPOR Health Economic Evaluation Publication Guidelines Good Reporting Practices Task Force. Value Health 16, 231–250. (doi:10.1016/j.jval.2013.02.002)
147*. Innovative Medicines Initiative, IMI. 2014 Code of practice on secondary use of medical data in scientific research projects. See
https://www.imi.europa.eu/sites/default/files/uploads/documents/reference-documents/CodeofPractice_SecondaryUseDRAFT.pdf.
148*. International Coalition of Medicines Regulatory Authorities, ICMRA. 2022 ICMRA statement on international collaboration to enable real-world
evidence (RWE) for regulatory decision-making. See https://www.icmra.info/drupal/sites/default/files/2022-07/icmra_statement_on_rwe.pdf.
149*. Jaksa A, Wu J, Jónsson P, Eichler HG, Vititoe S, Gatto NM. 2021 Organized structure of real-world evidence best practices: moving from
fragmented recommendations to comprehensive guidance. J. Comp. Eff. Res. 10, 711–731. (doi:10.2217/cer-2020-0228)
150*. Jansen ACM, van Aalst-Cohen ES, Hutten BA, Büller HR, Kastelein JJP, Prins MH. 2005 Guidelines were developed for data collection from
medical records for use in retrospective analyses. J. Clin. Epidemiol. 58, 269–274. (doi:10.1016/j.jclinepi.2004.07.006)
151*. Japanese Pharmaceuticals and Medical Devices Agency, PMDA. 2021a Points to consider for ensuring the reliability in utilization of registry
data for applications. See https://www.pmda.go.jp/files/000240807.pdf.
152*. Japanese Pharmaceuticals and Medical Devices Agency, PMDA. 2021b Basic principles on utilization of registry for applications. See
https://www.pmda.go.jp/files/000240806.pdf.
153*. Jawad M, Butrous E, Faber B, Gupta C. 2012 A study to define the international guidelines of ethics concerning electronic medical data 3.
154*. Johnson ML, Crown W, Martin BC, Dormuth CR, Siebert U. 2009 Good research practices for comparative effectiveness research: analytic
methods to improve causal inference from nonrandomized studies of treatment effects using secondary data sources: the ISPOR good
research practices for retrospective database analysis task force report—Part III. Value Health 12, 1062–1073. (doi:10.1111/j.1524-4733.2009.00602.x)
155*. Kim MJ, Kim HJ, Kang D, Ahn HK, Shin SY, Park S, Cho J, Park YH. 2023 Preliminary attainability assessment of real-world data for answering
major clinical research questions in breast cancer brain metastasis: framework development and validation study. J. Med. Internet Res. 25,
e43359. (doi:10.2196/43359)
156*. Kohane IS et al. 2021 What every reader should know about studies using electronic health record data but may be afraid to ask. J. Med.
Internet Res. 23, e22219. (doi:10.2196/22219)
157*. Lamas E, Barh A, Brown D, Jaulent MC. 2015 Legal and social issues related to the health data-warehouses: re-using health data in the
research and public health research
158*. Langhof H, Kahrass H, Sievers S, Strech D. 2017 Access policies in biobank research: what criteria do they include and how publicly available
are they? A cross-sectional study. Eur. J. Hum. Genet. 25, 293–300. (doi:10.1038/ejhg.2016.172)
159*. Laurent T, Lambrelli D, Wakabayashi R, Hirano T, Kuwatsuru R. 2023 Strategies to address current challenges in real-world evidence
generation in Japan. Drugs Real World Outcomes 10, 167–176. (doi:10.1007/s40801-023-00371-5)
160*. Levenson MS. 2020 Regulatory-grade clinical trial design using real-world data. Clin. Trials 17, 377–382. (doi:10.1177/1740774520905576)
161*. Loder E, Groves T, MacAuley D. 2010 Registration of observational studies. BMJ 340, c950–c950. (doi:10.1136/bmj.c950)
162*. Mahmud N, Goldberg DS, Bittermann T. 2022 Best practices in large database clinical epidemiology research in hepatology: barriers and
opportunities. Liver Transplant. 28, 113–122. (doi:10.1002/lt.26231)
163*. Medicines & Healthcare products Regulatory Agency, MHRA. 2021 MHRA guidance on the use of real-world data in clinical studies to support
regulatory decisions. See https://www.gov.uk/government/publications/mhra-guidance-on-the-use-of-real-world-data-in-clinical-studies-to-support-regulatory-decisions/mhra-guidance-on-the-use-of-real-world-data-in-clinical-studies-to-support-regulatory-decisions.
164*. National Institute for Health and Care Excellence, NICE. 2020 Widening the evidence base: use of broader data and applied analytics in NICE’s
work. See https://www.nice.org.uk/Media/Default/About/what-we-do/NICE-guidance/NICE-guidelines/how-we-develop-nice-guidelines/statement-of-intent.docx.
165*. Nishioka K, Makimura T, Ishiguro A, Nonaka T, Yamaguchi M, Uyama Y. 2022 Evolving acceptance and use of RWE for regulatory decision
making on the benefit/risk assessment of a drug in Japan. Clin. Pharmacol. Ther. 111, 35–43. (doi:10.1002/cpt.2410)
166*. Oortwijn W, Sampietro-Colom L, Trowman R. 2019 How to deal with the inevitable: generating real-world data and using real-world
evidence for HTA purposes – from theory to action. Int. J. Technol. Assess. Health Care 35, 346–350. (doi:10.1017/s0266462319000400)
167*. Pacurariu A, Plueschke K, McGettigan P, Morales DR, Slattery J, Vogl D, Goedecke T, Kurz X, Cave A. 2018 Electronic healthcare databases in
Europe: descriptive analysis of characteristics and potential for use in medicines regulation. BMJ Open 8, e023090. (doi:10.1136/bmjopen-2018-023090)
168*. Pavlenko E, Strech D, Langhof H. 2020 Implementation of data access and use procedures in clinical data warehouses. A systematic review of
literature and publicly available policies. BMC Med. Informatics Decis. Mak. 20, 157. (doi:10.1186/s12911-020-01177-z)
169*. Peacock A, Larance B, Bruno R, Pearson S, Buckley NA, Farrell M, Degenhardt L. 2019 Post‐marketing studies of pharmaceutical opioid abuse‐
deterrent formulations: a framework for research design and reporting. Addiction 114, 389–399. (doi:10.1111/add.14380)
170*. Perfetto EM, Burke L, Oehrlein EM, Gaballah M. 2015 FDAMA Section 114: why the renewed interest? J. Manag. Care Spec. Pharm. 21, 368–
374. (doi:10.18553/jmcp.2015.21.5.368)
171*. Rea S et al. 2012 Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project. J.
Biomed. Informatics 45, 763–771. (doi:10.1016/j.jbi.2012.01.009)
172*. RELEASE Collaboration. 2020 Communicating simply, but not too simply: Reporting of participants and speech and language interventions
for aphasia after stroke. Int. J. Speech Lang. Pathol. 22, 302–312. (doi:10.1080/17549507.2020.1762000)
173*. Shenkin SD et al. 2017 Improving data availability for brain image biobanking in healthy subjects: practice-based suggestions from an
international multidisciplinary working group. NeuroImage 153, 399–409. (doi:10.1016/j.neuroimage.2017.02.030)
174*. Sunjic-Alic A, Zebenholzer K, Gall W. 2021 Reporting of studies conducted on Austrian claims data. In Navigating healthcare through
challenging times studies in health technology and informatics (eds D Hayn, G Schreier, M Baumgartner). IOS Press. (doi:10.3233/SHTI210090)
175*. Szkultecka-Dębek M. 2015 Real world data guidelines - current status review. J. Health Policy Outcomes Res. 10–14. (doi:10.7365/jhpor.2015.1.2)
176*. Towards European Health Data Space, TEHDAS. 2022 Report on secondary use of health data through European case studies. See
https://tehdas.eu/app/uploads/2022/08/tehdas-report-on-secondary-use-of-health-data-through-european-case-studies-.pdf.
177*. Thomas M, Cleland J, Price D. 2003 Database studies in asthma pharmacoeconomics: uses, limitations and quality markers. Expert Opin.
Pharmacother. 4, 351–358. (doi:10.1517/eoph.4.3.351.22241)
178*. Umberfield EE, Kardia SLR, Jiang Y, Thomer AK, Harris MR. 2022 Regulations and norms for reuse of residual clinical biospecimens and health
data. West. J. Nurs. Res. 44, 1068–1081. (doi:10.1177/01939459211029296)
179*. United States Food and Drug Administration, FDA. 2018 Framework for FDA’s Real-World Evidence Program. See
https://www.fda.gov/media/120060/download?attachment.
180*. Verheij RA, Curcin V, Delaney BC, McGilchrist MM. 2018 Possible sources of bias in primary care electronic health record data use and reuse. J.
Med. Internet Res. 20, e185. (doi:10.2196/jmir.9134)
181*. Vollmer S et al. 2020 Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency,
replicability, ethics, and effectiveness. BMJ 368, l6927. (doi:10.1136/bmj.l6927)
182*. Walker J, Dormer L, Garner P. 2023 Journal of Comparative Effectiveness Research welcoming the submission of study design protocols to
foster transparency and trust in real-world evidence. J. Comp. Eff. Res. 12. (doi:10.2217/cer-2022-0197)
183*. Wang S, Verpillat P, Rassen J, Patrick A, Garry E, Bartels D. 2016 Transparency and reproducibility of observational cohort studies using large
healthcare databases. Clin. Pharmacol. Ther. 99, 325–332. (doi:10.1002/cpt.329)
184*. Weiskopf NG, Weng C. 2013 Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical
research. J. Am. Med. Informatics Assoc. 20, 144–151. (doi:10.1136/amiajnl-2011-000681)
185*. Weston SJ, Ritchie SJ, Rohrer JM, Przybylski AK. 2019 Recommendations for increasing the transparency of analysis of preexisting data sets.
Adv. Methods Pract. Psychol. Sci. 2, 214–227. (doi:10.1177/2515245919848684)
186*. World Health Organization, WHO. 2022 Sharing and reuse of health-related data for research purposes: WHO policy and implementation
guidance. See https://iris.who.int/handle/10665/352859.
187*. World Medical Association, WMA. 2016 WMA Declaration of Taipei on ethical considerations regarding health databases and biobanks. See
https://www.wma.net/policies-post/wma-declaration-of-taipei-on-ethical-considerations-regarding-health-databases-and-biobanks/.