ArticlePDF Available

Reducing Sample Size While Improving Equity in Vaccine Clinical Trials: A Machine Learning-Based Recruitment Methodology with Application to Improving Trials of Hepatitis C Virus Vaccines in People Who Inject Drugs

MDPI
Healthcare
Authors:

Abstract and Figures

Despite the availability of direct-acting antivirals that cure individuals infected with the hepatitis C virus (HCV), developing a vaccine is critically needed in achieving HCV elimination. HCV vaccine trials have been performed in populations with high incidence of new HCV infection such as people who inject drugs (PWID). Developing strategies of optimal recruitment of PWID for HCV vaccine trials could reduce sample size, follow-up costs and disparities in enrollment. We investigate trial recruitment informed by machine learning and evaluate a strategy for HCV vaccine trials termed PREDICTEE—Predictive Recruitment and Enrichment method balancing Demographics and Incidence for Clinical Trial Equity and Efficiency. PREDICTEE utilizes a survival analysis model applied to trial candidates, considering their demographic and injection characteristics to predict the candidate’s probability of HCV infection during the trial. The decision to recruit considers both the candidate’s predicted incidence and demographic characteristics such as age, sex, and race. We evaluated PREDICTEE using in silico methods, in which we first generated a synthetic candidate pool and their respective HCV infection events using HepCEP, a validated agent-based simulation model of HCV transmission among PWID in metropolitan Chicago. We then compared PREDICTEE to conventional recruitment of high-risk PWID who share drugs or injection equipment in terms of sample size and recruitment equity, with the latter measured by participation-to-prevalence ratio (PPR) across age, sex, and race. Comparing conventional recruitment to PREDICTEE found a reduction in sample size from 802 (95%: 642–1010) to 278 (95%: 264–294) with PREDICTEE, while also reducing screening requirements by 30%. Simultaneously, PPR increased from 0.475 (95%: 0.356–0.568) to 0.754 (95%: 0.685–0.834). Even when targeting a dissimilar maximally balanced population in which achieving recruitment equity would be more difficult, PREDICTEE is able to reduce sample size from 802 (95%: 642–1010) to 304 (95%: 288–322) while improving PPR to 0.807 (95%: 0.792–0.821). PREDICTEE presents a promising strategy for HCV clinical trial recruitment, achieving sample size reduction while improving recruitment equity.
Content may be subject to copyright.
Citation: Chiu, R.; Tatara, E.;
Mackesy-Amiti, M.E.; Page, K.; Ozik,
J.; Boodram, B.; Dahari, H.; Gutfraind,
A. Reducing Sample Size While
Improving Equity in Vaccine Clinical
Trials: A Machine Learning-Based
Recruitment Methodology with
Application to Improving Trials of
Hepatitis C Virus Vaccines in People
Who Inject Drugs. Healthcare 2024,12,
644. https://doi.org/10.3390/
healthcare12060644
Academic Editor: Tin-Chih Toly
Chen
Received: 29 December 2023
Revised: 1 March 2024
Accepted: 6 March 2024
Published: 13 March 2024
Copyright: © 2024 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
creativecommons.org/licenses/by/
4.0/).
healthcare
Article
Reducing Sample Size While Improving Equity in Vaccine
Clinical Trials: A Machine Learning-Based Recruitment
Methodology with Application to Improving Trials of Hepatitis
C Virus Vaccines in People Who Inject Drugs
Richard Chiu 1,2 , Eric Tatara 3,4, Mary Ellen Mackesy-Amiti 5, Kimberly Page 6, Jonathan Ozik 3,4 ,
Basmattee Boodram 5, Harel Dahari 2and Alexander Gutfraind 2, 6, *
1Department of Medicine, University of Illinois College of Medicine at Chicago, Chicago, IL 60612, USA;
rchiu8@uic.edu
2The Program for Experimental & Theoretical Modeling, Department of Medicine, Division of Hepatology,
Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60660, USA; hdahari@luc.edu
3Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL 60637, USA
4Argonne National Laboratory, Lemont, IL 60439, USA
5Division of Community Health Sciences, School of Public Health, University of Illinois at Chicago,
Chicago, IL 60607, USA; bboodram@uic.edu (B.B.)
6Department of Internal Medicine, Division of Epidemiology, Biostatistics and Preventive Medicine,
University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA; pagek@salud.unm.edu
*Correspondence: agutfraind@luc.edu; Tel.: +1-708-216-4682
Abstract:
Despite the availability of direct-acting antivirals that cure individuals infected with the
hepatitis C virus (HCV), developing a vaccine is critically needed in achieving HCV elimination.
HCV vaccine trials have been performed in populations with high incidence of new HCV infection
such as people who inject drugs (PWID). Developing strategies of optimal recruitment of PWID
for HCV vaccine trials could reduce sample size, follow-up costs and disparities in enrollment. We
investigate trial recruitment informed by machine learning and evaluate a strategy for HCV vaccine
trials termed PREDICTEE—Predictive Recruitment and Enrichment method balancing Demographics
and Incidence for Clinical Trial Equity and Efficiency. PREDICTEE utilizes a survival analysis model
applied to trial candidates, considering their demographic and injection characteristics to predict
the candidate’s probability of HCV infection during the trial. The decision to recruit considers both
the candidate’s predicted incidence and demographic characteristics such as age, sex, and race. We
evaluated PREDICTEE using in silico methods, in which we first generated a synthetic candidate
pool and their respective HCV infection events using HepCEP, a validated agent-based simulation
model of HCV transmission among PWID in metropolitan Chicago. We then compared PREDICTEE
to conventional recruitment of high-risk PWID who share drugs or injection equipment in terms
of sample size and recruitment equity, with the latter measured by participation-to-prevalence
ratio (PPR) across age, sex, and race. Comparing conventional recruitment to PREDICTEE found
a reduction in sample size from 802 (95%: 642–1010) to 278 (95%: 264–294) with PREDICTEE,
while also reducing screening requirements by 30%. Simultaneously, PPR increased from 0.475
(95%: 0.356–0.568) to 0.754 (95%: 0.685–0.834). Even when targeting a dissimilar maximally balanced
population in which achieving recruitment equity would be more difficult, PREDICTEE is able to
reduce sample size from 802 (95%: 642–1010) to 304 (95%: 288–322) while improving PPR to 0.807
(95%: 0.792–0.821). PREDICTEE presents a promising strategy for HCV clinical trial recruitment,
achieving sample size reduction while improving recruitment equity.
Keywords:
randomized clinical trial; vaccine trial recruitment; hepatitis C; equity; people who inject
drugs; machine learning
Healthcare 2024,12, 644. https://doi.org/10.3390/healthcare12060644 https://www.mdpi.com/journal/healthcare
Healthcare 2024,12, 644 2 of 16
1. Introduction
Despite over thirty years of research, there is still no vaccine for hepatitis C virus (HCV)
infection [
1
,
2
], a virus affecting over 2 million adults in the U.S. alone [
3
,
4
]. While HCV can
be cured with direct-acting antivirals (DAAs) [5], treatment with DAAs can cost upwards
of $25,000 [
6
] and does not prevent reinfection [
7
]. A vaccine promises to be an inexpensive,
feasible, and accessible viral control strategy, especially for high-risk populations such as
people who inject drugs (PWID), who are affected by the opioid crisis [8].
Our primary motivation for improving vaccine clinical trial recruitment is to reduce
the sample size needed in an HCV vaccine trial to achieve adequate statistical power.
An analysis of randomized clinical trials (RCTs) revealed that 19% were terminated due
to insufficient accrual of participants [
9
], and up to 86% do not reach their recruitment
targets within the originally envisaged timescale [
10
,
11
]. Moreover, studies place the cost
of recruitment to a vaccine clinical trial to be in the range of $9000 to $16,000 per recruited
subject, considering expenses such as tracking and retention costs, reimbursement, and
drug costs [
12
,
13
]. Reducing sample size therefore not only increases the feasibility of
vaccine trials, but also enables significant cost savings.
We secondarily seek to improve representation of diverse populations in HCV RCTs
and ensure equitable access to novel vaccines. A study by Wilder et al. found that par-
ticipation of Black/African American participants in HCV-related clinical trials does not
proportionally reflect the burden of HCV among this demographic in North America [
14
].
The issue of adequate representation is particularly pertinent in the context of HCV be-
cause demographic characteristics such as sex [
15
17
], race [
15
,
18
], and age [
19
] can impact
HCV viral clearance and immunology and likely alter vaccine efficacy. The FDA recently
called for innovative strategies promoting enrollment of underrepresented populations in
RCTs [
20
] and the NIH instructed trials it funds to ensure adequate inclusion of minori-
ties [
21
]. However, it is desired to achieve these objectives without magnifying the costs of
screening and the overall sample size.
This study aims to address both sample size and equity concerns by introducing a
novel approach for recruiting for HCV vaccine clinical trials. The approach integrates
machine learning (ML) to identify persons most likely to be exposed to HCV while also
implementing a weighting scheme prioritizing underrepresented candidates. We refer to
this recruitment strategy as the Predictive Recruitment and Enrichment method balancing
Demographics and Incidence for Clinical Trial Equity and Efficiency (PREDICTEE). We
surveyed the literature for similar work and identified several studies that used a predictive
model for recruitment of individuals at high risk for the outcome of interest [
22
25
] as well
as others that discuss strategies to improve equity and inclusion in RCTs [
26
]. There have
also been other suggestions for improving trials for HCV vaccine candidates (e.g., [
27
,
28
]),
but none targeting sample size or equity. Therefore, we believe this study is the first
to achieve the two opposing objectives in the context of vaccine trial recruitment for
HCV vaccines.
This study contributes to a growing body of literature on the implications of artificial
intelligence (AI) and machine learning in clinical trials. Recent reviews by Ismail et al. [
29
]
and Harrer et al. [
30
] describe how AI/ML may be leveraged to optimize recruitment to
reduce trial failures, optimize patient composition, and reduce time and costs associated
with conducting an RCT. We believe PREDICTEE offers one such ML-based solution to
these goals. This study also complements a recent study by Oikonomou et al. [
31
] which
describes a ML-based phenomapping strategy capable of maximizing RCT enrollment
efficiency that is similar to what we propose in our PREDICTEE methodology. However,
we offer an alternative method of optimizing the clinical trial cohort and apply it to a
simulated vaccine clinical trial.
PREDICTEE also builds upon our previous work on a model of Hepatitis C elimination
in PWID (HepCEP), which simulates HCV infection, network formation, and syringe shar-
ing in the PWID population of metropolitan Chicago [
32
]. The present study investigates
how longitudinal PWID data, such as that generated using the HepCEP model, can be
Healthcare 2024,12, 644 3 of 16
leveraged to improve vaccine trial recruitment equity and efficiency among PWID. We
also propose a novel use of survival analysis in clinical trial recruitment decisions for the
purpose of prognostic enrichment. Using simulations of HCV vaccine trial recruitment of
PWID based on data collected from our previous study, we show that PREDICTEE yields
trial cohorts that are approximately half the size of those recruited using conventional
methods, while also improving demographic representativeness.
2. Materials and Methods
In designing PREDICTEE, we set multiple objectives: (1) to identify subgroups with
high incidence in the trial cohort while ensuring that (2) the final sample satisfies constraints
on representation. Additionally, the recruitment process should (3) be efficient and conclude
in a pre-specified amount of time and (4) cope with uncertainty about the quality and
background of trial candidates. Below, we describe how PREDICTEE works and evaluate
its performance in a simulated HCV vaccine RCT in Chicago.
2.1. Preparation
2.1.1. Longitudinal Data
Our longitudinal data on PWID are derived from a large synthetic PWID population.
To achieve this, we used the HepCEP agent-based model that simulates PWID behavioral
patterns–daily injection drug use, social network formation and dissolution, and geogra-
phy [
32
,
33
]. Details of the HepCEP model are described in the Supplementary Materials; in
brief, the HepCEP model simulates events such as PWID attrition, new PWID arrival, drug
sharing, network formation, HCV infection, recovery, vaccination and more. To generate
our synthetic population, we constrained the HepCEP model to maintain a population of
100,000 PWID over its 10-year simulation, removed any simulated vaccine effects, and kept
all other HepCEP parameters at default. This generated profiles for 123,071 PWID—the
pool of candidates we used for our vaccine trial recruitment. Each PWID profile contains
demographic, network, and injection characteristics, as well as the recorded time to HCV
infection if it occurred. Additional variables were also calculated for these synthetic PWID
using the simulated HepCEP events including an indicator variable denoting if the indi-
vidual would become infected in a clinical trial with a 1.5-year follow-up period and if the
individual is HCV-susceptible (HCV RNA/Ab-negative).
2.1.2. Survival Analysis Models
To achieve its goals of minimizing sample size, PREDICTEE relies on a survival model
that estimates the probability of HCV infection by the end of the clinical trial follow-up
period, given the demographic, network, and injection characteristics of individual PWID.
This trained survival model will be used in the PREDICTEE process to enrich the trial
cohort with high-risk PWID most likely to experience acute HCV infection, increasing
the cohort incidence of HCV and thereby decreasing the required sample size needed to
attain adequate power. In our simulated HCV vaccine trials, we tested two models: (1) a
Cox proportional hazards model, a classic survival analysis model [
34
], and (2) a non-
parametric random survival forest (RSF), a more recently developed ensemble prediction
method [
35
]. For each simulation, we implemented a 20/80 train-test split, in which 20% of
the synthetic population of 123,071 PWID was used to train the survival analysis model, and
the remaining 80% served as the recruitment pool from which simulated trial participants
were recruited from. Both Cox and RSF models were trained using the demographic,
behavioral, and network attributes provided by the HepCEP model, listed as follows:
1. Age;
2. Sex assigned at birth;
3. Syringe source (harm reduction program or other);
4. The number of PWID who gave the candidate drugs/injection equipment in the last
30 days;
Healthcare 2024,12, 644 4 of 16
5.
The number of PWID who the candidate gave drugs/injection equipment to in the
last 30 days;
6. The total number of people in the candidate’s drug use network;
7. The number of daily injections;
8.
The fraction of injections that involve receiving drugs or injection equipment from
another person in the network.
The outcome variable used to train the models was the time (in years) until the PWID
experience an acute HCV infection. Any PWID who did not experience an HCV infection
during the 10-year HepCEP simulation were right-censored at 10 years. These trained
Cox and RSF models are used to predict the probability that new PWID will be infected
with HCV by 1.5 years after enrollment, allowing for the recruitment of high-risk PWID to
maximize trial cohort incidence.
The RSF model also requires the selection of two main hyperparameters: number of
trees (ntree) and number of variables randomly selected as candidates for splitting a node
(mtry). Using the recommendations in the literature [
36
,
37
], we set mtry = p/3, with pbeing
the number of explanatory variables; thus, mtry = 3 because there are eight explanatory
variables used to train the model. Additionally, we set ntree = 100 based on [38,39].
2.1.3. Demographic Targets
While PREDICTEE primarily aims to achieve a minimal sample size through PWID
risk prediction, it also attempts to improve trial generalizability by recruiting a trial cohort
that resembles a prespecified target demographic. We balanced both objectives through a
dynamic weighting and scoring process, which factors in both a candidate’s HCV risk as
well as the characteristics needed to reach the target demographic composition. Details on
the scoring equation are described later in this section.
2.2. The PREDICTEE Workflow
An overview of PREDICTEE recruitment as it was simulated in this study is illustrated
in Figure 1. PREDICTEE first requires pre-existing longitudinal data of the recruited popu-
lation, containing characteristics predictive of HCV infection and a timeline of infection
events. The synthetic population of 123,071 PWID serves this purpose for our simulations.
We used 20% of these data to train a Cox or RSF model, and the remaining 80% served as
the recruitment pool. PWID screening was simulated via random sampling of this recruit-
ment pool. Screened candidates undergo immediate scoring, factoring in demographic
considerations and their probability of HCV infection, estimated using the survival model.
Once a batch of Bcandidates is screened and scored, the candidates with the highest scores
receive HCV screening, and the top-scoring RHCV-susceptible PWID are recruited and
randomized to vaccine or control groups.
After each batch, we update the weights used for scoring candidates based on the
composition of the partial cohort. At a predetermined point, we also optionally perform
sample size re-estimation based on the predicted incidence of the partial cohort, calculated
by averaging the individual infection probabilities of the recruits, and allow for early
termination of recruitment. The point at which sample size is re-estimated is inversely
proportional to the estimated incidence improvement of the predictive model compared to
conventional recruitment methods. For example, a model that is expected to double the
incidence in the trial cohort would re-estimate sample size when half the required sample
size of conventional recruitment is reached. These parameters would be determined
via simulations using the same preliminary data that were used to train the predictive
model. For a list of parameters used in PREDICTEE, see the Supplementary Materials. In
subsequent sections, we describe the processes of weighting and scoring in greater detail.
Healthcare 2024,12, 644 5 of 16
Healthcare 2024, 12, x FOR PEER REVIEW 5 of 17
Figure 1. Schematic of the PREDICTEE recruitment method as it was performed in our simulations.
During the setup, a predictive model is trained using existing longitudinal data, and parameters for
PREDICTEE are set. Recruitment proceeds in cycles of scoring, batching, recruiting, and adjusting
parameters. Optionally, sample size is re-estimated at a pre-specied number of PWID recruited
(re-estimation point).
After each batch, we update the weights used for scoring candidates based on the
composition of the partial cohort. At a predetermined point, we also optionally perform
sample size re-estimation based on the predicted incidence of the partial cohort, calculated
by averaging the individual infection probabilities of the recruits, and allow for early
termination of recruitment. The point at which sample size is re-estimated is inversely
proportional to the estimated incidence improvement of the predictive model compared
to conventional recruitment methods. For example, a model that is expected to double the
incidence in the trial cohort would re-estimate sample size when half the required sample
size of conventional recruitment is reached. These parameters would be determined via
simulations using the same preliminary data that were used to train the predictive model.
For a list of parameters used in PREDICTEE, see the Supplementary Materials. In
subsequent sections, we describe the processes of weighting and scoring in greater detail.
2.2.1. Candidate Scoring Equations
The arriving candidates are denoted 𝑐, 𝑐, …, and a new candidate 𝑐 arrives at a
time point t, where t represents the number of prior batches that have already been
Figure 1. Schematic of the PREDICTEE recruitment method as it was performed in our simulations.
During the setup, a predictive model is trained using existing longitudinal data, and parameters for
PREDICTEE are set. Recruitment proceeds in cycles of scoring, batching, recruiting, and adjusting
parameters. Optionally, sample size is re-estimated at a pre-specified number of PWID recruited
(re-estimation point).
2.2.1. Candidate Scoring Equations
The arriving candidates are denoted
c1
,
c2
,
. . .
, and a new candidate
ci
arrives at a time
point t, where trepresents the number of prior batches that have already been considered.
The score
Scoret(ci)
is based on several factors: (1) the cohort of previously recruited
subjects, denoted
At1
, with individual recruited candidates denoted a; (2) the target
composition for the final cohort across each population category (e.g., proportion of non-
Hispanic Black), denoted
pj
, where
jJ
, with Jrepresenting the larger set of all considered
categories (demographic and/or others); and (3) the characteristics of candidate
ci
. The
latter includes the candidate’s predicted incidence of HCV, denoted
I(ci)
, as estimated by
the predictive model, and the candidate’s categories, denoted
dj(ci)
, which equal 1 when
the person is a member of category
j
and 0 otherwise. The candidate score is given by
Equation (1):
Healthcare 2024,12, 644 6 of 16
Scoret(ci) = wtI(ci) + (100 wt)
jJ
dj(ci)"pj1
|At1|
aAt1
dj(a)#(1)
where
wt
is the weight assigned to incidence and
100 wt
is the weight of categorical
factors. This scoring function simultaneously addresses the conflicting goals of trial equity
while decreasing sample size via recruiting higher-risk candidates. For an example of
candidate scoring, see Figure S1 in the Supplementary Materials.
After score assignment, the top-R-scoring HCV-naïve candidates are recruited. Because
we test for susceptibility to HCV after scoring, if any of these Rcandidates are not HCV-
susceptible, they are replaced by the next-best-scoring naïve candidate. A trial could
optionally maintain a backlog of candidates who were not recruited and consider them
alongside subsequent batches, which would help decrease batch variability and maintain
a steady recruitment screening, particularly in situations when the value of Bis small.
This process is repeated until the end sample size is reached. The values of Band Rcan
be altered based on trial capabilities. The ratio between them, R/B, is the proportion of
each batch that is recruited. Decreasing the ratio minimizes the size of the sample, while
increasing it minimizes the cost of screening.
Incidence and demographic representativeness are weighted dynamically, with PRE-
DICTEE prioritizing incidence at the beginning of the trial and gradually giving more
weight to closing population gaps as the trial advances. Thus, the initial
w0
is always 100,
and then it decreases after each batch according to Equation (2):
wt=wt1R
N(wt1wmin )(2)
where w
t
is the incidence weight for the current batch t,w
t1
is the incidence weight for the
last batch t
1,
wmin
is the incidence weight floor, Ris the number of candidates recruited
per batch, and Nis the most recent estimated sample size. For simulations where we
anticipate a large gap from targets, we accelerated the rate of decrease, as described in
the Supplementary Materials. This weighting approach would allow investigators to first
establish a baseline profile of recruited candidates at the beginning of a trial based on the
highest-risk candidates screened. As the trial progresses, this profile of candidates can be
leveraged to address sampling inequities without significantly compromising the main
objective of maintaining a low sample size.
Optionally, the trial plan could allow for early stopping of recruitment based on the
cohort incidence. Namely, the stopping decision could consider the expected number of
events in the recruited cohort as estimated by the model, without unblinding the arms or
waiting to observe actual events. It is recommended that the decision threshold include a
margin of safety for any model error.
2.3. Evaluating PREDICTEE HCV Vaccine Trial in Chicago PWID Population
2.3.1. Design of Simulation Experiments
In previous work, we simulated trials of HCV vaccines end-to-end [
32
,
33
,
40
,
41
]. In
this work, we implement software that simulates only the outcome of the recruitment
process. We applied PREDICTEE to two challenges in building a representative cohort:
(a) a study cohort matching the Chicago area’s PWID population in terms of race/ethnicity,
sex, and age distribution; (b) a maximally balanced cohort with a racial composition target
for Hispanic, non-Hispanic Black (NHBlack), non-Hispanic White (NHWhite), and Other
individuals of 33:33:33:1, respectively, and a sex composition of 50:50 (male–female). For
simulations in which age was targeted, each candidate was categorized into ten-year age
groups. Age was not considered in the maximally balanced cohort (Scenario b) because
of the extremely low prevalence (~3.5% of the recruitment pool) and HCV incidence in
candidates in the 49+ age group in our Chicago PWID data. This would make it difficult
to match a balanced cohort without arbitrarily decreasing the target for candidates in
Healthcare 2024,12, 644 7 of 16
the 49+ group. To compare PREDICTEE to conventional recruitment strategies, we also
simulated two additional recruitment strategies using the Chicago area’s PWID dataset, as
follows. (1) Random/uniform: candidates are an unbiased random sample from the HCV-
susceptible PWID population; (2) in-network: only candidates who receive syringes from
their social network are recruited because they have a higher incidence rate (as proposed
in [
32
,
33
,
42
]). Random recruitment is offered as a synthetic benchmark, although few trials
can realistically implement the random recruitment of PWID due to the complexity of
working with the PWID population, and it is avoided in sites like Chicago due to the low
incidence rate.
For both Cox and RSF models, PREDICTEE was executed 100 times for each of 100 ran-
dom train/test splits of the synthetic population, yielding 10,000 total trial recruitment
simulations. For each simulation, we calculate Harrell’s concordance index (C-index) for
the unique trained model. The C-index is a goodness-of-fit measurement commonly used
for survival analysis models with censored data, analogous to the area under the ROC
curve (AUC) for more classic predictive models and diagnostic tests [
43
,
44
]. Based on
the existing literature, the C-index for a survival analysis model should be at least 0.7 to
adequately discriminate between risk profiles [
45
47
]. For all simulation runs, assumed
vaccine efficacy was 60% and the trial follow-up period was 1.5 years, following the work
of Page et al. [
2
]. We define vaccine efficacy as the proportionate reduction in chronic cases
between the unvaccinated and vaccinated groups [
48
]. The initial required sample size was
set to 800 to achieve 80% power based on the estimated 1.5-year cumulative incidence of
5.5% from the in-network recruitment method. Parameters were set accordingly: B= 50,
R= 5, and
wmin =
25. The candidate arrival process was simulated via random sampling
from the recruitment pool. Simulations of random sampling and in-network recruitment
were also repeated 10,000 times.
Sample size re-estimation occurred after 400 candidates recruited for Cox PREDICTEE
and 267 candidates recruited for RSF PREDICTEE based on experimental data that showed
a required sample size of approximately 800 candidates for in-network recruitment and that
the Cox and RSF model would approximately double and triple the incidence, respectively.
At this re-estimation point, the predicted incidence of the partial cohort is used to update
the sample size rather than looking at the observed outcomes of recruits. This is achieved by
averaging the infection probabilities generated by the Cox or RSF models for all candidates
recruited to the trial up to the re-estimation point, without revealing treatment assignments.
Due to the blinded nature of this process, there is a minimal impact on type I error [
49
].
Specific parameters and formulae involved in sample size re-estimation can be found in
the Supplementary Materials.
Incidence values in this study represent cumulative incidence over the 1.5-year trial
follow-up period, expressed as the proportion of recruited participants who develop
HCV. Predicted incidence in PREDICTEE simulations is compared to observed incidence
to confirm model validity (see the Supplementary Materials). Demographic statistics
are recorded for each trial cohort and averaged across all runs for both Cox and RSF
PREDICTEE. These are plotted alongside the target population and in-network recruitment
to evaluate PREDICTEE’s demographic adjustment capability.
For each recruitment method and demographic category, we also calculate participation-
to-prevalence ratio (PPR), a metric that evaluates demographic representation and has been
widely used in clinical trial settings for assessing adequate enrollment of demographic
subgroups [
50
54
]. PPR = proportion among trial participants/proportion among disease
population. Thus, a PPR of 1.0 for a specific demographic represents perfect representation.
A PPR of less than 0.8 signifies underrepresentation, a PPR between 0.8 and 1.2 signifies
adequate representation, and a PPR greater than 1.2 signifies overrepresentation. For our
simulations, we report the average of the lowest PPR across all target categories in each
run, denoted PPRmin.
Healthcare 2024,12, 644 8 of 16
2.3.2. Software Modeling Platform and Tools
Trial power analysis and simulations were conducted using R, with Cox and RSF
models trained using the survival and randomForestSRC packages, respectively [5557].
3. Results
Our analysis below compares recruitment techniques in terms of required sample size,
representativeness, and screening requirements. We also compared PREDICTEE using the
Cox model against RSF.
3.1. Matching the Chicago PWID Population
The baseline characteristics of the Chicago PWID population are listed in Table 1.
Unless otherwise specified, PREDICTEE attempts to match the HCV-susceptible PWID
population (third column)—this is the population most likely to eventually receive the
vaccine. In-network recruitment represents a good method for recruitment into HCV trials
without a predictive model and serves as a comparison for PREDICTEE. As can be seen in
Table 1, in-network recruitment results in a significant underrepresentation of non-Hispanic
Black candidates compared to the HCV-susceptible population.
Table 1. Demographic, behavioral, and network attributes of PWID populations in Chicago.
Attribute 2018 Chicago PWID
Population
Susceptible Population
with Receptive Network
HCV-Susceptible
Population
Demographic Attributes
Location (by ZIP Code) City: 45.4% City: 27.5% City: 36.5%
Suburbs: 54.6% Suburbs: 72.5% Suburbs: 63.5%
Race/Ethnicity
Hispanic: 18.7% Hispanic: 17.5% Hispanic: 18.1%
NH Black: 20.8% NH Black: 10.0% NH Black: 15.5%
NH White: 57.2% NH White: 69.3% NH White: 63.2%
Other: 3.2% Other: 3.2% NH Other: 3.2%
Sex Female: 30.6% Female: 37.8% Female: 31.5%
Male: 69.4% Male: 62.2% Male: 68.5%
Age, mean (IQR) 35.2 (26.0–43.0) 29.8 (24.0–34.0) 31.4 (24.9–37.0)
Elapsed years of injection
drug use, mean (IQR) 11.3 (3.4–15.6) 6.3 (2.0–8.6) 7.2 (2.5–9.9)
Enrollment in any
harm-reduction (HR) program
HR: 48.4% HR: 33.1% HR: 45.5%
Non-HR: 51.6% Non-HR: 66.9% Non-HR: 54.5%
HCV Infection State
Infected (acute or chronic):
28.1% 0%—all susceptible 0%—all susceptible
Recovered (antibody-positive):
12.3%
Behavioral Attributes
Daily Drug Injections,
mean (IQR) 2.5 (0.9–3.3) 2.6 (0.9–3.6) 2.4 (0.8–3.2)
Probability of Receptible
Sharing, mean (IQR) 19.4% (0.0–37.3%) 30.3% (5.0–50.0%) 21.3% (0.0–40.6%)
Network Attributes
In Degree
(Receptive Network Size)
0 (no network)–68.0% 0 (no network)–0% 0 (no network)–68.0%
1–23.5% 1–75.7% 1–23.5%
2–8.5% 2–24.3% 2–8.5%
Out Degree
(Giving Network Size)
0 (no network)–72.0% 0 (no network)–45.0% 0 (no network)–69.4%
1–20.6% 1–40.5% 1–22.7%
2–7.4% 2–14.5% 2–7.9%
Healthcare 2024,12, 644 9 of 16
PREDICTEE led to a marked increase in cumulative incidence over the course of a
simulated trial, as shown in Table 2a. Application of the Cox model led to a nearly two-fold
increase in incidence compared to in-network recruitment from 0.055 (95%: 0.044–0.068)
to 0.097 (95%: 0.090–0.104), and the RSF model led to a nearly three-fold increase to 0.149
(95%: 0.141–0.155). This corresponds to a sample size of 444 and 278 for Cox and RSF
PREDICTEE, respectively, compared to 802 for in-network recruitment (a ratio of 1.81
and 2.88, respectively). In terms of screening requirements, Cox PREDICTEE achieved a
smaller sample size with an approximate 12% increase in eligibility screening; however,
RSF PREDICTEE achieved its reduction in sample size while also reducing screening
requirements by almost 30%.
Table 2.
Incidence data when PREDICTEE attempts to match two different demographic composi-
tions: (a) Chicago’s susceptible PWID and (b) an arbitrary balanced population. All values represent
the mean of 10,000 simulations, with a 95% range calculated using quantiles.
Random Sample In-Network
Recruitment
PREDICTEE
(Cox Model)
PREDICTEE
(RSF Model)
(a) Matching Chicago’s Susceptible PWID
Cohort Incidence 0.024
(0.018–0.033)
0.055
(0.044–0.068)
0.097
(0.090–0.104)
0.149
(0.141–0.155)
Required Sample Size
(Calculated Using Cohort
Incidence)
1876
(1356–2512)
802
(642–1010)
444
(408–480)
278
(264–294)
Expected Number of
Candidates Screened
Before Achieving Required
Sample Size *
2207
(1595–2955)
4648
(3721–5853)
5224
(4800–5647)
3271
(3106–3459)
Post Hoc Power if 800
Recruited
49.2%
(40.5–60.5%)
80.0%
(71.5–87.1%)
95.5%
(94.2–96.7%)
99.5%
(99.3–99.6%)
PPRmin ** - 0.475
(0.356–0.568)
0.764
(0.593–0.934)
0.754
(0.685–0.834)
(b) Matching Arbitrary Balanced Demographics
Cohort Incidence 0.024
(0.018–0.033)
0.055
(0.044–0.068)
0.085
(0.079–0.095)
0.137
(0.130–0.144)
Required Sample Size
(Calculated Using Cohort
Incidence)
1876
(1356–2512)
802
(642–1010)
506
(452–550)
304
(288–322)
Expected Number of
Candidates Screened
Before Achieving Required
Sample Size *
2207
(1595–2955)
4648
(3721–5853)
5953
(5318–6471)
3576
(3388–3788)
Post Hoc Power if 800
Recruited
49.2%
(40.5–60.5%)
80.0%
(71.5–87.1%)
93.1%
(91.3–95.2%)
99.2%
(98.9–99.4%)
PPRmin *** 0.433
(0.396–0.472)
0.250
(0.221–0.280)
0.903
(0.889–0.918)
0.807
(0.792–0.821)
* Assuming a refusal rate of 15% [
2
], with 20.3% of PWID being both eligible for in-network recruitment and
susceptible (calculated from HepCEP data) and 10% of PREDICTEE candidates being recruited. Values do not
account for any additional inclusion/exclusion criteria not mentioned, and thus true screening numbers may be
larger. ** Refers to the susceptible Chicago PWID population. PPR
min
was not calculated for random sample
because it is sampled directly from the target population of susceptible Chicago PWID. *** Refers to the arbitrary
maximally balanced target population.
While improving incidence, PREDICTEE simultaneously corrected for deviations in
demographic representativeness seen in in-network recruitment, illustrated by Figure 2A.
This is also expressed in the PPR
min
row in Table 2a, which shows that PPR
min
increased
Healthcare 2024,12, 644 10 of 16
greatly from 0.475 (95%: 0.356–0.568) with in-network recruitment to 0.764 (95%: 0.593–0.934)
for Cox and 0.754 (95%: 0.685–0.834) for RSF. It should be noted that PPR
min
remained
under 0.80 for PREDICTEE recruitment exclusively because of the over-49 age group that
is being matched. This is due to the low PWID prevalence and HCV incidence of this
demographic, leading to continued underrepresentation. If this demographic were not
considered in matching, PPR
min
was calculated to be 0.947 (95%: 0.924–0.970) for Cox
PREDICTEE and 0.964 (95%: 0.948–0.974) for RSF PREDICTEE.
Healthcare 2024, 12, x FOR PEER REVIEW 10 of 17
target population of susceptible Chicago PWID. *** Refers to the arbitrary maximally balanced
target population.
While improving incidence, PREDICTEE simultaneously corrected for deviations in
demographic representativeness seen in in-network recruitment, illustrated by Figure 2A.
This is also expressed in the PPR
min
row in Table 2a, which shows that PPR
min
increased
greatly from 0.475 (95%: 0.356–0.568) with in-network recruitment to 0.764 (95%: 0.593–
0.934) for Cox and 0.754 (95%: 0.685–0.834) for RSF. It should be noted that PPR
min
remained under 0.80 for PREDICTEE recruitment exclusively because of the over-49 age
group that is being matched. This is due to the low PWID prevalence and HCV incidence
of this demographic, leading to continued underrepresentation. If this demographic were
not considered in matching, PPR
min
was calculated to be 0.947 (95%: 0.9240.970) for Cox
PREDICTEE and 0.964 (95%: 0.948–0.974) for RSF PREDICTEE.
Figure 2. Demographic outcome of recruitment under two dierent target populations: (A)
Chicago’s susceptible PWID and (B) an arbitrary balanced population. Error bars represent the 95%
range over 10,000 trials. Sex reects assignment at birth.
3.2. Targeting Arbitrary Demographics
Results for matching a maximally balanced population to assess the ability of
PREDICTEE to adjust to more dissimilar targets are summarized in Table 2b and Figure
2B. In this scenario, in-network recruitment expressed signicant inequities with a PPR
min
of 0.250 (95%: 0.221–0.280), while PREDICTEE was able to close demographic gaps
reasonably well, with an average PPR
min
across the simulations of 0.903 (95%: 0.8890.918)
and 0.807 (95%: 0.792–0.821) for Cox and RSF PREDICTEE, respectively. At the same time,
PREDICTEE increased cumulative incidence to 0.085 (95%: 0.079–0.095) and 0.137 (95%:
0.130–0.144), respectively. This corresponds to a required sample size of 506 and 304 for
Cox and RSF PREDICTEE compared to 802 for in-network recruitment (a ratio of 1.58 and
2.64, respectively). Cox PREDICTEE achieved a smaller sample size with an
approximately 28% increase in eligibility screening, while RSF PREDICTEE achieved its
Figure 2.
Demographic outcome of recruitment under two different target populations: (
A
) Chicago’s
susceptible PWID and (
B
) an arbitrary balanced population. Error bars represent the 95% range over
10,000 trials. Sex reflects assignment at birth.
3.2. Targeting Arbitrary Demographics
Results for matching a maximally balanced population to assess the ability of PRE-
DICTEE to adjust to more dissimilar targets are summarized in Table 2b and Figure 2B.
In this scenario, in-network recruitment expressed significant inequities with a PPR
min
of 0.250 (95%: 0.221–0.280), while PREDICTEE was able to close demographic gaps rea-
sonably well, with an average PPR
min
across the simulations of 0.903 (95%: 0.889–0.918)
and 0.807 (95%: 0.792–0.821) for Cox and RSF PREDICTEE, respectively. At the same time,
PREDICTEE increased cumulative incidence to 0.085 (95%: 0.079–0.095) and 0.137 (95%:
0.130–0.144), respectively. This corresponds to a required sample size of 506 and 304 for
Cox and RSF PREDICTEE compared to 802 for in-network recruitment (a ratio of 1.58 and
2.64, respectively). Cox PREDICTEE achieved a smaller sample size with an approximately
28% increase in eligibility screening, while RSF PREDICTEE achieved its reduction in
sample size while also reducing screening requirements by roughly 23% compared to
in-network recruitment.
The models were generally quite accurate in predicting time to infection in the sim-
ulated epidemic; across 10,000 simulations, the average C-index value was 0.871 for Cox
models and 0.946 for RSF models, indicating adequate risk profile discrimination.
Healthcare 2024,12, 644 11 of 16
4. Discussion
Our study developed PREDICTEE as a proposed method for reducing the required
sample size within a vaccine clinical trial while ensuring cohort representativeness to
a pre-specified target population. We have shown that contemporary techniques in re-
cruitment may result in demographic differences within the trial cohort (Table 1), which
may contribute to a lack of generalizability if these characteristics are relevant to vaccine
efficacy or virus epidemiology. Through simulation, PREDICTEE illustrated its ability to
decrease sample size two- to three-fold while also improving recruitment equity of target
characteristics compared to conventional recruitment. It achieved this by selectively re-
cruiting high-incidence PWID while prioritizing underrepresented demographics (Table 2a
and Figure 2A). This was the case even in circumstances with a highly dissimilar target
population (Table 2b and Figure 2B).
4.1. Implications
PREDICTEE complements and contributes to the literature on trial design strategies
that leverage predictive models. Previous work examined ideas such as prognostic en-
richment, adaptive designs, and sample size re-estimation. The statistical and clinical
benefits of these strategies have been well documented, allowing trials to not only im-
prove their efficiency and impact but also to ensure adequate statistical power [
58
,
59
]. We
show in a simulated setting that PREDICTEE is able to maintain the benefits that adap-
tive trial designs offer while also providing further benefits to clinical trial recruitment
through the implementation of both a predictive model and a weighting system prioritizing
underrecruited categories of candidates.
A practical drawback of previous adaptive enrichment designs is that patient recruit-
ment may be halted before interim analysis so that primary outcomes can be assessed, in-
creasing trial duration and delaying submission of experimental agents for approval [
60
,
61
].
PREDICTEE presents a potential solution to this as it offers a predicted incidence value.
Sample size could be re-estimated without unblinding, using only the data recorded at
enrollment and before any events are observed. As a result, clinical trials would not need
to pause recruitment. Naturally, this scheme relies on a well-calibrated predictive model
and may necessitate a margin for any model error.
We also consider the ability of PREDICTEE to serve as an alternative to existing
estimators that aim to generalize the results of clinical trials to target populations by
accounting for nonrandom sampling [
62
,
63
]. By ensuring concordance between desired
characteristics such as demographics between the trial cohort and the target population,
PREDICTEE essentially aims to generalize trial results via equitable recruitment processes
instead of post hoc quantitative methods that may be non-robust to model misspecification
or the misestimation of parameters [64].
4.2. Limitations
Our quantitative results are based on agent-based simulations of a PWID population
from a large metropolitan city. Although these simulations have been shown to be more
realistic than aggregate models [
65
], they are still a simplification of human behavior and
network formation. As a result, the survival models might be accurate in the simulation
setting but experience increased prediction error in a real clinical trial environment. How-
ever, we believe that these simplifications are unlikely to affect the overall conclusions of
this study, since simulations were only used to train the predictive models and evaluate
the PREDICTEE workflow. The candidate data and recruitment pool that was used in
these simulations were derived from real survey data collected from PWID in Chicago
(see the Supplementary Materials). As in conventional trials, trial organizers may incor-
porate a margin of error into their sample size calculations to reduce the likelihood of an
underpowered trial.
In designing our PREDICTEE recruitment simulations, we aimed to reflect real-world
conditions, but some effects were not captured. Due to a lack of appropriate data on
Healthcare 2024,12, 644 12 of 16
the ease of recruitment and network referrals between PWID, we formed batches for
PREDICTEE by randomly sampling from a predefined recruitment pool. Actual feeder
processes for RCTs likely observe clustering and uneven sampling due to differences in
accessibility of PWID [
66
,
67
], but we do not anticipate this having a tangible effect on
outcomes. Additionally, loss to follow-up is higher in select subgroups (e.g., those engaged
in higher risk activities) [
68
70
], but given the appropriate data, these effects could be easily
incorporated into PREDICTEE. The PREDICTEE scheme also increases the HCV incidence
in underrepresented demographics to a smaller extent than other demographics; however,
this did not have a significant effect in our study (see the Supplementary Materials).
While our simulation results are based on metropolitan PWID data from the Chicago
area, we anticipate that PREDICTEE will work in other geographic areas with similar HCV
epidemics. However, each site has unique HCV risk factors, and ideally, users should
train their own site-specific models. Similarly, all parameters of PREDICTEE (see the
Supplementary Materials) should be designed based on local knowledge of the trial site.
PREDICTEE would ideally rely on ample and accurate longitudinal local data of the target
population to achieve optimal results. From a global lens, this study heavily incorporated a
North American context through the use of ethno-racial categories that are prominent in
the U.S.; however, PREDICTEE could be applied to a range of categories of interest such as
genetic variants, socioeconomic categories, and more.
Finally, more research is needed on the cost-effectiveness of PREDICTEE in compari-
son to traditional recruitment methods. Although we anticipate that cost savings associated
with recruiting fewer subjects would outweigh any additional screening costs with PRE-
DICTEE, other costs would also have to be considered such as predictive model training,
algorithm implementation, and personnel costs. This would require further studies on
specific real-world PREDICTEE parameters and how they compare to traditional recruit-
ment, such as total clinical trial duration, required volume of pre-trial data, and ease of
identifying high-risk PWID.
4.3. Extensions
Future research with PREDICTEE should focus on refining the methodology in field
conditions and validating its use in other types of geographical regions (e.g., rural). We
discuss some possible real-world implementations of PREDICTEE in the Supplementary
Materials. The PREDICTEE strategy could also incorporate extensions in which the arms
within the trial are balanced both in terms of static characteristics, such as race and sex,
as well as the number of predicted infectious events. Imbalance in post-randomization
events [
71
] has been identified as an important source of trial failure in HCV vaccines [
41
],
and it could be reduced by using the survival model to balance the trial arms. Efficiency of
recruitment could also be enhanced by using predictive models beyond predicting incidence
to, e.g., predict viral status, refusal probability, and/or nonadherence, among others,
as demonstrated in a recent COVID-19 vaccine study [
72
]. The efficiency of predictive
recruitment could benefit from schemes for dynamic model retraining, allowing trials to
be carried out in sites with limited previous incidence data. Moreover, the adoption of
PREDICTEE could be accelerated by the sharing of PWID and HCV epidemiological data
via an application programming interface (API). While this study illustrated the benefits of
PREDICTEE in the context of an HCV vaccine trial, this method could be easily extended
to trials of other vaccines, such as COVID-19 or HIV, or trials for novel treatments that also
often seek high-incidence candidates. Moreover, we envision that PREDICTEE could be
adapted to trials with non-binary outcomes, such as weight, A1c, etc.
5. Conclusions
PREDICTEE is the first recruitment method implementing a predictive model that
balances incidence and population representation with the goal of producing more eq-
uitable and feasible clinical trials. Our results illustrated that PREDICTEE can recruit
high-incidence participants while adjusting recruitment to multiple target populations.
Healthcare 2024,12, 644 13 of 16
Further research is warranted to validate PREDICTEE in field conditions and using a
variety of trial designs.
Supplementary Materials:
The following supporting information can be downloaded at: https://www.
mdpi.com/article/10.3390/healthcare12060644/s1, Supplementary Materials containing Supplemental
Methods (SM) and Supplemental Results (SR). References [
22
,
23
,
32
,
33
,
41
,
49
,
73
88
] are cited in the
supplementary materials.
Author Contributions:
Conceptualization, R.C. and A.G.; methodology, R.C., A.G., H.D., E.T., J.O.
and M.E.M.-A.; software, R.C. and A.G.; validation, R.C.; formal analysis, R.C.; investigation, R.C.; re-
sources, E.T., M.E.M.-A., K.P., J.O., B.B., H.D. and A.G.; data curation, B.B. and E.T.; writing—original
draft preparation, all authors; writing—review and editing, all authors; visualization, R.C.; supervi-
sion, A.G. and H.D.; project administration, A.G. and H.D.; funding acquisition, E.T., M.E.M.-A., J.O.,
B.B., H.D. and A.G. All authors have read and agreed to the published version of the manuscript.
Funding:
This work was supported by the National Institutes of Health (NIH) grant number R01-
AI158666 “Computational modeling for HCV vaccine trial design and optimal vaccine-based combi-
nation interventions”. The NIH had no involvement in the conception of this study, data collection
and analysis, interpretation of data, or in the decision to submit this manuscript for publication.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement:
Our source code repository (https://github.com/sashagutfraind/
vaccinetrials (accessed on 19 February 2024)) provides a copy of a spreadsheet alongside all analytical
code. Publicly available synthetic data for the Chicago area can be found at https://zenodo.org/
record/21714#.YshzX-zMLmo (accessed on 19 February 2024) [
32
]. Original survey data were
licensed to the authors by third-party investigators. Readers may request access and approval will be
considered on a case-by-case basis.
Acknowledgments:
We thank Ilya Lipkovich and Marian Major for providing helpful comments on
the manuscript.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1.
The Lancet Gastroenterology & Hepatology. The Hunt for a Vaccine for Hepatitis C Virus Continues. Lancet Gastroenterol. Hepatol.
2021,6, 253. [CrossRef] [PubMed]
2.
Page, K.; Melia, M.T.; Veenhuis, R.T.; Winter, M.; Rousseau, K.E.; Massaccesi, G.; Osburn, W.O.; Forman, M.; Thomas, E.; Thornton,
K.; et al. Randomized Trial of a Vaccine Regimen to Prevent Chronic HCV Infection. N. Engl. J. Med.
2021
,384, 541–549. [CrossRef]
[PubMed]
3.
Rosenberg, E.S.; Rosenthal, E.M.; Hall, E.W.; Barker, L.; Hofmeister, M.G.; Sullivan, P.S.; Dietz, P.; Mermin, J.; Ryerson, A.B.
Prevalence of Hepatitis C Virus Infection in US States and the District of Columbia, 2013 to 2016. JAMA Netw. Open
2018
,1,
e186371. [CrossRef]
4.
Hofmeister, M.G.; Rosenthal, E.M.; Barker, L.K.; Rosenberg, E.S.; Barranco, M.A.; Hall, E.W.; Edlin, B.R.; Mermin, J.; Ward, J.W.;
Ryerson, A.B. Estimating Prevalence of Hepatitis C Virus Infection in the United States, 2013–2016. Hepatology
2019
,69, 1020–1031.
[CrossRef] [PubMed]
5.
Shiffman, M.L. Hepatitis C Virus Therapy in the Direct Acting Antiviral Era. Curr. Opin. Gastroenterol.
2014
,30, 217–222.
[CrossRef] [PubMed]
6.
Bethea, E.D.; Chen, Q.; Hur, C.; Chung, R.T.; Chhatwal, J. Should We Treat Acute Hepatitis C? A Decision and Cost-Effectiveness
Analysis. Hepatology 2018,67, 837–846. [CrossRef] [PubMed]
7.
Bruggmann, P.; Litwin, A.H. Models of Care for the Management of Hepatitis C Virus among People Who Inject Drugs: One Size
Does Not Fit All. Clin. Infect. Dis. 2013,57 (Suppl. S2), S56–S61. [CrossRef]
8.
Hellard, M.; Doyle, J.S.; Sacks-Davis, R.; Thompson, A.J.; McBryde, E. Eradication of Hepatitis C Infection: The Importance of
Targeting People Who Inject Drugs. Hepatology 2014,59, 366–369. [CrossRef]
9.
Carlisle, B.; Kimmelman, J.; Ramsay, T.; MacKinnon, N. Unsuccessful Trial Accrual and Human Subjects Protections: An Empirical
Analysis of Recently Closed Trials. Clin. Trials 2015,12, 77–83. [CrossRef]
10.
Huang, G.D.; Bull, J.; Johnston McKee, K.; Mahon, E.; Harper, B.; Roberts, J.N. Clinical Trials Recruitment Planning: A Proposed
Framework from the Clinical Trials Transformation Initiative. Contemp. Clin. Trials 2018,66, 74–79. [CrossRef]
Healthcare 2024,12, 644 14 of 16
11.
Goldberg, A.; Bakhireva, L.N.; Page, K.; Henrie, A.M. A Qualitative Scoping Review of Early-Terminated Clinical Trials
Sponsored by the Department of Veterans Affairs Cooperative Studies Program From 2010 to 2020. Epidemiol. Rev.
2022
,44,
110–120. [CrossRef]
12.
Matheny, J.G. The Economics of Pharmaceutical Development: Costs, Risks, and Incentives. Ph.D. Thesis, Johns Hopkins
University, Baltimore, MD, USA, 2013.
13.
Battelle Technology Partnership Practice. Biopharmaceutical Industry-Sponsored Clinical Trials: Impact on State Economies; Battelle:
Columbus, OH, USA, 2015.
14.
Wilder, J.; Saraswathula, A.; Hasselblad, V.; Muir, A. A Systematic Review of Race and Ethnicity in Hepatitis C Clinical Trial
Enrollment. J. Natl. Med. Assoc. 2016,108, 24–29. [CrossRef]
15.
Grebely, J.; Raffa, J.D.; Lai, C.; Krajden, M.; Conway, B.; Tyndall, M.W. Factors Associated with Spontaneous Clearance of Hepatitis
C Virus among Illicit Drug Users. Can. J. Gastroenterol. 2007,21, 447–451. [CrossRef]
16.
van den Berg, C.H.B.S.; Grady, B.P.X.; Schinkel, J.; van de Laar, T.; Molenkamp, R.; van Houdt, R.; Coutinho, R.A.; van Baarle, D.;
Prins, M. Female Sex and IL28B, a Synergism for Spontaneous Viral Clearance in Hepatitis C Virus (HCV) Seroconverters from a
Community-Based Cohort. PLoS ONE 2011,6, e27555. [CrossRef]
17.
Bakr, I.; Rekacewicz, C.; Hosseiny, M.E.; Ismail, S.; Daly, M.E.; El-Kafrawy, S.; Esmat, G.; Hamid, M.A.; Mohamed, M.K.; Fontanet,
A. Higher Clearance of Hepatitis C Virus Infection in Females Compared with Males. Gut
2006
,55, 1183–1187. [CrossRef]
[PubMed]
18.
Piasecki, B.A.; Lewis, J.D.; Reddy, K.R.; Bellamy, S.L.; Porter, S.B.; Weinrieb, R.M.; Stieritz, D.D.; Chang, K.-M. Influence of Alcohol
Use, Race, and Viral Coinfections on Spontaneous HCV Clearance in a US Veteran Population. Hepatology
2004
,40, 892–899.
[CrossRef] [PubMed]
19.
Reid, M.; Price, J.C.; Tien, P.C. Hepatitis C Virus Infection in the Older Patient. Infect. Dis. Clin. North Am.
2017
,31, 827–838.
[CrossRef] [PubMed]
20.
FDA Office of Minority Health and Health Equity. Enhance EQUITY in Clinical Trials. Available online: https://www.fda.gov/
consumers/enhance-equity-initiative/enhance-equity-clinical-trials (accessed on 27 July 2022).
21.
National Institutes of Health. Inclusion of Women and Minorities as Participants in Research Involving Human Subjects. Available
online: https://grants.nih.gov/policy/inclusion/women-and-minorities.htm (accessed on 27 July 2022).
22.
Kerr, K.F.; Roth, J.; Zhu, K.; Thiessen-Philbrook, H.; Meisner, A.; Wilson, F.P.; Coca, S.; Parikh, C.R. Evaluating Biomarkers for
Prognostic Enrichment of Clinical Trials. Clin. Trials 2017,14, 629–638. [CrossRef]
23.
Irazabal, M.V.; Abebe, K.Z.; Bae, K.T.; Perrone, R.D.; Chapman, A.B.; Schrier, R.W.; Yu, A.S.; Braun, W.E.; Steinman, T.I.; Harris,
P.C.; et al. Prognostic Enrichment Design in Clinical Trials for Autosomal Dominant Polycystic Kidney Disease: The HALT-PKD
Clinical Trial. Nephrol. Dial. Transplant. 2017,32, 1857–1865. [CrossRef] [PubMed]
24.
Heitjan, D.F.; Ge, Z.; Ying, G. Real-Time Prediction of Clinical Trial Enrollment and Event Counts: A Review. Contemp. Clin. Trials
2015,45, 26–33. [CrossRef] [PubMed]
25.
Stevens, V.W.; Russo, E.M.; Young-Xu, Y.; Leecaster, M.; Zhang, Y.; Zhang, C.; Yu, H.; Cai, B.; Gonzalez, E.N.; Gerding, D.N.; et al.
Identification of Patients at Risk of Clostridioides Difficile Infection for Enrollment in Vaccine Clinical Trials. Vaccine
2021
,39,
536–544. [CrossRef] [PubMed]
26.
Kahn, J.M.; Gray, D.M., II; Oliveri, J.M.; Washington, C.M.; DeGraffinreid, C.R.; Paskett, E.D. Strategies to Improve Diversity,
Equity, and Inclusion in Clinical Trials. Cancer 2022,128, 216–221. [CrossRef]
27.
Kang, M.; Nicolay, U. Evaluation of Operational Chronic Infection Endpoints for HCV Vaccine Trials. Contemp. Clin. Trials
2008
,
29, 671–678. [CrossRef]
28.
Young, A.M.; Stephens, D.B.; Khaleel, H.A.; Havens, J.R. Hepatitis C Vaccine Clinical Trials among People Who Use Drugs:
Potential for Participation and Involvement in Recruitment. Contemp. Clin. Trials 2015,41, 9–16. [CrossRef] [PubMed]
29.
Ismail, A.; Al-Zoubi, T.; El Naqa, I.; Saeed, H. The Role of Artificial Intelligence in Hastening Time to Recruitment in Clinical
Trials. BJR Open 2023,5, 20220023. [CrossRef] [PubMed]
30.
Harrer, S.; Shah, P.; Antony, B.; Hu, J. Artificial Intelligence for Clinical Trial Design. Trends Pharmacol. Sci.
2019
,40, 577–591.
[CrossRef]
31.
Oikonomou, E.K.; Thangaraj, P.M.; Bhatt, D.L.; Ross, J.S.; Young, L.H.; Krumholz, H.M.; Suchard, M.A.; Khera, R. An Explainable
Machine Learning-Based Phenomapping Strategy for Adaptive Predictive Enrichment in Randomized Clinical Trials. npj Digit.
Med. 2023,6, 217. [CrossRef]
32.
Gutfraind, A.; Boodram, B.; Prachand, N.; Hailegiorgis, A.; Dahari, H.; Major, M.E. Agent-Based Model Forecasts Aging of the
Population of People Who Inject Drugs in Metropolitan Chicago and Changing Prevalence of Hepatitis C Infections. PLoS ONE
2015,10, e0137993. [CrossRef]
33.
Tatara, E.; Collier, N.T.; Ozik, J.; Gutfraind, A.; Cotler, S.J.; Dahari, H.; Major, M.; Boodram, B. Multi-Objective Model Exploration
of Hepatitis C Elimination in an Agent-Based Model of People Who Inject Drugs. Proc. Winter Simul. Conf.
2019
,2019, 1008–1019.
[CrossRef]
34. Cox, D.; Oakes, D. Analysis of Survival Data; CRC Press: Boca Raton, FL, USA, 1984; ISBN 978-0-412-24490-2.
35.
Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random Survival Forests. Ann. Appl. Stat.
2008
,2, 841–860. [CrossRef]
36. Genuer, R.; Poggi, J.-M.; Tuleau, C. Random Forests: Some Methodological Insights. arXiv 2008, arXiv:0811.3619.
Healthcare 2024,12, 644 15 of 16
37.
Goldstein, B.A.; Polley, E.C.; Briggs, F.B.S. Random Forests for Genetic Association Studies. Stat. Appl. Genet. Mol. Biol.
2011
,10,
32. [CrossRef]
38.
Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How Many Trees in a Random Forest? In Machine Learning and Data Mining in Pattern
Recognition—Proceedings of the 8th International Conference, MLDM 2012, Berlin, Germany, 13–20 July 2012; Perner, P., Ed.; Springer:
Berlin/Heidelberg, 2012; pp. 154–168.
39. Probst, P.; Boulesteix, A.-L. To Tune or Not to Tune the Number of Trees in Random Forest? J. Mach. Learn. Res. 2017,18, 1–18.
40.
Tatara, E.; Gutfraind, A.; Collier, N.T.; Echevarria, D.; Cotler, S.J.; Major, M.E.; Ozik, J.; Dahari, H.; Boodram, B. Modeling Hepatitis
C Micro-Elimination among People Who Inject Drugs with Direct-Acting Antivirals in Metropolitan Chicago. PLoS ONE
2022
,17,
e0264983. [CrossRef]
41. Mackesy-Amiti, M.E.; Gutfraind, A.; Tatara, E.R.; Collier, N.T.; Cotler, S.J.; Page, K.; Ozik, J.T.; Boodram, B.; Major, M.E.; Dahari,
H. Simulations of HCV Vaccine Trials Demonstrate Effects of Background Incidence and Unbalanced Exposure That Can Impact
Vaccine Efficacy. Hepatology 2021,74, 604A.
42. Gutfraind, A.; Mackesy-Amiti, M.E.; Tatara, E.R.; Collier, N.T.; Cotler, S.J.; Page, K.; Ozik, J.T.; Boodram, B.; Major, M.E.; Dahari,
H. Simulations of HCV Vaccine Trials Reveal Opportunities to Re-Evaluate Vaccine Efficacy. J. Hepatol. 2021,75, S768.
43.
Harrell, F.E., Jr.; Califf, R.M.; Pryor, D.B.; Lee, K.L.; Rosati, R.A. Evaluating the Yield of Medical Tests. JAMA
1982
,247, 2543–2546.
[CrossRef]
44.
Uno, H.; Cai, T.; Pencina, M.J.; D’Agostino, R.B.; Wei, L.J. On the C-Statistics for Evaluating Overall Adequacy of Risk Prediction
Procedures with Censored Survival Data. Stat. Med. 2011,30, 1105–1117. [CrossRef]
45.
Hartman, N.; Kim, S.; He, K.; Kalbfleisch, J.D. Pitfalls of the Concordance Index for Survival Outcomes. Stat. Med.
2023
,42,
2179–2190. [CrossRef]
46.
Steyerberg, E. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating; Springer: New York, NY,
USA, 2009.
47.
DeMaris, A.; Selman, S. Converting Data into Evidence: A Statistics Primer for the Medical Practitioner; Springer: New York, NY, USA,
2013.
48.
Centers for Disease Control and Prevention Principles of Epidemiology. Available online: https://www.cdc.gov/csels/dsepd/
ss1978/lesson3/section6.html (accessed on 1 July 2023).
49.
Friede, T.; Pohlmann, H.; Schmidli, H. Blinded sample size reestimation in event-driven clinical trials: Methods and an application
in multiple sclerosis. Pharm. Stat. 2019,18, 351–365. [CrossRef] [PubMed]
50.
Chen, S.; Li, J.; Shu, M. Use of Participation to Prevalence Ratio for Evaluating the Representation Status of Women in Oncology
Clinical Trials. JAMA Oncol. 2022,8, 479–480. [CrossRef]
51.
Khan, S.U.; Khan, M.Z.; Raghu Subramanian, C.; Riaz, H.; Khan, M.U.; Lone, A.N.; Khan, M.S.; Benson, E.-M.; Alkhouli, M.;
Blaha, M.J.; et al. Participation of Women and Older Participants in Randomized Clinical Trials of Lipid-Lowering Therapies: A
Systematic Review. JAMA Netw. Open 2020,3, e205202. [CrossRef]
52.
Saltzman, R.G.; Jayaweera, D.T.; Caceres, L.V.; Tovar, J.A.; Vidro-Casiano, M.; Karakeshishyan, V.; Soto, J.; Khan, A.; Mitrani, R.D.;
Schulman, I.H.; et al. Demographic Representation in Clinical Trials for Cell-Based Therapy. Contemp. Clin. Trials Commun.
2021
,
21, 100702. [CrossRef]
53.
Scott, P.E.; Unger, E.F.; Jenkins, M.R.; Southworth, M.R.; McDowell, T.-Y.; Geller, R.J.; Elahi, M.; Temple, R.J.; Woodcock, J.
Participation of Women in Clinical Trials Supporting FDA Approval of Cardiovascular Drugs. J. Am. Coll. Cardiol.
2018
,71,
1960–1969. [CrossRef]
54.
Jin, X.; Chandramouli, C.; Allocco, B.; Gong, E.; Lam, C.S.P.; Yan, L.L. Women’s Participation in Cardiovascular Clinical Trials
From 2010 to 2017. Circulation 2020,141, 540–548. [CrossRef] [PubMed]
55.
Therneau, T.M.; Lumley, T.; Atkinson, E.; Crowson, C. A Package for Survival Analysis in R. Available online: https://CRAN.R-
project.org/package=survival (accessed on 20 July 2022).
56.
Ishwaran, H.; Kogalur, U.B. Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). Available online:
https://cran.r-project.org/web/packages/randomForestSRC/index.html (accessed on 20 July 2022).
57. R Core Team R. A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2019.
58. Simon, N.; Simon, R. Adaptive Enrichment Designs for Clinical Trials. Biostatistics 2013,14, 613–625. [CrossRef] [PubMed]
59.
Mehta, C.R.; Pocock, S.J. Adaptive Increase in Sample Size When Interim Results Are Promising: A Practical Guide with Examples.
Stat. Med. 2011,30, 3267–3284. [CrossRef]
60.
Uozumi, R.; Yada, S.; Kawaguchi, A. Patient Recruitment Strategies for Adaptive Enrichment Designs with Time-to-Event
Endpoints. BMC Med. Res. Methodol. 2019,19, 159. [CrossRef]
61.
Rosenblum, M.; Hanley, D.F. Adaptive Enrichment Designs for Stroke Clinical Trials. Stroke
2017
,48, 2021–2025. [CrossRef]
[PubMed]
62.
Buchanan, A.L.; Hudgens, M.G.; Cole, S.R.; Mollan, K.R.; Sax, P.E.; Daar, E.S.; Adimora, A.A.; Eron, J.J.; Mugavero, M.J.
Generalizing Evidence from Randomized Trials Using Inverse Probability of Sampling Weights. J. R. Stat. Soc. Ser. A Stat. Soc.
2018,181, 1193–1209. [CrossRef] [PubMed]
63.
Li, F.; Buchanan, A.L.; Cole, S.R. Generalizing Trial Evidence to Target Populations in Non-Nested Designs: Applications to AIDS
Clinical Trials. J. R. Stat. Soc. Ser. C Appl. Stat. 2022,71, 669–697. [CrossRef]
64. Mansournia, M.A.; Altman, D.G. Inverse Probability Weighting. BMJ 2016,352, i189. [CrossRef]
Healthcare 2024,12, 644 16 of 16
65.
Kasereka, S.K.; Zohinga, G.N.; Kiketa, V.M.; Ngoie, R.-B.M.; Mputu, E.K.; Kasoro, N.M.; Kyandoghere, K. Equation-Based
Modeling vs. Agent-Based Modeling with Applications to the Spread of COVID-19 Outbreak. Mathematics
2023
,11, 253.
[CrossRef]
66.
Paquette, D.; Bryant, J.; de Wit, J. Respondent-Driven Sampling and the Recruitment of People with Small Injecting Networks.
AIDS Behav. 2012,16, 890–899. [CrossRef]
67.
Harris, M.; Rhodes, T. Hepatitis C Treatment Access and Uptake for People Who Inject Drugs: A Review Mapping the Role of
Social Factors. Harm Reduct. J. 2013,10, 7. [CrossRef] [PubMed]
68.
Soria, J.; Johnson, T.; Collins, J.; Corby-Lee, G.; Thacker, J.; White, C.; Hoven, A.; Thornton, A. Risk Factors for Loss to Follow-up
of Persons Who Inject Drugs Enrolled at Syringe Services Programs in Kentucky. Int. J. Drug Policy
2021
,95, 103255. [CrossRef]
[PubMed]
69.
Darvishian, M.; Wong, S.; Binka, M.; Yu, A.; Ramji, A.; Yoshida, E.M.; Wong, J.; Rossi, C.; Butt, Z.A.; Bartlett, S.; et al. Loss
to Follow-up: A Significant Barrier in the Treatment Cascade with Direct-Acting Therapies. J. Viral Hepat.
2020
,27, 243–260.
[CrossRef]
70.
Levy, V.; Evans, J.L.; Stein, E.S.; Davidson, P.J.; Lum, P.J.; Hahn, J.A.; Page, K. Are Young Injection Drug Users Ready and Willing
to Participate in Preventive HCV Vaccine Trials? Vaccine 2010,28, 5947–5951. [CrossRef]
71.
Gilbert, P.B.; Bosch, R.J.; Hudgens, M.G. Sensitivity Analysis for the Assessment of Causal Vaccine Effects on Viral Load in HIV
Vaccine Trials. Biometrics 2003,59, 531–541. [CrossRef]
72.
Mewhirter, J.; Sagir, M.; Sanders, R. Towards a Predictive Model of COVID-19 Vaccine Hesitancy among American Adults. Vaccine
2022,40, 1783–1789. [CrossRef]
73.
Tempalski, B.; Pouget, E.R.; Cleland, C.M.; Brady, J.E.; Cooper, H.L.F.; Hall, H.I.; Lansky, A.; West, B.S.; Friedman, S.R. Trends in
the Population Prevalence of People Who Inject Drugs in US Metropolitan Areas 1992–2007. PLoS ONE
2013
,8, e64789. [CrossRef]
74.
Lansky, A.; Abdul-Quader, A.S.; Cribbin, M.; Hall, T.; Finlayson, T.J.; Garfein, R.S.; Lin, L.S.; Sullivan, P.S. Developing an HIV
Behavioral Surveillance System for Injecting Drug Users: The National HIV Behavioral Surveillance System. Public Health Rep.
2007,122, 48–55. [CrossRef]
75.
Huo, D.; Ouellet, L.J. Needle Exchange and Injection-Related Risk Behaviors in Chicago: A Longitudinal Study. JAIDS J. Acquir.
Immune Defic. Syndr. 2007,45, 108–114. [CrossRef]
76.
Boodram, B.; Hotton, A.L.; Shekhtman, L.; Gutfraind, A.; Dahari, H. High-Risk Geographic Mobility Patterns among Young
Urban and Suburban Persons Who Inject Drugs and Their Injection Network Members. J. Urban Health
2018
,95, 71–82. [CrossRef]
77.
Boodram, B.; Mackesy-Amiti, M.-E.; Latkin, C. The Role of Social Networks and Geography on Risky Injection Behaviors of
Young Persons Who Inject Drugs. Drug Alcohol Depend. 2015,154, 229–235. [CrossRef]
78. Temple, R. Enrichment of Clinical Study Populations. Clin. Pharmacol. Ther. 2010,88, 774–778. [CrossRef]
79. Ferguson, T.S. Who Solved the Secretary Problem? Stat. Sci. 1989,4, 282–289. [CrossRef]
80.
Camidge, D.R.; Park, H.; Smoyer, K.E.; Jacobs, I.; Lee, L.J.; Askerova, Z.; McGinnis, J.; Zakharia, Y. Race and Ethnicity Repre-
sentation in Clinical Trials: Findings from a Literature Review of Phase I Oncology Trials. Future Oncol.
2021
,17, 3271–3280.
[CrossRef]
81.
Ma, M.A.; Gutiérrez, D.E.; Frausto, J.M.; Al-Delaimy, W.K. Minority Representation in Clinical Trials in the United States: Trends
Over the Past 25 Years. Mayo Clin. Proc. 2021,96, 264–266. [CrossRef] [PubMed]
82.
Kennedy-Martin, T.; Curtis, S.; Faries, D.; Robinson, S.; Johnston, J. A Literature Review on the Representativeness of Randomized
Controlled Trial Samples and Implications for the External Validity of Trial Results. Trials 2015,16, 495. [CrossRef]
83. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1988.
84.
Stark, M.; Hesse, M.; Brannath, W.; Zapf, A. Blinded Sample Size Re-Estimation in a Comparative Diagnostic Accuracy Study.
BMC Med. Res. Methodol. 2022,22, 115. [CrossRef] [PubMed]
85.
Venn, M.L.; Knowles, C.H.; Li, E.; Glasbey, J.; Morton, D.G.; Hooper, R. ESCP EAGLE Safe Anastomosis Collaborative Imple-
mentation of a Batched Stepped Wedge Trial Evaluating a Quality Improvement Intervention for Surgical Teams to Reduce
Anastomotic Leak after Right Colectomy. Trials 2023,24, 329. [CrossRef] [PubMed]
86.
Yao, X.; Attia, Z.I.; Behnken, E.M.; Walvatne, K.; Giblon, R.E.; Liu, S.; Siontis, K.C.; Gersh, B.J.; Graff-Radford, J.; Rabinstein,
A.A.; et al. Batch Enrollment for an Artificial Intelligence-Guided Intervention to Lower Neurologic Events in Patients with
Undiagnosed Atrial Fibrillation: Rationale and Design of a Digital Clinical Trial. Am. Heart J.
2021
,239, 73–79. [CrossRef]
[PubMed]
87.
Wermuth, P. Participant Recruitment, Screening, and Enrollment. In Principles and Practice of Clinical Trials; Piantadosi, S., Meinert,
C.L., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 257–278. ISBN 978-3-319-52636-2.
88. Breiman, L. Random Forests. Mach. Learn. 2001,45, 5–32. [CrossRef]
Disclaimer/Publisher’s Note:
The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
... 15,16 Vaccines are a cornerstone of public health efforts, crucial for controlling infectious diseases and reducing global morbidity and mortality. 17,18 However, traditional vaccine development often relies on trial-and-error methods, which can take years to yield successful candidates. AI and ML, on the other hand, offer unprecedented opportunities to expedite this process by leveraging vast amounts of data and advanced algorithms. ...
Article
Full-text available
With the rapid advancement of artificial intelligence (AI) and machine learning (ML) technologies, their applications in the medical field have expanded significantly. Particularly in vaccine innovation, AI and ML have shown considerable potential. This article employs bibliometric analysis to examine the progress of AI and ML in vaccine innovation over recent years. By conducting literature retrieval, data extraction, and intelligent analysis through Web of Science, it provides more accurate and comprehensive insights into vaccine development and dosimetry. The rapid growth in research publications since 2012, particularly the geometric growth observed since 2017, underscores the increasing recognition of the potential of AI and ML to revolutionize vaccine development. However, despite the substantial benefits of AI and ML in vaccine innovation, challenges remain regarding data quality, algorithm reliability, and ethical considerations. As technology continues to advance and research deepens, AI and machine learning are anticipated to play an even more pivotal role in vaccine innovation. Notably, AI has the potential to accelerate vaccine development timelines, particularly in the context of emerging infectious diseases. By leveraging data-driven insights and predictive modeling, AI can streamline processes such as antigen discovery, clinical trial design, and risk assessment, thereby enabling faster responses to public health emergencies. This capability is especially critical for addressing sudden outbreaks of infectious diseases, where rapid deployment of effective vaccines can significantly mitigate global health risks.
Article
Recent advances have given rise to a spectrum of digital health technologies that have the potential to revolutionize the design and conduct of cardiovascular clinical trials. Advances in domain tasks such as automated diagnosis and classification, synthesis of high-volume data and latent data from adjacent modalities, patient discovery, telemedicine, remote monitoring, augmented reality, and in silico modelling have the potential to enhance the efficiency, accuracy, and cost-effectiveness of cardiovascular clinical trials. However, early experience with these tools has also exposed important issues, including regulatory barriers, clinical validation and acceptance, technological literacy, integration with care models, and health equity concerns. This narrative review summarizes the landscape of digital tools at each stage of clinical trial planning and execution and outlines roadblocks and opportunities for successful implementation of digital tools in cardiovascular clinical trials.
Article
Full-text available
Randomized clinical trials (RCT) represent the cornerstone of evidence-based medicine but are resource-intensive. We propose and evaluate a machine learning (ML) strategy of adaptive predictive enrichment through computational trial phenomaps to optimize RCT enrollment. In simulated group sequential analyses of two large cardiovascular outcomes RCTs of (1) a therapeutic drug (pioglitazone versus placebo; Insulin Resistance Intervention after Stroke (IRIS) trial), and (2) a disease management strategy (intensive versus standard systolic blood pressure reduction in the Systolic Blood Pressure Intervention Trial (SPRINT)), we constructed dynamic phenotypic representations to infer response profiles during interim analyses and examined their association with study outcomes. Across three interim timepoints, our strategy learned dynamic phenotypic signatures predictive of individualized cardiovascular benefit. By conditioning a prospective candidate’s probability of enrollment on their predicted benefit, we estimate that our approach would have enabled a reduction in the final trial size across ten simulations (IRIS: −14.8% ± 3.1%, pone-sample t-test = 0.001; SPRINT: −17.6% ± 3.6%, pone-sample t-test < 0.001), while preserving the original average treatment effect (IRIS: hazard ratio of 0.73 ± 0.01 for pioglitazone vs placebo, vs 0.76 in the original trial; SPRINT: hazard ratio of 0.72 ± 0.01 for intensive vs standard systolic blood pressure, vs 0.75 in the original trial; all simulations with Cox regression-derived p value of < 0.01 for the effect of the intervention on the respective primary outcome). This adaptive framework has the potential to maximize RCT enrollment efficiency.
Article
Full-text available
Novel and developing artificial intelligence (AI) systems can be integrated into healthcare settings in numerous ways. For example, in the case of automated image classification and natural language processing, AI systems are beginning to demonstrate near expert level performance in detecting abnormalities such as seizure activity. This paper, however, focuses on AI integration into clinical trials. During the clinical trial recruitment process, considerable labor and time is spent sifting through electronic health record and interviewing patients. With the advancement of deep learning techniques such as natural language processing, intricate electronic health record data can be efficiently processed. This provides utility to workflows such as recruitment for clinical trials. Studies are starting to show promise in shortening the time to recruitment and reducing workload for those involved in clinical trial design. Additionally, numerous guidelines are being constructed to encourage integration of AI into the healthcare setting with meaningful impact. The goal would be to improve the clinical trial process by reducing bias in patient composition, improving retention of participants, and lowering costs and labor.
Article
Full-text available
Background Large-scale quality improvement interventions demand robust trial designs with flexibility for delivery in different contexts, particularly during a pandemic. We describe innovative features of a batched stepped wedge trial, ESCP sAfe Anastomosis proGramme in CoLorectal SurgEry (EAGLE), intended to reduce anastomotic leak following right colectomy, and reflect on lessons learned about the implementation of quality improvement programmes on an international scale. Methods Surgical units were recruited and randomised in batches to receive a hospital-level education intervention designed to reduce anastomotic leak, either before, during, or following data collection. All consecutive patients undergoing right colectomy were included. Online learning, patient risk stratification and an in-theatre checklist constituted the intervention. The study was powered to detect an absolute risk reduction of anastomotic leak from 8.1 to 5.6%. Statistical efficiency was optimised using an incomplete stepped wedge trial design and study batches analysed separately then meta-analysed to calculate the intervention effect. An established collaborative group helped nurture strong working relationships between units/countries and a prospectively designed process evaluation will enable evaluation of both the intervention and its implementation. Results The batched trial design allowed sequential entry of clusters, targeted research training and proved to be robust to pandemic interruptions. Staggered start times in the incomplete stepped wedge design with long lead-in times can reduce motivation and engagement and require careful administration. Conclusion EAGLE’s robust but flexible study design allowed completion of the study across globally distributed geographical locations in spite of the pandemic. The primary outcome analysed in conjunction with the process evaluation will ensure a rich understanding of the intervention and the effects of the study design. Trial registration National Institute of Health Research Clinical Research Network portfolio IRAS ID: 272,250. Health Research Authority approval 18 October 2019. ClinicalTrials.gov, identifier NCT04270721, protocol ID RG_19196.
Article
Full-text available
In this paper, we explore two modeling approaches to understanding the dynamics of infectious diseases in the population: equation-based modeling (EBM) and agent-based modeling (ABM). To achieve this, a comparative study of these approaches was conducted and we highlighted their advantages and disadvantages. Two case studies on the spread of the COVID-19 pandemic were carried out using both approaches. The results obtained show that differential equation-based models are faster but still simplistic, while agent-based models require more machine capabilities but are more realistic and very close to biology. Based on these outputs, it seems that the coupling of both approaches could be an interesting compromise.
Article
Full-text available
Background The sample size calculation in a confirmatory diagnostic accuracy study is performed for co-primary endpoints because sensitivity and specificity are considered simultaneously. The initial sample size calculation in an unpaired and paired diagnostic study is based on assumptions about, among others, the prevalence of the disease and, in the paired design, the proportion of discordant test results between the experimental and the comparator test. The choice of the power for the individual endpoints impacts the sample size and overall power. Uncertain assumptions about the nuisance parameters can additionally affect the sample size. Methods We develop an optimal sample size calculation considering co-primary endpoints to avoid an overpowered study in the unpaired and paired design. To adjust assumptions about the nuisance parameters during the study period, we introduce a blinded adaptive design for sample size re-estimation for the unpaired and the paired study design. A simulation study compares the adaptive design to the fixed design. For the paired design, the new approach is compared to an existing approach using an example study. Results Due to blinding, the adaptive design does not inflate type I error rates. The adaptive design reaches the target power and re-estimates nuisance parameters without any relevant bias. Compared to the existing approach, the proposed methods lead to a smaller sample size. Conclusions We recommend the application of the optimal sample size calculation and a blinded adaptive design in a confirmatory diagnostic accuracy study. They compensate inefficiencies of the sample size calculation and support to reach the study aim.
Article
Full-text available
Hepatitis C virus (HCV) infection is a leading cause of chronic liver disease and mortality worldwide. Direct-acting antiviral (DAA) therapy leads to high cure rates. However, persons who inject drugs (PWID) are at risk for reinfection after cure and may require multiple DAA treatments to reach the World Health Organization’s (WHO) goal of HCV elimination by 2030. Using an agent-based model (ABM) that accounts for the complex interplay of demographic factors, risk behaviors, social networks, and geographic location for HCV transmission among PWID, we examined the combination(s) of DAA enrollment (2.5%, 5%, 7.5%, 10%), adherence (60%, 70%, 80%, 90%) and frequency of DAA treatment courses needed to achieve the WHO’s goal of reducing incident chronic infections by 90% by 2030 among a large population of PWID from Chicago, IL and surrounding suburbs. We also estimated the economic DAA costs associated with each scenario. Our results indicate that a DAA treatment rate of >7.5% per year with 90% adherence results in 75% of enrolled PWID requiring only a single DAA course; however 19% would require 2 courses, 5%, 3 courses and <2%, 4 courses, with an overall DAA cost of $325 million to achieve the WHO goal in metropolitan Chicago. We estimate a 28% increase in the overall DAA cost under low adherence (70%) compared to high adherence (90%). Our modeling results have important public health implications for HCV elimination among U.S. PWID. Using a range of feasible treatment enrollment and adherence rates, we report robust findings supporting the need to address re-exposure and reinfection among PWID to reduce HCV incidence.
Article
Full-text available
Designing effective public health campaigns to combat COVID-19 vaccine hesitancy requires an understanding of i) who the vaccine hesitant population is, and ii) the determinants of said population’s hesitancy. While researchers have identified a number of variables associated with COVID-19 vaccine hesitancy that could inform such campaigns, little is known about the cumulative or relative predictive power of these factors. In this article, we employ a machine learning model to analyze online survey data collected from 3,353 respondents. The model incorporates an array of variables that have been shown to impact vaccine hesitancy, allowing us to i) test how well we can predict vaccine hesitancy, and ii) compare the relative predictive impact of each covariate. The model allows us to correctly classify individuals that are vaccine acceptant with 97% accuracy, and those that are vaccine hesitant with 72% accuracy. Trust in and knowledge about vaccines is, by far, the strongest predictor of vaccination choice. While our results demonstrate that public health campaigns designed to increase vaccination rates must find a way to increase public trust in COVID-19 vaccines, our results cannot speak to the malleability of such beliefs, nor how to enhance trust.
Article
Prognostic models are useful tools for assessing a patient's risk of experiencing adverse health events. In practice, these models must be validated before implementation to ensure that they are clinically useful. The concordance index (C-Index) is a popular statistic that is used for model validation, and it is often applied to models with binary or survival outcome variables. In this paper, we summarize existing criticism of the C-Index and show that many limitations are accentuated when applied to survival outcomes, and to continuous outcomes more generally. We present several examples that show the challenges in achieving high concordance with survival outcomes, and we argue that the C-Index is often not clinically meaningful in this setting. We derive a relationship between the concordance probability and the coefficient of determination under an ordinary least squares model with normally distributed predictors, which highlights the limitations of the C-Index for continuous outcomes. Finally, we recommend existing alternatives that more closely align with common uses of survival models.
Article
Increasing interest has been given to the risks and benefits of terminating large clinical trials before reaching pre-specified targets, as such decision can greatly impact the implementation of findings. Department of Veterans Affairs Cooperative Studies Program (VA CSP) is a research infrastructure dedicated to conducting high-quality clinical research. A scoping review was performed to characterize barriers preventing the attainment of pre-specified recruitment, statistical power, or sample size targets in VA CSP trials. A trial was eligible for inclusion if the trial was sponsored by the VA CSP, primary findings were published within the last 10 years, and a decision was made to terminate enrollment or follow-up prior to meeting a priori recruitment or endpoint targets. Following screening, a total of 29 clinical trials were identified. In 11 trials (37.9%), a decision was made to terminate the trial early. The most common reason for early termination was related to under-recruitment (N=5). Other reasons included early detection of safety signals (N=2), futility (N=1), and benefit (N=1). Under-recruitment can lead to underpowered study results, which may result in a failure to detect a true effect and hinder implementation of trial findings. This review highlights recruitment as a critical facet of trial conduct which may hinder the production of high-quality data, and thus warrant additional attention. Solutions to enhance recruitment now implemented by the VA CSP, including dedicated enrollment infrastructure and screening facilitated by informatics approaches, show promise in reducing this cause for early termination.
Article
Comparative effectiveness evidence from randomized trials may not be directly generalizable to a target population of substantive interest when, as in most cases, trial participants are not randomly sampled from the target population. Motivated by the need to generalize evidence from two trials conducted in the AIDS Clinical Trials Group (ACTG), we consider weighting, regression and doubly robust estimators to estimate the causal effects of HIV interventions in a specified population of people living with HIV in the USA. We focus on a non‐nested trial design and discuss strategies for both point and variance estimation of the target population average treatment effect. Specifically in the generalizability context, we demonstrate both analytically and empirically that estimating the known propensity score in trials does not increase the variance for each of the weighting, regression and doubly robust estimators. We apply these methods to generalize the average treatment effects from two ACTG trials to specified target populations and operationalize key practical considerations. Finally, we report on a simulation study that investigates the finite‐sample operating characteristics of the generalizability estimators and their sandwich variance estimators.