VASCULAR-INTERVENTIONAL
“Number needed to read”—How to facilitate clinical trials
in MR-angiography
M. Voth & U. I. Attenberger & A. Luckscheiter &
S. Haneder & T. Henzler & S. O. Schoenberg &
C. Schwenke & H. J. Michaely
Received: 11 July 2010 /Revised: 30 August 2010 /Accepted: 29 September 2010 /Published online: 23 October 2010
© The Author(s) 2010. This article is published with open access at Springerlink.com
Abstract
Purpose To evaluate the effect of the number of readers on
the statistical results in peripheral MRA.
Materials and methods 40 patients with peripheral arterial
occlusive disease were included as a sample dataset in this
study, randomly separated into two matched groups with n=
20 patients (group 1—gadobutrol, group 2—gadoterate
meglumine) who underwent a continuous table movement
MRA of the peripheral vessels at 3 T. Image quality (IQ) of
17 vessel segments was evaluated by 5 independent readers.
The effect of the number of readers on significance and
statistical power was statistically analyzed.
Results Image quality in group 1 (gadobutrol) ranks
significantly higher compared to group 2 (gadoterate
meglumine) with a diagnostic IQ in 97% vs. 78% (p<
0.0001). For the diagnostic/non-diagnostic IQ assessment
significance was reached with one reader 1/5 times (20%),
with two readers in 4/10 (40%), with three readers in 6/10
(60%), with four readers in 4/5 (80%), with five readers in
1/1 (100%). Power considerations showed considerable
gain when increasing the number of readers.
Conclusion Increasing the number of readers in a diagnostic
MRA-study can be used to achieve a higher power or to
decrease the number of subjects included with maintained
statistical validity.
Keywords MRA . Comparison study . Power analysis .
Vessel segments . Statistical significance
Introduction
Diagnostic studies are clinical studies, designed to evaluate
different diagnostic methods/approaches. Thus, they are
subject to Good Clinical Practice and guidelines from health
authorities. In addition, specific guidelines (FDA [1], CPMP [2])
require that the efficacy of a diagnostic approach be proven
by a blinded read performed by multiple independent
clinical experts. In the majority of
cases, statistical evaluation these days is based on two or
three reader analyses. Moreover, contrary to other clinical
studies the experimental unit in diagnostic studies, i.e., the
unit to which the diagnostic procedure is applied, and the
observational unit, i.e., the unit from which the observation
is obtained, are different. For example in studies to detect
focal liver lesions by an imaging technique, the patient
serves as experimental unit, whereas the single liver lesions
are the observational unit in this instance. Statistical methods
like generalized estimating equations and the modified
adjusted Chi²-test [3] allow for analyses taking into account
correlations between units within a patient and multiple
measurements by different readers.
Up to now, sample size and power are most often based
on the assessment of a single reader whereas the analysis is
done on two or more readers’ assessments. To optimize
study designs with respect to a minimal number of patients
required and minimal overall costs, the effect of increasing
the number of readers should be considered. In addition
M. Voth : U. I. Attenberger : A. Luckscheiter : S. Haneder :
T. Henzler : S. O. Schoenberg : H. J. Michaely (*)
Institute of Clinical Radiology and Nuclear Medicine,
University Medical Center Mannheim,
Medical Faculty Mannheim, University of Heidelberg,
Theodor-Kutzer-Ufer 1-3,
68167 Mannheim, Germany
e-mail: henrik.michaely@umm.de
C. Schwenke
SCOSSIS Statistical Consulting,
Berlin, Germany
Eur Radiol (2011) 21:1034–1042
DOI 10.1007/s00330-010-1993-2
underpowered studies expose subjects to unnecessary risks,
as do excessively overpowered studies [4]. In our study we
focused on ordinal and binary endpoints in a parallel-group
setting comparing image quality of two gadolinium-based
contrast agents (GBCAs). Currently, contrast-enhanced
magnetic resonance angiography (CE-MRA) using conven-
tional extracellular GBCA is the most widely used
technique in daily clinical routine for visualization and
assessment of vascular disease.
The purpose of our study was therefore three-fold: first, to
demonstrate the effect of different numbers of readers on the
success of an image quality comparison of two macrocyclic
GBCAs in peripheral MRA, evaluating whether more readers
reduce the number of patients required or, at a fixed sample size,
increase the chance of success; second, to compare the
performance of two statistical methods (generalized estimating
equations (GEEs) and the modified adjusted Chi²-test) for
binary endpoints; and third, to compare the image quality of two
different macrocyclic GBCAs for peripheral MRA using a
low dose regimen of 0.07 mmol/kg BW.
Material and methods
Patients
This monocentric, randomized, retrospective open-label
study was approved by the institutional review board and
written informed consent was obtained from all patients
included in the study. Between October 2008 and November
2009 two sets of 20 age and gender-matched patients (mean
age 72 years in gadobutrol group and 69 years in gadoterate
meglumine group, 24 men/16 women) were sampled
(Table 1) out of 172 routine patients suffering from PAOD
at Fontaine stages II–IV and referred for MRA with either
gadobutrol or gadoterate meglumine. The randomization of
the administered contrast agent was based on the day of the
treatment (randomization by treatment day). The data were
stored prospectively and retrospectively analyzed.
MR-hardware
All MRA examinations were performed on a 3.0 T 32
channel whole-body MR system (MAGNETOM Tim Trio
[102×32], Siemens AG, Healthcare Sector, Erlangen,
Germany). For signal reception, a dedicated peripheral
angiography matrix coil with 36 independent coil elements,
2 body matrix coils each with 6 independent coil elements
and 2 clusters of the inbuilt spine matrix were used to cover
the entire field of view (FoV) from the diaphragm to the
feet. All these coils can be tightly fitted to the patients to
allow for high SNR. The patients were positioned supine
and feet-first. In all patients 18 G intravenous access was
obtained in the left or right cubital vein. For the
administration of contrast agent an automated power
injector (Medrad Spectris Solaris EP, Medrad Indianola,
PA) was used.
Contrast agents
For this study two macrocyclic GBCAs were used because of their high
complex stability [5], which is attributed to their kinetic stability.
The 1.0 molar formulated gadobutrol (Gadovist®,
Bayer Schering Pharma AG, Berlin, Germany) is a
hydrophilic, neutral (nonionic) contrast agent. The 0.5
molar formulated gadoterate meglumine (Dotarem®, Guer-
bet, France) represents a hydrophilic, ionic contrast agent.
The T1 relaxivity (r1) of gadobutrol vs. gadoterate
meglumine is 5.0±0.3 vs. 3.5±0.2 l mmol⁻¹ s⁻¹ (in plasma,
at 3.0 T and 37°C) [6]. To allow for an adequate comparison
between the two different GBCAs the 1 M gadobutrol was
diluted 1:1 with NaCl. This results in a contrast agent
bolus geometry equivalent to that of the 0.5 M gadoterate
meglumine.
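The dilution and the weight-adapted dose of 0.07 mmol/kg body weight stated in this paper can be turned into an injected volume with simple arithmetic. The following is a minimal sketch: the function names are hypothetical, the injection rate of 1.5 ml/s is the one given for the CTM-MRA, and the two example weights are the group means from Table 1, not individual patient values. How volumes were rounded in practice is not stated in the source.

```python
def contrast_volume_ml(weight_kg, dose_mmol_per_kg=0.07, conc_mmol_per_ml=0.5):
    """Volume (ml) needed to deliver a weight-adapted dose at a given concentration."""
    return weight_kg * dose_mmol_per_kg / conc_mmol_per_ml


def injection_time_s(volume_ml, rate_ml_per_s=1.5):
    """Duration (s) of the contrast injection at a constant rate."""
    return volume_ml / rate_ml_per_s


# 1.0 M gadobutrol diluted 1:1 with NaCl has the same concentration
# as the 0.5 M gadoterate meglumine: 0.5 mmol/ml.
DILUTED_GADOBUTROL = 1.0 / 2

if __name__ == "__main__":
    # Example weights: group means from Table 1 (illustrative only).
    for weight in (76, 81):
        volume = contrast_volume_ml(weight, conc_mmol_per_ml=DILUTED_GADOBUTROL)
        print(f"{weight} kg: {volume:.2f} ml over {injection_time_s(volume):.1f} s")
```

Because both agents are injected at 0.5 mmol/ml, the same body weight yields the same volume and injection duration for either agent, which is the point of the dilution.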
MR imaging
To allow for correct positioning of the MRA, 2D gradient-
echo sequences localizers in coronal and transversal
Table 1 Demography and baseline characteristics for randomization
                                 Gadobutrol (n=20)   Gadoterate meglumine (n=20)   p-value*
Age (years)                      72±11               69±10                         0.2030
Male                             12                  12                            1.0000
Female                           8                   8
Weight (kg)                      76±15               81±23                         0.5237
Renal function (MDRD group)
  1 (>89 ml/min/1.73 m²)         4                   5                             0.3947
  2 (>60 to ≤89 ml/min/1.73 m²)  6                   9
  3 (>30 to ≤60 ml/min/1.73 m²)  7                   6
  4 (>15 to ≤30 ml/min/1.73 m²)  0                   0
  5 (≤15 ml/min/1.73 m²)         3                   0
* p-value for difference between treatment groups
orientation were acquired first. In addition, a phase-contrast
vessel scout and a fast-view localizer were acquired to
obtain the adjustment data required for continuous table
movement (CTM), for further details see Kramer et al. [7].
A test bolus technique at the level of the renal arteries was
used to calculate the patient’s individual circulation time.
For this purpose 1 ml was injected at 1.5 ml/s followed by a
30 ml NaCl chaser at the same injection rate. The sequence
parameters of the CTM-MRA are specified in Table 2. The
z-axis field-of-view reached from the abdominal aorta to
the distal calves. The CTM-MRA sequence was acquired
before and after the administration of the contrast agent to
allow for mask subtraction. For the CTM-MRA 0.07 mmol/
kg body weight of the respective GBCA was administered
at 1.5 ml/s followed by a 30 ml NaCl chaser. The CTM-
MRA slab was positioned to include the abdominal aorta,
the pelvic vessels as well as the entire vasculature of the
leg. Because of the coronal image orientation of the entire
FoV in z direction an angulation of the FoV is not possible.
The only parameter permitting adjustment of the spatial
resolution was slice thickness. In this study, a spatial
resolution of 1.2×1.2×1.2 mm³ was realized. This value
reflects the limit of the current implementation of the
method that is imposed by memory constraints. During the
CTM-MRA acquisition, the coil elements required to cover
the FoV around the isocenter of the magnet are selected
automatically. To cover a readout FoV of 38 cm, 18
elements are sufficient. Before table movement is initiated
in CTM-MRA, a number of lines are acquired without
moving the table. Likewise, the data acquisition is
prolonged for a few seconds at the end of the imaging
range after the table has stopped moving [7]. Table velocity
during data acquisition is influenced by several parameters;
the most important ones are the acquired spatial resolution
and the applied parallel imaging (PI) factor. In our setting
the table velocity was 22.3 mm/s.
Image evaluation
The respective image quality of CTM-MRA was evaluated
by 5 independent imaging experts with 3–10 years of
expertise in consensus according to a 4-point Likert-like
rating scale assessing overall image quality as previously
used in other studies [8, 9]. Scores allocated were: 4 =
excellent (strong enhancement of the vessels, small side-
branches seen throughout the course of the vessel, no
venous overlay), 3 = good (strong enhancement of the
vessels, some side-branches seen, non-disturbing venous
enhancement), 2 = moderate (moderate enhancement of the
vessels or no side branches seen or moderate venous
contamination), 1 = non diagnostic (poor opacification of
the vessels or disturbing venous signal).
Image quality for CTM-MRA was scored for 17 vessel
segments per patient. The pre-defined segments evaluated
are shown in Table 3. As an additional endpoint, the assessments
were dichotomized to provide information whether
the image was diagnostic (image quality scores 3 and 4) or
non-diagnostic (image quality scores 1 and 2).
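The dichotomization of the 4-point scale can be written down in a few lines. This is a minimal sketch with hypothetical function names and made-up example scores; it only illustrates the cut-off described above (scores 3 and 4 diagnostic, scores 1 and 2 non-diagnostic) and is not the analysis code used in the study.

```python
VALID_SCORES = (1, 2, 3, 4)  # 1 = non diagnostic ... 4 = excellent


def is_diagnostic(score):
    """Return True for image quality scores 3 and 4, False for 1 and 2."""
    if score not in VALID_SCORES:
        raise ValueError(f"score must be one of {VALID_SCORES}, got {score!r}")
    return score >= 3


def diagnostic_proportion(scores):
    """Share of ratings that are diagnostic; `scores` must be non-empty."""
    flags = [is_diagnostic(s) for s in scores]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical ratings of one reader for six segments (not study data).
    example_scores = [4, 4, 3, 2, 1, 3]
    print(diagnostic_proportion(example_scores))
```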
Statistical analysis
Differences for baseline characteristics between groups
were evaluated by Wilcoxon rank sum tests for continuous
data and Fisher’s exact tests for categorical data. The image
quality was assessed across all five readers and all 17
segments on a binary scale (diagnostic/non diagnostic) as
overall analysis by a modified adjusted Chi²-test.
Then three segments with small differences in image
quality between the two contrast agents were chosen to
analyse the image quality for each reader separately, for all
five readers and for all combinations of 2, 3, and 4 readers
to evaluate the effect of more readers’ assessments on the
significance of the differences between the contrast agents.
These were the Right common iliac artery (AIC right),
Right deep femoral artery (AFP right), and Right posterior
tibial artery (ATP right) marked bold in Table 3.
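The reader combinations analysed here follow directly from choosing k of the five readers. A minimal sketch, assuming the readers are simply labelled 1 to 5 (the labels and function name are illustrative), enumerates the subsets and shows where the denominators 5, 10, 10, 5 and 1 reported for one to five readers come from:

```python
from itertools import combinations

READERS = (1, 2, 3, 4, 5)  # illustrative reader labels


def reader_subsets(readers=READERS):
    """Map each subset size k to the list of all k-reader combinations."""
    return {k: list(combinations(readers, k)) for k in range(1, len(readers) + 1)}


if __name__ == "__main__":
    for k, combos in reader_subsets().items():
        print(f"{k} reader(s): {len(combos)} combination(s)")
```

Each subset would then be analysed as if only those readers had taken part in the study.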
For the ordinally scaled image quality (4 point scale),
multinomial regression analysis was used based on generalized
estimating equations (GEEs) taking into account
multiple observations per patient (several segments within
a patient) and repeated measurements by up to five readers
of the same observational unit. GEEs are the standard
method to analyse data with multiple measures in a single
patient as in our study. They take into account the correlation
between observations within the same patient and therefore
provide the most appropriate summary statistics.
Independence was used as working correlation matrix and a
cumulative logit function as link function. The hypotheses
were defined as follows: $H_0: \mathrm{Distribution}_{\mathrm{group\,A}} = \mathrm{Distribution}_{\mathrm{group\,B}}$
vs. $H_1: \mathrm{Distribution}_{\mathrm{group\,A}} \neq \mathrm{Distribution}_{\mathrm{group\,B}}$.
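As a sketch of what a cumulative logit link implies for the 4-point score, one may assume a proportional-odds form with a single group indicator. The model equation is not spelled out in the source, and sign conventions differ between software packages, so the notation below is an assumption for illustration only:

```latex
\operatorname{logit} P(Y_i \le j)
  = \log \frac{P(Y_i \le j)}{P(Y_i > j)}
  = \alpha_j + \beta\, x_i , \qquad j = 1, 2, 3,
```

where $Y_i$ is the image quality score of observation $i$, $x_i = 1$ for group A and $x_i = 0$ for group B, and $\alpha_1 \le \alpha_2 \le \alpha_3$ are the category thresholds. Under this assumed form, equality of the two score distributions corresponds to $\beta = 0$.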
Table 2 Sequence parameters for CTM-MRA
Parameter                   CTM-MRA
Parallel imaging            GRAPPA 2
Acquisition time (s)        62
Spatial resolution (mm³)    1.2×1.2×1.2
FoV (mm)                    1280×337
TR (ms)                     2.43
TE (ms)                     1.02
Flip angle (°)              21
Matrix                      384×312
Slices/slab                 88
Bandwidth (Hz/Px)           1000
Orientation                 Coronal
The dichotomized image quality (diagnostic yes/no) was
analyzed using logistic regression analysis based on GEEs
as well as the modified adjusted Chi-square approach [3]
also taking into account the correlations between observa-
tional units within a patient and multiple assessments per
unit. Again in the GEEs independence was used as working
correlation matrix. The hypotheses were defined as follows:
$H_0: P_{\mathrm{group\,A}} = P_{\mathrm{group\,B}}$ vs. $H_1: P_{\mathrm{group\,A}} \neq P_{\mathrm{group\,B}}$.
Basis for the sample size considerations was the
dichotomized endpoint in a parallel group design. The
power considerations were done for a fixed sample size
of 40 patients with 20 per group. In addition, the sample
size was calculated to reach a power of about 83%, i.e.
more than 80%. Basic assumptions for the differences
between the groups were gained from the across reader
analysis of three vessel segments by all five readers with
proportions of 96% vs. 86% of segments with diagnostic
image quality. The ratio between groups was set to 1:1.
The power considerations were done by simulation
studies of 1000 runs each, simulating studies based on
the results of the average across the five readers’
assessments for between one and four observational units
per patient. The more conservative approach (GEE or
modified adjusted Chi²-test) was used for the simulation.
Two-sided p-values<0.05 were regarded as statistically
significant. A flow chart of the analyses can be found in
Fig. 1.
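The logic of a simulation-based power estimate can be illustrated with a deliberately simplified sketch. It is not the GEE-based simulation described above: it swaps in a patient-level z-test on per-patient proportions with a normal approximation, draws each patient's probability of a diagnostic rating from a beta distribution, and treats all ratings of a patient (segments times readers) as exchangeable, which ignores segment-level correlation and therefore tends to overstate the gain from additional readers. The concentration parameter of 5.0, the seed and all function names are hypothetical; only the proportions of 96% vs. 86%, the 20 patients per group and the 1000 runs are taken from the text.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_patient(p_mean, concentration, n_obs, rng):
    """Proportion of diagnostic ratings for one simulated patient."""
    # Patient-specific probability (beta distributed around the group mean).
    p = rng.betavariate(p_mean * concentration, (1 - p_mean) * concentration)
    return sum(rng.random() < p for _ in range(n_obs)) / n_obs


def trial_is_significant(p_a, p_b, n_per_group, n_obs, concentration, rng, z_crit):
    """Simulate one parallel-group study and test the group difference."""
    a = [simulate_patient(p_a, concentration, n_obs, rng) for _ in range(n_per_group)]
    b = [simulate_patient(p_b, concentration, n_obs, rng) for _ in range(n_per_group)]
    se = ((variance(a) + variance(b)) / n_per_group) ** 0.5
    if se == 0:  # no variation at all: cannot reject
        return False
    return abs(mean(a) - mean(b)) / se > z_crit


def estimate_power(n_readers, n_units=3, n_per_group=20, p_a=0.96, p_b=0.86,
                   concentration=5.0, runs=1000, alpha=0.05, seed=0):
    """Share of simulated studies with a two-sided p-value below alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_obs = n_readers * n_units  # ratings per patient
    hits = sum(
        trial_is_significant(p_a, p_b, n_per_group, n_obs, concentration, rng, z_crit)
        for _ in range(runs)
    )
    return hits / runs


if __name__ == "__main__":
    for readers in range(1, 6):
        print(readers, estimate_power(readers))
```

The absolute numbers from such a sketch should not be compared with the power values reported in this paper; it only shows how repeating a simulated study many times and counting significant results yields a power estimate, and how the number of ratings per patient enters that estimate.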
Statistical calculations were done with software SAS
Version 9.2 (SAS Institute Inc., Cary, NC, USA).
Results
Image quality of two different GBCA
All contrast agent administrations were performed without
complications. No adverse events were observed.
For image quality analysis 680 judgments ([20×17]×2)
were made in total by each reader; 310 vessel segments were
assessed in group 1 (gadobutrol), 281 vessel segments were
assessed in group 2 (gadoterate meglumine). In group 1, 30
vessel segments could not be assessed whereas in group 2 59
vessel segments were not assessable. For all readers the
overall median value was 4 for gadobutrol whereas gadoterate
meglumine revealed a lower overall median value of 3
(Table 3). Figure 2 illustrates an example for image quality
achieved with gadoterate meglumine and gadobutrol, respectively.
With all 17 segments and five readers, the proportions
of segments with diagnostic image quality across readers were
found to be 0.97 (95% CI=(0.94; 0.99)) for gadobutrol and
0.78 (95% CI=(0.70; 0.86)) for gadoterate meglumine
leading to a difference of 0.19 (95% CI=(0.10; 0.27)). This
difference was already highly significant (p<0.0001). Also
each reader’s assessment showed a highly significant
difference when evaluated separately.
Effect of different numbers of readers
To evaluate the effect of different numbers of readers on
the significance and power, we restricted the analysis to
Table 3 Vessel segments and results (median values across patients by reader for all vessel segments given)
                                      Gadobutrol, reader         Gadoterate meglumine, reader
No  Vessel segment                    1    2    3    4    5      1    2    3    4    5
1   Infrarenal aorta                  4    4    4    4    4      3    3    3    4    4
2   Left common iliac artery          4    4    4    4    4      3    3    3    3    3
3*  Right common iliac artery         4    4    4    4    4      3    3    3    3    3.5
4   Left superficial femoral artery   4    4    4    4    4      3    3    3    2.5  3
5   Right superficial femoral artery  4    4    4    4    4      3    3    3    3    3
6   Left deep femoral artery          3    3    3    4    4      3    3    3    3    3
7*  Right deep femoral artery         4    3.5  3    4    4      3    3    3    3    3
8   Left popliteal artery             4    4    4    4    4      3    3    3    3    3
9   Right popliteal artery            4    4    4    4    4      3    3    3    3    3
10  Left tibiofibular trunk           3    3    3    4    4      3    3    3    3    3
11  Right tibiofibular trunk          4    4    4    4    4      3    3    3    3    3
12  Left anterior tibial artery       3    3    3.5  4    4      3    3    3    3    3
13  Right anterior tibial artery      4    4    4    4    4      3    3    3    3    3
14  Left posterior tibial artery      3    3    3    4    4      3    3    2    3    3
15* Right posterior tibial artery     3    3    3    4    4      3    3    3    3    3
16  Left peroneal artery              3    3    3    4    4      3    3    3    3    3
17  Right peroneal artery             3    3    3    4    4      3    3    3    3    3
* Segments selected for the analysis of the number of readers (marked bold in the original table)
three segments, where the difference between the
contrast agents was less dominant. The statistical
significances for the ordinal (4 point scale) and dichot-
omized image quality (diagnostic yes/no) on the three
segments per patient are summarized in Tables 4 and 5.
For the ordinal endpoint, again the differences were
significant in all scenarios, even with any single reader
assessment. For the binary endpoint, with one reader
assessment, significance was reached in only 1 of 5 cases (20%),
with two readers in 4 of 10 cases (40%), with
three readers in 6 of 10 cases (60%), with four readers in
4 of 5 cases (80%) and with all five readers in 1 of 1 case (100%). The GEE
approach for binary data was found to be slightly more
conservative compared to the modified adjusted Chi²-test
as the p-values in the GEE analysis were found to be
slightly higher in all scenarios analyzed. Therefore the
GEE approach was used for the simulation study on power
and sample size.
Power and sample size
In Figs. 3 and 4 the power and sample size considerations
are summarized for two, three, and four observational units
per patient. For a sample size of 40 and 3 units per patient,
the power started at 29% for a single reader and increased to
79% when five readers were included in the study
(Fig. 4). The required sample size for a power of about
83% consequently decreased from 120 to 44 for one and
five readers, respectively (Fig. 3). Overall, the required
number of patients needed for the analysis can be reduced
by increasing the number of observational units per patient
where possible (e.g. eight liver segments instead of two
liver lobes per patient) and/or by increasing the number of
readers assessing all images. With an intra-individual
comparison design, where the contrast agents would be
applied in a paired fashion, the power is already close to
80% with one reader and 40 patients and reaches 96% with
Fig. 1 Flow chart of statistical
analyses performed
Fig. 2 A Full-thickness coronal
MIP of the CTM MRA of a
patient illustrates the image
quality of gadoterate meglumine
(a) vs gadobutrol (b). Especially
the distal calf vessels could be
depicted more clearly using
gadobutrol, which was also
reflected by the statistically
significant higher median values
for gadobutrol for all readers