British Journal of Cancer advance online publication, 2 August 2016; doi:10.1038/bjc.2016.227 www.bjcancer.com
Evaluating the risk of ovarian cancer before surgery using the ADNEX model: a multicentre external validation study
A Sayasneh*(1,2,10), L Ferrara(3,4,10), B De Cock(5), S Saso(3), M Al-Memar(3), S Johnson(6), J Kaijser(7), J Carvalho(3), R Husicka(3), A Smith(8), C Stalder(3), MC Blanco(4), G Ettore(4), B Van Calster(5), D Timmerman(5,9) and T Bourne(1,3,5)
(1) Department of Surgery and Cancer, Hammersmith Campus, Imperial College London, Du Cane Road, London W12 0HS, UK;
(2) Department of Obstetrics and Gynaecology, Guy’s and St Thomas’ Hospital, Westminster Bridge Road, London SE1 7EH, UK;
(3) Early Pregnancy and Acute Gynecology Unit, Queen Charlotte’s and Chelsea Hospital, Imperial College London, Du Cane Road, London W12 0HS, UK;
(4) Department of Obstetrics and Gynecology, Garibaldi Nesima Hospital, Via Palermo 636, Catania 95122, Italy;
(5) KU Leuven, Department of Development and Regeneration, Herestraat 49, Box 805, Leuven 3000, Belgium;
(6) Southampton University Hospitals, Princess Anne Hospital, Southampton SO16 5YA, UK;
(7) Department of Obstetrics and Gynecology, Ikazia Ziekenhuis Rotterdam, Montessoriweg 1, Rotterdam 3083 AN, The Netherlands;
(8) Ultrasound Scan Department, Queen Charlotte’s and Chelsea Hospital, Imperial College London, Du Cane Road, London W12 0HS, UK and
(9) Department of Obstetrics and Gynecology, University Hospitals Leuven, Herestraat 49, Box 7003, 3000 Leuven, Belgium
Background: The International Ovarian Tumour Analysis (IOTA) group have developed the ADNEX (The Assessment of Different
NEoplasias in the adneXa) model to predict the risk that an ovarian mass is benign, borderline, stage I, stages II–IV or metastatic.
We aimed to externally validate the ADNEX model in the hands of examiners with varied training and experience.
Methods: This was a multicentre cross-sectional cohort study for diagnostic accuracy. Patients were recruited from three cancer
centres in Europe. Patients who underwent transvaginal ultrasonography and had a histological diagnosis of surgically removed
tissue were included. The diagnostic performance of the ADNEX model with and without the use of CA125 as a predictor was
calculated.
Results: Data from 610 women were analysed. The overall prevalence of malignancy was 30%. The area under the receiver
operating characteristic curve (AUC) of ADNEX for differentiating between benign and malignant masses was 0.937
(95% CI: 0.915–0.954) when CA125 was included, and 0.925 (95% CI: 0.902–0.943) when CA125 was excluded. The calibration plots
suggest good correspondence between the total predicted risk of malignancy and the observed proportion of malignancies. The
model showed good discrimination between the different subtypes.
Conclusions: The ADNEX model retains its performance on external validation in the hands of ultrasound
examiners with varied training and experience.
*Correspondence: A Sayasneh; E-mail: a.sayasneh@imperial.ac.uk
(10) These authors contributed equally to this work.
Received 13 December 2015; revised 4 June 2016; accepted 1 July 2016
© 2016 Cancer Research UK. All rights reserved 0007–0920/16
FULL PAPER
Keywords: diagnostic imaging; ovarian neoplasm; statistical models; ultrasonography
British Journal of Cancer (2016), 1–7 | doi: 10.1038/bjc.2016.227
www.bjcancer.com | DOI:10.1038/bjc.2016.227
Advance Online Publication: 2 August 2016
According to the latest statistics from the National Cancer Institute
in the United States, the incidence of ovarian cancer was 12.1 per 100 000 women
per year between 2008 and 2012, with a mortality rate of 7.7 per
100 000 women (Howlader et al, 2015). The overall 5-year survival
for all stages of the disease is estimated to be approximately 45.6% (Howlader
et al, 2015). However, for early localised ovarian cancers, the 5-year
survival exceeds 90% (Howlader et al, 2015). Early diagnosis and
centralised management are thought to be key
factors in optimising survival (Bristow et al, 2013, 2014; Howlader
et al, 2015). Regarding early diagnosis, previous trials evaluating ovarian
cancer screening have not been successful (Kobayashi et al, 2008;
Buys et al, 2011). However, recently, the United Kingdom
Collaborative Trial of Ovarian Cancer Screening (UKCTOCS)
showed that screening using the risk of ovarian cancer algorithm
(ROCA) doubled the number of detected primary invasive
epithelial ovarian or tubal cancers (iEOCs) compared with a fixed
cutoff of CA125 (Menon et al, 2015). The researchers also reported
a significant mortality reduction with annual multimodal screening
(MMS) when prevalent cases were excluded. However, longer-term
follow-up of the study participants is needed to determine how this
mortality reduction affects the cost effectiveness of ovarian cancer
screening (Jacobs et al, 2015).
A further important aspect of clinical management is that an
accurate diagnosis is made when a woman presents with an
ovarian mass. This is essential if women with cancer are to be
referred to specialist oncology services. The International Ovarian
Tumour Analysis group (IOTA) have developed and validated
models and rules to characterise ovarian masses as benign or
malignant (Timmerman et al, 2005, 2010a, b; Van Holsbeke et al,
2012). These models and rules have also been validated in the
hands of less experienced (level II) ultrasound examiners (Sayasneh
et al, 2013a,b).
The IOTA group has developed the multiclass ADNEX
(The Assessment of Different NEoplasias in the adneXa) model
that can differentiate between benign tumours, borderline tumours,
early-stage primary cancers, late-stage primary cancers (stages II–IV)
and secondary metastatic cancers (Van Calster et al, 2014). The
ADNEX model is based on three clinical (including CA125) and six
ultrasound parameters (Van Calster et al, 2014), and also offers risk
calculation without CA125. The model was developed and
temporally validated using parameters collected by experienced
(or level III) ultrasound examiners, equivalent to a UK consultant
with a special interest in gynaecological ultrasonography
(Education and Practical Standards Committee, European
Federation of Societies for Ultrasound in Medicine and Biology
(EFSUMB), 2006; Van Calster et al, 2014). This model should
make the management of ovarian masses more efficient, as it
allows patients to be triaged to the correct management pathway,
whether for conservative follow-up, surgery at a general gynaecology
unit or management at high-volume specialised cancer centres.
Correctly classifying the subtype of malignancy is also of critical
importance as borderline ovarian tumours and early-stage ovarian
cancers can be treated less aggressively, leading to the possibility of
fertility preservation in younger women (Hennessy et al, 2009; Darai
et al, 2013). On the other hand, metastatic ovarian cancers should be
managed according to the origin of the primary cancer (Hennessy
et al, 2009).
The primary aim of this project was to externally validate the
ADNEX model. The secondary aim was to assess the performance of
the model when used by level II examiners with varied training (non-consultant
doctors (MDs) and sonographers) (Education and Practical
Standards Committee, European Federation of Societies for
Ultrasound in Medicine and Biology (EFSUMB), 2006; Van
Calster et al, 2014). We hypothesised that the discriminatory
performance of ADNEX would be retained, that is, it would be
similar to the validation performance in the original ADNEX study.
MATERIALS AND METHODS
Setting and design. This was a multicentre cross-sectional cohort
study for diagnostic accuracy. Data were collected prospectively,
with the purpose of developing and validating ultrasound-based
prediction models from transvaginal ultrasound examinations
performed by level II ultrasound examiners (a non-consultant
gynaecology specialist, gynaecology trainee doctors and gynaecology
sonographers) (Education and Practical Standards Committee,
European Federation of Societies for Ultrasound in Medicine and
Biology (EFSUMB), 2006; The Royal College of Radiologists (RCR)
Board of the Faculty of Clinical Radiology, 2012). The ultrasound
examiners were blinded to the results of the reference test, that is, the
final histological outcome and, in the event of cancer, the stage of the
disease. The ADNEX model was applied by a single investigator
(AS) using a dedicated Excel spreadsheet. Patients were recruited
from three cancer centres (Queen Charlotte’s and Chelsea Hospital
(QCCH), London, UK; Princess Anne Hospital (PAH), Southampton,
UK; and Garibaldi Nesima Hospital (GNH), Catania,
Italy). The study was approved as a service evaluation audit at the
UK centres and as a validation study by the hospital authority at
the Italian centre. The guidelines of the TRIPOD (Transparent
Reporting of a multivariable prediction model for Individual
Prognosis or Diagnosis) initiative were used (Collins et al, 2015).
Patients were recruited consecutively from September 2010 to
November 2014 at QCCH, from May 2012 to May 2014 at PAH
and from September 2012 to February 2015 at GNH. Patients at
QCCH and PAH were also recruited to the IOTA 4 study
(Sayasneh et al, 2013a,b). Transvaginal ultrasonography was
performed using the standardised approach previously published
by the IOTA group (Timmerman et al, 2000, 2010b). Transabdominal
ultrasonography was undertaken when a large mass could
not be fully evaluated transvaginally (Timmerman et al, 2010b).
Participants and data collection. The inclusion criteria were
patients presenting with at least one adnexal mass who underwent
transvaginal ultrasonography at one of the participating centres.
For bilateral adnexal masses, the mass with the most complex
ultrasound features was included (Timmerman et al, 2000, 2010b).
If both masses had similar ultrasound morphology, the larger
mass or the one more easily accessible by ultrasonography was
included (Timmerman et al, 2010b).
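The rule for keeping one mass per patient can be written as a short selection function. This is a minimal Python sketch, not the study's software: the `Mass` record, its ordinal `complexity` rank and the `easily_accessible` flag are hypothetical stand-ins for examiner judgement, and applying size before accessibility is an assumption, since the text presents them as alternatives.

```python
from dataclasses import dataclass


@dataclass
class Mass:
    side: str
    complexity: int                  # hypothetical ordinal rank: higher = more complex morphology
    max_diameter_mm: float
    easily_accessible: bool = True   # hypothetical examiner judgement


def select_index_mass(masses: list[Mass]) -> Mass:
    """Keep one mass per patient: most complex first, then larger, then more accessible."""
    if not masses:
        raise ValueError("patient has no adnexal mass")
    # Tuples compare element by element, so max() applies the rules in order.
    return max(masses, key=lambda m: (m.complexity, m.max_diameter_mm, m.easily_accessible))
```

Encoding the rules as a sort key keeps the priority order explicit and makes ties fall through to the next criterion automatically.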
The exclusion criteria were (1) pregnancy, (2) patients examined
by a consultant, (3) refusal of transvaginal ultrasonography, (4)
cytology rather than histology as an outcome and (5) failure to
undergo surgery within 120 days of the ultrasound examination. At
PAH, 8 cases were included in the final analysis, although they had
the ultrasound examination more than 120 days before surgery.
These cases underwent a CT scan within 120 days, confirming the
persistent presence of the mass.
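The exclusion criteria, including the 120-day scan-to-surgery window and the CT exception applied at PAH, can be sketched as a single eligibility check. The `Candidate` fields and `is_eligible` are hypothetical names chosen for illustration, and the sketch assumes that an interval of exactly 120 days still counts as "within 120 days".

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_SCAN_TO_SURGERY_DAYS = 120


@dataclass
class Candidate:
    # Hypothetical record layout; the field names are illustrative only.
    pregnant: bool
    examined_by_consultant: bool
    declined_transvaginal_scan: bool
    histology_available: bool            # False if only cytology (or nothing) exists
    scan_date: date
    surgery_date: Optional[date]         # None if no surgery was performed
    ct_confirms_mass_within_window: bool = False   # exception applied at PAH


def is_eligible(c: Candidate) -> bool:
    """Return True if the case survives all five exclusion criteria."""
    if c.pregnant or c.examined_by_consultant or c.declined_transvaginal_scan:
        return False
    if not c.histology_available or c.surgery_date is None:
        return False
    interval_days = (c.surgery_date - c.scan_date).days
    if interval_days > MAX_SCAN_TO_SURGERY_DAYS and not c.ct_confirms_mass_within_window:
        return False
    return True
```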
The NHS Caldicott report guidelines were followed in all steps
of data handling (Great Britain; Department of Health, 1997).
At QCCH and GNH, a secure electronic data collection system was
used (Astraia Software, Munich, Germany). A unique identifier
was generated automatically for each patient’s record. Dedicated
data collection forms and Excel sheets were used at PAH. Serum
CA125 was measured at the clinician’s discretion or according to local clinical
practice, using the Abbott Architect CA125 II immunoassay kit (Abbott
Park, IL, USA) at QCCH and GNH, and the UniCel
DxI Immunoassay System (Beckman Coulter Inc., Brea, CA, USA)
at PAH.
The ADNEX model. The ADNEX model contains three clinical
and six ultrasound predictors: age (in years), serum CA125 level
(U ml⁻¹), type of centre (oncology centres vs other hospitals),
maximum diameter of the lesion (in mm), proportion of solid tissue,
more than 10 cyst locules (yes or no), number of papillary
projections (0, 1, 2, 3 or >3), acoustic shadows (yes or no) and
ascites (yes or no) (Van Calster et al, 2014). Oncology centres were
defined as ‘tertiary referral centres with a specific gynaecology
oncology unit’. The proportion of solid tissue is obtained as the
ratio of the maximum diameter of the largest solid component to
the maximum diameter of the lesion. The ADNEX model is
available online and in mobile applications (www.iotagroup.org/
adnexmodel/) (Van Calster et al, 2014). The ADNEX model can
still be calculated without including the serum CA125 value. In this
study we calculated the performance of the ADNEX model with
and without CA125. The temporal validation of the model with
CA125 in the original paper yielded an area under the receiver
operating characteristic curve (AUC) of 0.943 (0.934–0.952) to discriminate
benign from malignant tumours. The model without CA125 had
an AUC of 0.932 (0.922–0.941). Validation AUCs between all pairs
of the five categories varied between 0.71 (stage I cancer vs
secondary metastatic cancer) and 0.99 (benign tumours vs late
stage primary cancer). We applied the model exactly as presented
in the original publication, that is, without any changes to the
model formula or coefficients.
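The predictor set can be made concrete as a typed record. This Python sketch, with hypothetical field names, only shows how the proportion of solid tissue is derived and how the papillary projection count is collapsed into the five ADNEX levels. It deliberately contains no model coefficients, which should be taken from the original publication or the IOTA calculator rather than from this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdnexPredictors:
    # Hypothetical field names for the three clinical and six ultrasound predictors.
    age_years: float
    ca125_u_per_ml: Optional[float]      # None when ADNEX is used without CA125
    oncology_centre: bool                # tertiary centre with a gynaecological oncology unit
    max_lesion_diameter_mm: float
    max_solid_diameter_mm: float         # 0 when there is no solid component
    more_than_10_locules: bool
    papillary_projection_count: int
    acoustic_shadows: bool
    ascites: bool

    @property
    def proportion_solid(self) -> float:
        """Largest solid component diameter divided by lesion diameter (0 to 1)."""
        if self.max_lesion_diameter_mm <= 0:
            raise ValueError("lesion diameter must be positive")
        ratio = self.max_solid_diameter_mm / self.max_lesion_diameter_mm
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("solid component cannot be larger than the lesion")
        return ratio

    @property
    def papillation_level(self) -> str:
        """Collapse the raw count into the ADNEX levels '0', '1', '2', '3' and '>3'."""
        if self.papillary_projection_count < 0:
            raise ValueError("count cannot be negative")
        if self.papillary_projection_count > 3:
            return ">3"
        return str(self.papillary_projection_count)
```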
Reference tests. The reference standard was the histopathological
diagnosis of the mass after surgical removal. The excised tissues
underwent histological examination at the local centre. Tumours
were classified according to the WHO (World Health Organisation)
classification of tumours and malignant tumours were staged
according to the FIGO (International Federation of Gynaecology
and Obstetrics) criteria (Tavassoli et al, 2003; Heintz et al, 2006).
Histological classification was performed without knowledge of the
ADNEX results or clinical and ultrasound findings for the patient.
The final diagnosis was categorised into five types: benign,
borderline, stage I invasive, stage II–IV invasive and secondary
metastatic cancer.
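Assigning each surgically confirmed mass to one of the five reference categories is a simple mapping. The sketch below assumes simplified, hypothetical histology labels and FIGO stage strings such as "IC2" or "IIIC"; real pathology reports would need a richer parser.

```python
from typing import Optional

# The five reference-standard categories used in this study.
CATEGORIES = ("benign", "borderline", "stage I invasive",
              "stage II-IV invasive", "secondary metastatic")


def outcome_category(histology: str, figo_stage: Optional[str] = None) -> str:
    """Map a simplified histology label (hypothetical vocabulary) to a category.

    histology: 'benign', 'borderline', 'primary invasive' or 'metastatic'.
    figo_stage: FIGO stage string such as 'IA', 'IC2', 'IIB', 'IIIC' or 'IV';
    required only for primary invasive tumours.
    """
    if histology == "benign":
        return "benign"
    if histology == "borderline":
        return "borderline"
    if histology == "metastatic":
        return "secondary metastatic"
    if histology == "primary invasive":
        if not figo_stage:
            raise ValueError("primary invasive tumours need a FIGO stage")
        # Strip sub-stage letters and digits to keep the Roman numeral (I to IV).
        numeral = figo_stage.strip().upper().rstrip("ABC123")
        if numeral not in ("I", "II", "III", "IV"):
            raise ValueError(f"unrecognised FIGO stage: {figo_stage!r}")
        return "stage I invasive" if numeral == "I" else "stage II-IV invasive"
    raise ValueError(f"unknown histology label: {histology!r}")


def is_malignant(category: str) -> bool:
    """Binary outcome for the benign vs malignant analyses."""
    return category != "benign"
```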
Statistical analysis. There were missing values for serum CA125
and for the presence of >10 cyst locules (loc10). Missing values
were handled differently for serum CA125 and loc10. The number
of missing values for the latter variable was small (3%), and hence
these were dealt with using single stochastic imputation based on
logistic regression. Missing loc10 values were predicted by a logistic
regression model with Firth correction with the following
predictors: age, maximum diameter of the lesion, proportion of
solid tissue, number of papillations, presence of acoustic shadows,
ascites, type of ovarian tumour and type of operator. The missing
serum CA125 values were handled with multiple stochastic
imputation using predictive mean matching regression. As the
distribution of serum CA125 was heavily skewed, the log–log
transformation of CA125 was used (i.e., log(log(CA125))). In this
imputation model, age, maximum diameter of the lesion,
proportion of solid tissue, loc10, number of papillations, presence
of acoustic shadows, ascites, type of ovarian tumour, hospital and
operator type were used as predictors. Using this approach, the
missing values were replaced by 100 plausible values, leading to
100 completed data sets. Imputed values were back transformed to
the original scale. For the ADNEX model with CA125, each of the
100 completed data sets was analysed separately, and the results were
combined using Rubin’s Rules (Rubin, 1987).
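Two steps of this missing-data workflow are easy to show in code: the log(log(CA125)) transformation with its back-transformation, and the pooling of per-imputation results with Rubin's rules. The predictive mean matching step itself is not reproduced. This is a minimal sketch; the function names are hypothetical, and it assumes CA125 > 1 U ml⁻¹ so that the double logarithm is defined.

```python
import math
from statistics import mean, variance


def loglog(ca125: float) -> float:
    """Transform CA125 to log(log(CA125)); defined only for CA125 > 1."""
    if ca125 <= 1:
        raise ValueError("log(log(x)) requires x > 1")
    return math.log(math.log(ca125))


def inverse_loglog(z: float) -> float:
    """Back-transform an imputed value to the original CA125 scale."""
    return math.exp(math.exp(z))


def rubin_pool(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Pool m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where the total variance is
    T = W + (1 + 1/m) * B, W is the mean within-imputation variance and
    B is the sample variance of the m estimates.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)        # m - 1 denominator
    return q_bar, within + (1 + 1 / m) * between
```

The between-imputation term inflates the variance to reflect uncertainty about the missing CA125 values, which a single imputation would ignore.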
External validation of the ADNEX model with and without
CA125 was performed by evaluating discrimination and calibration
performance. The AUC was calculated for the basic discrimination
between benign and malignant tumours using the total risk of
malignancy (i.e., the sum of the estimated risks of the four
malignant subtypes). The 95% confidence intervals for differences
in AUCs were computed based on 1000 bootstrap samples, where
for each bootstrap sample the same patients were selected across
the imputed data sets (Musoro et al, 2014). In addition, AUCs were
computed for each pair of tumour types using the conditional risk
method (Van Calster et al, 2012b). Finally, we calculated the polytomous
discrimination index (PDI; Van Calster et al, 2012a), which
estimates the average proportion of patients correctly classified by
the model when it is presented with five patients, one with each tumour
type. Sensitivity and specificity were calculated at cutoffs of 1%, 3%, 5%,
10%, 15%, 20% and 30% for the total risk of
malignancy. Calibration of the predicted probabilities was assessed
through use of calibration plots that show the relation between the
observed and predicted probabilities for malignant tumours. The
calibration curve was estimated by using a loess smoother (Van
Calster et al, 2016).
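The discrimination measures described above can be sketched in plain Python: the total risk of malignancy as the sum of the four malignant subtype risks, the benign vs malignant AUC as a Mann–Whitney estimate, pairwise AUCs with the conditional risk method (as we understand it, scoring patients in categories i and j by p_i/(p_i + p_j)), and a paired bootstrap for an AUC difference. This is an illustration under stated simplifications, not the study's analysis code: the category keys are hypothetical, a percentile interval is assumed, and the bootstrap ignores the multiple imputation layer (the study resampled the same patients across all imputed data sets).

```python
import random

# Hypothetical keys for the four malignant ADNEX subtypes.
MALIGNANT_KEYS = ("borderline", "stage I", "stage II-IV", "metastatic")


def total_malignancy_risk(risks: dict[str, float]) -> float:
    """Total risk of malignancy: sum of the four malignant subtype risks."""
    return sum(risks[key] for key in MALIGNANT_KEYS)


def auc(pos: list[float], neg: list[float]) -> float:
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties = 1/2."""
    if not pos or not neg:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def pairwise_auc(risks: list[dict[str, float]], labels: list[str], i: str, j: str) -> float:
    """Conditional risk method: score patients of category i or j by p_i / (p_i + p_j)."""
    def cond(r: dict[str, float]) -> float:
        return r[i] / (r[i] + r[j])
    pos = [cond(r) for r, y in zip(risks, labels) if y == i]
    neg = [cond(r) for r, y in zip(risks, labels) if y == j]
    return auc(pos, neg)


def bootstrap_auc_difference(score_a: list[float], score_b: list[float],
                             malignant: list[bool], n_boot: int = 1000,
                             seed: int = 1) -> tuple[float, float]:
    """Paired percentile bootstrap 95% CI for AUC(a) - AUC(b) on the same patients."""
    if all(malignant) or not any(malignant):
        raise ValueError("need both benign and malignant patients")
    rng = random.Random(seed)
    n = len(malignant)
    diffs: list[float] = []
    while len(diffs) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]   # same patients for both models
        pos = [k for k in sample if malignant[k]]
        neg = [k for k in sample if not malignant[k]]
        if not pos or not neg:
            continue                                     # AUC undefined; redraw
        diffs.append(auc([score_a[k] for k in pos], [score_a[k] for k in neg])
                     - auc([score_b[k] for k in pos], [score_b[k] for k in neg]))
    diffs.sort()
    return diffs[(25 * n_boot) // 1000], diffs[(975 * n_boot) // 1000 - 1]
```

The pairwise O(n²) AUC loop is fine for a few hundred patients; larger data sets would usually use a rank-based formula instead.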
RESULTS
During the study period, 751 women underwent ultrasonography
by level II examiners (one associate specialist in gynaecology,
12 resident gynaecology trainees and 29 sonographers) for a pelvic
mass and went through the surgical management pathway.
Of these, 141 women were excluded from the final analysis for
the following reasons: 65 women were examined by a consultant,
26 women had no histology result (14 only cytology, 12 no
cytology or histology), 24 women had surgery >120 days after the
characterising ultrasound scan, 15 women were pregnant, 5 women
only had a transabdominal scan, 5 women had no surgery
performed (declined or were not medically fit) and finally
1 woman who had a recurrence of cervical cancer in the pelvis a
few years after radical hysterectomy and underwent a bilateral
salpingo-oophorectomy was excluded as the tumour was not
considered adnexal. Supplementary Table 1 presents exclusions for
each centre. In the final analysis, 610 women were included
(Supplementary Figure 1). Of these patients, 142 (23%) had a
missing CA125 level and 17 (3%) had a missing value for loc10.
Supplementary Table 2 presents the numbers of missing values for
each of the study centres. The prevalence of malignancy was 30%
(n=182), with 33% for QCCH, 32% for PAH and 19% for GNH.
There were 42 (7%) borderline tumours, 47 (8%) stage I primary
ovarian cancers, 69 (11%) stage II–IV primary ovarian cancers and
24 (4%) secondary metastatic cancers (see Supplementary Table 3
for a breakdown per centre). The median age was 47 years with
352 (58%) premenopausal and 258 (42%) postmenopausal women.
Table 1 shows descriptive statistics of the ADNEX predictors per
tumour subtype. Supplementary Tables 4–6 show descriptive
statistics per centre.
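The patient flow and the reported percentages can be checked with a few lines of arithmetic. The counts below are taken from the text and Table 1; the script only confirms that they are internally consistent.

```python
def pct(count: int, total: int) -> int:
    """Whole-number percentage, as reported in the text."""
    return round(100 * count / total)


screened = 751
exclusions = {
    "examined by a consultant": 65,
    "no histology (14 cytology only, 12 neither)": 26,
    "surgery >120 days after the scan": 24,
    "pregnant": 15,
    "transabdominal scan only": 5,
    "no surgery performed": 5,
    "tumour not considered adnexal": 1,
}
analysed = screened - sum(exclusions.values())

# Final diagnoses (Table 1).
subtypes = {"benign": 428, "borderline": 42, "stage I": 47,
            "stage II-IV": 69, "secondary metastatic": 24}
malignant = sum(n for name, n in subtypes.items() if name != "benign")

print(analysed, malignant, pct(malignant, analysed))  # 610 182 30
for name, n in subtypes.items():
    print(f"{name}: {n} ({pct(n, analysed)}%)")
print(f"missing CA125: {pct(142, analysed)}%, missing loc10: {pct(17, analysed)}%")
```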
The calibration plots suggest good correspondence between the
total predicted risk of malignancy and the observed proportion of
malignant tumours, both for the ADNEX model with and without
CA125 (Figure 1).
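A flexible calibration curve of the kind shown in Figure 1 regresses the observed outcome (1 for malignant, 0 for benign) on the predicted risk with a local smoother. The sketch below is a bare-bones loess (local linear fit, tricube weights, no robustness iterations) in pure Python; the span `frac` is an assumed value, not a setting reported for this study.

```python
import math


def loess(x: list[float], y: list[float], frac: float = 0.75) -> list[float]:
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    k = min(n, max(2, math.ceil(frac * n)))   # neighbours used for each local fit
    fitted = []
    for x0 in x:
        dist = sorted(abs(xi - x0) for xi in x)
        h = dist[k - 1] or 1e-12               # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0  # fall back to a local mean
        fitted.append(ym + slope * (x0 - xm))
    return fitted
```

For a calibration plot, `x` would hold the predicted total risks and `y` the 0/1 outcomes; plotting the sorted pairs `(x, loess(x, y))` against the diagonal shows where predictions are too high or too low.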
The AUC to differentiate between benign and malignant masses
was 0.937 (95% CI: 0.915–0.954) for ADNEX with CA125 and
0.925 (95% CI: 0.902–0.943) for ADNEX without CA125 (Figure 2
and Table 2). The model with CA125 showed slightly better
performance (AUC difference: 0.012, 95% CI: 0.006–0.020). At risk
cutoffs of 1%, 10% and 30%, sensitivities were 100%, 97% and 86%
for ADNEX with CA125 (Table 3). Corresponding specificities
were 12%, 68% and 84%. As in the original study, centre
differences were observed with centre-specific AUCs for ADNEX
with CA125 that varied from 0.90 for PAH to 0.99 for GNH
(Table 2). The AUC was higher for premenopausal than for
postmenopausal women: 0.939 vs 0.899 for the model with CA125
(difference 0.040, 95% CI 0.009–0.084) and 0.935 vs 0.873 for the
model without CA125 (difference 0.062, 95% CI 0.012–0.116)
(Table 2).
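The 1% and 3% confidence intervals in Table 3 use Wilson's score method with continuity correction (Newcombe, 1998). At the 1% cutoff the counts can be recovered from the table: all 182 malignant tumours lie above the cutoff, so 559 − 182 = 377 of the 428 benign masses are false positives and 51 are true negatives. The sketch below implements the continuity-corrected interval and applies it to those counts; the function name is hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_cc(successes: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval with continuity correction (Newcombe 1998, method 4)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    lower = 0.0
    if successes > 0:
        lower = (2 * n * p + z * z - 1
                 - z * math.sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))) / denom
    upper = 1.0
    if successes < n:
        upper = (2 * n * p + z * z + 1
                 + z * math.sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))) / denom
    return max(0.0, lower), min(1.0, upper)


# Table 3, ADNEX with CA125, 1% cutoff: 559 patients at or above the cutoff.
malignant, benign, above_cutoff = 182, 428, 559
true_negatives = benign - (above_cutoff - malignant)    # 428 - 377 = 51
print(true_negatives, round(100 * true_negatives / benign, 1))          # 51 11.9
print([round(100 * v, 1) for v in wilson_cc(malignant, malignant)])     # [97.4, 100.0]
print([round(100 * v, 1) for v in wilson_cc(true_negatives, benign)])   # [9.1, 15.5]
```

The printed values agree with the 1% row of Table 3 (sensitivity 100.0%, 97.4–100.0; specificity 11.9%, 9.1–15.5). The continuity correction keeps the interval sensible at the boundary, where the simple Wald interval would collapse to a single point.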
When tumours were classified as benign, borderline, stage I
invasive, stage II–IV invasive or secondary metastatic, the
model showed good discrimination between the different subtypes
(Table 4). For example, discrimination between benign and
stage II–IV tumours was near perfect for the model with CA125
(AUC 0.99). By contrast, the model had the most difficulty
discriminating between borderline and stage I tumours
(AUC 0.78), although performance was still good. The model
without CA125 mainly had lower AUCs for stage II–IV tumours
vs other groups, in particular vs secondary metastatic cancers
(AUC 0.88 for model with CA125, AUC 0.77 for model without
CA125). The polytomous discrimination index (PDI) was 0.58 for
ADNEX with CA125 and 0.52 for ADNEX without CA125
(Table 4), whereas PDI for random performance would be 0.20 for
five categories.
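The PDI reported above can be estimated by brute force: form every set of patients with one from each category, and for each category c check whether the patient truly in c has the highest predicted risk of c within the set, sharing the credit on ties. This Python sketch follows that definition; it is only practical for small samples (the present data would give 428 × 42 × 47 × 69 × 24, roughly 1.4 billion sets), so a real analysis would need a more efficient counting scheme. Uninformative predictions give 1/k, which is 0.20 for the five ADNEX categories.

```python
from itertools import product


def pdi(risks: list[list[float]], labels: list[int]) -> float:
    """Polytomous discrimination index by exhaustive enumeration.

    risks[n][c]: predicted probability of category c for patient n.
    labels[n]:   true category of patient n, coded 0 .. k-1.
    """
    k = len(risks[0])
    groups = [[n for n, y in enumerate(labels) if y == c] for c in range(k)]
    if any(not g for g in groups):
        raise ValueError("every category needs at least one patient")
    credit = 0.0
    n_sets = 0
    for combo in product(*groups):        # combo[c] is a patient from category c
        n_sets += 1
        for c in range(k):
            scores = [risks[patient][c] for patient in combo]
            top = max(scores)
            if scores[c] == top:          # true category-c patient ranks first for c
                credit += 1 / scores.count(top)   # share the credit on ties
    return credit / (k * n_sets)
```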
DISCUSSION
In this study, we have shown that in the hands of level II
ultrasound examiners, the ADNEX model was able to discriminate
between benign and malignant masses with a very similar level of
performance to that achieved by experienced ultrasound examiners
in the original ADNEX temporal validation study published by the
IOTA group (Van Calster et al, 2014). In our external validation
study using a 10% cutoff to define malignancy, the ADNEX model
achieved a sensitivity of 97.3% and a specificity of 67.7% compared
with 96.5% and 71.3% in the original study (Van Calster et al,
2014). The optimal cutoff for selecting patients for conservative
management may vary (e.g., between 1 and 5%) depending on the
health-care system, cost of surgery and surgical risk factors
(age, previous medical and surgical history). However, as this study
only included patients who underwent surgical management, we
cannot conclude which cutoff is optimal for conservative
management. This will be investigated in the IOTA5 study
(https://clinicaltrials.gov/ct2/show/NCT01698632). In contrast, in
a tertiary centre it may be preferable to have a lower false positive
rate, and a cutoff value of 30% may be more appropriate (Van
Calster et al, 2015).
To the best of our knowledge, this is the first external validation
study of the IOTA ADNEX model. Furthermore, the validation
was carried out by level II ultrasound examiners, whereas in the
previous IOTA development and temporal validation study (Van
Calster et al, 2014), the ultrasound scan parameters were collected
by experienced level III examiners. A strength of our study is that it
is multicentre, and as it includes level II examiners with varied
training and experience (sonographers and medical doctors), we
think the performance of the ADNEX model in this study is likely
to be generalisable. Another strength of our study is the robust
selection of the reference test, as only cases with a histological
outcome were included. However, this may also be seen as a
weakness in relation to the potential performance of the ADNEX
model for masses that are selected for conservative management as
these were not included in the study. This is an issue that applies to
most, if not all, of the diagnostic research carried out to date on
ovarian masses. The previously mentioned IOTA 5 study should
give us useful information on the diagnostic performance of
ADNEX and the long-term behaviour of these masses.
Table 1. Descriptive information about the patients and masses included in the study according to tumour subtype
Variable | Statistic | Benign (n=428) | Borderline (n=42) | Stage I OC (n=47) | Stage II–IV OC (n=69) | Secondary metastasis (n=24)
Age, years | Median (IQR) | 43 (31–55) | 47 (30–56) | 57 (48–68) | 62 (53–72) | 55 (49–69)
CA125, IU l⁻¹ | Median (IQR) | 20 (12–39) | 28 (21–64) | 92 (35–209) | 485 (136–1083) | 66 (33–129)
Max lesion diameter, mm | Median (IQR) | 72 (51–95) | 128 (91–174) | 146 (109–180) | 110 (76–140) | 90 (73–135)
Presence of solid parts | N (%) | 142 (33%) | 30 (71%) | 46 (98%) | 69 (100%) | 22 (92%)
Proportion of solid tissue, if present | Median (IQR) | 0.36 (0.18–0.78) | 0.37 (0.19–0.47) | 0.43 (0.30–0.67) | 0.59 (0.41–1.00) | 1.00 (0.58–1.00)
More than 10 locules | N (%) | 31 (7%) | 14 (33%) | 13 (28%) | 11 (16%) | 7 (29%)
Number of papillations: 0 | N (%) | 371 (87%) | 26 (62%) | 33 (70%) | 52 (75%) | 21 (88%)
Number of papillations: 1 | N (%) | 31 (7%) | 6 (14%) | 1 (2%) | 8 (12%) | 0 (0%)
Number of papillations: 2 | N (%) | 12 (3%) | 2 (5%) | 5 (11%) | 1 (1%) | 2 (8%)
Number of papillations: 3 | N (%) | 3 (1%) | 2 (5%) | 2 (4%) | 0 (0%) | 1 (4%)
Number of papillations: >3 | N (%) | 11 (3%) | 6 (14%) | 6 (13%) | 8 (12%) | 0 (0%)
Acoustic shadows | N (%) | 94 (22%) | 0 (0%) | 6 (13%) | 1 (1%) | 1 (4%)
Ascites | N (%) | 6 (1%) | 1 (2%) | 3 (6%) | 23 (33%) | 7 (29%)
Abbreviations: CA125 = cancer antigen 125; IQR = interquartile range; OC = ovarian cancer.
[Figure 1, panels A and B: observed proportion plotted against predicted probability of malignancy (both axes 0 to 1), showing the ideal line and the flexible calibration (loess) curve, with benign and malignant cases marked.]
Figure 1. (A) Calibration plot for the ADNEX model with serum CA125. (B) Calibration plot for the ADNEX model without serum CA125.
A potential limitation is the use of different assay kits for serum
CA125 measurements; however, the inconsistency in CA125 levels
resulting from this is thought to be limited (Davelaar et al, 1998).
Furthermore, the variation in CA125 assay kits used in the study
reflects clinical reality and means that the results are more
likely to be reproducible (Van Calster et al, 2014). A further
possible limitation of the study is that all three participating
hospitals were referral centres for gynaecological cancers, resulting
in there being a relatively high prevalence of malignant disease in
the study population. Accordingly, our findings may not fully
reflect test performance in primary care or in secondary
gynaecology units. However, it
should be noted that in the original ADNEX study the prevalence
of malignancy ranged from 0 to 66% in the 24 participating centres
(Van Calster et al, 2014), and hence this makes it more likely that
results will be generalisable. Furthermore, ADNEX explicitly
corrects its prediction for type of centre (oncology centres vs
other centres). In this sense, the potential for selection bias is
accounted for by the model.
Finally, having no centralised histopathology review in our
study may have led to bias. For example, distinguishing borderline
tumours from benign tumours or even stage I cancer may be
challenging for pathologists, where disagreement can occur and
this may give inaccurate diagnostic performance results for the
ADNEX model in these cases (Van Calster et al, 2014). However,
as all the histopathology departments involved in this study were
tertiary referral centres for gynaecological cancers, in the event of a
discrepancy (including discrepancies in the referring units) a local
review at the tertiary centre would have been held to resolve the
disagreement. Furthermore, centralised review of pathology was
discontinued in IOTA studies as it was shown in initial studies that
there were minimal differences between local and central reports
(Timmerman et al, 2005).
Table 3. The overall sensitivity and specificity (benign vs malignant) of the ADNEX model with and without the inclusion of serum CA125
Cutoff | Patients with risk ≥ cutoff, N (%) | Sensitivity (95% CI) | Specificity (95% CI)
ADNEX with CA125
1% | 559 (91.6%) | 100.0% (97.4–100.0) | 11.9% (9.1–15.5)
3% | 479 (78.5%) | 100.0% (97.4–100.0) | 30.6% (26.3–35.3)
5% | 383 (62.8%) | 99.0% (94.9–99.8) | 53.2% (48.2–58.1)
10% | 315 (51.6%) | 97.3% (93.5–98.9) | 67.7% (63.0–72.0)
15% | 281 (46.1%) | 94.4% (90.0–97.0) | 75.2% (70.7–79.2)
20% | 253 (41.5%) | 90.6% (85.2–94.1) | 79.3% (75.1–83.0)
30% | 226 (37.0%) | 86.3% (80.4–90.6) | 83.9% (80.1–87.2)
ADNEX without CA125
1% | 557 (91.3%) | 100.0% (97.4–100.0) | 12.4% (9.5–16.0)
3% | 490 (80.3%) | 100.0% (97.4–100.0) | 28.0% (23.9–32.6)
5% | 374 (61.3%) | 98.9% (95.7–99.7) | 54.7% (49.9–59.3)
10% | 317 (52.0%) | 96.7% (92.9–98.5) | 67.1% (62.5–71.3)
15% | 289 (47.4%) | 94.5% (90.1–97.0) | 72.7% (68.2–76.7)
20% | 261 (42.8%) | 90.7% (85.5–94.1) | 77.6% (73.4–81.3)
30% | 225 (36.9%) | 84.6% (78.6–89.2) | 83.4% (80.0–86.6)
Abbreviations: ADNEX = The Assessment of Different NEoplasias in the adneXa; CA125 = cancer antigen 125; CI = confidence interval. When using a 1% or 3% cutoff, confidence limits are calculated through use of Wilson’s score confidence interval method with continuity correction (Newcombe, 1998). For the other cutoffs, confidence limits are calculated using logistic regression to combine results after multiple imputation.
[Figure 2: sensitivity plotted against 1 − specificity for ADNEX with CA125 (AUC = 0.937) and ADNEX without CA125 (AUC = 0.925), with the 1%, 3%, 5%, 10%, 15%, 20% and 30% risk cutoffs marked on each curve.]
Figure 2. Receiver operating characteristic curves for the ADNEX model with and without serum CA125 levels to discriminate between benign and malignant masses.
Table 4. Pairwise AUCs and PDI of the ADNEX model with and without serum CA125
Discrimination measure | ADNEX with CA125 | ADNEX without CA125
Polytomous discrimination index (PDI) | 0.59 | 0.52
AUC benign vs borderline | 0.88 | 0.88
AUC benign vs stage I OC | 0.95 | 0.94
AUC benign vs stage II–IV OC | 0.99 | 0.97
AUC benign vs secondary metastasis | 0.96 | 0.95
AUC borderline vs stage I OC | 0.78 | 0.78
AUC borderline vs stage II–IV OC | 0.94 | 0.91
AUC borderline vs secondary metastasis | 0.92 | 0.93
AUC stage I OC vs stage II–IV OC | 0.83 | 0.79
AUC stage I OC vs secondary metastasis | 0.81 | 0.83
AUC stage II–IV OC vs secondary metastasis | 0.88 | 0.77
Abbreviations: ADNEX = The Assessment of Different NEoplasias in the adneXa; AUC = area under the receiver operating characteristic curve; CA125 = cancer antigen 125; OC = ovarian cancer.
Table 2. The area under the receiver operating characteristic curve for the discrimination between benign and malignant lesions for ADNEX with and without CA125 according to centre, operator profession and menopausal status
Subgroup | ADNEX with CA125: AUC (95% CI) | ADNEX without CA125: AUC (95% CI)
All patients | 0.937 (0.915–0.954) | 0.925 (0.902–0.943)
Centre
QCCH | 0.942 (0.913–0.962) | 0.931 (0.900–0.953)
PAH | 0.900 (0.841–0.938) | 0.889 (0.828–0.930)
GNH | 0.990 (0.959–0.998) | 0.983 (0.950–0.995)
Operator profession
MD | 0.939 (0.917–0.956) | 0.924 (0.900–0.943)
Sonographer | 0.912 (0.809–0.962) | 0.916 (0.818–0.964)
Menopausal status
Premenopausal | 0.939 (0.901–0.963) | 0.935 (0.901–0.958)
Postmenopausal | 0.899 (0.855–0.931) | 0.873 (0.824–0.910)
Abbreviations: ADNEX = The Assessment of Different NEoplasias in the adneXa; AUC = area under the receiver operating characteristic curve; CA125 = cancer antigen 125; CI = confidence interval; MD = medically qualified doctor; QCCH = Queen Charlotte’s and Chelsea Hospital; PAH = Princess Anne Hospital; GNH = Garibaldi Nesima Hospital.
It is worth noting that we observed variation in ADNEX
performance between centres that is comparable to that
observed in the original IOTA validation study (Van Calster
et al, 2014). This variation could be explained by differences in
the case mix between these centres, with a higher number of
secondary metastatic cancers in PAH compared with QCCH and
GNH. It is important to investigate heterogeneity between centres,
but this data set is not ideal for this objective because this requires
a larger database derived from a large number of centres.
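Table 2 reports each centre's AUC with a 95% CI whose limits are not symmetric around the point estimate (for example, 0.937 with 0.915–0.954 for all patients with CA125). The sketch below shows one standard way to obtain such an interval: a Mann–Whitney AUC, a Hanley and McNeil (1982) standard error, and an interval built on the logit scale and back-transformed. This section of the paper does not state how its intervals were computed, so the variance formula is our assumption and the sketch will not necessarily reproduce the published limits; the function names are hypothetical.

```python
import math


def auc_mann_whitney(malignant, benign):
    """Probability that a malignant mass receives a higher predicted risk
    than a benign mass; tied pairs count one half."""
    wins = sum(1.0 if m > b else 0.5 if m == b else 0.0
               for m in malignant for b in benign)
    return wins / (len(malignant) * len(benign))


def hanley_mcneil_se(a, n_pos, n_neg):
    """Standard error of an AUC a from the Hanley and McNeil (1982) formula,
    with n_pos malignant and n_neg benign cases (assumed method)."""
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    var = (a * (1.0 - a) + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(var)


def logit_ci(a, se, z=1.96):
    """Confidence interval formed on the logit scale and mapped back, so it
    stays inside (0, 1) and is asymmetric around a. Requires 0 < a < 1."""
    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    centre = math.log(a / (1.0 - a))
    se_logit = se / (a * (1.0 - a))  # delta method: d logit(a)/da = 1/(a(1-a))
    return expit(centre - z * se_logit), expit(centre + z * se_logit)


if __name__ == "__main__":
    # Toy predicted risks of malignancy; not data from this study.
    malignant = [0.9, 0.8, 0.4]
    benign = [0.1, 0.3, 0.5, 0.2]
    a = auc_mann_whitney(malignant, benign)
    lo, hi = logit_ci(a, hanley_mcneil_se(a, len(malignant), len(benign)))
    print(f"AUC {a:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Because the logistic curve flattens as the AUC approaches 1, a symmetric interval on the logit scale becomes shorter above the estimate than below it. That is the same pattern seen in Table 2, and it keeps the upper limit below 1 even for a high AUC such as GNH's 0.990.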
In our study, the classification of the ultrasound examiners' level of experience (level II) was based on the recommendations published by the European Federation of Societies for Ultrasound in Medicine and Biology (Education and Practical Standards Committee, European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB), 2006) and by the Royal College of Radiologists (The Royal College of Radiologists (RCR) Board of the Faculty of Clinical Radiology, 2012). As guidance, a level III examiner in the United Kingdom equates to a consultant with a special interest in gynaecological ultrasonography (The Royal College of Radiologists (RCR) Board of the Faculty of Clinical Radiology, 2012). We acknowledge that this approach has limitations, as some level II examiners may be as competent as examiners with level III experience; the boundaries between these levels can be difficult to distinguish and may overlap (The Royal College of Radiologists (RCR) Board of the Faculty of Clinical Radiology, 2012). Similar to previous findings when the IOTA LR2 model was validated in the hands of level II examiners (Sayasneh et al, 2013b), we found that the AUC for the ADNEX model was slightly higher when the scans were performed by doctors than by sonographers, although the confidence intervals overlapped widely (Table 2).
By characterising the type of malignancy (borderline, primary stage I cancer, primary stage II–IV cancer or secondary metastatic cancer), the ADNEX model offers the possibility of a more personalised diagnosis for women with an ovarian mass. This may enable fertility-preserving surgery in some women, help plan the most appropriate surgical approach (laparoscopy or laparotomy) in others, or direct attention to the primary site of malignancy in the event of metastasis. Although the ADNEX model gives absolute risks, relative risks can be computed to compare an individual patient's risk with the background risk (Van Calster et al, 2015). External validation is a critical step for any diagnostic test before it can be introduced into clinical practice. We have shown that the ADNEX model retains its performance in units whose patient populations differ from those of the original study, and that it performs well in the hands of examiners with different levels of experience and background training. Our findings suggest that the ADNEX model has the potential to improve management decisions in daily clinical practice for women with adnexal tumours.
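The relative risks mentioned above can be thought of as each predicted class risk divided by a background risk for the same class. The sketch below assumes exactly that definition; `relative_risks` is a hypothetical helper, and the numbers in the example are placeholders, neither ADNEX output nor the background risks given in the IOTA guidance (Van Calster et al, 2015).

```python
# Hypothetical helper: relative risk = predicted risk / background risk, per class.
ADNEX_CLASSES = ("benign", "borderline", "stage I", "stage II-IV", "metastatic")


def relative_risks(predicted: dict[str, float],
                   background: dict[str, float]) -> dict[str, float]:
    """Return predicted[c] / background[c] for every ADNEX outcome class c."""
    for c in ADNEX_CLASSES:
        if c not in predicted or c not in background:
            raise KeyError(f"missing risk for class {c!r}")
        if background[c] <= 0:
            raise ValueError(f"background risk for {c!r} must be positive")
    return {c: predicted[c] / background[c] for c in ADNEX_CLASSES}


if __name__ == "__main__":
    # Placeholder risks for illustration only.
    patient = {"benign": 0.60, "borderline": 0.10, "stage I": 0.10,
               "stage II-IV": 0.15, "metastatic": 0.05}
    background = {"benign": 0.70, "borderline": 0.05, "stage I": 0.08,
                  "stage II-IV": 0.12, "metastatic": 0.05}
    for cls, rr in relative_risks(patient, background).items():
        print(f"{cls:12s} relative risk {rr:.2f}")
```

A relative risk above 1 flags a class that is more likely for this patient than in the background population, even when its absolute risk is small, which is the kind of comparison the relative-risk output is meant to support.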
ACKNOWLEDGEMENTS
TB is supported by the National Institute for Health Research
(NIHR) Biomedical Research Centre based at Imperial College
Healthcare NHS Trust and Imperial College London. The views
expressed are those of the author(s) and not necessarily those of
the NHS, the NIHR or the Department of Health. DT is a Senior
Clinical Investigator of the Research Foundation-Flanders
(Belgium) (FWO). Research was supported by FWO Grants
G049312N and G0B4716N and by Internal Funds KU Leuven
Grant C24/15/037.
CONFLICT OF INTEREST
TB reports that clinical research in his department (QCCH,
Imperial College London Healthcare NHS Trust) is supported by
Samsung Medison and Roche Diagnostics. The remaining authors
declare no conflict of interest.
REFERENCES
Bristow RE, Chang J, Ziogas A, Anton-Culver H (2013) Adherence to
treatment guidelines for ovarian cancer as a measure of quality care. Obstet
Gynecol 121(6): 1226–1234.
Bristow RE, Chang J, Ziogas A, Randall LM, Anton-Culver H (2014) High-
volume ovarian cancer care: survival impact and disparities in access for
advanced-stage disease. Gynecol Oncol 132(2): 403–410.
Buys SS, Partridge E, Black A, Johnson CC, Lamerato L, Isaacs C, Reding DJ,
Greenlee RT, Yokochi LA, Kessel B, Crawford ED, Church TR,
Andriole GL, Weissfeld JL, Fouad MN, Chia D, O’Brien B, Ragard LR,
Clapp JD, Rathmell JM, Riley TL, Hartge P, Pinsky PF, Zhu CS,
Izmirlian G, Kramer BS, Miller AB, Xu JL, Prorok PC, Gohagan JK,
Berg CD. PLCO Project Team (2011) Effect of screening on ovarian cancer
mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer
Screening Randomized Controlled Trial. JAMA 305(22): 2295–2303.
Collins GS, Reitsma JB, Altman DG, Moons KG (2015) Transparent reporting
of a multivariable prediction model for individual prognosis or diagnosis
(TRIPOD): the TRIPOD statement. BJOG 122(3): 434–443.
Darai E, Fauvet R, Uzan C, Gouy S, Duvillard P, Morice P (2013) Fertility
and borderline ovarian tumor: a systematic review of conservative
management, risk of recurrence and alternative options. Hum Reprod
Update 19(2): 151–166.
Davelaar EM, van Kamp GJ, Verstraeten RA, Kenemans P (1998) Comparison
of seven immunoassays for the quantification of CA 125 antigen in serum.
Clin Chem 44(7): 1417–1422.
Department of Health (1997) The Caldicott Committee Report on the Review
of Patient-Identifiable Information. Department of Health: Great Britain.
Education and Practical Standards Committee, European Federation of
Societies for Ultrasound in Medicine and Biology (EFSUMB) (2006)
Minimum training recommendations for the practice of medical
ultrasound. Ultraschall Med 27(1): 79–105.
Heintz AP, Odicino F, Maisonneuve P, Quinn MA, Benedet JL, Creasman WT,
Ngan HY, Pecorelli S, Beller U (2006) Carcinoma of the ovary. FIGO 26th
Annual Report on the Results of Treatment in Gynecological Cancer. Int J
Gynaecol Obstet 95(Suppl 1): S161–S192.
Hennessy BT, Coleman RL, Markman M (2009) Ovarian cancer. Lancet
374(9698): 1371–1382.
Howlader N, Noone AM, Krapcho M, Garshell J, Miller D, Altekruse SF,
Kosary CL, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS,
Feuer EJ, Cronin KA (2015) SEER Cancer Statistics Review, 1975-2012.
Vol. 2015. National Cancer Institute: Bethesda, MD.
Jacobs IJ, Menon U, Ryan A, Gentry-Maharaj A, Burnell M, Kalsi JK, Amso
NN, Apostolidou S, Benjamin E, Cruickshank D, Crump DN, Davies SK,
Dawnay A, Dobbs S, Fletcher G, Ford J, Godfrey K, Gunu R, Habib M,
Hallett R, Herod J, Jenkins H, Karpinskyj C, Leeson S, Lewis SJ, Liston
WR, Lopes A, Mould T, Murdoch J, Oram D, Rabideau DJ, Reynolds K,
Scott I, Seif MW, Sharma A, Singh N, Taylor J, Warburton F,
Widschwendter M, Williamson K, Woolas R, Fallowfield L, McGuire AJ,
Campbell S, Parmar M, Skates SJ (2015) Ovarian cancer screening
and mortality in the UK Collaborative Trial of Ovarian Cancer
Screening (UKCTOCS): a randomised controlled trial. Lancet 387:
945–956.
Kobayashi H, Yamada Y, Sado T, Sakata M, Yoshida S, Kawaguchi R,
Kanayama S, Shigetomi H, Haruta S, Tsuji Y, Ueda S, Kitanaka T (2008)
A randomized study of screening for ovarian cancer: a multicenter study
in Japan. Int J Gynecol Cancer 18(3): 414–420.
Menon U, Ryan A, Kalsi J, Gentry-Maharaj A, Dawnay A, Habib M,
Apostolidou S, Singh N, Benjamin E, Burnell M, Davies S, Sharma A,
Gunu R, Godfrey K, Lopes A, Oram D, Herod J, Williamson K, Seif MW,
Jenkins H, Mould T, Woolas R, Murdoch JB, Dobbs S, Amso NN,
Leeson S, Cruickshank D, Scott I, Fallowfield L, Widschwendter M,
Reynolds K, McGuire A, Campbell S, Parmar M, Skates SJ, Jacobs I (2015)
Risk algorithm using serial biomarker measurements doubles the number
of screen-detected cancers compared with a single-threshold rule in the
United Kingdom Collaborative Trial of Ovarian Cancer Screening. J Clin
Oncol 33(18): 2062–2071.
Musoro JZ, Zwinderman AH, Puhan MA, Ter Riet G, Geskus RB (2014)
Validation of prediction models based on lasso regression with multiply
imputed data. BMC Med Res Methodol 14(1): 116.
Newcombe RG (1998) Two-sided confidence intervals for the single
proportion: comparison of seven methods. Stat Med 17(8): 857–872.
Rubin DB (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley
& Sons, Inc.: Hoboken, NJ, USA.
Sayasneh A, Kaijser J, Preisler J, Johnson S, Stalder C, Husicka R, Guha S,
Naji O, Abdallah Y, Raslan F, Drought A, Smith AA, Fotopoulou C,
Ghaem-Maghami S, Van Calster B, Timmerman D, Bourne T (2013a)
A multicenter prospective external validation of the diagnostic
performance of IOTA simple descriptors and rules to characterize ovarian
masses. Gynecol Oncol 130(1): 140–146.
Sayasneh A, Wynants L, Preisler J, Kaijser J, Johnson S, Stalder C, Husicka R,
Abdallah Y, Raslan F, Drought A, Smith AA, Ghaem-Maghami S, Epstein E,
Van Calster B, Timmerman D, Bourne T (2013b) Multicentre external
validation of IOTA prediction models and RMI by operators with varied
training. Br J Cancer 108(12): 2448–2454.
Tavassoli FA, Devilee P. International Agency for Research on Cancer (2003)
Pathology and Genetics of Tumours of the Breast and Female Genital
Organs. International Agency for Research on Cancer: Lyon.
The Royal College of Radiologists (RCR) Board of the Faculty of Clinical
Radiology (2012) Ultrasound Training Recommendations for Medical and
Surgical Specialties. London. Available at
https://www.rcr.ac.uk/sites/default/files/publication/BFCR(12)17_ultrasound_training.pdf
(last accessed June 2015).
Timmerman D, Ameye L, Fischerova D, Epstein E, Melis GB, Guerriero S,
Van Holsbeke C, Savelli L, Fruscio R, Lissoni AA, Testa AC, Veldman J,
Vergote I, Van Huffel S, Bourne T, Valentin L (2010a) Simple ultrasound
rules to distinguish between benign and malignant adnexal masses before
surgery: prospective validation by IOTA group. BMJ 341: c6839.
Timmerman D, Testa A, Bourne T, Ferrazzi E, Ameye L, Konstantinovic M,
Van Calster B, Collins W, Vergote I, Van Huffel S, Valentin L (2005)
A logistic regression model to distinguish between the benign and
malignant adnexal mass before surgery: a multicenter study by the
International Ovarian Tumor Analysis (IOTA) group. J Clin Oncol 23:
8794–8801.
Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I.
International Ovarian Tumor Analysis (IOTA) Group (2000) Terms,
definitions and measurements to describe the sonographic features of
adnexal tumors: a consensus opinion from the International Ovarian
Tumor Analysis (IOTA) Group. Ultrasound Obstet Gynecol 16(5):
500–505.
Timmerman D, Van Calster B, Testa AC, Guerriero S, Fischerova D,
Lissoni AA, Van Holsbeke C, Fruscio R, Czekierdowski A, Jurkovic D,
Savelli L, Vergote I, Bourne T, Van Huffel S, Valentin L (2010b) Ovarian
cancer prediction in adnexal masses using ultrasound-based logistic
regression models: a temporal and external validation study by the IOTA
group. Ultrasound Obstet Gynecol 36(2): 226–234.
Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina M, Steyerberg
EW (2016) A calibration hierarchy for risk models was defined: from
utopia to empirical data. J Clin Epidemiol 74: 167–176.
Van Calster B, Van Belle V, Vergouwe Y, Timmerman D, Van Huffel S,
Steyerberg EW (2012a) Extending the c-statistic to nominal polytomous
outcomes: the polytomous discrimination index. Stat Med 31(23):
2610–2626.
Van Calster B, Van Hoorde K, Froyman W, Kaijser J, Wynants L, Landolfo C,
Anthoulakis C, Vergote I, Bourne T, Timmerman D (2015) Practical
guidance for applying the ADNEX model from the IOTA group to
discriminate between different subtypes of adnexal tumors. Facts Views
Vis ObGyn 7(1): 32–41.
Van Calster B, Van Hoorde K, Valentin L, Testa AC, Fischerova D,
Van Holsbeke C, Savelli L, Franchi D, Epstein E, Kaijser J, Van Belle V,
Czekierdowski A, Guerriero S, Fruscio R, Lanzani C, Scala F, Bourne T,
Timmerman D. International Ovarian Tumour Analysis Group (2014)
Evaluating the risk of ovarian cancer before surgery using the ADNEX
model to differentiate between benign, borderline, early and advanced
stage invasive, and secondary metastatic tumours: prospective multicentre
diagnostic study. BMJ 349: g5920.
Van Calster B, Vergouwe Y, Looman CW, Van Belle V, Timmerman D,
Steyerberg EW (2012b) Assessing the discriminative ability of risk
models for more than two outcome categories. Eur J Epidemiol 27(10):
761–770.
Van Holsbeke C, Van Calster B, Bourne T, Ajossa S, Testa AC, Guerriero S,
Fruscio R, Lissoni AA, Czekierdowski A, Savelli L, Van Huffel S,
Valentin L, Timmerman D (2012) External validation of diagnostic
models to estimate the risk of malignancy in adnexal masses. Clin Cancer
Res 18(3): 815–825.
This work is published under the standard license to publish agreement. After 12 months the work will become freely available and the license terms will switch to a Creative Commons Attribution-NonCommercial-Share Alike 4.0 Unported License.
Supplementary Information accompanies this paper on British Journal of Cancer website (http://www.nature.com/bjc)
... We did not collect information on how sex and ethnicity were defined in the study. Recruitment was conducted by research nurses and delivered through the National Collaborative Research Network (appendix pp [31][32][33]. ...
... One previous study 31 investigated the performance of IOTA ADNEX in three hospitals with non-specialist sonographers, and showed that the performance of the ADNEX model was retained on external validation when conducted by ultrasound examiners with varied training and experience; however, the two participating hospitals based in the UK had previously participated in IOTA studies and were led by principal investigators with international reputations for excellence in ultrasound, and one principal investigator was an IOTA founding member. Thus, sonographers in both departments might have had access to specialist expertise not available in many NHS hospitals. ...
... Thus, sonographers in both departments might have had access to specialist expertise not available in many NHS hospitals. 31 Histology types and surgical outcomes from premenopausal and postmenopausal patients with ovarian cancer within ROCkeTS have been described previously, 30 and show that most participants diagnosed through symptom-triggered testing had high cytoreduction rates and a low to moderate spread of cancer. 25% of patients with high-grade serous ovarian cancer were diagnosed at stage I or stage II, reinforcing the importance of an accurate diagnosis in patients with non-specific symptoms. ...
... We did not collect information on how sex and ethnicity were defined in the study. Recruitment was conducted by research nurses and delivered through the National Collaborative Research Network (appendix pp [31][32][33]. ...
... One previous study 31 investigated the performance of IOTA ADNEX in three hospitals with non-specialist sonographers, and showed that the performance of the ADNEX model was retained on external validation when conducted by ultrasound examiners with varied training and experience; however, the two participating hospitals based in the UK had previously participated in IOTA studies and were led by principal investigators with international reputations for excellence in ultrasound, and one principal investigator was an IOTA founding member. Thus, sonographers in both departments might have had access to specialist expertise not available in many NHS hospitals. ...
... Thus, sonographers in both departments might have had access to specialist expertise not available in many NHS hospitals. 31 Histology types and surgical outcomes from premenopausal and postmenopausal patients with ovarian cancer within ROCkeTS have been described previously, 30 and show that most participants diagnosed through symptom-triggered testing had high cytoreduction rates and a low to moderate spread of cancer. 25% of patients with high-grade serous ovarian cancer were diagnosed at stage I or stage II, reinforcing the importance of an accurate diagnosis in patients with non-specific symptoms. ...
... 2 Among them, ADNEX and SRRisk best distinguish between benign and malignant tumors. 2 The ADNEX model has been investigated in 10 countries, mainly in Europe. Although the ADNEX model was constructed by experts, several studies have reported that the ADNEX model can be used by trainees [3][4][5][6] ; however, its practical application remains debatable. Furthermore, the comparison of its diagnostic accuracy among trainees and experts, and improvement strategies remain to be discussed. ...
... While several studies have validated the efficacy of the ADNEX model, [3][4][5][6][7][8] its adaptation to the Japanese population may prove more suitable, owing to the distinct composition of ovarian carcinomas in Japan, with serous, clear cell, endometrioid, and mucinous carcinomas accounting for 42.1%, 20.9%, 15.7%, and 7.3% of cases, respectively, 1 differing from FIGO (The International Federation of Gynecology and Obstetrics) global data, where high-grade serous, endometrioid, clear-cell, mucinous, and low-grade serous carcinomas account for 70%, 10%, 10%, 3%, and <5%, respectively. 9 The proportion of ovarian clear cell carcinomas in Japan is significantly higher. ...
Article
Full-text available
Objectives This study aimed to validate the diagnostic accuracy of the International Ovarian Tumor Analysis (IOTA) Assessment of Different NEoplasias in the adneXa (ADNEX) model in Japanese women, population with a distinct adnexal mass distribution compared with European women, and to evaluate the model's utility by gynecology trainees and ultrasound specialists. Methods This single‐center, retrospective study analyzed ultrasound data from January 2017 to March 2020 of 206 women with adnexal masses. Patients who underwent ultrasonography and serum CA‐125 measurement and received postsurgery histological diagnosis were included. The ADNEX model's diagnostic performance was evaluated by two trainees and two specialists using the area under the receiver operating characteristic curve (AUC) and measures of accuracy, sensitivity, specificity, and predictive values for overall performance and each examiner. Results Of the 206 included Japanese women, the prevalence of malignancy was 30.1%, including borderline cases. The overall AUC for distinguishing malignancy was 0.848 (95% confidence interval [CI]: 0.817–0.880). The AUC for each examiner ranged from 0.791 to 0.898, with Specialist 2 showing the highest accuracy and sensitivity varying between 0.677 and 0.839. A moderate degree of agreement was noted among the four examiners (Fleiss' kappa was 0.586). The performance of trainees and specialists differed significantly in evaluating the solid tissue and the papillary projections in both malignant and benign groups ( P < .001). Conclusions The IOTA ADNEX model effectively differentiates benign and malignant adnexal masses in Japanese women. Although the accuracy matched up moderately among the four examiners, better accuracy is expected with training in evaluating solid tissue and papillary projections.
... Adnexal masses are frequently diagnosed on pelvic ultrasound in symptomatic women, but also as an incidental finding on other imaging modalities such as CT, undertaken for a different purpose. Many adnexal lesions are benign and there are well-established diagnostic algorithms using ultrasound to categorise findings and stratify the risk of malignancy [2][3][4]. ...
... On external validation, the ability of the ADNEX model to discriminate between benign and malignant masses has been shown to be very good and the model has been shown to be well-calibrated (i.e. the calculated risk of malignancy agrees well with the observed prevalence of malignancy) 5 . The ADNEX model maintains its diagnostic performance in the hands of operators with different experience and training 6 . ...
Article
Full-text available
Objective Our primary aim was to identify radiomic ultrasound features that can distinguish benign from malignant adnexal masses with solid ultrasound morphology, and primary invasive from metastatic solid ovarian masses, and to develop ultrasound‐based machine learning models that include radiomics features to discriminate between benign and malignant solid adnexal masses. Our secondary aim was to compare the diagnostic performance of our radiomics models with that of the ADNEX model and subjective assessment by an experienced ultrasound examiner. Methods This is a retrospective observational single center study. Patients with a histological diagnosis of an adnexal tumor with solid morphology at preoperative ultrasound examination performed between 2014 and 2021 were included. The patient cohort was split into training and validation sets with a ratio of 70:30 and with the same proportion of benign and malignant (borderline, primary invasive and metastatic) tumors in the two subsets. The extracted radiomic features belonged to two different families: intensity‐based statistical features and textural features. Models to predict malignancy were built based on a random forest classifier, fine‐tuned using 5‐fold cross‐validation over the training set, and tested on the held‐out validation set. The variables used in model building were patient′s age, and those radiomic features that were statistically significantly different between benign and malignant adnexal masses (Wilcoxon‐Mann‐Whitney Test with Benjamini‐Hochberg correction for multiple comparisons) and assessed as not redundant based on the Pearson correlation coefficient. We describe discriminative ability as area under the receiver operating characteristics curve (AUC) and classification performance as sensitivity and specificity. Results 326 patients were identified and 775 preoperative ultrasound images were analyzed. 
68 radiomic features were extracted, 52 differed statistically significantly between benign and malignant tumors in the training set, and 18 features were selected for inclusion in model building. The same 52 radiomic features differed statistically significantly between benign, primary invasive malignant and metastatic tumors. However, the values of the features manifested overlap between primary malignant and metastatic tumors and did not differ statistically significantly between them. In the validation set, 25/98 tumors (25.5%) were benign, 73/98 (74.5%) were malignant (6 borderline, 57 primary invasive, 10 metastases). In the validation set, a model including only radiomics features had an AUC of 0.80, and 78% sensitivity and 76% specificity at its optimal risk of malignancy cutoff (68% based on Youden′s index). The corresponding results for a model including age and radiomics features were 0.79, 86% and 56% (cutoff 60% based on Youden′s method), while those of the ADNEX model were 0.88, 99% and 64% (at 20% malignancy cutoff). Subjective assessment had sensitivity 99% and specificity 72%. Conclusions Even though our radiomics models had discriminative ability inferior to that of the ADNEX model, our results are promising enough to justify continued development of radiomics analysis of ultrasound images of adnexal masses. This article is protected by copyright. All rights reserved.
Article
OBJECTIVE To compare the performance of four commonly used algorithms to differentiate benign from malignant adnexal masses when used by a novice operator. METHODS Women with adnexal masses treated at Mayo Clinic, Rochester, Minnesota, in 2019 were identified retrospectively. Patients were included if they underwent surgery within 3 months of diagnosis or had at least 10 months of follow-up. A nonexpert operator (European Federation of Societies for Ultrasound in Medicine and Biology level I) classified each lesion using ADNEX (Assessment of Different Neoplasias in the Adnexa), two-step strategy (benign descriptors followed by ADNEX), O-RADS (Ovarian-Adnexal Reporting and Data System) 2019, and O-RADS 2022. The primary outcome measure was the area under the receiver operating characteristic curve (AUC) compared across the four algorithms. RESULTS A total of 556 women were included in the analyses: 452 with benign and 104 with malignant masses. The AUCs of ADNEX, the two-step strategy, O-RADS 2019, and O-RADS 2022 were 0.90 (95% CI, 0.87–0.94), 0.91 (95% CI,0.88–0.94), 0.88 (95% CI,0.84–0.91), and 0.88 95% CI, (0.84–0.91), respectively. The two-step strategy performed significantly better than the O-RADS algorithms ( P =.005 and P =.002). With all the algorithms, the observed malignancy rate was 1.9–2.2% among lesions categorized as almost certainly benign, twofold higher than the expected less than 1.0%. Lesions wrongly classified as almost certainly benign were borderline tumors (n=4) and metastases (n=3). CONCLUSION In the hands of a novice operator, all algorithms performed well and were able to distinguish benign from malignant lesions. Although the two-step strategy performed slightly better than the O-RADSs, the difference did not appear to be clinically meaningful. The malignancy rate among lesions classified as almost certainly benign was unexpectedly high at 1.9–2.3%, approximately double the expected rate of less than 1.0%.
Article
Objectives The primary aim was the validation of benign descriptors (BDs), followed by Assessment of Different NEoplasia's of the adneXa (ADNEX) (when BDs cannot be applied), in a two‐step strategy to classify adnexal masses in pregnancy. The secondary aim was to describe the natural history of adnexal masses in pregnancy. Methods Retrospective analysis of prospectively collected data of women with an adnexal mass on ultrasonography identified during pregnancy between 2017 and 2022. The study was conducted at Queen Charlotte's and Chelsea Hospital, UK. Relevant clinical and ultrasound data were extracted from the medical records and ultrasound software astraia. Adnexal masses were classified and managed according to expert subjective assessment (SA). Ultrasound features were recorded prospectively at the time of ultrasound examination. Borderline ovarian tumours (BOT) were classified as malignant. Benign Descriptors (BDs) were applied to classify adnexal masses, in cases where BDs were not applicable, the ADNEX model (using a risk of malignancy of > 10%) was used, in a two‐step strategy. The two‐step strategy was applied retrospectively. The reference standard used was histology (where available) or expert SA at the postnatal ultrasound scan. Results 291 women with a median age of 33 (IQR 29‐36) years presented with an adnexal mass in pregnancy, at a median gestation of 12 (IQR 8‐17) weeks. 267 (267/291, 91.8%) women were followed up to the postnatal period, as 24 women (24/291, 8.2%) were lost to follow up. Based on the reference standard, 4.1% of adnexal masses (11/267) were classified as malignant (all BOTs) and 95.9% (256/267) as benign (41 on histology and 215 based on expert SA at postnatal ultrasound). BDs could be applied to 68.9% of adnexal masses (184/267); of these only one mass (BOT) was misclassified as benign (1/184, 0.5%). 
ADNEX was used to classify the residual masses (83/267) and misclassified three BOTs as benign (3/10, 30.0%) and 25 benign masses (based on reference standard) as malignant (25/73, 34.2%), 13 (13/25, 52.0%) of these were classified as decidualised endometriomas on expert SA, with confirmed resolution of decidualisation in the postnatal period. The two‐step strategy had a specificity of 90.2%, sensitivity of 63.6%, negative predictive value of 98.3% and positive predictive value of 21.9%. 56 (56/267, 21.0%) women had surgical intervention, four as an emergency during pregnancy (4/267, 1.5%,) and four (4/267, 1.5%) electively during caesarean section. 48 (48/267, 18.0%) women had surgical intervention in the post‐natal period, 11 (11/267, 4.1%) in the first 12 weeks postnatal and 37 >12 weeks (37/267, 13.9%) postnatal. 64 (64/267, 24.0%) adnexal masses resolved spontaneously during follow up. Cyst‐related complications occurred in four women (4/267, 1.5%) during pregnancy (ovarian torsion n=2, cyst rupture n=2) and six (6/267, 2.2%) in the postnatal period (all ovarian torsion). 196 (196/267, 73.4%) had a persistent adnexal mass, including one of the women who had an ovarian torsion and underwent de‐torsion and had a persistent adnexal mass at postnatal ultrasound. Presumed decidualisation occurred in 31.1% (19/61) of endometriomas and had resolved in 89.5% (17/19) by the first postnatal ultrasound scan. Conclusion We found Benign Descriptors apply to most masses in pregnancy, however the small number of malignant tumours in the cohort (4.1%) restricted the evaluation of the ADNEX model, so expert subjective assessment should be used to classify adnexal masses in pregnancy, when BDs do not apply. A larger multicentre prospective study is required to evaluate the use of the ADNEX model to classify adnexal masses in pregnancy. 
Our data suggests that most adnexal masses can be managed expectantly during pregnancy given a large proportion of masses spontaneously resolved and the low risk of complications. This article is protected by copyright. All rights reserved.
Preprint
Full-text available
Background Ovarian cancer is the most lethal gynecologic malignancy and every attempt should be made to develop screening programs to detect it at its early stages in order to improve survival rate. Using the ADNEX model in screening for ovarian cancer will help in triaging patients with adnexal masses before undergoing surgery which will help in optimizing outcomes particularly for those with ovarian malignancy. Patients & methods: This was a prospective study which included fifty postmenopausal patients with adnexal mass. All the included patients underwent ultrasound assessment of the adnexal mass and measurement of CA 125 level. Then, the data were collected to calculate the RMI, and integrated to IOTA ADNEX calculator. The primary outcome was determining the predictive accuracy of both RMI and ADNEX model for differentiating between benign and malignant ovarian tumors by setting both against the gold standard histopathology Results Out of the included 50 patients, 56% had benign ovarian lesions, 12% had borderline ovarian tumors, and 24% had malignant ovarian tumors. The Area under the receiver operating characteristic curve (AUC) for the RMI was 0.799 and with cutoff value of 115, the sensitivity was 81.8%, the specificity was 60.7% while the AUC was 0.864 for the ADNEX model and at 10% cutoff, the sensitivity was 91.1% and the specificity was 65%. Performance of the ADNEX for the five tumor types was highest when benign histopathology was compared as stage Ⅱ - Ⅳ malignant cases with AUC of 0.823. Conclusion ADNEX model is more sensitive than RMI for differentiating between benign and malignant tumors and it can be used as screening test. However, the application of ADNEX model needs significant experience in ultrasound evaluation of adnexal masses before it can be an integral part in the screening pathway of ovarian malignancy in postmenopausal patients with adnexal masses. 
Clinicaltrials.gov ID: NCT05755841 – Data of registration: 3/30/2024 “retrospectively registered”
Article
Full-text available
Background: Ovarian cancer has a poor prognosis, with just 40% of patients surviving 5 years. We designed this trial to establish the effect of early detection by screening on ovarian cancer mortality. Methods: In this randomised controlled trial, we recruited postmenopausal women aged 50-74 years from 13 centres in National Health Service Trusts in England, Wales, and Northern Ireland. Exclusion criteria were previous bilateral oophorectomy or ovarian malignancy, increased risk of familial ovarian cancer, and active non-ovarian malignancy. The trial management system confirmed eligibility and randomly allocated participants in blocks of 32 using computer-generated random numbers to annual multimodal screening (MMS) with serum CA125 interpreted with use of the risk of ovarian cancer algorithm, annual transvaginal ultrasound screening (USS), or no screening, in a 1:1:2 ratio. The primary outcome was death due to ovarian cancer by Dec 31, 2014, comparing MMS and USS separately with no screening, ascertained by an outcomes committee masked to randomisation group. All analyses were by modified intention to screen, excluding the small number of women we discovered after randomisation to have a bilateral oophorectomy, have ovarian cancer, or had exited the registry before recruitment. Investigators and participants were aware of screening type. This trial is registered with ClinicalTrials.gov, number NCT00058032. Findings: Between June 1, 2001, and Oct 21, 2005, we randomly allocated 202 638 women: 50 640 (25·0%) to MMS, 50 639 (25·0%) to USS, and 101 359 (50·0%) to no screening. 202 546 (>99·9%) women were eligible for analysis: 50 624 (>99·9%) women in the MMS group, 50 623 (>99·9%) in the USS group, and 101 299 (>99·9%) in the no screening group. Screening ended on Dec 31, 2011, and included 345 570 MMS and 327 775 USS annual screening episodes. 
At a median follow-up of 11·1 years (IQR 10·0-12·0), we diagnosed ovarian cancer in 1282 (0·6%) women: 338 (0·7%) in the MMS group, 314 (0·6%) in the USS group, and 630 (0·6%) in the no screening group. Of these women, 148 (0·29%) women in the MMS group, 154 (0·30%) in the USS group, and 347 (0·34%) in the no screening group had died of ovarian cancer. The primary analysis using a Cox proportional hazards model gave a mortality reduction over years 0-14 of 15% (95% CI -3 to 30; p=0·10) with MMS and 11% (-7 to 27; p=0·21) with USS. The Royston-Parmar flexible parametric model showed that in the MMS group, this mortality effect was made up of 8% (-20 to 31) in years 0-7 and 23% (1-46) in years 7-14, and in the USS group, of 2% (-27 to 26) in years 0-7 and 21% (-2 to 42) in years 7-14. A prespecified analysis of death from ovarian cancer of MMS versus no screening with exclusion of prevalent cases showed significantly different death rates (p=0·021), with an overall average mortality reduction of 20% (-2 to 40) and a reduction of 8% (-27 to 43) in years 0-7 and 28% (-3 to 49) in years 7-14 in favour of MMS. Interpretation: Although the mortality reduction was not significant in the primary analysis, we noted a significant mortality reduction with MMS when prevalent cases were excluded. We noted encouraging evidence of a mortality reduction in years 7-14, but further follow-up is needed before firm conclusions can be reached on the efficacy and cost-effectiveness of ovarian cancer screening. Funding: Medical Research Council, Cancer Research UK, Department of Health, The Eve Appeal.
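The allocation scheme described above (blocks of 32, a 1:1:2 ratio, computer-generated random numbers) can be illustrated with a minimal permuted-block sketch. This is not the trial's actual management system. The function names and the fixed seed are hypothetical, and the sketch assumes each block is shuffled uniformly at random.

```python
import random
from collections import Counter

# Arms and allocation weights (1:1:2), as stated in the abstract.
ARMS = {"MMS": 1, "USS": 1, "no screening": 2}
BLOCK_SIZE = 32


def make_block(rng: random.Random) -> list[str]:
    """Return one shuffled block of 32 allocations in a 1:1:2 ratio."""
    total_weight = sum(ARMS.values())
    if BLOCK_SIZE % total_weight != 0:
        raise ValueError("block size must be a multiple of the ratio total")
    per_unit = BLOCK_SIZE // total_weight  # 32 // 4 = 8
    # 8 MMS, 8 USS and 16 no-screening slots per block.
    block = [arm for arm, w in ARMS.items() for _ in range(w * per_unit)]
    rng.shuffle(block)  # random order within the block
    return block


def allocate(n_participants: int, seed: int = 0) -> list[str]:
    """Allocate participants in order, starting a new block when one runs out."""
    rng = random.Random(seed)
    allocations: list[str] = []
    while len(allocations) < n_participants:
        allocations.extend(make_block(rng))
    return allocations[:n_participants]


if __name__ == "__main__":
    print(Counter(allocate(64)))  # two complete blocks
```

Because every complete block contains exactly 8, 8 and 16 allocations, the 1:1:2 ratio is preserved after each block of 32 while the order within a block stays unpredictable.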
Article
Cancer screening strategies have commonly adopted single-biomarker thresholds to identify abnormality. We investigated the impact of serial biomarker change interpreted through a risk algorithm on cancer detection rates. In the United Kingdom Collaborative Trial of Ovarian Cancer Screening, 46,237 women aged 50 years or older underwent incidence screening by using the multimodal strategy (MMS) in which annual serum cancer antigen 125 (CA-125) was interpreted with the risk of ovarian cancer algorithm (ROCA). Women were triaged by the ROCA: normal risk, returned to annual screening; intermediate risk, repeat CA-125; and elevated risk, repeat CA-125 and transvaginal ultrasound. Women with persistently increased risk were clinically evaluated. All participants were followed through national cancer and/or death registries. Performance characteristics of a single-threshold rule and the ROCA were compared by using receiver operating characteristic curves. After 296,911 women-years of annual incidence screening, 640 women underwent surgery. Of those, 133 had primary invasive epithelial ovarian or tubal cancers (iEOCs). In all, 22 interval iEOCs occurred within 1 year of screening, of which one was detected by ROCA but was managed conservatively after clinical assessment. The sensitivity and specificity of MMS for detection of iEOCs were 85.8% (95% CI, 79.3% to 90.9%) and 99.8% (95% CI, 99.8% to 99.8%), respectively, with 4.8 surgeries per iEOC. ROCA alone detected 87.1% (135 of 155) of the iEOCs. Using fixed CA-125 cutoffs at the last annual screen of more than 35, more than 30, and more than 22 U/mL would have identified 41.3% (64 of 155), 48.4% (75 of 155), and 66.5% (103 of 155), respectively. The area under the curve for ROCA (0.915) was significantly (P = .0027) higher than that for a single-threshold rule (0.869). Screening by using ROCA doubled the number of screen-detected iEOCs compared with a fixed cutoff. 
In the context of cancer screening, reliance on predefined single-threshold rules may result in biomarkers of value being discarded. © 2015 by American Society of Clinical Oncology.
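The headline proportions in this abstract follow directly from the reported counts (133 screen-detected plus 22 interval iEOCs, giving 155 in total). The short sketch below only re-derives those ratios from the published numbers; it does not re-run the ROC analysis, and the variable names are illustrative.

```python
# Counts reported in the abstract.
SCREEN_DETECTED = 133             # iEOCs detected by MMS
INTERVAL = 22                     # interval iEOCs within 1 year of screening
SURGERIES = 640                   # women who underwent surgery
TOTAL = SCREEN_DETECTED + INTERVAL  # 155 iEOCs in total


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


sensitivity = pct(SCREEN_DETECTED, TOTAL)                    # MMS sensitivity
surgeries_per_case = round(SURGERIES / SCREEN_DETECTED, 1)   # surgeries per iEOC
roca_alone = pct(135, TOTAL)                                 # ROCA alone

# iEOCs that fixed CA-125 cutoffs (U/mL) would have identified.
cutoff_hits = {35: 64, 30: 75, 22: 103}
fixed_cutoffs = {c: pct(k, TOTAL) for c, k in cutoff_hits.items()}

# ROCA versus the >35 U/mL rule: roughly a doubling of detected cases.
fold_increase = round(135 / 64, 2)

if __name__ == "__main__":
    print(sensitivity, surgeries_per_case, roca_alone, fixed_cutoffs, fold_increase)
```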
Article
All gynecologists are faced with ovarian tumors on a regular basis, and accurate preoperative diagnosis of these masses is important because appropriate management depends on the type of tumor. Recently, the International Ovarian Tumor Analysis (IOTA) consortium published the Assessment of Different NEoplasias in the adneXa (ADNEX) model, the first risk model that differentiates between benign and four types of malignant ovarian tumors: borderline, stage I cancer, stage II-IV cancer, and secondary metastatic cancer. This approach is novel compared with existing tools that only differentiate between benign and malignant tumors, and therefore questions may arise about how ADNEX can be used in clinical practice. In the present paper, we first provide an in-depth discussion of the predictors used in ADNEX and of its ability to predict risk for different tumor histologies. Furthermore, we formulate suggestions about the selection and interpretation of risk cut-offs for patient stratification and choice of appropriate clinical management. This is illustrated with a few example patients. We cannot propose a generally applicable algorithm with fixed cut-offs, because (as with any risk model) this depends on the specific clinical setting in which the model will be used. Nevertheless, this paper provides guidance on how the ADNEX model may be adopted into clinical practice.
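The stratification idea described above can be sketched as follows: a five-category model returns one probability per category, the total risk of malignancy is the sum of the four malignant probabilities (equivalently 1 minus the benign probability), and that total is compared with a cutoff chosen for the local setting. This is a minimal sketch, not the ADNEX implementation. The category labels, the `stratify` helper, the example probabilities and the cutoffs 0.10 and 0.50 are all hypothetical and illustrative.

```python
# Hypothetical labels for the four malignant categories.
MALIGNANT_SUBTYPES = ("borderline", "stage I", "stage II-IV", "metastatic")


def stratify(probs: dict[str, float], cutoff: float) -> tuple[float, str, str]:
    """Summarise five-category predicted risks for one patient.

    Returns the total risk of malignancy, whether it lies at/above or below
    the chosen cutoff, and the most likely malignant subtype.
    """
    if abs(sum(probs.values()) - 1.0) > 1e-6:
        raise ValueError("the five probabilities should sum to 1")
    total_risk = sum(probs[k] for k in MALIGNANT_SUBTYPES)  # = 1 - P(benign)
    flag = "at or above cutoff" if total_risk >= cutoff else "below cutoff"
    most_likely = max(MALIGNANT_SUBTYPES, key=lambda k: probs[k])
    return total_risk, flag, most_likely


if __name__ == "__main__":
    # Hypothetical output for one patient; not taken from any study.
    example = {"benign": 0.70, "borderline": 0.12, "stage I": 0.08,
               "stage II-IV": 0.06, "metastatic": 0.04}
    for cutoff in (0.10, 0.50):  # illustrative cutoffs only
        print(cutoff, stratify(example, cutoff))
```

Reporting the most likely malignant subtype alongside the total risk reflects the point of a multi-category model: two patients with the same total risk may warrant different management if one is mostly "borderline" and the other mostly "stage II-IV".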
Article
Background: Prediction models are developed to aid healthcare providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will occur in the future (prognostic models), to inform their decision-making. However, the overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. Materials and methods: The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) initiative developed a set of recommendations for the reporting of studies developing, validating or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, healthcare professionals and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. Results: The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. Conclusions: To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at www.tripod-statement.org).
Article
In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data. The data were based on a cohort of Chronic Obstructive Pulmonary Disease patients. We constructed models to predict Chronic Respiratory Questionnaire dyspnea 6 months ahead. Optimism of the lasso model was investigated by comparing 4 approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets. In the first 3 approaches, data sets that had been completed via multiple imputation (MI) were resampled, while the fourth approach resampled the incomplete data set and then performed MI. The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure. Resampling the completed data sets underestimates optimism, especially if, within a bootstrap step, selected individuals differ over the imputed data sets. Incorporating the MI procedure in the validation yields estimates of optimism that are closer to the true value, albeit slightly too large. Performance of prognostic models constructed using the lasso technique can be optimistic as well. Results of the internal validation are sensitive to how bootstrap resampling is performed.
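The bootstrap optimism estimate referred to above can be sketched for the simplest case of a complete data set. The sketch fits a lasso by cyclic coordinate descent, then averages, over bootstrap samples, the difference between performance on the bootstrap sample and performance of that bootstrap model on the original data; the optimism-corrected estimate is the apparent performance minus that average. Several assumptions go beyond the abstract: R² is used as the performance measure, the penalty `lam` is held fixed rather than re-tuned in each bootstrap sample, and no multiple imputation is performed (the abstract reports that imputation should be placed inside each bootstrap step when data are incomplete). All function names are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    """Lasso soft-thresholding operator S(z, lam)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def fit_lasso(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += Xc[:, j] * beta[j]  # partial residual without feature j
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * beta[j]
    return y_mean - x_mean @ beta, beta


def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def bootstrap_optimism(X, y, lam, n_boot=100, seed=0):
    """Return (apparent R^2, estimated optimism, optimism-corrected R^2)."""
    rng = np.random.default_rng(seed)
    b0, b = fit_lasso(X, y, lam)
    apparent = r_squared(y, b0 + X @ b)
    n = len(y)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        bb0, bb = fit_lasso(X[idx], y[idx], lam)
        boot_perf = r_squared(y[idx], bb0 + X[idx] @ bb)  # on bootstrap sample
        orig_perf = r_squared(y, bb0 + X @ bb)            # on original data
        diffs.append(boot_perf - orig_perf)
    optimism = float(np.mean(diffs))
    return apparent, optimism, apparent - optimism


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(80, 10))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=80)
    apparent, optimism, corrected = bootstrap_optimism(X, y, lam=0.05, n_boot=50)
    print(f"apparent={apparent:.3f} optimism={optimism:.3f} corrected={corrected:.3f}")
```

Handling incomplete data in the way the abstract recommends would mean resampling the incomplete rows and running the imputation inside the loop, before `fit_lasso` is called.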
Article
Interest in screening for ovarian cancer, which is common in developed countries, has grown in recent years. This study, which seems to be the first prospective randomized report of ovarian cancer screening, was designed to establish a better strategy for detecting early cancers. Asymptomatic postmenopausal women, seen in the years 1985 to 1999, were randomly assigned to an intervention group (n = 41,688) or to a control group (n = 40,799) and were followed up for an average of 9.2 years. The original goal was to offer annual screens comprising pelvic ultrasonography and a serum cancer antigen 125 (CA-125) test to women in the intervention group. Those with abnormal ultrasound findings or elevated CA-125 were referred to a gynecologic oncologist for surgical assessment. In late 2002 when the code was broken, 27 cancers were found in screened women and 8 more were diagnosed outside the screening program. Rates of detecting ovarian cancer were 0.31 per 1000 at the prevalent screen and ranged from 0.38 to 0.74 per 1000 at subsequent screens. Rates increased on successive screens. Ovarian cancer developed in 32 control women. Fewer women in the screened group than in the control group had advanced-stage disease. The proportion of stage I ovarian cancer was higher in screened than in control women (63% vs. 38%), but the difference fell short of statistical significance. The histologic type of index cancers was similar in the screened and control groups, as were tumor grade and the use of adjuvant chemotherapy. More women in the screening group than in the control group had no disease or only microscopic disease, but this difference also was not statistically significant. Continuing follow-up of this cohort is expected to provide further information about the effects of screening for ovarian cancer on asymptomatic postmenopausal women.
Article
Objective: Calibrated risk models are vital for valid decision support. We define four levels of calibration and describe implications for model development and external validation of predictions. Study design and setting: We present results based on simulated datasets. Results: A common definition of calibration is "having an event rate of R % among patients with a predicted risk of R %", which we refer to as 'moderate calibration'. Weaker forms of calibration only require the average predicted risk (mean calibration) or the average prediction effects (weak calibration) to be correct. 'Strong calibration' requires that the event rate equals the predicted risk for every covariate pattern. This implies that the model is fully correct for the validation setting. We argue that this is unrealistic: the model type may be incorrect, at model development the linear predictor is only asymptotically unbiased, and all nonlinear and interaction effects should be correctly modeled. In addition, we prove that moderate calibration guarantees non-harmful decision-making. Finally, results indicate that a flexible assessment of calibration in small validation datasets is problematic. Conclusion: Strong calibration is desirable for individualized decision support, but unrealistic and counter-productive by stimulating the development of overly complex models. Model development and external validation should focus on moderate calibration.
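The two weaker levels of calibration described above can be checked with a few lines of code. In the sketch below, mean calibration is the observed event rate minus the average predicted risk. Weak calibration is operationalised, as is common, through a calibration intercept (logistic regression with logit(p) as an offset) and a calibration slope (logistic regression of the outcome on logit(p)), both fitted by Newton-Raphson. The function names are hypothetical. Moderate calibration would additionally require a flexible calibration curve, which this sketch does not estimate.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p / (1.0 - p))


def mean_calibration(p, y):
    """Observed event rate minus average predicted risk (0 = mean-calibrated)."""
    return float(np.mean(y) - np.mean(p))


def weak_calibration(p, y, n_iter=25):
    """Return (calibration intercept, calibration slope); ideal values are 0 and 1."""
    lp = logit(p)
    # Intercept: y ~ Bernoulli(sigmoid(a + lp)), with lp as a fixed offset.
    a = 0.0
    for _ in range(n_iter):
        mu = sigmoid(a + lp)
        a += np.sum(y - mu) / np.sum(mu * (1 - mu))  # 1-D Newton step
    # Slope: logistic regression of y on [1, lp].
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(hess, grad)  # Newton-Raphson step
    return float(a), float(beta[1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    lp_true = rng.normal(-1.0, 1.5, size=20_000)
    y = rng.binomial(1, sigmoid(lp_true))
    p_good = sigmoid(lp_true)      # correctly specified risks
    p_over = sigmoid(2 * lp_true)  # too extreme: slope should be near 0.5
    print(mean_calibration(p_good, y), weak_calibration(p_good, y))
    print(weak_calibration(p_over, y))
```

A slope below 1 indicates predictions that are too extreme (typical of overfitting), and a slope above 1 indicates predictions that are too moderate.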
Article
Prediction models are developed to aid health care providers in estimating the probability or risk that a specific disease or condition is present (diagnostic models) or that a specific event will occur in the future (prognostic models), to inform their decision making. However, the overwhelming evidence shows that the quality of reporting of prediction model studies is poor. Only with full and clear reporting of information on all aspects of a prediction model can risk of bias and potential usefulness of prediction models be adequately assessed. The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) Initiative developed a set of recommendations for the reporting of studies developing, validating, or updating a prediction model, whether for diagnostic or prognostic purposes. This article describes how the TRIPOD Statement was developed. An extensive list of items based on a review of the literature was created, which was reduced after a Web-based survey and revised during a 3-day meeting in June 2011 with methodologists, health care professionals, and journal editors. The list was refined during several meetings of the steering group and in e-mail discussions with the wider group of TRIPOD contributors. The resulting TRIPOD Statement is a checklist of 22 items, deemed essential for transparent reporting of a prediction model study. The TRIPOD Statement aims to improve the transparency of the reporting of a prediction model study regardless of the study methods used. The TRIPOD Statement is best used in conjunction with the TRIPOD explanation and elaboration document. To aid the editorial process and readers of prediction model studies, it is recommended that authors include a completed checklist in their submission (also available at www.tripod-statement.org). © 2015 Royal College of Obstetricians and Gynaecologists.