Available via license: CC BY 4.0
Madakkatel I, et al. Int J Gynecol Cancer 2024;0:1–9. doi:10.1136/ijgc-2024-005424
Large-scale analysis to identify risk factors for ovarian cancer
Iqbal Madakkatel,1,2 Amanda L Lumsden,1,2 Anwar Mulugeta,1,2,3 Johanna Mäenpää,4 Martin K Oehler,5,6 Elina Hyppönen 1,2
► Additional supplemental material is published online only. To view, please visit the journal online (https://doi.org/10.1136/ijgc-2024-005424).
For numbered affiliations see end of article.
Correspondence to
Professor Elina Hyppönen, Australian Centre for Precision Health, University of South Australia, c/o SAHMRI, GPO BOX 2471, Adelaide, SA 5001, Australia; Elina.Hypponen@unisa.edu.au
IM and ALL contributed equally.
Received 27 February 2024
Accepted 3 June 2024
To cite: Madakkatel I, Lumsden AL, Mulugeta A, et al. Int J Gynecol Cancer Published Online First: [please include Day Month Year]. doi:10.1136/ijgc-2024-005424
Original research
© IGCS and ESGO 2024.
Re-use permitted under CC BY.
Published by BMJ.
INTERNATIONAL JOURNAL OF
GYNECOLOGICAL CANCER
ABSTRACT
Objective Ovarian cancer is characterized by late-stage diagnoses and poor prognosis. We aimed to identify factors that can inform prevention and early detection of ovarian cancer.
Methods We used a data-driven machine learning approach to identify predictors of epithelial ovarian cancer from 2920 input features measured a median of 12.6 years (IQR 11.9 to 13.3 years) before diagnosis. Analyses included 221 732 female participants in the UK Biobank without a history of cancer. During follow-up, 1441 women developed ovarian cancer. For factors that contributed to model prediction, we used multivariate logistic regression to evaluate the association with ovarian cancer, with evidence for causality tested by Mendelian randomization (MR) analyses in the Ovarian Cancer Association Consortium (25 509 cases).
Results Greater parity and ever-use of oral contraception were associated with lower ovarian cancer risk (ever vs never OR 0.74, 95% CI 0.66 to 0.84). After adjustment for established risk factors, greater height, greater weight, and greater red blood cell distribution width were associated with increased ovarian cancer risk, while higher aspartate aminotransferase levels and mean corpuscular volume were associated with lower risk. MR analyses confirmed observational associations with anthropometric/adiposity traits (eg, body fat percentage per standard deviation (SD); inverse-variance weighted OR (OR_IVW) 1.28, 95% CI 1.13 to 1.46) and aspartate aminotransferase (OR_IVW 0.87, 95% CI 0.78 to 0.98). MR also provided genetic evidence for a protective association of higher total serum protein with ovarian cancer, higher lymphocyte count with serous and endometrioid ovarian cancer, and greater forced expiratory volume in 1 s with serous ovarian cancer, among other findings.
Conclusions This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential for future prevention. We also identified blood biomarkers associated with ovarian cancer years before diagnoses, warranting further investigation.
INTRODUCTION
Ovarian cancer accounts for more deaths than any other gynaecological cancer [1]. The prognosis of ovarian cancer is typically poor, with most cases (~70%) diagnosed at stage 3 or 4 [2], when the 5-year survival rate is less than 30%. In contrast, the 5-year survival rate for stage 1 diagnoses is more than 90%, underscoring the importance of early detection. While there is a genetic component to ovarian cancer risk [3], the vast majority of ovarian cancer cases are sporadic. Factors such as use of oral contraception and medications including aspirin and levonorgestrel [4] have been linked to a lower ovarian cancer risk; however, evidence for a role of environmental and lifestyle factors is limited. There is a great need for research to understand predictors and risk factors associated with long-term susceptibility to ovarian cancer, with the potential to improve risk stratification, earlier detection, and strategies to prevent ovarian cancer.
WHAT IS ALREADY KNOWN ON THIS TOPIC
⇒ There are several known risk factors for ovarian cancer, but little that women can do to mitigate ovarian cancer risk by changes to lifestyle. The only effective prevention is ovary/fallopian tube removal, which is not an appropriate solution for every woman. Diagnosis is commonly made at a late stage when prognosis is poor. No prior large-scale studies have examined thousands of participant characteristics for their contribution to ovarian cancer prediction to identify potential risk factors.
WHAT THIS STUDY ADDS
⇒ This study validates several established risk/protective factors and discusses ovarian cancer risk factors in the context of the risk factors of cancer in general. It identifies the roles of higher weight, oral contraceptives, and parity as risk-modulating factors for ovarian cancer and shows that biomarkers measured several years before the diagnosis can predict future diagnosis.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
⇒ This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential targets for future prevention strategies. It identifies several blood biomarkers that are associated with ovarian cancer risk years before diagnoses, which need to be investigated for underlying mechanisms and their potential to support early diagnosis.
Int J Gynecol Cancer: first published as 10.1136/ijgc-2024-005424 on 30 July 2024. Downloaded from http://ijgc.bmj.com/ on July 31, 2024 by guest. Protected by copyright.
Recent increases in the size and scale of prospective cohort studies offer unique opportunities for discovering new risk factors even for relatively rare diseases such as ovarian cancer, for which effective prevention strategies are still lacking. Where information has been obtained from individuals before disease diagnosis, new machine learning [5] approaches can be used to predict future disease risks. There are several studies using machine learning in the context of ovarian cancer; however, they have mostly focused on diagnostics [6] or investigated factors associated with disease prognosis [7–10]. To date, studies of risk prediction have been limited to specific population groups [11,12]. In the context of disease risk prediction, machine learning is attractive as it can identify potential risk factors from large volumes of data while handling missing information, complex interactions, and nonlinear relationships. In this study, we implement hypothesis-free analyses that combine machine learning and statistical approaches in a group of over 200 000 women, with the overall objective of identifying novel actionable targets for ovarian cancer prevention. To the best of our knowledge, this is the first study aiming to identify novel risk factors from among thousands of characteristics, all collected before the diagnosis of ovarian cancer.
METHODS
Participants
The UK Biobank includes over 500 000 participants recruited between 2006 and 2010 (aged 37–73 years) through 22 assessment centers across England, Wales, and Scotland [13]. The baseline data collection covered touchscreen questionnaires, face-to-face interviews, physical measurements, and blood and urine collection for genetic assays and biomarker assessments. Information on disease outcomes was obtained through linkage to cancer registrations and hospital admissions [13]. Restricting the study to female participants with active consent and excluding women with a history of cancer (n=51 587) left 221 732 eligible women for the analysis (Online Supplemental Figure 1). These included 1441 incident ovarian cancer cases (our main outcome), who were identified via data linkage to cancer registrations and hospital records, coded using the International Classification of Diseases (ICD) version 10 [14]. Cancer registry data (available for 1120 cases) were used for ovarian cancer subtype classification. For possible predictors, we only included cross-sectional phenotypic data, including information obtained using touchscreen questionnaires, clinical examinations, and biomarker assays collected at baseline assessment (median of 12.6 years (IQR 11.9–13.3 years) before ovarian cancer diagnosis). After excluding highly correlated features (|r|≥0.9), we had 2920 possible predictors (exposures) for analyses (Online Supplemental Tables 1 and 2). The Online Supplemental Methods provide more details on the study population and the features considered.
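The exclusion of highly correlated features (|r|≥0.9) can be illustrated with a minimal sketch. The text does not say which member of a correlated pair was kept, so the greedy "keep the first one seen" rule below is an assumption, and the feature names and values are hypothetical toy data rather than UK Biobank fields.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(features, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if its |r| with
    every feature already kept is below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: weight_lb is a near-duplicate of weight_kg.
toy = {
    "weight_kg": [60, 72, 81, 95, 68],
    "weight_lb": [132.3, 158.7, 178.6, 209.4, 149.9],
    "height_cm": [170, 158, 175, 162, 166],
}
selected = drop_correlated(toy)
print(selected)
```

With these toy values, the pound-scale copy of weight is dropped while height, which is only weakly correlated with weight here, is retained.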
Model Development and Statistical Analyses
As shown in Figure 1A, we conducted the main analyses in two stages. In the first stage, we used machine learning to screen for risk factors that contribute to the probability of developing ovarian cancer during the follow-up [15,16]. In the second stage, we examined the direction, strength, and robustness of the association between ovarian cancer and the risk factors identified as making the most important contribution to the model in stage 1 (top 3% of features) by implementing epidemiological analyses. In epidemiological analyses, we assessed statistical significance using false discovery rate (FDR) correction [17] to account for multiple testing. The machine learning algorithm used in stage 1 was gradient boosting decision trees (GBDT, CatBoost implementation) [18,19], which use a series of decision trees to achieve the most accurate prediction of the outcome (Figure 1B). Risk factors that made an important contribution to the fit of the GBDT models were identified using SHAP (SHapley Additive exPlanation) values [20], which allowed us to rank the risk factors based on their contribution and to identify the most ‘important’ features to be taken to stage 2 analyses.
Stage 2 analyses started with logistic regression modeling, including two adjustment strategies. In the ‘basic adjusted’ model, we accounted for key confounders and measures related to study structure, including adjustments for age, ethnicity, assessment center, year of attending the center, and Townsend deprivation index, and, for all biomarkers, fasting time and sample aliquot. Logistic regression analyses were repeated with further adjustment for conceptually relevant risk factors that were associated with ovarian cancer in the UK Biobank: family history of breast cancer/prostate cancer, ever-use of the contraceptive pill, parity, tubal ligation, and household income (‘risk factor adjusted models’) (Online Supplemental Figure 2 and Table 3). In sensitivity analyses to explore whether undiagnosed tumors might be responsible for the observed associations, we repeated the basic analyses after excluding ovarian cancer cases reported within the first 2 years after the baseline assessment. We also investigated whether menopause interacted (p<0.05 after FDR correction) with trait–ovarian cancer associations (Online Supplemental Methods). All analyses were repeated to assess associations of SHAP-important features with the main ovarian cancer subtypes (Online Supplemental Figure 1).
Furthermore, we conducted Mendelian randomization (MR) analyses to obtain proof-of-principle evidence for a causal association for identified risk factors. We used the OpenGWAS [21] repository to identify genetic variants approximating the exposures, while summary-level data for evaluating the genetic association with ovarian cancer outcomes (including ovarian cancer subtypes) were obtained from the Ovarian Cancer Association Consortium (OCAC) [22]. We used inverse-variance weighted (IVW) MR [23] as the primary method in the absence of evidence of directional pleiotropy. For exposures with a single genetic instrument, we used the Wald ratio method. For exposures instrumented by ≥3 variants, the intercept test from MR-Egger was used to assess the presence of pleiotropy (p<0.05), and where detected, we present estimates from MR-Egger analyses.
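The two simplest MR estimators named above can be sketched from summary statistics. This is a textbook fixed-effect formulation under stated assumptions (first-order standard errors, uncorrelated variants), not the authors' analysis code; the function names and the per-variant effect sizes are hypothetical, and MR-Egger is omitted for brevity.

```python
import math

def wald_ratio(beta_x, beta_y, se_y):
    """Single-variant estimate: outcome association divided by exposure
    association, with a first-order standard error."""
    return beta_y / beta_x, se_y / abs(beta_x)

def ivw(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted combination of Wald ratios."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y)]
    numerator = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    beta = numerator / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))

def odds_ratio(beta, se, z=1.96):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical per-variant associations (exposure in SD units, outcome in log-odds).
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.05, 0.03]
sy = [0.01, 0.02, 0.015]

beta_ivw, se_ivw = ivw(bx, by, sy)
print(odds_ratio(beta_ivw, se_ivw))
```

With a single variant the IVW estimate reduces to the Wald ratio, which is why the two methods can be reported side by side depending on how many instruments an exposure has.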
RESULTS
During the median follow-up of 12.6 years (IQR 11.9–13.3 years), there were 1441 new ovarian cancer cases, and the remaining 220 291 participants were considered as controls (Table 1). Figure 2 shows the relative contribution of each feature category to the ovarian cancer prediction, with full information in Online Supplemental Results (page 8). Online Supplemental Table 4 gives information on the 87 features (top 3%) that were taken forward to logistic regression analyses.
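As a quick check on the numbers, 3% of 2920 features is 87.6, which matches the 87 features carried forward if the cut-off is rounded down. The sketch below assumes (the text does not state this) that each feature has a single importance score, such as its mean absolute SHAP value; the scores and feature names are randomly generated placeholders.

```python
import math
import random

def top_fraction(importance, fraction=0.03):
    """Return the names of the highest-scoring features, rounding the
    number kept down to a whole feature (assumed rounding rule)."""
    n_keep = math.floor(len(importance) * fraction)
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n_keep]

# Hypothetical importance scores for 2920 features.
random.seed(0)
importance = {f"feature_{i}": random.random() for i in range(2920)}

top = top_fraction(importance)
print(len(top))
```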
Phenotypic Observational Analyses
After basic adjustment and FDR correction, several female-specific features were associated with ovarian cancer risk, including “ever-use of oral contraceptives” (OR 0.74, 95% CI 0.66 to 0.84), “older age of last use of the contraceptive pill” (45+ years vs <25 years: OR 0.57, 95% CI 0.41 to 0.80), and number of live births (2+ vs none: OR 0.61, 95% CI 0.54 to 0.69) (Figure 3A). None of the SHAP-important dietary factors showed an association with ovarian cancer. Anthropometric traits were associated with ovarian cancer risk: an elevated risk was seen with both greater standing height (per 1 SD higher, OR 1.13, 95% CI 1.07 to 1.20) and greater weight (OR 1.08, 95% CI 1.03 to 1.14). Bilateral oophorectomy, the second most SHAP-important feature overall, showed the expected strong association with lower ovarian cancer risk (OR 0.21, 95% CI 0.14 to 0.31).
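FDR correction across many tested features can be sketched as follows. The specific procedure is given only by citation in the text, so the Benjamini-Hochberg step-up method used here is an assumption, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from five feature-outcome tests.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [pi for pi, qi in zip(p, q) if qi < 0.05]
print(q, significant)
```

In this toy example the two tests with unadjusted p-values just under 0.05 no longer pass once multiple testing is accounted for, which is the behaviour the correction is meant to provide.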
Online Supplemental Table 5 shows SHAP-important biomarkers with their mean and SD. Biomarkers were strongly represented among important features (Figure 3B). Higher levels of aspartate aminotransferase (per 1 SD, OR 0.89, 95% CI 0.82 to 0.96) and alanine aminotransferase (OR 0.89, 95% CI 0.83 to 0.96) were associated with lower ovarian cancer risk. Several red blood cell features were associated with ovarian cancer risk, including mean corpuscular volume (OR 0.92, 95% CI 0.87 to 0.97), mean corpuscular hemoglobin (OR 0.93, 95% CI 0.88 to 0.98), and variation in the size of red blood cells (red blood cell distribution width, OR 1.09, 95% CI 1.04 to 1.15). Having a higher neutrophil percentage was also associated with higher ovarian cancer risk (OR 1.08, 95% CI 1.03 to 1.14). Most biomarker associations were only modestly attenuated after risk factor adjustment (Online Supplemental Figure 3 and Table 4). In secondary analyses, we investigated associations with risk of ovarian cancer subtypes including serous (n=616), endometrioid (n=59), clear cell (n=50), and mucinous ovarian cancer (n=43) (Online Supplemental Results (page 8), Online Supplemental Table 6). Figure 4 lists all the features associated with overall ovarian cancer or at least one ovarian cancer subtype. Further sensitivity analyses are presented in the Online Supplemental Results (page 9) and Online Supplemental Table 7.
Figure 1 Analytical strategy. (A) shows the GBDT-SHAP pipeline followed by logistic regression models and Mendelian randomization (MR) analysis. (B) shows a schematic diagram of successively adding new decision trees to the ensemble of decision trees in a GBDT model. GBDT, gradient boosting decision trees; PHESANT, PHEnome Scan ANalysis Tool; SHAP, SHapley Additive exPlanation.
MR Analyses
MR analyses on overall ovarian cancer in the UK Biobank supported adverse effects of weight (IVW, per 1 SD higher; OR 1.14, 95% CI 1.04 to 1.26), body fat percentage (IVW; OR 1.28, 95% CI 1.13 to 1.46), basal metabolic rate (IVW; OR 1.24, 95% CI 1.10 to 1.39), and several other measures of adiposity, with directionally consistent findings from consortia meta-analyses (Online Supplemental Table 8). MR analyses in the UK Biobank supported the observational association between higher aspartate aminotransferase and lower ovarian cancer risk (IVW: OR 0.87, 95% CI 0.78 to 0.98), and additionally supported a lower ovarian cancer risk with higher total serum protein (UK Biobank IVW: OR 0.86, 95% CI 0.78 to 0.96), while consortia information was not available for these biomarkers. The genetic association between red blood cell distribution width and ovarian cancer was directionally inconsistent with the observational analyses (UK Biobank Egger: OR 0.84, 95% CI 0.74 to 0.95). For all ovarian cancer subtypes, MR analyses provided some evidence for a causal association with anthropometric/adiposity traits (Online Supplemental Table 8). For serous ovarian cancer there was evidence supporting a protective effect of greater forced expiratory volume in 1 s (UK Biobank Egger: OR 0.46, 95% CI 0.23 to 0.91), while clear cell ovarian cancer risk showed strong associations with various height indices. Greater lymphocyte count showed protective effects for serous and endometrioid ovarian cancer, while later age of menopause was associated with greater endometrioid and clear cell ovarian cancer risk. There were several other risk factors for which we saw some genetic evidence for association with ovarian cancer or its subtypes, with full MR results presented in Online Supplemental Results (page 10) and Online Supplemental Table 8.

Table 1 Distribution of baseline characteristics and selected risk factors used in the risk factor adjusted models for incident ovarian cancer cases and controls in the UK Biobank

Baseline characteristics
and selected risk factors         Controls, N   Cases, N (%)   P-value
Overall                           220 291       1441 (0.65)
Age (years)                                                    3.7E-46
  <50                             57 363        203 (0.35)
  50–59.9                         77 376        431 (0.55)
  60–69.9                         84 736        790 (0.92)
  70+                             816           17 (2.04)
Ethnicity                                                      0.04
  African ancestry                4081          16 (0.39)
  Asian                           4863          21 (0.43)
  White European                  206 515       1373 (0.66)
  Other/mixed/unknown*            4832          31 (0.64)
Townsend index†                                                0.69
  Q1 (low deprivation)            55 008        358 (0.65)
  Q2                              54 994        374 (0.68)
  Q3                              54 996        366 (0.66)
  Q4 (high deprivation)           55 022        343 (0.62)
  Missing data                    271           0 (0.00)
Education                                                      9.63E-05
  None                            35 014        286 (0.81)
  NVQ/CSE/A-levels                99 282        640 (0.64)
  Degree/professional             81 806        487 (0.59)
  Missing data                    4189          28 (0.66)
Family history of BC/PC                                        0.002
  No                              182 779       1152 (0.63)
  Yes                             37 512        289 (0.76)
Parity                                                         6.96E-05
  0                               41 162        339 (0.82)
  1                               29 266        181 (0.61)
  2/3                             135 219       831 (0.61)
  3+                              13 937        87 (0.62)
  Missing data                    707           3 (0.42)
Contraceptive pill use                                         2.36E-18
  No                              39 875        389 (0.97)
  Yes                             179 230       1044 (0.58)
  Missing data                    1186          8 (0.67)
Tubal ligation                                                 0.235
  No                              203 153       1342 (0.66)
  Yes                             16 802        98 (0.58)
  Missing data                    336           1 (0.30)
Annual household income (GBP)                                  2.40E-14
  <18 000                         42 650        343 (0.80)
  18 000–30 999                   46 754        367 (0.78)
  31 000–51 999                   47 034        257 (0.54)
  52 000–100 000                  35 275        150 (0.42)
  >100 000                        9121          41 (0.45)
  Missing data                    39 457        283 (0.71)

Data presented for basic characteristics and selected risk factors used in the fully adjusted models. P-values are from chi-square tests of independence between the individual characteristics and the overall ovarian cancer outcome, excluding missing values.
*The group ‘Other/mixed/unknown’ included all participants who did not identify as White European, Asian (including Chinese) or African ancestry (African/Caribbean/other), who identified under more than one ethnic group, or who did not provide information on ethnic background.
†The Townsend index is a measure of material deprivation within a population, incorporating four variables, namely unemployment, non-car ownership, non-house ownership, and household overcrowding. A greater Townsend index score implies a greater degree of deprivation.
BC, breast cancer; CSE, Certificate of Secondary Education; GBP, British pound; NVQ, National Vocational Qualification; PC, prostate cancer.
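The chi-square tests reported in Table 1 can be reproduced approximately for any two-level characteristic. The sketch below uses the contraceptive pill counts from Table 1 (missing values excluded) and a Pearson chi-square without continuity correction; whether the authors applied a correction is not stated, so that choice is an assumption. The crude odds ratio is unadjusted and therefore need not match the adjusted OR of 0.74 reported in the text.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Counts from Table 1, contraceptive pill use.
cases_yes, controls_yes = 1044, 179230
cases_no, controls_no = 389, 39875

crude_or = (cases_yes / controls_yes) / (cases_no / controls_no)
chi2, p = chi_square_2x2(cases_yes, controls_yes, cases_no, controls_no)
print(f"crude OR = {crude_or:.2f}, chi2 = {chi2:.1f}, p = {p:.2e}")
```

The crude odds ratio comes out at about 0.60, and the p-value is of the same order as the 2.36E-18 given in the table.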
DISCUSSION
Summary of Main Results
Findings from our large-scale, hypothesis-free, machine learning study suggest that the risk of ovarian cancer is likely to be at least in part modifiable, and also that it may be possible to develop predictive blood tests that can identify the cancer in its early stages of development. Many of the identified risk factors for ovarian cancer have also been associated with the broader risk of cancer. This notion is supported by features such as older age, greater height, excess weight, family disease history, and some biomarkers being picked up by the current analyses, as well as by our earlier work on overall cancer risk using similar methodologies [24]. We also identified risk factors that appear more specific to ovarian cancer and, importantly, our analyses strengthen observations for a likely protective role of oral contraceptive use and higher parity. The finding that 20% of the features suggested by the model as important for ovarian cancer were blood biomarkers is notable; it can potentially inform on mechanisms associated with tumorigenesis and support the development of predictive blood tests. More work is needed to confirm the importance and role of individual biomarkers in ovarian cancer.
Results in the Context of Published Literature
As validation of our machine learning approach, the GBDT model picked up bilateral oophorectomy as the second most important feature for predicting ovarian cancer after age. Both ever-use and older age of last use of oral contraceptives are associated with lower risk, which is consistent with previously reported protective effects of oral contraceptive use [25]. This may be explained by ovulation inhibition [26], in line with the “incessant ovulation” theory of ovarian cancer [27,28], and may be related to reduced exposure to carcinogenic and transformative factors present in ovulatory follicular fluid such as reactive oxygen species, insulin-like growth factor 2, and hepatocyte growth factor [26]. Increased ovulatory cycles may also explain the observed association between number of live births and ovarian cancer, where nulliparous women had the greatest risk. This association was strongest with the clear cell ovarian cancer subtype, consistent with a previous study [29] and with findings suggesting an association between infertility and higher ovarian cancer risk [30].
Although none of the dietary factors were observationally associated with ovarian cancer, the MR approach supported a protective effect of higher serum omega 3 fatty acids (enriched in fish oil, which came up in the top 3% of important features in our analyses) on risk of overall, endometrioid, and clear cell ovarian cancer, consistent with a previously reported protective effect of the omega 3 fatty acid N3-docosapentaenoic acid on endometrioid ovarian cancer [31]. Fish oil may protect against ovarian cancer via anti-inflammatory effects targeting prostaglandin pathways [32].
Figure 2 Contribution of each category of information to the model prediction based on the distribution of feature importance. Features have been restricted to 87 identified potential predictors of ovarian cancer which were in the top 3% among all predictors, with total importance standardized to 100%.
Among anthropometric traits, our study confirmed previously reported adverse associations of several adiposity-related measures [33] and taller height [34] with ovarian cancer, with taller height being particularly risky for clear cell ovarian cancer [34,35]. Taller height has been linked to increased risk for many cancers [36], and while the mechanisms explaining these connections are not well-established, one possibility is that growth hormones and factors involved in growing taller, such as insulin and insulin-like growth factors, may also promote tumor growth [37].
Biomarkers were highly represented among the features deemed important for ovarian cancer prediction. Higher levels of aspartate and alanine aminotransferases, which are central to alanine, aspartate, and glutamate metabolism, were associated with lower ovarian cancer risk, with MR supporting a protective effect for aspartate aminotransferase. Interestingly, a recent screen of epithelial ovarian cancer-related metabolic biomarkers in ovarian cancer patients using a machine learning approach also identified “alanine, aspartate, and glutamate metabolism” as one of the five relevant metabolic pathways [6]. Using metabolic network analyses, another study identified five metabolites among serum amino acid and organic acid profiles that helped distinguish between epithelial ovarian cancer and healthy controls, and implicated the “alanine, aspartate, and glutamate metabolism” and “D-glutamine and D-glutamate metabolism” pathways in epithelial ovarian cancer pathogenesis [38]. Further, oral contraceptive use has been linked to higher aspartate aminotransferase levels, increasing with duration of use [39]. More investigation is needed to understand how aspartate aminotransferase might be protective for ovarian cancer. One possibility is via modulating levels of glutamate, which has been implicated in the control of major reproductive and neuroendocrine processes [40].
Greater counts of lymphocytes and neutrophils showed protective association with ovarian cancer risk, as did the inflammation marker C-reactive protein for the endometrioid subtype, strengthening a suggestive association reported previously [35]. The evidence for a protective effect of higher serum protein in MR analyses may be linked to higher immunoglobulins, given that the abundant serum protein, albumin, was not deemed SHAP-important. In contrast, our MR analyses suggested an adverse effect of a higher count of eosinophils, innate immune cells involved in parasite response, allergy, and asthma pathology.
Figure 3 Associations between the identified potential predictors and ovarian cancer risk. (A) shows associations for female-specific features and physical measurements; (B) shows biomarkers. Odds ratios (ORs) and 95% confidence intervals (CIs) for ovarian cancer risk are from logistic regression analyses, adjusted for basic covariates (Model 1: age, assessment center, year attending assessment center, Townsend deprivation index, and ethnicity) and other ovarian cancer risk factors (Model 2: family history of breast cancer or prostate cancer, use of oral contraceptives, parity, tubal ligation, household income).
Strengths and Weaknesses
An important strength is that our GBDT-SHAP pipeline was able to identify predictive features ‘hidden’ among thousands of features (input to a single model) considering nonlinearity and interactions. Another strength is in the scale and rich data available from the UK Biobank, where biomarkers, clinical assessments, and questionnaire data were available for every participant, reflecting the time before the diagnoses for all ovarian cancer cases. However, there are also limitations, such as that the UK Biobank is not representative of the current UK population due to healthy volunteer bias [41], and there is an over-representation of people from White European backgrounds. All blood measures were assessed only once, and a high level may not necessarily reflect a prolonged elevation. The variables included in our models were also confined to available phenotypic information, and in addition to genetic risk markers there are likely to be other indicators reflecting differences in ovarian cancer risk which were not included. Furthermore, while we used a prospective study design, causality cannot be concluded for our observational associations due to the possibility of residual confounding and reverse causality. However, support for many associations was obtained from the MR analyses, with the consortia analyses allowing us to test for associations with rarer ovarian cancer subtypes for which we were likely underpowered when using data from the UK Biobank.
Figure 4 Odds ratios for associations between potential predictors with overall ovarian cancer and the four subtypes of ovarian cancer (OC). Numbers of cases are given at the top of the figure; associations indicated in bold are significant after multiple testing correction. ‘Illnesses of mother – none’ represents having no diseases among the group of diseases including breast cancer and other conditions. The features sitting height, standing height, and weight are in standard deviations. Age when last used contraceptive pill has been scaled by dividing by five.
Implications for Practice and Future Research
This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and approaches to lowering the number of ovulations may provide potential targets for future prevention strategies. Our study also strongly supports the feasibility of predictive testing for ovarian cancer, strengthening hopes for early diagnoses. As we discuss in this article, there are plausible mechanisms that may link many of the biomarkers identified by our study with ovarian cancer risk; however, further studies are needed to confirm their role and clinical utility. It is important to note that despite the extensive information available in this study, the variables included in our models have not included all relevant predictors, which together with genetic risk markers will be of great interest in future studies developing risk prediction models to screen for women at high risk of ovarian cancer.
CONCLUSIONS
In conclusion, bilateral oophorectomy, age, and some biomarkers such as aspartate aminotransferase contributed the most to the prediction of ovarian cancer. In addition to established risk-modulating factors including adiposity and parity, the predictive importance and favorable associations of measures of contraceptive pill use provide further impetus for investigating the contraceptive pill as a potential protective agent. Associations between several blood biomarkers and subsequent ovarian cancer risk support the possibility of using blood tests to aid ovarian cancer prediction and early diagnoses.
Author affiliations
1 Australian Centre for Precision Health, Unit of Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
2 South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
3 Department of Pharmacology and Clinical Pharmacy, College of Health Science, Addis Ababa University, Addis Ababa, Ethiopia
4 Faculty of Medicine and Medical Technology, Tampere University, Tampere, Finland
5 Department of Gynaecological Oncology, Royal Adelaide Hospital, Adelaide, South Australia, Australia
6 Adelaide Medical School, Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
Acknowledgements We thank the UK Biobank participants and administrators
for making this study possible. We thank our consumer members, Ms Stephanie
Newell, Ms Jacinta Frawley Werger, and Miss Jemima Leydon, for their valuable
insights and feedback. The funders had no role in the design of the study; the
collection, analysis, and interpretation of the data; the writing of the manuscript; or
the decision to submit the manuscript for publication. The authors have no conflicts
of interest to declare.
Contributors IM: methodology; software; data curation; formal analysis;
investigation; visualization; writing – original draft; writing – review and editing.
ALL: investigation; methodology; project administration; visualization; writing –
original draft; writing – review and editing. AM: data curation; formal analysis;
methodology; investigation; writing – review and editing. JM: funding acquisition;
investigation; writing – review and editing. MKO: funding acquisition; investigation;
writing – review and editing. EH: conceptualization; funding acquisition;
investigation; methodology; formal analysis; supervision; writing – original draft;
writing – review and editing. EH is the guarantor of this work.
Funding This work is supported by the Medical Research Future Fund, Australia
(Grant 2007431). EH is funded by the National Health and Medical Research
Council (Australia) Leadership Award (GNT2025349).
Competing interests None declared.
Patient consent for publication Not applicable.
Ethics approval This study involves human participants and the UK Biobank
project was approved by the National Information Governance Board for Health
and Social Care and North West Multi-centre Research Ethics Committee
(11/NW/0382). The study was conducted in accordance with the Declaration of
Helsinki. Participants provided electronic consent to use their anonymized data
and samples for health-related research, to be recontacted for further substudies,
and for the UK Biobank to access their health-related records. Participants gave
informed consent to participate in the study before taking part. This study was
conducted under project number 89630.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data may be obtained from a third party and are not
publicly available. All data are available through the UK Biobank repository upon
application. This study was conducted under project number 89630. The source
code is made available through GitHub (https://github.com/madakkmi/OC).
Supplemental material This content has been supplied by the author(s). It has
not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been
peer-reviewed. Any opinions or recommendations discussed are solely those
of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and
responsibility arising from any reliance placed on the content. Where the content
includes any translated material, BMJ does not warrant the accuracy and reliability
of the translations (including but not limited to local regulations, clinical guidelines,
terminology, drug names and drug dosages), and is not responsible for any error
and/or omissions arising from translation and adaptation or otherwise.
Open access This is an open access article distributed in accordance with the
Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits
others to copy, redistribute, remix, transform and build upon this work for any
purpose, provided the original work is properly cited, a link to the licence is given,
and indication of whether changes were made. See: https://creativecommons.org/
licenses/by/4.0/.
ORCID iDs
Iqbal Madakkatel http://orcid.org/0000-0003-2339-5917
Amanda L Lumsden http://orcid.org/0000-0002-0214-6498
Anwar Mulugeta http://orcid.org/0000-0002-8018-3454
Elina Hyppönen http://orcid.org/0000-0003-3670-9399
REFERENCES
1 Prat J. New insights into ovarian cancer pathology. Ann Oncol
2012;23 Suppl 10:x111–7.
2 Roett MA, Evans P. Ovarian cancer: an overview. Am Fam Physician
2009;80:609–16.
3 Walsh T, Casadei S, Lee MK, et al. Mutations in 12 genes for
inherited ovarian, fallopian tube, and peritoneal carcinoma identified
by massively parallel sequencing. Proc Natl Acad Sci U S A
2011;108:18032–7.
4 Soini T, Hurskainen R, Grénman S, et al. Impact of
levonorgestrel-releasing intrauterine system use on the cancer risk of the ovary and
fallopian tube. Acta Oncol 2016;55:1281–4.
5 Mitchell TM. Machine learning. New York: McGraw-Hill, 1997.
6 Yao JZ, Tsigelny IF, Kesari S, et al. Diagnostics of ovarian cancer
via metabolite analysis and machine learning. Integr Biol (Camb)
2023;15:zyad005.
7 Tseng C-J, Lu C-J, Chang C-C, et al. Integration of data mining
classification techniques and ensemble learning to identify risk
factors and diagnose ovarian cancer recurrence. Artif Intell Med
2017;78:47–54.
8 Ahamad MM, Aktar S, Uddin MJ, et al. Early-stage detection of
ovarian cancer based on clinical data using machine learning
approaches. J Pers Med 2022;12:1211.
9 Hossain MA, Saiful Islam SM, Quinn JMW, et al. Machine learning
and bioinformatics models to identify gene expression patterns of
ovarian cancer associated with disease progression and mortality.
J Biomed Inform 2019;100:103313.
10 Barber EL, Garg R, Persenaire C, et al. Natural language processing
with machine learning to predict outcomes after ovarian cancer
surgery. Gynecol Oncol 2021;160:182–6.
11 Chao X, Wang S, Lang J, et al. The application of risk models based
on machine learning to predict endometriosis-associated ovarian
cancer in patients with endometriosis. Acta Obstet Gynecol Scand
2022;101:1440–9.
12 Comes MC, Arezzo F, Cormio G, et al. An explainable machine
learning ensemble model to predict the risk of ovarian cancer
in BRCA-mutated patients undergoing risk-reducing
salpingo-oophorectomy. Front Oncol 2023;13:1181792.
13 Sudlow C, Gallacher J, Allen N, et al. UK Biobank: an open access
resource for identifying the causes of a wide range of complex
diseases of middle and old age. PLOS Med 2015;12:e1001779.
14 Berek JS, Renz M, Kehoe S, et al. Cancer of the ovary, fallopian
tube, and peritoneum: 2021 update. Int J Gynaecol Obstet
2021;155 Suppl 1:61–85.
15 Madakkatel I, Zhou A, McDonnell MD, et al. Combining machine
learning and conventional statistical approaches for risk factor
discovery in a large cohort study. Sci Rep 2021;11:22997.
16 Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Feature
selection for high-dimensional data. Prog Artif Intell 2016;5:65–75.
17 Benjamini Y, Hochberg Y. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J R Stat Soc
Series B 1995;57:289–300.
18 Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased
boosting with categorical features. Advances in neural information
processing systems 2018;31.
19 Friedman JH. Greedy function approximation: a gradient boosting
machine. Ann Statist 2001;29:1189–232.
20 Lundberg SM, Erion GG, Lee S-I. Consistent individualized feature
attribution for tree ensembles. arXiv 2018.
21 Hemani G, Zheng J, Elsworth B, et al. The MR-base platform
supports systematic causal inference across the human phenome.
Elife 2018;7:e34408.
22 Phelan CM, Kuchenbaecker KB, Tyrer JP, et al. Identification of 12
new susceptibility loci for different histotypes of epithelial ovarian
cancer. Nat Genet 2017;49:680–91.
23 Bowden J, Del Greco M F, Minelli C, et al. A framework for the
investigation of pleiotropy in two-sample summary data Mendelian
randomization. Stat Med 2017;36:1783–802.
24 Madakkatel I, Lumsden AL, Mulugeta A, et al. Hypothesis-free
discovery of novel cancer predictors using machine learning. Eur J
Clin Invest 2023;53:e14037.
25 Havrilesky LJ, Moorman PG, Lowery WJ, et al. Oral contraceptive
pills as primary prevention for ovarian cancer: a systematic review
and meta-analysis. Obstet Gynecol 2013;122:139–47.
26 Chu T-Y, Khine AA, Wu N-YY, et al. Insulin-like growth factor (IGF)
and hepatocyte growth factor (HGF) in follicular fluid cooperatively
promote the oncogenesis of high-grade serous carcinoma from
fallopian tube epithelial cells: dissection of the molecular effects. Mol
Carcinog 2023;62:1417–27.
27 Fathalla MF. Incessant ovulation--a factor in ovarian neoplasia?
Lancet 1971;2:163.
28 Fathalla MF. Incessant ovulation and ovarian cancer - a hypothesis
re-visited. Facts Views Vis Obgyn 2013;5:292–7.
29 Nüesch E, Dale C, Palmer TM, et al. Adult height, coronary heart
disease and stroke: a multi-locus Mendelian randomization
meta-analysis. Int J Epidemiol 2016;45:1927–37.
30 Fan Z, Song H, Yuan R, et al. Genetic predisposition to female
infertility in relation to epithelial ovarian and endometrial cancers.
Postgrad Med J 2023;99:63–8.
31 Si S, Li J, Tewara MA, et al. Identifying causality, genetic correlation,
priority and pathways of large-scale complex exposures of breast
and ovarian cancers. Br J Cancer 2021;125:1570–81.
32 Eilati E, Small CC, McGee SR, et al. Anti-inflammatory effects of fish
oil in ovaries of laying hens target prostaglandin pathways. Lipids
Health Dis 2013;12:152.
33 Aune D, Navarro Rosenblatt DA, Chan DSM, et al. Anthropometric
factors and ovarian cancer risk: a systematic review and nonlinear
dose–response meta-analysis of prospective studies. Int J Cancer
2015;136:1888–98.
34 Dixon-Suen SC, Nagle CM, Thrift AP, et al. Adult height is associated
with increased risk of ovarian cancer: a Mendelian randomisation
study. Br J Cancer 2018;118:1123–9.
35 Yarmolinsky J, Relton CL, Lophatananon A, et al. Appraising the role
of previously reported risk factors in epithelial ovarian cancer risk: a
Mendelian randomization analysis. PLoS Med 2019;16:e1002893.
36 Lai FY, Nath M, Hamby SE, et al. Adult height and risk of 50
diseases: a combined epidemiological and genetic analysis. BMC
Med 2018;16:187.
37 Stefan N, Häring H-U, Hu FB, et al. Divergent associations of
height with cardiometabolic disease and cancer: epidemiology,
pathophysiology, and global implications. Lancet Diabetes
Endocrinol 2016;4:457–67.
38 Wang X, Zhao X, Zhao J, et al. Serum metabolite signatures of
epithelial ovarian cancer based on targeted metabolomics. Clin
Chim Acta 2021;518:59–69.
39 Falaq N, Smita J, Rahul, et al. Effect of oral contraceptive pills on the
blood serum enzymes and DNA damage in lymphocytes among
users. Indian J Clin Biochem 2016;31:294–301.
40 Brann DW, Mahesh VB. Excitatory amino acids: evidence for a role in
the control of reproduction and anterior pituitary hormone secretion.
Endocr Rev 1997;18:678–700.
41 Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of
sociodemographic and health-related characteristics of UK Biobank
participants with those of the general population. Am J Epidemiol
2017;186:1026–34.
Int J Gynecol Cancer: first published as 10.1136/ijgc-2024-005424 on 30 July 2024. Downloaded from http://ijgc.bmj.com/ on July 31, 2024 by guest. Protected by copyright.