
Large-scale analysis to identify risk factors for ovarian cancer

Authors:
  • University of South Australia / South Australian Health Medical Research Institute

Abstract

Objective: Ovarian cancer is characterized by late-stage diagnoses and poor prognosis. We aimed to identify factors that can inform prevention and early detection of ovarian cancer.

Methods: We used a data-driven machine learning approach to identify predictors of epithelial ovarian cancer from 2920 input features measured 12.6 years (IQR 11.9 to 13.3 years) before diagnoses. Analyses included 221 732 female participants in the UK Biobank without a history of cancer. During the follow-up 1441 women developed ovarian cancer. For factors that contributed to model prediction, we used multivariate logistic regression to evaluate the association with ovarian cancer, with evidence for causality tested by Mendelian randomization (MR) analyses in the Ovarian Cancer Genetics Consortium (25 509 cases).

Results: Greater parity and ever-use of oral contraception were associated with lower ovarian cancer risk (ever vs never OR 0.74, 95% CI 0.66 to 0.84). After adjustment for established risk factors, greater height, weight, and greater red blood cell distribution width were associated with increased ovarian cancer risk, while higher aspartate aminotransferase levels and mean corpuscular volume were associated with lower risk. MR analyses confirmed observational associations with anthropometric/adiposity traits (eg, body fat percentage per standard deviation (SD); OR inverse-variance weighted (ORIVW) 1.28, 95% CI 1.13 to 1.46) and aspartate aminotransferase (ORIVW 0.87, 95% CI 0.78 to 0.98). MR also provided genetic evidence for a protective association of higher total serum protein on ovarian cancer, higher lymphocyte count on serous and endometrioid ovarian cancer, and greater forced expiratory volume in 1 s on serous ovarian cancer among other findings.

Conclusions: This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential for future prevention. We also identified blood biomarkers associated with ovarian cancer years before diagnoses, warranting further investigation.
MadakkatelI, etal. Int J Gynecol Cancer 2024;0:1–9. doi:10.1136/ijgc-2024-005424
Iqbal Madakkatel,1,2 Amanda L Lumsden,1,2 Anwar Mulugeta,1,2,3 Johanna Mäenpää,4 Martin K Oehler,5,6 Elina Hyppönen1,2
Additional supplemental material is published online only. To view, please visit the journal online (https://doi.org/10.1136/ijgc-2024-005424).
For numbered affiliations see end of article.
Correspondence to Professor Elina Hyppönen, Australian Centre for Precision Health, University of South Australia, c/o SAHMRI, GPO BOX 2471, Adelaide, SA 5001, Australia; Elina.Hypponen@unisa.edu.au
IM and ALL contributed equally.
Received 27 February 2024
Accepted 3 June 2024
To cite: MadakkatelI,
LumsdenAL, MulugetaA,
etal. Int J Gynecol Cancer
Published Online First: [please
include Day Month Year].
doi:10.1136/ijgc-2024-
005424
Original research
© IGCS and ESGO 2024. Re-use permitted under CC BY. Published by BMJ.
INTERNATIONAL JOURNAL OF
GYNECOLOGICAL CANCER
INTRODUCTION
Ovarian cancer accounts for more deaths than any other gynaecological cancer.[1] The prognosis of ovarian cancer is typically poor, with most cases (~70%) diagnosed at stage 3 or 4[2] when the 5-year survival rate is less than 30%. In contrast, the 5-year survival rate for stage 1 diagnoses is more than 90%, underscoring the importance of early detection. While there is a genetic component to ovarian cancer risk,[3] the vast majority of ovarian cancer cases are sporadic. Factors such as use of oral contraception and medications including aspirin and levonorgestrel[4] have been linked to a lower ovarian cancer risk; however, evidence for a role of environmental and lifestyle factors is limited. There is great need for research to understand predictors and risk factors that associate with long-term susceptibility to ovarian cancer, with potential to improve risk stratification, earlier detection, and strategies to prevent ovarian cancer.
Recent increases in the size and scale of prospective cohort studies offer unique opportunities for discovering new risk factors even for relatively rare diseases such as ovarian cancer for which effective prevention strategies are still lacking. Where information has been obtained from individuals before disease diagnosis, new machine learning[5] approaches can be used to predict further disease risks. There are several studies using machine learning in the context of ovarian cancer; however, they have mostly focused on diagnostics[6] or investigated factors associated with disease prognosis.[7–10] To date, studies focused on risk prediction have focused on specific population groups.[11,12] In the context of disease risk prediction, machine learning is attractive as it is able to identify potential risk factors from large volumes of data, while remaining capable of handling missing information, complex interactions, and nonlinear relationships. In this study, we implement hypothesis-free analyses that combine machine learning and statistical approaches in a group of over 200 000 women, with the overall objective of identifying novel actionable targets for ovarian cancer prevention. To the best of our knowledge, this is the first study aiming to identify novel risk factors from among thousands of characteristics, all collected before the diagnosis of ovarian cancer.

WHAT IS ALREADY KNOWN ON THIS TOPIC
There are several known risk factors for ovarian cancer, but little that women can do to mitigate ovarian cancer risk by changes to lifestyle. The only effective prevention is ovary/fallopian tube removal, which is not an appropriate solution for every woman. Diagnosis is commonly done at a late stage when prognosis is poor. No prior large-scale studies have examined thousands of participant characteristics for their contribution to ovarian cancer prediction to identify potential risk factors.

WHAT THIS STUDY ADDS
This study validates several established risk/protective factors and discusses ovarian cancer risk factors in the context of the risk factors of cancer in general. It identifies the roles of higher weight, oral contraceptives, and parity as risk factors for ovarian cancer and shows that biomarkers measured several years before the diagnosis can predict future diagnosis.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
This study shows that certain risk factors for ovarian cancer are modifiable, suggesting that weight reduction and interventions to reduce the number of ovulations may provide potential targets for future prevention strategies. It identifies several blood biomarkers that are associated with ovarian cancer risk years before diagnoses, which need to be investigated for underlying mechanisms and their potential to support early diagnosis.

Int J Gynecol Cancer: first published as 10.1136/ijgc-2024-005424 on 30 July 2024. Downloaded from http://ijgc.bmj.com/ on July 31, 2024 by guest. Protected by copyright.
METHODS
Participants
The UK Biobank includes over 500 000 participants recruited between 2006 and 2010 (aged 37–73 years) through 22 assessment centers across England, Wales, and Scotland.[13] The baseline data collection covered touchscreen questionnaires, face-to-face interviews, physical measurements, and blood and urine collection for genetic assays and biomarker assessments. Information on disease outcomes was obtained through linkage to cancer registrations and hospital admissions.[13] Restricting the study to female participants with active consent and excluding women with a history of cancer (n=51 587) left 221 732 eligible women for the analysis (Online Supplemental Figure 1). These included 1441 incident ovarian cancer cases (our main outcome) who were identified via data linkage to cancer registrations and hospital records, coded using International Classification of Diseases (ICD) version 10.[14] Cancer registry data (available for 1120 cases) was used for ovarian cancer subtype classification. For possible predictors, we only included cross-sectional phenotypic data, including information obtained using touchscreen questionnaires, clinical examinations, and biomarker assays collected at baseline assessment (median of 12.6 years (IQR 11.9–13.3 years) before ovarian cancer diagnosis). After excluding highly correlated features (|r|≥0.9) we had 2920 possible predictors (exposures) for analyses (Online Supplemental Tables 1 and 2). Online Supplemental Methods has more details on the study population and the features considered.
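The correlation filter mentioned above (|r|≥0.9) can be illustrated with a minimal sketch. The text does not say which member of a correlated pair was retained, so the greedy "keep the first seen" rule, the function names, and the toy values below are all assumptions for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature is treated as uncorrelated
    return sxy / sqrt(sxx * syy)

def drop_highly_correlated(features, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if |r| with every
    feature already kept is below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data; not taken from the UK Biobank.
toy = {
    "weight_kg": [60.0, 72.0, 85.0, 58.0, 90.0],
    "bmi": [22.0, 26.5, 31.0, 21.0, 33.5],        # nearly collinear with weight here
    "height_cm": [165.0, 160.0, 172.0, 158.0, 169.0],
}
kept = drop_highly_correlated(toy)
print(kept)
```

In this toy input the second feature is dropped because its correlation with the first exceeds the threshold, while the third is retained.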
Model Development and Statistical Analyses
As shown in Figure 1A, we conducted the main analyses in two
stages. In the rst stage, we used machine learning to screen for
risk factors that contribute to the probability of developing ovarian
cancer during the follow- up.15 16 In the second stage, we examined
the direction, strength, and robustness of the association between
risk factors identied as making the most important contribution
to the model in stage 1 (top 3% of features) and ovarian cancer
by implementing epidemiological analyses. In epidemiological
analyses, we assessed statistical signicance using false discovery
rate (FDR) correction17 to account for multiple testing. The machine
learning algorithm used in stage 1 was gradient boosting deci-
sion trees (GBDT, CatBoost implementation)18 19 that use a series
of decision trees to achieve the most accurate prediction of the
outcome (Figure1B). Risk factors that made an important contribu-
tion to t of the GBDT models were identied using SHAP (SHapley
Additive exPlanation) values,20 which allowed us to rank the risk
factors based on their contribution and to identify the most ‘impor-
tant’ features to be taken to stage 2 analyses. Stage 2 analyses
started with logistic regression modeling, including two adjustment
strategies. In the ‘basic adjusted’ model, we accounted for key
confounders and measures related to study structure, including
adjustments for age, ethnicity, assessment center, year of attending
the center, and Townsend deprivation index, and, for all biomarkers,
fasting time and sample aliquot. Logistic regression analyses were
repeated with further adjustment for conceptually relevant risk
factors that were associated with ovarian cancer in the UK Biobank:
family history of breast cancer/prostate cancer, ever- use of the
contraceptive pill, parity, tubal ligation, and household income (‘risk
factor adjusted models’) (Online Supplemental Figure 2 and Table
3). In sensitivity analyses to explore whether undiagnosed tumors
might be responsible for the observed associations, we repeated the
basic analyses after excluding ovarian cancer cases reported within
the rst 2 years after the baseline assessment. We also investigated
whether menopause interacted (p<0.05 after FDR correction) with
trait–ovarian cancer associations (Online Supplemental Methods).
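The idea of a GBDT model as a series of trees, each added to correct the current ensemble, can be sketched in a few lines. This is not the CatBoost implementation used in the study: it is a deliberately simplified, single-feature illustration with depth-1 trees (stumps), a plain gradient step on the log loss, and hypothetical toy data. All function names and parameter values are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(x, residuals):
    """Find the single threshold on x that best fits the residuals
    (least squares). Assumes x has at least two distinct values."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_trees=20, learning_rate=0.5):
    """Successively add stumps fitted to the negative gradient of the log loss."""
    prevalence = sum(y) / len(y)
    base_score = math.log(prevalence / (1.0 - prevalence))
    scores = [base_score] * len(x)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        threshold, left_value, right_value = fit_stump(x, residuals)
        trees.append((threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if xi <= threshold else right_value)
                  for xi, s in zip(x, scores)]
    return base_score, trees

def predict(base_score, trees, xi, learning_rate=0.5):
    score = base_score
    for threshold, left_value, right_value in trees:
        score += learning_rate * (left_value if xi <= threshold else right_value)
    return sigmoid(score)

def log_loss(y, probabilities):
    return -sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
                for yi, p in zip(y, probabilities)) / len(y)

# Hypothetical toy data: one feature (age) and a binary outcome.
age = [45, 50, 52, 58, 61, 63, 66, 69]
case = [0, 0, 0, 0, 1, 0, 1, 1]
base_score, trees = boost(age, case)
baseline_loss = log_loss(case, [sigmoid(base_score)] * len(case))
boosted_loss = log_loss(case, [predict(base_score, trees, a) for a in age])
print(baseline_loss, boosted_loss)
```

Production libraries add many refinements (multiple features, deeper trees, regularization, handling of missing values and categorical inputs) that this sketch omits.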
All analyses were repeated to assess associations of SHAP-important features with the main ovarian cancer subtypes (Online Supplemental Figure 1). Furthermore, we conducted Mendelian randomization (MR) analyses to obtain proof of principle evidence for a causal association for identified risk factors. We used the OpenGWAS[21] repository to identify genetic variants approximating the exposures, while summary-level data for evaluating the genetic association with ovarian cancer outcomes (including ovarian cancer subtypes) were obtained from the Ovarian Cancer Association Consortium (OCAC).[22] We used inverse-variance weighted (IVW) MR[23] as the primary method in the absence of evidence on directional pleiotropy. For exposures with a single genetic instrument, we used the Wald ratio method. For exposures instrumented by ≥3 variants, the intercept test from MR-Egger was used to assess the presence of pleiotropy (p<0.05), and where detected, we present estimates from MR-Egger analyses.
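The three estimators named above can be written compactly from per-variant summary statistics. The sketch below uses the standard textbook forms (Wald ratio, fixed-effect IVW, and the MR-Egger weighted regression with an intercept); it omits the significance test on the Egger intercept, and the function names and the summary statistics are hypothetical, not values from OpenGWAS or OCAC.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument estimate: outcome effect divided by exposure effect.
    The standard error uses the usual first-order approximation."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)

def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted estimate (regression through the origin)."""
    weights = [1.0 / s ** 2 for s in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)

def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression with an intercept; a non-zero intercept suggests
    directional pleiotropy. Variants are oriented so exposure effects are positive."""
    bx = [abs(b) for b in beta_exposure]
    by = [o if b >= 0 else -o for b, o in zip(beta_exposure, beta_outcome)]
    weights = [1.0 / s ** 2 for s in se_outcome]
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, bx)) / total
    mean_y = sum(w * y for w, y in zip(weights, by)) / total
    slope = (sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, bx, by))
             / sum(w * (x - mean_x) ** 2 for w, x in zip(weights, bx)))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical summary statistics: outcome effects are exactly 0.2 times the
# exposure effects, so there is no pleiotropy by construction.
beta_x = [0.10, 0.15, 0.08, 0.20]
beta_y = [0.2 * b for b in beta_x]
se_y = [0.010, 0.012, 0.009, 0.015]

ivw_estimate, ivw_se = ivw(beta_x, beta_y, se_y)
egger_intercept, egger_slope = mr_egger(beta_x, beta_y, se_y)
wald_estimate, wald_se = wald_ratio(beta_x[0], beta_y[0], se_y[0])
print(math.exp(ivw_estimate))  # odds ratio per unit of exposure
```

Estimates are on the log-odds scale; exponentiating gives odds ratios of the kind reported in the Results.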
RESULTS
During the median follow-up of 12.6 years (IQR 11.9–13.3 years), there were 1441 new ovarian cancer cases and the remaining 220 291 participants were considered as controls (Table 1). Figure 2 shows the relative contribution of each feature category to the ovarian cancer prediction, with full information in Online Supplemental Results (page 8). In Online Supplemental Table 4 we show information on the 87 features (top 3%) that were taken forward to logistic regression analyses.
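As a rough illustration of how a "top 3%" cut yields 87 of 2920 features, the sketch below ranks features by mean absolute SHAP value and keeps the top fraction, rounding down. Both the aggregation (mean absolute value) and the rounding rule are assumptions that happen to reproduce the reported count; the feature names and SHAP values are made up.

```python
def mean_abs_importance(shap_rows, names):
    """Aggregate per-participant SHAP values into one importance score per feature."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(names)}

def top_count(n_features, percent=3):
    """Number of features in the top `percent`, rounded down (integer arithmetic)."""
    return n_features * percent // 100

def top_features(importance, percent=3):
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:top_count(len(importance), percent)]

# Hypothetical SHAP matrix: 3 participants x 100 features.
names = [f"feature_{j}" for j in range(100)]
shap_rows = [[sign * 0.001 * j for j in range(100)] for sign in (1, -1, 1)]

importance = mean_abs_importance(shap_rows, names)
selected = top_features(importance)
print(top_count(2920), selected)
```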
Phenotypic Observational Analyses
After basic adjustment and FDR correction, several female-specific features were associated with ovarian cancer risk including “ever-use of oral contraceptives” (OR 0.74, 95% CI 0.66 to 0.84), “older age of last use of the contraceptive pill” (45+ years vs <25 years: OR 0.57, 95% CI 0.41 to 0.80), and number of live births (2+ vs none: OR 0.61, 95% CI 0.54 to 0.69) (Figure 3A). None of the SHAP-important dietary factors showed an association with ovarian cancer. Anthropometric traits were associated with ovarian cancer risk, and an elevated risk of ovarian cancer was seen both by greater standing height (per 1 SD higher, OR 1.13, 95% CI 1.07 to 1.20) and weight (OR 1.08, 95% CI 1.03 to 1.14). Bilateral oophorectomy, the second most SHAP-important feature overall, showed an expected strong association with lower ovarian cancer risk (OR 0.21, 95% CI 0.14 to 0.31).
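The FDR correction applied to these associations can be illustrated with the Benjamini-Hochberg step-up procedure. The text cites an FDR method without naming the exact procedure here, so treating it as Benjamini-Hochberg is an assumption, and the p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return FDR-adjusted p-values (q-values) and rejection decisions,
    in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected

# Hypothetical p-values for five tested features.
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
q_values, significant = benjamini_hochberg(p_values)
print(q_values, significant)
```

Here the third and fourth tests would pass an unadjusted 0.05 threshold but not the FDR-adjusted one, which is the behavior the correction is meant to provide.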
Online Supplemental Table 5 shows SHAP-important biomarkers with their mean and SD. Biomarkers were strongly represented among important features (Figure 3B). Higher levels of aspartate aminotransferase (per 1 SD, OR 0.89, 95% CI 0.82 to 0.96) and alanine aminotransferase (OR 0.89, 95% CI 0.83 to 0.96) were associated with lower ovarian cancer risk. Several red blood cell features were associated with ovarian cancer risk, including mean corpuscular volume (OR 0.92, 95% CI 0.87 to 0.97), mean corpuscular hemoglobin (OR 0.93, 95% CI 0.88 to 0.98), and variation in the size of red blood cells (red blood cell distribution width, OR 1.09, 95% CI 1.04 to 1.15). Having a higher neutrophil percentage was also associated with higher ovarian cancer risk (OR 1.08, 95% CI 1.03 to 1.14). Most biomarker associations were only modestly attenuated after risk factor adjustment (Online Supplemental Figure 3 and Table 4). In secondary analyses, we investigated associations with risk of ovarian cancer subtypes including serous (n=616), endometrioid (n=59), clear cell (n=50), and mucinous ovarian cancer (n=43) (Online Supplemental Results (page 8), Online Supplemental Table 6). Figure 4 lists all the features associated with overall ovarian cancer or at least one ovarian cancer subtype. Further sensitivity analyses are presented in the Online Supplemental Results (page 9) and Online Supplemental Table 7.

Figure 1 Analytical strategy. (A) shows the GBDT-SHAP pipeline followed by logistic regression models and Mendelian randomization (MR) analysis. (B) shows a schematic diagram of successively adding new decision trees to the ensemble of decision trees in a GBDT model. GBDT, gradient boosting decision trees; PHESANT, PHEnome Scan ANalysis Tool; SHAP, SHapley Additive exPlanation.
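For readers who want to relate the reported odds ratios to the underlying logistic regression coefficients, the sketch below back-calculates an approximate log-odds coefficient, standard error, and two-sided p-value from an OR and its 95% CI. It assumes a Wald-type interval (estimate ± 1.96 standard errors on the log scale); because published values are rounded, the results are approximate and are not adjusted for multiple testing. The function name is hypothetical.

```python
import math

def log_odds_summary(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate beta, SE, z and two-sided p from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p

# Mean corpuscular volume, per 1 SD: OR 0.92, 95% CI 0.87 to 0.97 (from the text).
beta, se, z, p = log_odds_summary(0.92, 0.87, 0.97)
print(round(beta, 3), round(se, 3), round(z, 2), p)
```

An OR below 1 corresponds to a negative coefficient, so each 1 SD increase multiplies the odds by the OR rather than adding to the risk.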
MR analyses
MR analyses on overall ovarian cancer in the UK Biobank supported adverse effects of weight (IVW, per 1 SD higher; OR 1.14, 95% CI 1.04 to 1.26), body fat percentage (IVW; OR 1.28, 95% CI 1.13 to 1.46), basal metabolic rate (IVW; OR 1.24, 95% CI 1.10 to 1.39), and several other measures for adiposity, with directionally consistent findings from consortia meta-analyses (Online Supplemental Table 8). MR analyses in the UK Biobank supported the observational association between higher aspartate aminotransferase and lower ovarian cancer risk (IVW: OR 0.87, 95% CI 0.78 to 0.98), and additionally supported a lower ovarian cancer risk by higher total serum protein (UK Biobank IVW: OR 0.86, 95% CI 0.78 to 0.96), while consortia information was not available for these biomarkers. The genetic association between red blood cell width and ovarian cancer was directionally inconsistent with the observational analyses (UK Biobank Egger: OR 0.84, 95% CI 0.74 to 0.95). For all ovarian cancer subtypes, MR analyses provided some evidence for a causal association with anthropometric/adiposity traits (Online Supplemental Table 8). For serous ovarian cancer there was evidence supporting a protective effect of greater forced expiratory volume in 1 s (UK Biobank Egger: OR 0.46, 95% CI 0.23 to 0.91), while clear cell ovarian cancer risk was potentially linked to strong associations with various height indices. Greater lymphocyte count showed protective effects for serous and endometrioid ovarian cancer, while later age of menopause was associated with greater endometrioid and clear cell ovarian cancer risk. There were several other risk factors for which we saw some genetic evidence for association with ovarian cancer or its subtypes, with full MR results presented in Online Supplemental Results (page 10) and Online Supplemental Table 8.

Table 1 Distribution of baseline characteristics and selected risk factors used in the risk factor adjusted models for incident ovarian cancer cases and controls in the UK Biobank

Baseline characteristics           Controls     Cases
and selected risk factors          N            N (%)          P-value
Overall                            220 291      1441 (0.65)
Age (years)                                                    3.7E-46
  <50                              57 363       203 (0.35)
  50–59.9                          77 376       431 (0.55)
  60–69.9                          84 736       790 (0.92)
  70+                              816          17 (2.04)
Ethnicity                                                      0.04
  African ancestry                 4081         16 (0.39)
  Asian                            4863         21 (0.43)
  White European                   206 515      1373 (0.66)
  Other/mixed/unknown*             4832         31 (0.64)
Townsend index†                                                0.69
  Q1 (low deprivation)             55 008       358 (0.65)
  Q2                               54 994       374 (0.68)
  Q3                               54 996       366 (0.66)
  Q4 (high deprivation)            55 022       343 (0.62)
  Missing data                     271          0 (0.00)
Education                                                      9.63E-05
  None                             35 014       286 (0.81)
  NVQ/CSE/A-levels                 99 282       640 (0.64)
  Degree/professional              81 806       487 (0.59)
  Missing data                     4189         28 (0.66)
Family history of BC/PC                                        0.002
  No                               182 779      1152 (0.63)
  Yes                              37 512       289 (0.76)
Parity                                                         6.96E-05
  0                                41 162       339 (0.82)
  1                                29 266       181 (0.61)
  2/3                              135 219      831 (0.61)
  3+                               13 937       87 (0.62)
  Missing data                     707          3 (0.42)
Contraceptive pill use                                         2.36E-18
  No                               39 875       389 (0.97)
  Yes                              179 230      1044 (0.58)
  Missing data                     1186         8 (0.67)
Tubal ligation                                                 0.235
  No                               203 153      1342 (0.69)
  Yes                              16 802       98 (0.17)
  Missing data                     336          1 (0.30)
Annual household income (GBP)                                  2.40E-14
  <18 000                          42 650       343 (0.80)
  18 000–30 999                    46 754       367 (0.78)
  31 000–51 999                    47 034       257 (0.54)
  52 000–100 000                   35 275       150 (0.42)
  >100 000                         9121         41 (0.45)
  Missing data                     39 457       283 (0.71)

Data presented for basic characteristics and selected risk factors used in the fully adjusted models. P-values are from chi-square tests of independence between the individual characteristics and the overall ovarian cancer outcome, excluding missing values.
*The group ‘Other/mixed/unknown’ included all participants who did not identify as White European, Asian (including Chinese) or African ancestry (African/Caribbean/other), who identified under more than one ethnic group, or who did not provide information on ethnic background.
†The Townsend index is a measure of material deprivation within a population, incorporating four variables, namely unemployment, non-car ownership, non-house ownership, and household overcrowding. A greater Townsend index score implies a greater degree of deprivation.
BC, breast cancer; CSE, Certificate of Secondary Education; GBP, British pound; NVQ, National Vocational Qualification; PC, prostate cancer.
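The percentages and P-values in Table 1 can be reproduced from the counts. The sketch below takes the contraceptive pill rows (excluding missing data, as the table footnote describes), computes the case percentage in each group, and runs a Pearson chi-square test of independence on the 2×2 table. Omitting a continuity correction is an assumption; the function name is hypothetical. With these counts the statistic is about 76.4 and the P-value is close to the 2.36E-18 shown in the table.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]]
    (no continuity correction). For 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2))."""
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Counts from Table 1, contraceptive pill use (missing data excluded).
cases_no, controls_no = 389, 39875
cases_yes, controls_yes = 1044, 179230

pct_no = 100.0 * cases_no / (cases_no + controls_no)
pct_yes = 100.0 * cases_yes / (cases_yes + controls_yes)
statistic, p_value = chi_square_2x2(cases_no, controls_no, cases_yes, controls_yes)
print(round(pct_no, 2), round(pct_yes, 2), round(statistic, 1), p_value)
```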
DISCUSSION
Summary of Main Results
Findings from our large-scale, hypothesis-free, machine learning study suggest that the risk of ovarian cancer is likely to be at least in part modifiable, and also that it may be possible to develop predictive blood tests that can identify the cancer in its early stages of development. Many of the identified risk factors for ovarian cancer have also been associated with the broader risk of cancer. This notion is supported by features such as older age, greater height, excess weight, family disease history, and some biomarkers being picked up by the current analyses, as well as by our earlier work on overall cancer risk using similar methodologies.[24] We also identified risk factors that appear more specific to ovarian cancer and, importantly, our analyses strengthen observations for a likely protective role of oral contraceptive use and higher parity. The finding that 20% of the features suggested by the model as important for ovarian cancer were blood biomarkers is notable, and can potentially inform on mechanisms associated with tumorigenesis, and support the development of predictive blood tests. More work is needed to confirm the importance and role of individual biomarkers in ovarian cancer.
Results in the Context of Published Literature
As validation of our machine learning approach, the GBDT model picked up bilateral oophorectomy as the second most important feature for predicting ovarian cancer after age. Both ever-use and older age of last use of oral contraceptives are associated with lower risk, which is consistent with previously reported protective effects of oral contraceptive use.[25] This may be explained by ovulation inhibition[26] in line with the “incessant ovulation” theory of ovarian cancer,[27,28] and may be related to reduced exposure to carcinogenic and transformative factors present in ovulatory follicular fluid such as reactive oxygen species, insulin-like growth factor 2, and hepatocyte growth factor.[26] Increased ovulatory cycles may also explain the observed association between number of live births and ovarian cancer, where nulliparous women had the greatest risk. This association was the strongest with the clear cell ovarian cancer subtype, consistent with a previous study[29] and findings suggesting an association between infertility and higher ovarian cancer risk.[30]
Although none of the dietary factors were observationally associated with ovarian cancer, the MR approach supported a protective effect of higher serum omega 3 fatty acids (enriched in fish oil, coming up in the top 3% of important features in our analyses) on risk of overall, endometrioid, and clear cell ovarian cancer, consistent with a previously reported protective effect of omega 3 fatty acid N3-docosapentaenoic acid on endometrioid ovarian cancer.[31] Fish oil may protect against ovarian cancer via anti-inflammatory effects targeting prostaglandin pathways.[32]

Figure 2 Contribution of each category of information to the model prediction based on the distribution of feature importance. Features have been restricted to 87 identified potential predictors of ovarian cancer which were in the top 3% among all predictors, with total importance standardized to 100%.
Among anthropometric traits, our study confirmed previously reported adverse associations of several adiposity-related measures[33] and taller height[34] with ovarian cancer, with taller height being particularly risky for clear cell ovarian cancer.[34,35] Taller height has been linked to increased risk for many cancers,[36] and while the mechanisms explaining these connections are not well-established, one possibility is that growth hormones and factors involved in growing taller such as insulin and insulin-like growth factors may also promote tumor growth.[37]
Biomarkers were highly represented among the features deemed important for ovarian cancer prediction. Higher levels of aspartate and alanine aminotransferases that are central to alanine, aspartate, and glutamate metabolism were associated with lower ovarian cancer risk, with MR supporting a protective effect for aspartate aminotransferase. Interestingly, a recent screen of epithelial ovarian cancer-related metabolic biomarkers in ovarian cancer patients using a machine learning approach also identified “alanine, aspartate, and glutamate metabolism” as one of the five relevant metabolic pathways.[6] Using metabolic network analyses, another study identified five metabolites among serum amino acid and organic acid profiles that helped distinguish between epithelial ovarian cancer and healthy controls, and implicated “alanine, aspartate, and glutamate metabolism” and “D-glutamine and D-glutamate metabolism” pathways in epithelial ovarian cancer pathogenesis.[38] Further, oral contraceptive use has been linked to higher aspartate aminotransferase levels, increasing with duration of use.[39] Further investigation is needed to understand how aspartate aminotransferase might be protective for ovarian cancer. One possibility is via modulating levels of glutamate, which has been implicated in the control of major reproductive and neuroendocrine processes.[40]
Greater counts of lymphocytes and neutrophils showed protective association with ovarian cancer risk, as did inflammation marker C-reactive protein for the endometrioid subtype, strengthening a suggestive association reported previously.[35] The evidence for a protective effect of higher serum protein in MR analyses may be linked to higher immunoglobulins, given that the abundant serum protein, albumin, was not deemed SHAP-important. In contrast, our MR analyses suggested an adverse effect of a higher count of eosinophils, innate immune cells involved in parasite response, allergy, and asthma pathology.

Figure 3 Associations between the identified potential predictors and ovarian cancer risk. (A) shows association for female-specific features and physical measurements; (B) shows biomarkers. Odds ratios (ORs) and 95% confidence intervals (CIs) for ovarian cancer risk are from logistic regression analyses, adjusted for basic covariates (Model 1: age, assessment center, year attending assessment center, Townsend deprivation index, and ethnicity) and other ovarian cancer risk factors (Model 2: family history of breast cancer or prostate cancer, use of oral contraceptives, parity, tubal ligation, household income).
Strengths and Weaknesses
An important strength is that our GBDT- SHAP pipeline was able to
identify predictive features ‘hidden’ among thousands of features
Figure 4 Odds ratios for associations between potential predictors with overall ovarian cancer and the four subtypes of
ovarian cancer (OC). Number of cases are given at the top of the gure; associations indicated in bold are signicant after
multiple testing correction. ‘Illnesses of mother – none’ represents having no diseases among the group of diseases including
breast cancer and other conditions. The features sitting height, standing height, and weight are in standard deviations. Age
when last used contraceptive pill has been scaled by dividing by ve.
on July 31, 2024 by guest. Protected by copyright.http://ijgc.bmj.com/Int J Gynecol Cancer: first published as 10.1136/ijgc-2024-005424 on 30 July 2024. Downloaded from
8MadakkatelI, etal. Int J Gynecol Cancer 2024;0:1–9. doi:10.1136/ijgc-2024-005424
Original research
(input to a single model) considering nonlinearity and interactions.
Another strength is in the scale and rich data available from the
UK Biobank, where biomarkers, clinical assessments, and
questionnaire data were available for every participant, reflecting the
time before the diagnoses for all ovarian cancer cases. However,
there are also limitations, such as that the UK Biobank is not
representative of the current UK population due to healthy volunteer
bias,41 and there is an over-representation of people from White
European backgrounds. All blood measures were assessed only
once, and a high level may not necessarily reflect a prolonged
elevation. The variables included in our models were also confined
to available phenotypic information, and in addition to genetic risk
markers there are likely to be other indicators reflecting
differences in ovarian cancer risk which were not included. Furthermore,
while we used a prospective study design, causality cannot be
concluded for our observational associations due to the possibility of
residual confounding and reverse causality. However, support for
many associations was obtained from the MR analyses, with the
consortia analyses allowing us to test for associations with rarer
ovarian cancer subtypes for which we were likely underpowered
when using data from the UK Biobank.
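The inverse-variance weighted (IVW) estimate that underlies the MR odds ratios can be illustrated with a minimal sketch. It assumes the standard fixed-effect IVW formula for two-sample summary data; the function names and the three-variant summary statistics are hypothetical and are not taken from the study or its code repository.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j];
    IVW pools the ratios with weights beta_exposure[j]**2 / se_outcome[j]**2.
    Returns the pooled log odds ratio and its standard error.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


def to_odds_ratio(beta, se, z=1.96):
    """Convert a log odds ratio and its SE into an OR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three genetic variants
# (illustrative only, NOT values from the study).
bx = [0.10, 0.20, 0.15]    # variant-exposure effects (per SD of exposure)
by = [0.02, 0.05, 0.03]    # variant-outcome effects (log odds)
se = [0.01, 0.02, 0.015]   # standard errors of the variant-outcome effects

beta, se_ivw = ivw_estimate(bx, by, se)
odds_ratio, lo, hi = to_odds_ratio(beta, se_ivw)
print(f"OR_IVW {odds_ratio:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The sketch omits steps a real analysis would need, such as instrument selection, allele harmonisation, and the pleiotropy-robust sensitivity analyses described in reference 23.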
Implications for Practice and Future Research
This study shows that certain risk factors for ovarian cancer are
modifiable, suggesting that weight reduction and approaches to
lowering the number of ovulations may provide potential targets
for future prevention strategies. Our study also strongly supports
the feasibility of predictive testing for ovarian cancer, strengthening
hopes for early diagnoses. As we discuss in this article, there are
plausible mechanisms that may link many of the biomarkers
identified by our study with ovarian cancer risk; however, further studies
are needed to confirm their role and clinical utility. It is important to
note that despite the extensive information available in this study,
the variables included in our models have not included all relevant
predictors, which together with genetic risk markers will be of
great interest in future studies developing risk prediction models to
screen for women at high risk of ovarian cancer.
CONCLUSIONS
In conclusion, bilateral oophorectomy, age, and some biomarkers
such as aspartate aminotransferase contributed the most to the
prediction of ovarian cancer. In addition to established risk-
modulating factors including adiposity and parity, the predictive
importance and favorable associations of measures of contra-
ceptive pill use provide further impetus for their investigation as
a potential protective agent. Associations between several blood
biomarkers and subsequent ovarian cancer risk support the possi-
bility of using blood tests to aid ovarian cancer prediction and early
diagnoses.
Author afliations
1Australian Centre for Precision Health, Unit of Clinical and Health Sciences,
University of South Australia, Adelaide, South Australia, Australia
2South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South
Australia, Australia
3Department of Pharmacology and Clinical Pharmacy, College of Health Science,
Addis Ababa University, Addis Ababa, Ethiopia
4Faculty of Medicine and Medical Technology, Tampere University, Tampere, Finland
5Department of Gynaecological Oncology, Royal Adelaide Hospital, Adelaide, South
Australia, Australia
6Adelaide Medical School, Robinson Research Institute, University of Adelaide,
Adelaide, South Australia, Australia
Acknowledgements We thank the UK Biobank participants and administrators
for making this study possible. We thank our consumer members, Ms Stephanie
Newell, Ms Jacinta Frawley Werger, and Miss Jemima Leydon, for their valuable
insights and feedback. The funders had no role in the design of the study; the
collection, analysis, and interpretation of the data; the writing of the manuscript; or
the decision to submit the manuscript for publication. The authors have no conflicts
of interest to declare.
Contributors IM: methodology, software, data curation, formal analysis,
investigation; visualization; writing – original draft; writing – review and editing.
ALL: investigation; methodology; project administration; visualization, writing –
original draft; writing – review and editing. AM: data curation, formal analysis,
methodology, investigation; writing – review and editing. JM: funding acquisition;
investigation; writing – review and editing. MKO: funding acquisition; investigation;
writing – review and editing. EH: conceptualization; funding acquisition;
investigation; methodology; formal analysis; supervision; writing – original draft;
writing – review and editing. EH is the guarantor of this work.
Funding This work is supported by the Medical Research Future Fund, Australia
(Grant 2007431). EH is funded by the National Health and Medical Research
Council (Australia) Leadership Award (GNT2025349).
Competing interests None declared.
Patient consent for publication Not applicable.
Ethics approval This study involves human participants and the UK Biobank
project was approved by the National Information Governance Board for Health
and Social Care and North West Multi-centre Research Ethics Committee
(11/NW/0382). The study was conducted in accordance with the Declaration of
Helsinki. Participants provided electronic consent to use their anonymized data
and samples for health-related research, to be recontacted for further substudies,
and for the UK Biobank to access their health-related records. Participants gave
informed consent to participate in the study before taking part. This study was
conducted under project number 89630.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement Data may be obtained from a third party and are not
publicly available. All data are available through the UK Biobank repository upon
application. This study was conducted under project number 89630. The source
code is made available through GitHub (https://github.com/madakkmi/OC).
Supplemental material This content has been supplied by the author(s). It has
not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been
peer-reviewed. Any opinions or recommendations discussed are solely those
of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and
responsibility arising from any reliance placed on the content. Where the content
includes any translated material, BMJ does not warrant the accuracy and reliability
of the translations (including but not limited to local regulations, clinical guidelines,
terminology, drug names and drug dosages), and is not responsible for any error
and/or omissions arising from translation and adaptation or otherwise.
Open access This is an open access article distributed in accordance with the
Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits
others to copy, redistribute, remix, transform and build upon this work for any
purpose, provided the original work is properly cited, a link to the licence is given,
and indication of whether changes were made. See: https://creativecommons.org/
licenses/by/4.0/.
ORCID iDs
IqbalMadakkatel http://orcid.org/0000-0003-2339-5917
Amanda LLumsden http://orcid.org/0000-0002-0214-6498
AnwarMulugeta http://orcid.org/0000-0002-8018-3454
ElinaHyppönen http://orcid.org/0000-0003-3670-9399
REFERENCES
1 Prat J. New insights into ovarian cancer pathology. Ann Oncol
2012;23 Suppl 10:x111–7.
2 Roett MA, Evans P. Ovarian cancer: an overview. Am Fam Physician
2009;80:609–16.
3 Walsh T, Casadei S, Lee MK, et al. Mutations in 12 genes for
inherited ovarian, fallopian tube, and peritoneal carcinoma identified
by massively parallel sequencing. Proc Natl Acad Sci U S A
2011;108:18032–7.
4 Soini T, Hurskainen R, Grénman S, et al. Impact of
levonorgestrel-releasing intrauterine system use on the cancer risk of the ovary and
fallopian tube. Acta Oncol 2016;55:1281–4.
5 Mitchell TM. Machine learning. New York: McGraw-Hill, 1997.
6 Yao JZ, Tsigelny IF, Kesari S, et al. Diagnostics of ovarian cancer
via metabolite analysis and machine learning. Integr Biol (Camb)
2023;15:zyad005.
7 Tseng C-J, Lu C-J, Chang C-C, et al. Integration of data mining
classification techniques and ensemble learning to identify risk
factors and diagnose ovarian cancer recurrence. Artif Intell Med
2017;78:47–54.
8 Ahamad MM, Aktar S, Uddin MJ, et al. Early-stage detection of
ovarian cancer based on clinical data using machine learning
approaches. J Pers Med 2022;12:1211.
9 Hossain MA, Saiful Islam SM, Quinn JMW, et al. Machine learning
and bioinformatics models to identify gene expression patterns of
ovarian cancer associated with disease progression and mortality.
J Biomed Inform 2019;100:103313.
10 Barber EL, Garg R, Persenaire C, et al. Natural language processing
with machine learning to predict outcomes after ovarian cancer
surgery. Gynecol Oncol 2021;160:182–6.
11 Chao X, Wang S, Lang J, et al. The application of risk models based
on machine learning to predict endometriosis-associated ovarian
cancer in patients with endometriosis. Acta Obstet Gynecol Scand
2022;101:1440–9.
12 Comes MC, Arezzo F, Cormio G, et al. An explainable machine
learning ensemble model to predict the risk of ovarian cancer
in BRCA-mutated patients undergoing risk-reducing
salpingo-oophorectomy. Front Oncol 2023;13:1181792.
13 Sudlow C, Gallacher J, Allen N, et al. UK Biobank: an open access
resource for identifying the causes of a wide range of complex
diseases of middle and old age. PLOS Med 2015;12:e1001779.
14 Berek JS, Renz M, Kehoe S, et al. Cancer of the ovary, fallopian
tube, and peritoneum: 2021 update. Int J Gynaecol Obstet
2021;155 Suppl 1:61–85.
15 Madakkatel I, Zhou A, McDonnell MD, et al. Combining machine
learning and conventional statistical approaches for risk factor
discovery in a large cohort study. Sci Rep 2021;11:22997.
16 Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Feature
selection for high-dimensional data. Prog Artif Intell 2016;5:65–75.
17 Benjamini Y, Hochberg Y. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J R Stat Soc
Series B 1995;57:289–300.
18 Prokhorenkova L, Gusev G, Vorobev A, et al. CatBoost: unbiased
boosting with categorical features. Advances in neural information
processing systems 2018;31.
19 Friedman JH. Greedy function approximation: a gradient boosting
machine. Ann Statist 2001;29:1189–232.
20 Lundberg SM, Erion GG, Lee S-I. Consistent individualized feature
attribution for tree ensembles. arXiv 2018.
21 Hemani G, Zheng J, Elsworth B, et al. The MR-base platform
supports systematic causal inference across the human phenome.
Elife 2018;7:e34408.
22 Phelan CM, Kuchenbaecker KB, Tyrer JP, et al. Identification of 12
new susceptibility loci for different histotypes of epithelial ovarian
cancer. Nat Genet 2017;49:680–91.
23 Bowden J, Del Greco M F, Minelli C, et al. A framework for the
investigation of pleiotropy in two-sample summary data Mendelian
randomization. Stat Med 2017;36:1783–802.
24 Madakkatel I, Lumsden AL, Mulugeta A, et al. Hypothesis-free
discovery of novel cancer predictors using machine learning. Eur J
Clin Invest 2023;53:e14037.
25 Havrilesky LJ, Moorman PG, Lowery WJ, et al. Oral contraceptive
pills as primary prevention for ovarian cancer: a systematic review
and meta-analysis. Obstet Gynecol 2013;122:139–47.
26 Chu T-Y, Khine AA, Wu N-YY, et al. Insulin-like growth factor (IGF)
and hepatocyte growth factor (HGF) in follicular fluid cooperatively
promote the oncogenesis of high-grade serous carcinoma from
fallopian tube epithelial cells: dissection of the molecular effects. Mol
Carcinog 2023;62:1417–27.
27 Fathalla MF. Incessant ovulation--a factor in ovarian neoplasia?
Lancet 1971;2:163.
28 Fathalla MF. Incessant ovulation and ovarian cancer - a hypothesis
re-visited. Facts Views Vis Obgyn 2013;5:292–7.
29 Nüesch E, Dale C, Palmer TM, et al. Adult height, coronary heart
disease and stroke: a multi-locus Mendelian randomization
meta-analysis. Int J Epidemiol 2016;45:1927–37.
30 Fan Z, Song H, Yuan R, et al. Genetic predisposition to female
infertility in relation to epithelial ovarian and endometrial cancers.
Postgrad Med J 2023;99:63–8.
31 Si S, Li J, Tewara MA, et al. Identifying causality, genetic correlation,
priority and pathways of large-scale complex exposures of breast
and ovarian cancers. Br J Cancer 2021;125:1570–81.
32 Eilati E, Small CC, McGee SR, et al. Anti-inflammatory effects of fish
oil in ovaries of laying hens target prostaglandin pathways. Lipids
Health Dis 2013;12:152.
33 Aune D, Navarro Rosenblatt DA, Chan DSM, et al. Anthropometric
factors and ovarian cancer risk: a systematic review and nonlinear
dose–response meta-analysis of prospective studies. Int J Cancer
2015;136:1888–98.
34 Dixon-Suen SC, Nagle CM, Thrift AP, et al. Adult height is associated
with increased risk of ovarian cancer: a Mendelian randomisation
study. Br J Cancer 2018;118:1123–9.
35 Yarmolinsky J, Relton CL, Lophatananon A, et al. Appraising the role
of previously reported risk factors in epithelial ovarian cancer risk: a
Mendelian randomization analysis. PLoS Med 2019;16:e1002893.
36 Lai FY, Nath M, Hamby SE, et al. Adult height and risk of 50
diseases: a combined epidemiological and genetic analysis. BMC
Med 2018;16:187.
37 Stefan N, Häring H-U, Hu FB, et al. Divergent associations of
height with cardiometabolic disease and cancer: epidemiology,
pathophysiology, and global implications. Lancet Diabetes
Endocrinol 2016;4:457–67.
38 Wang X, Zhao X, Zhao J, et al. Serum metabolite signatures of
epithelial ovarian cancer based on targeted metabolomics. Clin
Chim Acta 2021;518:59–69.
39 Falaq N, Smita J, Rahul, et al. Effect of oral contraceptive pills on the
blood serum enzymes and DNA damage in lymphocytes among
users. Indian J Clin Biochem 2016;31:294–301.
40 Brann DW, Mahesh VB. Excitatory amino acids: evidence for a role in
the control of reproduction and anterior pituitary hormone secretion.
Endocr Rev 1997;18:678–700.
41 Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of
sociodemographic and health-related characteristics of UK Biobank
participants with those of the general population. Am J Epidemiol
2017;186:1026–34.
... Furthermore, chronic low-grade inflammation linked to obesity creates a microenvironment conducive to tumor progression. [6][7][8][9] Dietary habits also play a crucial role in ovarian cancer risk. Diets high in animal fats and low in fruits, vegetables, and phytoestrogens have been associated with increased risk. ...
... Specific dietary components, such as isoflavones and flavonoids, are thought to influence cancer risk by modulating estrogen receptor activity and reducing oxidative stress. [7][8][9] Reproductive factors interconnect with these modifiable risks, as multiparity and long-term use of oral contraceptives are known to reduce ovarian cancer risk by decreasing the lifetime number of ovulatory cycles. This reduction in ovulation-associated inflammation and oxidative stress may be particularly protective in women carrying BRCA1 and BRCA2 mutations. ...
... This reduction in ovulation-associated inflammation and oxidative stress may be particularly protective in women carrying BRCA1 and BRCA2 mutations. [6][7][8][9] Environmental exposures add another layer of complexity. Air pollution, including exposure to particulate matter (PM) 2.5, CO, and SO2, has been linked to increased ovarian cancer risk. ...
Environmental and lifestyle factors significantly contribute to gynecological cancers. The risk of ovarian cancer, one of the most lethal gynecological cancers, is associated with obesity, poor dietary habits, and environmental pollutants, exacerbating hormonal imbalances, inflammation, and oxidative stress. Protective factors, such as the Mediterranean diet and oral contraceptives, modulate risk by reducing ovulatory cycles, particularly in genetically predisposed women. Uterine cancer is associated with metabolic factors, with obesity driving hormonal disruptions and systemic inflammation. Physical inactivity and diets rich in animal fats increase the risk of endometrial cancer, and air pollution and microbiome imbalances also contribute to endometrial carcinogenesis. Cervical cancer is primarily driven by persistent high‐risk HPV infection, with smoking enhancing viral persistence and oncogenesis. Nutritional deficiencies in antioxidants and folate weaken immune defenses, while vaginal and gut microbiome dysbiosis fosters neoplastic progression. Vulvar and vaginal cancers, though less common, share risk factors such as obesity, smoking, and occupational exposures, disrupting immune responses and epithelial integrity. Microbial imbalances exacerbate these malignancies, creating a pro‐inflammatory microenvironment. The interplay between modifiable factors and genetic predisposition, including high‐penetrance mutations and polygenic risk scores, highlights the complexity of prevention of gynecological cancers. Epigenetic mechanisms, such as DNA methylation and histone modifications, further modulate susceptibility and tumor progression, influenced by environmental and lifestyle exposures. In addition, promoting and supporting healthy lifestyle changes, including smoking cessation, increased physical activity, and a balanced diet, are crucial for improving long‐term outcomes and quality of life in gynecological cancer survivors. 
Addressing these factors through personalized prevention, leveraging predictive models incorporating genetics and modifiable risks, enables tailored lifestyle interventions and avoidance of environmental exposures. Combined with equitable public health initiatives, these strategies have the potential to reduce the burden of gynecological cancers and improve women's health globally.
... Increases in cortisol levels may be directly or indirectly associated with some of the increased risk for certain types of cancer associated with oral contraception or even its impact on mental health (52)(53)(54)(55)(56). However, the use of oral contraception is also associated with a lower risk of ovarian cancer (57). In our study, the levels of progestogens, estrogens, and several androgens were lower in women using oral contraceptive pills than in postmenopausal women. ...
Steroid hormone levels vary greatly among individuals, between sexes, with age, and across health and disease. What drives variance in steroid hormones and how they vary in individuals over time are not well studied. To address these questions, we measured 17 steroid hormones in a sex-balanced cohort of 949 healthy donors aged 20 to 69 years. We investigated associations between steroid levels and biological sex, age, clinical and demographic data, genetics, and plasma proteomics. Steroid hormone levels were strongly affected by sex and age, and a high number of lifestyle habits. Key observations were the broad impact of hormonal birth control in female donors and the relationship with smoking in male donors. In a 10-year follow-up study, we identified significant associations between steroid hormone levels and health status only in male donors. These observations highlight biological and lifestyle parameters affecting steroid hormones, and underlie the importance of considering sex, age, and potentially gendered behaviors in the treatment of hormone-related diseases.
Introduction It has been estimated that 19,880 new cases of ovarian cancer were diagnosed in 2022. Most epithelial ovarian cancers are sporadic, while in 15%–25% of cases, there is evidence of a familial or inherited component. Approximately 20%–25% of high-grade serous carcinoma cases are caused by germline mutations in the BRCA1 and BRCA2 genes. However, owing to a lack of effective early detection methods, women with BRCA mutations are recommended to undergo bilateral risk-reducing salpingo-oophorectomy (RRSO) after childbearing. Determining the right timing for this procedure is a difficult decision. It is crucial to find a clinical signature to identify high-risk BRCA-mutated patients and determine the appropriate timing for performing RRSO. Methods In this work, clinical data from a cohort of 184 patients, of whom 7.6% were affected by adnexal tumors (including invasive carcinomas and intraepithelial lesions) after RRSO, were analyzed. We proposed an explainable machine learning (ML) ensemble approach using clinical data commonly collected in clinical practice to identify early those BRCA-mutated patients at high risk of ovarian cancer and consequently establish the correct timing for RRSO. Results The ensemble model was able to handle imbalanced data, achieving an accuracy value of 83.2%, a specificity value of 85.3%, a sensitivity value of 57.1%, a G-mean value of 69.8%, and an AUC value of 71.1%. Discussion In agreement with the promising results achieved, the application of suitable ML techniques could play a key role in the definition of a BRCA-mutated patient-centric clinical signature for ovarian cancer risk and consequently personalize the management of these patients. As far as we know, this is the first work addressing this task from an ML perspective.
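The G-mean quoted in the abstract above can be checked from the reported sensitivity and specificity. The sketch below assumes the usual definition for imbalanced classification, the geometric mean of sensitivity and specificity, which the abstract itself does not spell out; the function name is hypothetical.

```python
import math


def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both as fractions)."""
    return math.sqrt(sensitivity * specificity)


# Values quoted in the abstract above.
sensitivity = 0.571
specificity = 0.853

value = g_mean(sensitivity, specificity)
print(f"G-mean = {value:.1%}")  # prints: G-mean = 69.8%
```

Under that assumed definition the result matches the reported 69.8%. Unlike accuracy, the G-mean drops sharply when either class is poorly detected.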
Introduction There is currently no satisfactory model for predicting malignant transformation of endometriosis. The aim of this study was to construct and evaluate a risk model incorporating noninvasive clinical parameters to predict endometriosis‐associated ovarian cancer (EAOC) in patients with endometriosis. Material and Methods We enrolled 6809 patients with endometriosis confirmed by pathology, and randomly allocated them to training (n = 4766) and testing cohorts (n = 2043). The proportion of patients with EAOC in each cohort was similar. We extracted a total of 94 demographic and clinicopathologic features from the medical records using natural language processing. We used a machine learning method – gradient‐boosting decision tree – to construct a predictive model for EAOC and to evaluate the accuracy of the model. We also constructed a multivariate logistic regression model inclusive of the EAOC‐associated risk factors using a back stepwise procedure. Then we compared the performance of the two risk‐predicting models using DeLong's test. Results The occurrence of EAOC was 1.84% in this study. The logistic regression model comprised 10 selected features and demonstrated good discrimination in the testing cohort, with an area under the curve (AUC) of 0.891 (95% confidence interval [CI] 0.821–0.960), sensitivity of 88.9%, and specificity of 76.7%. The risk model based on machine learning had an AUC of 0.942 (95% CI 0.914–0.969), sensitivity of 86.8%, and specificity of 86.7%. The machine learning‐based risk model performed better than the logistic regression model in DeLong's test (p = 0.036). Furthermore, in a prospective dataset, the machine learning‐based risk model had an AUC of 0.8758, a sensitivity of 94.4%, and a specificity of 73.8%. Conclusions The machine learning‐based risk model was constructed to predict EAOC and had high sensitivity and specificity. 
This model could be of considerable use in helping reduce medical costs and designing follow‐up schedules.
One of the common types of cancer for women is ovarian cancer. Still, at present, there are no drug therapies that can properly cure this deadly disease. However, early-stage detection could boost the life expectancy of the patients. The main aim of this work is to apply machine learning models along with statistical methods to the clinical data obtained from 349 patient individuals to conduct predictive analytics for early diagnosis. In statistical analysis, Student’s t-test as well as log fold changes of two groups are used to find the significant blood biomarkers. Furthermore, a set of machine learning models including Random Forest (RF), Support Vector Machine (SVM), Decision Tree (DT), Extreme Gradient Boosting Machine (XGBoost), Logistic Regression (LR), Gradient Boosting Machine (GBM) and Light Gradient Boosting Machine (LGBM) are used to build classification models to stratify benign-vs.-malignant ovarian cancer patients. Both of the analysis techniques recognized that the serum samples carbohydrate antigen 125, carbohydrate antigen 19-9, carcinoembryonic antigen and human epididymis protein 4 are the top-most significant biomarkers as well as neutrophil ratio, thrombocytocrit, hematocrit blood samples, alanine aminotransferase, calcium, indirect bilirubin, uric acid, natrium as general chemistry tests. Moreover, the results from predictive analysis suggest that the machine learning models can classify malignant patients from benign patients with accuracy as good as 91%. Since generally, early-stage detection is not available, machine learning detection could play a significant role in cancer diagnosis.
We present a simple and efficient hypothesis-free machine learning pipeline for risk factor discovery that accounts for non-linearity and interaction in large biomedical databases with minimal variable pre-processing. In this study, mortality models were built using gradient boosting decision trees (GBDT) and important predictors were identified using a Shapley values-based feature attribution method, SHAP values. Cox models controlled for false discovery rate were used for confounder adjustment, interpretability, and further validation. The pipeline was tested using information from 502,506 UK Biobank participants, aged 37–73 years at recruitment and followed over seven years for mortality registrations. From the 11,639 predictors included in GBDT, 193 potential risk factors had SHAP values ≥ 0.05, passed the correlation test, and were selected for further modelling. Of the total variable importance summed up, 60% was directly health related, and baseline characteristics, sociodemographics, and lifestyle factors each contributed about 10%. Cox models adjusted for baseline characteristics, showed evidence for an association with mortality for 166 out of the 193 predictors. These included mostly well-known risk factors (e.g., age, sex, ethnicity, education, material deprivation, smoking, physical activity, self-rated health, BMI, and many disease outcomes). For 19 predictors we saw evidence for an association in the unadjusted but not adjusted analyses, suggesting bias by confounding. Our GBDT-SHAP pipeline was able to identify relevant predictors ‘hidden’ within thousands of variables, providing an efficient and pragmatic solution for the first stage of hypothesis free risk factor identification.
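The feature attribution step described above rests on Shapley values. As a rough illustration, the sketch below computes exact Shapley values for one sample by brute-force enumeration of feature subsets. It is a simplified stand-in and not the pipeline's implementation: SHAP for tree ensembles uses a more efficient tree-specific algorithm and handles absent features differently, whereas here absent features are simply replaced by a baseline value. The toy model, feature names, and numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample by enumerating feature subsets.

    Features outside a subset are set to their baseline value. The cost
    grows as 2**n, so this is only suitable for tiny illustrations.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical risk score with a nonlinearity and an interaction."""
    age, bmi, parity = z
    return 0.02 * age + 0.001 * bmi ** 2 - 0.1 * parity + 0.005 * age * parity


x = [60.0, 30.0, 2.0]          # sample to explain (hypothetical)
baseline = [50.0, 25.0, 1.0]   # reference values (hypothetical)

phi = shapley_values(toy_model, x, baseline)
for name, contribution in zip(["age", "bmi", "parity"], phi):
    print(f"{name}: {contribution:+.4f}")
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the age-parity interaction is shared between those two features. In the toy model the BMI term does not interact with anything, so its attribution equals its own nonlinear contribution.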
In 2014, FIGO’s Committee for Gynecologic Oncology revised the staging of ovarian cancer, incorporating ovarian, fallopian tube, and peritoneal cancer into the same system. Most of these malignancies are high‐grade serous carcinomas (HGSC). Stage IC is now divided into three categories: IC1 (surgical spill); IC2 (capsule ruptured before surgery or tumor on ovarian or fallopian tube surface); and IC3 (malignant cells in the ascites or peritoneal washings). The updated staging includes a revision of Stage IIIC based on spread to the retroperitoneal lymph nodes alone without intraperitoneal dissemination. This category is now subdivided into IIIA1(i) (metastasis ≤10 mm in greatest dimension), and IIIA1(ii) (metastasis >10 mm in greatest dimension). Stage IIIA2 is now “microscopic extrapelvic peritoneal involvement with or without positive retroperitoneal lymph node” metastasis. This review summarizes the genetics, surgical management, chemotherapy, and targeted therapies for epithelial cancers, and the treatment of ovarian germ cell and stromal malignancies.
Background: Cancer is a leading cause of morbidity and mortality worldwide, and better understanding of the risk factors could enhance prevention. Methods: We conducted a hypothesis-free analysis combining machine learning and statistical approaches to identify cancer risk factors from 2828 potential predictors captured at baseline. There were 459,169 UK Biobank participants free from cancer at baseline and 48,671 new cancer cases during the 10-year follow-up. Logistic regression models adjusted for age, sex, ethnicity, education, material deprivation, smoking, alcohol intake, body mass index and skin colour (as a proxy for sun sensitivity) were used for obtaining adjusted odds ratios, with continuous predictors presented using quintiles (Q). Results: In addition to smoking, older age and male sex, positively associating features included several anthropometric characteristics, whole body water mass, pulse, hypertension and biomarkers such as urinary microalbumin (Q5 vs. Q1 OR 1.16, 95% CI = 1.13-1.19), C-reactive protein (Q5 vs. Q1 OR 1.20, 95% CI = 1.16-1.24) and red blood cell distribution width (Q5 vs. Q1 OR 1.18, 95% CI = 1.14-1.21), among others. High-density lipoprotein cholesterol (Q5 vs. Q1 OR 0.84, 95% CI = 0.81-0.87) and albumin (Q5 vs. Q1 OR 0.84, 95% CI = 0.81-0.87) were inversely associated with cancer. In sex-stratified analyses, higher testosterone increased the risk in females but not in males (Q5 vs. Q1 ORfemales 1.23, 95% CI = 1.17-1.30). Phosphate was associated with a lower risk in females but a higher risk in males (Q5 vs. Q1 ORfemales 0.94, 95% CI = 0.90-0.99 vs. ORmales 1.09, 95% CI 1.04-1.15). Conclusions: This hypothesis-free analysis suggests personal characteristics, metabolic biomarkers, physical measures and smoking as important predictors of cancer risk, with further studies needed to confirm causality and clinical relevance.
Incessant ovulation is believed to be a potential cause of epithelial ovarian cancer (EOC). Our previous investigations have shown that insulin-like growth factor (IGF2) and hepatocyte growth factor (HGF) in the ovulatory follicular fluid (FF) contributed to the malignant transformation initiated by p53 mutations. Here we examined the individual and synergistic impacts of IGF2 and HGF on enhancing the malignant properties of high-grade serous carcinoma (HGSC), the most aggressive type of EOC, and its precursor lesion, serous tubal intraepithelial carcinoma (STIC). In a mouse xenograft co-injection model, we observed that FF co-injection induced tumorigenesis of STIC-mimicking cells, FE25. Co-injection with IGF2 or HGF partially recapitulated the tumorigenic effects of FF, but co-injection with both resulted in a higher tumorigenic rate than FF. We analyzed the different transformation phenotypes influenced by these FF growth signals through receptor inhibition. The IGF signal was necessary for clonogenicity, while the HGF signal played a crucial role in the migration and invasion of STIC and HGSC cells. Both signals were necessary for the malignant phenotype of anchoring-independent growth but had little impact on cell proliferation. The downstream signals responsible for these HGF activities were identified as the tyrosine-protein kinase Met (cMET)/mitogen-activated protein kinase and cMET/AKT pathways. Together with the previous finding that the FF-IGF2 could mediate clonogenicity and stemness activities via the IGF-1R/AKT/mammalian target of rapamycin and IGF-1R/AKT/NANOG pathways, respectively, this study demonstrated the cooperation of the FF-sourced IGF and HGF growth signals in the malignant transformation and progression of HGSC through both common and distinct signaling pathways. These findings help develop targeted prevention of HGSC.
Ovarian cancer (OC) is the second most common cancer of the female reproductive system. Due to the asymptomatic nature of early stages of OC and an increasingly poor prognosis in later stages, methods of screening for OC are much desired. Furthermore, screening and diagnosis processes, in order to justify use on asymptomatic patients, must be convenient and non-invasive. Recent developments in machine-learning technologies have made this possible via techniques in the field of metabolomics. The objective of this research was to use existing metabolomics data on OC and various analytic methods to develop a machine-learning model for the classification of potentially OC-related metabolite biomarkers. Pathway analysis and metabolite-set enrichment analysis were performed on gathered metabolite sets. Quantitative molecular descriptors were then used with various machine-learning classifiers for the diagnostics of OC using related metabolites. We elucidated that the metabolites associated with OC used for machine-learning models are involved in five metabolic pathways linked to OC: Nicotinate and Nicotinamide Metabolism, Glycolysis/Gluconeogenesis, Aminoacyl-tRNA Biosynthesis, Valine, Leucine and Isoleucine Biosynthesis, and Alanine, Aspartate and Glutamate Metabolism. Several classification models for the identification of OC using related metabolites were created and their accuracies were confirmed through testing with 10-fold cross-validation. The most accurate model was able to achieve 85.29% accuracy. The elucidation of biological pathways specific to OC using metabolic data and the observation of changes in these pathways in patients have the potential to contribute to the development of screening techniques for OC. Our results demonstrate the possibility of development of the machine-learning models for OC diagnostics using metabolomics data.
Article
Background: The associations between female infertility and epithelial ovarian cancer (EOC) or endometrial cancer (EC) have been reported in observational studies, but the causal relationship remains unknown. We intended to assess the causal effect of female infertility on EOCs and ECs using a two-sample Mendelian Randomization (MR) approach. Methods: Large pooled genome-wide association study (GWAS) datasets for female infertility (6481 cases and 68 969 controls), EOC (25 509 cases and 40 941 controls), and EC (12 906 cases and 108 979 controls) were derived from public GWAS databases and published studies. The Inverse Variance Weighted method, Weighted Median method, MR-Egger regression, and MR-Pleiotropy Residual Sum and Outlier test were adopted for MR analyses. Results: Our results suggested that genetically predicted infertility was positively associated with the risk of EOC (OR = 1.117, 95% CI = 1.003-1.245, P = .045), but we did not find a causal relationship between infertility and EC (OR = 1.081, 95% CI = 0.954-1.224, P = .223). As to the reverse direction, our study did not obtain genetic evidence that EOCs (OR = 0.974, 95% CI = 0.825-1.150, P = .755) and ECs (OR = 1.039, 95% CI = 0.917-1.177, P = .548) were associated with an increased risk of infertility. Conclusions: This large MR analysis supported a causal association between female infertility and increased risk of EOCs, but did not find a causal relationship between infertility and ECs.
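For readers unfamiliar with the Inverse Variance Weighted method named above, a minimal Python sketch of the standard fixed-effect IVW estimator from per-variant summary statistics follows. It is not the study's code: the function names and the input numbers are hypothetical, and the sketch assumes independent variants and ignores uncertainty in the variant-exposure estimates.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal log odds ratio.

    Equivalent to a weighted regression of variant-outcome effects on
    variant-exposure effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    beta_ivw = numerator / denominator
    se_ivw = math.sqrt(1.0 / denominator)
    return beta_ivw, se_ivw


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log odds ratio and its standard error to an OR with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three genetic variants (illustration only).
bx = [0.10, 0.20, 0.15]       # variant effects on the exposure
by = [0.012, 0.018, 0.017]    # variant effects on the outcome (log odds)
se_by = [0.010, 0.010, 0.012]  # standard errors of the outcome effects

beta_ivw, se_ivw = ivw_estimate(bx, by, se_by)
or_ivw, ci_low, ci_high = odds_ratio_with_ci(beta_ivw, se_ivw)
```

The other methods listed in the abstract (weighted median, MR-Egger, MR-PRESSO) relax the assumption that every variant is a valid instrument and are not covered by this sketch.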
Article
Genetic correlations, causalities and pathways between large-scale complex exposures and ovarian and breast cancers need systematic exploration. Mendelian randomisation (MR) and genetic correlation (GC) were used to identify causal biomarkers from 95 cancer-related exposures for risk of breast cancer [BC: oestrogen receptor-positive (ER+ BC) and oestrogen receptor-negative (ER− BC) subtypes] and ovarian cancer [OC: high-grade serous (HGSOC), low-grade serous, invasive mucinous (IMOC), endometrioid (EOC) and clear cell (CCOC) subtypes]. Of 31 identified robust risk factors, 16 were new causal biomarkers for BC and OC. Body mass index (BMI), body fat mass (BFM), comparative body size at age 10 (CBS-10), waist circumference (WC) and education attainment were shared risk factors for overall BC and OC. Childhood obesity, BMI, CBS-10, WC, schizophrenia and age at menopause were significantly associated with ER+ BC and ER− BC. Omega-6:omega-3 fatty acids, body fat-free mass and basal metabolic rate were positively associated with CCOC and EOC; BFM, linoleic acid, omega-6 fatty acids, CBS-10 and birth weight were significantly associated with IMOC; and body fat percentage, BFM and adiponectin were significantly associated with HGSOC. Both GC and MR identified 13 shared factors. Factors were stratified into five priority levels, and visual causal networks were constructed for future interventions. With analysis of large-scale exposures for breast and ovarian cancers, causalities, genetic correlations, shared or specific factors, risk factor priority and causal pathways and networks were identified.