All content in this area was uploaded by Thomas W Kelsey on Sep 25, 2020
Harnessing the power of data:
Applications of data science in
reproductive medicine
Tom Kelsey
Professor of Health Data Science
School of Computer Science
University of St Andrews
Inaugural Lecture
23rd September 2020
Based on Scott Nelson’s keynote
lecture at ESHRE 2020
Acknowledgements & Background
Kelsey, 1995
Roger Fletcher, FRS FRSE
1939 –2016
SIAM Lagrange Prize, 2006
Royal Medal of the Royal
Society of Edinburgh, 2008
•Optimisation is the mathematics of
decision making
•Continuous values
•Numeric uncertainty, algorithms, heuristics
Acknowledgements & Background
Wallace et al., 2003; Wallace et al., 2005
W H B Wallace FRCP FRCPCH FRCS
Consultant paediatric oncologist at the Royal
Hospital for Sick Children, Edinburgh
Professor, MRC Centre for Reproductive
Health, University of Edinburgh
•Late effects research
•Minimise the long-term effects of radio-
and chemotherapies on survivors
•Whilst maintaining cure rates
Fertility after Radiotherapy
Wallace et al., 2003; Anderson et al., Lancet Diabetes Endocrinol. 2015
•Estimate LD50 for the human
oocyte
•Use to plan conformal RXT to
optimise dose to the least-
affected ovary
•Calculate window of opportunity
for fertility
•Calculate the age-related effective
sterilising dose
•Use to inform fertility
preservation decision making
•Minimise the long-term effects of
radiotherapy on healthy tissue
•Whilst maintaining cure rates
Fertility after Radiotherapy
Kelsey et al., 2020 Lancet Oncology (in preparation); source: Danny Indelicato, MD - University of Florida Proton Therapy Institute
Photon plan
Proton plan
Fertility after Radiotherapy
Kelsey et al., 2020 Lancet Oncology (in preparation); source: Danny Indelicato, MD - University of Florida Proton Therapy Institute
•8-year-old patient (for example)
•CSI plan for pineoblastoma treatment with 36 Gy
•Revised radiosensitivity modelling using externally validated model of ovarian reserve
•Estimate age at premature ovarian insufficiency for two types of RXT
•Mean dose to ovaries 3.28 Gy: Infertile at age 15 with photon RXT
•Mean dose to ovaries 0.96 Gy: Infertile at age 33 with proton RXT
•No change in dose to tumour
•No change in dose to craniospinal fields
•No change in effectiveness of the cancer treatment
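The dose comparison above can be made concrete with a minimal sketch. It assumes, purely for illustration, that oocyte survival falls exponentially with ovarian dose and that the LD50 is 2 Gy. The function name and the LD50 value are assumptions, not the model used in the cited work, which also combines dose with an age-related model of ovarian reserve.

```python
def surviving_fraction(dose_gy: float, ld50_gy: float = 2.0) -> float:
    """Fraction of oocytes surviving a dose, assuming exponential dose-response.

    ld50_gy is an assumed illustrative value: the dose that halves the pool.
    """
    if dose_gy < 0 or ld50_gy <= 0:
        raise ValueError("dose must be non-negative and LD50 positive")
    return 0.5 ** (dose_gy / ld50_gy)


# Mean ovarian doses quoted on the slide for the two plans.
for label, dose in [("photon", 3.28), ("proton", 0.96)]:
    print(f"{label}: {surviving_fraction(dose):.0%} of oocytes survive (assumed LD50 = 2 Gy)")
```

Under this simplified assumption the lower proton dose always leaves a larger surviving pool; translating that into an age at ovarian insufficiency needs the ovarian reserve model, which is not reproduced here.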
Dunstan et al., 1999; Dunstan et al., 1998; Gent et al., 2002; Kelsey et al., 2004; Kelsey et al., 2014
Steve Linton Ursula Martin Ian Gent Colva Roney-Dougal Ian Miguel
Acknowledgements & Background
Collection & use of data has
transformed our lives
Tracking the
spread of
SARS-CoV-2
Reuters accessed Sep 8 2020
Trade-offs between axes of data
Shilo et al Nature Medicine 2020
Can we extend and improve medical and
healthcare decision making by careful
analysis of these data?
The Lancet
Global Health
2018
The standard view of ground truth has
strengths and weaknesses
•Trade-offs in axes of data are hidden or
non-existent
•Meta-analysis and systematic review
often reveal paucity of high-quality data
•Tried and tested
•Easy to understand
•Confidence increases at each stage
Case reports can transform medicine
Steptoe and Edwards Lancet 1978, Trounson and Mohr Nature 1983, Porter et al Lancet 1984, Lutjen et al Nature 1984,
Palermo et al Lancet 1992, Handyside et al Nature 1990, Craft et al Lancet 1993, Donnez et al Lancet 2004, Brannstrom et al Lancet 2015
But can also be outliers with
no useful generalisation
Wallace et al., 2003; Jeppesen et al., 2013; Anderson et al., 2013; McLaughlin et al., 2016; Mamsen et al., 2019; Mamsen et al., 2017
N = 6
Childhood cancer
N = 87
Cancer
N = 59
Breast cancer
N = 13
Ovarian tissue
N = 57
Turner syndrome
Cohort studies can lack power
when data are scarce
N = 39
Foetal tissue
Important insights
from rare cases
Results are
indicative rather
than definitive
Trials: often too small to provide reliable
estimates of risk / benefit balance
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention to
raise IVF live
birth rates
40 to 44% N = 4,778
need to be
randomised for
80% power
α 0.05
Trials: often too small to provide reliable
estimates of risk / benefit balance
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention
30 to 35% N = 2,752
need to be
randomised for
80% power
α 0.05
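The sample sizes on these two slides can be approximated with a short sketch. The slides do not say which formula was used, so this assumes the classical normal approximation for comparing two proportions with a pooled variance under the null, a two-sided α and equal allocation. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_treat: float,
              alpha: float = 0.05, power: float = 0.80) -> float:
    """Unrounded participants per arm for a two-sided test of two proportions.

    Uses the normal approximation with a pooled variance under the null.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    return ((z_alpha * null_sd + z_beta * alt_sd) / (p_treat - p_control)) ** 2


for p0, p1 in [(0.40, 0.44), (0.30, 0.35)]:
    total = 2 * n_per_arm(p0, p1)
    print(f"{p0:.0%} to {p1:.0%}: about {total:,.0f} participants in total")
```

Under these assumptions the totals land within a participant or two of the 4,778 and 2,752 quoted above; the exact figure depends on the formula and rounding convention chosen.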
Iliodromiti et al., 2013; Iliodromiti et al., 2014; Iliodromiti et al., 2016; van der Kooi et al., 2019
AMH
PCOS
AMH
Live Birth
Adiponectin
Gestational Diabetes
Perinatal complications
Cancer survivors
Meta-analysis is important but
caution is needed
Quality of evidence
is often low
Random effects
indicated by study
heterogeneity…
…but then outcome
variance is poor due
to small number of
included studies
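The point about random effects with few studies can be illustrated with a minimal sketch of DerSimonian-Laird pooling, one common random-effects estimator (the slides do not name the estimator used in the cited meta-analyses). The function name and the input values are assumptions; the inputs are synthetic.

```python
from math import sqrt


def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooled estimate; returns (estimate, standard error, tau2)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures how far studies scatter around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return estimate, sqrt(1.0 / sum(w_star)), tau2


# Synthetic log odds ratios and variances for three small studies (not real data).
est, se, tau2 = pool_random_effects([0.10, 0.60, -0.20], [0.04, 0.05, 0.06])
print(f"pooled = {est:.2f}, SE = {se:.2f}, tau^2 = {tau2:.2f}")
```

With only a handful of studies the between-study variance is itself estimated from very little information, so the pooled standard error widens and remains unstable.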
Meta-analysis can magnify belief
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention
30 to 35%
Protocol Belief
Triangulation of evidence is critical
• Conventional approaches
- Randomised controlled trials
- Multivariable regression in observational data
• Refinements using general populations
- Cross-context comparisons
- Different control groups
- Natural experiments
• Refinements using specific populations
- Within sibling comparisons
• Refinements of exposure
- Instrument variable analyses
- Exposure negative control studies
• Refinements of outcome
- Outcome negative control studies
Lawlor et al IJE 2016
All of these have
different
assumptions and key
sources of bias
Let’s harness all of the data science
tools at our disposal to enable
high performance medicine
Validation by independent data
becomes a crucial determinant of
research quality
Observational data model validation
Wallace & Kelsey, PLoS ONE 2010; Depmann et al., JCEM 2015
Number of potential eggs (NGFs) in the human ovary
Predictions compared to later
observations from a population-
based cohort
Observational data model validation
Kelsey et al., PLoS ONE 2013; McLaughlin et al., JARG 2015
Ovarian volume throughout life
Predictions compared to later
observations after combination with
the NGF model
Observational data approaches have been critical
to our understanding of AMH and its utility
Nelson et al Fertil Steril 2011, Nelson et al RBM Online 2011
Derived N = 9,601
Validated N = 15,834
Derived N = 5,492
Validated N = 5,492
Compared N = 9,601
•AMH is a product of preantral and
small antral follicles in women
•As such, AMH is only present in
the ovary until menopause
•Can it be used as a biomarker for
remaining ovarian reserve?
•First studies are promising, but
are based on infertile subjects
Observational data approaches have been critical
to our understanding of AMH and its utility
Kelsey et al PLOS One 2011, Kelsey et al Mol Hum Reprod 2012; Jeffery et al J Ped Endocrinol Metab 2015
Intrauterine
Childhood
Adult
•AMH model from
conception to
menopause
•Validated for adult
ages
•Validated for
childhood/pubertal
ages using 10-year
longitudinal data
•AMH now accepted
as biomarker
Observational data approaches have been critical
to our understanding of AMH and its utility
Perry et al., Hum Mol Genet 2016; Iliodromiti et al HRU 2014; Khader et al J Ovarian Res 2013
Live birth
prediction
•“Our findings provide genetic
support for the well-
established use of AMH as a
marker of ovarian reserve”
•AMH now routinely used as
adjunct to the 2003 criteria
for PCOS diagnosis
•Predicted live births based on
AMH match observations
•AMH now used effectively as a
biomarker
•Providing further validation of
the underlying models
Using AMH to inform fertility preservation for
survivors of cancer
Anderson et al 2013 Eur J Cancer; Anderson et al 2020 in prep.
•Pre-treatment AMH predicts
loss of ovarian function
after chemotherapy for early
breast cancer
•6-month post-treatment AMH
has high PPV for impaired
fertility
•Pre- and post-chemo AMH
combined with BMI, age,
parity and endocrine factors
has high diagnostic utility
•We can optimise and
personalise post-chemo
endocrine therapy
Population data studies have been
externally validated
Templeton et al Lancet 1996; Smith et al JAMA 2015; Smith et al RBMOnline 2020
52,507 cycles 1991–1994
271,438 UK IVF cycles 2003–2010
135,673 US cycles in 2016
Use of natural experiment design to
assess effect of smoking with subsequent
replication
Mackay DF et al 2012 PLoS Med; Been et al Scientific Reports 2015
SGA
reduced
4.52%
716,941 women
10,238,950 live-births
England
July 2007
ban
Using large national datasets to redefine risk
Anderson et al. Hum Rep 2018
•Cancer registry data linked to
maternity data
•Matched controls for each
exposed subject
•All Scottish data from 1983 to
2017
•Cancer survivors were 38%
less likely to achieve a
pregnancy
•New insights, better
counselling
•Extend with mental health,
SIMD and admission datasets
Using large national datasets to redefine risk
Anderson et al. Hum Rep 2018
•Cancer registry data linked to
maternity data
•Calculate risk by diagnosis,
age at treatment, decade of
treatment, treatment type
•New insights, better
counselling
•Extend with mental health,
SIMD and admission datasets
Using large national datasets to validate
criteria for fertility preservation
Wallace et al 2014 Lancet Oncology
•Diagnosis, treatment and
follow-up data
•Cryopreservation still
considered experimental
•Over 130 healthy babies
•Success rate about 30%
•New insights, better
counselling
•Extend with more detailed
follow-up data
Using large national datasets to validate
criteria for fertility preservation
Wallace et al 2014 Lancet Oncology
•Diagnosis, treatment and
follow-up data
•Compare early menopause
risk for those offered & not
offered ovarian
cryopreservation
•Results strongly in favour of
criteria which minimise
overall risk
•New insights, better
counselling
•Extend with more detailed
follow-up data
15-year probability 35% [95% CI 10–53] vs 1% [0–2]
p<0.0001
Hazard ratio 56.8 [95% CI 6.2–521.6] at 10 years
Where is the modern AI?
• Cohort studies
- Regression models
- Multivariable logistic regression
- Dose-response models
• Observational data studies
- Life tables using Kaplan-Meier
- PK style modelling using ODEs
- Cox proportional hazards
- Normative age-related models
• Meta analyses
- Hierarchical summary ROC curves
- Fixed & random effects meta-regression
All of these are well-
understood statistical
and/or optimisation
methods
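As an example of how well understood these methods are, the Kaplan-Meier estimator listed above fits in a few lines. This is a minimal sketch with a hypothetical function name and synthetic follow-up data, not an analysis from any of the cited studies.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 if the outcome was observed at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - died / at_risk
        curve.append((t, survival))
    return curve


# Synthetic follow-up in years; 0 marks a censored subject (not real data).
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 0, 1]))
```

Censored subjects count towards the number at risk until they leave follow-up, which is what lets the estimator use incomplete records.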
Machine learning within IVF has been
proposed for all areas of the pathway
Kort et al Fertil Steril 2018; Hicks et al Scientific Reports 2019; Wang et al Infect Dis Pov 2020; McCallum et al Commun Biology 2019, Coopers Genomics, TMRW, RI Witness
Customer communications and engagement
Donor mapping
Ovarian stimulation
Financial models
Semen analysis
Crowdsourcing
Sperm selection
Oocyte diagnosis
Live and alarmed KPI assessments
Live and alarmed QC assessments
Equipment monitoring
PGTai
Endometrial assessment
Prediction models
AI as a strategy to improve ovarian stimulation
Inputs
Random Forest Analysis
Outputs
Variable Importance
Prediction
Cohort simulations for case-control investigation
Morphokinetic data
Patient phenotype
Longitudinal endocrine data [plot: serum LH / hCG against time (hrs)]
Output 1 = Yes
Output 2 = No
Output 3 = Yes
Output n = Yes
Majority vote
Output = Yes
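The voting step shown here can be sketched in a few lines. This is a minimal illustration of majority voting over the outputs of individual trees, with a hypothetical function name; it is not the implementation used in the cited studies.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the label predicted by the largest number of trees."""
    if not tree_outputs:
        raise ValueError("need at least one tree output")
    return Counter(tree_outputs).most_common(1)[0][0]


# The four tree outputs shown on the slide.
print(majority_vote(["Yes", "No", "Yes", "Yes"]))  # prints "Yes"
```

In a full random forest each tree is trained on a bootstrap sample with a random subset of features, which is what makes the individual votes usefully different.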
Abbara et al Front Endo 2018; Abbara et al Front Endo 2019; Andersen et al Fertil Steril 2020
AI as a strategy to improve endocrine
therapy after breast cancer
Inputs
Machine Learning Analysis
Outputs
Variable Importance
Prediction
Cohort simulations for case-control investigation
Treatment data
Patient phenotype
Longitudinal endocrine data [plot: serum LH / hCG against time (hrs)]
If A then B
PPV 87.9%
ACC 54.9%
Neural Net
PPV
ACC
Random Forest
PPV 100%
AUC 69.4%
SVM
PPV
ACC
GB Dec. Tree
PPV
ACC
Log. Reg.
PPV
ACC
Anderson et al Eur J Cancer 2020 (under review)
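The two metrics reported for each classifier above can diverge sharply, as a small sketch shows. The function names are hypothetical and the counts are synthetic, chosen only to show how a classifier can reach 100% PPV with modest accuracy; they are not the study data.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of positive calls that are correct."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (tp + tn) / total


# Synthetic counts (not study data): a cautious classifier that rarely says "positive".
tp, fp, tn, fn = 10, 0, 50, 40
print(f"PPV = {ppv(tp, fp):.0%}, accuracy = {accuracy(tp, fp, tn, fn):.0%}")
```

A model that only flags the clearest cases never raises a false alarm, yet misses most true positives, so PPV alone says little about overall performance.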
AI as a strategy to improve
lung cancer screening
Augustine et al J Biomed Informatics 2020 (under review)
AI as a strategy to improve
diagnosis and staging of dementia
Skackauskas et al Neurocomputing 2020 (in preparation)
ADNI 4D fMRI data
Preprocessing, calibration, registration
CNN
Model evaluation
Due process of AI studies still required
Nagendran et al BMJ 2020; Topol Nature Medicine 2019; Morse et al Nature Medicine 2020
Deep neural network
•Publish in accordance with
existing reporting standards
Clinical validation in
real-world medicine
•Publish, RCTs showing benefit,
Regulatory approval
Implementation in
healthcare
•Cost of implementation
•How many workflows will be
affected?
•Does the model increase the
efficiency of existing workflows?
•Is the model being deployed within
an existing digital workflow?
Conclusions
•Triangulation of evidence: multiple angles, one truth
•Large, simple, randomised controlled trials are essential for medicine,
but so are other scientific approaches to data
•The convergence of data science and human intelligence provides us
with a unique opportunity for practising high-performance reproductive
medicine
Colleagues
•Everyone at the School of Computer Science
•Edinburgh
•Hamish Wallace, Richard Anderson, Evelyn Telfer, …
•Copenhagen
•Stine Gry Kristensen, Claus Yding Andersen,…
•Imperial College
•Ali Abbara, Waljit Dhillo,…
•Glasgow
•Scott Nelson, Stamatina Iliodromiti
•St Andrews
•Gerry Humphris, Frank Sullivan,…
Thank you