Harnessing the power of data: Applications of data science in reproductive medicine
Tom Kelsey
Professor of Health Data Science
School of Computer Science
University of St Andrews
Inaugural Lecture
23rd September 2020
Based on Scott Nelson’s keynote
lecture at ESHRE 2020
Acknowledgements & Background
Kelsey, 1995
Roger Fletcher, FRS FRSE
1939–2016
SIAM Lagrange Prize, 2006
Royal Medal of the Royal
Society of Edinburgh, 2008
Optimisation is the mathematics of
decision making
Continuous values
Numeric uncertainty, algorithms, heuristics
Acknowledgements & Background
Wallace et al., 2003; Wallace et al., 2005
W H B Wallace FRCP FRCPCH FRCS
Consultant paediatric oncologist at the Royal
Hospital for Sick Children, Edinburgh
Professor, MRC Centre for Reproductive
Health, University of Edinburgh
Late effects research
Minimise the long-term effects of radio-
and chemotherapies on survivors
Whilst maintaining cure rates
Fertility after Radiotherapy
Wallace et al., 2003; Anderson et al., Lancet Diabetes Endocrinol. 2015
Estimate LD50 for the human
oocyte
Use to plan conformal RXT to
optimise dose to the least-
affected ovary
Calculate window of opportunity
for fertility
Calculate the age-related effective
sterilising dose
Use to inform fertility
preservation decision making
Minimise the long-term effects of
radiotherapy on healthy tissue
Whilst maintaining cure rates
Fertility after Radiotherapy
Kelsey et al., 2020 Lancet Oncology (in preparation); source: Danny Indelicato, MD - University of Florida Proton Therapy Institute
Photon
Plan
Proton
Plan
Fertility after Radiotherapy
Kelsey et al., 2020 Lancet Oncology (in preparation); source: Danny Indelicato, MD - University of Florida Proton Therapy Institute
8 year old patient (say)
CSI plan for pineoblastoma treatment with 36 Gy
Revised radiosensitivity modelling using externally validated model of ovarian reserve
Estimate age at premature ovarian insufficiency for two types of RXT
Mean dose to ovaries 3.28 Gy: Infertile at age 15 with photon RXT
Mean dose to ovaries 0.96 Gy: Infertile at age 33 with proton RXT
No change in dose to tumour
No change in dose to craniospinal fields
No change in effectiveness of the cancer treatment
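As a rough illustration of why the lower ovarian dose matters, the sketch below assumes a simple exponential dose-response in which each LD50 of dose halves the surviving oocyte pool. The function name and the LD50 value of 2 Gy are illustrative assumptions, not figures from these slides, and the revised radiosensitivity model behind the age estimates above is not reproduced here.

```python
def surviving_fraction(dose_gy: float, ld50_gy: float) -> float:
    """Fraction of oocytes surviving a dose, assuming simple exponential kill.

    By definition of LD50, a dose equal to ld50_gy leaves half the pool.
    """
    if dose_gy < 0 or ld50_gy <= 0:
        raise ValueError("dose must be >= 0 and LD50 must be > 0")
    return 0.5 ** (dose_gy / ld50_gy)


# Hypothetical LD50, chosen only for illustration (an assumption).
ASSUMED_LD50_GY = 2.0

# Mean ovarian doses quoted on the slide for the two plans.
for plan, dose in (("photon", 3.28), ("proton", 0.96)):
    fraction = surviving_fraction(dose, ASSUMED_LD50_GY)
    print(f"{plan}: {dose} Gy -> {fraction:.0%} of oocytes surviving")
```

Turning a surviving fraction into an age at premature ovarian insufficiency additionally needs an age-related model of ovarian reserve, which this sketch deliberately leaves out.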
Dunstan et al., 1999; Dunstan et al., 1998; Gent et al., 2002; Kelsey et al., 2004; Kelsey et al., 2014
Steve Linton, Ursula Martin, Ian Gent, Colva Roney-Dougal, Ian Miguel
Acknowledgements & Background
Collection & use of data has
transformed our lives
Tracking the
spread of
SARS-CoV-2
Reuters accessed Sep 8 2020
Trade-offs between axes of data
Shilo et al Nature Medicine 2020
Can we extend and improve medical and
healthcare decision making by careful
analysis of these data?
The Lancet
Global Health
2018
The standard view of ground truth has
strengths and weaknesses
Trade-offs in axes of data are hidden or
non-existent
Meta-analysis and systematic review
often reveal paucity of high-quality data
Tried and tested
Easy to understand
Confidence increases at each stage
Case reports can transform medicine
Steptoe and Edwards Lancet 1978, Trounson and Mohr Nature 1983, Porter et al Lancet 1984, Lutjen et al Nature 1984,
Palermo et al Lancet 1992, Handyside et al Nature 1990, Craft et al Lancet 1993, Donnez et al Lancet 2004, Brannstrom et al Lancet 2015
But can also be outliers with
no useful generalisation
Wallace et al., 2003; Jeppesen et al., 2013; Anderson et al., 2013; McLaughlin et al., 2016; Mamsen et al., 2019; Mamsen et al., 2017
N = 6
Childhood cancer
N = 87
Cancer
N = 59
Breast cancer
N = 13
Ovarian tissue
N = 57
Turner syndrome
Cohort studies can lack power
when data are scarce
N = 39
Foetal tissue
Important insights
from rare cases
Results are
indicative rather
than definitive
Trials: often too small to provide reliable
estimates of risk / benefit balance
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention to raise IVF live birth rates from 40% to 44%
N = 4,778 need to be randomised for 80% power at α = 0.05
Trials: often too small to provide reliable
estimates of risk / benefit balance
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention: 30% to 35%
N = 2,752 need to be randomised for 80% power at α = 0.05
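The two sample sizes above can be checked with a standard two-proportion calculation. The sketch below is one way to do it, assuming two equal arms, a two-sided test, the normal approximation with a pooled variance under the null, and rounding each arm to the nearest participant. The slides do not say which formula was used, so the function name and these choices are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def total_sample_size(p_control: float, p_treated: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N (both arms) to detect p_control vs p_treated."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    p_bar = (p_control + p_treated) / 2            # pooled rate under the null
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_control * (1 - p_control)
                             + p_treated * (1 - p_treated))
    per_arm = ((null_term + alt_term) / (p_treated - p_control)) ** 2
    # Assumption: round each arm to the nearest whole participant.
    return 2 * round(per_arm)


print(total_sample_size(0.40, 0.44))  # 4778
print(total_sample_size(0.30, 0.35))  # 2752
```

Under these assumptions the totals agree with the figures on the slides. Rounding each arm up instead would add two participants to the second total.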
Iliodromiti et al., 2013; Iliodromiti et al., 2014; Iliodromiti et al., 2016; van der Kooi et al., 2019
AMH and PCOS
AMH and live birth
Adiponectin and gestational diabetes
Perinatal complications in cancer survivors
Meta-analysis is important but
caution is needed
Quality of evidence
is often low
Random effects
indicated by study
heterogeneity…
…but then outcome
variance is poor due
to small number of
included studies
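To make the random-effects point concrete, here is a minimal sketch of the DerSimonian-Laird estimator, one common random-effects method. The slides do not say which estimator the cited meta-analyses used, and the three effect sizes and variances are hypothetical numbers chosen only to show the mechanics.

```python
from math import sqrt


def dersimonian_laird(effects, variances):
    """Return (pooled effect, standard error, tau^2) for a random-effects model."""
    k = len(effects)
    weights = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero; it rests on k - 1 degrees of freedom.
    tau2 = max(0.0, (q - (k - 1)) / c)
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, sqrt(1 / sum(re_weights)), tau2


# Hypothetical inputs: three small studies with equal within-study variance.
pooled, se, tau2 = dersimonian_laird([0.1, 0.5, 0.9], [0.04, 0.04, 0.04])
print(f"pooled = {pooled:.2f}, SE = {se:.3f}, tau^2 = {tau2:.2f}")
```

With only three studies, tau^2 is estimated from two degrees of freedom, which is the weakness the slide describes: heterogeneity calls for random effects, but few studies make the variance estimate unreliable.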
Meta-analysis can magnify belief
Stocking et al Hum Repro 2019; de Vries et al Psychol Med. 2018
Intervention
30 to 35%
Protocol Belief
Triangulation of evidence is critical
Conventional approaches
- Randomised controlled trials
- Multivariable regression in observational data
Refinements using general populations
- Cross-context comparisons
- Different control groups
- Natural experiments
Refinements using specific populations
- Within sibling comparisons
Refinements of exposure
- Instrument variable analyses
- Exposure negative control studies
Refinements of outcome
- Outcome negative control studies
Lawlor et al IJE 2016
All of these have
different
assumptions and key
sources of bias
Let’s harness all of the data science
tools at our disposal to enable
high performance medicine
Validation by independent data
becomes a crucial determinant of
research quality
Observational data model validation
Wallace & Kelsey, PLoS ONE 2010; Depmann et al., JCEM 2015
Number of potential eggs (NGFs) in the human ovary
Predictions compared to later
observations from a population-
based cohort
Observational data model validation
Kelsey et al., PLoS ONE 2013; McLaughlin et al., JARG 2015
Ovarian volume throughout life
Predictions compared to later
observations after combination with
the NGF model
Observational data approaches have been critical
to our understanding of AMH and its utility
Nelson et al Fertil Steril 2011, Nelson et al RBM Online 2011
Derived N = 9,601
Validated N = 15,834
Derived N = 5,492
Validated N = 5,492
Compared N = 9,601
AMH is a product of preantral and
small antral follicles in women
As such, AMH is only present in
the ovary until menopause
Can it be used as a biomarker for
remaining ovarian reserve?
First studies are promising, but
are based on infertile subjects
Observational data approaches have been critical
to our understanding of AMH and its utility
Kelsey et al PLOS One 2011, Kelsey et al Mol Hum Reprod 2012; Jeffery et al J Ped Endocrinol Metab 2015
Intrauterine
Childhood
Adult
AMH model from
conception to
menopause
Validated for adult
ages
Validated for
childhood/pubertal
ages using 10-year
longitudinal data
AMH now accepted
as biomarker
Observational data approaches have been critical
to our understanding of AMH and its utility
Perry et al., Hum Mol Genet 2016; Iliodromiti et al HRU 2014; Khader et al J Ovarian Res 2013
Live birth
prediction
“Our findings provide genetic
support for the well-
established use of AMH as a
marker of ovarian reserve”
AMH now routinely used as
adjunct to the 2003 criteria
for PCOS diagnosis
Predicted live births based on
AMH match observations
AMH now used effectively as a
biomarker
Providing further validation of
the underlying models
Using AMH to inform fertility preservation for
survivors of cancer
Anderson et al 2013 Eur J Cancer; Anderson et al 2020 in prep.
Pre-treatment AMH predicts
for loss of ovarian function
after chemotherapy for early
breast cancer
6-month post-treatment AMH
has high PPV for impaired
fertility
Pre- and post-chemo AMH
combined with BMI, age,
parity and endocrine factors
has high diagnostic utility
We can optimise and
personalise post-chemo
endocrine therapy
Population data studies have been
externally validated
Templeton et al Lancet 1996; Smith et al JAMA 2015; Smith et al RBMOnline 2020
52,507 cycles 1991–1994
271,438 UK IVF cycles 2003–2010
135,673 US cycles in 2016
Use of natural experiment design to
assess effect of smoking with subsequent
replication
Mackay DF et al 2012 PLoS Med; Been et al Scientific Reports 2015
SGA reduced 4.52%
716,941 women
10,238,950 live births
England, July 2007 ban
Using large national datasets to redefine risk
Anderson et al. Hum Rep 2018
Cancer registry data linked to
maternity data
Matched controls for each
exposed subject
All Scottish data from 1983 to
2017
Cancer survivors were 38%
less likely to achieve a
pregnancy
New insights, better
counselling
Extend with mental health,
SIMD and admission datasets
Using large national datasets to redefine risk
Anderson et al. Hum Rep 2018
Cancer registry data linked to
maternity data
Calculate risk by diagnosis,
age at treatment, decade of
treatment, treatment type
New insights, better
counselling
Extend with mental health,
SIMD and admission datasets
Using large national datasets to validate
criteria for fertility preservation
Wallace et al 2014 Lancet Oncology
Diagnosis, treatment and
follow-up data
Cryopreservation still
considered experimental
Over 130 healthy babies
Success rate about 30%
New insights, better
counselling
Extend with more detailed
follow-up data
Using large national datasets to validate
criteria for fertility preservation
Wallace et al 2014 Lancet Oncology
Diagnosis, treatment and
follow-up data
Compare early menopause
risk for those offered & not
offered ovarian
cryopreservation
Results strongly in favour of
criteria which minimise
overall risk
New insights, better
counselling
Extend with more detailed
follow-up data
15-year probability 35% [95% CI 10–53] vs 1% [0–2]
p<0.0001
Hazard ratio 56.8 [95% CI 6.2–521.6] at 10 years
Where is the modern AI?
Cohort studies
- Regression models
- Multivariable logistic regression
- Dose-response models
Observational data studies
- Life tables using Kaplan-Meier
- PK style modelling using ODEs
- Cox proportional hazards
- Normative age-related models
Meta-analyses
- Hierarchical summary ROC curves
- Fixed & random effects meta-regression
All of these are well-
understood statistical
and/or optimisation
methods
Machine learning within IVF has been
proposed for all areas of the pathway
Kort et al Fertil Steril 2018; Hicks et al Scientific Reports 2019; Wang et al Infect Dis Pov 2020; McCallum et al Commun Biology 2019, Coopers Genomics, TMRW, RI Witness
Customer communications and engagement
Donor mapping
Ovarian stimulation
Financial models
Semen analysis
Crowdsourcing
Sperm selection
Oocyte diagnosis
Live and alarmed KPI assessments
Live and alarmed QC assessments
Equipment monitoring
PGTai
Endometrial assessment
Prediction models
AI as a strategy to improve ovarian stimulation
Inputs: morphokinetic data, patient phenotype, longitudinal endocrine data
[Figure: serum LH / hCG against time (hrs), 0 to 36 hours]
Random Forest analysis
Output 1 = Yes
Output 2 = No
Output 3 = Yes
Output n = Yes
Majority vote
Output = Yes
Outputs: variable importance, prediction, cohort simulations for case-control investigation
Abbara et al Front Endo 2018; Abbara et al Front Endo 2019; Andersen et al Fert Stert 2020
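The voting step in the random forest diagram can be sketched in a few lines. This covers only the aggregation of per-tree outputs and assumes each tree has already produced a "Yes" or "No". The function name is hypothetical, and the sketch does not train any trees or reproduce the cited models.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the label predicted by the largest number of trees."""
    if not tree_outputs:
        raise ValueError("at least one tree output is required")
    counts = Counter(tree_outputs)
    # most_common(1) returns the top (label, count) pair; on a tie it keeps
    # the label that appeared first, so an odd number of trees avoids ties.
    label, _ = counts.most_common(1)[0]
    return label


# The four outputs shown on the slide: Yes, No, Yes, Yes.
print(majority_vote(["Yes", "No", "Yes", "Yes"]))  # Yes
```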
AI as a strategy to improve endocrine
therapy after breast cancer
Inputs: treatment data, patient phenotype, longitudinal endocrine data
[Figure: serum LH / hCG against time (hrs), 0 to 36 hours]
Machine Learning analysis
Outputs: variable importance, prediction, cohort simulations for case-control investigation
If A then B: PPV 87.9%, ACC 54.9%
Neural Net: PPV, ACC
Random Forest: PPV 100%, AUC 69.4%
SVM: PPV, ACC
GB Dec. Tree: PPV, ACC
Log. Reg.: PPV, ACC
Anderson et al Eur J Cancer 2020 (under review)
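For readers less familiar with the two metrics compared above, the sketch below computes positive predictive value (PPV) and accuracy (ACC) from confusion-matrix counts. The counts are hypothetical, chosen only to show that a rule can have a high PPV while its overall accuracy stays modest. They are not the data behind the slide.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of positive calls that are correct."""
    return true_pos / (true_pos + false_pos)


def accuracy(true_pos: int, false_pos: int,
             true_neg: int, false_neg: int) -> float:
    """Share of all calls, positive and negative, that are correct."""
    total = true_pos + false_pos + true_neg + false_neg
    return (true_pos + true_neg) / total


# Hypothetical counts: few false positives, but many missed positives.
tp, fp, tn, fn = 30, 4, 25, 41
print(f"PPV = {ppv(tp, fp):.1%}, ACC = {accuracy(tp, fp, tn, fn):.1%}")
```

In this made-up example the positive calls are usually right, yet many true cases are missed, so accuracy lags well behind PPV.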
AI as a strategy to improve
lung cancer screening
Augustine et al J Biomed Informatics 2020 (under review)
AI as a strategy to improve
diagnosis and staging of dementia
Skackauskas et al Neurocomputing 2020 (in preparation)
ADNI 4D fMRI data
Preprocessing, calibration, registration, CNN
Model evaluation
Due process of AI studies still required
Nagendran et al BMJ 2020; Topol Nature Medicine 2019; Morse et al Nature Medicine 2020
Deep neural network
Publish in accordance with
existing reporting standards
Clinical validation in
real-world medicine
Publish, RCTs showing benefit,
Regulatory approval
Implementation in
healthcare
Cost of implementation: how many workflows will be
affected?
Does the model increase the
efficiency of existing workflows?
Is the model being deployed within
an existing digital workflow?
Conclusions
Triangulation of evidence: multiple angles, one truth
Large, simple, randomised controlled trials are essential for medicine,
but so are other scientific approaches to data
The convergence of data science and human intelligence provides us
with a unique opportunity for practicing high-performance reproductive
medicine
Colleagues
Everyone at the School of Computer Science
Edinburgh
Hamish Wallace, Richard Anderson, Evelyn Telfer, …
Copenhagen
Stine Gry Kristensen, Claus Yding Andersen,…
Imperial College
Ali Abbara, Waljit Dhillo,…
Glasgow
Scott Nelson, Stamatina Iliodromiti
St Andrews
Gerry Humphris, Frank Sullivan,…
Thank you