
Cardiovascular disease and all-cause mortality risk prediction from abdominal CT using deep learning

PROCEEDINGS OF SPIE
Daniel C. Elton, Andy Chen, Perry J. Pickhardt, Ronald M. Summers,
"Cardiovascular disease and all-cause mortality risk prediction from
abdominal CT using deep learning," Proc. SPIE 12033, Medical Imaging
2022: Computer-Aided Diagnosis, 120332N (4 April 2022); doi:
10.1117/12.2612620
Event: SPIE Medical Imaging, 2022, San Diego, California, United States
Downloaded From: https://www.spiedigitallibrary.org/conference-proceedings-of-spie on 07 Apr 2022 Terms of Use: https://www.spiedigitallibrary.org/terms-of-use
Daniel C. Elton (a), Andy Chen (a), Perry J. Pickhardt (b), and Ronald M. Summers (a)
(a) Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, National Institutes of Health Clinical Center, Bethesda, MD 20892-1182, USA
(b) School of Medicine and Public Health, University of Wisconsin, Madison, WI 53726, USA
ABSTRACT
Cardiovascular disease is the number one cause of mortality worldwide. Risk prediction can help incentivize lifestyle changes and inform targeted preventative treatment. In this work we explore using a convolutional neural network (CNN) to predict cardiovascular disease risk from abdominal CT scans taken for routine CT colonography in otherwise healthy patients aged 50-65. We find that adding a variational autoencoder (VAE) to the CNN classifier improves its accuracy for five-year survival prediction (AUC 0.787 vs. 0.768). In four-fold cross validation we obtain an average AUC of 0.787 for predicting five-year survival and an AUC of 0.767 for predicting cardiovascular disease. For five-year survival prediction our model is significantly better than the Framingham Risk Score (AUC 0.688) and of nearly equivalent performance to the method demonstrated in Pickhardt et al. (AUC 0.789), which used a combination of five CT-derived biomarkers.
Keywords: cardiovascular disease, machine learning, deep learning, longevity, artificial intelligence, risk, abdominal CT, all-cause mortality
1. PURPOSE
Globally, cardiovascular disease (CVD) remains the number one cause of mortality, causing 17.9 million deaths in 2019, including 38% of all premature deaths from noncommunicable diseases.1 As standards of living increase globally, people have more time and resources to invest in preventative measures to protect their long-term health. Since the 1960s, the rate of deaths from CVD in the United States has been reduced by about 50% through a mixture of lifestyle changes and improved treatment. It has been argued that up to 90% of CVD is preventable if lifestyle changes and other interventions are implemented early enough.2 Informing a patient that they have higher-than-average CVD risk can help encourage positive lifestyle changes. Additionally, several primary prevention treatments are under investigation to reduce CVD risk, including metformin,3,4 low-dose aspirin,5 and gene therapy.6 Given the costs and risks associated with these treatments, targeting them to the most at-risk patients will be important.
Genetic screening can help with targeting but completely neglects the effects of environmental exposures. A popular risk scoring system currently used for CVD is the Framingham Risk Score (FRS), which factors in age, sex, cholesterol level, and blood pressure.7 In the Framingham Heart Study, 8,491 patients were followed for 12 years, and the FRS achieved a C-statistic (a generalization of the AUC to censored data) for predicting CVD of 0.763 (95% confidence interval (CI) 0.746-0.780) in men and 0.793 (95% CI 0.772-0.814) in women.8 Previously we showed that the FRS yields an AUC of 0.688 (95% CI 0.650-0.727) for predicting five-year survival in the patient cohort studied here.9 A 2019 meta-analysis compared three popular risk models for predicting 10-year CVD risk: the FRS, the Framingham Adult Treatment Panel III model, and the pooled cohort equations.10 It found modest C-statistics ranging from 0.68 to 0.74.10
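To make the relationship between the C-statistic and the AUC concrete, the following is a minimal pure-Python sketch (the function names `harrell_c_index` and `roc_auc` are our own, not from any library or from this work). It computes Harrell's concordance index for right-censored follow-up data and shows that, for a binary outcome in which every event shares one time point and every non-event is censored later, it reduces to the ROC AUC computed as a Mann-Whitney statistic. It is O(n²) and simply skips pairs tied in follow-up time, so it is an illustration rather than a production estimator.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # pairs tied in time are skipped in this simplified version
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored first, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def roc_auc(labels, scores):
    """ROC AUC as the probability a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Binary five-year outcome: events at t=1, survivors censored at t=5.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
times = [1 if y else 5 for y in labels]
print(roc_auc(labels, scores), harrell_c_index(times, labels, scores))
```

With real survival data, deaths occur at different times, so pairs of two deaths also become comparable and the C-statistic is no longer identical to the AUC of a binary five-year label.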
Co-senior authors. Further author information: send correspondence to Daniel Elton (delton@mgh.harvard.edu) or Ronald Summers (rms@nih.gov)
Medical Imaging 2022: Computer-Aided Diagnosis, edited by Karen Drukker, Khan M. Iftekharuddin, Hongbing Lu, Maciej A. Mazurowski, Chisako Muramatsu, Ravi K. Samala, Proc. of SPIE Vol. 12033, 120332N · © 2022 SPIE · 1605-7422 · doi: 10.1117/12.2612620
Imaging biomarkers hold much promise for improving CVD risk scoring. In particular, coronary artery calcification (CAC) scoring has been heavily studied.11,12 CAC scores are typically obtained from a specialized ECG-gated, non-contrast cardiac CT scan. Adding the CAC score has been shown to improve risk prediction over more traditional risk factors such as age, gender, blood biomarkers, and family history.13-15 In the
Multi-Ethnic Study of Atherosclerosis (MESA) dataset, the FRS yielded a C-statistic for CVD risk (median follow-up time 7.6 years) of 0.62, which rose to 0.78 when the CAC score was included.13 Mets et al. obtained a C-statistic of 0.71 (95% CI 0.67-0.76) for three-year survival prediction on the National Lung Screening Trial dataset with a multivariate model that combined CAC score and clinical parameters (age, smoking status, etc.).14 McClelland et al. obtained a C-statistic of 0.80 for 10-year prediction of coronary heart disease when combining CAC score with blood biomarkers, family history, age, and gender.15 Recently, several works have shown how deep learning systems can automate CAC scoring.16-20 Interestingly, Chao et al. demonstrated that feeding a box around the heart from a chest CT into a multi-view convolutional neural network (CNN) architecture can improve CVD and mortality prediction over the state of the art for automated CAC scoring or manual CAC grading.20 This suggests that CNNs can extract additional features beyond coronary artery calcification that help inform risk prediction. Partially inspired by that work, here we study feeding an entire abdominal CT scan into a CNN to perform risk scoring, rather than extracting custom biomarkers such as aortic calcification score and visceral/subcutaneous fat ratio as we did in prior work.9
The vast majority of prior work on CVD or all-cause mortality risk prediction has looked at either chest CT19-24 or chest X-ray.25,26 Among the works that use chest CT, the publicly available National Lung Screening Trial (NLST) dataset has been heavily used,14,19,20,23,24,26 in addition to the Dutch-Belgian lung cancer screening trial21 and in-house chest CT datasets.22 NLST participants have a history of smoking, putting them at higher-than-average risk for CVD. In contrast, our dataset consists of scans from an otherwise healthy patient cohort. In this work we focus on risk scoring from abdominal CT scans, which has been studied previously in only a handful of works.9,27,28 Abdominal CT scans are rich in biomarkers known to be relevant to cardiovascular risk, such as visceral fat and aortic plaque.29,30 To our knowledge, this is the first work to feed the entire abdominal CT scan into a deep learning model.
[Figure 1 diagram. Labeled components: resampled 3D CT scan volume; 3D convolution layers (16, 32, 64, 64, 64); fully connected feedforward network; hidden fully connected layer (size = 512); classification sigmoid output unit (output branch for CNN / VAE architecture #1); μ, σ, and sampled latent vector (size = 256) (VAE architecture #2); VAE decoder (optional).]
Figure 1. The architectures employed. The CNN architecture consists of five convolutional layers followed by two fully connected layers. Optionally, a VAE decoder may be attached to the CNN to encourage it to learn good high-level features. If the VAE is attached, the classification output can instead be attached to the vector of latent means (architecture #2).
2. METHODS
The dataset we use consists of CT colonography (CTC) scans from the University of Wisconsin Medical Center.27 A total of 9,223 people underwent CTC scans between April 2004 and December 2016. Further details on the cohort are provided in Pickhardt et al., 2020.9 As in prior works,9,27 cardiovascular disease was defined as myocardial infarction, cerebrovascular accident, or development of congestive heart failure. These reflect the endpoints considered by the FRS for cardiovascular disease.
[Figure 2 flow chart: 9393 patients who underwent CT colonographies at University of Wisconsin-Madison. Excluded: 170 scans either corrupted or not supine; 2448 patients without 5-year follow-up data. Remaining: 6775 patients with 5-year follow-up data, one scan per patient, split into a training set (4826 scans, 71.23%), a validation set (255 scans, 3.78%), and a testing set (1694 scans, 25%).]
Figure 2. Patient flow chart showing how the patient cohort was selected and split into train, validation, and test sets for five-year survival prediction.
We found that 6,775 patients had five-year follow-up data indicating whether they survived for five years; of those, 216 (3%) died within five years. Similarly, 7,008 patients had follow-up data indicating whether or not they were diagnosed with CVD within five years, and of those, 399 (5.6%) developed CVD. We used roughly 71% of the data for training, 4% for validation, and 25% for testing. The data flow and the number of patients in the training, validation, and testing folds for five-year survival prediction are summarized in Figure 2.
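The counts in the Figure 2 flow chart can be cross-checked with a few lines of arithmetic. This minimal Python sketch uses only the numbers printed in the flow chart (variable names are our own). It confirms that the three splits sum to the 6,775-patient cohort and that removing the 170 excluded scans from 9,393 patients gives the 9,223 quoted in the text.

```python
# Counts read from the Figure 2 flow chart.
total_patients = 9393
excluded_bad_scans = 170      # corrupted or not supine
excluded_no_followup = 2448   # no five-year follow-up data
splits = {"train": 4826, "validation": 255, "test": 1694}

after_scan_exclusion = total_patients - excluded_bad_scans
cohort = after_scan_exclusion - excluded_no_followup
assert cohort == sum(splits.values())  # one scan per patient

for name, n in splits.items():
    print(f"{name}: {n} scans ({100 * n / cohort:.2f}% of {cohort})")

deaths_within_five_years = 216
print(f"five-year mortality: {100 * deaths_within_five_years / cohort:.1f}%")
```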
The architectures for the CNN and VAE are shown in Figure 1. The two architectures were identical apart from the presence of the VAE decoder. To reduce the number of parameters in the network, the first convolutional layer uses a large 7x7x7 kernel applied with a stride of 2 in each direction and padding of 3 on the edges. The remaining convolutional layers use 4x4x4 kernels with a stride of 2 and padding of 1. Group normalization and dropout (dropout rate 0.5) were applied after each convolutional layer. In summary, the CNN encoder contains five convolutional layers and two fully connected layers, with a total of 5,414,558 parameters. The VAE architecture contains three additional fully connected layers and five additional upsampling layers, with a total of 21,060,930 parameters. We use a relatively small latent vector size of D = 256 to encourage the network to find efficient high-level features.
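The effect of the stride-2 layers on feature-map size can be checked with the standard convolution output-size formula, floor((n + 2p - k) / s) + 1. The plain-Python sketch below applies it to the 192x192x128 input. The channel counts 16, 32, 64, 64, 64 are read from the Figure 1 diagram and are assumed (the text does not say so) to be the output channels of the five convolutional layers. Flattening the last feature map into the fully connected network is likewise an assumption.

```python
def conv_out(n, kernel, stride, padding):
    """Output length along one axis of a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) per layer: 7x7x7/2/3 first, then four 4x4x4/2/1 layers.
LAYERS = [(7, 2, 3)] + [(4, 2, 1)] * 4
CHANNELS = [16, 32, 64, 64, 64]  # assumed from the Figure 1 diagram


def encoder_shapes(shape=(192, 192, 128)):
    shapes = []
    for kernel, stride, padding in LAYERS:
        shape = tuple(conv_out(n, kernel, stride, padding) for n in shape)
        shapes.append(shape)
    return shapes


shapes = encoder_shapes()
for channels, shape in zip(CHANNELS, shapes):
    print(channels, shape)  # each layer halves every spatial dimension
d, h, w = shapes[-1]
print("flattened features:", CHANNELS[-1] * d * h * w)
```

Under these assumptions the five stride-2 layers reduce each spatial dimension by a factor of 32, which keeps the first fully connected layer small.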
We also attempted to add patient age and gender information to the model. As is standard practice in machine learning, we encoded age and gender as one-hot vectors. Ages were encoded in a vector of length 60 corresponding to ages between 30 and 90 years (any patient younger than 30 was set to 30 and any older than 90 was set to 90). The one-hot vectors were concatenated to the hidden layer.
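A minimal plain-Python sketch of this encoding follows. The function names and the "F"/"M" sex coding are hypothetical. Ages 30 through 90 inclusive span 61 integer values, but the text specifies a vector of length 60. The sketch therefore assumes that the last slot also absorbs ages 89 and above; the exact binning used in this work is not stated. The hidden-layer size of 512 comes from Figure 1.

```python
AGE_MIN, AGE_SLOTS = 30, 60  # vector length 60 as stated in the text


def one_hot(index, length):
    vec = [0.0] * length
    vec[index] = 1.0
    return vec


def encode_age(age):
    # Clip to the supported range; assumption: slot 59 covers ages >= 89.
    clipped = min(max(int(age), AGE_MIN), AGE_MIN + AGE_SLOTS - 1)
    return one_hot(clipped - AGE_MIN, AGE_SLOTS)


def encode_sex(sex):
    # Hypothetical coding: "F" -> slot 0, "M" -> slot 1.
    return one_hot({"F": 0, "M": 1}[sex], 2)


def append_demographics(hidden, age, sex):
    """Concatenate the one-hot vectors onto the hidden-layer activations."""
    return list(hidden) + encode_age(age) + encode_sex(sex)


features = append_demographics([0.0] * 512, age=57, sex="M")
print(len(features))  # 512 + 60 + 2
```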
The images were resampled to a size of 192x192x128 using trilinear interpolation, clipped to [-500, 500], and then rescaled to [0, 1]. To deal with the class imbalance (only 3% of patients died within five years), we reweighted the data during training so that the ratio of survived to died was 50/50. The RAdam optimizer31 was used with a learning rate of 0.001 and a batch size of 4. Data augmentation was limited to random rotations about a random axis by 0-15 degrees in either direction, random 0-20% cropping in the cranial-caudal direction, and random left-right flipping.
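The intensity normalization and class rebalancing can be sketched in plain Python as below. This sketch makes two assumptions. First, it takes the input values to be Hounsfield units. Second, it implements the 50/50 reweighting as per-sample sampling weights, which is one common approach (it is what a weighted random sampler does); the text does not specify the mechanism. Function names are hypothetical.

```python
import random

HU_MIN, HU_MAX = -500.0, 500.0  # clipping window from the text


def normalize_intensity(voxels):
    """Clip to [-500, 500] and rescale linearly to [0, 1]."""
    return [(min(max(v, HU_MIN), HU_MAX) - HU_MIN) / (HU_MAX - HU_MIN) for v in voxels]


def balanced_weights(labels):
    """Per-sample weights giving each class an equal share of the sampling mass."""
    counts = {c: labels.count(c) for c in set(labels)}
    return [1.0 / (len(counts) * counts[y]) for y in labels]


def sample_batch(weights, batch_size=4, rng=random):
    """Draw patient indices with replacement in proportion to their weights."""
    return rng.choices(range(len(weights)), weights=weights, k=batch_size)


labels = [1] * 216 + [0] * 6559  # 1 = died within five years
weights = balanced_weights(labels)
died_share = sum(w for w, y in zip(weights, labels) if y == 1)
print(normalize_intensity([-1000, -500, 0, 500, 1200]), round(died_share, 6))
```

An alternative with the same intent is to keep uniform sampling and instead weight each example's loss term by the same per-class factors.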
Most existing works using a VAE, such as van Velzen et al.,23 first train an autoencoder for image reconstruction and then train a separate classification model that uses either the latent vector or a hidden layer vector taken from the “bottleneck” portion of the VAE. In this work we perform joint training, training our VAE model for both image reconstruction and classification at the same time. This procedure is inspired by previous work in which a VAE decoder was attached to a 3D V-Net.32 The author reports that adding the VAE helps regularize the 3D V-Net and prevent overfitting.32 Similarly, we hypothesize that joint training with the VAE attached will improve the generalization performance of the classifier by encouraging the network to learn robust high-level representations. Joint training is more challenging to implement because it requires carefully weighting the loss terms; otherwise, one or more loss terms may be effectively ignored during gradient descent training.
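The loss function applied at iteration $i$ for the VAE architecture was:
$$L_i = \beta L_{\mathrm{CE}} + L_{L1} + \Gamma_i \gamma L_{\mathrm{KL}} \qquad (1)$$
where
$$L_{L1} = \lVert I_{\mathrm{input}} - I_{\mathrm{pred}} \rVert_1, \qquad L_{\mathrm{KL}} = \frac{1}{N} \sum \left( \mu^2 + \sigma^2 - \log \sigma^2 - 1 \right) \qquad (2)$$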
The loss function for the classifier was the standard cross-entropy loss, weighted by a factor β. For the VAE reconstruction loss we used an L1 loss function. As adopted in prior works,33 a weighting factor Γ_i was applied to the Kullback-Leibler (KL) loss term of the VAE and increased linearly from 0 to 1 over the first 5,000 iterations. This practice prevents the optimizer from focusing only on minimizing the KL loss in the early stages of training. The factors β and γ had to be determined empirically. After experimenting with weightings of β ∈ {1, 10, 100} and γ ∈ {0.001, 0.0001}, we used β = 10 and γ = 0.001, as this was the only combination for which all three loss terms decreased during training. Further tuning of these hyperparameters was not attempted. The F1 score and AUC on the validation set were monitored during training. All models were trained for exactly four epochs, which was sufficient for the validation F1 and AUC to converge.
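Equations (1) and (2), together with the linear KL warm-up, can be sketched for a single example in plain Python. The following are assumptions, not taken from the text: the encoder outputs a log-variance (so σ² = exp(log_var)), N is the number of latent entries, the L1 term is averaged over voxels, and the classifier loss is binary cross-entropy on the sigmoid output. Function names are hypothetical. The defaults β = 10 and γ = 0.001 and the 5,000-iteration warm-up are the values reported above.

```python
import math


def l1_loss(image, recon):
    """Mean absolute reconstruction error over voxels (normalization assumed)."""
    return sum(abs(a - b) for a, b in zip(image, recon)) / len(image)


def kl_loss(mu, log_var):
    """(1/N) * sum(mu^2 + sigma^2 - log sigma^2 - 1), with sigma^2 = exp(log_var)."""
    return sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)) / len(mu)


def bce_loss(prob, label, eps=1e-7):
    """Binary cross-entropy for one sigmoid output."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def kl_weight(iteration, warmup=5000):
    """Gamma_i: increases linearly from 0 to 1 over the first `warmup` iterations."""
    return min(iteration / warmup, 1.0)


def total_loss(iteration, prob, label, image, recon, mu, log_var, beta=10.0, gamma=0.001):
    """Eq. (1): beta * L_CE + L_L1 + Gamma_i * gamma * L_KL."""
    return (beta * bce_loss(prob, label)
            + l1_loss(image, recon)
            + kl_weight(iteration) * gamma * kl_loss(mu, log_var))


print(kl_weight(0), kl_weight(2500), kl_weight(10000))
print(total_loss(2500, prob=0.8, label=1, image=[0.2, 0.4], recon=[0.3, 0.4],
                 mu=[0.5, -0.5], log_var=[0.0, 0.0]))
```

Because Γ_i starts at 0, the KL term contributes nothing at the first iteration, which is the behavior the warm-up is meant to produce.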
3. RESULTS
| Method | AUC 5-year mortality | AUC 5-year CVD or mortality | AUC 5-year CVD |
|---|---|---|---|
| CNN only | 0.768 ± 0.026 | | |
| CNN+VAE (arch #1) | 0.787 ± 0.030 | | 0.767 ± 0.036 |
| CNN+VAE (arch #1) + age + sex | 0.777 ± 0.031 | | |
| CNN+VAE (arch #2) | 0.770 ± 0.034 | | |
| FRS (ref. 9) | 0.688 | | 0.695 |
| BMI (ref. 9) | 0.499 | | 0.552 |
| Best CT-derived model from Pickhardt et al. 2020 (ref. 9) | 0.789 | | 0.742 |
| Best multivariate model (CT biomarkers + FRS) (ref. 9) | 0.796 | | 0.751 |
Table 1. Average AUCs in four-fold cross validation, with the standard deviation over the four folds.
| Box size | Validation AUC | Test AUC |
|---|---|---|
| 128x128x128 | 0.75 | 0.74 |
| 192x192x128 | 0.76 | 0.76 |
| 192x192x192 | 0.78 | 0.79 |
| 256x256x128 | 0.78 | 0.76 |
Table 2. Hyperparameter optimization experiments testing different box sizes for five-year survival prediction using the CNN-only architecture. Validation AUC is the average of the validation AUCs obtained in the last epoch of training, and test AUC is the AUC on the hold-out test set for the final model after four epochs of training.
A summary of the average AUCs obtained in four-fold cross validation is provided in Table 1, and select ROC curves are shown in Figure 4. In line with our hypothesis, the CNN+VAE model obtained a higher AUC than the CNN-only model (0.787 vs. 0.768), although with only four folds the difference is not statistically significant (Welch's t-test p = 0.34). VAE architecture option #2, where the classifier is attached to the vector of latent means, performed slightly worse than VAE architecture option #1. Concatenating age and sex information to the final layer of the classifier did not improve the AUC. This may partially be due to the fact that the ages of the patients did not vary much. It may be possible to incorporate the age and gender information better, e.g. by adding an additional fully connected layer.
Figure 3. Guided backpropagation visualization of a case from the publicly available Kidney Tumor Segmentation Challenge (KiTS) dataset that contains aortic plaque. The patterns observed here were remarkably consistent across the five cases we examined. This method likely shows only where the network is looking in its first few layers, and it is hard to draw firm conclusions about how the model functions internally from this kind of visualization.
[Figure 4 plot: two ROC panels, sensitivity vs. 1 - specificity (false positive rate), both axes 0.0-1.0. Left panel: CNN only, avg AUC = 0.77; CNN+VAE, avg AUC = 0.79. Right panel: CNN+VAE CVD only, avg AUC = 0.77.]
Figure 4. Select average ROC curves under four-fold cross validation for five-year survival prediction (left) and five-year CVD prediction (right). The shaded regions show the standard deviation over the four folds.
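For reference, the plain-Python sketch below computes Welch's t statistic and its Welch-Satterthwaite degrees of freedom from summary statistics. It applies them to the Table 1 values for CNN+VAE (arch #1) and CNN only, with n = 4 folds each. Converting t to a p-value requires the Student t CDF, for example via `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`. Table 1 reports rounded values and does not say whether the ± is a sample or population standard deviation. This calculation therefore only approximates the per-fold test reported in the text.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# CNN+VAE (arch #1) vs. CNN only, five-year mortality AUC, four folds each.
t, df = welch_t(0.787, 0.030, 4, 0.768, 0.026, 4)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With only four folds, t stays below 1 here even though the mean AUCs differ by about 0.02, which is why more folds or repeated cross validation would be needed to resolve a difference of this size.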
The results in Table 1 were obtained using a box size of 192x192x128. After obtaining those results, we ran some hyperparameter optimization experiments to investigate the effect of box size; the results are shown in Table 2. A slightly lower AUC is found when using a smaller box of size 128x128x128, and a slightly higher AUC is obtained when the box size is increased to 192x192x192, at least for the one fold under consideration. The results suggest that a larger box of 192x192x192 may be beneficial, whereas enlarging the first two dimensions to 256 (256x256x128) yields no further improvement.
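To put the box sizes in Table 2 in perspective, the sketch below computes the number of input voxels per box and the memory that a batch of four input volumes would occupy. It assumes float32 storage, which the text does not state. The figures cover the input only; intermediate activations add substantially more.

```python
BOX_SIZES = [(128, 128, 128), (192, 192, 128), (192, 192, 192), (256, 256, 128)]
BATCH_SIZE = 4        # batch size reported in the Methods
BYTES_PER_VOXEL = 4   # assumes float32 inputs


def voxels(shape):
    d, h, w = shape
    return d * h * w


def batch_input_mib(shape):
    """Memory in MiB for one batch of input volumes (activations excluded)."""
    return voxels(shape) * BYTES_PER_VOXEL * BATCH_SIZE / 2 ** 20


for shape in BOX_SIZES:
    print("x".join(map(str, shape)), voxels(shape), f"{batch_input_mib(shape):.0f} MiB")
```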
We experimented with visualizing the model using guided backpropagation34 with a publicly available codebase.35 An example of the resulting visualization is shown in Figure 3. As shown by Adebayo et al.,36 “saliency” methods like guided backpropagation are mainly sensitive to what happens in the first few layers. Such methods show roughly where the network is looking, but they do not provide much insight into what the network is doing and should be interpreted with caution.37,38
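The core of guided backpropagation (Springenberg et al., ref. 34) is a modified backward rule at each ReLU. The gradient is passed back only where the forward input was positive and the incoming gradient is also positive. The plain-Python sketch below contrasts this rule with ordinary ReLU backpropagation on a single layer. It illustrates the rule only and is not the codebase (ref. 35) used in this work.

```python
def relu_backward(x, grad_out):
    """Ordinary backprop: pass the gradient wherever the forward input was positive."""
    return [g if v > 0 else 0.0 for v, g in zip(x, grad_out)]


def guided_relu_backward(x, grad_out):
    """Guided backprop: additionally discard negative incoming gradients."""
    return [g if v > 0 and g > 0 else 0.0 for v, g in zip(x, grad_out)]


x = [1.5, -0.3, 2.0, 0.7]      # ReLU inputs from the forward pass
grad = [0.4, 0.9, -1.2, 0.0]   # gradient arriving from the layer above
print(relu_backward(x, grad))         # [0.4, 0.0, -1.2, 0.0]
print(guided_relu_backward(x, grad))  # [0.4, 0.0, 0.0, 0.0]
```

Because negative evidence is discarded at every layer, the resulting maps tend to look sharp and image-like whatever the later layers have learned. This is consistent with the caution expressed above.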
4. NEW OR BREAKTHROUGH WORK
For the first time, we have shown how a deep learning model can predict all-cause mortality risk directly from an abdominal CT scan, using a novel jointly trained VAE architecture. We found that adding the VAE decoder improved the mean AUC over a plain CNN architecture (0.787 vs. 0.768), although this difference was not statistically significant across four folds. The VAE-based model is significantly better than the Framingham Risk Score at predicting five-year mortality and is of nearly equivalent accuracy to the best system for abdominal CT demonstrated in Pickhardt et al.9 (AUC 0.787 vs. 0.789). Our model has the advantage of being much simpler than the prior system, which requires five independent software tools for plaque, bone mineral density, visceral/subcutaneous fat, liver fat, and muscle quantification. On a modern workstation with an NVIDIA Quadro RTX 8000 GPU, our model runs in about four seconds, and that time drops to less than two seconds when the model is preloaded into GPU memory. Many avenues remain open for improving our model further.
ACKNOWLEDGMENTS
This research was supported in part by the Intramural Research Program of the National Institutes of Health
Clinical Center.
REFERENCES
[1] “Cardiovascular diseases (CVDs),” https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (June 2021).
[2] McGill, H. C., McMahan, C. A., and Gidding, S. S., “Preventing heart disease in the 21st century,” Circulation 117, 1216–1227 (Mar. 2008).
[3] Han, Y., Xie, H., Liu, Y., Gao, P., Yang, X., and Shen, Z., “Effect of metformin on all-cause and cardiovascular mortality in patients with coronary artery diseases: a systematic review and an updated meta-analysis,” Cardiovascular Diabetology 18 (July 2019).
[4] Kulkarni, A. S., Gubbi, S., and Barzilai, N., “Benefits of metformin in attenuating the hallmarks of aging,”
Cell Metabolism 32, 15–30 (July 2020).
[5] Patrono, C. and Baigent, C., “Role of aspirin in primary prevention of cardiovascular disease,” Nature
Reviews Cardiology 16, 675–686 (June 2019).
[6] King, A., “A CRISPR edit for heart disease,” Nature 555, S23–S25 (Mar. 2018).
[7] Wilson, P. W. F., D’Agostino, R. B., Levy, D., Belanger, A. M., Silbershatz, H., and Kannel, W. B.,
“Prediction of coronary heart disease using risk factor categories,” Circulation 97, 1837–1847 (May 1998).
[8] D’Agostino, R. B., Vasan, R. S., Pencina, M. J., Wolf, P. A., Cobain, M., Massaro, J. M., and Kannel,
W. B., “General cardiovascular risk profile for use in primary care,” Circulation 117, 743–753 (Feb. 2008).
[9] Pickhardt, P. J., Graffy, P. M., Zea, R., Lee, S. J., Liu, J., Sandfort, V., and Summers, R. M., “Automated
CT biomarkers for opportunistic prediction of future cardiovascular events and mortality in an asymptomatic
screening population: a retrospective cohort study,” The Lancet Digital Health 2, e192–e200 (Apr. 2020).
[10] Damen, J. A., Pajouheshnia, R., Heus, P., Moons, K. G. M., Reitsma, J. B., Scholten, R. J. P. M., Hooft,
L., and Debray, T. P. A., “Performance of the Framingham risk models and pooled cohort equations for
predicting 10-year risk of cardiovascular disease: a systematic review and meta-analysis,” BMC Medicine 17
(June 2019).
[11] McClelland, R. L., Nasir, K., Budoff, M., Blumenthal, R. S., and Kronmal, R. A., “Arterial age as a function
of coronary artery calcium (from the multi-ethnic study of atherosclerosis [MESA]),” The American Journal
of Cardiology 103, 59–63 (Jan. 2009).
[12] Chiles, C., Duan, F., Gladish, G. W., Ravenel, J. G., Baginski, S. G., Snyder, B. S., DeMello, S., Desjardins,
S. S., and Munden, R. F., “Association of coronary artery calcification and mortality in the national lung
screening trial: A comparison of three scoring methods,” Radiology 276, 82–90 (July 2015).
[13] Yeboah, J., McClelland, R. L., Polonsky, T. S., Burke, G. L., Sibley, C. T., O’Leary, D., Carr, J. J.,
Goff, D. C., Greenland, P., and Herrington, D. M., “Comparison of novel risk markers for improvement in
cardiovascular risk assessment in intermediate-risk individuals,” JAMA 308, 788 (Aug. 2012).
[14] Mets, O. M., Vliegenthart, R., Gondrie, M. J., Viergever, M. A., Oudkerk, M., de Koning, H. J., Mali, W. P.,
Prokop, M., van Klaveren, R. J., van der Graaf, Y., Buckens, C. F., Zanen, P., Lammers, J.-W. J., Groen,
H. J., Isgum, I., and de Jong, P. A., “Lung cancer screening CT-based prediction of cardiovascular events,”
JACC: Cardiovascular Imaging 6, 899–907 (Aug. 2013).
[15] McClelland, R. L., Jorgensen, N. W., Budoff, M., Blaha, M. J., Post, W. S., Kronmal, R. A., Bild, D. E.,
Shea, S., Liu, K., Watson, K. E., Folsom, A. R., Khera, A., Ayers, C., Mahabadi, A.-A., Lehmann, N.,
Jöckel, K.-H., Moebus, S., Carr, J. J., Erbel, R., and Burke, G. L., “10-year coronary heart disease risk prediction using coronary artery calcium and traditional risk factors,” Journal of the American College of
Cardiology 66, 1643–1653 (Oct. 2015).
[16] González, G., Washko, G. R., Estépar, R. S. J., Cazorla, M., and Espinosa, C. C., “Automated Agatston
score computation in non-ECG gated CT scans using deep learning,” in [Medical Imaging 2018: Image
Processing], Angelini, E. D. and Landman, B. A., eds., SPIE (Mar. 2018).
[17] Commandeur, F., Slomka, P. J., Goeller, M., Chen, X., Cadet, S., Razipour, A., McElhinney, P., Gransar,
H., Cantu, S., Miller, R. J. H., Rozanski, A., Achenbach, S., Tamarappoo, B. K., Berman, D. S., and Dey, D.,
“Machine learning to predict the long-term risk of myocardial infarction and cardiac death based on clinical
risk, coronary calcium, and epicardial adipose tissue: a prospective study,” Cardiovascular Research 116,
2216–2225 (Dec. 2019).
[18] Lee, H., Martin, S., Burt, J. R., Bagherzadeh, P. S., Rapaka, S., Gray, H. N., Leonard, T. J., Schwemmer, C.,
and Schoepf, U. J., “Machine learning and coronary artery calcium scoring,” Current Cardiology Reports 22
(July 2020).
[19] Zeleznik, R., Foldyna, B., Eslami, P., Weiss, J., Alexander, I., Taron, J., Parmar, C., Alvi, R. M., Banerji,
D., Uno, M., Kikuchi, Y., Karady, J., Zhang, L., Scholtz, J.-E., Mayrhofer, T., Lyass, A., Mahoney, T. F.,
Massaro, J. M., Vasan, R. S., Douglas, P. S., Hoffmann, U., Lu, M. T., and Aerts, H. J. W. L., “Deep
convolutional neural networks to predict cardiovascular risk from computed tomography,” Nature Communications 12 (Jan. 2021).
[20] Chao, H., Shan, H., Homayounieh, F., Singh, R., Khera, R. D., Guo, H., Su, T., Wang, G., Kalra, M. K., and
Yan, P., “Deep learning predicts cardiovascular disease risks from lung cancer screening low dose computed
tomography,” Nature Communications 12 (May 2021).
[21] de Vos, B. D., de Jong, P. A., Wolterink, J. M., Vliegenthart, R., Wielingen, G. V., Viergever, M. A., and
Išgum, I., “Automatic machine learning based prediction of cardiovascular events in lung cancer screening
data,” in [Medical Imaging 2015: Computer-Aided Diagnosis], Hadjiiski, L. M. and Tourassi, G. D., eds.,
SPIE (Mar. 2015).
[22] Oakden-Rayner, L., Carneiro, G., Bessen, T., Nascimento, J. C., Bradley, A. P., and Palmer, L. J., “Precision radiology: Predicting longevity using feature engineering and deep learning methods in a radiomics framework,” Scientific Reports 7 (May 2017).
[23] van Velzen, S., Zreik, M., Lessmann, N., Viergever, M. A., de Jong, P. A., Verkooijen, H. M., and Išgum, I., “Direct prediction of cardiovascular mortality from low-dose chest CT using deep learning,” in [Medical
Imaging 2019: Image Processing], Angelini, E. D. and Landman, B. A., eds., SPIE (Mar. 2019).
[24] Guo, H., Kruger, U., Wang, G., Kalra, M. K., and Yan, P., “Knowledge-based analysis for mortality
prediction from CT images,” IEEE Journal of Biomedical and Health Informatics 24, 457–464 (Feb. 2020).
[25] Karargyris, A., Kashyap, S., Wu, J. T., Sharma, A., Moradi, M., and Syeda-Mahmood, T., “Age prediction
using a large chest x-ray dataset,” in [Medical Imaging 2019: Computer-Aided Diagnosis], Hahn, H. K. and
Mori, K., eds., SPIE (Mar. 2019).
[26] Raghu, V. K., Weiss, J., Hoffmann, U., Aerts, H. J., and Lu, M. T., “Deep learning to estimate biological
age from chest radiographs,” JACC: Cardiovascular Imaging (Mar. 2021).
[27] O’Connor, S. D., Graffy, P. M., Zea, R., and Pickhardt, P. J., “Does nonenhanced CT-based quantification
of abdominal aortic calcification outperform the Framingham risk score in predicting cardiovascular events
in asymptomatic adults?,” Radiology 290, 108–115 (Jan. 2019).
[28] Zambrano Chaves, J. M., Chaudhari, A. S., Wentland, A. L., Desai, A. D., Banerjee, I., Boutin, R. D.,
Maron, D. J., Rodriguez, F., Sandhu, A. T., Jeffrey, R. B., Rubin, D., and Patel, B., “Opportunistic
assessment of ischemic heart disease risk using abdominopelvic computed tomography and medical record
data: a multimodal explainable artificial intelligence approach,” medRxiv (2021).
[29] Sethi, A., Taylor, L., Ruby, J. G., Venkataraman, J., Sorokin, E., Cule, M., and Melamud, E., “Calcification
of abdominal aorta is a high risk underappreciated cardiovascular disease factor in a general population,”
medRxiv (2020).
[30] Pickhardt, P. J., Graffy, P. M., Perez, A. A., Lubner, M. G., Elton, D. C., and Summers, R. M., “Opportunistic screening at abdominal CT: Use of automated body composition biomarkers for added cardiometabolic
value,” RadioGraphics 41, 524–542 (Mar. 2021).
[31] Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J., and Han, J., “On the variance of the adaptive
learning rate and beyond,” in [Proceedings of the 8th International Conference on Learning Representations
(ICLR)], (2020).
[32] Myronenko, A., “3D MRI brain tumor segmentation using autoencoder regularization,” arXiv e-prints,
arXiv:1810.11654 (Oct. 2018).
[33] Sandfort, V., Yan, K., Graffy, P. M., Pickhardt, P. J., and Summers, R. M., “Use of variational autoencoders with unsupervised learning to detect incorrect organ segmentations at CT,” Radiology: Artificial
Intelligence 3, e200218 (July 2021).
[34] Springenberg, J. T., Dosovitskiy, A., Brox, T., and Riedmiller, M. A., “Striving for simplicity: The all
convolutional net,” in [3rd International Conference on Learning Representations, ICLR 2015, San Diego,
CA, USA, May 7-9, 2015, Workshop Track Proceedings], Bengio, Y. and LeCun, Y., eds. (2015).
[35] Böhle, M., Eitel, F., Weygandt, M., and Ritter, K., “Layer-wise relevance propagation for explaining deep neural network decisions in MRI-based Alzheimer's disease classification,” Frontiers in Aging Neuroscience 11 (July 2019).
[36] Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., and Kim, B., “Sanity checks for saliency
maps,” in [Proceedings of the 32nd International Conference on Neural Information Processing Systems],
NIPS’18, 9525–9536, Curran Associates Inc., Red Hook, NY, USA (2018).
[37] Rudin, C., “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nature Machine Intelligence 1, 206–215 (May 2019).
[38] Ghassemi, M., Oakden-Rayner, L., and Beam, A. L., “The false hope of current approaches to explainable
artificial intelligence in health care,” The Lancet Digital Health 3, e745–e750 (Nov. 2021).
... In parallel, there have been efforts in related all-cause mortality prediction problems to include medical imaging data. Elton et al. 5 have successfully explored using abdominal CT scans for cardiovascular disease and five-year survival prediction, similarly, in another work focused on CT imaging, Yan and colleagues 6 used low-dose CT imaging to predict all-cause mortality for lung cancer subjects. However, such imaging-focused approaches have not yet been explored for general body-composition-based allcause mortality prediction models. ...
Article
Full-text available
Background Mortality research has identified biomarkers predictive of all-cause mortality risk. Most of these markers, such as body mass index, are predictive cross-sectionally, while for others the longitudinal change has been shown to be predictive, for instance greater-than-average muscle and weight loss in older adults. And while sometimes markers are derived from imaging modalities such as DXA, full scans are rarely used. This study builds on that knowledge and tests two hypotheses to improve all-cause mortality prediction. The first hypothesis is that features derived from raw total-body DXA imaging using deep learning are predictive of all-cause mortality with and without clinical risk factors, meanwhile, the second hypothesis states that sequential total-body DXA scans and recurrent neural network models outperform comparable models using only one observation with and without clinical risk factors. Methods Multiple deep neural network architectures were designed to test theses hypotheses. The models were trained and evaluated on data from the 16-year-long Health, Aging, and Body Composition Study including over 15,000 scans from over 3000 older, multi-race male and female adults. This study further used explainable AI techniques to interpret the predictions and evaluate the contribution of different inputs. Results The results demonstrate that longitudinal total-body DXA scans are predictive of all-cause mortality and improve performance of traditional mortality prediction models. On a held-out test set, the strongest model achieves an area under the receiver operator characteristic curve of 0.79. Conclusion This study demonstrates the efficacy of deep learning for the analysis of DXA medical imaging in a cross-sectional and longitudinal setting. By analyzing the trained deep learning models, this work also sheds light on what constitutes healthy aging in a diverse cohort.
Article
Early and accurate detection of cardiovascular disease is crucial for lowering mortality among patients diagnosed with it. Deep learning approaches are effective at discriminating or extracting important features from cardiovascular images to detect the disease. The objective of this paper is to deliver a comprehensive and detailed review based on a comparative analysis of imaging modalities, datasets, and deep learning architectures. The paper presents an in-depth review of advances in deep architectural networks, with the relevant feature-based characteristics, used in cardiovascular disease (CVD) diagnosis. The novelty of the paper lies in covering the characteristics and challenges of cardiovascular imaging, milestones in deep learning techniques, a taxonomy of cardiovascular imaging by purpose, a summary of dataset observations, diagnosis strategies through deep learning and, finally, an analysis of deep learning architectures. The performance of DL networks has been analyzed for CVD classification, segmentation and detection. It has been reported that AlexNet achieves the highest classification accuracy (99%); for segmentation, U-Net is the best technique, with a Dice score of 0.98; and for CVD detection, DenseNet 121, TR-Net and ResNet50 provide a detection rate of approximately 92%. Finally, important findings are reported and promising directions for future research are identified.
Chapter
Heart (cardiovascular) disease is a significant public health issue and ranks among the leading causes of death worldwide. The World Health Organization (WHO) reports that approximately 32% of deaths worldwide are attributed to heart disease. Consequently, it is crucial to implement preventive measures that enable the prediction of heart disease risk, aiming to mitigate its occurrence and decrease the associated mortality. Several technologies and methodologies have been used to forecast the likelihood of heart disease from patient data and existing risk factors. One such approach is machine learning, specifically supervised binary classification that distinguishes between individuals with and without heart disease. Within this framework, the objective is to predict an individual's probability of developing heart disease from specific features, selected according to the strongest correlations observed in the available data. The researchers identified several highly correlated features, namely ST Slope Up, ST Slope Flat, Exercise Angina, Oldpeak, Chest Pain Type, Max HR, and Sex. The objective of this research is to create an advanced predictive model that enhances precision by employing feature engineering and hyperparameter tuning techniques in a case study centered on forecasting the likelihood of heart disease. The results are promising: before hyperparameter tuning, the model scored 69.57% (validation data) and 68.12% (test data); after tuning, it achieved an accuracy of 82.61% (validation data) and 86.23% (test data) using the K-Nearest Neighbors algorithm with GridSearch hyperparameter tuning.
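The tuning step described above, a K-Nearest Neighbors classifier whose hyperparameter is chosen by grid search, can be illustrated with a minimal sketch. This is not the chapter's code. It assumes NumPy and uses synthetic two-class data with seven features as a hypothetical stand-in for the heart-disease dataset. It tunes only the number of neighbours k and scores each candidate on a single held-out validation split rather than with cross-validation. The helper names `knn_predict` and `grid_search_k` are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Predict 0/1 labels by majority vote among the k nearest training points (Euclidean)."""
    # Pairwise squared distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest training points
    votes = y_train[nearest]                  # their labels, shape (n_query, k)
    return (votes.mean(axis=1) > 0.5).astype(int)

def grid_search_k(X_train, y_train, X_val, y_val, k_grid):
    """Try every k in k_grid; return the one with the best validation accuracy, plus all scores."""
    scores = {}
    for k in k_grid:
        pred = knn_predict(X_train, y_train, X_val, k)
        scores[k] = float((pred == y_val).mean())
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical synthetic data: two Gaussian classes with 7 features each
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(0.0, 1.0, (n, 7)), rng.normal(1.0, 1.0, (n, 7))])
y = np.array([0] * n + [1] * n)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
X_train, y_train = X[:300], y[:300]   # training split
X_val, y_val = X[300:], y[300:]       # validation split used to pick k

best_k, scores = grid_search_k(X_train, y_train, X_val, y_val, k_grid=[1, 3, 5, 7, 9, 15])
print(best_k, scores[best_k])
```

With real features on different scales (for example Oldpeak versus Max HR), they should be standardized before Euclidean distances are computed. The synthetic features here already share one scale, so that step is omitted.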
Chapter
Artificial intelligence (AI) is changing medical research and patient care by revealing data patterns that allow the prediction of disease, disease progression, and treatment outcomes for individual patients. Big-data sets from these fields require advanced technology for analysis. High cancer mortality negates advances in oncology research. Because of cancer's heterogeneous nature, traditional approaches are becoming inadequate to combat it efficiently. Accurate risk assessment, prevention, detection, segmentation, and cancer treatment present major challenges for successful patient outcomes. Advances in AI-based tools offer a potent weapon for improved cancer care by advancing personalized patient care. These tools hold promise for improving therapeutic potential and identifying novel biomarkers and drug targets. Effective implementation of precision oncology requires a positive impact on patient outcomes, real-time decision support, and the discovery of unique patient patterns of disease progression. Although emerging technologies present new challenges, the benefits of AI technology in precision oncology outweigh them. AI-based precision oncology provides augmented intelligence to aid clinician decision-making. Advances in wet-lab-based assays, high-throughput NGS data, bioinformatics tools, and strategies to detect novel biomarkers that accurately predict prognosis and enhance treatment regimens are urgently needed. This review will focus on AI-based tools in the detection and identification of cancer biomarkers for accurate prognosis, with the overall aim of enhancing treatment regimens, advancing precision oncology, and improving patient outcomes.
Article
Cancer is associated with significant morbidity and mortality globally. Advances in screening, diagnosis, management and survivorship have been substantial in recent decades; however, challenges in providing personalized and data-oriented care remain. Artificial intelligence (AI), a branch of computer science used for prediction and automation, has emerged as a potential solution to improve the healthcare journey and to promote precision in healthcare. AI applications in oncology include, but are not limited to, optimization of cancer research, improvement of clinical practice (e.g., predicting the association of multiple parameters with outcomes such as prognosis and response) and better understanding of tumor molecular biology. In this review, we examine the current state of AI in oncology, including fundamentals, current applications, limitations and future perspectives.
Article
Calcification of large arteries is a high-risk factor in the development of cardiovascular diseases, however, due to the lack of routine monitoring, the pathology remains severely under-diagnosed and prevalence in the general population is not known. We have developed a set of machine learning methods to quantitate levels of abdominal aortic calcification (AAC) in the UK Biobank imaging cohort and carried out the largest to-date analysis of genetic, biochemical, and epidemiological risk factors associated with the pathology. In a genetic association study, we identified three novel loci associated with AAC (FGF9, NAV9, and APOE), and replicated a previously reported association at the TWIST1/HDAC9 locus. We find that AAC is a highly prevalent pathology, with ~1 in 10 adults above the age of 40 showing significant levels of hydroxyapatite build-up (Kauppila score > 3). Presentation of AAC was strongly predictive of future cardiovascular events including stenosis of precerebral arteries (HR~1.5), myocardial infarction (HR~1.3), ischemic heart disease (HR~1.3), as well as other diseases such as chronic obstructive pulmonary disease (HR~1.3). Significantly, we find that the risk for myocardial infarction from elevated AAC (HR~1.4) was comparable to the risk of hypercholesterolemia (HR~1.4), yet most people who develop AAC are not hypercholesterolemic. Furthermore, the overwhelming majority (98%) of individuals who develop pathology do so in the absence of known pre-existing risk conditions such as chronic kidney disease and diabetes (0.6% and 2.7%, respectively). Our findings indicate that despite the high cardiovascular risk, calcification of large arteries remains a largely under-diagnosed lethal condition, and there is a clear need for increased awareness and monitoring of the pathology in the general population.
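The threshold for significant calcification quoted above (Kauppila score > 3) can be made concrete with a short sketch of the score arithmetic. The sketch assumes the widely used 24-point (AAC-24) version of the Kauppila scale. In that version, the anterior and posterior aortic walls adjacent to vertebrae L1 to L4 are each graded 0 to 3, and the eight grades are summed. The study itself quantified AAC with machine learning methods that are not reproduced here. The function names and example grades are hypothetical.

```python
from typing import Sequence

# Assumed AAC-24 grading per wall segment:
# 0 = no calcification, 1 = up to 1/3 of the wall length, 2 = 1/3 to 2/3, 3 = more than 2/3
VALID_GRADES = {0, 1, 2, 3}

def kauppila_aac24(anterior: Sequence[int], posterior: Sequence[int]) -> int:
    """Sum anterior and posterior wall grades at L1-L4 into a 0-24 score."""
    if len(anterior) != 4 or len(posterior) != 4:
        raise ValueError("expected one grade per vertebral level, L1 to L4")
    grades = list(anterior) + list(posterior)
    if any(g not in VALID_GRADES for g in grades):
        raise ValueError("each wall grade must be 0, 1, 2 or 3")
    return sum(grades)

def significant_aac(score: int, threshold: int = 3) -> bool:
    """Apply the cut-off quoted in the abstract: Kauppila score > 3."""
    return score > threshold

def prevalence(scores: Sequence[int], threshold: int = 3) -> float:
    """Fraction of a cohort whose score exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)

score = kauppila_aac24(anterior=[0, 1, 2, 1], posterior=[0, 1, 1, 0])
print(score, significant_aac(score))
```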
Article
The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this argument represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against having explainability be a requirement for clinically deployed models.
Article
Cancer patients have a higher risk of cardiovascular disease (CVD) mortality than the general population. Low dose computed tomography (LDCT) for lung cancer screening offers an opportunity for simultaneous CVD risk estimation in at-risk patients. Our deep learning CVD risk prediction model, trained with 30,286 LDCTs from the National Lung Cancer Screening Trial, achieves an area under the curve (AUC) of 0.871 on a separate test set of 2,085 subjects and identifies patients with high CVD mortality risks (AUC of 0.768). We validate our model against ECG-gated cardiac CT based markers, including coronary artery calcification (CAC) score, CAD-RADS score, and MESA 10-year risk score from an independent dataset of 335 subjects. Our work shows that, in high-risk patients, deep learning can convert LDCT for lung cancer screening into a dual-screening quantitative tool for CVD risk estimation. Here, the authors develop a deep learning model to perform this task, showing human-level performance.
Article
Abdominal CT is a frequently performed imaging examination for a wide variety of clinical indications. In addition to the immediate reason for scanning, each CT examination contains robust additional data on body composition that generally go unused in routine clinical practice. There is now growing interest in harnessing this additional information. Prime examples of cardiometabolic information include measurement of bone mineral density for osteoporosis screening, quantification of aortic calcium for assessment of cardiovascular risk, quantification of visceral fat for evaluation of metabolic syndrome, assessment of muscle bulk and density for diagnosis of sarcopenia, and quantification of liver fat for assessment of hepatic steatosis. All of these relevant biometric measures can now be fully automated through the use of artificial intelligence algorithms, which provide rapid and objective assessment and allow large-scale population-based screening. Initial investigations into these measures of body composition have demonstrated promising performance for prediction of future adverse events that matches or exceeds the best available clinical prediction models, particularly when these CT-based measures are used in combination. In this review, the concept of CT-based opportunistic screening is discussed, and an overview of the various automated biomarkers that can be derived from essentially all abdominal CT examinations is provided, drawing heavily on the authors' experience. As radiology transitions from a volume-based to a value-based practice, opportunistic screening represents a promising example of adding value to services that are already provided. If the potentially high added value of these objective CT-based automated measures is ultimately confirmed in subsequent investigations, this opportunistic screening approach could be considered for intentional CT-based screening. ©RSNA, 2021.
Article
Coronary artery calcium is an accurate predictor of cardiovascular events. While it is visible on all computed tomography (CT) scans of the chest, this information is not routinely quantified as it requires expertise, time, and specialized equipment. Here, we show a robust and time-efficient deep learning system to automatically quantify coronary calcium on routine cardiac-gated and non-gated CT. As we evaluate in 20,084 individuals from distinct asymptomatic (Framingham Heart Study, NLST) and stable and acute chest pain (PROMISE, ROMICAT-II) cohorts, the automated score is a strong predictor of cardiovascular events, independent of risk factors (multivariable-adjusted hazard ratios up to 4.3), shows high correlation with manual quantification, and robust test-retest reliability. Our results demonstrate the clinical value of a deep learning system for the automated prediction of cardiovascular events. Implementation into clinical practice would address the unmet need of automating proven imaging biomarkers to guide management and improve population health.
Article
Purpose of review: To summarize current artificial intelligence (AI)-based applications for coronary artery calcium scoring (CACS) and their potential clinical impact. Recent findings: The recent evolution of AI-based technologies in medical imaging has accelerated progress in CACS performed in diverse types of CT examinations, providing promising results for future clinical application in this field. CACS plays a key role in risk stratification of coronary artery disease (CAD) and patient management. The recent emergence of AI algorithms, particularly deep learning (DL)-based applications, has brought considerable progress in CACS. Many investigations have focused on the clinical role of DL models in CACS and have shown excellent agreement between those algorithms and manual scoring, not only in dedicated coronary calcium CT but also in coronary CT angiography (CCTA), low-dose chest CT, and standard chest CT. Therefore, AI-based CACS may become more influential in the future.
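The DL systems summarized in the two preceding abstracts automate the detection and segmentation of calcified plaque. The score itself is conventionally computed with the Agatston method. The sketch below shows only that final arithmetic and assumes lesions have already been segmented on each axial slice. Each lesion's area (mm²) is multiplied by a density weight derived from its peak attenuation: 1 for 130-199 HU, 2 for 200-299 HU, 3 for 300-399 HU, and 4 for 400 HU or more. The products are then summed. The 1 mm² minimum lesion area is a common convention, not something stated in these abstracts. The `Lesion` type and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Lesion:
    area_mm2: float  # in-plane area of the calcified lesion on one axial slice
    peak_hu: float   # maximum attenuation inside the lesion, in Hounsfield units

def density_weight(peak_hu: float) -> int:
    """Agatston density factor from the lesion's peak attenuation."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0  # below the 130 HU calcium threshold: contributes nothing

def agatston_score(lesions: Iterable[Lesion], min_area_mm2: float = 1.0) -> float:
    """Sum area x density weight over all lesions on all slices."""
    total = 0.0
    for les in lesions:
        if les.area_mm2 >= min_area_mm2:  # assumed noise filter for tiny specks
            total += les.area_mm2 * density_weight(les.peak_hu)
    return total

lesions = [Lesion(4.0, 250), Lesion(10.0, 450), Lesion(0.5, 600), Lesion(3.0, 120)]
print(agatston_score(lesions))
```

Here the 0.5 mm² lesion is dropped by the area filter and the 120 HU lesion gets weight 0. Only the first two lesions contribute (4 x 2 + 10 x 4).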
Article
Background: Body CT scans are frequently performed for a wide variety of clinical indications, but potentially valuable biometric information typically goes unused. We investigated the prognostic ability of automated CT-based body composition biomarkers derived from previously-developed deep-learning and feature-based algorithms for predicting major cardiovascular events and overall survival in an adult screening cohort, compared with clinical parameters. Methods: Mature and fully-automated CT-based algorithms with pre-defined metrics for quantifying aortic calcification, muscle density, visceral/subcutaneous fat, liver fat, and bone mineral density (BMD) were applied to a generally-healthy asymptomatic outpatient cohort of 9223 adults (mean age, 57.1 years; 5152 women) undergoing abdominal CT for routine colorectal cancer screening. Longitudinal clinical follow-up (median, 8.8 years; IQR, 5.1-11.6 years) documented subsequent major cardiovascular events or death in 19.7% (n=1831). Predictive ability of CT-based biomarkers was compared against the Framingham Risk Score (FRS) and body mass index (BMI). Findings: Significant differences were observed for all five automated CT-based body composition measures according to adverse events (p<0.001). Univariate 5-year AUROC (with 95% CI) for automated CT-based aortic calcification, muscle density, visceral/subcutaneous fat ratio, liver density, and vertebral density for predicting death were 0.743(0.705-0.780)/0.721(0.683-0.759)/0.661(0.625-0.697)/0.619(0.582-0.656)/0.646(0.603-0.688), respectively, compared with 0.499(0.454-0.544) for BMI and 0.688(0.650-0.727) for FRS (p<0.05 for aortic calcification vs. FRS and BMI); all trends were similar for 2-year and 10-year ROC analyses.
Univariate hazard ratios (with 95% CIs) for highest-risk quartile versus others for these same CT measures were 4.53(3.82-5.37)/3.58(3.02-4.23)/2.28(1.92-2.71)/1.82(1.52-2.17)/2.73(2.31-3.23), compared with 1.36(1.13-1.64) and 2.82(2.36-3.37) for BMI and FRS, respectively. Similar significant trends were observed for cardiovascular events. Multivariate combinations of CT biomarkers further improved prediction over clinical parameters (p<0.05 for AUROCs). For example, by combining aortic calcification, muscle density, and liver density, the 2-year AUROC for predicting overall survival was 0.811 (0.761-0.860). Interpretation: Fully-automated quantitative tissue biomarkers derived from CT scans can outperform established clinical parameters for pre-symptomatic risk stratification for future serious adverse events, and add opportunistic value to CT scans performed for other indications.
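The univariate AUROC values above can be read as a rank statistic: the probability that a randomly chosen patient who experienced the event has a higher biomarker value than a randomly chosen patient who did not. A minimal pure-Python sketch of that (Mann-Whitney) formulation follows. It ignores censoring and the 2-, 5- and 10-year horizons used in the study, so it illustrates the metric rather than the authors' analysis. The scores and labels are made up. On this scale 0.5 is chance level, which is why BMI's reported 0.499 means essentially no discrimination for death.

```python
from typing import Sequence

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:          # compare every positive with every negative: O(P x N)
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical biomarker values; label 1 = died within the horizon, 0 = survived
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```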
Article
Purpose: To develop a deep learning model to detect incorrect organ segmentations at CT. Materials and methods: In this retrospective study, a deep learning method was developed using variational autoencoders (VAEs) to identify problematic organ segmentations. First, three different three-dimensional (3D) U-Nets were trained on segmented CT images of the liver (n = 141), spleen (n = 51), and kidney (n = 66). A total of 12 495 CT images then were segmented by the 3D U-Nets, and output segmentations were used to train three different VAEs for the detection of problematic segmentations. Automatic reconstruction errors (Dice scores) were then calculated. A random sampling of 2510 segmented images each for the liver, spleen, and kidney models were assessed manually by a human reader to determine problematic and correct segmentations. The ability of the VAEs to identify unusual or problematic segmentations was evaluated using receiver operating characteristic curve analysis and compared with traditional non-deep learning methods for outlier detection. Using the VAE outputs, passive and active learning approaches were performed on the original 3D U-Nets to determine if training could decrease segmentation error rates (15 CT scans were added to the original training data, according to each approach). Results: The mean area under the receiver operating characteristic curve (AUC) for detecting problematic segmentations using the VAE method was 0.90 (95% CI: 0.89, 0.92) for kidney, 0.94 (95% CI: 0.93, 0.95) for liver, and 0.81 (95% CI: 0.80, 0.82) for spleen. The VAE performance was higher compared with traditional methods in most cases. For example, for liver segmentation, the highest performing non-deep learning method for outlier detection had an AUC of 0.83 (95% CI: 0.77, 0.90) compared with 0.94 (95% CI: 0.93, 0.95) using the VAE method (P < .05). 
Using the information on problematic segmentations for active learning approaches decreased 3D U-Net segmentation error rates (original error rate, 7.1%; passive learning, 6.0%; active learning, 5.7%). Conclusion: A method was developed to screen for unusual and problematic automatic organ segmentations using a 3D VAE. Keywords: Convolutional Neural Network (CNN), Deep Learning Algorithms, Machine Learning Algorithms, Segmentation, CT © RSNA, 2021.
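In this abstract, each automatic segmentation is scored by how well a VAE trained on segmentations can reconstruct it, with the Dice coefficient as the reconstruction measure. A low Dice suggests an unusual or problematic mask. The sketch below assumes NumPy and covers only the scoring and flagging step. The trained VAE is not reproduced, so `reconstructions` stands in for its binarized outputs. The 0.8 threshold and the toy 8x8x8 masks are hypothetical; the study evaluated these scores with ROC analysis rather than a fixed cut-off.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice coefficient 2 * |A & B| / (|A| + |B|) between two binary masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * int(np.logical_and(a, b).sum()) / denom

def flag_problematic(segmentations, reconstructions, threshold=0.8):
    """Return (indices whose reconstruction Dice is below threshold, all Dice scores)."""
    scores = [dice(s, r) for s, r in zip(segmentations, reconstructions)]
    flagged = [i for i, d in enumerate(scores) if d < threshold]
    return flagged, scores

# Toy 3D masks: a 4x4x4 "organ" cube inside an 8x8x8 volume
seg = np.zeros((8, 8, 8), dtype=np.uint8)
seg[2:6, 2:6, 2:6] = 1
good_recon = seg.copy()          # stand-in for a VAE output that matches the input
bad_recon = np.zeros_like(seg)
bad_recon[4:8, 4:8, 4:8] = 1     # stand-in for a VAE output that disagrees

flagged, scores = flag_problematic([seg, seg], [good_recon, bad_recon])
print(flagged, scores)
```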
Article
Objectives: The goal of this study was to assess whether a deep learning estimate of age from a chest radiograph image (CXR-Age) can predict longevity beyond chronological age. Background: Chronological age is an imperfect measure of longevity. Biological age, a measure of overall health, may improve personalized care. This paper proposes a new way to estimate biological age using a convolutional neural network that takes as input a CXR image and outputs a chest x-ray age (in years) as a measure of long-term mortality risk. Methods: CXR-Age was developed using CXR from 116,035 individuals and validated in 2 held-out testing sets: 1) 75% of the CXR arm of PLCO (Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial) (N = 40,967); and 2) the CXR arm of NLST (National Lung Screening Trial) (N = 5,414). CXR-Age was compared to chronological age and a multivariable regression model of chronological age, risk factors, and radiograph findings to predict all-cause and cardiovascular mortality with a maximum of 23 and 13 years of follow-up, respectively. The primary outcome was observed mortality; results are provided for the testing datasets only. Results: In the PLCO testing dataset, a 5-year increase in CXR-Age carried a higher risk of all-cause mortality than a 5-year increase in chronological age (CXR-Age hazard ratio [HR]: 2.26 [95% confidence interval (CI): 2.24 to 2.29] vs. chronological age HR: 1.77 [95% CI: 1.75 to 1.78]; p < 0.001). A similar pattern was found for cardiovascular mortality (CXR-Age cause-specific HR: 2.45 per 5 years [95% CI: 2.34 to 2.56] vs. chronological age HR: 1.82 per 5 years [95% CI: 1.74 to 1.90]). Similar results were seen for both outcomes in the NLST external testing dataset. Adding CXR-Age to the multivariable model resulted in significant improvements for predicting both outcomes in both testing datasets (p < 0.001 for all comparisons).
Conclusions: Based on a CXR image, CXR-Age predicted long-term all-cause and cardiovascular mortality.
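The hazard ratios in this abstract are reported per 5-year increase. Suppose the age predictor enters a Cox proportional hazards model linearly on the log-hazard scale. That is an assumption here, since the abstract does not state the model form. Under it, the HR for an increase of delta years is exp(beta * delta), so an HR can be rescaled to other increments. The sketch converts the reported PLCO all-cause values (2.26 for CXR-Age and 1.77 for chronological age, per 5 years) to per-year HRs. The function name is hypothetical.

```python
import math

def rescale_hazard_ratio(hr: float, from_units: float, to_units: float) -> float:
    """Rescale an HR reported per `from_units` of a predictor to per `to_units`,
    assuming a log-linear effect: HR(delta) = exp(beta * delta)."""
    beta = math.log(hr) / from_units  # log-hazard per single unit (here, per year)
    return math.exp(beta * to_units)

per_year_cxr = rescale_hazard_ratio(2.26, 5, 1)     # CXR-Age, all-cause mortality (PLCO)
per_year_chrono = rescale_hazard_ratio(1.77, 5, 1)  # chronological age, same outcome
print(round(per_year_cxr, 3), round(per_year_chrono, 3))
```

The transformation is monotone, so the per-year values keep the abstract's ordering: CXR-Age carries the larger hazard per unit of age. Under the same assumption, confidence limits can be rescaled the same way.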
Article
Biological aging involves an interplay of conserved and targetable molecular mechanisms, summarized as the hallmarks of aging. Metformin, a biguanide that combats age-related disorders and improves health span, is the first drug to be tested for its age-targeting effects in a large clinical trial, TAME (Targeting Aging with Metformin). This review focuses on metformin's mechanisms in attenuating the hallmarks of aging and their interconnectivity: improving nutrient sensing, enhancing autophagy and intercellular communication, protecting against macromolecular damage, delaying stem cell aging, modulating mitochondrial function, regulating transcription, and lowering telomere attrition and senescence. These characteristics make metformin an attractive gerotherapeutic to translate to human trials.