Research article
Using machine learning model for predicting risk of memory
decline: A cross sectional study
Ying Song a, Yansun Sun b, Qi Weng a, Li Yi a,*
a Department of Neurology, Peking University Shenzhen Hospital, Shenzhen, China
b Department of Geriatrics, Peking University Shenzhen Hospital, Shenzhen, China
ARTICLE INFO
Keywords:
Machine learning
Alzheimer’s disease
Memory decline
NHANES
SHAP
ABSTRACT
Background: Memory decline is the earliest symptom of various neurodegenerative diseases, such
as Alzheimer’s disease (AD). However, the accurate prediction and identification of risk factors
leading to memory decline remain limited.
Objective: The objective of this study is to develop and validate a machine learning model that can
accurately predict the risk of memory decline among US adults.
Methods: A total of 9971 individuals were enrolled from the National Health and Nutrition
Examination Survey (NHANES) 2015–2016 database. The least absolute shrinkage and selection
operator (LASSO) was used to screen for characteristic predictors. Five machine learning (ML)
algorithms were employed: Logistic Regression, ExtraTrees classifier, Bagging classifier, eXtreme
Gradient Boosting (XGBoost), and Random Forest (RF). The performance of each model was
evaluated by confusion matrix, area under the curve (AUC), accuracy, precision, specificity,
recall, and F1 score.
Results: The final sample comprised 4525 subjects, of whom 7.7 % (N = 347) exhibited
memory decline. The ExtraTrees classifier and XGBoost models demonstrated superior prediction
performance and clinical value compared with the other machine learning models, with AUC
values of 0.915 and 0.911, respectively. They also predicted memory decline accurately in the
external dataset, with AUCs of 0.851 and 0.843, respectively.
Conclusion: The ExtraTrees classifier and XGBoost models were the two best-performing models
for predicting memory decline. Nevertheless, future investigations are needed to confirm the
accuracy of our findings.
* Corresponding author.
E-mail address: yilitj@hotmail.com (L. Yi).
Contents lists available at ScienceDirect
Heliyon
journal homepage: www.cell.com/heliyon
https://doi.org/10.1016/j.heliyon.2024.e39575
Received 1 July 2024; Received in revised form 16 October 2024; Accepted 17 October 2024
Heliyon 10 (2024) e39575
Available online 19 October 2024
2405-8440/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Memory decline, which refers to the deterioration of the storage and utilization of learned information in the brain, serves as an
early indicator for many neurodegenerative disorders [1]. For example, unlike other cognitive failure syndromes in humans, Alzheimer’s
disease (AD) specifically causes a significant decline in declarative memory during its initial phases [2,3]. An estimated 13 million
individuals are projected to experience memory impairment associated with AD by the year 2050 [4]. Over 65 % of individuals with
multiple sclerosis (MS) experience cognitive impairment, with memory problems affecting more than 40 % of them [5]. The early
cognitive alterations of Parkinson’s disease (PD) may also be accompanied by memory impairment, which increases the risk of early
progression to dementia [6]. Furthermore, deviations in memory processes can lead to psychopathology in a range of more subtle ways.
Even in the absence of these neurologic and neuropsychiatric disorders, memory problems are believed to have detrimental effects
on individuals and on society at large, as memory is essential for animals to control behavior and to adapt to their surroundings [7].
Considering the high prevalence of memory decline, it is imperative to develop risk models that help mitigate and postpone its onset.
Previous studies have investigated the risk factors for memory decline, which has been valuable for detecting it early and developing
specific treatments in clinical settings [8–10]. For example, Flöel et al. examined the relationship between lifestyle
factors and memory decline in elderly individuals using linear regression analysis [11]. Handajani et al. analyzed risk and protective
factors for memory impairment among Indonesian older adults [12]. However, a comprehensive and accurate model for predicting the
likelihood of memory impairment is still lacking. Although various studies have compared the predictors of memory decline,
they mainly used traditional methods, such as logistic or linear regression [13,14]. In addition, the existing risk prediction
models focused on old or very old people, whereas many of the diseases associated with memory loss are not limited to the elderly, and
less is known about what occurs in earlier years. Furthermore, it is during these early stages that preventive
measures have the potential to be more impactful in reversing or improving memory decline [15–17]. Therefore, it is crucial to develop
accurate risk screening models for memory impairment across all age groups and to enhance the management of adults at high risk.
Machine learning methods can enhance the accuracy of predicting memory deterioration, addressing the limitations of conventional
methods. Machine learning (ML) is a field that combines statistics, which focuses on extracting relationships from data, and
computer science, which emphasizes the development of efficient computing techniques [18]. Owing to its ability to handle
extensive, intricate, and diverse data, ML has driven notable advances in medical research and greatly propelled worldwide
healthcare [19]. For example, Fayemiwo et al. built four ML algorithms for the prediction of dementia by using three cognitive
assessment tasks [20]. Their results indicated that immediate-word recall is reliable in predicting dementia, with the best model
performance in all the experiments. Bracher-Smith et al. applied nine ML models for predicting schizophrenia, which offers a promising
direction towards the accurate forecasting of psychiatric diseases [21]. These studies indicate that machine learning
has the potential to be a highly successful method for enhancing the accuracy of disease diagnosis and risk prediction, as well as
promoting effective intervention.
However, few studies have attempted to predict risk factors for memory problems using machine learning approaches. The
present study uses data from a large cross-sectional survey to identify and predict memory decline among U.S. adults using machine
learning approaches. To the best of our knowledge, this is the first study to employ an ML approach for memory decline risk
assessment based on large population databases. The NHANES data collection and survey methods underwent rigorous examination
and strictly adhered to standard protocols for quality control. Our findings provide an accurate and reliable risk model to support
the prevention of memory deterioration. Fig. 1 displays the inclusion and exclusion criteria of the current analysis.
Fig. 1. Flow chart of sample selection in current analysis.
Abbreviations: NHANES, National Health and Nutrition Examination Survey; BMI, body mass index; COPD, chronic obstructive pulmonary
disease; RBC, red blood cell.
Y. Song et al. Heliyon 10 (2024) e39575
2. Materials and methods
2.1. Study design and participants
The initial sample included 9971 individuals from the NHANES 2015–2016 cycle. A total of 28 predictors associated with
memory disorder were included in this study. After excluding 3717 individuals with missing uric acid data, 1 individual with missing
triglycerides data, 3 individuals with missing iron data, 1 individual with missing creatinine data, 8 individuals with missing creatine
phosphokinase data, 1109 individuals with missing memory data, 22 individuals with missing serum total folate data, 33 individuals
with missing red blood cell (RBC) folate data, 4 individuals with missing glycohemoglobin data, 47 individuals with missing sedentary
behavior data, 41 individuals with missing depression data, 6 individuals with missing anxiety data, 4 individuals with missing
concentration data, 2 individuals with missing seeing data, 1 individual with missing hearing data, 4 individuals with missing cancer
data, 3 individuals with missing stroke data, 21 individuals with missing coronary heart disease data, 3 individuals with missing
chronic obstructive pulmonary disease (COPD) data, 17 individuals with missing sleep data, 353 individuals with missing income data,
3 individuals with missing hypertension data, and 43 individuals with missing body mass index (BMI) data, the final sample consisted of
4525 participants.
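The exclusion counts listed above can be checked against the reported final sample size. The following is a minimal sketch of that arithmetic; the dictionary keys and the helper name `remaining_after_exclusions` are labels chosen here for illustration, and the counts are taken directly from the paragraph above.

```python
# Exclusion counts as listed in Section 2.1 (variable with missing data -> number excluded).
EXCLUSIONS = {
    "uric acid": 3717,
    "triglycerides": 1,
    "iron": 3,
    "creatinine": 1,  # the lab analyte listed as creatinine in Section 2.3
    "creatine phosphokinase": 8,
    "memory": 1109,
    "serum total folate": 22,
    "RBC folate": 33,
    "glycohemoglobin": 4,
    "sedentary behavior": 47,
    "depression": 41,
    "anxiety": 6,
    "concentration": 4,
    "seeing": 2,
    "hearing": 1,
    "cancer": 4,
    "stroke": 3,
    "coronary heart disease": 21,
    "COPD": 3,
    "sleep": 17,
    "income": 353,
    "hypertension": 3,
    "BMI": 43,
}

INITIAL_SAMPLE = 9971  # NHANES 2015-2016 participants before exclusions


def remaining_after_exclusions(initial: int, exclusions: dict) -> int:
    """Subtract all sequential exclusions from the initial sample size."""
    return initial - sum(exclusions.values())


final_sample = remaining_after_exclusions(INITIAL_SAMPLE, EXCLUSIONS)
print(final_sample)  # 4525
```

The exclusions sum to 5446, which leaves the 4525 participants reported in the text.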
2.2. Memory decline
Clinically, memory problems are identified based on performance on multiple tests, as well as accounts from the individual and
their family [22]. The Physical Functioning questionnaire (PFQ) provides respondent-level interview data on functional limitations
caused by long-term physical, mental, and emotional problems or illness. Memory function was assessed using the following
question: “Are you limited in any way because of difficulty remembering or because you experience periods of confusion?”
Participants answering “Yes” were considered to have impaired memory [1,23].
2.3. Other covariates
In this study, demographic factors included age (20–80 years), gender (male and female), educational level (≤9th grade, 9-11th
grade/includes 12th grade with no diploma, high school graduate/GED or equivalent, some college or AA degree, and college
graduate or above), and income. We divided income into three groups based on the family poverty income ratio (PIR): low income (<1),
median income ([1,3)), and high income (≥3). Lifestyle variables included smoking status (whether or not subjects had smoked fewer
than 100 cigarettes in their lifetime), sleep duration, and sedentary behavior. Laboratory variables, including glycohemoglobin, RBC
folate, serum total folate, albumin, alkaline phosphatase, aspartate aminotransferase, alanine aminotransferase, blood urea nitrogen,
cholesterol, creatine phosphokinase, creatinine, gamma glutamyl transferase, iron, triglycerides, and uric acid, were obtained from
laboratory data. Disease-related factors, including BMI, hypertension, coronary heart disease, stroke, COPD, hearing, seeing,
concentration, anxiety, and depression, were obtained from questionnaires.
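The three income groups defined above by the poverty income ratio can be written as a small mapping. This is a sketch only: the function name `income_group` is hypothetical, and the boundaries follow the intervals stated in the text (<1, [1,3), ≥3).

```python
def income_group(pir: float) -> str:
    """Map a family poverty income ratio (PIR) to the income group used in the text."""
    if pir < 0:
        raise ValueError("PIR cannot be negative")
    if pir < 1:
        return "low income"      # PIR < 1
    if pir < 3:
        return "median income"   # 1 <= PIR < 3
    return "high income"         # PIR >= 3


for value in (0.5, 1.0, 2.99, 3.0):
    print(value, income_group(value))
```

The boundary values 1 and 3 fall into the median and high groups, respectively, because the middle interval is closed on the left and open on the right.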
Fig. 2. Workflow to construct machine learning models for memory decline risk prediction.
2.4. Machine learning model development
The importance values for each feature were determined by employing the LASSO algorithm with the optimal α parameter. To
optimize the algorithm, 5-fold cross-validation was implemented within the training set. Ultimately, the ML models were built from
the significant features to improve their predictive capabilities.
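For reference, the LASSO objective can be written out explicitly. The text does not state which formulation or software scaling was used, so the form below is an assumption: the standard linear LASSO with the penalty weight α applied to the L1 norm of the coefficients and the squared error scaled by 1/(2n), where n is the number of training samples, p the number of candidate predictors, and β_0 the intercept.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2
  \; + \; \alpha \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Under this formulation, larger values of α shrink more coefficients exactly to zero, and the predictors with non-zero coefficients (or the largest absolute coefficients) are the ones retained for modeling.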
Participants were randomly assigned to the training (N = 3620) and internal validation sets (N = 905) in an 8:2 ratio. Five machine
learning methods, namely Logistic Regression, ExtraTrees classifier, Bagging classifier, XGBoost, and RF, were applied to develop the
risk models based on the training set. Further, a grid search with 5-fold cross-validation was employed to evaluate all candidate
combinations of hyperparameters and select the best one for each ML model [24]. Then, each model’s performance was assessed by
confusion matrix, AUC, accuracy, precision, specificity, recall, and F1 score. Moreover, decision curve analysis (DCA) was conducted
on the five machine learning models to compare the net benefit of each model with alternative approaches for clinical decision-making.
Finally, the calibration curve was created by plotting the predicted probability against the observed probability obtained from the
machine learning models. Fig. 2 shows the experimental workflow for the modeling process.
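The threshold-based metrics named above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the class and function names are hypothetical, and the counts in the example are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true positives: memory decline correctly predicted
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(cm: ConfusionMatrix) -> dict:
    """Compute accuracy, precision, specificity, recall, and F1 from a confusion matrix."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    specificity = cm.tn / (cm.tn + cm.fp) if (cm.tn + cm.fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": precision,
        "specificity": specificity,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (hypothetical, not from the paper).
example = ConfusionMatrix(tp=40, fp=10, tn=930, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, accuracy and specificity can remain high even when recall is modest, which is why the metrics are reported together rather than individually.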
2.5. External validation
Moreover, to assess the generalization ability of the models, an external validation set was constructed from the NHANES 2017–2018
cycle. Model performance in the external validation cohort was evaluated by confusion matrix, AUC, DCA, and calibration curves.
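A calibration curve of the kind mentioned above is usually built by grouping predictions into bins and comparing the mean predicted probability with the observed event rate in each bin. The sketch below assumes equal-width bins; the function name `calibration_bins`, the number of bins, and the toy data are illustrative choices, not details reported in the study.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    prob_sums = [0.0] * n_bins
    event_counts = [0] * n_bins
    counts = [0] * n_bins
    for label, prob in zip(y_true, y_prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # Equal-width bins; a probability of exactly 1.0 goes into the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        prob_sums[index] += prob
        event_counts[index] += label
        counts[index] += 1
    return [
        (prob_sums[i] / counts[i], event_counts[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]


# Toy data, invented for illustration.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.05, 0.15, 0.10, 0.90, 0.95, 0.85]
curve = calibration_bins(y_true, y_prob, n_bins=2)
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

A well-calibrated model yields points close to the diagonal, where the mean predicted probability equals the observed event rate.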
2.6. Feature importance
The SHapley Additive exPlanations (SHAP) analysis provided an explanation of the crucial factors that played a significant role in
generating model predictions and quantified their respective contributions to the model’s predictions [25]. In this study, we
calculated the feature importance using the SHAP values for the ExtraTrees classifier and XGBoost models, respectively.
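Global feature importance from SHAP values is commonly summarized as the mean absolute SHAP value of each feature across samples. The sketch below shows only that aggregation step; it assumes the per-sample SHAP values have already been computed elsewhere, and the function name and the toy matrix are invented for illustration and are not results from the study.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (rows = samples, columns = features)."""
    if not shap_values:
        raise ValueError("shap_values must contain at least one sample")
    if any(len(row) != len(feature_names) for row in shap_values):
        raise ValueError("each row must have one value per feature")
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean absolute contribution first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Toy SHAP matrix (hypothetical values): 3 samples x 3 features.
features = ["concentration", "age", "sleep duration"]
toy_shap = [
    [0.30, -0.10, 0.02],
    [-0.20, 0.15, -0.01],
    [0.40, 0.05, 0.03],
]
ranking = rank_features_by_mean_abs_shap(toy_shap, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects the magnitude of a feature's influence rather than its direction.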
2.7. Statistical methods
All analyses and calculations were performed using R software (4.3.0, http://www.Rproject.org) and Python (version 3.12.2,
https://www.python.org). Descriptive statistics were used to characterize the participants based on the outcome of memory decline:
Chi-squared tests were applied to categorical variables (frequency (%)), while Kruskal-Wallis tests were applied to continuous
variables (mean ± standard deviation (SD)). P < 0.05 was considered statistically significant.
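As a worked example of the chi-squared test for a categorical variable, the sketch below computes the Pearson statistic for the gender-by-memory-decline counts given in Table 1 (male: 1991 without and 153 with memory decline; female: 2187 and 194). It assumes no continuity correction, which the text does not specify, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]]."""
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Gender row of Table 1: (no decline, decline) for males, then females.
statistic, p_value = chi_square_2x2(1991, 153, 2187, 194)
print(round(statistic, 2), round(p_value, 3))  # 1.63 0.202
```

Under these assumptions the result agrees with the P-value of 0.202 reported for gender in Table 1.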
3. Results
3.1. Characteristics of participants
A total of 4525 participants were included in the analysis. Participant demographics and memory-related variables are presented in
Table 1. Approximately 7.7 % (N = 347) of participants had memory decline, whereas 92.3 % (N = 4178) did not.
For the 347 participants with memory decline, the mean age was 57.8 ± 16.4 years. Moreover, 8.1 % (N = 28), 8.6 % (N = 30), 12.7 %
(N = 44), 15.6 % (N = 54), 25.9 % (N = 90), 20.2 % (N = 70), and 72.0 % (N = 250) of them had a history of COPD, coronary
heart disease, stroke, cancer, hearing difficulty, seeing difficulty, and concentration difficulty, respectively. Among participants
with memory decline, the average levels of glycohemoglobin, RBC folate, serum total folate, albumin, alkaline phosphatase, aspartate
aminotransferase, alanine aminotransferase, blood urea nitrogen, cholesterol, creatine phosphokinase, creatinine, gamma glutamyl
transferase, iron, triglycerides, and uric acid were 6.1 ± 1.4 (%), 1350.4 ± 652.5 (nmol/L), 48.8 ± 45.2 (nmol/L), 42.4 ± 3.4 (g/L),
72.2 ± 21.9 (IU/L), 26.8 ± 15.7 (U/L), 25.5 ± 17.4 (U/L), 5.5 ± 2.4 (mmol/L), 5.0 ± 1.1 (mmol/L), 145.1 ± 255.0 (IU/L),
80.9 ± 32.3 (umol/L), 35.3 ± 41.6 (IU/L), 13.8 ± 6.0 (umol/L), 1.9 ± 1.2 (mmol/L), and 323.2 ± 96.4 (umol/L), respectively.
3.2. Comparison of models
After applying the LASSO algorithm with the optimal alpha parameter (0.001) in the training set, the top 15 most significant
variables were used to construct the models. We developed machine learning-based prediction algorithms using Logistic regression,
ExtraTrees classifier, Bagging classifier, XGBoost, and RF. The confusion matrices and corresponding predictive values for the five ML
models are shown in Fig. 3(A–E) and Fig. 4(A–F), respectively. Among the considered machine learning models, the ExtraTrees
classifier and XGBoost models achieved the best performance in the internal validation set, with AUCs of 0.915 and 0.911,
respectively (Fig. 4A). Moreover, the Logistic regression and the XGBoost models exhibited the best performance in the external
validation set, with AUCs of 0.874 and 0.866, respectively (Fig. 4B). In addition, the Bagging classifier showed the worst predictive
performance, with AUC values of 0.890 and 0.853 in the internal and external validation sets, respectively (Fig. 4A and B).
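The AUC values compared above can be interpreted as the probability that a randomly chosen participant with memory decline receives a higher predicted score than a randomly chosen participant without it. The sketch below computes AUC directly from that pairwise definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, and the quadratic loop is meant for clarity rather than speed.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example (invented): 2 negatives and 2 positives.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is unaffected by the classification threshold, unlike accuracy, precision, or recall.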
Table 2 shows that the ExtraTrees classifier (0.934) and XGBoost (0.935) had the best accuracy in the internal validation set
when identifying memory decline. For precision, XGBoost (0.562) showed the best performance. For specificity, all models showed
excellent performance, and the differences between them were not statistically significant (Table 2). Moreover, Logistic Regression had higher
Table 1
Demographic and clinical characteristics based on the status of memory decline.
Variable                              Memory decline: No    Memory decline: Yes   P-value
N                                     4178                  347
Age (years)                           48.2 ± 17.3           57.8 ± 16.4           <0.001
Gender                                                                            0.202
  Male                                1991 (47.7 %)         153 (44.1 %)
  Female                              2187 (52.3 %)         194 (55.9 %)
Education                                                                         <0.001
  ≤9th grade                          424 (10.1 %)          81 (23.3 %)
  9-11th grade                        453 (10.8 %)          51 (14.7 %)
  high school graduate                893 (21.4 %)          86 (24.8 %)
  some college or AA degree           1290 (30.9 %)         93 (26.8 %)
  college graduate or above           1118 (26.8 %)         36 (10.4 %)
Income                                                                            <0.001
  low income                          1395 (33.4 %)         199 (57.3 %)
  median income                       625 (15.0 %)          58 (16.7 %)
  high income                         2158 (51.7 %)         90 (25.9 %)
BMI                                   29.6 ± 7.1            30.6 ± 7.6            0.009
Hypertension                                                                      <0.001
  No                                  2753 (65.9 %)         155 (44.7 %)
  Yes                                 1425 (34.1 %)         192 (55.3 %)
Sleep duration                        7.6 ± 1.5             8.3 ± 2.0             <0.001
Sedentary behavior                    367.2 ± 195.8         392.3 ± 209.5         0.023
COPD                                                                              <0.001
  No                                  4071 (97.4 %)         319 (91.9 %)
  Yes                                 107 (2.6 %)           28 (8.1 %)
Coronary heart disease                                                            <0.001
  No                                  4029 (96.4 %)         317 (91.4 %)
  Yes                                 149 (3.6 %)           30 (8.6 %)
Stroke                                                                            <0.001
  No                                  4065 (97.3 %)         303 (87.3 %)
  Yes                                 113 (2.7 %)           44 (12.7 %)
Cancer                                                                            <0.001
  No                                  3812 (91.2 %)         293 (84.4 %)
  Yes                                 366 (8.8 %)           54 (15.6 %)
Hearing                                                                           <0.001
  No                                  3871 (92.7 %)         257 (74.1 %)
  Yes                                 307 (7.3 %)           90 (25.9 %)
Seeing                                                                            <0.001
  No                                  3997 (95.7 %)         277 (79.8 %)
  Yes                                 181 (4.3 %)           70 (20.2 %)
Concentration difficulty                                                          <0.001
  No                                  3991 (95.5 %)         97 (28.0 %)
  Yes                                 187 (4.5 %)           250 (72.0 %)
Anxiety                                                                           <0.001
  Daily                               523 (12.5 %)          144 (41.5 %)
  Weekly                              606 (14.5 %)          60 (17.3 %)
  Monthly                             551 (13.2 %)          38 (11.0 %)
  A few times a year                  1517 (36.3 %)         76 (21.9 %)
  Never                               981 (23.5 %)          29 (8.4 %)
Depression                                                                        <0.001
  Daily                               125 (3.0 %)           98 (28.2 %)
  Weekly                              236 (5.6 %)           59 (17.0 %)
  Monthly                             329 (7.9 %)           44 (12.7 %)
  A few times a year                  1391 (33.3 %)         86 (24.8 %)
  Never                               2097 (50.2 %)         60 (17.3 %)
Glycohemoglobin (%)                   5.8 ± 1.1             6.1 ± 1.4             <0.001
RBC folate (nmol/L)                   1180.3 ± 519.8        1350.4 ± 652.5        <0.001
Serum total folate (nmol/L)           42.4 ± 37.7           48.8 ± 45.2           0.003
Albumin (g/L)                         43.2 ± 3.5            42.4 ± 3.4            <0.001
Alkaline Phosphatase (IU/L)           69.2 ± 23.2           72.2 ± 21.9           0.022
Aspartate Aminotransferase (U/L)      25.5 ± 11.6           26.8 ± 15.7           0.050
Alanine Aminotransferase (U/L)        25.2 ± 17.1           25.5 ± 17.4           0.782
Blood Urea Nitrogen (mmol/L)          5.2 ± 2.1             5.5 ± 2.4             0.008
Cholesterol (mmol/L)                  5.0 ± 1.1             5.0 ± 1.1             0.565
Creatine Phosphokinase (IU/L)         160.2 ± 181.8         145.1 ± 255.0         0.151
Creatinine (umol/L)                   76.9 ± 37.3           80.9 ± 32.3           0.056
Gamma Glutamyl Transferase (IU/L)     27.9 ± 42.3           35.3 ± 41.6           0.002
Iron (umol/L)                         14.5 ± 6.1            13.8 ± 6.0            0.022
recall (0.567) and F1 score (0.555) than the other models, followed by the XGBoost model, with a recall of 0.537 and an F1 score of
0.550 (Table 2).
Fig. 4C shows that the XGBoost algorithm had the largest net benefit in the internal validation set, whereas the Bagging classifier
showed the smallest net benefit, within the risk thresholds between 0.1 and 0.2. Fig. 4D shows that the net benefit of the ExtraTrees
classifier model was largest in the external validation cohort. The calibration curve illustrates the deviation in performance of each
model by comparing the predicted probability of the model with the actual probability [26]. Fig. 4E and F show that most of the models
were well calibrated on visual inspection. The XGBoost model performed well in both the internal and external validation sets,
indicating that the XGBoost algorithm had a certain predictive value.
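The net benefit plotted in a decision curve is, in its standard form, the true-positive rate minus the false-positive rate weighted by the odds of the risk threshold. The sketch below implements that standard formula together with the "treat all" reference strategy; the function names and the example counts are hypothetical, and only the prevalence (347 of 4525) is taken from the study.

```python
def net_benefit(tp: int, fp: int, n: int, threshold: float) -> float:
    """Net benefit of a model at a given risk threshold: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(prevalence: float, threshold: float) -> float:
    """Net benefit of classifying everyone as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Illustrative model counts (hypothetical): 40 true and 10 false positives among 1000 people.
print(net_benefit(tp=40, fp=10, n=1000, threshold=0.2))

# "Treat all" reference at the study prevalence for the thresholds discussed in the text.
prevalence = 347 / 4525
for pt in (0.1, 0.2):
    print(pt, net_benefit_treat_all(prevalence, pt))
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one.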
3.3. Model interpretability
To describe the clinical significance of critical features more effectively, we quantified their importance as SHAP values. Fig. 5
shows the SHAP plots for the ExtraTrees classifier (Fig. 5A and B) and the XGBoost (Fig. 5C and D) models. Fig. 5A and C provide an
overview of the impact of factors on the ExtraTrees classifier and the XGBoost model, respectively. Fig. 5B and D present the crucial
clinical features according to the average absolute SHAP value for the ExtraTrees classifier and the XGBoost models, respectively. The
results reveal that concentration, depression, age, anxiety, and income were the strongest predictors for the ExtraTrees classifier
model, whereas concentration, age, depression, sleep duration, and sedentary behavior were the strongest predictors for the XGBoost
model.
4. Discussion
In this study, we applied 5 ML algorithms to national population-based survey data to predict memory decline risk among US
adults. Our results demonstrated that the ExtraTrees classifier and XGBoost displayed better predictive performance and clinical utility
than the other ML models in the internal validation set, with AUC values of 0.915 and 0.911, respectively. The ExtraTrees
classifier and XGBoost models were further validated using an external validation cohort, and they exhibited consistent predictive
performance for memory decline, with AUCs of 0.851 and 0.843, respectively. Furthermore, compared to other ML models, the
Table 1 (continued)
Variable                              Memory decline: No    Memory decline: Yes   P-value
Triglycerides (mmol/L)                1.8 ± 1.5             1.9 ± 1.2             0.124
Uric acid (umol/L)                    320.8 ± 84.5          323.2 ± 96.4          0.619
Fig. 3. The confusion matrix for each model in the internal validation. (A) Logistic regression; (B) ExtraTrees classifier; (C) Bagging classifier; (D)
Gradient Boosting; (E) RF.
XGBoost model had a larger net benefit (threshold: 0.1–0.2) in the internal validation set, whereas the ExtraTrees classifier had a larger
net benefit (threshold: 0.1–0.2) in the external validation set. Our results demonstrated the robustness and applicability of the
ExtraTrees classifier and XGBoost models in memory decline risk prediction.
Among the individual ML models, the ExtraTrees classifier and XGBoost were the 2 best-performing models, as identified by the
AUC scores. Similar results have been reported in other studies. For example, Tateishi et al. constructed ML models for predicting
inappropriate implantable cardioverter-defibrillator (ICD) therapy and found that the ExtraTrees classifier performed better than other
models, with an AUC of 0.891 in the training set and an AUC of 0.869 in the test set [27]. Zhong et al. showed that the
XGBoost model was superior to other models for the risk prediction of early cognitive impairment in patients with hypertension in a
Fig. 4. The AUC and DCA curve of each model in the internal and external validation cohorts. (A), (C) and (E): internal validation set; (B), (D) and
(F): external validation set.
Table 2
Performance of each model for prediction in the internal validation set.
Models                  Accuracy   AUC     Precision   Specificity   Recall   F1
Logistic Regression     0.933      0.906   0.543       0.962         0.567    0.555
ExtraTrees classifier   0.934      0.915   0.557       0.968         0.507    0.531
Bagging                 0.929      0.890   0.522       0.961         0.537    0.529
XGBoost                 0.935      0.911   0.562       0.977         0.537    0.550
RF                      0.930      0.894   0.538       0.971         0.418    0.471
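Because the F1 score is the harmonic mean of precision and recall, the F1 column of Table 2 can be recomputed from the other two columns as a consistency check. The sketch below does this; the function name `f1_from` is hypothetical, and small differences are expected because the tabulated precision and recall are already rounded to three decimals.

```python
# (precision, recall, reported F1) per model, copied from Table 2.
TABLE_2 = {
    "Logistic Regression": (0.543, 0.567, 0.555),
    "ExtraTrees classifier": (0.557, 0.507, 0.531),
    "Bagging": (0.522, 0.537, 0.529),
    "XGBoost": (0.562, 0.537, 0.550),
    "RF": (0.538, 0.418, 0.471),
}


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


differences = {}
for model, (precision, recall, reported_f1) in TABLE_2.items():
    recomputed = f1_from(precision, recall)
    differences[model] = abs(recomputed - reported_f1)
    print(f"{model}: recomputed={recomputed:.3f} reported={reported_f1:.3f}")
```

All recomputed values agree with the reported F1 scores to within rounding error.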
Fig. 5. The SHAP summary plot for the important features. (A) and (C) SHAP summary plot, (B) and (D) feature importance.
cross-sectional study (AUC: 0.88, F1 score: 0.59, accuracy: 0.81, sensitivity: 0.84, specificity: 0.80) [28]. Chen et al. built a machine
learning model for the detection of mild cognitive impairment among PD patients without dementia and concluded that the XGBoost
model had the best classification performance, with an accuracy of 91.67 % and an AUC of 0.94 in the test set [29]. These studies indicate
that both the ExtraTrees classifier and XGBoost may be good choices for outcome prediction and for effective clinical implementation.
In line with the previous literature, our results consistently identified several risk factors for memory decline. These known risk
factors have an important relationship with Alzheimer’s disease and other neurological disorders. Among them, concentration difficulty
was the strongest predictor of memory decline. Attention is a fundamental cognitive skill necessary for advanced cognitive capabilities,
such as executive functions or memory, that begins to develop in early infancy [30]. Elder et al. concluded that patients with Lewy body
dementia (LBD) often have impaired attentional and executive function, indicating that attention difficulties are significantly associated
with LBD [31]. Impaired memory and alterations in mood are common symptoms of epilepsy. Shi et al. discovered that depression and
anxiety are significant risk factors for epilepsy, which may relate to the breakdown of broadly distributed limbic networks [32,33]. The
prevalence and incidence of many neurological diseases increase with age. For example, Alzheimer’s disease and vascular dementia
are age-related progressive neurodegenerative disorders that lead to memory decline [34,35]. Moreover, sleep duration, BMI, and
education are also common risk factors for AD and vascular dementia [36–38]. Hachinski et al. reported that the
prevalence of dementia is increasing in low-income and middle-income countries, while decreasing in high-income countries [39],
probably because lower income is associated with a reduced hippocampal volume [40]. Unlike these specific neurological disorders,
memory loss is a symptom shared by many neurodegenerative diseases. Thus, predicting risk factors for memory decline is crucial for the
early detection of various neurological disorders.
Metabolic disorders could damage brain function. This is characterized by a significant decrease in awareness, which leads to
reduced reactivity, aberrant receptivity, impaired content, and retained memory. In this study, we took metabolic or nutrition-related
factors into consideration, including iron, creatinine, alkaline phosphatase, alanine aminotransferase, gamma glutamyl
transferase, and uric acid. We found high levels of iron to be negatively associated with the prevalence of memory decline, probably
because iron is a key nutrient that influences brain development and function. In contrast, a previous study reported that elevated
levels of serum iron were linked to a decline in learning and memory, a correlation believed to be caused by alterations
in the electrophysiology and functionality of the affected brain region [41]. This indicates that iron is both an essential nutrient for
brain development and a potentially harmful substance. Additional research is needed to establish the ideal dosage of iron. For
example, apart from serum iron, we would quantify the association between brain iron levels and memory deterioration using
neuroimaging technology, such as magnetic resonance imaging (MRI), in the future. Moreover, in vitro and in vivo experiments could be
performed in subsequent studies. Our study supports previous research suggesting that elevated serum uric acid levels may be a potential
mechanism of cognitive impairment in bipolar disorder (BD) [42]. These risk factors may directly or indirectly participate in regulating
the interaction between neural and vascular cells, consequently influencing the maintenance of normal learning and memory
activities.
Our study also revealed novel aspects concerning some established risk factors for these conditions. In this study, we found
that increased sedentary time is associated with memory decline among adults. Wickel et al. discovered that increased sedentary time
from 9 to 15 years predicted higher adolescent levels of inhibition and working memory [43]. In contrast to adolescents, Maasakkers et al.
found no associations between baseline sedentary behavior and cognitive decline among older adults without
dementia [44]. However, a solitary assessment session in youth or older age may not provide an adequate representation of the entire
population. Our study identified increased sedentary time as a crucial factor for memory decline among adults aged 20–80
years, a broader age range than in previous studies. However, future studies are needed to further disentangle the
complex interrelationships between sedentary behavior and memory-related outcomes.
This study had several strengths. To our knowledge, few studies have concentrated on predicting risk factors for memory decline
using machine learning approaches among adults. While numerous predictive models have demonstrated encouraging performance in
research, there is still insufficient data on their use in clinical settings and on the availability of interpretable risk prediction
models to assist in disease risk screening [45]. In this study, machine learning models were created to examine the significant
characteristics for predicting the risk of memory deterioration using data from the NHANES database. In addition to the many known risk
variables associated with memory deterioration, this study highlighted unexpected factors that are often ignored in clinical
practice. Our findings may offer insights for therapeutic application. Moreover, the models were evaluated in an external validation
cohort, which supports their applicability in clinical practice. Importantly, to explain the ExtraTrees classifier and
XGBoost models, we identified some important variables associated with memory decline by using the SHAP analysis.
Additionally, the present data were derived from the NHANES database, which is a large, standardized, nationwide database
representative of the US population. Our results may provide insights into the risk factors for predicting memory decline in the global
population. Moreover, they can be used in comparative analyses in global health research. In the future, we plan to extend our study by
developing models that include other populations, such as Asian and European populations.
This study has several limitations. First, memory decline was treated as a binary variable, so we could not evaluate the
correlation between the factors and the severity of memory decline. Second, unknown variables may influence the correlation between
key parameters and the risk of memory deterioration.
5. Conclusion
Our results demonstrated that the ExtraTrees classifier and XGBoost displayed better predictive performance and clinical utility
than the other ML models, based on the AUC values in the internal and external validation sets. Moreover, the SHAP analysis showed
that concentration is the most important predictor in our study. ML models have the potential to enhance clinical practice by being
integrated into future clinical decision-making processes for assessing the risk of disease in patients experiencing memory
deterioration. Our findings could be useful for the online screening of possible memory decline patients, offering a major benefit for
the early identification of high-risk individuals. Additionally, they may inform individualized treatment strategies to support clinical
decision-making and illness management for healthcare organizations. Moreover, our results may help promptly identify changes in
patients’ health status and offer targeted health recommendations and cautions.
CRediT authorship contribution statement
Ying Song: Visualization, Validation, Formal analysis. Yansun Sun: Validation, Software, Resources, Data curation. Qi Weng:
Writing – review & editing, Writing – original draft. Li Yi: Supervision, Project administration, Methodology, Investigation, Funding
acquisition, Data curation, Conceptualization.
Ethics approval and consent to participate
The protocol of NHANES is approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board, and the
protocol numbers of 2015–2018 NHANES are ‘#2011–17’ and ‘#2018–01’. Informed consent was obtained from patients before
enrolment; otherwise, the consent was obtained from the patients’ closest relative.
Consent for publication
Not applicable.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available in the NHANES database
(https://www.cdc.gov/nchs/nhanes/index.htm).
Funding
The National Natural Science Foundation of China (22067015), the Shenzhen Science and Technology Innovation Project (grant
numbers: JCYJ20190822090801701; JCYJ20230807095124046), and the Peking University Shenzhen Hospital - Ye Chenghai Charity
Foundation provided funding for this project.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing
interests: Li Yi reports financial support was provided by National Natural Science Foundation of China (22067015). Li Yi reports
financial support was provided by The Shenzhen Science and Technology Innovation Project. Li Yi reports financial support was
provided by Peking University Shenzhen Hospital - Ye Chenghai Charity Foundation. If there are other authors, they declare that they
have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this
paper.
Acknowledgements
None.
References
[1] A. Casillas, et al., Culture and cognition-the association between acculturation and self-reported memory problems among middle-aged and older Latinos in the
National Health and Nutrition Examination Survey (NHANES), 1999 to 2014, J. Gen. Intern. Med. 37 (1) (2022) 258–260.
[2] D.M. Walsh, D.J. Selkoe, Deciphering the molecular basis of memory failure in Alzheimer’s disease, Neuron 44 (1) (2004) 181–193.
[3] Q. Liu, et al., Association between intake of energy and macronutrients and memory impairment severity in US older adults, National Health and Nutrition
Examination Survey 2011-2014, Nutrients 12 (11) (2020).
[4] M. Kaushik, P. Kaushik, S. Parvez, Memory related molecular signatures: the pivots for memory consolidation and Alzheimer’s related memory decline, Ageing
Res. Rev. 76 (2022) 101577.
[5] R. Dutta, et al., Hippocampal demyelination and memory dysfunction are associated with increased levels of the neuronal microRNA miR-124 and reduced
AMPA receptors, Ann. Neurol. 73 (5) (2013) 637–645.
[6] D. Aarsland, et al., Parkinson disease-associated cognitive impairment, Nat. Rev. Dis. Prim. 7 (1) (2021) 47.
[7] C. Ortega-de San Luis, T.J. Ryan, Understanding the physical basis of memory: molecular mechanisms of the engram, J. Biol. Chem. 298 (5) (2022) 101866.
[8] A. Ferrario, et al., Predicting working memory in healthy older adults using real-life language and social context information: a machine learning approach,
JMIR Aging 5 (1) (2022) e28333.
[9] Y.C. Chen, et al., Personalized prediction of postconcussive working memory decline: a feasibility study, J. Personalized Med. 12 (2) (2022).
[10] Y. Li, et al., Exploring memory function in earthquake trauma survivors with resting-state fMRI and machine learning, BMC Psychiatr. 20 (1) (2020) 43.
Y. Song et al. Heliyon 10 (2024) e39575
10
[11] A. Fl¨
oel, et al., Lifestyle and memory in the elderly, Neuroepidemiology 31 (1) (2008) 39–47.
[12] Y.S. Handajani, et al., Memory impairment and its associated risk and protective factors among older adults in Indonesia, Int. J. Neurosci. (2023) 1–9.
[13] W.C. Kreisl, et al., Odor identication ability predicts PET amyloid status and memory decline in older adults, J Alzheimers Dis 62 (4) (2018) 1759–1766.
[14] F.J. Maguire, et al., Baseline association of motoric cognitive risk syndrome with sustained attention, memory, and global cognition, J. Am. Med. Dir. Assoc. 19
(1) (2018) 53–58.
[15] J.R. Marden, et al., Contribution of socioeconomic status at 3 life-course periods to late-life memory function and decline: early and late predictors of dementia
risk, Am. J. Epidemiol. 186 (7) (2017) 805–814.
[16] E. Duzel, H. van Praag, M. Sendtner, Can physical exercise in old age improve memory and hippocampal function? Brain 139 (Pt 3) (2016) 662–673.
[17] M.E. Nelson, et al., The association between homocysteine and memory in older adults, J Alzheimers Dis 81 (1) (2021) 413–426.
[18] R.C. Deo, Machine learning in medicine, Circulation 132 (20) (2015) 1920–1930.
[19] G.S. Handelman, et al., eDoctor: machine learning and the future of medicine, J. Intern. Med. 284 (6) (2018) 603–619.
[20] M.A. Fayemiwo, et al., Immediate word recall in cognitive assessment can predict dementia using machine learning techniques, Alzheimer’s Res. Ther. 15 (1)
(2023) 111.
[21] M. Bracher-Smith, et al., Machine learning for prediction of schizophrenia using genetic and demographic factors in the UK biobank, Schizophr. Res. 246 (2022)
156–164.
[22] K.M. McAvoy, A. Sahay, Targeting adult neurogenesis to optimize hippocampal circuits in aging, Neurotherapeutics 14 (3) (2017) 630–645.
[23] E.E. Dooley, et al., Higher 24-h total movement activity percentile is associated with better cognitive performance in U.S. Older adults, Med. Sci. Sports Exerc.
54 (8) (2022) 1317–1325.
[24] Q. Liu, et al., Development and validation of a preliminary clinical support system for measuring the probability of incident 2-year (pre)frailty among
community-dwelling older adults: a prospective cohort study, Int. J. Med. Inf. 177 (2023) 105138.
[25] M. Yin, et al., Automated machine learning for the early prediction of the severity of acute pancreatitis in hospitals, Front. Cell. Infect. Microbiol. 12 (2022)
886935.
[26] K.K. Venkatesh, et al., Machine learning and statistical models to predict postpartum hemorrhage, Obstet. Gynecol. 135 (4) (2020) 935–944.
[27] R. Tateishi, et al., Risk prediction of inappropriate implantable cardioverter-debrillator therapy using machine learning, Sci. Rep. 13 (1) (2023) 19586.
[28] X. Zhong, et al., A risk prediction model based on machine learning for early cognitive impairment in hypertension: development and validation study, Front.
Public Health 11 (2023) 1143019.
[29] B. Chen, et al., Detection of mild cognitive impairment in Parkinson’s disease using gradient boosting decision tree models based on multilevel DTI indices,
J. Transl. Med. 21 (1) (2023) 310.
[30] I. Rivas, et al., Association between early life exposure to air pollution and working memory and attention, Environ. Health Perspect. 127 (5) (2019) 57002.
[31] G.J. Elder, et al., Effects of transcranial direct current stimulation upon attention and visuoperceptual function in Lewy body dementia: a preliminary study, Int.
Psychogeriatr. 28 (2) (2016) 341–347.
[32] W. Shi, et al., Prevalence and risk factors of anxiety and depression in adult patients with epilepsy: a multicenter survey-based study, Ther Adv Neurol Disord 16
(2023) 17562864231187194.
[33] V. Krishnan, Depression and anxiety in the epilepsies: from bench to bedside, Curr. Neurol. Neurosci. Rep. 20 (9) (2020) 41.
[34] X.W. Zhang, et al., Targeting autophagy in Alzheimer’s disease: animal models and mechanisms, Zool. Res. 44 (6) (2023) 1132–1145.
[35] Y. Yang, et al., Vascular dementia: a microglia’s perspective, Ageing Res. Rev. 81 (2022) 101734.
[36] L. Shi, et al., Sleep disturbances increase the risk of dementia: a systematic review and meta-analysis, Sleep Med. Rev. 40 (2018) 4–16.
[37] T. Ngandu, et al., Education and dementia: what lies behind the association? Neurology 69 (14) (2007) 1442–1450.
[38] M.A. Beydoun, H.A. Beydoun, Y. Wang, Obesity and central obesity as risk factors for incident dementia and its subtypes: a systematic review and meta-analysis,
Obes. Rev. 9 (3) (2008) 204–218.
[39] V. Hachinski, et al., Preventing dementia by preventing stroke: the Berlin Manifesto, Alzheimers Dement. 15 (7) (2019) 961–984.
[40] L. Rafngton, et al., Blunted cortisol stress reactivity in low-income children relates to lower memory function, Psychoneuroendocrinology 90 (2018) 110–121.
[41] S.J. Fretham, E.S. Carlson, M.K. Georgieff, The role of iron in learning and memory, Adv. Nutr. 2 (2) (2011) 112–121.
[42] S. Li, et al., Association between uric acid and cognitive dysfunction: a cross-sectional study with newly diagnosed, drug-naïve with bipolar disorder, J. Affect.
Disord. 327 (2023) 159–166.
[43] E.E. Wickel, Sedentary time, physical activity, and executive function in a longitudinal study of youth, J. Phys. Activ. Health 14 (3) (2017) 222–228.
[44] C.M. Maasakkers, et al., The association of sedentary behaviour and cognitive function in people without dementia: a coordinated analysis across ve cohort
studies from COSMIC, Sports Med. 50 (2) (2020) 403–413.
[45] J. Li, et al., Predicting mortality in intensive care unit patients with heart failure using an interpretable machine learning model: retrospective cohort study,
J. Med. Internet Res. 24 (8) (2022) e38082.
Y. Song et al. Heliyon 10 (2024) e39575
11