Machine learning modeling for predicting adherence to physical activity guideline
Ju-Pil Choe, Seungbak Lee & Minsoo Kang
This study aims to create predictive models for physical activity (PA) guidelines by using machine learning (ML) and to examine the critical determinants influencing adherence to the PA guidelines. A total of 11,638 entries from the National Health and Nutrition Examination Survey were analyzed. Variables were categorized into demographic, anthropometric, and lifestyle categories. Eighteen prediction models were created with six ML algorithms and evaluated via accuracy, F1 score, and area under the curve (AUC). Additionally, we employed permutation feature importance (PFI) to assess the variable significance in each model. The decision tree using all variables emerged as the most effective model for predicting adherence to PA guidelines (accuracy = 0.705, F1 score = 0.819, and AUC = 0.542). Based on the PFI, sedentary behavior, age, gender, and educational status were the most important variables. These results highlight the possibilities of using data-driven methods with ML in PA research. Our analysis also identified crucial variables, providing valuable insights for targeted interventions aimed at enhancing individuals' adherence to PA guidelines.
Keywords Artificial intelligence, Measurement, MPA, VPA, Prediction model, Subjectively measured
Consistent and sufficient physical activity (PA) is fundamental for overall health and wellness, influencing not only chronic diseases1, mental health2, and metabolic syndrome3, but also life expectancy4. Because of these benefits, numerous researchers have investigated the types5, duration6, and intensity7 of exercise that would be beneficial for individuals.
In response, various PA guidelines have been established globally to promote and enhance individuals' health outcomes8–10. The World Health Organization (WHO) recommends that adults, in line with its guidelines, accumulate 150–300 min of moderate-intensity PA or 75–150 min of vigorous-intensity PA, or an equivalent combination of moderate- and vigorous-intensity aerobic PA, per week11. Specific PA guidelines for children and adolescents12 and for older adults13 have also been released. However, despite worldwide efforts by organizations, statistics reveal concerning trends, with only about 24% of the United States population meeting PA guidelines14.
Classifying PA is crucial for understanding behavioral trends and tailoring interventions. Several studies have addressed this issue by classifying individuals based on diverse characteristics to identify key factors influencing PA, such as anthropometric status15 and lifestyle patterns16,17. Variables such as sedentary behavior (SB), sleep, educational status, and body mass index (BMI) play a significant role in this process, offering valuable insights for designing effective strategies to improve adherence to PA guidelines.
Traditionally, analysis of the factors affecting adherence to PA guidelines has relied predominantly on classical methodologies such as logistic regression16 and the receiver operating characteristic curve18. In contrast, the emergence of state-of-the-art artificial intelligence, particularly machine learning (ML), has presented a powerful avenue for PA research using diverse classification models and big data. ML's advantages over classical methods, including its handling of concerns such as overfitting and multicollinearity19 and of big data challenges20, have been extensively discussed. With these advantages, numerous studies have applied ML in the PA field. For example, several studies developed prediction models with approximately 60–80% accuracy in classifying preschool children's PA types from accelerometer data21,22. Similarly, Farrahi et al.23 reported that the accuracy of an artificial neural network model for PA intensity classification with accelerometer data ranged from 80.4 to 90.7%. In addition, one study investigated appropriate combinations of feature subsets for PA class prediction using feature selection methods and classifiers24. Lastly, a study by Nu et al.25 focused on developing prediction models using three algorithms (i.e., Support Vector Machine, Random Forest, and eXtreme Gradient Boosting (XGBoost)) based on the comprehensive characteristics of the subjects and accelerometer data.
Health and Sport Analytics Laboratory, Department of Health, Exercise Science, and Recreation Management, The
University of Mississippi, University 38677, USA. email: kang@olemiss.edu
The significance of the previously mentioned studies lies in their attempt to utilize ML as a research tool for PA classification. However, there has been limited research on prediction models classifying PA using ML, particularly in addressing three key issues: (1) limited investigation into diverse variable combinations for classification models, (2) restricted use of a small number of algorithms for predictive modeling, and (3) a lack of studies based solely on subjective questionnaire data when objectively measured PA data are unavailable. Therefore, the present study aims (1) to develop models predicting individuals' adherence to PA guidelines using six algorithms and diverse variable combinations with subjective questionnaire data and (2) to confirm the most suitable combinations and variables for the models.
Method
The current study adopted the CRoss Industry Standard Process for Data Mining (CRISP-DM), a widely recognized framework in ML research, to systematically predict individuals' adherence to PA guidelines26,27. Its six stages encompass domain understanding, data understanding, data preparation, modeling, model evaluation, and finally, deployment of the models.
Domain understanding
As a first step, domain understanding involves outlining research objectives, approaching problems within the context of modeling, and crafting a systematic plan to achieve these objectives. In this research, the goal is to forecast individuals' adherence to PA guidelines. Thus, the chosen ML task is classification, aiming to predict the target variable as either meeting or not meeting the guidelines.
Data understanding
The phase of data understanding encompasses describing the database, initial data collection, explaining data structures, and exploring data patterns. In this study, data sourced from the National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), were used. NHANES releases data in two-year cycles, such as 2009–2010 and 2011–2012, and aims to assess the health and nutrition status of the US population; it includes socioeconomic, demographic, dietary, and health-related inquiries. To ensure the representativeness of the US population, NHANES sample weights were applied in the analysis (Tables 1, 2). These weights account for the complex survey design, including oversampling and non-response, enabling nationally representative estimates. The weighted scores were calculated by multiplying each participant's reported activity time by their corresponding sample weight.
The five cycles of data (2009–2018), each originally containing approximately 10,000 different cross-sectional data points, were merged after filtering for participants aged 18 and older, resulting in an integrated dataset of 30,352 entries. We limited our study to the past decade (2009–2018) to minimize the impact of COVID-19 and to maintain the validity of our findings, as health patterns have changed significantly over time and older data could undermine the relevance and accuracy of the analysis. These data involved 25 variables, comprising seven demographic variables, six lifestyle variables, two anthropometric variables, and ten PA-related variables, as outlined in Table 3. Cases with missing values in any variable (educational status: n = 1,562; marital status: n = 1,540; income: n = 2,35; sleep: n = 21; occupation: n = 20; alcohol: n = 4,301; smoke: n = 635; and SB: n = 141) were excluded from the analysis. A single case could have missing values in multiple variables, which contributed to the total of 4,427 excluded cases. Participants who were pregnant (n = 300) or had hypertension (n = 9,287), diabetes (n = 3,560), cancer (n = 23), arthritis (n = 7,778), or a physical limitation (n = 71) to PA were also excluded (n = 14,287), resulting in a final dataset comprising 11,638 participants. This exclusion ensures that the analysis focuses on individuals without inherent or clinical barriers to meeting PA guidelines, thereby providing a clearer understanding of factors influencing adherence to these guidelines.
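A minimal pandas sketch of this preparation step is given below. It is illustrative only: the file names, most column names, and the 0/1 indicator columns for the exclusion criteria are assumptions of this example rather than the authors' released code (NHANES itself stores age in the RIDAGEYR field).

```python
# Illustrative sketch (not the authors' code): merging NHANES cycles and
# applying the exclusions described above. File names, most column names,
# and the 0/1 exclusion flags are assumed for this example.
import pandas as pd

cycles = ["2009-2010", "2011-2012", "2013-2014", "2015-2016", "2017-2018"]
frames = [pd.read_csv(f"nhanes_{c}.csv") for c in cycles]   # one merged file per cycle (assumed)
data = pd.concat(frames, ignore_index=True)

# Keep adults only (RIDAGEYR is NHANES's age-in-years field).
data = data[data["RIDAGEYR"] >= 18]

# Complete-case filter on the analysis variables.
analysis_vars = ["education", "marital_status", "income", "sleep",
                 "occupation", "alcohol", "smoke", "sedentary_minutes"]
data = data.dropna(subset=analysis_vars)

# Exclude participants with clinical or physical barriers to PA
# (columns assumed to be 0/1 indicators).
exclusion_flags = ["pregnant", "hypertension", "diabetes",
                   "cancer", "arthritis", "physical_limitation"]
data = data[data[exclusion_flags].sum(axis=1) == 0]
```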
Data preparation
In the process of preparing the data, this study initially generated a new variable, intensity-weighted PA (IWPA), intended to serve as the target variable for subsequent modeling and prediction. IWPA, comprising a combination of work, transportation, and recreational activities, was calculated by multiplying the minutes spent on each activity by the number of days per week. In this process, to align with the established moderate PA guidelines, minutes of vigorous work and recreational activities were doubled before summing the total activity time28. According to the guidelines, 150 min of moderate-intensity PA or 75 min of vigorous-intensity PA per week are considered equivalent in meeting the recommended levels of physical activity. Since vigorous activity achieves the same health benefits in half the time compared to moderate activity, doubling the minutes of vigorous activity allows for a standardized comparison when combining it with moderate activity. Eventually, a threshold of 150 min per week was utilized to evaluate individuals' adherence to the PA guidelines.
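The following sketch shows one way the IWPA target could be derived under the rule just described (days × minutes per domain, with vigorous minutes doubled and a 150 min/week threshold). The column names are placeholders, not NHANES field names, and the function is an assumption of this sketch rather than the authors' implementation.

```python
# Illustrative construction of the IWPA target (placeholder column names).
import pandas as pd

def intensity_weighted_pa(row: pd.Series) -> float:
    """Weekly PA minutes, with vigorous minutes doubled per the guideline equivalence."""
    moderate = (row["mod_work_days"] * row["mod_work_min"]
                + row["walk_bike_days"] * row["walk_bike_min"]
                + row["mod_rec_days"] * row["mod_rec_min"])
    vigorous = 2 * (row["vig_work_days"] * row["vig_work_min"]
                    + row["vig_rec_days"] * row["vig_rec_min"])
    return moderate + vigorous

# Binary target: 1 = meets the 150 min/week guideline, 0 = does not.
# data["iwpa"] = data.apply(intensity_weighted_pa, axis=1)
# data["meets_guideline"] = (data["iwpa"] >= 150).astype(int)
```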
Variables            Male            Female          Total
Unweighted number    6112            5526            11,638
Age (years)          38.84 (0.24)    39.09 (0.25)    38.96 (0.17)
BMI (kg/m2)          27.96 (0.09)    27.84 (0.12)    27.91 (0.07)
WC (cm)              97.89 (0.25)    92.79 (0.27)    95.47 (0.19)

Table 1. Weighted demographic characteristics. Data are presented as mean (standard error). BMI, body mass index; WC, waist circumference.
In addition, to classify drinkers and smokers, two alcohol-related variables and two smoke-related variables were used to create new alcohol and smoke variables, respectively. Figure 1 outlines the process of creating these new variables.
The PA variables were collected only to generate the IWPA variable and were excluded from the analysis after this use. Therefore, the study included a total of 13 variables: seven demographic variables (gender, age, race, marital status, educational status, employment status, and income) and six lifestyle and anthropometric variables (alcohol, smoke, sleep time, SB, waist circumference, and BMI), alongside the target variable. Additionally, categorical features were converted into a numerical format using Label Encoding to ensure compatibility with the machine learning algorithms. This method assigns a unique numerical code to each category (e.g., "Male" as 0 and "Female" as 1), with the target labels encoded as values ranging from 0 to n_classes − 1.
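A brief scikit-learn sketch of this encoding step is shown below; the toy data frame and column choice are assumptions for illustration, and note that scikit-learn's LabelEncoder assigns codes alphabetically, so the exact mapping may differ from the example in the text.

```python
# Illustrative label encoding (toy data standing in for the merged NHANES analysis set).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({"gender": ["Male", "Female", "Female"],
                     "smoke": ["Never", "Current", "Former"]})

encoders = {}
for col in ["gender", "smoke"]:
    enc = LabelEncoder()                       # maps categories to 0..n_classes-1
    data[col] = enc.fit_transform(data[col])   # e.g., "Female" -> 0, "Male" -> 1 (alphabetical)
    encoders[col] = enc                        # kept so the mapping can be inverted later
```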
Predictor variable   Category                        Unweighted (n)   Weighted IWPA (SE)   p-value   Post-hoc test*
Gender               Male                            6112             387.77 (6.49)        < 0.001
                     Female                          5526             345.20 (6.44)
Age                  Young adult (a)                 6413             387.65 (5.86)        < 0.001   b, c
                     Middle-aged adult (b)           3824             352.57 (8.44)                  a, c
                     Old adult (c)                   1401             299.75 (13.56)                 a, b
Race                 Non-Hispanic Black (a)          2237             336.45 (8.59)        < 0.001   b
                     Non-Hispanic White (b)          4327             383.19 (6.63)                  a, c, d
                     Mexican American (c)            1948             326.61 (9.33)                  b, d
                     Others (d)                      3126             353.60 (8.42)                  b, c
ES                   Less than 12 grade (a)          2276             317.18 (10.52)       < 0.001   c, d
                     High school or equivalent (b)   2582             325.54 (9.40)                  c, d
                     Some college or AA (c)          3593             374.54 (8.27)                  a, b, d
                     College grade or above (d)      3187             407.96 (8.38)                  a, b, c
Marital status       Married (a)                     5564             352.83 (6.56)        < 0.001   b
                     Never married (b)               3188             406.68 (8.88)                  a, c, d
                     Living with partner (c)         1320             364.89 (12.15)                 b
                     Separated (d)                   1566             343.21 (13.57)                 b
Income               Under $20,000 (a)               2849             352.83 (6.56)        < 0.001   c, d
                     $20,000–$44,999 (b)             2974             406.68 (8.88)                  d
                     $45,000–$74,999 (c)             2270             364.89 (12.15)                 a
                     Over $75,000 (d)                3545             343.21 (13.57)                 a, b
BMI                  Underweight (a)                 228              364.38 (34.58)       < 0.001
                     Normal (b)                      3942             401.10 (8.24)                  d
                     Overweight (c)                  3864             366.85 (7.75)                  d
                     Obese (d)                       3604             331.23 (7.89)                  b, c
WC                   Low risk                        3909             409.64 (8.16)        < 0.001
                     High risk                       7729             347.49 (5.52)
Alcohol              Never drinker (a)               2617             314.26 (8.79)        < 0.001   b, c
                     Light drinker (b)               7451             377.31 (5.72)                  a
                     Heavy drinker (c)               1570             385.40 (12.81)                 a
Smoke                Never smoker                    8552             371.37 (5.80)        0.402
                     Former smoker                   2263             369.72 (11.07)
                     Current smoker                  1891             353.13 (10.07)
Employment           Not employed                    3354             350.36 (8.57)        < 0.001
                     Employed                        8284             365.82 (4.95)
Sleep time           < 7 h                           3779             356.58 (8.17)        0.261
                     ≥ 7 h                           7859             372.12 (5.53)
SB                   < 480 min                       7797             381.36 (5.78)        < 0.001
                     ≥ 480 min                       3841             343.18 (7.52)

Table 2. Weighted mean of IWPA (minutes per week) across each variable (n = 11,638). P-values were calculated by t-test and ANOVA. *Superscripts indicate the direction of the relationship of the significant post-hoc test (p < .05). AA, Associate of Arts; BMI, body mass index; ES, educational status; IWPA, intensity-weighted physical activity; SB, sedentary behavior; SE, standard error; WC, waist circumference.
Modeling
The process of data modeling entails the development of predictive models by discerning specific patterns from both the models and the data. This involves a sequential procedure: (1) segregation of the data into training and test datasets, (2) selection of an appropriate algorithm, (3) comprehensive training of the model using the designated algorithm and the training dataset, and (4) optimization of the hyperparameter values inherent in the algorithmic model to refine its performance.
Data split
Within the modeling procedure, data splitting occurs twice. Initially, prior to model training, the dataset is divided into distinct training and test sets. In the present study, the data were partitioned at a balanced split of 80% for training and 20% for testing to optimize model performance29. Furthermore, data splitting recurs during the training phase of the algorithmic models to facilitate cross-validation. We implemented a stratified 10-fold cross-validation approach during the model training phase to prevent overfitting30. The training data were segmented into ten equal parts, each serving sequentially as a validation set while the remainder was used for training. This stratification ensured that each fold accurately represented the overall distribution of the target variable, enhancing the model's accuracy and generalizability by providing thorough training and testing across all data segments31.
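The split and cross-validation procedure described above can be sketched with scikit-learn as follows; X and y stand for the 13 predictors and the adherence target, and the stratification of the initial split and the fixed random seed are choices of this sketch, not details reported by the authors.

```python
# Illustrative 80/20 split plus stratified 10-fold cross-validation.
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# X (13 predictors) and y (guideline adherence, 0/1) are assumed to exist.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
model = DecisionTreeClassifier(max_depth=4)      # depth taken from Table 4
cv_accuracy = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print(cv_accuracy.mean())                        # compared later against the held-out test score
```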
ML algorithms
The research employed six ML algorithms, chosen for their proven effectiveness, interpretability, and capacity to adeptly manage diverse data types, in constructing the prediction models: Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, XGBoost, and Light Gradient Boosting Machine (LightGBM).
Logistic regression. Logistic Regression operates by assessing the odds associated with a binary outcome. It functions by identifying the coefficients of an optimal linear model that describes the relationship between the logit transformation of the outcome variable and the independent variables. Simply put, Logistic Regression harnesses regression techniques to forecast the likelihood that data points belong to a specific category, representing this likelihood as a value between 0 and 1, thereby facilitating data classification based on these probabilities. This model serves as a fundamental benchmark in contrast to the non-parametric machine learning models and is frequently favored for its simplicity of interpretation.
Support vector machine. Support Vector Machine is a powerful algorithm introduced by Cortes and Vapnik32, excelling in classification tasks by crafting an optimal decision boundary (a line or a hyperplane) that
Category            Variables
Demographic         Gender; Age; Race-Ethnicity; Marital status; Educational status; Employment status; Income
Lifestyle           Experience of alcohol in a year; Number of alcohol drinks per day; Experience of smoke in a year; Current smoking status; Sleep time; Sedentary behavior
Anthropometric      Waist circumference; Body mass index
Physical activity   Number of days moderate work; Minutes moderate-intensity work; Number of days vigorous work; Minutes vigorous-intensity work; Number of days walk or bicycle; Minutes walk/bicycle for transportation; Number of days moderate recreational activity; Minutes moderate-recreational activity; Number of days vigorous recreational activity; Minutes vigorous-recreational activity

Table 3. Data category and variables.
effectively separates data points into distinct classes. This algorithm strategically positions the boundary to maximize the margin, the space between the boundary and the nearest data points, called support vectors. Known for its ability to handle complex datasets and nonlinear relationships, the Support Vector Machine demonstrates remarkable efficacy compared to Logistic Regression, especially with unbalanced datasets, albeit its computational complexity often requires longer training times during model development33.
Decision tree. The Decision Tree method performs well in data classification and prediction, structuring optimal decision rules in a tree format and effectively handling categorical dependent variables without relying on key assumptions of parametric statistics, such as variance homogeneity and normal distribution33. Within this study, the Classification and Regression Trees (CART) algorithm was employed, utilizing binary separation and the Gini index to reduce information impurity. However, a Decision Tree is prone to overfitting, especially when it becomes too complex: it can memorize the training data, including noise, which may lead to poor generalization on new, unseen data. Additionally, small changes in the data can result in a completely different tree structure. This instability makes decision trees sensitive to variations in the dataset, impacting their reliability.
Random forest. To address some drawbacks of the Decision Tree, ensemble methods, which train and combine multiple models using bagging or boosting techniques (e.g., Random Forest, XGBoost, and LightGBM), were additionally utilized in this research. Random Forest is an ensemble algorithm utilizing the bagging method. This technique involves synthesizing multiple Decision Tree models to produce a conclusive classification model34. Compared to the Decision Tree, Random Forest demonstrates higher
Fig. 1. Creation of new variables. Vigorous work and recreational activity minutes were doubled because the established PA guidelines consider 75 min of vigorous-intensity activity equivalent to 150 min of moderate-intensity activity.
stability and accuracy while effectively mitigating overfitting. Moreover, ensemble learning, a method involving the training and combination of multiple models, is employed to develop a more proficient model.
eXtreme gradient boosting. Based on the boosting method, XGBoost constructs an optimal model by integrating a loss function that measures the deviation between predicted and actual values35. Beyond encapsulating the intricate tree model function, XGBoost prevents overfitting through meticulous regulation of model complexity. Additionally, by employing the graphics processing unit (GPU), XGBoost can enhance its learning speed, an advantage that was absent in earlier boosting models.
Light gradient boosting machine. Another algorithm using the boosting method, LightGBM diverges from the conventional boosting approach (e.g., XGBoost with a depth-wise tree structure, expanding all leaves at the same depth simultaneously) by adopting a leaf-wise tree structure, which splits the leaf with the maximum loss regardless of the balance of the tree36. Moreover, it optimizes memory usage and accelerates learning through gradient-based one-side sampling, which reduces the data volume, and exclusive feature bundling, which effectively reduces the number of variables.
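For orientation, the six algorithms can be instantiated in Python as sketched below (scikit-learn plus the separate xgboost and lightgbm packages); the settings shown are library defaults chosen for illustration, not the tuned values reported in Table 4.

```python
# Illustrative instantiation of the six algorithms compared in this study.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),           # probability estimates needed later for AUC
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(),
}
```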
Hyperparameter tuning
Hyperparameter tuning involves optimizing a final model's performance by adjusting parameters that are not learned from the data. For instance, in algorithms like LightGBM, the number of iterations ("n_estimators") is a crucial hyperparameter determining the number of sequential models built. While methods for optimizing hyperparameters differ, the present study employed the grid search technique. This approach involves systematically testing different values within a parameter range, creating a model for each case, evaluating their performance, and selecting the parameter values that yield the highest model performance among the tested options37. Details are presented in Table 4.
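A grid-search sketch for one algorithm is shown below; the candidate grid is an assumption of this example and not the exact search space used by the authors.

```python
# Illustrative grid search for the Decision Tree with scikit-learn.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {"max_depth": [2, 3, 4, 5, 6],       # assumed candidate values
              "min_samples_leaf": [1, 4, 8]}
search = GridSearchCV(DecisionTreeClassifier(), param_grid,
                      cv=10, scoring="accuracy", n_jobs=-1)
# search.fit(X_train, y_train)
# print(search.best_params_)   # e.g., {"max_depth": 4, "min_samples_leaf": 1}
```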
Evaluating
After model development, the models should be assessed on their performance using a classification confusion matrix. A confusion matrix is a compact yet comprehensive tool for evaluating the performance of models. It presents a tabular summary of predicted versus actual classifications, comprising four main components (see Fig. 2): true positives (correctly predicted positives), true negatives (correctly predicted negatives), false positives (incorrectly predicted positives), and false negatives (incorrectly predicted negatives).
Algorithm   Hyperparameter      Value     Description
LR          solver              lbfgs     To find the optimal weights and cost function
LR          multi_class         auto      To handle problems with multiple classes
LR          C                   100       Regularization parameter
DT          max_depth           4         Max depth of the tree
RF          criterion           entropy   To decide how to split the data
RF          max_depth           5         Max depth of the tree
RF          min_samples_leaf    8         Minimum number of samples at a leaf
RF          min_samples_split   3         Minimum number of samples required to split
RF          n_estimators        4         Number of trees in the forest
SVM         class_weight        None      To control data imbalance
SVM         C                   1         Regularization parameter
SVM         gamma               0.1       To control overfitting
SVM         kernel              linear    To transform data to be linearly separable
SVM         probability         TRUE      To use a probabilistic classification approach
XGBoost     colsample_bytree    0.6       Subsample ratio of features for building a tree
XGBoost     gamma               5         To control overfitting
XGBoost     max_depth           1         Max depth of the tree
XGBoost     min_child_weight    1         To control the tree's depth and partitioning
XGBoost     subsample           1         Subsample ratio of training data for building a tree
LightGBM    colsample_bytree    0.8       Subsample ratio of features for building a tree
LightGBM    learning_rate       0.1       To determine the step size at each iteration
LightGBM    max_depth           2         Max depth of the tree
LightGBM    min_child_samples   1         To control the tree's depth and partitioning
LightGBM    n_estimators        50        Number of trees in the forest
LightGBM    subsample           0.7       Subsample ratio of training data for building a tree

Table 4. Optimal hyperparameters for each algorithm. DT, decision tree; LightGBM, light gradient boosting machine; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting.
From these elements, various performance metrics such as accuracy, precision, recall, specificity, and the F1 score are derived. This matrix enables a detailed evaluation of a model's ability to correctly classify instances, offering insights into its strengths and the areas that may require improvement, thereby aiding in fine-tuning and enhancing its predictive capabilities.
In the present study, accuracy, F1 score, and area under the curve (AUC) were utilized as the validity indexes. Accuracy, frequently utilized as a validation metric, signifies the percentage of correctly classified predictions among the total predictions made by the model; a higher accuracy indicates better overall correctness. The F1 score is a metric that combines precision and recall into a single formula38. The F1 score was calculated using Eq. (1) below.
F1 score = 2 × (precision × recall) / (precision + recall)    (1)
The F1 score represents the harmonic mean of precision and recall, offering a balanced and comprehensive evaluation of a model's ability to accurately predict positive instances while minimizing incorrect classifications. A perfect F1 score of 1.0 means that the model has achieved both perfect precision and perfect recall, indicating an ideal balance between the two. Lastly, AUC denotes the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve plots the True Positive Rate (recall) against the False Positive Rate (1 − specificity) for various classification thresholds, and the AUC represents the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative instance; a value nearing 1 indicates accurate classification of the data. The combination of these three metrics (i.e., accuracy, F1 score, and AUC) is compelling because it allows for the evaluation of the model's overall performance (accuracy), its performance in imbalanced class scenarios (F1 score), and its threshold-independent classification capability (AUC). Additionally, supplementary outcomes (i.e., precision, recall, specificity, and negative predictive value) for all models are provided in Supplement Table 1.
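The three validity indexes can be computed from a fitted classifier as in the sketch below; the evaluate helper is a convenience of this example, not part of the study's code.

```python
# Illustrative evaluation with the three validity indexes used in this study.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, confusion_matrix

def evaluate(fitted_model, X_test, y_test):
    """Return the confusion matrix, accuracy, F1 score, and AUC for a fitted classifier."""
    y_pred = fitted_model.predict(X_test)
    y_prob = fitted_model.predict_proba(X_test)[:, 1]          # P(meets guideline)
    return {
        "confusion_matrix": confusion_matrix(y_test, y_pred),  # [[TN, FP], [FN, TP]]
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),        # 2 * precision * recall / (precision + recall)
        "auc": roc_auc_score(y_test, y_prob),  # area under the ROC curve
    }
```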
Deployment
As the concluding phase of CRISP-DM, it is essential to transition from the overall results to practical applications, while also acknowledging the strengths and limitations of the findings. In the present study, these points are presented in the discussion section.
Feature importance
Interpretability is one of the critical objectives of developing prediction models, and several researchers have emphasized its role: without proper interpretation, models function as black boxes, hiding their inner workings and thereby complicating users' comprehension of the connection between input data and eventual outcomes39. To address this issue, studies of feature importance have been widely pursued by researchers to interpret the results of ML40,41.
Fig. 2. Confusion matrix.
The present study chose permutation feature importance (PFI) as the tool for interpretation. PFI measures a feature's importance by assessing the increase in the model's prediction error after permuting the feature's values, which disrupts its relationship with the target variable. In other words, a feature gains importance when shuffling its values increases the model's error, indicating the model's heavy dependence on that feature. Conversely, if reshuffling its values yields no change in the model's error, the feature is regarded as insignificant in influencing predictions. This method's strength lies in its ability to handle different types of features without assumptions about their distributions, offering a robust understanding of feature importance across diverse datasets and models. More details can be found elsewhere30.
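A short scikit-learn sketch of PFI is given below; the number of repeats, the scoring metric, and the use of the test set are assumptions of this example rather than settings reported by the authors.

```python
# Illustrative permutation feature importance with scikit-learn.
from sklearn.inspection import permutation_importance

# fitted_model, X_test, and y_test are assumed from the earlier sketches.
result = permutation_importance(fitted_model, X_test, y_test,
                                n_repeats=10, scoring="accuracy", random_state=42)
for name, score in sorted(zip(X_test.columns, result.importances_mean),
                          key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")   # positive values: shuffling the feature hurts the model
```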
Statistical analysis
All data were analyzed using R version 4.1.3 (RStudio: Integrated Development for R; RStudio, PBC, Boston, MA, USA) for data preprocessing and Python version 3.8 (Python Software Foundation, Wilmington, DE, USA) for the main analyses. Group differences and statistical significance were examined using independent t-tests, ANOVA, and post-hoc analyses with Tukey's Honest Significant Difference (HSD) test, as the assumption of homogeneity of variance was not violated. A p-value less than 0.05 was considered statistically significant.
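A simplified sketch of these tests is shown below; for brevity it omits the NHANES sample weights and complex survey design, so it is not equivalent to the weighted analysis reported in Tables 1 and 2, and the iwpa, gender, and age_group arrays are placeholders.

```python
# Simplified, unweighted sketch of the group-difference tests (t-test, ANOVA, Tukey HSD).
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
iwpa = rng.normal(370, 120, size=300)                    # placeholder IWPA minutes/week
gender = rng.choice(["Male", "Female"], size=300)        # placeholder groups
age_group = rng.choice(["young", "middle", "old"], size=300)

t, p_t = stats.ttest_ind(iwpa[gender == "Male"], iwpa[gender == "Female"])
f, p_f = stats.f_oneway(*(iwpa[age_group == g] for g in ["young", "middle", "old"]))
print(pairwise_tukeyhsd(iwpa, age_group, alpha=0.05))    # Tukey HSD post-hoc comparisons
```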
Results
The weighted demographic characteristics and unweighted numbers of the NHANES data used for the present study are presented in Table 1. Table 2 presents the unweighted numbers and the weighted IWPA across each predictor variable used in this study. Of the 13 variables, 11 (i.e., gender, age, race, educational status, marital status, income, BMI, waist circumference, alcohol, employment status, and SB) exhibited statistically significant differences (p < 0.05) in the weighted average IWPA time. However, no significant differences were observed for the remaining two variables, smoke and sleep time.
Table 5 ranks the top 10 of the 18 models based on their validity scores, which were derived from various combinations of variables and algorithms. The validity scores were presented through two methods: (1) cross-validation scores to assess the models' robustness, and (2) test scores to evaluate their generalizability; the test scores were used for the total rank. The total rank was calculated as the sum of each model's ranks across the validity scores (the model with the smallest sum was ranked first, and so on). In the context of variable combinations, the combination using all variables prevailed with six models, followed by the combination using lifestyle and anthropometric variables with four models. From an algorithmic perspective, Decision Tree, Logistic Regression, Random Forest, and LightGBM were each used twice, while Support Vector Machine and XGBoost were each employed in a single model. In evaluating each model, the Decision Tree model with all variables (accuracy: 0.705, F1 score: 0.819, and AUC: 0.542) was the most robust model for predicting adherence to PA guidelines. The 10th model was the LightGBM model with lifestyle and anthropometric variables (accuracy: 0.693, F1 score: 0.817, and AUC: 0.510).
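The rank aggregation can be illustrated with a small pandas sketch; the three rows below reuse test scores from Table 5, while the ranking call itself is a generic reconstruction of the described procedure, not the authors' code.

```python
# Generic reconstruction of the total-rank computation (three example rows from Table 5).
import pandas as pd

scores = pd.DataFrame({
    "model": ["All + DT", "L&A + LR", "All + LightGBM"],
    "accuracy": [0.705, 0.703, 0.702],
    "f1": [0.819, 0.820, 0.814],
    "auc": [0.542, 0.532, 0.554],
}).set_index("model")

rank_sum = scores.rank(ascending=False).sum(axis=1)   # rank 1 = best on each metric
print(rank_sum.sort_values())                         # smallest sum -> total rank 1
```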
Figure 3 illustrates the PFI of the top 10 models, in which variables are differentiated by their importance for each model. Only positive PFI values were retained for each model, and the variables were listed by weighted importance. Across all models, SB emerged as the most frequently chosen variable, followed by age, gender, and educational status. Within each model, SB was the variable most frequently (seven times) selected as the most valuable one; BMI was ranked first twice, and educational status took first place once.
Discussion
The purpose of this study was to develop predictive models for PA guidelines using ML and to examine the critical factors influencing adherence to the PA guidelines. The current study developed a total of 18 prediction
Combination   Algorithm   Accuracy (CV / Test)   F1-score (CV / Test)   AUC (CV / Test)   Total rank
All           DT          0.695 / 0.705          0.809 / 0.819          0.546 / 0.542     1
L&A           LR          0.691 / 0.703          0.514 / 0.820          0.514 / 0.532     2
All           LightGBM    0.703 / 0.702          0.815 / 0.814          0.553 / 0.554     3
All           XGBoost     0.702 / 0.700          0.817 / 0.815          0.540 / 0.539     4
L&A           DT          0.689 / 0.699          0.812 / 0.818          0.515 / 0.527     5
All           SVM         0.700 / 0.699          0.815 / 0.814          0.539 / 0.538     6
All           LR          0.703 / 0.699          0.814 / 0.810          0.558 / 0.557     7
All           RF          0.695 / 0.696          0.816 / 0.819          0.519 / 0.513     8
L&A           RF          0.690 / 0.693          0.815 / 0.818          0.505 / 0.505     9
L&A           LightGBM    0.689 / 0.693          0.814 / 0.817          0.507 / 0.510     10

Table 5. Top 10 of the total 18 models. AUC, area under the curve; L&A, lifestyle and anthropometric variables; DT, decision tree; CV, cross-validation; LightGBM, light gradient boosting machine; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, extreme gradient boosting. For combination, L&A includes alcohol, smoke, sleep time, SB, waist circumference, and BMI; Demographic includes gender, age, race, marital status, educational status, employment status, and income; All combines both the L&A and demographic variables.
models, incorporating three combinations of variables and employing six algorithms. These models were ranked based on their validity scores. Additionally, we identified highly influential variables for each model.
The PFI results highlighted the most critical feature for developing each prediction model. In the top 10 models, SB had the highest number of occurrences in the first rank (seven times). SB is a critical component of human behavior, complementing PA as its counterpart42. While PA promotes positive health outcomes, SB produces the opposite effects43,44. Together with sleep, SB and PA form the 24-hour activity cycle, in which the allocation of time to one often reduces the time available for the others45. From these results, we conjecture that subjectively measured SB may be valuable in predicting adherence to PA guidelines. Apart from SB, across the 10 models, age, gender, and educational status were identified as the most frequently important variables. Age, gender, and educational status, which are demographic variables, have been acknowledged to be vastly influential on PA. It is well known that males are more active than females, achieving more PA46. In particular, the Centers for Disease Control and Prevention (CDC) has specifically highlighted PA guidelines tailored to different age groups, including preschool children, adolescents, adults, and older adults47. Likewise, education-related disparities in PA have long been recognized as an important factor48, and Kari et al.49 reported that an extra year of education correlates with an increase of 0.26 h in weekly vigorous activity, 560 more daily steps, and 390 more aerobic steps per day. Taking these two variables together, recent research on the association among PA, age, and educational status concluded that there was an age-related decline in PA that was steeper among low-education individuals. In sum, our findings are generally aligned with previous studies' results and verify that age and educational status play a substantial role in predicting adherence to PA. Remarkably, among the top 10 models, the 1st, 3rd, 4th, 6th, and 7th models shared the same top four features (i.e., SB, age, gender, and educational status) with only small differences in order. These consistent results support the validity of this study.
The present study also demonstrated which variable combination is the best option for a model predicting adherence to the PA guidelines. First, all six models using all variables occupied places in the top 10. Following them, four models using lifestyle and anthropometric variables were ranked. Interestingly, none of the models using only demographic variables were listed in the top 10. Numerous studies have extensively documented how demographic variables influence PA. They include gender (male and female)50, age (adolescent, adult, and elderly)14, race (Asian, Black, Hispanic, and White)51, marital status (single and married)52, educational status (low, middle, and high)53, employment (employed and unemployed)54, and income (low, middle, and high)55. Accordingly, many studies have conducted their research after setting these variables as covariates56–58. Likewise, there are many studies regarding PA and lifestyle variables, such as comparisons between alcohol consumers and non-consumers59, smokers and non-smokers60, people with sufficient and insufficient sleep61, and more and less sedentary people62. BMI and waist circumference are also well-known indicators of PA63. In comparing demographic variables with lifestyle and anthropometric variables, the latter proved to be a more powerful combination for our models. As indicated in the PFI results, while age, gender, and educational status were identified as important, SB emerged as the most significant feature in our study. This suggests that a single critical feature, such as SB, could potentially outweigh the impact of the others, influencing the outcome of the comparison of variable combinations.
The models in this study were developed with six algorithms: Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, XGBoost, and LightGBM. Of the top 10 models, seven were Decision Tree or Decision Tree-based ensemble methods such as Random Forest, XGBoost, and LightGBM. The Decision Tree algorithm is widely known as a practical method for the classification of
Fig. 3. Top 5 features in the top 10 models. Different color density indicates weight difference (higher density means higher importance). BMI, body mass index; ES, educational status; SB, sedentary behavior; PFI, permutation feature importance; WC, waist circumference.
activities across diverse studies. One study by Zhang et al.64 utilized the Decision Tree algorithm to classify activities and posture and achieved an accuracy of 98.85% without a boosting method. Nonetheless, due to its drawbacks, several fields utilizing ML have begun to employ ensemble methods for their robustness. Chowdhury et al.65 investigated whether ensemble algorithms could improve PA recognition accuracy compared to single-classifier algorithms, and ensemble algorithms were shown to outperform the individual classifiers. Similarly, Nu et al.25, whose study design was similar to that of the present study, also reported that ensemble methods (i.e., Random Forest and XGBoost) achieved higher AUC than the Support Vector Machine. In aggregate, our top 10 models indicate that Decision Tree-based algorithms and ensemble methods are effective choices for PA classification.
To our knowledge, this study is the first to develop predictive models for adherence to PA guidelines using only subjectively measured variables and ML. Also, we controlled the types of variables in each model to find the optimal combination of variables and the critical features for the prediction model. Identifying the most significant predictors of PA adherence is crucial for designing targeted public health interventions and clinical strategies. For example, reducing SB or creating age-specific and education-specific PA programs could improve adherence rates. This approach ensures resources are effectively directed toward key determinants, maximizing their impact on PA promotion. Nevertheless, our study has a couple of limitations. First, the study design was restricted to individuals without constraints (i.e., disease and pregnancy) on PA. In subsequent studies, researchers should investigate more diverse types of subjects so that proper models targeting specific populations can be developed. Second, the variables used in this study were quite restricted, and due to the relatively small number of lifestyle and anthropometric variables, the present study merged the two types of variables into one category, which might have influenced the results. Following studies should select more variables and a balanced number of variables in each category. Additionally, the omission of light-intensity physical activity, a crucial component of the 24-hour movement cycle, from our analysis may limit the comprehensiveness of our results. Future research should consider including light-intensity PA to provide a more comprehensive assessment of PA impacts and to better understand the full spectrum of activity intensities contributing to health outcomes. Lastly, although our study utilizes widely collected big data, it is important to note that the data are subjectively measured, which resulted in our models' validity scores being relatively lower than those of studies using objectively measured PA21–23. Therefore, continuous efforts should be directed toward establishing effective ways to utilize subjectively measured PA data.
Conclusion
The study aimed to develop predictive models for adherence to PA guidelines using subjectively measured, comprehensive variables. The results showed that, in the context of variable categories, lifestyle and anthropometric variables were highly important factors for our models. Additionally, in terms of individual variables, SB, age, and educational status were the most critical. Ensemble algorithms based on the Decision Tree (DT) were confirmed as highly recommended algorithms for this study design. These findings offer actionable insights for both public health and clinical practice, allowing for personalized PA interventions that address individual needs. Such strategies can enhance patient engagement and improve overall health outcomes. Our study also demonstrated the possibility of using massive, subjectively measured PA data with ML. With this first step, further researchers can investigate more sophisticated PA-related questions utilizing big data and ML.
Data availability
The codes and all the data used in this study are available on the web (Supplement File).
Received: 13 February 2024; Accepted: 10 February 2025
References
1. Anderson, E. & Durstine, J. L. Physical activity, exercise, and chronic diseases: a brief review. Sports Med. Health Sci. 1, 3–10 (2019).
2. Dunn, A. L. & Jewell, J. S. The effect of exercise on mental health. Curr. Sports Med. Rep. 9, 202–207 (2010).
3. Seo, M. W., Eum, Y. & Jung, H. C. Leisure time physical activity: a protective factor against metabolic syndrome development. BMC
Public Health 23, 1–8 (2023).
4. Janssen, I., Carson, V., Lee, I. M., Katzmarzyk, P. T. & Blair, S. N. Years of life gained due to leisure-time physical activity in the US.
Am. J. Prev. Med. 44, 23–29 (2013).
5. Patel, H. et al. Aerobic vs anaerobic exercise training effects on the cardiovascular system. World J. Cardiol. 9, 134 (2017).
6. Jakicic, J. M. et al. Association between bout duration of physical activity and health: systematic review. Med. Sci. Sports Exerc. 51,
1213 (2019).
7. Panza, G. A., Taylor, B. A., Thompson, P. D., White, C. M. & Pescatello, L. S. Physical activity intensity and subjective well-being in
healthy adults. J. Health Psychol. 24, 1257–1267 (2019).
8. Tremblay, M. S. et al. New Canadian physical activity guidelines. Appl. Physiol. Nutr. Metab. 36, 36–46 (2011).
9. Misra, A. et al. Consensus physical activity guidelines for Asian Indians. Diabetes Technol. Ther. 14, 83–98 (2012).
10. Piercy, K. L. et al. The physical activity guidelines for Americans. JAMA 320, 2020–2028 (2018).
11. Bull, F. C. et al. World Health Organization 2020 guidelines on physical activity and sedentary behaviour. Br. J. Sports Med. 54,
1451–1462 (2020).
12. Janssen, I. Physical activity guidelines for children and youth. Appl. Physiol. Nutr. Metab. 32, S109–S121 (2007).
13. Elsawy, B. & Higgins, K. E. Physical activity guidelines for older adults. Am. Fam. Phys. 81, 55–59 (2010).
14. Elgaddal, N., Kramarow, E. A. & Reuben, C. Physical activity among adults aged 18 and over: United States, 2020. Natl. Center
Health Stat. (2022).
15. Finn, M., Sherlock, M., Feehan, S., Guinan, E. M. & Moore, K. B. Adherence to physical activity recommendations and barriers to
physical activity participation among adults with type 1 diabetes. Ir. J. Med. Sci. 191, 1639–1646 (2022).
16. Marruganti, C. et al. Adherence to Mediterranean diet, physical activity level, and severity of periodontitis: results from a
university-based cross-sectional study. J. Periodontol. 93, 1218–1232 (2022).
17. Rock, C. L. et al. American cancer society nutrition and physical activity guideline for cancer survivors. CA Cancer J. Clin. 72,
230–262 (2022).
18. Vähä-Ypyä, H. et al. How adherence to the updated physical activity guidelines should be assessed with accelerometer? Eur. J.
Public Health 32, i50–i55 (2022).
19. Desai, R. J., Wang, S. V., Vaduganathan, M., Evers, T. & Schneeweiss, S. Comparison of machine learning methods with traditional
models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw. Open 3,
e1918962 (2020).
20. Ngiam, K. Y. & Khor, W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273 (2019).
21. Hagenbuchner, M., Cliff, D. P., Trost, S. G., Van Tuc, N. & Peoples, G. E. Prediction of activity type in preschool children using
machine learning techniques. J. Sci. Med. Sport 18, 426–431 (2015).
22. Ahmadi, M. N., Pavey, T. G. & Trost, S. G. Machine learning models for classifying physical activity in free-living preschool
children. Sensors 20, 4364 (2020).
23. Farrahi, V. et al. Evaluating and enhancing the generalization performance of machine learning models for physical activity
intensity prediction from raw acceleration data. IEEE J. Biomed. Health Inf. 24, 27–38 (2019).
24. Chong, J., Tjurin, P., Niemelä, M., Jämsä, T. & Farrahi, V. Machine-learning models for activity class prediction: a comparative
study of feature selection and classification algorithms. Gait Posture 89, 45–53 (2021).
25. Nu, U. K., Touati, T., Buddhadev, S., Sun, R. & Smuck, M. Classification and analysis of physical activity using NHANES data. 2020
IEEE Symposium Series on Computational Intelligence (SSCI) 527–533 (2020).
26. Chapman, P. et al. CRISP-DM 1.0: Step-by-Step Data Mining Guide (SPSS Inc., 2000).
27. Schröer, C., Kruse, F. & Gómez, J. M. A systematic literature review on applying CRISP-DM process model. Procedia Comput. Sci.
181, 526–534 (2021).
28. Whitfield, G. P., Ussery, E. N., Saint-Maurice, P. F. & Carlson, S. A. Trends in aerobic physical activity participation across multiple
domains among US adults, National Health and Nutrition Examination Survey 2007/2008 to 2017/2018. J. Phys. Act. Health 18,
S64–S73 (2021).
29. Nguyen, Q. H. et al. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil.
Math. Probl. Eng. 1, 1–15 (2021).
30. Lei, J. Cross-validation with confidence. J. Am. Stat. Assoc. 115, 1978–1997 (2020).
31. Prusty, S., Patnaik, S. & Dash, S. K. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. Front. Nanotechnol. 4 (2022).
32. Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
33. Ragan, B. G. & Kang, M. Construction of a classification/decision tree. Korean J. Meas. Eval. Phys. Educ. Sports Sci. 7, 61–75 (2005).
34. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
35. Dhaliwal, S. S., Nahid, A. A. & Abbas, R. Effective intrusion detection system using XGBoost. Inf. 9, 149 (2018).
36. Jin, D., Lu, Y., Qin, J., Cheng, Z. & Mao, Z. SwiftIDS: real-time intrusion detection system based on LightGBM and parallel
intrusion detection mechanism. Comput. Secur. 97, 101984 (2020).
37. Schratz, P., Muenchow, J., Iturritxa, E., Richter, J. & Brenning, A. Hyperparameter tuning and performance assessment of statistical
and machine-learning algorithms using spatial data. Ecol. Model. 406, 109–120 (2019).
38. Humphrey, A. et al. Machine-learning classification of astronomical sources: estimating F1-score in the absence of ground truth.
Mon. Not. R Astron. Soc. Lett. 517, 116–120 (2022).
39. Altmann, A., Tološi, L., Sander, O. & Lengauer, T. Permutation importance: a corrected feature importance measure. Bioinformatics
26, 1340–1347 (2010).
40. Carvalho, D. V., Pereira, E. M. & Cardoso, J. S. Machine learning interpretability: a survey on methods and metrics. Electronics 8, 832
(2019).
41. Mannini, A. & Sabatini, A. M. Machine learning methods for classifying human physical activity from on-body accelerometers.
Sensors 10, 1154–1175 (2010).
42. Park, J. H., Moon, J. H., Kim, H. J., Kong, M. H. & Oh, Y. H. Sedentary lifestyle: overview of updated evidence of potential health
risks. Korean J. Fam. Med. 41, 365 (2020).
43. Kim, Y., Barreira, T. V. & Kang, M. Concurrent associations of physical activity and screen-based sedentary behavior on obesity
among US adolescents: a latent class analysis. Med. Sci. Sports Exerc. 26, 137–144 (2016).
44. Ramsey, K. A. et al. The association of objectively measured physical activity and sedentary behavior with skeletal muscle strength
and muscle power in older adults: a systematic review and meta-analysis. Ageing Res. Rev. 67, 101266 (2021).
45. Dumuid, D. et al. The compositional isotemporal substitution model: a method for estimating changes in a health outcome for
reallocation of time between sleep, physical activity and sedentary behaviour. Stat. Methods Med. Res. 28, 846–857 (2019).
46. Azevedo, M. R. et al. Gender differences in leisure-time physical activity. Int. J. Public Health 52, 8–15 (2007).
47. CDC. What you can do to meet physical activity recommendations. Centers for Disease Control and Prevention. https://www.cdc.gov/physical-activity-basics/guidelines/index.html (2020).
48. Scholes, S. & Bann, D. Education-related disparities in reported physical activity during leisure-time, active transportation, and
work among US adults: repeated cross-sectional analysis from the National Health and Nutrition Examination Surveys, 2007 to
2016. BMC Public Health 18, 1–10 (2018).
49. Kari, J. T. et al. Education leads to a more physically active lifestyle: evidence based on Mendelian randomization. Scand. J. Med.
Sci. Sports 30, 1194–1204 (2020).
50. Molanorouzi, K., Khoo, S. & Morris, T. Motives for adult participation in physical activity: type of activity, age, and gender. BMC
Public Health 15, 1–12 (2015).
51. Sohn, E. K., Porch, T., Hill, S. & Thorpe, R. J. Geography, race/ethnicity, and physical activity among men in the United States. Am.
J. Mens Health 11, 1019–1027 (2017).
52. Cavazzotto, T. G. et al. Age and sex-related associations between marital status, physical activity and TV time. Int. J. Environ. Res.
Public Health 19, 502 (2022).
53. Droomers, M., Schrijvers, C. T. & Mackenbach, J. P. Educational level and decreases in leisure-time physical activity: predictors
from the longitudinal GLOBE study. J. Epidemiol. Community Health 55, 562–568 (2001).
54. Kwak, L., Berrigan, D., Van Domelen, D., Sjöström, M. & Hagströmer, M. Examining differences in physical activity levels by employment status and/or job activity level: gender-specific comparisons between the United States and Sweden. J. Sci. Med. Sport
19, 482–490 (2016).
55. Shuval, K., Li, Q., Gabriel, K. P. & Tchernis, R. Income, physical activity, sedentary behavior, and the ‘weekend warrior’ among US
adults. Prev. Med. 103, 91–97 (2017).
56. Kim, Y. & Welk, G. J. Extracting objective estimates of sedentary behavior from accelerometer data: measurement considerations
for surveillance and research applications. PLoS ONE 10, e0118078 (2015).
57. Kim, H. & Kang, M. Validation of sedentary behavior record instrument as a measure of contextual information of sedentary
behavior. J. Phys. Act. Health 16, 623–630 (2019).
58. Choe, J. P., Kim, J. S., Park, J. H., Yoo, E. & Lee, J. M. When do individuals get more injured? Relationship between physical activity
intensity, duration, participation mode, and injury. Int. J. Environ. Res. Public Health 18, 10855 (2021).
59. Dodge, T., Clarke, P. & Dwan, R. The relationship between physical activity and alcohol use among adults in the United States: a
systematic review of the literature. Am. J. Health Promot. 31, 97–108 (2017).
60. Papathanasiou, G. et al. Smoking and physical activity interrelations in health science students. Is smoking associated with physical
inactivity in young adults? Hellenic J. Cardiol. 53, 17–25 (2012).
61. Foti, K. E., Eaton, D. K., Lowry, R. & McKnight-Ely, L. R. Sufficient sleep, physical activity, and sedentary behaviors. Am. J. Prev.
Med. 41, 596–602 (2011).
62. Gennuso, K. P., Gangnon, R. E., Matthews, C. E., Thraen-Borowski, K. M. & Colbert, L. H. Sedentary behavior, physical activity,
and markers of health in older adults. Med. Sci. Sports Exerc. 45, 1493–1503 (2013).
63. Dunsky, A., Zach, S. & Zeev, A. Level of physical activity and anthropometric characteristics in old age—results from a national
health survey. Eur. Rev. Aging Phys. Act. 11, 149–157 (2014).
64. Zhang, T., Tang, W. & Sazonov, E. S. Classification of posture and activities by using decision trees. Proc. Annu. Int. Conf. IEEE Eng.
Med. Biol. Soc. 4353–4356 (2012).
65. Chowdhury, A. K., Tjondronegoro, D., Chandran, V. & Trost, S. Ensemble methods for classification of physical activities from
wrist accelerometry. Med. Sci. Sports Exerc. 49, 1965–1973 (2017).
Acknowledgements
The authors are deeply grateful to the proofreaders and editors for their dedicated time and expertise.
Author contributions
JPC contributed to the conception and design of the study, data interpretation, manuscript drafting, and administrative support; SBL provided administrative, technical, and material support, and critical revision of the manuscript; MSK had full access to all the data in the study, contributed to the conception and design of the study, and provided critical revision of the manuscript for important intellectual content. All authors have read and approved the final version of the manuscript and agree with the order of presentation of the authors.
Declarations
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-025-90077-1.
Correspondence and requests for materials should be addressed to M.K.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
© The Author(s) 2025