Dola and Valderrama
BMC Medical Informatics and Decision Making (2024) 24:367
https://doi.org/10.1186/s12911-024-02783-x
RESEARCH Open Access
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0
International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or
parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated
otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To
view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Exploring parental factors influencing low
birth weight on the 2022 CDC natality dataset
Sumaiya Sultana Dola1† and Camilo E. Valderrama1,2*†
Abstract
Background and aims Low birth weight (LBW), defined as a newborn weighing less than 2500 g,
is a growing concern in the United States (US). Previous studies have identified several contributing factors, but many
have analyzed these variables in isolation, limiting their ability to capture the combined influence of multiple factors.
Moreover, past research has predominantly focused on maternal health, demographics, and socioeconomic condi-
tions, often neglecting paternal factors such as age, educational level, and ethnicity. Additionally, most studies have
utilized localized datasets, which may not reflect the diversity of the US population. To address these gaps, this study
leverages machine learning to analyze the 2022 Centers for Disease Control and Prevention’s National Natality Dataset,
identifying the most significant factors contributing to LBW across the US.
Methods We combined anthropometric, socioeconomic, maternal, and paternal factors to train logistic regression,
random forest, XGBoost, conditional inference tree, and attention mechanism models to predict LBW and normal
birth weight (NBW) outcomes. These models were interpreted using odds ratio analysis, feature importance, partial
dependence plots (PDP), and Shapley Additive Explanations (SHAP) to identify the factors most strongly associated
with LBW.
Results Across all five models, the factors most consistently associated with birth weight were maternal height,
pre-pregnancy weight, weight gain during pregnancy, and parental ethnicity. Other pregnancy-related factors, such
as prenatal visits and avoiding smoking, also significantly influenced birth weight.
Conclusion The relevance of maternal anthropometric factors, pregnancy weight gain, and parental ethnicity
can help explain the current differences in LBW and NBW rates among various ethnic groups in the US. Ethnicities
with shorter average statures, such as Asians and Hispanics, are more likely to have newborns below the World Health
Organization’s 2500-gram threshold. Additionally, ethnic groups with historical challenges in accessing nutrition
and perinatal care face a higher risk of delivering LBW infants.
Keywords Low birth weight, Machine learning, Interpretable predictive models, Parental factors, Maternal health,
Statistical analysis
†Sumaiya Sultana Dola and Camilo E. Valderrama contributed equally to this
work.
*Correspondence:
Camilo E. Valderrama
c.valderrama@uwinnipeg.ca
Full list of author information is available at the end of the article
Background
The birth weight of a newborn is a crucial determinant
of their survival chances because, according to the World
Health Organization (WHO), a newborn weighing less
than 2500 grams is at increased risk of dying in the first
28 days of life [1]. Moreover, low birth weight (LBW) is
associated with morbidity because those who survive
may experience long-term physiological, neuropsychi-
atric, cognitive, and social challenges that persist into
adulthood [2].
LBW is currently a public health issue in the United
States (US), which reports more cases than any
Western European country [3]. Recent data show a 1%
increase in LBW from 8.52% in 2021 to 8.60% in 2022,
with a rise of 20% since 1980 [4]. As of 2022, 8.6% of the
US newborns were born with LBW, with Black newborns
experiencing the highest LBW rate (14.0%), followed by
Asian/Pacific Islanders (9.0%), American Indian/Alaska
Natives (8.3%), and Whites (7.2%). Notably, the likelihood
of LBW births among Black newborns was double
that of White newborns [5]. These ethnic differences
were also reported by Paige et al. [6] after analyzing LBW
incidence in more than 113,760 singleton live births in
King County, Washington, from 2008 to 2012. The results
showed that women from certain ethnic groups who
were born outside of the US had a lower chance of having
an LBW newborn than women who were born in the US,
even after adjusting for common pregnancy complications.
The lowest rates of LBW were found in White, Chinese,
and Korean women. On the other hand, the highest
rates of LBW were found in Filipino, Asian Indian, and
non-Hispanic Black women (6.8–7.6%).
According to Morisaki et al. [7], the disparities in birth
weight between ethnicities are not attributable to traditional
factors like maternal age, socioeconomic status,
and behavioral characteristics (e.g., smoking) but to
maternal anthropometric factors. They reached this conclusion
after reviewing singleton US live births between
2009 and 2012, finding that height, BMI, and specific
pregnancy-related factors such as gestational weight gain
and preterm birth rates were the most significant factors
influencing LBW. Given the strong association between
maternal body composition, including height, and birth
weight [8], previous studies have suggested the need for
alternative methods to identify LBW, as the 2500 g cut-
off may not be appropriate for newborns of non-Euro-
pean descent [9].
Similar studies in other countries have also reported an
association between maternal physical, socioeconomic,
and health factors and LBW newborns. Sharma et al.
[10], after reviewing 193 neonates in Chandigarh, India,
reported an LBW prevalence of 23.8%, with higher
rates observed among newborns whose mothers were
under 20 (50.0%), were poorly educated (32.6%), or
had a pre-pregnancy weight of less than 45 kg
(50.0%).
Other factors contributing to LBW include health concerns,
inadequate prenatal care, lower socioeconomic
status, and limited education [11]. These factors negatively
impact both the physical and mental health of the
mother during pregnancy. The sex of the newborn is
also an LBW contributor due to the inherent biological
differences in growth patterns between male and female
fetuses. According to Broere-Brown [12], there are dif-
ferences in the weight and other biometrics of male and
female fetuses, which leads to different body propor-
tions. Male newborns generally weigh more, are longer,
and have larger head circumferences than their female
counterparts.
These previous studies have identified some relevant
factors influencing LBW, such as maternal age, educa-
tion, socioeconomic status, and ethnicity [4, 5, 10, 11].
Also, one study has mentioned the strong influence of
maternal anthropometric factors on birth weight out-
comes [7]. However, although these studies have outlined
factors shaping birth weight, they have not evaluated the
extent to which these factors intersect to create a paren-
tal profile associated with a higher risk of having LBW
newborns. Furthermore, their focus has primarily been
on maternal health, demographics, and socioeconomic
factors, often overlooking potential paternal influences
such as the father’s age, education level, and ethnicity.
Additionally, most of the previous research has been
restricted to specific local populations in the US, neglecting
the diversity across the US population. Therefore,
there is a need for a more comprehensive analysis that
incorporates various factors, including paternal predictors,
to identify the most significant contributors to LBW
across all 50 US states.
One way to correlate different factors to identify those
more associated with LBW is to leverage machine learn-
ing (ML) and deep learning (DL) predictive models.
Unlike traditional statistical methods and statistical
hypothesis tests, which cannot accommodate interac-
tions among many variables simultaneously, are limited
in their ability to handle collinearity, and require a priori
hypotheses about how variables relate with one another
[13–17], ML and DL models can handle multiple corre-
lated predictors simultaneously, yielding highly interpret-
able outcomes [18, 19]. In this way, ML and DL models
can provide a practical approach to operationalize iden-
tifying population subgroups with a high proportion of
LBW.
This study presents an approach based on ML and DL
models to correlate multiple factors, including anthropometric,
socioeconomic, and demographic factors from
both mothers and fathers, to predict LBW in a national
US newborn dataset provided by the Centers for Disease
Control and Prevention (CDC) [20]. To that aim, we use
a range of predictive models, including logistic regres-
sion, random forest, XGBoost, conditional inference tree
and attention mechanism layers, to determine which
factors most significantly influence LBW. Furthermore,
to explain our models and enhance interpretability,
we apply Shapley additive explanations (SHAP) and
partial dependence plots (PDP) to the outputs of these
predictive models, allowing us to identify both direct
and inverse relationships between the factors and birth
weight.
Methods
Data source
For this study, we used the 2022 National Natality Dataset,
a publicly available file provided by the Centers for
Disease Control and Prevention (CDC) [20, 21]. The
dataset comprises information for 3,675,606 birth registrations
that occurred in the US in 2022. For each
newborn, 227 features are provided, including maternal
anthropometrics (height and weight), parental demographics
(parents' race and education), birth weight, etc.
The data were collected from the delivery admission form
filled out by the mothers, as well as from the medical
records collected before and during delivery, such as the
first prenatal care visit date, pregnancy risk factors, and
delivery mode.
Predictor variables
The 2022 National Natality Dataset provides 227 features
describing births that occurred in the US, to both residents
and non-residents. To reduce collinearity between
the predictors, as well as reduce the computational cost
of building the predictive models, we selected 20 variables
out of a total of 227. Our selection was based on
previous studies suggesting significant factors influencing
birth weight [5, 8, 11, 22, 23]. These variables fall into
the anthropometric, maternal, paternal, socioeconomic, and
ethnicity categories.
Anthropometric variables generally reflect an individual's
physical and biological development through
body measurements like height, weight, and body mass
index (BMI) [24]. These measurements provide information
about the mother's nutrition and health, which
are important indicators of the newborn's health. The
BMI identifies pregnancy complications caused by being
underweight or overweight, which may impact the birth
weight [25]. Maternal height and pre-pregnancy weight
jointly influence fetal growth. Taller mothers
experience accelerated fetal growth in the first and
second trimesters, likely due to genetic factors, whereas
maternal weight status increasingly influences intrauterine
growth in the third trimester [26]. Overall, taller and
heavier mothers tend to give birth to larger newborns.
Parental factors, particularly the mother’s age, play a
critical role in determining birth outcomes. Younger and
older mothers often face increased complications, such
as preterm birth and LBW, due to their age [27]. Simi-
larly, older fathers’ age is associated with greater genetic
abnormalities in offspring. In comparison to fathers aged
20 to 34, those older than 34 years have a 90% higher
chance of having an LBW newborn, and teenage fathers
have a 20% lower chance [28]. On another note, mater-
nal smoking during pregnancy affects fetal development
by shortening the gestation period and reducing fetal
growth, leading to LBW [29].
Pregnancy history, including previous live births, still-
births, or neonatal deaths, also provides insight into
potential risks. Mothers who have had two or more suc-
cessful pregnancies tend to have more newborns with
normal birth weight, compared to nulliparous women
[30–32]. In contrast, a history of previous fetal loss has
been linked to a higher occurrence of abnormalities in
pregnancies [33]. Such losses can affect a mother physically
and mentally [34, 35], which in turn can lead to
adverse outcomes.
Parental education levels significantly influence birth
outcomes by affecting access to resources and health lit-
eracy [11, 22, 23]. Mothers and fathers with more educa-
tion tend to get better prenatal care and make healthier
lifestyle choices, leading to more favorable birth out-
comes. Prenatal care and the frequency of prenatal vis-
its are critical [36], as they ensure timely monitoring and
intervention, which are essential for identifying and miti-
gating risks during pregnancy.
Various studies indicate that birth outcomes are not
consistent across different ethnicities [5, 37, 38]. Moreover,
the origin of the parents can affect the health of the
fetus. Lebron et al. [39] investigated the significant influence
of a mother's origin on healthcare access, educational
opportunities, and economic stability among
Hispanic subgroups. These factors are all related to socioeconomic
status and have an impact on maternal and
newborn health outcomes, such as breastfeeding, birth
weight, and newborn mortality. That study also describes
how sociopolitical factors, particularly immigration policies,
directly and indirectly affect these health outcomes
through stress, limited healthcare access, and other
mechanisms.
Outcome variable
We aim to analyze the factors that influence newborns'
birth weight. As such, we used the birth weight (DBWT)
column to determine the outcome variable. As 2500
grams is the WHO's established cut-off for LBW, we
divided the birth records into two classes. Newborns
with a birth weight lower than 2500 grams were labeled
as "Low Birth Weight" (LBW), and those whose birth
weight was higher than 2500 grams were labeled as "Normal
Birth Weight" (NBW).
Data filtering
In this study, we focused exclusively on newborns with a
gestational age of at least 37 weeks (i.e., COMBGEST ≥ 37)
due to the strong correlation between preterm births
and low birth weight (LBW) [40–43]. Newborns born
before 37 weeks typically have a birth weight below 2500
grams, and including them could skew our analysis. We
also excluded non-singleton records, as indicated by the
column ‘DPLURAL’, to prevent confounding factors asso-
ciated with multiple pregnancies. Records from parents
identified as mixed race were excluded to avoid ambi-
guity in the interpretation of results among ethnicities.
Furthermore, we only included infants reported to be
alive at the time of the report to avoid bias in our pre-
dictions due to medical complications. To assess fetal
well-being against the predictor variables, we removed
any birth records lacking an APGAR score at 5 minutes.
Finally, records with unknown values for the selected
predictor and outcome variables were also excluded.
Figure 1 shows the data filtering process. Initially, our
dataset included 3,675,606 newborn records. After
filtering out instances based on gestational age, plurality
records, mixed races, infant living at the time of the
report, and unknown values, the final dataset contained
2,303,722 instances.
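The filtering and labeling steps above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it covers only the criteria whose column names appear in the text (COMBGEST, DPLURAL, DBWT), it assumes that DPLURAL equal to 1 encodes a singleton birth and that unknown values are stored as None, and it treats a weight of exactly 2500 g as NBW. The 0/1 label coding follows the convention stated in the data preparation section.

```python
LBW_THRESHOLD_GRAMS = 2500  # WHO cut-off used in the text

def keep_record(rec):
    """Apply the named filters: no unknown values, term birth, singleton."""
    if any(value is None for value in rec.values()):
        return False                      # unknown values are excluded
    if rec["COMBGEST"] < 37:
        return False                      # preterm births are excluded
    if rec["DPLURAL"] != 1:               # assumption: 1 encodes a singleton birth
        return False
    return True

def label(rec):
    """0 = LBW, 1 = NBW (exactly 2500 g is treated as NBW, an assumption)."""
    return 0 if rec["DBWT"] < LBW_THRESHOLD_GRAMS else 1

# Toy records, not taken from the CDC file
records = [
    {"COMBGEST": 39, "DPLURAL": 1, "DBWT": 3400},
    {"COMBGEST": 38, "DPLURAL": 1, "DBWT": 2300},
    {"COMBGEST": 35, "DPLURAL": 1, "DBWT": 2100},   # preterm
    {"COMBGEST": 40, "DPLURAL": 2, "DBWT": 2600},   # twin
    {"COMBGEST": 39, "DPLURAL": 1, "DBWT": None},   # unknown weight
]
kept = [r for r in records if keep_record(r)]
labels = [label(r) for r in kept]
```

The remaining exclusions (mixed race, infant not living at the time of the report, missing 5-minute APGAR score) would be added as further conditions in `keep_record`.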
Fig. 1 Data filtering process
Distribution of the predictor variables
Tables 1 and 2 show the distribution of the 20 predictor
variables, separated into numerical and categorical
variables, respectively. For the numerical variables, the
mean and standard deviation are provided, while for the
categorical variables, the number of samples and the relative
frequency for each category are displayed. The subgroups
encompassed by the six parental ethnicities
are displayed in Table 3.
Data preparation
Training and test sets
The final dataset containing 2,303,722 instances was split
into training and testing sets. The training set comprised
80% of the data, and the test set comprised 20% of the
data. Although our major goal was to combine parental
factors to identify those most associated with birth
weight outcomes, we used the test set to evaluate the
generalization capacity. In detail, given that the test set
was not used to fit the predictive models, assessing the
models on these independent samples provided a reliable
means of evaluating the identified patterns.
To tune the hyperparameters of the machine learning
models, we further split the training set into two sets:
training and validation. Each hyperparameter configuration
was used to train the model, and the validation set
was used for performance evaluation. The hyperparameters
with the highest performance were selected to train
the final model, which was then evaluated on the independent,
held-out test data.
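The splitting and tuning procedure above can be sketched as follows. The 80/20 train/test proportion comes from the text; the validation fraction, the fixed seed, and the `fit` and `score` callables are assumptions introduced only for illustration, since the paper does not state them here.

```python
import random

def split_indices(n, test_frac=0.20, val_frac=0.20, seed=0):
    """Shuffle indices, hold out a test set, then carve a validation set out of the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(round(len(rest) * val_frac))   # val_frac is an assumed value
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

def select_best(configs, fit, score, train, val):
    """Fit every configuration on train and keep the one scoring best on val.

    `fit(config, train)` and `score(model, val)` are hypothetical callables.
    """
    return max(configs, key=lambda cfg: score(fit(cfg, train), val))

train, val, test = split_indices(1000)
```

The test indices are never passed to `select_best`, which mirrors the paper's point that the held-out samples play no role in fitting or tuning.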
To train and evaluate the performance of the predic-
tive models, NBW was labeled ‘1’, while LBW was labeled
as ‘0’. As the dataset was imbalanced, with LBW being
the minority class, the models were trained to prioritize
the accurate prediction of LBW. This focus was driven
by the fact that LBW is a critical health condition that
requires proper identification. Consequently, our models
were optimized to minimize false negatives (newborns
predicted as NBW when they were actually LBW) over
false positives (newborns predicted as LBW when they
were actually NBW).
Data preprocessing
The predictor variables were separated into numerical
and categorical variables. The categorical variables were
converted into dummy variables using one-hot encoding.
The numerical variables were scaled using min-max
normalization.
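The two preprocessing steps above can be illustrated with a short sketch on toy values. The helper names are hypothetical, and the handling of a constant column is an assumption made only so that the function is total.

```python
def min_max_scale(values):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

heights = [60.0, 64.0, 68.0]              # toy maternal heights in inches
sexes = ["Male", "Female", "Male"]        # toy newborn sex values

scaled = min_max_scale(heights)
cats, encoded = one_hot(sexes)
```

In a full pipeline the minimum and maximum would normally be computed on the training set and reused for the validation and test sets, so that no information from held-out data leaks into the scaling.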
Resampling
Because LBW cases made up only around 3% of the
training set, the training set was imbalanced. To
address this issue, we employed Random Over Sampling
(ROS) to ensure a more balanced distribution of classes
on the training set.
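Random over sampling can be sketched as below. This is a minimal version that duplicates minority-class rows at random until both classes have the same count; the exact target ratio used in the study is not stated, so full balancing is an assumption.

```python
import random

def random_over_sample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample with replacement from the original rows of this class
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Toy training set: 1 LBW (label 0) for 30 NBW (label 1)
X = [[i] for i in range(31)]
y = [0] + [1] * 30
X_bal, y_bal = random_over_sample(X, y)
```

Resampling is applied to the training set only, as in the text; the validation and test sets keep their original class proportions.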
Predictive models
We used logistic regression, random forest, XGBoost,
conditional inference tree, and attention mechanisms to
predict the two birth weight classes. These models used
different non-linear relationships between the predictor
variables to classify between LBW and NBW newborns.
Together, these five models offer a robust approach for
identifying relevant predictors of birth weight, highlight-
ing those that consistently emerged as significant across
all predictive methods.
Logistic regression converted the combination of predictor
variables into probabilities using the sigmoid
function, thus indicating which combinations had higher
odds of belonging to the NBW class. To train logistic regression,
we used the majority category of the categorical
variables as a reference (see Table 2). Thus, for both
maternal and paternal ethnicity, the reference category
was White; for newborn sex, the reference was male; for
maternal education level, the reference was a Bachelor's
degree; for paternal education, the reference was a high
school graduate; and for maternal origin, the reference
was US-born.

Table 1 Description of the 14 numerical predictor variables selected for predicting normal birth weight against low birth weight. For each variable, the mean and standard deviation (SD) are provided

| Category | Variable | Description | Mean (SD) |
|---|---|---|---|
| Anthropometric | M_Ht_In | Maternal height (inches) | 64.2 (2.8) |
| | BMI | Body Mass Index | 27.5 (6.7) |
| | PWgt_R | Pre-pregnancy weight (pounds) | 161.4 (41.4) |
| Paternal Factor | FAGECOMB | Paternal age (years) | 32.0 (6.6) |
| Maternal Factor | MAGER | Maternal age (years) | 29.8 (5.5) |
| | WTGAIN | Weight gain (pounds) | 29.3 (14.7) |
| | CIG_0 | Daily cigarettes before pregnancy | 0.5 (3.1) |
| | CIG_1 | Daily cigarettes during 1st trimester | 0.3 (2.3) |
| | CIG_2 | Daily cigarettes during 2nd trimester | 0.2 (1.9) |
| | CIG_3 | Daily cigarettes during 3rd trimester | 0.2 (1.8) |
| Previous Pregnancies | PRIORLIVE | Prior births now living (count) | 1.1 (1.2) |
| | PRIORDEAD | Prior births now dead (count) | 0.0 (0.2) |
| Prenatal care | PREVIS_REC | Number of prenatal visits (count) | 6.9 (1.8) |
| | PRECARE5 | Month prenatal care began | 2.8 (1.4) |

Table 2 Description of the 6 categorical predictor variables selected for predicting normal birth weight against low birth weight. For each variable, the number and percentage of records in each category are provided

| Category | Variable | Description | Level | Number (percent) |
|---|---|---|---|---|
| Ethnicity | MRACE15 | Maternal ethnicity | White | 1,357,994 (59.0) |
| | | | Hispanic | 491,418 (21.3) |
| | | | Black | 267,922 (11.6) |
| | | | Asian | 167,357 (7.3) |
| | | | Indigenous | 13,677 (0.6) |
| | | | Pacific Islanders | 5,354 (0.2) |
| | FRACE15 | Paternal ethnicity | White | 1,356,042 (58.9) |
| | | | Hispanic | 457,292 (19.9) |
| | | | Black | 319,525 (13.9) |
| | | | Asian | 151,372 (6.6) |
| | | | Indigenous | 13,534 (0.6) |
| | | | Pacific Islanders | 5,957 (0.3) |
| Newborn sex | SEX | Newborn's sex | Male | 1,173,414 (50.9) |
| | | | Female | 1,130,308 (49.1) |
| Socioeconomic | MEDUC | Maternal education level | 8th grade or less | 51,656 (2.2) |
| | | | 9th through 12th grade with no diploma | 124,089 (5.3) |
| | | | High school graduate or GED completed | 530,308 (23.0) |
| | | | Some college credit, but not a degree | 401,453 (17.4) |
| | | | Associate degree (AA, AS) | 209,008 (9.1) |
| | | | Bachelor's degree (BA, AB, BS) | 603,961 (26.2) |
| | | | Master's degree (MA, MS, MEng, MEd, MSW, MBA) | 295,846 (12.8) |
| | | | Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD) | 87,401 (3.8) |
| | FEDUC | Paternal education level | 8th grade or less | 62,893 (2.7) |
| | | | 9th through 12th grade with no diploma | 155,774 (6.8) |
| | | | High school graduate or GED completed | 683,298 (29.7) |
| | | | Some college credit, but not a degree | 404,908 (17.6) |
| | | | Associate degree (AA, AS) | 174,720 (7.6) |
| | | | Bachelor's degree (BA, AB, BS) | 527,983 (22.9) |
| | | | Master's degree (MA, MS, MEng, MEd, MSW, MBA) | 204,495 (8.9) |
| | | | Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD) | 89,651 (3.9) |
| | MBSTATE_REC | Maternal origin | US born | 1,819,958 (79.0) |
| | | | Born outside the US | 483,764 (21.0) |

Table 3 Detailed breakdown of ethnic categories of parents

| Ethnicity of parents | Categories |
|---|---|
| White | |
| Black | |
| Asian | Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Other Asian |
| Hispanic | Mexican, Puerto Rican, Cuban, Central or South American, Dominican, Other and unknown Hispanic |
| Indigenous | American Indian and Alaska Native |
| Pacific Islander | Native Hawaiian and Other Pacific Islander |
Random forest (RF) and XGBoost built multiple deci-
sion trees to identify rules more associated with LBW
and NBW. Each tree was built by using a subset of training
data and a subset of the predictor variables that
were selected randomly. The difference between RF and
XGBoost is the way the individual trees were combined.
RF used a bagging strategy, in which the trees were trained
independently. In contrast, XGBoost used a boosting
strategy, in which trees were trained sequentially, with
each new tree aiming to correct the mistakes made by the
previous ones.
The conditional inference tree (CIT) built a tree relating
the predictors based on their capacity to separate
samples into two groups that were statistically significantly
different [44]. To that aim, the CIT evaluated multiple
hypothesis tests with Bonferroni correction to find the
predictor variable that produced the lowest p-value to
discriminate between LBW and NBW cases.
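The variable-selection step at a single CIT node can be sketched as follows. The raw p-values would come from per-predictor association tests, which are not reproduced here; the numbers below are hypothetical, and the significance level of 0.05 is an assumption rather than a setting reported in this section.

```python
def select_split_variable(p_values, alpha=0.05):
    """Pick the predictor with the smallest Bonferroni-adjusted p-value.

    Returns (name, adjusted_p), or None when no predictor is significant,
    in which case the node is not split further.
    """
    m = len(p_values)                                   # number of tests at this node
    adjusted = {name: min(1.0, p * m) for name, p in p_values.items()}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] >= alpha:
        return None
    return best, adjusted[best]

# Hypothetical raw p-values for three predictors from Table 1
node_p = {"WTGAIN": 0.0004, "M_Ht_In": 0.003, "CIG_1": 0.20}
choice = select_split_variable(node_p)
```

Returning None acts as the stopping rule: when no adjusted p-value falls below the threshold, the tree stops growing along that branch.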
The attention layer mechanism is a deep learning
model that identifies the variables that a model focuses
on the most when making predictions [45]. This was
achieved using three matrices, query (Q), key (K), and
value (V), which were combined to assign an attention
weight to each input feature as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $d_k$ was the dimension of the keys, and the softmax
function normalized the attention weights so that they
summed to 1. The attention weight
highlighted the importance of different input features
relative to discriminating between LBW and NBW cases.
Evaluation performance
To evaluate the performance of the models, seven different
metrics were computed. The first two corresponded to
the individual recall for each class. The remaining five
corresponded to the average (macro) of the individual
class metrics for recall, precision, F1-score, receiver operating
characteristic area under the curve (ROC AUC),
and the precision-recall area under the curve (PR AUC).
Data analysis and interpretation
After training the predictive models, we applied various
post-processing methodologies to identify the variables
that consistently emerged as significant across all predictive
methods. Table 4 shows the different methodologies
used to interpret the models. These interpretation
methods allowed us to identify the common factors that
consistently emerged as relevant across all analyses in
distinguishing between NBW and LBW cases.
Odds‑ratio analysis
For logistic regression, we performed odds-ratio analysis
to determine which variables significantly correlated with
birth weight outcomes, thus identifying those that were
strongly associated with LBW.
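The link between a logistic regression coefficient and an odds ratio can be made concrete with a short sketch. The coefficients below are hypothetical and are not the values reported in Table 6; the point is only that exponentiating a coefficient gives the multiplicative change in the odds of NBW (the class labeled '1') for a one-unit increase in that predictor.

```python
import math

def sigmoid(z):
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(coef):
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(coef)

def percent_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase."""
    return (odds_ratio(coef) - 1.0) * 100.0

# Hypothetical coefficients (log-odds of NBW per unit), for illustration only
coefs = {"M_Ht_In": 0.10, "CIG_1": -0.05}
ratios = {name: odds_ratio(b) for name, b in coefs.items()}
```

An odds ratio above 1 means the predictor raises the odds of NBW, and a value below 1 means it lowers them, which is how the direction of each association in the odds-ratio analysis is read.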
Feature importance
For the ensemble models, we conducted a feature impor-
tance analysis to identify the most influential factors
contributing to the predictions. The ensemble models
computed importance scores by weighting, summing,
and averaging attribute data across all decision trees,
identifying the factors that were most sensitive and criti-
cal for prediction performance.
Attention weights
Similarly, for the attention mechanism, we visualized the
attention scores assigned to each predictor after training
the model. Higher attention scores indicated that a par-
ticular feature was more relevant for the prediction task.
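How attention scores arise from the query, key, and value matrices can be sketched with plain Python lists. This is a generic scaled dot-product attention on toy numbers; how the study maps its 20 predictors onto Q, K, and V is not described in this section, so the inputs below are purely illustrative.

```python
import math

def softmax(xs):
    """Normalize scores into weights that sum to 1."""
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(K[0])
    weights, outputs = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Toy inputs: 2 queries and 3 key/value pairs of dimension 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, w = attention(Q, K, V)
```

Each row of `w` is a set of attention weights; larger entries mark the inputs that contribute most to the corresponding output.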
Conditional inference tree
We visualized the branches of the CIT, with each branch
representing a classification rule that offers insights into
how different predictor variables are combined to classify
NBW and LBW cases. By analyzing these branches, we
identified parental profiles associated with the lowest and
highest proportions of LBW newborns.
Table 4 Interpretability methodologies to post-process trained predictive models

| Technique | Models |
|---|---|
| Odds Ratio Analysis | LR |
| Feature Importance | RF, XGBoost |
| Attention Weights | Attention Layer |
| Partial Dependence Plot (PDP) | LR |
| Conditional Inference Tree | CIT |
| SHAP Values | LR, RF, XGBoost |

Partial dependence plots
To visualize the marginal impact of a single feature on
LBW and NBW cases, we implemented Partial Dependence
Plots (PDP) [46] using the logistic regression model.
PDPs illustrate how a feature influences the predicted
outcome by displaying the average prediction as that
feature varies, averaging over the observed values of the
other features. Unlike feature importance
techniques, PDPs can reveal both the direction and
nature of the relationship between a feature and the
prediction outcome.
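The computation behind a partial dependence curve can be sketched as follows. The logistic model, its coefficients, and the toy data are hypothetical; the sketch only shows the mechanics of fixing one feature to a grid value in every row and averaging the resulting predictions.

```python
import math

def predict_nbw(row, coefs, intercept):
    """Predicted probability of NBW from a logistic model."""
    z = intercept + sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))

def partial_dependence(X, feature, grid, coefs, intercept):
    """For each grid value, overwrite one feature in every row and average the predictions."""
    curve = []
    for g in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[feature] = g          # all other features keep their observed values
            preds.append(predict_nbw(modified, coefs, intercept))
        curve.append(sum(preds) / len(preds))
    return curve

# Hypothetical model on min-max-scaled data: column 0 ~ maternal height, column 1 ~ cigarettes
coefs, intercept = [2.0, -1.5], 0.5
X = [[0.2, 0.0], [0.5, 0.1], [0.8, 0.6]]
curve = partial_dependence(X, feature=0, grid=[0.0, 0.5, 1.0], coefs=coefs, intercept=intercept)
```

A rising curve indicates a direct relationship between the feature and the predicted probability of NBW, and a falling curve an inverse one.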
Shapley additive exPlanations
We employed Shapley Additive Explanations (SHAP) to
analyze the predictive rules of logistic regression, RF, and
XGBoost further. SHAP analysis quantified the contri-
bution of each feature to individual predictions, offering
a detailed understanding of the models' behavior. Specifically,
the SHAP analysis generated visualizations that
illustrate the contribution of each feature to the predictions.
The summary plots displayed each variable vertically,
with the x-axis representing the range of SHAP
values. Positive values on the x-axis indicated a higher
likelihood of the predicted outcome, while negative val-
ues suggested a lower likelihood. For a specific feature,
red points on the right indicated a positive contribution
to the likelihood of achieving NBW, whereas blue points
on the left indicated a negative impact, reducing the
likelihood of NBW. When a feature exhibited a signifi-
cant contrast between red and blue across both positive
and negative SHAP values, it suggested that the feature’s
effect on the prediction varied considerably across its
range.
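The quantity that SHAP assigns to each feature can be illustrated with an exact Shapley computation on a tiny model. This sketch enumerates all feature subsets and replaces absent features with a baseline value (for example, a mean); both the linear score and the baseline are hypothetical. The enumeration is exponential in the number of features, so it is only practical for toy cases, and SHAP implementations rely on faster model-specific or sampling-based algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a subset take their baseline value."""
    n = len(x)

    def value(subset):
        return f([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(total)
    return phi

# Hypothetical linear score (log-odds of NBW) over three scaled predictors
coefs = [2.0, 1.0, -1.5]
f = lambda row: 0.5 + sum(c * v for c, v in zip(coefs, row))
x, baseline = [0.9, 0.4, 0.0], [0.5, 0.5, 0.1]
phi = shapley_values(f, x, baseline)
```

The contributions add up to the difference between the score for `x` and the score for the baseline, which is the additive property that lets summary plots be read as pushes toward or away from NBW.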
Results
Model evaluation on the testing set
Table 5 shows the performance on the held-out, independent
test samples. All the models achieved an accuracy
greater than 61%, with XGBoost showing the highest
performance. Overall, the predictive models performed
better for predicting NBW than LBW. The macro precision,
F1-score, and PR AUC were the lowest metrics due
to the high imbalance between NBW and LBW classes, as
well as the fact that the models were trained to prioritize
the prediction of LBW cases. Consequently, the models
obtained a false positive rate for the LBW class around
34%, which, given the strong imbalance between LBW and NBW
samples (1:30), resulted in a low precision for the LBW
class. Nevertheless, the average ROC AUC across the
models was nearly 70%, indicating that the models were
able to effectively distinguish between LBW and NBW
cases [47].
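The effect of class imbalance on precision can be checked with a short calculation. The sketch below rebuilds a confusion matrix from the two per-class recalls and the approximate 1:30 class ratio mentioned above; the function name and the use of relative class sizes (1 and 30) instead of actual counts are assumptions made for illustration.

```python
def class_metrics(lbw_recall, nbw_recall, lbw_count=1.0, nbw_count=30.0):
    """Rebuild the confusion matrix from per-class recall and relative class sizes."""
    tp = lbw_recall * lbw_count            # LBW correctly flagged
    fn = lbw_count - tp                    # LBW missed
    tn = nbw_recall * nbw_count            # NBW correctly identified
    fp = nbw_count - tn                    # NBW wrongly flagged as LBW
    lbw_precision = tp / (tp + fp)
    nbw_precision = tn / (tn + fn)
    return {
        "lbw_precision": lbw_precision,
        "nbw_precision": nbw_precision,
        "macro_precision": (lbw_precision + nbw_precision) / 2,
        "macro_recall": (lbw_recall + nbw_recall) / 2,
        "accuracy": (tp + tn) / (lbw_count + nbw_count),
    }

# Logistic regression row of Table 5: LBW recall 64%, NBW recall 66%
m = class_metrics(0.64, 0.66)
```

With these inputs the LBW precision comes out at roughly 6% and the macro precision at roughly 52%, in line with the logistic regression row of Table 5: even a moderate false positive rate produces many more false alarms than true LBW detections when NBW cases outnumber LBW cases about 30 to 1.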
Odds ratio analysis
Table 6 shows significant factors (p-value < 0.05)
obtained by the logistic regression for predicting NBW
(class labeled as ‘1’) and LBW (class labeled as ‘0’). Mater-
nal anthropometrics showed a strong association with
the odds of having NBW newborns. Specifically, taller
mothers and those with higher pre-pregnancy weight had
higher odds of delivering NBW newborns. In addition
to anthropometric factors, the chronological age of both
the mother and father showed a negative association
with NBW, as the odds of delivering an NBW newborn
decreased with increasing parental age.
The logistic regression analysis showed that parental
ethnicity correlated with birth weight outcomes. Par-
ents who identified as Black or Asian had higher odds of
having LBW offspring than their White counterparts. In
contrast, Hispanic mothers were more likely to have new-
borns with NBW compared to White mothers. Interest-
ingly, mothers who were born outside the US were more
associated with NBW newborns than US-born mothers.
Actions taken during pregnancy and previous preg-
nancy history significantly influenced the odds of deliv-
ering NBW infants. For instance, gaining adequate
weight during pregnancy and attending prenatal visits
were positively associated with having NBW newborns.
Conversely, smoking habits during pregnancy negatively
impacted the odds of NBW, particularly in the first tri-
mester, where an increase of one unit in daily cigarette
consumption decreased the odds of delivering an NBW
newborn by 76%. Additionally, the number of previ-
ous living births emerged as a critical indicator of NBW
outcomes, suggesting that mothers with a history of
Table 5 Performance of the predictive models for classifying low birth weight (LBW) and normal birth weight (NBW). Individual
recall for each class is presented, along with accuracy, macro recall, macro precision, macro F1-score, macro area under the receiver
operating characteristic curve (ROC AUC), and macro area under the precision-recall curve (PR AUC)
| Model | LBW recall (%) | NBW recall (%) | Accuracy (%) | Macro recall (%) | Macro precision (%) | Macro F1-score (%) | Macro ROC AUC (%) | Macro PR AUC (%) |
|---|---|---|---|---|---|---|---|---|
| LR | 64.0 | 66.0 | 66.0 | 65.0 | 52.0 | 44.0 | 70.4 | 52.9 |
| RF | 62.0 | 66.0 | 66.0 | 64.0 | 52.0 | 44.0 | 69.5 | 52.7 |
| XGBoost | 66.0 | 68.0 | 68.3 | 67.0 | 52.0 | 46.0 | 73.4 | 53.9 |
| CIT | 61.6 | 61.8 | 61.8 | 61.7 | 51.4 | 43.4 | 63.8 | 51.3 |
| Attention Mechanism | 64.0 | 66.0 | 66.3 | 65.0 | 52.0 | 45.0 | 70.5 | 52.9 |
| Average | 63.5 | 65.6 | 65.7 | 64.3 | 51.9 | 44.5 | 69.52 | 52.7 |
successful pregnancies have a higher likelihood of deliv-
ering NBW newborns.
The educational levels of both mothers and fathers significantly
influenced the likelihood of a newborn having
an NBW. Mothers with an education level of an associate
degree or lower exhibited lower odds of delivering NBW
newborns compared to those with a bachelor’s degree.
Similarly, fathers who completed at least a bachelor’s
degree had approximately 30% higher odds of having
an NBW newborn than those who graduated from high
school.
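The odds ratios discussed in this section follow from the logistic regression coefficients through the standard relation OR = exp(coefficient), and the percent change in odds is (OR − 1) × 100. The sketch below applies this relation to a few rounded coefficients copied from Table 6; the helper names are hypothetical, and small differences from the published odds ratios are presumably due to rounding of the printed coefficients.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


def percent_change_in_odds(coefficient):
    """Percent change in the odds of the positive class (NBW) per unit increase."""
    return (math.exp(coefficient) - 1.0) * 100.0


def odds_ratio_interval(lower, upper):
    """Exponentiate the coefficient confidence bounds to get an interval for the OR."""
    return math.exp(lower), math.exp(upper)


# Rounded values copied from Table 6: (coefficient, CI lower, CI upper).
table6_rows = {
    "Pre-pregnancy weight": (1.54, 1.16, 1.92),
    "Mother - Black": (-0.53, -0.55, -0.51),
    "Daily cigarettes in the 1st trimester": (-1.39, -1.78, -0.99),
}

for name, (coef, low, high) in table6_rows.items():
    or_low, or_high = odds_ratio_interval(low, high)
    print(
        f"{name}: OR = {odds_ratio(coef):.2f} "
        f"({or_low:.2f}, {or_high:.2f}), "
        f"change in odds = {percent_change_in_odds(coef):+.0f}%"
    )
```

With the rounded coefficient of −1.39, the first-trimester smoking variable gives an odds ratio near 0.25, that is, a decrease of roughly 75% in the odds of NBW, close to the 76% quoted in the text.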
Ensemble models relevant features
Figures 2 and 3 illustrate the feature importance for
the random forest and XGBoost models, respectively.
For both ensemble models, weight gain during preg-
nancy emerged as the most important predictor of
NBW and LBW cases. Additionally, pre-pregnancy
Table 6 Odds ratio analysis for the logistic regression coefficients. All coefficients were significant at the significance level of 0.05.
The top 10 significant features are mothers who were born outside the US, Asian fathers, fathers with a bachelor’s degree, female
newborns, Black mothers, month prenatal care began, number of prenatal care visits, number of previous living births, and weight gain
| Category | Variable | Coefficient | 95% CI | Odds ratio | P value |
|---|---|---|---|---|---|
| Anthropometric | Maternal height | 3.70 | (3.45, 4.00) | 41.40 | < 0.001 |
| | Pre-pregnancy weight | 1.54 | (1.16, 1.92) | 4.66 | < 0.001 |
| Ethnicity (White as reference) | Mother - Black | −0.53 | (−0.55, −0.51) | 0.59 | 0.0 |
| | Father - Asian | −0.53 | (−0.55, −0.51) | 0.59 | < 0.001 |
| | Father - Black | −0.29 | (−0.31, −0.27) | 0.75 | < 0.001 |
| | Mother - Hispanic | 0.13 | (0.11, 0.15) | 1.14 | < 0.001 |
| | Mother - Asian | −0.14 | (−0.16, −0.11) | 0.87 | < 0.001 |
| | Mother - Indigenous | −0.28 | (−0.39, −0.17) | 0.75 | < 0.001 |
| | Father - Pacific Islander | −0.11 | (−0.17, −0.05) | 0.90 | < 0.001 |
| Maternal Education (Bachelor’s degree as reference) | Mother - 9th through 12th grade with no diploma | −0.40 | (−0.42, −0.38) | 0.66 | < 0.001 |
| | Mother - High school graduate or GED completed | −0.20 | (−0.21, −0.18) | 0.81 | < 0.001 |
| | Mother - Some college credit, but not a degree | −0.15 | (−0.16, −0.13) | 0.86 | < 0.001 |
| | Mother - Associate degree | −0.10 | (−0.13, −0.09) | 0.90 | < 0.001 |
| | Mother - 8th grade or less | −0.17 | (−0.20, −0.13) | 0.84 | < 0.001 |
| | Mother - Master’s degree | 0.05 | (0.04, 0.07) | 1.06 | < 0.001 |
| Paternal Education (High school graduate or GED completed as reference) | Father - Bachelor’s degree | 0.34 | (0.33, 0.36) | 1.41 | 0.0 |
| | Father - Master’s degree | 0.29 | (0.27, 0.31) | 1.33 | < 0.001 |
| | Father - Some college credit, but not a degree | 0.17 | (0.16, 0.18) | 1.18 | < 0.001 |
| | Father - Doctorate or Professional Degree | 0.33 | (0.30, 0.35) | 1.39 | < 0.001 |
| | Father - Associate degree | 0.18 | (0.16, 0.20) | 1.20 | < 0.001 |
| | Father - 9th through 12th grade with no diploma | −0.06 | (−0.08, −0.04) | 0.93 | < 0.001 |
| | Father - 8th grade or less | 0.11 | (0.08, 0.14) | 1.12 | < 0.001 |
| Paternal age | Paternal age | −0.26 | (−0.34, −0.17) | 0.77 | < 0.001 |
| Maternal factors | Weight gain | 2.63 | (2.60, 2.67) | 13.93 | 0.0 |
| | Maternal age | −0.38 | (−0.43, −0.34) | 0.68 | < 0.001 |
| | Daily cigarettes before pregnancy | −1.64 | (−1.82, −1.45) | 0.19 | < 0.001 |
| | Daily cigarettes in the 1st trimester | −1.39 | (−1.78, −0.99) | 0.24 | < 0.001 |
| | Daily cigarettes in the 3rd trimester | −1.20 | (−1.74, −0.66) | 0.30 | < 0.001 |
| | Daily cigarettes in the 2nd trimester | −1.23 | (−1.86, −0.61) | 0.29 | < 0.001 |
| Newborn sex (male as reference) | Female (1: ‘yes’, 0: ‘no’) | −0.21 | (−0.21, −0.19) | 0.81 | 0.0 |
| Previous pregnancies | Previous living births | 3.62 | (3.54, 3.69) | 37.29 | 0.0 |
| Prenatal care | Number of prenatal visits | 1.03 | (1.01, 1.06) | 2.81 | 0.0 |
| | Month prenatal care started | 0.60 | (0.57, 0.63) | 1.82 | 0.0 |
| Mother origin (Born in the US as reference) | Born Outside the US (1: ‘yes’, 0: ‘no’) | 0.39 | (0.39, 0.40) | 1.48 | 0.0 |
weight, maternal height, number of prenatal care vis-
its, and previous living births ranked among the top
ten features in both models. The random forest and
XGBoost also highlighted the significance of paternal
factors in predicting birthweight outcomes, revealing
that the father’s ethnicity (White or Black) and age
were critical for classifying LBW and NBW. Notably,
neither model included educational factors in their top
ten rankings based on feature importance.
Attention mechanism layer
Figure 4 shows the attention scores assigned by the self-attention
mechanism to each variable. The bar chart
ranks features according to their importance scores, with
taller bars indicating greater significance for predicting
birth weight. Among the features, the education level of
parents exhibited the highest importance. Additionally,
the number of prenatal care visits, the presence of Asian
fathers, Black mothers, mothers born in the US, the
Fig. 2 Feature importance for the random forest (RF) model. The top ten predictors identified as most relevant for birth weight predictions were
weight gain (WTGAIN), Black parents, maternal height (M_Ht_in), pre-pregnancy weight (PWgt_R), number of previous living births
(PRIORLIVE), White parents, number of prenatal care visits (PREVIS_REC), and female infants
month prenatal care commenced, pre-pregnancy weight,
maternal height, and weight gain during pregnancy
were also among the features that received the highest
attention weights.
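The caption of Fig. 4 uses 1/46 as the score that every predictor would receive under equal relevance, which implies attention scores that sum to one across the 46 predictors. The sketch below illustrates that comparison under the assumption that the scores come from a softmax; the raw scores and feature names are hypothetical placeholders, not values from the study.

```python
import math


def softmax(scores):
    """Normalize raw scores into non-negative weights that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def above_uniform_baseline(names, weights):
    """Return (name, weight) pairs whose weight exceeds 1 / number of predictors,
    sorted from the largest weight to the smallest."""
    baseline = 1.0 / len(weights)
    ranked = sorted(zip(names, weights), key=lambda pair: pair[1], reverse=True)
    return [(name, weight) for name, weight in ranked if weight > baseline]


# Hypothetical raw scores for 46 predictors: two stand out, the rest are equal.
names = [f"feature_{i}" for i in range(46)]
raw_scores = [0.0] * 46
raw_scores[0] = 2.0
raw_scores[1] = 1.0

weights = softmax(raw_scores)
selected = above_uniform_baseline(names, weights)
print(f"uniform baseline = {1.0 / len(weights):.4f}")
for name, weight in selected:
    print(f"{name}: {weight:.4f}")
```

Only predictors whose weight exceeds the uniform baseline of about 2.2 × 10⁻² are kept, which mirrors how the caption of Fig. 4 separates the most relevant variables from the rest.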
Partial dependence plots
Figures 5, 6, 7, 8, 9, 10, 11, 12 and 13 show the PDP for
nine parental factors based on the logistic regression
output, namely: weight gain during pregnancy, maternal
height, maternal pre-pregnancy weight, as well as parental
age, ethnicity, and education. In the plots, the x-axis represents
the range of values for each feature, with numerical
features grouped into bins and categorical features represented
by individual categories. The distribution of feature
values was also displayed along the x-axis. The y-axis
shows the predicted change in the model output, with
the leftmost value on the x-axis serving as the reference
point. To aid interpretation, the PDP of the reference
value was set to zero, highlighting relative changes across
the feature values.
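The construction described above can be sketched in a few lines: for each grid value, the feature is forced to that value in every row, the model predictions are averaged, and the curve is shifted so that the leftmost grid value is zero. This is a minimal illustration rather than the authors' implementation; the two coefficients are taken from Table 6, while the intercept, the rows, the grid, and the assumption that features are scaled to [0, 1] are hypothetical.

```python
import math


def predict_nbw_probability(row, coefs, intercept):
    """Logistic model: probability of the positive class (NBW, labeled '1')."""
    z = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def partial_dependence(rows, feature, grid, coefs, intercept):
    """Average prediction with `feature` forced to each grid value,
    shifted so that the leftmost grid value is the zero reference."""
    curve = []
    for value in grid:
        preds = [
            predict_nbw_probability({**row, feature: value}, coefs, intercept)
            for row in rows
        ]
        curve.append(sum(preds) / len(preds))
    reference = curve[0]
    return [p - reference for p in curve]


coefs = {"weight_gain": 2.63, "maternal_age": -0.38}  # coefficients from Table 6
intercept = -1.0                                       # hypothetical value
rows = [                                               # hypothetical scaled samples
    {"weight_gain": 0.2, "maternal_age": 0.3},
    {"weight_gain": 0.5, "maternal_age": 0.5},
    {"weight_gain": 0.7, "maternal_age": 0.8},
]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]

pdp = partial_dependence(rows, "weight_gain", grid, coefs, intercept)
for value, change in zip(grid, pdp):
    print(f"weight_gain = {value:.2f}: change in predicted NBW probability = {change:+.3f}")
```

Because the weight gain coefficient is positive, the resulting curve rises monotonically from the zero reference, the same qualitative shape described for Fig. 5.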
Figures 5, 6, and 7 display the effect of maternal anthropometric
factors on the chances of delivering an NBW newborn. Figure 5
shows a significant upward trend with weight gain
during pregnancy, indicating that higher weight gains
Fig. 3 Feature importance for the XGBoost model. The top ten predictors identified as most relevant for birth weight predictions were weight gain
(WTGAIN), number of prenatal care visits (PREVIS_REC), number of previous living births (PRIORLIVE), pre-pregnancy weight (PWgt_R), BMI, month
prenatal care began (PRECARE), parental age (MAGER and FAGECOMB), maternal height (M_Ht_in), and mothers who were born in the US
were strongly related to NBW outcomes. For maternal
height, Fig. 6 indicates that mothers between 61 and 64
inches (155−163 cm) had similar probabilities of delivering
an NBW newborn, but these probabilities increased
steadily for mothers taller than 64 inches, suggesting that
taller mothers were more likely to deliver NBW newborns.
In terms of pre-pregnancy weight (Fig. 7), there
was an increasing trend, indicating that heavier mothers
were more likely to deliver NBW newborns.
Figures 8 and 9 show the effect of parental age on
the birth weight prediction. The trend for both parents
was inverse: the older the parents were, the
lower the probability of having an NBW newborn.
Figures 10 and 11 show the impact of parental ethnicity
on NBW outcomes. In general, White parents had
a higher probability of having an NBW newborn than
parents from other ethnicities. Asian and Black parents
were those with the highest risk of having an LBW
newborn. Among ethnicities, Hispanic mothers were
the only group with a higher likelihood of delivering an
NBW newborn compared to White mothers.
Figures 12 and 13 show the influence of parental education
on birth weight outcomes. Mothers with at least
a bachelor’s degree were more likely to deliver a newborn
with normal birth weight (NBW) compared to
those with only a high school diploma or some college
credits. Regarding fathers, those who had completed at
least an associate degree showed a significantly higher
probability of having an NBW newborn.
Fig. 4 Feature importance from the attention mechanism layer, based on attention scores assigned to each predictor variable. As a reference, equal
relevance for all predictors would result in a score of $1/46 = 2.2 \times 10^{-2}$. Variables with scores higher than $2.2 \times 10^{-2}$ contributed the most to the birth
weight predictions
Conditional inference tree
Figure 14 displays the conditional inference tree when its
maximum height was constrained to three levels. Among
the different predictor variables, the tree identified that
the most critical variables to discriminate between NBW
and LBW cases were maternal ethnicity, maternal height,
and maternal weight gain.
Fig. 5 PDP for maternal weight gain during pregnancy
Fig. 6 PDP for maternal height
Fig. 7 PDP for pre-pregnancy weight
Fig. 8 PDP for maternal age
Fig. 9 PDP for paternal age
Based on maternal ethnicity, the tree was split into
two groups: one group included White, Hispanic,
Pacific Islander, and Indigenous mothers, whereas the
other one encompassed Black and Asian mothers. For
the White, Hispanic, Pacific Islander, and Indigenous
mothers, the node with the highest proportion of
LBW cases corresponded to mothers shorter than 63
inches who gained less than 28 lbs during pregnancy
and whose pre-pregnancy weight was lower than 131
lbs (Node 5; 68.8%). For Black and Asian mothers, the
node with the highest proportion of LBW cases was for
mothers who gained less than 28 lbs (Node 9; 69.0%).
The node with the highest proportion of NBW newborns
(Node 18; 73.6%) corresponded to White, Hispanic,
Pacific Islander, and Indigenous mothers taller
than 63 inches who gained more than 27 lbs and held a
bachelor’s, master’s, PhD, or professional degree.
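The three nodes described above can be written as explicit rules, which makes the thresholds easier to compare. The sketch below encodes only these three nodes and returns `None` for any other profile, because the remaining branches of the tree in Fig. 14 are not described in the text. The function name and argument names are hypothetical, the education labels follow the abbreviations in the caption of Fig. 14, and the handling of a height of exactly 63 inches is left undefined on purpose.

```python
GROUP_A = {"White", "Hispanic", "Pacific Islander", "Indigenous"}
GROUP_B = {"Black", "Asian"}
BACHELOR_OR_HIGHER = {"Bs", "MS", "PhD or PD"}  # abbreviations from Fig. 14


def described_node(ethnicity, height_in, gain_lbs, prepregnancy_lbs, education):
    """Return (node id, majority class, reported proportion in %) for the three
    nodes described in the text, or None for any other profile."""
    if ethnicity in GROUP_B and gain_lbs < 28:
        return (9, "LBW", 69.0)
    if ethnicity in GROUP_A:
        if height_in < 63 and gain_lbs < 28 and prepregnancy_lbs < 131:
            return (5, "LBW", 68.8)
        if height_in > 63 and gain_lbs > 27 and education in BACHELOR_OR_HIGHER:
            return (18, "NBW", 73.6)
    return None  # branch not described in the text


print(described_node("Asian", 62, 20, 120, "HS"))
print(described_node("Hispanic", 61, 25, 125, "HS"))
print(described_node("White", 66, 35, 150, "MS"))
print(described_node("White", 66, 35, 150, "HS"))
```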
SHAP analysis
Figures 15, 16, and 17 show the top 20 factors based on
SHAP values for the logistic regression, random forest,
and XGBoost models, respectively. The SHAP summary
plots revealed consistent patterns across all models
for predicting birth weight. Notably, weight gain during
Fig. 10 PDP for maternal ethnicity (Mother - white as reference)
Fig. 11 PDP for paternal ethnicity (Father - white as reference)
pregnancy emerged as the most influential predictor,
with higher weight gain being strongly associated with
delivering an NBW newborn. Additionally, all SHAP analyses highlighted
the positive relationship between maternal height
(M_Ht_In), body mass index (BMI), pre-pregnancy
weight (PWgt_R), and the likelihood of having an NBW
newborn.
Parental factors, including ethnicity, age, and educa-
tion, played a pivotal role in birth weight predictions.
In terms of ethnicity, Black, Hispanic, and Asian fathers
were more frequently related to LBW predictions,
whereas White parents and Hispanic mothers tended to
correlate more with NBW predictions. Regarding age,
the SHAP analyses indicated that the older the parents
were, the higher the chances of having an LBW new-
born. Finally, mothers and fathers who had higher edu-
cation levels, such as master’s and bachelor’s degrees,
Fig. 12 PDP for maternal education (Mother - bachelor’s degree as reference)
Fig. 13 PDP for paternal education (Father - high school graduate or GED completed as reference)
were found to have a higher likelihood of giving birth to
NBW infants.
Previous pregnancy history, particularly the number of
living births (PRIORLIVE), was strongly associated with
NBW predictions. Likewise, regular prenatal checkups
(PREVIS_REC) were positively linked to NBW outcomes.
Conversely, negative factors such as maternal smoking
during pregnancy (CIG_0, CIG_1, CIG_2, and CIG_3)
were associated with LBW predictions. Additionally, the
sex of the newborn emerged as a significant factor, with
male newborns (SEX_M) more likely to be predicted as
NBW, while female newborns (SEX_F) were associated
with higher rates of LBW.
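For a linear model such as logistic regression, SHAP values have a simple closed form when features are treated as independent: in log-odds space, the contribution of feature i for one sample is coefficient_i × (x_i − mean(x_i)). The sketch below uses that form to rank features by mean absolute contribution. It is an illustration under stated assumptions, not the authors' pipeline: the coefficients are copied from Table 6, the rows and feature names are hypothetical, and the study may have computed SHAP values with a different explainer or output scale.

```python
def linear_shap(row, coefs, means):
    """Per-feature contributions in log-odds space for one sample,
    assuming independent features (linear SHAP closed form)."""
    return {name: coefs[name] * (row[name] - means[name]) for name in coefs}


def rank_by_mean_abs_shap(rows, coefs):
    """Rank features by the mean absolute contribution over all rows."""
    means = {name: sum(r[name] for r in rows) / len(rows) for name in coefs}
    totals = {name: 0.0 for name in coefs}
    for row in rows:
        for name, value in linear_shap(row, coefs, means).items():
            totals[name] += abs(value)
    ranking = [(name, total / len(rows)) for name, total in totals.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


# Coefficients from Table 6; feature names and rows are hypothetical.
coefs = {"weight_gain": 2.63, "prior_live": 3.62, "cig_1": -1.39}
rows = [
    {"weight_gain": 0.2, "prior_live": 0.0, "cig_1": 0.0},
    {"weight_gain": 0.4, "prior_live": 0.1, "cig_1": 0.0},
    {"weight_gain": 0.6, "prior_live": 0.0, "cig_1": 0.1},
    {"weight_gain": 0.8, "prior_live": 0.1, "cig_1": 0.0},
]

ranking = rank_by_mean_abs_shap(rows, coefs)
for name, score in ranking:
    print(f"{name}: mean |SHAP| = {score:.3f}")
```

The ranking depends on both the coefficient and the spread of the feature in the data, which is why a variable with a smaller coefficient can still rank first in a SHAP summary plot.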
Effect of maternal height, ethnicity and birth weight
To further explore the strong association between birth
weight outcomes, maternal height, and ethnicity indi-
cated by the predictive models, we conducted a descrip-
tive analysis comparing birth weights ranging from 2200
to 2550 g against newborn well-being, based on the
APGAR 5 score, and average maternal height (Fig. 18).
For birth weights near the WHO’s LBW cutoff of 2500
g, White and Black newborns exhibited higher rates of
abnormal APGAR 5 scores (APGAR 5 < 6) compared to
their Asian and Hispanic counterparts. Notably, within
this birth weight range, White and Black mothers were,
on average, taller than Asian and Hispanic mothers.
This pattern suggests that the WHO’s LBW cutoff of
2500 g may represent a greater risk for offspring of eth-
nic groups with taller average maternal heights, such as
White and Black mothers, compared to infants born to
shorter mothers, such as Asian or Hispanic mothers.
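The descriptive comparison described above amounts to filtering records to the 2200 to 2550 g range and computing, per ethnic group, the share of newborns with an APGAR 5 score below 6. A minimal sketch of that aggregation follows; the record field names and the sample records are hypothetical and do not reproduce the CDC data.

```python
from collections import defaultdict


def abnormal_apgar_rate(records, low_g=2200, high_g=2550, cutoff=6):
    """Percentage of newborns with APGAR 5 below `cutoff`, per ethnic group,
    restricted to birth weights between low_g and high_g grams (inclusive)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [abnormal, total]
    for record in records:
        if low_g <= record["birth_weight_g"] <= high_g:
            group_counts = counts[record["ethnicity"]]
            group_counts[1] += 1
            if record["apgar5"] < cutoff:
                group_counts[0] += 1
    return {
        group: 100.0 * abnormal / total
        for group, (abnormal, total) in counts.items()
    }


# Hypothetical records for illustration only.
records = [
    {"ethnicity": "White", "birth_weight_g": 2450, "apgar5": 5},
    {"ethnicity": "White", "birth_weight_g": 2500, "apgar5": 9},
    {"ethnicity": "Asian", "birth_weight_g": 2300, "apgar5": 9},
    {"ethnicity": "Asian", "birth_weight_g": 2400, "apgar5": 8},
    {"ethnicity": "Asian", "birth_weight_g": 3100, "apgar5": 4},  # outside range
]

rates = abnormal_apgar_rate(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.1f}% abnormal APGAR 5")
```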
Discussion
Our findings indicate that there are critical parental factors
that strongly influence birth weight outcomes in
the US population. Across all the analyses, nutritional
and maternal anthropometric factors, such as maternal
height, weight gain during pregnancy, and pre-pregnancy
weight, together with parental ethnicity, consistently emerged as
critical determinants of newborn weight. These findings
align with previous research, which also reports that
nutritional status and maternal anthropometrics are sig-
nificantly correlated with birth weight and length of the
newborn [7, 48, 49].
The relationship between maternal height, weight gain
during pregnancy, pre-pregnancy weight, and mater-
nal ethnicity helps explain why some women are more
likely to deliver LBW newborns. For example, women
of shorter stature and lower body mass are at greater
risk of delivering a baby weighing less than 2500 g. Simi-
larly, women with a pre-pregnancy BMI below 24.9 are
more likely to have an LBW newborn, as they are recommended
to gain between 11 and 18 kg during pregnancy to
achieve an NBW outcome [50], which can be a challenge
for some.
Our findings also emphasize the importance of
adopting healthy habits during pregnancy to improve
birth weight outcomes. It is important to ensure that
mothers have access to perinatal care and follow proper
Fig. 14 Conditional Inference Tree for detecting NBW and LBW newborns. For maternal education, the following abbreviations were used: ‘≤8th’,
for 8th grade or less; ‘9th’, for 9th through 12th grade with no diploma; ‘HS’, for High school graduate or GED completed; ‘SC’, for some college credit,
but not a degree; ‘AD’, for Associate degree (AA, AS); ‘Bs’, for Bachelor’s degree (BA, AB, BS); ‘MS’, for Master’s degree (MA, MS, MEng, MEd, MSW, MBA);
‘PhD or PD’, for Doctorate (PhD, EdD) or Professional Degree (MD, DDS, DVM, LLB, JD)
nutrition, which supports healthy weight gain, as these
factors strongly contribute to the likelihood of delivering
an NBW infant. Other habits, such as smoking, should
be avoided, as smoking is a strong determinant of LBW. Moreover,
pregnancy history also needs to be considered,
as mothers who have had several successful births are
more likely to deliver an NBW newborn. Finally, paren-
tal age also matters, as both older mothers and fathers
are at an increased risk of having an LBW infant.
One of the most intriguing relationships identified in
our study is between maternal height, pre-pregnancy
weight, weight gain during pregnancy, ethnicity, and
birth weight (Fig. 14). Given that maternal anthropometric
factors (height, weight, BMI) significantly influence
birth weight [49], and that newborns from White
parents have higher odds of having NBW (see Table 6),
the WHO’s cut-off for defining LBW (2500 g) may be
biased towards the Caucasian population. This bias
arises because, except for Black parents, White parents
have a higher average height than parents of other ethnicities in
the US [51–54]. This finding aligns with other studies
that advocate for a review of the WHO’s global cut-off
threshold for LBW [55], which was originally established
due to the higher risk of mortality for European-descendant
newborns weighing less than 2500 g [9].
Therefore, birth weights less than 2500 g for non-White
newborns do not necessarily indicate a high-risk condition
(see Fig. 18). It is also essential to consider other
factors, such as intrauterine growth restriction, maternal
health history, and preterm birth [56, 57].
Fig. 15 Top 20 variables ranked by SHAP values for logistic regression
The large difference between Black and White birth
weights seems more related to socioeconomic factors
than anthropometrics, as the average heights for both
groups are similar (163 cm for females and 178 cm for
males [51]). In the US, Black communities have his-
torically been concentrated in low-income areas due to
social, economic, and cultural reasons. One contributing
factor to this birth weight disparity is nutrition, as Black
communities tend to have poorer diets with higher con-
sumption of salt and sugar [58]. Since nutrition is crucial
during pregnancy, the lower birth weights in Black new-
borns compared to their White counterparts may result
from this nutritional dissimilarity. Moreover, other socio-
economic factors, such as education and income, play an
important role in predicting newborn weight outcomes.
Parents with a bachelor’s degree tend to have newborns with
NBW more often than those with lower education levels.
More years of education can make parents more aware
of nutrition and lifestyle choices. Moreover, pregnant
women with higher levels of education are more likely to
Fig. 16 Top 20 variables ranked by SHAP values for random forest
earn higher incomes [59], leading to less stressful preg-
nancies, better adherence to medical advice, and more
regular prenatal checkups.
The identification of weight gain, maternal height,
pre-pregnancy weight, and parental ethnicity as crucial
factors influencing birth weight outcomes aligns with
the findings of Morisaki et al. [7], who emphasized that
anthropometric factors are the major factors explaining
LBW disparities among ethnicities. However, our study
enhances this perspective by indirectly incorporating
paternal anthropometrics, noting that paternal ethnicity
is correlated with paternal height [51]. Thus, our
study provides a more comprehensive understanding of
both maternal and paternal factors in predicting LBW
outcomes, as paternal height also affects the newborn’s
anthropometrics. Furthermore, we expand upon the
work of Morisaki et al. [7] by showing that when average
heights are comparable between ethnicities, such as
White and Black parents in the US, disparities in birth
weight outcomes are predominantly attributed to other
Fig. 17 Top 20 variables ranked by SHAP values for XGBoost
factors, particularly access to adequate nutrition. This
finding highlights the critical need to consider socio-
economic factors alongside anthropometric measures to
fully comprehend LBW outcomes.
Strengths and limitations
This is the first study, as far as we know, to use predictive
models to analyze various factors and identify the ones
most strongly linked to LBW in a nationwide US dataset.
Unlike prior studies, we also considered paternal factors
in our analysis, demonstrating how parental ethnicity,
age, and education level influence birth weight outcomes.
The generalization of our findings was evaluated on an
independent test set (see Table 5), yielding an average
accuracy of approximately 64% and a macro ROC AUC
of nearly 70% for distinguishing between NBW and LBW
newborns. These evaluation metrics support the
extension of the findings presented in this work. The limited
accuracy may be attributed
to the highly imbalanced dataset, with LBW cases constituting
only about 3% of the training data. Nonetheless,
our primary objective was to identify critical factors
influencing birth weight outcomes rather than solely
maximizing accuracy. The comprehensive dataset, which
encompasses information from diverse populations
across all 50 US states, supports the findings presented
in this study.
We note that our analysis was confined to a single
dataset collected in 2022. Our rationale was to iden-
tify the most relevant predictors using the most cur-
rent data available from the CDC, thereby reflecting
the contemporary situation in the US. This design makes
our study a cross-sectional analysis, which restricts
our ability to conduct longitudinal studies that examine
evolving trends between birth weight and parental pre-
dictors. Moreover, although recent research suggests
that the COVID-19 pandemic did not significantly
impact the dynamics of prenatal care visits in the US in
Fig. 18 Birth weight compared to (a) newborn well-being, represented by the percentage of abnormal Apgar 5 scores, and (b) average maternal
height, categorized by ethnic group
2022 [60], we note that the pandemic may have affected
access to perinatal care services for certain households.
Future research could explore how the influence of the
factors identified in this study has evolved over the past
decade concerning birth weight outcomes in the US.
We also recognize that the dataset used in this study
lacks factors that may be relevant to determining birth
weight outcomes. For instance, key features such as
income [61] and paternal factors like height and weight
[62] were not included, which could have offered addi-
tional insights into the socioeconomic and anthro-
pometric influences on LBW. Future research should
address these gaps by incorporating a broader range of
datasets and variables to achieve a more comprehensive
understanding of the determinants of LBW.
Finally, we note that our analysis identified factors
influencing birth weight outcomes based on associa-
tions rather than causality. Although machine learning
models can capture complex, nonlinear relationships
among multiple predictors and the response variable,
they do not establish cause-and-effect relationships.
Therefore, our study does not imply causality. Instead,
the machine learning models identified key anthropo-
metric, ethnic, educational, and pregnancy-related fac-
tors that are commonly associated with parents of LBW
newborns.
Conclusion
This study analyzed various factors to determine which
ones impact the birth weight of newborns in the US
the most. To achieve that aim, we used machine learn-
ing and deep learning models to create predictive mod-
els based on 20 factors, including maternal, parental,
socioeconomic, ethnicity, and neonatal factors. Our
models showed that certain fixed factors, like maternal
height and parents’ ethnicity, significantly influence birth
weight. Taller and White parents are more likely to have
NBW newborns. However, because White parents tend
to be taller than parents from other ethnicities, this result
should be interpreted with caution. Indeed, as reported
by previous studies, the WHO’s cut-off for LBW may not
be appropriate for non-White ethnicities. Additionally,
our findings also indicate that pregnancy-related factors,
such as nutrition, smoking habits, and access to perinatal
care, are crucial for birth weight. Our findings emphasize
the importance of proper nutrition, avoiding smoking,
and accessing prenatal care. This is especially crucial for
vulnerable communities in the US, such as Black commu-
nities, which are statistically significantly more associated
with LBW newborns.
Acknowledgements
Not applicable.
Authors’ contributions
S.S.D. and C.E.V. designed the methodology of the study. Both implemented
the code and analyzed the results. S.S.D. drafted the manuscript, and C.E.V.
reviewed and edited. C.E.V. is the supervisor of S.S.D.
Funding
This project was unfunded.
Data availability
The study was conducted using a publicly available dataset provided by the Centers
for Disease Control and Prevention (CDC). The data can be accessed at the following
URL: https://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm.
Declarations
Ethics approval and consent to participate
All experiments were performed according to relevant guidelines and regula-
tions (such as the Declaration of Helsinki).
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Author details
1 Applied Computer Science Department, University of Winnipeg, 515 Portage
Avenue, Winnipeg R3B 2E9, MB, Canada. 2 Department of Community Health
Sciences, Cumming School of Medicine, University of Calgary, 3280 Hospital
Drive NW, Calgary T2N 4Z6, AB, Canada.
Received: 26 August 2024 Accepted: 25 November 2024
References
1. World Health Organization. UNICEF-WHO low birthweight estimates: levels and
trends 2000–2015. World Health Organization; 2019.
https://www.unicef.org/reports/UNICEF-WHO-low-birthweight-estimates-2019. Accessed 15
July 2024.
2. Mathewson KJ, Burack JA, Saigal S, Schmidt LA. Tiny Babies Grow Up: The
Long-Term Effects of Extremely Low Birth Weight. In: Wazana A, Székely
E, Oberlander TF, editors. Prenatal Stress and Child Development. Cham:
Springer International Publishing; 2021. pp. 469–490.
https://doi.org/10.1007/978-3-030-60159-1_16.
3. Paneth NS. The Problem of Low Birth Weight. Futur Child. 1995;5:19–34.
http://www.jstor.org/stable/1602505.
4. Osterman MJK, Hamilton BE, Martin JA, Driscoll AK, Valenzuela CP. Births:
Final data for 2022. Natl Vital Stat Rep. 2024;73. Retrieved from National
Center for Health Statistics.
https://www.cdc.gov/nchs/data/nvsr/nvsr73/nvsr73-02.pdf. Accessed 15 Aug 2024.
5. March of Dimes. Low Birthweight by Race: United States, 2020-2022 Average.
2024. https://www.marchofdimes.org/peristats/data?reg=99&top=4&stop=45&lev=1&slev=1&obj=1.
Accessed 15 Aug 2024.
6. Wartko PD, Wong EY, Enquobahrie DA. Maternal birthplace is associated
with low birth weight within racial/ethnic groups. Matern Child Health J.
2017;21:1358–66.
7. Morisaki N, Kawachi I, Oken E, Fujiwara T. Social and anthropometric
factors explaining racial/ethnical differences in birth weight in the United
States. Sci Rep. 2017;7(1):46657.
8. Arabzadeh H, Doosti-Irani A, Kamkari S, Farhadian M, Elyasi E, Moham-
madi Y. The maternal factors associated with infant low birth weight: an
umbrella review. BMC Pregnancy Childbirth. 2024;24(1):316.
9. McCormick MC. The contribution of low birth weight to infant mortality
and childhood morbidity. N Engl J Med. 1985;312(2):82–90.
10. Sharma MK, Kumar D, Huria A, Gupta P. Maternal risk factors of low birth
weight in Chandigarh India. Internet J Health. 2009;9:10–2.
11. Shi L, Macinko J, Starfield B, et al. Primary care, infant mortality, and
low birth weight in the states of the USA. J Epidemiol Commun Health.
2004;58:374–80.
12. Broere-Brown ZA, Baan E, Schalekamp-Timmermans S, Verburg BO, Jad-
doe VW, Steegers EA. Sex-specific differences in fetal and infant growth
patterns: a prospective population-based cohort study. Biol Sex Differ.
2016;7:1–9.
13. Bowleg L. When Black + lesbian + woman ≠ Black lesbian woman: The
methodological challenges of qualitative and quantitative intersectionality
research. Sex Roles. 2008;59:312–25.
14. Bowleg L. The problem with the phrase women and minorities: inter-
sectionality—an important theoretical framework for public health.
Am J Public Health. 2012;102(7):1267–73.
15. Bauer GR. Incorporating intersectionality theory into population health
research methodology: challenges and the potential to advance health
equity. Soc Sci Med. 2014;110:10–7.
16. Evans CR, Williams DR, Onnela JP, Subramanian S. A multilevel
approach to modeling health inequalities at the intersection of multi-
ple social identities. Soc Sci Med. 2018;203:64–73.
17. Evans CR. Adding interactions to models of intersectional health
inequalities: comparing multilevel and conventional methods. Soc Sci
Med. 2019;221:95–105.
18. Strobl C, Malley J, Tutz G. An introduction to recursive partition-
ing: rationale, application, and characteristics of classification and
regression trees, bagging, and random forests. Psychol Methods.
2009;14(4):323.
19. Stevens LM, Mortazavi BJ, Deo RC, Curtis L, Kao DP. Recommenda-
tions for reporting machine learning analyses in clinical research. Circ
Cardiovasc Qual Outcomes. 2020;13(10):e006556.
20. National Center for Health Statistics. User Guide to the 2022 Natality
Public Use File. 2022. National Center for Health Statistics website.
https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/natality/UserGuide2022.pdf.
Accessed 15 Aug 2024.
21. National Center for Health Statistics. Vital statistics online data portal:
Birth data files. https://www.cdc.gov/nchs/data_access/Vitalstatsonline.htm.
Accessed 21 Aug 2024.
22. Blumenshine PM, Egerter SA, Libet ML, Braveman PA. Father’s
education: an independent marker of risk for preterm birth. Matern Child
Health J. 2011;15:60–7.
23. Mao Y, Zhang C, Wang Y, Meng Y, Chen L, Dennis CL, et al. Association
between paternal age and birth weight in preterm and full-term birth:
a retrospective study. Front Endocrinol. 2021;12:706369.
24. Casadei K, Kiel J. Anthropometric Measurement. 2024. Updated 2022
Sep 26. StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing.
https://www.ncbi.nlm.nih.gov/books/NBK537315/. Accessed 15 Aug
2024.
25. Wallace J, Horgan G, Bhattacharya S. Placental weight and efficiency
in relation to maternal body mass index and the risk of pregnancy
complications in women delivering singleton babies. Placenta.
2012;33(8):611–8.
26. Pölzlberger E, Hartmann B, Hafner E, Stümpflein I, Kirchengast S. Maternal
height and pre-pregnancy weight status are associated with fetal growth
patterns and newborn size. J Biosoc Sci. 2017;49(3):392–407.
27. Koo YJ, Ryu HM, Yang JH, Lim JH, Lee JE, Kim MY, et al. Pregnancy
outcomes according to increasing maternal age. Taiwan J Obstet Gynecol.
2012;51(1):60–5.
28. Reichman NE, Teitler JO. Paternal age as a risk factor for low birthweight.
Am J Public Health. 2006;96(5):862–6.
29. Wang X, Zuckerman B, Pearson C, Kaufman G, Chen C, Wang G, et al.
Maternal cigarette smoking, metabolic gene polymorphism, and infant
birth weight. JAMA. 2002;287(2):195–202.
30. Garces A, Perez W, Harrison MS, Hwang KS, Nolen TL, Goldenberg RL, et al.
Association of parity with birthweight and neonatal death in five sites:
The Global Network’s Maternal Newborn Health Registry study. Reprod
Health. 2020;17:1–7.
31. Momeni M, Danaei M, Kermani AJ, Bakhshandeh M, Foroodnia S,
Mahmoudabadi Z, Amirzadeh R, Safizadeh H. Prevalence and Risk Factors
of Low Birth Weight in the Southeast of Iran. Int J Prev Med. 2017;8(1):12.
https://doi.org/10.4103/ijpvm.IJPVM_112_16.
32. Gebrehawerya T, Gebreslasie K, Admasu E, Gebremedhin M.
Determinants of low birth weight among mothers who gave birth in
Debremarkos referral hospital, Debremarkos town, east Gojam, Amhara
region, Ethiopia. Neonat Pediatr Med. 2018;4(1):145.
33. Manniello RL, Farrell PM. Analysis of United States neonatal mortality
statistics from 1968 to 1974, with specific reference to changing trends in
major causalities. Am J Obstet Gynecol. 1977;129(6):667–74.
34. Quenby S, Gallos ID, Dhillon-Smith RK, Podesek M, Stephenson MD,
Fisher J, et al. Miscarriage matters: the epidemiological, physical,
psychological, and economic costs of early pregnancy loss. Lancet.
2021;397(10285):1658–67.
35. Gerber-Epstein P, Leichtentritt RD, Benyamini Y. The experience of
miscarriage in first pregnancy: the women’s voices. Death Stud. 2008;33(1):1–29.
36. Alexander GR, Kotelchuck M. Quantifying the adequacy of prenatal care:
a comparison of indices. Public Health Rep. 1996;111(5):408.
37. Conley D, Bennett NG. Race and the inheritance of low birth weight. Soc
Biol. 2000;47(1–2):77–93.
38. Zephyrin L, Seervai S, Lewis C, Katon JG. Community-Based Models
to Improve Maternal Health Outcomes and Promote Health Equity. The
Commonwealth Fund; 2021.
https://www.commonwealthfund.org/publications/issue-briefs/2021/mar/community-models-improve-maternaloutcomes-equity.
Accessed 15 Aug 2024.
39. Lebron CN, Mitsdarffer M, Parra A, Chavez JV, Behar-Zusman V. Latinas
and Maternal and Child Health: Research, Policy, and Representation.
Matern Child Health J. 2023. https://doi.org/10.1007/s10995-023-03662-z.
40. Spinillo A, Capuzzo E, Piazzi G, Baltaro F, Stronati M, Ometto A.
Significance of low birthweight for gestational age among very preterm infants.
BJOG: Int J Obstet Gynaecol. 1997;104(6):668–73.
41. Armstrong B, Nolin A, McDonald A. Work in pregnancy and birth weight
for gestational age. Occup Environ Med. 1989;46(3):196–9.
42. Velaphi S, Mokhachane M, Mphahlele R, Beckh-Arnold E, Kuwanda
M, Cooper P. Survival of very-low-birth-weight infants according to
birth weight and gestational age in a public hospital. S Afr Med J.
2005;95(7):504–9.
43. Tsai LY, Chen YL, Tsou KI, Mu SC, Group TPIDCS, et al. The impact of small-
for-gestational-age on neonatal outcome among very-low-birth-weight
infants. Pediatr Neonatol. 2015;56(2):101–7.
44. Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: a
conditional inference framework. J Comput Graph Stat. 2006;15(3):651–74.
45. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gómez AN, Kaiser Ł,
Polosukhin I. Attention is all you need. Advances in Neural Information
Processing Systems: Proceedings of the 31st International Conference on
Neural Information Processing Systems (NIPS 2017). Long Beach, CA,
USA; 2017.
46. Molnar C. 8.1 Partial Dependence Plot (PDP) | Interpretable Machine
Learning. 2024. https://christophm.github.io/interpretable-ml-book/pdp.html.
Accessed 15 Aug 2024.
47. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett.
2006;27(8):861–74.
48. Patra S, Sarangi G. Association between maternal anthropometry and
birth outcome. J Pediatr Assoc India. 2017;6(2):85–94.
49. Devaki G, Shobha R. Maternal anthropometry and low birth weight: a
review. Biomed Pharmacol J. 2018;11(2):815–20.
50. Cunningham FG, Leveno KJ, Bloom SL, Spong CY, Dashe JS, Hoffman BL,
et al. Williams Obstetrics. 24th ed. New York: McGraw-Hill; 2014.
51. Komlos J, Baur M. From the tallest to (one of) the fattest: the enigmatic
fate of the American population in the 20th century. Econ Hum Biol.
2004;2(1):57–74.
52. Denney JT, Krueger PM, Rogers RG, Boardman JD. Race/ethnic and sex
differentials in body mass among US adults. Ethn Dis. 2004;14(3):389–98.
53. Silva AM, Shen W, Heo M, Gallagher D, Wang Z, Sardinha LB, et al.
Ethnicity-related skeletal muscle differences across the lifespan. Am J Hum Biol:
Off J Hum Biol Assoc. 2010;22(1):76–82.
54. Yin L, Annett-Hitchcock K. Comparison of body measurements
between Chinese and U.S. females. The Journal of The Textile Institute.
2019;110(12):1716–24. https://doi.org/10.1080/00405000.2019.1617531.
55. Lucas M. Low birth weight–the less than 2500g cut-off: is it applicable to
Sri Lanka? Sri Lanka J Perinat Med. 2023;4(2):6–17.
https://doi.org/10.4038/sljpm.v4i2.70.
56. Valderrama CE, Ketabi N, Marzbanrad F, Rohloff P, Clifford GD. A review
of fetal cardiac monitoring, with a focus on low-and middle-income
countries. Physiol Meas. 2020;41(11):11TR01.
57. Valderrama CE, Marzbanrad F, Hall-Clifford R, Rohloff P, Clifford GD. A
proxy for detecting IUGR based on gestational age estimation in a
Guatemalan rural population. Front Artif Intell. 2020;3:56.
58. Stephenson BJK, Willett WC. Racial and ethnic heterogeneity in diets of
low-income adult females in the United States: results from National
Health and Nutrition Examination Surveys from 2011 to 2018. Am J Clin
Nutr. 2023;117(3):625–34.
59. Barrett H, Browne A. Health, hygiene and maternal education: Evidence
from The Gambia. Soc Sci Med. 1996;43(11):1579–90.
60. Osterman MJ, Hamilton BE, Martin JA, Driscoll AK, Valenzuela CP. Births:
Final Data for 2022. Natl Vital Stat Rep: Cent Dis Control Prev Natl Cent
Health Stat Natl Vital Stat Syst. 2024;73(2):1–56.
61. Aregay M, Lawson AB, Faes C, Kirby RS, Carroll R, Watjou K. Impact of
Income on Small Area Low Birth Weight Incidence Using Multiscale
Models. AIMS Public Health. 2015;2:667–680. Epub 2015 Oct 10. PMID:
27398390; PMCID: PMC4936536.
https://doi.org/10.3934/publichealth.2015.4.667.
62. Griffiths LJ, Dezateux C, Cole TJ, et al. Differential parental weight and
height contributions to offspring birthweight and weight gain in infancy.
Int J Epidemiol. 2007;36(1):104–7. https://doi.org/10.1093/ije/dyl210.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.