Chapter

Early Risk Pregnancy Prediction Based on Machine Learning Built on Intelligent Application Using Primary Health Care Cohort Data

Abstract

Early detection has already reduced pregnancy risk, complications, emergency situations, and maternal mortality. Our study's goal was to build an intelligent application for early risk pregnancy prediction based on machine learning. We examined data from 997 patients with 114 attributes, taken from the electronic medical records of the primary health care cohort in the ENA System of the Sawah Besar Primary Health Care. Eight attributes were then selected as classifier attributes based on the Indonesian Ministry of Health's Maternal and Child Health Handbook, under the supervision of a medical doctor. Machine learning and the Knowledge Discovery from Data (KDD) technique were applied to build the intelligent prediction. We compared the decision tree C4.5, random forest, and naive Bayes algorithms to determine which best suited our application. The accuracies of decision tree C4.5, random forest, and naive Bayes were 98.01%, 98.51%, and 68.81%, respectively. On most accuracy measures, random forest outperformed decision tree C4.5 and naive Bayes, so we used random forest to build the web-based application. All three algorithms achieved AUCs between 0.95 and 0.99, indicating excellent discrimination. Our study's contribution is to pave the way for machine learning in intelligent applications for early risk pregnancy prediction. In conclusion, we developed an intelligent application for pregnancy risk prediction based on machine learning and showed its potential to support self-checking and early detection of pregnancy risk.
Keywords: Decision tree C4.5; Random forest; Naïve Bayes; Knowledge discovery from data; Risk of pregnancy
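C4.5, one of the three algorithms compared above, chooses each split by gain ratio: information gain divided by split information. The sketch below is a minimal plain-Python illustration that assumes purely categorical attributes; the attribute names (`hypertension`, `parity_group`) and the toy records are hypothetical and are not the study's eight attributes. Full C4.5 also handles continuous thresholds, missing values, and pruning, all omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """C4.5-style gain ratio for splitting `rows` on a categorical attribute."""
    n = len(rows)
    # Group the class labels by the value the attribute takes.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    # Information gain: parent entropy minus size-weighted child entropy.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes that fragment the data.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (not study data).
records = [
    {"hypertension": "yes", "parity_group": "a", "risk": "high"},
    {"hypertension": "yes", "parity_group": "b", "risk": "high"},
    {"hypertension": "no", "parity_group": "a", "risk": "low"},
    {"hypertension": "no", "parity_group": "b", "risk": "low"},
]
print(gain_ratio(records, "hypertension", "risk"))  # 1.0
print(gain_ratio(records, "parity_group", "risk"))  # 0.0
```

The attribute that separates the classes perfectly gets the maximum ratio, while the uninformative one scores zero, so C4.5 would split on `hypertension` first.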
Article
Background: Maternal morbidity and mortality remain critical global health concerns. Reducing the maternal mortality ratio (MMR) is therefore part of Goal 3 of the Sustainable Development Goals (SDGs), and it was previously an important indicator in the Millennium Development Goals (MDGs). Identifying high-risk groups during pregnancy is crucial for decision-makers and medical practitioners seeking to reduce mortality and morbidity. However, accurate predictive models for maternal mortality and maternal health risks remain difficult to obtain. Machine learning algorithms have emerged as a promising alternative to traditional predictive models. Methods: This work explores, for the first time, the potential of machine learning (ML) algorithms for maternal risk level prediction using a nationwide maternal mortality dataset from Oman. A total of 402 maternal deaths in Oman from 1991 to 2023 were included in this study. We combined principal component analysis (PCA) with the ML algorithms and compared the results with model performance without PCA. We compared ten ML algorithms: decision tree (DT), random forest (RF), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Extreme Gradient Boosting (XGBoost), Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Logistic Regression (LR), Support Vector Machine (SVM), and Artificial Neural Network (ANN). Model performance was assessed with accuracy, sensitivity, precision, and F1-score. Results: After PCA was applied, the RF model outperformed the other methods in predicting the risk level (low or high), with an accuracy of 75.2%, precision of 85.7%, and F1-score of 73%. Conclusions: We applied several machine learning models to predict maternal risk levels for the first time using real data from Oman. RF outperformed the other algorithms in this classification problem. A reliable estimate of maternal risk level would help medical practitioners plan interventions to reduce maternal deaths.
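The Oman study reports accuracy, sensitivity, precision, and F1-score. As a reminder of how these relate for a binary low/high risk label, the sketch below computes them from confusion-matrix counts, treating "high risk" as the positive class (an assumption). The counts in the usage line are hypothetical, and the PCA step is not shown.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), precision and F1 for the positive class.

    Assumes "high risk" is the positive class; counts come from a confusion matrix.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


# Hypothetical counts, not results from the study.
print(binary_metrics(tp=8, fp=2, fn=2, tn=8))
```

Because F1 is a harmonic mean, a model whose precision is much higher than its sensitivity gets an F1 below its precision, which is why F1 is usually reported alongside precision.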
Chapter
Postpartum depression (PPD) is a severe mental health issue affecting women after childbirth. Although its negative impact is extensive in developing countries, proper tools and techniques to predict the disorder are largely lacking because of neglect. This work proposes a machine learning-based system for finding the risk factors and prevalence of postpartum depression in Bangladesh. We developed a survey of socio-demographic questions and questions adapted from two standard postpartum depression screening scales (EPDS, PHQ-2). Data from 150 women were collected, processed, and used in different machine learning models to find the best-performing ones. On the collected data from perinatal women in Bangladesh, the best-performing model was Random Forest, with an AUC of 98%, accuracy of 89%, and sensitivity of 89%. Across models, performance ranged from 88% to 98% (AUC), 82% to 89% (accuracy), and 81% to 89% (sensitivity). We also identified the top risk factors for PPD. According to this work, the prevalence of PPD in Bangladesh is 66.7% (counting women with a medium or high chance of PPD). This proposed work is the first to detect the risk factors and prevalence of PPD in Bangladesh using a machine learning approach.
Keywords: Depression; Postpartum depression; Machine learning; Detection model; Mental health
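The 66.7% prevalence counts women in both the medium and the high PPD-risk categories. With 150 respondents, that share corresponds to about 100 women (100/150 ≈ 0.667). A minimal sketch of the calculation, assuming each respondent carries one of the hypothetical labels `low`, `medium`, or `high`:

```python
def prevalence(risk_levels, positive=("medium", "high")):
    """Share of respondents whose PPD risk level counts as positive.

    The label set (low/medium/high) is an assumption for illustration.
    """
    if not risk_levels:
        raise ValueError("no respondents")
    hits = sum(1 for level in risk_levels if level in positive)
    return hits / len(risk_levels)


# Hypothetical split of 150 respondents that yields a 66.7% prevalence.
levels = ["low"] * 50 + ["medium"] * 60 + ["high"] * 40
print(round(prevalence(levels) * 100, 1))  # 66.7
```

Changing `positive` to `("high",)` would give the stricter high-risk-only prevalence instead.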
Article
Disease risk prediction is a growing challenge in the medical domain, and researchers have widely used machine learning algorithms to address it. The k-nearest neighbour (KNN) algorithm is the most frequently used among the wide range of machine learning algorithms. This paper presents a study of different KNN variants (classic, adaptive, locally adaptive, k-means clustering, fuzzy, mutual, ensemble, Hassanat, and generalised mean distance) and compares their performance for disease prediction. The variants were analysed in depth through implementation and experimentation on eight machine learning benchmark datasets from Kaggle, the UCI Machine Learning Repository, and OpenML, covering different disease contexts. Accuracy, precision, and recall were used for the comparative analysis. The average accuracy of the variants ranged from 64.22% to 83.62%. Hassanat KNN showed the highest average accuracy (83.62%), followed by the ensemble approach KNN (82.34%). A relative performance index is also proposed for each performance measure to assess each variant and compare the results. Based on the accuracy version of this index, Hassanat KNN was the best-performing variant, followed by the ensemble approach KNN. The study also provides a relative comparison of the KNN variants based on precision and recall. Finally, the paper summarises which KNN variant is the most promising candidate when all three performance measures (accuracy, precision, and recall) are considered for disease prediction. Healthcare researchers and stakeholders could use these findings to select an appropriate KNN variant for predictive disease risk analytics.
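Hassanat KNN, the best-performing variant here, replaces the usual Euclidean distance with the Hassanat distance, in which every dimension contributes a bounded value in [0, 1), so no single feature with a large range can dominate the sum. The sketch below is a minimal plain-Python illustration of classic majority-vote KNN with a pluggable distance; the toy feature vectors and labels are hypothetical, and this is not the paper's implementation.

```python
from collections import Counter


def hassanat_distance(a, b):
    """Hassanat distance: every dimension contributes a value in [0, 1)."""
    total = 0.0
    for x, y in zip(a, b):
        lo, hi = min(x, y), max(x, y)
        if lo >= 0:
            total += 1 - (1 + lo) / (1 + hi)
        else:
            # Shift both values by |lo| so the ratio stays well defined.
            total += 1 - (1 + lo + abs(lo)) / (1 + hi + abs(lo))
    return total


def euclidean_distance(a, b):
    """Ordinary Euclidean distance, used by classic KNN."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_x, train_y, query, k=3, distance=euclidean_distance):
    """Classic KNN: majority vote among the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature toy data (not from the benchmark datasets).
train_x = [[1, 1], [1, 2], [8, 8], [9, 8]]
train_y = ["healthy", "healthy", "disease", "disease"]
print(knn_predict(train_x, train_y, [2, 1], k=3))
print(knn_predict(train_x, train_y, [2, 1], k=3, distance=hassanat_distance))
```

Swapping the `distance` argument is all that separates the classic and Hassanat variants in this sketch; the other variants listed above change the voting or neighbour selection as well.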
Article
(1) Background: Macrosomia is prevalent in China and worldwide. The current method of predicting macrosomia is ultrasonography. We aimed to develop new predictive models for recognizing macrosomia using a random forest model to improve the sensitivity and specificity of macrosomia prediction; (2) Methods: Based on the Shandong Multi-Center Healthcare Big Data Platform, we collected prenatal examination and delivery data from June 2017 to May 2018 in Jinan, covering macrosomia and normal-weight newborns. We constructed a random forest model and a logistic regression model for predicting macrosomia and compared the validity and predictive value of these two methods with those of the traditional method; (3) Results: A total of 405 macrosomia cases and 3855 normal-weight newborns met the selection criteria, and 405 pairs of macrosomia and control cases were included in the random forest and logistic regression models. Ranked by mean decrease in Gini impurity, the influencing factors were: interspinal diameter, transverse outlet, intercristal diameter, sacral external diameter, pre-pregnancy body mass index, age, number of pregnancies, and parity. The sensitivity, specificity, and area under the curve (AUC) were 91.7%, 91.7%, and 95.3% for the random forest model, and 56.2%, 82.6%, and 72.0% for the logistic regression model, respectively; the sensitivity and specificity of ultrasound were 29.6% and 97.5%; (4) Conclusions: A random forest model based on maternal information can predict macrosomia accurately during pregnancy, providing a scientific basis for developing rapid screening and diagnostic tools for macrosomia.
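The influencing factors above are ranked by mean decrease in Gini impurity. The sketch below is a plain-Python illustration, with hypothetical labels, of the Gini impurity decrease for a single binary split. A random forest's importance for a feature accumulates such decreases, weighted by the share of samples reaching each node, over every split on that feature and averages them across trees; that aggregation is not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


# Hypothetical labels: a split that perfectly separates the two classes.
parent = ["macrosomia"] * 4 + ["normal"] * 4
left, right = ["macrosomia"] * 4, ["normal"] * 4
print(gini_decrease(parent, left, right))  # 0.5
```

A split that leaves both children as mixed as the parent would score 0, so features used in highly purifying splits, repeatedly across trees, rise to the top of the ranking.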
Article
Introduction: Machine learning has been increasingly used in recent years to develop models that represent and solve problems in a variety of domains, including obstetrics and midwifery. The aim of this systematic review was to analyze research studies on machine learning and intelligent systems applications in midwifery and obstetrics. Methods: A thorough literature review was performed in four electronic databases (PubMed, APA PsycINFO, SCOPUS, ScienceDirect). Only articles that discussed machine learning and intelligent systems applications in midwifery and obstetrics were considered in this review. Selected articles were critically evaluated for relevance, and a contextual synthesis was conducted. Results: Thirty-two articles met the inclusion and methodological criteria specified in this study and were included in this systematic review. The results suggest that machine learning and intelligent systems have produced successful models and systems across a broad range of midwifery and obstetrics topics, such as diagnosis, pregnancy risk assessment, fetal monitoring, and bladder tumor. Conclusions: This systematic review suggests that machine learning is a very promising area of artificial intelligence for developing practical and highly effective applications that can support human experts, as well as for investigating a wide range of exciting opportunities for further research.
Article
Background: With the growing rate of cesarean sections, the resulting rise in morbidity and mortality is an important health issue. Predictive models can identify individuals with a higher probability of cesarean section and help them make better decisions. This study aimed to investigate the biopsychosocial factors associated with the method of childbirth and to design a predictive model using the decision tree C4.5 algorithm. Methods: In this cohort study, the sample included 170 pregnant women in the third trimester of pregnancy attending Shahroud Health Care Centers (Semnan, Iran) from 2018 to 2019. Blood samples were taken from mothers to measure the estrogen hormone at baseline. Birth information was recorded at follow-up, 30-42 days postpartum. Chi-square, independent samples t-tests, and Mann-Whitney tests were used for comparisons between the two groups. Modeling was performed in MATLAB with the C4.5 decision tree algorithm, using the input variables and the target variable (childbirth method). The data were divided into training and testing datasets using a 70-30% split. Sensitivity, specificity, and accuracy of the decision tree model were evaluated in both the training and testing stages. Results: The previous method of childbirth, maternal body mass index at childbirth, maternal age, and estrogen were the most significant factors predicting the childbirth method. The decision tree model's sensitivity, specificity, and accuracy were 85.48%, 94.34%, and 89.57% in the training stage, and 82.35%, 83.87%, and 83.33% in the testing stage, respectively. Conclusion: The decision tree model successfully predicted the method of childbirth with high accuracy. By recognizing the contributing factors, policymakers can take preventive action. It should be noted that this article was published in preprint form on the website of Research Square (https://www.researchsquare.com/article/rs-34770/v1).
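The study divided the 170 women into training and testing sets with a 70-30% split and built the tree in MATLAB. The Python sketch below shows a simple seeded random holdout split at that ratio; it is not the authors' code, the seed is arbitrary, and it does not stratify by childbirth method.

```python
import random


def holdout_split(records, train_fraction=0.7, seed=0):
    """Shuffle records reproducibly, then cut them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # round() avoids truncation error: int(170 * 0.7) would give 118.
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record IDs standing in for the 170 participants.
train, test = holdout_split(range(170))
print(len(train), len(test))  # 119 51
```

With only about 51 test cases, a single split like this gives a noisy estimate, which is one reason training and testing metrics are both reported above.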
Article
Background: Postpartum depression (PPD) is a widespread disorder that adversely affects the well-being of mothers and their newborns. We aim to use machine learning to predict the risk of PPD from primary care electronic health records (EHR) data, and to evaluate the potential value of EHR-based prediction for improving the accuracy of PPD screening and for early identification of women at risk. Methods: We analyzed EHR data of 266,544 women from the UK who had their first live birth between 2000 and 2017. We extracted a multitude of socio-demographic and medical variables and constructed a machine learning model that predicts the risk of PPD during the year following childbirth. We evaluated the model's performance using multiple validation methodologies and measured its accuracy both as a stand-alone tool and as an adjunct to standard questionnaire-based screening with the Edinburgh Postnatal Depression Scale (EPDS). Results: The prevalence of PPD in the analyzed cohort was 13.4%. Combining EHR-based prediction with the EPDS score increased the area under the receiver operating characteristic curve (AUC) from 0.805 to 0.844 and the sensitivity from 0.72 to 0.76, at a specificity of 0.80. The AUC of the EHR-based prediction model alone ranged from 0.72 to 0.74 and decreased by only 0.01-0.02 when applied as early as before the beginning of pregnancy. Conclusions: PPD risk prediction using EHR data may provide a complementary, quantitative, and objective tool for PPD screening, allowing earlier (pre-pregnancy) and more accurate identification of women at risk, timely interventions, and potentially improved outcomes for mother and child.
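The AUC values reported here (0.805 for EPDS alone, 0.844 combined with EHR-based prediction) can be read as the probability that a randomly chosen woman with PPD receives a higher risk score than a randomly chosen woman without PPD, with ties counting as half. A minimal pairwise sketch of that definition, using hypothetical scores and labels (1 = PPD, 0 = no PPD):

```python
def roc_auc(scores, labels):
    """AUC as the chance a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores: one positive is outranked by one negative.
print(roc_auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic and only meant to make the definition concrete; for a cohort of 266,544 women a rank-based computation would be used instead.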
Article
Background: Vaccine safety surveillance is important because it is related to vaccine hesitancy, which affects vaccination rates. To increase confidence in vaccination, active monitoring of vaccine adverse events is important. For effective active surveillance, we developed and verified a machine learning-based active surveillance system using national claim data. Methods: We used two databases: one from the Korea Disease Control and Prevention Agency, containing flu vaccination records for the elderly, and another from the National Health Insurance Service, containing the claim data of vaccinated people. We developed a machine learning model based on a case-crossover design, using a random forest to predict the health outcomes of interest (anaphylaxis and agranulocytosis). Feature importance values were evaluated to identify candidate associations with each outcome. We investigated the relationship of the features to each event through a literature review, comparison with the Side Effect Resource, and the Local Interpretable Model-agnostic Explanations (LIME) method. Results: The trained model predicted each health outcome of interest with an accuracy of approximately 70%. We found literature supporting our results, and most of the important drug-related features were listed in the Side Effect Resource database as inducing the health outcome of interest. For anaphylaxis, flu vaccination ranked high in the feature importance analysis and had a positive association in the LIME analysis. Although the feature importance of vaccination was lower for agranulocytosis, it also had a positive relationship in the LIME analysis. Conclusion: We developed a machine learning-based active surveillance system, using health claim and vaccination databases, for detecting possible factors that can induce adverse events. The results demonstrate a potentially useful application of two linked national health record databases, and our model can contribute to establishing a system for active surveillance of vaccination.
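In a case-crossover design, each case serves as its own control: exposure in a hazard window just before the event is compared with exposure in an earlier control window of the same person. The abstract does not give window lengths or describe how the design was combined with the random forest, so the sketch below only shows the classical 1:1 matched estimate (the ratio of discordant pairs) on hypothetical exposure flags; it is not the authors' model.

```python
def case_crossover_or(pairs):
    """Matched-pair odds ratio for a 1:1 case-crossover design.

    `pairs` holds one (exposed_in_hazard_window, exposed_in_control_window)
    tuple per case. Concordant pairs carry no information; the estimate is
    the ratio of the two kinds of discordant pairs.
    """
    hazard_only = sum(1 for hazard, control in pairs if hazard and not control)
    control_only = sum(1 for hazard, control in pairs if control and not hazard)
    if control_only == 0:
        return float("inf") if hazard_only else float("nan")
    return hazard_only / control_only


# Hypothetical exposure flags for 16 cases (vaccinated in each window or not).
pairs = ([(True, False)] * 6 + [(False, True)] * 2
         + [(True, True)] * 3 + [(False, False)] * 5)
print(case_crossover_or(pairs))  # 3.0
```

Because each person is compared with themselves, time-invariant confounders such as sex or chronic conditions cancel out, which is the main appeal of the design for claim-based surveillance.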
Article
Background: This study aimed to analyze the factors associated with women's vasomotor symptoms (VMS) using machine learning. Methods: Data on 3,298 women aged 40-80 years who attended a general health check-up from January 2010 to December 2012 were obtained from Korea University Anam Hospital in Seoul, Korea. Five machine learning methods were applied and compared for the prediction of VMS, measured by the Menopause Rating Scale. Variable importance, the effect of a variable on model performance, was used to identify the major factors associated with VMS. Results: In terms of mean squared error, the random forest (0.9326) was much better than linear regression (12.4856) and artificial neural networks with one, two, and three hidden layers (1.5576, 1.5184, and 1.5833, respectively). Based on variable importance from the random forest, the most important factors associated with VMS were age, menopause age, thyroid-stimulating hormone, and monocyte, triglyceride, gamma glutamyl transferase, blood urea nitrogen, cancer antigen 19-9, C-reactive protein, and low-density lipoprotein cholesterol levels. The following variables were also ranked within the top 20 for variable importance: cancer antigen 125, total cholesterol, insulin, free thyroxine, forced vital capacity, alanine aminotransferase, forced expiratory volume in 1 second, height, homeostatic model assessment for insulin resistance, and carcinoembryonic antigen. Conclusion: Machine learning provides an invaluable decision support system for the prediction of VMS. Managing VMS requires comprehensive consideration of thyroid function, lipid profile, liver function, inflammation markers, insulin resistance, monocyte count, cancer antigens, and lung function.
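The abstract defines variable importance as the effect of a variable on model performance. One common way to measure this, assumed here rather than taken from the study, is permutation importance: shuffle one feature column and record how much the mean squared error rises. A minimal sketch with a hypothetical model and toy data:

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, feature, seed=0):
    """Rise in MSE after shuffling one feature column (bigger = more important).

    `predict` maps one row (a list of numbers) to a numeric prediction.
    """
    baseline = mse(y, [predict(row) for row in rows])
    column = [row[feature] for row in rows]
    random.Random(seed).shuffle(column)
    permuted = [row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(rows, column)]
    return mse(y, [predict(row) for row in permuted]) - baseline


def model(row):
    """Hypothetical fitted model that only uses feature 0."""
    return 2 * row[0]


# Toy data the hypothetical model fits exactly.
rows = [[1, 5], [2, 3], [3, 9], [4, 1]]
y = [2, 4, 6, 8]
print(permutation_importance(model, rows, y, feature=1))  # 0.0
```

A feature the model ignores scores exactly zero, while shuffling a feature it relies on inflates the error; in practice the shuffle is repeated several times and the increases are averaged.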
Chapter
In this paper we study heart disease from a data analytics point of view. Heart disease prediction is a relatively recent field, made possible as data becomes available, and other researchers have approached it with different techniques and methods. We used data analytics to detect and predict patients with the disease. In a pre-processing phase, we selected the most relevant features using the correlation matrix; we then applied three data analytics techniques (neural networks, SVM, and KNN) to data sets of different sizes to study the accuracy and stability of each. We found that neural networks were easier to configure and obtained much better results (accuracy of 93%).
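The pre-processing phase selected features using the correlation matrix. A minimal sketch of one way to do this, assuming features are kept when their absolute Pearson correlation with the target reaches a threshold; the 0.3 cut-off and the toy columns are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0


def select_by_correlation(columns, target, threshold=0.3):
    """Keep features whose |correlation| with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical feature columns and a binary disease label.
columns = {"age": [40, 50, 60, 70], "noise": [1, 0, 0, 1]}
target = [0, 0, 1, 1]
print(select_by_correlation(columns, target))  # ['age']
```

Filtering on correlation with the target is cheap but only captures linear, one-feature-at-a-time relationships; checking feature-to-feature correlations in the same matrix can also flag redundant inputs.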
Article
Herein, we show differences in the blood serum of asymptomatic and symptomatic pregnant women infected with COVID-19 and correlate them with laboratory indexes using ATR FTIR and multivariate machine learning methods. We collected sera from pregnant women diagnosed with COVID-19 in the second trimester (n = 12), in the third trimester (n = 7), and in the second trimester with severe symptoms (n = 7), and compared them with sera from healthy pregnant women (n = 11), for a total of 37 participants. To assess the classification accuracy of the FTIR spectral regions where peak shifts occurred, we used the Random Forest algorithm, the traditional C5.0 single decision tree algorithm, and a deep neural network approach. We verified the correspondence between the FTIR results and laboratory indexes of the pregnant women, such as peripheral blood cell counts, biochemical parameters, and coagulation indicators. CH2 scissoring, amide II, and amide I vibrations could be used to differentiate the groups. The accuracy achieved by the machine learning methods was higher than 90%. We also developed a method based on the dynamics of the absorbance spectra that determines the differences between the spectra of healthy and COVID-19 patients. Laboratory indexes of biochemical parameters associated with COVID-19 validate changes in the total amount of proteins, albumin, and lipase.
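The classifiers were evaluated on the FTIR spectral regions where peak shifts occurred. One simple way to turn such a region into a single numeric feature, assumed here for illustration and not described in the abstract, is the integrated absorbance over a wavenumber window. The sketch uses the trapezoidal rule; the 1600-1700 cm^-1 window is only an approximate amide I range, and the flat spectrum is hypothetical.

```python
def band_area(wavenumbers, absorbance, low, high):
    """Trapezoidal area under an absorbance spectrum between two wavenumbers.

    Points outside [low, high] are ignored; the rest are sorted by wavenumber,
    so spectra recorded in descending order also work.
    """
    points = sorted((w, a) for w, a in zip(wavenumbers, absorbance) if low <= w <= high)
    return sum((w2 - w1) * (a1 + a2) / 2
               for (w1, a1), (w2, a2) in zip(points, points[1:]))


# Hypothetical flat spectrum sampled every 10 cm^-1 from 4000 down to 400 cm^-1.
wavenumbers = list(range(4000, 399, -10))
absorbance = [1.0] * len(wavenumbers)
# Approximate amide I window; the exact limits are an assumption.
print(band_area(wavenumbers, absorbance, 1600, 1700))  # 100.0
```

One such area per band (for example, amide I, amide II, and the CH2 scissoring region) would give a compact feature vector for a random forest or C5.0 tree.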