Article

A hybrid FSRF model based on regression algorithm for diabetes medical expense prediction



... An ablation study is conducted to analyze the effects of components and hyperparameters in the LeDNet and HiDenNet models. This ablation study highlights that activation functions, kernel sizes, and network depth are critical factors in DL model performance [33]. Both LeDNet and HiDenNet showed robustness and excellent performance across a variety of settings. ...
Article
Full-text available
Diabetes is a metabolic condition that can lead to chronic illness and organ failure if it remains untreated. Accurate detection at an early stage is essential to reduce these risks. Recent advancements in predictive models show promising results. However, these models exhibit inadequate accuracy, struggle with class imbalance, and lack interpretability of the decision-making process. To overcome these issues, we propose two novel deep models for early and accurate diabetes prediction: LeDNet (inspired by LeNet and the Dual Attention Network) and HiDenNet (influenced by the Highway Network and DenseNet). The models are trained using the Diabetes Health Indicators dataset, which has an inherent class imbalance problem that results in biased predictions. This imbalance is mitigated by employing the majority-weighted minority over-sampling technique. Experimental findings demonstrate that LeDNet achieves an F1-score of 85%, recall of 84%, accuracy of 85%, and precision of 86%. Similarly, HiDenNet achieves accuracy, F1-score, recall, and precision of 85%, 86%, 86%, and 86%, respectively. Both proposed models outperform the state-of-the-art Deep Learning (DL) models. K-fold cross-validation is applied to ensure the models' stability across different data splits. Local interpretable model-agnostic explanations and Shapley additive explanations techniques are utilized to enhance interpretability and overcome the traditional black-box nature of DL models. By providing both local and global insights into feature contributions, these explainable artificial intelligence techniques provide transparency to LeDNet and HiDenNet in diabetes prediction. LeDNet and HiDenNet not only improve decision-making transparency but also enhance diabetes prediction accuracy, making them reliable tools for clinical decision-making and early diagnosis.
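As a quick consistency check on the figures quoted above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages in the abstract; the function name `f1_score` is illustrative and not taken from the paper, and because the inputs are already rounded the result can only be expected to match to about one percentage point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract (precision, recall).
lednet_f1 = f1_score(0.86, 0.84)
hidennet_f1 = f1_score(0.86, 0.86)

print(f"LeDNet F1 from reported precision/recall:   {lednet_f1:.3f}")
print(f"HiDenNet F1 from reported precision/recall: {hidennet_f1:.3f}")
```

Both recomputed values round to the F1-scores stated in the abstract (85% and 86%).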
Article
Full-text available
To address the manual dependence and low accuracy of traditional building electrical system fault diagnosis, this paper proposes a novel method based on a random forest optimized by an improved sparrow search algorithm (ISSA-RF). Firstly, the method utilizes a fault collection platform to acquire raw signals of various faults. Secondly, the features of these signals are extracted by time-domain and frequency-domain analysis. Furthermore, principal component analysis (PCA) is employed to reduce the dimensionality of the extracted features. Finally, the reduced features are input into ISSA-RF for classification. In ISSA-RF, the improved sparrow search algorithm (ISSA) is used to optimize the parameters of the random forest (RF). The parameters optimized by ISSA are n_estimators and min_samples_leaf. The accuracy of the proposed method reaches 98.61% in a validation experiment. In addition, the proposed method also exhibits superior performance compared with traditional fault classification algorithms and the latest building electrical fault diagnosis algorithms.
Article
Full-text available
Background: Streptococcus suis (S.suis) infection is a neglected zoonotic disease that imposes a significant economic burden on healthcare and society. To our knowledge, studies estimating the cost of illness associated with S.suis treatment are limited, and no study focuses on treatment costs and potential key drivers in Thailand. This study aimed to estimate the direct medical costs associated with S.suis treatment in Thailand and identify key drivers affecting high treatment costs from the provider’s perspective. Methods: A retrospective analysis of 14 years of data (2005–2018) from confirmed S.suis patients admitted at Chiang Mai University Hospital (CMUH) was conducted. Descriptive statistics were used to summarize the data on patients’ characteristics, healthcare utilization and costs. Multiple imputation with a predictive mean matching strategy was employed to deal with missing Glasgow Coma Scale (GCS) data. Generalized linear models (GLMs) were used to model costs and identify determinants of costs associated with S.suis treatment. The modified Park test was adopted to determine the appropriate family. All costs were inflated using the consumer price index for medical care and presented in 2019 values. Results: Among 130 S.suis patients, the average total direct medical cost was 124,675 Thai baht (THB) (US$ 4,016), of which the majority of expenses were from the “others” category (room charges, staff services and medical devices). Infective endocarditis (IE), GCS, length of stay, and bicarbonate level were significant predictors associated with high total treatment costs. Overall, marginal increases in IE and length of stay were significantly associated with increases in the total costs (standard error) of 132,443 THB (39,638 THB) and 5,490 THB (1,715 THB), respectively.
In contrast, increases in GCS and bicarbonate levels were associated with decreases in the total costs (standard error) of 13,118 THB (5,026 THB) and 7,497 THB (3,430 THB), respectively. Conclusions: IE, GCS, length of stay, and bicarbonate level were significant cost drivers associated with direct medical costs. Patients’ clinical status during admission significantly impacts the outcomes and total treatment costs. Early diagnosis and timely treatment are paramount to alleviate long-term complications and high healthcare expenditures.
Article
Full-text available
In recent years, swarm intelligence algorithms have received extensive attention and research. Swarm intelligence algorithms are a biological heuristic method, which is widely used in solving optimization problems. The traditional swarm intelligence algorithms provide new ideas and new ways to solve some practical problems, and they have made positive progress in fields such as combinatorial optimization, task scheduling, process control, engineering prediction, and image processing. In particular, the sparrow search algorithm is a new type of group intelligence optimization algorithm inspired by the group foraging behavior to perform local and global search by imitating the foraging and anti-predation behavior of sparrows. In view of the shortcomings of the original sparrow search algorithm, such as its easy fall into local optimum, slow convergence speed, and low convergence accuracy, scholars at home and abroad have improved the sparrow search algorithm and have made practical applications in various fields. Firstly, this paper introduces the basic principle of sparrow search algorithm, analyzes the factors affecting the performance of the algorithm, further proposes the improvement strategy of the algorithm, and performs function test comparison and performance analysis with particle swarm optimization algorithm, monarch butterfly algorithm, colony spider algorithm, and pigeon swarm optimization algorithm. After that, the application and development of the sparrow search algorithm in power grid load forecasting, image processing, path tracking, wireless sensor network routing performance optimization, wireless location, and fault diagnosis are described. Finally, combined with the performance characteristics and application direction of the sparrow search algorithm, the future research and development direction of the sparrow search algorithm is prospected.
Article
Full-text available
Objective: Machine learning (ML) algorithms, as an early branch of artificial intelligence technology, can effectively simulate human behavior by training on data from the training set. Machine learning algorithms were used in this study to predict patient choice tendencies in medical decision-making. The goal was to help physicians understand patient preferences and to serve as a resource for the development of decision-making schemes in clinical treatment. As a result, physicians and patients can have better conversations at lower expense, leading to better medical decisions. Method: Patient medical decision-making tendencies were predicted from primary survey data obtained from 248 participants at third-level grade-A hospitals in China. Specifically, 12 predictor variables were set according to the literature review, and four types of outcome variables were set based on the optimization principle of clinical diagnosis and treatment; that is, the patient's medical decision-making tendency was classified as treatment effect, treatment cost, treatment side effect, or treatment experience. In conjunction with the study's data characteristics, three ML classification algorithms, decision tree (DT), k-nearest neighbor (KNN), and support vector machine (SVM), were used to predict patients' medical decision-making tendency, and the performance of the three algorithms was compared. Results: The accuracy of the DT algorithm for predicting patients' choice tendency in medical decision-making is 80% for treatment effect, 60% for treatment cost, 56% for treatment side effects, and 60% for treatment experience, compared with 78%, 66%, 74%, and 84% for the KNN algorithm and 82%, 76%, 80%, and 94% for the SVM algorithm. The corresponding F1-scores are 0.80, 0.61, 0.58, and 0.60 for the DT algorithm, 0.75, 0.65, 0.71, and 0.84 for the KNN algorithm, and 0.81, 0.74, 0.73, and 0.94 for the SVM algorithm.
Conclusion: Among the three ML classification algorithms, SVM has the highest accuracy and the best performance. The prediction results therefore have reference value and guiding significance for physicians formulating clinical treatment plans. The research results help to promote the development and application of a patient-centered medical decision assistance system, to resolve the conflict of interests between physicians and patients, and to assist them in reaching scientific decisions.
Article
Full-text available
Mathematical programming and meta-heuristics are two types of optimization methods. Meta-heuristic algorithms can identify optimal/near-optimal solutions by mimicking natural behaviours or occurrences and provide benefits such as simplicity of execution, few parameters, avoidance of local optima, and flexibility. Many meta-heuristic algorithms have been introduced to solve optimization issues, each of which has advantages and disadvantages. Studies of meta-heuristic algorithms published in prestigious journals showed that they performed well in solving hybrid, improved and mutated problems. This paper reviews the sparrow search algorithm (SSA), one of the new and robust algorithms for solving optimization problems. This paper covers all the SSA literature on variants, improvement, hybridization, and optimization. According to studies, the use of SSA in the mentioned areas has been equal to 32%, 36%, 4%, and 28%, respectively. The highest percentage belongs to Improved, which has been analyzed in three subsections: Meta-Heuristics, artificial neural networks, and Deep Learning.
Article
Full-text available
Background: Innovative provider payment methods that avoid adverse selection and reward performance require accurate prediction of healthcare costs based on individual risk adjustment. Our objective was to compare the performances of a simple neural network (NN) and random forest (RF) to a generalized linear model (GLM) for the prediction of medical cost at the individual level. Methods: A 1/97 representative sample of the French National Health Data Information System was used. Predictors selected were: demographic information; pre-existing conditions, Charlson comorbidity index; healthcare service use and costs. Predictive performances of each model were compared through individual-level (adjusted R-squared (adj-R2), mean absolute error (MAE) and hit ratio (HiR)), and distribution-level metrics on different sets of covariates in the general population and by pre-existing morbid condition, using a quasi-Monte Carlo design. Results: We included 510,182 subjects alive on 31st December, 2015. Mean annual costs were 1894€ (standard deviation 9326€) (median 393€, IQ range 95€; 1480€), including zero-claim subjects. All models performed similarly after adjustment on demographics. The RF model performed better on other sets of covariates (pre-existing conditions, resource counts and past year costs). On the full model, RF reached an adj-R2 of 47.5%, a MAE of 1338€ and a HiR of 67%, while GLM and NN had an adj-R2 of 34.7% and 31.6%, a MAE of 1635€ and 1660€, and a HiR of 58% and 55%, respectively. The RF model outperformed GLM and NN for most conditions and for high-cost subjects. Conclusions: RF should be preferred when the objective is to best predict medical costs. When the objective is to understand the contribution of predictors, GLM was well suited with demographics, conditions and base year cost.
Article
Full-text available
Aim: With the improvement in people's living standards, the incidence of chronic renal failure (CRF) is increasing annually. The increase in the number of patients with CRF has significantly increased pressure on China's medical budget. Predicting hospitalization expenses for CRF can provide guidance for effective allocation and control of medical costs. The purpose of this study was to use the random forest (RF) method and least absolute shrinkage and selection operator (LASSO) regression to predict personal hospitalization expenses of hospitalized patients with CRF and to evaluate related influencing factors. Methods: The data set was collected from the first page of the medical records of three tertiary first-class hospitals for the whole year of 2016. Factors influencing hospitalization expenses for CRF were analyzed. Random forest and least absolute shrinkage and selection operator regression models were used to establish a prediction model for the hospitalization expenses of patients with CRF, and comparisons and evaluations were carried out. Results: For CRF inpatients, statistically significant differences in hospitalization expenses were found for major procedures, medical payment method, hospitalization frequency, length of stay, number of other diagnoses, and number of procedures. The R² values of the LASSO regression model and the RF regression model were 0.6992 and 0.7946, respectively. The mean absolute error (MAE) and root mean square error (RMSE) of the LASSO regression model were 0.0268 and 0.043, respectively, and the MAE and RMSE of the RF prediction model were 0.0171 and 0.0355, respectively. In the RF model, the weight of length of stay was the highest (0.730). Conclusions: The hospitalization expenses of patients with CRF are most affected by length of stay. The RF prediction model is superior to the LASSO regression model and can be used to predict the hospitalization expenses of patients with CRF. 
Health administration departments may consider formulating accurate individualized hospitalization expense reimbursement mechanisms accordingly.
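The three evaluation metrics used to compare the LASSO and RF models above (R², MAE, RMSE) can be computed as in the following minimal sketch. The function name and the toy data are illustrative assumptions, not taken from the study, and the abstract does not state on what scale the expenses were measured.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when the target is constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Toy data (hypothetical, for illustration only).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

A lower MAE and RMSE together with a higher R² is the pattern the abstract reports for RF relative to LASSO.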
Article
Full-text available
Objective: To assess the prevalence of diabetes and its risk factors. Design: Population based, cross sectional study. Setting: 31 provinces in mainland China with nationally representative cross sectional data from 2015 to 2017. Participants: 75 880 participants aged 18 and older, a nationally representative sample of the mainland Chinese population. Main outcome measures: Prevalence of diabetes among adults living in China, and the prevalence by sex, regions, and ethnic groups, estimated by the 2018 American Diabetes Association (ADA) and the World Health Organization diagnostic criteria. Demographic characteristics, lifestyle, and history of disease were recorded by participants on a questionnaire. Anthropometric and clinical assessments were made of serum concentrations of fasting plasma glucose (one measurement), two hour plasma glucose, and glycated haemoglobin (HbA1c). Results: The weighted prevalence of total diabetes (n=9772), self-reported diabetes (n=4464), newly diagnosed diabetes (n=5308), and prediabetes (n=27 230) diagnosed by the ADA criteria were 12.8% (95% confidence interval 12.0% to 13.6%), 6.0% (5.4% to 6.7%), 6.8% (6.1% to 7.4%), and 35.2% (33.5% to 37.0%), respectively, among adults living in China. The weighted prevalence of total diabetes was higher among adults aged 50 and older and among men. The prevalence of total diabetes in 31 provinces ranged from 6.2% in Guizhou to 19.9% in Inner Mongolia. Han ethnicity had the highest prevalence of diabetes (12.8%) and Hui ethnicity had the lowest (6.3%) among five investigated ethnicities. The weighted prevalence of total diabetes (n=8385) using the WHO criteria was 11.2% (95% confidence interval 10.5% to 11.9%). Conclusion: The prevalence of diabetes has increased slightly from 2007 to 2017 among adults living in China. The findings indicate that diabetes is an important public health problem in China.
Article
Full-text available
In this paper, a novel swarm optimization approach, namely sparrow search algorithm (SSA), is proposed inspired by the group wisdom, foraging and anti-predation behaviours of sparrows. Experiments on 19 benchmark functions are conducted to test the performance of the SSA and its performance is compared with other algorithms such as grey wolf optimizer (GWO), gravitational search algorithm (GSA), and particle swarm optimization (PSO). Simulation results show that the proposed SSA is superior over GWO, PSO and GSA in terms of accuracy, convergence speed, stability and robustness. Finally, the effectiveness of the proposed SSA is demonstrated in two practical engineering examples.
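To make the three behaviours named in this abstract concrete (foraging producers, following scroungers, and danger-aware sparrows), here is a simplified, standard-library-only sketch of a sparrow-search-style minimiser. It is an illustration under assumptions rather than the authors' reference implementation: the parameter names and defaults (`producer_ratio`, `aware_ratio`, `safety_threshold`), the per-dimension shortcut in the follower step, the clamped exponent, and the scalar bounds are all choices made here.

```python
import math
import random


def sparrow_search(objective, dim, lower, upper, pop_size=20, iterations=100,
                   producer_ratio=0.2, aware_ratio=0.1, safety_threshold=0.8,
                   seed=0):
    """Minimise `objective` over [lower, upper]^dim with a simplified SSA."""
    rng = random.Random(seed)

    def clip(value):
        return max(lower, min(upper, value))

    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    n_prod = max(1, int(producer_ratio * pop_size))
    n_aware = max(1, int(aware_ratio * pop_size))
    best_f = min(fit)
    best_x = pop[fit.index(best_f)][:]

    for _ in range(iterations):
        # Rank the flock: index 0 is the current best, -1 the current worst.
        order = sorted(range(pop_size), key=lambda i: fit[i])
        pop = [pop[i] for i in order]
        fit = [fit[i] for i in order]
        worst = pop[-1][:]

        # Producers: contract when no alarm is raised, random jump otherwise.
        alarm = rng.random()
        for i in range(n_prod):
            if alarm < safety_threshold:
                alpha = rng.uniform(1e-9, 1.0)
                scale = math.exp(-(i + 1) / (alpha * iterations))
                pop[i] = [clip(v * scale) for v in pop[i]]
            else:
                jump = rng.gauss(0.0, 1.0)
                pop[i] = [clip(v + jump) for v in pop[i]]
            fit[i] = objective(pop[i])
        lead = min(range(n_prod), key=lambda i: fit[i])
        x_p = pop[lead][:]

        # Scroungers: the worse half relocates, the rest move around x_p.
        for i in range(n_prod, pop_size):
            if i + 1 > pop_size / 2:
                q = rng.gauss(0.0, 1.0)
                # Exponent clamped (assumption) to avoid overflow on wide bounds.
                pop[i] = [clip(q * math.exp(min(50.0, (w - v) / (i + 1) ** 2)))
                          for v, w in zip(pop[i], worst)]
            else:
                pop[i] = [clip(p + abs(v - p) * rng.choice((-1.0, 1.0)) / dim)
                          for v, p in zip(pop[i], x_p)]
            fit[i] = objective(pop[i])

        # Danger-aware sparrows: move relative to the current best or worst.
        g = min(range(pop_size), key=lambda i: fit[i])
        g_x, g_f = pop[g][:], fit[g]
        w_f = max(fit)
        w_x = pop[fit.index(w_f)][:]
        for i in rng.sample(range(pop_size), n_aware):
            if fit[i] > g_f:
                beta = rng.gauss(0.0, 1.0)
                pop[i] = [clip(b + beta * abs(v - b))
                          for v, b in zip(pop[i], g_x)]
            else:
                k = rng.uniform(-1.0, 1.0)
                denom = abs(fit[i] - w_f) + 1e-12
                pop[i] = [clip(v + k * abs(v - w) / denom)
                          for v, w in zip(pop[i], w_x)]
            fit[i] = objective(pop[i])

        # Keep the best solution seen so far.
        cand = min(range(pop_size), key=lambda i: fit[i])
        if fit[cand] < best_f:
            best_f, best_x = fit[cand], pop[cand][:]

    return best_x, best_f


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = sparrow_search(sphere, dim=3, lower=-10.0, upper=10.0,
                                iterations=50)
print(best_x, best_f)
```

Because the best-so-far solution is stored separately, the returned fitness can never be worse than the best individual of the initial population. How quickly it improves depends on the objective and on the assumed parameters.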
Article
Full-text available
The firefly algorithm is a powerful algorithm; however, it may show an inferior convergence rate towards the global optimum because its optimization process depends on a random quantity that helps the fireflies explore the search space at the cost of exploitation. To improve the convergence rate of the firefly algorithm while preserving its ability to exploit solutions, the algorithm is altered using two different approaches in this research work. In the first approach, a local search strategy, i.e., classical unidimensional local search, is employed in the firefly algorithm in order to improve its exploitation capability. In the second approach, a new solution search strategy, i.e., stochastic diffusion scout search, is integrated with the searching phase of the firefly algorithm in order to increase the probability of inferior solutions improving themselves. Both approaches are evaluated on various benchmark problems of different complexities, and their performances are compared with artificial bee colony and the classical firefly algorithm. Results demonstrate the capability of the proposed approaches to find the global optimum in the search space while avoiding local optima.
Article
Full-text available
Modern optimisation algorithms are often metaheuristic, and they are very promising in solving NP-hard optimization problems. In this paper, we show how to use the recently developed Firefly Algorithm to solve nonlinear design problems. For the standard pressure vessel design optimisation, the optimal solution found by FA is far better than the best solution obtained previously in literature. In addition, we also propose a few new test functions with either singularity or stochastic components but with known global optimality, and thus they can be used to validate new optimisation algorithms. Possible topics for further research are also discussed.
Article
Cardiovascular disease (CVD) has become a significant public health problem affecting national economic and social development, and ranks among the top causes of death in the world. Thus, people pay increasing attention to the prevention, control, and risk assessment of CVD. In this paper, an improved sparrow search algorithm (SSA) is designed to optimize the parameters of the Categorical Boosting (CatBoost) model, and it is applied to the risk assessment of CVD. The contributions of this research are mainly in the following aspects: (1) In the position update formula of the discoverer, the salp swarm algorithm is integrated, and the global optimal solution of the previous generation is added to improve the global search ability and local development ability of SSA; (2) Opposition-based Learning (OBL) and a Lateral mutation strategy are used to improve the search ability of the worst individual; (3) The sparrow search algorithm based on the salp swarm algorithm, OBL and the Lateral mutation strategy (SOLSSA) is used to optimize the parameters of CatBoost to improve the prediction effect, and experiments are carried out for the proposed model (SOLSSA-CatBoost) using two CVD data sets on Kaggle. The proposed model is compared with six machine learning models, including random forest (RF), logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), light gradient Boosting (LGB) and CatBoost, and is also compared with four other optimization algorithms (whale optimization algorithm (WOA), gray wolf algorithm (GWO), seagull optimization algorithm (SOA) and SSA) in optimizing the performance of CatBoost. The experimental results show that, compared with the other algorithms, SOLSSA-CatBoost has a better prediction effect on the test set, with F1-scores reaching 90% and 81.51% on the two CVD data sets, respectively. 
The SOLSSA-CatBoost model in this paper can make a more accurate prediction of patients' disease risk, and provide a certain basis for doctors to judge the condition.
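Opposition-based learning, mentioned in contribution (2), is commonly defined by reflecting a candidate solution across the centre of the search box, so each coordinate x becomes lower + upper - x. The abstract does not give the exact formula the authors use, so the sketch below shows only this standard definition, and the helper names are hypothetical.

```python
def opposite_point(x, lower, upper):
    """Reflect each coordinate of x across the midpoint of [lower_j, upper_j]."""
    return [lo + hi - v for v, lo, hi in zip(x, lower, upper)]


def keep_better(x, lower, upper, objective):
    """Return whichever of x and its opposite has the lower objective value."""
    opposite = opposite_point(x, lower, upper)
    return opposite if objective(opposite) < objective(x) else x


def sum_of_squares(x):
    return sum(v * v for v in x)


# A poor candidate far from the minimum at the origin.
candidate = [9.0, 4.0]
improved = keep_better(candidate, [0.0, 0.0], [10.0, 10.0], sum_of_squares)
print(improved)
```

Applying this step to the worst individual gives it a second chance at almost no cost, which is the usual motivation for pairing OBL with swarm optimizers.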
Article
Air quality indicators and air quality index (AQI) prediction are effective approaches for urban decision-makers, planners, managers and even city residents to arrange their risk abatement measures in advance. In this study, five models, including Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Multi-layer perceptron (MLP), Long-short term memory (LSTM) and Long-short term memory coupled with the sparrow search algorithm (LSTM-SSA), were used to predict hourly, daily and weekly concentrations of six air quality indicators (PM2.5, PM10, SO2, NO2, CO, O3) and AQI. The case study was based on the hourly observed data of Shanghai from February 1, 2021 to January 31, 2022, and prediction accuracy for different prediction models, indicators and periods was compared. Results revealed that: (1) The prediction accuracy of the three neural network models (MLP, LSTM, LSTM-SSA) was superior to that of the two tree-based models (RF, XGBoost) in terms of MAPE, RMSE, MAE and R². (2) Among the three neural network models, LSTM-SSA was more accurate than MLP and LSTM in terms of MAPE, RMSE, MAE and R². (3) For LSTM-SSA, the MAPE of AQI was minimal (1.53%), followed by PM2.5 (2.72%), O3 (3.93%), PM10 (4.87%), NO2 (5.74%), CO (7.54%) and SO2 (8.82%). (4) For LSTM-SSA, the MAPE increased from 3.88% to 5.28% to 5.91% as the prediction period increased from an hour to a day to a week.
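The MAPE figures reported above are mean absolute percentage errors. A minimal sketch of the usual definition follows; the function name and the sample readings are illustrative assumptions, not data from the study.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any true value is 0."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed vs. predicted values.
example_mape = mape([100.0, 200.0], [110.0, 190.0])
print(f"MAPE = {example_mape:.2f}%")
```

Because each error is divided by the observed value, MAPE allows indicators measured in different units (for example AQI and PM2.5 concentration) to be compared on one scale, which is how the ranking in point (3) is expressed.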
Article
Groundwater is considered to be one of the most valuable natural resources in the world. However, the availability of groundwater is of concern. Therefore, understanding the potential of groundwater is very important for the utilization of water resources. The main goal of the study was to predict and assess the groundwater using hybrid machine learning and metaheuristic algorithms to automatically tune the parameters, namely the Random Forest (RF), Support Vector Machines (SVM), the Grey Wolf Algorithm (GWO), and the Sparrow Search Algorithm (SSA). A total of 608 groundwater locations were identified by field surveys. Three different sample datasets (D1-D3) were created to increase the confidence of the result, and each dataset was divided randomly into a training set (70%) and a validation set (30%). Fifteen conditioning factors involving geology, human activity, and hydrology were extracted from the available materials. The Evidential Belief Function (EBF) was employed to determine the correlation between groundwater and factors. Then fourteen relevant factors were selected by feature selection. After that, the hybrid models of RF-GWO, RF-SSA, SVM-GWO, and SVM-SSA built using the datasets (D1-D3) were applied to generate groundwater potential maps (GPMs). Results showed that the performances of the hybrid models can be considered to be stable. The global performance of these hybrid models was assessed using the area under the receiver operating characteristic curve (AUC-ROC) and related statistical indexes. According to the D1 dataset validation results, the AUC values for the RF-GWO, RF-SSA, SVM-GWO, and SVM-SSA were 0.832, 0.840, 0.790, and 0.809, respectively. The RF-SSA had the highest accuracy (0.764), with an AUC of 0.840, and the SVM-GWO showed the least accuracy (0.723), with an AUC of 0.790. The outcomes revealed that all hybrid models had a good predictive performance. 
However, the RF outperformed the SVM model, and the SSA algorithm is superior to the GWO algorithm. Overall, the results of this study show that the construction of the groundwater potential model using a metaheuristic optimization algorithm is a feasible approach.
Article
Flame image recognition is of great significance in fire detection and prevention. In this paper, in order to improve the accuracy of fire recognition, a fast stochastic configuration network (FSCN) method based on an improved sparrow search algorithm (ISSA) is proposed. In the design of the FSCN, the gradual increase of hidden layer nodes used in the original stochastic configuration network (SCN) is removed, and the number of nodes is set directly. The ISSA is used to generate the input weights and biases of hidden layer nodes. At the same time, the supervisory mechanism is retained to judge the weights and biases of all hidden layer nodes, and ISSA is used to regenerate corresponding weights and biases for the nodes that do not meet the constraints in the supervisory mechanism. In the ISSA, a sine map, adaptive adjustment of hyper-parameters and a mutation strategy are used to improve the optimization ability of the original sparrow search algorithm (SSA). Some parameters in FSCN are optimized by ISSA to give it better classification performance. Finally, image processing technology is used to extract features from the flame images and the interference images, and the resulting feature vectors are used to train the ISSA-FSCN. Several simulation experiments have been carried out to verify the effectiveness of the proposed ISSA-FSCN method. In the performance verification of ISSA on the CEC test suite, ISSA outperforms other algorithms by 33.6% on average across the results of 20 functions. In the performance verification of FSCN, the average results of accuracy, precision, recall, F1 and AUC are compared. In experiment 1, ISSA-FSCN outperforms other algorithms on average by 19.7%, 14.7%, 12.8%, 14.5% and 23.0%. In experiment 2, ISSA-FSCN outperforms other algorithms on average by 2.3%, 2.1%, 2.5%, 6.0% and 3.2%.
In experiment 3, ISSA-FSCN outperforms other algorithms on average by 5.9%, 4.0%, 4.1%, 4.1% and 6.8%.
Article
The paper reports three new ensembles of supervised learning predictors for managing medical insurance costs. An open dataset is used to develop the data analysis methods. The use of artificial intelligence in the management of financial risks can save time and money and protect patients' health. Machine learning is associated with many expectations, but its quality is determined by choosing a good algorithm and the proper steps to plan, develop, and implement the model. The paper aims to develop three new ensembles for individual insurance cost prediction that provide high prediction accuracy. The Pearson coefficient and the Boruta algorithm are used for feature selection. Boosting, stacking, and bagging ensembles are built. A comparison with existing machine learning algorithms is given. Boosting models based on regression trees and stochastic gradient descent are built. Bagged CART and Random Forest algorithms are proposed. The boosting and stacking ensembles showed better accuracy than bagging. Tuning the parameters for boosting does not decrease the RMSE either. So, bagging shows its weakness in generalizing the prediction. The stacking ensemble is developed using K Nearest Neighbors (KNN), Support Vector Machine (SVM), Regression Tree, Linear Regression, and Stochastic Gradient Boosting. The random forest (RF) algorithm is used to combine the predictions. One hundred trees are built for RF. The ensemble reaches a Root Mean Square Error (RMSE) of 3173.213, an improvement over the other predictors. The quality of the developed ensemble on the Root Mean Squared Error metric is 1.47 better than that of the best weak predictor (SVR).
Article
Triage management plays an important role for hospitalized patients in disease severity stratification and medical burden analysis. Although progression risks have been extensively researched for a number of diseases, other crucial indicators that reflect patients' economic and time costs have not been systematically studied. To address these problems, we developed an automatic deep learning based Auto Triage Management (ATM) Framework capable of accurately modelling patients' disease progression risk and health economic evaluation. Based on them, we can first discover the relationship between disease progression and medical system cost, find potential features that can more precisely aid patient triage in resource allocation, and allow searching for treatment plans that have cured patients. Applying ATM to COVID-19, we built a joint model to predict patients' risk, the total length of stay (LoS) and cost at admission, and remaining LoS and cost at a given hospitalized time point, with C-index 0.930 and 0.869 for risk prediction, and mean absolute error (MAE) of 5.61 and 5.90 days for total LoS prediction in internal and external validation data.
Article
Heart disease seriously threatens human life due to high morbidity and mortality. Accurate prediction and diagnosis become more critical for early prevention, detection and treatment. The Internet of Medical Things (IoMT) and Artificial Intelligence (AI) support healthcare services in heart disease monitoring, prediction and diagnosis. However, most prediction models only predict whether people are sick, and rarely further determine the severity of the disease. In this paper, we propose a machine learning based prediction model to achieve binary and multiple classification heart disease prediction simultaneously. We first design a Fuzzy-GBDT algorithm combining fuzzy logic and Gradient Boosting Decision Tree (GBDT) to reduce data complexity and increase the generalization of binary classification prediction. Then, we integrate Fuzzy-GBDT with Bagging to avoid over-fitting. The Bagging-Fuzzy-GBDT for multi-classification prediction further classifies the severity of heart disease. Evaluation results demonstrate that the Bagging-Fuzzy-GBDT has excellent accuracy and stability in both binary and multiple classification predictions.
Article
Diabetes is a chronic metabolic disorder with a high rate of morbidity and mortality. Insufficient insulin secretion and impaired insulin action are two major causes of the development of diabetes, which is characterized by a persistent increase in blood glucose level. Diet and a sedentary lifestyle play a pivotal role in the development of vascular complications in type 2 diabetes. Dietary modification is associated with a reprogramming of nutrient intake, which has proven effective for the management of diabetes and associated complications. Dietary modifications modulate various key molecular players linked with the functions of nutrient signalling, regulation of autophagy, and energy metabolism. They activate silent mating type information regulation 2 homolog 1 (SIRT1) and AMP-activated protein kinase (AMPK). AMPK mainly acts as an energy sensor and inhibits the autophagy repressor mammalian target of rapamycin (mTOR) under nutritional deprivation. Under calorie restriction (CR), SIRT1 gets activated directly or indirectly and plays a central role in autophagy via the regulation of protein acetylation. Dietary modification is also effective in controlling inflammation and apoptosis by decreasing the level of pro-inflammatory mediators such as nuclear factor kappa B (NF-κB), transforming growth factor-beta (TGF-β), tumor necrosis factor-alpha (TNF-α) and interleukin-6 (IL-6). It also improves glucose homeostasis and insulin secretion through beta cell regeneration. This indicates that calorie intake plays a crucial role in the pathogenesis of type 2 diabetes-associated complications. The present review emphasizes the role of dietary modifications in diabetes and associated complications.
Article
Stochastic configuration network (SCN), a novel incremental generation model with a supervisory mechanism, performs well in solving large-scale data regression and classification problems. However, the accuracy of the SCN is significantly affected by the assignment and selection of some network parameters. Sparrow search algorithm (SSA) is a new meta-heuristic algorithm that simulates the foraging and anti-predation behavior of a sparrow population. In this paper, a stochastic configuration network based on a chaotic sparrow search algorithm, termed CSSA-SCN, is introduced for the first time. Firstly, a chaotic sparrow search algorithm (CSSA) is designed, which mainly utilizes logistic mapping, self-adaptive hyper-parameters, and a mutation operator to enhance the global optimization capability of SSA. Secondly, as the performance of SCN is related to the regularization parameter r and the scale factor λ of weights and biases, CSSA is employed to select better parameters for SCN automatically. Finally, 13 benchmark functions and several datasets are used to evaluate the performance of CSSA and CSSA-SCN respectively. Experimental results demonstrate the feasibility and validity of CSSA-SCN compared with SCN and other contrast algorithms.
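As a rough illustration of the logistic mapping mentioned in this abstract, the sketch below iterates the standard logistic map x_{n+1} = μ·x_n·(1 − x_n) and scales the values into a search range. Using the sequence to seed an initial population is an assumption made here for illustration; the abstract does not say how CSSA applies the map, and the names `logistic_map_sequence` and `chaotic_population` are hypothetical.

```python
def logistic_map_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x_{n+1} = mu * x_n * (1 - x_n).

    With mu = 4 and x0 in (0, 1), the values stay in [0, 1] and behave chaotically.
    """
    seq = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_population(n_agents, dim, lower, upper, x0=0.7):
    """Build an n_agents x dim population by scaling logistic-map values into
    [lower, upper]. Assumption: this stands in for whatever initialization the
    original CSSA uses."""
    values = logistic_map_sequence(x0, n_agents * dim)
    return [
        [lower + (upper - lower) * values[i * dim + j] for j in range(dim)]
        for i in range(n_agents)
    ]


if __name__ == "__main__":
    for agent in chaotic_population(n_agents=5, dim=3, lower=-1.0, upper=1.0):
        print(agent)
```

Compared with uniform pseudo-random draws, a chaotic sequence is deterministic for a given `x0`, which makes runs reproducible without a seed.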
Article
In the last few years, the application of Machine Learning approaches such as Deep Neural Network (DNN) models has become more attractive in healthcare systems, given the rising complexity of healthcare data. Machine Learning (ML) algorithms provide efficient and effective data analysis models to uncover hidden patterns and other meaningful information from the considerable amount of health data that conventional analytics cannot discover in a reasonable time. In particular, Deep Learning (DL) techniques have been shown to be promising methods for pattern recognition in healthcare systems. Motivated by this consideration, the contribution of this paper is to investigate the deep learning approaches applied to healthcare systems by reviewing the cutting-edge network architectures, applications, and industrial trends. The goal is first to provide extensive insight into the application of deep learning models in healthcare solutions, to bridge deep learning techniques and human healthcare interpretability, and then to present the existing open challenges and future directions.
Article
Carbon price is the basis of developing a low carbon economy. An accurate carbon price forecast can not only stimulate action by enterprises and families, but also encourage the study and development of low carbon technology. However, as the original carbon price series is non-stationary and nonlinear, traditional methods are less robust in predicting it. In this study, an innovative nonlinear ensemble paradigm of improved feature extraction and deep learning is proposed for carbon price forecasting, which includes complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), sample entropy (SE), long short-term memory (LSTM) and random forest (RF). As the core of the proposed model, LSTM, an enhancement of the recurrent neural network, is used to establish appropriate prediction models by extracting long- and short-term memory features. Improved feature extraction, as auxiliary data preprocessing, improves computational efficiency and accuracy. Removing irrelevant features from the original time series through CEEMDAN makes learning easier, and using SE to recombine modes of similar complexity improves it further. Furthermore, compared with simple linear ensemble learning, RF increases generalization ability and robustness in producing the final nonlinear output. Real carbon trading data from two markets in China serve as experimental cases to test the effectiveness of the model. The final simulation results indicate that the proposed model performs better than the other four benchmark methods, as reflected by smaller statistical errors. Overall, the developed approach provides an effective method for predicting carbon price.
Article
Objective: With the continuing rise in the global incidence of diabetes, the prevention of diabetes and control of associated medical expenses has become a public health issue worldwide. This study aims to identify the medical expenses of patients with diabetes in different regions of China and to examine the differences in inpatient medical expenses and their impact on these patients. Study design: This study comprises a longitudinal analysis of medical expenses for inpatients with diabetes across years, a horizontal analysis of medical expenses among regions, and a literature review. Methods: Data were derived from China's Medical Insurance Department database. We selected inpatients with diabetes in the eastern, central, and western regions of China for the period 2013–2015 and randomly selected data through systematic sampling. Results: Among the 4150 patients with diabetes considered in this study, the patients' medical expenses were found to differ significantly across regions, years, ages, medical insurance types, medical institution levels, total medical expenses, medical insurance fund payments, and out-of-pocket (OOP) expenses. In addition, there were significant differences in total medical expenses between male and female patients. Furthermore, medical insurance type, patients' age, medical institution level, and year significantly influenced total medical expenses. Conclusions: Inpatients with diabetes in different regions exhibited significant differences in total medical expenses, medical insurance fund payments, and OOP expenses. China should invest more in chronic disease treatment in its central and western regions, narrow the regional differences in medical expenses, and endeavor to ensure equity in the availability and cost of medical services. Moreover, patients with diabetes must be encouraged to access primary care to reduce their medical expenses.
Article
Air pollution can lead to a wide range of hazards and can affect most organisms on Earth. Therefore, managing and controlling air pollution has become a top priority for many countries. Effective short-term atmospheric pollutant concentration forecasting (SAPCF) can mitigate the negative effects of atmospheric pollution. In this paper, we propose a new hybrid forecasting model for SAPCF. Firstly, we analyse the influential factors of pollutants to obtain the optimal combination of input variables. Secondly, we use a clustering algorithm to enhance the regularity of our modelling data. Thirdly, we build a particle swarm optimisation (PSO)–support vector machine (SVM) hybrid model called PSO–SVM and perform a case study at the Temple of Heaven, Beijing, to test its forecasting accuracy and validate its performance against three comparison models. The first model inputs all possible variables with equal weight, without influence factor analysis. The second model uses the same input variables as the proposed model, without clustering. The third model inputs these same variables with SVM parameters optimised by a genetic algorithm. The comparison amongst these models demonstrates the superior performance of our proposed hybrid model. We further verify the forecasting results of our hybrid model by conducting statistical tests.
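To make the PSO half of a PSO–SVM pipeline concrete, the sketch below implements a plain global-best particle swarm in pure Python. It is not the authors' implementation: the objective `stand_in_validation_error` is a hypothetical quadratic standing in for an SVM's cross-validation error over (log C, log gamma), and the swarm settings (inertia 0.7, acceleration coefficients 1.5) are common textbook defaults, assumed here rather than taken from the abstract.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimise `objective` over box `bounds` with a global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's personal best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def stand_in_validation_error(params):
    """Hypothetical objective: pretend the best SVM has log C = 1, log gamma = -2."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


if __name__ == "__main__":
    best, err = pso_minimize(stand_in_validation_error, [(-5.0, 5.0), (-5.0, 5.0)])
    print(best, err)
```

In a real pipeline the stand-in objective would be replaced by a function that trains an SVM with the candidate parameters and returns its validation error; searching in log space is a common choice because useful values of C and gamma span several orders of magnitude.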
Article
Objective: This paper constructs a mortality prediction system based on a real-world dataset. The system aims to predict mortality in heart failure (HF) patients. Effective mortality prediction can improve resource allocation and clinical outcomes, avoiding inappropriate overtreatment of low-mortality patients and premature discharge of high-mortality patients. The system covers three mortality prediction targets: in-hospital mortality, 30-day mortality and 1-year mortality. Materials and methods: HF data were collected from the Shanghai Shuguang hospital. A total of 10,203 inpatient records were extracted from encounters occurring between March 2009 and April 2016. The records involve 4682 patients, including 539 death cases. A feature selection method called the Orthogonal Relief (OR) algorithm is first used to reduce the dimensionality. Then, a classification algorithm named Dynamic Radius Means (DRM) is proposed to predict mortality in HF patients. Results and discussion: The comparative experimental results demonstrate that the mortality prediction system achieves high performance on all targets with DRM. Notably, in-hospital mortality prediction achieves an AUC of 87.3% (a 35.07% improvement). Moreover, the AUCs of 30-day and 1-year mortality prediction reach 88.45% and 84.84%, respectively. In particular, the system remains effective and does not deteriorate when the dimension of the samples is sharply reduced. Conclusions: The proposed system with its own method, DRM, can predict mortality in HF patients and achieve high performance on all three mortality targets. Furthermore, an effective feature selection strategy can boost the system. This system shows its importance in real-world applications, assisting clinicians in HF treatment by providing crucial decision information.
Article
Introduction: Since the year 2000, IDF has been measuring the prevalence of diabetes nationally, regionally and globally. Aim: To produce estimates of the global burden of diabetes and its impact for 2017 and projections for 2045. Methods: A systematic literature review was conducted to identify published studies on the prevalence of diabetes, impaired glucose tolerance and hyperglycaemia in pregnancy in the period from 1990 to 2016. The highest quality studies on diabetes prevalence were selected for each country. A logistic regression model was used to generate age-specific prevalence estimates for each country. Estimates for countries without data were extrapolated from similar countries. Results: It was estimated that in 2017 there were 451 million people (aged 18-99 years) with diabetes worldwide. This figure was expected to increase to 693 million by 2045. It was estimated that almost half of all people (49.7%) living with diabetes are undiagnosed. Moreover, there were an estimated 374 million people with impaired glucose tolerance (IGT), and it was projected that almost 21.3 million live births to women were affected by some form of hyperglycaemia in pregnancy. In 2017, approximately 5 million deaths worldwide were attributable to diabetes in the 20-99 years age range. The global healthcare expenditure on people with diabetes was estimated to be USD 850 billion in 2017. Conclusion: The new estimates of diabetes prevalence, deaths attributable to diabetes and healthcare expenditure due to diabetes present a large social, financial and health system burden across the world.
Conference Paper
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
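The two techniques named in this abstract can be sketched in a few lines. The snippet below assumes the formulas commonly given for them: a word with relative frequency f(w) is discarded with probability 1 − sqrt(t / f(w)), and negative samples are drawn from the unigram distribution raised to the 3/4 power. The function names and the toy corpus are hypothetical, and the threshold `t` is set far higher than the usual value (around 1e-5) only so that the effect is visible on a 100-token example.

```python
import math
from collections import Counter


def discard_probabilities(tokens, t=1e-5):
    """Probability of dropping each word during subsampling:
    P(w) = 1 - sqrt(t / f(w)), clipped at 0 for words rarer than t."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: max(0.0, 1.0 - math.sqrt(t / (count / total)))
        for word, count in counts.items()
    }


def negative_sampling_distribution(tokens, power=0.75):
    """Noise distribution for negative sampling: unigram counts raised to
    `power` and renormalised, which boosts rare words relative to raw frequency."""
    counts = Counter(tokens)
    weights = {word: count ** power for word, count in counts.items()}
    normaliser = sum(weights.values())
    return {word: weight / normaliser for word, weight in weights.items()}


if __name__ == "__main__":
    corpus = ["the"] * 90 + ["air"] * 9 + ["canada"]  # toy corpus, illustration only
    print(discard_probabilities(corpus, t=0.05))
    print(negative_sampling_distribution(corpus))
```

On the toy corpus, the most frequent word is the most likely to be dropped, while the rarest word is never dropped and receives more noise-distribution mass than its raw 1% frequency.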
Article
Over the past few years, there has been increased interest in data mining and machine learning methods to improve hospital performance; in particular, hospitals want to improve their intensive care unit statistics by reducing the number of patients dying inside the intensive care unit. Research has focused on prediction of measurable outcomes, including risk of complications, mortality and length of hospital stay. The length of stay is an important metric for both healthcare providers and patients, and is influenced by numerous factors. In particular, the length of stay in critical care is of great significance, both to patient experience and the cost of care, and is influenced by factors specific to the highly complex environment of the intensive care unit. The length of stay is often used as a surrogate for other outcomes where those outcomes cannot be measured, for example as a surrogate for hospital or intensive care unit mortality. The length of stay is also a parameter that has been used to identify the severity of illnesses and healthcare resource utilisation. This paper examines a range of length of stay and mortality prediction applications in acute medicine and the critical care unit. It also focuses on the methods of analysing length of stay and mortality prediction. Moreover, the paper provides a classification and evaluation of the analytical methods for length of stay and mortality prediction, associated with a grouping of relevant research papers published in the years 1984 to 2016 related to the domain of survival analysis. In addition, the paper highlights some of the gaps and challenges of the domain.