Article

A Classification Model for Predicting Fetus with Down Syndrome – A Study from Turkey

Taylor & Francis
Applied Artificial Intelligence

Abstract

The triple test is a screening (blood) test used to estimate the probability that a pregnant woman is carrying a fetus with a chromosomal abnormality such as Down Syndrome (DS). AFP (Alpha-Fetoprotein), hCG (Human Chorionic Gonadotropin), and uE3 (Unconjugated Estriol) values in the blood sample of a pregnant woman are measured and compared with similar real records for which the outcomes (healthy fetus or a fetus with DS) are known. The likelihood of the indicators is used to calculate the probability of carrying a fetus with a chromosomal abnormality such as DS. However, the high false-positive rate of the triple test has been a problematic issue. One reason for the high false-positive rate is differences in the norm values of the indicators for pregnant women from different geographical regions of a country. We use 81 patient records retrieved from the Şahinbey Training and Research Hospital of Gaziantep University, Turkey. In our study, nine different classification algorithms were trained on the triple test indicators. The multilayer perceptron performed best, with a 94.24% detection rate and a 13% false-positive rate. The multilayer perceptron can predict the outcome of the triple test with a high level of accuracy, so fewer patients are referred for amniocentesis. This is the first study to use an MLP model on Turkish triple test data. Regional MLP models can eliminate the bias due to local biological differences.
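The pipeline the abstract describes (train a classifier on the three serum markers, then report detection rate and false-positive rate) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the synthetic MoM values, class sizes, and MLP settings are all assumptions; only the marker pattern (low AFP, high hCG, low uE3 in DS pregnancies) reflects the established triple-test heuristic.

```python
# Illustrative reconstruction (not the authors' code): train an MLP on the
# three triple-test indicators and report detection rate / false-positive rate.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_healthy, n_ds = 400, 100
# Synthetic MoM-like values for (AFP, hCG, uE3); DS pregnancies tend toward
# low AFP, high hCG, and low uE3 -- the classic triple-test pattern.
healthy = rng.normal([1.0, 1.0, 1.0], 0.25, size=(n_healthy, 3))
ds = rng.normal([0.7, 2.0, 0.7], 0.25, size=(n_ds, 3))
X = np.vstack([healthy, ds])
y = np.array([0] * n_healthy + [1] * n_ds)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

tp = int(np.sum((pred == 1) & (y_te == 1)))
fn = int(np.sum((pred == 0) & (y_te == 1)))
fp = int(np.sum((pred == 1) & (y_te == 0)))
tn = int(np.sum((pred == 0) & (y_te == 0)))
detection_rate = tp / (tp + fn)        # sensitivity: DS cases caught
false_positive_rate = fp / (fp + tn)   # healthy cases flagged for follow-up
print(detection_rate, false_positive_rate)
```

On real screening records, the same two metrics would be computed for each of the nine candidate classifiers and compared.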


... Recently, machine learning and deep learning methods have become widespread in healthcare, such as arrhythmias classification [3], coronary artery disease diagnosis [4], cancer diagnosis [5], and COVID-19 detection [6]. In prenatal screening for DS, many traditional machine learning algorithms have been used to detect DS cases from the antenatal screening data, such as support vector machine (SVM) [7]- [9], decision tree [7], [9]- [11], random forest [8]- [10], ensemble learning [8], [12] or multilayer perceptron (MLP) [11], [13]. However, as no traditional machine learning model can beat all others in every dataset and task, numerous experiments are required to find the most suitable model. ...
... Alptekin et al. applied and evaluated nine machine learning models to detect DS from the triple screening test (blood test) data and showed that the MLP model achieved the best performance with a 94.24% detection rate and 13% false positive rate [11]. XGBoost was used to evaluate synthetic data-generating methods for the imbalanced data problem in [8]. ...
Conference Paper
Full-text available
One of the most common congenital anomalies in fetuses is Down syndrome (DS). DS has various adverse effects on the quality and length of life of children with DS and their families. Therefore, prenatal screening and diagnosis for DS are essential and valuable in antenatal care. Recently, machine learning methods for DS detection have become widespread. However, the existing methods, which use traditional machine learning models, usually have several limitations when facing imbalanced and missing data. This paper proposes a multi-branch CNN model combined with a feature rearrangement approach to improve the quality of DS prediction from prenatal screening data. The proposed feature rearrangement approach utilizes Pearson correlation testing and feature grouping to create a proper arrangement for the CNN model. Despite the imbalanced data and a high proportion of missing values, the experiments show promising results with a Recall of 0.9023, F1-score of 0.8969, and balanced accuracy of 0.9314. These results outperform several traditional machine learning and attention-based deep learning models.
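The Pearson-correlation-driven feature rearrangement idea (order features so that strongly correlated ones sit next to each other before feeding a 1-D CNN) can be illustrated with a small sketch. The greedy nearest-correlation ordering below is an assumption for illustration; it does not reproduce the paper's actual grouping procedure.

```python
# Sketch: order features so correlated ones become adjacent (greedy heuristic,
# an illustrative assumption, not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([
    base + rng.normal(0.0, 0.1, (100, 1)),  # feature 0
    rng.normal(size=(100, 1)),              # feature 1 (independent)
    base + rng.normal(0.0, 0.1, (100, 1)),  # feature 2 (correlated with 0)
])

corr = np.abs(np.corrcoef(X, rowvar=False))  # pairwise |Pearson correlation|
order = [0]
remaining = {1, 2}
while remaining:
    # greedily append the feature most correlated with the last one placed
    nxt = max(remaining, key=lambda j: corr[order[-1], j])
    order.append(nxt)
    remaining.remove(nxt)
print(order)  # [0, 2, 1] -- the two correlated features end up adjacent
```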
... Alptekin et al. [14] collected data with indices taken from the triple screening test during the second trimester. In the pre-processing step, they used vanilla SMOTE not only for class balancing but also for generating extra instances of the HRDS class. ...
Conference Paper
Full-text available
Down's Syndrome (DS) is one of the most prevalent types of chromosomal abnormality. Developmental delays and physical and mental disabilities result from this syndrome, so it must be detected as soon as possible. However, Down syndrome screening data tend to have a large overall data pool with a small proportion of positive cases, leading to an imbalanced class problem that causes classifiers to become biased. Moreover, the scarcity of such data is another problem, especially in Vietnam, as these data are often private and challenging to collect correctly. This study utilizes synthetic data generation methods to maximize the detection rate of some traditional classification models. The final results indicate that it is possible to improve the Down syndrome prediction quality of the classifiers by adequately incorporating SMOTE-based and GAN-based methods to generate synthetic data.
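The SMOTE idea used in these studies (synthesize minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours) can be sketched in a few lines. This is a simplified re-implementation for illustration; the studies used library implementations, alongside GAN-based generators.

```python
# Minimal SMOTE-style over-sampler (simplified illustration, not a library).
import numpy as np

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating a random minority
    sample toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        d = np.linalg.norm(minority - minority[i], axis=1)  # distances to all
        d[i] = np.inf                                       # exclude itself
        neighbours = np.argsort(d)[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        out.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(out)

# Hypothetical minority-class rows (e.g. MoM-like marker triples).
minority = np.array([[0.70, 2.10, 0.60],
                     [0.80, 1.90, 0.70],
                     [0.60, 2.30, 0.65],
                     [0.75, 2.00, 0.70]])
synthetic = smote(minority, n_new=8)
print(synthetic.shape)  # (8, 3)
```

Each synthetic point lies on a segment between two real minority points, so the new samples stay inside the minority class's local region of feature space.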
Chapter
Smart ultrasound imaging combined with soft-computing analysis is a cutting-edge approach to medical diagnosis that promises to make healthcare more accurate, efficient, and accessible. Combining ultrasound technology with computational methods is transforming medical image analysis. Smart ultrasound imaging uses advanced tools, such as high-resolution sensors and real-time processing, to produce clear, fast images of anatomical structures. It enables machines to find patterns, anomalies, and subtle abnormalities that human observers might miss, continuously improving diagnostic accuracy and reliability. In obstetrics, it helps clinicians detect fetal problems early in pregnancy so that they can intervene quickly and provide better prenatal care. It also improves visualization of cardiac function and detection of abnormalities, supporting more accurate evaluations and personalized treatment plans. This makes good healthcare accessible to more people, especially in resource-limited settings, and has the potential to improve medicine and patient care worldwide.
Chapter
Detecting birth defects early is important for timely intervention. Machine learning (ML) methods promise to make recognition more accurate and faster. Convolutional neural networks (CNNs) and gradient boosting machines (GBMs) are used to find complex patterns that could point to early birth defects. A variety of data sources, such as fetal ultrasound images, genetic data, maternal health records, and demographic data, are used in this study. The ML models are trained on labelled data containing confirmed diagnoses of birth defects and are validated using strict cross-validation methods. The proposed method focuses not only on classification but also on learning about the biological processes that cause birth defects. An easy-to-use interface can be designed for healthcare professionals. Initial results show that the proposed method can correctly detect neural tube defects and fetal cardiac and chromosomal abnormalities, enabling healthcare workers to offer timely guidance and support to expectant parents.
Article
Industry 4.0 technologies have revolutionized how care is provided to patients and how operations in hospitals are conducted. The Sustainable Development Goal (SDG) 3.1 aims to ensure access to healthcare facilities and a reduction in the global maternal mortality ratio to less than 70 per 100 000 live births. This paper investigates the state of the art in the adoption of Industry 4.0 technologies in maternal healthcare: how they are transforming methods of treatment, diagnosis and monitoring of pregnancy, and how they are reorganizing the management and organization of healthcare systems. A systematic literature review was carried out with 43 papers that met specified inclusion and exclusion criteria. It was found that most of the research focus is on the provision of solutions for low- to medium-income countries that are still lagging behind in reducing maternal and child mortality rates even though there has been an advancement in terms of Industry 4.0 technology use. The research was largely quantitative in nature with models and frameworks developed as opposed to prototypes. It mainly combined the Internet of things (IoT) with Cloud computing and Big data analytics (BDA). Artificial intelligence (AI) in maternal healthcare was similarly trending. Only one paper focused on the use of blockchain in facilitating health information exchange of maternal records. Recommendations for policy and a future research agenda are made.
Article
Full-text available
Detection of breast cancer is the preliminary phase in cancer diagnosis, so classifiers with higher accuracy are always desired. A classifier with high accuracy offers very little chance of wrongly classifying a cancer patient. This research investigates the performance of a modified and improved version of the hypothesis used in logistic regression. Both gradient descent and advanced optimization techniques are used for the minimization of the cost function. A weighting factor β is assigned in the hypothesis, which is a sigmoid function. The dependency of the weighting factor on the number of features, the size of the dataset, and the type of optimization technique used is examined. The accuracy of breast cancer detection is improved significantly by appropriately choosing the value of β, which is a function of both the number of features and the type of optimization technique used. The obtained results are promising, providing a significant increase in accuracy, sensitivity, and specificity.
Article
Full-text available
Highly tensile manganese steel is in great demand owing to its high tensile strength under shock loads. All workpieces are produced through casting, because the material is highly difficult to machine. The probabilistic aspects of its casting, its variable composition, and the different casting techniques must all be considered to optimise its mechanical properties. A hybrid strategy is therefore proposed which combines decision trees and artificial neural networks (ANNs) into accurate and reliable prediction models for ore crushing plate lifetimes. The strategic blend of these two high-accuracy prediction models is used to generate simple decision trees which can reveal the main dataset features, thereby facilitating decision-making. Following a complexity analysis of a dataset with 450 different plates, the best model consisted of 9 different multilayer perceptrons, the inputs of which were only the Fe and Mn plate compositions. The model recorded a low root mean square error (RMSE) of only 0.0614 h for the lifetime of the plate: a very accurate result considering the varied lifetimes, between 746 and 6902 h, in the dataset. Finally, the use of these models under real industrial conditions is presented in a heat map, namely a 2D representation of the main manufacturing process inputs with a colour scale showing the predicted output, i.e. the expected lifetime of the manufactured plates. Thus, the hybrid strategy extracts core training-dataset information into high-accuracy prediction models, merging the different capabilities of two families of machine-learning algorithms. It provides a high-accuracy industrial tool for predicting the full lifetime of highly tensile manganese steel plates. The results yielded a precise prediction (RMSE of 0.061 h) of the full lifetime of light, medium, and heavy crusher plates manufactured with the three casting methods: experimental, classic, and the new highly efficient method.
Article
Full-text available
Malignant mesothelioma (MM) is a very aggressive tumor of the pleura. MM in humans results from exposure to asbestos and asbestiform fibers. The incidence of MM is extremely high in some Turkish villages. Using computationally efficient data mining (DM) techniques, classification procedures were performed for MM disease diagnosis. The support vector machine (SVM) achieved promising results, outperforming the multilayer perceptron ensemble (MLPE) neural network method. SVM was the best classifier, with 99.87% accuracy obtained via 10-fold cross-validation in 5 runs, compared with the MLPE neural network, which gives 99.56% classification accuracy. Sensitivity analysis was performed to find the important inputs for MM disease diagnosis under the SVM model. Alkaline phosphatase (ALP) ranging from 300 to 500 gives the maximum probability of having MM disease. The MM disease dataset was prepared from a faculty of medicine's database using new patients' hospital reports from the southeast region of Turkey.
Article
Full-text available
The Naive Bayes (NB) learning algorithm is simple and effective in many domains, including text classification. However, its performance depends on the accuracy of the estimated conditional probability terms. These terms are sometimes hard to estimate accurately, especially when the training data are scarce. This work transforms the probability estimation problem into an optimization problem and exploits three metaheuristic approaches to solve it: Genetic Algorithms (GA), Simulated Annealing (SA), and Differential Evolution (DE). We also propose a novel DE algorithm that uses multi-parent mutation and crossover operations (MPDE) and three different methods to select the final solution. We create an initial population by manipulating the solution generated by a method used for fine-tuning the NB. We evaluate the proposed methods by using their resulting solutions to build NB classifiers and compare their results with those of classical NB and the Fine-Tuning Naive Bayes (FTNB) algorithm on 53 UCI benchmark data sets. We name the obtained classifiers NBGA, NBSA, NBDE, and NB-MPDE, respectively. We also evaluate the performance of NB-MPDE for text classification using 18 text-classification data sets and compare its results with those of FTNB, BNB, and MNB. The experimental results show that using DE in general, and the proposed MPDE algorithm in particular, is more suitable for fine-tuning NB than all the other methods, including the other two metaheuristic methods (GA and SA). They also indicate that NB-MPDE achieves superiority over classical NB, FTNB, NBDE, NBGA, NBSA, MNB, and BNB.
Article
Full-text available
The objective of the current study is to examine the potential value of using machine learning techniques, such as artificial neural network (ANN) schemes, for the non-invasive estimation, at 11-13 weeks of gestation, of the risk for euploidy, trisomy 21 (T21) and other chromosomal aneuploidies (O.C.A.) from suitable sonographic and biochemical markers and other relevant data. A database of 51,208 singleton pregnancy cases undergoing first-trimester screening for aneuploidies was used for building, training and verifying the proposed method. From all the data collected for each case from the mother and the fetus, the following nine are considered by the collaborating obstetricians as the most relevant to the problem in question: maternal age, previous pregnancy with T21, fetal crown-rump length, serum free β-hCG in multiples of the median (MoM), PAPP-A in MoM, nuchal translucency thickness, nasal bone, tricuspid flow and ductus venosus flow. The dataset was randomly divided into a training set, used to guide the development of various ANN schemes, support vector machines and k-nearest neighbours models, and an evaluation set, used to determine the performance of the developed systems. The evaluation set, totally unknown to the proposed system, contained 16,898 cases of euploid fetuses, 129 cases of T21 and 76 cases of O.C.A. The best results were obtained by the ANN system, which identified correctly all T21 cases, i.e. a 0% false negative rate (FNR), and 96.1% of euploidies, i.e. a 3.9% false positive rate (FPR), meaning that no child would have been born with T21 if only that 3.9% of all pregnancies had been sent for invasive testing. The aim of this work is to produce a practical tool for the obstetrician which will ideally provide a 0% FNR and recommend the minimum possible number of cases for further testing such as invasive procedures. In conclusion, it was demonstrated that ANN schemes can provide effective early screening for fetal aneuploidies at a low FPR, with results that compare favorably to those of existing systems.
Article
Full-text available
Purpose – The task of identifying activity classes from sensor information in a smart home is very challenging because of the imbalanced nature of such data sets, where some activities occur more frequently than others. Probabilistic models such as the Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are commonly employed for this purpose. The paper aims to discuss these issues. Design/methodology/approach – In this work, the authors propose a robust strategy combining the Synthetic Minority Over-sampling Technique (SMOTE) with Cost-Sensitive Support Vector Machines (CS-SVM), with adaptive tuning of the cost parameter, in order to handle the imbalanced data problem. Findings – The results have demonstrated the usefulness of the approach through comparison with state-of-the-art approaches, including HMM, CRF, traditional C-Support Vector Machines (C-SVM), and Cost-Sensitive SVM (CS-SVM), for classifying activities using binary and ubiquitous sensors. Originality/value – Performance metrics in the experiment/simulation include Accuracy, Precision/Recall, and F-measure.
Article
Full-text available
Down syndrome is the most common autosomal chromosome aneuploidy. The prenatal Down syndrome screening protocol has been known in Taiwan for the past 20 years. The maternal serum double markers required for the screening test were first introduced into the general prenatal check-up in 1994, when the test had around a 60% detection rate at a 5% false positive rate. The first-trimester combined test was started in 2005, and the maternal serum quadruple test was introduced in 2008 to replace the previous double test. The overall detection rate for the current screening strategies (first-trimester combined or second-trimester quadruple test) in Taiwan ranges between 80% and 85% at a fixed 5% false positive rate. Noninvasive prenatal testing (NIPT) is the latest powerful fetal aneuploidy detection method and has been commercially available in Taiwan since 2013. The sensitivity and specificity of NIPT are very high (both over 99%) according to large worldwide studies. Our preliminary data for NIPT from 11 medical centers in Taiwan have also shown a 100% detection rate for Down syndrome and Edwards syndrome, respectively. Invasive chromosome studies such as amniocentesis or chorionic villus sampling cannot be replaced by NIPT, and all prenatal screening and NIPT results require confirmation using invasive testing. This review discusses the Down syndrome screening method assessments and the progress of NIPT in Taiwan.
Article
Full-text available
This paper reviews methods for fixing the number of hidden neurons in neural networks from the past 20 years, and proposes a new method to fix the number of hidden neurons in Elman networks for wind speed prediction in renewable energy systems. Randomly selecting the number of hidden neurons can cause either overfitting or underfitting; this paper proposes a solution to these problems. To fix the number of hidden neurons, 101 different criteria are tested based on statistical errors. The results show that the proposed model improves accuracy and minimizes error. The soundness of the neural network design based on the selection criterion is substantiated using a convergence theorem. To verify the effectiveness of the model, simulations were conducted on real-time wind data. The experimental results show that the proposed approach can be used for wind speed prediction with minimal error. A survey of approaches for fixing the number of hidden neurons in neural networks is provided. The proposed model is simple, has minimal error, and is efficient for fixing the number of hidden neurons in Elman networks.
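Criterion-based selection of the hidden-layer size, of the kind this review surveys, can be sketched by sweeping candidate sizes and keeping the one with the lowest validation error. The toy data, candidate sizes, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the paper's Elman-network setup or its 101 criteria.

```python
# Illustrative sweep over hidden-layer sizes (assumed setup, not the paper's
# Elman network): keep the size with the lowest validation RMSE.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 200)  # noisy 1-D target
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

errors = {}
for h in (2, 4, 8, 16):
    model = MLPRegressor(hidden_layer_sizes=(h,), max_iter=5000,
                         random_state=0).fit(X_tr, y_tr)
    errors[h] = float(np.sqrt(np.mean((model.predict(X_va) - y_va) ** 2)))

best = min(errors, key=errors.get)  # smallest validation error wins
print(best, round(errors[best], 3))
```

Too few neurons underfit and too many overfit, so validation error, rather than a random choice, decides the size.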
Article
Full-text available
Background & objectives: The triple test as a prenatal screening procedure does not form a part of routine health care for pregnant women in India. Hence, median values of the triple test biomarkers are lacking for the Indian population. This study was undertaken to establish population-specific medians for the biomarkers alpha-foetoprotein (AFP), human chorionic gonadotropin (hCGβ), and unconjugated estriol (uE3) for the detection of Down's syndrome, Edward's syndrome and neural tube defects (NTDs) in pregnant women in north-west India. Methods: Serum biomarker values were derived from 5420 pregnant women between 15-20 wk of gestation who were enrolled for triple test investigations at the Department of Gynecology and Obstetrics, Government Medical College and Hospital, Chandigarh, India, between January 2007 and December 2009. Median values were calculated for rounded weeks using a database comprising pregnancies with normal outcomes only. Simple statistical analysis and log-linear regression were used for median estimation of the biomarker values. Results: The levels of the three biomarkers ranged from 1.38 to 187.00 IU/ml for AFP, 1.06 to 315 ng/ml for hCGβ, and 0.25 to 28.5 nmol/l for uE3. The age of the women ranged from 18 to 47 yr, and the mean weight was 57.9 ± 9.8 kg. The data revealed that the AFP, hCGβ and uE3 medians in our study population were not significantly different from those reported from other countries or when compared ethnically. Interpretation & conclusion: The population-specific median values for the three biomarkers (AFP, hCGβ, uE3) may be used as reference values during prenatal screening in Indian pregnant women.
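The median-based normalisation behind such screening programmes expresses each raw marker value as a multiple of the median (MoM) for the same gestational week. A minimal sketch, with made-up AFP values (the study's actual medians are not reproduced here):

```python
# MoM normalisation sketch: divide a raw marker value by the week-specific
# median from normal-outcome pregnancies. All numbers below are hypothetical.
import statistics

# Hypothetical AFP values (IU/ml) from normal-outcome pregnancies,
# grouped by completed gestational week.
afp_by_week = {
    16: [28.1, 30.5, 32.0, 29.4, 31.2],
    17: [33.0, 35.6, 34.1, 36.2, 32.8],
}
medians = {wk: statistics.median(vals) for wk, vals in afp_by_week.items()}

def afp_mom(value, week):
    """Raw AFP value expressed as a multiple of that week's median."""
    return value / medians[week]

print(round(afp_mom(15.25, 16), 2))  # 0.5 MoM: low AFP, a DS-associated pattern
```

Because the medians are population-specific, using medians from another population shifts every MoM value, which is precisely the bias this study addresses.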
Conference Paper
Full-text available
In classification, when the distribution of the training data among classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are difficult to recognize fully. In this paper, a method is proposed to enhance the classification accuracy for the minority classes. The proposed method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist the classification of imbalanced data, several classification algorithms have been used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class were obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combination techniques can improve performance for the class imbalance problem.
Article
Full-text available
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
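The over-plus-under-sampling combination this abstract argues for can be sketched as follows. Random duplication stands in for synthetic over-sampling here, so this is a simplified illustration of the rebalancing step rather than the SMOTE algorithm itself.

```python
# Sketch: over-sample the minority class and under-sample the majority class
# to a common size (random duplication used for simplicity, not SMOTE).
import numpy as np

def rebalance(X, y, seed=0):
    """Return a class-balanced resample: minority duplicated up,
    majority randomly thinned down, both to the averaged class size."""
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == 1)
    idx_maj = np.flatnonzero(y == 0)
    target = (len(idx_min) + len(idx_maj)) // 2
    up = rng.choice(idx_min, size=target, replace=True)     # over-sample minority
    down = rng.choice(idx_maj, size=target, replace=False)  # under-sample majority
    keep = np.concatenate([up, down])
    return X[keep], y[keep]

X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)  # 4:1 imbalance
Xb, yb = rebalance(X, y)
print(yb.mean())  # 0.5 -- classes now balanced
```

SMOTE improves on the duplication step by interpolating new minority points instead of repeating existing ones, which the paper shows gives better ROC-space performance.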
Article
The availability of a constant electricity supply is a crucial factor in the performance of any industry. Nevertheless, the unstable supply of electricity in Cameroon has led to countless periods of electricity load shedding, forcing the telecom industry to rely on backup power supplies such as diesel generators. The fuel consumption of these generators remains a challenge owing to perturbations in the mechanical fuel-level gauges and a lack of maintenance at the base stations, resulting in fuel pilferage. To overcome these effects, we detect anomalies in the recorded data by learning the patterns of fuel consumption using four classification techniques, namely support vector machines (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), and the MultiLayer Perceptron (MLP), and then compare the performance of these techniques on test data. In this paper, we show the use of supervised machine-learning classification techniques for detecting anomalies in the fuel consumption dataset from TeleInfra base stations using generators as a power source. We perform the usual feature engineering and selection, fit the model classifiers, and finally test the performance of these classifiers on test data. The results of this study show that MLP has the best performance, recording a score of 96% in the K-fold cross-validation test. In addition, because of class imbalance in the observations, we use the precision-recall curve instead of the ROC curve and record an Area Under the Curve (AUC) of 0.98.
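The model-comparison step described here (fit SVM, KNN, LR, and MLP, then score them with k-fold cross-validation) can be sketched with scikit-learn. The synthetic imbalanced dataset and all model settings below are assumptions for illustration, not the study's configuration.

```python
# Sketch of the four-classifier comparison with k-fold cross-validation
# (synthetic imbalanced data; all model settings are assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6,
                           weights=[0.8, 0.2], random_state=0)
models = {
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
# Mean 5-fold accuracy per model; with imbalance like this, the study
# additionally judged models on the precision-recall curve.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
print(scores)
```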
Article
Interoperable ontologies already exist in the biomedical field, enabling scientists to communicate with minimal ambiguity. Unfortunately, ontology languages in the semantic web, such as OWL and RDF(S), are based on crisp logic and thus cannot handle uncertain knowledge about an application field, which is unsuitable for the medical domain. In this paper, we focus on modeling incomplete knowledge in classical OWL ontologies using Bayesian networks, while keeping the semantics of the original ontology, and applying parameter-learning algorithms in order to generate the Bayesian networks. We use the EM algorithm to learn the conditional probability tables of the different nodes of the Bayesian network automatically, in contrast to Bayesian network tools where probabilities are inserted manually. To validate our work, we applied our model to the diagnosis of liver cancer, using a classical ontology containing incomplete instances, in order to handle uncertain medical knowledge when predicting liver cancer.
Article
POSTPRINT AVAILABLE IN: https://hdl.handle.net/10481/87759 The characterization of building thermal behaviour is crucial to achieving the low-carbon objectives of the European Union by 2050. In this regard, thermal transmittance is being established as a significant factor among the thermophysical properties of the envelope. In existing buildings, the theoretical calculation has several limitations, given the deterioration of the elements and the constraints of non-destructive techniques, so many experimental methods obtain more representative results. The experimental method of ISO 9869–1 is the most standardized, although it presents limitations in the heat-flux measurement; the thermometric method instead obtains the thermal transmittance from the surface temperature. This research focuses on evaluating thermal transmittance based on ISO 9869–1 (the average method and the average method with correction for storage effects) but using variables measured with the thermometric method. For this purpose, multilayer perceptrons were used as post-processing techniques. The models were trained on a dataset of 22,820 simulated tests of representative walls of the building stock in Spain. The coefficient of determination was greater than 98% in both analysis approaches. Individual models were also generated for each building period, because the period significantly influenced the input variables. The results showed that thermal transmittance values can be obtained without measuring the heat flux, and the error associated with using tabulated values for the total internal heat transfer can be removed. This research would support a high assessment rate of buildings, establishing adequate energy conservation measures to improve their energy performance.
Article
POSTPRINT AVAILABLE IN: https://hdl.handle.net/10481/87752 Many studies are focused on the diagnosis of fuel poverty. However, its prediction before occupying households is a developing research area. This research studies the feasibility of implementing the Fuel Poverty Potential Risk Index (FPPRI) in different climate zones of Chile by means of regression models based on artificial neural networks (ANNs). A total of 116,640 representative case studies were carried out in the three cities with the largest population in Chile: Santiago, Concepción, and Valparaiso. Apart from energy price (EP) and income (IN), 9 variables related to the morphology of the building were considered in approach 1. Furthermore, approach 2 was developed by including comfort hours (NCH). A total of 84 datasets were combined considering both approaches and the 5 most unfavourable deciles according to the income level of Chilean families. The results of both approaches showed a better performance in the use of individual models for each climate (MLPC, MLPS, and MLPV), and the dataset with all deciles (Full) could be used. Regarding the influence of the input variables on the models, IN was the most determinant, and NCH becomes important in approach 2. The potential of using this methodology to allocate social housing would guarantee the main objective of the country: the reduction of fuel poverty in the roadmap for 2050.
Article
Problem statement: In the U.S., a safety rating is assigned to each motor carrier based on data obtained from the Motor Carrier Management Information System (MCMIS) and an on-site investigation. While researchers have identified variables associated with the safety ratings, the specific directions of the relationships are not necessarily clear. Objective: The objective of this study is to identify the relationships involved in the safety ratings of interstate motor carriers, the largest users of the U.S. transportation network. Method: Bayesian networks are used to learn these relationships from data obtained from MCMIS for a 6-year period (2007-2012). Results: Our study shows that safety rating assignment is a complex process, with only a subset of the variables having a statistically significant relationship with safety rating. They include driver out-of-service violations, weight violations, traffic violations, fleet size, total employed drivers, and passenger & general carrier indicators. Application: The findings have both immediate implications and long-term benefits. The immediate implications relate to better identification of unsafe motor carriers, and the long-term benefits pertain to policies and crash countermeasures that can enhance carrier safety.
Article
Introduction: The clinical course of chronic obstructive pulmonary disease (COPD) is marked by acute exacerbation events that increase hospitalization rates and healthcare spending. The early identification of future high-cost patients with COPD may decrease healthcare spending by informing individualized interventions that prevent exacerbation events and decelerate disease progression. Existing studies of cost prediction for other chronic diseases have applied regression and machine-learning methods that cannot capture the complex causal relationships between COPD factors. Thus, the exploration of these factors through nonlinear, high-dimensional but explainable modeling is greatly needed. Objectives: We aimed to develop a machine-learning model to identify future high-cost patients with COPD. Such a model should incorporate expert knowledge about causal relationships, and the method for estimating the model could provide more accurate predictions than other machine learning methods. Methods: We used the 2011-2013 medical insurance data of patients with COPD in a large city. The data set included demographic information and admission records. Leveraging developments in graphical modeling methods, we proposed a smooth Bayesian network (SBN) model for the prediction of high-cost individuals using medical insurance data. The modeling method incorporated some expert knowledge about causal relationships (i.e., about the Bayesian network structure). We employed a smoothing kernel based on the weighted nearest neighborhood method in the SBN model to address overfitting, case-mix effect, and data sparsity (i.e., using data about "similar patients"). Results: The proposed SBN achieved an area under the curve (AUC) of 0.80 and showed considerable improvement over the baseline machine-learning methods.
Besides confirming the known factors from the literature, we found "region" (i.e., a suburban or urban area) to be a significant factor, and that in a 3-tier system with primary, secondary and tertiary hospitals, COPD patients who had been admitted to primary hospitals were more likely to develop into future high-cost patients than patients who had been admitted to tertiary hospitals. Conclusion: The proposed SBN model not only obtained higher prediction accuracy and stronger generalizability than a number of benchmark machine-learning methods, but also used the Bayesian network to capture the complex causal relationships between different predictors by incorporating expert knowledge. Furthermore, a framework was developed to establish the relationships between exposure to historical trajectory and future outcome, which can also be applied to other temporal data to model different trajectory information and predict other outcomes.
Article
Developing an accurate forecasting model for long-term gold price fluctuations plays a vital role in future investments and decisions for mining projects and related companies. Viewed from this perspective, this paper proposes a novel model for accurately forecasting long-term monthly gold price fluctuations. This model uses a recent meta-heuristic method called whale optimization algorithm (WOA) as a trainer to learn the multilayer perceptron neural network (NN). The results of the proposed model are compared to other models, including the classic NN, particle swarm optimization for NN (PSO–NN), genetic algorithm for NN (GA–NN), and grey wolf optimization for NN (GWO–NN). Additionally, we employ ARIMA models as the benchmark for assessing the capacity of the proposed model. Empirical results indicate the superiority of the hybrid WOA–NN model over other models. Moreover, the proposed WOA–NN model demonstrates an improvement in the forecasting accuracy obtained from the classic NN, PSO–NN, GA–NN, GWO–NN, and ARIMA models by 41.25%, 24.19%, 25.40%, 25.40%, and 85.84% decrease in mean square error, respectively.
Article
Stress is inevitably experienced by almost every person at some stage of their life. A reliable and accurate measurement of stress can give an estimate of an individual's stress burden. It is necessary to take essential steps to relieve the burden and regain control for better health. Listening to music is a way that can help in breaking the hold of stress. This study examines the effect of music tracks in the English and Urdu languages on human stress level using brain signals. Twenty-seven subjects including 14 males and 13 females having Urdu as their first language, with ages ranging from 20 to 35 years, voluntarily participated in the study. The electroencephalograph (EEG) signals of the participants are recorded, while listening to different music tracks, by using a four-channel MUSE headband. Participants are asked to subjectively report their stress level using the state-trait anxiety questionnaire. The English music tracks used in this study are categorized into four genres, i.e., rock, metal, electronic, and rap. The Urdu music tracks consist of five genres, i.e., famous, patriotic, melodious, qawali, and ghazal. Five groups of features including absolute power, relative power, coherence, phase lag, and amplitude asymmetry are extracted from the preprocessed EEG signals of four channels and five bands, which are used by the classifier for stress classification. Four classifier algorithms, namely sequential minimal optimization, stochastic gradient descent, logistic regression (LR), and multilayer perceptron, are used to classify the subject's stress level into two and three classes. It is observed that LR performs well in identifying stress with the highest reported accuracy of 98.76% and 95.06% for two- and three-level classification respectively. For understanding gender, language, and genre related discriminations in stress, a t-test and one-way analysis of variance are used.
It is evident from the results that English music tracks have more influence on stress level reduction as compared to Urdu music tracks. Among the genres of both languages, no noticeable difference is found. Moreover, a significant difference is found in the scores reported by females as compared to males. This indicates that the stress behavior of females is more sensitive to music than that of males.
Article
With high-dimensional data emerging in various domains, sparse logistic regression models have gained much interest from researchers. Variable selection plays a key role in both improving the prediction accuracy and enhancing the interpretability of built models. Bayesian variable selection approaches enjoy many advantages, such as high selection accuracy and easily incorporating many kinds of prior knowledge. Because Bayesian methods generally make inference from the posterior distribution with Markov Chain Monte Carlo (MCMC) techniques, however, they become intractable in high-dimensional situations due to the large searching space. To address this issue, a novel variational Bayesian method for variable selection in high-dimensional logistic regression models is presented. The proposed method is based on the indicator model in which each covariate is equipped with a binary latent variable indicating whether it is important. A Bernoulli-type prior is adopted for the latent indicator variable. As for the specification of the hyperparameter in the Bernoulli prior, we provide two schemes to determine its optimal value so that the novel model can achieve sparsity adaptively. To identify important variables and make predictions, an efficient variational Bayesian approach is employed to make inference from the posterior distribution. The experiments conducted with both synthetic and some publicly available data show that the new method outperforms or is very competitive with some other popular counterparts.
Article
Statistical predictive models play an important role in learning analytics. In this work, we seek to harness the power of predictive modeling methodology for the development of an analytics framework in STEM student success efficacy studies. We develop novel predictive analytics tools to provide stakeholders automated and timely information to assess student performance toward a student success outcome, and to inform pedagogical decisions or intervention strategies. In particular, we take advantage of the random forest machine learning algorithm, proposing a number of innovations to identify key input thresholds, quantify the impact of inputs on student success, evaluate student success at benchmarks in a program of study, and obtain a student success score. The proposed machinery can also tailor information for advisers to identify the risk levels of individual students in efforts to enhance STEM persistence and STEM graduation success. We additionally present our predictive analytics pipeline, motivated by and illustrated in a particular STEM student success study at San Diego State University. We highlight the process of designing, implementing, validating, and deploying analytical tools or dashboards, and emphasize the advantage of leveraging the utilities of both statistical analyses and business intelligence tools in order to maximize functionality and computational capacity.
Article
As a result of some events (disasters, inheritance, disappearances, etc.), age and gender determination can be vital for people. Forensic medical institutions determine age by examining structures such as teeth and bones. In current forensic practice, age is estimated manually according to certain morphological findings on the tooth. In this study, 1313 panoramic dental images were used to automatically estimate age. Image preprocessing is applied to these images. Trapezoidal teeth are corrected in the coordinate plane to obtain more accurate and standard results. In the study, the correction process is done with an original, newly developed algorithm. Dental images are automatically and dynamically segmented, and feature vectors are created by extracting their features. The generated feature vectors are dynamic and presented as inputs to the Multilayer Perceptron Neural Network. On request, the number of inputs can be reduced. In this study, age and gender were determined from dental x-ray images with a novel, originally developed algorithm. The application is written in the C# programming language. In some tooth groups, the highest classification rate of 100% and age determination with zero error were achieved. With this study, age determination in forensic science will be more accurate.
Article
This study focuses on incomplete data classification with the support of different partial discriminant analyses. When samples contain missing values, discriminant analyses such as Principal Component Analysis and Fisher Discriminant Analysis are inapplicable. Partial discriminant analyses that measure the importance of individual dimensions for incomplete data become necessary. Partial discriminant analyses do not rely on data imputation. The analyses can select and sort dimensions (i.e., predictors) based on discriminability in incomplete data. However, the typical approach, Fisher Discriminant Ratios, may result in biased estimation due to unequal variance of classes according to the statistical literature. This study examines various partial discriminant ratios to discover effective approaches for relieving this problem by considering different variances in computation. Experiments on an open dataset were carried out during the evaluation. Comparisons included the discriminability of Partial Fisher Discriminant Ratios, Partial Welch Discriminant Ratios, and their derivatives in incomplete data classification.
Article
Screening for fetal chromosomal disorders has evolved greatly over the last four decades. Initially, only maternal age-related risks of aneuploidy were provided to patients. This was followed by screening with maternal serum analytes and ultrasound markers, followed by the introduction and rapid uptake of maternal plasma cell-free DNA-based screening. Studies continue to demonstrate that cfDNA screening for common aneuploidies has impressive detection rates with low false-positive rates. The technology continues to push the boundaries of prenatal screening as it is now possible to screen for less common aneuploidies and subchromosomal disorders. The optimal method for incorporating cfDNA screening into existing programs continues to be debated. It is important that obstetricians understand the biological foundations and limitations of this technology and provide patients with up-to-date information regarding cfDNA screening.
Article
Objective: The main goal of this study was to develop an automatic method based on supervised learning methods, able to distinguish healthy from pathologic arterial pulse wave (APW), and those two from noisy waveforms (non-relevant segments of the signal), from the data acquired during a clinical examination with a novel optical system. Materials and methods: The APW dataset analysed was composed of signals acquired in a clinical environment from a total of 213 subjects, including healthy volunteers and non-healthy patients. The signals were parameterised by means of 39 pulse features: morphologic, time domain statistics, cross-correlation features, and wavelet features. The multiclass Support Vector Machine Recursive Feature Elimination (SVM RFE) method was used to select the most relevant features. A comparative study was performed in order to evaluate the performance of the two classifiers: Support Vector Machine (SVM) and Artificial Neural Network (ANN). Results and discussion: SVM achieved a statistically significantly better performance for this problem with an average accuracy of 0.9917 ± 0.0024 and an F-Measure of 0.9925 ± 0.0019, in comparison with ANN, which reached the values of 0.9847 ± 0.0032 and 0.9852 ± 0.0031 for Accuracy and F-Measure, respectively. A significant difference was observed between the performances obtained with the SVM classifier using a different number of features from the original set available. Conclusion: The comparison between SVM and ANN allowed us to reassert the higher performance of SVM. The results obtained in this study showed the potential of the proposed method to differentiate those three important signal outcomes (healthy, pathologic and noise) and to reduce bias associated with clinical diagnosis of cardiovascular disease using APW.
Article
Objectives To examine the costs and outcomes of different screening strategies for Down Syndrome (DS) in singleton pregnancies. Study Design A decision-analytic model was developed to compare the costs and the outcomes of different prenatal screening strategies. Five strategies were compared for women under 35 years of age: 1A) triple test (TT); 2A) combined test (CT); 3A) Non-invasive Prenatal Screening Test using cell-free fetal DNA (NIPT); 4A) and 5A) NIPT as a second-step screening for high-risk patients detected by either TT or CT, respectively. For women ≥35 years of age, 1B) offering the invasive test (amniocentesis, AC) to all women and 2B) NIPT for all women were compared. Data were analyzed to obtain the outcomes, total costs, the cost per woman, and the incremental cost-effectiveness ratios (ICERs) for the screening strategies. Results Among the current strategies for women under 35 years old, CT clearly dominates TT, as it is more effective and less costly. Although the current routine practice (2A) is the least costly strategy, implementing NIPT as a second-step screening for high-risk women identified by CT (5A) would be more effective than 2A, leading to a 10.2% increase in the number of detected DS cases and a 96.3% reduction in procedure-related losses (PRL). However, its cost to the Social Security Institution, a public entity, would be 17 times higher and would increase screening costs by 1.5 times. Strategy 5A would result in an incremental cost-effectiveness ratio of 6,873,082 (PPP) US$ when compared to the current one (2A). Strategy 1B (offering AC to all women ≥35 years of age) dominates NIPT (2B), as it would detect more DS cases and would be less costly. On the other hand, there would be 206 PRL associated with AC, whereas NIPT provides clear clinical benefits as there would be no PRL with NIPT. Conclusions NIPT leads to very high costs despite its high effectiveness in terms of detecting DS cases and avoiding PRL.
The cost of NIPT should be decreased; otherwise, only individuals who can afford to pay out-of-pocket could benefit. We believe that reliable, cost-effective prenatal screening policies are essential in countries with low and middle income and high birth rates.
Article
Purpose Data mining is widely considered necessary in many business applications for effective decision-making. The importance of business data mining is reflected by the existence of numerous surveys in the literature focusing on the investigation of related works using data mining techniques for solving specific business problems. The purpose of this paper is to answer the following question: What are the widely used data mining techniques in business applications? Design/methodology/approach The aim of this paper is to examine related surveys in the literature and thus to identify the frequently applied data mining techniques. To ensure the recent relevance and quality of the conclusions, the criterion for selecting related studies is that the works be published in reputed journals within the past 10 years. Findings There are 33 different data mining techniques employed in eight different application areas. Most of them are supervised learning techniques, and the application area where such techniques are most often seen is bankruptcy prediction, followed by the areas of customer relationship management, fraud detection, intrusion detection and recommender systems. Furthermore, the ten most widely used data mining techniques for business applications are the decision tree (including C4.5 decision tree and classification and regression tree), genetic algorithm, k-nearest neighbor, multilayer perceptron neural network, naïve Bayes and support vector machine as the supervised learning techniques, and association rule, expectation maximization and k-means as the unsupervised learning techniques. Originality/value The originality of this paper is to survey the past 10 years of related survey and review articles about data mining in business applications to identify the most popular techniques.
Article
Our concept of nucleic acid biology has advanced dramatically over the past two decades, with a growing appreciation that cell-free DNA (cfDNA) fragments are present in all body fluids including plasma. In no other field has plasma DNA been as rapidly translated into clinical practice as in noninvasive prenatal testing (NIPT) for fetal chromosome abnormalities. NIPT is a screening test that requires confirmation with diagnostic testing, but other applications of cfDNA provide diagnostic information and do not require invasive testing. These applications are referred to as noninvasive prenatal diagnosis (NIPD) and include determination of fetal sex, blood group and some single gene disorders. As technology advances, noninvasive tests based on cell-free nucleic acids will continue to expand. This review will outline the technical and clinical aspects of NIPT and NIPD relevant to the daily practice of maternity carers.
Article
k nearest neighbor (kNN) is one of the basic processes behind various machine learning methods. In kNN, the relation of a query to a neighboring sample is measured by a similarity metric, such as Euclidean distance. This process starts with mapping the training dataset onto a one-dimensional distance space based on the calculated similarities, and then labeling the query in accordance with the most dominant label (in classification) or the mean of the labels (in regression) of the k nearest neighbors. The number of nearest neighbors (k) is chosen according to the desired limit of success. Nonetheless, two distinct samples may have equal distances to the query but different angles in the feature space. The similarity of the query to these two samples needs to be weighted according to the angle between the query and each of the samples, so as to differentiate between the two distances using angular information. This idea can be analyzed in the context of dependency and can be utilized to increase the precision of the classifier. With this point of view, instead of kNN, the query is labeled according to its nearest dependent neighbors, which are determined by a joint function built on the similarity and the dependency. This method may therefore be called dependent NN (d-NN). To demonstrate d-NN, it is applied to synthetic datasets with different statistical distributions and to 4 benchmark datasets: the Pima Indian, Hepatitis, approximate Sinc, and CASP datasets. Results showed the superiority of d-NN in terms of accuracy and computation cost as compared to other popular machine learning methods.
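The kNN mechanics described in this abstract, and the idea of discounting a neighbor's distance by its angular agreement with the query, can be sketched in a few lines. This is a minimal stdlib-only illustration: the function names and the cosine-based discounting are assumptions chosen for clarity, not the paper's actual d-NN joint similarity/dependency function.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # angular agreement between two vectors; 0.0 for a zero vector
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def knn_predict(train, labels, query, k=3, use_angle=False):
    # rank training samples by plain Euclidean distance, or by distance
    # discounted by angular agreement with the query (a hypothetical
    # stand-in for the paper's joint similarity/dependency function)
    def score(x):
        d = euclidean(x, query)
        return d / (cosine(x, query) + 1e-9) if use_angle else d
    ranked = sorted(zip(train, labels), key=lambda s: score(s[0]))
    votes = Counter(lbl for _, lbl in ranked[:k])
    return votes.most_common(1)[0][0]
```

With `use_angle=True`, two samples equidistant from the query are no longer tied: the one pointing in a direction closer to the query's receives the lower score, which is the intuition the abstract describes.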
Article
Credal Decision Trees (CDTs) are algorithms for designing classifiers based on imprecise probabilities and uncertainty-based information measures. These algorithms are suitable when noisy data sets are classified. This fact is explained in this paper in terms of the split criterion used in the new procedure of a special type of CDT, called Credal-C4.5. This criterion is more robust to noise than the split criteria of classic decision trees. On the other hand, all CDTs depend on a parameter s that has a direct relation with the size of the tree built. The known C4.5 and Credal-C4.5 procedures are equivalent when a particular value of s is used. The relation between the value of this parameter s and the capacity of Credal-C4.5 to classify data sets with label noise is studied. In addition, experimental research is carried out in which Credal-C4.5 with different values of s is compared on data sets with distinct label noise levels. From this experimentation we can conclude that, if it is possible to estimate the noise level of a data set, choosing the correct value for s is a key point in order to obtain notably better results.
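The parameter s in credal trees comes from the imprecise Dirichlet model (IDM), which replaces the point estimates n_i/N with intervals whose lower bounds are n_i/(N+s). One common ingredient of such split criteria is the maximum-entropy distribution inside the IDM credal set, which can be found by "water-filling" the free mass s/(N+s) onto the least frequent classes. The sketch below is an illustrative reconstruction of that step under these stated assumptions, not the authors' actual Credal-C4.5 code.

```python
def idm_max_entropy_probs(counts, s=1.0):
    # Imprecise Dirichlet Model lower bounds: p_i >= n_i / (N + s).
    # The maximum-entropy member of the credal set is reached by
    # water-filling the free mass s / (N + s) onto the classes with
    # the smallest lower probabilities, levelling them upward.
    N = sum(counts)
    probs = [n / (N + s) for n in counts]
    mass = s / (N + s)
    while mass > 1e-12:
        m = min(probs)
        low = [i for i, p in enumerate(probs) if abs(p - m) < 1e-12]
        higher = [p for p in probs if p > m + 1e-12]
        # raise the minimum entries either to the next level up or
        # until the remaining mass is exhausted
        target = min(higher) if higher else m + mass / len(low)
        need = (target - m) * len(low)
        add = min(mass, need) / len(low)
        for i in low:
            probs[i] += add
        mass -= min(mass, need)
    return probs
```

Larger s widens the intervals and pushes the resulting distribution toward uniform, which is why s directly controls how conservative (and how small) the resulting tree is.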
Article
Background: Noninvasive prenatal screening has become an increasingly prevalent choice for women who desire aneuploidy screening. Although the test characteristics are impressive, some women are at increased risk for noninvasive prenatal screen failure. The risk of test failure increases with maternal weight; thus, obese women may be at elevated risk for failure. This risk of failure may be mitigated by the addition of a paternal cheek swab and screening at a later gestational age. Objective: The purpose of this study was to evaluate the association among obesity, gestational age, and paternal cheek swab in the prevention of screening failure. Study Design: A retrospective cohort study was performed for women who were ≥35 years old at delivery who underwent screening at NorthShore University HealthSystem, Evanston, IL. Maternal weight, body mass index, gestational age, and a paternal cheek swab were evaluated in univariate and multivariable logistic regression analyses to assess the association with failed screening. Results: Five hundred sixty-five women met inclusion criteria for our study. The mean body mass index was 25.9 ± 5.1 kg/m2; 111 women (20%) were obese (body mass index, ≥30 kg/m2). Forty-four women (7.8%) had a failed screen. Obese women had a failure rate of 24.3% compared with 3.8% in nonobese women (P < .01). Gestational age was not associated with failure rate (mean ± standard deviation, 13 ± 3 weeks for both screen failure and nonfailure; P = .76). The addition of a paternal cheek swab reduced the failure rate from 10.2% in women with no swab to 3.8% in women with a swab (P < .01). In multivariable analysis, obesity and lack of a paternal cheek swab were independent predictors of screen failure (odds ratio, 9.75; 95% confidence interval, 4.85-19.61; P < .01; and odds ratio, 3.61; 95% confidence interval, 1.56-8.33; P < .01, respectively). 
Conclusion: The addition of a paternal cheek swab significantly improved noninvasive prenatal screen success rates in obese women. However, delaying testing to a later gestational age did not.
Chapter
Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a “no-call.” Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA.
Article
We use a 1993 policy change in Israel's public healthcare system that lowered the eligibility age for amniocentesis to 35 to study the effects of financing of screening tests. Financing is found to have increased amniocentesis testing by about 35%. At ages above the eligibility threshold, utilization rates rose to roughly 33%, reflecting nearly full take-up among prospective users of amniocentesis. Additionally, whereas below the age-35 threshold amniocentesis utilization rates increase with maternal age, this relation is muted above this age. Finally, no evidence is found that financing affects outcomes such as pregnancy terminations and births of children with Down syndrome. These results support the view that women above the eligibility threshold tend to refrain from acquiring inexpensive information about their degree of risk that, absent the financing, they would acquire, and instead undergo the accurate and costly test regardless of the additional information that noninvasive screening would provide.
Article
Non-invasive prenatal screening (NIPS) for fetal chromosome defects has high sensitivity and specificity but is not fully diagnostic. In response to a desire to provide more information to individual women with positive NIPS results, two on-line calculators have been developed to calculate post-test risks. Use of these calculators is critically reviewed. There is a mathematically dictated requirement for a precise estimate of the specificity in order to provide an accurate post-test risk. This is illustrated by showing that a 0.1% decrease in the specificity values for trisomies 21, 18 and 13 can reduce the post-test risks from 79% to 64% for trisomy 21, 39% to 27% for trisomy 18, and 21% to 13% for trisomy 13, respectively. Use of the calculators assumes that sensitivity and specificity are constant for all women receiving the test, but there is evidence that discordancy between screening results and true fetal karyotype is more common for older women. Use of an appropriate value for the prior risk is also important, and for rare disorders there is considerable uncertainty regarding prevalence. For example, commonly used rates for trisomy 13, monosomy-X, triploidy and 22q11.2 deletion syndrome can vary by more than 4-fold, and this can translate into large differences in post-test risk. When screening for rare disorders, it may not be possible to provide a reliable post-test risk if there is uncertainty over the false-positive rate and/or prevalence. These limitations, per se, do not negate the value of screening for rare conditions. However, counselors need to carefully weigh the validity of post-test risks before presenting them to patients. Additional epidemiologic and NIPS outcome data are needed.
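The sensitivity of the post-test risk to small specificity changes follows directly from Bayes' theorem. The sketch below uses hypothetical prior, sensitivity, and specificity values chosen only to reproduce the shape of the effect; they are not the figures from the paper.

```python
def post_test_risk(prior, sensitivity, specificity):
    # positive predictive value via Bayes' theorem:
    # P(affected | positive) = sens*prior / (sens*prior + (1-spec)*(1-prior))
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# hypothetical inputs: with a prior risk of 1 in 500, a 0.1% drop in
# specificity moves the post-test risk from about 95% to about 65%
high = post_test_risk(prior=0.002, sensitivity=0.999, specificity=0.9999)
low = post_test_risk(prior=0.002, sensitivity=0.999, specificity=0.9989)
```

Because the false-positive term (1 - specificity) * (1 - prior) is multiplied by a near-1 factor while the true-positive term is multiplied by a tiny prior, even a 0.001 change in specificity swings the denominator, which is exactly the fragility the review highlights.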
Article
Prenatal diagnostic testing is available for a growing number of disorders. The goal of prenatal diagnosis was initially focused on the identification of Down syndrome in women aged 35 years and older, but invasive prenatal genetic techniques can now detect a far broader array of conditions. The risks of invasive procedures have also decreased over time. Advances in genomic medicine allow testing for smaller but significant chromosomal abnormalities known as copy number variants, in addition to major aneuploidies and structural rearrangements. Molecular DNA techniques can detect many single-gene conditions. In the future, it is likely that whole-exome and whole-genome sequencing will be applied to prenatal genetic testing to allow identification of yet more genetic disorders. With advances in technology, the indications for testing have likewise evolved far beyond recommendations based solely on maternal age to include a more patient-centered view of the goals of prenatal testing.
Article
Patients and providers are faced with a wide array of choices to screen for structural abnormalities, aneuploidy, and genetic diseases in the prenatal period. It is important to consider the features of the diseases being screened for, the characteristics of the screening tests used, and the population being screened when evaluating prenatal screening techniques.
Article
This study develops a novel degradation assessment index (DAI) from acoustic emission signals obtained from slow rotating bearings and integrates it into alternative Bayesian methods for the prediction of remaining useful life (RUL). The DAI is obtained by the integration of polynomial kernel principal component analysis (PKPCA), a Gaussian mixture model (GMM), and an exponentially weighted moving average (EWMA). The DAI is then used as input to several Bayesian regression models, such as the multilayer perceptron (MLP), radial basis function (RBF), Bayesian linear regression (BLR), Gaussian mixture regression (GMR), and Gaussian process regression (GPR) for RUL prediction. The combination of the DAI with the GPR model, otherwise known as the DAIGPR, gave the best prediction with the least error. The findings show that the GPR model is suitable and effective in the prediction of RUL of slow rotating bearings and robust to varying operating conditions. Further, the findings remain robust when the training and test sets are obtained from dependent and independent samples. Therefore, the GPR model is found useful for monitoring the condition of machines in order to implement effective preventive rather than reactive maintenance, thereby maximizing safety and asset availability.
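Of the three ingredients combined into the DAI, the EWMA smoothing step is the simplest to make concrete. A minimal sketch, assuming the standard recursion z_t = lam * x_t + (1 - lam) * z_{t-1} initialized at the first observation (the paper's smoothing constant and initialization are not specified here):

```python
def ewma(series, lam=0.2):
    # exponentially weighted moving average:
    # z_t = lam * x_t + (1 - lam) * z_{t-1}, initialized at the first value
    z = series[0]
    smoothed = []
    for x in series:
        z = lam * x + (1.0 - lam) * z
        smoothed.append(z)
    return smoothed
```

A constant series passes through unchanged, while a step change is absorbed gradually at a rate set by lam, which is why EWMA is a common choice for turning a noisy per-sample health indicator into a monotone-looking degradation trend.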
Article
During radiotherapy treatment of patients with head-and-neck cancer, it has been observed that the parotid glands may shrink, which is connected with an increased risk of acute toxicity. In this context, the early identification of at-risk patients is of primary importance, in order to treat them with adaptive therapy. This work studies different approaches for classifying parotid gland samples, taking into account textural features extracted from computed tomography (CT) images of monitored patients. A real dataset is used, and accuracy, sensitivity, and specificity are used as classification performance measures. First, different procedures to define classes are compared in terms of their physical meaning and classification performance. Then, different methods for extracting knowledge from the dataset are implemented and compared in terms of performance and model interpretability. First-rate performance was obtained by using Likelihood-Fuzzy Analysis (LFA), a recently developed method based on the use of statistical information by means of Fuzzy Logic. The interpretable models extracted with LFA also allow identifying, among the textural features, those able to predict parotid shrinkage. Some of these features are already known and are confirmed here, others are new, and some of them are very early predictors. Finally, an example of textural feature monitoring and classification of a patient is presented, through a reasoning scheme similar to human reasoning, based on the interpretation of simple rule-based models using linguistic variables.
Article
Average surface roughness (Ra) is an important measure of the quality of a machined workpiece: the lower the Ra value, the higher the workpiece quality, and vice versa. It is therefore desirable to develop mathematical models that can predict the minimum Ra value and the associated machining conditions that lead to it. In this paper, real experimental data from an end milling process is used to develop models for predicting the minimum Ra value. Two techniques, Model Tree and Sequential Minimal Optimization based Support Vector Machine, which have not been used before to model surface roughness, were applied to the training data to build prediction models. The developed models were then applied to the test data to determine the minimum Ra value. Results indicate that the two techniques reduced the minimum Ra value of the experimental data by 4.2% and 2.1%, respectively. Model trees are found to be better than the other approaches in predicting the minimum Ra value.
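The predict-then-minimize workflow can be sketched with a deliberately simplified stand-in: a linear least-squares model on synthetic milling data in place of the paper's Model Tree and SMO-SVM, followed by a grid search for the conditions giving the lowest predicted Ra (the feature names, ranges, and coefficients are invented for illustration):

```python
import numpy as np

# Synthetic end-milling data (illustrative stand-in for real experiments):
# Ra grows with feed rate and falls slightly with spindle speed.
rng = np.random.default_rng(0)
feed = rng.uniform(0.05, 0.30, 60)       # feed rate, mm/tooth (hypothetical)
speed = rng.uniform(1000.0, 3000.0, 60)  # spindle speed, rpm (hypothetical)
ra = 0.2 + 4.0 * feed - 1e-4 * speed + rng.normal(0.0, 0.01, 60)

# Least-squares surface: Ra ~ b0 + b1*feed + b2*speed
X = np.column_stack([np.ones_like(feed), feed, speed])
beta, *_ = np.linalg.lstsq(X, ra, rcond=None)

# Grid search for the machining conditions minimizing predicted Ra
f_grid, s_grid = np.meshgrid(np.linspace(0.05, 0.30, 50),
                             np.linspace(1000.0, 3000.0, 50))
pred_ra = beta[0] + beta[1] * f_grid + beta[2] * s_grid
i = np.unravel_index(np.argmin(pred_ra), pred_ra.shape)
best_feed, best_speed = f_grid[i], s_grid[i]
```

A nonlinear learner such as a model tree or SVM replaces the least-squares step when the Ra surface is not linear in the machining parameters; the grid-search step stays the same.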
Article
One of the major difficulties facing researchers using neural networks is the selection of the proper size and topology of the networks. The problem is even more complex because, often, when a neural network is trained to very small errors it may not respond properly to patterns not used in the training process. A partial solution proposed for this problem is to use the least possible number of neurons along with a large number of training patterns. The discussion consists of three main parts: first, different learning algorithms, including the Error Back Propagation (EBP) algorithm, the Levenberg-Marquardt (LM) algorithm, and the recently developed Neuron-by-Neuron (NBN) algorithm, are discussed and compared based on several benchmark problems; second, the efficiency of different network topologies, including traditional Multilayer Perceptron (MLP) networks, Bridged Multilayer Perceptron (BMLP) networks, and Fully Connected Cascade (FCC) networks, is evaluated by both theoretical analysis and experimental results; third, the generalization issue is discussed to illustrate the importance of choosing the proper size of neural networks.
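A minimal EBP example on the classic XOR benchmark shows the training loop such comparisons are built on (plain batch gradient descent with a 2-4-1 sigmoid network; the LM and NBN algorithms are not reproduced here, and the layer sizes, seed, and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR benchmark

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-4-1 MLP trained with plain error back-propagation (EBP)
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

def forward(X):
    h = sigmoid(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

_, out0 = forward(X)
mse_before = float(np.mean((out0 - y) ** 2))

for _ in range(10000):
    h, out = forward(X)
    d2 = (out - y) * out * (1 - out)        # output-layer delta
    d1 = (d2 @ W2.T) * h * (1 - h)          # hidden-layer delta
    W2 -= 0.5 * h.T @ d2; b2 -= 0.5 * d2.sum(axis=0)
    W1 -= 0.5 * X.T @ d1; b1 -= 0.5 * d1.sum(axis=0)

_, out1 = forward(X)
mse_after = float(np.mean((out1 - y) ** 2))
```

With most random initialisations the four XOR patterns end up correctly classified, but as the abstract notes, EBP can converge slowly or get trapped, which is exactly why second-order methods like LM and compact topologies like FCC are compared.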
Article
Current trends in prenatal genetic testing will affect nursing practice, education, research, and policy making. Although fetal genetic testing has been the traditional focus, new technologies open the possibility of acquiring genomic information for both parents and offspring, revealing windows onto individuals' lifelong health. Noninvasive prenatal testing of cell-free fetal DNA also has become a reality. Some of the recent advances in detecting cytogenetic and heritable molecular variants in pregnancy are overviewed. Exemplars of prenatal tests are presented and related ethical, legal, and social implications are considered. Educating clinicians with updated genomic knowledge has been outpaced by new technologies and direct-to-consumer marketing of prenatal tests. Implications for nursing are discussed.
Article
Objective: To study the effect of different government prenatal screening (PNS) policies on the uptake of PNS and prenatal diagnostic testing (PND) over the periods 2001-2003 (PNS on request), 2004-2006 (permission to offer the first-trimester combined test (FCT) to women of advanced maternal age (AMA), with women aged <36 years informed on explicit request) and 2007-2010 (introduction of population screening) and to evaluate whether trends in uptake are related to maternal age. The indication AMA for PND is still warranted, and the costs for FCT are only reimbursed for AMA women. Study design: Analysis of data on the first- and second-trimester screening program (n=41,600) for Down syndrome (DS) and on PND (n=10,795) performed from 2001 to 2010 in the region North-Holland of the Netherlands. To evaluate the actual participation in PNS and PND in different maternal age groups, estimation of the age distribution of women who underwent a fetal anomaly scan in 2009 (n=14,481) was used as a reference population (participation of 85.2%). Results: The overall uptake of FCT was 35.2% in 2010. Over the years the number of FCT in all age groups increased significantly (P<0.001). Overall the number of PND decreased significantly; the number of PND for AMA decreased and the number of PND for increased risk at FCT (in women <36 and ≥36 years) increased (P<0.05). Since 2004 significantly more DS cases were detected with FCT in AMA women and fewer with PND for AMA, and since 2007 more DS cases were detected with FCT in women <36 years (P<0.001). Conclusion: The effect of the national screening program is limited. Significantly more women opt for PNS but the overall uptake remains low, especially in younger women. A significant number of AMA women still opt for PND for AMA. The choice for FCT and PND for AMA seems dependent on background risk. To accomplish a more effective screening policy, reimbursement of the cost of the test should apply to all women and the indication for PND for AMA should be abolished.
Article
This article presents the classification of blood characteristics by a C4.5 decision tree, a naïve Bayes classifier and a multilayer perceptron for thalassaemia screening. The aim is to classify eighteen classes of thalassaemia abnormality, which have a high prevalence in Thailand, and one control class by inspecting data characterised by a complete blood count (CBC) and haemoglobin typing. Two indices, namely haemoglobin concentration (HB) and mean corpuscular volume (MCV), are the chosen CBC attributes. On the other hand, known types of haemoglobin from six ranges of retention time identified via high performance liquid chromatography (HPLC) are the chosen haemoglobin typing attributes. The stratified 10-fold cross-validation results indicate that the best classification performance, with average accuracy of 93.23% (standard deviation = 1.67%) and 92.60% (standard deviation = 1.75%), is achieved when the naïve Bayes classifier and the multilayer perceptron are respectively applied to samples which have been pre-processed by attribute discretisation. The results also suggest that the HB attribute is redundant. Moreover, the achieved classification performance is significantly higher than that obtained using only haemoglobin typing attributes as classifier inputs. Subsequently, the naïve Bayes classifier and the multilayer perceptron are applied to an additional data set in a clinical trial, resulting in accuracies of 99.39% and 99.71%, respectively. These results suggest that a combination of CBC and haemoglobin typing analysis with a naïve Bayes classifier or a multilayer perceptron is highly suitable for automatic thalassaemia screening.
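The core of the discretised naïve Bayes step can be sketched on toy data (the "low"/"normal" MCV bins, the class labels, and the counts below are all hypothetical, not values from the study):

```python
from collections import Counter, defaultdict

# Toy screening rows: (discretised MCV, class). The binning and the labels
# are invented for illustration.
samples = [("low", "carrier"), ("low", "carrier"), ("low", "normal"),
           ("normal", "normal"), ("normal", "normal"), ("normal", "normal"),
           ("low", "carrier"), ("normal", "carrier")]

def train_nb(data):
    # Class priors and per-class attribute-value counts
    prior = Counter(c for _, c in data)
    like = defaultdict(Counter)
    for x, c in data:
        like[c][x] += 1
    return prior, like, len(data)

def predict_nb(prior, like, n, x):
    # argmax_c P(c) * P(x | c), with Laplace smoothing over the two bins
    def score(c):
        return (prior[c] / n) * ((like[c][x] + 1) / (prior[c] + 2))
    return max(prior, key=score)

prior, like, n = train_nb(samples)
```

Discretisation reduces each continuous CBC attribute to a handful of bins, so the conditional probabilities become simple count ratios like the ones above.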
Article
Preconceptual and prenatal counseling recommendations have changed significantly over the past 12 months. Since January 2007, two new prenatal screening and testing practice bulletins have been published by the American College of Obstetrics and Gynecologists (ACOG).1,2 These changes are substantive and impact every clinician providing counseling to women who are pregnant or are considering pregnancy. These recommendations are also changing the prenatal experience for women; they will certainly notice the difference if this is not their first pregnancy. Clinicians must be prepared to answer patient questions, provide accurate information, and order the correct tests at the appropriate times. This article reviews the recent screening recommendations and describes the advantages and limitations of the screening tests.
Conference Paper
In recent years, mining with imbalanced data sets has received more and more attention in both theoretical and practical aspects. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, and then summarizes the evaluation metrics and the existing methods to evaluate and solve the imbalance problem. The synthetic minority over-sampling technique (SMOTE) is one of the over-sampling methods addressing this problem. Based on the SMOTE method, this paper presents two new minority over-sampling methods, borderline-SMOTE1 and borderline-SMOTE2, in which only the minority examples near the borderline are over-sampled. For the minority class, experiments show that our approaches achieve better TP rate and F-value than SMOTE and random over-sampling methods.
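The interpolation step that SMOTE performs between a minority example and one of its k nearest minority neighbours can be written directly in NumPy (a plain-SMOTE sketch on toy points; the borderline variants additionally restrict which minority examples are over-sampled):

```python
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples: pick a minority example,
    pick one of its k nearest minority neighbours, and interpolate."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # skip the point itself
        j = rng.choice(nbrs)
        gap = rng.random()                    # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

# Four minority points at the corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote(X_min, n_new=10)
```

Every synthetic point lies on a segment between two minority examples, which is both SMOTE's strength (denser minority region) and, as the LN-SMOTE abstract below notes, its weakness when those segments cross majority-class territory.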
Article
One of the keys for multilayer perceptrons (MLPs) to solve multi-class learning problems is how to make them achieve good convergence and generalization performance merely by learning small-scale subsets, i.e., a small part of the original larger-scale data sets. This paper first decomposes an n-class problem into n two-class problems, and then uses n class-modular MLPs to solve them one by one. A class-modular MLP is responsible for forming the decision boundaries of its represented class, and thus can be trained only on the samples from the represented class and some neighboring ones. When solving a two-class problem, an MLP has to face such unfavorable situations as unbalanced training data, locally sparse and weak distribution regions, and open decision boundaries. One solution is that the samples from the minority classes or in the thin regions are virtually reinforced by suitable enlargement factors. Next, the effective range of an MLP is localized by a correction coefficient related to the distribution of its represented class. In brief, this paper focuses on the formation of economic learning subsets, the virtual balance of imbalanced training sets, and the localization of the generalization regions of MLPs. The results for letter recognition and extended handwritten digit recognition show that the proposed methods are effective.
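The one-model-per-class decomposition can be sketched with a perceptron standing in for each class-modular MLP (a deliberately simplified illustration: the toy data, the use of perceptrons instead of MLPs, and the max-score decision are assumptions, not the paper's design):

```python
import numpy as np

def train_perceptron(X, t, epochs=200, lr=0.1):
    # Plain perceptron for one class-vs-rest subproblem (t in {+1, -1})
    w = np.zeros(X.shape[1] + 1)
    Xb = np.column_stack([X, np.ones(len(X))])   # append a bias input
    for _ in range(epochs):
        for x, ti in zip(Xb, t):
            if ti * (w @ x) <= 0:                # update only on mistakes
                w += lr * ti * x
    return w

def ovr_fit(X, y, classes):
    # One modular model per class, trained on its own binary targets
    return {c: train_perceptron(X, np.where(y == c, 1, -1)) for c in classes}

def ovr_predict(models, x):
    xb = np.append(x, 1.0)
    return max(models, key=lambda c: models[c] @ xb)

# Three well-separated clusters, one per class
X = np.array([[0.0, 0.0], [0.2, 0.1],      # class A
              [5.0, 5.0], [5.2, 4.9],      # class B
              [0.0, 5.0], [0.1, 5.2]])     # class C
y = np.array(["A", "A", "B", "B", "C", "C"])
models = ovr_fit(X, y, ["A", "B", "C"])
```

Each modular model only needs to separate its own class from the rest, which is what allows it, as the abstract argues, to be trained on a small subset of the full data.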
Conference Paper
In this paper we discuss problems of inducing classifiers from imbalanced data and improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE could also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with original SMOTE and its two most closely related generalizations, Borderline-SMOTE and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
Article
Since naïve Bayesian classifiers are suitable for processing discrete attributes, many methods have been proposed for discretizing continuous ones. However, none of the previous studies apply more than one discretization method to the continuous attributes in a data set for naïve Bayesian classifiers. Different approaches employ different information embedded in continuous attributes to determine the boundaries for discretization. It is likely that discretizing the continuous attributes in a data set using different methods can utilize the information embedded in the attributes more thoroughly and thus improve the performance of naïve Bayesian classifiers. In this study, we propose a nonparametric measure to evaluate the level of dependence between a continuous attribute and the class. The nonparametric measure is then used to develop a hybrid method for discretizing continuous attributes so that the accuracy of the naïve Bayesian classifier can be enhanced. This hybrid method is tested on 20 data sets, and the results demonstrate that discretizing the continuous attributes in a data set by various methods can generally achieve higher prediction accuracy.
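The idea of scoring a candidate discretization by how much class information it preserves can be sketched with empirical mutual information as the dependence measure (the paper proposes its own nonparametric measure; MI and the toy cut points here are stand-ins):

```python
import math
from collections import Counter

def mutual_info(bins, labels):
    # Empirical mutual information I(X; Y) between a discretised
    # attribute and the class label, in nats
    n = len(bins)
    pxy = Counter(zip(bins, labels))
    px, py = Counter(bins), Counter(labels)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def binize(vals, cut):
    return ["lo" if v < cut else "hi" for v in vals]

values = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1, 1.05, 3.05]
labels = ["neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos"]

# Score two candidate cut points by how much class information each keeps
mi_good = mutual_info(binize(values, 2.0), labels)   # separates the classes
mi_bad = mutual_info(binize(values, 1.1), labels)    # cuts inside one class
```

A hybrid discretizer in this spirit would generate boundaries with several methods and keep, per attribute, whichever binning scores highest on the dependence measure.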