Applying data mining techniques to improve diagnosis in neonatal jaundice

BMC Medical Informatics and Decision Making (Impact Factor: 1.83). 12/2012; 12(1):143. DOI: 10.1186/1472-6947-12-143
Source: PubMed


Hyperbilirubinemia is becoming an increasingly common problem in newborns because of shorter hospital stays after birth. Jaundice is the most common disease of the newborn and, although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has helped improve on the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice through the application of data mining techniques.

This study followed the phases of the Cross Industry Standard Process for Data Mining (CRISP-DM) model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa--EPE) from February to March 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled. Over 70 variables were collected and analyzed. In addition, transcutaneous bilirubin levels were measured from birth to hospital discharge with a noninvasive bilirubinometer, with at most 8 hours between measurements. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with those of the traditional methods for predicting hyperbilirubinemia.

Applying different classification algorithms to the collected data made it possible to predict subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life, the accuracy of the prediction of hyperbilirubinemia was 89%. The best results were obtained with the following algorithms: naive Bayes, multilayer perceptron and simple logistic.
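
The evaluation idea described above, training a classifier on different attribute subsets and comparing accuracy, can be illustrated with a minimal sketch. It is not the study's Weka workflow: the Gaussian naive Bayes implementation, the k-fold split, the helper names and the two-attribute synthetic data set are all assumptions made for illustration, and the abstract does not state how the data were partitioned for testing.

```python
import math
import random


def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-attribute mean/variance for each class."""
    model = {}
    for cls in set(labels):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*cls_rows):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(cls_rows) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def cross_val_accuracy(rows, labels, columns, k=10, seed=0):
    """k-fold accuracy using only the attribute indices listed in `columns`."""
    subset = [[r[c] for c in columns] for r in rows]
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    correct = 0
    for fold in range(k):
        test_idx = set(order[fold::k])
        train_idx = [i for i in order if i not in test_idx]
        model = fit_gaussian_nb([subset[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct += sum(predict(model, subset[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


def make_synthetic_data(n=200, seed=1):
    """Hypothetical data: one informative attribute and one pure-noise attribute.

    These values are synthetic and carry no clinical meaning.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.2 else 0
        informative = rng.gauss(9.0 if label else 3.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        rows.append([informative, noise])
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic_data()
attribute_subsets = {"informative only": [0], "noise only": [1], "both": [0, 1]}
results = {name: cross_val_accuracy(rows, labels, cols)
           for name, cols in attribute_subsets.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Comparing the printed accuracies across subsets mirrors, in miniature, how the choice of attributes can be judged by the accuracy of the resulting model. Accuracy alone can look favorable when one class dominates, which is why a noise-only subset may still score well on imbalanced data.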

The findings of our study suggest that new approaches, such as data mining, may support medical decision-making and contribute to improving the diagnosis of neonatal jaundice.



Available from: Alberto Freitas, Feb 26, 2014
    • "Data mining is a non-trivial process of identifying novel, potentially useful, and valid patterns in databases [14]. This area of computer science is characterized by the use of various statistical techniques, databases, artificial intelligence, and pattern recognition [15]. It has commonly been used in several applied fields such as marketing, fraud detection, performance, medicine and scientific research, among others, contributing to the generation of new associations and improving the results obtained with other methodologies [16]."
    ABSTRACT: Neuroticism and impulsivity are the personality variables most consistently associated with drug-dependent patients. To date, no data mining procedures have been applied to explore the differential role of personality variables in this population. The personality profile of 336 drug-dependent patients was compared with that of a sample of community participants in the context of a decision tree learning approach using the Alternative Five Factor Model. The resulting discriminant model was cross-validated. Neuroticism and impulsivity were the most relevant variables in the resulting model, but their association appeared to be hierarchically organized. In the personality characterization of these patients, neuroticism became the main discriminant dimension, whereas impulsivity played a differential role, explained by means of an interaction effect. Decision tree learning models appear to be a heuristic theoretical and empirical approximation to the study of relevant variables, such as personality traits, in drug-dependency research.
    Comprehensive psychiatry 04/2014; 55(5). DOI:10.1016/j.comppsych.2014.03.021 · 2.25 Impact Factor
  • ABSTRACT: Objective: Staging of breast cancer is one of the most important prognostic factors. However, collecting data for staging manually from unstructured free text is variable and imprecise because of the complexity of the TNM classification, the existence of different versions over time, and variability in the source used to obtain data. The aim of this study was to develop an artificial intelligence tool to allow data on tumoral staging to be mined automatically. Patients and methods: The study included the reports of the first 100 patients with nonmetastatic breast cancer treated with surgery and radiotherapy in 2012. Data on postoperative tumor size (TNM seventh edition) were collected with a specially designed software tool and manually by a third-year resident physician in radiation oncology. Results: The software application detected 62% of cases when pathology reports were included, and 77% when radiation oncology reports were added. Non-detection was due to the information being stored in another section of the clinical station. When we compared the results of the software application and manual collection, we found a difference of 13% (10/77). In these 10 cases, the application was correct in 50%, while manual collection was correct in the remaining 50%. Conclusions: This innovative system allows automatic staging of tumoral size in breast cancer. The use of this tool would save time in data collection and prevent errors in tumoral classification and could also improve therapeutic decisions.
    04/2013; 26(2). DOI:10.1016/j.senol.2013.02.006
  • ABSTRACT: Data mining is a powerful method to extract knowledge from data. Raw data poses various challenges that make traditional methods unsuitable for knowledge extraction. Data mining is supposed to be able to handle various data types in all formats. The relevance of this paper is emphasized by the fact that data mining is an object of research in different areas. In this paper, we review previous works in the context of knowledge extraction from medical data. The main idea in this paper is to describe key papers and provide some guidelines to help medical practitioners. Medical data mining is a multidisciplinary field drawing on contributions from medicine and data mining. Due to this fact, previous works should be classified to cover all users' requirements from various fields. Because of this, we have studied papers with the aim of extracting knowledge from structural medical data published between 1999 and 2013. We clarify medical data mining and its main goals. Each paper is therefore studied based on six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. In each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. At the end of each task, a brief summary and discussion are given. A standard framework according to CRISP-DM is additionally adapted to manage all activities. As a discussion, current issues and future trends are mentioned. The number of works published in this scope is substantial and it is impossible to discuss all of them in a single work. We hope this paper will make it possible to explore previous works and identify interesting areas for future research.
    Expert Systems with Applications 07/2014; 41(9):4434–4463. DOI:10.1016/j.eswa.2014.01.011 · 2.24 Impact Factor