Classification and Error Estimation for Discrete Data

Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77845, USA.
Current Genomics (Impact Factor: 2.87). 11/2009; 10(7):446-62. DOI: 10.2174/138920209789208228
Source: PubMed

ABSTRACT: Discrete classification is common in Genomic Signal Processing applications, in particular in the classification of discretized gene expression data, in discrete gene expression prediction, and in the inference of Boolean genomic regulatory networks. Once a discrete classifier is obtained from sample data, its performance must be evaluated through its classification error. In practice, error estimation methods must be employed to obtain reliable estimates of the classification error from the available data. In Genomics, both classifier design and error estimation are complicated by the prevalence of small-sample data sets. This paper presents a broad review of the methodology of classification and error estimation for discrete data in the context of Genomics, focusing on performance in small-sample scenarios as well as on asymptotic behavior.
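
As a concrete illustration of the classifier design and error estimation steps the abstract describes, the following minimal Python sketch trains a discrete histogram rule (majority vote within each bin of feature values) and computes two common error estimates, resubstitution and leave-one-out. This is a sketch under stated assumptions, not the paper's implementation: the function names, the binary labels 0/1, and the convention that ties and empty bins are assigned label 0 are all choices made here for illustration.

```python
from collections import Counter, defaultdict


def train_histogram_rule(X, y):
    """Count the labels observed in each bin (one bin per distinct feature tuple)."""
    counts = defaultdict(Counter)
    for x, label in zip(X, y):
        counts[tuple(x)][label] += 1
    return counts


def vote(bin_counts):
    """Majority vote in a bin; ties and empty bins go to label 0 (assumption)."""
    return 1 if bin_counts[1] > bin_counts[0] else 0


def resubstitution_error(X, y):
    """Fraction of training points misclassified by the rule trained on all of them."""
    counts = train_histogram_rule(X, y)
    wrong = sum(vote(counts[tuple(x)]) != label for x, label in zip(X, y))
    return wrong / len(y)


def leave_one_out_error(X, y):
    """Fraction of points misclassified when each is held out of its own bin."""
    counts = train_histogram_rule(X, y)
    wrong = 0
    for x, label in zip(X, y):
        held_out = counts[tuple(x)].copy()
        held_out[label] -= 1  # remove the held-out point before voting
        wrong += vote(held_out) != label
    return wrong / len(y)


if __name__ == "__main__":
    # Hypothetical toy sample: two binary features, eight points.
    X = [(0, 0), (0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (1, 1)]
    y = [0, 0, 1, 1, 0, 1, 1, 0]
    print("resubstitution:", resubstitution_error(X, y))
    print("leave-one-out: ", leave_one_out_error(X, y))
```

On this toy sample the resubstitution estimate is lower than the leave-one-out estimate, which is consistent with the well-known optimistic bias of resubstitution in small samples; the specific numbers depend on the tie-breaking convention assumed above.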


Available from: Ulisses M Braga-Neto, Apr 20, 2015
  • ABSTRACT: The binary Coefficient of Determination (CoD) is a key component of inference methods in Genomic Signal Processing. Assuming a stochastic logic model, we introduce a new sample CoD estimator based upon maximum likelihood (ML) estimation. Experiments have been conducted to assess how the ML CoD estimator performs in recovering predictors in multivariate prediction settings. Performance is compared with the traditional nonparametric CoD estimators based on resubstitution, leave-one-out, bootstrap and cross-validation. The results show that the ML CoD estimator is the estimator of choice if prior knowledge is available about the logic relationships in the model, even if this knowledge is incomplete.
    Circuits, Systems and Computers, 1977. Conference Record. 1977 11th Asilomar Conference on 01/2011; DOI:10.1109/ACSSC.2011.6190164
  • ABSTRACT: Nowadays, there are molecular biology techniques providing information related to cervical cancer and its cause, the human Papillomavirus (HPV), including DNA microarrays identifying HPV subtypes, mRNA techniques such as nucleic acid-based amplification or flow cytometry identifying E6/E7 oncogenes, and immunocytochemistry techniques such as overexpression of p16. Each of these techniques has its own performance, limitations and advantages; thus, a combinatorial approach via computational intelligence methods could exploit the benefits of each method and produce more accurate results. In this article we propose a clinical decision support system (CDSS), composed of artificial neural networks, that intelligently combines the results of classic and ancillary techniques to improve diagnostic accuracy. We evaluated this method on 740 cases with complete series of cytological assessment, molecular tests, and colposcopy examination. The CDSS demonstrated high sensitivity (89.4%), high specificity (97.1%), high positive predictive value (89.4%), and high negative predictive value (97.1%) for detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+). In comparison to the tests involved in this study and their combinations, the CDSS produced the most balanced results in terms of sensitivity, specificity, PPV, and NPV. The proposed system may reduce the referral rate for colposcopy and guide personalised management and therapeutic interventions.
    04/2014; 2014:341483. DOI:10.1155/2014/341483
  • ABSTRACT: Several current research projects are focused on the creation of haplotype maps to identify and describe common genetic variation in some species. Studies on haplotype maps are key in understanding how natural selection has produced genomic differences between subspecies of a given species. Important insight can be obtained by determining which variations in the genotype are associated with important phenotypical differences between individuals. Pattern recognition theory and machine learning techniques are useful tools to reveal this connection from a large amount of data provided by haplotype maps. In this work, we applied discrete classifiers and feature selection techniques for the prediction of cattle coat color from genotypes. We compared the performance of different classification rules and showed the feasibility of this approach for the prediction of phenotype based on genotype.
    VI Latin American Congress on Biomedical Engineering CLAIB 2014, Paraná, Argentina, 29, 30 & 31 October 2014, edited by Ariel Braidot, Alejandro Hadad, 03/2015: pages 671–674; Springer International Publishing, Berlin Heidelberg, ISBN: 978-3-319-13116-0