A new classification method using array Comparative Genome Hybridization data, based on the concept of Limited Jumping Emerging Patterns

Faculty of Electronics and Information Technology of Warsaw University of Technology, Institute of Computer Science, Nowowiejska 15/19, Warsaw, 00-665, Poland.
BMC Bioinformatics (Impact Factor: 2.58). 02/2009; 10 Suppl 1:S64. DOI: 10.1186/1471-2105-10-S1-S64
Source: PubMed


Classification using array Comparative Genome Hybridization (aCGH) data is an important and insufficiently investigated problem in bioinformatics. In this paper we propose a new method for classifying DNA copy number data, based on the concept of limited Jumping Emerging Patterns (JEPs). We compare our limJEPClassifier with SVM, which is considered the most successful classifier for high-throughput data.
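The abstract does not spell out how limited JEPs are mined or scored. As background, a Jumping Emerging Pattern for a class is an itemset that occurs in at least one sample of that class and in no sample of the other classes. The Python sketch below is a generic, brute-force JEP classifier and not the authors' limJEPClassifier. Several details are assumptions made only for illustration: each aCGH profile is taken to be already called into "loss"/"normal"/"gain" per clone, items are (clone index, state) pairs, only minimal JEPs are kept, and a test sample is scored per class by summing the supports of that class's JEPs it contains (a scoring scheme common to JEP-based classifiers). The `max_len` cap and all names (`JEPClassifier`, `mine_minimal_jeps`, etc.) are hypothetical stand-ins; the paper's actual "limited" restriction is not defined here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Item = Tuple[int, str]      # (clone index, called state), e.g. (3, "gain")
Itemset = FrozenSet[Item]


def to_items(profile: Sequence[str]) -> Itemset:
    """Turn one called aCGH profile into a set of (clone index, state) items."""
    return frozenset(enumerate(profile))


def support(pattern: Itemset, samples: List[Itemset]) -> float:
    """Fraction of samples that contain every item of the pattern."""
    if not samples:
        return 0.0
    return sum(pattern <= s for s in samples) / len(samples)


def mine_minimal_jeps(target: List[Itemset], others: List[Itemset],
                      max_len: int) -> Dict[Itemset, float]:
    """Brute-force search for minimal JEPs of the target class, up to max_len items.

    A pattern is a JEP when its support is > 0 in the target class and exactly 0
    in all other classes; it is minimal when no proper subset is also a JEP.
    """
    jeps: Dict[Itemset, float] = {}
    universe = sorted(set().union(*target))
    for length in range(1, max_len + 1):            # shorter patterns first
        for combo in combinations(universe, length):
            pattern = frozenset(combo)
            if any(j <= pattern for j in jeps):
                continue                            # contains a JEP, so not minimal
            s = support(pattern, target)
            if s > 0 and support(pattern, others) == 0:
                jeps[pattern] = s
    return jeps


class JEPClassifier:
    """Generic JEP classifier sketch; NOT the authors' limJEPClassifier."""

    def __init__(self, max_len: int = 2):
        self.max_len = max_len                      # hypothetical length limit
        self.jeps: Dict[str, Dict[Itemset, float]] = {}

    def fit(self, profiles: Sequence[Sequence[str]],
            labels: Sequence[str]) -> "JEPClassifier":
        data = [to_items(p) for p in profiles]
        for cls in sorted(set(labels)):
            target = [d for d, y in zip(data, labels) if y == cls]
            others = [d for d, y in zip(data, labels) if y != cls]
            self.jeps[cls] = mine_minimal_jeps(target, others, self.max_len)
        return self

    def scores(self, profile: Sequence[str]) -> Dict[str, float]:
        """Per-class score: sum of supports of that class's JEPs found in the profile."""
        x = to_items(profile)
        return {cls: sum(s for p, s in pats.items() if p <= x)
                for cls, pats in self.jeps.items()}

    def predict(self, profile: Sequence[str]) -> str:
        sc = self.scores(profile)
        return max(sc, key=sc.get)                  # ties go to the first class in sorted order


if __name__ == "__main__":
    # Toy, made-up profiles with three clones each
    train = [["gain", "normal", "loss"], ["gain", "normal", "normal"],
             ["normal", "loss", "loss"], ["normal", "loss", "normal"]]
    labels = ["A", "A", "B", "B"]
    clf = JEPClassifier(max_len=2).fit(train, labels)
    print(clf.scores(["gain", "normal", "loss"]))     # {'A': 2.0, 'B': 0}
    print(clf.predict(["normal", "loss", "normal"]))  # B
```

Exhaustive enumeration grows combinatorially with the number of clones, so this sketch is only workable on toy inputs. Realistic aCGH profiles would need a dedicated JEP mining algorithm, and capping pattern length is one simple way to keep such a search tractable.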
Our results show that limJEPClassifier achieves significantly higher classification performance than the other methods tested. Furthermore, we show that applying limited JEPs can significantly improve classification when the data are strongly unbalanced.
Nowadays, aCGH has become a very important tool in research on cancer and genomic disorders. Improving the classification of aCGH data can therefore have a great impact on many medical issues, such as diagnosis and the discovery of disease-related genes. Our experiment shows that Jumping Emerging Patterns can be effective in classifying high-dimensional data, including data from aCGH experiments.



Available from: Krzysztof Walczak, Dec 21, 2013
    • "An extra advantage of the feature selection process is that the majority of the irrelevant features are discarded and the few remaining ones can be indicators of possible biomarkers related to the observed disease. Feature selection has already been shown to significantly benefit the classification accuracy of aCGH data [9], [12], [22]. Thus, it is important to design an effective feature selection method for identifying DNA copy number biomarkers. "
    ABSTRACT: Array comparative genomic hybridization (aCGH) is a recently introduced method for detecting copy number abnormalities associated with human diseases, with a special focus on cancer. Specific patterns in DNA copy number variations (CNVs) can be associated with certain disease types and can facilitate prognosis and the monitoring of disease progression. Machine learning techniques have been used to model tissue typing as a classification problem. Feature selection is an important part of the classification process, because many biological features are unrelated to the diseases and confuse the classification task. Many feature selection methods have been proposed in the various domains where classification has been applied. In this work, we present a new feature selection method based on structured sparsity-inducing norms to identify informative aCGH biomarkers that can help classify different disease subtypes. To validate the proposed method, we experimentally compare it with existing feature selection methods on four publicly available aCGH data sets. In all experiments, the proposed sparse-learning-based feature selection method consistently outperforms the other approaches. More importantly, we carefully investigate the aCGH biomarkers selected by our method, and biological evidence in the literature strongly supports our results.
    Full-text · Article · Jan 2014 · IEEE/ACM Transactions on Computational Biology and Bioinformatics
    ABSTRACT: In this paper, we propose a new classification algorithm based on the Jumping Emerging Patterns (JEPs) that have the highest impact on classification accuracy. The core idea of our method is a new "REAL/ALL" coefficient, which is used to compare the discriminating power of various groups of JEPs. The efficacy of the proposed approach was confirmed by tests on both synthetic and real data sets. The results show that our method may significantly improve classification performance compared with other JEP-based classifiers.
    Full-text · Conference Paper · Dec 2009
    ABSTRACT: Analysis of DNA copy number profiles requires methods tailored to the specific nature of these data. The number of available data analysis methods has grown enormously in the last 5 years. We discuss the typical characteristics of DNA copy number data, as measured by microarray technology, and review the extensive literature on preprocessing methods such as segmentation and calling. Subsequently, the focus narrows to applications of DNA copy number in cancer, in particular several downstream analyses of multi-sample data sets, such as testing, clustering and classification. Finally, we look ahead: what should we prepare for, and which methodology-related topics may deserve attention in the near future?
    Preview · Article · Feb 2010 · Briefings in Bioinformatics
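Jumping Emerging Patterns are defined over discrete items, so continuous aCGH log2 ratios must first be called into copy number states, one of the preprocessing steps named in the review abstract above. The Python sketch below is a deliberately simple stand-in and not any of the segmentation or calling methods that review covers: it smooths each profile with a centered running median and then applies fixed thresholds. The function names, the window size of 3 and the ±0.2 thresholds are hypothetical choices for illustration only.

```python
import statistics
from typing import List, Sequence


def running_median(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a profile with a centered running median (window truncated at the ends)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(statistics.median(values[lo:hi]))
    return out


def call_copy_number(log2_ratios: Sequence[float],
                     loss_thr: float = -0.2, gain_thr: float = 0.2) -> List[str]:
    """Map each clone's log2 ratio to 'loss', 'normal' or 'gain' via fixed thresholds."""
    if loss_thr >= gain_thr:
        raise ValueError("loss_thr must be below gain_thr")
    calls = []
    for r in log2_ratios:
        if r <= loss_thr:
            calls.append("loss")
        elif r >= gain_thr:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    raw = [0.05, 0.1, 0.9, 0.0, -0.7, -0.8, -0.6]  # made-up log2 ratios for seven clones
    print(call_copy_number(running_median(raw)))
    # ['normal', 'normal', 'normal', 'normal', 'loss', 'loss', 'loss']
```

The median filter suppresses the isolated spike at the third clone while keeping the run of losses, which is the kind of noise-versus-signal trade-off that proper segmentation methods handle far more carefully.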