Article

Logical analysis of diffuse large B-cell lymphomas

Robert Wood Johnson University Hospital, New Brunswick, New Jersey, United States
Artificial Intelligence in Medicine (Impact Factor: 2.02). 08/2005; 34(3):235-67. DOI: 10.1016/j.artmed.2004.11.004
Source: PubMed

ABSTRACT

The goal of this study is to re-examine the oligonucleotide microarray dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma), which contains the intensity levels of 6817 genes of 58 patients with diffuse large B-cell lymphoma (DLBCL) and 19 with follicular lymphoma (FL), by means of the combinatorics-, optimisation-, and logic-based methodology of logical analysis of data (LAD). The motivations for this new analysis included the previously demonstrated capabilities of LAD and its expected potential (1) to identify informative genes different from those discovered by conventional statistical methods, (2) to identify combinations of gene expression levels capable of characterizing different types of lymphoma, and (3) to assemble collections of such combinations that, if considered jointly, are capable of accurately distinguishing different types of lymphoma.
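
As a rough illustration of points (2) and (3), the sketch below treats a pattern as a conjunction of bounds on gene expression levels and classifies a sample by comparing the fraction of DLBCL patterns that cover it with the fraction of FL patterns that cover it. This is a minimal sketch of a commonly described LAD-style discriminant, not the procedure or the patterns reported in the study; the gene names (`GENE_A`, `GENE_B`, `GENE_C`), cutpoints, and sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One condition bounds a single gene's expression level: (gene, op, cutpoint).
# Only the operators ">" and "<=" are supported in this sketch.
Condition = Tuple[str, str, float]


@dataclass(frozen=True)
class Pattern:
    label: str                          # class the pattern characterizes
    conditions: Tuple[Condition, ...]   # all conditions must hold (conjunction)

    def covers(self, sample: Dict[str, float]) -> bool:
        """Return True if the sample satisfies every condition."""
        for gene, op, cut in self.conditions:
            value = sample[gene]
            if op == ">":
                if not value > cut:
                    return False
            elif op == "<=":
                if not value <= cut:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True


def discriminant(sample: Dict[str, float], patterns: List[Pattern],
                 positive: str = "DLBCL") -> float:
    """Fraction of positive patterns covering the sample minus the
    fraction of negative patterns covering it."""
    pos = [p for p in patterns if p.label == positive]
    neg = [p for p in patterns if p.label != positive]
    pos_frac = sum(p.covers(sample) for p in pos) / len(pos) if pos else 0.0
    neg_frac = sum(p.covers(sample) for p in neg) / len(neg) if neg else 0.0
    return pos_frac - neg_frac


def classify(sample: Dict[str, float], patterns: List[Pattern],
             positive: str = "DLBCL", negative: str = "FL") -> str:
    """Classify by the sign of the discriminant; zero means no decision."""
    score = discriminant(sample, patterns, positive)
    if score > 0:
        return positive
    if score < 0:
        return negative
    return "unclassified"


# Hypothetical patterns; genes and cutpoints are invented for illustration.
PATTERNS = [
    Pattern("DLBCL", (("GENE_A", ">", 500.0), ("GENE_B", "<=", 120.0))),
    Pattern("DLBCL", (("GENE_C", ">", 300.0),)),
    Pattern("FL", (("GENE_A", "<=", 500.0), ("GENE_C", "<=", 300.0))),
]

# Hypothetical expression profiles.
sample_1 = {"GENE_A": 800.0, "GENE_B": 100.0, "GENE_C": 250.0}
sample_2 = {"GENE_A": 200.0, "GENE_B": 50.0, "GENE_C": 100.0}
sample_3 = {"GENE_A": 600.0, "GENE_B": 200.0, "GENE_C": 100.0}

if __name__ == "__main__":
    for name, s in [("sample_1", sample_1), ("sample_2", sample_2),
                    ("sample_3", sample_3)]:
        print(name, discriminant(s, PATTERNS), classify(s, PATTERNS))
```

Because each decision traces back to the specific patterns that cover a sample, a classifier built this way can report which gene-level conditions drove the prediction; a sample covered by no pattern is left unclassified rather than forced into a class.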

  • Source
    • "LAD has been shown to offer important insights into problems ranging from oil exploration [2], labor productivity analysis [37] and country creditworthiness evaluation [38], to medical application (for example, risk evaluation among cardiac patients [39,40]), polymer design for artificial bones [41], computerized pulmonology [42], genomic-based diagnosis and prognosis of lymphoma [43], and proteomics-based ovarian cancer diagnosis [44]. "
    ABSTRACT: The potential of applying data analysis tools to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with the double objective of discovering patterns characteristic of cases with good or poor outcome and using them for accurate and justifiable predictions, and of deriving novel information about the role of genes, the existence of special classes of cases, and other factors. Data were analyzed using the combinatorics and optimization-based method of LAD, recently shown to provide highly accurate diagnostic and prognostic systems in cardiology, cancer proteomics, hematology, pulmonology, and other disciplines. LAD identified a subset of 17 of the 25,000 genes, capable of fully distinguishing between patients with poor and good prognoses. An extensive list of 'patterns' or 'combinatorial biomarkers' (that is, combinations of genes and limitations on their expression levels) was generated, and 40 patterns were used to create a prognostic system, shown to have 100% and 92.9% weighted accuracy on the training and test sets, respectively. The prognostic system uses fewer genes than other methods, and has similar or better accuracy than those reported in other studies. Out of the 17 genes identified by LAD, three (respectively, five) were shown to play a significant role in determining poor (respectively, good) prognosis. Two new classes of patients (described by similar sets of covering patterns, gene expression ranges, and clinical features) were discovered. As a by-product of the study, it is shown that the training and the test sets of van 't Veer have differing characteristics.
The study shows that LAD provides an accurate and fully explanatory prognostic system for breast cancer using genomic data (that is, a system that, in addition to predicting good or poor prognosis, provides an individualized explanation of the reasons for that prognosis for each patient). Moreover, the LAD model provides valuable insights into the roles of individual and combinatorial biomarkers, allows the discovery of new classes of patients, and generates a vast library of biomedical research hypotheses.
    Full-text · Article · Feb 2006 · Breast cancer research: BCR
  • Source
    • "We have recently developed ([6]) an efficient algorithm for exhaustive pattern extraction from biomedical data. Our method starts by applying a pattern-based multivariate approach (see e.g., [2]) to identify a subset of predictive genes out of a pool of genes by requiring them to satisfy stringent filtering criteria. Next, we combine the predictions of several machine learning tools trained on the subset of predictive genes and on pattern data with the aim of producing an accurate predictor. "
    ABSTRACT: One of the major challenges in cancer diagnosis from microarray data is to develop robust classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose a meta-classification scheme which uses a robust multivariate gene selection procedure and integrates the results of several machine learning tools trained on raw and pattern data. We validate our method by applying it to distinguish diffuse large B-cell lymphoma (DLBCL) from follicular lymphoma (FL) on two independent datasets: the HuGeneFL Affymetrix dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (Dalla-Favera's laboratory, Columbia University). Our meta-classification technique achieves higher predictive accuracies than each of the individual classifiers trained on the same dataset and is robust against various data perturbations. We also find that combinations of p53 responsive genes (e.g., p53, PLK1 and CDK2) are highly predictive of the phenotype.
    Full-text · Conference Paper · Sep 2005
  • Source
    • "We have recently developed ([6]) an efficient algorithm for exhaustive pattern extraction from biomedical data. Our method starts by applying a pattern-based multivariate approach (see e.g., [2]) to identify a subset of predictive genes out of a pool of genes by requiring them to satisfy stringent filtering criteria. Next, we combine the predictions of several machine learning tools trained on the subset of predictive genes and on pattern data with the aim of producing an accurate predictor. "
    ABSTRACT: One of the major challenges in cancer diagnosis from microarray data is to develop robust classification models which are independent of the analysis techniques used and can combine data from different laboratories. We propose a meta-classification scheme which uses a robust multivariate gene selection procedure and integrates the results of several machine learning tools trained on raw and pattern data. We validate our method by applying it to distinguish diffuse large B-cell lymphoma (DLBCL) from follicular lymphoma (FL) on two independent datasets: the HuGeneFL Affymetrix dataset of Shipp et al. (www.genome.wi.mit.du/MPR/lymphoma) and the Hu95Av2 Affymetrix dataset (Dalla-Favera's laboratory, Columbia University). Our meta-classification technique achieves higher predictive accuracies than each of the individual classifiers trained on the same dataset and is robust against various data perturbations. We also find that combinations of p53 responsive genes (e.g., p53, PLK1 and CDK2) are highly predictive of the phenotype.
    Full-text · Article · Feb 2005 · Proceedings / IEEE Computational Systems Bioinformatics Conference, CSB. IEEE Computational Systems Bioinformatics Conference