-
ABSTRACT: MOTIVATION: DNA microarray technology has been increasingly used in cancer research. In the literature, discovery of putative classes and classification to known classes based on gene expression data have been largely treated as separate problems. This paper offers a unified approach to class discovery and classification, which we believe is more appropriate, and has greater applicability, in practical situations. RESULTS: We model the gene expression profile of a tumor sample as arising from a finite mixture distribution, with each component characterizing the gene expression levels in a class. The proposed method was applied to a leukemia dataset, and good results were obtained. With appropriate choices of genes and preprocessing method, the number of leukemia types and subtypes was correctly inferred, and all the tumor samples were correctly classified into their respective type/subtype. Further evaluation of the method was carried out on other variants of the leukemia data and a colon dataset.
Bioinformatics 12/2004; 20(16):2545-52. · 5.47 Impact Factor
-
Genetic Epidemiology 01/2004; 25(4):384-7. · 3.44 Impact Factor
-
ABSTRACT: Multilocus calculations, using all available information on all pedigree members, are important for linkage analysis. Exact calculation methods in linkage analysis are limited in either the number of loci or the number of pedigree members they can handle. In this article, we propose a Monte Carlo method for linkage analysis based on sequential imputation. Unlike exact methods, sequential imputation can handle large pedigrees with a moderate number of loci in its current implementation. This Monte Carlo method is an application of importance sampling, in which we sequentially impute ordered genotypes locus by locus, and then impute inheritance vectors conditioned on these genotypes. The resulting inheritance vectors, together with the importance sampling weights, are used to derive a consistent estimator of any linkage statistic of interest. The linkage statistic can be parametric or nonparametric; we focus on nonparametric linkage statistics. We demonstrate that accurate estimates can be achieved within a reasonable computing time. A simulation study illustrates the potential gain in power using our method for multilocus linkage analysis with large pedigrees. We simulated data at six markers under three models and analyzed them using both sequential imputation and GENEHUNTER. GENEHUNTER had to drop between 38% and 54% of pedigree members, whereas our method was able to use all pedigree members. The power gains from using all pedigree members were substantial under two of the three models. We implemented sequential imputation for multilocus linkage analysis in a user-friendly software package called SIMPLE.
Genetic Epidemiology 08/2003; 25(1):25-35. · 3.44 Impact Factor
-
ABSTRACT: Motivation: DNA microarray technology has been increasingly used in cancer research. In the literature, discovery of putative classes and classification to known classes based on gene expression data have been largely treated as separate problems. This article offers a unified approach to class discovery and classification, which we believe is more appropriate, and has greater applicability, in practical situations. Results: We model the gene expression profile of a tumor sample as arising from a finite mixture distribution, with each component characterizing the gene expression levels in a class. The proposed method was applied to a leukemia dataset, and good results were obtained. With appropriate choices of genes and preprocessing method, the number of leukemia types and subtypes was correctly inferred, and all the tumor samples were correctly classified into their respective type/subtype. Further evaluation of the method was carried out on other variants of the leukemia data and a colon dataset. Supplementary Information: The program implementing the method and additional details and figures are at http://www.stat.ohio-state.edu/~statgen/PAPERS/DNC-MIX.html.