Article

A comparative study of discriminating human heart failure etiology using gene expression profiles.

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
BMC Bioinformatics (impact factor: 2.75). 02/2005; 6:205. DOI:10.1186/1471-2105-6-205
Source: PubMed

ABSTRACT Human heart failure is a complex disease that manifests from multiple genetic and environmental factors. Although ischemic and non-ischemic heart disease present clinically with many similar decreases in ventricular function, emerging work suggests that they are distinct diseases with different responses to therapy. The ability to distinguish between ischemic and non-ischemic heart failure may be essential to guide appropriate therapy and determine prognosis for successful treatment. In this paper we consider discriminating the etiologies of heart failure using gene expression libraries from two separate institutions.
We apply five new statistical methods (partial least squares, penalized partial least squares, LASSO, nearest shrunken centroids and random forest) to two real datasets and compare their performance for multiclass classification. The five methods perform similarly on each of the two datasets: the etiologies of heart failure are difficult to distinguish correctly in one dataset but easy to distinguish in the other. A simulation study confirms that the five methods tend to perform similarly, though random forest seems to have a slight edge.
For some gene expression data, several recently developed discriminant methods may perform similarly. More importantly, one must remain cautious when assessing the discriminating performance using gene expression profiles based on a small dataset; our analysis suggests the importance of utilizing multiple or larger datasets.
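The caution about small datasets can be illustrated with a minimal simulation sketch. It is not the authors' analysis: it uses a plain nearest-centroid classifier (a simplified stand-in for nearest shrunken centroids, with no shrinkage step), leave-one-out cross-validation, and Gaussian toy data. The function names, the 50-gene layout, the five informative genes and the 0.8 mean shift are all assumptions chosen for illustration only.

```python
import random
import statistics


def simulate(n_per_class, n_genes, n_informative, shift, rng):
    """Toy two-class data: only the first n_informative genes differ in mean."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [
                rng.gauss(shift if (label == 1 and g < n_informative) else 0.0, 1.0)
                for g in range(n_genes)
            ]
            X.append(row)
            y.append(label)
    return X, y


def centroids(X, y):
    """Per-class mean expression vector."""
    out = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        out[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return out


def predict(cents, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        cents,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])),
    )


def loocv_error(X, y):
    """Leave-one-out cross-validated misclassification rate."""
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if predict(centroids(X_train, y_train), X[i]) != y[i]:
            errors += 1
    return errors / len(X)


def error_spread(n_per_class, n_reps=20, seed=0):
    """Mean and standard deviation of the LOOCV error over repeated datasets."""
    rng = random.Random(seed)
    errs = [
        loocv_error(*simulate(n_per_class, 50, 5, 0.8, rng))
        for _ in range(n_reps)
    ]
    return statistics.mean(errs), statistics.stdev(errs)


if __name__ == "__main__":
    # Compare how much the error estimate varies for small vs. larger samples.
    for n in (8, 40):
        mean_err, sd_err = error_spread(n)
        print(f"n per class = {n:3d}: mean LOOCV error = {mean_err:.3f}, SD = {sd_err:.3f}")
```

Running the script prints the mean and spread of the estimated error rate for two sample sizes drawn from the same toy model. Comparing the two standard deviations gives a rough sense of how much a performance estimate from a single small dataset can move by chance alone.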

  • Source
    Article: Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data.
    ABSTRACT: MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data. RESULTS: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.
    Bioinformatics 10/2003; 19(13):1636-43. · 5.47 Impact Factor
  • Source
    Article: Regression approaches for microarray data analysis.
    ABSTRACT: A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures which far exceeds the number of samples (observations) available and (ii) the fact that by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have been recently proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
    Journal of Computational Biology 02/2003; 10(6):961-80. · 1.55 Impact Factor
  • Source
    Article: Regression shrinkage and selection via the lasso
    Journal of the Royal Statistical Society, Series B 58(1):267-288.


Keywords

complex disease
 
developed discriminant methods
 
different responses
 
discriminating performance
 
environmental factors
 
five methods
 
five statistical methods
 
gene expression data
 
gene expression libraries
 
gene expression profiles
 
guide appropriate therapy
 
Human heart failure
 
multiclass classification
 
new statistical methods
 
non-ischemic heart disease present clinically
 
non-ischemic heart failure
 
random forest
 
real datasets
 
separate institutions
 
ventricular function