Article: A comparative study of discriminating human heart failure etiology using gene expression profiles.
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA.
BMC Bioinformatics (impact factor: 2.75). 02/2005; 6:205. DOI: 10.1186/1471-2105-6-205
Source: PubMed
Article: Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data.
ABSTRACT: MOTIVATION: Novel methods, both molecular and statistical, are urgently needed to take advantage of recent advances in biotechnology and the human genome project for disease diagnosis and prognosis. Mass spectrometry (MS) holds great promise for biomarker identification and genome-wide protein profiling. It has been demonstrated in the literature that biomarkers can be identified to distinguish normal individuals from cancer patients using MS data. Such progress is especially exciting for the detection of early-stage ovarian cancer patients. Although various statistical methods have been utilized to identify biomarkers from MS data, there has been no systematic comparison among these approaches in their relative ability to analyze MS data. RESULTS: We compare the performance of several classes of statistical methods for the classification of cancer based on MS spectra. These methods include: linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor classifier, bagging and boosting classification trees, support vector machine, and random forest (RF). The methods are applied to ovarian cancer and control serum samples from the National Ovarian Cancer Early Detection Program clinic at Northwestern University Hospital. We found that RF outperforms other methods in the analysis of MS data.
Bioinformatics 10/2003; 19(13):1636-43. · 5.47 Impact Factor
Article: Regression approaches for microarray data analysis.
ABSTRACT: A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures which far exceeds the number of samples (observations) available and (ii) the fact that by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have been recently proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.
Journal of Computational Biology 02/2003; 10(6):961-80. · 1.55 Impact Factor
Article: Regression shrinkage and selection via the lasso
J. Royal Statist. Soc. B 58(1):267-288.
Keywords
complex disease
developed discriminant methods
different responses
discriminating performance
environmental factors
five methods
five statistical methods
gene expression data
gene expression libraries
gene expression profiles
guide appropriate therapy
Human heart failure
multiclass classification
new statistical methods
non-ischemic heart disease present clinically
non-ischemic heart failure
random forest
real datasets
separate institutions
ventricular function