Article
Lower Dimensional Representation of Text Data Based on Centroids and Least Squares
University of Minnesota; Univ. of California, Santa Barbara; University of California
BIT (impact factor: 0.72). 05/2003; 43(2):427-448.
DOI:10.1023/A:1026039313770
Citations (0)
Cited In (9)
Article: Feature extraction and dimensionality reduction for mass spectrometry data.
ABSTRACT: Mass spectrometry is being used to generate protein profiles from human serum, and proteomic data obtained from mass spectrometry have attracted great interest for the detection of early stage cancer. However, high dimensional mass spectrometry data pose considerable challenges. In this paper we propose a feature extraction algorithm based on wavelet analysis for high dimensional mass spectrometry data. A set of wavelet detail coefficients at different scales is used to detect the transient changes of mass spectrometry data. The experiments are performed on 2 datasets. A highly competitive accuracy, compared with the best performance of other kinds of classification models, is achieved. Experimental results show that the wavelet detail coefficients are an efficient way to characterize features of high dimensional mass spectra and to reduce their dimensionality.
Computers in Biology and Medicine 10/2009; 39(9):818-23. · 1.27 Impact Factor
Article: Generalized linear discriminant analysis: a unified framework and efficient model selection.
ABSTRACT: High-dimensional data are common in many domains, and dimensionality reduction is key to coping with the curse of dimensionality. Linear discriminant analysis (LDA) is a well-known method for supervised dimensionality reduction. When dealing with high-dimensional and low sample size data, classical LDA suffers from the singularity problem. Over the years, many algorithms have been developed to overcome this problem, and they have been applied successfully in various applications. However, there is a lack of a systematic study of the commonalities and differences of these algorithms, as well as their intrinsic relationships. In this paper, a unified framework for generalized LDA is proposed, which elucidates the properties of various algorithms and their relationships. Based on the proposed framework, we show that the matrix computations involved in LDA-based algorithms can be simplified so that the cross-validation procedure for model selection can be performed efficiently. We conduct extensive experiments using a collection of high-dimensional data sets, including text documents, face images, gene expression data, and gene expression pattern images, to evaluate the proposed theories and algorithms.
IEEE Transactions on Neural Networks 11/2008; 19(10):1768-82. · 2.95 Impact Factor
Conference Proceeding: Identifying biomarkers for acupuncture treatment via an optimization model
ABSTRACT: Identifying biomarkers for acupuncture treatment is crucial to understanding the mechanism of the acupuncture effect at the molecular level. In this study, we investigate the metabolic profiles of acupuncture treatment on several meridian points in humans. To identify the subsets of metabolites that best characterize the acupuncture effect for each meridian point, a linear programming based model is proposed to identify biomarkers from the high-dimensional metabolic data. Specifically, we use the nearest centroid as a prototype to simultaneously minimize the number of selected features and the leave-one-out cross validation error of the classifier. As a result, we reveal novel metabolite biomarkers for acupuncture treatment. Our result demonstrates that metabolic profiling might be a promising method for investigating the molecular mechanism of acupuncture. Comparison with other existing methods shows the efficiency and effectiveness of our new method. In addition, the method proposed in this paper is general and can be used in other high-dimensional applications, such as cancer genomics.
Systems Biology (ISB), 2011 IEEE International Conference on; 10/2011
Keywords
certain classification problems
computational efficiency
data clusters
dimension reduction
dimension reduction algorithms
handling massive amounts
information retrieval system
lower dimensional representation
mathematical framework
matrix rank reduction formula
new methods
priori information
reduced dimensional space
reduced space
Singular Value Decomposition
successful lower dimensional representation
text data
today's vector space
used Latent Semantic Indexing
vector space