Conference Paper

Effective Dimension Reduction Using Sequential Projection Pursuit on Gene Expression Data for Cancer Classification.

Conference: Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS '04, June 21-24, 2004, Las Vegas, Nevada, USA
Source: DBLP


Motivation: Classification is a powerful tool for uncovering interesting phenomena in microarray data, for example classes of cancer. Because the number of observations (n) is small compared with the number of variables (p), the genes, classification on microarray data is challenging. Multivariate dimension reduction techniques are therefore commonly used as a precursor to classification of microarray data; typically this is principal component analysis (PCA) or singular value decomposition (SVD). Since PCA and SVD are concerned with explaining the variance-covariance structure of the data, they may not be the best choice when the between-cluster variance is smaller than the within-cluster variance. Sequential projection pursuit (SPP), a recently introduced alternative to PCA, is designed to elicit clustering tendencies in the data. In some cases SPP may therefore be more appropriate when performing clustering or classification analysis.

Results: We compare the performance of SPP to PCA on two cancer gene expression datasets related to leukemia and colon cancer. Using PCA and SPP to reduce the dimensionality of the data to m
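The abstract's point that PCA can miss cluster structure when the between-cluster variance is smaller than the within-cluster variance can be illustrated with a toy sketch. This is not the paper's method and not an implementation of SPP; the two-variable synthetic data, the helper names `pca_scores` and `separation`, and all numeric settings are assumptions chosen only for illustration.

```python
import numpy as np

def pca_scores(X, m):
    """Project the rows of X onto the first m principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T                         # scores, shape (n, m)

def separation(z, y):
    """Gap between the two class means, in units of pooled within-class std."""
    a, b = z[y == 0], z[y == 1]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled

# Hypothetical data: two classes, two variables.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
X = np.empty((n, 2))
X[:, 0] = rng.normal(0.0, 10.0, n)               # large variance, unrelated to class
X[:, 1] = rng.normal(0.0, 0.5, n) + 2.0 * y      # small variance, carries the class signal

scores = pca_scores(X, 2)
sep_pc1 = separation(scores[:, 0], y)            # direction of largest variance
sep_pc2 = separation(scores[:, 1], y)            # direction PCA ranks second
print(f"class separation along PC1: {sep_pc1:.2f}")
print(f"class separation along PC2: {sep_pc2:.2f}")
```

In this construction the first principal component follows the high-variance variable, which carries no class information, so keeping only one component would discard most of the class signal. A projection index that rewards clustering rather than variance is what a projection pursuit method aims to optimize instead.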
