
Feature selection using Haar wavelet power spectrum.

ABV-Indian Institute of Information Technology and Management, Gwalior, India.
BMC Bioinformatics (impact factor: 2.75). 02/2006; 7:432. DOI:10.1186/1471-2105-7-432
Source: PubMed

ABSTRACT Feature selection is an approach to overcoming the 'curse of dimensionality' in complex research problems such as disease classification using microarrays. Statistical methods are the most widely used in this domain, but most of them do not suit a wide range of datasets. Transform-oriented signal processing techniques have not been explored much here, although fields such as image and video processing use them well. Wavelets, one such technique, have the potential to be used for feature selection. The aim of this paper is to assess the capability of the Haar wavelet power spectrum for clustering and gene selection based on expression data in the context of disease classification, and to propose a method based on the Haar wavelet power spectrum.
The Haar wavelet power spectra of genes were analysed and were observed to differ between diagnostic categories. This difference in the trend and magnitude of the spectrum may be utilized in gene selection. Most of the genes selected by earlier, more complex methods were also selected by the very simple present method. Earlier works each showed that only a few genes are enough to approach the classification problem [1]. Hence the present method may be tried in conjunction with other classification methods. The technique was applied without removing noise from the data, in order to test the robustness of the method against noise and outliers. No special software or complex implementation is needed. The quality of the genes selected by the present method was analysed through their gene expression data. Most of them were observed to be relevant to the classification problem, since they were dominant in the diagnostic category of the dataset for which they were selected as features.
In the present paper, the problem of feature selection for microarray gene expression data was considered. We analyzed the wavelet power spectrum of genes and proposed a clustering and feature selection method, useful for classification, based on the Haar wavelet power spectrum. The application of this technique in this area is novel, and the method is simple, faster than other methods, and suited to a wide range of data types. The results are encouraging and shed light on the possibility of using this technique in problem domains such as disease classification, gene network identification and personalized drug design.
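
The abstract does not give the computation itself, so the following is only a minimal sketch of what a Haar wavelet power spectrum of one gene's expression profile could look like. It assumes an orthonormal Haar transform, defines the power at each level as the sum of squared detail coefficients, and pads the profile to a power-of-two length by repeating the last value. The function names `haar_power_spectrum` and `spectrum_gap`, the padding rule, and the distance used to compare two spectra are all illustrative assumptions, not the authors' published procedure.

```python
import numpy as np


def haar_power_spectrum(profile):
    """Energy of the Haar detail coefficients at each level, finest level first.

    Assumption: the profile is padded to a power-of-two length by repeating
    its last value; the source abstract does not specify any padding rule.
    """
    x = np.asarray(profile, dtype=float)
    if x.ndim != 1 or x.size == 0:
        raise ValueError("profile must be a non-empty 1-D sequence")

    # Smallest power of two that is >= the profile length.
    n = 1
    while n < x.size:
        n *= 2
    approx = np.pad(x, (0, n - x.size), mode="edge")

    powers = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # orthonormal Haar detail
        approx = (even + odd) / np.sqrt(2.0)   # orthonormal Haar approximation
        powers.append(float(np.sum(detail ** 2)))
    return np.array(powers)


def spectrum_gap(profile_a, profile_b):
    """Hypothetical score: summed absolute difference between two spectra.

    Both profiles must pad to the same length so the spectra have the same
    number of levels. This is an illustrative comparison only.
    """
    pa = haar_power_spectrum(profile_a)
    pb = haar_power_spectrum(profile_b)
    if pa.shape != pb.shape:
        raise ValueError("profiles must yield spectra with the same number of levels")
    return float(np.sum(np.abs(pa - pb)))


if __name__ == "__main__":
    # Toy expression values of one gene in two made-up diagnostic categories.
    category_a = [4.0, 2.0, 5.0, 5.0]
    category_b = [1.0, 1.0, 1.0, 1.0]
    print(haar_power_spectrum(category_a))
    print(spectrum_gap(category_a, category_b))
```

For the toy profile `[4, 2, 5, 5]` the two level energies work out to 2 and 4, up to floating-point rounding, while a flat profile has zero detail energy at every level. A gene whose spectrum differs strongly between categories would get a large `spectrum_gap`, which is one plausible way to read the "difference in trend and magnitude" mentioned above.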

  • Article: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks.
    ABSTRACT: The purpose of this study was to develop a method of classifying cancers to specific diagnostic categories based on their gene expression signatures using artificial neural networks (ANNs). We trained the ANNs using the small, round blue-cell tumors (SRBCTs) as a model. These cancers belong to four distinct diagnostic categories and often present diagnostic dilemmas in clinical practice. The ANNs correctly classified all samples and identified the genes most relevant to the classification. Expression of several of these genes has been reported in SRBCTs, but most have not been associated with these cancers. To test the ability of the trained ANN models to recognize SRBCTs, we analyzed additional blinded samples that were not previously used for the training procedure, and correctly classified them in all cases. This study demonstrates the potential applications of these methods for tumor diagnosis and the identification of candidate targets for therapy.
    Nature Medicine 07/2001; 7(6):673-9. · 22.46 Impact Factor
  •
    Article: A Bayesian approach to nonlinear probit gene selection and classification
    ABSTRACT: We consider the problem of gene selection and classification based on the expression data. Specifically, we propose a bootstrap Bayesian gene selection method for nonlinear probit regression. A binomial probit regression model with data augmentation is used to transform the binomial problem into a sequence of smoothing problems. The probit regressor is approximated as a nonlinear combination of the genes. A Gibbs sampler is employed to find the strongest genes. Some numerical techniques to speed up the computation are discussed. We then develop a nonlinear probit Bayesian classifier consisting of a linear term plus a nonlinear term, the parameters of which are estimated using the sequential Monte Carlo technique. These new methods are applied to analyze several data sets, including the hereditary breast cancer data, the small round blue-cell tumor data, and the acute leukemia tumor data. The experimental results show the proposed methods can effectively find important genes which are consistent with the existing biological belief, and the classification accuracies are very high. Some robustness and sensitivity properties of the proposed methods are also discussed to deal with noisy microarray data.
    Journal of the Franklin Institute.
  • Article: Microglia and astrocytes in the adult rat brain: comparative immunocytochemical analysis demonstrates the efficacy of lipocortin 1 immunoreactivity.
    ABSTRACT: The distribution of glial cells (microglia and astrocytes) in different regions of normal adult rat brain was studied using immunohistochemical techniques and computer analysis. Lipocortin 1, phosphotyrosine, and lectin GSA B(4), were used for identification of microglia, while S100beta and glial fibrillary acidic protein identified astrocytes. Bioquant computerized image analysis was used to quantify and map the immunostained cells in sections from adult rat brain. If lipocortin 1 was used as a marker, more microglial cells were detected than with phosphotyrosine or lectin. The lipocortin 1-positive microglial population was most numerous (on average, 130+/-5 cells/mm(2) of the brain section area) in neostriatum, and least (51+/-4 cells/mm(2)) in cerebellum and medulla oblongata. In general, the density of lipocortin 1 microglia was higher in the forebrain, and lower in the midbrain, and the least in the brainstem and cerebellum. The number of S100beta astrocytes was two to three times larger than the number of microglial cells, and approximately two times greater than glial fibrillary acidic protein cells. A high density of astrocytes was found in the hypothalamus and hippocampus (more than 260 cells/mm(2)); they were more numerous in the white matter than in the gray matter. Fewer astrocytes were observed in the cerebral cortex, neostriatum, midbrain, medulla oblongata and cerebellum (less than 200 cells/mm(2)). Thus lipocortin 1 and S100beta were shown to be the most specific and reliable markers for microglia and astrocytes, respectively. The regional population differences demonstrated for lipocortin 1 microglia and S100beta astrocytes presumably reflect structural and functional specializations of the certain brain regions.
    Neuroscience 02/2000; 96(1):195-203. · 3.38 Impact Factor

Keywords

classification methods
classification problem 1
complex methods
complex researches
data types
diagnostic category
different diagnostic categories
disease classification
drug design
expression data
feature selection method
feature selection method useful
gene expression data
Haar wavelet power spectrum
microarray gene expression data
signal processing domains
simple present method
Statistical methods
wavelet power spectrum
wide range

Prabakaran Subramani