ABSTRACT: Among the great amount of genes presented in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, mutual information has been shown to be successful for selecting a set of relevant and nonredundant genes from microarray data. However, information theory offers many more measures such as the f-information measures that may be suitable for selection of genes from microarray gene expression data. This paper presents different f-information measures as the evaluation criteria for gene selection problem. To compute the gene-gene redundancy (respectively, gene-class relevance), these information measures calculate the divergence of the joint distribution of two genes' expression values (respectively, the expression values of a gene and the class labels of samples) from the joint distribution when two genes (respectively, the gene and class label) are considered to be completely independent. The performance of different f-information measures is compared with that of the mutual information based on the predictive accuracy of naive Bayes classifier, K -nearest neighbor rule, and support vector machine. An important finding is that some f-information measures are shown to be effective for selecting relevant and nonredundant genes from microarray data. The effectiveness of different f-information measures, along with a comparison with mutual information, is demonstrated on breast cancer, leukemia, and colon cancer datasets. While some f -information measures provide 100% prediction accuracy for all three microarray datasets, mutual information attains this accuracy only for breast cancer dataset, and 98.6% and 93.6% for leukemia and colon cancer datasets, respectively.
IEEE transactions on bio-medical engineering 10/2008; 56(4):1063-9. · 2.15 Impact Factor