Article
Cluster analysis for gene expression data: a survey
Dept. of Comput. Sci. & Eng., State Univ. of New York, USA;
IEEE Transactions on Knowledge and Data Engineering (Impact Factor: 1.89). 12/2004; 16(11):1370-1386. DOI: 10.1109/TKDE.2004.68 Source: IEEE Xplore

Article: Coclustering of Fuzzy Lagged Data
ABSTRACT: The paper focuses on mining patterns that are characterized by a fuzzy lagged relationship between the data objects forming them. Such a regulatory mechanism is quite common in real-life settings. It appears in a variety of fields: finance, gene expression, neuroscience, crowds and collective movements are but a limited list of examples. Mining such patterns not only helps in understanding the relationship between objects in the domain, but also assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal fuzzy lagged cocluster is an NP-complete problem. We thus present a polynomial-time Monte Carlo approximation algorithm for mining fuzzy lagged coclusters. We prove that for any data matrix, the algorithm mines, with fixed probability, a fuzzy lagged cocluster that encompasses the optimal fuzzy lagged cocluster with a column overhead of at most a ratio of 2 and no row overhead at all. Moreover, the algorithm handles noise, anti-correlations, missing values and overlapping patterns. The algorithm was extensively evaluated using both artificial and real datasets. The results not only corroborate the ability of the algorithm to efficiently mine relevant and accurate fuzzy lagged coclusters, but also illustrate the importance of including the fuzziness in the lagged-pattern model.
02/2014
ABSTRACT: An understanding of genetics and epigenetics is essential to cope with the paradigm shift that is underway. Personalized medicine and gene therapy will converge in the days to come. This review highlights traditional approaches as well as current advancements in the analysis of gene expression data from a cancer perspective. Due to improvements in biometric instrumentation and automation, it has become easier to collect large amounts of experimental data in molecular biology. Analysis of such data is extremely important, as it leads to knowledge discovery that can be validated by experiments. The diagnosis of complex genetic diseases has conventionally been based on non-molecular characteristics such as the kind of tumor tissue, pathological characteristics, and clinical phase. Microarray data are characterized by high dimensionality and noise, and these were the reasons for ineffective and imprecise results. Several machine learning and data mining techniques are presently applied for identifying cancer using gene expression data. While differences in efficiency do exist, none of the well-established approaches is uniformly superior to the others. The quality of an algorithm is important, but is not in itself a guarantee of the quality of a specific data analysis.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 03/2014; 11(3):533-547. · 1.62 Impact Factor
Conference Paper: Efficient Error Setting for Subspace Miners
ABSTRACT: A typical mining problem is the extraction of patterns from subspaces of multidimensional data. Such patterns, known as biclusters, comprise subsets of objects that behave similarly across subsets of attributes, and may overlap each other, i.e., objects/attributes may belong to several patterns, or to none. For many miners, a key input parameter is the maximum allowed error, which greatly affects the quality, quantity and coherency of the mined clusters. As the error is dataset-dependent, setting it demands either domain knowledge or some trial and error. The paper presents a new method for automatically setting the error to the value that maximizes the number of clusters mined. This error value is strongly correlated with the value for which performance scores are maximized. The correlation is extensively evaluated using six datasets, two mining algorithms, and seven prevailing performance measures, and is compared with five methods from the prior literature, demonstrating a substantial improvement in the mining score.
10th International Conference on Machine Learning and Data Mining, MLDM 2014, St. Petersburg, Russia; 07/2014