About
Publications: 43
Reads: 18,436
Citations: 768
Additional affiliations: August 1998 - present
Publications (43)
Identifying potential novel cancer subtypes from genomic data requires techniques to estimate the number of natural clusters in the data. Determining the number of natural clusters in a dataset has been a challenging problem in machine learning. Employing an internal cluster validity index such as the Silhouette Index together with a clustering alg...
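The estimation approach in this abstract pairs a clustering algorithm with an internal validity index. Below is a minimal sketch of that general idea. It assumes plain Lloyd's k-means, Euclidean distance, and a deterministic farthest-first initialisation, which is used here only to make the result reproducible. Each candidate k is scored by its mean Silhouette Index, and the highest-scoring k is returned. The function names, candidate range, and synthetic three-blob data are hypothetical. This is not the authors' method; its details are truncated above.

```python
import math
import random


def farthest_first(points, k):
    """Deterministic seeding (assumed): start at points[0], then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
                / labels.count(c)
                for c in clusters if c != labels[i])  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, candidates=range(2, 7)):
    """Return the k whose clustering has the highest mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical data: three well-separated Gaussian blobs of 20 points each.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers for _ in range(20)]
best_k, scores = estimate_k(data)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Merging two blobs (k = 2) or splitting one (k = 4) lowers the mean silhouette, which is why the score peaks at three on this toy data.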
Background: Tree-based Long Short-Term Memory (LSTM) networks have become the state of the art for modeling the meaning of natural-language text because they can effectively exploit grammatical syntax and thereby the non-linear dependencies among the words of a sentence. However, most of these models cannot recognize the difference in meaning caused by a change in sem...
Tree-based Long Short-Term Memory (LSTM) networks have become the state of the art for modeling the meaning of natural-language text because they can effectively exploit grammatical syntax and thereby the non-linear dependencies among the words of a sentence. However, most of these models cannot recognize the difference in meaning caused by a change in semantic roles...
Background and objective:
Retrieval of medical images from an anatomically diverse dataset is a challenging task. The objective of the present study is to analyse an automated medical image retrieval system that incorporates topic and location probabilities to enhance performance.
Materials and methods:
In this paper, we present an automated medica...
Background and objective:
High-throughput Next Generation Sequencing tools have generated an immense quantity of genome-wide methylation and expression profiling data, resulting in an unprecedented opportunity to unravel the epigenetic regulatory mechanisms underlying cancer. Identifying differentially methylated regions within gene networks is an impo...
Content-Based Image Retrieval (CBIR) systems have recently emerged as one of the most promising image retrieval paradigms. To narrow the semantic gap associated with CBIR systems, Bag of Visual Words (BoVW) techniques are increasingly used. However, existing BoVW techniques fail to capture the location information of visual words e...
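The location limitation is easiest to see in a standard BoVW pipeline. What follows is a minimal sketch of that baseline, not the authors' location-aware variant. Each local descriptor is assigned to its nearest visual word, and the image becomes a normalised word histogram. The tiny codebook, 2-D descriptors, and function names are hypothetical. Real systems use learned codebooks and high-dimensional descriptors such as SIFT.

```python
import math
from collections import Counter


def quantize(descriptor, codebook):
    """Index of the nearest visual word (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda w: math.dist(descriptor, codebook[w]))


def bovw_histogram(keypoints, codebook):
    """L1-normalised visual-word histogram. Each keypoint is ((x, y), descriptor);
    the (x, y) location is discarded, which is the limitation discussed above."""
    counts = Counter(quantize(desc, codebook) for _, desc in keypoints)
    n = len(keypoints)
    return [counts[w] / n for w in range(len(codebook))]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 3-word codebook and 2-D descriptors, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [((10, 10), (0.1, 0.0)), ((40, 12), (0.9, 1.2)),
           ((70, 50), (0.2, 0.1)), ((90, 90), (4.8, 5.1))]
# Same descriptors at shuffled positions: a different spatial layout.
image_b = [((90, 90), (0.1, 0.0)), ((10, 10), (0.9, 1.2)),
           ((40, 12), (0.2, 0.1)), ((70, 50), (4.8, 5.1))]

h_a = bovw_histogram(image_a, codebook)
h_b = bovw_histogram(image_b, codebook)
print(h_a, histogram_intersection(h_a, h_b))
```

The keypoint coordinates never enter the histogram. As a result, two images with the same descriptors in different arrangements get a perfect intersection score of 1.0.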
Cancer subtype discovery from omics data requires techniques to estimate the number of natural clusters in the data. Automatically estimating the number of clusters has been a challenging problem in machine learning. Using clustering algorithms together with internal cluster validity indexes has been a popular method of estimating the number of cl...
Background and objective:
Differential DNA methylation is now known to affect the regulatory mechanisms of biological pathways. A pathway encompasses a set of interacting genes or gene products that together perform a given biological function. Pathways often encode strong methylation signatures that are capable of distinguishing biological...
Background:
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly...
Biomedical text mining is the process of extracting high-quality information from biomedical text. It has many applications in genetics-related studies. Information about gene-disease associations is very important in drug design. Laboratory-based methods for extracting gene-disease associations require considerable effort and time. Literature mining is a...
Background:
The identification of pathways that show a significant difference in activity between disease and control samples has been an active topic of research for over a decade. Pathways so identified serve as potential indicators of aberrations in phenotype or a disease condition. Recently, epigenetic mechanisms such as DNA methylation are kn...
Motivation
The identification of new therapeutic uses of existing drugs, or drug repositioning, offers the possibility of faster drug development, reduced risk, lower cost and shorter paths to approval. The advent of high-throughput microarray technology has enabled comprehensive monitoring of the transcriptional responses associated with various disea...
The Self-Organizing Map (SOM) is an important algorithmic method for visualizing high-dimensional data spaces. Accurate analysis of the input data requires a well-trained SOM. Many measures are used in practice to assess the quality of the map, and one of the most commonly used is the Quantization Error. A trained SOM grid with minimum quantiz...
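Here is a minimal sketch of the quantization error as it is commonly defined. It is the mean Euclidean distance between each input vector and the weight vector of its best-matching unit (BMU). The 2 x 2 map and data below are hypothetical values, not a trained map from the paper.

```python
import math


def best_matching_unit(x, weights):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(x, weights[node]))


def quantization_error(data, weights):
    """Mean Euclidean distance between each input and its BMU's weight vector."""
    return sum(math.dist(x, weights[best_matching_unit(x, weights)])
               for x in data) / len(data)


# Hypothetical trained 2 x 2 map: {(row, col): weight vector}.
weights = {(0, 0): (0.0, 0.0), (0, 1): (0.0, 1.0),
           (1, 0): (1.0, 0.0), (1, 1): (1.0, 1.0)}
data = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.0)]
qe = quantization_error(data, weights)
print(round(qe, 4))  # 0.1667
```

Quantization error tends to shrink as nodes are added to the grid. For that reason it is usually read alongside a topology-preservation measure such as topographic error.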
The findings of the Human Genome Project revealed Single Nucleotide Polymorphisms (SNPs) as the most common form of genetic variation in humans. The project also demonstrated the active role of SNPs in the genesis of many system disorders; SNPs are thus related to one or more diseases. SNP-Gene interactions across multiple diseases are a novel area of research whi...
Understanding the role of genetics is very important for the in-depth study of a disease. Although a great deal of information about gene-disease associations is available, it is difficult even for an expert user to extract it manually from the huge volume of literature. Therefore, this work introduces a novel extraction tool that can identify disease as...
Clustering is one of the most widely used unsupervised methods for interpreting and analyzing large amounts of data in bioinformatics. One of the major issues in clustering is handling growing data so that cluster quality does not decrease as the size of the data increases. In this work, we compare the promising clustering al...
The field of bioinformatics, which now deals with vast amounts of data such as protein patterns and gene expression data, with much more information still to be unraveled, uses basic data mining techniques and tools to retrieve useful information from huge biological databases. Clustering is a popular data mining technique whi...
This paper deals with the problem of extracting acronym-definition pairs from biomedical text. We propose an improved text mining system based on a pattern matching method and space reduction heuristics, which increases both recall and precision. Three metrics were used to evaluate the system: recall (a measure of how much relevant data the system h...
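Below is a deliberately simplified sketch of pattern-matching extraction, scored by recall and precision. It assumes that definitions appear as `definition (ACRONYM)` and that the words immediately before the parenthesis start with the acronym's letters, in order. This strict initial-letter rule is an assumption for illustration only. It is not the paper's system and omits its space reduction heuristics. The regexes, function names, and sample sentence are hypothetical.

```python
import re

# "(ACRONYM)": 2-10 characters beginning with a capital letter (assumed limits).
PAREN = re.compile(r"\(([A-Z][A-Za-z0-9]{1,9})\)")
# Candidate definition words; hyphenated terms are split so each part has an initial.
WORD = re.compile(r"[A-Za-z0-9]+")


def extract_pairs(text):
    """Return (acronym, definition) pairs where the words just before
    '(ACRONYM)' begin with the acronym's letters, in order."""
    pairs = []
    for m in PAREN.finditer(text):
        acronym = m.group(1)
        letters = [ch.lower() for ch in acronym if ch.isalpha()]
        tokens = list(WORD.finditer(text, 0, m.start()))
        if len(tokens) < len(letters):
            continue
        span = tokens[-len(letters):]
        if [t.group()[0].lower() for t in span] == letters:
            # Slice the original text so hyphens and spacing survive.
            pairs.append((acronym, text[span[0].start():span[-1].end()]))
    return pairs


def precision_recall(found, gold):
    """Precision = correct / extracted; recall = correct / all true pairs."""
    found, gold = set(found), set(gold)
    correct = len(found & gold)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


text = ("A Self Organizing Map (SOM) can visualise data. Content-Based Image "
        "Retrieval (CBIR) systems index images. Hepatitis C Virus (HCV) is a "
        "risk factor for Hepatocellular Carcinoma (HCC).")
gold = [("SOM", "Self Organizing Map"),
        ("CBIR", "Content-Based Image Retrieval"),
        ("HCV", "Hepatitis C Virus"),
        ("HCC", "Hepatocellular Carcinoma")]
found = extract_pairs(text)
print(found)
print(precision_recall(found, gold))  # (1.0, 0.75)
```

The extractor misses the `HCC` pair, which shows how a strict rule trades recall for precision. That definition has only two initials (H, C), so it cannot cover the acronym's repeated C.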
The size and growth rate of the biomedical literature create new challenges for researchers who need to keep up to date. The objective of the present study was to design a pattern-matching method for mining acronyms and their definitions from biomedical text using space reduction heuristic constraints, which have been proposed and implemented. The c...
Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data in various data sources. DNA microarray technology makes it possible to analyze a large number of genes simultaneously across different samples. Clustering of microarray data can reveal hidden gene expression patterns from large quanti...
High-quality clustering is very important in the modern era of information processing. Clustering is one of the most important data analysis methods, and k-means clustering is commonly used for diverse applications. Despite its simplicity and ease of implementation, the k-means algorithm is computationally expensive and the quality o...
The identification of structural and sequence motifs in genomic sequences is gaining much attention nowadays. Ribonucleic acid (RNA) is an important biomolecule whose secondary structure defines its function. Soft computing techniques such as genetic programming have been used for motif identification. In this paper, we propose a method fo...
Hepatitis C Virus (HCV) has become a major risk factor for the development of Hepatocellular Carcinoma (HCC). A framework comprising clustering, feature selection and classification has been developed to identify HCC-associated genomic markers in HCV sequences. A new method for feature extraction for genomic sequences rooted in Hash t...
Clustering is a data mining technique that partitions a set of observations into several clusters based on a similarity measure. The most commonly used partitioning-based clustering algorithm is K-means. However, the K-means algorithm has several drawbacks. It converges to a locally optimal solution that depends on the randomly chosen initial ce...
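A toy example shows the dependence on initial centroids: four points at the corners of a 4 x 1 rectangle, clustered with k = 2. Seeding both centroids on the left edge converges to a top/bottom split (SSE 16). Seeding them on opposite edges reaches the optimal left/right split (SSE 1). The restart loop is a common generic remedy: it keeps the lowest-SSE run among several random initialisations. It is included as an assumption for illustration, not as the improvement proposed in the paper, whose details are truncated. All function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, init, iters=100):
    """Lloyd's k-means from the given initial centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in init]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: a (possibly only local) optimum
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Within-cluster sum of squared errors, the objective k-means minimises."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def kmeans_restarts(points, k, n_init=10, seed=0):
    """Run Lloyd's from n_init random initialisations; keep the lowest-SSE result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        labels, centroids = lloyd(points, rng.sample(points, k))
        cost = sse(points, labels, centroids)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


# Four corners of a 4 x 1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
bad = sse(points, *lloyd(points, [(0, 0), (0, 1)]))   # converges to top/bottom split
good = sse(points, *lloyd(points, [(0, 0), (4, 0)]))  # converges to left/right split
print(bad, good)  # 16.0 1.0
best_cost, best_labels = kmeans_restarts(points, 2)
print(best_cost, best_labels)
```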
With the advent of modern techniques for scientific data collection, large quantities of Biomedical and Health Informatics data are getting accumulated at various databases. As a result of the enormity and tremendous growth-rate of such data banks, it is practically difficult to analyze and interpret the data using conventional methods. Effective a...
With the advent of modern techniques for scientific data collection, large quantities of data are getting accumulated at various databases. Systematic data analysis methods are necessary to extract useful information from rapidly growing data banks. Cluster analysis is one of the major data mining methods and the k-means clustering algorithm is wid...
With the advent of modern scientific methods for data collection, huge volumes of biological data are now getting accumulated at various data banks. The enormity of such data and the complexity of biological networks greatly increase the challenges of understanding and interpreting the underlying data. Effective and efficient Data Mining techniques...
The emergence of modern techniques for scientific data collection has resulted in large-scale accumulation of data pertaining to diverse fields. Conventional database querying methods are inadequate to extract useful information from huge data banks. Cluster analysis is one of the major data analysis methods and the k-means clustering algorithm is wi...
This paper deals with the problem of mining acronyms and their definitions from biomedical text. We propose an effective text mining system based on a pattern matching method. The stages of the design are explained with pseudocode. We used space reduction heuristic constraints (D. Nadeau and P. Turney, 2005), which increase the preci...
Direction relations constitute an important class of user queries in Spatial Databases and Geographic Information Systems. However, no work has dealt with the processing of such relations in a mobile environment. Supporting direction queries with traditional spatial index structures does not work well in this environment because of the need to frequ...
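The abstract does not describe the paper's processing method. As a hedged illustration of what a direction query asks, the sketch below assumes a common cone-based model with eight 45-degree sectors. It answers "which objects are north of q?" by scanning every object's latest position. The names, sector model, and positions are hypothetical, and the linear scan is a naive baseline rather than an index structure.

```python
import math

# Eight 45-degree cones, counter-clockwise from east (an assumed direction model).
SECTORS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def direction(ref, obj):
    """Cardinal direction of obj as seen from ref, using 45-degree cones."""
    dx, dy = obj[0] - ref[0], obj[1] - ref[1]
    if dx == 0 and dy == 0:
        return "SAME"
    angle = math.degrees(math.atan2(dy, dx)) % 360  # 0 = east, 90 = north
    # Shift by half a cone so each sector is centred on its compass direction.
    return SECTORS[int(((angle + 22.5) % 360) // 45)]


def direction_query(ref, positions, wanted):
    """Ids of objects whose current position lies in the wanted cone.
    A linear scan over all objects; nothing is indexed."""
    return sorted(oid for oid, pos in positions.items() if direction(ref, pos) == wanted)


# Hypothetical snapshot of moving objects' latest positions.
positions = {"a": (0, 5), "b": (3, 3), "c": (-1, 8), "d": (0, -2)}
print(direction_query((0, 0), positions, "N"))  # ['a', 'c']
```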
Analysis of the data posted on the World Wide Web (WWW) reveals that more than 80% of it is in unstructured text format. Hence, extracting information from text is of paramount importance for both academic and business purposes. At the same time, the evolution of web technology led to the novel concept of the Semantic Web,...