Anupam Bhattacharjee

Wayne State University, Detroit, Michigan, United States

Publications (16) · 1.62 Total impact

  • Kazi Zakia Sultana, Anupam Bhattacharjee, Hasan Jamil
    ABSTRACT: Understanding the interaction patterns among biological entities in a pathway can potentially reveal the role of the entities in biological systems. Although considerable effort has been devoted to this direction, querying biological pathways has remained relatively unexplored. Querying is principally different in that we retrieve pathways satisfying a given property in terms of their topology or constituents. One such property is subnetwork matching using various constituent parameters. In this paper, we introduce a logic-based framework for querying biological pathways using a novel and generic subgraph isomorphism computation technique. We develop a graphical interface called IsoKEGG to facilitate flexible querying of KEGG pathways based on isomorphic pathway topologies as well as matching any combination of node names, types, and edges. It allows editing KGML-represented query pathways and returns all isomorphic patterns in KEGG pathways satisfying a given query condition for further analysis.
    International Journal of Data Mining and Bioinformatics 11/2014; 9(1):1-21. · 0.39 Impact Factor
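    The core operation, finding subgraph isomorphisms under node-label constraints, can be illustrated with a generic backtracking matcher. This is a minimal Python sketch with made-up data structures, not the paper's algorithm:

        # Labeled directed graphs as dicts: node -> set of successor nodes.
        def subgraph_matches(q_adj, q_label, d_adj, d_label):
            """Yield mappings {query node: data node} preserving labels and edges."""
            q_nodes = list(q_adj)

            def extend(mapping, used):
                if len(mapping) == len(q_nodes):
                    yield dict(mapping)
                    return
                q = q_nodes[len(mapping)]
                for d in d_adj:
                    if d in used or q_label[q] != d_label[d]:
                        continue
                    # every query edge between mapped nodes must exist in the data graph
                    ok = all(((q not in q_adj[p]) or (d in d_adj[mapping[p]])) and
                             ((p not in q_adj[q]) or (mapping[p] in d_adj[d]))
                             for p in mapping)
                    if ok:
                        mapping[q] = d
                        yield from extend(mapping, used | {d})
                        del mapping[q]

            yield from extend({}, set())

        # Query: a gene node pointing to a compound node; data: a 3-node pathway.
        query, qlab = {"a": {"b"}, "b": set()}, {"a": "gene", "b": "compound"}
        data = {"x": {"y"}, "y": {"z"}, "z": set()}
        dlab = {"x": "gene", "y": "compound", "z": "gene"}
        print(list(subgraph_matches(query, qlab, data, dlab)))  # [{'a': 'x', 'b': 'y'}]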
  • Anupam Bhattacharjee, Hasan M. Jamil
    ABSTRACT: In this paper, we present an improved and novel directed graph matching algorithm, called CodeBlast, for searching functionally similar program segments in software repositories with greater effectiveness and accuracy. CodeBlast uses a novel canonical labeling concept to capture the order-independent data flow pattern in a program, encoding the program's functional semantics to aid matching. CodeBlast is capable of exact and approximate directed graph matching and is particularly suitable for matching Program Dependence Graphs. Introducing the notion of semantic equivalence in CodeBlast helps discover clone matches with high precision and recall, which was not possible using systems such as JPlag, MOSS, and GPlag. We substantiate our claim through extensive experimental evidence and comparative analysis with these leading systems.
    Proceedings of the 28th Annual ACM Symposium on Applied Computing; 03/2013
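    The order-independent encoding of data flow can be approximated with a Weisfeiler-Lehman-style relabeling pass; the sketch below is a generic stand-in, not CodeBlast's canonical labeling:

        import hashlib

        def wl_signature(adj, label, rounds=3):
            """adj: node -> set of successors; label: node -> initial label."""
            cur = dict(label)
            for _ in range(rounds):
                nxt = {}
                for v in adj:
                    neigh = sorted(cur[w] for w in adj[v])   # order-independent
                    sig = cur[v] + "|" + ",".join(neigh)
                    nxt[v] = hashlib.sha1(sig.encode()).hexdigest()[:8]
                cur = nxt
            return tuple(sorted(cur.values()))               # multiset of final labels

        # Two programs whose statements differ only in order share one
        # data-flow graph, hence one signature.
        flow = {"x=1": {"z=x+y"}, "y=2": {"z=x+y"}, "z=x+y": set()}
        labels = {"x=1": "assign", "y=2": "assign", "z=x+y": "add"}
        print(wl_signature(flow, labels))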
  • K.Z. Sultana, A. Bhattacharjee, H. Jamil
    ABSTRACT: Understanding the interaction patterns among a set of biological entities in a pathway is an important exercise because it could potentially reveal the role of the entities in biological systems. Although a considerable amount of effort has been directed to the detection and mining of patterns in biological pathways in contemporary research, querying biological pathways has remained relatively unexplored. Querying is principally different in that we retrieve pathways that satisfy a given property in terms of their topology or constituents. One such property is subnetwork matching using various constituent parameters. In this paper, we introduce a logic-based framework for querying biological pathways based on a novel and generic subgraph isomorphism computation technique. We cast this technique into a graphical interface called IsoKEGG to facilitate flexible querying of KEGG pathways. We demonstrate that IsoKEGG is flexible enough to allow querying based on isomorphic pathway topologies as well as matching any combination of node names, types, and edges. It also allows editing KGML-represented query pathways and returns all possible pathways in KEGG that satisfy a given query condition, which users can investigate further.
    2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 01/2011
  • ABSTRACT: Experimental methods are beginning to define the networks of interacting genes and proteins that control most biological processes. There is significant interest in developing computational approaches to identify subnetworks that control specific processes or that may be involved in specific human diseases. Because genes associated with a particular disease (i.e., disease genes) are likely to be well connected within the interaction network, the challenge is to identify the most well-connected subnetworks from a large number of possible subnetworks. One way to do this is to search through chromosomal loci, each of which has many candidate disease genes, to find a subset of genes well connected in the interaction network. To identify a significantly connected subnetwork, however, an efficient method of selecting candidate genes from each locus is needed. In the current study, we describe a method to extract important candidate subnetworks from a set of loci, each containing numerous genes. The method is scalable with the size of the interaction networks. We have conducted simulations with our method and observed promising performance.
    Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010; 01/2010
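    One simple way to see the selection problem is a greedy pass over the loci, scoring each candidate gene by its connectivity to candidates in the other loci (an illustrative toy, not the paper's scalable method):

        def pick_candidates(loci, interactions):
            """loci: list of candidate-gene lists; interactions: set of frozenset pairs."""
            def score(g, i):                     # links from g to genes in other loci
                return sum(frozenset((g, o)) in interactions
                           for j, locus in enumerate(loci) if j != i for o in locus)
            return [max(locus, key=lambda g: score(g, i))
                    for i, locus in enumerate(loci)]

        loci = [["A1", "A2"], ["B1", "B2"], ["C1"]]
        ppi = {frozenset(p) for p in [("A2", "B1"), ("B1", "C1")]}
        print(pick_candidates(loci, ppi))  # ['A2', 'B1', 'C1']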
  • Mohammad Shafkat Amin, Anupam Bhattacharjee, Hasan M. Jamil
    ABSTRACT: The study of interactomes requires assembling complex tools, ontologies, and online interaction network databases to validate hypotheses and gain insight. One of the major bottlenecks is the discovery of similar or isomorphic subgraphs in very large interactomes and cross-referencing the relationships a set of proteins or genes share. These interactomes are so large that most traditional subgraph isomorphism tools are unable to handle them efficiently, either as stand-alone tools or as part of systems such as R. In this paper, we present a Cytoscape plugin to compute and discover isomorphic subnetworks in large interactomes based on a novel and efficient isomorphic subgraph computation method developed in our laboratory. Given an input interactome and a query subnetwork, the plugin can efficiently compute interactome subnetworks similar to the query network, and cross-reference the results from GO or other interactome databases with the aid of other available Cytoscape plugins such as BinGO. We describe the tool with respect to real-life applications biologists may want to contemplate.
    Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010; 01/2010
  • Mohammad Shafkat Amin, Anupam Bhattacharjee, Hasan M. Jamil
    ABSTRACT: As the volume of information available on the internet is growing exponentially, it is clear that most of this information will have to be processed and digested by computers to produce useful information for human consumption. Unfortunately, most web contents are currently designed for direct human consumption, in which it is assumed that a human will decipher the information presented to him in some context and will be able to connect the missing dots, if any. In particular, information presented in some tabular form is often not accompanied by descriptive titles or column names similar to attribute names in tables. While such omissions are not really an issue for humans, it is truly hard to extract information in autonomous systems in which a machine is expected to understand the meaning of the table presented and extract the right information in the context of the query. It is even more difficult when the information needed is distributed across the globe and involves semantic heterogeneity. In this paper, our goal is to address the issue of how to interpret tables with missing column names by developing a method for the assignment of attribute names in an arbitrary table extracted from the web in a fully autonomous manner. We propose a novel approach by leveraging Wikipedia for the first time for column name discovery for the purpose of table annotation. We show that this leads to an improved likelihood of capturing the context and interpretation of the table accurately and producing a semantically meaningful query response.
    Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), Sierre, Switzerland, March 22-26, 2010; 01/2010
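    The voting idea behind the annotation can be sketched as follows; label_candidates is a hypothetical stand-in for the Wikipedia lookup the paper performs:

        from collections import Counter

        def label_candidates(value):
            toy_index = {                      # stand-in for Wikipedia lookups
                "Detroit": ["city", "album"],
                "Chicago": ["city", "band"],
                "Boston": ["city", "band"],
            }
            return toy_index.get(value, [])

        def annotate_column(values):
            """Let each cell value vote on the column's attribute name."""
            votes = Counter(lbl for v in values for lbl in label_candidates(v))
            return votes.most_common(1)[0][0] if votes else None

        print(annotate_column(["Detroit", "Chicago", "Boston"]))  # -> 'city'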
  • K.Z. Sultana, A. Bhattacharjee, H. Jamil
    ABSTRACT: Epistasis contributes to many well-known diseases, making the traits more complex and harder to study. The interactions between multiple genes and their alleles at different loci often mask the effects of a single gene at a particular locus, resulting in a complex trait. The analysis of epistasis thus uncovers facts about the mechanisms and pathways involved in a disease by analyzing biological interactions between implicated proteins. As existing tools mainly focus on single or pairwise variation analysis, a comprehensive tool capable of analyzing interactions among multiple variations located in different chromosomal loci is of growing importance for genome-wide association studies. In this paper, we focus on exploring all the protein-protein interactions coded by the genes in the regions of variations of the human genome. We introduce a tool called EpICS that helps explore the epistatic effects of genes by analyzing the protein-protein interactions within the regions of different types of genetic variations. It accepts variation IDs, types of variations (Insertion-Deletion/Copy Number Variation/Single Nucleotide Polymorphism), PubMed identifiers, or a region of a chromosome as input, and then enumerates the variations of the user-specified types as well as the interactions of the proteins coded by the genes in the region. It also provides necessary details for further study of the results. EpICS is available at http://integra.cs.wayne.edu:8080/epics for general use.
    IEEE International Conference on Bioinformatics and Biomedicine Workshop (BIBMW 2009); 12/2009
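    The region query at the heart of the tool reduces to interval overlap plus interaction filtering; here is a minimal sketch with made-up gene and interaction records, not EpICS code:

        def genes_in_region(genes, chrom, start, end):
            """genes: iterable of (name, chrom, start, end) records."""
            return {g for g, c, s, e in genes if c == chrom and s < end and e > start}

        def region_interactions(genes, interactions, chrom, start, end):
            inside = genes_in_region(genes, chrom, start, end)
            return [(a, b) for a, b in interactions if a in inside and b in inside]

        # Toy records: two genes on chromosome 13 and one reported interaction.
        genes = [("G1", "13", 1000, 2000), ("G2", "13", 5000, 6000), ("G3", "2", 100, 300)]
        ppi = [("G1", "G2"), ("G1", "G3")]
        print(region_interactions(genes, ppi, "13", 0, 10000))  # [('G1', 'G2')]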
  • A. Bhattacharjee, H. Jamil
    ABSTRACT: Traditional schema matchers apply a set of distinct simple matchers and combine their individual scores with a composition function. Applying the matchers in an arbitrary order leads to non-intuitive scores, improper matches, and wasteful, counterproductive computation, especially when no consideration is given to the properties of the individual matchers and the context of the application. In this paper, we propose a new method for schema matching in which wasteful computation is avoided by a prudent and objective selection and ordering of a subset of useful matchers. This method thus has the potential to improve the matching efficiency and accuracy of many popular ontology generation engines. Such efficiency and quality assurance are imperative in autonomous systems because users rarely have a chance to validate the processing accuracy until the computation is complete. Experimental results are provided to support the claim that such an approach monotonically improves the matching score at successive applications of the matchers.
    IEEE International Conference on Information Reuse & Integration (IRI '09); 09/2009
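    The monotonic-improvement claim is easy to see with a noisy-or style score composition, where each applied matcher can only raise the combined score (a generic sketch, not the paper's composition function):

        def compose(scores):
            """Combine per-matcher scores in [0, 1]; the result never decreases."""
            combined = 0.0
            for s in scores:
                combined = 1 - (1 - combined) * (1 - s)   # noisy-or step
                print(f"after score {s:.2f}: combined = {combined:.3f}")
            return combined

        compose([0.4, 0.7, 0.2])   # 0.400 -> 0.820 -> 0.856, monotone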
  • ABSTRACT: In computer-based Internet services, queries are usually submitted in a context. Contexts are either created or assumed, e.g., a purchase order or an airline reservation. Unfortunately, there is little theoretical foundation for contexts, and systems usually do not use them formally. In this paper, we propose a model for context representation in the direction of aspect-oriented programming and object-oriented systems, and show that contexts can be used to process queries better. We outline a brief model that we are pursuing based on the idea of constraint inheritance with exceptions in a query tree.
    Flexible Query Answering Systems, 8th International Conference, FQAS 2009, Roskilde, Denmark, October 26-28, 2009. Proceedings; 01/2009
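    A minimal sketch of constraint inheritance with exceptions, the mechanism named in the abstract (class and field names are illustrative, not the paper's model):

        class Context:
            def __init__(self, constraints, parent=None):
                self.constraints, self.parent = constraints, parent

            def resolve(self):
                """Inherit parent constraints; a child's exceptions override them."""
                inherited = self.parent.resolve() if self.parent else {}
                inherited.update(self.constraints)
                return inherited

        root = Context({"currency": "USD", "ship_to": "US"})
        child = Context({"ship_to": "CA"}, parent=root)       # exception
        print(child.resolve())  # {'currency': 'USD', 'ship_to': 'CA'}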
  • Anupam Bhattacharjee, Aminul Islam, Hasan M. Jamil, Derek E. Wildman
    ABSTRACT: A convenient mechanism to refer to large biological objects such as sequences, structures, and networks is the use of identifiers or handles, commonly called IDs. IDs function as unique placeholders in an application for objects too large to be of immediate use in a table, to be retrieved from a secondary archive when needed. Usually, applications use IDs of objects managed by remote databases that the applications do not have any control over, such as GenBank, EMBL, and UCSC. Unfortunately, IDs are generally not unique in public databases, and frequently change as the objects they refer to change. Consequently, applications built using such IDs need to adapt by monitoring possible ID migration occurring in databases they do not control, or risk producing inconsistent or out-of-date results, or even facing loss of functionality. In this paper, we develop a wrapper-based approach to recognizing ID migration in secondary databases, mapping obsolete IDs to valid new IDs, and updating databases to restore their intended functionality. We present our technique in detail using an example involving NCBI RefSeq as a primary and OCPAT as a secondary database. Based on the proposed technique, we introduce a new wrapper-like tool, called IDChase, to address the ID migration problem in biological databases and to serve as a general platform.
    Contemporary Computing - Second International Conference, IC3 2009, Noida, India, August 17-19, 2009. Proceedings; 01/2009
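    The wrapper idea reduces to resolving each stored ID against the primary source and rewriting it if it has migrated; resolve_current below is a hypothetical stand-in for such a wrapper, and the accession shown is made up:

        def resolve_current(old_id):
            migrated = {"NM_000001.1": "NM_000001.3"}   # stand-in migration map
            return migrated.get(old_id, old_id)

        def refresh_ids(records):
            """records: list of dicts with an 'id' field referencing the primary DB."""
            for rec in records:
                new_id = resolve_current(rec["id"])
                if new_id != rec["id"]:
                    rec["id"] = new_id                  # restore functionality
            return records

        print(refresh_ids([{"id": "NM_000001.1"}]))  # [{'id': 'NM_000001.3'}]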
  • ABSTRACT: Data-intensive applications in the Life Sciences extensively use the Hidden Web as a platform for information sharing. Access to these heterogeneous Hidden Web resources is limited to predefined web forms and interactive interfaces that users navigate manually, assuming responsibility for reconciling schema heterogeneity, mediating missing information, extracting and piping information, transforming formats, and so on in order to implement desired query sequences or scientific workflows. In this paper, we present a new data management system, called LifeDB, in which we offer support for currency without view materialization and autonomous reconciliation of schema heterogeneity in one single platform through a declarative query language called BioFlow. In our approach, schema heterogeneity is resolved at run time by treating the hidden web resources as a virtual warehouse, and by supporting a set of primitives for data integration on-the-fly, for extracting information and piping it to other resources, and for manipulating data in a way similar to traditional database systems to respond to application demands. We also describe BioFlow's support for workflow design and application design using a visual interface called VizBuilder.
    Database and Expert Systems Applications, 20th International Conference, DEXA 2009, Linz, Austria, August 31 - September 4, 2009. Proceedings; 01/2009
  • Anupam Bhattacharjee, Zalia Shams, Kazi Zakia Sultana
    ABSTRACT: In this paper, we introduce new algorithms for selecting taxon samples from large evolutionary trees, maintaining uniformity and randomness, under certain new constraints on the taxa. The algorithms are efficient, as their runtimes and space complexities are polynomial. The algorithms have direct applications to phylogenetic tree evolution and efficient supertree construction using biologically curated data. We also present new lower bounds for the problem of constructing evolutionary trees from experiments under some previously stated constraints. All the algorithms have been implemented.
    Proceedings of the Canadian Conference on Electrical and Computer Engineering, CCECE 2006, May 7-10, 2006, Ottawa Congress Centre, Ottawa, Canada; 01/2006
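    Rejection sampling gives a simple, if not polynomial-time, way to see the problem: draw k taxa uniformly and keep only draws meeting the constraint, which stays uniform over the satisfying subsets (illustrative only; the paper's algorithms are direct and polynomial):

        import random

        def sample_taxa(taxa, k, satisfies, max_tries=10000):
            """Uniformly sample a k-taxon subset that satisfies the constraint."""
            for _ in range(max_tries):
                pick = random.sample(taxa, k)
                if satisfies(pick):
                    return pick
            raise RuntimeError("no satisfying sample found")

        taxa = ["human", "chimp", "mouse", "rat", "chicken"]
        must_have_primate = lambda pick: "human" in pick or "chimp" in pick
        print(sample_taxa(taxa, 3, must_have_primate))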
  • Anupam Bhattacharjee, Kazi Zakia Sultana, Zalia Shams
    ABSTRACT: Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics produce good trees within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP and ML are NP-hard, such approaches do not scale to large datasets. In this paper, we present a promising divide-and-conquer technique, the ZAZ method, to construct an evolutionary tree. The algorithm has been implemented and tested against five large biological datasets ranging from 5000 to 7000 sequences, obtaining dramatic speedups with significant improvement in accuracy (better than 94%) in comparison to existing approaches. Thus, high-quality reconstruction can be obtained for large datasets using this approach. Moreover, we present another approach to construct the tree dynamically, when sequences arrive dynamically with partial information. Finally, combining the two approaches, we show parallel approaches to construct the tree when sequences are generated or obtained dynamically.
    Proceedings of the Canadian Conference on Electrical and Computer Engineering, CCECE 2006, May 7-10, 2006, Ottawa Congress Centre, Ottawa, Canada; 01/2006
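    The divide-and-conquer skeleton can be sketched in a few lines; the naive midpoint split below is a placeholder for the ZAZ method's actual splitting strategy, which is the paper's contribution:

        def build_tree(seqs):
            """Return a nested-tuple (Newick-like) topology over the sequences."""
            if len(seqs) == 1:
                return seqs[0]
            mid = len(seqs) // 2               # placeholder split; ZAZ splits smarter
            return (build_tree(seqs[:mid]), build_tree(seqs[mid:]))

        print(build_tree(["s1", "s2", "s3", "s4", "s5"]))
        # -> (('s1', 's2'), ('s3', ('s4', 's5')))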
  • Anupam Bhattacharjee, Hasan M. Jamil
    ABSTRACT: Given an undirected or directed large weighted data graph and a similar smaller weighted pattern graph, the problem of weighted subgraph matching is to find a mapping of the nodes in the pattern graph to a subset of nodes in the data graph such that the sum of edge weight differences is minimum. Biological interaction networks such as protein-protein interaction networks and molecular pathways are often modeled as weighted graphs in order to account for the high false positive rate occurring intrinsically during the detection of the interactions. Nonetheless, complex biological problems such as disease gene prioritization and conserved phylogenetic tree construction largely depend on similarity calculation among the networks. Although several existing methods provide efficient graph and subgraph similarity measurement, they produce nonintuitive results due to the underlying unweighted graph model assumption. Moreover, very few algorithms exist for weighted graph matching, and those apply only with the restriction that the data and pattern graph sizes are equal. In this paper, we introduce a novel algorithm for weighted subgraph matching which can effectively be applied to directed or undirected weighted subgraph matching. Experimental results demonstrate the superiority and relative scalability of the algorithm over available state-of-the-art methods. Keywords: weighted graphs, weighted subgraph matching, canonical representation, biological networks.
    Journal of Intelligent Information Systems 38(3):1-18. · 0.83 Impact Factor
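    The objective can be pinned down with a brute-force reference implementation: enumerate injective mappings and minimize the summed edge-weight differences (exponential, for illustration only; the paper's algorithm is the scalable alternative):

        from itertools import permutations

        def best_mapping(p_edges, d_nodes, d_weight):
            """p_edges: {(u, v): w}; d_weight: {(x, y): w}; a missing edge costs inf."""
            p_nodes = sorted({n for e in p_edges for n in e})
            best, best_cost = None, float("inf")
            for image in permutations(d_nodes, len(p_nodes)):
                m = dict(zip(p_nodes, image))
                cost = sum(abs(w - d_weight.get((m[u], m[v]), float("inf")))
                           for (u, v), w in p_edges.items())
                if cost < best_cost:
                    best, best_cost = m, cost
            return best, best_cost

        pattern = {("a", "b"): 0.9}
        data_w = {("x", "y"): 0.8, ("y", "z"): 0.1}
        print(best_mapping(pattern, ["x", "y", "z"], data_w))
        # best maps a->x, b->y with cost |0.9 - 0.8|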
  • Md Tanvir Al Amin, Anupam Bhattacharjee
    ABSTRACT: Presenting a clear image by reducing the noise to a minimal level is one of the most fundamental research topics in image processing. Different types of noise are introduced during the process of acquisition to digitization of an image, causing degradation in quality. As there is no way for total elimination, several methods are employed depending on the type of noise. In this paper, we present a method based on least squares regression analysis to detect and eliminate impulsive noise from an image. A sweeping window of a certain dimension calculates how well a general plane fits over the pixels currently inside the window and then examines each pixel to determine whether it is part of the noise or part of the signal. The final decision about each pixel is taken according to the majority of verdicts about the pixel. Values of the corrupted pixels are found by fitting a general paraboloid only through the uncorrupted pixels in each window and taking the mean of all suggested values. The method can be applied to a vast area of real-world applications like digital photography, medical imaging, satellite imagery correction, computer vision, and so on. Experiments reveal that this method has a success rate of more than 92% in the detection and elimination of impulse noise.
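    The detection step described, fitting a plane by least squares over a sweeping window and flagging pixels with large residuals, can be sketched directly (window size and threshold are illustrative choices, and the paraboloid-based correction step is omitted):

        import numpy as np

        def impulse_mask(img, win=3, thresh=40.0):
            """Flag interior pixels whose residual against a fitted plane is large."""
            h, w = img.shape
            r = win // 2
            mask = np.zeros_like(img, dtype=bool)
            ys, xs = np.mgrid[0:win, 0:win]
            A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(win * win)])
            for i in range(r, h - r):
                for j in range(r, w - r):
                    z = img[i - r:i + r + 1, j - r:j + r + 1].astype(float).ravel()
                    coef, *_ = np.linalg.lstsq(A, z, rcond=None)  # plane z = ax+by+c
                    resid = z[win * win // 2] - A[win * win // 2] @ coef
                    mask[i, j] = abs(resid) > thresh
            return mask

        img = np.full((7, 7), 100, dtype=np.uint8)
        img[3, 3] = 255                         # one impulse-corrupted pixel
        print(np.argwhere(impulse_mask(img)))   # [[3 3]]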
  • Mohammad Shafkat Amin, Anupam Bhattacharjee, Hasan Jamil
    ABSTRACT: As the volume of information available on the internet is growing exponentially, it is clear that most of this information will have to be processed and digested by computers to produce useful information for human consumption. Unfortunately, most web contents are currently designed for direct human consumption, in which it is assumed that a human will decipher the information presented to him in some context and will be able to connect the missing dots, if any. In particular, information presented in some tabular form is often not accompanied by descriptive titles or column names similar to attribute names in tables. While such omissions are not really an issue for humans, it is truly hard to extract information in autonomous systems in which a machine is expected to understand the meaning of the table presented and extract the right information in the context of the query. It is even more difficult when the information needed is distributed across the globe and involves semantic heterogeneity. In this paper, our goal is to address the issue of how to interpret tables with missing column names by developing a method for the assignment of attribute names in an arbitrary table extracted from the web in a fully autonomous manner. We propose a novel approach by leveraging Wikipedia for the first time for column name discovery for the purpose of table annotation. To improve the accuracy of the assignment, we exploit the structural homogeneity of the column values and their co-location to weed out less likely candidates. We show that this leads to an improved likelihood of capturing the context and interpretation of the table accurately and producing a semantically meaningful query response.