Dhruba K Bhattacharyya

Tezpur University · Department of Computer Science & Engineering

PhD in Computer Science and Engineering
Working towards the development of a generic machine learning solution for novel malware and biomarker identification.

About

291
Publications
127,195
Reads
5,628
Citations
Citations since 2016
125 Research Items
4643 Citations
[Chart: citations per year, 2016-2022]
Introduction
Machine-learning-enabled (i) malware defense development and (ii) biomarker identification using differential and co-expression analysis are my high-priority research areas.
Additional affiliations
March 2004 - July 2017
Tezpur University
Position
  • Professor
Description
  • Teaching, Research, Consultancy and Administration.
March 1999 - July 2004
Tezpur University
Position
  • Professor
Description
  • Teaching and research.
March 1995 - March 1999
Tezpur University
Position
  • Professor
Description
  • Teaching and research.

Publications (291)
Article
Recent years have seen a rise in applications of differential co-expression analysis (DCE) for disease biomarker identification. This paper presents a centrality-based, hub-gene-centric method called Centrality Based Differential Co-Expression Method (CBDCEM) for finding crucial genes in critical diseases. A prominent task of DCE is the identificatio...
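To make the idea of hub-gene-centric differential co-expression analysis concrete, here is a minimal Python sketch. It is not CBDCEM, whose details are truncated above; it is a hypothetical illustration that assumes (i) co-expression edges are drawn where |Pearson r| >= 0.8 (an arbitrary cutoff), (ii) degree centrality is the centrality measure, and (iii) genes are ranked by how much their centrality changes between a normal and a disease condition. The names `degree_centrality` and `differential_hubs` are invented for this example, and genes with constant expression are assumed absent (their correlation is undefined).

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def degree_centrality(expr, threshold):
    """expr maps gene -> expression vector; genes are linked when |r| >= threshold.
    Returns each gene's degree divided by the maximum possible degree (n - 1)."""
    degree = {g: 0 for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return {g: d / (len(expr) - 1) for g, d in degree.items()}

def differential_hubs(normal, disease, threshold=0.8, top=2):
    """Rank genes by the absolute change in degree centrality between conditions."""
    c_normal = degree_centrality(normal, threshold)
    c_disease = degree_centrality(disease, threshold)
    change = {g: abs(c_disease[g] - c_normal[g]) for g in normal}
    return sorted(change, key=lambda g: (-change[g], g))[:top]

# Toy data (4 samples per gene): gene A loses its co-expression partners in disease.
normal = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [3, 4, 5, 6], "D": [1, -1, -1, 1]}
disease = {"A": [4, 1, 3, 2], "B": [2, 4, 6, 8], "C": [3, 4, 5, 6], "D": [1, -1, -1, 1]}
print(differential_hubs(normal, disease, top=1))  # ['A']
```

Degree centrality is used only because it is the simplest centrality to compute by hand; betweenness or eigenvector centrality could be substituted without changing the overall flow.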
Article
Full-text available
Recently, spectral-spatial information-based algorithms have been gaining more attention because of their robustness, accuracy and efficiency. In this paper, an SVM-based classification method is proposed that extracts features considering both spectral and spatial information. The proposed method exploits SVM to encode spectral-spatial information o...
Article
Clustering unleashes the power of scRNA-seq through identification of appropriate cell groups. Most existing clustering methods applied on or developed for scRNA-seq data require user inputs. A few also require rigorous external preprocessing. In this paper, we propose an effective clustering method, which integrates required preprocessing steps fo...
Preprint
Full-text available
A satellite image is remotely sensed image data, where each pixel represents a specific location on Earth. The recorded pixel value is the radiation reflected from the Earth's surface at that location. Multispectral images are those that capture image data at specific frequencies across the electromagnetic spectrum, as compared to Panchromatic im...
Article
Full-text available
Recent history has not been generous when it comes to viral infections passing from animal reservoirs to humans. The re-emergence of mutating strains of such viruses has only added to the misery. In 2019, SARS-CoV-2 had a daunting presence around the world, posing grave threats to global health, economy, livelihood and huma...
Article
Feature selection helps select an optimal subset of features from a large feature space to achieve better classification performance. The performance of the KNN classifier can be improved significantly by using an appropriate subset of features. Recent development in General Purpose Graphics Processing Units (GPGPU) has provided...
Chapter
With the advancements in modern information and control systems, a new generation of systems has emerged, featuring a combination of independently developed cyber and physical processes. These systems are called cyber physical systems (CPSs). CPSs are composed of various interacting elements that monitor and control the physical processes through a...
Article
An extensive empirical study is presented in this work to identify potential biomarkers of ESCC by employing fifteen prominent biclustering algorithms on synthetic and real datasets. For systematic analyses, we implement the algorithms on a variety of synthetic datasets and evaluate the quality of biclusters using recovery and relevance scores. The...
Article
The challenge of identifying modules in a gene interaction network is important for a better understanding of the overall network architecture. In this work, we develop a novel similarity measure called Scaling-and-Shifting Normalized Mean Residue Similarity (SNMRS), based on the existing NMRS technique [1]. SNMRS yields correlation values in the r...
Chapter
Jitani, N.; Singha, B.; Barman, G.; Talukdar, A.; Choudhury, B. K.; Sarmah, R.; Bhattacharyya, D. K.
Thresholding is one of the most widely used techniques for image segmentation, particularly for medical image segmentation. The key idea is the selection of an appropriate intensity value to differentiate the background pixels from the object of interest pixels...
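The key idea above, choosing one intensity value that separates background pixels from object pixels, can be illustrated with Otsu's method, a classic automatic threshold selector that maximizes the between-class variance of the two pixel groups. This is a swapped-in, well-known technique shown for illustration only; the chapter's own method is not described in the truncated abstract. The sketch assumes 8-bit grayscale intensities (integers 0 to 255) given as a flat list, and `otsu_threshold` is a hypothetical helper name.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.
    Pixels with intensity <= t form the background, pixels > t the object."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg = 0   # number of pixels with intensity <= t
    sum_bg = 0      # sum of those intensities
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Proportional to between-class variance (counts instead of probabilities).
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Toy image: dark background around 20-30, bright object around 200-210.
pixels = [20] * 50 + [30] * 50 + [200] * 40 + [210] * 60
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True marks object pixels
print(t)  # 30
```

Using counts rather than normalised probabilities only rescales the objective by a constant, so the selected threshold is the same.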
Article
Full-text available
Parkinson’s disease (PD) is one of the most common neurodegenerative disorders. This aging-related disease occurs due to the degenerative loss of tissue or cellular functions in the brain and due to genetic and epigenetic effects. This study was conducted on an RNA-seq dataset of PD collected from BA9 tissues to gain insights into PD. A few RNA-seq ba...
Article
Exploratory analysis of high-throughput gene-sample-time (GST) data has an impact on biomedical and bioinformatics research. Mining gene expression patterns in such three-dimensional data facilitates understanding of hidden biological knowledge as well as the underlying complex gene regulatory mechanisms. In particular, we propose a novel semi-supervised...
Preprint
Full-text available
Hepatobiliary cancers (HBCs) are the most aggressive and the sixth most diagnosed cancers globally. Biomarkers for timely diagnosis and targeted therapy in HBCs are still limited. To address this gap, our objective is to identify unique and overlapping molecular signatures associated with HBCs. We analyzed publicly available transcriptomic datasets on...
Article
scRNA-seq data analysis enables new possibilities for identification of novel cells, specific characterization of known cells and study of cell heterogeneity. The performance of most clustering methods, especially those developed for scRNA-seq, is greatly influenced by user input. We propose a centrality-clustering method named UICPC and compare its perfor...
Article
A number of methods are being developed and used for analysis of gene expression data such as RNA-Seq data. Most of these tools focus on finding genes that are responsible for the disease conditions. Methods such as co-expression network generation, module detection and differential co-expression analysis are used to look into specific changes in t...
Article
Full-text available
Gallbladder cancer (GBC) has a lower incidence rate among the population relative to other cancer types but is a major contributor to the total number of biliary tract system cancer cases. GBC is distinguished from other malignancies by its high mortality, marked geographical variation and poor prognosis. To date no systemic targeted therapy is ava...
Article
Hyperspectral sensors generate huge datasets that convey an abundance of information. However, these data pose many challenges for analysis and interpretation. Deep networks like VGG16 and VGG19 are difficult to apply directly to hyperspectral image (HSI) classification because of their large number of layers, which in turn requires high level...
Article
Density-based clustering can detect arbitrarily shaped clusters in any dataset. In recent years, several density peak clustering methods have been reported. Among these, a few need user input(s), but the majority use cluster validity indices to provide the best results. In this paper, we propose a density-based user-input-free clustering m...
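The density-peak idea these methods build on is usually traced to Rodriguez and Laio's clustering by fast search and find of density peaks: each point gets a local density rho (neighbours within a cutoff d_c) and a separation delta (distance to the nearest denser point), and points with both large rho and large delta become cluster centres. The sketch below is that classical scheme, not the user-input-free method proposed in the paper; note that it needs exactly the kind of user inputs (d_c and the number of centres) the paper aims to avoid. The function name, the toy data and the use of the global maximum pairwise distance as delta for the densest point are assumptions made for illustration.

```python
import math

def density_peaks(points, d_c, n_centres):
    """Classical density-peak clustering; returns a cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: local density = number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [None] * n
    for pos, i in enumerate(order):
        if pos == 0:
            # The densest point gets the largest pairwise distance as its delta.
            delta[i] = max(max(row) for row in dist)
            continue
        # delta: distance to the closest point that precedes i in density order.
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Centres: the densest point plus the remaining points with largest rho * delta.
    ranked = sorted(range(n), key=lambda i: -rho[i] * delta[i])
    centres = [order[0]] + [i for i in ranked if i != order[0]][: n_centres - 1]
    labels = [None] * n
    for cluster_id, c in enumerate(centres):
        labels[c] = cluster_id
    # Every other point joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] is None:
            labels[i] = labels[nearest_denser[i]]
    return labels

blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
print(density_peaks(blob_a + blob_b, d_c=1.0, n_centres=2))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Processing points in decreasing density guarantees that each point's nearest denser neighbour is already labelled when the point itself is assigned.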
Article
To promote diligent analysis of the progression of a disease, it is important to identify interesting biomarkers for the disease. Biclustering has already been established as an effective technique to help identify such biomarkers of high biological significance. Although in the recent past, a good number of biclustering techniques have been introd...
Chapter
K-nearest neighbor (k-nn) is a widely used classifier in machine learning and data mining, and is very simple to implement. The k-nn classifier predicts the class label of an unknown object based on the majority of the computed class labels of its k nearest neighbors. The prediction accuracy of the k-nn classifier depends on the user input value of...
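The majority-vote rule described above fits in a few lines. This minimal Python sketch assumes numeric feature vectors, Euclidean distance and ties broken in favour of the nearer neighbours' class; `knn_predict` is a hypothetical name, not an API from the chapter. The toy example shows how the prediction for the same query can flip with the user-chosen k, which is why the choice of k matters.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote of its k nearest neighbours.
    `train` is a list of (feature_vector, label) pairs; distance is Euclidean."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Keep the k training points closest to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, most_common keeps first-seen order, i.e. the nearer neighbour's label.
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (5, 5), k=3))  # b
print(knn_predict(train, (5, 5), k=5))  # a: the three "a" points now outvote the two "b"s
```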
Article
Effective biomarkers aid in the early diagnosis and monitoring of breast cancer and thus play an important role in the treatment of patients suffering from the disease. Growing evidence indicates that alteration of expression levels of miRNA is one of the principal causes of cancer. We analyze breast cancer miRNA data to discover a list of bicluste...
Preprint
Full-text available
Gallbladder cancer (GBC) has a lower incidence rate among the population relative to other cancer types but is a major contributor to the total cancer cases of the biliary tract system. GBC is distinguished from other malignancies by its high mortality, marked geographical variation and poor prognosis. To date no systemic targeted therapy is avail...
Chapter
In the near future, the Internet is predicted to be on the cloud, resulting in more complex and more intensive computing, but possibly also a more insecure digital world. The presence of a large number of densely organized resources is a factor in attracting DDoS attacks. Such attacks are arguably more dangerous in private or individual clouds with limited...
Chapter
Saikia, Manaswita; Bhattacharyya, Dhruba K.
Biclustering has already been established as an effective tool to study gene expression data toward interesting biomarker findings for a given disease. This paper examines the effectiveness of some prominent biclustering algorithms in extracting biclusters of high biological significance toward the identific...
Article
Genes act in groups known as gene modules, which accomplish different cellular functions in the body. The modular nature of gene networks was used in this study to detect functionally enriched modules in samples obtained from COPD patients. We analyzed modules extracted from COPD samples and identified crucial genes associated with the disease COVI...
Chapter
Sharma, Pooja; Pandey, Anuj K.; Bhattacharyya, Dhruba K.; Kalita, Jugal K.
Genes are the backbone of living bodies. Gene modules are groups of genes responsible for carrying out various life-supporting functions in the body. However, any disruption in the activity of genes leads to an imbalance in the body, referred to as a diseased condition. Par...
Chapter
Full-text available
Classifying sensor data correctly and quickly has a significant impact on areas such as performance monitoring, user behavior analysis, user accounting and intrusion detection in IoT (Internet of Things). This work is an approach to reorganize the features in a dataset of 114 features depending on the relevancy and non-redundancy of an attribute...
Conference Paper
In recent years, ransomware has emerged as a new malware epidemic that creates havoc on the Internet. It infiltrates a victim system or network and encrypts all personal files or the whole system using a variety of encryption techniques. Such techniques prevent users from accessing files or the system until the required amount of ransom is paid. In...
Conference Paper
Full-text available
In this paper, a clean comparison among different ensemble learning methods for hyperspectral image (HSI) classification is investigated. Random Forest (RF), eXtreme Gradient Boosting (XGBoost) and AdaBoost have been exploited, with both spectral and spatial features extracted. AdaBoost has been used with two base learners: decision tree (DT) and...
Article
Full-text available
This paper introduces an enhanced version of Pearson’s correlation coefficient (PCC) to achieve better biclustering-enabled co-expression analysis. The modified measure, called Local Pearson Correlation Measure (LPCM), helps detect shifting, scaling, and shifting-and-scaling correlation patterns effectively over gene expression data in the presence o...
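Why PCC is a natural starting point for these patterns can be checked directly: a profile that is a shifted, positively scaled, or shifted-and-scaled copy of another has a Pearson correlation of exactly 1 with it, while a single perturbed condition can pull the coefficient down sharply. The sketch below computes plain PCC only; it is not LPCM, whose definition is not given here, and the toy profiles are invented.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gene = [1.0, 3.0, 2.0, 5.0, 4.0]
patterns = {
    "shifting": [v + 10 for v in gene],
    "scaling": [3 * v for v in gene],
    "shifting-and-scaling": [2 * v + 7 for v in gene],
}
for name, profile in patterns.items():
    print(name, round(pearson(gene, profile), 6))  # 1.0 for each pattern

# Replace one condition of the shifting-and-scaling profile (15.0) with 40.0.
noisy = patterns["shifting-and-scaling"][:4] + [40.0]
print(round(pearson(gene, noisy), 4))  # 0.5625
```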
Article
A gene co-expression network (CEN) is of biological interest, since co-expressed genes share common functions and biological processes or pathways. Finding relationships among modules can reveal inter-modular preservation, and similarity in transcriptome, functional, and biological behaviors among modules of the same or two different datasets. Ther...
Chapter
In this work, we study the performance of the K-Nearest Neighbour (KNN) based predictive model in sequential as well as parallel mode to observe its performance both in terms of accuracy and execution time. We propose a parallel KNN algorithm, called TUKNN to handle voluminous data. Based on our experimental study, it has been observed that our met...
Chapter
Full-text available
This paper attempts to identify a set of crucial genes for Esophageal Squamous Cell Carcinoma (ESCC) using Differential Expression analysis supported by gene enrichment analysis. Initially, we identify a subset of up-regulated and down-regulated genes based on adjusted P-value and log fold change value. Then, we construct a co-expression network and...
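Selecting up- and down-regulated genes by adjusted P-value and log fold change, as described above, can be sketched as follows. The sketch assumes raw p-values and log2 fold changes have already been produced by a differential expression tool, uses the Benjamini-Hochberg procedure for the adjustment, and applies the common but arbitrary cutoffs padj < 0.05 and |log2FC| >= 1; the adjustment and thresholds actually used in the paper are not stated here, and the gene names and values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone (and <= 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def split_degs(genes, pvalues, log2fc, alpha=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene lists passing both the padj and |log2FC| cutoffs."""
    padj = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, q, fc in zip(genes, padj, log2fc):
        if q < alpha and fc >= lfc_cutoff:
            up.append(gene)
        elif q < alpha and fc <= -lfc_cutoff:
            down.append(gene)
    return up, down

genes = ["G1", "G2", "G3", "G4", "G5"]
pvalues = [0.001, 0.004, 0.02, 0.03, 0.6]
log2fc = [2.5, -1.8, 0.4, 1.2, 3.0]
print(split_degs(genes, pvalues, log2fc))  # (['G1', 'G4'], ['G2'])
```

G3 is significant after adjustment but fails the fold-change cutoff, while G5 has a large fold change but is not significant, which is why both filters are applied together.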
Article
To understand the underlying biological mechanisms of gene expression data, it is important to discover the groups of genes that have similar expression patterns under certain subsets of conditions. Biclustering algorithms have been effective in analyzing large-scale gene expression data. Recently, traditional biclustering has been improved by intr...
Conference Paper
Classifying sensor data correctly and quickly has a significant impact on areas such as performance monitoring, user behavior analysis, user accounting and intrusion detection in IoT (Internet of Things). This work is an approach to reorganize the features in a dataset of 114 features depending on the relevancy and non-redundancy of an attribute or...
Article
The shrew DDoS attack mainly targets TCP’s retransmission timeout (RTO) mechanism, which handles severe cases of congestion and packet loss. This attack is very hard to detect because of its stealthy nature and low volume; if it remains undetected, it can degrade legitimate TCP flows. In this paper, we propose a fast shrew DDoS attack detecti...
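The RTO mechanism that shrew attacks target can be sketched with the standard retransmission-timer computation of RFC 6298 (smoothed RTT and RTT variance, RTO = SRTT + max(G, 4 * RTTVAR), rounded up to a 1 s minimum). For typical sub-second round-trip times the computed value is clamped to that 1 s floor, so many flows time out on the same predictable schedule, which is the property low-rate shrew bursts are generally described as exploiting. This is an illustrative sketch of the timer only, not the detection method proposed in the paper; the 1 ms clock granularity and the name `rto_estimates` are assumptions.

```python
def rto_estimates(rtt_samples, clock_granularity=0.001, min_rto=1.0):
    """Return the RTO (seconds) after each RTT sample, following RFC 6298."""
    alpha, beta, k = 1 / 8, 1 / 4, 4
    srtt = rttvar = None
    rtos = []
    for r in rtt_samples:
        if srtt is None:
            # The first measurement initialises the estimators.
            srtt, rttvar = r, r / 2
        else:
            # RTTVAR is updated with the old SRTT, then SRTT is updated.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        rto = srtt + max(clock_granularity, k * rttvar)
        # RFC 6298: an RTO below 1 second SHOULD be rounded up to 1 second.
        rtos.append(max(min_rto, rto))
    return rtos

print(rto_estimates([0.05, 0.06, 0.04]))  # [1.0, 1.0, 1.0]: every value hits the floor
print(rto_estimates([0.8]))                # about 2.4: SRTT 0.8 + 4 * RTTVAR 0.4
```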
Chapter
A semi-supervised gene co-expressed pattern finding method, PatGeneClus is presented in this paper. PatGeneClus attempts to find all possible biologically relevant gene coherent patterns from any microarray dataset by exploiting both gene expression similarity as well as GO-similarity. PatGeneClus uses a graph-based clustering algorithm called DCli...
Chapter
Presence of missing values (MV) in gene expression data is commonplace. It significantly affects the performance of statistical analysis and machine learning algorithms. Discarding objects or attributes with missing values and inappropriate estimation of MVs lead to high information loss and misleading results. So, it is necessary to have an accura...
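One common baseline for estimating missing values in expression matrices is k-nearest-neighbour imputation: each missing entry is replaced by the average of that column in the k rows (genes) most similar to the incomplete row on their shared observed columns. The sketch below is that generic baseline, not the method of the chapter; it assumes missing values are encoded as `None`, uses a root-mean-square distance over shared columns, and `knn_impute` is a hypothetical name.

```python
import math

def knn_impute(rows, k=2):
    """Return a copy of `rows` with None entries filled by k-NN averaging."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue  # a neighbour must have the target column observed
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square difference over columns observed in both rows.
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[col]))
            nearest = sorted(candidates)[:k]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][col] = sum(v for _, v in nearest) / len(nearest)
    return filled

expr = [[1.0, 2.0, 3.0],
        [1.1, 2.1, None],
        [5.0, 6.0, 7.0],
        [0.9, 1.9, 2.9]]
print(knn_impute(expr)[1][2])  # about 2.95: mean of 3.0 and 2.9
```

Distances are always computed on the original rows, so an imputed value never influences another imputation in the same pass.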
Article
Full-text available
Background: Neuropsychiatric disorders such as Schizophrenia (SCZ) and Bipolar disorder (BPD) pose a broad range of problems with different symptoms, mainly characterized by some combination of abnormal thoughts, emotions, behaviour, etc. However, in-depth molecular and pathophysiological mechanisms among different neuropsychiatric disorders have not...
Article
Full-text available
Esophageal Squamous Cell Carcinoma (ESCC) is considered a deadly disease, especially in North-East India. A series of differentially expressed genes (DEGs) are suspected to be involved in the progression of ESCC. A good number of tools are available to search for DEGs. To remove the bias in the DEGs reported by all such tools, a conse...
Chapter
Full-text available
This paper investigates Esophageal Squamous Cell Carcinoma (ESCC), a fatal disease ranked as the sixth leading cancer worldwide. Integrative analysis is used to identify a set of genes most responsible for the progression of this disease. We consider both microarray and RNA-seq gene expression data. Initia...
Chapter
Differential expression (DE) analysis and identification of differentially expressed genes (DEGs) provide insights for the discovery of therapeutic drugs and underlying mechanisms of disease. Statistical methods such as DESeq2, edgeR, and limma-voom produce a number of false positives and false negatives and fail to differentiate between the DEGs as u...
Preprint
Gene co-expression networks are effective in studying properties of genes, and deriving meaningful conclusions about their behaviors. Co-expression analysis is usually carried out on a single edge network. However, two genes may coexpress over two or more distinct subspaces, i.e., genes may be connected via multiple edges, where each edge correspon...
Article
Full-text available
Analysis of gene expression patterns enables identification of significant genes related to a specific disease. We analyze gene expression data for esophageal squamous cell carcinoma (ESCC) using biclustering, gene–gene network topology and pathways to identify significant biomarkers. Biclustering is a clustering technique by which we can extract coe...
Article
In the recent past, a number of methods have been developed for analysis of biological data. Among these methods, gene co-expression networks have the ability to mine functionally related genes with similar co-expression patterns, because of which such networks have been most widely used. However, gene co-expression networks cannot identify genes,...
Article
Random forest (RF) is one of the most powerful ensemble classifiers often used in machine learning applications. It has been found successful on many benchmark datasets. However, the performance of an RF model is highly affected by the calibration of the model parameters. It requires optimization of two parameters: (i) size of RF and (ii) number of fe...
Article
Network traffic classification to detect DDoS attacks is challenging in the context of high-speed networks. In this paper, we discuss the need for distributed feature selection in intrusion detection systems using parallel computing. This paper presents a parallel cumulative ranker algorithm to rank the attributes of a dataset for cost-effective cl...
Article
In this paper we present a complete framework for detection and mitigation of different types of commonly seen deadly DDoS attacks. The system assumes bi-directional traffic information at an edge router to detect and mitigate the attacks. A router might not always see the outgoing traffic corresponding to the incoming traffic carried by the router...
Article
Analysis of gene expression data is widely used in transcriptomic studies to understand functions of molecules inside a cell and interactions among molecules. Differential co-expression analysis studies diseases and phenotypic variations by finding modules of genes whose co-expression patterns vary across conditions. We review the best practices in...
Article
Full-text available
Whenever a web application executes dynamic SQL statements, it may be exposed to SQL injection attacks. To evaluate existing detection practices, we consider two different security scenarios for web-application authentication that generates dynamic SQL queries with the user input data. Accordingly, we generate two different datasets by consid...
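The first kind of scenario above, an authentication step that builds dynamic SQL from user input, can be reproduced with Python's built-in `sqlite3` module. The sketch contrasts a vulnerable string-concatenated query with a parameterized one; the `users` table, its columns and the payload are invented for illustration, and parameterized queries are the standard prevention technique rather than the detection approach studied in the paper.

```python
import sqlite3

def make_db():
    """In-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn

def login_dynamic(conn, name, password):
    # VULNERABLE: user input is pasted straight into the SQL text.
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0

def login_parameterized(conn, name, password):
    # Placeholders make sqlite3 treat the input strictly as data.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0

conn = make_db()
payload = "' OR '1'='1"
print(login_dynamic(conn, "alice", payload))        # True: the injected OR bypasses the check
print(login_parameterized(conn, "alice", payload))  # False
```

With the payload, the dynamic query's WHERE clause becomes `name = 'alice' AND password = '' OR '1'='1'`, and because AND binds tighter than OR, the always-true comparison matches every row.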
Chapter
This paper presents an effective alert correlation method referred to as MaNaDAC to support network intrusion detection. The method includes several modules such as feature ranking and selection, clustering and fusion to process low-level alerts and uses the concept of causality to discover relations among attacks. The method has been validated usi...
Book
The two-volume set of LNCS 11941 and 11942 constitutes the refereed proceedings of the 8th International Conference on Pattern Recognition and Machine Intelligence, PReMI 2019, held in Tezpur, India, in December 2019. The 131 revised full papers presented were carefully reviewed and selected from 341 submissions. They are organized in topical secti...
Article
Full-text available
In the domain of gene-gene network analysis, construction of co-expression networks and extraction of network modules have opened up enormous possibilities for exploring the role of genes in biological processes. Through such analysis, one can extract interesting behaviour of genes, which can help in the discovery of genes participating in a common...
Article
Analysis of RNA-sequence (RNA-seq) data is widely used in transcriptomic studies and it has many applications. We review RNA-seq data analysis from RNA-seq reads to the results of differential expression analysis. In addition, we perform a descriptive comparison of tools used in each step of RNA-seq data analysis along with a discussion of important...
Chapter
Datasets are important for validation of any method or technique. The effectiveness of a method or technique can be well judged using an unbiased, complete and correct dataset. This paper presents a novel dataset to support validation of any computer vision method for recognition of Sattriya dance hand gestures, a fifteenth-century major Indian cla...
Article
Cross-site scripting attack (abbreviated as XSS) has been a persistent problem for Web applications since the early 2000s. It is a client-side code injection attack in which an attacker injects a malicious payload into a vulnerable Web application. The attacker is often successful in eventually executing the malicious code in an innocent user's b...
Article
Developing a cost-effective and robust triclustering algorithm that can identify triclusters of high biological significance in the gene-sample-time (GST) domain is a challenging task. Although most existing triclustering algorithms can detect shifting and scaling patterns in isolation, they are not able to handle co-occurring shifting-and-scaling patterns....
Article
This paper presents an exhaustive empirical study to identify biomarkers using two approaches: frequency-based and network-based, over seventeen different biclustering algorithms and six different cancer expression datasets. To systematically analyze the biclustering algorithms, we perform enrichment analysis, subtype identification and biomarker i...
Article
Malicious software events are usually stealthy and thus challenging to detect. A triggering relation can be assumed to be causal and to create a temporal relationship between the events. For example, in a spoofed TCP DDoS flooding attack, the attacker manipulates a three-way handshake procedure. During this attack, the number of spoofed IP addresse...
Article
This paper presents fingerprint indexing based on graph information of minutiae, and fingerprint classification and verification based on a hierarchical agglomerative clustering technique. The proposed fingerprint indexing is invariant under translation and rotation. Its performance is evaluated on several real-life datasets. The fingerprint dat...