# Angshul Majumdar

Indraprastha Institute of Information Technology | IIITD

PhD

## About

Publications: 311

Reads: 24,610

Citations: 4,735

Additional affiliations

May 2016 - July 2016

May 2015 - July 2015

July 2014 - August 2014

## Publications

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor in the enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered phenoty...

There are two fundamental contributions of this work. On the application side, one of the most challenging problems is tackled: predicting day-ahead crypto-currency prices. On the theoretical front, a new dynamical modeling approach is proposed. The proposed approach keeps the probabilistic formulation of the State-Space Model that yields point est...

Co-administration of two or more drugs simultaneously can result in adverse drug reactions. Identifying drug-drug interactions (DDIs) is necessary, especially for drug development and for repurposing old drugs. DDI prediction can be viewed as a matrix completion task, for which matrix factorization (MF) appears as a suitable solution. This paper pr...

Inter and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of high-throughput screening datasets has paved the way for machine learning based personalized therapy recommendations using the molecular profiles...

Investigation of existing drugs is an effective alternative to the discovery of new drugs for treating diseases. This task of drug re-positioning can be assisted by various kinds of computational methods to predict the best indication for a drug given the open-source biological datasets. Owing to the fact that similar drugs tend to have common path...

Understanding the variability in intra-urban forms is essential for future-proofing cities against climate volatilities. However, the classification of urban forms is costly due to its reliance on high-resolution datasets, which are limited in the Global South's low-income cities. Additionally, the current classifiers are constrained in characterizin...

This study formulates antiviral repositioning as a matrix completion problem wherein the antiviral drugs are along the rows and the viruses are along the columns. The input matrix is partially filled, with ones in positions where the antiviral drug has been known to be effective against a virus. The curated metadata for antivirals (chemical structu...
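The matrix completion setup described in this abstract can be illustrated with a minimal sketch. This is not the authors' method: it uses plain regularized low-rank factorization fitted by gradient descent, ignores the metadata, and assumes that some zeros are also treated as observed. The function name `complete_matrix`, its parameters, and the toy drug-virus matrix are all hypothetical.

```python
import numpy as np

def complete_matrix(M, observed, rank=2, lam=0.01, lr=0.05, iters=3000, seed=0):
    """Fill in a partially observed matrix with a low-rank factorization.

    M        : (drugs x viruses) array; entries outside `observed` are ignored
    observed : boolean array of the same shape, True where the entry is known
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))  # row (drug) factors
    V = 0.1 * rng.standard_normal((n_cols, rank))  # column (virus) factors
    for _ in range(iters):
        residual = observed * (U @ V.T - M)  # error on known entries only
        grad_U = residual @ V + lam * U
        grad_V = residual.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Hypothetical toy data: rows are drugs, columns are viruses,
# 1 means "known to be effective". Two entries are hidden from the model.
M = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
observed = np.ones_like(M, dtype=bool)
observed[0, 1] = False
observed[3, 2] = False

scores = complete_matrix(M, observed)
print(np.round(scores, 2))  # scores at hidden positions rank candidate pairs
```

High scores at unobserved positions would be read as candidate drug-virus pairs worth investigating; the rank, step size, and regularization weight are arbitrary choices for the toy example.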

Dysregulation of a gene’s function, either due to mutations or impairments in regulatory networks, often triggers pathological states in the affected tissue. Comprehensive mapping of these apparent gene-pathology relationships is an ever daunting task, primarily due to genetic pleiotropy and lack of suitable computational approaches. With the adven...

In clustering-based hyperspectral band selection techniques, 2-D images of each band are usually taken as input samples. Some form of feature extraction on these images is performed before they are input to the clustering algorithm. The clustering algorithm returns the cluster centroids; the bands closest to the centroids are selected as representa...

In this work we propose a dictionary learning based clustering approach. We regularize dictionary learning with a clustering loss; in particular, we have used sparse subspace clustering and K-means clustering. The basic idea is to use the coefficients from dictionary learning as inputs for clustering. Comparison with state-of-the-art deep learning...

The objective of this letter is to propose a novel computational method to learn the state of an appliance (ON / OFF) given the aggregate power consumption recorded by the smart-meter. We formulate a multi-label classification problem where the classes correspond to the appliances. The proposed approach is based on our recently introduced framework...

This work addresses the problem of completing a partially observed matrix where the entries are either ones or zeroes. This is typically called one-bit matrix completion or binary matrix completion. In this problem, the association among the rows and among the columns can be modeled through graph Laplacians. Since the Laplacians cannot be computed...

The identification and characterization of circulating tumor cells (CTCs) are important for gaining insights into the biology of metastatic cancers, monitoring disease progression, and medical management of the disease. The limiting factor that hinders enrichment of purified CTC populations is their sparse availability, heterogeneity, and altered p...

Inter and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for imparting differential drug responses in cancer patients. Recently, the availability of large-scale drug screening datasets has provided an opportunity for predicting appropriate patient-tailored therapies by employing machine learnin...

This work follows the multi-label classification based paradigm for non-intrusive load monitoring (NILM). Power consumption signals used for NILM are inherently time varying. However, prior multi-label classification techniques could not model this dynamical behaviour. They used off-the-shelf algorithms for classifying static signals on NILM problem...

There are two broad frameworks for collaborative filtering. Chronologically, the neighborhood based models came first; they were based on linear interpolation where the interpolation weights were proportional to the similarities between users and items. Latent factor models were introduced later; they were based on the underlying assumption that...
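The neighborhood based interpolation mentioned in this abstract can be sketched as follows. This is a generic user-based illustration, not the model from the paper: it assumes cosine similarity between rating rows, treats 0 as a missing rating, and the name `predict_rating` and the toy ratings are hypothetical.

```python
import numpy as np

def predict_rating(R, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    R is a (users x items) array in which 0 marks a missing rating.
    """
    neighbours = np.flatnonzero(R[:, item] > 0)
    neighbours = neighbours[neighbours != user]
    if neighbours.size == 0:
        return float("nan")  # nobody else has rated this item
    norms = np.linalg.norm(R, axis=1)
    denom = np.maximum(norms[neighbours] * norms[user], 1e-12)
    sims = (R[neighbours] @ R[user]) / denom  # cosine similarities
    if sims.sum() == 0:
        return float(R[neighbours, item].mean())  # no similar user: plain mean
    # Interpolation weights are proportional to the similarities.
    return float(sims @ R[neighbours, item] / sims.sum())

# Hypothetical ratings: user 0 agrees with user 1 and shares nothing with user 2.
R = np.array([[5, 5, 0, 0],
              [5, 5, 0, 5],
              [0, 0, 5, 1]], dtype=float)
pred = predict_rating(R, user=0, item=3)
print(pred)
```

In this toy case user 2 has zero similarity to user 0, so the prediction is driven entirely by user 1's rating of 5.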

Curbing hate speech is undoubtedly a major challenge for online microblogging platforms like Twitter. While there have been studies around hate speech detection, it is not clear how hate speech finds its way into an online discussion. It is important for a content moderator to not only identify which tweet is hateful, but also to predict which twee...

The year 2020 witnessed a heavy death toll due to COVID-19, calling for a global emergency. Ongoing research and clinical trials paved the way for vaccines. However, vaccine efficacy in the long run is still questionable due to the mutating coronavirus, which makes drug re-positioning a reasonable alternative. COVID-19 has hence fast...

In this work, we introduce a new modeling and inferential tool for dynamical processing of time series. The approach is called recurrent dictionary learning (RDL). The proposed model reads as a linear Gaussian Markovian state-space model involving two linear operators, the state evolution and the observation matrices, that we assumed to be unknown....

This work addresses the problem of analyzing multi-channel time series data by proposing an unsupervised fusion framework based on convolutional transform learning. Each channel is processed by a separate 1D convolutional transform; the output of all the channels are fused by a fully connected layer of transform learning. The training procedure tak...

This work proposes a new approach for dynamical modeling; we call it sequential transform learning. This is loosely based on the transform (analysis dictionary) learning formulation. This is the first work on this topic. Transform learning was originally developed for static problems; we modify it to model dynamical systems by introducing a feedba...

This letter presents a new Transform Learning (TL) based multi-sensor fusion framework referred to as TransFuse. Unlike the standard representation learning based techniques, TransFuse learns individual transforms for each sensor and fuses them using a common transform representation within a joint optimization formulation. Considering regression a...

The advent of single-cell open-chromatin profiling technology has facilitated the analysis of heterogeneity of activity of regulatory regions at single-cell resolution. However, stochasticity and availability of low amount of relevant DNA cause high drop-out rate and noise in single-cell open-chromatin profiles. We introduce here a robust method c...

This work proposes an unsupervised fusion framework based on deep convolutional transform learning. The great learning ability of convolutional filters for data analysis is well acknowledged. The success of convolutive features owes to convolutional neural network (CNN). However, CNN cannot perform learning tasks in an unsupervised fashion. In a re...

This work addresses the problem of analyzing multi-channel time series data by proposing an unsupervised fusion framework based on convolutional transform learning. Each channel is processed by a separate 1D convolutional transform; the output of all the channels are fused by a fully connected layer of tr...

This work proposes a supervised multi-channel time-series learning framework for financial stock trading. Although many deep learning models have recently been proposed in this domain, most of them treat the stock trading time-series data as 2-D image data, whereas its true nature is 1-D time-series data. Since the stock trading systems are multi-c...

This work addresses the problem of completing a partially filled matrix incorporating metadata associated with the rows and columns. The basic operation of matrix completion is modeled via deep matrix factorization, and the metadata associations are modeled as graphs. The problem is formally modeled as deep matrix factorization regularized by multi...

This work introduces a new unsupervised representation learning technique called Deep Convolutional Transform Learning (DCTL). By stacking convolutional transforms, our approach is able to learn a set of independent kernels at different layers. The features extracted in an unsupervised manner can then be used to perform machine learning tasks, such...

The importance of clustering the single-cell RNA sequence is well known. Traditional clustering techniques (GiniClust, Seurat, etc.) have mostly been used to address this problem. This is the first work that develops a deep dictionary learning-based solution for the same. Our work builds on the framework of deep dictionary learning. We make the fra...

This letter addresses the problem of detecting non-technical losses in an unsupervised fashion. Most prior studies in this area proposed supervised means (assumed the losses to be labeled); getting supervised data for such a problem seems impractical. For a practical scenario, non-technical losses should be detected in an unsupervised fashion. Thi...

This work formulates antiviral repositioning as a matrix completion problem where the antiviral drugs are along the rows and the viruses along the columns. The input matrix is partially filled, with ones in positions where the antiviral has been known to be effective against a virus. The curated metadata for antivirals (chemical structure and pathw...

In this work we address the problem of short-term load forecasting. We propose a generalization of the linear state-space model where the evolution of the state and the observation matrices is unknown. The proposed blind Kalman filter algorithm proceeds via alternating the estimation of these unknown matrices and the inference of the state, within...

The advent of single-cell open-chromatin profiling technology has facilitated the analysis of heterogeneity of activity of regulatory regions at single-cell resolution. However, stochasticity and availability of low amount of relevant DNA cause high drop-out rate and noise in single-cell open-chromatin profiles. We introduce here a robust method ca...

We have created a database with all known viruses and their corresponding antivirals. The database also accounts for the genomic sequences of the viruses and the chemical structure of the drugs. This database is used for drug repositioning, with the goal of finding drugs suitable for treating COVID-19.

Motivation: COVID-19 has fast-paced drug re-positioning for its treatment. This work builds computational models for the same. The aim is to assist clinicians with a tool for selecting prospective antiviral treatments. Since the virus is known to mutate fast, the tool is likely to help clinicians in selecting the right set of antivirals for the mut...

Load Disaggregation has gained much popularity in recent times, owing to the advantages it brings to energy utility companies. Many modeling techniques ranging from Dictionary Learning to HMM-based techniques to Neural Network based modeling have been proposed in the literature to solve this problem. However, scalability and computational lig...

This work proposes an unsupervised fusion framework based on deep convolutional transform learning. The great learning ability of convolutional filters for data analysis is well acknowledged. The success of convolutive features owes to the convolutional neural network (CNN). However, CNN cannot perform learning tasks in an unsupervised fas...

Single-cell RNA sequencing has been proved to be advantageous in discerning molecular heterogeneity in seemingly similar cells in a tissue. Due to the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction cycle. This gets compounded by trivial biological noise such as variability in the cell c...

Convolutional transform learning is an unsupervised framework we introduced recently, for feature generation based on learnt convolutions. In this work, we propose a supervised formulation for convolutional transform so as to address the multi-label classification problem. Unlike the simple multiclass classification, in multi-label problems, each s...

Subspace clustering assumes that the data is separable into separate subspaces; this assumption may not always hold. For such cases, we assume that, even if the raw data is not separable into subspaces, one can learn a deep representation such that the learnt representation is separable into subspaces. To achieve the intended goal, we propose to em...

Transform learning is a new representation learning framework where we learn an operator/transform that analyses the data to generate the coefficient/representation. We propose a variant of it called the graph transform learning; in this we explicitly account for the correlation in the dataset in terms of graph Laplacian. We will give two variants;...

Motivation: Investigation of existing drugs is an effective alternative to discovery of new drugs for treating diseases. This task of drug re-positioning can be assisted by various kinds of computational methods to predict the best indication for a drug given the open-source biological datasets. Owing to the fact that similar drugs tend to have comm...

Multi-echo magnetic resonance (MR) images are acquired by changing the echo times (for T2 weighted) or relaxation times (for T1 weighted) of scans. The resulting (multi-echo) images are usually used for quantitative MR imaging. Acquiring MR images is a slow process, and acquiring multiple scans of the same cross section for multi-echo imaging is even s...

Subspace clustering assumes that the data is separable into separate subspaces. Such a simple assumption does not always hold. We assume that, even if the raw data is not separable into subspaces, one can learn a representation (transform coefficients) such that the learnt representation is separable into subspaces. To achieve the intended goal, w...

Regression and forecasting can be viewed as learning the functions with the appropriate input and output variables from the data. To capture the complex relationship among the variables, techniques such as kernelized dictionary learning are being explored in the existing literature. In this paper, the transform learning based function appro...

The identification of potential interactions between drugs and target proteins is crucial in pharmaceutical sciences. The experimental validation of interactions in genomic drug discovery is laborious and expensive; hence, there is a need for efficient and accurate in-silico techniques which can predict potential drug-target interactions to narrow...

The objective of this work is to improve the accuracy of building demand forecasting. This is a more challenging task than grid level forecasting. For the said purpose, we develop a new technique called recurrent transform learning (RTL). Two versions are proposed. The first one (RTL) is unsupervised; this is used as a feature extraction tool that...

In this work we propose a technique to remove sparse impulse noise from hyperspectral images. Our algorithm accounts for the spatial redundancy and spectral correlation of such images. The proposed method is based on the recently introduced Blind Compressed Sensing (BCS) framework, i.e. it empirically learns the spatial and spectral sparsifying dic...

In multi echo imaging, multiple T1/T2 weighted images of the same cross section are acquired. Acquiring multiple scans is time consuming. To accelerate acquisition, compressed sensing based techniques have been proposed. In recent times, it has been observed in several areas of traditional compressed sensing, that instead of using fixed basis (wavelet,...

This work follows the approach of multi-label classification for non-intrusive load monitoring (NILM). We modify the popular sparse representation based classification (SRC) approach (developed for single label classification) to solve multi-label classification problems. Results on the benchmark REDD and Pecan Street datasets show significant improvem...

The term blind denoising refers to the fact that the basis used for denoising is learnt from the noisy sample itself during denoising. Dictionary learning and transform learning based formulations for blind denoising are well known. But there has been no autoencoder based solution for the said blind denoising approach. So far autoencoder based deno...

Currently there are several well-known approaches to non-intrusive appliance load monitoring: rule based, stochastic finite state machines, neural networks and sparse coding. Recently several studies have proposed a new approach based on multi label classification. Different appliances are treated as separate classes, and the task is to identify the...

This work proposes a new framework for deep learning that has been particularly tailored for hyperspectral image classification. We learn multiple levels of dictionaries in a robust fashion. The last layer is discriminative and learns a linear classifier. The training proceeds greedily: a single level of dictionary is learnt at a time and the coef...

In recent studies in hyperspectral imaging, biometrics and energy analytics, the framework of deep dictionary learning has shown promise. Deep dictionary learning outperforms other traditional deep learning tools when training data is limited; therefore hyperspectral imaging is one such example that benefits from this framework. Most of the prior s...

The concept of deep dictionary learning has been recently proposed. Unlike shallow dictionary learning, which learns a single level of dictionary to represent the data, it uses multiple layers of dictionaries. So far, the problem could only be solved in a greedy fashion; this was achieved by learning a single layer of dictionary in each stage where th...

This work proposes a new image analysis tool called Label Consistent Transform Learning (LCTL). Transform learning is a recent unsupervised representation learning approach; we add supervision by incorporating a label consistency constraint. The proposed technique is especially suited for hyper-spectral image classification problems owing to its ab...

Energy disaggregation is the task of segregating the aggregate energy of the entire building (as logged by the smartmeter) into the energy consumed by individual appliances. This is a single channel (the only channel being the smart-meter) blind source (different electrical appliances) separation problem. The traditional way to address this is via...

In this work we propose an autoencoder based framework for simultaneous reconstruction and classification of biomedical signals. Previously these two tasks, reconstruction and classification, were treated as separate problems. This is the first work to propose a combined framework to address the issue in a holistic fashion. Reconstruction techniques...

This work proposes kernel transform learning. The idea of dictionary learning is well known; it is a synthesis formulation where a basis is learnt along with the coefficients so as to generate or synthesize the data. Transform learning is its analysis equivalent; the transform operates on (analyses) the data to generate the coefficients. The conc...

Conventionally, autoencoders are unsupervised representation learning tools. In this work, we propose a novel discriminative autoencoder. Use of supervised discriminative learning ensures that the learned representation is robust to variations commonly encountered in image datasets. Using the basic discriminating autoencoder as a unit, we build a s...

Energy disaggregation is the task of segregating the aggregate energy of the entire building (as logged by the smartmeter) into the energy consumed by individual appliances. This is a single channel (the only channel being the smart-meter) blind source (different electrical appliances) separation problem. In recent times dictionary learning based a...

Subspace clustering assumes that the data is separable into separate subspaces. Such a simple assumption does not always hold. We assume that, even if the raw data is not separable into subspaces, one can learn a representation (transform coefficients) such that the learnt representation is separable into subspaces. To achieve the intended go...

Latent factor models have been used widely in collaborative filtering based recommender systems. In recent years, deep learning has been successful in solving a wide variety of machine learning problems. Motivated by the success of deep learning, we propose a deeper version of latent factor model. Experiments on benchmark datasets show that our pr...

In this work, we focus on solving four standard inverse problems in imaging – denoising, deblurring, super-resolution, and reconstruction. All these problems are usually associated with image acquisition. Traditionally, signal processing techniques are used to solve such problems. However, such techniques are computationally expensive. In many appl...
