## About

279

Publications

22,970

Reads

**How we measure 'reads'**

A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more

4,882

Citations

Introduction

**Skills and Expertise**

## Publications

Publications (279)

Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more networks) during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., com...

Transformers have emerged as a preferred model for many tasks in natural langugage processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity p...

Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deletin...

Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a {\em single} source of variability (e.g., different scanners), domain adaptation and matching the dis...

Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in...

Uncertainty estimation in deep models is essential in many real-world applications and has benefited from developments over the last several years. Recent evidence suggests that existing solutions dependent on simple Gaussian formulations may not be sufficient. However, moving to other distributions necessitates Monte Carlo (MC) sampling to estimat...

Generative models which use explicit density modeling (e.g., variational autoencoders, flow-based generative models) involve finding a mapping from a known distribution, e.g. Gaussian, to the unknown input distribution. This often requires searching over a class of non-linear functions (e.g., representable by a deep neural network). While effective...

Pooling datasets from multiple studies can significantly improve statistical power: larger sample sizes can enable the identification of otherwise weak disease‐specific patterns. When modern learning methods are utilized (e.g., for predicting progression to dementia), differences in data acquisition‐methods / scanner‐protocols can enable the model...

Autosomal dominant Alzheimer’s disease (AD) is caused by known genetic mutations which results in the biochemical consequences that underlie the pathological basis of the disease, and a disease process driven by amyloid accumulation. Mutation carriage is characterized by substantial amyloid accumulation, and dementia onset at or around the age of p...

Dominantly inherited Alzheimer’s disease (DIAD) and late onset Alzheimer’s disease (LOAD) are characterized by the accumulation of amyloid pathology, and neurodegeneration which heralds the onset of dementia. Loss of structural connectivity prior to development of dementia may be measured using techniques that are sensitive to subtle neurodegenerat...

We propose a framework which makes it feasible to directly train deep neural networks with respect to popular families of task-specific non-decomposable performance measures such as AUC, multi-class AUC, F-measure and others. A feature of the optimization model that emerges from these tasks is that it involves solving a Linear Programs (LP) during...

We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifol...

Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a...

We study how stochastic differential equation (SDE) based ideas can inspire new modifications to existing algorithms for a set of problems in computer vision. Loosely speaking, our formulation is related to both explicit and implicit strategies for data augmentation and group equivariance, but is derived from new results in the SDE literature on es...

Uncertainty estimation in deep models is essential in many real-world applications and has benefited from developments over the last several years. Recent evidence [Farquhar et al., 2020] suggests that existing solutions dependent on simple Gaussian formulations may not be sufficient. However, moving to other distributions necessitates Monte Carlo...

Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in...

We present an efficient stochastic algorithm (RSG+) for canonical correlation analysis (CCA) using a reparametrization of the projection matrices. We show how this reparametrization (into structured matrices), simple in hindsight, directly presents an opportunity to repurpose/adjust mature techniques for numerical optimization on Riemannian manifol...

One strategy for adversarially training a robust model is to maximize its certified radius - the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of e...

We introduce a unified framework to jointly model images, text, and human attention traces. Our work is built on top of the recent Localized Narratives annotation framework [30], where each word of a given caption is paired with a mouse trace segment. We propose two novel tasks: (1) predict a trace given an image and caption (i.e., visual grounding...

One strategy for adversarially training a robust model is to maximize its certified radius -- the neighborhood around a given training sample for which the model's prediction remains unchanged. The scheme typically involves analyzing a "smoothed" classifier where one estimates the prediction corresponding to Gaussian samples in the neighborhood of...

For large-scale testing with graph-associated data, we present an empirical Bayes mixture technique to score local false-discovery rates (FDRs). Compared to procedures that ignore the graph, the proposed Graph-based Mixture Model (GraphMM) method gains power in settings where non-null cases form connected subgraphs, and it does so by regularizing p...

Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimi...

Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on t...

Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimi...

Consider a learning algorithm, which involves an internal call to an optimization routine such as a generalized eigenvalue problem, a cone programming problem or even sorting. Integrating such a method as a layer(s) within a trainable deep neural network (DNN) in an efficient and numerically stable way is not straightforward - for instance, only re...

Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold da...

Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention on t...

Many measurements or observations in computer vision and machine learning manifest as non-Euclidean data. While recent proposals (like spherical CNN) have extended a number of deep neural network architectures to manifold-valued data, and this has often provided strong improvements in performance, the literature on generative models for manifold da...

Recently, neural network architectures have been developed to accommodate when the data has the structure of a graph or, more generally, a hypergraph. While useful, graph structures can be potentially limiting. Hypergraph structures in general do not account for higher order relations between their hyperedges. Simplicial complexes offer a middle gr...

Algorithmic decision making based on computer vision and machine learning methods continues to permeate our lives. But issues related to biases of these models and the extent to which they treat certain segments of the population unfairly, have led to legitimate concerns. There is agreement that because of biases in the datasets we present to the m...

The adoption of "human-in-the-loop" paradigms in computer vision and machine learning is leading to various applications where the actual data acquisition (e.g., human supervision) and the underlying inference algorithms are closely interwined. While classical work in active learning provides effective solutions when the learning module involves cl...

Given impressive abilities of GANs in generating highly realistic images, they are also being used in novel ways in applications in the life sciences, raising an interesting question in scientific or biomedical studies. Consider the setting where we are restricted to only using the samples from a trained GAN for downstream group difference analysis...

Rectified Linear Units (ReLUs) are among the most widely used activation function in a broad variety of tasks in vision. Recent theoretical results suggest that despite their excellent practical performance, in various cases, a substitution with basis expansions (e.g., polynomials) can yield significant benefits from both the optimization and gener...

Consider a learning algorithm, which involves an internal call to an optimization routine such as a generalized eigenvalue problem, a cone programming problem or even sorting. Integrating such a method as layers within a trainable deep network in a numerically stable way is not simple -- for instance, only recently, strategies have emerged for eige...

Inverted bottleneck layers, which are built upon depthwise convolutions, have been the predominant building blocks in state-of-the-art object detection models on mobile devices. In this work, we question the optimality of this design pattern over a broad range of mobile accelerators by revisiting the usefulness of regular convolutions. We achieve s...

Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The F
β
measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent...

Algorithmic decision making based on computer vision and machine learning technologies continue to permeate our lives. But issues related to biases of these models and the extent to which they treat certain segments of the population unfairly, have led to concern in the general public. It is now accepted that because of biases in the datasets we pr...

Efforts are underway to study ways via which the power of deep neural networks can be extended to non-standard data types such as structured data (e.g., graphs) or manifold-valued data (e.g., unit vectors or special matrices). Often, sizable empirical improvements are possible when the geometry of such data spaces are incorporated into the design o...

Modern deep networks have proven to be very effective for analyzing real world images. However, their application in medical imaging is still in its early stages, primarily due to the large size of three-dimensional images, requiring enormous convolutional or fully connected layers - if we treat an image (and not image patches) as a sample. These i...

Positron emission tomography (PET) imaging is an imaging modality for diagnosing a number of neurological diseases. In contrast to Magnetic Resonance Imaging (MRI), PET is costly and involves injecting a radioactive substance into the patient. Motivated by developments in modality transfer in vision, we study the generation of certain types of PET...

Efforts are underway to study ways via which the power of deep neural networks can be extended to non-standard data types such as structured data (e.g., graphs) or manifold-valued data (e.g., unit vectors or special matrices). Often, sizable empirical improvements are possible when the geometry of such data spaces are incorporated into the design o...

We develop a conditional generative model for longitudinal image datasets based on sequential invertible neural networks. Longitudinal image acquisitions are common in various scientific and biomedical studies where often each image sequence sample may also come together with various secondary (fixed or temporally dependent) measurements. The key g...

Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_\beta$ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prom...

Rectified Linear Units (ReLUs) are among the most widely used activation function in a broad variety of tasks in vision. Recent theoretical results suggest that despite their excellent practical performance, in various cases, a substitution with basis expansions (e.g., polynomials) can yield significant benefits from both the optimization and gener...

Positron emission tomography (PET) imaging is an imaging modality for diagnosing a number of neurological diseases. In contrast to Magnetic Resonance Imaging (MRI), PET is costly and involves injecting a radioactive substance into the patient. Motivated by developments in modality transfer in vision, we study the generation of certain types of PET...

A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better accuracy to compression. But support for general constraints within widely used libraries remains scarce and t...

There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make. Clearly, there are benefits to a system that is not only accurate but also has a sense for when it is not. Existing proposals center around Bayesian interpretations of modern deep archit...

There is much interest in developing algorithms based on 3D convolutional neural networks (CNNs) for performing regression and classification with brain imaging data and more generally, with biomedical imaging data. A standard assumption in learning is that the training samples are independently drawn from the underlying distribution. In computer v...

Recently it's been shown that neural networks can use images of human faces to accurately predict Body Mass Index (BMI), a widely used health indicator. In this paper we demonstrate that a neural network performing BMI inference is indeed vulnerable to test-time adversarial attacks. This extends test-time adversarial attacks from classification tas...

The design of neural network architectures is frequently either based on human expertise using trial/error and empirical feedback or tackled via large scale reinforcement learning strategies run over distinct discrete architecture choices. In the latter case, the optimization task is non-differentiable and also not very amenable to derivative-free...

Alzheimer's disease (AD) is characterized by substantial neurodegeneration, including both cortical atrophy and loss of underlying white matter fiber tracts. Understanding longitudinal alterations to white matter may provide new insights into trajectories of brain change in both healthy aging and AD, and fluid biomarkers may be particularly useful...

Recent results in coupled or temporal graphical models offer schemes for estimating the relationship structure between features when the data come from related (but distinct) longitudinal sources. A novel application of these ideas is for analyzing group-level differences, i.e., in identifying if trends of estimated objects (e.g., covariance or pre...

Visual relationships provide higher-level information of objects and their relations in an image-this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely "relationship" between objects in a given image....

Generative models using neural network have opened a door to large-scale studies for various application domains, especially for studies that suffer from lack of real samples to obtain statistically robust inference. Typically, these generative models would train on existing data to learn the underlying distribution of the measurements (e.g., image...

In addition to the development of beta amyloid plaques and neurofibrillary tangles, Alzheimer's disease (AD) involves the loss of connecting structures including degeneration of myelinated axons and synaptic connections. However, the extent to which white matter tracts change longitudinally, particularly in the asymptomatic, preclinical stage of AD...

Characterizing Alzheimer's disease (AD) at preclinical stages is crucial for initiating early treatment strategies. It is widely accepted that amyloid accumulation is a primary pathological event in AD. Also, loss of connectivity between brain regions is suspected of contributing to cognitive decline, but studies which test these associations using...

Computational tools in the form of software packages are burgeoning in the field of medical imaging and biomedical research. These tools enable biomedical researchers to analyze a variety of data using modern machine learning and statistical analysis techniques. While these publicly available software packages are a great step towards a multiplicat...

A sizable body of work on relative attributes provides evidence
that relating pairs of images along a continuum of strength pertaining
to a visual attribute yields improvements in a variety of vision
tasks. In this paper, we show how emerging ideas in graph neural networks
can yield a solution to various problems that broadly fall under
relative at...

A sizable body of work on relative attributes provides evidence that relating pairs of images along a continuum of strength pertaining to a visual attribute yields improvements in a variety of vision tasks. In this paper, we show how emerging ideas in graph neural networks can yield a solution to various problems that broadly fall under relative at...