# Sathya N. Ravi's research while affiliated with University of Illinois at Chicago and other places

## Publications (46)

Conference Paper
Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a single source of variability (e.g., different scanners), domain adaptation and matching the distribut...
Preprint
Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deletin...
Preprint
Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a single source of variability (e.g., different scanners), domain adaptation and matching the dis...
Preprint
Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in...
Article
We propose a framework which makes it feasible to directly train deep neural networks with respect to popular families of task-specific non-decomposable performance measures such as AUC, multi-class AUC, F-measure and others. A feature of the optimization model that emerges from these tasks is that it involves solving a Linear Program (LP) during...
Preprint
Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a...
Preprint
We study how stochastic differential equation (SDE) based ideas can inspire new modifications to existing algorithms for a set of problems in computer vision. Loosely speaking, our formulation is related to both explicit and implicit strategies for data augmentation and group equivariance, but is derived from new results in the SDE literature on es...
Article
Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations, are starting to drive advances in...
Preprint
Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimi...
Article
Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle in which an application dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimi...
Article
Consider a learning algorithm, which involves an internal call to an optimization routine such as a generalized eigenvalue problem, a cone programming problem or even sorting. Integrating such a method as a layer(s) within a trainable deep neural network (DNN) in an efficient and numerically stable way is not straightforward - for instance, only re...
Chapter
Algorithmic decision making based on computer vision and machine learning methods continues to permeate our lives. But issues related to biases of these models and the extent to which they treat certain segments of the population unfairly, have led to legitimate concerns. There is agreement that because of biases in the datasets we present to the m...
Article
Generative adversarial networks (GANs) have emerged as a powerful generative model in computer vision. Given their impressive abilities in generating highly realistic images, they are also being used in novel ways in applications in the life sciences. This raises an interesting question when GANs are used in scientific or biomedical studies. Consid...
Conference Paper
Rectified Linear Units (ReLUs) are among the most widely used activation functions in a broad variety of tasks in vision. Recent theoretical results suggest that despite their excellent practical performance, in various cases, a substitution with basis expansions (e.g., polynomials) can yield significant benefits from both the optimization and gener...
Preprint
Consider a learning algorithm, which involves an internal call to an optimization routine such as a generalized eigenvalue problem, a cone programming problem or even sorting. Integrating such a method as layers within a trainable deep network in a numerically stable way is not simple -- for instance, only recently, strategies have emerged for eige...
Article
Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_\beta$ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent...
Preprint
Algorithmic decision making based on computer vision and machine learning technologies continues to permeate our lives. But issues related to biases of these models and the extent to which they treat certain segments of the population unfairly, have led to concern in the general public. It is now accepted that because of biases in the datasets we pr...
Preprint
Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_\beta$ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prom...
Preprint
Rectified Linear Units (ReLUs) are among the most widely used activation functions in a broad variety of tasks in vision. Recent theoretical results suggest that despite their excellent practical performance, in various cases, a substitution with basis expansions (e.g., polynomials) can yield significant benefits from both the optimization and gener...
Article
The impact of numerical optimization on modern data analysis has been quite significant. Today, these methods lie at the heart of most statistical machine learning applications in domains spanning genomics, finance and medicine. The expanding scope of these applications (and the complexity of the associated data) has continued to raise the expectat...
Article
A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better accuracy to compression. But support for general constraints within widely used libraries remains scarce and t...
Conference Paper
Visual relationships provide higher-level information of objects and their relations in an image; this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely "relationship" between objects in a given image....
Article
We revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations is receiving recent interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission critical applications. Furt...
Article
A number of results have recently demonstrated the benefits of incorporating various constraints when training deep architectures in vision and machine learning. The advantages range from guarantees for statistical generalization to better accuracy to compression. But support for general constraints within widely used libraries remains scarce and t...
Article
We present a new Frank-Wolfe (FW) type algorithm that is applicable to minimization problems with a nonsmooth convex objective. We provide convergence bounds and show that the scheme yields so-called coreset results for various Machine Learning problems including 1-median, Balanced Development, Sparse PCA, Graph Cuts, and the $\ell_1$-norm-regulari...
Article
We study mechanisms to characterize how the asymptotic convergence of backpropagation in deep architectures, in general, is related to the network structure, and how it may be influenced by other design choices including activation type, denoising and dropout rate. We seek to analyze whether network architecture and input data statistics may guide...
Conference Paper
The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and d...
Article
There is a great deal of interest in using large scale brain imaging studies to understand how brain connectivity evolves over time for an individual and how it varies over different levels/quantiles of cognitive function. To do so, one typically performs so-called tractography procedures on diffusion MR brain images and derives measures of brain c...
Article
Budget constrained optimal design of experiments is a well studied problem. Although the literature is very mature, not many strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high dimensional machine learning. In this work, we study this budget constrained design where the unde...
Article
Consider samples from two different data sources [Formula: see text] and [Formula: see text]. We only observe their transformed versions [Formula: see text] and [Formula: see text], for some known function class h(·) and g(·). Our goal is to perform a statistical test checking if $P_{\text{source}} = P_{\text{target}}$ while removing the distortions induced by the trans...
Conference Paper
Article
A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful...
Conference Paper
Eigenvalue problems are ubiquitous in computer vision, covering a very broad spectrum of applications ranging from estimation problems in multi-view geometry to image segmentation. Few other linear algebra problems have a more mature set of numerical routines available and many computer vision libraries leverage such tools extensively. However, the...
Article
The regularization and output consistency offered by dropout and layer-wise pretraining for learning deep networks have been well studied. However, our understanding about the explicit convergence of parameter estimates, and their dependence on structural (like depth and layer lengths) and learning (like denoising and dropout rate) aspects is less...
Article
Unsupervised pretraining and dropout have been well studied, especially with respect to regularization and output consistency. However, our understanding about the explicit convergence rates of the parameter estimates, and their dependence on the learning (like denoising and dropout rate) and structural (like depth and layer lengths) aspects of the...
Article
The success of deep architectures is at least in part attributed to the layer-by-layer unsupervised pre-training that initializes the network. Various papers have reported extensive empirical analysis focusing on the design and implementation of good pre-training procedures. However, an understanding pertaining to the consistency of parameter estim...

## Citations

... In [10], MI between prediction and the sensitive attributes is used to train a fair classifier whereas [2] describes the use of inverse contrastive loss. Group-theoretic approaches have also been described in [11,45]. The work in [41] gives an empirical solution to remove specific visual features from the latent variables using adversarial training. ...
... A teacher-student based framework for class-level unlearning as well as random cohort unlearning was introduced in [28]. Mehta et al. [29] introduced a variant of the conditional independence coefficient and, using it with a Markov blanket selection method, achieved deep unlearning in vision and language problems. Ye et al. [30] proposed a learning with recoverable forgetting scheme that supports task and sample specific knowledge removal. ...
... The amounts of data and computational power required for learning have increased. Deep learning uses DNNs with hundreds of layers and a large number of parameters related to structure [145][146][147][148][149][150][151][152][153][154][155][156][157][158][159][160][161]. Therefore, it is prone to overfitting, which is a condition where the learning data are overfitted, generalization is not possible, and high accuracy cannot be achieved with unknown data. ...
... the model be invariant to the categorical variable denoting "site". While this is not a "solved" problem, this strategy has been successfully deployed based on results in invariant representation learning [3,5,34] (see Fig. 1). One may alternatively view this task via the lens of fairness: we want the model's performance to be fair with respect to the site variable. ...
... Fairness is becoming an important issue to consider in the design of learning algorithms. A common strategy to make an algorithm fair is to remove the influence of one/more protected attributes when training the models, see [28]. Most methods assume that the labels of protected attributes are known during training but this may not always be possible. ...
... There is an extensive variety of literature on semi-supervised learning algorithms, especially after the boom of deep learning [41]. Among them, pseudo-label based methods [1,6,25,34,35,38] train a model on the existing labeled data and then use this model to generate pseudo-labels on the unlabeled data, which will later be used for additional training. Another emerging direction is to leverage self-supervised learning algorithms such as RotNet [14], Jig-Saw [29], SimCLR [10], or MOCO [16] for unsupervised pretraining and then fine-tune with the limited labeled set [11,19]. ...
... Such results on stochastic and online data models have also been explored in Ataman et al. [2006], Cortes and Mohri [2004], Gao et al. [2013]. There are also available strategies for measures other than the AUC: Nan et al. [2012], Dembczynski et al. [2011] give exact algorithms for optimizing F-score and Eban et al. [2017], Ravi et al. [2020] propose scalable methods for non-decomposable objectives which utilize Lagrange multipliers to construct the proxy objectives. The authors in Mohapatra et al. [2018] discuss using a function that upper bounds (structured) hinge-loss to optimize average precision. ...
... Examples of this intuition in vision. [3,12,39] and others have shown that models trained on complex tasks tend to delegate subnetworks to specific regions of the input space. That is, parameters and functions within networks tend to (or can be encouraged to) act in blocks. ...
... Wang et al., 2019; Hendriks et al., 2021). It has been reported that including prior information, particularly smoothness constraints, significantly improves the generalization capability of image processing networks (Ravi et al., 2019; Rosca et al., 2020). Because prior information helps improve searching efficiency by restricting the data space (Sirignano and Spiliopoulos, 2018), incorporating physical constraints can potentially improve the convergence as well as inference performance of neural networks. ...
... Having said that, there are a lot of interesting ideas in the classical quantization methods in DSP that have been applied to NN quantization, and in particular vector quantization [9]. In particular, the work of [1,30,73,82,115,164,174,182,243] clusters the weights into different groups and uses the centroid of each group as quantized values during inference. As shown in Eq. 13, $i$ is the index of weights in a tensor, $c_1, \ldots, c_k$ are the $k$ centroids found by the clustering, and $c_j$ is the corresponding centroid to $w_i$. ...