
Sharu Theresa Jose
- Doctor of Philosophy
- PostDoc Position at King's College London
About
42 Publications
2,562 Reads
247 Citations
Introduction
Information-theoretic generalization bounds for learning problems
Google Scholar: https://scholar.google.co.in/citations?user=hhaiVGYAAAAJ&hl=en&oi=ao
Current institution: King's College London
Publications (42)
The goal of these lecture notes is to review the problem of free energy minimization as a unified framework underlying the definition of maximum entropy modelling, generalized Bayesian inference, learning with latent variables, statistical learning analysis of generalization, and local optimization. Free energy minimization is first introduced, her...
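As background for this unifying view: generalized Bayesian inference arises from minimizing a free energy functional over distributions q, whose standard form is sketched below (notation is illustrative, not necessarily that of the notes).

% Free energy of q over parameters \theta, with prior p, training loss L_D, and
% inverse temperature \beta > 0:
F(q) \;=\; \mathbb{E}_{\theta \sim q}\big[L_D(\theta)\big] \;+\; \tfrac{1}{\beta}\,\mathrm{KL}(q \,\|\, p),
\qquad
q^{\star}(\theta) \;\propto\; p(\theta)\, e^{-\beta L_D(\theta)}.
% With \beta = 1 and L_D the negative log-likelihood, q^{\star} is the Bayes
% posterior; as \beta \to \infty, minimization recovers (local) loss optimization.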
Meta-learning automatically infers an inductive bias by observing data from a number of related tasks. The inductive bias is encoded by hyperparameters that determine aspects of the model class or training algorithm, such as initialization or learning rate. Meta-learning assumes that the learning tasks belong to a task environment, and that tasks a...
The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic qu...
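For the log-loss case mentioned in this abstract, the exact decomposition is the standard identity below (stated for orientation; the paper's notation may differ).

% Total predictive uncertainty at input x given data D splits into an aleatoric and
% an epistemic part, the latter a mutual information:
H\big(\mathbb{E}_{\theta \sim p(\theta \mid D)}[\,p(y \mid x, \theta)\,]\big)
\;=\;
\underbrace{\mathbb{E}_{\theta \sim p(\theta \mid D)}\big[H(p(y \mid x, \theta))\big]}_{\text{aleatoric}}
\;+\;
\underbrace{I(y;\, \theta \mid x, D)}_{\text{epistemic}}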
A key step in quantum machine learning with classical inputs is the design of an embedding circuit mapping inputs to a quantum state. This paper studies a transfer learning setting in which classical-to-quantum embedding is carried out by an arbitrary parametric quantum circuit that is pre-trained based on data from a source task. At run time, the...
Variational quantum algorithms (VQAs) offer the most promising path to obtaining quantum advantages via noisy intermediate-scale quantum (NISQ) processors. Such systems leverage classical optimization to tune the parameters of a parameterized quantum circuit (PQC). The goal is minimizing a cost function that depends on measurement outputs obtained...
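For readers new to the VQA loop, here is a self-contained toy sketch in Python: a single-qubit "circuit" RY(theta)|0> whose cost <Z> = cos(theta) is estimated from simulated measurement shots and minimized with the parameter-shift rule. This illustrates the generic pattern only; it is not code from the paper.

import numpy as np

rng = np.random.default_rng(0)

def exact_cost(theta):
    # Exact expectation <psi(theta)| Z |psi(theta)> for |psi> = RY(theta)|0>.
    return np.cos(theta)

def measured_cost(theta, shots=1000):
    # Estimate <Z> from simulated measurement outcomes, as a NISQ device would.
    p_plus = np.cos(theta / 2) ** 2                   # probability of outcome +1
    outcomes = rng.choice([1.0, -1.0], size=shots, p=[p_plus, 1.0 - p_plus])
    return outcomes.mean()

def parameter_shift_grad(theta, shots=1000):
    # Parameter-shift rule: exact gradient for this gate family, up to shot noise.
    return 0.5 * (measured_cost(theta + np.pi / 2, shots)
                  - measured_cost(theta - np.pi / 2, shots))

theta, lr = 0.1, 0.4
for _ in range(60):                                   # classical optimizer tunes the PQC
    theta -= lr * parameter_shift_grad(theta)
print(theta, exact_cost(theta))                       # theta -> pi, cost -> -1 (minimum)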
This paper presents a novel hybrid quantum generative model, the VAE-QWGAN, which combines the strengths of a classical Variational AutoEncoder (VAE) with a hybrid Quantum Wasserstein Generative Adversarial Network (QWGAN). The VAE-QWGAN integrates the VAE decoder and QGAN generator into a single quantum model with shared parameters, utilizing the...
We study stochastic linear contextual bandits (CB) where the agent observes a noisy version of the true context through a noise channel with unknown channel parameters. Our objective is to design an action policy that can “approximate” that of a Bayesian oracle that has access to the reward model and the noise channel parameter. We introduce a modi...
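For orientation, a minimal Python sketch of standard Thompson sampling for linear contextual bandits follows; it assumes exactly observed contexts, so it is a baseline for, not a reproduction of, the noisy-context policy introduced in the paper.

import numpy as np

rng = np.random.default_rng(1)
d, T, noise_sd = 3, 2000, 0.1
theta_true = rng.normal(size=d)               # unknown reward parameter

A, b = np.eye(d), np.zeros(d)                 # Gaussian posterior: precision A, vector b
for _ in range(T):
    contexts = rng.normal(size=(5, d))        # feature vectors of 5 candidate actions
    theta_s = rng.multivariate_normal(np.linalg.solve(A, b), np.linalg.inv(A))
    a = int(np.argmax(contexts @ theta_s))    # act greedily w.r.t. the posterior sample
    x = contexts[a]
    r = x @ theta_true + noise_sd * rng.normal()   # observe noisy reward
    A += np.outer(x, x)                       # Bayesian linear-regression update
    b += r * x

print(np.linalg.solve(A, b))                  # posterior mean approaches theta_true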
Learning problems involve settings in which an algorithm has to make decisions based on data, and possibly side information such as expert knowledge. This study has two main goals. First, it reviews and generalizes different results on the data and model complexity of quantum learning, where the data and/or the algorithm can be quantum, focusing on...
Optimal resource allocation in modern communication networks calls for the optimization of objective functions that are only accessible via costly separate evaluations for each candidate solution. The conventional approach carries out the optimization of resource-allocation parameters for each system configuration, characterized, e.g., by topology...
Deep learning has achieved remarkable success in many machine learning tasks such as image classification, speech recognition, and game playing. However, these breakthroughs are often difficult to translate into real-world engineering systems because deep learning models require a massive number of training samples, which are costly to obtain in pr...
In vertical federated learning (FL), the features of a data sample are distributed across multiple agents. As such, inter-agent collaboration can be beneficial not only during the learning phase, as is the case for standard horizontal FL, but also during the inference phase. A fundamental theoretical question in this setting is how to quantify the...
Meta-learning optimizes the hyperparameters of a training procedure, such as its initialization, kernel, or learning rate, based on data sampled from a number of auxiliary tasks. A key underlying assumption is that the auxiliary tasks, known as meta-training tasks, share the same generating distribution as the tasks to be encountered at deployment...
Machine unlearning refers to mechanisms that can remove the influence of a subset of training data upon request from a trained model without incurring the cost of re-training from scratch. This paper develops a unified PAC-Bayesian framework for machine unlearning that recovers the two recent design principles - variational unlearning (Nguyen et al....
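As background, the classical PAC-Bayesian bound that such frameworks instantiate (McAllester/Maurer form; constants differ across versions):

% Prior P fixed before seeing the n samples, posterior Q, loss in [0,1]; with
% probability at least 1 - \delta over the training data:
\mathbb{E}_{w \sim Q}\big[L(w)\big]
\;\le\;
\mathbb{E}_{w \sim Q}\big[\hat{L}(w)\big]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(2\sqrt{n}/\delta)}{2n}}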
The goal of this lecture note is to review the problem of free energy minimization as a unified framework underlying the definition of maximum entropy modeling, generalized Bayesian inference, learning with latent variables, the statistical learning analysis of generalization, and local optimization. Free energy minimization is first introduced, he...
Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The...
Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average los...
Meta-learning infers an inductive bias---typically in the form of the hyperparameters of a base-learning algorithm---by observing data from a finite number of related tasks. This paper presents an information-theoretic upper bound on the average meta-generalization gap that builds on the conditional mutual information (CMI) framework of Steinke and...
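For context, the CMI bound of Steinke and Zakynthinou that the paper builds on reads as follows (conventional single-task form, losses bounded in [0,1]):

% Supersample \tilde{Z} of 2n i.i.d. pairs; U \in \{0,1\}^n selects one sample per
% pair for training; W is the output of the algorithm on the selected samples:
\big|\, \mathbb{E}[\mathrm{gen}(W)] \,\big| \;\le\; \sqrt{\frac{2\, I(W; U \mid \tilde{Z})}{n}}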
In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel...
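In symbols, the transfer generalization gap described here can be written as below (illustrative, unweighted form; the paper may weight source and target samples differently).

% Hypothesis w trained on m source samples and n target samples z_1, ..., z_{m+n};
% target distribution P_T, loss \ell:
\Delta(w) \;=\; \mathbb{E}_{z \sim P_T}\big[\ell(w, z)\big]
\;-\; \frac{1}{m+n} \sum_{i=1}^{m+n} \ell(w, z_i)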
We study the setting of channel coding over a family of channels whose state is controlled by an adversarial jammer by viewing it as a zero-sum game between a finite blocklength encoder-decoder team, and the jammer. The encoder-decoder team choose stochastic encoding and decoding strategies to minimize the average probability of error in transmissi...
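Schematically, the game has the following minimax form (illustrative notation): the team picks stochastic encoding-decoding strategies (e, d), the jammer picks a strategy s over channel states, and the payoff is the average error probability.

\min_{(e,d)} \max_{s} \; P_{\mathrm{err}}(e, d, s)
\;\;\ge\;\;
\max_{s} \min_{(e,d)} \; P_{\mathrm{err}}(e, d, s)
% The game has a value exactly when the two sides coincide.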
Time-encoded signals, such as social network update logs and spiking traces in neuromorphic processors, are defined by multiple traces carrying information in the timing of events, or spikes. When time-encoded data is processed at a remote site with respect to the location where it is produced, the occurrence of events needs to be encoded and transmitted...
This paper considers a zero-sum game between a team of delay-constrained encoder and decoder, and a finite state jammer, with average probability of error as the payoff. The team attempts to communicate a discrete source using a finite blocklength over a finite family of discrete channels whose state is controlled by the jammer. For each strategy o...
This paper presents a systematic method to synthesize new finite blocklength converses for the channel coding of asymmetric multiple access channels (A-MAC) from point-to-point converses, by employing the linear programming (LP) based framework in [1]. A direct synthesis yields a converse that extends the Polyanskiy-Poor-Verdú metaconvers...
A new finite blocklength converse for the Slepian-Wolf coding problem is presented which significantly improves on the best known converse for this problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of the linear programming (LP) based framework for finite blocklength point-to-point coding problems from [3] is employed....
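For context, the asymptotic Slepian-Wolf rate region that such finite-blocklength converses sharpen at finite n:

% Separate encoding of correlated sources (X_1, X_2) with joint decoding succeeds
% (asymptotically) iff:
R_1 \ge H(X_1 \mid X_2), \qquad R_2 \ge H(X_2 \mid X_1), \qquad R_1 + R_2 \ge H(X_1, X_2)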
A linear programming (LP) based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems. The framework applies for any loss criterion, generalizes certain previously known converses, and also extends to multi-terminal settings. The finite blocklength problem is posed equivalently as a nonconv...
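The mechanism behind such LP converses is weak duality: every dual-feasible point certifies a bound on the relaxed coding problem. Schematically, in generic LP notation (not the paper's):

% Primal relaxation and the certificate provided by any dual-feasible y:
\min_{x \ge 0} \; c^{\top} x \;\; \text{s.t.} \;\; A x \ge b
\qquad\Rightarrow\qquad
b^{\top} y \;\le\; c^{\top} x^{\star} \;\; \text{for all } y \ge 0 \text{ with } A^{\top} y \le c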
The linear programming (LP) based approach we introduced in [1] for finding finite blocklength converses for joint source-channel coding is extended to some network-like settings. Finite blocklength channel coding of compound and averaged channels under the maximum probability error criterion is considered. Through the LP approach new converses are...
This paper illustrates the application of the linear programming (LP) based framework proposed by the authors previously [1] in deriving improved converses for finite blocklength channel coding of a discrete memoryless binary symmetric channel (BSC) and binary erasure channel (BEC). Employing elementary concepts of optimization, finite blocklength...