Sharu Theresa Jose

  • Doctor of Philosophy
  • PostDoc Position at King's College London

About

42 Publications
2,562 Reads
247 Citations
Current institution: King's College London
Current position: PostDoc Position

Publications

Preprint
Full-text available
The goal of these lecture notes is to review the problem of free energy minimization as a unified framework underlying the definition of maximum entropy modelling, generalized Bayesian inference, learning with latent variables, statistical learning analysis of generalization, and local optimization. Free energy minimization is first introduced, her...
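
As a rough pointer to the kind of objective these notes unify (generic notation, not necessarily that used in the notes), a variational free energy over a distribution q(θ), with prior p(θ), loss ℓ(θ), and temperature α > 0, can be written as

\[
F(q) \;=\; \mathbb{E}_{q(\theta)}\big[\ell(\theta)\big] \;+\; \alpha\,\mathrm{KL}\big(q(\theta)\,\|\,p(\theta)\big),
\]

whose minimizer over all distributions is the Gibbs posterior q*(θ) ∝ p(θ) exp(−ℓ(θ)/α); choosing ℓ as the negative log-likelihood and α = 1 recovers standard Bayesian inference.
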
Preprint
Full-text available
Meta-learning automatically infers an inductive bias by observing data from a number of related tasks. The inductive bias is encoded by hyperparameters that determine aspects of the model class or training algorithm, such as initialization or learning rate. Meta-learning assumes that the learning tasks belong to a task environment, and that tasks a...
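
For readers unfamiliar with the setting, the sketch below is a minimal first-order MAML-style illustration of meta-learning an initialization from related tasks. It is a generic example with made-up parameters, not the algorithm or the analysis of the paper.

```python
# Minimal first-order MAML-style sketch: the inductive bias is a shared
# initialization w0, meta-learned from several related linear-regression tasks.
# Generic illustration only; not the algorithm analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, inner_lr, meta_lr = 5, 20, 0.1, 0.05

def sample_task():
    # Tasks share a common environment: weights = shared mean + task-specific noise.
    w_true = np.ones(d) + 0.3 * rng.standard_normal(d)
    X = rng.standard_normal((10, d))
    y = X @ w_true + 0.1 * rng.standard_normal(10)
    return X, y

def grad(w, X, y):
    # Gradient of the mean-squared error for a linear model.
    return 2 * X.T @ (X @ w - y) / len(y)

w0 = np.zeros(d)                                   # meta-parameter: initialization
for _ in range(200):
    meta_grad = np.zeros(d)
    for _ in range(n_tasks):
        X, y = sample_task()
        w_task = w0 - inner_lr * grad(w0, X, y)    # one inner adaptation step
        meta_grad += grad(w_task, X, y)            # first-order meta-gradient
    w0 -= meta_lr * meta_grad / n_tasks            # update the shared initialization
print("meta-learned initialization:", np.round(w0, 2))
```
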
Preprint
Full-text available
The overall predictive uncertainty of a trained predictor can be decomposed into separate contributions due to epistemic and aleatoric uncertainty. Under a Bayesian formulation, assuming a well-specified model, the two contributions can be exactly expressed (for the log-loss) or bounded (for more general losses) in terms of information-theoretic qu...
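
The decomposition referred to here, for the log-loss and a well-specified Bayesian model, is the standard one (generic notation): with posterior p(θ | D) over model parameters,

\[
\underbrace{H(Y \mid x, D)}_{\text{total}} \;=\; \underbrace{\mathbb{E}_{\theta \sim p(\theta\mid D)}\big[H(Y \mid x, \theta)\big]}_{\text{aleatoric}} \;+\; \underbrace{I(Y; \theta \mid x, D)}_{\text{epistemic}},
\]

so the epistemic contribution is the mutual information between the label and the parameters given the input and the observed data.
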
Preprint
Full-text available
A key step in quantum machine learning with classical inputs is the design of an embedding circuit mapping inputs to a quantum state. This paper studies a transfer learning setting in which classical-to-quantum embedding is carried out by an arbitrary parametric quantum circuit that is pre-trained based on data from a source task. At run time, the...
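
Schematically, and with notation not taken from the paper, a classical-to-quantum embedding maps an input x to the state

\[
|\psi(x;\theta)\rangle \;=\; U(x;\theta)\,|0\rangle^{\otimes n},
\]

where U(x; θ) is a parameterized quantum circuit; in the transfer setting described above, the embedding parameters θ are pre-trained on source-task data and then reused for the target task.
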
Preprint
Variational quantum algorithms (VQAs) offer the most promising path to obtaining quantum advantages via noisy intermediate-scale quantum (NISQ) processors. Such systems leverage classical optimization to tune the parameters of a parameterized quantum circuit (PQC). The goal is minimizing a cost function that depends on measurement outputs obtained...
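
In generic notation (not necessarily that of the paper), the cost being minimized has the form

\[
C(\theta) \;=\; \langle 0 |\, U^{\dagger}(\theta)\, O\, U(\theta) \,| 0 \rangle,
\]

estimated from repeated measurements of an observable O at the PQC output; for gates generated by Pauli operators, gradients can be estimated via the parameter-shift rule, \( \partial C/\partial\theta_i = \tfrac{1}{2}\big[C(\theta + \tfrac{\pi}{2}e_i) - C(\theta - \tfrac{\pi}{2}e_i)\big] \).
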
Preprint
Full-text available
This paper presents a novel hybrid quantum generative model, the VAE-QWGAN, which combines the strengths of a classical Variational AutoEncoder (VAE) with a hybrid Quantum Wasserstein Generative Adversarial Network (QWGAN). The VAE-QWGAN integrates the VAE decoder and QGAN generator into a single quantum model with shared parameters, utilizing the...
Article
Full-text available
We study stochastic linear contextual bandits (CB) where the agent observes a noisy version of the true context through a noise channel with unknown channel parameters. Our objective is to design an action policy that can “approximate” that of a Bayesian oracle that has access to the reward model and the noise channel parameter. We introduce a modi...
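
To fix the setting, the sketch below is a naive linear Thompson-sampling baseline that simply acts on the noisy context it observes. It is a hypothetical illustration with made-up parameters, not the modified oracle-approximating policy introduced in the paper.

```python
# Naive baseline for linear contextual bandits with noisy contexts: standard
# Thompson sampling applied directly to the noisy observation. Illustration of
# the setting only; NOT the policy proposed in the paper.
import numpy as np

rng = np.random.default_rng(1)
d, K, T, noise_std = 4, 3, 2000, 0.5
theta_true = rng.standard_normal((K, d))          # per-arm reward parameters

A = [np.eye(d) for _ in range(K)]                 # posterior precision per arm
b = [np.zeros(d) for _ in range(K)]               # posterior "mean numerator"

regret = 0.0
for t in range(T):
    c_true = rng.standard_normal(d)               # latent true context
    c_obs = c_true + noise_std * rng.standard_normal(d)   # noisy observation
    # Thompson sampling: draw one parameter sample per arm, act greedily on c_obs
    samples = [rng.multivariate_normal(np.linalg.solve(A[k], b[k]),
                                       np.linalg.inv(A[k])) for k in range(K)]
    a = int(np.argmax([s @ c_obs for s in samples]))
    r = theta_true[a] @ c_true + 0.1 * rng.standard_normal()
    A[a] += np.outer(c_obs, c_obs)                # posterior update for chosen arm
    b[a] += r * c_obs
    regret += max(theta_true[k] @ c_true for k in range(K)) - theta_true[a] @ c_true
print(f"cumulative regret of the naive baseline: {regret:.1f}")
```
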
Article
Learning problems involve settings in which an algorithm has to make decisions based on data, and possibly side information such as expert knowledge. This study has two main goals. First, it reviews and generalizes different results on the data and model complexity of quantum learning, where the data and/or the algorithm can be quantum, focusing on...
Article
Optimal resource allocation in modern communication networks calls for the optimization of objective functions that are only accessible via costly separate evaluations for each candidate solution. The conventional approach carries out the optimization of resource-allocation parameters for each system configuration, characterized, e.g., by topology...
Book
Deep learning has achieved remarkable success in many machine learning tasks such as image classification, speech recognition, and game playing. However, these breakthroughs are often difficult to translate into real-world engineering systems because deep learning models require a massive number of training samples, which are costly to obtain in pr...
Preprint
Full-text available
Deep learning has achieved remarkable success in many machine learning tasks such as image classification, speech recognition, and game playing. However, these breakthroughs are often difficult to translate into real-world engineering systems because deep learning models require a massive number of training samples, which are costly to obtain in pr...
Article
Full-text available
In vertical federated learning (FL), the features of a data sample are distributed across multiple agents. As such, inter-agent collaboration can be beneficial not only during the learning phase, as is the case for standard horizontal FL, but also during the inference phase. A fundamental theoretical question in this setting is how to quantify the...
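
In symbols (generic notation), each sample's feature vector is split as

\[
x = (x_1, \ldots, x_K), \qquad \text{with agent } k \text{ observing only } x_k,
\]

so that forming a prediction p(y | x_1, ..., x_K) at inference time requires inter-agent communication, unlike in horizontal FL where each agent holds complete samples.
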
Article
Full-text available
Variational quantum algorithms (VQAs) offer the most promising path to obtaining quantum advantages via noisy intermediate-scale quantum (NISQ) processors. Such systems leverage classical optimization to tune the parameters of a parameterized quantum circuit (PQC). The goal is minimizing a cost function that depends on measurement outputs obtained...
Article
Full-text available
Meta-learning automatically infers an inductive bias by observing data from a number of related tasks. The inductive bias is encoded by hyperparameters that determine aspects of the model class or training algorithm, such as initialization or learning rate. Meta-learning assumes that the learning tasks belong to a task environment, and that tasks a...
Preprint
Full-text available
In vertical federated learning (FL), the features of a data sample are distributed across multiple agents. As such, inter-agent collaboration can be beneficial not only during the learning phase, as is the case for standard horizontal FL, but also during the inference phase. A fundamental theoretical question in this setting is how to quantify the...
Preprint
Full-text available
Meta-learning optimizes the hyperparameters of a training procedure, such as its initialization, kernel, or learning rate, based on data sampled from a number of auxiliary tasks. A key underlying assumption is that the auxiliary tasks, known as meta-training tasks, share the same generating distribution as the tasks to be encountered at deployment...
Preprint
Full-text available
Machine unlearning refers to mechanisms that can remove the influence of a subset of training data upon request from a trained model without incurring the cost of re-training from scratch. This paper develops a unified PAC-Bayesian framework for machine unlearning that recovers the two recent design principles - variational unlearning (Nguyen et.al...
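
For context, one standard single-task PAC-Bayesian bound of the kind such frameworks build on states that, for losses in [0, 1], a prior P, and any posterior Q over hypotheses, with probability at least 1 − δ over an i.i.d. sample of size n,

\[
\mathbb{E}_{h\sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{L}_n(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln(2\sqrt{n}/\delta)}{2n}}.
\]

The paper's unified unlearning bounds are of this general flavor; their exact form is not reproduced here.
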
Article
The goal of this lecture note is to review the problem of free energy minimization as a unified framework underlying the definition of maximum entropy modeling, generalized Bayesian inference, learning with latent variables, the statistical learning analysis of generalization, and local optimization. Free energy minimization is first introduced, he...
Preprint
Full-text available
Meta-learning aims at optimizing the hyperparameters of a model class or training algorithm from the observation of data from a number of related tasks. Following the setting of Baxter [1], the tasks are assumed to belong to the same task environment, which is defined by a distribution over the space of tasks and by per-task data distributions. The...
Preprint
Full-text available
Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average los...
Preprint
Full-text available
Meta-learning infers an inductive bias---typically in the form of the hyperparameters of a base-learning algorithm---by observing data from a finite number of related tasks. This paper presents an information-theoretic upper bound on the average meta-generalization gap that builds on the conditional mutual information (CMI) framework of Steinke and...
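
For the base (single-task) case, the CMI bound of Steinke and Zakynthinou states that, for losses bounded in [0, 1], the expected generalization gap of a learner producing output W from n training samples satisfies (notation paraphrased, with Z̃ the supersample of 2n points and S the selection variables)

\[
\Big|\mathbb{E}\big[L_{\mathcal{D}}(W) - \hat{L}_n(W)\big]\Big| \;\le\; \sqrt{\frac{2\, I(W; S \mid \tilde{Z})}{n}};
\]

the paper extends this style of bound to the average meta-generalization gap.
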
Preprint
Full-text available
In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel...
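
In symbols (generic notation), with P_T the target distribution, w the trained model, and \(\hat{L}_S(w)\) the empirical loss on the mixed source/target training set S, the transfer generalization gap discussed above is

\[
\Delta(w) \;=\; \mathbb{E}_{z\sim P_T}\big[\ell(w, z)\big] \;-\; \hat{L}_S(w).
\]
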
Article
We study the setting of channel coding over a family of channels whose state is controlled by an adversarial jammer by viewing it as a zero-sum game between a finite blocklength encoder-decoder team, and the jammer. The encoder-decoder team choose stochastic encoding and decoding strategies to minimize the average probability of error in transmissi...
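
Schematically (notation not taken from the paper), the zero-sum game is

\[
\min_{\mu \in \Delta(\mathcal{E}\times\mathcal{D})} \;\max_{\nu \in \Delta(\mathcal{J})} \;\; \mathbb{E}_{(e,d)\sim\mu,\; j\sim\nu}\big[P_e(e, d, j)\big],
\]

with the encoder-decoder team randomizing jointly over finite-blocklength encoding/decoding pairs, the jammer randomizing over channel states, and the average probability of error P_e as the payoff.
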
Preprint
Full-text available
Time-encoded signals, such as social network update logs and spiking traces in neuromorphic processors, are defined by multiple traces carrying information in the timing of events, or spikes. When time-encoded data is processed at a remote site with respect to the location where it is produced, the occurrence of events needs to be encoded and transmitted...
Conference Paper
Full-text available
This paper considers a zero-sum game between a team of delay-constrained encoder and decoder, and a finite state jammer, with average probability of error as the payoff. The team attempts to communicate a discrete source using a finite blocklength over a finite family of discrete channels whose state is controlled by the jammer. For each strategy o...
Preprint
Full-text available
We study the setting of channel coding over a family of channels whose state is controlled by an adversarial jammer by viewing it as a zero-sum game between a finite blocklength encoder-decoder team, and the jammer. The encoder-decoder team choose stochastic encoding and decoding strategies to minimize the average probability of error in transmissi...
Conference Paper
Full-text available
This paper presents a systematic method to synthesize new finite blocklength converses for the channel coding of asymmetric multiple access channels (A-MAC) from point-to-point converses, by employing the linear programming (LP) based framework in [1]. A direct synthesis yields a converse that extends the Polyanskiy-Poor-Verdú metaconvers...
Article
Full-text available
A new finite blocklength converse for the Slepian-Wolf coding problem is presented which significantly improves on the best known converse for this problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of the linear programming (LP) based framework for finite blocklength point-to-point coding problems from [3] is employed....
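
For context, the asymptotic Slepian-Wolf region that such finite-blocklength converses sharpen is the set of rate pairs

\[
R_1 \ge H(X_1 \mid X_2), \qquad R_2 \ge H(X_2 \mid X_1), \qquad R_1 + R_2 \ge H(X_1, X_2),
\]

for distributed lossless compression of correlated sources (X_1, X_2).
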
Preprint
A new finite blocklength converse for the Slepian-Wolf coding problem is presented which significantly improves on the best known converse for this problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of the linear programming (LP) based framework for finite blocklength point-to-point coding problems from [3] is employed....
Article
Full-text available
A linear programming (LP) based framework is presented for obtaining converses for finite blocklength lossy joint source-channel coding problems. The framework applies for any loss criterion, generalizes certain previously known converses, and also extends to multi-terminal settings. The finite blocklength problem is posed equivalently as a nonconv...
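
The general mechanism behind such LP-based converses is weak duality, stated here schematically and not in the paper's exact formulation: if the finite-blocklength problem is relaxed to a linear program \( \min\{c^{\top}x : Ax \ge b,\; x \ge 0\} \), then every dual-feasible point \( y \ge 0 \) with \( A^{\top}y \le c \) certifies the lower bound \( b^{\top}y \le c^{\top}x \) for all primal-feasible x, so each dual-feasible construction yields a converse bound.
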
Conference Paper
Full-text available
The linear programming (LP) based approach we introduced in [1] for finding finite blocklength converses for joint source-channel coding is extended to some network-like settings. Finite blocklength channel coding of compound and averaged channels under the maximum probability error criterion is considered. Through the LP approach new converses are...
Conference Paper
Full-text available
This paper illustrates the application of the linear programming (LP) based framework proposed by the authors previously [1] in deriving improved converses for finite blocklength channel coding of a discrete memoryless binary symmetric channel (BSC) and binary erasure channel (BEC). Employing elementary concepts of optimization, finite blocklength...
