Ness B. Shroff’s research while affiliated with The Ohio State University and other places


Publications (234)


Figure 3. Regret vs. episodes for NCS-LSVI in an autonomous vehicle merging scenario.
Figure 4. Plot of f(x) showing the behavior of the dynamics function.
Figure 5. Diagram of the autonomous vehicle example: the agent interacts with the environment and observes feedback on its location and speed. It uses this feedback to improve its estimate of lane keeping. Then, using lane keeping together with a trained collision-avoidance (CollAv) module, it derives the safe set of actions. The decision-making module uses the feedback to refine its estimate of the Q-function, and then uses the safe set to make the next decision. Note that the CollAv block is trained a priori and is not learned; lane keeping and decision making are the blocks that the RL agent needs to learn.
Figure 6. Regret vs. episodes for NCS-LSVI in a star-convex autonomous vehicle merging scenario.
Provably Efficient RL for Linear MDPs under Instantaneous Safety Constraints in Non-Convex Feature Spaces
  • Preprint
  • File available

February 2025 · 5 Reads

Amirhossein Roknilamouki · Arnob Ghosh · Ming Shi · [...] · Ness B. Shroff

In Reinforcement Learning (RL), tasks with instantaneous hard constraints present significant challenges, particularly when the decision space is non-convex or non-star-convex. This issue is especially relevant in domains like autonomous vehicles and robotics, where constraints such as collision avoidance often take a non-convex form. In this paper, we establish a regret bound of $\tilde{\mathcal{O}}\bigl((1 + \tfrac{1}{\tau})\sqrt{\log(\tfrac{1}{\tau})\, d^3 H^4 K}\bigr)$, applicable to both star-convex and non-star-convex cases, where $d$ is the feature dimension, $H$ the episode length, $K$ the number of episodes, and $\tau$ the safety threshold. Moreover, the violation of safety constraints is zero with high probability throughout the learning process. A key technical challenge in these settings is bounding the covering number of the value-function class, which is essential for achieving value-aware uniform concentration in model-free function approximation. For the star-convex setting, we develop a novel technique called Objective Constraint-Decomposition (OCD) to properly bound the covering number. This result also resolves an error in a previous work on constrained RL. In non-star-convex scenarios, where the covering number can become infinitely large, we propose a two-phase algorithm, Non-Convex Safe Least Squares Value Iteration (NCS-LSVI), which first reduces uncertainty about the safe set by playing a known safe policy. After that, it carefully balances exploration and exploitation to achieve the regret bound. Finally, numerical simulations on an autonomous driving scenario demonstrate the effectiveness of NCS-LSVI.
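The two-phase structure can be pictured with a small tabular stand-in. The sketch below is a loose illustration only, assuming a finite state-action space, count-based bonuses, and an environment whose step returns both a reward and an instantaneous cost; it is not the paper's NCS-LSVI, which uses least-squares value iteration with linear function approximation.

```python
import numpy as np

# Hypothetical two-phase safe learner: Phase 1 plays a known safe policy to
# shrink uncertainty about the safe set; Phase 2 acts optimistically among
# actions whose cost upper confidence bound stays below the threshold tau.
def two_phase_safe_rl(env, S, A, H, K, K0, pi_safe, tau):
    Q = np.full((S, A), float(H))   # optimistic value estimates
    c_hat = np.zeros((S, A))        # estimated instantaneous costs
    n = np.zeros((S, A))            # visit counts
    for k in range(K):
        s = env.reset()
        for h in range(H):
            bonus = 1.0 / np.sqrt(np.maximum(n[s], 1.0))
            if k < K0:
                a = pi_safe(s, h)   # Phase 1: known safe policy
            else:
                # Pessimistic safety test: keep the cost UCB below tau.
                safe = np.where(c_hat[s] + bonus <= tau)[0]
                if safe.size == 0:
                    a = pi_safe(s, h)                               # safe fallback
                else:
                    a = int(safe[np.argmax((Q[s] + bonus)[safe])])  # optimism
            s2, r, c, done = env.step(a)  # assumed interface: reward and cost
            n[s, a] += 1
            lr = 1.0 / n[s, a]
            c_hat[s, a] += lr * (c - c_hat[s, a])        # cost estimate
            Q[s, a] += lr * (r + Q[s2].max() - Q[s, a])  # value backup
            s = s2
            if done:
                break
    return Q
```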


Figure 1: A scenario of users interacting with a conversational recommender for restaurant recommendation. (a) The recommender achieves Pareto optimality but receives a low rating from the user. (b) Recommendations receive high user ratings when the recommender captures users' preferences and aligns its optimization with them.
Figure 3: A scenario in which the user's preference feedback is not explicitly provided.
Figure 4: A 2-dimensional hidden-preference PAMO-MAB toy example with mean preference c = [0.5, 0.5], illustrating the preference estimate ĉ obtained via linear regression using reward data from (a) Arm 1 (dominated mean reward: [0.2, 0.2]) and (b) Arm 2 (Pareto-optimal mean reward: [0.8, 0.8]).
Figure 6: Regrets of different algorithms under the unknown-preference environment.
Figure 9: A simple 2D hidden-preference PAMO-MAB comparing preference estimation by the standard LR estimator and our tailored WLS estimator under different numbers of samples.
Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization

February 2025 · 1 Read

Multi-objective multi-armed bandit (MO-MAB) problems traditionally aim to achieve Pareto optimality. However, real-world scenarios often involve users with varying preferences across objectives, resulting in a Pareto-optimal arm that may score high for one user but perform quite poorly for another. This highlights the need for customized learning, a factor often overlooked in prior research. To address this, we study a preference-aware MO-MAB framework in the presence of explicit user preferences. It shifts the focus from achieving Pareto optimality to further optimizing within the Pareto front under preference-centric customization. To our knowledge, this is the first theoretical study of customized MO-MAB optimization with explicit user preferences. Motivated by practical applications, we explore two scenarios: unknown preference and hidden preference, each presenting unique challenges for algorithm design and analysis. At the core of our algorithms are preference estimation and preference-aware optimization mechanisms to adapt to user preferences effectively. We further develop novel analytical techniques to establish near-optimal regret of the proposed algorithms. Strong empirical performance confirms the effectiveness of our approach.
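To make the preference-centric objective concrete, the sketch below scalarizes vector-valued UCB indices by a known preference vector; the arm-pull interface and all names are assumptions, and the paper's unknown- and hidden-preference settings add preference-estimation machinery on top of this simple loop.

```python
import numpy as np

# Minimal preference-aware MO-MAB sketch: build optimistic reward vectors
# per arm, then pick the arm whose optimistic vector scores best under the
# user's preference weights c (a probability vector over the D objectives).
def preference_ucb(bandit, K, T, c):
    D = len(c)
    mu = np.zeros((K, D))   # empirical mean reward vectors
    n = np.zeros(K)         # pull counts
    for t in range(T):
        if t < K:
            a = t           # pull each arm once to initialize
        else:
            width = np.sqrt(2.0 * np.log(T) / n)           # per-arm confidence width
            a = int(np.argmax((mu + width[:, None]) @ c))  # scalarized UCB
        r = bandit.pull(a)  # assumed interface: returns a D-dim reward vector
        n[a] += 1
        mu[a] += (r - mu[a]) / n[a]
    return mu, n
```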


Figure 2: Evolution of average occupation and max cost of JSED-k in light traffic conditions with different values of the memory size k.
Figure 3: Evolution of the memory size needed to obtain safety as the number of queues in the system increases.
Figure 4: Comparison of the policies described in Section IV for an increasing ratio between the arrival rate and the capacity of the system.
Figure 5: Comparison of the reward and safety guarantees for different policies in the case of a convex generic gain function.
Performing Load Balancing under Constraints

February 2025 · 36 Reads

Join-the-shortest-queue (JSQ) and its variants have often been used to solve load balancing problems. JSQ minimizes the average system occupancy, e.g., the customer's time in the system. In this paper, we extend the load balancing setting to include constraints that may be imposed by the communication network. In particular, we cast the problem in the framework of constrained MDPs: this permits us to address both action-dependent constraints, such as bandwidth constraints, and state-dependent constraints, such as minimum queue-utilization constraints. Unlike state-of-the-art approaches to load balancing, our policies satisfy the constraints while delivering favorable results in terms of system occupancy. We derive policies that provably satisfy the constraints and evaluate their performance through extensive simulations.
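As a toy sketch of the constrained routing idea, the snippet below joins the shortest queue among those that currently satisfy the constraints; the feasibility predicate is a stand-in assumption, not the constrained-MDP policies derived in the paper.

```python
import random

# Join-the-Shortest-Queue restricted to constraint-satisfying queues.
# queues: current queue lengths; feasible(i): True if routing to queue i
# respects the (action- or state-dependent) constraints right now.
def constrained_jsq(queues, feasible):
    candidates = [i for i in range(len(queues)) if feasible(i)]
    if not candidates:
        candidates = list(range(len(queues)))   # fall back to plain JSQ
    shortest = min(queues[i] for i in candidates)
    # Break ties uniformly at random among the shortest feasible queues.
    return random.choice([i for i in candidates if queues[i] == shortest])

# Example: route among queues 0..3 while queue 2 is excluded by a constraint.
print(constrained_jsq([5, 2, 1, 2], lambda i: i != 2))  # prints 1 or 3
```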


BeST -- A Novel Source Selection Metric for Transfer Learning

January 2025

One of the most fundamental, yet relatively underexplored, goals in transfer learning is the efficient selection of top candidates from a large number of previously trained models (optimized for various "source" tasks) that would perform best for a new "target" task with a limited amount of data. In this paper, we pursue this goal by developing a novel task-similarity metric (BeST) and an associated method that consistently performs well in identifying the most transferable source(s) for a given task. In particular, our design employs an innovative quantization-level optimization procedure in the context of classification tasks that yields a measure of similarity between a source model and the given target data. The procedure uses a concept similar to early stopping (usually implemented when training deep neural networks (DNNs) to ensure generalization) to derive a function that approximates the transfer learning mapping without training. The advantage of our metric is that it can be computed quickly to identify the top candidate(s) for a given target task before a computationally intensive transfer operation (typically using DNNs) is carried out between the selected source and the target task. As such, our metric can provide significant computational savings in transfer learning when selecting among a large number of possible source models. Through extensive experimental evaluations, we establish that our metric performs well over different datasets and varying numbers of data samples.
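The selection pipeline can be illustrated with a cheap proxy: score each frozen source model on the small target set and transfer only from the top-ranked ones. The nearest-centroid scorer below is a hypothetical stand-in, not the BeST quantization procedure, and the feature-extractor interface is an assumption.

```python
import numpy as np

# Proxy score: nearest-centroid accuracy of a source model's features on the
# target labels; higher suggests the source embeds the target classes well.
def proxy_score(features, labels):
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    preds = classes[np.argmin(dists, axis=1)]
    return float((preds == labels).mean())

# Rank candidate sources without running any expensive fine-tuning.
# source_models: dict mapping a model name to its frozen feature extractor.
def select_top_sources(source_models, X_target, y_target, top_k=1):
    scores = {name: proxy_score(extract(X_target), y_target)
              for name, extract in source_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```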


Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters

January 2025 · 1 Read

The recent explosive growth of deep learning (DL) models has necessitated a compelling need for efficient job scheduling for distributed deep learning training with mixed parallelisms (DDLwMP) in GPU clusters. This paper proposes an adaptive shortest-remaining-processing-time-first (A-SRPT) scheduling algorithm, a novel prediction-assisted online scheduling approach designed to mitigate the challenges associated with DL cluster scheduling. By modeling each job as a graph corresponding to heterogeneous Deep Neural Network (DNN) models and their associated distributed training configurations, A-SRPT strategically assigns jobs to the available GPUs, thereby minimizing inter-server communication overhead. Observing that most DDLwMP jobs recur, A-SRPT incorporates a random forest regression model to predict training iterations. Crucially, A-SRPT maps the complex scheduling problem into a single-machine instance, which is addressed optimally by a preemptive "shortest-remaining-processing-time-first" strategy. This optimized solution serves as a guide for actual job scheduling within the GPU clusters, leading to a theoretically provable competitive scheduling efficiency. We conduct extensive real-world testbed and simulation experiments to verify our proposed algorithms.
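The single-machine reduction builds on classical preemptive SRPT, sketched below over (possibly predicted) job sizes; the event loop and names are illustrative, and the paper's A-SRPT additionally handles GPU placement and prediction errors.

```python
import heapq

# Preemptive SRPT on one machine: always run the job with the smallest
# remaining (predicted) processing time; preempt when a smaller job arrives.
def srpt_schedule(jobs):
    """jobs: list of (release_time, size, job_id); returns completion times."""
    jobs = sorted(jobs)                 # order by release time
    heap, completion, t, i = [], {}, 0.0, 0
    while heap or i < len(jobs):
        if not heap:
            t = max(t, jobs[i][0])      # idle until the next arrival
        while i < len(jobs) and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][2]))  # (remaining, id)
            i += 1
        rem, jid = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        run = min(rem, next_arrival - t)  # run until done or the next arrival
        t += run
        if run < rem:
            heapq.heappush(heap, (rem - run, jid))  # preempted, requeue
        else:
            completion[jid] = t
    return completion

# Example: sizes 5, 1, 2 released at times 0, 1, 2.
print(srpt_schedule([(0, 5, "a"), (1, 1, "b"), (2, 2, "c")]))
# {'b': 2.0, 'c': 4.0, 'a': 8.0}
```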



Artificial Intelligence of Things: A Survey

October 2024 · 93 Reads

The integration of the Internet of Things (IoT) and modern Artificial Intelligence (AI) has given rise to a new paradigm known as the Artificial Intelligence of Things (AIoT). In this survey, we provide a systematic and comprehensive review of AIoT research. We examine AIoT literature related to sensing, computing, and networking & communication, which form the three key components of AIoT. In addition to advancements in these areas, we review domain-specific AIoT systems that are designed for various important application domains. We have also created an accompanying GitHub repository, where we compile the papers included in this survey: https://github.com/AIoT-MLSys-Lab/AIoT-Survey. This repository will be actively maintained and updated with new research as it becomes available. As both IoT and AI become increasingly critical to our society, we believe AIoT is emerging as an essential research field at the intersection of IoT and modern AI. We hope this survey will serve as a valuable resource for those engaged in AIoT research and act as a catalyst for future explorations to bridge gaps and drive advancements in this exciting field.


Figure 1: Finding the Pareto front and Pareto front vertices in an MO-MDP.

Due to its flexibility in adapting to changing preferences, this approach has received significant attention. This section gives an overview of methods for estimating the Pareto front in multi-objective MDPs and MO-RL. As early as [24], a dynamic-programming-based method was proposed to find the optimal Pareto front for MO-MDPs, though the size of the candidate non-stationary policy set grows exponentially with the time horizon. To efficiently find vertices on the Pareto front, [9] introduced the Optimistic Linear Support (OLS) method, which solves a single-objective problem along the direction that most improves the current set of candidate policies. However, OLS can only find vertices on the Pareto front and does not provide a method for constructing the entire Pareto front from these vertices, which remains non-trivial.

The authors of [25] proposed Pareto Q-learning, which learns a set of Pareto optimal policies in the MO-RL setting. Building on this, subsequent work has investigated training a universal Pareto optimal policy set, where the policy network takes a preference embedding as input and outputs the corresponding Pareto optimal policy [10, 26, 27, 5]. These Q-learning variants rely on maximizing Q-function estimates over the action space, making them suitable only for discrete action spaces. Recent works have extended Q-learning with policy networks to handle continuous action spaces [28, 29]. While these Q-learning methods update the Q-function for each preference vector by leveraging the Q-functions of similar visited preference vectors, thus avoiding the need to retrain from scratch, they still essentially require updating the Q-function for each visited preference vector. Consequently, they still need to explore the preference space to achieve a near-accurate Pareto front, which becomes computationally inefficient and impractical as the reward dimension increases.

There are also policy-based MORL algorithms designed to find Pareto optimal policies [30, 31, 6, 12]. A concept similar to our approach, which directly searches over the Pareto front, is the Pareto-following algorithm proposed by [12]. It employs a modified policy-gradient method that adjusts the gradient direction to follow the Pareto front. While it can reduce the complexity of finding the Pareto front by avoiding the need to converge to the optimal policy from scratch, it still requires multiple policy-gradient steps to identify even a nearby Pareto optimal policy. Furthermore, it cannot guarantee comprehensive coverage of the estimated policies, nor can it ensure discovery of the true Pareto front.

In addition to value-based and policy-based methods that extend single-objective RL, there are also heuristic approaches for combining policies that have shown promising performance in specific settings. For instance, Rewarded Soup [2, 11] proposes learning optimal policies for individual preference criteria and then linearly interpolating these policies to combine their capabilities. While [2] demonstrated that Rewarded Soup is optimal when individual rewards are quadratic with respect to the policy parameters, this quadratic reward condition is highly restrictive and does not apply to general MDP settings.
Figure 4: Pareto front of a simple MDP with S = 4, A = 3, and D = 3.
Figure 5: Comparison between the proposed Pareto front searching algorithm and the benchmark algorithm when D = 3.
Figure 6: Comparison between the proposed Pareto front searching algorithm and the benchmark algorithm when D = 4.
How to Find the Exact Pareto Front for Multi-Objective MDPs?

October 2024 · 15 Reads

Multi-objective Markov Decision Processes (MO-MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding optimal solutions that can efficiently adapt to various preferences. However, finding the Pareto front is a highly challenging problem. Most existing methods either (i) rely on traversing the continuous preference space, which is impractical and results in approximations that are difficult to evaluate against the true Pareto front, or (ii) focus solely on deterministic Pareto optimal policies, from which there are no known techniques to characterize the full Pareto front. Moreover, the structure of the Pareto front itself remains unclear even in the context of dynamic programming. This work addresses the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDPs, we uncover a key property: the Pareto front lies on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and, almost surely, neighboring vertices of the Pareto front correspond to deterministic policies that differ in only one state-action pair. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ in only one state-action pair, drastically reducing the complexity of searching for the exact Pareto front. We develop an efficient algorithm that identifies the vertices of the Pareto front by solving a single-objective MDP only once and then traversing the edges of the Pareto front, making it more efficient than existing methods. Our empirical studies demonstrate the effectiveness of our theoretical strategy in discovering the Pareto front.
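The structural property suggests a direct local search, sketched below for a tabular MO-MDP: evaluate a deterministic policy's vector value exactly, then enumerate neighbors that differ in a single state-action pair. The (P, R) encoding and names are assumptions; the paper's algorithm additionally initializes from a single-objective solve and traverses Pareto-front edges.

```python
import numpy as np
from itertools import product

# Exact vector-valued evaluation of a deterministic policy pi in a tabular
# MO-MDP: P has shape (S, A, S), R has shape (S, A, D), pi has length S.
def policy_value(P, R, pi, gamma=0.9):
    S, A, D = R.shape
    P_pi = P[np.arange(S), pi]   # (S, S) transition matrix under pi
    R_pi = R[np.arange(S), pi]   # (S, D) reward matrix under pi
    # Solve (I - gamma * P_pi) V = R_pi for all D reward dimensions at once.
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

# Neighbors of pi in the sense used above: change exactly one state's action.
def one_swap_neighbors(pi, A):
    for s, a in product(range(len(pi)), range(A)):
        if a != pi[s]:
            nb = np.array(pi)    # copy of the policy
            nb[s] = a
            yield nb
```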


Figure 1: Comparison of BO-DDNM, DDNM, and DDNM+ for Gaussian (left) and Gaussian-mixture (right) Q_0 under measurement noise.
Figure 2: Distributional bias as a function of the conditioning y (left) and the correlation coefficient ρ (right) for Gaussian Q_0.
Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers

October 2024 · 21 Reads

The denoising diffusion model has recently emerged as a powerful generative technique, capable of transforming noise into meaningful data. While theoretical convergence guarantees for diffusion models are well established when the target distribution aligns with the training distribution, practical scenarios often present mismatches. One common case is in zero-shot conditional diffusion sampling, where the target conditional distribution is different from the (unconditional) training distribution. These score-mismatched diffusion models remain largely unexplored from a theoretical perspective. In this paper, we present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers, focusing on target distributions with finite second moments. We show that score mismatches result in an asymptotic distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions. This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise. Interestingly, the derived convergence upper bound offers useful guidance for designing a novel bias-optimal zero-shot sampler in linear conditional models that minimizes the asymptotic bias. For such bias-optimal samplers, we further establish convergence guarantees with explicit dependencies on dimension and conditioning, applied to several interesting target distributions, including those with bounded support and Gaussian mixtures. Our findings are supported by numerical studies.
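For context, zero-shot conditional samplers typically start from the score form of Bayes' rule and approximate the guidance term with a known measurement model; the identity below is standard background, not this paper's specific construction.

```latex
% Conditional score = learned unconditional score + measurement guidance.
\nabla_{x_t} \log p_t(x_t \mid y)
  = \nabla_{x_t} \log p_t(x_t)
  + \nabla_{x_t} \log p_t(y \mid x_t)
% Zero-shot samplers plug in the pretrained score for the first term and an
% approximation of the second (e.g., for a linear model y = A x_0 + noise);
% the approximation error is the source of the score mismatch analyzed above.
```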



Citations (42)


... Therefore, it is crucial to develop ML methods that respect user privacy and conform to data protection standards [3], [4]. To this end, federated learning (FL) has emerged as an attractive paradigm for distributing ML over networks, as it allows for model updates to occur directly on the edge devices where the data originates [5]-[9]. Information transmitted over the network is in the form of locally trained models (rather than raw data), which are periodically aggregated at a central server. ...

Reference:

Differentially-Private Multi-Tier Federated Learning: A Formal Analysis and Evaluation
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning?
  • Citing Conference Paper
  • October 2024

... [Flattened survey table recovered as a list] Sensing applications: Artificial Intelligence [18-20]; smart lighting: conventional ML and DL models [21]; human activity recognition: conventional ML models [22, 23]; resource-constrained IoT environments: [24-26]; energy optimization in buildings: deep reinforcement learning models [27-31]; air quality and energy management: deep reinforcement learning models [32-34]. State-of-the-art research often emphasizes survey studies focusing on practical machine learning (ML) and deep learning (DL) applications, addressing the need for diverse sensing systems to enable smart space applications [19]. Some studies specifically survey ML techniques for smart lighting applications [21] or introduce sensor technologies and deployment strategies leveraging AI methodologies [18, 20]. Other research concentrates on human activity recognition (HAR) within smart spaces, exploring public datasets, sensor requirements, and traditional ML models [22, 23]. ...

Artificial Intelligence of Things: A Survey

ACM Transactions on Sensor Networks

... Another study proposes a randomized update policy for remote tracking systems, balancing current versus past state information to optimize the trade-off between freshness and reconstruction queue length. This approach combines Last-Come-First-Serve (LCFS) and First-Come-First-Serve (FCFS) disciplines to manage packets in a way that enhances both freshness and reconstruction performance [20]. ...

Balancing Current and Historical State Information in Remote Tracking Systems: A Randomized Update Approach
  • Citing Conference Paper
  • May 2024

... More generally, recent works have shown that online learning techniques can enhance edge caching performance. For example, [16], [31], [32] propose optimistic no-regret algorithms that leverage neural network predictions about future requests, [33] proposes a randomized caching algorithm with a low dynamic regret, and [34] uses online learning to estimate fetching costs. ...

Minimizing Edge Caching Service Costs Through Regret-Optimal Online Learning
  • Citing Article
  • October 2024

IEEE/ACM Transactions on Networking

... Initially, research efforts were centered on analyzing and optimizing the average AoI and peak AoI in communication networks [15], [18], [19], [23]. Recent research endeavors have revealed that the performance of real-time applications can be modeled as non-linear functions of AoI, leading to the study of optimizing these non-linear functions in control system scenarios [27], [36], remote estimation [16], [25], [30], and remote inference [4], [5], [17], [22]. While a number of studies have analyzed AoI in queuing models, closest to the spirit of this paper is the control of AoI via replacement of exogenous data arrivals with the generation of data "at will" [19], [24], [26]-[28], [37]-[39]. ...

Sampling for Remote Estimation of the Wiener Process over an Unreliable Channel
  • Citing Article
  • June 2024

ACM SIGMETRICS Performance Evaluation Review

... Proof: Please refer to Sections 4 and 5.1 in [6]. Compared with the reference, note that the definition of the discounted performance cost η_i^(γ) in (9) has been changed to omit the leading (1 − α) factor. This change also reflects the deletion of the (1 − α) term in (10). ...

Model-Free Change Point Detection for Mixing Processes

IEEE Open Journal of Control Systems

... Extensive research has been conducted on various aspects of UAV networking (see, for example, [7][8][9]). However, several research gaps remain in developing distributed transmission policies that jointly consider (i) interference levels in unlicensed spectrum bands, (ii) transmission queue states regarding buffer size and queuing delay, and (iii) video encoding rate optimization. ...

Energy-Efficient Deadline-Aware Edge Computing: Bandit Learning with Partial Observations in Multi-Channel Systems
  • Citing Conference Paper
  • December 2023

... Richards and Rabbat (2021) studied the minimal eigenvalue of the empirical risk Hessian for a three-layer NN with a linear activation in the first layer for Lipschitz and convex losses (e.g., the logistic loss), while we focus on NNs with more general activation functions for the least-squares loss. (See the more detailed discussion in Appendix B.4.) Ju et al. (2022) studied the generalization performance of overparameterized three-layer NTK models with the absolute loss and ReLU activation. They showed that the generalization error is on the order of O(1/√n) when there are infinitely many neurons. ...

On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models
  • Citing Chapter
  • August 2023

... This limitation arises from difficulties in determining optimal FEC redundancy parameters for dynamic networks in advance, and the significant delays introduced by RTX. Furthermore, bitrate adaptation algorithms (Hu et al. 2023) have been utilized to adjust codec bitrates in response to throughput fluctuations, yet the significant variability in 5G network throughput poses substantial challenges to their accuracy. Moreover, the complexity of distributing packets across heterogeneous network paths adds another layer of complication to video streaming in 5G networks. ...

COREL: Constrained Reinforcement Learning for Video Streaming ABR Algorithm Design Over mmWave 5G
  • Citing Conference Paper
  • October 2023

... When performing client selection, we follow previous work [50, 51] and randomly select some low-scoring clients. This is because it would be unfair to assume that certain clients can never participate in the current round of training due to network and other factors, and therefore never select them. ...

Multi-armed bandits with dependent arms

Machine Learning