Subbarao Kambhampati’s research while affiliated with Arizona State University and other places


Publications (452)


Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments
  • Preprint

February 2023 · 14 Reads · Ming Shen · Mayank Garg · [...]

Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods. We demonstrate the effectiveness of the proposed methods in evaluations performed by a third party in the adversarial multi-agent board game Monopoly. The results show high novelty detection and accommodation rates across a variety of novelty types, including changes to the rules of the game, as well as changes to the agent's action capabilities.
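To make the detect-and-characterize step concrete, here is a minimal, hypothetical sketch (not the paper's architecture): the agent compares the outcome its current rule model predicts against the observed outcome and reports which state attributes diverge, signaling a possible rule or dynamics novelty.

```python
# Toy sketch of model-based novelty detection and characterization.
# The function name and the Monopoly-style example values are illustrative
# assumptions, not the mechanisms described in the paper.
def characterize_novelty(predicted_state: dict, observed_state: dict) -> dict:
    """Return the attributes whose observed values differ from the model's
    predictions; a non-empty result flags a potential novelty."""
    keys = set(predicted_state) | set(observed_state)
    return {k: (predicted_state.get(k), observed_state.get(k))
            for k in keys
            if predicted_state.get(k) != observed_state.get(k)}

# Example: rent is double what the agent's model of the game rules predicts.
print(characterize_novelty({"rent": 50, "position": 12},
                           {"rent": 100, "position": 12}))
# {'rent': (50, 100)}
```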


Data Driven Reward Initialization for Preference based Reinforcement Learning

February 2023 · 8 Reads

Preference-based Reinforcement Learning (PbRL) methods utilize binary feedback from the human in the loop (HiL) over queried trajectory pairs to learn a reward model that approximates the human's underlying reward function capturing their preferences. In this work, we investigate the high degree of variability in initialized reward models, which are sensitive to the random seeds of the experiment. This further compounds the issue of degenerate reward functions that PbRL methods already suffer from. We propose a data-driven reward initialization method that adds no cost to the human in the loop and only negligible cost to the PbRL agent. We show that this initialization makes the predicted rewards of the initialized reward model uniform over the state space, which reduces the variability in the method's performance across multiple runs and improves overall performance compared to other initialization methods.
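As a rough illustration of the two ingredients the abstract mentions, the sketch below pairs a standard Bradley-Terry style preference loss with a data-driven initialization pass that pushes the reward model's predictions toward a constant on agent-collected states; the names and network architecture are assumptions for illustration, not the paper's code.

```python
# Hedged sketch: Bradley-Terry preference loss plus a data-driven
# initialization that makes predicted rewards (near-)uniform on visited states.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, states):
        return self.net(states)

def pretrain_uniform(model, states, target=0.0, steps=200, lr=1e-3):
    """Initialization pass: regress predicted rewards toward a constant on
    states the agent has already collected (no extra human queries needed)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(states) - target) ** 2).mean()
        loss.backward()
        opt.step()

def preference_loss(model, seg_a, seg_b, prefer_a: bool):
    """Bradley-Terry loss on one queried trajectory pair with binary feedback."""
    returns = torch.stack([model(seg_a).sum(), model(seg_b).sum()])
    target = torch.tensor([0 if prefer_a else 1])
    return nn.functional.cross_entropy(returns.unsqueeze(0), target)
```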


Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning
  • Preprint
  • File available

February 2023 · 19 Reads

Preference-Based Reinforcement Learning has shown much promise for utilizing human binary feedback on queried trajectory pairs to recover the underlying reward model of the Human in the Loop (HiL). While prior works have attempted to better utilize the queries made to the human, in this work we make two observations about the unlabeled trajectories collected by the agent and propose two corresponding loss functions: one that ensures unlabeled trajectories participate in the reward learning process, and one that structures the embedding space of the reward model so that it reflects the structure of the state space with respect to action distances. We validate the proposed method on one locomotion domain and one robotic manipulation task and compare with the state-of-the-art baseline PEBBLE. We further present an ablation of the proposed loss components across both domains and find that each loss component alone outperforms the baseline, and that the synergistic combination of the two yields much better reward recovery and human feedback sample efficiency.
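One plausible instantiation of the two observations, sketched here under my own assumptions rather than taken from the paper, is an auxiliary consistency term computed on unlabeled segments plus a term that aligns reward-model embedding distances with action distances:

```python
# Illustrative auxiliary losses over unlabeled trajectories (assumed forms).
import torch

def unlabeled_consistency_loss(reward_model, segment, noise_std=0.01):
    """Reward predictions on two perturbed views of the same unlabeled
    segment should agree, letting unlabeled data shape the reward model."""
    view1 = segment + noise_std * torch.randn_like(segment)
    view2 = segment + noise_std * torch.randn_like(segment)
    return ((reward_model(view1) - reward_model(view2)) ** 2).mean()

def action_distance_alignment_loss(encoder, states, actions):
    """Pairwise distances in the reward model's embedding space should
    mirror pairwise distances between the corresponding actions."""
    z = encoder(states)                      # (N, d) embeddings
    return ((torch.cdist(z, z) - torch.cdist(actions, actions)) ** 2).mean()
```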


A State Augmentation based approach to Reinforcement Learning from Human Preferences

February 2023 · 4 Reads

Reinforcement Learning has suffered from poor reward specification and from reward hacking, even in fairly simple domains. Preference-Based Reinforcement Learning attempts to address these issues by learning a reward model from binary feedback that a human in the loop provides on queried trajectory pairs, indicating their preferences about the agent's behavior. In this work, we present a state augmentation technique that makes the agent's reward model robust by enforcing an invariance consistency, significantly improving performance, i.e., reward recovery and the subsequent return computed with the learned policy, over our baseline PEBBLE. We validate our method on three domains, Mountain Car, a locomotion task of Quadruped-Walk, and a robotic manipulation task of Sweep-Into, and find that with the proposed augmentation the agent not only improves overall performance but does so quite early in its training phase.
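A minimal sketch of the idea, assuming a simple additive-noise augmentation (the paper's actual augmentation may differ): the reward model is additionally trained so its predictions do not change across augmented copies of a state.

```python
# Hedged sketch of an invariance-consistency term over augmented states.
import torch

def augment(states, noise_std=0.02):
    """One illustrative augmentation: add small Gaussian noise to states."""
    return states + noise_std * torch.randn_like(states)

def invariance_consistency_loss(reward_model, states, n_views=2):
    """Predicted reward should be invariant across augmented views of a state;
    this term is added to the usual preference-based reward learning loss."""
    base = reward_model(states)
    loss = 0.0
    for _ in range(n_views):
        loss = loss + ((reward_model(augment(states)) - base) ** 2).mean()
    return loss / n_views
```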


Figures 5–8: the interface at the plan writing phase without and with LLM assistance, the description of the translate panel, and the interface at the plan translation phase.
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

February 2023 · 124 Reads · 4 Citations

Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) how good LLMs are by themselves at generating and validating simple plans in commonsense planning tasks (of the type that humans are generally quite good at) and (2) how good LLMs are as a source of heuristic guidance for other agents--either AI planners or human planners--in their planning tasks. To investigate these questions in a systematic rather than anecdotal manner, we start by developing a benchmark suite based on the kinds of domains employed in the International Planning Competition. On this benchmark, we evaluate LLMs in three modes: autonomous, heuristic, and human-in-the-loop. Our results show that LLMs' ability to autonomously generate executable plans is quite meager, averaging only about a 3% success rate. The heuristic and human-in-the-loop modes show slightly more promise. In addition to these results, we also make our benchmark and evaluation tools available to support investigations by the research community.
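To give a flavor of the autonomous evaluation mode, here is a toy executability check in the spirit of the Blocksworld-style IPC domains the benchmark draws on; it handles only pick-up/put-down actions and is a simplified stand-in for the benchmark's actual PDDL-based validation.

```python
# Toy plan-executability check (simplified; not the benchmark's validator).
def plan_is_executable(plan, on_table, clear):
    """Simulate pick-up/put-down actions and fail on any violated precondition."""
    on_table, clear, holding = set(on_table), set(clear), None
    for action, block in plan:
        if action == "pick-up":
            if block not in on_table or block not in clear or holding is not None:
                return False
            on_table.discard(block); clear.discard(block); holding = block
        elif action == "put-down":
            if holding != block:
                return False
            on_table.add(block); clear.add(block); holding = None
        else:
            return False  # unknown action name in the LLM's plan
    return True

# An LLM-proposed plan is scored by whether it executes and reaches the goal.
print(plan_is_executable([("pick-up", "a"), ("put-down", "a")],
                         on_table={"a", "b"}, clear={"a", "b"}))  # True
```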


Using Deception in Markov Game to Understand Adversarial Behaviors Through a Capture-The-Flag Environment

February 2023 · 9 Reads

Lecture Notes in Computer Science

Identifying the actual adversarial threat against a system vulnerability has been a long-standing challenge for cybersecurity research. To determine an optimal strategy for the defender, game-theoretic based decision models have been widely used to simulate the real-world attacker-defender scenarios while taking the defender’s constraints into consideration. In this work, we focus on understanding human attacker behaviors in order to optimize the defender’s strategy. To achieve this goal, we model attacker-defender engagements as Markov Games and search for their Bayesian Stackelberg Equilibrium. We validate our modeling approach and report our empirical findings using a Capture-The-Flag (CTF) setup, and we conduct user studies on adversaries with varying skill-levels. Our studies show that application-level deceptions are an optimal mitigation strategy against targeted attacks—outperforming classic cyber-defensive maneuvers, such as patching or blocking network requests. We use this result to further hypothesize over the attacker’s behaviors when trapped in an embedded honeypot environment and present a detailed analysis of the same.
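For intuition about the leader-commits/follower-best-responds structure underlying the Stackelberg analysis, the toy snippet below computes a pure-strategy Stackelberg outcome for a small one-shot defender/attacker payoff matrix; the paper's actual model is a Markov Game solved for a Bayesian Stackelberg Equilibrium, so this is only a simplified illustration.

```python
# Toy pure-strategy Stackelberg computation over a one-shot matrix game.
import numpy as np

def pure_stackelberg(defender_payoff, attacker_payoff):
    """The defender commits to a row; the attacker best-responds per column;
    the defender keeps the commitment with the highest resulting payoff."""
    best = None
    for d in range(defender_payoff.shape[0]):
        a = int(np.argmax(attacker_payoff[d]))        # attacker best response
        value = defender_payoff[d, a]
        if best is None or value > best[2]:
            best = (d, a, value)
    return best

# Rows: defender strategies (e.g., patch vs. deploy deception); columns: attacks.
D = np.array([[1.0, -2.0], [3.0, 0.5]])
A = np.array([[0.0,  2.0], [-1.0, 1.0]])
print(pure_stackelberg(D, A))  # (1, 1, 0.5): committing to deception wins here
```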


A Mental Model Based Theory of Trust

January 2023 · 41 Reads

Handling trust is one of the core requirements for facilitating effective interaction between the human and the AI agent. Thus, any decision-making framework designed to work with humans must possess the ability to estimate and leverage human trust. In this paper, we propose a mental model based theory of trust that not only can be used to infer trust, thus providing an alternative to psychological or behavioral trust inference methods, but also can be used as a foundation for any trust-aware decision-making frameworks. First, we introduce what trust means according to our theory and then use the theory to define trust evolution, human reliance and decision making, and a formalization of the appropriate level of trust in the agent. Using human subject studies, we compare our theory against one of the most common trust scales (Muir scale) to evaluate 1) whether the observations from the human studies match our proposed theory and 2) what aspects of trust are more aligned with our proposed theory.


Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences

October 2022 · 46 Reads

Generating complex behaviors from goals specified by non-expert users is a crucial aspect of intelligent agents. Interactive reward learning from trajectory comparisons is one way to allow non-expert users to convey complex objectives by expressing preferences over short clips of agent behaviors. Even though this method can encode complex tacit knowledge present in the underlying tasks, it implicitly assumes that the human is unable to provide rich-form feedback other than binary preference labels, leading to extremely high feedback complexity and poor user experience. While providing a detailed symbolic specification of the objectives might be tempting, it is not always feasible even for an expert user. However, in most cases, humans are aware of how the agent should change its behavior along meaningful axes to fulfill the underlying purpose, even if they are not able to fully specify task objectives symbolically. Using this as motivation, we introduce the notion of Relative Behavioral Attributes, which acts as a middle ground between exact goal specification and reward learning purely from preference labels, by enabling users to tweak the agent's behavior through nameable concepts (e.g., increasing the softness of the movement of a two-legged "sneaky" agent). We propose two different parametric methods that can potentially encode any kind of behavioral attribute from ordered behavior clips. We demonstrate the effectiveness of our methods on 4 tasks with 9 different behavioral attributes and show that once the attributes are learned, end users can effortlessly produce desirable agent behaviors with only around 10 feedback queries. The feedback complexity of our approach is over 10 times lower than that of the learning-from-human-preferences baseline, demonstrating that our approach is readily applicable in real-world applications.
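One way to realize an attribute model from ordered clips (a sketch under my own assumptions, not the paper's two parametric methods) is a scorer trained with a pairwise ranking loss; once trained, "more/less of the attribute" requests can be answered by comparing scores.

```python
# Hedged sketch: attribute-strength scorer trained on ordered behavior clips.
import torch
import torch.nn as nn

class AttributeScorer(nn.Module):
    def __init__(self, clip_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(clip_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, clip):
        return self.net(clip).squeeze(-1)

def ranking_loss(scorer, clip_less, clip_more, margin=0.1):
    """Hinge loss: the clip exhibiting more of the attribute (e.g., a softer
    gait) should score higher than the other clip by at least `margin`."""
    return torch.clamp(margin - (scorer(clip_more) - scorer(clip_less)),
                       min=0.0).mean()
```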


Using Deception in Markov Game to Understand Adversarial Behaviors through a Capture-The-Flag Environment

October 2022 · 51 Reads

Identifying the actual adversarial threat against a system vulnerability has been a long-standing challenge for cybersecurity research. To determine an optimal strategy for the defender, game-theoretic based decision models have been widely used to simulate the real-world attacker-defender scenarios while taking the defender's constraints into consideration. In this work, we focus on understanding human attacker behaviors in order to optimize the defender's strategy. To achieve this goal, we model attacker-defender engagements as Markov Games and search for their Bayesian Stackelberg Equilibrium. We validate our modeling approach and report our empirical findings using a Capture-The-Flag (CTF) setup, and we conduct user studies on adversaries with varying skill-levels. Our studies show that application-level deceptions are an optimal mitigation strategy against targeted attacks -- outperforming classic cyber-defensive maneuvers, such as patching or blocking network requests. We use this result to further hypothesize over the attacker's behaviors when trapped in an embedded honeypot environment and present a detailed analysis of the same.


Figure 1: Overview of PRESCA. (1) The user specifies their preference in terms of some symbolic concept. If the concept is not present in the symbolic interface, the user provides its causal relationship to some known concept. (2) PRESCA then generates likely positive and negative examples of the concept and queries the user for their labels. (3) After getting the labels, PRESCA learns a classifier for the target concept and (4) incorporates the user's preference into the agent's training. (5) Finally, the concept is added to the interface.
Figure 2: (a) An instance of the Minecraft environment with two possible plans marked with arrows; the user prefers that the agent avoid going into the storage area (indicated by the red arrows). (b) The causal model of the Minecraft environment.
Towards customizable reinforcement learning agents: Enabling preference specification through online vocabulary expansion

October 2022 · 33 Reads

There is a growing interest in developing automated agents that can work alongside humans. In addition to completing the assigned task, such an agent will undoubtedly be expected to behave in a manner that is preferred by the human. This requires the human to communicate their preferences to the agent. To achieve this, the current approaches either require the users to specify the reward function or the preference is interactively learned from queries that ask the user to compare trajectories. The former approach can be challenging if the internal representation used by the agent is inscrutable to the human while the latter is unnecessarily cumbersome for the user if their preference can be specified more easily in symbolic terms. In this work, we propose PRESCA (PREference Specification through Concept Acquisition), a system that allows users to specify their preferences in terms of concepts that they understand. PRESCA maintains a set of such concepts in a shared vocabulary. If the relevant concept is not in the shared vocabulary, then it is learned. To make learning a new concept more efficient, PRESCA leverages causal associations between the target concept and concepts that are already known. Additionally, the effort of learning the new concept is amortized by adding the concept to the shared vocabulary for supporting preference specification in future interactions. We evaluate PRESCA by using it on a Minecraft environment and show that it can be effectively used to make the agent align with the user's preference.
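Once PRESCA has learned a classifier for the target concept, one simple way to fold the user's avoidance preference into training (shown below as a hypothetical sketch, with an assumed classifier interface and penalty value) is to shape the environment reward whenever the concept is predicted to hold:

```python
# Hypothetical sketch of using a learned concept classifier for reward shaping.
def shaped_reward(env_reward, state_features, concept_prob, penalty=-1.0):
    """Penalize states where the user's avoided concept (e.g., 'agent is in
    the storage area') is predicted to hold with probability > 0.5."""
    if concept_prob(state_features) > 0.5:
        return env_reward + penalty
    return env_reward
```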


Citations (38)


... Perceived inconsistencies can arise for other reasons, for instance, when the user's mental model of the world does not match up with the information the agent is acting on. Even if an agent is taking actions that align with the user's goals, its actions may appear misaligned if the human's model of the world is different [72]; think of a shopping agent purchasing what appears to be an overly expensive widget because it knows that the cheaper model is incompatible with the user's needs, but fails to consider the user's budget limitations. ...

Reference:

Challenges in Human-Agent Communication
Planning with mental models – Balancing explanations and explicability
  • Citing Article
  • July 2024

Artificial Intelligence

... In particular, in the context of deep RL, Guan et al. (2021) provide coarse symbolic feedback in the form of object-centric image regions to accompany binary feedback on an agent's proposed actions. Another interesting use of symbolic explanations in RL is that of Zha et al. (2021), in which an RL agent learns to better understand human demonstrations by grounding these in human-aligned symbolic representations. ...

Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... We are interested in providing explanations that aid operators in interpreting solutions generated by complex multirobot systems that incorporate task allocation, scheduling, and motion planning into their decision making. Prior XAI work has addressed this challenge by introducing techniques for generating explanations for task allocation [36], [35], scheduling [24], [9], and motion planning [15] independently. However, recent work in the multi-robot community has shown that the close interdependency between these three subproblems (i.e., determining which robots should perform which tasks affects the timing/schedule of those tasks, and in turn, the motion plans required for their execution) is most effectively addressed by holistic solutions that consider all three challenges together [31], [27], [30]. ...

‘Why Didn’t You Allocate This Task to Them?’ Negotiation-Aware Task Allocation and Contrastive Explanation Generation
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... These tasks may have different objective functions depending on the nature of the game (Bianchi et al., 2024; Bara et al., 2021; Guo et al., 2023; Sclar et al., 2022). Lastly, Inference refers to tasks where the model is expected to make logical inferences, which may involve Natural Language Inference (Cohen, 2021) or predictions using methods like logistic regression (Eysenbach et al., 2016). [Benchmark comparison table omitted.] Situatedness. Despite emphasizing situatedness in their work, Ma et al. (2023c) does not provide a detailed explanation of what constitutes a physical perceiver/interactor versus a social perceiver/interactor in their taxonomized review. ...

Theory of Mind Abilities of Large Language Models in Human-Robot Interaction: An Illusion?
  • Citing Conference Paper
  • March 2024

... The default objective in non-optimal planning (or search) is to simply produce any plan as quickly as possible, i.e., without regard for quality. A bounded-cost search algorithm takes as input a cost bound, and aims to find a solution within that bound as quickly as possible, i.e., without expending effort on achieving a better-quality solution than required by the bound [Stern et al., 2011]. Bounded suboptimal search algorithms, the most famous of which is Weighted A⋆ [Pohl, 1970], take a relative bound parameter w and ensure the solution found is within a factor w of optimal. ...

Generalizing Action Justification and Causal Links to Policies
  • Citing Article
  • July 2023

Proceedings of the International Conference on Automated Planning and Scheduling

... While in HIL systems, ablation studies are useful, in AI²L systems, they are essential. Typically, some other important aspects of the evaluation of these AI²L systems are the interpretability, explainability (Sreedharan, Kulkarni, and Kambhampati 2022), interactive capabilities (Zahedi et al. 2023), and generalizability of these systems (Wüst et al. 2024). ...

Trust-Aware Planning: Modeling Trust Evolution in Iterated Human-Robot Interaction
  • Citing Conference Paper
  • March 2023

... 35). LLMs are additionally indicted for several other incapacities, including planning (Valmeekam et al., 2023), natural language understanding, folk physics, information retrieval, pragmatics, theory of mind, spatial inference, simple logical reasoning (Dziri et al., 2023), and mathematical reasoning. ...

On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)

... A popular tool introduced by such works for eliciting and estimating trust is the use of self-report scales [19,25,6]. Works have also looked at developing methods for estimating trust levels through eye-tracking [12], social [15] and other behavioral [26,28] cues. Unfortunately, directly using these measures to drive agent behavior remains quite challenging. ...

Modeling the Interplay between Human Trust and Monitoring
  • Citing Conference Paper
  • March 2022

... Our work is related to research that focuses on generating user-understandable behavior which is studied under different terminologies, viz., legibility [11], predictability [12], transparency [13], explicability [2], etc. For a detailed review of these studies, refer to [14]. ...

Explicability? Legibility? Predictability? Transparency? Privacy? Security? The Emerging Landscape of Interpretable Agent Behavior
  • Citing Article
  • May 2021

Proceedings of the International Conference on Automated Planning and Scheduling

... Landmarks have an enormous history of use in speeding up the combinatorial search process for planning (Hoffmann, Porteous, and Sebastia 2004) as well as in planning-adjacent tasks like plan recognition (Pereira, Oren, and Meneguzzi 2020). In the past, landmarks have also been used to summarize plans (Chen and Mooney 2011; Grover et al. 2020; Sreedharan et al. 2020b) to the end-user and debug plans (Sreedharan et al. 2020a) for the developer in complex real-world domains such as in the authoring of goal-oriented conversational agents (Muise et al. 2019), as well as for localization in path planning settings (Mataric 1992). To the best of our knowledge, this is the first attempt at using landmarks for plan disambiguation with end users. ...

D3WA+ – A Case Study of XAIP in a Model Acquisition Task for Dialogue Planning
  • Citing Article
  • June 2020

Proceedings of the International Conference on Automated Planning and Scheduling