Conference Paper

Goal recognition over POMDPs: Inferring the intention of a POMDP agent

Authors:
Miquel Ramírez, Hector Geffner

Abstract

Plan recognition is the problem of inferring the goals and plans of an agent from partial observations of her behavior. Recently, it has been shown that the problem can be formulated and solved using planners, reducing plan recognition to plan generation. In this work, we extend this model-based approach to plan recognition to the POMDP setting, where actions are stochastic and states are partially observable. The task is to infer a probability distribution over the possible goals of an agent whose behavior results from a POMDP model. The POMDP model is shared between agent and observer, except for the true goal of the agent, which is hidden from the observer. The observations are action sequences O that may contain gaps, as some or even most of the actions done by the agent may not be observed. We show that the posterior goal distribution P(G|O) can be computed from the value function V_G(b) over beliefs b generated by the POMDP planner for each possible goal G. Some extensions of the basic framework are discussed, and a number of experiments are reported.
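To make the abstract's Bayesian step concrete, here is a minimal sketch (not the authors' implementation) of turning per-goal observation likelihoods P(O|G), which the paper derives from the goal-specific value functions V_G(b), into the posterior P(G|O); likelihood_of_obs is a hypothetical stand-in for that POMDP machinery:

    def goal_posterior(goals, observations, likelihood_of_obs, prior=None):
        # P(G|O) is proportional to P(O|G) * P(G) over a finite set of goal hypotheses.
        prior = prior or {g: 1.0 / len(goals) for g in goals}
        unnormalized = {g: likelihood_of_obs(observations, g) * prior[g]
                        for g in goals}
        z = sum(unnormalized.values())
        return {g: p / z for g, p in unnormalized.items()}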

... Afterward, Algorithm 1 calculates the heuristic value for all goals (lines 10-12). As a last step, from the calculated heuristic values, Algorithm 1 selects the goals that have the maximum value (lines 13-19): ...
... C is the set of facts for which the algorithm samples supporting actions, found is a set that keeps track of the facts already supported, and sups is the set of supporting actions already selected for the current sample. The search for supporting actions for a fact p occurs in lines 13-22. For this search, the algorithm starts at the first action level of the RPG, which ensures that supporters closer to s0 are found first. ...
... Since the idea of Plan Recognition as Planning was introduced by Ramírez and Geffner [13], many approaches have adopted this paradigm [14,15,20,18,17,9,11,4]. It was recognized relatively soon that the initial PRAP approaches are computationally demanding, as they require computing entire plans. ...
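The excerpts above describe a landmark-style recognizer that scores each goal hypothesis with a heuristic and returns the top-scoring goals (lines 10-19 of the cited Algorithm 1). A minimal sketch of that selection step, with heuristic as a placeholder for the cited paper's estimator:

    def select_goals(goals, observations, heuristic):
        # Score every goal hypothesis, then keep all goals tied for the maximum.
        scores = {g: heuristic(g, observations) for g in goals}
        best = max(scores.values())
        return [g for g, s in scores.items() if s == best]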
Preprint
Full-text available
We present a new approach to goal recognition that involves comparing observed facts with their expected probabilities. These probabilities depend on a specified goal g and initial state s0. Our method maps these probabilities and observed facts into a real vector space to compute heuristic values for potential goals. These values estimate the likelihood of a given goal being the true objective of the observed agent. As obtaining exact expected probabilities for observed facts in an observation sequence is often practically infeasible, we propose and empirically validate a method for approximating these probabilities. Our empirical results show that the proposed approach offers improved goal recognition precision compared to state-of-the-art techniques while reducing computational complexity.
... Several works have extended this approach. The problem of dealing with behaviors generated under partial observability, which may require inferring both the plan and the beliefs, the mental state [187,189], of the observed actor, was studied with both classical planning [199] and Bayesian approaches [188,200]. The impact of missing observations for the observer agent has also been analyzed [199]. ...
... The problem of dealing with behaviors generated under partial observability, which may require inferring both the plan and the beliefs, the mental state [187,189], of the observed actor, was studied with both classical planning [199] and Bayesian approaches [188,200]. The impact of missing observations for the observer agent has also been analyzed [199]. A further step has been proposed by active methods for activity recognition [64,201,202] that use the same world model both to interpret others' actions as well as selecting actions that would improve the recognition process, e.g. by giving access to the most informative observations [65] and allow the completion of a joint task [190]. ...
... Some approaches attempt to address this condition by allowing for random noise in action selection [208] or by using simplified punishment strategies for sub-optimality, e.g. increased expected plan length [199]. However, human errors and behavioral suboptimality often depend on environmental features that determine the complexity of the underlying cognitive processes [206] or induce habits [207]. ...
... First, based on the definition of Plan Recognition as Planning introduced in (Ramírez and Geffner, 2009), we formalize the problem of recognizing temporally extended goals (expressed in LTLf or PPLTL) in FOND planning domains, handling both stochastic (i.e., strong-cyclic plans) and adversarial (i.e., strong plans) environments (Aminof et al., 2020). Second, we extend the probabilistic framework for goal recognition proposed in (Ramírez and Geffner, 2010), and develop a novel probabilistic approach that reasons over executions of policies and returns a posterior probability distribution for the goal hypotheses. Third, we develop a compilation approach that generates an augmented FOND planning problem by compiling temporally extended goals together with the original planning problem. ...
... We now introduce our recognition approach, which is able to recognize temporally extended (LTLf and PPLTL) goals in FOND planning domains. Our approach extends the probabilistic framework of Ramírez and Geffner (2010) to compute posterior probabilities over temporally extended goal hypotheses, by reasoning over the set of possible executions of policies π and the observations. Our goal recognition approach works in two stages: the compilation stage and the recognition stage. ...
... Existing recognition approaches often return either a probability distribution over the set of goals (Ramírez and Geffner, 2010; Sohrabi et al., 2016), or scores associated with each possible goal hypothesis (Pereira et al., 2020). Here, we return a probability distribution P over the set of temporally extended goals G_φ that "best" explains the observation sequence Obs. ...
Preprint
Full-text available
Goal Recognition is the task of discerning the correct intended goal that an agent aims to achieve, given a set of goal hypotheses, a domain model, and a sequence of observations (i.e., a sample of the plan executed in the environment). Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state and that the environment dynamics are deterministic, preventing the recognition of temporally extended goals in more complex settings. In this paper, we expand goal recognition to temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models, focusing on goals on finite traces expressed in Linear Temporal Logic (LTLf) and Pure Past Linear Temporal Logic (PLTLf). We develop the first approach capable of recognizing goals in such settings and evaluate it using different LTLf and PLTLf goals over six FOND planning domain models. Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.
... A wide range of works have tackled the MAPR problem based on a plan library, including the Bayesian model [4][5][6][7], hidden Markov algorithms [8,9], specialized procedures [10], and parsing algorithms [11,12]. They all required a suitable plan library or a set of rules to establish their model. ...
... The Boolean matrix generation algorithm is shown in Algorithm 1. First, extract the sub-matrix according to the starting time (steps 2-4). Then, combine the agent actions and extract the parts that are completely equal to the transition relation of A_φi (steps 6-11). Finally, form the Boolean matrix according to Formula (2) (steps 15-21). ...
... cost(set_k) = costAlpha(q_next) + costBeta(i, q_next). (9) P(goal_j | set_k) indicates the probability of selecting goal_j in the action set set_k of agent i. ...
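One plausible reading of the excerpt, sketched under the assumption that the cited paper normalizes costs with a softmax (its exact normalization may differ): the cost of an action set under each goal, e.g. cost(set_k) = costAlpha(q_next) + costBeta(i, q_next) from Formula (9), is converted into the goal probability P(goal_j | set_k):

    import math

    def goal_probabilities(costs):
        # costs: dict mapping each goal to the cost of the observed action set
        # under that goal; lower cost means higher probability.
        weights = {g: math.exp(-c) for g, c in costs.items()}
        z = sum(weights.values())
        return {g: w / z for g, w in weights.items()}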
Article
Full-text available
This paper studies the plan recognition problem of multi-agent systems with temporal logic tasks. The high-level temporal tasks are represented as linear temporal logic (LTL). We present a probabilistic plan recognition algorithm to predict the future goals and identify the temporal logic tasks of the agents based on observations of their states and actions. We subsequently build a plan library composed of Nondeterministic Büchi Automata to model the temporal logic tasks. We also propose a Boolean matrix generation algorithm to map the plan library to multi-agent trajectories and a task recognition algorithm to parse the Boolean matrix. Then, a probability calculation formula is proposed to compute the posterior goal probability distribution, and the cold-start situation of plan recognition is handled using the Bayes formula. Finally, we validate the proposed algorithm via extensive comparative simulations.
... According to the level of rationality exhibited by the observed agent, behavior can be classified into two categories: optimal and non-optimal [18]. Intent recognition based on planning often assumes that the observed agent is optimal [19]. This is because planning-based intent recognition methods generate effective plans over the entire set of goals to achieve intent recognition, which requires the plan-generation method to be optimal. ...
... Subsequently, the observed agent selects an action according to − and the state-action pairs are counted (lines 14 to 18). Then, the observed agent receives a reward (line 19). Following that, the Q-table and the state of the observed agent are updated (lines 21 to 23). ...
Article
Full-text available
Deceptive path planning (DPP) aims to find a path that minimizes the probability of the observer identifying the real goal of the observed agent before it is reached. It is important for addressing issues such as public safety, strategic path planning, and logistics route privacy protection. Existing traditional methods often rely on "dissimulation"—hiding the truth—to obscure paths while ignoring time constraints. Building upon the theory of probabilistic goal recognition based on cost difference, we propose a DPP method, DPP_Q, based on count-based Q-learning, for solving DPP problems in discrete path-planning domains under specific time constraints. Furthermore, to extend this method to continuous domains, we propose a new model of probabilistic goal recognition called the Approximate Goal Recognition Model (AGRM) and verify its feasibility in discrete path-planning domains. Finally, we also propose a DPP method based on proximal policy optimization for continuous path-planning domains under specific time constraints, called DPP_PPO. DPP methods such as DPP_Q and DPP_PPO have not previously been explored in the field of path planning. Experimental results show that, in discrete domains, DPP_Q is more effective than traditional methods at enhancing the average deceptiveness of paths (improving on them by 12.53% on average). In continuous domains, DPP_PPO shows significant advantages over random-walk methods. Both DPP_Q and DPP_PPO demonstrate good applicability in path-planning domains with uncomplicated obstacles.
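The "probabilistic goal recognition based on cost difference" that DPP_Q builds on follows the Ramírez-and-Geffner line: a goal is deemed more probable the less the observed path sacrifices optimality toward it. A hedged sketch, where cost_via[g] (optimal cost from the start through the observed state to g) and cost_direct[g] (optimal cost from the start to g) are assumed inputs:

    import math

    def costdiff_posterior(goals, cost_via, cost_direct, beta=1.0):
        # P(g | path) is proportional to exp(-beta * (cost_via[g] - cost_direct[g])).
        w = {g: math.exp(-beta * (cost_via[g] - cost_direct[g])) for g in goals}
        z = sum(w.values())
        return {g: v / z for g, v in w.items()}

A deceptive planner such as DPP_Q would then shape its paths so that this posterior stays flat, or favors a decoy goal, for as long as the time budget allows.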
... Several works have extended this approach. The problem of dealing with behaviors generated under partial observability, which may require inferring both the plan and the beliefs, the mental state [176,178], of the observed actor, was studied with both classical planning [189] and Bayesian approaches [177,190]. The impact of missing observations for the observer agent has also been analyzed [189]. ...
... The problem of dealing with behaviors generated under partial observability, which may require inferring both the plan and the beliefs, the mental state [176,178], of the observed actor, was studied with both classical planning [189] and Bayesian approaches [177,190]. The impact of missing observations for the observer agent has also been analyzed [189]. A further step has been proposed by active methods for activity recognition [62,191,192] that use the same world model both to interpret others' actions as well as selecting actions that would improve the recognition process, e.g., by giving access to the most informative observations [63] and allow the completion of a joint task [180]. ...
Preprint
Full-text available
Creating autonomous robots that can actively explore the environment, acquire knowledge and learn skills continuously is the ultimate achievement envisioned in cognitive and developmental robotics. Their learning processes should be based on interactions with their physical and social world in the manner of human learning and cognitive development. Based on this context, in this paper, we focus on the two concepts of world models and predictive coding. Recently, world models have attracted renewed attention as a topic of considerable interest in artificial intelligence. Cognitive systems learn world models to better predict future sensory observations and optimize their policies, i.e., controllers. Alternatively, in neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment. Both ideas may be considered as underpinning the cognitive development of robots and humans capable of continual or lifelong learning. Although many studies have been conducted on predictive coding in cognitive robotics and neurorobotics, the relationship between world model-based approaches in AI and predictive coding in robotics has rarely been discussed. Therefore, in this paper, we clarify the definitions, relationships, and status of current research on these topics, as well as missing pieces of world models and predictive coding in conjunction with crucially related concepts such as the free-energy principle and active inference in the context of cognitive and developmental robotics. Furthermore, we outline the frontiers and challenges involved in world models and predictive coding toward the further integration of AI and robotics, as well as the creation of robots with real cognitive and developmental capabilities in the future.
... Goal recognition, a special form of plan recognition, deals with online problems aiming at identifying the goal of an agent as quickly as possible given its behavior (Geffner and Bonet 2013;Ramírez and Geffner 2011). Goal recognition is relevant in many applications including security (Jarvis, Lunt, and Myers 2005), computer games (Kabanza et al. 2010), and natural language processing (Geib and Steedman 2007). ...
... Its goal is one of three possible ones G1, G2, and G3. The traditional approach has been to find efficient algorithms that observe the trajectory of the agent and predict its actual goal (Geffner and Bonet 2013;Ramírez and Geffner 2011). Keren, Gal, and Karpas (2014) took an orthogonal approach by proposing to modify the underlying environment in which the agents operate, typically by making a subset of feasible actions infeasible, so that agents are forced to reveal their goals as early as possible. ...
Article
Goal Recognition Design involves identifying the best ways to modify an underlying environment that agents operate in, typically by making a subset of feasible actions infeasible, so that agents are forced to reveal their goals as early as possible. Thus far, existing work has focused exclusively on imperative classical planning. In this paper, we address the same problem with a different paradigm, namely, declarative approaches based on Answer Set Programming (ASP). Our experimental results show that one of our ASP encodings is more scalable and is faster than the current state of the art by up to three orders of magnitude.
... It may be noted that there exist multiple fault-lines that could be used to explain the model's decisions. In this work, we pick the optimal fault-line, i.e., the one that is most influential and suitable given the user's current understanding of the CNN model, by using Theory-of-Mind (ToM) (Yoshida et al., 2008; Rabinowitz et al., 2018; Pearce et al., 2014; Raileanu et al., 2018; Ramírez and Geffner, 2011; Edmonds et al., 2019; Zhang and Zhu, 2018). ...
... It may be noted that we are not trying to estimate or build a rich and dynamic true state of a human mind using ToM - a grand challenge for AI. Instead, similar to prior works on ToM (Yoshida et al., 2008; Rabinowitz et al., 2018; Pearce et al., 2014; Raileanu et al., 2018; Ramírez and Geffner, 2011; Zhu et al., 2020), we cast the ToM framework as a simple learning problem that enables us to better understand the user preferences that improve the utility of the explanations. ...
Article
Full-text available
We propose CX-ToM, short for counterfactual explanations with theory-of-mind, a new explainable AI (XAI) framework for explaining decisions made by a deep convolutional neural network (CNN). In contrast to the current methods in XAI that generate explanations as a single-shot response, we pose explanation as an iterative communication process, i.e. a dialog, between the machine and the human user. More concretely, our CX-ToM framework generates a sequence of explanations in a dialog by mediating the differences between the minds of the machine and the human user. To do this, we use Theory of Mind (ToM), which helps us explicitly model the human's intention, the machine's mind as inferred by the human, and the human's mind as inferred by the machine. Moreover, most state-of-the-art XAI frameworks provide attention (or heat map) based explanations. In our work, we show that these attention-based explanations are not sufficient for increasing human trust in the underlying CNN model. In CX-ToM, we instead use counterfactual explanations called fault-lines, which we define as follows: given an input image I for which a CNN classification model M predicts class c_pred, a fault-line identifies the minimal semantic-level features (e.g., stripes on zebra), referred to as explainable concepts, that need to be added to or deleted from I in order to alter the classification category of I by M to another specified class c_alt. Extensive experiments verify our hypotheses, demonstrating that our CX-ToM significantly outperforms the state-of-the-art XAI models.
... Whereas a planning approach requires a representation of the interaction situation, a detailed knowledge of its dynamics, and an adaptation of the high-level strategy to low-level controls, it still seems a promising direction to us: (1) it can leverage generic models and algorithms to automatically compute a robot's strategy from the definition of the collaborative task, (2) planning approaches are able to deal with several sources of uncertainty like sensor noise or uncertainty regarding the human behavior or mental state, (3) planning approaches are generic, and various questions of paramount importance for collaborative humanoid robotics have been represented and can be merged in that same framework, like intention estimation [18] and role attribution and/or inference of user profile [19], and (4) planning allows computing strategies that consider the long-term consequences of the robot's behavior, which could have a huge impact on the decision process (e.g., when optimizing the user's fatigue). ...
... This is not an easy task since, usually, the utility we would like to optimize is not reduced to a single dimension and must consider all possible criteria. It might involve not only the efficiency of task achievement but also human ergonomics, physiological comfort, and cognitive load, as well as unknown objectives [18]. This is a problem of paramount importance due to reward hacking [26]: the produced policy will optimize the given reward but might have unexpected side effects that could be counter-productive or dangerous. ...
Article
Full-text available
Purpose of Review Humanoid robots are versatile platforms with the potential to assist humans in several domains, from education to healthcare, from entertainment to the factory of the future. To find their place into our daily life, where complex interactions and collaborations with humans are expected, their social and physical interaction skills need to be further improved. Recent Findings The hallmark of humanoids is their anthropomorphic shape, which facilitates the interaction but at the same time increases the expectations of the human in terms of advanced cooperation capabilities. Cooperation with humans requires an appropriate modeling and real-time estimation of the human state and intention. This information is required both at a high level by the cooperative decision-making policy and at a low level by the interaction controller that implements the physical interaction. Real-time constraints induce simplified models that limit the decision capabilities of the robot during cooperation. Summary In this article, we review the current achievements in the context of human-humanoid interaction and cooperation. We report on the cognitive and cooperation skills that the robot needs to help humans achieve their goals, and how these high-level skills translate into the robot’s low-level control commands. Finally, we report on the applications of humanoid robots as humans’ companions, co-workers, or avatars.
... Often, they assume that this pre-determined set of goals is given, but this is usually not the case. Several approaches remove the use of plan libraries; instead, they use planning and POMDPs like Ramírez and Geffner [19,20], deep learning like Min et al. [9], or even inverse planning like Baker et al. [21][22][23]. This allows them to be more general and adaptive by not relying on a plan library that would need to be changed for each problem, but most of them still assume that the set of possible goals of the agent is known and need it to work. ...
... That is why we compare our method to other concept-learning algorithms and not to traditional plan recognition algorithms since, as mentioned before, we are not tackling the same problem. Where traditional plan recognition approaches based on probabilistic models or deep learning like [9,20] need a set of possible goals and will then, according to the observations that they get from the agent, compute which goal and plan have the highest probability of being followed by the agent, we do not compute such a thing. Instead, we try to generate the set of possible goals used by the other approaches. ...
Article
Full-text available
Goal recognition is a sub-field of plan recognition that focuses on the goals of an agent. Current approaches in goal recognition have not yet tried to apply concept learning to a propositional logic formalism. In this paper, we extend our method for inferring an agent’s possible goal by observing this agent in a series of successful attempts to reach its goal and using concept learning on these observations. We propose an algorithm, LFST (Learning From Successful Traces), to produce concise hypotheses about the agent’s goal. We show that if such a goal exists, our algorithm always provides a possible goal for the agent, and we evaluate the performance of our algorithm in different settings. We compare it to another concept-learning algorithm that uses a formalism close to ours, and we obtain better results at producing the hypotheses with our algorithm. We introduce a way to use assumptions about the agent’s behavior and the dynamics of the environment, thus improving the agent’s goal deduction by optimizing the potential goals’ search space.
... For both case studies, we employ the Goal Markov Decision Process (Goal MDP) framework to capture an observer's view of the world. A Goal MDP (Ramírez & Geffner, 2011) represents the possible actions that can be taken and the causal relationships of their effects on the world's states. Formally, it is defined as a tuple Π = (S, S_G, A, P, C), where S is a non-empty state space, S_G is a non-empty set of goal states, A is a set of actions, P_a(s' | s) is the probability of transitioning from state s to state s' given action a, and C(s, a, s') is the cost of that transition. ...
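The Goal MDP tuple in the excerpt transcribes directly into code; the field names below follow the definition Π = (S, S_G, A, P, C), while the container and typing choices are ours:

    from dataclasses import dataclass
    from typing import Callable, FrozenSet, Hashable

    State = Hashable
    Action = Hashable

    @dataclass(frozen=True)
    class GoalMDP:
        states: FrozenSet[State]                             # S
        goal_states: FrozenSet[State]                        # S_G, a subset of S
        actions: FrozenSet[Action]                           # A
        transition: Callable[[Action, State, State], float]  # P_a(s' | s)
        cost: Callable[[State, Action, State], float]        # C(s, a, s')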
Preprint
Goal recognition (GR) involves inferring an agent's unobserved goal from a sequence of observations. This is a critical problem in AI with diverse applications. Traditionally, GR has been addressed using 'inference to the best explanation' or abduction, where hypotheses about the agent's goals are generated as the most plausible explanations for observed behavior. Alternatively, some approaches enhance interpretability by ensuring that an agent's behavior aligns with an observer's expectations or by making the reasoning behind decisions more transparent. In this work, we tackle a different challenge: explaining the GR process in a way that is comprehensible to humans. We introduce and evaluate an explainable model for goal recognition (GR) agents, grounded in the theoretical framework and cognitive processes underlying human behavior explanation. Drawing on insights from two human-agent studies, we propose a conceptual framework for human-centered explanations of GR. Using this framework, we develop the eXplainable Goal Recognition (XGR) model, which generates explanations for both why and why not questions. We evaluate the model computationally across eight GR benchmarks and through three user studies. The first study assesses the efficiency of generating human-like explanations within the Sokoban game domain, the second examines perceived explainability in the same domain, and the third evaluates the model's effectiveness in aiding decision-making in illegal fishing detection. Results demonstrate that the XGR model significantly enhances user understanding, trust, and decision-making compared to baseline models, underscoring its potential to improve human-agent collaboration.
... Altogether, these results outlined that taking a "like them" approach for beliefs is beneficial towards improved performance for predicting others' intentions. In line with previous computational studies in the field (Rabinowitz et al., 2018; Ramírez & Geffner, 2011), we adopted a simplified environment model which can have an impact on models' performance (Ognibene et al., 2019a). Learning to explicitly represent beliefs does indeed increase, although only linearly, the computational demands and complexity of the task. ...
Preprint
Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.
... The difference compared to an MDP is that, in a POMDP, the agent does not directly observe the current state [31]. Instead, it receives an observation which is probabilistically dependent on the true state, thus adding another layer of uncertainty [32]. ...
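The extra layer of uncertainty the excerpt mentions is handled by maintaining a belief, a distribution over states, updated after each action a and observation o. A standard discrete belief update, with T and O as the assumed transition and observation models:

    def belief_update(b, a, o, T, O, states):
        # b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s).
        b_new = {s2: O(o, s2, a) * sum(T(s2, s, a) * b[s] for s in states)
                 for s2 in states}
        z = sum(b_new.values())
        return {s: p / z for s, p in b_new.items()}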
Preprint
Full-text available
In the rapidly evolving realm of cybersecurity, the need for dynamic and adaptive defense mechanisms is paramount. This article introduces an innovative approach to intrusion detection and prevention systems (IDPS) through the application of self-play reinforcement learning. We extend the existing framework by integrating a model-free, off-policy algorithm, Twin Delayed Deep Deterministic Policy Gradients (TD3), to enhance the automated response capabilities of IDPS. This advancement results in a more effective and adaptable system capable of responding to the dynamically changing landscape of cyber threats. Furthermore, we introduce an innovative policy strategy within the TD3 framework, coupled with substantial auto-regression enhancements. These enhancements significantly improve the robustness and adaptability of cybersecurity response infrastructures, equipping them to better handle evolving cyber threats. Our methodology involves modelling intrusion prevention as a zero-sum game using Markov games, which captures the dynamic interaction between a defender and an attacker. The paper showcases the effectiveness of these approaches through enhanced self-play, policy refinement, and simulation scenarios, indicating significant improvements in threat detection and response over traditional security mechanisms. The findings underscore the potential of enhanced autoregressive policy representation to reshape the landscape of intrusion prevention strategies, making it a promising candidate for broader applications in cybersecurity. Our work also incorporates advanced machine learning techniques to optimize the decision-making process, ensuring that the system remains agile and responsive in the face of new and sophisticated cyber threats. Taking advantage of the power of TD3, we demonstrate a significant increase in IDPS efficiency and effectiveness, setting a new benchmark in cybersecurity solutions.
... Goal Recognition is the task of recognizing the intentions of autonomous agents or humans by observing their interactions in an environment. Existing work on goal and plan recognition addresses this task over several different types of domain settings, such as plan-libraries [4], plan tree grammars [19], classical planning domain models [31,34,35,37], stochastic environments [36], continuous domain models [22], incomplete discrete domain models [29], and approximate control models [30]. Despite the ample literature and recent advances, most existing approaches to Goal Recognition as Planning cannot recognize temporally extended goals, i.e., goals formalized in terms of time, e.g., the exact order that a set of facts of a goal must be achieved in a plan. ...
Article
Full-text available
Goal Recognition is the task of discerning the intended goal that an agent aims to achieve, given a set of goal hypotheses, a domain model, and a sequence of observations (i.e., a sample of the plan executed in the environment). Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state and that the environment dynamics are deterministic, preventing the recognition of temporally extended goals in more complex settings. In this paper, we expand goal recognition to temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models, focusing on goals on finite traces expressed in Linear Temporal Logic (LTLf) and Pure-Past Linear Temporal Logic (PPLTL). We develop the first approach capable of recognizing goals in such settings and evaluate it using different LTLf and PPLTL goals over six FOND planning domain models. Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.
... First, we want to recast our approach to the case where the agent adopts a stochastic policy to choose its next action, yielding a Markov chain from its goal library. Second, we intend to consider actions with nondeterminism, leading to MDP and/or POMDP planning domains [23]. Finally, we plan to improve the execution times needed to compute the recognized plans by implementing approaches that do not return an optimal ranking of alignments but an approximate one, like the one presented in [6]. ...
... Inaccurate initial states have been handled by task planners [6,43]. Moreover, GR with probabilistic, partially observable state knowledge and stochastic action outcomes has previously been investigated [29,50,68]; however, these systems require the probability of each state and action outcome to be known (and thus defined within the GR problem). GR with incomplete domain models, i.e. problems containing actions with incomplete preconditions and effects, has also been considered [45], but the initial state was assumed to be accurately represented. ...
Article
Full-text available
Goal recognisers attempt to infer an agent’s intentions from a sequence of observed actions. This is an important component of intelligent systems that aim to assist or thwart actors; however, there are many challenges to overcome. For example, the initial state of the environment could be partially unknown, and agents can act suboptimally and observations could be missing. Approaches that adapt classical planning techniques to goal recognition have previously been proposed, but, generally, they assume the initial world state is accurately defined. In this paper, a state is inaccurate if any fluent’s value is unknown or incorrect. Our aim is to develop a goal recognition approach that is as accurate as the current state-of-the-art algorithms and whose accuracy does not deteriorate when the initial state is inaccurately defined. To cope with this complication, we propose solving goal recognition problems by means of an Action Graph. An Action Graph models the dependencies, i.e. order constraints, between all actions rather than just actions within a plan. Leaf nodes correspond to actions and are connected to their dependencies via operator nodes. After generating an Action Graph, the graph’s nodes are labelled with their distance from each hypothesis goal. This distance is based on the number and type of nodes traversed to reach the node in question from an action node that results in the goal state being reached. For each observation, the goal probabilities are then updated based on either the distance the observed action’s node is from each goal or the change in distance. Our experimental results, for 15 different domains, demonstrate that our approach is robust to inaccuracies within the defined initial state.
... Since the idea of Plan Recognition as Planning was introduced by [13], many approaches have adopted this paradigm [14], [15], [20], [18], [17], [9], [11], [3]. It was recognized relatively soon that the initial PRAP approaches are computationally demanding, as they require computing entire plans. ...
Preprint
Full-text available
Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios, it is important that goal recognition algorithms can recognize the goals of an observed agent as fast as possible. However, many early approaches in the area of Plan Recognition As Planning require quite large amounts of computation time to calculate a solution. Mainly to address this issue, Pereira et al. recently developed an approach that is based on planning landmarks and is much more computationally efficient than previous approaches. However, the approach, as proposed by Pereira et al., also uses trivial landmarks (i.e., facts that are part of the initial state and goal description are landmarks by definition). In this paper, we show that it does not provide any benefit to use landmarks that are part of the initial state in a planning-landmark-based goal recognition approach. The empirical results show that omitting initial-state landmarks for goal recognition improves goal recognition performance.
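The paper's central observation reduces to a small filtering step: facts already true in the initial state are landmarks by definition and carry no evidence about the goal, so a landmark-based recognizer can drop them. A sketch, with extract_landmarks standing in for any landmark generator:

    def informative_landmarks(goal, initial_state, extract_landmarks):
        # Keep only landmarks that are not trivially satisfied at the start.
        return [l for l in extract_landmarks(goal) if l not in initial_state]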
... Plan recognition (Sukthankar et al. 2014) is often formulated as a generalized task of goal recognition because it focuses on inferring plans and goals of observed agents. Plan recognition approaches often require a plan library consisting of all potential agent behaviors to be explicitly provided (Bisson, Larochelle, and Kabanza 2015) or they adopt a planning technique assuming close-to-rational agents (Baker et al. 2009;Ramírez and Geffner 2011). We adopt a corpus-based, statistical goal recognition approach that is well suited to the characteristics of open-world games, in which players take exploratory actions in expansive, virtual gameworlds (Blaylock and Allen 2003;Min et al. 2016a). ...
Article
Recent years have seen a growing interest in player modeling to create player-adaptive digital games. As a core player-modeling task, goal recognition aims to recognize players’ latent, high-level intentions in a non-invasive fashion to deliver goal-driven, tailored game experiences. This paper reports on an investigation of multimodal data streams that provide rich evidence about players’ goals. Two data streams, game event traces and player gaze traces, are utilized to devise goal recognition models from a corpus collected from an open-world serious game for science education. Empirical evaluations of 140 players’ trace data suggest that multimodal LSTM-based goal recognition models outperform competitive baselines, including unimodal LSTMs as well as multimodal and unimodal CRFs, with respect to predictive accuracy and early prediction. The results demonstrate that player gaze traces have the potential to significantly enhance goal recognition models’ performance.
... Because plan recognition focuses on inferring plans and goals of observed agents, it is formulated as a generalized task of goal recognition (Baker et al. 2009; Sukthankar et al. 2014). While much previous plan recognition work has utilized a hand-crafted plan library and a decision model (Geib and Goldman 2009; Bisson et al. 2015), a salient line of investigation has addressed plan recognition by learning a plan library in a data-driven approach (Fagan and Cunningham 2003; Synnaeve and Bessière 2011) or dispensing with the need for a plan library (Baker et al. 2009; Ramírez and Geffner 2011) by interpreting plan recognition as an inversion of action planning given a goal. However, in open-world digital games, particularly those in which players have little or no prior experience, players' exploration-based actions are marked by highly idiosyncratic sequences of player actions and are often sub-optimal for achieving goals (Min et al. 2016); thus, devising a reliable plan library or using a planning approach that assumes a rational agent is infeasible. ...
Article
Recent years have seen a growing interest in player modeling, which supports the creation of player-adaptive digital games. A central problem of player modeling is goal recognition, which aims to recognize players’ intentions from observable gameplay behaviors. Player goal recognition offers the promise of enabling games to dynamically adjust challenge levels, perform procedural content generation, and create believable NPC interactions. A growing body of work is investigating a wide range of machine learning-based goal recognition models. In this paper, we introduce GOALIE, a multidimensional framework for evaluating player goal recognition models. The framework integrates multiple metrics for player goal recognition models, including two novel metrics, n-early convergence rate and standardized convergence point. We demonstrate the application of the GOALIE framework with the evaluation of several player goal recognition models, including Markov logic network-based, deep feedforward neural network-based, and long short-term memory network-based goal recognizers on two different educational games. The results suggest that GOALIE effectively captures goal recognition behaviors that are key to next-generation player modeling.
... While our work was originally motivated by the goal recognition task, we have developed a general learning approach for sequence classification. Previous work in goal and plan recognition has typically relied on rich domain knowledge (e.g., Kautz and Allen 1986;Ramírez and Geffner 2011), thus limiting the applicability of this body of work. To leverage the existence of large datasets and machine learning techniques, some approaches to goal recognition eschew assumptions about domain knowledge and instead propose to learn models from data and use these models to predict an agent's goal given a sequence of observations (e.g., Geib and Kantharaju 2018;Amado et al. 2018;Polyvyanyy et al. 2020). ...
Article
Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable---supporting explanation, counterfactual reasoning, and human-in-the-loop modification---and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show our learned automata-based classifiers to have comparable test performance to LSTM-based classifiers, with the added advantage of being interpretable.
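A minimal sketch of the automata-based classifier idea in this abstract: a deterministic finite automaton consumes the observation trace symbol by symbol, so a class label is available after every prefix, which is what enables early classification. The transition table here is illustrative, not learned:

    class DFAClassifier:
        def __init__(self, delta, start, label):
            self.delta = delta    # dict: (state, symbol) -> next state
            self.start = start    # initial automaton state
            self.label = label    # dict: state -> class label

        def classify_prefixes(self, trace):
            # Yield a prediction after each observation in the trace.
            state = self.start
            for symbol in trace:
                state = self.delta[(state, symbol)]
                yield self.label[state]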
... Deception is also related to the problem of goal recognition, in which an observer aims to infer an agent's goal based on its past behavior (Ramírez and Geffner 2010; Ramírez and Geffner 2011; Shvo and McIlraith 2020). We consider an observer that aims to infer the agent's goal by using a prediction model based on the principle of maximum entropy. ...
Article
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks in stochastic, complex environments. By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals while deceiving outside observers about its true goal. We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies via linear programming. The proposed approach enables the agent to exhibit a variety of tunable deceptive behaviors while ensuring the satisfaction of probabilistic constraints on the behavior. We evaluate the performance of the proposed approach via comparative user studies and present a case study on the streets of Manhattan, New York, using real travel time distributions.
... Continuous domains were traditionally addressed by a separate discretization component, translating angles, positions, motions—sometimes entire trajectories—into discrete symbols. This facilitates the use of a variety of powerful recognition algorithms that utilize plan libraries: hierarchical graphs (Kautz and Allen 1986; Avrahami-Zilberbrand and Kaminka 2005), deterministic and probabilistic grammars (Pynadath and Wellman 2000; Geib and Goldman 2011; Sadeghipour and Kopp 2011; Mirsky and Gal 2016), and other probabilistic models (Charniak and Goldman 1993; Bui 2003; Pynadath and Marsella 2005; Ramírez and Geffner 2011). ...
Article
Plan recognition is the task of inferring the plan of an agent, based on an incomplete sequence of its observed actions. Previous formulations of plan recognition commit early to discretizations of the environment and the observed agent's actions. This leads to reduced recognition accuracy. To address this, we first provide a formalization of recognition problems which admits continuous environments, as well as discrete domains. We then show that through mirroring---generalizing plan-recognition by planning---we can apply continuous-world motion planners in plan recognition. We provide formal arguments for the usefulness of mirroring, and empirically evaluate mirroring in more than a thousand recognition problems in three continuous domains and six classical planning domains.
... Partial observability in goal recognition has been modeled in various ways (Ramirez and Geffner 2011;Geib and Goldman 2005;Avrahami-Zilberbrand, Kaminka, and Zarosim 2005). In particular, observability can be modeled using a sensor model that includes an observation token for each action (Geffner and Bonet 2013). ...
Article
Goal recognition design involves the offline analysis of goal recognition models by formulating measures that assess the ability to perform goal recognition within a model and finding efficient ways to compute and optimize them. In this work we relax the full observability assumption of earlier work by offering a new generalized model for goal recognition design with non-observable actions. A model with partial observability is relevant to goal recognition applications such as assisted cognition and security, which suffer from reduced observability due to sensor malfunction or lack of sufficient budget. In particular we define a worst case distinctiveness (wcd) measure that represents the maximal number of steps an agent can take in a system before the observed portion of his trajectory reveals his objective. We present a method for calculating wcd based on a novel compilation to classical planning and propose a method to improve the design using sensor placement. Our empirical evaluation shows that the proposed solutions effectively compute and improve wcd.
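In the simplest fully observable case, wcd has a compact reading: with one optimal plan per goal, it is the longest prefix that plans toward two different goals can share. The sketch below uses that simplification; the paper itself computes wcd via a compilation to classical planning and extends the measure to non-observable actions:

    from itertools import combinations

    def wcd(plans_by_goal):
        # plans_by_goal: dict mapping each goal to one optimal plan (action list).
        def shared_prefix_len(p, q):
            n = 0
            for a, b in zip(p, q):
                if a != b:
                    break
                n += 1
            return n
        return max(shared_prefix_len(p, q)
                   for p, q in combinations(plans_by_goal.values(), 2))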
... We also consider these two types of reliability in our experiments. Ramírez and Geffner (2011) look at how a partially observable Markov decision process (POMDP) performing goal recognition can handle missing or noisy observations, in part because of its probabilistic representation of agent behavior. However, POMDPs can be fairly expensive to solve computationally, precluding our use of them here. ...
Article
Full-text available
Goal or intent recognition, where one agent recognizes the goals or intentions of another, can be a powerful tool for effective teamwork and improving interaction between agents. Such reasoning can be challenging to perform, however, because observations of an agent can be unreliable and, often, an agent does not have access to the reasoning processes and mental models of the other agent. Despite this difficulty, recent work has made great strides in addressing these challenges. In particular, two Artificial Intelligence (AI)-based approaches to goal recognition have recently been shown to perform well: goal recognition as planning, which reduces a goal recognition problem to the problem of plan generation; and Combinatory Categorial Grammars (CCGs), which treat goal recognition as a parsing problem. Additionally, new advances in cognitive science with respect to Theory of Mind reasoning have yielded an approach to goal recognition that leverages analogy in its decision making. However, there is still much unknown about the potential and limitations of these approaches, especially with respect to one another. Here, we present an extension of the analogical approach to a novel algorithm, Refinement via Analogy for Goal Reasoning (RAGeR). We compare RAGeR to two state-of-the-art approaches which use planning and CCGs for goal recognition, respectively, along two different axes: reliability of observations and inspectability of the other agent's mental model. Overall, we show that no approach dominates across all cases and discuss the relative strengths and weaknesses of these approaches. Scientists interested in goal recognition problems can use this knowledge as a guide to select the correct starting point for their specific domains and tasks.
... However, this is not possible in many domains. To overcome the limitations of plan libraries, plan recognition can be reformulated as inverse planning (Baker et al., 2007), which allows the application of planners to discover potential agent plans (Ramírez & Geffner, 2009), distributions over plans (Ramírez & Geffner, 2010; Zhuo et al., 2012), and belief states within a POMDP agent (Ramírez & Geffner, 2011). These methods extend the applicability of plan recognition, but at the cost of significant compute resources. ...
Article
Full-text available
Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.
... Conventional models of goal recognition do not allow for this. They may accommodate partially observable environments such that particular actions, events or aspects of the environment are not available to the observer (e.g., Baker et al., 2011; Ramírez and Geffner, 2011), and they routinely allow for partial sequences of observations in that observations may be missing (i.e., not every step-on-the-path/action-in-the-plan is seen to be taken). But when an observation is made, everything observable at that moment and from that vantage point is assumed now known to the observer. ...
Article
Full-text available
The “science of magic” has lately emerged as a new field of study, providing valuable insights into the nature of human perception and cognition. While most of us think of magic as being all about deception and perceptual “tricks”, the craft—as documented by psychologists and professional magicians—provides a rare practical demonstration and understanding of goal recognition. For the purposes of human-aware planning, goal recognition involves predicting what a human observer is most likely to understand from a sequence of actions. Magicians perform sequences of actions with keen awareness of what an audience will understand from them and—in order to subvert it—the ability to predict precisely what an observer’s expectation is most likely to be. Magicians can do this without needing to know any personal details about their audience and without making any significant modification to their routine from one performance to the next. That is, the actions they perform are reliably interpreted by any human observer in such a way that particular (albeit erroneous) goals are predicted every time. This is achievable because people’s perception, cognition and sense-making are predictably fallible. Moreover, in the context of magic, the principles underlying human fallibility are not only well-articulated but empirically proven. In recent work we demonstrated how aspects of human cognition could be incorporated into a standard model of goal recognition, showing that—even though phenomena may be “fully observable” in that nothing prevents them from being observed—not all are noticed, not all are encoded or remembered, and few are remembered indefinitely. In the current article, we revisit those findings from a different angle. We first explore established principles from the science of magic, then recontextualise and build on our model of extended goal recognition in the context of those principles. While our extensions relate primarily to observations, this work extends and explains the definitions, showing how incidental (and apparently incidental) behaviours may significantly influence human memory and belief. We conclude by discussing additional ways in which magic can inform models of goal recognition and the light that this sheds on the persistence of conspiracy theories in the face of compelling contradictory evidence.
... Furthermore, a distribution over applicable actions is defined such that the probability of an action is proportional to its goal-directedness, i.e. actions that reduce the goal distance are assigned a higher probability. Other CSSM approaches include the approach by Ramírez and Geffner [178], which also uses PDDL as a computational description of the predict function, and defines inference in terms of a POMDP, and the approach by Sadilek and Kautz [183], which uses a Markov logic network (MLN) as a symbolic description, but unrolls the MLN into a graphical model instead of performing BF. ...
Thesis
Full-text available
Bayesian filtering (BF) is a general probabilistic framework for estimating the state of a dynamic system that can be observed only indirectly through noisy measurements. This thesis focuses on systems that consist of multiple, interacting entities (e.g. agents or objects), for which the system dynamics can be specified naturally by multiset rewriting systems (MRSs). Unfortunately, BF in MRSs is computationally challenging due to the combinatorial explosion in the state space size. Therefore, we investigate efficient BF algorithms for such multi-entity systems. The main insight is that the state space underlying an MRS exhibits a certain symmetry, which can be exploited to increase inference efficiency. This thesis provides five main contributions. First, we show how distributions over multisets can be decomposed into two factors: a distribution over the structures and multiplicities of entities, and a distribution over the values of the entities' properties. This representation allows entities with identical structure to be grouped together, thus achieving a substantial reduction in representation complexity. As this representation bears some similarity to other concepts from lifted probabilistic inference, we call it a lifted representation. Secondly, we introduce a BF algorithm that works directly on this lifted representation, which is able to achieve a factorial reduction in space and time complexity compared to conventional, ground filtering. When observations or system dynamics break symmetry, the algorithm automatically adapts by splitting. When a maximally parallel action execution semantics is used – when all entities can act in parallel – exact BF can become intractable due to the large number of parallel actions. To alleviate this problem, our third contribution is a Markov chain Monte Carlo algorithm that samples parallel actions instead of performing full enumeration. Fourth, we address the problem that, due to symmetry breaks, the algorithm must perform splitting, so that the model can become completely propositional over time and inference becomes intractable. This is done by introducing inverse merging operations for a number of practically relevant special cases. Finally, we empirically evaluate the lifted BF algorithm on real-world human activity recognition domains, and show that the algorithm can be more efficient than propositional BF. To the best of our knowledge, this is the first attempt to provide BF for systems with MRS dynamics and the first that allows prediction and update to be performed directly on the lifted representation.
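The BF recursion the thesis builds on, in its simplest ground, discrete form (the thesis's contribution is to run this directly on lifted multiset states): predict with the dynamics, then reweight by the observation likelihood:

    def bayes_filter_step(belief, transition, obs_likelihood, observation, states):
        # Predict: push the belief through the system dynamics.
        predicted = {s2: sum(transition(s2, s) * belief[s] for s in states)
                     for s2 in states}
        # Update: reweight by how well each state explains the measurement.
        updated = {s: obs_likelihood(observation, s) * p
                   for s, p in predicted.items()}
        z = sum(updated.values())
        return {s: p / z for s, p in updated.items()}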
... This work inspired a number of later works which generalised the settings in which inverse-planning goal recognition could be performed [6], [24], [27], [32]. The most relevant to our work is that of Ramírez and Geffner [25]; however, this earlier work assumes the observer (i.e., ego-vehicle) can compute the belief-state of the observed agent (i.e., non-ego vehicles). In our setting, this assumption would amount to knowing whether occluded factors are present. ...
Preprint
Full-text available
Recognising the goals or intentions of observed vehicles is a key step towards predicting the long-term future behaviour of other agents in an autonomous driving scenario. When there are unseen obstacles or occluded vehicles in a scenario, goal recognition may be confounded by the effects of these unseen entities on the behaviour of observed vehicles. Existing prediction algorithms that assume rational behaviour with respect to inferred goals may fail to make accurate long-horizon predictions because they ignore the possibility that the behaviour is influenced by such unseen entities. We introduce the Goal and Occluded Factor Inference (GOFI) algorithm which bases inference on inverse-planning to jointly infer a probabilistic belief over goals and potential occluded factors. We then show how these beliefs can be integrated into Monte Carlo Tree Search (MCTS). We demonstrate that jointly inferring goals and occluded factors leads to more accurate beliefs with respect to the true world state and allows an agent to safely navigate several scenarios where other baselines take unsafe actions leading to collisions.
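As a rough illustration of the joint inference described here (not the GOFI implementation), the posterior over (goal, occluded factor) pairs can be computed by Bayes' rule with an inverse-planning likelihood. The goals, factors, and costs below are invented.

```python
import numpy as np

# Hedged sketch of joint inference over goals and occluded factors: the
# posterior over a (goal, factor) pair is proportional to the prior times
# the likelihood of the observed trajectory under a planner assuming that
# pair. The likelihood function below is a stand-in with invented numbers.

goals = ["exit_left", "exit_right"]
factors = ["no_obstacle", "hidden_obstacle"]  # hypothetical occluded factors

def trajectory_likelihood(traj, goal, factor):
    # Stand-in for an inverse-planning likelihood, e.g. exp(-beta * extra
    # cost of traj relative to an optimal plan for (goal, factor)).
    extra_cost = {"exit_left": 0.5, "exit_right": 2.0}[goal]
    if factor == "hidden_obstacle":
        extra_cost *= 0.4  # the observed detour is better explained by an obstacle
    return np.exp(-extra_cost)

prior = {(g, f): 0.25 for g in goals for f in factors}
post = {gf: prior[gf] * trajectory_likelihood("traj", *gf) for gf in prior}
z = sum(post.values())
post = {gf: p / z for gf, p in post.items()}
print(max(post, key=post.get), post)
```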
... Existing work on goal and plan recognition has addressed this task over several different types of domain settings, such as plan libraries (Mirsky et al. 2016), plan tree grammars (Geib and Goldman 2009), classical planning domain models (Ramírez and Geffner 2009, 2010; E.-Martín, R.-Moreno, and Smith 2015; Sohrabi, Riabov, and Udrea 2016; Pereira, Oren, and Meneguzzi 2017), stochastic environments (Ramírez and Geffner 2011), and continuous domain models (Kaminka, Vered, and Agmon 2018). In spite of the large literature, most existing approaches to Goal Recognition as Planning are not capable of recognizing temporally extended goals, i.e. goals formalized in terms of time, e.g., the exact order in which a set of facts must be achieved in a plan. ...
Preprint
Goal Recognition is the task of discerning the correct intended goal that an agent aims to achieve, given a set of possible goals, a domain model, and a sequence of observations as a sample of the plan being executed in the environment. Existing approaches assume that the possible goals are formalized as a conjunction in deterministic settings. In this paper, we develop a novel approach that is capable of recognizing temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models, focusing on goals on finite traces expressed in Linear Temporal Logic (LTLf) and (Pure) Past Linear Temporal Logic (PLTLf). We empirically evaluate our goal recognition approach using different LTLf and PLTLf goals over six common FOND planning domain models, and show that our approach accurately recognizes temporally extended goals at several levels of observability.
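To make the notion of a temporally extended goal concrete, the sketch below evaluates a tiny LTLf fragment over a finite trace. It is a minimal illustration with an invented trace, not the paper's recognition algorithm, and covers only atoms, next (X), eventually (F), and until (U).

```python
# A minimal evaluator for a tiny LTLf fragment over finite traces.
# A trace is a list of sets of facts true at each step.

def holds(phi, trace, i=0):
    if i >= len(trace):
        return False
    op = phi[0]
    if op == "atom":
        return phi[1] in trace[i]
    if op == "X":                       # next
        return holds(phi[1], trace, i + 1)
    if op == "F":                       # eventually
        return any(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "U":                       # until
        return any(holds(phi[2], trace, j) and
                   all(holds(phi[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(op)

# "at_a must hold until at_b is achieved": at_a U at_b
goal = ("U", ("atom", "at_a"), ("atom", "at_b"))
trace = [{"at_a"}, {"at_a"}, {"at_b"}]
print(holds(goal, trace))  # True: the ordering constraint is satisfied
```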
... Authors of recent studies 8,[40][41][42][43] have also attempted to build a mind network directly for the observer robot, though all with symbolic reasoning processes. We have previously 40 conducted both simulations and real-world experiments involving a real robot that reverse-engineers the actor's policy with neural networks based on collected trajectories. ...
Article
Full-text available
Behavior modeling is an essential cognitive ability that underlies many aspects of human and animal social behavior (Watson in Psychol Rev 20:158, 1913), and an ability with which we would like to endow robots. Most studies of machine behavior modeling, however, rely on symbolic or selected parametric sensory inputs and built-in knowledge relevant to a given task. Here, we propose that an observer can model the behavior of an actor through visual processing alone, without any prior symbolic information and assumptions about relevant inputs. To test this hypothesis, we designed a non-verbal non-symbolic robotic experiment in which an observer must visualize future plans of an actor robot, based only on an image depicting the initial scene of the actor robot. We found that an AI observer is able to visualize the future plans of the actor with 98.5% success across four different activities, even when the activity is not known a priori. We hypothesize that such visual behavior modeling is an essential cognitive ability that will allow machines to understand and coordinate with surrounding agents, while sidestepping the notorious symbol grounding problem. Through a false-belief test, we suggest that this approach may be a precursor to Theory of Mind, one of the distinguishing hallmarks of primate social cognition.
... While our work was originally motivated by the goal recognition task, we have developed a general learning approach for sequence classification. Previous work in goal and plan recognition has typically relied on rich domain knowledge (e.g., (Kautz and Allen 1986;Geib and Goldman 2001;Ramírez and Geffner 2011;Pereira, Oren, and Meneguzzi 2017)), thus limiting the applicability of this body of work. To leverage the existence of large datasets and machine learning techniques, some approaches to goal recognition eschew assumptions about domain knowledge and instead propose to learn models from data and use the learned models to predict an agent's goal given a sequence of observations (e.g., (Geib and Kantharaju 2018;Amado et al. 2018;Polyvyanyy et al. 2020)). ...
Preprint
Sequence classification is the task of predicting a class label given a sequence of observations. In many applications such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn sequence classifiers that favour early classification from an evolving observation trace. While many state-of-the-art sequence classifiers are neural networks, and in particular LSTMs, our classifiers take the form of finite state automata and are learned via discrete optimization. Our automata-based classifiers are interpretable (supporting explanation, counterfactual reasoning, and human-in-the-loop modification) and have strong empirical performance. Experiments over a suite of goal recognition and behaviour classification datasets show our learned automata-based classifiers to have comparable test performance to LSTM-based classifiers, with the added advantage of being interpretable.
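As a concrete picture of how an automaton supports early classification, consider a DFA whose states carry class labels, so a prediction is available after every observed prefix. This is a toy sketch with an invented domain, not the paper's learned classifiers.

```python
# A hedged sketch of an automaton-based sequence classifier: each DFA state
# carries a predicted class, so the classifier emits a label after every
# prefix of the observation sequence (early classification).

class DFAClassifier:
    def __init__(self, transitions, labels, start=0):
        self.transitions = transitions  # (state, symbol) -> next state
        self.labels = labels            # state -> predicted class
        self.start = start

    def classify_prefixes(self, sequence):
        state, preds = self.start, []
        for symbol in sequence:
            # Unknown symbols self-loop, a simplifying assumption here.
            state = self.transitions.get((state, symbol), state)
            preds.append(self.labels[state])
        return preds

# Toy task: decide whether an agent is heading to the 'kitchen' or 'office'.
dfa = DFAClassifier(
    transitions={(0, "grab_mug"): 1, (0, "grab_file"): 2},
    labels={0: "unknown", 1: "kitchen", 2: "office"},
)
print(dfa.classify_prefixes(["walk", "grab_mug", "walk"]))
# ['unknown', 'kitchen', 'kitchen']  -> decided early, after two observations
```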
... We consider the formulation of [8] for a Markov Decision Process (MDP), represented as a tuple P = ⟨S, A, T, R⟩ where:
· S is a set of states (described by state variables);
· A is a set of the agent's actions; in analogy with classical planning, one can extend this MDP formulation by adding effects eff(a) and preconditions pre(a) to actions;
· T is the transition function, given by the probabilities P_a(s'|s) for a ∈ A and s, s' ∈ S: if the agent is in state s and performs action a, it reaches state s' with probability P_a(s'|s);
· R is the reward function for executing an action a in state s.
We define an MDP policy π as a function π : S → A that maps states to actions; the policy that maximizes the long-term expected reward is an optimal policy π*. Rewards can be discounted by means of a discount factor γ ∈ [0, 1]. ...
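Given this tuple, a standard way to compute π* is value iteration. The following minimal Python sketch (a toy illustration with an invented two-state domain, not the cited paper's solver) iterates Bellman backups until the values converge, then extracts a greedy policy.

```python
# Minimal value iteration for the MDP tuple <S, A, T, R> sketched above.
# T[s][a] is a list of (next_state, probability) pairs; R[s][a] is a reward.

def value_iteration(S, A, T, R, gamma=0.95, eps=1e-6):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                 for a in A]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # Extract the optimal policy pi*: S -> A from the converged values.
    pi = {s: max(A, key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for s2, p in T[s][a])) for s in S}
    return V, pi

S, A = ["s0", "s1"], ["wait", "work"]
T = {"s0": {"wait": [("s0", 1.0)], "work": [("s1", 0.9), ("s0", 0.1)]},
     "s1": {"wait": [("s1", 1.0)], "work": [("s1", 1.0)]}}
R = {"s0": {"wait": 0.0, "work": -1.0}, "s1": {"wait": 1.0, "work": 0.0}}
print(value_iteration(S, A, T, R)[1])  # {'s0': 'work', 's1': 'wait'}
```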
Conference Paper
In semi-automated industrial applications, interaction between humans and robots is essential. Such interactions require some level of mutual awareness and coordination. Specifically, while interacting with humans, robots need to be aware of the humans' state and possible future actions in order to collaborate with them and help achieve their goal more efficiently. The focus of this work is the problem of planning for Human-Robot collaboration. First, the robot's actions and their dependency on the human's activity are modeled as a Markov Decision Process (MDP). Second, instances of the model are solved using an off-the-shelf planner. Through analysis of the solutions to the model, the results highlight the influence of experimental parameters, such as the size of the task and the horizon, on the efficiency of the solver. Finally, the deployment of the MDP on a use-case assembly process scenario inspired by an aerospace manufacturing industry is discussed.
Chapter
Goal recognition is an important problem in many application domains (e.g., pervasive computing, intrusion detection, computer games, etc.). In many application scenarios, it is important that goal recognition algorithms recognize the goals of an observed agent as fast as possible. However, many early approaches in the area of Plan Recognition As Planning require quite large amounts of computation time to calculate a solution. Mainly to address this issue, Pereira et al. recently developed an approach that is based on planning landmarks and is much more computationally efficient than previous approaches. However, the approach as proposed by Pereira et al. considers trivial landmarks (i.e., facts that are part of the initial state or goal description are landmarks by definition) for goal recognition. In this paper, we show that using landmarks that are part of the initial state provides no benefit in a planning-landmark-based goal recognition approach. The empirical results show that omitting initial-state landmarks improves goal recognition performance.
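The core of such a landmark-based recognizer is cheap to state. Below is a hedged sketch in the spirit of this line of work (the goals and landmark sets are invented, not from the chapter): each goal is scored by the fraction of its landmarks achieved so far, with initial-state landmarks omitted as the chapter argues.

```python
# A hedged sketch of a landmark-based goal recognition heuristic: rank each
# goal by the fraction of its landmarks already achieved by the observed
# actions, dropping landmarks that already hold in the initial state.

def landmark_ratios(landmarks_per_goal, achieved_facts, initial_state):
    scores = {}
    for goal, landmarks in landmarks_per_goal.items():
        informative = landmarks - initial_state  # omit initial-state landmarks
        if not informative:
            scores[goal] = 0.0
            continue
        scores[goal] = len(informative & achieved_facts) / len(informative)
    return scores

landmarks_per_goal = {
    "make_coffee": {"have_water", "have_beans", "machine_on"},
    "make_tea": {"have_water", "have_teabag", "kettle_on"},
}
initial_state = {"have_water"}           # a trivial landmark for both goals
achieved = {"have_water", "have_beans"}  # facts made true by observed actions
print(landmark_ratios(landmarks_per_goal, achieved, initial_state))
# {'make_coffee': 0.5, 'make_tea': 0.0} -> make_coffee is the best candidate
```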
Chapter
Adult humans are typically capable of impressive, often recursive, reasoning about the mental states of others, but recent evidence suggests that such reasoning, called Theory of Mind (ToM) reasoning, is neither easy nor automatic. This has led to the theory that human ToM reasoning requires two systems. One system, efficient but inflexible, enables rapid judgements by operating without explicit modeling of beliefs, while a separate, effortful system enables richer predictions over more complex belief encodings. We argue that computational ToM requires a similar distinction. However, we propose a different model: a single process, but with effortful re-representation leading to two phases of ToM reasoning. Efficient reasoning, in our view, occurs over representations that include actions, but not necessarily explicit belief states. Effortful reasoning, then, involves re-representation of these initial encodings in order to handle errors, resolve real-world conflicts, and fully account for others' belief states. We present an implemented computational model, based in memory retrieval and structural alignment, and discuss possible implications for computational agents in human-machine teams.
Article
The agent's capability to acquire, infer, and store knowledge about other agents is known as agent modeling. Agent modeling addresses the problem of reasoning about an opponent, which is a critical task in competitive situations, or reasoning about a partner, which is important in situations of cooperation and communication and for enhancing social connections. The modeling information is useful for reasoning about the agent's intentions, understanding its current behavior, and predicting its future behavior. The objective of this work is to carry out a systematic mapping review of the investigations that have addressed this problem in the last 13 years. As a result, the area was categorized into four dimensions and three broad methods, and twelve characteristics were identified in the gathered data. The contribution of each investigation is studied and analyzed, along with a summary of the use cases where researchers are applying agent modeling. Finally, open problems in the area that could become future lines of research are identified.
Article
Goal recognition design involves the offline analysis of goal recognition models by formulating measures that assess the ability to perform goal recognition within a model and finding efficient ways to compute and optimize them. In this work we present goal recognition design for non-optimal agents, which extends previous work by accounting for agents that behave non-optimally either intentionally or naïvely. The analysis we present includes a new generalized model for goal recognition design and the worst case distinctiveness (wcd) measure. For two special cases of sub-optimal agents we present methods for calculating the wcd, parts of which are based on novel compilations to classical planning problems. Our empirical evaluation shows the proposed solutions to be effective in computing and optimizing the wcd.
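For intuition, in the simple case of one optimal plan per goal, the wcd reduces to the longest plan prefix shared between two different goals. The sketch below implements only this special case, with invented plans; it is not the paper's general model.

```python
from itertools import combinations

# Simplified wcd sketch: with one deterministic optimal plan per goal, the
# worst case distinctiveness is the longest prefix of actions an agent can
# execute that is still consistent with more than one goal.

def common_prefix_len(p1, p2):
    n = 0
    for a, b in zip(p1, p2):
        if a != b:
            break
        n += 1
    return n

def wcd(plans_per_goal):
    return max(common_prefix_len(p, q)
               for (g1, p), (g2, q) in combinations(plans_per_goal.items(), 2))

plans = {"goal_a": ["north", "north", "east"],
         "goal_b": ["north", "north", "west"]}
print(wcd(plans))  # 2: two steps can be taken without revealing the goal
```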
Preprint
We consider a team of autonomous agents that navigate in an adversarial environment and aim to achieve a task by allocating their resources over a set of target locations. The adversaries in the environment observe the autonomous team's behavior to infer their objective and counter-allocate their own resources to the target locations. In this setting, we develop strategies for controlling the density of the autonomous team so that they can deceive the adversaries regarding their objective while achieving the desired final resource allocation. We first develop a prediction algorithm, based on the principle of maximum entropy, to express the team's behavior as expected by the adversaries. Then, measuring deceptiveness via Kullback-Leibler divergence, we develop convex optimization-based planning algorithms that deceive adversaries by either exaggerating the behavior towards a decoy allocation strategy or creating ambiguity regarding the final allocation strategy. Finally, we illustrate the performance of the proposed algorithms through numerical simulations.
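The deceptiveness measure itself is straightforward to compute. A minimal sketch with invented allocation distributions, comparing an "exaggeration" strategy and an "ambiguity" strategy against the adversary's prediction:

```python
import numpy as np

# A minimal sketch of the KL-divergence deceptiveness measure described
# above: distributions are over target locations; all numbers are invented.

def kl_divergence(p, q, eps=1e-12):
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

predicted = [0.5, 0.3, 0.2]    # what the adversary expects at targets 1..3
exaggerated = [0.1, 0.1, 0.8]  # plan steered toward a decoy target
ambiguous = [0.34, 0.33, 0.33]
print(kl_divergence(exaggerated, predicted))  # large: strongly deceptive
print(kl_divergence(ambiguous, predicted))    # smaller: ambiguity, not decoys
```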
Article
The human ability to reason about the causes behind other people's behavior is critical for navigating the social world. Recent empirical research with both children and adults suggests that this ability is structured around an assumption that other agents act to maximize some notion of subjective utility. In this paper, we present a formal theory of this Naïve Utility Calculus as a probabilistic generative model, which highlights the role of cost and reward tradeoffs in a Bayesian framework for action-understanding. Our model predicts with quantitative accuracy how people infer agents' subjective costs and rewards based on their observable actions. By distinguishing between desires, goals, and intentions, the model extends to complex action scenarios unfolding over space and time in scenes with multiple objects and multiple action episodes. We contrast our account with simpler model variants and a set of special-case heuristics across a wide range of action-understanding tasks: inferring costs and rewards, making confidence judgments about relative costs and rewards, combining inferences from multiple events, predicting future behavior, inferring knowledge or ignorance, and reasoning about social goals. Our work sheds light on the basic representations and computations that structure our everyday ability to make sense of and navigate the social world.
Article
Intention recognition is the process of using behavioural cues, such as deliberative actions, eye gaze, and gestures, to infer an agent's goals or future behaviour. In artificial intelligence, one approach to intention recognition is to use a model of possible behaviour to rate intentions as more likely if they are a better 'fit' to actions observed so far. In this paper, drawing on literature linking gaze and visual attention, we propose a novel model of online human intention recognition that combines gaze and model-based AI planning to build probability distributions over a set of possible intentions. In human-behavioural experiments (n=40) involving a multi-player board game, we demonstrate that adding gaze-based priors to model-based intention recognition improved the accuracy of intention recognition by 22% (p<0.05), determined those intentions ≈90 seconds earlier (p<0.05), and at no additional computational cost. We also demonstrate that, when evaluated in the presence of semi-rational or deceptive gaze behaviours, the proposed model is significantly more accurate (a 9% improvement, p<0.05) than model-based or gaze-only approaches. Our results indicate that the proposed model could be used to design novel human-agent interactions in cases where we are unsure whether a person is honest, deceitful, or semi-rational.
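The combination described here amounts to a Bayes update in which gaze supplies the prior and planning supplies the likelihood. A hedged sketch with invented intentions, gaze weights, and costs (not the paper's model):

```python
import numpy as np

# Hedged sketch: a gaze-based prior over intentions is multiplied by a
# plan-cost-based likelihood of the observed actions, then normalized.

intentions = ["build_road", "build_city"]
gaze_prior = np.array([0.7, 0.3])  # e.g. from fixation time on relevant pieces

def plan_likelihood(observations, intention, beta=1.0):
    # Stand-in for a model-based likelihood, e.g. exp(-beta * extra cost of
    # the observed actions under an optimal plan for this intention).
    extra_cost = {"build_road": 0.2, "build_city": 1.5}[intention]
    return np.exp(-beta * extra_cost)

likelihood = np.array([plan_likelihood("obs", i) for i in intentions])
posterior = gaze_prior * likelihood
posterior /= posterior.sum()
print(dict(zip(intentions, posterior.round(3))))
# gaze and plan evidence agree here, so 'build_road' dominates
```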
Article
Full-text available
Receiver operating characteristics (ROC) graphs are useful for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been used increasingly in machine learning and data mining research. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. The purpose of this article is to serve as an introduction to ROC graphs and as a guide for using them in research.
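For reference, the ROC points the article discusses are generated by sweeping a decision threshold over classifier scores. A minimal self-contained sketch, with invented scores and labels:

```python
import numpy as np

# Produce ROC points (FPR, TPR) by lowering the decision threshold one
# instance at a time, from the highest-scored instance to the lowest.

def roc_points(scores, labels):
    order = np.argsort(-np.asarray(scores))  # sort descending by score
    labels = np.asarray(labels)[order]
    pos, neg = labels.sum(), (1 - labels).sum()
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in labels:                         # admit one more instance
        tp, fp = tp + y, fp + (1 - y)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 1, 0, 1, 0, 0]                  # 1 = positive class
for fpr, tpr in roc_points(scores, labels):
    print(round(fpr, 2), round(tpr, 2))      # ends at (1.0, 1.0)
```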
Article
Full-text available
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial intelligence. Researchers have argued that DP provides the appropriate basis for compiling planning results into reactive strategies for real-time control, as well as for learning such strategies when the system being controlled is incompletely known. We introduce an algorithm based on DP, which we call Real-Time DP (RTDP), by which an embedded system can improve its performance with experience. RTDP generalizes Korf's Learning-Real-Time-A* algorithm to problems involving uncertainty. We invoke results from the theory of asynchronous DP to prove that RTDP achieves optimal behavior in several different classes of problems. We also use the theory of asynchronous DP to illuminate aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm. A secondary aim of this article is to provide a bridge between AI research on real-time planning and learning and relevant concepts and algorithms from control theory.
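A hedged sketch of the core RTDP idea on a toy goal-directed problem (the domain and numbers are invented): instead of sweeping the whole state space, values are updated by Bellman backups only at states visited along a simulated greedy trajectory.

```python
import random

# Simplified RTDP trial: rewards are negative (costs) and the goal state is
# absorbing with value 0, so greedy trials converge toward the expected
# cost-to-goal for the visited states.

def rtdp_trial(s, V, A, T, R, goal, max_steps=50):
    for _ in range(max_steps):
        if s == goal:
            break
        # Greedy action under the current value estimates.
        q = {a: R[s][a] + sum(p * V[s2] for s2, p in T[s][a]) for a in A}
        a = max(q, key=q.get)
        V[s] = q[a]                            # backup only the visited state
        states, probs = zip(*T[s][a])          # sample the successor state
        s = random.choices(states, weights=probs)[0]

A = ["go"]
T = {"s0": {"go": [("s1", 0.8), ("s0", 0.2)]}, "s1": {"go": [("s1", 1.0)]}}
R = {"s0": {"go": -1.0}, "s1": {"go": 0.0}}
V = {"s0": 0.0, "s1": 0.0}
for _ in range(200):
    rtdp_trial("s0", V, A, T, R, goal="s1")
print(V)  # V['s0'] converges to -1.25, the expected cost of reaching the goal
```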
Conference Paper
Full-text available
Plan recognition is the problem of inferring the goals and plans of an agent after observing its behavior. Recently, it has been shown that this problem can be solved efficiently, without the need of a plan library, using slightly modified planning algorithms. In this work, we extend this approach to the more general problem of probabilistic plan recognition where a probability distribution over the set of goals is sought under the assumptions that actions have deterministic effects and both agent and observer have complete information about the initial state. We show that this problem can be solved efficiently using classical planners provided that the probability of a partially observed execution given a goal is defined in terms of the cost difference of achieving the goal under two conditions: complying with the observations, and not complying with them. This cost, and hence the posterior goal probabilities, are computed by means of two calls to a classical planner that no longer has to be modified in any way. A number of examples are considered to illustrate the quality, flexibility, and scalability of the approach.
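The computational core of this cost-difference formulation is small. Below is a hedged Python sketch of the resulting posterior P(G|O): for each goal, the two planner calls yield the cost of reaching G while complying with the observations and while avoiding them, and the cost difference feeds a Boltzmann-style likelihood. The costs and the β parameter are invented for illustration.

```python
import numpy as np

# Sketch of a goal posterior from cost differences, in the spirit of the
# abstract above: for each goal G, costs[G] = (cost(G, O), cost(G, not-O)),
# e.g. obtained from two classical-planner calls.

def goal_posterior(costs, priors=None, beta=1.0):
    goals = list(costs)
    priors = priors or {g: 1.0 / len(goals) for g in goals}
    post = {}
    for g in goals:
        c_obs, c_not_obs = costs[g]
        delta = c_obs - c_not_obs          # how "surprising" O is for goal g
        likelihood = 1.0 / (1.0 + np.exp(beta * delta))  # sigmoid in delta
        post[g] = priors[g] * likelihood
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

costs = {"goal_a": (10.0, 12.0),   # O lies on a cheap path to goal_a
         "goal_b": (15.0, 11.0)}   # complying with O is a detour for goal_b
print(goal_posterior(costs))       # goal_a receives most of the probability
```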
Conference Paper
Full-text available
In this work we aim to narrow the gap between plan recognition and planning by exploiting the power and generality of recent planning algorithms for recognizing the set G of goals G that explain a sequence of observations given a domain theory. After providing a crisp definition of this set, we show by means of a suitable problem transformation that a goal G belongs to G if there is an action sequence that is an optimal plan for both the goal G and the goal G extended with extra goals representing the observations. Exploiting this result, we show how the set G can be computed exactly and approximately by minor modifications of existing optimal and suboptimal planning algorithms, and existing polynomial heuristics. Experiments over several domains show that the suboptimal planning algorithms and the polynomial heuristics provide good approximations of the optimal goal set G while scaling up as well as state-of-the-art planning algorithms and heuristics.
Article
We demonstrate, using protocols of actual interactions with a question-answering system, that users of these systems expect to engage in a conversation whose coherence is manifested in the interdependence of their (often unstated) plans and goals with those of the system. Since these problems are even more obvious in other forms of natural-language understanding systems, such as task-oriented dialogue systems, techniques for engaging in question-answering conversation should be special cases of general conversational abilities. We characterize dimensions along which language understanding systems might differ and, based partly on this analysis, propose a new system architecture, centered around recognizing the user's plans and planning helpful responses, which can be applied to a number of possible application areas. To illustrate progress to date, we discuss two implemented systems, one operating in a simple question-answering framework, and the other in a decision-support framework for which both graphic and linguistic means of communication are available.
Article
Humans are adept at inferring the mental states underlying other agents’ actions, such as goals, beliefs, desires, emotions and other thoughts. We propose a computational framework based on Bayesian inverse planning for modeling human action understanding. The framework represents an intuitive theory of intentional agents’ behavior based on the principle of rationality: the expectation that agents will plan approximately rationally to achieve their goals, given their beliefs about the world. The mental states that caused an agent’s behavior are inferred by inverting this model of rational planning using Bayesian inference, integrating the likelihood of the observed actions with the prior over mental states. This approach formalizes in precise probabilistic terms the essence of previous qualitative approaches to action understanding based on an “intentional stance” [Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press] or a “teleological stance” [Gergely, G., Nádasdy, Z., Csibra, G., & Biró, S. (1995). Taking the intentional stance at 12 months of age. Cognition, 56, 165–193]. In three psychophysical experiments using animated stimuli of agents moving in simple mazes, we assess how well different inverse planning models based on different goal priors can predict human goal inferences. The results provide quantitative evidence for an approximately rational inference mechanism in human goal inference within our simplified stimulus paradigm, and for the flexible nature of goal representations that human observers can adopt. We discuss the implications of our experimental results for human action understanding in real-world contexts, and suggest how our framework might be extended to capture other kinds of mental state inferences, such as inferences about beliefs, or inferring whether an entity is an intentional agent.
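A hedged sketch of the framework's basic computation, with invented Q-values: each observed action is scored under a Boltzmann-rational policy for every candidate goal, and the goal posterior is updated by Bayes' rule as actions arrive.

```python
import numpy as np

# Bayesian inverse planning sketch: the action likelihood assumes the agent
# is approximately rational, P(a | s, g) proportional to exp(beta * Q_g(s, a)).
# In a real model the Q-values come from solving the planning problem for
# each candidate goal; here they are invented for one state and two actions.

goals = ["door", "window"]
Q = {"door":   {"left": 1.0, "right": -1.0},
     "window": {"left": -1.0, "right": 1.0}}

def action_likelihood(a, g, beta=2.0):
    qs = np.array(list(Q[g].values()))
    return np.exp(beta * Q[g][a]) / np.exp(beta * qs).sum()

belief = {g: 0.5 for g in goals}
for a in ["left", "left"]:                  # observed actions, one at a time
    belief = {g: belief[g] * action_likelihood(a, g) for g in goals}
    z = sum(belief.values())
    belief = {g: p / z for g, p in belief.items()}
    print(belief)                            # P('door') rises with each 'left'
```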
Article
We present the PHATT algorithm for plan recognition. Unlike previous approaches to plan recognition, PHATT is based on a model of plan execution. We show that this clarifies several difficult issues in plan recognition including the execution of multiple interleaved root goals, partially ordered plans, and failing to observe actions. We present the PHATT algorithm's theoretical basis, and an implementation based on tree structures. We also investigate the algorithm's complexity, both analytically and empirically. Finally, we present PHATT's integrated constraint reasoning for parametrized actions and temporal constraints.
Article
This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through an urban community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's destination and mode of transportation. To achieve efficient inference, we apply Rao–Blackwellized particle filters at multiple levels of the model hierarchy. Locations such as bus stops and parking lots, where the user frequently changes mode of transportation, are learned from GPS data logs without manual labeling of training data. We experimentally demonstrate how to accurately detect novel behavior or user errors (e.g. taking a wrong bus) by explicitly modeling activities in the context of the user's historical data. Finally, we discuss an application called “Opportunity Knocks” that employs our techniques to help cognitively-impaired people use public transportation safely.
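The filtering machinery behind such a model can be pictured with a bare-bones bootstrap particle filter; the article itself uses Rao-Blackwellized particle filters over a model hierarchy, so the sketch below is only a hedged, flat simplification with invented speeds and probabilities.

```python
import math
import random

# Bootstrap particle filter sketch: particles carry a transportation mode,
# and weights come from how well an observed GPS-derived speed fits each
# mode. All numbers are illustrative.

MODES = {"walk": 1.4, "bus": 8.0, "car": 14.0}  # typical speeds in m/s

def likelihood(speed, mode, sigma=2.0):
    return math.exp(-((speed - MODES[mode]) ** 2) / (2 * sigma ** 2))

def pf_step(particles, speed, switch_prob=0.05):
    # Predict: a particle occasionally switches mode (e.g. boarding a bus).
    particles = [random.choice(list(MODES)) if random.random() < switch_prob
                 else m for m in particles]
    # Correct: resample particles in proportion to observation likelihood.
    weights = [likelihood(speed, m) for m in particles]
    return random.choices(particles, weights=weights, k=len(particles))

particles = [random.choice(list(MODES)) for _ in range(500)]
for speed in [1.5, 1.3, 7.9, 8.3]:  # observed speeds over time
    particles = pf_step(particles, speed)
print({m: particles.count(m) / len(particles) for m in MODES})
# belief mass should have shifted from 'walk' toward 'bus'
```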
Article
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to POMDPs, and some possibilities for finding approximate solutions.
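Central to any POMDP solution method is the belief update after an action and an observation: b'(s') ∝ O(o|s',a) Σ_s T(s'|s,a) b(s). A minimal sketch with invented tiger-style numbers:

```python
# Minimal POMDP belief update: predict with the transition model, correct
# with the observation model, and normalize. Toy two-state domain.

def belief_update(b, a, o, T, O, states):
    b_new = {}
    for s2 in states:
        pred = sum(T[s][a].get(s2, 0.0) * b[s] for s in states)  # predict
        b_new[s2] = O[s2][a].get(o, 0.0) * pred                  # correct
    z = sum(b_new.values())
    return {s: p / z for s, p in b_new.items()}

states = ["left", "right"]
T = {s: {"listen": {s: 1.0}} for s in states}   # listening changes nothing
O = {"left": {"listen": {"hear_left": 0.85, "hear_right": 0.15}},
     "right": {"listen": {"hear_left": 0.15, "hear_right": 0.85}}}
b = {"left": 0.5, "right": 0.5}
b = belief_update(b, "listen", "hear_left", T, O, states)
print(b)  # {'left': 0.85, 'right': 0.15}
```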
Conference Paper
The formulation of planning as heuristic search with heuristics derived from problem representations has turned out to be a fruitful approach for classical planning. In this paper, we pursue a similar idea in the context of planning with incomplete information. Planning with incomplete information can be formulated as a problem of search in belief space, where belief states can be either sets of states or, more generally, probability distributions over states. While the formulation (as the ...
Conference Paper
Point-based algorithms and RTDP-Bel are approximate methods for solving POMDPs that replace the full updates of parallel value iteration by faster and more effective updates at selected beliefs. An important difference between the two methods is that the former adopt Sondik's representation of the value function, while the latter uses a tabular representation and a discretization function. The algorithms, however, have not been compared up to now, because they target different POMDPs: discounted POMDPs on the one hand, and Goal POMDPs on the other. In this paper, we bridge this representational gap, showing how to transform discounted POMDPs into Goal POMDPs, and use the transformation to compare RTDP-Bel with point-based algorithms over the existing discounted benchmarks. The results appear to contradict the conventional wisdom in the area, showing that RTDP-Bel is competitive, and sometimes superior, to point-based algorithms in both quality and time.
Conference Paper
Recent applications of plan recognition face several open challenges: (i) matching observations to the plan library is costly, especially with complex multi-featured observations; (ii) computing recognition hypotheses is expensive. We present techniques for addressing these challenges. First, we show a novel application of machine-learning decision trees to efficiently map multi-featured observations to matching plan steps. Second, we provide efficient lazy-commitment recognition algorithms that avoid enumerating hypotheses with every observation, instead only carrying out bookkeeping incrementally. The algorithms answer queries as to the current state of the agent, as well as its history of selected states. We provide empirical results demonstrating their efficiency and capabilities.
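The first technique can be pictured with an off-the-shelf decision tree; the sketch below (with invented features and plan steps, not the paper's system) maps multi-featured observations directly to candidate plan steps.

```python
from sklearn.tree import DecisionTreeClassifier

# Hedged sketch: train a decision tree to map multi-featured observations
# directly to candidate plan steps, avoiding an expensive match against the
# whole plan library. Features and labels are invented.

# Features: (x, y, holding_tool); labels: matching plan-library step.
X = [[0, 0, 0], [1, 0, 0], [1, 1, 1], [2, 1, 1], [2, 2, 0]]
y = ["goto_bench", "goto_bench", "assemble", "assemble", "store_part"]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1, 1, 1]]))  # ['assemble']
```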
Conference Paper
Sensors provide computer systems with a window to the outside world. Activity recognition "sees" what is in the window to predict the locations, trajectories, actions, goals and plans of humans and objects. Building an activity recognition system requires a full range of interaction from statistical inference on lower-level sensor data to symbolic AI at higher levels, where prediction results and acquired knowledge are passed up each level to form a knowledge food chain. In this article, I will give an overview of some of the current activity recognition research works and explore a life cycle of learning and inference that allows the lowest-level radio-frequency signals to be transformed into symbolic logical representations for AI planning, which in turn controls the robots or guides human users through a sensor network, thus completing a full life cycle of knowledge.
Conference Paper
We present a new general framework for online probabilistic plan recognition called the Abstract Hidden Markov Memory Model (AHMEM). The new model is an extension of the existing Abstract Hidden Markov Model that allows the policy to have internal memory which can be updated in a Markov fashion. We show that the AHMEM can represent a richer class of probabilistic plans, and at the same time derive an efficient algorithm for plan recognition in the AHMEM based on the Rao-Blackwellised Particle Filter approximate inference method.
Article
We describe the GPT system and its utilization over a number of examples. GPT (General Planning Tool) is an integrated software tool for modeling, analyzing and solving a wide range of planning problems dealing with uncertainty and partial information, which has been used by us and others for research and teaching. Our approach is based on different state models that can handle various types of action dynamics (deterministic and probabilistic) and sensor feedback (null, partial, and complete). The system consists mainly of a high-level language for expressing actions, sensors, and goals, and a bundle of algorithms based on heuristic search for solving them. The language is one of GPT's strengths since it presents the user with a consistent and unified framework for the planning task. These descriptions are then solved by appropriate algorithms chosen from the bundle ...
Article
Probabilistic context-free grammars (PCFGs) provide a simple way to represent a particular class of distributions over sentences in a context-free language. Efficient parsing algorithms for answering particular queries about a PCFG (i.e., calculating the probability of a given sentence, or finding the most likely parse) have been developed, and applied to a variety of pattern-recognition problems. We extend the class of queries that can be answered in several ways: (1) allowing missing tokens in a sentence or sentence fragment, (2) supporting queries about intermediate structure, such as the presence of particular nonterminals, and (3) flexible conditioning on a variety of types of evidence. Our method works by constructing a Bayesian network to represent the distribution of parse trees induced by a given PCFG. The network structure mirrors that of the chart in a standard parser, and is generated using a similar dynamic-programming approach. We present an algorithm for constructing Bayesian ...
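The chart-mirroring computation referenced here rests on the standard inside (CKY-style) dynamic program. A minimal sketch for a toy grammar in Chomsky normal form (the grammar and probabilities are invented, and this is the plain sentence-probability query, not the paper's Bayesian-network construction):

```python
from collections import defaultdict

# Inside algorithm: the probability of a sentence under a CNF PCFG is the
# inside probability of the start symbol S over the whole span.

rules_binary = {("S", "A", "B"): 1.0}               # S -> A B, prob 1.0
rules_lexical = {("A", "a"): 1.0, ("B", "b"): 0.6, ("B", "a"): 0.4}

def sentence_probability(words):
    n = len(words)
    inside = defaultdict(float)                     # (i, j, symbol) -> prob
    for i, w in enumerate(words):                   # lexical rules, spans of 1
        for (sym, word), p in rules_lexical.items():
            if word == w:
                inside[i, i + 1, sym] += p
    for length in range(2, n + 1):                  # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (head, left, right), p in rules_binary.items():
                    inside[i, j, head] += (p * inside[i, k, left]
                                             * inside[k, j, right])
    return inside[0, n, "S"]

print(sentence_probability(["a", "b"]))  # 0.6
print(sentence_probability(["a", "a"]))  # 0.4
```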