Predicting Plans and Actions in Two-Player Repeated Games
Najma Mathema, Michael A. Goodrich, and Jacob W. Crandall
Computer Science Department
Brigham Young University
Provo, UT, USA
Abstract
Artificial intelligence (AI) agents will need to interact with
both other AI agents and humans. Creating models of associates helps to predict the modeled agents' actions, plans, and intentions. This work introduces and explores algorithms that predict actions, plans, and intentions in repeated-play games. We form a generative
Bayesian approach to model S#. S# is designed as a robust
algorithm that learns to cooperate with its associate in 2 by
2 matrix games. The actions, plans and intentions associated
with each S# expert are identified from the literature, group-
ing the S# experts accordingly, and thus predicting actions,
plans, and intentions based on their state probabilities. Two
prediction methods are explored for Prisoners Dilemma: the
Maximum A Posteriori (MAP) and an Aggregation approach.
MAP (approximately 89% accuracy) performed best for action prediction. Both methods predicted the plans of S# with roughly 88% accuracy. A paired t-test shows that MAP performs significantly better than Aggregation for predicting S#'s actions without cheap talk. Intention is explored based on the goals of the
S# experts; results show that goals are predicted precisely
when modeling S#. The obtained results show that the pro-
posed Bayesian approach is well suited for modeling agents
in two-player repeated games.
When agents interact, it is useful for one agent to have an
idea of what other agents are going to do, what their plans
are, and what intentions guide both their plans and actions.
This work creates agent models that use observations from past interactions to predict the modeled agent's actions, plans, and intentions, yielding algorithms that (a) predict the actions of another agent and (b) identify that agent's plans and intent.
The main concept of the work is based on the perspective from the literature that intentions are the reasons behind actions, and that plans are the means for mapping intentions to actions (Perner 1991; Bratman 1987; Malle and Knobe 1997). The overarching objective of the research is to infer another agent's plans, goals, and intentions from observations of its actions in an environment, and to predict its behavior (actions) based on these inferences. If an
AI agent is able to predict future actions, the agent can plan
ahead for appropriate actions and hence be able to make bet-
ter decisions for the future.
This work makes predictions in the context of Repeated
Games (RGs). Game theory has been applied in numerous
ways to understand human/agent behavior, relationships,
and decision-making. RGs in game theory provide an en-
vironment for understanding and studying the relationship
between agents because the game construct requires each
agent to account for the consequence of its action on the
future action of the other agent. The dilemma of whether
to cooperate or to compete with each other has been ex-
tensively studied in the game Prisoners Dilemma in the lit-
erature of psychology, economics, politics and many other
disciplines. Hence, the same game has been used for this
study. Prior work (Crandall et al. 2018; Crandall 2014) in-
troduced the S# algorithm, which is designed as a robust al-
gorithm that learns to cooperate with its associate in many
2 by 2 matrix games. S# is built on top of S++ (Crandall
2014) with the ability to share costless, non-binding sig-
nals called “cheap talk” indicating its experts’ intentional-
ity. For better expert-selection, in each round prior to taking
an action, the players get an opportunity to communicate
by sharing their plans via cheap talk. This paper presents
a model for predicting actions, plans, and intents assuming
the agent to be modeled is an S# agent. S# is studied be-
cause it is a highly effective algorithm in RGs and it uses
explicit models of planners (called “experts”) that are moti-
vated by specific designer intentions (Crandall et al. 2018;
Oudah et al. 2018). In the context of modeling S#’s behav-
ior in RGs, we use a generative Bayesian model, which as-
sumes that agents have a number of internal states defining
the “state of mind” used to select what action they would
want to take given the observations they see. The observa-
tions are (a) the speech acts/proposals via cheap talk that the
players share with each other prior to taking their action and
(b) the actions taken by both the S# agent and the agent with
whom S# is interacting. Table 1 shows a few interactions of
S# against a human player ABL in Prisoners Dilemma.
Round | Player | Speech acts                                        | Action | Payoff
35    | S#     | None                                               | B      | 20
35    | ABL    | You betrayed me. Curse you.                        | D      | 20
36    | S#     | None                                               | B      | 100
36    | ABL    | None                                               | C      | 0
37    | S#     | In your face! I forgive you. Let's alternate       | B      | 100
      |        | between AC and BC. This round, let's play BC.      |        |
      |        | Do as I say or I will punish you.                  |        |
37    | ABL    | Let's always play BD.                              | C      | 0
38    | S#     | Excellent. This round, let's play AC.              | A      | 60
38    | ABL    | None                                               | C      | 60
Table 1: S# vs human player ABL in the Prisoner's Dilemma

The generative Bayesian model provides a probability distribution over the S# agent's internal states. This probability distribution can be used as input to algorithms that predict the most likely state, the most likely action, the most likely
plan, and the most likely intention. Additionally, this kind
of Bayesian model is not dependent on the type of RGs,
and thus could be used for many two-player RGs. Since the
model is based on observations in the environment and em-
ploys Bayesian reasoning, it does not require a huge dataset
to train the model for better performance. Two types of al-
gorithms for translating the distribution into predictions will
be explored: (a) a Maximum A Posteriori (MAP) estimate
of the most likely state, which implicitly identifies an ac-
tion, plan, and intention; and (b) estimates of actions, plans,
and intentions that aggregate probability over related states.
Related Literature
A variant of RGs implemented by Crandall et al. (Cran-
dall et al. 2018; Oudah et al. 2018), called RGs with “cheap talk”, allows each player to share messages/proposals with the other player before actions are taken.
Intentional Action
Consider the motivation of using the notion of “inten-
tionality” from folk psychology as the basis for model-
ing other agents. As per (de Graaf and Malle 2017), peo-
ple regard “Autonomous Intelligent Systems” as intentional
agents; people, therefore, use the conceptual and psycholog-
ical means of human behavior explanation to understand and
interact with them. Folk Psychology suggests that it is the
belief, desire, and intention of humans that control human
behavior and that our intention is the precursor to the ac-
tion we take (Perner 1991). Hence, inferring the intent be-
hind a particular action allows a human to infer the plans
and goals of the agent. In this context, intent is associated
with the commitment to achieve a particular goal through a
plan (Bratman 1987). Once an intent is formed and a plan is
selected to achieve the desire, an “intentional” action is one
that is derived as an instrumental means for moving towards
the intent.
de Graaf and Malle (2017) mention that so-called “Autonomous Intelligent Systems” exhibit traits like planning and decision making, and hence are considered “intentional agents”. They further claim that the behaviors of intentional agents can be explained using the human conceptual framework known as behavior explanation.
Related to the literature on intentional agents is work in
“folk psychology” (Perner 1991; Bratman 1987; Malle and
Knobe 1997), in which agent beliefs, desires, and intentions
are used to explain how and why agents choose actions to-
wards reaching their goal. Baker et al. (2011) presented a
Bayesian model of human Theory of Mind based on in-
verse planning to make joint inferences about the agents’
desires and beliefs about unobserved aspects of the environ-
ment. Similar to our work, they model the world as a kind
of Markov Decision Process and use the observations in the
environment to generate posterior probabilities about the en-
vironment. Additionally, by inverting the model, they make
inferences about the agents’ beliefs and desires.
Modeling Other Agents
Seminal work on modeling agents in the field of game theory
was presented in (Axelrod and Hamilton 1981). Axelrod’s
models allow strategies to play against each other as agents
to determine the winning strategy in Prisoners Dilemma
tournaments. Early work on agent modeling tended to focus
on equilibrium solutions for games and has now extended to
various fields of computer science like (Lasota et al. 2017;
Stone and Veloso 2000; Kitano et al. 1997).
One modeling approach is to predict action probabilities for the modeled agent, an early example of which is the Fictitious Play algorithm (Brown 1951). In contrast to
the simple empirical probability distributions of fictitious
play, other authors have worked on making action predic-
tions by learning the action probabilities of the opponent
conditioned on their own actions (Sen and Arora 1997;
Banerjee and Sen 2007).
Similar to our research objective, which is to be able to
predict the next moves of the opponent, Gaudesi et al. (2014)
worked on an algorithm called Turan to model the opponent
player using finite state machines. The work in (Deng and Deng 2015) studies Prisoners Dilemma as a game with incomplete information, using Bayes rule and past interaction history to form a possibility distribution table over each player's choices and thereby predict them. Park and Kim (2016) assert that building precise models of players in Iterated Prisoners Dilemma requires a good dataset, so they use a Bootstrap aggregation approach to generate new data randomly from the original dataset; an observer then uses an active learning approach to model the behavior of its opponent.
Inferring Intent
There has not been much research in predicting or inferring
intents of agents in RGs, but there has been previous work
in predicting the intent of agents in various other fields. In-
tent in prior work relates to goals and plans. Kuhlman and Marshello (1975) talk about the goals of agents in mixed-motive games
by identifying their motivational orientation (cooperative,
individualistic, or competitive) based on their choice behav-
ior in decomposed games. Thus, knowing the motive of the
subject, they use it to predict actions for Prisoners Dilemma.
The work in (Cheng, Lo, and Leskovec 2017) linked intent
with goal specificity and temporal range when predicting
intents in online platforms. Very recent research work uses
deep-learning models for intent prediction (Qu et al. 2019; Pîrvu et al. 2018). Rabkina and Forbus (2013) used a computational model based on analogical reasoning to enable intent
recognition and action prediction. Other methods that use
Bayesian models for intent prediction include (Mollaret et
al. 2015; Tavakkoli et al. 2007; Rios-Martinez et al. 2012).
Modeling Framework
Bayesian Graphical Model
A Bayesian graphical model is used to model S#. The model
begins with a prior probability distribution over possible S#
agent states and then propagates that distribution using ob-
servations of actions and speech-acts/proposals.
The structure of the Bayesian model is illustrated in Fig-
ure 1. The model makes it evident that future predictions are
based on the present state and immediate observations. Un-
derstanding this model is made easier by comparing it to a
Hidden Markov Model (HMM). An HMM is a five-tuple

HMM = ⟨S, O, p(s_0), p(s_{t+1} | s_t), p(o_t | s_t)⟩,

where S is a finite set of (hidden) states, O is a finite set of observations, p(s_0) is the initial state distribution (i.e., the distribution over states at time t = 0), p(s_{t+1} | s_t) is the transition probability function that describes how states change over time, and p(o_t | s_t) gives the emission probabilities (i.e., the probability that a given observation o_t is generated by hidden state s_t). An HMM is one of the most
simple dynamic Bayesian models because it describes how
states change over time. A common application of HMMs
is to try to infer a most likely hidden state from a series of
observations.
Like an HMM, our model is also a dynamic model, but
the inference task is slightly different and so are the model
elements. The proposed model differs from the traditional
HMM in two ways: First, the Bayesian model makes two
state transitions at a single time step, that is, there are two
hidden states at each time step. Second, there is an external
input to the model. Figure 1 illustrates the proposed model.
In the figure, the player being modeled is denoted by a sub-
script ‘-i’ whereas the player’s associate (in the game) is de-
noted by a subscript ‘i’.
The Bayesian model is a tuple with seven elements,

BModel = ⟨S, O, Σ, p(ŝ_{-i}(t) | s_{-i}(t), z_{-i}(t), z_i(t)), p(s_{-i}(t+1) | ŝ_{-i}(t), a_{-i}(t), a_i(t)), p(z_{-i}(t) | s_{-i}(t)), p(a_{-i}(t) | ŝ_{-i}(t))⟩.

As with the HMM, S represents the set of states and O represents the set of observations. The set of states contains two different kinds of states: s_{-i} ∈ S, which represent propositional states (states from which speech acts are generated), and ŝ_{-i} ∈ S, which represent action states (states from which game actions are chosen). The set of observations O has two different kinds of observations: (1) the speech acts/proposals, z_{-i} ∈ O, shared by the player being modeled, and (2) the actions, a_{-i} ∈ O, taken by the player being modeled. Σ is the set of exogenous inputs to the model, which consists of the observed actions and speech acts of the other player in the game, represented by a_i and z_i, respectively.
As mentioned earlier, the model of agent −i is based on a time series with two types of hidden states, s_{-i}(t) and ŝ_{-i}(t). The proposed model takes two state transitions at a single time step. For a single time step, the first state transition occurs from s_{-i}(t) to ŝ_{-i}(t) based on the observation of what proposals are shared. ŝ_{-i}(t) is a kind of temporary state for S#, from which it generates its aspiration level to choose the expert to play the game further. The next state transition, from ŝ_{-i}(t) to s_{-i}(t+1), takes place based on the observation of the actions of the players. This state transition gives the prediction for the state at the next time step, which is then utilized in predicting the action of the modeled player for the next time step.

S(t) is the set of states available to the modeling agent at time t, and is given by the union of all the states in each expert's state machine,

S(t) = ⋃_{j ∈ J} S_{φ_j}.
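For concreteness, this union might be assembled as in the short Python sketch below; the expert names and state labels are hypothetical placeholders rather than the actual S# experts.

# Hypothetical per-expert state machines: expert name -> that expert's internal states.
expert_states = {
    "Bully": {"bully_offer", "bully_punish"},
    "Fair":  {"fair_offer", "fair_punish"},
    "MBRL":  {"mbrl_explore", "mbrl_exploit"},
}

# S(t): the union of every expert's state set, i.e. all hidden states the model tracks.
S_t = set().union(*expert_states.values())
print(sorted(S_t))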
Conditional Probabilities
The Bayesian model makes use of the priors and conditional
probabilities to find the posterior probability of the states
after each observation. The priors about the states of the ex-
perts represent the agent’s beliefs about the states. Ideally,
prior probabilities should be fairly close to the true proba-
bilities in real scenarios; prior probabilities affect the future
computations and the predictions to be made. For comput-
ing the priors, the initial knowledge about S# and how the
experts are formed and then selected is utilized.
Based on the game, S# generates a set of experts, which
are essentially strategies that employ learning algorithm to
select actions and generate and respond to speech acts, based
on the state it is in. To select an expert to take an action, the
“expert potential” needs to meet a specified aspiration level
and the expert also needs to carry out plans which are con-
gruent with its partner’s last proposed plan. Thus, the priors
for models of S# agents are the probabilities that S# selects
a particular expert in the first round based on the expert’s
potential and uniform distribution over the aspiration level,
with most of the probability assigned to the start state of the
expert.
The following conditional probability elements describe
the necessary components for designing the model. The no-
tation used in the conditional probabilities is given by:

a: action, z: proposal, i: partner, −i: S#.
1. Sensor Model 1 (Speech): Given the current state of the agent, the sensor model provides the probability of seeing a particular proposal.

   p(z_{-i}(t) | s_{-i}(t))

2. Transition Model 1 (Reflection): For the same time step, the transition model is used for transitioning to a new state after the proposals are observed.

   p(ŝ_{-i}(t) | s_{-i}(t), z_{-i}(t), z_i(t))

3. Sensor Model 2 (Action): Given the state of the agent, the sensor model provides the probability of seeing a particular action.

   p(a_{-i}(t) | ŝ_{-i}(t))

4. Transition Model 2 (Update): Once the actions are taken by both players, the update model encodes how a state transition occurs to a new state for the next time step.

   p(s_{-i}(t+1) | ŝ_{-i}(t), a_{-i}(t), a_i(t))

Figure 1: Modeling S#
The S# algorithm has Finite State Machines (FSMs) for
each of the experts which define what speech acts are to be
generated based on the internal states of the experts and the
events in the game (Crandall et al. 2018). The events in the
game could involve the event of selecting an expert or the
events that affect individual experts. These FSMs have state
transition functions that map the events, internal states, and
speech outputs. Hence, for the game, the above conditional
probabilities have been determined based on how S# acts in
each event and also adding some uncertainty to make sure
that other possible transitions are non-zero.
Updating the Probability Distribution – Bayes
Filter Algorithm
An algorithm is needed to aggregate observations into a dis-
tribution over the hidden states. Since the proposed model is
a dynamic Bayesian model, a Bayes Filter is an appropriate
algorithm (Thrun, Burgard, and Fox 2005). The Bayes Filter
is a general algorithm for estimating an unknown probability
density function over time given the observations. The be-
liefs are computed recursively and are updated at each time
step with the most recent observations. The algorithm pre-
sented in Algorithm 1 is the Bayes Filter Algorithm modi-
fied from the model in (Thrun, Burgard, and Fox 2005) to
reflect the two hidden states.
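For concreteness, the two-phase update (listed formally as Algorithm 1 below) could be sketched in Python roughly as follows. The observation tuples, state space, and the four probability functions are hypothetical placeholders standing in for the sensor and transition models defined above, not the actual S# parameters.

from collections import defaultdict

def normalize(belief):
    """Rescale a belief dict so the probabilities sum to 1 (the eta step)."""
    total = sum(belief.values())
    return {s: p / total for s, p in belief.items()} if total > 0 else dict(belief)

def bayes_filter(prior, observations, states,
                 p_reflect,  # p(s_hat | s, z_model, z_partner): reflection transition
                 p_action,   # p(a_model | s_hat): action sensor model
                 p_update,   # p(s_next | s_hat, a_model, a_partner): update transition
                 p_speech):  # p(z_model | s): speech sensor model
    """Two-phase belief update over the modeled player's hidden states.

    `prior` maps states to probabilities; each observation is one round's tuple
    (z_model, z_partner, a_model, a_partner). The four probability functions are
    stand-ins for the conditional distributions defined in the text.
    """
    belief = normalize(prior)
    for z_model, z_partner, a_model, a_partner in observations:
        # Phase 1: reflection transition given the proposals, then action sensor update.
        belief_hat = defaultdict(float)
        for s_hat in states:
            for s, p in belief.items():
                belief_hat[s_hat] += p_reflect(s_hat, s, z_model, z_partner) * p
            belief_hat[s_hat] *= p_action(a_model, s_hat)
        belief_hat = normalize(belief_hat)

        # Phase 2: update transition given the actions, then speech sensor update.
        belief_next = defaultdict(float)
        for s_next in states:
            for s_hat, p in belief_hat.items():
                belief_next[s_next] += p_update(s_next, s_hat, a_model, a_partner) * p
            belief_next[s_next] *= p_speech(z_model, s_next)
        belief = normalize(belief_next)
    return belief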
Algorithm 1: Bayes Filter Algorithm

function BayesFilter():
    initialize bel(s_0)
    bel(s_0) = η p(z_{-i}(0) | s_0) bel(s_0)
    for t = 0 to len(observations) do
        for each ŝ_t ∈ S do
            bel(ŝ_t) = Σ_{s_t} p(ŝ_t | z_{-i}(t), z_i(t), s_t) bel(s_t)
            bel(ŝ_t) = η p(a_{-i}(t) | ŝ_t) bel(ŝ_t)
        end
        for each s_{t+1} ∈ S do
            bel(s_{t+1}) = Σ_{ŝ_t} p(s_{t+1} | a_{-i}(t), a_i(t), ŝ_t) bel(ŝ_t)
            bel(s_{t+1}) = η p(z_{-i}(t) | s_{t+1}) bel(s_{t+1})
        end
    end

Predicting Actions, Plans, and Intents
The Bayesian model and the Bayes Filter algorithm yield a probability distribution over hidden states in the model. From this distribution, one of the main tasks in this paper is to predict actions, plans, and intents. For each prediction, we
will explore two methods: a MAP estimate and a more com-
plete aggregation method. This subsection addresses how
actions, plans, and intent can be predicted.
Predicting an Action The results presented in this paper have been obtained for the Prisoners Dilemma game, so the set of possible actions is {Cooperate, Defect}.
MAP Estimate for Action Prediction The MAP estimate takes the maximum of all the probabilities over the available actions to predict an action, â_MAP. The action probabilities are calculated by aggregating over all states as:

â_MAP = arg max_a Σ_{s ∈ S} P(s) P(a | s)
Aggregation Method for Action Prediction Each expert φ_j has states s ∈ S_{φ_j}, with the probability distribution over each of these states, P(s), generated by the Bayesian model. Summing the probabilities of all the states that belong to a given expert gives p(φ_j) = Σ_{s ∈ S_{φ_j}} P(s). The expert φ_j with the maximum probability is identified, and the action is then selected as:

â_φ = arg max_a Σ_{s ∈ S_{φ_j}} P(s) P(a | s)
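As an illustration of the difference between the two estimators, a minimal Python sketch is given below. The states, experts, and per-state action distributions are hypothetical placeholders; only the aggregation logic mirrors the equations above.

from collections import defaultdict

ACTIONS = ("Cooperate", "Defect")

def map_action(belief, p_action_given_state):
    """MAP estimate: sum P(s)P(a|s) over every hidden state, then pick the best action."""
    score = {a: sum(p * p_action_given_state[s][a] for s, p in belief.items())
             for a in ACTIONS}
    return max(score, key=score.get)

def aggregation_action(belief, p_action_given_state, expert_of):
    """Aggregation estimate: find the most probable expert, then pick the best
    action using only that expert's states."""
    expert_mass = defaultdict(float)
    for s, p in belief.items():
        expert_mass[expert_of[s]] += p
    best = max(expert_mass, key=expert_mass.get)
    score = {a: sum(p * p_action_given_state[s][a]
                    for s, p in belief.items() if expert_of[s] == best)
             for a in ACTIONS}
    return max(score, key=score.get)

# Toy inputs (hypothetical states, experts, and action distributions):
belief = {"bully_offer": 0.5, "bully_punish": 0.1, "fair_offer": 0.4}
expert_of = {"bully_offer": "Bully", "bully_punish": "Bully", "fair_offer": "Fair"}
p_action_given_state = {
    "bully_offer":  {"Cooperate": 0.2, "Defect": 0.8},
    "bully_punish": {"Cooperate": 0.0, "Defect": 1.0},
    "fair_offer":   {"Cooperate": 0.9, "Defect": 0.1},
}
print(map_action(belief, p_action_given_state))                     # Defect
print(aggregation_action(belief, p_action_given_state, expert_of))  # Defect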
Predicting a Plan We can categorize each expert as hav-
ing a follower type or a leader type, as per the categorization
of plan in (Littman and Stone 2001). A “leader type” creates
strategies that will influence its partner to play a particular
action by playing a trigger strategy that induces its partner to
comply or be punished. Trigger strategies are ones in which a player begins by cooperating but defects for a predefined period of time when the other player shows a certain level of defection (the opponent triggers punishment through defection). The experts
have a punishment stage in their state diagrams. The punish-
ment phase is the strategy designed to minimize the partner’s
maximum expected payoff. The punishment phase persists
from the time the partner deviates from the offer until the
sum of its partner’s payoffs (from the time of the deviation)
is below what it would have obtained had it not deviated
from the offer (Crandall et al. 2018). Hence the partner’s
optimal strategy would be to follow the offer.
A “follower type” expects its partner to do something and
it plays the best response to their move. A follower assumes
that its partner is using a trigger strategy. That means it as-
sumes that its partner will propose an offer which is expected
to be followed or else it might be punished. Following an
offer may require the player to play fair (both getting the same payoff), to bully (demanding a higher payoff than its associate), or to be bullied (accepting a lower payoff than its associate).
Two approaches can be used to estimate the
leader/follower plan being used: MAP and Aggregation.
MAP Estimate for Plan Prediction Let θ(φ_i) ∈ {leader, follower} indicate the “type” of expert φ_i. The plan is then the most probable type. For this we first identify the MAP estimate of which expert is most likely and then select that expert's type,

θ̂ = θ( arg max_{φ_i} Σ_{s ∈ S_{φ_i}} P(s) ).
Aggregation Method for Plan Prediction Similar to how the probabilities of actions could be aggregated across states, we can aggregate probabilities across plan types and then choose the most likely type as follows:

θ̂ = arg max_{type ∈ {leader, follower}} Σ_{φ_i : θ(φ_i) = type} Σ_{s ∈ S_{φ_i}} P(s).
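A minimal Python sketch of both variants is given below, assuming each hidden state is labeled with its expert and each expert with its type; the same two functions apply to any categorical labeling of experts, so swapping the type labels yields the analogous intent predictor discussed next.

from collections import defaultdict

def map_type(belief, expert_of, type_of):
    """MAP variant: take the single most probable expert and report its type."""
    expert_mass = defaultdict(float)
    for s, p in belief.items():
        expert_mass[expert_of[s]] += p
    return type_of[max(expert_mass, key=expert_mass.get)]

def aggregated_type(belief, expert_of, type_of):
    """Aggregation variant: sum the probability mass of all experts sharing a type."""
    type_mass = defaultdict(float)
    for s, p in belief.items():
        type_mass[type_of[expert_of[s]]] += p
    return max(type_mass, key=type_mass.get)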
Predicting Intent Each expert can be categorized by the
goal it seeks to achieve by adopting its strategy. This is simi-
lar to categorizing agents by plan type, but the categorization
is by intent type. We identify the intent using the Bayesian
model of S# using the simple rule: the intent of S# is the
intent of the expert it uses to achieve its goal. S# experts
fall into two goal types: “Maximizing Payoff” and “Maximizing Fairness” (minimizing the difference between the two players' payoffs).
Intent can be predicted by identifying the intent type of
the most likely expert using the MAP estimate, or by ag-
gregating probabilities over intent types and then selecting
the most probable type. These two prediction methods are
exactly analogous to predicting plan type from the previous
subsection so details are omitted.
Experiments, Results and Discussion
Data preparation
The dataset used in this work is from previous work by
Crandall et al. (2018) on RGs with cheap talk. Interac-
tion logs have been recorded for human-human interaction
and human interaction with S#. Two players play Prisoners
Dilemma against each other, each game lasting 51 rounds.
For each round, each player gets an opportunity to share
messages before taking their actions (which could include
their plan to play a particular action, or anything they would
want to say to their opponent). Interaction logs are formed
based on those game logs, consisting of payoffs, cheap talk
and actions played by the players in each gameplay. There
are a total of 24 interactions, 12 human-human games and
12 human-S# games, lasting 51 rounds each.
Another dataset is formed by having the strategies shown in Table 2 play against both the S# and human players. This dataset is used to evaluate the performance of the proposed graphical Bayesian Model by comparing its predictions against those of these strategies.
Predicting Intent and Plans
Two approaches were used for predicting the intent and
the plan of the players for the repeated Prisoners Dilemma
game: the MAP and aggregation methods. In our experiments, both methods for predicting intent predicted “Maximize Payoff” as the intent of the players for all interactions. When modeling S#, these predictions comply with the actual intent of the experts of S#. This is because the
experts of S# were designed in (Crandall et al. 2018) with
the intent to Maximize Payoff, except the Bouncer strategy
which is never initialized for the Prisoners Dilemma game
(Bouncer is relevant for other repeated games).
For validating the plan prediction, the interaction history
was run through S# to see which of the experts were se-
lected during each interaction, and hence the corresponding
plan followed by the expert was considered as the true plan
followed.
When used for humans, the model also predicts the intent
to be “Maximizing Payoff”. Unfortunately, we do not have
measures to evaluate the intent prediction for humans for this
game, which could be considered one of the limitations in
our work. The intent prediction is based on the intent of the
experts of S#. It would have been interesting to evaluate the intent of the players from a different perspective, such as with respect to their personalities or motivational orientation as in (Kuhlman and Marshello 1975), where the goal
of cooperative, competitive, and individualistic agents is to
achieve joint gain, relative gain, and own gain respectively.
Both the MAP and Aggregation approaches achieved an accuracy of 88% for predicting plans for S#. A paired t-test shows that the difference in average performance between the MAP approach and Aggregation for plan prediction is not statistically significant (p = 0.0997). But for predictions without cheap talk, a paired t-test shows that the difference in average performance between the MAP approach and Aggregation is statistically significant (p = 0.0328), MAP being better.

Figure 2: Average action prediction accuracy.
Figure 3: Action prediction comparison for players who lie.
Predicting actions
Average accuracy for action predictions The average ac-
curacy for predicting actions using the MAP and Aggrega-
tion was calculated for modeling S#. Considering that humans similarly have internal states, the ability to form intentions, and plans that lead to actions, the same model was then used to model humans. MAP performed better than Aggregation and was able to predict the actions 89.05% of the time for S#, and 88.45% of the time for humans. We also experimented with predicting the actions without using cheap talk and achieved an accuracy of 85.62% for S# and 88.02% for humans. Figure 2 summarizes the results. A paired t-test shows that the difference in average performance between the MAP and Aggregation approaches is not statistically significant (p = 0.338). However, without cheap talk, the paired t-test shows that MAP performs significantly better than Aggregation for predicting actions (p = 0.0328).
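These comparisons are paired t-tests over matched interactions. A sketch of how such a comparison might be computed, using hypothetical per-interaction accuracies rather than the actual experimental values, is:

from scipy import stats

# Hypothetical per-interaction action-prediction accuracies for the two methods
# (the study pairs the same interactions under MAP and under Aggregation).
map_acc = [0.92, 0.88, 0.90, 0.86, 0.91, 0.84]
agg_acc = [0.90, 0.87, 0.88, 0.83, 0.90, 0.82]

t_stat, p_value = stats.ttest_rel(map_acc, agg_acc)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}")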
There were 7 predictions where the accuracy was less than 80%. Looking at the data, we found that this was because of players who lie (51% of the time on average). Lying refers to proposing a particular action but taking a different one during one's turn. If we omit such interactions, MAP was 93.31% accurate for modeling humans who lie less frequently (18.6% of the time on average), and when we ignored the cheap talk, it was 92.2% accurate.

We see that the predictions were always better with cheap talk (excluding lies), as it provided more information about the interaction. It was interesting to see that the only time the accuracy improved when not using cheap talk was when we modeled humans who lie. In this case, the accuracy of our model increased from 68.35% to 70.59% without cheap talk (for the MAP approach). Thus, with this observation, we realize that our model performs well for modeling both S# and humans except for agents who lie. The results are presented in Figure 3.

Figure 4: Comparison of action predictions for modeling Player 1 and 2 (Our model vs Others).
Comparing MAP predictions to fixed models
The action predictions from our model were compared with
predictions from the fixed models presented in (Fudenberg,
Rand, and Dreber 2012). The performance of MAP and Aggregation was comparable. The Bayesian model outperformed the fixed strategies. However, it was interesting to see that Tit-for-Tat performed nearly as well as our model for the action predictions, and Exploitative Tit-for-Tat performed very close to Tit-for-Tat.
Figure 4 shows how our model performed in modeling
players 1 and 2 vs other strategies. The player number sim-
ply indicates the player who goes first in each game. The
following subsection presents how our model performs bet-
ter in modeling dynamic behavior in agents as compared to
the fixed strategy models.
Figure 5: Performance with ≤ 25% continuous repetition in actions.
Comparing MAP predictions excluding consistent
interactions
Each of the interactions between the players was examined carefully. Out of the 48 players, 24 had taken the same action repeatedly for more than 75% of the rounds. For fixed strategies like Tit-for-Tat, it is easier to make correct predictions in such cases, as interactions were consistent for at least 75% of the rounds. So, another analysis was performed in which we considered only those interactions that had more variance in their actions across the rounds. More specifically, only interactions with 25% or less continuous repetition of the same action were retained. The results are presented in Figure 5. Previously, we observed that Tit-for-Tat and Exploitative Tit-for-Tat performed very close to our models. But when we compare the interactions with more variance in actions across the rounds, our models were able to perform significantly better, as shown by a paired t-test (p = 0.0160). In addition, when we disregarded the interactions including lies, the performance of MAP increased to 82%, and that of Aggregation increased to 81%.
Conclusion
This paper presented a graphical generative Bayesian model
that models the S# algorithm for two-player repeated games.
The highlight of the model is its ability to model the internal states of S#, using each observation to calculate the posterior state probabilities; the same model could also be used to model humans. The other benefit of using this kind of model
is that it is game independent, so it could be used for any
two-player repeated game.
In comparison with other strategy models, the MAP ap-
proach on the Bayesian model performed the best in pre-
dicting actions for the Prisoners Dilemma game. It could
better model the dynamic actions of players as compared
to the other fixed strategy models. Also, for both plan and action prediction, MAP performed significantly better than Aggregation without cheap talk, i.e., when it is an ordinary repeated game.
Strategy                        | Description
Always Cooperate                | Always play C
Tit-for-Tat (TFT)               | Play C unless partner played D last round
TF2T                            | C unless D played in both last 2 rounds
TF3T                            | C unless D played in both last 3 rounds
2-Tits-for-1-Tat                | Play C unless partner played D in either of the last 2 rounds (2 rounds of punishment if partner plays D)
2-Tits-for-2-Tats               | Play C unless partner played 2 consecutive Ds in the last 3 rounds (2 rounds of punishment if D played twice in a row)
T2                              | Play C until either player plays D, then play D twice and return to C
Grim                            | Play C until either plays D, then play D
Lenient Grim 2                  | Play C until 2 consecutive rounds occur in which either played D, then play D
Lenient Grim 3                  | Play C until 3 consecutive rounds occur in which either played D, then play D
Perfect TFT / Win-Stay-Lose-Shift | Play C if both players chose the same move last round, otherwise play D
Perfect Tit-for-Tat with 2 rounds of punishment | Play C if both players played C in the last 2 rounds, both players played D in the last 2 rounds, or both players played D 2 rounds ago and C last round; otherwise play D
Always Defect                   | Always play D
False cooperator                | Play C in the first round, then D forever
Expl. Tit-for-Tat               | Play D in the first round, then play TFT
Expl. Tit-for-2-Tats            | Play D in the first round, then play TF2T
Expl. Tit-for-3-Tats            | Play D in the first round, then play TF3T
Expl. Grim2                     | Play D in the first round, then play Grim2
Expl. Grim3                     | Play D in the first round, then play Grim3
Alternator                      | DCDC ...
Pavlov                          | Start with C; always play C if partner does not play D
Table 2: Existing strategies for Prisoners Dilemma.

Additionally, obtaining a high accuracy for plan and intent prediction using the different approaches based on the same model, we can say that this Graphical Bayesian Model shows promise for modeling agents in two-player repeated games.
However, the model had some limitations. It was not able
to detect its partners lying in the game and hence did not
perform very well in such situations. A future enhancement
could include creating experts for S# having the ability to
deal with lies in the game. Also, further exploration of the
intent of players, based on other dimensions is necessary. It
would have been interesting to study intent from a different
perspective such as based on the personality of the players,
and how the intention of players change over time.
Acknowledgements
This work was supported in part by the U.S. Office of Naval
Research under Grant #N00014-18-1-2503. All opinions,
findings, conclusions, and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the Office of Naval Research.
References
Axelrod, R., and Hamilton, W. D. 1981. The evolution of
cooperation. Science 211(4489):1390–1396.
Baker, C.; Saxe, R.; and Tenenbaum, J. 2011. Bayesian
theory of mind: Modeling joint belief-desire attribution. In
Proceedings of the annual meeting of the cognitive science
society, volume 33.
Banerjee, D., and Sen, S. 2007. Reaching pareto-optimality
in prisoner’s dilemma using conditional joint action learn-
ing. Autonomous Agents and Multi-Agent Systems 15(1):91–
108.
Bratman, M. 1987. Intention, plans, and practical reason,
volume 10. Harvard University Press Cambridge, MA.
Brown, G. W. 1951. Iterative solution of games by ficti-
tious play. Activity analysis of production and allocation
13(1):374–376.
Cheng, J.; Lo, C.; and Leskovec, J. 2017. Predicting in-
tent using activity logs: How goal specificity and temporal
range affect user behavior. In Proceedings of the 26th Inter-
national Conference on World Wide Web Companion, 593–
601. International World Wide Web Conferences Steering
Committee.
Crandall, J. W.; Oudah, M.; Ishowo-Oloko, F.; Abdallah, S.;
Bonnefon, J.-F.; Cebrian, M.; Shariff, A.; Goodrich, M. A.;
Rahwan, I.; et al. 2018. Cooperating with machines. Nature
communications 9(1):233.
Crandall, J. W. 2014. Towards minimizing disappointment
in repeated games. Journal of Artificial Intelligence Re-
search 49:111–142.
de Graaf, M., and Malle, B. 2017. How people explain
action (and ais should too). In Proceedings of the Artificial
Intelligence for Human-Robot Interaction (AI-for-HRI) fall
symposium.
Deng, X., and Deng, J. 2015. A study of prisoner’s dilemma
game model with incomplete information. Mathematical
Problems in Engineering 2015.
Fudenberg, D.; Rand, D. G.; and Dreber, A. 2012. Slow to
anger and fast to forgive: Cooperation in an uncertain world.
American Economic Review 102(2):720–49.
Gaudesi, M.; Piccolo, E.; Squillero, G.; and Tonda, A. 2014.
Turan: evolving non-deterministic players for the iterated
prisoner’s dilemma. In 2014 IEEE Congress on Evolution-
ary Computation (CEC), 21–27. IEEE.
Kitano, H.; Tambe, M.; Stone, P.; Veloso, M.; Coradeschi,
S.; Osawa, E.; Matsubara, H.; Noda, I.; and Asada, M. 1997.
The robocup synthetic agent challenge 97. In Robot Soccer
World Cup, 62–73. Springer.
Kuhlman, D. M., and Marshello, A. F. 1975. Individual
differences in game motivation as moderators of prepro-
grammed strategy effects in prisoner’s dilemma. Journal of
personality and social psychology 32(5):922.
Lasota, P. A.; Fong, T.; Shah, J. A.; et al. 2017. A survey of methods for safe human-robot interaction. Foundations and Trends in Robotics 5(4):261–349.
Littman, M. L., and Stone, P. 2001. Leading best-response strategies in repeated games. In Seventeenth Annual International Joint Conference on Artificial Intelligence Workshop on Economic Agents, Models, and Mechanisms. Citeseer.
Malle, B. F., and Knobe, J. 1997. The folk concept of
intentionality. Journal of experimental social psychology
33(2):101–121.
Mollaret, C.; Mekonnen, A. A.; Ferrané, I.; Pinquier, J.; and
Lerasle, F. 2015. Perceiving user’s intention-for-interaction:
A probabilistic multimodal data fusion scheme. In 2015
IEEE International Conference on Multimedia and Expo
(ICME), 1–6. IEEE.
Oudah, M.; Rahwan, T.; Crandall, T.; and Crandall, J. W.
2018. How ai wins friends and influences people in repeated
games with cheap talk. In Thirty-Second AAAI Conference
on Artificial Intelligence.
Park, H., and Kim, K.-J. 2016. Active player modeling in
the iterated prisoner’s dilemma. Computational intelligence
and neuroscience 2016:38.
Perner, J. 1991. Understanding the representational mind.
The MIT Press.
Pîrvu, M. C.; Anghel, A.; Borodescu, C.; and Constantin, A.
2018. Predicting user intent from search queries using both
cnns and rnns. arXiv preprint arXiv:1812.07324.
Qu, C.; Yang, L.; Croft, W. B.; Zhang, Y.; Trippas, J. R.;
and Qiu, M. 2019. User intent prediction in information-
seeking conversations. In Proceedings of the 2019 Confer-
ence on Human Information Interaction and Retrieval, 25–
33. ACM.
Rabkina, I., and Forbus, K. D. 2013. Analogical reasoning
for intent recognition and action prediction in multi-agent
systems.
Rios-Martinez, J.; Escobedo, A.; Spalanzani, A.; and
Laugier, C. 2012. Intention driven human aware navigation
for assisted mobility.
Sen, S., and Arora, N. 1997. Learning to take risks. In
AAAI-97 Workshop on Multiagent Learning, 59–64.
Stone, P., and Veloso, M. 2000. Multiagent systems: A
survey from a machine learning perspective. Autonomous
Robots 8(3):345–383.
Tavakkoli, A.; Kelley, R.; King, C.; Nicolescu, M.; Nico-
lescu, M.; and Bebis, G. 2007. A vision-based architecture
for intent recognition. In International Symposium on Visual
Computing, 173–182. Springer.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic
robotics. MIT press.