Evolutionary Advantage of Reciprocity in Collision Avoidance

Daniel Hennes, Daniel Claes, and Karl Tuyls
Department of Knowledge Engineering, Maastricht University
P.O. Box 616, 6200MD Maastricht, The Netherlands
{daniel.hennes,daniel.claes,k.tuyls}@maastrichtuniversity.nl
Abstract. Collision avoidance is a complex task, especially in the presence
of dynamic obstacles. The task increases in complexity when the dynamic
obstacles are mobile robots that also take actions to avoid collisions. On the
other hand, assuming mutual avoidance (reciprocity) can improve avoidance
behavior, since each robot only takes half of the responsibility for avoiding
pairwise collisions. This paper combines research in evolutionary game theory
and multi-robot collision avoidance to analyze the stability of various
velocity-obstacle-based collision avoidance methods in competition. Results
show that reciprocity is advantageous under evolutionary dynamics.
Keywords: Multi-robot collision avoidance, evolutionary game theory,
heuristic payoff tables, replicator dynamics
1 Introduction
Collision avoidance is relevant for a variety of domains and applications, e.g.,
crowd dynamics as well as automobile, aircraft, vessel, or multi-robot collision
avoidance systems. Each of the above is of high importance in everyday life, and
of great theoretical and practical interest. Crowd dynamics [6] is the study of
pedestrian motion and how individuals affect each others’ movements locally. A
better understanding of crowd dynamics makes it possible to accurately simulate
emergency and evacuation scenarios, resulting in improved guidelines to prevent
blockages and jamming in case of a panic stampede.
Driver assistance or fully autonomous cars are further examples of appli-
cations requiring collision avoidance systems. Aircraft and vessel traffic follows
predefined sets of rules to avoid collisions or resolve such situations. Automo-
bile, aircraft and vessel traffic are application areas of collision avoidance where
a game-theoretic analysis has been applied in the past [8, 12, 17].
Finally, with the expected increase in the number of robots deployed in fac-
tories as well as in home environments, there is a need for methods to avoid
collision situations in a variety of multi-robot systems [11, 15].
In this paper we analyze the evolutionary stability of various local collision
avoidance methods in simulation. Local collision avoidance is the task of steering
free of collisions with static and dynamic obstacles, while following a global plan
Table 1. The game of chicken payoff table.

             Swerve      Straight
Swerve       0, 0        −1, 1
Straight     1, −1       −10, −10
to navigate towards a goal location. Static obstacles can be avoided using tra-
ditional planning algorithms whereas dynamic obstacles pose a tough challenge.
An intuitive approach is to observe consecutive obstacle positions in order to
extrapolate the future trajectory. The velocity obstacle [3] is a geometric rep-
resentation of all velocities that will eventually result in a collision given that
the dynamic obstacle maintains the observed velocity. Velocity obstacles find
application in robotics [15, 13, 7, 1, 2] and have also been applied to the study of
crowd dynamics [5].
The remainder of the paper is organized as follows. In Section 2 we introduce
a stylized game example of collision avoidance and evolutionary game theory.
Section 3 covers multi-robot collision avoidance based on the velocity obstacle
paradigm. In particular, we discuss velocity obstacles, reciprocal velocity obsta-
cles and hybrid velocity obstacles. In Section 4 we explain the application of
evolutionary game theory to collision avoidance and discuss the resulting dy-
namics. Finally, Section 5 concludes the paper.
2 Background
In this section we introduce the concepts of evolutionary game theory using a
stylized example, which relates to the situation of collision avoidance: the game
of chicken [10]:
The game of chicken: Two drivers are headed at high speeds for a narrow
passage from opposite directions. If both drivers continue to drive straight
the result is a catastrophic head-on collision. Whoever swerves is considered
a “chicken” and yields the way to the other driver. The best outcome for a
driver is thus continuing straight while the other swerves and is the “chicken”,
thereby successfully avoiding a collision.
We assume identical driver reaction times and turning radii of the cars. There-
fore, this strategic situation occurs in the last instant before a crash is unavoid-
able. Each driver faces the choice of continuing straight or swerving to the side.
The decisions are taken simultaneously and cannot be revoked. The payoffs
for the game of chicken are shown in Table 1. There are two pure equilibria in
the game of chicken: “straight–swerve” with payoffs 1, −1 and “swerve–straight”
with payoffs −1, 1. Neither player has a dominant strategy, as each player’s best
strategy depends on the strategy played by the adversary. In addition, there is
one symmetric mixed Nash equilibrium where both players play “straight” with
probability 1/10 and “swerve” with probability 9/10.

Fig. 1. Symmetric replicator dynamics of the game of chicken. State x1 = 0 corresponds
to a population of “swerve-only”, while x1 = 1 corresponds to a population of
“straight-only”. State x = (1/10, 9/10) is the unique asymptotically stable fixed
point of the dynamics.
2.1 Evolutionary game theory
Traditional game theory assumes perfectly rational and self-interested players.
Consequently, classical game theory aims at finding an optimal strategy that
maximizes the expected utility for a player. In contrast, evolutionary game the-
ory is a descriptive approach. A game is played repeatedly by boundedly rational
players with little or no knowledge of the game. Two randomly matched indi-
viduals play preassigned pure strategies according to their phenotype. In evolu-
tionary game theory, the payoff determines the fitness (value of success) of the
represented strategy, or phenotype. The evolutionary process is modeled using
biologically inspired operators, i.e., natural selection, replication and mutation.
Two core concepts of evolutionary game theory are the replicator dynamics and
evolutionarily stable strategies.
The continuous-time replicator dynamics [14] formally define the population
change over time. An infinite population state is represented by a probability
distribution x over all phenotypes (pure strategies). The payoff function f_i can
be interpreted as the Darwinian fitness of phenotype i:

\dot{x}_i = x_i \Big( f_i(x) - \sum_j x_j f_j(x) \Big)    (1)
Evolutionarily stable states [9] are population distributions that are fixed
points of the replicator dynamics, i.e., \dot{x} = 0, and for which small
perturbations |\hat{x} - x| < \epsilon are driven back to x by selection pressure,
i.e., by following the replicator dynamics.
Figure 1 shows the continuous-time replicator dynamics in the game of chicken
with payoff values as shown in Table 1. All interior points converge to the mixed
Nash equilibrium, which is an evolutionarily stable strategy.
The continuous-time replicator dynamics model a population of individu-
als that are matched up at random and play a one-shot game. The dynamics
shown in Figure 1 are thus the result of the same collision avoidance interaction
occurring over and over with different random drivers.
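The dynamics in Figure 1 can be reproduced with a few lines of code. The sketch below integrates the replicator equation (1) for the game of chicken with a simple Euler scheme; the payoff values (0 for mutual swerving, −1 and 1 for the chicken/winner pair, −10 for a crash) are our reading of Table 1.

```python
# Row player's payoffs in the game of chicken; strategy 0 = swerve, 1 = straight.
# Assumed values: 0 for mutual swerve, -1/1 for chicken/winner, -10 for a crash.
A = [[0.0, -1.0],
     [1.0, -10.0]]

def fitness(x):
    """Expected payoff of each pure strategy against population state x."""
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def replicator_step(x, dt=0.01):
    """One Euler step of the replicator dynamics (1): dx_i = x_i (f_i - f_avg)."""
    f = fitness(x)
    f_avg = sum(xi * fi for xi, fi in zip(x, f))
    return [xi + dt * xi * (fi - f_avg) for xi, fi in zip(x, f)]

x = [0.5, 0.5]                # start from a uniform population
for _ in range(20000):
    x = replicator_step(x)
# x is now close to the mixed ESS: 9/10 swerve, 1/10 straight
```

Starting from any interior state, the population settles at the mixed evolutionarily stable state in which one tenth of the population drives straight.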
3 Velocity Obstacles
Clearly, the game of chicken presents a very abstract (and radical) situation.
Collision avoidance in real domains requires more elaborate decision making than
simply choosing between driving straight or swerving to the side. One approach
to collision avoidance in continuous spaces is the velocity obstacle paradigm. The
velocity obstacle (VO) was first introduced by [3] for local collision avoidance
and navigation in dynamic environments with multiple moving objects. The
subsequent definition of the VO assumes planar motions, though the concept
extends to three dimensional motions in a straightforward manner.
Let us assume a workspace configuration with two robots on a collision course
as shown in Figure 2(a). If the position and speed of the moving object (robot
RB) is known to RA, we can mark a region in the robot’s velocity space which
leads to collision under current velocities and is thus unsafe. This region
resembles a cone with the apex at RB’s velocity vB, and two rays that are
tangential to the convex hull of the Minkowski sum of the footprints of the two
robots. The Minkowski sum for two sets of points A and B is defined as:

A + B = \{ a + b \mid a \in A,\ b \in B \}    (2)

We define the operator \oplus to denote the convex hull of the Minkowski sum,
such that A \oplus B results in the points on the convex hull of the Minkowski
sum of A and B. In the example, the two robots have circular footprints with
radii rA and rB, respectively.
The directions of the left and right rays are then defined as:

\theta_{left} = \max_{p_i \in F_A \oplus F_B} \mathrm{atan2}\big( p_{rel} \times (p_{rel} + p_i),\ p_{rel} \cdot (p_{rel} + p_i) \big)

\theta_{right} = \min_{p_i \in F_A \oplus F_B} \mathrm{atan2}\big( p_{rel} \times (p_{rel} + p_i),\ p_{rel} \cdot (p_{rel} + p_i) \big)

where p_{rel} = p_B - p_A is the relative position of the two robots and F_A \oplus F_B is
the convex hull of the Minkowski sum of the footprints of the two robots. The
atan2 expression computes the signed angle between the two vectors. The resulting
angles \theta_{left} and \theta_{right} lie to the left and right of p_{rel}, respectively.
In the example in Figure 2, robot RA’s velocity vector vA points into the VO;
thus we know that RA and RB are on a collision course. Each robot computes a
VO for each of the other robots. If all robots at any given time step select
velocities outside of the VOs, the trajectories are guaranteed to be collision free.
However, oscillations can still occur when the robots are on a collision course.
Since all robots select a new velocity outside of all velocity obstacles
independently, at the next time step the old velocities pointing towards the goal
become available again. Hence, all robots select their old velocities, which puts
them on a collision course again after the next time step.
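For circular footprints, testing whether a velocity lies inside a VO reduces to comparing the angle between the apex-relative velocity and the relative position against the half-opening angle of the cone. A minimal sketch (the function name and argument layout are ours, not from the paper):

```python
import math

def inside_vo(p_a, p_b, r_a, r_b, v_a, v_b):
    """True if velocity v_a of robot A lies inside the velocity obstacle
    induced by robot B (circular footprints, cone apex at v_b)."""
    p_rel = (p_b[0] - p_a[0], p_b[1] - p_a[1])
    v_rel = (v_a[0] - v_b[0], v_a[1] - v_b[1])   # velocity relative to the apex
    dist = math.hypot(*p_rel)
    r = r_a + r_b                                # radius of the Minkowski disc
    if dist <= r:
        return True                              # footprints already overlap
    half_angle = math.asin(r / dist)             # half-opening angle of the cone
    # signed angle between v_rel and p_rel
    angle = math.atan2(v_rel[1] * p_rel[0] - v_rel[0] * p_rel[1],
                       v_rel[0] * p_rel[0] + v_rel[1] * p_rel[1])
    return math.hypot(*v_rel) > 0 and abs(angle) < half_angle
```

Two robots approaching head-on yield a relative velocity aligned with the relative position, which lies inside the cone; steering sufficiently sideways moves the relative velocity outside it.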
To overcome these oscillations, the reciprocal velocity obstacle (RVO) was
introduced by [15]. The surrounding moving obstacles are in fact also pro-active
robots and thus aim to avoid collisions too. Assuming that each robot takes care
Fig. 2. Creating the different velocity obstacles out of a workspace configuration. (a)
A workspace configuration with two robots RA and RB. (b) Translating the situation
into velocity space and the resulting velocity obstacle (VO) for RA. (c) Translating
the VO by (vA + vB)/2 results in the reciprocal velocity obstacle (RVO), i.e., each
robot has to take care of half of the collision avoidance. (d) Translating the apex of
the RVO to the intersection of the leg of the RVO closest to the own velocity and the
leg of the VO furthest away from the own velocity. This encourages passing the robot
on a preferred side, i.e., in this example passing on the left. The resulting cone is the
hybrid velocity obstacle (HRVO).
of half of the collision avoidance, the apex of the VO can be translated to (vA + vB)/2.
Furthermore, this leads to the property that if every robot chooses a velocity
outside of the RVO closest to the current velocity, the robots will pass on the
same side. However, each robot optimizes its commanded velocity with respect
Fig. 3. (a) Truncation of a VO of a static obstacle at τ= 2 and approximating the
truncation by a line. (b) Translating the truncated cone according to the HRVO method
to get a truncated HRVO.
to a preferred velocity in order to make progress towards its goal location. This
can lead to reciprocal dances, i.e., situations in which both robots first try to
avoid each other on the same side and then both switch to the other side. In a
situation with perfect symmetry and sensing, this behavior continues indefinitely.
To counter these situations, the hybrid velocity obstacle (HRVO) was intro-
duced by [13]. Figure 2(d) shows the construction of the HRVO. To encourage
the selection of a velocity towards the preferred side, e.g. left in this example,
the other leg of the RVO is substituted with the corresponding leg of the VO.
The new apex is the intersection of the line of the one leg from RVO and the
line of the other leg from the VO. This reduces the chance of selecting a velocity
on the “wrong” side of the velocity obstacle and thus the chance of a reciprocal
dance, while not overconstraining the velocity space. The robot might still try
to pass on the “wrong” side, e.g., when another robot induces an HRVO that blocks
the whole side, but all other robots will then soon adapt to the new side too.
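The apex of the HRVO is found by intersecting two leg lines, one taken from the RVO and one from the VO. In the plane this is a small linear solve; a sketch under the assumption that each leg is given as a point plus a direction vector:

```python
def line_intersection(p1, d1, p2, d2):
    """Intersection of two lines given as point + direction.
    Returns None if the lines are (near) parallel."""
    denom = d1[0] * d2[1] - d1[1] * d2[0]        # cross(d1, d2)
    if abs(denom) < 1e-12:
        return None
    # solve p1 + t*d1 = p2 + s*d2 for t
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

The RVO leg closest to the current velocity and the VO leg on the opposite side would be passed in as the two (point, direction) pairs; the returned point becomes the HRVO apex.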
3.1 Truncation
When the workspace is cluttered with many robots that do not move or move
only slowly, the apices of the HRVOs are close to the origin in velocity space,
rendering the robots immobile. This problem can be solved using truncation.
The idea of a truncated hybrid velocity obstacle is best explained by imagining
a static obstacle. A velocity in the direction of the obstacle will eventually
lead to a collision, but not immediately. Hence, we can define an area in which
the selected velocities are safe for at least τ time steps. The truncation is then
in the shape of the Minkowski sum of the two footprints, shrunk by the factor τ.
If the footprints are discs, the shrunken disc that still fits in the truncated
cone has a radius of (rA + rB)/τ, see Figure 3(a). The truncation can be closely
approximated by a line perpendicular to the relative position and tangential to
the shrunken disk. Applying the same method used to create a HRVO from a VO, we
can create a
Fig. 4. ClearPath enumerates intersection points for all pairs of VOs (solid dots). In
addition, the preferred velocity vA is projected onto the closest leg of each VO (open
dots). The point closest to the preferred velocity (dashed line) and outside of all VOs
is selected as the new velocity (solid line).
truncated HRVO out of the truncated VO by translating the apex accordingly,
see Figure 3(b). The same applies to RVOs.
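Truncation at horizon τ keeps exactly those relative velocities whose time to collision exceeds τ. For disc footprints the time to collision solves a quadratic in t; a minimal sketch (the function name is ours):

```python
import math

def time_to_collision(p_rel, v_rel, r):
    """Smallest t >= 0 with |p_rel + t*v_rel| = r, or inf if no collision.
    p_rel: relative position, v_rel: relative velocity, r: sum of radii."""
    a = v_rel[0] ** 2 + v_rel[1] ** 2
    b = 2 * (p_rel[0] * v_rel[0] + p_rel[1] * v_rel[1])
    c = p_rel[0] ** 2 + p_rel[1] ** 2 - r ** 2
    if c <= 0:
        return 0.0                      # footprints already overlap
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return math.inf                 # paths never come within r
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t >= 0 else math.inf

# A relative velocity is admitted by the truncated obstacle iff it is
# collision free for at least tau:
tau = 2.0
safe = time_to_collision((4.0, 0.0), (-1.0, 0.0), 0.4) >= tau
```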
3.2 ClearPath
To efficiently compute collision free velocities, we employ the ClearPath algo-
rithm introduced by [4]. The algorithm is applicable to many variations of ve-
locity obstacles (VO, RVO or HRVO) represented by line segments or rays.
ClearPath follows the general idea that the collision-free velocity closest to the
preferred velocity is: (a) on the intersection of two line segments of any two
velocity obstacles, or (b) the projection of the preferred velocity onto the
closest leg of one of the velocity obstacles. All points that lie within another
obstacle are discarded, and from the remaining set the one closest to the preferred
velocity is selected. Figure 4 shows the graphical interpretation of the algorithm.
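The candidate-enumeration idea can be sketched for cones given as an apex plus unit left/right leg directions. For brevity this version uses only the preferred velocity and its leg projections as candidates, omitting the pairwise leg intersections of the full algorithm; all names are ours:

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1])
def cross(a, b): return a[0] * b[1] - a[1] * b[0]
def dot(a, b): return a[0] * b[0] + a[1] * b[1]

def inside_cone(p, cone, eps=1e-9):
    """Strictly inside a cone (apex, d_left, d_right); legs count as outside."""
    apex, d_left, d_right = cone
    q = sub(p, apex)
    return cross(d_right, q) > eps and cross(d_left, q) < -eps

def project_on_leg(v, apex, d):
    """Project v onto the ray apex + t*d (t >= 0), with d of unit length."""
    t = max(0.0, dot(sub(v, apex), d))
    return (apex[0] + t * d[0], apex[1] + t * d[1])

def clearpath(v_pref, cones):
    """Closest feasible candidate velocity (leg intersections omitted)."""
    candidates = [v_pref]
    for apex, d_left, d_right in cones:
        candidates.append(project_on_leg(v_pref, apex, d_left))
        candidates.append(project_on_leg(v_pref, apex, d_right))
    feasible = [c for c in candidates
                if not any(inside_cone(c, k) for k in cones)]
    return min(feasible, key=lambda c: math.dist(c, v_pref)) if feasible else None
```

If the preferred velocity is outside all cones it is returned unchanged; otherwise the nearest boundary projection is chosen.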
4 Evolutionary Analysis
We have introduced three variations of the velocity obstacle approach, namely
VO, RVO, and HRVO. Evaluating the performance of these methods, especially
in a heterogeneous setting, is the aim of this paper. We perform an evolutionary
analysis based on heuristic payoff tables [16] to approximate an infinite
population. The heuristic payoff table H captures the payoff information for all
possible discrete distributions N_i of a finite population with n individuals. The
payoff for an arbitrary continuous population state x is computed as the weighted
average over all rows of the heuristic payoff table, where payoffs are weighted by
the probability that the discrete distribution of a particular row N_i is the result
of drawing n individuals according to x. For further details, see Section 4.1.
4.1 Method
The evolutionary model assumes an infinite population. We cannot compute the
payoff for such a population directly, but we can approximate it from evaluations
of a finite population.
All possible distributions over k strategy types can be enumerated for a finite
population with n individuals. Let N be a matrix, where each row N_i contains one
discrete distribution. The matrix has \binom{n+k-1}{n} rows. Each distribution
can be simulated, returning a vector of average expected payoffs u(N_i). Let U
be a matrix that captures the payoffs corresponding to the rows in N, i.e.,
U_i = u(N_i). A heuristic payoff table H = (N, U) is proposed in [16] to capture
the payoff information for all possible discrete distributions in a finite population.
In order to approximate the payoff for an arbitrary mix of strategies x in an
infinite population distributed over the phenotypes according to x, n individuals
are drawn randomly from the infinite distribution. The probability of selecting
a specific row N_i can be computed from x and N_i:

P(N_i \mid x) = \binom{n}{N_{i,1}, N_{i,2}, \ldots, N_{i,k}} \prod_{j=1}^{k} x_j^{N_{i,j}}

The expected payoff f_i(x) is computed as the weighted combination of the payoffs
given in all rows, compensating for payoffs that cannot be measured. If a discrete
distribution features zero individuals of a certain strategy type, its payoff
cannot be measured and U_{j,i} = 0:

f_i(x) = \frac{\sum_j P(N_j \mid x)\, U_{j,i}}{1 - (1 - x_i)^n}
This expected payoff can be used in (1) to compute the evolutionary change
according to the replicator dynamics.
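The two formulas of this section translate directly into code. The sketch below evaluates P(N_i | x) and f_i(x) on a toy heuristic payoff table with k = 2 strategies and n = 2 individuals; the payoff numbers are invented purely for illustration:

```python
import math

def multinomial(n, counts):
    """Multinomial coefficient n! / (c_1! * ... * c_k!)."""
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    return coef

def prob_of_row(Ni, x):
    """P(N_i | x): probability that drawing n individuals from x yields N_i."""
    n = sum(Ni)
    p = multinomial(n, Ni)
    for xj, nij in zip(x, Ni):
        p *= xj ** nij
    return p

def expected_payoff(i, x, N, U):
    """f_i(x), normalised by the probability that type i appears in the draw."""
    n = sum(N[0])
    num = sum(prob_of_row(Nj, x) * Uj[i] for Nj, Uj in zip(N, U))
    return num / (1.0 - (1.0 - x[i]) ** n)

# Toy table: n = 2 individuals, k = 2 strategies.
N = [(2, 0), (1, 1), (0, 2)]               # all discrete distributions
U = [(1.0, 0.0), (0.5, 2.0), (0.0, 3.0)]   # invented payoffs; 0 where unmeasurable
x = (0.5, 0.5)
f0 = expected_payoff(0, x, N, U)
```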
4.2 Experimental Setup
To compute the payoffs corresponding to each finite population N_i we consider
the following scenario. All robots have a circular footprint with a radius of 0.2 m
and move with a maximum speed of 0.5 m/s. Robots are initially located on a
circle (equally spaced) with a radius of 10 m and the goal locations are set to the
antipodal positions, i.e., each robot’s shortest path is through the center of the
circle. Figure 5 shows example trajectories for 6 robots. The goal is assumed to
be reached when the robot’s center is within a 0.01 m radius of the goal location.
The performance of robot i is the negative value of its time of arrival, denoted
as T_i. Heuristic payoff tables are computed for n = 12 robots, leading to 91
rows. Payoffs for each discrete distribution N_i are averaged over (a maximum
of) 20 random permutations of the initial positions of the robots.¹
¹ Some discrete distributions, e.g., all robots of one type, do not allow for 20 different
permutations.
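The scenario itself is straightforward to generate: n equally spaced start positions on a circle of radius 10 m, with each goal at the diametrically opposite point. A sketch (parameter values taken from the setup above):

```python
import math

def circle_scenario(n_robots, radius=10.0):
    """Start positions equally spaced on a circle; goals at antipodal points."""
    starts, goals = [], []
    for i in range(n_robots):
        a = 2.0 * math.pi * i / n_robots
        s = (radius * math.cos(a), radius * math.sin(a))
        starts.append(s)
        goals.append((-s[0], -s[1]))   # antipodal point
    return starts, goals

starts, goals = circle_scenario(12)
```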
Fig. 5. Trajectories of 6 RVO robots initially positioned on a circle with goal locations
set to the antipodal positions.
HRVO RVO
VO
Fig. 6. Evolutionary dynamics of a population mixing between the avoidance strate-
gies: velocity obstacle (VO), reciprocal velocity obstacle (RVO) and hybrid-velocity
obstacle (HRVO). Asymptotically stable attractors are depicted by solid circles; unsta-
ble rest points are shown as open circles.
4.3 Results and Discussion
Figure 6 shows the evolutionary dynamics of a population with robots of types
VO, RVO, and HRVO using no truncation of the velocity obstacles, i.e., τ = ∞.
All “pure” population states are asymptotically stable fixed points under the
replicator dynamics. However, the strategy space is not partitioned equally
between the attractors; the basin of attraction for RVO is considerably smaller.
Between each pair of strategies there is one repeller at the uniform mixture. In
addition, there is one saddle point at (0.29, 0.49, 0.22).
We do not see any dominant strategy in a heterogeneous setting including all
three variations of the velocity obstacle. Also in pairwise comparisons between
two strategies, i.e., along the faces of the simplex, no strategy is inferior. All
three strategies are evolutionarily stable.
HRVO RVO
VO
(a) Truncated VO, RVO and HRVO with τ = 10.
RVO (τ = 2)   RVO (τ = 10)   RVO (τ = ∞)
(b) Truncated RVO.
Fig. 7. Evolutionary dynamics of collision avoidance with truncation. Asymptotically
stable attractors are depicted by solid circles; unstable rest points are shown as open
circles.
Figure 7(a) shows the dynamics for the same strategies with truncation τ = 10.
Strategies VO and HRVO are still asymptotically stable, while RVO is a repeller.
The interior stable fixed point at (0.23, 0.49, 0.28) has the largest basin of
attraction, amounting to more than half of the strategy space.
Introducing truncation leads to significantly different and more complex dy-
namics. In a pairwise comparison (faces of the simplex), RVO is dominated by
VO as well as HRVO. However, the reciprocal velocity obstacle is most robust in
the presence of all three strategies (interior of the simplex). Considering Figure 2,
we see that RVO is the most “aggressive” or least restricting velocity obstacle.
The collision space of a RVO is always a subset of the corresponding VO for
moving obstacles. This is due to the assumption that other robots take care of
half of the collision avoidance. VO and HRVO are both more conservative and
thus restrict the admissible velocity space more.
Finally, Figure 7(b) shows a comparison of different levels of truncation for
the reciprocal velocity obstacle. In particular, we use τ = ∞ (no truncation),
τ = 2, and τ = 10. In this comparison, robots using RVOs with no truncation
(τ = ∞) are strictly dominated; both “pure” population states using truncation
are asymptotically stable. Truncation with τ = 2 has the largest basin of attraction.
Truncation with low values of τ is less restrictive, and robots continue on
a straight path until the truncated velocity obstacle takes effect. As such, the
average time of arrival is shorter and the performance increases. In the presence
of robots employing velocity obstacles with less truncation, differing truncation
horizons lead to situations where robots are “trapped” near the center. In
particular, robots with τ = 2 drive straight towards the center, while robots
with τ = 10 are affected by the velocity obstacles sooner and enter a spiraling
motion.
5 Conclusions and Future Work
We have studied three variations of the velocity obstacle approach in competition,
i.e., the velocity obstacle, the reciprocal velocity obstacle and the hybrid
velocity obstacle. Without truncation all three types perform equally well and
we do not find a dominant strategy. With the use of truncated velocity obstacles
the dynamics become more complex; the reciprocal velocity obstacle is most robust
in the heterogeneous system, however, in pairwise comparison this strategy
is inferior.
Our evaluation is based on a scenario commonly used in literature to show-
case the velocity obstacle approach, i.e., robots are initially located on a circle
with their goal locations set to the antipodal positions [15, 13, 7, 1]. A natural
extension is to consider various other scenarios, e.g., robots moving in a free
space or in the presence of obstacles with randomly generated goal locations.
However, it must be taken into account that such a setting requires a global
navigation strategy, which might have an effect on the performance of the local
collision avoidance.
Furthermore, the extension to a less symmetric and stylized configuration
also allows us to evaluate aggressive “straight”-driving robots that do not adhere
to any of the collision avoidance schemes, as suggested by the game of chicken.
References
1. Daniel Claes, Daniel Hennes, Wim Meeussen, and Karl Tuyls. CALU: Multi-robot
collision avoidance with localization uncertainty (Demonstration). In Proceedings
of 11th International Conference on Adaptive Agents and Multi-agent Systems
(AAMAS 2012), Valencia, Spain, June 2012.
2. Daniel Claes, Daniel Hennes, Karl Tuyls, and Wim Meeussen. Collision avoidance
under bounded localization uncertainty. In Proceedings of IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2012), Vilamoura, Portugal,
October 2012.
3. Paolo Fiorini and Zvi Shiller. Motion planning in dynamic environments using
velocity obstacles. International Journal of Robotics Research, 17:760–772, July
1998.
4. Stephen J. Guy, Jatin Chhugani, Changkyu Kim, Nadathur Satish, Ming C. Lin,
Dinesh Manocha, and Pradeep Dubey. Clearpath: Highly parallel collision avoid-
ance for multi-agent simulation. In Symposium on Computer Animation, 2009.
5. Stephen J. Guy, Jur van den Berg, Wenxi Liu, Rynson Lau, Ming C. Lin, and
Dinesh Manocha. A statistical similarity measure for aggregate crowd dynamics.
ACM Trans. Graph., 31(6):190:1–190:11, November 2012.
6. Dirk Helbing and Anders Johansson. Pedestrian, crowd and evacuation dynamics.
In Robert A. Meyers, editor, Encyclopedia of Complexity and Systems Science,
pages 6476–6495. Springer, 2009.
7. Daniel Hennes, Daniel Claes, Wim Meeussen, and Karl Tuyls. Multi-robot collision
avoidance with localization uncertainty. In Proceedings of 11th International Con-
ference on Adaptive Agents and Multi-agent Systems (AAMAS 2012), Valencia,
Spain, June 2012.
8. Rainer Lachner, Michael H. Breitner, and H. Josef Pesch. Real-time collision avoid-
ance: Differential game, numerical solution, and synthesis of strategies. In Jerzy A.
Filar, Vladimir Gaitsgory, and Koichi Mizukami, editors, Advances in Dynamic
Games and Applications, volume 5 of Annals of the International Society of Dy-
namic Games, pages 115–135. Birkhäuser Boston, 2000.
9. John Maynard Smith. The theory of games and the evolution of animal conflicts.
Journal of Theoretical Biology, 47(1):209–221, September 1974.
10. Anatol Rapoport and Albert M. Chammah. The Game of Chicken. American
Behavioral Scientist, 10(3):10–28, November 1966.
11. J.J. Rebollo, I. Maza, and A. Ollero. Collision avoidance among multiple aerial
robots and other non-cooperative aircrafts based on velocity planning. In Proceed-
ings of the 7th Conference On Mobile Robots And Competitions, Paderne, Portugal,
2007.
12. C. Shi, M. Zhang, and J. Peng. Vessel collision avoidance in close-quarter situa-
tion using differential games. In Proceedings of the International Conference on
Transportation Engineering, 2007.
13. Jamie Snape, Jur P. van den Berg, Stephen J. Guy, and Dinesh Manocha. The
hybrid reciprocal velocity obstacle. IEEE Transactions on Robotics, 2011.
14. Peter D. Taylor and Leo Jonker. Evolutionary stable strategies and game dynamics.
Mathematical Biosciences, 40:145–156, 1978.
15. Jur van den Berg, Ming Lin, and Dinesh Manocha. Reciprocal velocity obstacles
for real-time multi-agent navigation. In Proceedings of the IEEE International
Conference on Robotics and Automation (ICRA 2008), pages 1928–1935, 2008.
16. William E. Walsh, Rajarshi Das, Gerald Tesauro, and Jeffrey O. Kephart. An-
alyzing complex strategic interactions in multi-agent systems. In Proceedings of
the Workshop on Game Theoretic and Decision Theoretic Agents, pages 109–118,
2002.
17. Ji Xiaohui, Zhang Xuejun, and Guan Xiangmin. A collision avoidance method
based on satisfying game theory. In Intelligent Human-Machine Systems and Cy-
bernetics (IHMSC), 2012 4th International Conference on, volume 2, pages 96–99,
August 2012.
... Replicator dynamics has been used in the comparison of velocity obstacle methods, where each method represents a competitive strategy in a finite population [18]. This evolutionary game theory algorithm has also been shown to match the dynamics of several reinforcement learning algorithms [17]. ...
... It should be noted that the coordinates of the patches' centres and the coordinates of a vessel are taken as unitless values. We define scalar s i , with (18). ...
... The total cost function of an individual OV agent j is a weighted sum of individual cost functions (18). ...
Conference Paper
Full-text available
This paper presents a decision support system for marine vehicle collision avoidance that utilizes agent-based modelling. It generates waypoints, consecutive strategies of heading changes, and if necessary, speed changes to avoid risky collision situations in multi-vessel encounters. The global collision risk metric is defined as a weighted sum of cost functions, which are computed for each target vessel based on factors such as distance and relative velocity. An evolutionary game theory algorithm based on the replicator dynamics concept is applied to determine the best strategy using competitive agents. The proposed method aims to optimize vessel trajectories considering the risk of collision and deviated path length. The feasibility of the approach is demonstrated using simulation in the NetLogo modelling tool and provides insights into how to define an appropriate model for a scalable agent-based application for vessel guidance algorithm verification.
... For example, this allows us to study the evolutionary dynamics of various trading strategies in stock markets Hennes, Bloembergen, Kaisers, Tuyls, & Parsons, 2012;Bloembergen, Hennes, McBurney, & Tuyls, 2015). Similarly, it is possible to compare auction mechanisms (Phelps, Parsons, & McBurney, 2005), strategies in the game of poker (Ponsen, Tuyls, Kaisers, & Ramon, 2009), or even collision avoidance methods in multi-robot systems (Hennes, Claes, & Tuyls, 2013). Moreover, the link between the replicator dynamics and reinforcement learning allows us to predict what will happen when agents learn to optimise their strategy in such scenarios. ...
... However, assuming mutual avoidance (reciprocity) may potentially improve avoidance behaviour since each robot only needs to take half of the responsibility of avoiding pairwise collisions. In order to test this hypothesis we can employ the same meta-strategy approach to evaluate the evolutionary strength of different collision avoidance strategies (Hennes et al., 2013). One approach to collision avoidance in continuous spaces is the velocity obstacle (VO) paradigm, first introduced by Fiorini and Shiller (1998) for local collision avoidance and navigation in dynamic environments with multiple moving objects. ...
... Truncation yields significantly different and more complex dynamics, as shown in Figure 11 (right). In a pairwise comparison (faces of the simplex), RVO is dominated by VO as well as by HRVO. Figure 11: Evolutionary dynamics of three strategies for multi-robot collision avoidance, without truncation (left) and with truncation (right) (Hennes et al., 2013). ...
Article
Full-text available
The interaction of multiple autonomous agents gives rise to highly dynamic and nonde- terministic environments, contributing to the complexity in applications such as automated financial markets, smart grids, or robotics. Due to the sheer number of situations that may arise, it is not possible to foresee and program the optimal behaviour for all agents be- forehand. Consequently, it becomes essential for the success of the system that the agents can learn their optimal behaviour and adapt to new situations or circumstances. The past two decades have seen the emergence of reinforcement learning, both in single and multi- agent settings, as a strong, robust and adaptive learning paradigm. Progress has been substantial, and a wide range of algorithms are now available. An important challenge in the domain of multi-agent learning is to gain qualitative insights into the resulting system dynamics. In the past decade, tools and methods from evolutionary game theory have been successfully employed to study multi-agent learning dynamics formally in strategic interactions. This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively. Furthermore, new learning algorithms that have been introduced us- ing these evolutionary game theoretic tools are reviewed. The evolutionary models can be used to study complex strategic interactions. Examples of such analysis are given for the domains of automated trading in stock markets and collision avoidance in multi-robot sys- tems. The paper provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning by highlighting the main results and accomplishments.
... While these complex economic problems continue to be a primary application area of these methods [5,37,38,41], the general technique has been applied in many different settings. These include analysis of interactions among heuristic meta-strategies in poker [24], network protocol compliance [43], collision avoidance in robotics [11], and security games [20,25,48]. Research that followed on Walsh's [39] initial work branched off in two directions: the first strand of work focused on strategic reasoning for simulation-based games [44], while the second strand focused on the evolutionary dynamical analysis of agent behavior inspired by evolutionary game theory [31,33]. ...
... Evolutionary dynamics (foremost replicator dynamics) have often been presented as a practical tool for analyzing interactions among meta-strategies found in EGTA [2,11,39], and for studying the change in policies of multiple learning agents [3], as the EGTA approach is largely based on the same assumptions as evolutionary game-theory, viz. repeated interactions among sub-groups sampled independently at random from an arbitrarily-large population of agents. ...
Article
Full-text available
This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for empirical game theoretical analysis of complex multi-agent interactions. In doing so we provide insights into the empirical meta-game, showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state-of-the-art has only considered evolutionary dynamics of symmetric HPTs in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable. Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture the flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.
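Given a symmetric heuristic payoff table, the expected payoff of each strategy under a population mix follows by weighting each discrete count profile with its multinomial probability. A minimal sketch, where the two-strategy example table is hypothetical (a Prisoner's Dilemma encoded as an HPT, not data from any of the papers above):

```python
import itertools
from math import factorial
import numpy as np

def expected_payoffs(hpt, x, players):
    """Expected payoff of each pure strategy when the focal player adopts it
    and the remaining players are drawn i.i.d. from the population mix x.
    hpt maps a count profile (n_1, ..., n_k) to the payoff of each strategy."""
    k = len(x)
    u = np.zeros(k)
    for i in range(k):
        for others in itertools.combinations_with_replacement(range(k), players - 1):
            counts = [others.count(j) for j in range(k)]
            counts[i] += 1                      # add the focal player
            orderings = factorial(players - 1)  # multinomial coefficient
            for j in range(k):
                orderings //= factorial(others.count(j))
            prob = orderings * np.prod([x[j] ** others.count(j) for j in range(k)])
            u[i] += prob * hpt[tuple(counts)][i]
    return u

# Two-player, two-strategy example (a Prisoner's Dilemma encoded as an HPT);
# None marks entries for strategies absent from a profile.
hpt = {(2, 0): [3.0, None],
       (1, 1): [0.0, 5.0],
       (0, 2): [None, 1.0]}
print(expected_payoffs(hpt, np.array([0.5, 0.5]), players=2))  # [1.5, 3.0]
```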
... While these complex economic problems continue to be a primary application area of these methods [71][72][73][74], the general techniques have been applied in many different settings. These include analysis of interactions among heuristic meta-strategies in poker [75], network protocol compliance [76], collision avoidance in robotics [77], and security games [78][79][80]. ...
... Evolutionary dynamics have often been presented as a practical tool for analyzing interactions among meta-strategies found in EGTA [6,33,77], and for studying the change in policies of multiple learning agents [33], as the EGTA approach is largely based on the same assumptions as evolutionary game-theory, viz. repeated interactions among sub-groups sampled independently at random from an arbitrarily-large population of agents. ...
Preprint
We introduce α-Rank, a principled evolutionary dynamics methodology, for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous-time and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of agents, in the type of interactions (beyond dyadic), and the type of empirical games (symmetric and asymmetric). Current models are fundamentally limited in one or more of these dimensions, and are not guaranteed to converge to the desired game-theoretic solution concept (typically the Nash equilibrium). α-Rank automatically provides a ranking over the set of agents under evaluation and provides insights into their strengths, weaknesses, and long-term dynamics in terms of basins of attraction and sink components. This is a direct consequence of our new model's direct correspondence to the dynamical MCC solution concept when its ranking-intensity parameter, α, is chosen to be large, which exactly forms the basis of α-Rank. In contrast to the Nash equilibrium, which is a static solution concept based solely on fixed points, MCCs are a dynamical solution concept based on the Markov chain formalism, Conley's Fundamental Theorem of Dynamical Systems, and the core ingredients of dynamical systems: fixed points, recurrent sets, periodic orbits, and limit cycles. Our α-Rank method runs in polynomial time with respect to the total number of pure strategy profiles, whereas computing a Nash equilibrium for a general-sum game is known to be intractable. We introduce mathematical proofs that reveal the formal underpinnings of the α-Rank methodology. We illustrate the method in canonical games and in AlphaGo, AlphaZero, MuJoCo Soccer, and Poker.
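As a rough illustration of the idea, a simplified single-population variant of the α-Rank construction can be sketched as follows: monomorphic populations form the states of a Markov chain, a rare mutant fixates with a probability driven by the payoff difference, and the stationary distribution of the chain yields the ranking. The payoff matrix and parameters below are illustrative, not taken from the paper.

```python
import numpy as np

def alpha_rank(M, alpha=10.0, m=50):
    """Simplified single-population alpha-Rank sketch for a pairwise game M.
    States are monomorphic populations; a rare mutant j takes over residents i
    with a finite-population fixation probability rho(i -> j)."""
    k = M.shape[0]
    C = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            df = M[j, i] - M[i, i]   # mutant fitness minus resident fitness
            if abs(df) < 1e-12:
                rho = 1.0 / m        # neutral drift
            else:
                # exponents clipped to avoid overflow for strongly disfavoured mutants
                num = 1.0 - np.exp(min(-alpha * df, 700.0))
                den = 1.0 - np.exp(min(-alpha * m * df, 700.0))
                rho = num / den
            C[i, j] = rho / (k - 1)
        C[i, i] = 1.0 - C[i].sum()
    # stationary distribution: left eigenvector of C for eigenvalue 1
    w, v = np.linalg.eig(C.T)
    pi = np.real(v[:, np.argmax(np.real(w))])
    return pi / pi.sum()

# A strictly dominant strategy absorbs essentially all the ranking mass:
print(alpha_rank(np.array([[3.0, 3.0], [1.0, 1.0]])))
```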
... In a nutshell, the HPT makes it possible to analyse the multiplayer game theoretically as long as the game is symmetric. This method has been applied to many areas such as continuous double auctions, poker games and multi-robot systems (Hennes, Claes, and Tuyls 2013). ...
Preprint
Full-text available
Evolutionary game theory has been a successful tool to combine classical game theory with learning-dynamical descriptions in multiagent systems. Provided some symmetric structures of interacting players, many studies have focused on using a simplified heuristic payoff table as input to analyse the dynamics of interactions. Nevertheless, even the state-of-the-art method has two limitations. First, there is inaccuracy when analysing the simplified payoff table. Second, no existing work is able to deal with 2-population multiplayer asymmetric games. In this paper, we fill the gap between the heuristic payoff table and the dynamical analysis without any inaccuracy. In addition, we propose a general framework for m versus n 2-population multiplayer asymmetric games. Then, we compare our method with the state-of-the-art in some classic games. Finally, to illustrate our method, we perform empirical game-theoretical analysis on Wolfpack as well as StarCraft II, both of which involve complex multiagent interactions.
... Originally, Empirical Game Theory was introduced to reduce and study the complexity of large economic problems in electronic commerce, e.g., continuous double auctions [50][51][52], and it has later also been applied in various other domains and settings [22,[36][37][38]47]. Empirical game theoretic analysis and the effects of uncertainty in payoff tables (in the form of noisy payoff estimates and/or missing table elements) on the computation of Nash equilibria have been studied for some time [1,15,27,43,49,54,55], with contributions including sample complexity bounds for accurate equilibrium estimation [49], adaptive sampling algorithms [55], payoff query complexity results of computing approximate Nash equilibria in various types of games [15], and the formulation of particular varieties of equilibria robust to noisy payoffs [1]. ...
Preprint
This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents. Traditionally, researchers have relied on Elo ratings for this purpose, with recent works also using methods based on Nash equilibria. Unfortunately, Elo is unable to handle intransitive agent interactions, and other techniques are restricted to zero-sum, two-player settings or are limited by the fact that the Nash equilibrium is intractable to compute. Recently, a ranking method called α-Rank, relying on a new graph-based game-theoretic solution concept, was shown to tractably apply to general games. However, evaluations based on Elo or α-Rank typically assume noise-free game outcomes, despite the data often being collected from noisy simulations, making this assumption unrealistic in practice. This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We derive sample complexity guarantees required to confidently rank agents in this setting. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker.
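For reference, the classical Elo update that this line of work contrasts against is a one-line logistic rule. The sketch below is the textbook version, not the evaluation pipeline of any paper cited here.

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update: score_a is 1 for a win, 0.5 for a draw, 0 for a loss of A."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta   # zero-sum update: the total rating is conserved

# A beats B repeatedly: A's rating rises while B's falls.
ra, rb = 1500.0, 1500.0
for _ in range(20):
    ra, rb = elo_update(ra, rb, 1.0)
print(ra, rb)
```

Because the update depends only on the rating difference, intransitive (rock-paper-scissors-like) interactions cannot be represented, which is exactly the limitation the abstract above points out.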
... However, assuming mutual avoidance (reciprocity) may potentially improve avoidance behaviour since each robot only takes half of the responsibility of avoiding pairwise collisions. In order to test this hypothesis, Hennes et al. (2013) employ the aforementioned meta-strategy approach to evaluate the evolutionary strength of different collision avoidance strategies, by simulating a multi-robot system in which those strategies are employed, and estimating their payoff functions based on those simulations. They find that reciprocity is robust in the presence of alternative, non-reciprocal collision avoidance strategies. ...
... Such empirical game theoretic analysis has proven valuable in getting insights into various complex real-world domains, such as automated trading [18], auction mechanism design [19], the game of poker [20], collision avoidance in multi-robot systems [21], adaptive cyber-defence strategies [22] and large-scale bargaining [23]. In this work, we follow a similar approach, but focus on the domain of space debris removal. ...
Article
Full-text available
We analyse active space debris removal efforts from a strategic, game-theoretical perspective. Space debris consists of non-manoeuvrable, human-made objects orbiting Earth, which pose a significant threat to operational spacecraft. Active debris removal missions have been considered and investigated by different space agencies with the goal to protect valuable assets present in strategic orbital environments. An active debris removal mission is costly, but has a positive effect for all satellites in the same orbital band. This leads to a dilemma: each agency is faced with the choice between the individually costly action of debris removal, which has a positive impact on all players, and waiting in the hope that others jump in and do the ‘dirty’ work. The risk of the latter is that, if everyone waits, the joint outcome will be catastrophic, leading to what in game theory is referred to as the ‘tragedy of the commons’. We introduce and thoroughly analyse this dilemma using empirical game theory and a space debris simulator. We consider two- and three-player settings, investigate the strategic properties and equilibria of the game and find that the cost/benefit ratio of debris removal strongly affects the game dynamics.
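The dilemma structure can be made concrete with a hypothetical two-player 'remove or wait' matrix in the style of a volunteer's dilemma; the payoff values below are purely illustrative and are not those produced by the paper's space debris simulator.

```python
import numpy as np

def debris_game(benefit, cost, loss):
    """Hypothetical 2-player 'remove or wait' payoff matrix (illustrative only):
    a removal benefits both players, waiting free-rides on the other's removal,
    and mutual waiting incurs the collision loss."""
    # rows: own action (0 = Remove, 1 = Wait); columns: opponent's action
    return np.array([[benefit - cost, benefit - cost],
                     [benefit,        -loss]])

A = debris_game(benefit=10.0, cost=4.0, loss=20.0)
# Anti-coordination structure: the best response to 'Remove' is to free-ride,
# while the best response to 'Wait' is to remove.
print(np.argmax(A[:, 0]), np.argmax(A[:, 1]))
```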
Article
Full-text available
Contemporary developments of on-board systems for automatic or semiautomatic driving include car collision avoidance systems. For this purpose two approaches based on pursuit-evasion differential games are compared. On a freeway a correct driver (evader) is faced with a wrong-way driver (pursuer), i.e., a person driving on the wrong side of the road. The correct driver tries to avoid collision against all possible maneuvers of the wrong-way driver and additionally tries to stay on the freeway. The representation of the optimal collision avoidance behavior along many optimal paths is used to synthesize an optimal collision avoidance strategy by means of neural networks. Examples of simulations that prove a satisfactory performance of the real-time collision avoidance scheme are presented.
Article
Full-text available
We present an information-theoretic method to measure the similarity between a given set of observed, real-world data and a visual simulation technique for aggregate crowd motions of a complex system consisting of many individual agents. This metric uses a two-step process to quantify a simulator's ability to reproduce the collective behaviors of the whole system, as observed in the recorded real-world data. First, Bayesian inference is used to estimate the simulation states that best correspond to the observed data; then a maximum likelihood estimator is used to approximate the prediction errors. This process is iterated using the EM algorithm to produce a robust, statistical estimate of the magnitude of the prediction error, as measured by its entropy (smaller is better). This metric serves as a simulator-to-data similarity measurement. We evaluated the metric in terms of robustness to sensor noise, consistency across different datasets and simulation methods, and correlation to perceptual metrics.
Conference Paper
Full-text available
We present a multi-mobile robot collision avoidance system based on the velocity obstacle paradigm. Current positions and velocities of surrounding robots are translated to an efficient geometric representation to determine safe motions. Each robot uses on-board localization and local communication to build the velocity obstacle representation of its surroundings. Our close and error-bounded convex approximation of the localization density distribution results in collision-free paths under uncertainty. While in many algorithms the robots are approximated by circumscribed radii, we use the convex hull to minimize the overestimation in the footprint. Results show that our approach allows for safe navigation even in densely packed environments.
Conference Paper
Full-text available
This paper describes a multi-robot collision avoidance system based on the velocity obstacle paradigm. In contrast to previous approaches, we alleviate the strong requirement for perfect sensing (i.e. global positioning) using Adaptive Monte-Carlo Localization on a per-agent level. While such methods as Optimal Reciprocal Collision Avoidance guarantee local collision-free motion for a large number of robots, given perfect knowledge of positions and speeds, a realistic implementation requires further extensions to deal with inaccurate localization and message passing delays. The presented algorithm bounds the error introduced by localization and combines the computation for collision-free motion with localization uncertainty. We provide an open source implementation using the Robot Operating System (ROS). The system is tested and evaluated with up to eight robots in simulation and on four differential drive robots in a real-world situation.
Conference Paper
To solve the collision avoidance problem of two vessels in a close-quarter situation, differential game theory is introduced to investigate anti-collision strategies. A mathematical model for collision avoidance is formulated using differential game theory, and the collision avoidance problem in an urgent situation is converted into a differential game problem. Using the differential game method, the optimal control strategies for the two vessels are deduced from the mathematical model. Numerical simulation of various meeting situations is performed. The research indicates that the control strategy is optimal when the two vessels have the same acceleration direction. It is also shown that differential game theory is an effective tool for dealing with collision avoidance problems.
Conference Paper
Flight safety in both high- and low-altitude airspace has become increasingly important in recent years. In this paper, considering the free-flight characteristics of aircraft in airspace, a decentralized approach based on satisficing game theory is proposed. This method can eliminate conflicts among aircraft effectively, and it can also resolve collisions with obstacles, which mainly include special use airspace (SUA) and severe weather regions in high-altitude airspace, and obstacles in low-altitude airspace. The simulations show good performance in a complex scenario in which a large number of aircraft fly through an airspace with obstacles at the same time.
Article
This paper presents a method for robot motion planning in dynamic environments. It consists of selecting avoidance maneuvers to avoid static and moving obstacles in the velocity space, based on the current positions and velocities of the robot and obstacles. It is a first-order method, since it does not integrate velocities to yield positions as functions of time. The avoidance maneuvers are generated by selecting robot velocities outside of the velocity obstacles, which represent the set of robot velocities that would result in a collision with a given obstacle that moves at a given velocity, at some future time. To ensure that the avoidance maneuver is dynamically feasible, the set of avoidance velocities is intersected with the set of admissible velocities, defined by the robot's acceleration constraints. Computing new avoidance maneuvers at regular time intervals accounts for general obstacle trajectories. The trajectory from start to goal is computed by searching a tree of feasible avoidance maneuvers, computed at discrete time intervals. An exhaustive search of the tree yields near-optimal trajectories that either minimize distance or motion time. A heuristic search of the tree is applicable to on-line planning. The method is demonstrated for point and disk robots among static and moving obstacles, and for an automated vehicle in an intelligent vehicle highway system scenario.
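The core geometric test behind the velocity obstacle paradigm, whether a candidate velocity leads to a collision at some future time, reduces to a ray-disc intersection for disc-shaped robots. A minimal sketch, simplified to the time-unbounded case and disc footprints:

```python
import numpy as np

def in_velocity_obstacle(p_rel, v_a, v_b, radius):
    """True if velocity v_a lies inside the velocity obstacle induced by an
    obstacle at relative position p_rel moving with velocity v_b, where radius
    is the sum of the two disc-approximated footprints. The test checks whether
    the ray of the relative velocity v_a - v_b hits the combined-radius disc."""
    v_rel = np.asarray(v_a, float) - np.asarray(v_b, float)
    p_rel = np.asarray(p_rel, float)
    if np.dot(v_rel, p_rel) <= 0:        # moving away (or standing still)
        return False
    # distance from the disc centre to the ray spanned by v_rel
    t = np.dot(p_rel, v_rel) / np.dot(v_rel, v_rel)
    closest = t * v_rel
    return bool(np.linalg.norm(p_rel - closest) < radius)

# Obstacle 5 m ahead on the x-axis, standing still, combined radius 1 m:
print(in_velocity_obstacle([5.0, 0.0], [1.0, 0.0], [0.0, 0.0], 1.0))  # True
print(in_velocity_obstacle([5.0, 0.0], [0.0, 1.0], [0.0, 0.0], 1.0))  # False
```

Selecting a velocity for which this test is false for every obstacle yields a collision-free motion; intersecting the remaining set with the robot's admissible velocities recovers the dynamically feasible maneuvers described above.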
Article
Behavior in a game simulating brinksmanship and appeasement is analyzed as a function of varying parameters in the game and as over-time trends. Anatol Rapoport is Professor of Mathematical Biology and Senior Research Mathematician at the Mental Health Research Institute, University of Michigan. He is the author of Fights, Games and Debates; Strategy and Conscience, and co-author of Prisoner's Dilemma. Albert M. Chammah is also at MHRI, as Assistant Research Mathematical Psychologist, he is co-author of Prisoner's Dilemma.
Article
We consider a class of matrix games in which successful strategies are rewarded by high reproductive rates, so become more likely to participate in subsequent playings of the game. Thus, over time, the strategy mix should evolve to some type of optimal or stable state. Maynard Smith and Price (1973) have introduced the concept of ESS (evolutionarily stable strategy) to describe a stable state of the game. We attempt to model the dynamics of the game both in the continuous case, with a system of non-linear first-order differential equations, and in the discrete case, with a system of non-linear difference equations. Using this model, we look at the notions of stability and asymptotic behavior. Our notion of stable equilibrium for the continuous dynamic includes, but is somewhat more general than, the notion of ESS.
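Maynard Smith and Price's conditions for a pure strategy to be an ESS of a symmetric matrix game can be checked directly. A small sketch using the classic Hawk-Dove game (V = 2, C = 4), in which neither pure strategy is an ESS:

```python
import numpy as np

def is_ess(M, i, tol=1e-12):
    """Check Maynard Smith's conditions for pure strategy i being an ESS of the
    symmetric matrix game M (row player's payoffs): against every mutant j,
    either i is a strictly better reply to itself, or it ties against itself
    but beats the mutant in the mutant's own population."""
    k = M.shape[0]
    for j in range(k):
        if j == i:
            continue
        if M[i, i] > M[j, i] + tol:
            continue                      # strict best reply against itself
        if abs(M[i, i] - M[j, i]) <= tol and M[i, j] > M[j, j] + tol:
            continue                      # tie broken against the mutant
        return False
    return True

# Hawk-Dove with V = 2, C = 4: neither pure strategy is an ESS.
hawk_dove = np.array([[(2.0 - 4.0) / 2.0, 2.0],
                      [0.0,               1.0]])
print(is_ess(hawk_dove, 0), is_ess(hawk_dove, 1))  # False False
```

Under the replicator dynamics modelled in the article above, such a game converges instead to the mixed equilibrium in the interior of the simplex.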