Using Genetic Algorithms to Represent Higher Level Planning in Simulation Models of Conflict
James Moffat* and Susan Fellows
The Defence Science and Technology Laboratory, UK
*Contact Address: C Floor ISAT K; Grenville West Court; Dstl Portsdown West, Portsdown Hill Road
Fareham PO17 6AD, UK
Abstract
The focus of warfare has shifted from the Industrial Age to the Information Age, as encapsulated by the term Network
Enabled Capability. This emphasises information sharing, command decision making and the resultant plans made by
commanders on the basis of that information. Planning by a higher level military commander is, in most cases, regarded
as such a difficult process to emulate that it is performed by a real commander during wargaming or during an
experimental session based on a Synthetic Environment. Such an approach gives a rich representation of a small
number of data points. However, a more complete analysis should allow search across a wider set of alternatives. This
requires a closed form version of such a simulation. In this paper we discuss an approach to this problem, based on
emulating the higher command process using a combination of game theory and genetic algorithms. This process was
initially implemented in an exploratory research initiative, described here, and now forms the basis of the development
of a ‘Mission Planner’, potentially applicable to all of our higher level closed form simulation models.
© Crown Copyright 2009. Published with the Permission of the Defence Science and Technology Laboratory on
Behalf of the Controller HMSO.
Introduction
Since the Cold War period, the scenario context has widened considerably, reflecting the uncertainties of the future.
Moreover, decision cycles for our customer community in the UK Ministry of Defence (MoD) have significantly
shortened. The focus of war has also shifted from the Industrial Age of grinding attrition to the Information Age, as
encapsulated in the term Network Enabled Capability (NEC). NEC is a key goal for the MoD, with the emphasis on
command, the sharing of awareness among commanders, and the creation of agile effects. These influences together
have led to the need for simulation models which are focussed on command rather than equipment, which can consider
a large number of future contexts, and which can robustly examine a number of ‘what if’ alternatives (Taylor and Lane,
2004).
In response to these demands, we have built a new generation of simulation models, with command (and commander
decision making in particular) at their core (Moffat, 2000). These span the range from the single environment (e.g. a
land only conflict at the tactical level) to the whole joint campaign, and across a number of coalition partners (Moffat,
Campbell and Glover, 2004). They also encompass both warfighting and peacekeeping operations. These models have
been deliberately built as a hierarchy, feeding up from the tactical (or systems) level to the operational (or system of
systems) level, to give enhanced analytical insight, as shown in Figure 1.
Figure 1: The hierarchy of key simulation models. Operational level (campaign level models): COMAND (joint warfighting), DIAMOND (peacekeeping) and CLARION (land warfighting). Tactical level (system level models): SIMMAIR (air/maritime), SIMBRIG and SIMBAT (land warfighting), and WISE (land wargame/simulation, peacekeeping and warfighting).
WISE
As part of these development activities, we have constructed a stochastic wargame called ‘WISE’ (Wargame
Infrastructure and Simulation Environment). As the name suggests, this is more than just a single model, and in fact
provides a modelling infrastructure from which a number of tailored models can be created. The key development thus
far has been the wargame itself (Robinson and Wright 2002, Pearce et al 2003). However, a logistics simulation has
also been developed and is being used to examine vehicle reliability and consequent repair.
The model addresses a previous gap in modelling capability relating to the representation of command decision making,
and has utilised where possible novel techniques to represent this key aspect of Network Enabled Capability. The
wargame represents operations up to Army Division level. Army commanders play the roles of Division and Brigade
commanders in the game, on both sides, and they are supported by an underlying simulation environment which
represents the evolution of events. The Synthetic Environment (SE) representation exploits the Rapid Planning process
(Moffat, 2002) to determine the decisions made by the lower level commanders that are not explicitly represented by
players. We define a Synthetic Environment (SE) as consisting of real and simulated people interacting with simulated
environments. In contrast, a closed form constructive simulation consists of simulated people (i.e. computer algorithms)
interacting with simulated environments, with no human intervention during the model run. Synthetic Environments are
particularly good at exploring new situations and future contexts.
In problem exploration, SEs give a rich understanding (i.e. qualitative information) of a small set of possible options.
The number of options which can be explored, however, is limited due to the high cost and time required to stage such
wargaming events. In order to allow us to explore around these initial options, and thus develop a wider understanding
of their robustness (a key aspect of understanding force ‘agility’) we needed to develop a closed form discrete event
simulation equivalent of the WISE wargame – in essence replacing the human players by some form of artificial
intelligence representation, to allow the running of the scenario without human intervention. This was done by
exploiting the Deliberate Planning Process, an algorithmic representation of higher level command based on a
combination of game theory and genetic algorithms. Our implementation was exploratory, and a test of the feasibility of
generating ‘sensible’ higher level plans, within a realistic conflict context, using genetic algorithms rather than expert
military players. Using the same model as both an SE and a closed form constructive simulation has the additional
benefit that the algorithms derived for planning in the closed form model can be calibrated by running experiments
using expert military players in the SE version of the same situation.
Deliberate Planning Process
The Deliberate Planner emulates the ‘formal estimate process’ whereby a high level commander develops an overall
plan for the campaign. At this level of the command process, a ‘Blue’ (friendly) commander considers a number of
potential courses of action, taking into account his intent (i.e. his primary goal or objective), and the intent of the enemy
(‘Red’) force. The algorithms which we have implemented to represent this process first develop a ‘picture’ of the
layout and intent of the enemy force, based on sensor inputs and a Bayesian approach to information updating. On the
basis of this ‘picture’, the planner then decides on a layout of the friendly force which best achieves the commander’s
goals. It does this by ‘breeding’ plans in an innovative way, using a genetic algorithm, and then selecting a plan with a
high ‘fitness’ level. Our approach to using genetic algorithms is based on that of Goldberg (1989).
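As an illustration of the Bayesian updating step, the sketch below shows one minimal way such an update could be expressed. It is not the WISE implementation; the candidate Red intents, the prior and the likelihood values are invented for the example.

```python
# Minimal sketch of Bayesian updating of the perceived Red intent.
# All names and numbers are illustrative assumptions, not WISE data.

def update_intent_beliefs(prior, likelihoods):
    """Bayes' rule: posterior is proportional to prior x likelihood, renormalised.

    prior       -- dict: candidate Red intent -> prior probability
    likelihoods -- dict: candidate Red intent -> P(sensor report | intent)
    """
    posterior = {intent: prior[intent] * likelihoods.get(intent, 0.0)
                 for intent in prior}
    total = sum(posterior.values())
    if total == 0.0:                  # report inconsistent with every hypothesis
        return dict(prior)            # leave the picture unchanged
    return {intent: p / total for intent, p in posterior.items()}

# Example: a UAV report that is twice as likely if Red intends to defend the
# city than if Red intends to withdraw shifts the picture accordingly.
beliefs = {'defend_city': 0.5, 'withdraw_north': 0.5}
beliefs = update_intent_beliefs(beliefs, {'defend_city': 0.8, 'withdraw_north': 0.4})
print(beliefs)   # defend_city now carries roughly two thirds of the belief
```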
A plan, in this context, is an allocation of forces to different potential areas of operation across the whole ‘theatre’ of
operations. This is turned into a ‘chromosome’ by expressing each allocation to a specific region in binary terms, so that
the plan is then a string of binary numbers (i.e. a string of 0s and 1s). The fitness of the plan is calculated using a
number of historical analysis equations, exploiting the approach of (Rowland, 2006), which relate force layout to
potential campaign outcome. These allow the model to calculate aspects of the plan such as the likely level of casualties
(own and opposing forces), the likely rate of advance towards the objective, the probability of breakthrough, and the
probability of overall success. These are individual contributors to the plan fitness function. They are weighted to allow
for the representation of different ‘styles of command’ (for example a risk averse commander might put a high priority
on keeping casualties to a minimum, while another commander might put more priority on getting to the objective). The
fitness value also reflects the style of the commander through a game theory approach which seeks to maximise, across
the different areas of operation, the Blue commander’s minimum payoff in each such area, taking account of the courses
of action (the strategies in our game theory formulation) available to the Red opponent. This is then a maximin solution
corresponding to what we term ‘cautious command’. Alternative formulations of the fitness function seek an even spread of risk (median command) or seek to maximise the commander’s maximum payoff (bold command) (Moffat, 2002).
Bold command appears to be a high risk strategy. However, when played through the modelling environment, it can
give rise to very ‘manoeuvrist’ plans which can catch the opponent out.
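A minimal sketch of this encoding, and of how the command style could enter the evaluation, is given below. The bit-width, the toy payoff function and the Red strategy sets are assumptions made purely for illustration; in the model itself the contributions to the fitness come from the historical analysis relationships described above.

```python
# Illustrative sketch: a plan as a binary 'chromosome' (an allocation of units
# to areas of operation) scored against possible Red strategies according to
# the chosen command style.  Names and numbers are assumptions.

N_AREAS = 2          # areas of operation ('channels')
BITS_PER_AREA = 4    # units allocated to an area encoded as 4 bits (0..15)

def encode(allocation):
    """Turn a per-area allocation such as [12, 3] into a string of 0s and 1s."""
    return ''.join(format(units, f'0{BITS_PER_AREA}b') for units in allocation)

def decode(chromosome):
    return [int(chromosome[i:i + BITS_PER_AREA], 2)
            for i in range(0, len(chromosome), BITS_PER_AREA)]

def plan_fitness(chromosome, red_strategies, payoff, style='cautious'):
    """Score a Blue plan against the set of possible Red strategies.

    cautious -> maximin (score by the worst-case payoff),
    bold     -> maximax (score by the best-case payoff),
    median   -> spread risk (score by the middle of the sorted payoffs).
    """
    blue = decode(chromosome)
    payoffs = sorted(payoff(blue, red) for red in red_strategies)
    if style == 'cautious':
        return payoffs[0]
    if style == 'bold':
        return payoffs[-1]
    return payoffs[len(payoffs) // 2]

# Toy payoff: Blue gains by out-weighting Red in each area of operations.
def toy_payoff(blue_alloc, red_alloc):
    return sum(b - r for b, r in zip(blue_alloc, red_alloc))

plan = encode([12, 3])
print(plan, plan_fitness(plan, [[4, 8], [10, 2]], toy_payoff, style='cautious'))
```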
The initial ‘gene pool’ of the genetic algorithm corresponds to 100 ‘random plans’. Each of these plan ‘chromosomes’
is represented by a random string of 0s and 1s. In general, the fitness values for these initial plans will be low, and we
need to evolve this set of plans in order to breed a plan which is ‘sufficiently good’ (as measured by the plan fitness
function). As a first step, all of these initial plans are evaluated, arranged in rank order of fitness, and then randomly
selected for pairing, with higher ranked chromosomes being more likely to be chosen (‘survival of the fittest’).
Crossover operators act on these pairs. The probability of such a crossover being applied is user definable (with a
default value of 0.7). Only a single fixed crossover point is used. If employed, the crossover operator then swaps the
tails of the two chromosomes. We also employ a mutation operator, corresponding to the possibility of flipping a 0 to a
1 or vice versa, when applied to the binary representation of a particular plan. The probability of this occurring is also
user definable, with a default value of 0.033 (Moffat, 2002). The gene pool is then updated across a number of
generations. A form of ‘elitism’ is applied in which the best plan thus far, across the generations, is carried forward to
the final stage. Here the best plan carried forward and the best plans from the final generation are considered together,
and a final choice of plan is made.
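The evolutionary loop just described can be sketched as follows. The default crossover and mutation probabilities are those quoted above, while the chromosome length, the number of generations, the choice of midpoint for the single fixed crossover point and the per-bit reading of the mutation probability are assumptions made for illustration.

```python
# Sketch of the plan-breeding loop: rank-biased pairing, single fixed-point
# crossover (probability 0.7), bit-flip mutation (probability 0.033) and
# elitism carrying the best plan so far through to the final choice.
import random

def mutate(chrom, p_mut):
    """Flip each bit with probability p_mut (assumed here to apply per bit)."""
    return ''.join(('1' if b == '0' else '0') if random.random() < p_mut else b
                   for b in chrom)

def breed_plan(fit, chrom_len=8, pop_size=100, generations=30,
               p_cross=0.7, p_mut=0.033):
    """Evolve a pool of random 0/1 plan chromosomes and return the best found."""
    pop = [''.join(random.choice('01') for _ in range(chrom_len))
           for _ in range(pop_size)]
    best = max(pop, key=fit)                      # elitism: best plan so far
    cross_point = chrom_len // 2                  # single fixed crossover point (midpoint assumed)

    for _ in range(generations):
        ranked = sorted(pop, key=fit)             # worst ... best
        weights = list(range(1, pop_size + 1))    # higher rank -> more likely to be paired
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = random.choices(ranked, weights=weights, k=2)
            if random.random() < p_cross:         # crossover swaps the chromosome tails
                a, b = a[:cross_point] + b[cross_point:], b[:cross_point] + a[cross_point:]
            new_pop += [mutate(a, p_mut), mutate(b, p_mut)]
        pop = new_pop[:pop_size]
        best = max(pop + [best], key=fit)         # carry the elite forward
    return best

# Toy usage: a stand-in fitness that simply counts 1s; in the planner this
# would be the plan fitness function sketched earlier.
print(breed_plan(fit=lambda c: c.count('1')))
```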
Testing the Algorithm in WISE
In order for a plan to be generated there is an implicit assumption that the unit undertaking Deliberate Planning has an
understanding (or ‘picture’) of what is happening around it in the model. This is derived from a number of sensor
platforms or units which are feeding information in to allow the Recognised Picture to be compiled. At the beginning of
a run, an ‘Assess Current Situation’ task is called which sends out the initial orders to the sensors to search for information to update the picture. An intelligence fusion process, with possible additional tasking of sensor units to add further information, is then carried out to build up the picture further and to allow an analysis of enemy intent and likely courses of action (i.e. Red strategies in the game theory sense) to be completed. All of the sensor acquisitions are made using the ‘Surveillance and Target Acquisition’ model in WISE and are passed up the command chain. When a
sensor asset completes a search of its tasked zone a ‘fused’ set of acquisitions is passed into the Intelligence Fusion
process, and a new order is generated for that sensor asset.
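This sensor re-tasking cycle can be pictured with the following sketch; the function and field names are illustrative and are not the WISE classes.

```python
# Illustrative sketch of the sensor re-tasking cycle: when a sensor asset
# completes the search of its tasked zone, its fused acquisitions are passed
# into intelligence fusion and a new order is generated for that asset.

def on_zone_search_complete(sensor_id, acquisitions, recognised_picture, zone_queue):
    """acquisitions: fused detections from the zone just searched;
    recognised_picture: the evolving picture held by the planner;
    zone_queue: zones on this channel still awaiting (re)search."""
    recognised_picture.extend(acquisitions)            # intelligence fusion (simplified)
    next_zone = zone_queue.pop(0) if zone_queue else None
    return {'sensor': sensor_id, 'search_zone': next_zone}   # the new sensor order

# Example: a UAV reports one acquisition and is re-tasked to the next zone.
picture, zones = [], ['zone_2', 'zone_3']
print(on_zone_search_complete('uav_1', [('red_coy', 'grid_123')], picture, zones), picture)
```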
As already discussed, a number of cycles of intelligence fusion are required in order to build up a suitable picture
against which to create a plan. Two criteria are specified in the data that determine when intelligence fusion is deemed
to be complete enough for planning: (a) the number of times that specified zones must be searched, or (b) a time period.
The first of these criteria to be realised is used to initiate the plan generation task. We also normally assume that one
side in the model is attacking, and the other defending, with the attacker (either Blue or Red) being the first to formulate
a plan, followed by the defender.
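A minimal sketch of this trigger, with illustrative names and values, is given below.

```python
# Plan generation starts when either the zone-search criterion or the time
# criterion is met, whichever is realised first.  Names are illustrative.

def ready_to_plan(search_counts, required_searches, sim_time, trigger_time):
    """search_counts: dict mapping each specified zone to the number of times
    it has been searched so far."""
    zones_done = all(count >= required_searches for count in search_counts.values())
    return zones_done or sim_time >= trigger_time

# Example: the zones have only been searched twice, but the user-defined
# trigger time has already been reached, so planning begins.
print(ready_to_plan({'zone_A': 2, 'zone_B': 2}, required_searches=3,
                    sim_time=120.0, trigger_time=90.0))   # True
```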
Once started, the Blue plan generation process takes account of the likely Red strategies and possible own Blue
strategies, together with the assumed style of command (bold, median or cautious) in order to determine the course of
action to adopt and hence the orders that need to be issued. A plan (a Blue strategy) is a force allocation to a number of
areas of operation, and this is evaluated using the plan fitness function, given the possible set of Red strategies. The
initial set of Blue plans is then ‘bred’ using the genetic algorithm, as previously described, to determine the best plan to
adopt. Once this process is complete, a set of orders is generated and picked up by the interface classes to be translated
into the orders required to task units within WISE.
As the plan is executed in the simulation, sensor assets continue to search for further information, and the Deliberate
Planner’s recognised picture continues to be updated. Each time that this process is carried out, an assessment is made
(the ‘Plan Supervision and Repair’ process) to determine whether the plan is performing within defined bounds. This is
done by applying the plan fitness function to the Blue plan as it evolves through the simulation, taking account of
additional sensor based information (i.e. Blue’s evolving perception) about the location of both Blue and Red units. If
the plan is failing (i.e. not achieving the required fitness level) the Plan Supervision and Repair process takes place. The
planning algorithm determines which areas of operation are failing to meet the plan. It also identifies which units are
surplus or in reserve and places these in an availability pool. The areas of operation that are in deficit are then
supplemented as required and a new set of orders is issued.
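The supervision and repair step can be sketched as below; the data layout and the round-robin reallocation rule are assumptions made for illustration, not the implemented logic.

```python
# Sketch of Plan Supervision and Repair: areas failing to meet the required
# fitness are reinforced from a pool of surplus/reserve units, and new orders
# are produced for the reassigned units.

def supervise_and_repair(areas, area_fitness, required_fitness):
    """areas: dict area -> {'units': [...], 'reserve': [...]}
    area_fitness: dict area -> current fitness of the evolving plan in that area."""
    failing = [a for a, f in area_fitness.items() if f < required_fitness]
    if not failing:
        return {}                                 # plan is performing within bounds

    # Pool the reserve (surplus) units held by areas that are not failing.
    pool = [u for a, d in areas.items() if a not in failing for u in d['reserve']]

    # Distribute the pooled units round-robin across the failing areas.
    new_orders = {a: [] for a in failing}
    for i, unit in enumerate(pool):
        new_orders[failing[i % len(failing)]].append(unit)
    return new_orders                             # orders to issue to the reassigned units

# Example with two channels: channel 2 is under-performing, so the reserve
# battalion held on channel 1 is re-tasked to reinforce it.
areas = {'ch1': {'units': ['bde_1'], 'reserve': ['bn_reserve']},
         'ch2': {'units': ['coy_1', 'coy_2'], 'reserve': []}}
print(supervise_and_repair(areas, {'ch1': 0.8, 'ch2': 0.3}, required_fitness=0.5))
```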
Testing the Genetic Algorithm
In order to test the genetic algorithm, we played through a future scenario using the SE version of WISE, employing
expert military players on both the Blue and Red sides. We also represented the same scenario within the closed form
constructive simulation version of WISE. In the Deliberate Planner, the broad movement of the forces on the ground is
task organised into ‘channels’ or areas of operation, which head towards objectives (such as an area of ground to be
attacked, or a capital city to be defended). These are options which the Deliberate Planner can use in its consideration of
how to deploy the force, and forces can be moved between channels as the scenario progresses, as part of the Plan
Supervision and Repair process. In our future scenario there are two Blue channels (Figure 2) and two Red channels
(Figure 3). Red are initially static with Blue moving towards their objective. In order to make a fair comparison, both
the players in the wargame, and the closed form simulation, started with the same information from sensors and
intelligence reports, and had the same initial appreciation of the battlespace in terms of movement and key areas of
ground. Thus, for example, the initial picture available to a Blue commander in the SE version of the scenario had the
same information content as the picture available to the algorithms representing that commander in the closed form
constructive simulation version. Of course this could diverge as the scenario unfolded, depending on the choices made
subsequently either in the SE or the closed form version. It was also assumed in each case that the Unmanned Air
Vehicles (UAVs) deployed as sensors could not be shot down, in order that a reasonable level of situational awareness
could be maintained and that this factor (i.e. loss of sensor input) would not greatly influence the plan created. For our
comparison, the planner was run with a cautious command style (i.e. a maximin payoff function was assumed for Blue,
as part of the evaluation of his plan fitness). A higher weighting in the fitness function was also given to the impact of
Blue’s plan on Red forces than the impact of Blue’s plan on Blue forces.
Figure 2: Blue areas of operation (‘channels’) within the context of Blue intent.
Figure 3: Red areas of operation (‘channels’) within the context of Red intent.
Figure 4 shows the initial deployment of the forces (the same for both the wargame and the closed form simulation,
again to ensure fairness of comparison).
Figure 4: Blue and Red initial deployment locations.
Comparison of the simulation model algorithm with the wargame
When the Deliberate Planner algorithm is initialised, it allocates unmanned air vehicle (UAV) sensors to the
first zones on the channels. Data is used to define the list of sensors allocated to each channel, as well as how many
should be used on that channel at any one time. The output log from the closed form simulation showed that both the
initial sensor tasking and subsequent sensor tasking took place in the model, with information from these sensors
influencing the Deliberate Planner. A plan is generated either when the sensors have searched all of the zones three
times or when a user-defined trigger time has been reached. In the simulation run described here, plan generation was triggered by the user-defined time.
Prior to the generation of the plan, information is supplied to the Deliberate Planner to allow it to build up its
Recognised Picture. An idea of the type of information available in completing this situational assessment can be seen in the two Brigade perception screenshots in Figure 5, which show the initial assessment made by the planner and then a more refined assessment as further sensor information is taken into account by the algorithms.
Figure 5: Brigade picture evolution following additional sensor-based information.
The acquired locations of the enemy force as well as the location of own force units are used to generate the recognised
picture. It should be noted that acquisitions made outside of the defined zones are not passed into the Deliberate
Planner. The algorithm only considers those acquisitions within the zones where it maintains its recognised picture; the plan is therefore defined on the basis of the planner’s perception.
Figure 6 shows the execution of the plan within the simulation following the dissemination of orders to the forces. This
higher level plan is a ‘left hook’ by Blue forces, bypassing concentrations of Red force in order to achieve Blue’s
objectives and intent in a timely way. A small allocation of Blue force is also directed towards the Red perceived
objective in order to ‘fix’ Red forces in place.
Figure 6: The higher level plan generated and implemented by the simulation.
Discussion
The plan that was generated sent the majority of units along the ‘left hook’, with only two company-sized units being tasked down the second channel to ‘fix’ the Red forces. At first glance this appears to be counter-intuitive. However, since the planner is clearly trying to reach the objective as quickly as possible and with as few casualties on its own side as possible, the
plan is militarily credible. By choosing the left hook, the main enemy dispositions in the two urban areas are bypassed
so that the objective can be reached through the least cost path. Bypassing urban areas rather than clearing them is an
accepted tactic in order to maintain tempo, but the enemy left behind must be fixed or at the very least screened to
provide intelligence on enemy movement. In the orders generated by the planner in the simulation, the allocation of
Blue units to these areas would be insufficient to conduct this without support from other assets, e.g. UAVs, Attack
Helicopters, Indirect Fire, etc.
Further Developments
We have demonstrated that higher level planning can be carried out using genetic algorithms, and that it produces militarily
credible plans. This approach is being exploited further within the UK in current model developments, as illustrated in
Figure 7. In one of our other models (CLARION – see Figure 1) we are developing a Mission Planner based on the
same genetic algorithm approach employed in Deliberate Planning. As indicated in Figure 7, the approach being
constructed is that each unit develops a local plan using Rapid Planning (Moffat, 2002). These resultant ‘missions’ or
course of action choices are then coordinated within an area by the Mission Planner. Meanwhile, the Deliberate
Planning algorithms deal with the larger scale allocation of forces to areas of operation.
Figure 7: The interaction of Deliberate, Mission and Rapid Planning. The Deliberate Planner allocates force to areas of operation; the Mission Planner coordinates unit level missions within an area of operations; the Rapid Planner makes the mission choice at unit level.
References
D Goldberg (1989) ‘Genetic Algorithms in Search, Optimisation and Machine Learning’, Addison-Wesley, Reading, MA, USA.
J Moffat (2000) ‘Representing the Command and Control Process in Simulation Models of Combat’, J Opl Res Soc 51,
431-439.
J Moffat (2002) ‘Command and Control in the Information Age: Representing its Impact’, The Stationery Office,
London, UK.
J Moffat, I Campbell and P Glover (2004) ‘Validation of the Mission Based Approach to Representing Command and
Control in Simulation Models of Conflict’, J Opl Res Soc 55, 340-349.
A Robinson and S Wright (2002) ‘The Wargame Infrastructure and Simulation Environment (WISE)’, Operational
Research Society Simulation Workshop, Birmingham University.
D Rowland (2006) ‘The Stress of Battle: Quantifying Human Performance in Combat’, The Stationery Office, London,
UK.
B Taylor and A Lane (2004) ‘Development of a Novel Family of Military Campaign Simulation Models’, J Opl Res
Soc 55, 333-339.
P Pearce, A Robinson and S Wright (2003) ‘The Wargame Infrastructure and Simulation Environment (WISE)’, Knowledge-Based Intelligent Information and Engineering Systems: 7th International Conference, KES 2003, Oxford, UK, September 2003, Proceedings Part II. ISBN 3-540-40804-5.