Article

Classifier systems in combat: Two-sided learning of maneuvers for advanced fighter aircraft

Abstract

This paper reports the continuing results of a project in which a genetics-based machine learning system acquires rules for novel fighter combat maneuvers through simulation. In this project, a genetics-based machine learning system was implemented to generate high angle-of-attack air combat tactics for advanced fighter aircraft. This system, based on a learning classifier system approach, employed a digital simulation model of one-versus-one air combat and a genetic algorithm to develop effective tactics for the X-31 experimental fighter aircraft. Previous efforts with this system showed that the resulting maneuvers allowed the X-31 to successfully exploit its post-stall capabilities against a conventional fighter opponent. This demonstrated the ability of the genetic learning system to discover novel tactics in a dynamic air combat environment. The results gained favorable evaluation from fighter aircraft test pilots. However, these pilots noted that the static strategy employed by the X-31's opponent was a limitation. In response to these comments, this paper reports new results with two-sided learning, where both aircraft in a one-versus-one combat scenario use genetics-based machine learning to adapt their strategies. The experiments successfully demonstrate both aircraft developing objectively interesting strategies. However, the results also point out the complexity of evaluating results from mutually adaptive players, due to the red queen effect. These complexities, and future directions of the project, are discussed in the paper's conclusions.

... However, this approach is difficult to apply in practice, since the influence diagram is converted into a nonlinear programming problem, which cannot meet the demand for fast computation during air combat. In [19], a genetics-based machine learning algorithm is implemented to generate high angle-of-attack air combat maneuver tactics for the X-31 fighter aircraft in a one-on-one air combat scenario. Approximate dynamic programming (ADP) can be employed to solve the air combat pursuit maneuver decision problem quickly and effectively [20,21]. ...
... With probability ε select a random action A_t, otherwise select A_t = argmax_a Q(S_t, a, w)
    for i = 0 to Δt/δt do
        Execute action A_t in air combat simulation software
        Obtain the positions of aircraft and target
        if episode terminates then break
    end for
    Observe reward R_t and state S_{t+1}
    Store transition [S_t, A_t, R_t, S_{t+1}] in D
    if episode terminates then break
    Sample random minibatch of transitions [S_j, A_j, R_j, S_{j+1}] from D
    if episode terminates at step j+1 then set Y_j = R_j
    else set ...
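A minimal Python rendering of the training step above, assuming a Gym-style simulation interface (env.step, env.action_space.sample) and a q_net callable that returns one value per discrete maneuver; all names here are illustrative, not the cited paper's code:

    import random
    import numpy as np

    def dqn_step(env, q_net, replay, state, eps, inner_steps):
        # epsilon-greedy selection over the discrete maneuver library
        if random.random() < eps:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_net(state)))
        next_state, reward, done = state, 0.0, False
        # hold the chosen maneuver for the Δt/δt simulator ticks
        for _ in range(inner_steps):
            next_state, reward, done, _ = env.step(action)
            if done:
                break
        replay.append((state, action, reward, next_state, done))  # store in D
        return next_state, reward, done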
... Smaller |μ_r| means that the heading or gun of the red aircraft has better aim at its opponent, and smaller |η_b| implies a higher possibility of the blue aircraft being shot or facing a fatal situation. For example, in the offensive scenario, the initial state value of the red agent is large because of small |μ_r| and |η_b|, according to equations (14) and (16).

Algorithm 2: Alternate freeze game DQN for maneuver guidance agent training in air combats.
    Set parameters of both aircraft
    Set simulation parameters
    Set the number of training periods K and the condition for ending each training period W_threshold
    Set DRL parameters
    Set the opponent initialization policy π_blue,0
    for period = 1 to K do
        for aircraft in [red, blue] do
            if aircraft = red then
                Set the opponent policy π = π_blue,period−1
                Initialize neural networks of the red agent
                while winning rate < W_threshold do
                    Train the agent using Algorithm 1
                end while
                Save the well-trained agent, whose maneuver guidance policy is π_red,period
            else if period = K + 1 then
                break
            else
                Set the opponent policy π = π_red,period
                Initialize neural networks of the blue agent
                while winning rate < W_threshold do
                    Train the agent using Algorithm 1
                end while
                Save the well-trained agent, whose maneuver guidance policy is π_blue,period
            end if
        end for
    end for ...
Article
Full-text available
In a one-on-one air combat game, the opponent's maneuver strategy is usually not deterministic, which leads us to consider a variety of opponent strategies when designing our own maneuver strategy. In this paper, an alternate freeze game framework based on deep reinforcement learning is proposed to generate the maneuver strategy in an air combat pursuit. The maneuver strategy agents for aircraft guidance of both sides are designed for the one-on-one air combat scenario at a fixed flight level with fixed velocity. Middleware connecting the agents and the air combat simulation software is developed to provide a reinforcement learning environment for agent training. A reward shaping approach is used, by which the training speed is increased and the performance of the generated trajectory is improved. Agents are trained by alternate freeze games with a deep reinforcement learning algorithm to deal with nonstationarity. A league system is adopted to avoid the red queen effect in the game where both sides implement adaptive strategies. Simulation results show that the proposed approach can be applied to maneuver guidance in air combat, and typical angle fight tactics can be learnt by the deep reinforcement learning agents. For training against an opponent with an adaptive strategy, the winning rate can reach more than 50%, and the losing rate can be reduced to less than 15%. In a competition with all opponents, the winning rate of the strategic agent selected by the league system is more than 44%, and the probability of not losing is about 75%.
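The alternate-freeze scheme reduces to a short loop; this is a hedged outline only, with train_until standing in for the paper's Algorithm-1-style DQN training against a frozen opponent (the league-system bookkeeping is omitted):

    def alternate_freeze(train_until, periods, w_threshold, blue_policy):
        # train_until(opponent, w_threshold) trains one side against a frozen
        # opponent until its winning rate exceeds w_threshold, then returns
        # the resulting policy.
        red_policy = None
        for _ in range(periods):
            red_policy = train_until(blue_policy, w_threshold)   # blue frozen
            blue_policy = train_until(red_policy, w_threshold)   # red frozen
        return red_policy, blue_policy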
... The maneuver decision based on optimization theory includes genetic algorithms [9], Bayesian inference [10], statistical theory [14], etc. Methods of this kind transform the maneuvering decision problem into an optimization problem. ...
... However, many of these optimization algorithms have poor real-time performance on large-scale problems and cannot implement online decision-making for air combat. Therefore, they can only be used for offline air combat tactics optimization research [9]. ...
... The algorithm based on statistical principles performs only one traversal calculation, so its computation time is relatively short. Other optimization algorithms, such as genetic algorithms, require a large number of loop iterations, and their real-time performance has more difficulty meeting the requirements of online decision-making; thus the author of [9] states that the purpose of this optimization is not to achieve online control, but to find meaningful new maneuvers for tactical research. It can therefore be concluded that the real-time performance of the model established in this paper is better than that of iterative optimization algorithms. ...
Article
Full-text available
With the development of artificial intelligence and integrated sensor technologies, unmanned aerial vehicles (UAVs) are increasingly applied in air combat. A bottleneck that constrains the capability of UAVs against manned vehicles is autonomous maneuver decision-making, which is a very challenging problem in short-range air combat involving highly dynamic and uncertain enemy maneuvers. In this paper, an autonomous maneuver decision model is proposed for UAV short-range air combat based on reinforcement learning, which mainly includes the aircraft motion model, a one-to-one short-range air combat evaluation model, and the maneuver decision model based on a deep Q network (DQN). However, such a model involves a high-dimensional state and action space, which requires a huge computational load for DQN training using traditional methods. Therefore, a phased training method called "basic-confrontation", based on the idea that human beings gradually learn from simple to complex, is proposed to help reduce the training time while obtaining suboptimal but efficient results. Finally, one-to-one short-range air combats are simulated under different target maneuver policies. Simulation results show that the proposed maneuver decision model and training method can help the UAV achieve autonomous decision-making in air combat and obtain an effective decision policy to defeat the opponent.
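The "basic-confrontation" idea reduces to a two-phase curriculum; a hedged sketch, where agent.train and the opponent lists are assumptions rather than the paper's API:

    def basic_confrontation(agent, basic_policies, full_policy, episodes):
        # phase 1: learn flight and pursuit basics against simple,
        # scripted target policies
        for target_policy in basic_policies:
            agent.train(opponent=target_policy, episodes=episodes)
        # phase 2: continue from the learned weights in the full confrontation
        agent.train(opponent=full_policy, episodes=episodes)
        return agent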
... As in [17], four different initial conditions of the DF engagement are defined (see Fig. 3) based on the initial relative position and orientation of the fighters. ...
... The initial altitude of blue is also randomly chosen and can vary by 700 m around the red altitude. The limits for initial angular positions and max-min initial horizontal distances between the fighters are based on the DF literature [1,13,17]. ...
... The boundaries between the different DF situations have been derived from expert knowledge and cannon lethality [1,11,13,17]. The recommended maneuver is a continuous function of the distance between fighters as shown in Fig. 4; parameters a and b have been determined empirically by simulation. ...
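The cited text does not give the functional form, only that the recommended maneuver varies continuously with the distance between fighters and that parameters a and b were tuned empirically; a linear ramp between two maneuver recommendations is one plausible reading:

    def blend_weight(distance, a, b):
        # 0.0 -> fully the close-range maneuver, 1.0 -> fully the long-range
        # maneuver; values in between blend the two recommendations
        if distance <= a:
            return 0.0
        if distance >= b:
            return 1.0
        return (distance - a) / (b - a)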
... For confrontation tasks with highly dynamic characteristics, it is often difficult to obtain the optimal real-time policy due to the complexity of computation [17]. Methods based on optimization algorithms, such as genetic algorithms [19], Bayesian inference [20], and statistical theory [24], transform the maneuvering decision problem into an optimization problem and solve it mathematically to obtain an autonomous optimal policy. However, for large-scale complex non-convex optimization, it is also difficult to ensure the optimality of the solution. ...
... However, for large-scale complex non-convex optimization, it is also difficult to ensure the optimality of the solution. Furthermore, the above methods are mostly offline [19]. AI-based methods include expert systems [26], neural networks [27], and RL methods [28]-[31]. ...
Article
Full-text available
The core technique of unmanned vehicle systems is autonomous maneuvering decision-making, which not only determines the applications of unmanned vehicles but is also a critical technique many countries are competing to develop. Reinforcement Learning (RL) is a potential design method for autonomous maneuvering decision-making systems. Nevertheless, in the face of complex decision-making tasks, it is still challenging to master the optimal policy due to the low learning efficiency caused by the complex environment, high-dimensional state, and sparse reward. Inspired by the human learning process from simple to complex, in this paper we propose a novel progressive deep RL algorithm for policy optimization in unmanned autonomous decision-making systems. The proposed algorithm divides the training of the autonomous maneuvering decision into a sequence of curriculums with learning tasks from simple to complex. Through self-play games, iterative optimization of the policy is then realized. Furthermore, the confrontation environment with two unmanned vehicles and obstacles is analyzed and modeled. Finally, simulation results in one-to-one adversarial tasks demonstrate the proposed algorithm's effectiveness and applicability.
... However, only a few parameters are reported. In a follow-up study, Zhang et al. [13] used 40 generations, with a population size of 80. Smith et al. applied a learning classifier system to develop novel one-versus-one WVR tactics for an experimental fighter jet [14,15]. A population of 200 rules is reported, tested throughout 300 generations. ...
... While a large number of simulations may be acceptable for exploratory studies such as [15], or offline learning before humanin-the-loop trials, it poses a problem in the case of learning online during training simulations. A CGF, trying to adapt its behaviour to that of a human participant, only has limited time to do so between engagements. ...
Conference Paper
Full-text available
Adaptive behaviour for computer generated forces enriches training simulations with appropriate challenge levels. For adequate insight into the range of possible behaviour, the adaptation has to take place in a rapid fashion. Ideally, each new behaviour model should remain readable by (and thereby under the control of) human experts. Although various attempts have been made at creating adaptive behaviour, current solutions require large numbers of simulations. Moreover, usability by end users has been of subordinate interest, as has compliance with doctrine and ethics. In this work, we present a machine learning method that enables fast behaviour adaptation while keeping the behaviour models in a human-readable format. We demonstrate the effectiveness of the proposed method in beyond-visual-range air combat simulations.
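One way to keep an adapting behaviour model readable, sketched here under the assumption that behaviour is a weighted list of if-then rules (the paper's actual representation may differ): conditions and actions stay as inspectable text while learning only touches the numeric weights.

    from dataclasses import dataclass

    @dataclass
    class Rule:
        condition: str   # e.g. "range_km < 40 and closing", readable by experts
        action: str      # e.g. "crank_left"
        weight: float    # the only part the learning method adapts

    def select_action(rules, situation, default="maintain"):
        # eval() is used here only to keep the sketch short; a real system
        # would parse the condition strings safely
        matched = [r for r in rules if eval(r.condition, {}, situation)]
        return max(matched, key=lambda r: r.weight).action if matched else default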
... Optimization methods treat air game problems as multi-objective optimization problems subject to aerodynamic constraints. Smith et al. [16] developed a genetics-based machine learning system that employs digital simulation models of one-on-one air games and genetic algorithms to generate maneuvers. Experimental results demonstrated that the proposed system enabled the X-31 to successfully counter traditional fighter opponents. ...
Article
Full-text available
Deep reinforcement learning technology applied to three-dimensional Unmanned Aerial Vehicle (UAV) air game maneuver decision-making often results in low utilization efficiency of training data and algorithm convergence difficulties. To address these issues, this study proposes an expert experience storage mechanism that improves the algorithm's performance with less experience replay time. Based on this mechanism, a maneuver decision algorithm using the Dueling Double Deep Q Network is introduced. Simulation experiments demonstrate that the proposed mechanism significantly enhances performance, reducing the required experience by 81.3% compared to the prioritized experience replay mechanism and enabling the UAV agent to achieve a higher maximum average reward value. Additionally, the proposed maneuver decision algorithm identifies the optimal policy for attacking target UAVs that use different fixed strategies.
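A hedged sketch of an expert-experience storage mechanism: transitions flagged as expert-like are kept in a second buffer and mixed into every minibatch. Buffer sizes, the mixing fraction, and the expert criterion are all illustrative assumptions, not the paper's specification:

    import random
    from collections import deque

    class DualReplay:
        def __init__(self, capacity=100_000, expert_fraction=0.25):
            self.normal = deque(maxlen=capacity)
            self.expert = deque(maxlen=capacity)
            self.expert_fraction = expert_fraction

        def add(self, transition, is_expert):
            (self.expert if is_expert else self.normal).append(transition)

        def sample(self, batch_size):
            # every minibatch mixes expert transitions with ordinary ones
            k = min(int(batch_size * self.expert_fraction), len(self.expert))
            batch = random.sample(list(self.expert), k)
            batch += random.sample(list(self.normal),
                                   min(batch_size - k, len(self.normal)))
            return batch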
... Intelligent methods involve optimisation and artificial intelligence. The former, which models the manoeuvre decision-making and then obtains the manoeuvre policy by solving the optimisation model, covers the genetic algorithm [8], the Particle Swarm Optimisation algorithm [9], Bayesian inference [10], etc. However, it proves to be computationally intensive for large-scale problems and thus fails to make real-time decisions. ...
Article
Full-text available
The demand for autonomous motion control of unmanned aerial vehicles in air combat is boosted as taking the initiative in combat appears more and more crucial. However, unmanned aerial vehicles' inability to manoeuvre autonomously during air combat, which features highly dynamic and uncertain enemy manoeuvres, limits their combat capabilities, and overcoming this proves very challenging. To meet the challenge, this article proposes an autonomous manoeuvre decision model using an expert actor-based soft actor critic algorithm that reconstructs the empirical replay buffer with expert experience. Specifically, the algorithm uses a small amount of expert experience to increase the diversity of the samples, which can largely improve the exploration and utilisation efficiency of deep reinforcement learning. To simulate the complex battlefield environment, a one-to-one air combat model is established and the concept of the missile's attack region is introduced. The model enables one-to-one air combat to be simulated under different initial battlefield situations. Simulation results show that the expert actor-based soft actor critic algorithm can find the most favourable policy for unmanned aerial vehicles to defeat the opponent faster, and converges more quickly, compared with the soft actor critic algorithm.
... Between the 1960s and 1990s, NASA developed intelligent air combat systems based on expert predictions, making several attempts to use artificial intelligence systems to assist and even replace pilots in air combat decisions [10]. Heuristic methods such as genetic algorithms and fuzzy trees have also been explored [11,12]. In recent years, with the development of machine learning theory and the improvement of computing power, intelligent algorithms represented by deep learning and reinforcement learning have shown great advantages in air combat. ...
... Intelligent autonomous decision-making mainly uses mathematical optimization, artificial intelligence, and other methods to build a mapping from the naval operational situation to operational commands. According to the different ideas for solving this mapping, the main methods are: influence diagrams [2][3][4], genetic algorithms [5,6], fuzzy logic [7,8], neural networks [9], and others. For different scenarios, these methods have been used to solve the problem of autonomous decision-making, and many results have been achieved [10][11][12]. ...
Article
Full-text available
Changes in the information age have created the need for more efficient and effective self-decision-making. A method of extracting and constructing naval operations decision-making rules based on scenario analysis is proposed. The template specifications of Event Condition Action (ECA) rules are defined, and a consistency detection method for ECA rules based on SWRL is proposed. The logical relationships and state transitions of the naval operational process are analyzed in detail, and the association of objects, events, and behaviors is realized. Finally, the operation of the proposed methods is illustrated through an example process, showing that the method can effectively solve the problems of self-decision-making rule extraction and construction in the naval battlefield decision environment, while avoiding reliance on artificial intelligence, which may introduce uncertain factors.
... Autonomous air combat maneuver decision-making refers to the process of automatically generating flight control commands to gain the advantage during air combat confrontation, based on mathematical optimization, artificial intelligence, and other methods. At present, the research methods for autonomous air combat maneuver decisions can be divided into three main categories: game theory [5,6], optimization methods [7−9], and artificial intelligence [10−15]. The methods based on game theory and optimization algorithms mostly divide the aircraft's actions into a limited number of maneuvers and then calculate the effect of each action on the situation to select the best maneuver to execute. ...
Article
In order to improve the autonomous ability of unmanned aerial vehicles (UAV) to implement air combat mission, many artificial intelligence-based autonomous air combat maneuver decision-making studies have been carried out, but these studies are often aimed at individual decision-making in 1v1 scenarios which rarely happen in actual air combat. Based on the research of the 1v1 autonomous air combat maneuver decision, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. Firstly, a bidirectional recurrent neural network (BRNN) is used to achieve communication between UAV individuals, and the multi-UAV cooperative air combat maneuver decision model under the actor-critic architecture is established. Secondly, through combining with target allocation and air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of every UAV, and a cooperative tactical maneuver policy is generated. The simulation results prove that the multi-UAV cooperative air combat maneuver decision model established in this paper can obtain the cooperative maneuver policy through reinforcement learning, the cooperative maneuver policy can guide UAVs to obtain the overall situational advantage and defeat the opponents under tactical cooperation.
... An approach based on online model predictive control has been used in [27], and other methods such as genetic algorithms, machine learning [105], and reinforcement learning with level-k games [86] have been applied to learn effective strategies in PE scenarios. Discretization and sampling-based algorithms have also been utilized to provide fast numerical computations in PE scenarios [52] [74] [26]. ...
Thesis
This dissertation studies adversarial conflicts among a group of agents moving in the plane, possibly among obstacles, where some agents are pursuers and others are evaders. The goal of the pursuers is to capture the evaders, where capture requires a pursuer to be either co-located with an evader, or in close proximity. The goal of the evaders is to avoid capture. These scenarios, where different groups compete to accomplish conflicting goals, are referred to as pursuit-evasion games, and the agents are called players. Games featuring one pursuer and one evader are analyzed using dominance, where a point in the plane is said to be dominated by a player if that player is able to reach the point before the opposing players, regardless of the opposing players' actions. Two generalizations of the Apollonius circle are provided. One solves games with environments containing obstacles, and the other provides an alternative solution method for the Homicidal Chauffeur game. Optimal pursuit and evasion strategies based on dominance are provided. One benefit of dominance analysis is that it extends to games with many players. Two foundational games are studied; one features multiple pursuers against a single evader, and the other features a single pursuer against multiple evaders. Both are solved using dominance through a reduction to single pursuer, single evader games. Another game featuring competing teams of pursuers is introduced, where an evader cooperates with friendly pursuers to rendezvous before being captured by adversaries. Next, the assumption of complete and perfect information is relaxed, and uncertainties in player speeds, player positions, obstacle locations, and cost functions are studied. The sensitivity of the dominance boundary to perturbations in parameters is provided, and probabilistic dominance is introduced. The effect of information is studied by comparing solutions of games with perfect information to games with uncertainty. Finally, a pursuit law is developed that requires minimal information and highlights a limitation of dominance regions. These contributions extend pursuit-evasion game theory to a number of games that have not previously been solved, and in some cases, the solutions presented are more amenable to implementation than previous methods.
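The classical Apollonius circle that the dissertation generalizes has a closed form. For an evader at e with speed v_e and a pursuer at p with speed v_p > v_e, the set of points the evader reaches first (under simple motion, constant speeds, no obstacles) is bounded by the circle below; a small sketch using the standard formula:

    import numpy as np

    def apollonius_circle(evader, pursuer, speed_ratio):
        # speed_ratio k = v_e / v_p, assumed < 1; the boundary is the locus
        # of points x with |x - evader| / |x - pursuer| = k
        e = np.asarray(evader, dtype=float)
        p = np.asarray(pursuer, dtype=float)
        k = speed_ratio
        center = (e - k**2 * p) / (1.0 - k**2)
        radius = k * np.linalg.norm(e - p) / (1.0 - k**2)
        return center, radius   # the evader dominates the circle's interior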
... The learning classifier system is a machine learning approach that evolves a group of if-then rules through evolutionary machine learning to solve practical learning problems, and it is general enough for a wide range of tasks [26][27][28][29][30]. In the fuzzy learning classifier system, an extension of the learning classifier system (LCS), classifiers are modeled as fuzzy rules and are applied to realize tactical behavior of robotic systems [31]. ...
Article
Full-text available
Unmanned autonomous vehicles for various civilian and military applications have become a particularly interesting research area. Despite their many potential applications, a related technological challenge is realizing realistic coordinated autonomous control and decision making in complex and multi-agent environments. Machine learning approaches have been largely employed in simplified simulations to acquire intelligent control systems in multi-agent settings. However, the complexity of the physical environment, unrealistic assumptions, and the lack of abstract physical environments derail the process of transition from simulation to real systems. This work presents a modular framework for automated data acquisition, training, and the evaluation of multiple unmanned surface vehicle controllers that facilitates prior knowledge integration and human-guided learning in a closed loop. To realize this, we first present a digital maritime environment of multiple unmanned surface vehicles that abstracts the real-world dynamics in our application domain. Then, a behavior-driven, artificial immune-inspired fuzzy classifier systems approach that is capable of optimizing agents' behaviors and action selection in a multi-agent environment is presented. Evaluation scenarios of different combat missions are presented to demonstrate the performance of the system. Simulation results show that the resulting controllers can achieve an average winning rate between 52% and 98% in all test cases, indicating the effectiveness of the proposed approach and its feasibility in realizing adaptive controllers for efficient cooperative decision-making among multiple unmanned systems. We believe that this system can facilitate the simulation, data acquisition, training, and evaluation of practical cooperative unmanned vehicle controllers in a closed loop.
... These approaches comprise methods within evolutionary computation (Bäck et al. 1997, Eiben and Smith 2003). Applications are diverse (Kicinger et al. 2005), with examples (and example citations) being electrical circuits (Koza et al. 1997), mechanical components (Deb and Goel 2001), software design (Salustowicz and Schmidhuber 1997), hardware (Lohn and Hornby 2006), economics (Holland and Miller 1991), and even combat maneuvers (Smith et al. 1999), music (Tokui 2000), and art (Bentley 1999). ...
Article
Full-text available
Evolutionary computational methods have adopted attributes of natural selection and evolution to solve problems in computer science, engineering, and other fields. The method is growing in use in zoology and ecology. Evolutionary principles may be merged with an agent-based modeling perspective to have individual animals or other agents compete. Four main categories are discussed: genetic algorithms, evolutionary programming, genetic programming, and evolutionary strategies. In evolutionary computation, a population is represented in a way that allows an objective function relevant to the problem of interest to be assessed. The poorest-performing members are removed from the population, and the remaining members reproduce and may be mutated. The fitness of the members is again assessed, and the cycle continues until a stopping condition is met. Case studies include optimizing egg shape given different clutch sizes, mate selection, migration of wildebeest, birds, and elk, vulture foraging behavior, algal bloom prediction, and species richness given energy constraints. Other case studies simulate the evolution of species and a means to project shifts in species ranges in response to a changing climate that includes competition and phenotypic plasticity. This introduction concludes by citing other uses of evolutionary computation and a review of the flexibility of the methods. For example, representing species' niche spaces subject to selective pressure allows studies on cladistics, the taxon cycle, neutral versus niche paradigms, fundamental versus realized niches, community structure and order of colonization, invasiveness, and responses to a changing climate.
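The cycle described above (assess, cull, reproduce, mutate, repeat) fits in a few lines of generic Python; fitness, crossover, and mutate are problem-specific callables supplied by the modeler:

    import random

    def evolve(population, fitness, crossover, mutate,
               cull_fraction=0.5, generations=100):
        for _ in range(generations):
            ranked = sorted(population, key=fitness, reverse=True)
            keep = max(2, int(len(ranked) * (1.0 - cull_fraction)))
            survivors = ranked[:keep]               # remove poorest members
            children = []
            while keep + len(children) < len(population):
                a, b = random.sample(survivors, 2)  # remaining members reproduce
                children.append(mutate(crossover(a, b)))
            population = survivors + children       # reassessed next iteration
        return max(population, key=fitness)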
... A prevalent example of learning air combat behavior is [14]. Smith et al. used a learning classifier system (LCS) to automatically generate fighter jet maneuvers for within visual range air combat. ...
... Learning air combat behavior is a non-trivial task, as air combat involves multiple agents, team behavior, and limited resources. Previous attempts at generating air combat behavior using machine learning have included learning classifier systems [10], behavior mining [11], and neuro-evolution [12]. However, such techniques create opaque behavior models that are hard to review after the learning process has completed. ...
... Smith et al. [16] aim at finding new strategies in 1-v-1 air combat. However, they do not deploy a cognitive model, which makes explainability, as well as the replication of human behaviour for effective training, troublesome. ...
Conference Paper
Full-text available
In this paper, a hybrid approach is advocated for learning the behavior of computer generated entities (CGEs) in a serious gaming setting. Here, an agent equipped with a cognitive model is used, but this agent is enhanced with Machine Learning (ML) capabilities. This allows the agent to exhibit human-like behavior while avoiding the need for an expert to define all parameters explicitly. More particularly, the ML approach utilizes co-evolution as a learning paradigm. An evaluation in the domain of one-versus-one air combat shows promising results.
Article
Although swarms of unmanned aerial vehicles have received much attention in the last few years, adversarial swarms (that is, competitive swarm-versus-swarm games) have been less well studied. In this paper, we demonstrate a deep reinforcement learning method to train a policy for fixed-wing aircraft agents that leverages hand-scripted tactics to exploit force-concentration advantages and within-team coordination opportunities, destroying as many opponent team members as possible while preventing teammates from being attrited. Agents using the policy network trained with the proposed method outperform teams utilizing only one of the handcrafted baseline tactics in N-vs-N engagements for N as small as two and as large as 64, as well as learner teams trained to vary their yaw-rate actions, even when the trained team's agents' sensor range and teammate partnership possibilities are constrained.
Article
A novel air combat algorithm is proposed, based on knowledge extracted from the experience of human pilots. First, to implement a fighter that maneuvers based on manual control, the maneuver form of the fighter is analyzed and represented as a block. Second, the blocks for each function are connected based on their relationships, and a flow diagram is presented according to the engagement situation of the adversary and ownship. Third, a behavior tree model is applied as a decision-making model to implement the flow diagram as a simulation program. The behavior tree offers good scalability because non-leaf nodes can be added when sophisticated and complex decision-making is required. The proposed method has the advantage of making all maneuvers performed by the algorithm understandable and interpretable. Additionally, it can replace expensive and dangerous dogfighting training for student pilots, because the proposed model can emulate maneuvers that manned pilots would perform. To verify the proposed method, the evaluation criteria from the AlphaDogfight Trials are applied equally in the simulation. The experimental results demonstrate that the proposed method has superior engagement capability compared to existing air-to-air combat models.
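A behavior tree of the kind the abstract describes composes readable leaf checks and maneuvers under selector/sequence nodes; a minimal sketch, with a generic node set that is not the paper's (running states and decorators are omitted):

    class Leaf:
        def __init__(self, fn): self.fn = fn
        def tick(self, situation): return self.fn(situation)

    class Sequence:
        def __init__(self, *children): self.children = children
        def tick(self, situation):
            # succeeds only if every child succeeds, evaluated in order
            return all(c.tick(situation) for c in self.children)

    class Selector:
        def __init__(self, *children): self.children = children
        def tick(self, situation):
            # tries children in order until one succeeds
            return any(c.tick(situation) for c in self.children)

Scalability comes from the composition: a new engagement case is handled by grafting another Sequence under the top-level Selector, leaving existing branches untouched.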
Article
Full-text available
To address uncertain enemy maneuvers during a UAV's autonomous air combat maneuver decision-making, this paper proposes an autonomous decision-making method that combines target maneuver command prediction with the deep deterministic policy gradient algorithm. The situation data of both sides of the air combat are effectively fused and processed, and the UAV's six-degree-of-freedom model and maneuver library are built. In air combat, the target generates its corresponding maneuver library instructions through the deep Q network algorithm; at the same time, our UAV produces target maneuver prediction results through a probabilistic neural network. A deep deterministic policy gradient reinforcement learning method that considers both the situation information of the two aircraft and the prediction results for the enemy aircraft is proposed, so that the UAV can choose the appropriate maneuver decision according to the current air combat situation. The simulation results show that the method can effectively use air combat situation information and target maneuver prediction information, improving the effectiveness of the reinforcement learning method for the UAV's autonomous air combat decision-making while ensuring convergence.
Article
In the air combat process, confrontation position is the critical factor determining the confrontation situation, attack effect, and escape probability of UAVs. Therefore, selecting the optimal confrontation position becomes the primary goal of maneuver decision-making. By taking the position as the UAV's maneuver strategy, this paper constructs the optimal confrontation position selecting games (OCPSGs) model. In the OCPSGs model, the payoff function of each UAV is defined by the difference between the comprehensive advantages of both sides, and the strategy space of each UAV at every step is defined by its accessible space as determined by its maneuverability. We then design the limit approximation of mixed strategy Nash equilibrium (LAMSNQ) algorithm, which provides a method to determine the optimal probability distribution over positions in the strategy space. In the simulation phase, we assume the motions in three directions are independent and the strategy space is a cuboid to simplify the model. Several simulations are performed to verify the feasibility, effectiveness and stability of the algorithm.
Chapter
In this paper, a UAV cluster confrontation decision-making algorithm based on two-layer intelligent optimization is proposed. Firstly, the single-UAV strategy set and the cluster strategy set are introduced, and the reward functions between clusters are constructed from height, speed, and angle. Secondly, according to the different cluster confrontation methods, the corresponding payoff matrix is calculated. Then, the overall cluster confrontation decision-making algorithm is proposed, along with an analysis of the Nash equilibrium solution in the first-layer intelligent optimization. Finally, a numerical experiment is presented to verify the effectiveness of the proposed algorithm. Based on this research into the two-layer intelligent decision-making algorithm, this paper effectively improves the cooperative decision-making ability of the UAV cluster and provides a theoretical basis for the intelligent development of information interaction between the UAV cluster and the single UAV.
Article
As a crucial technology of air-to-air confrontation, autonomous maneuver decision has attracted wide attention in recent years. This paper proposes an improved pigeon-inspired optimization method to realize autonomous maneuver decision for unmanned aerial vehicles (UAVs) rapidly and accurately in an aerial combat engagement. The maneuver library is designed, including some advanced offensive and defensive maneuvers. A dependent set of trial maneuvers is generated to help UAVs make decisions in any tactical situation, and a future engagement state of the opponent UAV is predicted for each trial maneuver. The core of the decision-making process is that the objective function to be optimized is designed using the game mixed strategy, and the optimal mixed strategy is obtained by the improved pigeon-inspired optimization. A comparative analysis with other classical optimization algorithms highlights the advantage of the proposed algorithm. The simulation tests are conducted under four different initial conditions, namely, neutral, offensive, opposite, and defensive conditions. The simulation results verify the effectiveness of the proposed autonomous maneuver decision method.
Article
In order to improve the performance of UAV’s autonomous maneuvering decision-making, this paper proposes a decision-making method based on situational continuity. The algorithm in this paper designs a situation evaluation function with strong guidance, then trains the Long Short-Term Memory (LSTM) under the framework of Deep Q Network (DQN) for air combat maneuvering decision-making. Considering the continuity between adjacent situations, the method takes multiple consecutive situations as one input of the neural network. To reflect the difference between adjacent situations, the method takes the difference of situation evaluation value as the reward of reinforcement learning. In different scenarios, the algorithm proposed in this paper is compared with the algorithm based on the Fully Neural Network (FNN) and the algorithm based on statistical principles respectively. The results show that, compared with the FNN algorithm, the algorithm proposed in this paper is more accurate and forward-looking. Compared with the algorithm based on the statistical principles, the decision-making of the algorithm proposed in this paper is more efficient and its real-time performance is better.
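The reward construction described above is potential-difference shaping: the scalar reward is the change in the situation-evaluation score between consecutive situations. A one-line sketch, with evaluate standing in for the paper's situation evaluation function:

    def shaped_reward(evaluate, situation, next_situation):
        # positive when the maneuver improved the assessed situation
        return evaluate(next_situation) - evaluate(situation)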
Preprint
Intelligent decision-making for the unmanned combat aerial vehicle (UCAV) has long been a challenging problem. Conventional search methods can hardly satisfy the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) methods can significantly shorten the decision time by using neural networks. However, the sparse reward problem limits convergence speed, and an artificial prior-experience reward can easily deviate from the optimal convergent direction of the original task, which raises great difficulties for RL air combat applications. In this paper, we propose a homotopy-based soft actor-critic method (HSAC) that addresses these problems by following the homotopy path between the original task with sparse reward and an auxiliary task with an artificial prior-experience reward. The convergence and feasibility of this method are also proved. To confirm the feasibility of our method, we first construct a detailed 3D air combat simulation environment for training RL-based methods, and we implement our method in both the attack horizontal flight UCAV task and the self-play confrontation task. Experimental results show that our method performs better than methods utilizing only the sparse reward or only the artificial prior-experience reward. The agent trained by our method can reach more than a 98.3% win rate in the attack horizontal flight UCAV task and an average 67.4% win rate when confronted with agents trained by the other two methods.
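The homotopy path between the two tasks can be read as a reward interpolation whose mixing coefficient slides from the dense auxiliary task toward the sparse original task over training; a hedged sketch (the schedule and weighting are assumptions, not HSAC's exact formulation):

    def homotopy_reward(r_sparse, r_prior, lam):
        # lam = 1.0 -> pure auxiliary (prior-experience) task,
        # lam = 0.0 -> pure original (sparse-reward) task
        return (1.0 - lam) * r_sparse + lam * r_prior

    def lam_schedule(step, total_steps):
        # simple linear path along the homotopy; other monotone paths work too
        return max(0.0, 1.0 - step / total_steps)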
Article
To solve the problem of realizing autonomous aerial combat decision-making for unmanned combat aerial vehicles (UCAVs) rapidly and accurately in an uncertain environment, this paper proposes a decision-making method based on an improved deep reinforcement learning (DRL) algorithm: the multi-step double deep Q-network (MS-DDQN) algorithm. First, a six-degree-of-freedom UCAV model based on an aircraft control system is established on a simulation platform, and the situation assessment functions of the UCAV and its target are established by considering their angles, altitudes, environments, missile attack performances, and UCAV performance. By controlling the flight path angle, roll angle, and flight velocity, 27 common basic actions are designed. On this basis, aiming to overcome the defects of traditional DRL in terms of training speed and convergence speed, the improved MS-DDQN method is introduced to incorporate the final return value into the previous steps. Finally, the pre-training learning model is used as the starting point for the second learning model to simulate the UCAV aerial combat decision-making process based on the basic training method, which helps to shorten the training time and improve the learning efficiency. The improved DRL algorithm significantly accelerates the training speed and estimates the target value more accurately during training, and it can be applied to aerial combat decision-making.
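A hedged sketch of the multi-step double-DQN target the abstract alludes to ("incorporate the final return value into the previous steps"): an n-step discounted return plus a double-DQN bootstrap, where the online network selects the action and the target network evaluates it. Details may differ from MS-DDQN as published:

    import numpy as np

    def ms_ddqn_target(rewards, final_state, done, q_online, q_target,
                       gamma=0.99):
        n = len(rewards)
        g = sum(gamma**i * r for i, r in enumerate(rewards))  # n-step return
        if not done:
            a_star = int(np.argmax(q_online(final_state)))    # online selects
            g += gamma**n * q_target(final_state)[a_star]     # target evaluates
        return g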
Chapter
Chapter 15 describes some applications of machine learning methods other than neural networks to computational mechanics. They cover most categories discussed in Chaps. 8 through 13. Topics given here include the parameter identification of constitutive model using evolutionary algorithm (Sect. 15.1), the construction of constitutive model using genetic programming (Sect. 15.2), the data-driven analysis (Sect. 15.3), the numerical quadrature using symbolic manipulation (Sect. 15.4), the contact search using genetic algorithm (Sect. 15.5), the contact search using genetic programming (Sect. 15.6), the non-linear equation systems solved with genetic algorithm (Sect. 15.7), and other applications using various machine learning methods (Sects. 15.8–15.10).
Article
Full-text available
With the continuous development of UAV technology, the trend of using UAVs on the military battlefield is increasingly obvious, but the autonomous air combat capability of UAVs needs to be further improved. The air combat maneuvering decision is the key link in realizing UAV autonomous air combat, and the genetic algorithm has good robustness and global searching ability, which makes it suitable for solving large-scale optimization problems. This paper uses an improved genetic algorithm to model UAV air combat maneuvering decisions. Based on engineering application requirements, a typical simulation test scenario is established. The simulation results show that the air combat maneuvering decision model based on the reinforcement genetic algorithm in this paper can obtain the correct maneuvering decision sequence and gain a position advantage in combat.
Article
The present paper describes a method to enhance the capability, or broaden the scope, of computational mechanics by using deep learning, which is one of the machine learning methods and is based on the artificial neural network. The method utilizes deep learning to extract rules inherent in a computational mechanics application, which are usually implicit and sometimes too complicated to grasp from the large amount of available data. A new method of numerical quadrature for FEM stiffness matrices is developed using the proposed method, where a kind of optimized quadrature rule, superior in accuracy to the standard Gauss-Legendre quadrature, is obtained on an element-by-element basis. The detailed formulation of the proposed method is given with the sample application above, and an acceleration technique for the proposed method is discussed.
Chapter
In this chapter we will discuss a selection of application areas in which natural computation shows its value in real-world enterprises. For the purposes of demonstrating the significant impact and potential of natural computation in practice, there is certainly no shortage of documented examples that could be selected. We present just ten applications, ranging from specific problems to specific domains, and ranging from cases familiar to the authors to highlights known well in the general natural computation community. Each displays the proven promise or great potential of nature-inspired computation in high-profile and important real-world applications, and we hope that these applications inspire both students and practitioners.
Chapter
This chapter reports the authors' ongoing experience with a system for discovering novel fighter combat maneuvers, using a genetics-based machine learning process and combat simulation. Despite the difficulties often experienced with LCSs, this complex, real-world application has proved very successful. In effect, the adaptive system is taking the place of a test pilot, discovering complex maneuvers from experience. The goal of this work is distinct from that of many other studies, in that innovation, and the discovery of novelty, is in itself valuable. This makes the details of aims and techniques somewhat distinct from other LCSs.
Article
Full-text available
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Article
Full-text available
This paper selectively surveys contributions to major topics in pattern recognition since 1968. Representative books and surveys on pattern recognition published during this period are listed. Theoretical models for automatic pattern recognition are contrasted with practical design methodology. Research contributions to statistical and structural pattern recognition are selectively discussed, including contributions to error estimation and the experimental design of pattern classifiers. The survey concludes with a representative set of applications of pattern recognition technology.
Article
Full-text available
In many classifier systems, the classifier strength parameter serves as a predictor of future payoff and as the classifier's fitness for the genetic algorithm. We investigate a classifier system, XCS, in which each classifier maintains a prediction of expected payoff, but the classifier's fitness is given by a measure of the prediction's accuracy. The system executes the genetic algorithm in niches defined by the match sets, instead of panmictically. These aspects of XCS result in its population tending to form a complete and accurate mapping X × A ⇒ P from inputs and actions to payoff predictions. Further, XCS tends to evolve classifiers that are maximally general subject to an accuracy criterion. Besides introducing a new direction for classifier system research, these properties of XCS make it suitable for a wide range of reinforcement learning situations where generalization over states is desirable. Key words: classifier systems, strength, fitness, accuracy, mapping, generalization.
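XCS's accuracy-based fitness can be sketched from the standard formulation: each classifier tracks a payoff prediction p and a prediction error eps, and accuracy decays sharply once the error exceeds a tolerance eps0. Constants and update order follow common XCS write-ups and are illustrative:

    def xcs_accuracy_update(cl, payoff, beta=0.2, eps0=10.0, alpha=0.1, nu=5.0):
        cl["eps"] += beta * (abs(payoff - cl["p"]) - cl["eps"])  # error estimate
        cl["p"] += beta * (payoff - cl["p"])                     # payoff prediction
        # accuracy kappa: 1 inside the tolerance, steep power-law falloff outside
        cl["kappa"] = 1.0 if cl["eps"] < eps0 else alpha * (cl["eps"] / eps0) ** (-nu)
        return cl

Fitness is then moved toward each classifier's accuracy relative to the other classifiers in its action set, which is what pushes XCS toward accurate, maximally general rules.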
Article
Full-text available
A basic classifier system, ZCS, is presented which keeps much of Holland's original framework but simplifies it to increase understandability and performance. ZCS's relation to Q-learning is brought out, and their performances compared in environments of two difficulty levels. Extensions to ZCS are proposed for temporary memory, better action selection, more efficient use of the genetic algorithm, and more general classifier representation. Key words: classifier systems, Q-learning, temporary memory, action selection, restricted mating, s-classifiers, genetic programming.
Article
Full-text available
In this paper, we explore the use of genetic algorithms (GAs) as a key element in the design and implementation of robust concept learning systems. We describe and evaluate a GA-based system called GABIL that continually learns and refines concept classification rules from its interaction with the environment. The use of GAs is motivated by recent studies showing the effects of various forms of bias built into different concept learning systems, resulting in systems that perform well on certain concept classes (generally, those well matched to the biases) and poorly on others. By incorporating a GA as the underlying adaptive search mechanism, we are able to construct a concept learning system that has a simple, unified architecture with several important features. First, the system is surprisingly robust even with minimal bias. Second, the system can be easily extended to incorporate traditional forms of bias found in other concept learning systems. Finally, the architecture of the system ...
Article
Full-text available
This paper reports the authors' ongoing experience with a system for discovering novel fighter combat maneuvers, using a genetics-based machine learning process, and combat simulation. In effect, the genetic learning system in this application is taking the place of a test pilot, in discovering complex maneuvers from experience. The goal of this work is distinct from that of many other studies, in that innovation, and discovery of novelty, is, in itself, valuable. This makes the details of aims and techniques somewhat distinct from other genetics-based machine learning research. The paper discusses the details of the application, and the motivations and details of the techniques employed. Results are presented for systems where one player adapts to a fixed-strategy opponent, and where two players co-adapt. General implications of this work for other adaptive behavior applications are discussed.
Article
This paper reports on a project where a genetics-based machine learning system acquired rules for novel fighter combat maneuvers through simulation. In this project, a genetic machine learning system was implemented to generate high angle-of-attack air combat tactics for the NASA X-31 research aircraft. The Genetic Learning System (GLS), based on a learning classifier system approach, employed a digital simulation model of one-on-one air combat and a Genetic Algorithm to develop effective tactics for the X-31. The resulting maneuvers allowed the X-31 to successfully exploit its post-stall capabilities against a conventional fighter opponent, demonstrating the ability of the GLS to discover novel tactics in a dynamic air combat environment. Moreover, the project demonstrates how genetic machine learning can acquire rules that implement novel approaches to unforeseen problems via experience with simulations.
Article
The DARPA/AFRL 'Moving and Stationary Target Acquisition and Recognition' (MSTAR) program is developing a model-based vision approach to Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR). The motivation for this work is to develop a high performance ATR capability that can identify ground targets in highly unconstrained imaging scenarios that include variable image acquisition geometry, arbitrary target pose and configuration state, differences in target deployment situation, and strong intra-class variations. The MSTAR approach utilizes radar scattering models in an on-line hypothesize-and-test operation that compares predicted target signature statistics with features extracted from image data in an attempt to determine a 'best fit' explanation of the observed image. Central to this processing paradigm is the Search algorithm, which provides intelligent control in selecting features to measure and hypotheses to test, as well as in making the decision about when to stop processing and report a specific target type or clutter. Intelligent management of computation performed by the Search module is a key enabler to scaling the model-based approach to the large hypothesis spaces typical of realistic ATR problems. In this paper, we describe the present state of design and implementation of the MSTAR Search engine, as it has matured over the last three years of the MSTAR program. The evolution has been driven by a continually expanding problem domain that now includes 30 target types, viewed under arbitrary squint/depression, with articulations, reconfigurations, revetments, variable background, and up to 30% blocking occlusion. We believe that the research directions that have been inspired by MSTAR's challenging problem domain are leading to broadly applicable search methodologies that are relevant to computer vision systems in many areas.
Article
Testing a SAR Automatic Target Recognition (ATR) algorithm at or very near its training conditions often yields near-perfect results, as we commonly see in the literature. This paper describes a series of experiments near, and not so near, ATR algorithm training conditions. Experiments are set up to isolate individual Extended Operating Conditions (EOCs), and performance is reported at these points. Additional experiments are set up to isolate specific combinations of EOCs, and the SAR ATR algorithm's performance is measured there also. The experiments presented here are a by-product of a DARPA/AFRL Moving and Stationary Target Acquisition and Recognition (MSTAR) program evaluation conducted in November 1997. Although the tests conducted here are in the domain of EOCs, they do not encompass the real-world problem (i.e., what you might see on the battlefield). In addition to performance results, this paper describes an evaluation methodology, including the Extended Operating Condition concept, as well as data, the algorithm, and figures of merit. In summary, this paper highlights the sensitivity that a baseline Mean Squared Error (MSE) ATR algorithm has to various operating conditions both near and at varying degrees away from the training conditions.
Conference Paper
GAs have proven effective on a broad range of search problems. However, when each population member's fitness evaluation is computationally expensive, the prospect of evaluating an entire population can prohibit use of the GA. This paper examines a GA that overcomes this difficulty by evaluating only a portion of the population. The remainder of the population has its fitness assigned by inheritance. Theoretical arguments justify this approach. An application to a GA-easy problem shows that greater efficiency can be obtained by evaluating only a small portion of the population. A real-world search problem confirms these results. Implications and future directions are discussed.
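The inheritance scheme can be sketched directly from the description: only a sampled fraction of each generation receives a true (expensive) evaluation, and the rest inherit, for example, the mean of their parents' fitnesses. The fraction and the averaging rule here are illustrative, not the paper's exact settings:

    import random

    def assign_fitness(children, parents_of, evaluate, eval_fraction=0.3):
        for child in children:
            if random.random() < eval_fraction:
                child.fitness = evaluate(child)       # expensive true evaluation
            else:
                pa, pb = parents_of[child]            # inherit from parents
                child.fitness = 0.5 * (pa.fitness + pb.fitness)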
Book
Genetic algorithms are playing an increasingly important role in studies of complex adaptive systems, ranging from adaptive agents in economic theory to the use of machine learning techniques in the design of complex devices such as aircraft turbines and integrated circuits. Adaptation in Natural and Artificial Systems is the book that initiated this field of study, presenting the theoretical foundations and exploring applications. In its most familiar form, adaptation is a biological process, whereby organisms evolve by rearranging genetic material to survive in environments confronting them. In this now classic work, Holland presents a mathematical model that allows for the nonlinearity of such complex interactions. He demonstrates the model's universality by applying it to economics, physiological psychology, game theory, and artificial intelligence, and then outlines the way in which this approach modifies the traditional views of mathematical genetics. Initially applying his concepts to simply defined artificial systems with limited numbers of parameters, Holland goes on to explore their use in the study of a wide range of complex, naturally occurring processes, concentrating on systems having multiple factors that interact in nonlinear ways. Along the way he accounts for major effects of coadaptation and coevolution: the emergence of building blocks, or schemata, that are recombined and passed on to succeeding generations to provide innovations and improvements.
Article
David Goldberg's Genetic Algorithms in Search, Optimization and Machine Learning is by far the bestselling introduction to genetic algorithms. Goldberg is one of the preeminent researchers in the field--he has published over 100 research articles on genetic algorithms and is a student of John Holland, the father of genetic algorithms--and his deep understanding of the material shines through. The book contains a complete listing of a simple genetic algorithm in Pascal, which C programmers can easily understand. The book covers all of the important topics in the field, including crossover, mutation, classifier systems, and fitness scaling, giving a novice with a computer science background enough information to implement a genetic algorithm and describe genetic algorithms to a friend.
Article
We consider "competitive coevolution," in which fitness is based on direct competition among individuals selected from two independently evolving populations of "hosts" and "parasites." Competitive coevolution can lead to an "arms race," in which the two populations reciprocally drive one another to increasing levels of performance and complexity. We use the games of Nim and 3-D Tic-Tac-Toe as test problems to explore three new techniques in competitive coevolution. "Competitive fitness sharing" changes the way fitness is measured; "shared sampling" provides a method for selecting a strong, diverse set of parasites; and the "hall of fame" encourages arms races by saving good individuals from prior generations. We provide several different motivations for these methods and mathematical insights into their use. Experimental comparisons are done, and a detailed analysis of these experiments is presented in terms of testing issues, diversity, extinction, arms race progress measurements, and drift.
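A minimal sketch of one of the techniques described, competitive fitness sharing, under the simplifying assumption that each host-parasite game returns a boolean win/loss; the interface is illustrative, not the authors' code. A win against a parasite is worth 1/(number of hosts that also beat it), so defeating rarely-beaten parasites earns more credit.

```python
def shared_fitness(hosts, parasites, beats):
    """Competitive fitness sharing over a host-vs-parasite tournament.

    beats(host, parasite) -> True if host defeats parasite (assumed interface).
    Returns a list of shared-fitness values, one per host.
    """
    fitness = [0.0] * len(hosts)
    for p in parasites:
        winners = [i for i, h in enumerate(hosts) if beats(h, p)]
        for i in winners:
            fitness[i] += 1.0 / len(winners)  # credit for p is shared among its conquerors
    return fitness

# Hall of fame: retain each generation's champion so that later individuals
# are also tested against strong opponents from the past, encouraging arms races.
hall_of_fame = []
```

Shared sampling, the third technique, would then pick parasites that defeat many current hosts as the next generation's test set.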
Article
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur; those which are accompanied or closely followed by discomfort to the animal will, other things being equal, have their connections with that situation weakened, so that, when it recurs, they will be less likely to occur. The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond. (Thorndike, 1911) The idea of learning to make appropriate responses based on reinforcing events has its roots in early psychological theories such as Thorndike's "law of effect" (quoted above). Although several important contributions were made in the 1950s, 1960s and 1970s by illustrious luminaries such as Bellman, Minsky, Klopf and others (Farley and Clark, 1954; Bellman, 1957; Minsky, 1961; Samuel, 1963; Michie and Chambers, 1968; Grossberg, 1975; Klopf, 1982), the last two decades have witnessed perhaps the strongest advances in the mathematical foundations of reinforcement learning, in addition to several impressive demonstrations of the performance of reinforcement learning algorithms in real world tasks. The introductory book by Sutton and Barto, two of the most influential and recognized leaders in the field, is therefore both timely and welcome. The book is divided into three parts. In the first part, the authors introduce and elaborate on the essential characteristics of the reinforcement learning problem, namely, the problem of learning "policies" or mappings from environmental states to actions so as to maximize the amount of "reward"
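As a toy illustration of the policy-learning problem the book formalizes (not an algorithm taken from this review), the sketch below implements tabular Q-learning in Python; the environment interface is an assumption made for illustration.

```python
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn a state->action value table whose greedy policy maximizes reward.

    Assumed toy interface: env.reset() -> state,
    env.step(state, action) -> (next_state, reward, done), env.actions is a list.
    """
    Q = {}
    def q(s, a):
        return Q.get((s, a), 0.0)

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection balances exploration and exploitation
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda x: q(s, x))
            s2, r, done = env.step(s, a)
            best_next = max(q(s2, x) for x in env.actions)
            # one-step temporal-difference update toward the bootstrapped target
            Q[(s, a)] = q(s, a) + alpha * (r + gamma * best_next - q(s, a))
            s = s2
    return Q
```

The greedy policy read off the table, pi(s) = argmax_a Q(s, a), is exactly the state-to-action mapping the review describes.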
Article
http://vislab.ucr.edu/PUBLICATIONS/pubs/Journal%20and%20Conference%20Papers/before10-1-1997/Journals/1986/AutomatictargetRecognition86.pdf
In this paper a review of the techniques used to solve the automatic target recognition (ATR) problem is given. Emphasis is placed on algorithmic and implementation approaches. ATR algorithms such as target detection, segmentation, feature computation, classification, etc. are evaluated and several new quantitative criteria are presented. Evaluation approaches are discussed and various problems encountered in the evaluation of algorithms are addressed. Strategies used in database design are outlined. New techniques such as the use of contextual cues, semantic and structural information, hierarchical reasoning in the classification and incorporation of multisensors in ATR systems are also presented.
Article
This paper reviews statistical, adaptive, and heuristic techniques used in laboratory investigations of pattern recognition problems. The discussion includes correlation methods, discriminant analysis, maximum likelihood decisions, minimax techniques, perceptron-like algorithms, feature extraction, preprocessing, clustering, and nonsupervised learning. Two-dimensional distributions are used to illustrate the properties of the various procedures. Several experimental projects, representative of prospective applications, are also described.
Article
The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models. BNT is widely used in teaching and research: the web page has received over 28,000 hits since May 2000. In this paper, we discuss a broad spectrum of issues related to graphical models (directed and undirected), and describe, at a high-level, how BNT was designed to cope with them all. We also compare BNT to other software packages for graphical models, and to the nascent OpenBayes effort.
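Since BNT itself is a Matlab package, the sketch below instead illustrates in Python the kind of computation such a toolbox automates: exact inference by enumeration in a toy two-node directed model. The model and numbers are invented for illustration and do not reflect BNT's API.

```python
# Toy directed model P(A) * P(B | A); all probabilities are illustrative.
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {True: {True: 0.9, False: 0.1},
               False: {True: 0.2, False: 0.8}}

def joint(a, b):
    """Joint probability factorized along the directed graph."""
    return P_A[a] * P_B_GIVEN_A[a][b]

def posterior_a_given_b(b_obs):
    """Exact inference by enumeration: P(A | B = b_obs)."""
    unnorm = {a: joint(a, b_obs) for a in (True, False)}
    z = sum(unnorm.values())  # normalizing constant P(B = b_obs)
    return {a: p / z for a, p in unnorm.items()}

print(posterior_a_given_b(True))  # P(A=True | B=True) ~= 0.66
```

Packages like BNT generalize this pattern to many node types and to structured exact and approximate inference engines, which is why enumeration by hand stops scaling almost immediately.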
Multi-system integrated control (MuSIC) program, final report
  • P M Doane
  • C H Gay
  • J A Fligg
P.M. Doane, C.H. Gay, J.A. Fligg, Multi-system integrated control (MuSIC) program, final report, Technical Report, Wright Laboratories, Wright-Patterson AFB, OH, 1989.
God save the red queen!: competition in co-evolutionary robotics
  • D Floreano
  • S Nolfi
D. Floreano, S. Nolfi, God save the red queen!: competition in co-evolutionary robotics, in: Proceedings of the Second International Conference on Genetic Programming, 1997, pp. 398–406.
Machine learning in exploitation, in: Synthetic Aperture Radar Imagery XII
  • J Gilmore
J. Gilmore, Machine learning in exploitation, in: Synthetic Aperture Radar Imagery XII, SPIE vol. 5808, Orlando, FL, May 2005, pp. 337–344.
MSTAR extended operating conditions: a tutorial, in: Algorithms for Synthetic Aperture Radar Imagery IV
  • E R Keydel
  • S W Lee
  • J T Moore
E.R. Keydel, S.W. Lee, J.T. Moore, MSTAR extended operating conditions: a tutorial, in: Algorithms for Synthetic Aperture Radar Imagery IV, SPIE vol. 2757, Orlando, FL, April 1997, pp. 228–242.
Characterization of ATR systems, in: Algorithms for Synthetic Aperture Radar Imagery IV
  • E Zelnio
  • F Garber
  • L Westerkamp
  • S Worrell
  • J Westerkamp
  • M Jarratt
  • C Deardork
  • P Ryan
E. Zelnio, F. Garber, L. Westerkamp, S. Worrell, J. Westerkamp, M. Jarratt, C. Deardork, P. Ryan, Characterization of ATR systems, in: Algorithms for Synthetic Aperture Radar Imagery IV, SPIE vol. 3070, Orlando, FL, July 1997, pp. 223–234.
Extensibility and other model-based ATR evaluation concepts, in: Algorithms for Synthetic Aperture Radar Imagery IV
  • T Ross
  • L Westerkamp
  • E Zelnio
  • T J Burns
T. Ross, L. Westerkamp, E. Zelnio, T.J. Burns, Extensibility and other model-based ATR evaluation concepts, in: Algorithms for Synthetic Aperture Radar Imagery IV, SPIE vol. 3070, Orlando, FL, July 1997, pp. 213–222.