Article
PDF available

Research on Air Combat Maneuver Decision-Making Method Based on Reinforcement Learning


Abstract and Figures

With the development of information technology, the degree of intelligence in air combat is increasing, and the demand for automated intelligent decision-making systems is becoming more intense. Based on the characteristics of over-the-horizon air combat, this paper constructs an over-the-horizon air combat training environment, which includes aircraft modeling, air combat scenario design, enemy aircraft strategy design, and reward and punishment signal design. To improve the efficiency with which the reinforcement learning algorithm explores the strategy space, this paper proposes a heuristic Q-Network method that integrates expert experience, using expert experience as a heuristic signal to guide the search process and combining heuristic exploration with random exploration. For the over-the-horizon air combat maneuver decision problem, the heuristic Q-Network method is adopted to train the neural network model in the over-the-horizon air combat training environment. Through continuous interaction with the environment, self-learning of the air combat maneuver strategy is realized. The efficiency of the heuristic Q-Network method and the effectiveness of the resulting air combat maneuver strategy are verified by simulation experiments.
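A minimal sketch of the kind of action selection the abstract describes is given below in Python, assuming a discrete maneuver set, a trained `q_network` that returns Q-values, and a hypothetical `expert_policy` that stands in for the expert experience; the mixing probabilities are illustrative, not values from the paper.

```python
import random
import numpy as np

def select_action(state, q_network, expert_policy, eps_expert=0.2, eps_random=0.1):
    """Hybrid exploration: expert-guided, random, or greedy w.r.t. the Q-network.

    Assumed interfaces (not from the paper): `q_network(state)` returns a vector of
    Q-values over the discrete maneuver set, and `expert_policy(state)` returns the
    maneuver index suggested by the expert rule base.
    """
    q_values = np.asarray(q_network(state))
    u = random.random()
    if u < eps_expert:
        return expert_policy(state)             # heuristic exploration: follow the expert signal
    if u < eps_expert + eps_random:
        return random.randrange(len(q_values))  # random exploration over the maneuver set
    return int(np.argmax(q_values))             # exploitation: greedy action under the Q-network
```

Annealing `eps_expert` and `eps_random` toward zero over training would gradually hand control back to the learned Q-network once it matures; whether the paper does this is not stated in the abstract.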
... The intelligence of UAV flight is becoming the most significant development trend [7]. Therefore, intelligent decision-making methods have also been considered, including Bayesian networks [8,9], search algorithms [10][11][12], and reinforcement learning [13][14][15][16][17]. A Bayesian network was used to make situation assessments or decision predictions to realize maneuver decisions [8,9]. ...
... A Monte Carlo tree search and a symbiotic organisms search algorithm were used to find the optimal decision depending on a dominant function [10][11][12]. A reinforcement learning method was proposed to find the optimal maneuver decision according to reward functions [13][14][15][16][17]. Although these methods have been applied successfully under certain conditions, they do not consider that the current decision is heavily related to the historical situations in the decision-making process. ...
... When the predicted value and the real value of a sample satisfy Eq. (12), the prediction for that sample is considered accurate. The accuracy of the model can then be obtained from Eq. (13). ...
Article
Full-text available
In this paper, a hybrid deep learning network-based model is proposed and implemented for maneuver decision-making in an air combat environment. The model consists of a stacked sparse auto-encoder network for dimensionality reduction of high-dimensional, dynamic time-series combat-related data and a long short-term memory network for capturing the quantitative relationship between the maneuver control variables and the reduced time-series combat-related data. This model features: 1) time-series data is used as the basis of decision-making, which is more in line with the actual decision-making process; 2) a stacked sparse auto-encoder network is used to reduce the dimension of the time-series data so that predictions are more accurate; 3) the model takes the maneuver control variables as its output to control the maneuver, making the maneuver process more flexible. The relevant experiments have demonstrated that the proposed model can effectively improve the prediction accuracy and convergence rate in the prediction of maneuver control variables.
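As a hedged illustration of the architecture just described, the following PyTorch sketch places an auto-encoder's encoder half in front of an LSTM; all layer sizes, the three output control variables, and the omission of the sparsity penalty (normally imposed during auto-encoder pre-training) are assumptions for brevity, not details from the paper.

```python
import torch
import torch.nn as nn

class SAELSTMManeuverModel(nn.Module):
    """Sketch of the hybrid SAE + LSTM architecture (dimensions are illustrative).

    A stacked (sparse) auto-encoder compresses each high-dimensional combat state
    vector; an LSTM then maps the compressed time series to maneuver control variables.
    """
    def __init__(self, in_dim=60, code_dim=16, hidden_dim=64, n_controls=3):
        super().__init__()
        self.encoder = nn.Sequential(            # encoder half of the stacked auto-encoder
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, code_dim), nn.ReLU(),
        )
        self.lstm = nn.LSTM(code_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_controls)   # hypothetical maneuver control outputs

    def forward(self, x):                         # x: (batch, time, in_dim)
        b, t, d = x.shape
        z = self.encoder(x.reshape(b * t, d)).reshape(b, t, -1)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])              # control variables for the next step
```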
... Reinforcement learning is a learning method that uses "trial and error" to interact with the environment [29], and it is a feasible method for autonomous decision-making of a UAV's intelligent air combat maneuver strategy. The application of reinforcement learning in air combat is mainly based on value-function search [30,31] and strategy search [32,33]. The deep Q-network (DQN) algorithm is improved in [32] to realize UAV close-range one-to-one air combat, but the algorithm uses discrete state and action spaces, which makes the air combat results quite different from reality. ...
... The application of reinforcement learning in air combat is mainly based on value-function search [30,31] and strategy search [32,33]. The deep Q-network (DQN) algorithm is improved in [32] to realize UAV close-range one-to-one air combat, but the algorithm uses discrete state and action spaces, which makes the air combat results quite different from reality. The actor-critic (A-C) framework is used in [34] to realize the continuous expression of the UAV maneuver strategy in state space, but the algorithm is only effective in two-dimensional space. ...
Article
Full-text available
Unmanned aerial vehicles (UAVs) have become significantly important in air combat, where intelligent swarms of UAVs will be able to tackle tasks of high complexity and dynamics. The key to empowering UAVs with such capability is autonomous maneuver decision-making. In this paper, an autonomous maneuver strategy for UAV swarms in beyond-visual-range air combat based on reinforcement learning is proposed. First, based on the process of air combat and the constraints of the swarm, the motion model of the UAV and the multi-to-one air combat model are established. Second, a two-stage maneuver strategy based on air combat principles is designed, which includes inter-vehicle collaboration and target-vehicle confrontation. Then, a swarm air combat algorithm based on the deep deterministic policy gradient (DDPG) strategy is proposed for online strategy training. Finally, the effectiveness of the proposed algorithm is validated by multi-scene simulations. The results show that the algorithm is suitable for UAV swarms of different scales.
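For readers unfamiliar with DDPG, a generic single update step is sketched below in PyTorch; it shows the standard critic/actor/target-network pattern the abstract refers to, not the paper's exact implementation, and the batch layout and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG update step (generic sketch, not the paper's code).

    `batch` is assumed to hold tensors (s, a, r, s2, done) sampled from a replay buffer.
    """
    s, a, r, s2, done = batch
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)       # fit the critic to the TD target
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()             # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, targ in ((actor, actor_targ), (critic, critic_targ)):   # Polyak averaging
        for p, pt in zip(net.parameters(), targ.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)
```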
... Value-based reinforcement learning methods cannot deal with the problem of continuous action spaces [12][13][14][15]. Lillicrap combined the deterministic policy gradient algorithm [16] with the actor-critic framework and proposed the deep deterministic policy gradient (DDPG) algorithm to address continuous state space and continuous action space problems [17]. ...
Article
Full-text available
With the rapid development of unmanned combat aerial vehicle (UCAV)-related technologies, UCAVs are playing an increasingly important role in military operations. It has become an inevitable trend in the development of future air combat battlefields that UCAVs complete air combat tasks independently to acquire air superiority. In this paper, the UCAV maneuver decision problem in continuous action space is studied based on the deep reinforcement learning strategy optimization method. A UCAV platform model with a continuous action space is established. Focusing on the insufficient exploration ability of the Ornstein–Uhlenbeck (OU) exploration strategy in the deep deterministic policy gradient (DDPG) algorithm, a heuristic DDPG algorithm is proposed by introducing a heuristic exploration strategy, and a UCAV air combat maneuver decision method based on the heuristic DDPG algorithm is then developed. The superior performance of the algorithm is verified by comparison with different algorithms in a test environment, and the effectiveness of the decision method is verified by simulating air combat tasks with different difficulties and attack modes.
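One plausible reading of a "heuristic exploration strategy" for continuous actions is sketched below; `heuristic_action_fn`, the mixing probability, and the noise scale are hypothetical stand-ins, since the paper's exact heuristic is not reproduced here.

```python
import numpy as np

def heuristic_explore(state, actor, heuristic_action_fn,
                      p_heuristic=0.3, sigma=0.1, act_low=-1.0, act_high=1.0):
    """Continuous-action exploration mixing a heuristic with Gaussian noise (sketch only).

    `actor(state)` is assumed to return the deterministic policy's action as a NumPy
    array; `heuristic_action_fn(state)` is a hypothetical expert maneuver rule.
    """
    a = actor(state)                                     # deterministic policy output
    if np.random.rand() < p_heuristic:
        a = heuristic_action_fn(state)                   # heuristic-guided exploration
    else:
        a = a + np.random.normal(0.0, sigma, size=np.shape(a))  # random exploration
    return np.clip(a, act_low, act_high)                 # respect actuator limits
```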
... Different algorithms suit different abilities. For instance, a neural-network-based RL algorithm may be better suited for lower-level tactical execution, such as platform maneuvering and control (Zhang et al., 2018), while a rule-based RL algorithm can be suited for more high-level strategic and tactical decision-making. If the task is related to obtaining good situational awareness or understanding, other algorithms come into play, such as deep learning or Bayesian networks (Chakraborty et al., 2017). ...
Conference Paper
Full-text available
In military modeling and simulation (M&S) there is an increasing need for Computer Generated Forces (CGFs) with machine learning capabilities for use in training or decision support applications. Machine learning based CGFs have benefits for the implementation of adaptive enemies for human-in-the-loop training; the development and evaluation of TTPs (tactics, techniques and procedures) for military platforms; or supporting course of action (CoA) planning and analysis in mission simulation. Machine learning introduces a radically new paradigm for behavior modeling for CGFs. When modeling a behavioral task for a CGF, rather than handcrafting individual decisions and actions, learning algorithms allow for merely specifying the underlying goal of the task and leaving it up to the algorithm to learn how to achieve this goal. In this paper we explore the implications of this paradigm shift for the future development of CGFs. Underlying the paper is our vision that within five to ten years, military operators can independently employ machine learning for CGFs to enhance their real-world operations. Our contribution to the field is threefold. First, we describe the benefits of CGFs with learning capabilities by reviewing several application areas in the M&S domain. Second, we describe the primary challenges to the exploitation of machine learning capabilities for CGFs in terms of gaps in industry practices, impacts on the process of behavior modeling and deployment in M&S ecosystems. Finally, we report on our experiences with taking a pragmatic approach to overcoming some of these challenges by implementing and demonstrating trainable and reusable CGF behaviors within a conventional CGF behavior modeling tool. The ideas in this paper have been implemented in a military simulation system geared towards the air-to-air combat domain. Based on this implementation, challenges, lessons learned and future directions are discussed.
Chapter
Autonomous maneuver decision-making is an essential issue for an unmanned combat air vehicle (UCAV) during air combat. To compensate for the insufficiency of current studies, a decision-making framework is developed to achieve autonomous maneuvering of the UCAV in confrontation combat. First, a situation assessment model that takes missile performance into consideration is described. Then, a competitive CMA-ES variant with anisotropic eigenvalue adaptation and a local search strategy (AEALSCE) is employed as the solver to optimize the UCAV control variables and obtain the optimal maneuver action. Experimental studies are carried out to compare different maneuver decision-making methods adopted by a UCAV and its target under equal conditions. The simulation results show that the UCAV wins the confrontation games, which demonstrates the validity of the maneuver decision-making model and the superiority of the decision-making method proposed in this paper.
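The decision step described above amounts to an evolutionary search over the control variables at each decision instant; a minimal sketch using the standard `cma` package (plain CMA-ES, standing in for the AEALSCE variant) is shown below, with a placeholder objective because the chapter's situation assessment model is not reproduced here.

```python
import cma  # standard CMA-ES (pycma); a stand-in for the AEALSCE variant in the chapter

def negative_situation_score(u):
    """Hypothetical objective: maps a control vector u (e.g. load factors and roll)
    to the negated situation-assessment score, so that minimization gives the best
    maneuver. A dummy quadratic stands in for the real assessment model here."""
    return float(sum(x * x for x in u))

def optimize_controls(u0, sigma0=0.3):
    # Ask/tell loop: sample candidate control vectors, evaluate, adapt the search distribution.
    es = cma.CMAEvolutionStrategy(u0, sigma0)
    while not es.stop():
        candidates = es.ask()
        es.tell(candidates, [negative_situation_score(u) for u in candidates])
    return es.result.xbest  # best control variables found

# Illustrative call with a hypothetical three-variable control vector:
# best_u = optimize_controls([0.0, 0.0, 0.0])
```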
Article
The high operational cost of aircraft, limited availability of air space, and strict safety regulations make training of fighter pilots increasingly challenging. By integrating Live, Virtual, and Constructive simulation resources, efficiency and effectiveness can be improved. In particular, if constructive simulations, which provide synthetic agents operating synthetic vehicles, were used to a higher degree, complex training scenarios could be realised at low cost, the need for support personnel could be reduced, and training availability could be improved. In this work, inspired by the recent improvements of techniques for artificial intelligence, we take a user perspective and investigate how intelligent, learning agents could help build future training systems. Through a domain analysis, a user study, and practical experiments, we identify important agent capabilities and characteristics, and then discuss design approaches and solution concepts for training systems to utilise learning agents for improved training value.
Article
In order to improve the autonomous ability of unmanned aerial vehicles (UAVs) to implement air combat missions, many artificial intelligence-based autonomous air combat maneuver decision-making studies have been carried out, but these studies are often aimed at individual decision-making in 1v1 scenarios, which rarely happen in actual air combat. Based on research on the 1v1 autonomous air combat maneuver decision, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. Firstly, a bidirectional recurrent neural network (BRNN) is used to achieve communication between UAV individuals, and the multi-UAV cooperative air combat maneuver decision model under the actor-critic architecture is established. Secondly, by combining target allocation and air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of every UAV, and a cooperative tactical maneuver policy is generated. The simulation results prove that the multi-UAV cooperative air combat maneuver decision model established in this paper can obtain the cooperative maneuver policy through reinforcement learning, and that this policy can guide the UAVs to obtain an overall situational advantage and defeat the opponents under tactical cooperation.
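A hedged sketch of how a bidirectional RNN can serve as the communication channel between UAVs is given below; the use of a GRU, all layer sizes, and the discrete action head are assumptions made for illustration rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class BRNNCommActor(nn.Module):
    """Inter-UAV communication via a bidirectional RNN (dimensions illustrative).

    Each UAV's local observation is embedded, the embeddings are passed through a
    bidirectional GRU along the *agent* dimension so that information flows both
    ways across the formation, and a shared actor head outputs each UAV's action.
    """
    def __init__(self, obs_dim=20, embed_dim=32, hidden_dim=32, n_actions=7):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.comm = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.actor = nn.Linear(2 * hidden_dim, n_actions)

    def forward(self, obs):                    # obs: (batch, n_uavs, obs_dim)
        h, _ = self.comm(torch.relu(self.embed(obs)))
        return self.actor(h)                   # per-UAV action logits: (batch, n_uavs, n_actions)
```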
Article
This paper investigates the air combat mission between multiple unmanned aerial vehicles (UAVs) and a hostile multi-UAV force. A multi-UAV air combat threat assessment model is first established to evaluate the situation of each drone in the combat scenario, and the target allocation problem is then formulated based on matrix game theory. To solve the target allocation problem, a modified estimation of distribution algorithm (EDA) is proposed to search for the best strategy. After target allocation, a social-behavior-based sliding mode control is constructed to realize the UAV swarm motion. A simulation experiment of multi-UAV air combat proves the validity of the algorithm.
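To make the matrix-game formulation concrete, the toy sketch below picks a pure maximin allocation strategy from a payoff matrix; it illustrates the game-theoretic idea only, since the paper searches the strategy space with a modified EDA rather than by enumeration, and the payoff values are hypothetical.

```python
import numpy as np

def maximin_strategy(payoff):
    """Pick the pure strategy with the best worst-case payoff.

    `payoff[i, j]` is assumed to be our side's payoff, derived from the
    threat-assessment model, when we play allocation strategy i and the
    enemy plays strategy j.
    """
    worst_case = payoff.min(axis=1)        # enemy picks the response worst for us
    return int(worst_case.argmax())        # we pick the strategy maximizing that worst case

# Hypothetical 3x3 payoff matrix: rows = our allocation strategies, columns = enemy strategies.
payoff = np.array([[0.6, 0.2, 0.5],
                   [0.4, 0.4, 0.3],
                   [0.7, 0.1, 0.2]])
print(maximin_strategy(payoff))            # -> 1: best worst-case payoff (0.3)
```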
Article
To solve the problem of realizing autonomous aerial combat decision-making for unmanned combat aerial vehicles (UCAVs) rapidly and accurately in an uncertain environment, this paper proposes a decision-making method based on an improved deep reinforcement learning (DRL) algorithm: the multi-step double deep Q-network (MS-DDQN) algorithm. First, a six-degree-of-freedom UCAV model based on an aircraft control system is established on a simulation platform, and the situation assessment functions of the UCAV and its target are established by considering their angles, altitudes, environments, missile attack performances, and UCAV performance. By controlling the flight path angle, roll angle, and flight velocity, 27 common basic actions are designed. On this basis, aiming to overcome the defects of traditional DRL in terms of training speed and convergence speed, the improved MS-DDQN method is introduced to incorporate the final return value into the previous steps. Finally, the pre-training learning model is used as the starting point for the second learning model to simulate the UCAV aerial combat decision-making process based on the basic training method, which helps to shorten the training time and improve the learning efficiency. The improved DRL algorithm significantly accelerates the training speed and estimates the target value more accurately during training, and it can be applied to aerial combat decision-making.
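The core of the multi-step double-DQN idea described above can be written as a single target computation; the sketch below is a generic PyTorch rendering (the network interfaces, reward list layout, and hyperparameters are assumptions), not the paper's code.

```python
import torch

def ms_ddqn_target(rewards, next_state, done, q_online, q_target, gamma=0.99):
    """Multi-step double-DQN target (generic sketch of the idea described above).

    `rewards` holds the n rewards collected after the current state; the return is
    bootstrapped with the double-DQN rule: the online network chooses the action,
    the target network evaluates it.
    """
    n = len(rewards)
    g = sum((gamma ** i) * r for i, r in enumerate(rewards))   # discounted n-step return
    with torch.no_grad():
        a_star = q_online(next_state).argmax(dim=-1, keepdim=True)        # action selection
        bootstrap = q_target(next_state).gather(-1, a_star).squeeze(-1)   # action evaluation
    return g + (gamma ** n) * (1.0 - done) * bootstrap
```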
Article
Full-text available
Breakthroughs in genetic fuzzy systems, most notably the development of the Genetic Fuzzy Tree methodology, have allowed fuzzy-logic-based Artificial Intelligences to be developed that can be applied to incredibly complex problems. The ability to have extreme performance and computational efficiency as well as to be robust to uncertainties and randomness, adaptable to changing scenarios, verified and validated to follow safety specifications and operating doctrines via formal methods, and easily designed and implemented are just some of the strengths that this type of control brings. Within this white paper, the authors introduce ALPHA, an Artificial Intelligence that controls flights of Unmanned Combat Aerial Vehicles in aerial combat missions within an extreme-fidelity simulation environment. To this day, this represents the most complex application of a fuzzy-logic-based Artificial Intelligence to an Unmanned Combat Aerial Vehicle control problem. While development is ongoing, the version of ALPHA presented within was assessed by Colonel (retired) Gene Lee, who described ALPHA as "the most aggressive, responsive, dynamic and credible AI (he's) seen to date." The quality of these preliminary results in a problem that is not only complex and rife with uncertainties but also contains an intelligent and unrestricted hostile force has significant implications for this type of Artificial Intelligence. This work adds immensely to the body of evidence that this methodology is an ideal solution to a very wide array of problems.
Article
Full-text available
This study introduces the technique of Genetic Fuzzy Trees (GFTs) through a novel application to an air combat control problem of an autonomous squadron of Unmanned Combat Aerial Vehicles (UCAVs) equipped with next-generation defensive systems. GFTs are a natural evolution of Genetic Fuzzy Systems, in which multiple cascading fuzzy systems are optimized by genetic methods. In this problem a team of UCAVs must traverse a battle space and counter enemy threats, utilize imperfect systems, cope with uncertainty, and successfully destroy critical targets. Enemy threats take the form of Air Interceptors (AIs), Surface to Air Missile (SAM) sites, and Electronic WARfare (EWAR) stations. Simultaneous training and tuning of a multitude of Fuzzy Inference Systems (FISs), with varying degrees of connectivity, is performed through the use of an optimized Genetic Algorithm (GA). The GFT presented in this study, the Learning Enhanced Tactical Handling Algorithm (LETHA), is able to create controllers with the presence of deep learning, resilience to uncertainties, and adaptability to changing scenarios. The resulting deterministic fuzzy controllers are easily understandable by operators, are of very high performance and efficiency, and are consistently capable of completing new and different missions not trained for.
Article
Full-text available
Complex pursuit-evasion games with complete information under state-variable inequality constraints are investigated. By exploitation of Isaacs' minimax principle, necessary conditions of first and second order are derived for the optimal trajectories. These conditions give rise to multipoint boundary-value problems, which yield open-loop representations of the optimal strategies along the optimal trajectories. The multipoint boundary-value problems are accurately solved by the multiple shooting method. The computed open-loop representations can thereafter be used to synthesize the optimal strategies globally. As an illustrative example, the evasion of an aircraft from a pursuing missile is investigated. The flight of the aircraft is restricted by various control-variable inequality constraints and by a state-variable inequality constraint on the dynamic pressure. The optimal trajectories exhibit boundary arcs with regular and singular constrained controls. The influence of various singular surfaces in the state space, including a low-dimensional universal surface, is discussed.
Article
Full-text available
The primary goal of this project was to explore the applicability of artificial neural network (NN) models in the domain of air combat maneuvering (ACM). The work investigated several models: (a) NN models that select ACM on the basis of training with the production rules of a model, Air Combat Expert Simulation (ACES); (b) NN models that mimic the action selections of the Automated Maneuvering Logic (AML) System; (c) NN models that predict the outcome of engagements flown in the Simulator for Air-to-Air Combat (SAAC) given summary measures of various parameters measured during the engagements; and (d) NN models that predict future aircraft control inputs in SAAC engagements given the values of flight parameters at particular points in time. These various models incorporate knowledge about air combat maneuvers and components of maneuvers as well as rudimentary knowledge about maneuver planning and situational awareness. For most of the models, validation tests were conducted using data different from that used in training the models. The authors provide details on each of these efforts as well as a review of the ACES model, a presentation of the basics of NNs, and an overview of a software system developed for the implementation and testing of the NN models. Keywords: air combat, flight simulation, performance measurement, air combat maneuvering, flight simulators, flight training, neural networks.
Article
Full-text available
Unmanned Aircraft Systems (UAS) have the potential to perform many of the dangerous missions currently flown by manned aircraft. Yet the complexity of some tasks, such as air combat, has precluded UAS from carrying out these missions autonomously. This paper presents a formulation of a level-flight, fixed-velocity, one-on-one air combat maneuvering problem and an approximate dynamic programming (ADP) approach for computing an efficient approximation of the optimal policy. In the version of the problem formulation considered, the aircraft learning the optimal policy is given a slight performance advantage. This ADP approach provides a fast response to a rapidly changing tactical situation, long planning horizons, and good performance without explicit coding of air combat tactics. The method's success is due to extensive feature development, reward shaping, and trajectory sampling. An accompanying fast and effective rollout-based policy extraction method is used to accomplish on-line implementation. Simulation results are provided that demonstrate the robustness of the method against an opponent beginning from both offensive and defensive situations. Flight results are also presented using micro-UAS flown at MIT's Real-time indoor Autonomous Vehicle test ENvironment (RAVEN).
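The rollout-based policy extraction mentioned above can be summarized as a one-step lookahead against an approximate value function; the sketch below is a generic illustration of that idea, where `simulate_step`, `value_fn`, and the discount factor are hypothetical placeholders rather than the paper's components.

```python
def rollout_policy(state, candidate_actions, simulate_step, value_fn, gamma=0.95):
    """One-step rollout policy extraction (generic sketch of the idea described above).

    For each candidate maneuver, the dynamics are simulated one step ahead with a
    hypothetical `simulate_step(state, action) -> (next_state, reward)`, and the
    approximate value function scores the resulting state; the best-scoring
    maneuver is executed online.
    """
    best_action, best_score = None, float("-inf")
    for a in candidate_actions:
        next_state, reward = simulate_step(state, a)
        score = reward + gamma * value_fn(next_state)   # immediate reward + approximate value
        if score > best_score:
            best_action, best_score = a, score
    return best_action
```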
Conference Paper
With the rapid development of intelligent technology, sensing technology, digital communication technology, virtual reality technology, etc., research on air-combat UAVs has ushered in a rare opportunity. The UAV air-combat decision problem has become one of the research focuses in the military field worldwide. Firstly, this paper analyzes the specific process of UAV air-combat decision-making and divides it into four decision-making phases. Secondly, the paper introduces the research status and decision-making methods of the various phases to further analyze UAV air-combat decisions. Finally, the paper analyzes in detail the difficulties and future research trends of UAV air-combat decision-making.
Article
Since future air combat missions will involve both manned and unmanned aircraft, the primary motivation for this research is to enable unmanned aircraft with intelligent maneuvering capabilities. During air combat maneuvering, pilots use their knowledge and experience of maneuvering strategies and tactics to determine the best course of action. As a result, we try to capture these aspects using an artificial immune system approach. The biological immune system protects the body against intruders by recognizing and destroying harmful cells or molecules. It can be thought of as a robust adaptive system that is capable of dealing with an enormous variety of disturbances and uncertainties. However, another critical aspect of the immune system is that it can remember how previous encounters were successfully defeated. As a result, it can respond faster to similar encounters in the future. This paper describes how an artificial immune system is used to select and construct air combat maneuvers. These maneuvers are composed of autopilot mode and target commands, which represent the low-level building blocks of the parameterized system. The resulting command sequences are sent to a tactical autopilot system, which has been enhanced with additional modes and an aggressiveness factor for enabling high performance maneuvers. Just as vaccinations train the biological immune system how to combat intruders, training sets are used to teach the maneuvering system how to respond to different enemy aircraft situations. Simulation results are presented, which demonstrate the potential of using immunized maneuver selection for the purposes of air combat maneuvering.