Conference Paper

Optimal automatic train operation via deep reinforcement learning

... Using DRL, promising results have been reported in multiple rail transportation areas, including train timetable rescheduling (Ning et al., 2019; Obara et al., 2018; Wang et al., 2019f; Yang et al., 2019), automatic train operations (Zhou and Song, 2018; Zhou et al., 2020; Zhu et al., 2017), and train shunting operations (Peer et al., 2018). For each of these problems, we review relevant publications and provide a summary in Table 7. Train timetable rescheduling problems involve finding a feasible timetable for a train, either by re-routing, re-ordering, re-timing, or canceling, in case of uncertain disturbances associated with equipment/system failure along the railway line. ...
... One of the objectives for automatic train operations is minimizing energy consumption (Zhou and Song, 2018; Zhu et al., 2017), subject to availability, reshuffling, and yard crane shifting delay (Zhu et al., 2017), punctuality, and riding comfort constraints. In doing so, Zhou and Song (2018) and Zhou et al. (2020) define speed and train position as the state. The magnitude of acceleration and deceleration is considered as the action. ...
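To make this formulation concrete, a minimal sketch of such a state/action/reward design is given below (Python). The point-mass dynamics, energy model, and all weights are illustrative assumptions, not the exact formulations used in the cited papers.

```python
import numpy as np

class TrainEnvSketch:
    """Illustrative point-mass train environment (assumed dynamics, not from the cited papers).

    State  : (position [m], speed [m/s])
    Action : commanded acceleration in [-a_max, a_max] (negative = braking)
    Reward : negative traction energy, with penalties for jerk (comfort) and late arrival.
    """

    def __init__(self, route_length=2000.0, dt=1.0, a_max=1.0, mass=3.0e5, target_time=180.0):
        self.route_length, self.dt, self.a_max = route_length, dt, a_max
        self.mass, self.target_time = mass, target_time
        self.reset()

    def reset(self):
        self.position, self.speed, self.time, self.prev_accel = 0.0, 0.0, 0.0, 0.0
        return np.array([self.position, self.speed])

    def step(self, accel):
        accel = float(np.clip(accel, -self.a_max, self.a_max))
        # Point-mass kinematics; track resistance and gradients are ignored in this sketch.
        self.position += self.speed * self.dt + 0.5 * accel * self.dt ** 2
        self.speed = max(0.0, self.speed + accel * self.dt)
        self.time += self.dt

        # Traction energy is only consumed when accelerating (regenerative braking ignored).
        traction_energy = max(accel, 0.0) * self.mass * self.speed * self.dt
        comfort_penalty = abs(accel - self.prev_accel)   # jerk proxy
        self.prev_accel = accel

        done = self.position >= self.route_length
        lateness = max(self.time - self.target_time, 0.0) if done else 0.0
        reward = -1e-6 * traction_energy - 0.1 * comfort_penalty - 1.0 * lateness
        return np.array([self.position, self.speed]), reward, done, {}
```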
Article
Full-text available
Applying and adapting deep reinforcement learning (DRL) to tackle transportation problems is an emerging interdisciplinary field. While rapidly growing, a comprehensive and synthetic review of existing DRL applications and adaptations in transportation research remains missing. The objective of this paper is to fill this gap. We expose the broad transportation research community to the methodological fundamentals of DRL, and present what has been accomplished in the literature by reviewing a total of 155 relevant papers that have appeared between 2016 and 2020. Based on the review, we further synthesize the applicability, strengths, shortcomings, issues, and directions for future DRL research in transportation, along with a discussion on the available DRL research resources. We hope that this review will serve as a useful reference for the transportation community to better understand DRL and its many potentials to advance research, and to stimulate further explorations in this exciting area.
... (Meng et al., 2020) and employed deep Q networks (DQN) to optimize train control strategies. Subsequently, the issue of continuous action space in deep reinforcement learning (DRL) has been extensively examined by scholars (Zhou and Song, 2018; Zhou et al., 2022). Notably, (Lillicrap, 2015) and (Plaksin, 2022) have proposed subway intelligent operation algorithms based on the Deep Deterministic Policy Gradient (DDPG) and Normalized Advantage Function (NAF), respectively. ...
Article
Full-text available
The energy consumption of urban rail transit plays a significant role in the operating costs of trains. It is particularly crucial to decrease the energy consumption of the traction power supply in subway systems, as it accounts for approximately half of the total energy consumption of the subway operating organization. To overcome the limitations of traditional real-time speed profile generation methods and the limited exploration capabilities of popular reinforcement learning algorithms in the speed domain, this paper presents the Energy-Saving Maximum Entropy Deep Reinforcement Learning (ES-MEDRL) algorithm. The ES-MEDRL algorithm incorporates Lagrange multipliers and maximum policy entropy as penalties to formulate a novel objective function. This function aims to intensify exploration in the speed domain, minimize train traction energy consumption, and ensure a balance between ride comfort, punctuality, and safety within the subway system. This leads to the optimization of speed profile strategies. To further reduce energy consumption, this paper proposes a secondary optimization strategy for the energy-saving speed profile. This approach involves trading acceptable travel time for improved energy efficiency. To validate the performance of the proposed model and algorithm, numerical experiments are conducted using the Yizhuang Line of the Beijing Metro. The findings demonstrate a minimum 20 % increase in energy efficiency with the ES-MEDRL algorithm compared to manual driving. This algorithm can guide energy-efficient train operations at the planning level.
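As a rough illustration of the kind of objective described above, an entropy-regularized return with Lagrangian penalty terms can be written as follows; the specific constraint set and weighting used by ES-MEDRL may differ from this sketch.

$$
J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t}\gamma^{t}\big(-E_{\text{trac}}(s_t,a_t)+\alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\big)\Big]\;-\;\lambda_{1}\,g_{\text{time}}(\pi)\;-\;\lambda_{2}\,g_{\text{comfort}}(\pi)\;-\;\lambda_{3}\,g_{\text{safety}}(\pi)
$$

Here $E_{\text{trac}}$ is the traction energy, $\mathcal{H}$ the policy entropy with temperature $\alpha$, and the $g$ terms are punctuality, comfort, and safety violations weighted by Lagrange multipliers $\lambda_{i}$.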
... Moreover, deep reinforcement learning (DRL) [26] can learn optimal controls in more complicated decision-making problems than conventional RL by using deep neural networks (DNNs), such as settings with high-dimensional state and action spaces. Many studies have combined DRL with expert knowledge rules to solve the TTO [9], [27]-[29]. Nevertheless, few consider real-time TTO with disturbances during operation. ...
Article
Full-text available
Artificial intelligence of things (AIoT)-enabled intelligent automatic train operation (iATO) is an urgently needed technology to expand the capability of ATO in addressing the real-time responsiveness and dynamic online challenges to energy-efficient train trajectory optimization (TTO) and its associated ride-comfort, punctuality, and safety issues in modern urban rail transit networks. This paper proposes a three-step supervised reinforcement learning-based intelligent energy-efficient train trajectory optimization (SRL-IETTO) approach for iATO by hybrid-integrating deep reinforcement learning (DRL) and supervised learning. First, multiple objectives are formulated based on real-time train operation and systematically integrated into the RL algorithm by a binary function-based goal-directed reward design method. Second, an IETTO model is established to handle uncertain disturbances in real-time train operation and generate optimal energy-efficient train trajectories online by optimizing energy efficiency and receiving supervisory information from trajectories of pre-trained TTO models. Finally, numerical simulations are implemented to validate the effectiveness of the SRL-IETTO using in-service subway line data. The results demonstrate the superiority and improved energy saving of the proposed approach, and confirm its adaptability to online trip time adjustments within the practical running time range under uncertain disturbances with less trip time error compared to other intelligent TTO algorithms.
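A minimal sketch of a goal-directed reward of this general flavor is shown below (Python); the binary goal indicators, thresholds, and weights are assumptions for illustration and are not the exact reward design of the SRL-IETTO paper.

```python
def goal_directed_reward(energy_kwh, arrival_error_s, max_jerk, speed_violation,
                         w_energy=1.0, bonus=10.0,
                         punctual_tol_s=5.0, jerk_tol=0.8):
    """Illustrative goal-directed reward: an energy cost plus binary goal bonuses.

    All thresholds and weights here are assumed values, not taken from the cited paper.
    """
    # Binary goal indicators (1.0 if the goal is met, else 0.0).
    punctual = 1.0 if abs(arrival_error_s) <= punctual_tol_s else 0.0
    comfortable = 1.0 if max_jerk <= jerk_tol else 0.0
    safe = 1.0 if not speed_violation else 0.0

    return -w_energy * energy_kwh + bonus * (punctual + comfortable + safe)
```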
... They presented the use of artificial intelligence for train timetable synchronization and optimization, speed control, and trajectory control to reduce energy and costs while improving train safety, comfort, and convenience. Zhou et al. [70] transformed the HST trajectory planning problem into an optimization problem and applied deep learning to solve it. Simulation results show that the proposed method can improve the energy efficiency of the trajectory planning problem, and its smooth deceleration process can better satisfy the comfort criteria. ...
Article
Full-text available
Energy‐efficient train operation (EETO) in high‐speed railways (HSRs) is an extra cost‐effective and flexible means to promote energy saving. This paper first examines the energy consumption sources and energy‐saving measures of high‐speed trains (HSTs). It then presents EETO in HSRs in three categories: energy‐efficient train control, energy‐efficient train timetabling, and EETO considering both train timetabling and driving strategy. Next, the current research status and progress on three aspects of EETO in HSRs, namely optimization algorithms, constraint conditions, and oriented scenarios, are sorted out. Finally, directions for in‐depth future research on EETO in HSRs are proposed. By summarizing and reviewing the research on EETO in HSRs, this paper is intended to furnish a reference for the railway sector to enhance quality, efficiency, and green, low‐carbon development. © 2022 Institute of Electrical Engineers of Japan. Published by Wiley Periodicals LLC.
Article
The influence of geological development factors such as reservoir heterogeneity needs to be comprehensively considered when determining oil well production control strategies. In the past, many optimization algorithms have been introduced and coupled with numerical simulation for well control problems. However, these methods require a large number of simulations, and the experience of these simulations is not preserved by the algorithm. For each new reservoir, the optimization algorithm needs to start over again. To address these problems, two reinforcement learning methods are introduced in this research. A personalized Deep Q-Network (DQN) algorithm and a personalized Soft Actor-Critic (SAC) algorithm are designed for determining the optimal control of oil wells. The inputs of the algorithms are matrices of reservoir properties, including reservoir saturation, permeability, etc., which can be treated as images. The output is the oil well production strategy. A series of samples are cut from two different reservoirs to form a dataset. Each sample is a square area centered on an oil well, with different permeability and saturation distributions and different oil-water well patterns. Moreover, all samples are expanded by using image enhancement technology to further increase the number of samples and improve their coverage of reservoir conditions. During the training process, two training strategies are investigated for each personalized algorithm. The second strategy uses 4 times more samples than the first strategy. At last, a new set of samples is designed to verify the models' accuracy and generalization ability. Results show that both the trained DQN and SAC models can learn and store historical experience, and recommend appropriate control strategies based on the reservoir characteristics of new oil wells. The agreement between the optimal control strategy obtained by both algorithms and the global optimal strategy obtained by the exhaustive method is more than 95%. The personalized SAC algorithm shows better performance than the personalized DQN algorithm. Compared to traditional Particle Swarm Optimization (PSO), the personalized models were faster and better at capturing complex patterns and adapting to different geological conditions, making them effective for real-time decision-making and optimizing oil well production strategies. Since a large amount of historical experience has been learned and stored in the algorithm, the proposed method requires only one simulation for a new oil well control optimization problem, which shows its superiority in computational efficiency.
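As an illustration of the "property maps as images" input described above, a minimal convolutional Q-network sketch is given below (PyTorch); the two input channels, layer sizes, and number of discrete control actions are assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class WellControlQNet(nn.Module):
    """Illustrative CNN Q-network: reservoir property maps in, discrete control values out."""

    def __init__(self, n_actions=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),   # 2 channels: saturation, permeability
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 * 4 * 4, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, x):                      # x: (batch, 2, grid, grid)
        return self.head(self.features(x))

# Example: Q-values for one 32x32 sample with two property channels.
q_values = WellControlQNet()(torch.zeros(1, 2, 32, 32))
```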
Preprint
Full-text available
Deep reinforcement learning (DRL) is an emerging methodology that is transforming the way many complicated transportation decision-making problems are tackled. Researchers have been increasingly turning to this powerful learning-based methodology to solve challenging problems across transportation fields. While many promising applications have been reported in the literature, there remains a lack of comprehensive synthesis of the many DRL algorithms and their uses and adaptations. The objective of this paper is to fill this gap by conducting a comprehensive, synthesized review of DRL applications in transportation. We start by offering an overview of the DRL mathematical background, popular and promising DRL algorithms, and some highly effective DRL extensions. Building on this overview, a systematic investigation of about 150 DRL studies that have appeared in the transportation literature, divided into seven different categories, is performed. Building on this review, we continue to examine the applicability, strengths, shortcomings, and common and application-specific issues of DRL techniques with regard to their applications in transportation. In the end, we recommend directions for future research and present available resources for actually implementing DRL.
Chapter
The learning-based energy management strategy (EMS) is able to optimize the control of the heterogeneous multi-energy drive system (HMDS) by learning from relevant offline or online data and through centralized training, and therefore achieves lower consumption and higher efficiency. Moreover, selecting an appropriate drive structure is as important for an HMDS as developing a suitable energy management strategy. In this paper, the domestic and overseas development of HMDS is discussed. The paper also describes the drive structure of present HMDSs and then introduces the research status of two learning-based EMS approaches in HMDS. In addition, the practical implementation prospects and challenges of learning-based energy management strategies are presented through further analysis.
Article
Full-text available
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
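A condensed sketch of the DDPG update described in this abstract (deterministic actor, Q critic, target networks, soft updates) is given below in PyTorch; batch shapes, network sizes, exploration noise, and replay handling are simplified assumptions.

```python
import copy
import torch
import torch.nn as nn

def mlp(sizes):
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ReLU()]
    return nn.Sequential(*layers[:-1])  # drop the trailing ReLU

obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005
actor = nn.Sequential(mlp([obs_dim, 64, act_dim]), nn.Tanh())     # deterministic policy mu(s)
critic = mlp([obs_dim + act_dim, 64, 1])                          # Q(s, a)
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG step on a replayed batch; s, a, s2: (B, dim), r, done: (B, 1)."""
    # Critic: regress Q(s, a) toward the bootstrapped target built with the target networks.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1))
    critic_loss = ((critic(torch.cat([s, a], dim=-1)) - q_targ) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the deterministic policy gradient, i.e. maximize Q(s, mu(s)).
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak (soft) update of the target networks.
    with torch.no_grad():
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, p_t in zip(net.parameters(), targ.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)
```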
Article
Full-text available
A parallel multipopulation genetic algorithm (PMPGA) is proposed to optimize the train control strategy, reducing the energy consumption for a specified running time. The paper considers not only energy consumption, but also running time, security, and riding comfort. Parameters of an actual railway line (the Beijing-Shanghai High-Speed Railway), including slope, tunnel, and curve data, were applied in the simulation. Train traction and braking properties were modeled in detail to ensure the accuracy of the running simulation. The PMPGA was also compared with the standard genetic algorithm (SGA), and the influence of the fitness function representation on the search results was also explored. By running a series of simulations, energy savings were found, both qualitatively and quantitatively, which were obtained by applying cruising and coasting running statuses. The paper compared the PMPGA with a multiobjective fuzzy optimization algorithm and a differential evolution based algorithm and showed that the PMPGA achieved better results. The method can be widely applied to related high-speed trains.
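As a sketch of how such a multi-criteria fitness might be scored inside a genetic algorithm, one possibility is shown below (Python); the weights, fields, and scalarization are assumptions, not the fitness function used by the PMPGA paper.

```python
def fitness(profile, target_time, w_energy=1.0, w_time=5.0, w_comfort=0.5):
    """Illustrative GA fitness for a candidate driving profile (assumed weights and fields).

    `profile` is assumed to expose simulated totals: energy [kWh], running time [s],
    and a comfort measure such as the largest acceleration change between steps.
    """
    time_dev = abs(profile.running_time - target_time)
    # Lower combined cost -> higher fitness; the GA maximizes this value.
    cost = w_energy * profile.energy + w_time * time_dev + w_comfort * profile.max_jerk
    return 1.0 / (1.0 + cost)
```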
Article
Full-text available
This paper discusses a well-known tool for calculating train resistance to motion and its suitability for describing operations at high speed. The tool, originally developed by Armstrong and Swift [1], also permits the estimation of the contribution to aerodynamic resistance of various features of the architecture of a train. The authors compare this approach with the results of other formulae for calculating train resistance, as well as published measurements taken during experimental work. It is concluded that Armstrong and Swift's expressions can be considered to provide good estimates for the coefficients of the Davis equation for both high-speed and suburban trains that fit the British loading gauge and have a power car-trailer ratio of 1:3 or less, without the need for run-down testing. However, the expressions are not suitable for trains with a predominance of powered axles.
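For reference, the Davis equation mentioned above expresses running resistance as a quadratic function of speed:

$$
R(v) = A + Bv + Cv^{2},
$$

where $A$ and $B$ capture mechanical and rolling resistance and $C$ captures aerodynamic drag; Armstrong and Swift's expressions provide estimates of these coefficients from the train's architecture.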
Article
Full-text available
This paper describes an analytical process that computes the optimal operating successions of a rail vehicle to minimize energy consumption. Rising energy prices and environmental concerns have made energy conservation a high priority for transportation operations. The cost of energy consumption makes up a large portion of the Operation and Maintenance (O&M) costs of transit, especially rail transit systems. Energy conservation or reduction in energy cost may be one of the effective ways to reduce transit operating cost and therefore improve the efficiency of transit operations.
Article
In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
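The "particularly appealing form" referred to above is the deterministic policy gradient theorem, which for a deterministic policy $\mu_{\theta}$ reads:

$$
\nabla_{\theta} J(\mu_{\theta}) \;=\; \mathbb{E}_{s \sim \rho^{\mu}}\!\left[\nabla_{\theta}\,\mu_{\theta}(s)\;\nabla_{a} Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\right],
$$

i.e., the expected gradient of the action-value function evaluated along the policy's own actions.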
Article
High-speed train transportation is organized in a way of globally centralized planning and locally autonomous adjustment with the real-time known positions, speeds, and other state information of trains. A hierarchical integration architecture composed of top, middle, and bottom levels is proposed based on model predictive control (MPC) for real-time scheduling and control. The middle-level trajectory configuration and tractive force setpoints play a critical role in fulfilling the top-level scheduling commands and guaranteeing the controllability of bottom-level train operations. In the middle-level MPC-based train operation planning, a continuous cellular automaton model of train movements is proposed to dynamically configure the train operation positions and speeds at appointed times, which synthetically considers the scheduling strategies from the top level, and the tempo-spatial constraints and operation statuses at the bottom level. The macroscopic dynamic model of a train predicts the trajectories under the candidate control sequences. Through Levenberg-Marquardt optimization, the feasible tractive forces and updated trajectories are attained under the power constraints of electric machines. Numerical results have demonstrated the effectiveness of the proposed control planning technique. This paper reveals the utilities of different-level models of train movements for the accomplishment of railway network operation optimization and the guarantee of individual train operation safety. It also provides a solution to automatic trajectory configuration in automatic train protection (ATP) and operation (ATO) systems.
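A minimal sketch of the middle-level step, i.e., fitting a tractive-force sequence to appointed positions and speeds with Levenberg-Marquardt, is given below (Python, SciPy). The longitudinal dynamics, resistance coefficients, horizon, and reference trajectory are all illustrative assumptions; note that SciPy's "lm" method does not handle bounds, so power constraints would need clipping or a constrained solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Simple longitudinal train model for the sketch (mass in kg, resistance a + b*v + c*v^2).
MASS, DT, A0, B0, C0 = 4.0e5, 10.0, 8.0e3, 1.2e2, 8.0
HORIZON = 12                                   # number of control intervals in the prediction horizon

def residuals(forces, s0, v0, s_ref_traj, v_ref_end):
    """Per-step position deviations from the reference trajectory, plus final speed deviation."""
    s, v, res = s0, v0, []
    for f, s_ref in zip(forces, s_ref_traj):
        accel = (f - (A0 + B0 * v + C0 * v * v)) / MASS
        s += v * DT + 0.5 * accel * DT * DT
        v = max(v + accel * DT, 0.0)
        res.append(s - s_ref)
    res.append(v - v_ref_end)
    return res

# One planning step: solve for feasible tractive forces via Levenberg-Marquardt,
# apply the first force, then re-plan at the next step (receding horizon).
s_ref_traj = np.linspace(200.0, 2400.0, HORIZON)        # appointed positions per interval (assumed)
sol = least_squares(residuals, x0=np.full(HORIZON, 5.0e4),
                    args=(0.0, 20.0, s_ref_traj, 22.0), method="lm")
planned_forces = sol.x
```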
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
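A minimal sketch of the core DQN update (experience replay plus a frozen target network) is given below in PyTorch. For brevity it uses a small fully connected network on an assumed 4-dimensional state with 2 actions, rather than the convolutional pixel-input network described in the abstract; all hyperparameters are assumptions.

```python
import random
from collections import deque
import torch
import torch.nn as nn

GAMMA = 0.99
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))   # assumed state dim 4, 2 actions
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)   # stores tuples (state, action, reward, next_state, done)

def dqn_step(batch_size=32):
    """One DQN update: sample replayed transitions, regress Q toward the frozen-target bootstrap."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.as_tensor, zip(*random.sample(replay, batch_size)))
    q = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():   # the target network is held fixed during this step
        target = r.float() + GAMMA * (1 - done.float()) * target_net(s2.float()).max(dim=1).values
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```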
Article
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
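For reference, the batch normalization transform applied to each activation $x_i$ in a mini-batch $\mathcal{B}$ of size $m$ is:

$$
\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i,\qquad
\sigma_{\mathcal{B}}^{2} = \frac{1}{m}\sum_{i=1}^{m} (x_i-\mu_{\mathcal{B}})^{2},\qquad
\hat{x}_i = \frac{x_i-\mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2}+\epsilon}},\qquad
y_i = \gamma\,\hat{x}_i + \beta,
$$

with learned scale $\gamma$ and shift $\beta$, and a small $\epsilon$ for numerical stability.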
Article
The optimal operation of railway systems minimizing total energy consumption is discussed in this paper. Firstly, some measures of finding energy-saving train speed profiles are outlined. After the characteristics that should be considered in optimizing train operation are clarified, complete optimization based on optimal control theory is reviewed. Their basic formulations are summarized taking into account most of the difficult characteristics peculiar to railway systems. Three methods of solving the formulation, dynamic programming (DP), gradient method, and sequential quadratic programming (SQP), are introduced. The last two methods can also control the state of charge (SOC) of the energy storage devices. By showing some numerical results of simulations, the significance of solving not only optimal speed profiles but also optimal SOC profiles of energy storage are emphasized, because the numerical results are beyond the conventional qualitative studies. Future scope for applying the methods to real-time optimal control is also mentioned. Copyright © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
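A toy sketch of the dynamic programming idea mentioned above, minimizing traction energy plus a Lagrangian time penalty over a (segment, boundary-speed) grid, is given below (Python); the resistance coefficients, speed grid, and weights are assumed, regenerative braking is ignored, and state-of-charge profiles of energy storage are not modeled.

```python
import numpy as np

SEG_LEN, N_SEG, MASS, LAMBDA = 200.0, 10, 3.0e5, 50.0       # assumed route and weighting values
SPEEDS = np.arange(2.0, 22.0, 2.0)                           # admissible boundary speeds [m/s]

def resistance(v):
    # Davis-type running resistance [N] with assumed coefficients.
    return 6.0e3 + 60.0 * v + 8.0 * v * v

def segment_cost(v_in, v_out):
    """Traction energy (kinetic change plus resistance work) and weighted time for one segment."""
    v_avg = 0.5 * (v_in + v_out)
    energy = max(0.5 * MASS * (v_out ** 2 - v_in ** 2), 0.0) + resistance(v_avg) * SEG_LEN
    time = SEG_LEN / v_avg
    return energy * 1e-6 + LAMBDA * time / 3600.0            # roughly comparable scales (MJ, weighted h)

# cost[k, j] = best cost to reach the end of segment k at speed SPEEDS[j] (forward DP).
cost = np.full((N_SEG + 1, len(SPEEDS)), np.inf)
cost[0, 0] = 0.0                                             # start at the lowest admissible speed
for k in range(N_SEG):
    for i, v_in in enumerate(SPEEDS):
        if np.isinf(cost[k, i]):
            continue
        for j, v_out in enumerate(SPEEDS):
            cost[k + 1, j] = min(cost[k + 1, j], cost[k, i] + segment_cost(v_in, v_out))

print("minimum cost to finish at the lowest speed:", cost[N_SEG, 0])
```

Backtracking over the same table would recover the optimal speed profile; the gradient and SQP methods discussed in the article refine such profiles and can additionally shape the storage state of charge.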