Article

Policy Optimization to Learn Adaptive Motion Primitives in Path Planning With Dynamic Obstacles


Abstract

This paper addresses kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles – a challenging problem that still lacks a universal solution. One of the promising approaches to solving it is decomposing the problem into smaller sub-problems and combining the local solutions into a global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that provide solutions to the local planning sub-problems. In this work we introduce a novel learnable steering function (policy) that takes into account the kinodynamic constraints of the robot as well as static and dynamic obstacles. This policy is efficiently trained via policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug the trained policy into sampling-based and lattice-based planners, and evaluate the resultant POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups that involve a car-like robot operating in obstacle-rich parking-lot environments. We show that POLAMP is able to plan collision-free kinodynamic trajectories with success rates above 92% when 50 simultaneously moving obstacles populate the environment, outperforming state-of-the-art competitors. The code is available at https://github.com/BrianAnguloYauri/POLAMP .
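As a rough illustration of how a learned steering policy can act as the local planner inside a sampling-based framework, the sketch below rolls out a policy as the "steer" operator of a plain RRT loop over a kinematic bicycle model. The `policy`, `env`, and `goal_region` objects (and the wheelbase value) are hypothetical stand-ins for illustration, not the actual POLAMP interfaces.

```python
# A rough sketch, assuming hypothetical policy/env/goal_region interfaces: the learned
# policy replaces an analytic two-point boundary-value solver as the local steering step.
import math

def step_dynamics(state, action, dt=0.1, wheelbase=2.5):
    """Kinematic bicycle update; state = (x, y, theta, v, gamma), action = (a, omega)."""
    x, y, theta, v, gamma = state
    a, omega = action
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += v * math.tan(gamma) / wheelbase * dt
    v += a * dt
    gamma += omega * dt
    return (x, y, theta, v, gamma)

def steer_with_policy(policy, start, goal, env, max_steps=50, goal_tol=0.5):
    """Roll out the learned policy from `start` toward `goal`; return a collision-free segment or None."""
    state, segment = start, [start]
    for _ in range(max_steps):
        obs = env.make_observation(state, goal)     # lidar readings + goal-relative state (assumed)
        action = policy.act(obs)                    # (a, omega) predicted by the trained network
        state = step_dynamics(state, action)
        if env.in_collision(state):
            return None                             # the local sub-problem failed
        segment.append(state)
        if math.hypot(goal[0] - state[0], goal[1] - state[1]) < goal_tol:
            return segment                          # the local goal was reached
    return None

def rrt_with_learned_steering(policy, start, goal_region, env, iterations=2000):
    """Minimal RRT loop in which every tree extension is produced by the learned policy."""
    tree = {start: None}                            # node -> parent
    for _ in range(iterations):
        sample = env.sample_state()
        nearest = min(tree, key=lambda s: math.hypot(s[0] - sample[0], s[1] - sample[1]))
        segment = steer_with_policy(policy, nearest, sample, env)
        if segment:
            tree[segment[-1]] = nearest
            if goal_region.contains(segment[-1]):
                break                               # a path to the goal now exists in the tree
    return tree
```

The design point mirrored from the abstract is that the policy, rather than an exact steering function, produces the kinodynamically feasible local segments that the global planner stitches together.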


... The POLAMP environment is proposed in [27] and provides a simulator with an autonomous vehicle equipped with a kinematic bicycle model and a lidar sensor. This simulator is populated with static obstacles (hazard zones) as shown in Fig. 4. This paper considers the same action space as [27]. The actions a_t = (a, ω) are composed of the linear acceleration a and rotation rate ω. We refer the readers to [27] for details of the actions and the kinematic model. The observation o_t is a vector that consists of the N_beams = 39 lidar measurements covering the 360° surroundings of the robot up to a beam length of beam_max = 10 m, concatenated with the robot's current state (x, y, θ, v, γ). ...
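The sketch below assembles an observation of that shape for illustration: 39 range readings around the robot, clipped at 10 m, followed by the state tuple. The circular-obstacle ray caster is an assumption made so the example is self-contained; the simulator's actual sensor model may differ.

```python
# Minimal sketch, assuming circular obstacles (cx, cy, r): build a 39-beam scan plus state vector.
import math

N_BEAMS = 39
BEAM_MAX = 10.0   # metres

def cast_ray(x, y, angle, obstacles):
    """Distance along `angle` to the nearest circular obstacle, capped at BEAM_MAX."""
    best = BEAM_MAX
    for cx, cy, r in obstacles:
        dx, dy = cx - x, cy - y
        t = dx * math.cos(angle) + dy * math.sin(angle)      # projection onto the ray
        if t <= 0.0:
            continue                                          # obstacle is behind the sensor
        off2 = (dx - t * math.cos(angle)) ** 2 + (dy - t * math.sin(angle)) ** 2
        if off2 <= r * r:
            best = min(best, t - math.sqrt(r * r - off2))     # first intersection with the circle
    return best

def lidar_scan(x, y, theta, obstacles):
    """Return N_BEAMS readings spread evenly over 360 degrees around the robot."""
    return [min(cast_ray(x, y, theta + 2.0 * math.pi * i / N_BEAMS, obstacles), BEAM_MAX)
            for i in range(N_BEAMS)]

def make_observation(state, obstacles):
    """Concatenate the scan with the current state (x, y, theta, v, gamma)."""
    x, y, theta, v, gamma = state
    return lidar_scan(x, y, theta, obstacles) + [x, y, theta, v, gamma]

# Example: one obstacle two metres ahead of a robot at the origin -> 39 + 5 = 44 values.
print(len(make_observation((0.0, 0.0, 0.0, 0.0, 0.0), [(2.0, 0.0, 0.5)])))
```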
Preprint
Reinforcement learning is a widely used approach to autonomous navigation, showing potential in various tasks and robotic setups. Still, it often struggles to reach distant goals when safety constraints are imposed (e.g., the wheeled robot is prohibited from moving close to obstacles). One of the main reasons for poor performance in such setups, which are common in practice, is that the need to respect the safety constraints degrades the exploration capabilities of an RL agent. To this end, we introduce a novel learnable algorithm that decomposes the initial problem into smaller sub-problems via intermediate goals, on the one hand, and respects the limit on the cumulative safety constraints, on the other hand -- SPEIS (Safe Policy Exploration Improvement via Subgoals). It comprises two coupled policies trained end-to-end: subgoal and safe. The subgoal policy is trained to generate subgoals based on transitions from the buffer of the safe (main) policy, which helps the safe policy reach distant goals. Simultaneously, the safe policy maximizes its rewards while attempting not to violate the limit on the cumulative safety constraints, thus providing a certain level of safety. We evaluate SPEIS in a wide range of challenging (simulated) environments that involve different types of robots from two suites: autonomous vehicles from the POLAMP environment and the car, point, doggo, and sweep robots from the safety-gym environment. We demonstrate that our method consistently outperforms state-of-the-art competitors and can significantly reduce the collision rate while maintaining high success rates (higher by 80% compared to the best-performing methods).
... To evaluate the influence of safety constraints, we use the policy-gradient algorithm PPO [6] and its safe version, LagrangianPPO (LPPO) [4]. The learning and evaluation of these two algorithms are conducted in a gym environment from [7]. Next, we briefly introduce some details of the environment. We consider the agent's state s_t = (Δx, Δy, Δθ, Δv, Δγ, θ, v, γ, l), where Δx_j is the difference between the j-th element of the goal state and the current vehicle state, θ, v, γ are elements of the current vehicle state, and l is a tuple of ray measurements from the lidar. In this paper, we consider the same actions and reward function as in [7]. The actions a_t = (a, ω) are composed of the linear acceleration a and rotation rate ω. For details of the actions, kinematic model and reward function we refer the readers to [7]. ...
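A minimal sketch of the Lagrangian mechanism that distinguishes LPPO from plain PPO is given below: the policy loss is penalised in proportion to how far the expected episode cost exceeds the allowed limit, and the multiplier is adapted by dual ascent. The scalar interface and the learning rate are simplifications chosen for illustration, not the exact formulation of the cited implementations.

```python
# Sketch, assuming scalar episode-cost statistics; real implementations work on batches.

def lagrangian_policy_loss(ppo_loss, mean_episode_cost, cost_limit, lam):
    """Penalised policy loss: minimising it trades reward against constraint violation."""
    penalty = lam * (mean_episode_cost - cost_limit)
    return (ppo_loss + penalty) / (1.0 + lam)      # normalisation keeps gradient magnitude bounded

def update_lagrange_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Dual ascent on the multiplier: it grows while the safety constraint is violated."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))

# Example: with a cost limit of 25, an agent averaging 40 cost per episode sees its
# multiplier (and hence the safety penalty) grow after every update.
lam = 0.0
for _ in range(3):
    lam = update_lagrange_multiplier(lam, mean_episode_cost=40.0, cost_limit=25.0)
print(lam)  # 2.25 after three updates
```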
Preprint
While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to real autonomous systems without considering the safety constraints. The latter are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes fewer collisions while training, without sacrificing overall performance.
... Graph-based, evolutionary, genetic algorithms [1], reinforcement learning [38], [39]. ...
... Compared to other traditional path planning algorithms, RL does not require specific forms of objective functions and can dynamically learn and adapt to optimise navigation routes [52]. Some recent works have enhanced traditional path planners with machine learning techniques; for example, the authors in [39] applied an RL policy-optimisation method to the local planning sub-problems of a regular A* path planner. Although current path planning algorithms exhibit good performance, no single strategy can fully address the complexity of mobile robot navigation constraints. ...
Conference Paper
Full-text available
The deployment of autonomous robots in harsh and unpredictable environments has gained significant attention in recent years, driven by the need to improve safety and efficiency in tasks where human involvement poses significant risks. Technological advances in sensors and communication technologies, as well as the proliferation of AI, have catalysed the development of highly capable and robust robotic systems in diverse scenarios, optimising efficiency, particularly in the area of disaster response. However, despite the progress made in the field of autonomous Search and Rescue (SAR) operations in harsh environments, several challenges remain unresolved. This paper presents a comprehensive overview of the key technological advancements in components, algorithms and communication systems and how they facilitate highly efficient and robust autonomous missions in extreme subterranean conditions and SAR operations. Additionally, this paper aims to tackle some of the key challenges encountered during emergency response operations through a novel autonomous robotic design that leverages locomotion, perception, communication, and control capabilities to achieve effective exploration tasks in harsh and hazardous environments. Finally, it sets the scene for future enhancements in the area of autonomous and intelligent robots, suggesting potential avenues for further research and development in enhancing their capabilities and adaptability.
... Learning-based approaches Deo and Trivedi [2018], Jiang et al. [2023], Cheng et al. [2023], Shi et al. [2022] utilize machine learning and deep learning on extensive driving datasets to predict trajectories, with recurrent neural networks (RNNs), convolutional neural networks (CNNs), and hybrid models showing promise. RL methods learn optimal policies for predicting future trajectories Angulo et al. [2023]. ...
Preprint
Full-text available
Decision-making, motion planning, and trajectory prediction are crucial in autonomous driving systems. By accurately forecasting the movements of other road users, the decision-making capabilities of the autonomous system can be enhanced, making it more effective in responding to dynamic and unpredictable environments and more adaptive to diverse road scenarios. This paper presents the FFStreams++ approach for decision-making and motion planning of different maneuvers, including unprotected left turn, overtaking, and keep-lane. FFStreams++ is a combination of sampling-based and search-based approaches, where new sampled trajectories for different maneuvers are iteratively generated and optimized, and afterward a heuristic search planner is called, searching for an optimal plan. We model the autonomous driving system in the Planning Domain Definition Language (PDDL) and search for the optimal plan using a heuristic Fast-Forward planner. In this approach, the initial state of the problem is modified iteratively through streams, which generate maneuver-specific trajectory candidates, increasing the iteration level until an optimal plan is found. FFStreams++ integrates a query-connected network model for predicting possible future trajectories for each surrounding obstacle along with their probabilities. The proposed approach was tested on the CommonRoad simulation framework. We use a collection of randomly generated driving scenarios for overtaking and unprotected left turns at intersections to evaluate the FFStreams++ planner. The test results confirmed that the proposed approach can effectively execute various maneuvers to ensure safety and reduce the risk of collisions with nearby traffic agents.
... Furthermore, Francis et al. [49] introduced PRM-RL for long-range navigation, where RL serves as both a short-range local planner and a collision-prediction module for the high-level PRM planner. This dual role enhances the planner's ability to navigate complex environments over long distances. ...
Article
Full-text available
Efficient navigation is crucial for intelligent mobile robots in complex environments. This paper introduces an innovative approach that seamlessly integrates advanced machine learning techniques to enhance mobile robot communication and path planning efficiency. Our method combines supervised and unsupervised learning, utilizing spline interpolation to generate smooth paths with minimal directional changes. Experimental validation with a differential drive mobile robot demonstrates exceptional trajectory control efficiency. We also explore Motion Planning Networks (MPNets), a neural planner that processes raw point-cloud data from depth sensors. Our tests demonstrate MPNet’s ability to create optimal paths using the Probabilistic Roadmap (PRM) method. We highlight the importance of correctly setting parameters for reliable path planning with MPNet and evaluate the algorithm on various path types. Our experiments confirm that the trajectory control algorithm works effectively, consistently providing precise and efficient trajectory control for the robot.
... Francis et al. [20] proposed PRM-RL for long-range navigation, in which RL functions as a short-range local planner and also as a collision-prediction module for the high-level PRM planner. A similar framework can be seen in [21], which uses a regular global planner (RRT or A*) and an RL policy-optimization method as the local planner. Besides, there are pure neural motion planners that approximate motion planners with neural networks, mapping states directly to paths or actions. ...
Article
High-quality and representative data is essential for both Imitation Learning (IL)- and Reinforcement Learning (RL)-based motion planning tasks. For real robots, it is challenging to collect enough qualified data, either as demonstrations for IL or as experiences for RL, due to safety considerations in environments with obstacles. We target this challenge by proposing the self-imitation learning by planning plus (SILP+) algorithm, which efficiently embeds experience-based planning into the learning architecture to mitigate the data-collection problem. The planner generates demonstrations based on successfully visited states from the current RL policy, and the policy improves by learning from these demonstrations. In this way, we relieve the demand for human expert operators to collect demonstrations required by IL and improve the RL performance as well. Various experimental results show that SILP+ achieves better training efficiency and a higher and more stable success rate in complex motion planning tasks compared to several other methods. Extensive tests on physical robots illustrate the effectiveness of SILP+ in a physical setting, retaining a success rate of 90% where the next-best contender drops from 87% to 75% in the Sim2Real transition.
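The training loop described above can be pictured with the sketch below, in which a planner turns states the current policy has already visited on successful episodes into demonstrations for imitation updates. Every interface here (`env`, `agent`, `planner`, the `info["success"]` flag) is a hypothetical stand-in used only to illustrate the control flow, not the SILP+ code.

```python
# Hedged sketch of a self-imitation-by-planning loop, under the assumed interfaces above.

def silp_style_training(env, agent, planner, episodes=1000):
    demo_buffer = []
    for _ in range(episodes):
        # 1. Ordinary RL rollout, stored in the agent's replay buffer.
        state, done, visited, info = env.reset(), False, [], {}
        while not done:
            action = agent.act(state)
            next_state, reward, done, info = env.step(action)
            agent.store(state, action, reward, next_state, done)
            visited.append(state)
            state = next_state

        # 2. Plan demonstrations through the successfully visited states.
        if info.get("success", False):
            demo_path = planner.plan_through(visited, goal=env.goal)
            if demo_path is not None:
                demo_buffer.extend(demo_path)        # (state, action) pairs from the planner

        # 3. Combine RL updates with imitation on the planner's demonstrations.
        agent.update_rl()
        if demo_buffer:
            agent.update_imitation(demo_buffer)
    return agent
```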
Preprint
Full-text available
Multi-robot path planning must adapt to difficult situations, allowing autonomous navigation among both static and dynamic obstacles in complicated environments. However, defining the best planning strategies for certain applications remains unsolved. This study examines three methods for learning complex robotic decision-making principles: Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), and Deep Reinforcement Learning (DRL). Furthermore, it proposes a novel technique for obstacle avoidance and autonomous navigation called Dynamic Improvement Trust Region Policy Optimization with Covariance Grid Adaptation (DITRPO-CGA). Initially, it creates the Dynamic Improvement Proximal Policy Optimization with Covariance Grid Adaptation (DIPPO-CGA), based on PPO, to assure collision-free policies. Next, it develops a DRL technique that integrates DIPPO-CGA, resulting in the DITRPO-CGA algorithm, which improves the flexibility of multi-robot systems in different situations. During the training process, DIPPO-CGA is utilized to optimize the multi-robot multi-task policies, ensuring minimum-distance obstacle avoidance and target completion. The proposed DIPPO-CGA algorithm reaches the target within a minimum distance. The findings show that, compared to PPO, TRPO, and DIPPO-CGA, the proposed DITRPO-CGA algorithm achieves a higher convergence rate, faster target achievement, and reaches the target positions more quickly.
Chapter
This paper addresses autonomous parking for a vehicle in environments with static and dynamic obstacles. Although parking maneuvering has reached the level of fully automated valet parking, many challenges remain in realizing parking motion planning in the presence of dynamic obstacles. One of the most well-known autonomous driving platforms is the Baidu Apollo platform. In the Apollo platform, this problem is solved using the classic hybrid A* method. However, this method has two main downsides. Firstly, in some parking scenarios it generates trajectories that consist of many partitions with different gear types and sizes. Such trajectories are intractable for a self-driving car when testing the Apollo planner on more realistic data coming from a simulator such as SVL. Secondly, the built-in algorithm does not have the ability to interact with dynamic obstacles, which might lead to a collision in some critical parking scenarios. To overcome these issues, we propose a method based on reinforcement learning, which uses the RL policy (from POLAMP), allowing us to take into account the kinematic constraints of the vehicle as well as static and dynamic obstacles. The proposed method was fully integrated into the Apollo platform with developed Cyber RT nodes, which were used to publish the parking trajectory from our algorithm to the SVL simulator through a ROS/Cyber bridge. The final model demonstrates transferability to previously unseen experimental environments and flexibility with respect to the built-in hybrid A*.
Chapter
Full-text available
In safety-critical systems such as autonomous driving systems, behavior planning is a significant challenge. The presence of numerous dynamic obstacles makes the driving environment unpredictable. The planning algorithm should be safe, reactive, and adaptable to environmental changes. The paper presents an adaptive maneuver planning algorithm based on an evolving behavior tree created with genetic programming. In addition, we make a technical contribution to the Baidu Apollo autonomous driving platform, allowing the platform to test and develop overtaking maneuver planning algorithms.
Article
Full-text available
Planning collision-free trajectories for robot motion under hard constraints in unpredictable environments is a difficult issue. To cope with this problem, this paper presents a novel replanning method based on receding horizon control and asymptotically optimal single-query motion planning. The approach, called the horizon-based lazy optimal rapidly-exploring random tree algorithm, enables real-time replanning in both static and dynamic environments with updated information. Previous feasible solutions are fully considered to generate new plans. Contributions include a lazy-steering and lazy-collision-checking search tree, forward tree pruning, and online learning of the sampling distribution. The techniques are proven to be efficient, near-optimal, and fast to respond to change. Moreover, three experiments are performed to test the properties of the proposed algorithm numerically.
Article
Full-text available
This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly Exploring Randomized Tree (RRT) and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT that uses the RL policy as a local planner, and the reachability estimator as the distance function to bias tree-growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state of the art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering function free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference.
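A hedged sketch of the node-selection idea described above is given below: a learned reachability estimator replaces the Euclidean metric when choosing which tree node to grow from, biasing growth toward states the RL policy can actually reach quickly. The `estimator.predict` interface and the fallback behaviour are assumptions for illustration, not the paper's implementation.

```python
# Sketch, assuming a regression model `estimator` that predicts the policy's time-to-reach.
import math

def select_node(tree_nodes, sample, estimator, horizon=10.0):
    """Pick the tree node from which the RL policy is predicted to reach `sample` fastest."""
    best_node, best_time = None, float("inf")
    for node in tree_nodes:
        t_reach = estimator.predict(node, sample)   # learned time-to-reach estimate
        if t_reach < min(best_time, horizon):       # ignore nodes predicted unreachable in time
            best_node, best_time = node, t_reach
    if best_node is None:                           # fall back to Euclidean nearest neighbour
        best_node = min(tree_nodes,
                        key=lambda n: math.hypot(n[0] - sample[0], n[1] - sample[1]))
    return best_node
```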
Article
Full-text available
This paper presents an online kinodynamic motion planning algorithmic framework using asymptotically optimal rapidly-exploring random tree (RRT*) and continuous-time Q-learning, which we term as RRT-Q*. We formulate a model-free Q-based advantage function and we utilize integral reinforcement learning to develop tuning laws for the online approximation of the optimal cost and the optimal policy of continuous-time linear systems. Moreover, we provide rigorous Lyapunov-based proofs for the stability of the equilibrium point, which results in asymptotic convergence properties. A terminal state evaluation procedure is introduced to facilitate the online implementation. We propose a static obstacle augmentation and a local replanning framework, which are based on topological connectedness, to locally recompute the robot's path and ensure collision-free navigation. We perform simulations and a qualitative comparison to evaluate the efficacy of the proposed methodology.
Article
Full-text available
Intelligent vehicles have increased their capabilities for highly and even fully automated driving under controlled environments. Scene information is received using onboard sensors and communication network systems, i.e., infrastructure and other vehicles. Considering the available information, different motion planning and control techniques have been implemented to autonomously drive in complex environments. The main goal is focused on executing strategies to improve safety, comfort, and energy optimization. However, research challenges such as navigation in urban dynamic environments with obstacle avoidance capabilities, i.e., vulnerable road users (VRU) and vehicles, and cooperative maneuvers among automated and semi-automated vehicles still need further efforts for a real-environment implementation. This paper presents a review of motion planning techniques implemented in the intelligent vehicles literature. A description of the technique used by research teams, their contributions in motion planning, and a comparison among these techniques is also presented. Relevant works in the overtaking and obstacle avoidance maneuvers are presented, allowing the understanding of the gaps and challenges to be addressed in the next years. Finally, an overview of future research directions and applications is given.
Article
Full-text available
Dynamic environments have obstacles that unpredictably appear, disappear, or move. We present the first sampling-based replanning algorithm that is asymptotically optimal and single-query (designed for situation in which a priori offline computation is unavailable). Our algorithm, RRTX, refines and repairs the same search-graph over the entire duration of navigation (in contrast to previous single-query replanning algorithms that prune and then regrow some or all of the search-tree). Whenever obstacles change and/or the robot moves, a graph rewiring cascade quickly remodels the existing search-graph and repairs its shortest-path-to-goal sub-tree to reflect the new information. Both graph and tree are built directly in the robot’s state-space; thus, the resulting plan(s) respect the kinematics of the robot and continue to improve during navigation. RRTX is probabilistically complete and makes no distinction between local and global planning, yet it reacts quickly enough for real-time high-speed navigation through unpredictably changing environments. Low information transfer time is essential for enabling RRTX to react quickly in dynamic environments; we prove that the information transfer time required to inform a graph of size n about an ε-cost decrease is O(n log n) for RRTX—faster than other current asymptotically optimal single-query algorithms (we prove RRT* is and RRT# is (n log²n)). In static environments RRTX has the same amortized runtime as RRT and RRT*, Θ(log n), and is faster than RRT#, (log²n). In order to achieve O(log n) iteration time, each node maintains a set of O(log n) expected neighbors, and the search-graph maintains ε-consistency for a predefined ε. Experiments and simulations confirm our theoretical analysis and demonstrate that RRTX is useful in both static and dynamic environments.
Conference Paper
Full-text available
Robotic path planning in static environments is a thoroughly studied problem that can typically be solved very efficiently. However, planning in the presence of dynamic obstacles is still computationally challenging because it requires adding time as an additional dimension to the search-space explored by the planner. In order to avoid the increase in the dimensionality of the planning problem, most real-time approaches to path planning treat dynamic obstacles as static and constantly re-plan as dynamic obstacles move. Although gaining efficiency, these approaches sacrifice optimality and even completeness. In this paper, we develop a planner that builds on the observation that while the number of safe timesteps in any configuration may be unbounded, the number of safe time intervals in a configuration is finite and generally very small. A safe interval is a time period for a configuration with no collisions and if it were extended one timestep in either direction, it would then be in collision. The planner exploits this observation and constructs a search-space with states defined by their configuration and safe interval, resulting in a graph that generally only has a few states per configuration. On the theoretical side, we show that our planner can provide the same optimality and completeness guarantees as planning with time as an additional dimension. On the experimental side, in simulation tests with up to 200 dynamic obstacles, we show that our planner is significantly faster, making it feasible to use in real-time on robots operating in large dynamic environments. We also ran several real robot trials on the PR2, a mobile manipulation platform.
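To make the safe-interval idea concrete, the small sketch below computes the collision-free intervals of a single configuration from the time intervals during which dynamic obstacles block it; the planner then keeps one search state per safe interval rather than one per timestep. This is an illustrative reconstruction under those assumptions, not the authors' code.

```python
# Sketch: complement the blocked intervals of a configuration to obtain its safe intervals.

def safe_intervals(collision_intervals, horizon=float("inf")):
    """Return maximal collision-free intervals given blocked intervals [(t_start, t_end), ...]."""
    intervals, t = [], 0.0
    for start, end in sorted(collision_intervals):
        if start > t:
            intervals.append((t, start))   # free until the next obstacle arrives
        t = max(t, end)
    if t < horizon:
        intervals.append((t, horizon))     # free after the last obstacle has passed
    return intervals

# Example: a cell blocked during [2, 3) and [5, 6) has safe intervals (0, 2), (3, 5), (6, inf).
print(safe_intervals([(2.0, 3.0), (5.0, 6.0)]))
```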
Conference Paper
Full-text available
The Rapidly-exploring Random Tree (RRT) algorithm, based on incremental sampling, efficiently computes motion plans. Although the RRT algorithm quickly produces candidate feasible solutions, it tends to converge to a solution that is far from optimal. Practical applications favor "anytime" algorithms that quickly identify an initial feasible plan, then, given more computation time available during plan execution, improve the plan toward an optimal solution. This paper describes an anytime algorithm based on the RRT* which (like the RRT) finds an initial feasible solution quickly, but (unlike the RRT) almost surely converges to an optimal solution. We present two key extensions to the RRT*, committed trajectories and branch-and-bound tree adaptation, that together enable the algorithm to make more efficient use of computation time online, resulting in an anytime algorithm for real-time implementation. We evaluate the method using a series of Monte Carlo runs in a high-fidelity simulation environment, and compare the operation of the RRT and RRT* methods. We also demonstrate experimental results for an outdoor wheeled robotic vehicle.
Article
Full-text available
This paper describes the dynamic window approach to reactive collision avoidance for mobile robots equipped with synchro-drives. The approach is derived directly from the motion dynamics of the robot and is therefore particularly well-suited for robots operating at high speed. It differs from previous approaches in that the search for commands controlling the translational and rotational velocity of the robot is carried out directly in the space of velocities. The advantage of our approach is that it correctly and in an elegant way incorporates the dynamics of the robot. This is done by reducing the search space to the dynamic window, which consists of the velocities reachable within a short time interval. Within the dynamic window the approach only considers admissible velocities yielding a trajectory on which the robot is able to stop safely. Among these velocities the combination of translational and rotational velocity is chosen by maximizing an objective function. The objective fu...
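A compact sketch of the velocity-space search just described: candidate translational and rotational velocities are limited to those reachable within one control interval (the dynamic window), only velocities from which the robot can still stop before the nearest obstacle are admissible, and the admissible pair with the best score is executed. The objective weights and the coarse three-value window below are illustrative assumptions, not the paper's parameters.

```python
# Sketch of one dynamic-window selection step under assumed weights and a coarse candidate set.
import math

def dynamic_window_control(v, w, dist_to_obstacle, heading_error,
                           v_max=1.0, w_max=1.0, acc=0.5, acc_w=1.0,
                           dt=0.25, brake_acc=0.5):
    best, best_score = (0.0, 0.0), -float("inf")
    for dv in (-acc * dt, 0.0, acc * dt):                 # velocities reachable in one interval
        for dw in (-acc_w * dt, 0.0, acc_w * dt):
            cand_v = min(max(v + dv, 0.0), v_max)
            cand_w = min(max(w + dw, -w_max), w_max)
            stop_dist = cand_v ** 2 / (2.0 * brake_acc)   # admissibility: can the robot still stop?
            if stop_dist >= dist_to_obstacle:
                continue
            score = (0.8 * (1.0 - abs(heading_error) / math.pi)   # head toward the goal
                     + 0.1 * dist_to_obstacle                      # keep clearance
                     + 0.1 * cand_v / v_max)                       # prefer forward progress
            if score > best_score:
                best, best_score = (cand_v, cand_w), score
    return best

# Example: a slow robot, a clear 4 m ahead, and a small heading error keeps accelerating.
print(dynamic_window_control(v=0.2, w=0.0, dist_to_obstacle=4.0, heading_error=0.1))
```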
Article
Path finding is a well-studied problem in AI, which is often framed as graph search. Any-angle path finding is a technique that augments the initial graph with additional edges to build shorter paths to the goal. Indeed, optimal algorithms for any-angle path finding in static environments exist. However, when dynamic obstacles are present and time is the objective to be minimized, these algorithms can no longer guarantee optimality. In this work, we elaborate on why this is the case and what techniques can be used to solve the problem optimally. We present two algorithms, grounded in the same idea, that can obtain provably optimal solutions to the considered problem. One of them is a naive algorithm and the other one is much more involved. We conduct a thorough empirical evaluation showing that, in certain setups, the latter algorithm might be as fast as the previously-known greedy non-optimal solver while providing solutions of better quality. In some (rare) cases, the difference in cost is up to 76%, while on average it is lower than one percent (the same cost difference is typically observed between optimal and greedy any-angle solvers in static environments).
Chapter
A local planner makes a trajectory physically executable for an agent. The Open Space Planner of the Apollo framework, based on nonlinear optimization methods, smooths the trajectory received from a global planner. Such dependency on a global planner forces an agent to relaunch both planners when local changes occur (e.g., when an environment has dynamic obstacles), which can waste too much time. In this article, we consider a different approach based on reinforcement learning. This method allows the agent to generate a trajectory using information about the environment (the current and goal state, lidar sensors, etc.). Experiments conducted in a simplified environment show that such an algorithm can be implemented as the local planner in the Apollo infrastructure.
Article
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy - that is, succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.
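For concreteness, the few lines below sketch the maximum-entropy objectives described above in PyTorch: the actor trades expected Q-value against policy entropy, and the critic target folds the entropy bonus into the value of the next state. The fixed temperature, tensor shapes, and function signatures are simplifying assumptions, not the reference implementation.

```python
# Minimal sketch of soft actor-critic objectives, assuming per-sample tensors of matching shape.
import torch

def sac_actor_loss(q_values, log_probs, alpha=0.2):
    """Actor loss: minimising E[alpha * log pi(a|s) - Q(s, a)] maximises reward plus entropy."""
    return (alpha * log_probs - q_values).mean()

def sac_value_target(rewards, next_q, next_log_probs, not_done, alpha=0.2, gamma=0.99):
    """Soft Bellman target used to regress the critic."""
    return rewards + gamma * not_done * (next_q - alpha * next_log_probs)
```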
Article
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
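The clipped surrogate objective at the core of the method can be written in a few lines; the PyTorch rendering below is a common form of that loss for one minibatch (the value and entropy terms used in practice are omitted for brevity).

```python
# Sketch of the PPO clipped surrogate loss for one minibatch (to be minimised).
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    ratio = torch.exp(log_probs_new - log_probs_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()              # clipping caps the policy update
```

Because the ratio is clipped, several epochs of minibatch updates can be run on the same batch of experience without pushing the new policy too far from the one that collected the data.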
Article
Existing counting methods often adopt regression-based approaches and thus cannot precisely localize the target objects, which hinders further applications and analysis (e.g., high-level understanding and fine-grained classification). In addition, most prior work mainly focuses on counting objects in static environments with fixed cameras. Motivated by the advent of unmanned flying vehicles (i.e., drones), we are interested in detecting and counting objects in such dynamic environments. We propose Layout Proposal Networks (LPNs) and spatial kernels to simultaneously count and localize target objects (e.g., cars) in drone videos. Different from conventional region proposal methods, we leverage the spatial layout information (e.g., cars often park regularly) and introduce spatially regularized constraints into our network to improve the localization accuracy. To evaluate our counting method, we present a new large-scale car parking lot dataset (CARPK) that contains nearly 90,000 cars captured from different parking lots. To the best of our knowledge, it is the first and the largest drone-view dataset that supports object counting and provides bounding box annotations.
Article
Self-driving vehicles are a maturing technology with the potential to reshape mobility by enhancing the safety, accessibility, efficiency, and convenience of automotive transportation. Safety-critical tasks that must be executed by a self-driving vehicle include planning of motions through a dynamic environment shared with other vehicles and pedestrians, and their robust executions via feedback control. The objective of this paper is to survey the current state of the art on planning and control algorithms with particular regard to the urban setting. A selection of proposed techniques is reviewed along with a discussion of their effectiveness. The surveyed approaches differ in the vehicle mobility model used, in assumptions on the structure of the environment, and in computational requirements. The side-by-side comparison presented in this survey helps to gain insight into the strengths and limitations of the reviewed approaches and assists with system level design choices.
Article
We present an approach for asymptotically optimal motion planning for kinodynamic systems with arbitrary nonlinear dynamics amid obstacles. Optimal sampling-based planners like RRT∗, FMT∗, and BIT∗ when applied to kinodynamic systems require solving a two-point boundary value problem (BVP) to perform exact connections between nodes in the tree. Two-point BVPs are non-trivial to solve, hence the prevalence of alternative approaches that focus on specific instances of kinodynamic systems, use approximate solutions to the two-point BVP, or use random propagation of controls. In this work, we explore the feasibility of exploiting recent advances in numerical optimal control and optimization to solve these two-point BVPs for arbitrary kinodynamic systems and how they can be integrated with existing optimal planning algorithms. We combine BIT∗ with a two-point BVP solver that uses sequential quadratic programming (SQP). We consider the problem of computing minimum-time trajectories. Since the duration of trajectories is not known a-priori, we include the time-step as part of the optimization to allow SQP to optimize over the duration of the trajectory while keeping the number of discrete steps fixed for every connection attempted. Our experiments indicate that using a two-point BVP solver in the inner-loop of BIT∗ is competitive with the state-of-the-art in sampling-based optimal planning that explicitly avoids the use of two-point BVP solvers.
Article
In the present work the problem of exponential stabilization of the kinematic and dynamic model of a simple wheeled mobile robot is addressed and solved using a discontinuous, bounded, time invariant, state feedback control law. The properties of the closed-loop system are studied in detail and its performance in presence of model errors and noisy measurements are evaluated and discussed.
Article
Sampling-based planning algorithms are efficient practical solutions to motion planning challenges. Existing algorithms such as PRM* and RRT* take advantage of random geometric graph theory to answer motion planning queries. This theory requires solving the two-point boundary value problem (BVP) in the state space, which is generally considered to be difficult and impractical. This work presents a different theory of asymptotic optimality. It fills in the gap between optimal kinodynamic planning problems and sampling-based algorithms. The resulting contributions explain some open problems, e.g., the existence of BVP-free asymptotically optimal sampling-based kinodynamic algorithms and the properties of a previously proposed heuristic algorithm, RRT-BestNear. This work further presents the new algorithms Stable Sparse-RRT (SST) and SST*. Analysis and experimental results show that SST and SST* are efficient, general, BVP-free, sampling-based optimal kinodynamic planning algorithms that are practical for a great variety of physical systems.
Conference Paper
We present Kinodynamic RRT*, an incremental sampling-based approach for asymptotically optimal motion planning for robots with linear dynamics. Our approach extends RRT*, which was introduced for holonomic robots [10], by using a fixed-final-state-free-final-time controller that optimally connects any pair of states, where the cost function is expressed as a trade-off between the duration of a trajectory and the expended control effort. Our approach generalizes earlier work on RRT* for kinodynamic systems, as it guarantees asymptotic optimality for any system with controllable linear dynamics, in state spaces of any dimension. In addition, we show that for the rich subclass of systems with a nilpotent dynamics matrix, closed-form solutions for optimal trajectories can be derived, which keeps the computational overhead of our algorithm compared to traditional RRT* at a minimum. We demonstrate the potential of our approach by computing asymptotically optimal trajectories in three challenging motion planning scenarios: (i) a planar robot with a 4-D state space and double integrator dynamics, (ii) an aerial vehicle with a 10-D state space and linearized quadrotor dynamics, and (iii) a car-like robot with a 5-D state space and non-linear dynamics.
Conference Paper
Incremental sampling-based motion planning algorithms such as the Rapidly-exploring Random Trees (RRTs) have been successful in efficiently solving computationally challenging motion planning problems involving complex dynamical systems. A recently proposed algorithm, called the RRT*, also provides asymptotic optimality guarantees, i.e., almost-sure convergence to optimal trajectories (which the RRT algorithm lacked) while maintaining the computational efficiency of the RRT algorithm. In this paper, time-optimal maneuvers for a high-speed off-road vehicle taking tight turns on a loose surface are studied using the RRT* algorithm. Our simulation results show that the aggressive skidding maneuver, usually called the trail-braking maneuver, naturally emerges from the RRT* algorithm as the minimum-time trajectory. Along the way, we extend the RRT* algorithm to handle complex dynamical systems, such as those that are described by nonlinear differential equations and involve high-dimensional state spaces, which may be of independent interest. We also exploit the RRT* as an anytime computation framework for nonlinear optimization problems.
Conference Paper
In this paper we describe a novel and simple to implement yet effective lattice design algorithm, which simultaneously produces input and state-space sampled lattice graphs. The presented method is an extension to the ideas suggested by Bicchi et al. on input lattices and is applicable to systems which can be brought into (2,n) chained form, such as kinematic models of unicycles, bicycles, differential-drive robots and car-like vehicles (pulling several trailers). We further show that a transformation from chained form to path coordinates allows the resulting lattice to be bent along any C1 continuous path. We exploit this fact by shaping it along the skeleton of arbitrary structured environments, such as the center of road lanes and corridors. In our experiments in both structured (i.e. on-road) and unstructured (i.e. parking lot) scenarios, we successfully demonstrate for the first time the applicability of lattice-based planning approaches to search queries in arbitrary environments.
Conference Paper
We present a method for motion planning in the presence of moving obstacles that is aimed at dynamic on-road driving scenarios. Planning is performed within a geometric graph that is established by sampling deterministically from a manifold that is obtained by combining configuration space and time. We show that these graphs are acyclic and shortest path algorithms with linear runtime can be employed. By reparametrising the configuration space to match the course of the road, it can be sampled very economically with few vertices, and this reduces absolute runtime further. The trajectories generated are quintic splines. They are second order continuous, obey nonholonomic constraints and are optimised for minimum square of jerk. Planning time remains below 20 ms on general purpose hardware.
Article
In this paper, we present an algorithm for generating complex dynamically-feasible maneuvers for autonomous vehicles traveling at high speeds over large distances. Our approach is based on performing anytime incremental search on a multi-resolution, dynamically-feasible lattice state space. The resulting planner provides real-time performance along with guarantees on, and control of, the sub-optimality of its solution. We provide theoretical properties and experimental results from an implementation on an autonomous passenger vehicle that competed in, and completed, the Urban Challenge.
Article
Although the problem of determining the minimum cost path through a graph arises naturally in a number of interesting applications, there has been no underlying theory to guide the development of efficient search procedures. Moreover, there is no adequate conceptual framework within which the various ad hoc search strategies proposed to date can be compared. This paper describes how heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching and demonstrates an optimality property of a class of search strategies.
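The kind of heuristic graph search formalized in that work is illustrated by the short sketch below, which expands nodes in order of f = g + h over an adjacency-list graph; the graph encoding and the zero heuristic in the usage line are assumptions made for the example.

```python
# Small sketch of best-first heuristic search over graph = {node: [(neighbour, edge_cost), ...]}.
import heapq
import itertools

def a_star(graph, start, goal, heuristic):
    tie = itertools.count()                      # tie-breaker so the heap never compares nodes
    open_list = [(heuristic(start), 0.0, next(tie), start, None)]
    parents, closed = {}, set()
    while open_list:
        _, g, _, node, parent = heapq.heappop(open_list)
        if node in closed:
            continue
        closed.add(node)
        parents[node] = parent
        if node == goal:                         # walk parent pointers back to the start
            path = [node]
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return list(reversed(path))
        for neighbour, cost in graph.get(node, []):
            if neighbour not in closed:
                heapq.heappush(open_list, (g + cost + heuristic(neighbour),
                                           g + cost, next(tie), neighbour, node))
    return None

# Tiny usage example with a zero heuristic (the search then reduces to Dijkstra's algorithm).
graph = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)], "C": [("D", 1.0)], "D": []}
print(a_star(graph, "A", "D", heuristic=lambda n: 0.0))   # ['A', 'B', 'C', 'D']
```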
Article
We introduce the concept of a Rapidly-exploring Random Tree (RRT) as a randomized data structure that is designed for a broad class of path planning problems. While they share many of the beneficial properties of existing randomized planning techniques, RRTs are specifically designed to handle nonholonomic constraints (including dynamics) and high degrees of freedom. An RRT is iteratively expanded by applying control inputs that drive the system slightly toward randomly-selected points, as opposed to requiring point-to-point convergence, as in the probabilistic roadmap approach. Several desirable properties and a basic implementation of RRTs are discussed. To date, we have successfully applied RRTs to holonomic, nonholonomic, and kinodynamic planning problems of up to twelve degrees of freedom.
Maximum a posteriori policy optimisation
  • A Abdolmaleki
  • J T Springenberg
  • Y Tassa
  • R Munos
  • N Heess
  • M Riedmiller
Muesli: Combining improvements in policy optimization
  • Hessel