Article

A Time-Domain Wavefront Computing Accelerator With a 32 × 32 Reconfigurable PE Array


Abstract

This work presents a hardware accelerator realizing true time-domain wavefront computing in a massively parallel two-dimensional (2-D) processing element (PE) array. The proposed 2-D time-domain PE array is designed for multiple applications based on its scalable and reconfigurable architecture. The shortest path problem, a classical problem in graph theory, is one of the critical problems the proposed accelerator solves. Unlike the A* search algorithm, a heuristic method widely used in shortest-path searching, the proposed accelerator requires only the propagation of rising-edge signals through the PE array, without calculating or estimating distances from the start to the goal. Hence, a single execution of the proposed time-domain wavefront computing provides all the optimal paths from a start point to an arbitrary goal. Besides the King's graph model used for shortest-path searching, the PE array can be reconfigured to a simpler lattice graph model to solve other problems, such as the maze solving used in this article as a benchmark. In addition, we used the proposed accelerator to demonstrate a scientific simulation: the propagation of circular or planar wavefronts from single or multiple start points in the King's graph configuration. A 1 × 1 mm² test chip with a 32 × 32 reconfigurable time-domain PE array is fabricated using a 65-nm process. For a 2-D map with 32 × 32 vertices, the proposed PE array consumes 776 pJ per task and achieves a 1.6 G edges/second search rate using 1.2-/1.0-V core supply voltages.
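To make the wavefront principle concrete, the sketch below is a minimal software analogue of the idea (not the chip's time-domain circuit): a breadth-first wavefront expands over an 8-connected King's-graph grid, and because every cell records all predecessors that reached it at the first-arrival time, a single run recovers every optimal path from the start to any goal. The grid encoding and function names are illustrative assumptions.

```python
from collections import deque

# 8-connected (King's graph) moves; the lattice configuration the abstract
# mentions would restrict this to the first four (non-diagonal) entries.
KING_MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]

def wavefront(grid, start):
    """Propagate a wavefront from `start` over a 2-D map.

    grid[r][c] == 1 marks an obstacle. Returns (dist, parents), where
    dist[r][c] is the arrival "time" (hop count) of the wavefront and
    parents[(r, c)] lists every predecessor on an optimal path, so all
    shortest paths to any goal can be recovered from one execution.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    parents = {}
    dist[start[0]][start[1]] = 0
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in KING_MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc]:
                continue
            if dist[nr][nc] is None:                  # first arrival: optimal
                dist[nr][nc] = dist[r][c] + 1
                parents[(nr, nc)] = [(r, c)]
                frontier.append((nr, nc))
            elif dist[nr][nc] == dist[r][c] + 1:      # tie: another optimal path
                parents[(nr, nc)].append((r, c))
    return dist, parents
```

Following the `parents` links backward from any goal cell enumerates all optimal paths, which is the software counterpart of the single-execution property claimed in the abstract.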


Article
Full-text available
Artificial intelligence (AI) and machine learning (ML) are revolutionizing many fields of study, such as visual recognition, natural language processing, autonomous vehicles, and prediction. Traditional von Neumann computing architectures, with separate processing elements and memory devices, have been rapidly improving their computing performance with the scaling of process technology. However, in the era of AI and ML, data transfer between memory devices and processing elements has become the bottleneck of the system. To address this data movement issue, memory-centric computing merges the memory devices with processing elements so that computations can be done in the same location without moving any data. Processing-In-Memory (PIM) has attracted the research community's attention because it can substantially improve the energy efficiency of memory-centric computing systems by minimizing data movement. Even though the benefits of PIM are well accepted, its limitations and challenges have not been investigated thoroughly. This paper presents a comprehensive investigation of state-of-the-art PIM research based on various memory device types, such as static random-access memory (SRAM), dynamic random-access memory (DRAM), and resistive memory (ReRAM). We present an overview of PIM designs in each memory type, covering bit cells, circuits, and architecture. Then, a new software stack standard and its challenges for incorporating PIM with the conventional computing architecture are discussed. Finally, we discuss various future research directions in PIM for further reducing the data conversion overhead, improving test accuracy, and minimizing intra-memory data movement.
Conference Paper
Full-text available
Many emerging real-world applications require fast processing of large-scale data represented in the form of graphs. In this paper, we design a Field-Programmable Gate Array (FPGA) framework to accelerate graph algorithms based on the edge-centric paradigm. Our design is flexible for accelerating general graph algorithms with various vertex attributes and update propagation functions, such as Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). The target platform consists of large external memory to store the graph data and an FPGA to accelerate the processing. By taking an edge-centric graph algorithm and hardware resource constraints as inputs, our framework can determine the optimal design parameters and produce an optimized Register-Transfer Level (RTL) FPGA accelerator design. To improve data locality and increase parallelism, we partition the input graph into non-overlapping partitions. This enables our framework to efficiently buffer vertex data in the on-chip memory of the FPGA and exploit both inter-partition and intra-partition parallelism. Further, we propose an optimized data layout to improve external memory performance and reduce data communication between the FPGA and external memory. Based on our design methodology, we accelerate two fundamental graph algorithms for performance evaluation: Sparse Matrix Vector Multiplication (SpMV) and PageRank (PR). Experimental results show that our accelerators sustain a high throughput of up to 2250 Million Traversed Edges Per Second (MTEPS) for SpMV and 2487 MTEPS for PR. Compared with several highly-optimized multi-core designs, our FPGA framework achieves up to 20.5× speedup for SpMV and 17.7× speedup for PR; compared with two state-of-the-art FPGA frameworks, our designs demonstrate up to 5.3× and 1.8× throughput improvement for SpMV and PR, respectively.
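As a sketch of the edge-centric paradigm such frameworks build on, the Python fragment below streams the whole edge list in a scatter phase, buffers the produced updates, and applies them to the vertex array in a gather phase, using SSSP as the example. It is an illustrative software model, not the FPGA design; the function name and data layout are assumptions.

```python
def edge_centric_sssp(num_vertices, edges, source):
    """Edge-centric SSSP: each iteration streams all edges (scatter), buffers
    candidate updates, then applies them to the vertex array (gather),
    mirroring how an accelerator streams edge lists from external memory.
    `edges` is a list of (src, dst, weight) with non-negative weights."""
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0.0
    changed = True
    while changed:
        # Scatter: sequential pass over the edge list emits candidate updates.
        updates = [(dst, dist[src] + w) for src, dst, w in edges if dist[src] < INF]
        # Gather: apply the buffered updates to the vertex attribute array.
        changed = False
        for dst, cand in updates:
            if cand < dist[dst]:
                dist[dst] = cand
                changed = True
    return dist
```

The sequential edge stream is what makes the paradigm friendly to external memory: scatter touches edges in layout order, and only the (much smaller) update buffer is accessed irregularly.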
Conference Paper
Full-text available
Rapidly-exploring Random Tree (RRT) is one of the quickest and most efficient obstacle-free path-finding algorithms. However, it cannot guarantee finding the optimal path. A recently proposed extension of RRT, known as Rapidly-exploring Random Tree Star (RRT*), claims to achieve convergence towards the optimal solution, but it has been proven to take infinite time to do so, with a slow convergence rate. To overcome these limitations, we propose an extension of RRT*, called RRT*-Smart, which aims to accelerate the rate of convergence and to reach an optimum or near-optimum solution much faster and at a reduced execution time. Our novel algorithm incorporates two new techniques into RRT*: path optimization and intelligent sampling. Simulation results presented in various obstacle-cluttered environments confirm the efficiency of RRT*-Smart.
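Of the two techniques, path optimization is easy to illustrate in isolation: once a waypoint path is found, redundant intermediate nodes are pruned by connecting each waypoint directly to the farthest successor it can see. The sketch below is a minimal rendering of that idea under the assumption of a user-supplied straight-line collision check `collision_free`; it is not the paper's exact routine.

```python
def optimize_path(path, collision_free):
    """Greedy shortcutting in the spirit of RRT*-Smart's path optimization:
    connect each kept waypoint to the farthest visible successor, pruning
    the nodes in between. `path` is a list of waypoints whose consecutive
    segments are already collision-free; `collision_free(p, q)` is an
    assumed straight-line visibility test between two waypoints."""
    optimized = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        # Walk backward until a visible successor is found (i+1 always is).
        while j > i + 1 and not collision_free(path[i], path[j]):
            j -= 1
        optimized.append(path[j])
        i = j
    return optimized
```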
Conference Paper
Full-text available
Path planning for an autonomous mobile robot is an optimization problem that involves computing a collision-free path between an initial location and a goal location. In this paper, we present an improved artificial potential field based regression search (improved APF-based RS) method that can efficiently obtain a globally sub-optimal or optimal path, without local minima or oscillations, given completely known environment information. We redefine the potential functions to eliminate the non-reachable-goal and local-minima problems, and we use a virtual local target to let the robot escape oscillations. Because the path planned by the improved APF is not the shortest or an approximately shortest trajectory, we develop a regression search (RS) method to optimize it. The optimized path is calculated by connecting the sequential points produced by the improved APF. Extensive simulations demonstrate that the improved APF method easily escapes local minima and oscillatory movements. Moreover, the simulation results confirm that, compared with the general APF, the proposed approach consistently calculates a more nearly optimal, collision-free, and safe path to the destination. This shows that the improved APF-based RS method is a feasible and efficient solution to path planning, an NP-hard problem for autonomous mobile robots.
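For readers unfamiliar with the APF formulation these improvements target, the sketch below takes one gradient-descent step on the classical attractive/repulsive potential (the general APF the paper compares against, not the improved variant). The gains `k_att` and `k_rep`, the influence radius `rho0`, and the step size are illustrative assumptions.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=2.0, step=0.05):
    """One descent step on the classical potential field: a quadratic
    attractive well at the goal plus a repulsive barrier around each
    obstacle, active only within radius rho0. Returns the next position."""
    # Attractive force pulls straight toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    # Repulsive force (gradient of 0.5*k*(1/rho - 1/rho0)^2) pushes away
    # from each obstacle closer than rho0.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        rho = math.hypot(dx, dy)
        if 0 < rho < rho0:
            mag = k_rep * (1.0 / rho - 1.0 / rho0) / rho ** 3
            fx += mag * dx
            fy += mag * dy
    norm = math.hypot(fx, fy) or 1.0          # avoid division by zero at the goal
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

Iterating this step exhibits exactly the failure modes the paper addresses: the summed forces can cancel at a local minimum, or alternate direction and oscillate near closely spaced obstacles.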
Conference Paper
Full-text available
Single-query sampling-based motion planners are an efficient class of algorithms widely used today to solve challenging motion planning problems. This paper exposes the common core of these planners and presents a tutorial for their implementation. A set of ideas extracted from algorithms existing in the literature is presented. In addition, lower level implementation details that are often skipped in papers due to space limitations are discussed. The purpose of the paper is to improve our understanding of single-query sampling-based motion planners and motivate our community to explore avenues of research that lead to significant improvements of such algorithms.
Article
Full-text available
We study path planning on grids with blocked and unblocked cells. Any-angle path-planning algorithms find short paths fast because they propagate information along grid edges without constraining the resulting paths to grid edges. Incremental path-planning algorithms solve a series of similar path-planning problems faster than repeated single-shot searches because they reuse information from the previous search to speed up the next one. In this paper, we combine these ideas by making the any-angle path-planning algorithm Basic Theta* incremental. This is non-trivial because Basic Theta* does not fit the standard assumption that the parent of a vertex in the search tree must also be its neighbor. We present Incremental Phi* and show experimentally that it can speed up Basic Theta* by about one order of magnitude for path planning with the freespace assumption.
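Any-angle planners such as Basic Theta* hinge on a grid line-of-sight test that decides whether a vertex can connect directly to an ancestor instead of a grid neighbor. The sketch below is a common Bresenham-style approximation of such a test, not the exact routine from the paper (which reasons about grid corners); here `grid[r][c] == 1` marks a blocked cell.

```python
def line_of_sight(grid, a, b):
    """Approximate grid line-of-sight between cells `a` and `b`: walk the
    Bresenham line and fail on the first blocked cell."""
    (r0, c0), (r1, c1) = a, b
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    sr = 1 if r1 > r0 else -1
    sc = 1 if c1 > c0 else -1
    err = dr - dc
    r, c = r0, c0
    while True:
        if grid[r][c]:
            return False
        if (r, c) == (r1, c1):
            return True
        e2 = 2 * err
        if e2 > -dc:                 # step along the row axis
            err -= dc
            r += sr
        if e2 < dr:                  # step along the column axis
            err += dr
            c += sc
```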
Article
This article presents an all-digital hardware accelerator for solving partial differential equations using the finite difference method (FDM) with dynamically reconfigurable computing bit precision. The proposed accelerator consists of 21 × 21 bit-serial processing elements (PEs) to compute 2-D grid solutions with massive parallelism. The 21 × 21 bit-serial PEs are connected in a lattice structure, and a PE communicates with four neighboring PEs to update the grid solutions. A PE comprises four key building blocks: a bit-serial adder, a shift register, 4:1 multiplexers, and an accumulator. The proposed hardware accelerator minimizes data movement based on its array architecture that directly maps a 2-D grid of the FDM. Besides, the proposed residue-based bit-serial computation method lowers energy consumption and latency. The checkerboard update method further improves the performance by updating the solutions in two cycles regardless of the grid size. A test chip is fabricated using a 65-nm process, and the 21 × 21 PE array occupies 0.462 mm². The measured energy consumption is 1.59 nJ per iteration at 16 bit, 1 V, and 25.6 MHz.
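The checkerboard update is the part that maps most directly to software. In the sketch below, cells are split by coordinate parity and each color is updated from the other using a 5-point Laplace stencil, so a full grid refresh always takes two phases regardless of grid size, which mirrors the abstract's two-cycle claim (in hardware, all same-color PEs update concurrently). The Laplace stencil and the boundary handling are illustrative assumptions, not the chip's exact update rule.

```python
def checkerboard_sweep(u, fixed):
    """Red-black (checkerboard) relaxation for the 2-D Laplace equation:
    in each of the two phases, every interior cell of one color is updated
    from its four neighbors of the other color. `fixed[r][c]` marks
    boundary-condition cells that keep their value."""
    rows, cols = len(u), len(u[0])
    for color in (0, 1):                           # two phases per iteration
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if (r + c) % 2 != color or fixed[r][c]:
                    continue
                u[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c] +
                                  u[r][c - 1] + u[r][c + 1])
    return u
```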
Article
A* search is a fundamental topic in artificial intelligence. Recently, the general purpose computation on graphics processing units (GPGPU) has been widely used to accelerate numerous computational tasks. In this paper, we propose the first parallel variant of the A* search algorithm such that the search process of an agent can be accelerated by a single GPU processor in a massively parallel fashion. Our experiments have demonstrated that the GPU-accelerated A* search is efficient in solving multiple real-world search tasks, including combinatorial optimization problems, pathfinding and game solving. Compared to the traditional sequential CPU-based A* implementation, our GPU-based A* algorithm can achieve a significant speedup by up to 45× on large-scale search problems.
Article
In this work, we present a novel 8T static random access memory (SRAM)-based compute-in-memory (CIM) macro for processing neural networks with high energy efficiency. The proposed 8T bitcell is free from disturb issues thanks to the decoupled read channels formed by adding two extra transistors to the standard 6T bitcell. A 128 × 128 8T SRAM array offers massively parallel binary multiply-and-accumulate (MAC) operations with 64 binary inputs (0/1) and 64 × 128 binary weights (+1/−1). After the parallel MAC operations, 128 column-based neurons generate 128 outputs of 1–5 bit in parallel. The proposed column-based neuron comprises 64 bitcells for the dot-product, 32 bitcells for the analog-to-digital converter (ADC), and 32 bitcells for offset calibration. The column ADC with 32 replica SRAM bitcells converts the analog MAC results (i.e., a differential read bitline (RBL/RBLb) voltage) to the 1–5 bit output code by sweeping its reference levels in 1–31 cycles (i.e., 2^N − 1 cycles for an N-bit ADC). The measured linearity results [differential nonlinearity (DNL) and integral nonlinearity (INL)] are +0.314/−0.256 least significant bit (LSB) and +0.27/−0.116 LSB, respectively, after offset calibration. The simulated image classification results are 96.37% for the Mixed National Institute of Standards and Technology database (MNIST) using a multi-layer perceptron (MLP) with two hidden layers and 87.1%/82.66% for CIFAR-10 using VGG-like/ResNet-18 convolutional neural networks (CNNs), demonstrating slight accuracy degradations (0.67%–1.34%) compared with the software baseline. A test chip with a 16K 8T SRAM bitcell array is fabricated using a 65-nm process. The measured energy efficiency is 490–15.8 TOPS/W for 1–5 bit ADC resolution using 0.45-/0.8-V core supplies.
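A behavioral model helps unpack the column arithmetic: a dot-product between 64 binary inputs and 64 ±1 weights lands in [−64, +64], and an N-bit ADC resolves it by sweeping 2^N − 1 reference levels. The Python below models only that arithmetic, not the analog bitline behavior; the uniform reference spacing is an assumption for illustration.

```python
import random

def column_mac(inputs, weights):
    """One column's binary MAC: 64 inputs in {0, 1} against 64 weights in
    {+1, -1}. On the chip this sum appears as a differential RBL/RBLb
    voltage; here it is just an integer dot-product in [-64, +64]."""
    return sum(x * w for x, w in zip(inputs, weights))

def column_adc(analog_value, n_bits):
    """Replica-bitcell column ADC model: sweep 2**n_bits - 1 reference
    levels across the MAC range and count how many the value exceeds,
    yielding an output code in 0 .. 2**n_bits - 1."""
    levels = 2 ** n_bits - 1
    refs = [-64 + (i + 1) * 128 / (levels + 1) for i in range(levels)]
    return sum(analog_value > r for r in refs)

inputs = [random.randint(0, 1) for _ in range(64)]
weights = [random.choice((-1, 1)) for _ in range(64)]
print(column_adc(column_mac(inputs, weights), n_bits=5))   # 5-bit code, 31 sweep cycles
```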
Article
No existing algorithms can find exact solutions to the combinatorial optimization problems (COPs) classified as non-deterministic polynomial-time (NP)-hard problems. Alternatively, the Ising computer, based on the Ising model and an annealing process, has recently drawn significant attention. Ising computers can find approximate solutions to NP-hard COPs by observing the convergence of dynamic spin states. However, they have encountered challenges in mapping optimization problems to inflexible Ising computers with fixed spin interconnects. In this article, we propose a scalable CMOS Ising computer with sparse and reconfigurable spin interconnects for arbitrary mapping of spin networks with minimal overhead. Without a mapping algorithm, the proposed Ising computer provides a method for directly mapping COPs to the reconfigurable hardware. A 65-nm CMOS Ising test chip with 252 spins is fabricated and used for solving COPs, including max-cut problems.
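As a software analogue of the annealing process such chips implement in hardware, the sketch below runs Metropolis simulated annealing on the Ising energy E = −Σ J_ij s_i s_j − Σ h_i s_i; the sparse coupling dictionary `J` loosely plays the role of the chip's sparse, reconfigurable spin interconnects. The cooling schedule and parameters are illustrative assumptions.

```python
import math, random

def anneal(h, J, steps=10000, t_start=5.0, t_end=0.05):
    """Metropolis annealing on an Ising system: `h` is a list of local
    fields, `J` maps spin-index pairs (i, j) to coupling strengths (only
    the sparse, connected pairs need entries). Returns a low-energy spin
    configuration in {-1, +1}**n."""
    n = len(h)
    s = [random.choice((-1, 1)) for _ in range(n)]
    # Adjacency list so flipping one spin touches only its neighbors.
    nbrs = {i: [] for i in range(n)}
    for (i, j), w in J.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        i = random.randrange(n)
        # Energy change of flipping spin i: dE = 2 s_i (h_i + sum_j J_ij s_j).
        local = h[i] + sum(w * s[j] for j, w in nbrs[i])
        d_e = 2 * s[i] * local
        if d_e <= 0 or random.random() < math.exp(-d_e / t):
            s[i] = -s[i]
    return s
```

Mapping a max-cut instance amounts to setting h to zero and J to the (negated) edge weights, which is why reconfigurable interconnects matter: the problem graph must fit onto the physical coupling network.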
Article
This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1–16 bit input and weight precisions based on a bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from existing analog and digital implementations. First, its fully digital circuit implementation is free from the process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. So far, most analog macros have been used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1–16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles. Based on the benefits of digital CIM, reconfigurability, and the bit-serial computing architecture, Colonnade achieves both high performance and energy efficiency (i.e., the respective benefits of prior analog and digital accelerators) for processing neural networks. A test chip with 128 × 128 SRAM-based bitcells for digital bit-serial computing is implemented using 65-nm technology and tested with 1–16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
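The LSB-first bit-serial MAC is simple to model functionally: each cycle multiplies one bit-slice of the inputs by the weights and accumulates the partial sum with the matching power-of-two shift. The sketch below assumes unsigned operands for brevity; the macro's signed arithmetic and weight-stationary dataflow are not modeled.

```python
def bit_serial_dot(inputs, weights, n_bits):
    """LSB-first bit-serial dot-product: serialize the inputs bit by bit,
    compute a 1-bit-weighted partial MAC each cycle, and accumulate it
    shifted by the bit position. Equals the ordinary dot-product."""
    acc = 0
    for b in range(n_bits):                         # one cycle per input bit
        bit_slice = [(x >> b) & 1 for x in inputs]  # b-th bit of every input
        partial = sum(xb * w for xb, w in zip(bit_slice, weights))
        acc += partial << b                         # weight the partial by 2**b
    return acc

assert bit_serial_dot([3, 5], [2, 7], n_bits=4) == 3 * 2 + 5 * 7
```

Precision reconfiguration then falls out naturally: running the loop for more or fewer cycles changes the input precision without touching the stored weights, which is the latency-for-area trade the abstract describes.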
Article
A mixed-signal, time-based 65-nm application-specific integrated circuit is developed for solving shortest-path problems. Digital circuits are collocated with the memory as intra-memory computing. The core follows principles similar to wave routing and, additionally, incorporates a gradient on the periphery of the core to implement the A* algorithm's predicted-distance heuristic. A leading pulse is propagated from start nodes and is asynchronously latched in neighboring vertex cells and pushed to its four neighbors. Applications include collision avoidance for self-driving cars, shortest-path planning, and scientific computing, and are shown to be scalable across many cores. The chip achieves 559 million traversed edges per second at 105× improved energy efficiency compared with existing platforms such as field-programmable gate arrays and CPUs. The processor operates nominally at 1.79 ns per node with a peak power consumption of 26.4 mW.
Article
Autonomous microrobots have been utilized in a wide range of applications, for which energy-efficient, real-time path planning is essential. This work presents a path-planning processor for 2-D/3-D autonomous navigation. Energy and latency are minimized through algorithm-architecture optimization. The processor utilizes the rapidly exploring random tree (RRT) algorithm to ensure efficient planning on maps with higher dimensions and higher resolution. Dual-tree planning, branch extension, and parallel expansion are adopted to reduce both computational complexity and memory requirements. A prune-and-reuse strategy is also adopted to respond quickly to dynamic scenarios. An array of processing engines (PEs) is deployed to enable parallel expansion, and the number of PEs is minimized through latency analysis. A low-complexity implementation of the PE is proposed while maintaining high performance. Fabricated in a 40-nm CMOS technology, the chip integrates 2M logic gates in an area of 3.65 mm². The processor supports path-planning tasks for both 2-D and 3-D maps, with latencies of less than 1 and 10 ms, respectively. For a 2-D map with 100 × 100 grids, the proposed processor dissipates 1.5 μJ/task at a clock frequency of 200 MHz from a 0.9-V supply. Compared with state-of-the-art designs, the proposed path-planning processor achieves 1467× shorter processing latency and 2133× lower energy dissipation, while supporting larger maps.
Article
A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing a vector-matrix multiplication in the memory array. The proposed eDRAM-based compute-in-memory (CIM) architecture addresses the well-known von Neumann bottleneck in the traditional computer architecture and improves both latency and energy in processing neural networks. The proposed ternary eDRAM cell takes a smaller area than prior SRAM-based bitcells using 6–12 transistors. Nevertheless, the compact eDRAM cell stores a ternary state (−1, 0, or +1), while the SRAM bitcells can only store a binary state. We also present a method to mitigate the compute accuracy degradation due to device mismatches and variations. Besides, we extend the eDRAM cell retention time to 200 μs by adding a custom metal capacitor at the storage node. With the improved retention time, the overall energy consumption of the eDRAM macro, including a regular refresh operation, is lower than that of most prior SRAM-based CIM macros. A 128 × 128 ternary eDRAM macro computes a vector-matrix multiplication between a vector with 64 binary inputs and a matrix with 64 × 128 ternary weights. Hence, 128 outputs are generated in parallel. Note that both weight and input bit-precisions are programmable for supporting a wide range of edge computing applications with different performance requirements. The bit-precisions are readily tunable by assigning a variable number of eDRAM cells per weight or adding multiple pulses to the input. An embedded column ADC based on replica cells sweeps the reference level for 2^N − 1 cycles and converts the analog accumulated bitline voltage to a 1–5 bit digital output. A critical bitline accumulate operation is simulated (Monte Carlo, 3K runs). It shows a standard deviation of 2.84%, which could degrade the classification accuracy of the MNIST dataset by 0.6% and the CIFAR-10 dataset by 1.3% versus a baseline with no variation. The simulated energy is 1.81 fJ/operation, and the energy efficiency is 552.5–17.8 TOPS/W (for 1–5 bit ADC) at 200 MHz using 65-nm technology.
Article
Time-series classification (TSC) is a challenging problem in machine learning, and significant efforts have been made to improve its speed and computation efficiency. Among various approaches, the dynamic time warping (DTW) algorithm is one of the most prevalent methods for TSC due to its succinctness and generality. To improve the throughput of the operation, this work presents a mixed-signal DTW accelerator utilizing mixed-signal time-domain (TD) computing, where signals are encoded and processed using time pulses. A pipelined operation is enabled by a specially designed time flip-flop (TFF) circuit, leading to dramatic improvements in the performance and scalability of the operation. A 65-nm CMOS test chip was implemented and measured. The results show more than 9× improvement in throughput compared with prior work on TSC. As most existing TD designs suffer from the lack of TD storage elements, this work utilizes sequential circuit elements in TD computing, extending the capability of time-based circuits.
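For reference, the recurrence the accelerator speeds up is the standard DTW dynamic program below. Each cell depends only on its left, top, and top-left neighbors, so cells on the same anti-diagonal are independent, which is what makes a pipelined, wavefront-style hardware evaluation possible. This is the textbook algorithm, not the accelerator's TD circuit.

```python
def dtw(a, b):
    """Dynamic-time-warping distance between sequences `a` and `b` with an
    absolute-difference local cost. d[i][j] is the best warping cost
    aligning a[:i] with b[:j]."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],        # insertion
                                 d[i][j - 1],        # deletion
                                 d[i - 1][j - 1])    # match
    return d[n][m]
```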
Article
This paper presents HitGraph, an FPGA framework to accelerate graph processing based on the edge-centric paradigm. HitGraph takes in an edge-centric graph algorithm and hardware resource constraints, determines design parameters, and then generates a Register Transfer Level (RTL) FPGA design. This makes accelerator design for various graph analytics transparent and user-friendly by masking internal details of the accelerator design process. HitGraph enables increased data reuse and parallelism through novel algorithmic optimizations, including (1) an optimized data layout that reduces non-sequential external memory accesses, (2) an efficient update merging and filtering scheme to reduce the data communication between the FPGA and external memory, and (3) a partition skipping scheme to reduce redundant edge traversals for non-stationary graph algorithms. Based on our design methodology, we accelerate Sparse Matrix Vector Multiplication (SpMV), PageRank (PR), Single Source Shortest Path (SSSP), and Weakly Connected Component (WCC). Experimental results show that HitGraph sustains a high throughput of 2076 Million Traversed Edges Per Second (MTEPS) for SpMV, 2225 MTEPS for PR, 2916 MTEPS for SSSP, and 3493 MTEPS for WCC. Compared with highly-optimized multi-core implementations, HitGraph achieves up to 37.9× speedup. Compared with state-of-the-art FPGA frameworks, HitGraph achieves up to 50.7× throughput improvement.
Article
Time-domain computing (TC) has drawn significant attention recently due to its highly efficient computation for applications such as image processing and neural network computing. This paper presents novel time-domain circuit techniques, including: 1) double-encoding strategy; 2) bit-scalable design that accelerates the performance compared with previous linear coding; and 3) shared time generator (TG) with variation-aware design technique which significantly improves the error tolerance of TC. A feature-extraction and vector-quantization processor accelerated by TC has been developed for real-time image recognition. A 55-nm prototype chip shows 72-fps/core (at 1.33 GHz) operation with up to 42% area and power saving from TC compared to the conventional digital implementation.
Conference Paper
Micro robots with artificial intelligence (AI) are being investigated for many applications, such as unmanned delivery services. The robots have enhanced controllers that realize AI functions, such as perception (information extraction) and cognition (decision making). Historically, controllers have been based on general-purpose CPUs, and only recently, a few perception SoCs have been reported. SoCs with cognition capability have not been reported thus far, even though cognition is a key AI function in micro robots for decision making, especially in autonomous drones. Path planning and obstacle avoidance require more than 10,000 searches within 50 ms for a fast response, but a software implementation running on a Cortex-M3 takes ~5 s to make decisions. Micro robots require 10× lower power and 100× faster decision making than conventional robots because of their fast movement in the environment, small form factor, and limited battery capacity. Therefore, an ultra-low-power, high-performance artificial-intelligence processor (AIP) is necessary for micro robots to make fast and smart maneuvers in dynamic environments filled with obstacles.
Conference Paper
We propose a novel computing approach, dubbed "Race Logic", in which information, instead of being represented as logic levels as in conventional logic, is represented as a timing delay. Under this new information representation, computations are performed by observing the relative propagation times of signals injected into the circuit (i.e., the outcome of races). Race Logic is especially suited for solving problems related to the traversal of directed acyclic graphs commonly used in dynamic programming algorithms. The main advantage of this novel approach is that information processing (min-max and addition operations) can be expressed very efficiently through the manipulation of the natural delay chaining inherent to digital designs, which results in superior latency, throughput, and energy efficiency. To verify this hypothesis, we designed several Race Logic implementations of a DNA global sequence alignment engine and compared them to a state-of-the-art conventional systolic array implementation. Our synthesized design shows that synchronous Race Logic is up to 4× faster when both approaches are mapped to a 0.5-μm CMOS standard cell technology. At the same time, the throughput for sequence matching per circuit area is about 3× higher at 5× lower power density for 20-symbol DNA sequences.
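The Race Logic primitives translate directly into a software model: with values encoded as edge arrival times, an OR gate computes MIN (the first edge wins), an AND gate computes MAX (the last edge wins), and a delay element adds a constant. The sketch below uses only these primitives to find the minimum-cost path through a layered DAG, which is the class of dynamic-programming traversal the abstract targets; the layered-matrix encoding is an illustrative assumption.

```python
def delay(t, d):        # a delay element adds a constant to an arrival time
    return t + d

first = min             # OR gate on edges: first arrival wins (MIN)
last = max              # AND gate on edges: last arrival wins (MAX)

def race_min_path(layers):
    """Arrival-time evaluation of a layered DAG. layers[k][i][j] is the edge
    delay from node i of layer k to node j of layer k + 1 (None if absent;
    every node is assumed reachable). Injecting edges at time 0 and merging
    with first-arrival OR gates yields the min-cost path purely through
    delay chaining, i.e., the Race Logic computation."""
    times = [0.0] * len(layers[0])                  # edges injected at all sources
    for edges in layers:
        width = len(edges[0])
        times = [first(delay(times[i], edges[i][j])
                       for i in range(len(times)) if edges[i][j] is not None)
                 for j in range(width)]
    return first(times)                             # earliest edge at the sinks
```

Swapping `first` for `last` at the merge points computes longest paths instead, which is the min-max pair that global sequence alignment recurrences are built from.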
Article
During the last decade, sampling-based path planning algorithms, such as Probabilistic RoadMaps (PRM) and Rapidly-exploring Random Trees (RRT), have been shown to work well in practice and possess theoretical guarantees such as probabilistic completeness. However, little effort has been devoted to the formal analysis of the quality of the solution returned by such algorithms, e.g., as a function of the number of samples. The purpose of this paper is to fill this gap, by rigorously analyzing the asymptotic behavior of the cost of the solution returned by stochastic sampling-based algorithms as the number of samples increases. A number of negative results are provided, characterizing existing algorithms, e.g., showing that, under mild technical conditions, the cost of the solution returned by broadly used sampling-based algorithms converges almost surely to a non-optimal value. The main contribution of the paper is the introduction of new algorithms, namely, PRM* and RRT*, which are provably asymptotically optimal, i.e., such that the cost of the returned solution converges almost surely to the optimum. Moreover, it is shown that the computational complexity of the new algorithms is within a constant factor of that of their probabilistically complete (but not asymptotically optimal) counterparts. The analysis in this paper hinges on novel connections between stochastic sampling-based path planning algorithms and the theory of random geometric graphs.
Article
We consider a graph with n vertices, all pairs of which are connected by an edge; each edge is of given positive length. The following two basic problems are solved. Problem 1: construct the tree of minimal total length between the n vertices. (A tree is a graph with one and only one path between any two vertices.) Problem 2: find the path of minimal total length between two given vertices.
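The paper's Problem 2 is what is now universally known as Dijkstra's algorithm. Below is a standard modern rendering with a binary heap (a data structure the original paper predates); `adj` is an assumed adjacency-list representation mapping each vertex to (neighbor, length) pairs with positive lengths.

```python
import heapq

def dijkstra(adj, source):
    """Minimal total length from `source` to every reachable vertex.
    `adj[u]` yields (v, length) pairs; lengths must be positive."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry; skip
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```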
Conference Paper
The task of planning trajectories for a mobile robot has received considerable attention in the research literature. Most of the work assumes the robot has a complete and accurate model of its environment before it begins to move; less attention has been paid to the problem of partially known environments. This situation occurs for an exploratory robot or one that must move to a goal location without the benefit of a floorplan or terrain map. Existing approaches plan an initial path based on known information and then modify the plan locally or replan the entire path as the robot discovers obstacles with its sensors, sacrificing optimality or computational efficiency respectively. This paper introduces a new algorithm, D*, capable of planning paths in unknown, partially known, and changing environments in an efficient, optimal, and complete manner.
Conference Paper
Incremental heuristic search methods use heuristics to focus their search and reuse information from previous searches to find solutions to series of similar search tasks much faster than is possible by solving each search task from scratch. In this paper, we apply Lifelong Planning A* to robot navigation in unknown terrain, including goal-directed navigation in unknown terrain and mapping of unknown terrain. The resulting D* Lite algorithm is easy to understand and analyze. It implements the same behavior as Stentz' Focussed Dynamic A* but is algorithmically different. We prove properties about D* Lite and demonstrate experimentally the advantages of combining incremental and heuristic search for the applications studied. We believe that these results provide a strong foundation for further research on fast replanning methods in artificial intelligence and robotics.
Conference Paper
We present a graph-based planning and replanning algorithm able to produce bounded suboptimal solutions in an anytime fashion. Our algorithm tunes the quality of its solution based on available search time, at every step reusing previous search efforts. When updated information regarding the underlying graph is received, the algorithm incrementally repairs its previous solution. The result is an approach that combines the benefits of anytime and incremental planners to provide efficient solutions to complex, dynamic search problems. We present theoretical analysis of the algorithm, experimental results on a simulated robot kinematic arm, and two current applications in dynamic path planning for outdoor mobile robots.
Article
Although the problem of determining the minimum cost path through a graph arises naturally in a number of interesting applications, there has been no underlying theory to guide the development of efficient search procedures. Moreover, there is no adequate conceptual framework within which the various ad hoc search strategies proposed to date can be compared. This paper describes how heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching and demonstrates an optimality property of a class of search strategies.
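The class of search strategies this paper formalizes is best known today as A*. A compact rendering of the resulting algorithm is below; `h` is an assumed admissible heuristic callable (one that never overestimates the remaining cost), which is the condition under which the paper's optimality property holds.

```python
import heapq

def a_star(adj, start, goal, h):
    """A* search: expand the node minimizing f(n) = g(n) + h(n), where g is
    the cost from `start` and `h` an admissible heuristic. `adj[u]` yields
    (v, edge_cost) pairs. Returns the optimal path as a node list."""
    g = {start: 0.0}
    parent = {start: None}
    heap = [(h(start), start)]
    while heap:
        f, u = heapq.heappop(heap)
        if u == goal:                      # reconstruct the path on arrival
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v, w in adj[u]:
            ng = g[u] + w
            if ng < g.get(v, float("inf")):   # better route found; (re)open v
                g[v] = ng
                parent[v] = u
                heapq.heappush(heap, (ng + h(v), v))
    return None                            # goal unreachable
```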
Article
We introduce the concept of a Rapidly-exploring Random Tree (RRT) as a randomized data structure that is designed for a broad class of path planning problems. While they share many of the beneficial properties of existing randomized planning techniques, RRTs are specifically designed to handle nonholonomic constraints (including dynamics) and high degrees of freedom. An RRT is iteratively expanded by applying control inputs that drive the system slightly toward randomly-selected points, as opposed to requiring point-to-point convergence, as in the probabilistic roadmap approach. Several desirable properties and a basic implementation of RRTs are discussed. To date, we have successfully applied RRTs to holonomic, nonholonomic, and kinodynamic planning problems of up to twelve degrees of freedom.
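The basic RRT expansion step described here is short enough to sketch directly. The version below is 2-D and holonomic for brevity (a fixed step toward each sample), whereas the paper's formulation applies control inputs and extends to nonholonomic and kinodynamic systems; `is_free`, the bounds format, and the goal-bias parameter are assumptions.

```python
import math, random

def rrt(start, goal, is_free, bounds, step=0.5, iters=5000, goal_bias=0.05):
    """Basic RRT: sample a point, find the nearest tree node, and extend a
    fixed step toward the sample. `is_free(p)` is an assumed collision
    checker; `bounds` is ((xmin, xmax), (ymin, ymax))."""
    (xmin, xmax), (ymin, ymax) = bounds
    tree = {start: None}                    # node -> parent
    for _ in range(iters):
        # Occasionally steer toward the goal (a common, assumed refinement).
        target = goal if random.random() < goal_bias else \
            (random.uniform(xmin, xmax), random.uniform(ymin, ymax))
        near = min(tree, key=lambda n: math.dist(n, target))
        d = math.dist(near, target) or 1e-9
        new = (near[0] + step * (target[0] - near[0]) / d,
               near[1] + step * (target[1] - near[1]) / d)
        if not is_free(new):
            continue
        tree[new] = near
        if math.dist(new, goal) < step:     # close enough: trace the path back
            path, n = [goal], new
            while n is not None:
                path.append(n)
                n = tree[n]
            return path[::-1]
    return None
```

The nearest-neighbor-then-extend loop is what gives the tree its Voronoi bias toward unexplored space, the property the abstract credits for rapid exploration.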
Article
Finding the lowest-cost path through a graph is central to many problems, including route planning for a mobile robot. If arc costs change during the traverse, then the remainder of the path may need to be replanned. This is the case for a sensor-equipped mobile robot with imperfect information about its environment. As the robot acquires additional information via its sensors, it can revise its plan to reduce the total cost of the traverse. If the prior information is grossly incomplete, the robot may discover useful information in every piece of sensor data. During replanning, the robot must either wait for the new path to be computed or move in the wrong direction; therefore, rapid replanning is essential. The D* algorithm (Dynamic A*) plans optimal traverses in real-time by incrementally repairing paths to the robot's state as new information is discovered. This paper describes an extension to D* that focusses the repairs to significantly reduce the total time required for the initial path calculation and subsequent replanning operations.