Particle filter (PF) acceleration on a GPU.

Source publication
Article
A particle filter (PF) has been introduced for effective position estimation of moving targets in non-Gaussian and nonlinear systems. The time difference of arrival (TDOA) method using an acoustic sensor array has commonly been used to estimate the location of a concealed moving target, especially underwater. In this paper, we propose a GPU...
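The abstract pairs a PF with TDOA measurements. A minimal bootstrap-PF step (propagate, weight, estimate, resample) can be sketched on the CPU as follows. It assumes a hypothetical 4-sensor square array, a sound speed of 1500 m/s, a random-walk motion model and Gaussian TDOA noise. None of these settings come from the source paper. The per-particle loops are the parts a GPU version would parallelize.

```python
import math
import random

C = 1500.0  # assumed speed of sound in water (m/s)
# Hypothetical 4-sensor array (metres); sensor 0 is the TDOA reference.
SENSORS = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]


def tdoa(pos):
    """Time differences of arrival at sensors 1..3 relative to sensor 0."""
    d0 = math.dist(pos, SENSORS[0])
    return [(math.dist(pos, s) - d0) / C for s in SENSORS[1:]]


def likelihood(z, pos, sigma):
    """Gaussian likelihood of the measured TDOAs z for a candidate position."""
    sq = sum((zi - pi) ** 2 for zi, pi in zip(z, tdoa(pos)))
    return math.exp(-0.5 * sq / sigma**2)


def pf_step(particles, z, rng, q=1.0, sigma=1e-3):
    """One bootstrap-PF step: propagate, weight, estimate, resample."""
    # Propagate with an assumed random-walk motion model (std q metres).
    moved = [(x + rng.gauss(0.0, q), y + rng.gauss(0.0, q)) for x, y in particles]
    # Weight each particle by its TDOA likelihood, then normalize.
    w = [likelihood(z, p, sigma) for p in moved]
    total = sum(w)
    if total == 0.0:  # every weight underflowed: fall back to uniform weights
        w, total = [1.0] * len(moved), float(len(moved))
    w = [wi / total for wi in w]
    # Weighted-mean position estimate.
    est = (sum(wi * p[0] for wi, p in zip(w, moved)),
           sum(wi * p[1] for wi, p in zip(w, moved)))
    # Multinomial resampling.
    return rng.choices(moved, weights=w, k=len(moved)), est
```

For example, start from 2000 particles spread uniformly over the 100 m × 100 m area. Feed in noise-free TDOAs `tdoa((40.0, 60.0))` for a few steps, and the estimate should move close to that point.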

Similar publications

Article
In this paper, we propose fast Fourier transform networks for object tracking, called FFTNet. FFTNet is a correlation filter (CF)-based tracker that integrates the two main components of CF, i.e., auto-correlation and cross-correlation between the features of two images. Thus, FFTNet takes full advantage of CF: (1) Auto correlation and cross correlation...
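Correlation-filter trackers commonly exploit the identity that circular cross-correlation becomes an element-wise product in the Fourier domain. The 1-D sketch below checks that identity with a small radix-2 FFT. It is not FFTNet's architecture, which works on 2-D image features. It assumes real-valued inputs whose length is a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unscaled if inverse."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlation(a, b):
    """Circular r[k] = sum_n a[n] * b[(n + k) % N] via the FFT (a, b real-valued)."""
    n = len(a)
    # conj(A) * B in the frequency domain equals correlation in the signal domain.
    spectrum = [fa.conjugate() * fb for fa, fb in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(spectrum, inverse=True)]
```

The index of the correlation peak gives the shift between the two inputs, which is how a CF tracker localizes the target.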

Citations

... Thus, using high-precision data types such as fp64 or fp32 would waste both off-chip and on-chip resources, including memory bandwidth, registers and ALUs. Harnessing mixed-precision data (Figure 1) creates opportunities to deploy real-time tasks on resource-limited devices [4,5]. ...
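The storage side of that trade-off can be seen with Python's `struct` module, whose `'e'` format is IEEE 754 binary16. The sketch below packs the same values as fp32 and fp16 and reads the fp16 copies back. The sample values are arbitrary. Halving the bytes costs about three decimal digits of precision. Values beyond ±65504 cannot be stored in fp16 at all (`struct` raises `OverflowError`).

```python
import struct


def pack(values, fmt):
    """Pack floats little-endian; 'e' = IEEE 754 binary16, 'f' = binary32."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


def fp16_roundtrip(values):
    """Store values as fp16 and read them back, exposing the rounding error."""
    return list(struct.unpack(f"<{len(values)}e", pack(values, "e")))


values = [0.1, 1.0 / 3.0, 1000.1, 65504.0]  # 65504 is the largest finite fp16
print(len(pack(values, "f")), len(pack(values, "e")))  # 16 8
for v, r in zip(values, fp16_roundtrip(values)):
    print(f"{v!r:>20} -> {r!r}")
```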
Article
GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. In particular, matrix multiplication and convolution are two principal operations that account for a large proportion of the computation in modern data analysis and deep neural networks. These performance-critical operations are often offloaded to the GPU to obtain substantial improvements in end-to-end latency. In addition, the diverse workload characteristics and complicated processing phases of big data demand a customizable yet performant operator library. To this end, GPU vendors, including NVIDIA and AMD, have proposed templated and composable GPU operator libraries to perform specific computations on certain types of low-precision data elements. We formalize a set of benchmarks via CUTLASS, NVIDIA’s templated library that provides high-performance, hierarchically designed kernels. The benchmarking results show that, with the necessary fine-tuning, hardware-level ASICs such as tensor cores can dramatically boost the performance of specific operations, such as GEMM, offloaded to modern GPUs.
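"Hierarchically designed kernels" refers to splitting a GEMM into tiles that are reused from fast memory. The plain-Python sketch below shows only that blocking idea for one tile level. It does not use CUTLASS's API. A GPU kernel would repeat the same decomposition at the thread-block, warp and thread levels, and the `tile` parameter is an illustrative assumption.

```python
def matmul_tiled(a, b, tile=2):
    """C = A @ B computed tile by tile (plain Python lists of numbers)."""
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions differ")
    m = len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # output tile rows
        for j0 in range(0, m, tile):      # output tile columns
            for k0 in range(0, k, tile):  # step along K one tile at a time
                # In a GPU kernel, the A and B tiles would be staged in fast
                # on-chip memory here and reused for every (i, j) in the tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds let the tiling handle matrix sizes that are not multiples of the tile size.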
...
... | | | Missile Application
[11], [12] | X | - | X
[13], [14] | X | - | ○
[15], [18], [19], [21] | ○ | Weight computation | X
[20], [21] | ○ | Resampling | X
[16] | ○ | rendering / normalized | X
[17], [18], [20] | ○ | Likelihood calculation | X
Ours | ○ | Model propagation | ○
To achieve high-speed target tracking, the PFs are accelerated with Compute Unified Device Architecture (CUDA) on a GPU. If the entire PF algorithm is converted to CUDA, every part is converted regardless of its computation time. ...
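The point of the table is to accelerate the step that actually dominates the run time rather than converting everything. That choice starts with a per-stage profile. The sketch below times each stage of a toy 1-D PF with `time.perf_counter`. The motion model, measurement and particle counts are placeholders, and the shares it reports say nothing about the ballistic model in the source.

```python
import math
import random
import time


def profile_pf_stages(n_particles=5000, n_steps=20, seed=0):
    """Return each PF stage's share of total wall-clock time (toy 1-D model)."""
    rng = random.Random(seed)
    timings = {"model propagation": 0.0, "weight computation": 0.0, "resampling": 0.0}
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        z = 0.0  # placeholder measurement
        t0 = time.perf_counter()
        particles = [p + rng.gauss(0.0, 0.1) for p in particles]  # placeholder model
        t1 = time.perf_counter()
        weights = [math.exp(-0.5 * (z - p) ** 2) for p in particles]
        t2 = time.perf_counter()
        particles = rng.choices(particles, weights=weights, k=n_particles)
        t3 = time.perf_counter()
        timings["model propagation"] += t1 - t0
        timings["weight computation"] += t2 - t1
        timings["resampling"] += t3 - t2
    total = sum(timings.values())
    return {stage: t / total for stage, t in timings.items()}
```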
... Acceleration studies were conducted using a GPU to perform the PF algorithm in real time. In [15] and [19], PFs with parallelized weight computation are proposed. In [15], a GPU was used to improve PF estimation for target tracking rather than to accelerate it, and in [19], a GPU was used to accelerate IoT applications. The tracking algorithm was accelerated by approximately 55% compared to the CPU-based algorithm. ...
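Weight computation parallelizes well because each particle's likelihood depends only on that particle. Normalization then needs only reductions (a maximum and a sum). The sketch below assumes a scalar state and a Gaussian likelihood. It works in log space with the max-subtraction trick so that a sharp likelihood does not underflow every weight to zero.

```python
import math


def log_weight(particle, z, sigma):
    """Gaussian log-likelihood of measurement z for one particle (scalar state)."""
    return -0.5 * ((z - particle) / sigma) ** 2


def normalized_weights(particles, z, sigma):
    # Per-particle work with no cross-particle dependency: on a GPU this maps
    # to one thread per particle.
    log_w = [log_weight(p, z, sigma) for p in particles]
    # Reduction 1: the maximum log-weight, subtracted to avoid underflow.
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]  # again per-particle
    # Reduction 2: the normalizing sum.
    total = sum(unnorm)
    return [u / total for u in unnorm]
```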
Article
This study addresses the problem of real-time tracking of high-speed ballistic targets. Particle filters (PFs) can be used to overcome the nonlinearity of the motion and measurement models of ballistic targets. However, applying PFs to real-time systems is challenging because they generally require significant computation time. Most existing methods that accelerate PFs with a graphics processing unit (GPU) for target tracking have therefore accelerated the weight computation and resampling steps. However, the computation time of each step varies from application to application. In this work, we confirm that the model propagation step takes a large share of the computation time and propose an accelerated PF that parallelizes the corresponding logic. The real-time performance of the proposed method was tested and analyzed on an embedded system. Compared with a conventional PF on the central processing unit (CPU), the proposed method reduces computation time by at least a factor of 10, improving real-time performance.
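Model propagation parallelizes because every particle is advanced independently. The sketch below uses an assumed 2-D point-mass model with gravity and quadratic drag, integrated with one explicit Euler step, plus Gaussian velocity noise. The source paper's ballistic model is not given in the excerpt. The drag coefficient `beta` and noise level `q` are hypothetical parameters.

```python
import math
import random

G = 9.81  # gravitational acceleration (m/s^2)


def propagate(state, dt, beta, rng, q):
    """One Euler step for a 2-D point mass with gravity and quadratic drag.

    state = (x, z, vx, vz); beta is an assumed drag coefficient (1/m);
    q is the std. dev. of the velocity process noise (m/s).
    """
    x, z, vx, vz = state
    speed = math.hypot(vx, vz)
    ax = -beta * speed * vx
    az = -G - beta * speed * vz
    return (x + vx * dt,
            z + vz * dt,
            vx + ax * dt + rng.gauss(0.0, q),
            vz + az * dt + rng.gauss(0.0, q))


def propagate_all(particles, dt, beta, rng, q=0.0):
    # Each particle is propagated independently, so on a GPU this loop body
    # becomes one thread per particle (each thread would also need its own
    # random-number stream instead of this shared rng).
    return [propagate(s, dt, beta, rng, q) for s in particles]
```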
... However, when the number of particles is very large, the particle filter takes a long time. S. Kim et al. [11] present a graphics processing unit (GPU)-based method that speeds up the particle filter to estimate the changing location of a target, which is well suited to scenes requiring real-time processing. The findings demonstrate that the speed improves and that tracking of the target's rapidly shifting trajectory performs well. ...
Article
Bearing-time records, which accumulate spatial spectrum estimates along the time axis, are often employed in passive sonar information processing. Multi-target jamming is a common difficulty in this approach because of the Rayleigh limit, and neither the conventional beamforming (CBF) nor the minimum variance distortionless response (MVDR) technique handles it well. This work presents a post-processing tracking framework based on visual pattern recognition algorithms to track weak acoustic targets in jamming environments. The framework includes target motion analysis, matched filtering, and principal component analysis-based denoising, and we call it the ’P-Gabor’ algorithm. Simulations and sea-trial experiments show that the proposed method can successfully track a weak target at a signal-to-interference ratio (SIR) of −23 dB and is more effective than the reference methods, especially on real-world sea-trial data. We further demonstrate that the method also tracks stably at a signal-to-noise ratio (SNR) as low as −25 dB.
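A bearing-time record can be built by stacking one CBF spatial spectrum per time frame. The sketch below assumes a narrowband uniform line array with half-wavelength spacing and noise-free plane-wave snapshots. It is a generic conventional beamformer, not the P-Gabor post-processing.

```python
import cmath
import math


def steering(n_elements, theta_deg):
    """Narrowband steering vector of a uniform line array, half-wavelength spacing."""
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * math.pi * m * s) for m in range(n_elements)]


def cbf_spectrum(snapshots, angles_deg):
    """Conventional (delay-and-sum) beamformer output power for each look angle."""
    n_elements = len(snapshots[0])
    spectrum = []
    for theta in angles_deg:
        a = steering(n_elements, theta)
        power = 0.0
        for x in snapshots:
            y = sum(ai.conjugate() * xi for ai, xi in zip(a, x))  # beam output a^H x
            power += abs(y) ** 2
        spectrum.append(power / len(snapshots))
    return spectrum


def bearing_time_record(frames, angles_deg):
    """Stack one CBF spectrum per time frame: rows = time, columns = bearing."""
    return [cbf_spectrum(frame, angles_deg) for frame in frames]
```

Each row's peak tracks the source bearing over time. The Rayleigh-limited main-lobe width of this beamformer is what makes closely spaced targets merge.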
... Based on this, we propose a technique to adaptively change the reception period of a sensor node based on the predicted vehicle condition. A moving object such as a vehicle can be tracked by a sensor node in diverse ways [10], [11]. In microcontroller-based embedded sensor node systems, poor performance can introduce a large amount of noise into the measurements. ...
... We can predict the vehicle velocity, v_k, from the measured vehicle position, p_k, using the variables obtained from (8), (11), (13), and the Kalman filter algorithm. Finally, the variables Q and R of (4)-(5) must be determined. ...
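The source's equations (4)-(5), (8), (11) and (13) are not reproduced in the excerpt. The sketch below is therefore a generic constant-velocity Kalman filter that estimates velocity from position fixes. It assumes a 1-D position, the discrete white-acceleration form of Q with intensity `q`, a scalar position-noise variance `r`, and a broad prior on the initial velocity.

```python
def kalman_cv(positions, dt, q, r):
    """Constant-velocity Kalman filter: estimate (p_k, v_k) from position fixes.

    q: assumed white-acceleration noise intensity; r: position noise variance.
    """
    x = [positions[0], 0.0]              # state [position, velocity]
    P = [[r, 0.0], [0.0, 100.0]]         # assumed broad prior on velocity
    Q = [[q * dt**4 / 4, q * dt**3 / 2],
         [q * dt**3 / 2, q * dt**2]]
    estimates = []
    for z in positions[1:]:
        # Predict with F = [[1, dt], [0, 1]]: x = F x, P = F P F^T + Q.
        x = [x[0] + dt * x[1], x[1]]
        P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + Q[0][0],
              P[0][1] + dt * P[1][1] + Q[0][1]],
             [P[1][0] + dt * P[1][1] + Q[1][0],
              P[1][1] + Q[1][1]]]
        # Update with H = [1, 0] (only position is measured).
        s = P[0][0] + r                       # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
        y = z - x[0]                          # innovation
        x = [x[0] + k0 * y, x[1] + k1 * y]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        estimates.append((x[0], x[1]))
    return estimates
```

Tuning `q` and `r` plays the role of choosing Q and R in the excerpt. A larger `q` lets the velocity estimate follow speed changes faster at the cost of more noise.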
... In addition, effective results were achieved for large-scale IoT networks as the number of vehicles increased. The proposed technique made a practical contribution that allows efficient computation on sensor nodes.
VOLUME 4, 2016 11 This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. ...
Article
Sensor nodes that operate as edge devices in Internet-of-Things (IoT) networks have various limitations, such as an insufficient power supply and small memory. Therefore, a sensor node must use its resources efficiently to achieve the specified software behavior of the target application. The application in this study involves an IoT network in which the sensor node requests the position of a moving vehicle and estimates its velocity using a Kalman filter. Using the same sensing cycle for all vehicles improves accuracy regardless of the vehicle velocity predicted by the sensor node, but increases unnecessary computation. The proposed technique can be used to weight the communication distance from the sensor node as the speed of the vehicle decreases, enabling the sensor node to adaptively determine the data request period based on the state of the vehicle. When a slow-moving vehicle communicates with a sensor node intermittently, the computation time of the sensor node can be significantly reduced. To evaluate the proposed technique, we experimented with a traffic simulator implemented in MATLAB. The root mean square error of the velocity increased relative to a reference that sensed the position at every time step, but the reduction in the sensor node's processing time was considerable by comparison. Experiments with four manually determined distance weights and different numbers of spawned vehicles showed that the sensor node processing time was reduced by up to 72.91%.
... At the same time, per-particle operations are involved in the particle prediction phase, the update phase, the grid statistical moment calculation, and the resampling phase. Owing to the characteristics of the particle filter, these operations can be accelerated through parallel computing [35,36]. After the particle prediction step is performed in parallel, each grid cell must record the particles it contains, because the particle weight update and the statistical moment calculation are both based on grid cells. ...
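One common way to let each grid cell "record the particles in the grid cell" in parallel is a sort-then-segment pattern: compute a cell key per particle, sort by key, and reduce each contiguous run. The sketch below assumes particles of the form `(x, y, vx, vy, weight)` in non-negative map coordinates. It computes each cell's weight mass and weighted mean velocity as the statistical moments. The key layout (`row * width + col`) is an illustrative choice.

```python
def cell_index(x, y, cell_size, width):
    """Row-major grid cell key for a point in non-negative map coordinates."""
    return int(y // cell_size) * width + int(x // cell_size)


def grid_moments(particles, cell_size, width):
    """Per-cell (weight mass, mean vx, mean vy) from (x, y, vx, vy, w) particles."""
    # Per-particle key computation: independent, one GPU thread per particle.
    keys = [cell_index(p[0], p[1], cell_size, width) for p in particles]
    # Sort particle indices by cell key (a GPU would use e.g. a radix sort).
    order = sorted(range(len(particles)), key=keys.__getitem__)
    moments = {}
    start = 0
    while start < len(order):
        key = keys[order[start]]
        mass = mvx = mvy = 0.0
        end = start
        # Reduce the contiguous run of particles that share this cell key.
        while end < len(order) and keys[order[end]] == key:
            _, _, vx, vy, w = particles[order[end]]
            mass += w
            mvx += w * vx
            mvy += w * vy
            end += 1
        moments[key] = (mass, mvx / mass, mvy / mass) if mass > 0 else (0.0, 0.0, 0.0)
        start = end
    return moments
```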
Article
The PHD (Probability Hypothesis Density) filter is a sub-optimal multi-target Bayesian filter based on random finite sets, which is widely used to track and estimate dynamic objects in outdoor environments. Compared with outdoor environments, indoor spaces and the dynamic objects within them are relatively small, which places higher demands on the estimation accuracy and response speed of the filter. This paper proposes a method for fast, high-precision estimation of the velocity of dynamic objects for mobile robots in indoor environments. First, the indoor environment is represented as a dynamic grid map, and the state of dynamic objects is represented by the states of their grid cells as random finite sets. The speed of dynamic objects is estimated using a measurement-driven particle-based PHD filter. Second, we bind the dynamic grid map to the robot coordinate system and derive the update equation for the particle states as the robot moves. At the same time, to improve the filter's perception accuracy and speed for dynamic targets, the CS (Current Statistical) motion model is added to the CV (Constant Velocity) motion model, and interactive resampling is performed to combine the advantages of the two. Finally, in the Gazebo simulation environment based on ROS (Robot Operating System), speed estimation and accuracy analysis were carried out for square and cylindrical dynamic objects while the robot was stationary and while it was moving. The results show that the proposed method substantially improves on existing methods.
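When the grid map is bound to the robot, every particle must be re-expressed in the new robot frame after the robot moves. The sketch below applies a generic 2-D rigid transform. It is not the paper's derived update equation. It assumes the robot's translation `(dx, dy)` and rotation `dtheta` are measured in the old frame. It also assumes particle velocities are ground velocities expressed in robot axes, so they are rotated but not shifted.

```python
import math


def to_new_robot_frame(particle, dx, dy, dtheta):
    """Re-express one particle (x, y, vx, vy) after the robot moves.

    The robot translated by (dx, dy) and rotated by dtheta (radians), both
    measured in the old robot frame.
    """
    x, y, vx, vy = particle
    c, s = math.cos(dtheta), math.sin(dtheta)
    px, py = x - dx, y - dy                      # translate to the new origin
    return (c * px + s * py, -s * px + c * py,   # rotate position by -dtheta
            c * vx + s * vy, -s * vx + c * vy)   # rotate velocity by -dtheta


def update_particles(particles, dx, dy, dtheta):
    # Independent per particle, so it parallelizes like the other PF steps.
    return [to_new_robot_frame(p, dx, dy, dtheta) for p in particles]
```

For example, suppose the robot drives 1 m forward and turns 90° to the left. A particle that was 1 m ahead and 1 m to the left then ends up 1 m straight ahead.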
... Alam et al. [18] | Multinomial re-sampling | Virtex-6 | Generic | Improved re-sampling of particle filter design that included a weight pre-fetch function to reduce the re-sampling step's latency.
... [24], Par et al. [25], Kim et al. [26] | Proposed modified algorithms to implement particle filters efficiently on a GPU.
... | PFs on a DSP for wireless network tracking applications. ...
... Their performance analysis shows that a speedup of up to 75× over a sequential implementation can be achieved on a 512-core GPU. Kim et al. [26] implemented a PF on a GPU for target position estimation and parallelized the calculation across multiple GPU cores. The proposed algorithm was simulated on a CPU in MATLAB and then verified on a GPU, resulting in a 55% reduction in execution time. ...
Article
Particle filtering is very reliable in modelling non-Gaussian and non-linear elements of physical systems, which makes it ideal for tracking and localization applications. However, a major drawback of particle filters is their computational complexity, which inhibits their use in real-time applications with conventional CPU- or DSP-based implementation schemes. The re-sampling step of the particle filter creates a computational bottleneck since it is inherently sequential and cannot be parallelized. This paper proposes a modification to the existing particle filter algorithm, which enables parallel re-sampling and reduces the effect of the re-sampling bottleneck. We then present a high-speed, dedicated hardware architecture incorporating pipelining and parallelization design strategies to supplement the modified algorithm and lower the execution time considerably. From an application standpoint, we propose a novel source localization model to estimate the position of a source in a noisy environment using the particle filter algorithm implemented in hardware. The design has been prototyped on an Artix-7 field-programmable gate array (FPGA), and the resource utilization of the proposed system is presented. Further, we show the execution time and estimation accuracy of the high-speed architecture and observe a significant reduction in computational time. Our implementation of particle filters on FPGA is scalable and modular, with a low execution time of about 5.62 μs for processing 1024 particles (compared to 64 ms on an Intel Core i7-7700 CPU with eight cores clocking at 3.60 GHz), and can be deployed for real-time applications.
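The sequential dependency in resampling is easy to see in systematic resampling, one standard scheme (not necessarily the one modified in this paper). A running cumulative weight and a particle index are carried from one output slot to the next. In the sketch below, `u0` is the single random offset in [0, 1) that systematic resampling draws once per step. It is passed in explicitly so the result is reproducible.

```python
def systematic_resample(weights, u0):
    """Systematic resampling; u0 in [0, 1) is one random offset for all N draws.

    Returns the index of the particle copied into each of the N output slots.
    """
    n = len(weights)
    total = sum(weights)
    indices = []
    i = 0
    cumulative = weights[0] / total      # running CDF: a serial dependency
    for j in range(n):
        u = (u0 + j) / n                 # evenly spaced pointers
        # Advance along the CDF; i and cumulative carry over between slots,
        # which is what makes this loop inherently sequential.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)
    return indices
```

With weights `[1, 2, 3, 4]` and `u0 = 0.5`, the pointers fall at 0.125, 0.375, 0.625 and 0.875. The heaviest particle (index 3) is therefore copied twice.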
... For the parallel implementation of PFs, particle generation and weight update can easily be implemented in parallel [48], [57], [58]-[60], but the resampling step requires the cumulative sum of the weights, which cannot easily be parallelized. Efforts have been made to implement a parallelized version [59] of the resampling step on a GPU. ...
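One common way around the cumulative-sum dependency is to split resampling in two: first compute the CDF once (a prefix sum, which GPUs implement with a parallel scan), then let every output slot find its particle independently by binary search. The sketch below applies that idea to systematic resampling. It is not the specific method of [59]. `u0` is the shared random offset in [0, 1).

```python
from bisect import bisect_left
from itertools import accumulate


def systematic_resample_by_search(weights, u0):
    """Systematic resampling as a prefix sum followed by independent searches."""
    n = len(weights)
    total = sum(weights)
    # Prefix sum (scan) of the normalized weights; on a GPU this is the step
    # that needs a parallel scan primitive.
    cdf = list(accumulate(w / total for w in weights))
    # Once the CDF exists, every output slot j is independent: each can
    # binary-search its own pointer (one thread per slot on a GPU).
    # min() guards against round-off leaving cdf[-1] slightly below 1.
    return [min(bisect_left(cdf, (u0 + j) / n), n - 1) for j in range(n)]
```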
... Truong and Kim [49] present parallel implementations of the resampling step and likelihood calculation that exploit the computing power of multicore embedded systems. Kim et al. [58] present a GPU-based accelerated PF method that can be applied to Internet of Things (IoT)-based applications. The authors increased processing speed by performing the state update and weight computation simultaneously. ...
Article
The particle filter is one of the most widely used techniques in visual tracking, as it can model a dynamic environment with non-linear motions and multimodal non-Gaussian noise. Decades of active research in visual tracking using particle filters have improved its various techniques, such as the importance proposal, handling of particle degeneracy and impoverishment, parallel implementation of resampling and weight update, data association, and target labelling, making the particle filter more accurate and efficient. In the last decade, many attempts to integrate the particle filter with convolutional neural networks have been reported. This integration has produced more accurate visual trackers than traditional particle filter-based techniques. However, many unresolved challenges need further research, such as variations in illumination, rapid and sudden changes in motion, deformation of targets, and complex, cluttered dynamic environments. Multiple target tracking poses additional problems, such as identification and labeling of targets, track model drift, misdetection, and computational explosion as the number of targets increases. In this paper, recent advances of the last decade in single-target and multi-target tracking using particle filters are reviewed, and research gaps are identified to give impetus to further research.
... Furthermore, Ahmed et al. and Huang et al. studied Gaussian and arbitrary distributions, respectively [42,43]. The discussions of these and other authors have verified that CB can form a directional beam, as shown in Figure 3, and substantially improves the power [44] in the direction of a desired AP (theoretically, in proportion to the square of the number of sensors) [45]-[57]. Their work also shows that CB can increase the success probability of long-range direct transmission regardless of the distribution of sensors. ...
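The "square of the number of sensors" gain follows from coherent addition. When N unit-amplitude signals arrive in phase, their amplitudes add to N and the power is N². With uncorrelated random phases, only the powers add, giving N on average. The sketch below checks both cases numerically. It assumes ideal phase pre-compensation, equal amplitudes, and uniformly random arrival phases for the non-coherent case.

```python
import cmath
import random


def received_power(phases, phase_aligned):
    """|sum of N unit-amplitude carriers|^2 at the AP.

    phases: carrier phase of each sensor's signal on arrival (radians).
    phase_aligned: if True, each sensor pre-compensates its own phase, as in CB.
    """
    total = sum(cmath.exp(1j * (0.0 if phase_aligned else phi)) for phi in phases)
    return abs(total) ** 2


def average_power(n, trials, phase_aligned, rng):
    """Mean received power over random arrival phases."""
    acc = 0.0
    for _ in range(trials):
        phases = [rng.uniform(0.0, 2.0 * cmath.pi) for _ in range(n)]
        acc += received_power(phases, phase_aligned)
    return acc / trials
```

With 16 sensors, the phase-aligned power is exactly 256. The random-phase average hovers around 16, which is the N² versus N contrast behind the quoted claim.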
Article
Collaborative beamforming (CB) enables uplink transmission in a wireless sensor network (WSN) composed of sensors (nodes) and far-away access points (APs). It can also be applied when the sensors are equipped with beam-switching structures (BSSs). However, because the antenna arrays of the BSSs are randomly oriented owing to the irregular mounting surface, some sensors form beams that do not illuminate the desired AP and waste their limited energy. To resolve this problem, the beams must be switched toward the desired AP. While an exhaustive search can find the globally optimal combination, a greedy search (GS) is used to solve this optimization problem efficiently. Simulation and experimental results verify that, under certain conditions, the proposed algorithm can drive the sensors to switch their beams properly and significantly increase the received signal-to-noise ratio (SNR) with low computational complexity and energy consumption.
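The contrast between exhaustive and greedy search can be sketched with a hypothetical model. Choosing beam b at sensor i contributes a complex gain `gains[i][b]` toward the AP, and the objective is the received power of the sum. The greedy search below changes one sensor's beam at a time while that improves the objective. This coordinate-wise scheme is an illustrative assumption, not the paper's exact GS procedure.

```python
import itertools


def received_snr(gains, choice):
    """Received power (proportional to SNR) for one beam choice per sensor."""
    return abs(sum(gains[i][b] for i, b in enumerate(choice))) ** 2


def greedy_beam_search(gains, max_rounds=10):
    """Switch one sensor's beam at a time while that improves the objective."""
    choice = [0] * len(gains)
    best = received_snr(gains, choice)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(gains)):
            for b in range(len(gains[i])):
                if b == choice[i]:
                    continue
                trial = choice.copy()
                trial[i] = b
                snr = received_snr(gains, trial)
                if snr > best:
                    best, choice, improved = snr, trial, True
        if not improved:
            break
    return choice, best


def exhaustive_beam_search(gains):
    """Globally optimal choice; cost grows as (beams per sensor) ** sensors."""
    options = itertools.product(*(range(len(g)) for g in gains))
    best_choice = max(options, key=lambda c: received_snr(gains, c))
    return list(best_choice), received_snr(gains, best_choice)
```

The exhaustive search evaluates every combination. Each greedy round evaluates only (sensors × beams) candidates, which is why a greedy search scales to larger networks at the risk of missing the global optimum.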
... In industry, there is also a wide range of applications that require acoustic source localization [22]. Smart factories that make use of distributed sensors are currently gaining momentum. ...
Article
The rapid development of the Internet of Things (IoT) has brought important changes to the way emerging acoustic signal processing applications are conceived. While traditional acoustic processing applications have been developed for high-throughput computing platforms equipped with expensive multichannel audio interfaces, the IoT paradigm demands more flexible and energy-efficient systems. In this context, algorithms for source localization and ranging in wireless acoustic sensor networks can be considered an enabling technology for many IoT-based environments, including security, industrial and health-care applications. This paper evaluates important aspects of the practical deployment of IoT systems for acoustic source localization. Recent Systems-on-Chip (SoCs) composed of low-power multicore processors combined with a small graphics accelerator (GPU) notably increase the computational capacity available to intensive signal processing algorithms while partially retaining the appealing low power consumption of embedded systems. Different algorithms and implementations on several state-of-the-art platforms are discussed, analyzing important aspects such as the trade-offs between performance, energy efficiency and exploitation of parallelism under real-time constraints.
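A basic building block of acoustic source localization is estimating the time difference of arrival between two microphones from the peak of their cross-correlation. The sketch below uses plain, unweighted time-domain cross-correlation over a bounded lag range. Practical systems often use weighted frequency-domain variants such as GCC-PHAT. The 16 kHz sample rate in the usage example is an assumption.

```python
import random


def estimate_delay(x, y, max_lag):
    """Lag (in samples) by which y lags x, from the cross-correlation peak."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def tdoa_seconds(x, y, fs, max_lag):
    """TDOA in seconds for sample rate fs (Hz)."""
    return estimate_delay(x, y, max_lag) / fs


if __name__ == "__main__":
    rng = random.Random(0)
    mic1 = [rng.gauss(0.0, 1.0) for _ in range(200)]  # white-noise source
    mic2 = [0.0] * 7 + mic1[:-7]                      # same signal, 7 samples later
    print(tdoa_seconds(mic1, mic2, fs=16000, max_lag=20))
```

The brute-force loop costs O(N · lags). Each lag is independent, which is why this step maps well onto the small GPUs in the SoCs discussed above.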
... During the last few years, GPU capability has grown much faster than CPU capability because of the rapidly increasing hardware requirements of modern computer games. GPUs are also being used rapidly and widely for various scientific computations in addition to graphics display [30], such as fluid dynamics [31], biophysics [32], molecular dynamics [33], and IoT sensing [34]. For many applications, GPUs can provide huge performance improvements over a single CPU core. ...
Article
The facility layout problem (FLP) is one of the most active research areas in industrial engineering. A good facility layout can achieve efficient production management, improve production efficiency, and create high economic value. Because FLP is NP-hard, finding the optimal solution becomes impractical when the problem is sufficiently large, so various evolutionary algorithms (EAs) have been proposed to find a sub-optimal solution within a reasonable time. Recently, a genetic algorithm (GA) was proposed for unequal area FLP (UA-FLP), where the areas of the facilities are not identical. More precisely, the GA is based on an island model and is called IMGA. Since EAs are still very time-consuming, many efforts have been devoted to parallelizing various EAs, including IMGA. In recent work, Steffen and Dietmar proposed how to parallelize island models of EAs. However, their parallelization approaches are preliminary because they focused mainly on comparing the performance of different parallel architectures. In addition, they used a single mathematical function to model the problem. To further investigate how to parallelize the IMGA on a GPU, in this paper we propose multiple parallel algorithms for each individual step of the IMGA when solving the industrial engineering problem UA-FLP, and conduct experiments to compare their performance. After integrating the better algorithms for all steps into the IMGA, our GPU implementation outperforms its CPU counterpart, with a best speedup as high as 84.
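An island-model GA runs several sub-populations independently and exchanges individuals only at periodic migrations. This is what makes it a natural target for parallel hardware. The sketch below uses the toy OneMax objective (count of 1 bits) in place of a UA-FLP layout cost. Its operators are assumptions, not the IMGA's: binary tournament selection, one-point crossover, bit-flip mutation, elitism and ring migration.

```python
import random


def onemax(bits):
    """Toy fitness (number of 1 bits), standing in for a real UA-FLP objective."""
    return sum(bits)


def evolve(pop, rng, mut_rate):
    """One generation: binary tournament, one-point crossover, bit-flip mutation."""
    def tournament():
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    nxt = [max(pop, key=onemax)]  # elitism: keep the island's best individual
    while len(nxt) < len(pop):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, len(p1))
        child = [1 - g if rng.random() < mut_rate else g for g in p1[:cut] + p2[cut:]]
        nxt.append(child)
    return nxt


def island_ga(n_islands=4, pop_size=20, n_bits=32, generations=60,
              migrate_every=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently between migrations; this is the part a
        # parallel (e.g. GPU) version can run concurrently, given per-island
        # random-number streams instead of this shared rng.
        islands = [evolve(pop, rng, 1.0 / n_bits) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: island k's worst is replaced by island k-1's best.
            bests = [max(pop, key=onemax) for pop in islands]
            for k, pop in enumerate(islands):
                scores = [onemax(ind) for ind in pop]
                pop[scores.index(min(scores))] = list(bests[k - 1])
    return max((ind for pop in islands for ind in pop), key=onemax)
```

The migration interval trades diversity against communication. Frequent migration couples the islands more tightly, which on parallel hardware also means more synchronization.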