Available via license: CC BY 4.0


A particle filter (PF) is an effective tool for estimating the position of a moving target in non-Gaussian, nonlinear systems. The time difference of arrival (TDOA) method with an acoustic sensor array is commonly used to estimate the location of a concealed moving target, especially underwater. In this paper, we propose a graphics processing unit (GPU)-based acceleration of target position estimation using a PF, together with an efficient system and software architecture. The proposed GPU-based algorithm is well suited to applying PF signal processing to a target system built from large-scale Internet of Things (IoT)-driven sensors because its parallelization is scalable. For the TDOA measurement from the acoustic sensor array, we use the generalized cross-correlation phase transform (GCC-PHAT) method, which obtains the correlation coefficient of the signal via the fast Fourier transform (FFT), and we accelerate the FFT-based GCC-PHAT TDOA computations with GPU compute unified device architecture (CUDA). The proposed approach parallelizes the target position estimation algorithm through GPU-based PF processing. In addition, it can efficiently estimate sudden changes in target movement using GPU-based parallel computing, which can also be used for multiple-target tracking, and it scales as the detection algorithm is extended to a larger number of sensors. The proposed architecture can therefore be applied to IoT sensing applications with a large number of sensors. The target estimation algorithm was verified in MATLAB and implemented in GPU CUDA. We implemented the proposed signal processing acceleration system on the target GPU and analyzed its execution time. On the target embedded board, an NVIDIA Jetson TX1, the execution time of the algorithm is reduced by 55% compared with standalone CPU operation.
Also, to address large-scale IoT sensing applications, we use an NVIDIA Tesla K40c as the target GPU. The execution time of the proposed multi-state-space-model-based algorithm is similar to that of the one-state-space-model algorithm because of GPU-based parallel computing. Experimental results show that the proposed architecture is a feasible solution in terms of high performance and area efficiency.
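The GCC-PHAT step described in the abstract can be prototyped compactly before porting to CUDA. The sketch below is a minimal NumPy version (function name and padding choices are our own, not from the paper): it whitens the cross-spectrum so only phase information remains, then locates the peak lag.

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0):
    """Delay of `sig` relative to `ref` (in seconds; samples when fs=1)."""
    n = len(sig) + len(ref)                  # zero-pad to avoid circular wrap-around
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # centre zero lag
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

The same FFT/whitening/inverse-FFT structure is what maps onto cuFFT plus elementwise kernels in a CUDA port.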


... [Flattened comparison table: prior GPU-based PF works [11]-[21] are classified by which PF stage they accelerate (weight computation, resampling, rendering/normalization, likelihood calculation); ours accelerates model propagation.] To achieve high-speed target tracking, the PFs are accelerated in Compute Unified Device Architecture (CUDA) with a GPU. If the entire PF algorithm is converted to CUDA, all parts are converted regardless of their computation time. ...

... Acceleration studies have been conducted using a GPU to run the PF algorithm in real time. In [15] and [19], a PF parallelized for the weight computation is proposed. In [15], a GPU was used to improve PF estimation for target tracking rather than for acceleration, and in [19], a GPU was used to accelerate IoT applications. ...

... In [15] and [19], a PF parallelized for the weight computation is proposed. In [15], a GPU was used to improve PF estimation for target tracking rather than for acceleration, and in [19], a GPU was used to accelerate IoT applications. The tracking algorithm was accelerated by approximately 55% compared with the CPU-based algorithm. ...

This study addresses the problem of real-time tracking of high-speed ballistic targets. Particle filters (PFs) can overcome the nonlinearity of the motion and measurement models of ballistic targets. However, applying PFs to real-time systems is challenging because they generally require significant computation time. Most existing methods that accelerate PFs on a graphics processing unit (GPU) for target tracking accelerate the weight computation and resampling parts. However, the computational cost of each part varies from application to application; in this work, we confirm that the model propagation part dominates the computation time and propose an accelerated PF that parallelizes the corresponding logic. The real-time performance of the proposed method was tested and analyzed on an embedded system. Compared with a conventional PF on the central processing unit (CPU), the proposed method reduces the computation time by at least a factor of 10, improving real-time performance.
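The model propagation step that the abstract identifies as the bottleneck is naturally data-parallel: every particle is advanced independently. A minimal vectorised sketch is given below; the constant-velocity-with-drag dynamics and all names here are illustrative assumptions, not the paper's ballistic model.

```python
import numpy as np

def propagate(particles, dt, drag=0.1, q=0.5, rng=None):
    """Vectorised model propagation for all particles at once.

    `particles` is an (N, 4) array of [x, y, vx, vy] states.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = particles.copy()
    v = p[:, 2:4]
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    accel = -drag * speed * v                     # quadratic drag opposes velocity
    p[:, 0:2] += v * dt + 0.5 * accel * dt ** 2   # position update
    p[:, 2:4] += accel * dt                       # velocity update
    p[:, 2:4] += rng.standard_normal(v.shape) * q * dt   # process noise
    return p
```

Because every row is independent, the same loop body maps one-to-one onto one GPU thread per particle.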

... Based on this, we propose a technique that adaptively changes the reception period of a sensor node based on the predicted vehicle condition. A moving object at a sensor node, such as a vehicle, can be tracked in diverse ways [10], [11]. In microcontroller-based embedded sensor node systems, poor performance can cause a large amount of noise in the measurement. ...

... We can predict the vehicle velocity, v_k, from the measured vehicle position, p_k, using the variables obtained from (8), (11), (13), and the Kalman filter algorithm. Finally, the variables Q and R of (4)-(5) must be determined. ...

... In addition, effective results were achieved for large-scale IoT networks as the number of vehicles increased. The proposed technique makes a practical contribution by allowing efficient computation on sensor nodes. ...

Sensor nodes that operate as edge devices in Internet-of-Things (IoT) networks have various limitations, such as an insufficient power supply and small memory size. Therefore, the sensor node must use its resources efficiently to achieve the specified software behavior of the target application. The application in this study involves an IoT network in which the sensor node requests the position from a moving vehicle and estimates the velocity using a Kalman filter. Using the same sensing cycle for all vehicles keeps accuracy high regardless of the predicted velocity but increases unnecessary computation. The proposed technique weights the communication distance from the sensor node as the speed of the vehicle decreases, enabling the sensor node to adaptively determine the data-request period based on the state of the vehicle. When a slow-moving vehicle communicates only intermittently with a sensor node, the time required for the computation performed by the sensor node can be significantly reduced. To evaluate the proposed technique, we experimented with a traffic simulator implemented in MATLAB. Compared with the increase in the root mean square error relative to the reference velocity that sensed the position at every time step, the decrease in the processing time of the sensor node was considerable. Experiments with four manually determined distance weights and varying numbers of spawned vehicles showed that the sensor node processing time was reduced by up to 72.91%.
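The adaptive request period described above can be sketched as a simple rule that lengthens the polling interval for slow, distant vehicles. All names and constants below are illustrative assumptions, not the distance weights used in the study:

```python
def request_period(v_pred, d_node, base=1.0, w_d=0.05, v_ref=10.0):
    """Data-request period from predicted vehicle speed and distance to the node.

    Slow, distant vehicles are polled less often; fast or nearby vehicles
    revert to the base period.
    """
    slowness = max(0.0, 1.0 - abs(v_pred) / v_ref)   # 1 when stopped, 0 when fast
    return base * (1.0 + slowness * w_d * d_node)
```

A stopped vehicle 100 m away would then be polled six times less often than a vehicle moving at or above the reference speed.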

... For the parallel implementation of PF, particle generation and weight update can be easily implemented in parallel [48], [57], [58]-[60], but the resampling step requires the cumulative sum of weights, which cannot be easily parallelized. Efforts have been made to implement a parallelized version [59] of the resampling step using a GPU. ...

... Using multicore architecture, parallel implementation of the resampling step and likelihood calculation by utilizing the computing power of embedded systems are presented by Truong and Kim [49]. An accelerated PF method based on GPU, which can be applied to Internet of Things based applications, is presented by Kim et al. [58]. The authors achieved increased processing speed by simultaneously processing the state update process and weight computation process. ...

The particle filter is one of the most widely used techniques in visual tracking, as it can model a dynamic environment with non-linear motions and multimodal non-Gaussian noises. Decades of active research in visual tracking using particle filters have improved their various techniques, such as importance proposals, particle degeneracy and impoverishment, parallel implementation of resampling and weight update, data association, and target labelling, making particle filters more accurate and efficient. In the last decade, many attempts have been reported that integrate the particle filter with convolutional neural networks. This integration has produced more accurate visual trackers than traditional particle filter-based techniques. However, many unresolved challenges remain, such as variations in illumination, rapid and sudden changes in motion, deformation of targets, and complex and cluttered dynamic environments, which need further research. Multiple target tracking poses additional problems, such as identification and labelling of targets, track model drifting, misdetection, and computational explosion with the increase in the number of targets. In this paper, a review of recent advances, specifically in the last decade, in single target tracking and multiple target tracking using particle filters is reported, and research gaps are identified to give impetus to further research.

... Alam et al. [18] improved the re-sampling stage of a particle filter design on a Virtex-6 (generic multinomial re-sampling), adding a weight pre-fetch function to reduce the re-sampling step's latency. [24], Par et al. [25], and Kim et al. [26] proposed modified algorithms to implement particle filters efficiently on a GPU, as well as PFs on a DSP for wireless network tracking applications. ...

... Their performance analysis shows that up to a 75x speedup can be achieved on a 512-core GPU over a sequential implementation. Kim et al. [26] implemented a PF on a GPU for target position estimation and parallelized the calculation process across multiple GPU cores. The proposed algorithm was simulated on a CPU in MATLAB and then verified on a GPU, resulting in a 55% reduction in execution time. ...

Particle filtering is very reliable in modelling non-Gaussian and non-linear elements of physical systems, which makes it ideal for tracking and localization applications. However, a major drawback of particle filters is their computational complexity, which inhibits their use in real-time applications with conventional CPU- or DSP-based implementation schemes. The re-sampling step in particle filters creates a computational bottleneck since it is inherently sequential and cannot be parallelized. This paper proposes a modification to the existing particle filter algorithm that enables parallel re-sampling and reduces the effect of the re-sampling bottleneck. We then present a high-speed, dedicated hardware architecture incorporating pipelining and parallelization design strategies to supplement the modified algorithm and lower the execution time considerably. From an application standpoint, we propose a novel source localization model to estimate the position of a source in a noisy environment using the particle filter algorithm implemented on hardware. The design has been prototyped on an Artix-7 field-programmable gate array (FPGA), and resource utilization for the proposed system is presented. Further, we show the execution time and estimation accuracy of the high-speed architecture and observe a significant reduction in computational time. Our implementation of particle filters on FPGA is scalable and modular, with a low execution time of about 5.62 μs for processing 1024 particles (compared to 64 ms on an Intel Core i7-7700 CPU with eight cores clocking at 3.60 GHz), and can be deployed for real-time applications.
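For contrast with the modified algorithm above, standard systematic resampling makes the sequential bottleneck explicit: the prefix sum over the weights. A reference sketch (naming is our own; weights are assumed already normalised):

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Standard systematic resampling over normalised weights.

    The cumulative sum is the inherently sequential step; proposal
    generation and the final search are embarrassingly parallel.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n   # one stratified draw per slot
    cumsum = np.cumsum(weights)                      # sequential prefix sum: the bottleneck
    cumsum[-1] = 1.0                                 # guard against rounding drift
    return np.searchsorted(cumsum, positions)
```

Hardware and GPU schemes either replace the prefix sum with a parallel scan or, as in the paper, restructure the algorithm so no global sum is needed.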

... Thus, using high-precision data types like fp64 or fp32 would be wasteful for both off-chip and on-chip resources, including memory bandwidth, registers and ALU usage. Harnessing mixed-precision data (Figure 1) creates opportunities to deploy real-time tasks on resource-limited devices [4,5]. ...

GPUs have been broadly used to accelerate big data analytics, scientific computing and machine intelligence. Particularly, matrix multiplication and convolution are two principal operations that use a large proportion of steps in modern data analysis and deep neural networks. These performance-critical operations are often offloaded to the GPU to obtain substantial improvements in end-to-end latency. In addition, multifarious workload characteristics and complicated processing phases in big data demand a customizable yet performant operator library. To this end, GPU vendors, including NVIDIA and AMD, have proposed template and composable GPU operator libraries to conduct specific computations on certain types of low-precision data elements. We formalize a set of benchmarks via CUTLASS, NVIDIA’s templated library that provides high-performance and hierarchically designed kernels. The benchmarking results show that, with the necessary fine tuning, hardware-level ASICs like tensor cores could dramatically boost performance in specific operations like GEMM offloading to modern GPUs.

... However, when the number of particles is large, the particle-filter method takes a long time. S. Kim et al. [11] present a graphics processing unit (GPU)-based method to speed up the particle filter operation for estimating the changing location of the target, which is well suited to scenes requiring real-time processing. The findings demonstrate improved speed and good tracking of the target's rapid trajectory changes. ...

Bearing-time records, which are accumulations of spatial spectrum estimates along the time axis, are often employed for passive sonar information processing. Multi-target jamming is a common difficulty in this approach due to the constraint of the Rayleigh limit, and neither the conventional beamforming (CBF) nor the minimum variance distortionless response (MVDR) technique can handle it well. This work presents a post-processing tracking framework based on visual pattern recognition algorithms to track weak acoustic targets within jamming environments, which includes target motion analysis, matched filtering, and principal component analysis-based denoising; we call this the 'P-Gabor' algorithm. The simulations and sea-trial experiments show that the proposed method can track a weak target successfully at a signal-to-interference ratio (SIR) of −23 dB, which is more effective than the reference methods, especially on real-world data from sea trials. We further demonstrate that the method also has stable tracking performance at a signal-to-noise ratio (SNR) as low as −25 dB.

... At the same time, operations on each particle are involved in the particle prediction phase, update phase, grid statistical moment calculation, and resampling phase. Due to the characteristics of particle filter, these operations can be accelerated through parallel computing [35,36]. After the particle prediction step is performed using the parallel method, since the update of the particle weight and the calculation of the statistical moment are all based on the grid cell, each grid cell must record the particles in the grid cell. ...

The PHD (Probability Hypothesis Density) filter is a sub-optimal multi-target Bayesian filter based on a random finite set, which is widely used for tracking and estimating dynamic objects in outdoor environments. Compared with outdoor environments, indoor spaces and the dynamic objects in them are relatively small, which places higher demands on the estimation accuracy and response speed of the filter. This paper proposes a method for fast, high-precision estimation of dynamic objects' velocities for mobile robots in an indoor environment. First, the indoor environment is represented as a dynamic grid map, and the state of dynamic objects is represented by its grid cells' states as random finite sets. The estimation of dynamic objects' speed is realized using a measurement-driven, particle-based PHD filter. Second, we bind the dynamic grid map to the robot coordinate system and derive the update equation of the particle states under the robot's movement. At the same time, to improve the filter's perception accuracy and speed for dynamic targets, the CS (Current Statistical) motion model is added to the CV (Constant Velocity) motion model, and interactive resampling is performed to combine the advantages of the two. Finally, in a Gazebo simulation environment based on ROS (Robot Operating System), velocity estimation and accuracy analysis of square and cylindrical dynamic objects were carried out with the robot stationary and in motion, respectively. The results show that the proposed method achieves a substantial improvement over existing methods.

We present robust high-performance implementations of signal-processing tasks performed by a high-throughput wildlife tracking system called ATLAS. The system tracks radio transmitters attached to wild animals by estimating the time of arrival of radio packets to multiple receivers (base stations). Time-of-arrival estimation of wideband radio signals is computationally expensive, especially in acquisition mode (when the time of transmission is not known, not even approximately). These computations are a bottleneck that limits the throughput of the system. We developed a sequential high-performance CPU implementation of the computations a few years back, and more recently a GPU implementation. Both strive to balance performance with simplicity, maintainability, and development effort, as most real-world codes do. The article reports on the two implementations and carefully evaluates their performance. The evaluations indicate that the GPU implementation dramatically improves performance and power-performance relative to the sequential CPU implementation running on a desktop CPU typical of the computers in current base stations. Performance improves by more than 50X on a high-end GPU and more than 4X with a GPU platform that consumes almost 5 times less power than the CPU platform. Performance-per-Watt ratios also improve (by more than 16X), and so do the price-performance ratios.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov chain Monte Carlo (MCMC) and importance sampling (IS) techniques. In the IS context, an approximation of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, ESS = 1 / Σ_i w̄_i², where w̄_i are the normalized weights, has become an essential piece within Sequential Monte Carlo (SMC) methods for assessing the convenience of a resampling step. From another perspective, the expression is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric mean of the weights, the discrete entropy (including the perplexity measure, already proposed in the literature), and the Gini coefficient, among others. We list five theoretical requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.
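Both the classical approximation and one of the alternative discrepancy-based measures mentioned above (the perplexity, i.e. the exponential of the discrete entropy) are short computations over the normalised weights. A sketch, with naming of our own:

```python
import numpy as np

def ess_inverse_sq(w):
    """Classical ESS approximation: 1 / sum of squared normalised weights."""
    w = np.asarray(w, float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def ess_perplexity(w):
    """Entropy-based alternative: exp of the discrete entropy of the weights."""
    w = np.asarray(w, float)
    w = w / w.sum()
    nz = w[w > 0]                      # 0 * log 0 is taken as 0
    return np.exp(-np.sum(nz * np.log(nz)))
```

Both measures equal N for uniform weights and 1 for a degenerate one-hot weight vector, which is two of the theoretical requirements a generic ESS function should satisfy.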

We design a sequential Monte Carlo scheme for the joint purpose of Bayesian inference and model selection, with application to an urban mobility context where different modalities of transport and measurement devices can be employed. In this case, we face the joint problem of online tracking and detection of the current modality. For this purpose, we use interacting parallel particle filters, each addressing a different model. They cooperate to provide a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of the models given the data. The interaction occurs through a parsimonious distribution of the computational effort, adapting online the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and flexible. We have tested the novel technique in different numerical experiments with artificial and real data, which confirm the robustness of the proposed scheme.
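The online reallocation of particles across the parallel filters can be sketched as a proportional split of a fixed budget; the minimum-per-filter floor and the rounding rule below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def allocate_particles(model_post, n_total, n_min=10):
    """Split a total particle budget across parallel filters in proportion
    to the posterior probability of each model, keeping a minimum per
    filter so that no model dies out prematurely."""
    model_post = np.asarray(model_post, float)
    model_post = model_post / model_post.sum()
    free = n_total - n_min * len(model_post)
    n = n_min + np.floor(free * model_post).astype(int)
    n[np.argmax(model_post)] += n_total - n.sum()   # rounding residue to the best model
    return n
```

Called once per time step with the current posterior model probabilities, this keeps the total cost constant while shifting effort toward the better-supported models.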

Two decades ago, with the publication of [1], we witnessed the rebirth of particle filtering (PF) as a methodology for sequential signal processing. Since then, PF has become very popular because of its ability to process observations represented by nonlinear state-space models where the noises of the model can be non-Gaussian. This methodology has been adopted in various fields, including finance, geophysical systems, wireless communications, control, navigation and tracking, and robotics [2]. The popularity of PF has also spurred the publication of several review articles [2]?[6].

Due to their cost-effectiveness and easy deployment, RFID location systems are widely used in many industrial fields, particularly in the emerging environment of the Internet of Things (IoT). High accuracy and precision are key demands for these location systems. Numerous studies have attempted to improve localisation accuracy and precision by using either dedicated RFID infrastructures or advanced localisation algorithms. However, these efforts mostly consider the adoption of novel RFID localisation solutions rather than the optimisation of their use, and the practical use of such solutions in industrial applications increases the cost and deployment difficulty of the RFID system. This paper investigates how accuracy and precision in passive RFID location systems (PRLS) are affected by infrastructures and localisation algorithms. A general experiment-based investigation strategy, PRLS-INVES, is designed for analyzing and evaluating the factors that affect the performance of a passive RFID location system. Through a case study on passive HF RFID location systems with this strategy, it is found that (1) the RFID infrastructure is the primary factor determining the localisation capability of an RFID location system, and (2) the localisation algorithm can improve accuracy and precision but is limited by the primary factor. A discussion on how to efficiently improve localisation accuracy and precision in passive HF RFID location systems is given.

Localization is one of the fundamental tasks in wireless sensor networks. Measuring the time of arrival (TOA) and the time difference of arrival (TDOA) of a signal are two widely used criteria for localization. In TOA-based schemes the target must be synchronous with the anchors, while in TDOA schemes the target can be asynchronous. In this paper, we propose a target localization scheme based on measuring TDOA for an inhomogeneous underwater environment. One property of inhomogeneous water is that waves travel over curved paths due to the inhomogeneity, which makes TDOA-based localization underwater a different problem from localization in terrestrial wireless sensor networks. Our proposed TDOA-based localization is developed as an iterative algorithm. Simulation results show that our proposed underwater-TDOA algorithm converges to the Cramer-Rao lower bound (CRLB) and outperforms the line-of-sight-TDOA algorithm in accuracy, as well as the underwater-TOA algorithm, for localizing an asynchronous target.
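A common iterative approach to this kind of TDOA problem is Gauss-Newton on the range-difference residuals. The sketch below assumes straight-line propagation, i.e. it is the line-of-sight baseline the paper compares against, not the proposed underwater algorithm; names and the iteration count are our own:

```python
import numpy as np

def tdoa_localize(anchors, tdoa, c=1500.0, iters=50):
    """2-D Gauss-Newton TDOA localisation under line-of-sight propagation.

    `tdoa[i]` is the arrival-time difference between anchor i+1 and anchor 0;
    `c` is the (assumed constant) sound speed in m/s.
    """
    anchors = np.asarray(anchors, float)
    d = np.asarray(tdoa, float) * c                   # range differences
    x = anchors.mean(axis=0)                          # initial guess: centroid
    for _ in range(iters):
        r = np.linalg.norm(anchors - x, axis=1)       # range to each anchor
        res = (r[1:] - r[0]) - d                      # range-difference residuals
        # Jacobian of each residual w.r.t. x
        J = (x - anchors[1:]) / r[1:, None] - (x - anchors[0]) / r[0]
        x = x - np.linalg.lstsq(J, res, rcond=None)[0]
    return x
```

In the curved-ray underwater setting, the straight-line ranges r would be replaced by travel times along refracted paths, which is where the proposed algorithm differs.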

We propose a Sequential Monte Carlo (SMC) method for filtering and prediction of time-varying signals under model uncertainty. Instead of resorting to model selection, we fuse the information from the considered models within the proposed SMC method. We achieve our goal by dynamically adjusting the resampling step according to the posterior predictive power of each model, which is updated sequentially as we observe more data. The method allows the models with better predictive powers to explore the state space with more resources than models lacking predictive power. This is done autonomously and dynamically within the SMC method. We show the validity of the presented method by evaluating it on an illustrative application.

Modern parallel computing devices, such as the graphics processing unit (GPU), have gained significant traction in scientific and statistical computing. They are particularly well-suited to data-parallel algorithms such as the particle filter, or more generally Sequential Monte Carlo (SMC), which are increasingly used in statistical inference. SMC methods carry a set of weighted particles through repeated propagation, weighting and resampling steps. The propagation and weighting steps are straightforward to parallelise, as they require only independent operations on each particle. The resampling step is more difficult, as standard schemes require a collective operation, such as a sum, across particle weights. Focusing on this resampling step, we analyse two alternative schemes that do not involve a collective operation (Metropolis and rejection resamplers), and compare them to standard schemes (multinomial, stratified and systematic resamplers). We find that, in certain circumstances, the alternative resamplers can perform significantly faster on a GPU, and to a lesser extent on a CPU, than the standard approaches. Moreover, in single precision, the standard approaches are numerically biased for upwards of hundreds of thousands of particles, while the alternatives are not. This is particularly important given greater single- than double-precision throughput on modern devices, and the consequent temptation to use single precision with a greater number of particles. Finally, we provide auxiliary functions useful for implementation, such as for the permutation of ancestry vectors to enable in-place propagation. Supplementary materials are available online.
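Of the collective-free schemes analysed above, the Metropolis resampler is the simplest to sketch: each particle runs a short chain over ancestor indices using only pairwise weight ratios, so no sum across all weights is required. The chain length and naming below are our own choices:

```python
import numpy as np

def metropolis_resample(weights, b=32, rng=None):
    """Metropolis resampler: returns an ancestor index per particle.

    Each particle starts at its own index and takes `b` Metropolis steps,
    accepting a uniformly proposed index j with probability min(1, w_j / w_k).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    k = np.arange(n)                          # each particle starts at itself
    for _ in range(b):
        j = rng.integers(0, n, size=n)        # uniform ancestor proposals
        u = rng.uniform(size=n)
        accept = u * weights[k] <= weights[j]  # accept with prob min(1, w_j / w_k)
        k = np.where(accept, j, k)
    return k
```

The comparison u * w_k <= w_j avoids dividing by possibly zero weights, and every particle's chain is independent, which is exactly what maps well onto one GPU thread per particle.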

Sound source localization is an important topic in expert systems involving microphone arrays, such as automatic camera steering systems, human–machine interaction, video gaming or audio surveillance. The Steered Response Power with Phase Transform (SRP-PHAT) algorithm is a well-known approach for sound source localization due to its robust performance in noisy and reverberant environments. This algorithm analyzes the sound power captured by an acoustic beamformer on a defined spatial grid, estimating the source location as the point that maximizes the output power. Since localization accuracy can be improved by using high-resolution spatial grids and a high number of microphones, accurate acoustic localization systems require high computational power. Graphics Processing Units (GPUs) are highly parallel programmable co-processors that provide massive computation when the needed operations are properly parallelized. Emerging GPUs offer multiple parallelism levels; however, properly managing their computational resources becomes a very challenging task. In fact, management issues become even more difficult when multiple GPUs are involved, adding one more level of parallelism. In this paper, the performance of an acoustic source localization system using distributed microphones is analyzed over a massive multichannel processing framework in a multi-GPU system. The paper evaluates and points out the influence that the number of microphones and the available computational resources have on the overall system performance. Several acoustic environments are considered to show the impact that noise and reverberation have on the localization accuracy and how the use of massive microphone systems combined with parallelized GPU algorithms can help to mitigate substantially adverse acoustic effects. In this context, the proposed implementation is able to work in real time with high-resolution spatial grids and using up to 48 microphones.
These results confirm the advantages of suitable GPU architectures in the development of real-time massive acoustic signal processing systems.
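A brute-force reference implementation of SRP-PHAT makes the parallelism structure clear: the loops over grid points and microphone pairs are independent, and they are what a GPU version distributes across threads. Function and variable names below are our own:

```python
import numpy as np

def srp_phat(frames, mic_pos, grid, fs, c=343.0):
    """SRP-PHAT over a grid of candidate source points.

    `frames` is (M, L) time-domain data from M microphones; returns the
    grid point with the maximum steered response power.
    """
    M, L = frames.shape
    n = 2 * L                                   # zero-pad for linear correlation
    F = np.fft.rfft(frames, n=n, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.zeros(len(grid))
    for g, p in enumerate(grid):
        delays = np.linalg.norm(mic_pos - p, axis=1) / c   # propagation delay per mic
        for i in range(M):
            for j in range(i + 1, M):
                R = F[i] * np.conj(F[j])
                R = R / (np.abs(R) + 1e-15)                # PHAT weighting
                tau = delays[j] - delays[i]
                # steer the whitened cross-spectrum to the candidate point
                power[g] += np.real(np.sum(R * np.exp(-2j * np.pi * freqs * tau)))
    return grid[np.argmax(power)]
```

The cost grows with grid resolution times the number of microphone pairs, which is why high-resolution grids with many microphones call for the GPU mapping the paper describes.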

Multipath propagation and reverberation of underwater acoustic signals affect the accuracy of localization estimation based on time difference of arrival (TDOA). To solve this problem, a novel algorithm is proposed to improve the accuracy of localization estimation of an underwater acoustic source. Based on the signals received at an array of sensors, a generic particle filtering framework for acoustic source localization is derived. The simulation results demonstrate the superiority of the proposed method: the resulting particle filter outperforms traditional acoustic source localization methods.

This article presents a sequential Monte Carlo (SMC) algorithm that can be used for any one-at-a-time Bayesian sequential design problem in the presence of model uncertainty where discrete data are encountered. Our focus is on adaptive design for model discrimination but the methodology is applicable if one has a different design objective such as parameter estimation or prediction. An SMC algorithm is run in parallel for each model and the algorithm relies on a convenient estimator of the evidence of each model that is essentially a function of importance sampling weights. Methods that rely on quadrature for this task suffer from the curse of dimensionality. Approximating posterior model probabilities in this way allows us to use model discrimination utility functions derived from information theory that were previously difficult to compute except for conjugate models. A major benefit of the algorithm is that it requires very little problem-specific tuning. We demonstrate the methodology on three applications, including discriminating between models for decline in motor neuron numbers in patients suffering from motor neuron disease. Computer code to run one of the examples is provided as online supplementary materials.
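The evidence estimator built from importance-sampling weights, and the model probabilities derived from it, can be sketched generically; this is standard SMC bookkeeping, not the paper's full design algorithm, and all names are our own:

```python
import numpy as np

def log_evidence_increments(incr_weights):
    """Running log-evidence estimate from SMC importance weights.

    The marginal likelihood is approximated by the product, over steps, of
    the mean unnormalised incremental weight; `incr_weights` is an iterable
    with one array of weights per observation step.
    """
    logZ = 0.0
    history = []
    for w in incr_weights:
        logZ += np.log(np.mean(w))
        history.append(logZ)
    return np.array(history)

def model_posterior(logZ_per_model, prior=None):
    """Posterior model probabilities from per-model log evidence."""
    logZ = np.asarray(logZ_per_model, float)
    if prior is not None:
        logZ = logZ + np.log(prior)
    p = np.exp(logZ - logZ.max())     # subtract max for numerical stability
    return p / p.sum()
```

Running one SMC filter per model and feeding the per-model log evidences into `model_posterior` gives the posterior model probabilities that the utility functions for model discrimination consume.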