All content in this area was uploaded by Kenia Picos on Mar 28, 2016
A Parallel CPU/GPU Algorithm for 3D Pose Estimation using CUDA and OpenMP
Kenia Picos1, Víctor H. Díaz-Ramírez1, Antonio S. Montemayor2, Juan J. Pantrigo2
kpicos@citedi.mx, vdiazr@ipn.mx, {antonio.sanz, juanjose.pantrigo}@urjc.es
1Instituto Politécnico Nacional (CITEDI-IPN), 2Universidad Rey Juan Carlos NVIDIA GPU Education Center
Motivation
Pose recognition is characterized by location and orientation parameters, which introduce
high complexity because of the huge number of views that a target can present within a scene.
Several factors, such as noise, incident light sources and geometrical distortions, modify the
appearance of a target with respect to an observation point, making object recognition a complex
task. An effective algorithm is needed to analyze the physical phenomena involved in the
visualization of a moving 3D object and to improve execution performance.
3D Pose Estimation
The proposed system employs an adaptive bank of space-variant correlation filters to solve 3D
target pose recognition. Each generalized matching filter (Eq. 1) is constructed from an image
template, which contains the parametric view of the target in a specified 3D pose.
H^*(\mu) = \frac{T(\mu) + m_b W(\mu) + m_t W_t(\mu)}{\frac{1}{2\pi}|W(\mu)|^2 * N_b^0(\mu) + \frac{1}{2\pi}|W(\mu)|^2 * N_t^0(\mu)}    Eq. (1)
DC = 1 - \frac{|c_b|^2}{|c_t|^2}    Eq. (2)
The output correlation function is given by c_i(x) = IFT{F(µ) H_i^*(µ)}. The best match is
determined by the maximum value of DC (discrimination capability) obtained from the computed
correlation planes (Eq. 2). The output estimate is selected as the highest DC value obtained
from the entire filter bank, defined by DC_best = max{DC_1, DC_2, ..., DC_M}.
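The correlation and DC evaluation described above can be sketched on the CPU as follows. This is a minimal illustration, not the poster's implementation: it uses a naive 1D DFT in place of CUFFT, and the function names (`dft`, `correlate`, `discrimination_capability`) and the `guard` parameter are hypothetical. It also assumes that c_t is the correlation peak and c_b is the largest value outside a small guard window around that peak; the poster does not define these two quantities.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(n^2) DFT, a CPU stand-in for CUFFT (assumption for this sketch).
// sign = -1: forward transform; sign = +1: inverse transform scaled by 1/n.
std::vector<cplx> dft(const std::vector<cplx>& x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double ang = sign * 2.0 * pi *
                static_cast<double>((k * j) % n) / static_cast<double>(n);
            acc += x[j] * std::polar(1.0, ang);
        }
        out[k] = (sign > 0) ? acc / static_cast<double>(n) : acc;
    }
    return out;
}

// c(x) = IFT{ F(mu) H*(mu) }: f is the input signal in the spatial domain,
// h_conj is the already-conjugated filter H*(mu) in the frequency domain.
std::vector<cplx> correlate(const std::vector<cplx>& f,
                            const std::vector<cplx>& h_conj) {
    assert(f.size() == h_conj.size());
    std::vector<cplx> spectrum = dft(f, -1);
    for (std::size_t i = 0; i < spectrum.size(); ++i) spectrum[i] *= h_conj[i];
    return dft(spectrum, +1);
}

// DC = 1 - |c_b|^2 / |c_t|^2 (Eq. 2). Assumption: c_t is the global peak and
// c_b is the largest value farther than `guard` samples (circularly) from it.
double discrimination_capability(const std::vector<cplx>& c, std::size_t guard) {
    assert(!c.empty());
    const std::size_t n = c.size();
    std::size_t peak = 0;
    for (std::size_t i = 1; i < n; ++i)
        if (std::norm(c[i]) > std::norm(c[peak])) peak = i;
    const double ct2 = std::norm(c[peak]);  // std::norm is the squared magnitude
    assert(ct2 > 0.0);
    double cb2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t d = (i > peak) ? i - peak : peak - i;
        d = std::min(d, n - d);  // circular distance to the peak
        if (d > guard) cb2 = std::max(cb2, std::norm(c[i]));
    }
    return 1.0 - cb2 / ct2;
}
```

For a quick check, `h_conj` can be the conjugated DFT of a template (a classical matched filter rather than the filter of Eq. 1); correlating it with a shifted copy of that template should give a peak at the shift and a DC close to 1.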
Figure 1: Pose estimation in a synthetic sequence (frames 0, 10, 20, 30 and 130).
Figure 2: Pose estimation in a real video sequence using a 3D printed model (frames 0, 10, 40, 60 and 80).
The input frame contains a target in an unknown pose, defined by the location, orientation
and scaling parameters of the target, embedded into a disjoint background. The appearance
of the target depends on the pose and on the lighting properties of the surface material. For this
reason, the image templates were rendered using OpenGL with the Phong lighting model,
including small highlight reflections. Figs. 1 and 2 show the pose estimation performance in
synthetic and real input video, respectively, both with 300 frames of 512 × 512 pixels.
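For reference, the Phong lighting model mentioned above sums ambient, diffuse and specular terms at each surface point. The sketch below evaluates the textbook single-light form on the CPU. It is not the poster's OpenGL rendering code, and the names `PhongMaterial` and `phong_intensity` and all coefficient values are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 normalize(const Vec3& v) {
    const double len = std::sqrt(dot(v, v));
    assert(len > 0.0);
    return {v[0] / len, v[1] / len, v[2] / len};
}

// Hypothetical material description: ambient, diffuse and specular
// coefficients plus the shininess exponent that controls highlight size.
struct PhongMaterial {
    double ka, kd, ks, shininess;
};

// Textbook Phong intensity for one light:
//   I = ka * ambient + kd * max(N.L, 0) * light + ks * max(R.V, 0)^shininess * light
// n: surface normal, l: direction to the light, v: direction to the viewer.
double phong_intensity(const PhongMaterial& m, Vec3 n, Vec3 l, Vec3 v,
                       double ambient, double light) {
    n = normalize(n);
    l = normalize(l);
    v = normalize(v);
    const double n_dot_l = std::max(0.0, dot(n, l));
    // Reflection of the light direction about the normal: R = 2 (N.L) N - L.
    const Vec3 r = {2.0 * n_dot_l * n[0] - l[0],
                    2.0 * n_dot_l * n[1] - l[1],
                    2.0 * n_dot_l * n[2] - l[2]};
    const double r_dot_v = std::max(0.0, dot(r, v));
    // No highlight when the light is behind the surface.
    const double specular = (n_dot_l > 0.0) ? std::pow(r_dot_v, m.shininess) : 0.0;
    return m.ka * ambient + m.kd * n_dot_l * light + m.ks * specular * light;
}
```

A larger `shininess` exponent narrows the specular lobe, which is one way to obtain the small highlight reflections described in the text.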
Results
The current proposal focuses on improving the execution of a 3D object recognition algorithm by
combining data and task parallelism, in order to take full advantage of a CPU/GPU
architecture. The implementation runs on a Linux OS with a multi-core host processor
(Intel i7) and an NVIDIA graphics processor (GTX780). The program was developed with CUDA
CUFFT, with OpenGL interop for 3D graphic model generation and OpenMP for CPU thread
parallelization, and was compiled with -O3 optimization.
Thread 0: Filter bank 0 → Correlation processes c0∼c7 → Evaluation DC0∼DC7
Thread 1: Filter bank 1 → Correlation processes c8∼c15 → Evaluation DC8∼DC15
Thread 2: Filter bank 2 → Correlation processes c16∼c23 → Evaluation DC16∼DC23
···
Thread 7: Filter bank 7 → Correlation processes c56∼c63 → Evaluation DC56∼DC63
All threads → Find best DC value
Figure 3: Parallel implementation using OpenMP and CUDA CUFFT.
The proposed algorithm computes a bank of M filters with N CPU threads. As shown in
Fig. 3, each thread processes a fixed portion (M/N) of the bank. The correlation processes
of each portion are then computed in a GPU kernel using CUFFT. Finally, the best match is the
filter with the maximum DC value over the entire bank, which gives the best correspondence of
location and orientation parameters of the target. The proposed parallel approach reduces the
execution time for filter banks of up to 1000 filters processed in the frequency domain.
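The partitioning of Fig. 3 can be sketched with an OpenMP loop in which each iteration plays the role of one thread's share of the bank. This is an illustrative sketch under stated assumptions: `find_best_filter` and `BestMatch` are hypothetical names, and `evaluate_dc` is a placeholder callback standing in for the GPU correlation and DC evaluation of one filter, which the poster performs with CUFFT. Without OpenMP enabled at compile time the pragma is ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Index of the winning filter and its DC value.
struct BestMatch {
    std::size_t index;
    double dc;
};

// Split a bank of M filters among N threads (about M/N filters each),
// evaluate the DC of every filter, and return the filter with the largest DC.
// evaluate_dc is a hypothetical callback; it must be safe to call concurrently.
BestMatch find_best_filter(std::size_t M, int N,
                           const std::function<double(std::size_t)>& evaluate_dc) {
    assert(M > 0 && N > 0);
    const double lowest = -std::numeric_limits<double>::infinity();
    std::vector<BestMatch> local(static_cast<std::size_t>(N), BestMatch{0, lowest});

    // One loop iteration per thread: iteration t handles filters [begin, end).
    #pragma omp parallel for num_threads(N) schedule(static)
    for (int t = 0; t < N; ++t) {
        const std::size_t ut = static_cast<std::size_t>(t);
        const std::size_t un = static_cast<std::size_t>(N);
        const std::size_t begin = ut * M / un;
        const std::size_t end = (ut + 1) * M / un;
        for (std::size_t i = begin; i < end; ++i) {
            const double dc = evaluate_dc(i);
            if (dc > local[ut].dc) local[ut] = BestMatch{i, dc};
        }
    }

    // Final step of Fig. 3: find the best DC value among the per-thread results.
    BestMatch best = local[0];
    for (std::size_t t = 1; t < local.size(); ++t)
        if (local[t].dc > best.dc) best = local[t];
    return best;
}
```

With M = 64 and N = 8 this reproduces the layout of Fig. 3: thread 0 evaluates filters 0 to 7, thread 1 evaluates filters 8 to 15, and so on. Keeping one result slot per thread avoids locking inside the loop; the final reduction over N entries is cheap compared with the correlations.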
Location error    Orientation error    Sequential evaluation    OpenMP+CUDA parallel evaluation    Speedup with 1000 filters
8.9±0.6 pixels    5.6±0.4 degrees      200 s                    68 s                               3×
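As a quick check of the reported figures, the speedup is the ratio of sequential to parallel execution time. The symbols S, T_seq and T_par are introduced here only for illustration.

```latex
S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}} = \frac{200\ \mathrm{s}}{68\ \mathrm{s}} \approx 2.94
```

This rounds to the 3× speedup listed for the 1000-filter bank.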
Conclusions
This work presents a proposal for pose recognition with adaptive correlation filters to accurately
estimate the location and orientation of a target within a 3D space. A parallel algorithm was
designed to execute a massive number of concurrent correlation processes in the frequency domain.
A CPU/GPU implementation using OpenMP and CUDA achieves a speedup of almost
3 times over sequential execution. Our future work will focus on optimizing the performance of
multi-pose object recognition for real-time applications.
Contact name: Kenia Picos, kenia.picos@gmail.com
Poster: P6274
Category: Computer Vision & Machine Vision - CVMV01