Research Article

Autonomous positioning control of manipulator and fast surface fitting based on particle filter and point cloud library technology

Leiyuan Li (Century College, Beijing University of Posts and Telecommunications, Beijing, China) and Yanzhu Hu (College of Automation, Beijing University of Posts and Telecommunications, Beijing, China)

Corresponding author: Leiyuan Li, Century College, Beijing University of Posts and Telecommunications, Beijing 102101, China. Email: lileiyuan1985@163.com
Abstract
The real-time calculations of the positioning error, error correction, and state analysis have always been a difficult challenge in the process of autonomous positioning. In order to solve this problem, a simple depth imaging equipment (Kinect) is used, and a particle filter based on three-frame subtraction to capture the end-effector's motion is proposed in this article. Further, a back-propagation neural network is adopted to recognize targets. The point cloud library technology is used to collect the space coordinates of the end-effector and target. Finally, a three-dimensional mesh simplification algorithm based on the density analysis and average distance between points is proposed to carry out data compression. Accordingly, the target point cloud is fitted quickly. The experiments conducted in the article demonstrate that the proposed algorithm can detect and track the end-effector in real time. A recognition rate of 99% is achieved for a cylindrical object. The geometric center of all particles is regarded as the end-effector's center. Furthermore, the gradual convergence of the end-effector center to the target centroid shows that the autonomous positioning is successful. Compared to traditional algorithms, both the moving end-effector and a stationary object can be extracted from image frames using the proposed method. The article presents a simple and convenient positioning method, which adjusts the motion of the manipulator according to the error between the end-effector's center and target centroid. The computational complexity is reduced and the camera calibration is eliminated.
Keywords
Autonomous positioning, particle filter, manipulator, BP neural network, point cloud library
Date received: 9 November 2016; accepted: 2 August 2017
Topic: Special Issue - Biologically-Inspired Vision Systems in Robotics
Topic Editor: Antonio Fernandez-Caballero
Associate Editor: Ismael Garcia Varea
Introduction
In computer vision, 3-D reconstruction refers to the process of 3-D scene restoration based on single view or multiple view images. There are four kinds of reconstruction methods: binocular stereo vision, sequential imaging, photometric stereo, and motion view analysis. The binocular stereo method is suitable for larger objects. The sequential imaging method is suitable for small objects. The photometric stereo and motion view analysis methods are
suitable for large and complex scene reconstruction. Because single video information is incomplete, reconstruction based on it needs to use empirical knowledge. Multi-view reconstruction is relatively easy. It involves the calculation of the pose relation between the image coordinate frame and the world coordinate frame. Then, 3-D information is reconstructed using the plurality of 2-D images. But the computational complexity and cost is usually high. For 3-D reconstruction using simple depth image apparatus, there are a few studies showing certain results. Izadi et al.1 propose a new 3-D reconstruction technology based on Microsoft Kinect. However, due to the limitation of the time of flight (TOF) technology, the accuracy of surface texture information is not high. The purpose of 3-D reconstruction is to provide the real-time monitoring of the end-effector positioning process and error. At the same time, it can lay a foundation for the debugging and analysis of the manipulator autonomy job task. Ma et al.2 propose a gradual human reconstruction method based on individual Kinect. Body feature points are positioned in depth video frames combined with feature point detection and an error correction processing algorithm. The human body model is obtained by estimating the body size. Guo and Gao3 propose a robust automatic unmanned aerial vehicle (UAV) image reconstruction method using a batch framework. Li et al.4 find a multi-view reconstruction method from the perspective of motion visual analysis. A sparse point cloud and initial mesh are built using each view bias model. Lv et al.5 propose a Bayesian network model that describes body joint spatial relationships and dynamic characteristics. A golf swing process 3-D reconstruction system is built based on the similarities of swing movements. The problem of limb occlusion is effectively solved using an easy depth imaging device to capture motions. Lin et al.6 utilize an adaptive window stereo matching reconstruction method based on the integral gray variance and integral gradient variance. The image texture quality is determined according to the integral variance size. Related calculations are performed if the variance is greater than a preset threshold. The method needs to traverse the whole image to obtain the dense disparity map. Izadi et al.7 gather point cloud data using a single mobile and four fixed Kinect devices. The point cloud alignment and fitting problems are solved using iterative closest points. Kahl and Hartley8 convert 3-D reconstruction into a norm minimization problem. A closure approximate solution is derived using second-order cone programming. In the case of known camera rotation, the problem can be solved simultaneously for the shift and space position of the camera. Carbone and Gomez-Bravo9 introduce a method for the vision-based motion control of robot manipulators. A dynamic look-and-move system architecture is discussed, as a robot-vision system, which is closed at the task level. Kanatani et al.10 describe techniques for 3-D reconstruction from multiple images and summarize mathematical theories of statistical error analysis for general geometric estimation problems. Gruen and Thomas11 address the question of where and how an imaged object is located in the object space. Basic component algorithms, such as image matching, segmentation, feature extraction, and so on, are favorably supported and constrained by the use of orientation parameters.
In this article, the target object is determined using back-propagation (BP) neural network recognition. The end-effector is extracted using motion state estimation based on a particle filter (PF). Then, the target centroid (TC) and end-effector center (EEC) are extracted. The spatial coordinates of the EEC and TC are calculated using coordinate frame transforming. The errors between the EEC and TC are obtained. The error is mapped to the joint angles by inverse kinematics modeling. Then, the motion of the manipulator is adjusted using forward kinematics modeling. Therefore, autonomous positioning is achieved. The general flowchart is shown in Figure 1. The algorithm effectiveness can be verified using fast surface fitting.
End-effector motion estimation
The position of the EEC can be obtained by motion estima-
tion. The image coordinates of this center point can be
transformed into the space coordinates according to hand-
eye calibration. Then, an inverse kinematics model is used
to transform the space coordinates of the center point into
the joint angles. Accordingly, the end-effector is controlled
to reach the target position.
Figure 1. General flowchart.
The moving object and static background are separated from frame-stream images using the three-frame differencing method proposed by Martin and Robert.12 The consecutive three-frame differencing method can better deal with environmental noise, such as the weather, light, shadows, and messy background interference. It is shown to be better than the two-adjacent differencing method in double-shadow treatment proposed by Andre et al.13 The
expressions are as follows
$$d_{i,i-1}(x,y) = |I_i(x,y) - I_{i-1}(x,y)|, \qquad d_{i+1,i}(x,y) = |I_{i+1}(x,y) - I_i(x,y)| \tag{1}$$

where I_{i-1}(x,y) denotes the pixel distribution of the previous image, I_i(x,y) denotes the pixel distribution of the current image, I_{i+1}(x,y) denotes the pixel distribution of the next image, and d_{i,i-1}(x,y) and d_{i+1,i}(x,y) denote difference images. The binary form of the differencing image is obtained by selecting an appropriate threshold

$$b_{i,i-1}(x,y) = \begin{cases} 1, & \text{if } d_{i,i-1}(x,y) \geq T \\ 0, & \text{if } d_{i,i-1}(x,y) < T \end{cases} \qquad b_{i+1,i}(x,y) = \begin{cases} 1, & \text{if } d_{i+1,i}(x,y) \geq T \\ 0, & \text{if } d_{i+1,i}(x,y) < T \end{cases} \tag{2}$$

where b_{i,i-1}(x,y) denotes the binary image and T denotes the threshold manually set to divide the foreground and background. If the value is less than T, it is the background; otherwise, it is the foreground. The "and" operation is done on the binary images. The double shadow becomes a single shadow. Another binary image is obtained

$$B_i(x,y) = \begin{cases} 1, & \text{if } b_{i,i-1}(x,y)\,\&\,b_{i+1,i}(x,y) = 1 \\ 0, & \text{if } b_{i,i-1}(x,y)\,\&\,b_{i+1,i}(x,y) \neq 1 \end{cases} \tag{3}$$
Then, the morphology erosion and the dilation are
applied to the binary image to remove holes.
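To make this step concrete, a minimal Python/OpenCV sketch of the three-frame differencing and morphology stage is given below. The threshold value, kernel size, and function names are illustrative assumptions rather than the article's exact settings (the experiments later use a normalized threshold T = 0.0118 on MATLAB-style images).

```python
import cv2
import numpy as np

def three_frame_diff(prev_frame, cur_frame, next_frame, T=30):
    """Moving-region extraction from three consecutive grayscale frames (equations (1)-(3))."""
    d1 = cv2.absdiff(cur_frame, prev_frame)              # d_{i,i-1}(x, y)
    d2 = cv2.absdiff(next_frame, cur_frame)              # d_{i+1,i}(x, y)
    _, b1 = cv2.threshold(d1, T, 1, cv2.THRESH_BINARY)   # binarize with threshold T
    _, b2 = cv2.threshold(d2, T, 1, cv2.THRESH_BINARY)
    B = cv2.bitwise_and(b1, b2)                           # "and" operation removes the double shadow
    kernel = np.ones((3, 3), np.uint8)
    # Morphological closing (dilation followed by erosion) fills small holes in the mask.
    B = cv2.morphologyEx(B, cv2.MORPH_CLOSE, kernel)
    return B
```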
The motion state is estimated after the moving end-effector
is extracted. So, it is convenient to calculate the position error
between the end-effector and target object. The motion of the
end-effector is nonlinear in video frames. The PF state estimation is adopted, as in the study by Suman et al.14 A series of random particle sets S = {s^(n), p^(n) | n = 1, 2, ..., M} with weights is used to approximate the posterior probability density function. s^(n) denotes the sampling and it describes the prediction of the target state. p^(n) denotes the discrete sampling probability. The PF is independent of the model of the system without linearization errors or Gaussian noise restrictions. It can be applied to any state and measurement model in any environment. The end-effector's state equation and measurement equation are as follows

$$x_{t+1} = F(x_t, N_t), \qquad y_t = G(x_t, V_t) \tag{4}$$
where x_t denotes the end-effector's position state at time t, and y_{0:t} denotes the observation sequence {y_0, y_1, ..., y_t} from the beginning time to the current time t. {y_0, y_1, ..., y_t} corresponds to the position distribution sequence {x_0, x_1, ..., x_t}. The state equation describes the state transition probability p(x_{t+1} | x_t). The function F is the target state function in the time domain. The measurement equation describes the conditional probability p(y_t | x_t) in the presence of measurement noise. G is a nonlinear measurement function. The independent dynamic vectors N_t and V_t denote random white noise sequences in the discrete time domain. N_t denotes the state noise. V_t denotes the measurement noise. The autonomous positioning system of the manipulator is nonlinear and non-Gaussian. Theoretically, when the number of particles N → ∞, the Monte Carlo method can always be used to approximate any real probability distribution. In the case of a given measurement sequence y_{0:t}, the end-effector's position x_t at the current time is estimated.
In addition, assume that the target state follows a Markov process, namely, the current state depends only on the previous state. When the conditional probability p(x_t | y_{0:t}) is used to describe a given observation sequence y_{0:t}, the state transition probability of the end-effector p(x_t | x_{0:t-1}) can be expressed as p(x_t | x_{t-1}). When the measured values are independent of the state space, according to the Bayes rule, the conditional probability p(x_t | y_{0:t}) is calculated as follows

$$p(x_t \mid y_{0:t}) = \frac{p(y_t, y_{0:t-1} \mid x_t)\, p(x_t)}{p(y_t, y_{0:t-1})} = \frac{p(y_t \mid y_{0:t-1}, x_t)\, p(y_{0:t-1} \mid x_t)\, p(x_t)}{p(y_t \mid y_{0:t-1})\, p(y_{0:t-1})} = \frac{p(y_t \mid y_{0:t-1}, x_t)\, p(x_t \mid y_{0:t-1})\, p(y_{0:t-1})\, p(x_t)}{p(y_t \mid y_{0:t-1})\, p(y_{0:t-1})\, p(x_t)} = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{0:t-1})}{p(y_t \mid y_{0:t-1})} \tag{5}$$
where p(y_t | x_t) defines the system measurement noise model. The a priori probability distribution p(x_t | y_{0:t-1}) describes the state transition probability density and system knowledge model. At time t − 1, according to the conditional probability density p(x_{t-1} | y_{0:t-1}) of the end-effector's motion state, the following formula is derived using the Chapman–Kolmogorov equation

$$p(x_t \mid y_{0:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{0:t-1})\, dx_{t-1} \tag{6}$$

In equation (5), p(y_t | y_{0:t-1}) denotes the measurement vector

$$p(y_t \mid y_{0:t-1}) = \int p(y_t \mid x_t)\, p(x_t \mid y_{0:t-1})\, dx_t \tag{7}$$
When the number of particle sets is large enough, these particles are set to replace the true posterior probability distribution. The end-effector's posterior probability distribution p(x_t | y_{0:t}) can be approximated by N independent identically distributed (IID) random variables

$$p(x_t \mid y_{0:t}) \approx \frac{1}{N} \sum_{i=1}^{N} \delta(x_t - x_t^i) = \hat{p}(x_t \mid y_{0:t}) \tag{8}$$

where δ(x_t − x_t^i) denotes the Dirac unit pulse function, and x_t^i denotes an IID sample of p(x_t | y_{0:t}). When N is large enough, \hat{p}(x_t | y_{0:t}) can effectively approach the posterior distribution p(x_t | y_{0:t}). But, when p(x_t | y_{0:t}) is a high-dimensional probability distribution, its direct sampling is very difficult. Usually, a proposal distribution q(x_t | y_{0:t}) is selected to obtain a sample. Then, the nonlinear function h(x_t) of any state variable x_t can be estimated

$$E(h(x_t)) = \int h(x_t)\, p(x_t \mid y_{0:t})\, dx_t = \int h(x_t)\, \frac{p(y_{0:t} \mid x_t)\, p(x_t)}{p(y_{0:t})\, q(x_t \mid y_{0:t-1})}\, q(x_t \mid y_{0:t-1})\, dx_t = \frac{1}{p(y_{0:t})} \int h(x_t)\, \omega(x_t)\, q(x_t \mid y_{0:t})\, dx_t \tag{9}$$

where ω(x_t) = p(y_{0:t} | x_t) p(x_t) / q(x_t | y_{0:t-1}) denotes a nonnegative weight of sample x_t^i. Normalization:

$$\tilde{\omega}(x_t^i) = \frac{\omega(x_t^i)}{\sum_{j=1}^{N} \omega(x_t^j)} \tag{10}$$

So, E(h(x_t)) can be approximated as follows

$$E(h(x_t)) \approx \sum_{i=1}^{N} \tilde{\omega}(x_t^i)\, h(x_t^i) \tag{11}$$
When the sampling data q(x_{0:t-1} | y_{0:t-1}) at time t − 1 and the latest sample q(x_t | x_{0:t-1}, y_{0:t}) at time t are known, the proposal distribution can be broken down into

$$q(x_{0:t} \mid y_{0:t}) = q(x_{0:t-1} \mid y_{0:t-1})\, q(x_t \mid x_{0:t-1}, y_{0:t}) = q(x_0) \prod_{k=1}^{t} q(x_k \mid x_{0:k-1}, y_{0:k}) \tag{12}$$

The recursive calculation of the weight ω(x_t^i) is

$$\omega(x_t^i) = \frac{p(y_{0:t} \mid x_t)\, p(x_t)}{q(x_t \mid y_{0:t-1})} \propto \frac{p(x_{0:t}^i \mid y_{0:t})}{q(x_{0:t}^i \mid y_{0:t})} = \frac{p(y_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t^i \mid x_{0:t-1}^i, y_{0:t})} \cdot \frac{p(x_{0:t-1}^i \mid y_{0:t-1})}{q(x_{0:t-1}^i \mid y_{0:t-1})} = \omega(x_{t-1}^i)\, \frac{p(y_t \mid x_t^i)\, p(x_t^i \mid x_{t-1}^i)}{q(x_t^i \mid x_{0:t-1}^i, y_{0:t})} \tag{13}$$
Object recognition
Image preprocessing
This section describes preprocessing based on Kinect RGB
images. Target recognition is illustrated by an example
using a cylindrical target object (CTO). The end-effector
and CTO appear in the same video. First, image gray
processing is carried out to convert color images to gray
ones in order to reduce computation. The gray image
gradation is 0–255. The grayscale method, Gray = 0.114B + 0.587G + 0.299R, is used, as in the study by Refael and Richard.15
Second, image median filtering is carried out. It is a type of nonlinear smoothing method. Chang et al.16 find that it does not blur edges while suppressing random noise. The gray values of pixels are sorted in a sliding window. The original gray value of the pixel in the window center is substituted by the median.
Third, mathematical morphology operations are carried out. Dilation and erosion operations are used in the study by Chang et al.17 They are widely used in edge detection, image segmentation, image thinning, noise filtering, and so on. Assume that E(x, y) denotes the binary image and B(s, t) denotes structural elements. The following operations are used.

(I) Morphology dilation

$$X = E \oplus B = \{(s,t) : B(s,t) \cap E \neq \varnothing\} \tag{14}$$

(II) Morphology erosion

$$Y = E \ominus B = \{(s,t) : B(s,t) \subseteq E\} \tag{15}$$
Then, the weighted fusion between the input image and its "canny" operator detection is conducted. The threshold segmentation of the fusion image is carried out. Image segmentation is a basis for determining feature parameters. The whole contour of the object is obtained after image segmentation. An example is demonstrated in Figure 2. The shaded area represents the boundary. Finally, this should make the system capable of automatically extracting geometric features. These features should stay invariant when the image is transformed by translation, rotation, twisting, scaling, and so on.
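A possible OpenCV rendering of this preprocessing chain (grayscale conversion, median filtering, fusion with the Canny edge map, threshold segmentation, and morphology) is sketched below; the fusion weights, kernel sizes, and Canny thresholds are assumed values for illustration.

```python
import cv2
import numpy as np

def preprocess(bgr_frame):
    """Grayscale -> median filter -> weighted fusion with Canny edges -> binary segmentation."""
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)      # Gray = 0.114B + 0.587G + 0.299R
    smoothed = cv2.medianBlur(gray, 5)                       # nonlinear smoothing, edges preserved
    edges = cv2.Canny(smoothed, 50, 150)                     # "canny" operator detection
    fused = cv2.addWeighted(smoothed, 0.7, edges, 0.3, 0)    # weighted fusion of image and edge map
    _, binary = cv2.threshold(fused, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.dilate(cv2.erode(binary, kernel), kernel)   # erosion then dilation to clean the mask
    return binary
```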
There are two kinds of CTO feature parameters: edge
contour features and shape parameters. The parameters of
contour points are edge contour features. Shape parameters
include the perimeter, area, longest axis, azimuth, boundary
matrix, and shape coefficient.
Figure 2. Calculation example of feature parameters.
(I) Contour points represent the required number of
pixels which can outline the contour. The number
of contour points is 22 in Figure 2.
(II) The perimeter represents the contour length of the outer boundary. It can be calculated as the sum of the distances between two adjacent pixels on the outer boundary. Assume that the distance between two edge pixels sharing a side is 1; otherwise, it is √2. So, the perimeter is 14 + 8√2 in Figure 2.
(III) The area can be represented as the number of
pixels in the target region. So, the area is 41 in
Figure 2.
(IV) The longest axis denotes the maximum extension
length of the target region, that is, the connection
line of the maximum distance between two pixel
points on the outer boundary. So, the longest axis
is 8 in Figure 2.
(V) The azimuth represents the angle between the
longest axis and the x-axis in the target region.
So, the azimuth is 0 in Figure 2.
(VI) The boundary matrix denotes the minimum matrix encompassing the target region. It is also the intuitive expression of the flat level of the target region. It is composed of four outer boundary tangents. Two of them are parallel to the longest axis, and the other two are perpendicular to the longest axis. So, in Figure 2, the boundary matrix is

$$M = \begin{bmatrix} 0&1&1&1&1&0&0&0 \\ 0&1&0&0&0&1&1&0 \\ 0&0&1&0&0&0&0&1 \\ 1&1&0&0&0&0&0&1 \\ 1&0&0&0&0&0&1&0 \\ 0&1&0&1&1&0&1&0 \\ 0&1&1&0&0&1&1&0 \end{bmatrix}$$
(VII) The shape coefficient denotes the ratio of the
area to the square of the perimeter. So, the shape
coefficient is 0.0639 in Figure 2.
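The shape parameters listed above could be computed from a binary segmentation roughly as follows; the sketch relies on standard OpenCV contour functions, so the pixel-level conventions in the article (for example, the 1 versus sqrt(2) perimeter rule) are only approximated.

```python
import cv2
import numpy as np

def shape_features(binary_mask):
    """Contour points, perimeter, area, longest axis, azimuth, and shape coefficient of the largest region."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea)            # keep the largest connected region
    n_points = len(contour)                                  # number of contour points
    perimeter = cv2.arcLength(contour, True)                 # boundary length
    area = cv2.contourArea(contour)                           # pixels enclosed by the boundary
    pts = contour.reshape(-1, 2).astype(np.float64)
    # Longest axis: maximum distance between two boundary points (brute-force pairwise search).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    longest_axis = d[i, j]
    # Azimuth: angle between the longest axis and the image x-axis, in degrees.
    azimuth = np.degrees(np.arctan2(pts[j, 1] - pts[i, 1], pts[j, 0] - pts[i, 0]))
    shape_coeff = area / perimeter ** 2                       # ratio of area to squared perimeter
    return n_points, perimeter, area, longest_axis, azimuth, shape_coeff
```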
Neural network recognition
This section describes recognition based on Kinect RGB
images. A BP neural network learning algorithm is used in
this article. This network can learn and store a large amount of
input–output mappings without requiring an explicit mathematical description of these mappings in advance. Its
learning rule is the gradient descent method. The mean square
error (MSE) is minimized by adjusting the network weights
and thresholds continuously. The network topology includes
the input layer, the hidden layer, and the output layer.
Feature vectors are extracted as a training sample. The
neural network is used as a classifier instead of the Eucli-
dean distance method to implement target recognition.
The design of the input and output layers is as follows. The number of nodes is 7 in the input layer. The elements of the input vector are {contour points, perimeter, area, longest axis, azimuth, boundary matrix, shape coefficient}. The number of nodes is 3 in the output layer. The elements of the output vector are {cylinder, square, spherical}. The normalized values of the output are 0.1, 0.2, and 1, respectively. The design of the hidden layer is as follows: there are two hidden layers, which include a logarithmic characteristic function and the "purelin" function. The number of nodes is 20 in the first hidden layer. The number of nodes is 3 in the second hidden layer. A linear excitation function is used in the output layer. The number of hidden layers is related to the number of neurons and the specific problem. At present, it is difficult to provide an accurate function to describe this relation. Experiments show that the accuracy of the network does not necessarily increase when the number of hidden layers and neurons is increased. The initial number of hidden-layer nodes can be selected as

$$n_0 = \sqrt{m + n} + a \tag{16}$$

where m denotes the number of neurons in the input layer, n denotes the number of neurons in the output layer, and a denotes an integer from 1 to 10. Here, n_0 is set to 15.
The sample set is collected from a shooting scene of
Kinect. In Figure 3, there are cylindrical objects, square
objects, and spherical objects. There are 30 cylindrical
objects (Figure 3(a)), 10 square objects (Figure 3(b)), and
10 spherical objects (Figure 3(c)). For each target object,
there are 20 different viewing angles (the schematic dia-
gram is shown in Figure 3(d) to (f)). So, the numbers of
cylindrical objects, square objects, and spherical objects are
600, 200 and 200, respectively, in the sample set. In Figure
3(g), the edge contour is extracted. For the number of sam-
ples, there is no exact formula in any references. But, it is
not the case where more is better. There are 970 samples in
the study by Shi et al.18 and 150 samples in the study by Liu and You.19 In addition, Hao and Jiang20 prove that a sample size of 1000 is reasonable according to the sample choice method.
Network training: The weights of the neurons are adjusted in the training process. The training stops when the MSE reaches 10^{-7}. The maximum number of iterations is set to 10,000. The momentum constant is set to 0.8. The initial learning rate is 0.01. The ratio of increase in the learning rate is 1.05. The decrease ratio of the learning rate is 0.7. The sample set is normalized to the range [0, 1]. The normalization function is

$$f(x) = \frac{1}{1 + e^{-x}} \tag{17}$$
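A compact NumPy sketch of a 7-20-3 network with a logistic hidden layer and a linear output stage is shown below. It trains by plain gradient descent on the MSE, so it only illustrates the structure of the classifier; the momentum term, adaptive learning-rate schedule, and MATLAB training routine described above are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNet:
    """Minimal 7-20-3 back-propagation network trained by gradient descent on the MSE."""

    def __init__(self, n_in=7, n_hidden=20, n_out=3, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 + self.b1)   # logistic hidden layer
        return self.H @ self.W2 + self.b2          # linear output layer

    def train_step(self, X, Y):
        P = self.forward(X)
        err = P - Y                                 # output error
        dW2 = self.H.T @ err / len(X)               # gradient of the MSE w.r.t. W2
        dH = (err @ self.W2.T) * self.H * (1.0 - self.H)
        dW1 = X.T @ dH / len(X)
        self.W2 -= self.lr * dW2; self.b2 -= self.lr * err.mean(axis=0)
        self.W1 -= self.lr * dW1; self.b1 -= self.lr * dH.mean(axis=0)
        return np.mean(err ** 2)                    # MSE used as the stopping criterion
```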
The TC is calculated after target identification. The
shape of the target object is regular, so the spatial position
of the TC is the destination of the end-effector positioning.
The TC is calculated as follows
$$x = \frac{\sum_{i=i_b}^{i_e} \sum_{j=j_b}^{j_e} i\,(p_{ij} - T)}{\sum_{i=i_b}^{i_e} \sum_{j=j_b}^{j_e} (p_{ij} - T)}, \qquad y = \frac{\sum_{i=i_b}^{i_e} \sum_{j=j_b}^{j_e} j\,(p_{ij} - T)}{\sum_{i=i_b}^{i_e} \sum_{j=j_b}^{j_e} (p_{ij} - T)} \tag{18}$$

where i_b and i_e denote the minimum pixel and maximum pixel of the target object, respectively, along the row direction, j_b and j_e denote the minimum pixel and maximum pixel of the target object, respectively, along the column direction, T denotes the adaptive threshold, and p_{ij} denotes the grayscale value.
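Equation (18) is an intensity-weighted centroid over the thresholded target window; an illustrative NumPy version follows, where the window bounds and the adaptive threshold are passed in as parameters and pixels below the threshold are clipped to zero.

```python
import numpy as np

def target_centroid(gray, i_b, i_e, j_b, j_e, T):
    """Intensity-weighted centroid of the target window (equation (18))."""
    region = gray[i_b:i_e + 1, j_b:j_e + 1].astype(np.float64) - T
    region[region < 0] = 0.0                                 # ignore pixels below the threshold
    ii, jj = np.mgrid[i_b:i_e + 1, j_b:j_e + 1]              # row and column index grids
    total = region.sum()
    x_c = (ii * region).sum() / total                        # row coordinate of the centroid
    y_c = (jj * region).sum() / total                        # column coordinate of the centroid
    return x_c, y_c
```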
Autonomous positioning
Point cloud acquisition
The motion end-effector extracted from image frames
belongs to the active object in the positioning system. Tar-
get objects identified from image frames belong to the
passive object in the positioning system. After the active
object and passive object are determined in the sections
“End-effector Motion Estimation” and “Object
Recognition,” their point cloud data (also called the target
point cloud, TPC) is obtained by Kinect. It contains the 3-D
coordinates and RGB information of each point.
The point cloud library (PCL) is an open-source point
cloud processing library. It includes point cloud data acqui-
sition and processing, filtering, feature extraction, surface
reconstruction, point cloud registration, point cloud inte-
gration, and so on. The OpenNI is used as the system I/O
interface in the PCL. Richard et al.21 suppose that the distance is d_raw from a point P to the camera in the Kinect measurement space. According to the 3-D coordinate extraction method provided by Zhu et al.,22 the actual position of point P in the space is

$$d = K \tan(H\, d_{raw} + L) - O \tag{19}$$

where d and d_raw are in centimeters, H = 3.5 × 10^{-4} rad, K = 12.36 cm, O = 3.7 cm, and L = 1.18 rad. Then, the coordinate (x, y, z)/cm is extracted as

$$x = (u - u_0)\, d / f_x, \qquad y = (v - v_0)\, d / f_y, \qquad z = d \tag{20}$$
Figure 3. Image preprocessing. (a) 30 kinds of cylindrical objects. (b) 10 kinds of square objects. (c) 10 kinds of spherical objects. (d)
RGB images and binary segmentation of cylindrical objects from different perspectives. (e) RGB images and binary segmentation of
square objects from different perspectives. (f) RGB images and binary segmentation of spherical objects from different perspectives. (g)
Edge contour extraction.
where (u, v) denotes the projection position of point P in the image frame, and

f_x = 587.944 pixel, f_y = 589.418 pixel, u_0 = 313.222 pixel, v_0 = 259.304 pixel

are the official calibration data. f_x and f_y denote the conversion factors between the imaging plane pixels and the physical length in space. u_0 and v_0 denote the image origin.
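The conversion of equations (19) and (20) can be coded directly with the constants quoted above. Because the extracted formula for (x, y) is ambiguous about the role of f_x and f_y, the sketch below assumes the usual pinhole back-projection, dividing by the focal lengths in pixels.

```python
import numpy as np

# Constants of equation (19) and the official calibration data (centimeters, radians, pixels).
H, K, O, L = 3.5e-4, 12.36, 3.7, 1.18
FX, FY, U0, V0 = 587.944, 589.418, 313.222, 259.304

def kinect_point(u, v, d_raw):
    """Convert a raw Kinect reading at pixel (u, v) into a 3-D point (x, y, z) in centimeters."""
    d = K * np.tan(H * d_raw + L) - O          # equation (19): raw reading to metric depth
    x = (u - U0) * d / FX                      # equation (20): pinhole back-projection
    y = (v - V0) * d / FY
    z = d
    return np.array([x, y, z])
```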
Point cloud processing
The number of points in the TPC is of order 10^6 in each frame. The processing of huge and unordered data consumes a lot of time and hardware resources. It affects the speed of the algorithm. He et al.23 find that noise in a point cloud can be removed by preprocessing based on density analysis. The point cloud distribution obtained from Kinect is not uniform. The average distance between each point and its neighborhood points follows an approximately Gaussian distribution. There are some outliers in the point cloud data. Their neighborhood points are always farther away, so the neighborhood average distance (NAD) s_i is generally large. If the average distance is not in the standard range determined by the mean μ and standard deviation σ, the point is defined as an outlier, and it should be removed. The probability density function of the NAD is

$$f(s_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(s_i - \mu)^2}{2\sigma^2}\right), \quad i = 1, 2, 3, \ldots \tag{21}$$

where s_i denotes the NAD of any point, μ denotes the mean of the NAD, and σ denotes the standard deviation. The number of neighborhood points is set to m. The multiple of the standard deviation is set to n. When the NAD of a point exceeds the global average distance μ + nσ, the point should be marked as an outlier and removed. When m increases and n decreases, less noise is determined to be outliers. m and n are set to 25 and 4, respectively, by experiments.
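This density-based filtering amounts to statistical outlier rejection on the neighborhood average distance; a sketch using SciPy's KD-tree is given below with the article's m = 25 neighbors and n = 4 standard deviations (the KD-tree query is our implementation choice).

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points, m=25, n=4.0):
    """Drop points whose neighborhood average distance exceeds mu + n*sigma (cf. equation (21))."""
    tree = cKDTree(points)
    # k = m + 1 because the nearest neighbor of every point is the point itself (distance 0).
    dists, _ = tree.query(points, k=m + 1)
    nad = dists[:, 1:].mean(axis=1)            # neighborhood average distance s_i of each point
    mu, sigma = nad.mean(), nad.std()
    keep = nad <= mu + n * sigma               # points outside the band are marked as outliers
    return points[keep]
```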
TPC data compression
In order to achieve fast 3-D surface fitting, TPC data compression is necessary. Merry et al.24 propose a 3-D mesh simplification algorithm based on the average distance between points. The 3-D mesh can be assumed to be a cube enclosing the 3-D coordinates of the point set. The side length of the cube is the maximum coordinate difference between the 3-D data points. In the smallest mesh, the proposed average distance method selects a part of the feature points instead of all data points. The accuracy of simplification is determined by the size of the mesh. When the cube mesh is smaller, the simplification precision is higher.
The idea of the average distance method is that when the point cloud density is higher, the distance between points is smaller. Conversely, the point cloud density can also be determined from the distance. According to the size of the point cloud density, the number of points to delete is chosen. The steps are as follows:
(I) Define the side length L of the cube mesh. Define the rate between the expected simplified data and the original data.
(II) Assume that a random point P is a starting point. The remaining point set is P' = {P'_i(x_i, y_i, z_i), i = 1, 2, ..., n}. Then, the distance d_i is calculated from P to any point of P'.
(III) The average distance of P is

$$\bar{d} = \frac{\sum_{i=0}^{n-1} d_i}{n} \tag{22}$$

(IV) All point sets in the mesh are calculated according to the above steps. The points with the minimum average distance are removed according to the simplification rate. Therefore, point cloud data compression is realized.
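A possible reading of steps (I) to (IV) is sketched below: the cloud is bucketed into cube cells, the average distance of equation (22) is evaluated inside each cell, and the densest points are dropped. The grid construction and the per-cell application of the removal rate are assumptions about details the article leaves open.

```python
import numpy as np

def simplify_cloud(points, cell_size, rate=0.5):
    """Within each cube cell, drop the fraction `rate` of points with the smallest average distance."""
    cells = np.floor((points - points.min(axis=0)) / cell_size).astype(int)
    keys = [tuple(c) for c in cells]
    kept = []
    for key in set(keys):
        idx = np.flatnonzero([k == key for k in keys])
        cell_pts = points[idx]
        if len(cell_pts) <= 2:
            kept.append(idx)
            continue
        # Average distance from each point to the other points in the cell (equation (22)).
        d = np.linalg.norm(cell_pts[:, None, :] - cell_pts[None, :, :], axis=2)
        avg = d.sum(axis=1) / (len(cell_pts) - 1)
        n_keep = max(1, int(np.ceil(len(cell_pts) * (1.0 - rate))))
        kept.append(idx[np.argsort(avg)[-n_keep:]])   # keep the sparsest points, delete the densest
    return points[np.concatenate(kept)]
```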
Coordinate frames transforming
In section "Point cloud acquisition," the TPC extracted describes the positional relationship between the camera coordinate frame {c} and the target coordinate frame {t}. However, this is not enough. According to the principle of inverse kinematics, the TPC needs to be transformed into the base coordinate frame. Thereby, the system can control the trajectory of the manipulator. The forward kinematics model describes the relative position of the end-effector coordinate frame {e} and the base coordinate frame {b}. Now, the TPC is mapped to the spatial coordinates relative to {b}. The origin of the world coordinate frame {w} is set at the base. {b} and {w} coincide. Coordinate frame transforming is shown in Figure 4.
The homogeneous transformation matrix describes the relative position of two coordinate frames with dimension 4 × 4.
Definition: ^C T_B denotes the position of {c} in {b}, ^C T_E denotes the position of {c} in {e}, and ^E T_B denotes the position of {e} in {b}. Coordinate frame transforming is

$$^{C}T_B = {}^{C}T_E\; {}^{E}T_B \tag{23}$$

where ^C T_E can be calculated as in the study by Ivan et al.25 Based on the forward kinematics model, ^E T_B = ^0 T_N. Therefore, the 3-D TPC is transformed into the base coordinate frame.
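Applying equation (23) to the TPC is a single matrix product followed by a point-wise transform; a NumPy sketch is shown below. The composition order follows equation (23) as printed, and whether the resulting matrix or its inverse maps camera points into the base frame depends on the pose convention, so this is only an illustration.

```python
import numpy as np

def cloud_to_base(points_cam, C_T_E, E_T_B):
    """Map an Nx3 camera-frame point cloud using C_T_B = C_T_E * E_T_B (equation (23))."""
    C_T_B = C_T_E @ E_T_B                                      # compose the two homogeneous transforms
    homogeneous = np.hstack([points_cam, np.ones((len(points_cam), 1))])  # Nx4 homogeneous coordinates
    return (homogeneous @ C_T_B.T)[:, :3]                      # apply the 4x4 transform to every point
```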
The spatial coordinates of the TPC are mapped to the
rotation angles of each joint by inverse kinematics model-
ing.26 The corresponding command is sent out to control
the movement of the manipulator. The end-effector reaches
the target position using forward kinematics. Therefore,
autonomous positioning is achieved.
Finally, surface fitting is performed as in the study by Bloomenthal27 to validate the correctness of the proposed positioning method.
Experiment and analysis
The experimental platform consists of a computer (Acer TMP455, 16 GB memory, 500 GB SSD), a manipulator system, and a simple depth imaging device (Kinect), as shown in Figure 5. The Kinect working principle and specifications are described on the corresponding webpage. The software includes VC++ 2010, OpenNI, MATLAB 2012a, and Kinect for Windows SDK v1.7.
Kinematics analysis
The proposed method is illustrated using the example of a
five-degree-of-freedom manipulator, as shown in Figure 6.
There are five rotation axes, five joints, and three links.
Five joints are the waist rotation joint (the first joint, J1),
arm pitching joint (the second joint, J2), forearm pitching
joint (the third joint, J3), wrist pitching joint (the fourth
joint, J4), and wrist rotation joint (the fifth joint, J5). The
DH parameters are determined as shown in Table 1, where
θ_i denotes the rotation angle along the z-axis, α_i denotes the rotation angle along the x-axis, a_i denotes the distance along the x-axis between two neighboring z-axes, d_i denotes the distance along the z-axis between two neighboring x-axes, r_1 denotes the arm length, r_2 denotes the forearm length, and r_3 denotes the wrist length.
The transforming matrices are obtained as
$$^{0}T_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad ^{1}T_2 = \begin{bmatrix} 0.8940 & 0.4481 & 0 & 67 \\ 0 & 0 & 1 & 0 \\ 0.4481 & 0.8940 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$^{2}T_3 = \begin{bmatrix} 0.5985 & 0.8012 & 0 & 80 \\ 0.8012 & 0.5985 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad ^{3}T_4 = \begin{bmatrix} 0.4481 & 0.8940 & 0 & 490 \\ 0 & 0 & 1 & 0 \\ 0.8940 & 0.4481 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$^{4}T_5 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad ^{0}T_5 = \begin{bmatrix} 0.4006 & 0.4481 & 0.7992 & 433.5 \\ 0.8940 & 0 & 0.4481 & 0 \\ 0.2008 & 0.8940 & 0.4006 & 255.4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
Because the origin of the base coordinate frame is set at
the center of J1, and the waist length is 76 mm, the space
motion range of J1 is (0, 0, 38 mm). The range of J2 is (0, 0,
76 mm). The space trajectory range of J3 is shown in
Figure 7(a). The space trajectory range of J4 is shown in
Figure7(b). The space trajectory range of the end-effector is
shown in Figure 7(c).
Take point A (414 mm, 890 mm, 1045 mm) in the
world coordinate frame as an example. The inverse
kinematic solutions are shown in Table 2. The optimal
solution (21.61, 125.73, 108.42, 99.41, 0) is obtained
using the optimization method from the study by Hu and Li.26
Figure 5. Experimental platform.
Figure 6. Manipulator diagram.
Table 1. DH parameters.

Links   θi (°)   αi (°)   ai (mm)     di (mm)
1       θ1       0        67          0
2       θ2       0        80          0
3       θ3       0        r1 (490)    0
4       θ4       0        r2 (530)    0
5       θ5       90       0           r3 (480)
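For reference, a standard forward-kinematics sketch that chains the link transforms from Table 1 is given below. It uses the classic DH transform A_i = Rot_z(θ_i)·Trans_z(d_i)·Trans_x(a_i)·Rot_x(α_i); this is one common convention and may differ in detail from the one used to produce the numerical matrices above.

```python
import numpy as np

def dh_transform(theta, alpha, a, d):
    """Homogeneous transform between consecutive links for classic DH parameters (angles in radians)."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(thetas, dh_table):
    """Chain the per-link transforms to obtain 0T5, the end-effector pose in the base frame."""
    T = np.eye(4)
    for theta, (alpha_deg, a, d) in zip(thetas, dh_table):
        T = T @ dh_transform(theta, np.radians(alpha_deg), a, d)
    return T

# DH table from Table 1: (alpha_i in degrees, a_i in mm, d_i in mm) for joints 1-5.
DH_TABLE = [(0, 67, 0), (0, 80, 0), (0, 490, 0), (0, 530, 0), (90, 0, 480)]
```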
Figure 4. Coordinate frame transforming.
End-effector state estimation
The video is transformed into the image frame format (480 × 640 × 3) as shown in Figure 8(a). There are 73 frames in total. The binary threshold T is set to 0.0118. The static background is extracted using the three-frame differencing method as shown in Figure 8(b). When the end-effector is moving, its moving area is extracted from each frame. It is marked with green squares as shown in Figure 8(c).
In the PF, the number of particles is M = 100. The process noise covariance is 10. The measurement noise covariance is 1. The initial state of the EEC and particles is shown in Figure 9(a). The red marker denotes the EEC's initial position. The black markers denote the distribution of the 100 particles. The blue marker denotes the geometric center of the particles.
The tracking process of the end-effector using the PF is illustrated in Figure 9(b). The red markers denote the state change of the EEC. The black markers denote the state change of the particles. In the process of tracking, all particles are clustered. The blue markers denote the state change of the particle center. The
Figure 7. Joint motion trajectories. (a) Motion range of J3. (b) Motion range of J4. (c) Motion range of the end-effector.
Table 2. Inverse kinematic solutions (rotary angles in degrees).

Number of solution (k)   θ1      θ2      θ3      θ4
1 21.61 83.82 91.38 160.23
2 21.61 87.63 87.98 147.20
3 21.61 91.44 87.00 138.17
4 21.61 95.25 87.15 130.92
5 21.61 99.06 88.07 124.79
6 21.61 102.87 89.59 119.47
7 21.61 106.68 91.63 114.82
8 21.61 110.49 94.14 110.75
9 21.61 114.30 97.09 107.19
10 21.61 118.11 100.46 104.13
11 21.61 121.92 104.24 101.53
12 21.61 125.73 108.42 99.41
13 21.61 129.54 112.98 97.74
14 21.61 133.35 117.90 96.53
15 21.61 137.16 123.17 95.78
16 21.61 140.97 128.77 95.49
17 21.61 144.78 134.67 95.67
18 21.61 148.59 140.84 96.31
clusters of the particles and their centers coincide substan-
tially. While the end-effector moves, the center of the particles gradually approaches the EEC. Their centers
almost coincide. The state trend is also almost the same.
So, the EEC can be approximated using the geometric cen-
ter of the particles. Therefore, the position of the end-
effector can be obtained in real time.
The general trends of the actual position and estimated
position coincide. This indicates that this algorithm can pre-
dict the end-effector’s position. At the same time, the experi-
ment shows that the adaptability of the system is improved.
While the static background is extracted, the moving
end-effector is also detected and tracked in the scene.
Target recognition experiment
The edge contour of the target object is extracted according
to the section “Object recognition.” Then, the shape para-
meters (seven kinds) are calculated to obtain the sample set.
The dimension of the sample set is 1000 × 8 (cylindrical 600, square 200, and spherical 200). The dimension of the training set is 900 × 8 (cylindrical 540, square 180, and spherical 180). The dimension of the test set is 100 × 8 (cylindrical 60, square 20, and spherical 20). The normalization processing of the sample set is carried out. The situation with the BP network training is shown in Figure 10. This training takes 5 s with 47 iterations. The blue line represents the convergence of the training sample. The red line represents the convergence of the test sample.
Figure 10. Convergence of BP neural network training.
Figure 9. State estimation of end-effector. (a) Initial position of EEC, particles distribution, and initial position of particle center. (b) Motion of EEC is tracked by center of particles.
Figure 8. Target Separation. (a) Image frame. (b) Static object
extraction. (c) End-effector extraction.
The network stops training when the MSE reaches 1.24 × 10^{-8}. The test set converges when the MSE reaches 7.84 × 10^{-5}. The function gradient decreases from 1.49 to 9.41 × 10^{-6}. The dotted line denotes the preset MSE for stopping the training.
The results of the identification are shown in Figure 11.
The horizontal axis denotes the number of test samples (per
group). The vertical axis denotes the classification (iden-
tification result). The blue dots denote the network predictive results. The red markers denote the actual classification. If the classification is equal to 0.1, the target is a cylindrical object. If the classification is equal to 0.2, the target is a square object. If the classification is equal to 1, the target is a spherical object. From Figure 11, there is approximately a one-to-one correspondence between the network classification and actual results. Fifty-nine cylindrical objects are identified correctly, and one is identified as square (probability 98.3%). Twenty square objects are all identified correctly (probability 100%). Twenty spherical objects are all identified correctly (probability 100%). So, the recognition rate is 99% for the test sample. Further, a
cylindrical object is selected randomly from the test
sample. Its coordinate of the TC is (507 pixels, 306 pixels)
according to equation (18).
The high recognition rate shows that the features
extracted are comprehensive and critical. It also indicates
that the design of the BP network is rational. This algorithm
does not require a precise mathematical model and is able
to adapt to changes in the scene. At the same time, it has the
learning ability and improves the intelligence level of the
system. However, there are some shortcomings. First, there
is no universal theoretical guidance for the selection of the
network structure. Therefore, it takes a long time to carry
out preliminary experiments. Second, the learning ability
Figure 11. Recognition results.
Table 3. 3-D coordinates of EECs in base coordinate frame.
EEC (c)
Frames (F) x(mm) y(mm) z(mm) Frames (F) x(mm) y(mm) z(mm) Frames (F) x(mm) y(mm) z(mm)
1 0 0 0 26 211.5 132.7 816.3 50 221.2 62.7 816.6
2 211.3 205.4 818.1 27 216.4 130.9 815.2 51 221.7 57.6 816.5
3 216.5 201.4 818.1 28 214.9 127.4 813.4 52 222.0 54.0 815.3
4 215.5 199.3 819.3 29 212.8 124.9 809.7 53 217.9 52.6 813.7
5 213.9 196.0 821.5 30 217.8 122.8 805.4 54 219.0 50.5 813.8
6 215.3 194.9 822.4 31 217.9 116.7 805.3 55 214.4 47.8 813.9
7 216.3 188.6 823.0 32 210.8 114.7 803.8 56 216.2 44.7 813.6
8 215.5 187.9 820.5 33 211.7 111.9 803.4 57 217.3 39.4 813.5
9 213.5 185.1 820.2 34 223.3 110.2 801.5 58 216.9 38.4 813.7
10 221.3 182.7 820.2 35 221.4 107.7 801.6 59 215.9 36.8 813.4
11 221.8 177.0 819.0 36 222.0 104.5 801.9 60 216.1 34.4 813.3
12 222.8 174.9 818.9 37 225.5 99.5 801.2 61 215.4 30.7 816.5
13 221.9 173.1 816.0 38 221.6 97.7 801.1 62 213.5 27.8 816.6
14 221.3 170.6 820.8 39 222.3 94.8 801.5 63 217.2 22.9 818.0
15 222.2 167.0 818.9 40 219.6 92.6 803.2 64 217.4 19.7 820.1
16 219.0 165.7 818.4 41 223.3 89.0 803.4 65 211.0 21.3 818.6
17 219.6 159.4 820.4 42 221.2 88.2 805.3 66 212.3 18.0 818.5
18 214.8 158.1 820.0 43 221.3 84.7 805.1 67 214.7 11.6 820.3
19 216.3 157.1 818.0 44 222.9 80.5 809.8 68 208.3 8.8 818.3
20 210.8 154.4 818.1 45 221.0 78.1 813.9 69 209.3 6.6 818.3
21 212.8 151.3 820.6 46 220.8 72.4 811.7 70 213.2 3.2 820.2
22 214.3 146.5 818.6 47 220.4 71.1 811.9 71 211.4 3.2 820.3
23 208.5 143.9 820.3 48 220.1 67.9 813.1 72 210.3 3.4 823.3
24 209.2 142.9 818.5 49 222.4 65.6 815.3 73 212.6 3.5 823.9
25 212.7 139.9 816.3
and convergence of the neural network are closely related
to the sample.
Positioning experiment
According to the section “Coordinate frames trans-
forming,” we obtain the following results
$$^{E}T_B = \begin{bmatrix} 0.4006 & 0.4481 & 0.7992 & 433.5 \\ 0.8940 & 0 & 0.4481 & 0 \\ 0.2008 & 0.8940 & 0.4006 & 255.4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$^{C}T_E = \begin{bmatrix} 0.4591 & 0.3678 & 0.0932 & 250 \\ 0.1897 & 0.2539 & 0.6363 & 540 \\ 0.5302 & 0.4011 & 0.4728 & 1080 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

$$^{C}T_B = \begin{bmatrix} 0.1262 & 0.1224 & 0.5691 & 472.8231 \\ 0.0232 & 0.4838 & 0.5203 & 295.2540 \\ 0.6659 & 0.1851 & 0.4334 & 729.4052 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The 3-D coordinates of the end-effector and target
object are obtained by the PCL acquisition, removing out-
liers, data compression, and so on. For the video (including
73 frames), the 3-D coordinates of the EECs are shown in
Table 3. (x,y,z) denote space coordinates of EECs in the
base coordinate frame. Before data compression, the TPC
is shown in Figure 12(a). After data compression, the TPC
is shown in Figure 12(b). The 3-D coordinates of the
cylindrical TC are (214.2 mm, 3.9 mm, 825 mm).
From Table 3, the EEC gradually approaches the TC in
the time domain. The absolute errors are 214.2 − 212.6 = 1.6 mm, |3.5 − 3.9| = 0.4 mm, and 825 − 823.9 = 1.1 mm along the x-, y-, and z-axes, respectively. The theoretical values are constant along the x-axis. But the experimental data fluctuate within a certain range. The maximum random fluctuation is 225.5 − 208.3 = 17.2 mm. The theoretical values decrease along the y-axis. The
experimental data also decrease. The theoretical values
increase along the z-axis. The trend of the experimental
data is not increasing. The data are sometimes unchanged
or fluctuant in some consecutive frames. The reason for the
deviation is the low pixel accuracy of Kinect.
The deviation between the EEC and TC represents the
positioning error. It provides reference data for the manip-
ulator’s motion control. Under ideal conditions, the posi-
tioning is successful if the TC coordinates coincide with the
EEC coordinates. But in the experiment, it is normal if
there is a certain deviation. A known condition: The clamp-
ing mechanism has the maximum opening range 287 mm.
According to the actual positioning requirement, the max-
imum permissible errors are 20 mm, 25 mm, and 20 mm
along the x-, y-, and z-axes, respectively.
The surface fitting is implemented using triangular
facets and morphological processing as shown in Figure
12(c). The article only shows the fitting results for the 5th, 25th, 45th, 65th, and 73rd frames. This process verifies
the correctness of the kinematic model and the effective-
ness of the autonomous positioning algorithm. At the same
time, it provides a convenient way for the visual monitoring
of the positioning process.
Comparative analysis
In sections “Kinematics Analysis,” “End-effector State
Estimation,” and “Target Recognition Experiment,” the
active and passive objects are extracted from the scene
based on the PF, BP recognition, and PCL technology. The
Figure 12. Fitting results. (a) TPC before data compression.
(b) TPC after data compression. (c) Triangular facets and
morphological processing.
position of the end-effector is adjusted using the error
between the EECs and TC as a reference. Comparative
experiments are carried out as shown in Table 4.

Table 4. Algorithms comparison.

Parameters                            Izadi et al.1,7   Li et al.4   Lin et al.6   Hu and Li28   Our method
Calculations                          47,548            32,361       61,387        53,982        17,562
Time (s)                              5.405             10.249       15.652        2.081         1.681
Subject to light source influence     Small             Larger       Larger        Small         Small
Reliance on mathematical modeling     Large             Larger       Larger        Large         Large
Reconstruction accuracy (mm)          0.15              0.40         0.19          0.15          0.15
The accumulated error cannot be corrected in the study by Izadi et al.1 The 3-D reconstruction based on the monocular camera only relies on the mathematical model in the study by Li et al.4 The requirement for the light source is harsh. The 3-D reconstruction based on binocular vision requires pattern matching and a lot of computations in the study by Lin et al.6 The effect of the reconstruction is significantly reduced in the case of a large baseline distance. The error of the position estimation is close to the one reported in Carbone and Gomez-Bravo.9 The real-time performance of the point cloud fitting is higher than that in the study by Kanatani et al.10 The recognition rate of 99% based on the BP neural network is higher than the 90% based on the study by Gruen and Thomas.11 The visual calibration is not required and the calculations are clearly simplified in this article compared with those reported in Hu and Li.28
The following conclusions are obtained from the comparative analysis. (I) Calculations are simplified using the PCL acquisition and data compression. (II) The hardware system has low light-source requirements and is mainly affected by direct sunlight; the reason is that Kinect uses near-infrared light, and the solar spectrum interferes with Kinect. (III) The neural network recognition reduces the reliance on a mathematical model. (IV) The reconstruction accuracy is close to that reported in Izadi et al.1 and Lin et al.6 The camera pixel accuracy, target recognition, and end-effector extraction all affect the accuracy of the surface fitting. In summary, the system efficiency improves significantly.
Conclusion
The article proposes a PF based on three-frame differen-
cing to detect and track the end-effector’s motion in a static
background. The target objects are identified based on the
BP neural network classification idea. The coordinates of
the TC are calculated. The positioning is successful when
the positioning error is within an allowed range. Then, the
TPC is collected based on the PCL technology. Outliers
are removed, and data compression is carried out. The
coordinate frame transforming model is established. TPCs
are transformed into the base coordinate frame from the
camera coordinate frame. Therefore, the coordinates are
meaningful for the manipulator motion control. Finally, the
surface fitting of the TPC is achieved.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with
respect to the research, authorship, and/or publication of this
article.
Funding
The author(s) received no financial support for the research,
authorship, and/or publication of this article.
References
1. Izadi S, Newcombe RA, and Kim D. Kinectfusion: real-time
dynamic 3D surface reconstruction and interaction. Acm Sig-
graph 2011; 3(8): 23.
2. Ma X, Xue Y, and Yang RY. Kinect-based real time 3D
reconstruction of human and its application. J Comput Aided
Des Comput Graph 2014; 26(10): 1720–1726.
3. Guo FS and Gao W. Batch reconstruction from UAV images
with prior information. Acta Autom Sin 2013; 39(6):
834–845.
4. Li XZ, Yang AL, Qin BL, et al. Monocular camera three
dimensional reconstruction based on optical flow feedback.
Acta Opt Sin 2015; 35(5): 1–9.
5. Lv DY, Huang ZP, Tao GH, et al. Dynamic Bayesian network
model based golf swing 3D reconstruction using simple depth
imaging device. J Electron Inf Technol 2015; 37(9):
2076–2081.
6. Lin YM, Lv NG, Lou XP, et al. Robot vision system for 3D
reconstruction in low texture environment. Opt Precis Eng
2015; 23(2): 540–549.
7. Izadi S, Kim D, and Hilliges O. Kinectfusion: real-time 3D
reconstruction and interaction using a moving depth camera.
ACM, Santa Barbara 2011: 559–568.
8. Kahl F and Hartley RI. Multiple-view geometry under the L∞ norm. IEEE Trans Pattern Anal Mach Intell 2005; 30(9): 1603–1617.
9. Carbone G and Gomez-Bravo F. Motion and operation plan-
ning of robotic systems. Mech Mach Sci 2015; 29(4): 15–26.
10. Kanatani K, Sugaya Y, and Kanazawa Y. Guide to 3D vision
computation: geometric analysis and implementation.
Advances in computer vision and pattern recognition.
Springer International Publishing, 2016.
11. Gruen A and Thomas SH. Calibration and orientation of
cameras in computer vision. Springer series in information
sciences. Springer Berlin Heidelberg, 2001; 34(3): 555–560.
12. Martin AF and Robert CB. Random sample consensus: a
paradigm for model fitting with applications to image analy-
sis and automated cartography. In: Graphics and Image Pro-
cessing. Morgan Kaufmann Publishers, 1981, pp. 381–395.
13. Andre U, Christof E, Robert H, et al. 3D scene segmentation for autonomous robot grasping. In: 2012 IEEE/RSJ international conference on intelligent robots and systems, 2012, pp. 1734–1740. DOI: 10.1109/IROS.2012.6385692.
14. Suman S, Mohammed B, and Du QH. A Gaussian process
guided particle filter for tracking 3D human pose in video.
IEEE Trans Image Process 2013; 22(11): 4286–4300.
15. Refael CG and Richard EW. Digital image processing, 2nd
ed. University of Tennessee: Publishing House of Electronics
Industry, 2002.
16. Chang CC, Hsiao JY, and Hsieh CP. An adaptive median filter
for image denoising. Elsevier B.V. 2008; 39(11): 346–350.
17. Liu Q and Lin TS, Image edge detection algorithm based on
mathematical morphology. J South China Univ Technol
(National Science Edition) 2008; 36(9): 113–116.
18. Shi JL, Cai XJ, and Zhu PY. Ship target identification based
on the wavelet transform and neural network. Syst Eng Elec-
tron 2004; 26(7): 893–895.
19. Liu YG and You ZS. A neural network for image object
recognition and its application to car type recognition. Com-
put Eng 2003; 29(3): 30–32.
20. Hao HW and Jiang RR. Training sample selection method for
neural networks based on nearest neighbor rule. Acta Autom
Sin 2007; 33(12): 1247–1251.
21. Richard AN, Shahram I, Otmar H, et al. KinectFusion:
real-time dense surface mapping and tracking. In:
IEEE international symposium on mixed and augmented
reality 2011 science and technology proceedings, 2011,
pp. 127–136. DOI:10.1109/ISMAR.2011.6092378.
22. Zhu XX, Cao QX, Yang Y, et al. An improved Kinectfusion
3D reconstruction algorithm. Robot 2016; 36(2): 129–136.
23. He DJ, Shao XN, Wang D, et al. Denoising method of 3D
point cloud data of plants obtained by Kinect. Trans Chin Soc
Agric Mach 2016; 47(1): 331–335.
24. Merry B, Marais P, and Gain J. Compression of dense and
regular point clouds. Comput Graph Forum 2006; 25(4):
709–716.
25. Ivan L, Marten B, and Petter O. Intrinsic camera and hand-
eye calibration for a robot vision system using a point marker.
In: 14th IEEE-RAS international conference on humanoid
robots, 18–20 November 2014, pp. 59–66. DOI: 10.1109/
HUMANOIDS.2014.7041338.
26. Hu YZ and Li LY. Series actuator end integrated positioning
analysis based on multilayer perception neural network.
Trans Chin Soc Agric Eng 2016; 1(32): 22–27.
27. Bloomenthal J. An implicit surface polygonizer. In: Graphics
Gems IV. San Diego, CA: Academic Press Professional Inc.,
1994, pp. 324–349.
28. Hu YZ and Li LY. 3D reconstruction of end-effector in auton-
omous positioning process using depth imaging device. Math
Probl Eng 2016; 2016(10): 1–16.
14 International Journal of Advanced Robotic Systems
... The last paper offers a proposition for autonomous positioning control of a manipulator. 9 In order to solve the challenges of real-time calculations of the positioning error, error correction, and state analysis in the process of autonomous positioning, a Kinect depth imaging equipment is used, and a particle filter based on three-frame subtraction to capture the end-effector's motion is proposed. Further, a backpropagation neural network is adopted to recognize targets. ...
... 9 Hu et al. use the Kalman filter algorithm to optimize the trajectory of the manipulator. 10 In the third type of research, scholars study the manipulators from specific perspectives, such as the analysis of the main linkage of redundant linkage. These studies do not have wide applicability to the problem of substantially improving accuracy and efficiency. ...
Article
Full-text available
This article aims to improve the accuracy of each joint in a manipulator and to ensure the high-speed and real-time requirements. A method called the variational method genetic algorithm radial basis function, which is based on a combination feedback controller, is proposed to solve the optimal control problem. It is proposed a combined feedback with a linear part and a nonlinear part. We reconstruct the manipulator’s kinematics and dynamics models with a feedback control. In this model, the optimal trajectory, which was solved by the variation method, is regarded as the desired output. The other one is also established an improved genetic algorithm radial basis function neural network model. The optimal trajectory is rapidly solved by using the desired output and the improved genetic algorithm radial basis function neural network. This method can greatly improve the speed of the calculation and guarantee real-time performance while simultaneously ensuring accuracy.
Article
Full-text available
The real-time calculation of positioning error, error correction, and state analysis has always been a difficult challenge in the process of manipulator autonomous positioning. In order to solve this problem, a simple depth imaging equipment (Kinect) is used and Kalman filtering method based on three-frame subtraction to capture the end-effector motion is proposed in this paper. Moreover, backpropagation (BP) neural network is adopted to recognize the target. At the same time, batch point cloud model is proposed in accordance with depth video stream to calculate the space coordinates of the end-effector and the target. Then, a 3D surface is fitted by using the radial basis function (RBF) and the morphology. The experiments have demonstrated that the end-effector positioning error can be corrected in a short time. The prediction accuracies of both position and velocity have reached 99% and recognition rate of 99.8% has been achieved for cylindrical object. Furthermore, the gradual convergence of the end-effector center (EEC) to the target center (TC) shows that the autonomous positioning is successful. Simultaneously, 3D reconstruction is also completed to analyze the positioning state. Hence, the proposed algorithm in this paper is competent for autonomous positioning of manipulator. The algorithm effectiveness is also validated by 3D reconstruction. The computational ability is increased and system efficiency is greatly improved.
Book
This classroom-tested and easy-to-understand textbook/reference describes the state of the art in 3D reconstruction from multiple images, taking into consideration all aspects of programming and implementation. Unlike other textbooks on computer vision, this Guide to 3D Vision Computation takes a unique approach in which the initial focus is on practical application and the procedures necessary to actually build a computer vision system. The theoretical background is then briefly explained afterwards, highlighting how one can quickly and simply obtain the desired result without knowing the derivation of the mathematical detail. Topics and features: • Reviews the fundamental algorithms underlying computer vision, and their implementation • Describes the latest techniques for 3D reconstruction from multiple images • Summarizes the mathematical theory behind statistical error analysis for general geometric estimation problems • Offers examples of experimental results, enabling the reader to get a feeling of what can be done using each procedure • Presents derivations and justifications as problems at the end of each chapter, with solutions supplied at the end of the book • Explains the historical background for each topic in the supplemental notes at the end of each chapter • Provides additional material at an associated website, include sample code for typical procedures to help readers implement the algorithms described in the book This accessible work will be of great value to students on introductory computer vision courses. Serving as both as a practical programming guidebook and a useful reference on mathematics for computer vision, it is suitable for practitioners seeking to implement computer vision algorithms as well as for theoreticians wishing to know the underlying mathematical detail.
Article
To address the difficulty of acquiring 3-D point cloud data of plants, the Kinect was adopted to collect 3-D point cloud data of corn. Compared with the usual 3-D scanning equipment, the Kinect can acquire the data rapidly and efficiently at lower cost, but the accuracy of the acquired data is low, so denoising is worthwhile. According to the characteristics of the point cloud data acquired by the Kinect, the data were pre-processed and smoothed. In this paper, a multi-frame data fusion method was used to obtain more complete plant 3-D point cloud data, which also played a smoothing role. A denoising algorithm based on density analysis and bilateral filtering of depth data was proposed to process the outlier noise and internal high-frequency noise. In the experiments on denoising the internal high-frequency noise of corn and eggplant, the denoising time of the proposed algorithm was only 2.71% and 1.78% of that of traditional bilateral filtering, and the noise was well removed by adjusting the parameters. The experimental results show that the proposed method can easily and quickly remove noise of different scales while preserving the integrity of edge data. Consequently, good 3-D point cloud data of the plant could be obtained. © 2016, Chinese Society of Agricultural Machinery. All rights reserved.
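A minimal sketch of the density-analysis idea is given below: points whose mean distance to their k nearest neighbours is far above the global average are discarded as outlier noise. The neighbourhood size, the standard-deviation ratio, and the toy cloud are assumptions; the bilateral filtering stage of the abstract is not reproduced.

```python
# Density-based outlier removal sketch for a 3-D point cloud.
import numpy as np
from scipy.spatial import cKDTree

def density_filter(points, k=8, std_ratio=2.0):
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)   # first neighbour is the point itself
    mean_d = dists[:, 1:].mean(axis=1)       # mean distance to k neighbours
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

# Toy cloud: a dense plane plus a few scattered outliers.
rng = np.random.default_rng(1)
plane = np.column_stack([rng.uniform(0, 1, (1000, 2)), np.zeros(1000)])
outliers = rng.uniform(-3, 3, (20, 3))
cloud = np.vstack([plane, outliers])
print("kept", len(density_filter(cloud)), "of", len(cloud), "points")
```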
Article
It is difficult to establish the Jacobian matrix and determine the coordinate frames of links for a non-standard actuator. A new analytical method to establish the Jacobian matrix and determine the coordinate frames of joints and links is proposed in this paper; it makes the positioning analysis of the end-effector in space easier. At the same time, it is necessary to prove the effectiveness of the proposed method theoretically and to verify the localization and configuration capabilities through simulations. First, a forward kinematics model was set up based on a non-standard five-degree-of-freedom (5-DOF) actuator. A frame transformation is performed from the base coordinate frame to the end-effector coordinate frame, and the relation between two adjacent joints is defined by a homogeneous pose matrix. Second, the necessary and sufficient conditions for comprehensive localization are derived; they can guide the actuator in performing various tasks, such as tracking, assembly, and autonomous grasping. A 5-DOF actuator is considered here as an example, and the result holds for any N-DOF actuator. Third, inverse kinematics solutions are obtained using an artificial neural network (NN) based on a back-propagation multilayer perceptron (MLP); these solutions are not unique, and a unique solution is found using nonlinear minimization optimization. A supervised NN with three inputs, twenty hidden neurons, and five outputs is used, with the tan-sigmoid activation function in the hidden layer and the linear activation function in the output layer. In Cartesian coordinate space, the NN is trained by means of the Levenberg-Marquardt (LM) algorithm. The training sets consist of Denavit-Hartenberg (DH) parameters and Cartesian coordinates. The weights are updated continuously, which gradually reduces the mean square error (MSE); when the MSE reaches the preset threshold, training is terminated. After training, the test sets are used to examine the capability of the NN. Fourth, two evaluation functions are defined: a localization function and a cost function. The localization function evaluates the positioning property of the end-effector and, in task space, checks whether the actuator has reached the target point along the required direction. The cost function evaluates the kinematic configuration; since the Jacobian matrix describes the velocity mapping from each joint to the end-effector, the cost function is closely related to it and thus expresses the kinematic configuration. Finally, simulations and experiments are conducted. The setup includes an industrial computer UNO2184G, a 5-DOF non-standard actuator, Windows 7, and MATLAB 2012a. Coordinate frames for each joint are established and the D-H parameters are determined. The relative pose matrix between each pair of adjacent joints is then obtained, and the initial end-effector pose follows from the right-multiplication rule. The end-effector workspace is formed under each joint's operation range. Simulation is then performed using the NN together with the obtained localization and cost functions. The following results are obtained. The rank of the Jacobian matrix is equal to 5, so this actuator meets the necessary and sufficient conditions for comprehensive positioning. The NN method for solving inverse kinematics reduces the computational complexity compared with the conventional method. There are 21 groups of solutions when positioning to (41.4, 89.0, 104.5).
The optimal solution obtained according to the localization function rule is (21.61, 91.44, 135.52, 221.42, 0), and the optimal solution obtained according to the cost function rule is (21.61, 125.73, 108.42, 221.99.41, 0). The NN accuracy is approximately 89.9%, while that of the conventional method is 87.5%. By approximate estimation, the errors for θ1, θ2, θ3, θ4, and θ5 are 3.7°, 3.1°, 3.5°, 3.3°, and 4.5°, respectively. The NN took 1.2 seconds, while the conventional method completed in 0.9 seconds. Therefore, computation accuracy has improved by 20% and efficiency by 2.4%. If the system is linear and the real-time demand is low, the conventional method is chosen; in contrast, if the system is nonlinear and the real-time demand is high, the new method proposed in this paper is chosen. The minimum value of the localization function is 0.96 and the maximum value of the cost function is 4.0349×10^14; these two quantities determine the comprehensive positioning and the kinematic configuration. From the results presented, it can be concluded that the non-standard actuator with the MLP has better localization and an optimal configuration. © 2016, Chinese Society of Agricultural Engineering. All rights reserved.
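For the forward-kinematics step described in this abstract (chaining homogeneous pose matrices from the base to the end-effector by right multiplication), a minimal sketch follows. The Denavit-Hartenberg table below is a placeholder rather than the actuator's real parameters, and the neural-network inverse kinematics is not shown.

```python
# Forward kinematics by chaining homogeneous Denavit-Hartenberg transforms.
import numpy as np

def dh_transform(theta, d, a, alpha):
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(joint_angles, dh_table):
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_table):
        T = T @ dh_transform(theta, d, a, alpha)   # right-multiplication rule
    return T

# Placeholder 5-DOF table (d, a, alpha); not the cited actuator's values.
dh_table = [(0.10, 0.0, np.pi / 2), (0.0, 0.25, 0.0), (0.0, 0.20, 0.0),
            (0.0, 0.0, np.pi / 2), (0.08, 0.0, 0.0)]
T = forward_kinematics(np.radians([20, 30, -40, 10, 0]), dh_table)
print("end-effector position:", T[:3, 3])
```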
Article
It is known that the traditional edge detection methods are effective but sensitive to noise, and that most mathematical morphology-based edge detection algorithms are insufficient for complex edge features because they use a single structuring element. To solve these problems, a novel multi-structure-element, multi-scale edge detection algorithm based on mathematical morphology is proposed. In this algorithm, the noise is filtered by an alternating-order morphological open-close filter with a multi-scale element; the edge information in different directions is obtained using multiple structuring elements and is then fused to generate the final edge. Experimental results show that the proposed algorithm is insensitive to noise and is efficient for complex edge detection.
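The sketch below illustrates the general flavour of such an approach: open-close filtering with several structuring elements to suppress noise, followed by a morphological gradient per element and a pixel-wise maximum fusion. The element shapes, sizes, and fusion rule are assumptions, and the multi-scale aspect of the cited algorithm is simplified.

```python
# Morphological edge detection with several structuring elements (sketch).
import cv2
import numpy as np

def multi_element_edges(gray):
    elements = [cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)),
                cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)),
                cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))]
    # Alternating open-close filtering to suppress noise.
    smoothed = gray
    for se in elements:
        smoothed = cv2.morphologyEx(smoothed, cv2.MORPH_OPEN, se)
        smoothed = cv2.morphologyEx(smoothed, cv2.MORPH_CLOSE, se)
    # Morphological gradient per element, fused by a pixel-wise maximum.
    edges = np.zeros_like(smoothed)
    for se in elements:
        grad = cv2.morphologyEx(smoothed, cv2.MORPH_GRADIENT, se)
        edges = np.maximum(edges, grad)
    return edges

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
if img is not None:
    cv2.imwrite("edges.png", multi_element_edges(img))
```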
Article
Two improvements to the KinectFusion algorithm are proposed. On the one hand, the edge feature points in the environment are matched to improve robustness; on the other hand, a ground plane point cloud in the model is preset to improve accuracy. A standard RGB-D (RGB-Depth) SLAM (simultaneous localization and mapping) benchmark dataset and data from a lab environment are modeled, and the comparison results show that both robustness and accuracy are clearly improved. The improved algorithm decreases the modeling error from 4.5% to 1.5% in a room of 6 m × 3 m × 3 m. Although efficiency is affected, the running speed of the algorithm remains high and the user experience during modeling is good.
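As one plausible way to obtain a ground plane from a depth point cloud before presetting it in the model, the following RANSAC plane-fit sketch can be used. It is not the cited paper's procedure; the iteration count, inlier tolerance, and synthetic data are assumptions.

```python
# RANSAC plane fitting: repeatedly sample three points, form a candidate
# plane, and keep the model with the most inliers.
import numpy as np

def ransac_plane(points, iters=200, tol=0.01, rng=np.random.default_rng(0)):
    best_inliers, best_model = 0, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                      # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)  # point-to-plane distances
        inliers = int((dist < tol).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (n, p0)
    return best_model, best_inliers

# Synthetic cloud: a nearly flat ground plus scattered clutter.
rng = np.random.default_rng(2)
ground = np.column_stack([rng.uniform(0, 6, (2000, 2)), rng.normal(0, 0.003, 2000)])
clutter = rng.uniform(0, 3, (300, 3))
(normal, point), count = ransac_plane(np.vstack([ground, clutter]))
print("plane normal:", np.round(normal, 3), "inliers:", count)
```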
Article
3D human reconstruction generally begins with acquiring geometric information captured from multiple perspectives with limited postures only; as a result, such methods cannot serve applications that need easily modified 3D human models. To solve these problems, this paper presents a new reconstruction method that needs only one Kinect. First, it detects the feature points of a human in depth video frames, combined with error correction. Second, based on the feature point positions, it performs anthropometric measurement of each body part with a progressive calculation algorithm. Finally, it achieves segmented parametric modeling according to the obtained body part information. The experiments demonstrate that the method can quickly reconstruct a usable 3D human model and can modify it conveniently as the person changes posture in front of the Kinect. The resulting flexible models help to achieve good efficiency and interactive effects in the virtual dressing application.
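A minimal sketch of the anthropometric step is shown below: the size of a body part is accumulated as the chain of Euclidean distances between its detected joint feature points. The joint names and coordinates are hypothetical, and the progressive calculation and parametric modeling of the paper are not reproduced.

```python
# Measure a body segment from detected joint feature points (sketch).
import numpy as np

def chain_length(joints, chain):
    """Sum Euclidean distances along an ordered chain of joint names."""
    pts = np.array([joints[name] for name in chain])
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

joints = {"shoulder": (0.00, 1.45, 2.10),   # placeholder Kinect-style joints (m)
          "elbow":    (0.02, 1.18, 2.12),
          "wrist":    (0.05, 0.93, 2.15)}
print("arm length ~ %.3f m" % chain_length(joints, ["shoulder", "elbow", "wrist"]))
```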
Article
Simple depth imaging devices are gaining more and more attention because of their lower cost and ease of use compared with traditional motion capture systems. However, such devices lack the basic data needed for 3D motion reconstruction due to low resolution, occlusions, and mixing up of body parts. In this paper, a Dynamic Bayesian Network (DBN) model is proposed to describe the spatial and temporal characteristics of human body joints. The model is based on fusing the parent-child characteristics of joints with the multi-order Markov property of each joint during motion. A golf-swing capture and reconstruction system, DBN-Motion (DBN-based Motion reconstruction system), is presented based on the DBN model and the similarity of swings, with a simple depth imaging device, the Kinect, as the capturing device. The proposed system effectively solves the problems of occlusion and mixing up of body parts, and successfully captures and reconstructs golf swings in 3D space. Experimental results prove that the proposed system can achieve reconstruction accuracy comparable to a commercial optical motion capture system.
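To make the multi-order Markov idea concrete, the sketch below extrapolates a joint's next position from its two most recent positions, a second-order Markov dependence. The fusion with parent-joint constraints in the full DBN is omitted, and the example track is invented.

```python
# Second-order Markov extrapolation of a tracked joint position (sketch).
import numpy as np

def predict_next(history):
    """Predict x_t from the last two observations: x_t = 2*x_{t-1} - x_{t-2}."""
    return 2.0 * history[-1] - history[-2]

track = [np.array([0.00, 1.20, 2.00]),
         np.array([0.01, 1.21, 2.00]),
         np.array([0.02, 1.22, 2.01])]          # last observed joint positions (m)
print("predicted joint position:", predict_next(track))
```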