Content uploaded by Hui Xiao

Author content

All content in this area was uploaded by Hui Xiao on Nov 03, 2020

Content may be subject to copyright.

International Joournal of Intelligent Robotics and Applications manuscript No.

(will be inserted by the editor)

Robotic Target Following with Slow and Delayed Visual

Feedback

Hui Xiao ·Xu Chen

Received: date / Accepted: date

Abstract Following rapidly and precisely a moving tar-

get has become the core functionality in robotic sys-

tems for transportation, manufacturing and surgery.

Among existing targeting following methods, vision-

based tracking continues to thrive as one of the most

popular, and is the closest method to human percep-

tion. However, the low sampling rate and the time de-

lays of visual outputs fundamentally hinder real-time

applications. In this paper, we show the potential of

signiﬁcant performance gain in vision-based target fol-

lowing when partial knowledge of the target dynamics

is available. Speciﬁcally, we propose a new framework

with Kalman ﬁlters and multi-rate model-based predic-

tion (1) to reconstruct fast-sampled 3D target position

and velocity data, and (2) to compensate the time delay.

Along the path, we study the impact of slow sampling

and the delay duration, and we experimentally verify

diﬀerent algorithms with a robot manipulator equipped

with an eye-in-hand camera. The results show that our

approach can achieve 95% error reduction rate com-

pared with the commonly used visual servoing tech-

nique, when the target is moving with high speed and

the visual measurements are slow and incapable of pro-

viding timely information.

Keywords Visual servoing ·Kalman ﬁlter ·Delay

compensation

Hui Xiao, PhD Student

Mechanical Engineering, University of Washington

3900 E Stevens Way NE, Seattle, WA 98195

E-mail: huix27@uw.edu

Xu Chen, Assistant Professor

Mechanical Engineering, University of Washington

3900 E Stevens Way NE, Seattle, WA 98195

E-mail: chx@uw.edu

corresponding author

1 Introduction

This paper considers the problem of controlling a robot

to follow a moving target based on only visual feed-

back. Such capability relates to high-impact robotic ap-

plications ranging from autonomous ground or aerial

vehicles that follow a leading target, to surgical robot

arms that track the motions of human organs, and to

robotic manipulators that perform pick-and-place tasks

in a highly dynamic environment (e.g., above the sea or

in a turbulence airﬂow). In those applications and the

like, control based on visual feedback is a prime target

for innovations. Indeed, vision sensors (e.g. cameras)

are becoming ubiquitous and non-contact imaging is

increasingly preferred in unstructured environments.

In typical vision-based target following, image pro-

cessing ﬁrst extracts useful information from raw pixel

data, then visual servo control algorithms calculate the

motion commands of the robot. From a controls per-

spective, the goal of visual servoing is to minimize the

errors between the desired and measured visual fea-

tures. Based on how the visual features are deﬁned,

model-based visual servoing approaches can be clas-

siﬁed into position-based and image-based [8,9]. The

position-based visual servoing (PBVS) deﬁnes visual

features in the 3D space while the image-based visual

servoing (IBVS) deﬁnes visual features in the 2D im-

age space [8]. There are also extended approaches that

use more sophisticated visual features [10,6], or deploy

a hybrid system [20,11, 22] that combines advantages

of IBVS and PBVS while trying to avoid their short-

comings. The recent learning-based approaches directly

train neural network models to predict the control com-

mands [4,24].

The above mentioned approaches are based on the

assumption of a motionless target. A moving target,

2 Hui Xiao, Xu Chen

however, demands additional and careful considerations

in tracking. One design is to add an integral term in the

velocity control law to compensate the object-following

error [30]—eﬀective only if the target moves at a con-

stant velocity. A more commonly used approach is to

estimate the target velocity by Kalman ﬁlters [14,19,

16,25], extended Kalman ﬁlters [21] or other ﬁltering

techniques [5]. Then one can compute the robot mo-

tion commands, assuming the target is moving at the

estimated velocity.

The major barriers for implementing visual servoing

algorithms for moving target following are that the slow

sampling speed of cameras and the time delays induced

by time-consuming image processing. Most vision-based

robot controls are framed into systems of systems with

multiple sampling rates, where the robot motion con-

troller runs at a higher sampling rate (usually more

than 100Hz) under fast encoder feedback, and the vi-

sual servo loop runs at a lower sampling rate (e.g. 25Hz

for a standard vision system). To integrate the two sam-

pling rate, the simplest method is to upsample the vi-

sual feedbacks by repeating the latest data between the

slow sampling instances. However, such an approach

does not consider target dynamics and can still cause

large following errors. When the target moves with a

high speed, a large tracking error could cause the cam-

era to lose the target in its ﬁeld of view and trigger a

tracking failure. Many works for tackling the above bar-

riers have been focused on developing advanced imag-

ing hardware or communication protocals [23,7, 13] that

can provide fast and timely image data. However, those

methods often require costly hardware and the speed

improvement is still limited by the time-consumming

image processing.

To reduce the eﬀect of slow-sampling and time de-

lays, visual sensing dynamics compensation (VSDC)

[26,18] was developed and approved to be successful in

target tracking experiments [27]. It formulated a cus-

tomized dual-rate Kalman ﬁlter where the prediction

runs at a faster rate and the correction step runs at

a slower rate. On the contrary in this paper, we pro-

vide an approach where state estimation performs op-

timally at sensor speeds and post compensation recov-

ers a fast-sampled and delay-free signal. This is made

possible by multi-rate model-based prediction (MMP)

[12] and knowing (partially) the underlying physics of

the target dynamics. The contribution of this work is

the development of a new vision-based target follow-

ing algorithm that can track a high-speed target when

the vision system is slow and incapable of providing

timely information. Speciﬁcally, we provide four diﬀer-

ent target dynamic models to cover a wide range of

motion trajectories, then use Kalman ﬁlters to estimate

the target 3D position and velocity sampled at a slow

rate. Based on polynomial models derived from the dy-

namic models, we extend the usages of MMPs to re-

construct a fast-sampled target position and velocity,

and to compensate the time delays of visual measure-

ments. Finally, we derive a target tracking controller

that substantially extends the traditional visual servo-

ing method, which only applies to follow a motionless

target. Compared to other visual servo control involv-

ing the robot dynamics, our controller design inherits

the algorithmic advantages in [8] and assures good con-

vergence properties.

The remainder of this paper is organized as follows.

Section 2 formulates the problem. Section 3 derives the

proposed target following algorithm. Section 4 describes

how to estimate the target motion using Kalman ﬁl-

ters. Section 5 details the algorithm of MMP and the

procedure to use it for interpolation and delay compen-

sation. Numerical simulation and robot experiment are

reported in section 6 and section 7, respectively. Finally,

section 8 concludes the paper. A preliminary version of

this article was presented in [29]. This article is a sub-

stantially extended study that includes new simulations

and the full analysis results.

Nomenclature

{A}A 3D coordinate system with origin at porint

A.

AξBThe 3D pose of frame {B}with respect to frame

{A}.

AvBThe 3D velocity of frame {B}with respect to

frame {A}.

⊕The composition operator of relative poses, e.g.,

AξC=AξB⊕BξC.

ARBThe rotation matrix corresponding to the rela-

tive pose AξB.

AtBThe translation vector corresponding to the rel-

ative pose AξB.

Rx(θ) The rotation matrix that corresponds to the 3D

rotating of θdegrees about xaxis.

A0The transpose of matrix A.

2 Problem Formulation

Coordinate system deﬁnition. The target following

system includes a moving target T, a follower robot R,

and a vision system Cstatically attached to the robot.

We deﬁne a static world coordinate system {W}and

attach moving coordinate systems {T}and {C}to the

target and to the vision system, respectively. Figure 1

shows the relationship between the coordinate systems,

Robotic Target Following with Slow and Delayed Visual Feedback 3

Fig. 1: Coordinate systems of target following problem

where {C∗}represents the desired position and orien-

tation of the vision system.

We now discuss the major elements in Figure 1:

•The observed target pose in the camera frame CξT:

There are many approaches to estimate the target

pose from images. For example, with a calibrated

camera and a known 3D model of the target, CξT

can be obtained from the solution of Perspective-

n-Point (PnP) problem based on n3D-to-2D point

correspondences [17, 31]. With a stereo camera or an

RGB-D camera, it is possiable to estimated the 3D

pose of the target without using the 3D model of

the target as a prior [3,1]. There are also learning-

based pose estimation proposed recently [28]. In this

paper, we use the PnP algorithm for simplicity and

will assume an estimation Cˆ

ξTis available hereafter.

•The camera pose in the world coordinate system

WξC: Because the camera is rigidly attached to the

robot, WξCcan be calculated by the forward kine-

matics of the robot.

•The target pose in the world frame WξT: This is un-

known and is estimated by the vision system. Most

important, we consider the case when the target is

moving with unknown velocity WvT(t).

Data acquisition and time delay. In this pa-

per, we consider the case when the image-based pose

estimation Cˆ

ξThas a time delay of τseconds, and is

updated every Tss seconds. However, the controller of

the robot is running with a much shorter period of Tsf

seconds. We assume that the sampling rate of the imag-

ing system can be adjusted such that

Tss =LTsf ,(1)

where Lis an integer greater than one.

Control goal. Similar to the standard PBVS [8],

the vision-based target tracking problem can be formu-

lated as minimizing the visual feature error e(t) given

by

e(t) = s(t)−s∗(t),(2)

where srepresents a set of visual features and s∗is the

desired value of s. For the target following problem, we

deﬁne the visual featuer as the pose diﬀerence between

the desired and the actual camera pose s,C∗ξC. An

estimation of the visual feature is calculated from the

target pose estimation

C∗ˆ

ξC=C∗ξT⊕Tˆ

ξC,(3)

where C∗ξTis a user-deﬁned value and Tˆ

ξCis the re-

verse of the pose estimation Cˆ

ξT. The goal of target

following is to control the camera velocity CvCsuch

that CξTwill converge to C∗ξT, that is, sconverges to

s∗=0.

3 PBVS with a Moving Target

In this paper, we parameterize the feature error as e=

C∗ξC= (C∗tC, θu), where C∗tCand θare the transla-

tion vector and the rotation angles from {C∗}to {C},

respectively. uis a unit vector representing the corre-

sponding rotation axis. Let the relative instantaneous

velocity of the camera with respect to the target be

TvC∈R6×1,then the relationship between the time

derivative of visual feature error ˙

eand the relative ve-

locity TvCis

˙

e=LeTvC=C∗

RT0

0Lθu

CRTTvC,(4)

where Leis the feature Jacobian and Lθuis given by

Lθu=I3−θ

2[u]×+ 1−sincθ

sinc2θ

2![u]2

×.(5)

Here, sinc(x) is the sinus cardinal such that xsinc(x) =

sin(x) and sinc(0)=1. [u]×is a skew-symmetric matrix

deﬁned as

[u]×=

0 -uzuy

uz0 -ux

-uyux0

.(6)

To ensure a decreasing feature error, we design a veloc-

ity controller

TvC= -λL−1

ee= -λTRC∗0

0TRCL−1

θuC∗tC

θu,(7)

4 Hui Xiao, Xu Chen

where λ > 0.Then from Eq. (7) and Eq. (4):

˙

e= -λe.(8)

That is to say, the visual feature error will exponentially

decrease to zero. Note that L−1

θuθu=θu. Eq. (7) can

then be simpliﬁed as

(TvC,l = -λ·TRC∗C∗tC

TvC,a = -λθ ·TRCu(9)

where TvC,l and TvC,a are the linear and the angular

parts of the velocity vector TvC, respectively.

Note that in the standard PBVS, the feature Ja-

cobian is deﬁned with respect to the camera’s velocity

in the camera frame CvC[8], i.e., ˙

e=LeCvC. Such

a relationship is valid only if the target is motionless.

In this paper, the feature Jacobian is deﬁned with re-

spect to the relative velocity TvCbetween two moving

frames, such that Eq. (4) is still valid when the target is

maneuvering. To compute the velocity command CvC,

we use a velocity transformation formula

CvC=CRT0

0CRTTvC+CRT-CRTTtC×

0CRTTvT.

(10)

Here, the target velocity TvTis obtained from Kalman

ﬁlters, as is described in the following section.

4 Pose and Velocity Estimation

While we can estimate the target pose from the percep-

tion system, the visual measurements are usually noisy.

To reduce the measurement noise and to estimate the

unknown target velocity, we build a set of Kalman ﬁl-

ters that can ﬁt to common robotic target motions.

4.1 Dynamic models of the moving target

We list ﬁrst the widely used constant velocity model

and constant acceleration model for a moving target,

then derive a constant frequency model and a com-

pound constant frequency model.

4.1.1 Constant velocity model

When the acceleration is relatively small, we can as-

sume a constant speed in the state update equation and

model the acceleration as white noise, which formulates

to the discrete white noise acceleration (DWNA) model

[2]. Deﬁne the system state as x= (η, ˙η). Then the sys-

tem state-space model is given by

x(k+ 1) = 1Tss

0 1 x(k) + 1

2T2

ss

Tss v(k),(11)

z(k+ 1) = 1 0 x(k+ 1) + ω(k+ 1),(12)

where v(k) is the process noise and ω(k+ 1) is the

measurement noise.

4.1.2 Constant acceleration model

When the acceleration is nearly constant, we use dis-

crete Wiener process acceleration (DWPA) model that

assumes a constant acceleration within each update pe-

riod. Deﬁning the system state as x= (η, ˙η, ¨η), the sys-

tem model is given as

x(k+ 1) =

1Tss 1

2T2

ss

0 1 Tss

0 0 1

x(k) +

1

2T2

ss

Tss

1

v(k),(13)

z(k+ 1) = 100x(k+ 1) + ω(k+ 1).(14)

4.1.3 Constant frequency (CF) model

When the movement is approximately periodic, both

constant velocity model and constant acceleration model

will fail to timely capture the velocity and acceleration

changes. This model mismatch is signiﬁcant when the

moving frequency is fast. The vast amount of periodic

tasks can be decomposed to sinusodial motions from

the Fourier theory. Consider η(t) = asin(Ωt +ϕ) and

its derivative ˙η(t) = aΩ cos(Ωt +ϕ). Let the system

state be x= (η, ˙η). Note that the second derivative

¨η(t) = −aΩ2sin(Ωt+ϕ) = −Ω2η(t). Then we can write

the continuous-time state-space model as

d

dtx(t) = 0 1

-Ω20x(t),Acx(t).(15)

Discretizing Eq. (15) at a sampling time Tss , we have

x(k+ 1)

= eAcTss x(k) = cos(ΩTss ) sin(ΩTss )/Ω

-Ωsin(ΩTss ) cos(ΩTss )x(k).(16)

When the sinusoidal signal has a bias term, i.e., η(t) =

asin(Ωt +ϕ) + b, we can augment the system to in-

clude the bias: x= (η, ˙η, b). The corresponding ex-

tended model with noise is

x(k+ 1)

=

cos(ΩTss ) sin(ΩTss )/Ω 0

-Ωsin(ΩTss ) cos(ΩTss ) 0

0 0 1

x(k) +

1

2T2

ss

Tss

Tss

v(k),

(17)

z(k+ 1) = 101x(k+ 1) + ω(k+ 1).(18)

Robotic Target Following with Slow and Delayed Visual Feedback 5

4.1.4 Compound constant frequency (CCF) model

We consider here the case when the core movement is

a mixture of multiple sinusoids, i.e.,

η(t) =

N

X

i=1

aisin(Ωit+ϕi)+b, Ωi6=Ωj6= 0; ∀i6=j(19)

In order to model the kinematics of the above, let x=

(η1,˙η1, . . . , ηN,˙ηN, b). The state is deﬁned such that the

i-th pair (ηi,˙ηi) corresponds to the i-th frequency com-

ponent, and has the same discrete model as in Eq. (16).

Each (ηi,˙ηi) is independent from other pairs, thus we

have the following state-space model:

x(k+ 1) =

A1

A2

...

AN

1

x(k) +

1

2T2

ss

Tss

.

.

.

.

.

.

Tss

v(k)

(20)

z(k+ 1) = 1 0 1 0 · · · 0 1 x(k+ 1) + ω(k+ 1) (21)

where Aiis a 2 ×2 matrix deﬁned as

Ai=cos(ΩiTss) sin(ΩiTss )/Ωi

-Ωisin(ΩiTss) cos(ΩiTss ), i = 1,· · · , N.

(22)

4.2 Linear position and velocity estimation

We decouple the 3D target motions to x,yand zaxes

and estimate the components independently. As a re-

sult, the system order can be reduced, and we only need

to consider the problem of estimating the position and

velocity in one generic axis. Recall that a relative pose

measurement Cˆ

ξTsampled at 1/Tss Hz is available from

the vision system. Combined with the known camera

pose WξC, the target pose measurements in the world

coordinate system can be obtained as WξT=WξC⊕

Cˆ

ξT.Then the position measurement WtT= (tx, ty, tz)

can be extracted from WξT. For the motion in each

axis, we choose the appropriate dynamic model based

on an assessment of the motion type.

x(k+ 1) = F x(x) + v(k) (23)

z(k+ 1) = Hx(k+ 1) + w(k) (24)

Denote the process noise covariance as Qand the mea-

surement noise covariance as R. Kalman ﬁlter predic-

tion and update steps, Eqs. (25-31), are applied to ﬁlter

the noisy measurement and estimate the position and

velocity for each axis. In the prediction step, we ﬁrst

compute the estimated state x(k+ 1|k) and state co-

variance P(k+ 1|k).

ˆx(k+ 1|k) = Fˆx(k|k) (25)

P(k+ 1|k) = F P (k|k)F0+Q(26)

Here, •(k+1|k) indicates estimated variables given mea-

surements up to and includeing z(k). Similarly, •(k+

1|k+1) indicates estimates given measurement include-

ing z(k+1). The measurement prediction ˆz(k+1|k) and

the corresponding prediction covariance S(k+ 1) are

ˆz(k+ 1|k) = Hˆx(k+ 1|k) (27)

S(k+ 1) = R+HP (k+ 1|k)H0(28)

In the correction step, we compute the Kalman ﬁlter

gain K(k+ 1) and the updated state estimation and

covairance using latest measurement z(k+ 1).

W(k+ 1) = P(k+ 1|k)H0S(k+ 1)−1(29)

ˆx(k+ 1|k+ 1) = ˆx(k+ 1|k)

+W(k+ 1)(z(k+ 1) −ˆz(k+ 1|k))

(30)

P(k+ 1|k+ 1) =

P(k+ 1|k)−W(k+ 1)S(k+ 1)W(k+ 1)0(31)

4.3 Angular position and velocity estimation

In order to build Kalman ﬁlters for estimating the ro-

tation angles and velocities, we ﬁrst extract the mea-

sured rotation matrix WRTfrom WξT, then convert it

to Euler angles (α, β, γ ). Note that there are twelve dif-

ferent representations of the Euler angles and any one

of them can be used. Here we use the roll-pitch-yaw rep-

resentation (i.e., with the rotation order “ZYX”) that

is popular for ships, aircraft and vehicles. Similar to the

linear position estimation, we view each Euler angle as

an individual component and apply Eqs. (25-31) with

the appropriate dynamic model. Note that the physi-

cal meanings of the angular velocity estimates ( ˙α, ˙

β, ˙γ)

depend on the choice of Euler representation. For ex-

ample, in the roll-pitch-yaw representation, ˙

βand ˙γ

are the angular velocities about the rotated yand x-

axes instead of the original ones. We obtain the angu-

lar velocity with respect to the world coordinate system

WvT,a = (ωx, ωy, ωz) by coordinate conversion:

WvT,a = ˙α−→

z+Rz(α)˙

β−→

y+Ry(β) ˙γ−→

x.(32)

Expanding Eq. (32), we have

ωx= ˙γcos(α) cos(β)−˙

βsin(α)

ωy=˙

βcos(α) + ˙γcos(β) sin(α)

ωz= ˙α−˙γsin(β)

(33)

6 Hui Xiao, Xu Chen

5 Interpolation and Delay Compensation

The target position and velocity estimated from the

Kalman ﬁlter are only sampled at fss = 1/Tss Hz.

They also inherit the time delays from the visual mea-

surements. In this section, we build a multi-rate model-

based predictor to construct new data points from the

slow sampled Kalman ﬁlter outputs. Substantially ex-

tending our past theoretical results [12], the proposed

algorithm also compensates the time delays.

5.1 Multi-rate model-based prediction revisited

Consider a continuous-time signal dc(t) and its two dis-

cretized sequences df[n] and ds[n] with sampling pe-

riods of Tsf and Tss =LTsf ), respectively. That is,

df[Ln] = ds[n] = dc(nTss).If there exists a polynomial

model A(z−1) = 1 + a1z−1+· · · +amz−m(am6= 0)

such that A(z−1)df[n] = 0 at the steady state, then

df[nL +k] can be expressed with d[n] as

df[nL +k] = Wk(z−1)d[n]

=wk,0d[n] + wk,1d[n−1] + ···+wk ,m−1d[n−m+ 1].

(34)

The coeﬃcients of Wk(z−1) can be solved from a sys-

tem of linear equations

Mk

fk,1

.

.

.

fk,L(m−1)−m+k

wk,0

.

.

.

wk,m−1

=−

a1

.

.

.

am

0

.

.

.

0

,(35)

where Mk∈R[L(m−1)+k]×[l(m−1)+k]and is given by the

polynomial model A(z−1) (see [12] for details).

5.2 Polynomial models of the target movement

trajectory

Given a discrete sequence d[n], its polynomial model (if

exists) is deﬁned by A(z−1) = 1 + a1z−1+· · · +amz−m

and A(z−1)d[n]→0 when n→ ∞. We show that the

polynomial model of d[n] can be easily obtained from

its Z-transform.

Lemma 1 If a discrete sequence d[n]has Z-transform

Z{d[n]}=B(z−1)/A(z−1), then the polynomial model

of d[n]is A(z−1).

Proof Suppose a discrete linear time-invariant system

with a transfer function B(z−1)/A(z−1). From the prop-

erty of the transfer function, d[n] is equivalent to the

Dynamic model Position Velocity

DWNA (1 −z−1)21−z−1

DWPA (1 −z−1)3(1 −z−1)2

CF 1 −2 cos(ΩTsf )z−1+z−2

CCF QN

i=1 1−2 cos(ΩiTsf )z−1+z−2

Table 1: Polynomial models of the position and velocity

proﬁles for diﬀerent dynamic models

Fig. 2: Illustration of the interpolation and delay com-

pensation procedure. Here, L=Tss /Tsf = 3.

impulse response of the system. That is, A(z−1)d[n] =

B(z−1)δ[n],where δ[n] is the delta impulse signal. Then

we have A(z−1)d[n] = 0 for n>nb, where nbis the or-

der of polynomial B(z−1).

For example, for the constant frequency model, the

ideal velocity proﬁle (sampled at 1/Tsf Hz) is a pure

sinusoid df[n] = aΩ cos(ΩTsf n+ϕ), whose polynomial

model can be found as A(z−1) = 1 −2 cos(ΩTsf )z−1+

z−2by taking the Z-transform of df[n]. Table 1 summa-

rizes the polynomial models of the position and velocity

proﬁles for each dynamic model mentioned in section

4.1.

5.3 Interpolation and delay compensation algorithm

As is shown in subsection 5.1, the MMP can construct a

connection between the historical slow-sampled data set

(i.e., ds[n], ds[n−1] and so on) and a future data point

df[nL +k] in the fast-sampled sequence. This connec-

tion allows to predict the position and velocity sampled

at 1/Tsf Hz using Kalman ﬁlter outputs. Furthermore,

by properly adjusting the kvalue, the measurement

delay time τcan be compensated. In more details, at

each discrete step, we ﬁrst calculate the index kcsuch

that nL +kccorresponds to the current time tc(see

Figure 2). Then we calculate the length of prediction

steps kp= round(τ/Tsf ) needed to compensate the de-

lay time τ. Finally we use k=kc+kpand apply Eq.

(34). The algorithm to compute the fast-sampled data

ˆ

dfwith delay compensated is summarized in Algorithm

1.

A simulated interpolation and delay compensation

result is shown in Figure 3. In the simulation, the fast

and slow sampling times are Tsf = 8ms and Tss =

Robotic Target Following with Slow and Delayed Visual Feedback 7

Algorithm 1: Interpolation and delay com-

pensation (one circle)

input : most recent ds[n], delay time τand current

time tc

output: ˆ

df[n]

1if ds[n]has a new value then

2update the data storage that keeps the most m

recent ds[n] values {ds[n],··· , ds[n−m+ 1]};

3tlast ←tc;

4end if

5kc←(tc−tlast)/Tsf ;

6kp←round(τ/Tsf );

7k←kc+kp;

8From Eq. (35), solve wk,0,··· , wk,m−1;

9From Eq.(34), compute ˆ

df[n] ;

0 0.5 1 1.5 2 2.5 3

Time (seconds)

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

Data

Fig. 3: Simulated interpolation and delay compensation

result.

48ms, respectively. Thus L=Tss/Tsf = 6. The polyno-

mial model used here corresponds to the CCF dynamic

model with two frequencies Ω1= 2π×1.2 rad/s and

Ω2= 2π×3.1 rad/s. Based on Table 1, A(z−1) can be

derived as 1 −3.97z−1+ 5.94z−2−3.97z−3+z−4.The

results shows that MMP can accurately predict the in-

tersample data and compensate the measurement delay

(56ms in this case).

6 Numerical Simulation

In this section, we simulate the performances of the

proposed target following algorithm under various sce-

narios. The simulated algorithm combines the PBVS

considering a moving target (see Section 3), Kalman

ﬁlters (see Section 4) and the interpolation and delay

compensation using MMP (see Section 5). An overview

of the system structure is shown in Figure 4. In the sim-

ulation, the target moves following a ﬁxed 2D track as

is showing in the Figure 5. The target started from posi-

tion (0,0) and then went through the full circle multiple

times. Here, the target pose measurement from the pose

estimation block is simulated by adding a zero-mean

Gaussian noise to the true target pose. The noises are

independently distributed across each axis and have a

standard derivation of 0.01 meters. To exclude the ran-

domness in each simulation runs, we did 50 independent

simulation runs for each scenario and use the averaged

following error for comparison.

In the ﬁrst scenario, the robot controller loop is run-

ning at 125Hz (i.e., Tsf = 8 micro seconds). However,

the pose estimation from the vision system is running

eight times slower and is delayed by 64 micro seconds.

Figure 6 shows the tracking error statistics for the ﬁrst

four laps. Here, we also provide simulation results when

using the basic PBVS (see [8]) or our modiﬁed PBVS

(see Section 3) but without interpolation and delay

compensation. In the ﬁrst lap, we observed that the

following error reduced from its initial value to a signiﬁ-

cant lower level. This can be seen in the error bars where

the maximum errors in the ﬁrst lap all start with around

3 meters and the minimum errors are below 0.1 meters.

In the following laps, the simulation all transient to a

steady-state where the following errors are constrained

in a ﬁxed range. The average tracking errors using the

basic PBVS is 87 millimeters in the steady-state, which

is signiﬁcant considering that the track dimension is

only 320 millimeters by 160 millimeters. Implementing

our improved PBVS which considered a moving target

could reduce the following error. However, the error is

still large due to the slow-sampling and delays in visual

feedback. Finally, using our proposed approach could

reduce the errors to a small range around 3 millime-

ters, yielding a 95% error reduction compared to the

basic PBVS.

In order to analyze the impact of slow-sampling and

time delay, we designed two sets of scenarios where the

target all follow the same track as is shown in Figure 5.

In the ﬁrst set of scenarios, we ﬁxed the system’s sam-

pling speed and the time delays but varied the target

moving speed. The steady-state following errors for the

ﬁrst set of scenarios are shown in Figure 7. Not surpris-

ingly, we observe that the object following performance

downgrades as the target speed increases. When the

target moves 12 times faster than the base level, we see

that our improved PBVS actually induced more error

compared with the baseline, while MMP techniques can

still achieve a 80% error reduction rate.

In the second set of scenarios, we ﬁxed the system’s

sampling speed and the target moving speed but varied

the time delays of the vision system. As is shown in Fig-

ure 8, the following performance will downgrade as the

delay time increase when we applied no delay compen-

sation. With Kalman ﬁlters and MMP, the approached

algorithm could still constrain the following error to a

small level when delay increases.

8 Hui Xiao, Xu Chen

PBVS with

Moving Target

Interpolation

and Delay

Compensation

Kalman Filters Pose

Estimation

Robot

Controller

image data

target

pose

target

pose &

velocity

reconstructed

target pose &

velocity

desired robot

pose relative to

target

fast sampled encoder feedback

velocity

commands joint current

commands moving target

Fig. 4: The overview of the proposed target following algorithm. The dashed lines represent signals that are updated

every Tss seconds while the solid lines represent fast-sampled signals that will update every Tsf seconds.

0 0.05 0.1 0.15 0.2 0.25 0.3

x (meters)

-0.1

-0.05

0

0.05

0.1

y (meters)

Fig. 5: The target moving trajectory used in the simu-

lation.

Lap 1 Lap 2 Lap 3 Lap 4

10-3

10-2

10-1

100

Following error (meters)

4.3e-01

8.7e-02 8.7e-02 8.7e-02

4.4e-01

4.6e-02 4.6e-02 4.5e-02

4.2e-01

3.4e-03 3.3e-03 3.3e-03

Basic PBVS

PBVS with velocity compensation

Our approach

Fig. 6: Comparison of the following errors in each lap.

Each error bar indicates the maximun, minimun and

mean errors in a lap.

7 Experiment Results

The proposed target following algorithm was tested on

a dual-arm robot, as is shown in Figure 9. The left

arm in the ﬁgure holds a target and the right arm has

1x 2x 6x 8x 10x 12x

Speed Multiplier

10-3

10-2

10-1

100

Following error (meters)

8.7e-02 1.2e-01 1.3e-01 1.3e-01 1.3e-01 1.2e-01

1.5e-02

3.0e-02

8.7e-02 1.1e-01 1.4e-01 2.0e-01

3.2e-03 4.3e-03

1.2e-02 1.6e-02 2.0e-02 2.5e-02

Basic PBVS

PBVS with velocity compensation

Our approach

Fig. 7: Comparison of the following error when the

target is moving at diﬀerent levels of speed. At the

base level, target takes 2 seconds to complete one lap.

Each error bar indicates the maximun, minimun and

the mean values of the errors in the last lap.

0 32 64 96 128 160

Delay (micro seconds)

10-3

10-2

10-1

Following error (meters)

8.7e-02 9.8e-02 1.1e-01 1.2e-01 1.3e-01 1.4e-01

1.5e-02

3.1e-02

4.5e-02 6.1e-02 7.5e-02 8.9e-02

3.3e-03 3.5e-03 3.3e-03 3.5e-03 3.3e-03 3.6e-03

Basic PBVS

PBVS with velocity compensation

Our approach

Fig. 8: Comparison of the following error when the vi-

sion system has diﬀerent levels of time dealys. Each er-

ror bar indicates the maximun, minimun and the mean

values of the errors in the last lap.

a camera mounted to the end-eﬀector. The world co-

ordinate system {W}is attached to the base of the

right robot arm. In the experiment, we moved the tar-

get with a 2-D circle trajectory that is parallel to the

Robotic Target Following with Slow and Delayed Visual Feedback 9

Tar get

Cam er a

Table

Fig. 9: Dual-arm robot used to test the target following

algorithm.

W-x-yplane. Speciﬁcally, the linear xand yvelocities of

the target is controlled to be sinusoidal with π/2 phase

diﬀerence. During the target following experiment, the

end-eﬀector’s position and velocity of the left robot are

assumed to be in the 3D space and are unknown to

the right robot, but is used to calculate the following

errors at a high sampling speed for analysis purposes.

Note that the visual following errors are represented re-

spected to the table coordinate system (see Figure 9),

which has a 45 degree orientation diﬀerence from the

robot arm base.

Sampling speed and measurement delay. The

camera used is Mako G192C from Allied Vision Tech-

nology which has 60fps at a full resolution of 1600×1200

pixels. ArUco markers [15] are used for estimating the

target pose in the camera frame. Due to the heavy com-

putation cost of the marker detecting and pose estima-

tion process, the target pose measurements can only

be updated at a maximum rate of about 20Hz. On the

other hand, the robot servo loop is running at a high

sampling rate of 125Hz. To make the fast and slow sam-

pling rate almost integer multiples, we triggered the

camera to acquire images at a ﬁxed rate of 17.85fps,

then L=Tss/Tsf = 7. The measurement delays can be

monitored by adding time stamps to the images when

captured. In our experiment setup, the measurement

delays were about 50ms.

Two scenarios were tested using the proposed tar-

get following algorithm. In the ﬁrst scenario the tar-

get moved at 0.5Hz, while in the second scenario, the

target moved at a higher frequency of 1Hz. In both

scenarios, the CF model was used for linear xand y

axes for the Kalman ﬁltering; DWNA model was used

for other axes. Then the appropriate polynomial mod-

els were chosen from Table 1 and MMPs were applied

for all axes. Note that if a time-invariant polynomial

0.1 0.2 0.3 0.4 0.5

x axis (meter)

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

y axis (meter)

target trajectory

robot trajectory with level 1 control

robot trajectory with level 2 control

robot trajectory with level 3 control

robot trajectory with level 4 control

Fig. 10: Moving trajectories of the target and robot in

2D.

model is assumed, then the MMP parameters w0

k,is can

be calculated oﬀ-line to save the on-line computation

cost.

Figure 11a shows the target following errors of the

linear x,yand zaxes in the ﬁrst scenario. Here, we

tested four levels of target following algorithms. The

ﬁrst level only uses the basic PBVS algorithm which is

designed based on a static target assumption [8]. The

second level considered a moving target, and the esti-

mated target positions and velocities from the Kalman

ﬁlters are directly used. The third level not only consid-

ered a moving target but also compensated the delays

using MMP. In order to show the importance of fast

sampling rate of measurement, we down-sampled the

MMP outputs in the third level to the slow sampling

rate 1/Tss Hz. Finally, in the fourth level, both Kalman

ﬁlters and MMP are used without down-sampling. Statis-

tic result of the target following error are shown in Fig-

ure 11b, where the 3σvalue plus the absolute mean of

the errors are computed for comparison. The same per-

formance improvements can also be observed by com-

paring the moving trajectories of the target and robot

(see Figure 10).

One can clearly observe the performance improve-

ment when the target velocity estimation, delay com-

pensation, and interpolation techniques are added to

the algorithm.

For the second scenario, the following error of lin-

ear x,yand z-axes are shown in Figure 11c and Figure

11d. The same four levels of target following algorithm

are tested. Diﬀerent from the ﬁrst scenario, the ﬁrst

three levels actually failed to reduce the following er-

ror. This is because when the target moves at a higher

frequency, the consequences of measurement delay and

slow measurement sampling become more signiﬁcant.

10 Hui Xiao, Xu Chen

As a result, it requires both delay compensation and

interpolation to reduce the following error.

8 Conclusion and Future Work

This paper proposed a method to track a moving target

with fast dynamics with only slow sampled and delayed

visual feedback. We discussed a general method to track

3D target movement with partially known target veloc-

ity. The proposed approach estimates target position

and velocity with Kalman ﬁlters and builds multi-rate

model-based predictions to reconstruct fast motion pro-

ﬁles and to compensate the time delay. The result from

a dual-arm robot experiment veriﬁed the eﬀectiveness

of the proposed algorithm and demonstrated the danger

of slow-sampled and delay visual feedback.

The current proposed approach assumes that the

target follows a time-invariant model during tracking.

Adaptive ﬁltering techniques can be further investigated

for applications with time-variant target models. An-

other future work is extended robustness in presence of

more uncertainties in the interpolation and delay com-

pensation.

References

1. Aldoma, A., Marton, Z.C., Tombari, F., Wohlkinger, W.,

Potthast, C., Zeisl, B., Rusu, R.B., Gedikli, S., Vincze,

M.: Tutorial: Point cloud library: Three-dimensional ob-

ject recognition and 6 dof pose estimation. IEEE

Robotics & Automation Magazine 19(3), 80–91 (2012)

2. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation

with applications to tracking and navigation: theory al-

gorithms and software. John Wiley & Sons (2004)

3. Barrois, B., W¨ohler, C.: 3d pose estimation of vehicles us-

ing stereo camera. Encyclopedia of Sustainability Science

and Technology pp. 10589–10612 (2012)

4. Bateux, Q., Marchand, E., Leitner, J., Chaumette, F.,

Corke, P.: Training deep neural networks for visual servo-

ing. In: 2018 IEEE International Conference on Robotics

and Automation (ICRA), pp. 3307–3314 (2018)

5. Bensalah, F., Chaumette, F.: Compensation of abrupt

motion changes in target tracking by visual servoing. In:

Proceedings 1995 IEEE/RSJ International Conference on

Intelligent Robots and Systems. Human Robot Interac-

tion and Cooperative Robots, vol. 1, pp. 181–187. IEEE

(1995)

6. Casta˜no, A., Hutchinson, S.: Visual compliance: Task-

directed visual servo control. IEEE transactions on

Robotics and Automation 10(3), 334–342 (1994)

7. Chalimbaud, P., Berry, F.: Embedded active vision sys-

tem based on an fpga architecture. EURASIP Journal

on Embedded Systems 2007(1), 26–26 (2007)

8. Chaumette, F., Hutchinson, S.: Visual servo control. I.

basic approaches. IEEE Robotics & Automation Maga-

zine 13(4), 82–90 (2006)

9. Chaumette, F., Hutchinson, S.: Visual servo control. ii.

advanced approaches [tutorial]. IEEE Robotics & Au-

tomation Magazine 14(1), 109–118 (2007)

10. Chaumette, F., Rives, P., Espiau, B.: Classiﬁcation and

realization of the diﬀerent vision-based tasks. In: Vi-

sual Servoing: Real-Time Control of Robot Manipulators

Based on Visual Sensory Feedback, pp. 199–228. World

Scientiﬁc (1993)

11. Chen, J., Dawson, D.M., Dixon, W.E., Behal, A.:

Adaptive homography-based visual servo tracking for a

ﬁxed camera conﬁguration with a camera-in-hand exten-

sion. IEEE Transactions on Control Systems Technology

13(5), 814–825 (2005)

12. Chen, X., Xiao, H.: Multirate forward-model distur-

bance observer for feedback regulation beyond nyquist

frequency. Systems & Control Letters 94, 181–188 (2016)

13. Delbr¨uck, T., Linares-Barranco, B., Culurciello, E.,

Posch, C.: Activity-driven, event-based vision sensors. In:

Proceedings of 2010 IEEE International Symposium on

Circuits and Systems, pp. 2426–2429. IEEE (2010)

14. Firouzi, H., Najjaran, H.: Real-time monocular vision-

based object tracking with object distance and motion

estimation. In: 2010 IEEE/ASME International Confer-

ence on Advanced Intelligent Mechatronics, pp. 987–992.

IEEE (2010)

15. Garrido-Jurado, S., Munoz-Salinas, R., Madrid-Cuevas,

F.J., Medina-Carnicer, R.: Generation of ﬁducial marker

dictionaries using mixed integer linear programming.

Pattern Recognition 51, 481–491 (2016)

16. Janabi-Shariﬁ, F., Marey, M.: A kalman-ﬁlter-based

method for pose estimation in visual servoing. IEEE

transactions on Robotics 26(5), 939–947 (2010)

17. Lepetit, V., Moreno-Noguer, F., Fua, P.: Epnp: An ac-

curate O(n) solution to the PnP problem. International

journal of computer vision 81(2), 155 (2009)

18. Lin, C.Y., Wang, C., Tomizuka, M.: Pose estimation in in-

dustrial machine vision systems under sensing dynamics:

A statistical learning approach. In: 2014 IEEE Interna-

tional Conference on Robotics and Automation (ICRA),

pp. 4436–4442. IEEE (2014)

19. Luo, R.C., Chou, S.C., Yang, X.Y., Peng, N.: Hybrid

eye-to-hand and eye-in-hand visual servo system for par-

allel robot conveyor object tracking and fetching. In:

IECON 2014-40th Annual Conference of the IEEE In-

dustrial Electronics Society, pp. 2558–2563. IEEE (2014)

20. Malis, E., Chaumette, F., Boudet, S.: 2 1/2 d visual ser-

voing. IEEE Transactions on Robotics and Automation

15(2), 238–250 (1999)

21. Mohebbi, A., Keshmiri, M., Xie, W.F.: An eye-in-hand

stereo visual servoing for tracking and catching moving

objects. In: Proceedings of the 33rd Chinese Control Con-

ference, pp. 8570–8577. IEEE (2014)

22. Morel, G., Liebezeit, T., Szewczyk, J., Boudet, S., Pot, J.:

Explicit incorporation of 2d constraints in vision based

control of robot manipulators. In: Experimental Robotics

VI, pp. 99–108. Springer (2000)

23. Nakabo, Y., Ishikawa, M., Toyoda, H., Mizuno, S.: 1 ms

column parallel vision system and its application of high

speed target tracking. In: Proceedings 2000 ICRA. Mil-

lennium Conference. IEEE International Conference on

Robotics and Automation. Symposia Proceedings (Cat.

No. 00CH37065), vol. 1, pp. 650–655. IEEE (2000)

24. Sadeghi, F.: Divis: Domain invariant visual servoing for

collision-free goal reaching. In: Robotics: Science and

Systems (RSS) (2019)

25. Siradjuddin, I., Behera, L., McGinnity, T.M., Coleman,

S.: A position based visual tracking system for a 7

dof robot manipulator using a kinect camera. In: The

2012 international joint conference on neural networks

(IJCNN), pp. 1–7. IEEE (2012)

Robotic Target Following with Slow and Delayed Visual Feedback 11

(a) Following error when target moves at 0.5Hz.

no tracking level 1 level 2 level 3 level 4

level of the tracking algorithm

0

0.05

0.1

0.15

Following error (meters)

error in x axis

error in y axis

error in z axis

(Proposed

Approach)

(b) Following error statistics when target moves at 0.5Hz.

(c) Following error when target moves at 1Hz.

no tracking level 1 level 2 level 3 level 4

level of the tracking algorithm

0

0.02

0.04

0.06

0.08

0.1

Following error (meters)

error in x axis

error in y axis

error in z axis

(Proposed

Approach)

(d) Following error statistics when target moves at 1Hz.

Fig. 11: Following errors for diﬀerent levels of controls in the experiment.

26. Wang, C., Lin, C.Y., Tomizuka, M.: Visual servoing con-

sidering sensing dynamics and robot dynamics. IFAC

Proceedings Volumes 46(5), 45–52 (2013). 6th IFAC

Symposium on Mechatronic Systems

27. Wang, C., Lin, C.Y., Tomizuka, M.: Design of kinematic

controller for real-time vision guided robot manipulators.

In: 2014 IEEE International Conference on Robotics and

Automation (ICRA), pp. 4141–4146. IEEE (2014)

28. Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.:

PoseCNN: A convolutional neural network for 6d object

pose estimation in cluttered scenes. In: Robotics: Science

and Systems (RSS) (2018)

29. Xiao, H., Chen, X.: Following fast-dynamic targets with

only slow and delayed visual feedback: A kalman ﬁlter

and model-based prediction approach. In: Dynamic Sys-

tems and Control Conference (2019)

30. Zhang, M., Liu, X., Xu, D., Cao, Z., Yu, J.: Vision-based

target-following guider for mobile robot. IEEE Transac-

tions on Industrial Electronics (2019)

31. Zheng, Y., Kuang, Y., Sugimoto, S., Astrom, K., Oku-

tomi, M.: Revisiting the PnP problem: A fast, general

and optimal solution. In: Proceedings of the IEEE Inter-

national Conference on Computer Vision, pp. 2344–2351

(2013)

A preview of this full-text is provided by Springer Nature.

Content available from International Journal of Intelligent Robotics and Applications

This content is subject to copyright. Terms and conditions apply.