Active visual features based on events to guide
robot manipulators in tracking tasks
P. Gil*, G.J. García**, C.M. Mateo*, F. Torres**
*Computer Science Research Institute, University of Alicante, San Vicente del Raspeig,
Spain (Tel: +34-965-903400; e-mail: {pablo.gil,cm.mateo}@ua.es)
** Physics, Systems Engineering and Signal Theory Department, University of Alicante, San Vicente del Raspeig,
Spain (Tel: +34-965-903400; e-mail:{gjgg, fernando.torres}@ua.es)
Abstract: Traditional visual servoing systems do not deal with the tracking of moving objects. When these systems are employed to track a moving object, depending on the object velocity, the visual features can leave the image, causing the tracking task to fail. This occurs especially when the object and the robot are both stopped and then the object starts to move. In this work, we have employed a retina camera based on Address Event Representation (AER) in order to use events as the input of the visual servoing system. The events launched by the camera indicate pixel movement. Event visual information is processed only at the moment it occurs, reducing the response time of visual servoing systems when they are used to track moving objects.
Keywords: AER, asynchronous vision sensor, visual servoing, robot vision systems, spike events, image
motion.
1. INTRODUCTION
One of the most valuable advantages of an image-based visual servoing scheme is its robustness to calibration errors, Espiau (1994). These systems have been proven to be robust not only to large errors in the estimation of the camera intrinsic parameters, but also to calibration errors between the camera and the robot's 3D Cartesian space, Malis et al. (2010). Thus, these systems have been widely used in the literature for guiding a robot manipulator in unstructured workspaces, Chaumette et al. (2006). Nevertheless, one main component of these systems is often simplified: the visual features. This simplification prevents image-based visual servoing from dealing with unstructured environments.
In general, in classical visual servoing systems, features can be segments, edge points, etc., obtained from gradient operations, Marchand et al. (2005). These features identify keypoints of the object in the scene and allow us to guide robots and generate trajectories between an initial pose and a desired pose. The object is usually a pattern with fiducial marks, Fiala (2010). Several efforts have been made in the literature to deal with this limitation. By the end of the 20th century, Janabi-Sharifi et al. (1997) proposed an automatic visual feature selection method to broaden the applicability of the system. More recently, works such as Chaumette (2004) or Kragic et al. (2001) explored different visual features, such as image moments or cue integration, which can be obtained directly from the object to be tracked. The topic continues to be studied in the literature, Gratal et al. (2012). In this last work, Gratal et al. described a visual servoing scheme to track unknown objects by using a 3D model tracking based on virtual visual servoing.
In this paper, we implement the behaviour of a classical image-based visual servoing system using Address Event Representation (AER) technology, Lichtsteiner et al. (2008). In this case, we have changed the way in which the visual features are obtained. The camera used is a DVS128 retina camera, whose behaviour is very different from that of classical cameras based on CMOS or CCD sensors. With the DVS128, features are not obtained from gradients in an image; they are computed from pixel-events. A clustering process is then carried out to obtain active visual features from the pixel-events. The data stream is reduced by using event-based control theory, Åström (2008). An event is something that occurs and requires some response. The basic idea is to communicate, compute, or control only when something significant occurs in the system. Event-based control has been applied to many fields. Sánchez et al. (2009) used events to control the level of a water tank. García et al. (2013) proposed a visual controller based on events to avoid the loss of features in the image.
In addition, in indirect visual servoing systems, the joint pose of the robot is not controlled. A classical image-based visual servoing scheme controls the movement of the robot's end-effector from the visual difference between any position of the features and the final one. Traditionally, visual servoing systems do not deal with moving objects: only the robot moves in the scene to achieve the desired pose. Visual servoing was born as a positioning technique, Espiau et al. (1992). An eye-in-hand configuration refers to a camera setup where the camera is mounted at the robot's end-effector. The knowledge of the fixed camera pose with respect to the robot's end-effector allows us to talk about the end-effector velocity instead of the camera
velocity without loss of generality. The classical image-based visual servoing control law computes the camera velocity that minimizes the error between the current position and the desired one, Chaumette et al. (2006). When the reference object (from which the visual features are computed) is moving in the scene, the controller must take this movement into account. A new term is added to the control law, and this new term must be estimated. One of the most widely used techniques to estimate it is a Kalman filter, as can be seen in the recent work of Liu et al. (2012). In early works, such as Cretual et al. (2001), the authors proposed a method to perform visual servoing based on movement. First, they used geometric features retrieved by integration of motion without changing the classical control laws. Second, they used a motion field to model the features in the image sequence. In those works, the amount of information to be processed was high, due to the very dense map of features generated from the motion field, compared with classical visual systems that use only four features. To address this problem, we have used a retina camera based on an AER sensor. The pixels of an image captured by an AER sensor are only represented when they have been affected by luminosity changes. If a pixel changes, it represents a keypoint of a moving object. The key is the small data volume generated by AER sensors, in contrast to classic cameras, which represent the information of all pixels in the sensor.
The main goal of this work is to design a visual servoing system to guide a robot manipulator by tracking a moving object without fiducial markers, reducing the complexity of high-level computer vision algorithms in order to work in real time. After an initial step in which the object starts its movement and the robot is stopped, the proposed system switches to a classical image-based visual servoing scheme to perform the tracking.
The paper is structured as follows. Section 2 shows the basics of AER technology and an approach to select events in the frames. Section 3 presents a strategy to obtain active visual features from these events, along with the visual control system used to track moving objects from the computed active visual features. Section 4 presents different experiments to validate the proposal, and finally, Section 5 discusses the main conclusions.
2. EVENT-FEATURES FOR ROBOT CONTROL
2.1 Event Representation System: AER technology
The AER sensor generates digital images composed of 128 rows and 128 columns, so it has 16384 candidate pixel-events. The value of each pixel is computed from the changes in luminosity as a brightness derivative in time. These changes in luminosity are obtained when an object moves in front of the retina camera, provided the ambient light does not vary. In the DVS128, two encoders (one for rows and one for columns) generate two packets, each composed of a 7-bit code, that are transmitted. The code of each packet represents the position of a pixel in a row and a column. Each packet is sent over a parallel bus composed of 7 communication lines, one bit per line, so that the whole code is transmitted simultaneously. Nevertheless, AER technology in Dynamic Vision Sensors (DVS) only sends the code (the position of a pixel) when the brightness of that pixel has changed in time. Moreover, the retina camera never transmits the intensity of the pixels (brightness values in [0-255]), only the positions of the pixels that change. The brightness values cannot be obtained because they are codified using Pulse Frequency Modulation (PFM).
AER works in three steps. First, the system counts the signal spikes. These spikes are used to determine the luminosity changes when movement occurs. Second, the spike frequency is saved in a table in memory. The position index of the table represents the pixel position (row and column); thus, the number of luminosity changes is associated to each pixel. Thereby, the retina camera digitizes the number of luminosity changes, and each pixel represents the number of times the brightness has changed, not the brightness value itself. Each pixel therefore represents a temporal derivative of the intensity produced by the moving objects in the scene. This implies that only the information of moving objects can be seen in the resulting image.
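As an illustration of this accumulation step, the following minimal Python sketch emulates the in-memory table on the host side; the function name and the windowing over [t_start, t_end) are assumptions for illustration, not part of the DVS128 interface.

```python
import numpy as np

SENSOR_SIZE = 128  # DVS128 resolution: 128 x 128 candidate pixel-events

def accumulate_event_map(events, t_start, t_end):
    """Count, for each pixel, how many brightness changes (spikes) were
    signalled in the time window [t_start, t_end).

    `events` is an iterable of (t, p, x, y) tuples as formalised in
    Section 2.2: timestamp in seconds, polarity (1 = ON, 0 = OFF) and
    pixel coordinates."""
    event_map = np.zeros((SENSOR_SIZE, SENSOR_SIZE), dtype=np.uint32)
    for t, p, x, y in events:
        if t_start <= t < t_end:
            event_map[y, x] += 1  # number of luminosity changes, not brightness
    return event_map

# Only moving objects produce non-zero cells in the resulting map.
if __name__ == "__main__":
    fake_events = [(0.001, 1, 10, 12), (0.002, 0, 10, 12), (0.004, 1, 90, 64)]
    print(accumulate_event_map(fake_events, 0.0, 0.01).sum())  # -> 3
```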
2.2 Classification and selection of events
AER works in real time: it processes the captured information in a continuous way, and the information flow cannot be stopped. The implementation therefore requires processing or discarding the acquired information. Thus, the computed events, their classification, and the visual features computed from the events can be automatically self-corrected over time. Fig. 1 shows the approach used to estimate visual features from the motion represented by the pixel-events computed from AER.
Fig. 1. Our method to classify events and estimate visual features (pipeline: pixel-event stream → calculation of transition-events (ON/OFF transitions) → filtering by time (event map) → clustering (event-clusters) → estimation of visual features).
In Fig. 2, the dots are a series of pixel-events with either positive polarity (labelled 'ON') or negative polarity (labelled 'OFF'). The polarity is positive when a positive change in brightness, and thus a negative change in contrast, is detected.
Fig. 2. a) Original AER image (without filter). b) Filtered image of a moving scene.
Each pixel-event is represented by a tuple (tk, pk, xk, yk), where the scalar tk is the timestamp at which the event is generated, measured in seconds; pk is the polarity, '1' when the event is ON and '0' when it is OFF; and the coordinates (xk, yk) give the position of the pixel-event in the frame. The index k = 1…n, where n is the number of pixel-events detected in the frame.
In each frame, pixel-events whose activation is decorrelated with the scene can appear. This variability can be treated as unpredictable noise. This random noise is due to the silicon substrate characteristics of the circuits of the hardware device (DVS128). The noise is attenuated using a median filter.
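A small sketch of this denoising step is given below; the 3x3 kernel size and the use of scipy.ndimage.median_filter on an accumulated event map are assumptions, since the text only states that a median filter is applied.

```python
import numpy as np
from scipy.ndimage import median_filter

def denoise_event_map(event_map, kernel_size=3):
    """Attenuate isolated, uncorrelated pixel-events (sensor noise) with a
    median filter; kernel_size=3 is an assumption, the paper only states
    that a median filter is used."""
    return median_filter(event_map, size=kernel_size)

# Isolated single-pixel activations (typical DVS background activity) are
# removed, while compact regions produced by a moving object survive.
noisy = np.zeros((128, 128), dtype=np.uint8)
noisy[5, 5] = 1              # isolated noise event
noisy[60:64, 60:64] = 1      # compact moving-object region
clean = denoise_event_map(noisy)
assert clean[5, 5] == 0 and clean[61, 61] == 1
```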
In our approach, an idea similar to Censi et al. (2013) has been used. Thus, we look for both ON-OFF and OFF-ON transitions in a frame sequence. Each transition is modelled as (τ, Δt, Δp, xk, yk), where τ is the instant of time at which the transition is detected; the time interval Δt = tk − tk−j is the transition time, i.e. the time elapsed from the generation of a pixel-event until the instant when this pixel-event changes its polarity; and Δp identifies which of the two transition types occurred. The transition history represents the state of a particular pixel-event over time (Fig. 3).
Fig. 3. Calculation of transition-events (τ, Δt, Δp, xk, yk) from each pixel-event stream (tk, pk, xk, yk): a sequence of ON/OFF pixel-events at timestamps tk−3 … tk+4 yields OFF-ON and ON-OFF transitions, with Δt measured between consecutive polarity changes (e.g. Δt = tk+4 − tk).
In order to classify events, our method uses the event transition and its associated time. A high transition time indicates a pixel in the frame with slow movement, whereas a low transition time indicates that the pixel is moving quickly. Our method then filters each frame to show only the transitions selected according to the speed of movement represented in the frame sequence. Hence, we can easily find active regions, which represent movement in a frame from a real dynamic scene.
e(t_k, \Delta t, \Delta p, x_k, y_k) = \begin{cases} 1, & \tau_1 \le \Delta t \le \tau_2 \\ 0, & \text{otherwise} \end{cases}    (1)

In our case, two thresholds, τ1 and τ2, are used to detect the type of movement. Using (1), these thresholds allow us to filter fast movement (when the transition time is low), slow movement (when the transition time is high), or movement at a speed lying between both thresholds. Thereby, each transition (tk, Δt, Δp, xk, yk) is labelled according to (1).
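The following sketch shows one possible implementation of this labelling; the pairing of consecutive events of opposite polarity at the same pixel follows the definition of Δt above, while the function names and the choice of τ1 and τ2 are assumptions.

```python
def compute_transitions(events):
    """Turn a chronologically ordered stream of pixel-events (t, p, x, y)
    into transition-events (tau, dt, dp, x, y): tau is the detection time,
    dt the time elapsed since the last event of opposite polarity at the
    same pixel, dp the transition type ('ON-OFF' or 'OFF-ON')."""
    last = {}          # (x, y) -> (t, p) of the previous event at that pixel
    transitions = []
    for t, p, x, y in events:
        if (x, y) in last:
            t_prev, p_prev = last[(x, y)]
            if p_prev != p:  # polarity changed -> a transition occurred
                dp = 'ON-OFF' if p_prev == 1 else 'OFF-ON'
                transitions.append((t, t - t_prev, dp, x, y))
        last[(x, y)] = (t, p)
    return transitions

def label_transition(dt, tau_1, tau_2):
    """Equation (1): keep a transition only if its transition time lies
    between the two thresholds (tau_1 <= dt <= tau_2)."""
    return 1 if tau_1 <= dt <= tau_2 else 0
```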
The experiment in Figs. 4 and 5 consists of the observation of two objects moving at constant linear velocity. The surface of the objects is homogeneous (they have the same texture and colour), but they move at different linear velocities and along different trajectories in a plane. The background is a black surface, and the objects are two autonomous mobile robots programmed from open code. In this experiment, the detection of the objects is computed for different known speeds, obtained by driving the DC motors that move the mobile robots.

Fig. 4 shows the ability of the proposed filter to analyse the quantity of movement in real time using the continuous train of pixel-events from the camera. Fig. 4a shows the scene without filtering. Fig. 4b and Fig. 4c show the detection of the pixel-events that represent the surfaces of each mobile robot at high velocity and at low velocity, respectively. The objects are computed by accumulating pixel-events for a specific duration, which depends on the movement velocity. The fast movement provides the best estimation of the object position because it yields a dense and accurate event map.
In order to cluster the event-image, a map based on a 3-D array is computed. The map is used to save references among groups having similar timestamps and spatial closeness. Thereby, C = {C1,…,Cm} are the groups of events, where m is their number. The number of groups detected depends on the thresholds used in (1), and therefore on the timestamps at which the events are detected. In our case, m = 3 because two thresholds were used. These clusters represent the background (objects without movement), the foreground (objects moving quickly in front of the camera) and other objects that are moving slowly, far away from the camera. Each cluster, Ci, is defined as:
C_i(\bar{t}_i, \bar{x}_i, \bar{y}_i): \quad \bar{t}_i = \frac{1}{n_i} \sum_{e \in C_i} t_e, \qquad (\bar{x}_i, \bar{y}_i) = \frac{1}{n_i} \sum_{e \in C_i} (x_e, y_e)    (2)

\left| t_e - \bar{t}_i \right| \le \varepsilon_{temporal}    (3)

\sqrt{(x_e - \bar{x}_i)^2 + (y_e - \bar{y}_i)^2} \le r_{distance}    (4)

where n_i is the number of events in cluster C_i.
Fig. 4. Event selection from AER images. a) All events. b) High linear velocity. c) Low linear velocity. The detected groups are labelled as clusters C1, C2 and C3.
In order to obtain each cluster, the event's timestamp is compared with the average timestamp of the neighbouring positions (xk+dx, yk+dy). Thus, an event is an item of the cluster if it satisfies (3) and (4), that is, if its difference with respect to the average timestamp is less than a tolerance value, ε_temporal, and if the position of the event lies within a neighbourhood of radius r pixels.
In the experiment shown in Figs. 4 and 5, we measure the ability to filter the movement of several objects moving at constant but different velocities. The constraints imposed on the timestamps of the events through the thresholds allow us to estimate the position of the objects at each instant of time.
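A minimal sketch of this grouping rule is shown below, assuming a greedy assignment in which a transition-event joins the first existing cluster satisfying (3) and (4) and seeds a new cluster otherwise; eps and radius correspond to ε_temporal and r_distance, while the greedy strategy and the incremental averaging are assumptions.

```python
import math

class Cluster:
    """Running averages (t_bar, x_bar, y_bar) of the assigned events, as in
    equation (2)."""
    def __init__(self, t, x, y):
        self.n, self.t_bar, self.x_bar, self.y_bar = 1, t, x, y
        self.events = [(t, x, y)]

    def add(self, t, x, y):
        self.events.append((t, x, y))
        self.n += 1
        self.t_bar += (t - self.t_bar) / self.n
        self.x_bar += (x - self.x_bar) / self.n
        self.y_bar += (y - self.y_bar) / self.n

def cluster_events(transitions, eps, radius):
    """Assign each transition-event (tau, dt, dp, x, y) to a cluster when it
    is temporally close (eq. 3) and spatially close (eq. 4) to the cluster
    averages; otherwise start a new cluster."""
    clusters = []
    for tau, dt, dp, x, y in transitions:
        for c in clusters:
            close_in_time = abs(tau - c.t_bar) <= eps                        # (3)
            close_in_space = math.hypot(x - c.x_bar, y - c.y_bar) <= radius  # (4)
            if close_in_time and close_in_space:
                c.add(tau, x, y)
                break
        else:
            clusters.append(Cluster(tau, x, y))
    return clusters
```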
Fig. 5. Number of events integrated in each object shown in Fig. 4, per frame (frames 140-340): top, scene without filter; bottom, slow and fast object (green and blue).
Fig. 5 shows the number of events for the scene without filter and for each detected event-cluster over a sequence of frames. The first and last frames have been removed before measuring the number of events. In this case, the detected event-clusters represent the grouped pixel-events of each moving object. In addition, the average and standard deviation have been drawn over the movement sequence in order to estimate the observed errors. The position errors are due to the variability in the number of events computed at two consecutive times, tk and tk+1. Although the objects move at constant speed, the number of events in their clusters changes. This happens because the sensor emits noise impulses and spontaneous events occur randomly. In addition, the optical characteristics, the lens mounted on the sensor, the surface geometry of each mobile object, and how these surfaces reflect the light as their position and orientation change also determine this error.
3. VISUAL FEATURES AND TRACKING SYSTEM
3.1 Active visual features from selected events
The real-time processing needed to compute the events limits the quality and precision of the features to be used in visual servoing systems. In visual servoing systems, the features must be obtained robustly, accurately and easily. In this way, a good result is guaranteed and the robot can be successfully guided using the extracted features. Traditionally, this is achieved because the visual features are computed from fiducial markers in the image. A disadvantage is that the high-level image processing slows down the behaviour of image-based visual servoing systems. To solve this, in this paper the visual features are computed from previously selected events (Fig. 6). These events must satisfy several criteria among all the events detected by the retina camera. The criteria are discussed below.
Each cluster contains two groups of surviving pixel-event transitions, ON-OFF and OFF-ON. These two groups define a solid object. The first step is the computation of the position of the cluster. The position is obtained by measuring the pixel-event distribution of the two groups, computing the median in each case. The median provides better detection than the mean because, as a measure of central tendency, it is not influenced by noise in the cluster and is much more robust. The noise in the cluster can be identified by a few extreme values or by badly defined group contours. The Cartesian coordinates of these two medians define the visual features as two points in each frame.

In addition, the line segment connecting these points determines the major axis of the object, and the angle of this axis defines the orientation of the object. In order to obtain two additional points as visual features, the middle point of the major axis is computed. Then, a new line segment perpendicular to the major axis is computed through that point. The two additional features are two virtual points located at the same distance from the middle point. Therefore, the visual points are the endpoints of the two axes (Fig. 6).
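A sketch of this construction follows; the only assumption beyond the text is the half-length chosen for the virtual perpendicular segment (here half of the major-axis length), since the paper only states that the two virtual points are equidistant from the middle point.

```python
import numpy as np

def active_visual_features(on_off_events, off_on_events, half_width=None):
    """Build the four active visual features of a cluster.

    on_off_events / off_on_events: arrays of (x, y) positions of the two
    groups of surviving transitions.  f1, f2 are the per-group medians
    (robust to outliers); f3, f4 are virtual points on the perpendicular
    through the midpoint of the f1-f2 segment."""
    f1 = np.median(np.asarray(on_off_events), axis=0)
    f2 = np.median(np.asarray(off_on_events), axis=0)
    mid = 0.5 * (f1 + f2)
    axis = f2 - f1
    length = np.linalg.norm(axis)
    if half_width is None:
        half_width = 0.5 * length     # assumption: same extent as the major axis
    normal = np.array([-axis[1], axis[0]]) / max(length, 1e-9)  # unit perpendicular
    f3 = mid + half_width * normal
    f4 = mid - half_width * normal
    orientation = np.arctan2(axis[1], axis[0])  # major-axis angle = object orientation
    return np.vstack([f1, f2, f3, f4]), orientation
```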
Fig. 6. Computation of the active visual features s = {f1, f2, f3, f4} (x and y coordinates in pixel-events, 0-120). OFF-ON and ON-OFF events are represented by blue and black asterisks.
3.2 Visual servoing
Image-based visual servoing uses only visual features extracted from the acquired images, s, to control the robot. Therefore, these controllers need neither a complete 3-D model of the scene nor a perfect camera calibration. The desired visual features, sd, are obtained from a desired final position of the robot in the scene. Image-based visual
servoing controllers minimize the error between any robot position and the goal position by minimizing the error of the visual features computed from the images acquired at each robot position, e = (s − s_d). To minimize this error e exponentially, a proportional control law is used:

\dot{e} = -\lambda e    (5)

where λ is a proportional gain.
In a basic image-based visual servoing approach, the velocity of the camera, v_c, is the command input for controlling the robot movements. To obtain the control law, the interaction matrix, L_s, must first be introduced, Chaumette et al. (2006). The interaction matrix relates the variations of the visual features in the image with the variations of the pose of the camera in 3D space, i.e. its velocity:

\dot{s} = L_s v_c    (6)
From Equation (5) and Equation (6), the velocity of the camera that exponentially minimizes the error in the image is obtained:

v_c = -\lambda \hat{L}_s^{+} (s - s_d)    (7)

where \hat{L}_s^{+} is the pseudoinverse of an approximation of the interaction matrix. This camera velocity is then transformed to obtain the velocity to be applied to the end-effector of the robot. To do this, the constant relation between the camera and the end-effector is used in a camera-in-hand configuration (i.e., the camera is mounted at the end-effector of the robot).
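A compact sketch of this control law for four point features is given below, using the standard interaction matrix of an image point (Chaumette et al., 2006) evaluated at an assumed constant depth Z_hat; the gain lam, the depth estimate and the numerical values are placeholders, not the parameters used in the experiments.

```python
import numpy as np

def point_interaction_matrix(x, y, Z):
    """Interaction matrix of a normalized image point (x, y) at depth Z,
    relating its image velocity to the camera twist (vx, vy, vz, wx, wy, wz)."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x ** 2), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y ** 2, -x * y, -x],
    ])

def ibvs_camera_velocity(s, s_d, Z_hat, lam=0.5):
    """Equation (7): v_c = -lambda * pinv(L_hat) * (s - s_d), with L_hat
    stacked from the four point features; lam and Z_hat are assumptions."""
    L = np.vstack([point_interaction_matrix(x, y, Z_hat) for x, y in s])
    error = (np.asarray(s) - np.asarray(s_d)).reshape(-1)
    return -lam * np.linalg.pinv(L) @ error  # 6-vector camera twist

# Example: features slightly to the right of their goals command a corrective twist.
s   = [(0.11, 0.0), (0.11, 0.1), (0.21, 0.0), (0.21, 0.1)]
s_d = [(0.10, 0.0), (0.10, 0.1), (0.20, 0.0), (0.20, 0.1)]
print(ibvs_camera_velocity(s, s_d, Z_hat=0.6))
```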
In this work, we have developed a tracking system for objects moving in the robot's workspace. The tracking has been divided into three different phases (see Fig. 7).

The first stage obtains an initial time, τt, that can be used to filter the object to be tracked (see Section 2).

The second stage guides the robot manipulator in the tracking of the initial movement of the object (which may cause a classical visual servoing scheme to fail by losing the visual features in the image). Active visual features (AVF) are used in the control law (7) in order to perform a quick tracking of the initial movement of the object. This allows the system to guide the robot when the object radically changes its velocity, under conditions where classical image-based visual servoing can lose its visual features.

Finally, when the active visual features are lost, a classical image-based visual servoing scheme continues the tracking of the moving object.

The first stage has been detailed in Section 2. For the second stage, the active visual features, s, are obtained as described in Section 3.1. The desired visual features, sd, are stored in an off-line step, positioning the visual features in the middle of the image.
Fig. 7. Scheme of the proposed tracking system: motion estimation (analysis of the event histogram from frame i0, while τt is not defined, until τt is computed to filter), robot guided from the AVF (from frame ik, once the AVF are computed), and classic visual servoing (from frame ik+x, once the AVF are lost).
As the visual features are obtained from the movement of the object, the set of desired visual features is obtained by moving the object rather than the manipulator carrying the camera. In our experiments, the camera movements have been restricted to a plane parallel to the table on which the object moves. This movement establishes an area for the object, and with this information the desired visual features are stored in the middle of the image plane. When the object moves through the scene, the camera launches events and a new set of visual features, s, is computed. This feedback is used to obtain a new end-effector velocity, which is restricted to the translational velocities in the plane parallel to the table (X, Y) and a rotational velocity about the Z axis.
In order to switch from the first stage to the second one, a histogram such as the one shown in Fig. 8 is used. The histogram represents the number of events for each transition time in the image.

Fig. 8. Histograms showing the quality of the events in the three situations a), b) and c): number of elements on the y-axis and transition time (Δt) on the x-axis.
When the mobile robot is moving and the manipulator is stopped, the mobile robot's motion generates events whose associated transition time is dominant in the event image (Fig. 8a). This time is used to choose τt, to better filter the event image and to obtain the pixel-events belonging to the mobile robot's motion. The AVF can then be computed with more accuracy, and the mobile robot can be tracked by the robot manipulator. However, during the guided process, the manipulator changes its velocity to track the AVF, because the features obtained are closer to the desired ones at each iteration. The relative speed between the
manipulator and the mobile robot therefore decreases, and the histogram changes as shown in Fig. 8b. When the relative speed approaches zero, the histogram becomes nearly homogeneous (Fig. 8c) and, consequently, the AVF are lost. Once the AVF are lost, the system switches to image-based visual servoing as in (7). The visual features are then obtained from a conventional camera located at the robot's end-effector. The centre of gravity of the mobile robot is used to obtain the four points in the image, similarly to the technique employed in Section 3.1.
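A hedged sketch of the switching test follows; detecting the dominant transition time as the histogram peak and declaring the histogram homogeneous when the peak falls below a ratio of the mean bin count are assumptions, since the paper does not specify the statistic used.

```python
import numpy as np

def dominant_transition_time(dts, bins=32):
    """Return the transition time at the histogram peak, used to choose the
    filtering time tau_t when the mobile robot starts moving (Fig. 8a)."""
    counts, edges = np.histogram(dts, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

def avf_lost(dts, bins=32, peak_ratio=2.0):
    """Declare the active visual features lost when no transition time
    dominates, i.e. the histogram is nearly homogeneous (Fig. 8c).
    peak_ratio is an assumed tuning parameter."""
    counts, _ = np.histogram(dts, bins=bins)
    return counts.max() < peak_ratio * counts.mean()

# When avf_lost(...) becomes True, the system switches from the event-based
# stage to the classical image-based visual servoing of equation (7).
```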
4. RESULTS AND DISCUSSION
The system architecture used to test the proposal is based on a robot manipulator with 7 degrees of freedom (a Mitsubishi PA10). Two cameras have been attached at its end-effector: an AER camera (DVS128) and a conventional Gigabit Ethernet camera (Mikrotron MC1324). The first camera is employed to guide the manipulator in the second stage, for which we have proposed an algorithm to track a moving object using the events provided by this kind of camera. The second camera is used to guide the robot in the third phase, where the robot is guided with a classical image-based visual servoing scheme. A mobile robot developed by Goshield has been used as the moving object in the scene, moving at different velocities. The test system architecture is depicted in Fig. 9.
Fig. 9. Test system architecture
Fig. 10 represents an experiment in which the controlled mobile robot crosses the manipulator's workspace. In order to better illustrate the extraction of the active visual features, this figure shows a sequence obtained from the camera located at the manipulator, in a plane parallel to the table. The manipulator was kept stopped in order to better observe the movement of the mobile robot through the image plane. This experiment shows the evolution of the active visual features through the image. This evolution can then be tracked with the proposed scheme described in Section 3.2.
5. CONCLUSIONS
This paper describes an event-based visual servoing system. The visual features for the robot guidance are obtained without fiducial marks in the scene. Traditional visual servoing uses a pattern-object with marks to easily extract the visual features using high-level computer vision techniques and to determine its position in the image. In contrast, the approach presented in this paper uses the measurement of motion to detect the position of the object, so a pattern-object is not required in the scene. In this work, we have used a retina camera based on AER in order to compute the movements through events used as input to the visual servoing system. This approach uses only event visual information, processed at the moment the events occur, to reduce the response time. The number of events depends on the latency, the transition time and the type of pixel-events (ON, OFF). In addition, the number of events depends on the speed of the motion in the scene. In this paper, we have evaluated the use of this technology for tracking the initial motion of mobile robots from a camera mounted at the end of a robot manipulator.
The developments shown in this paper can be used directly to perform a pick-and-place task. We are now working on extending the second stage to completely control the robot manipulator using only the event camera. Information about the movement can be directly retrieved from the AER camera, and this information can be employed directly in the control law, where other works employ an estimate of it.
ACKNOWLEDGEMENTS
The research leading to these results has received funding
from the Spanish Government and European FEDER funds
(DPI2012-32390), the Valencia Regional Government
(GV2012/102 and PROMETEO/2013/085) and the
University of Alicante (GRE12-17).
REFERENCES
Åström, K. J. (2008). Event Based Control. In Astolfi, A. and Marconi, L. (eds.), Analysis and Design of Nonlinear Control Systems, 127-147. Springer Berlin Heidelberg.
Censi, A., Strubel, J., Brandli, C., Delbruck, T. and Scaramuzza, D. (2013). Low-latency localization by Active LED Markers tracking using a Dynamic Vision Sensor. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
Chaumette, F. (2004). Image moments: a general and useful
set of features for visual servoing. IEEE Transactions
on Robotics, 20(4), 713–723.
Chaumette, F., and Hutchinson, S. (2006). Part I: Basic
Approaches. IEEE Robotics & Automation Magazine,
13(4), 82–90.
Cretual, A. and Chaumette, F. (2001). Visual Servoing
Based on Image Motion. International Journal of
Robotics Research, 20 (11), 857-877.
Espiau, B., Chaumette, F., and Rives, P. (1992). A new
approach to visual servoing in robotics. IEEE
Transactions on Robotics and Automation, 8(3), 313–
326.
Espiau, B. (1994). Effect of camera calibration errors on visual servoing in robotics. In Yoshikawa, T. and Miyazaki, F. (eds.), Experimental Robotics III, LNCIS vol. 200, 182-192. Springer Berlin Heidelberg.
Fiala, M. (2010). Designing Highly Reliable Fiducial
Markers. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 32 (7), 1317-1324.
García, G.J., Pomares, J., Torres, F. and Gil, P., (2013).
Event-based visual servoing. In Proceedings of the 10th
International Conference on Informatics in Control,
Automation and Robotics (ICINCO), 307-314,
SciTePress.
Gratal, X., Romero, J., Bohg, J., and Kragic, D. (2012).
Visual servoing on unknown objects. Mechatronics, 22
(4), 423–435.
Janabi-Sharifi, F., and Wilson, W. J. (1997). Automatic
selection of image features for visual servoing. IEEE
Transactions on Robotics and Automation, 13(6), 890–
903.
Kragic, D., and Christensen, H. I. (2001). Cue integration for
visual servoing. IEEE Transactions on Robotics and
Automation, 17(1), 18–27.
Lichtsteiner, P., Posch, C. and Delbruck, T. (2008). A 128x128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor. IEEE Journal of Solid-State Circuits, 43(2), 566-576.
Liu, C., Huang, X., and Wang, M. (2012). Target Tracking
for Visual Servoing Systems Based on an Adaptive
Kalman Filter. International Journal of Advanced
Robotic Systems, 9(149), 1-12.
Malis, E., Mezouar, Y. and Rives, P. (2010). Robustness of
Image-Based Visual Servoing with a Calibrated
Camera in the Presence of Uncertainties in the Three-
Dimensional Structure. IEEE Transactions on
Robotics, 26(1), 112–120.
Marchand, É. and Chaumette, F. (2005). Feature tracking for
visual servoing purposes. Robotics and Autonomous
Systems, 52(1), 53–70.
Sánchez, J., Guarnes, M. A. and Dormido, S., (2009). On the
application of different event-based sampling strategies
to the control of a simple industrial process, Sensors, 9,
6795-6818.
Fig. 10. a) Detection of the active visual features from the event-clusters (outer images), with clusters C1 and C2 labelled. b) Evolution of the features f1-f4 in pixel coordinates (X, Y in px) for each frame of the sequence, from i = 1 to i = 170 (inner plot).