Conference PaperPDF Available

Tree Cavity Inspection Using Aerial Robots


Abstract and Figures

We present an aerial robotic platform for remote tree cavity inspection, based on a hexacopter Micro-Aerial vehicle (MAV) equipped with a dexterous manipulator. The goal is to make the inspection process safer and more efficient and facilitate data collection about tree cavities, which are important for the conservation of biodiversity in forest ecosystems. This work focuses on two key enabling technologies, namely a vision-based cavity detection system and strategies for high level control of the MAV and manipulator. The results of both simulation and real-world experiments are discussed at the end of the paper and demonstrate the effectiveness of our approach.
Content may be subject to copyright.
Tree Cavity Inspection Using Aerial Robots
Kelly Steich1,2, Mina Kamel2, Paul Beardsley1, Martin K. Obrist3, Roland Siegwart2and Thibault Lachat3,4
Abstract We present an aerial robotic platform for remote
tree cavity inspection, based on a hexacopter Micro-Aerial
vehicle (MAV) equipped with a dexterous manipulator. The goal
is to make the inspection process safer and more efficient and
facilitate data collection about tree cavities, which are important
for the conservation of biodiversity in forest ecosystems. This
work focuses on two key enabling technologies, namely a vision-
based cavity detection system and strategies for high level
control of the MAV and manipulator. The results of both
simulation and real-world experiments are discussed at the end
of the paper and demonstrate the effectiveness of our approach.
Aerial robots are gaining significant attention in the re-
search community through their ability to reach inaccessible
areas and operate in hazardous environment. They have
proven great potential in many relevant applications such
as infrastructure inspection [1]–[3], surveillance and security
[4], assistance in natural disasters [5] and crop monitoring
[6]. The range of possible applications is extended by in-
tegrating robotic manipulators into aerial robotic systems,
enabling them to interact with the environment [7], [8].
In this paper, we propose an aerial robotic platform for tree
cavity inspection, based on a hexacopter Micro Aerial Ve-
hicle (MAV) equipped with a dexterous manipulator. Within
the scope of this work, tree cavity inspection is defined as
the problem of (a) detecting a cavity aperture using a depth
sensor, (b) hovering in front of the aperture, (c) inserting
a stereo camera mounted on an end-effector on a robot
manipulator into the cavity and (d) capturing images of
the cavity for offline 3D reconstruction. Figure 1 shows a
schematic of such a platform and the inspection process.
Tree cavities are used by a multitude of animal species,
including birds, mammals and beetles, for shelter, nesting or
larval development and their importance for the conservation
of biodiversity in forest ecosystems is widely recognized.
However, in managed forests, trees with low economic
value, which are often those with cavities or those prone to
1Kelly Steich and Paul Beardsley are with Disney Research
Zurich, Stampfenbachstrasse 48, 8006, Zurich, Switzerland.
2Kelly Steich, Mina Kamel and Roland Siegwart are with the Au-
tonomous Systems Lab at ETH Zurich, Leonhardstrasse 21, 8092, Zurich,
3Martin K. Obrist and Thibault Lachat are with the Swiss Federal Institute
for Forest, Snow and Landscape Research WSL, Z¨
urcherstrasse 111, 8903,
Birmensdorf, Switzerland.
4Thibault Lachat is with Bern University of Applied Sciences, School of
Agricultural, Forest and Food Sciences HAFL, Zollikofen, Switzerland.
*This work has received funding partially from the WSL tree cavity in-
spection project and the European Union’s Horizon 2020 Research and Inno-
vation Programme under the Grant Agreement No.644128, AEROWORKS.
Fig. 1: A MAV equipped with a dexterous manipulator performing
a tree cavity inspection using a depth sensor to detect the cavity and
a close-up view of the end-effector of the manipulator fitted with a
stereo camera being inserted into a cross section of the cavity.
their formation, are routinely removed. Consequently, their
number has dramatically decreased in the last century and
many cavity dependent species became highly endangered.
Many researchers now call for an increased conservation
effort regarding tree cavities [9]–[12]. Effective conservation,
while minimizing the associated economic costs, requires a
better understanding of tree cavities and the characteristics
that make them ecologically valuable. However, large scale
studies and detailed data, especially about their genesis
and evolution, are rare [9], [11]. Usually, tree cavities are
observed from the ground using binoculars, resulting in
superficial information, by climbing the tree and manu-
ally inserting an endoscopic camera or by using inspection
cameras mounted on a telescopic pole. Drawbacks of the
latter two methods are the heaviness and bulkiness of the
equipment, the very basic image quality, that cannot be used
for any metric observations (e.g., dimension and volume
of the cavity) and that many trees are too unstable to be
climbed. Tree cavity inspection by means of an aerial robot
has the potential to solve all these problems, by being safer,
more time-efficient and able to gather highly detailed data.
This work focuses on cavities built by woodpeckers. These
are the most common type of cavities in managed forests,
since woodpeckers excavate a new breeding cavity every
year and such trees are increasingly maintained by forest
managers. They have circular or elliptical apertures (usually
with a diameter of 3-12 cm) and a deep hollow inside space
as shown by the cross section of a tree cavity in Figure 1.
The contributions of this paper are two key enabling
technologies for the tree cavity inspection robotic platform.
Firstly a vision-based system for the detection of cavities
of previously unknown size and secondly a strategy for
high level control of the MAV and manipulator. The system
runs semi-autonomously - an initial user input starts the
autonomous cavity detection and a pilot flies the vehicle
to a position in front of the cavity, before switching to
autonomous control. A more advanced system would handle
obstacle avoidance of tree branches to avoid the need for
manual piloting, but this was outside the scope of this work.
The remainder of the paper is organized as follows:
Section II describes the system and the tree cavity inspection
procedure. The cavity detection algorithm and high level
control strategies are detailed in Section III. In Section IV,
we discuss a comparison between three candidate sensors
to perform the cavity detection and the proposed solution is
evaluated in simulation and real–world experiment. Finally,
Section V summarizes and concludes the paper.
A. Hardware
The MAV used in this project, the AscTec Neo hexacopter
[13], comes equipped with an inertial measuring unit (IMU)
and an on-board computer with an Intel Core i7-5557U dual-
core 3.1 GHz processor and 8 GB RAM. Figure 2 shows a
Neo hexacopter platform fitted with the additional sensor
suite intended for the tree cavity inspection application.
The Visual-Inertial Sensor (VI-Sensor) developed by the
Autonomous Systems Lab at ETHZ and Skybotix AG [14],
provides stereo images tightly coupled with a high quality
IMU. It will be used for the robot’s pose estimation.
The CamBoard pico flexx from pmdtechnologies [15],
a time-of-flight 3D camera, has been installed for cavity
detection and is henceforth referred to as detection sensor.
It is small (68x17x7.25 mm), has a range of 0.1-4 m, a
resolution of 224x171 px and a frame rate of up to 45 fps.
A dexterous 3 degrees of freedom parallel robot manipu-
lator is attached to the bottom of the body of the vehicle. For
this work, the manipulator’s pitch is fixed, which restricts
its movement to a plane. The exact configuration of the
manipulator is shown in Figure 6.
An end-effector, designed for this application [16], is
attached rigidly to one link of the manipulator. It includes
AWAIBA’s NanEye stereo camera [17], an LED light and a
laser. This allows the use of either stereo vision or structured
light for 3D reconstruction. It can be rotated inside the cavity
around 2 axes (yaw and pitch) using servo motors, allowing
the sensors to capture the complete interior of the cavity.
B. Software
The software architecture is shown in Figure 3 and is
comprised of 3 parts: cavity detection, control and GUI.
All computations of the cavity detection and control run in
real time on board the MAV, avoiding communication latency
issues in these key parts of the system. The GUI runs on an
external processor (e.g., a laptop) to allow the user to interact
with the tree cavity inspection process. It communicates with
the MAV via WiFi, receiving information of the system’s
current state and sending back user commands. All software
components that are novel contributions of this paper are
marked in yellow in Figure 3. They are implemented in C++,
using ROS, QT, OpenCV and PCL.
MAV VI-Sensor
Fig. 2: The hardware comprised of an AscTec Neo hexacopter MAV,
a VI-Sensor, a CamBoard pico flexx time-of-flight 3D camera, a
parallel robot manipulator and an end-effector fitted with a NanEye
stereo camera, an LED light, a laser and servo motors.
C. Coordinate systems
Figure 3 illustrates the system’s different coordinate
frames. The world frame Wis fixed to the starting position
and orientation of the robot. The vehicle frame Vis rigidly
attached to the robot base. The camera frame Chas a fixed
position and orientation relative to Vand its origin is at the
center of the detection sensor. The manipulator frame Mhas
its origin at the base of the manipulator.
D. Cavity inspection procedure
The robot is first flown manually, helped by the robot
position controller, to a starting point where the cavity is in
the field of view of the detection sensor. This can be verified
by looking at this sensor’s depth image in the GUI.
The user clicks on this depth image to mark a seed point
belonging to the tree trunk. This starts the cavity detection
algorithm, which first finds the 3D position of the cavity in
frame Cin the depth image (see Section III-A). The detection
sensor’s point cloud is used to refine this measurement and
estimate the cavity’s normal. (see Section III-B).
Next, a Kalman filter uses the measured cavity position
and the robot pose to generate a continuous estimate of the
cavity position in frame V(see Section III-C).
The robot pose in frame Wis estimated using a Robust
Visual Inertial Odometry (ROVIO) algorithm, based on the
VI-Sensor [18]. This is further fused with the MAV’s IMU
data using a Multi Sensor Fusion (MSF) approach [19].
Once the GUI shows that the cavity has been successfully
detected, the user can activate the auto controller. Subse-
quently, the high level controller autonomously navigates the
MAV to hover directly in front of the cavity entrance at a
distance of 0.5 m, by sending commands to the robot position
controller. Finally, the user can activate the manipulator
controller, which autonomously generates commands to the
robot manipulator interface to extend it and insert the end-
effector in the cavity (see Section III-D).
The inverse kinematics derived for this manipulator config-
uration are used to transform a desired end-effector position
command to angle commands for the 2 manipulator servo
motors (see Section III-E).
point cloud
depth image
point cloud
cavity position
cavity position
cavity position
Cavity Detection
robot pose
stereo camera images
and imu data
roll, pitch, yaw,
thrust command
manipulator joint state
Fig. 3: The software architecture and the defined coordinate frames for the system.
(a) Color image. (b) Depth image. (c) Segmentation.
(d) Binarization. (e) Contours. (f) Detection.
Fig. 4: Simulated color image and depth image from a simulated
detection sensor, showing a tree cavity and the different image
processing steps performed to detect the 3D position of the cavity.
The robot’s position controller is a linear Model Predictive
Controller (MPC). Using the full state estimate feedback and
considering the vehicle dynamics, it provides attitude and
thrust reference for the low level controller running on the
autopilot provided by the MAV manufacturer. A disturbance
observer estimates external forces and moments, such that the
linear MPC can compensate for disturbances such as wind.
A. Depth image analysis
Figure 4 displays a color image and a depth image from
a simulated detection sensor, showing a tree cavity1and the
different image processing steps performed to detect the tree
trunk and the 3D position of the cavity.
First, the user marks a point in the depth image that
belongs to the tree trunk (red point in Figure 4b), to create a
seed point ps=[xs,y
s]T, where xsand ysare the image
coordinates of the marked point and zsis its depth value.
Segmentation: The image is segmented by depth using K-
means clustering with fixed K. The cluster with the centroid
closest to psis assumed to be the tree cluster Ctree (pink
1We use an image from simulation to better illustrate the concepts used,
without being distracted by increased noise.
in Figure 4c). Clusters closer to Ctree than a threshold tt
(shades of blue in Figure 4c) are merged with Ctree to
complete the tree cluster. We also assume that the centroid
with the smallest depth belongs to the the manipulator cluster
Cman. Clusters closer to Cman than a threshold tmare
merged with Cman to complete the manipulator cluster (red
in Figure 4c). Cman and Ctree are merged to form a final
tree cluster Ctm, which avoids any breaks in the tree cluster
due to obstruction by the manipulator.
Binarization: A binary image is created based on whether
an image point belongs to Ctm and small holes are closed by
applying the morphological operations dilation and erosion
(Figure 4d).
Contour extraction: Contours are extracted from the
binary image using OpenCV’s contour detection algorithm
[20] (Figure 4e), which also provides the hierarchical rela-
tionships of the contours. The contour containing psand not
surrounded by another contour, is determined to be the tree
contour. Any of its children, i.e. all contours contained within
the tree contour, are considered potential cavities.
Ellipse fitting: Ellipses are fitted, in the least-squares
sense, to all these children, as woodpecker cavities are
always circular or elliptical. This potentially produces a large
number of ellipses, so heuristic filtering is used to select
relevant ellipses. They need a minimum width and height and
the ratio of width to height cannot be too large. Any ellipses
along the very edge of the tree contour are also rejected, since
those are most likely to be false positives. Finally, only the
ellipse with the largest area Emax is selected, since it is least
likely to result from noise. Figure 4f shows the tree contour
in blue, Emax in green and the center of Emax in red.
Detection: The center pe=[ximg,y
img]Tof Emax in
image coordinates, together with the average depth of the
tree cluster zavg and the intrinsic camera parameters is used
to calculate a 3D real world position of the tree cavity pc=
real]Tin camera frame C.
This is repeated for each subsequent depth image frame,
except a new seed point psmust be computed at the start.
We calculate the moments of the previous tree contour to
find its center of mass mc, which is selected as new seed
point if its depth is within a threshold tdto the depth of the
previous seed point. Otherwise, the immediate neighborhood
of mcis searched for a point with similar enough depth. The
frame is skipped if no new seed point can be found.
B. Cavity detection refinement in the point cloud
Figure 5 displays a point cloud from a simulated detection
sensor, showing a tree cavity2and the different processing
steps for cavity detection refinement.
3D space is partitioned into an octree data structure based
on the point cloud generated by the detection sensor. The
octree allows us to efficiently determine points that lie within
a certain bounding box, which we take advantage of to both
refine the 3D position of the cavity and determine its normal.
First, we find the maximum fitting cuboid inside the cavity
(green cuboid in Figure 5a) to determine with increased
certainty the area of the cavity (Emax might have been over
fitted) and its depth. We start with a cuboid centered around
pc(red point in Figure 5a) and then iteratively query whether
there are any points in an ever increasing cuboid until a
maximum number of inlier points is reached. If the maximum
fitting cuboid is smaller than the space necessary for inserting
the end-effector, this cavity position measurement is rejected,
otherwise the center (xb,y
b)of this cuboid is used as a new
better estimate pc=[xb,y
real](green point in Figure 5a).
Next, the cavity neighborhood points (pink points in
Figure 5b) are extracted by querying for all the points lying
within a cuboid surrounding pc(green point in Figure 5b)
and larger than the maximum fitting cuboid in x and y
direction (pink cuboid in Figure 5b). The z coordinate of the
cavity position now equals the average depth of the cavity
neighborhood points zbinstead of the average of all the
points of the tree. The new cavity position measurement in
camera frame Cis pc=[xb,y
b](red point in figure 5b).
Finally the normal of the cavity Nis estimated based on
the assumption that the immediate area of the trunk bordering
the cavity is approximately planar. A plane is fitted through
the cavity neighborhood points (pink plane in Figure 5c) and
its normal is assumed to be N(red arrow in Figure 5c).
C. Kalman filter
A Kalman Filter (KF) generates a continuous estimate of
the cavity position. The state xt=[xt,y
t]of the KF is the
cavity position in frame Vat time tand it is initialized as:
0]=pc0where pc0is the first measurement
of the 3D cavity position in frame Ctransformed to frame
V. The underlying model of the KF is given by:
where wtN(0,Qt)is the process noise, vtN(0,Jt)is
the measurement noise and Vt
Vt1Rand Vt
Vt1Tare the rotation
and translation matrix respectively from frame Vat time t1
2Again, we use a point cloud from simulation to better illustrate the
concepts used, without being distracted by increased noise.
(a) Maximum fitting cuboid. (b) Cavity neighborhood.
(c) Fitted plane. (d) Cavity point and normal.
Fig. 5: Point cloud from a simulated detection sensor, showing a
tree cavity and the different processing steps performed to refine the
detection of the 3D position of the cavity and compute its normal.
to frame Vat time t. They can be expressed in terms of the
robot pose at time tand t1as follows
where W
VtRis the rotation from frame Vat time tto frame
W, which corresponds to the robot orientation at time tand
VtTis the translation from frame Vat time tto frame W,
which corresponds to the robot position at time t.
ztis the measurement of the 3D cavity position in frame
Vand is given by
CR·pct +V
where V
CRand V
CTare the fixed rotation and translation from
frame Cto frame Vrespectively and pct is the measurement
of the 3D cavity position in frame Cat time t.
The prediction step advances the state, i.e. the estimated
cavity position in frame V, at each time step (whenever there
is a new robot pose at 100Hz), while the measurement step
incorporates the measured cavity position from the vision-
based cavity detection system whenever one is available. This
allows the system to estimate the cavity position for several
seconds even without new vision measurements and reduces
the impact of false cavity detections. The output of the KF is
a continuous cavity position estimate in frame Vat 100Hz.
D. Controller
Once the auto controller is started by the user, the high
level controller calculates the currently desired position of
the robot in frame W, using the estimated cavity position
and the desired cavity position (0.5,0,0) in frame Vand
the current robot pose in frame W. With the cavity normal
transformed into frame Wusing the current robot pose, the
desired robot heading in frame Wis computed. The desired
robot position and heading are sent as a command to the
linear MPC robot pose controller.
When the user starts the arm controller, the high level
controller also sends out commands for the desired end-
effector position in frame V, i.e. the estimated cavity position
possible con
(xe,r ef ,y
e,r ef )
end e ector position reference
Fig. 6: The manipulator configuration and a visualization of the
inverse kinematics process.
in frame Vwith the addition of the depth of the cavity
in x direction, filtered trough a first order filter. They are
transformed into angle commands using inverse kinematics
and sent to the manipulator hardware interface.
E. Inverse Kinematics
Figure 6 shows the robot manipulator configuration. We
define the joint lengths Lj,j =1, ..., 6, the angles qm=
2), the fixed angle between end-effector and one manip-
ulator link , the manipulator closing position c=(xp,y
and the end-effector position e=(xe,y
e)The forward and
inverse kinematics of a planar delta manipulator have been
described in other works [21], [22]. Expanding the forward
kinematics to include the end-effector is straightforward.
To expand the inverse kinematics we use a ”brute force”
method. Given a desired end-effector position eref (blue
cross in Figure 6), we compute the possible manipulator
closing positions ci(red points in Figure 6) given eref as
the points lying on a semi-circle with center eref and radius
L6. Increasing the number of semi-circle points iresults in
higher accuracy in exchange for higher computation time.
For all cithe resulting angles qm,i are calculated (if possible)
using the known inverse kinematics for a planar delta manip-
ulator. Using forward kinematics, the resulting end-effector
positions eiare calculated. Finally, the configuration with the
minimal error between eiand eref is chosen (drawn in blue
in Figure 6) with the corresponding angles qm,i.
A. Sensor comparison
The CamBoard pico flexx was chosen as detection sensor
for this application for its small size, large range and good
performance outdoors, after comparing it with 2 other depth
sensors, the Intel RealSense Camera F200 and the Intel
RealSense Camera R200 [23], on a field trip in the forest.
Figure 7 shows a comparison of depth images acquired by
all 3 sensors of the same tree cavity at different distances,
approximately 0.3m, 0.5m and 1m, while mounted on a
telescopic pole. The CamBoard pico flexx works well at all 3
distances, which represent the most likely operational range
for the tree cavity inspection application. The other sensors
Color Image, 0.3m to tree
Color Image, 0.5m to tree
Color Image, 1m to tree
RealSense F200, 0.3m to tree
RealSense F200, 0.5m to tree
RealSense F200, 1m to tree
RealSense R200, 0.3m to tree
RealSense R200, 0.5m to tree
RealSense R200, 1m to tree
Pico exx, 0.3m to tree
Pico exx, 0.5m to tree
Pico exx, 1m to tree
Fig. 7: Comparison of depth images acquired by the Intel RealSense
Camera F200 (2nd row), Intel RealSense Camera R200 (3rd row)
and pmdtechnologies CamBoard pico flexx (4th row) of the same
tree cavity at approximately 0.3m (1st column), 0.50m (2nd column)
and 1m (3rd column) while mounted on a telescopic pole on a field
trip in the forest . The first row shows a color image of the cavity
taken at the corresponding distance from the tree.
Fig. 8: Simulation experiment setup and result.
fail either at close or far range. Additionally the CamBoard
pico flexx is the smallest of the 3 sensors and produced
results with very little noise outdoors. Tests were done under
an overcast sky, so sunlight was not a factor in the poorer
quality results obtained from the two RealSense sensors.
B. Simulation studies
We first test our approach in the Gazebo–based simulation
environment RotorS [24]. RotorS is used to simulate a
MAV based on the provided model of the AscTec Firefly
hexacopter [13] with a VI-Sensor, a CamBoard pico flexx, a
robot manipulator with an end-effector and the environment
which the simulated MAV operates in. The latter is populated
with a realistic 3D model of a tree cavity and tree models
in the background, as shown in Figure 8.
In the simulation, the vehicle starts on the ground 1m
in front of the tree and then flies to several manually set
waypoints, where the visual of the cavity is sometimes lost.
Then the auto controller is started and the MAV navigates to
hover 0.5m in front of the cavity. Finally, the arm controller
is started and the manipulator is extended to insert the
0 10 20 30 40 50 60 70 80
time [s]
coordinate [m]
X coordinate
0 10 20 30 40 50 60 70 80
time [s]
coordinate [m]
Y coordinate
0 10 20 30 40 50 60 70 80
time [s]
coordinate [m]
Z coordinate
Ground Truth
Kalman filter estimation Lost cavity in vision
Measurement (from vision)
Fig. 9: Comparison of the estimated cavity position, the measured
cavity position from the vision-based detection and the ground truth
in frame Vduring the simulation experiment.
TABLE I: MAE and SD for the estimated cavity position compared
to the ground truth from the simulation and real-world experiment.
Simulation Real-world
MAE [m] 0.006 0.001 0.008 0.019 0.039 0.024
SD [m] 0.002 0.001 0.004 0.012 0.011 0.009
end-effector in the cavity. Figure 8 shows the end-effector
correctly inserted in the cavity during an experiment.
To evaluate our cavity detection approach, the estimated
cavity position is tracked during this whole scenario and
compared to the ground truth, as shown in Figure 9. In
the simulation experiments, the ground truth robot pose was
used as input for the KF instead of the pose estimated
by MSF. The red curve represents the KF estimate of the
cavity position, while the blue dots represent the measured
cavity position from vision-based detection and the black
curve shows the ground truth. The filtered position estimate
remains sufficiently accurate even if the measured position
fluctuates heavily or if there is no result from vision-based
detection at all for a certain time window (either because the
cavity was not detected correctly or was not in the field of
view of the detection sensor anymore).
The mean absolute error (MAE) and the standard deviation
of the absolute error (SD) can be found in Table I. The
MAE is below 1cm in all directions. The very low error
is partly due to using the ground truth robot pose as input
in the KF. We can still conclude that our proposed approach
effectively gives an accurate and continuous estimate of the
cavity position and performs satisfactory enough that we can
move on to real-world experiments.
C. Experimental evaluation
Real-world experiments, with the aerial robot platform
described in II-A, were conducted to further evaluate the
0 5 10 15 20 25 30 35
time [s]
coordinate [m]
X coordinate
Ground Truth
Kalman filter estimation
Measurement (from vision)
Lost cavity in vision
0 5 10 15 20 25 30 35
time [s]
coordinate [m]
Y coordinate
0 5 10 15 20 25 30 35
time [s]
coordinate [m]
Z coordinate
Fig. 11: Comparison of the estimated cavity position, the measured
cavity position from the vision-based detection and the ground truth
in frame Vduring the real-world experiment.
performance of the proposed cavity inspection platform. The
experiments were conducted in an indoors flying room and
we used a model of a tree cavity carved from wood at the
Swiss Federal Institute for Forest, Snow and Landscape Re-
search (WSL). An external motion capture system (VICON)
was used only to monitor the robot and cavity ground truth
state for later evaluation of our method.
Figure 10 shows a sequence from the experiment. The
MAV is first flown up manually and then the user starts the
cavity detection by setting a seed point in the depth image
to indicate the tree trunk. Once it is clear that the cavity has
been successfully detected, the auto controller is activated
and the MAV autonomously flies to hover in front of the
cavity. Then the arm controller is started and the manipulator
is extended to insert the end-effector in the cavity.
The experiments show that the cavity is successfully
detected and the MAV is able to hold its position in front of
the cavity entrance, while correctly inserting the end-effector,
for several minutes
Figure 11 shows a comparison of the KF cavity position
estimate (red curve) with the measured cavity position from
the vision-based detection (blue dots) and the ground truth
(black curve). Although the ground truth is, in this case, only
an approximation calculated by transforming the ground truth
position of the cavity from the VICON system to the vehicle
frame using the ground truth vehicle pose from the VICON
system. The MAE and SD for the real world experiments can
be found in Table I. The MAE is below 4cm in all directions.
However, the ground truth is an approximation, therefore the
error needs to be regarded critically. Still, we can observe
that again the cavity position estimate smoothly follows the
ground truth even if there is no result from the vision-based
detection at all for a certain time window. Also, the cavity
position is quickly corrected again after there were a lot of
false positives, e.g., in the very beginning of the experiment,
when the MAV is still quite far away from the cavity.
Hence, we can conclude that our approach is able to
provide relatively accurate position estimates in real-time.
Fig. 10: Sequence from the real-world experiment.
We have presented an aerial robotic platform for au-
tonomous remote tree cavity inspection, based on a hexa-
copter MAV equipped with a dexterous manipulator. The
full goal of the project is to detect a tree cavity, hover in
front of the cavity, insert an end-effector fitted with a stereo
camera, and capture data for offline 3D reconstruction of
the cavity. This paper has focused on the problem of cavity
detection and provided strategies for high level control of the
MAV and robot manipulator with minimal user intervention.
Simulation experiments show that our approach is able to
accurately detect a cavity, navigate the MAV to hover in front
of it and insert the end-effector safely. We have carried out
real-world experiments to show that accurate cavity detection
is possible in real time on board the MAV.
Preliminary experiments on natural trees in a forest have
been realized and the results looked promising, but more
outdoors tests are needed to truly test the vialbility of the
platform for autonomous remote tree cavity inspection.
[1] N. Metni and T. Hamel, “A UAV for bridge inspection: Visual servoing
control law with orientation limits,” Automation in Construction,
vol. 17, no. 1, pp. 3–10, November 2007.
[2] A. Bircher, K. Alexis, M. Burri, P. Oettershagen, S. Omari, T. Mantel
and R. Siegwart, “Structural inspection path planning via iterative
viewpoint resampling with application to aerial robotics,” in Robotics
and Automation (ICRA), 2014 IEEE International Conference on, May
2015, (accepted).
[3] G. Darivianakis, K. Alexis, M. Burri, and R. Siegwart, “Hybrid
predictive control for aerial robotic physical interaction towards in-
spection operations,” in Robotics and Automation (ICRA), 2014 IEEE
International Conference on, May 2014, pp. 53–58.
[4] A. Girard, A. Howell, and J. Hedrick, “Border patrol and surveillance
missions using multiple unmanned air vehicles,” in Decision and
Control, 2004. CDC. 43rd IEEE Conference on, 2004.
[5] P. Rudol and P. Doherty, “Human body detection and geolocalization
for uav search and rescue missions using color and thermal imagery,
in Aerospace Conference, 2008 IEEE, 2008, pp. 1–8.
[6] E. R. Hunt, W. D. Hively, S. J. Fujikawa, D. S. Linden, C. S. T.
Daughtry, and G. W. McCarty, “Acquisition of nir-green-blue digital
photographs from unmanned aircraft for crop monitoring,” Remote
Sensing, vol. 2, no. 1, pp. 290–305, 2010.
[7] G. Darivianakis, K. Alexis, M. Burri, and R. Siegwart, “Hybrid
predictive control for aerial robotic physical interaction towards in-
spection operations,” in Robotics and Automation (ICRA), 2014 IEEE
International Conference on. IEEE, 2014, pp. 53–58.
[8] M. Fumagalli, R. Naldi, A. Macchelli, F. Forte, A. Keemink, S. Strami-
gioli, R. Carloni, and L. Marconi, “Developing an Aerial Manipulator
Prototype: Physical Interaction with the Environment,Robotics Au-
tomation Magazine, IEEE, vol. 21, no. 3, pp. 41–50, Sept 2014.
[9] J. Remm and A. L˜
ohmus, “Tree cavities in forests - the broad
distribution pattern of a keystone structure for biodiversity,” Forest
Ecology and Management, vol. 262, no. 4, pp. 579 – 585, 2011.
[10] R. B¨
utler, T. Lachat, L. Larrieu, and Y. Paillet, “Habitat trees: key
elements for forest biodiversity,” in Integrative approaches as an
opportunity for the conservation of forest biodiversity, D. Kraus and
F. Krumm, Eds. Freiburg: European Forest Institute, 2013, pp. 84–91.
[11] T. Ranius, “Influence of stand size and quality of tree hollows on
saproxylic beetles in sweden,” Biological Conservation, vol. 103,
no. 1, pp. 85 – 91, 2002.
[12] J. M¨
uller, A. Jarzabek-M¨
uller, H. Bussler, and M. M. Gossner,
“Hollow beech trees identified as keystone structures for saproxylic
beetles by analyses of functional and phylogenetic diversity,” Animal
Conservation, vol. 17, no. 2, pp. 154–162, 2014.
[13] Ascending Technologies GmbH. [Online]. Available: http://www.
[14] J. Nikolic, J. Rehder, M. Burri, P. Gohl, S. Leutenegger, P. T. Furgale,
and R. Siegwart, “A synchronized visual-inertial sensor system with
fpga pre-processing for accurate real-time slam,” in Robotics and
Automation (ICRA), IEEE International Conference on, 2014.
[15] Pmdtechnologies AG. [Online]. Available:
[16] D. Goglio, “Design of end-effector for tree cavity inspection,”
Semester Thesis, Swiss Federal Institute of Technology Zurich
(ETHZ), Switzerland, 2015.
[17] Awaiba. [Online]. Available:
[18] M. Bloesch, S. Omari, M. Hutter, and R. Siegwart, “Robust visual
inertial odometry using a direct ekf-based approach,” in Intelligent
Robots and Systems (IROS), 2015 IEEE/RSJ International Conference
on, Sept 2015, pp. 298–304.
[19] S. Lynen, M. Achtelik, S. Weiss, M. Chli, and R. Siegwart, “A
robust and modular multi-sensor fusion approach applied to mav
navigation,” in Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ
International Conference on, Nov 2013, pp. 3923–3929.
[20] S. Suzuki and K. Abe, “Topological structural analysis of digitized
binary images by border following,” Computer Vision, Graphics, and
Image Processing, vol. 30, pp. 32–46, 1985.
[21] X.-J. Liu, J. Wang, and H.-J. Zheng, “Optimum design of the 5r sym-
metrical parallel manipulator with a surrounded and good-condition
workspace,” Robotics and Autonomous Systems, vol. 54, no. 3, pp.
221 – 233, 2006.
[22] H. Yu, “Modeling and control of hybrid machine systems — a
five-bar mechanism case,International Journal of Automation and
Computing, vol. 3, no. 3, pp. 235–243.
[23] Intel RealSense. [Online]. Available:
www/us/en/architecture-and- technology/realsense-overview.html
[24] RotorS: An MAV gazebo simulator. [Online]. Available: https:
// simulator
... Aerial manipulator platforms typically consist of the following two subsystems: i) the aerial vehicle, and ii) the manipulation system that is used for interaction with the environment [8], [9]. Regarding the vehicle type, aerial manipulators typically belong in one of the following classes: i) helicopters [2], [10], ii) unidirectional multirotors, such as quadrotors [1] and hexacopters [11], iii) ducted-fan vehicles [12], iv) fixedwing designs [13], and v) hybrid designs [14]. Although unidirectional UAVs are very useful for many applications, they are not very well suited for aerial manipulation tasks since their underactuated nature limits their dexterity. ...
... One of them is the holonomic nature of the ground mode, which allows for changing the moving direction at any time. As the Omnirotor's cage shape resembles a cylinder when pushing on something, the interaction is similar to a line contact, where only a force VOLUME 4, 2016 11 This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. ...
Full-text available
Mobile and aerial robots offer many potential applications, including warehouse logistics, surveillance, cinematography, and search and rescue. However, most such robots are task-specific and generally need more versatility to tackle multiple scenarios, terrains, and unstructured, dynamic environments. This paper presents the Omnirotor platform, a versatile, multi-modal, coaxial, tilt-rotor, all-terrain vehicle that combines an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV) into a hybrid, all-terrain vehicle. The Omnirotor has two locomotion modes of operation (aerial and ground vehicle) and five operation configurations, as it can fly both in the normal and inverted configurations, and it can drive on the ground both in the normal and inverted configurations. It can also recover from any non-operational state to its normal, upside-down configuration. Moreover, in addition to the locomotion modes, the continuous omnidirectional thrust vectoring enables the Omnirotor platform to perform complex manipulation of objects. This work introduces the concept and discusses in detail the design, development, and experimental validation of the Omnirotor platform. In particular, it discusses the modeling and control schemes required by the different operation modes and configurations. It experimentally validates the platform’s capabilities with experiments focusing on traversing challenging environments and unstructured, uneven terrains (e.g., a public park). Finally, the platform’s ground, pushing-based manipulation capabilities are demonstrated through the execution of a puzzle-solving experiment where the solved puzzle serves as a landing platform for the all-terrain vehicle. The versatility of Omnirotor offers exciting prospects for use in challenging search-and-rescue scenarios, surveillance, and aerial and ground manipulation applications.
... Traditional multirotor platforms usually align multiple propellers in the same direction to effectively compensate for the gravity during the flight. This configuration is underactuated, but is proven differential-flat [1] for the independent control of a four DoF subspace [2,3], which is adequate for positionoriented applications such as surveillance, rescue, delivery, inspection and mobile network construction, etc. [4][5][6][7][8][9][10][11][12]. ...
... This property has been found of great significance in our following works for the tracking of highly-dynamic reference signals [32] and flight recovery under certain actuator failures [33,34]. On the other hand, C1 does not have auxiliary inputs according to (9) and (10). C2 only have , but the magnitude (therefore the control effort) is much smaller than , and is usually not sufficient to handle the tracking or failure recovery tasks in [32,33]. ...
This paper presents the control and experiments of a novel multirotor aerial platform, which is capable of full actuation for six Degree of Freedom (DoF) motions. The platform is actuated by a number of tilting-thrust modules, each consisting of a regular quadcopter and a mechanically passive hinge. The platform in this paper has four such actuator modules, making an over-actuated system that requires input allocations in the feedback control. In addition to the common least-square method that minimizes the sum of squares of the thrusts, we propose a control allocation that minimizes the maximum thrust in a closed-form analytical solution for efficient real time computation. This allocation can achieve larger inclination angles than that by the least-square method, when thrust forces are insufficient to overcome the gravity for all poses. Simulation and real world experiments are presented to demonstrate the control of the aerial platform for six DoF motion tracking and disturbance rejection.
... As shown in Tables 3 and 4, the 3D object pose-estimation systems focused on by other scholars [29,30] highlight the objects in static states, while the model applied in this paper exhibited higher errors in estimating the dynamic position of Przewalski's Tibetan antelope in real time. However, it possesses better robustness and is also more accurate than the model in [28] that uses the KF algorithm alone for 3D pose estimation of the target. ...
Full-text available
This paper presents an autonomous unmanned-aerial-vehicle (UAV) tracking system based on an improved long and short-term memory (LSTM) Kalman filter (KF) model. The system can estimate the three-dimensional (3D) attitude and precisely track the target object without manual intervention. Specifically, the YOLOX algorithm is employed to track and recognize the target object, which is then combined with the improved KF model for precise tracking and recognition. In the LSTM-KF model, three different LSTM networks (f, Q, and R) are adopted to model a nonlinear transfer function to enable the model to learn rich and dynamic Kalman components from the data. The experimental results disclose that the improved LSTM-KF model exhibits higher recognition accuracy than the standard LSTM and the independent KF model. It verifies the robustness, effectiveness, and reliability of the autonomous UAV tracking system based on the improved LSTM-KF model in object recognition and tracking and 3D attitude estimation.
... An aerial manipulator combines a flying base, which is usually a multirotor aerial vehicle, with one or multiple robotic arms. Compared to conventional flying robots such as multirotors, an aerial manipulator can interact with the environment to complete various tasks such as pick-and-place [1], tree cavity inspection [2], contact-based inspection [3], opening/closing of a valve [4], assembly of structures [5]. Compared to ground mobile manipulators, an aerial manipulator can fly to areas that are difficult to reach by humans or ground robots, hence greatly enlarging the scope of manipulation tasks. ...
This paper studies the tracking control problem of an aerial manipulator that consists of a quadcopter flying base and a Delta robotic arm. We propose a novel control approach that consists of extended state observers (ESOs) for dynamic coupling estimation, ESO-based flight controllers, and a cooperative trajectory planner. Compared to the state-of-the-art approaches, the proposed one has some attractive features. First, it requires much less measurement information as opposed to the full-body control approaches and hence can be implemented conveniently and efficiently in practice. Second, while the existing approaches estimate the coupling effect based on precise models, the proposed ESOs can do that based on much less information about the system model. The proposed approach is verified by four experiments on a real aerial manipulation platform. The experimental results show that the average tracking error can reach 1 cm by the proposed approach as opposed to 10 cm by the PX4 baseline controller. Although force control is not considered specifically in the approach, the system can complete aerial weaving tasks thanks to the ESOs in the presence of drag forces applied to the end-effector during manipulation. Note to Practitioners —Aerial manipulators have received increasing research attention in recent years due to their wide range of applications. In this paper, we particularly focus on the high-precision and robust control of aerial manipulators. We propose a novel control approach that consists of extended state observers (ESOs) for dynamic coupling estimation, ESO-based flight controllers, and a cooperative trajectory planner. Four experiments on a real aerial manipulation platform demonstrate the effectiveness of the approach. In future research, we will address the control problem when the aerial manipulator contacts the environment.
... In [1] researchers envisioned insertion tasks for aerial manipulators. In line with that, researchers in [2] used camera equipped aerial manipulator to inspect tree cavities. Authors of [3] proposed door opening task using an aerial manipulator, while researchers in [4] designed a human-size dualarm lightweight manipulator. ...
Full-text available
This paper presents a method for aerial manipulator end-effector trajectory tracking by encompassing dynamics of the Unmanned Aerial Vehicle (UAV) and null space of the manipulator attached to it in the motion planning procedure. The proposed method runs in phases. Trajectory planning starts by not accounting for roll and pitch angles of the underactuated UAV system. Next, we propose simulating the dynamics on such a trajectory and obtaining UAV attitude through the model. The full aerial manipulator state obtained in such a manner is further utilized to account for discrepancies in planned and simulated end-effector states. Finally, the end-effector pose is corrected through the null space of the manipulator to match the desired end-effector pose obtained in trajectory planning. Furthermore, we have applied the TOPP-RA approach on the UAV by invoking the differential flatness principle. Finally, we conducted experimental tests to verify effectiveness of the planning framework.
... The use of Unmanned Aerial Vehicles (UAVs) towards autonomous task completion has received increased attention in the past decade. Indicative tasks include aerial manipulation in Gassner et al. (2017), Gkountas and Tzes (2021), Nguyen and Alexis (2021), Li et al. (2021), surveillance in Bisio et al. (2021), Tsoukalas et al. (2021), as well as Simultaneous Localization and Mapping (SLAM) in Papachristos et al. (2019a), Tsoukalas et al. (2020), Dang et al. (2019) or inspection in Papachristos et al. (2016), Steich et al. (2016), Bircher et al. (2015); these works necessitate the collaborating agents to exchange their relative pose (position and orientation). ...
Full-text available
UAVs operating in a leader-follower formation demand the knowledge of the relative pose between the collaborating members. This necessitates the RF-communication of this information which increases the communication latency and can easily result in lost data packets. In this work, rather than relying on this autopilot data exchange, a visual scheme using passive markers is presented. Each formation-member carries passive markers in a RhOct configuration. These markers are visually detected and the relative pose of the members is on-board determined, thus eliminating the need for RF-communication. A reference path is then evaluated for each follower that tracks the leader and maintains a constant distance between the formation-members. Experimental studies show a mean position detection error (5 × 5 × 10cm) or less than 0.0031% of the available workspace [0.5 up to 5m, 50.43° × 38.75° Field of View (FoV)]. The efficiency of the suggested scheme against varying delays are examined in these studies, where it is shown that a delay up to 1.25s can be tolerated for the follower to track the leader as long as the latter one remains within its FoV.
Conference Paper
Full-text available
The challenge of aerial robotic physical interaction towards inspection of infrastructure facilities through contact is the main motivation of this paper. A hybrid model predictive control framework is proposed, based on which a typical quadrotor vehicle becomes capable of stable physical interaction , accurate trajectory tracking on environmental surfaces as well as force control with only minor structural adaptations. Convex optimization techniques enabled the explicit computation of such a controller which accounts for the dynamics in free–flight and during physical interaction, ensures the stability of the hybrid system as well as response optimality, while respecting system constraints and imposed logical rules. This control framework is further extended to include obstacle avoidance capabilities. Extensive experimental studies that included complex " aerial–writing " tasks, interaction with non–planar and textured surfaces and obstacle avoidance maneuvers, indicate the efficiency of the approach and the potential capabilities of such aerial robotic physically interacting operations.
Conference Paper
Full-text available
Robust, accurate pose estimation and mapping at real-time in six dimensions is a primary need of mobile robots, in particular flying Micro Aerial Vehicles (MAVs), which still perform their impressive maneuvers mostly in controlled environments. This work presents a visual-inertial sensor unit aimed at effortless deployment on robots in order to equip them with robust real-time Simultaneous Localization and Mapping (SLAM) capabilities, and to facilitate research on this important topic at a low entry barrier. Up to four cameras are interfaced through a modern ARM- FPGA system, along with an Inertial Measurement Unit (IMU) providing high-quality rate gyro and accelerometer measure- ments, calibrated and hardware-synchronized with the images. This facilitates a tight fusion of visual and inertial cues that leads to a level of robustness and accuracy which is difficult to achieve with purely visual SLAM systems. In addition to raw data, the sensor head provides FPGA-pre-processed data such as visual keypoints, reducing the computational complexity of SLAM algorithms significantly and enabling employment on resource-constrained platforms. Sensor selection, hardware and firmware design, as well as intrinsic and extrinsic calibration are addressed in this work. Results from a tightly coupled reference visual-inertial motion estimation framework demonstrate the capabilities of the presented system.
Conference Paper
Within this paper, a new fast algorithm that provides efficient solutions to the problem of inspection path planning for complex 3D structures is presented. The algorithm assumes a triangular mesh representation of the structure and employs an alternating two-step optimization paradigm to find good viewpoints that together provide full coverage and a connecting path that has low cost. In every iteration, the viewpoints are chosen such that the connection cost is reduced and, subsequently, the tour is optimized. Vehicle and sensor limitations are respected within both steps. Sample implementations are provided for rotorcraft and fixed-wing unmanned aerial systems. The resulting algorithm characteristics are evaluated using simulation studies as well as multiple real-world experimental test-cases with both vehicle types.
Conference Paper
It has been long known that fusing information from multiple sensors for robot navigation results in increased robustness and accuracy. However, accurate calibration of the sensor ensemble prior to deployment in the field as well as coping with sensor outages, different measurement rates and delays, render multi-sensor fusion a challenge. As a result, most often, systems do not exploit all the sensor information available in exchange for simplicity. For example, on a mission requiring transition of the robot from indoors to outdoors, it is the norm to ignore the Global Positioning System (GPS) signals which become freely available once outdoors and instead, rely only on sensor feeds (e.g., vision and laser) continuously available throughout the mission. Naturally, this comes at the expense of robustness and accuracy in real deployment. This paper presents a generic framework, dubbed MultiSensor-Fusion Extended Kalman Filter (MSF-EKF), able to process delayed, relative and absolute measurements from a theoretically unlimited number of different sensors and sensor types, while allowing self-calibration of the sensor-suite online. The modularity of MSF-EKF allows seamless handling of additional/lost sensor signals during operation while employing a state buffering scheme augmented with Iterated EKF (IEKF) updates to allow for efficient re-linearization of the prediction to get near optimal linearization points for both absolute and relative state updates. We demonstrate our approach in outdoor navigation experiments using a Micro Aerial Vehicle (MAV) equipped with a GPS receiver as well as visual, inertial, and pressure sensors.
The general importance of dead wood in European beech forests for species requiring high amounts of decayed wood of large diameter has recently been demonstrated using a functional approach. However, the effect of veteran trees, particularly of living hollow trees with mould, on functional diversity, is less understood. These trees are known to be a habitat for a few endangered and specialized arthropods and epiphytes. Their ecological role as a complex habitat has been assumed, but not yet formally tested. We compared the richness and functional and phylogenetic diversity of saproxylic beetle assemblages of vital beech trees, habitat trees (i.e. trees with partial bark loss, broken crowns or sporocarps) and hollow trees with mould. As expected, the richness of red-listed species increased from vital trees to habitat trees to hollow trees. When we controlled for species richness using null models, both functional and phylogenetic diversity were higher for hollow trees than for habitat trees, which can be explained by the habitat heterogeneity hypothesis. Single-trait analyses revealed that hollow trees promoted species requiring late decay stages, large diameters and shady habitats. This suggests that in beech forests, hollow trees not only promote the few specialists of hollow trees, but also play a superior role for species under pressure by current logging practices and as a keystone structure with high habitat diversity at one tree. We therefore urge forest managers and conservationists to monitor particularly the easy-to-identify hollow trees and the conspicuous species living in such trees, as useful umbrellas for a high-diversity dead-wood habitat.
We explored the main factors affecting the global distribution of tree cavities – a habitat component of mostly biotic origin that is crucial for many animal species. We considered the influence of eight environmental variables (ranging from the single-tree to the biogeographic-region scale) on cavity density in a meta-analysis of 103 published studies. The global median density of cavities was 16ha−1, with densities highest in Australasia and lowest in the Palaearctic region. Two major factors influencing density were identified: cavity density was positively related to the amount of precipitation, and was higher in natural than in managed forests. These effects suggest that the distribution of tree cavities largely reflects the incidence of fungal heart-rot in trees, and that forest management, by affecting wood decay processes, can have a broad-scale impact on tree microhabitat availability. Although air temperature, forest composition and wood hardness had suggestive univariate effects, neither these variables nor biogeographic region explained any additional variation in multifactor models. In regions where woodpeckers are present there was an upper limit to the density of woodpecker-excavated cavities (approximately 10–20cavitiesha−1) that was considerably lower than the highest total cavity densities encountered (up to 140ha−1). This indicates that primary cavity-nesters are particularly important keystone species in cavity-poor forests where wood decay processes are suppressed either climatically or by forest management.
The habitat requirements of 11 beetle species associated with tree hollows were studied by correlating their occupancy with characteristics of the oaks and the number of hollow oaks in each stand. The species richness was higher in trunks with a large girth and in tree hollows with entrances situated high up on the trunks and not directed upwards. Several species occurred with higher frequencies in larger stands, which suggests that these species are sensitive to habitat fragmentation. In some cases at least, the present species richness may reflect higher densities of hollow trees in the past. It is therefore suggested that, in order to preserve the rare fauna in old oaks, the stand size must be increased at many sites. In this study two species,Elater ferrugineus and Tenebrio opacus, were absent from almost all stands with