Detection and tracking of crossing vessels for small
autonomous vessels equipped with stereo camera
Fabio A. A. Andrade
University of South-Eastern Norway
NORCE Norwegian Research Centre
Norway
Email: fabio@ieee.org
Marius Tannum
University of South-Eastern Norway
Norway
Email: marius.tannum@usn.no
Adrian Knutsen
University of South-Eastern Norway
Norway
Nina Tran
University of South-Eastern Norway
Norwegian University of Science and Technology
Norway
Torben Falleth Olsen
University of South-Eastern Norway
Norwegian University of Science and Technology
Norway
Tobias Tufte
Kongsberg Maritime
Norway
Carlos A. M. Correia
Federal Center for Technological Education of Rio de Janeiro
Brazil
Email: ccorreiaprof@gmail.com
Abstract—When Small Autonomous Vessels perform a mission in a non-segregated area, that is, an area where other vessels (manned or not) will also be present, they need to be able to comply with navigation rules, such as collision avoidance. In order to decide whether to give way or stand on, the autonomous vessel must keep track of other vessels. In this work, a vessel detection and tracking solution is presented. The system uses a depth camera, IMU, magnetometer and GNSS as the main sensors for this task. Boat detection is performed by the SSD-MobileNet V2 neural network pre-trained on the COCO dataset. Tracking is performed by a simple and fast algorithm, and the samples are later filtered using a Kalman Filter. The solution is implemented in the Robot Operating System (ROS) framework. Practical tests were performed, in which stationary and moving targets were tracked. Results show that the method performs well and can achieve even better results after some fine-tuning.
I. INTRODUCTION
When navigating at sea, vessels need to follow the COLREG (International Regulations for Preventing Collisions at Sea) [1]. The rules for encountering other vessels state that one vessel must stand on while the other vessel must give way, depending on the positions and directions of the vessels. As these rules must be followed by any vessel, small autonomous vessels are also included.
In order to leverage the development of small autonomous
vessels, Norwegian universities organize an annual compe-
tition called AUTODRONE [2], which, among many chal-
lenges, has a Collision Avoidance task. In this task, the vessel
needs to navigate through a small canal with green and red
buoys. The objective is to get through to the other side,
without crashing into intersecting traffic (Figure 1).
Therefore, this project aims to implement a collision avoid-
ance module on the university’s small autonomous vessel
(SeaDrone) for the competition.
Among similar works, it is possible to highlight [3] and [4], where the authors investigated the use of Deep Learning, sensor fusion and tracking strategies for boat detection.
Additionally, in [5], the authors present and discuss the results of an exercise with maritime autonomous vessels for collision avoidance, in compliance with the COLREG rules.
In [6], probabilistic data on a vessel's movement were calculated in order to infer its trajectory within a generalized framework for predicting obstacle intentions. These predictions are used in a collision avoidance (COLAV) system. A Probabilistic Scenario-Based Model Predictive Control (MPC) was employed, where a finite set of obstacle maneuvers was considered and their respective probabilities of collision were estimated.
In [7], a new approach to the collision avoidance problem was proposed, using the prediction model of a simulation-based MPC to model the autonomous surface vehicle (ASV) collision avoidance system as an extension of its guidance system, where the decisions of the guidance system are employed as references for the MPC COLAV method.
In [8], the authors propose a proactive, instead of reactive, collision avoidance method for ASVs based on the Velocity Obstacles framework, which employs relative motion arguments to carry out collision avoidance operations.
In [9], MPC is used to predict a small, finite set of obstacle maneuver scenarios for the control of an autonomous COLAV system. The number of control variables is also reduced: the model works only with the guidance course angle commanded to the autopilot and with the propulsion command. Despite restricting the degrees of freedom of the control, this choice makes the control of the COLAV system more effective, reducing its complexity and computational demand.
The novelty of this project lies in coupling Deep Learning object detection, direct georeferencing using the depth, GNSS and IMU (Inertial Measurement Unit) measurements, tracking, and filtering. The solution is developed using the Robot Operating System (ROS). In addition, field tests were performed to evaluate the proposed solution.
Fig. 1. Collision Avoidance Mission layout.
II. DEVELOPMENT
The block diagram that summarizes the system is presented in Figure 2. First, the image is captured and the Deep Learning model identifies potential boats in it. The detected objects are tracked by a tracking algorithm. Then, a Kalman Filter is updated for each tracked object with the new measurements. Finally, based on the filtered position and estimated velocity of the tracked objects, the decision-making algorithm evaluates whether the SeaDrone should give way or stand on.
Fig. 2. System’s block diagram.
A. Hardware
1) ZED 2 Stereo Camera: The Stereolabs ZED 2 camera [10] is a stereo camera with neural depth perception, a built-in IMU, barometer and magnetometer, and a full 120° field of view. The stereo camera is essential for the SeaDrone's ability to interpret its surroundings and for extracting depth data of crossing vessels.
Fig. 3. SeaDrone
2) NVIDIA Jetson Xavier NX: The NVIDIA Jetson Xavier NX [11] is an embedded computing module with exceptionally high performance relative to its size and power consumption. The Xavier is specifically designed to perform demanding graphical and arithmetic workloads, such as the Deep Learning models used in this work. The Xavier is connected to the ZED 2 Stereo Camera through USB 3.0 and to the GNSS receiver and Raspberry Pi through Ethernet. In addition, it runs all the algorithms developed in this work.
3) Raspberry Pi 3 Model B+: The Raspberry Pi 3 Model B+ [12] is a single-board computer that works as the main control unit for the Navio2. The 3B+ model is equipped with a 64-bit quad-core processor, dual-band 2.4 GHz and 5 GHz wireless LAN, and Bluetooth 4.2.
4) Emlid Navio2 HAT: The Navio2 HAT (Hardware Attached on Top) [13] is an autopilot add-on board for the Raspberry Pi with ArduPilot and ROS support. The board is equipped with a GNSS receiver, an IMU, a barometer and an RC I/O coprocessor, and has support for SBUS. The Navio2 board is mounted on the Raspberry Pi via the GPIO pins, which makes it possible to retrieve data from the GPS and compass sensors. The Navio2 is also equipped with a number of PWM outputs, which are used to control the subsea motors and servomotors in the Propulsion and Control System.
5) SIMRAD HS60 GPS Compass: The HS60 GPS Compass [14] provides accurate course and position data. It is ideal for applications where accurate course data is important, such as autopilot systems. It determines the course by using two separate GPS antennas, a gyro and a tilt sensor. The latter two sensors are also used to provide position updates during temporary loss of GPS signal. The communication with ROS is done via an NMEA driver package.
B. Image Capture
The image is made available by the ZED 2 stereo camera. The RGB image is taken by the left camera and rectified. The ZED 2 ROS wrapper provides the rectified image as well as the camera calibration matrix for the rectified image. The depth map is registered on the left image.
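As an illustration of how these topics can be consumed, the following is a minimal sketch of a rospy node that synchronizes the rectified left image and the registered depth map. The left image topic name is an assumption based on the wrapper's naming convention and may differ between ZED wrapper versions.

import rospy
import message_filters
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def callback(rgb_msg, depth_msg):
    # Convert the ROS images to numpy arrays for the detection module
    rgb = bridge.imgmsg_to_cv2(rgb_msg, desired_encoding="bgr8")
    depth = bridge.imgmsg_to_cv2(depth_msg, desired_encoding="32FC1")
    # ... hand the (rgb, depth) pair to the object detection module ...

rospy.init_node("image_capture")
rgb_sub = message_filters.Subscriber("/zed/left/image_rect_color", Image)  # assumed topic name
depth_sub = message_filters.Subscriber("/zed/depth/depth_registered", Image)
# Approximate sync, since RGB and depth timestamps may differ slightly
sync = message_filters.ApproximateTimeSynchronizer([rgb_sub, depth_sub],
                                                   queue_size=5, slop=0.05)
sync.registerCallback(callback)
rospy.spin()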
C. Object Detection
For the boat detection, the pre-trained SSD-MobileNet-v2 object detection network was used. This network was developed by TensorFlow and was trained on the COCO 2017 dataset [15]. It can detect 91 different classes, including the class "boat", which is the object of interest in this work. The object detection network was implemented using the NVIDIA inference toolkit.

The detection threshold was set to 50%, and only the center pixels of the bounding boxes of objects classified as "boat" are sent to the tracking module.
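As a minimal sketch of this step, assuming the jetson-inference Python bindings of the NVIDIA toolkit, the boat-center extraction could be written as follows (the function and variable names are illustrative):

import jetson.inference
import jetson.utils

# Load the pre-trained SSD-MobileNet-v2 network with a 50% detection threshold
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

def detect_boat_centers(cuda_img):
    # Return the bounding-box center pixels (u, v) of all objects classified as "boat"
    centers = []
    for det in net.Detect(cuda_img):
        if net.GetClassDesc(det.ClassID) == "boat":
            centers.append(det.Center)
    return centers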
D. Tracking
The boat detection module gives the pixel position of the center of each detected boat's bounding box. Therefore, together with the data from the GPS, IMU, magnetometer and depth, it is possible to calculate the position of the boat in the world frame.

In addition, more than one boat may be detected, and the order of the detected boats may vary from frame to frame. Therefore, it is necessary to use a tracking strategy that is able to assign the new position to an already detected boat.

The position of the boat in the world frame in metric units is calculated from its position in pixel units in the image frame through the following rotations and translations (Figure 4), which are described in the following equations.
First, the position of the boat in the camera frame ($[x_C, y_C, z_C]^T$) is calculated based on the pixel position $[u, v, 1]^T$ (Equation 1). Here, the $X_C$ axis points to the right, the $Y_C$ axis points down and the $Z_C$ axis points straight ahead. $u$ and $v$ are the horizontal and vertical pixel coordinates, where the pixel $[1, 1, 1]^T$ is at the top-left of the image. As the $Z_C$ axis points straight ahead, $z_C$ is the distance between the camera and the detected boat. This information is the depth provided by the ZED 2 stereo camera through the ROS topic /zed/depth/depth_registered, which is already mapped onto the left image.
$$\begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix} = K^{-1} z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \qquad (1)$$
where the intrinsic camera matrix $K$ is given by:

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (2)$$
The matrix parameters are made available by the ZED 2 camera ROS wrapper and obtained by subscribing to the ROS topic /zed/left/camera_info.
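As a worked sketch of Equations 1 and 2, the back-projection from a pixel and its depth to the camera frame can be written with numpy; the intrinsic values below are illustrative placeholders for those read from /zed/left/camera_info:

import numpy as np

# Illustrative intrinsics; in practice they come from /zed/left/camera_info
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

def pixel_to_camera(u, v, z_c):
    # Equation 1: back-project pixel (u, v) at depth z_c into the camera frame
    return np.linalg.inv(K) @ (z_c * np.array([u, v, 1.0]))

p_cam = pixel_to_camera(800, 400, 12.5)  # [x_C, y_C, z_C] in meters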
Now, to calculate the detected boat's position in the body frame, a simple rotation is needed, followed by a translation:

$$\begin{bmatrix} x_B \\ y_B \\ z_B \end{bmatrix} = R_{C \to B} \begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix} + T_{C \to B}, \qquad (3)$$

where $T_{C \to B}$ is the position of the camera in the body frame and the rotation from the camera frame to the body frame ($R_{C \to B}$) is given by:

$$R_{C \to B} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad (4)$$
Fig. 4. Direct georeferencing steps.
The next step is to calculate the position in the world frame. First, a rotation to the NED frame ($R_{B \to NED}$) is done using the Euler angles: yaw ($\psi$), pitch ($\theta$) and roll ($\phi$). Then, the position is calculated in the ENU frame, using the same origin as the Universal Transverse Mercator (UTM) zone:

$$\begin{bmatrix} x_{ENU} \\ y_{ENU} \\ z_{ENU} \end{bmatrix} = R_{NED \to ENU} R_{B \to NED} \begin{bmatrix} x_B \\ y_B \\ z_B \end{bmatrix} + T_{NED \to ENU}, \qquad (5)$$

where $R_{B \to NED}$ is the transpose of the Direction Cosine Matrix (DCM) with rotation order ZYX (yaw-pitch-roll).
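A minimal numpy sketch of Equations 3 to 5 follows. It assumes the standard ZYX Euler construction of the body-to-NED rotation and that $T_{NED \to ENU}$ is the SeaDrone's own position in the UTM/ENU frame; both are assumptions consistent with, but not spelled out by, the text above:

import numpy as np

R_C_TO_B = np.array([[0., 0., 1.],
                     [1., 0., 0.],
                     [0., 1., 0.]])  # Equation 4

# NED -> ENU: swap North/East and flip Down to Up
R_NED_TO_ENU = np.array([[0., 1., 0.],
                         [1., 0., 0.],
                         [0., 0., -1.]])

def dcm_body_to_ned(yaw, pitch, roll):
    # R_{B->NED} = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.], [sy, cy, 0.], [0., 0., 1.]])
    Ry = np.array([[cp, 0., sp], [0., 1., 0.], [-sp, 0., cp]])
    Rx = np.array([[1., 0., 0.], [0., cr, -sr], [0., sr, cr]])
    return Rz @ Ry @ Rx

def camera_to_enu(p_cam, t_cam_in_body, yaw, pitch, roll, t_vessel_enu):
    p_body = R_C_TO_B @ p_cam + t_cam_in_body       # Equation 3
    p_ned = dcm_body_to_ned(yaw, pitch, roll) @ p_body
    return R_NED_TO_ENU @ p_ned + t_vessel_enu      # Equation 5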
In this work, a very simple tracking strategy was implemented. The process is described in the pseudo-code of Algorithm 1. Only two parameters need to be set: a maximum possible distance in meters between subsequent detected positions, and a maximum timeout in frames.

Given a list of tracked points and a list of detections, the algorithm calculates the distance from every detection to every tracked point. If a distance is shorter than the maximum distance, the detection is assigned to the closest tracked point and the timeout of that tracked point is zeroed. If a tracked point ends up without any detection, one is added to its timeout. If the timeout reaches the maximum timeout, the tracked point is deleted. If any detections remain (i.e., not assigned to any tracked point), a new tracked point is created for each of them.
Algorithm 1 Tracking
  Define timeout_max
  Define distance_max
  Input detections
  Input points
  for point in points do
      for detection in detections do
          Calculate distance between detection and point
          if distance ≤ distance_max then
              distances ← distance
          end if
      end for
  end for
  points_temp ← points
  while distances do
      distance_temp ← minimum distance in distances
      point(distance_temp) ← detection(distance_temp)
      timeout(point(distance_temp)) ← 0
      Delete detection(distance_temp)
      Delete points_temp(distance_temp)
      Delete distance_temp in distances
  end while
  for point in points_temp do
      Add 1 to timeout(point)
      if timeout(point) > timeout_max then
          Delete point in points
      end if
  end for
  for detection in detections do
      Create a new point
      point ← detection
  end for
E. Filtering
A Kalman Filter was used to estimate the positions and velocities of the detected boats based on the tracked detected positions. A new Kalman Filter is initialized and assigned to each new tracked point that is created.

As the boats are at sea level, only the horizontal coordinates are used. Therefore, the measurement vector $z$ is equal to $[x, y]$ and the state vector $x$ is equal to $[x, y, v_x, v_y]$, where $x$ and $y$ are the East and North positions and $v_x$ and $v_y$ are the East and North velocities. The observation matrix ($H$) is, therefore, given as:
$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \qquad (6)$$
and the state transition matrix ($A$) is given as:

$$A = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (7)$$
where $\Delta t$ is calculated every loop based on the time between the current and the previous detection. No control input was used in the prediction step.
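A minimal numpy sketch of this constant-velocity Kalman Filter is given below; the noise covariances Q and R are illustrative values, as the tuning is not reported here:

import numpy as np

class BoatKalmanFilter:
    # Constant-velocity filter over the state [x, y, vx, vy] (Equations 6 and 7)
    H = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.]])

    def __init__(self, x0, y0):
        self.x = np.array([x0, y0, 0., 0.])  # initial state with zero velocity
        self.P = np.eye(4) * 10.0            # illustrative initial covariance
        self.Q = np.eye(4) * 0.1             # illustrative process noise
        self.R = np.eye(2) * 1.0             # illustrative measurement noise

    def predict(self, dt):
        A = np.array([[1., 0., dt, 0.],
                      [0., 1., 0., dt],
                      [0., 0., 1., 0.],
                      [0., 0., 0., 1.]])
        self.x = A @ self.x
        self.P = A @ self.P @ A.T + self.Q

    def update(self, z):
        y = z - self.H @ self.x                     # innovation
        S = self.H @ self.P @ self.H.T + self.R     # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)    # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P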
F. Decision Making
The decision making takes place only if the distance between the two vessels is shorter than a defined distance. This avoids making unnecessary decisions and maneuvers if the encounter does not pose any risk.
If the vessels are within the defined distance, the process goes as follows. First, to decide whether the SeaDrone should stand on or give way, the bearing from the SeaDrone to the detected boat is compared with the heading of the SeaDrone. If the detected boat is on the left side, the SeaDrone should stand on, unless the distance between the two vessels is shorter than a defined safe distance, in which case the SeaDrone stops regardless.

If the detected boat is on the right side of the SeaDrone, the headings of the two vessels are compared. If the detected boat is moving away, the SeaDrone stands on. If the other vessel's heading is such that it will likely cross the SeaDrone's course, the SeaDrone gives way.
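A hedged Python sketch of this decision logic is given below. The distance thresholds and the crossing test are illustrative assumptions, since the rules above are described only qualitatively:

import numpy as np

DECISION_DIST = 50.0  # meters (illustrative)
SAFE_DIST = 10.0      # meters (illustrative)

def wrap(angle):
    # Wrap an angle in degrees to the interval [-180, 180)
    return (angle + 180.0) % 360.0 - 180.0

def decide(own_pos, own_heading, boat_pos, boat_heading):
    # Positions are ENU [east, north]; headings in degrees from North
    rel = np.asarray(boat_pos) - np.asarray(own_pos)
    dist = np.linalg.norm(rel)
    if dist > DECISION_DIST:
        return "ignore"    # encounter poses no risk yet
    if dist < SAFE_DIST:
        return "stop"      # closer than the safe distance: stop regardless
    bearing = np.degrees(np.arctan2(rel[0], rel[1]))  # atan2(east, north)
    if wrap(bearing - own_heading) < 0.0:
        return "stand-on"  # boat on the port (left) side
    # Boat on the starboard side: compare headings (illustrative crossing test)
    if abs(wrap(boat_heading - bearing)) < 90.0:
        return "stand-on"  # boat heading away from the SeaDrone
    return "give-way"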
III. RESULTS
Field tests were performed to evaluate the proposed solution. First, a stationary boat was detected and tracked while the SeaDrone was moving in the direction of the target boat. As can be seen in Figure 5, the detected stationary boat's position (gray) has a deviation of around 1.5 meters on the x axis and 1 meter on the y axis. One reason for this deviation may be that the detection returns the center pixel of the target boat's bounding box and, as the SeaDrone is moving, this center pixel may vary when the target boat is spotted from different angles. Other reasons may be sensor reading errors, for example in the depth, IMU or even the GNSS, as no RTK solution is used.
Fig. 5. SeaDrone moving straight tracking a steady boat
In Figure 5, the trajectory of the SeaDrone is divided into two colors: the blue trajectory is before the target boat was detected, and the orange trajectory is during the detection and tracking. In Figure 6, the heading of the SeaDrone is presented, from the moment the target boat is detected until the end of the test.

Fig. 6. SeaDrone heading during the tracking

The objective of the second test was to investigate how the system performed when rotating around the stationary boat (Figure 7). A variation of 150 degrees in the heading was observed during the test (Figure 8).
Fig. 7. SeaDrone rotating around the tracked boat
It is possible to notice that there is a sudden jump in the position of the tracked boat. Again, this is due to the fact that the boat is big, so when the SeaDrone started the tracking it was spotting one side of the boat and, after a while, it began to spot the other side. A simple way to avoid problems in the collision avoidance is to always use a safe collision avoidance distance that is larger than the maximum expected length of a crossing vessel. If very large vessels are expected, the SeaDrone should be able to re-evaluate the obstacles during the waypoint navigation.
Fig. 8. SeaDrone heading when rotating around the tracked boat
Finally, a last test was performed, in which the SeaDrone was moving and tracking multiple targets that were also moving (Figure 9). It is possible to see that objects 5 (red) and 6 (dark gray) are most likely the same target and that the tracking module assigned a new number tag to it during the tracking process. The same can be said about objects 1 (gray) and 2 (yellow). Probably the SeaDrone started tracking the object and, because it started turning, lost the object from the camera view and then assigned a new number tag when it was detected again.
Fig. 9. SeaDrone moving when tracking multiple moving humans
Regarding the heading (Figure 10), it is possible to notice that the SeaDrone starts moving south, with a heading of around 170 degrees, and then turns toward the north. The heading does not reach zero degrees (north), probably because the SeaDrone has a course of zero degrees while its heading is around 40 degrees. This means that the SeaDrone is moving sideways.
Fig. 10. SeaDrone heading when tracking multiple moving humans
IV. CONCLUSION
In this work, a solution for detection and tracking of boats
for collision avoidance was presented. The system is equipped
with a depth camera, IMU, magnetometer and GNSS sensors.
The solution was developed in the Robot Operating System
(ROS) framework. Results show that the system is able to
detect and track the targets. The tracking and filtering work
as expected. As future work, further testing and tuning will be
performed to achieve better results. In addition, the solution
will be completed using Model Predictive Control to follow
a trajectory that avoids the intersecting traffic.
ACKNOWLEDGMENT
This work was supported by the Research Council of
Norway under the project AUTOSTRIP; and by the Brazilian
National Council for Scientific and Technological Develop-
ment (CNPq).
REFERENCES
[1] International Maritime Organization, COLREG: Convention on the International Regulations for Preventing Collisions at Sea, 1972. IMO, 2003.
[2] University of South-Eastern Norway. Autodrone. [Online]. Available: https://www.autodrone.no/
[3] V. Kamsvåg, "Fusion between camera and lidar for autonomous surface vehicles," Master's thesis, NTNU, 2018.
[4] Ø. K. Helgesen, "Sensor fusion for detection and tracking of maritime vessels," Master's thesis, NTNU, Trondheim, 2019.
[5] D. K. M. Kufoalor, T. A. Johansen, E. F. Brekke, A. Hepsø, and K. Trnka, "Autonomous maritime collision avoidance: Field verification of autonomous surface vehicle behavior in challenging scenarios," Journal of Field Robotics, vol. 37, no. 3, pp. 387–403, 2020.
[6] T. Tengesdal, T. A. Johansen, and E. Brekke, "Risk-based autonomous maritime collision avoidance considering obstacle intentions," in 2020 IEEE 23rd International Conference on Information Fusion (FUSION). IEEE, 2020, pp. 1–8.
[7] I. B. Hagen, D. K. M. Kufoalor, E. F. Brekke, and T. A. Johansen, "MPC-based collision avoidance strategy for existing marine vessel guidance systems," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7618–7623.
[8] D. K. M. Kufoalor, E. F. Brekke, and T. A. Johansen, "Proactive collision avoidance for ASVs using a dynamic reciprocal velocity obstacles method," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 2402–2409.
[9] T. A. Johansen, T. Perez, and A. Cristofaro, "Ship collision avoidance and COLREGS compliance using simulation-based control behavior selection with predictive hazard assessment," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 12, pp. 3407–3422, 2016.
[10] Stereolabs. ZED 2. [Online]. Available: https://www.stereolabs.com/zed-2/
[11] NVIDIA. Jetson Xavier NX. [Online]. Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-xavier-nx/
[12] Raspberry Pi. Raspberry Pi 3 Model B+. [Online]. Available: https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/
[13] Emlid. Navio2. [Online]. Available: https://navio2.emlid.com/
[14] Simrad. SIMRAD HS60 GPS Compass. [Online]. Available: https://www.simrad-yachting.com/nb-no/simrad/type/kompass/hs60-gps-compass/
[15] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in European Conference on Computer Vision. Springer, 2014, pp. 740–755.