Towards Active Vision with UAVs in Marine Search and Rescue:
Analyzing Human Detection at Variable Altitudes
Li Qingqing1, Jussi Taipalmaa2, Jorge Peña Queralta1, Tuan Nguyen Gia1, Moncef Gabbouj2,
Hannu Tenhunen1, Jenni Raitoharju3, Tomi Westerlund1
1Turku Intelligent Embedded and Robotic Systems Lab, University of Turku, Finland
Emails: 1{qingqli, jopequ, tunggi, hatenhu, tovewe}@utu.fi
2Department of Computing Sciences, Tampere University, Finland
Emails: 2{jussi.taipalmaa, moncef.gabbouj}@tuni.fi
3Programme for Environmental Information, Finnish Environment Institute, Jyväskylä, Finland
Email: jenni.raitoharju@environment.fi
Abstract—Unmanned Aerial Vehicles (UAVs) have been playing an increasingly active role in supporting search and rescue (SAR) operations in recent years. The benefits are multiple, such as enhanced situational awareness, status assessment, or mapping of the operational area through aerial imagery. Most of these application scenarios require the UAVs to cover a certain area. If the objective is to detect people or other objects, or to analyze the area in detail, then there is a trade-off between speed (higher-altitude coverage) and perception accuracy (lower altitude). Finding an optimal point in between requires active perception on board the UAV to dynamically adjust the flight altitude and path planning. As an initial step towards active vision in UAV search in maritime SAR scenarios, in this paper we focus on analyzing how the flight altitude affects the accuracy of object detection algorithms. In particular, we quantify the probabilities of false negatives and false positives in human detection at different altitudes. Our results define the correlation between the altitude and the ability of UAVs to effectively detect people in the water.
Index Terms—Active Vision; Flight Altitude; Dynamic Altitude; Object Detection; Human Detection; Marine Search and Rescue (SAR); Unmanned Aerial Vehicles (UAV)
I. INTRODUCTION
Recent years have seen an increasingly wide adoption of unmanned aerial vehicles (UAVs) to support search and rescue (SAR) operations. Owing to their fast deployment, speed
and aerial point of view, UAVs can aid quick-response teams, but also support longer-term monitoring and surveillance [1]. Some of the main applications of UAVs in these scenarios are real-time mapping of the operational area and the delivery of emergency supplies. In particular, UAVs can significantly increase the response team's situational awareness and detect objects and people from the air, especially those in need of rescue [2]. An overview of recent research in this
area is available in [3], where UAVs for SAR operations
are characterized based on the operational environment, the
type of robotic systems in use, and the onboard sensing
capabilities of the UAVs.
Fig. 1: Illustration of active-vision-based search in maritime environments with UAVs. A single UAV can first fly higher to cover larger areas and descend in the event of a positive detection to increase reliability. Search time can then be optimized by dynamically adjusting the altitude depending on the perception confidence.

We are interested in optimizing the support that UAVs can provide in maritime SAR operations (see Fig. 1), but also in monitoring and surveillance in maritime environments, where UAVs have already been widely utilized [4].
Maritime SAR operations might occur in both normal and
harsh environments. For example, according to the Spanish
national drowning report [5], in 2019 over 40% of drownings
happened on a beach, around 60% of the incidents happened
between 10:00 and 18:00, and in 20% of the cases lifeguards
were present in the area. Therefore, there is still a need
for better solutions for monitoring and supporting SAR
operations in safeguarded beaches, lakes or rivers even with
favorable weather conditions, which can then be extended
towards rougher environments as the technology evolves.
In this paper, we study the detection of people in mostly
still waters at different altitudes. In the future, we aim to
utilize this information within an active vision algorithm
that can dynamically adapt the flight plan of UAVs towards
optimization of search speed and reliability.
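As a concrete illustration of what such an algorithm could look like, the following is a minimal sketch of a confidence-driven altitude policy; the thresholds, step size, and altitude limits are hypothetical placeholders and not part of the system evaluated in this paper.

# Minimal sketch of a confidence-driven altitude policy. The thresholds,
# step size and altitude limits are hypothetical placeholders.
def next_altitude(current_alt_m, detection_confidences,
                  descend_conf=0.6, confirm_conf=0.9,
                  min_alt_m=20.0, max_alt_m=100.0, step_m=10.0):
    """Return the altitude for the next waypoint given the detector output."""
    if not detection_confidences:
        # Nothing seen: fly high to cover more area per image.
        return min(current_alt_m + step_m, max_alt_m)
    best = max(detection_confidences)
    if best >= confirm_conf:
        # High-confidence detection: report it and resume wide-area search.
        return min(current_alt_m + step_m, max_alt_m)
    if best >= descend_conf:
        # Plausible but uncertain detection: descend to confirm it.
        return max(current_alt_m - step_m, min_alt_m)
    # Weak evidence: hold altitude and re-evaluate on the next frame.
    return current_alt_m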
In terms of UAV-based perception, deep learning (DL) methods have become the de facto standard in object detection and image segmentation, with great success across multiple domains [6], [7]. In this paper, we utilize the YOLOv3 [6] architecture and characterize its performance for human detection on still water surfaces. Within the machine perception field, active vision is a topic that has gained increasing research interest, owing to advances in DL and the accessibility of UAV platforms for research. Active vision has been successfully applied to single- and multi-agent tracking [8], but we have observed a gap in the literature in terms of active vision for search and area coverage. The most active research direction in active perception is currently reinforcement learning (RL) [9]. However, we consider a more traditional approach in this paper, as an RL approach can be challenging owing to the lack of realistic simulators for training models for maritime SAR.
Deep learning for perception in maritime environments
is limited by the lack of realistic training datasets openly
available. Moreover, a key challenge for UAV-based person
search and detection in these environments is the relatively
small size of objects to be detected in comparatively large
areas to be searched [10]. There is an evident trade-off between speed and area coverage on the one hand, and the reliability of both positive and negative detections on the other. An additional challenge is
that the view of people at sea from the air is only partial, as
a significant portion of the body is immersed in the water.
Water reflection and refraction effects might also distort the
shape. In order to train YOLOv3 to adapt to this scenario,
and owing to the lack of open data for detecting people in
water, we collected over 450 high-resolution images to train,
validate and test our model. The images have been taken at
altitudes ranging from 20 m to 120 m.
This is, to the best of our knowledge, the first paper to
analyze the perception accuracy for UAVs with RGB cameras
in maritime environments as a function of their altitude. The
results can be generalized by accounting for the size in pixels
of the persons to be detected assuming well-focused images.
Moreover, the retrained YOLO model outperforms the state-
of-the-art in object classification, as it has been trained to
detect people even when only their head emerges above the
water level. The retrained YOLO model can be applied to people swimming, but also to people standing near the shore at a beach.
The rest of the paper is organized as follows. Section II briefly overviews previous research in active vision on the one hand, and maritime SAR operations supported by UAVs on the other. We then describe the main objectives of our
study in Section III, together with data acquisition and model
training details. Section IV reports our experimental results
and Section V concludes the work.
II. BACKGROUND
Multiple works have demonstrated the benefits of integrating UAVs into maritime SAR operations [11], [12]. Typical
sensors onboard UAVs are RGB, RGB-D and thermal cam-
eras, 3D lidars, and inertial/positional sensors for GNSS and
altitude estimation [13], [14]. With these sensors, UAVs can
aid in SAR operations by mapping the environment, locating
victims and survivors, and recognising and classifying dif-
ferent objects [13]. From the perception point of view, DL
methods have become the predominant solution for detecting
humans or other objects [7], [15], [16].
Human detection is a sub-task of object detection that
is of particular interest for SAR robotics [17]. Some of
the most popular neural network architectures for object
detection are R-CNN [18], Fast-RCNN [19], and YOLO [6].
In particular, YOLOv3 is the current state of the art for real-time detection, capable of fast inference with high accuracy [6].
In this paper, we re-train the YOLOv3 network with a new
dataset for detecting people in the water.
Active perception has been defined as follows: "An agent is an active perceiver if it knows why it wishes to sense, and then chooses what to perceive, and determines how, when, and where to achieve that perception." [20]
In UAV-aided maritime SAR operations, algorithms for area
coverage and human search incorporating active vision need
to be aware that their main objective is to find humans (why),
and need to be able to dynamically adjust their path planning
and orientation to achieve higher-confidence results (what).
This latter aspect can be achieved by, for instance, adjusting
their height and camera pitch, or by moving around the
person to get a better angle (how, where and when).
Active vision has been increasingly adopted in different
object detection tasks. However, no previous research has,
to the best of our knowledge, focused on active vision for
detection of humans in SAR scenarios. We therefore list
here some other relevant works in the area. Ammirato et al. presented a dataset for robotic vision tasks in indoor environments using RGB-D cameras, together with an active vision strategy based on deep RL to predict the next best move for object detection [21]. Sandino et al. presented an autonomous Sequential Decision Process (SDP) for active perception of targets in uncertain and cluttered environments, with experiments conducted in a simulated SAR scenario [22]. Falanga et al. applied active vision to a path planning algorithm that enables quadrotor flight through narrow gaps in complex indoor environments [23]. Chessa et al. applied bio-inspired active vision for obstacle avoidance with wheeled robots in indoor environments [24]. In SAR operations, once a target has been identified, continuously updating its position is essential so that path planning for the rescue teams can be adjusted. This can be achieved through active tracking [25].
In terms of detecting people in maritime environments, Lygouras et al. presented a real-time human detection system using DL models running on board UAVs to detect open-water swimmers [26]. The authors, however, do not study the accuracy of the perception at different altitudes or positions. In this work, we focus on analyzing human detection as a trade-off between larger area coverage (higher altitude) and a higher amount of detail in the images (lower altitude).
Fig. 2: Example images from the dataset: (a) beach view of the terrain at Littoinen Lake, Finland; (b) top view of a swimmer; (c) far view of swimmers; (d) close view of a swimmer in back light.
In general, we see a clear trend towards a more widespread
utilization of UAVs in SAR missions and of DL models for perception (either onboard or by offloading computation). We have
found, however, no previous works exploring the correlation
between the altitude at which UAVs fly and the detection
accuracy in maritime SAR scenarios.
III. METHODOLOGY
This section describes the data and details of the training
process for the perception algorithm. We also outline the
metrics that are analyzed in our experiments.
A. Data Acquisition
Owing to the lack of labeled open data showing people in
water, and in particular data labeled with the flight altitude,
we have collected data from people swimming and standing
in a lake. The dataset contains 458 labeled photos taken by a camera mounted on the UAV. The camera has a fixed focal length of 24 mm (35 mm format equivalent) with a field of view of 83° and an aperture of f/2.8. The images have a resolution of 9 MP (4000 by 2250 pixels) and were recorded near the beach area of Littoistenjärvi Lake (60.4582 N, 22.3766 E) in Turku, Finland, shown in Fig. 2 (a).
Each photo captures one or more people either swimming or standing in the lake, imaged from different altitudes and angles. Some examples are shown in Fig. 2 (b), (c) and (d). However, the majority of pictures were taken with a gimbal pitch of -90° (top-view images). The dataset contains 2D bounding boxes for two classes: persons and other objects, the latter being used for animals in the water and other floating objects. In addition to the bounding boxes, each image contains information about the GPS position, the altitude relative to the take-off point (just above the water level), and the pitch angle of the camera gimbal (from horizontal images with 0° pitch to top-view images with -90° pitch). The relative altitude ranges from 0 m to 143 m. While the dataset was acquired in good weather conditions and mostly still waters, it includes variable light conditions. This results in different colors for both the water and the people, as can be seen in Fig. 2 (b) and (d). Some of the swimmers use swimming caps of different colors and wear different types of swimming suits.
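For clarity, the per-image information described above can be summarized in a record such as the following sketch; the field names are our own illustration and not the published annotation format.

# Illustrative per-image annotation record (field names are hypothetical;
# the released dataset may use a different schema).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class WaterPersonAnnotation:
    image_path: str
    gps: Tuple[float, float]           # (latitude, longitude) in degrees
    relative_altitude_m: float         # altitude above the take-off point
    gimbal_pitch_deg: float            # 0 = horizontal view, -90 = top view
    # Axis-aligned boxes as (x_min, y_min, x_max, y_max) in pixels.
    person_boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)
    other_boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)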
B. Training and test setup
Training and testing were done with the YOLOv3 real-time
object detection model [6]. The YOLOv3 model pre-trained
with ImageNet [27] was trained again with our dataset using
transfer learning. During training, all but the last three layers were frozen for the first 50 epochs; the network was then unfrozen and trained for another 50 epochs with a batch size of 32 and a learning rate of 0.001.
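This two-stage schedule can be sketched as follows in Keras-style pseudocode; create_yolov3_model, load_dataset and yolo_loss are placeholders for the actual model builder, data pipeline and loss function, not a verbatim excerpt of our training code.

# Sketch of the two-stage transfer-learning schedule (Keras-style pseudocode;
# create_yolov3_model, load_dataset and yolo_loss are placeholders).
from tensorflow.keras.optimizers import Adam

model = create_yolov3_model(num_classes=2, pretrained_weights="backbone.h5")
train_data, val_data = load_dataset("water_person_dataset/")

# Stage 1: freeze all but the last three layers for the first 50 epochs.
for layer in model.layers[:-3]:
    layer.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3), loss=yolo_loss)
model.fit(train_data, validation_data=val_data, epochs=50, batch_size=32)

# Stage 2: unfreeze the whole network and fine-tune for another 50 epochs.
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-3), loss=yolo_loss)
model.fit(train_data, validation_data=val_data, epochs=50, batch_size=32)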
Each image contains between 1 and 50 object instances.
The objects are divided into two classes: ’person’, contain-
ing 2454 instances, and ’something else’, containing 238
instances, mostly birds but also some other objects floating
in the water. All the images were labeled manually, using
bounding boxes with the Labelbox annotation tool [28].
Training and testing were done using 4-fold cross-validation,
randomly splitting the images using a 75/25 train/test split.
We refer to the re-trained model as the task-specific model
hereinafter.
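The cross-validation procedure amounts to the following sketch; train_and_evaluate is a placeholder for the training and testing routine described above, and the dataset path is illustrative.

# Sketch of the 4-fold split over images (75%/25% train/test per fold);
# train_and_evaluate is a placeholder for the routine described above.
import glob
import numpy as np
from sklearn.model_selection import KFold

image_paths = np.array(sorted(glob.glob("water_person_dataset/images/*.jpg")))
fold_scores = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(image_paths):
    fold_scores.append(train_and_evaluate(image_paths[train_idx],
                                          image_paths[test_idx]))
print("mean mAP over folds:", np.mean(fold_scores))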
C. Metrics
Object detection performance was evaluated using the PASCAL VOC challenge metrics [29] as provided by [30]. We calculated the average precision (AP) for both classes separately and the mean average precision (mAP) over both classes using different intersection over union (IoU) thresholds. Performance was compared between the pre-trained YOLOv3 model and the task-specific model obtained with our data through transfer learning. Furthermore, since our objective is to analyze the correlation between human detection performance and altitude, we also analyze how the detection confidence and the ratios of false positives and false negatives change as a function of the altitude.
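For reference, the IoU criterion underlying these metrics is the standard one and can be computed as in the following sketch.

# Standard intersection-over-union between two axis-aligned boxes,
# each given as (x_min, y_min, x_max, y_max) in pixels.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0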
IV. EXPERIMENTAL RESULTS
In this section, we assess the performance of the trained
model as a classifier using the mean average precision for
different IoU thresholds, but also its usability for active-
vision-based control where the input to the algorithm is the
confidence of the model on each of its detections.
Some representative example detections made by the task-
specific model are illustrated in Fig. 3. In Fig. 3a, we observe
how the network is able to pinpoint the location of people in
the image, but the bounding box appears around the turbulent
water rather than around the person itself. However, not all
objects or turbulent areas are detected as people, as other objects are also properly identified (Fig. 3b). In Fig. 3b, we
also observe that people can be located far away when the
gimbal pitch is closer to 0°. Finally, we see that even at high
altitudes, the confidence remains high and people are detected
also when immersed (Fig. 3c).
The performance of the task-specific model compared to
the pre-trained YOLOv3 network is shown in Table I, where
we see that the task-specific model is clearly superior.
TABLE I: mAP scores for different IoU thresholds.

IoU threshold          0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
Task-specific model    0.6985  0.6984  0.6972  0.6954  0.6934  0.6883  0.6780  0.6384  0.5422  0.0000
Pre-trained YOLOv3     0.0547  0.0533  0.0533  0.0528  0.0514  0.0514  0.0514  0.0507  0.0440  0.0000
Fig. 3: Samples of detections made using the task-specific model. (a) Detection of one person (high confidence) and of turbulent water next to another person (lower confidence); altitude: 37 m, pitch: -80°. (b) Detection of other objects, but missing two persons in the distance; altitude: 12 m, pitch: -25°. (c) Successful detection of three people at high altitude, one of them fully immersed in the water (only a portion of the original image is shown); altitude: 86 m, gimbal pitch: -90°.
Fig. 4: Precision × recall curves for the classes 'person' and 'something else' using an IoU threshold of 0.5 with the task-specific model.
In terms of the precision × recall curves, those corresponding to
classes ’person’ and ’something else’ are provided in Fig. 4.
Next, we analyze performance at different altitudes. The
significance of the altitude is, however, relative to the reso-
lution of the camera and its ability to produce clear images.
Fig. 5: Side length of the ground truth bounding boxes, in pixels, as a function of the relative altitude (individual boxes and per-altitude average).
The camera pitch is also important, as illustrated in the previous examples. In order to
provide results that are more generalizable, Fig. 5 shows the
size in pixels of the ground truth bounding boxes.
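The correspondence between altitude and pixel size can also be approximated from the camera parameters. The sketch below assumes a nadir (top-down) view and that the quoted 83° field of view spans the 4000-pixel image width; since the specification does not state whether this is the horizontal or diagonal FOV, the numbers should be treated as rough estimates only.

# Rough estimate of target size in pixels as a function of altitude,
# assuming a nadir view and that the 83 deg FOV spans the 4000 px width
# (an assumption; the spec does not say if the FOV is horizontal or diagonal).
import math

def pixels_per_metre(altitude_m, fov_deg=83.0, image_width_px=4000):
    ground_width_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    return image_width_px / ground_width_m

for alt in (20, 60, 100):
    ppm = pixels_per_metre(alt)
    # A head-and-shoulders target roughly 0.5 m across:
    print(f"{alt:>3} m: {ppm:5.1f} px/m, ~{0.5 * ppm:4.1f} px per person")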
Fig. 6: Confidence of individual detections as a function of the relative UAV altitude, (a) with the IoU criterion and (b) with the DIST criterion. Detections below 100 m show high confidence and are mostly true positives, while above 100 m confidence is lower and the rate of false positives is higher.

Fig. 7: Distributions of the confidence of true positive (TP) and false positive (FP) detections, for the DIST criterion and IoU = 0.1.

Fig. 8: Proportion of false negatives (FN) over true positives (TP) and FN, for the IoU = 0.1, IoU = 0.5, and DIST criteria. This gives an idea of the probability of missing a person.

Fig. 6a shows all the person detections plotted in terms of their confidence against the altitude, using IoU = 0.1 to consider true positives. We have set the IoU to 0.1 because
we are only interested in pointing to the approximate location
of persons but not their exact size and place. For altitudes
under 60 m, over 98.8% of the detections with a confidence
above 0.5 are correct. A clear threshold appears at an altitude
of 90 m. Above 90 m, 83.3% of the detections are correct.
In some of the test images, we have noticed that the model
detects turbulence in the water created by people as persons,
and not the full bodies of the people themselves. Because we
are not interested in analyzing how capable the task-specific model is of generating accurate bounding boxes, but rather in pointing to the approximate location of people at sea, we might also want to consider as correct those detection boxes that are just adjacent to actual people. In Fig. 6b, we have plotted the confidence as a function of the altitude, but now using a distance in pixels of less than 100 between the ground truth and the predicted box (DIST) to consider a detection correct. We now see that all except one of the positive detections with a confidence of over 70% are correct for altitudes up to 100 m. For a confidence above 45%, all detections but one are correct up to an altitude of 70 m. The distributions of the true positives and false positives for each of the two criteria (IoU, DIST) are shown in Fig. 7. There is a clear threshold just under a confidence of 0.6, with almost 75% of true positives having a confidence over 0.6, and almost 75% of false positives having a lower confidence.
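One way to implement the DIST criterion is sketched below, taking the distance between box centres; the text above does not state exactly between which points the 100 px distance is measured, so the centre-to-centre reading here is an assumption.

# One reading of the DIST criterion: a detection counts as correct if its
# box centre lies within 100 px of a ground-truth box centre (the exact
# reference points for the distance are an assumption here).
import math

def box_centre(box):
    x1, y1, x2, y2 = box
    return 0.5 * (x1 + x2), 0.5 * (y1 + y2)

def dist_match(pred_box, gt_boxes, max_dist_px=100.0):
    px, py = box_centre(pred_box)
    return any(math.hypot(px - gx, py - gy) <= max_dist_px
               for gx, gy in map(box_centre, gt_boxes))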
In order to evaluate this model within its context for
SAR missions, we also need to take into account that false
positives do not necessarily have a significant impact on the
search performance, but false negatives do, as they mean
that the UAV misses a person. We have therefore plotted in Fig. 8 the proportion of false negatives over the total number of persons (TP + FN). If we use the pixel distance to consider a detection as correct, then the proportion remains under 10% for all altitudes. With IoU = 0.5, over 50% of the people in the water are undetected; however, we do not consider this an effective way of evaluating detections in this scenario.
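The miss-rate curve of Fig. 8 corresponds to computing FN / (TP + FN) per altitude bin, as in the following sketch; the 10 m bin width is an illustrative choice.

# Proportion of missed persons, FN / (TP + FN), per altitude bin
# (the 10 m bin width is an illustrative choice).
from collections import defaultdict

def miss_rate_by_altitude(samples, bin_width_m=10.0):
    """samples: iterable of (altitude_m, num_true_positives, num_false_negatives)."""
    bins = defaultdict(lambda: [0, 0])             # bin start -> [TP, FN]
    for altitude_m, tp, fn in samples:
        b = int(altitude_m // bin_width_m) * bin_width_m
        bins[b][0] += tp
        bins[b][1] += fn
    return {b: fn / (tp + fn) for b, (tp, fn) in sorted(bins.items())
            if tp + fn > 0}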
V. CONCLUSION
With UAVs increasingly penetrating multiple civil domains
and, among them, search and rescue operations, more com-
plex control mechanisms are required for more autonomous
UAVs. To that end, active perception is one of the most
promising research directions. In UAV search, active vision
can be exploited to optimize the flight plan based on the
confidence of the DL vision algorithms. We have presented
preliminary work that studies the confidence of a re-trained
YOLOv3 model for detecting people in the water for altitudes
ranging from 20 m to 120 m. With a custom dataset, we
have seen a major performance increase with respect to
the pre-trained YOLOv3 network. Our results show a clear
correlation between the altitude and the confidence of the
detections and between the confidence and the correctness of
the detections. When considering as true positives detections
near actual people (e.g., over water turbulence created by
people), we have seen that the proportion of false negatives
remains low even for high altitudes, and the proportion of
false positives over true positives drops significantly for all
predictions with a confidence over 60%. Finally, we have
observed a clear altitude threshold at around 100 m after
which confidence and accuracy drop.
The results presented in this paper will serve as the starting
point towards the design of active-vision-based search with
UAVs in marine SAR operations. In future works, we will
also incorporate the camera pitch into the analysis. The
dataset will be made publicly available with further additions.
ACKNOWLEDGEMENTS
This work was supported by the Academy of Finland’s
AutoSOS project with grant number 328755.
REFERENCES
[1] H. Shakhatreh, A. H. Sawalmeh, A. Al-Fuqaha, Z. Dou, E. Almaita,
I. Khalil, N. S. Othman, A. Khreishah, and M. Guizani, “Unmanned
aerial vehicles (uavs): A survey on civil applications and key research
challenges,” IEEE Access, vol. 7, pp. 48572–48634, 2019.
[2] J. Peña Queralta, J. Taipalmaa, B. C. Pullinen, V. K. Sarker, T. N. Gia, H. Tenhunen, M. Gabbouj, J. Raitoharju, and T. Westerlund, “Collaborative multi-robot search and rescue: Coordination and perception,” arXiv preprint arXiv:2008.12610 [cs.RO], 2020.
[3] S. Grogan, R. Pellerin, and M. Gamache, “The use of unmanned
aerial vehicles and drones in search and rescue operations–a survey,”
Proceedings of the PROLOG, 2018.
[4] W. Roberts, K. Griendling, A. Gray, and D. Mavris, “Unmanned
vehicle collaboration research environment for maritime search and
rescue,” in 30th Congress of the International Council of the Aero-
nautical Sciences. International Council of the Aeronautical Sciences
(ICAS) Bonn, Germany, 2016.
[5] Royal Spanish Federation of First Aid and Rescue, “National Drown-
ings Report - Informe Nacional de Ahogamientos (INA),” 2019.
[6] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv, 2018.
[7] S.-J. Hong, Y. Han, S.-Y. Kim, A.-Y. Lee, and G. Kim, “Application of deep-learning methods to bird detection using unmanned aerial vehicle imagery,” Sensors, vol. 19, no. 7, p. 1651, 2019.
[8] R. Tallamraju, E. Price, R. Ludwig, K. Karlapalem, H. H. Bülthoff, M. J. Black, and A. Ahmad, “Active perception based formation control for multiple aerial vehicles,” IEEE Robotics and Automation Letters, vol. 4, no. 4, pp. 4491–4498, 2019.
[9] D. Gallos and F. Ferrie, “Active vision in the era of convolutional
neural networks,” in 2019 16th Conference on Computer and Robot
Vision (CRV), 2019, pp. 81–88.
[10] J. Peña Queralta, J. Raitoharju, T. N. Gia, N. Passalis, and T. Westerlund, “AutoSOS: Towards multi-UAV systems supporting maritime search and rescue with lightweight AI and edge computing,” arXiv preprint arXiv:2005.03409, 2020.
[11] A. Matos, A. Martins, A. Dias, B. Ferreira, J. M. Almeida, H. Ferreira,
G. Amaral, A. Figueiredo, R. Almeida, and F. Silva, “Multiple robot
operations for maritime search and rescue in eurathlon 2015 competi-
tion,” in OCEANS 2016-Shanghai. IEEE, 2016, pp. 1–7.
[12] J. Güldenring, L. Koring, P. Gorczak, and C. Wietfeld, “Heterogeneous multilink aggregation for reliable UAV communication in maritime search and rescue missions,” in 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). IEEE, 2019, pp. 215–220.
[13] R. Konrad, D. Serrano, and P. Strupler, “Unmanned aerial systems,”
Search and Rescue Robotics—From Theory to Practice, pp. 37–52,
2017.
[14] H. Surmann, R. Worst, T. Buschmann, A. Leinweber, A. Schmitz,
G. Senkowski, and N. Goddemeier, “Integration of uavs in urban
search and rescue missions,” in 2019 IEEE International Symposium
on Safety, Security, and Rescue Robotics (SSRR). IEEE, 2019, pp.
203–209.
[15] T. Giitsidis, E. G. Karakasis, A. Gasteratos, and G. C. Sirakoulis,
“Human and fire detection from high altitude uav images,” in 2015
23rd Euromicro International Conference on Parallel, Distributed, and
Network-Based Processing. IEEE, 2015, pp. 309–315.
[16] S. Yong and Y. Yeong, “Human object detection in forest with deep
learning based on drone’s vision,” in 2018 4th International Conference
on Computer and Information Sciences (ICCOINS). IEEE, 2018, pp.
1–5.
[17] “Autonomous human detection system mounted on a drone,” 2019
International Conference on Wireless Communications, Signal Pro-
cessing and Networking, WiSPNET 2019, pp. 335–338, 2019.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2014, pp. 580–587.
[19] R. Girshick, “Fast r-cnn,” in International Conference on Computer
Vision (ICCV), 2015.
[20] R. Bajcsy, Y. Aloimonos, and J. Tsotsos, “Revisiting active perception,” Autonomous Robots, vol. 42, pp. 177–196, 2018.
[21] P. Ammirato, P. Poirson, E. Park, J. Košecká, and A. C. Berg, “A dataset for developing and benchmarking active vision,” in 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 1378–1385.
[22] J. Sandino, F. Vanegas, F. González, and F. Maire, “Autonomous UAV navigation for active perception of targets in uncertain and cluttered environments,” in 2020 IEEE Aerospace Conference, 2020.
[23] D. Falanga, E. Mueggler, M. Faessler, and D. Scaramuzza, “Aggressive
quadrotor flight through narrow gaps with onboard sensing and com-
puting using active vision,” in 2017 IEEE International Conference on
Robotics and Automation (ICRA), 2017, pp. 5774–5781.
[24] M. Chessa, S. Murgia, L. Nardelli, S. P. Sabatini, and F. Solari,
“Bio-inspired active vision for obstacle avoidance,” in Conference on
Computer Graphics Theory and Applications, 2014, pp. 1–8.
[25] F. Zhong, P. Sun, W. Luo, T. Yan, and Y. Wang, “AD-VAT: An asymmetric dueling mechanism for learning visual active tracking,” in International Conference on Learning Representations, 2019.
[26] E. Lygouras, N. Santavas, A. Taitzoglou, K. Tarchanidis, A. Mitropou-
los, and A. Gasteratos, “Unsupervised human detection with an em-
bedded vision system on a fully autonomous uav for search and rescue
operations,” Sensors, vol. 19, no. 16, p. 3542, 2019.
[27] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei,
“ImageNet: A Large-Scale Hierarchical Image Database,” in CVPR09,
2009.
[28] Labelbox, “Labelbox,” 2019, [Online]. Available: https://labelbox.com.
[29] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and
A. Zisserman, “The pascal visual object classes (voc) challenge,”
International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338,
Jun. 2010.
[30] R. Padilla, S. L. Netto, and E. A. B. da Silva, “A survey on perfor-
mance metrics for object-detection algorithms,” in 2020 International
Conference on Systems, Signals and Image Processing (IWSSIP), 2020,
pp. 237–242.