ArticlePDF Available

Victim Localization in USAR Scenario Exploiting Multi-Layer Mapping Structure

  • Dubai Futue Labs

Abstract and Figures

Urban search and rescue missions require rapid intervention to locate victims and survivors in the affected environments. To facilitate this activity, Unmanned Aerial Vehicles (UAVs) have been recently used to explore the environment and locate possible victims. In this paper, a UAV equipped with multiple complementary sensors is used to detect the presence of a human in an unknown environment. A novel human localization approach in unknown environments is proposed that merges information gathered from deep-learning-based human detection, wireless signal mapping, and thermal signature mapping to build an accurate global human location map. A next-best-view (NBV) approach with a proposed multi-objective utility function is used to iteratively evaluate the map to locate the presence of humans rapidly. Results demonstrate that the proposed strategy outperforms other methods in several performance measures such as the number of iterations, entropy reduction, and traveled distance.
Content may be subject to copyright.
remote sensing
Victim Localization in USAR Scenario Exploiting
Multi-Layer Mapping Structure
Abdulrahman Goian 1,†, Reem Ashour 1, *,†, Ubaid Ahmad 2, Tarek Taha 3, Nawaf Almoosa 2
and Lakmal Seneviratne 1
1Khalifa University Center for Autonomous Robotic Systems (KUCARS), Khalifa University of Science and
Technology (KU), P.O. Box 127788, Abu Dhabi, UAE; (A.G.); (L.S.)
2Emirates British Telecommunications Innovation Center (EBTIC), Khalifa University of Science and
Technology (KU), P.O. Box 127788, Abu Dhabi, UAE; (U.A.); (N.A.)
3Algorythma’s Autonomous Aerial Lab, P.O. Box 112230, Abu Dhabi, UAE;
These authors contributed equally to this work.
Received: 27 August 2019; Accepted: 11 November 2019; Published: 19 November 2019
Urban search and rescue missions require rapid intervention to locate victims and survivors
in the affected environments. To facilitate this activity, Unmanned Aerial Vehicles (UAVs) have been
recently used to explore the environment and locate possible victims. In this paper, a UAV equipped
with multiple complementary sensors is used to detect the presence of a human in an unknown
environment. A novel human localization approach in unknown environments is proposed that
merges information gathered from deep-learning-based human detection, wireless signal mapping,
and thermal signature mapping to build an accurate global human location map. A next-best-view
(NBV) approach with a proposed multi-objective utility function is used to iteratively evaluate the
map to locate the presence of humans rapidly. Results demonstrate that the proposed strategy
outperforms other methods in several performance measures such as the number of iterations,
entropy reduction, and traveled distance.
multi-layer mapping; next-best-view; search and rescue; victim localization; sensor fusion
1. Introduction
The current advanced development of aerial robotics research has enabled their deployment in a
variety of tasks such as urban search and rescue (USAR) [
], infrastructure inspection [
], 2D/3D
reconstruction for building using point cloud generated from UAVs Images [
] or 3D reconstruction of
historical cities by using the laser scanning and digital photogrammetry [
], and mining [
]. Notably,
the intervention of the UAVs in the USAR environment is critical in the sense that they require
rapid response to locate all victims and survivors in this environment. Many sensors, such as RGB
color cameras [
] and thermal cameras [
], were used to localize victims in the literature. Single
sensors methods cannot effectively locate victims due to the harsh nature of USAR environments.
Some sensors cannot operate under certain conditions such as smoke, dust, radiation, and gases.
For instance, vision-based systems are incapable of locating a victim under dark lighting condition or
in the presence of occlusions. Additionally, thermal-based systems can be deceived by excessive heat
sources such as fires. Consequently, recent studies have been conducted to reduce victim localization
time by using multiple sensors, thus improving the detection and localization performance in such
unfavorable environment. The work presented in this paper tackles considers each sensor as a victim
Remote Sens. 2019,11, 2704; doi:10.3390/rs11222704
Remote Sens. 2019,11, 2704 2 of 25
detector module. These detector modules are independent of one another, and they are used to
generate victim-location maps. A probabilistic sensor fusion technique is performed on the generated
maps to create a new merged map. Consequently, this map is used by the robot for environment
exploration to localize the presence of victims.
In the current state of the art, a variety of contributions have been made to accomplish
autonomous exploration. Most autonomous exploration approaches use frontier exploration [
and, information gain theory, e.g., next-best-view(NBV) [
]. Different types of maps can be constructed
to project different information such as topological occupancy maps, metric maps, and semantic maps.
Human detection was widely studied by training Support Vector Machine (SVM) classifier to detect
human presence from 2D images [
]. Recently, deep-learning algorithms are applied for object
and human detection [
]. In a NBV exploration process, the UAV navigates in the environment using
a utility function, also called a reward function, which selects the next best exploration activities that
minimize the effort required to localize victims.
In this work, we present a multi-sensor-based NBV system to locate victims in a simulated urban
search and rescue environment. The proposed multi-sensor system uses different sensor sources,
namely vision, thermal, and wireless, to generate victim-location maps which are then fused into a
merged map using probabilistic Bayesian framework.
Moreover, a multi-objective utility function is proposed to speed up the search mission of locating
victims. The proposed multi-objective utility function evaluates viewpoints based on three desired
objectives namely exploration, victim detection and traveled distance. Finally, an adaptive grid
sampling algorithm (AGSA) is developed to resolve the local minimum issue where the robot tends
to move in a repeated pattern [
], a problem commonly occurs in the Regular Grid Sampling
Approach (RGSA).
In particular, our contributions are as follows:
A multi-sensor fusion approach that uses vision, thermal, and wireless sensors to generate a
probabilistic 2D victim-location maps.
A multi-objective utility function that uses the fused 2D victim-location map for viewpoints
evaluation task. The evaluation process targets three desired objectives, specifically exploration,
victim detection, and traveled distance.
An adaptive grid sampling algorithm (AGSA) to resolve the local minimum issue occurs in the
Regular Grid Sampling Approach (RGSA).
The rest of the paper is organized as follows; Section 2outlines the related work. The detailed
proposed approach is presented in Section 3. Section 4presents the simulation process and the
experimental results. Finally, the conclusion and future works are given in Section 5.
2. Related Work
To autonomously locate victims inside an urban search and rescue environments with a robotic
system, multiple prerequisite components should exist on the robotic system to perform its task
effectively. In this section, the various robotic components that constitutes an autonomous search and
rescue robot are presented, and related work to this component will be described.
2.1. Victim Detection
Victim detection is the most critical task in search and rescue missions where victims should be
identified as fast as possible for the first responders to intervene and rescue them. Throughout the
literature, the human detection problem has been investigated extensively from different perspectives
such as thermal [
] and visual perspectives [
]. Various sensors enabled different strategies to
detect humans such as the RGB cameras and thermal cameras.
Remote Sens. 2019,11, 2704 3 of 25
From thermal detection perspective, authors in [
] used a background subtraction along with
trained Support Vector Machine (SVM) classifier to detect human presence in images streamed from a
thermal camera. Similarly, authors in [
] used a combination of Histogram of Oriented Gradients
(HOG) and CENsus TRansform hISTogram (CENTRIST) features to train the SVM classifier for
thermal detection.
However, from the visual detection scope, the detection is extensively performed using machine
learning methods and currently through deep-learning methods. In the context of machine learning
approaches, authors in [
] used SVM classifier along with HOG features to locate victims with different
poses. They focused on diversifying their dataset with different human poses and lighting condition
to boost the detection accuracy. Additionally, in [
] a rotational invariant F-HOG classifier was used
to detect a lying-pose human in RGB image. They showed that the F-HOG is more discriminative and
performs better compared to the traditional HOG feature descriptor.
Machine learning detection methods suffer from several limitations resulting in lower
accuracy/precision performance. First, machine learning approaches consider training a set of
images with a nearly uniform background, which is not reliable when performing it in a real-world
disaster area. Additionally, machine learning algorithms represent human with pre-defined features.
However, this assumption may not be capable of generalizing human features, which lead to detection
error. Therefore, deep-learning models are used to achieve higher accurate detection without feature
extraction process.
However, more accurate approaches to human detection are achieved using deep learning [
Unlike conventional methods, deep-learning models can perform higher detection accuracy by finding
the most efficient features patterns existed in trained visual input [
]. An accurate approaches
to human detection are achieved using deep learning is found in [
]. In [
] author succeeds
in running the deep-learning model called Fast R-CNN at 5 fps with reasonable detection accuracy.
A faster detection model was introduced in [
] where the authors designed a real-time deep-learning
network called Single-Shot multi-box detector (SSD). The model involves multiple convolutional
layers that perform the same detection but at different aspect ratio leading to higher overall detection
accuracy. The achieved detection speed is 59 FPS on Nvidia Titan X.
However, one main drawback in deep-learning classifiers is the accuracy vs. speed trade-off.
In other words, accurate CNN model tends to have more layers with a larger size to learn deep
features, which consequently requires a longer time to evaluate the input images streams in real-time
requirements. For examples, authors in [
] proposed a CNN structures containing 7 layers,
5 convolutional layers and 2 full-connected layers. The model is pre-trained on the ImageNet database,
and then the last two fully connected layers in the CNN were fine-tuned to provide the model with
distinguishable features for the victims. The main disadvantage of the system is that it possesses a
high computational load and need a separate server to run the detection model. Recent work was
designed to address and improve the accuracy vs. speed trade-off.
Another strategy that is used to localize humans accurately is the sensor fusion. Sensor fusion
allows taking the advantages of each sensor while compensating for their limitations. In [
authors performed sensor fusion to identify victim using rescue robots operating in cluttered USAR
environment. The sensor fusion was performed in the feature level where initially an RGB-D image
is captured from the environment. Then, a depth region segmentation is performed on the image
to get sub-region segmentation. After that, skin feature, body part association and the geometric
feature are extracted from the image. Finally, the extracted features are used to train a SVM classier
to identify the victim a captured RGB-D image. In [
], the thermal image is used to obtain rough
region of interest (ROI) coordinates of human location then the estimated ROI is fused with the depth
image to obtain the actual human location along with occlusion handling. In [
], the depth–color joint
features are extracted from the RGB-D images which used to train a human detector whose inputs are
multi-channels namely LUV, HOG and depth channel. Unlike conventional methods, deep learning
Remote Sens. 2019,11, 2704 4 of 25
can find the most efficient features patterns that existed in trained visual input, in contrast to the
conventional methods, which used pre-defined features leading to higher detection accuracy.
In this work, three different sensors are used to capture environment data, RGB Camera, thermal
camera, and wireless sensor. Each sensor data is going to be used to generate a 2D map called victim
maps. After that, the three maps are fused into one merged map that is going to be used for the
exploration process. Our method is different to the state-of-the-art method because of adding the
wireless sensor. The wireless sensor is used to indicate the presence of a victim from the mobile phone.
Using a wireless sensor with the visual and thermal sensors can help in locating the victim accurately.
2.2. Mapping
While attempting to locate victims in an USAR environment, exploration techniques require the
presence of a map that captures the information gathered by the sensors during the exploration method.
Among the methods which implemented a human detector, many of them used a mapping structure
to stores their detection results. The most popular mapping structure is the 2D and 3D occupancy grid
where the work-space is discretized into 2D squares or 3D cubic volumes known as pixels or voxels,
respectively and they are used to store the probability that a human is detected.
In occupancy grid maps, a
is a cell with index
and this occupancy grid is simply a finite
collection of all the cells modeled as
. . .
. Where
is the total number of cells.
In a 2D occupancy grid representing victim presence, every cell contains victim detection probability.
Clearly, the higher the probability, the higher indication of human presence. The cell that holds a 0.5
probability indicated as an unknown cell where it is equally likely for the victim to be presented or not.
Figure 1illustrates an occupancy cell obtained from an RGB-D camera where ray-tracing is performed
to update the map based on the camera visible field of view (FOV).
To update the map, many methods found in the literature used a Bayesian approach [
They either used the Bayes rule directly or indirectly through one of its derived methods such as
Kalman Filter, Particle Filter, etc. The concept behind the Bayes rule is that given some characteristic
information and level of confidence about it, it is used to determine the probability a human occupies
a given location. Some examples of work which used Bayes rule includes [26,27].
To capture human/victim detection, each cell in the occupancy grid map can hold one of the two
distinct states
S={human,Non human}
. The probability that a human is observed will be denoted as
and probability of non-human is donated as
. Practically, determining the probability
at specific cell requires the last state on the cell, the observation from the human detection module and
the robot motion. Such information is taken into account using the Bayes filter [28].
In this work, three single 2D maps are generated from three different sensors, RGB camera, thermal
camera, and wireless sensor. All maps demonstrate the presence of humans and are updated using a
Bayesian approach. After that, a merged map is generated by combining the weighted re-scaled maps.
Remote Sens. 2019,11, 2704 5 of 25
Figure 1.
) Shows the top-view of the scene containing people and an object scanned by an RGB-D
camera. (
) Shows the obtained occupancy grid of the scene wherein the higher the probability,
the darker the cell is, and vice versa.
2.3. Exploration
Exploration of unknown environment has been achieved using various methods throughout the
literature. Frontier-based [
] and Next-Best-View (NBV) [
] approaches are the most commonly
used approaches. The frontier-based methods aim to explore the frontiers (boundaries between free
and unexplored space) of a given area. However, the NBV approaches are used to determine the
next viewpoint or candidate position to visit to increase the chance of achieving the objective defined
in the utility function. The NBV approach selects the next pose from the list of possible candidates
(viewpoints) that score the highest according to the criteria defined in the utility function. The criteria
may include: the amount of information that could be potentially obtained by the candidate viewpoint;
the execution cost (i.e., energy, path) of the next action [
]; or the distance to be traveled to reach the
next best view [
]. Consequently, a proper utility function and information gain prediction have a
vital role in NBV selection.
In this research, we use the NBV as it is directly related to the work presented in this paper.
The main elements of the NBV approach are viewpoint sampling, viewpoint evaluation, and a
termination condition.
2.3.1. Viewpoint Sampling
To determine the next viewpoint to visit in order to maximize the chance of detecting a victim,
the common methods sample the space into candidate viewpoints, and evaluate these viewpoints in
order to determine which one can provide the best chance of detecting a victim. Different sampling
strategies are commonly used in the literature such as (1) Regular Grid Sampling [
], (2) Frontier
Sampling [32], and (3) Rapidly exploring Random Trees [33].
In the regular grid sampling, the state space around the robot is discretized based on the number
of degrees of freedom (n-DOF) of the robot. Precisely, each degree of the robot is used to divide state
space and the possible poses that an n-DOF robot can is visualized as points lying in a discretized
n-dimensional space. In the case of 3- dimensional space, the sampled point will look like a crystal
lattice; hence the name “state lattice” [31].
In frontier sampling, moving the robot to the boundaries between known and unknown space can
maximize the chance of obtaining new information. This approach allows the robot to see beyond the
explored map and hence expand the map. Typically, this sampling procedure is done on an occupancy
grid map, as it is easy to allocate frontiers where the free and unexplored space presented clearly [
Remote Sens. 2019,11, 2704 6 of 25
A simple form of frontier selection is to identify the frontier cell which is closest to the current
robot location. Then, the robot moves to that frontier and captures data from its sensor to expand the
exploration map. The robot repeats this process until no frontier is left [32,34].
Rapidly exploring Random Trees (RRTs) are another method commonly used to sample in a
receding horizon manner [
]. By sampling possible future viewpoints in a geometric random tree,
the defined utility function will enable the exploration of unknown volume environment. During this
process, a series of random points are generated. Then, these points are connected in a tree-like
expanding through explored space. Each branch represents a set of random chains and the branch
which maps the unknown space is selected. Each branch of the samples RRT is then evaluated using
the defined utility function, and the branch with the best possible exploration outcome is used to select
the best new exploration position to travel to.
2.3.2. Viewpoint Evaluation
A utility function is usually used to select the best viewpoint from the list of available candidates
generated during the sampling process. Information metric used to evaluate the viewpoints is the
information gain [35]. The entropy (H) for a specific cell xin the occupancy map is given as
Hi=Pilog (Pi) + (1Pi)log (1Pi)(1)
From Equation (1), the entropy
is a measure of uncertainty associated with the cell probability.
is maxed when the all outcomes are equally likely while it is zero when the trial outcomes are certain.
The corresponding view entropy for a specific view denoted by subscript
is the sum on the
individual cell’s entropy within the sensor FOV, and defined as:
(Pilog (Pi)+(1Pi)log (1Pi)) (2)
is a normalization factor that represents the number of visible cells in the sampled FOV [
An extension of the information gain is the
weighted information gain
with distance penalty [
given by:
(Pilog (Pi)+(1Pi)log (1Pi)) (3)
where λis the penalization factor [12].
Another information gain utility function is the
max probability
. In this method, the viewpoints
that contain cells with higher maximum probability are preferred compared to other viewpoints with
cells with less maximum probability. This method works well when a victim is found as the next
evaluated viewpoints will cover the victim within their FOV aiming to higher or lower the victim
detection probability. The evaluation of the max probability gain of a viewpoint [2] is given by:
iPiwhere i ={1, ..., N}(4)
2.3.3. Termination Conditions
Autonomous exploration tasks terminate once a pre-defined condition has been achieved. There
are several type of termination conditions depending on the scenario such as whether a victim was
found [
], whether a maximum number of iterations has been reached [
], or whether the information
gain reached a max [
]. Additionally, for frontier-based methods, the exploration terminates when
no more frontiers generated [32].
Our work is different from the state-of-the-art approaches in two elements of the NBV, viewpoint
sampling, and viewpoint evaluation. The proposed Adaptive Grid Sampling Approach overcomes
the limitation of one of the existing state-of-the-art approaches, which is the Regular Grid Sampling
Remote Sens. 2019,11, 2704 7 of 25
Approach [
]. Also, we propose a new viewpoint evaluation (utility function) method to detect
a human as fast as possible and reduce the traveled distance. Our viewpoint evaluation function
considers occupancy, distance, and victim detection using both the proposed merged map and the
proposed sampling approach. The proposed utility function is compared with the state-of-the-art
utility functions presented in Equations (3) and (4).
3. Proposed Approach
In this work, a new approach for victim localization, depicted in Figure 2, is presented.
The proposed victim localization method consists of two main processes: the victim detection/mapping
and the NBV exploration processes. In the first process, victim detection/mapping, the victim is
detected and localized in multi-layer maps that are generated by fusing vision, thermal, and wireless
data. Victim detection is accomplished by applying a detection algorithm that estimates human
location with uncertainty. The victim location is stored in an occupancy grid map, which is recursively
updated to increase the confidence in locating the victim. In the second process, exploration, the NBV
approach is used, and a new utility function is proposed. The NBV approach includes viewpoint
sampling, viewpoint evaluation, navigation, and termination conditions.
The novelty of our solution is on the integration of the wireless signal with other various classical
sensors to assist in locating victims. This presents a unique solution along with the multi-sensor-based
mapping approach and the multi-objective exploration strategy. Furthermore, deep-learning-based
human detection is used to detect the presence of a human from a 2D image. This information is
then used with a synchronized 3D point cloud to localize the human accurately. Moreover, unlike
regular grid sampling approaches, our proposed adaptive viewpoint sampling method solves the
local minimum problem. This approach enables rapid exploration and victims’ detection in a USAR
environment. Finally, the performance of the proposed approach was compared with other approaches
available in the literature, described in Section 4. Results showed that our approach performs better in
various metrics such as the number of iterations, traveled distance, and entropy reduction.
Figure 2. Flowchart of the System Model.
3.1. Multi-Sensor Approach for Victim Localization and Mapping
3.1.1. Vision-Based Victim Localization
The procedure for the vision-based victim localization is summarized in Algorithm 1. It involves
two main stages: victim detection, where the victim is found in the image frame; and mapping which
register the detection confidences in a 2D occupancy grid map.
Victim Detection:
In this work, Single-Shot Multi-box detector (SSD) [
] structure was adapted
in our classifier, trained it with visual object class challenge (VOC2007) dataset [
]. The SSD
structure is depicted in Figure 3. The input to the detection module is a 2D RGB image and the
desired output is a box overlapping the detected human with a probability reflecting the detection
confidence. The victim detection aims to find a human in a received RGB image.
Remote Sens. 2019,11, 2704 8 of 25
Figure 3. System Model of SSD detector. Taken from [21].
Victim Mapping:
Based on the analysis of the SSD detector, the minimum size of the detected box
in the training-set is 7
10 pixels. This box size corresponds to 0.6
0.8 in meter. To estimate the
distance between the detected human and the camera [
], the ratio
of the box B to the image is
along the y-axis is given as
yres =10
300 =
0.033 where
is re-calculated for the tangent of the
half-angle under which you see the object which given as
rp=rp·tan(v f ov
2) = r·tan(60
2) =
The distance
to the camera is estimated as
20.019 =
. The victim will appropriately
be located if its distance separation from the camera is within the camera depth operational range.
Algorithm 1 Vision-Based Victim Localization Scheme
• I — Input RGB Image
• P — Point Cloud synchronized with the RGB Image
CameraFOV — Camera Field of View
• MC— Current Occupancy Map for the Vision Detection
• MU— Updated Occupancy Map for the Vision Detection
1: L ← .initialize Las empty vector to hold victim locations
2: {B(C),k} ← DetectHuman(I)
is a matrix containing Box coordinates,
C= [c1
represents the box coordinates,
and kis a vector containing the human corresponding probabilities
3: for i=1:length(k)do
4: if ki>threshold then
5: PeExtractPointCloud(B(c))
6: CEuclideanClusterExtraction(Pe)
7: if C is empty then
8: continue
9: end if
10: sort {C}in ascending order
11: Cmajor C0
12: PFindCentroid(Cm ajor )
13: P2Dproject Pin x-y plane
14: L ← A ppend(L,P2D).append P2Dto the list of victim locations L
15: end if
16: end for
17: for all cells c∈ MC&cCameraFOV do
18: for all htdo
19: bel(ht)c
20: bel(ht) =
21: bel(ht) = η·p(dt|ht)·bel(ht)
22: end for
23: cbel(ht)
24: MUUpdateMap(MC,c)
25: end for
Remote Sens. 2019,11, 2704 9 of 25
3.1.2. Thermal-Based Victim Localization
The procedure for thermal-based victim localization summarized in Algorithm 2involves two
stages: victim detection, where the victim is found in the thermal image based on heat signature;
and thermal occupancy mapping; where detection confidences obtained from the thermal detector
is stored.
Victim Detection:
Thermal detection is used to detect human heat signature presented using the
infrared light which is part of the electromagnetic (EM) spectrum. This approach is useful when
the human cannot be detected in RGB images, especially, in dark lighting condition. The method
adopted here is a simple blob detector which locates blobs in a thermal image for a temperature
range of 34
to 38
(normal human temperature). The procedures for thermal detection is presented
in Figure 4which is composed of three stages. In the first stage, the thermal images are converted
into mono-image. Then, two thresholds,
Tmin =34
Tmax =38
are applied to the mono-image
to get a binary image that correspond
Tmin <T<Tmax
. After that, contouring is done to extract
regions which represented the human thermal location. The blob detector was implemented
using OpenCV where the minimum blob area is set to 20 pixels and a minimum adjacent distance
between blob is set to 25 pixels. A human face was detected as shown in Figure 4. A full-human
body can also be detected using the thermal detector by relaxing the choice of Tmin and Tmax.
Victim Mapping:
After a box is resolved on the human thermal image, the center of the box
calculated in the image frame give as
Bc= [(cx,max cx,min)/2, (cy,max cy,min)/2]T(5)
Then a ray is broadcast from the thermal camera center through the obtained
After identifying the corresponding 3D ray in the world frame, orthogonal projection is deployed
where the
is removed. The final location is stored as
Pproj = [Px
Figure 4. Demonstration of thermal detection using the Optris PI200.
Remote Sens. 2019,11, 2704 10 of 25
The thermal map is updated when using 2D rays by assigning
to 0.6 for the cells
corresponding to the 2-D ray and 0.4 for the other cells within the camera FOV. Then
is obtained by finding the belief distribution
bel (ht)
at time
using the previous
(ht) = ηP(Dt|ht)¯
bel(ht) = RP(ht|ut
. Both control
and the
distribution of the state
are used to determine the distribution of the current state
. This step is
known as the prediction. When the state space is finite, the integral in
results in a finite sum.
used here for normalization, to make sure that the resulting product is a probability function
whose integral is equal to 1. The map resolution is set to 0.2 m as the thermal-based victim localization
is a bearing-only localization and required multiple ray generations of different poses to sufficiently
locate a victim as shown in Figure 5.
Algorithm 2 Thermal-Based Victim Localization Scheme
• I — Input Thermal Image
CameraFOV — Camera Field of View
• MC— Current Occupancy Map for the Thermal Detection
• MU— Updated Occupancy Map for the Thermal Detection
1: R ← .initialize Ras empty vector to hold the rays
2: Imono ConvertToMono(I)
3: for all pixels p∈ Imono do
4: if Tmin <p<Tma x then
5: Ith(p) = 1
6: else
7: Ith(p) = 0
8: end if
9: end for
10: B(C)BlobDetection(Ithreshol d)
is a matrix containing detected Box coordinates in
C= [c1
represents the
box coordinates
11: if B(c)is not empty then
12: i=0
13: for all boxes cin B do
generate 2-D line ,
, that pass through the center of box
and terminate at the end of
15: R ← A ppend(R,L).append Lto the list of rays R
16: end for
17: end if
18: for all cells cMC&cCameraFOV do
19: for all htdo
20: bel(ht)c
21: bel(ht) =
22: bel(ht) = η·p(dt|ht)·bel(ht)
23: end for
24: cbel(ht)
25: MUUpdateMap(MC,c)
26: end for
Remote Sens. 2019,11, 2704 11 of 25
Figure 5. Thermal map update for three different robot poses.
3.1.3. Wireless-Based Victim Localization
In the literature, many various techniques are used to find the distance or the angle to localize
an unknown static node which in this work is assumed to be the cell-phone of the victim. One of the
popular methods for the ranging techniques is the Received Signal Strength (RSS) which provides a
distance indication to the unknown node. The robot will act as a moving anchor node and captures
multiples RSS readings with the aim of locating the victim. The proposed system assumes the
transmission power of victim wireless device is fixed and known. This section explains the log-normal
channel model which adopted in the wireless detection module. Also, the updating procedure for the
captured measurement is explained to speed up the victim search.
Victim Detection:
A victim can be detected wirelessly by monitoring a given transmitted signal
from the victim phone. In this configuration, the victim phone can be treated as a transmitter
while the wireless receiver can be placed on a robot. If no obstacle between the transmitter and
receiver is found, the free-space propagation model is used to predict the Received Signal Strength
in the line-of-sight (LOS) [
]. The RSS equation is derived from Friis Transmission formula [
(4π×dij )2(6)
is the Received Signal Strength from receiver node
at node transmitter node
at time
is the transmitted power,
is the transmitter gain,
is the receiver gain,
is the distance
from sensor node
at node
, and
is the wavelength of the signal. From Equation (6), the received
attenuates exponentially with the distance
]. The free-space path loss
derived from the equation above by 10 log the ratio of the transmitted power
to the received
and setting
=1 because, in most of the cases, the used antennas are isotropic
Remote Sens. 2019,11, 2704 12 of 25
antennas, which radiate equally in all direction, giving constant radiation pattern [
]. Thus,
the formula is the following:
PLF(dij )[dB] = 10 log Pt
Pr=20 log 4π×dij
When using the free-space model, the average received signal decreases with the distance between
the transmitter and receiver
in all the other real environments in a logarithmic way. Therefore,
the path-loss model generalized form can be obtained by changing the free-space path loss with
the path-loss exponent
depending on the environment, which is known as the log-distance
path-loss model which will result in the following formula [41]:
PLLD(dij )[dB] = PLF(d0) + 10ηlog dij
is the reference distance at which the path loss inherits the characteristics of free space
in (7) [
]. Every path between the transmitter and the receiver has different path loss since the
environment characteristics changes as the location of the receiver changes. Moreover, the signal
may not penetrate in the same way. For that reason, more realistic modeling of the transmission is
assumed which is the log-normal modeling.
PL(dij)[dB] = PLF(d0) + 10ηlog dij
is the path loss at distance
from the transmitter,
the path-loss model at the reference distance
which is constant.
is Gaussian random variable
with a zero mean and standard deviation σ[43].
Due to the presence of noise in the log-normal model, the path loss
will be different even
the same location with distance
. To obtain a relatively stable reading,
is recording for
samples ,then, an averaging process is done to get the measured
. The victim can be set to
be identified wireless if the detected distance is less than a specific distance threshold
. This is
because RSS reading is high when the transmitter is closer to the receiver leading to a less noise
variance. Hence, from a log-normal model, it is sufficient to state that the victim is found in the
measured distance dij satisfy the condition di j <dt
The same idea is used in weighted least-square approach where high weights are given for low
distances leading to a less RMSE in the localized node when compared to the conventional
lease-square approach which assumes constant noise variance across all measured distances [
Victim Mapping:
When RSS is used to locate an unknown node, a single measured distance is
not sufficient because the unknown node can be anywhere over a circle with radius equal to the
distance. That can be solved using trilateration as shown in Figure 6. In 2D space, trilateration
requires at least three-distance measurements from anchor nodes. The location of the unknown
node is the intersection of the three circles as show in Figure 6[45].
Using trilateration can lead to a problem in case all the measured distances are large (with high
noise variance), which can lead to a false located position. To alleviate this problem, the use of
occupancy grid is proposed when updating the map. The measured distance is compared based
on the criteria
dij <dt
. If this criteria it met, the measurement is trusted, and the map is updated
with victim probability within a circle of radius equal to
. Otherwise, the measured distance is
discarded. The updating process is shown in Figure 7.
Remote Sens. 2019,11, 2704 13 of 25
Figure 6. Trilateration
) Wireless map obtained at
first robot pose.
) Wireless map obtained at
second robot pose
) Wireless map obtained at
third robot pose
Figure 7. Wireless map update for three different robot poses.
3.1.4. Multi-Sensor Occupancy Grid Map Merging
In this work, three different victim-location maps are generated using three different sensors.
The probability of the victim’s presence in each map is obtained by the Bayesian-probabilistic
framework independently. These maps are then fused in a Bayesian-probabilistic framework to
obtain a merged map that indicates human presence probability. Following the Bayesian approach
in generating the multi-sensor occupancy grid may introduce undesired errors. For example, some
victim detectors may have less confidence compared to others due to sensor limitations. For this
reason, multiple sensors with different weights can be used for each victim detector. Let D represents
a victim detector and
is the confidence assigned for each detector. Each of the victim detectors
gives two quantities, which are the measured probability
of finding a human given detector
D and the weight
that reflects how much confidence assigned to a victim detector. The posterior
probability is given as:
P(C|D1. . . Dm)=α
is a normalized factor for all the weights. Figure 8shows the process of
updating a single cell in the combined map taken into consideration the different weights for the
victim modules implemented in each map. Let
be the combined map from the three sensors.
The merged map from the three sensors is used to monitor the maximum probability. The merged map
is obtained by re-scaling the three occupancy maps to the smallest resolution
which is
the lowest resolutions of the three used maps. The combined map is given as
Mapmerged =α
wi(C)Mapm,rescaled (11)
Remote Sens. 2019,11, 2704 14 of 25
Figure 8.
Merged map obtained from three maps of different resolutions where a single cell update
is highlighted.
3.2. Exploration
We adapt the NBV approach to explore the environment. The typical pipeline for NBV includes
viewpoint sampling, viewpoint evaluation, navigation and termination. In this work, two viewpoint
sampling methods are used for NBV evaluation and viewpoint selection which are the Regular Grid
Sampling Approach (RGSA) and a proposed Adaptive Grid Sampling Approach (AGSA). After that,
the different sample viewpoint are evaluated using a utility function. Then, the robot navigates to the
selected viewpoint and update the maps simultaneously. The search mission is terminated when a
termination condition is met.
In this work, the search process terminates when a victim is found or when all the viewpoints
provide a zero reward from the utility. The victim is assumed to be found in the maximum probability if
the occupancy map went above a specific threshold or if the number of iteration exceeded a pre-defined
number of iterations.
The proposed NBV approach uses a multi-objective utility function which is composed of three
components. The first function
Uex p
is the information gain, which focuses on the exploration objective
while the function
is biased towards exploring occupancy grids with higher victim presence
probability. Traveling further distances is penalized by the function
to compromise between
satisfying the objectives and the traveling cost. The multi-objective utility defined as:
U=wex p ×Uexp +wvic tim ×Uvictim ×Udist
= wex p × HIG
max{HIG }!+wvictim ×e−| ζ|β!×ewdistd
ζ=1max{PMAX }
is the probability of a cell in a viewpoint,
is the maximum probability
cell within a given viewpoint,
is the information gain (IG) for a given viewpoint,
is the distance
from the current location to the evaluated viewpoint,
is the exponential penalization parameter while
wex p
are the weights used to control each of the three objectives. The impact of
the selectivity of
is illustrated in Figure 9, used to control how much emphasis should be put
on victim detection vs environment exploration.
Remote Sens. 2019,11, 2704 15 of 25
Figure 9. Impact of the βparameter on the selectivity of Uvictim function where PMAX =0.9.
After viewpoint evaluation, different navigation schemes are used based on the selected view
generation method. For Regular Grid Sampling approach, a collision-checking is performed over the
line between the vehicle and the evaluated viewpoint, to ensure that the path is safe for the drone to
traverse. For adaptive sampling, initially, a straight line is performed. In case it failed, then an A* [
planner is used.
Upon finding a victim, the search process is terminated to allow the robot to report back the victim
location to the rescue team. The victim is assumed to be found in the maximum probability if the
occupancy map went above a specific threshold or if the number of iteration exceeded a pre-defined
number of iterations. This method can also be extended to incorporate more than one victim.
4. Experimental Results
Simulation experiments were performed on an Asus-ROG-STRIX Laptop (Intel Core i7-6700HQ
@ 2.60 GHz
8, 15.6 GB RAM). The victim localization system was implemented on Ubuntu-kinetic
16.04 using the Robot Operating System (ROS) [
], where the system consists of a group of nodes
with data sharing capability. Hence, the use of ROS simply the direct implementation of the system on
the hardware platform. The environment was simulated using Gazebo, and the code programmed
in both C++ and Python. The GridMap library [
] was used to represent the 2-D occupancy grid,
navigation was done using Octomap [
] and point cloud data (PCL) was handled using the Point
Cloud Library (PCL) [
]. An open source implementation of the proposed approach can be found
at [51] for further use.
4.1. Vehicle Model and Environment
The adopted vehicle model in the tests is based on a floating sensor approach, where the sensor
moves virtually in space and captures data from the selected viewpoints. The floating sensor is
presented as a virtual common base link among candidate sensors. Static transformations are
used to refer the sensors links to the floating sensor base link. Upon moving the floating sensor,
the way-point command is applied in the floating sensor local frame where the sensors take their
new positions based on their respective transformations. Figure 10 shows the floating sensor along
with its components. By using the floating sensor approach we skip continuous data that could be
acquired during the transition between the viewpoints, resulting in a better evaluating setup where
only data and information gathered from the viewpoints is used to conduct the exploration and victim
detection. The system is equipped with three sensors, namely an RGB-D camera, a thermal camera,
Remote Sens. 2019,11, 2704 16 of 25
and a wireless adapter. The specifications of the RGB-D camera, thermal camera, and wirelesses
transmitter specifications are presented in Table 1. Both RGB-D and thermal camera are mounted
underneath to increase the unobstructed field of view (FOV). The proposed system can adapt to
any UAVs.
Figure 10. Floating sensor model.
Table 1. RGB camera, Thermal camera, and Wireless Sensor specifications.
Specs RGB-D Thermal Specs Wireless
HFOV 80 60 Type
VFOV 60 43 N of samples 150
Resolution 640 ×480 px 160 ×120 px frequency 109Hz
Range 0.5–5.0 m N/A SNR 5 dB
Position y 0.05 m 0.09 m Position 0.05 m
Pitch 15 15 - -
The simulated environment is shown in Figure 11 with dimensions 13 m
15 m. This environment
has multiple rooms with dead-ends and a corridor which can demonstrate the problems that the system
can experiences and how to be tackled. Due to the difficulty in simulating an actual heat signature in
the environment, the output of the thermal blob detector is simulated by the detection of the victim
red-shirt from the captured RGB image.
Figure 11. Simulated environment.
4.2. Tests Scenario and Parameters
The tests are carried out to show the effectiveness of the proposed approach, namely the combined
map, viewpoint sampling method, and the multi-objective utility function. The executed tests are
divided into two classes based on the sampling approaches, Regular Grid Sampling Approach (RGSA)
and the proposed Adaptive Grid Sampling Approach (AGSA). The proposed AGSA is compared with
the state-of-the-art RGSA [
]. In each sampling approach, the proposed merged map including
Remote Sens. 2019,11, 2704 17 of 25
the wireless map, is compared with the state-of-the-art single generated maps using single sensor.
In addition, the proposed utility (
) function is compared with the state-of-the-art utility functions
) and (
), weighted information gain [
] obtained using Equation (3) and max probability [
obtained using Equation (4), respectively. The evaluation metrics used for each test are the number of
iteration, distance traveled, and entropy reduction. The parameters used for RGSA and AGSA are
shown in Tables 2and 3, respectively.
In each test, the four different maps: vision, thermal, wireless, and merged map were used
to test their performance. The viewpoint evaluation is executed using the three utilities: weighted
information gain (
), improved max utility (
), and the proposed multi-objective utility (
). Table 4
lists the parameters used in the map generation.
Table 2. Regular Grid Sampling Approach Parameters.
Regular Grid Sampling Parameters Value
x displacement between samples 0.8
y displacement between samples 0.8
yaw step-size π
Straight-line Collision Check Box size 0.8
Table 3. Adaptive Grid Sampling Approach Parameters.
Adaptive Grid Sampling Parameters Value
Adaptive scale factor base a2
Entropy change per cell Ev0.1 m
Agrid resolution 0.3 m
Acircular distance to goal 0.3 m
circular distance to robot in case of
no-path found
0.5 m
Table 4. Parameters used in the simulation.
Vision Map Parameters Value Thermal Map Parameters Value
map resolution 0.5 map resolution 0.2
P(D|H)0.8 P(D|H)0.7
P(D|H)0.1 P(D|H)0.3
P(D|H)0.2 P(D|H)0.3
P(D|H)0.9 P(D|H)0.7
SSD probability threshold 0.8
minimum cluster size 10 points minimum cluster size 40 pixels
minimum distance between clusters 0.01 m minimum distance between clusters 30 pixels
Wireless Map Parameters Value Merged Map Parameters Value
map resolution 0.5 map resolution 0.5
P(D|H)0.7 vision map weight w1(C)0.65
P(D|H)0.3 thermal map weighted w2(C)0.2
P(D|H)0.3 wireless map weighted w3(C)0.15
wireless distance threshold 5 m
The Octomap resolution is set to 0.2 m. A termination condition of maximum iteration is used
in all the tests where the maximum iteration
is set to 120, and the victim probability threshold
is set to 0.9 (the probability at which a victim is considered found). For the utilities, The distance
is set to 0.05, while for the proposed utilities, the weights are selected as
wexp =
1 and
wvictim =
2 when evaluated in the vision and thermal map. For the wireless map, it is assigned as
wexp =
1 and
wvictim =
1. The penalization parameter
is set to 1. The metrics used in the evaluation
Remote Sens. 2019,11, 2704 18 of 25
are the total number of iterations before termination (It), total map entropy before the victim search
begin subtracted by total map entropy after the victim is found (TER), and the total distance traveled in
meters (D). Some of the parameters used during the experimentation were carefully tuned to achieve
the optimum setup given the experimental environment. The benchmarking, however, was conducted
with the same parameters for all the methods.
4.3. Regular Grid Sampling Approach (RGSA) Results
The obtained results of the different utility functions are shown in Figure 12a in terms of entropy
and Figure 12b in terms of distance traveled. For comparison, the same results listed in Table 5.
(a) (b)
Figure 12.
Regular Grid Sampling Approach Results. (
) Entropy vs. Iteration using Regular Grid
Sampling Approach. The fastest method is indicated by the magenta vertical dash line. (
) Distance
vs. Iteration using Regular Grid Sampling Approach. The fastest method is indicated by the magenta
vertical dash line.
Table 5.
Results of RGSA tests. VF: victim Found It: Iterations, D: Distance, TER: Total
Entropy Reduction. The numbers in bold indicate better performance.
Scenario VF It D (m) TER
U1-vision No 120 101.1 368.9
U2-vision Yes 81 72.4 382.2
U3-vision Yes 77 68.0 440.9
U1-thermal Yes 100 88.1 2494.4
U2-thermal Yes 104 91.71 2067.2
U3-thermal Yes 98 83.46 2395.0
U1-wireless No 120 75.59 732
U2-wireless No 120 85.17 472.1
U3-wireless No 120 76.20 824.2
U1-merged No 120 98.46 2810
U2-merged No 120 17.24 587
U3-merged Yes 73 64.34 2344.3
Remote Sens. 2019,11, 2704 19 of 25
In the vision map, the proposed utility (
) has fewer iterations (77), entropy reduction,
and distance traveled when compared to the other utilities within the same map. It is also noticeable
that the weighed information gain (
) failed to find the victim as it exceeds the maximum iteration
120. This because of
experience a local minimum indicated by the stabilized entropy curve for
an extended number of iterations. The max utility (
) succeeded in locating the victim but with a
relatively higher number of iterations as this utility does not have an exploration capability.
In the thermal map,
has fewer iterations (98), and distance traveled when compared to the
other utilities within the same map. Also,
has more iterations but a better entropy reduction
compared to
has the worst performance with a higher number of iterations and distance travel
to locate the victim.
In the wireless map, all the three methods failed in locating the victim where the total iterations
reached the maximum allowable iteration. That is due to the nature of the wireless transmitter wherein
the map update is done in a circular manner making it easy for the robot to fall in a local minimum.
In the merged map,
has the best performance with a lower number of iterations (73), entropy
reduction and distance traveled compared to the other utilities in the same map as well as across all
maps. In
, the method failed to locate the victim as the vehicle experience a set of local minima
which was mainly inherited from the wireless component. The merged map with
failed because
the utility falls in an inescapable local minimum. Figure 13 shows the path trajectory of the vehicle in
U3-merged test using RGSA.
Figure 13.
Vehicle traversed path in the
-merged test using RGSA where (
) start location,
) direct-path trajectory, (
) location of the detected local minimum where A* Planner was invoked to
generate path, (d) end location.
4.4. Adaptive Grid Sampling Approach (AGSA) Results
Here, the Adaptive Grid Sampling Approach (AGSA) is used to test the performance of the
utilities when implemented on the individual maps as well as the merged map. The settings used in
the AGSA are shown in Table 3. The obtained results of the different tests shown in Figure 14a in terms
of entropy and Figure 14b in terms of distance traveled. Also, the instants when the AGSA detected
local minimum as shown in Figure 15. For comparison, the same results listed in Table 6.
Remote Sens. 2019,11, 2704 20 of 25
(a) (b)
Figure 14.
Regular Adaptive Grid Sampling Approach Results. (
) Entropy vs. Iteration using Adaptive
Grid Sampling Approach. The fastest method is indicated by the magenta vertical dash line. (
) Travel
distance vs. Iteration using Adaptive Grid Sampling Approach. The fastest method is indicated by the
magenta vertical dash line.
Figure 15. Generator states vs. iterations using Adaptive Grid Sampling Approach.
Remote Sens. 2019,11, 2704 21 of 25
Table 6.
Results of AGSA tests. VF: victim Found It: Iterations, D: Distance, TER: Total Entropy
Reduction. The numbers in bold indicate better performance
Scenario VF It D (m) TER
U1-vision No 84 78.4 645.52
U2-vision Yes 81 72.4 718.06
U3-vision Yes 80 68.4 671.1
U1-thermal Yes 89 88.7 4608.2
U2-thermal Yes 104 91.7 4932.7
U3-thermal Yes 86 75.6 4752.1
U1-wireless No 120 223 490.8
U2-wireless No 120 115.4 245.8
U3-wireless No 120 223 490.8
U1-merged Yes 76 73.6 4459.4
U2-merged No 120 96.8 4959.8
U3-merged Yes 66 61.6 4772.7
In the vision map, the proposed utility (
) has fewer iterations to find the victim (80).
The weighed information gain (
) succeeds due to the local minimum’s problem been mitigated by
the AGSA. The max utility (
) has the same performance as in the case of RGSA. That is because the
works by selecting a viewpoint with maximum entropy, making it hard to fall under the threshold
Ev,threshold .
In the thermal map,
has fewer iterations to find the victim (86). In comparison to
a lower entropy reduction but more iterations to locate the victim.
has the worst performance
compared to the rest of the method. Also, the performance
is the same as in the case of RGSA. In the
wireless map, all the utilities failed, as they require more iterations to find the victim.
In the merged map,
has fewer iterations to find the victim (66). Unlike in RGSA,
in locating the victim as the vehicle was able to escape local minima.
failed in locating the victim as
this method does not encourage the vehicle to explore upon initial detection of the victim. That can
lead to a problem if the victim is located through the wireless sensor, but an obstacle such as a wall
occludes the victim. Under this scenario, it is preferable to explore and found a line-of-sight with the
victim allowing the vision and the thermal module to operate, a feature the max utility is lacking.
Across all the tests,
-merged using ARGS has the best performance in terms of iterations and
distance travel with a relatively better entropy reduction.
Figure 16 shows the traveled path of the vehicle in the
-merged test using AGSA. A video
that demonstrates the efficiency of indoor environment exploration using AGSA is provided in [
A video that represents a real-time experiment for victim detection is shown in [
], which was
conducted to demonstrate the effectiveness of the proposed work.
Remote Sens. 2019,11, 2704 22 of 25
Figure 16.
Vehicle traversed path in the
-merged test using AGSA where (
) start location,
) direct-path trajectory, (
) location of the detected local minimum where A* Planner was invoked to
generate path, (d) end location.
4.5. Discussion Results
The best performance was obtained when merging the probabilistic information in an occupancy
grid map, and using it to guide the search strategy. In this case, the combined information merged in a
probabilistic Bayesian framework is used instead of local optima of each system individually.
Using the Adaptive Grid Sampling Approach (AGSA), the proposed merged map with the
proposed utility function outperforms both traditional single-vision and thermal maps using the
proposed utility function on the number of iteration and the traveled distance. The reduction in the
number of iteration was in a range of 17.5% to 23.25% while the reduction in the traveled distance was
in a range of 10% to 18.5% respectively.
In addition, using the Regular Grid Sampling Approach (RGSA), the proposed merged map with
the proposed utility function outperforms both proposed single-vision map and single-thermal map
using proposed utility function on the number of iteration and the traveled distance. The reduction in
the number of iteration was in a range of 5.2% to 25.5% while the reduction in the traveled distance
was in a range of 5.4% to 23%, respectively.
Hence, the merged map results in a moderate performance over the other single maps when using
the Regular Grid Sampling Approach. However, combining the merged map with the Adaptive Grid
Sampling Approach presents higher performance.
5. Conclusions
In this work we proposed an USAR exploration approach based on multi-sensor fusion that
uses vision, thermal, and wireless sensors to generate a probabilistic 2D victim-location maps.
A multi-objective utility function exploited the fused 2D victim-location map to generate and evaluate
viewpoints in a NBV fashion. The evaluation process targets three desired objectives, specifically
exploration, victim detection, and traveled distance. Additionally, an adaptive grid sampling
algorithm (AGSA) was proposed to resolve the local minimum issue occurs in the Regular Grid
Sampling Approach (RGSA). Experimental results showed that the merged map results in a moderate
performance over the other single maps when using the Regular Grid Sampling Approach. However,
combining the merged map with the Adaptive Grid Sampling Approach presents higher performance.
In both cases, the results demonstrated that the proposed utility function using merged map performs
better than methods that rely on a single sensor in the time it takes to detect the victim, distance
Remote Sens. 2019,11, 2704 23 of 25
traveled, and information gain reduction. The tests were conducted in a monitored environment yet
there are different sources of uncertainty. In future work, we will extend the system to address these
uncertainty with scalability, and conduct experimental tests in real-world scenarios.
Author Contributions:
Conceptualization, A.G., R.A., T.T., and N.A.; methodology, A.G., R.A. and T.T.; software,
A.G., R.A., and T.T.; validation, A.G. and T.T. formal analysis, A.G., R.A., U.A., and T.T.; investigation, A.G., R.A.,
U.A., and T.T.; resources, A.G., U.A., T.T., N.A., and R.A.; data curation, A.G., R.A., and T.T.; writing—original
draft preparation, A.G. and R.A.; writing—review and editing, R.A. and T.T.; visualization, A.G., R.A.; supervision,
T.T., N.A., and L.S.; project administration, T.T., N.A., and L.S.
This work was supported by ICT fund for “A Semi Autonomous Mobile Robotic C4ISR System for
Urban Search and Rescue” project.
This publication acknowledges the support by the Khalifa University of Science and
Technology under Award No. RC1-2018-KUCARS.
Conflicts of Interest: The authors declare no conflict of interest.
Naidoo, Y.; Stopforth, R.; Bright, G. Development of an UAV for search & rescue applications. In Proceedings
of the IEEE Africon’11, Livingstone, Zambia, 13–15 September 2011; pp. 1–6.
Waharte, S.; Trigoni, N. Supporting Search and Rescue Operations with UAVs. In Proceedings of the
2010 International Conference on Emerging Security Technologies, Canterbury, UK, 6–7 September 2010;
pp. 142–147. [CrossRef]
Erdelj, M.; Natalizio, E.; Chowdhury, K.R.; Akyildiz, I.F. Help from the sky: Leveraging UAVs for disaster
management. IEEE Pervasive Comput. 2017,16, 24–32. [CrossRef]
Kohlbrecher, S.; Meyer, J.; Graber, T.; Petersen, K.; Klingauf, U.; von Stryk, O. Hector Open Source Modules
for Autonomous Mapping and Navigation with Rescue Robots. In RoboCup 2013: Robot World Cup XVII;
Behnke, S., Veloso, M., Visser, A., Xiong, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 624–631.
Milas, A.S.; Cracknell, A.P.; Warner, T.A. Drones—The third generation source of remote sensing data. Int. J.
Remote Sens. 2018,39, 7125–7137. [CrossRef]
Malihi, S.; Valadan Zoej, M.; Hahn, M. Large-Scale Accurate Reconstruction of Buildings Employing Point
Clouds Generated from UAV Imagery. Remote Sens. 2018,10, 1148. [CrossRef]
Balsa-Barreiro, J.; Fritsch, D. Generation of visually aesthetic and detailed 3D models of historical cities
by using laser scanning and digital photogrammetry. Digit. Appl. Archaeol. Cult. Herit.
,8, 57–64.
Wang, Y.; Tian, F.; Huang, Y.; Wang, J.; Wei, C. Monitoring coal fires in Datong coalfield using multi-source
remote sensing data. Trans. Nonferrous Met. Soc. China 2015,25, 3421–3428. [CrossRef]
Kinect Sensor for Xbox 360 Components. Available online:
accessories/kinect-sensor-components (accessed on 13 November 2019).
FLIR DUO PRO R 640 13 mm Dual Sensor. Available online:
640-13mm.htm (accessed on 13 November 2019).
Hahn, R.; Lang, D.; Selich, M.; Paulus, D. Heat mapping for improved victim detection. In Proceedings of the
2011 IEEE International Symposium on Safety, Security, and Rescue Robotics, Kyoto, Japan, 1–5 November
2011; pp. 116–121.
González-Banos, H.H.; Latombe, J.C. Navigation strategies for exploring indoor environments. Int. J.
Robot. Res. 2002,21, 829–848. [CrossRef]
Cieslewski, T.; Kaufmann, E.; Scaramuzza, D. Rapid Exploration with Multi-Rotors: A Frontier Selection
Method for High Speed Flight. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS 2017), Vancouver, BC, Canada, 24–28 September 2017.
Bircher, A.; Kamel, M.; Alexis, K.; Oleynikova, H.; Siegwart, R. Receding Horizon “Next-Best-View” Planner
for 3D Exploration. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation
(ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 1462–1468. [CrossRef]
Portmann, J.; Lynen, S.; Chli, M.; Siegwart, R. People detection and tracking from aerial thermal views.
In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong,
China, 31 May–7 June 2014; pp. 1794–1800. [CrossRef]
Remote Sens. 2019,11, 2704 24 of 25
Riaz, I.; Piao, J.; Shin, H. Human detection by using centrist features for thermal images. In Proceedings
of the IADIS International Conference Computer Graphics, Visualization, Computer Vision and Image
Processing, Prague, Czech Republic, 22–24 July 2013.
Uzun, Y.; Balcılar, M.; Mahmoodi, K.; Davletov, F.; Amasyalı, M.F.; Yavuz, S. Usage of HoG (histograms
of oriented gradients) features for victim detection at disaster areas. In Proceedings of the 2013 8th
International Conference on Electrical and Electronics Engineering (ELECO), Bursa, Turkey, 28–30 November
2013; pp. 535–538.
Sulistijono, I.A.; Risnumawan, A. From concrete to abstract: Multilayer neural networks for disaster victims
detection. In Proceedings of the 2016 International Electronics Symposium (IES), Denpasar, Indonesia,
29–30 September 2016; pp. 93–98.
Abduldayem, A.; Gan, D.; Seneviratne, L.D.; Taha, T. 3D Reconstruction of Complex Structures with
Online Profiling and Adaptive Viewpoint Sampling. In Proceedings of the International Micro Air Vehicle
Conference and Flight Competition, Toulouse, France, 18–21 September 2017; pp. 278–285.
Xia, D.X.; Li, S.Z. Rotation angle recovery for rotation invariant detector in lying pose human body detection.
J. Eng. 2015,2015, 160–163. [CrossRef]
Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox
detector. In Lecture Notes in Computer Science; Springer International Publishing: New York, NY, USA, 2016;
pp. 21–37. [CrossRef]
Zhang, L.; Lin, L.; Liang, X.; He, K. Is Faster R-CNN Doing Well for Pedestrian Detection? arXiv
Louie, W.Y.G.; Nejat, G. A victim identification methodology for rescue robots operating in cluttered USAR
environments. Adv. Robot. 2013,27, 373–384. [CrossRef]
Hadi, H.S.; Rosbi, M.; Sheikh, U.U.; Amin, S.H.M. Fusion of thermal and depth images for occlusion
handling for human detection from mobile robot. In Proceedings of the 2015 10th Asian Control Conference
(ASCC), Kota Kinabalu, Malaysia, 31 May–3 June 2015; pp. 1–5. [CrossRef]
Hu, Z.; Ai, H.; Ren, H.; Zhang, Y. Fast human detection in RGB-D images based on color-depth joint feature
learning. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix,
AZ, USA, 25–28 September 2016; pp. 266–270. [CrossRef]
Utasi, Á.; Benedek, C. A Bayesian approach on people localization in multicamera systems. IEEE Trans.
Circuits Syst. Video Technol. 2012,23, 105–115. [CrossRef]
Cho, H.; Seo, Y.W.; Kumar, B.V.K.V.; Rajkumar, R.R. A multi-sensor fusion system for moving object detection
and tracking in urban driving environments. In Proceedings of the 2014 IEEE International Conference on
Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1836–1843. [CrossRef]
Bongard, J. Probabilistic Robotics. Sebastian Thrun, Wolfram Burgard, and Dieter Fox. (2005, MIT Press.)
647 pages. Artif. Life 2008,14, 227–229. [CrossRef]
Yamauchi, B. A frontier-based approach for autonomous exploration. In Proceedings of the 1997 IEEE
International Symposium on Computational Intelligence in Robotics and Automation (CIRA’97), Monterey,
CA, USA, 10–11 July 1997; pp. 146–151.
Heng, L.; Gotovos, A.; Krause, A.; Pollefeys, M. Efficient visual exploration and coverage with a micro aerial
vehicle in unknown environments. In Proceedings of the 2015 IEEE International Conference on Robotics
and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1071–1078.
Paul, G.; Webb, S.; Liu, D.; Dissanayake, G. Autonomous robot manipulator-based exploration and mapping
system for bridge maintenance. Robot. Auton. Syst. 2011,59, 543–554. [CrossRef]
Al khawaldah, M.; Nuchter, A. Enhanced frontier-based exploration for indoor environment with multiple
robots. Adv. Robot. 2015,29. [CrossRef]
Karaman, S.; Frazzoli, E. Sampling-based algorithms for optimal motion planning. Int. J. Robot. Res.
30, 846–894. [CrossRef]
Verbiest, K.; Berrabah, S.A.; Colon, E. Autonomous Frontier Based Exploration for Mobile Robots.
In Intelligent Robotics and Applications; Lecture Notes in Computer Science; Springer: Cham, Switzerland,
2015; pp. 3–13. [CrossRef]
Delmerico, J.; Isler, S.; Sabzevari, R.; Scaramuzza, D. A comparison of volumetric information gain metrics
for active 3D object reconstruction. Auton. Robots 2018,42, 197–208. [CrossRef]
Remote Sens. 2019,11, 2704 25 of 25
Kaufman, E.; Lee, T.; Ai, Z. Autonomous exploration by expected information gain from probabilistic
occupancy grid mapping. In Proceedings of the 2016 IEEE International Conference on Simulation, Modeling,
and Programming for Autonomous Robots (SIMPAR), San Francisco, CA, USA, 13–16 December 2016;
pp. 246–251. [CrossRef]
Batista, N.C.; Pereira, G.A.S. A Probabilistic Approach for Fusing People Detectors. J. Control Autom.
Electr. Syst. 2015,26, 616–629. [CrossRef]
Isler, S.; Sabzevari, R.; Delmerico, J.; Scaramuzza, D. An Information Gain Formulation for Active Volumetric
3D Reconstruction. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation
(ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 3477–3484.
Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes
Challenge 2007 (VOC2007) Results. Available online:
workshop/index.html (accessed on 13 November 2019).
Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press:
Cambridge, UK; New York, NY, USA, 2004.
41. Pozar, D.M. Microwave and Rf Design of Wireless Systems, 1st ed.; Wiley: Hoboken, NJ, USA, 2000.
Kulaib, A.R. Efficient and Accurate Techniques for Cooperative Localization in Wireless Sensor Networks.
Ph.D. Thesis, Khalifa University, Abu Dhabi, UAE, 2014.
Tran-Xuan, C.; Vu, V.H.; Koo, I. Calibration mechanism for RSS based localization method in wireless sensor
networks. In Proceedings of the 11th International Conference on Advanced Communication Technology
(ICACT 2009), Phoenix Park, Korea, 15–18 February 2009; Volume 1, pp. 560–563.
Tarrío, P.; Bernardos, A.M.; Casar, J.R. Weighted least squares techniques for improved received signal
strength based localization. Sensors 2011,11, 8569–8592. [CrossRef]
Patwari, N.; Ash, J.; Kyperountas, S.; Hero, A.; Moses, R.; Correal, N. Locating the nodes: cooperative
localization in wireless sensor networks. IEEE Signal Proc. Mag. 2005,22, 54–69. [CrossRef]
Zeng, W.; Church, R.L. Finding shortest paths on real road networks: The case for A. Int. J. Geogr. Inf. Sci.
2009,23, 531–543. [CrossRef]
Koubâa, A. Robot Operating System (ROS): The Complete Reference (Volume 1), 1st ed.; Springer International
Publishing: Heidelberg, Germany, 2016.
Fankhauser, P.; Hutter, M. A universal grid map library: Implementation and use case for rough terrain
navigation. In Robot Operating System (ROS); Koubaa, A., Ed.; Studies in Computational Intelligence;
Springer: Cham, Switzerland, 2016; pp. 99–120.
Hornung, A.; Wurm, K.M.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D
mapping framework based on octrees. Auton. Robots 2013,34, 189–206. [CrossRef]
Rusu, R.B.; Cousins, S. 3D is here: Point Cloud Library (PCL). In Proceedings of the IEEE International
Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011.
Goian, A. Open Source Implementation of the Proposed Algorithm as a ROS Package. Available online: (accessed on 13 November 2019).
Ashour, R. Indoor Environment Exploration Using Adaptive Grid Sampling Approach, 2019. Available
online: (accessed on 13 November 2019).
Ashour, R. Victim Localization in USAR Scenario Exploiting Multi-Layer Mapping Structure, 2019. Available
online: (accessed on 13 November 2019).
2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (
... In the current state of the art, various contributions studied the problem of object detection, including human detection [5]- [12] and semantic mapping [13]- [18]. However, most of the work focused on locating the victims inside the VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. ...
... Different robotics platforms are used in Search and Rescue missions to perform various tasks such as victim detection [12], exploration for mapping and 3D reconstruction [13], and risk assessment [19], [20]. The first responders should be aware of the environmental situation before intervention. ...
Full-text available
This paper proposes an innovative hazard identification and risk assessment mapping model for Urban Search and Rescue (USAR) environments, concentrating on a 3D mapping of the environment and performing grid-level semantic labeling to recognize all hazards types found in the scene and to distinguish their risk severity level. The introduced strategy employs a deep learning model to create semantic segments for hazard objects in 2D images and create semantically annotated point clouds that encapsulate occupancy and semantic annotations such as hazard type and risk severity level. After that, a 3D semantic map that provides situational awareness about the risk in the environment is built using the annotated point cloud. The proposed strategy is evaluated in a realistic simulated indoor environment, and the results show that the system successfully generates a risk assessment map. Further, an open-source package for the proposed approach is provided online for testing and reproducibility.
... Autonomous exploration enables robots to be capable of actively perceiving the environment, which has played an increasingly important role in various applications, such as monitoring environmental quality [1,2], precision agriculture [3], search and rescue [4][5][6], and open-sea exploration [7,8]. There has been some valuable work on navigation using GNSS [9,10], but in most cases, indoor robots can only rely on their carried sensors for navigation. ...
... Sens. 2021, 13, x FOR PEER REVIEW6 The system architecture of the proposed autonomous exploration framework. ...
Full-text available
Autonomous exploration and remote sensing using robots have gained increasing attention in recent years and aims to maximize information collection regarding the external world without human intervention. However, incomplete frontier detection, an inability to eliminate inefficient frontiers, and incomplete evaluation limit further improvements in autonomous exploration efficiency. This article provides a systematic solution for ground mobile robot exploration with high efficiency. Firstly, an integrated frontier detection and maintenance method is proposed, which incrementally discovers potential frontiers and achieves incremental maintenance of the safe and informative frontiers by updating the distance map locally. Secondly, we propose a novel multiple paths planning method to generate multiple paths from the robot position to the unexplored frontiers. Then, we use the proposed utility function to select the optimal path and improve its smoothness using an iterative optimization strategy. Ultimately, the model predictive control (MPC) method is applied to track the smooth path. Simulation experiments on typical environments demonstrate that compared with the benchmark methods, the proposed method reduce the path length by 27.07% and the exploration time by 27.09% on average. The real-world experimental results also reveal that our proposed method can achieve complete mapping with fewer repetitive paths.
... The significant impact of the unmanned aerial vehicles (UAVs) in various robotics fields made them a promising solution for urban search and rescue tasks. For victim detection, the UAV could utilize the onboard sensors and the onboard intelligence to accurately detect and locate victims in disaster scenes in order to increase the likelihood of finding victims [18][19][20]. For mapping and 3D reconstruction, the UAV should be capable of reconstructing the scene in 3D and locating the contextual and rich information that can be used by the rescuers' team. ...
... Object detection task is concerned with discovering instances of semantic objects that belong to a particular class, such as human, table, or chair. In the literature, a variety of mechanisms are used to detect objects either by using a single sensor or fusing multiple sensors [20]. ...
Full-text available
This paper presents a strategy to autonomously explore unknown indoor environments, focusing on 3D mapping of the environment and performing grid level semantic labeling to identify all available objects. Unlike conventional exploration techniques that utilize geometric heuristics and information gain theory on an occupancy grid map, the work presented in this paper considers semantic information, such as the class of objects, in order to gear the exploration towards environmental segmentation and object labeling. The proposed approach utilizes deep learning to map 2D semantically segmented images into 3D semantic point clouds that encapsulate both occupancy and semantic annotations. A next-best-view exploration algorithm is employed to iteratively explore and label all the objects in the environment using a novel utility function that balances exploration and semantic object labeling. The proposed strategy was evaluated in a realistically simulated indoor environment, and results were benchmarked against other exploration strategies.
... Also, the distraction of the sound from the environment will give a significant effect on the performance of the system. Goian et al. [8], a deep learning-based human detection algorithm is proposed to locate the presence of a human in a disaster area by processing the data collected from RBG camera, thermal and wireless sensor. However, this approach has a limitation where the victim can only be identified when the victim is located on the surface. ...
Full-text available
p>Industrial growth has increased the number of jobs hence increase the number of employees. Therefore, it is impossible to track the location of all employees in the same building at the same time as they are placed in a different department. In this work, a real-time indoor human tracking system is developed to determine the location of employees in a real-time implementation. In this work, the long-range (LoRa) technology is used as the communication medium to establish the communication between the tracker and the gateway in the developed system due to its low power with high coverage range besides requires low cost for deployment. The received signal strength indicator (RSSI) based positioning method is used to measure the power level at the receiver which is the gateway to determine the location of the employees. Different scenarios have been considered to evaluate the performance of the developed system in terms of precision and reliability. This includes the size of the area, the number of obstacles in the considered area, and the height of the tracker and the gateway. A real-time testbed implementation has been conducted to evaluate the performance of the developed system and the results show that the system has high precision and are reliable for all considered scenarios.</p
... Mobile robot technology [1] has been developed in various fields including industrial production [2], space exploration [3], disaster relief [4], etc. Simultaneous positioning and mapping (SLAM), as one of the core technologies of mobile robot, uses a single vision sensor or LiDAR to construct environment [5]. The LiDAR offers the spatial position of the targets, and cameras can provide details such as the color and profile of the object. ...
For sensory data fusion, a calibration method between 3D light detection and ranging (LiDAR) and color camera based on ranging statistical characteristics and improved RANSAC algorithm is proposed. The multi-frame LiDAR point cloud data of the calibration triangular board are recorded. The scanned points with close angles are defined a cluster within same degrees. Furthermore, accurate points are preserved using statistical filtering based on Gaussian distribution. Afterwards, the plane and edge parameters of the triangular board are estimated by the reserved point cloud using improved the random sample consensus (RANSAC) algorithm to obtain the 3D locations of the vertices. Meanwhile, corner points in the image can be extracted manually. Finally, the projection matrix between the camera and the LiDAR is estimated by using the 2D-3D correspondences in different positions. The projection errors of different frames and corresponding points are calculated. The results demonstrate that the average error with 300 frames is reduced by 23.05% compared to 1 frame. Moreover, the standard deviation diminishes with the increasing of corresponding points. The reliability and advantage of the method are verified compared with other state-of-art methods. It provides theoretical and technical support for low resolution LiDAR.
... SAR missions are generally time critical, where any delay can result in potential human losses, and such missions often occur in unfriendly environments that are likely to be detrimental to manned rescues [12]. Therefore, an increasing recognition of the potential of UAVs for SAR missions is evident in the literature (e.g., [13][14][15][16][17][18][19]). In fact, the usage of UAVs in such operations has already been demonstrated in the events of 9/11, after the collapse of the World Trade Center in New York [20]. ...
Full-text available
The challenge concerning the optimal allocation of tasks across multiple unmanned aerial vehicles (multi-UAVs) has significantly spurred research interest due to its contribution to the success of various fleet missions. This challenge becomes more complex in time-constrained missions, particularly if they are conducted in hostile environments, such as search and rescue (SAR) missions. In this study, a novel fish-inspired algorithm for multi-UAV missions (FIAM) for task allocation is proposed, which was inspired by the adaptive schooling and foraging behaviors of fish. FIAM shows that UAVs in an SAR mission can be similarly programmed to aggregate in groups to swiftly survey disaster areas and rescue-discovered survivors. FIAM’s performance was compared with three long-standing multi-UAV task allocation (MUTA) paradigms, namely, opportunistic task allocation scheme (OTA), auction-based scheme, and ant-colony optimization (ACO). Furthermore, the proposed algorithm was also compared with the recently proposed locust-inspired algorithm for MUTA problem (LIAM). The experimental results demonstrated FIAM’s abilities to maintain a steady running time and a decreasing mean rescue time with a substantially increasing percentage of rescued survivors. For instance, FIAM successfully rescued 100% of the survivors with merely 16 UAVs, for scenarios of no more than eight survivors, whereas LIAM, Auction, ACO and OTA rescued a maximum of 75%, 50%, 35% and 35%, respectively, for the same scenarios. This superiority of FIAM performance was maintained under a different fleet size and number of survivors, demonstrating the approach’s flexibility and scalability.
Unmanned aerial vehicles (UAVs) have grown in popularity over the last decade, with applications ranging from agriculture to commercial and widespread use in search and rescue (SAR) operations. The utilization of UAVs has been shown to enhance the effectiveness of these missions by providing a fast, agile, and cost-effective means of gathering information and performing tasks in challenging environments. This chapter aims to examine the current usage of UAVs in SAR missions, identify the challenges associated with their use, and provide a direction for future research on the topic. This chapter provides a detailed description of the various types of UAVs and sensors that are required to carry out successful SAR missions. Our research has found that UAVs are commonly utilized during the rescue and recovery phases in both indoor and outdoor settings for tasks such as mapping, detection, and intervention. While the number of studies on the deployment of UAVs in SAR operations has significantly increased in the past few years, further investigation is necessary to specifically focus on drone intervention in these missions.
Full-text available
To assist in the implementation of a fine 3D terrain reconstruction of the scene in remote sensing applications, an automatic joint calibration method between light detection and ranging (LiDAR) and visible light camera based on edge points refinement and virtual mask matching is proposed in this paper. The proposed method is used to solve the problem of inaccurate edge estimation of LiDAR with different horizontal angle resolutions and low calibration efficiency. First, we design a novel calibration target, adding four hollow rectangles for fully automatic locating of the calibration target and increasing the number of corner points. Second, an edge refinement strategy based on background point clouds is proposed to estimate the target edge more accurately. Third, a two-step method of automatically matching between the calibration target in 3D point clouds and the 2D image is proposed. Through this method, i.e., locating firstly and then fine processing, corner points can be automatically obtained, which can greatly reduce the manual operation. Finally, a joint optimization equation is established to optimize the camera’s intrinsic and extrinsic parameters of LiDAR and camera. According to our experiments, we prove the accuracy and robustness of the proposed method through projection and data consistency verifications. The accuracy can be improved by at least 15.0% when testing on the comparable traditional methods. The final results verify that our method is applicable to LiDAR with large horizontal angle resolutions.
The presence of an obstacle in detecting respiration vital signs using a radar system does not only exhibit attenuation and phase shift effects on the radar signal but also causes additional beat frequencies to appear beside those from the target. When a conventional FMCW radar system detects the peak of the received signal spectrum as a beat frequency, this will give an error of detection. The respiratory vital signs will then be undetectable. A method to overcome this obstacle problem is needed. Therefore the implementation of the FMCW radar system in detecting respiratory vital signs behind the wall can be realized. In this paper, a method to deal with the previously mentioned problem is proposed. The proposed method consists of a method for identifying the obstacle response. Then the results are used to eliminate the beat frequency that arises from the obstacle structure, and therefore the signal to clutter ratio (SCR) can be improved. Furthermore, the method is combined with the phase detection method, which is applied to extract the Doppler response related to the target respiratory vital signs. The weighting process by elaborating cross correlation is also employed for minimizing clutter from other objects reflection under debris. A laboratory experiment by developing an FMCW radar system with a 24 GHz frequency operation was performed in this research. The proposed detection method was elaborated in the post-processing of the radar system. The experiment result demonstrates that the proposed method is capable of eliminating the effect of the wall and successfully extracting the respiration vital sign pattern.
Full-text available
High-density point clouds are valuable and detailed sources of data for different processes related to photogrammetry. We explore the knowledge-based generation of accurate large-scale three-dimensional (3D) models of buildings employing point clouds derived from UAV-based photogrammetry. A new two-level segmentation approach based on efficient RANdom SAmple Consensus (RANSAC) shape detection is developed to segment potential facades and roofs of the buildings and extract their footprints. In the first level, the cylinder primitive is implemented to trim point clouds and split buildings, and the second level of the segmentation produces planar segments. The efficient RANSAC algorithm is enhanced in sizing up the segments via point-based analyses for both levels of segmentation. Then, planar modelling is carried out employing contextual knowledge through a new constrained least squares method. New evaluation criteria are proposed based on conceptual knowledge. They can examine the abilities of the approach in reconstruction of footprints, 3D models, and planar segments in addition to detection of over/under segmentation. Evaluation of the 3D models proves that the geometrical accuracy of LoD3 is achieved, since the average horizontal and vertical accuracy of the reconstructed vertices of roofs and footprints are better than (0.24, 0.23) m, (0.19, 0.17) m for the first dataset, and (0.35, 0.37) m, (0.28, 0.24) m for the second dataset.
Conference Paper
Full-text available
A modified Next Best View (NBV) approach is presented to improve the 3D reconstruction of complex symmetric structures. Two new stages are introduced to the NBV approach; A profiling stage quickly scans the structure of interest and builds a rough model called the "profile". The profile is relatively sparse but captures the major geometric features of the structure. A symmetry detection stage then determines major lines of symmetry in the profile and labels points of interest. If a point exists in known space but its mirror image lies in unknown space, the mirrored point becomes a point of interest. The reconstruction is performed by a sensor mounted on an Unmanned Aerial Vehicle by using a utility function that incorporates the detected symmetry points. We compare the proposed method with the classical information gain utility function in terms of coverage completeness , total iterations, and travel distance.
Conference Paper
Full-text available
Exploring and mapping previously unknown environments while avoiding collisions with obstacles is a fundamental task for autonomous robots. In scenarios where this needs to be done rapidly, multi-rotors are a good choice for the task, as they can cover ground at potentially very high velocities. Flying at high velocities, however, implies the ability to rapidly plan trajectories and to react to new information quickly. In this paper, we propose an extension to classical frontier-based exploration that facilitates exploration at high speeds. The extension consists of a reactive mode in which the multi-rotor rapidly selects a goal frontier from its field of view. The goal frontier is selected in a way that minimizes the change in velocity necessary to reach it. While this approach can increase the total path length, it significantly reduces the exploration time, since the multi-rotor can fly at consistently higher speeds.
Conference Paper
Full-text available
We consider the problem of next-best view selection for volumetric reconstruction of an object by a mobile robot equipped with a camera. Based on a probabilistic volumetric map that is built in real time, the robot can quantify the expected information gain from a set of discrete candidate views. We propose and evaluate several formulations to quantify this information gain for the volumetric reconstruction task, including visibility likelihood and the likelihood of seeing new parts of the object. These metrics are combined with the cost of robot movement in utility functions. The next best view is selected by optimizing these functions, aiming to maximize the likelihood of discovering new parts of the object. We evaluate the functions with simulated and real world experiments within a modular software system that is adaptable to other robotic platforms and reconstruction problems. We release our implementation open source.
Full-text available
In this paper, we investigate the following question: when performing next best view selection for volumetric 3D reconstruction of an object by a mobile robot equipped with a dense (camera-based) depth sensor, what formulation of information gain is best? To address this question, we propose several new ways to quantify the volumetric information (VI) contained in the voxels of a probabilistic volumetric map, and compare them to the state of the art with extensive simulated experiments. Our proposed formulations incorporate factors such as visibility likelihood and the likelihood of seeing new parts of the object. The results of our experiments allow us to draw some clear conclusions about the VI formulations that are most effective in different mobile-robot reconstruction scenarios. To the best of our knowledge, this is the first comparative survey of VI formulation performance for active 3D object reconstruction. Additionally, our modular software framework is adaptable to other robotic platforms and general reconstruction problems, and we release it open source for autonomous reconstruction tasks.
This book is the fifth volume in the successful book series Robot Operating System: The Complete Reference. The objective of the book is to provide the reader with comprehensive coverage on the Robot Operating System (ROS), which is currently considered to be the primary development framework for robotics applications, and the latest trends and contributing systems. The content is divided into six parts. Pat I presents for the first time the emerging ROS 2.0 framework, while Part II focuses on multi-robot systems, namely on SLAM and Swarm coordination. Part III provides two chapters on autonomous systems, namely self-driving cars and unmanned aerial systems. In turn, Part IV addresses the contributions of simulation frameworks for ROS. In Part V, two chapters explore robotic manipulators and legged robots. Finally, Part VI presents emerging topics in monocular SLAM and a chapter on fault tolerance systems for ROS. Given its scope, the book will offer a valuable companion for ROS users and developers, helping them deepen their knowledge of ROS capabilities and features.
Historical cities (or towns) are inherently valuable for their uniqueness. Although reflecting the past, these cities offer significant opportunities. For this reason, it is important to know how to preserve them and assess their exploitation potential from different perspectives. An increased knowledge about them is continuously required and demanded. Three-dimensional (henceforth 3D) virtual models are an excellent way for introducing these cities to people. However, some particular characteristics derived from irregular urban structures and specific human dynamics in historical urban environments can constrain the use of some technologies for 3D data collection. This research paper proposes an own working methodology for the generation of 3D virtual models of historical cities. This methodology is based on the combined use of laser scanning and photogrammetric techniques, which complement each other. As a result, a 3D virtual model with high geometric accuracy and visual completeness is obtained and integrated into a web-based application. Resulting virtual models can be used for touristic promotion, navigation, urban planning, documentation and preservation of cultural heritage, etc.
Conference Paper
Search-and-rescue (SAR) team main objective is to quickly locate victims in post-disaster scenario. In such disaster scenario, images are usually complex containing highly cluttered background such as debris, soil, gravel, ruined building, and clothes, which are difficult to distinguish from the victims. Previous methods which only work on nearly uniform background taken from either indoor or yard are not suitable and can deteriorate the detection system. In this paper, we demonstrate the feasibility of multilayer neural network for disaster victims detection on highly cluttered background. Theoretical justification from which deep learning learns from concrete to object abstraction is established. In order to build a more discriminative system, this theoretical justification then leads us to perform pretraining using data-rich datasheet followed by finetuning only on the last layers using data-specific datasheet while keeping the other layers fixed. A new Indonesian disaster victims datasheet is also provided. Experimental results show the efficiency of the method for disaster victims detection in highly cluttered background.