2.5D Evidential Grids for Dynamic Object
Detection
Hind Laghmara, Thomas Laurain, Christophe Cudel and Jean-Philippe Lauffenburger
IRIMAS EA7499, Université de Haute-Alsace
Mulhouse, France
firstname.name@uha.fr
Abstract—Perception is a crucial and challenging part of
Intelligent Transportation Systems. One of the main issues is
to keep up with the moving objects in complex and dynamic
environments. This paper proposes a method for dynamic object
detection using Evidential 2.5D Occupancy Grids. The approach
is based on a map representation for occupation modeling and
navigable area definition. At each time step, a local grid is
derived from the sensor data. Belief Theory is then retained
to perform a grid fusion over time in order to keep track of
the moving objects in the grid. The description of the dynamic
behavior of objects in a scene is related to the conflict issued
after the temporal fusion. Finally, the construction of the objects
themselves is realized with a segmentation based on Density-
Based Spatial Clustering of Applications with Noise (DBSCAN).
In order to validate the efficiency of the proposed approach,
experimental results are provided based on the KITTI dataset.
Performances are evaluated through comparison with the ground
truth.
Index Terms—Occupancy Grids, Belief Theory, Dynamic Object Detection, Autonomous Vehicles, LiDAR.
I. INTRODUCTION
This paper focuses on the perception of Intelligent Transportation Systems, for which the objective is to model the vehicle's local environment according to data issued from multiple sensors. The aim is to achieve this task considering that the vehicle's pose is known. The surrounding environment representation is based on Occupancy Grids (OG), as this representation indicates two main features: the navigable space as well as the location of obstacles, which may be static or dynamic. OGs can be constructed from measures of the objects' distance to the ego-vehicle, which can be given by exteroceptive sensors such as LiDARs (Light Detection And Ranging), radars or stereo vision.
The main challenges raised by this issue are the uncertainty and imprecision of the information as well as the complexity of modeling a dynamic scene. Multiple Object Tracking (MOT) is the application handling this issue; it includes the sequential detection, association and tracking of dynamic objects. OGs have proven to be effective, with limited computational complexity, for local environment modeling taking account of temporal data. The aim of this paper is to improve OGs to ensure a robust detection of dynamic objects. Due to the high uncertainty in such a task, the use of Dempster-Shafer theory is convenient for autonomous systems' applications.
Some of the main references in the literature treating the detection of dynamic objects with grid-based solutions are reviewed in Section II-A. A survey is also presented in [1], where a 2.5D approach (two dimensions plus the elevation information obtained by averaging the height of all points that fall into a given cell) is used for the determination of moving cells.
This paper builds on the same representation to include a tri-dimensional modeling of the environment. The main contribution of this approach is that the classification of dynamic objects in a 2.5D grid is done according to an evidential fusion of multiple grids. The second contribution is the extraction of an object-level representation from the detected dynamic cells for tracking purposes: the objective is to achieve dynamic detection at the object level rather than at the cell level, as is commonly done in the literature. The objects are built by clustering the mobile cells with the DBSCAN algorithm. The third contribution of this work is a quantitative evaluation, according to a measure of average precision of the detection results, based on a KITTI dataset for comparison purposes.
The paper is structured as follows: Section II covers a
survey on multiple object detection based on occupancy grids
as well as the definition of a 2.5D representation. Section
III introduces the different steps allowing the transition from
a grid-level representation to the dynamic object detection.
This includes the definition of an evidential grid as well as
mobile cells labeling (cf. Fig. 1). The segmentation algorithm
is also specified in order to extract dynamic objects. Section
IV illustrates the dataset used for evaluation as well as a
quantitative result analysis. Section V concludes the paper.
II. 2.5D GRID MAPS
A. Related Work
An OG is a representation which employs a multidimensional tessellation of the space into cells, where each cell stores knowledge of its occupancy state [2]. OGs are now widely used thanks to the availability of powerful computational resources to handle their cost. Grids have been constructed in multiple dimensions (2D, 2.5D and 3D) [1] using different sensor technologies like 2D radars, 2D or 3D LiDARs and stereo vision. In this representation, each cell state is described according to a chosen formalism. The most common one is the Bayesian framework, first adopted by Elfes [2] and followed by many extensions such as the well-known Bayesian Occupancy Filter (BOF) [3, 4]. The latter estimates the dynamics of the grid cells using the Fast Clustering and Tracking Algorithm in order to ensure MOT [5].
Other works suggested a formalism based on Dempster-Shafer (or Evidence) Theory. It has been applied in [6] to a 2D occupancy grid using a ring of ultrasonic transducers and a sensor scanner. Moras et al. proposed a similar approach for mobile object detection based on an inverse sensor model [7, 8], in which detection is realized through the analysis of the conflict arising from the temporal evidential fusion of multiple grids. Extending Moras et al.'s work, contextual discounting is applied in [9] to control cell remanence.
Some references study the dynamics of the environment
at the cell level to avoid the inconsistencies of the object
representation [10]. Tanzmeister et al. [11] also estimate the
static and dynamic characteristics at the grid cell level and
use a particle filter for obtaining the cell velocity distribution.
Honer et al. [12] focus on the classification of stationary and dynamic elements based on an evidential semantic 2D grid map with a five-state cell configuration: each cell can either be free, dynamic, static, occupied or unknown. However, the update of the cells is done according to a heuristically determined combination table.
The above literature review and especially [1] show that
most of the works consider a two-dimensional grid for the en-
vironment representation even when 3D sensors are providing
the data to build the map. In fact, 3D solutions like voxel grids
or octomaps can generate high complexity and computation
load when applied to real-time applications like autonomous
navigation. An interesting tradeoff lies in 2.5D occupancy grids, which are known to be memory efficient while still storing elevation data. In the particular context of autonomous driving and ITS, in which the elevation variation of the terrain is limited in the local area where the vehicles are driving, 2.5D representations are of real interest and are retained here.
In this work, the objective is to consider an object-oriented
tracking which necessitates an efficient object detection mod-
ule. The idea is to consider the tri-dimensional sensor data issued by a Velodyne LiDAR to build, at each time step, a 2.5D grid in which an elevation is attributed to each cell. Sections II-B and III describe the approach illustrated in Fig. 1.
B. Building a 2.5D Grid
The pre-processing step from Fig. 1 required to build a 2.5D grid is derived from [1]. The grid is composed of discrete cells in which the object height above the ground level is stored. This representation describes objects elevated from the ground, which may be dynamic or static.
Building the 2.5D grid includes defining the covered area as well as its resolution, which corresponds to the dimensions of each cell. An example of a 2.5D grid map is shown in Fig. 2, where the resolution is 0.4 m × 0.4 m. The grid covers 40 m in front, 20 m behind and 20 m along the right and left sides of the vehicle.
Fig. 2. Top: 3D LiDAR point cloud from KITTI. Bottom: Corresponding 2.5D grid with 0.4 m × 0.4 m cells.
In order to consider the elevation of objects, it is necessary to determine all measures that correspond to the ground. Several approaches, such as [13], treat this point because ground points can induce errors when investigating the occupancy; such cases are frequent when the road is uneven or tilted. In this work, the method presented in [1] is employed. It evaluates the average and the variance of the height of the points falling into a cell: when both lie below given thresholds, the cell is considered to belong to the ground surface and not to any other planar surface. This is equivalent to the following statement:
$$G(i, j) = \begin{cases} 0 & \text{if } \sigma^2_{i,j} < tr_\sigma \text{ and } \mu_{i,j} < tr_\mu \\ \mu_{i,j} & \text{otherwise,} \end{cases} \qquad (1)$$

where $\mu_{i,j}$ and $\sigma^2_{i,j}$ are the average height and its variance in the cell with index $(i, j)$. The thresholds $tr_\sigma$ and $tr_\mu$ defined in [1] are respectively equal to 2 cm and 30 cm. $G$ is the resulting 2.5D grid which will be further used for object detection.
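To make the construction concrete, the following minimal Python sketch builds such a grid from a point cloud and applies the test of (1). The grid extent and resolution match Section II-B; the function name and array layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_25d_grid(points, res=0.4, x_range=(-20.0, 40.0), y_range=(-20.0, 20.0),
                   tr_sigma=0.02, tr_mu=0.30):
    """Build a 2.5D elevation grid from a LiDAR point cloud, per Eq. (1).

    points: (N, 3) array of (x, y, z) in the vehicle frame.
    Cells whose height variance and mean height both fall below the
    thresholds of [1] are treated as ground and left at 0.
    """
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    G = np.zeros((nx, ny))

    # Map each point to a cell index and keep the in-grid points
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, z = ix[valid], iy[valid], points[valid, 2]

    # Per-cell mean and variance of the height, then Eq. (1)
    for i, j in set(zip(ix.tolist(), iy.tolist())):
        heights = z[(ix == i) & (iy == j)]
        mu, sigma2 = heights.mean(), heights.var()
        if not (sigma2 < tr_sigma and mu < tr_mu):
            G[i, j] = mu
    return G
```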
III. FROM AN EVIDENTIAL GRID TO THE OBJECT LEVEL
A. Modeling an Evidential Grid
Extending probability theory, Belief Theory offers an adequate representation of data and source imperfections and is thus appropriate for perception in ITS. It offers a wide range of fusion operators handling these properties according to the application.
In this work, the solution from [7] is adapted to the 2.5D grid. Moras et al. suggest an approach based on the conflict appearing during the temporal grid fusion for mobile object detection and navigable space determination. For that, a frame of discernment is defined to include the states of a cell, considering it to be Free (F) or Occupied (O). The frame of discernment is then Ω = {F, O}.

Fig. 1. Dynamic object detection with an evidential 2.5D grid.

Fig. 3. Polar representation of an occupancy map according to LiDAR data, where R is the range of a cell and θ its angular sector [14].
The referential power set contains all possible combinations of the discernment frame hypotheses: $2^\Omega = \{\emptyset, F, O, \{F, O\}\}$. To express the belief in each state, a mass function $m(\cdot)$ is defined to respectively express the conflict $m(\emptyset)$, the Free state $m(F)$, the Occupied state $m(O)$ and the unknown state $m(\{F, O\})$.
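As a small illustration (a hypothetical encoding, not prescribed by the paper), a cell's basic belief assignment over this power set can be stored as a fixed-order mapping whose masses sum to one:

```python
# Power set of Omega = {F, O}: {empty, F, O, {F, O}}
# One cell's basic belief assignment (bba); the masses must sum to 1.
cell_bba = {"empty": 0.0, "F": 0.7, "O": 0.0, "FO": 0.3}
assert abs(sum(cell_bba.values()) - 1.0) < 1e-9
```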
B. Inverse Sensor Model

Basically, a sensor model describes how the mass function of a state is calculated from a measure. This basic belief assignment (bba) also encodes the reliability of the source. In this application, the considered sensor is a 3D multi-echo LiDAR provided by Velodyne. The input data include the ranges $r_i$ and angles $\theta_i$ of each laser beam or point $p_i$, as shown in Fig. 3.

According to this set of data, a Scan Grid (SG) in polar coordinates is constructed. Each row of this SG corresponds to an angular sector $\Theta = [\theta_-, \theta_+]$ for which a cell is defined in $R \times \Theta$. The range of a cell is $R = [r_-, r_+]$, which means that each cell is defined by a pair on which a mass $m_{\Theta,R}$ is attributed. The masses corresponding to each proposition $A \in 2^\Omega$ are given by [7]:
$$m_{\Theta,R}(\emptyset) = 0 \qquad (2)$$

$$m_{\Theta,R}(O) = \begin{cases} 1-\mu_F & \text{if } r_i \in R \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

$$m_{\Theta,R}(F) = \begin{cases} 1-\mu_O & \text{if } r_+ < \min(r_i) \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$m_{\Theta,R}(\Omega) = \begin{cases} \mu_F & \text{if } r_i \in R \\ \mu_O & \text{if } r_+ < \min(r_i) \\ 1 & \text{otherwise} \end{cases} \qquad (5)$$

where $\mu_F$ and $\mu_O$ respectively correspond to the probability of false alarm and the probability of missed detection of the sensor. For simplicity, these mass functions will be noted $m(\emptyset)$, $m(O)$, $m(F)$ and $m(\Omega)$.
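The following Python sketch applies (2)-(5) to one angular sector of the Scan Grid. The values of µF and µO, the cell layout and the mass ordering [m(∅), m(F), m(O), m(Ω)] are assumptions made for illustration.

```python
import numpy as np

def scan_grid_masses(ranges, r_edges, mu_F=0.1, mu_O=0.1):
    """Mass assignment for one angular sector of the SG, per Eqs. (2)-(5).

    ranges:  LiDAR ranges r_i falling inside this angular sector.
    r_edges: (n_cells + 1,) radial cell boundaries [r-, r+] per cell.
    mu_F, mu_O: false-alarm and missed-detection probabilities (assumed).

    Returns an (n_cells, 4) array of masses [m(empty), m(F), m(O), m(Omega)].
    """
    n = len(r_edges) - 1
    m = np.zeros((n, 4))
    r_min = ranges.min() if len(ranges) else np.inf

    for c in range(n):
        r_lo, r_hi = r_edges[c], r_edges[c + 1]
        hit = np.any((ranges >= r_lo) & (ranges < r_hi))  # some r_i in R
        if hit:                    # an impact inside the cell: occupied, Eq. (3)
            m[c, 2] = 1.0 - mu_F
            m[c, 3] = mu_F
        elif r_hi < r_min:         # cell fully before the first impact: free, Eq. (4)
            m[c, 1] = 1.0 - mu_O
            m[c, 3] = mu_O
        else:                      # behind the impact: unknown, Eq. (5)
            m[c, 3] = 1.0
        # m(empty) stays 0, per Eq. (2)
    return m
```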
C. Combination of Evidential Grids
The construction of a SG is done sequentially to translate the sensor's data. However, the temporal propagation of the knowledge and uncertainties provided by every point cloud requires a fusion process between the current SG and the result of the previous fusion. The complete description of the environment resulting from such a combination provides a Map Grid (MG). This update allows detecting consistencies in the data as well as some cases of conflict. Fig. 4 illustrates the process of building and updating a MG using the sensor point cloud provided at a time t. It is the outcome of a combination of a SG built at t according to (2)-(5) and a transformed MG built at t−1.

Fig. 4. Map Grid Construction.
The grid transformation is applied with respect to the new pose of the vehicle at time t in order to guarantee that the information is expressed in the current coordinate system of the vehicle. This operation is realized by a spatial transformation which associates new coordinates to each cell. Algorithm 1 describes the approach.
Algorithm 1 Grid Transformation to new vehicle coordinates
Require: Previous Map Grid MG_{t−1}, rotation matrix R, translation vector T.
Ensure: Build a transformed Map Grid MG_{t−1,tr}
  Initialize the MG_{t−1,tr} cells with m_{MG_{t−1,tr}}(Ω) = 1
  for each cell with index (p, q) do
    Apply the change of coordinates (p′, q′) = R × (p, q) + T
    Calculate the new indices (p_new, q_new) = min(|ceil(p′, q′)|, |floor(p′, q′)|) × sign(p′, q′)
    if (p_new, q_new) lies inside the grid then
      MG_{t−1,tr}(p_new, q_new) = MG_{t−1}(p, q)
    end if
  end for
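A possible (non-optimized) Python rendering of Algorithm 1 is sketched below. The explicit in-grid bounds test and the mass layout [m(∅), m(F), m(O), m(Ω)] are assumed conventions rather than the authors' code.

```python
import numpy as np

def transform_map_grid(MG_prev, R, T):
    """Sketch of Algorithm 1: re-express MG_{t-1} in the new vehicle frame.

    MG_prev: (nx, ny, 4) mass grid [m(empty), m(F), m(O), m(Omega)].
    R, T: 2x2 rotation matrix and 2-vector translation, in cell coordinates.
    Cells that map outside the grid are left unknown.
    """
    nx, ny, _ = MG_prev.shape
    MG_tr = np.zeros_like(MG_prev)
    MG_tr[:, :, 3] = 1.0                      # initialize with m(Omega) = 1

    for p in range(nx):
        for q in range(ny):
            # Change of coordinates, then rounding toward zero as in Algorithm 1
            pq = R @ np.array([p, q], dtype=float) + T
            new = np.sign(pq) * np.minimum(np.abs(np.ceil(pq)),
                                           np.abs(np.floor(pq)))
            p_new, q_new = int(new[0]), int(new[1])
            if 0 <= p_new < nx and 0 <= q_new < ny:
                MG_tr[p_new, q_new] = MG_prev[p, q]
    return MG_tr
```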
This update is done according to an evidential multi-grid
fusion. This is the crucial point of the grid-based object
detection process as it allows the temporal update of the map
grid and also the evaluation of the state of cells. Among
the various operators in Belief Theory, Dempster’s rule of
combination is used:
$$m_{MG_t} = m_{MG_{t-1},tr} \oplus m_{SG_t} \qquad (6)$$

where $m_{MG_{t-1},tr}$ and $m_{SG_t}$ are respectively the mass functions of the transformed MG and of the SG at time $t$. The operator is defined as:

$$(m_1 \oplus m_2)(A) = K \sum_{\substack{B, C \in 2^\Omega \\ B \cap C = A,\ A \neq \emptyset}} m_1(B)\, m_2(C) \qquad (7)$$

where

$$K^{-1} = 1 - \sum_{\substack{B, C \in 2^\Omega \\ B \cap C = \emptyset}} m_1(B)\, m_2(C) \qquad (8)$$
The resulting masses $m_{MG_t}(A)$ define the state of each cell, which depends on the previous state and the new measures. The resulting masses for each state are found as follows [7]:

$$\begin{aligned}
m_{MG_t}(O) &= m_{SG_t}(O)\, m_{MG_{t-1},tr}(O) + m_{SG_t}(\Omega)\, m_{MG_{t-1},tr}(O) + m_{SG_t}(O)\, m_{MG_{t-1},tr}(\Omega)\\
m_{MG_t}(F) &= m_{SG_t}(F)\, m_{MG_{t-1},tr}(F) + m_{SG_t}(\Omega)\, m_{MG_{t-1},tr}(F) + m_{SG_t}(F)\, m_{MG_{t-1},tr}(\Omega)\\
m_{MG_t}(\Omega) &= m_{SG_t}(\Omega)\, m_{MG_{t-1},tr}(\Omega)\\
m_{MG_t}(\emptyset) &= m_{SG_t}(O)\, m_{MG_{t-1},tr}(F) + m_{SG_t}(F)\, m_{MG_{t-1},tr}(O)
\end{aligned} \qquad (9)$$

with $m_{MG_t}(\emptyset)$ being the combined mass expressing the conflict. Basically, this property shows the discordance between the knowledge expressed at $t-1$ and at $t$. A conflict appears when a cell changes its state from F to O or vice versa. Therefore, the detection of this conflict can lead to the evaluation of the dynamic cells. The conflict allows labeling the occupied cells which change their state, according to two types of conflict:

$$\begin{aligned}
C_1 &= m_{SG_t}(O)\, m_{MG_{t-1},tr}(F) \quad &\text{from } F \text{ to } O\\
C_2 &= m_{SG_t}(F)\, m_{MG_{t-1},tr}(O) \quad &\text{from } O \text{ to } F
\end{aligned} \qquad (10)$$

where $m_{MG_t}(\emptyset) = C_1 + C_2$.
Dempster's operator implies a normalization of the conflict at fusion, owing to the absorbing property of $\emptyset$: if the conflict were kept in the next combination, $m_{MG_t}(\emptyset)$ would increase at each fusion and induce a loss of information. Therefore, the updated grid contains no conflict; the conflict is only preserved separately to classify the mobile cells to be studied for dynamic object extraction.
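As a sketch of this cell-wise update, the following Python function evaluates (9), extracts the partial conflicts of (10), and then normalizes so that the stored grid carries no conflict. The grid layout is the same assumed convention as in the previous sketches.

```python
import numpy as np

def fuse_grids(SG, MG_tr):
    """Cell-wise fusion of the Scan Grid with the transformed Map Grid.

    SG, MG_tr: (nx, ny, 4) mass grids [m(empty), m(F), m(O), m(Omega)].
    Returns the fused Map Grid (conflict removed by normalization, as with
    Dempster's rule) plus the partial conflicts C1 and C2 of Eq. (10).
    """
    sF, sO, sU = SG[..., 1], SG[..., 2], SG[..., 3]
    gF, gO, gU = MG_tr[..., 1], MG_tr[..., 2], MG_tr[..., 3]

    # Unnormalized conjunctive combination, Eq. (9)
    mO = sO * gO + sU * gO + sO * gU
    mF = sF * gF + sU * gF + sF * gU
    mU = sU * gU
    C1 = sO * gF                 # Free -> Occupied (newly occupied cells)
    C2 = sF * gO                 # Occupied -> Free
    conflict = C1 + C2

    # Normalize so the stored grid carries no conflict (absorbing property)
    K = 1.0 - conflict
    K = np.where(K > 0, K, 1.0)  # guard against total conflict
    MG = np.zeros_like(SG)
    MG[..., 1], MG[..., 2], MG[..., 3] = mF / K, mO / K, mU / K
    return MG, C1, C2
```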
D. Clustering for Dynamic Object Detection
In order to attain the object level representation from the
mobile cells, a clustering is applied to group those cells related
to the same object in the grid. For that, the partitioning method
must be unsupervised considering that the number of objects to
be found is unknown. The well-known Density-Based Spatial
Clustering of Applications with Noise (DBSCAN) algorithm is
used [15]. It is based on the estimated density of measures for partitioning clusters. This algorithm uses two main parameters: the neighborhood radius ε and the minimum number of points minPts which must reside within that radius for a point to be included in a cluster. DBSCAN is convenient because it is simple and can handle aberrant or noisy values while clustering. However, it can have some issues when clusters have different densities.
Fig. 5. Appearance of conflict due to the displacement of objects. Conflict C1 informs about newly occupied cells whereas C2 describes transitions from occupied to free.
The clustering algorithm is applied to a set of cells which must be occupied (i.e. have non-zero elevation), and the conflict $m_{MG_t}(\emptyset)$ is later used to classify the resulting clusters. The partial conflict C2 informs about cells changing state from occupied to free at time t. The cells affected by this conflict do not belong to a given object and hence provide no knowledge about the object's presence. That is why it is natural to consider only the partial conflict C1 to determine the location of the object at time t. However, exclusively clustering the C1-labeled cells is not informative enough to obtain a complete representation of the shape of a dynamic object: on the grid, the displacement of an object is only visible at its perimeter, so the conflict is mostly located at the boundaries of objects, as shown in Fig. 5. For that reason, clustering is applied to detect both static and dynamic objects according to the elevation measure on the 2.5D grid. Afterwards, these clusters are classified according to whether they partially contain conflicting cells, as shown in Fig. 1. A sketch of this step is given below.
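The following minimal sketch uses scikit-learn's DBSCAN on the occupied cells and the C1 conflict for labeling. The parameter values anticipate Section IV; the function and threshold names are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_objects(G, C1, eps=5.0, min_pts=4, elev_thr=0.0, conflict_thr=0.0):
    """Cluster occupied cells, then keep clusters containing C1-conflict cells.

    G:  2.5D elevation grid; a cell is occupied when G > elev_thr.
    C1: per-cell Free -> Occupied conflict from the grid fusion.
    eps, min_pts: DBSCAN parameters (values used in Section IV).
    """
    cells = np.argwhere(G > elev_thr)                 # occupied cell indices
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(cells)

    dynamic = []
    for lab in set(labels) - {-1}:                    # -1 marks noise
        members = cells[labels == lab]
        # A cluster is classified as dynamic if some of its cells carry C1
        if np.any(C1[members[:, 0], members[:, 1]] > conflict_thr):
            dynamic.append(members)
    return dynamic
```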
IV. EXPERIMENTAL RESULTS
The presented approach is applied to real data and has been
tested offline. The validation is done at the grid-level as well
as the object-level according to the ground truth (GT) for
qualitative and quantitative evaluation.
A. Dataset and Performance Evaluation
The data used are extracted from the KITTI database [16]. It is a widely used dataset in autonomous driving research as it provides a large set of images, GPS/IMU recordings and raw laser scans as well as labeled scenes. In this study, sequence 17 of the raw data set is used, considering that GT annotations of the detected objects as well as the vehicle pose are available. A total of 114 frames are used, of which 59 contain annotated moving cars. The input data for this approach are the point clouds recorded by a Velodyne HDL-64, characterized by 64 horizontal layers, a 360° horizontal field of view and a 26.9° vertical field of view. The GPS data are also used to obtain the vehicle's pose. The images are not exploited for this application but are used for visualization.
3D object detection benchmarks offer various criteria for performance evaluation purposes. The most common measure is the precision, i.e. the proportion of all examples above the rank which are from the positive class [17]:

$$Precision = \frac{TP}{TP + FP} \qquad (11)$$

where $TP$ and $FP$ respectively stand for True Positives and False Positives.
This metric is calculated according to the overlap of candidate detections with the GT. For the computation, the bounding box of a detected object is compared to the GT bounding box. A detection is considered correct (True) when its overlap area $a_o$ exceeds 50%, calculated as follows:

$$a_o = \frac{area(B_p \cap B_{gt})}{area(B_p \cup B_{gt})} \qquad (12)$$

where $B_p$ and $B_{gt}$ are respectively the candidate bounding box and the GT bounding box.
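Both measures are straightforward to compute. A minimal sketch for axis-aligned boxes follows; the corner-coordinate box encoding is an assumption for illustration.

```python
def overlap_ratio(bp, bgt):
    """Axis-aligned overlap a_o of Eq. (12); boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(bp[2], bgt[2]) - max(bp[0], bgt[0]))  # intersection width
    iy = max(0.0, min(bp[3], bgt[3]) - max(bp[1], bgt[1]))  # intersection height
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(bp) + area(bgt) - inter
    return inter / union if union > 0 else 0.0

def precision(n_tp, n_fp):
    """Precision of Eq. (11)."""
    return n_tp / (n_tp + n_fp)

# A detection counts as a True Positive when overlap_ratio(...) > 0.5
```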
Fig. 6. Top: Frame 40 of Sequence 17, Middle: The corresponding 2.5D grid,
Bottom: Comparison of the detection results on frame 40 with the ground
truth.
B. Results
In this section, the perception results are illustrated according to the 2.5D grid, the evidential occupancy measures and the detected objects. The results are compared with the available GT objects.
Fig. 6 shows an example of the results together with the image captured by the front-facing camera. The corresponding 2.5D grid below the image expresses the average height of elevated objects in the scene. In this view, the car's position is approximately x = 50, y = 50, heading to the right. It can be seen that this map contains voxels describing moving objects (2 cars) as well as static ones like numerous traffic signs or static vehicles behind the ego-car. The grid fusion allows determining which of these voxels belong to dynamic objects. The bottom figure exclusively shows the dynamic objects found in the range of the camera view. It provides a comparison between the bounding boxes resulting from the fusion and clustering process and the GT bounding boxes. We chose not to display the objects detected behind the ego-vehicle since no annotations are available in the GT.
The position of these objects is found according to the evaluation of the conflict in the corresponding MG. The conflict C1 allows observing the cells changing state from Free to Occupied, and the voxels which contain a non-zero value of C1 are grouped to define the moving objects. Considering that only cars are detected in this sequence, the parameters of DBSCAN are minPts = 4 and ε = 5 in grid coordinates. This algorithm is advantageous for this application because it discards measures which can be considered noisy, which optimizes the number of relevant clusters. The extracted clusters are labeled according to bounding boxes containing the exact number of LiDAR points belonging to the cluster. The use of 3D bounding boxes allows evaluating the results according to the known locations of the GT. The total average precision (AP) is found to be 91.23%, with the overlap distribution illustrated in Fig. 7. It can be noticed that most detected objects overlap with true objects at a rate varying between 65% and 90%. Note that the number of false alarms is very low due to the property of the clustering algorithm, which only considers dense groups of measurements; noisy or distant data do not belong to any object.

Fig. 7. Overlap of the detected objects with the ground truth, showing the rate above which a detection is eligible.
V. CONCLUSION
The approach presented in this paper aims at the detection of multiple objects based on LiDAR data according to an evidential 2.5D grid. The main contributions of this paper are the use of an evidential elevation map and the evaluation of conflict for the determination of mobile objects. Another contribution is the clustering applied to obtain an object-level representation for tracking purposes. The detection of dynamic objects is evaluated against the ground truth given by a set of annotations of the KITTI dataset, and the approach is shown to be efficient according to its high average precision. A first perspective for future work is the tuning of the clustering algorithm parameters in order to detect multiple classes of objects. Extending these first results by testing the approach on more complex scenarios including occluded objects is a second perspective. Furthermore, a comparative study of this work with state-of-the-art results will be performed.
ACKNOWLEDGMENT
The authors gratefully acknowledge the financial support from Fondation Wallach (Mulhouse) in the context of the Project SIMPHA (Solution Innovantes pour la Mobilité individualisée et durable des séniors et Personnes présentant un Handicap).
REFERENCES
[1] A. Asvadi, P. Peixoto, and U. Nunes, “Detection and
tracking of moving objects using 2.5d motion grids,” in
18th International Conference on Intelligent Transporta-
tion Systems, Las Palmas, Spain, 2015.
[2] A. Elfes, “Using occupancy grids for mobile robot per-
ception and navigation,” Computer, vol. 22, no. 6, pp.
46–57, Jun. 1989.
[3] C. Coué, C. Pradalier, C. Laugier, T. Fraichard, and
P. Bessiere, “Bayesian Occupancy Filtering for Multitar-
get Tracking: an Automotive Application,” International
Journal of Robotics Research, vol. 25, no. 1, pp. 19–30,
Jan. 2006.
[4] A. Broggi, S. Cattani, M. Patander, M. Sabbatelli, and
P. Zani, “A full-3d voxel-based dynamic obstacle detec-
tion for urban scenario using stereo vision,” in 16th Inter-
national IEEE Conference on Intelligent Transportation
Systems (ITSC), Oct 2013, pp. 71–76.
[5] K. Mekhnacha, Y. Mao, D. Raulo, and C. Laugier,
“The “fast clustering-tracking” algorithm in the bayesian
occupancy filter framework,” in IEEE International Con-
ference on Multisensor Fusion and Integration for Intel-
ligent Systems, Aug 2008, pp. 238–245.
[6] D. Pagac, E. M. Nebot, and H. F. Durrant-Whyte, “An
evidential approach to map-building for autonomous ve-
hicles,” IEEE Transactions on Robotics and Automation,
vol. 14, no. 4, pp. 623–629, 1998.
[7] J. Moras, V. Berge-Cherfaoui, and P. Bonnifait, “Moving
objects detection by conflict analysis in evidential grids,”
in IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 2011, pp. 1122–1127.
[8] J. Moras, V. Cherfaoui, and P. Bonnifait, “Credibilist
occupancy grids for vehicle perception in dynamic envi-
ronments,” in IEEE International Conference on Robotics
and Automation, May 2011, pp. 84–89.
[9] M. Kurdej, J. Moras, V. Cherfaoui, and P. Bonnifait,
“Controlling remanence in evidential grids using geodata
for dynamic scene perception,” International Journal of
Approximate Reasoning, vol. 55, no. 1, Part 3, pp. 355–
375, 2014.
[10] R. Danescu, F. Oniga, and S. Nedevschi, “Modeling
and tracking the driving environment with a particle-
based occupancy grid,” IEEE Transactions on Intelligent
Transportation Systems, vol. 12, no. 4, pp. 1331–1342,
Dec 2011.
[11] G. Tanzmeister and D. Wollherr, “Evidential grid-based
tracking and mapping,” IEEE Transactions on Intelligent
Transportation Systems, vol. 18, no. 6, pp. 1454–1467,
June 2017.
[12] J. Honer and H. Hettmann, “Motion state classification for automotive lidar based on evidential grid maps and transferable belief model,” in 21st International Conference on Information Fusion (FUSION), Oxford, U.K., Jul. 2018.
[13] L. Wang and Y. Zhang, “Lidar ground filtering algorithm
for urban areas using scan line based segmentation,”
Computing Research Repository, vol. abs/1603.00912,
2016.
[14] K. Jo, S. Cho, C. Kim, P. Resende, B. Bradai,
F. Nashashibi, and M. Sunwoo, “Cloud update of tiled
evidential occupancy grid maps for the multi-vehicle
mapping,” Sensors, vol. 18, no. 12, 2018.
[15] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-
based algorithm for discovering clusters in large spatial
databases with noise.” AAAI Press, 1996, pp. 226–231.
[16] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision
meets robotics: The KITTI dataset,” International Journal
of Robotics Research (IJRR), 2013.
[17] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn,
and A. Zisserman, “The pascal visual object classes (voc)
challenge,” International Journal of Computer Vision,
2010.