A Probabilistic Representation of LiDAR Range Data for Efficient 3D Object
Detection
Theodore C. Yapo1∗, Charles V. Stewart2, and Richard J. Radke1
1Department of Electrical, Computer, and Systems Engineering
2Department of Computer Science
Rensselaer Polytechnic Institute, Troy, New York 12180
yapot@rpi.edu, stewart@cs.rpi.edu, rjradke@ecse.rpi.edu
Abstract
We present a novel approach to 3D object detection in
scenes scanned by LiDAR sensors, based on a probabilistic
representation of free, occupied, and hidden space that extends the concept of occupancy grids from robot mapping algorithms. This scene representation naturally handles LiDAR sampling issues, can be used to fuse multiple LiDAR
data sets, and captures the inherent uncertainty of the data
due to occlusions and clutter. Using this model, we formulate a hypothesis testing methodology to determine the
probability that given 3D objects are present in the scene.
By propagating uncertainty in the original sample points,
we are able to measure confidence in the detection results
in a principled way. We demonstrate the approach in examples of detecting objects that are partially occluded by scene clutter such as camouflage netting.
1. Introduction
Light Detection and Ranging (LiDAR) scanners use
time-of-flight measurements of narrow beams of laser light to produce estimates of the locations of 3D points in a scene.
The resolution of commercially available LiDAR scanners
can be very good, achieving an accuracy of a few mm at
100 m range [13]. However, unlike digital image sensors that use an optical low-pass filter to prevent the aliasing of high spatial frequencies in the scene, LiDAR sensors are very susceptible to sampling artifacts, as illustrated in Figure 1. For example, if the samples are too far apart, a LiDAR scan of a picket fence might be interpreted as a solid
wall. Conversely, if a solid wall is sampled at a shallow
grazing angle by nearly parallel LiDAR rays, it can be difficult to connect the distant sample points into a single surface. Hence, even though each range point is measured with high accuracy, there can still be quite a bit of uncertainty about the scene in each LiDAR scan.

∗This work was supported in part by the US Army Intelligence and Security Command under award W9124Q04F2159, and by the DARPA Computer Science Study Group under award HR00110710016.
Occlusions in the scene introduce a second source of uncertainty into LiDAR range data. Objects may be wholly
or partially hidden from the point of view of the scanner,
resulting in uncertainty about their presence or position in
the scene. To deal with this issue effectively, a 3D object
detection algorithm must allow fusion of data taken from
different viewpoints, and model occlusion explicitly, noting
what parts of the scene are visible from each viewpoint.
Much previous research on analyzing LiDAR data is
based on generating a 3D model of the scene, either reducing the data to a polygonal model or, in some cases, producing an implicit function representation of the scene surfaces. Instead of irrevocably collapsing information about the scene into a likely “crisp” estimate, we propose to preserve the inherent uncertainty of the original data when testing hypotheses against the scene using a probabilistic framework.
We propose a discrete scene data structure to maintain a
probabilistic model of the 3D scene, and provide a natural
and tractable means to update this model that properly handles LiDAR sampling issues. The scene data structure is fundamentally a site occupancy probability model, extending the concept of occupancy grids from robotics [15]. We approximate the scene by a set of random fields that describe the probabilities that any single site (3D voxel) is in one of three states: free space, occupied, or hidden. This approach provides a sound basis for fusing data from disparate
sensors that observe the scene from different viewpoints.
While we believe that the precision of available LiDAR
sensors far surpasses that required for reliable object detection (since most objects of interest in outdoor scenes are very large relative to the uncertainty of a single LiDAR return), we cannot scan the scene with fewer LiDAR points
without exacerbating the undersampling and aliasing problems.

Figure 1. Illustrating sources of uncertainty in LiDAR data. The black dot represents the LiDAR sensor, the black lines LiDAR rays, the gray lines object surfaces, and the white dots LiDAR returns. LiDAR is susceptible to (a) undersampling of objects with holes, (b) poor sampling of surfaces due to shallow grazing angle, and (c) unsampled space resulting from occluding surfaces.

To address these issues, we propose a downsampling
technique to reduce data storage requirements and allow efficient hypothesis testing, while not glossing over important
LiDAR sampling issues.
Our initial results are based on range data alone, but we
note that our proposed data structure allows for the fusion
of data from sensors of different modalities, such as digital
images. Using our probabilistic model, 3D inferences from
stereo pairs or single images could be used to “fill in” regions that are ambiguous or occluded from the viewpoint
of the LiDAR scanner. Since image sensors typically have
much higher resolution than their LiDAR counterparts, this
combination could also be used to reduce uncertainty arising from LiDAR undersampling in the scene.
Finally, we describe how 3D hypothesis tests related
to object detection can be easily performed on the data
structure, using an efficient method based on the 3D Fast
Fourier Transform. We demonstrate our results on an example where objects are hidden behind a camouflage net
that makes the detection problem in raw data difficult.
2. Related Work
Some of the earliest models for representing range data
used an octree with each leaf representing one of three unambiguous states: empty, occupied, or unseen [4]. Matthies
and Elfes [15] initially proposed maintaining probabilities
in a 2D structure called an occupancy grid for mapping the
local environment of an autonomous robot. In the original approach, each cell in the grid was assigned a single value, Pocc, representing the probability that the site was occupied; initial values for each cell were set to 1/2, and the probabilities were updated using a Bayesian rule
[16]. Subsequent methods [8, 19] assigned two evidence
masses, one for empty (me) and one for full (mf), such that me + mf ≤ 1, to capture some notion of uncertainty, and applied Dempster-Shafer theory [21] to update the grid.
Most approaches have treated the collection of cells as
a Markov Random Field (MRF) of order 0 [5], i.e., assuming that the occupancy probabilities of neighboring cells are
independent, greatly simplifying the occupancy probability estimation procedure. In contrast, Thrun [22] used a forward model of the sensor and an expectation-maximization
algorithm to explicitly model the conditional dependencies
of cells within a sensor sweep area. Jang et al. [10] clustered cells into sets using neural networks and modeled the conditional probabilities within these clusters. Ribo and Pinz [20] compared three types of models for updating the grid: probabilistic (using the Bayes update rule), Dempster-Shafer (using the Dempster rule of combination), and “possibilistic” (using fuzzy logic). Gorodnichy and Armstrong
[7, 8] formulated the update rule as a regression. Matthies
and Elfes [15] illustrated the importance of an update rule
that is both commutative and associative, allowing data to
be fused in any order.
Although much of the previous work on occupancy grids
has been limited to 2D representations, several researchers
have extended the notion to 3D scenes. One of the most
important issues is avoiding a computationally intractable
increase in the number of cells. Payeur et al. [19] described
an octree representation in which spherical occupancy grids
are computed around each sensor to accurately model the
sensor characteristics; these are then merged into a common
Cartesian grid. Several schemes for storing functional representations of occupancy fields have also been proposed.
Payeur [18] described one such method in which a continuous spatial potential force is computed from the occupancy probabilities of each cell, allowing a continuous estimation of the field. Gorodnichy and Armstrong [8] proposed a parametric representation using a min-max tree of piecewise linear functions. Yguel et al. [24] proposed a 2D
wavelet-based occupancy field, exploiting the continuous and natural multiresolution properties of the wavelet decomposition. Paskin and Thrun [17] proposed using polygonal random fields for probabilistic mapping of 2D scenes, avoiding discretization errors and independence assumptions associated with occupancy grids. However, we note that most of these representations do not easily lend themselves to object recognition.
Once the data has been fused into a single model, hypotheses (e.g., related to the presence of objects) can be
tested against it. One class of basic hypotheses concerns the
detection or recognition of known 3D object models in the
scene. Campbell and Flynn [3] give a good overview of the
basic approaches to 3D object recognition. Among these,
the most relevant to scenes containing heavy clutter and intentional occlusions (camouflage) are “point-feature” based methods. In these algorithms, a vector descriptor is calculated for a subset of points in the scene. These descriptors
are then matched with those calculated from object models
to detect and recognize objects in the scene. These methods
can be robust to scene clutter and occlusions provided that
initial local surface normals can be estimated.
Johnson and Hebert [11, 12] devised a point-feature-based method known as “spin images”. In this approach,
a local surface normal is first estimated for each point in the
scene to be tested. A 2D histogram is then rotated around
this normal, creating a “spin image” representative of the
local surface features of the object. The resulting image
serves as a point descriptor which can then be matched with
descriptors from object models.
Frome et al. [6] proposed two point-feature techniques
and compared their performance to spin images. The first
method extended the concept of shape contexts (Belongie et al. [1]) into 3D. In this method, a local surface normal is first estimated for each feature point. A spherical histogram of neighboring points is then calculated as the feature descriptor. A spherical harmonic transformation is then applied to this histogram to create a “harmonic shape context”. Their
results indicate better recognition rates than spin images.
Several researchers have applied these techniques to
more real-world recognition problems. Vasile and Marino [23] developed an automatic target recognition algorithm for twelve vehicle classes in relatively unoccluded LiDAR data based on spin images [12]. Huber et al. [9] described a parts-based classification algorithm for vehicles, applied to
isolated recognition in the absence of clutter.
In these point-feature approaches, a local surface representation (either polygonal meshes or oriented points) is first estimated from the range points, discarding much of the uncertainty in the original data. For example, to generate and match spin images, one must accurately estimate surface normals at each 3D scene point to define spin axes. In
contrast, our method assumes no a priori model of the scene
or object structure, and was designed for datasets where
millions of range points acquired from sensors at many different viewpoints are fused into a common frame. Real outdoor examples of this type of data include “fuzzy” regions like trees, large hidden regions, and sparsely-sampled areas arising from rays nearly parallel to object surfaces. Generating reliable surface and normal estimates in such cases
would be difficult. However, as we describe below, these
cases are naturally handled by occupancy modeling.
Our proposed object detection method uses FFT-based correlation to efficiently compute joint probabilities of independent voxels. In this way, it is similar to FFT-based registration methods such as that presented by Lucchese et al. [14], which uses a correlation technique to estimate translational parameters in range data registration once rotational parameters have been estimated.
3. Probabilistic Occupancy Model
We assume the viewpoint for each LiDAR return is
known; this is used to generate a model representing free
and hidden space as well as regions occupied by solid objects. However, we do not require that the range points have
a particular structure, such as lying on an angular grid; in
fact, each range point can be taken from a different (known)
viewpoint. Since each LiDAR point is considered independently, we treat estimation of occupancy probabilities as a
sensor fusion problem even for data points obtained from
a single sensor at a fixed position. This approach naturally
scales to handle multiple LiDAR data sets that can be registered into a common world coordinate system.
Figure 2 illustrates our representation of both the scene
space and objects that may be located within it. The scene
space ideally comprises three disjoint sets: (1) the occupied
points, which have been seen by one or more LiDAR sensors, (2) the hidden points, which have been obscured from all sensors, and (3) the free-space points, which have been “seen through” by at least one sensor. Also shown in Figure 2 is our representation of objects to be detected in the scene, as will be described in Section 4. Again, we consider three disjoint sets of points: (1) points on the object surface, which could possibly be seen by a LiDAR sensor, (2) points on the object’s topological interior, which are always self-occluded by the object, and hence can never be seen,
and (3) points exterior to the object, which may either be
unoccupied, or contain part of a different object.
Using these models of the scene and object spaces, we
can list the constraints that must be satisfied for an object to
exist in the scene, as summarized in Table 1. The object’s
surface points must coincide with occupied scene points,
and its interior points must coincide with hidden points.
Conversely, points in an object’s surface or interior cannot coincide with scene space known to be empty. Hidden-space points may or may not be consistent with an object’s surface being present; we address this issue in Section 4. We finally note that scene points exterior to the object have no bearing
on detection.
Our model for spatial occupancy of the scene is a set of
random fields, in which we assign probabilities of being in
each state independently to points in the scene, extending
the ideas of [5]. For any point x in the scene space, we
consider the probabilities that x is in one of the three states:
occupied, hidden, or free, as illustrated in Figure 2:
Pocc(x) + Phid(x) + Pfree(x) = 1.   (1)
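As a concrete sketch of the representation behind Eq. (1), each voxel can carry three state probabilities that sum to one. The array layout, function name, and channel order [occ, hid, free] here are our own choices, not the paper's; the 1/3 prior anticipates the estimation rule described later in this section.

```python
import numpy as np

def make_uninformed_grid(shape):
    """Create a voxel grid in which every site starts at the
    noninformative prior P_occ = P_hid = P_free = 1/3.
    Last axis holds the three state probabilities [occ, hid, free]."""
    return np.full(tuple(shape) + (3,), 1.0 / 3.0)

grid = make_uninformed_grid((4, 4, 4))
# Eq. (1): the three state probabilities sum to one at every voxel.
assert np.allclose(grid.sum(axis=-1), 1.0)
```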
Figure 2. Six spaces of interest for constructing probability density functions of scene occupancy. (a) Representation of the scene. Each
point in space is assumed to be in one of three states: occupied, hidden, or free. (b) Representation of an object. The object model is represented by three sets of points: surface, topological interior, and exterior.
Figure 3. A single LiDAR return provides information about scene structure at every point along the view ray.
              Object Space
Scene Space   surface   interior   exterior
occupied      √         ×          –
free          ×         ×          –
hidden        ?         √          –

Table 1. Constraints imposed by scene and object space states. “√” indicates a required coincidence, “×” indicates mutual exclusivity, “?” indicates possible consistency, and “–” indicates no information gain.
We note the distinction between this model and previous work, which has either modeled site occupancy with a single probability [15] such that Pocc + Pfree = 1, or with two evidence masses such that mocc + mfree ≤ 1
[8]. In our model, we consider the “hidden” state to be a distinct third possibility, not simply a lack of knowledge about the occupied or free state of the point, thereby accounting for points that are inside solid objects. In doing so, we allow a resolution to the problem of disambiguating unknown states from conflicting evidence [8]. For example, in single-probability models [15], unseen areas of space are assigned the noninformative prior Pocc = Pfree = 1/2, but there is no way to distinguish this from a completely observed site where equal, but conflicting, evidence exists for either of the two states.
Although Dempster-Shafer approaches [16] resolve this ambiguity by setting initial masses to mocc = mfree = 0, they do not capture the important representation of hidden or occluded space. By
making the third state explicit, we maintain true probabilities at each point, and can test hypotheses that the random
field model is in a state consistent with the presence of an
object in a principled way. Moreover, by maintaining a notion of hidden space, we enable more complex hypotheses involving the interior space of objects, as well as their surfaces, to be tested. These tests are described in Section 4.
For each voxel in the grid, we wish to estimate the three
probabilities, Pocc, Pfree, and Phid. Since we assume the accuracy of each LiDAR point is much greater than the resolution of the voxel grid, we consider only the number of LiDAR returns that fall into each voxel for occupancy probability calculation. Likewise, for estimating Pfree and Phid, we consider the number of view rays that pass through the voxel, or terminate in a LiDAR return before reaching it, respectively.
To efficiently accumulate these statistics, view rays are
tracked through the grid using a variation of Bresenham’s
line drawing algorithm [2]. Each voxel is augmented with
three counters: nfree, nocc, and nhid. Starting from the scanner position, each voxel in the path of the ray encountered before the LiDAR return has nfree incremented. For the voxel containing the LiDAR return, nocc is incremented. The ray path is then continued until it leaves the grid, with each subsequent voxel having nhid incremented. This process is illustrated in Figure 3.
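The counter-accumulation pass can be sketched as follows. This is a simplified stand-in for the Bresenham-style traversal the paper uses: it samples the ray at fixed steps rather than visiting every voxel exactly once, and the names (`trace_ray`, `counters`) and the [n_free, n_occ, n_hid] channel layout are our own assumptions.

```python
import numpy as np

def trace_ray(counters, origin, hit, step=0.5):
    """Accumulate per-voxel counters for one LiDAR return.

    `counters` has shape (X, Y, Z, 3) with last axis
    [n_free, n_occ, n_hid]. Voxels between the scanner and the
    return get n_free incremented, the voxel holding the return
    gets n_occ, and voxels behind it get n_hid, continuing until
    the ray exits the grid."""
    origin = np.asarray(origin, dtype=float)
    hit = np.asarray(hit, dtype=float)
    direction = hit - origin
    length = np.linalg.norm(direction)
    direction /= length
    hit_voxel = tuple(np.floor(hit).astype(int))
    shape = counters.shape[:3]
    t, visited = 0.0, set()
    while True:
        voxel = tuple(np.floor(origin + t * direction).astype(int))
        if any(c < 0 or c >= s for c, s in zip(voxel, shape)):
            break  # the ray has left the grid
        if voxel not in visited:
            visited.add(voxel)
            if voxel == hit_voxel:
                counters[voxel][1] += 1   # n_occ: the LiDAR return
            elif t < length:
                counters[voxel][0] += 1   # n_free: seen through
            else:
                counters[voxel][2] += 1   # n_hid: behind the return
        t += step
```

Fixed-step sampling can skip voxels the ray only clips at a corner, which is why the paper prefers an exact line-traversal algorithm; the bookkeeping of the three counters is the same either way.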
Once all view rays have been traced through the grid, the
three probabilities are estimated for each voxel. In doing so,
we want completely unexplored voxels to be assigned the noninformative prior probabilities Pfree = Pocc = Phid = 1/3. As the number of points accumulated in the voxel increases, the collected statistics should begin to dominate the a priori estimates. This behavior would naturally be provided by a Bayesian update rule. However, in our case we have neither an accurate noise model for the scanner characteristics nor any knowledge about the scene structure within a voxel, so estimating the conditional probabilities required for a Bayesian rule is problematic. Instead, we estimate the voxel occupancy probability by:
voxel occupancy probability by:
Pocc=1
3αn+ (1 − αn)
?nocc
n
?
,
(2)
Page 5
for the given voxel, where n = nfree+ nocc+ nhid. The
parameter α causes the influence of the a priori estimates
to diminish exponentially with each additional ray travers
ing the voxel. The remaining two probabilities for each
voxel, Phidand Pfree, are estimated similarly using nhid
and nfree, respectively.
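Equation (2) and its two companions amount to a short per-voxel computation, sketched below. The function name is ours, and α = 0.9 is purely illustrative; the paper leaves α as a free parameter.

```python
def estimate_probabilities(n_free, n_occ, n_hid, alpha=0.9):
    """Estimate (P_free, P_occ, P_hid) for one voxel from its ray
    counters, per Eq. (2): the noninformative prior 1/3 is weighted
    by alpha**n, decaying exponentially as more rays n traverse the
    voxel, while the observed count fractions take over."""
    n = n_free + n_occ + n_hid
    if n == 0:
        # completely unexplored voxel: noninformative prior
        return (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)
    w = alpha ** n
    return tuple(w / 3.0 + (1.0 - w) * (k / n)
                 for k in (n_free, n_occ, n_hid))
```

Because the same weight w multiplies all three terms, the three estimates always sum to one, so the model stays a valid probability distribution at every voxel.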
4. Hypothesis Testing
Once the spatial probabilities at each voxel have been
estimated, we can test the hypothesis that an object model
as illustrated in Figure 2 is present in the scene, centered on
a specific voxel. Instead of testing this hypothesis directly,
we test the surrogate hypothesis that the random field model
of the scene is in a state consistent with the presence of the
object. The difference is subtle, as described below.
The object model comprises two sets of points, the set of
points on the object surface:
S = {x_i^S, i = 1, ..., N},   (3)

and the set of points topologically interior to the object:

I = {x_j^I, j = 1, ..., M},   (4)

where x represents a point location in the object’s coordinate system. We denote S_{x0} as the set of surface points translated so that their center is at point x0:

S_{x0} = {x_i^S + x0, i = 1, ..., N}.   (5)

I_{x0} is defined similarly.
To determine the probability that a single voxel in the
scene space is consistent with the presence of an object’s
surface, we could evaluate the probability that the point is
occupied, Pocc(x), alone. However, doing so neglects the
possible states of hidden points, which may either be hidden because they are truly part of an object’s topological interior, or may simply have been occluded from all viewpoints in the LiDAR dataset. To account for this possibility, we apply a noninformative prior, assuming that hidden points can be either part of a surface or not with equal probability, since detection of intentionally hidden objects is particularly important in our application. We define the new probabilities that the voxel is consistent with the surface of an object or not:
Psurf(x) = Pocc(x) + Phid(x)/2   (6)

P¬surf(x) = Pfree(x) + Phid(x)/2   (7)
We note that these adjusted probabilities still obey
Psurf(x) + P¬surf(x) = 1. We take Psurf(x) to represent the probability that an object surface point is present at x.
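Equations (6) and (7) reduce to a two-line computation; this sketch (function name ours) simply makes the even split of the hidden mass explicit.

```python
def surface_probabilities(p_occ, p_hid, p_free):
    """Fold the three-state voxel probabilities into a binary
    surface / not-surface split, per Eqs. (6)-(7): the hidden mass
    is divided equally, reflecting the noninformative prior over
    whether a hidden voxel holds an object surface."""
    p_surf = p_occ + p_hid / 2.0
    p_not_surf = p_free + p_hid / 2.0
    return p_surf, p_not_surf
```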
We now consider the probability P(S_{x0}) that the entire object surface S exists in the scene with the object origin at
location x0. To do so, we calculate the probability that the random field model of the scene space is in a state consistent with this hypothesis:

P(S_{x0}) = ∏_{i=1}^{N} Psurf(x_i^S + x0).   (8)
We similarly define P(I_{x0}), the probability that the object interior I is present in the scene centered at x0. Since the salient characteristic of object interior points is that they are always hidden from view (occluded by the object itself), we obtain:

P(I_{x0}) = ∏_{j=1}^{M} Phid(x_j^I + x0)   (9)
for the probability that the random field model is in a state
consistent with the interior of the object being located at x0.
Our ultimate goal is to express the probability Pobj(x0)
that the object is present in the scene at location x0. For this
to be the case, the random field must be in a state consistent
with both the surface and the interior of the object being
present centered at x0:

Pobj(x0) = P(S_{x0}) P(I_{x0}) = ∏_{i=1}^{N} Psurf(x_i^S + x0) · ∏_{j=1}^{M} Phid(x_j^I + x0).   (10)

Since we have modeled spatial occupancy with a random field for which occupancy probabilities are independent, calculation of these probabilities is straightforward.
To facilitate efficient calculation and avoid numerical issues, we transform the product of (10) to a sum by considering the log likelihood:

ln Pobj(x0) = Σ_{i=1}^{N} ln Psurf(x_i^S + x0) + Σ_{j=1}^{M} ln Phid(x_j^I + x0).   (11)
The sums of (11) can more easily be interpreted by a change of notation, in which we explicitly index voxels and object models by a triple index:

Pobj_{i,j,k} = exp( Σ_{l=1}^{N1} Σ_{m=1}^{N2} Σ_{n=1}^{N3} S_{l,m,n} ln Psurf_{i+l,j+m,k+n} + Σ_{l=1}^{M1} Σ_{m=1}^{M2} Σ_{n=1}^{M3} I_{l,m,n} ln Phid_{i+l,j+m,k+n} )
            = exp( S ⋆ ln Psurf + I ⋆ ln Phid ),   (12)

where S and I now denote binary indicator arrays over the object’s voxel grid, of sizes N1 × N2 × N3 and M1 × M2 × M3 respectively, and ⋆ denotes 3D cross-correlation.
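The correlations in Eq. (12) are what make FFT evaluation attractive: every candidate object origin is scored at once via frequency-domain multiplication instead of an explicit sum per placement. A minimal numpy sketch follows (function names are ours; "valid" output is assumed so that every scored placement fits entirely inside the grid).

```python
import numpy as np

def fft_correlate_valid(grid, kernel):
    """3D 'valid' cross-correlation via FFT:
    out[i,j,k] = sum_{l,m,n} kernel[l,m,n] * grid[i+l, j+m, k+n].
    Correlation equals convolution with a flipped kernel, so we
    reverse the kernel on all three axes before transforming."""
    shape = tuple(g + k - 1 for g, k in zip(grid.shape, kernel.shape))
    F = (np.fft.rfftn(grid, shape)
         * np.fft.rfftn(kernel[::-1, ::-1, ::-1], shape))
    full = np.fft.irfftn(F, shape)
    # Keep only offsets where the kernel lies fully inside the grid.
    sl = tuple(slice(k - 1, g) for g, k in zip(grid.shape, kernel.shape))
    return full[sl]

def log_detection_score(ln_p_surf, ln_p_hid, S, I):
    """ln P_obj at every candidate object origin, per Eq. (12):
    correlate the binary surface mask S with ln Psurf and the
    interior mask I with ln Phid, then add the two fields."""
    return (fft_correlate_valid(ln_p_surf, S)
            + fft_correlate_valid(ln_p_hid, I))
```

For a V-voxel grid this costs O(V log V) per mask regardless of how many points the object model contains, versus O(V · N) for direct evaluation of the sums.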