A Survey on Deep Domain Adaptation for LiDAR Perception
Larissa T. Triess1,2, Mariella Dreissig1, Christoph B. Rist1, J. Marius Zöllner1,3
Abstract: Scalable systems for automated driving have to
reliably cope with an open-world setting. This means, the
perception systems are exposed to drastic domain shifts, like
changes in weather conditions, time-dependent aspects, or
geographic regions. Covering all domains with annotated data
is impossible because of the endless variations of domains
and the time-consuming and expensive annotation process.
Furthermore, fast development cycles of the system additionally
introduce hardware changes, such as sensor types and vehicle
setups, and the required knowledge transfer from simulation.
To enable scalable automated driving, it is therefore crucial
to address these domain shifts in a robust and efficient man-
ner. Over the last years, a vast amount of different domain
adaptation techniques evolved. There already exists a number
of survey papers for domain adaptation on camera images,
however, a survey for LiDAR perception is absent. Nevertheless,
LiDAR is a vital sensor for automated driving that provides
detailed 3D scans of the vehicle’s surroundings. To stimulate
future research, this paper presents a comprehensive review of
recent progress in domain adaptation methods and formulates
interesting research questions specifically targeted towards
LiDAR perception.
I. INTRODUCTION
Highly automated vehicles and robots require a detailed
understanding of their dynamically evolving surroundings.
Over the past few years, deep learning techniques have
shown impressive results in many perception applications.
They typically require a huge amount of annotated data matching the considered scenario to obtain reliable performance. A major assumption in these algorithms is that the
training and application data share the same feature space
and distribution. However, in many real-world applications,
such as in the field of automated driving, this assumption
does not hold, since the agents live in an open-world setting.
Furthermore, collection and annotation of large datasets for
every new task and domain is extremely expensive, time-
consuming, and not practical for scalable systems. A domain
is defined as the scope of application for the algorithm.
A common scenario includes solving a detection task
in one domain with training data stemming from another
domain. In this case, the data may differ in their feature
space or follow a different data distribution. Examples of such divergent domains are different geographical regions, weather scenarios, seasons, other temporal aspects, and many more. For scalable automated driving, fast development
cycles, simulated data, and changing sensor setups also play
a major role. Usually, when a perception system is exposed
1Mercedes-Benz AG, Research and Development, Stuttgart, Germany
2Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
3Research Center for Information Technology (FZI), Karlsruhe, Germany
Contact: larissa.triess@daimler.com (0000-0003-0037-8460)
Fig. 1: Overview of Transfer Learning: Domain adaptation is a type of transductive transfer learning where the same task is performed in different, but related domains with annotated data only in the source domain. (Figure adapted from [1]).
to such a domain shift, its performance drops drastically.
However, it is possible to pass knowledge from a different
but related source domain to the desired target domain
with transfer learning. Specifically, domain adaptation (DA)
requires no manual annotations to adapt to new domains and
therefore promises a cheap and fast solution to deal with
domain shifts (compare Fig. 1).
In recent years, the research community proposed a vast
amount of techniques to transfer knowledge between do-
mains to mitigate the effect of performance drops. The
majority of the proposed methods are targeted towards
DA techniques on 2D camera images [2], [3], [4]. Most
of these approaches aim for a global feature alignment
and ignore local geometric information, which are crucial
in 3D perception [5]. Therefore, recent papers address DA
specifically for LiDAR perception [6], [7], [8], [9].
The focus of this paper is to give an overview on
DA methods that specifically address deep learning based
LiDAR perception and discuss their unique features, use-
cases, and challenges. The paper is organized as follows: Sec-
tion II gives an introduction to common LiDAR perception
tasks and the terminology of DA. The section also includes
an overview on typical baselines, datasets, DA applications,
and metrics. Section III categorizes common DA approaches
for LiDAR. In Section IV, we discuss different aspects of
the presented approaches and give an outlook on interesting
research directions.
II. BACKGROUND
Wilson et al. [2] provide an extensive survey on image-based DA approaches. This paper uses similar terminology but extends the DA methods with LiDAR-specific categories and focuses on LiDAR-related literature. The following gives an introduction to the building components of deep learning-based LiDAR DA research.

©2021 IEEE. Accepted at IV Workshops 2021.
arXiv:2106.02377v2 [cs.CV] 7 Jun 2021
A. LiDAR Perception
LiDAR sensors are used in autonomous vehicles to obtain
precise distance measurements of the 3D surrounding and
extract high-level information about the underlying scenery.
Typical tasks are object detection, including tracking and
scene flow estimation [10], [11], [12]; point cloud segmen-
tation with semantic segmentation, instance segmentation,
and part segmentation [13], [14], [15]; and scene comple-
tion [16], [17]. Depending on the task at hand and other
factors, a number of different processing techniques and data
representations for the raw data emerged [18]. PointNet [19]
can process generic unordered sets of points and computes
features directly from the raw 3D point cloud. Newer meth-
ods either focus on computing features on local regions or
exploit grid-based structures to abstract hierarchical relations
within the point cloud. Examples include graph-based [20], [21], multi-view-based [22], voxel-based [10], [11], and higher-dimensional lattice [23], [24] approaches. For an extensive overview
on deep learning methods and representations for 3D point
clouds, we refer to [18], [25].
B. Domain Adaptation
Domain adaptation (DA) is a special type of transfer
learning [1]. Fig. 1 shows the localization of DA research
in the field of transfer learning. It is divided into three
major categories: unsupervised transfer learning, where no
annotated data is used; transductive transfer learning, where
annotated data is only available in the source domain;
inductive transfer learning, where annotations are available
in the target domain.
DA is a type of transductive transfer learning where an-
notated source data but no annotated target data is available.
It is therefore also called unsupervised domain adaptation
in works that use a different terminology [2]. The learning
process is defined by performing the same task in different
but related domains. Related domains refer to domains that
are placed in a similar setting, such as outdoor driving
scenarios, whereas different domains refer to a specific aspect
that differs, for example sunny versus rainy days.
In multi-class classification, DA can be subdivided based
on the classes of the source and target domains, and on the
classes considered in the learning process (Fig. 2). Most
papers deal with Closed Set DA, where all classes appear
in both the source and target domains. In Partial DA, just a
subset of the classes from the source domain appears in the
target domain. For Open Set DA it is the other way around. If
both sets have both common and unique classes, it is called
Open-Partial DA. Boundless DA is an Open Set DA where
all target classes are learned individually.
Fig. 2: Intersections between Source and Target Domains: Domain adaptation can be subdivided based on the classes considered in the learning process. All classes of the source domain are known and are provided with labels. The classes that also occur in the target domain get recognized by the perception network. The ones that are not in the source, but in the target domain do not get recognized (except for boundless DA) and remain unknown. (Figure adapted from [3]).

Wilson et al. [2] suggest a categorization of DA methods that reflects the different lines of research in that field. These include: domain-invariant feature learning, domain mapping, normalization statistics, ensemble methods, and target discriminative methods. Section III structures the LiDAR-based
DA approaches into these categories. To the best of our
knowledge, the literature does not provide any works on
ensemble methods or target discriminative methods for Li-
DAR. Yet, there exist approaches that are specific for LiDAR
applications which use domain-invariant data representations.
Therefore, we introduce this additional category.
C. Baselines
Compared to DA in the camera world, the field of LiDAR
DA is rather small at this time. Therefore, many LiDAR
papers compare their DA approaches to image baselines to
compensate the lack of LiDAR baselines. This section gives a
short overview on the baselines used in the presented papers.
The entropy minimization technique is one of the most
often referenced baselines. AdvEnt [26] introduces an adver-
sarial entropy minimization which minimizes the distribution
between the source and target based on self-information.
Subsequent work [27] claims that in the entropy minimization approach the gradient of the entropy is biased towards samples that are easy to transfer. Therefore, they propose a maximum squares loss to balance the gradient of well-classified target samples and prevent the training from being dominated by easy-to-transfer samples. Minimal-Entropy Correlation
Alignment [28] shows that entropy minimization is induced
by the optimal alignment of second order statistics between
source and target domains. On this basis, they propose to
use Geodesic instead of Euclidean distances, which improves
alignment along non-zero curvature manifolds.
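As an illustration, the two loss functions at the heart of these entropy-based baselines can be sketched as follows. This is a minimal NumPy version with toy inputs, not the original implementations; all names are chosen here for illustration:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over class logits."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy_loss(probs, eps=1e-12):
    """Shannon entropy of per-point class probabilities, averaged over points.
    Minimizing this on unlabeled target data sharpens predictions (AdvEnt-style)."""
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())

def max_squares_loss(probs):
    """Maximum squares loss [27]: its gradient grows only linearly with
    confidence, so easy (already confident) samples dominate training less
    than under entropy minimization."""
    return float(-0.5 * (probs ** 2).sum(axis=-1).mean())

# toy target-domain logits: 4 points, 3 classes
logits = np.array([[4.0, 0.0, 0.0],   # confident point
                   [0.5, 0.4, 0.3],   # uncertain point
                   [3.0, 2.9, 0.1],
                   [0.0, 0.0, 5.0]])
p = softmax(logits)
ent, msq = entropy_loss(p), max_squares_loss(p)  # both shrink as p sharpens
```

Both losses attain their minimum at one-hot predictions; they differ only in how strongly confident samples contribute to the gradient.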
Other image methods that are used as baselines for DA are:
CyCADA [29], an advanced CycleGAN [30]; FeaDA [31],
a joint global and class-specific domain adversarial learn-
ing framework; and OutDA [32], a multi-level adversarial
network that performs output space DA at different feature
levels.
D. Datasets
Over the last years, numerous LiDAR datasets with various
annotation types for automated driving emerged. A number
of these datasets are also used for DA. Most of the works
are based on KITTI [33], SemanticKITTI [34], nuScenes [35],
SemanticPOSS [36], and A2D2 [37]. However, none of these
datasets is specifically designed to foster DA research and
therefore methods using these datasets are hard to compare
against each other, due to varying interpretations of the label
mappings and the DA applications.
Therefore, the following datasets were recently re-
leased to address the growing demand in DA research.
SemanticUSL [38] is a dataset designed for DA between SemanticKITTI and SemanticPOSS. LIBRE [39] and DENSE [40] both provide controlled scenes in adverse weather with multiple sensors and modalities. The Waymo Open dataset [41] even includes a DA benchmark.
All mentioned datasets are real-world recordings in urban
scenarios. However, synthetic datasets also play a major role
in the field of DA. The two most commonly used simulation
frameworks are CARLA [42] and GTA-V LiDAR [43] which
enable the simulation of different sensor models.
E. Applications and Use-Cases
Using simulators for autonomous driving applications
gained a lot of interest in the past years and also increased
the research on sim-to-real DA. Similarly, geography-to-
geography DA has to address changes in geographical and
environmental regions that largely differ in shapes of oth-
erwise similar objects, e.g. traffic signs. Adverse weather
conditions, such as fog or rain can substantially deteriorate
the detection capabilities of a LiDAR, since laser beams are
being reflected and scattered by the droplets or particles in
the atmosphere. Therefore, weather-to-weather DA considers
different weather scenarios and seasons. In contrast to DA
for cameras, day-to-night DA is not important to investigate
for LiDAR, since LiDAR is an active sensor that is less
dependent on external illumination compared to cameras.
Another important application in the field of development
cycles and vehicle setup is the case of sensor-to-sensor DA.
It tackles the differences in resolution, mounting position
or other sensor characteristics like range of vision, noise
characteristics and reflectivity estimates. Most of the related
work considers a far more general case for their research,
namely dataset-to-dataset. It involves multiple of the above
mentioned DA applications at once. Several publicly avail-
able driving datasets are used to develop and evaluate the
adaptation capabilities. Here, geography-to-geography and
sensor-to-sensor usually occur at once, often paired with
seasonal changes, making this task especially challenging.
III. METHODS
This section presents the state of the art on DA for LiDAR-
based environment perception (Table I). The approaches are
either data-driven, such as domain-invariant data representa-
tion, domain mapping, and normalization statistics, or model-
driven, such as domain-invariant feature learning.
TABLE I: Domain Adaptation Methods

Paper                  Appl.^a         Task^b          Datasets            Baselines
Domain-Invariant Data Representation (Section III-A):
[44], [45], [46]       S2S             semseg^††       [33], [34], [42]    -
PiLaNet [47]           S2S             semseg          custom              -
Complete & Label [48]  D2D             semseg          [33], [35], [41]    [31], [32]
Domain Mapping (Section III-B):
[6], [49], [50]        D2D, S2R        detect          [33], [42]          -
[51], [52]             S2R             -               [33], [53]          -
ePointDA [54]          S2R             semseg          [33], [34], [43]    -
Alonso et al. [8]      D2D             semseg          [34], [36], [55]    [26], [27]
Langer et al. [56]     S2S             semseg          [34], [35]          ([28])
Domain-Invariant Feature Learning (Section III-C):
SqueezeSegV2 [9]       S2R             semseg          [33], [43]          -
xMUDA [57]             G2G, D2D, D2N   semseg          [34], [35], [37]    [26], [28]
LiDARNet [38]          D2D             semseg          [34], [36], [38]    [29]
Wang et al. [58]       S2S             detect          [33], [35]          -
SF-UDA3D [59]          D2D             detect          [33], [35]          [60]
Normalization Statistics & Other Methods (Sections III-D & III-E):
Rist et al. [7]        S2S             semseg, detect  [33], custom        -
Caine et al. [61]      G2G, W2W        detect          [41]                -

^a Applications: G2G geography-to-geography, D2D dataset-to-dataset, D2N day-to-night, S2S sensor-to-sensor, S2R sim-to-real, W2W weather-to-weather
^b Perception: detect – object detection, semseg – semantic segmentation
† these approaches use a combination with methods from Section III-A
^†† primary task is upsampling, but [44] and [46] also test for semseg
A. Domain-Invariant Data Representation
A domain-invariant representation is a hand-crafted ap-
proach to move different domains into a common represen-
tation. Fig. 3 shows that this approach is essentially a data pre-processing step after which a regular perception pipeline starts.
It is mostly used to account for the sensor-to-sensor domain
shift and receives special attention in LiDAR research. Avail-
able sensors vary in their resolution and sampling patterns
while resulting point clouds are additionally influenced by
the mounting position and the recording rate of the sensor.
Consequently, the acquired data vary considerably in their
statistics and distributions.
This data distribution mismatch makes it infeasible to apply the same model to different sensors in a naive way.
Therefore, many simple DA methods either align the sam-
pling differences in 2D space or use representations in
3D space that are less prone to domain differences. Other
methods include a normalization of the input feature spaces
with respect to different mounting positions by spatial aug-
mentations and replacing absolute LiDAR coordinates with
relative encoding schemes [7], [8].
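As a minimal sketch of such an input normalization, the following converts absolute LiDAR coordinates into sensor-relative features. The particular feature set and the mounting-height value are assumptions chosen for illustration, not the exact encodings of [7], [8]:

```python
import numpy as np

def to_relative_features(points_xyz, sensor_height=1.73):
    """Replace absolute coordinates with encodings that are less specific to
    sensor type and mounting position: range, azimuth, elevation angle, and
    height above ground. `sensor_height` (meters above ground) is an assumed
    calibration value."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)            # range to sensor origin
    azimuth = np.arctan2(y, x)                       # horizontal angle
    elevation = np.arcsin(z / np.maximum(r, 1e-9))   # vertical angle
    height = z + sensor_height                       # height over ground plane
    return np.stack([r, azimuth, elevation, height], axis=1)

# two toy points: one on the ground plane, one 0.27 m above sensor height 0
pts = np.array([[10.0, 0.0, -1.73],
                [0.0, 5.0, 0.27]])
feats = to_relative_features(pts)  # shape (2, 4)
```

A height-over-ground feature, unlike raw z, stays comparable when the same sensor model is mounted at a different position on the vehicle.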
1) Sampling Alignment in 2D Space: The sensor view of a
rotating LiDAR scanner resembles a 2D image. The vertical
resolution equals the number of layers and the horizontal
resolution depends on the revolution frequency of the sensor.
Convolutional neural networks are often used for LiDAR
perception in 2D space. Even though fully convolutional
Perception
Network
Source Data Prediction
Source
Pre-Processing Invariant
Data
(a) Training
Target Data
Perception
Network Prediction
Target
Pre-Processing Invariant
Data
(b) Testing
Fig. 3: Domain-Invariant Data Representation: The data
from the source domain at train-time (a) and the data from
the target domain at test-time (b) are both converted into a
hand-crafted common representation prior to being fed to the
perception pipeline.
networks operate independently of the input size, the re-
ceptive field still changes with varying sensor resolution. A
straightforward way to align these characteristics manually is
by either up-sampling the data [44], [45], [46] or by dropping
scan lines [8].
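The scan-line dropping variant is straightforward to sketch on a range image whose rows correspond to scan lines; the 64-beam/2048-bin layout below is an illustrative assumption:

```python
import numpy as np

def drop_scan_lines(range_image, keep_every=2):
    """Subsample the vertical axis of a range image (rows = scan lines) to
    mimic a sensor with fewer layers, e.g. 64 -> 32 beams for keep_every=2."""
    return range_image[::keep_every]

# toy 64-beam range image with 2048 horizontal bins
ri = np.random.rand(64, 2048).astype(np.float32)
ri_32 = drop_scan_lines(ri, keep_every=2)  # 32-beam pseudo-scan
```

Up-sampling goes the other way: a network predicts the missing intermediate scan lines so that low-resolution data matches a high-resolution sensor.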
2) Geometric Representation in 3D Space: Geometric
representations exploit the inherent 3D structure of point
clouds. In the literature, there are two approaches on that:
employing a volumetric (column-like or voxel) data repre-
sentation or recovering the geometric 3D surface.
PointPillars [11] represents LiDAR data in a column-like
structure. The authors of [47] analyze this representation with
respect to its sensor-to-sensor DA capabilities and conclude
that this representation can be applied regardless of the
sensor’s vertical resolution, as it reduces that axis to one.
Complete & Label [48] introduces a more sophisticated
approach by exploiting the underlying scenery of the scan
which is independent of the recording mechanism. The
method includes a sparse voxel completion network to re-
cover the underlying surface that was sampled by a LiDAR.
This high-resolution representation is then used as an input
to a labeling network which provides semantic predictions
independent of the domain where the scene originates from.
B. Domain Mapping
Domain mapping aims at transferring the data of one
domain to another domain and is most often used in sim-
to-real and dataset-to-dataset applications. Fig. 4 shows a
typical setup for domain mapping. Annotated source data
is usually transformed to appear like target data, creating a
labeled pseudo-target dataset. With the transformed data, a
perception network is trained which can then be applied to
target data at test time.
For images, domain mapping is usually done adversarially
and at pixel-level in the form of image-to-image translation
with conditional GANs [62], [63], [64], [65]. Similar prin-
ciples apply to LiDAR data, however, there also exists a
number of methods that do not rely on adversarial training.
1) Adversarial Domain Mapping: Adversarial domain
mapping is typically accomplished with conditional
GANs [66]. The generator translates a source input to the
Domain
Mapper
Perception
Network
Target Data
Source Data PredictionSource Data
(a) Training
Target Data
Perception
Network Prediction
(b) Testing
Fig. 4: Domain Mapping: This is the most commonly used
configuration for domain mapping. During training (a) the
labeled source data is conditionally (dashed line) mapped to
the target domain where a perception network is trained. At
test time (b), the trained perception network can directly be
applied to the target data.
target distribution without changing the underlying semantic
meaning. A perception network can then be trained on the
translated data using the known source labels. The translated
data shall have the same appearance as the target data.
In contrast to methods for camera images or generic point
clouds, the number of papers that adversarially generate
realistic LiDAR data is very limited. In fact, most approaches
use unmodified image GANs, such as CycleGAN [30], and
apply them to top-view projected images of the LiDAR
scans [6], [49], [50]. In all three of these works, the images
are translated between synthetic and real-world domains.
They evaluate their DA capabilities on top-view object
detection for which they use YOLOv3 [67]. They show an
improvement for object detection when YOLOv3 is trained
with the domain adapted data.
Another possibility is to exploit the sensor-view image
of the LiDAR. In contrast to the top-view projection, the
sensor-view image is a dense and lossless representation
with which the original 3D point cloud can be recovered.
Caccia et al. [51] provide an unsupervised method for both
conditional and unconditional LiDAR generation in projec-
tion space and test their method on reconstruction of noisy
data. The generated data does not possess any point-drops as
they usually occur in real-world data. Therefore, DUSty [52]
additionally incorporates a differentiable framework that can
sample binary noises to simulate these point-drops and mit-
igate the domain gap between the real and synthesized data.
Similarly, ePointDA [54] learns a dropout noise rendering
from real data and applies it to synthetic data.
2) Non-Adversarial Domain Mapping: The non-
adversarial mapping techniques primarily focus on the
sampling and distribution differences between LiDAR
sensors.
Alonso et al. [8] address the dataset-to-dataset problem
by using a data and class distribution alignment strategy. In
the data alignment process, a number of simple augmentation
techniques, such as xyz-shifts, replacing absolute with rela-
Feature
Extractor
Perception
Network
Target Data
Source Data Prediction
Feature
Extractor
Alignment
Component
(a) Training
Target Data
Perception
Network Prediction
Feature
Extractor
(b) Testing
Fig. 5: Domain-Invariant Feature Learning: A feature ex-
tractor network and an alignment component learn a domain-
invariant feature encoding (a). At test time (b), the domain-
invariant feature extractor is applied to the target data.
tive features, and the dropping of LiDAR beams, are used.
In the class distribution alignment, it is assumed that the
target domain has a similar class distribution as the source
domain, since the datasets in both domains are recorded in
urban scenarios. Therefore, the Kullback-Leibler divergence
between the two class distributions is minimized.
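A hedged sketch of such a class-distribution alignment term follows; the function names and the toy class prior are illustrative, not taken from [8]:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete class distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float((p * np.log(p / q)).sum())

def class_alignment_loss(target_probs, source_prior):
    """Alignment term in the spirit of Alonso et al. [8]: push the average
    predicted class distribution on a target batch towards the (known)
    source-domain class prior. `target_probs` is (num_points, num_classes)."""
    batch_dist = target_probs.mean(axis=0)
    batch_dist = batch_dist / batch_dist.sum()
    return kl_divergence(batch_dist, source_prior)

# toy prior (e.g. road, car, pedestrian) and noisy target predictions
source_prior = np.array([0.7, 0.2, 0.1])
target_probs = np.random.dirichlet(source_prior * 50, size=1000)
loss = class_alignment_loss(target_probs, source_prior)
```

The assumption that source and target share a similar class distribution is what licenses using the source prior as the alignment target.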
Another approach is to use a re-sampling technique to
address sensor-to-sensor domain shifts [56]. Here, several
scans of a recorded sequence are accumulated over time.
The specifications of a LiDAR sensor with lower resolution
are then used to sample a single semi-synthetic scan from the
accumulated scene. In the second step of the domain transfer,
the semantic segmentation model has to be re-trained with
geodesic correlation alignment to align second-order statis-
tics between source and target domains to generalize to a
dataset-to-dataset setting [28], [9]. To quantitatively verify
the effectiveness of their method, the authors labeled the
target dataset with the same classes as their source dataset.
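The beam re-sampling step can be approximated as follows. This simplified sketch only keeps points of a dense (accumulated) cloud near the target sensor's beam elevations and omits the full range-image rasterization of [56]; the beam angles and tolerance are illustrative assumptions:

```python
import numpy as np

def resample_to_target_beams(points_xyz, target_elevations_deg, tol_deg=0.2):
    """Sample a semi-synthetic low-resolution scan from a dense point cloud by
    keeping only points whose elevation angle falls close to one of the target
    sensor's beam angles."""
    x, y, z = points_xyz[:, 0], points_xyz[:, 1], points_xyz[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    elev = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))
    # distance of each point's elevation to the nearest target beam
    diffs = np.abs(elev[:, None] - np.asarray(target_elevations_deg)[None, :])
    mask = diffs.min(axis=1) < tol_deg
    return points_xyz[mask]

# toy accumulated cloud and a 3-beam target sensor at -1, 0, +1 degrees
pts = np.array([[10.0, 0.0, 0.0],     # elevation ~0.0 deg  -> kept
                [10.0, 0.0, 2.0],     # elevation ~11.3 deg -> dropped
                [10.0, 0.0, -0.17]])  # elevation ~-1.0 deg -> kept
low_res = resample_to_target_beams(pts, [-1.0, 0.0, 1.0])
```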
C. Domain-Invariant Feature Learning
State-of-the-art methods in domain-invariant feature learn-
ing employ a training procedure that encourages the model
to learn a feature representation that is independent of the
domain. This is done by finding or constructing a common
representation space for the source and target domain. In
contrast to domain-invariant data representations, these ap-
proaches are not hand-crafted but use learned features.
If the classifier model performs well on the source domain
using a domain-invariant feature representation, then the
classifier may generalize well to the target domain. The
basic principle is depicted in Fig. 5. Common approaches
for domain-invariant feature learning can be categorized into
two basic principles:
1) Divergence Minimization: One approach on creating
a domain-invariant feature encoder is minimizing a suitable
divergence measure between the representation of the source
and the target domain within the deep neural network.
SqueezeSegV2 [9] proposes a DA pipeline where the discrepancies in the batch statistics from both domains are minimized. Adapting the work of Morerio et al. [28], they
use a loss function based on the geodesic distance between
the output distributions. The authors apply their presented
approach to a sim-to-real setting. xMUDA [57] minimizes
the domain gap by incorporating a learning structure which
utilizes perception algorithms on both 2D images and 3D
point clouds. A specific cross-modal loss is designed as a
divergence measure. Thus, an information exchange between
the two modalities benefits the overall performance in pres-
ence of a domain shift. The method is tested on day-to-night,
geography-to-geography and dataset-to-dataset scenarios.
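The correlation-alignment flavor of divergence minimization can be sketched as follows. This is the simplified Euclidean variant (the works above use a geodesic distance between the statistics instead), with names chosen here for illustration:

```python
import numpy as np

def coral_loss(source_feats, target_feats):
    """Correlation alignment: squared Frobenius distance between the feature
    covariance matrices of a source and a target batch. Minimizing it pushes
    the second-order statistics of both domains together. Inputs are
    (batch, feat_dim) activations."""
    def cov(f):
        f = f - f.mean(axis=0, keepdims=True)
        return f.T @ f / max(len(f) - 1, 1)

    d = source_feats.shape[1]
    diff = cov(source_feats) - cov(target_feats)
    return float((diff ** 2).sum()) / (4 * d * d)

# toy feature batches from two domains
src = np.random.randn(64, 8)
tgt = np.random.randn(64, 8) * 1.5  # different scale -> non-zero loss
alignment = coral_loss(src, tgt)
```

In training, this term is added to the supervised source loss so the feature extractor is rewarded for producing statistically similar activations on both domains.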
2) Discriminator-based Approaches: The basic idea of
domain-invariant feature learning with a discriminator is to
utilize adversarial training to force the feature encoder to
learn only domain-invariant features. The feature extractor has learned a domain-invariant feature representation as soon as the discriminator is no longer able to distinguish from which domain a feature representation originated (i.e., source or target). LiDARNet [38] follows this discriminator approach,
which was previously applied to camera images in [68]. The
same principle of minimizing domain gaps by employing a
discriminator is applied in [58]. Here, the authors conduct
the model adaptation on intermediate layers of the DNN to
improve the detection of far range objects. Though this near-
to-far is a special case of DA within one LiDAR sensor, the
effectiveness of this approach is demonstrated in dataset-to-
dataset use cases.
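The adversarial objective behind these discriminator-based approaches can be illustrated with a linear toy discriminator; real systems use a learned DNN discriminator and backpropagate the reversed gradient into the feature extractor:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def domain_adversarial_losses(src_feats, tgt_feats, w, b):
    """Losses of a (here: linear, for illustration) domain discriminator.
    The discriminator minimizes `d_loss` (classify source vs. target features),
    while the feature extractor maximizes it, e.g. via a gradient reversal
    layer, until the two domains are indistinguishable."""
    p_src = sigmoid(src_feats @ w + b)   # predicted P(domain = source)
    p_tgt = sigmoid(tgt_feats @ w + b)
    eps = 1e-12
    d_loss = -np.log(p_src + eps).mean() - np.log(1 - p_tgt + eps).mean()
    g_loss = -d_loss                     # adversarial (reversed) objective
    return float(d_loss), float(g_loss)

# with zero weights the discriminator outputs 0.5 everywhere: maximum confusion
src = np.random.randn(16, 4)
tgt = np.random.randn(16, 4)
d_loss, g_loss = domain_adversarial_losses(src, tgt, np.zeros(4), 0.0)
```

At the adversarial optimum the discriminator is reduced to chance level, which is exactly the state described above where features no longer reveal their domain.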
The authors of [59] present a different but interesting take on DA for LiDAR point clouds. They exploit temporal consistency in the detections to generate pseudo-labels on the target domain. Subsequently, they build a model which does not rely on annotations from the source domain by utilizing a variant of self-taught learning.
D. Normalization Statistics
The primary use of normalization techniques is to improve
training convergence, speed and performance. These advan-
tages have initially been verified on image datasets, image
related tasks and their respective model architectures. On
LiDAR data, normalization techniques are used equally for
improved feature extraction for a variety of tasks [9], [10],
[11], [19], [23]. On images, the properties of normalization
are used for explicit DA. A set of normalization statistics per
domain are expected to separate domain knowledge from task
knowledge. This encourages DNNs to learn style-invariant
representations of input data. Conversely, manipulating the
distribution of intermediate layer activations is explicitly
used for image style transfer. However, experimental studies
that verify a strong similarity between style normalization on
camera images and sensor or scene normalization on LiDAR
are absent.
In DNNs, normalization layers improve training conver-
gence by aligning the distributions of training data and
therefore controlling internal covariate shift and the scale
of the gradients. Distribution alignment is implemented by a
normalization of mean and variance of activations over par-
titions of the batch-pixel-feature tensor. The most prominent
normalization technique is batch normalization [69]. Build-
ing on the same basic idea, subsequent related normalization
procedures [70], [71], [72] address issues with batch normal-
ization related to implementation, network architectures, or
certain data domains.
Adaptive batch normalization is a simple and straightfor-
ward DA method that re-estimates the dataset statistics on
the target domain as a natural extension of the batch norm
approach [60]. However, this normalization approach alone
does not lead to a satisfactory object detection performance
in a LiDAR sensor-to-sensor DA setup [7]. When training
on multiple image data domains simultaneously, switching
between per-domain statistics is used for DA on image
tasks as the second stage of a two-stage approach [73].
Initial pseudo-labels are iteratively refined using separate
batch norm statistics for each domain. The effectiveness of
per-domain statistics on LiDAR domain gaps has not been
verified experimentally.
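Adaptive batch normalization reduces to re-estimating the normalization statistics on target-domain activations while keeping the affine parameters learned on source data. A minimal single-layer sketch with full-batch statistics (framework implementations instead update running statistics):

```python
import numpy as np

def adabn(features_target, gamma, beta, eps=1e-5):
    """Adaptive batch normalization [60]: keep the learned affine parameters
    (gamma, beta) but normalize with mean/variance estimated from
    target-domain activations instead of source-domain running statistics.
    `features_target` is (batch, feat_dim)."""
    mu = features_target.mean(axis=0)
    var = features_target.var(axis=0)
    return gamma * (features_target - mu) / np.sqrt(var + eps) + beta

# toy target activations with a domain-specific shift and scale
x_target = np.random.randn(1000, 3) * 5.0 + 2.0
out = adabn(x_target, gamma=2.0, beta=1.0)
```

After re-estimation, the layer outputs have the distribution the downstream layers were trained on, regardless of the input domain's shift and scale.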
E. Other Methods
Recently, it has been demonstrated that teacher-student knowledge distillation can increase performance on a target domain in an unsupervised DA setting on LiDAR [61]. A teacher
network is trained on the source domain and creates pseudo-
labels on the target domain. The smaller student network is
then trained on source and target domains simultaneously. A
setup to adapt to a domain with different geometries (dataset-
to-dataset) is used as experimental evaluation. The pseudo-
label trained student networks show better generalization
capabilities on the target domain than the teacher networks.
Practitioners benefit from the simplicity of the approach and
the option to adhere to inference time budgets with small
student networks. However, in practice the sensor setup itself
might be different instead of only the scene geometry. This
yields a more difficult problem as the networks need to work
on both domains simultaneously.
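The pseudo-labeling step of such a distillation setup can be sketched as follows; the confidence threshold and the ignore index are common heuristics assumed here, not values from [61]:

```python
import numpy as np

def pseudo_label(teacher_probs, threshold=0.9):
    """Turn teacher predictions on unlabeled target data into training labels
    for the student; points below the confidence threshold are marked with the
    ignore index -1 so they do not contribute to the student's loss."""
    conf = teacher_probs.max(axis=-1)
    labels = teacher_probs.argmax(axis=-1)
    labels[conf < threshold] = -1
    return labels

# toy teacher softmax outputs for 3 target points, 3 classes
probs = np.array([[0.95, 0.03, 0.02],
                  [0.50, 0.30, 0.20],   # too uncertain -> ignored
                  [0.05, 0.05, 0.90]])
labels = pseudo_label(probs)  # -> [0, -1, 2]
```

The student is then trained on the source labels and these filtered target pseudo-labels simultaneously.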
IV. DISCUSSION
After reviewing the recent advances in the related liter-
ature, we discuss the main challenges that remain open in
DA for LiDAR perception. They pose interesting research
directions for future works.
A. Comparability and Transfer from other Modalities
An essential part of research is the comparability between
different approaches to foster further research in promising
directions. In most papers, the success of the DA process is
measured by the performance of a downstream perception
task. However, this has two major drawbacks. First, it is
assumed that the quality of the domain adaptation process
directly correlates with the performance changes in the
downstream perception. However, there is no proof for this
yet. Second, the methods are still not comparable, since
they all use different datasets, task settings, label sets, and
report different metrics. One reason might be that most of the advanced DA approaches for LiDAR emerged only recently, so no metric or baseline has yet prevailed.
To mitigate the lack of a LiDAR-specific baseline, many
of the presented works use image-based approaches as a
baseline comparison for their own work. However, these are
usually unfair evaluations, since the baseline methods are
not optimized for LiDAR data. Furthermore, it is unknown
whether there is a qualitative difference in the domain gaps
between image domains (camera models, image styles) and
LiDAR domains (LiDAR sensors, 3D scenes). If such a difference exists, it would prevent the DA methods developed for images from being further optimized with respect to LiDAR applications.
B. Discrepancies in Domain Gap Quality
The size of a domain gap can be measured in terms of
model performance on a given task if target labels are avail-
able. Consequently, this measure of the apparent size of a
domain gap is model-specific and task-specific. Nevertheless, across several tasks and models, a change of the LiDAR sensor consistently causes a more severe impact on the final performance than changes in weather or location. It seems
that a sensor-to-sensor domain gap is not easily covered
by learned and implicit DA methods such as normalization
statistics and adversarial domain mapping that work well
in the image domain. Hand-crafted DA methods based on
the explicit geometric properties of LiDAR data and their
representation are the only ones that yield reasonable results
on sensor-to-sensor domain shifts so far. We see a qualitative
difference between DA for LiDAR and images within the
state of the art.
C. Relevance of Cross-Sensor Adaptation
Various sensor types have been developed in the past decade. Available LiDAR sensors mainly vary in their design and functionality, i.e., rotating and oscillating LiDARs [74], [75].
Consequently, the scan-pattern and thus the data representa-
tion differs considerably between the sensors.
Although the mechanical spinning LiDAR is predominantly used for perception in autonomous driving research [34],
[35], all sensor types have their benefits. The publicly
available PandaSet dataset [76] is one of the first datasets
to incorporate two different sensor types: a 360° covering spinning LiDAR and an oscillating LiDAR with a snake
scan-pattern. Since it is unclear which type of sensor will
prevail in the context of autonomous driving in the future,
it is crucial to advance the development of sensor-invariant
perception systems. The geometric and volumetric data representation approaches presented in Section III-A are valuable
to fully benefit from the different sensor types. Constructing
a domain-invariant data representation in an intermediate step
makes it possible to re-use already trained DNNs and apply
them to new sensor setups. This is essential to lower the
research costs and to keep up with the sensor development.
D. Adaptation in Different Weather Scenarios
LiDAR sensors are heavily impacted by adverse weather conditions, such as rain or fog, which cause undesired measurements and lead to perception errors. Some work exists on denoising LiDAR point clouds [77], but the subject is
not specifically tackled in a DA setting. With the release of
new datasets containing adverse weather scenarios [39], [40],
we believe it is possible to foster the research for weather-
to-weather applications.
E. Generative Models for Domain Translation
The category of adversarial domain mapping techniques (Section III-B.1) includes only four approaches, none of which is capable of generating realistic LiDAR point clouds in 3D (they use top-view projections of the point clouds). This is surprising, since the same strategy is thriving in the image world, where a lot of research is conducted to generate realistic images. There are two possible explanations for this: either it is simply not necessary to use generative models to create realistic point clouds, in the sense that other approaches are far more powerful, or it is not possible to achieve the required high quality of the generated data. Either way, to date no study proves or contradicts either of these assumptions. A method similar
to [78] that analyzes the DA performance of generative
models for LiDAR domain mapping might help to advance
research in this direction.
F. Open-Partial Domain Adaptation
The majority of the presented approaches deal with dataset-to-dataset applications, a multifaceted adaptation task in which both the sensor type and the environment change. This makes the task particularly difficult. However, another effect comes to light: datasets usually have unique
labeling strategies and include a different set of classes.
For example, SemanticKITTI has 28 semantic classes, while nuScenes defines 32. Since the domains have
both common and unique classes, this is called an Open-
Partial DA problem [3].
However, none of the presented approaches directly tackles the Open-Partial formulation; instead, they perform a label mapping strategy that allows them to address the problem as Closed-Set DA. Some segmentation approaches do not segment the entire scene from the start, but only perform foreground segmentation, for example cars versus background [48]. A common strategy when semantically segmenting entire scenes is to find a common minimal class mapping between the two datasets that discards all classes that cannot be matched [8]. Some works even re-label one of the datasets to perfectly match the label definitions of the other dataset [56].
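Such a minimal class mapping can be sketched as follows. The label names are an illustrative subset of the SemanticKITTI and nuScenes taxonomies, not a complete or authoritative mapping; the helper name and ignore index are assumptions:

```python
# Illustrative subsets only: classes mapped to None have no counterpart
# in the other dataset and are discarded from training and evaluation.
KITTI_TO_COMMON = {
    "car": "car",
    "person": "pedestrian",
    "bicyclist": "cyclist",
    "traffic-sign": None,  # no matched nuScenes class -> discard
}
NUSCENES_TO_COMMON = {
    "vehicle.car": "car",
    "human.pedestrian.adult": "pedestrian",
    "vehicle.bicycle": "cyclist",
    "movable_object.barrier": None,  # no matched SemanticKITTI class -> discard
}

def remap(labels, mapping, ignore_index=255):
    """Map dataset-specific class names to indices in the common label set.

    Unmatched classes receive `ignore_index`, turning the Open-Partial
    problem into a Closed-Set one over the shared classes.
    """
    common = sorted({c for c in mapping.values() if c is not None})
    index = {c: i for i, c in enumerate(common)}
    ids = [
        index[mapping[label]] if mapping.get(label) is not None else ignore_index
        for label in labels
    ]
    return ids, common
```

Because both dictionaries target the same common set, labels from either dataset end up in a shared index space, which is the precondition for training one network on both.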
In the image world, there are already works that deal with
the Open-Partial and Open-Set DA problems [79], [80], [81].
In the LiDAR world, this field still holds a lot of potential
for future research. We believe that a good strategy for
these questions can help to advance in scalable systems for
automated vehicles.
V. CONCLUSION
This paper reviewed several current trends of DA for
LiDAR perception. The methods are classified into five
different settings: domain-invariant feature learning, domain
mapping, domain-invariant data representations, normaliza-
tion statistics, and other methods. Depending on the application and the extent of the domain gap itself, different types of DA approaches can be used to bridge it. In particular, hand-crafted DA methods such as normalization statistics and domain-invariant data representations can complement learned DA methods such as domain mapping and domain-invariant feature learning.
The main contribution of this paper is to formulate several
interesting research questions. Addressing them can benefit
the DA research for LiDAR in the future. The introduction of
a prevailing LiDAR DA benchmark can yield an important
step forward to make existing and new works comparable.
Furthermore, many of the surveyed approaches stem from
ideas developed for cameras. The adaptation to the LiDAR
modality yields promising results, but the characteristics of
the LiDAR sensor pose novel modality-specific challenges
and research questions. Examples are the need to address
critical use-cases like cross-sensor and weather domain shift.
On the other hand, day-to-night adaptation becomes insignifi-
cant for LiDAR. Unlike in the image domain, there exist only
few adversarial methods that focus on generating realistic
LiDAR point clouds. Finally, in both camera and LiDAR,
research mostly focuses on closed-set DA, however for truly
scalable automated driving it is important to address the
open-partial DA problem.
ACKNOWLEDGMENT
This work was presented at the Workshop on Autonomy at Scale (WS52) at IV2021. The research leading to these results is funded by the German Federal Ministry for Economic Affairs and Energy within the project “KI Delta Learning” (Förderkennzeichen 19A19013A).
REFERENCES
[1] S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” in IEEE
Transactions on Knowledge and Data Engineering, 2010.
[2] G. Wilson and D. J. Cook, “A Survey of Unsupervised Deep Domain
Adaptation,” arXiv.org, 2020.
[3] M. Toldo, et al., “Unsupervised Domain Adaptation in Semantic
Segmentation: a Review,” arXiv.org, 2020.
[4] S. Zhao, et al., “A Review of Single-Source Deep Unsupervised Visual
Domain Adaptation,” in TNNLS, 2020.
[5] C. Qin, et al., “PointDAN: A Multi-Scale 3D Domain Adaption
Network for Point Cloud Representation,” in NIPS, 2019.
[6] K. Saleh, et al., “Domain Adaptation for Vehicle Detection from Bird’s
Eye View LiDAR Point Cloud Data,” in ICCV Workshops, 2019.
[7] C. B. Rist, M. Enzweiler, and D. M. Gavrila, “Cross-Sensor Deep
Domain Adaptation for LiDAR Detection and Segmentation,” in IV,
2019.
[8] I. Alonso, et al., “Domain Adaptation in LiDAR Semantic Segmenta-
tion,” arXiv.org, 2020.
[9] B. Wu, et al., “SqueezeSegV2: Improved Model Structure and Un-
supervised Domain Adaptation for Road-Object Segmentation from a
LiDAR Point Cloud,” in ICRA, 2019.
[10] Y. Zhou and O. Tuzel, “VoxelNet: End-to-End Learning for Point
Cloud Based 3D Object Detection,” in CVPR, 2018.
[11] A. H. Lang, et al., “PointPillars: Fast Encoders for Object Detection
From Point Clouds,” in CVPR, 2019.
[12] X. Liu, C. R. Qi, and L. J. Guibas, “FlowNet3D: Learning Scene Flow
in 3D Point Clouds,” in CVPR, 2019.
[13] A. Milioto, et al., “RangeNet++: Fast and Accurate LiDAR Semantic
Segmentation,” in IROS, 2019.
[14] C. Xu, et al., “SqueezeSegV3: Spatially-Adaptive Convolution for
Efficient Point-Cloud Segmentation,” arXiv.org, 2020.
[15] K. Sirohi, et al., “EfficientLPS: Efficient LiDAR Panoptic Segmenta-
tion,” arXiv.org, 2021.
[16] C. B. Rist, et al., “SCSSnet: Learning Spatially-Conditioned Scene
Segmentation on LiDAR Point Clouds,” in IV, 2020.
[17] C. Agia, et al., “S3CNet: A Sparse Semantic Scene Completion
Network for LiDAR Point Clouds,” in CoRL, 2020.
[18] Y. Guo, et al., “Deep Learning for 3D Point Clouds: A Survey,” in
PAMI, 2020.
[19] R. Q. Charles, et al., “PointNet: Deep Learning on Point Sets for 3D
Classification and Segmentation,” in CVPR, 2017.
[20] Y. Wang, et al., “Dynamic Graph CNN for Learning on Point Clouds,” in ACM Transactions on Graphics, 2019.
[21] R. Klokov and V. Lempitsky, “Escape from Cells: Deep Kd-Networks
for the Recognition of 3D Point Cloud Models,” in ICCV, 2017.
[22] E. Kalogerakis, et al., “3D Shape Segmentation with Projective
Convolutional Networks,” in CVPR, 2017.
[23] H. Su, et al., “SPLATNet: Sparse Lattice Networks for Point Cloud
Processing,” in CVPR, 2018.
[24] Y. Rao, J. Lu, and J. Zhou, “Spherical Fractal Convolutional Neural
Networks for Point Cloud Recognition,” in CVPR, 2019.
[25] S. A. Bello, et al., “Review: Deep Learning on 3D Point Clouds,” in
Remote Sensing, 2020.
[26] T.-H. Vu, et al., “ADVENT: Adversarial Entropy Minimization for
Domain Adaptation in Semantic Segmentation,” in CVPR, 2019.
[27] M. Chen, H. Xue, and D. Cai, “Domain Adaptation for Semantic
Segmentation With Maximum Squares Loss,” in ICCV, 2019.
[28] P. Morerio, J. Cavazza, and V. Murino, “Minimal-Entropy Correlation
Alignment for Unsupervised Deep Domain Adaptation,” in ICLR,
2018.
[29] J. Hoffman, et al., “CyCADA: Cycle Consistent Adversarial Domain
Adaptation,” in ICML, 2018.
[30] J.-Y. Zhu, et al., “Unpaired Image-to-Image Translation using Cycle-
Consistent Adversarial Networks,” in ICCV, 2017.
[31] Y.-H. Chen, et al., “No More Discrimination: Cross City Adaptation
of Road Scene Segmenters,” in ICCV, 2017.
[32] Y.-H. Tsai, et al., “Learning to Adapt Structured Output Space for
Semantic Segmentation,” in CVPR, 2018.
[33] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for Autonomous
Driving? The KITTI Vision Benchmark Suite,” in CVPR, 2012.
[34] J. Behley, et al., “SemanticKITTI: A Dataset for Semantic Scene
Understanding of LiDAR Sequences,” in ICCV, 2019.
[35] H. Caesar, et al., “nuScenes: A Multimodal Dataset for Autonomous
Driving,” in CVPR, 2020.
[36] Y. Pan, et al., “SemanticPOSS: A Point Cloud Dataset with Large
Quantity of Dynamic Instances,” in IV, 2020.
[37] J. Geyer, et al., “A2D2: Audi Autonomous Driving Dataset,” arXiv.org, 2020.
[38] P. Jiang and S. Saripalli, “LiDARNet: A Boundary-Aware Domain
Adaptation Model for Lidar Point Cloud Semantic Segmentation,” arXiv.org, 2020.
[39] A. Carballo, et al., “LIBRE: The Multiple 3D LiDAR Dataset,” in IV,
2020.
[40] M. Bijelic, et al., “Seeing Through Fog Without Seeing Fog: Deep
Multimodal Sensor Fusion in Unseen Adverse Weather,” in CVPR,
2020.
[41] P. Sun, et al., “Scalability in Perception for Autonomous Driving:
Waymo Open Dataset,” arXiv.org, 2020.
[42] A. Dosovitskiy, et al., “CARLA: An Open Urban Driving Simulator,”
in CoRL, 2017.
[43] X. Yue, et al., “A LiDAR Point Cloud Generator: From a Virtual
World to Autonomous Driving,” in ICMR, 2018.
[44] L. T. Triess, et al., “CNN-based synthesis of realistic high-resolution
LiDAR data,” in IV, 2019.
[45] T. Shan, et al., “Simulation-based Lidar Super-resolution for Ground
Vehicles,” arXiv.org, 2020.
[46] A. Elhadidy, et al., “Improved Semantic Segmentation of Low-
Resolution 3D Point Clouds Using Supervised Domain Adaptation,”
in NILES, 2020.
[47] F. Piewak, P. Pinggera, and M. Zöllner, “Analyzing the Cross-Sensor Portability of Neural Network Architectures for LiDAR-based Semantic Labeling,” in ITSC, 2019.
[48] L. Yi, B. Gong, and T. Funkhouser, “Complete & Label: A Domain
Adaptation Approach to Semantic Segmentation of LiDAR Point
Clouds,” arXiv.org, 2020.
[49] A. E. Sallab, et al., “LiDAR Sensor modeling and Data augmentation
with GANs for Autonomous driving,” in ICML Workshops, 2019.
[50] ——, “Unsupervised Neural Sensor Models for Synthetic LiDAR Data
Augmentation,” in NIPS Workshops, 2019.
[51] L. Caccia, et al., “Deep Generative Modeling of LiDAR Data,” in
IROS, 2019.
[52] K. Nakashima and R. Kurazume, “Learning to Drop Points for LiDAR
Scan Synthesis,” arXiv.org, 2021.
[53] O. M. Mozos, et al., “Fukuoka datasets for place categorization,” in
IJRR, 2019.
[54] S. Zhao, et al., “ePointDA: An End-to-End Simulation-to-Real Domain
Adaptation Framework for LiDAR Point Cloud Segmentation,” in
AAAI, 2021.
[55] X. Roynard, J.-E. Deschaud, and F. Goulette, “Paris-Lille-3D: A large
and high-quality ground-truth urban point cloud dataset for automatic
segmentation and classification,” in IJRR, 2018.
[56] F. Langer, et al., “Domain Transfer for Semantic Segmentation of
LiDAR Data using Deep Neural Networks,” in IROS, 2020.
[57] M. Jaritz, et al., “xMUDA: Cross-Modal Unsupervised Domain Adap-
tation for 3D Semantic Segmentation,” in CVPR, 2020.
[58] Z. Wang, et al., “Range Adaptation for 3D Object Detection in
LiDAR,” in ICCV Workshops, 2019.
[59] C. Saltori, et al., “SF-UDA3D: Source-Free Unsupervised Domain
Adaptation for LiDAR-Based 3D Object Detection,” in 3DV, 2020.
[60] Y. Li, et al., “Revisiting Batch Normalization for Practical Domain
Adaptation,” in ICLR Workshops, 2017.
[61] B. Caine, et al., “Pseudo-labeling for Scalable 3D Object Detection,”
arXiv.org, 2021.
[62] Y. Choi, et al., “StarGAN: Unified Generative Adversarial Networks
for Multi-domain Image-to-Image Translation,” in CVPR, 2018.
[63] A. Royer, et al., “XGAN: Unsupervised Image-to-Image Translation
for Many-to-Many Mappings,” in Domain Adaptation for Visual
Understanding, 2020.
[64] S. Benaim and L. Wolf, “One-Sided Unsupervised Domain Mapping,” in NIPS, 2017.
[65] Y. Taigman, A. Polyak, and L. Wolf, “Unsupervised Cross-Domain
Image Generation,” in ICLR, 2017.
[66] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets,” arXiv.org, 2014.
[67] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv.org, 2018.
[68] H. Zhao, et al., “On Learning Invariant Representations for Domain
Adaptation,” in PMLR, 2019.
[69] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep
Network Training by Reducing Internal Covariate Shift,” in ICML,
2015.
[70] Y. Wu and K. He, “Group Normalization,” in ECCV, 2018.
[71] H. Nam and H.-E. Kim, “Batch-Instance Normalization for Adaptively
Style-Invariant Neural Networks,” in NIPS, 2018.
[72] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky, “Instance Normalization:
The Missing Ingredient for Fast Stylization,” arXiv.org, 2016.
[73] W.-G. Chang, et al., “Domain-Specific Batch Normalization for Un-
supervised Domain Adaptation,” in CVPR, 2019.
[74] Y. Li and J. Ibanez-Guzman, “LiDAR for Autonomous Driving:
The Principles, Challenges and Trends for Automotive LiDAR and
Perception Systems,” in SPM, 2020.
[75] S. Royo and M. Ballesta-Garcia, “An Overview of Lidar Imaging
Systems for Autonomous Vehicles,” in Applied Sciences, 2019.
[76] “PandaSet: Public large-scale dataset for autonomous driving.” [Online]. Available: https://scale.com/open-datasets/pandaset
[77] R. Heinzler, et al., “CNN-Based Lidar Point Cloud De-Noising in
Adverse Weather,” in RA-L, 2020.
[78] C. Hubschneider, S. Roesler, and J. M. Zöllner, “Unsupervised Evaluation of Lidar Domain Adaptation,” in ITSC, 2020.
[79] P. P. Busto and J. Gall, “Open Set Domain Adaptation,” in ICCV,
2017.
[80] K. Saito, et al., “Open Set Domain Adaptation by Backpropagation,”
in ECCV, 2018.
[81] Y. Luo, et al., “Progressive Graph Learning for Open-Set Domain
Adaptation,” in ICML, 2020.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Panoptic segmentation of point clouds is a crucial task that enables autonomous vehicles to comprehend their vicinity using their highly accurate and reliable LiDAR sensors. Existing top–down approaches tackle this problem by either combining independent task-specific networks or translating methods from the image domain ignoring the intricacies of LiDAR data and thus often resulting in suboptimal performance. In this article, we present the novel top–down efficient LiDAR panoptic segmentation (EfficientLPS) architecture that addresses multiple challenges in segmenting LiDAR point clouds, including distance-dependent sparsity, severe occlusions, large scale-variations, and reprojection errors. EfficientLPS comprises of a novel shared backbone that encodes with strengthened geometric transformation modeling capacity and aggregates semantically rich range-aware multiscale features. It incorporates new scale-invariant semantic and instance segmentation heads along with the panoptic fusion module which is supervised by our proposed panoptic periphery loss function. Additionally, we formulate a regularized pseudolabeling framework to further improve the performance of EfficientLPS by training on unlabeled data. We benchmark our proposed model on two large-scale LiDAR datasets: nuScenes, for which we also provide ground truth annotations, and SemanticKITTI. Notably, EfficientLPS sets the new state-of-the-art on both these datasets.
Article
Full-text available
We propose a methodology for lidar super-resolution with ground vehicles driving on roadways, which relies completely on a driving simulator to enhance, via deep learning, the apparent resolution of a physical lidar. To increase the resolution of the point cloud captured by a sparse 3D lidar, we convert this problem from 3D Euclidean space into an image super-resolution problem in 2D image space, which is solved using a deep convolutional neural network. By projecting a point cloud onto a range image, we are able to efficiently enhance the resolution of such an image using a deep neural network. Typically, the training of a deep neural network requires vast real-world data. Our approach does not require any real-world data, as we train the network purely using computer-generated data. Thus our method is applicable to the enhancement of any type of 3D lidar theoretically. By novelly applying Monte-Carlo dropout in the network and removing the predictions with high uncertainty, our method produces high accuracy point clouds comparable with the observations of a real high resolution lidar. We present experimental results applying our method to several simulated and real-world datasets. We argue for the method’s potential benefits in real-world robotics applications such as occupancy mapping and terrain modeling.
Conference Paper
Full-text available
With the increasing reliance of self-driving and similar robotic systems on robust 3D vision, the processing of LiDAR scans with deep convolutional neu-ral networks has become a trend in academia and industry alike. Prior attempts on the challenging Semantic Scene Completion task-which entails the inference of dense 3D structure and associated semantic labels from "sparse" representations have been, to a degree, successful in small indoor scenes when provided with dense point clouds or dense depth maps often fused with semantic segmen-tation maps from RGB images. However, the performance of these systems drop drastically when applied to large outdoor scenes characterized by dynamic and exponentially sparser conditions. Likewise, processing of the entire sparse volume becomes infeasible due to memory limitations and workarounds introduce computational inefficiency as practitioners are forced to divide the overall volume into multiple equal segments and infer on each individually, rendering real-time performance impossible. In this work, we formulate a method that subsumes the sparsity of large-scale environments and present S3CNet, a sparse convolution based neu-ral network that predicts the semantically completed scene from a single, unified LiDAR point cloud. We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the art results on the SemanticKITTI benchmark [1]. Furthermore, we propose a 2D variant of S3CNet with a multi-view fusion strategy to complement our 3D network, providing robustness to oc-clusions and extreme sparsity in distant regions. We conduct experiments for the 2D semantic scene completion task and compare the results of our sparse 2D network against several leading LiDAR segmentation models adapted for bird's eye view segmentation on two open-source datasets.
Conference Paper
In this work, we investigate the potential of latent representations generated by Variational Autoencoders (VAE) to analyze and distinguish between real and synthetic data. Although the details of the domain adaptation task are not the focus of this work, we use the example of simulated lidar data adapted by a generative model to match real lidar data. To assess the resulting adapted data, we evaluate the potential of latent representations learned by a VAE. During training, the VAE aims to reduce the input data to a fixed-dimensional feature vector, while also enforcing stochastic independence between the latent variables. These properties can be used to define pseudometrics to make statements about generative models that perform domain adaptation tasks. The variational autoencoder is trained on real target data only and is subsequently used to generate distributions of feature vectors for data coming from different data sources such as simulations or the output of Generative Adversarial Networks.