First Attempt to Build Realistic Driving
Scenes using Video-to-video Synthesis
in OpenDS Framework
Abstract
Existing programmable simulators enable researchers
to customize different driving scenarios to conduct in-
lab automotive driver simulations. However, software-
based simulators for cognitive research generate and
maintain their scenes with the support of 3D engines,
which may affect users’ experiences to a certain degree
since they are not sufficiently realistic. Now, a critical
issue is the question of how to build scenes into real-
world ones. In this paper, we introduce the first step in
utilizing video-to-video synthesis, which is a deep
learning approach, in OpenDS framework, which is an
open-source driving simulator software, to present
simulated scenes as realistically as possible. Off-line
evaluations demonstrated promising results from our
study, and our future work will focus on how to merge
them appropriately to build a close-to-reality, real-time
driving simulator.
Author Keywords
Video Synthesis; Driving Simulator; Machine Learning;
* denotes equal contribution
Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for third-party components of this work must be honored. For all other
uses, contact the Owner/Author.
AutomotiveUI '19 Adjunct, September 21-25, 2019, Utrecht, Netherlands
© 2019 Copyright is held by the owner/author(s).
ACM ISBN 978-1-4503-6920-6/19/09.
https://doi.org/10.1145/3349263.3351497
Zili Song* (zy22063@nottingham.edu.cn), Shuolei Wang* (zy22067@nottingham.edu.cn), Weikai Kong (scywk1@nottingham.edu.cn), Xiangjun Peng (zy22056@nottingham.edu.cn), and Xu Sun (xu.sun@nottingham.edu.cn)
University of Nottingham, Ningbo, Zhejiang
Introduction
Existing programmable simulators enable researchers to customize different driving scenarios to conduct in-lab automotive driver simulations. However, software-based simulators for cognitive research generate and maintain their scenes with the support of 3D engines, which may affect users' experiences to a certain degree since they are not sufficiently realistic. A critical issue, therefore, is how to present scenes that are more realistic.
In this paper, we introduce our work-in-progress towards a close-to-reality, real-time cognitive driving simulator that enhances the user experience during in-lab studies. We first generated several video clips from the OpenDS framework, a free, portable, and open-source driving simulator [5]. We then transformed those clips into colored frames based on a labeling policy. Finally, we used video-to-video synthesis, a deep learning approach from computer vision, as a subsystem to build realistic scenes [6]. Off-line evaluations demonstrated both promising results and outstanding challenges for video-to-video synthesis. Our future work will focus on how to merge these components appropriately to achieve realism.
Motivation
In this section, we use an example to illustrate our motivation by analyzing the functionality of OpenDS and its drawbacks.
Before launching OpenDS, a set of images is stored in advance for later scene generation; an example is shown in Figure 1. Figure 2 shows an example of a 3D building model, while Figure 3 shows this model in a scenario. OpenDS then builds the scene by pasting these "stickers"¹ onto the 3D building model. Finally, the model appears in the simulated scenario and rotates as the viewpoint changes, as shown in Figure 4.
OpenDS gives researchers the freedom to build and customize their scenarios. However, the rotation of simulated scenes makes an originally clear image become blurred as the visual field changes. Buildings along the road suffer from this the most.
Our Approach: Video-to-Video Synthesis
We chose Video-to-Video Synthesis (vid2vid) to generate realistic scenes. We aim to achieve a close-to-reality simulation for users, with minimal adjustments to the OpenDS framework so that its original features are preserved. There are three reasons why we chose vid2vid.
First, vid2vid is the most suitable framework for video generation. Previous work on building realistic scenes was limited in practicality because it applied image-to-image synthesis [3], which led to drift in the video flow when merging images into one video.
Second, vid2vid is an extensible framework that can serve different scene requirements. The current version of vid2vid relies on Cityscapes, an open-source, high-resolution dataset of German street views recorded while driving [1]. It can be generalized to other places as needed, provided suitable data is available, such as ApolloScape (a similar dataset of street views in China) [2].
Third, vid2vid is a portable framework. Most implementations of vid2vid are built on PyTorch, a cross-platform, open-source machine learning framework [4]. This allows vid2vid to be embedded alongside OpenDS without changing the operating system.
¹ Refers to the 2D building images, such as the one in Figure 1.
Figure 1: A 2D building "sticker".
Figure 2: A 3D building example.
Figure 3: An example of the scene without the "sticker".
Figure 4: An example of the scene with the "sticker".
Our approach works as follows: first, we transform the simulated frames into labeled versions, as highlighted in Figure 5. Then, we apply vid2vid to the labeled versions to create realistic driving scenes, as shown in Figure 6.
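To make the labeling step concrete, below is a minimal sketch of how one simulated frame could be converted into a grayscale label map of the kind vid2vid consumes. The color palette, matching tolerance, and class IDs are illustrative assumptions rather than the exact mapping used in our pipeline.

```python
# Hypothetical labeling step: map approximate scene colors to Cityscapes-style
# train IDs and save the result as a grayscale label map. The palette and
# tolerance below are assumptions for illustration only.
import numpy as np
from PIL import Image

PALETTE = {
    (128, 64, 128): 0,   # road
    (70, 70, 70):   2,   # building
    (107, 142, 35): 8,   # vegetation
    (70, 130, 180): 10,  # sky
    (0, 0, 142):    13,  # car
}

def frame_to_label_map(frame_path, out_path):
    """Convert one simulated frame into a grayscale label map."""
    rgb = np.asarray(Image.open(frame_path).convert("RGB"), dtype=np.int32)
    labels = np.full(rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unlabeled
    for color, train_id in PALETTE.items():
        # Tolerant match, since rendered colors are not pixel-exact.
        mask = np.abs(rgb - np.array(color)).sum(axis=-1) < 30
        labels[mask] = train_id
    Image.fromarray(labels, mode="L").save(out_path)

if __name__ == "__main__":
    frame_to_label_map("opends_frame_0001.png", "label_0001.png")
```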
Experimental Design
We conducted our experimental study to show the effects of our approach in four steps. First, we selected the driving scenario "Paris", a standard scenario from OpenDS, and programmed a specific route for the driving simulator to follow automatically. Then, we drove the same route under three different environmental settings (i.e., sunny, rainy, and night time) and recorded them. Next, we trained on grayscale versions of those videos. Finally, we produced the results via the vid2vid framework.
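As an illustration of the recording and grayscale steps, the sketch below splits a recorded clip into grayscale frames with OpenCV. The file names and directory layout are hypothetical; only the overall flow mirrors our procedure.

```python
# Hypothetical preprocessing: split each recorded OpenDS clip into grayscale
# frames so they can be fed into the labeling and training stages.
import os
import cv2

def extract_grayscale_frames(video_path, out_dir):
    """Read a recorded driving clip and save its frames as grayscale PNGs."""
    os.makedirs(out_dir, exist_ok=True)
    capture = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(out_dir, "frame_{:05d}.png".format(index)), gray)
        index += 1
    capture.release()
    return index

if __name__ == "__main__":
    for setting in ("sunny", "rainy", "night"):  # three environmental settings
        extract_grayscale_frames("paris_{}.mp4".format(setting),
                                 os.path.join("frames", setting))
```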
The experimental procedure is shown step by step in Figure 7, Figure 8, and Figure 9, which present the simulated, labeled, and synthesized versions respectively. In each figure, the left image shows driving on a sunny day, the middle image driving on a rainy day, and the right image driving at night.
Figure 7: Three examples of driving scenes in the OpenDS framework, under the environmental settings of a sunny day, a rainy day, and night (from left to right).
Figure 8: The three images in labeled versions after transformation from Figure 7, under the environmental settings of a sunny day, a rainy day, and night (from left to right).
Figure 5: A standard example of a labeled version from vid2vid. Please note that the actual ones used in training are in grayscale.
Figure 6: A standard example of a realistic driving scene built by vid2vid.
System Configurations
All implementations and experiments were conducted remotely on a server with 3 CPU cores and 20 GB of memory. We used Python 3.5.2 and PyTorch 0.4.0, and added a TITAN X GPU to support vid2vid. The host OS is macOS 10.14.1 and the guest OS is Ubuntu 16.04.
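As a minimal sketch, the following check could be run on the remote server before launching vid2vid; the commented version numbers reflect our setup rather than hard requirements.

```python
# Environment sanity check before running vid2vid on the remote server.
import sys
import torch

def check_environment():
    print("Python:", sys.version.split()[0])   # 3.5.2 in our setup
    print("PyTorch:", torch.__version__)       # 0.4.0 in our setup
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))  # TITAN X in our setup
    else:
        print("No CUDA device found; vid2vid training would be impractical.")

if __name__ == "__main__":
    check_environment()
```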
Figure 9: Final results after processing the driving scenes in Figure 8 via vid2vid, under the environmental settings of a sunny day, a rainy day, and night (from left to right).
Preliminary Results
Our preliminary results show that the overall quality of realistic driving scenes built using vid2vid is acceptable. For example, the left image in Figure 9 shows a fairly realistic road scene. However, two issues remain to be explored further. First, distant parts of the scene are not rendered well. Second, the edges of the visual field are also unclear. Both issues may be due to the different resolutions of the two series of images².
The rest of Figure 9 shows relatively poor results under changed environmental settings. In the middle image, raindrops blur the driving scene, resulting in poor clarity. The right image shows that the night scene could not be synthesized, because the dataset originally used to train the video-to-video synthesis model did not contain any night-time driving.
² One refers to the series of images recorded from OpenDS, and the other refers to the series of images from the supporting dataset.
Discussions
Based on our preliminary results, we summarize two major directions for further optimization:
Optimization under different environmental settings. Existing results show that our approach does not yet extend to driving scenes under different environmental settings. We plan to optimize it by adding more sample images for model training.
Optimization on edges. Our results show that our approach can sketch the street views while driving but performs poorly at the edges of the visual field. We plan to optimize it by increasing the differences between class labels, which may reduce confusion during model training; a sketch of this idea is given below.
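A minimal sketch of how the label-separation idea might be implemented: assign classes grayscale values spaced as far apart as possible, so adjacent classes are easier to distinguish during training. The class list and value range are illustrative assumptions.

```python
# Hypothetical label assignment that maximizes the spacing between class values.
CLASSES = ["road", "sidewalk", "building", "vegetation", "sky", "car"]

def spaced_label_values(classes, max_value=255):
    """Assign each class a grayscale value spaced evenly across [0, max_value]."""
    step = max_value // (len(classes) - 1)
    return {name: i * step for i, name in enumerate(classes)}

if __name__ == "__main__":
    # e.g. {'road': 0, 'sidewalk': 51, 'building': 102, 'vegetation': 153, ...}
    print(spaced_label_values(CLASSES))
```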
Conclusion and Future Work
In this paper, we explored the use of vid2vid within the OpenDS framework. The preliminary results showed both its promise and its outstanding challenges. Our future work will focus on optimizing the application of vid2vid in OpenDS to build realistic driving scenes.
Acknowledgements
We thank the anonymous reviewers for their valuable feedback. This work is generously supported by the funding body of Ningbo Creative Industry Park and the Summer Research Program at the University of Nottingham Ningbo China.
References
1. Marius Cordts, Mohamed Omran, Sebastian Ramos,
Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson,
Uwe Franke, Stefan Roth, and Bernt Schiele. 2016.
The cityscapes dataset for semantic urban scene
understanding. In Proceedings of the IEEE
conference on computer vision and pattern
recognition (CVPR '16), 3213-3223.
2. Xinyu Huang, Xinjing Cheng, Qichuan Geng, Binbin
Cao, Dingfu Zhou, Peng Wang, Yuanqing Lin, and
Ruigang Yang. 2018. The apolloscape dataset for
autonomous driving. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition Workshops (CVPR Workshops '18), 954-960.
3. Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei
A. Efros. 2017. Image-to-image translation with
conditional adversarial networks. In Proceedings of
the IEEE conference on computer vision and
pattern recognition (CVPR '17), 1125-1134.
4. Nikhil Ketkar. 2017. Introduction to pytorch.
In Deep learning with python. Apress, Berkeley, CA,
195-208.
5. Rafael Math, Angela Mahr, Mohammad M. Moniri,
and Christian Müller. 2012. OpenDS: A new open-
source driving simulator for research. In Adjunct
Proceedings of the International Conference on
Automotive User Interfaces and Interactive
Vehicular Applications (Adjunct AutoUI '12), 7-8.
6. Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Nikolai
Yakovenko, Andrew Tao, Jan Kautz, and Bryan
Catanzaro. 2018. Video-to-Video Synthesis. In
Proceedings of the Annual Conference on Neural
Information Processing Systems (NIPS '18), 1152-
1164.