Measuring the Influence of Image Preprocessing on the Rate-Distortion Performance of Video Encoding
1st Markus Hofbauer
Chair of Media Technology
Technical University of Munich
Munich, Germany
markus.hofbauer@tum.de
2nd Christopher B. Kuhn
Chair of Media Technology
Technical University of Munich
Munich, Germany
christopher.kuhn@tum.de
3rd Goran Petrovic
BMW Group
Munich, Germany
goran.petrovic@bmw.de
4th Eckehard Steinbach
Chair of Media Technology
Technical University of Munich
Munich, Germany
eckehard.steinbach@tum.de
Abstract—In image and video coding, preprocessing the images makes it possible to increase the perceptual quality or to control the bitrate. In this paper, we conduct an extensive analysis of the rate-distortion (RD) performance achieved by applying different preprocessing steps before encoding the video. We propose a novel evaluation method called the Mean Saving-Cost Ratio (MSCR) to compare the RD performance of different preprocessing algorithms. We define the MSCR as the logarithmic mean ratio of maximum bitrate savings over maximum quality cost for all parameters of a preprocessing algorithm. Further, we calculate the Bjøntegaard Delta Rate for every quantization parameter (QP) between two RD curves at the respective QP. The resulting Bjøntegaard Delta curves allow for comparing two preprocessing algorithms over a range of QPs. In our experiments, we use the proposed MSCR to compare different preprocessing algorithms: a Gaussian low-pass filter, a median filter, and a JPEG compressor. Overall, the Gaussian low-pass filter shows the best RD performance according to the MSCR.
Index Terms—Image Preprocessing, Video Coding, Rate-Distortion Performance
I. INTRODUCTION
Spatial low-pass filtering of images is a typical preprocessing technique used to remove details from the image content [1]. Encoding images or video preprocessed this way reduces artifacts, improves the quality, or reduces the bitrate. In [2] and [3], the authors proposed to filter the entire image in a preprocessing step to reduce the perceptibility of coding artifacts at higher quantization. Karlsson et al. [4] proposed region-of-interest (ROI) video coding using Gaussian preprocessing filters. Applying the filter to the background reduces the bitrate required to encode the image while the foreground remains at a constant quality. Alternatively, the quality of the foreground can be increased while keeping the bitrate of the entire image constant. Huang et al. [5] used a similar preprocessing approach to reduce the bitrate required for video streaming in mobile environments.
Grois et al. [6] proposed a complexity-aware adaptive ROI prefiltering scheme for scalable video coding. The authors applied the prefilters dynamically with a transition region between foreground and background to improve visual presentation quality. They used Gaussian, Wiener, and wavelet filters, with the Gaussian filter showing the best performance trade-off due to its low computational complexity [7], [8].
In our previous work, we used preprocessing filters such as a median filter [9] or a Gaussian filter [10] to enable individual rate/quality adaptation of live multi-view video streams when only a single encoder is available. We combined the frames of multiple camera views into a single superframe, which is then encoded by a single encoder. Preprocessing the individual frames before building the superframe enables rate control of individual camera views in systems with limited encoding hardware, such as vehicles considered for teleoperated driving [10]–[12]. When using the preprocessing filters for the purpose of rate control, such as in [9], [10], we are mainly interested in the rate savings that can be achieved and the quality costs required for these savings, in order to identify the most suitable preprocessing algorithm.
While each of the preprocessing methods achieves certain rate-quality gains, these gains were only measured for specific quantization parameters (QPs). In [6], for instance, the authors encode the background with a fixed QP of 30 and the foreground with a variable QP between 20 and 30. To effectively compare the performance of different preprocessing filters, the rate-quality gains have to be evaluated over a larger range of QPs and for different preprocessing parameters.
In this paper, we systematically evaluate the influence of preprocessing algorithms on the rate-distortion (RD) performance in video encoding. We first analyze the RD curves of a Gaussian filter, a median filter, and a JPEG preprocessor. We select the JPEG preprocessor as an easily available implementation of a Discrete Cosine Transform (DCT), representing a filter optimized for the human visual system. Based on this analysis and the additional dimension introduced by the preprocessing algorithms, we propose a novel evaluation method named Mean Saving-Cost Ratio (MSCR) to effectively compare the RD performance of different preprocessing algorithms and their parameters. We define the MSCR as the logarithmic mean ratio of maximum bitrate saving over maximum quality cost for all parameters of a preprocessing algorithm. We calculate the bitrate savings and quality costs for every preprocessed video relative to the baseline without preprocessing. Using the ratio of both maxima results in a single score describing the trade-off between bitrate savings and quality costs. Further, we calculate the Bjøntegaard Delta Rate for every quantization parameter (QP) between two RD curves of different preprocessing filters at the respective QP. The resulting Bjøntegaard Delta (BD) curves allow for comparing two preprocessing filters over a range of QPs. The BD curves demonstrate that the filter achieving the best RD performance depends on the specific choice of QP. Overall, the Gaussian low-pass filter shows the best performance according to the MSCR, validating its wide use in the literature [7], [8].

Fig. 1: Rate-distortion (RD) analysis for the preprocessing filters Gaussian (k=3 and k=5), median, and JPEG: (a) Gaussian filter k=3, (b) Gaussian filter k=5, (c) median filter, (d) JPEG filter. Every column contains the RD curves (VMAF over bitrate, QP from 45 down to 24) for one of the preprocessing filters. The first row shows the RD curves, color-coded by the filter parameter. The second row color-codes the RD curves per QP.
The rest of the paper is structured as follows. In Section II, we analyze the influence of several preprocessing filters on the RD performance of the encoder. In Section III, we introduce the proposed evaluation methods in more detail. We present the evaluation in Section IV. Section V concludes the paper.
II. RATE-DISTORTION ANALYSIS
In this section, we analyze the rate-distortion (RD) performance of the encoder for preprocessed input video sequences. We use the uncompressed Bus video sequence in CIF resolution from [13], as also used in related work [10], [14]. We create different preprocessed video sequences from Bus using the following preprocessing filters and filter parameters: a Gaussian filter with a kernel size k = 3 and standard deviations σ ∈ {0.5, 0.6, 0.7, 0.8, 1.0, 1.5}, a Gaussian filter with k = 5 and the same values of σ, a median filter with kernel sizes k ∈ {3, 5, 7, 9}, and a JPEG preprocessor with quality levels Q ∈ {10, 20, 40, 60}. We selected the Gaussian and median filters because they are the most commonly used filters and have low computational complexity [4], [6]–[10].
We encode the preprocessed video sequences using the x264 software video encoder [15] in constant quantization parameter mode. The quantization parameters (QPs) used for encoding range from 24 to 45 with a step size of 1. Further, we select a group-of-pictures (GoP) length of 1, meaning I-frames only. We measure the video quality of each encoded video sequence using the state-of-the-art video quality metric VMAF [16], [17].
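As an illustration of this setup, the following minimal sketch shows how such preprocessed, constant-QP encodings could be produced. It assumes OpenCV for the filters and an ffmpeg build with libx264 and libvmaf; the file names and helper functions (preprocess_frame, encode_constant_qp, measure_vmaf) are illustrative, not part of the paper's toolchain.

```python
import subprocess
import cv2

def preprocess_frame(frame, method, param):
    """Apply one of the evaluated preprocessing filters to a BGR frame."""
    if method == "gauss":    # param = (kernel size k, standard deviation sigma)
        k, sigma = param
        return cv2.GaussianBlur(frame, (k, k), sigma)
    if method == "median":   # param = kernel size k
        return cv2.medianBlur(frame, param)
    if method == "jpeg":     # param = JPEG quality level Q
        _, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, param])
        return cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return frame             # "none": unfiltered baseline

def encode_constant_qp(src_path, dst_path, qp, gop=1):
    """Encode a (preprocessed) sequence with x264 in constant-QP mode."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path,
         "-c:v", "libx264", "-qp", str(qp), "-g", str(gop), dst_path],
        check=True)

def measure_vmaf(dist_path, ref_path):
    """Compute VMAF of the encoded video against the unprocessed source."""
    subprocess.run(
        ["ffmpeg", "-i", dist_path, "-i", ref_path,
         "-lavfi", "libvmaf=log_path=vmaf.json:log_fmt=json", "-f", "null", "-"],
        check=True)

# Sweep the QP range used in the paper (24..45, step size 1).
for qp in range(24, 46):
    encode_constant_qp("bus_gauss_3_05.y4m", f"bus_gauss_3_05_qp{qp}.mp4", qp)
```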
Figure 1 shows the resulting RD curves for the four preprocessing filters used. Every column represents one preprocessing filter. The first row presents the RD curves over all QPs for the specific filter parameters. The blue curve labeled none represents the baseline without any preprocessing applied. The second row contains the same RD points as the first row, but groups the RD curves per QP. From now on, we refer to these curves as RD-QP curves.
First, we observe smooth RD curves for the Gaussian filters and the median filter. In contrast, the JPEG preprocessor produces a jagged RD curve. At the same time, the JPEG preprocessor still reaches high VMAF scores of 70 to 80 even for a quality level of 10. For the Gaussian filters, a kernel size of 5 achieves higher rate savings while at the same time causing higher quality costs. The median filter already has a strong influence on the VMAF score at a kernel size of 3.
This first analysis demonstrates the high dimensionality of the problem at hand. To effectively compare the performance of different preprocessing filters, we require a new evaluation method, which we present next.
III. EVALUATION METHOD
In this section, we present our evaluation methods for comparing the performance of different preprocessing filters. When using the preprocessing filters for the purpose of rate control, such as in [9], [10], we are mainly interested in the rate savings that can be achieved and the quality costs required for these savings. Hence, we calculate the quality costs and rate savings for every point on the RD curves. We define the rate saving $S(qp)$ for a certain QP $qp \in \{24, \dots, 45\}$ as the difference between the baseline bitrate $R_{\mathrm{none}}(qp)$ without any preprocessing applied and the preprocessed video bitrate $R_{\mathrm{filter}}(qp)$:

$$S(qp) = R_{\mathrm{none}}(qp) - R_{\mathrm{filter}}(qp) \quad (1)$$

Similarly, we define the quality costs $C(qp)$ as the quality difference between the baseline distortion $D_{\mathrm{none}}(qp)$ and the preprocessed filter distortion $D_{\mathrm{filter}}(qp)$:

$$C(qp) = D_{\mathrm{none}}(qp) - D_{\mathrm{filter}}(qp) \quad (2)$$

Fig. 2: Quality costs C(VMAF) over bitrate savings S(Bitrate) of the individual preprocessing filters compared to the unfiltered baseline: (a) Gaussian filter k=3, (b) Gaussian filter k=5, (c) median filter, (d) JPEG filter.

Fig. 3: Bjøntegaard Delta (BD-Rate [%] and BD-VMAF) per QP between the RD-QP curves of the Gaussian filter with k=3 and three other preprocessing filters: (a) Gaussian filter k=5, (b) median filter, (c) JPEG filter.
Figure 2 visualizes the resulting quality costs over the corresponding rate savings. From now on, we refer to these as CS curves. The CS curves show the inverse trend of the RD curves: the higher the rate savings, the higher the quality difference compared to the unfiltered baseline. For the Gaussian filter with k=3 and a low σ, the quality costs peak for QPs in the range of 30 to 40.
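The computation behind the CS curves is straightforward. The following minimal sketch implements Eqs. (1) and (2), assuming the per-QP bitrates and VMAF scores have already been collected into dictionaries (all names are illustrative):

```python
QPS = range(24, 46)  # QP range used in the experiments

def saving_cost_curve(rate_none, rate_filt, vmaf_none, vmaf_filt):
    """Return the CS curve: rate savings S(qp), Eq. (1), and quality costs C(qp), Eq. (2)."""
    savings = {qp: rate_none[qp] - rate_filt[qp] for qp in QPS}
    costs = {qp: vmaf_none[qp] - vmaf_filt[qp] for qp in QPS}
    return savings, costs
```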
We use the Bjøntegaard Delta Rate (BDR) [18] and the Bjøntegaard Delta of the VMAF score (BD-VMAF) to compare the RD-QP curves of two different preprocessing filters at the same QP. We use the Gaussian filter with k=3 as the reference and calculate the BDR and BD-VMAF for every RD-QP curve and preprocessing filter. Figure 3 visualizes the resulting BDR and BD-VMAF scores per QP. We refer to these as Bjøntegaard Delta (BD) curves.
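The per-QP Bjøntegaard Delta can be computed with the standard procedure from [18], applied to the two RD-QP curves at a given QP. The sketch below, using only numpy and an illustrative function name, fits a cubic polynomial of log-rate over quality and integrates both fits over the overlapping quality range; BD-VMAF is obtained analogously by swapping the roles of rate and quality:

```python
import numpy as np

def bd_rate(rates_ref, qual_ref, rates_test, qual_test):
    """BD-Rate [%] of a test RD-QP curve relative to a reference RD-QP curve."""
    # Fit log10(rate) as a cubic polynomial of quality (VMAF here).
    p_ref = np.polyfit(qual_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(qual_test, np.log10(rates_test), 3)
    # Integrate both fits over the overlapping quality interval.
    lo = max(min(qual_ref), min(qual_test))
    hi = min(max(qual_ref), max(qual_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    # Positive result: the test filter needs more bitrate for the same quality.
    return (10 ** avg_log_diff - 1) * 100
```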
The Gaussian filter with k=5 performs slightly worse than the filter with k=3. The encoder requires 0.4 % to 1.3 % more bitrate to reach the same quality, or reaches 0.7 to 0.2 lower VMAF scores at the same bitrate. Additionally, we observe that the BDR differences become smaller for increasing QPs.
For the median filter, we observe that for low QPs the encoder requires up to 22 % more bitrate to reach the same quality. With increasing QPs, the BDR decreases, until at QPs 44 and 45 the median filter performs better than the reference, the Gaussian filter with k=3.
Lastly, the JPEG filter shows a better performance than the Gaussian filter with k=3 for QPs lower than 38. For QPs of 40 and higher, we observe large jumps in the BD curves. This can be explained by the artifacts introduced by the JPEG preprocessor. Encoding these artifacts leads to the inconsistent RD curves shown in Figure 1d. Calculating the BDR and BD-VMAF from such inconsistent curves results in noticeable variations.
This varying performance further highlights the motivation for our extensive analysis. To allow for a simple RD performance comparison of different preprocessing filters, we propose the Mean Saving-Cost Ratio (MSCR). We define the MSCR as the logarithmic mean ratio of maximum bitrate savings over maximum quality cost for all parameters of a preprocessing filter:
$$\mathrm{MSCR} = \log_{10}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\max\left(\{S_i(qp) : qp = 24,\dots,45\}\right)}{\max\left(\{C_i(qp) : qp = 24,\dots,45\}\right)}\right) \quad (3)$$

with $i$ as the preprocessing parameter of a certain filter and $N$ as the number of parameter settings. A high MSCR value means high bitrate savings with low quality costs.
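A minimal sketch of Eq. (3), assuming the per-parameter savings and costs are available as in the CS-curve sketch above (all names are illustrative):

```python
import math

def mscr(savings_per_param, costs_per_param):
    """Mean Saving-Cost Ratio, Eq. (3), over all N parameter settings of one filter."""
    # One maximum-saving / maximum-cost ratio per filter parameter i.
    ratios = [max(s.values()) / max(c.values())
              for s, c in zip(savings_per_param, costs_per_param)]
    # Logarithm of the mean ratio over all N parameters.
    return math.log10(sum(ratios) / len(ratios))
```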
The MSCR together with the BD curves allows for effectively comparing the performance of different preprocessing filters. The MSCR provides a single score, while the BD curves enable us to analyze the BDR and BD-VMAF over a range of QPs. These two metrics can be seen as an approach similar to the area under the curve (AUC) and the Receiver Operating Characteristic (ROC) curve, which are widely used in the field of Machine Learning to compare the performance of different models [19]–[21]. Next, we use the proposed MSCR for an extensive comparison of the different preprocessing filters.
IV. EVALUATION
In this section, we use the proposed evaluation method to analyze the RD performance of different preprocessing filters. So far, we have analyzed the Bus video sequence at CIF resolution, encoded with a GoP length of 1 using the x264 video encoder. Here, we present the full evaluation for multiple video sequences with different spatial and temporal complexity [10]. For CIF resolution, we evaluate four video sequences [13] with GoP lengths of 1 and 20 using the x264 video encoder. Additionally, we use the NVIDIA HEVC [22] hardware video encoder at full HD resolution with GoP lengths of 1 and 20 on three test video sequences [23]. For both resolutions and video codecs, we use the mean bitrate and mean VMAF scores of the respective video sequences. We calculate the MSCR for the two GoP lengths, resolutions, and codecs following the evaluation method presented in Section III. Table I shows the resulting MSCR scores.
Filter  | x264-CIF-1 | x264-CIF-20 | HEVC-HD-1 | HEVC-HD-20
JPEG    | 2.45       | -4.00       | —         | 1.36
Gauss-3 | 1.96       | 1.41        | 3.42      | 3.02
Gauss-5 | 1.93       | 1.36        | 3.37      | 2.97
Median  | 1.72       | 0.94        | 3.12      | 2.68
TABLE I: Mean Saving-Cost Ratio (MSCR) for the test video sequences at CIF and HD resolution and GoP lengths of 1 and 20, encoded with x264 or NVIDIA HEVC, respectively.
For a GoP length of 1, we observe the highest MSCR for the JPEG preprocessor, followed by the Gaussian filter. We selected the JPEG preprocessor as a readily available implementation of a DCT, representing a filter optimized for the human visual system. In practice, however, JPEG would usually not be considered as a preprocessor, and, as shown in Figure 3, it exhibits unpredictable inconsistencies in the BD curves. Further, for a GoP length of 20, JPEG performs worst, while the Gaussian filter achieves the highest MSCR.

The low MSCR values of the JPEG preprocessor with a GoP length of 20 can be explained by the artifacts introduced by the JPEG preprocessor. These artifacts affect the motion estimation of the inter-frame coding enabled with a GoP length larger than 1. The Gaussian and the median filter only blur the image content, which does not affect the inter-frame coding. Here, the Gaussian filter with a kernel size of k=3 achieves the highest MSCR and clearly outperforms the median filter.
V. CONCLUSION
In this paper, we conducted an extensive analysis of the rate-distortion (RD) performance of video coding considering the influence of different preprocessing filters. Due to the additional dimension introduced by the preprocessing, we proposed a novel evaluation method called the Mean Saving-Cost Ratio (MSCR) to effectively compare the RD performance including the preprocessing aspect. We define the MSCR as the logarithmic mean ratio of maximum bitrate savings over maximum quality cost for all parameters of a preprocessing filter. Further, we calculate the Bjøntegaard Delta Rate for every quantization parameter (QP) between two RD curves of different preprocessing filters at the respective QP. The resulting Bjøntegaard Delta (BD) curves allow for comparing two preprocessing filters over a range of QPs.

The BD curves demonstrate that the filter achieving the best RD performance depends on the specific choice of QP. In our experiments, we used the proposed MSCR to compare different preprocessing filters. Overall, the Gaussian low-pass filter shows the best performance according to the MSCR. To the best of our knowledge, this is the first metric that allows for measuring the influence of preprocessing filters on the RD performance of the encoder.
For future work, the proposed MSCR and the BD curves can be used to compare further preprocessing filters. Further, the MSCR could be used as a metric for designing new Machine Learning based preprocessing filters.
REFERENCES
[1] Timothy Popkin, Andrea Cavallaro, and David Hands, "Accurate and Efficient Method for Smoothly Space-Variant Gaussian Blurring," IEEE Transactions on Image Processing, vol. 19, no. 5, pp. 1362–1370, May 2010.
[2] Shijun Sun, Cheng Chang, and Stacey Spears, "Filtering and dithering as pre-processing before encoding," June 2014, US Patent 8,750,390.
[3] Brian Astle, "Image signal encoding with variable low-pass filter," Feb. 2000, US Patent 6,026,190.
[4] L. S. Karlsson and M. Sjöström, "Improved ROI video coding using variable Gaussian pre-filters and variance in intensity," in IEEE International Conference on Image Processing 2005, Genova, Italy, 2005, pp. II-313, IEEE.
[5] Hong-jie Huang, Xing-ming Zhang, and Zhi-wei Xu, "Semantic Video Adaptation using a Preprocessing Method for Mobile Environment," in 2010 10th IEEE International Conference on Computer and Information Technology, Bradford, United Kingdom, June 2010, pp. 2806–2810, IEEE.
[6] Dan Grois and Ofer Hadar, "Complexity-Aware Adaptive Preprocessing Scheme for Region-of-Interest Spatial Scalable Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 6, pp. 1025–1039, June 2014.
[7] Dan Grois and Ofer Hadar, "Efficient adaptive bit-rate control for Scalable Video Coding by using Computational Complexity-Rate-Distortion analysis," in 2011 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Nuremberg, Germany, June 2011, pp. 1–6, IEEE.
[8] Dan Grois and Ofer Hadar, "Efficient Region-of-Interest Scalable Video Coding with Adaptive Bit-Rate Control," Advances in Multimedia, vol. 2013, pp. 1–17, 2013.
[9] Markus Hofbauer, Christopher Kuhn, Goran Petrovic, and Eckehard Steinbach, "Adaptive multi-view live video streaming for teledriving using a single hardware encoder," in 2020 IEEE International Symposium on Multimedia (ISM), Naples, Italy, 2020, pp. 9–16.
[10] Markus Hofbauer, Christopher B. Kuhn, Goran Petrovic, and Eckehard Steinbach, "Preprocessor rate control for adaptive multi-view live video streaming using a single encoder," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–16, 2022.
[11] M. Hofbauer, C. B. Kuhn, G. Petrovic, and E. Steinbach, "TELECARLA: An open source extension of the CARLA simulator for teleoperated driving research using off-the-shelf components," in 2020 IEEE Intelligent Vehicles Symposium (IV), 2020, pp. 335–340.
[12] Markus Hofbauer, Christopher B. Kuhn, Mariem Khlifi, Goran Petrovic, and Eckehard Steinbach, "Traffic-aware multi-view video stream adaptation for teleoperated driving," in 2022 IEEE 95th Vehicular Technology Conference (VTC2022-Spring), Helsinki, Finland, June 2022, pp. 1–7, IEEE.
[13] "YUV video sequences," http://trace.eas.asu.edu/yuv/.
[14] Christian Lottermann and Eckehard Steinbach, "Modeling the bit rate of H.264/AVC video encoding as a function of quantization parameter, frame rate and GoP characteristics," in 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Chengdu, China, July 2014, pp. 1–6, IEEE.
[15] VideoLAN, "x264."
[16] Netflix Technology Blog, "Toward A Practical Perceptual Video Quality Metric," Apr. 2017.
[17] Netflix Technology Blog, "VMAF: The Journey Continues," Oct. 2018.
[18] Gisle Bjøntegaard, "Calculation of average PSNR differences between RD-curves," ITU-T VCEG document VCEG-M33, Apr. 2001.
[19] Christopher Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard Steinbach, "Introspective black box failure prediction for autonomous driving," in 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 2020, pp. 1907–1913.
[20] Christopher Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard Steinbach, "Introspective failure prediction for autonomous driving using late fusion of state and camera information," IEEE Transactions on Intelligent Transportation Systems, pp. 1–15, 2020.
[21] Christopher B. Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard Steinbach, "Trajectory-based failure prediction for autonomous driving," in 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 2021, pp. 980–986.
[22] NVIDIA Corporation, "NVIDIA Video Codec SDK," 2021, accessed on: 2021-04-16.
[23] "Y4M video sequences," https://media.xiph.org/video/derf/.
This paper presents a complexity-aware adaptive spatial pre-processing scheme for the efficient Scalable Video Coding (SVC) by employing an adaptive pre-filter for each SVC layer. According to the presented scheme, a dynamic transition region is defined between the Region-of-Interest (ROI) and background within each video frame, and then various parameters of each pre-filter (such as the standard deviation, kernel matrix size, and also a number of filters for the dynamic pre-processing of a transition region between the ROI and background) are adaptively varied. The presented scheme has proved to be very efficient since it is based on an SVC computational Complexity-Rate-Distortion (C-R-D) analysis, thereby adding a complexity dimension to the conventional SVC Rate-Distortion (R-D) analysis. As a result, the encoding computational complexity resources are significantly reduced, which is especially useful for portable encoders with limited power resources. The performance of the presented adaptive spatial pre-processing scheme is evaluated and tested in detail both from the computational complexity and visual presentation quality points of view, further comparing it to the Joint Scalable Video Model reference software (JSVM 9.19) and demonstrating significant improvements.