All content in this area was uploaded by Markus Hofbauer on Nov 02, 2022
Measuring the Influence of Image Preprocessing on
the Rate-Distortion Performance of Video Encoding
1st Markus Hofbauer
Chair of Media Technology
Technical University of Munich
Munich, Germany
markus.hofbauer@tum.de
2nd Christopher B. Kuhn
Chair of Media Technology
Technical University of Munich
Munich, Germany
christopher.kuhn@tum.de
3rd Goran Petrovic
BMW Group
Munich, Germany
goran.petrovic@bmw.de
4th Eckehard Steinbach
Chair of Media Technology
Technical University of Munich
Munich, Germany
eckehard.steinbach@tum.de
Abstract—In image and video coding, preprocessing of the
images makes it possible to increase the perceptual quality or to control
the bitrate. In this paper, we conduct an extensive analysis of
the rate-distortion (RD) performance achieved by using different
preprocessing steps before encoding the video. We propose a novel
evaluation method called the Mean Saving-Cost Ratio (MSCR)
to compare the RD performance for different preprocessing
algorithms. We define MSCR as the logarithmic mean ratio
of maximum bitrate savings over maximum quality cost for all
parameters of a preprocessing algorithm. Further, we calculate
the Bjøntegaard Delta Rate for every quantization parameter
(QP) between two RD curves at the respective QP. The resulting
Bjøntegaard Delta curves allow for comparing two preprocessing
algorithms over a range of QPs. In our experiments, we use the
proposed MSCR to compare different preprocessing algorithms
such as a Gaussian low-pass filter, a median filter, and a JPEG
compressor. Overall, the Gaussian low-pass filter shows the best
RD performance according to MSCR.
Index Terms—Image Preprocessing, Video Coding, Rate-
Distortion Performance
I. INTRODUCTION
Spatial low-pass filtering of images is a typical prepro-
cessing technique used to remove details from the image
content [1]. Encoding the images/video preprocessed this
way reduces artifacts, improves the quality, or reduces the
bitrate. In [2] and [3], the authors proposed to filter the entire
image in a preprocessing step to reduce the perceptibility of
coding artifacts for higher quantization. Karlsson et al. [4]
proposed region-of-interest (ROI) video coding using Gaussian
preprocessing filters. Applying the filter on the background
reduces the bitrate required to encode the image while the
foreground remains at a constant quality. Alternatively, the
quality of the foreground can be increased by keeping the
bitrate of the entire image constant. Huang et al. [5] used a
similar preprocessing approach to reduce the bitrate required
for video streaming in mobile environments.
Grois et al. [6] proposed a complexity-aware adaptive ROI
prefiltering scheme for scalable video coding. The authors
applied the prefilters dynamically with a transition region
between foreground and background to improve visual pre-
sentation quality. They used Gaussian, Wiener, and Wavelet
filters, with the Gaussian filter showing the best performance
trade-off due to its low computational complexity [7], [8].
In our previous work, we used preprocessing filters such as
a median filter [9] or a Gaussian filter [10] to enable individual
rate/quality adaptation of live multi-view video streams when
only a single encoder is available. We combined frames of
multiple camera views into a single superframe which is then
encoded by a single encoder. Preprocessing the individual
frames before building the superframe enables rate control
of individual camera views in systems with limited encoding
hardware such as vehicles considered for teleoperated driv-
ing [10]–[12]. When using the preprocessing filters for the
purpose of rate-control such as in [9], [10], we are mainly
interested in the rate savings that can be achieved and the
quality costs required for these savings to identify the most
suitable preprocessing algorithm.
While each of the preprocessing methods achieves certain
rate-quality gains, these gains were only measured for specific
quantization parameters (QPs). In [6], for instance, the authors
encode the background with a fixed QP of 30 and the fore-
ground with a variable QP between 20 and 30. To effectively
compare the performance of different preprocessing filters, the
rate-quality gains have to be evaluated for a larger range of
QPs and for different preprocessing parameters.
In this paper, we systematically evaluate the influence of
preprocessing algorithms on the rate-distortion (RD) perfor-
mance in video encoding. We first analyze the RD curves
of a Gaussian filter, a median filter, and a JPEG preproces-
sor. We select the JPEG preprocessor as an easily available
implementation of a Discrete Cosine Transform (DCT)
representing a filter optimized for the human visual system.
Based on this analysis and the additional dimension introduced
by the preprocessing algorithms, we propose a novel eval-
uation method named Mean Saving-Cost Ratio (MSCR) to
effectively compare the RD performance of different prepro-
cessing algorithms and their parameters. We define the MSCR
as the logarithmic mean ratio of maximum bitrate saving over
maximum quality cost for all parameters of a preprocessing
algorithm. We calculate the bitrate savings and quality costs
for every preprocessed video relative to the baseline without
preprocessing. Using the ratio of both maxima results in a
single score describing the trade-off between bitrate savings and
quality costs. Further, we calculate the Bjøntegaard Delta
Rate for every quantization parameter (QP) between two
RD curves of different preprocessing filters at the respective
QP. The resulting Bjøntegaard Delta (BD) curves allow for
comparing two preprocessing filters over a range of QPs. The
BD curves demonstrate that the filter achieving the best RD
performance depends on the specific QP choice. Overall, the
Gaussian low-pass filter shows the best performance according
to MSCR, validating its wide usage in the literature [7], [8].
[Figure 1: two rows of four panels, (a) Gaussian Filter k=3, (b) Gaussian Filter k=5, (c) Median Filter, (d) JPEG Filter. x-axis: Bitrate [kbit/s] (QP ∈ [45, ..., 24]); y-axis: VMAF. First-row legends: none, gauss-3-05, gauss-3-06, gauss-3-07, gauss-3-08, gauss-3-10, gauss-3-15 (a); none, gauss-5-05, gauss-5-06, gauss-5-07, gauss-5-08, gauss-5-10, gauss-5-15 (b); none, median-3, median-5, median-7, median-9 (c); none, jpeg-10, jpeg-20, jpeg-40, jpeg-60 (d).]
Fig. 1: Rate-distortion (RD) analysis for the preprocessing filters Gaussian, Median, and JPEG. Every column contains the RD
curves for one of the preprocessing filters. The first row shows the RD curves, color coded by the filter parameter. The second
row color codes the RD curves per QP.
The rest of the paper is structured as follows. In Section II,
we analyze the influence of several preprocessing filters on
the encoder RD performance. In Section III, we introduce the
evaluation methods proposed in more detail. We present the
evaluation in Section IV. Section V concludes the paper.
II. RATE-DISTORTION ANALYSIS
In this section, we analyze the rate-distortion (RD) perfor-
mance of the encoder for preprocessed input video sequences.
We use the uncompressed Bus video sequence in CIF resolution
from [13], as used in related work [10], [14].
We create different preprocessed video sequences from Bus
using the following preprocessing filters and filter parameters:
A Gaussian filter with a kernel size k=3 and standard
deviations σ∈ {0.5,0.6,0.7,0.8,1.0,1.5}, a Gaussian filter
with k=5 and the same values of σ, a median filter with
kernel sizes k∈ {3,5,7,9}, and a JPEG preprocessor with the
quality levels Q∈ {10,20,40,60}. We selected the Gaussian
and median filter as the most commonly used filters and due
to their low computational complexity [4], [6]–[10].
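To make the filter parameters above concrete, the following minimal sketch builds a normalized Gaussian kernel for a given kernel size k and standard deviation σ and applies a k×k median filter to a small grayscale image. It is an illustration in plain Python, not the implementation used for the experiments. The function names, the sampling of the Gaussian at integer offsets, and the replicated borders are assumptions made for this example.

```python
import math
from statistics import median


def gaussian_kernel(k, sigma):
    """Return a normalized k x k Gaussian kernel (k odd) as nested lists."""
    half = k // 2
    # Sample the 1D Gaussian at integer offsets -half..half (an assumption).
    g = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(g)
    g = [v / total for v in g]
    # The 2D Gaussian is separable: outer product of the 1D kernel with itself.
    return [[a * b for b in g] for a in g]


def median_filter(image, k):
    """Apply a k x k median filter (k odd) with replicated borders."""
    h, w = len(image), len(image[0])
    half = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the k x k neighborhood, clamping indices at the border.
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            ]
            out[y][x] = median(window)
    return out


# k=3 and sigma=0.5 correspond to the setting labeled gauss-3-05.
kernel = gaussian_kernel(3, 0.5)

# Hypothetical 3x3 image with a single outlier pixel.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
smoothed = median_filter(noisy, 3)
```

For small σ the kernel weight concentrates on the center pixel, so the filter removes little detail. The median filter, in contrast, replaces the outlier pixel entirely.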
We encode the preprocessed video sequences using the
x264 software video encoder [15] with constant quantization
parameter mode. The quantization parameters (QPs) used for
encoding range from 24 to 45 with a step size of 1. Further,
we select a group-of-pictures (GoP) length of 1, meaning I-
frames only. We measure the video quality for each encoded
video sequence using the state-of-the-art video quality metric
VMAF [16], [17].
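The encoding sweep described above can be sketched as a list of encoder invocations, one per QP. The example assumes that the encoder is driven through the ffmpeg command-line tool with its libx264 wrapper, which may differ from the tooling actually used for the experiments. The file names and the helper function are hypothetical. The commands are only constructed here, not executed, and the VMAF measurement step is not covered.

```python
def x264_encode_command(src, dst, qp, gop=1, size="352x288"):
    """Build an ffmpeg command that encodes raw YUV 4:2:0 video at constant QP."""
    return [
        "ffmpeg", "-y",
        # Raw input needs an explicit format, pixel format, and frame size (CIF).
        "-f", "rawvideo", "-pix_fmt", "yuv420p", "-s", size,
        "-i", src,
        "-c:v", "libx264",
        "-qp", str(qp),   # constant quantization parameter mode
        "-g", str(gop),   # GoP length; 1 means I-frames only
        dst,
    ]


QPS = range(24, 46)  # QPs 24..45 with a step size of 1

# Hypothetical file names for one preprocessed sequence.
commands = [
    x264_encode_command("bus_cif_gauss-3-05.yuv", f"bus_gauss-3-05_qp{qp}.mp4", qp)
    for qp in QPS
]
```

Each preprocessed sequence yields 22 encodes, one RD point per QP. Repeating the sweep for every filter parameter produces the family of RD curves discussed next.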
Figure 1 shows the resulting RD curves for the four prepro-
cessing filters used. Every column represents one preprocess-
ing filter. The first row presents the RD curves over all QPs for
the specific filter parameters. The blue curve none represents
the baseline without any preprocessing applied. The second
row contains the same RD points as the first row, but with the
RD curves per QP. From now on, we refer to these curves as
RD-QP curves.
First, we observe smooth RD curves for the Gaussian filters
and the median filter. In contrast, the JPEG preprocessor
produces an irregular RD curve. At the same time, the JPEG
preprocessor still reaches high VMAF scores of 70 to 80 even
for a quality level of 10. For the Gaussian filters, a kernel
size of 5 achieves higher rate savings while causing at the
same time higher quality costs. The median filter has a strong
influence on the VMAF score already at a kernel size of 3.
The first analysis conducted demonstrates the high dimen-
sionality of the problem presented. To effectively compare the
performance of different preprocessing filters, we require a
new evaluation method which we present next.
III. EVALUATION METHOD
In this section, we present our evaluation methods to com-
pare the performance of different preprocessing filters. When
using the preprocessing filters for the purpose of rate-control
such as in [9], [10], we are mainly interested in the rate savings
that can be achieved and the quality costs required for these
savings. Hence, we calculate the quality costs and rate savings
for every point on the RD curves. We define the rate saving
$S(qp)$ for a certain QP $qp \in \{24, \ldots, 45\}$ as the difference
of the baseline bitrate $R_{\text{none}}(qp)$ without any preprocessing
applied and the preprocessed video bitrate $R_{\text{filter}}(qp)$:
$$S(qp) = R_{\text{none}}(qp) - R_{\text{filter}}(qp) \tag{1}$$
Similarly, we define the quality costs $C(qp)$ as the quality
difference of baseline distortion $D_{\text{none}}(qp)$ and preprocessed
filter distortion $D_{\text{filter}}(qp)$:
$$C(qp) = D_{\text{none}}(qp) - D_{\text{filter}}(qp) \tag{2}$$
[Figure 2: four panels, (a) Gaussian Filter k=3, (b) Gaussian Filter k=5, (c) Median Filter, (d) JPEG Filter. x-axis: ∆Bitrate [kbit/s] (QP ∈ [24, ..., 45]); y-axis: ∆VMAF. Legends list none and the filter parameters of each panel.]
Fig. 2: Quality costs C (∆VMAF) over bitrate savings S (∆Bitrate) of the individual preprocessing filter compared to the
unfiltered baseline.
[Figure 3: three panels, (a) Gaussian Filter k=5, (b) Median Filter, (c) JPEG Filter. x-axis: QP; y-axis: BD. Legend: BD-VMAF, BD-Rate [%].]
Fig. 3: Bjøntegaard Delta per QP between RD-QP curves of the Gaussian filter with k=3 and three other preprocessing filters.
Figure 2 visualizes the resulting quality costs over their rate
savings. From now on, we refer to them as CS curves. The CS
curves describe the inverse trend compared to the RD curves.
The higher the rate savings, the higher the quality difference
compared to the unfiltered baseline. For the Gaussian filter
with k=3 and a low σ, the quality costs show a peak for
QPs in the range of 30 to 40.
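The definitions in Eqs. (1) and (2) translate directly into code. The sketch below computes the rate saving and quality cost per QP from two RD curves stored as dictionaries. The data layout and the function name are assumptions, and the numbers are invented purely for illustration; they are not measurements from the experiments.

```python
def savings_and_costs(baseline, filtered):
    """Return {qp: (S, C)} following Eqs. (1) and (2).

    Both inputs map a QP to a (bitrate, quality) tuple, e.g. bitrate in
    kbit/s and quality as a VMAF score.
    """
    result = {}
    for qp, (rate_none, quality_none) in baseline.items():
        rate_filter, quality_filter = filtered[qp]
        saving = rate_none - rate_filter        # Eq. (1)
        cost = quality_none - quality_filter    # Eq. (2)
        result[qp] = (saving, cost)
    return result


# Made-up RD points (kbit/s, VMAF) for three QPs, for illustration only.
baseline = {24: (8000.0, 98.0), 35: (3000.0, 85.0), 45: (900.0, 60.0)}
filtered = {24: (6500.0, 95.0), 35: (2500.0, 80.0), 45: (800.0, 57.0)}

cs_curve = savings_and_costs(baseline, filtered)
```

Plotting the cost over the saving for all QPs of one filter parameter gives one CS curve.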
We use the Bjøntegaard Delta Rate (BDR) [18] and the
Bjøntegaard Delta of the VMAF score (BD-VMAF) to com-
pare two RD-QP curves of two different preprocessing filters
with the same QP. We use the Gaussian filter with k=3 as
the reference signal and calculate the BDR and BD-VMAF
for every RD-QP curve and preprocessing filter. Figure 3
visualizes the resulting BDR and BD-VMAF scores per QP.
We refer to them as Bjøntegaard Delta (BD) curves.
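As a rough illustration of how a BDR value between two RD-QP curves can be obtained, the sketch below averages the difference of the logarithmic bitrates over the quality range shared by both curves. Note that this is a simplified variant: it uses piecewise-linear interpolation, whereas the Bjøntegaard method [18] fits a polynomial to each curve. The function names and the sample points are assumptions, and the results of this simplification will generally differ from those of the original method.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    x = min(max(x, xs[0]), xs[-1])  # guard against floating-point overshoot
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x outside interpolation range")


def bd_rate_linear(reference, test, samples=200):
    """Average bitrate difference in percent of `test` relative to `reference`.

    Each curve is a list of (bitrate, quality) points with distinct qualities.
    A positive result means `test` needs more bitrate for the same quality.
    """
    ref = sorted(reference, key=lambda p: p[1])
    tst = sorted(test, key=lambda p: p[1])
    ref_q = [q for _, q in ref]
    ref_lr = [math.log10(r) for r, _ in ref]
    tst_q = [q for _, q in tst]
    tst_lr = [math.log10(r) for r, _ in tst]

    # Only the quality interval covered by both curves can be compared.
    lo = max(ref_q[0], tst_q[0])
    hi = min(ref_q[-1], tst_q[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in quality")

    # Trapezoidal average of the log-rate difference over the overlap.
    step = (hi - lo) / samples
    total = 0.0
    for j in range(samples + 1):
        q = lo + j * step
        diff = _interp(q, tst_q, tst_lr) - _interp(q, ref_q, ref_lr)
        total += (0.5 if j in (0, samples) else 1.0) * diff
    mean_diff = total / samples
    return (10.0 ** mean_diff - 1.0) * 100.0


# Hypothetical RD-QP curves: the test curve needs 10 % more bitrate everywhere.
reference_curve = [(1000.0, 60.0), (2000.0, 75.0), (4000.0, 90.0)]
test_curve = [(1100.0, 60.0), (2200.0, 75.0), (4400.0, 90.0)]
delta = bd_rate_linear(reference_curve, test_curve)
```

Evaluating such a function once per QP, with the points of each RD-QP curve taken over the filter parameters, yields one BD value per QP and thus a BD curve.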
The Gaussian filter with k=5 performs slightly worse than
with k=3. The encoder requires 0.4 % to 1.3 % more bitrate
to reach the same quality or reaches 0.7 to 0.2 lower VMAF
scores for the same bitrate. Additionally, we observe that the
BDR savings become smaller for increasing QPs.
For the median filter, we observe that for low QPs the
encoder requires up to 22 % more bitrate for reaching the same
quality. With increasing QPs, the BDR decreases until, at
QPs 44 and 45, the median filter performs better than
the reference, the Gaussian filter with k=3.
Lastly, the JPEG filter shows a better performance for QPs
lower than 38 compared to the Gaussian filter with k=3. For
QPs of 40 and higher, we observe extensive jumps in the BD
curves. This can be explained by the artifacts introduced by
the JPEG preprocessor. Encoding these artifacts leads to the
inconsistent RD curves as shown in Figure 1d. Calculating the
BDR and BD-VMAF from such inconsistent curves results in
noticeable variations.
This varying performance further highlights the motivation
for our extensive analysis. To allow for a simple RD per-
formance comparison of different preprocessing filters, we
propose the Mean Saving-Cost Ratio (MSCR). We define the
MSCR as the logarithmic mean ratio of maximum bitrate
savings over maximum quality cost for all parameters of a
preprocessing filter:
$$\text{MSCR} = \log_{10}\left( \frac{1}{N} \sum_{i=1}^{N} \frac{\max(\{S_i(qp) : qp = 24, \ldots, 45\})}{\max(\{C_i(qp) : qp = 24, \ldots, 45\})} \right) \tag{3}$$
with $i$ as the preprocessing parameter for a certain filter. A
high MSCR value means high bitrate savings with low quality
costs.
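A minimal sketch of Eq. (3) is given below. It assumes that the savings and costs of each preprocessing parameter are available as lists of (S, C) pairs over all QPs. The data layout, the function name, and the numbers are hypothetical and serve only to show the computation.

```python
import math


def mscr(curves):
    """Mean Saving-Cost Ratio following Eq. (3).

    `curves` holds one entry per preprocessing parameter i; each entry is a
    list of (S_i(qp), C_i(qp)) pairs over all QPs.
    """
    ratios = []
    for points in curves:
        max_saving = max(s for s, _ in points)
        max_cost = max(c for _, c in points)
        # Ratio of both maxima; requires a non-zero maximum quality cost.
        ratios.append(max_saving / max_cost)
    # The logarithm is only defined for a positive mean ratio.
    return math.log10(sum(ratios) / len(ratios))


# Made-up (saving, cost) pairs for two filter parameters.
example_curves = [
    [(100.0, 1.0), (1000.0, 10.0), (500.0, 8.0)],
    [(300.0, 5.0), (3000.0, 30.0)],
]
score = mscr(example_curves)
```

In this example both parameters have a ratio of maxima of 100, so the mean ratio is 100 and the MSCR is 2. Note that the two maxima of a parameter need not occur at the same QP.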
The MSCR together with the BD curve allows for effec-
tively comparing the performance of different preprocessing
filters. The MSCR provides a single score while the BD curves
enable us to analyze the BDR and BD-VMAF over certain
QPs. These two metrics can be seen as a similar approach
to the area under the curve (AuC) and the Receiver Operating
Characteristic (ROC) curve, which are widely used in the field
of Machine Learning to compare the performance of different
models [19]–[21]. Next, we use the MSCR proposed for an
extensive comparison of the different preprocessing filters.
IV. EVALUATION
In this section, we use the proposed evaluation method to
analyze the RD performance of different preprocessing filters.
So far, we analyzed the Bus video sequence at CIF resolution
and encoded it at a GoP length of 1 using the x264 video
encoder. Here, we present the full evaluations for multiple
video sequences showing video content with different spatial
and temporal complexity [10]. For CIF resolution, we evaluate
four video sequences [13] for a GoP length of 1 and 20 using
the x264 video encoder. Additionally, we use the NVIDIA
HEVC [22] hardware video encoder with full HD resolution
and GoP length of 1 and 20 on three test video sequences [23].
For both resolutions and video codecs, we use the mean
bitrate and mean VMAF scores of the respective video se-
quences. We calculate the MSCR for two GoP lengths, reso-
lutions, and codecs following the evaluation method presented
in Section III. Table I shows the resulting MSCR scores.
Filter x264-CIF-1 x264-CIF-20 HEVC-HD-1 HEVC-HD-20
JPEG 2.45 -4.00 1.36
Gauss-3 1.96 1.41 3.42 3.02
Gauss-5 1.93 1.36 3.37 2.97
Median 1.72 0.94 3.12 2.68
TABLE I: Mean Saving-Cost Ratio (MSCR) for the test video
sequences at CIF and HD resolution and GoP lengths of 1 and
20, encoded with x264 or NVIDIA HEVC, respectively.
For a GoP length of 1, we observe the highest MSCR
for the JPEG preprocessor followed by the Gaussian filter.
We selected the JPEG preprocessor as a readily available
implementation of a DCT representing a filter optimized for
the human visual system. In practice, JPEG would usually
not be considered as a preprocessor and, according to Figure 3,
it shows unpredictable inconsistencies in the BD curve. Further,
for a GoP length of 20, JPEG performs worst while the
Gaussian filter achieves the highest MSCR.
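Because the MSCR is a base-10 logarithm, differences between scores correspond to multiplicative factors of the underlying mean ratio. The short sketch below inverts the logarithm for two entries of Table I. The function name is an assumption, and reading the ratio as bitrate saved per unit of VMAF cost presumes that savings and costs are expressed in the units shown on the axes of Figure 2.

```python
def mean_saving_cost_ratio(mscr_value):
    """Invert the log10 in Eq. (3) to recover the mean saving-cost ratio."""
    return 10.0 ** mscr_value


gauss3_gop1 = mean_saving_cost_ratio(1.96)   # Gauss-3, x264-CIF-1 in Table I
gauss3_gop20 = mean_saving_cost_ratio(1.41)  # Gauss-3, x264-CIF-20 in Table I
factor = gauss3_gop1 / gauss3_gop20
```

Under these assumptions, the Gauss-3 scores correspond to mean ratios of roughly 91 and 26, so the score difference of 0.55 between the two GoP lengths amounts to a factor of about 3.5.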
The low MSCR values of the JPEG preprocessor with a GoP
length of 20 can be explained by the artifacts introduced
by the JPEG preprocessor. These artifacts affect the motion
estimation of the inter-frame coding enabled with a GoP length
larger than 1. The Gaussian and the median filter only blur the
image content, which does not affect the inter-frame coding.
Here, the Gaussian filter with a kernel size of k=3 achieves
the highest MSCR and clearly outperforms the median filter.
V. CONCLUSION
In this paper, we conducted an extensive analysis of the
rate-distortion (RD) performance of video coding considering
the influence of different preprocessing filters. Due to the
additional dimension introduced by the preprocessing, we pro-
posed a novel evaluation method called the Mean Saving-Cost
Ratio (MSCR) to effectively compare the RD performance
including the preprocessing aspect. We define the MSCR as
the logarithmic mean ratio of maximum bitrate savings over
maximum quality cost for all parameters of a preprocessing
filter. Further, we calculate the Bjøntegaard Delta Rate for
every quantization parameter (QP) between two RD curves
of different preprocessing filters at the respective QP. The
resulting Bjøntegaard Delta (BD) curves allow for comparing
two preprocessing filters over a range of QPs.
The BD curves demonstrate that the filter achieving the
best RD performance depends on the specific QP choice. In
our experiments, we use the proposed MSCR to compare
different preprocessing filters. Overall, the Gaussian low-pass
filter shows the best performance according to MSCR. To the
best of our knowledge, this is the first metric that allows
for measuring the influence of preprocessing filters on the
encoder RD performance.
For future work, the proposed MSCR and the BD curves
can be used for comparing more preprocessing filters. Further,
the MSCR could be used as a metric for designing new
machine-learning-based preprocessing filters.
REFERENCES
[1] Timothy Popkin, Andrea Cavallaro, and David Hands, “Accurate and
Efficient Method for Smoothly Space-Variant Gaussian Blurring,” IEEE
Transactions on Image Processing, vol. 19, no. 5, pp. 1362–1370, May
2010.
[2] Shijun Sun, Cheng Chang, and Stacey Spears, “Filtering and dithering
as pre-processing before encoding,” June 2014, US Patent 8,750,390.
[3] Brian Astle, “Image signal encoding with variable low-pass filter,” Feb.
2000, US Patent 6,026,190.
[4] L.S. Karlsson and M. Sjostrom, “Improved ROI video coding using
variable Gaussian pre-filters and variance in intensity,” in IEEE Inter-
national Conference on Image Processing 2005, Genova, Italy, 2005,
pp. II–313, IEEE.
[5] Hong-jie Huang, Xing-ming Zhang, and Zhi-wei Xu, “Semantic Video
Adaptation using a Preprocessing Method for Mobile Environment,” in
2010 10th IEEE International Conference on Computer and Information
Technology, Bradford, United Kingdom, June 2010, pp. 2806–2810,
IEEE.
[6] Dan Grois and Ofer Hadar, “Complexity-Aware Adaptive Preprocessing
Scheme for Region-of-Interest Spatial Scalable Video Coding,” IEEE
Transactions on Circuits and Systems for Video Technology, vol. 24, no.
6, pp. 1025–1039, June 2014.
[7] Dan Grois and Ofer Hadar, “Efficient adaptive bit-rate control for Scal-
able Video Coding by using Computational Complexity-Rate-Distortion
analysis,” in 2011 IEEE International Symposium on Broadband Mul-
timedia Systems and Broadcasting (BMSB), Metropolitan Area Nurem-
berg, Germany, June 2011, pp. 1–6, IEEE.
[8] Dan Grois and Ofer Hadar, “Efficient Region-of-Interest Scalable Video
Coding with Adaptive Bit-Rate Control,” Advances in Multimedia, vol.
2013, pp. 1–17, 2013.
[9] Markus Hofbauer, Christopher Kuhn, Goran Petrovic, and Eckehard
Steinbach, “Adaptive multi-view live video streaming for teledriving
using a single hardware encoder,” in 2020 IEEE International Sympo-
sium on Multimedia (ISM), Naples, Italy, 2020, pp. 9–16.
[10] Markus Hofbauer, Christopher B. Kuhn, Goran Petrovic, and Eckehard
Steinbach, “Preprocessor rate control for adaptive multi-view live video
streaming using a single encoder,” IEEE Transactions on Circuits and
Systems for Video Technology, pp. 1–16, 2022.
[11] M. Hofbauer, C. B. Kuhn, G. Petrovic, and E. Steinbach, “Telecarla:
An open source extension of the carla simulator for teleoperated driving
research using off-the-shelf components,” in 2020 IEEE Intelligent
Vehicles Symposium (IV), 2020, pp. 335–340.
[12] Markus Hofbauer, Christopher B. Kuhn, Mariem Khlifi, Goran Petrovic,
and Eckehard Steinbach, “Traffic-aware multi-view video stream adapta-
tion for teleoperated driving,” in 2022 IEEE 95th Vehicular Technology
Conference (VTC2022-Spring), Helsinki, Finland, June 2022, pp. 1–7,
IEEE.
[13] “Yuv video sequences,” http://trace.eas.asu.edu/yuv/.
[14] Christian Lottermann and Eckehard Steinbach, “Modeling the bit rate
of H.264/AVC video encoding as a function of quantization parameter,
frame rate and GoP characteristics,” in 2014 IEEE International
Conference on Multimedia and Expo Workshops (ICMEW), Chengdu,
China, July 2014, pp. 1–6, IEEE.
[15] VideoLAN, “x264,” .
[16] Netflix Technology Blog, “Toward A Practical Perceptual Video Quality
Metric,” Apr. 2017.
[17] Netflix Technology Blog, “VMAF: The Journey Continues,” Oct. 2018.
[18] Gisle Bjontegaard, “Calculation of average PSNR differences between
RD-curves,” ITU-T VCEG and ISO/IEC MPEG document VCEG-MM33,
Apr. 2001.
[19] Christopher Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard
Steinbach, “Introspective black box failure prediction for autonomous
driving,” in 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas,
NV, USA, 2020, pp. 1907–1913.
[20] Christopher Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard
Steinbach, “Introspective failure prediction for autonomous driving
using late fusion of state and camera information,” IEEE Transactions
on Intelligent Transportation Systems, pp. 1–15, 2020.
[21] Christopher B. Kuhn, Markus Hofbauer, Goran Petrovic, and Eckehard
Steinbach, “Trajectory-based failure prediction for autonomous driving,”
in 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 2021,
pp. 980–986.
[22] Nvidia Corporation, “Nvidia video codec sdk,” 2021, Accessed on:
2021-04-16.
[23] “Y4m video sequences,” https://media.xiph.org/video/derf/.