A Convolutional Neural Network-Based Approach
to Rate Control in HEVC Intra Coding
Ye Li, Bin Li, Dong Liu, Zhibo Chen
CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System
University of Science and Technology of China, Hefei 230027, China
chenzhibo@ustc.edu.cn
Abstract—Rate control is an essential element for the practical use of video coding standards. A rate control scheme typically builds a model that characterizes the relationship between rate (R) and a coding parameter, e.g., the quantization parameter or the Lagrange multiplier (λ). In such a scheme, the rate control performance depends highly on the modeling accuracy. For inter frames, the model parameters can be precisely updated to fit the video content, based on the information of previously coded frames. However, for intra frames, especially the first frame of a video sequence, there is no prior information to rely on. Therefore, intra frame rate control has remained a challenge. In this paper, we adopt the R-λ model to characterize each coding tree unit (CTU) in an intra frame, and we propose a convolutional neural network (CNN) based approach to effectively predict the model parameters for every CTU. We then develop a new CTU-level bit allocation and bitrate control algorithm based on the R-λ model for HEVC intra coding. The experimental results show that our proposed CNN-based approach outperforms the rate control algorithm currently used in the HEVC reference software, leading to an average 0.46 percentage-point decrease in rate control error and a 0.7 percent BD-rate reduction.
Index Terms—H.265/HEVC, Intra coding, Rate control, Convolutional neural network (CNN), R-λ model
I. INTRODUCTION
Rate control (RC) plays an important role in video applications, especially real-time video transmission. The objective of rate control is to achieve the optimal quality of the coded video under the constraint of a given target rate, which varies adaptively with the communication channel in practical use. Typically, rate control involves two steps: bit allocation and bitrate control [1]. The bit allocation step allocates the total amount of bits for the video content to be coded, and can be implemented at three levels, i.e., the group-of-pictures (GOP) level, frame level, and basic unit (BU) level. The bitrate control step aims to achieve the allocated bitrate as precisely as possible. Usually, a rate control scheme introduces a model that characterizes the relationship between rate (R) and a coding parameter, e.g., the R-Q model using the quantization parameter (Q) [2] and the R-λ model using the Lagrange multiplier (λ) [1]; the latter is adopted in the current High Efficiency Video Coding (HEVC) [3] reference software.
Intra frame rate control is particularly important in a video coding system for two reasons. First, intra frames usually
978-1-5090-5316-2/16/$31.00 © 2017 IEEE.
[Fig. 1 shows the proposed pipeline: each CTU of the input frame is fed to two CNNs (CNN1 and CNN2) that predict the model parameters α and β; the predicted parameters feed the R-λ model used for bit allocation and bitrate control.]
Fig. 1. Our proposed CNN-based intra frame rate control method.
consume much more bits than inter frames. Second, the quality
of intra frames will influence the coding efficiency of the
following inter frames due to inter prediction. Intra frame rate
control is often more difficult, too. For inter frames, the R-λ model has been shown to be accurate, since the model parameters can be updated from previously coded frames [1]; this does not hold for intra frames. The interval between intra frames is usually large, so previously coded intra frames provide little help for the following ones. For the first frame of a video sequence, or when a scene change occurs, there is actually no prior information to rely on at all. Therefore, rate control in intra coding remains a challenge.
In this paper, we propose a convolutional neural network
(CNN) based approach for rate control in HEVC intra coding,
as depicted in Fig. 1. We verify that the R-λ model is accurate for most of the coding tree units (CTUs) in intra frames, and that the major difficulty is to estimate the model parameters in practice. We then propose to adopt a CNN to predict the model parameters directly from the content of CTUs, without any presumption or prior information. Bit allocation and bitrate control algorithms are then developed for intra frame rate control. The experimental results show that our method significantly outperforms the state-of-the-art one currently used in the HEVC reference software, leading to a 0.46 percentage-point decrease in rate control error and a 0.7 percent BD-rate reduction.
The rest of this paper is organized as follows. Related work on HEVC rate control is summarized in Section II. In Section III we introduce the details of our CNN-based intra frame rate control method. The experimental results are shown in Section IV, and Section V concludes the paper.
VCIP 2017, Dec. 10 - Dec. 13, St Petersburg, Florida, U.S.A.
II. RELATED WORK
R-λ modeling was proposed by Li et al. [1] for HEVC; it regards the Lagrange multiplier λ as the most critical factor determining the rate R. This model, together with an improved bit allocation method [4], is adopted in the current HEVC reference software. The basic R-λ model reads

λ = α · R^β    (1)

where α and β are model parameters that depend on the content. This basic model works well for HEVC inter coding, because the model parameters of inter frames can be accurately estimated from those of previously coded frames. But, as mentioned above, the model has difficulty dealing with intra frames because the parameters are not easy to estimate.
To address this problem, some works propose modifying the R-λ model by introducing more content-dependent parameters. For example, in [5] the following model is proposed,

λ = α · (C / R)^β    (2)

where C stands for content complexity and is estimated based on the sum of absolute transformed differences (SATD) of pixel values. Specifically, a Hadamard transform is performed on the pixel values of an 8×8 block, and the absolute values of the resulting coefficients are summed up. This model is currently used in the HEVC reference software.
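For reference, the SATD computation described above can be sketched as follows. This is an illustrative implementation, not the actual HM code, and any normalization applied in HM is omitted here.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n a power of two).
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def satd_8x8(block):
    # Sum of absolute transformed differences for one 8x8 block:
    # apply the 2-D Hadamard transform, then sum the absolute coefficients.
    H = hadamard(8)
    coeffs = H @ block @ H.T
    return np.abs(coeffs).sum()

# A constant block has energy only in the DC coefficient: SATD = 64 * |c|.
flat = np.full((8, 8), 1.0)
print(satd_8x8(flat))  # 64.0
```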
Moreover, some works propose introducing gradients into the rate model to improve intra frame rate control performance [6], [7]. Other works propose utilizing the information of previously coded frames for further improvement [8], [9].
Different from the previous works, in this paper we return to the basic R-λ model and try to estimate the model parameters accurately and directly. Inspired by the great successes of deep learning in recent years [10], we are motivated to adopt a CNN for model parameter prediction.
III. THE PROPOSED METHOD
In this section, we first verify the accuracy of the R-λ
modeling at CTU level for HEVC intra coding. Then the CNN-
based prediction of model parameters is introduced in detail.
Finally we develop a CTU level bit allocation algorithm for
intra frame rate control.
A. R-λ Modeling for CTUs
The R-λ model in Equation (1) characterizes the relationship between the Lagrange multiplier λ and the rate R. In [1], the authors verified that at the sequence level, R and λ accurately fit the model. We perform curve fitting with R and λ at the CTU level by conducting HEVC intra coding with fixed quantization parameters. Some results are shown in Fig. 2.
From Fig. 2 we observe that the R-λ relationship at the CTU level still matches the model in Equation (1) very well. But
ʄ сϮ͘ϲϳϳďƉƉ
ͲϮ͘ϯ
ZϸсϬ͘ϵϵϵϱ
Ϭ
ϱϬ
ϭϬϬ
ϭϱϬ
ϮϬϬ
ϮϱϬ
ϯϬϬ
ϯϱϬ
ϰϬϬ
Ϭ Ϭ͘Ϯ Ϭ͘ϰ Ϭ͘ϲ Ϭ͘ϴ ϭ
ůĂŵďĚĂǀĂůƵĞ
ďƉƉ
YDĂůůdhŝŶĚĞdžсϰ
ʄ сϵϬ͘ϱϯϰďƉƉ
ͲϮ͘Ϯϭϵ
ZϸсϬ͘ϵϴϴϰ
Ϭ
ϱϬ
ϭϬϬ
ϭϱϬ
ϮϬϬ
ϮϱϬ
ϯϬϬ
ϯϱϬ
ϰϬϬ
ϬϭϮϯϰϱ
ůĂŵďĚĂǀĂůƵĞ
ďƉƉ
WĂƌƚLJ^ĐĞŶĞdhŝŶĚĞdžсϬ
Fig. 2. R-λcurve fitting at CTU level.
we also observe that different CTUs within the same frame, though adjacent, can have quite different parameters (i.e., α and β). This shows that the model parameters are highly content dependent and that CTUs have quite different characteristics. Thus, accurately estimating the model parameters is the challenge that hinders the use of the R-λ model in intra coding.
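The per-CTU curve fitting described above can be sketched as a least-squares fit in the log domain, since ln λ = ln α + β · ln R is linear in (ln α, β). The fitting procedure itself is not specified in the paper; this is one natural reconstruction.

```python
import numpy as np

def fit_r_lambda(R, lam):
    # Fit lambda = alpha * R**beta from paired (R, lambda) samples
    # via linear least squares on ln(lambda) = ln(alpha) + beta * ln(R).
    beta, log_alpha = np.polyfit(np.log(R), np.log(lam), 1)
    return np.exp(log_alpha), beta

# Synthetic check using the parameters of the BQMall CTU in Fig. 2.
R = np.linspace(0.1, 1.0, 11)            # bpp samples (11 QPs in the paper)
lam = 2.677 * R ** -2.3
alpha, beta = fit_r_lambda(R, lam)
print(round(alpha, 3), round(beta, 3))   # 2.677 -2.3
```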
ϳdžϳĐŽŶǀ͕ϭϲ
ZĞ>Ƶ
ϳdžϳĐŽŶǀ͕ϯϮ
ZĞ>Ƶ
ŵĂdžƉŽŽů
ϳdžϳĐŽŶǀ͕ϭϲ
ZĞ>Ƶ
ϳdžϳĐŽŶǀ͕ϯϮ
ZĞ>Ƶ
ĨĐͺϭϮϴ
ZĞ>Ƶ
ĨĐͺϲϰ
ĨĐͺϭ
ŵĂdžƉŽŽů
Fig. 3. The CNN structure for predicting model parameters αand β.
B. CNN-Based Prediction of Model Parameters
From the above analyses, we conjecture that the model parameters (α and β) may be predicted from the given video content. Intuitively, we can leverage the powerful learning ability of a CNN to perform the prediction, which brings two advantages. First, there is no need to hand-design features (such as SATD or gradients), as the CNN automates feature learning. Second, it is not necessary to refer to previously coded content, which facilitates parallel encoding.
1) Network Structure: We design a CNN with four convolutional layers, each followed by a rectified linear unit (ReLU), two max-pooling layers, and three fully connected (fc) layers, as shown in Fig. 3. The final fc layer outputs the predicted value of a model parameter (α or β). The same structure is used to predict α and β, but a separate network is trained for each.
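The paper does not state the strides or padding used in Fig. 3. Assuming unpadded 7×7 convolutions with stride 1 and 2×2 max pooling with stride 2, the spatial sizes for a 64×64 luma CTU work out as sketched below; this is a bookkeeping aid under those assumptions, not the authors' implementation.

```python
def trace_shapes(size=64):
    # Layer stack from Fig. 3; each tuple is (name, spatial-size transform).
    layers = [
        ("7x7 conv, 16", lambda s: s - 6),   # valid convolution: s - (7 - 1)
        ("7x7 conv, 32", lambda s: s - 6),
        ("max pool",     lambda s: s // 2),  # assumed 2x2, stride 2
        ("7x7 conv, 16", lambda s: s - 6),
        ("7x7 conv, 32", lambda s: s - 6),
        ("max pool",     lambda s: s // 2),
    ]
    for name, f in layers:
        size = f(size)
        print(f"{name:14s} -> {size}x{size}")
    return size

final = trace_shapes()
# 32 channels at the final spatial size feed fc_128 -> fc_64 -> fc_1.
print("fc input features:", 32 * final * final)
```

Under these assumptions the flattened feature vector entering fc_128 has 32 · 7 · 7 = 1568 elements.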
2) Training: We use many natural images to train the
CNN to obtain a universal model. The images come from the
UCID dataset [11] and the RAISE dataset [12]. None of them
appears in the HEVC common test sequences, and therefore,
our coding results can demonstrate the generalizability of the
trained CNN.
The natural images are converted into YUV420 format and then compressed with the HEVC reference software under 11 different quantization parameters (QPs), ranging from 20 to 40 with an interval of 2. The coding rate and Lagrange multiplier values at the different QPs are collected for each CTU. Curve fitting is then performed for each CTU using the 11 pairs of (R, λ) to obtain α and β. Outlier CTUs are then removed, where the inliers are defined as α ∈ [0.05, 200] and β ∈ [−3, 0]. Finally, we use the original pixel values of the luma component of each CTU as input to the CNN, and use the corresponding α or β as the label for training. There are 180,000 CTUs used for training and another 16,000 CTUs used for validation.
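The outlier removal step amounts to a simple range filter on the fitted parameters. A minimal sketch, using the inlier ranges stated above (the sample data is hypothetical):

```python
def is_inlier(alpha, beta):
    # Inlier ranges for fitted R-lambda parameters, as defined in the text.
    return 0.05 <= alpha <= 200 and -3 <= beta <= 0

# Hypothetical fitted (alpha, beta) pairs for a few CTUs.
fitted = [(2.677, -2.3), (90.534, -2.219), (350.0, -2.5), (1.2, -4.1)]
kept = [p for p in fitted if is_inlier(*p)]
print(kept)  # [(2.677, -2.3), (90.534, -2.219)]
```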
For an original CTU X_n, where n ∈ {1, . . . , N} indexes each CTU, let the corresponding label be denoted by y_n. The training of the CNN minimizes the following loss function,

L(Θ) = (1/N) · Σ_{n=1}^{N} ||F(X_n|Θ) − y_n||²    (3)

where Θ is the set of parameters of the network in Fig. 3. The training is performed by stochastic gradient descent with standard back-propagation.
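For concreteness, the loss in Equation (3) over a batch of N labels can be evaluated as below; `predictions` here is a stand-in for the network outputs F(X_n|Θ), not the actual CNN.

```python
import numpy as np

def loss(predictions, labels):
    # L = (1/N) * sum_n ||F(X_n|Theta) - y_n||^2, with one scalar output per CTU.
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return np.mean((predictions - labels) ** 2)

print(loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3
```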
3) Network Usage: We integrate the two trained networks (for α and β, respectively) into HEVC intra frame rate control. Before encoding a frame, we use the original pixel values of the luma component of each CTU to predict its model parameters. It is worth noting that CTUs can be processed in parallel in the prediction process.
Boundary CTUs, whose sizes are less than the normal 64×64, are first padded to the normal size (with a constant padding value of 128) and then sent to the trained CNN. Afterwards, we adjust the CNN-predicted parameters of these boundary CTUs to take the padding effect into account. Specifically, as shown in Fig. 4, the parameters of the original CTU (a) and the padded one (b) are given by,
λ(a) = α(a) · (R(a) / N(a)_pix)^β(a),   λ(b) = α(b) · (R(b) / N(b)_pix)^β(b)    (4)
where R is the amount of coded bits and N_pix denotes the number of pixels. As the CNN predicts α and β for (b), we need the values for (a). The following rectification process is proposed for this purpose.
Note that the padded pixels have a constant value, which should cost very few bits, so we assume the amount of coded bits is the same for (a) and (b). We also suppose that the β values of (a) and (b) are nearly the same, because we empirically observe that β varies within a small range. Given the same value of λ, we then have:

λ(a) = λ(b),   R(a) = R(b),   β(a) ≈ β(b)    (5)

Combining (4) and (5), we obtain the rectified α value as:

α(a) = α(b) · S_ab    (6)

where

S_ab = (N(a)_pix / N(b)_pix)^β(b)    (7)

In our experiments, the rectification factor S_ab is further clipped into the range [1, 4] empirically.
Fig. 4. Illustration of CTU padding: (a) original CTU, (b) CTU after padding.
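The rectification of Equations (6)-(7), including the empirical clipping of S_ab to [1, 4], can be sketched as:

```python
def rectify_alpha(alpha_b, beta_b, npix_a, npix_b):
    # S_ab = (N_pix(a) / N_pix(b)) ** beta(b), clipped to [1, 4] as in the text;
    # the rectified alpha for the original (unpadded) CTU is alpha(b) * S_ab.
    s_ab = (npix_a / npix_b) ** beta_b
    s_ab = min(max(s_ab, 1.0), 4.0)
    return alpha_b * s_ab

# Example: a 32x64 boundary CTU padded to 64x64, with beta(b) = -2.
# S_ab = 0.5 ** -2 = 4, so alpha is scaled by 4.
print(rectify_alpha(10.0, -2.0, 32 * 64, 64 * 64))  # 40.0
```

Since N(a)_pix < N(b)_pix and β(b) < 0, S_ab is always greater than 1, which is consistent with the clipping range.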
C. Bit Allocation
Given a target rate R_f for an intra frame, we first solve for a frame-level λ, i.e., λ_f, by numerically solving the following equation via bisection,

Σ_{i=1}^{N_f} α_Bi · λ_f^{β_Bi} = R_f    (8)

where N_f is the number of CTUs in the frame, and α_Bi and β_Bi are the model parameters of each CTU. Then, we use the basic unit (BU) level bit allocation method proposed in [4], which is also used for inter frame rate control in the current HEVC reference software.
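Equation (8) can be solved for λ_f by bisection because, with β_Bi < 0 for every CTU, the left-hand side decreases monotonically in λ_f. A minimal sketch follows; the search interval and tolerance are illustrative choices, not values from the paper.

```python
def solve_frame_lambda(alphas, betas, R_f, lo=1e-3, hi=1e4, tol=1e-9):
    # Bisection on f(lam) = sum_i alpha_i * lam**beta_i - R_f,
    # which is monotonically decreasing when every beta_i < 0.
    def f(lam):
        return sum(a * lam ** b for a, b in zip(alphas, betas)) - R_f
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if abs(f(mid)) < tol:
            break
        if f(mid) > 0:
            lo = mid   # estimated bits too high -> increase lambda
        else:
            hi = mid
    return mid

# One CTU with alpha = 1, beta = -1 and target 2 bits gives lambda = 0.5.
print(round(solve_frame_lambda([1.0], [-1.0], 2.0), 6))  # 0.5
```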
IV. EXPERIMENTAL RESULTS
The proposed CNN-based intra frame rate control method is implemented in the HEVC reference software (HM-16.9). We follow the HEVC common test conditions in our experiments. Five classes, i.e., A, B, C, D, and E, comprising 20 sequences, are tested. Class F is omitted because it contains screen content videos, while our CNN is trained on natural images and may not suit screen content. For each sequence, only the first frame is tested, for two reasons. First, it is well known that the performance of intra coding methods can be well reflected by the results of a small number of frames. Second, as mentioned above, rate control for the first intra frame is very challenging, as no prior information can be used.
TABLE I
RESULTS OF RATE CONTROL ERROR

                  Average rate error
          RC in HM-16.9    Proposed RC
Class A   0.73%            0.39%
Class B   0.94%            0.62%
Class C   1.41%            1.24%
Class D   3.01%            2.02%
Class E   1.77%            1.25%
Average   1.53%            1.07%
TABLE II
RESULTS OF BD-RATE; THE ANCHOR IS RATE CONTROL TURNED OFF

                        Average BD-rate
          RC in HM-16.9         Proposed RC
          Y     U     V         Y     U     V
Class A   1.3%  5.9%  5.2%      0.3%  5.1%  4.4%
Class B   2.3%  6.9%  7.7%      1.3%  4.7%  4.4%
Class C   1.4%  9.0%  8.1%      1.2%  5.5%  6.4%
Class D   1.6%  5.6%  5.2%      1.2%  2.1%  3.2%
Class E   2.6%  4.7%  5.6%      1.6%  3.4%  3.9%
Average   1.8%  6.5%  6.5%      1.1%  4.3%  4.5%
TABLE III
EXAMPLE DETAILED RESULTS

                             RC in HM-16.9                        Proposed RC
Sequence     Target bitrate  Bitrate    Y-PSNR  Bitrate error     Bitrate    Y-PSNR  Bitrate error
ParkScene    48754.75        48884.352  41.53   0.27%             48854.59   41.58   0.20%
             26321.28        26462.98   38.67   0.54%             26433.22   38.71   0.43%
             13756.42        13846.08   35.79   0.65%             13826.69   35.84   0.51%
             6773.18         6805.44    33.03   0.48%             6802.37    33.10   0.43%
RaceHorsesC  17542.80        17656.08   41.98   0.65%             17477.28   42.08   0.37%
             10761.84        10976.40   38.42   1.99%             10836.96   38.42   0.70%
             6191.28         6402.72    34.85   3.42%             6297.60    34.78   1.72%
             3222.48         3299.04    31.22   2.38%             3276.96    31.14   1.69%
We use the HM software with rate control turned off to compress each sequence at different QPs, and then use the resulting bitrates as the target bitrates for our rate control method, as well as for the SATD-based method [5], which is currently the default rate control method in HM.
The results for rate control error and BD-rate are shown in Tables I and II, respectively. Some detailed results are further shown in Table III. In the tables, the rate control error is measured by

E = |R_t − R_a| / R_t × 100%    (9)

where R_t is the target bitrate and R_a is the actual bitrate.
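Equation (9), applied to the first ParkScene row of Table III, reproduces the reported error:

```python
def rate_error(target, actual):
    # E = |R_t - R_a| / R_t * 100%
    return abs(target - actual) / target * 100.0

# First ParkScene row of Table III (RC in HM-16.9 column).
print(round(rate_error(48754.75, 48884.352), 2))  # 0.27
```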
From Table I we observe that our proposed method achieves more accurate rate control than the SATD-based method, leading to an average 0.46 percentage-point decrease in rate control error. We attribute this to the CNN, which can accurately predict the parameters of the R-λ model.
To evaluate compression efficiency, we use the BD-rate between the scheme with rate control and the scheme without it. The comparative results are shown in Table II. Note that the scheme with rate control is usually a little worse than the scheme without it in terms of rate-distortion performance. It can be observed that our method also outperforms the SATD-based method, leading to an average 0.7 percent BD-rate reduction in the Y component, and more than a 2 percent BD-rate reduction in the U and V components.
Our current implementation is straightforward and not optimized. In our experiments, we observe that the encoding time of our method is about 2.2 times that of the SATD-based method, due to the CNN-based prediction. Nonetheless, we anticipate that the computational time of the CNN can be greatly reduced by using a graphics processing unit or specialized hardware.
V. CONCLUSIONS
In this paper, we have presented a convolutional neural network-based approach for HEVC intra frame rate control. The proposed rate control method outperforms the SATD-based one used in the current HEVC reference software in terms of both rate control accuracy and compression efficiency. Since the model parameters are obtained by a CNN trained on many natural images, our method is universal and does not require any prior information; it is thus especially suitable for the first frame of a sequence or for handling scene changes.
VI. ACKNOWLEDGMENT
This work was supported in part by the National Key Research and Development Plan under Grant No. 2016YFC0801001, the National 973 Program of China under Grant 2015CB351803, NSFC (Natural Science Foundation of China) under Grants 61571413 and 61390514, and Intel ICRI MNC.
REFERENCES
[1] B. Li, H. Li, L. Li, and J. Zhang, "λ domain rate control algorithm for high efficiency video coding," IEEE Transactions on Image Processing, vol. 23, no. 9, pp. 3841–3854, Sept. 2014.
[2] H. Choi, J. Nam, J. Yoo, D. Sim, and I. Bajic, "Rate control based on unified RQ model for HEVC," ITU-T SG16 Contribution, JCTVC-H0213, pp. 1–13, 2012.
[3] G. J. Sullivan, J. R. Ohm, W. J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, 2012.
[4] L. Li, B. Li, H. Li, and C. W. Chen, "λ domain optimal bit allocation algorithm for high efficiency video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. PP, no. 99, pp. 1–1, 2016.
[5] M. Karczewicz and X. Wang, "Intra frame rate control based on SATD," in JCTVC 13th Meeting, Incheon, KR, 2013.
[6] M. Wang, K. N. Ngan, and H. Li, "An efficient frame-content based intra frame rate control for high efficiency video coding," IEEE Signal Processing Letters, vol. 22, no. 7, pp. 896–900, 2015.
[7] B. Hosking, D. Agrafiotis, D. Bull, and N. Eastern, "An adaptive resolution rate control method for intra coding in HEVC," in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 1486–1490.
[8] C. Sheng, F. Chen, Z. Peng, and W. Chen, "An adaptive bit mismatch rectification algorithm for intra frame rate control in HEVC," in Image and Signal Processing (CISP), 2015 8th International Congress on. IEEE, 2015, pp. 80–84.
[9] M. Zhou, Y. Zhang, B. Li, and H.-M. Hu, "Complexity-based intra frame rate control by jointing inter-frame correlation for high efficiency video coding," Journal of Visual Communication and Image Representation, 2016.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[11] G. Schaefer and M. Stich, "UCID: An uncompressed color image database," in Electronic Imaging 2004. International Society for Optics and Photonics, 2003, pp. 472–480.
[12] D.-T. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato, "RAISE: A raw images dataset for digital image forensics," in Proceedings of the 6th ACM Multimedia Systems Conference. ACM, 2015, pp. 219–224.
... Various deep tools, such as encoding optimization tools [39][40][41], picture prediction tools [42][43][44], and in-loop or postfiltering tools [45] [46], have been considered in the literature. Xu et al. [41] proposed using the CNN (convolutional neural network) and LSTM (long short-term memory) to facilitate the CU partition of intra-picture prediction in HEVC. ...
Article
Full-text available
Versatile Video Coding (VVC) is the latest video coding standard, which provides significant coding efficiency to its successors based on new coding tools and flexibility. In this paper, we propose a generative adversarial network-based inter-picture prediction approach for VVC. The proposed method involves two major parts, deep attention map estimation and deep frame interpolation. Adjacent VVC-coded frames in every other frame are taken as the reference data for the proposed inter-picture prediction. The deep attention map classifies pixels into high-interest and low-interest. The low-interest pixels are replaced by the generated data from frame interpolation without extra coded bits, while the other pixels are encoded using the conventional VVC coding tools. The generation of the attention map and interpolated frame can be incorporated into the VVC encoding algorithm under a unified framework. Experimental results show that the proposed method improves the coding efficiency of VVC with a moderate increase (26.7%) in runtime. An average 1.91% BD-rate savings compared to the VVC reference software under the Random-Access configuration was achieved, where significant bitrate reduction for chroma components (U and V) was observed.
... Since the HEVC mainly focuses on compressing high-resolution videos, a high-resolution image dataset is selected. Among the several various datasets, the RAISE dataset [119] is chosen, which has been utilized for intra CU partitioning [4,47,54,62,[120][121][122]. It includes pictures of 4 K resolution images without manipulation that are taken by four photographers from different places over three years. ...
Article
Full-text available
The High-Efficiency Video Coding (HEVC) standard has high compression efficiency. This efficiency is achieved at the expense of increasing the computational complexity. The HEVC encoder has the hierarchical search for optimal Coding Unit (CU) partitioning. It is based on rate-distortion optimization. Various solutions are proposed to reduce the encoding time. But, the machine learning-based methods have more effective in reducing the encoding time. Yet, deep learning tools have a relatively high computational load. So, in this paper a new low complexity convolutional neural network has been designed. It is called Convolutional Neural Network-based CTU Partitioner (CNNCP). It reduces the computational complexity of the HEVC encoding. The CNNCP takes the CTU luminance component and the quantization parameter (QP) as inputs, and provides the CU depth matrix in output at once. The CNNCP does not follow the hierarchical approach. Thus, it has a fixed computation structure that facilitates the use of parallel processing tools. The CNNCP has a simple structure with a least number of parameters, and thus, it has the least computational complexity. It has been trained and tested with a large database for all QP values. The results show that it reduced the encoding time by more than 90%, and makes it suitable for real-time applications.
Article
High dynamic range (HDR) video offers a more realistic visual experience than standard dynamic range (SDR) video, while introducing new challenges to both compression and transmission. Rate control is an effective technology to overcome these challenges, and ensure optimal HDR video delivery. However, the rate control algorithm in the latest video coding standard, versatile video coding (VVC), is tailored to SDR videos, and does not produce well coding results when encoding HDR videos. To address this problem, a data-driven λ-domain rate control algorithm is proposed for VVC HDR intra frames in this paper. First, the coding characteristics of HDR intra coding are analyzed, and a piecewise R -λ model is proposed to accurately determine the correlation between the rate ( R ) and the Lagrange parameter λ for HDR intra frames. Then, to optimize bit allocation at the coding tree unit (CTU)-level, a wavelet-based residual neural network (WRNN) is developed to accurately predict the parameters of the piecewise R -λ model for each CTU. Third, a large-scale HDR dataset is established for training WRNN, which facilitates the applications of deep learning in HDR intra coding. Extensive experimental results show that our proposed HDR intra frame rate control algorithm achieves superior coding results than the state-of-the-art algorithms. The source code of this work will be released at https://github.com/TJU-Videocoding/WRNN.git.
Article
Mainstream image/video coding standards, exemplified by the state-of-the-art H.266/VVC, AVS3, and AV1, follow the block-based hybrid coding framework. Due to the block-based framework, encoders designed for these standards are easily optimized for peak signal-to-noise ratio (PSNR) but have difficulties optimizing for the metrics more aligned to perceptual quality, e.g. multi-scale structural similarity (MS-SSIM), since these metrics cannot be accurately evaluated at the small block level. We address this problem by leveraging inspiration from the end-to-end image compression built on deep networks, which is easily optimized through network training for any metric as long as the metric is differentiable. We compared the trained models using the same network structure but different metrics and observed that the models allocate rates in different ratios. We then propose a distillation method to obtain the rate allocation rule from end-to-end image compression models with different metrics and to utilize such a rule in the block-based encoders. We implement the proposed method on the VVC reference software – VTM and the AVS3 reference software – HPM, focusing on intraframe coding. Experimental results show that the proposed method on top of VTM achieves more than 10% BD-rate reduction than the anchor when evaluated with MS-SSIM or LPIPS, which leads to concrete perceptual quality improvement.
Article
Rate control, which typically includes bit allocation and quantization parameter (QP) determination, plays an important role in real-world video coding applications. In this paper, we propose a novel rate control scheme for AOMedia Video 1 (AV1) which enjoys adaptive bit allocation and effective QP determination. In particular, two supporting vector regression (SVR) models are learned for the hierarchical bit allocation and frame-level parameter estimation. To train the models, the multi-pass coding strategy is utilized for training data acquisition. Compared to the default scheme in AV1 and the state-of-the-art method, the proposed rate control scheme achieves superior performance in terms of bitrate accuracy and coding efficiency.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 dif- ferent classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implemen- tation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry
Article
Rate control is of great significance for the High Efficiency Video Coding (HEVC). Due to the high efficiency and low complexity, the R-lambda model has been applied to the HEVC as the default rate control algorithm. However, the video content complexity, which can help improve the code efficiency and rate control performance, is not fully considered in the R-lambda model. To address this problem, an intra-frame rate control algorithm, which aims to provide improved and smooth video quality, is developed in this paper by jointly taking into consideration the frame-level content complexity between the encoded intra frames and the encoded inter frame, as well as the CTU-level complexity among different CTUs in texture–different regions for intra-frame. Firstly, in order to improve the rate control efficiency, this paper introduces a new prediction measure of content complexity for CTUs of intra-frame by jointly considering the inter-frame correlations between encoding intra frame and previous encoded inter frames as well as correlations between encoding intra frame and previous encoded intra frame. Secondly, a frame-level complexity-based bit-allocation-balancing method, by jointly considering the inter-frame correlation between intra frame and previous encoded inter frame, is brought up so that the smoothness of the visual quality can be improved between adjacent inter- and intra-frames. Thirdly, a new region-division and complexity-based CTU-level bit allocation method is developed to improve the objective quality and to reduce PSNR fluctuation among CTUs in intra-frame. In the end, related model parameters are updated during the encoding process to increase rate control accuracy. As a result, as can be seen from the extensive experimental results that compared with the state-of-the-art schemes, the video quality can be significantly improved. 
More specifically, up to 10.5% and on average 5.2% BD-Rate reduction was achieved compared to HM16.0 and up to 2.7% and an average of 2.0% BD-Rate reduction was achieved compared to state-of-the-art algorithm. Besides, a superior performance in enhancing the smoothness of quality can be achieved, which outperforms the state-of-the-art algorithms in term of flicker measurement, frame and CTU-wise PSNR, as well as buffer fullness.
Article
Rate control typically involves two steps: bit allocation and bitrate control. The bit allocation step can be implemented in various fashions depending on how many levels of allocation are desired and whether or not an optimal ratedistortion (R-D) performance is pursued. The bitrate control step has a simple aim in achieving the target bitrate as precisely as possible. In our recent research, we have developed a -domain rate control algorithm capable of controlling the bitrate precisely for High Efficiency Video Coding (HEVC). The initial research [1] showed that the bitrate control in the -domain can be more precise than the conventional schemes. However, the simple bit allocation scheme adopted in this initial research is unable to achieve an optimal R-D performance reflecting the inherent R-D characteristics governed by the video content. In order to achieve an optimal R-D performance, the bit allocation algorithms need to be developed taking into account the video content of a given sequence. The key issue in deriving the video content-guided optimal bit allocation algorithm is to build a suitable R-D model to characterize the R-D behavior of the video content. In this research, to complement the R- model developed in our initial work [1], a D- model is properly constructed to complete a comprehensive framework of -domain R-D analysis. Based on this comprehensive -domain R-D analysis framework, a suite of optimal bit allocation algorithms are developed. In particular, we design both picture level and Basic Unit level bit allocation algorithms based on the fundamental rate-distortion optimization (RDO) theory to take full advantage of content-guided principles. The proposed algorithms are implemented in HEVC reference software, and the experimental results demonstrate that they can achieve obvious R-D performance improvement with smaller bitrate control error. 
The proposed bit allocation algorithms have already been adopted by the Joint Collaborative Team on Video Coding (JCT-VC) and integrated into the HEVC reference software.
Conference Paper
Rate control is an integral part of high-fidelity video coding. Since the first call for proposals, many rate control algorithms have been incorporated into the HEVC standard to prepare for its practical use. The state-of-the-art SATD-based intra-frame rate control strategy (SATDRC) achieves stable control performance. However, the average bit mismatch at the frame level (BMF) reaches 0.62%, and up to 1.47%, which may lead to unsteady video services in bandwidth-constrained applications. In this paper, an adaptive frame-level bit mismatch rectification scheme (AFBMR) is proposed for SATDRC to reduce the gap between allocated bits and generated bits. BMF is used to adjust the bits allocated to a frame adaptively, in accordance with sequence characteristics. Experimental results show that the proposed method reduces bit inconsistency by 37.33% on average, enhancing bit-rate accuracy and providing steadier bit streams without introducing extra computational complexity or loss of video quality.
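The rectification idea above can be sketched as a simple feedback rule: shrink the next frame's target by a fraction of the accumulated overshoot (or grow it after undershoot). AFBMR adapts the gain to sequence characteristics; the fixed gain here is a placeholder assumption.

```python
# Minimal sketch of frame-level bit-mismatch rectification (illustrative
# feedback rule, not the paper's exact adaptive scheme).

def rectified_target(allocated, generated_hist, allocated_hist, gain=0.5):
    """Adjust the next frame's bit target by a fraction of the
    accumulated mismatch between generated and allocated bits."""
    mismatch = sum(generated_hist) - sum(allocated_hist)  # + means overshoot
    return max(1, round(allocated - gain * mismatch))

# Hypothetical history: the encoder overshot by 40 bits over three frames,
# so the next 1000-bit target is trimmed by half the overshoot.
print(rectified_target(1000, [1020, 1010, 1010], [1000, 1000, 1000]))  # 980
```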
Conference Paper
Digital forensics is a relatively new research area which aims at authenticating digital media by detecting possible digital forgeries. Indeed, the ever-increasing availability of multimedia data on the web, coupled with the great advances reached by computer graphical tools, makes the modification of an image and the creation of visually compelling forgeries an easy task for any user. This in turn creates the need for reliable tools to validate the trustworthiness of the represented information. In such a context, we present here RAISE, a large dataset of 8156 high-resolution raw images, depicting various subjects and scenarios, properly annotated and available together with accompanying metadata. Such a wide collection of untouched and diverse data is intended to become a powerful resource for, but not limited to, forensic researchers by providing a common benchmark for fair comparison, testing, and evaluation of existing and next-generation forensic algorithms. In this paper we describe how RAISE has been collected and organized, discuss how digital image forensics and many other multimedia research areas may benefit from this new publicly available benchmark dataset, and test a very recent forensic technique for JPEG compression detection.
Article
Rate control plays an important role in the rapid development of high-fidelity video services. As the High Efficiency Video Coding (HEVC) standard has been finalized, many rate control algorithms are being developed to promote its commercial use. The HEVC encoder adopts a new R-lambda based rate control model to reduce the bit estimation error. However, the R-lambda model fails to consider the frame-content complexity, which ultimately degrades the performance of bit rate control. In this letter, a gradient-based R-lambda (GRL) model is proposed for intra-frame rate control, where the gradient effectively measures the frame-content complexity and enhances the performance of the traditional R-lambda method. In addition, a new coding tree unit (CTU) level bit allocation method is developed. The simulation results show that the proposed GRL method can reduce the bit estimation error and improve the video quality in HEVC all-intra coding.
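The gradient-as-complexity idea can be sketched as follows: each CTU's weight is the sum of absolute horizontal and vertical luma first differences, and the frame budget is split in proportion to the weights. This is an illustration of the GRL intuition, not the letter's exact model.

```python
# Sketch of gradient-based CTU-level bit allocation: textured CTUs
# (large gradient) receive proportionally more of the frame budget.

def ctu_gradient(block):
    """Sum of |horizontal| + |vertical| first differences of a 2-D block."""
    g = 0
    for y in range(len(block)):
        for x in range(len(block[0])):
            if x + 1 < len(block[0]):
                g += abs(block[y][x + 1] - block[y][x])
            if y + 1 < len(block):
                g += abs(block[y + 1][x] - block[y][x])
    return g

def allocate_ctu_bits(frame_bits, ctus):
    grads = [ctu_gradient(b) for b in ctus]
    total = sum(grads) or 1          # guard against an all-flat frame
    return [frame_bits * g / total for g in grads]

flat  = [[100] * 8 for _ in range(8)]        # zero gradient, trivial to code
edges = [[0, 255] * 4 for _ in range(8)]     # strong vertical stripes
print(allocate_ctu_bits(1000, [flat, edges]))  # flat CTU gets ~0 bits
```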
Article
Rate control is a useful tool for video coding, especially in real-time communication applications. Most existing rate control algorithms are based on the R-Q model, which characterizes the relationship between bitrate (R) and quantization (Q), under the assumption that Q is the critical factor in rate control. However, as video coding schemes become more and more flexible, it is very difficult to accurately model the R-Q relationship. In fact, we find that there exists a more robust correspondence between R and the Lagrange multiplier λ. Therefore, in this paper, we propose a novel λ-domain rate control algorithm based on the R-λ model, and implement it in the newest video coding standard, High Efficiency Video Coding (HEVC). Experimental results show that the proposed λ-domain rate control can achieve the target bitrates more accurately than the original rate control algorithm in the HEVC reference software, as well as obtain significant R-D performance gain. Thanks to the highly accurate rate control algorithm, hierarchical bit allocation can be enabled in the implemented video coding scheme, which brings additional R-D performance gain. Experimental results demonstrate that the proposed λ-domain rate control algorithm is effective for HEVC, outperforming the R-Q model based rate control in HM-8.0 (HEVC reference software) by 0.55 dB on average and up to 1.81 dB for the low-delay coding structure, and by 1.08 dB on average and up to 3.77 dB for the random-access coding structure. The proposed λ-domain rate control algorithm has already been adopted by the Joint Collaborative Team on Video Coding and integrated into the HEVC reference software.
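The R-λ model at the heart of this algorithm takes the power-law form λ = α·bpp^β, with (α, β) refined after each coded unit from the observed (bpp, λ) pair. A minimal sketch follows; the initial constants and update gains are illustrative assumptions, not the values used in the HEVC reference software.

```python
import math

# Sketch of the R-lambda model lambda = alpha * bpp^beta and a
# log-domain parameter update from one observed (bpp, lambda) pair.

def lam_from_bpp(bpp, alpha, beta):
    return alpha * bpp ** beta

def update_params(alpha, beta, bpp_real, lam_used,
                  delta_alpha=0.1, delta_beta=0.05):
    """Nudge (alpha, beta) so the model better explains the observed
    pair; the error is measured in the log domain for stability."""
    err = math.log(lam_used) - math.log(lam_from_bpp(bpp_real, alpha, beta))
    alpha *= math.exp(delta_alpha * err)
    beta += delta_beta * err * math.log(bpp_real)
    return alpha, beta

# Hypothetical starting point and observation for one coded frame.
alpha, beta = 3.2, -1.367
alpha, beta = update_params(alpha, beta, bpp_real=0.1, lam_used=120.0)
print(alpha, beta)   # model now predicts a lambda closer to 120 at 0.1 bpp
```

The multiplicative α update and the log(bpp)-weighted β update both move the model's log-prediction toward the observation, so repeated updates converge as coding statistics accumulate.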
Article
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.
Conference Paper
Standardised image databases, or rather the lack of them, are one of the main weaknesses in the field of content-based image retrieval (CBIR). Authors often use their own images or do not specify the source of their datasets. Naturally this makes comparison of results somewhat difficult. While a first approach towards a common colour image set has been taken by the MPEG-7 committee, their database does not cater for all strands of research in the CBIR community. In particular, since the MPEG-7 images exist only in compressed form, they do not allow for an objective evaluation of image retrieval algorithms that operate in the compressed domain, or for judging the influence image compression has on the performance of CBIR algorithms. In this paper we introduce a new dataset, UCID (pronounced "use it"), an Uncompressed Colour Image Dataset which tries to bridge this gap. The UCID dataset currently consists of 1338 uncompressed images together with a ground truth of a series of query images with corresponding models that an ideal CBIR algorithm would retrieve. While its initial intention was to provide a dataset for the evaluation of compressed-domain algorithms, the UCID database also represents a good benchmark set for the evaluation of any kind of CBIR method, as well as an image set that can be used to evaluate image compression and colour quantisation algorithms.