To be presented at IEEE ISCAS’17, Baltimore, MD, USA, May 2017.
Corner Proposals from HEVC Bitstreams
Hyomin Choi and Ivan V. Bajić
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
Email: {chyomin, ibajic}@sfu.ca
Abstract—Corner-like features are important in computer vision problems such as object matching, tracking, recognition, and retrieval. Most corner detectors operate in the pixel domain, which means that they require an image or video to be fully decoded and reconstructed before detection can start. In this paper we describe a method for generating corner proposals from compressed HEVC bitstreams without full decoding. Specifically, we utilize HEVC syntax and intra prediction directions to find the locations that are likely to contain corners. The proposed method is lightweight and can be applied to intra-coded frames or still images coded by HEVC. Experimental results illustrate that the proposed method is able to identify most regions where conventional pixel-domain corner detectors would find corners.

Keywords—Corner detection, corner proposals, HEVC, intra prediction
I. INTRODUCTION
Corner-like features are used in various computer vision applications, such as object matching, tracking, recognition, and retrieval. A number of corner detectors have been developed over the last three decades, starting with the classic Harris [1] and Shi-Tomasi [2] detectors. Conventional corner detectors analyze image characteristics such as pixel intensity, edges, or shapes, and can be classified into intensity-based, model-based, and contour-based detectors [3]. They run in the pixel domain and therefore require images or video frames to be fully decoded before detection can proceed.
In the current era of Big Data, reducing the complexity and memory requirements of various image processing algorithms is one of the key requirements for making new large-scale applications possible. It would be beneficial if corner-like features could be localized in the compressed domain, without full decoding and reconstruction. In the past, there have been proposals to detect various image features in the compressed domain [4], [5]. However, these proposals exploited earlier compression standards, such as JPEG and MPEG-2 (Intra), where pixel values were directly transformed and transform coefficients therefore had a direct relationship with the original pixel values. Intra coding in recent standards such as H.264/AVC [6] and High Efficiency Video Coding (HEVC) [7] employs intra prediction prior to transformation, making the relationship between pixel values and transform coefficients more complicated. Nonetheless, some clues about the underlying pixel-domain structures can be inferred from the coding syntax, without full decoding.
In this paper we present a method for generating corner proposals from HEVC bitstreams. Specifically, we analyze HEVC intra prediction directions and other syntax elements [8], and infer regions where corner-like features are likely to be found. The proposed method operates on the output of the HEVC entropy decoding module. It is fast (in fact, faster than full image decoding), does not require full image reconstruction, and is able to identify most locations where conventional detectors [1], [2] would find corners. Identified regions could be used to initiate compressed-domain tracking [9], to help with compressed-domain object detection [10], or to reduce the search space of conventional pixel-domain corner detectors after decoding. The proposed method is described in Section II, followed by experimental results in Section III and conclusions in Section IV.

Fig. 1: Illustration of HEVC intra prediction directions
II. CORNER PROPOSALS FROM HEVC BITSTREAMS
A. Preliminaries
Intra prediction in HEVC employs 33 angular and 2 non-angular prediction modes [8]. Angular prediction modes are uniformly spaced by 5.625° to cover a 180° range. Block size and prediction mode are decided by rate-distortion optimization (RDO). Fig. 1 illustrates intra prediction modes by red arrows. It is apparent that near the edges, block size tends to be small (usually 4×4 or 8×8) while the prediction direction tends to follow the underlying edge structure. The relationship between edge structure and prediction has been recognized before, and a number of methods have been proposed for accelerating the mode decision process at the encoder for both H.264/AVC and HEVC [11]–[14]. In the present work we utilize these observations to infer the likely positions of corners.
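For illustration, a minimal Python sketch of the mode-to-angle mapping follows (not part of the original method description). It adopts the uniform 5.625° spacing described above; the HEVC specification defines the angular modes through a slightly non-uniform angle table, so this is an approximation, and the function names are our own.

```python
# Sketch: map an HEVC intra prediction mode index to a prediction-line
# angle, under the paper's approximation that the 33 angular modes
# (indices 2..34) are uniformly spaced by 5.625 degrees over 180 degrees.
# Mode indices follow the HEVC spec: 0 = Planar, 1 = DC (non-angular).

PLANAR, DC = 0, 1
ANGULAR_STEP_DEG = 5.625  # 180 / 32

def is_angular(mode: int) -> bool:
    """Modes 2..34 are angular; 0 (Planar) and 1 (DC) are not."""
    return 2 <= mode <= 34

def mode_to_angle_deg(mode: int) -> float:
    """Approximate prediction-line angle (degrees) for an angular mode."""
    if not is_angular(mode):
        raise ValueError("Planar/DC modes have no prediction direction")
    return (mode - 2) * ANGULAR_STEP_DEG  # 0.0 .. 180.0
```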
B. Corner proposals
To generate corner proposals, we process the blocks in an intra-coded HEVC frame in raster scan order. For illustration purposes, four neighboring 4×4 blocks are shown in Fig. 2. The blocks are labeled 1 through 4 in Z-scan order, and the figure focuses on the first and fourth blocks to illustrate the concept. Prediction directions are indicated by red arrows, and their angles ($\theta_1$ and $\theta_4$ in Fig. 2, measured counterclockwise) are deduced from the intra prediction modes.
Fig. 2: Example of the intersection point of two prediction directions
We imagine a line passing through the center of each block $i$; we call these intra prediction lines. Given the coordinates $(x_i, y_i)$ of the center, the equation of the line in block $i$ is

$$(x, y) = (x_i, y_i) + a_i(\cos\theta_i, \sin\theta_i), \qquad (1)$$

where $a_i$ is a parameter. If the lines intersect, the angle at the intersection is given by

$$\theta_{i,j} = \theta_i - \theta_j. \qquad (2)$$

If $|\theta_{i,j}| \in \{0^\circ, 180^\circ\}$, the lines are parallel and will not intersect.
For intersecting lines, the point of intersection can be found by equating the equations of the two lines and solving for the common point $(x_{i,j}, y_{i,j})$. For brevity, we present the solution in a compact form [15]. Let $\mathbf{v}_i = (\cos\theta_i, \sin\theta_i)$ be the unit vector in the prediction direction in block $i$, and $\mathbf{v}_i^{\perp} = (-\sin\theta_i, \cos\theta_i)$ be a vector perpendicular to $\mathbf{v}_i$; then

$$(x_{i,j}, y_{i,j}) = (x_i, y_i) + \frac{\mathbf{v}_j^{\perp} \cdot (x_j - x_i,\; y_j - y_i)}{\mathbf{v}_j^{\perp} \cdot \mathbf{v}_i}\, \mathbf{v}_i, \qquad (3)$$

where the symbol $\cdot$ denotes the dot product.
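As a concrete illustration of (2) and (3), the following Python sketch computes the intersection point using the perp-dot-product form; the function name and the parallel-line tolerance are our own choices, not part of the paper.

```python
import math

def intersect_prediction_lines(ci, theta_i, cj, theta_j, eps=1e-9):
    """Intersection of two intra prediction lines, following Eq. (3).

    ci, cj           : (x, y) block centers
    theta_i, theta_j : prediction-line angles in degrees
    Returns the intersection point, or None for (near-)parallel lines.
    """
    ti, tj = math.radians(theta_i), math.radians(theta_j)
    vi = (math.cos(ti), math.sin(ti))        # direction of line i
    vj_perp = (-math.sin(tj), math.cos(tj))  # perpendicular to line j
    denom = vj_perp[0] * vi[0] + vj_perp[1] * vi[1]  # v_j^perp . v_i
    if abs(denom) < eps:   # parallel case: |theta_{i,j}| in {0, 180} deg
        return None
    d = (cj[0] - ci[0], cj[1] - ci[1])       # center-to-center offset
    a = (vj_perp[0] * d[0] + vj_perp[1] * d[1]) / denom
    return (ci[0] + a * vi[0], ci[1] + a * vi[1])
```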
If prediction direction lines intersect, this may be an indication of a corner formed by the underlying edge structures, and we would select the corresponding 4×4 block as a corner proposal. However, for every group of four neighboring blocks there could be up to $\binom{4}{2} = 6$ intersections of intra prediction lines. This could lead to a large number of corner proposals. It is therefore necessary to prune some of the less likely candidates to arrive at a more reasonable number, as explained in the next section.
Finally, note that conventional corner detectors [1], [2] also consider terminus (end-of-line) points as corners. For example, in Fig. 3, corners would be detected not only where the lines meet (red-colored hollow circles), but also at the other end of each line (red-colored solid circles). We therefore generate corner proposals in cases where we expect the end of an underlying edge or line. We found that terminus points usually occur where one of the neighboring blocks is coded in an angular intra-prediction mode and the other in a non-angular (DC or Planar) mode. In this case, when the prediction direction points toward the neighboring non-angular coded block, both blocks are selected as corner proposals.

Fig. 3: Sample synthetic corners
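A minimal sketch of this terminus rule is given below, reusing is_angular from the earlier sketch. The block descriptors are hypothetical, and approximating "points toward" as a positive projection of the prediction direction onto the center-to-center offset is our own simplification; the prediction line itself has no preferred orientation, so a sign convention must be fixed.

```python
import math

def is_terminus_pair(mode_a, theta_a_deg, offset_ab, mode_b):
    """Terminus heuristic: block a is angular, block b is non-angular
    (Planar or DC), and a's prediction line points toward b.

    offset_ab: (dx, dy) from the center of block a to the center of block b.
    """
    if not (is_angular(mode_a) and mode_b in (0, 1)):
        return False
    t = math.radians(theta_a_deg)
    v = (math.cos(t), math.sin(t))
    # "Points toward" approximated as a positive projection of the
    # prediction direction onto the center-to-center offset.
    return v[0] * offset_ab[0] + v[1] * offset_ab[1] > 0
```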
C. Pruning
The first and simplest criterion for pruning is block size. Specifically, we only consider 4×4 and 8×8 blocks in generating corner proposals. Larger blocks are generally found in flat areas (see Fig. 1), so we don't consider them as generators of corner proposals.
The second pruning criterion is the existence of non-zero transform coefficients in the block, which can easily be checked using the cbf_luma flag. When considering a pair of blocks whose intra prediction lines could potentially generate a corner, we require that at least one of them has non-zero transform coefficients (cbf_luma = 1); otherwise we don't compute the intersection point for that pair of blocks. The above two criteria apply to both terminus and non-terminus corner proposals, while the remaining criteria apply only to non-terminus proposals.
The next criterion is the angle of intersection of the intra prediction lines. In Fig. 1 we see that for neighboring blocks along the object's contour, intra prediction directions differ slightly from block to block. This means that their angle difference $|\theta_{i,j}|$ is either very small (close to 0°) or very large (close to 180°). However, there is no corner in these blocks since the underlying contour bends smoothly. So we should eliminate very small and very large intersection angles when generating corner proposals. To decide on the range of valid angles, we ran an OpenCV (http://opencv.org/) implementation of the Harris corner detector [1] with default settings on synthetic images of corners (Fig. 3). The corner angles ranged from 0° to 180° in steps of 5.625°, to emulate the corners that could be obtained from the intersections of intra prediction lines. The Harris detector did not detect corners larger than 157.5°, which is 22.5° away from 180°. Hence we used the same threshold to eliminate corner proposals associated with very small or very large angles. The intersection of intra prediction lines is only sought if the following condition is satisfied:

$$22.5^\circ < |\theta_{i,j}| < 157.5^\circ. \qquad (4)$$
The above criteria can be easily checked from HEVC syntax without much computation. If a pair of blocks passes these criteria, we compute the intersection point from (3). Since corners are local features, to be a valid corner proposal, we require the intersection point to be close to the blocks whose intra prediction lines generated it. We define a valid region as the region comprising the four blocks being processed along with a 4-pixel boundary on each side (Fig. 4). If the intra prediction lines intersect within the valid region, the intersection is considered a corner proposal; otherwise it is ignored. In Fig. 4, three intersections are within the valid region.

Fig. 4: Intersections within a valid region
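Putting the criteria together, one possible realization of the pruning pipeline for a non-terminus pair of blocks is sketched below. The block-descriptor fields are hypothetical stand-ins for values parsed from the entropy decoder, and intersect_prediction_lines is reused from the earlier sketch.

```python
def propose_corner(block_i, block_j, valid_region):
    """Apply the pruning criteria to a non-terminus pair of blocks and,
    if all criteria pass, return the intersection point as a proposal.

    block_*      : dicts with 'size', 'theta' (degrees), 'center' (x, y),
                   and 'cbf_luma' fields parsed from the entropy decoder.
    valid_region : (x0, y0, x1, y1) covering the four-block group plus
                   a 4-pixel boundary on each side.
    """
    # Criterion 1: only 4x4 and 8x8 blocks generate proposals.
    if block_i['size'] not in (4, 8) or block_j['size'] not in (4, 8):
        return None
    # Criterion 2: at least one block has non-zero coefficients.
    if not (block_i['cbf_luma'] or block_j['cbf_luma']):
        return None
    # Criterion 3: intersection angle within (22.5, 157.5) degrees, Eq. (4).
    diff = abs(block_i['theta'] - block_j['theta'])  # angles in [0, 180]
    if not (22.5 < diff < 157.5):
        return None
    # Compute the intersection point from Eq. (3).
    p = intersect_prediction_lines(block_i['center'], block_i['theta'],
                                   block_j['center'], block_j['theta'])
    if p is None:
        return None
    # Criterion 4: the intersection must fall inside the valid region.
    x0, y0, x1, y1 = valid_region
    if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
        return p
    return None
```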
III. EXPERIMENTAL RESULTS
Our method generates corner proposals by identifying a collection of 4×4 blocks where corners are likely to be found. We evaluate the proposed method by examining how well it can localize corners detected by benchmark detectors [1], [2]. We construct two metrics for this purpose: corner coverage (CC) and image coverage (IC). CC is defined as the fraction of corners found by one of the benchmark detectors ([1] or [2]) that are covered by the corner proposal blocks. IC is the fraction of the image covered by the corner proposal blocks. Ideally, we would have a CC of 100% while having a small IC (i.e., cover all corners with as few proposals as possible).
We compare our method against a recent fast corner detector [16] that we call FAST from here on. Since FAST outputs detected corner coordinates like most conventional detectors, we round those coordinates to the nearest 4×4 blocks according to (5), to make its output comparable to ours. Then we can compute CC and IC in the same way as for our method. Coordinate rounding is performed as follows:

$$(b_x, b_y) = \left( \left\lfloor \frac{x+2}{4} \right\rfloor,\; \left\lfloor \frac{y+2}{4} \right\rfloor \right). \qquad (5)$$
All experiments are performed on the system whose configuration is shown in Table I. Our method is implemented in the HEVC reference decoder (version HM16.12), while the corner detectors [1], [2], [16] are from OpenCV (version 3.1.0). We selected 13 images for evaluation, listed in Tables II and III. Two of them (Checkerboard and CheckerboardCube) are synthetic images and the others are natural images. A few images are borrowed from HEVC common test sequences. The images cover a wide range of resolutions, from VGA (640×480) to 4K UHD (3840×2160). They were coded by the HM16.12 encoder following the common test conditions [17] for intra coding, with QP = 27.
TABLE I: Experimental environment
Item Specification
Processor Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
RAM 64 GB
Operating system Windows Server 2012 R2 64-bit
Compiler Visual Studio 2015
HEVC decoder HM16.12
Other software OpenCV 3.1.0
Table II shows the results in terms of IC and CC. The parameters of FAST [16] were set to give approximately the same IC as our method, so that we can make a meaningful comparison of CC values. As seen in the table, the CC values of our method are fairly similar to those of FAST [16], especially when computed against corners found by the Harris detector. In some cases our method achieves higher CC values (e.g., Checkerboard) and in other cases it has lower CC values than FAST. CC values themselves should be taken with a grain of salt, though, as illustrated in Fig. 5, which shows corners and corner proposals (red dots) found by the four detectors on the Checkerboard image. All four detectors find the vast majority of correct corners, with FAST missing a few, which brings its CC down slightly. The corners that are missed by FAST but detected by the other methods are shown in green in Fig. 5(a),(c),(d). Note that the two benchmark detectors do not fully agree on the corners even in this simple synthetic image: the Shi-Tomasi detector [2] finds corners along the left image boundary (as does ours), but the Harris detector [1] does not. The discrepancy is much larger on natural images, where testing one benchmark detector against the other usually gives CC values around 50% (full results not shown due to space constraints).
We note in Table II that our method achieves the lowest CC values on the Kimono image. This image is shown in Fig. 6 along with the corners and corner proposals found by the four detectors. We see that the Harris, Shi-Tomasi, and FAST detectors find many corners in the highly textured background, while our detector finds corner proposals along the boundary of the foreground object (the person). This explains why the CC of our method is low on this image. Also, one could argue that in this case the corner proposals found by our method are more reasonable for many applications (e.g., object detection or tracking) compared to the corners found by the other three detectors.

Fig. 5: Results on Checkerboard. (a) Proposed, (b) FAST [16], (c) Shi-Tomasi [2], (d) Harris [1]
TABLE II: Image and corner coverage comparison
Test image Method IC (%) Shi-Tomasi CC (%) Harris CC (%)
BQMall Proposed 33.48 76.43 80.12
FAST 34.86 85.66 79.92
CheckerboardCube Proposed 6.40 80.18 74.42
FAST 6.09 85.85 78.14
Checkerboard Proposed 0.72 99.09 100.00
FAST 0.67 76.82 80.38
BQTerrace Proposed 27.41 74.22 75.38
FAST 27.68 87.17 79.04
Kimono Proposed 0.67 3.52 9.20
FAST 0.61 60.80 47.24
Typing Proposed 8.20 62.72 70.91
FAST 7.70 79.31 67.89
Roof Proposed 22.07 73.60 80.60
FAST 21.37 80.61 67.63
Blocks Proposed 2.48 25.31 37.40
FAST 2.51 72.95 65.75
PeopleOnStreet Proposed 22.00 61.46 69.24
FAST 23.06 81.62 70.03
Traffic Proposed 16.66 63.48 71.95
FAST 16.62 79.17 70.42
Thai Proposed 13.03 66.50 68.74
FAST 13.20 84.21 70.57
Scene Proposed 8.58 30.77 30.95
FAST 8.62 79.24 74.79
Pens Proposed 0.98 50.50 56.62
FAST 0.97 72.25 60.92
Low CC scores of our method generally occur in images that contain highly textured regions (Kimono, Blocks, Scene, and Pens), where the other detectors find corners and ours does not. The reason our detector avoids highly textured regions is that those blocks tend to be coded in DC mode, whereas our detector requires at least one block in a four-block neighborhood to be coded with an angular intra-prediction mode in order to consider corner proposals.
Finally, the execution time of each detector is shown in Table III. Execution time includes all the necessary decoding (entropy decoding only, in our case) and processing. Not surprisingly, our detector is the fastest, because it does not require full decoding. In fact, our corner proposals are generated faster than full decoding, making the method very attractive for applications where speed is crucial. Another desirable feature of our method (which is not reflected in these results) is its low memory requirement, because the image does not have to be reconstructed and stored in memory as with the other detectors.
Fig. 6: Results on Kimono. (a) Proposed, (b) FAST [16], (c) Shi-Tomasi [2], (d) Harris [1]

TABLE III: Execution time in seconds

Resolution Test image Proposed FAST [16] Shi-Tomasi [2] Harris [1]
832×480 BQMall 0.056 0.126 0.139 0.135
640×480 CheckerboardCube 0.040 0.085 0.110 0.106
1440×720 Checkerboard 0.061 0.117 0.178 0.173
1920×1080 BQTerrace 0.269 0.563 0.653 0.626
1920×1080 Kimono 0.139 0.301 0.424 0.407
1920×1080 Typing 0.152 0.360 0.469 0.455
1920×1080 Roof 0.243 0.524 0.622 0.597
1920×1080 Blocks 0.138 0.272 0.399 0.375
2560×1600 PeopleOnStreet 0.426 0.978 1.152 1.111
2560×1600 Traffic 0.417 0.939 1.148 1.102
2560×1600 Thai 0.370 0.743 0.957 0.961
3840×2160 Scene 0.662 1.315 1.794 1.738
3840×2160 Pens 0.470 0.958 1.480 1.390

IV. CONCLUSION

We presented a method for fast generation of corner proposals from HEVC bitstreams. By exploiting HEVC intra prediction modes and coding syntax, we are able to generate corner proposals without full image reconstruction, and faster than full reconstruction would take. In most cases, the corner proposals show solid agreement with conventional corner detectors. In some cases, our corner proposals appear more reasonable than conventionally detected corners for certain applications, such as object detection and tracking.
REFERENCES

[1] C. Harris and M. Stephens, "A combined corner and edge detector," in Proc. 4th Alvey Vision Conference, 1988, pp. 189–192.
[2] J. Shi and C. Tomasi, "Good features to track," in Proc. IEEE CVPR'94, Jun. 1994.
[3] M. Awrangjeb, G. Lu, and C. S. Fraser, "Performance comparisons of contour-based corner detectors," IEEE Trans. Image Processing, vol. 21, pp. 4167–4179, Sept. 2012.
[4] B. Shen and I. K. Sethi, "Direct feature extraction from compressed images," in Proc. SPIE Storage and Retrieval for Image and Video Databases IV, 1996, vol. 2670.
[5] Z. Qian, W. Wang, and T. Qiao, "An edge detection method in DCT domain," in Int. Workshop Inform. Electron. Eng., 2012, pp. 344–348.
[6] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560–576, Jul. 2003.
[7] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the high efficiency video coding (HEVC) standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, pp. 1649–1668, Dec. 2012.
[8] M. Wien, High Efficiency Video Coding: Coding Tools and Specification, Springer, 2015.
[9] S. H. Khatoonabadi and I. V. Bajić, "Video object tracking in the compressed domain using spatio-temporal Markov random fields," IEEE Trans. Image Processing, vol. 22, pp. 300–313, Jan. 2013.
[10] L. Zhao, D. Zhao, X. Fan, and Z. He, "HEVC compressed domain moving object detection and classification," in Proc. IEEE ISCAS'16, May 2016, pp. 1990–1993.
[11] F. Pan, X. Lin, S. Rahardja, K. P. Lim, and Z. G. Li, "A directional field based fast intra mode decision algorithm for H.264 video coding," in Proc. IEEE ICME'04, Jun. 2004.
[12] X. Liu, Y. Liu, P. Wang, C.-F. Lai, and H.-C. Chao, "An adaptive mode decision algorithm based on video texture characteristics for HEVC intra prediction," IEEE Trans. Circuits Syst. Video Technol., Apr. 2016.
[13] X. Wang and Y. Xue, "Fast HEVC intra coding algorithm based on Otsu's method and gradient," in Proc. IEEE Int. Symp. Broadband Multimedia Systems and Broadcasting, Jul. 2016.
[14] W. Jiang, H. Ma, and Y. Chen, "Gradient based fast mode decision algorithm for intra prediction in HEVC," in Proc. 2nd Int. Conf. Consumer Electronics, Communications and Networks, May 2012.
[15] F. S. Hill, Jr., "The pleasures of 'perp dot' products," in Graphics Gems IV, P. S. Heckbert, Ed., pp. 138–148. Academic Press, 1994.
[16] E. Rosten, R. Porter, and T. Drummond, "Faster and better: a machine learning approach to corner detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105–119, Jan. 2010.
[17] F. Bossen, "Common HM test conditions and software reference configurations," ISO/IEC JTC1/SC29 WG11 m28412, JCTVC-L1100, Jan. 2013.