To be presented at IEEE ISCAS’17, Baltimore, MD, USA, May 2017.
Corner Proposals from HEVC Bitstreams
Hyomin Choi and Ivan V. Bajić
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
Email: {chyomin, ibajic}@sfu.ca
Abstract—Corner-like features are important in computer
vision problems such as object matching, tracking, recognition,
and retrieval. Most corner detectors operate in the pixel domain,
which means that they require image or video to be fully
decoded and reconstructed before detection can start. In this
paper we describe a method for generating corner proposals from
compressed HEVC bitstreams without full decoding. Specifically,
we utilize HEVC syntax and intra prediction directions to find
the locations that are likely to contain corners. The proposed
method is lightweight and can be applied to intra-coded frames
or still images coded by HEVC. Experimental results illustrate
that the proposed method is able to identify most regions where
conventional pixel-domain corner detectors would find corners.
Keywords—Corner detection, corner proposals, HEVC, intra
prediction
I. INTRODUCTION
Corner-like features are used in various computer vision ap-
plications, such as object matching, tracking, recognition, and
retrieval. A number of corner detectors have been developed
over the last three decades, starting with classic Harris [1]
and Shi-Tomasi [2] detectors. Conventional corner detectors
analyze image characteristics such as pixel intensity, edges, or
shapes, and can be classified into intensity-based, model-based,
and contour-based detectors [3]. They operate in the pixel domain
and therefore require images or video frames to be
fully decoded before detection can proceed.
In the current era of Big Data, reducing the complexity and
memory requirements of various image processing algorithms
is one of the key requirements for making new large-scale
applications possible. It would be beneficial if corner-like
features could be localized in the compressed domain, without
full decoding and reconstruction. In the past, there were
proposals to detect various image features in the compressed
domain [4], [5]. But these proposals were exploiting earlier
compression standards, such as JPEG and MPEG-2 (Intra),
where pixel values were directly transformed and therefore
transform coefficients had a direct relationship with the orig-
inal pixel values. Intra coding in recent coding standards
such as H.264/AVC [6] and High Efficiency Video Coding
(HEVC) [7] employs intra prediction prior to transformation,
making the relationship between pixel values and transform
coefficients more complicated. Nonetheless, some clues about
the underlying pixel-domain structures may be inferred from
the coding syntax, without full decoding.
In this paper we present a method for generating cor-
ner proposals from HEVC bitstreams. Specifically, we ana-
lyze HEVC intra prediction directions and other syntax el-
ements [8], and infer regions where corner-like features are
likely to be found. The proposed method operates on the output
of the HEVC entropy decoding module. It is fast (in fact,
Fig. 1: Illustration of HEVC intra prediction directions
faster than full image decoding), does not require full image
reconstruction, and is able to identify most locations where
conventional detectors [1], [2] would find corners. Identified
regions could be used to initiate compressed-domain track-
ing [9], help with compressed-domain object detection [10], or
to reduce the search space of conventional pixel-domain corner
detectors after decoding. The proposed method is described in
Section II, followed by experimental results in Section III and
conclusions in Section IV.
II. CORNER PROPOSALS FROM HEVC BITSTREAMS
A. Preliminaries
Intra prediction in HEVC employs 33 angular and 2 non-angular
prediction modes [8]. Angular prediction modes are uniformly
spaced by 5.625° to cover a 180° range. Block size and
prediction mode are decided by rate-distortion optimization
(RDO). Fig. 1 illustrates intra prediction modes by red arrows.
It is apparent that near the edges, block size tends to be
small (usually 4×4 or 8×8) while prediction direction tends to
follow the underlying edge structure. The relationship between
edge structure and prediction has been recognized before, and
a number of methods have been proposed for accelerating the
mode decision process at the encoder for both H.264/AVC
and HEVC [11]–[14]. In the present work we utilize
these observations to infer the likely positions of corners.
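To make the prediction-direction model concrete, the sketch below maps HEVC intra mode indices to line angles under the uniform-spacing description above. The mode numbering (0 = Planar, 1 = DC, 2–34 = angular) is from the HEVC standard; the absolute angular offset used here (mode 2 mapped to 45°) is an illustrative assumption, not a value specified in the paper.

```python
# HEVC intra modes: 0 = Planar, 1 = DC, 2..34 = angular.
# Under the paper's uniform-spacing model, the 33 angular modes
# cover 180 degrees in steps of 5.625 degrees.
ANGULAR_STEP_DEG = 5.625

def is_angular(mode):
    """True for angular intra modes, False for Planar/DC."""
    return 2 <= mode <= 34

def mode_to_angle_deg(mode):
    """Map an angular intra mode index to a line angle in degrees.
    The baseline offset (mode 2 -> 45 degrees) is an assumption."""
    if not is_angular(mode):
        raise ValueError("Planar/DC modes have no direction")
    return 45.0 + (mode - 2) * ANGULAR_STEP_DEG
```

Since prediction lines are undirected, angles that differ by 180° describe the same line; any downstream angle test should account for this.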
B. Corner proposals
To generate corner proposals, we process the blocks in
an intra-coded HEVC frame in the raster scan order. For
illustration purposes, four neighboring 4×4 blocks are shown
in Fig. 2. The blocks are labeled 1 through 4 in Z-scan
order, and the figure focuses on the first and fourth blocks
to illustrate the concept. Prediction directions are indicated by
red arrows, and their angles (θ1 and θ4 in Fig. 2, measured
counterclockwise) are deduced from intra prediction modes.
Fig. 2: An example of an intersection point formed by two prediction directions
We imagine a line passing through the center of each block i.
We call these intra prediction lines. Given the coordinates
(x_i, y_i) of the center, the equation of the line in block i is

  (x, y) = (x_i, y_i) + a_i (cos θ_i, sin θ_i),    (1)

where a_i is a parameter. If the lines intersect, the angle at the
intersection is given by

  θ_{i,j} = θ_i − θ_j.    (2)

If |θ_{i,j}| ∈ {0°, 180°}, the lines are parallel and will not
intersect.
For intersecting lines, the point of intersection can be found
by equating the equations of the two lines and solving for
the common point (x_{i,j}, y_{i,j}). For brevity, we present the
solution in a compact form [15]. Let v_i = (cos θ_i, sin θ_i)
be the unit vector in the prediction direction of block i, and
v_i^⊥ = (−sin θ_i, cos θ_i) a vector perpendicular to v_i. Then

  (x_{i,j}, y_{i,j}) = (x_i, y_i) + [v_j^⊥ · (x_j − x_i, y_j − y_i)] / (v_j^⊥ · v_i) · v_i,    (3)

where the symbol · indicates the dot product.
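The perp-dot intersection of Eq. (3) takes only a few lines of code. The helper below is an illustrative sketch, not the authors' HM-based implementation; it returns None for (near-)parallel lines, corresponding to the condition |θ_{i,j}| ∈ {0°, 180°}.

```python
import math

def intersection(ci, theta_i, cj, theta_j):
    """Intersection of two intra prediction lines, as in Eq. (3).

    ci, cj: block-center coordinates (x_i, y_i), (x_j, y_j).
    theta_i, theta_j: line angles in radians.
    Returns None when the lines are (near-)parallel.
    """
    vi = (math.cos(theta_i), math.sin(theta_i))           # v_i
    vj_perp = (-math.sin(theta_j), math.cos(theta_j))     # perp of v_j
    denom = vj_perp[0] * vi[0] + vj_perp[1] * vi[1]       # perp(v_j) . v_i
    if abs(denom) < 1e-9:                                 # parallel lines
        return None
    dx, dy = cj[0] - ci[0], cj[1] - ci[1]
    a = (vj_perp[0] * dx + vj_perp[1] * dy) / denom       # parameter a_i
    return (ci[0] + a * vi[0], ci[1] + a * vi[1])
```

For example, a horizontal line through (0, 0) and a vertical line through (2, 2) intersect at (2, 0).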
If prediction direction lines intersect, this may be an indi-
cation of a corner formed by the underlying edge structures,
and we would select the corresponding 4×4 block as a corner
proposal. However, for every group of four neighboring blocks
there could be up to (4 choose 2) = 6 intersections of intra
prediction lines. This could lead to a large number of corner proposals.
It is therefore necessary to prune some of the less likely
candidates to arrive at a more reasonable number, as explained
in the next section.
Finally, note that conventional corner detectors [1], [2] also
consider terminus (end-of-line) points as corners. For example,
in Fig. 3, corners would be detected not only where the lines
meet (red-colored hollow circles), but also at the other end
of each line (red-colored solid circles). We therefore generate
corner proposals in cases where we expect the end of an
underlying edge or line. We found that terminus points are
usually found where one of the neighboring blocks is coded
Fig. 3: Sample synthetic corners
in an angular intra-prediction mode, and the other in non-
angular (DC or Planar) mode. In this case, when the prediction
direction points towards the neighboring non-angular coded
block, both blocks are selected as corner proposals.
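The terminus rule can be sketched as a simple predicate on the two blocks' modes. This is a hypothetical illustration: the geometric test of whether the angular block's direction points towards its neighbor is abstracted into a boolean argument, and the function name is not from the paper.

```python
def terminus_proposals(mode_a, mode_b, points_toward_b):
    """Terminus-corner rule sketch: if block a is angular, block b is
    non-angular (DC/Planar), and a's prediction direction points toward
    b, both blocks become corner proposals.

    Returns a pair of booleans: (propose_a, propose_b).
    """
    angular_a = 2 <= mode_a <= 34
    angular_b = 2 <= mode_b <= 34
    if angular_a and not angular_b and points_toward_b:
        return (True, True)
    return (False, False)
```

When both blocks are angular, the pair is instead handled by the line-intersection test of Eq. (3).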
C. Pruning
The first and simplest criterion for pruning is block size.
Specifically, we only consider 4×4 and 8×8 blocks when
generating corner proposals. Larger blocks are generally found
in flat areas (see Fig. 1), so we do not consider them as
generators of corner proposals.
The second pruning criterion is the existence of non-
zero transform coefficients in the block, which can easily be
checked using the cbf_luma flag. When considering a pair of
blocks whose intra prediction lines could potentially generate
a corner, we require that at least one of them has non-zero
transform coefficients (cbf_luma = 1); otherwise we do not
compute the intersection point for that pair of blocks. The
above two criteria apply to both terminus and non-terminus
corner proposals, while the remaining criteria apply only to
non-terminus proposals.
The next criterion is the angle of intersection of intra
prediction lines. In Fig. 1 we see that for neighboring blocks
along the object’s contour, intra prediction directions differ
slightly from block to block. This means that their angle
difference |θ_{i,j}| is either very small (close to 0°) or very
large (close to 180°). However, there is no corner in these
blocks since the underlying contour bends smoothly. So we
should eliminate very small and very large intersection angles
when generating corner proposals. To decide on the range of
valid angles, we ran an OpenCV¹ implementation of the Harris
corner detector [1] with default settings on synthetic images of
corners (Fig. 3). The corner angles ranged from 0° to 180° in
steps of 5.625°, to emulate the corners that could be obtained
from the intersections of intra prediction lines. The Harris
detector did not detect corners larger than 157.5°, which is
22.5° away from 180°. Hence we used the same threshold to
eliminate corner proposals associated with very small or very
large angles. The intersection of intra prediction lines is only
sought if the following condition is satisfied:

  22.5° < |θ_{i,j}| < 157.5°.    (4)
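Condition (4) can be checked directly from the two angles. In the sketch below (an illustration, not the authors' code), the difference is reduced modulo 180° because prediction lines are undirected; this normalization is an assumption about how the comparison is implemented.

```python
def passes_angle_test(theta_i_deg, theta_j_deg):
    """Condition (4): keep a block pair only if the angle between
    their prediction lines is neither too flat nor too sharp.

    The difference is taken modulo 180 degrees because the lines
    are undirected (an angle of 190 is the same line as 10).
    """
    diff = abs(theta_i_deg - theta_j_deg) % 180.0
    return 22.5 < diff < 157.5
```

For instance, two lines meeting at 90° pass the test, while neighboring blocks along a smooth contour (differing by one mode step, 5.625°) are rejected.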
The above criteria can be easily checked from HEVC syn-
tax without much computation. If a pair of blocks passes these
criteria, we compute the intersection point from (3). Since
corners are local features, to be a valid corner proposal, we
require the intersection point to be close to the blocks whose
intra prediction lines generated it. We define a valid region as
the region comprising the four blocks being processed along
¹ http://opencv.org/
Fig. 4: Intersections within a valid region
with a 4-pixel boundary on each side (Fig. 4). If the intra
prediction lines intersect within the valid region, the
corresponding blocks are considered corner proposals; otherwise
the intersection is ignored. In Fig. 4, three intersections lie
within the valid region.
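The valid-region test reduces to a bounding-box containment check. The sketch below assumes, as in Fig. 2 and Fig. 4, a 2×2 group of 4×4 blocks with a 4-pixel margin on each side; the parameterization by `group_origin` (top-left corner of block 1) is an illustrative choice.

```python
def in_valid_region(point, group_origin, block=4, margin=4):
    """Check whether an intersection point lies inside the valid
    region: the 2x2 group of blocks plus a margin on each side.

    group_origin: pixel coordinates of the top-left corner of the
    first block in the group (assumed parameterization).
    """
    x0 = group_origin[0] - margin
    y0 = group_origin[1] - margin
    side = 2 * block + 2 * margin          # group side plus both margins
    x, y = point
    return x0 <= x <= x0 + side and y0 <= y <= y0 + side
```

For a group whose first block starts at (16, 16), the valid region spans [12, 28] in both coordinates.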
III. EXPERIMENTAL RESULTS
Our method generates corner proposals by identifying a
collection of 4×4 blocks where corners are likely to be found.
We evaluate the proposed method by examining how well it
can localize corners detected by benchmark detectors [1], [2].
We construct two metrics for this purpose: corner coverage
(CC) and image coverage (IC). CC is defined as the fraction of
corners found by one of the benchmark detectors ([1] or [2])
that are covered by the corner proposal blocks. IC is the
fraction of the image covered by the corner proposal blocks.
Ideally, we would have CC of 100% while having small IC
(i.e., cover all corners with as few proposals as possible).
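The two metrics can be sketched as follows. This is an illustrative implementation, not the one used for the paper's tables; in particular, mapping a benchmark corner to the 4×4 block that contains it via floor division is an assumption about how "covered by the corner proposal blocks" is evaluated.

```python
def coverage_metrics(proposal_blocks, benchmark_corners,
                     width, height, block=4):
    """Corner coverage (CC) and image coverage (IC), in percent.

    proposal_blocks: set of (bx, by) block indices proposed.
    benchmark_corners: list of (x, y) pixel corners from a
    benchmark detector.
    """
    covered = sum(1 for (x, y) in benchmark_corners
                  if (int(x) // block, int(y) // block) in proposal_blocks)
    cc = 100.0 * covered / len(benchmark_corners) if benchmark_corners else 0.0
    total_blocks = (width // block) * (height // block)
    ic = 100.0 * len(proposal_blocks) / total_blocks
    return cc, ic
```

An ideal detector would drive CC towards 100% while keeping IC small.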
We compare our method against a recent fast corner
detector [16] that we call FAST from here on. Since FAST
outputs detected corner coordinates like most conventional
detectors, we round those coordinates to the nearest 4×4
blocks according to (5), to make its output comparable to ours.
Then we can compute CC and IC in the same way as for our
method. Coordinate rounding is performed as follows:

  (b_x, b_y) = ( ⌊(x + 2)/4⌋, ⌊(y + 2)/4⌋ ).    (5)
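The rounding of Eq. (5) maps a pixel coordinate to the nearest 4×4 block index; a one-line sketch (the function name is illustrative):

```python
def round_to_block(x, y, block=4):
    """Round a detected corner coordinate (in pixels) to the nearest
    block index, as in Eq. (5): floor((coord + block/2) / block)."""
    return ((x + block // 2) // block, (y + block // 2) // block)
```

For example, pixel (9, 5) maps to block (2, 1), since 9 is closer to the block starting at x = 8 than the one starting at x = 4.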
All experiments are performed on the system whose con-
figuration is shown in Table I. Our method is implemented
in the HEVC reference decoder (version HM16.12), while
corner detectors [1], [2], [16] are from OpenCV (version
3.1.0). We selected 13 images for evaluation, listed in Tables II
and III. Two of them (Checkerboard and CheckerboardCube)
are synthetic images and the others are natural images. A few
images are borrowed from HEVC common test sequences.
The images cover a wide range of resolutions, from VGA
(640×480) to 4K UHD (3840×2160). They were coded
by the HM16.12 encoder following common conditions [17]
for intra coding, with QP = 27.
TABLE I: Experimental environment
Item Specification
Processor Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
RAM 64 GB
Operating system Windows Server 2012 R2 64-bit
Compiler Visual Studio 2015
HEVC decoder HM16.12
Other software OpenCV 3.1.0
Table II shows the results in terms of IC and CC. The
parameters of FAST [16] were set to give approximately the
same IC as our method, so that we can have a meaningful
comparison of CC values. As seen in the table, the CC values
of our method are fairly similar to those of FAST [16],
especially when computed against corners found by the Harris
detector. In some cases our method achieves higher CC values
(e.g., Checkerboard) and in other cases it has lower CC values
than FAST. CC values themselves should be taken with a grain
of salt, though, as illustrated in Fig. 5, which shows corners
and corner proposals (red dots) found by the four detectors
on the Checkerboard image. All four detectors find the vast
majority of correct corners, with FAST missing a few, which
brings its CC down slightly. The corners that are missed
by FAST but detected by other methods are shown in green in
Fig. 5(a),(c),(d). Note that even the two benchmark detectors
themselves do not fully agree on the corners even in this simple
synthetic image: the Shi-Tomasi detector [2] finds corners along
the left image boundary (as does ours), but the Harris detector [1]
does not. The discrepancy is much larger on natural images,
where testing one benchmark detector against the other usually
gives CC values around 50% (full results not shown due to
space constraints).
We note in Table II that our method achieves the lowest
CC values on the Kimono image. This image is shown in
Fig. 6 along with corners and corner proposals found by the
four detectors. We see that Harris, Shi-Tomasi, and FAST
detectors find many corners in the highly textured background,
while our detector finds corner proposals along the boundary
of the foreground object (person). This explains why the CC
of our method is low on this image. Also, one could argue
that in this case corner proposals found by our method are
more reasonable for many applications (e.g., object detection
or tracking) compared to corners found by the other three
(a) Proposed (b) FAST [16]
(c) Shi-Tomasi [2] (d) Harris [1]
Fig. 5: Results on Checkerboard
TABLE II: Image and corner coverage comparison
Test image Method IC (%) Shi-Tomasi CC (%) Harris CC (%)
BQMall Proposed 33.48 76.43 80.12
FAST 34.86 85.66 79.92
CheckerboardCube Proposed 6.40 80.18 74.42
FAST 6.09 85.85 78.14
Checkerboard Proposed 0.72 99.09 100.00
FAST 0.67 76.82 80.38
BQTerrace Proposed 27.41 74.22 75.38
FAST 27.68 87.17 79.04
Kimono Proposed 0.67 3.52 9.20
FAST 0.61 60.80 47.24
Typing Proposed 8.20 62.72 70.91
FAST 7.70 79.31 67.89
Roof Proposed 22.07 73.60 80.60
FAST 21.37 80.61 67.63
Blocks Proposed 2.48 25.31 37.40
FAST 2.51 72.95 65.75
PeopleOnStreet Proposed 22.00 61.46 69.24
FAST 23.06 81.62 70.03
Traffic Proposed 16.66 63.48 71.95
FAST 16.62 79.17 70.42
Thai Proposed 13.03 66.50 68.74
FAST 13.20 84.21 70.57
Scene Proposed 8.58 30.77 30.95
FAST 8.62 79.24 74.79
Pens Proposed 0.98 50.50 56.62
FAST 0.97 72.25 60.92
detectors. Low CC scores of our method generally occur in
images that contain highly textured regions (Kimono, Blocks,
Scene, and Pens) where other detectors find corners and ours
does not. The reason why our detector avoids highly-textured
regions is that those blocks tend to be coded with DC mode,
whereas our detector requires at least one block in a four-block
neighborhood to be coded with an intra-prediction mode in
order to consider corner proposals.
Finally, the execution time of each detector is shown in
Table III. Execution time includes all the necessary decoding
(entropy decoding only in our case) and processing. Not
surprisingly, our detector is the fastest, because it does not
require full decoding. In fact, our corner proposals are gen-
erated faster than full decoding, making it very attractive for
applications where speed is crucial. Another desirable feature
of our method (which is not reflected in these results) is the
low memory requirement, because the image does not have to
be reconstructed and stored in memory as with other detectors.
IV. CONCLUSION
We presented a method for fast generation of corner
proposals from HEVC bitstreams. By exploiting HEVC intra
prediction modes and coding syntax, we are able to generate
(a) Proposed (b) FAST [16]
(c) Shi-Tomasi [2] (d) Harris [1]
Fig. 6: Results on Kimono
TABLE III: Execution time in seconds
Resolution Test image Proposed FAST [16] Shi-Tomasi [2] Harris [1]
832 ×480 BQMall 0.056 0.126 0.139 0.135
640 ×480 CheckerboardCube 0.040 0.085 0.110 0.106
1440 ×720 Checkerboard 0.061 0.117 0.178 0.173
1920 ×1080
BQTerrace 0.269 0.563 0.653 0.626
Kimono 0.139 0.301 0.424 0.407
Typing 0.152 0.360 0.469 0.455
Roof 0.243 0.524 0.622 0.597
Blocks 0.138 0.272 0.399 0.375
2560 ×1600
PeopleOnStreet 0.426 0.978 1.152 1.111
Traffic 0.417 0.939 1.148 1.102
Thai 0.370 0.743 0.957 0.961
3840 ×2160 Scene 0.662 1.315 1.794 1.738
Pens 0.470 0.958 1.480 1.390
corner proposals without full image reconstruction, and in less
time than full reconstruction would take. In most cases, corner
proposals show solid agreement with conventional corner
detectors. In some cases, our corner proposals seem to be
more reasonable compared to conventionally-detected corners
for certain applications such as object detection and tracking.
REFERENCES
[1] C. Harris and M. Stephens, “A combined corner and edge detector,” in
Proc. 4th Alvey Vision Conference, 1988, pp. 189–192.
[2] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE CVPR’94,
Jun. 1994.
[3] M. Awrangjeb, G. Lu, and C. S. Fraser, “Performance comparisons of
contour-based corner detectors,” IEEE Trans. Image Processing, vol.
21, pp. 4167–4179, Sept. 2012.
[4] B. Shen and I. K. Sethi, “Direct feature extraction from compressed
images,” in Proc. SPIE Storage and Retrieval for Image and Video
Databases IV, 1996, vol. 2670.
[5] Z. Qian, W. Wang, and T. Qiao, “An edge detection method in DCT
domain,” in Int. Workshop Inform. Electron. Eng., 2012, pp. 344–348.
[6] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview
of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst.
Video Technol., vol. 13, pp. 560–576, Jul. 2003.
[7] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the
high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits
Syst. Video Technol., vol. 22, pp. 1649–1668, Dec. 2012.
[8] M. Wien, High Efficiency Video Coding (Coding Tools and Specifica-
tion), Springer, 2015.
[9] S. H. Khatoonabadi and I. V. Bajić, “Video object tracking in
the compressed domain using spatio-temporal Markov random fields,”
IEEE Trans. Image Processing, vol. 22, pp. 300–313, Jan. 2013.
[10] L. Zhao, D. Zhao, X. Fan, and Z. He, “HEVC compressed domain
moving object detection and classification,” in Proc. IEEE ISCAS’16,
May 2016, pp. 1990–1993.
[11] F. Pan, X. Lin, S. Rahardja, K. P. Lim, and Z. G. Li, “A directional
field based fast intra mode decision algorithm for H.264 video coding,”
in Proc. IEEE ICME’04, Jun. 2004.
[12] X. Liu, Y. Liu, P. Wang, C.-F. Lai, and H.-C. Chao, “An adaptive mode
decision algorithm based on video texture characteristics for HEVC
intra prediction,” IEEE Trans. Circuits Syst. Video Technol., Apr. 2016.
[13] X. Wang and Y. Xue, “Fast HEVC intra coding algorithm based on
Otsu’s method and gradient,” in Proc. IEEE Int. Symp. Broadband
Multimedia Systems and Broadcasting, Jul. 2016.
[14] W. Jiang, H. Ma, and Y. Chen, “Gradient based fast mode decision
algorithm for intra prediction in HEVC,” in 2nd Int. Conf. Consumer
Electronics, Communications and Networks, May 2012.
[15] F. S. Hill, Jr., “The pleasures of “perp dot” products,” in Graphics
Gems IV, P. S. Heckbert, Ed., pp. 138–148. Academic Press, 1994.
[16] E. Rosten, R. Porter, and T. Drummond, “Faster and better: a machine
learning approach to corner detection,” IEEE Trans. Pattern Analysis
and Machine Intelligence, vol. 32, no. 1, pp. 105–119, Jan. 2010.
[17] F. Bossen, “Common HM test conditions and software reference
configurations,” in ISO/IEC JTC1/SC29 WG11 m28412, JCTVC-L1100,
Jan. 2013.