Presented at IEEE ICME 2011, Barcelona, Spain, July 2011.
SALIENCY-PRESERVING VIDEO COMPRESSION
Hadi Hadizadeh and Ivan V. Bajić
School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada
ABSTRACT
In region-of-interest (ROI) video coding, the part of the frame
designated as the ROI is encoded with higher quality than
the rest of the frame. At low bit rates, coding artifacts in
non-ROI parts of the frame may become salient and draw the
user's attention away from the ROI, thereby degrading visual
quality. In this paper, we propose a saliency-preserving framework
for ROI video coding. This approach aims to reduce
attention-grabbing visual artifacts in non-ROI parts of the frame
in order to keep the user's attention on the ROI. Experimental
results indicate that the proposed method improves the visual
quality of ROI video at low bit rates.
Index Terms—ROI video coding, visual attention model,
visual coding artifacts, saliency
1. INTRODUCTION
Video compression standards such as MPEG-4 and H.26x
have been developed to achieve high compression efficiency
simultaneously with high perceived visual quality [1], [2].
However, lossy compression techniques may produce various
coding artifacts such as blockiness, ringing, blur, etc., espe-
cially at low bit rates [3]. Several methods have been pro-
posed to detect and reduce coding artifacts [4], [5], [6].
Recently, region-of-interest (ROI) coding of video using
computational models of visual attention has been recognized
as a promising approach to achieve high-performance video
compression [7], [8]. The idea behind most of these meth-
ods is to encode a small area around the predicted attention-
grabbing (salient) regions with higher quality compared to
other less visually important regions. Such a spatial priori-
tization is supported by the fact that only a small region of
about 2–5° of visual angle around the center of gaze is perceived
with high spatial resolution, due to the highly non-uniform
distribution of photoreceptors on the human retina [7].
Granting a higher priority to the salient regions, how-
ever, may produce severe coding artifacts in areas outside the
salient regions, where the image quality is lower. Such
artifacts may draw the viewer's attention away from the naturally
salient regions, thereby degrading the perceived visual
quality. To mitigate this problem, in this paper we introduce the
concept of saliency-preserving video compression as a new
paradigm for video compression which attempts to suppress
such attention-grabbing artifacts and keep the user's attention on
the same regions that were salient before compression. At the
same time, some artifacts may be tolerated, so long as they do
not draw attention.
Using this concept, we propose a novel algorithm for
saliency-preserving video compression within a ROI coding
framework. In the proposed algorithm, the visibility of poten-
tial coding artifacts is predicted by computing the difference
between the saliency map of the original raw video frames
and the saliency map of the encoded frames. The quantization
parameters (QPs) of individual macroblocks (MBs) are then
adjusted according to the obtained saliency errors, so that the
total saliency error is reduced while the bit rate constraint is
satisfied. To achieve this goal, the problem is formulated as
a Multiple-Choice Knapsack problem (MCKP) [9]. Exper-
imental results indicate that the proposed method is able to
improve the visual quality of encoded video compared to the
conventional rate-distortion optimized (RDO) video, as well
as the conventional ROI-coded video.
Note that a visible artifact is not necessarily salient. A
particular artifact may be visible if the user is looking directly
at it or at its neighborhood, but may go unnoticed if it is non-
salient and the user is looking elsewhere in the frame. As
the severity of the artifact increases, it may become salient
and draw the user's attention to it. Although several methods
have been developed for detecting visible (but not necessar-
ily salient) artifacts [6], in our work, the concept of visual
saliency is used to minimize salient coding artifacts, i.e., those
coding artifacts that may grab the user's attention.
The paper is organized as follows. In Section 2, a recent
ROI-coding algorithm is described, followed by a brief sum-
mary of the MCKP problem. The proposed method is pre-
sented in Section 3. Experimental results are given in Section
4, and the conclusions are drawn in Section 5.
2. PRELIMINARIES
2.1. ROI Video Coding
In [10], a ROI bit allocation scheme was proposed for
H.264/AVC. In this scheme, after detecting the ROI, several
coding parameters including QP, macroblock (MB) coding
modes, the number of reference frames, accuracy of motion
vectors, and the search range for motion estimation, are adap-
tively adjusted at the MB level according to the relative im-
portance of each MB and a given target bit rate. Subsequently,
the encoder allocates more resources, such as bits and com-
putational power, to the ROI. In [10], the optimized QP value
for each MB is obtained as
Q_p[i] = ( X1[i] √(MAD_pred,adapt[i]) / (T[i] − (N − i) X2[i]) ) · Σ_{k=i}^{N} √( w[k] / w[i] ) · MAD_pred,adapt[k],   (1)

where T[i] is the number of bits remaining before encoding the i-th
MB, N is the total number of MBs in a frame, X1[i] and
X2[i] are the first-order and zero-order parameters of the R-Q
model [10], [11], MAD_pred,adapt[i] is the adaptive mean
absolute difference (MAD) prediction value, and w[i] is the
importance level associated with the i-th MB.
In order to combine this ROI coding approach with the
concept of visual attention, we encode the input video based
on the saliency maps produced by the Itti-Koch-Niebur (IKN)
saliency model from [12]. The saliency map of each frame
is first remapped to the range [−0.5, 0.5]. Then, the saliency
value of the i-th MB, computed as the average saliency of
its pixels, is used as the importance level w[i] of that MB.
Given the target bit rate, the QP value of each MB can be
obtained using (1). As in [10], the obtained QP value is further
bounded to within ±4 of the QP value of the previously
encoded MB in order to maintain visual smoothness and suppress
blocking artifacts. Other ROI bit allocation schemes
(e.g., [13]) can also be employed to find the initial set of QP
values. The proposed method starts with a set of QP values
obtained by ROI bit allocation, and then modifies them in a
way that minimizes the saliency error.
2.2. Multiple Choice Knapsack Problem
The Multiple Choice Knapsack Problem (MCKP) is a gener-
alization of the ordinary knapsack problem, where the set of
items is partitioned into classes. The binary choice of taking
an item is replaced by the selection of exactly one item
out of each class [9]. Consider K mutually disjoint classes
C_1, C_2, ..., C_K of items to be packed into a knapsack of capacity
c. Each item m ∈ C_k is associated with a profit p_km
and a weight w_km. The goal is to choose exactly one item
from each class such that the profit sum is maximized while
the total weight is kept below the capacity c. Let x_km be the
binary indicator of whether item m is chosen in class C_k. The
MCKP is formulated as follows [9]:

maximize    Σ_{k=1}^{K} Σ_{m∈C_k} p_km x_km
subject to  Σ_{k=1}^{K} Σ_{m∈C_k} w_km x_km ≤ c,
            Σ_{m∈C_k} x_km = 1,   k = 1, 2, ..., K,
            x_km ∈ {0, 1},   k = 1, 2, ..., K,  m ∈ C_k.   (2)
MCKP is an NP-hard problem [9]. However, an exact
solution can be obtained in a reasonable time for the problem
sizes encountered in the proposed method using the algorithm
proposed in [14].
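For the small instances that arise here (K_n classes with M items each), an exact solution can also be obtained by a straightforward dynamic program over the capacity. The sketch below is a generic pseudo-polynomial MCKP solver in Python, written by us for illustration; it is not the specialized algorithm of [14]:

```python
def solve_mckp(profits, weights, capacity):
    """Exact MCKP solver by dynamic programming over total weight.
    profits[k][m], weights[k][m]: profit/weight of item m in class k.
    Exactly one item must be chosen from every class. Returns
    (best_profit, choice) where choice[k] is the chosen item index in
    class k, or (None, None) if no feasible selection exists."""
    NEG = float("-inf")
    K = len(profits)
    dp = [NEG] * (capacity + 1)   # dp[w] = best profit at total weight w
    dp[0] = 0.0
    parent = []                   # parent[k][w] = (prev_weight, item)
    for k in range(K):
        ndp = [NEG] * (capacity + 1)
        par = [None] * (capacity + 1)
        for w in range(capacity + 1):
            if dp[w] == NEG:
                continue
            for m, (p, wt) in enumerate(zip(profits[k], weights[k])):
                nw = w + wt
                if nw <= capacity and dp[w] + p > ndp[nw]:
                    ndp[nw] = dp[w] + p
                    par[nw] = (w, m)
        dp, _ = ndp, parent.append(par)
    best_w = max(range(capacity + 1), key=lambda w: dp[w])
    if dp[best_w] == NEG:
        return None, None
    # backtrack the chosen item in each class
    choice, w = [0] * K, best_w
    for k in range(K - 1, -1, -1):
        pw, m = parent[k][w]
        choice[k] = m
        w = pw
    return dp[best_w], choice
```

The running time is O(K · M · c), which is practical when the capacity (here, a per-frame bit budget) is moderate; the algorithm of [14] is far more efficient for large instances.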
3. THE PROPOSED METHOD
We now present the proposed algorithm for saliency-
preserving video coding. In the sequel, capital bold letters
(e.g., X) denote matrices, lowercase bold letters (e.g., x) denote
vectors, and italic letters (e.g., x) represent scalars.
Consider an uncompressed video sequence consisting of
N frames {F_1, F_2, ..., F_N}, where each frame is W MBs
wide and H MBs high. Let B_T be the target number of bits
we wish to spend on encoding these frames. For each frame
F_i, a visual saliency map S_i of the same size as F_i is computed
by a chosen visual attention model, in which the saliency
of each 16×16 block (obtained as the average saliency of pixels
within the block) determines the visual importance of the
corresponding MB in F_i. The proposed method consists of
the following steps.
Step 1) The current frame F_i is first encoded by a ROI
encoder (e.g., the one described in Section 2.1) using its original
saliency map S_i. The encoded frame is then decoded, and
the saliency map of the decoded frame, S̃_i, is computed. Let
Q_i be the QP matrix of F_i obtained by the ROI encoder, and
B_i be the matrix containing the actual number of bits spent
on each MB of the encoded F_i. Both Q_i and B_i are of size
W×H (in MBs): Q_i(x, y) is the QP value of the MB at
position (x, y), and B_i(x, y) is the number of bits of the MB
at that position. The total number of bits B_i spent on encoding
frame F_i is B_i = Σ_{y=1}^{H} Σ_{x=1}^{W} B_i(x, y). Note that
B_T = Σ_{i=1}^{N} B_i. Due to quantization, S_i is, in general, different
from S̃_i. Let E_i = S_i − S̃_i be the saliency error matrix.
We store E_i and B_i for subsequent optimization.
Our goal is to modify some elements of Q_i such that if
F_i is re-encoded with the modified Q_i, the L1-norm of its
saliency error matrix E_i is decreased. The QP values can
change by offsets from a set O = {o_1, o_2, ..., o_M} of
size M, whose elements are positive or negative integers.
We always set o_1 = 0, so that one of the options
corresponds to not changing the QP value.
Due to spatial intra-prediction, the bit rates of neighboring
MBs are dependent. Moreover, since the saliency is computed
over a neighborhood, the saliency value of an MB is affected
by the QPs of its neighbors. Modeling such a dependence
is not an easy task. To overcome this difficulty, we use the
following approach. Let P be the set of all binary matrices
(i.e., matrices whose elements are either 0 or 1) of size W×H that
have the following property: there are exactly two zeros
between every two non-zero elements in both the horizontal and
vertical directions. In total, there are 9 such matrices, i.e.,
P = {P_1, P_2, ..., P_9}. Let K_n be the number of 1's in P_n.
Each element of P_n is identified with an MB in a frame. At
any one time, we will only change the QPs of MBs identified
with 1's. Since any two non-zero elements of each P_n are at
least two positions apart, the dependence between MBs corresponding
to those non-zero elements is reduced, so we can
change their QPs without significantly affecting other MBs selected
by the same P_n. As an illustration, the binary masks
corresponding to P_n, n = 1, 2, ..., 9, are shown in Fig. 1 for a
QCIF resolution frame, which contains 9×11 MBs.
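The nine masks are simply the nine phases of a 3×3 lattice over the MB grid. A possible NumPy construction (our illustration; `binary_masks` is not a name from the paper) is:

```python
import numpy as np

def binary_masks(W: int, H: int):
    """Generate the 9 binary masks P_1..P_9 of size H x W (in MBs).
    The mask with phase (dy, dx) has 1's wherever
    (row % 3, col % 3) == (dy, dx), so any two 1's are separated by
    exactly two 0's both horizontally and vertically."""
    masks = []
    for dy in range(3):
        for dx in range(3):
            m = np.zeros((H, W), dtype=np.uint8)
            m[dy::3, dx::3] = 1
            masks.append(m)
    return masks
```

Because the nine phases partition the grid, every MB is selected by exactly one mask; for QCIF (9×11 MBs) this yields K_n = 12 ones in six masks and K_n = 9 in the other three, matching Fig. 1.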
Step 2) The current frame F_i is then re-encoded in the
following manner: first, a binary matrix P_n ∈ P and a QP
offset o_m ∈ O are chosen. A new QP matrix Q_i^{mn} is computed
as Q_i^{mn} = Q_i + o_m P_n, where the superscript in Q_i^{mn}
indicates that the new QP matrix has been obtained with offset
o_m and binary mask P_n. All elements of Q_i^{mn} are passed
through a hard-limiter to ensure that all QP values are in
the range [0, 51], as required by H.264/AVC. F_i is then re-encoded
using Q_i^{mn}, and its saliency map S̃_i^{mn}, saliency error
matrix E_i^{mn}, and bit matrix B_i^{mn} are stored. In this step,
rate control is disabled to prevent further modification of
the QP values by the encoder. Note that since o_1 = 0, Q_i^{1n} = Q_i.
Therefore, E_i^{1n} = E_i and B_i^{1n} = B_i, where E_i and B_i were
computed in Step 1, so this procedure does not need to be
performed for offset o_1, but is applied to all other offsets
o_j, j ≥ 2, using the selected P_n. At the end of this step,
we obtain M different QP values, saliency errors, and actual
bit counts for each MB in F_i for which the corresponding element
in P_n is non-zero. Let G be the set of locations (x, y) of such
MBs in F_i. Since P_n has K_n non-zero elements, G
also contains K_n elements.
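Applying one offset through one mask, together with the hard-limiter, can be written in a single line (NumPy sketch; `offset_qp` is our name, not the paper's):

```python
import numpy as np

def offset_qp(Q: np.ndarray, offset: int, mask: np.ndarray) -> np.ndarray:
    """Step 2: apply QP offset o_m to the MBs selected by binary mask P_n,
    then hard-limit the result to the H.264/AVC QP range [0, 51]."""
    return np.clip(Q + offset * mask, 0, 51)
```

MBs where the mask is 0 keep their original QP, while masked MBs are shifted and clamped.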
We now want to find the best QP offset for each MB in
G among the obtained M options, such that if the chosen QP
offsets are applied to Q_i, the L1-norm of the saliency error
is minimized while the resulting number of bits of the encoded
frame remains at or below B_i. To achieve this goal,
we model this problem as a Multiple-Choice Knapsack Problem
(MCKP) [9]. Here, each class is one MB in G (so we
have a total of K_n classes), and each item in a class is a QP
offset (hence, M items in each class). We then consider a
2D window of size 3×3 (in MBs) around the k-th MB in G
(k = 1, 2, ..., K_n), and compute the total saliency error
e_{ikmn}^{tot} and total number of bits b_{ikmn}^{tot} within this window as
Fig. 1. An illustration of the nine binary masks (three in each
row) corresponding to P_n, n = 1, 2, ..., 9, for QCIF resolution.
Black squares indicate the positions of 1's in P_n, while
white squares indicate the positions of 0's. For QCIF resolution,
the number of black squares is K_n = 12 in six cases,
and K_n = 9 in three cases.
follows:

e_{ikmn}^{tot} = Σ_{(x,y)∈N(k)} E_i^{mn}(x, y),
b_{ikmn}^{tot} = Σ_{(x,y)∈N(k)} B_i^{mn}(x, y),   (3)
where N(k) denotes the neighborhood around the k-th MB in
G, and (x, y) denotes the MB position within F_i. Note that,
as mentioned earlier, when the QP of an MB is changed, not
only the saliency error and bits of that MB change, but the
saliency error and bits of its neighbors may also change.
For this reason, we compute the total saliency error and the
actual number of bits of all MBs within a window around the
k-th MB, and consider them as a generalized saliency error and
total bit count of the k-th MB.
The idea here is to cover the whole frame by non-overlapping
windows around all MBs in G. Since there are at
least two MBs between each pair of MBs in G, there should
not be any gap between any pair of 3×3 windows surrounding
the MBs in G. For some MBs in G, parts of the window
may fall outside the frame. In such cases, we consider only
the parts that are inside the frame. In other cases, the 3×3
window around the first or the last MB in a row or column
might not touch the frame boundary (e.g., the black MB in the
bottom-right corner of the first mask in Fig. 1). In such cases,
the window is expanded up to the frame boundary. Covering
the entire frame by such windows allows us to use B_i as the
capacity of the knapsack. Note that Σ_{k=1}^{K_n} b_{ikmn}^{tot}|_{m=1} = B_i,
because o_1 = 0 for m = 1.
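The per-window sums of (3) can be sketched as follows (Python/NumPy; a simplification that clips windows at the frame boundary and omits the boundary-expansion special case described above):

```python
import numpy as np

def window_sums(err: np.ndarray, bits: np.ndarray, locs):
    """For each MB location (y, x) in locs (the set G), sum the saliency
    error and bit count over the 3x3 MB window centered on it, clipped
    at the frame boundary."""
    H, W = err.shape
    e_tot, b_tot = [], []
    for (y, x) in locs:
        y0, y1 = max(0, y - 1), min(H, y + 2)
        x0, x1 = max(0, x - 1), min(W, x + 2)
        e_tot.append(err[y0:y1, x0:x1].sum())
        b_tot.append(bits[y0:y1, x0:x1].sum())
    return e_tot, b_tot
```

The returned lists give e_{ikmn}^{tot} and b_{ikmn}^{tot} for one (m, n) pair, ready to be used as MCKP profits and weights.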
Having computed e_{ikmn}^{tot} and b_{ikmn}^{tot}, the negative saliency error
is taken as the profit (p_km = −e_{ikmn}^{tot} in (2)), and b_{ikmn}^{tot}
is taken as the weight (w_km = b_{ikmn}^{tot} in (2)) of the m-th
offset in the k-th MB within the i-th frame when using P_n. B_i
is set as the capacity of the knapsack (c = B_i in (2)), and the
MCKP is solved. The MCKP solution chooses exactly one item
(in our case, one offset) per class (in our case, per MB) such
that the total profit is maximized (i.e., the saliency error is
minimized) while the total weight (the bit count) remains
at or below B_i.
The obtained QP offsets are then applied to the original
QP matrix Q_i, and the current frame is re-encoded with the
updated QP matrix Q_i^{*n}. Finally, the new saliency error matrix
of the encoded frame is computed, and its L1-norm L_1^n, as
well as the obtained QP matrix Q_i^{*n}, are stored. This procedure
is repeated for each matrix P_n, n = 1, 2, ..., 9.
Step 3) At the end of Step 2, we obtain nine saliency error
L1-norms L_1^1, L_1^2, ..., L_1^9 and the corresponding QP matrices.
The QP matrix whose saliency error L1-norm is the smallest
is chosen as the final QP matrix Q_i^* for frame F_i. Finally, F_i
is encoded using Q_i^*, and the encoder moves on to the next
frame.
Note that in the proposed algorithm, whenever the QP
of one MB is changed, the rate-distortion optimized (RDO)
mode decision [2] is employed to obtain the optimal predic-
tion mode and MB type. Therefore, the total number of bits
of each MB is computed after the RDO mode decision. Any
potential underflow (overflow) in the number of bits is added
to (subtracted from) the knapsack capacity of the subsequent
frame, thereby preserving the total rate assigned during ini-
tial ROI bit allocation. Algorithm 1 summarizes the proposed
method. In our current implementation, each video frame is
encoded K = (M − 1) × 9 + 1 times, the saliency map
of the frame is computed K times, and the MCKP is solved
nine times. Hence, in its current implementation, the proposed
method is only suitable for offline applications. However,
multiple frame encodings and saliency computations can
be avoided by using a suitable model for the relationship between
QP values and saliency values. Our current work is
focused on the development of such a model. As for solving
the MCKP, in our simulations the algorithm from [14] takes
an average of about 200 ms per frame (on an Intel Core 2 Duo
processor at 3.33 GHz with 8 GB RAM).
4. RESULTS AND DISCUSSION
To evaluate the proposed method, we used three standard
CIF sequences (Soccer, Crew, and Bus). All sequences were
100 frames long, at 30 frames per second (fps), and were
encoded using JM 9.8 [15]. Soccer and Bus were encoded
at 50 kbps, and Crew was encoded at 100 kbps. We used
these relatively low bit rates to make the coding artifacts more
visible for display purposes. The GOP structure was set to
IPPPP. The IKN model [12] was utilized to generate saliency
maps. In all experiments reported here, only three QP offsets,
O = {0, −1, 1}, were employed.

Input: raw frame F_i
Output: encoded frame F_i
  Encode F_i using the ROI encoder
  Compute E_i, B_i, and Q_i
  L_max = MAXINT
  foreach P_n ∈ P do
      E_i^{1n} = E_i and B_i^{1n} = B_i
      foreach o_m ∈ O \ {o_1} do
          Encode F_i using Q_i^{mn} = Q_i + o_m P_n
          Compute E_i^{mn} and B_i^{mn}, and store them
      end
      Run the MCKP using E_i^{mn} and B_i^{mn}, m = 1, 2, ..., M
      Encode F_i using Q_i^{*n} obtained by the MCKP
      Compute the new E_i
      if L1(E_i) ≤ L_max then
          Q_i^* = Q_i^{*n}
          L_max = L1(E_i)
      end
  end
  Encode F_i using Q_i^*
Algorithm 1: The proposed algorithm for saliency-preserving video coding
We compare the proposed saliency-preserving ROI (SP-ROI)
coding to conventional ROI coding using three metrics.
The first metric, ΔL1(E), is computed as

ΔL1(E) = (E_SP-ROI − E_ROI) / E_ROI,   (4)

where

E_SP-ROI = (1/N) Σ_{i=1}^{N} L1(E_i^{SP-ROI}),
E_ROI = (1/N) Σ_{i=1}^{N} L1(E_i^{ROI}),   (5)

N is the total number of frames, and E_i^{SP-ROI} and E_i^{ROI} are
the saliency error maps of the i-th frame encoded by the SP-ROI
and ROI coding methods, respectively. This metric indicates
how much the total saliency error of the video encoded
by the SP-ROI method differs from the total saliency error
of the video encoded by the ROI coding method.
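The metric of (4)–(5) reduces to a few lines of NumPy (our sketch; `delta_L1` is a name we introduce for illustration):

```python
import numpy as np

def delta_L1(E_sp_roi, E_roi):
    """Relative change in mean per-frame L1 saliency error, eqs. (4)-(5).
    Each argument is a list of per-frame saliency error maps; a negative
    result means SP-ROI reduced the saliency error relative to ROI."""
    e_sp = np.mean([np.abs(E).sum() for E in E_sp_roi])
    e_ref = np.mean([np.abs(E).sum() for E in E_roi])
    return (e_sp - e_ref) / e_ref
```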
To measure the propensity of coding artifacts outside the
ROI to draw the user's attention, we first binarize the saliency
map of the original raw frames using a specific threshold in
order to obtain an estimate of the location of the ROI. This
threshold is set to the 75th percentile of the saliency map. All
MBs whose saliency is larger than this threshold are considered
part of the ROI. We then define two new metrics,
ΔL1(E*) and ΔJ, where ΔL1(E*) is computed as ΔL1(E)
in (4), except that the saliency errors in (5) are taken only for
the MBs outside the ROI. Meanwhile, ΔJ is computed as
ΔJ = (J_SP-ROI − J_ROI) / J_ROI, where J_SP-ROI and J_ROI are
the fractions of pixels outside the ROI that have absolute quantization
error greater than the just-noticeable-difference (JND)
threshold [16] of their corresponding pixels in frames encoded
by the SP-ROI and ROI coding methods, respectively.
To compute the JND thresholds, we employed the spatial JND
model proposed in [17] in the luminance (Y) channel. Note
that the JND threshold determines the visibility threshold of
a quantization error: a quantization error is visible if its
magnitude is greater than the JND threshold.

Table 1. The performance of SP-ROI relative to ROI video coding.

  Sequence   ΔL1(E)    ΔL1(E*)   ΔJ       ROI-PSNR
  Soccer     −6.27%    −11.00%   −2.65%   −0.17 dB
  Crew       −1.01%    −2.11%    −5.23%   −0.18 dB
  Bus        −11.57%   −5.96%    −2.83%   −0.09 dB

Table 2. Average PSNR-Y of SP-ROI and ROI coding relative to RDO coding.

  Method    Soccer     Crew       Bus
  ROI       −0.08 dB   −0.22 dB   −0.10 dB
  SP-ROI    −0.09 dB   −0.10 dB   −0.08 dB
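The ROI estimate and the J quantity entering ΔJ can be sketched as follows (Python/NumPy; the function names are ours, and the computation of the per-pixel JND map itself, i.e., the model of [17], is not reproduced here):

```python
import numpy as np

def roi_mask_from_saliency(saliency: np.ndarray) -> np.ndarray:
    """Estimate the ROI by thresholding the raw frame's saliency map
    at its 75th percentile."""
    return saliency > np.percentile(saliency, 75)

def fraction_above_jnd(orig, coded, jnd, roi):
    """J: fraction of non-ROI pixels whose absolute quantization error
    exceeds the per-pixel JND threshold (luminance channel)."""
    err = np.abs(orig.astype(np.int32) - coded.astype(np.int32))
    outside = ~roi
    return np.count_nonzero((err > jnd) & outside) / np.count_nonzero(outside)
```

Computing this quantity for SP-ROI and ROI reconstructions of the same frame and taking the relative difference yields ΔJ.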
Table 1 compares the SP-ROI method with the ROI coding
method using the three aforementioned metrics, as well as the
average peak signal-to-noise ratio (PSNR) of the Y component
within the ROI. The values of ΔL1(E) indicate that the
saliency error of SP-ROI over the entire frame is lower than
that of ROI coding, which was the design goal. ΔL1(E*)
shows that outside the ROI, the saliency error with SP-ROI
coding is lower, indicating that it is less likely for non-ROI
regions to become salient after encoding. Finally, the values
of ΔJ show that the percentage of pixels outside the ROI
whose quantization error is above the JND threshold is lower
with SP-ROI coding than with conventional ROI coding.
Overall, the proposed SP-ROI method reduces both the
saliency and the visibility of the coding artifacts compared to
conventional ROI coding. This comes at the cost of a slightly
reduced PSNR within the ROI, as indicated in the last column
of the table.

In Table 2, the average PSNR performance of the SP-ROI
and ROI coding methods is compared against RDO coding.
These PSNR values were obtained by averaging the PSNR
over all frames of the corresponding sequence. As seen from
the table, the average PSNR of both SP-ROI and ROI coding
is lower than that of RDO coding, as expected. However, as
illustrated in the next example, both SP-ROI and ROI coding
provide better visual quality than RDO. All of the above
results were obtained after matching the bit rates of the
ROI-coded and RDO-coded videos to the actual bit rate of
the video encoded by the SP-ROI method within ±0.1%.
Tables 3 and 4 show, respectively, the average structural
similarity (SSIM) index [18] and the average Video Quality
Metric (VQM) value [19] computed over all frames of each
sequence. As seen from these results, the proposed method
provides higher visual quality, as measured by both of these
metrics, compared to the conventional RDO and ROI methods.

Table 3. Average SSIM index.

  Method    Soccer   Crew     Bus
  RDO       0.6200   0.7871   0.4864
  ROI       0.6231   0.7897   0.4846
  SP-ROI    0.6378   0.7926   0.5100

Table 4. Average VQM values.

  Method    Soccer    Crew      Bus
  RDO       3.45169   2.55774   7.60255
  ROI       3.47870   2.51918   7.60608
  SP-ROI    3.79243   2.57792   7.71076
Fig. 2 compares the visual quality of the three methods
on a sample frame of Soccer. As seen from these figures, the
proposed SP-ROI coding improves the visual quality of the
encoded frame by reducing the visibility of the coding
artifacts. In particular, note that the coding artifacts
around the ball and the player's feet have been reduced compared
to both the RDO- and ROI-coded frames. At the same time,
the visual quality of the conventional ROI-coded frame is
slightly better than that of the RDO-coded frame.
5. CONCLUSION
In this paper, we introduced the concept of saliency-
preserving video coding, and proposed a novel ROI coding
method that attempts to preserve the saliency of the original
video frames. Experimental results were presented using the
saliency model from [12], although the proposed method is
generic and can utilize any other visual attention model. The
results indicate that the proposed method is able to improve
the visual quality of encoded video compared to conventional
ROI and RDO video coding at low bit rates.
References
[1] M. Ghanbari, Video Coding: An Introduction to Standard
Codecs, London, U.K.: Institution of Electrical Engineers,
1999.
[2] I. E. G. Richardson, H.264 and MPEG-4 Video Compression:
Video Coding for Next-Generation Multimedia, NJ: Wiley,
2003.
[3] M. Yuen and H. R. Wu, “A survey of hybrid MC/DPCM/DCT
video coding distortions,” Signal Process., vol. 70, no. 3, pp.
247–278, 1998.
[4] H. Liu, N. Klomp, and I. Heynderickx, “A perceptually rele-
vant approach to ringing region detection,” IEEE Trans. Image
Process., vol. 19, no. 6, pp. 1304–1318, Jun. 2010.
[5] M. Shen and C. J. Kuo, “Review of postprocessing techniques
for compression artifact removal,” J. Vis. Commun. Image
Rep., vol. 9, no. 1, pp. 2–14, 1998.
[6] S. Daly, “The visible difference predictor: an algorithm for the
assessment of image fidelity,” in Digital Images and Human
Vision, A. B. Watson, Ed., MIT Press, 1993, pp. 179–206.
[7] L. Itti, “Automatic foveation for video compression using a
neurobiological model of visual attention,” IEEE Trans. Image
Process., vol. 13, no. 10, pp. 1304–1318, 2004.
[8] Z. Chen, K. N. Ngan, and W. Lin, “Perceptual video cod-
ing: Challenges and approaches,” in Proc. IEEE International
Conference on Multimedia and Expo (ICME’10), Jul. 2010, pp.
784–789.
[9] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems,
Springer, 2004.
[10] Y. Liu, Z. G. Li, and Y. C. Soh, “Region-of-interest based
resource allocation for conversational video communication of
H.264/AVC,” IEEE Trans. Circuits Syst. Video Technol., vol.
18, no. 1, pp. 134–139, Jan. 2008.
[11] Y. Liu, Z. G. Li, and Y. C. Soh, “A novel rate control scheme
for low delay video communication of H.264/AVC standard,”
IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp.
67–78, Jan. 2007.
[12] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based
visual attention for rapid scene analysis,” IEEE Trans. Pattern
Anal. Machine Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[13] J.-C. Chiang, C.-S. Hsieh, G. Chang, F.-D. Jou, and W.-N.
Lie, “Region-of-interest based rate control scheme with flexi-
ble quality on demand,” in Proc. IEEE International Confer-
ence on Multimedia and Expo (ICME’10), Jul. 2010, pp. 238–
242.
[14] D. Pisinger, “A minimal algorithm for the multiple-choice
knapsack problem,” European Journal of Operational Re-
search, vol. 83, pp. 394–410, 1994.
[15] “The H.264/AVC JM reference software,” [Online]. Available:
http://iphome.hhi.de/suehring/tml/.
[16] C.-H. Chou and Y.-C. Li, “A perceptually tuned subband im-
age coder based on the measure of just-noticeable-distortion
profile,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no.
6, pp. 467–476, Dec. 1995.
[17] X. Yang, W. Lin, Z. Lu, E. Ong, and S. Yao, “Motion-
compensated residue preprocessing in video coding based on
just-noticeable-distortion profile,” IEEE Trans. Circuits Syst.
Video Technol., vol. 15, no. 6, pp. 745–752, Jun. 2005.
[18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli,
“Image quality assessment: From error visibility to structural
similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp.
600–612, Apr. 2004.
[19] M. Pinson and S. Wolf, “A new standardized method for ob-
jectively measuring video quality,” IEEE Trans. Broadcasting,
vol. 50, no. 3, pp. 312–322, Sep. 2004.
Fig. 2. An example of the visual quality of different methods.
(a) original frame, (b) saliency map of the original frame, (c)
RDO-coded frame, (d) ROI-coded frame, (e) SP-ROI-coded
frame, (f) saliency error map of the RDO-coded frame, (g)
saliency error map of the ROI-coded frame, (h) saliency error
map of the SP-ROI-coded frame.