Photo and Video Quality Evaluation:
Focusing on the Subject
Yiwen Luo and Xiaoou Tang
Department of Information Engineering
The Chinese University of Hong Kong, Hong Kong
Abstract. Traditionally, distinguishing between high quality professional photos and low quality amateurish photos is a human task. Automatically assessing the quality of a photo in a way that is consistent with human perception is a challenging topic in computer vision. Many differences exist between photos taken by professionals and amateurs because of the photography techniques used. Previous methods mainly use features extracted from the entire image. In this paper, based on professional photography techniques, we first extract the subject region from a photo, and then formulate a number of high-level semantic features based on this subject and background division. We test our features on a large and diverse photo database and compare our method with the state of the art. Our method performs significantly better, with a classification rate of 93% versus 72% by the best existing method. In addition, we conduct the first study on high-level video quality assessment. Our system achieves a precision of over 95% at a reasonable recall rate for both photo and video assessments. We also show excellent application results in web image search re-ranking.
1 Introduction

The number of photos that can be accessed is growing explosively. Automatically assessing the quality of photos in a way that is consistent with human perception has become more and more important with the increasing needs of professionals and home users. For example, newspaper editors can use it to find high quality photos to express news effectively; home users can use such a tool to select good-looking photos to show from their e-photo albums; and web search engines may incorporate this function to display relevant and high quality images for the user. Fig. 1 shows two example photos. Most people agree that the left photo is of high quality and the right one is not. Telling the difference between high quality professional photos and low quality photos is natural for a human, but difficult for a computer.
There have been a number of works on image quality assessment concerning image degradation caused by noise, distortion, and compression artifacts. Different from these works, we consider photo quality from an aesthetic point of view and try to determine the factors that make a photo look good in human perception. The most
D. Forsyth, P. Torr, and A. Zisserman (Eds.): ECCV 2008, Part III, LNCS 5304, pp. 386–399, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Most people may agree that (a) is of higher quality than (b)
related work is published by Tong et al., Datta et al., and Ke et al. Tong et al. and Datta et al. combined features previously used mostly for image retrieval with a standard set of learning algorithms to classify professional photos and amateurish photos. For the same purpose, Ke et al. designed their features based on the spatial distribution of edges, blur, and the histograms of low-level color properties such as brightness and hue. Our experiments show that the method of Ke et al. produces better results than those of Tong et al. and Datta et al. with a much smaller number of features, but it is still not good enough, with a classification rate of 72% on a large dataset.
The main problem with existing methods is that they compute features from the whole image. This significantly limits the performance of the features, since a good photo usually treats the foreground subject and the background very differently. Professional photographers usually differentiate the subject of the photo from the background to highlight the topic of the photo. High quality photos generally satisfy three principles: a clear topic, gathering most attention on the subject, and removing objects that distract attention from the subject. Photographers try to achieve this by skillfully manipulating the photo composition, lighting, and focus of the subject. Motivated by these principles, in this paper, we first use a simple and effective blur detection method to roughly identify the in-focus subject area. Then, following human perception of photo quality, we develop several highly effective quantitative metrics on subject clarity, lighting, composition, and color. In addition, we conduct the first study on video quality evaluation. We achieve significant improvement over state-of-the-art methods, reducing the error rates severalfold. We also apply our work to online re-ranking of MSN Live image search results with good performance.
In summary, the main contributions of this paper include: 1) proposing a novel approach to evaluate photo and video quality by focusing on the foreground subject, and developing an efficient subject detection algorithm; 2) developing a set of highly effective high-level visual features for photo quality assessment; 3) conducting the first study of high-level video quality assessment and building the first database for such a study; 4) presenting the first study of visual quality re-ranking for real-world online image search.
2 Criteria for Assessing Photo Quality
In this section, we review several techniques used by professional photographers to improve photo quality. Notice that most of them rely on different treatment of the subject and the background.
Fig. 2. (a) "Fall on the Rocks" by M. Marjory, 2007. (b) "Mona Lisa Smiles" by David Scarbrough, 2007. (c) "Fall: One Leaf at a Time" by Jeff Day, 2007. (d) "Winter Gets Closer" by Cyn D. Valentine, 2007. (e) "The Place Where Romance Starts" by William Lee, 2007.
Composition means the organization of all the graphic elements inside a photo. Good composition clearly shows the audience the photo's topic and effectively expresses the photographer's feeling. The theory of composition is usually rooted in one simple concept: contrast. Professional photographers use contrast to awaken a vital feeling for the subject through personal observation. Contrast between light and dark, between shapes, colors, and even sensations, is the basis for composing a photo. The audience can readily find the obvious contrast between the cool and hard stones in the foreground and the warm and soft river and forest in the background in Fig. 2a.
A badly lit scene ruins a photo as much as poor composition does. The way a scene is lit changes its mood and the audience's perception of what the photo tries to express. Lighting in high quality photos keeps the subjects from appearing flat and enhances their 3D feeling, which helps attract the audience's attention to them. Good lighting results in strong contrast between the subject and the background, and visually distinguishes the subject from the background. The lighting in Fig. 2b isolates the girls from the background and visually enhances their 3D feeling.
Professional photographers control the focus of the lens to isolate the subject from the background. They blur the background but keep the subject in focus, as in Fig. 2c. They may also blur closer objects but sharpen farther objects to express the depth of the scene, as in Fig. 2d. Beyond merely capturing the scene, controlling the lens can create surrealistic effects, as in Figs. 2c and 2e.
Photo and Video Quality Evaluation: Focusing on the Subject389
Much of what viewers perceive and feel about a photo is conveyed through colors. Although color perception depends on context and is culture-related, recent color science studies show that the influence of a certain color or a certain color combination on human emotions or feelings is usually stable across varying cultural backgrounds. Professional photographers use various exposure and interpreting methods to control the color palette in a photo, and use specific color combinations to raise specific emotions in viewers, producing a pleasing affective response. The photographer of Fig. 2a uses the combination of bright yellow and dark gray to produce an aesthetic feeling from the beauty of nature. The photographer of Fig. 2b uses the combination of white and natural skin color to enhance the beauty of chasteness of the girls.
3 Features for Photo Quality Assessment
Based on the previous analysis, we formulate these semantic criteria mathematically in this section. We first separate the subject from the background, and then discuss how to extract the features for photo quality assessment.
3.1 Subject Region Extraction
Professional photographers usually make the subject of a photo clear and the back-
ground blurred. We propose an algorithm to detect the clear area of the photo and con-
sider it as the subject region and the rest as the background.
Levin et al. presented a scheme to identify blur in an image when the blur is caused by 1D motion. We modify it to detect 2D blurred regions in an image. Let us use Fig. 3 as an example to explain the method. Fig. 3a is a landscape photo. We use a kernel of size k × k with all coefficients equal to 1/k² to blur the photo. Figs. 3b, 3c, and 3d are the results blurred by 5×5, 10×10, and 20×20 kernels, respectively. The log histograms of the horizontal derivatives of the four images in Fig. 3 are shown in Fig. 3e, and the log histograms of the vertical derivatives of the four images are shown in Fig. 3f. It is obvious that the blurring significantly changes the shapes of the curves in the histograms. This suggests that the statistics of the derivative filter responses can be used to tell the difference between clear and blurred regions.
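This statistic is easy to reproduce. The following sketch (our own illustration, not the authors' code; it assumes NumPy and SciPy are available, and the function name derivative_log_hists is ours) blurs an image with a k × k averaging kernel and returns the log histograms of its derivatives:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def derivative_log_hists(image, k, bins=64):
    """Blur `image` with a k x k averaging kernel (all coefficients 1/k^2)
    and return log histograms of its horizontal and vertical derivatives,
    in the spirit of Fig. 3e and 3f."""
    blurred = uniform_filter(image.astype(float), size=k)
    dx = np.diff(blurred, axis=1).ravel()  # response to d_x = [1, -1] (up to sign)
    dy = np.diff(blurred, axis=0).ravel()  # response to d_y = [1, -1]^T
    hx, _ = np.histogram(dx, bins=bins, range=(-1, 1), density=True)
    hy, _ = np.histogram(dy, bins=bins, range=(-1, 1), density=True)
    eps = 1e-8  # avoid log(0) in empty bins
    return np.log(hx + eps), np.log(hy + eps)

# A synthetic image with a sharp vertical edge: as k grows, the mass of the
# derivative histogram concentrates around small values and the curve's
# shape changes, as with the blurred curves in Fig. 3.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
hists = {k: derivative_log_hists(img, k) for k in (1, 5, 10, 20)}
```

In the paper the comparison is run for all k = 1, ..., 50; a handful of kernel sizes already shows the effect.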
Let f_k denote the blurring kernel of size k × k. Convolving the image I with f_k, and computing the horizontal and vertical derivatives from I ∗ f_k, we have the distributions of the horizontal and vertical derivatives:

p_k^x ∝ hist(I ∗ f_k ∗ d_x),   p_k^y ∝ hist(I ∗ f_k ∗ d_y),   (1)

where d_x = [1, −1] and d_y = [1, −1]^T. The operations in Eq. (1) are done 50 times with k = 1, 2, ..., 50.

For a pixel (i, j) in I, we define a log-likelihood of derivatives in its neighboring window W(i, j) of size n × n with respect to each of the blurring models as:

l_k(i, j) = Σ_{(i′, j′) ∈ W(i, j)} ( log p_k^x(I_x(i′, j′)) + log p_k^y(I_y(i′, j′)) ),   (2)
where I_x(i′, j′) and I_y(i′, j′) are the horizontal and vertical derivatives at pixel (i′, j′), respectively, and l_k(i, j) measures how well the pixel (i, j)'s neighboring window is explained by a k × k blurring kernel. Then we can find the blurring kernel that best explains the window's statistics by k∗(i, j) = argmax_k l_k(i, j). When k∗(i, j) = 1, pixel (i, j) is in the clear area; otherwise it is in the blurred area. With k∗(i, j) for all the pixels of I, we can obtain a binary image U to denote the clear and blurred regions of I, defined as:

U(i, j) = 1 if k∗(i, j) = 1,   U(i, j) = 0 if k∗(i, j) > 1.

Two examples of such images are shown in Figs. 4a and 4b with a neighboring window size of 3 × 3. Next, we find a compact bounding box that encloses the main part of the subject in an image. Projecting U onto the x and y axes independently, we have

U_x(x) = Σ_y U(x, y),   U_y(y) = Σ_x U(x, y).

On the x axis, we find x1 and x2 such that the energy in [0, x1] and the energy in [x2, N − 1] are each equal to (1 − α)/2 of the total energy in U_x, where N is the size of the image in the x direction. Similarly, we can find y1 and y2 in the y direction. Thus, the subject region R is [x1 + 1, x2 − 1] × [y1 + 1, y2 − 1]. In all our experiments, we choose α = 0.9. Two examples of subject regions corresponding to Figs. 1a and 1b are given in Figs. 4c and 4d.
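Under the same assumptions as before (NumPy/SciPy; all names are ours, and we evaluate only a few kernel sizes rather than all k = 1, ..., 50), the whole subject-extraction step, the per-pixel likelihood of Eq. (2), the argmax over kernel sizes, the binary map U, and the α-energy bounding box, can be sketched as:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def subject_bounding_box(image, ks=(1, 3, 5, 9), alpha=0.9, win=3, bins=64):
    """Sketch of subject-region extraction: choose, per pixel, the blur
    kernel k whose derivative statistics best explain its n x n window,
    mark pixels with k* = 1 as the clear (subject) area, and find the box
    holding the central alpha fraction of that area's energy on each axis."""
    img = image.astype(float)
    logliks = []
    for k in ks:
        b = uniform_filter(img, size=k)      # I * f_k
        dx = np.zeros_like(b)
        dx[:, :-1] = np.diff(b, axis=1)      # horizontal derivative I_x
        dy = np.zeros_like(b)
        dy[:-1, :] = np.diff(b, axis=0)      # vertical derivative I_y
        # Derivative distributions p_k^x, p_k^y of Eq. (1)
        px, edges = np.histogram(dx, bins=bins, range=(-1, 1), density=True)
        py, _ = np.histogram(dy, bins=bins, range=(-1, 1), density=True)
        eps = 1e-8
        ix = np.clip(np.digitize(dx, edges) - 1, 0, bins - 1)
        iy = np.clip(np.digitize(dy, edges) - 1, 0, bins - 1)
        ll = np.log(px[ix] + eps) + np.log(py[iy] + eps)
        # Window average = windowed sum of Eq. (2) up to a constant factor,
        # which leaves the argmax over k unchanged.
        logliks.append(uniform_filter(ll, size=win))
    kstar = np.argmax(np.stack(logliks), axis=0)    # index into ks
    U = (np.asarray(ks)[kstar] == 1).astype(float)  # binary clear-region map

    def central_span(proj):
        # Indices enclosing the central alpha fraction of the projected energy
        c = np.cumsum(proj) / max(proj.sum(), 1e-8)
        lo = np.searchsorted(c, (1 - alpha) / 2)
        hi = np.searchsorted(c, 1 - (1 - alpha) / 2)
        return lo, hi

    x1, x2 = central_span(U.sum(axis=0))  # project U onto the x axis
    y1, y2 = central_span(U.sum(axis=1))  # project U onto the y axis
    # The paper's subject region R is [x1+1, x2-1] x [y1+1, y2-1].
    return (x1, x2, y1, y2), U
```

On an image whose left half is textured noise and whose right half is flat, the textured half is best explained by the k = 1 (no-blur) model and ends up inside the box, while the flat half is absorbed into the blurred background.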
Fig. 3. Images blurred by different blurring kernels. (a) Original image. (b) Result blurred by the 5×5 kernel. (c) Result blurred by the 10×10 kernel. (d) Result blurred by the 20×20 kernel. (e) Log histograms of the horizontal derivatives of the original image and the images blurred by the 5×5, 10×10, and 20×20 kernels, respectively. (f) Log histograms of the vertical derivatives of the original image and the images blurred by the 5×5, 10×10, and 20×20 kernels, respectively.