Photo and Video Quality Evaluation:
Focusing on the Subject
Yiwen Luo and Xiaoou Tang
Department of Information Engineering
The Chinese University of Hong Kong, Hong Kong
Abstract. Traditionally, distinguishing between high quality professional photos and low quality amateurish photos is a human task. Automatically assessing the quality of a photo in a way that is consistent with human perception is a challenging topic in computer vision. Various differences exist between photos taken by professionals and amateurs because of the photography techniques used. Previous methods mainly use features extracted from the entire image. In this paper, based on professional photography techniques, we first extract the subject region from a photo, and then formulate a number of high-level semantic features based on this subject and background division. We test our features on a large and diverse photo database, and compare our method with the state of the art. Our method performs significantly better, with a classification rate of 93% versus 72% by the best existing method. In addition, we conduct the first study on high-level video quality assessment. Our system achieves a precision of over 95% at a reasonable recall rate for both photo and video assessments. We also show excellent application results in web image search re-ranking.
The number of photos that can be accessed is growing explosively. Automatically assessing the quality of photos in a way that is consistent with human perception has become more and more important with the increasing needs of professionals and home users. For example, newspaper editors can use it to find high quality photos to express news effectively; home users can use such a tool to select good-looking photos to show from their e-photo albums; and web search engines may incorporate this function to display relevant and high quality images for the user. Fig. 1 shows two example photos. Most people agree that the left photo is of high quality and the right one is not. Telling the difference between high quality professional photos and low quality photos is natural for a human, but difficult for a computer.
There have been a number of works on image quality assessment concerning image degradation caused by noise, distortion, and compression artifacts. Different from these works, we consider photo quality from an aesthetic point of view and try to determine the factors that make a photo look good in human perception.
Fig. 1. Most people may agree that (a) is of higher quality than (b).
The most closely related works are those of Tong et al. and Datta et al., who combined features previously used mostly for image retrieval with a standard set of learning algorithms to classify professional and amateurish photos. For the same purpose, Ke et al. designed their features based on the spatial distribution of edges, blur, and the histograms of low-level color properties such as brightness and hue. Our experiments show that the method of Ke et al. produces better results than those of Tong et al. and Datta et al. with a much smaller number of features, but it is still not good enough, with a classification rate of 72% on a large dataset.
The main problem with existing methods is that they compute features from the whole image. This significantly limits the performance of the features, since a good photo usually treats the foreground subject and the background very differently. Professional photographers usually differentiate the subject of the photo from the background to highlight the topic of the photo. High quality photos generally satisfy three principles: a clear topic, gathering most attention on the subject, and removing objects that distract attention from the subject. Photographers try to achieve this by skillfully manipulating the photo composition, lighting, and focus of the subject. Motivated by these principles, in this paper, we first use a simple and effective blur detection method to roughly identify the in-focus subject area. Then, following human perception of photo quality, we develop several highly effective quantitative metrics on subject clarity, lighting, composition, and color. In addition, we conduct the first study on video quality evaluation. We achieve significant improvement over state-of-the-art methods, reducing the error rates by several fold. We also apply our work to on-line image re-ranking for MSN Live image search results with good performance.
In summary, the main contributions of this paper include: 1) proposing a novel approach to evaluating photo and video quality by focusing on the foreground subject, and developing an efficient subject detection algorithm; 2) developing a set of highly effective high-level visual features for photo quality assessment; 3) conducting the first study of high-level video quality assessment and building the first database for such a study; 4) presenting the first study of visual quality re-ranking for real-world online image search.
2 Criteria for Assessing Photo Quality

This section introduces several important principles used by professional photographers to improve photo quality. Notice that most of them rely on different treatment of the subject and the background.
Fig. 2. (a) “Fall on the Rocks” by M. Marjory, 2007. (b) “Mona Lisa Smiles” by David Scarbrough, 2007. (c) “Fall: One Leaf at a Time” by Jeff Day, 2007. (d) “Winter Gets Closer” by Cyn D. Valentine, 2007. (e) “The Place Where Romance Starts” by William Lee, 2007.
Composition means the organization of all the graphic elements inside a photo. Good composition can clearly show the audience the photo’s topic and effectively express the photographer’s feeling. The theory of composition is usually rooted in one simple concept: contrast. Professional photographers use contrast to awaken a vital feeling for the subject through a personal observation. Contrast between light and dark, between shapes, colors, and even sensations, is the basis for composing a photo. The audience can find the obvious contrast between the cool, hard stones in the foreground and the warm, soft river and forest in the background in Fig. 2a.
A badly lit scene ruins the photo as much as poor composition. The way a scene is
lit changes its mood and the audience’s perception of what the photo tries to express.
Lighting in high quality photos makes the subjects not appear flat and enhances their
3D feeling, which is helpful to attract the audience’s attention to the subjects. Good
lighting results in strong contrast between the subject and the background, and visually distinguishes the subject from the background. The lighting in Fig. 2b isolates the girls from the background and visually enhances their 3D feeling.
Professional photographers control the focus of the lens to isolate the subject from the background. They blur the background but keep the subject in focus, as in Fig. 2c. They may also blur closer objects but keep farther objects sharp to express the depth of the scene, as in Fig. 2d. Beyond merely capturing the scene, controlling the lens can also create surrealistic effects, as in Figs. 2c and 2e.
Much of what viewers perceive and feel about a photo is through colors. Although color perception depends on the context and is culture-related, recent color science studies show that the influence of a certain color or color combination on human emotions or feelings is usually stable across varying cultural backgrounds. Professional photographers use various exposure and interpretation methods to control the color palette in a photo, and use specific color combinations to evoke specific emotions in viewers, producing a pleasing affective response. The photographer of Fig. 2a uses the combination of bright yellow and dark gray to produce an aesthetic feeling from the beauty of nature. The photographer of Fig. 2b uses the combination of white and natural skin color to enhance the chaste beauty of the girls.
3 Features for Photo Quality Assessment
Based on the previous analysis, we formulate these semantic criteria mathematically in
this section. We first separate the subject from the background, and then discuss how to
extract the features for photo quality assessment.
3.1 Subject Region Extraction
Professional photographers usually make the subject of a photo clear and the back-
ground blurred. We propose an algorithm to detect the clear area of the photo and con-
sider it as the subject region and the rest as the background.
Levin et al. presented a scheme to identify blur in an image when the blur is caused by 1D motion. We modify it to detect 2D blurred regions in an image. Let us use Fig. 3 as an example to explain the method. Fig. 3a is a landscape photo. We use a kernel of size k × k with all coefficients equal to 1/k^2 to blur the photo. Figs. 3b, 3c, and 3d are the results blurred by 5 × 5, 10 × 10, and 20 × 20 kernels, respectively. The log histograms of the horizontal derivatives of the four images in Fig. 3 are shown in Fig. 3e, and the log histograms of the vertical derivatives are shown in Fig. 3f. It is obvious that blurring significantly changes the shapes of the curves in the histograms. This suggests that the statistics of the derivative filter responses can be used to tell the difference between clear and blurred regions.
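The effect shown in Fig. 3 is easy to reproduce. The short sketch below is our own illustration under assumed settings, not the authors' code; the file name and the derivative bin range are placeholders. It blurs a grayscale image with uniform k × k kernels and plots the log histograms of its horizontal derivatives.

import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import uniform_filter

img = plt.imread("photo.png")               # placeholder file name
if img.ndim == 3:
    img = img[..., :3].mean(axis=2)         # RGB(A) -> grayscale in [0, 1]

bins = np.linspace(-0.5, 0.5, 101)          # assumed derivative range
centers = 0.5 * (bins[:-1] + bins[1:])
for k in (1, 5, 10, 20):                    # k = 1 leaves the image unblurred
    blurred = uniform_filter(img, size=k)   # k x k kernel, coefficients 1/k^2
    hist, _ = np.histogram(np.diff(blurred, axis=1), bins=bins, density=True)
    plt.plot(centers, np.log(hist + 1e-12), label=f"{k} x {k}")
plt.xlabel("horizontal derivative"); plt.ylabel("log frequency"); plt.legend()
plt.show()

As in Fig. 3e, the curves for larger kernels become visibly narrower and steeper, which is the cue exploited below.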
Let f_k denote the blurring kernel of size k × k. Convolving the image I with f_k and computing the horizontal and vertical derivatives from I ∗ f_k, we have the distributions of the horizontal and vertical derivatives:

p_k^x ∝ hist(I ∗ f_k ∗ d_x),    p_k^y ∝ hist(I ∗ f_k ∗ d_y),    (1)

where d_x = [1, −1] and d_y = [1, −1]^T. The operations in Eq. (1) are done 50 times with k = 1, 2, ..., 50.

For a pixel (i, j) in I, we define a log-likelihood of the derivatives in its neighboring window W(i, j) of size n × n with respect to each of the blurring models as:

l_k(i, j) = Σ_{(i', j') ∈ W(i, j)} ( log p_k^x(I_x(i', j')) + log p_k^y(I_y(i', j')) ),
where I_x(i', j') and I_y(i', j') are the horizontal and vertical derivatives at pixel (i', j'), respectively, and l_k(i, j) measures how well the pixel (i, j)'s neighboring window is explained by a k × k blurring kernel. We can then find the blurring kernel that best explains the window's statistics by k*(i, j) = argmax_k l_k(i, j). When k*(i, j) = 1, pixel (i, j) is in the clear area; otherwise it is in the blurred area. With k*(i, j) for all the pixels of I, we can obtain a binary image U denoting the clear and blurred regions of I, defined as:

U(i, j) = 1 if k*(i, j) = 1,  and  U(i, j) = 0 if k*(i, j) > 1.

Two examples of such images are shown in Figs. 4a and 4b, with a neighboring window size of 3 × 3.
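A rough sketch of this procedure is given below; it is our own illustration rather than the authors' implementation, the helper names are hypothetical, and the derivative bin range is an assumption. It builds the blur-model distributions p_k^x and p_k^y of Eq. (1), evaluates the windowed log-likelihoods l_k(i, j), and derives the clear/blurred map U.

import numpy as np
from scipy.ndimage import uniform_filter

BINS = np.linspace(-0.5, 0.5, 65)   # shared derivative bins (range is an assumption)

def blur_model_distributions(img, k_max=50):
    """p_k^x, p_k^y for k = 1..k_max (Eq. 1): normalized histograms of the
    derivatives of the image convolved with a k x k uniform kernel."""
    px, py = [], []
    for k in range(1, k_max + 1):
        b = uniform_filter(img, size=k)                 # I * f_k
        hx, _ = np.histogram(np.diff(b, axis=1), bins=BINS, density=True)
        hy, _ = np.histogram(np.diff(b, axis=0), bins=BINS, density=True)
        px.append(hx + 1e-12)                           # floor so the log stays finite
        py.append(hy + 1e-12)
    return np.array(px), np.array(py)

def clear_region_map(img, px, py, n=3):
    """k*(i,j) = argmax_k l_k(i,j) over an n x n window, and U = [k* == 1]."""
    Ix = np.diff(img, axis=1, append=img[:, -1:])       # horizontal derivative I_x
    Iy = np.diff(img, axis=0, append=img[-1:, :])       # vertical derivative I_y
    bx = np.clip(np.digitize(Ix, BINS) - 1, 0, len(BINS) - 2)
    by = np.clip(np.digitize(Iy, BINS) - 1, 0, len(BINS) - 2)
    lk = np.empty((len(px),) + img.shape)
    for k in range(len(px)):
        point = np.log(px[k][bx]) + np.log(py[k][by])   # per-pixel log-probability
        lk[k] = uniform_filter(point, size=n) * n * n   # sum over the n x n window
    k_star = np.argmax(lk, axis=0) + 1                  # kernels are indexed from k = 1
    return (k_star == 1).astype(float)                  # 1 = clear, 0 = blurred

The n = 3 window matches the setting used for Figs. 4a and 4b in the text.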
Next, we find a compact bounding box that encloses the main part of the subject in an image. Projecting U onto the x and y axes independently, we obtain the projections U_x and U_y. On the x axis, we find x_1 and x_2 such that the energy in [0, x_1] and the energy in [x_2, N − 1] are each equal to (1 − α)/2 of the total energy in U_x, where N is the size of the image in the x direction. Similarly, we can find y_1 and y_2 in the y direction. Thus, the subject region R is [x_1 + 1, x_2 − 1] × [y_1 + 1, y_2 − 1]. In all our experiments, we choose α = 0.9. Two examples of subject regions corresponding to Figs. 1a and 1b are given in Figs. 4c and 4d.
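The bounding box can then be read off U by the energy-trimming rule above. The sketch below continues the hypothetical helpers from the previous listing and assumes image-array conventions (rows = y, columns = x).

import numpy as np

def subject_bounding_box(U, alpha=0.9):
    """Trim (1 - alpha)/2 of the projected energy of U from each side of each
    axis, returning the subject region [x1+1, x2-1] x [y1+1, y2-1]."""
    def bounds(energy):
        total, cum = energy.sum(), np.cumsum(energy)
        x1 = int(np.searchsorted(cum, (1 - alpha) / 2 * total))       # energy in [0, x1]
        x2 = int(np.searchsorted(cum, (1 + alpha) / 2 * total)) + 1   # energy in [x2, N-1]
        return x1 + 1, x2 - 1

    Ux = U.sum(axis=0)            # projection of U onto the x axis (column sums)
    Uy = U.sum(axis=1)            # projection of U onto the y axis (row sums)
    x1, x2 = bounds(Ux)
    y1, y2 = bounds(Uy)
    return x1, x2, y1, y2

# Usage (hypothetical helper names from the previous sketch):
# px, py = blur_model_distributions(img)
# U = clear_region_map(img, px, py, n=3)
# x1, x2, y1, y2 = subject_bounding_box(U, alpha=0.9)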
Fig. 3. Images blurred by different blurring kernels. (a) Original image. (b) Result blurred by the 5 × 5 kernel. (c) Result blurred by the 10 × 10 kernel. (d) Result blurred by the 20 × 20 kernel. (e) Log histograms of the horizontal derivatives of the original image and the images blurred by the 5 × 5, 10 × 10, and 20 × 20 kernels, respectively. (f) Log histograms of the vertical derivatives of the original image and the images blurred by the 5 × 5, 10 × 10, and 20 × 20 kernels, respectively.