Measuring colourfulness in natural images

David Hasler^a and Sabine Süsstrunk^b

^a LOGO GmbH, Steinfurt, Germany
^b Audiovisual Communication Lab. (LCAV), Swiss Fed. Inst. of Tech. (EPFL), Lausanne, Switzerland

ABSTRACT

We want to integrate colourfulness in an image quality evaluation framework. This quality framework is meant to evaluate the perceptual impact of a compression algorithm or an error-prone communication channel on the quality of an image. The image might go through various enhancement or compression algorithms, resulting in a different, but not necessarily worse, image. In other words, we will measure quality but not fidelity to the original picture.

While modern colour appearance models are able to predict the perception of colourfulness of simple patches on uniform backgrounds, there is no agreement on how to measure the overall colourfulness of a picture of a natural scene. We try to quantify the colourfulness in natural images to perceptually qualify the effect that processing or coding has on colour. We set up a psychophysical category scaling experiment, and ask people to rate images using 7 categories of colourfulness. We then fit a metric to the results, and obtain a correlation of over 90% with the experimental data. The metric is meant to be used in real time on video streams. We ignored any issues related to hue in this paper.

Keywords: Image quality metric, colourfulness metric

1. INTRODUCTION

Modern pictorial imaging systems aim at producing the best-looking picture rather than at achieving luminance and colour fidelity. When evaluating the quality of a processed image, one needs to consider that a resulting image that differs from the original is not necessarily of worse quality. When designing a colour quality metric, we believe that two main factors have to be considered: colour cast and colourfulness. In this paper, we will only consider the overall colourfulness of an image, without measuring fidelity.

We want to quantify ‘how bad’ the colour in an image is after compression. Our work is part of a larger framework for measuring the perceptual quality of a video stream after transmission over a network, using a no-reference quality metric approach. The method should be able to work on a single image, or a single video stream, without having the original image. In other words, we cannot determine the quality of a compression and coding scheme by doing an image-based comparison between a compressed image and its original, because the original image is simply not available. Ideally, the method should be able to say if an image is good, but more practically, the scheme might use some metadata that comes along with the data, for example a set of parameters defining the properties of the original image. Additionally, not using the original image for assessing quality enables the method to deal with images that have gone through various tone mapping or image enhancement algorithms.

Colour can get degraded in two ways: by colour casts or by a loss of colourfulness. Modern colour appearance models [1, 2] are able to compute colourfulness correlates of colour patches depending on the viewing conditions and surround. Nevertheless, there is no agreement on how to measure the overall colourfulness of a natural scene, although very recent techniques try to address image colour quality in a more general framework [3]. To try to answer the question of image colourfulness, we set up a psychophysical experiment, where the subjects are asked to rate the colourfulness by choosing among 7 categories. Finally, we try to find an algorithm that best fits the results of the psychophysical experiment.

email: david.hasler@bluewin.ch, sabine.susstrunk@epfl.ch

This paper starts by describing the psychophysical experiment (section 2), and the method used to analyse the data (section 3). The following section describes every parameter that is considered for building a metric (section 4), along with the description of the method used to compute an optimal parameter set (section 5). The results are shown next (section 6), followed by a section that might interest anyone concerned with efficient implementations (section 7), where a metric that uses a much simpler colour space is proposed.

2. THE EXPERIMENT

We use 20 non-expert viewers and ask them to give a global colourfulness rating for a set of 84 images. The experimental conditions are described in [4]. The user has to choose among the following categories:

1. not colourful

2. slightly colourful

3. moderately colourful

4. averagely colourful

5. quite colourful

6. highly colourful

7. extremely colourful

Prior to the experiment, 4 examples are shown, rated as ‘not colourful’, ‘slightly colourful’, ‘averagely colourful’ and ‘extremely colourful’, to set the scale of the experiment. None of the examples show the same scene content as the test images. We chose the 2 images in the middle of the scale after conducting a preliminary experiment, using 5 expert viewers, and selecting the images rated with the least confusion among the viewers. The 2 images at the extremities of the scale were chosen by the first author. We used 10 scenes, which we processed by linearly reducing the chroma in CIELab space to generate the 84 test images. The images are shown on an LCD monitor. They are presented in random order, one image at a time on a grey background. A grey screen lasting 300 ms is displayed between successive images. A subset of the images is shown in figure 1.
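The chroma reduction used to generate the test images can be sketched as follows. This is only a minimal illustration, assuming that the pixels are already available as CIELab triples and that ‘linearly reducing the chroma’ means scaling a and b by a common factor. The function name `reduce_chroma`, the factor values and the sample pixels are hypothetical, and the conversion between RGB and CIELab is left out.

```python
import math

def reduce_chroma(lab_pixels, factor):
    """Scale the chroma of CIELab pixels by `factor` (0 = greyscale, 1 = unchanged).

    Lightness L and hue angle are left untouched; only the distance to the
    neutral axis in the ab plane is scaled. `factor` is an assumed input.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie in [0, 1]")
    return [(L, a * factor, b * factor) for (L, a, b) in lab_pixels]

# Hypothetical pixels, given as (L, a, b).
pixels = [(50.0, 30.0, -40.0), (70.0, -10.0, 20.0)]
half = reduce_chroma(pixels, 0.5)

# The chroma sqrt(a^2 + b^2) of the first pixel drops from 50 to 25.
chroma_before = math.hypot(pixels[0][1], pixels[0][2])
chroma_after = math.hypot(half[0][1], half[0][2])
```

A factor of 0 gives a greyscale image, which the experiment treats as having no colourfulness.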

We choose to use a category scaling experiment, instead of a paired comparison experiment, to ensure that the viewer adapts to the image white point, and to avoid the influence one image may have on the perception of the other one. Since we consider that a greyscale image has no colourfulness, we can compute a ratio scale using Thurstone’s law of comparative judgement, as described in Engeldrum [5].

3. COMPUTING A SCALE VALUE FROM THE EXPERIMENTAL DATA

We briefly summarise the method found in section 10.2.2 of Engeldrum [5]. The reader not interested in implementation issues might as well skip this section. Using a scale value allows for the possibility that the perceptual distance between ‘slightly colourful’ and ‘moderately colourful’ differs from the distance between ‘highly colourful’ and ‘extremely colourful’. As we have to attach numbers to these attributes, it is worth trying to get a perceptually uniform scale. For example, suppose there is a lot of confusion in the judgement between ‘slightly colourful’ and ‘moderately colourful’, i.e. many images were rated in both categories by different people, while there is almost no confusion in the judgement between ‘highly colourful’ and ‘extremely colourful’. This would mean that the distance between ‘highly colourful’ and ‘extremely colourful’ is larger than the distance between ‘slightly colourful’ and ‘moderately colourful’.

We will assume that the correlation between the categories as well as the discriminal dispersion of the categories and the samples are constant (by ‘samples’ we mean the answers of the individual test persons). We start by building a frequency matrix where the elements $\{K_{jg}\}$ are the number of times the image $j$ has been put in category $g$. We define the cumulative proportion matrix with entries $P_{jg}$ as

$$P_{jg} = \frac{\sum_{k=1}^{g} K_{jk}}{\sum_{k=1}^{m} K_{jk}},$$

where $m$ is the number of categories ($m = 7$). From probability $P_{jg}$ we derive the z-scores $z_{jg}$. $P_{jg}$ and $z_{jg}$ are related through

$$P_{jg} = \frac{1}{\sqrt{2\pi}} \int_{-z_{jg}}^{\infty} e^{-\frac{1}{2}\omega^{2}} \, d\omega.$$
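As an illustration of these two steps, the sketch below builds the cumulative proportions from a frequency matrix and inverts the relation above, which is the standard normal cumulative distribution evaluated at $z_{jg}$. It is a minimal example, not the authors' code: the rating counts are made up, the function names are hypothetical, and returning `None` for proportions of exactly 0 or 1 (where no finite z-score exists) is an assumption about how such entries are handled.

```python
from statistics import NormalDist

def cumulative_proportions(K):
    """K[j][g]: number of times image j was put in category g (g = 0..m-1)."""
    P = []
    for row in K:
        total = sum(row)
        running, cumulative = 0, []
        for count in row:
            running += count
            cumulative.append(running / total)
        P.append(cumulative)
    return P

def z_scores(P):
    """Invert P_jg = Phi(z_jg) for the m-1 category boundaries.

    The last column is dropped because it is always 1. Entries equal to
    0 or 1 have no finite z-score and are returned as None.
    """
    normal = NormalDist()
    return [[normal.inv_cdf(p) if 0.0 < p < 1.0 else None for p in row[:-1]]
            for row in P]

# Hypothetical ratings: 2 images, 20 viewers, m = 7 categories.
K = [[0, 2, 8, 6, 4, 0, 0],
     [0, 0, 1, 4, 10, 5, 0]]
P = cumulative_proportions(K)
Z = z_scores(P)
```

Only the entries of `Z` that are not `None` carry scale information; they feed the computation described next.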

Let $t_g$ be the (unknown) boundary value between the categories, and $s_j$ be the (unknown) scale value for each category. The fundamental assumption underlying the scale computation is that

$$t_g - s_j = z_{jg}. \qquad (1)$$

This can be put in matrix form as

$$z = X \cdot y \qquad (2)$$

$$y := [t_1 \, \ldots \, t_{m-1} \;\; s_1 \, \ldots \, s_m]^T \qquad (3)$$

where $z$ is a column vector containing all the z-scores $z_{jg}$, $X$ is a matrix used to make (2) equivalent to (1)$^{\ast}$ and

$y$ is the unknown. If we know $y$, we know the scale values and the boundaries between the scales. The scale values $s$ define the distances between the categories, and thus have an arbitrary absolute value. Consequently, in order to have a unique solution for (2), we impose an additional constraint, namely that

$$\sum_j s_j = 0,$$

which is implemented by adding a row to matrix $X$ and appending a 0 to vector $z$. The whole computation of scale values is based on the fact that there is confusion among the observers. Images that get unanimous ratings do not provide any scale information, and thus have to be removed from the computation. Finally, the scale values are obtained by solving (2) in the least-squares sense, thus

$$y = (X^T X)^{-1} \cdot X^T z.$$
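The least-squares solution can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the authors' implementation: it indexes the scale values $s_j$ by the same index $j$ as the z-scores in (1), skips undefined z-scores (given as `None`), appends the constraint row, and solves the normal equations with a small hand-written Gaussian elimination. The names `solve_scale` and `gauss_solve` and the test values are hypothetical.

```python
def gauss_solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y

def solve_scale(z):
    """z[j][g] is the z-score for index j at boundary g, or None if undefined.

    Returns (t, s) such that t[g] - s[j] fits z[j][g] in the least-squares
    sense, with sum(s) = 0.
    """
    n_s, n_t = len(z), len(z[0])
    n_unknown = n_t + n_s
    rows, rhs = [], []
    for j in range(n_s):
        for g in range(n_t):
            if z[j][g] is None:
                continue
            row = [0.0] * n_unknown       # entries of X are 1, 0 or -1
            row[g] = 1.0
            row[n_t + j] = -1.0
            rows.append(row)
            rhs.append(z[j][g])
    rows.append([0.0] * n_t + [1.0] * n_s)  # constraint row: sum of s_j = 0
    rhs.append(0.0)
    # Normal equations (X^T X) y = X^T z.
    A = [[sum(r[p] * r[q] for r in rows) for q in range(n_unknown)]
         for p in range(n_unknown)]
    b = [sum(r[p] * v for r, v in zip(rows, rhs)) for p in range(n_unknown)]
    y = gauss_solve(A, b)
    return y[:n_t], y[n_t:]

# Hypothetical, noise-free z-scores built from known boundaries and scale values.
t_true, s_true = [-1.0, 0.0, 1.5], [-0.5, 0.2, 0.3]
z = [[t - s for t in t_true] for s in s_true]
z[0][2] = None  # one undefined entry, skipped by the solver
t_est, s_est = solve_scale(z)
```

With noise-free input the solver recovers `t_true` and `s_true`; with real ratings the result is only the best fit in the least-squares sense.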

4. THE METRICS

To compute a colourfulness metric, we study the distribution of the image pixels in the CIELab colour space [6]. We assume that the image colourfulness can be represented by a linear combination of a subset of the following quantities:

1. $\sigma_a$: The standard deviation along the a axis.

2. $\sigma_b$: The standard deviation along the b axis.

3. $\sigma_{ab} = \sqrt{\sigma_a^2 + \sigma_b^2}$: The trigonometric length of the standard deviation in ab space.

4. $\mu_{ab}$: The distance of the centre of gravity in ab space to the neutral axis.

5. $A_{ab} = \sigma_a \cdot \sigma_b$: A pseudo-area in ab space.

6. $\sigma_C$: The standard deviation of Chroma.

7. $\mu_C$: The mean of Chroma.

8. $\sigma_1$: The largest standard deviation in ab space (found by searching the direction in the ab plane along which the standard deviation is maximum).

9. $\sigma_2$: The second largest (i.e. the smallest) standard deviation in ab space.

10. $A_{12} = \sigma_1 \cdot \sigma_2$: The area in ab space.

11. $\sigma_S$: The standard deviation of Saturation, calculated as Chroma over Lightness.

12. $\mu_S$: The mean of Saturation.

$^{\ast}$ $X$ is composed of 1, 0 or −1 only.

By choosing a subset of these quantities, for example $\{\sigma_a, \sigma_b, \mu_{ab}\}$, we can express the colourfulness of the image using a linear combination of them: $Q = \alpha_1 \cdot \sigma_a + \alpha_2 \cdot \sigma_b + \alpha_3 \cdot \mu_{ab}$. The parameters $\{\alpha_1, \alpha_2, \alpha_3\}$ are found by maximising the correlation between the experimental data and the metric, according to Section 5.

5. COMPUTING THE METRIC PARAMETERS

We want to obtain the parameter vector $\alpha$ ($\alpha := [\alpha_1 \cdots \alpha_m]^T$) that yields the highest correlation with the experimental data. To get a meaningful analysis, one that can be generalised to other images, it is important not to use the same data in computing the correlation and in optimising the parameter set. One possibility is to use half of our $N$ images to compute the correlation, and the other half to optimise the parameter set. Since the number of images is quite small, we will instead compute the optimal parameter set using $N - 1$ images, and use it to compute the colourfulness of the remaining image. We will repeat this experiment $N$ times, to obtain $N$ colourfulness values that are used to compute the correlation of the metric with the experimental data.

Let $\hat{M}_i$ be the colourfulness computed from image $i$. By assuming that we are using a subset of $m$ parameters $x^{(i)} := [x^{(i)}_1 \cdots x^{(i)}_m]^T$ of image $i$ among the parameters of Section 4, the colourfulness can be expressed as

$$\hat{M}_i = \alpha^T x^{(i)}.$$

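The parameters $\{\alpha_j\}$ are found by maximising the correlation between the metric computed on the other $N - 1$ images of the test set and the experimental values $M^{\mathrm{exp}}$ found through the subjective testing:

$$\{\alpha_j\}_i = \arg\max_{\alpha_2 \cdots \alpha_m} \frac{\sum_{k \neq i} (\tilde{C}_k - \mu_{\tilde{C}}) \cdot (M^{\mathrm{exp}}_k - \mu_{M^{\mathrm{exp}}})}{\sqrt{\sum_{k \neq i} (\tilde{C}_k - \mu_{\tilde{C}})^2 \cdot \sum_{k \neq i} (M^{\mathrm{exp}}_k - \mu_{M^{\mathrm{exp}}})^2}} \qquad (4)$$

$$\alpha_{1,i} := 1, \qquad \tilde{C}_k := \sum_{j=1}^{m} \alpha_j \cdot x^{(k)}_j, \qquad (5)$$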

where $\mu_{(\cdot)}$ denotes the mean value of $(\cdot)$. Since the parameter vector $\alpha$ is defined up to a constant factor, we arbitrarily set $\alpha_1 := 1$.

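The correlation $\rho$ between the experimental data and the metric is found using

$$\rho = \frac{\sum_{k=1}^{N} (\hat{M}_k - \mu_{\hat{M}}) \cdot (M^{\mathrm{exp}}_k - \mu_{M^{\mathrm{exp}}})}{\sqrt{\sum_{k=1}^{N} (\hat{M}_k - \mu_{\hat{M}})^2 \cdot \sum_{k=1}^{N} (M^{\mathrm{exp}}_k - \mu_{M^{\mathrm{exp}}})^2}} \qquad (6)$$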

Finally, the optimal parameter vector $\alpha$ is found by taking the mean value of the $N$ parameter sets defined in (4):

$$\alpha = \frac{1}{N} \sum_{k=1}^{N} [\alpha_1 \cdots \alpha_m]^T_k.$$

Instead of this value, we also could have taken the parameter set that maximises the correlation between the experimental data and the metric using all images. Note that the variance of the parameters $\alpha_i$ gives an indication of how stable the optimal parameter set is with respect to the choice of the images.
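The leave-one-out procedure of this section can be sketched as follows for a two-parameter metric such as $\sigma_{ab} + \alpha_2 \cdot \mu_{ab}$. This is a simplified illustration: the text does not say which optimiser was used, so a plain grid search over $\alpha_2$ stands in for it, and the feature values and experimental scale values are made up. All function names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equally long sequences, as in (6)."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu_u) ** 2 for a in u) *
                    sum((b - mu_v) ** 2 for b in v))
    return num / den

def fit_alpha2(features, m_exp, grid):
    """Pick alpha_2 (alpha_1 is fixed to 1) that maximises the correlation."""
    def score(alpha2):
        metric = [x1 + alpha2 * x2 for x1, x2 in features]
        return pearson(metric, m_exp)
    return max(grid, key=score)

def leave_one_out(features, m_exp, grid):
    """Return held-out predictions, their correlation, and the mean alpha_2."""
    predictions, alphas = [], []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]   # the other N - 1 images
        train_y = m_exp[:i] + m_exp[i + 1:]
        alpha2 = fit_alpha2(train_x, train_y, grid)
        alphas.append(alpha2)
        x1, x2 = features[i]
        predictions.append(x1 + alpha2 * x2)        # metric of the held-out image
    rho = pearson(predictions, m_exp)
    return predictions, rho, sum(alphas) / len(alphas)

# Hypothetical data: (sigma_ab, mu_ab) per image and a made-up experimental scale.
features = [(10.0, 5.0), (20.0, 8.0), (15.0, 30.0),
            (35.0, 12.0), (25.0, 40.0), (5.0, 2.0)]
m_exp = [x1 + 0.4 * x2 for x1, x2 in features]
grid = [k / 100 for k in range(0, 201)]  # candidate alpha_2 values 0.00 .. 2.00
pred, rho, alpha2 = leave_one_out(features, m_exp, grid)
```

Because the synthetic scale values are an exact linear combination of the features, every fold selects the same $\alpha_2$; on real ratings the spread of the per-fold values indicates how stable the parameter set is.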

Parameter subset                  Correlation   Metric details
$\sigma_1, \sigma_2, \mu_C$       94.2%         $\sigma_1 + 1.46 \cdot \sigma_2 + 1.34 \cdot \mu_C$
$\sigma_a, \sigma_b, \mu_{ab}$    94.0%         $\sigma_a + \sigma_b + 0.39 \cdot \mu_{ab}$
$\sigma_{ab}, \mu_C$              94.0%         $\sigma_{ab} + 0.94 \cdot \mu_C$
$\sigma_{ab}, \mu_{ab}$           93.7%         $\sigma_{ab} + 0.37 \cdot \mu_{ab}$
$\sigma_a, \sigma_b, \mu_C$       93.6%         $\sigma_a + 0.78 \cdot \sigma_b + 0.72 \cdot \mu_C$
$\sigma_1, \sigma_2, \mu_{ab}$    93.5%         $\sigma_1 + 0.81 \cdot \sigma_2 + 0.43 \cdot \mu_{ab}$
$\sigma_S, \mu_S$                 92.3%         $\sigma_S + 1.6 \cdot \mu_S$
$\sigma_C, \mu_C$                 92.1%         $\sigma_C + 1.17 \cdot \mu_C$
$A_{ab}, \mu_{ab}$                88.8%         $A_{ab} + 7.3 \cdot \mu_{ab}$
$A_{12}, \mu_{ab}$                87.1%         $A_{12} + 9.3 \cdot \mu_{ab}$

Table 1. Correlation of various colourfulness metrics with the experimental data. Each line corresponds to a different metric, detailed in the last column. The exact formulation has been obtained by an optimisation on the correlation value.

6. RESULTS

By choosing different subsets of the attributes described in Section 4, we can try to find the best correlate to the image colourfulness. Table 1 summarises the results. The results range from 94% down to 87% correlation. To select the best metric, we have to consider several aspects. The most obvious is the correlation with the experiment. The second is the computational cost, and the last is related to the limitation of the experiment due to our initial choice in the selection of the 10 scenes. Given that the CIELab space has been designed to be a uniform colour space, it does not seem reasonable to emphasise the red-green axis over the blue-yellow axis. An optimisation showing a preference for one of the two axes may be biased by the choice of the test images. In other words, we prefer the parameter $\sigma_{ab}$ to a sum of $\sigma_a$ and $\sigma_b$, also because $\sigma_{ab}$ does not depend on the arbitrary direction of the a and b axes. For computational reasons, we avoid using $\sigma_1$ and $\sigma_2$ because they require a Singular Value Decomposition (SVD), without delivering substantially better results. We also want to avoid using saturation ($\sigma_S$ and $\mu_S$), since it over-emphasises dark areas, precisely the areas that get very roughly approximated by compression algorithms. Unfortunately, we did not include compressed images in the test set, which explains the good performance of these parameters†. Finally, we propose two metrics:

$$\hat{M}^{(1)} = \sigma_{ab} + 0.37 \cdot \mu_{ab} \qquad (7)$$

$$\hat{M}^{(2)} = \sigma_{ab} + 0.94 \cdot \mu_C, \qquad (8)$$

where each symbol is defined in Section 4. Our colourfulness metric is a linear combination of the mean and standard deviation of the pixel cloud in the colour plane of CIELab. The $\hat{M}^{(1)}$ metric seems more natural, because it is a truly two-dimensional metric. It is also computationally more efficient, but has a slightly worse correlation, if we consider a 0.3% difference in correlation to be significant.

7. A MORE EFFICIENT METRIC

In this section, we will try to reproduce the results of Section 6 using a computationally more efficient approach. We use a very simple opponent colour space:

$$rg = R - G$$

$$yb = \frac{1}{2}(R + G) - B$$

† We knew from past experience that saturation is not a good correlate when using compressed images, so we discarded its use beforehand, but finally included it for comparison purposes [7]. The use of compressed images in the test set would probably have confirmed this argument.

Attribute               $M^{(1)}$   $M^{(2)}$   $M^{(3)}$
not colourful           0           0           0
slightly colourful      6           8           15
moderately colourful    13          18          33
averagely colourful     19          25          45
quite colourful         24          32          59
highly colourful        32          43          82
extremely colourful     42          54          109

Table 2. Correspondence between the colourfulness metric and the colourfulness attributes.

We assume that the image is coded in the sRGB colour space. By repeating the procedure described in section 5, we get a new colourfulness metric

$$\hat{M}^{(3)} = \sigma_{rgyb} + 0.3 \cdot \mu_{rgyb},$$

$$\sigma_{rgyb} := \sqrt{\sigma_{rg}^2 + \sigma_{yb}^2},$$

$$\mu_{rgyb} := \sqrt{\mu_{rg}^2 + \mu_{yb}^2},$$

where $\sigma_{(\cdot)}$ and $\mu_{(\cdot)}$ are the standard deviation and the mean value of the pixel cloud along direction $(\cdot)$, respectively. Surprisingly, the correlation of $\hat{M}^{(3)}$ with the experimental data is equal to 95.3%, so it represents a simple and efficient way of computing the colourfulness.
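The metric $\hat{M}^{(3)}$ is simple enough to sketch directly. The example below is a minimal illustration, assuming the image is given as a list of (R, G, B) tuples holding 8-bit sRGB code values (0 to 255) that are used as they are, without linearisation; the text only states that the image is coded in sRGB, so the value range is an assumption. The function name `colourfulness_m3` is hypothetical.

```python
import math

def colourfulness_m3(rgb_pixels):
    """M^(3) = sigma_rgyb + 0.3 * mu_rgyb for a list of (R, G, B) pixels."""
    rg = [r - g for r, g, b in rgb_pixels]
    yb = [0.5 * (r + g) - b for r, g, b in rgb_pixels]

    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        # Population variance E[x^2] - E[x]^2, clamped against rounding error.
        mu = mean(values)
        return max(0.0, mean([v * v for v in values]) - mu * mu)

    sigma_rgyb = math.sqrt(variance(rg) + variance(yb))
    mu_rgyb = math.hypot(mean(rg), mean(yb))
    return sigma_rgyb + 0.3 * mu_rgyb

# A grey image has rg = yb = 0 everywhere, so its colourfulness is zero.
grey = [(128, 128, 128)] * 4
m_grey = colourfulness_m3(grey)

# Hypothetical two-pixel image: one pure red and one pure blue pixel.
m_red_blue = colourfulness_m3([(255, 0, 0), (0, 0, 255)])
```

Only sums of the pixel values and of their squares are needed, so the statistics can be accumulated in a single pass over the image, which is what makes the metric attractive for video streams.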

8. HOW TO USE THE METRIC

The metric can be used to determine how colourfulness evolves when an image passes through a tone mapping or a coding algorithm, in the following ways:

$$\Delta M_{\varepsilon} = \hat{M}_p - \hat{M}_o, \qquad (9)$$

$$\Delta M_{\%} = \frac{\hat{M}_p}{\hat{M}_o}, \qquad (10)$$

where $\hat{M}_o$ is the colourfulness estimate of the original image, and $\hat{M}_p$ is the colourfulness estimate of the processed image. We would recommend the use of $\Delta M_{\varepsilon}$ over $\Delta M_{\%}$, but further experimentation would be necessary to confirm this recommendation.

To give some intuition about the metric, Table 2 summarises the ‘meaning’ of the metric. For example, a value of $\hat{M}^{(3)} = 59$ means that the image is quite colourful.
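One possible way to use Table 2 and equations (9) and (10) in code is sketched below. The $\hat{M}^{(3)}$ values are those of Table 2; picking the attribute whose table value is closest to the measured one is an assumption made for this illustration, since the text does not define how intermediate values should be labelled. The function names are hypothetical, and returning `None` for the ratio of a greyscale original is also an assumption.

```python
# Table 2 values for M^(3), paired with their attribute.
TABLE2_M3 = [
    (0, "not colourful"),
    (15, "slightly colourful"),
    (33, "moderately colourful"),
    (45, "averagely colourful"),
    (59, "quite colourful"),
    (82, "highly colourful"),
    (109, "extremely colourful"),
]

def nearest_attribute(m3):
    """Return the Table 2 attribute whose M^(3) value is closest to `m3`."""
    return min(TABLE2_M3, key=lambda entry: abs(entry[0] - m3))[1]

def colourfulness_change(m_processed, m_original):
    """Return (difference, ratio) as in equations (9) and (10).

    The ratio is undefined for an original with zero colourfulness and is
    returned as None in that case.
    """
    difference = m_processed - m_original
    ratio = m_processed / m_original if m_original != 0 else None
    return difference, ratio

label = nearest_attribute(59)                     # value taken from Table 2
difference, ratio = colourfulness_change(45, 59)  # hypothetical before/after values
```

In this hypothetical case the processed image falls from ‘quite colourful’ to ‘averagely colourful’, a difference of 14 units on the $\hat{M}^{(3)}$ scale.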

9. CONCLUSIONS

We tried to introduce colour in an image quality metric scheme, and found that measuring colourfulness was a very promising way to achieve this goal. We set up a psychophysical experiment and asked the viewers to rate the colourfulness of an image picturing a natural scene. We then studied several metrics using the CIELab colour space, and found a simple metric which reaches a correlation of about 94% with the experimental data. We also proposed another metric, which is very easy to compute, and achieves an even better correlation (95%) with the experimental data. This metric can be used to evaluate the performance of a coding scheme in real time.

We did not consider hue in our experiments. Nevertheless, a complete colour metric should take hue into account, for example by measuring colour casts between the original and the processed image.

10. ACKNOWLEDGMENTS

We want to thank Genista Corp. for sponsoring this research. We also want to thank the Audiovisual Communication Lab. at EPFL for allowing us to use their laboratory for the experimental tests. Finally, we want to thank all the viewers who took part in the testing.

APPENDIX A. DEFINITION OF THE IMAGE ATTRIBUTES

This section briefly defines the parameters used in Section 4.

Let $I_p$ be the pixel values of an image in Lab space, $p = 1 \cdots N$. The image has $N$ pixels.

$$I_p := [L_p \;\; a_p \;\; b_p]^T$$

$$\sigma_a^2 := \frac{1}{N} \sum_{p=1}^{N} a_p^2 - \mu_a^2$$

$$\mu_a := \frac{1}{N} \sum_{p=1}^{N} a_p$$

$$\mu_{ab} := \sqrt{\mu_a^2 + \mu_b^2}$$

$$C_p := \sqrt{a_p^2 + b_p^2}$$

$$\mu_C := \frac{1}{N} \sum_{p=1}^{N} C_p$$

$$\sigma_C^2 := \frac{1}{N} \sum_{p=1}^{N} C_p^2 - \mu_C^2$$

$$S_p := \frac{C_p}{L_p}$$

$$\mu_S := \frac{1}{N} \sum_{p=1}^{N} S_p.$$
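The definitions above translate almost line by line into code. The following sketch is an illustration under stated assumptions: the pixels are given as a list of (L, a, b) tuples, pixels with $L_p = 0$ are skipped when computing saturation to avoid a division by zero (the text does not say how they are treated), and the metrics (7) and (8) are added to the result for convenience. The function name `lab_statistics` and the dictionary keys are hypothetical.

```python
import math

def lab_statistics(lab_pixels):
    """Compute the Appendix A attributes from a list of (L, a, b) pixels."""
    a_vals = [a for _, a, _ in lab_pixels]
    b_vals = [b for _, _, b in lab_pixels]
    chroma = [math.hypot(a, b) for _, a, b in lab_pixels]
    # Saturation is chroma over lightness; pixels with L = 0 are skipped (assumption).
    sat = [c / L for (L, _, _), c in zip(lab_pixels, chroma) if L > 0]

    def mean(values):
        return sum(values) / len(values)

    def std(values):
        # sqrt(E[x^2] - E[x]^2), clamped against rounding error.
        mu = mean(values)
        return math.sqrt(max(0.0, mean([x * x for x in values]) - mu * mu))

    stats = {
        "sigma_a": std(a_vals),
        "sigma_b": std(b_vals),
        "mu_ab": math.hypot(mean(a_vals), mean(b_vals)),
        "mu_C": mean(chroma),
        "sigma_C": std(chroma),
        "mu_S": mean(sat) if sat else 0.0,
    }
    stats["sigma_ab"] = math.hypot(stats["sigma_a"], stats["sigma_b"])
    stats["M1"] = stats["sigma_ab"] + 0.37 * stats["mu_ab"]  # equation (7)
    stats["M2"] = stats["sigma_ab"] + 0.94 * stats["mu_C"]   # equation (8)
    return stats

# Hypothetical two-pixel image, symmetric about the neutral axis.
stats = lab_statistics([(50.0, 10.0, 0.0), (50.0, -10.0, 0.0)])
```

For this symmetric example the centre of gravity lies on the neutral axis, so $\mu_{ab}$ is zero while $\mu_C$ is not, which shows how the two proposed metrics can differ on the same image.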

The parameters $\sigma_1$ and $\sigma_2$ need a Singular Value Decomposition (SVD) computation. Let $U$ and $V$ be two orthogonal matrices. Let $I$ be the matrix containing the colour of all the pixels of the image:

$$I := \begin{bmatrix} a_1 & \cdots & a_N \\ b_1 & \cdots & b_N \end{bmatrix}^T.$$

The matrix $I$ can be written as

$$I = U \cdot S \cdot V^T,$$

where $S$ is a diagonal matrix. Finally $\sigma_1$ and $\sigma_2$ are computed as

$$[\sigma_1 \;\; \sigma_2] = [\sigma_a \;\; \sigma_b] \cdot V^T$$
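For readers who want to experiment with $\sigma_1$ and $\sigma_2$, the sketch below follows their description in Section 4 (the largest and smallest standard deviation over all directions of the ab plane) rather than the matrix formulation printed above. It swaps in a different technique: the closed-form eigenvalues of the 2×2 covariance matrix of the mean-centred (a, b) values, whose square roots are the standard deviations along the principal directions. The function name `principal_std` and the sample points are hypothetical.

```python
import math

def principal_std(ab_pixels):
    """Return (sigma_1, sigma_2) for a list of (a, b) pairs."""
    n = len(ab_pixels)
    mu_a = sum(a for a, _ in ab_pixels) / n
    mu_b = sum(b for _, b in ab_pixels) / n
    var_a = sum((a - mu_a) ** 2 for a, _ in ab_pixels) / n
    var_b = sum((b - mu_b) ** 2 for _, b in ab_pixels) / n
    cov_ab = sum((a - mu_a) * (b - mu_b) for a, b in ab_pixels) / n
    # Eigenvalues of the symmetric 2x2 covariance matrix, in closed form.
    half_trace = 0.5 * (var_a + var_b)
    radius = math.hypot(0.5 * (var_a - var_b), cov_ab)
    sigma_1 = math.sqrt(half_trace + radius)
    sigma_2 = math.sqrt(max(0.0, half_trace - radius))
    return sigma_1, sigma_2

# Hypothetical pixel cloud lying on the diagonal a = b: all the spread is
# along one direction, so the smaller standard deviation is zero.
sigma_1, sigma_2 = principal_std([(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0), (-2.0, -2.0)])
```

Since the eigenvalues sum to the trace of the covariance matrix, $\sigma_1^2 + \sigma_2^2 = \sigma_a^2 + \sigma_b^2$ holds for this construction, so $\sigma_{ab}$ is recovered without any decomposition.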

REFERENCES

1. N. Moroney, M. D. Fairchild, R. R. Hunt, C. Li, R. M. Luo, and T. Newman, “The CIECAM02 color appearance model,” in IS&T/SID Tenth Color Imaging Conference, 2002.

2. C. Li, R. M. Luo, R. R. Hunt, N. Moroney, M. D. Fairchild, and T. Newman, “The performance of CIECAM02,” in IS&T/SID Tenth Color Imaging Conference, 2002.

3. M. D. Fairchild and G. M. Johnson, “Meet iCAM: A next-generation color appearance model,” in IS&T/SID Tenth Color Imaging Conference, 2002.

4. S. Winkler and R. Campos, “Video quality evaluation for internet streaming applications,” in Proceedings of IS&T/SPIE: Human Vision and Electronic Imaging VIII, IS&T/SPIE, 5007, (Santa Clara, CA, USA), 2003.

5. P. G. Engeldrum, Psychometric Scaling: A Toolkit for Imaging Systems Development, Imcotek Press, 2000.

6. R. Hunt, Measuring Colour, Fountain Press, England, 3 ed., 1998.

7. S. Yendrikhovskij, F. Blommaert, and H. de Ridder, “Perceptually optimal color reproduction,” in Proceedings of SPIE: Human Vision and Electronic Imaging III, 3299, pp. 274–281, (San Jose, CA, USA), 1998.

Figure 1. Images used in the experiment (panels (a), (b), (c), (c.2) and (d)). (a), (b) and (c) are used in the scaling experiment. (d) is shown as an example before the experiment. (c.2) has been obtained from (c) by linearly reducing the chroma in Lab space; the blue/purple colour shift that arises in the operation should not affect the results, since we are comparing images that are different from each other. (a), (b) and (c) are taken from the Corbis royalty free collection.