
TIFS: A Hybrid Scheme Integrating PIFS

and Linear Transforms

Michele Nappi, Daniel Riccio

Dipartimento di Matematica e Informatica

Università di Salerno,

84084 Fisciano (SA), Italy

{mnappi,driccio}@unisa.it

Abstract—Many desirable properties make fractals a powerful mathematical model applied in several image processing and pattern recognition tasks: image coding, segmentation, feature extraction and indexing, to cite just a few. Unfortunately, they rely on a strongly asymmetric scheme, and thus suffer from very high coding times. Linear transforms, on the other hand, are quite time balanced, which makes them suitable for real-time applications, but they do not provide comparable performance in terms of image quality at high bit rates. Owing to their ability to preserve the original image energy in a few coefficients in the frequency domain, linear transforms have also seen widespread use in side applications, such as selecting representative features or defining new image quality measures. In this paper, we investigate different levels of embedding linear transforms in the fractal coding scheme. Experimental results have been organized so as to point out the contribution of each embedding step to the objective quality of the decoded image.

I. INTRODUCTION

The literature on fractal image compression has grown steadily since the preliminary definition of Partitioned Iterated Function Systems (PIFS) by Jacquin in 1989; much of the interest in fractal coding is due to its side applications in fields such as image database indexing [6] and face recognition [14]. Both applications exploit some form of coding, and they can reach a good discriminating power even in the absence of a high PSNR from the coding module.

The majority of works on fractal image compression set as their main goal the speed-up of the coding process, while preserving desirable properties of fractal coding such as a high compression rate, fast decoding and scale invariance. Many different solutions have been proposed to speed up the coding phase [3,5,10–12,19], for instance by modifying the partitioning process or by providing new classification criteria or heuristic methods for the range/domain matching problem. All these approaches can be grouped into three classes: classification methods, feature vectors [22] and local search. Generally, speed-up methods based on nearest neighbour search with feature vectors outperform all the others in terms of decoded image quality at a comparable compression rate, but they often suffer from the high dimensionality of the feature vector; Saupe's operator is a representative example. To cope with this,

dimension reduction techniques are introduced. Saupe reduced the dimension of the feature vector by averaging pixels, while in [24] the DCT is used to cut out redundant information. In the same way, linear transforms have been widely exploited to extract representative features or to encode groups of pixels in image indexing and compression applications. Indeed, linear transforms form the basis of many compression systems, as they de-correlate the image data and provide good energy compaction. For example, the Discrete Fourier Transform (DFT) [25] is used in many image processing systems, while the Discrete Cosine Transform (DCT) [25] is used in standards like JPEG, MPEG and H.261. Still others are the Walsh-Hadamard Transform (WHT) [25] and the Haar Transform (HT) [25]. In particular, linear transforms have also been a matter of study in the field of objective quality measure definition. The HVS norm, based on preliminary DCT filtering, is just an example [17], but the magnitude and phase of the DFT coefficients have also been used to define new objective quality measures [1].

This is motivated by the fact that standard objective measures such as the Root Mean Square Error (RMSE) and the Peak Signal to Noise Ratio (PSNR) are, in some cases, very far from human perception. Hence, this paper sets as its main goal to investigate ways of embedding a generic linear transform T into the standard PIFS coding scheme. In more detail, the linear properties of T are first exploited to dramatically reduce the computational cost of the coding phase, by arranging its coefficients in a suitable way. Subsequently, the RMSE, commonly used to upper-bound the collage error, is replaced by a new objective distance measure based on the T coefficients. Then, the T coefficients corresponding to high frequencies are used to compensate for the information lost by the fractal scheme in highly compressed regions.

II. THEORETICAL CONCEPTS

In order to shed light on the following discussion of the hybrid scheme proposed in this paper, it is worth drawing the reader's attention to some basic concepts about fractal compression and linear transforms.

A. Partitioned Iterated Function Systems

A PIFS consists of a set of local affine contractive transformations, which exploit image self-similarities to cut out redundancies while extracting salient features. In more detail, a given input image I is partitioned into a set R = {r_1, r_2, ..., r_|R|} of disjoint square regions of size |r| × |r|, named ranges. Another set D = {d_1, d_2, ..., d_|D|} of larger regions is extracted from the same image I. These regions are called domains and can overlap. Their size is |d| × |d|, where usually |d| = 2|r|. Since a domain is four times the size of a range, it must be shrunk by a 2 × 2 averaging operation on its pixels. This is done only once, by downsampling the original image to obtain a new image that is a quarter of the original. An overall representation of the PIFS compression scheme is reported in Fig. 1.

The image I is encoded range by range: for each range r, it is necessary to find a domain d and two real numbers α and β such that

min_{d∈D} { min_{α,β} ‖r − (αd + β)‖² }.  (1)

Doing so minimizes the quadratic error with respect to the Euclidean norm. It is customary to impose |α| ≤ 1 in order to ensure convergence in the decoding phase. The inner minimum over α and β is immediate to compute by solving a least squares problem, obtaining

α = [ Σ_{1≤i≤|r|} Σ_{1≤j≤|r|} (r_{i,j} − r̄)(d_{i,j} − d̄) ] / [ Σ_{1≤i≤|d|} Σ_{1≤j≤|d|} (d_{i,j} − d̄)² ]  (2)

β = r̄ − α d̄,  (3)

where |r| and |d| are the dimensions of ranges and shrunken domains, while r̄ and d̄ are the mean values of the range r and the domain d, respectively. The outer minimum over d, however, requires an exhaustive search over the whole set D, which is an impractical operation. Therefore, ranges and domains are classified by means of feature vectors in order to reduce the cost of searching the domain pool: if the range r is being encoded, only the domains having a feature vector close to that of r are considered.
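As a concrete illustration, the inner minimization of Eqs. (2)–(3) can be sketched in pure Python (a minimal sketch, not the authors' implementation; `range_block` and `domain_block` are assumed to be equal-sized lists of pixel rows, the domain having already been shrunk):

```python
def affine_fit(range_block, domain_block):
    """Least-squares alpha and beta mapping a (shrunken) domain onto a range.

    Implements Eqs. (2)-(3): alpha is the ratio between the range/domain
    covariance and the domain variance; beta follows from the block means.
    """
    r = [p for row in range_block for p in row]
    d = [p for row in domain_block for p in row]
    n = len(r)
    r_mean = sum(r) / n
    d_mean = sum(d) / n
    num = sum((ri - r_mean) * (di - d_mean) for ri, di in zip(r, d))
    den = sum((di - d_mean) ** 2 for di in d)
    alpha = num / den if den else 0.0     # flat domain: no contrast scaling
    alpha = max(-1.0, min(1.0, alpha))    # |alpha| <= 1 ensures convergence
    beta = r_mean - alpha * d_mean
    return alpha, beta
```

For a range that is exactly 0.5·d + 1, the fit recovers α = 0.5 and β = 1.0; the clamping step reflects the |α| ≤ 1 constraint discussed above.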

[Figure 1: input image → segmentation into ranges and domains → KD-tree classification → domain search over a list of candidate domains → RMSE error estimation → best domain.]

Fig. 1. The architecture of our fractal coder.

B. Linear Transforms

A transform T is called linear if it has two mathematical properties:

T(x + y) = T(x) + T(y)  (additivity)
T(αx) = αT(x)  (homogeneity)

A third property, shift invariance, is not a strict requirement for linearity, but it is a mandatory property for most image processing techniques. These three properties form the mathematical foundation of how linear transformation theory is defined and used. Homogeneity and additivity play a critical role in linearity, while shift invariance is somewhat on the side. This is because linearity is a very broad concept, encompassing much more than just signals and systems. In other words, when there are no signals involved, shift invariance has no meaning, so it can be thought of as an additional aspect of linearity needed when signals and systems are involved. Linear transform domain features are very effective when the patterns are characterized by their spectral properties; so, in this paper, the feature extraction capabilities of the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT) and the Haar Transform (HT) are investigated. They are formally defined as follows:

DFT: v(k) = Σ_{n=0}^{N−1} x[n] e^{−2πikn/N}, with k = 0, 1, ..., N − 1

DCT: v(k) = α(k) Σ_{n=0}^{N−1} x[n] cos( nkπ / (N − 1) ), with k = 0, 1, ..., N − 1, α(0) = √(1/N) and α(k) = √(2/N), k ≠ 0

HT: v(0) = ∫_0^1 x(n) H_0(n) dn and v(j, k) = ∫_0^1 x(n) H_{jk}(n) dn, with

H_{jk}(n) = 2^{j/2} H(2^j n − k), j ≥ 0, k = 0, 1, ..., 2^j − 1,
H_0(n) = 1_{[0,1)} (the characteristic function on [0, 1)),
H(n) = 1 for 0 ≤ n < 1/2 and H(n) = −1 for 1/2 ≤ n < 1.
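The DFT above is linear by construction; a naive pure-Python implementation (a sketch for illustration, not an efficient FFT) makes the additivity and homogeneity properties easy to verify numerically:

```python
import cmath

def dft(x):
    """Naive 1-D DFT: v(k) = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
y = [4.0, 3.0, 2.0, 1.0]
lhs = dft([a + b for a, b in zip(x, y)])        # T(x + y)
rhs = [a + b for a, b in zip(dft(x), dft(y))]   # T(x) + T(y)
assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))       # additivity
assert all(abs(l - 2 * r) < 1e-9
           for l, r in zip(dft([2 * v for v in x]), dft(x)))  # homogeneity
```

These are exactly the two properties that the hybrid scheme exploits later, when the transform of an affinely mapped domain is derived from the transform of the domain itself.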

III. IMPROVING QUALITY

A major problem in evaluating lossy techniques is the extreme difficulty of describing the type and amount of degradation in reconstructed images. Because of the inherent drawbacks associated with subjective measures of image quality, there has been a great deal of interest in developing quantitative measures that can consistently be used as substitutes. All these measures have been largely used to assess the quality of the whole image after a coding process has been applied; in other words, the original image is compressed/decompressed by means of an encoder and then the overall amount of distortion introduced by the coding scheme is measured. Thus, objective measures have represented an effective way to compare different coding schemes in terms of the percentage of distortion introduced at a fixed compression ratio. The key idea here is to embed quality measures into the coding process, rather than confining them to a sheer analysis tool. The compression scheme we adopted for this study, represented in Fig. 1, lends itself to a direct replacement of the RMSE by other quality measures.


A. New quality measures

Many objective quality measures [1,7] have been defined to replace subjective evaluations while retaining, as much as possible, fidelity to the human perception of the image distortions introduced by coding schemes. The most common measures are undoubtedly the RMSE (Root Mean Square Error) and the PSNR (Peak Signal to Noise Ratio) [1]. They owe their wide diffusion to the fact that they work well on average and have a very low computational cost. However, there are cases in which the quality estimates given by the PSNR are very far from human perception (see Fig. 2), and this has led many researchers to define new quality metrics providing better performance in terms of distortion measurement, even if at a higher computational cost. Some examples are the Human Visual System norm [17] and the FFT Magnitude Phase Norm [1].

Fig. 2. Two pictures with the same objective quality (PSNR 26.5 dB), but very different subjective quality.

The measures in Table I are defined in the spatial domain; they are all discrete and bivariate.¹

TABLE I
QUALITY MEASURES

Name — Definition
Average Difference: AD = (1/n²) Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} | R(j,k) − R̂(j,k) |
Correlation Quality: CQ = Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} | R(j,k)² − R̂(j,k) R(j,k) | / Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} R(j,k)
Image Fidelity: IF = Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} [ R(j,k) − R̂(j,k) ]² / Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} R(j,k)²
Maximum Difference: MD = max { | R(j,k) − R̂(j,k) | }
N Cross Correlation: NK = | 1 − Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} [ R(j,k) R̂(j,k) ] / Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} R(j,k)² |
Structural Content: SC = | 1 − Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} R(j,k)² / Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} R̂(j,k)² |
RMSE: RMSE = (1/n²) Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} [ R(j,k) − R̂(j,k) ]²
PMSE: PMSE = Σ_{j=0}^{n−1} Σ_{k=0}^{n−1} [ R(j,k) − R̂(j,k) ]² / ( n² max_{j,k} [R]² )
Hosaka Plots: formally defined in [17]

On the contrary, the most signiﬁcant examples of image

quality measures deﬁned in the frequency domain are the HVS

and the DFT magnitude/phase norm.

¹ R(j, k) and R̂(j, k) denote the samples of the original and approximated range blocks.

Human Visual System Norm (HVS): a few models of the HVS have been developed in the literature; in [17], dealing with the Discrete Cosine Transform, Nill defined his model function as a band-pass filter with a transfer function in polar coordinates. The image quality is therefore calculated on pictures processed through such a spectral mask and then inverse discrete cosine transformed.

FFT Magnitude Phase Norm (FFT-MP): a spectral distance-based measure is the Fourier magnitude and/or phase spectral discrepancy on a block basis [1]. In general, while the mean square error is among the best measures for additive noise, local phase-magnitude measures are more suitable for coding and blur artifacts. In particular, the FFT magnitude/phase norm is the most sensitive to distortion artifacts, but at the same time the least sensitive to the typology of images.

Both these measures have drawbacks. The HVS is too complex to be profitably used in several applications, while the FFT-based distance has two main limitations: a) the phase is significantly smaller than the magnitude, and its contribution to the overall distance value is made even more negligible by a very small factor λ; b) the n-norm and the arctan, needed to compute magnitude and phase, are computationally intensive, in particular for complex coefficients.

Hence it appears that fractal image coding can significantly profit from a simpler image quality measure exploiting the properties of linear transforms. In more detail, we can define such a distance as follows.

Let Γ(u, v) and Γ̂(u, v) be the coefficients of the transform T applied to a block of the original image and of the compressed image, respectively. In defining the quality measure it is necessary to distinguish two cases, depending on whether the transform coefficients have only real components or also imaginary ones. In the latter case, the pair of components must be reduced to a single value. In the literature this is usually achieved by considering, instead of the pair of values, the absolute value (or magnitude) of the complex coefficient. Here a different route has been chosen, defining an operator Ψ that handles both cases. When Γ(u, v) has only real components, Ψ(u, v) = Γ(u, v); otherwise Ψ is defined as follows:

Ψ(u, 2v) = Re(Γ(u, v))
Ψ(u, 2v + 1) = Im(Γ(u, v))

Figure 3 gives a graphical representation of how the coefficients are rearranged into a new block. This choice is motivated by the fact that the Discrete Fourier Transform, the only one in our case yielding complex coefficients, concentrates most of the useful information in the first coefficients, located near the upper-left corner of the matrix.

Thus, the LT distance function can be defined as follows:

LT = (1/n²) Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} [ Ψ_R(u, v) − Ψ_{R̂}(u, v) ]².  (4)
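A minimal sketch of the Ψ reorganization and the resulting LT distance of Eq. (4). This is an illustrative reading, not the authors' code: blocks are lists of rows of (possibly complex) transform coefficients, and the interleaving is applied to the first half of each row so that the block size is preserved, which is an assumption of this sketch.

```python
def psi(gamma):
    """Rearrange complex coefficients: real parts go to even columns,
    imaginary parts to odd columns; real-valued blocks pass through."""
    if not any(isinstance(c, complex) for row in gamma for c in row):
        return [row[:] for row in gamma]
    out = []
    for row in gamma:
        half = row[:len(row) // 2]    # kept (low-frequency) coefficients
        new_row = []
        for c in half:
            new_row.extend([c.real, c.imag])
        out.append(new_row)
    return out

def lt_distance(gamma_r, gamma_rhat):
    """LT distance of Eq. (4) between two coefficient blocks."""
    pr, ph = psi(gamma_r), psi(gamma_rhat)
    n = len(pr)
    return sum((pr[u][v] - ph[u][v]) ** 2
               for u in range(n) for v in range(len(pr[u]))) / n ** 2
```

Identical blocks yield a distance of zero, and a single differing real coefficient contributes its squared difference divided by n².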

Fig. 3. Reorganization of real and imaginary components by the Ψ operator.

B. Embedding quality measures in PIFS

In PIFS coding the whole image is partitioned into a set of ranges (as described in Section I). For each range, the coding scheme looks for an approximating domain to assign to it, while the domain is mapped into the corresponding range by an affine transformation. For a given range R, PIFS associates the domain providing the smallest approximation error in a root mean square sense, so exactly at that point it is possible to embed a different quality measure to decide the best range/domain association. The key idea underlying this strategy is that quality measures outperforming the RMSE from a subjective point of view can improve the subjective appearance of the whole image by improving the quality of each range. In other words, in the original definition of the PIFS coding scheme as proposed by Jacquin, the range is approximated by the transformation R̂ = α·D + β by minimizing the error function ‖R − (α·D + β)‖². In this paper, this function has been replaced by 10 alternative functions. There are two different ways of embedding new quality measures in PIFS coding. In the former, α and β are still computed by solving a mean square error problem, while the distance between the original and the transformed range is measured by a new quality measure f(R, R̂). In the latter, the formulas to compute α and β are rewritten to properly minimize the distance LT of Eq. (4), so that the coding scheme can be specialized even further by nesting this quality measure at a deeper level. Thus, starting from Eq. (4) and knowing that

Γ_{R̂}(u, v) = T(R̂) and R̂ = α·D + β, it comes out that Γ_{R̂}(u, v) = T(R̂) = T(α·D + β) = α·Γ_D(u, v) + B, where:

B = β if u = v = 0, and B = 0 otherwise.

Let P be the set of all pairs {(u, v)} − (0, 0) with u, v = 0, ..., n − 1; LT can be rewritten as follows:

LT = (1/n²) { [ Ψ_R(0, 0) − (α·Ψ_D(0, 0) + β) ]² + Σ_{(u,v)∈P} [ Ψ_R(u, v) − α·Ψ_D(u, v) ]² },  (5)

reorganizing all terms with respect to α and β, one obtains:

LT = (1/n²) [ α² Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} Ψ_D(u, v)² − 2α Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} Ψ_R(u, v) Ψ_D(u, v) + 2αβ Ψ_D(0, 0) − 2β Ψ_R(0, 0) + β² + Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} Ψ_R(u, v)² ]  (6)

The values of α and β minimizing LT are given by:

α = ( RD − Ψ_R(0, 0) Ψ_D(0, 0) ) / ( D² − Ψ_D(0, 0)² )
β = ( D² Ψ_R(0, 0) − RD Ψ_D(0, 0) ) / ( D² − Ψ_D(0, 0)² )  (7)

where D² = Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} Ψ_D(u, v)² and RD = Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} Ψ_R(u, v) Ψ_D(u, v).
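The closed form (7) translates directly into code. A minimal sketch (an illustration, not the authors' implementation; `psi_r` and `psi_d` are n×n Ψ blocks stored as lists of rows):

```python
def lt_fit(psi_r, psi_d):
    """alpha and beta minimizing the LT distance, per Eq. (7)."""
    n = len(psi_r)
    d2 = sum(psi_d[u][v] ** 2 for u in range(n) for v in range(n))           # D^2
    rd = sum(psi_r[u][v] * psi_d[u][v] for u in range(n) for v in range(n))  # RD
    r00, d00 = psi_r[0][0], psi_d[0][0]
    den = d2 - d00 ** 2
    alpha = (rd - r00 * d00) / den
    beta = (d2 * r00 - rd * d00) / den
    return alpha, beta
```

If Ψ_R is built exactly as α·Ψ_D plus β added only at the (0, 0) coefficient, as the model Γ_{R̂} = α·Γ_D + B prescribes, the fit recovers that α and β.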

An important observation made in embedding linear transform based measures in PIFS coding is that LT can give PSNR values larger than those obtained with the RMSE, even though the PSNR is maximized where the RMSE reaches its minimum. The explanation of why this happens resides in the range/domain matching process. As soon as the coder finds a domain giving an approximation error lower than a fixed threshold, the domain pool search stops and the range is coded by this domain. The LT metric induces the coder to a more thorough domain search, since it is more selective than the RMSE and yields a small approximation error (lower than the fixed threshold) only for range/domain comparisons that result in small RMSE values; on the other hand, the number of range/domain matchings for each range is upper-bounded by a fixed constant l (50 in our case), so the coding time is not significantly affected by the additional comparisons. Fig. 4 reports a graphical example of this kind of situation.

[Figure 4: with threshold Th = 5.0, the RMSE search stops at the first domain whose error falls below the threshold, while the more selective LT search examines further domains before stopping.]

Fig. 4. LT and RMSE searching for a given range.

The main grounds can be found in the fact that, from a subjective point of view, image distortions are uniformly distributed in pictures coded with this measure. The reason why this happens resides in the image partitioning process. In other words, a range is coded by the best approximating domain if the approximation error is lower than a given threshold, and is further partitioned otherwise. The RMSE and LT generally give different quad-tree partitionings for the same image. In particular, the LT partitioning is more balanced, favouring mid-size blocks. Figure 5 shows two different quad-tree decompositions for the Lena image, the former obtained with the LT measure and the latter with the RMSE.

Fig. 5. LT+FFT and RMSE quad-tree partitioning of the Lena image.

A further example of the improvement in subjective quality provided by the LT measure is given in Figure 6, which reports the eye regions of the mandrill image coded with the LT, HVS, RMSE and PMSE quality measures at a compression ratio of 20:1. This picture shows that the overall quality reached by the LT surpasses all the others.

Fig. 6. Eyes of the mandrill decoded with the LT+FFT, HVS, RMSE and PMSE quality measures at a compression ratio of 20:1.

C. Coding the residual information

Linear transforms can be integrated into the fractal coding scheme at a third level as well (the first two being the acceleration of the coding process and the replacement of the quality measure): the coding of the residual information. In the proposed hybrid scheme, the transform T is applied to all domains in the pool during the indexing phase, and to each range during the coding phase; this implies that during coding T(r) and T(d) are both known, while the transform T(r̂) of the approximated range r̂ can be obtained at no additional computational cost by exploiting the linearity properties of T already discussed in Section II. The coefficients of the transform T can be further used to preserve part of the initial energy lost by the fractal coding. In particular, since r̂ is computed so as to minimize the approximation error for the range r, it is reasonable to assume that δ = T(r̂) − T(r) is characterized by small values on average, hence representable with a small number of bits. Moreover, during decoding the approximated range r̂ is obtained by fractal decoding, so given δ and recomputing T(r̂) it would be possible to recover r as T(r̂) + δ. It is therefore clear that the definition of an efficient strategy to store, if not all the information contained in δ, at least part of it, while maintaining an advantageous quality/bit-rate trade-off, is the heart of the following discussion. In general terms, then, the residual coding scheme is the following:

Coding
a) compute δ_{ij} = Γ^{ij}_r − Γ^{ij}_{r̂}, 0 ≤ i, j ≤ n
b) if |δ_{ij}| < ε set δ_{ij} = 0
c) ∀i, j | δ_{ij} ≠ 0, write δ_{ij} in compact form

Decoding
a) decode the image I by PIFS
b) for each range r̂ compute Γ_{r̂}
c) read δ from the file
d) replace r̂ with T^{−1}(Γ_{r̂} + δ) in the image I
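The coding and decoding steps above can be sketched as follows (a minimal illustration operating directly on coefficient blocks; the compact-form writing of step c) and the inverse transform of step d) are omitted for brevity):

```python
def encode_residual(gamma_r, gamma_rhat, eps):
    """Steps a)-b): compute delta = Gamma_r - Gamma_rhat, then zero out
    coefficients whose magnitude is below the threshold eps."""
    n = len(gamma_r)
    return [[(gamma_r[i][j] - gamma_rhat[i][j])
             if abs(gamma_r[i][j] - gamma_rhat[i][j]) >= eps else 0.0
             for j in range(n)] for i in range(n)]

def decode_residual(gamma_rhat, delta):
    """Decoding step d), up to the inverse transform: Gamma_rhat + delta."""
    n = len(gamma_rhat)
    return [[gamma_rhat[i][j] + delta[i][j] for j in range(n)]
            for i in range(n)]
```

Coefficients whose residual survives the threshold are recovered exactly; the sub-threshold ones are simply left at their fractally decoded values.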

The most important aspect is the compact representation of the δ_{ij} values within the file. The approach used in the proposed scheme is discussed below. The first step in the residual coding process is the choice of the threshold ε. In more detail, each δ_{ij} costs a number of bits proportional to its value; this implies that a high value of the threshold ε leaves unaltered a larger number of δ_{ij} to be written in the file, which translates into better quality at the cost of a higher bit-rate. The criterion adopted to determine the threshold ε in this approach depends strongly on the way the δ_{ij} values are quantized. As will be explained in more detail later, their modulus |δ_{ij}| is approximated by the quantity 3/2ε. This implies that the threshold ε must be chosen so as to maximize the number of coefficients δ_{ij} falling around this value. It must also be taken into account that the larger the modulus of a coefficient δ_{ij}, the more important its correct coding. In this light, a criterion for the automatic determination of ε can be defined. First, we build a histogram H, where H(i) indicates how many δ_{ij} in the whole image satisfy 3/2|δ_{ij}| = i; then we compute ε = argmax_k Σ_{i=k}^{2k} (i·H(i)).

A few more words must instead be spent on more delicate issues, namely the quantization and coding of the δ_{ij} values. During coding, after steps a) and b), each block contains only values greater than the threshold ε. Hence, if a generic block contains no δ_{ij} ≠ 0, it carries no significant residual information and is not processed further, while a single bit equal to zero is written in the file. Otherwise, a first bit set to 1 is written in the file. Since many rows of the block δ may be empty, distinguishing the rows i containing δ_{ij} ≠ 0 from those with δ_{ij} = 0, ∀j, allows a further bit-rate saving. In more detail, for each row i of the block δ a bit b is written in the file, where b = 1 if ∃j | δ_{ij} ≠ 0 and b = 0 otherwise, for a total of n bits (n being the side length of the block δ). Moreover, for each δ_{ij}, only the sign and the column j are stored. Indeed, it can be observed that in the block δ we have |δ_{ij}| > ε, ∀i, j, while for sufficiently large ε we have |δ_{ij}| < 2ε for most of the δ_{ij}. Based on these observations, in the proposed approach the δ_{ij} are approximated by a constant value 3/2ε (the center of the interval [ε, 2ε]) rather than quantized individually; it follows that for each coefficient only the information on the column j remains to be saved in the file. Since the block δ is a sparse matrix, each element would normally be saved as a pair of coordinates (i, j); however, in this specific case, having distinguished empty rows from rows with at least one nonzero coefficient, the bit-rate can be reduced further. Indeed, the δ_{ij} values are read from the file consecutively, one row after the other, so only the column j of each coefficient needs to be saved, given that for coefficients on the same row the columns are in increasing order. This implies that if, while reading a sequence of columns, the one just read is smaller than the previous one, it is the column index of a following row; the correct row it belongs to can be recovered as the first non-empty one (for which a bit set to 1 was previously saved). Note that it will not necessarily be the immediately following row i + 1, since completely empty rows are frequent. In conclusion, for each block containing residual information only 1 + n + K(log₂(n)) bits are saved, where K is the number of δ_{ij} ≠ 0; otherwise only a single additional bit set to 0 is saved.
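The automatic threshold choice and the resulting per-block bit cost can be sketched as follows. This is an illustrative reading of the criterion ε = argmax_k Σ_{i=k}^{2k} i·H(i) and of the 1 + n + K·log₂(n) bit count; the integer binning of the histogram is an assumption of this sketch.

```python
import math
from collections import Counter

def choose_epsilon(delta_magnitudes, k_max):
    """Pick eps maximizing sum_{i=k..2k} i*H(i), where H is a histogram of
    the residual magnitudes (rounded to integer bins in this sketch)."""
    hist = Counter(round(abs(d)) for d in delta_magnitudes)
    def score(k):
        return sum(i * hist.get(i, 0) for i in range(k, 2 * k + 1))
    return max(range(1, k_max + 1), key=score)

def block_bits(n, k_nonzero):
    """Bits for one residual block: 1 flag bit, n row-presence bits, and
    one log2(n)-bit column index per nonzero coefficient."""
    if k_nonzero == 0:
        return 1
    return 1 + n + k_nonzero * math.ceil(math.log2(n))
```

For residuals clustered around a value m, the score is maximized by a k whose window [k, 2k] captures that cluster, which is the intent of the criterion above.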

IV. EXPERIMENTAL RESULTS

Tests have been conducted on a dataset of twenty images, twelve of them coming from the Waterloo BragZone standard database [15] and the remaining eight from the web. A large variability in testing conditions has been ensured by selecting test images containing patterns, smooth regions and details. They are all 8-bit grayscale images at a resolution of 512×512 pixels. The performance of the algorithm has been assessed from different points of view. The main aim of the tests is to underline the efficiency of the LT-based feature vector and the improvements given by LT-based quality measures. The compression ratio has been calculated as the ratio between the original image size and the coded image size. Because of the partial reversibility of the coding process, the fractal compression of the image adds noise to the original signal. Less added noise means greater image quality, and therefore a better algorithm. Noise is usually measured by the Peak Signal-to-Noise Ratio (PSNR), which in dB can be computed as follows:

PSNR = 10 · log₁₀( M · N · 255² / Σ_{m,n} (s_{m,n} − ŝ_{m,n})² ),

where M and N are the image width and height, 255 is the maximum pixel value, s_{m,n} is the pixel value in the original image and ŝ_{m,n} is the corresponding pixel in the decoded image. In order to further assess the performance of the hybrid scheme, we also compared it with Saupe's algorithm [20, 21].
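The PSNR formula above can be sketched in a few lines (a minimal pure-Python version for 8-bit images stored as lists of rows of gray levels):

```python
import math

def psnr(original, decoded):
    """Peak Signal-to-Noise Ratio in dB, per the formula above."""
    M, N = len(original[0]), len(original)   # width, height
    sse = sum((o - d) ** 2
              for orow, drow in zip(original, decoded)
              for o, d in zip(orow, drow))
    if sse == 0:
        return float("inf")                  # identical images
    return 10.0 * math.log10(M * N * 255.0 ** 2 / sse)
```

A maximally wrong image (every pixel off by 255) yields 0 dB, and the value grows as the decoded image approaches the original.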

In the first experiment, each test image is decoded at different compression ratios and the corresponding PSNR values are computed. This is repeated for all the quality measures from Section III-A. Figure 7 reports the mean curves over all test images. According to their performance in terms of PSNR, the quality measures can be grouped into three main classes:
• class I: NK, CQ, SC, Hosaka Plots;
• class II: IF, AD, MD, PMSE;
• class III: RMSE, HVS, MF.

The measures in class I provide very poor performance from an objective point of view, with images showing coding artifacts even at low compression ratios. Class II measures always outperform class I, but the PSNR is still lower than the one obtained with the RMSE, mainly as the compression ratio increases. On the contrary, measures in class III show the best PSNR curves, with the FFT magnitude measure (MF) outperforming the RMSE in most cases, while HVS and RMSE are almost comparable. From Fig. 7, it clearly emerges that the CQ, Hosaka, NK and SC quality measures provide very poor performance, which further confirms that the measures from group I are not effective at all when applied in PIFS coding. Notice that, except for the Hosaka Plots, none of the measures in group I is based on the difference between the original and the approximated ranges; products and ratios are usually involved instead, resulting in unstable quality measures when small values appear in the denominator. Fig. 7 also points out that the quality measures belonging to group II have quite comparable performance from an objective point of view. They show better performance than the first group, but not satisfactory enough. The main limitation of the AD and MD can be seen in the fact that they are founded on the absolute difference between the original and the approximated range values, which does not emphasize differences as much as the squared difference does. On the contrary, the substantial drawback of the IF and PMSE is the ratio with the values of the original range, which may be near zero, resulting in a very large distance. Therefore, group III still represents the set of best candidate measures to be embedded into the PIFS scheme. In particular, HVS and RMSE are almost comparable in performance, while FFT-MP significantly outperforms both. There are two main reasons motivating the superiority of the FFT-MP: a) the FFT retains most of the image information in its first coefficients, which makes it more robust than the RMSE with respect to small changes in details, by principally characterizing low frequencies; b) the ease of computing it as a sum of squared differences. Indeed, even though the HVS is based on the DCT (Discrete Cosine Transform), which often provides better performance in several image processing applications (image coding, filtering, indexing), it is less effective than FFT-MP, probably due to the complexity of the model.

[Figure 7: mean PSNR over all the images versus compression ratio, for the AD, CQ, Hosaka, HVS, IF, MD, FFT-MP, NK, PMSE, RMSE and SC measures.]

Fig. 7. Average PSNR curves over all the test images.

In the second experiment, the three variants of the new quality measure based on the properties of linear transforms were compared with the RMSE measure used in the baseline fractal coding model. The results presented in Fig. 8 highlight some aspects of particular interest. First, the performances provided by the Haar transform are particularly poor; this is probably due to the extreme simplicity of the transform, which in this case does not capture in its coefficients enough useful information to meaningfully assess the amount of distortion introduced by the range/domain approximation. On the other hand, it emerges that RMSE and LT+DCT provide nearly comparable performances in terms of PSNR; this further confirms what was previously asserted about the HSV model, namely that applying the cosine transform does not necessarily lead to an objective improvement of the image quality, even though from a subjective point of view the results seem better than those of the RMSE. Once again, then, LT+FFT appears to be the best choice. Although for small compression ratios (1:1 to 8:1) it falls slightly below the RMSE, for larger values it yields a constant improvement over the latter in terms of PSNR. We observed experimentally that keeping the real and the imaginary parts of the coefficients separate, in the way described in Section III-A, gives better results than the FFT-MP measure from both an objective and a subjective point of view; this is mainly explained by the fact that LT+FFT, as defined, preserves the signs of the individual coefficients and does not merge the information coming from the two components of each complex value, as FFT-MP instead does by resorting to the modulus of the complex coefficients. Moreover, by cutting the part of the block containing the smallest coefficients, related to the high frequencies, it retains only the information that is really useful in estimating the visual quality of the approximation.
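The difference between comparing moduli (as FFT-MP does) and comparing real and imaginary parts separately (as LT+FFT does) can be seen on a toy 1-D example; this is our own illustration, not the exact definitions of Section III-A. A block and its mirror image share the same magnitude spectrum, so a modulus-based distance cannot tell them apart, while a distance that keeps the two components separate can.

```python
import numpy as np

def modulus_distance(a, b):
    """FFT-MP style: only the moduli of the complex coefficients are compared."""
    return np.sum((np.abs(np.fft.fft(a)) - np.abs(np.fft.fft(b))) ** 2)

def real_imag_distance(a, b):
    """LT+FFT style: real and imaginary parts are compared separately,
    so the signs/phases of the coefficients are preserved."""
    d = np.fft.fft(a) - np.fft.fft(b)
    return np.sum(d.real ** 2 + d.imag ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = x[::-1]                       # a clearly different block
print(modulus_distance(x, y))     # ~ 0: the moduli cannot tell them apart
print(real_imag_distance(x, y))   # ~ 80: the phase information distinguishes them
```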

[Figure 8 plot: average PSNR/CR curve, PSNR (28–38 dB) versus compression ratio (0–50), one curve each for LT+DCT, LT+FFT, LT+Haar and RMSE.]
Fig. 8. Average PSNR performance of RMSE and LT based quality measures (FFT, DCT, Haar) over all the test images.

From the formula in Equation 5 it can be seen that LT vanishes when T(α·D + β̄) → T(R), that is, when α·D + β̄ → R, which is the same condition underlying the RMSE; hence, as one would expect, LT(R, α·D + β̄) → 0 when RMSE(R, α·D + β̄) → 0. In particular, from an experimental point of view we observed that, for sufficiently small values of the RMSE (< 30 in our case), the values of α and β computed with formulas (2) and (7) tend to coincide when RMSE(R, α·D + β̄) → 0. However, the values provided by formulas (7) give slightly lower PSNR, but with a subjective quality that is in some cases better. Figure 9 shows two details of the Barbara test image coded with the three variants of the LT based measure (DCT, FFT and Haar) and with the RMSE measure.
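That LT vanishes together with the RMSE can be made concrete through Parseval's relation: for an orthonormal transform, the error energy is identical in the pixel and coefficient domains. Below is a minimal numeric check with a hand-built orthonormal DCT-II matrix (our own sketch; the actual LT measure also truncates the high-frequency coefficients, which this check omits).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, so that C @ C.T == I."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C *= np.sqrt(2.0 / n)
    C[0] *= np.sqrt(0.5)
    return C

n = 8
C = dct_matrix(n)
rng = np.random.default_rng(1)
R = rng.uniform(0, 255, n)          # a "range" block (1-D for simplicity)
Rhat = R + rng.normal(0, 2, n)      # its approximation

# Parseval: the error energy is the same in both domains, so the
# transform-domain distance vanishes exactly when the RMSE does.
pixel_err = np.linalg.norm(R - Rhat)
coeff_err = np.linalg.norm(C @ R - C @ Rhat)
print(np.isclose(pixel_err, coeff_err))   # True
```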

[Figure 9 panels: LT+FFT, LT+Haar, RMSE, LT+DCT.]
Fig. 9. Comparison between LT based quality measures and the RMSE by a magnification of two different regions from the Barb image (with the compression ratio fixed at 12:1).

In the last experiment, the contribution of coding the residual information with the technique described in Section III-C was evaluated. Figure 10 shows the average PSNR/CR values obtained over the 20 test images; each quadrant of the figure reports the results obtained with one of the four measures without residual coding, with the residual coded through the DCT, and with the residual coded through the Haar transform. In particular, we observe that the contribution of residual coding is significant mainly at relatively low compression ratios. This is to be ascribed to the very nature of the method: as the compression ratio grows, the distortion introduced by the fractal coding increases significantly, widening the interval [ε, 2ε] over which the coefficients of the residual to be coded are distributed (in absolute value), so that they can hardly be approximated by a single pair of values {−3/2ε, +3/2ε}. We also observe that the performances provided by the DCT and by the Haar transform in coding the residual information are nearly identical.
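The two-level residual quantization discussed above can be sketched as follows; this is our own reading of the scheme in Section III-C (the function name and the handling of coefficients outside [ε, 2ε] are assumptions): coefficients whose absolute value falls in [ε, 2ε] are replaced by ±3/2·ε according to their sign.

```python
import numpy as np

def quantize_residual(res, eps):
    """Two-level residual quantization sketch: coefficients with |c| in
    [eps, 2*eps] are mapped to the interval midpoint +/- 3/2*eps with
    their original sign; coefficients outside the interval are dropped
    (an assumption of this sketch)."""
    out = np.zeros_like(res)
    mask = (np.abs(res) >= eps) & (np.abs(res) <= 2 * eps)
    out[mask] = np.sign(res[mask]) * 1.5 * eps
    return out

res = np.array([-3.1, -1.7, -0.4, 0.9, 1.2, 1.9])
print(quantize_residual(res, 1.0))   # [ 0.  -1.5  0.   0.   1.5  1.5]
```

As ε grows with the compression ratio, a single pair of reconstruction levels covers an ever wider interval, which is exactly why the gain of residual coding fades at high compression ratios.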

[Figure 10: four panels of PSNR (28–40 dB) versus compression ratio (0–50), one per quality measure (DCT, FFT, Haar, RMSE), each comparing the measure alone with its DCT-coded and Haar-coded residual variants.]
Fig. 10. PSNR versus Compression Ratio with and without LT based coding of the residual information.

V. CONCLUSIONS

Fractal coding is a particularly fertile research area, thanks to its countless side applications. Most of the works presented in the literature aim at speeding up the coding phase, whose slowness is the bottleneck in practical applications of PIFS; however, little has been done to improve the quality of the coded image. In this work, PIFS and linear transforms are combined in order to improve both the objective and the subjective quality of the image. The linear transforms are integrated at two different levels: in a first phase they are used to define a quality measure replacing the commonly adopted RMSE, while in a second step they allow the residual information to be coded. The experiments have been organized so as to show how each of these two forms of integration contributes to improving both the objective and the subjective quality of the image. Moreover, three different linear transforms (FFT, DCT, Haar) have been considered, in order to highlight the weight each of them can carry in the coding phase. The results obtained in this study also represent an encouraging basis for further investigation, such as the use of linear transforms for the speed-up of the coding, so as to achieve a complete integration between PIFS and linear transforms.

REFERENCES

[1] Avcibaş İ., Sankur B., and Sayood K., Statistical evaluation of image quality measures, in Journal of Electronic Imaging, vol. 11, no. 2, pp. 206–223, April 2002.
[2] Aggarwal C., On the Effects of Dimensionality Reduction on High Dimensional Search, IBM T. J. Watson Research Center, ACM PODS Conference, YorkTown, pp. 1–11, 2001.
[3] Bani-Eqbal B., Speeding Up Fractal Image Compression, in Proceedings of the IS&T/SPIE 1995 Symposium on Electronic Imaging: Science & Technology, vol. 2418, San Jose, California, pp. 67–74, September 1995.
[4] Bentley J. L., Multidimensional Binary Search Trees Used for Associative Searching, in Communications of the ACM, vol. 18, no. 9, pp. 509–517, September 1975.
[5] De Oliveira J. F. L., Mendonça G. V. and Dias R. J., A Modified Fractal Transformation to Improve the Quality of Fractal Coded Images, in IEEE Signal Processing Society 1998 International Conference on Image Processing, pp. 4–7, October 1998.
[6] Distasi R., Nappi M., Tucci M., FIRE: Fractal Indexing with Robust Extensions for Image Databases, in IEEE Transactions on Image Processing, vol. 12, no. 3, pp. 373–384, March 2003.
[7] Eskicioglu A. M. and Fisher P. S., Image quality measures and their performance, in IEEE Transactions on Communications, vol. 43, no. 12, pp. 2959–2965, December 1995.
[8] Fisher Y., Fractal Image Compression: Theory and Application, Springer-Verlag, New York, 1994.
[9] Hamzaoui R., Hartenstein H., and Saupe D., Local Iterative Improvement of Fractal Image Codes, in Image and Vision Computing, vol. 18, no. 6/7, pp. 565–568, 2000.
[10] Hamzaoui R., Saupe D. and Hiller M., Distortion Minimization with Fast Local Search for Fractal Image Compression, in Journal of Visual Communication and Image Representation (Academic Press), vol. 12, no. 12, pp. 450–468, December 2001.
[11] Hamzaoui R., Saupe D. and Hiller M., Fast code enhancement with local search for fractal image compression, in Proceedings of IEEE International Conference on Image Processing (ICIP-2000), vol. 2, pp. 156–159, 2000.
[12] Hartenstein H., Ruhl M. and Saupe D., Region-based fractal image compression, in IEEE Transactions on Image Processing, vol. 9, no. 7, pp. 1171–1184, July 2000.
[13] Hartenstein H., Ruhl M., Saupe D. and Vrscay E. R., On the Inverse Problem of Fractal Compression, in Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, Bernold Fiedler (ed.), Springer Verlag, August 2001.
[14] Komleh H. E., Chandran V., Sridharan S., Face Recognition Using Fractal, in Proceedings of IEEE International Conference on Image Processing (ICIP 2001), vol. 3, no. 1, pp. 58–61, 7-10 October 2001.
[15] Kominek J., Waterloo BragZone and Fractals Repository, http://links.uwaterloo.ca/bragzone.base.html, 25 January 2007.
[16] Nguyen K. G., Saupe D., Adaptive post-processing for fractal image compression, in Proceedings of IEEE International Conference on Image Processing (ICIP 2000), September 2000.
[17] Nill N. B., A visual model weighted cosine transform for image compression and quality assessment, in IEEE Transactions on Communications, vol. 33, no. 6, pp. 551–557, June 1985.
[18] Polvere M. and Nappi M., Speed-Up in Fractal Image Coding: Comparison of Methods, in IEEE Transactions on Image Processing, vol. 9, no. 6, pp. 1002–1008, June 2000.
[19] Popescu C., Dimca A. and Yan H., A Nonlinear Model for Fractal Image Coding, in IEEE Transactions on Image Processing, vol. 6, no. 3, pp. 373–382, March 1997.
[20] Riccio D., Nappi M., Deferring Range/Domain Comparison in Fractal Image Compression, in Proceedings of the International Conference on Image Analysis and Processing, Mantova, Italy, September 2003.
[21] Distasi R., Nappi M. and Riccio D., A Range/Domain Approximation Error Based Approach for Fractal Image Compression, in IEEE Transactions on Image Processing, vol. 15, no. 1, pp. 89–97, January 2006.
[22] Saupe D., Hamzaoui R., Complexity Reduction Methods for Fractal Image Compression, in I.M.A. Conference Proceedings on Image Processing: Mathematical Methods and Applications, pp. 1–24, September 1994.
[23] Roberts S., Everson R., Independent Component Analysis: Principles and Practice, Cambridge University Press, Cambridge, UK, 2000.
[24] Wohlberg B. and de Jager G., Fast image domain fractal compression by DCT domain block matching, in Electronics Letters, vol. 31, no. 11, pp. 869–870, May 1995.
[25] Wu J.-L. and Duh W.-J., Feature extraction capability of some discrete transforms, in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 5, pp. 2649–2652, June 1991.