
Visual textures as realizations of multivariate log-Gaussian Cox processes

Huu-Giao Nguyen, Ronan Fablet, Jean-Marc Boucher

Institut Telecom / Telecom Bretagne / LabSTICC

Université européenne de Bretagne

{huu.nguyen;ronan.fablet;jm.boucher}@telecom-bretagne.eu

Abstract

In this paper, we address invariant keypoint-based texture characterization and recognition. Viewing keypoint sets associated with visual textures as realizations of point processes, we investigate probabilistic texture models built from multivariate log-Gaussian Cox processes. These models are parameterized by the covariance structure of the spatial patterns. Their implementation initially relies on the construction of a codebook of the visual signatures of keypoints. We discuss invariance properties of the proposed models for texture recognition applications and report a quantitative evaluation on three texture datasets, namely UIUC, KTH-TIPs and Brodatz. These experiments include a comparison of the performance reached using different methods for keypoint detection and characterization, and demonstrate the relevance of the proposed models w.r.t. state-of-the-art methods. We further discuss the main contributions of the proposed approach, including the key features of the statistical model and complexity aspects.

1. Introduction

Texture information is among the key features of interest for the robust characterization and recognition of visual scenes. A variety of methods can be found in the literature for texture recognition applications, from the early Haralick cooccurrence features [8] and statistics of the response to scale-space filters such as Gabor and wavelet analysis [22], to more recent methods embedding invariance properties such as keypoint-based settings [4, 5, 15], multifractal schemes [27], topographic map [26] or local binary pattern [10] descriptors.

The renewed interest in texture analysis emerged from the application of visual keypoints [1, 3, 16, 24] to reach texture descriptions invariant to geometric and photometric image transforms, e.g. affine transforms and contrast changes. The classical keypoint-based setting consists in stating texture recognition as a voting-based output of the set of keypoints attached to a given visual texture [4, 13]. Such an approach is guaranteed to inherit the invariance properties of the local keypoints, and advanced statistical learning strategies can efficiently be implemented, including random forests [2] and SVMs [12]. However, such an approach relies only on the visual signatures of the keypoints and discards any spatial information in terms of the spatial patterns formed by the keypoint sets.

Our previous work has shown that descriptive statistics of spatial point processes provide a relevant basis for jointly characterizing the spatial and visual signatures of the keypoint set attached to a visual texture [21]. While making no explicit reference to spatial point processes and their associated descriptive statistics, second-order descriptive statistics of spatial keypoint patterns were previously considered for scene categorization [15] and robot navigation [5]. Here we further investigate to which extent visual textures can be viewed as realizations of multivariate spatial point processes; rather than descriptive statistics, we aim at delivering a specific formal model, and we explore in this context the relevance of a class of spatial processes, namely log-Gaussian Cox processes. Log-Gaussian Cox processes, introduced by Møller et al. [20], provide models for the spatial distribution of multivariate point sets. As they relate to a model of the covariance of count variables, they were shown to be easy to analyse and flexible for experiments in spatial statistical analysis, especially in environment-related sciences [20] or disease surveillance [6]. It might be noted that spatial point processes were previously investigated for texture analysis, e.g. Lafarge et al. [7] applied a spatial Gibbs point model and a jump-diffusion process for the extraction of geometric features in texture images. Such Gibbs models are however not suited for recognition issues. Overall, the specification of log-Gaussian Cox processes resorts to the estimation of the covariance structure of the count variables of the multivariate point processes. Simple estimation procedures can be derived for different types of covariance structure, and an invariant texture characterization follows from the model parameters. Overall, the main contributions of this paper are three-fold:

• Addressing invariant texture characterization and modelling from log-Gaussian Cox processes of visual keypoint patterns.

Figure 1. Keypoint positions detected by FH+SURF (a). Codebook construction of visual keypoints and spatial statistical characterization with different circular study regions (b). Each colored point depicts the category of a keypoint. Cases where a study circle intersects the image boundary are illustrated.

• Testing the implementation of these models with several covariance models for different types of keypoint detectors and descriptors.

• Demonstrating the relevance of the proposed models for texture recognition with respect to previous work.

This paper is organized as follows. In Section 2, a brief overview of the proposed approach and related work is given. We present in Section 3 the proposed probabilistic keypoint-based texture model based on multivariate log-Gaussian Cox processes. Comparative evaluations of texture recognition performance are reported for several databases in Section 4. We further discuss the main contributions of the proposed approach in Section 5.

2. Proposed approach and related work

The general goals of this paper are the characterization and modeling of visual textures from the spatial patterns formed by visual keypoints using point process models. The initial step then consists in detecting local keypoints in texture images (Fig.1a). To simplify statistical estimation, we build a codebook of visual keypoints from their visual signatures using adapted clustering techniques, such that any visual keypoint is assigned to a category (Fig.1b). Regarding visual keypoint sets as finite spatial random sets, log-Gaussian Cox models are investigated to reach an invariant texture characterization from the covariance structure of the spatial patterns. State-of-the-art approaches based on visual keypoints are detailed below.
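The codebook step above can be sketched with a plain k-means clustering of the descriptor vectors. This is only an illustrative sketch: the 8-D toy descriptors below are hypothetical stand-ins for SIFT-like signatures, and the deterministic initialization is a simplification.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20):
    """Cluster keypoint descriptors into k visual words with Lloyd's
    k-means; returns the codebook centers and each keypoint's category."""
    # deterministic initialization: k evenly spaced descriptors
    centers = descriptors[:: max(1, len(descriptors) // k)][:k].copy()
    for _ in range(n_iter):
        # assign each descriptor to its nearest center
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers, labels

# toy stand-in for keypoint signatures: two well-separated 8-D clusters
rng = np.random.default_rng(0)
desc = np.concatenate([rng.normal(0.0, 0.1, (100, 8)),
                       rng.normal(1.0, 0.1, (100, 8))])
centers, labels = build_codebook(desc, k=2)
```

Each keypoint then carries a discrete category label, which is exactly the mark used by the multivariate point process models of Section 3.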

When addressing matching and recognition issues in images, the typical approach relies on learning models in the feature space defined by local visual descriptors. The focus has been given to visual signatures invariant to geometric and photometric transformations of the images [3, 19, 27]. Among the most popular descriptors, local keypoints were shown to be particularly efficient [13, 18] compared to the early features developed for texture analysis such as Gabor features [22] and cooccurrence matrices [8].

                        Keypoint density   Descriptor size
DoG+Sift                     1582               128
FH+Surf                       758                64
(Har-Lap)+(Sift-Spin)         538               178
(Hes-Lap)+Daisy              1216               200
FH+Brief                      758               256

Table 1. Number of detected keypoints and size of the signature vector for the different detector-descriptor types. The processed texture pattern is the image displayed in Fig.1a.

Numerous approaches have been proposed to detect regions or points of interest in images. Among the most popular, the Harris detector [9] detects corners, i.e. points at which significant intensity changes occur in two directions. It relies on the eigen-decomposition of the structure tensor of the intensity function. Scale-space approaches based on the analysis of the Hessian matrix were also proposed to address scale adaptation [14]. Scale-spaces of Differences of Gaussians (DoG) are also widely considered as an approximation of the Laplacian [16]. More recently, Mikolajczyk et al. [19] combined the Harris or Hessian detector with the Laplacian operator (for scale adaptation) to propose two scale-invariant feature detectors, namely Harris-Laplace (Har-Lap) and Hessian-Laplace (Hes-Lap). Bay et al. [1] presented the Fast-Hessian (FH) detector based on the Hessian matrix computed on integral images. Other categories of keypoint detectors may be cited, for instance the maximally stable extremal region (MSER) detector [17], the edge-based region (EBR) detector, the intensity extrema-based region (IBR) detector [25] or entropy-based region (such as salient region) detectors [11]. Comparisons between the different detectors are given in [1, 13, 19].

Given the pixel coordinates of the extracted keypoints, many different schemes have been proposed to extract a feature vector for each keypoint $s_i$, in which invariances to contrast change and geometric transforms, typically affine transforms, are embedded [10, 18, 28]. The SIFT descriptor is certainly among the most popular and relevant ones. It is formed by the distribution of the orientations of the gradient of the intensity in 4x4 windows around the considered point [16]. This description ensures contrast invariance and partial invariance to affine transforms. Orientations are typically quantized over eight values, such that the SIFT feature vector is 128-dimensional. Several extensions of the original SIFT descriptor have been proposed, including GLOH, PCA-SIFT and RIFT (see [13] for a review).

For instance, an intensity-domain spin image [13] is a 2D histogram encoding the distribution of the intensity values and the distance from the reference point. Rather than considering gradient orientations, the SURF descriptor [1] relies on the distribution of Haar-wavelet responses, whereas the Daisy descriptor [24] exploits responses to oriented Gaussian filters; the Brief descriptor [3] builds a binary string from a relatively small number of intensity comparisons within an image patch.

From these reviews, we investigate five robust detector/descriptor types reported with the best performance in [1, 3, 16, 24, 28], respectively: FH+Surf, FH+Brief, DoG+Sift, (Hes-Lap)+Daisy and (Har-Lap)+(Sift-Spin). As illustrated in Tab.1 for the texture sample displayed in Fig.1a, these different combinations lead to different complexity levels as well as large differences in the number of detected keypoints, a critical aspect when considering keypoint statistics.

3. Multivariate log-Gaussian Cox process

3.1. Multivariate point process and associated descriptive statistics

A spatial point process $S$ is defined as a locally finite random subset of a given bounded region $B \subset \mathbb{R}^2$. A realization of such a process is a spatial point pattern $s = \{s_1, ..., s_n\}$ of $n$ points contained in $B$. Considering a realization of the point process, the moments of the count variables are relevant descriptive statistics. In the general case, the $p$th-order moment of $S$ is defined as:

$$\mu^{(p)}(B_1 \times ... \times B_p) = E\{N(B_1) \cdots N(B_p)\} \quad (1)$$

where $E\{\cdot\}$ denotes the expectation and $N(B_i)$ is the number of random points contained in a given Borel set $B_i$. Focusing on the intensity measure of $S$, the first-order moment is evaluated with $p = 1$:

$$\mu(B) = E \sum_{s \in S} I_B(s) = \int_B \rho(s)\, ds \quad (2)$$

where $I_B(s)$ is an indicator function that takes the value 1 when $s$ falls in region $B$, and $\rho(s)\,ds$ is the probability that one point falls in an infinitesimally small area $ds$ in the neighborhood of point $s$. The normalized first-order moment $\lambda = \mu(B)/|B|$ is the mean density of expected points per surface unit, where $|B|$ is the surface of region $B$. This quantity fully characterizes Poisson point processes. For a homogeneous process, this density is spatially constant.
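As a minimal numerical check of the normalized first-order moment, the intensity of a homogeneous pattern can be estimated by the count per unit area, $\hat{\lambda} = N(B)/|B|$. The homogeneous Poisson sample below is synthetic, used only to exercise the estimator:

```python
import numpy as np

def mean_intensity(points, area):
    """Normalized first-order moment: lambda = mu(B)/|B|, the mean
    number of points per surface unit (Eq. 2 divided by the area)."""
    return len(points) / area

# synthetic homogeneous Poisson pattern on a 10x10 window, true intensity 5
rng = np.random.default_rng(0)
n = rng.poisson(5 * 100)
points = rng.uniform(0.0, 10.0, size=(n, 2))
lam_hat = mean_intensity(points, area=100.0)
```

For a homogeneous Poisson process this single number is a sufficient summary, which is why the second-order structure below is needed to go further.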

Beyond the first-order moment, the covariance structure of the count variables, i.e. descriptive statistics of the pairs of points of the finite random set, can be characterized by the second-order moment $\mu^{(2)}$ of $S$, parameterized as:

$$\mu^{(2)}(B_1 \times B_2) = E \sum_{s_1 \in S} \sum_{s_2 \in S} I_{B_1}(s_1)\, I_{B_2}(s_2) \quad (3)$$
$$= \int_{B_1 \times B_2} \rho^{(2)}(s_1, s_2)\, ds_1\, ds_2 \quad (4)$$

where the second-order density $\rho^{(2)}(s_1, s_2)$ is interpreted as the density, per surface unit, of the pair of points $s_1$ and $s_2$ in infinitesimally small areas $ds_1$ and $ds_2$. For a stationary and isotropic point process, this density function $\rho^{(2)}(s_1, s_2)$ states the correlation of pairs of points and only depends on the distance $\|s_1 - s_2\|$ [23]. In the spatial point process literature, the second-order measure $\mu^{(2)}$ is frequently replaced by the factorial moment measure $\alpha^{(2)}$:

$$\alpha^{(2)}(B_1 \times B_2) = E \sum_{s_1 \in S} \sum_{s_2 \in S,\, s_2 \neq s_1} I_{B_1}(s_1)\, I_{B_2}(s_2) \quad (5)$$

where the relation between the second-order measure $\mu^{(2)}$ and the factorial moment measure $\alpha^{(2)}$ is given by:

$$\alpha^{(2)}(B_1 \times B_2) = \mu^{(2)}(B_1 \times B_2) - \mu(B_1 \cap B_2) \quad (6)$$

A multivariate point process $\Psi$ is defined as a spatial point process for which a discrete mark $m_i$ is associated with each point $s_i$ in $B$. The second-order moment in Eq.4 can be extended to multivariate point patterns. Considering a circular study region $D(\cdot, r)$ with radius $r$ (Fig.1b), the second-order cooccurrence statistics of $\Psi$ are characterized by the factorial moment measure as follows:

$$\alpha^{(2)}_{i,j}(r) = E \sum_h \sum_{l \neq h} \delta_i(m_h)\, \delta_j(m_l)\, I(\|s_h - s_l\| \leq r) \quad (7)$$

where $\delta_i(m_h)$ equals 1 if the mark $m_h$ of point $s_h$ is $i$, and 0 otherwise. For the statistical interpretation of the second-order moment $\mu^{(2)}$ [23], Ripley's K function, which is usually used to analyse the mean number of points of type $j$ located in a study region of radius $r$ centered at the points of type $i$ (the center point itself being excluded), is measured as:

$$K_{ij}(r) = (\lambda_i \lambda_j)^{-1}\, \alpha^{(2)}_{ij}(r) \quad (8)$$
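Eqs. 7-8 translate into a small empirical estimator: count marked pairs within distance r and normalize by the estimated intensities. The sketch below omits edge correction for brevity (an assumption: values are slightly biased downward near the boundary), and the two-type pattern is synthetic:

```python
import numpy as np

def cross_k(points, marks, i, j, r, area):
    """Empirical cross-type Ripley K: the factorial moment alpha^(2)_ij(r)
    per unit area, normalized by lambda_i * lambda_j as in Eq. 8."""
    pi, pj = points[marks == i], points[marks == j]
    lam_i, lam_j = len(pi) / area, len(pj) / area
    d = np.linalg.norm(pi[:, None, :] - pj[None, :, :], axis=-1)
    if i == j:
        np.fill_diagonal(d, np.inf)  # exclude the center point (l != h)
    alpha2 = (d <= r).sum() / area
    return alpha2 / (lam_i * lam_j)

# two independent uniform patterns: K_ij(r) should be close to pi * r^2
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(600, 2))
marks = np.repeat([0, 1], 300)
k_est = cross_k(points, marks, 0, 1, r=1.0, area=100.0)
```

The benchmark value $\pi r^2$ for independent patterns is what makes K useful: clustered cross-patterns lie above it, inhibited ones below.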

3.2. Log-Gaussian Cox model

A Cox process Xwith random intensity function Zis a

point process such that X|Zis a Poisson process with in-

tensity function Z[20, 23]. For an univariate log Gaussian

Cox process Xon a locally ﬁnite subset S⊂R2, the ran-

dom intensity function is given by Z= exp(Y), where Y is

a Gaussian ﬁeld on Scharacterized by its mean µ=EY (s)

and covariance functions c(r) = Cov(Y(s1), Y (s2)),

where r=ks1−s2k, are deﬁned and ﬁnite for all bounded

B⊂S. An important property of log-Gaussian Cox pro-

cess is that the characteristics of the Gaussian ﬁeld Y relate

to the ﬁrst and second-order moments of the point process.

More precisely, the following relations hold [20]:

(ρ(s) = λ= exp(µ+σ2/2)

ρ(2)(s1, s2)/(ρ(s1)ρ(s2)) = g(r) = exp(c(r))

(9)

2947

Figure 2. Intensity estimation on a random set of 435 points issued

from log Gaussian Cox process.

are respectively the intensity and the pair correlation

function, where σ2=V ar(Y(s)) is the variance of the

Gaussian process. We report an example of the intensity es-

timation issued from log Gaussian Cox processes in Fig.2.
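The defining construction, a Poisson process conditioned on $Z = \exp(Y)$, can be simulated directly on a grid. The sketch below draws the Gaussian field by Cholesky factorization with a Gaussian covariance; the specific parameter values and the grid resolution are illustrative choices, not values from the paper:

```python
import numpy as np

def simulate_lgcp_counts(mu, sigma2, beta, n_grid=32, size=1.0, seed=0):
    """Simulate a univariate log-Gaussian Cox process on an n_grid x n_grid
    lattice and return the total point count. The latent field Y has mean
    mu and covariance c(r) = sigma2 * exp(-(r/beta)^2); Z = exp(Y) is the
    random intensity, and cell counts are Poisson given Z."""
    rng = np.random.default_rng(seed)
    xs = (np.arange(n_grid) + 0.5) * size / n_grid
    gx, gy = np.meshgrid(xs, xs)
    coords = np.column_stack([gx.ravel(), gy.ravel()])
    r = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    cov = sigma2 * np.exp(-(r / beta) ** 2)
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(cov)))  # jitter for stability
    y = mu + L @ rng.standard_normal(len(cov))
    z = np.exp(y)                                   # random intensity field
    cell_area = (size / n_grid) ** 2
    return rng.poisson(z * cell_area).sum()

# Eq. 9 predicts E[N] = exp(mu + sigma2/2) * |B|; check by Monte Carlo
mu = np.log(500.0) - 0.5      # so that lambda = 500 with sigma2 = 1
totals = [simulate_lgcp_counts(mu, 1.0, 0.05, seed=s) for s in range(10)]
```

Averaging the totals over seeds should recover $\lambda |B| = 500$ up to Monte Carlo error, illustrating the first relation of Eq. 9.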

Extending to a multivariate log-Gaussian Cox process, the Cox processes $\{X_i\}$ are conditionally independent w.r.t. a multivariate intensity field $Z = \{Z_i\}$, and $X_i|Z_i$ is a Poisson process with intensity measure $Z_i$. $Z$ relates to a multivariate Gaussian field $Y$ as $Z_i = \exp(Y_i)$. The multivariate Gaussian random field is characterized by its means $\mu_i(s)$ and covariance functions $c_{ij}(r) = \mathrm{Cov}(Y_i(s_1), Y_j(s_2))$. The intensity and pair correlation functions become:

$$\lambda_i = \exp(\mu_i + \sigma_i^2/2); \quad g_{ij}(r) = \exp(c_{ij}(r)) \quad (10)$$

Fitting a stationary parametric log-Gaussian Cox process comes down to estimating the mean and covariance parameters of the associated Gaussian field. Following [20, 23], an estimation procedure relies on the relation between the pair correlation function $g_{ij}$ and the K-function:

$$K_{ij}(R) = 2\pi \int_0^R r\, g_{ij}(r)\, dr \quad (11)$$

where $R$ is a pre-defined radius value. Combining Eq.8 and Eq.11, the pair correlation function can be estimated as:

$$g_{ij}(r) = \frac{1}{2\pi r \lambda_i \lambda_j} \sum_h \sum_{l \neq h} \delta_i(m_h)\, \delta_j(m_l)\, \frac{\xi(\|s_h - s_l\|, r)}{b_{s_h}} \quad (12)$$

where $\xi(\cdot)$ is a kernel (here a Gaussian kernel is considered), $\lambda_i$ is the intensity of class $i$ estimated from Eq.2, and $b_{s_h}$ is the proportion of the circumference of the study circle lying within the image. In practice, the computation of the above second-order descriptive statistics takes edge effects into account [21]. With this edge-effect correction, $g_{ij}$ is not symmetric in $i$ and $j$. Hence, the non-parametric estimation of the covariance function is defined as:

$$c_{ij}(r) = \log \frac{\lambda_i\, g_{ij}(r) + \lambda_j\, g_{ji}(r)}{\lambda_i + \lambda_j} \quad (13)$$
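A minimal version of the estimators in Eqs. 12-13 can be written as follows. For brevity the boundary term $b_{s_h}$ is set to 1, i.e. no edge correction, an assumption that only holds away from the image border; the bandwidth value and the test pattern are illustrative:

```python
import numpy as np

def pair_correlation(points, marks, i, j, r, area, bw=0.05):
    """Kernel estimate of g_ij(r) as in Eq. 12, with b_sh = 1
    (no edge correction) for brevity."""
    pi, pj = points[marks == i], points[marks == j]
    lam_i, lam_j = len(pi) / area, len(pj) / area
    d = np.linalg.norm(pi[:, None, :] - pj[None, :, :], axis=-1)
    if i == j:
        np.fill_diagonal(d, np.inf)
    # Gaussian kernel xi(|s_h - s_l|, r)
    k = np.exp(-0.5 * ((d - r) / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))
    return k.sum() / (2.0 * np.pi * r * lam_i * lam_j * area)

def covariance_estimate(points, marks, i, j, r, area):
    """Symmetrized covariance c_ij(r) of the latent Gaussian field (Eq. 13)."""
    lam_i = (marks == i).sum() / area
    lam_j = (marks == j).sum() / area
    gij = pair_correlation(points, marks, i, j, r, area)
    gji = pair_correlation(points, marks, j, i, r, area)
    return np.log((lam_i * gij + lam_j * gji) / (lam_i + lam_j))

# for independent uniform patterns g_ij should be near 1 and c_ij near 0
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(600, 2))
marks = np.repeat([0, 1], 300)
g = pair_correlation(points, marks, 0, 1, r=0.5, area=100.0)
c = covariance_estimate(points, marks, 0, 1, r=0.5, area=100.0)
```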

Exponential: $\exp(-(r/\beta)^\alpha)$    Cardinal sine: $\sin(r/\beta)/(r/\beta)$    Hyperbolic: $(1 + r/\beta)^{-1}$

Table 2. Different correlation functions $L(\beta, r)$.

To resort to a compact probabilistic model for the representation of visual textures, we investigate parametric forms of the covariance function $c$. Given a chosen parameterization $L(\beta, r)$ from Tab.2, the model parameters are estimated by minimizing the following criterion:

$$\int_0^R \left(\sigma^2_{ij}\, L(\beta, r) - c_{ij}(r)\right)^2 dr \quad (14)$$

A gradient-based optimization procedure is applied to solve this minimization. The proposed probabilistic keypoint-based texture model is eventually given by the intensity parameters $\lambda_i$, the variances $\sigma_{ij}$ and the scale parameters $\beta_{ij}$.
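The minimization of Eq. 14 can be sketched with a plain gradient descent over a discretized radius grid. The Gaussian covariance $\sigma^2 \exp(-(r/\beta)^2)$ (the exponential form of Tab. 2 with $\alpha = 2$), the learning rate and the synthetic noiseless target below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def fit_covariance(r, c_obs, lr=0.05, n_steps=8000):
    """Fit sigma^2 * exp(-(r/beta)^2) to a covariance estimate c_obs(r)
    by gradient descent on the discretized criterion of Eq. 14."""
    sig2, beta = 1.0, 0.5                        # initial guess
    for _ in range(n_steps):
        L = np.exp(-(r / beta) ** 2)
        resid = sig2 * L - c_obs
        # partial derivatives of sum(resid^2) w.r.t. sigma^2 and beta
        g_sig2 = 2.0 * (resid * L).sum()
        g_beta = 2.0 * (resid * sig2 * L * 2.0 * r ** 2 / beta ** 3).sum()
        sig2 -= lr * g_sig2 / len(r)
        beta -= lr * g_beta / len(r)
    return sig2, beta

# synthetic noiseless target with sigma^2 = 1.5 and beta = 0.3
r_grid = np.linspace(0.05, 1.0, 40)
c_target = 1.5 * np.exp(-(r_grid / 0.3) ** 2)
sig2_hat, beta_hat = fit_covariance(r_grid, c_target)
```

In practice the target would be the non-parametric estimate of Eq. 13 rather than a synthetic curve, and the recovered $(\sigma^2, \beta)$ pairs form the texture descriptor.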

3.3. Feature dimension reduction

Considering the parameters of the log-Gaussian Cox model as descriptors of the spatial patterns of visual keypoints, each texture image is associated with a $k(k+2)$-dimensional feature vector, where $k$ is the size of the codebook of visual words. In practice, such high-dimensional features may affect recognition performance. State-of-the-art methods based on visual keypoints typically involve dimensionality in the range of $k$, e.g. bag-of-keypoints [4] or the (Har+Lap)(Sift+Spin) scheme [28]. Hence dimension reduction issues should be further analyzed.

A dimension reduction procedure for second-order statistics was introduced in [21] from the determination of categories of keypoint pairs. The codebook of keypoint pairs, denoted by $u = M(s_i, s_j)$, is issued from an adapted clustering technique applied to each pair of categorized keypoints $s_i$ and $s_j$. The non-parametric estimation of the covariance function is then given by:

$$c_u(r) = \log \frac{1}{2\pi r \lambda_u} \sum_h \sum_{l \neq h} \delta_u(M(s_h, s_l))\, \frac{\xi(\|s_h - s_l\|, r)}{b_{s_h}} \quad (15)$$

The estimation of the intensity parameter $\lambda_u$, variance $\sigma_u$ and scale parameter $\beta_u$ for each category of keypoint pairs follows as previously from the minimization of Eq.14. Overall, this procedure downsizes the proposed texture descriptor to $3k^*$-dimensional vectors, where $k^*$ is the size of the codebook of keypoint pairs.

3.4. Invariance properties

Invariance properties of the resulting texture characterization are inherited from the characteristics of the chosen visual keypoint signatures. Image scaling, however, clearly affects the second-order moments of the spatial patterns. More precisely, assuming that the detection and characterization of visual keypoints are scale-invariant as pointed out in [21], the intensity and covariance parameters of a given texture observed at two scales relate up to a scale factor. This scale factor can be estimated from the ratio of average point densities per surface unit. In this work, the actual radius values $R_i$ of the proposed estimation scheme were chosen depending on a reference image. Fig.3 further illustrates the stability of the proposed features at different image scales.

Figure 3. Scaling effect on the parameter estimation of the keypoint-based log-Gaussian Cox model. Reference image (a) with $\alpha_{ref} = 1$ and test image (b) with $\alpha_{test} = 2$. The red curves show the variances $\sigma_u$ (c, e) and the scale parameters $\beta_u$ (d, f) of the reference image $I_{ref}$ in all plots. The parameter estimates for the test image $I_{test}$ without (respectively with) scale adaptation are shown in blue (respectively green). The plots were obtained with feature dimensionality reduction.
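Under the stated assumption of scale-invariant keypoint detection, the scale factor between two observations of the same texture can be read off the point densities. The keypoint sets below are synthetic stand-ins (the test set is the reference set with coordinates doubled, hence the same points spread over a 4x larger area):

```python
import numpy as np

def scale_factor(points_ref, area_ref, points_test, area_test):
    """Estimate the relative image scale from the ratio of mean point
    densities per surface unit, as used to adapt the radii R_i."""
    lam_ref = len(points_ref) / area_ref
    lam_test = len(points_test) / area_test
    # densities scale as 1/alpha^2 when lengths scale by alpha
    return np.sqrt(lam_ref / lam_test)

rng = np.random.default_rng(0)
ref = rng.uniform(0.0, 1.0, (400, 2))
test = 2.0 * ref
alpha = scale_factor(ref, 1.0, test, 4.0)   # -> 2.0
```

The estimated factor then rescales the radius grid of the second-order estimators so that the descriptors of both images are compared at matching physical scales.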

4. Experimental evaluation

Given the textural features defined in the previous section, an application to texture recognition is considered, i.e. an unknown texture sample is assigned to one of a set of known texture classes using a discriminative classifier. The evaluation of the proposed descriptor involves the computation of classification performances for model learning with $N_t$ training texture samples per class. Training images are randomly selected among the $N$ samples available in each class. The remaining $N - N_t$ images are used as test images. The random selection of training samples is repeated 50 times to evaluate the mean and the standard deviation of the correct classification rate. These experiments are carried out with three texture datasets, namely the UIUC, Brodatz and KTH-TIPs databases.

We exploit random forest classifiers [2]. They rely on the construction of an ensemble of classification trees using some form of randomization. A sample is classified by sending it down every tree and aggregating the class distributions of the reached leaves. The random forest uses a voting rule to assign a class to an unknown sample.
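The evaluation protocol above ($N_t$ random training samples per class, repeated draws, mean and standard deviation of the correct classification rate) can be sketched as follows. To keep the sketch self-contained, a nearest-centroid rule stands in for the random forest, and the features are synthetic:

```python
import numpy as np

def evaluate(features, labels, n_train, n_rep=50, seed=0):
    """Draw n_train training samples per class at random, classify the
    rest, and report mean and std of the correct classification rate
    over n_rep draws (nearest-centroid stand-in classifier)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    rates = []
    for _ in range(n_rep):
        train = np.concatenate([
            rng.choice(np.flatnonzero(labels == c), n_train, replace=False)
            for c in classes])
        test = np.setdiff1d(np.arange(len(labels)), train)
        centroids = np.stack([features[train][labels[train] == c].mean(axis=0)
                              for c in classes])
        d = np.linalg.norm(features[test][:, None, :] - centroids[None, :, :],
                           axis=-1)
        pred = classes[d.argmin(axis=1)]
        rates.append(float((pred == labels[test]).mean()))
    return float(np.mean(rates)), float(np.std(rates))

# three well-separated synthetic classes, 20 samples each
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(3.0 * c, 0.5, size=(20, 4)) for c in range(3)])
y = np.repeat(np.arange(3), 20)
mean_rate, std_rate = evaluate(X, y, n_train=5)
```

Swapping the stand-in rule for a random forest only changes the classification step; the split-and-repeat protocol is unchanged.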

4.1. Parameter setting

A set of texture features, namely Gabor filters [22], cooccurrence matrices [8], local multifractal features [27], bags-of-keypoints (BoK) [4], the combination scheme of local keypoints [28] and descriptive statistics of visual keypoints [15, 21], were selected to evaluate the relevance of our contribution compared to state-of-the-art techniques. Here, we report the best performance result of each approach obtained over the different parameter settings detailed as follows.

Given a texture sample, texture features were characterized as the statistics of the response to scale-space filters, here Gabor wavelets at the orientations $\theta = \{0, \pm\frac{\pi}{2}, \pi\}$ and the frequencies $f = \{0, 4, 8\}$. In contrast, cooccurrence matrices measure the cooccurrence of grayscale values of the texture image at neighborhood distances $d = \{1, 2, 4\}$ with the set of orientations $\theta = \{0, \pm\frac{\pi}{4}, \pm\frac{\pi}{2}, \pm\frac{3\pi}{4}, \pi\}$. In addition to these classical texture descriptors, Xu's approach [27] was also tested. It relies on a multifractal description of textures with invariance to viewpoint changes, non-rigid deformations and local affine contrast changes. We tested different parameter settings for Xu's method: density level $ind = \{1, 8\}$, dimension of the MFS $f = \{16, 64\}$ and iteration level $ite = \{8, 10\}$.

Regarding schemes based on visual keypoints, we implemented bags-of-keypoints, i.e. the relative occurrence statistics of the different visual words based on the SIFT descriptor [4]. We also considered the most popular keypoint combination scheme, (Har+Lap)(Sift+Spin), introduced in [28]. These methods were computed with different numbers of categories, here $k = \{60, 120, 150\}$. These approaches were selected to examine the contribution of the spatial information of keypoints to texture recognition.

The quantitative evaluation also included cooccurrence statistics of visual keypoints, here Ling's method [15] and Nguyen's method [21]. They were implemented with logarithmically increasing neighborhood sizes, $N_r = 128\log(x)$, where $x$ varies between 1 and $\exp(1)$ with a 0.05 linear step. The computation of the second-order descriptive statistics reported in [21] involved a correction of edge effects and scaling factor, as well as feature dimension reduction with $k^* = 60$ categories of visual keypoint pairs. A similar parameter setting was used for the proposed approach based on the multivariate log-Gaussian Cox model.

Figure 4. Image examples from the three texture datasets: (a) UIUCTex, (b) Brodatz, (c) KTH-TIPs.

4.2. Performance results

4.2.1 UIUC dataset

The UIUC dataset involves 25 texture classes, each containing 40 images of size 640x480 with strongly varying viewpoint, scale and illumination conditions (Fig.4a). Regarding the comparison of the proposed descriptor to previous work in Tab.3, the mean correct classification rate and standard deviation over 50 random selections are reported for each approach as a function of the number of training samples $N_t$. Observing the results of the classical approaches, i.e. Gabor filters, cooccurrence matrices and multifractals (Xu's method), these experiments clearly demonstrate the interest of inheriting the robustness of visual keypoints for texture recognition in terms of invariance to geometric image distortions and contrast change: respectively, 67.78%±1.28, 80.12%±1.30 and 93.85%±1.31 vs. 97.84%±0.32 for our method with $N_t = 20$.

Considering state-of-the-art keypoint-based approaches, the reported performances stress the relevance of the proposed probabilistic texture models. The gain is greater than 6.5% compared with BoK and 1.5% compared with the most popular local keypoint scheme, (Har+Lap)(Sift+Spin) of Zhang's method, when 20 training images are considered. These results emphasize the efficiency of the spatial statistical analysis of visual keypoints. On the other hand, the proposed descriptor achieves a more robust texture recognition than the cooccurrence statistics of visual keypoints of Ling's method and Nguyen's method, respectively 91.87%±1.38 and 97.34%±0.25 vs. 97.84%±0.32 when 20 training images are considered.

4.2.2 Brodatz dataset

We also evaluated recognition performances for 111 different texture classes from the Brodatz album. Each class of this dataset comprises 9 sub-images of size 170x170 (Fig.4b) extracted from the original images. It might be noted that this dataset does not include scale and illumination changes. The proposed descriptor compares favorably to the other approaches in Tab.4 with a random selection of 1 or 3 training images per class. Our approach reaches up to 96.14%±0.41 correct classification with 3 training images. Improvements of more than 12% are reported compared with Gabor and cooccurrence features. All other methods reached classification performances below 95.67%. The proposed descriptor is shown to be slightly more robust and stable than Nguyen's method and Zhang's method, here 88.81%±0.92 vs. 87.67%±0.81 and 86.63%±1.05 when $N_t = 1$.

4.2.3 KTH-TIPs dataset

Similar conclusions can be drawn from the third experiment on the KTH-TIPs texture dataset. This dataset involves 10 material classes, each containing 81 images. Texture samples are 200x200 images (except some samples of two classes, brown-bread and cracker) with different scales, illumination directions and object poses (Fig.4c). The classification performance comparisons for this dataset are shown in Tab.7. The proposed approach yields a gain of about 1% compared to the best score of all other approaches, namely Nguyen's method: 95.74%±0.45 vs. 95.09%±0.41 when 40 training images are used.

4.2.4 Performance comparison among different keypoint types and different covariance functions

We also report a detailed analysis of the classification performances reached on the UIUC texture dataset by the proposed keypoint-based texture models. Different combinations of detector-descriptor types are evaluated in Tab.8. These results are issued from log-Gaussian Cox models with a Gaussian covariance function. DoG+Sift descriptors are shown to be the most efficient, with a gain from 0.15% to 1.2% compared to the other ones when 15 or 20 training images are used. When only 5 or 10 training images are considered, the (Hes-Lap)+Daisy descriptor leads to the best classification scores, with 92.13%±1.19 and 95.47%±1.08 respectively. These results might be explained by the greater keypoint density observed when using the DoG+Sift and (Hes-Lap)+Daisy schemes (see Tab.1) compared to the other combinations, such that a finer characterization of textures can be reached, as well as a more robust parameter estimation of the log-Gaussian Cox model.

Nt  Gabor filter  Cooc. matrix  BoK[4]      Ling[15]    Xu[27]      Zhang[28]   Nguyen[21]  our method
1   31.22±3.14    45.33±3.03    67.25±2.75  67.62±2.93  61.14±2.90  72.53±2.45  75.66±1.65  75.21±1.75
5   45.14±2.54    61.58±2.14    76.38±2.15  78.42±2.33  83.33±2.07  88.62±1.33  91.67±0.93  91.96±1.13
10  57.37±1.93    70.67±1.72    81.12±1.45  84.14±1.72  89.68±1.65  93.17±1.15  94.33±0.78  95.42±0.71
15  61.25±1.52    73.85±1.34    86.35±1.20  86.38±1.25  91.34±1.45  95.33±0.98  96.54±0.53  96.87±0.65
20  67.78±1.28    80.12±1.30    91.28±1.15  91.87±1.38  93.85±1.31  96.67±0.93  97.34±0.25  97.84±0.32

Table 3. Classification rates and standard deviations of the proposed method compared with state-of-the-art approaches on the UIUC dataset.

Nt  Gabor filter  Cooc. matrix  BoK[4]      Ling[15]    Xu[27]      Zhang[28]   Nguyen[21]  our method
1   78.52±1.72    75.42±1.73    83.16±1.50  84.33±1.63  85.95±0.91  86.63±1.05  87.67±0.81  88.81±0.92
3   85.14±1.41    83.22±1.04    92.78±0.91  93.17±0.87  93.41±0.73  94.34±0.43  95.67±0.33  96.14±0.41

Table 4. Classification rates and standard deviations over 50 random selections on the Brodatz texture database.

Nt  Gaussian    Cardinal sine  Hyperbolic
1   75.21±1.75  75.15±1.67     75.03±1.81
5   91.96±1.13  91.63±1.17     91.32±1.19
10  95.42±0.71  95.35±0.75     94.72±0.85
15  96.87±0.65  96.17±0.63     95.43±0.71
20  97.84±0.32  97.15±0.42     96.85±0.38

Table 5. Performance comparison of the proposed model with different covariance functions on the UIUC dataset.

Given a set of visual keypoints with the DoG+Sift descriptor, the classification performances obtained with the different covariance functions are reported in Tab.5. The best classification performance is obtained with the Gaussian function: 97.84%±0.32 vs. 97.15%±0.42 and 96.85%±0.38 when 20 training images are used.

We emphasize the relevance of the proposed scale-adaptation and dimension reduction schemes in Tab.6. Whereas the use of scale adaptation leads to a gain of about 2%, dimension reduction results in a slightly more robust recognition when 5 to 20 training images are available. The latter can be interpreted as a filtering property of this scheme.

5. Discussion

Nt  Without scaling effect  Without dimension reduction  Complete model
1   73.27±2.05              75.65±1.82                   75.21±1.75
5   89.12±1.27              91.67±1.15                   91.96±1.13
10  94.76±1.11              95.15±0.75                   95.42±0.71
15  95.12±0.81              96.27±0.71                   96.87±0.65
20  95.89±0.54              97.12±0.35                   97.84±0.32

Table 6. Performance comparison of the proposed model with and without scale adaptation and dimension reduction.

In this paper, we have further explored texture description and recognition from the joint characterization of the spatial and visual patterns of keypoint sets in texture images. Viewing keypoint sets as realizations of finite spatial random sets, we have shown that, beyond the descriptive statistics proposed in [15, 21], probabilistic keypoint-based models can be developed for visual textures. The proposed models embed invariance properties with respect to contrast change and geometric image transforms to reach robust texture recognition performance, as proven by the quantitative evaluation against state-of-the-art schemes.

As pointed out in [6, 7, 23], it is difficult to recommend a priori the best among the many models for spatial point pattern analysis in the literature, e.g. Neyman-Scott, shot-noise Cox or Gibbs processes. Here, multivariate log-Gaussian Cox models were selected. They have several appealing features:

• They are fully characterized by the underlying Gaussian fields, hence by the associated mean and covariance features. This makes the interpretation of the model parameters, as well as their estimation, simple.

• Through the various parametric and non-parametric forms that can be considered for the covariance structure, these models are highly flexible and cover a wide range of covariance structures.

Besides, compared to simple descriptive statistics, these models have several major advantages:

• They are independent of the a priori selection of the study regions (number of regions and radius sizes), which is a critical setting for the computation of descriptive statistics, and they provide a description of keypoint patterns intrinsically free of edge effects and image size.


Nt | Gabor filter | Cooc. matrix | BoK [4] | Ling [15] | Xu [27] | Zhang [28] | Nguyen [21] | Our method
 5 | 62.32±2.41 | 62.83±2.42 | 64.42±2.81 | 71.07±2.63 | 72.63±2.45 | 78.17±2.35 | 81.34±1.93 | 80.17±2.15
10 | 74.67±2.04 | 73.34±2.22 | 75.83±2.12 | 76.48±2.27 | 81.42±1.95 | 85.42±1.78 | 87.38±1.47 | 86.96±1.53
20 | 82.65±1.69 | 80.45±1.67 | 81.12±1.45 | 83.47±1.48 | 87.18±1.53 | 90.28±1.31 | 92.15±1.26 | 92.42±1.11
30 | 87.75±1.27 | 85.83±1.30 | 88.35±1.20 | 88.58±1.17 | 89.95±1.35 | 92.15±1.05 | 93.67±0.97 | 94.33±0.87
40 | 89.87±0.86 | 88.95±0.77 | 90.18±0.65 | 91.15±1.05 | 91.33±0.97 | 94.33±0.67 | 95.09±0.41 | 95.74±0.45

Table 7. Classification rates and standard deviations over 50 random selections on the KTH-TIPs texture database.

Nt | DoG+SIFT [16] | FH+SURF [1] | FH+BRIEF [3] | (Hes-Lap)+Daisy [24] | (Har-Lap)+(SIFT-Spin) [28]
 1 | 75.21±1.75 | 75.05±1.94 | 75.43±1.71 | 75.18±1.69 | 74.85±1.87
 5 | 91.96±1.13 | 90.73±1.11 | 91.42±1.23 | 92.13±1.19 | 91.15±1.41
10 | 95.42±0.71 | 95.15±0.91 | 95.22±0.85 | 95.47±1.08 | 95.23±0.72
15 | 96.87±0.65 | 96.14±0.63 | 96.43±0.51 | 96.75±0.58 | 96.37±0.61
20 | 97.84±0.32 | 96.75±0.41 | 97.25±0.34 | 97.67±0.35 | 97.14±0.37

Table 8. Performance of the proposed model with different detector-descriptor types on the UIUC dataset.

• They deliver a compact representation of keypoint-based information, here a 3k*-dimensional feature space vs. the (Nr+1)k* dimensions required by descriptive statistics [15, 21]. This compactness is of great benefit for learning techniques, for which lower-dimensional feature spaces are preferred.
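The dimension gap can be made concrete with a toy computation; the codebook size k* = 40 and Nr = 15 study regions below are hypothetical values chosen for illustration, not figures reported in the paper.

```python
# Illustrative comparison of feature-space dimensions: the proposed
# model stores 3 parameters per codeword, vs. (Nr + 1) values per
# codeword for the descriptive statistics of [15, 21].
def model_dim(k_star):
    return 3 * k_star

def descriptive_dim(k_star, n_regions):
    return (n_regions + 1) * k_star

k_star = 40      # hypothetical codebook size
n_regions = 15   # hypothetical number of study regions / radii
print(model_dim(k_star), descriptive_dim(k_star, n_regions))  # 120 640
```

The model's feature dimension is fixed by the codebook size alone, whereas the descriptive statistics grow linearly with the number of study regions.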

The proposed probabilistic models also offer additional generalization properties. They are associated with an analytical formulation of the likelihood function of a keypoint pattern and with simple simulation schemes. These features are of great interest for various applications. For instance, they could benefit the definition of well-founded texture similarity measures from model distances. This is a key feature of the log-Gaussian Cox model, as the similarity measure can be defined from distances between Gaussian fields, for which analytical formulations may be derived. While not investigated here, since random forests were considered for comparison to previous work, such similarity measures would be of great interest for kernel-based learning as well as for other applications such as image indexing or segmentation. These aspects will be further explored in future work.
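The paper does not specify which distance between Gaussian fields would be used. One natural candidate, sketched below under the assumption that each field is summarized by a finite-dimensional mean vector and covariance matrix (e.g. evaluated on a grid), is a symmetrized Kullback-Leibler divergence, which has a closed form for Gaussians; the function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence KL(N0 || N1) between two
    multivariate Gaussians N0(mu0, cov0) and N1(mu1, cov1)."""
    d = len(mu0)
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    # slogdet avoids overflow for high-dimensional covariances.
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff - d + logdet1 - logdet0)

def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetrized divergence usable as a texture dissimilarity."""
    return (gaussian_kl(mu0, cov0, mu1, cov1)
            + gaussian_kl(mu1, cov1, mu0, cov0))
```

Such a closed form is what makes model-based similarity measures attractive compared with empirical comparisons of descriptive statistics.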

References

[1] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded up robust features. ECCV, 1:404–417, 2006.
[2] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[3] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary robust independent elementary features. ECCV, IV:778–792, 2010.
[4] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. ECCV, pages 1–22, 2004.
[5] M. Cummins and P. Newman. FAB-MAP: Probabilistic localization and mapping in the space of appearance. IJRR, 27(6):647–665, 2008.
[6] P. Diggle, B. Rowlingson, and T. Su. Point process methodology for on-line spatio-temporal disease surveillance. Environmetrics, 16(5):423–434, 2005.
[7] F. Lafarge, G. Gimel'farb, and X. Descombes. Geometric feature extraction by a multi-marked point process. PAMI, 32(9):1597–1609, 2010.
[8] R. Haralick. Statistical and structural approaches to textures. Proceedings of the IEEE, 67(5):786–804, May 1979.
[9] C. Harris and M. Stephens. A combined corner and edge detector. Proc. of the Alvey Vision Conf., pages 147–151, 1988.
[10] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, 42(3):425–436, 2009.
[11] T. Kadir, A. Zisserman, and M. Brady. An affine invariant salient region detector. ECCV, pages 345–457, 2004.
[12] S. Kotsiantis, I. Zaharakis, and P. Pintelas. Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3):159–190, 2006.
[13] S. Lazebnik, C. Schmid, and J. Ponce. A sparse texture representation using local affine regions. PAMI, 27(8):1265–1278, 2005.
[14] T. Lindeberg. Feature detection with automatic scale selection. IJCV, 30(2):79–116, 1998.
[15] H. Ling and S. Soatto. Proximity distribution kernels for geometric context in category recognition. ICCV, pages 1–8, 2007.
[16] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[17] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. BMVC, pages 384–393, 2002.
[18] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 27(10):1615–1630, 2005.
[19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. V. Gool. A comparison of affine region detectors. IJCV, 65:43–72, 2005.
[20] J. Møller, A. Syversveen, and R. Waagepetersen. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3):451–482, 1998.
[21] H.-G. Nguyen, R. Fablet, and J.-M. Boucher. Spatial statistics of visual keypoints for texture recognition. ECCV, IV:764–777, 2010.
[22] T. Randen and J. H. Husøy. Filtering for texture classification: A comparative study. PAMI, 21:291–310, 1999.
[23] D. Stoyan and H. Stoyan. Fractals, Random Shapes and Point Fields. Wiley, Chichester, 1994.
[24] E. Tola, V. Lepetit, and P. Fua. DAISY: An efficient dense descriptor applied to wide-baseline stereo. PAMI, 32(5):815–830, 2010.
[25] T. Tuytelaars and L. V. Gool. Matching widely separated views based on affine invariant regions. IJCV, 59(1):61–85, 2004.
[26] G.-S. Xia, J. Delon, and Y. Gousseau. Shape-based invariant texture indexing. IJCV, 88(3):382–403, 2010.
[27] Y. Xu, H. Ji, and C. Fermuller. Viewpoint invariant texture description using fractal analysis. IJCV, 83(1):85–100, 2009.
[28] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: a comprehensive study. IJCV, 73(2):213–238, 2007.
