Visual textures as realizations of multivariate log-Gaussian Cox processes
Huu-Giao Nguyen, Ronan Fablet, Jean-Marc Boucher
Institut Telecom / Telecom Bretagne / LabSTICC
Université européenne de Bretagne
{huu.nguyen;ronan.fablet;jm.boucher}@telecom-bretagne.eu
Abstract
In this paper, we address invariant keypoint-based texture characterization and recognition. Viewing keypoint sets associated with visual textures as realizations of point processes, we investigate probabilistic texture models built from multivariate log-Gaussian Cox processes. These models are parameterized by the covariance structure of the spatial patterns. Their implementation initially relies on the construction of a codebook of the visual signatures of keypoints. We discuss invariance properties of the proposed models for texture recognition applications and report a quantitative evaluation on three texture datasets, namely UIUC, KTH-TIPs and Brodatz. These experiments include a comparison of the performance reached using different methods for keypoint detection and characterization, and demonstrate the relevance of the proposed models w.r.t. state-of-the-art methods. We further discuss the main contributions of the proposed approach, including the key features of a statistical model and complexity aspects.
1. Introduction
Texture information is among the key features of inter-
est for the robust characterization and recognition of visual
scenes. A variety of methods can be found in the literature
for texture recognition applications, from the early Haralick
cooccurrence features [8], statistics of the response to scale-
space filters such as Gabor and wavelet analysis [22] or
more recent methods embedding invariance properties such
as keypoint-based settings [4, 5, 15], multifractal schemes
[27], topographic map [26] or local binary pattern [10] de-
scriptors.
The renewed interest in texture analysis emerged from
the application of visual keypoints [1, 3, 16, 24] to reach
texture description invariant to geometric and photometric
image transforms, e.g. affine transforms, contrast changes.
The classical keypoint-based setting consists in stating tex-
ture recognition as a voting-based output of the set of key-
points attached to a given visual texture [4, 13]. Such an approach is guaranteed to inherit the invariance properties of the local keypoints, and advanced statistical learning strategies, including random forests [2] and SVMs [12], can efficiently be implemented. However, such an approach relies only on the visual signatures of the keypoints and discards any spatial information in terms of the spatial patterns formed by the keypoint sets.
Our previous work has shown that descriptive statistics
of spatial point processes provide a relevant basis for jointly
characterizing the spatial and visual signatures of the key-
point set attached to a visual texture [21]. While making no explicit reference to spatial point processes and their associated descriptive statistics, second-order descriptive statistics of spatial keypoint patterns were previously considered for scene categorization [15] and robot navigation [5]. Here we further investigate to which
extent visual textures can be viewed as realizations of mul-
tivariate spatial point processes, but rather than descriptive
statistics, we aim at delivering a specific formal model and
explore in this context the relevance of a class of spatial
processes, namely log-Gaussian Cox processes. Log Gaus-
sian Cox processes introduced by Møller et al. [20] pro-
vide models for the spatial distribution of multivariate point
sets. As they relate to a model of the covariance of count
variables, they were shown to be easy to analyse and flexi-
ble for experiments in spatial statistical analysis, especially
in environment-related sciences [20] or disease surveillance
[6]. It might be noted that spatial point processes were pre-
viously investigated for texture analysis, e.g. Lafarge et
al. [7] applied a spatial Gibbs point model and a Jump-
Diffusion process for the extraction of geometric features
in texture images. Such Gibbs models are however not
suited for recognition issues. The specification of a log-Gaussian Cox process resorts to the estimation of the covariance structure of the count variables of the multivariate point process. Simple estimation procedures can be derived for different types of covariance structures, and an invariant texture characterization follows from the model parameters. Overall, the main contributions of this paper are three-fold:
- Addressing invariant texture characterization and modelling from log-Gaussian Cox processes of visual keypoint patterns.
- Testing the implementation of these models with several covariance models for different types of keypoint detectors and descriptors.
- Demonstrating the relevance of the proposed models for texture recognition with respect to previous work.

Figure 1. (a) Keypoint positions detected by FH+SURF. (b) Codebook construction of visual keypoints and spatial statistical characterization with circular study regions; each point color depicts the category of the keypoint, and the cases where a study circle intersects the image boundary are illustrated.
This paper is organized as follows. In Section 2, a
brief overview of the proposed approach and related work
is given. We present in Section 3 the proposed proba-
bilistic keypoint-based texture model based on multivari-
ate log-Gaussian Cox processes. Comparative evaluations
of texture recognition performance are reported for several
databases in Section 4. We further discuss the main contributions of the proposed approach in Section 5.
2. Proposed approach and related work
The general goals of this paper are the characterization
and modeling of visual textures from the spatial patterns
formed by visual keypoints using point process models. The initial step then consists in detecting local keypoints in texture images (Fig.1a). To make statistical estimation easier, we build a codebook of visual keypoints from their visual signatures using adapted clustering techniques, such that any visual keypoint is assigned to a category (Fig.1b). Regarding visual keypoint sets as finite spatial random sets, log-Gaussian Cox models are investigated to reach an invariant texture characterization from the covariance structure of the spatial patterns. State-of-the-art approaches for visual keypoints are detailed below.
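For illustration, the codebook step can be sketched with a plain k-means over keypoint descriptors. This is a minimal numpy sketch: the paper only states that "adapted clustering techniques" are used, so Lloyd's k-means and the function name `build_codebook` are our assumptions.

```python
import numpy as np

def build_codebook(descriptors, k, n_iter=20, rng=0):
    """Cluster keypoint descriptors into k visual words (Lloyd's k-means)
    and return the cluster centers plus each keypoint's category index."""
    rng = np.random.default_rng(rng)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # assign every descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        for c in range(k):
            if np.any(labels == c):
                centers[c] = descriptors[labels == c].mean(axis=0)
    return centers, labels
```

Any new keypoint is then categorized by its nearest center, which is all the point-process models below require.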
When addressing matching and recognition issues in im-
ages, the typical approach relies on learning models in the
feature space defined by local visual descriptors. The focus
has been given to visual signatures invariant to geometric
and photometric transformation of the images [3, 19, 27].
Among the most popular descriptors, local keypoints were shown to be particularly efficient [13, 18] compared to the early features developed for texture analysis, such as Gabor features [22] and cooccurrence matrices [8].

Detector+Descriptor     Keypoint density   Descriptor size
DoG+Sift                1582               128
FH+Surf                 758                64
(Har-Lap)+(Sift-Spin)   538                178
(Hes-Lap)+Daisy         1216               200
FH+Brief                758                256
Table 1. Number of detected keypoints and size of the signature vector for the different detector-descriptor types. The processed texture pattern is the image displayed in Fig.1a.
Numerous approaches have been proposed to detect regions or points of interest in images. Among the most popular, the Harris detector [9] detects corners, i.e. the points at which significant intensity changes occur in two directions. It relies on the eigen-decomposition of the structure tensor of the intensity function. Scale-space approaches based on the analysis of the Hessian matrix were also proposed to address scale adaptation [14]. Scale-spaces of Differences of Gaussians (DoG) are also widely considered as an approximation of the Laplacian [16]. More recently, Mikolajczyk et al. [19] combined the Harris or Hessian detector with the Laplacian operator (for scale adaptation) to propose two scale-invariant feature detectors, namely Harris-Laplace (Har-Lap) and Hessian-Laplace (Hes-Lap). Bay et al. [1] presented the Fast-Hessian (FH) detector, based on the Hessian matrix computed on integral images. Other categories of keypoint detectors may be cited, for instance the maximally stable extremal region (MSER) detector [17], the edge-based region (EBR) detector, the intensity extrema-based region (IBR) detector [25], or entropy-based region detectors such as the salient region detector [11]. Comparisons between the different detectors are given in [1, 13, 19].
Given the pixel coordinates of the extracted keypoints, many different schemes have been proposed to extract a feature vector for each keypoint $s_i$, embedding invariances to contrast change and geometric transforms, typically affine transforms [10, 18, 28]. The SIFT descriptor is certainly among the most popular and relevant ones. It is formed by the distribution of the orientations of the intensity gradient in 4x4 windows around the considered point [16]. This description ensures contrast invariance and partial invariance to affine transforms. Orientations are typically quantized over eight values, such that the SIFT feature vector is 128-dimensional. Several extensions of the original SIFT descriptor have been proposed, including GLOH, PCA-SIFT and RIFT (see [13] for a review).
For instance, an intensity-domain spin image [13] is a 2D histogram encoding the distribution of the intensity values and the distance from the reference point. Rather than considering gradient orientations, the SURF descriptor [1] relies on the distribution of Haar-wavelet responses, whereas the Daisy descriptor [24] exploits responses to oriented Gaussian filters; the BRIEF descriptor [3] builds a binary string from a relatively small number of intensity comparisons within an image patch.
Based on this review, we investigate five robust detector/descriptor combinations reported to reach the best performance in [1, 3, 16, 24, 28], respectively: FH+Surf, FH+Brief, DoG+Sift, (Hes-Lap)+Daisy and (Har-Lap)+(Sift-Spin). As illustrated in Tab.1 for the texture sample displayed in Fig.1a, these different combinations lead to different complexity levels as well as large differences in the number of detected keypoints, a critical aspect when considering keypoint statistics.
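For concreteness, the DoG detection stage underlying SIFT can be sketched as follows. This is a simplified single-octave sketch using scipy, not the full multi-octave scheme of [16]; the sigma values and the threshold are illustrative choices of ours.

```python
import numpy as np
from scipy import ndimage

def dog_keypoints(image, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.02):
    """Detect scale-space extrema of the Difference-of-Gaussians, used as
    an approximation of the Laplacian (single octave, no sub-pixel
    refinement, no edge-response rejection)."""
    blurred = [ndimage.gaussian_filter(image.astype(float), s) for s in sigmas]
    dogs = np.stack([b2 - b1 for b1, b2 in zip(blurred[:-1], blurred[1:])])
    # a point is kept if it is a local max or min of the DoG stack over its
    # 3x3x3 scale-space neighborhood and its magnitude exceeds the threshold
    maxi = ndimage.maximum_filter(dogs, size=3)
    mini = ndimage.minimum_filter(dogs, size=3)
    extrema = ((dogs == maxi) | (dogs == mini)) & (np.abs(dogs) > thresh)
    scale_idx, ys, xs = np.nonzero(extrema)
    return np.column_stack([xs, ys, scale_idx])
```

A real pipeline would iterate this over octaves and attach a descriptor to each detected point; here only the detection principle is shown.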
3. Multivariate Log Gaussian Cox process
3.1. Multivariate point process and associated descriptive statistics
A spatial point process $S$ is defined as a locally finite random subset of a given bounded region $B \subset \mathbb{R}^2$. A realization of such a process is a spatial point pattern $s = \{s_1, ..., s_n\}$ of $n$ points contained in $B$. Considering a realization of the point process, the moments of the count variable are relevant descriptive statistics. In the general case, the $p$th-order moment of $S$ is defined as:

$$\mu^{(p)}(B_1 \times ... \times B_p) = E\{N(B_1) \cdots N(B_p)\} \quad (1)$$

where $E\{\cdot\}$ denotes the expectation and $N(B_i)$ is the number of random points contained in a given Borel set $B_i$. Focusing on the intensity measure of $S$, the first-order moment is evaluated with $p = 1$:

$$\mu(B) = E \sum_{s \in S} I_B(s) = \int_B \rho(s)\, ds \quad (2)$$

where $I_B(s)$ is an indicator function that takes the value 1 when $s$ falls in region $B$, and $\rho(s)\,ds$ is the probability that one point falls in an infinitesimally small area $ds$ of the neighborhood of point $s$. The normalized first-order moment $\lambda = \mu(B)/|B|$ is the mean density of expected points per surface unit, $|B|$ being the surface of region $B$. This quantity fully characterizes Poisson point processes. For a homogeneous process, this density is spatially constant.
Beyond the first-order moment, the covariance structure of the count variable, i.e. descriptive statistics of the pairs of points of the finite random set, can be characterized by the second-order moment $\mu^{(2)}$ of $S$, parameterized as:

$$\mu^{(2)}(B_1 \times B_2) = E \sum_{s_1 \in S} \sum_{s_2 \in S} I_{B_1}(s_1)\, I_{B_2}(s_2) \quad (3)$$
$$= \int_{B_1 \times B_2} \rho^{(2)}(s_1, s_2)\, ds_1 ds_2 \quad (4)$$

where the second-order density $\rho^{(2)}(s_1, s_2)$ is interpreted as the density, per surface unit, of the pair of points $s_1$ and $s_2$ in infinitesimally small areas $ds_1$ and $ds_2$. For a stationary and isotropic point process, this density function $\rho^{(2)}(s_1, s_2)$ states the correlation of pairs of points and only depends on the distance $\|s_1 - s_2\|$ [23]. In the spatial point field literature, the second-order measure $\mu^{(2)}$ is frequently replaced by the factorial moment measure $\alpha^{(2)}$:

$$\alpha^{(2)}(B_1 \times B_2) = E \sum_{s_1 \in S} \sum_{s_2 \in S,\, s_2 \neq s_1} I_{B_1}(s_1)\, I_{B_2}(s_2) \quad (5)$$

where the relation between the second-order measure $\mu^{(2)}$ and the factorial moment measure $\alpha^{(2)}$ is given by:

$$\alpha^{(2)}(B_1 \times B_2) = \mu^{(2)}(B_1 \times B_2) - \mu(B_1 \cap B_2) \quad (6)$$
A multivariate point process $\Psi$ is defined as a spatial point process for which a discrete mark $m_i$ is associated with each point $s_i$ in $B$. The second-order moment in Eq.4 can be extended to multivariate point patterns. Considering a circular study region $D(\cdot, r)$ with radius $r$ (Fig.1b), the second-order cooccurrence statistics of $\Psi$ are characterized by the factorial moment measure as follows:

$$\alpha^{(2)}_{i,j}(r) = E \sum_h \sum_{l \neq h} \delta_i(m_h)\, \delta_j(m_l)\, I(\|s_h - s_l\| \leq r) \quad (7)$$

where $\delta_i(m_h)$ equals 1 if the mark $m_h$ of point $s_h$ is $i$ and 0 otherwise. For the statistical interpretation of the second-order moment $\mu^{(2)}$ [23], Ripley's K-function, which is usually used to analyse the mean number of points of type $j$ located in a study region of radius $r$ centered at the points of type $i$ (the center point itself being excluded), is given by:

$$K_{ij}(r) = (\lambda_i \lambda_j)^{-1}\, \alpha^{(2)}_{ij}(r) \quad (8)$$
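The empirical counterparts of Eqs. 7-8 can be sketched as follows (a minimal numpy sketch without the edge correction discussed in Section 3.2; the function name `cross_k` is ours):

```python
import numpy as np

def cross_k(points, marks, i, j, r, area):
    """Estimate the bivariate Ripley K-function K_ij(r) of Eq. 8 from a
    marked point pattern: count ordered pairs (h, l), h != l, with
    mark(h) = i, mark(l) = j and ||s_h - s_l|| <= r (Eq. 7), then
    normalize by the intensities of the two types."""
    pts_i = points[marks == i]
    pts_j = points[marks == j]
    lam_i = len(pts_i) / area          # first-order intensity of type i
    lam_j = len(pts_j) / area          # first-order intensity of type j
    d = np.linalg.norm(pts_i[:, None, :] - pts_j[None, :, :], axis=-1)
    if i == j:
        np.fill_diagonal(d, np.inf)    # exclude self-pairs when i == j
    alpha2 = np.sum(d <= r) / area     # empirical factorial moment, per surface unit
    return alpha2 / (lam_i * lam_j)
```

For a homogeneous Poisson process with independent marks, this estimate is close to the theoretical value $\pi r^2$, up to the edge effects that the correction of Section 3.2 addresses.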
3.2. Log-Gaussian Cox model
A Cox process $X$ with random intensity function $Z$ is a point process such that $X|Z$ is a Poisson process with intensity function $Z$ [20, 23]. For a univariate log-Gaussian Cox process $X$ on a locally finite subset $S \subset \mathbb{R}^2$, the random intensity function is given by $Z = \exp(Y)$, where $Y$ is a Gaussian field on $S$ whose mean $\mu = E\,Y(s)$ and covariance function $c(r) = \mathrm{Cov}(Y(s_1), Y(s_2))$, with $r = \|s_1 - s_2\|$, are defined and finite for all bounded $B \subset S$. An important property of the log-Gaussian Cox process is that the characteristics of the Gaussian field $Y$ relate to the first and second-order moments of the point process. More precisely, the following relations hold [20]:

$$\rho(s) = \lambda = \exp(\mu + \sigma^2/2), \qquad \rho^{(2)}(s_1, s_2)/(\rho(s_1)\rho(s_2)) = g(r) = \exp(c(r)) \quad (9)$$

which are respectively the intensity and the pair correlation function, where $\sigma^2 = \mathrm{Var}(Y(s))$ is the variance of the Gaussian process. We report an example of intensity estimation issued from a log-Gaussian Cox process in Fig.2.

Figure 2. Intensity estimation on a random set of 435 points issued from a log-Gaussian Cox process.
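To make Eq. 9 concrete, a realization of a univariate log-Gaussian Cox process can be simulated by discretizing the latent Gaussian field on a grid. This is an illustrative sketch, not part of the paper's pipeline; we assume the exponential covariance of Tab.2 with $\alpha = 1$. By Eq. 9, with mu = 3 and sigma2 = 0.5 one expects about exp(3.25) ≈ 26 points on the unit square.

```python
import numpy as np

def simulate_lgcp(mu, sigma2, beta, n=32, seed=None):
    """Simulate one realization of a univariate log-Gaussian Cox process
    on the unit square, discretized on an n x n grid of cells. The latent
    Gaussian field Y has mean mu and covariance c(r) = sigma2*exp(-r/beta)."""
    rng = np.random.default_rng(seed)
    xs = (np.arange(n) + 0.5) / n                     # cell centers
    gx, gy = np.meshgrid(xs, xs)
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    dist = np.linalg.norm(cells[:, None, :] - cells[None, :, :], axis=-1)
    C = sigma2 * np.exp(-dist / beta)                 # covariance of Y on the grid
    L = np.linalg.cholesky(C + 1e-9 * np.eye(n * n))  # jitter for stability
    Y = mu + L @ rng.standard_normal(n * n)
    # conditionally on Y, the cell counts are Poisson with rate Z * cell_area
    counts = rng.poisson(np.exp(Y) / (n * n))
    return counts.reshape(n, n)
```

The grid-based Cholesky sampling is the simplest exact construction; spectral methods scale better for fine grids.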
Extending to a multivariate log-Gaussian Cox process, the Cox processes $\{X_i\}$ are conditionally independent w.r.t. a multivariate intensity field $Z = \{Z_i\}$, and $X_i|Z_i$ is a Poisson process with intensity measure $Z_i$. $Z$ relates to a multivariate Gaussian field $Y$ as $Z_i = \exp(Y_i)$. The multivariate Gaussian random field is characterized by its means $\mu_i(s)$ and covariance functions $c_{ij}(r) = \mathrm{Cov}(Y_i(s_1), Y_j(s_2))$. The intensity and pair correlation function become:

$$\lambda_i = \exp(\mu_i + \sigma_i^2/2); \qquad g_{ij}(r) = \exp(c_{ij}(r)) \quad (10)$$
Fitting a stationary parametric log-Gaussian Cox process comes down to estimating the mean and covariance parameters of the associated Gaussian field. Following [20, 23], an estimation procedure relies on the relation between the pair correlation function $g_{ij}$ and the K-function:

$$K_{ij}(R) = 2\pi \int_0^R r\, g_{ij}(r)\, dr \quad (11)$$

where $R$ is a pre-defined radius value. Combining Eq.8 and Eq.11, the pair correlation function can be estimated as:

$$g_{ij}(r) = \frac{1}{2\pi r \lambda_i \lambda_j} \sum_h \sum_{l \neq h} \delta_i(m_h)\, \delta_j(m_l)\, \xi(\|s_h - s_l\|, r)\, b_{s_h} \quad (12)$$

where $\xi(\cdot)$ is a kernel (here a Gaussian kernel is considered), $\lambda_i$ is the intensity of class $i$ estimated from Eq.2, and $b_{s_h}$ is the proportion of the circumference of the study circle lying within the image. In practice, the computation of the above second-order descriptive statistics takes edge effects into account [21]. With this edge-effect correction, $g_{ij}$ is not symmetric in $i$ and $j$. Hence, the non-parametric estimate of the covariance function is defined as:

$$c_{ij}(r) = \log \frac{\lambda_i g_{ij}(r) + \lambda_j g_{ji}(r)}{\lambda_i + \lambda_j} \quad (13)$$
Exponential         Cardinal sine      Hyperbolic
exp(−(r/β)^α)       sin(r/β)/(r/β)     (1 + r/β)^(−1)
Table 2. Different correlation functions L(β, r).
To resort to a compact probabilistic model for the representation of visual textures, we investigate parametric forms of the covariance function $c$. Given a chosen parameterization $L(\beta, r)$ from Tab.2, the model parameters are estimated by minimizing the following criterion:

$$\int_0^R \left( \sigma^2_{ij}\, L(\beta, r) - c_{ij}(r) \right)^2 dr \quad (14)$$

A gradient-based optimization procedure is applied to solve this minimization. The proposed probabilistic keypoint-based texture model is eventually given by the intensity parameters $\lambda_i$, the variances $\sigma_{ij}$ and the scale parameters $\beta_{ij}$.
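The fit of Eq. 14 can be sketched as follows. The paper applies a gradient-based optimizer; this illustrative sketch instead performs a grid search over the scale beta, exploiting the fact that for a fixed beta the optimal amplitude sigma2 has a closed form (linear least squares):

```python
import numpy as np

def fit_covariance(r, c_hat, betas=None):
    """Fit (sigma2, beta) of a correlation model L(beta, r) = exp(-r/beta)
    to a non-parametric covariance estimate c_hat sampled at radii r,
    minimizing the discretized criterion of Eq. 14."""
    if betas is None:
        betas = np.linspace(0.01, 1.0, 100)
    best_err, best = np.inf, (None, None)
    for beta in betas:
        L = np.exp(-r / beta)                       # correlation profile at this scale
        sigma2 = float(L @ c_hat) / float(L @ L)    # closed-form least-squares amplitude
        err = np.sum((sigma2 * L - c_hat) ** 2)
        if err < best_err:
            best_err, best = err, (sigma2, beta)
    return best                                     # (sigma2_hat, beta_hat)
```

The same scheme applies to the other parametric forms of Tab.2 by swapping the expression of `L`.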
3.3. Feature dimension reduction
Considering the parameters of the log-Gaussian Cox model as descriptors of the spatial patterns of visual keypoints, each texture image is associated with a $k(k+2)$-dimensional feature vector, where $k$ is the size of the codebook of visual words. In practice, such high-dimensional features may affect recognition performance. State-of-the-art methods based on visual keypoints typically involve dimensionalities in the range of $k$, e.g. the bag-of-keypoints [4] and the (Har+Lap)(Sift+Spin) scheme [28]. Hence, dimension reduction issues should be further analyzed.
A dimension reduction procedure for the second-order statistics was introduced in [21], based on the determination of categories of keypoint pairs. The codebook of keypoint pairs, denoted by $u = M(s_i, s_j)$, is issued from an adapted clustering technique applied to each pair of categorized keypoints $s_i$ and $s_j$. The non-parametric estimate of the covariance function is then given by:

$$c_u(r) = \log \left( \frac{1}{2\pi r \lambda_u} \sum_h \sum_{l \neq h} \delta_u(M(s_h, s_l))\, \xi(\|s_h - s_l\|, r)\, b_{s_h} \right) \quad (15)$$

The estimation of the intensity parameter $\lambda_u$, variance $\sigma_u$ and scale parameter $\beta_u$ for each category of keypoint pairs follows as previously from the minimization of Eq.14. Overall, this procedure downsizes the proposed texture descriptor to a $3k$-dimensional vector, where $k$ is the size of the codebook of keypoint pairs.
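The resulting 3k-dimensional descriptor can be assembled as below. This is a trivial numpy sketch: the pair-category mapping M(s_i, s_j) is stood in for by a hypothetical lookup table, whereas [21] obtains it from an adapted clustering technique.

```python
import numpy as np

def pair_category(label_a, label_b, pair_codebook):
    """Hypothetical stand-in for u = M(s_i, s_j): map the categories of two
    keypoints to a pair-codebook index via a lookup table."""
    return pair_codebook[label_a, label_b]

def stack_descriptor(lam, sigma2, beta):
    """Assemble the final 3k-dimensional texture descriptor from the
    per-pair-category intensities lambda_u, variances sigma_u and scale
    parameters beta_u."""
    return np.concatenate([np.asarray(lam), np.asarray(sigma2), np.asarray(beta)])
```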
3.4. Invariance properties
Invariance properties of the resulting texture characterization are inherited from the characteristics of the chosen visual keypoint signatures. Image scaling however clearly affects the second-order moments of the spatial patterns. More precisely, assuming that the detection and characterization of visual keypoints are scale-invariant, as pointed out in [21], the intensity and covariance parameters of a given texture observed at two scales relate up to a scale factor. This scale factor can be estimated from the ratio of the average point densities per surface unit. In this work, the actual radius values $R_i$ of the proposed estimation scheme were chosen depending on a reference image. Fig.3 further illustrates the stability of the proposed features at different image scales.

Figure 3. Scaling effect on the parameter estimation of the keypoint-based log-Gaussian Cox model: (a) reference image $I_{ref}$ with $\alpha_{ref} = 1$; (b) test image $I_{test}$ with $\alpha_{test} = 2$; (c, d) parameter estimates without scale adaptation; (e, f) with scale adaptation. The red curves show the variances $\sigma_u$ (left: c, e) and scale parameters $\beta_u$ (right: d, f) of the reference image $I_{ref}$ in all plots; the estimates for $I_{test}$ without (blue) and with (green) scale adaptation are overlaid. The plots were carried out with the feature dimension reduction.
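The scale-adaptation step can be sketched as follows. This is our illustrative formulation of the density-ratio argument: a zoom by a factor alpha divides the point density per surface unit by alpha², so alpha is recovered from the density ratio and used to rescale the radii chosen on the reference image.

```python
import numpy as np

def adapt_radii(radii_ref, lambda_ref, lambda_test):
    """Rescale the study radii of the reference image for a test image,
    using the length-scale factor alpha = sqrt(lambda_ref / lambda_test)
    estimated from the average keypoint densities per surface unit."""
    alpha = np.sqrt(lambda_ref / lambda_test)
    return np.asarray(radii_ref, dtype=float) * alpha
```

For example, a test image with one quarter of the reference density (a 2x zoom) doubles every study radius.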
4. Experimental evaluation
Given the textural features defined in the previous section, an application to texture recognition is considered, i.e. an unknown texture sample is assigned to one of a set of known texture classes using a discriminative classifier. The evaluation of the proposed descriptor involves the computation of classification performance for model learning with $N_t$ training texture samples per class. Training images are randomly selected among the $N$ samples available in each class. The remaining $N - N_t$ images are used as test images. The random selection of training samples is repeated 50 times to evaluate the mean and the standard deviation of the correct classification rate. These experiments are carried out with three texture datasets, namely the UIUC, Brodatz and KTH-TIPs databases.
We exploit random forest classifiers [2]. They rely on the construction of an ensemble of classification trees using some form of randomization. A sample is classified by sending it down every tree and aggregating the class distributions of the reached leaves; the forest then uses a voting rule to assign a class to the unknown sample.
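A minimal sketch of this classification stage, assuming scikit-learn's RandomForestClassifier and random stand-in descriptors (the actual features would be the 3k-dimensional vectors of Section 3.3):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: each row plays the role of a 3k-dimensional
# texture descriptor (k = 60 pair categories), with 25 texture classes.
rng = np.random.default_rng(0)
k = 60
X_train = rng.normal(size=(100, 3 * k))
y_train = rng.integers(0, 25, size=100)

# Each sample is sent down every tree; the forest aggregates the per-leaf
# class distributions and assigns the majority-voted class.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
labels = clf.predict(X_train[:5])
```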
4.1. Parameter setting
A set of texture features, including Gabor filters [22], cooccurrence matrices [8], local multifractal features [27], bags-of-keypoints (BoK) [4], the combination scheme of local keypoints [28] and descriptive statistics of visual keypoints [15, 21], was selected to evaluate the relevance of our contribution compared to state-of-the-art techniques. Here, we report the best performance of each approach obtained over the different parameter settings detailed as follows.
Given a texture sample, texture features were characterized as the statistics of the responses to scale-space filters, namely Gabor wavelets at the orientations $\theta \in \{0, \pm\pi/2, \pi\}$ and the frequencies $f \in \{0, 4, 8\}$. In contrast, co-occurrence matrices measure the co-occurrence of grayscale values of the texture image at the neighborhood distances $d \in \{1, 2, 4\}$ with the set of orientations $\theta \in \{0, \pm\pi/4, \pm\pi/2, \pm 3\pi/4, \pi\}$. In addition to these classical texture descriptors, Xu's approach [27] was also tested. It relies on a multifractal description of textures with invariance to viewpoint changes, non-rigid deformations and local affine contrast changes. We tested different parameter settings for Xu's method: density level $ind \in \{1, 8\}$, dimension of the MFS $f \in \{16, 64\}$ and iteration level $ite \in \{8, 10\}$.
Regarding schemes based on visual keypoints, we implemented bags-of-keypoints, i.e. the relative occurrence statistics of the different visual words based on the SIFT descriptor [4]. We also considered the most popular keypoint combination scheme, (Har+Lap)(Sift+Spin), introduced in [28]. These methods were computed with different codebook sizes, here $k \in \{60, 120, 150\}$. These approaches were selected to examine the contribution of the spatial information of keypoints to texture recognition.
The quantitative evaluation also included cooccurrence statistics of visual keypoints, namely Ling's method [15] and Nguyen's method [21]. They were implemented with logarithmically increasing neighborhood sizes, $N_r = 128\log(x)$, where $x$ varies between 1 and $\exp(1)$ with a linear step of 0.05. The computation of the second-order descriptive statistics reported in [21] involved a correction for edge effects and the scaling factor, as well as feature dimension reduction with $k = 60$ categories of visual keypoint pairs. A similar parameter setting was used for the proposed approach based on the multivariate log-Gaussian Cox model.

Figure 4. Image examples from the three texture datasets: (a) UIUCTex, (b) Brodatz, (c) KTH-TIPs.
4.2. Performance results
4.2.1 UIUC dataset
The UIUC dataset involves 25 texture classes, and each class contains 40 images of size 640x480 with strongly varying viewpoint, scale and illumination conditions (Fig.4a). Regarding the comparison of the proposed descriptor to previous work in Tab.3, the mean correct classification rates and standard deviations over 50 random selections are reported for each approach as a function of the number of training samples $N_t$. Considering the results of the classical approaches, i.e. Gabor filters, the cooccurrence matrix and the multifractal features (Xu's method), these experiments clearly demonstrate the interest of inheriting the robustness of visual keypoints for texture recognition in terms of invariance to geometric image distortions and contrast changes: respectively, 67.78%±1.28, 80.12%±1.30 and 93.85%±1.31 vs. 97.84%±0.32 for our method with $N_t = 20$.

Considering state-of-the-art keypoint-based approaches, the reported performances stress the relevance of the proposed probabilistic texture models. The gain is greater than 6.5% compared with the BoK and 1.5% compared with the most popular local keypoint scheme, (Har+Lap)(Sift+Spin), of Zhang's method when 20 training images are considered. These results emphasize the efficiency of the spatial statistical analysis of visual keypoints. On the other hand, the proposed descriptor yields a more robust texture recognition than the cooccurrence statistics of visual keypoints of Ling's method and Nguyen's method, respectively 91.87%±1.38 and 97.34%±0.25 vs. 97.84%±0.32 when 20 training images are considered.
4.2.2 Brodatz dataset
We also evaluated recognition performances for 111 different texture classes from the Brodatz album. Each class of this dataset comprises nine 170x170 sub-images (Fig.4b) extracted from the original images. It might be noted that this dataset does not include scale and illumination changes. The proposed descriptor compares favorably to the other approaches in Tab.4 with a random selection of 1 or 3 training images per class. Our approach reaches up to 96.14%±0.41 correct classification with 3 training images. Improvements greater than 12% are reported compared with the Gabor and cooccurrence features. All other methods reached classification performances below 95.67%. The proposed descriptor is shown to be slightly more robust and stable than Nguyen's method and Zhang's method, here 88.81%±0.92 vs. 87.67%±0.81 and 86.63%±1.05 when $N_t = 1$.
4.2.3 KTH-TIPs dataset
Similar conclusions can be drawn from the third experiment on the KTH-TIPs texture dataset. This dataset involves 10 material classes, each containing 81 images. Texture samples are 200x200 images (except some samples of two classes, brown-bread and cracker) at different scales, illumination directions and object poses (Fig.4c). The classification performance comparisons for this dataset are shown in Tab.7. The proposed approach has a gain of about 1% compared to the best score of all other approaches, namely Nguyen's method: 95.74%±0.45 vs. 95.09%±0.41 when 40 training images are used.
4.2.4 Performance comparison among different keypoint types and different covariance functions
We also report a detailed analysis of the classification performances on the UIUC texture dataset reached by the proposed keypoint-based texture models. Different combinations of detector-descriptor types are evaluated in Tab.8. These results are issued from log-Gaussian Cox models with a Gaussian covariance function. DoG+Sift descriptors are shown to be the most efficient, with a gain from 0.15% to 1.2% compared to the other combinations when 15 or 20 training images are used. When only 5 or 10 training images are considered, the (Hes-Lap)+Daisy descriptor leads to the best classification scores, with 92.13%±1.19 and 95.47%±1.08 respectively. These results might be explained by the greater keypoint density observed when using the DoG+Sift and (Hes-Lap)+Daisy schemes (see Tab.1) compared to the other combinations, such that a finer characterization of the textures can be reached, as well as a more robust parameter estimation of the log-Gaussian Cox model.

Nt   Gabor filter   Cooc. matrix   BoK[4]       Ling[15]     Xu[27]       Zhang[28]    Nguyen[21]   our method
1    31.22±3.14     45.33±3.03     67.25±2.75   67.62±2.93   61.14±2.90   72.53±2.45   75.66±1.65   75.21±1.75
5    45.14±2.54     61.58±2.14     76.38±2.15   78.42±2.33   83.33±2.07   88.62±1.33   91.67±0.93   91.96±1.13
10   57.37±1.93     70.67±1.72     81.12±1.45   84.14±1.72   89.68±1.65   93.17±1.15   94.33±0.78   95.42±0.71
15   61.25±1.52     73.85±1.34     86.35±1.20   86.38±1.25   91.34±1.45   95.33±0.98   96.54±0.53   96.87±0.65
20   67.78±1.28     80.12±1.30     91.28±1.15   91.87±1.38   93.85±1.31   96.67±0.93   97.34±0.25   97.84±0.32
Table 3. Classification rates and standard deviations of the proposed method compared with state-of-the-art approaches on the UIUC dataset.

Nt   Gabor filter   Cooc. matrix   BoK[4]       Ling[15]     Xu[27]       Zhang[28]    Nguyen[21]   our method
1    78.52±1.72     75.42±1.73     83.16±1.50   84.33±1.63   85.95±0.91   86.63±1.05   87.67±0.81   88.81±0.92
3    85.14±1.41     83.22±1.04     92.78±0.91   93.17±0.87   93.41±0.73   94.34±0.43   95.67±0.33   96.14±0.41
Table 4. Classification rates and standard deviations over 50 random selections on the Brodatz texture database.

Nt   Gaussian     Cardinal sine   Hyperbolic
1    75.21±1.75   75.15±1.67      75.03±1.81
5    91.96±1.13   91.63±1.17      91.32±1.19
10   95.42±0.71   95.35±0.75      94.72±0.85
15   96.87±0.65   96.17±0.63      95.43±0.71
20   97.84±0.32   97.15±0.42      96.85±0.38
Table 5. Performance comparison of the proposed model with the different covariance functions on the UIUC dataset.
Given a set of visual keypoints with the DoG+Sift descriptor, the classification performances obtained with the different covariance functions are reported in Tab.5. The best classification performance is obtained with a Gaussian function: 97.84%±0.32 vs. 97.15%±0.42 and 96.85%±0.38 when 20 training images are used.

We emphasize the relevance of the proposed scale-adaptation and dimension reduction schemes in Tab.6. Whereas the use of scale adaptation leads to a gain of about 2%, dimension reduction yields a slightly more robust recognition when 5 to 20 training images are available. The latter can be interpreted as a filtering property of this scheme.
5. Discussion
In this paper, we have further explored texture description and recognition from the joint characterization of the spatial and visual patterns of keypoint sets in texture images. Viewing keypoint sets as realizations of finite spatial random sets, we have shown that, beyond the descriptive statistics proposed in [15, 21], probabilistic keypoint-based models can be developed for visual textures. The proposed models embed invariance properties with respect to contrast changes and geometric image transforms to reach robust texture recognition performance, as proven by the quantitative comparison to state-of-the-art schemes.

Nt   Without scaling effect   Without dimension reduction   Complete model
1    73.27±2.05               75.65±1.82                    75.21±1.75
5    89.12±1.27               91.67±1.15                    91.96±1.13
10   94.76±1.11               95.15±0.75                    95.42±0.71
15   95.12±0.81               96.27±0.71                    96.87±0.65
20   95.89±0.54               97.12±0.35                    97.84±0.32
Table 6. Performance comparison of the proposed model with and without the scale adaptation and the dimension reduction.
As pointed out in [6, 7, 23], it is difficult to recommend a priori the best model among the many models for spatial point pattern analysis in the literature, e.g., Neyman-Scott, shot-noise Cox or Gibbs processes. Here, multivariate log-Gaussian Cox models were selected. They have several appealing features:
• They are fully characterized by the underlying Gaussian fields, hence by the associated mean and covariance features. This makes the interpretation of the model parameters, as well as their estimation, simple.
• Through the various parametric and non-parametric forms that can be considered for the covariance structure, these models are highly flexible and cover a wide range of covariance structures.
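The first property also makes simulation straightforward: a realization is obtained by drawing the latent Gaussian field and sampling Poisson counts from its exponential. A minimal numpy sketch on a 1-D grid, with a hypothetical exponential covariance (parameters `var`, `corr_len` are illustrative, not the paper's fitted values):

```python
import numpy as np

def simulate_lgcp(n_cells=50, mu=1.0, var=0.5, corr_len=5.0, seed=0):
    """Simulate a 1-D log-Gaussian Cox process on a regular grid.

    The intensity is exp(Gaussian field); here an illustrative
    exponential covariance with variance `var` and correlation
    length `corr_len` is assumed.
    """
    rng = np.random.default_rng(seed)
    x = np.arange(n_cells, dtype=float)
    # Exponential covariance between grid cells.
    cov = var * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    # Draw the latent Gaussian field via a Cholesky factorization
    # (small jitter on the diagonal for numerical stability).
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_cells))
    field = mu + chol @ rng.normal(size=n_cells)
    intensity = np.exp(field)        # log-Gaussian intensity surface
    counts = rng.poisson(intensity)  # Poisson counts given the intensity
    return intensity, counts

intensity, counts = simulate_lgcp()
print(intensity.shape, counts.shape)
```

The same two-step recipe (Gaussian field, then conditional Poisson sampling) extends to 2-D grids and to the multivariate case with one field per visual word.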
Besides, compared to simple descriptive statistics, these models have several major advantages:
• They are independent of the a priori selection of the study regions (number of regions and radius sizes), which is a critical setting for the computation of descriptive statistics, and they provide a description of keypoint patterns that is intrinsically free of edge effects and of the image size.
Nt | Gabor filter | Cooc. matrix | BoK [4] | Ling [15] | Xu [27] | Zhang [28] | Nguyen [21] | Our method
5  | 62.32±2.41 | 62.83±2.42 | 64.42±2.81 | 71.07±2.63 | 72.63±2.45 | 78.17±2.35 | 81.34±1.93 | 80.17±2.15
10 | 74.67±2.04 | 73.34±2.22 | 75.83±2.12 | 76.48±2.27 | 81.42±1.95 | 85.42±1.78 | 87.38±1.47 | 86.96±1.53
20 | 82.65±1.69 | 80.45±1.67 | 81.12±1.45 | 83.47±1.48 | 87.18±1.53 | 90.28±1.31 | 92.15±1.26 | 92.42±1.11
30 | 87.75±1.27 | 85.83±1.30 | 88.35±1.20 | 88.58±1.17 | 89.95±1.35 | 92.15±1.05 | 93.67±0.97 | 94.33±0.87
40 | 89.87±0.86 | 88.95±0.77 | 90.18±0.65 | 91.15±1.05 | 91.33±0.97 | 94.33±0.67 | 95.09±0.41 | 95.74±0.45
Table 7. Classification rates and standard deviations over 50 random selections on the KTH-TIPs texture database.
Nt | DoG+Sift [16] | FH+Surf [1] | FH+Brief [3] | (Hes-Lap)+Daisy [24] | (Har-Lap)+(Sift-Spin) [28]
1  | 75.21±1.75 | 75.05±1.94 | 75.43±1.71 | 75.18±1.69 | 74.85±1.87
5  | 91.96±1.13 | 90.73±1.11 | 91.42±1.23 | 92.13±1.19 | 91.15±1.41
10 | 95.42±0.71 | 95.15±0.91 | 95.22±0.85 | 95.47±1.08 | 95.23±0.72
15 | 96.87±0.65 | 96.14±0.63 | 96.43±0.51 | 96.75±0.58 | 96.37±0.61
20 | 97.84±0.32 | 96.75±0.41 | 97.25±0.34 | 97.67±0.35 | 97.14±0.37
Table 8. Performance comparison of the proposed model with different detector-descriptor types on the UIUC dataset.
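Whichever detector-descriptor pair of Tab. 8 is used, the keypoint signatures are first quantized into a codebook of visual words before the multivariate model is fitted. A minimal sketch of this quantization step with Lloyd's k-means on toy descriptors (an illustrative clustering choice, not necessarily the paper's exact one):

```python
import numpy as np

def build_codebook(descriptors, k=8, n_iter=20, seed=0):
    """Quantize keypoint descriptors into k visual words via Lloyd k-means.

    Any descriptor type (SIFT, SURF, BRIEF, DAISY, ...) can be plugged in
    as the rows of `descriptors`; the labels index the visual words that
    mark the keypoints of the multivariate point process.
    """
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct random descriptors.
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    labels = np.zeros(len(descriptors), dtype=int)
    for _ in range(n_iter):
        # Assign each descriptor to its nearest visual word.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
desc = rng.normal(size=(200, 16))        # 200 toy 16-d descriptors
centers, labels = build_codebook(desc, k=8)
print(centers.shape, labels.shape)       # (8, 16) (200,)
```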
• They deliver a compact representation of keypoint-based information, here a 3k-dimensional feature space vs. the (Nr + 1)k complexity required by descriptive statistics [15, 21]. This compactness is of great benefit in the application of learning techniques, for which lower-dimensional feature spaces are also preferred.
The proposed probabilistic models also offer additional generalization properties. They are associated with an analytical formulation of the likelihood function of a keypoint pattern and with simple simulation schemes. These features are of great interest for various applications. For instance, they could benefit the definition of well-founded texture similarity measures from model distances. This is a key feature of the log-Gaussian Cox model, as the similarity measure can be defined from distances between Gaussian fields, for which analytical formulations may be derived. While not investigated here, since random forests were considered for comparison to previous work, such similarity measures would be of great interest for kernel-based learning as well as for other applications such as image indexing or segmentation. These aspects will be further explored in future work.
References
[1] H. Bay, T. Tuytelaars, and L. V. Gool. Surf: Speeded up robust
features. ECCV, 1:404–417, 2006.
[2] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[3] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. Brief: Binary robust
independent elementary features. ECCV, IV:778–792, 2010.
[4] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with
bags of keypoints. ECCV, pages 1–22, 2004.
[5] M. Cummins and P. Newman. FAB-MAP: Probabilistic Localization
and Mapping in the Space of Appearance. IJRR, 27(6):647–665,
2008.
[6] P. Diggle, B. Rowlingson, and T. Su. Point process methodology
for on-line spatio-temporal disease surveillance. Environmetrics,
16(5):423–434, 2005.
[7] F. Lafarge, G. Gimel'farb, and X. Descombes. Geometric feature extraction by a multi-marked point process. PAMI, 32(9):1597–1609, 2010.
[8] R. Haralick. Statistical and structural approaches to textures. Proceedings of the IEEE, 67(5):786–804, May 1979.
[9] C. Harris and M. Stephens. A combined corner and edge detector.
Proc. of the Alvey Vision Conf., pages 147–151, 1988.
[10] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recognition, 42(3):425–436, 2009.
[11] T. Kadir, A. Zisserman, and M. Brady. An affine invariant salient
region detector. ECCV, pages 345–457, 2004.
[12] S. Kotsiantis, I. Zaharakis, and P. Pintelas. Machine learning: a re-
view of classification and combining techniques. Artificial Intelli-
gence Review, 26(3):159–190, 2006.
[13] S. Lazebnik, C. Schmid, and J. Ponce. A sparse texture representa-
tion using local affine regions. PAMI, 27(8):1265–1278, 2005.
[14] T. Lindeberg. Feature detection with automatic scale selection. IJCV,
30(2):79–116, 1998.
[15] H. Ling and S. Soatto. Proximity distribution kernels for geometric
context in category recognition. ICCV, pages 1–8, 2007.
[16] D. Lowe. Distinctive image features from scale-invariant keypoints.
IJCV, 60(2):91–110, 2004.
[17] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline
stereo from maximally stable extremal regions. BMVC, pages 384–
393, 2002.
[18] K. Mikolajczyk and C. Schmid. A performance evaluation of local
descriptors. PAMI, 27(10):1615–1630, 2005.
[19] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas,
F. Schaffalitzky, T. Kadir, and L. V. Gool. A comparison of affine
region detectors. IJCV, 65:43–72, 2005.
[20] J. Møller, A. Syversveen, and R. Waagepetersen. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3):451–482, 1998.
[21] H.-G. Nguyen, R. Fablet, and J.-M. Boucher. Spatial statistics of
visual keypoints for texture recognition. ECCV, IV:764–777, 2010.
[22] T. Randen and J. H. Husøy. Filtering for texture classification: A
comparative study. PAMI, 21:291–310, 1999.
[23] D. Stoyan and H. Stoyan. Fractals, Random Shapes and Point Fields. Wiley, Chichester, 1994.
[24] E. Tola, V. Lepetit, and P. Fua. Daisy: An efficient dense descriptor
applied to wide baseline stereo. PAMI, 5:815–830, 2010.
[25] T. Tuytelaars and L. V. Gool. Matching widely separated views based
on affine invariant regions. IJCV, 59(1):61–85, 2004.
[26] G.-S. Xia, J. Delon, and Y. Gousseau. Shape-based invariant texture
indexing. IJCV, 88(3):382–403, 2010.
[27] Y. Xu, H. Ji, and C. Fermuller. Viewpoint invariant texture descrip-
tion using fractal analysis. IJCV, 83(1):85–100, 2009.
[28] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local fea-
tures and kernels for classification of texture and object categories: a
comprehensive study. IJCV, 73(2):213–238, 2007.