# Intersection and Signed-Intersection Kernels for Intervals


Francisco J. RUIZ ᵃ,¹, Cecilio ANGULO ᵃ and Núria AGELL ᵇ

ᵃ Knowledge Engineering Research Group. Universitat Politècnica de Catalunya

ᵇ Department of Quantitative Methods Management. ESADE-Universitat Ramon Llull

Abstract. In this paper two kernels for interval data based on the intersection operation are introduced. First, it is demonstrated that the intersection length of two intervals and a signed variant of the intersection are positive definite (PD) and conditionally positive definite (CPD) kernels, respectively. The performance of these kernels is then evaluated using one- and two-dimensional examples.

Keywords. Qualitative Reasoning, interval analysis, kernel methods

Introduction

In some practical situations, the exact value of a variable is unknown and only an interval of possible values of this variable is available. This happens, for example, if the value is measured by a non-ideal instrument, or if it is the average of several measurements. It is also due to the finite nature of computers, which cannot cope with the continuous and infinite aspects of real numbers. Many machine learning techniques are specially conceived to deal with infinitely precise variables. Specifically, kernel methods, whose best known representative is the Support Vector Machine [2], are initially applied with kernels defined on Rⁿ, such as polynomial kernels (homogeneous or not), the Gaussian kernel, the sigmoid kernel, etc. [3]. The kernel formulation is initially used for converting nonlinear problems into linear problems but, due to their nature, these methods are suitable for dealing with any kind of data, even with non-structured data. This is possible by mapping the data into a new space, the feature space. The learning then takes place in this new space. The structure of the input space does not matter; only the Euclidean structure of the feature space is used.

This characteristic has been used to apply kernel methods with any kind of non-

structured data such as biosequences, images, graphs and text documents [4]. The kernel

formulation not only allows the use of these sets of data in learning processes, but also

permits considering a metric structure and a similarity measure in this set.

In order to use kernel methods with interval data, it is proven that the length of the intersection of two intervals is a positive semi-definite kernel. From the intersection length function, another kernel, the signed intersection, is also introduced in this paper. The signed intersection is not a positive semi-definite kernel, but it is a conditionally positive semi-definite kernel suitable for many learning algorithms such as Support Vector Machines. These two kernels allow machine learning techniques based on kernel methods to be applied when data are described by interval variables.

¹ Corresponding Author: EPSEVG-UPC. Avda. Víctor Balaguer, s/n, 08800 Vilanova i la Geltrú; E-mail: francisco.javier.ruiz@upc.edu

The rest of the paper is structured as follows. In Section 1 an overview of Kernel

Methods is presented. In Section 2 it is demonstrated that the length of the intersection

of two intervals is a positive semi-definite kernel. A signed variant of this intersection

kernel is also introduced. Section 3 provides examples that illustrate the use of the interval kernels presented in the previous sections using one- and two-dimensional data. The last section presents some conclusions and future lines of work.

1. Kernelization

Kernel Methods are a class of algorithms for automatic learning that enable the analysis of nonlinear patterns with the efficiency of linear methods. The best known Kernel method is the Support Vector Machine (SVM), but other methods such as Principal Components Analysis (PCA), Fisher Linear Discriminant Analysis (LDA) or the Perceptron algorithm have also been shown to be capable of operating with kernels. SVMs belong to a family of generalized linear classifiers that maximize the geometric margin between classes. The fact that the formulation of the linear version of SVM and other 'kernelizable' linear methods depends solely on the dot product between input patterns allows this dot product to be replaced with a suitable function named 'kernel'. Mercer's theorem [6] states that any symmetric and Positive semi-Definite (PD) function K(x,y) can be expressed as a dot product in a certain high-dimensional space. The substitution of the dot product by the kernel can be interpreted as mapping the original input patterns into a different, higher dimensional space where the linear classifier could separate the classes. The main advantage of kernel functions is that it is not necessary to consider this feature space explicitly and, hence, the dimension of this feature space has no effect on the algorithm.

The use of PD kernels in SVM ensures the Hessian to be positive definite and allows

the algorithm to find a unique global minimum value. However, in the SVM algorithm it

is possible to use a kernel which does not satisfy Mercer’s condition, i.e. non-PD kernels.

Even for non-PD kernels, one might still find a positive definite Hessian, in which case

the algorithm will converge perfectly.

Conditionally Positive semi-Definite (CPD) kernels form a larger class of kernels than that of PD kernels. It will emerge that SVM and other learning algorithms work with this class, rather than only with PD kernels. In the same way as PD kernels can be considered as dot products in a certain feature space, CPD kernels can be used as generalized distances in feature spaces [7].

2. Intersection kernel and Signed Intersection kernel

In this section it is proven that the length of the intersection of two intervals is a PD

kernel defined on the interval set. Another function based on the intersection length is also proposed as a kernel, although it is a CPD kernel rather than a PD kernel.


2.1. Intersection Kernel

We first prove that the length of the intersection of two intervals is a PD kernel. In order to prove this, we will consider a feature space formed by the square integrable real functions. The intersection length is equivalent to the usual dot product in this feature space. Once proven to be a kernel, it is not necessary to use the feature space again to calculate K(I,J).

In order to fix the notation that we are going to use in the rest of the paper, we consider two different representations of a bounded and closed interval: the endpoint representation, I = [a,b] = {x ∈ R : a ≤ x ≤ b}, and the midpoint-radius representation, I = B(c,r) = {x ∈ R : |x − c| ≤ r}.

The set of all the bounded and closed intervals defined on R is named I(R). By

considering closed intervals, it is possible to identify a real number a ∈ R with the

singleton [a,a], i.e. R ⊂ I(R).

Theorem 1 Let I1, I2 ∈ I(R) be two bounded and closed intervals. The function

K∩(I1,I2) = length(I1 ∩ I2)

is a positive definite kernel.

Proof

Let φ : I(R) → L2(−∞,+∞) be the mapping into the Hilbert space of square integrable real functions defined as:

$$\phi(I) = g_{[a,b]}, \qquad g_{[a,b]}(x) = \begin{cases} 1 & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

with I = [a,b] ∈ I(R).

The composition of this function with the usual dot product in L2,

$$K_\cap([a_1,b_1],[a_2,b_2]) = \int_{-\infty}^{+\infty} g_{[a_1,b_1]}(x)\, g_{[a_2,b_2]}(x)\, dx = \operatorname{length}([a_1,b_1] \cap [a_2,b_2]) \tag{2}$$

leads to K∩.
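The following is a minimal Python sketch of the intersection kernel in the endpoint representation, together with a numerical sanity check of the PD property. The function names, the random sample of intervals and the tolerance are illustrative assumptions, not part of the paper; the check only samples coefficient vectors and does not replace the proof.

```python
import random


def intersection_kernel(i1, i2):
    """Length of the intersection of two closed intervals given as (a, b)."""
    a1, b1 = i1
    a2, b2 = i2
    return max(0.0, min(b1, b2) - max(a1, a2))


def quadratic_form(kernel, items, alphas):
    """Sum over i, j of alpha_i * alpha_j * kernel(x_i, x_j)."""
    return sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(alphas, items)
               for aj, xj in zip(alphas, items))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility

# Hypothetical sample: 8 intervals with left endpoint in [0, 1].
intervals = []
for _ in range(8):
    a = rng.uniform(0.0, 1.0)
    intervals.append((a, a + rng.uniform(0.0, 0.5)))

# For a PD kernel the quadratic form is non-negative for any real coefficients.
smallest = min(
    quadratic_form(intersection_kernel, intervals,
                   [rng.uniform(-1.0, 1.0) for _ in intervals])
    for _ in range(200)
)

print(intersection_kernel((0.0, 2.0), (1.0, 3.0)))  # overlap [1, 2] -> 1.0
print(smallest >= -1e-9)
```

Only the endpoints are needed to evaluate the kernel, which illustrates the remark that the feature space is not used again once the kernel property has been established.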

2.2. Signed Intersection Kernel

If we consider two intervals I1 = B(c1,r1) and I2 = B(c2,r2), the expression of the Intersection Kernel can be written as:

$$K_\cap(I_1,I_2) = \max\{0, \min\{2r_1,\, 2r_2,\, r_1 + r_2 - |c_1 - c_2|\}\} \tag{3}$$

In this expression, when the intervals are disjoint, the value of r1 + r2 − |c1 − c2| is negative and hence K∩(I1,I2) = 0. This fact makes this kernel inappropriate for discriminating between disjoint intervals. In this section a modification of the Intersection Kernel will be considered. This modification leads to a simpler and faster-to-compute expression that can also discriminate between disjoint intervals.
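As a quick check of the midpoint-radius expression of the Intersection Kernel, the sketch below compares it with the endpoint form min(b1,b2) − max(a1,a2), clipped at zero, over a small grid of intervals. The function names and the grid are illustrative assumptions; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def intersection_endpoints(a1, b1, a2, b2):
    """Intersection length of [a1, b1] and [a2, b2]."""
    return max(0, min(b1, b2) - max(a1, a2))


def intersection_midpoint_radius(c1, r1, c2, r2):
    """Intersection length of B(c1, r1) and B(c2, r2)."""
    return max(0, min(2 * r1, 2 * r2, r1 + r2 - abs(c1 - c2)))


grid = [Fraction(k, 4) for k in range(9)]  # endpoints 0, 1/4, ..., 2

mismatches = 0
for a1 in grid:
    for b1 in grid:
        if b1 < a1:
            continue
        for a2 in grid:
            for b2 in grid:
                if b2 < a2:
                    continue
                # Convert endpoints to midpoint and radius.
                c1, r1 = (a1 + b1) / 2, (b1 - a1) / 2
                c2, r2 = (a2 + b2) / 2, (b2 - a2) / 2
                if (intersection_endpoints(a1, b1, a2, b2)
                        != intersection_midpoint_radius(c1, r1, c2, r2)):
                    mismatches += 1

print(mismatches)  # 0
```

The agreement follows from writing min(b1,b2) − max(a1,a2) as the minimum of the four differences b1 − a1, b2 − a2, b1 − a2 and b2 − a1.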


Definition Let I1 = B(c1,r1) and I2 = B(c2,r2) be two intervals defined by their midpoints and radii. The Signed Intersection Kernel is:

$$K_S(I_1,I_2) = r_1 + r_2 - |c_1 - c_2| \tag{4}$$

The next proposition shows the relation between this function and the Intersection Kernel.

Proposition 1 Let I1, I2 ∈ I(R) be intervals. Then:

1. KS(I1,I2) ≥ K∩(I1,I2) for embedded intervals, I1 ⊂ I2 or I2 ⊂ I1.

2. KS(I1,I2) ≤ K∩(I1,I2) for disjoint intervals, I1 ∩ I2 = ∅.

3. KS(I1,I2) = K∩(I1,I2) otherwise.

Proof

1. If I1 ⊂ I2, then |c1 − c2| ≤ r2 − r1, i.e. r1 + r2 − |c1 − c2| ≥ r1 + r2 + r1 − r2 = 2r1. Since in this case K∩(I1,I2) = length(I1) = 2r1, it follows that KS(I1,I2) ≥ K∩(I1,I2). A similar proof for I2 ⊂ I1 leads to the same result.

2. If I1 ∩ I2 = ∅ then |c1 − c2| > r1 + r2. Hence, KS(I1,I2) ≤ 0 = K∩(I1,I2). (Strict equality appears for the singleton case.)

3. For intervals that are neither disjoint nor embedded, the length of the (non-empty) intersection is strictly lower than the length of each interval. Hence K∩(I1,I2) = r1 + r2 − |c1 − c2| = KS(I1,I2).
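The three cases of Proposition 1 can be illustrated numerically. In the sketch below the intervals are hypothetical examples, chosen so that the floating-point values are exact, and the function names are not taken from the paper.

```python
def k_intersection(i1, i2):
    """Intersection Kernel for intervals given as (midpoint, radius)."""
    (c1, r1), (c2, r2) = i1, i2
    return max(0.0, min(2 * r1, 2 * r2, r1 + r2 - abs(c1 - c2)))


def k_signed(i1, i2):
    """Signed Intersection Kernel for intervals given as (midpoint, radius)."""
    (c1, r1), (c2, r2) = i1, i2
    return r1 + r2 - abs(c1 - c2)


# Case 1, embedded: B(2, 0.5) = [1.5, 2.5] lies inside B(2, 2) = [0, 4].
embedded = ((2.0, 0.5), (2.0, 2.0))
# Case 2, disjoint: B(1.5, 0.5) = [1, 2] and B(6.5, 0.5) = [6, 7].
disjoint = ((1.5, 0.5), (6.5, 0.5))
# Case 3, partial overlap: B(1, 1) = [0, 2] and B(2.5, 1) = [1.5, 3.5].
overlap = ((1.0, 1.0), (2.5, 1.0))

for name, (i1, i2) in [("embedded", embedded),
                       ("disjoint", disjoint),
                       ("overlap", overlap)]:
    print(name, k_signed(i1, i2), k_intersection(i1, i2))
# embedded 2.5 1.0
# disjoint -4.0 0.0
# overlap 0.5 0.5
```

In the disjoint case the Intersection Kernel is zero regardless of how far apart the intervals are, whereas the Signed Intersection Kernel becomes more negative as the gap grows.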

The Signed Intersection is not a PD function, since it does not satisfy the Cauchy-Schwarz inequality². Nevertheless, the Signed Intersection is conditionally positive semi-definite (CPD). This kind of function belongs to a larger class of kernels that can be considered as generalized distances in a certain feature space, in the same way that PD kernels are dot products in feature spaces [7].
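The counterexample with I1 = [1,2] and I2 = [6,7] can be reproduced with a few lines of Python. This is only an illustrative sketch with hypothetical function names; it evaluates the Cauchy-Schwarz condition and the quadratic form for two particular coefficient vectors.

```python
def k_signed(i1, i2):
    """Signed Intersection Kernel for intervals given by endpoints (a, b)."""
    (a1, b1), (a2, b2) = i1, i2
    c1, r1 = (a1 + b1) / 2, (b1 - a1) / 2
    c2, r2 = (a2 + b2) / 2, (b2 - a2) / 2
    return r1 + r2 - abs(c1 - c2)


i1, i2 = (1.0, 2.0), (6.0, 7.0)
k11, k22, k12 = k_signed(i1, i1), k_signed(i2, i2), k_signed(i1, i2)

# Cauchy-Schwarz would require k12**2 <= k11 * k22 for a PD kernel.
print(k12 ** 2, k11 * k22)     # 16.0 1.0

# Quadratic form with coefficients (1, 1): negative, so K_S is not PD.
print(k11 + 2 * k12 + k22)     # -6.0

# Quadratic form with zero-sum coefficients (1, -1): non-negative.
print(k11 - 2 * k12 + k22)     # 10.0
```

The last value is consistent with the CPD property, which only constrains coefficient vectors whose entries sum to zero.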

Definition Let X ≠ ∅ be a set. A bivariate function K : X × X → R is a conditionally positive semi-definite (CPD) kernel if, for all α1,...,αm ∈ R with $\sum_{i=1}^{m} \alpha_i = 0$ and all x1,...,xm ∈ X, it satisfies

$$\sum_{i,j=1}^{m} \alpha_i \cdot \alpha_j \cdot K(x_i,x_j) \ge 0 \tag{5}$$

In order to demonstrate that the signed intersection kernel is a CPD kernel, we need

two previous results.

Lemma 1 K(x,y) = −‖x − y‖², with x, y ∈ Rⁿ, is a CPD kernel.

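Proof

$$\begin{aligned}
\sum_{i,j} \alpha_i \alpha_j K(x_i,x_j) &= -\sum_{i,j} \alpha_i \alpha_j \|x_i - x_j\|^2 \\
&= -\sum_{i} \alpha_i \sum_{j} \alpha_j \|x_j\|^2 - \sum_{j} \alpha_j \sum_{i} \alpha_i \|x_i\|^2 + 2 \sum_{i,j} \alpha_i \alpha_j \langle x_i, x_j \rangle \\
&= 0 + 0 + 2 \sum_{i,j} \alpha_i \alpha_j \langle x_i, x_j \rangle \ge 0
\end{aligned}$$

since $\sum_{i} \alpha_i = 0$.

² If we consider I1 = [1,2] and I2 = [6,7], then 4² = |KS(I1,I2)|² > KS(I1,I1) · KS(I2,I2) = 1 · 1 and, hence, the Cauchy-Schwarz inequality is not satisfied.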

Lemma 2 K(x,y) = −|x − y|, with x,y ∈ R is a CPD kernel.

Proof

It has been demonstrated [1] that if K is a CPD kernel, then −(−K)^β is also a CPD kernel for all β ∈ (0,1). Using β = 1/2 and the previous lemma with n = 1, K(x,y) = −|x − y| is a CPD kernel.

Theorem 2 The signed intersection kernel KS is a CPD kernel.

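Proof

$$\begin{aligned}
\sum_{i,j} \alpha_i \alpha_j K_S(I_i,I_j) &= \sum_{i,j} \alpha_i \alpha_j \left(r_i + r_j - |c_j - c_i|\right) \\
&= \sum_{j} \alpha_j \sum_{i} \alpha_i r_i + \sum_{i} \alpha_i \sum_{j} \alpha_j r_j + \sum_{i,j} \alpha_i \alpha_j \left(-|c_j - c_i|\right) \\
&= 0 + 0 + \sum_{i,j} \alpha_i \alpha_j \left(-|c_j - c_i|\right) \ge 0
\end{aligned}$$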
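The CPD property of the signed intersection kernel can be checked numerically by evaluating the quadratic form for coefficient vectors that sum to zero. The sketch below uses hypothetical random intervals, an arbitrary seed and a small tolerance for rounding; it is a sanity check under these assumptions, not a substitute for the proof.

```python
import random


def k_signed(i1, i2):
    """Signed Intersection Kernel for intervals given as (midpoint, radius)."""
    (c1, r1), (c2, r2) = i1, i2
    return r1 + r2 - abs(c1 - c2)


def quadratic_form(kernel, items, alphas):
    """Sum over i, j of alpha_i * alpha_j * kernel(x_i, x_j)."""
    return sum(ai * aj * kernel(xi, xj)
               for ai, xi in zip(alphas, items)
               for aj, xj in zip(alphas, items))


rng = random.Random(1)  # arbitrary seed for reproducibility


def zero_sum_coefficients(n):
    """Random coefficients shifted so that they sum to (numerically) zero."""
    raw = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(raw) / n
    return [x - mean for x in raw]


# Hypothetical sample: 10 intervals with midpoints in [0, 10], radii in [0, 1].
intervals = [(rng.uniform(0.0, 10.0), rng.uniform(0.0, 1.0)) for _ in range(10)]

smallest = min(
    quadratic_form(k_signed, intervals, zero_sum_coefficients(len(intervals)))
    for _ in range(300)
)
print(smallest >= -1e-9)
```

Because the coefficients sum to zero, the radius terms cancel and only the term in −|cj − ci| contributes to the quadratic form.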

In [7] and [1] it is argued that CPD kernels are a natural choice whenever one deals with a translation invariant problem, such as the SVM algorithm (maximization of the margin of separation between two classes of data is independent of the position of the origin). It will be shown in the next section that the signed intersection kernel is useful when the classification problem does not depend on the radius, just on the center.

3. Examples

To illustrate how these interval kernels may be used in learning tasks, three application

examples using SVM are considered here.

The first example consists of the classification of subintervals of the interval [0,1] into two classes named, as usual, −1 and 1. An interval belongs to class 1 if its midpoint belongs to the set [0.15,0.40] ∪ [0.50,0.65] ∪ [0.80,0.90], whatever its radius. Otherwise it belongs to class −1. In Figure 1a, some intervals of both classes are shown.

To better visualize the performance of the two kernels, a diagram called the midpoint-radius diagram is used. The midpoint-radius diagram associates each interval I = B(c,r) with the point (c,r) ∈ R². This diagram uses the upper half-plane above the horizontal axis (the midpoint axis) to represent all the bounded and closed intervals. The midpoint axis contains all the real numbers.


Figure 1. Example 1. Training patterns and results. (a) Some intervals of both classes 1 and −1. (b) The two zones in the midpoint-radius diagram. (c) Classification of test patterns in Example 1 in the midpoint-radius diagram for the two kernels used.

Figure 1b represents the two zones in the midpoint-radius diagram that the classifier must find in the first example.

250 patterns were generated for training and 5000 patterns for testing, in order to visualize the two classes predicted by the classifier. An SVM was trained using the Intersection Kernel and the Signed Intersection Kernel. By representing the class provided by the classifier with two different colors, the shapes of these classes in the midpoint-radius diagram are easily compared (Figure 1c).

With the Intersection Kernel, the best test error is 28.20%, reached using C = 10 (Figure 1c, left side). In this case, the Intersection Kernel does not yield a good classification. The best choice in this example is the Signed Intersection Kernel, with a test error of 0.80% (with C = ∞) (Figure 1c, right side). It can be concluded that the Signed Intersection Kernel is the more suitable kernel when the classification does not depend on the radius but only on the midpoint.

In the second example 250 intervals from [0,1] are also generated, but in this case we consider as class 1 those intervals that are mostly included in the interval I0 = [0.3,0.5], i.e. an interval I belongs to class 1 if length(I ∩ I0) ≥ (1/2) length(I). If c0 and r0 are the midpoint and the radius of I0, the intervals I = B(c,r) belonging to class 1 are those which satisfy c0 − r0 ≤ c ≤ c0 + r0 and r ≤ 2r0.
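The two descriptions of class 1 in the second example, the overlap condition and the region in the midpoint-radius diagram, can be compared on a grid of intervals. The sketch below is illustrative: the function names and the grid are assumptions, radii are taken strictly positive, and exact rational arithmetic avoids rounding at the class boundary.

```python
from fractions import Fraction

# I0 = [0.3, 0.5] in midpoint-radius form: c0 = 2/5, r0 = 1/10.
C0, R0 = Fraction(2, 5), Fraction(1, 10)


def intersection_length(c1, r1, c2, r2):
    """Intersection length of B(c1, r1) and B(c2, r2)."""
    return max(0, min(2 * r1, 2 * r2, r1 + r2 - abs(c1 - c2)))


def label_by_overlap(c, r):
    """Class 1 if at least half of I = B(c, r) lies inside I0."""
    # Half of length(I) = 2r is r.
    return 1 if intersection_length(c, r, C0, R0) >= r else -1


def label_by_region(c, r):
    """Class 1 if (c, r) lies in the zone c0 - r0 <= c <= c0 + r0, r <= 2 r0."""
    return 1 if (C0 - R0 <= c <= C0 + R0 and r <= 2 * R0) else -1


# Grid of midpoints 0, 1/40, ..., 1 and radii 1/40, ..., 1/2 (all positive).
disagreements = sum(
    1
    for ci in range(41)
    for ri in range(1, 21)
    if label_by_overlap(Fraction(ci, 40), Fraction(ri, 40))
    != label_by_region(Fraction(ci, 40), Fraction(ri, 40))
)
print(disagreements)  # 0
```

For r > 0 the overlap condition min{2r, 2r0, r + r0 − |c − c0|} ≥ r reduces to r ≤ 2r0 and |c − c0| ≤ r0, which is the stated region.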


Figure 2. Example 2. Training patterns and results. (a) Some intervals of both classes 1 and −1. (b) The two zones in the midpoint-radius diagram. (c) Classification of test patterns in Example 2 in the midpoint-radius diagram for the two kernels used.

Figures 2a and 2b show some intervals of both classes and their respective zones in the midpoint-radius diagram.

As can be seen in Figure 2c, in this example the Signed Intersection Kernel (right side) does not give a good classification (error = 15.56% with C = 1000). However, the Intersection Kernel (left side) gives an acceptable result (error = 2.6% with C = 1000).

The third example involves two-dimensional interval data. In this case 250 two-dimensional intervals (rectangles) have been generated from the square with opposite vertices (0,0) and (1,1). Rectangles of class 1 are those that lie mostly within the rectangle with opposite vertices (0.3,0.4) and (0.5,0.7). In Figure 3a some of the training patterns are represented using two different colors to identify the two classes. In Figure 3b, the midpoints of the training rectangles are represented in order to visualize the difficulty of the classification. In Figure 3c the midpoints of 5000 test patterns are represented for the two kernels used. The classification error was 4.52% (C = 100) using the Intersection Kernel and 29.90% (C = 10) using the Signed Intersection Kernel. These results reveal that the Intersection Kernel performed well. The Signed Intersection Kernel only performs well if the problem does not depend on the precision of the intervals, just on their position.


Figure 3. Example 3. Training patterns and results. (a) Some intervals of both classes 1 and −1. (b) Midpoints of training patterns of both classes. (c) Classification of test patterns in Example 3 in the midpoint-radius diagram for the two kernels used.

4. Conclusion

In this paper, we have proven that the length of the intersection of two intervals is a symmetric positive definite function and hence a suitable kernel. On the other hand, when the problem does not depend explicitly on the precision of the interval, another kernel associated with the intersection, the signed intersection, is a good choice, although it is a non-PD kernel.

One of the more interesting advantages of the intersection kernel is that it can easily be extended to other, more general kinds of data, such as multidimensional intervals or even more general sets including fuzzy sets. In this way it will be possible to extend kernel methods to more general kinds of data.

Acknowledgements

This research has been partially supported by the projects AURA (TIN2005-08873-C02) and ADA-EXODUS (DPI2006-15630-C02) of the Spanish Ministry of Education and Science.


References

[1] C. Berg, J.P.R. Christensen and P. Ressel, Harmonic Analysis on Semigroups, Springer Verlag, (1984).

[2] B.E. Boser, I. Guyon and V. Vapnik, A Training Algorithm for Optimal Margin Classifiers, Computational Learning Theory, pp 144–152, (1992).

[3] N. Cristianini and J. Shawe-Taylor, An introduction to Support Vector Machines and other Kernel-based learning methods, Cambridge University Press, (2000).

[4] I. Guyon, SVM application list. http://www.clopinet.com/isabelle/Projects/SVM/applist.html, Berkeley, (2008).

[5] Z. Kulpa, Diagrammatic representation for interval arithmetic, Linear Algebra and its Applications 324, pp 55–80, (2001).

[6] J. Mercer, Functions of positive and negative type and their connection with the theory of integral equations, Philos. Trans. Roy. Soc. London, A (209), pp 415–446, (1909).

[7] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, (2001).

[8] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, (1995).

[9] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, (1998).