Content uploaded by Gaëlle Loosli on Sep 22, 2015. Content may be subject to copyright.

Invariances Invariant LASVM Experiments Conclusion

What to use for invariances?

Affine deformations (linear approximations)

Thickening

Elastic deformations

x_affine(i,j) = x(i,j) + α_x·d_x·t_x(i,j) + α_y·d_y·t_y(i,j)

Rotations with tangent vectors

Translations with tangent vectors

Gaëlle Loosli LASVM applied to invariant problems 5/20
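The affine (tangent-vector) approximation above can be sketched in NumPy. This is a minimal illustration, not code from the talk: the function names and the finite-difference construction of the translation tangent vectors are my own assumptions.

```python
import numpy as np

def tangent_vectors(img):
    """Translation tangent vectors t_x, t_y: central finite-difference
    gradients of the image along each axis (zero at the borders)."""
    tx = np.zeros_like(img)
    ty = np.zeros_like(img)
    tx[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0   # derivative along i
    ty[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0   # derivative along j
    return tx, ty

def affine_deform(img, alpha_x, dx, alpha_y, dy):
    """First-order approximation of a small shift (dx, dy):
    x_affine(i,j) = x(i,j) + alpha_x*dx*t_x(i,j) + alpha_y*dy*t_y(i,j)."""
    tx, ty = tangent_vectors(img)
    return img + alpha_x * dx * tx + alpha_y * dy * ty
```

Because the approximation is linear, rotations and translations can both be expressed by choosing the displacement coefficients appropriately.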

Elastic deformation via deformation fields:

x_deformed(i,j) = x(i,j) + f_x(i,j)·t_x(i,j) + f_y(i,j)·t_y(i,j)
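The elastic variant replaces the global coefficients with per-pixel displacement fields. A minimal sketch, assuming smoothed random fields; Simard et al. use Gaussian smoothing, and the repeated box blur here is a rough stand-in of my own choosing:

```python
import numpy as np

def elastic_deform(img, fx, fy):
    """x_deformed(i,j) = x(i,j) + f_x(i,j)*t_x(i,j) + f_y(i,j)*t_y(i,j),
    with t_x, t_y the translation tangent vectors (image gradients)."""
    tx, ty = np.gradient(img)
    return img + fx * tx + fy * ty

def smooth_field(shape, strength, rng, passes=4):
    """A noisy displacement field smoothed by repeated box blurring
    (an assumed stand-in for the Gaussian smoothing of Simard et al.)."""
    f = rng.uniform(-1.0, 1.0, shape)
    for _ in range(passes):
        padded = np.pad(f, 1, mode="edge")
        f = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
             padded[1:-1, :-2] + padded[1:-1, 2:] + padded[1:-1, 1:-1]) / 5.0
    return strength * f
```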

[SLDV00] P. Y. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition – tangent distance and tangent propagation. International Journal of Imaging Systems and Technology, 11(3), 2000.

[SSP03] Patrice Y. Simard, Dave Steinkraus, and John C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In ICDAR '03: Proceedings of the Seventh International Conference on Document Analysis and Recognition, page 958, Washington, DC, USA, 2003. IEEE Computer Society.


Affine deformations (linear approximations) [SLDV00]

Thickening

Elastic deformations [SSP03]

Random deformation

x_transformed(i,j) = x(i,j) + α·(f_x(i,j)·t_x(i,j) + f_y(i,j)·t_y(i,j)) + β·√(t_x(i,j)² + t_y(i,j)²)

What do we need?

Tangent vectors - linear deformations (t_x, t_y)

Deformation fields (regular or noisy) (f_x, f_y)

Deformation strength parameters (α, β)

We add translations of 1 and 2 pixels in any of the 8 directions.
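The combined random deformation can be sketched directly from the formula; this is an illustrative implementation under my own naming, where the β term adds thickening proportional to the local gradient magnitude:

```python
import numpy as np

def random_deform(img, fx, fy, alpha, beta):
    """x_transformed = x + alpha*(f_x*t_x + f_y*t_y)
                         + beta*sqrt(t_x^2 + t_y^2).
    alpha scales the elastic field term; beta scales the thickening
    term (gradient magnitude)."""
    tx, ty = np.gradient(img)
    return img + alpha * (fx * tx + fy * ty) + beta * np.sqrt(tx**2 + ty**2)
```

With α = β = 0 the image is returned unchanged, which makes the original example a special case of the transformed family.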


How to incorporate invariances in learning?

Invariances with SVM methods

Modify the cost function

Learn trajectories

Add some deformed points

Modify the cost function - Tangent kernels [HK02, CS02]

Changes the distance between elements using the tangent distance.

[CS02] O. Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, pages 609–616, Cambridge, MA, USA, 2002. MIT Press.

[HK02] B. Haasdonk and D. Keysers. Tangent distance kernels for support vector machines. In International Conference on Pattern Recognition, Quebec City, Canada, August 2002.

Gaëlle Loosli LASVM applied to invariant problems 6/20
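A tangent kernel of this kind can be sketched as follows. This is a simplified one-sided linear tangent distance plugged into an RBF kernel, in the spirit of Haasdonk and Keysers (2002); the closed-form projection and the function names are my own illustration, not the papers' exact formulation:

```python
import numpy as np

def tangent_distance_sq(a, b, tangents_a, ):
    """One-sided squared tangent distance:
    min over gamma of ||a + T*gamma - b||^2, where the columns of T are
    the tangent vectors of a. Solved as a least-squares projection."""
    T = np.stack([t.ravel() for t in tangents_a], axis=1)  # (d, k)
    r = (b - a).ravel()
    gamma, *_ = np.linalg.lstsq(T, r, rcond=None)
    resid = r - T @ gamma
    return float(resid @ resid)

def tangent_rbf_kernel(a, b, tangents_a, sigma=1.0):
    """RBF kernel with the tangent distance substituted for the
    Euclidean distance."""
    return np.exp(-tangent_distance_sq(a, b, tangents_a) / (2 * sigma**2))
```

By construction, two patterns that differ only by a motion along a tangent direction have distance near zero, so the kernel treats them as (almost) identical.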

Learn trajectories - SDPM [GH04]

The idea is to classify the trajectories defined by a transformation with a continuous parameter.

[GH04] Thore Graepel and Ralf Herbrich. Invariant pattern recognition by semi-definite programming machines. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.

Database modification - Virtual SVM [DS02]

Adds deformed examples to the database.

[DS02] D. DeCoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46:161–190, 2002.
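The Virtual SVM idea amounts to plain dataset augmentation before training. A minimal sketch with assumed names (the real pipeline would apply the deformations of the previous slides):

```python
import numpy as np

def virtual_examples(X, y, transforms):
    """Virtual SVM style augmentation: apply each transform to every
    example and append the results (with unchanged labels) to the
    training set."""
    X_aug, y_aug = [X], [y]
    for t in transforms:
        X_aug.append(np.stack([t(x) for x in X]))
        y_aug.append(y)
    return np.concatenate(X_aug), np.concatenate(y_aug)
```

The cost is visible immediately: with k transforms the stored training set grows by a factor of k + 1, which is exactly the memory drawback listed on the next slide.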

Add some selected deformed points - combination of trajectories and virtual points [LCVJ05]

Uses discretized trajectories. The idea is to add to the database only the virtual points that are support vectors.

[LCVJ05] Gaëlle Loosli, Stéphane Canu, S. V. N. Vishwanathan, and Alexander J. Smola. Invariances in classification: an efficient SVM implementation. In ASMDA 2005 - Applied Stochastic Models and Data Analysis, 2005.

How to incorporate invariances in learning?

Good and bad

Method                    Pros                       Cons
Tangent distance          Efficient                  Linear deformations only
Trajectories learning     All deformations           SDP - hard to solve
Virtual vectors           Simple, all deformations   Requires a lot of memory
Selected virtual vectors  Simple, all deformations   Not yet fast enough (SimpleSVM limitations)

Objective

Apply the selected virtual vectors idea to a better-adapted algorithm, namely LASVM.

Gaëlle Loosli LASVM applied to invariant problems 7/20


Helpful definitions and generalities

Groups of points: I_A (active set) and I_0 (inactive set)

Selection: pick a point from I_0 and transfer it to I_A

Optimization: over I_A; may transfer a point from I_A to I_0

Finalization: checks the final solution I_A

Gaëlle Loosli LASVM applied to invariant problems 9/20


Iterative SVM

Initialize
While selection is possible:
    Selection
    Optimization
Finalize
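The generic loop above can be written as a small skeleton; the three callbacks stand in for whatever concrete rules a given algorithm (SimpleSVM, LASVM, ...) uses, and are placeholders of my own:

```python
def iterative_svm(points, select, optimize, finalize):
    """Skeleton of the iterative scheme: repeatedly move a point from
    the inactive set I0 to the active set IA, re-optimize over IA
    (which may expel points back to I0), then finalize."""
    IA, I0 = set(), set(points)
    while True:
        candidate = select(IA, I0)      # pick a point from I0, or None
        if candidate is None:
            break
        I0.discard(candidate)
        IA.add(candidate)
        removed = optimize(IA)          # points expelled from IA
        IA -= removed
        I0 |= removed
    finalize(IA)
    return IA
```

The algorithms on the next slide differ only in how `select`, `optimize`, and the stopping rule are instantiated.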

Algorithms - iterative methods

General loop

Selection. SimpleSVM [VSM03]: takes the most violating point in I_0. LASVM [BEWB05]: takes a point among the few next points (according to a criterion).

Optimization. SimpleSVM: makes a full optimization over I_A. LASVM: one SMO step between the candidate and a point from I_A (Process), and one SMO step between two points of I_A (Reprocess).

Finalization. SimpleSVM: stops once no points are violators anymore. LASVM: once all points have been seen once, makes a full optimization over I_A (end of an epoch).

[BEWB05] Antoine Bordes, Seyda Ertekin, Jason Weston, and Léon Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6:1579–1619, September 2005.

[VSM03] S. V. N. Vishwanathan, A. J. Smola, and M. Narasimha Murty. SimpleSVM. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.

Gaëlle Loosli LASVM applied to invariant problems 10/20
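An SMO step of the kind used by Process and Reprocess can be sketched with the textbook pairwise update (Platt's SMO); this is a generic stand-in of my own, not LASVM's exact bookkeeping:

```python
import numpy as np

def smo_step(K, y, alpha, i, j, C):
    """One textbook SMO pair update: jointly optimize alpha_i, alpha_j
    while keeping sum_k alpha_k*y_k constant and 0 <= alpha <= C.
    K is the kernel matrix, y the +/-1 labels (no bias term here)."""
    f = K @ (alpha * y)                      # current decision values
    Ei, Ej = f[i] - y[i], f[j] - y[j]
    eta = K[i, i] + K[j, j] - 2 * K[i, j]    # curvature along the pair
    if eta <= 0:
        return alpha                         # flat direction: skip
    if y[i] == y[j]:
        L = max(0.0, alpha[i] + alpha[j] - C)
        H = min(C, alpha[i] + alpha[j])
    else:
        L = max(0.0, alpha[j] - alpha[i])
        H = min(C, C + alpha[j] - alpha[i])
    new_j = np.clip(alpha[j] + y[j] * (Ei - Ej) / eta, L, H)
    new = alpha.copy()
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new_j)
    new[j] = new_j
    return new
```

Each such step is cheap (it touches one row pair of the kernel matrix), which is what makes the online Process/Reprocess scheme tractable on large data.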

LASVM specificities

Things to play with

Selection criterion

Number of Reprocess steps (level of optimization at each step)

Number of epochs

Selection criteria: Complete (brute force), Gradient, Active (distance to the margins), Auto-active

Gaëlle Loosli LASVM applied to invariant problems 11/20

Complete: every point is selected as a candidate for the active set.

Gradient: looks at the z next points and selects the most misclassified one.

Active: selects a point if it lies between the margins (±δ/2).

Auto-active: looks at the next points, keeping those selected by the active rule, until 5 points are kept or 100 points have been checked; then selects the kept point closest to the decision boundary.
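The auto-active rule can be sketched as a short scan over the candidate stream; `decision` stands for the current SVM decision function, and the names are my own illustration:

```python
def auto_active_select(stream, decision, delta, keep=5, check=100):
    """Auto-active selection: scan incoming points, keep those inside
    the margins (|f(x)| < delta/2), stop after `keep` kept or `check`
    examined, and return the kept point closest to the boundary."""
    kept = []
    for n, x in enumerate(stream, start=1):
        if abs(decision(x)) < delta / 2:
            kept.append(x)
        if len(kept) >= keep or n >= check:
            break
    if not kept:
        return None
    return min(kept, key=lambda x: abs(decision(x)))
```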

What is stored?

[Figure: only the original dataset and its horizontal and vertical tangent-vector planes are kept in memory (3 times the size of the original dataset); datasets transformed with parameters (α1, β1), (α2, β2), (α3, β3), etc. are generated on demand, yielding an effectively infinite database.]

Memory usage

We only need to store 3 times the size of the original database. For convenience, we also store some pre-generated fields.

Gaëlle Loosli LASVM applied to invariant problems 13/20
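The storage scheme can be made concrete with a small sketch: keep each image plus its two tangent planes (hence the factor of 3), and synthesize virtual examples on request. The class and method names are my own illustration:

```python
import numpy as np

class VirtualDatabase:
    """Stores the original images plus their two tangent-vector planes
    (3x the original size); transformed examples for any (alpha, beta)
    and fields are generated on demand, so the effective database size
    is unbounded."""
    def __init__(self, images):
        self.images = np.asarray(images, dtype=float)
        grads = [np.gradient(img) for img in self.images]
        self.tx = np.stack([g[0] for g in grads])
        self.ty = np.stack([g[1] for g in grads])

    def stored_floats(self):
        """Total floats kept in memory: exactly 3x the image data."""
        return self.images.size + self.tx.size + self.ty.size

    def virtual(self, k, fx, fy, alpha, beta):
        """Virtual example for image k, per the deformation formula."""
        tx, ty = self.tx[k], self.ty[k]
        return (self.images[k] + alpha * (fx * tx + fy * ty)
                + beta * np.sqrt(tx**2 + ty**2))
```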


What are the relevant deformations for our task?

[Figure: error rate (1.4 to 2.8) versus strength of applied transformations α (0 to 4), trained on 5000 points with 10 transformations each; curves: fields only, fields + 1px translation, fields + 2px translation, fields + thickening, fields + thickening + 1px translation, fields + thickening + 2px translation, and the best configuration.]

On the deformations

Translations are the most useful deformations for MNIST. Thickening does not help on the MNIST database. β = 0.

Gaëlle Loosli LASVM applied to invariant problems 14/20


Which mode will we use?

[Figure: three bar charts comparing the LASVM modes (None, BF, A, AA): error rate (0 to 1.4%), solution size (number of support vectors, up to 2×10^5), and training time in seconds (up to 12×10^5).]

None: no invariances
BF: brute force
A: active
AA: auto-active

Gaëlle Loosli LASVM applied to invariant problems 15/20

Optimal results

Which transformations: all but thickening

Training data size: 8 million examples

Solution size: about 120,000 support vectors for the 10 classifiers

Computational time: 8 days

Performance: 0.67% error

NB: the machine used is a dual Opteron with 16 GB RAM (6.5 GB of cache used for the kernel)...

Gaëlle Loosli LASVM applied to invariant problems 17/20

Conclusion

We wanted to solve the invariance problem

We needed to solve large SVMs to do so

As a result...

We ran some very large SVMs (the largest until now?)

We obtained fairly good accuracy (even though convolutional networks are still much better)

Gaëlle Loosli LASVM applied to invariant problems 19/20
