On Growth and Formlets: Sparse Multi-Scale Coding of
Planar Shape
James H. Elder (a,*), Timothy D. Oleskiw (b), Alex Yakubovich (a), Gabriel Peyré (c)

(a) Centre for Vision Research, York University, Toronto, Canada
(b) Department of Applied Mathematics, University of Washington, Seattle, WA, United States
(c) CEREMADE, Université Paris-Dauphine, Paris, France

(*) Corresponding author. Email addresses: jelder@yorku.ca (James H. Elder), oleskiw@uw.edu (Timothy D. Oleskiw), yakuboa@yorku.ca (Alex Yakubovich), gabriel.peyre@ceremade.dauphine.fr (Gabriel Peyré)
Abstract
We propose a sparse representation of 2D planar shape through the composition of warping functions, termed formlets, localized in scale and space. Each formlet subjects the 2D space in which the shape is embedded to a localized isotropic radial deformation. By constraining these localized warping transformations to be diffeomorphisms, the topology of shape is preserved, and the set of simple closed curves is closed under any sequence of these warpings. A generative model based on a composition of formlets applied to an embryonic shape, e.g., an ellipse, has the advantage of synthesizing only those shapes that could correspond to the boundaries of physical objects. To compute the set of formlets that represent a given boundary, we demonstrate a greedy coarse-to-fine formlet pursuit algorithm that serves as a non-commutative generalization of matching pursuit for sparse approximations. We evaluate our method by pursuing partially occluded shapes, comparing performance against a contour-based sparse shape coding framework.

Keywords: planar shape, deformation, sparse coding, contour completion
1. Introduction
Shape information is important for a broad range of computer vision prob-
lems. For some detection and recognition tasks, discriminative models that use
non-invertible shape codes (e.g., [1]) can be effective. However, many other
tasks call for a more complete generative model of shape. Examples include:
(1) shape segmentation, recognition, and tracking in cluttered scenes, where
shapes must be distinguished not just from each other, but from ‘phantom’
shapes formed by conjunctions of features from multiple objects [2]; (2) model-
ing of shape articulation, growth, and deformation; and (3) modeling of shape
similarity.
Our paper concerns the generative modeling of natural 2D shapes in the
plane, represented by their 1D boundary. We restrict our attention to simply-
connected shapes whose boundaries are smooth, simple, and closed curves. We
seek a generative shape model that satisfies a set of properties that seem to us
essential:
1. Completeness. The model can produce all shapes.
2. Closure. The set of valid shapes is closed under the generative model. In
other words, the model generates only valid shapes.
3. Composition. Complex shapes are generated by combining simpler com-
ponents.
4. Sparsity. Good approximations of shape can be generated with relatively
few components.
5. Progression. Approximations can be improved by incorporating more
components.
6. Locality. Components are localized in space.
7. Scaling. Components are tuned to specific scales and are self-similar over
scale.
8. Region & Contour. Components can capture both region and contour
properties in a natural way.
The need for completeness is self-evident if the system is to be general.
Closure is critical if we hope to capture the statistics of natural shape in a set of
hidden generative variables. Without closure, heuristics must be used to avoid
the generation of invalid shapes, e.g., bounding contours with self-intersections.
Aside from the resulting inefficiency, this creates a discrepancy between the
statistical structure encoded by the model and the samples the model produces. In
other words, the model cannot fully capture the statistics of natural boundaries.
Composition (here we use the word in a general sense) is important if we are
to handle the richness and complexity of natural shapes while maintaining con-
ceptual simplicity. Given the high dimensionality of natural shapes, sparsity is
necessary in order to store shape models [3]. Sparsity also implies that essential
shape features have been made explicit [4]. Progression allows the complexity
of the model to be matched to the difficulty of the task, facilitating real-time
operation and coarse-to-fine optimization.
Locality is a natural goal, since a first-order property of natural images
is local coherence. Nearby points on the surface of an object tend to have
similar reflectance, attitude, and illumination. Locality also allows for greater
robustness to occlusion, since components are more likely to be either entirely
visible or removed altogether rather than distorted. Scaling allows invariance
over object size, and allows shape features of different sizes to be captured
separately.
Finally, it has long been recognized that planar shape description requires
attention to both region and contour properties [3]. Some shape properties,
e.g., curvature, are naturally described by the bounding contour. Others, e.g.,
necks, are best described as region properties, since they involve points that are
proximal in the image but distant along the contour. A good generative model
will allow both to be encoded in a natural way.
We begin by reviewing prior models, with an eye to each of these essential
properties.
2. Prior Work
Early models that used chain coding or splines to encode shapes were not
generative and failed to succinctly capture global properties of shape. Fourier
descriptor, moment, and PCA bases have the potential to be generative, but
since all components are global, they are not robust to occlusion or local defor-
mation [5, 3, 6]. For these reasons, most modern approaches attempt to capture
structure at intermediate scales, or over a range of scales. Most of these models
can be crudely partitioned into two classes: contour-based and symmetry-based.
2.1. Contour-Based Models
Attneave [4] pointed to the concentration of information in the curvature of
the bounding contour, and suggested the potential for sparse descriptions based
on points of extremal curvature magnitude. Hoffman & Richards [7] linked cur-
vature to the part structure of shapes, proposing that parts are perceptually
segmented at negative minima of curvature. Mokhtarian & colleagues empha-
sized the encoding of curvature inflections across scale space for the purpose of
shape recognition [8].
While none of these early models are generative, Dubinskiy & Zhu [9] have
more recently proposed a contour-based shape representation that is both gen-
erative and sparse. The theory is based upon the representation of a shape by
a summation of component shapelets. A shapelet is a primitive curve defined
by Gabor-like coordinate functions that map arclength to the image, which can
be represented by the complex plane.
Specifically, a shapelet γ(t; σ, µ) is a mapping of arc length t ∈ [0,1] to the image, represented by the complex plane. Each shapelet is parameterized by an arc-length position parameter µ and a scale parameter σ, and has the specific form:

\gamma(t;\sigma,\mu) = \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)\left[\cos\!\left(\frac{2\pi}{\sigma}(t-\mu)\right) + i\,\sin\!\left(\frac{2\pi}{\sigma}(t-\mu)\right)\right]. \qquad (1)
Figure 1 shows the coordinate functions and trace of an example shapelet. Note that the planar curves generated by γ(t; σ, µ) are identical on t ∈ ℝ up to a linear reparameterization, i.e., they are self-similar. However, these functions are only approximately self-similar on any finite domain over which a curve will be defined. Also, note that γ does not in general generate a simple closed curve. In fact, as σ → 0, the number of sinusoidal periods on the interval t ∈ [0,1] explodes, generating an infinite number of self-intersections.
Figure 1: An example shapelet: (a) component functions x(t) and y(t); (b) generated image trace.
Shifting and scaling shapelets over arclength produces a basis set sufficient to generate arbitrarily complex shapes. In particular, a K-shapelet curve Γ_K(t) can be defined as:

\Gamma_K(t) = z_0 + \sum_{k=1}^{K} A_k\,\gamma(t;\sigma_k,\mu_k), \qquad (2)

where the 2 × 2 matrix A_k applies an affine transformation to each shapelet in image space prior to linear combination.
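To make the shapelet construction concrete, the following minimal sketch (Python with NumPy; the names and the random parameter ranges are illustrative, not from the authors' implementation) samples Equations 1 and 2 at 128 arc-length positions, with random scales, positions, and 2 × 2 affine matrices A_k:

import numpy as np

def shapelet(t, sigma, mu):
    """Gabor-like primitive gamma(t; sigma, mu) of Equation 1, as a complex-valued curve."""
    u = t - mu
    return np.exp(-u**2 / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * u / sigma)

t = np.linspace(0.0, 1.0, 128, endpoint=False)
rng = np.random.default_rng(0)
curve = np.zeros_like(t, dtype=complex)           # z0 = 0
for _ in range(8):                                # K = 8 shapelet components
    sigma, mu = rng.uniform(0.05, 0.5), rng.uniform(0.0, 1.0)
    a = rng.normal(size=(2, 2)) * 0.1             # 2x2 affine matrix A_k
    g = shapelet(t, sigma, mu)
    v = a @ np.vstack([g.real, g.imag])           # apply A_k in image space
    curve += v[0] + 1j * v[1]
# 'curve' is a K-shapelet curve; nothing constrains it to be simple or closed

As the text notes, nothing in this additive construction constrains the result to be a simple closed curve.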
Dubinskiy & Zhu’s shapelet model has many positive features. Components
are localized, albeit only in arclength, and scale is made explicit in a natural way.
However, like all contour-based methods, the shapelet theory does not explicitly
capture regional properties of shape. Perhaps most crucially, the model does
not respect the topology of object boundaries: sampling from the model will in
general yield non-simple, i.e., self-intersecting, curves (Figure 2). This violates
the closure criterion identified in Section 1.
Figure 2: Sampling from the shapelet model generally yields non-simple curves.
2.2. Symmetry-Based Models
Blum and colleagues [10, 11] introduced the symmetry axis representation
of shape in which a planar shape is represented by a 1D skeleton function and
associated 1D radius function. The symmetry axis representation led to re-
lated representations [12] which found application in medical imaging and other
domains.
Subsequent work incorporated notions of scale and time with symmetry axis
descriptions. Leyton [13] related symmetry axis descriptions to causal defor-
mation processes acting upon prototype shapes. In this view, symmetry axes,
terminating at curvature extrema on the boundary, are understood as records
of these deformation processes. Subsequent work on curve evolution methods
and shock-graph representations [14, 15] has provided a more complete theory
of region-based shape representation that has been broadly applied.
Despite the many appealing features of symmetry axis and shock-graph rep-
resentations, these methods, in general, are not sparse. In fact, the description of
each shape typically requires more storage, and little emphasis has been placed
on making symmetry axis representations generative [3]. Recent work of Trinh
and Kimia exploring generative and sparse models based upon shock graphs
comes some way in overcoming these limitations [16]. However, the constraints
required to enforce the closure property, i.e., topological constraints, are fairly
complex, and the full potential of the theory has yet to be explored.
A related approach to shape representation (e.g., [17, 18]) employs finite
element modelling techniques to code the bounding contour in terms of the
free vibration modes of the shape, which are said to correspond to the object’s
generalized axes of symmetry. The main difficulty in developing this approach
into a generative model is that points on the boundary are coupled only locally
in the intrinsic coordinates of the shape boundary, thus nothing constrains the
topology of generated shapes.
2.3. Hybrid Approaches
Recognizing the merits and limitations of both contour-based and symmetry-
based approaches, Zhu [19] developed an MRF model for natural 2D shape, em-
ploying a neighbourhood structure that can directly encode both contour-based
and region-based Gestalt principles. The theory is promising in many respects.
It is generative, providing an explicit probabilistic model, and it captures both
region and contour properties. It is not sparse, however, and because the un-
derlying graph is lifted from the image plane, there is nothing in the model
that encodes the topological constraint that the boundary be simple, i.e., non-
intersecting. Instead, when sampling from the model, a ‘firewall’ is employed
to prevent intersections. Again, this is inefficient, and it also creates a dis-
connect between the generative variables encoding the model and the sampling
distribution.
2.4. Coordinate Transformations
A different class of model that could also be called region-based involves the
application of coordinate transformations of the planar space in which a shape
is embedded. This idea can be traced back at least to D’Arcy Thompson, who
considered specific classes of global coordinate transformations of the plane to
model the relationship between the shapes of different animal species [20]. In
the field of computer vision, Jain et al. [21] were among the first to extend this
idea to more general deformations with a complete Fourier deformation basis
that they used to match observed shapes to stored prototypes. However, this
Fourier basis fails to satisfy the locality property, and as a potential genera-
tive model it does not satisfy the closure property: random combinations of
Fourier deformation components will not in general preserve the topology of the
prototype curve.
More recently, Sharon & Mumford [22] have explored conformal mappings as
global coordinate transformations between planar shapes. However, although
the Riemann mapping theorem guarantees that any simple closed curve can
be conformally mapped to the unit circle, conformal mappings do not in gen-
eral preserve the topology of embedded contours. Hence, despite the compu-
tational constraints imposed by the Cauchy-Riemann equations, we again have
the problem that the set of valid bounding contours is not closed under these
transformations, making generative modeling difficult.
2.5. Localized Diffeomorphisms: Formlets
In considering prior generative shape models, the goal that seems most elu-
sive is that of closure: ensuring that the model generates only valid shapes. Our
approach originates with the observation that, while general smooth coordinate
transformations of the plane will not preserve the topology of an embedded
curve, it is straightforward to design a specific family of diffeomorphic transfor-
mations that will. It then follows immediately by induction that a generative
model based upon arbitrary sequences of diffeomorphisms will satisfy the closure
property.
In this paper we specifically consider a family of diffeomorphisms we call
formlets. A formlet is a simple, isotropic, radial deformation of planar space
that is localized within a specified circular region of a selected point in the plane.
The family comprises formlets over all locations and spatial scales. While the
gain of the deformation is also a free parameter, it is constrained to satisfy a
simple criterion that guarantees that the formlet is a diffeomorphism. Since
topological changes in an embedded figure can only occur if the deformation
mapping is either discontinuous or non-injective, these diffeomorphic deforma-
tions are guaranteed to preserve the topology of embedded figures. Thus the
model satisfies the closure property.
By construction, formlets satisfy the desired locality and scaling proper-
ties. It is straightforward to show that the model also satisfies the composition,
completeness, and progression properties in that an arbitrary shape can be ap-
proximated to increasing precision by composing an appropriate sequence of
localized formlets. Since each formlet may be centered either near the contour,
near a symmetry axis, or at any other location in the plane, the model has the
potential to capture both region and contour properties directly.
Our formlet model is closely related to recent work by Grenander et al.
[23], modeling changes to anatomical parts over time. Their representation,
called Growth by Random Iterated Diffeomorphisms (GRID), models growth
as a sequence of local and radial deformations. They demonstrate their model
by tracking growth in the rat brain, as revealed in sequential planar sections of
MRI data.
In the present paper we explore the possibility that these ideas could be
extended to model not just differential growth between sequential shapes, but
to serve as the basis for a generative model over the entire space of smooth
shapes, based upon a universal embryonic shape in the plane such as an ellipse.
Elements of the present paper were first reported at CVPR [24]. The main
contributions of this conference paper were:
1. We illustrated the completeness and closure properties of the formlet
model through random generation of sample shapes.
2. To solve the inverse problem of modeling given shapes, we developed and
applied a generalization of matching pursuit, which selects the sequence of
formlets that minimizes approximation error. We demonstrated that this
formlet pursuit algorithm allows for progressive approximation of shape,
while preserving topological properties.
3. We assessed the robustness of the formlet model to occlusion by evaluat-
ing it on the problem of contour completion. We found that the model
compares favourably with the contour-based shapelet model [9] on this
important problem.
In the present paper we elaborate substantially on these contributions, in-
cluding full derivations and complete implementation details. But we also build
on this work with several important new contributions:
1. We introduce a method for handling analytically computed optimal gain
values that exceed the diffeomorphism bounds.
2. We develop and evaluate an improved parameter optimization method
called dictionary descent, and show that it increases accuracy by 11% and
decreases run time by 42%, relative to standard dictionary pursuit.
3. We provide derivations for the Jacobian required for this new dictionary
descent method.
4. We develop, evaluate and compare several alternative mathematical for-
mulations of the formlet function.
5. We report statistics of formlet model parameters for our database of ani-
mal shapes, demonstrating coarse-to-fine scaling properties and an inter-
esting anisotropy in the location distribution.
3. Formlet Coding
3.1. Formlet Bases
We represent the image in the complex plane ℂ, and define a formlet f : ℂ → ℂ to be a diffeomorphism of the complex plane localized in scale and space. Such a deformation can be realized by centering f about the point ζ ∈ ℂ and allowing f to deform the plane within a σ-region (σ ∈ ℝ⁺) of ζ. Our Gabor-inspired deformation is defined as

f(z;\zeta,\sigma,\alpha) = \zeta + \frac{z-\zeta}{|z-\zeta|}\,\rho(|z-\zeta|;\sigma,\alpha), \quad \text{where} \quad \rho(r;\sigma,\alpha) = r + \alpha\,\sin\!\left(\frac{2\pi r}{\sigma}\right)\exp\!\left(-\frac{r^2}{\sigma^2}\right). \qquad (3)

Thus each formlet f : ℂ → ℂ is a localized isotropic and radial deformation of the plane at location ζ and scale σ. The magnitude of the deformation is controlled by the gain parameter α ∈ ℝ. Figure 3 demonstrates formlet deformations of the plane with positive and negative gain.

Figure 3: Example formlet deformations: (a) expansion (α > 0); (b) compression (α < 0). The location of the formlet ζ is indicated by the asterisk.
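As a concrete illustration of Equation 3, the following minimal sketch (Python with NumPy; the function name is an assumption, not the authors' code) applies a single Gabor formlet to an array of curve points stored as complex numbers:

import numpy as np

def apply_formlet(z, zeta, sigma, alpha):
    """Gabor formlet of Equation 3: isotropic radial deformation about zeta."""
    d = z - zeta                          # displacement from the formlet centre
    r = np.abs(d)
    rho = r + alpha * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)
    safe_r = np.where(r > 0, r, 1.0)      # the centre maps to itself, since rho(0) = 0
    return zeta + (d / safe_r) * rho

Points far from ζ relative to σ are essentially unmoved, since the Gaussian envelope drives ρ(r) back to r.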
3.2. Diffeomorphism Constraint
Without any constraints on the parameters, these deformations, though con-
tinuous, can fold the plane on itself, changing the topology of an embedded con-
tour. In order to preserve topology, we must constrain the gain parameter to
guarantee that each deformation is a diffeomorphism. As the formlets defined
in Equation 3 are both isotropic and angle preserving, it is sufficient to require that the radial deformation ρ be a diffeomorphism of ℝ⁺, i.e., that ρ(r; σ, α) be strictly increasing in r:

\frac{\partial}{\partial r}\rho(r;\sigma,\alpha) > 0
\;\Leftrightarrow\; \alpha\,\frac{\partial}{\partial r}\!\left[\sin\!\left(\frac{2\pi r}{\sigma}\right)\exp\!\left(-\frac{r^2}{\sigma^2}\right)\right] > -1
\;\Leftrightarrow\; \frac{2\alpha}{\sigma}\exp\!\left(-\frac{r^2}{\sigma^2}\right)\left[\pi\cos\!\left(\frac{2\pi r}{\sigma}\right) - \frac{r}{\sigma}\sin\!\left(\frac{2\pi r}{\sigma}\right)\right] > -1. \qquad (4)
For α < 0, it is easy to see that the minimal slope of ρ is attained as r → 0⁺. Evaluating Equation 4 at r = 0 thus yields the lower bound on the gain α:

\alpha > -\frac{\sigma}{2\pi}. \qquad (5)

For positive α, the location of the minimum in ρ′(r) does not have a closed-form solution, but can be computed numerically:

\alpha \lesssim 0.1956\,\sigma. \qquad (6)

Thus the diffeomorphism constraint is:

\alpha \in \sigma\left(-\frac{1}{2\pi},\; 0.1956\right). \qquad (7)
By enforcing this constraint, we guarantee that the formlet f(z; ζ, σ, α) is a diffeomorphism of the plane. Hence, such a formlet acting on a curve embedded in the plane will be a homeomorphism. In particular, let Γ be the continuous mapping

\Gamma : [0,1] \rightarrow \mathbb{C}. \qquad (8)

Recall that Γ is simple if the mapping is injective, and closed if we permit the equality Γ(0) = Γ(1). Since a formlet f satisfying Equation 7 is bicontinuous, if Γ is simple and closed, the deformed curve

\Gamma_f(t) = f(\Gamma(t)) \qquad (9)

will also be simple and closed.
Figures 4(a) and (b) show the radial deformation function ρ(r; σ, α) as a function of r for a range of gain α and scale σ values, respectively. Figures 4(c) and (d) show the corresponding trace of the formlet deformation of an ellipse in the plane.

Figure 4: Formlet transformations as a function of scale and gain: (a) ρ with gain variation; (b) ρ with scale variation; (c) f with gain variation; (d) f with scale variation. Dashed lines denote invalid formlet parameters outside the diffeomorphism bounds of Equation 7.
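The constraint of Equation 7 is easy to check numerically. The sketch below (Python with NumPy; helper names are illustrative) computes the valid gain interval for a given scale and verifies that the radial profile ρ of Equation 3 is strictly increasing for a gain inside the bounds but not for one well outside them:

import numpy as np

def gain_bounds(sigma):
    """Diffeomorphism interval for the Gabor formlet gain (Equation 7)."""
    return -sigma / (2 * np.pi), 0.1956 * sigma

def rho_is_increasing(sigma, alpha, n=10000):
    """Numerically check that rho(r; sigma, alpha) is strictly increasing on [0, 5*sigma]."""
    r = np.linspace(0.0, 5 * sigma, n)
    rho = r + alpha * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)
    return bool(np.all(np.diff(rho) > 0))

lo, hi = gain_bounds(1.0)
print(rho_is_increasing(1.0, 0.99 * hi))   # expected True: gain inside the bounds
print(rho_is_increasing(1.0, 1.5 * hi))    # expected False: the plane folds on itself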
3.3. Formlet Composition
The power of formlets is that they can be composed to produce complex shapes while preserving topology. We define the forward formlet composition problem as follows. Given an embryonic shape Γ_0(t) and a sequence of K formlets {f_1, …, f_K} drawn from a formlet dictionary D, determine the resulting deformed shape Γ_K(t). The problem is well-posed because the set of simple closed curves is closed under formlet deformation: multiple formlets can be composed to generate complex shape transformations. Thus,

\Gamma_K(t) = (f_K \circ f_{K-1} \circ \cdots \circ f_1)(\Gamma_0(t)). \qquad (10)

Figure 5 shows an example of forward composition from a circular embryonic shape, where the formlet parameters ζ, σ, and α have been randomly selected. Note that a rich set of complex shapes is generated without leaving the space of valid shapes (simple, closed contours).

Figure 5: Shapes generated by random formlet composition over the unit circle. The first two rows show the result of applying 5 successive random formlets. The asterisk and circle indicate formlet location ζ and scale σ, respectively. The bottom row shows some example shapes produced from the composition of many random formlets.
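A minimal numerical sketch of this forward composition (Python with NumPy; the number of formlets and the sampling ranges for ζ and σ are illustrative choices, not the dictionary used later) draws random formlets with gains inside the diffeomorphism bounds of Equation 7 and applies them to a 128-point sampling of the unit circle:

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128, endpoint=False)
curve = np.exp(2j * np.pi * t)                       # embryonic shape: unit circle

for _ in range(32):                                  # compose 32 random formlets
    zeta = rng.uniform(-1.5, 1.5) + 1j * rng.uniform(-1.5, 1.5)
    sigma = rng.uniform(0.1, 0.8)
    alpha = rng.uniform(-sigma / (2 * np.pi), 0.1956 * sigma)   # Equation 7
    d = curve - zeta
    r = np.abs(d)
    rho = r + alpha * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)
    curve = zeta + (d / np.where(r > 0, r, 1.0)) * rho
# every intermediate curve, and the final one, remains a simple closed contour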
A more difficult but interesting problem is inverse formlet composition: given an observed shape Γ_obs(t), determine a sequence of K formlets {f_1, …, f_K}, drawn from a formlet dictionary D, that best approximates Γ_obs(t) by minimizing some reconstruction error ξ. Here we measure error as the L2 norm of the residual:¹

\xi(\Gamma_{\mathrm{obs}},\Gamma_K) = \left\|\Gamma_{\mathrm{obs}}(t) - \Gamma_K(t)\right\|_2^2 = \int_0^1 \left(\Gamma_{\mathrm{obs}}(t) - \Gamma_K(t)\right)\overline{\left(\Gamma_{\mathrm{obs}}(t) - \Gamma_K(t)\right)}\,dt. \qquad (11)
¹ For notational simplicity, we treat contours as continuous functions of arc length t. In practice, we represent contours as 128-point vectors. All integrals map to summations in a straightforward manner.
4. Formlet Pursuit
4.1. Dictionary Method
As a first attempt to estimate the optimal formlet sequence {f_1, …, f_K}, we propose a version of matching pursuit for sparse approximation [25], replacing the linear summation of elements by a non-commutative composition of formlet components. Algorithm 1 shows the flow of the formlet pursuit algorithm.
Algorithm 1: Formlet Pursuit of Γ_obs.
  Initialization: define Γ_0 = AΓ_◦ + z_0 (an affine transformation of the unit circle Γ_◦) to be a best-matching ellipse approximating Γ_obs
  for k = 1, …, K do
    Optimal Formlet: compute the maximally error-reducing transformation
      f_k = argmin_{f ∈ D} ξ(Γ_obs, f(Γ_{k−1}))
    Update Approximation: apply the optimal formlet
      Γ_k = f_k(Γ_{k−1})
Initialization. Given an observed target shape Γ_obs, we initialize the model as a 128-point polygon sampling the unit circle, and form a 1:1 correspondence between the model and target points that remains fixed throughout pursuit. We next apply an affine transformation to the model to generate an embryonic elliptical shape Γ_0 minimizing the L2 error ξ(Γ_obs, Γ_0).
Formlet Selection. At iteration k of the formlet pursuit algorithm, we select the formlet f_k(z; ζ_k, σ_k, α_k) that, when applied to the current model Γ_{k−1}, maximally reduces the approximation error:

f_k = \operatorname*{argmin}_{f \in \mathcal{D}} \; \xi\!\left(\Gamma_{\mathrm{obs}}, f(\Gamma_{k-1})\right). \qquad (12)
This is a difficult non-convex optimization problem, and experimentation with gradient descent methods has shown that the formlet parameter space can have many local minima. One saving grace is that the formlet transformation is linear with respect to the gain α, allowing α to be recovered analytically. Specifically, consider an alternative but equivalent representation of the formlet described by Equation 3:

f(z;\zeta,\sigma,\alpha) = z + \alpha\,g(z-\zeta;\sigma), \quad \text{where} \quad g(z-\zeta;\sigma) = \frac{z-\zeta}{|z-\zeta|}\,\sin\!\left(\frac{2\pi|z-\zeta|}{\sigma}\right)\exp\!\left(-\frac{|z-\zeta|^2}{\sigma^2}\right). \qquad (13)
In Appendix A we show that, if we fix both the formlet location ζ and scale σ, the optimal unconstrained gain α* for formlet f_k is given by

\alpha^* = \frac{\left\langle \Gamma_{\mathrm{obs}} - \Gamma_{k-1},\; g(\Gamma_{k-1}-\zeta;\sigma)\right\rangle}{\left\|g(\Gamma_{k-1}-\zeta;\sigma)\right\|_2^2}, \qquad (14)

where ⟨·,·⟩ is the inner product on functions f : [0,1] → ℂ given in Equation A.3.
One complication is that Equation 14 may yield a gain value α* that does not satisfy the diffeomorphism constraint given by Equation 7. However, from Equations 3 and 11 it can be seen that the error is a quadratic function of the gain α. Thus the optimal constrained gain α*_c for given ζ and σ parameters is simply the optimal unconstrained gain α* expressed by Equation 14, thresholded by the diffeomorphism constraints:

\alpha^*_c = \begin{cases} \alpha_l & \text{for } \alpha^* < \alpha_l \\ \alpha^* & \text{for } \alpha_l \le \alpha^* \le \alpha_u \\ \alpha_u & \text{for } \alpha^* > \alpha_u, \end{cases} \qquad (15)

where \alpha_l = -(2\pi)^{-1}\sigma and \alpha_u \approx 0.1956\,\sigma.

Thus the search for the optimal formlet can proceed by sampling from a dictionary over location ζ and scale σ parameters, computing the optimal constrained gain α*_c in each case, and then selecting the resulting formlet that yields minimum error.
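The following sketch (Python with NumPy; the function names and the flat grid search are illustrative simplifications, not the dictionary of Section 5.2) implements one such dictionary iteration: for each candidate (ζ, σ) it computes the analytic gain of Equation 14, clamps it to the bounds of Equation 15, and retains the formlet with minimum residual error:

import numpy as np
from itertools import product

def formlet_g(model, zeta, sigma):
    """Deformation direction g(z - zeta; sigma) of Equation 13 at the model points."""
    d = model - zeta
    r = np.abs(d)
    return (d / np.where(r > 0, r, 1.0)) * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)

def best_formlet(target, model, locations, scales):
    """One pursuit iteration: dictionary search with the analytic, clamped gain."""
    res = target - model
    best = None
    for zeta, sigma in product(locations, scales):
        g = formlet_g(model, zeta, sigma)
        gg = np.sum(np.abs(g)**2)
        if gg < 1e-12:                                   # formlet has no effect on the curve
            continue
        alpha = np.sum(res.real * g.real + res.imag * g.imag) / gg      # Equation 14
        alpha = np.clip(alpha, -sigma / (2 * np.pi), 0.1956 * sigma)    # Equation 15
        err = np.sum(np.abs(res - alpha * g)**2)                        # discrete Equation 11
        if best is None or err < best[0]:
            best = (err, zeta, sigma, alpha)
    err, zeta, sigma, alpha = best
    return model + alpha * formlet_g(model, zeta, sigma), (zeta, sigma, alpha)

Because f(z) = z + α g(z − ζ; σ) (Equation 13), applying the selected formlet to the model reduces to adding α g to the current model points.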
Figure 6 shows an example of formlet pursuit with this dictionary on an
example animal shape.
Figure 6: Formlet pursuit of an example animal shape. We first show the initial unit circle, followed by the least-squares ellipse embryo Γ_0(t), and the models Γ_k, where k = 1, 2, 3, 4, 8, 16, 32. The last curve shows the model Γ_32 without the target curve Γ_obs.
4.2. Dictionary Descent Method
While the formlet pursuit method has the advantage of simplicity, it is far
from optimal, as it ignores most smoothness properties that the error function
might enjoy, aside from the quadratic dependence upon the gain α. As a con-
sequence one must face the tradeoff between accuracy, which requires that the
parameter space be sampled finely, and speed, which limits the capacity of the
dictionary.
We can potentially improve upon the standard dictionary method by em-
ploying a smaller dictionary, and initiating a local gradient descent search from
the m most promising formlets to determine the formlet parameters that locally
minimize the error function.
Figure 7 compares pursuit for the standard dictionary and dictionary descent
methods on a particular example animal shape: the higher accuracy of the
dictionary descent method is evident. Table 1 shows the performance of the two
methods on the entire shape dataset. The dictionary descent method improves
accuracy by roughly 11%, and runs about 42% faster than standard pursuit. We
use the dictionary descent method in our evaluation below. An implementation
is available at www.elderlab.yorku.ca/formlets.
Figure 7: Pursuit of an example animal shape with standard dictionary search (top row) and
dictionary descent (bottom row) for K=1,2,4,8,16.
Table 1: Comparison of Dictionary and Dictionary Descent methods on entire animal dataset.
Optimization Method    L2 Error    Run Time (min)
Dictionary 0.00535 1.9
Dictionary Descent 0.00476 1.1
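For the local refinement stage, a rough sketch of the idea (Python, with SciPy's least_squares standing in for MATLAB's lsqnonlin, and with the Jacobian left to finite differences rather than the analytic form of Appendix B; tolerances follow Section 5.2.1, but the code is not the authors' implementation) is:

import numpy as np
from scipy.optimize import least_squares

def refine_formlet(target, model, zeta0, sigma0):
    """Locally refine (zeta, sigma) of one candidate formlet; the gain is set analytically."""
    def residuals(p):
        zeta, sigma = p[0] + 1j * p[1], p[2]
        d = model - zeta
        r = np.abs(d)
        g = (d / np.where(r > 0, r, 1.0)) * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)
        res = target - model
        alpha = np.sum(res.real * g.real + res.imag * g.imag) / max(np.sum(np.abs(g)**2), 1e-12)
        alpha = np.clip(alpha, -sigma / (2 * np.pi), 0.1956 * sigma)    # Equation 15
        diff = res - alpha * g                                          # residual under Equation 13
        return np.concatenate([diff.real, diff.imag])
    p0 = np.array([zeta0.real, zeta0.imag, sigma0])
    fit = least_squares(residuals, p0, xtol=1e-3, ftol=1e-6,
                        bounds=([-np.inf, -np.inf, 1e-3], [np.inf, np.inf, np.inf]))
    return fit.x[0] + 1j * fit.x[1], fit.x[2]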
5. Implementation Details
5.1. Shape Dataset
To explore the inverse problem of constructing formlet representations of
planar shapes, we employ a database consisting of 391 blue-screened images of
animal models from the Hemera Photo-Object database. The boundary of each object was sampled at 128 points at regular arc-length intervals. Each resulting polygon was then shifted to have zero mean and scaled to have unit L2 norm in both the vertical and horizontal directions:

\int_0^1 \mathrm{Re}\!\left(\Gamma_{\mathrm{obs}}(t)\right)^2 dt = \int_0^1 \mathrm{Im}\!\left(\Gamma_{\mathrm{obs}}(t)\right)^2 dt = 1. \qquad (16)
This scaling generally alters the aspect ratio of the shape: we invert this distor-
tion when displaying our results. The full dataset of object shapes used in this
paper is available at www.elderlab.yorku.ca/formlets.
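A minimal sketch of this normalization (Python with NumPy, assuming the boundary is stored as 128 complex samples; the function name is illustrative) is:

import numpy as np

def normalize_shape(curve):
    """Centre the shape and scale it to unit L2 norm in each direction (Equation 16)."""
    curve = curve - curve.mean()
    sx = np.sqrt(np.mean(curve.real**2))
    sy = np.sqrt(np.mean(curve.imag**2))
    return curve.real / sx + 1j * curve.imag / sy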
5.2. Dictionary Method: Discretization
To evaluate this formlet pursuit algorithm, we constructed a dictionary consisting of a regular sampling of the position parameter ζ on a 64 × 64 grid roughly 4 times the extent of the average shape, and the scale parameter σ at 16 regularly-spaced values over (0, 0.8].
5.2.1. Tuning the Dictionary Descent Method
Since our objective function is the L2-norm of the residual error between the observed curve and the approximation, we employed the MATLAB function lsqnonlin(), which is optimized for non-linear least squares problems, and compute the Jacobian of the objective function analytically (Appendix B). We tuned the parameters of our Dictionary Descent method in stepwise fashion. First, we determined appropriate values for the tolerance parameters xTol and fTol of lsqnonlin(), which determine the stopping criteria for the parameters and error function, respectively. We employed a sparse dictionary, sampling the position parameter ζ on a 16 × 16 grid, and the scale parameter σ at 4 regularly-spaced values over (0, 0.8]. We initiated descent at the m = 100 lowest-error solutions. Using a small subset of our animal dataset containing only four animal shapes, we performed a grid search in log space over the xTol and fTol parameters in the range 10^{-1} to 10^{-9}, computing the average running time and L2 error for a 32-formlet approximation. All experiments were conducted on a Power Mac G5 with a 2.66 GHz quad-core Intel Xeon processor, running MATLAB R2009b.
The results are shown in Table 2. Error was found to be minimized for parameter values of xTol = 10^{-3}, fTol = 10^{-6}: we used these values for all further experiments.
Second, we optimized the density of the dictionary and the number m of dictionary formlets selected for descent, using the descent parameters optimized above, the same 4 training shapes, and a 32-formlet approximation. The running time and accuracy results are shown in Tables 3 and 4, respectively. Sampling ζ on a 51 × 51 grid, the scale parameter σ at 13 values, and launching m = 25 descents from the most promising formlets, we found that for these four training images we could improve the accuracy over the standard dictionary method by a factor of more than two, while saving roughly 30% in computation time.
Table 2: Average L2 error (×100) for a 32-formlet model, as a function of the gradient descent termination criteria.
fTol/xTol 1E-01 1E-02 1E-03 1E-04 1E-05 1E-06 1E-07 1E-08 1E-09
1E-01 6.86 6.31 6.54 6.54 6.54 6.54 6.54 6.54 6.54
1E-02 5.68 5.26 5.11 5.02 5.02 5.02 5.02 5.02 5.02
1E-03 5.91 4.66 4.04 4.16 4.16 4.16 4.16 4.16 4.16
1E-04 5.80 3.65 4.02 3.72 3.72 3.72 3.72 3.72 3.72
1E-05 5.75 3.69 3.66 3.73 3.90 3.90 3.90 3.90 3.90
1E-06 5.75 3.69 3.52 3.73 3.74 3.74 3.74 3.74 3.74
1E-07 5.75 3.69 3.54 3.71 3.66 3.66 3.66 3.66 3.66
1E-08 5.75 3.69 3.54 3.67 3.66 3.66 3.66 3.66 3.66
1E-09 5.75 3.69 3.54 3.67 3.66 3.66 3.66 3.66 3.66
Table 3: Average running time per shape (min) for a 32-formlet model, as a function of dictionary size n and number of descents m.
m/n 64²×16 58²×14 51²×13 45²×11 38²×10 32²×8 26²×6 19²×5
0 1.28 0.93 0.68 0.45 0.29 0.16 0.09 0.04
1 1.38 1.01 0.74 0.50 0.33 0.21 0.11 0.07
5 1.44 1.06 0.78 0.55 0.38 0.24 0.18 0.13
10 1.49 1.10 0.84 0.60 0.44 0.31 0.25 0.20
15 1.56 1.17 0.90 0.67 0.50 0.37 0.32 0.28
20 1.60 1.23 0.95 0.73 0.56 0.42 0.41 0.35
25 1.67 1.28 1.01 0.78 0.62 0.48 0.47 0.42
30 1.72 1.34 1.07 0.84 0.68 0.55 0.53 0.51
Table 4: Average residual (×1000) for a 32-formlet model, as a function of dictionary size n and number of descents m.
m/n 64²×16 58²×14 51²×13 45²×11 38²×10 32²×8 26²×6 19²×5
0 8.0 6.5 8.1 9.5 9.3 14.0 23.5 28.1
1 3.7 4.7 4.8 5.8 5.0 5.9 7.4 14.0
5 3.6 3.8 4.7 5.9 4.2 6.4 7.4 7.1
10 4.1 3.7 3.8 4.5 4.5 6.1 6.9 6.4
15 3.6 3.9 4.1 4.0 4.1 5.7 6.7 8.0
20 3.4 3.8 3.8 4.5 4.1 4.8 6.6 8.0
25 3.7 3.8 3.7 4.5 4.2 4.8 6.5 7.4
30 3.4 3.9 3.7 4.4 4.2 4.4 5.7 7.5
Interestingly, we found that tightening tolerance parameters, increasing the
dictionary density, or increasing the number of deployments of the optimizer did
not always decrease the error. However, at a given iteration, error did decrease
monotonically as a function of each of these parameters, as expected. Thus the
non-monotonic variation in error with these parameters appears to reflect the
non-optimality of the greedy pursuit algorithm. In other words, selecting the
formlet that minimizes the residual at stage i will not necessarily lead to the
smallest error at stage k > i.
6. Evaluation
To evaluate and compare shape models, we address the problem of contour
completion, using our animal shape dataset. In natural scenes, object bound-
aries are often fragmented by occlusion and loss of contrast: contour completion
is the process of filling in the missing parts. Completion can also be an impor-
tant component of perceptual organization algorithms: given one or more partial
contour hypotheses, completion can be used to estimate the locations of missing
parts. These estimates can then guide the search for corroborating evidence.
We compare our formlet model with the shapelet model described in Section
2.1 [9]. For each shape in the dataset, we simulate the occlusion of a 10% or
30% continuous section of the contour, and allow the two methods to pursue
only the remaining visible portion.
The rate of convergence of both formlet and shapelet methods depends upon
how the parameters are sampled. For formlet pursuit, we use the dictionary de-
scent method described in Section 4.2. For the shapelet method, we used the
standard dictionary method of Dubinskiy et al. [9], optimizing performance by
sampling as finely as possible given time constraints. The shapelet representa-
tion assumes an arc-length representation of the curves on t ∈ [0,1], and each shapelet component has an arc-length position µ and scale σ. We sampled the position parameter µ at 128 regularly-spaced values over [0,1], and the scale parameter σ at 128 regularly-spaced values over (0,1]. The affine parameters
were computed analytically [9].
The formlet and shapelet pursuit algorithms were initialized with the same embryonic ellipses, and were governed by a minimization of the L2 error (Equation 11) over the visible points of the curves only. While pursuit is based on a fixed 1:1 correspondence between points on the target and model curves, we measure performance using the L2 Hausdorff distance to avoid potential dependence of the evaluation upon the parameterization of the curves. Specifically, we define the error between the target shape and the model as the average minimum distance of a point on one of the shapes to the other shape:

\xi_H(\Gamma_{\mathrm{obs}},\Gamma_k) = \sqrt{\int_0^1 \frac{1}{2}\left[\min_{t'\in[0,1)}\left|\Gamma_{\mathrm{obs}}(t)-\Gamma_k(t')\right|^2 + \min_{t'\in[0,1)}\left|\Gamma_{\mathrm{obs}}(t')-\Gamma_k(t)\right|^2\right] dt}. \qquad (17)
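For densely sampled curves, the minima over t′ in Equation 17 can be taken over the sample points; a short sketch (Python with NumPy; the function name is illustrative) is:

import numpy as np

def l2_hausdorff(target, model):
    """Symmetric L2 error of Equation 17 between two complex point samples."""
    d = np.abs(target[:, None] - model[None, :])   # pairwise point distances
    fwd = np.min(d, axis=1)**2                     # each target point to its nearest model point
    bwd = np.min(d, axis=0)**2                     # each model point to its nearest target point
    return np.sqrt(0.5 * (np.mean(fwd) + np.mean(bwd)))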
We measured the residual error between the model and target for both the
visible and occluded portions of the shapes. Performance on the occluded por-
tion, where the model is under-constrained by the data, reveals how well the
structure of the model captures properties of natural shapes.
Implementations for both the formlet and shapelet models are available at
www.elderlab.yorku.ca/formlets.
6.1. Results
Figure 8 shows some example qualitative results for this experiment. While
shapelet pursuit introduces topological errors in both visible and occluded re-
gions, formlet pursuit remains topologically valid, as predicted.
Figure 8: Examples of 30% occlusion pursuit with shapelets (red) and formlets (blue) for
k= 0,2,4,8,16,32. Solid lines indicate visible contour, dashed lines indicate occluded contour.
Figure 9 shows quantitative results for this experiment. While the shapelet
and formlet models achieve comparable error on the visible portions of the
boundaries, the error is substantially lower for the formlet representation on
the occluded portions. This suggests that the structure of the formlet model
better captures regularities in the shapes of natural objects. We believe that the
two principal reasons for this are a) respecting the topology of the shape prunes
off many inferior completion solutions and b) by working in the image space,
rather than arc length, the formlet model is better able to capture important
regional properties of shape.
Figure 9: Results of occlusion pursuit evaluation. The two panels plot normalized RMS error against the number of components under 10% and 30% occlusion, for the formlet and shapelet models on both the visible and occluded portions of the boundary. Black denotes error for Γ_0(t), the affine-fit ellipse.
7. Discussion
7.1. Formlet Parameter Distributions
The focus of this paper is to establish the appropriate structural properties
for a generative model of planar shape. To ultimately apply this representation
to problems such as object detection and recognition, statistical models over this
representation must be developed. One small step is to consider the distribution
of formlet parameters selected in pursuit of the shapes in our animal dataset.
Figure 10 shows how the means of the formlet parameters vary as pursuit
unfolds. We observe that scales decrease over time (a), reflecting a coarse-to-fine
approximation. Gains also decrease over time (b), although when normalized
by scale (c), this decline is moderated substantially. Finally, formlet locations
are biased to the centre of the shape and are roughly isotropic (d), with a slight
bias to the lower field, presumably reflecting the additional details required to
represent the legs of the animals.
7.2. Alternative Formlet Bases
In this paper we have chosen a particular Gabor-like formlet representation
(Equation 3) that confers several key properties:
1. The family of formlets forms a self-similar scale space.
2. Each formlet acts within a σ-ball around a specific location ζ, converging
to the identity as |z − ζ| → ∞.
3. The mapping is smooth everywhere except at ζ, where it is C⁰.
4. Deformation is isotropic and radial around ζ.
There are of course other formulations that would also satisfy these prop-
erties. Here we consider two specific alternatives and compare them with the
Gabor formulation.
Figure 10: Marginal distributions of formlet parameters over the course of pursuit: (a) mean scale σ at each iteration; (b) mean absolute gain |α| at each iteration; (c) mean absolute gain normalized by scale at each iteration; (d) histogram of formlet locations ζ. Error bars indicate standard error of the mean.
7.3. Gaussian Derivative Formlets
We simplify the original Gabor formulation of Equation 3 by replacing the sinusoidal factor with a first-order Taylor series approximation, yielding:

f(z;\zeta,\sigma,\alpha) = \zeta + \frac{z-\zeta}{|z-\zeta|}\,\rho(|z-\zeta|), \quad \text{where} \qquad (18)

\rho(r) = r + \alpha\,\frac{2\pi r}{\sigma}\exp\!\left(-\frac{r^2}{\sigma^2}\right). \qquad (19)

Note that the deformation term of the radial deformation function ρ(r) is proportional to the first Gaussian derivative in r.

f is a diffeomorphism iff ρ′(r) > 0 everywhere:

\rho'(r) = 1 + \exp\!\left(-\frac{r^2}{\sigma^2}\right)\frac{2\pi\alpha}{\sigma}\left(1 - \frac{2r^2}{\sigma^2}\right) > 0. \qquad (20)
For α < 0, the minimum is attained when r = 0:

\alpha > -\frac{\sigma}{2\pi}. \qquad (21)

For α > 0, by solving ρ″(r) = 0 it can be shown that the minimum is attained when r = \sqrt{3/2}\,\sigma. Substituting into Equation 20 then yields

\alpha < \frac{\exp(3/2)}{4\pi}\,\sigma. \qquad (22)

Thus f is a diffeomorphism iff \alpha \in \frac{\sigma}{2\pi}\left(-1,\; \tfrac{1}{2}\exp(3/2)\right).
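A small sketch of this Gaussian-derivative variant (Python with NumPy; names are illustrative) records the radial profile of Equation 19 and the gain interval derived above:

import numpy as np

def rho_gaussian(r, sigma, alpha):
    """Radial profile of Equation 19: r plus a scaled first Gaussian derivative."""
    return r + alpha * (2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)

def gaussian_gain_bounds(sigma):
    """Diffeomorphism interval: alpha in (sigma/(2*pi)) * (-1, exp(3/2)/2)."""
    return -sigma / (2 * np.pi), sigma * np.exp(1.5) / (4 * np.pi)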
7.4. Spline Formlets
Both the Gabor and Gaussian formlets have infinite support, which increases computation time and limits the degree to which formlets can be computed in parallel. To achieve strictly compact support we impose the constraint that ρ(r; σ) = r (i.e., f(z) = z) whenever r > σ. To guarantee smoothness, we require ρ(σ; σ) = σ and ρ′(σ; σ) = 1, and to achieve continuity at ζ we require ρ(0) = 0. The simplest spline meeting all these conditions is:

\rho(r;\sigma) = \begin{cases} r + \alpha\,\frac{r}{\sigma^2}(r-\sigma)^2 & \text{for } r \le \sigma \\ r & \text{for } r > \sigma \end{cases} \qquad (23)

We derive the diffeomorphism constraints as before:

\rho'(r) = 1 + \frac{\alpha}{\sigma^2}\left[(r-\sigma)^2 + 2r(r-\sigma)\right] > 0 \qquad (24)

\;\Leftrightarrow\; \frac{\alpha}{\sigma^2}\left(3r^2 - 4r\sigma + \sigma^2\right) > -1. \qquad (25)

For α < 0, the minimum is attained when r = 0, yielding α > −1. For α > 0, by solving ρ″(r) = 0 it can be shown that the minimum is attained when r = 2σ/3. Substituting into Equation 24 then yields α < 3. Thus f is a diffeomorphism iff α ∈ (−1, 3).
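The corresponding sketch for the spline variant (Python with NumPy; names are illustrative) makes the compact support explicit:

import numpy as np

def rho_spline(r, sigma, alpha):
    """Radial profile of Equation 23: identity outside the sigma-ball around zeta."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= sigma, r + alpha * (r / sigma**2) * (r - sigma)**2, r)

SPLINE_GAIN_BOUNDS = (-1.0, 3.0)   # independent of sigma, per Section 7.4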
7.5. Comparison of Formlet Bases
Figures 11 - 12(b) show the radial deformation functions, examples of pursuit
and rate of convergence for these three different formulations. Empirically, we
find that the Gabor formulation achieves a better rate of convergence on the
animal dataset than the competing formulations, although at this stage we do
not have a clear theoretical explanation for this result.
Figure 11: Radial deformation function ρ(r) for the three formlet bases (Gabor, Gaussian, and spline).
8. Conclusion
We have developed a novel generative model of planar shape that satisfies
a number of essential properties. In this model, complex shapes are seen as
the evolution of a simple embryonic shape by successive application of simple
diffeomorphic transformations of the plane called formlets. The system is both
complete and closed, since arbitrary shapes can be modeled, and generated
shapes are guaranteed to be topologically valid. This means that the model has
the potential to support accurate probabilistic modeling. We have demonstrated
a novel dictionary descent formlet pursuit algorithm that selects formlets to ef-
ficiently approximate given target shapes. Evaluation of the formlet pursuit
model on the problem of shape completion revealed that the model is better
able to approximate parts of shapes missing due to occlusion than a competing
contour-based method. Our animal object dataset, experimental results, exam-
ple movies and implementations for both the formlet and shapelet models are
available at www.elderlab.yorku.ca/formlets.
Future Work. We hope to extend the present work in a number of ways. First,
we would like to generalize our definition of formlets to allow for anisotropic
deformation that could efficiently model elongated parts such as animal limbs.
Second, we would like to develop probabilistic models over the formlet repre-
sentation. Finally, we are interested in using the formlet pursuit algorithm for
contour grouping, using detected fragments to generate predictions for where
other fragments of the same object boundary might be found.
Figure 12: Comparison of the three different formlet bases. (a) Pursuit of an example shape with Gabor (blue), Gaussian (green) and spline (red) bases for K = 1, 2, 4, 8, 16. (b) Mean L2 Hausdorff error (normalized RMS error versus number of components) for formlet pursuit over the animal dataset with the three different formlet bases.
References
[1] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition
using shape contexts, Pattern Analysis and Machine Intelligence, IEEE
Trans. 24 (2002) 509–522.
[2] P. Cavanagh, What’s up in top-down processing, in: A. Gorea (Ed.),
Representations of Vision, Cambridge University Press, Cambridge, UK,
1991 edition, 1991, pp. 295–304.
[3] D. Mumford, Mathematical theories of shape: do they model perception?,
in: B. C. Vemuri (Ed.), Geometric Methods in Computer Vision, volume
1570, SPIE, 1991, pp. 2–10.
[4] F. Attneave, Some informational aspects of visual perception, Psychol.
Rev. 61 (1954) 183–193.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, J. Graham, Active shape models
- their training and application, Comput. Vis. Image Underst. 61 (1995)
38–59.
[6] T. Pavlidis, Structural pattern recognition, volume 1, Springer-Verlag,
Berlin, illustrated edition, 1977.
[7] D. D. Hoffman, W. A. Richards, Parts of recognition, Cognition 18 (1984)
65–96.
[8] F. Mokhtarian, A. Mackworth, Scale-based description and recognition of
planar curves and two-dimensional shapes, Pattern Analysis and Machine
Intelligence, IEEE Trans. 8 (1986) 34–43.
[9] A. Dubinskiy, S. Zhu, A multiscale generative model for animate shapes
and parts, in: Proc. 9th IEEE ICCV, volume 1, pp. 249–256.
[10] H. Blum, Biological shape and visual science (part i), J. Theoretical Biology
38 (1973) 205–287.
[11] H. Blum, R. N. Nagel, Shape description using weighted symmetric axis
features, Pattern Recognition 10 (1978) 167 – 180.
[12] M. Brady, H. Asada, Smoothed local symmetries and their implementation,
Int. J. Robotics Res. 3 (1984) 36–61.
[13] M. Leyton, A process-grammar for shape, Artificial Intelligence 34 (1988)
213 – 247.
[14] B. B. Kimia, A. R. Tannenbaum, S. W. Zucker, Shapes, shocks and deformations I: the components of two-dimensional shape and the reaction-diffusion space, Int. J. Comput. Vision 15 (1995) 189–224.
[15] S. Osher, J. A. Sethian, Fronts propagating with curvature-dependent
speed, J. Comput. Phys. 79 (1988) 12–49.
[16] N. Trinh, B. Kimia, A symmetry-based generative model for shape, in:
Proc. 11th IEEE ICCV, pp. 1–8.
[17] A. Pentland, S. Sclaroff, Closed-form solutions for physically based shape
modeling and recognition, Pattern Analysis and Machine Intelligence,
IEEE Transactions on 13 (1991) 715–729.
[18] S. Sclaroff, A. Pentland, Modal matching for correspondence and recognition, Pattern Analysis and Machine Intelligence, IEEE Trans. 17 (1995) 545–561.
[19] S.-C. Zhu, Embedding gestalt laws in markov random fields, Pattern
Analysis and Machine Intelligence, IEEE Trans. 21 (1999) 1170–1187.
[20] D. W. Thompson, On growth and form, Cambridge University Press, Cam-
bridge, UK, abridged ed./edited edition, 1961.
[21] A. Jain, Y. Zhong, S. Lakshmanan, Object matching using deformable
templates, Pattern Analysis and Machine Intelligence, IEEE Trans. 18
(1996) 267–278.
[22] E. Sharon, D. Mumford, 2d-shape analysis using conformal mapping, Com-
puter Vision and Pattern Recognition, IEEE Comp. Soc. Conf. 2 (2004)
350–357.
[23] U. Grenander, A. Srivastava, S. Saini, A pattern-theoretic characterization
of biological growth, Medical Imaging, IEEE Trans. 26 (2007) 648–659.
[24] T. Oleskiw, J. Elder, G. Peyré, On growth and formlets, Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
(2010).
[25] S. Mallat, Z. Zhang, Matching pursuits with time frequency dictionaries,
Signal Processing, IEEE Trans. 41 (1993) 3397–3415.
Appendix A. Computation of Optimal Gain
Since the formlet deformation of Equation 3 is linear in the gain α, given fixed location ζ and scale σ parameters, the gain that minimizes the L2 deviation from the target shape can be computed analytically. Specifically, suppose that the observed curve Γ_obs is currently approximated by Γ_{k−1}. For given formlet location and scale parameters ζ and σ, we define the optimal unconstrained gain α* for formlet f_k as

\alpha^* = \operatorname*{argmin}_{\alpha \in \mathbb{R}} \; \xi\!\left(\Gamma_{\mathrm{obs}},\, f(\Gamma_{k-1};\zeta,\sigma,\alpha)\right), \qquad (A.1)

where, for curves \Gamma_a and \Gamma_b, \xi(\Gamma_a,\Gamma_b) denotes the L2 error metric

\xi(\Gamma_a,\Gamma_b) = \int_0^1 \mathrm{Re}\!\left(\Gamma_a(t)-\Gamma_b(t)\right)^2 + \mathrm{Im}\!\left(\Gamma_a(t)-\Gamma_b(t)\right)^2 dt \qquad (A.2)

induced by the inner product

\langle \Gamma_a, \Gamma_b \rangle = \int_0^1 \mathrm{Re}\,\Gamma_a(t)\,\mathrm{Re}\,\Gamma_b(t) + \mathrm{Im}\,\Gamma_a(t)\,\mathrm{Im}\,\Gamma_b(t)\, dt. \qquad (A.3)
Using Equation 13, we differentiate ξ with respect to α and set the result to zero:

\frac{\partial}{\partial\alpha}\left\|\Gamma_{\mathrm{obs}} - f(\Gamma_{k-1})\right\|^2 = \frac{\partial}{\partial\alpha}\left\|\Gamma_{\mathrm{res}} - \alpha g\right\|^2 \qquad (A.4)
= \frac{\partial}{\partial\alpha}\left(\left\|\Gamma_{\mathrm{res}}\right\|^2 - 2\alpha\langle\Gamma_{\mathrm{res}}, g\rangle + \alpha^2\|g\|^2\right) \qquad (A.5)
= -2\left(\langle\Gamma_{\mathrm{res}}, g\rangle - \alpha\|g\|^2\right) = 0 \qquad (A.6)
\;\Rightarrow\; \alpha^* = \frac{\langle\Gamma_{\mathrm{res}}, g\rangle}{\|g\|^2},

where we have used the shorthand g = g(\Gamma_{k-1}-\zeta;\sigma) and \Gamma_{\mathrm{res}} = \Gamma_{\mathrm{obs}} - \Gamma_{k-1}.

As a result, given fixed ζ and σ, the optimal unconstrained gain α* that maximally reduces the L2 error between the observed curve Γ_obs and the current approximation Γ_{k−1} is given by

\alpha^* = \frac{\left\langle \Gamma_{\mathrm{obs}} - \Gamma_{k-1},\; g(\Gamma_{k-1}-\zeta;\sigma)\right\rangle}{\left\|g(\Gamma_{k-1}-\zeta;\sigma)\right\|_2^2}. \qquad (A.7)

Note that in general Equation A.7 may produce an optimal gain outside the diffeomorphism bounds of Equation 7. However, the optimal gain that satisfies the constraint is simply the unconstrained gain α* thresholded by the diffeomorphism constraints, as described in Section 4.1.
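Because the error is quadratic in the gain, Equation A.7 is easy to verify numerically. The sketch below (Python with NumPy; the target, model, ζ and σ are illustrative choices) compares the analytic gain against a dense 1-D search over α:

import numpy as np

t = np.linspace(0.0, 1.0, 128, endpoint=False)
model = np.exp(2j * np.pi * t)                          # unit circle
target = (1.0 + 0.2 * np.cos(4 * np.pi * t)) * model    # perturbed target shape
zeta, sigma = 0.5 + 0.2j, 0.6

d = model - zeta
r = np.abs(d)
g = (d / r) * np.sin(2 * np.pi * r / sigma) * np.exp(-r**2 / sigma**2)   # Equation 13
res = target - model

alpha_star = np.sum(res.real * g.real + res.imag * g.imag) / np.sum(np.abs(g)**2)   # Equation A.7

alphas = np.linspace(-2.0, 2.0, 40001)
errors = [np.sum(np.abs(res - a * g)**2) for a in alphas]
assert abs(alphas[int(np.argmin(errors))] - alpha_star) < 1e-3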
Appendix B. Jacobian Computation for Nonlinear Least Squares Minimization
The dictionary descent optimization method described in Section 4.2 employs the MATLAB gradient descent method lsqnonlin to determine the location parameter ζ and scale parameter σ. lsqnonlin uses the Jacobian of the error function in the unknown parameters to iterate toward the local minimum. The method performs best if an analytic form of the Jacobian can be provided. Note that since the optimal gain α*_c is determined analytically (Equation 15), this value must be used in all computations of the Jacobian in order to determine locally optimal values for the other parameters.
Combining Equations 11 and 13, and writing r = |\Gamma_{k-1} - \zeta|, the error function can be written as

\xi(\Gamma_{\mathrm{obs}},\Gamma_K) = \left\|\Gamma_{\mathrm{obs}} - f(\Gamma_{k-1})\right\|^2 = \left\|\Gamma_{\mathrm{obs}} - \Gamma_{k-1} - \alpha^*_c\,\frac{\Gamma_{k-1}-\zeta}{r}\,\sin\!\left(\frac{2\pi r}{\sigma}\right)\exp\!\left(-\frac{r^2}{\sigma^2}\right)\right\|^2.

Now defining

\Gamma_{\mathrm{res}} = \Gamma_{\mathrm{obs}} - \Gamma_{k-1}

and

G = G(r;\sigma) = \frac{1}{r}\sin\!\left(\frac{2\pi r}{\sigma}\right)\exp\!\left(-\frac{r^2}{\sigma^2}\right),

and using x and y subscripts to denote real and imaginary components, we can rewrite this expression as

\xi(\Gamma_{\mathrm{obs}},\Gamma_K) = \int_0^1 \left[\Gamma_{\mathrm{res},x}(t) - \alpha^*_c\,(\Gamma_{k-1,x}(t)-\zeta_x)\,G\right]^2 dt + \int_0^1 \left[\Gamma_{\mathrm{res},y}(t) - \alpha^*_c\,(\Gamma_{k-1,y}(t)-\zeta_y)\,G\right]^2 dt \equiv \int_0^1 \xi_x(t)^2 + \xi_y(t)^2\, dt.
Since the error is a function of the optimal gain α*_c, and α*_c is a function of the location parameter ζ and the scale parameter σ, we will need the partial derivatives of α*_c with respect to these two parameters. From Equation 15, we have

\alpha^*_c = \begin{cases} \alpha_l & \text{for } \alpha^* < \alpha_l \\ \alpha^* & \text{for } \alpha_l \le \alpha^* \le \alpha_u \\ \alpha_u & \text{for } \alpha^* > \alpha_u, \end{cases}

where \alpha_l = -(2\pi)^{-1}\sigma and \alpha_u \approx 0.1956\,\sigma, and α* is given by Equation 14. Thus we have

\frac{\partial \alpha^*_c}{\partial \sigma} = \begin{cases} -(2\pi)^{-1} & \text{for } \alpha^* < \alpha_l \\ \partial \alpha^*/\partial \sigma & \text{for } \alpha_l \le \alpha^* \le \alpha_u \\ 0.1956 & \text{for } \alpha^* > \alpha_u \end{cases}

\frac{\partial \alpha^*_c}{\partial \zeta_x} = \begin{cases} 0 & \text{for } \alpha^* < \alpha_l \\ \partial \alpha^*/\partial \zeta_x & \text{for } \alpha_l \le \alpha^* \le \alpha_u \\ 0 & \text{for } \alpha^* > \alpha_u \end{cases}

\frac{\partial \alpha^*_c}{\partial \zeta_y} = \begin{cases} 0 & \text{for } \alpha^* < \alpha_l \\ \partial \alpha^*/\partial \zeta_y & \text{for } \alpha_l \le \alpha^* \le \alpha_u \\ 0 & \text{for } \alpha^* > \alpha_u \end{cases}
Thus to determine the partial derivatives of the constrained gain α*_c, we must compute the partial derivatives of the unconstrained gain α*, which is defined by Equation 14:

\alpha^* = \frac{\left\langle \Gamma_{\mathrm{obs}} - \Gamma_{k-1},\; g(\Gamma_{k-1}-\zeta;\sigma)\right\rangle}{\left\|g(\Gamma_{k-1}-\zeta;\sigma)\right\|_2^2},

where we have used

g(\Gamma_{k-1}-\zeta;\sigma) = (\Gamma_{k-1}-\zeta)\,\frac{1}{r}\sin\!\left(\frac{2\pi r}{\sigma}\right)\exp\!\left(-\frac{r^2}{\sigma^2}\right) = (\Gamma_{k-1}-\zeta)\,G(r). \qquad (B.1)
Computing the partial derivatives with respect to the scale parameter σ and the location parameters ζ_x and ζ_y, we obtain:

\frac{\partial \alpha^*}{\partial \sigma} = \frac{\frac{\partial}{\partial \sigma}\langle \Gamma_{\mathrm{res}}, g \rangle\,\|g\|^2 - \langle \Gamma_{\mathrm{res}}, g \rangle\,\frac{\partial}{\partial \sigma}\|g\|^2}{\|g\|^4}, \quad \text{where}

\frac{\partial}{\partial \sigma}\langle \Gamma_{\mathrm{res}}, g \rangle = \frac{\partial}{\partial \sigma}\int_0^1 \Gamma_{\mathrm{res},x}(t)\,(\Gamma_{k-1,x}(t)-\zeta_x)\,G + \Gamma_{\mathrm{res},y}(t)\,(\Gamma_{k-1,y}(t)-\zeta_y)\,G\; dt = \int_0^1 \left\langle \Gamma_{\mathrm{res}}(t),\,\Gamma_{k-1}(t)-\zeta \right\rangle \frac{\partial G}{\partial \sigma}\, dt

\frac{\partial}{\partial \sigma}\|g\|^2 = \frac{\partial}{\partial \sigma}\int_0^1 \left[(\Gamma_{k-1,x}(t)-\zeta_x)G\right]^2 + \left[(\Gamma_{k-1,y}(t)-\zeta_y)G\right]^2 dt = \int_0^1 2\,G\,\frac{\partial G}{\partial \sigma}\,\left|\Gamma_{k-1}(t)-\zeta\right|^2 dt

\frac{\partial \alpha^*}{\partial \zeta_x} = \frac{\frac{\partial}{\partial \zeta_x}\langle \Gamma_{\mathrm{res}}, g \rangle\,\|g\|^2 - \langle \Gamma_{\mathrm{res}}, g \rangle\,\frac{\partial}{\partial \zeta_x}\|g\|^2}{\|g\|^4}, \quad \text{where}

\frac{\partial}{\partial \zeta_x}\langle \Gamma_{\mathrm{res}}, g \rangle = \int_0^1 \Gamma_{\mathrm{res},x}(t)\left[-G + (\Gamma_{k-1,x}(t)-\zeta_x)\frac{\partial G}{\partial \zeta_x}\right] + \Gamma_{\mathrm{res},y}(t)\,(\Gamma_{k-1,y}(t)-\zeta_y)\frac{\partial G}{\partial \zeta_x}\; dt

\frac{\partial}{\partial \zeta_x}\|g\|^2 = \int_0^1 2\,(\Gamma_{k-1,x}(t)-\zeta_x)\,G\left[-G + (\Gamma_{k-1,x}(t)-\zeta_x)\frac{\partial G}{\partial \zeta_x}\right] + 2\,G\,\frac{\partial G}{\partial \zeta_x}\,(\Gamma_{k-1,y}(t)-\zeta_y)^2\; dt

\frac{\partial \alpha^*}{\partial \zeta_y} = \frac{\frac{\partial}{\partial \zeta_y}\langle \Gamma_{\mathrm{res}}, g \rangle\,\|g\|^2 - \langle \Gamma_{\mathrm{res}}, g \rangle\,\frac{\partial}{\partial \zeta_y}\|g\|^2}{\|g\|^4}, \quad \text{where}

\frac{\partial}{\partial \zeta_y}\langle \Gamma_{\mathrm{res}}, g \rangle = \int_0^1 \Gamma_{\mathrm{res},x}(t)\,(\Gamma_{k-1,x}(t)-\zeta_x)\frac{\partial G}{\partial \zeta_y} + \Gamma_{\mathrm{res},y}(t)\left[-G + (\Gamma_{k-1,y}(t)-\zeta_y)\frac{\partial G}{\partial \zeta_y}\right] dt

\frac{\partial}{\partial \zeta_y}\|g\|^2 = \int_0^1 2\,G\,\frac{\partial G}{\partial \zeta_y}\,(\Gamma_{k-1,x}(t)-\zeta_x)^2 + 2\,(\Gamma_{k-1,y}(t)-\zeta_y)\,G\left[-G + (\Gamma_{k-1,y}(t)-\zeta_y)\frac{\partial G}{\partial \zeta_y}\right] dt
We are now ready to compute the Jacobian matrix. From Equation B.1 we have that

\xi_x(t_i) = \Gamma_{\mathrm{res},x}(t_i) - \alpha^*_c\,(\Gamma_{k-1,x}(t_i)-\zeta_x)\,G
\xi_y(t_i) = \Gamma_{\mathrm{res},y}(t_i) - \alpha^*_c\,(\Gamma_{k-1,y}(t_i)-\zeta_y)\,G.

Thus,

\frac{\partial \xi_x(t_i)}{\partial \sigma} = -(\Gamma_{k-1,x}(t_i)-\zeta_x)\left[\frac{\partial \alpha^*_c}{\partial \sigma}\,G + \alpha^*_c\,\frac{\partial G}{\partial \sigma}\right] \qquad (B.2)

\frac{\partial \xi_y(t_i)}{\partial \sigma} = -(\Gamma_{k-1,y}(t_i)-\zeta_y)\left[\frac{\partial \alpha^*_c}{\partial \sigma}\,G + \alpha^*_c\,\frac{\partial G}{\partial \sigma}\right] \qquad (B.3)

\frac{\partial \xi_x(t_i)}{\partial \zeta_x} = -\frac{\partial \alpha^*_c}{\partial \zeta_x}(\Gamma_{k-1,x}(t_i)-\zeta_x)\,G + \alpha^*_c\,G - \alpha^*_c\,(\Gamma_{k-1,x}(t_i)-\zeta_x)\frac{\partial G}{\partial \zeta_x} \qquad (B.4)

\frac{\partial \xi_y(t_i)}{\partial \zeta_x} = -(\Gamma_{k-1,y}(t_i)-\zeta_y)\left[\frac{\partial \alpha^*_c}{\partial \zeta_x}\,G + \alpha^*_c\,\frac{\partial G}{\partial \zeta_x}\right] \qquad (B.5)

\frac{\partial \xi_x(t_i)}{\partial \zeta_y} = -(\Gamma_{k-1,x}(t_i)-\zeta_x)\left[\frac{\partial \alpha^*_c}{\partial \zeta_y}\,G + \alpha^*_c\,\frac{\partial G}{\partial \zeta_y}\right] \qquad (B.6)

\frac{\partial \xi_y(t_i)}{\partial \zeta_y} = -\frac{\partial \alpha^*_c}{\partial \zeta_y}(\Gamma_{k-1,y}(t_i)-\zeta_y)\,G + \alpha^*_c\,G - \alpha^*_c\,(\Gamma_{k-1,y}(t_i)-\zeta_y)\frac{\partial G}{\partial \zeta_y} \qquad (B.7)
where for the Gabor basis we have:

\frac{\partial G}{\partial \sigma} = \exp\!\left(-\frac{r^2}{\sigma^2}\right)\left[-\frac{2\pi}{\sigma^2}\cos\!\left(\frac{2\pi r}{\sigma}\right) + \frac{2r}{\sigma^3}\sin\!\left(\frac{2\pi r}{\sigma}\right)\right]

\frac{\partial G}{\partial \zeta_x} = \exp\!\left(-\frac{r^2}{\sigma^2}\right)\frac{\Gamma_{k-1,x}(t_i)-\zeta_x}{r}\left[-\frac{2\pi}{\sigma r}\cos\!\left(\frac{2\pi r}{\sigma}\right) + \left(\frac{2}{\sigma^2} + \frac{1}{r^2}\right)\sin\!\left(\frac{2\pi r}{\sigma}\right)\right]

\frac{\partial G}{\partial \zeta_y} = \exp\!\left(-\frac{r^2}{\sigma^2}\right)\frac{\Gamma_{k-1,y}(t_i)-\zeta_y}{r}\left[-\frac{2\pi}{\sigma r}\cos\!\left(\frac{2\pi r}{\sigma}\right) + \left(\frac{2}{\sigma^2} + \frac{1}{r^2}\right)\sin\!\left(\frac{2\pi r}{\sigma}\right)\right]

It is straightforward to show that Equations B.2 - B.7 also apply to the Gaussian and Spline bases (Section 7.2), with suitable definitions of G(r; σ):

Gaussian basis:

G(r;\sigma) = \frac{2\pi}{\sigma}\exp\!\left(-\frac{r^2}{\sigma^2}\right)

\frac{\partial G}{\partial \sigma} = 2\pi\exp\!\left(-\frac{r^2}{\sigma^2}\right)\left(-\frac{1}{\sigma^2} + \frac{2r^2}{\sigma^4}\right)

\frac{\partial G}{\partial \zeta_x} = \frac{4\pi}{\sigma^3}\exp\!\left(-\frac{r^2}{\sigma^2}\right)(\Gamma_{k-1,x}-\zeta_x)

\frac{\partial G}{\partial \zeta_y} = \frac{4\pi}{\sigma^3}\exp\!\left(-\frac{r^2}{\sigma^2}\right)(\Gamma_{k-1,y}-\zeta_y)

Spline basis:

G(r;\sigma) = \frac{(r-\sigma)^2}{\sigma^2}

\frac{\partial G}{\partial \sigma} = -\frac{2}{\sigma^3}(r-\sigma)^2 - \frac{2}{\sigma^2}(r-\sigma)

\frac{\partial G}{\partial \zeta_x} = -\frac{2}{r\sigma^2}(r-\sigma)(\Gamma_{k-1,x}-\zeta_x)

\frac{\partial G}{\partial \zeta_y} = -\frac{2}{r\sigma^2}(r-\sigma)(\Gamma_{k-1,y}-\zeta_y)
... A third class of theory considers shape as a process of transformation or growth (Elder, Oleskiw, Yakubovich, & Peyré, 2013;Grenander, Srivastava, & Saini, 2007;Jain, Zhong, & Lakshmanan, 1996;Leyton, 1989;Sharon & Mumford, 2006;Thompson, 1917). These theories have a natural generative expression that can be used to support inference with noisy or incomplete visual data and are more able to capture topological properties of objects (Elder et al., 2013). ...
... A third class of theory considers shape as a process of transformation or growth (Elder, Oleskiw, Yakubovich, & Peyré, 2013;Grenander, Srivastava, & Saini, 2007;Jain, Zhong, & Lakshmanan, 1996;Leyton, 1989;Sharon & Mumford, 2006;Thompson, 1917). These theories have a natural generative expression that can be used to support inference with noisy or incomplete visual data and are more able to capture topological properties of objects (Elder et al., 2013). ...
... Similarly, it is possible that in our experiments higher shape frequencies are coded at least partially incoherently by localized shape mechanisms and combined through nonlinear (e.g., energy) pooling. Candidate localized shape encoding mechanisms include shapelets (Dubinskiy & Zhu, 2003) and formlets (Elder et al., 2013). It is also quite possible that higher FD frequency components are not processed independently from other components. ...
Article
Classification image analysis is a powerful technique for elucidating linear detection and discrimination mechanisms, but it has primarily been applied to contrast detection. Here we report a novel classification image methodology for identifying linear mechanisms underlying shape discrimination. Although prior attempts to apply classification image methods to shape perception have been confined to simple radial shapes, the method proposed here can be applied to general 2-D (planar) shapes of arbitrary complexity, including natural shapes. Critical to the method is the projection of each target shape onto a Fourier descriptor (FD) basis set, which allows the essential perceptual features of each shape to be represented by a relatively small number of coefficients. We demonstrate that under this projection natural shapes are low pass, following a relatively steep power law. To efficiently identify the observer's classification template, we employ a yes/no paradigm and match the spectral density of the stimulus noise in FD space to the power law density of the target shape. The proposed method generates linear template models for animal shape detection that are predictive of human judgments. These templates are found to be biased away from the ideal, overly weighting lower frequencies. This low-pass bias suggests that higher frequency shape processing relies on nonlinear mechanisms.
... While general smooth coordinate transformations of the plane will not preserve the topology of an embedded curve, it is possible to design a specific family of diffeomorphic transformations that will [9,17,27]. It then follows immediately by induction that a generative model based upon arbitrary sequences of diffeomorphisms will preserve topology. ...
... Specifically, let us consider a family of diffeomorphisms we will call formlets [9,27], in tribute to D'Arcy Thompson's seminal book On Growth and Form [36]. A formlet is a simple, isotropic, radial deformation of planar space that is localized within a specified circular region of a selected point in the plane. ...
... As the formlets defined in Eq. (5.1) are both isotropic and angle preserving, it is sufficient to require that the radial deformation ρ be a diffeomorphism of R + , i.e., that ρ(r; σ, α) be strictly increasing in r. It can be shown [9,17,27] that this requirement leads to a very simple diffeomorphism constraint: ...
Chapter
Humans are very good at rapidly detecting salient objects such as animals in complex natural scenes, and recent psychophysical results suggest that the fastest mechanisms underlying animal detection use contour shape as a principal discriminative cue. How does our visual system extract these contours so rapidly and reliably? While the prevailing computational model represents contours as Markov chains that use only first-order local cues to grouping, computer vision algorithms based on this model fall well below human levels of performance. Here we explore the possibility that the human visual system exploits higher-order shape regularities in order to segment object contours from cluttered scenes. In particular, we consider a recurrent architecture in which higher areas of the object pathway generate shape hypotheses that condition grouping processes in early visual areas. Such a generative model could help to guide local bottom-up grouping mechanisms toward globally consistent solutions. In constructing an appropriate theoretical framework for recurrent shape processing, a central issue is to ensure that shape topology remains invariant under all actions of the feedforward and feedback processes. This can be achieved by a promising new theory of shape representation based upon a family of local image deformations called formlets, shown to outperform alternative contour-based generative shape models on the important problem of visual shape completion.
... They used evolving splines to capture these deformations. Several newer models have used the same idea of template deformation but adjusted the algorithms to make the shape representations more robust to local contour changes and partial occlusion [40,41]. These impressive models, originating from computer vision, use sophisticated mathematical tools to capture shape representation. ...
Article
How shape is perceived and represented poses crucial unsolved problems in human perception and cognition. Recent findings suggest that the visual system may encode contours as sets of connected constant curvature segments. Here we describe a model for how the visual system might recode a set of boundary points into a constant curvature representation. The model includes two free parameters that relate to the degree to which the visual system encodes shapes with high fidelity vs. the importance of simplicity in shape representations. We conducted two experiments to estimate these parameters empirically. Experiment 1 tested the limits of observers’ ability to discriminate a contour made up of two constant curvature segments from one made up of a single constant curvature segment. Experiment 2 tested observers’ ability to discriminate contours generated from cubic splines (which, mathematically, have no constant curvature segments) from constant curvature approximations of the contours, generated at various levels of precision. Results indicated a clear transition point at which discrimination becomes possible. The results were used to fix the two parameters in our model. In Experiment 3, we tested whether outputs from our parameterized model were predictive of perceptual performance in a shape recognition task. We generated shape pairs that had matched physical similarity but differed in representational similarity (i.e., the number of segments needed to describe the shapes) as assessed by our model. We found that pairs of shapes that were more representationally dissimilar were also easier to discriminate in a forced choice, same/different task. The results of these studies provide evidence for constant curvature shape representation in human visual perception and provide a testable model for how abstract shape descriptions might be encoded.
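One way to make the fidelity-versus-simplicity idea concrete is a greedy recoding of a sampled contour into circular-arc (constant curvature) segments: each segment is grown while a single least-squares circle fits within a tolerance. The Kåsa circle fit, the tolerance standing in for the fidelity parameter, and the synthetic two-arc contour below are illustrative assumptions, not the authors' published algorithm.

```python
# Sketch: greedily recoding a sampled contour into constant-curvature
# (circular-arc) segments. The least-squares (Kasa) circle fit and the
# single error tolerance `tol` (standing in for a fidelity parameter)
# are illustrative assumptions, not the authors' published algorithm.
import numpy as np

def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit; returns centre, radius, rms error."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = x ** 2 + y ** 2
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = D / 2, E / 2
    r = np.sqrt(F + cx ** 2 + cy ** 2)
    err = np.sqrt(np.mean((np.hypot(x - cx, y - cy) - r) ** 2))
    return (cx, cy), r, err

def arc_segments(x, y, tol=0.02, min_len=5):
    """Greedy split: grow each segment while a single arc fits within `tol`."""
    segments, start, n = [], 0, len(x)
    while start < n - min_len:
        end = start + min_len
        while end < n:
            _, _, err = fit_circle(x[start:end + 1], y[start:end + 1])
            if err > tol:
                break
            end += 1
        segments.append((start, end))   # points start .. end-1 form one arc
        start = end
    return segments                     # a short tail (< min_len points) is ignored

# Example: a contour built from two tangent arcs of different curvature
t1 = np.linspace(0, np.pi / 2, 60)
x1, y1 = np.cos(t1), np.sin(t1)                      # radius-1 arc
t2 = np.linspace(np.pi / 2, np.pi, 60)
x2, y2 = 3 * np.cos(t2), 3 * np.sin(t2) - 2.0        # radius-3 arc, tangent join
x = np.concatenate([x1, x2[1:]])
y = np.concatenate([y1, y2[1:]])
print(arc_segments(x, y))   # typically two segments, split near the arc junction
```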
... Though compositional, these models are arguably not structural, as each component applies to the whole contour. Other contour-based shape theories that satisfy this locality constraint involve deformation of an embryonic shape by the addition of morphing primitives (e.g., Dubinskiy & Zhu, 2003; Elder et al., 2013). These impressive models, originating from computer vision, use sophisticated mathematical tools to capture shape representation. ...
Article
How the visual system represents shape, and how shape representations might be computed by neural mechanisms, are fundamental and unanswered questions. Here, we investigated the hypothesis that 2-dimensional (2D) contour shapes are encoded structurally, as sets of connected constant curvature segments. We report 3 experiments investigating constant curvature segments as fundamental units of contour shape representations in human perception. Our results showed better performance in a path detection paradigm for constant curvature targets, as compared with locally matched targets that lacked this global regularity (Experiment 1), and that participants can learn to segment contours into constant curvature parts with different curvature values, but not into similarly different parts with linearly increasing curvatures (Experiment 2). We propose a neurally plausible model of contour shape representation based on constant curvature, built from oriented units known to exist in early cortical areas, and we confirmed the model's prediction that changes to the angular extent of a segment will be easier to detect than changes to relative curvature (Experiment 3). Together, these findings suggest the human visual system is specially adapted to detect and encode regions of constant curvature and support the notion that constant curvature segments are the building blocks from which abstract contour shape representations are composed.
... However, it remains unknown how a 3D skeletal structure arises from 2D images on the retina. One possibility is that skeletal computations in the visual system invoke generative shape processes (Elder, Oleskiw, Yakubovich, & Peyré, 2013; Trinh & Kimia, 2007). These processes may be able to recover an object's 3D skeletal structure from retinal images by incorporating a small number of image-computable 2D skeletons (e.g., one from each eye; Qiu, Hatori, & Sakai, 2015). ...
Preprint
With seemingly little effort, humans can both identify an object across large changes in orientation and extend category membership to novel exemplars. Although researchers argue that object shape is crucial in these cases, there are open questions as to how shape is represented for object recognition. Here we tested whether the human visual system incorporates a three-dimensional skeletal descriptor of shape to determine an object’s identity. Skeletal models not only provide a compact description of an object’s global shape structure, but also provide a quantitative metric by which to compare the visual similarity between shapes. Our results showed that a model of skeletal similarity explained the greatest amount of variance in participants’ object dissimilarity judgments when compared with other computational models of visual similarity (Experiment 1). Moreover, parametric changes to an object’s skeleton led to proportional changes in perceived similarity, even when controlling for another model of structure (Experiment 2). Importantly, participants preferentially categorized objects by their skeletons across changes to local shape contours and non-accidental properties (Experiment 3). Our findings highlight the importance of skeletal structure in vision, not only as a shape descriptor, but also as a diagnostic cue of object identity.
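As a 2D stand-in for the skeletal similarity idea, the sketch below extracts medial-axis skeletons from binary shapes with scikit-image (assumed available) and compares them with a symmetric nearest-neighbour distance between skeleton points. Both the 2D setting and the distance are illustrative assumptions; the study's 3D skeletal models and similarity metric are not reproduced here.

```python
# Sketch: extracting a 2D medial-axis skeleton from a binary shape and
# comparing two shapes by a crude skeleton-based distance. scikit-image's
# medial_axis is a 2D stand-in for the study's 3D skeletal models; the
# symmetric nearest-neighbour comparison below is an illustrative assumption.
import numpy as np
from skimage.draw import ellipse
from skimage.morphology import medial_axis

def binary_ellipse(shape, ry, rx):
    img = np.zeros(shape, dtype=bool)
    rr, cc = ellipse(shape[0] // 2, shape[1] // 2, ry, rx, shape=shape)
    img[rr, cc] = True
    return img

def skeleton_points(img):
    skel, dist = medial_axis(img, return_distance=True)
    ys, xs = np.nonzero(skel)
    return np.column_stack([ys, xs]), dist[skel]     # positions + local radii

def skeleton_distance(a, b):
    """Symmetric mean nearest-neighbour distance between skeleton point sets."""
    pa, _ = skeleton_points(a)
    pb, _ = skeleton_points(b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

elongated = binary_ellipse((128, 128), 20, 50)
rounder   = binary_ellipse((128, 128), 35, 40)
print("skeleton distance:", skeleton_distance(elongated, rounder))
```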
... The results presented in this paper raise the question of whether there are alternative, more universal shape descriptors. In a relevant study, Fruend and Elder (2015) attempted to psychophysically evaluate the efficiency of a variety of alternative shape descriptors, namely Fourier Descriptors (Alter & Schwartz, 1988; Zahn & Roskies, 1972), Shapelets (Dubinskiy & Zhu, 2003) and Formlets (Elder, Oleskiw, Yakubovich, & Peyré, 2013; Yakubovich & Elder, 2014). The rationale for their experiment was that if subjects reached their subjective recognition threshold for a specific animal category when it was represented by a given shape descriptor with fewer components (ranging from 1 to 10), one might argue that this representation is closer to the encoding scheme employed by the human visual system. ...
Article
The visual system is exposed to a vast number of shapes and objects. Yet, human object recognition is effortless, fast and largely independent of naturally occurring transformations such as position and scale. The precise mechanisms of shape encoding are still largely unknown. Radial frequency (RF) patterns are a special class of closed contours defined by modulation of a circle's radius. These patterns have been frequently and successfully used as stimuli in vision science to investigate aspects of shape processing. Given their mathematical properties, RF patterns cannot represent any arbitrary shape, but the ability to generate more complex, biologically relevant shapes depicting the outlines of objects such as fruits or human heads raises the possibility that the RF patterns span a representative subset of possible shapes. However, this assumption has not been tested before. Here we show that only a small fraction of all possible shapes can be represented by RF patterns and that this small fraction is perceptually distinct from the general class of all possible shapes. Specifically, we derive a general measure for the distance of a given shape's outline from the set of RF patterns, allowing us to scan large numbers of object outlines automatically. We find that only a very small proportion of smooth outlines can be exactly represented by RF patterns. We present results from a visual search experiment, which revealed that searching for an RF pattern among non-radial frequency patterns is efficient, whereas searching for an RF pattern among other RF patterns is inefficient (and vice versa). These results suggest that RF patterns represent only a restricted subset of possible planar shapes and that results obtained with this special class of stimuli cannot simply be expected to generalise to any arbitrary planar shape.
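An RF pattern is a circle whose radius is modulated by a single sinusoid, so one crude way to see how far an arbitrary outline lies from that family is to measure how much of its radius profile's energy falls outside the single strongest harmonic. The residual measure below is an assumption for illustration and is not the distance measure derived in the paper.

```python
# Sketch: generating a radial frequency (RF) pattern and a crude measure of
# how far an arbitrary radius profile is from the single-frequency RF family.
# The residual-energy measure below is an illustrative assumption, not the
# distance measure derived in the paper.
import numpy as np

def rf_pattern(r0=1.0, amp=0.1, freq=5, phase=0.0, n=512):
    """Closed contour whose radius is a single sinusoidal modulation of a circle."""
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    r = r0 * (1.0 + amp * np.sin(freq * theta + phase))
    return r * np.cos(theta), r * np.sin(theta), r

def rf_residual(radius):
    """Fraction of non-DC energy of the radius profile lying outside its
    single strongest harmonic (near zero for an exact RF pattern)."""
    c = np.fft.rfft(radius - radius.mean())
    power = np.abs(c) ** 2
    power[0] = 0.0
    return 1.0 - power.max() / power.sum()

x, y, r_rf = rf_pattern(freq=5, amp=0.15)
theta = np.linspace(0, 2 * np.pi, len(r_rf), endpoint=False)
r_mixed = 1.0 + 0.1 * np.sin(5 * theta) + 0.08 * np.sin(9 * theta + 1.0)

print("RF pattern residual:", round(rf_residual(r_rf), 6))     # ~0
print("mixed-shape residual:", round(rf_residual(r_mixed), 3)) # clearly > 0
```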
... Examples include (a) perceptual organization, recognition, and tracking in cluttered scenes, where shapes must be distinguished not just from each other, but from so-called phantom shapes formed by conjunctions of features from multiple objects (Cavanagh 1991); (b) modeling of shape articulation, growth, and deformation; and (c) modeling of shape similarity. Elder et al. (2013) articulated a set of criteria for a generative shape representation that forms a useful basis for comparing the candidate shape representations we consider in the remainder of this review (see the sidebar titled Desirable Criteria for a Generative Model of 2D Shape). ...
Article
The human visual system reliably extracts shape information from complex natural scenes in spite of noise and fragmentation caused by clutter and occlusions. A fast, feedforward sweep through ventral stream involving mechanisms tuned for orientation, curvature, and local Gestalt principles produces partial shape representations sufficient for simpler discriminative tasks. More complete shape representations may involve recurrent processes that integrate local and global cues. While feedforward discriminative deep neural network models currently produce the best predictions of object selectivity in higher areas of the object pathway, a generative model may be required to account for all aspects of shape perception. Research suggests that a successful model will account for our acute sensitivity to four key perceptual dimensions of shape: topology, symmetry, composition, and deformation.
... In his much-admired book On Growth and Form, Thompson (1942) suggested that the shapes of diverse and phylogenetically distant species are often related to one another by simple non-rigid transformations (Fig. 3a). This has inspired some perception researchers to suggest that the visual system may use a similar approach to represent complex shapes and shape transformations (Elder, Oleskiw, Yakubovich, & Peyré, 2013; Graf, 2006; Mark et al., 1988; Shaw & Pittenger, 1977; Todd, 1982; Todd, Weismantel, & Kallie, 2014). We put this idea to the test. ...
Article
Morphogenesis—or the origin of complex natural form—has long fascinated researchers from practically every branch of science. However, we know practically nothing about how we perceive and understand such processes. Here, we measured how observers visually infer shape-transforming processes. Participants viewed pairs of objects (‘before’ and ‘after’ a transformation) and identified points that corresponded across the transformation. This allowed us to map out in spatial detail how perceived shape and space were affected by the transformations. Participants’ responses were strikingly accurate and mutually consistent for a wide range of non-rigid transformations including complex growth-like processes. A zero-free-parameter model based on matching and interpolating/extrapolating the positions of high-salience contour features predicts the data surprisingly well, suggesting observers infer spatial correspondences relative to key landmarks. Together, our findings reveal the operation of specific perceptual organization processes that make us remarkably adept at identifying correspondences across complex shape-transforming processes by using salient object features. We suggest that these abilities, which allow us to parse and interpret the causally significant features of shapes, are invaluable for many tasks that involve ‘making sense’ of shape.
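To give a flavour of how landmark-based correspondence models of this kind can work, the sketch below maps query points across a transformation by interpolating the displacements of a few matched landmarks with inverse-distance weighting. This weighting scheme and the synthetic landmarks are illustrative assumptions, not the authors' zero-free-parameter model.

```python
# Sketch: predicting point correspondences across a shape transformation by
# interpolating the displacements of a few matched landmark features.
# Inverse-distance weighting is an illustrative stand-in for the authors'
# zero-free-parameter model; the landmarks below are synthetic.
import numpy as np

def predict_correspondence(query, landmarks_before, landmarks_after, eps=1e-9):
    """Map `query` points (N, 2) by distance-weighted landmark displacements."""
    disp = landmarks_after - landmarks_before                          # (L, 2)
    d = np.linalg.norm(query[:, None, :] - landmarks_before[None, :, :], axis=-1)
    w = 1.0 / (d + eps)
    w /= w.sum(axis=1, keepdims=True)
    return query + w @ disp

# Synthetic 'before'/'after' landmarks: a shape stretched horizontally
before = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
after  = before * np.array([1.5, 1.0])            # 1.5x horizontal stretch
query  = np.array([[0.5, 0.5], [0.25, 0.75]])
print(predict_correspondence(query, before, after))
```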
Article
Perception involves the processing of content or information about the world. In what form is this content represented? I argue that perception is widely compositional. The perceptual system represents many stimulus features (including shape, orientation, and motion) in terms of combinations of other features (such as shape parts, slant and tilt, common and residual motion vectors). But compositionality can take a variety of forms. The ways in which perceptual representations compose are markedly different from the ways in which sentences or thoughts are thought to be composed. I suggest that the thesis that perception is compositional is not itself a concrete hypothesis with specific predictions; rather it affords a productive framework for developing and evaluating specific empirical hypotheses about the form and content of perceptual representations. The question is not just whether perception is compositional, but how. Answering this latter question can provide fundamental insights into perception. This article is categorized under: Philosophy > Representation; Philosophy > Foundations of Cognitive Science; Psychology > Perception and Psychophysics.
Article
The mathematics of shape has a long history in the fields of differential geometry and topology. But does this theory of shape address the central problem of vision: finding the best data structure plus algorithm for storing a shape and later recognizing the same and similar shapes? Several criteria may be used to evaluate this: does the data structure capture our intuitive idea of 'similarity'? Does it allow reconstruction of typical shapes to compare with new input? One direction in which mathematics and vision have converged is toward multiscale analyses of visual signals and shapes. In other respects, however, the recognition process in animals shows features that still defy mathematical modeling.
Article
The problem of finding a description, at varying levels of detail, for planar curves and matching two such descriptions is posed and solved in this paper. A number of necessary criteria are imposed on any candidate solution method. Path-based Gaussian smoothing techniques are applied to the curve to find zeros of curvature at varying levels of detail. The result is the "generalized scale space" image of a planar curve, which is invariant under rotation, uniform scaling and translation of the curve. These properties make the scale space image suitable for matching. The matching algorithm is a modification of the uniform cost algorithm and finds the lowest cost match of contours in the scale space images. It is argued that this is preferable to matching in a so-called stable scale of the curve because no such scale may exist for a given curve. This technique is applied to register a Landsat satellite image of the Strait of Georgia, B.C. (manually corrected for skew) to a map containing the shorelines of an overlapping area.
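A compact way to see the construction: smooth the coordinate functions of a closed curve with periodic Gaussians of increasing width and record where the curvature changes sign at each scale. The sigma schedule and the synthetic curve in the sketch below are illustrative choices (SciPy assumed available), not those of the original paper.

```python
# Sketch: a curvature scale-space image for a closed planar curve.
# Coordinates are smoothed with periodic Gaussians of increasing sigma and
# the zero crossings of curvature are recorded at each scale; the sigma
# schedule and test curve are illustrative choices.
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_zeros(x, y, sigma):
    """Sample indices where curvature changes sign at scale sigma."""
    xp  = gaussian_filter1d(x, sigma, order=1, mode='wrap')
    yp  = gaussian_filter1d(y, sigma, order=1, mode='wrap')
    xpp = gaussian_filter1d(x, sigma, order=2, mode='wrap')
    ypp = gaussian_filter1d(y, sigma, order=2, mode='wrap')
    kappa = (xp * ypp - yp * xpp) / np.maximum((xp**2 + yp**2) ** 1.5, 1e-12)
    sign_change = np.sign(kappa) != np.sign(np.roll(kappa, -1))
    return np.nonzero(sign_change)[0]

def scale_space_image(x, y, sigmas):
    """List of (sigma, zero-crossing indices) pairs: the 'scale space image'."""
    return [(s, curvature_zeros(x, y, s)) for s in sigmas]

# Example: a wobbly closed curve; inflections tend to vanish at coarse scales
t = np.linspace(0, 2 * np.pi, 512, endpoint=False)
r = 1.0 + 0.3 * np.cos(4 * t) + 0.1 * np.cos(9 * t)
x, y = r * np.cos(t), r * np.sin(t)

for sigma, zeros in scale_space_image(x, y, sigmas=[2, 8, 64]):
    print(f"sigma={sigma:>3}: {len(zeros)} curvature zero crossings")
```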
Article
Structural pattern recognition in the case of molecules is an important task in the field of bio-engineering. Several techniques are employed in order to determine the exact structural conformation and structural parameters of molecules. The present paper discusses some of the available techniques, such as Fourier Infra-Red Spectroscopy, Raman Spectroscopy and theoretical structure simulation, which are employed in this technologically important field. An attempt has been made to determine the exact structural conformation of Polyformaldehyde using first-principles calculations based on Density Functional Theory.
Article
Model-based vision is firmly established as a robust approach to recognizing and locating known rigid objects in the presence of noise, clutter, and occlusion. It is more problematic to apply model-based methods to images of objects whose appearance can vary, though a number of approaches based on the use of flexible templates have been proposed. The problem with existing methods is that they sacrifice model specificity in order to accommodate variability, thereby compromising robustness during image interpretation. We argue that a model should only be able to deform in ways characteristic of the class of objects it represents. We describe a method for building models by learning patterns of variability from a training set of correctly annotated images. These models can be used for image search in an iterative refinement algorithm analogous to that employed by Active Contour Models (Snakes). The key difference is that our Active Shape Models can only deform to fit the data in ways consistent with the training set. We show several practical examples where we have built such models and used them to locate partially occluded objects in noisy, cluttered images.
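The statistical core of this approach is a point distribution model: PCA over aligned landmark configurations, with new shapes generated from bounded mode weights. The sketch below uses synthetic, already-aligned training shapes and omits Procrustes alignment and the iterative image search; the data and parameter choices are illustrative assumptions.

```python
# Sketch: the statistical core of an Active Shape Model, a point
# distribution model learned by PCA over aligned landmark configurations.
# Alignment (Procrustes) and the iterative image search are omitted; the
# training data below are synthetic.
import numpy as np

rng = np.random.default_rng(0)

def train_pdm(shapes, n_modes=2):
    """shapes: (n_samples, 2 * n_landmarks) aligned landmark vectors."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    # PCA via SVD of the centred data matrix
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    eigvals = (s ** 2) / (len(shapes) - 1)
    return mean, vt[:n_modes].T, eigvals[:n_modes]

def generate(mean, modes, eigvals, b):
    """Synthesize a shape from mode weights b, clipped to +/- 3 std devs."""
    b = np.clip(b, -3 * np.sqrt(eigvals), 3 * np.sqrt(eigvals))
    return mean + modes @ b

# Synthetic training set: ellipses with varying aspect ratio and 'bend'
t = np.linspace(0, 2 * np.pi, 32, endpoint=False)
shapes = []
for _ in range(100):
    a, bend = rng.uniform(0.8, 1.2), rng.uniform(-0.2, 0.2)
    x = a * np.cos(t)
    y = np.sin(t) + bend * np.cos(t) ** 2
    shapes.append(np.concatenate([x, y]))
shapes = np.array(shapes)

mean, modes, eigvals = train_pdm(shapes, n_modes=2)
new_shape = generate(mean, modes, eigvals, b=np.array([1.5, -0.5]) * np.sqrt(eigvals))
print("variance captured by the 2 modes:", eigvals.round(4))
```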