
Skeleton-Based Recognition of Shapes in Images via Longest Path Matching



Gulce Bal, Julia Diebold, Erin Wolf Chambers, Ellen Gasparovic, Ruizhen Hu, Kathryn Leonard, Matineh Shaker, and Carola Wenk
Abstract. We present a novel image recognition method based on the Blum medial axis that identifies shape information present in unsegmented input images. Inspired by prior work matching from a library using only the longest path in the medial axis [3], we extract medial axes from shapes with clean contours and seek to recognize these shapes within "noisy" images. Recognition consists of matching longest paths from the segmented images into complicated geometric graphs, which are computed via edge detection on the (unsegmented) input images to obtain Voronoi diagrams associated with the edges. We present two approaches: one based on map-matching techniques using the weak Fréchet distance, and one based on a multiscale curve metric after reducing the Voronoi graphs to their minimum spanning trees. This paper serves as a proof of concept for this approach, using images from three shape databases with known segmentability (whale flukes, strawberries, and dancers). Our preliminary results on these images show promise, with both approaches correctly identifying two out of three shapes.
Gulce Bal: Dept. of Computer Engineering, Middle East Technical University, gulcebal@gmail.com.
Julia Diebold: Dept. of Computer Science, Technical University of Munich, julia.diebold@in.tum.de.
Erin Wolf Chambers: Dept. of Mathematics and Computer Science, Saint Louis University, echambe5@slu.edu. Research supported in part by NSF grants CCF-1054779 and IIS-1319573.
Ellen Gasparovic: Dept. of Mathematics, Duke University, ellen@math.duke.edu.
Ruizhen Hu: Dept. of Mathematics, Zhejiang University, ruizhen.hu@gmail.com.
Kathryn Leonard: Dept. of Mathematics, California State University Channel Islands, kleonard.ci@gmail.com. Research supported in part by NSF grant IIS-0954256.
Matineh Shaker: Dept. of Electrical Engineering, Northeastern University, shaker@ece.neu.edu.
Carola Wenk: Dept. of Computer Science, Tulane University, cwenk@tulane.edu. Research supported in part by NSF grant CCF-0643597.
2 Authors Suppressed Due to Excessive Length
1 Introduction
We present a method and proof of concept for image recognition based on information extracted from the Blum medial axis. Shape recognition and matching based solely on contour points have been shown to perform poorly in the presence of occlusion, partial data, and noise [22, 13, 4]. Unorganized point sets [5] representing the boundaries of shapes are often matched using assignment algorithms for graph matching [10]. Another class of methods uses the Hausdorff distance to match edge maps [13]; these methods have the advantage of not requiring correspondences between edge features, but they do not necessarily preserve the integrity of shape parts. Global shape representations that are invariant to translation, rotation, or scale, such as Fourier descriptor coefficients [18], may produce incorrect matches due to noise or occlusion. Historically, approaches based on the medial axis have suffered from its instability and complexity in the presence of noise and pixelation. Our approach is designed to bypass these problems while preserving the strengths of the medial axis as a shape descriptor, including meaningful decomposition into parts and stability despite occlusion. Furthermore, our matching techniques are designed to be nearly invariant to translation, rotation, and scaling.
While shape recognition based on the medial axis has been well-studied for pre-
segmented shapes [26], this project is among the first to perform recognition using
the medial axis on an unsegmented unknown image. The basic concept builds on
previous work which recognizes objects by matching longest paths in the medial
axis, but only in the limited setting where the input is a “nice” shape taken from
a particular hand-drawn catalog [3]. Here, we apply a similar philosophy to match
shapes in the much more challenging domain where the input is an arbitrary image.
As a result, we must apply edge detection and other techniques in order to identify
significant shape information present in the image. Additionally, whereas [3] uses
both the medial skeleton and the radius function, our current results use only the
skeleton because extracting reliable radius information from arbitrary edges in an
image presents additional challenges.
Since there is no common frame of reference between shapes from our canonical library and our input image, we must match an arbitrary path (the longest path from the canonical image) into a messy geometric graph (the Voronoi diagram of the edges detected in our image). We use two different approaches in this work: one based on map-matching using the weak Fréchet distance, and the other based on multiscale curve matching into the minimum spanning tree of the graph computed from the input image edges.
Our initial results indicate that both matching methods perform reasonably well, correctly matching two of our three initial test images. The algorithms are reasonably efficient, although the map-matching approach is more computationally intensive due to the exhaustive set of rotations, scalings, and translations that must be tested. Testing on a database larger than our three-object set is required to determine the full power of these methods.
2 Background
2.1 The medial axis
The medial axis of an object is the set of points that have more than one closest point on the object's boundary. It was first introduced by Blum as a tool for recognizing shapes in biological images [6]. It is known that the medial axis has the same homotopy type as the original shape [17], and therefore it gives a topologically accurate but simpler representation of the shape of an object. In addition, the geometry of the boundary curve is encoded in the geometry of the medial skeleton and its radius function. The medial axis transform is the set of points in the medial axis annotated with the radius of the largest inscribed ball centered at each point. This structure can be used to recover the entire original shape as the union of these balls. Applications and algorithms using this structure are numerous; see, for example, the survey by Leymarie and Kimia and the many other references in [16].
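As a small illustration of the reconstruction property, the sketch below approximates a shape by the union of the disks of a sampled medial axis transform. The function `in_shape` and the hand-built sample for a 10 × 2 rectangle are illustrative assumptions, not part of the method described in this paper.

```python
import math


def in_shape(point, medial_transform):
    """Return True if `point` lies in the union of the medial disks.

    `medial_transform` is a list of ((cx, cy), r) pairs, i.e. a sampled
    medial axis transform; the union of these closed disks approximates
    the original shape.
    """
    px, py = point
    return any(math.hypot(px - cx, py - cy) <= r
               for (cx, cy), r in medial_transform)


# Hypothetical example: the rectangle [0, 10] x [0, 2]. Its medial axis contains
# the segment y = 1, 1 <= x <= 9, where the largest inscribed disk has radius 1.
# The four diagonal branches towards the corners are omitted, so the corners
# themselves are not recovered by this sample.
mat = [((x / 4, 1.0), 1.0) for x in range(4, 37)]

print(in_shape((5.0, 0.2), mat))  # True: inside the rectangle
print(in_shape((5.0, 2.5), mat))  # False: outside the rectangle
```

Adding samples along the four diagonal branches (with radii shrinking to zero at the corners) would recover the corners as well.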
2.2 Shape recognition using the medial axis
One of the main motivations for this work is that medial-axis-based structures such as the shock graph have had notable success with the problem of image recognition within a large database [26, 23, 24]. Each of these algorithms catalogs a set of canonical shape categories by computing the shock graph (an annotated version of the medial axis) for each of the shape instances. The next step is to read input images and attempt to match their shock graphs against the library of known shapes. These algorithms are based on dynamic programming and work efficiently, since the shock graph is a tree whenever the input shape is simply connected.
Another line of research motivating our work does not use the entire structure of the medial axis, but instead performs the matching based strictly on the longest path in the medial axis and its associated radius function. Bai et al. [3] implemented and tested this approach on a library of 56 shape images in total, with 4 objects per shape class [2]. In a leave-one-out test, in which each shape is removed from the library and then classified against the remaining shapes, their approach achieved a success rate of 98.2%. In addition, they tested their method on a larger dataset [23], with 94.4% accuracy. Although this matching is naturally less successful for images with high radial symmetry, it nonetheless matches input shapes to the correct class for the vast majority of tested images. This is perhaps surprising, given how much information about the medial axis is discarded when only the single longest path is considered. However, the work has so far been applied only to catalogs of images with hand-drawn, clean contours. In this paper, we apply a related method to recognize a shape contained in an arbitrary (noisy) input image.
2.3 Map-matching
Given a graph $G$ embedded in Euclidean space $\mathbb{R}^d$ (most often $\mathbb{R}^2$) and a polygonal curve $\gamma$ also embedded in $\mathbb{R}^d$, the map-matching problem asks for the path in $G$ that is closest to $\gamma$, generally under some distance measure such as the Fréchet distance or the weak Fréchet distance. Recently, this problem has been considered in both theoretical and applied settings due to its utility in GIS applications [1, 7, 8]. In this setting, one often has a trajectory (such as one recorded by a GPS unit in a vehicle) that needs to be matched to the closest path on a known road network, modeled as the graph $G$.
Our setting is slightly different: although the graphs we work with are extracted from images and thus have embeddings in $\mathbb{R}^2$, our input paths are not embedded in the same frame of reference, since the scales and orientations of arbitrary input images can be quite different from those of the reference images in the library. This variation is somewhat similar to the notion of a graph isomorphism, but here our input graphs are geometric graphs rather than arbitrary ones. While fast algorithms for the Fréchet distance to a geometric graph have been studied in some limited settings, such as for trees [12], no one has previously considered the problem where the input path is not given as an embedding into the same frame of reference as the graph $G$, which adds considerably to the difficulty of the problem.
We perform map-matching via the weak Fréchet distance. Let $\gamma_1, \gamma_2 : [0,1] \to \mathbb{R}^2$ be two curves in the plane. The weak Fréchet distance $\delta_{wF}$ between them is defined as

$$\delta_{wF}(\gamma_1,\gamma_2) = \inf_{\alpha_1,\alpha_2:[0,1]\to[0,1]} \; \max_{t\in[0,1]} \left\| \gamma_1(\alpha_1(t)) - \gamma_2(\alpha_2(t)) \right\|,$$

where $\alpha_1$ and $\alpha_2$ range over all continuous reparametrizations with $\alpha_1(0)=\alpha_2(0)=0$ and $\alpha_1(1)=\alpha_2(1)=1$, and $\|\cdot\|$ denotes the Euclidean norm. Unlike for the (strong) Fréchet distance, the reparametrizations need not be monotone, so the traversal may backtrack along either curve. The weak Fréchet distance is well suited for comparing curves because it takes the continuity of the curves into account. In our setting, we consider the set $\mathcal{T}$ of translations, rotations, and scalings, and the map-matching problem we address is: given a geometric graph $G$ and a curve $\gamma$, find a transformation $T \in \mathcal{T}$ and a path in $G$ that together minimize the weak Fréchet distance to $T(\gamma)$.
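To make the definition concrete, the following Python sketch computes a discrete weak Fréchet distance between two polygonal curves. It is a simplification under stated assumptions: only vertex pairs are coupled, which gives an upper bound on the continuous weak Fréchet distance, and the function name `discrete_weak_frechet` is hypothetical. Because the reparametrizations need not be monotone, the search over index pairs may step backwards along either curve; a bottleneck (minimax) variant of Dijkstra's algorithm finds the coupling whose largest pairwise distance is smallest.

```python
import heapq
import math


def discrete_weak_frechet(P, Q):
    """Discrete weak Fréchet distance between polygonal curves P and Q.

    P and Q are lists of (x, y) vertices. Only vertex pairs are coupled, so the
    result upper-bounds the continuous weak Fréchet distance.
    """
    n, m = len(P), len(Q)

    def d(i, j):
        return math.dist(P[i], Q[j])

    # Bottleneck Dijkstra over the n x m grid of index pairs: the cost of a
    # path is the largest pairwise distance visited along it.
    best = {(0, 0): d(0, 0)}
    heap = [(d(0, 0), 0, 0)]
    while heap:
        cost, i, j = heapq.heappop(heap)
        if (i, j) == (n - 1, m - 1):
            return cost
        if cost > best[(i, j)]:
            continue  # stale heap entry
        # Any neighbouring index pair is allowed, including backward steps.
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if (di, dj) == (0, 0) or not (0 <= a < n and 0 <= b < m):
                    continue
                new_cost = max(cost, d(a, b))
                if new_cost < best.get((a, b), math.inf):
                    best[(a, b)] = new_cost
                    heapq.heappush(heap, (new_cost, a, b))
    return math.inf


P = [(0, 0), (1, 0), (2, 0)]
Q = [(0, 1), (1, 1), (2, 1)]
print(discrete_weak_frechet(P, Q))  # 1.0
```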
2.4 $H^{1/2}$-type multiscale curve metric
Our other method of matching an input path into a geometric graph uses the $H^{1/2}$ multiscale curve metric, first introduced in [20], evaluated on a curve extracted from the graph and the known longest medial path. The last decade has produced a substantial body of work on finding shape metrics that respect the underlying geometry of shape space, where a shape is modeled as a curve in $\mathbb{R}^2$, possibly modulo a group of transformations [25, 19, 14]. Unfortunately, these metrics are computationally expensive and can be unwieldy to implement in any realistic setting. The $H^{1/2}$-type metric is a middle ground: a weakened linearization of a Riemannian metric that is computationally fast. Unlike the Fréchet distance, which compares point positions, it computes distances from geometric quantities (the angles $\beta$ defined below).
For ease of exposition, results here are given for plane curves as objects in $\mathbb{C}$ instead of $\mathbb{R}^2$; we trust the reader can move naturally between these two representations. Given a smooth arclength-parameterized open plane curve $\gamma(s)$, define an $H^{1/2}$ "norm" (footnote 9) as

$$\|\gamma\|_{1/2}^2 = \int_0^L \int_0^{\min(s,\,L-s)} \beta(s,t)^2 \, dt \, ds,$$

where $L$ is the length of the curve, and the angle $\beta(s,t)$ between the rays joining $\gamma(s)$ to $\gamma(s+t)$ and $\gamma(s-t)$ is given by

$$\beta(s,t) := \arg \frac{\gamma(s+t)-\gamma(s)}{\gamma(s)-\gamma(s-t)}.$$
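Moreover, $\beta$ gives rise to a metric on curves. Let $\Sigma$ be the set of homeomorphisms $\sigma:[0,1]\to[0,1]$, and let $\gamma_1,\gamma_2$ be Lipschitz curves. Then

$$L(\gamma_1,\gamma_2) = \inf_{\sigma\in\Sigma} \iint \big(\beta_1(s,t) - \beta_2(\sigma(s),t)\big)^2 \, ds \, dt$$

gives the metric

$$d^2(\gamma_1,\gamma_2) = L(\gamma_1,\gamma_2) + L(\gamma_2,\gamma_1).$$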
For a discretized curve sampled over dyadic intervals, we have

$$\|\gamma\|_{1/2}^2 = \sum_{k=1}^{K} \sum_{n=1+2^{k-1}}^{N-2^{k-1}} \beta(n,k)^2 \, 2^{-k},$$

where $N$ is the number of sampled points, $K$ determines the maximum number of dyadic intervals, and the angle $\beta$ is

$$\beta(n,k) = \arg \frac{\gamma(n+2^{k-1})-\gamma(n)}{\gamma(n)-\gamma(n-2^{k-1})}.$$
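A short Python sketch of the discrete definition above follows. It assumes the curve is given as a list of complex samples with distinct consecutive points and uses 0-based indexing, so the index range shifts by one relative to the formula; the helper names are hypothetical. Because each angle is the argument of a ratio of differences, the value is unchanged by translating, rotating, or scaling the samples.

```python
import cmath


def beta(gamma, n, k):
    """Turning angle beta(n, k) at 0-based sample n with offset h = 2**(k - 1)."""
    h = 2 ** (k - 1)
    return cmath.phase((gamma[n + h] - gamma[n]) / (gamma[n] - gamma[n - h]))


def h_half_norm_sq(gamma, K):
    """Discrete H^{1/2}-type 'norm' squared: sum over k, n of beta(n, k)^2 * 2^-k."""
    N = len(gamma)
    total = 0.0
    for k in range(1, K + 1):
        h = 2 ** (k - 1)
        for n in range(h, N - h):  # beta(n, k) is defined only away from the ends
            total += beta(gamma, n, k) ** 2 * 2.0 ** (-k)
    return total


print(h_half_norm_sq([0, 1, 2, 3], K=1))    # 0.0 for a straight line
print(h_half_norm_sq([0, 1, 1 + 1j], K=1))  # one right-angle turn: (pi/2)^2 / 2
```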
If $\gamma$ is an arclength parameterization of a Lipschitz graph, then the angles $\beta(n,k)$ are, in a distributional sense, the same as the wavelet coefficients of $\gamma$ over the same dyadic interval system. In this way, the collection of angles $\{\beta(n,k)\}$ provides a multiscale analysis of the curve $\gamma$ and, in turn, the Haar coefficients of $\gamma'$ provide a fast computation for $\{\beta(n,k)\}$ based on scaled second differences:
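$$\sum_{k=1}^{K} \sum_{n=1+2^{k-1}}^{N-2^{k-1}} \beta(n,k)^2 \, 2^{-k} = \sum_{k=1}^{K} \sum_{n=1+2^{k-1}}^{N-2^{k-1}} \left| \gamma(n+2^{k}) - 2\gamma(n) + \gamma(n-2^{k}) \right|^2 2^{-k}. \qquad (1)$$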
9 We are not viewing the space of plane curves as linear, but the integral defined is analogous to Sobolev norms on function spaces, and the integrand is analogous to a wavelet decomposition of $\gamma$. Additionally, the "norm" gives rise to a metric on curves in the standard way.
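If $\gamma_1$ and $\gamma_2$ are sampled by $M$ and $N$ points, respectively, with $M \le N$, then $\sigma:\{1,\dots,M\}\to\{1,\dots,N\}$, the scales are limited by $K \le \log_2 M$, and we obtain the discrete approximation to the continuous metric

$$L(\gamma_1,\gamma_2) \approx \min_{\sigma\in\Sigma_{M,N}} \sum_{k=1}^{K} \sum_{m=1+2^{k-1}}^{M-2^{k-1}} \frac{1}{k^2} \left|\beta_1(m,k) - \beta_2(\sigma(m),k)\right|^2,$$

where $\Sigma_{M,N}$ is the discrete analogue of $\Sigma$. This in turn can be computed using second differences as above.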
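The minimization over $\sigma$ can be carried out by dynamic programming. The sketch below is one possible implementation, not the authors' code. It assumes $\sigma$ is strictly increasing (so $M \le N$), uses 0-based indices, weights scale $k$ by $1/k^2$ as in the formula above, and simply skips scales at which $\beta$ is undefined near the curve ends. All function names are hypothetical. With running prefix minima, the recurrence takes $O(MNK)$ time.

```python
import cmath
import math


def beta_table(gamma, K):
    """beta[n][k-1] for 0-based sample n and scale k = 1..K; None where undefined."""
    N = len(gamma)
    table = [[None] * K for _ in range(N)]
    for k in range(1, K + 1):
        h = 2 ** (k - 1)
        for n in range(h, N - h):
            table[n][k - 1] = cmath.phase(
                (gamma[n + h] - gamma[n]) / (gamma[n] - gamma[n - h]))
    return table


def pair_cost(row1, row2, K):
    """Sum over scales of |beta1 - beta2|^2 / k^2, skipping undefined entries."""
    total = 0.0
    for k in range(1, K + 1):
        b1, b2 = row1[k - 1], row2[k - 1]
        if b1 is not None and b2 is not None:
            total += (b1 - b2) ** 2 / k ** 2
    return total


def discrete_L(table1, table2, K):
    """Minimum over strictly increasing sigma of the summed pair costs (needs M <= N)."""
    M, N = len(table1), len(table2)
    if M > N:
        raise ValueError("a strictly increasing sigma needs M <= N")
    prev = [pair_cost(table1[0], table2[n], K) for n in range(N)]
    for m in range(1, M):
        cur = [math.inf] * N
        best_before = math.inf  # running min(prev[0..n-1])
        for n in range(N):
            if best_before < math.inf:
                cur[n] = best_before + pair_cost(table1[m], table2[n], K)
            best_before = min(best_before, prev[n])
        prev = cur
    return min(prev)


g = [complex(i, (i * i) % 5) for i in range(16)]
t = beta_table(g, K=3)
print(discrete_L(t, t, K=3))       # 0.0: identical curves
print(discrete_L(t[::4], t, K=3))  # 0.0: every fourth sample embeds exactly
```

The symmetric quantity $d^2$ would add the same computation with the roles of the two curves exchanged.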
The metric as defined is naturally translation invariant. In the discrete case, rotation invariance is introduced by rotating the line joining $\gamma(n+2^k)$ and $\gamma(n-2^k)$ to be horizontal (a coarse approximation to the tangent line at $\gamma(n)$), and scale invariance is introduced by normalizing the average inter-point distance to one. See [20] for details and full generality of results.
3 Method
3.1 Extracting medial axes from “known” images
In general, the medial axis of an object in a natural image is difficult to extract automatically, as it requires segmenting the image, extracting the points on the boundary of the object of interest, and then computing the medial axis. We select three image databases with known segmentability: whale flukes, strawberries, and dancers. We use k-means clustering to extract an initial binary representation of the object of interest, then apply morphological techniques to obtain a clean boundary. We extract the centers and radii of the circumcircles of the Delaunay triangulation of the boundary points and retain only those centers and radii corresponding to the interior of the object, thereby obtaining the interior medial axis. (These circumcenters are the vertices of the Voronoi diagram of the boundary points, which approximate the medial axis as the boundary sampling becomes dense.) See Figure 1 for an illustration of this process, and [15] for more details.
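The idea of keeping interior circumcircles of the Delaunay triangulation can be sketched in a few lines of Python. This brute-force version is for illustration only: it enumerates all triples of boundary samples and keeps a circumcircle when it contains no other sample (the Delaunay empty-circle property) and its center lies inside the boundary polygon. It runs in $O(n^4)$ time, whereas a practical implementation would use a dedicated Delaunay or Voronoi routine. The rectangle test shape and all function names are hypothetical. Each returned radius equals the distance from its center to the nearest boundary sample, as expected for a point of the sampled medial axis transform.

```python
import itertools
import math


def circumcircle(a, b, c):
    """Circumcenter and radius of triangle abc, or None if abc is (nearly) collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)


def inside_polygon(p, poly):
    """Even-odd ray-casting test; `poly` lists the boundary vertices in order."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def interior_medial_points(boundary, tol=1e-9):
    """(x, y, r) for empty circumcircles whose centers lie inside the shape."""
    found = set()
    for a, b, c in itertools.combinations(boundary, 3):
        circle = circumcircle(a, b, c)
        if circle is None:
            continue
        center, r = circle
        # Empty-circle (Delaunay) test: no boundary sample strictly inside.
        if all(math.dist(center, p) >= r - tol for p in boundary) \
                and inside_polygon(center, boundary):
            found.add((round(center[0], 9), round(center[1], 9), round(r, 9)))
    return sorted(found)


def rectangle_boundary(w, h):
    """Hypothetical test shape: integer samples around a w x h rectangle, counter-clockwise."""
    return ([(x, 0) for x in range(w)] + [(w, y) for y in range(h)]
            + [(x, h) for x in range(w, 0, -1)] + [(0, y) for y in range(h, 0, -1)])


medial = interior_medial_points(rectangle_boundary(8, 4))
print(len(medial), medial[:3])
```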
Fig. 1: Intermediate steps for extracting the medial axis from the whale image. The original image
can be seen in Figure 6. Images above are (a) the initial cluster containing the whale fluke resulting
from k-means clustering, (b) the segmented whale fluke after morphological processing, and (c) the
resulting boundary points. The medial axis with longest path resulting from the boundary displayed
here can be seen in Figure 2(e).
To extract the longest path within the axis, we apply Dijkstra's algorithm to find the point $P$ on the axis that is farthest from a randomly selected medial point, then repeat Dijkstra's algorithm to find the medial point $Q$ farthest from $P$. Retracing steps from $Q$ to $P$ generates the sequence of medial points along the longest path in the medial axis. (This double sweep is exact when the axis is a tree, as it is for a simply connected shape.) See Figure 2 for an illustration of this process on our three test images.
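A minimal Python sketch of this double sweep follows, assuming the medial axis is stored as a weighted adjacency dictionary with mutually comparable vertex labels; the helper names and the toy tree are hypothetical. On a tree with nonnegative edge lengths, the vertex farthest from any start vertex is an endpoint of a longest path, so the second sweep recovers the whole path.

```python
import heapq


def build_adjacency(edges):
    """Undirected weighted adjacency lists from (u, v, length) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, source):
    """Shortest-path distances and predecessor links from `source`."""
    dist, pred = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def longest_path(adj, start):
    """Double sweep: P = farthest from start, Q = farthest from P; path Q -> P."""
    dist, _ = dijkstra(adj, start)
    p = max(dist, key=dist.get)
    dist, pred = dijkstra(adj, p)
    q = max(dist, key=dist.get)
    path = [q]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path


# Hypothetical toy medial tree (vertex, vertex, edge length).
adj = build_adjacency([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (1, 4, 5.0), (2, 5, 0.5)])
print(longest_path(adj, 0))  # [3, 2, 1, 4], total length 7
```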
[Figure 2 panels: (a) Voronoi vertices of strawberry, (b) Voronoi vertices of whale fluke, (c) Voronoi vertices of dancer; (d) Medial skeletons of strawberry, (e) Medial skeletons of whale fluke, (f) Medial skeletons of dancer.]
Fig. 2: Top row: Voronoi vertices with longest path highlighted. Bottom row: Medial skeletons
with longest path highlighted. These longest paths are shown matched to one another in Figures 12
- 14.
3.2 Extracting Voronoi edges from “unknown” input images
[Figure 3 panels: Giraffe, Smoothed Giraffe, Torbreck, Smoothed Torbreck.]
Fig. 3: Examples of smoothed images.
Fig. 4: Results of edge detection on smoothed images of a giraffe and a bottle.
Given an input image, we smooth it as in [21]; see Figure 3. Let $f$ denote the noisy input image and $u$ the denoised (smooth) version. We obtain $u$ by minimizing the energy

$$E(u) = \int_\Omega (f-u)^2 \, dx + \lambda \int_\Omega |\nabla u| \, dx, \qquad (2)$$

where $\Omega$ denotes the image domain and $\lambda \in \mathbb{R}_{>0}$ is a weighting factor. The first term ensures that $u$ is similar to $f$, and the second (total variation) term forces $u$ to be smooth everywhere except at strong edges.
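To illustrate how the total variation term preserves edges, here is a one-dimensional sketch of energy (2) minimized by plain gradient descent. It is not the solver of [21]: $\sqrt{(\Delta u)^2 + \varepsilon^2}$ replaces $|\Delta u|$ so that the energy is differentiable, the step size comes from a bound on the gradient's Lipschitz constant, and the values of `lam`, `eps`, and `iters` are arbitrary assumptions. On a noisy step signal, the small oscillations are flattened while the jump survives.

```python
import math


def rof_energy(u, f, lam, eps):
    """Discrete 1D energy: sum (f_i - u_i)^2 + lam * sum sqrt((u_{i+1} - u_i)^2 + eps^2)."""
    data = sum((fi - ui) ** 2 for fi, ui in zip(f, u))
    tv = sum(math.sqrt((b - a) ** 2 + eps ** 2) for a, b in zip(u, u[1:]))
    return data + lam * tv


def rof_denoise_1d(f, lam=0.5, eps=0.1, iters=3000):
    """Gradient descent on the smoothed 1D energy, starting from u = f."""
    u = list(f)
    # Step 1 / L with L = 2 + 4 * lam / eps, a bound on the gradient's Lipschitz
    # constant, so no step increases the energy.
    step = 1.0 / (2.0 + 4.0 * lam / eps)
    for _ in range(iters):
        # w[i]: derivative of sqrt(d^2 + eps^2) at d = u[i+1] - u[i]
        w = [(b - a) / math.sqrt((b - a) ** 2 + eps ** 2) for a, b in zip(u, u[1:])]
        grad = []
        for i in range(len(u)):
            g = 2.0 * (u[i] - f[i])
            if i > 0:
                g += lam * w[i - 1]
            if i < len(w):
                g -= lam * w[i]
            grad.append(g)
        u = [ui - step * gi for ui, gi in zip(u, grad)]
    return u


# Hypothetical noisy step: a jump of height 1 plus alternating +-0.1 noise.
f = [(0.0 if i < 10 else 1.0) + (0.1 if i % 2 else -0.1) for i in range(20)]
u = rof_denoise_1d(f)
print(round(rof_energy(f, f, 0.5, 0.1), 3), round(rof_energy(u, f, 0.5, 0.1), 3))
```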
Next, we run the line segment detector (LSD) algorithm [11] on the smoothed version in order to extract prominent edges and thus a likely boundary of a shape. LSD locally detects straight contours in the image, giving subpixel results while controlling the number of false detections. Contours are naturally defined by the image gradient and the level lines of the image, which divide the transition region from dark to light or the opposite. The algorithm computes the unit vectors tangent to the level lines, and thus the level-line angle, at each pixel. The resulting vector field is then segmented into connected regions whose pixels share the same level-line angle up to a threshold. Each connected region is represented by a geometric object such as a rectangle, and the principal axis of this object defines the direction of the detected line segment.
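The first stages of this pipeline (level-line angles, then growing regions of pixels with similar angle, then a principal axis) can be sketched as follows. This is a simplified illustration, not the LSD implementation of [11]: it uses a $2 \times 2$ finite-difference gradient, a fixed angle tolerance `tau` of 22.5 degrees, an arbitrary gradient threshold `min_mag`, and an unweighted principal axis, and it omits LSD's pixel ordering, rectangle refinement, and control of false detections. All names are hypothetical.

```python
import math


def level_line_field(img):
    """Gradient magnitude and level-line angle per pixel from a 2 x 2 mask.

    img is a list of rows, img[y][x]; the last row and column are skipped.
    """
    height, width = len(img), len(img[0])
    mag, ang = {}, {}
    for y in range(height - 1):
        for x in range(width - 1):
            gx = (img[y][x + 1] + img[y + 1][x + 1] - img[y][x] - img[y + 1][x]) / 2.0
            gy = (img[y + 1][x] + img[y + 1][x + 1] - img[y][x] - img[y][x + 1]) / 2.0
            mag[(x, y)] = math.hypot(gx, gy)
            ang[(x, y)] = math.atan2(gx, -gy)  # gradient rotated by 90 degrees
    return mag, ang


def angle_diff(a, b):
    """Absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def grow_region(seed, mag, ang, tau=math.pi / 8, min_mag=1e-6):
    """8-connected pixels whose level-line angle is within tau of the region angle."""
    region, used = [seed], {seed}
    sx, sy = math.cos(ang[seed]), math.sin(ang[seed])
    theta, i = ang[seed], 0
    while i < len(region):
        x, y = region[i]
        i += 1
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                q = (x + dx, y + dy)
                if (q in ang and q not in used and mag[q] > min_mag
                        and angle_diff(ang[q], theta) < tau):
                    region.append(q)
                    used.add(q)
                    sx += math.cos(ang[q])
                    sy += math.sin(ang[q])
                    theta = math.atan2(sy, sx)  # running mean region angle
    return region, theta


def principal_direction(region):
    """Orientation of the principal axis of the region's pixel coordinates."""
    n = len(region)
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n
    sxx = sum((x - cx) ** 2 for x, _ in region)
    syy = sum((y - cy) ** 2 for _, y in region)
    sxy = sum((x - cx) * (y - cy) for x, y in region)
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Hypothetical test image: dark left half, bright right half (a vertical edge).
img = [[0.0] * 5 + [1.0] * 5 for _ in range(10)]
mag, ang = level_line_field(img)
region, theta = grow_region((4, 0), mag, ang)
print(len(region), math.degrees(theta), math.degrees(principal_direction(region)))
```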
Fig. 5: Removing “outlier” medial points for giraffe image. (a) Original image with all medial
points, followed by the resulting images after (b) dilation, (c) erosion, and (d) point deletion. (e)
Final image with remaining medial points.
The output is a set of edges with noise, as in Figure 4, which we process into a Voronoi diagram to extract potential medial points. In doing so, we remove "outlier" medial points (including points in the region external to the shape) by a dilation and erosion process, as depicted in Figure 5. That is, we first thicken the medial points to form connected point clusters and subsequently erode them (while still maintaining connected structures). We then identify and delete all point clusters in the processed image with sufficiently small area, and/or points that lie more than a certain small distance away from the largest connected structures in the image. Finally, we compare the processed image with the original input image and delete all medial points in the input image that correspond to deleted points in the processed image, yielding the desired image without outliers.
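The outlier-removal step can be approximated on a set of integer pixel coordinates with standard binary morphology. The sketch below is a hypothetical simplification of the procedure described above. It applies a morphological closing (dilations followed by the same number of erosions, so every original point stays covered), then keeps only the medial points that fall in connected components of at least `min_area` pixels. The distance-to-largest-structure criterion is omitted, and `rounds` and `min_area` are arbitrary assumptions.

```python
from collections import deque

# 3 x 3 structuring element (the pixel itself and its 8 neighbours).
NEIGHBOURHOOD = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def dilate(pixels):
    """Binary dilation of a set of (x, y) pixels by a 3 x 3 square."""
    return {(x + dx, y + dy) for x, y in pixels for dx, dy in NEIGHBOURHOOD}


def erode(pixels):
    """Binary erosion of a set of (x, y) pixels by a 3 x 3 square."""
    return {(x, y) for x, y in pixels
            if all((x + dx, y + dy) in pixels for dx, dy in NEIGHBOURHOOD)}


def connected_components(pixels):
    """8-connected components of a pixel set, found by breadth-first search."""
    seen, comps = set(), []
    for start in pixels:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            x, y = queue.popleft()
            comp.append((x, y))
            for dx, dy in NEIGHBOURHOOD:
                q = (x + dx, y + dy)
                if q in pixels and q not in seen:
                    seen.add(q)
                    queue.append(q)
        comps.append(comp)
    return comps


def remove_outliers(medial_points, rounds=2, min_area=20):
    """Keep medial points lying in large components of the closed point set."""
    blob = set(medial_points)
    for _ in range(rounds):
        blob = dilate(blob)
    for _ in range(rounds):
        blob = erode(blob)  # closing: still contains every original point
    keep = set()
    for comp in connected_components(blob):
        if len(comp) >= min_area:
            keep.update(comp)
    return [p for p in medial_points if p in keep]


# Hypothetical input: two nearby fragments (joined by the closing) and one stray point.
frag1 = [(x, 0) for x in range(10)]
frag2 = [(x, 0) for x in range(11, 21)]
kept = remove_outliers(frag1 + frag2 + [(50, 50)])
print(len(kept))  # 20: the stray point is removed
```

Each fragment alone has fewer than `min_area` pixels; the closing bridges the one-pixel gap, so both survive as a single component.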
Next, our objective is to match the single longest path from each of our initial
image instances into the graph which, in each case, approximates the medial axis
of the shape that is present in each input image. We pursue this problem in two
different ways as outlined in the following two subsections.
3.3 Matching via weak Fréchet distance
Our first method of matching is based on map-matching via the weak Fréchet distance. The map-matching problem is to find, for a geometric graph $G$ and a curve $\gamma$, a path in the graph that minimizes the weak Fréchet distance to $\gamma$. For a polygonal curve $\gamma$ with $n$ vertices and a graph $G$ with a total of $m$ edges and vertices, the map-matching problem can be solved in $O(mn\log(mn))$ time [27]. This algorithm constructs a "free space graph", which is essentially a combinatorial representation of the product space of (parameterizations of) the curve and the graph. Each vertex-edge pair is assigned a weight equal to their Euclidean distance, and a shortest-path algorithm in this free space graph (where the length of a path is the maximum of its weights) computes a path with minimum weak Fréchet distance. See [27] for more details.
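The following Python sketch conveys the free-space idea in a simplified, vertex-only form. States are pairs (curve vertex, graph vertex) weighted by their Euclidean distance; moves advance or retreat along the curve and/or along a graph edge; and a bottleneck Dijkstra search minimizes the largest weight on the way from the first to the last curve vertex. It is not the algorithm of [27], which works with the continuous free space and achieves the stated running time; here, points in the interiors of edges are ignored. Function and argument names are hypothetical, and vertex labels are assumed to be mutually comparable.

```python
import heapq
import math


def discrete_weak_map_match(curve, coords, adj):
    """Vertex-only approximation of weak Fréchet map-matching.

    curve: list of (x, y); coords: {vertex: (x, y)}; adj: {vertex: [neighbours]}.
    Returns (bottleneck distance, sequence of matched graph vertices).
    """
    n = len(curve)

    def cost(i, v):
        return math.dist(curve[i], coords[v])

    best, pred, heap = {}, {}, []
    for v in coords:  # the first curve vertex may start at any graph vertex
        best[(0, v)], pred[(0, v)] = cost(0, v), None
        heapq.heappush(heap, (cost(0, v), 0, v))
    while heap:
        c, i, v = heapq.heappop(heap)
        if c > best[(i, v)]:
            continue  # stale heap entry
        if i == n - 1:  # first time the last curve vertex is reached: optimal
            path, node = [], (i, v)
            while node is not None:
                path.append(node[1])
                node = pred[node]
            path.reverse()
            return c, [u for k, u in enumerate(path) if k == 0 or u != path[k - 1]]
        # Weak matching: step backwards or forwards along the curve and/or the graph.
        for j in (i - 1, i, i + 1):
            if not 0 <= j < n:
                continue
            for u in [v] + list(adj[v]):
                if (j, u) == (i, v):
                    continue
                nc = max(c, cost(j, u))
                if nc < best.get((j, u), math.inf):
                    best[(j, u)], pred[(j, u)] = nc, (i, v)
                    heapq.heappush(heap, (nc, j, u))
    return math.inf, []


# Hypothetical toy graph and curve.
coords = {"a": (0, 0), "b": (1, 0), "c": (2, 0), "d": (1, 1), "e": (2, 5)}
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "e"], "d": ["b"], "e": ["c"]}
dist, matched = discrete_weak_map_match([(0, 0.1), (1, 0.1), (2, 0.1)], coords, adj)
print(dist, matched)  # 0.1 ['a', 'b', 'c']
```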
In our setting, we consider the set $\mathcal{T}$ of translations, rotations, and scalings, and the map-matching problem we address is: given a geometric graph $G$ and a curve $\gamma$, find a transformation $T \in \mathcal{T}$ and a path in $G$ that together minimize the weak Fréchet distance to $T(\gamma)$. We sample $\mathcal{T}$ by applying a fairly exhaustive set of scalings, translations, and rotations to the curve, and for each such transformation we run the map-matching algorithm of Wenk et al. [27]. In particular, we sample the transformation space as follows. We consider rotations by 0, 90, 180, and 270 degrees. We hold the aspect ratio constant and apply a single scaling factor; the maximum scaling factor is chosen so that the width of the (possibly rotated) path equals the width of the graph, the minimum scaling factor is half the maximum, and this range is sampled in steps of 0.2. The two-dimensional translation space consists of all translations for which the bounding box of the (possibly scaled and rotated) path fits entirely inside the bounding box of the graph; it is sampled in steps of 10 pixels. As described in Section 4, each bounding box is several hundred pixels wide and several hundred pixels high. The resulting ranges of scales were 1 to 2.4 for the strawberry, 0.38 to 0.78 for the whale fluke, and 0.6 to 1.2 for the dancer. We note that this method is computationally intensive, since each example requires many map-matching runs over different orientations, sizes, and positions of the path.
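The sampling scheme above can be written as a generator. This is a sketch under stated assumptions: rotations are about the origin, bounding boxes are axis-aligned, scales and translations are enumerated with a small floating-point tolerance, and all names are hypothetical. For each yielded transformation, one would then run the map-matching step.

```python
def rotate_quarter_turns(points, q):
    """Rotate points about the origin by q * 90 degrees (exact for quarter turns)."""
    out = []
    for x, y in points:
        for _ in range(q % 4):
            x, y = -y, x
        out.append((x, y))
    return out


def frange(start, stop, step, eps=1e-9):
    """start, start + step, ... up to stop inclusive (with tolerance eps)."""
    k = 0
    while start + k * step <= stop + eps:
        yield start + k * step
        k += 1


def bbox(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def sample_transformations(path, graph_bbox, scale_step=0.2, trans_step=10.0):
    """Yield (degrees, scale, tx, ty, transformed_path) as described in the text."""
    gx0, gy0, gx1, gy1 = graph_bbox
    for q in range(4):
        rotated = rotate_quarter_turns(path, q)
        px0, _, px1, _ = bbox(rotated)
        if px1 == px0:
            continue  # zero-width path: the width-based maximum scale is undefined
        s_max = (gx1 - gx0) / (px1 - px0)  # path width equals graph width
        for s in frange(s_max / 2.0, s_max, scale_step):
            scaled = [(s * x, s * y) for x, y in rotated]
            bx0, by0, bx1, by1 = bbox(scaled)
            # Translations keeping the path's bounding box inside the graph's.
            for tx in frange(gx0 - bx0, gx1 - bx1, trans_step):
                for ty in frange(gy0 - by0, gy1 - by1, trans_step):
                    yield 90 * q, s, tx, ty, [(x + tx, y + ty) for x, y in scaled]


# Hypothetical path and graph bounding box (x0, y0, x1, y1).
ts = list(sample_transformations([(0, 0), (10, 0), (10, 5)], (0, 0, 100, 50)))
print(len(ts), sorted({deg for deg, *_ in ts}))
```

In this toy example, the 90 and 270 degree rotations yield no transformations: at the minimum scale the rotated path is already taller than the graph.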
3.4 Matching via an $H^{1/2}$-type metric
Our second method of matching addresses a difficulty with the Fréchet-based algorithm described in Section 3.3: the input medial axis graph can be quite noisy and messy, depending on how well our edge detection and smoothing algorithms isolate prominent shapes. Additionally, the second method applies a metric that is invariant under translation, rotation, and scaling.
We first simplify the Voronoi graph to a tree to avoid cycles when computing the longest path in the graph. We choose the minimum spanning tree because it appears to capture the prominent shape features quite well, though other ways of simplifying the input graph may be worth investigating. Note that in converting the graph to a tree we may lose segments of the longest path. Suppose $\gamma_1$ is a discrete representation of the longest path in the medial axis of a known object, and we are given the Voronoi edges from an unknown image. Our method is as follows, with curve matching running in $O(MN\log M)$ time:
1. Compute the minimum spanning tree of the Voronoi edges.
2. Extract $\gamma_2$, the longest path in the Voronoi tree.
3. Resample $\gamma_1$ and $\gamma_2$ to have $N = M = 128$ equally spaced points.
4. Normalize scale so that each curve has an average inter-point distance of one.
5. Compute second differences as described in Equation (1).
6. Extract the second differences corresponding to every fourth point on $\gamma_1$ to allow for flexible point matching (otherwise the points are matched one-to-one in order), following the procedure outlined in Section 2.4.
7. Apply dynamic programming to find the matching of points of $\gamma_1$ to $\gamma_2$ that minimizes the distance $d^2(\gamma_1,\gamma_2)$ between the curves.
8. Sum the scaled second differences corresponding to the optimal matching to obtain an approximation to $d^2(\gamma_1,\gamma_2)$.
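Steps 1 to 4 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation. Edge weights are taken to be Euclidean lengths; Kruskal's algorithm with union-find builds the minimum spanning tree; a double sweep extracts the longest path (exact on a tree, restricted to the component containing the start vertex); and the path is resampled by arclength and rescaled. The toy graph and all names are hypothetical.

```python
import math


def minimum_spanning_tree(coords, edges):
    """Kruskal's algorithm with union-find; edge weights are Euclidean lengths.

    Returns {vertex: [(neighbour, length), ...]}; a spanning forest if disconnected.
    """
    parent = {v: v for v in coords}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def length(e):
        return math.dist(coords[e[0]], coords[e[1]])

    tree = {v: [] for v in coords}
    for u, v in sorted(edges, key=length):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            w = length((u, v))
            tree[u].append((v, w))
            tree[v].append((u, w))
    return tree


def farthest(tree, source):
    """Farthest vertex from `source` in a tree, plus predecessor links (iterative DFS)."""
    dist, pred, stack = {source: 0.0}, {source: None}, [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u]:
            if v not in dist:
                dist[v], pred[v] = dist[u] + w, u
                stack.append(v)
    return max(dist, key=dist.get), pred


def tree_longest_path(tree, start):
    """Double sweep: longest path within the component containing `start`."""
    p, _ = farthest(tree, start)
    q, pred = farthest(tree, p)
    path = [q]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return path


def resample(points, n):
    """n points equally spaced by arclength along a polyline."""
    cum = [0.0]
    for a, b in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    out, j = [], 0
    for i in range(n):
        t = cum[-1] * i / (n - 1)
        while j < len(cum) - 2 and cum[j + 1] < t:
            j += 1
        seg = cum[j + 1] - cum[j]
        f = 0.0 if seg == 0 else (t - cum[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out


def normalize_scale(points):
    """Rescale so that the average inter-point distance is one."""
    avg = sum(math.dist(a, b) for a, b in zip(points, points[1:])) / (len(points) - 1)
    return [(x / avg, y / avg) for x, y in points]


# Hypothetical toy "Voronoi graph": a unit square with a diagonal and a tail.
coords = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1), "e": (3.5, 0)}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c"), ("b", "e")]
tree = minimum_spanning_tree(coords, edges)
gamma2 = tree_longest_path(tree, "a")
curve = normalize_scale(resample([coords[v] for v in gamma2], 128))
print(gamma2, len(curve))  # ['d', 'c', 'b', 'e'] 128
```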
4 Results
The three images we use are of a strawberry, a whale fluke, and a dancer. Here we match the medial axis extracted as described in Section 3.1 to the Voronoi diagram of the same image extracted as described in Section 3.2. The bounding boxes of the Voronoi diagrams measure 479 × 367 pixels for the strawberry, 618 × 418 pixels for the whale fluke, and 540 × 239 pixels for the dancer. Results from the two matching methods are comparable, and both seem promising.
Fig. 6: Input images: a strawberry, a whale fluke, and a dancer.
4.1 Weak Fréchet map-matching distance results
For the dancer and the whale fluke, the transformation minimizing the weak Fréchet distance over all sampled transformations was found correctly; see Figures 8 and 9. The point matching computed by the weak Fréchet distance also appears to be of good quality. The distance for the minimizing transformation computed for the dancer is so small (2.5 pixels) that the transformed dancer path and the matched path in the graph almost coincide. For the whale fluke, the minimum computed distance (11.2 pixels) is very close to the third smallest distance (11.6 pixels), which is attained by a transformation that additionally rotates the whale path by 180 degrees. For the strawberry, the path is found at multiple small scales and at multiple positions in the graph with small distances (ranging from 8.1 to about 13); see Figure 10. Out of the 4,368 sampled transformations per sampled scale, 9.2% of the transformations at scale 1.0 have a distance less than 15. At scale 1.2 this fraction drops to 5.5%, and at scale 2.0 to only 0.07%. We believe that this is an artifact caused by the almost grid-like dense edge pattern in the strawberry graph in combination with the very straight shape of the strawberry path.

Fig. 7: Results of edge detection: (a) strawberry, (b) whale fluke, and (c) dancer.

Fig. 8: Matching the dancer path into the dancer graph. The graph edges are shown in light gray, and the path is shown in green. The algorithm finds the correct transformation with minimum Fréchet distance 2.5 pixels (at scale 1.0 with no rotation). The transformed path is shown in black, and the corresponding path in the graph in blue.
We also compared the dancer path to the strawberry graph, the whale fluke graph, and the dancer graph, computing in each case the minimum weak Fréchet distance over all sampled transformations. The computed minimum distances were 8.9 pixels for the strawberry graph, 9.7 pixels for the whale fluke graph, and 2.5 pixels for the dancer graph. The dancer path therefore correctly identifies the dancer graph as its best match; see Figures 8 and 11.

Fig. 9: Matching the whale path into the whale graph. The graph edges are shown in light gray, and the path is shown in green. The algorithm finds the correct transformation with minimum Fréchet distance 11.2 pixels (at scale 0.6 with 90 degrees rotation). The transformed path is shown in black, and the corresponding path in the graph in blue. The red lines show the optimal point matching.

Fig. 10: Matching the strawberry path into the strawberry graph. The algorithm finds too many occurrences of the path at a small scale. The graph edges are shown in light gray, and the path is shown in green. Results are shown for the minimum distance (8.1) at scale 1.0 (with a rotation of 180 degrees), the minimum distance (8.9) at scale 1.2 (with a rotation of 270 degrees), and the minimum distance (13.4) at scale 2.0 (with a rotation of 180 degrees). The transformed path is shown in black, and the corresponding path in the graph in blue.
Fig. 11: Matching the dancer path into the strawberry graph (distance 8.9) and into the whale fluke
graph (distance 8.7). Both distances are larger than the distance into the dancer graph (2.5), see
Figure 8.
4.2 $H^{1/2}$ metric results
Matching the medial longest path to the Voronoi tree longest path gives the correct classification for the two instances (whale fluke and dancer) in which the longest Voronoi path contains the desired medial points. For the strawberry image, the longest path in the Voronoi tree contains no edges belonging to the medial axis of the strawberry, and the classification fails. In addition, the optimal point matching between the curves performs reasonably well; see Figures 12–14. Note that the curves in these figures are displayed at a different scale from the input images, because we normalize the average inter-point distance to one to obtain scale invariance.
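As a small consistency check of the classification rule (choose the library path with the smallest distance), the snippet below applies it to the distances reported in Figures 12–14. The dictionary layout and the function name are illustrative; the numbers are copied from the figure captions.

```python
# H^{1/2}-type distances from Figures 12-14: outer key = unknown image whose
# Voronoi tree longest path is matched into, inner key = library medial path.
distances = {
    "whale": {"whale": 1.551, "dancer": 1.749, "strawberry": 2.687},
    "dancer": {"whale": 1.424, "dancer": 1.314, "strawberry": 2.602},
    "strawberry": {"whale": 1.473, "dancer": 1.474, "strawberry": 2.422},
}


def classify(row):
    """Nearest-neighbour rule: the library path with the smallest distance."""
    return min(row, key=row.get)


predictions = {image: classify(row) for image, row in distances.items()}
print(predictions)  # {'whale': 'whale', 'dancer': 'dancer', 'strawberry': 'whale'}
```

The strawberry misclassification is decided by a margin of only 0.001 between the whale and dancer paths, while the true strawberry path is far behind.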
[Figure 12 panels: (a) Distance = 1.551, (b) Distance = 1.749, (c) Distance = 2.687.]
Fig. 12: Matching medial longest path from (a) whale (distance = 1.551), (b) dancer (distance =
1.749), (c) berry (distance = 2.687) into the Voronoi tree longest path of the whale fluke. Lines
show the optimal point matching. The minimum distance into the messy graph correctly classifies
the unknown image as a whale.
[Figure 13 panels: (a) Distance = 1.424, (b) Distance = 1.314, (c) Distance = 2.602.]
Fig. 13: Matching medial longest path from (a) whale (distance = 1.424), (b) dancer (distance =
1.314), (c) berry (distance = 2.602) into the Voronoi tree longest path of the dancer. Lines show
the optimal point matching. The minimum distance into the messy graph correctly classifies the
unknown image as a dancer.
[Figure 14 panels: (a) Distance = 1.473, (b) Distance = 1.474, (c) Distance = 2.422.]
Fig. 14: Matching medial longest path from (a) whale (distance = 1.473), (b) dancer (distance =
1.474), (c) berry (distance = 2.422) into the Voronoi tree longest path of the strawberry. Lines show
the optimal point matching. The minimum distance into the messy graph incorrectly classifies the
unknown image as a whale. Note that the longest path in the Voronoi tree for the strawberry image
does not contain any edges from the medial axis of the strawberry itself.
5 Discussion and Future Directions
Our matching techniques show enough promise to merit further investigation. It remains open how well the algorithms perform when the Voronoi diagram is dense or grid-like, or when the longest path in the medial axis does not trace a prominent shape feature (for example, when the shape in the input image is nearly round with radial symmetry).
5.1 Analysis of the weak Fréchet map-matching distance
Minimizing the weak Fréchet map-matching distance by sampling the transformation space works well for the dancer and the whale fluke. The strawberry graph exhibits a dense, grid-like edge pattern, which causes the strawberry path to be found at many locations in the graph, particularly at small scales. This behavior is most extreme for the strawberry, but it is also present in the whale fluke data, where the path with the second-smallest distance lies at a different location with an additional 180-degree rotation. We believe that this "small scale" problem could be overcome by analyzing, for a fixed scale, the distribution of distances over varying translations and rotations, in order to identify transformations whose distances are significant. We will investigate this direction in future research.
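The fixed-scale analysis proposed above might be sketched as follows, under loudly stated assumptions: a plain directed Hausdorff distance between point sets stands in for the weak Fréchet map-matching distance of [1] (which is considerably more involved), and the "significance" of the best transformation is measured, hypothetically, as how many standard deviations its distance lies below the mean over all sampled rotations and translations at that scale. All names and the toy data are ours.

```python
import math
import statistics


def transform(points, scale, angle, shift):
    """Scale, rotate by `angle` radians about the origin, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point of `b`.

    Stand-in (assumption) for the weak Frechet map-matching distance: it
    only asks that every query point lie near the target, ignoring order.
    """
    return max(min(math.dist(p, q) for q in b) for p in a)


def scan_fixed_scale(query, target, scale, angles, shifts):
    """Distance for every sampled (rotation, translation) at one fixed scale."""
    return [(directed_hausdorff(transform(query, scale, a, t), target), a, t)
            for a in angles for t in shifts]


def separation(results):
    """How many standard deviations the best distance lies below the mean.

    Large values: the best transformation stands out from the others.
    Values near zero: the path fits about equally well almost everywhere,
    as in a dense, grid-like graph.
    """
    ds = [d for d, _, _ in results]
    spread = statistics.pstdev(ds)
    return 0.0 if spread == 0 else (statistics.mean(ds) - min(ds)) / spread


# Toy example: an L-shaped query hidden in the target at offset (3, 1).
query = [(0, 0), (1, 0), (2, 0), (2, 1)]
target = [(3, 1), (4, 1), (5, 1), (5, 2), (0, 4), (1, 4)]
angles = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
shifts = [(x, y) for x in range(5) for y in range(5)]
results = scan_fixed_scale(query, target, 1.0, angles, shifts)
best = min(results, key=lambda r: r[0])
```

In a real run one would repeat the scan for each sampled scale and prefer scales whose best match stands out, rather than the scale with the smallest raw distance.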
For the dancer path, the distance into the dancer graph was much smaller than into the strawberry graph and the whale fluke graph. The minimum weak Fréchet distance into the messy graph therefore correctly classifies the unknown image as a dancer.
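For intuition about what "weak" means, here is a sketch of a discrete weak Fréchet distance between two vertex sequences. This is a simplified, hypothetical illustration: it compares two polylines rather than matching a path into a graph as in the map-matching setting of [1], and it evaluates distances at vertices only. Backtracking along either curve is allowed, and the cost is computed as a bottleneck shortest path.

```python
import heapq
import math


def discrete_weak_frechet(p, q):
    """Discrete weak Frechet distance between vertex sequences p and q.

    A walk moves the index pair (i, j) from (0, 0) to (len(p)-1, len(q)-1);
    at each step each index changes by -1, 0 or +1 (backtracking allowed,
    which is what makes the distance 'weak'). The cost of a walk is the
    largest |p[i] - q[j]| it visits; the result is the cheapest walk's cost,
    found with a Dijkstra-style bottleneck search.
    """
    if not p or not q:
        raise ValueError("both paths need at least one vertex")
    n, m = len(p), len(q)

    def cost(i, j):
        return math.dist(p[i], q[j])

    best = {(0, 0): cost(0, 0)}
    heap = [(best[(0, 0)], 0, 0)]
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (n - 1, m - 1):
            return d
        if d > best[(i, j)]:
            continue  # stale heap entry, a cheaper route was already found
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= a < n and 0 <= b < m:
                    nd = max(d, cost(a, b))
                    if nd < best.get((a, b), math.inf):
                        best[(a, b)] = nd
                        heapq.heappush(heap, (nd, a, b))
    raise RuntimeError("unreachable: the index grid is connected")
```

Classification then proceeds as in the text: compute the distance of the query path into each candidate and pick the smallest.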
5.2 Analysis of the $H^{1/2}$-type metric
Matching longest paths using the $H^{1/2}$-type metric performs well in the two cases (whale and dancer) where the longest path in the Voronoi tree contains edges corresponding to the medial axis of the object of interest. Not surprisingly, it fails for the third image, the strawberry, where no medial edges appear in the longest path of the Voronoi tree. The strawberry image is particularly challenging: the berry itself contributes very few edges to the complicated edge map seen in Figure 7(a), and it contains several spurious edges in its interior. This illustrates the need for an additional evaluation of the relative importance of Voronoi vertices, perhaps by classifying vertices as belonging to the foreground, the background, or noise.
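The decision rule behind Figures 12–14 is the minimum distance over the shape library. The sketch below reruns that rule on the distances reported in those captions (the numbers are copied from the captions; the function names are ours). The `margin` helper is a hypothetical addition that shows how narrowly each decision is made.

```python
def classify(distances):
    """Return the library shape whose medial longest path is closest."""
    return min(distances, key=distances.get)


def margin(distances):
    """Gap between the best and second-best distance (hypothetical helper)."""
    first, second = sorted(distances.values())[:2]
    return second - first


# Distances reported in Figs. 12-14: medial longest path of each library
# shape matched into the Voronoi tree longest path of each input image.
REPORTED = {
    "whale": {"whale": 1.551, "dancer": 1.749, "berry": 2.687},
    "dancer": {"whale": 1.424, "dancer": 1.314, "berry": 2.602},
    "strawberry": {"whale": 1.473, "dancer": 1.474, "berry": 2.422},
}

for image, dists in REPORTED.items():
    print(image, "->", classify(dists), f"(margin {margin(dists):.3f})")
```

The whale and dancer images are classified correctly with margins of about 0.2 and 0.1, while the strawberry image is misclassified as a whale by a margin of only 0.001 over the dancer.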
In addition, the optimal matching of points between the two longest paths currently seems to favor matches that spread the medial path over the full length of the Voronoi path. For example, in Figure 12 the whale's medial axis begins at about the halfway point of the Voronoi path, whereas the optimal matching begins at the left end of the path. Because the optimal matching can skip enough points to avoid badly mismatched segments, a longer match will likely often have lower cost for medial curve matching. At the same time, two curves that are identical up to a point can correspond in a match that is too short. Figure 15 illustrates this issue. At larger scales, the differences in the β angles (and in their second-difference approximations) grow as the points on the semicircles approach the points on the line. Hence the lowest-cost match avoids points toward the end of the semicircle that is attached to the line. Penalizing skips that are longer than an average skip, or adding the difference in radius function values to the cost of matching two points, may improve the medial point correspondence.
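One way the two proposed remedies could be combined is an order-preserving dynamic program, sketched below. This is a hypothetical formulation, not the authors' matching: each sample point carries a (β, radius) feature pair, the local cost is a weighted squared difference of both features, and any jump along the Voronoi path longer than the average step `(len(b) - 1) / (len(a) - 1)` pays a quadratic penalty. The weights `w_radius` and `w_skip` are illustrative.

```python
import math


def match_with_skip_penalty(a, b, w_radius=1.0, w_skip=1.0):
    """Order-preserving matching of every point of path a to a point of path b.

    a, b: lists of (beta, radius) features per sample point (hypothetical).
    Consecutive points of a map to strictly increasing indices of b; jumps
    longer than the average step are penalized quadratically.
    Returns (total_cost, matched_indices_in_b).
    """
    n, m = len(a), len(b)
    if n == 0 or n > m:
        raise ValueError("need 1 <= len(a) <= len(b)")
    avg_step = (m - 1) / max(n - 1, 1)

    def local(i, j):
        (ba, ra), (bb, rb) = a[i], b[j]
        return (ba - bb) ** 2 + w_radius * (ra - rb) ** 2

    def skip(step):
        return w_skip * max(0.0, step - avg_step) ** 2

    cost = [[math.inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = local(0, j)  # the match may start anywhere along b
    for i in range(1, n):
        for j in range(i, m):
            for k in range(i - 1, j):  # previous match strictly before j
                c = cost[i - 1][k] + skip(j - k)
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
            cost[i][j] += local(i, j)
    j = min(range(m), key=lambda jj: cost[n - 1][jj])
    total, path = cost[n - 1][j], [j]
    for i in range(n - 1, 0, -1):  # follow back-pointers to recover indices
        j = back[i][j]
        path.append(j)
    return total, path[::-1]


# The medial path appears unchanged in the middle of the Voronoi path.
voronoi = [(5, 0), (5, 0), (0.1, 1), (0.2, 1), (0.3, 1), (5, 0)]
medial = [(0.1, 1), (0.2, 1), (0.3, 1)]
total, indices = match_with_skip_penalty(medial, voronoi)
```

The radius term discourages matching medial points to Voronoi points at very different distances from the edges, and the skip penalty discourages stretching a short medial path across a long Voronoi path.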
5.3 Future work
Our initial proof of concept for this approach is promising. Based on our results and
prior work in this area [3], we speculate that this approach will also work well to
capture the same shape in a different pose (such as a dancer in different positions).
Future work will consider a larger library of shapes as well as input images in different poses. There is also potential to use the radius function alongside the longest path to improve recognition, but it is not yet clear whether the Voronoi graphs from input images will be too noisy to compute this information reliably.
[Figure: Distance = 1.534]
Fig. 15: Matching a semicircle into a curve composed of the union of a semicircle and a line. Instead of matching semicircle to semicircle, the semicircle is matched to points away from the line.
Both approaches perform better when the Voronoi graph is simpler. We are currently exploring methods for evaluating the saliency of either a particular Voronoi vertex or, equivalently, the edge pair associated to a Voronoi vertex. In addition, both approaches would benefit from using the radius function on the medial and Voronoi points, which gives the distance to the corresponding edge points. We anticipate substantial improvement from combining these modifications.
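As one possible saliency cue for a Voronoi vertex and its associated edge pair, the sketch below computes the vertex's radius (distance to its nearest edge point) and the angle the two nearest edge points subtend at the vertex. Using this angle as a saliency measure is our assumption, borrowed from common medial-axis pruning heuristics, and is not the authors' method. Brute-force nearest-point search keeps the example dependency-free.

```python
import math


def radius_and_angle(vertex, edge_points):
    """Radius and subtended angle of a Voronoi vertex (hypothetical cue).

    radius: distance from the vertex to its nearest edge point, i.e. the
    value of the radius function there.
    angle: angle at the vertex between its two nearest edge points, in
    [0, pi]. Vertices far from two adjacent points on the same edge get small
    angles; vertices between opposite sides of a shape get angles near pi.
    Assumes at least two edge points, none coinciding with the vertex.
    """
    nearest = sorted(edge_points, key=lambda p: math.dist(vertex, p))[:2]
    (ax, ay), (bx, by) = [(p[0] - vertex[0], p[1] - vertex[1]) for p in nearest]
    r = math.hypot(ax, ay)
    cos_t = (ax * bx + ay * by) / (r * math.hypot(bx, by))
    return r, math.acos(max(-1.0, min(1.0, cos_t)))  # clamp rounding error


# A vertex midway between opposite edges vs. one far above a short edge.
r1, ang1 = radius_and_angle((0, 1), [(0, 0), (0, 2), (5, 5)])
r2, ang2 = radius_and_angle((0.5, 5), [(0, 0), (1, 0)])
```

A simple filter could then discard Voronoi vertices whose angle falls below a threshold before the longest path is extracted; the threshold would have to be tuned per data set.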
We also hope to reduce the cost of learning additional shape classes once a sufficient number of classes has been learned. Learning visual models for classifying test objects typically requires a significant number of training samples. In one-shot learning [9], information from previously learned categories is used to train new categories through a Bayesian prior and maximum a posteriori (MAP) estimation. This model could be used to optimize and extend learning for the current methods.
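To make the idea of a prior transferred from earlier classes concrete, here is a one-dimensional caricature, not the model of [9]: a single scalar shape feature per class, a Gaussian prior on the new class's mean built from previously learned classes, and Gaussian observation noise. Under these assumptions the posterior is Gaussian, so the MAP estimate equals the posterior mean. The function and parameter names are hypothetical.

```python
def map_mean(samples, prior_mean, prior_var, noise_var):
    """MAP estimate of a new class's mean feature under a Gaussian prior.

    prior_mean, prior_var: prior on the class mean, assumed to summarize
    previously learned classes. samples: the few training examples of the
    new class, each observed with Gaussian noise of variance noise_var.
    Precisions add, and the estimate is a precision-weighted average of
    the prior mean and the samples.
    """
    n = len(samples)
    precision = 1.0 / prior_var + n / noise_var
    return (prior_mean / prior_var + sum(samples) / noise_var) / precision
```

With a single example `x = 2.0`, prior mean `0.0`, and unit variances, the estimate is `1.0`, halfway between prior and data; with no examples it falls back to the prior mean, and as more examples arrive it approaches their average.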
Acknowledgments
The authors would like to thank the Institute for Pure and Applied Mathematics, the Association for Women in Mathematics, Microsoft Research, the National Science Foundation, and the National Geospatial-Intelligence Agency for support, financial and otherwise, of this collaboration. Kathryn Leonard thanks Matt Feiszli for providing the initial Matlab code for the $H^{1/2}$ metric for closed curves, which was modified for this project.
References
1. Helmut Alt, Alon Efrat, Günter Rote, and Carola Wenk. Matching planar maps. J. Algorithms, 49(2):262–283, November 2003.
2. Cagri Aslan and Sibel Tari. An axis-based representation for recognition. In ICCV, pages 1339–1346. IEEE Computer Society, 2005.
3. X. Bai, X. Yang, D. Yu, and L. J. Latecki. Skeleton-based shape classification using path similarity. International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI), 22(4):733–746, 2008.
4. Serge Belongie, Jitendra Malik, and Jan Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509–522, 2002.
5. Serge Belongie, Greg Mori, and Jitendra Malik. Matching with shape contexts. In Statistics and Analysis of Shapes, pages 81–105. Springer, 2006.
6. H. Blum. A transformation for extracting new descriptors of shape. Models for the Perception of Speech and Visual Form, pages 362–380, 1967.
7. Sotiris Brakatsoulas, Dieter Pfoser, Randall Salas, and Carola Wenk. On map-matching vehicle tracking data. In Proceedings of the 31st International Conference on Very Large Data Bases, VLDB ’05, pages 853–864. VLDB Endowment, 2005.
8. Daniel Chen, Anne Driemel, Leonidas J. Guibas, Andy Nguyen, and Carola Wenk. Approximate map matching with respect to the Fréchet distance. In Matthias Müller-Hannemann and Renato Fonseca F. Werneck, editors, ALENEX, pages 75–83. SIAM, 2011.
9. Li Fei-Fei, Robert Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
10. Steven Gold and Anand Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4):377–388, 1996.
11. Rafael Grompone von Gioi, Jérémie Jakubowicz, Jean-Michel Morel, and Gregory Randall. LSD: a Line Segment Detector. Image Processing On Line, 2012.
12. Joachim Gudmundsson and Michiel Smid. Fréchet queries in geometric trees. In Hans L. Bodlaender and Giuseppe F. Italiano, editors, Algorithms – ESA 2013, volume 8125 of Lecture Notes in Computer Science, pages 565–576. Springer Berlin Heidelberg, 2013.
13. Daniel P. Huttenlocher, Gregory A. Klanderman, and William J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850–863, 1993.
14. S. Kushnarev. Teichons: Solitonlike geodesics on universal Teichmüller space. Experimental Mathematics, (18):325–336, 2009.
15. K. Leonard, R. Strawbridge, D. Lindsay, R. Barata, M. Dawson, and L. Averion. Minimal geometric representation and strawberry stem detection. In Computational Science and Its Applications (ICCSA), 2013 13th International Conference on, pages 144–149, June 2013.
16. Frederic F. Leymarie and Benjamin B. Kimia. From the infinitely large to the infinitely small: Applications of medial symmetry representations of shape. In Kaleem Siddiqi and Stephen Pizer, editors, Medial Representations: Mathematics, Algorithms and Applications, pages 327–351. Kluwer Academic Publishers, 2006.
17. André Lieutier. Any open bounded subset of $\mathbb{R}^n$ has the same homotopy type as its medial axis. Comput.-Aided Des., 36(11):1029–1046, September 2004.
18. C. C. Lin and Rama Chellappa. Classification of partial 2-D shapes using Fourier descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, (5):686–690, 1987.
19. M. I. Miller, A. Trouvé, and L. Younes. On metrics and Euler-Lagrange equations of computational anatomy. Ann. Rev. Biomed. Engng, (4):375–405, 2002.
20. Matt Feiszli, Sergey Kushnarev, and Kathryn Leonard. Metric spaces of shapes and applications: Compression, curve matching and low-dimensional representation. Geometry, Imaging, and Computation, to appear.
21. L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
22. Thomas B. Sebastian and Benjamin B. Kimia. Curves vs. skeletons in object recognition. Signal Processing, 85(2):247–263, 2005.
23. Thomas B. Sebastian, Philip N. Klein, and Benjamin B. Kimia. Shock-based indexing into large shape databases. In Proceedings of the 7th European Conference on Computer Vision-Part III, ECCV ’02, pages 731–746, London, UK, 2002. Springer-Verlag.
24. Thomas B. Sebastian, Philip N. Klein, and Benjamin B. Kimia. Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Anal. Mach. Intell., 26(5):550–571, May 2004.
25. E. Sharon and D. Mumford. 2D-shape analysis using conformal mapping. International Journal of Computer Vision, (70):55–75, 2006.
26. Nhon H. Trinh and Benjamin B. Kimia. Skeleton search: Category-specific object recognition and segmentation using a skeletal shape model. International Journal of Computer Vision, 94(2):215–240, September 2011.
27. Carola Wenk, Randall Salas, and Dieter Pfoser. Addressing the need for map-matching speed: Localizing global curve-matching algorithms. In Proceedings of the 18th International Conference on Scientific and Statistical Database Management, SSDBM ’06, pages 379–388, Washington, DC, USA, 2006. IEEE Computer Society.
We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by (1) solving for correspondences between points on the two shapes, and (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. We also demonstrate that shape contexts can be used to quickly prune a search for similar shapes. We present two algorithms for rapid shape retrieval: representative shape contexts, performing comparisons based on a small number of shape contexts, and shapemes, using vector quantization in the space of shape contexts to obtain prototypical shape pieces. Results are presented for silhouettes, handwritten digits and visual CAPTCHAs.