Conference Paper


We propose efficiency of representation as a criterion for evaluating shape models, then apply this criterion to compare the boundary curve representation with the medial axis. We estimate the ε-entropy of two compact classes of curves. We then construct two adaptive encodings for noncompact classes of shapes, one using the boundary curve and the other using the medial axis, and determine precise conditions for when the medial axis is more efficient. Along the way we construct explicit near-optimal boundary-based approximations for compact classes of shapes and an explicit compression scheme for noncompact classes of shapes based on the medial axis. We end with an application of the criterion to shape data.
2D Shape Model Selection via Efficiency Measures: An Empirical Study
Kathryn Leonard
Dept. of Applied and Computational Mathematics
California Institute of Technology
In related work, we propose efficiency of representation as a criterion for evaluating 2D shape models. In this work, we apply the criterion to three databases of shape contours: Mokhtarian's 1100 fish, Kimia's 1003 shapes, and a new database we extracted from the BSDS300 training set. In particular, we determine which shapes are more efficiently modeled by the boundary curve and which are better modeled by the medial axis. Surprisingly, we find that nearly every shape is more efficiently represented by the medial axis, and that this superiority is reasonably robust. We offer an explanation for these results based on the geometric relationships between the medial axis and the boundary curve.
1. Introduction
The study of 2D shapes and their similarities plays a central role in computer vision, underlying tasks such as object detection, classification and recognition [4,10,14,17,18,25]. It is not clear what the best shape model is, or whether a best model exists, as different tasks rely on different qualities of a shape [16]. Reasons for choosing a shape model may be compatibility with a larger image analysis structure (e.g., [19,27,7,26]), qualities of an image (e.g., the Dalmatian dog versus a line drawing), or simply religious belief. In [1], we propose that efficiency of representation offers a useful, quantitative selection criterion for shape models, and we derive such a criterion based on the intrinsic geometry of each shape. In this paper, we provide an empirical study of that efficiency criterion.
At the heart of our work is the idea that, regardless of the model used for a particular task, understanding efficient representation gives insight into the objects being modeled. For linear spaces, this observation has led to the invention of wavelet-type constructions, the study of sparse representation, and the subsequent revolution in signal processing. We are working toward similar successes in the nonlinear space of 2D shapes. We begin by considering two of the most popular shape models, the boundary curve and the medial axis.
While the boundary curve is an obvious choice for a shape model, the medial axis is controversial because the axis structure is not stable under small perturbations of the boundary curve. The reverse, however, does not hold: the medial axis is a very stable representation of the boundary curve, since small perturbations of a branch of the medial axis result in small perturbations of the boundary curve [2]. From a theoretical perspective, the axis is appealing because of the way it encodes geometric properties of the boundary curve into its own geometry. It is equally appealing from a practical perspective because of its ability to decompose an object into parts. Consequently, it has been applied to many shape-related problems, such as 2D and 3D recognition [29,12], animation [22], medical imaging [6,28], shape reconstruction [3,9], and shape matching [23,21].
We analyze the efficiency of boundary and medial axis representations of naturally occurring shapes drawn from three databases: the Kimia 1003 [20], Mokhtarian's 1100 fish [15], and a new database of 219 shapes we have extracted from the hand-segmented BSDS300 training set [13]. We selected the benchmark BSDS300 dataset because our results give insight into the role of shape in detection and recognition tasks, and we wanted our data to be the same data used in such research.
We begin with a brief overview of the results presented in [1], then apply our efficiency criterion to the shape databases to see which of the two models is more efficient for each shape. We define a relative efficiency measure µ to be the ratio of the bit length using the boundary to the bit length using the medial axis, then examine its behavior as the boundary is smoothed or sub-sampled. Because our computations involve delicate estimates of second derivatives associated to the medial axis, we also analyze the stability of µ after perturbations of the medial points.
2. Model Comparison Theory
For us, a shape is a planar region whose boundary curve has finite length and curvature, i.e., a silhouette. We place a metric on the space of curves that reflects the strong significance of orientation in human perception [11], a Hausdorff metric on both location and tangent orientation:

$$\rho(\gamma_1, \gamma_2) = \sup \cdots$$

where λ is a dimension-normalizing constant.
In this setting, we construct an ε-lossy encoding of general plane curves that reduces to the optimal encoding for compact classes of curves [1], with bit rate as follows. The ε-lossy encoding of a plane curve γ of bounded curvature κ_γ and length L, with κ_γ continuous a.e., requires a bit length B_γ satisfying

$$B_\gamma \approx \int_{[0,L]} |\kappa_\gamma|\, ds + \delta, \tag{1}$$

where δ is an error term that can be made arbitrarily small.
Equation 1 results from the fact that the leading term of the bit rate for the encoding of a curve comes entirely from encoding the tangent angle function. Refining the encoding to correct for location error requires only lower-order terms.

Note that Equation 1 gives a bit rate for encoding both the boundary curve and the medial curve. A simplification of the encoding gives a parallel result for functions, with the leading term in the bit rate depending only on the function's derivative. We will employ this in Section 2.2.
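The leading term in Equation 1 is the total variation of the tangent angle, ∫|κ_γ| ds. As a minimal illustration (not the encoder of [1]), the hypothetical helper `total_turning` below estimates that integral for a closed polygonal contour by summing the magnitudes of its exterior angles. It is a plain-Python sketch that assumes the contour is an ordered list of distinct (x, y) vertices.

```python
import math

def total_turning(points):
    """Discrete total absolute curvature of a closed polygon.

    Sums |exterior angle| over all vertices, i.e. the total variation of the
    tangent angle, a polygonal stand-in for the integral of |kappa| ds.
    `points` is an ordered list of distinct (x, y) vertices.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        x0, y0 = points[i - 1]          # previous vertex (wraps for i = 0)
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]    # next vertex (wraps at the end)
        a_in = math.atan2(y1 - y0, x1 - x0)
        a_out = math.atan2(y2 - y1, x2 - x1)
        # wrap the turning angle into [-pi, pi) before taking its magnitude
        turn = (a_out - a_in + math.pi) % (2.0 * math.pi) - math.pi
        total += abs(turn)
    return total
```

For the unit square this returns 2π, as for any convex curve; for an L-shaped hexagon with one reflex corner it returns 3π, since the reflex turn adds to the total variation instead of cancelling.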
2.1. Properties of the medial axis
The medial axis, defined in 1970 by Blum [5] in the context of mathematical morphology, can be thought of as the skeleton of a region in the plane. It captures the local symmetries of a shape, thereby encoding the boundary geometry in its own geometry. There is a one-to-one correspondence between medial axis pairs and boundary curves, and there are explicit formulas to go back and forth between medial geometry and boundary geometry. We include the essential relationships here, but refer to [8,2,24] for more complete explanations.
The medial axis pair (m(t), r(t)) of a closed planar region consists of m(t), the curve defined by the closure of the locus of centers of maximal circles contained within the region, and r(t), the associated radii. The medial curve m will consist of several branches, meeting at branch points, with a degree of smoothness determined by the smoothness of the boundary curve [8]. We will also write m for a single branch contained in the medial curve. Throughout, the symbol ′ will be reserved for derivatives with respect to an arclength parameter of m.
For other notation, refer to Figure 1. Because the maximal circles are bitangent to the boundary curve, the radius joining a medial point to a corresponding boundary point will be parallel to the normal to the boundary. The angle between the tangent T_m to the medial curve and the outward normal to the boundary is denoted by φ. Because the medial axis sweeps out both sides of the boundary at once, each medial point corresponds to two boundary points: one to the left of the medial curve, γ+, and one to the right, γ−. The subscripts ± will refer to these two portions of the boundary, noting that γ− will have the same orientation as m while γ+ will have the opposite.
Figure 1. Notation for the medial axis.
The key geometric relationship underlying our work is the expression of the tangent angle function to the boundary, θ_±, in terms of medial data:

$$\theta_\pm = \theta_m \pm \phi + \frac{\pi}{2}. \tag{2}$$

In other words, to reconstruct the tangent angle function for the boundary, one needs only the tangent to the medial curve and the angle φ. Note that because r′ = sin φ, knowledge of the angle φ is equivalent to knowledge of r.
As mentioned in the introduction, the medial axis is a stable representation of the boundary curve. In [2], we derive geometric and analytic bounds on how far the boundary curve can wander given medial data at two nearby points on a medial branch. We include Figure 2 here as a demonstration of the geometric bounds.
2.2. Adaptive Coding and Model Comparison
We now present the criterion for comparing the efficiency of the boundary curve with that of the medial axis [1]. The medial axis is more efficient than the boundary curve over domains I that are decomposable into subdomains I_j where:

$$\sup_{I_j} \frac{|\phi'|}{|\kappa_m|} > 2 + \sqrt{3} \quad \text{or} \tag{3}$$

$$\sup_{I_j} \frac{|\kappa_m|}{|\phi'|} > 2 + \sqrt{3}. \tag{4}$$
Figure 2. Possible regions R± for γ between γ±_1 and γ±_2, given medial data at points m_1, m_2 on a medial branch.

We are able to obtain these results because of the intimacy between the boundary and medial geometry. From Equation 2, encoding θ_γ with error ε is the same as encoding θ_m ± φ + π/2 to within ε, and so one may encode θ_m with error η ∈ [0, ε] and encode φ with error ε − η, allocating precision where it is most necessary. This gives the bit rate (ignoring the slop term) for an encoding of γ via the medial axis (Equation 5).
We can also derive an expression in terms of medial data for the bit rate for encoding θ_γ directly. For s arclength on γ, v arclength on a medial branch, and corresponding domains D and [0, l]:

$$\int_D |\kappa_\gamma|\, ds = \int_{[0,l]} \Big( |\kappa_m| + |\phi'| + \big|\, |\kappa_m| - |\phi'| \,\big| \Big)\, dv. \tag{6}$$
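Equation 6 can be traced back to the tangent-angle relation in two lines. The sketch below assumes that relation reads θ_± = θ_m ± φ + π/2 (the form used in the encoding argument above, where encoding θ_γ amounts to encoding θ_m ± φ + π/2), that v is arclength on the branch so that θ_m′ = κ_m, and that each boundary side is traced monotonically as v runs over [0, l], so that |κ_γ| ds = |dθ_±/dv| dv on each side:

$$\frac{d\theta_\pm}{dv} = \kappa_m \pm \phi' \quad\Longrightarrow\quad \int_D |\kappa_\gamma|\, ds = \int_{[0,l]} \big( |\kappa_m + \phi'| + |\kappa_m - \phi'| \big)\, dv.$$

The integrand agrees with Equation 6 because, for any real a and b,

$$|a+b| + |a-b| = 2\max(|a|, |b|) = |a| + |b| + \big|\, |a| - |b| \,\big|.$$

In particular, the boundary integrand equals 2 max(|κ_m|, |φ′|).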
Choosing the optimal η adaptively, one may compare the expressions in Equations 5 and 6 to determine when the medial axis is more efficient. Doing so gives the criterion presented at the beginning of the section.
Equation 2 shows that the medial axis decouples the geometry of the boundary into a portion arising from the medial curve and a portion arising from the radial rays. Equations 3 and 4 indicate that if the boundary curvature comes predominantly from either the curvature of the medial curve or the variation of the angle φ, the medial axis is more efficient. When the curvature of the boundary relies heavily on both sources, however, the boundary curve is more efficient; Equations 3 and 4 give the threshold for the transition.
3. Experimental Methods
We apply these ideas to shape data in order to learn which naturally occurring shapes are better modeled by the boundary and which are better modeled by the medial axis. For all experiments, the approximation error is ε = 0.01. Recall that our relative efficiency measure µ is the ratio of the boundary bit rate to the medial axis bit rate, so µ > 1 means the medial axis is more efficient.
3.1. Shape Data
To ensure diversity of data, we analyzed shapes from three databases. The first, Kimia's 1003-shape database [20], consists of floating-point shape contours in several shape categories, including humans, fish, animals, cups, bones, airplanes, tools, and others (see Figure 3). The second is the 1100-fish database of Mokhtarian et al. [15], which consists of integer-valued contours of many strange and wonderful fish extracted from marine biology texts (see Figure 4). The final database consists of 219 integer-valued shape contours which we extracted from the Berkeley BSDS300 hand-segmented training set [13] (see Figure 5).

Figure 3. Random samples from Kimia's 1003-shape database.
We operated on the raw data from the Kimia dataset, but the integer-valued contours of the other two datasets have large pixelation effects. We therefore smoothed their boundaries using 5 time steps of width 0.01 in a discrete implementation of the geometric heat equation, with curvature weighted by 0.3. This gives minimal smoothing, as exemplified by Figure 6, where even the sharp corners remain clear.
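The smoothing step can be sketched as explicit Euler time-stepping of curve-shortening flow, C_t = C_ss = κN, on a closed polygon. This is an assumption about what "a discrete implementation of the geometric heat equation" means here: the non-uniform three-point stencil for C_ss and the reading of "curvature weighted by 0.3" as a scalar factor on the flow velocity (the hypothetical `weight` parameter) are our choices, not details given in the text.

```python
import numpy as np

def geometric_heat_smooth(pts, steps=5, dt=0.01, weight=0.3):
    """Explicit-Euler sketch of curve-shortening flow C_t = weight * C_ss.

    `pts` is an (N, 2) array of vertices of a closed contour. C_ss (which
    equals kappa * N for an arclength parameter) is estimated with a
    three-point stencil that allows unequal spacing between vertices.
    """
    x = np.asarray(pts, dtype=float).copy()
    for _ in range(steps):
        prev = np.roll(x, 1, axis=0)     # x_{i-1}, wrapping around
        nxt = np.roll(x, -1, axis=0)     # x_{i+1}, wrapping around
        h_minus = np.linalg.norm(x - prev, axis=1)[:, None]
        h_plus = np.linalg.norm(nxt - x, axis=1)[:, None]
        # second derivative with respect to arclength
        c_ss = 2.0 / (h_minus + h_plus) * ((nxt - x) / h_plus - (x - prev) / h_minus)
        x = x + dt * weight * c_ss       # all vertices updated simultaneously
    return x
```

On a regular polygon inscribed in a circle of radius R, each step moves every vertex inward so that the radius becomes R − dt·weight/R, which matches the continuous flow. An explicit scheme like this stays stable only while dt·weight divided by the squared vertex spacing is small; with unit pixel spacing and the values above that factor is 0.003.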
3.2. Medial Data Extraction and Curvature Approximation
To extract medial axis data from the contours, we use the Delaunay triangulation of the boundary points to approximate the centers of the medial circles. Because each Delaunay triangle corresponds to a circle through three boundary points, whereas most medial circles touch only two boundary points, the resulting approximation to a medial branch will artificially zig-zag as the third boundary point is alternately chosen from γ+ and γ−. We apply the same smoothing used for the pixelated boundaries to these rough medial branches, again for 5 time steps. From the smoothed medial curve, we approximate r using the ray N_γ joining the medial point to the singleton of the three boundary points, that is, the one lying alone on its side of the medial curve.

Figure 4. Random samples from the Mokhtarian 1100-fish database.
Figure 5. Sample shapes from the hand-segmented BSDS300 training set.
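Each Delaunay triangle contributes the center of its circumscribed circle as a candidate medial point. The sketch below shows only that per-triangle step. It assumes the triangulation itself comes from an existing library (for example SciPy's `scipy.spatial.Delaunay`) and that each triangle vertex has already been labeled with the side (+ or −) of the medial curve it lies on; `circumcircle`, `singleton_index`, `spoke_to_singleton`, and the labels are hypothetical names.

```python
import math

def circumcircle(a, b, c):
    """Center and radius of the circle through points a, b, c (2-tuples)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no circumcircle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def singleton_index(sides):
    """Index of the one vertex whose side label differs from the other two."""
    for i in range(3):
        if sides[i] != sides[(i + 1) % 3] and sides[i] != sides[(i + 2) % 3]:
            return i
    raise ValueError("all three vertices lie on the same side")

def spoke_to_singleton(tri, sides):
    """Circumcenter of a triangle and the vector (the ray N_gamma) from it
    to the singleton boundary point."""
    center, _ = circumcircle(*tri)
    px, py = tri[singleton_index(sides)]
    return center, (px - center[0], py - center[1])
```

For the triangle (0, 0), (2, 0), (0, 2) the circumcenter is (1, 1) with radius √2; if (0, 2) is the only vertex labeled +, the spoke is (−1, 1).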
We use the standard 3-point approximations for derivatives. The tangent at a medial point m_i is approximated by T_i = m_{i+1} − m_{i−1}. The angle between T_i and N_γ for the corresponding boundary point then offers an approximation φ_i of φ, from which we find an approximation φ′_i. The curvature at m_i is approximated by the 3-point formula for the derivative of θ_m with respect to arclength.

Figure 6. A shape before and after smoothing.
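A minimal NumPy sketch of these 3-point estimates for one medial branch follows. It assumes the branch is an ordered array of smoothed medial points, that the spoke vectors N_γ to the singleton boundary points are already known, that φ is taken as the unsigned angle in [0, π], and that arclength is measured along the polygon through the medial points. The function name `medial_derivatives` is hypothetical, and nesting two central differences is one reading of "3-point approximations", not necessarily the authors' exact stencil.

```python
import numpy as np

def medial_derivatives(m, spoke):
    """3-point estimates of kappa_m and phi' along one medial branch.

    m     : (N, 2) array of smoothed medial points, ordered along the branch
    spoke : (N, 2) array of vectors from each medial point to its
            singleton boundary point (the ray N_gamma)
    Returns (kappa, dphi) at the N - 4 points where both differences exist.
    """
    m = np.asarray(m, dtype=float)
    spoke = np.asarray(spoke, dtype=float)
    T = m[2:] - m[:-2]                   # T_i = m_{i+1} - m_{i-1}
    S = spoke[1:-1]                      # spokes at the same interior points
    cross = T[:, 0] * S[:, 1] - T[:, 1] * S[:, 0]
    dot = np.einsum("ij,ij->i", T, S)
    phi = np.arctan2(np.abs(cross), dot)  # unsigned angle in [0, pi]
    # tangent angle of the medial curve, unwrapped to remove 2*pi jumps
    theta = np.unwrap(np.arctan2(T[:, 1], T[:, 0]))
    # polygonal arclength, sampled at the same interior points as T
    seg = np.linalg.norm(np.diff(m, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))[1:-1]
    ds = s[2:] - s[:-2]
    dphi = (phi[2:] - phi[:-2]) / ds      # phi'
    kappa = (theta[2:] - theta[:-2]) / ds  # kappa_m = d(theta_m)/ds
    return kappa, dphi
```

The two returned arrays are the quantities compared in Equations 3, 4 and 6; for example, `np.abs(kappa + dphi) + np.abs(kappa - dphi)` is the integrand of Equation 6.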
3.3. Stability of Results
To test the robustness of our results, we manipulated the data in a few ways. First, we investigated the effects of smoothing the boundary curve, looking at bit rates after smoothing for 5, 10, 15, and 20 time steps for the pixelated data, and for 0, 5, 10, and 15 time steps for the floating-point data. Next, we experimented with sub-sampling the boundary points, analyzing bit rates for 25%, 50%, 75%, and 100% of the data points. Note that because different choices of ε merely scale the resulting bit rates and leave µ fixed, there is no need to vary ε.
Finally, we analyzed the sensitivity of the second-derivative approximation to noise and round-off error. We focused on the Kimia database because of its size and variety of shape categories, and because the values are floating-point. We perturbed the smoothed medial points by adding a random vector to each point. The coordinates of the random vector were generated from a uniform distribution, scaled so that the length of the vector would not exceed l/c for some c, where l is the minimum distance between two medial points in a given branch. We experimented with the value of c, recomputing the bit rates after each perturbation. For each choice of c, adding a random vector can change the curvature approximations by at most

$$\alpha = \frac{\pi - 2\tan^{-1} c}{l}.$$

Note that the maximum curvature change occurs when c = 1, giving α ≈ 1.57/l.
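The perturbation and the bound α can be sketched as below. The uniform draw over a square followed by a 1/√2 rescale (so that every vector has length at most l/c) is our reading of "scaled so that the length of the vector would not exceed l/c"; the function names are hypothetical. Evaluating α with l = 1 for c = 1, 19, 20, 21 gives about 1.571, 0.1052, 0.0999 and 0.0952, which agree up to rounding with the values quoted for the perturbation study; note that π − 2 tan⁻¹ c = 2 tan⁻¹(1/c) for c > 0.

```python
import math
import numpy as np

def alpha_bound(c, l):
    """Maximum change in the curvature estimate: (pi - 2*atan(c)) / l."""
    return (math.pi - 2.0 * math.atan(c)) / l

def perturb_branch(points, c, rng):
    """Add a random vector of length <= l/c to each medial point of a branch,
    where l is the smallest gap between consecutive points on the branch."""
    pts = np.asarray(points, dtype=float)
    l = float(np.min(np.linalg.norm(np.diff(pts, axis=0), axis=1)))
    # coordinates uniform in [-1, 1]; dividing by sqrt(2) keeps length <= 1
    noise = rng.uniform(-1.0, 1.0, size=pts.shape) / math.sqrt(2.0)
    return pts + noise * (l / c), l
```

Repeating `perturb_branch` with a fresh generator state for each trial, and recomputing the bit rates each time, mirrors the structure of the 25-trial study.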
4. Results
4.1. Primary Experiment
We find overwhelmingly that the medial axis is more efficient. Out of the 2,322 shapes analyzed, only three were more efficiently represented by the boundary curve. Figure 7 shows a scatter plot of the bit rate for the medial axis against the bit rate for the boundary curve for each shape. The median value for µ is 1.476; see the leftmost box plot displayed in Figures 10 and 12. Figure 8 shows the three shapes and corresponding bit rates for which the boundary was more efficient (µ < 1); we believe these suffer from insufficient boundary smoothing. Figure 9 shows the shape and corresponding bit rates from each database for which µ was the largest.
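As a quick arithmetic check, µ can be computed directly from the bit counts reported for the three boundary-favoring shapes in Figure 8 and the three medial-favoring shapes in Figure 9; the helper name `mu` is ours.

```python
# (boundary bits, medial axis bits) as reported in Figures 8 and 9
fig8 = [(4800, 5944), (11640, 27774), (39109, 82692)]
fig9 = [(7733, 4120), (16489, 9990), (21369, 11979)]

def mu(boundary_bits, axis_bits):
    """Relative efficiency; values above 1 favor the medial axis."""
    return boundary_bits / axis_bits

mu8 = [round(mu(b, a), 3) for b, a in fig8]
mu9 = [round(mu(b, a), 3) for b, a in fig9]
```

This gives µ ≈ 0.808, 0.419 and 0.473 for the Figure 8 shapes and µ ≈ 1.877, 1.651 and 1.784 for the Figure 9 shapes.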
Figure 7. Scatter plots of bit rates using the medial axis (x-axis) and boundary curve (y-axis). The line shown is y = x.
Figure 8. Shapes for which the boundary curve is more efficient than the medial axis, and their medial axes (boundary bits / axis bits: 4800 / 5944, 11640 / 27774, 39109 / 82692). The third shape, a park bench from BSDS, has an anomaly in the medial axis due to the hand segmentation.
4.2. Robustness Experiments
The efficiency of the medial axis is reasonably robust to
the manipulations performed, though we find evidence of
some susceptibility to down-sampling.
Figure 10 displays box plots for µ as smoothing increases from left to right. The median value for µ decreases slightly as the boundary is smoothed, but its variance decreases significantly. The notches in the box plot indicate the 95% confidence interval for the value of the median. In all cases, it is well away from the value 1, indicating that the efficiency of the medial axis is statistically significant. Figure 11 displays the corresponding scatter plots; note that after only 10 time steps of smoothing, all shapes are better represented by the medial axis, suggesting that further investigation into the effects of pixelation is necessary.

Figure 9. Shapes from each of the three databases for which the medial axis was most efficient, their medial axes, and respective bit rates (boundary bits / axis bits: 7733 / 4120, 16489 / 9990, 21369 / 11979). From top to bottom: Mokhtarian, BSDS, Kimia.

Figure 10. Box plots of values for µ as the boundary curve is smoothed by (left to right) 5, 10, 15, and 20 time steps.
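The notch test can be sketched as follows, assuming the box plots use the standard notch of McGill, Tukey and Larsen, median ± 1.57·IQR/√n (the rule used by MATLAB's `boxplot` and by matplotlib's default notches); the helper name `median_notch` is hypothetical. A notch whose lower end lies above 1 corresponds to a median µ significantly greater than 1.

```python
import numpy as np

def median_notch(values):
    """Approximate 95% interval for the median: med +/- 1.57 * IQR / sqrt(n)."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    half = 1.57 * (q3 - q1) / np.sqrt(v.size)
    return med - half, med + half
```

Applied to the µ values of one smoothing level, `median_notch(mu_values)[0] > 1` would express the significance claim made above.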
As the boundary points were sub-sampled more heavily, the median value for µ decreased at each step (see Figure 12). Interestingly, µ remained stable for the Kimia database. Figure 13 shows results for sub-sampling with the Kimia data removed. Even without the Kimia data, however, it is not until 75% of the original boundary points have been removed that the confidence interval for the median of µ contains the value 1. Again, the difference of results for the Kimia dataset suggests pixelation is a likely cause.

For the perturbation study, the value of the constant c determining the maximum length of the perturbation vector is crucial. We found that for c ≥ 21, the efficiency of the medial axis is stable. At c = 20, a few shapes begin to be better modeled by the boundary, and for c ≤ 19, the boundary is almost always more efficient. We performed 25 trials in our perturbation study, taking c = 20 as the center of the transitional range. To give an idea of the effect on curvature as c varies: for c = 19, α ≈ 0.105/l; for c = 20, α ≈ 0.099/l; and for c = 21, α ≈ 0.095/l.

Figure 11. Scatter plots of bit rates using the medial axis (x-axis) and boundary curve (y-axis) after smoothing for (left to right) 5, 10, 15, and 20 time steps.

Figure 12. Box plots of values for µ as the boundary curve is sub-sampled: (left to right) 100%, 75%, 50%, 25% of points sampled.
For each trial, the boundary curve became more efficient for between 4 and 12 of the 1003 Kimia shapes. Figure 14 shows the box plot for µ after perturbations. The median value has decreased slightly, but is still significantly above µ = 1.

On a shape-by-shape basis, the outcome of these trials was remarkably consistent. A few shapes appeared only once or twice (see Figure 16), but the 19 shapes depicted in Figure 15 consistently appeared in several of the trials, with one appearing in every single trial (noisy bowtie, upper left corner). Note that the collection of shape categories among the perturbed shapes for which the boundary is more efficient is quite small (the noisy bowtie, tool, bone, human, and goblet categories account for all but two of these shapes), and that these shapes share some qualitative properties. The median value for µ without perturbation for the 19 shapes in Figure 15 is 1.65, whereas the median for the larger population is 1.476. This indicates that, rather than being shapes for which the efficiencies of the boundary and the medial axis are close, these shapes are more sensitive to perturbation than others. This merits further exploration to determine the source of the increased sensitivity of their axis curves.

Figure 13. Box plots of values for µ as the boundary curve is sub-sampled, with Kimia data removed: (left to right) 100%, 75%, 50%, 25% of points sampled.

Figure 14. Box plot for values for µ after perturbation of the medial points.

Figure 15. Shapes for which a small perturbation made the boundary more efficient in three or more of the 25 perturbation trials, displayed in decreasing order of number of trials. The shape in the upper left corner appeared in every trial.
5. Discussion
We began these experiments with the expectation that the boundary would be more efficient for certain categories of shape and the medial axis would be more efficient for others. Certainly, such a result is implied by the criterion stated in Equations 3 and 4. Surprisingly, we found that the medial axis is more efficient in almost all settings for almost all of the 2,322 shapes analyzed. While the medial axis is not more efficient for every branch associated to a particular shape, for the majority of the branches it is significantly more efficient.

Figure 16. Shapes for which a small perturbation made the boundary more efficient in one or two of the 25 perturbation trials.
Our results show that for most of the branches in a shape, |κ_m| is very small, especially compared to |φ′|. In other words, the medial axis is tremendously successful in decoupling the geometry of the boundary, so that the medial curve supplies a nearly straight scaffold upon which φ builds boundary curvature. The robustness of these results suggests that, despite the delicacy of the second-order estimates and despite the sensitivity of the medial axis to small changes in the boundary, the greater efficiency of the medial axis is genuine.
Knowledge of the efficiency of the medial axis is extremely significant for the vision community. Certainly, our work provides justification for the medial axis as a shape model. In addition, such knowledge can be used to enhance shape detection and recognition: skeletal preferences can be built into priors on feature detectors; knowledge of local symmetries can reduce the search for large wavelet coefficients; branch structures can guide construction of constellation models and account for missing features. We hope our work will influence these tasks, as well as inspire further investigations into model efficiency.
References
[1] Anonymous. An Efficiency Criterion for 2D Shape Model Selection. CVPR submission 434, 2005.
[2] Anonymous. Efficient Shape Modeling: ε-entropy, Adaptive Coding, and Boundary Curves vs. Blum's Medial Axis. Submitted to IJCV, 2005.
[3] N. Amenta, S. Choi, R. Kolluri. The power crust, union of balls, and the medial axis transform. Computational Geometry: Theory and Applications, 19(2-3):127-153, 2001.
[4] M. F. Beg, M. I. Miller, A. Trouvé, L. Younes. Computing metrics via geodesics on flows of diffeomorphisms. International Journal of Computer Vision, July 2003.
[5] H. Blum. Biological shape and visual science. J. Theor. Biol., 38:205-287, 1973.
[6] J. N. Damon. Determining the geometry of boundaries of objects from medial data. IJCV, 63(1):45-64, 2005.
[7] A. Desolneux, L. Moisan, J.-M. Morel. Meaningful alignments. IJCV, 40(1):7-23, 2000.
[8] P. Giblin, B. Kimia. On the intrinsic reconstruction of shape from its symmetries. IEEE PAMI, 25:895-911, 2003.
[9] R. A. Katz, S. M. Pizer. Untangling the Blum medial axis transform. IJCV, 55(2-3):139-153, 2003.
[10] A. Latto, D. Mumford, J. Shah. The representation of shape. Proceedings of the IEEE 1984 Workshop on Computer Vision: Representation and Control, 1984.
[11] W. Li, G. Westheimer. Human discrimination of the implicit orientation of simple symmetrical patterns. Vision Research, 37(5):565-572, 1997.
[12] D. Macrini, A. Shokoufandeh, S. Dickinson, K. Siddiqi, S. Zucker. View-based 3-D object recognition using shock graphs. Technical report, 2001.
[13] D. Martin, C. Fowlkes, D. Tal, J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proc. 8th Int'l Conf. Computer Vision, 2:416-423, July 2001.
[14] P. Michor, D. Mumford. Riemannian geometries on spaces of plane curves. J. Eur. Math. Soc., to appear.
[15] F. Mokhtarian, S. Abbasi, J. Kittler. Robust and efficient shape indexing through curvature scale space. Proc. 6th British Machine Vision Conference, pp. 53-62, September 1996.
[16] D. Mumford. The problem of robust shape descriptors. Proceedings of the IEEE First International Conference on Computer Vision, 1987.
[17] D. Mumford. Theories of shape: Do they model perception? SPIE Geometric Methods in Computer Vision, 1570, 1991.
[18] D. Mumford. The shape of objects in two and three dimensions: Mathematics meets computer vision. AMS Josiah Gibbs Lecture, Baltimore, MD, January 2003.
[19] L. Rudin, S. Osher, E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259-268, 1992.
[20] D. Sharvit, J. Chan, B. B. Kimia. Symmetry-based indexing of image databases. In Content-Based Access of Image and Video Libraries, 1998.
[21] T. Sebastian, P. Klein, B. Kimia. Recognition of shapes by editing shock graphs. Proceedings of the 8th International Conference on Computer Vision, Vancouver, IEEE Computer Society Press, July 2001, pp. 755-762.
[22] J. Shen, D. Thalmann. Fast realistic human body deformation for animation and VR applications. Applications of Computer Graphics International, Pohang, 1996, pp. 166-174.
[23] K. Siddiqi, B. Kimia. A shock grammar for recognition. Proceedings of the Conference on Computer Vision and Pattern Recognition, 1996, pp. 507-513.
[24] R. Teixeira. Curvature motions, medial axes and distance transforms. PhD Thesis, Harvard University, 1998.
[25] R. C. Veltkamp. Shape matching: Similarity measures and algorithms. Technical report UU-CS-2001-03, 2003.
[26] Z. W. Tu, X. R. Chen, A. L. Yuille, S.-C. Zhu. Image parsing: unifying segmentation, detection, and recognition. IJCV, 63(2):113-140, 2005.
[27] S. X. Yu, J. Shi. Concurrent object recognition and segmentation by graph partitioning. IEEE Conference on Computer Vision and Pattern Recognition, Madison, Wisconsin, 2003.
[28] P. Yushkevich, P. Fletcher, S. Joshi, A. Thall, S. Pizer. Continuous medial representations for geometric object modeling in 2-D and 3-D. Proceedings of the 1st Generative Model-based Vision Workshop, Copenhagen, DK, June 2002.
[29] S. C. Zhu, A. L. Yuille. FORMS: A flexible object recognition and modeling system. International Journal of Computer Vision, 20:187-212, 1996.