All content in this area was uploaded by Kathryn Leonard on May 19, 2016
Minimal Geometric Representation and Strawberry
Stem Detection
Kathryn Leonard, Rebecca Strawbridge, Danika Lindsay, Raquel Barata, Matthew Dawson, Lawrence Averion
Department of Mathematics
California State University, Channel Islands
Camarillo, CA USA
Email: kleonard.ci@gmail.com
Abstract—This paper takes a crucial step toward a visual
system for an automated strawberry harvester. We present an
algorithm based on the Blum medial axis that outputs for a
given berry image a bounding box containing the berry’s stem,
and determines minimal geometric information to do so. The
algorithm first generates three potential boxes, then automatically
selects which of the three contains the stem. We compare the per-
formance of our geometric-based stem detection with two other
methods. The first, implemented already for a berry harvesting
robot, relies on the principal axes of the berry shape to define the
bounding box. The second takes as input the three potential boxes
generated using the medial axis, then selects the one containing
the stem by computing geometric and appearance features within
each box for use in an ensemble classifier of 250 trees boosted by
RUSboost with five-leaf minimum and a learning rate of 0.1. Note
that because our data is imbalanced we used class-proportional
sampling. Our geometric approach outperforms the other two
methods on a database of 286 strawberry images.
Index Terms—fruit harvesting; medial axis; computational
geometry; skeletal shape models
I. INTRODUCTION
This paper addresses two issues: (1) developing a geometry-
based strawberry stem-detection system, and (2) determining
minimal information required to perform that detection. As
outlined in [10], visual systems for harvesting robots are a
pressing but as yet unmet need. For example, in 2011 Georgia
lost $400 million from healthy crops dying in the fields due to
a dearth of seasonal labor [4]. This paper addresses strawberry
harvesting in particular. The US produced 2.85 billion pounds
of strawberries in 2011, making it the largest strawberry
producer in the world. Of that total, 2.53 billion pounds were grown in
California alone [3]. Additionally, methyl bromide, a pesticide used on
strawberry crops, is quite toxic to humans.
Parallel to that industrial need is a recent theoretical interest
in efficient (minimal) representations for shapes and shape
classes [5], [12], [18]. For example, [12] compares the effi-
ciency of modeling shapes using the boundary curve and using
the Blum medial axis, determining geometric criteria for when
one is more efficient than the other.
The utility of the Blum medial axis remains divisive for
the computer vision community, in large part because of its
sensitivity to noise and the difficulty of accurately extracting
shape boundaries from images. At the same time, the medial
axis provides an efficient shape representation [11] while
encoding key geometric information from the boundary curve
[6], [16]. We demonstrate that the medial skeleton proves
extremely efficient for encoding shape information for straw-
berries. Indeed, only three points on the skeleton are necessary
to determine stem location.
Our method takes as input an image containing a berry,
extracts the berry boundary and an associated medial skeleton,
identifies three points on the skeleton capturing the coarse
berry shape, and then outputs three bounding boxes with one
containing the berry stem. Note that once we have extracted
the segmented berry, this process uses no appearance cues
from the image. The method relies on geometry alone.
To determine which of the three boxes contains the stem,
we then compare a hybrid appearance-geometry classification
process with a process based solely on geometry. The pure
geometric approach wins handily.
Our method also outperforms a method based on the
principal axes of the berry described in [7], the only other
publication describing berry stem detection to our knowledge.
II. METHODOLOGY
Our strawberry image database derives from Google image
search and photographing berries in nearby fields. From an
initial collection of 321 berry images, we discarded 35 because
of failure to separate adjacent berries. See Figure 1(top) for a
typical example of a discarded image. The remaining 286 berry
images are of varying quality (blurry iPhone photos to crisp
high-resolution images), context (berry field to commercial photographic
set), and pose. See Figure 2. Several of these berries have
flawed segmentations, as can be seen in Figure 1(bottom),
but were not discarded.
We segment each berry, select points on the interior Blum
medial axis of the boundary curve to generate three rectan-
gular bounding boxes as candidates to contain the stem, then
compare two methods for selecting one of the three boxes
as the one containing the stem. We also compare the medial
skeleton approach to a simpler approach using the principal
axis of the berries as in [7].
Our process consists of five steps, described in more detail
below: (A) Extraction of berry boundary, (B) Medial skele-
ton computation and simplification, (C) Identification of key
medial axis points, (D) Bounding box generation, (E) Box
selection.
Fig. 1. Top: An example of an image discarded from the dataset and its
segmentation. Bottom: An example of an undiscarded image despite flawed
segmentation.
Fig. 2. A sampling of images from the strawberry dataset.
A. Extract Boundary
We segment the image using k-means clustering in the
L*a*b* color space with k = 3 [13]. For a typical berry
image, this produces clusters with red, green and other pixels.
We select the cluster with the highest values of red. In case
of images with multiple berries, we extract the largest berry
using the watershed method [1]. Having isolated the pixels
from a single berry, we convert to a binary image using Otsu’s
method [14], then fill any holes. Finally, we shrink the berry
shape by one pixel and subtract from the original binary image
to obtain a boundary that is a single pixel wide. Coordinates
for the berry boundary correspond to pixel locations within
the image. See Figure 3.
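The final step of the boundary extraction (shrink by one pixel, then subtract) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the mask is assumed to be a list of rows of 0/1 integers, and the one-pixel shrink is assumed to be an erosion with a 4-connected neighbourhood, which the text does not specify.

```python
def erode(mask):
    """Shrink a binary mask by one pixel (4-connected neighbourhood, an assumption)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # A pixel survives only if all four neighbours lie inside the shape;
            # anything outside the image counts as background.
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if all(0 <= i < rows and 0 <= j < cols and mask[i][j]
                   for i, j in neighbours):
                out[r][c] = 1
    return out


def boundary(mask):
    """Pixels that belong to the mask but not to its erosion."""
    inner = erode(mask)
    return [[m - e for m, e in zip(mask_row, inner_row)]
            for mask_row, inner_row in zip(mask, inner)]


def boundary_coordinates(mask):
    """(row, col) pixel locations of the one-pixel-wide boundary."""
    edge = boundary(mask)
    return [(r, c) for r, row in enumerate(edge) for c, v in enumerate(row) if v]
```

For a filled 3-by-3 square, only the centre pixel survives the erosion, so the boundary consists of the eight surrounding pixels.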
Fig. 3. Boundary extraction process. Top: Original image (L), red cluster
from k-means process (R). Bottom: red cluster with holes filled (L), border
(R).
B. Compute Medial Skeleton
The interior Blum medial axis for a continuous shape
boundary is a pair (m, r), where m is a collection of skeletal
curves composed of centers of maximal circles inscribed
within the shape boundary, and r gives the associated radii
[2]. For discrete shape boundaries, the interior medial skeleton
of a finite sample of boundary points can be approximated
by the interior Voronoi vertices or, equivalently, centers of the
circumcircles of the corresponding Delaunay triangulation [9].
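To make the discrete approximation concrete, the sketch below computes the circumcircle of a single triangle of boundary points; its centre is a candidate medial point and its radius the associated value of r. The Delaunay triangulation itself, and the test for whether a centre lies inside the shape, are assumed to be available from elsewhere and are not shown. The function name is illustrative only.

```python
import math


def circumcircle(a, b, c):
    """Centre and radius of the circle through three 2-D points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no circumcircle exists")
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)
```

For example, the triangle with vertices (0, 0), (2, 0), (0, 2) has circumcentre (1, 1) and radius equal to the square root of 2.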
Because the skeleton is known to be sensitive to pixelation
and other sources of noise, we introduce two simplification
steps: extracting the medial skeleton of the convex hull of the
berry boundary, and applying the contour ratio to the resulting
skeleton. The contour ratio is the ratio (length of the second-longest
section of boundary)/(total length of boundary)
[17]. For a given threshold T, skeletal points with a contour
ratio smaller than T are discarded.
More precisely, suppose a point $m$ on the medial skeleton
corresponds to a medial circle containing boundary points
$\{\gamma(s_1), \dots, \gamma(s_n)\} \subset \gamma$, the boundary curve, where
$s_1 < \dots < s_n$ for $s$ an arclength parameter on $\gamma$. Let $L_\gamma$ be the
length of $\gamma$, and let
$$L_i = \int_{s_i}^{s_{i+1}} \lVert \gamma'(s) \rVert \, ds \quad \text{for } i = 1, \dots, n-1, \qquad L_n = \int_{s_n}^{s_1} \lVert \gamma'(s) \rVert \, ds$$
be the lengths of the segments of the boundary associated to $m$, where the
last integral runs around the closed curve from $\gamma(s_n)$ back to $\gamma(s_1)$.
Permuting to order the lengths from largest to smallest gives
$L_{i_1} \geq \dots \geq L_{i_n}$. The contour ratio associated to the medial
point $m$ is
$$C(m) = \frac{L_{i_2}}{L_\gamma}.$$
Note that $C(m)$ takes on values between 0 and 1/2. The larger the value
of $C(m)$, the more the global structure of $\gamma$ depends on the
particular medial point $m$. Given a threshold $T$, we discard
medial points $m$ for which $C(m) < T$. See Figure 4.
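A small sketch of the contour ratio follows, under stated assumptions: each medial point is represented by the arclength parameters at which its medial circle touches the boundary, the boundary is a closed curve of known total length, and the final segment is taken to wrap around from the largest parameter back to the smallest. The function names and the data layout are hypothetical.

```python
def contour_ratio(touch_params, total_length):
    """Second-longest boundary segment between touch points, over total length."""
    s = sorted(touch_params)
    if len(s) < 2:
        raise ValueError("a medial circle must touch the boundary at least twice")
    # Segments between consecutive touch points ...
    segments = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    # ... plus the segment that wraps around the closed curve.
    segments.append(total_length - (s[-1] - s[0]))
    segments.sort(reverse=True)
    return segments[1] / total_length


def prune(medial_points, total_length, threshold):
    """Keep medial points whose contour ratio is at least the threshold.

    Each medial point is assumed to be a dict with a 'touch' entry holding
    its arclength parameters (an illustrative layout, not the paper's).
    """
    return [m for m in medial_points
            if contour_ratio(m["touch"], total_length) >= threshold]
```

With a boundary of length 100, a medial circle touching at parameters 10 and 60 splits the curve into two halves and attains the maximum ratio 0.5, whereas one touching at 0 and 1 has ratio 0.01 and is discarded for T = 0.02.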
In practice, a single value for T rarely suffices for a database
of highly variable shapes. Three threshold values sufficed for
the berry shapes: T = 0, 0.01, 0.02. Our method automatically
selects a threshold for each image, as described in the next
section.
Fig. 4. Process for extracting key medial points. Left column (top to bottom):
original berry boundary, boundary of convex hull of berry, convex hull with
medial skeleton. Right column: key medial points with (top to bottom)
T = 0.0, 0.1, 0.2. The key medial points are detected by finding the extreme
points of the medial skeleton after pruning it using the contour ratio with
thresholds given by T.
Fig. 5. Triangle generated by key medial points, p1, p2, p3. For medial-
based box selection, p1 will be assigned to the rightmost key point, defined
as the bottom of the strawberry. This choice of key medial points from the three
possibilities shown in Figure 4 is an automated process based on the pairwise
distances between the extreme medial points.
C. Identify Key Medial Axis Points
Each of the three choices of the contour ratio threshold T
generates a skeleton for a particular berry boundary. Within
each of these three skeletons associated to a berry we select
three points, implicitly generating an isosceles triangle that
captures the coarse shape of the berry. See Figure 5. We use
these points to select simultaneously a value for T and key
medial points in a two-step process: (1) for each value of
T, identify three key medial points {p1, p2, p3}, (2) evaluate
properties of the point triple to select a single T and the best
point triple.
The best-fitting triangle generated by a particular skeleton
will be the one generated by key medial points as close as
possible to the primary maxima of curvature of the berry
boundary. We therefore select medial points on the minimum
enclosing circle of the medial skeleton [8], the smallest circle
containing all the medial points. If fewer than three medial
points lie on the minimum enclosing circle, we remove those
points and repeat the process on the remaining medial points
until we find three points spaced sufficiently far apart. See
Figure 4, bottom row, for the key medial points for each of
the three values of T generated by a given berry shape.
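The selection of extreme points on the minimum enclosing circle can be sketched as below. This is an illustration under several assumptions, not the routine from [8]: the circle is found by brute force over pairs and triples of points (adequate only for small point sets), points found in successive rounds are accumulated until three are collected, and "sufficiently far apart" is modelled by a hypothetical `min_separation` parameter.

```python
import itertools
import math


def circle_from_two(p, q):
    """Circle having the segment pq as a diameter."""
    centre = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    return centre, math.hypot(p[0] - q[0], p[1] - q[1]) / 2.0


def circle_from_three(a, b, c):
    """Circumcircle of three points, or None if they are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (v[0] ** 2 + v[1] ** 2 for v in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.hypot(a[0] - ux, a[1] - uy)


def minimum_enclosing_circle(points, eps=1e-9):
    """Brute-force smallest circle containing every point (needs >= 2 points)."""
    candidates = [circle_from_two(p, q)
                  for p, q in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for centre, radius in candidates:
        inside = all(math.hypot(x - centre[0], y - centre[1]) <= radius + eps
                     for x, y in points)
        if inside and (best is None or radius < best[1]):
            best = (centre, radius)
    return best


def points_on_circle(circle, points, tol=1e-6):
    """Points lying on the circle, up to a numerical tolerance."""
    centre, radius = circle
    return [p for p in points
            if abs(math.hypot(p[0] - centre[0], p[1] - centre[1]) - radius) <= tol]


def key_medial_points(points, min_separation=0.0):
    """Peel points off successive enclosing circles until three are chosen."""
    remaining = list(points)
    chosen = []
    while remaining and len(chosen) < 3:
        if len(remaining) == 1:
            extreme = list(remaining)
        else:
            extreme = points_on_circle(minimum_enclosing_circle(remaining), remaining)
        for p in extreme:
            far_enough = all(math.hypot(p[0] - q[0], p[1] - q[1]) >= min_separation
                             for q in chosen)
            if len(chosen) < 3 and far_enough:
                chosen.append(p)
        remaining = [p for p in remaining if p not in extreme]
    return chosen
```

For the points (0, 0), (4, 0), (2, 3), (2, 1), the enclosing circle passes through the first three, which are returned as the key points; the interior point (2, 1) is ignored.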
To select a single value for T, we assume that a berry’s
width will differ more from each of its lengths than the two
lengths will from each other. Measuring the lengths of the
sides of each of the three triangles, we select the value for
T that produces a triangle with a pair of sides most similar
in length. Simultaneously, we identify the point common
to the two similar sides as the bottom of the strawberry,
that is, the point opposite the stem. For the berry
depicted in Figure 4, we select T = 0, giving the triangle and
point labeling described in Figure 5.
Summary of Steps:
1) For each contour ratio threshold value, extract three
extreme points on the medial axis, generating a triangle.
2) For each triangle, compute the side lengths. Identify the
triangle with a pair of sides that are closest in length.
Discard all other triangles and thresholds.
3) Store the key medial points corresponding to the remain-
ing threshold value as {p1, p2, p3}, where p1 is the point
common to the two similar sides.
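The three steps summarized above can be sketched as follows. The sketch assumes that "closest in length" means the smallest absolute difference between two side lengths, and that the candidate triangles are supplied as a dictionary keyed by threshold; both choices, and the function names, are illustrative rather than taken from the paper.

```python
import math


def side_gap(triangle, apex):
    """Absolute difference between the two sides that meet at vertex `apex`."""
    others = [triangle[i] for i in range(3) if i != apex]
    lengths = [math.hypot(triangle[apex][0] - q[0], triangle[apex][1] - q[1])
               for q in others]
    return abs(lengths[0] - lengths[1])


def select_triangle(triangles):
    """Pick the threshold whose triangle has the most similar pair of sides.

    `triangles` maps each threshold T to a list of three (x, y) key points.
    Returns (T, (p1, p2, p3)) with p1 the vertex shared by the similar sides.
    """
    best = None
    for threshold, triangle in triangles.items():
        # Every pair of sides shares exactly one vertex, so looping over the
        # three vertices visits all three pairs of sides.
        for apex in range(3):
            gap = side_gap(triangle, apex)
            if best is None or gap < best[0]:
                best = (gap, threshold, apex)
    _, threshold, apex = best
    triangle = triangles[threshold]
    p1 = triangle[apex]
    p2, p3 = [triangle[i] for i in range(3) if i != apex]
    return threshold, (p1, p2, p3)
```

Under this labeling, p1 marks the bottom of the berry, and the stem box is the one built on the remaining pair p2, p3.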
D. Generate Bounding Boxes
For each pair of points $\{p_i, p_j\}$ selected from $\{p_1, p_2, p_3\}$,
we generate a box perpendicular to $\overrightarrow{p_i p_j}$, centered at the
midpoint of $\overrightarrow{p_i p_j}$, of height $\alpha_1$ times the berry height and of
width $\alpha_2$ times the berry width. Intersecting the bounding box
with the complement of the strawberry region in the original
image produces the output of the algorithm. See Figure 6.
We find that $\alpha_1 = \alpha_2 = \tfrac{1}{2}$ works well in practice. This
process generates three bounding boxes potentially containing
the berry stem.
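A possible reading of the box construction is sketched below. The berry height and width are taken as given inputs, since their measurement is not specified here, and the orientation convention (height measured along the normal to the segment, width along the segment) is an assumption. The intersection with the complement of the berry region is not shown, and the function name is hypothetical.

```python
import math


def stem_box(p_i, p_j, berry_height, berry_width, alpha1=0.5, alpha2=0.5):
    """Corners of a rectangle centred at the midpoint of the segment p_i p_j.

    Assumed convention: the box extends alpha1 * berry_height along the
    normal to the segment and alpha2 * berry_width along the segment.
    """
    mx, my = (p_i[0] + p_j[0]) / 2.0, (p_i[1] + p_j[1]) / 2.0
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("p_i and p_j must be distinct points")
    ux, uy = dx / length, dy / length   # unit vector along the segment
    nx, ny = -uy, ux                    # unit normal to the segment
    half_w = alpha2 * berry_width / 2.0
    half_h = alpha1 * berry_height / 2.0
    signs = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    return [(mx + sw * half_w * ux + sh * half_h * nx,
             my + sw * half_w * uy + sh * half_h * ny)
            for sw, sh in signs]
```

For a horizontal segment from (0, 0) to (4, 0), a berry height of 10, and a berry width of 8, the default parameters give a box spanning x from 0 to 4 and y from -2.5 to 2.5.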
E. Select Correct Box
1) Method 1: Medial Geometry: Given points $\{p_1, p_2, p_3\}$
defining a triangle where $\lVert p_1 - p_2 \rVert \approx \lVert p_1 - p_3 \rVert$, take $p_1$ to
be the bottom of the strawberry. We select the bounding box
perpendicular to $\overrightarrow{p_2 p_3}$.
2) Method 2: Feature-based Classification: We compute
a feature vector for each bounding box, selecting from 7
features: ratios of the lengths of the sides of the triangle
defined by the key medial points; entropy, skewness, and
variance of the gray-scale pixel values within the box; and
proportion of green within the box. Applying PCA, we train
a classifier to identify the box with the stem.
Our box data is imbalanced, with around 80% of the boxes
not containing stems. We use half the data for training and
use class-proportional sampling. We implement an ensemble
of 250 trees, imposing a minimum of five leaves and a learning
rate of 0.1, boosted by RUSboost. The best-performing feature
vector uses 6 of the 7 features, discarding the color measure.
Performance on the training set is 63% correct classification
for non-stem boxes and 75% correct for boxes with stems.
Fig. 6. Top: Boundary curve, medial skeleton, key medial points and output
box with $\alpha_1 = \tfrac{1}{2}$, $\alpha_2 = \tfrac{1}{4}$. Bottom: Original image and output box
containing stem.
Fig. 7. L: Major and minor axes of the berry shape intersecting at the
centroid. R: Output box containing the stem.
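The three gray-scale appearance features named for Method 2 (entropy, skewness, variance) can be computed as in the sketch below. The exact definitions used for the classifier are not given, so this assumes Shannon entropy in bits over the histogram of observed gray levels, population (divide-by-n) variance, and the standard third-moment skewness. The function name and the flat list-of-integers input are illustrative; the boosted tree ensemble itself is not reproduced here.

```python
import math
from collections import Counter


def gray_features(pixels):
    """Entropy, skewness, and variance of a flat list of gray-scale values."""
    n = len(pixels)
    if n == 0:
        raise ValueError("the box contains no pixels")
    mean = sum(pixels) / n
    # Population variance (divide by n); a sample variance would divide by n - 1.
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0:
        skewness = 0.0  # constant patch: skewness is undefined, report 0
    else:
        third_moment = sum((p - mean) ** 3 for p in pixels) / n
        skewness = third_moment / variance ** 1.5
    # Shannon entropy (bits) of the empirical gray-level distribution.
    counts = Counter(pixels)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return {"entropy": entropy, "skewness": skewness, "variance": variance}
```

A patch with half its pixels at 0 and half at 255 has an entropy of one bit, zero skewness, and variance 127.5 squared.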
F. Comparison Method: Principal Axis
For comparison, we also generate stem bounding boxes
based on orientation of the berries’ major and minor axes and
the location of the berry centroid, following the procedure
outlined in [7]. The major axis is assumed to run the length
of the berry with the minor axis along the width. We measure
the span of white pixels from the major axis in the binary
berry image to ensure we have chosen the correct label (top,
bottom) for each end of the major axis. We generate a box
centered along the major axis with height $\beta_1$ times the length
of the major axis and width $\beta_2$ times the length of the minor axis.
We find that $\beta_1 = \beta_2 = \tfrac{1}{2}$ works well in practice. See Figure 7.
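For reference, the centroid and principal axes of a binary berry shape can be obtained from the second moments of its pixel coordinates, as in the sketch below. This is a generic moment-based computation and is not claimed to match the procedure of [7] in detail; the function name and the list-of-(x, y) input are assumptions.

```python
import math


def principal_axes(pixels):
    """Centroid, unit major axis, and unit minor axis of a set of (x, y) pixels."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second central moments (entries of the 2x2 covariance matrix).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Orientation of the eigenvector with the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    return (cx, cy), major, minor
```

For a vertical bar two pixels wide and ten pixels tall, the major axis comes out parallel to the y direction, up to sign.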
III. RESULTS AND DISCUSSION
A. Results
The medial skeleton proves a powerful tool in locating berry
stems with minimal information. One of the pairs of the three
key medial points generates a box containing the stem in 268
of the 286 berry images, for a success rate of 93.7%. In almost
all cases, the box is centered on the stem, or where the stem
would be if not occluded by a leaf or another berry. In the few
cases where none of the three medial boxes contains the stem,
either a box just misses the stem, indicating a poor contour
ratio threshold choice (Figure 9), or two of the key medial
points are selected from adjacent branches, indicating a failure
of the minimum enclosing circle method (Figure 10).
In comparison, the principal axis method correctly locates
the stem for only 77 berries, giving a 26.9% success rate.
See Table I. Note that in [7], the 100 berry images for
which a much higher success rate is reported were obtained
in a controlled laboratory environment where the berries had
already been detached from their plants.
For box selection, again the medial approach outperforms
the alternatives. Examining medial triangle side length sim-
ilarities produces the correct box for 190 of the 268 berry
images where one of the three output boxes contains the
stem, giving a success rate of 70.9%. In comparison, the best-
performing tree-ensemble classifier produced the correct box
for only 54.1% of the images. See Table I.
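The reported percentages follow directly from the counts given above, as the short check below shows. Note that the 70.9% figure is conditional on the 268 images for which one of the three boxes contains the stem; the end-to-end figure computed at the end (190 of all 286 images) is derived here for illustration and is not a result reported in this paper.

```python
def success_rate(correct, total):
    """Percentage of correct outputs, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


three_box = success_rate(268, 286)       # some medial box contains the stem
principal_axis = success_rate(77, 286)   # principal axis comparison method
one_box = success_rate(190, 268)         # correct box chosen, given a valid triple
end_to_end = success_rate(190, 286)      # derived: correct box over all images

print(three_box, principal_axis, one_box, end_to_end)
```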
We show a sampling of successful medial skeleton stem
boxes and original images in Figure 8.
            MP: 3-box   MP: 1-box   Tree   PA
% Correct   94%         71%         54%    27%
TABLE I
BOX SELECTION PERFORMANCE RESULTS: MEDIAL POINT GEOMETRY (MP)
WITH 3-BOX AND 1-BOX OUTPUTS, TREE-BASED CLASSIFIER, PRINCIPAL
AXIS (PA).
We found two main sources of error for the medial box
selection method: berries that were near-circular (Figure 11),
and unripe berries with misshapen segmentations resulting in
stray medial branches (Figure 12).
B. Discussion
Our geometric approach successfully locates stems of
berries for harvesting, outperforming current methodology
and statistical appearance-based approaches. One purpose for
exploring this approach is a desire to be more deliberate in
decisions about which solutions are most appropriate to which
problems. Machine learning schemes work tremendously well
for many problems, especially broad detection and recognition
problems, but other methods may be more appropriate in
constrained problems such as berry stem detection. Following
the spirit of Rissanen’s minimum description length principle
[15], the fact that our method requires only three medial points
to correctly bound a stem suggests that the medial approach
is appropriate in this instance.
An unanticipated benefit of our geometric approach is the
potential to identify unripe or misshapen berries using the three
medial key points. Unripe berries have a consistent lack of
symmetry in their point configurations. See Figure 12. Future
work will explore the possibility of classifying berries as
ripe/unripe, well-formed/misshapen based on the configuration
of the medial key points.
Fig. 8. Stem boxes produced using only three medial key points.
Many other fruit crops are harvested by stem severing. With
appropriate variations based on the geometric features of each
fruit, our methods are likely to succeed for these other classes
of fruits such as citrus, apples, and pears. Having developed an
understanding of the geometry of these other fruits, we plan
to explore the possibility of a large-scale fruit recognition and
stem detection system.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge Ron Rieger for intro-
ducing us to the berry stem identification problem, and the
National Science Foundation IIS-0954256 for funding our
work.
Fig. 9. Medial axis for berry where the algorithm chose too large a threshold
and pruned extreme points on the desired branch.
Fig. 10. Medial axis for berry where the algorithm selected suboptimal
medial points. The missing point is the endpoint of the upper left skeleton
branch.
REFERENCES
[1] Beucher, S. & Lantuéjoul, C. “Use of watersheds in contour detection.”
In International Workshop on Image Processing, Real-time Edge and
Motion Detection, 1979.
[2] H. Blum. “Biological shape and visual science.” J. Theor. Biol. 38:205-287,
1973.
[3] California AG Network. “2011 Strawberry Production Stable, Fresh-Market.”
June, 2011
(http://californiaagnet.com/pages/landing news?2011-Strawberry-Production-Stable-Fresh-=1&blockID=534771&feedID=2523).
[4] Carter, C. “Spring labor shortage cost Georgia almost $400 million in
lost crops and revenues.” The Produce News, Oct. 11, 2011.
Fig. 11. Medial axis for a round berry with incorrect box selection. Stem is
located to the L, box selected using medial points is toward lower R.
Fig. 12. Typical medial skeletons for unripe berries.
[5] J. N. Damon. “Determining the geometry of boundaries of objects from
medial data.” IJCV, 63(1):45-64, 2005.
[6] P. Giblin and B. Kimia. “On the intrinsic reconstruction of shape from
its symmetries.” IEEE Trans. on Pattern Anal. and Mach. Intell. 25(7):895-911,
2003.
[7] Guo F., Cao Q., & Nagata M. “Fruit detachment and classification
method for strawberry harvesting robot.” International Journal of
Advanced Robotic Systems, 5(1):41-48, 2008.
[8] D’Errico, J. “A suite of minimal bounding objects.” MATLAB Central
File Exchange, January 2012.
[9] Dey, T. & Zhao, W. “Approximating the medial axis from the Voronoi
diagram with a convergence guarantee.” Algorithmica, 38(1):179-200,
2002.
[10] Kapach, K., Barnea, E., Mairon, R., Edan, Y., & Ben-Shahar, O.
“Computer vision for fruit harvesting robots: state of the art and challenges
ahead.” Int. J. Computational Vision and Robotics, 3(1-2):4-34, 2012.
[11] K. Leonard. “An efficiency criterion for 2D shape model selection.” IEEE
CVPR Proc., 1:1289-1296, 2006.
[12] K. Leonard. “Efficient shape modeling: ε-entropy, adaptive coding, and
Blum’s medial axis versus the boundary curve.” Int. J. Comp. Vis.,
74:183-199, 2007.
[13] MacQueen, J. B. “Some methods for classification and analysis of
multivariate observations.” Proceedings of 5th Berkeley Symposium on
Mathematical Statistics and Probability. University of California Press,
pp. 281-297, 1967.
[14] Otsu, N. “A threshold selection method from grey-level histograms.”
IEEE Trans Sys Man Cyber 9(1):62-66, 1979.
[15] J. Rissanen. Stochastic complexity in statistical inquiry. World Scientific
Press, 1989.
[16] K. Siddiqi, B. Kimia. “A shock grammar for recognition.” Proceedings
of the Conference on Computer Vision and Pattern Recognition,
pp. 507-513, 1996.
[17] Sakai, H. & Sugihara, K. “A method for stable construction of medial
axes in figures.” Electronics and Communications in Japan, Part 2,
89(7):48-55, 2006.
[18] P. Yushkevich, P. Fletcher, S. Joshi, A. Thall, S. Pizer. “Continuous
medial representations for geometric object modeling in 2-D and 3-D.”
Proceedings of the 1st Generative Model-based Vision Workshop,
Copenhagen, DK, June, 2002.