All content in this area was uploaded by Kathryn Leonard on May 19, 2016

Minimal Geometric Representation and Strawberry Stem Detection

Kathryn Leonard, Rebecca Strawbridge, Danika Lindsay, Raquel Barata, Matthew Dawson, Lawrence Averion

Department of Mathematics

California State University, Channel Islands

Camarillo, CA USA

Email: kleonard.ci@gmail.com

Abstract—This paper takes a crucial step toward a visual

system for an automated strawberry harvester. We present an

algorithm based on the Blum medial axis that outputs for a

given berry image a bounding box containing the berry’s stem,

and determines minimal geometric information to do so. The

algorithm ﬁrst generates three potential boxes, then automatically

selects which of the three contains the stem. We compare the per-

formance of our geometric-based stem detection with two other

methods. The ﬁrst, implemented already for a berry harvesting

robot, relies on the principal axes of the berry shape to deﬁne the

bounding box. The second takes as input the three potential boxes

generated using the medial axis, then selects the one containing

the stem by computing geometric and appearance features within

each box for use in an ensemble classiﬁer of 250 trees boosted by

RUSBoost with a five-leaf minimum and a learning rate of 0.1. Note

that because our data is imbalanced we used class-proportional

sampling. Our geometric approach outperforms the other two

methods on a database of 286 strawberry images.

Index Terms—fruit harvesting; medial axis; computational

geometry; skeletal shape models

I. INTRODUCTION

This paper addresses two issues: (1) developing a geometry-

based strawberry stem-detection system, and (2) determining

minimal information required to perform that detection. As

outlined in [10], visual systems for harvesting robots are a

pressing but as yet unmet need. For example, in 2011 Georgia

lost $400 million from healthy crops dying in the ﬁelds due to

a dearth of seasonal labor [4]. This paper addresses strawberry

harvesting in particular. The US produced 2.85 billion pounds

of strawberries in 2011, making it the largest strawberry

producer in the world. 2.53 billion pounds were grown in

California alone [3]. Additionally, the strawberry pesticide methyl

bromide is quite toxic to humans.

Parallel to that industrial need is a recent theoretical interest

in efﬁcient (minimal) representations for shapes and shape

classes [5], [12], [18]. For example, [12] compares the efﬁ-

ciency of modeling shapes using the boundary curve and using

the Blum medial axis, determining geometric criteria for when

one is more efﬁcient than the other.

The utility of the Blum medial axis remains divisive for

the computer vision community, in large part because of its

sensitivity to noise and the difﬁculty of accurately extracting

shape boundaries from images. At the same time, the medial

axis provides an efﬁcient shape representation [11] while

encoding key geometric information from the boundary curve

[6], [16]. We demonstrate that the medial skeleton proves

extremely efﬁcient for encoding shape information for straw-

berries. Indeed, only three points on the skeleton are necessary

to determine stem location.

Our method takes as input an image containing a berry,

extracts the berry boundary and an associated medial skeleton,

identiﬁes three points on the skeleton capturing the coarse

berry shape, and then outputs three bounding boxes with one

containing the berry stem. Note that once we have extracted

the segmented berry, this process uses no appearance cues

from the image. The method relies on geometry alone.

To determine which of the three boxes contains the stem,

we then compare a hybrid appearance-geometry classiﬁcation

process with a process based solely on geometry. The pure

geometric approach wins handily.

Our method also outperforms a method based on the

principal axes of the berry described in [7], the only other

publication describing berry stem detection to our knowledge.

II. METHODOLOGY

Our strawberry image database derives from Google image

search and photographing berries in nearby ﬁelds. From an

initial collection of 321 berry images, we discarded 35 because

of failure to separate adjacent berries. See Figure 1 (top) for a

typical example of a discarded image. The remaining 286 berry

images are of varying quality (blurry iPhone to crisp

high-resolution), context (berry field to commercial photographic

set), and pose. See Figure 2. Several of these berries have

flawed segmentations, as can be seen in Figure 1 (bottom),

but were not discarded.

We segment each berry, select points on the interior Blum

medial axis of the boundary curve to generate three rectan-

gular bounding boxes as candidates to contain the stem, then

compare two methods for selecting one of the three boxes

as the one containing the stem. We also compare the medial

skeleton approach to a simpler approach using the principal

axis of the berries as in [7].

Our process consists of ﬁve steps, described in more detail

below: (A) Extraction of berry boundary, (B) Medial skele-

ton computation and simpliﬁcation, (C) Identiﬁcation of key

medial axis points, (D) Bounding box generation, (E) Box

selection.

Fig. 1. Top: An example of an image discarded from the dataset and its

segmentation. Bottom: An example of an undiscarded image despite ﬂawed

segmentation.

Fig. 2. A sampling of images from the strawberry dataset.

A. Extract Boundary

We segment the image using k-means clustering in the

L*a*b* color space with k = 3 [13]. For a typical berry

image, this produces clusters with red, green, and other pixels.

We select the cluster with the highest values of red. In the case

of images with multiple berries, we extract the largest berry

using the watershed method [1]. Having isolated the pixels

from a single berry, we convert to a binary image using Otsu’s

method [14], then ﬁll any holes. Finally, we shrink the berry

shape by one pixel and subtract from the original binary image

to obtain a boundary that is a single pixel wide. Coordinates

for the berry boundary correspond to pixel locations within

the image. See Figure 3.
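The final shrink-and-subtract step can be sketched in a few lines. This is a minimal illustration rather than the implementation used for the experiments: it assumes the filled berry mask is already available as a grid of 0/1 values, and the 4-neighbour erosion and the names `boundary_from_mask` and `boundary_coordinates` are our assumptions.

```python
def boundary_from_mask(mask):
    """Return a one-pixel-wide boundary of a filled binary shape.

    mask is a list of rows holding 0/1 values. A pixel survives the
    one-pixel shrink only if it and its four edge neighbours are all 1;
    pixels outside the grid count as background. The 4-neighbour
    structuring element is an assumption, not taken from the paper.
    """
    rows, cols = len(mask), len(mask[0])

    def is_set(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    offsets = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))
    eroded = [
        [1 if all(is_set(r + dr, c + dc) for dr, dc in offsets) else 0
         for c in range(cols)]
        for r in range(rows)
    ]
    # Boundary = original shape minus its shrunken copy.
    return [[mask[r][c] - eroded[r][c] for c in range(cols)]
            for r in range(rows)]


def boundary_coordinates(mask):
    """List the (row, col) pixel locations of the boundary."""
    boundary = boundary_from_mask(mask)
    return [(r, c)
            for r, row in enumerate(boundary)
            for c, value in enumerate(row) if value]
```

For a filled 5 x 5 square, this keeps the 16 pixels of the outer ring and drops the 3 x 3 interior.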

Fig. 3. Boundary extraction process. Top: Original image (L), red cluster

from k-means process (R). Bottom: red cluster with holes ﬁlled (L), border

(R).

B. Compute Medial Skeleton

The interior Blum medial axis for a continuous shape

boundary is a pair (m, r), where m is a collection of skeletal

curves composed of centers of maximal circles inscribed

within the shape boundary, and r gives the associated radii

[2]. For discrete shape boundaries, the interior medial skeleton

of a ﬁnite sample of boundary points can be approximated

by the interior Voronoi vertices or, equivalently, centers of the

circumcircles of the corresponding Delaunay triangulation [9].
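To make the discrete approximation concrete, the sketch below computes the circumcircle of a single triangle of boundary samples; its center is a candidate medial point and its radius the associated value of r. The helper name `circumcircle` is hypothetical, and we assume the Delaunay triangulation itself comes from a library routine (for example `scipy.spatial.Delaunay`), with only those centers that fall inside the shape retained for the interior skeleton.

```python
import math


def circumcircle(a, b, c):
    """Center and radius of the circle through 2-D points a, b, c.

    Raises ValueError when the three points are (numerically) collinear,
    since no finite circumcircle exists in that case.
    """
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("collinear points have no circumcircle")
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)
```

Applying this to every triangle of the triangulation yields the sampled pair (m, r) described above.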

Because the skeleton is known to be sensitive to pixelation

and other sources of noise, we introduce two simpliﬁcation

steps: extracting the medial skeleton of the convex hull of the

berry boundary, and applying the contour ratio to the resulting

skeleton. The contour ratio is the ratio (length of the

second-longest section of boundary) / (total length of boundary)

[17]. For a given threshold T, skeletal points with a contour

ratio smaller than T are discarded.

More precisely, suppose a point $m$ on the medial skeleton

corresponds to a medial circle containing boundary points

$\{\gamma(s_1), \ldots, \gamma(s_n)\} \subset \gamma$, the boundary curve, where

$s_1 < \cdots < s_n$ for $s$ an arclength parameter on $\gamma$. Let $L_\gamma$ be the

length of $\gamma$, and let $L_i = \int_{s_i}^{s_{i+1}} |\gamma'(s)|\,ds$ for $i = 1, \ldots, n-1$ and

$L_n = \int_{s_n}^{s_1} |\gamma'(s)|\,ds$ (taken around the closed curve) be the lengths of the segments of the

boundary associated to $m$. Permuting to order the lengths from

largest to smallest gives $L_{i_1} \geq \cdots \geq L_{i_n}$. The contour ratio

associated to the medial point $m$ is $C(m) = L_{i_2} / L_\gamma$. Note that

$C(m)$ takes on values between 0 and 1/2. The larger the value

of $C(m)$, the more the global structure of $\gamma$ depends on the

particular medial point $m$. Given a threshold $T$, we discard

medial points $m$ for which $C(m) < T$. See Figure 4.
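The contour ratio is straightforward to compute once the arclength positions of the tangency points are known. The sketch below is illustrative only: the function names `contour_ratio` and `prune` and the input format (a list of arclength positions per medial point) are our assumptions.

```python
def contour_ratio(tangent_params, total_length):
    """Contour ratio C(m) of one medial point.

    tangent_params: arclength positions s_1..s_n (n >= 2) at which the
        medial circle touches the closed boundary curve.
    total_length: total length of the boundary curve.
    """
    s = sorted(tangent_params)
    if len(s) < 2:
        raise ValueError("a medial circle touches the boundary at >= 2 points")
    # Segments between consecutive tangency points ...
    segments = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    # ... plus the segment that wraps around from s_n back to s_1.
    segments.append(total_length - s[-1] + s[0])
    segments.sort(reverse=True)
    # Second-longest segment divided by the total boundary length.
    return segments[1] / total_length


def prune(medial_points, ratios, threshold):
    """Keep the medial points whose contour ratio is at least threshold."""
    return [m for m, c in zip(medial_points, ratios) if c >= threshold]
```

For example, tangency points at arclength 1, 4, and 9 on a boundary of length 10 give segments of length 3, 5, and 2, so C(m) = 3/10.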

In practice, a single value for T rarely suffices for a database

of highly variable shapes. Three threshold values sufficed for

the berry shapes: T = 0, 0.01, 0.02. Our method automatically

selects a threshold for each image, as described in the next

section.

Fig. 4. Process for extracting key medial points. Left column (top to bottom):

original berry boundary, boundary of convex hull of berry, convex hull with

medial skeleton. Right column: key medial points with (top to bottom) T

= 0.0, 0.1, 0.2. The key medial points are detected by ﬁnding the extreme

points of the medial skeleton after pruning it using the contour ratio with

thresholds given by T.

Fig. 5. Triangle generated by key medial points p1, p2, p3. For

medial-based box selection, p1 will be assigned to the rightmost key point, defined

as the bottom of the strawberry. This choice of key medial points from the three

possibilities shown in Figure 4 is an automated process based on the pairwise

distances between the extreme medial points.

C. Identify Key Medial Axis Points

Each of the three choices of the contour ratio threshold T

generates a skeleton for a particular berry boundary. Within

each of these three skeletons associated to a berry we select

three points, implicitly generating an isosceles triangle that

captures the coarse shape of the berry. See Figure 5. We use

these points to select simultaneously a value for T and key

medial points in a two-step process: (1) for each value of

T, identify three key medial points {p1, p2, p3}, (2) evaluate

properties of the point triple to select a single T and the best

point triple.

The best-ﬁtting triangle generated by a particular skeleton

will be the one generated by key medial points as close as

possible to the primary maxima of curvature of the berry

boundary. We therefore select medial points on the minimum

enclosing circle of the medial skeleton [8], the smallest circle

containing all the medial points. If fewer than three medial

points lie on the minimum enclosing circle, we remove those

points and repeat the process on the remaining medial points

until we ﬁnd three points spaced sufﬁciently far apart. See

Figure 4, bottom row, for the key medial points for each of

the three values of T generated by a given berry shape.
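The extreme-point search can be sketched as follows. The paper uses the minimal bounding suite of [8]; the code here substitutes a brute-force minimum enclosing circle that is only practical for small point sets. The function names are hypothetical, accumulating the removed points as key points is our reading of the repeat step, and the "spaced sufficiently far apart" test is omitted because its criterion is not specified.

```python
import math
from itertools import combinations


def _circle_two(p, q):
    """Circle having segment pq as a diameter."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0), math.dist(p, q) / 2.0


def _circle_three(a, b, c):
    """Circumcircle of a, b, c, or None if the points are collinear."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1])
               + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (a2 * (b[1] - c[1]) + b2 * (c[1] - a[1]) + c2 * (a[1] - b[1])) / d
    uy = (a2 * (c[0] - b[0]) + b2 * (a[0] - c[0]) + c2 * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def min_enclosing_circle(points, eps=1e-9):
    """Smallest circle containing all points (needs at least two points).

    Brute force: the optimal circle is determined by two or three of the
    points, so every such candidate is tried.
    """
    candidates = [_circle_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_three(*triple)
        if circle is not None:
            candidates.append(circle)
    enclosing = [
        (center, r) for center, r in candidates
        if all(math.dist(center, p) <= r + eps for p in points)
    ]
    return min(enclosing, key=lambda circle: circle[1])


def extreme_medial_points(points, wanted=3, tol=1e-6):
    """Collect points lying on successive minimum enclosing circles."""
    remaining = list(points)
    found = []
    while len(found) < wanted and len(remaining) >= 2:
        center, r = min_enclosing_circle(remaining)
        on_circle = [p for p in remaining
                     if abs(math.dist(center, p) - r) <= tol]
        found.extend(on_circle)
        # Assumption: points already used are removed before repeating.
        remaining = [p for p in remaining if p not in on_circle]
    return found[:wanted]
```

A production version would likely use a linear-time method such as Welzl's algorithm in place of the exhaustive search.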

To select a single value for T, we assume that a berry’s

width will differ more from each of its lengths than the two

lengths will from each other. Measuring the lengths of the

sides of each of the three triangles, we select the value for

T that produces a triangle with a pair of sides most similar

in length. Simultaneously, we identify the point common

to the two similar sides as the bottom of the strawberry,

that is, the point opposite the stem. For the berry

depicted in Figure 4, we select T = 0, giving the triangle and

point labeling described in Figure 5.

Summary of Steps:

1) For each contour ratio threshold value, extract three

extreme points on the medial axis, generating a triangle.

2) For each triangle, compute the side lengths. Identify the

triangle with a pair of sides that are closest in length.

Discard all other triangles and thresholds.

3) Store the key medial points corresponding to the

remaining threshold value as {p1, p2, p3}, where p1 is the point

common to the two similar sides.
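The steps above can be sketched as a short selection routine. This is an illustration under stated assumptions: the function names are hypothetical, the input is a dictionary mapping each threshold to its point triple, and "closest in length" is measured by the absolute difference of side lengths (a ratio would be an equally plausible choice).

```python
import math


def most_similar_sides(triple):
    """Return (gap, apex_index) for one triangle.

    gap is the smallest absolute difference between two side lengths and
    apex_index is the vertex shared by those two sides. Every pair of
    sides shares exactly one vertex, so looping over vertices covers all
    three pairs.
    """
    best = None
    for apex in range(3):
        others = [triple[i] for i in range(3) if i != apex]
        gap = abs(math.dist(triple[apex], others[0])
                  - math.dist(triple[apex], others[1]))
        if best is None or gap < best[0]:
            best = (gap, apex)
    return best


def select_threshold(triples_by_threshold):
    """Pick the threshold whose triangle is closest to isosceles.

    Returns (T, (p1, p2, p3)) with p1 the point common to the two most
    similar sides, taken to be the bottom of the berry.
    """
    best_t = min(triples_by_threshold,
                 key=lambda t: most_similar_sides(triples_by_threshold[t])[0])
    triple = triples_by_threshold[best_t]
    _, apex = most_similar_sides(triple)
    p2, p3 = [triple[i] for i in range(3) if i != apex]
    return best_t, (triple[apex], p2, p3)
```

The returned ordering makes the later box choice immediate: the stem box is the one built on the pair (p2, p3).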

D. Generate Bounding Boxes

For each pair of points $\{p_i, p_j\}$ selected from $\{p_1, p_2, p_3\}$,

we generate a box perpendicular to $\overrightarrow{p_i p_j}$, centered at the

midpoint of $\overrightarrow{p_i p_j}$, of height $\alpha_1$ times the berry height and of

width $\alpha_2$ times the berry width. Intersecting the bounding box

with the complement of the strawberry region in the original

image produces the output of the algorithm. See Figure 6.

We find that $\alpha_1 = \alpha_2 = \tfrac{1}{2}$ works well in practice. This

process generates three bounding boxes potentially containing

the berry stem.
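A possible construction of one such box is sketched below. The text does not fix which box dimension runs along which direction, so this sketch assumes the box height extends along the normal to the segment joining the two points and the box width along the segment itself. The function name `stem_box` and its parameters are hypothetical, and the intersection with the complement of the berry region is not shown.

```python
import math


def stem_box(p_i, p_j, berry_height, berry_width, alpha1=0.5, alpha2=0.5):
    """Corners of a rectangle centered at the midpoint of p_i p_j.

    Assumption: the side of length alpha2 * berry_width runs along
    p_i p_j and the side of length alpha1 * berry_height runs along
    the normal to p_i p_j.
    """
    mx, my = (p_i[0] + p_j[0]) / 2.0, (p_i[1] + p_j[1]) / 2.0
    dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        raise ValueError("p_i and p_j must be distinct")
    ux, uy = dx / norm, dy / norm   # unit vector along p_i p_j
    nx, ny = -uy, ux                # unit normal to p_i p_j
    half_w = alpha2 * berry_width / 2.0
    half_h = alpha1 * berry_height / 2.0
    return [
        (mx + sx * half_w * ux + sy * half_h * nx,
         my + sx * half_w * uy + sy * half_h * ny)
        for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]
```

Calling the function on each of the three point pairs gives the three candidate boxes.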

E. Select Correct Box

1) Method 1: Medial Geometry: Given points $\{p_1, p_2, p_3\}$

defining a triangle where $\|p_1 - p_2\| \approx \|p_1 - p_3\|$, take $p_1$ to

be the bottom of the strawberry. We select the bounding box

perpendicular to $\overrightarrow{p_2 p_3}$.

2) Method 2: Feature-based Classiﬁcation: We compute

a feature vector for each bounding box, selecting from 7

features: ratios of the lengths of the sides of the triangle

deﬁned by the key medial points; entropy, skewness, and

variance of the gray-scale pixel values within the box; and

proportion of green within the box. Applying PCA, we train

a classiﬁer to identify the box with the stem.

Our box data is imbalanced, with around 80% of the boxes

not containing stems. We use half the data for training and

use class-proportional sampling. We implement an ensemble

of 250 trees, imposing a minimum of five leaves and a learning

rate of 0.1, boosted by RUSBoost. The best-performing feature

vector uses 6 of the 7 features, discarding the color measure.

Performance on the training set is 63% correct classification

for non-stem boxes and 75% correct for boxes with stems.

Fig. 6. Top: Boundary curve, medial skeleton, key medial points and output

box with $\alpha_1 = \tfrac{1}{2}$, $\alpha_2 = \tfrac{1}{4}$. Bottom: Original image and output box

containing stem.

Fig. 7. L: Major and minor axes of the berry shape intersecting at the

centroid. R: Output box containing the stem.

F. Comparison Method: Principal Axis

For comparison, we also generate stem bounding boxes

based on orientation of the berries’ major and minor axes and

the location of the berry centroid, following the procedure

outlined in [7]. The major axis is assumed to run the length

of the berry with the minor axis along the width. We measure

the span of white pixels from the major axis in the binary

berry image to ensure we have chosen the correct label (top,

bottom) for each end of the major axis. We generate a box

centered along the major axis with height $\beta_1$ times the length

of the major axis and width $\beta_2$ times the length of the minor axis.

We find that $\beta_1 = \beta_2 = \tfrac{1}{2}$ works well in practice. See Figure 7.
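For reference, one standard way to obtain a centroid and principal axes from a binary shape is through its second-order central moments, sketched below. Whether [7] computes its axes in exactly this way is an assumption on our part, and the function name `principal_axes` is hypothetical.

```python
import math


def principal_axes(pixels):
    """Centroid, major-axis angle, and variances along both axes.

    pixels: list of (x, y) coordinates of the shape's pixels.
    Returns ((cx, cy), theta, var_major, var_minor), where theta is the
    angle of the major axis in radians, defined modulo pi.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments (entries of the covariance matrix).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Orientation of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Closed-form eigenvalues of the 2 x 2 covariance matrix.
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    mean = (sxx + syy) / 2.0
    return (cx, cy), theta, mean + spread, mean - spread
```

The square roots of the two variances are proportional to the axis lengths, which is enough to orient and scale a box along the major axis.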

III. RESULTS AND DISCUSSION

A. Results

The medial skeleton proves a powerful tool in locating berry

stems with minimal information. One of the pairs of the three

key medial points generates a box containing the stem in 268

of the 286 berry images, for a success rate of 93.7%. In almost

all cases, the box is centered on the stem, or where the stem

would be if not occluded by a leaf or another berry. In the few

cases where none of the three medial boxes contains the stem,

either a box just misses the stem, indicating a poor contour

ratio threshold choice (Figure 9), or two of the key medial

points are selected from adjacent branches, indicating a failure

of the minimum enclosing circle method (Figure 10).

In comparison, the principal axis method correctly locates

the stem for only 77 berries, giving a 26.9% success rate.

See Table I. Note that in [7], the 100 berry images for

which a much higher success rate is reported were obtained

in a controlled laboratory environment where the berries had

already been detached from their plants.

For box selection, again the medial approach outperforms

the alternatives. Examining medial triangle side length sim-

ilarities produces the correct box for 190 of the 268 berry

images where one of the three output boxes contains the

stem, giving a success rate of 70.9%. In comparison, the best-

performing tree-ensemble classiﬁer produced the correct box

for only 54.1% of the images. See Table I.
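As a check on the arithmetic, the reported rates follow directly from the stated counts. Note that the 1-box rate is conditional on one of the three boxes containing the stem; the last line, the rate over all 286 images, is derived here from those counts rather than measured separately.

```latex
\begin{align*}
\text{MP, 3-box:} \quad & 268/286 \approx 0.937 \\
\text{PA:} \quad & 77/286 \approx 0.269 \\
\text{MP, 1-box (given a stem box exists):} \quad & 190/268 \approx 0.709 \\
\text{MP, 1-box (all images, derived):} \quad & 190/286 \approx 0.664
\end{align*}
```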

We show a sampling of successful medial skeleton stem

boxes and original images in Figure 8.

             MP: 3-box   MP: 1-box   Tree   PA

% Correct    94%         71%         54%    27%

TABLE I

BOX SELECTION PERFORMANCE RESULTS: MEDIAL POINT GEOMETRY (MP) WITH 3-BOX AND 1-BOX OUTPUTS, TREE-BASED CLASSIFIER (TREE), PRINCIPAL AXIS (PA).

We found two main sources of error for the medial box

selection method: berries that were near-circular (Figure 11),

and unripe berries with misshapen segmentations resulting in

stray medial branches (Figure 12).

B. Discussion

Our geometric approach successfully locates stems of

berries for harvesting, outperforming current methodology

and statistical appearance-based approaches. One purpose for

exploring this approach is a desire to be more deliberate in

decisions about which solutions are most appropriate to which

problems. Machine learning schemes work tremendously well

for many problems, especially broad detection and recognition

problems, but other methods may be more appropriate in

constrained problems such as berry stem detection. Following

the spirit of Rissanen’s minimum description length principle

[15], the fact that our method requires only three medial points

to correctly bound a stem suggests that the medial approach

is appropriate in this instance.

An unanticipated beneﬁt of our geometric approach is the

potential to identify unripe or misshapen berries using the three

medial key points. Unripe berries have a consistent lack of

symmetry in their point conﬁgurations. See Figure 12. Future

work will explore the possibility of classifying berries as

ripe/unripe, well-formed/misshapen based on the conﬁguration

of the medial key points.

Fig. 8. Stem boxes produced using only three medial key points.

Many other fruit crops are harvested by stem severing. With

appropriate variations based on the geometric features of each

fruit, our methods are likely to succeed for these other classes

of fruits such as citrus, apples, and pears. Having developed an

understanding of the geometry of these other fruits, we plan

to explore the possibility of a large-scale fruit recognition and

stem detection system.

ACKNOWLEDGEMENTS

The authors gratefully acknowledge Ron Rieger for intro-

ducing us to the berry stem identiﬁcation problem, and the

National Science Foundation IIS-0954256 for funding our

work.

Fig. 9. Medial axis for berry where the algorithm chose too large a threshold

and pruned extreme points on the desired branch.

Fig. 10. Medial axis for berry where the algorithm selected suboptimal

medial points. The missing point is the endpoint of the upper left skeleton

branch.

REFERENCES

[1] Beucher, S. & Lantuéjoul, C. “Use of watersheds in contour detection.” In International Workshop on Image Processing, Real-time Edge and Motion Detection, 1979.

[2] H. Blum. “Biological shape and visual science.” J. Theor. Biol. 38:205-287, 1973.

[3] California AG Network. “2011 Strawberry Production Stable, Fresh-Market.” June, 2011 (http://californiaagnet.com/pages/landing news?2011-Strawberry-Production-Stable-Fresh-=1&blockID=534771&feedID=2523).

[4] Carter, C. “Spring labor shortage cost Georgia almost $400 million in lost crops and revenues.” The Produce News, Oct. 11, 2011.

Fig. 11. Medial axis for a round berry with incorrect box selection. Stem is

located to the L, box selected using medial points is toward lower R.

Fig. 12. Typical medial skeletons for unripe berries.

[5] J. N. Damon. “Determining the geometry of boundaries of objects from medial data.” IJCV, 63(1):45-64, 2005.

[6] P. Giblin and B. Kimia. “On the intrinsic reconstruction of shape from its symmetries.” IEEE Trans. on Pattern Anal. and App. 25(7):895-911, 2003.

[7] Guo F., Cao Q., & Nagata M. “Fruit detachment and classification method for strawberry harvesting robot.” International Journal of Advanced Robotic Systems, 5(1):41-48, 2008.

[8] D’Errico, J. “A suite of minimal bounding objects.” MATLAB Central File Exchange, January 2012.

[9] Dey, T. & Zhao, W. “Approximating the medial axis from the Voronoi diagram with a convergence guarantee.” Journal Algorithmica, 38(1):179-200, 2002.

[10] Kapach, K., Barnea, E., Mairon, R., Edan, Y., & Ben-Shahar, O. “Computer vision for fruit harvesting robots: state of the art and challenges ahead.” Int. J. Computational Vision and Robotics, 3(1-2):4-34, 2012.

[11] K. Leonard. “An efficiency criterion for 2D shape model selection.” IEEE CVPR Proc., 1:1289-1296, 2006.

[12] K. Leonard. “Efficient shape modeling: ε-entropy, adaptive coding, and Blum’s medial axis versus the boundary curve.” Int. J. Comp. Vis., 74:183-199, 2007.

[13] MacQueen, J. B. “Some methods for classification and analysis of multivariate observations.” Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, pp. 281-297, 1967.

[14] Otsu, N. “A threshold selection method from grey-level histograms.” IEEE Trans Sys Man Cyber 9(1):62-66, 1979.

[15] J. Rissanen. Stochastic complexity in statistical inquiry. World Scientific Press, 1989.

[16] K. Siddiqi, B. Kimia. “A shock grammar for recognition.” Proceedings of the Conference on Computer Vision and Pattern Recognition, p. 507-513, 1996.

[17] Sakai, H. & Sugihara, K. “A method for stable construction of medial axes in figures.” Electronics and Communications in Japan, Part 2, 89(7):48-55, 2006.

[18] P. Yushkevich, P. Fletcher, S. Joshi, A. Thall, S. Pizer. “Continuous medial representations for geometric object modeling in 2-D and 3-D.” Proceedings of the 1st Generative Model-based Vision Workshop, Copenhagen, DK, June, 2002.