SOFTWARE
Management Zone Analyst (MZA): Software for Subfield
Management Zone Delineation
Jon J. Fridgen, Newell R. Kitchen,* Kenneth A. Sudduth, Scott T. Drummond,
William J. Wiebold, and Clyde W. Fraisse
J.J. Fridgen, ITD/Spectral Visions, 20407 South Neil Street Suite 2, Champaign, IL 61820; N.R. Kitchen, K.A. Sudduth, and S.T. Drummond, USDA–ARS, Cropping Syst. and Water Quality Res. Unit, Columbia, MO 65211; W.J. Wiebold, Dep. of Agron., University of Missouri, Columbia, MO 65211; and C.W. Fraisse, Agric. and Biol. Eng. Dep., Univ. of Florida, Gainesville, FL 32611. Received 22 Aug. 2002. *Corresponding author (kitchenn@missouri.edu).
Published in Agron. J. 96:100–108 (2004).
American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA
Abbreviations: ECa, apparent soil electrical conductivity; FPI, fuzziness performance index; GIS, geographic information systems; ISODATA, Iterative Self-Organizing Data Analysis Technique; MZA, Management Zone Analyst; NCE, normalized classification entropy; SSCM, site-specific crop management.
ABSTRACT
Producers using site-specific crop management (SSCM) have a need for strategies to delineate areas within fields to which management can be tailored. These areas are often referred to as management zones. Quick and automated procedures are desirable for creating management zones and for testing the question of the number of zones to create. A software program called Management Zone Analyst (MZA) was developed using a fuzzy c-means unsupervised clustering algorithm that assigns field information into like classes, or potential management zones. An advantage of MZA over many other software programs is that it provides concurrent output for a range of cluster numbers so that the user can evaluate how many management zones should be used. Management Zone Analyst was developed using Microsoft Visual Basic 6.0 and operates on any computer with Microsoft Windows (95 or newer). Concepts and theory behind MZA are presented as are the sequential steps of the program. Management Zone Analyst calculates descriptive statistics, performs the unsupervised fuzzy classification procedure for a range of cluster numbers, and provides the user with two performance indices [fuzziness performance index (FPI) and normalized classification entropy (NCE)] to aid in deciding how many clusters are most appropriate for creating management zones. Example MZA output is provided for two Missouri claypan soil fields using soil electrical conductivity, slope, and elevation as clustering variables. Management Zone Analyst performance indices indicated that one field should be divided into either two (using NCE) or four (using FPI) management zones and the other field should be divided into four (using NCE or FPI) management zones.
Site-specific crop management promotes the concept of identification and management of regions within the geographic area defined by field boundaries. Often referred to as management zones, these subfield regions typically represent areas of the field that are similar based on some quantitative measure(s) (e.g., topography, yield, and soil-test nutrients). Determination of subfield areas is difficult due to the complex combination of soil, biotic, and climate factors that may affect crop yield. These factors dynamically interact, further complicating the decisions of how to manage by zones. Three questions typically arise when considering managing by zones. One, what information should be used as a basis for creating different management zones within a field? Two, how can information be processed into unique management units (i.e., procedures for classification)? And three, how many unique zones should a field be divided into? Quick, efficient, and automated procedures are needed that address these questions.
A number of information sources have been used to delineate subfield management zones for SSCM. Traditional soil surveys often provide estimates of crop productivity for each soil map unit. In the USA, county soil surveys report the average yield of major crops and various soil properties by soil map unit; but the spatial scale of county soil surveys has often been found inadequate for use in SSCM (Mausbach et al., 1993). Digital elevation data collected using global positioning systems (GPS) or total station surveys have been used for classifying a field into management zones (McCann et al., 1996; Lark, 1998; MacMillan et al., 1998; van Alphen and Stoorvogel, 1998). Fleming et al. (2000) used aerial photographs of bare soil along with landscape position and the management experience of the producer to delineate within-field management zones. Because of the relationship of bulk soil apparent electrical conductivity (ECa) to productivity on some soils (Kitchen et al., 1999, 2003), it has been used in the delineation of management units. Sudduth et al. (1996) and Fraisse et al. (2001a) used a combination of topographic attributes and soil ECa to delineate management zones. Long et al. (1994) investigated the accuracy and precision of field management maps created from several sources [e.g., soil survey map, aerial photograph, overlaying class values of kriged point data in a geographic information system (GIS)]. They concluded that aerial photographs of growing crops were the most accurate and precise for classifying a field into management units to predict grain yield. Imagery of a growing crop and yield data collected in the same year would be highly correlated and thus an accurate representation of crop production potential for that specific year (Boydell and McBratney, 1999).
Delineating zones based on topographic attributes and/or soil physical properties most often captures yield variability due to differences in plant available water and thus, crop production potential (McCann et al., 1996; van Alphen and Stoorvogel, 1998; Fraisse et al., 2001a).
FRIDGEN ET AL.: SOFTWARE FOR MANAGEMENT ZONE DELINEATION 101
The appropriate number of productivity zones was found to vary from year to year and was primarily affected by weather and the crop planted (Fraisse et al., 2001a). Fewer zones were required if sufficient moisture was received during the growing season or crops more tolerant to water stress were grown.
Although within-field management zones are often used to represent areas of equal production potential, they do have other uses. Fleming et al. (2000) suggested that producer-developed management zones based on fertility could be effective in creating variable-rate application maps. MacMillan et al. (1998) concluded that management zones are well suited for locating benchmark soil-sampling sites. Small, spatially coherent areas within fields may also be useful in relating yield to soil and topographic parameters for crop-modeling evaluation (Fraisse et al., 2001b).
In SSCM applications, cluster analysis procedures have been effectively used to identify regions of a field that are similar based on landscape attributes, fertility, or soil physical properties (Fraisse et al., 2001a). Stafford et al. (1998) used fuzzy clustering of combine yield monitor data to divide a field into potential management zones. Similarly, Boydell and McBratney (1999) divided a field into management zones using cotton (Gossypium hirsutum L.) yield estimates from satellite imagery. They concluded, however, that due to the effects of weather-induced variability on crop growth, at least 5 yr of yield data were required to create a stable number of management zones.
While a great deal of knowledge has been gained through analysis and interpretation of spatial data, procedures for generating management zones are not well prescribed for the end-user. Although many GIS packages contain the functions necessary for transforming spatial information into potential management zones, they can be cumbersome to use and require considerable time to learn. Producers have indicated a need for software to help them with decision-making and planning for variable within-field management (Wiebold et al., 1998). This paper reports on the development of software to assist researchers, consultants, and producers in creating management zones using quantitative soil, crop, and/or site information. The software, MZA, provides systematic procedures for delineating within-field management zones and for testing the results to evaluate the question of how many zones to use in a given field. Here we report on the theoretical basis of MZA, outline the steps and equations used, provide an example of MZA output for two fields, and list software specifications and availability.
THEORETICAL BASIS FOR MANAGEMENT ZONE ANALYST DEVELOPMENT
Cluster analysis is the grouping of similar individuals into distinct classes called clusters. Literature reviews have shown many algorithm options for clustering data (Tou and Gonzalez, 1974; Hartigan, 1975), but there is no unified theory of clustering that is widely accepted (Milligan, 1996).
In supervised clustering, a combination of fieldwork, maps, aerial imagery, and personal experience are used to characterize specific sites that can then be used to represent the whole area (Mausel et al., 1990). These sites are referred to as training sites because the soil or vegetation reference information is used to train the classification algorithm for mapping the remainder of the data. After calculating multivariate statistical parameters for each training site, every data point both inside and outside the training sites is evaluated and assigned to the class for which it has the greatest likelihood of being a member (Jensen, 1996). We assumed the user may have no a priori knowledge of what information or areas should be used for training and therefore excluded supervised clustering as an option for MZA.
Unlike supervised clustering techniques, unsupervised clustering algorithms do not require the user to specify training areas. Unsupervised classification techniques produce natural groupings of the data in attribute space (Jensen, 1996; Irvin et al., 1997). Often, unsupervised classification is used to gain insight into the inherent structure of the data. The Iterative Self-Organizing Data Analysis Technique (ISODATA) (Tou and Gonzalez, 1974) is one of the more widely used unsupervised clustering algorithms. The ISODATA unsupervised classification algorithm calculates class means evenly distributed in the data space and then iteratively clusters the data points by minimizing the Euclidean distance from each data point to a class mean. Each iteration results in recalculation of class means and reclassification of data points with respect to the new means. This process continues until either the maximum number of iterations is reached or the number of data points in each class changes by less than the specified change threshold (ESRI, 1994; RSI, 1999). To effectively characterize output classes by mean vectors and a covariance matrix, ISODATA requires each variable in the data set to exhibit a roughly Gaussian distribution. Additionally, better results are obtained if all data exhibit similar variances (ESRI, 1994; Fraisse et al., 2001a; Irvin et al., 1997). These two requirements may require additional data set preparation before classification.
Unlike the ISODATA algorithm, the c-means (also known as k-means) algorithm does not require variables used in the classification to have similar variances or to follow a Gaussian distribution (Irvin et al., 1997). The c-means algorithm is based on the minimization of an objective function defined as the sum of squared distances from all data points in the cluster domain to the cluster center (i.e., the centroid). Similar to ISODATA, the c-means algorithm uses an iterative process to recalculate the cluster means and assign data points to clusters. The algorithm terminates when the specified convergence criterion (i.e., the amount of change in the cluster means) is met (Tou and Gonzalez, 1974).
Zadeh (1965) introduced the theory of fuzzy sets as a generalization of conventional set theory. Unlike conventional set theory, which allows an individual to belong to only one set, fuzzy set theory allows individuals
102 AGRONOMY JOURNAL, VOL. 96, JANUARY–FEBRUARY 2004
to exhibit partial membership in each of a number of sets. Using this theory, Ruspini (1969) introduced the concept of fuzzy clustering, which allows any given data point to exhibit partial membership in a given class. The application of fuzzy set theory to clustering algorithms has allowed researchers to better account for the continuous variability in natural phenomena (Burrough, 1989).
One of the more extensively used clustering algorithms is the fuzzy c-means (also known as fuzzy k-means) algorithm. Fuzzy c-means uses a weighting exponent to control the degree to which membership sharing occurs between classes (Bezdek, 1981). Fuzzy c-means classification has been used to classify soil and landscape data (Burrough et al., 1992; McBratney and DeGruijter, 1992; Odeh et al., 1992; Irvin et al., 1997), yield data (Lark and Stafford, 1997; Lark, 1998; Stafford et al., 1998), and remotely sensed images (Ahn et al., 1999; Boydell and McBratney, 1999). Justification for using this algorithm when using soil information in the classification process has been documented (Odeh et al., 1992). We chose the fuzzy c-means algorithm for MZA, believing that many who would use the software would rely on continuum-based soil and landscape information as clustering inputs.
Identifying a Measure of Similarity
Before a data cluster can be formed using the fuzzy c-means clustering procedure, it is necessary to establish an appropriate measure of similarity for assigning individual observations to a particular cluster. A measure of similarity is a procedure used to determine how similar an observation is to a cluster center. The measure of similarity most commonly used is a normalized distance from an observation to the cluster mean in attribute space (Tou and Gonzalez, 1974; Johnson, 1998). Thus, as the distance between the observation and cluster mean decreases, the similarity between the two increases. Euclidean distance, one of the more frequently used measures of similarity, gives equal weight to all measured variables and is sensitive to correlated variables (Bezdek, 1981). Geometrically, Euclidean distance generates clusters having a spherical shape, which in reality, rarely occurs in a soil system (Odeh et al., 1992). Johnson (1998) described a variant of the Euclidean distance method known as standardized Euclidean distance. This procedure computes the standard Euclidean distance between points using their standardized Z scores.
The diagonal-distance method for measuring similarity is described by McBratney and Moore (1985) and Odeh et al. (1992). Like Euclidean distance, the diagonal method is sensitive to correlated variables. However, it does compensate for distortions in the assumed spherical shape of the clusters by weighting with the variances of the measured variables.
Another alternative is the Mahalanobis distance, which accounts for unequal variances as well as correlations between variables. It accomplishes this by including the pooled within-class variance–covariance matrix as an integral part of the distance calculation (Bezdek, 1981; McBratney and Moore, 1985; Odeh et al., 1992).
MANAGEMENT ZONE ANALYST PROGRAM DESCRIPTION
Microsoft Visual Basic (Microsoft Corp., Redmond, WA) was used to develop MZA. Figure 1 depicts the decision structure and information flow through the software. Management Zone Analyst’s function is the calculation of descriptive statistics, the delineation of management zones using the fuzzy c-means unsupervised classification algorithm, and the evaluation of the performance of the clustering by the number of clusters. To maximize utility with other software, MZA was designed to work with comma-delimited ASCII (the American Standard Code for Information Interchange) text files. Files of this format are easily exported from database, spreadsheet, graphics, or GIS software. The first line of the file must contain the variable names (i.e., column names), also separated by commas. There is no set maximum for the number of observations or variables in the data file. However, memory requirements increase as the number of observations and the number of variables increase. The analyses performed by the software have been tested with as many as 20 variables and 36 000 observations. Any quantitative data may be used in MZA. Sources of data may include, but are not limited to, soil properties (e.g., soil electrical conductivity), topographic attributes (e.g., elevation, slope, curvature), soil fertility, yield, and remotely sensed imagery.
Calculation of Descriptive Statistics
Descriptive univariate and multivariate statistics of input variables are calculated before the clustering analysis is performed. Computations include the minimum and maximum values for each variable, the mean, standard deviation, coefficient of variation, between variable variance–covariance matrix, and correlation matrix. The variance–covariance matrix is provided for use in choosing the measure of similarity (explained in detail later).
Fuzzy c-Means Background
The fuzzy c-means clustering algorithm was selected for the purpose of partitioning $n$ data observations in feature space into $c$ groups or clusters. “Fuzzy” refers to the shared membership between classes (Ruspini, 1969). There are three primary matrices involved in the clustering process. First, there is the data we want to classify, the data matrix $Y$, consisting of $n$ observations with $p$ classification variables each. Second is the cluster centroid matrix $V$, consisting of $c$ cluster centroids located in the feature space defined by the $p$ classification variables. Finally, there is the fuzzy membership matrix $U$, consisting of membership values to every cluster in $V$ for each observation in $Y$, bounded by the constraints for all $i = 1$ to $c$ and all $k = 1$ to $n$ that:
$$u_{ik} \in [0, 1], \; \forall i, k \quad \text{and} \quad \sum_{i=1}^{c} u_{ik} = 1, \; \forall k \qquad [1]$$
The fuzzy c-means clustering algorithm attempts to locate minimal solutions to a selected objective function. The most commonly used objective function (and the
one used in MZA) is the weighted within-groups sum of squared errors objective function (Bezdek, 1981):
$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m (d_{ik})^2 \qquad [2]$$
where $m$ = fuzziness exponent ($1 \le m < \infty$) and $(d_{ik})^2$ = the squared distance in feature space between $y_k$ and $v_i$. The fuzziness exponent ($m$) controls the amount of membership sharing that occurs between classes. As $m$ increases toward infinity, the amount of membership sharing increases, and the resulting classes become less distinct. Hard clusters (i.e., no membership sharing) occur as $m$ approaches a value of 1.
Fig. 1. Diagram of information flow through Management Zone Analyst (MZA) software. Dashed lines represent user input.
The distance in feature space between an observation in $Y$ and a cluster centroid in $V$ can be calculated in the following manner:
$$(d_{ik})^2 = \lVert y_k - v_i \rVert^2 = (y_k - v_i)' A \, (y_k - v_i) \qquad [3]$$
where $y_k$ = the data observation $k$, consisting of $p$ classification variables; $v_i$ = the centroid of cluster $i$, consisting of $p$ classification variables; and $A$ = positive definite, norm-inducing weight matrix of size $p \times p$. The weight matrix $A$ defines the distance-normalizing procedure. The result represents the distance between two points (vectors are used synonymously with points) in a linear vector space (Brogan, 1985). In the MZA software, the weight matrix may have three different forms depending on the covariance structure of the data. Management Zone Analyst provides the information necessary to select the proper measure of similarity in the descriptive statistics calculated before the delineation process.
The Euclidean distance should be used only for statistically independent variables exhibiting equal variances. In this case, $A$ is the $p \times p$ identity matrix. This condition is rarely met in practice as even a change in the units of the classification variables will affect variances. A diagonal distance, whereby the identity matrix is adjusted by dividing each row of the matrix by the variance of the related classification variable, can address this problem. The diagonal distance is appropriate for statistically independent classification variables with unequal variances. The third alternative is the Mahalanobis distance where $A$ is defined as the inverse of the $p \times p$ sample variance–covariance matrix of $Y$. The Mahalanobis distance, while slightly more computationally intensive, can account for situations where $Y$ contains statistically dependent classification variables with unequal variances (Bezdek, 1981; Odeh et al., 1992). While all three distance methods are provided as options in MZA, a user relying on soil and landscape data will likely find the Mahalanobis option to be the most appropriate once the variance–covariance matrix is examined (Odeh et al., 1992).
Cluster Performance Indices
While the iterative fuzzy c-means algorithm always converges to a local minimum of $J_m$ starting from a given
initial $U$, a different randomization of $U$ might lead to a different local (or global) minimum (Xie and Beni, 1991; Bezdek, 1981; Ahn et al., 1999). To evaluate the characteristics of clustering by the number of clusters, two types of cluster validity functions were calculated on each fuzzy c-partition of $Y$ produced by the fuzzy c-means clustering algorithm.
The fuzziness performance index (FPI) (Odeh et al., 1992; Boydell and McBratney, 1999) is a measure of the degree of separation (i.e., fuzziness) between fuzzy c-partitions of $Y$ and is defined as:
$$\mathrm{FPI} = 1 - \frac{c}{(c - 1)} \left[ 1 - \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^2 \Big/ n \right] \qquad [4]$$
Values of FPI may range from 0 to 1. Values approaching 0 indicate distinct classes with little membership sharing while values near 1 indicate nondistinct classes with a large degree of membership sharing.
Bezdek (1981) described a second measure of cluster validity known as the normalized classification entropy (NCE). The NCE models the amount of disorganization of a fuzzy c-partition of $Y$ (Odeh et al., 1992; Lark and Stafford, 1997). The classification entropy ($H$) is defined by the function:
$$H(U; c) = - \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik} \log_a(u_{ik}) \Big/ n \qquad [5]$$
where logarithmic base $a$ is any positive integer. Values of $H$ will range from 0 to $\log_a(c)$. Bezdek (1981) reported that the endpoints of the range of $H$ do not accurately represent the amount of disorganization present (i.e., at $c = 1$, $H = 0$; at $c = n$, $H = 0$). To remedy this issue, he suggested the NCE:
$$\mathrm{NCE} = H(U; c) \, / \, [1 - (c/n)] \qquad [6]$$
The values of NCE will be similar to those of $H$ when $c$ is relatively small compared with $n$ [i.e., $(c/n)$ approaching 0]. However in situations where $(c/n)$ is large (i.e., approaching 1), NCE will produce substantially different results.
Summary of Management Zone Analyst Steps
The algorithmic structure of the iterative fuzzy c-means algorithm (Bezdek, 1981) is
1. Choose the number of clusters $c$, with $2 \le c < n$.
2. Choose the fuzziness exponent $m$, with $1 \le m < \infty$.
3. Choose an appropriate measure of similarity for the distance metric $d_{ik}^2$.
4. Choose a value for the stopping criterion $\varepsilon$.
5. Choose a value for the maximum number of iterations $l_{max}$.
6. Initialize $U^{(0)}$ with random values meeting the specified constraints.
7. At iteration $l = 1, 2, 3, \ldots$, calculate updated $V^{(l)}$ from $U^{(l-1)}$, using:
$$v_i = \sum_{k=1}^{n} (u_{ik})^m y_k \Big/ \sum_{k=1}^{n} (u_{ik})^m, \quad 1 \le i \le c \qquad [7]$$
8. Calculate updated $U^{(l)}$ from updated $V^{(l)}$, using:
$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1} \qquad [8]$$
9. Stop when $l_{max}$ is reached or when $\lVert U^{(l)} - U^{(l-1)} \rVert \le \varepsilon$; otherwise go to Step 7.
10. Compute the cluster validity functions (FPI and NCE).
MANAGEMENT ZONE ANALYST RESULTS FOR TWO CLAYPAN SOIL FIELDS
To illustrate MZA results, soil and landscape information for two Missouri claypan soil fields were used for creating potential management zones. Georeferenced measurements of soil ECa, elevation, and slope (from elevation) were measured for a 36- and a 14-ha field (Field 1 and Field 2, respectively), kriged, and then gridded to a common 10-m cell as described for the Missouri field in Kitchen et al. (2003). This gave 3634 observations for Field 1 and 1308 observations for Field 2. Data files for each field were saved as a comma-delimited text file with the first row as labels for the columns, as shown in Table 1 (commas omitted for table). After importing a data file to MZA, the variables soil ECa, elevation, and slope (Fig. 2 and 3) were selected as the clustering variables. These variables had previously been useful in delineating productivity zones for claypan soil fields (Fraisse et al., 2001a). MZA was then used to calculate descriptive statistics for the input variables. For both fields, the classification variables were found to have unequal variances and non-zero covariances; thus, the Mahalanobis measure of similarity option was chosen for the delineation procedure. Other option settings were fuzziness exponent = 1.5, maximum number of iterations = 300, convergence criterion = 0.0001, minimum number of zones = 2, and maximum number of zones = 8. When clustering using soils data, setting the fuzziness exponent between 1.2 and 1.5 will give reasonable results (Odeh et al., 1992).
Table 1. The header row and first 12 rows of an example input data set for Management Zone Analyst (MZA) file (with comma delineation removed).
Easting   Northing   Elevation   Soil EC†   Slope
576000    4342200    264.8       71.5       0.56
576010    4342200    264.9       94.8       0.56
576020    4342200    264.9       98.1       0.27
576030    4342200    265.0       44.1       0.13
576040    4342200    265.0       42.1       0.11
576050    4342200    265.0       44.9       0.02
576060    4342200    265.0       43.5       0.1
576070    4342200    265.0       43.6       0.09
576080    4342200    265.0       41         0.18
576090    4342200    265.0       36.6       0.14
576100    4342200    264.9       36.6       0.02
576110    4342200    265.0       34.8       0.16
...       ...        ...         ...        ...
† EC, electrical conductivity.
After the clustering process, the results were saved by MZA into a new text file as shown in Table 2. The output file is the same as the input file (Table 1) but
with appended columns for zone delineations from two zones to eight zones (shown as header labels of c2, c3, . . . c8). The exported data file from each field was imported into a mapping program and maps created for the two-, four-, and six-zone columns (bottom row of maps in Fig. 2 and 3).
Fig. 2. (Top) Apparent soil electrical conductivity (ECa), elevation, and slope used as Management Zone Analyst (MZA) clustering variables for Field 1 and (bottom) MZA output for two, four, and six clusters.
The final step within MZA is a graphical representation of the FPI and NCE performance indices relative to cluster number to visually assess the optimal cluster number, similar to the approaches taken by others (Boydell and McBratney, 1999; Fraisse et al., 2001a). The optimal number of clusters for each computed index is when the index is at the minimum, representing the least membership sharing (FPI) or greatest amount of organization (NCE) as a result of the clustering process. Management Zone Analyst also allows exporting the performance index data for graphing in other software. Results from the two indices were graphed for the two fields in Fig. 4. The minimum FPI was obtained for both fields with about four clusters. The minimum NCE was obtained with two clusters for Field 1 and with four clusters for Field 2. The final decision of how many clusters to use for creating management zones when the performance indices are dissimilar may require additional verification. For example, when developing productivity zones, verification of cluster number might be accomplished by comparing the within-zone yield variation as one increases the number of clusters (Fraisse et al., 2001a). Also, by comparing MZA output using different input variables, a user can assess which variables are most important for creating management zones.
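The two performance indices and the minimum-index rule described above can be sketched in a few lines of Python. This is a minimal illustration and not MZA's Visual Basic code: the function names `fpi`, `nce`, and `best_c` are hypothetical, NumPy is assumed, the logarithm base is taken as e, and the index values passed to `best_c` are made up. One assumption deserves particular notice. Evaluated literally, Eq. [4] returns 1 rather than 0 for a hard partition, so the sketch instead assumes the form FPI = 1 − (cF − 1)/(c − 1), with F the mean of the squared memberships per observation. That form gives 0 for a hard partition and 1 for uniform membership, in agreement with the interpretation stated in the text, but it should be checked against Odeh et al. (1992) before being relied on.

```python
import numpy as np

def fpi(U):
    """Fuzziness performance index for a c x n membership matrix U.

    Assumed form: 1 - (c*F - 1)/(c - 1), where F is the partition
    coefficient (sum of squared memberships divided by n).
    """
    c, n = U.shape
    F = np.sum(U ** 2) / n
    return 1.0 - (c * F - 1.0) / (c - 1.0)

def nce(U):
    """Normalized classification entropy (Eq. [5] and [6]), natural log."""
    c, n = U.shape
    # Replace zeros by 1 inside the log so that 0*log(0) contributes 0.
    safe = np.where(U > 0, U, 1.0)
    H = -np.sum(U * np.log(safe)) / n
    return H / (1.0 - c / n)

def best_c(index_by_c):
    """Return the cluster number whose index value is smallest."""
    return min(index_by_c, key=index_by_c.get)

# Hand-built memberships for n = 4 observations and c = 2 clusters.
hard = np.array([[1.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]])   # no membership sharing
uniform = np.full((2, 4), 0.5)            # maximal membership sharing

print(fpi(hard), nce(hard))
print(fpi(uniform), nce(uniform))
print(best_c({2: 0.30, 3: 0.22, 4: 0.10, 5: 0.16}))  # made-up index values
```

When the two indices reach their minima at different cluster numbers, as for Field 1, `best_c` applied to each index separately simply reports both candidates; the choice between them still calls for the additional verification discussed above.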
Fig. 3. (Top) Apparent soil electrical conductivity (ECa), elevation, and slope as MZA clustering variables for Field 2 and (bottom) MZA output for two, four, and six clusters.
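Zone maps such as those in Fig. 2 and 3 result from the ten-step fuzzy c-means procedure summarized earlier. The Python sketch below walks through Steps 6 to 9 with the three weight-matrix options of Eq. [3]. It is an illustration under stated assumptions, not the MZA implementation: the function names are hypothetical, NumPy is assumed, and the defaults (m = 1.5, 300 iterations, criterion 0.0001) are the option settings reported for the two example fields. The text does not name the norm used in the stopping rule, so the sketch assumes the largest absolute change in any membership value. It also requires m strictly greater than 1, because the exponent in Eq. [8] is undefined at m = 1, and it floors squared distances at a tiny value to avoid division by zero.

```python
import numpy as np

def weight_matrix(Y, kind="mahalanobis"):
    """Weight matrix A of Eq. [3] for the three distance options."""
    cov = np.atleast_2d(np.cov(Y, rowvar=False))  # p x p sample covariance
    if kind == "euclidean":
        return np.eye(Y.shape[1])
    if kind == "diagonal":
        return np.diag(1.0 / np.diag(cov))
    if kind == "mahalanobis":
        return np.linalg.inv(cov)
    raise ValueError(f"unknown distance option: {kind}")

def fuzzy_c_means(Y, c, m=1.5, kind="mahalanobis",
                  eps=1e-4, l_max=300, seed=0):
    """Return (U, V): c x n memberships and c x p centroids."""
    if m <= 1.0:
        raise ValueError("this sketch needs m > 1")
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    A = weight_matrix(Y, kind)
    # Step 6: random memberships, each column scaled to sum to 1 (Eq. [1]).
    U = rng.random((c, n))
    U /= U.sum(axis=0)
    for _ in range(l_max):
        # Step 7: centroids as membership-weighted means (Eq. [7]).
        W = U ** m
        V = (W @ Y) / W.sum(axis=1, keepdims=True)
        # Squared distances (y_k - v_i)' A (y_k - v_i) (Eq. [3]).
        diff = Y[None, :, :] - V[:, None, :]            # c x n x p
        d2 = np.einsum("inp,pq,inq->in", diff, A, diff)
        d2 = np.maximum(d2, 1e-12)                      # assumed floor
        # Step 8: Eq. [8], rewritten as normalized inverse distances.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        # Step 9: assumed max-abs norm for the stopping criterion.
        converged = np.abs(U_new - U).max() <= eps
        U = U_new
        if converged:
            break
    return U, V

# Tiny made-up data set: two tight groups of four observations each.
Y = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]])
U, V = fuzzy_c_means(Y, c=2, m=1.5, kind="euclidean")
zones = U.argmax(axis=0) + 1   # hard zone label per observation, 1-based
print(zones)
```

Taking the largest membership in each column of `U` gives one zone label per observation, comparable to the c2 through c8 columns of an MZA output file. Because the initial memberships are random, a different `seed` may converge to a different local minimum or number the zones differently.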
Table 2. The header row and first 12 rows of data of a Management Zone Analyst (MZA) output file (with delineation removed). Columns 6 through 12 in this table are the classes that MZA created for the field from two to eight zones. The values in each of these columns represent the assigned class from the clustering procedure.
Easting   Northing   Elevation   Soil EC†   Slope   c2   c3   c4   c5   c6   c7   c8
576000    4342200    264.8       71.5       0.56    1    2    3    2    6    2    7
576010    4342200    264.9       94.8       0.56    1    2    3    2    6    2    7
576020    4342200    264.9       98.1       0.27    1    2    3    2    6    2    7
576030    4342200    265.0       44.1       0.13    2    1    4    4    3    1    1
576040    4342200    265.0       42.1       0.11    2    1    4    4    3    1    1
576050    4342200    265.0       44.9       0.02    2    1    4    4    3    1    1
576060    4342200    265.0       43.5       0.1     2    1    4    4    3    1    1
576070    4342200    265.0       43.6       0.09    2    1    4    4    3    1    1
576080    4342200    265.0       41         0.18    2    1    4    4    3    1    1
576090    4342200    265.0       36.6       0.14    2    1    4    5    2    5    8
576100    4342200    264.9       36.6       0.02    2    1    4    5    2    5    8
576110    4342200    265.0       34.8       0.16    2    1    4    5    2    5    8
...       ...        ...         ...        ...     ...  ...  ...  ...  ...  ...  ...
† EC, electrical conductivity.
As previously indicated, multiple outcomes are inherent to the MZA clustering algorithm. In one study, as few as one outcome (at four clusters) and as many as seven outcomes (at eight clusters) were possible using the fuzzy c-means algorithm (Kitchen et al., 2002). In MZA, initial cluster membership of observations is randomly assigned, and convergence to a local or global minimum may vary depending on that starting point (Xie and Beni, 1991; Bezdek, 1981; Ahn et al., 1999). We found, when using the same three delineation variables as in the example above, that multiple outcomes were most different when the number of clusters was three or fewer (Kitchen et al., 2002). Each data set will behave uniquely, and we encourage users to test each data set by running MZA and then evaluating the results using the performance indices and other validation methods as outlined by Kitchen et al. (2002).
Fig. 4. Fuzziness performance index (FPI) and normalized classification entropy (NCE) as calculated by MZA for (top) Field 1 and (bottom) Field 2. Generally, the best classification occurs when membership sharing (FPI) and/or the amount of class disorganization (NCE) is at a minimum with the least number of classes used.
MANAGEMENT ZONE ANALYST SPECIFICATIONS AND AVAILABILITY
Management Zone Analyst 1.0 was developed using Microsoft Visual Basic 6.0 and operates on any computer running Microsoft Windows (95 or newer). The MZA executable and associated components require approximately five megabytes of hard disk space and a minimum of eight megabytes of random access memory (RAM). As the number of clusters and/or the size of the data set increases, memory requirements also increase. Due to the large number of computations performed during the zone delineation process, processor speed will influence the time required to create the specified number of clusters. The latest version of MZA is available free from the Internet at http://www.fse.missouri.edu/ars/decision_aids.htm (verified 30 Oct. 2003). Included with the software download is a user guide that illustrates the steps for running the program. Management Zone Analyst program code is available in Fridgen (2000).
ACKNOWLEDGMENTS
We thank the following for financial support for this project: North Central Soybean Research Program, United Soybean Board, the Foundation for Agronomic Research (FAR), and the USDA-CSREES National Research Initiative and Special Water Quality Grants program. We also thank M. Krumpelman, B. Mahurin, and M. Volkmann for their assistance in collecting field and crop measurements.
REFERENCES
Ahn, C.W., M.F. Baumgardner, and L.L. Biehl. 1999. Delineation of soil variability using geostatistics and fuzzy clustering analyses of hyperspectral data. Soil Sci. Soc. Am. J. 63:142–150.
Bezdek, J.C. 1981. Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
Boydell, B., and A.B. McBratney. 1999. Identifying potential within-field management zones from cotton yield estimates. p. 331–341. In J.V. Stafford (ed.) Precision agriculture ’99. Proc. European Conf. on Precision Agric., 2nd, Odense Congress Cent., Denmark. 11–15 July 1999. SCI, London.
108 AGRONOMY JOURNAL, VOL. 96, JANUARY–FEBRUARY 2004
Brogan, W.L. 1985. Modern control theory. Prentice Hall, Englewood Cliffs, NJ.
Burrough, P.A. 1989. Fuzzy mathematical methods for soil survey and land evaluation. J. Soil Sci. 40:477–492.
Burrough, P.A., R.A. Macmillan, and W. VanDeursen. 1992. Fuzzy classification methods for determining land suitability from soil profile observations and topography. J. Soil Sci. 43:193–210.
[ERSI] Environmental Systems Research Institute. 1994. Grid commands. ERSI, Redlands, CA.
Fleming, K.L., D.G. Westfall, D.W. Wiens, and M.C. Brodah. 2000. Evaluating farmer developed management zone maps for variable rate fertilizer application. Precis. Agric. 2:201–215.
Fraisse, C.W., K.A. Sudduth, and N.R. Kitchen. 2001a. Delineation of site-specific management zones by unsupervised classification of topographic attributes and soil electrical conductivity. Trans. ASAE 44(1):155–166.
Fraisse, C.W., K.A. Sudduth, and N.R. Kitchen. 2001b. Calibration of the Ceres-Maize model for simulating site-specific crop development and yield on claypan soils. Appl. Eng. Agric. 17(4):547–556.
Fridgen, J.J. 2000. Development and evaluation of unsupervised clustering software for sub-field delineation of agricultural fields. M.S. thesis. Univ. of Missouri, Columbia.
Hartigan, J.A. 1975. Clustering algorithms. John Wiley & Sons, New York.
Irvin, B.J., S.J. Ventura, and B.K. Slater. 1997. Fuzzy and isodata classification of landform elements from digital terrain data in Pleasant Valley, Wisconsin. Geoderma 77:137–154.
Jensen, J.R. 1996. Introductory digital image processing: A remote sensing perspective. Prentice Hall, Upper Saddle River, NJ.
Johnson, D.E. 1998. Applied multivariate methods for data analysis. Brooks/Cole Publ., Pacific Grove, CA.
Kitchen, N.R., S.T. Drummond, E.D. Lund, K.A. Sudduth, and G.W. Buchleiter. 2003. Soil electrical conductivity and topography related to yield for three contrasting soil–crop systems. Agron. J. 95:483–495.
Kitchen, N.R., J.J. Fridgen, K.A. Sudduth, S.T. Drummond, W.J. Wiebold, and C.W. Fraisse. 2002. Procedures for evaluating unsupervised classification to derive management zones. In P.C. Robert et al. (ed.) Precision agriculture [CD-ROM]. Proc. Int. Conf., 6th, Minneapolis, MN. 14–17 July 2002. ASA, CSSA, and SSSA, Madison, WI.
Kitchen, N.R., K.A. Sudduth, and S.T. Drummond. 1999. Soil electrical conductivity as a crop productivity measure for claypan soils. J. Prod. Agric. 12:607–617.
Lark, R.M. 1998. Forming spatially coherent regions by classification of multivariate data: An example from the analysis of maps of crop yield. Int. J. Geogr. Inf. Sci. 12:83–98.
Lark, R.M., and J.V. Stafford. 1997. Classification as a first step in the interpretation of temporal and spatial variation of crop yield. Ann. Appl. Biol. 130:111–121.
Long, D.S., G.R. Carlson, and S.D. DeGloria. 1994. Quality of field management maps. p. 251–271. In P.C. Robert, R.H. Rust, and W.E. Larson (ed.) Site-specific management for agricultural systems. Proc. Int. Conf., 2nd, Minneapolis, MN. 27–30 Mar. 1994. ASA, CSSA, and SSSA, Madison, WI.
MacMillan, R.A., W.W. Pettapiece, L.D. Watson, and T.W. Goddard. 1998. A landform segmentation model for precision farming. p. 1335–1346. In P.C. Robert et al. (ed.) Precision agriculture. Proc. Int. Conf., 4th, St. Paul, MN. 19–22 July 1998. ASA, CSSA, and SSSA, Madison, WI.
Mausbach, M.J., D.J. Lytle, and L.D. Spivey. 1993. Application of soil survey information to soil specific farming. p. 57–68. In P.C. Robert et al. (ed.) Soil specific crop management. Proc. Int. Conf., Minneapolis, MN. 14–16 Apr. 1992. ASA, CSSA, and SSSA, Madison, WI.
Mausel, P.W., W.J. Kamber, and J.K. Lee. 1990. Optimum band selection for supervised classification of multispectral data. Photogramm. Eng. Remote Sens. 56:55–60.
McBratney, A.B., and J.J. DeGruijter. 1992. A continuum approach to soil classification by modified fuzzy k-means with extragrades. J. Soil Sci. 43:159–175.
McBratney, A.B., and A.W. Moore. 1985. Application of fuzzy sets to climatic classification. Agric. For. Meteorol. 35:165–185.
McCann, B.L., D.J. Pennock, C. van Kessel, and F.L. Walley. 1996. The development of management units for site specific farming. p. 295–302. In P.C. Robert, R.H. Rust, and W.E. Larson (ed.) Precision agriculture. Proc. Int. Conf., 3rd, Minneapolis, MN. 23–26 June 1996. ASA, CSSA, and SSSA, Madison, WI.
Milligan, G.W. 1996. Clustering validation: Results and implications for applied analyses. p. 341–375. In P. Arabie, L.J. Hubert, and G. De Soete (ed.) Clustering and classification. World Sci. Publ., River Edge, NJ.
Odeh, I.O.A., A.B. McBratney, and D.J. Chittleborough. 1992. Soil pattern recognition with fuzzy-c-means: Application to classification and soil–landform interrelationships. Soil Sci. Soc. Am. J. 56:505–516.
[RSI] Research Systems Incorporated. 1999. ENVI user’s guide. Version 3.2. RSI, Boulder, CO.
Ruspini, E.H. 1969. A new approach to clustering. Inf. Control. 15:22–32.
Stafford, J.V., R.M. Lark, and H.C. Bolam. 1998. Using yield maps to regionalize fields into potential management units. p. 225–237. In P.C. Robert et al. (ed.) Precision agriculture. Proc. Int. Conf., 4th, St. Paul, MN. 19–22 July 1998. ASA, CSSA, and SSSA, Madison, WI.
Sudduth, K.A., S.T. Drummond, S.J. Birrell, and N.R. Kitchen. 1996. Analysis of spatial factors influencing crop yield. p. 129–140. In P.C. Robert, R.H. Rust, and W.E. Larson (ed.) Precision agriculture. Proc. Int. Conf., 3rd, Minneapolis, MN. 23–26 June 1996. ASA, CSSA, and SSSA, Madison, WI.
Tou, J.T., and R.C. Gonzalez. 1974. Pattern recognition principles. Addison-Wesley, Reading, MA.
van Alphen, B.J., and J.J. Stoorvogel. 1998. A methodology to define management units in support of an integrated, model-based approach to precision agriculture. p. 1267–1278. In P.C. Robert et al. (ed.) Precision agriculture. Proc. Int. Conf., 4th, St. Paul, MN. 19–22 July 1998. ASA, CSSA, and SSSA, Madison, WI.
Wiebold, W.J., K.A. Sudduth, J.G. Davis, D.K. Shannon, and N.R. Kitchen. 1998. Determining barriers to adoption and research needs of precision agriculture [Online]. Report to the North Central Soybean Research Program. Available at http://www.fse.missouri.edu/mpac/pubs/parpt.pdf (verified 30 Oct. 2003). Missouri Precision Agric. Cent., Univ. of Missouri, Columbia.
Xie, X.L., and G. Beni. 1991. A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13(8):841–847.
Zadeh, L.A. 1965. Fuzzy sets. Inf. Control. 8:338–353.