Evaluating Alternative Explanations in Ecosystem Experiments
Stephen R. Carpenter,1* Jonathan J. Cole,2 Timothy E. Essington,1 James R. Hodgson,3 Jeffrey N. Houser,1 James F. Kitchell,1 and Michael L. Pace2
1 Center for Limnology, 680 North Park Street, University of Wisconsin, Madison, Wisconsin 53706; 2 Institute of Ecosystem Studies, Box AB, Route 44A, Millbrook, New York 12545; and 3 Department of Biology, St. Norbert College, DePere, Wisconsin 54115, USA
Unreplicated ecosystem experiments can be analyzed by diverse statistical methods. Most of these methods focus on the null hypothesis that there is no response of a given ecosystem to a manipulation. We suggest that it is often more productive to compare diverse alternative explanations (models) for the observations. An example is presented using whole-lake experiments. When a single experimental lake was examined, we could not detect effects of phosphorus (P) input rate, dissolved organic carbon (DOC), and grazing on chlorophyll. When three experimental lakes with contrasting DOC and food-web manipulations were compared, all three impacts and their interactions were measurable. Focus on multiple alternatives has important implications for the design of ecosystem experiments. If a limited number of experimental ecosystems are available, it may be more informative to manipulate each ecosystem differently to test alternatives, rather than attempt to replicate the experiment.
Ecosystem research uses several approaches, including theory, long-term studies, comparative studies, and experiments (Pace and Groffman 1998). Experiments are unique among these approaches because they reveal how ecosystems respond to natural or anthropogenic perturbations. "To find out what happens to a system when you interfere with it, you have to interfere with it (not just passively observe it)" (Box 1966). In this report, ecosystem experiments are manipulations of systems that are large enough to contain the physical, chemical, and biotic context of the processes under study (Carpenter 1998).
Trade-offs between the size of experimental units and replication are debated in ecology. Statistical tests of the null hypothesis (the hypothesis that the manipulation had no effect) are at the heart of these discussions (Gotelli and Graves 1996). Our report attempts to place the null hypothesis in perspective as only one of many possible uses of statistics. We summarize recent progress on statistical analysis of ecosystem experiments, which shows that statistics are used in diverse ways. Approaches that compare alternative explanations may be more appropriate and insightful than testing the null hypothesis. This perspective suggests designing experiments that create contrasts (in time or between ecosystems) that are likely to discriminate among key alternatives. We provide an example in which multiple alternative explanations for experimental results are compared statistically. An important insight emerges from the example: if multiple experimental ecosystems are available, it may be better to manipulate them in ways that test alternative models than to replicate them to test the null hypothesis.
SCALING AND INFERENCE
The Compromise: Scale Versus Replication
There is no single optimal scale for ecosystem
experimentation, but for a given scientific problem
Received 4 November 1997; accepted 25 March 1998.
*Corresponding author; e-mail: email@example.com
Ecosystems (1998) 1: 335–344
1992). A sampling of the literature reveals the diversity of opinion about scaling ecological experiments (Tilman 1989; Carpenter 1996; Lawton 1996; Lodge and others 1997; Carpenter 1998; Schindler
Ecological criteria for choosing experimental scales
of the processes under study (Carpenter 1998; Schindler 1998). Context includes larger, slower processes that constrain the processes targeted by the study. For example, studies of nutrient limitation of grassland production may need to consider soil development (a slow constraint) and migrations of large mammalian grazers (a spatially extensive constraint). Some of the questions ecologists use to choose the scale of an experiment are:
What scales are appropriate for the process under study (Levin 1992)? These scales include those of the key controlling processes. Examples are ranges and life cycles of the dominant consumers, hydrologic units (for example, watersheds), climate cycles, and scales of variation in
What is the scale at which results will be used (Pace 1999)? The experimental system is intended to represent some class of ecosystem (for example, plots are used to represent a forest). A model is used to "scale up" from the experimental system to the broader class of ecosystems. The model may be verbal ("the forest will respond like a larger version of the plots") or mathematical with varying degrees of complexity. The closer the match in scale between the experiment and its application, the simpler the model and the fewer the assumptions.
In general, these factors tend to favor larger experimental systems studied for longer periods of time.
Investigations that focus on the null hypothesis employ replicate experimental units to estimate the magnitude of random variations (Hurlbert 1984). It is easier to replicate small, brief experiments than large, long-term ones. Statistical criteria, in combination with limited research resources, often favor small experimental systems studied for short periods.
Ecologists weigh many criteria and make many compromises when they design experiments. Ecosystem experimenters usually try to match the appropriate
as more important than replication. Replication may be impossible because the system is unique, costs or logistics are prohibitive, or ethical constraints preclude repetition of a manipulation (Carpenter 1990; Schindler 1998).
The pseudoreplication debate of the 1980s revolved around these issues and, unfortunately, created a great deal of confusion about large-scale experiments and environmental impact assessment. These problems cannot be blamed on the original article (Hurlbert 1984), which pointed out a widespread problem in ecology and coined a catchy term to describe it. Pseudoreplication occurs when the degrees of freedom are erroneously inflated in a statistical analysis, for example, when repeated samples from a single manipulated lake are treated as independent replicates of the manipulation. An unreplicated experiment is not pseudoreplicated until an inappropriate statistical analysis is performed. A number of statistical analyses are, however, appropriate and insightful for unreplicated experiments (Table 1).
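Why inflated degrees of freedom matter can be seen in a small simulation. The Python sketch below is a hypothetical illustration, not an analysis from this study: two lakes receive no manipulation, each lake has its own random offset, and 20 within-lake samples per lake are treated as if they were independent replicates in a two-sample t test. The sample sizes, standard deviations, and the approximate critical value 2.02 (two-sided 5%, 38 df) are all assumptions chosen for illustration.

```python
import math
import random


def mean_var(values):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def two_sample_t(a, b):
    """Two-sample t statistic; with equal group sizes the pooled and Welch forms coincide."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def false_positive_rate(n_sims=2000, n_samples=20, lake_sd=1.0, sample_sd=1.0, seed=1):
    """Fraction of runs in which samples from two untreated lakes test as 'different'."""
    rng = random.Random(seed)
    t_crit = 2.02  # approximate two-sided 5% critical value of t with 38 df
    rejections = 0
    for _ in range(n_sims):
        # Each lake has its own random offset; there is no manipulation effect at all.
        offset_a = rng.gauss(0.0, lake_sd)
        offset_b = rng.gauss(0.0, lake_sd)
        a = [offset_a + rng.gauss(0.0, sample_sd) for _ in range(n_samples)]
        b = [offset_b + rng.gauss(0.0, sample_sd) for _ in range(n_samples)]
        if abs(two_sample_t(a, b)) > t_crit:
            rejections += 1
    return rejections / n_sims


rate_with_lake_variation = false_positive_rate()
rate_without_lake_variation = false_positive_rate(lake_sd=0.0)
print(f"lake offsets present: {rate_with_lake_variation:.3f}")
print(f"lake offsets absent:  {rate_without_lake_variation:.3f}")
```

Under these assumptions the "significance" rate is far above the nominal 5% when lakes differ at random, and returns to roughly 5% only when the lake-to-lake offset is removed: the within-lake samples measure the wrong kind of variability for a lake-level question.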
Scale and the Null Hypothesis
Much confusion about statistical analysis in ecosystem experiments derives from failure to state clearly the scale of interest. Apparently conflicting positions can result from different, but unstated, assumptions about scale. Two particular scales are often
Do ecosystems in general respond nonrandomly to this manipulation? To answer this question, we must measure variability among ecosystems at the scale of the experiment. The most direct approach is to replicate the manipulation at the ecosystem scale (McAllister and Peterman 1992; Stewart-Oaten 1996; Olson and others 1998). Often, however, direct replication is impossible (Carpenter 1990; Schindler 1998). An alternative is to compare the
systems (Schindler and others 1985; Carpenter and others 1989; Stewart-Oaten 1996). Where effects are subtle or variability is very large, some form of genuine replication is essential. Fishery exploitation experiments, where observation errors are often larger than even substantial ecological responses, are an example of a situation where genuine replication is critically needed (McAllister and Peterman 1992). In many other experiments, however, observation errors and routine variability are substantially smaller than ecologically interesting responses.
by other research teams in other ecosystems, often in other biomes. For example, important experiments in watershed hydrogeochemistry and lake eutrophication, acidification, and biomanipulation have been performed in several nations (Carpenter and others 1995). This form of replication increases the generality of findings to a greater extent than replication by a single group at a single site.
S. R. Carpenter and others
Did this particular ecosystem respond nonrandomly to manipulation? This question can be answered by repeated observation of the experimental system in time or by measuring the spatial variability within the experimental ecosystem. It is crucial to measure
lation. It is better to measure variability in both a
reference ecosystem and a manipulated ecosystem.
Stewart-Oaten and colleagues (1986) first pointed out the statistical possibilities of a "before–after control impact" (BACI) analysis. Their insight underlies several later statistical analyses of ecosystem experiments, most of which are based on paired time series from before and after manipulation in both reference and manipulated ecosystems [for example, see Carpenter and others (1989), Carpenter (1993), Schmitt and Osenberg (1996), and Crome and others (1996)]. Statistical methods for BACI
mental results are explainable by routine variability of reference or manipulated ecosystems in time or
These methods do not address the applicability of the experimental results to a broader group of ecosystems. Generalization of the results depends on comparative or gradient studies, long-term observations, and models. This would often be true even if the experiment could be replicated. Ecosystem scientists routinely rely on comparative and long-term studies and models to expand the spatial and temporal context of their findings (Pace and Groffman 1998). Replication does not change the need for these other approaches.
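The paired-series idea behind BACI analyses can be sketched in a few lines. The Python example below is one simple, hypothetical version, not the specific procedure of any study cited above: it takes manipulated-minus-reference differences on each date (so that shared influences such as weather largely cancel) and compares the after-minus-before change in their mean with a randomization distribution obtained by shuffling dates. The simulated data, the effect size, and the number of permutations are invented for illustration, and shuffling assumes the differences are exchangeable, which strongly autocorrelated series may violate.

```python
import random


def mean(x):
    return sum(x) / len(x)


def paired_differences(reference, manipulated):
    """Manipulated minus reference on each sampling date."""
    return [m - r for r, m in zip(reference, manipulated)]


def randomization_test(diffs, n_before, n_perm=5000, seed=0):
    """Return |mean(after) - mean(before)| of the differences and a randomization P value."""
    rng = random.Random(seed)

    def stat(d):
        return abs(mean(d[n_before:]) - mean(d[:n_before]))

    observed = stat(diffs)
    shuffled = list(diffs)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly relabel dates as "before" or "after"
        if stat(shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 30 dates before and 30 after; shared "weather" affects both lakes.
rng = random.Random(42)
weather = [rng.gauss(0.0, 1.0) for _ in range(60)]
reference = [w + rng.gauss(0.0, 0.3) for w in weather]
manipulated = [w + rng.gauss(0.0, 0.3) + (1.0 if t >= 30 else 0.0) for t, w in enumerate(weather)]

effect, p = randomization_test(paired_differences(reference, manipulated), n_before=30)
print(f"change in mean difference = {effect:.2f}, randomization P = {p:.4f}")
```

As the text notes, a small P value here says only that this particular ecosystem changed relative to its reference; it says nothing about other ecosystems.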
Magnitude of Ecological Response
Ecologists have often ignored the null hypothesis and focused instead on the ecological significance of the result. The question is rephrased: Is the ecological effect of this manipulation large in comparison to the range known for other, similar ecosystems? This nonstatistical approach depends on knowledge from comparative and long-term ecological studies,
Statistical analysis can provide valuable information about the magnitude of ecological responses. For example, Carpenter and colleagues (1996, 1998) calculated "rules of thumb" for responses of lake productivity to perturbations of phosphorus (P), dissolved organic carbon (DOC), and grazing. Bayesian statistics have been used to calculate probability distributions for ecosystem responses to particular perturbations (Reckhow 1990; Carpenter and others 1996, 1998; Crome and others 1996; Olson and others 1998). Such calculations invite comparison among ecosystems and focus attention on the ecological importance, rather than the statistical significance, of the results.
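As a minimal illustration of the Bayesian idea (much simpler than the models in the studies cited above), the Python sketch below combines a hypothetical prior distribution for a lake's response, of the kind comparative studies might supply, with a hypothetical experimental estimate, using the conjugate normal update for a mean with known sampling error. The result is a probability distribution for the response, from which one can read off, for instance, the probability that the response exceeds an ecologically important threshold. All numbers are invented.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, estimate_se):
    """Conjugate normal update: precisions (1 / variance) add, means are precision-weighted."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / estimate_se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    return post_mean, math.sqrt(post_var)


def prob_exceeds(mean, sd, threshold):
    """P(response > threshold) under a normal distribution."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2.0)))


# Hypothetical numbers: prior response 0.5 +/- 0.5, experimental estimate 1.0 +/- 0.5.
m, s = normal_posterior(0.5, 0.5, 1.0, 0.5)
print(f"posterior mean {m:.3f}, sd {s:.3f}, P(response > 0.25) = {prob_exceeds(m, s, 0.25):.3f}")
```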
ing alternative explanations. The objective is to determine which explanation is most plausible on the basis of the data and other information pertinent to the experiment. It is possible that alternative explanations are not mutually exclusive, that multiple mechanisms are operating, and that the most likely explanation will invoke multiple causes.
The idea that ecosystem changes are explainable by chance alone (the null hypothesis) is only one among many potential explanations for the results of an ecosystem experiment. In fact, the null hypothesis is often the least relevant of the alternatives. By the time we are ready to invest in an expensive, large-scale experiment, there is usually little doubt that responses will be nonrandom. Instead, we are
ecosystem components, whether any ecosystem components respond in surprising ways, and the most likely explanation for the changes observed.
Table 1. Some Examples of Statistical Analyses of Data from Ecosystem Experiments
Group  Approach  References
Olson et al. 1998
1  Schindler et al. 1985; Carpenter et al. 1989
1  Carpenter et al. 1989
1  Green 1993
2  Intervention analysis  Box and Tiao 1975; Rasmussen et al.; Carpenter and Kitchell; Ives et al. 1998
2  Transfer functions
3  Posterior distributions
Dynamic linear models  Carpenter et al. 1996; Crome et al. 1996; Olson et al. 1998
Carpenter et al. 1998
See Schmitt and Osenberg (1996), especially chapters 1, 2, and 6–9, for additional perspectives and examples of before–after control impact (BACI) paired time series studies. Group-1 methods are based on t tests or analysis of variance. Group-2 methods are examples of time series analysis. Group-3 methods are Bayesian
We suggest that comparison and evaluation of alternative explanations is a central goal of ecosystem experimentation. Alternative explanations can be expressed mathematically as different models for the observed data. Useful statistical approaches for comparing models are well known (Kass and Raftery 1995; Hilborn and Mangel 1997) but have not yet made major contributions to ecosystem experimentation.
Statistical Comparison of Alternative Models
The distinction between nested and nonnested models affects the choice of statistics. Two models are nested if the more complex one can be converted to the simpler one by fixing one or more of the parameters. For example, consider the models
$$N_{t+1} = N_t + b_0[\exp(b_1 T_t)] + [b_2 R_t N_t] \qquad (1)$$
$$N_{t+1} = N_t + [b_2 R_t N_t] \qquad (2)$$
$$N_{t+1} = N_t + b_0[\exp(b_1 T_t)] \qquad (3)$$
These models predict the future size of a population $N_{t+1}$ from the previous value $N_t$; environmental temperature $T_t$, the term $b_0[\exp(b_1 T_t)]$; and level of a limiting resource $R_t$, the term $[b_2 R_t N_t]$. The $b_i$ are
tions of $N_t$, $T_t$, and $R_t$. Model 2 is nested in model 1, because if $b_0 = 0$ then model 1 is identical to model 2. By a similar argument (set $b_2 = 0$), model 1 can be converted to model 3, so model 3 is nested within model 1. However, neither model 2 nor model 3 can be converted into the other by fixing a parameter, so they are not nested.
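Nesting can be checked mechanically. The Python sketch below writes Eqs. 1–3 as functions, assuming additive terms as written above, and confirms that fixing $b_0 = 0$ or $b_2 = 0$ reduces model 1 to model 2 or model 3. The state values and parameter values are arbitrary choices for illustration.

```python
import math


def model1(n, temp, resource, b0, b1, b2):
    """Eq. 1: temperature term plus resource term."""
    return n + b0 * math.exp(b1 * temp) + b2 * resource * n


def model2(n, resource, b2):
    """Eq. 2: resource term only."""
    return n + b2 * resource * n


def model3(n, temp, b0, b1):
    """Eq. 3: temperature term only."""
    return n + b0 * math.exp(b1 * temp)


# Arbitrary illustrative state and parameters.
n, temp, resource = 10.0, 15.0, 2.0
reduces_to_2 = math.isclose(model1(n, temp, resource, 0.0, 0.1, 0.05), model2(n, resource, 0.05))  # b0 = 0
reduces_to_3 = math.isclose(model1(n, temp, resource, 0.3, 0.1, 0.0), model3(n, temp, 0.3, 0.1))   # b2 = 0
print(reduces_to_2, reduces_to_3)
```

No comparable substitution turns model 2 into model 3 or vice versa, which is why such a pair calls for a criterion for nonnested models rather than a likelihood ratio test.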
Nested models can be compared by using the likelihood ratio LR,
$$\mathrm{LR} = \frac{L(\text{data} \mid \text{complex model})}{L(\text{data} \mid \text{simple model})} \qquad (4)$$
$L(\text{data} \mid \text{model})$ is the likelihood of the data given a specified model (Hilborn and Mangel 1997). The
best fit the data. The mathematical form of the likelihood depends on the probability distribution of the deviations between data and model predictions. For the normal distribution, the likelihood of a single deviation $E_i$ for a particular model $M$ is
$$L(E_i \mid M) = \frac{\exp\left(-E_i^2 / 2s^2\right)}{\sqrt{2\pi s^2}} \qquad (5)$$
where $s^2$ is the estimate of the variance of all the deviations. The likelihood of all the data given model $M$, $L(E \mid M)$, is the product of all the individual $L(E_i \mid M)$. The model parameters (including $s^2$) can be estimated by finding the values that maximize the likelihood of all the data (Hilborn and Mangel
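Eq. 5 translates directly into code. The Python sketch below, using made-up deviations, computes the log of the product of the individual likelihoods (working with sums of logs avoids numerical underflow when there are many observations) and the value of $s^2$ that maximizes it, which for a fixed set of deviations is their mean square (dividing by n, not n - 1).

```python
import math


def normal_loglik(deviations, s2):
    """Log of the product of Eq. 5 over all deviations, i.e. the sum of log L(E_i | M)."""
    return sum(-e * e / (2.0 * s2) - 0.5 * math.log(2.0 * math.pi * s2) for e in deviations)


def mle_variance(deviations):
    """s2 that maximizes the normal likelihood: mean squared deviation (divides by n)."""
    return sum(e * e for e in deviations) / len(deviations)


# Hypothetical deviations between observations and model predictions.
devs = [0.12, -0.30, 0.05, 0.22, -0.09]
s2_hat = mle_variance(devs)
ll_hat = normal_loglik(devs, s2_hat)
# Direct product of the individual likelihoods, for comparison with the sum of logs.
product = math.prod(math.exp(-e * e / (2 * s2_hat)) / math.sqrt(2 * math.pi * s2_hat) for e in devs)
print(f"s2 = {s2_hat:.4f}, log-likelihood = {ll_hat:.4f}, log(product) = {math.log(product):.4f}")
```

At the maximizing $s^2$ the log-likelihood simplifies to $-\tfrac{n}{2}[\ln(2\pi s^2) + 1]$, a form that is convenient when comparing fitted models.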
The larger the likelihood ratio (Eq. 4), the better the fit of the more complex model relative to the simpler model. However, the likelihood ratio alone does not adjust for the costs of complexity (greater parameter variance). Is the likelihood ratio large enough that we should prefer the more complex model? This question can be answered by using the likelihood ratio statistic $\mathrm{LRS} = 2\ln(\mathrm{LR})$. When the simpler model is correct, the LRS has approximately a chi-squared distribution with degrees of freedom equal to the difference in number of parameters between the two models (Hilborn and Mangel 1997). The degrees of freedom account for the differences in complexity between the models. If the LRS is large enough to be very improbable (according to a chi-squared test), then the more complex model is better. The LRS tests the null hypothesis that the complex model fits the data no better than the simpler model. This null hypothesis takes many specific forms, depending on the models used to calculate the likelihood ratio. The particular null hypothesis that the manipulation had no effect is only one possibility. It can be evaluated by comparing a model that includes a term for the manipulation effect with a simpler model that does not include such a term. However, it is possible to compare a much broader range of models, representing diverse explanations for the ecosystem response.
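The likelihood ratio test can be sketched without a statistics library. The Python example below computes LRS from two maximized log-likelihoods and an approximate P value from the chi-squared upper tail, evaluated through the power series for the regularized incomplete gamma function. The log-likelihood values are hypothetical, and a production analysis would normally use a tested library routine instead of this hand-rolled tail function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function
    P(df/2, x/2); adequate for the moderate values met in model comparisons.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_complex, loglik_simple, extra_params):
    """LRS = 2 ln(LR) and its approximate chi-squared P value."""
    lrs = 2.0 * (loglik_complex - loglik_simple)
    return lrs, chi2_sf(lrs, extra_params)


# Hypothetical maximized log-likelihoods for two nested models differing by 2 parameters.
lrs, p = likelihood_ratio_test(-52.1, -57.4, extra_params=2)
print(f"LRS = {lrs:.2f}, P = {p:.4f}")
```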
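Nonnested models can be compared using the Akaike information criterion (AIC),
$$\mathrm{AIC}(E \mid M) = -2\ln[L(E \mid M)] + 2p \qquad (6)$$
where $L(E \mid M)$ is evaluated at the maximum-likelihood parameter estimates and $p$ is the number of parameters in the model. The superior model will have the lower AIC. Several other statistics for comparing nonnested models are presented by Kass and Raftery (1995).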
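Applying Eq. 6 is a one-line calculation once each model has been fitted. In the Python sketch below the maximized log-likelihoods are hypothetical, and the parameter counts are those of Eqs. 2 and 3 plus the variance $s^2$.

```python
def aic(max_loglik, n_params):
    """Eq. 6: AIC = -2 ln L + 2p, with L evaluated at the maximum-likelihood estimates."""
    return -2.0 * max_loglik + 2 * n_params


# Hypothetical maximized log-likelihoods; parameter counts include the variance s2.
candidates = {
    "model 2 (resource)": (-41.3, 2),     # b2, s2
    "model 3 (temperature)": (-40.8, 3),  # b0, b1, s2
}
scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score:.1f}")
print("lower AIC:", best)
```

In this made-up case model 3 fits slightly better, but its extra parameter costs more than the gain in likelihood, so model 2 has the lower AIC.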
Sets of three or more models can be compared using the pairwise likelihood ratios, AIC or similar statistics for each model, or the posterior probability of each model. Posterior probabilities measure the relative credibility of each model in light of the data (Kass and Raftery 1995). These probabilities are perhaps the most useful information for scientists but require additional assumptions and relatively complex calculations (Reckhow 1990; Kass and Raftery 1995). Often, the outcome is clear from simpler statistics such as the likelihood ratio or AIC (Hilborn and Mangel 1997).
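A rough route to posterior model probabilities is the Schwarz criterion (BIC) approximation discussed by Kass and Raftery (1995). The Python sketch below assumes equal prior probability for every model and uses hypothetical fits to the same 100 observations; it is a large-sample approximation, not a full Bayesian calculation with explicit priors on parameters.

```python
import math


def bic(max_loglik, n_params, n_obs):
    """Schwarz criterion: -2 ln L + p ln(n)."""
    return -2.0 * max_loglik + n_params * math.log(n_obs)


def approx_posterior_probs(bics):
    """Approximate posterior model probabilities from BIC, assuming equal prior
    probability for every model: weight_k = exp(-BIC_k / 2), then normalize."""
    best = min(bics.values())  # subtracting the minimum avoids overflow/underflow
    weights = {name: math.exp(-0.5 * (b - best)) for name, b in bics.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical fits to the same 100 observations: (maximized log-likelihood, parameters).
fits = {"AR(1)": (-80.0, 3), "AR(1) + P": (-76.5, 4), "AR(1) + P + DOC": (-76.0, 5)}
bics = {name: bic(ll, p, 100) for name, (ll, p) in fits.items()}
probs = approx_posterior_probs(bics)
for name, prob in probs.items():
    print(f"{name}: {prob:.3f}")
```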
EXAMPLE: CONTROL OF PRIMARY PRODUCERS IN LAKES
In 1990, we began an experiment to measure the interactive effects of nutrient input and food-web structure on lake productivity. Because the mechanisms of interest depend on lakewide fish movements (Kitchell and others 1994) and the physical structure of the entire water column, it was necessary to do these experiments in whole lakes.
Paul Lake served as the unmanipulated reference ecosystem. Peter Lake's food web was converted to dominance by planktivorous minnows in 1991 (Carpenter and others 1996). Long Lake was divided with plastic curtains into east, central, and west basins in 1991 (Christensen and others 1996). The food web of West Long Lake was dominated by piscivorous bass (Carpenter and others 1996). East Long Lake was initially dominated by planktivores, but fish biomass dwindled as the lake's chemistry changed unexpectedly. The curtain altered the hydrologic inputs to East Long Lake, leading to increases in water color (absorbance at 440 nm) and concentrations of DOC, and decreases in pH and transparency (Christensen and others 1996). Beginning in 1993, East Long, West Long, and Peter Lakes were fertilized with similar concentrations of N and P (N–P ratio 25 by atoms). Details of this ecosystem experiment have been published elsewhere (Christensen and others 1996; Carpenter and others 1996, 1998; Pace and Cole 1996; Pace and others 1998).
In this example, we focus on the phytoplankton response in East Long Lake, measured as chlorophyll a concentration integrated vertically from the depth of 5% surface irradiance (Carpenter and others 1998). How did fertilization, the unexpected DOC increase, and the subsequent food-web changes affect chlorophyll? First, we compare alternative models by using only East Long and Paul Lakes. Then, we use all the lakes to compare alternative models for the observations in East Long Lake.
Alternative Explanations: Models
The data are time series of chlorophyll and various factors that may affect chlorophyll. These include the curtain (present or absent), chlorophyll in the reference lake, input rate of P (the limiting nutrient in these lakes), crustacean mean length (an index of grazing), and DOC concentration [which is inversely related to water transparency (Carpenter and others 1998)]. Our approach is to fit models that predict chlorophyll concentrations and compare them statistically. The model that fits best corresponds to the most likely explanation for the ecosystem response, among the models tested. An untested model might give a better fit.
The models resemble regressions. The general form is
$$Y_{t+1} = \beta_0 + \phi Y_t + f(\beta, X_t) + \varepsilon_t$$
where subscripts denote weekly time intervals, $Y$ is the time series of log(chl), $\beta_0$ is the intercept parameter estimated from the data, $\phi$ is an autoregressive parameter estimated from the data, $f(\beta, X)$ represents a polynomial model of predictor time series $X$ and parameters $\beta$ to be estimated from the data, and $\varepsilon$ is a time series of independent, normally distributed residuals. Diagnostics (normal probability plots, autocorrelation functions, partial autocorrelation functions) suggested that residuals were uncorrelated and approximately normal.
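The general form is a first-order autoregression with predictor series. A minimal Python sketch, assuming a single predictor (as in models 1–5) and simulated data with made-up parameter values: regressing $Y_{t+1}$ on $Y_t$ and $X_t$ by least squares gives the conditional maximum-likelihood estimates of $\beta_0$, $\phi$, and $\beta_1$ when $\varepsilon_t$ is normal. The small hand-written solver keeps the example dependency-free; a real analysis would use a tested numerical library.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_arx1(y, x):
    """Fit Y[t+1] = beta0 + phi * Y[t] + beta1 * X[t] + eps by least squares on the
    lagged series; with normal eps this is the conditional maximum-likelihood fit."""
    rows = [[1.0, y[t], x[t]] for t in range(len(y) - 1)]
    target = y[1:]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)  # [beta0, phi, beta1]


# Hypothetical weekly series: log(chl) driven by one predictor plus noise.
rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [0.0]
for t in range(399):
    y.append(0.5 + 0.6 * y[t] + 0.8 * x[t] + rng.gauss(0.0, 0.1))

beta0, phi, beta1 = fit_arx1(y, x)
print(f"beta0 = {beta0:.3f}, phi = {phi:.3f}, beta1 = {beta1:.3f}")
```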
For East Long Lake alone, we considered the following alternative models for the chlorophyll time series:
0. Chlorophyll dynamics are explainable by a random walk around a mean.
1. Chlorophyll dynamics are explainable by curtain installation (indexed by a variable that is 0 when the curtain was absent and 1 when the curtain was present).
2. Chlorophyll dynamics are explainable by regional weather or variability in methods, as reflected in Paul Lake's chlorophyll dynamics.
3. Chlorophyll dynamics are explainable by changes in DOC concentration.
4. Chlorophyll dynamics are explainable by changes in grazing intensity as indexed by crustacean mean length.
5. Chlorophyll dynamics are explainable by changes in P input rate.
6. Chlorophyll dynamics are explained jointly by DOC, grazing, and P input rate.
7. Chlorophyll dynamics are explained jointly by DOC, grazing, P input rate, and their interactions.
The model corresponding to each explanation is obtained by using a particular form for $f(\beta, X)$. For example, for model 0, $f = 0$. For models 1–5, $f = \beta_1 X$, where $X$ is the time series of the appropriate predictor. These models are similar to linear regressions. For model 6, $f = \mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}$ is a matrix with columns consisting of time series for DOC, mean crustacean length, and P input rate, and $\boldsymbol{\beta}$ is a vector of three parameters; model 6 corresponds to a multiple regression with three predictors. Model 7 is similar to model 6 except that, in addition to the three predictors, $\mathbf{X}$ contains the products of the predictors (DOC × mean crustacean length, DOC × P input rate, mean crustacean length × P input rate, and the product of all three predictors) and $\boldsymbol{\beta}$ has seven elements. Model 7 corresponds to a multiple regression with three predictors and their interactions.
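The columns of $\mathbf{X}$ for models 6 and 7 can be generated mechanically. The Python sketch below builds one week's row from hypothetical predictor values (the short names doc, crus_len, and p_input are ours); with interactions it appends every product of two or more predictors, giving the three main effects plus three two-way products and one three-way product.

```python
from itertools import combinations
from math import prod


def design_row(week, interactions=False):
    """Columns of X for one week.

    week: dict of predictor name -> value for that week.
    Without interactions this gives model 6's three columns; with interactions it
    adds every product of two or more predictors, giving model 7's seven columns.
    """
    names = sorted(week)
    cols = {name: week[name] for name in names}
    if interactions:
        for k in range(2, len(names) + 1):
            for combo in combinations(names, k):
                cols["*".join(combo)] = prod(week[name] for name in combo)
    return cols


# Hypothetical values for one week.
week = {"doc": 8.0, "crus_len": 0.9, "p_input": 1.2}
row6 = design_row(week)                     # model 6
row7 = design_row(week, interactions=True)  # model 7
print(row6)
print(row7)
```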
For East Long, West Long, and Peter Lakes combined, we considered the random walk model (model 0), models with each predictor alone (models 3–5), all combinations of two predictors, all three predictors without interactions (model 6), and all three predictors with interactions (model 7).
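The three-lake candidate set can be enumerated systematically. The Python sketch below (labels are ours) lists the benchmark, the three single-predictor models, the three pairs, all three predictors without interactions, and all three with interactions, nine candidates in total.

```python
from itertools import combinations

PREDICTORS = ("DOC", "grazing", "P input")


def three_lake_model_set():
    """Return (label, predictors, interactions?) for each model compared across the three lakes."""
    models = [("model 0: AR(1) only", (), False)]
    for k in (1, 2, 3):
        for combo in combinations(PREDICTORS, k):
            models.append((" + ".join(combo), combo, False))
    models.append(("model 7: all three with interactions", PREDICTORS, True))
    return models


model_set = three_lake_model_set()
for label, preds, with_interactions in model_set:
    print(label)
```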
All models that we compared use log chlorophyll as the sole response variate and P input rate, DOC, and crustacean length as the predictors. P input was manipulated directly. Although zooplankton biomass is affected by P input rate, crustacean length is not (Carpenter and others 1996). Crustacean length is affected by fish predation (Carpenter and Kitchell 1993) and serves as an indicator of food-web treatments. DOC could be affected by P inputs (Pace and Cole 1996), but over all lakes and years P input rate and DOC are not strongly correlated (Carpenter and others 1998). Most of the variability in DOC is due to hydrologic changes caused by curtain installation (Christensen and others 1996). Thus, it is reasonable to view DOC as an independent variate. In other cases, it might be appropriate to fit a model with multiple response variates, for example, to predict log chlorophyll, DOC, and zooplankton biomass from earlier observations of the same variates and P input rate. Multivariate autoregressive models are used in such situations (Ives and others 1998).
Models were fit by minimizing the negative log likelihood and compared by using likelihood ratios (Hilborn and Mangel 1997). Model 0 (the simple autoregression or random walk) was used as the simpler model in all likelihood ratios because it has the minimal structure necessary to fit the data. It predicts the next sample from the current sample plus noise. If a more complex model is worthwhile, it must surpass this minimum benchmark. This relatively simple approach revealed the best model of those we compared. In other situations, additional comparisons could be needed.
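The benchmark comparison can be sketched end to end with simulated data. In the Python example below (series, coefficients, and noise levels are invented), each model is fitted by least squares, which under normal errors is equivalent to minimizing the negative log likelihood with $s^2 = \mathrm{RSS}/n$, and its maximized log-likelihood is compared with that of the AR(1) benchmark. Because the benchmark is nested in every candidate, ln(LR) is never negative; the LRS indicates whether the improvement justifies the extra parameters. ln(LR) is reported instead of LR to avoid overflow for long series.

```python
import math
import random


def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit (normal equations, Gaussian elimination)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(rows, y))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (a[r][k] - sum(a[r][c] * beta[c] for c in range(r + 1, k))) / a[r][r]
    return sum((v - sum(b * xv for b, xv in zip(beta, row))) ** 2 for row, v in zip(rows, y))


def max_loglik(rss, n):
    """Normal log-likelihood at its maximum, where s2 = rss / n."""
    return -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)


def compare_to_ar1(y, predictors):
    """ln(LR) and LRS of an AR(1)-plus-predictors model against the AR(1) benchmark."""
    n = len(y) - 1
    target = y[1:]
    base = [[1.0, y[t]] for t in range(n)]
    full = [[1.0, y[t]] + [x[t] for x in predictors] for t in range(n)]
    log_lr = max_loglik(ols_rss(full, target), n) - max_loglik(ols_rss(base, target), n)
    return log_lr, 2.0 * log_lr


# Hypothetical series: y responds to x1; x2 is irrelevant noise.
rng = random.Random(3)
x1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
x2 = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0]
for t in range(199):
    y.append(0.2 + 0.5 * y[t] + 0.7 * x1[t] + rng.gauss(0.0, 0.5))

results = {}
for label, preds in [("AR(1) + x1", [x1]), ("AR(1) + x2", [x2]), ("AR(1) + x1 + x2", [x1, x2])]:
    results[label] = compare_to_ar1(y, preds)
    print(f"{label}: ln(LR) = {results[label][0]:.2f}, LRS = {results[label][1]:.2f}")
```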
East Long Lake: Time Series
In the 2 years following installation of the curtain in Long Lake, crustacean mean length and DOC concentrations increased (Figure 1). Nutrient enrichment began in year 3 after curtain installation. The most notable change following nutrient enrichment was an increase in the variability of chlorophyll rather than in its mean (Figure 1A).
DOC concentrations (Figure 1C) began to increase in East Long Lake immediately after installation of the curtain (Christensen and others 1996). There was a slight decrease in DOC in West Long
The change in DOC of East Long Lake was a consequence of curtain installation. A model predicting DOC from a curtain effect and autoregression fits better than a model using autoregression alone (likelihood ratio = 16.9; P < 0.05).
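The reported ratio can be converted to a P value through the likelihood ratio statistic. A short check in Python, assuming (our assumption, since the parameter count is not stated here) that the curtain term adds a single parameter, so that the chi-squared reference has 1 degree of freedom: LRS = 2 ln 16.9 ≈ 5.65, giving P ≈ 0.02, consistent with P < 0.05.

```python
import math

lr = 16.9                    # likelihood ratio reported for the curtain model of DOC
lrs = 2.0 * math.log(lr)     # likelihood ratio statistic, LRS = 2 ln(LR)
# Assumption: the curtain term adds one parameter, so df = 1.
# For 1 df, P(chi-squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lrs / 2.0))
print(f"LRS = {lrs:.2f}, P = {p_value:.3f}")
```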
Figure 1. Weekly observations of selected limnological
in East Long Lake (A–D) and Paul Lake, the reference ecosystem (E). The arrow shows installation of the curtain dividing Long Lake. A Chlorophyll (Chl) in East Long Lake (integrated from depth of 5% surface irradiance) (mg m⁻²). B Crustacean mean body length (Crus. Len.) in East Long Lake (mm). C Dissolved organic carbon (DOC) concentration in the epilimnion of East Long Lake (mg L⁻¹). D Phosphorus (P) input rate from experimental enrichment of East Long Lake (mg m⁻² day⁻¹). E Chlorophyll (Ref. Chl) in Paul Lake (integrated from depth of 5%
Crustacean body length generally increased following installation of the curtain (Figure 1B). Increasing acidity and oxygen demand associated with increasing DOC caused a decline in fish predation on zooplankton, allowing large-bodied grazers such as Daphnia pulex to dominate (Pace and others
Nutrient enrichment substantially increased P input to East Long Lake (Figure 1D). Prior to experimental enrichment, P input rates to the lake were about 0.1 mg m⁻² day⁻¹. Water-column N–P ratios remained roughly 25 (by atoms) throughout the study. Ammonium and nitrate accumulated in the epilimnion in 1993–95, while phosphate did not, suggesting that primary producers were P limited in these years. In 1996, both dissolved inorganic N and dissolved reactive P accumulated in the epilimnion, suggesting that primary producers were limited by something other than P or N. DOC is directly related to light extinction in East Long Lake (Carpenter and others 1998), and it is likely that primary producers became light limited in 1996.
Chlorophyll concentrations in the reference lake (Figure 1E) allow us to assess the possibility that some regional factor (such as weather) or inconsistencies in methods over time could explain changes in the experimental lakes. There are no detectable trends in the reference lake. Variability of chlorophyll in the reference lake is lower than that observed in East Long Lake following enrichment. The variability observed in Paul Lake's chlorophyll in 1993 derives from recruitment of a large year class of largemouth bass, which triggered a short-lived trophic cascade (Post and others 1997).
Models for East Long Lake
The simple autoregression or random walk is an adequate model for the chlorophyll time series of East Long Lake (Figure 2). The models predicting East Long Lake's chlorophyll from a curtain effect or from chlorophyll in the reference lake are no better than the simple autoregression. Models based on DOC and P input offered little improvement. The model based on grazer body size was the best of the single-factor models, but it did not improve significantly on the simple autoregression. The model that included P input, DOC, grazer length, and their interactions had the highest overall likelihood. However, this model requires fitting a large number of parameters, and once that cost is taken into account it does not perform as well as the much simpler autoregressive model.
Initially, we were surprised by our inability to detect responses of chlorophyll to 60-fold increases in P input rate, threefold changes in DOC concentration, and very large changes in grazer size. However, the "independent" variables in the analysis are not in fact independent. The trend of increasing DOC caused some of the changes in the grazer community. The correlation of DOC and crustacean length is obvious for 1990–94 but is broken up somewhat by variable crustacean lengths in 1995–96 (Figure 1). The changes in DOC happened to be strongly correlated with nutrient enrichment (for DOC and P input rate, r = 0.643 and n = 105). The correlations of P input rate, DOC, and grazer length obscured their effects on chlorophyll in East Long Lake.
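One common way to quantify how correlated predictors blur individual effects is the variance inflation factor, 1/(1 - R²). The Python sketch below is our illustration, not an analysis from the study: for two predictors with r = 0.643 the factor is about 1.7, so standard errors grow by about 30%. With three mutually correlated predictors (DOC was also correlated with crustacean length), the relevant R² comes from regressing each predictor on the other two; that value is not reported here, so the combined inflation could be larger.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_inflation(r_squared):
    """Factor by which correlation with other predictors inflates a coefficient's
    sampling variance: 1 / (1 - R^2), where R^2 is from regressing that predictor
    on the others (for two predictors, R^2 = r^2)."""
    return 1.0 / (1.0 - r_squared)


vif = variance_inflation(0.643 ** 2)  # DOC vs. P input rate in East Long Lake
print(f"VIF = {vif:.2f}, standard errors inflated by a factor of {math.sqrt(vif):.2f}")
```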
Models for All Experimental Lakes
The correlations among P input rate, DOC, and grazer length are small if all of the experimental lakes are considered together (Carpenter and others 1998). All three experimental lakes were subjected to a similar range of P enrichment rates (Figure 3). East Long Lake had the highest DOC concentrations and generally high but variable grazer length. West Long Lake had low DOC and large grazers throughout the experiment. Peter Lake had low DOC and generally low but variable grazer length. Thus, there is a DOC contrast between East Long Lake and both other experimental lakes, and a grazing contrast between East Long Lake and Peter Lake.
Several models are superior to the simple autoregression when all experimental lakes are considered (Figure 4). The most likely model is the model that predicts chlorophyll from P input rate, DOC, grazer length, and all of their interactions. Its likelihood is more than 10⁶ times greater than that of the simple autoregression, and more than 20 times greater than that of the next most likely model.
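For scale, these likelihood ratios can be translated into likelihood ratio statistics. Assuming (our assumption, since parameter counts are not given here) that the interaction model has seven more coefficients than the AR(1) benchmark, the first ratio far exceeds the 5% chi-squared threshold; the second value applies to the ratio against the next most likely model, and whether a chi-squared reference is appropriate for it depends on whether those two models are nested.

```latex
\mathrm{LRS} = 2\ln(\mathrm{LR}) > 2\ln\!\left(10^{6}\right) \approx 27.6
  > \chi^{2}_{0.95,\,7} \approx 14.07,
\qquad
2\ln(20) \approx 5.99 .
```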
Predictions of the optimal model are significantly correlated with observations (Figure 5). The three-lake model also does a good job of predicting chlorophyll in East Long Lake alone. There is, however, a significant amount of variability that is not explained by the model, has no significant autocorrelations or trends, and is not explainable by any other variable that we measured. Understanding the variability in chlorophyll is as important as understanding the trends (Carpenter and others 1998), and in some respects remains a challenge for
Figure 2. Likelihood ratios for models fit to East Long Lake time series. Each horizontal bar shows the likelihood of a model divided by the likelihood of the simple autoregression [AR(1)]. The dashed line shows the minimum likelihood ratio for significance at the 5% level. DOC, dissolved organic carbon.
The model comparisons could have been calculated using long-term data from ecosystems that were not experimentally manipulated. However, the manipulations created contrasts that increased our ability to
tribute to inferences about causality (Stewart-Oaten 1996). Arguments about causality hinge on a diversity of evidence. Stewart-Oaten and colleagues (1986) list a number of properties of causal evidence in the context of environmental impact assessment, such as magnitude of effect, consistency among studies, temporality (does cause precede effect?), dose–response relationship, plausibility, coherence, experimental evidence, and analogy (did similar cases have similar effects?). The models presented here involve predictors that were directly manipulated (P input), indirectly manipulated (crustacean length), and inadvertently manipulated (DOC). They "establish whether or not there is any reason to believe that a change of a kind that could imply causation has really occurred, and they estimate the size of that change" (Box and others 1978: 604).
ing lakes was more informative than replication would have been, if replication had been possible. Using data from East Long Lake and the reference lake alone, we were unable to disentangle any effects of the curtain installation, weather, P, DOC, and grazing. The analysis for East Long Lake does not suffer from a lack of replicates. It is impaired by the correlated changes in DOC, grazing, and P input.

Figure 3. Chlorophyll (Chl) (integrated from depth of 5% surface irradiance, mean of July values with 95% confidence intervals) versus phosphorus (P) input rate for Paul Lake and three experimentally manipulated lakes.

Figure 4. Likelihood ratios for models fit to time series from all experimental lakes. Each horizontal bar shows the likelihood of a model divided by the likelihood of the simple autoregression [AR(1)]. The dashed line shows the minimum likelihood ratio for significance at the 5% level. DOC, dissolved organic carbon.

Figure 5. Observed chlorophyll (Chl) versus predictions based on the best model fit to time series from all experimental lakes. The diagonal line shows observations = predictions. (A) Predictions for all three lakes (n = 261). (B) Predictions for East Long Lake only (n = 105).
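Figure 4 scores each model by its likelihood relative to a simple AR(1) autoregression. The sketch below shows one way such a comparison can be computed, assuming Gaussian errors, conditional least-squares fits, and the usual chi-square approximation in which 2 ln(LR) for nested models has as many degrees of freedom as the extra parameters; the fitting procedure and significance threshold used for Figure 4 may differ. The chlorophyll and P-input series are invented, and the function names (`log_lr_vs_ar1`, `min_likelihood_ratio`, `ols_rss`) are hypothetical.

```python
import math
import random


def chi2_cdf(x, k):
    """Chi-square CDF with k degrees of freedom, via the power series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, k):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def min_likelihood_ratio(extra_params, alpha=0.05):
    """Smallest L(model) / L(AR(1)) significant at level alpha,
    assuming 2 ln LR ~ chi-square(extra_params)."""
    return math.exp(chi2_quantile(1.0 - alpha, extra_params) / 2.0)


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def log_lr_vs_ar1(y, covariates):
    """ln[L(AR(1) + covariates) / L(AR(1))] with Gaussian errors and ML variance.

    Both models are fit by conditional least squares on y[1:], so the
    ratio reduces to (n / 2) * ln(RSS_ar1 / RSS_full)."""
    base = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [base[t - 1] + [c[t] for c in covariates] for t in range(1, len(y))]
    target = y[1:]
    n = len(target)
    return 0.5 * n * math.log(ols_rss(base, target) / ols_rss(full, target))


# Hypothetical series: a chlorophyll-like response driven by its own lag plus
# a P-input covariate (all numbers invented for illustration).
random.seed(1)
p_input = [random.uniform(0.0, 2.0) for _ in range(101)]
chl = [1.0]
for t in range(1, 101):
    chl.append(0.5 + 0.6 * chl[-1] + 0.8 * p_input[t] + random.gauss(0.0, 0.3))

log_lr = log_lr_vs_ar1(chl, [p_input])
threshold = min_likelihood_ratio(extra_params=1)
print(f"ln LR = {log_lr:.1f}; significant if LR > {threshold:.2f}: {log_lr > math.log(threshold)}")
```

Under this approximation, a model with one extra parameter needs a likelihood ratio of about 6.8 to beat AR(1) at the 5% level, and one with two extra parameters needs exactly 20, because the 95% chi-square quantile with 2 degrees of freedom is −2 ln 0.05. Working with ln(LR) rather than LR avoids overflow when a covariate explains much of the variance.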
When data from all three experimental lakes are analyzed, it is clear that chlorophyll dynamics are explainable by P input rate, DOC, grazing, and their interactions. These patterns could be detected because Peter and West Long Lakes offer important contrasts to East Long Lake. West Long Lake had large-bodied grazers and relatively low DOC. Peter Lake had small-bodied grazers and low DOC. The contrast between Peter and West Long Lakes revealed grazer effects. The contrast between East Long Lake and the other lakes revealed DOC effects. The contrast in P input rates over time in all three lakes revealed the P effect and its interactions with grazing and DOC. The three lakes were not replicates. Instead, they provided contrasting treatments that proved crucial for drawing conclusions.
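The point that East Long Lake was limited by correlated predictors rather than by a lack of replicates can be illustrated with the rank of a regression design matrix. In the sketch below, all predictor values are hypothetical: in the invented East Long Lake rows, DOC and grazing move in lockstep with P input, while the invented West Long and Peter Lake rows hold DOC low and grazer size fixed at contrasting values. The columns cover the main effects plus P × DOC and P × grazer interactions; this is an identifiability illustration, not the authors' model, and `matrix_rank` and `design_row` are hypothetical helpers.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a small matrix by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new independent direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                f = m[r][col] / m[rank][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(p_input, doc, grazer):
    """Columns: intercept, P, DOC, grazer, P x DOC, P x grazer."""
    return [1, p_input, doc, grazer, p_input * doc, p_input * grazer]


P_LEVELS = range(6)  # hypothetical P input levels over time
# Hypothetical East Long Lake: DOC and grazing shift in lockstep with P input.
east_long = [design_row(p, 1 + p, p) for p in P_LEVELS]
# Hypothetical contrasts: large grazers with low DOC; small grazers with low DOC.
west_long = [design_row(p, 0, 2) for p in P_LEVELS]
peter = [design_row(p, 0, 0) for p in P_LEVELS]

print("East Long alone:        ", matrix_rank(east_long))                      # 3 of 6
print("East Long, duplicated:  ", matrix_rank(east_long + east_long))          # still 3
print("Three contrasting lakes:", matrix_rank(east_long + west_long + peter))  # 6
```

Stacking a second copy of the East Long Lake rows, as replicates of the same confounded manipulation would, leaves the rank unchanged: replicates could sharpen variance estimates but cannot separate effects that always change together. Adding lakes with contrasting DOC and grazers makes the design full rank, so all six coefficients become estimable.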
Experiments designed to compare alternative models may differ from those designed to test the null hypothesis. When alternative models are considered, experiments will contain deliberate contrasts intended to differentiate among them. These contrasts may occur sequentially in time or among different experimental ecosystems. If multiple experimental ecosystems are available, it may be wasteful to use them as mere replicates to test the null hypothesis. It may be more instructive to use the ecosystems to examine important alternative models.
Although comparing alternative models may often be more important than testing the null hypothesis for ecosystem experiments, there are some situations in which replication to test the null
terized by difficulties of measuring time series or spatial variability, and potentially subtle effects of manipulation. For example, replication seems essential for answering some fisheries management questions (McAllister and Peterman 1992; Olson and others 1998). Even so, it may be important to distribute replicates across important environmental gradients so that several alternatives can be evaluated (Walters and others 1988; Walters and Holling 1990). In other ecosystem experiments, manipulation effects are large relative to routine variability, observation errors are small, and it is possible to measure detailed time series or spatial patterns. In these cases, the null hypothesis may be less useful than alternative ecological models.
ACKNOWLEDGMENTS
We thank our collaborators on these whole-lake experiments, especially K. L. Cottingham and D. E. Schindler. Referees and D. W. Schindler provided helpful comments. This work is supported by the National Science Foundation and the Andrew W. Mellon Foundation.

REFERENCES

Box GEP. 1966. Use and abuse of regression. Technometrics
Box GEP, Hunter WG, Hunter JS. 1978. Statistics for experimenters. New York: John Wiley and Sons.
Box GEP, Tiao GC. 1975. Intervention analysis with application to economic and environmental problems. J Am Stat Assoc
Carpenter SR. 1990. Large-scale perturbations: opportunities for innovation. Ecology 71:2038–43.
Carpenter SR. 1993. Statistical analyses of the ecosystem experiments. In: Carpenter SR, Kitchell JF, editors. The trophic cascade in lakes. London: Cambridge University Press. p 26–42.
Carpenter SR. 1996. Microcosm experiments have limited relevance for community and ecosystem ecology. Ecology 77:
Carpenter SR. 1998. The need for large-scale experiments to assess and predict the response of ecosystems to perturbation. In: Pace ML, Groffman PM, editors. Successes, limitations and frontiers in ecosystem science. New York: Springer-Verlag.
Carpenter SR, Chisholm SW, Krebs CJ, Schindler DW, Wright RF. 1995. Ecosystem experiments. Science 269:324–7.
Carpenter SR, Cole JJ, Kitchell JF, Pace ML. 1998. Impact of dissolved organic carbon, phosphorus and grazing on phytoplankton biomass and production in experimental lakes. Limnol Oceanogr 43:73–80.
Carpenter SR, Frost TM, Heisey D, Kratz TK. 1989. Randomized intervention analysis and the interpretation of whole-ecosystem experiments. Ecology 70:1142–52.
Carpenter SR, Kitchell JF. 1993. The trophic cascade in lakes. Cambridge: Cambridge University Press.
Carpenter SR, Kitchell JF, Cottingham KL, Schindler DE, Christensen DL, Post DM, Voichick N. 1996. Chlorophyll variability, nutrient input and grazing: evidence from whole-lake experiments. Ecology 77:725–35.
Christensen DL, Carpenter SR, Cole JJ, Cottingham KL, Knight SE, LeBouton JP, Pace ML, Schindler DE, Voichick N. 1996. Pelagic responses to changes in dissolved organic carbon following division of a seepage lake. Limnol Oceanogr 41:
Cottingham KL, Carpenter SR. 1998. Population, community and ecosystem variates as ecological indicators: phytoplankton response to whole-lake enrichment. Ecol Appl 8:508–30.
Crome FHJ, Thomas MR, Moore LA. 1996. A novel Bayesian approach to assessing impacts of rain forest logging. Ecol Appl
Gotelli NJ, Graves GR. 1996. Null models in ecology. Washington (DC): Smithsonian Institution Press.
Green RH. 1993. Application of repeated measures designs in environmental impact and monitoring studies. Aust J Ecol
Hilborn R, Mangel M. 1997. The ecological detective. Princeton (NJ): Princeton University Press.
Hurlbert SH. 1984. Pseudoreplication and the design of ecological field experiments. Ecol Monogr 54:187–211.
Ives AR, Carpenter SR, Dennis B. Interactions between species and the response of zooplankton to long-term experimental changes in planktivory. Ecology. Forthcoming.
Kass RE, Raftery AE. 1995. Bayes factors. J Am Stat Assoc
Kitchell JF, Eby EA, He X, Schindler DE, Wright RA. 1994. Predator–prey dynamics in an ecosystem context. J Fish Biol
Lawton JH. 1996. The Ecotron facility at Silwood Park: the value of “big bottle” experiments. Ecology 77:665–9.
Levin SA. 1992. The problem of pattern and scale in ecology.
Lodge DM, Blumenshine SC, Vadeboncoeur Y. 1997. Insights and application of large-scale, long-term ecological observations and experiments. In: Resetarits WJ, Bernardo J, editors. Issues and perspectives in experimental ecology. London: Oxford
McAllister MK, Peterman RM. 1992. Experimental design in the management of fisheries: a review. N Am J Fish Manage
Olson M, Carpenter SR, Cunningham P, Gafny S, Herwig BR, Nibbelink NP, Pellett T, Storlie C, Trebitz AS, Wilson KA. 1998. Managing macrophytes to improve fish growth. Fisheries
Pace ML. 1999. Getting it right and wrong: extrapolations across experimental scales. In: Gardner R, Kemp M, Peterson J, Kennedy V, editors. Scaling relations in experimental ecology. New York: Columbia University Press. Forthcoming.
Pace ML, Cole JJ. 1996. Regulation of bacteria by resources and predation tested in whole lake experiments. Limnol Oceanogr
Pace ML, Cole JJ, Carpenter SR. 1998. Trophic cascades and compensation: differential responses of microzooplankton in whole lake experiments. Ecology 79:138–52.
Pace ML, Groffman PM, editors. 1998. Successes, limitations and frontiers in ecosystem science. New York: Springer-Verlag.
Post DM, Carpenter SR, Christensen DL, Cottingham KL, Hodgson JR, Kitchell JF, Schindler DE. 1997. Seasonal effects of variable recruitment of a dominant piscivore on pelagic food web structure. Limnol Oceanogr 42:722–9.
Rasmussen PW, Heisey DM, Nordheim EV, Frost TM. 1993. Time-series intervention analysis: unreplicated large-scale experiments. In: Scheiner SM, Gurevitch J, editors. Design and analysis of ecological experiments. New York: Chapman and Hall. p 138–58.
Reckhow KH. 1990. Bayesian inference in non-replicated ecological studies. Ecology 71:2053–9.
Schindler DW. 1998. Replication versus realism: the necessity for ecosystem-scale experiments, replicated or not. Ecosystems 1.
Schindler DW, Mills KH, Malley DF, Findlay DL, Shearer JA, Davies IJ, Turner MA, Linsey GA, Cruikshank DR. 1985. Long-term ecosystem stress: the effects of years of experimental acidification on a small lake. Science 228:1395–401.
Schmitt RJ, Osenberg CW, editors. 1996. Detecting ecological impacts: concepts and applications in coastal habitats. San Diego (CA): Academic.
Stewart-Oaten A. 1996. Problems in the analysis of environmental monitoring data. In: Schmitt RJ, Osenberg CW, editors. Detecting ecological impacts: concepts and applications in coastal habitats. San Diego (CA): Academic. p 109–31.
Stewart-Oaten A, Murdoch WW, Parker KR. 1986. Environmental impact assessment: “pseudoreplication” in time? Ecology
Tilman GD. 1989. Ecological experimentation: strengths and conceptual problems. In: Likens GE, editor. Long-term studies in ecology. New York: Springer-Verlag. p 136–57.
Walters CJ, Collie JS, Webb T. 1988. Experimental designs for estimating transient responses to management disturbances. Can J Fish Aquat Sci 45:530–8.
Walters CJ, Holling CS. 1990. Large-scale management experiments and learning by doing. Ecology 71:2060–8.