Annu. Rev. Psychol. 2003. 54:115–44
doi: 10.1146/annurev.psych.54.101601.145124
Copyright © 2003 by Annual Reviews. All rights reserved
First published online as a Review in Advance on October 4, 2002
OPERANT CONDITIONING
J. E. R. Staddon and D. T. Cerutti
Department of Psychological and Brain Sciences, Duke University,
Durham, North Carolina 27708-0086; e-mail: staddon@psych.duke.edu,
cerutti@psych.duke.edu
Key Words interval timing, choice, concurrent schedules, matching law,
self-control
Abstract  Operant behavior is behavior “controlled” by its consequences. In practice,
operant conditioning is the study of reversible behavior maintained by reinforcement
schedules. We review empirical studies and theoretical approaches to two large
classes of operant behavior: interval timing and choice. We discuss cognitive versus
behavioral approaches to timing, the “gap” experiment and its implications, propor-
tional timing and Weber’s law, temporal dynamics and linear waiting, and the problem
of simple chain-interval schedules. We review the long history of research on operant
choice: the matching law, its extensions and problems, concurrent chain schedules, and
self-control. We point out how linear waiting may be involved in timing, choice, and
reinforcement schedules generally. There are prospects for a unified approach to all
these areas.
CONTENTS
INTRODUCTION
INTERVAL TIMING
WEBER’S LAW, PROPORTIONAL TIMING AND TIMESCALE INVARIANCE
  Cognitive and Behavioral Approaches to Timing
  The Gap Experiment
  Temporal Dynamics: Linear Waiting
  Chain Schedules
CHOICE
  Concurrent Schedules
  Concurrent-Chain Schedules
  Self-Control
CONCLUSION
INTRODUCTION
The term operant conditioning¹ was coined by B. F. Skinner in 1937 in the con-
text of reflex physiology, to differentiate what he was interested in—behavior that
affects the environment—from the reflex-related subject matter of the Pavlovians.
The term was novel, but its referent was not entirely new. Operant behavior, though
defined by Skinner as behavior “controlled by its consequences” is in practice little
different from what had previously been termed “instrumental learning” and what
most people would call habit. Any well-trained “operant” is in effect a habit. What
was truly new was Skinner’s method of automated training with intermittent
reinforcement and the subject matter of reinforcement schedules to which it led. Skinner
and his colleagues and students discovered in the ensuing decades a completely
unsuspected range of powerful and orderly schedule effects that provided new tools
for understanding learning processes and new phenomena to challenge theory.
A reinforcement schedule is any procedure that delivers a reinforcer to an
organism according to some well-defined rule. The usual reinforcer is food for
a hungry rat or pigeon; the usual schedule is one that delivers the reinforcer for
a switch closure caused by a peck or lever press. Reinforcement schedules have
also been used with human subjects, and the results are broadly similar to the
results with animals. However, for ethical and practical reasons, relatively weak
reinforcers must be used—and the range of behavioral strategies people can adopt
is of course greater than in the case of animals. This review is restricted to work
with animals.
Two types of reinforcement schedule have excited the most interest. Most pop-
ular are time-based schedules such as fixed and variable interval, in which the rein-
forcer is delivered after a fixed or variable time period after a time marker (usually
the preceding reinforcer). Ratio schedules require a fixed or variable number of
responses before a reinforcer is delivered.
Trial-by-trial versions of all these free-operant procedures exist. For example,
a version of the fixed-interval schedule specifically adapted to the study of interval
timing is the peak-interval procedure, which adds to the fixed interval an intertrial
interval (ITI) preceding each trial and a percentage of extra-long “empty” trials in
which no food is given.
Fortheoretical reasons, Skinner believed that operant behaviorought to involve
a response that can easily be repeated, such as pressing a lever, for rats, or pecking
an illuminated disk (key) for pigeons. The rate of such behavior was thought to be
important as a measure of response strength (Skinner 1938, 1966, 1986; Killeen &
Hall 2001). The current status of this assumption is one of the topics of this review.
True or not, the emphasis on response rate has resulted in a dearth of experimental
work by operant conditioners on nonrecurrent behavior such as movement in space.

¹ The first and only previous Annual Review contribution on this topic was as part of a 1965
article, “Learning, Operant Conditioning and Verbal Learning” by Blough & Millward.
Since then there have been (by our estimate) seven articles on learning or learning theory
in animals, six on the neurobiology of learning, and three on human learning and memory,
but this is the first full Annual Review article on operant conditioning. We therefore include
rather more old citations than is customary (for more on the history and philosophy of
Skinnerian behaviorism, both pro and con, see Baum 1994, Rachlin 1991, Sidman 1960,
Staddon 2001b, and Zuriff 1985).
Operant conditioning differs from other kinds of learning research in one im-
portant respect. The focus has been almost exclusively on what is called reversible
behavior, that is, behavior in which the steady-state pattern under a given schedule
is stable, meaning that in a sequence of conditions, XAXBXC...,where each con-
dition is maintained for enough days that the pattern of behavior is locally stable,
behavior under schedule X shows a pattern after one or two repetitions of X that
is always the same. For example, the first time an animal is exposed to a fixed-
interval schedule, after several daily sessions most animals show a “scalloped”
pattern of responding (call it pattern A): a pause after each food delivery—also
called wait time or latency—followed by responding at an accelerated rate until the
next food delivery. However, some animals show negligible wait time and a steady
rate (pattern B). If all are now trained on some other procedure—a variable-interval
schedule, for example—and then after several sessions are returned to the fixed-
interval schedule, almost all the animals will revert to pattern A. Thus, pattern A
is the stable pattern. Pattern B, which may persist under unchanging conditions
but does not recur after one or more intervening conditions, is sometimes termed
metastable (Staddon 1965). The vast majority of published studies in operant con-
ditioning are on behavior that is stable in this sense.
Although the theoretical issue is not a difficult one, there has been some confu-
sion about what the idea of stability (reversibility) in behavior means. It should be
obvious that the animal that shows pattern A after the second exposure to proce-
dure X is not the same animal as when it showed pattern A on the first exposure. Its
experimental history is different after the second exposure than after the first. If the
animal has any kind of memory, therefore, its internal state² following the second
exposure is likely to be different than after the first exposure, even though the
observed behavior is the same. The behavior is reversible; the organism’s internal
state in general is not. The problems involved in studying nonreversible phenom-
ena in individual organisms have been spelled out elsewhere (e.g., Staddon 2001a,
Ch. 1); this review is mainly concerned with the reversible aspects of behavior.
Once the microscope was invented, microorganisms became a new field of
investigation. Once automated operant conditioning was invented, reinforcement
schedules became an independent subject of inquiry. In addition to being of great
interest in their own right, schedules have also been used to study topics defined in
more abstract ways such as timing and choice. These two areas constitute the ma-
jority of experimental papers in operant conditioning with animal subjects during
the past two decades. Great progress has been made in understanding free-operant
choice behavior and interval timing. Yet several theories of choice still compete for
consensus, and much the same is true of interval timing. In this review we attempt
to summarize the current state of knowledge in these two areas, to suggest how
common principles may apply in both, and to show how these principles may also
apply to reinforcement schedule behavior considered as a topic in its own right.

² By “internal” we mean not “physiological” but “hidden.” The idea is simply that the
organism’s future behavior depends on variables not all of which are revealed in its current
behavior (cf. Staddon 2001b, Ch. 7).
INTERVAL TIMING
Interval timing is defined in several ways. The simplest is to define it as covariation
between a dependent measure such as wait time and an independent measure such
as interreinforcement interval (on fixed interval) or trial time-to-reinforcement (on
the peak procedure). When interreinforcement interval is doubled, then after a
learning period wait time also approximately doubles (proportional timing). This
is an example of what is sometimes called a time production procedure: The
organism produces an approximation to the to-be-timed interval. There are also explicit
time discrimination procedures in which on each trial the subject is exposed to a
stimulus and is then required to respond differentially depending on its absolute
(Church & Deluty 1977, Stubbs 1968) or even relative (Fetterman et al. 1989)
duration. For example, in temporal bisection, the subject (e.g., a rat) experiences
either a 10-s or a 2-s stimulus, L or S. After the stimulus goes off, the subject is
confronted with two choices. If the stimulus was L, a press on the left lever yields
food; if S, a right press gives food; errors produce a brief time-out. Once the animal
has learned, stimuli of intermediate duration are presented in lieu of S and L on
test trials. The question is, how will the subject distribute its responses? In particular,
at what intermediate duration will it be indifferent between the two choices?
[Answer: typically in the vicinity of the geometric mean, i.e., √(L·S) ≈ 4.47 for 2
and 10.]
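As a quick check of the arithmetic, a minimal sketch in Python (the durations come from the example in the text; nothing else here is part of the original procedure):

```python
import math

# Geometric-mean indifference point for the 2-s vs. 10-s bisection example.
S, L = 2.0, 10.0
print(f"sqrt(S * L) = {math.sqrt(S * L):.2f} s")  # ~4.47 s, as quoted in the text
```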
Wait time is a latency; hence (it might be objected) it may vary on time-
production procedures like fixed interval because of factors other than timing—
such as degree of hunger (food deprivation). Using a time-discrimination proce-
dure avoids this problem. It can also be mitigated by using the peak procedure
and looking at performance during “empty” trials. “Filled” trials terminate with
food reinforcement after (say) T s. “Empty” trials, typically 3T s long, contain
no food and end with the onset of the ITI. During empty trials the animal there-
fore learns to wait, then respond, then stop (more or less) until the end of the
trial (Catania 1970). The mean of the distribution of response rates averaged over
empty trials (peak time) is then perhaps a better measure of timing than wait time
because motivational variables are assumed to affect only the height and spread of
the response-rate distribution, not its mean. This assumption is only partially true
(Grace & Nevin 2000, MacEwen & Killeen 1991, Plowright et al. 2000).
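The peak-time summary just described can be illustrated with a small sketch; the Gaussian response-time generator, the 3T trial length, and all parameter values are illustrative assumptions, not data from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 30.0                                   # assumed time-to-food on "filled" trials
trial_length = 3 * T                       # "empty" trials are typically 3T long

# Hypothetical response times pooled over many empty trials.
response_times = rng.normal(loc=T, scale=0.2 * T, size=5000)
response_times = response_times[(response_times > 0) & (response_times < trial_length)]

# Peak time taken as the mean of the response-time distribution; the coefficient of
# variation indexes the scalar (Weber-law) property discussed in the next section.
peak_time = response_times.mean()
cv = response_times.std() / peak_time
print(f"peak time ~ {peak_time:.1f} s, coefficient of variation ~ {cv:.2f}")
```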
There is still some debate about the actual pattern of behavior on the peak
procedure in each individual trial. Is it just wait, respond at a constant rate, then
wait again? Or is there some residual responding after the “stop” [yes, usually (e.g.,
Church et al. 1991)]? Is the response rate between start and stop really constant
or are there two or more identifiable rates (Cheng & Westwood 1993, Meck et al.
1984)? Nevertheless, the method is still widely used, particularly by researchers
in the cognitive/psychophysical tradition. The idea behind this approach is that
interval timing is akin to sensory processes such as the perception of sound intensity
(loudness) or luminance (brightness). As there is an ear for hearing and an eye for
seeing, so (it is assumed) there must be a (real, physiological) clock for timing.
Treisman (1963) proposed the idea of an internal pacemaker-driven clock in the
context of human psychophysics. Gibbon (1977) further developed the approach
and applied it to animal interval-timing experiments.
WEBER’S LAW, PROPORTIONAL TIMING
AND TIMESCALE INVARIANCE
The major similarity between acknowledgedsensory processes, such as brightness
perception, and interval timing is Weber’s law. Peak time on the peak procedure
is not only proportional to time-to-food (T), its coefficient of variation (standard
deviation divided by mean) is approximately constant, a result similar to Weber’s
law obeyed by most sensory dimensions. This property has been called scalar
timing (Gibbon 1977). Most recently, Gallistel & Gibbon (2000) have proposed a
grand principle of timescale invariance, the idea that the frequency distribution of
any given temporal measure (the idea is assumed to apply generally, though in fact
most experimental tests have used peak time) scales with the to-be-timed-interval.
Thus, given the normalized peak-time distribution for T =60 s, say; if the x-axis
is divided by 2, it will match the distribution for T= 30 s. In other words, the
frequency distribution for the temporal dependent variable, normalized on both
axes, is asserted to be invariant.
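A minimal sketch of the superposition test this claim implies; the simulated distributions (Gaussian, with a constant coefficient of variation of 0.2) are stand-ins for real peak-time data:

```python
import numpy as np

rng = np.random.default_rng(1)

def normalized_peak_distribution(T, cv=0.2, n=5000, n_bins=40):
    """Peak-time histogram with the time axis divided by T and the height by its maximum."""
    times = rng.normal(T, cv * T, n)
    hist, _ = np.histogram(times / T, bins=np.linspace(0, 2, n_bins + 1))
    return hist / hist.max()

d30 = normalized_peak_distribution(30.0)
d60 = normalized_peak_distribution(60.0)
# If timescale invariance held exactly, the two normalized curves would superimpose.
print("max difference between normalized curves:", round(float(np.abs(d30 - d60).max()), 3))
```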
Timescale invariance is in effect a combination of Weber’s law and proportional
timing. Like those principles, it is only approximately true. There are three kinds
of evidence that limit its generality. The simplest is the steady-state pattern of
responding (key-pecking or lever-pressing) observed on fixed-interval reinforce-
ment schedules. This pattern should be the same at all fixed-interval values, but it
is not. Gallistel & Gibbon wrote, “When responding on such a schedule, animals
pause after each reinforcement and then resume responding after some interval
has elapsed. It was generally supposed that the animals’ rate of responding ac-
celerated throughout the remainder of the interval leading up to reinforcement.
In fact, however, conditioned responding in this paradigm ...is a two-state vari-
able (slow, sporadic pecking vs. rapid, steady pecking), with one transition per
interreinforcement interval (Schneider 1969)” (p. 293).
This conclusion over-generalizes Schneider’s result. Reacting to reports of
“break-and-run” fixed-interval performance under some conditions, Schneider
sought to characterize this feature more objectively than the simple inspection
of cumulative records. He found a way to identify the point of maximum acceler-
ation in the fixed-interval “scallop” by using an iterative technique analogous to
attaching an elastic band to the beginning of an interval and the end point of the
cumulative record, then pushing a pin, representing the break point, against the
middle of the band until the two resulting straight-line segments best fit the cu-
mulative record (there are other ways to achieve the same result that do not fix the
end points of the two line-segments). The postreinforcement time (x-coordinate)
of the pin then gives the break point for that interval. Schneider showed that the
break point is an orderly dependent measure: Break point is roughly 0.67 of inter-
val duration, with standard deviation proportional to the mean (the Weber-law or
scalar property).
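A rough computational analogue of Schneider's rubber-band method is sketched below. The details (fixing the joint at the observed data point, minimizing squared error by exhaustive search) are our assumptions; only the general idea—two straight-line segments with fixed end points, joined at the break point—comes from the text, and the example record is invented.

```python
import numpy as np

def break_point(times, cum_responses):
    """Two-segment fit to a cumulative record with both end points fixed;
    returns the candidate break point with the smallest squared error."""
    t0, t_end = times[0], times[-1]
    y0, y_end = cum_responses[0], cum_responses[-1]
    best_t, best_err = None, np.inf
    for tb, yb in zip(times[1:-1], cum_responses[1:-1]):
        pred = np.where(times <= tb,
                        y0 + (yb - y0) * (times - t0) / (tb - t0),
                        yb + (y_end - yb) * (times - tb) / (t_end - tb))
        err = float(np.sum((cum_responses - pred) ** 2))
        if err < best_err:
            best_t, best_err = tb, err
    return best_t

# Invented 60-s fixed-interval record: no responding for 40 s, then 2 responses/s.
times = np.linspace(0.0, 60.0, 61)
cum = np.where(times < 40, 0.0, (times - 40) * 2.0)
print("break point:", break_point(times, cum), "s")  # ~40 s, about 0.67 of the interval
```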
This finding is by no means the same as the idea that the fixed-interval scallop is
“a two-state variable” (Hanson & Killeen 1981). Schneider showed that a two-state
model is an adequate approximation; he did not show that it is the best or truest
approximation. A three- or four-line approximation (i.e., two or more pins) might
well have fit significantly better than the two-line version. To show that the process
is two-state, Schneider would have had to show that adding additional segments
produced negligibly better fit to the data.
The frequent assertion that the fixed-interval scallop is always an artifact of
averaging flies in the face of raw cumulative-record data—the many nonaveraged
individual fixed-interval cumulative records in Ferster & Skinner (1957, e.g., pp.
159, 160, 162), which show clear curvature, particularly at longer fixed-interval
values (> 2 min). The issue for timescale invariance, therefore, is whether the
shape, or relative frequency of different-shaped records, is the same at different
absolute intervals.
The evidence is that there is more, and more frequent, curvature at longer
intervals. Schneider’s data show this effect. In Schneider’s Figure 3, for example,
the time to shift from low to high rate is clearly longer at longer intervals than
shorter ones. On fixed-interval schedules, apparently, absolute duration does affect
the pattern of responding. (A possible reason for this dependence of the scallop on
fixed-interval value is described in Staddon 2001a, p. 317. The basic idea is that
greater curvature at longer fixed-interval values follows from two things: a linear
increase in response probability across the interval, combined with a nonlinear,
negatively accelerated, relation between overall response rate and reinforcement
rate.) If there is a reliable difference in the shape, or distribution of shapes, of
cumulative records at long and short fixed-interval values, the timescale-invariance
principle is violated.
A second dataset that does not agree with timescale invariance is an extensive
set of studies on the peak procedure by Zeiler & Powell (1994; see also Hanson &
Killeen 1981), who looked explicitly at the effect of interval duration on various
measures of interval timing. They conclude, “Quantitative properties of temporal
control depended on whether the aspect of behavior considered was initial pause
duration, the point of maximum acceleration in responding [break point], the point
ofmaximum deceleration, the point atwhichresponding stopped, or severaldiffer-
ent statistical derivations of a point of maximum responding .... Existing theory
does not explain why Weber’s law [the scalar property] so rarely fit the results ...”
(p. 1; see also Lowe et al. 1979, Wearden 1985 for other exceptions to proportional-
ity between temporal measures of behavior and interval duration). Like Schneider
(1969) and Hanson & Killeen (1981), Zeiler & Powell found that the break point
measure was proportional to interval duration, with scalar variance (constant co-
efficient of variation), and thus consistent with timescale invariance, but no other
measure fit the rule.
Moreover, the fit of the breakpoint measure is problematic because it is not a
direct measure of behavior but is itself the result of a statistical fitting procedure.
It is possible, therefore, that the fit of breakpoint to timescale invariance owes as
much to the statistical method used to arrive at it as to the intrinsic properties of
temporal control. Even if this caveat turns out to be false, the fact that every other
measure studied by Zeiler & Powell failed to conform to timescale invariance
surely rules it out as a general principle of interval timing.
The third and most direct test of the timescale invariance idea is an extensive
series of time-discrimination experiments carried out by Dreyfus et al. (1988) and
Stubbs et al. (1994). The usual procedure in these experiments was for pigeons to
peck a center response key to produce a red light of one duration that is followed
immediately by a green light of another duration. When the green center-key light
goes off, two yellow side-keys light up. The animals are reinforced with food for
pecking the left side-key if the red light was longer, the right side-key if the green
light was longer.
The experimental question is, how does discrimination accuracy depend on rel-
ative and absolute duration of the two stimuli? Timescale invariance predicts that
accuracy depends only on the ratio of red and green durations: For example, accu-
racy should be the same following the sequence red:10, green:20 as the sequence
red:30, green:60, but it is not. Pigeons are better able to discriminate between
the two short durations than the two long ones, even though their ratio is the
same. Dreyfus et al. and Stubbs et al. present a plethora of quantitative data of the
same sort, all showing that time discrimination depends on absolute as well as
relative duration.
Timescale invariance is empirically indistinguishable from Weber’s law as it
applies to time, combined with the idea of proportional timing: The mean of a
temporal dependent variable is proportional to the temporal independent variable.
But Weber’s law and proportional timing are dissociable—it is possible to have
proportional timing without conforming to Weber’s law and vice versa (cf. Hanson
& Killeen 1981, Zeiler & Powell 1994), and in any case both are only approximately
true. Timescale invariance therefore does not qualify as a principle in its own right.
Cognitive and Behavioral Approaches to Timing
The cognitive approach to timing dates from the late 1970s. It emphasizes the psy-
chophysical properties of the timing process and the use of temporal dependent
variables as measures of (for example) drug effects and the effects of physio-
logical interventions. It de-emphasizes proximal environmental causes. Yet when
timing (then called temporal control; see Zeiler 1977 for an early review) was
first discovered by operant conditioners (Pavlov had studied essentially the same
phenomenon—delay conditioning—many years earlier), the focus was on the time
marker, the stimulus that triggered the temporally correlated behavior. (That is one
virtue of the term control: It emphasizes the fact that interval timing behavior is
usually not free-running. It must be cued by some aspect of the environment.)
On so-called spaced-responding schedules, for example, the response is the time
marker: The subject must learn to space its responses more than T s apart to get
food. On fixed-interval schedules the time marker is reinforcer delivery; on the
peak procedure it is the stimulus events associated with trial onset. This depen-
dence on a time marker is especially obvious on time-production procedures, but on
time-discrimination procedures the subject’s choice behavior must also be under
the control of stimuli associated with the onset and offset of the sample duration.
Not all stimuli are equally effective as time markers. For example, an early
study by Staddon & Innis (1966a; see also 1969) showed that if, on alternate fixed
intervals, 50% of reinforcers (F) are omitted and replaced by a neutral stimulus
(N) of the same duration, wait time following N is much shorter than after F
(the reinforcement-omission effect). Moreover, this difference persists indefinitely.
Despite the fact that F and N have the same temporal relationship to the reinforcer, F
is much more effective as a time marker than N. No exactly comparable experiment
has been done using the peak procedure, partly because the time marker there
involves ITI offset/trial onset rather than the reinforcer delivery, so that there is no
simple manipulation equivalent to reinforcement omission.
These effects do not depend on the type of behavior controlled by the time
marker. On fixed-interval schedules the time marker is in effect inhibitory: Re-
sponding is suppressed during the wait time and then occurs at an accelerating
rate. Other experiments (Staddon 1970, 1972), however, showed that given the ap-
propriate schedule, the time marker can control a burst of responding (rather than
a wait) of a duration proportional to the schedule parameters (temporal go–no-go
schedules) and later experiments have shown that the place of responding can be
controlled by time since trial onset in the so-called tri-peak procedure (Matell &
Meck 1999).
A theoretical review (Staddon 1974) concluded, “Temporal control by a given
time marker depends on the properties of recall and attention, that is, on the
same variables that affect attention to compound stimuli and recall in memory
experiments such as delayed matching-to-sample.” By far the most important
variable seems to be “the value of the time-marker stimulus—Stimuli of high
value ... are more salient ...” (p. 389), although the full range of properties that
determine time-marker effectiveness is yet to be explored.
Reinforcement omission experiments are transfer tests, that is, tests to identify
the effective stimulus. They pinpoint the stimulus property controlling interval
timing—the effective time marker—by selectively eliminating candidate proper-
ties. For example, in a definitive experiment, Kello (1972) showed that on fixed
interval the wait time is longest following standard reinforcer delivery (food hopper
activated with food, hopper light on, houselight off, etc.). Omission of any of those
elements caused the wait time to decrease, a result consistent with the hypothesis
that reinforcer delivery acquires inhibitory temporal control over the wait time.
The only thing that makes this situation different from the usual generalization
experiment is that the effects of reinforcement omission are relatively permanent.
In the usual generalization experiment, delivery of the reinforcer according to the
same schedule in the presence of both the training stimulus and the test stimuli
would soon lead all to be responded to in the same way. Not so with temporal
control: As we just saw, even though N and F events have the same temporal re-
lationship to the next food delivery, animals never learn to respond similarly after
both. The only exception is when the fixed-interval is relatively short, on the order
of 20 s or less (Starr & Staddon 1974). Under these conditions pigeons are able to
use a brief neutral stimulus as a time marker on fixed interval.
The Gap Experiment
The closest equivalent to fixed-interval reinforcement–omission using the peak
procedure is the so-called gap experiment (Roberts 1981). In the standard gap
paradigm the sequence of stimuli in a training trial (no gap stimulus) consists of
three successive stimuli: the intertrial interval stimulus (ITI), the fixed-duration
trial stimulus (S), and food reinforcement (F), which ends each training trial.
The sequence is thus ITI, S, F, ITI. Training trials are typically interspersed with
empty probe trials that last longer than reinforced trials but end with an ITI only
and no reinforcement. The stimulus sequence on such trials is ITI, S, ITI, but
the S is two or three times longer than on training trials. After performance has
stabilized, gap trials are introduced into some or all of the probe trials. On gap
trials the ITI stimulus reappears for a while in the middle of the trial stimulus.
The sequence on gap trials is therefore ITI, S, ITI, S, ITI. Gap trials do not end in
reinforcement.
What is the effective time marker (i.e., the stimulus that exerts temporal con-
trol) in such an experiment? ITI offset/trial onset is the best temporal predictor of
reinforcement: Its time to food is shorter and less variable than any other experi-
mental event. Most but not all ITIs follow reinforcement, and the ITI itself is often
variable in duration and relatively long. So reinforcer delivery is a poor temporal
predictor. The time marker therefore has something to do with the transition be-
tween ITI and trial onset, between ITI and S. Gap trials also involve presentation
of the ITI stimulus, albeit with a different duration and within-trial location than
the usual ITI, but the similarities to a regular trial are obvious. The gap experiment
is therefore a sort of generalization (of temporal control) experiment. Buhusi &
Meck (2000) presented gap stimuli more or less similar to the ITI stimulus during
probe trials and found results resembling generalization decrement, in agreement
with this analysis.
However, the gap procedure was not originally thought of as a generalization
test, nor is it particularly well designed for that purpose. The gap procedure arose
directly from the cognitive idea that interval timing behavior is driven by an internal
clock (Church 1978). From this point of view it is perfectly natural to inquire about
the conditions under which the clock can be started or stopped. If the to-be-timed
interval is interrupted—a gap—will the clock restart when the trial stimulus returns
(reset)? Will it continue running during the gap and afterwards? Or will it stop and
then restart (stop)?
“Reset” corresponds to the maximum rightward shift (from trial onset) of the
response-rate peak from its usual position t s after trial onset to t + G_E, where G_E
is the offset time (end) of the gap stimulus. Conversely, no effect (clock keeps
running) leaves the peak unchanged at t, and “stop and restart” is an intermediate
result, a peak shift to G_E − G_B + t, where G_B is the time of onset (beginning) of
the gap stimulus.
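These three predictions are simple enough to write down directly; a minimal sketch with arbitrary example values (t = 30 s, gap from 10 s to 15 s):

```python
# Predicted peak times (s after trial onset) under the three hypotheses in the text.
def gap_predictions(t, gap_onset, gap_offset):
    return {
        "run (no effect)":  t,                            # clock keeps running: peak stays at t
        "stop and restart": gap_offset - gap_onset + t,   # peak shifted by the gap duration
        "reset":            t + gap_offset,               # clock restarts at gap offset
    }

print(gap_predictions(t=30.0, gap_onset=10.0, gap_offset=15.0))
# -> run: 30.0, stop and restart: 35.0, reset: 45.0
```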
Both gap duration and placement within a trial have been varied. The results that
have been obtained so far are rather complex (cf. Buhusi & Meck 2000, Cabeza
de Vaca et al. 1994, Matell & Meck 1999). In general, the longer the gap and the
later it appears in the trial, the greater the rightward peak shift. All these effects
can be interpreted in clock terms, but the clock view provides no real explanation
for them, because it does not specify which one will occur under a given set of
conditions. The results of gap experiments can be understood in a qualitative way
in terms of the similarity of the gap presentation to events associated with trial
onset; the more similar, the closer the effect will be to reset, i.e., the onset of a new
trial. Another resemblance between gap results and the results of reinforcement-
omission experiments is that the effects of the gap are also permanent: Behavior
on later trials usually does not differ from behavior on the first few (Roberts
1981). These effects have been successfully simulated quantitatively by a neural
network timing model (Hopson 1999, 2002) that includes the assumption that
the effects of time-marker presentation decay with time (Cabeza de Vaca et al.
1994).
The original temporal control studies were strictly empirical but tacitly ac-
cepted something like the psychophysical view of timing. Time was assumed to
be a sensory modality like any other, so the experimental task was simply to
explore the different kinds of effect, excitatory, inhibitory, discriminatory, that
could come under temporal control. The psychophysical view was formalized by
Gibbon (1977) in the context of animal studies, and this led to a static information-
processing model, scalar expectancy theory (SET: Gibbon & Church 1984, Meck
1983, Roberts 1983), which comprised a pacemaker-driven clock, working and
reference memories, a comparator, and various thresholds. A later dynamic ver-
sion added memory for individual trials (see Gallistel 1990 for a review). This
approach led to a long series of experimental studies exploring the clocklike prop-
erties of interval timing (see Gallistel & Gibbon 2000, Staddon & Higa 1999 for
reviews), but none of these studies attempted to test the assumptions of the SET
approach in a direct way.
SET was for many years the dominant theoretical approach to interval timing.
In recent years, however, its limitations, of parsimony and predictive range, have
become apparent and there are now a number of competitors such as the behav-
ioral theory of timing (Killeen & Fetterman 1988, MacEwen & Killeen 1991,
Machado 1997), spectral timing theory (Grossberg & Schmajuk 1989), neural net-
work models (Church & Broadbent 1990, Hopson 1999, Dragoi et al. 2002), and
the habituation-based multiple time scale theory (MTS: Staddon & Higa 1999,
Staddon et al. 2002). There is as yet no consensus on the best theory.
Temporal Dynamics: Linear Waiting
A separate series of experiments in the temporal-control tradition, beginning in
the late 1980s, studied the real-time dynamics of interval timing (e.g., Higa et al.
1991, Lejeune et al. 1997, Wynne & Staddon 1988; see Staddon 2001a for a
review). These experiments have led to a simple empirical principle that may have
wide application. Most of these experiments used the simplest possible timing
schedule, a response-initiated delay (RID) schedule³. In this schedule the animal
(e.g., a pigeon) can respond at any time, t, after food. The response changes the
key color and food is delivered after a further T s. Time t is under the control of
the animal; time T is determined by the experimenter. These experiments have
shown that wait time on these and similar schedules (such as fixed interval) is
strongly determined by the duration of the previous interfood interval (IFI). For
example, wait time will track a cyclic sequence of IFIs, intercalated at a random
point in a sequence of fixed (t + T = constant) intervals, with a lag of one interval;
a single short IFI is followed by a short wait time in the next interval (the effect of a
single long interval is smaller), and so on (see Staddon et al. 2002 for a review and
other examples of temporal tracking). To a first approximation, these results are
consistent with a linear relation between wait time in IFI N + 1 and the duration
of IFI N:
t(N + 1) = a[T(N) + t(N)] + b = aI(N) + b,   (1)
where I is the IFI, a is a constant less than one, and b is usually negligible. This
relation has been termed linear waiting (Wynne & Staddon 1988). The principle
is an approximation: an expanded model, incorporating the multiple time scale
theory, allows the principle to account for the slower effects of increases as opposed
to decreases in IFI (see Staddon et al. 2002).
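Equation 1 is easy to simulate. The sketch below applies it to an invented IFI series with a short cyclic run embedded in a string of fixed 60-s intervals; the parameter values (a = 0.3, b = 0.5) are arbitrary, chosen only to make the lag-one tracking visible:

```python
def linear_waiting(ifis, a=0.3, b=0.5, t_initial=1.0):
    """Wait times implied by Equation 1: t(N+1) = a*I(N) + b."""
    waits = [t_initial]                      # wait in the first interval (arbitrary start)
    for previous_ifi in ifis[:-1]:
        waits.append(a * previous_ifi + b)   # lag-one dependence on the preceding IFI
    return waits

ifis = [60] * 4 + [30, 45, 60, 45, 30] + [60] * 4   # cyclic run embedded in fixed IFIs
for n, (ifi, wait) in enumerate(zip(ifis, linear_waiting(ifis))):
    print(f"interval {n:2d}: IFI = {ifi:2d} s, wait = {wait:4.1f} s")
```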
Most importantly for this discussion, the linear waiting principle appears to
be obligatory. That is, organisms seem to follow the linear waiting rule even if
they delay or even prevent reinforcer delivery by doing so. The simplest example
is the RID schedule itself. Wynne & Staddon (1988) showed that it makes no
difference whether the experimenter holds delay time T constant or the sum of t +
T constant (t + T = K): Equation 1 holds in both cases, even though the optimal
(reinforcement-rate-maximizing) strategy in the first case is for the animal to set
t equal to zero, whereas in the second case reinforcement rate is maximized so
long as t < K. Using a version of RID in which T in interval N + 1 depended on
the value of t in the preceding interval, Wynne & Staddon also demonstrated two
kinds of instability predicted by linear waiting.

³ When there is no response-produced stimulus change, this procedure is also called a con-
junctive fixed-ratio fixed-time schedule (Shull 1970).
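To make the optimal-strategy comparison concrete, here is a sketch of the steady states implied by Equation 1 under the two procedures; the parameter values are illustrative, and the algebra (solving t = a(t + T) + b for the fixed-T case) is ours, not the authors'.

```python
def steady_state_fixed_T(T, a=0.5, b=0.0):
    """Fixed-T RID: IFI = t + T, so the fixed point of Equation 1 is t = (aT + b)/(1 - a)."""
    t = (a * T + b) / (1 - a)
    return t, 1.0 / (t + T), 1.0 / T          # wait, obtained rate, optimal rate (t = 0)

def steady_state_fixed_sum(K, a=0.5, b=0.0):
    """Fixed t + T = K: IFI = K whatever the animal does (as long as t < K)."""
    t = a * K + b
    return t, 1.0 / K, 1.0 / K                # waiting costs nothing here

for label, (t, obtained, optimal) in [("hold T = 30 s constant", steady_state_fixed_T(30.0)),
                                      ("hold t + T = 30 s constant", steady_state_fixed_sum(30.0))]:
    print(f"{label}: wait = {t:.0f} s, obtained rate = {obtained:.3f}/s, optimal = {optimal:.3f}/s")
```

The first line of output shows the animal giving up reinforcement rate by waiting; the second shows waiting with no cost—yet Equation 1 describes behavior in both cases, which is the sense in which linear waiting is obligatory.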
The fact that linear waiting is obligatory allows us to look for its effects on
schedules other than the simple RID schedule. The most obvious application is to
ratio schedules. The time to emit a fixed number of responses is approximately
constant; hence the delay to food after the first response in each interval is also
approximately constant on fixed ratio (FR), as on fixed-T RID (Powell 1968). Thus,
the optimal strategy on FR, as on fixed-T RID, is to respond immediately after food.
However, in both cases animals wait before responding and, as one might expect
based on the assumption of a roughly constant interresponse time on all ratio
schedules, the duration of the wait on FR is proportional to the ratio requirement
(Powell 1968), although longer than on a comparable chain-type schedule with the
same interreinforcement time (Crossman et al. 1974). The phenomenon of ratio
strain—the appearance of long pauses and even extinction on high ratio schedules
(Ferster & Skinner 1957)—may also have something to do with obligatory linear
waiting.
Chain Schedules
A chain schedule is one in which a stimulus change, rather than primary reinforce-
ment, is scheduled. Thus, a chain fixed-interval–fixed-interval schedule is one in
which, for example, food reinforcement is followed by the onset of a red key light
in the presence of which, after a fixed interval, a response produces a change to
green. In the presence of green, food delivery is scheduled according to another
fixed interval. RID schedules resemble two-link chain schedules. The first link is
time t, before the animal responds; the second link is time T, after a response. We
may expect, therefore, that waiting time in the first link of a two-link schedule will
depend on the duration of the second link. We describe two results consistent with
this conjecture and then discuss some exceptions.
Davison (1974) studied a two-link chain fixed-interval–fixed-interval schedule.
Each cycle of the schedule began with a red key. Responding was reinforced, on
fixed-interval I_1 s, by a change in key color from red to white. In the presence of
white, food reinforcement was delivered according to fixed-interval I_2 s, followed
by reappearance of the red key. Davison varied I_1 and I_2 and collected steady-state
rate, pause, and link-duration data. He reported that when programmed second-
link duration was long in relation to the first-link duration, pause in the first link
sometimes exceeded the programmed link duration. The linear waiting predictions
for this procedure can therefore be most easily derived for those conditions where
the second link is held constant and the first link duration is varied (because under
these conditions, the first-link pause was always less than the programmed first-link
duration). The prediction for the terminal link is

t_2 = aI_2,   (2)
where a is the proportionality constant, I_2 is the duration of the terminal-link
fixed-interval, and t_2 is the pause in the terminal link. Because I_2 is constant in
this phase, t_2 is also constant. The pause in the initial link is given by

t_1 = a(I_1 + I_2) = aI_1 + aI_2,   (3)

where I_1 is the duration of the first link. Because I_2 is constant, Equation 3 is a
straight line with slope a and positive y-intercept aI_2.
Linear waiting theory can be tested with Davison’s data by plotting, for every
condition, t_1 and t_2 versus time-to-reinforcement (TTR); that is, plot pause in
each link against TTR for that link in every condition. Linear waiting makes a
straightforward prediction: All the data points for both links should lie on the
same straight line through the origin (assuming that b ≈ 0). We show this plot in
Figure 1. There is some variability, because the data points are individual subjects,
not averages, but points from first and second links fit the same line, and the
deviations do not seem to be systematic.
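A sketch of this test using Equations 2 and 3 directly (predicted rather than obtained pauses; the value a = 0.4 and the set of conditions are invented): every predicted point has the same pause/TTR ratio, i.e., falls on one line through the origin.

```python
a = 0.4                                                              # illustrative pause fraction
conditions = [(30, 30), (60, 30), (120, 30), (30, 60), (30, 120)]    # (I1, I2) in seconds

for I1, I2 in conditions:
    t2 = a * I2              # Equation 2: terminal-link pause; TTR there is I2
    t1 = a * (I1 + I2)       # Equation 3: initial-link pause; TTR there is I1 + I2
    print(f"I1 = {I1:3d}, I2 = {I2:3d}: terminal (TTR {I2:3d}, pause {t2:4.0f}), "
          f"initial (TTR {I1 + I2:3d}, pause {t1:4.0f}), pause/TTR = {t1 / (I1 + I2):.2f}")
```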
Figure 1  Steady-state pause duration plotted against actual time to reinforcement in
the first and second links of a two-link chain schedule. Each data point is from a single
pigeon in one experimental condition (three data points from an incomplete condition
are omitted). (From Davison 1974, Table 1)

A study by Innis et al. (1993) provides a dynamic test of the linear waiting hy-
pothesis as applied to chain schedules. Innis et al. studied two-link chain schedules
with one link of fixed duration and the other varying from reinforcer to reinforcer
according to a triangular cycle. The dependent measure was pause in each link.
Their Figure 3, for example, shows the programmed and actual values of the sec-
ond link of the constant-cycle procedure (i.e., the first link was a constant 20 s;
the second link varied from 5 to 35 s according to the triangular cycle) as well as
the average pause, which clearly tracks the change in second-link duration with
a lag of one interval. They found similar results for the reverse procedure, cycle-
constant, in which the first link varied cyclically and the second link was constant.
The tracking was a little better in the first procedure than in the second, but in both
cases first-link pause was determined primarily by TTR.
There are some data suggesting that linear waiting is not the only factor that
determines responding on simple chain schedules. In the four conditions of Davi-
son’s experiment in which the programmed durations of the first and second links
added to a constant (120 s)—which implies a constant first-link pause according
to linear waiting—pause in the first link covaried with first-link duration, although
the data are noisy.
The alternative to the linear waiting account of responding on chain sched-
ules is an account in terms of conditioned reinforcement (also called secondary
reinforcement)—the idea that a stimulus paired with a primary reinforcer acquires
some independent reinforcing power. This idea is also the organizing principle be-
hind most theories of free-operant choice. There are some data that seem to imply
a response-strengthening effect quite apart from the linear waiting effect, but they
do not always hold up under closer inspection. Catania et al. (1980) reported that
“higher rates of pecking were maintained by pigeons in the middle component
of three-component chained fixed-interval schedules than in that component of
the corresponding multiple schedule (two extinction components followed by a
fixed-interval component)” (p. 213), but the effect was surprisingly small, given
that no responding at all was required in the first two components. Moreover,
results of a more critical control condition, chain versus tandem (rather than multi-
ple) schedule, were the opposite: Rate was generally higher in the middle tandem
component than in the second link of the chain. (A tandem schedule is one with
the same response contingencies as a chain but with the same stimulus present
throughout.)
Royalty et al. (1987) introduced a delay into the peck-stimulus-change contin-
gency of a three-link variable-interval chain schedule and found large decreases
in response rate [wait time (WT) was not reported] in both first and second links.
They concluded that “because the effect of delaying stimulus change was compa-
rable to the effect of delaying primary reinforcement in a simple variable-interval
schedule ...the results provide strong evidence for the concept of conditioned re-
inforcement” (p. 41). The implications of the Royalty et al. data for linear waiting
are unclear, however, (a) because the linear waiting hypothesis does not deal with
the assignment-of-credit problem, that is, the selection ofthe appropriate response
by the schedule. Linear waiting makes predictions about response timing—when
the operant response occurs—but not about which response will occur. Response-
reinforcer contiguity may be essential for the selection of the operant response
in each chain link (as it clearly is during “shaping”), and diminishing contiguity
may reduce response rate, but contiguity may play little or no role in the timing
of the response. The idea of conditioned reinforcement may well apply to the first
function but not to the second. (b) Moreover, Royalty et al. did not report ob-
tained time-to-reinforcement data; the effect of the imposed delay may therefore
have been via an increase in component duration rather than directly on response
rate.
Williams & Royalty (1990) explicitly compared conditioned reinforcement and
time to reinforcement as explanations for chain schedule performance in three-
link chains and concluded “that time to reinforcement itself accounts for little
if any variance in initial-link responding” (p. 381) but not timing, which was
not measured. However, these data are from chain schedules with both variable-
interval and fixed-interval links, rather than fixed-interval only, and with respect
to response rate rather than pause measures. In a later paper Williams qualified
this claim: “The effects of stimuli in a chain schedule are due partly to the time
to food correlated with the stimuli and partly to the time to the next conditioned
reinforcer in the sequence” (1997, p. 145).
The conclusion seems to be that linear waiting plays a relatively major, and
conditioned reinforcement (however defined) a relatively minor, role in the de-
termination of response timing on chain fixed-interval schedules. Linear waiting
also provides the best available account of a striking, unsolved problem with chain
schedules: the fact that in chains with several links, pigeon subjects may respond
at a low level or even quit completely in early links (Catania 1979, Gollub 1977).
On fixed-interval chain schedules with five or more links, responding in the early
links begins to extinguish and the overall reinforcement rate falls well below
the maximum possible—even if the programmed interreinforcement interval is
relatively short (e.g., 6 × 15 = 90 s). If the same stimulus is present in all links
(tandem schedule), or if the six different stimuli are presented in random order
(scrambled-stimuli chains), performance is maintained in all links and the overall
reinforcement rate is close to the maximum possible (6I, where I is the interval
length). Other studies have reported very weak responding in early components
of a simple chain fixed-interval schedule (e.g., Catania et al. 1980, Davison 1974,
Williams 1994; review in Kelleher & Gollub 1962). These studies found that chains
with as few as three fixed-interval 60-s links (Kelleher & Fry 1962) occasionally
produce extreme pausing in the first link. No formal theory of the kind that has
proliferated to explain behavior on concurrent chain schedules (discussed below)
has been offered to account for these strange results, even though they have been
well known for many years.
The informal suggestion is that the low or zero response rates maintained by
early components of a multi-link chain are a consequence of the same discrim-
ination process that leads to extinction in the absence of primary reinforcement.
Conversely, the stimulus at the end of the chain that is actually paired with primary
reinforcement is assumed to be a conditioned reinforcer; stimuli in the middle
sustain responding because they lead to production of a conditioned reinforcer
(Catania et al. 1980, Kelleher & Gollub 1962). Pairing also explains why behavior
is maintained on tandem and scrambled-stimuli chains (Kelleher & Fry 1962). In
both cases the stimuli early in the chain are either invariably (tandem) or occasion-
ally (scrambled-stimulus) paired with primary reinforcement.
There are problems with the conditioned-reinforcement approach, however. It
can explain responding in link two of a three-link chain but not in link one, which
should be an extinction stimulus. The explanatory problem gets worse when more
links are added. There is no well-defined principle to tell us when a stimulus
changes from being a conditioned reinforcer, to a stimulus in whose presence
responding is maintained by a conditioned reinforcer, to an extinction stimulus.
What determines the stimulus property? Is it stimulus number, stimulus duration or
the durations of stimuli later in the chain? Perhaps there is some balance between
contrast/extinction, which depresses responding in early links, and conditioned
reinforcement, which is supposed to (but sometimes does not) elevate responding in
later links? No well-defined compound theory has been offered, even though there
are several quantitative theories for multiple-schedule contrast (e.g., Herrnstein
1970, Nevin 1974, Staddon 1982; see review in Williams 1988). There are also data
that cast doubt even on the idea that late-link stimuli have a rate-enhancing effect.
In the Catania et al. (1980) study, for example, four of five pigeons responded
faster in the middle link of a three-link tandem schedule than the comparable
chain.
The lack of formal theories for performance on simple chains is matched by a
dearth of data. Some pause data are presented in the study by Davison (1974) on
pigeons in a two-link fixed-interval chain. The paper attempted to fit Herrnstein’s
(1970) matching law between response rates and link duration. The match was
poor: The pigeon’s rates fell more than predicted when the terminal links (con-
tiguous with primary reinforcement) of the chain were long, but Davison did find
that “the terminal link schedule clearly changes the pause in the initial link, longer
terminal-link intervals giving longer initial-link pauses” (1974, p. 326). Davison’s
abstract concludes, “Data on pauses during the interval schedules showed that, in
most conditions, the pause duration was a linear function of the interval length,
and greater in the initial link than in the terminal link” (p. 323). In short, the pause
(time-to-first-response) data were more lawful than response-rate data.
Linear waiting provides a simple explanation for excessive pausing on multi-
link chain fixed-interval schedules. Suppose the chief function of the link stimuli
on chain schedules is simply to signal changing times to primary reinforcement⁴.
⁴ This idea surfaced very early in the history of research on equal-link chain fixed-interval
schedules, but because of the presumed importance of conditioned reinforcement, it was
the time to reinforcement from link stimulus offset, rather than onset, that was thought to
be important. Thus, Gollub (1977), echoing his 1958 Ph.D. dissertation in the subsequent
Kelleher & Gollub (1962) review, wrote, “In chained schedules with more than two compo-
nents ... the extent to which responding is sustained in the initial components ... depends
on the time that elapses from the end of the components to food reinforcement” (p. 291).
Thus, in a three-link fixed-interval chain, with link duration I, the TTR signaled
by the end of reinforcement (or by the onset of the first link) is 3I. The onset
of the next link signals a TTR of 2I and the terminal, third, link signals a TTR
of I. The assumptions of linear waiting as applied to this situation are that paus-
ing (time to first response) in each link is determined entirely by TTR and that
the wait time in interval N+1 is a linear function of the TTR in the preceding
interval.
To see the implications of this process, consider again a three-link chain sched-
ule with I =1 (arbitrary time units). The performance to be expected depends
entirely on the value of the proportionality constant, a, that sets the fraction of
time-to-primary-reinforcement that the animal waits (for simplicity we can ne-
glect b; the logic of the argument is unaffected). All is well so long as a is less
than one-third. If a is exactly 0.333, then for unit link duration the pause in the
third link is 0.33, in the second link 0.67, and in the first link 1.0. However, if a
is larger, for instance 0.5, the three pauses become 0.5, 1.0, and 1.5; that is, the
pause in the first link is now longer than the programmed interval, which means
the TTR in the first link will be longer than 3 the next time around, so the pause
will increase further, and so on until the process stabilizes (which it always does:
First-link pause never goes to ∞).
The steady-state wait times in each link predicted for a five-link chain, with
unit-duration links, for two values of a are shown in Figure 2. In both cases wait
times in the early links are very much longer than the programmed link duration.
Clearly, this process has the potential to produce very large pauses in the early
links of multilink-chain fixed-interval schedules and so may account for the data
Catania (1979) and others have reported.
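A sketch of this fixed-point process follows. The update rule is our reconstruction of the argument: pause in each link is a times the time-to-reinforcement signaled at that link's onset, and a link's experienced duration is the larger of its pause and the programmed duration. Iterated from short initial pauses it stabilizes at the long early-link pauses shown in Figure 2, and for a = 0.67 it reproduces the interfood interval of about 84 quoted below.

```python
def chain_pauses(n_links=5, a=0.67, link_duration=1.0, n_iterations=200):
    """Steady-state link pauses implied by linear waiting on an equal-link chain."""
    pauses = [0.0] * n_links
    for _ in range(n_iterations):
        # Experienced duration of each link: its pause, or the programmed duration if longer.
        durations = [max(p, link_duration) for p in pauses]
        # Pause in link k = a * (experienced time to food signaled at link k onset).
        pauses = [a * sum(durations[k:]) for k in range(n_links)]
    return pauses

pauses = chain_pauses()
durations = [max(p, 1.0) for p in pauses]
print("steady-state pauses, first to last link:", [round(p, 1) for p in pauses])
print("interfood interval:", round(pauses[0] + sum(durations[1:]), 1))  # ~84 for a = 0.67
```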
Gollub in his dissertation research (1958) noticed the additivity of this sequential
pausing. Kelleher & Gollub (1962) in their subsequent review wrote, “No two
pauses in [simple fixed interval] can both postpone food-delivery; however, pauses
in different components of [a] five-component chain will postpone food-delivery
additively” (p. 566). However, this additivity was only one of a number of processes
suggested to account for the long pauses in early chain fixed-interval links, and its
quantitative implications were never explored.
Note that the linear waiting hypothesis also accounts for the relative stability of
tandem schedules and chain schedules with scrambled components. In the tandem
schedule, reinforcement constitutes the only available time marker. Given that
responding after the pause continues at a relatively high rate until the next time
marker, Equation 1 (with b assumed negligible) and a little algebra shows that the
steady-state postreinforcement pause for a tandem schedule with unit links will be
t = a(N − 1)/(1 − a),  if t ≥ 1,   (4)
where N is the number of links and a is the pause fraction. In the absence of any time
markers, pauses in links after the first are necessarily short, so the experienced link
duration equals the programmed duration. Thus, the total interfood-reinforcement
interval will be t + N − 1 (t ≥ 1): the pause in the first link (which will be
longer than the programmed link duration for N > 1/a) plus the programmed
durations of the succeeding links. For the case of a = 0.67 and unit link duration,
which yielded a steady-state interfood interval (IFI) of 84 for the five-link chain
schedule, the tandem yields 12. For a = 0.5, the two values are approximately 16
and 8.

Figure 2  Wait time (pause, time to first response) in each equal-duration link of a
five-link chain schedule (as a multiple of the programmed link duration) as predicted
by the linear-waiting hypothesis. The two curves are for two values of parameter a in
Equation 1 (b = 0). Note the very long pauses predicted in early links—almost two
orders of magnitude greater than the programmed interval in the first link for a = 0.67.
(From Mazur 2001)
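The tandem comparison can be sketched the same way: Equation 4 plus the remaining links, with the scrambled-stimuli case simply assumed (as in the text) to keep pauses below the link duration, so its IFI is roughly N. The chain values quoted above (about 84 and 16) come from the recursion sketched earlier.

```python
def tandem_ifi(n_links, a):
    """Equation 4 plus the remaining links: first-link pause a(N-1)/(1-a), valid for t >= 1."""
    t = a * (n_links - 1) / (1 - a)
    return t + (n_links - 1)

for a in (0.67, 0.5):
    print(f"a = {a}: tandem IFI ~ {tandem_ifi(5, a):.0f}, scrambled IFI ~ 5")
# Compare with the chain values quoted in the text: ~84 (a = 0.67) and ~16 (a = 0.5).
```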
The long waits in early links shown in Figure 2 depend critically on the value of
a. If, as experience suggests (there has been no formal study), a tends to increase
slowly with training, we might expect the long pausing in initial links to take some
time to develop, which apparently it does (Gollub 1958).
On the scrambled-stimuli chain each stimulus occasionally ends in reinforcement,
so each signals a time-to-reinforcement (TTR) of I (interpreted as time to the first
reinforcement opportunity), and pause in each link should be less than the link
duration, yielding a total IFI of approximately N,
i.e., 5 for the example in the figure. These predictions yield the order IFI in the
chain > tandem > scrambled, but parametric data are not available for precise
comparison. We do not know whether an N-link scrambled schedule typically sta-
bilizes at a shorter IFI than the comparable tandem schedule, for example. Nor do
we know whether steady-state pause in successive links of a multilink chain falls
off in the exponential fashion shown in Figure 2.
In the final section we explore the implications of linear waiting for studies of
free-operant choice behavior.
CHOICE
Although we can devote only limited space to it, choice is one of the major re-
search topics in operant conditioning (see Mazur 2001, p. 96 for recent statistics).
Choice is not something that can be directly observed. The subject does this or
that and, in consequence, is said to choose. The term has unfortunate overtones of
conscious deliberation and weighing of alternatives for which the behavior itself
(response A or response B) provides no direct evidence. One result has been the
assumption that the proper framework for all so-called choice studies is in terms
of response strength and the value of the choice alternatives. Another is the
assumption that procedures that are very different are nevertheless studying the same
thing.
For example, in a classic series of experiments, Kahneman & Tversky (e.g.,
1979) asked a number of human subjects to make a single choice of the following
sort: between $400 for sure and a 50% chance of $1000. Most went for the sure
thing, even though the expected value of the gamble is higher. This is termed risk
aversion, and the same term has been applied to free-operant “choice” experi-
ments. In one such experiment an animal subject must choose repeatedly between
a response leading to a fixed amount of food and one leading equiprobably to either
a large or a small amount with the same average value. Here the animals tend to be
either indifferent or risk averse, preferring the fixed alternative (Staddon & Innis
1966b, Bateson & Kacelnik 1995, Kacelnik & Bateson 1996).
In a second example pigeons responded repeatedly to two keys associated with
equal variable-interval schedules. A successful response on the left key, for exam-
ple, is reinforced by a change in the color of the pecked key (the other key light goes
off). In the presence of this second stimulus, food is delivered according to a fixed-
interval schedule (fixed-interval X). The first stimulus, which is usually the same on
both keys, is termed the initial link; the second stimulus is the terminal link. Pecks
on the right key lead in the same way to food reinforcement on variable-interval X.
(This is termed a concurrent-chain schedule.) In this case subjects overwhelmingly
prefer the initial-link choice leading to the variable-interval terminal link; that is,
they are apparently risk seeking rather than risk averse (Killeen 1968).
The fact that these three experiments (Kahneman & Tversky and the two free-
operant studies) all produce different results is sometimes thought to pose a serious
research problem, but, we contend, the problem is only in the use of the term choice
for all three. The procedures (not to mention the subjects) are in fact very different,
and in operant conditioning the devil is very much in the details. Apparently trivial
procedural differences can sometimes lead to wildly different behavioral outcomes.
Use of the term choice as if it denoted a unitary subject matter is therefore highly
misleading. We also question the idea that the results of choice experiments are
always best explained in terms of response strength and stimulus value.
Concurrent Schedules
Bearing these caveats in mind, let’s look briefly at the extensive history of free-
operant choice research. In Herrnstein’s seminal experiment (1961; see Davison
& McCarthy 1988, Williams 1988 for reviews; for collected papers see Rachlin
& Laibson 1997) hungry pigeons pecked at two side-by-side response keys, one
associated with variable-interval v_1 s and the other with variable-interval v_2 s
(concurrent variable-interval–variable-interval schedule). After several experimental
sessions and a range of v_1 and v_2 values chosen so that the overall programmed
reinforcement rate was constant (1/v_1 + 1/v_2 = constant), the result was matching
between steady-state relative response rates and relative obtained reinforcement
rates:

x/y = R(x)/R(y),    (5)
where x and y are the response rates on the two alternatives and R(x) and R(y) are
the rates of obtained reinforcement for them. This relation has become known as
Herrnstein’s matching law. Although the obtained reinforcement rates are depen-
dent on the response rates that produce them, the matching relation is not forced,
because x and y can vary over quite a wide range without much effect on R(x) and
R(y).
Because of the negative feedback relation intrinsic to variable-interval sched-
ules (the less you respond, the higher the probability of payoff), the matching
law on concurrent variable-interval–variable-interval is consistent with reinforce-
ment maximization (Staddon & Motheral 1978), although the maximum of the
function relating overall payoff, R(x) +R(y), to relative responding, x/(x +y), is
pretty flat. However, little else on these schedules fits the maximization idea. As
noted above, even responding on simple fixed-T response-initiated delay (RID)
schedules violates maximization. Matching is also highly overdetermined, in the
sense that almost any learning rule consistent with the law of effect—an in-
crease in reinforcement probability causes an increase in response probability—
will yield either simple matching (Equation 5) or its power-law generalization
(Baum 1974, Hinson & Staddon 1983, Lander & Irwin 1968, Staddon 1968).
Matching by itself therefore reveals relatively little about the dynamic processes
operating in the responding subject (but see Davison & Baum 2000). Despite
this limitation, the strikingly regular functional relations characteristic of free-
operant choice studies have attracted a great deal of experimental and theoretical
attention.
Herrnstein (1970) proposed that Equation 5 can be derived from the function
relating steady-state response rate, x, and reinforcement rate, R(x), to each re-
sponse key considered separately. This function is negatively accelerated and well
approximated by a hyperbola:
x = kR(x)/[R(x) + R_0],    (6)

where k is a constant and R_0 represents the effects of all other reinforcers in
the situation. The denominator and parameter k cancel in the ratio x/y, yielding
Equation 5 for the choice situation.
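As a quick numerical check of this cancellation (with arbitrary, purely illustrative values for k, the extraneous reinforcement, and the obtained rates), applying Equation 6 to each key with a common denominator yields the matching relation of Equation 5:

```python
# Equation 6 for each key, with the same denominator on both keys (total reinforcement
# in the situation); the constant k and the denominator cancel in the ratio, giving Equation 5.

k, R_e = 60.0, 5.0            # illustrative values: asymptotic rate and extraneous reinforcement
R_x, R_y = 40.0, 10.0         # obtained reinforcement rates on the two keys

total = R_x + R_y + R_e
x = k * R_x / total           # Equation 6, left key
y = k * R_y / total           # Equation 6, right key
print(x / y, R_x / R_y)       # both ratios are 4.0: response ratio matches reinforcement ratio
```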
There are numerous empirical details that are not accounted for by this formulation:
systematic deviations from matching [undermatching and overmatching
(Baum 1974)] as a function of different types of variable-interval schedules, dependence
of simple matching on use of a changeover delay, extensions to concurrent-
chain schedules, and so on. For example, if animals are pretrained with two alternatives
presented separately, so that they do not learn to switch between them,
when given the opportunity to respond to both, they fixate on the richer one rather
than matching [extreme overmatching (Donahoe & Palmer 1994, pp. 112–113;
Gallistel & Gibbon 2000, pp. 321–322)]. (Fixation, i.e., extreme overmatching, is,
trivially, matching, of course, but if only fixation were observed, the idea of matching
would never have arisen. Matching implies partial, not exclusive, preference.)
Conversely, in the absence of a changeover delay, pigeons will often just alternate
between two unequal variable-interval choices [extreme undermatching (Shull &
Pliskoff 1967)]. In short, matching requires exactly the right amount of switching.
Nevertheless, Herrnstein’s idea of deriving behavior in choice experiments from
the laws that govern responding to the choice alternatives in isolation is clearly
worth pursuing.
In any event, Herrnstein's approach (molar data, predominantly variable-interval
schedules, rate measures) set the basic pattern for subsequent operant choice
research. It fits the basic presuppositions of the field: that choice is about response
strength, that response strength is equivalent to response probability, and that
response rate is a valid proxy for probability (e.g., Skinner 1938, 1966, 1986;
Killeen & Hall 2001). (For typical studies in this tradition see, e.g., Fantino 1981;
Grace 1994; Herrnstein 1961, 1964, 1970; Rachlin et al. 1976; see also Shimp
1969, 2001.)
We can also look at concurrent schedules in terms of linear waiting. Although
published evidence is skimpy, recent unpublished data (Cerutti & Staddon 2002)
show that even on variable-interval schedules (which necessarily always contain a
few very short interfood intervals), postfood wait time and changeover time covary
with mean interfood time. It has also long been known that Equation 6 can be
derived from two time-based assumptions:that the number of responses emitted is
proportional to the number of reinforcers receivedmultiplied by the available time
and that available time is limited by the time taken up by each response (Staddon
1977, Equations 23–25). Moreover, if we define mean interresponse time as the
reciprocal of mean response rate, x (it is not, of course: the reciprocal of the mean
interresponse time is the harmonic mean rate; in practice, "mean response rate"
usually means the arithmetic mean, but note that the harmonic mean rate usually
works better for choice data than the arithmetic mean; cf. Killeen 1968), and mean
interfood interval is the reciprocal of obtained reinforcement rate, R(x), then linear
waiting yields

1/x = a/R(x) + b,

where a and b are linear waiting constants. Rearranging yields

x = (1/b)R(x)/[(a/b) + R(x)],    (7)

where 1/b = k and a/b = R_0 in Equation 6. Both these derivations of the hyperbola
in Equation 6 from a linear relation in the time domain imply a correlation between
parameters k and R_0 in Equation 6 under parametric experimental variation of
parameter b by (for example) varying response effort or, possibly, hunger motivation.
Such covariation has been occasionally but not universally reported (Dallery et al.
2000, Heyman & Monaghan 1987, McDowell & Dallery 1999).
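The equivalence is easy to verify numerically; the following sketch (with arbitrary, illustrative values of a and b) inverts the linear-waiting relation 1/x = a/R(x) + b and compares the result with Equation 6 using k = 1/b and R_0 = a/b:

```python
# Linear waiting in the time domain implies Herrnstein's hyperbola (Equations 6 and 7).

a, b = 0.4, 0.02                            # illustrative linear-waiting constants
k, R_0 = 1.0 / b, a / b                     # implied hyperbola parameters: k = 50, R_0 = 20

for R in (5.0, 20.0, 60.0):                 # obtained reinforcement rates
    x_from_waiting = 1.0 / (a / R + b)      # invert the mean interresponse time
    x_from_hyperbola = k * R / (R + R_0)    # Equation 6 / Equation 7
    print(R, round(x_from_waiting, 2), round(x_from_hyperbola, 2))
# The two columns are identical (10.0, 25.0, 37.5), so the two forms are equivalent.
```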
Concurrent-Chain Schedules
Organisms can be trained to choose between sources of primary reinforcement
(concurrent schedules) or between stimuli that signal the occurrence of primary
reinforcement (conditioned reinforcement: concurrent chain schedules). Many ex-
perimental and theoretical papers on conditioned reinforcement in pigeons and
rats have been published since the early 1960s using some version of the concur-
rent chains procedure of Autor (1960, 1969). These studies have demonstrated
a number of functional relations between rate measures and have led to several
closely related theoretical proposals such as a version of the matching law, incen-
tive theory, delay-reduction theory, and hyperbolic value-addition (e.g., Fantino
1969a,b; Grace 1994; Herrnstein 1964; Killeen 1982; Killeen & Fantino 1990;
Mazur 1997, 2001; Williams 1988, 1994, 1997). Nevertheless, there is as yet no
theoretical consensus on how best to describe choice between sources of condi-
tioned reinforcement, and no one has proposed an integrated theoretical account
of simple chain and concurrent chain schedules.
Molar response rate does not capture the essential feature of behavior on fixed-
interval schedules: the systematic pattern of rate-change in each interfood interval,
the “scallop.” Hence, the emphasis on molar response rate as a dependent variable
has meant that work on concurrent schedules has emphasized variable or random
intervals over fixed intervals. We lack any theoretical account of concurrent fixed-
interval–fixed-interval and fixed-interval–variable-interval schedules. However, a
recent study by Shull et al. (2001; see also Shull 1979) suggests that response
rate may not capture what is going on even on simple variable-interval schedules,
where the time to initiate bouts of relatively fixed-rate responding seems to be a
more sensitive dependent measure than overall response rate. More attention to
the role of temporal variables in choice is called for.
We conclude with a brief account of how linear waiting may be involved in
several well-established phenomena of concurrent-chain schedules: preference for
variable-interval versus fixed-interval terminal links, effect of initial-link duration,
and finally, so-called self-control experiments.
PREFERENCE FOR VARIABLE-INTERVAL VERSUS FIXED-INTERVAL TERMINAL LINKS On
concurrent-chain schedules with equal variable-interval initial links, animals show
a strong preference for the initial link leading to a variable-interval terminal link
over the terminal-link alternative with an equal arithmetic-mean fixed interval.
This result is usually interpreted as a manifestation of nonarithmetic (e.g., har-
monic) reinforcement-rate averaging (Killeen 1968), but it can also be interpreted
as linear waiting. Minimum TTR is necessarily much less on the variable-interval
than on the fixed-interval side, because some variable intervals are short. If wait
time is determined by minimum TTR—hence shorter wait times on the variable-
interval side—and ratios of wait times and overall response rates are (inversely)
correlated (Cerutti & Staddon 2002), the result will be an apparent bias in favor
of the variable-interval choice.
EFFECT OF INITIAL-LINK DURATION Preference for a given pair of terminal-link
schedules depends on initial link duration. For example, pigeons may approxi-
mately match initial-link relative response rates to terminal-link relative reinforce-
ment rates when the initial links are 60 s and the terminal links range from 15 to
45 s (Herrnstein 1964), but they will undermatch when the initial-link schedule
is increased to, for example, 180 s. This effect is what led to Fantino’s delay-
reduction modification of Herrnstein’s matching law (see Fantino et al. 1993 for
a review). However, the same qualitative prediction follows from linear waiting:
Increasing initial-link duration reduces the proportional TTR difference between
the two choices. Hence the ratio of WTs or of initial-link response rates for the
two choices should also approach unity, which is undermatching. Several other
well-studied theories of concurrent choice, such as delay reduction and hyperbolic
value addition, also explain these results.
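A back-of-the-envelope illustration of this point, treating TTR from initial-link onset as simply the programmed initial-link duration plus the terminal-link duration and taking wait time as proportional to TTR (a simplification, for illustration only):

```python
# Lengthening the shared initial link shrinks the proportional TTR difference
# between the two terminal links, pushing the predicted wait-time (and hence
# response-rate) ratio toward 1, i.e., toward undermatching.

terminal_a, terminal_b = 15.0, 45.0          # terminal-link durations (s), as in the text
for initial in (60.0, 180.0):                # initial-link durations (s)
    ratio = (initial + terminal_b) / (initial + terminal_a)
    print(initial, round(ratio, 2))
# 60-s initial links give a TTR ratio of 1.4; 180-s initial links give 1.15,
# so preference measures should move toward indifference.
```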
Self-Control
The prototypical self-control experiment has a subject choosing between two out-
comes: not-so-good cookie now or a good cookie after some delay (Rachlin &
Green 1972; see Logue 1988 for a review; Mischel et al. 1989 reviewed human
studies). Typically, the subject chooses the immediate, small reward, but if both
delays are increased by the same amount, D, he will learn to choose the larger
reward, providing D is long enough. Why? The standard answer is derived from
Herrnstein’s matching analysis (Herrnstein 1981) and is called hyperbolic dis-
counting (see Mazur 2001 for a review and Ainslie 1992 and Rachlin 2000 for
longer accounts). The idea is that the expected value of each reward is inversely
related to the time at which it is expected according to a hyperbolic function:
V_i = A_i/(1 + kD_i),    (8)

where A_i is the undiscounted value of the reward, D_i is the delay until reward is
received, i denotes the large or small reward, and k is a fitted constant.
Now suppose we set D_L and D_S to values such that the animal shows a preference
for the shorter, sooner reward. This would be the case (k = 1) if A_L = 6, A_S = 2,
D_L = 6 s, and D_S = 1 s: V_L = 0.86 and V_S = 1, a preference for the small, less-
delayed reward. If 10 s is added to both delays, so that D_L = 16 s and D_S = 11 s,
the values are V_L = 0.35 and V_S = 0.17, a preference for the larger reward.
Thus, Equation 8 predicts that added delay (sometimes awkwardly termed
precommitment) should enhance self-control, which it does.
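The arithmetic of this example is easily checked; a minimal sketch using the same values:

```python
# Hyperbolic discounting (Equation 8): V = A / (1 + k*D), with k = 1.

def value(A, D, k=1.0):
    return A / (1.0 + k * D)

A_L, A_S = 6.0, 2.0
for D_L, D_S in ((6.0, 1.0), (16.0, 11.0)):   # before and after adding 10 s to both delays
    print(round(value(A_L, D_L), 2), round(value(A_S, D_S), 2))
# Prints 0.86 vs. 1.0 (small reward preferred), then 0.35 vs. 0.17 (large reward
# preferred): the preference reversal described above.
```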
The most dramatic prediction from this analysis was made and confirmed by
Mazur (1987, 2001) in an experiment that used an adjusting-delay procedure (also
termed titration). "A response on the center key started each trial, and then a
pigeon chose either a standard alternative (by pecking the red key) or an adjusting
alternative (by pecking the green key) ... the standard alternative delivered 2 s of
access to grain after a 10-s delay, and the adjusting alternative delivered 6 s of access
to grain after an adjusting delay" (2001, p. 97). The adjusting delay increased (on
the next trial) when it was chosen and decreased when the standard alternative was
chosen. (See Mazur 2001 for other procedural details.) The relevant independent
variable is TTR. The discounted value of each choice is given by Equation 8. When
the subject is indifferent (does not discriminate between the two choices), V_L = V_S.
Equating Equation 8 for the large and small choices yields
D_L = (A_L/A_S) · D_S + (A_L − A_S)/(kA_S);    (9)

that is, an indifference curve that is a linear function relating D_L and D_S, with slope
A_L/A_S > 1 and a positive intercept. The data (Mazur 1987; 2001, Figure 2) are
consistent with this prediction, but the intercept is small.
It is also possible to look at this situation in terms of linear waiting. One as-
sumption is necessary: that the waiting fraction, a, in Equation 1 is smaller when
the upcoming reinforcer is large than when it is small (Powell 1969 and Perone &
Courtney 1992 showed this for fixed-ratio schedules; Howerton & Meltzer 1983,
for fixed-interval). Given this assumption, the linear waiting analysis is even sim-
pler than hyperbolic discounting. The idea is that the subject will appear to be
indifferent when the wait times to the two alternatives are equal. According to
linear waiting, the wait time for the small alternative is given by
t_S = a_S D_S + b_S,    (10)

where b_S is a small positive intercept and a_S > a_L. Equating the wait times for small
and large alternatives yields
D_L = (a_S/a_L) · D_S + (b_S − b_L)/a_L,    (11)
which is also a linear function with slope >1 and a small positive intercept.
Equations 9 and 11 are identical in form. Thus, the linear waiting and hy-
perbolic discounting models are almost indistinguishable in terms of these data.
However, the linear waiting approach has three potential advantages: Parameters
a and b can be independently measured by making appropriate measurements in
a control study that retains the reinforcement-delay properties of the self-control
experiments without the choice contingency; the linear waiting approach lacks the
fitted parameter k in Equation 9; and linear waiting also applies to a wide range of
time-production experiments not covered by the hyperbolic discounting approach.
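The parallel between the two accounts can be illustrated directly; in the following sketch the discounting parameters are those of the worked example above, while the linear-waiting parameters a_S, a_L, b_S, b_L are arbitrary illustrative values (chosen so that a_S > a_L):

```python
# Indifference lines relating D_L to D_S under the two accounts.
# Equation 9 (hyperbolic discounting): D_L = (A_L/A_S)*D_S + (A_L - A_S)/(k*A_S)
# Equation 11 (linear waiting):        D_L = (a_S/a_L)*D_S + (b_S - b_L)/a_L

A_L, A_S, k = 6.0, 2.0, 1.0                  # values from the worked example
a_S, a_L, b_S, b_L = 0.9, 0.3, 0.6, 0.2      # illustrative only; requires a_S > a_L

def indifference_discounting(d_s):
    return (A_L / A_S) * d_s + (A_L - A_S) / (k * A_S)

def indifference_linear_waiting(d_s):
    return (a_S / a_L) * d_s + (b_S - b_L) / a_L

for d_s in (1.0, 5.0, 10.0):
    print(d_s, indifference_discounting(d_s), round(indifference_linear_waiting(d_s), 2))
# Both columns rise linearly with slope 3 and a positive intercept, which is why
# indifference data alone cannot distinguish the two models.
```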
CONCLUSION
Temporal control may be involved in unsuspected ways in a wide variety of operant
conditioning procedures. A renewed emphasis on the causal factors operating in
reinforcement schedules may help to unify research that has hitherto been defined
in terms of more abstract topics like timing and choice.
ACKNOWLEDGMENTS
We thank Catalin Buhusi and Jim Mazur for comments on an earlier version and
the NIMH for research support over many years.
The Annual Review of Psychology is online at http://psych.annualreviews.org
LITERATURE CITED
Ainslie G. 1992. Picoeconomics: The Strategic Interaction of Successive Motivational States Within the Person. Cambridge, MA: Harvard Univ. Press
Autor SM. 1960. The strength of conditioned reinforcers as a function of frequency and probability of reinforcement. PhD thesis. Harvard Univ., Cambridge, MA
Autor SM. 1969. The strength of conditioned
reinforcers and a function of frequency and
probability of reinforcement. In Conditioned
Reinforcement, ed. DP Hendry, pp. 127–62.
Homewood, IL: Dorsey
Bateson M, Kacelnik A. 1995. Preferences for
fixed and variable food sources: variability
in amount and delay. J. Exp. Anal. Behav.
63:313–29
Baum WM. 1974. On two types of deviation from the matching law: bias and undermatching. J. Exp. Anal. Behav. 22:231–42
Baum WM. 1994. Understanding Behaviorism: Science, Behavior and Culture. New York: HarperCollins
Blough DS, Millward RB. 1965. Learning: operant conditioning and verbal learning. Annu. Rev. Psychol. 17:63–94
Buhusi CV, Meck WH. 2000. Timing for the absence of the stimulus: the gap paradigm reversed. J. Exp. Psychol.: Anim. Behav. Process. 26:305–22
Cabeza de Vaca S, Brown BL, Hemmes NS.
1994. Internal clock and memory processes
in animal timing. J. Exp. Psychol.:Anim. Be-
hav. Process. 20:184–98
Catania AC. 1970. Reinforcement schedules
and psychophysical judgments: a study of
some temporal properties of behavior.In The
Theory of Reinforcement Schedules, ed. WN
Schoenfeld, pp. 1–42. New York: Appleton-
Century-Crofts
Catania AC. 1979. Learning. Englewood Cliffs, NJ: Prentice-Hall
Catania AC, Yohalem R, Silverman PJ. 1980. Contingency and stimulus change in chained schedules of reinforcement. J. Exp. Anal. Behav. 5:167–73
Cerutti DT, Staddon JER. 2002. The temporal dynamics of choice: concurrent and concurrent-chain interval schedules. Submitted
Cheng K, Westwood R. 1993. Analysis of single trials in pigeons' timing performance. J. Exp. Psychol.: Anim. Behav. Process. 19:56–67
Church RM. 1978. The internal clock. In Cog-
nitive Processes in Animal Behavior, ed. SH
Hulse, H Fowler, WK Honig, pp. 277–310.
Hillsdale, NJ: Erlbaum
Church RM, Broadbent HA. 1990. Alternative representations of time, number and rate. Cognition 37:55–81
Church RM, Deluty MZ. 1977. Bisection of
temporal intervals. J. Exp. Psychol.: Anim.
Behav. Process. 3:216–28
Church RM, Miller KD, Meck WH. 1991.
Symmetrical and asymmetrical sources of
variance in temporal generalization. Anim.
Learn. Behav. 19:135–55
Crossman EK, Heaps RS, Nunes DL, Alferink LA. 1974. The effects of number of responses on pause length with temporal variables controlled. J. Exp. Anal. Behav. 22:115–20
Dallery J, McDowell JJ, Lancaster JS. 2000. Falsification of matching theory's account of single-alternative responding: Herrnstein's K varies with sucrose concentration. J. Exp. Anal. Behav. 73:23–43
Davison M. 1974. A functional analysis of
chained fixed-interval schedule performa-
nce. J. Exp. Anal. Behav. 21:323–30
Davison M, Baum W. 2000. Choice in a variable environment: Every reinforcer counts. J. Exp. Anal. Behav. 74:1–24
Davison M, McCarthy D. 1988. The Matching
Law: A Research Review. Hillsdale, NJ: Erl-
baum
Donahoe JW, Palmer DC. 1994. Learning and
Complex Behavior. Boston: Allyn & Bacon
Dragoi V, Staddon JER, Palmer RG, Buhusi
VC. 2002. Interval timing as an emergent
learning property. Psychol. Rev. In press
Dreyfus LR, Fetterman JG, Smith LD, Stubbs
DA. 1988. Discrimination of temporal rela-
tions by pigeons. J. Exp. Psychol.: Anim. Be-
hav. Process. 14:349–67
Fantino E. 1969a. Choice and rate of reinforce-
ment. J. Exp. Anal. Behav. 12:723–30
Fantino E. 1969b. Conditioned reinforcement,
choice, and the psychological distance to re-
ward. In Conditioned Reinforcement, ed. DP
Hendry, pp. 163–91. Homewood, IL: Dorsey
Fantino E. 1981. Contiguity, response strength,
and the delay-reduction hypothesis. In Ad-
vances in Analysis of Behavior: Predictabil-
ity, Correlation, and Contiguity, ed. P Har-
zem, M Zeiler, 2:169–201. Chichester, UK:
Wiley
Fantino E, Preston RA, Dunn R. 1993. Delay reduction: current status. J. Exp. Anal. Behav. 60:159–69
Ferster CB, Skinner BF. 1957. Schedules of Re-
inforcement. New York: Appleton-Century-
Crofts
Fetterman JG, Dreyfus LR, Stubbs DA. 1989.
Discrimination of duration ratios. J. Exp.
Psychol.: Anim. Behav. Process. 15:253–63
Gallistel CR. 1990. The Organization of Learning. Cambridge, MA: MIT/Bradford
Gallistel CR, Gibbon J. 2000. Time, rate, and
conditioning. Psychol. Rev. 107:289–344
Gibbon J. 1977. Scalar expectancy theory and
Weber’s law in animal timing. Psychol. Rev.
84:279–325
Gibbon J, Church RM. 1984. Sources of vari-
ance in an information processing theory of
timing. In Animal Cognition, ed. HL Roit-
blat, TG Bever, HS Terrace. Hillsdale, NJ:
Erlbaum.
Gollub LR. 1958. The chaining of fixed-interval schedules. Unpublished doctoral dissertation, Harvard Univ.
Gollub L. 1977. Conditioned reinforcement: schedule effects. See Honig & Staddon 1977, pp. 288–312
Grace RC. 1994. A contextual choice model of
concurrent-chains choice. J. Exp. Anal. Be-
hav. 61:113–29
Grace RC, Nevin JA. 2000. Response strength
and temporal control in fixed-interval sched-
ules. Anim. Learn. Behav. 28:313–31
Grossberg S, Schmajuk NA. 1989. Neural dynamics of adaptive timing and temporal discrimination during associative learning. Neural Netw. 2:79–102
Hanson SJ, Killeen PR. 1981. Measurement and modeling of behavior under fixed-interval schedules of reinforcement. J. Exp. Psychol.: Anim. Behav. Process. 7:129–39
Herrnstein RJ. 1961. Relative and absolute
strength of response as a function of fre-
quency of reinforcement. J. Exp. Anal. Be-
hav. 4:267–72
Herrnstein RJ. 1964. Secondary reinforcement
and rate of primary reinforcement. J. Exp.
Anal. Behav. 7:27–36
Herrnstein RJ. 1970. On the law of effect. J.
Exp. Anal. Behav. 13:243–66
Herrnstein RJ. 1981. Self control as response
strength. In Recent Developments in the
Quantification of Steady-State Operant Be-
havior, ed. CM Bradshaw, CP Lowe, F
Szabadi, pp. 3–20. Amsterdam: Elsevier/
North-Holland
Heyman GM, Monaghan MM. 1987. Effects
of changes in response requirements and de-
privation on the parameters of the matching
law equation: new data and review. J. Exp.
Psychol.: Anim. Behav. Process. 13:384–
94
Higa JJ, Wynne CDL, Staddon JER. 1991. Dy-
namics of time discrimination. J. Exp. Psy-
chol.: Anim. Behav. Process. 17:281–91
Hinson JM, Staddon JER. 1983. Matching,
maximizing and hill climbing. J. Exp. Anal.
Behav. 40:321–31
Honig WK, Staddon JER, eds. 1977. Handbook of Operant Behavior. Englewood Cliffs, NJ: Prentice-Hall
Hopson JW. 1999. Gap timing and the spectral
timing model. Behav. Process. 45:23–31
Hopson JW. 2002. Timing without a clock:
learning models as interval timing models.
PhD thesis. Duke Univ., Durham, NC
Howerton L, Meltzer D. 1983. Pigeons’
FI behavior following signaled reinforce-
ment duration. Bull. Psychon. Soc. 21:161–
63
Innis NK, Mitchell S, Staddon JER. 1993. Temporal control on interval schedules: What determines the postreinforcement pause? J. Exp. Anal. Behav. 60:293–311
Kacelnik A, Bateson M. 1996. Risky theories: the effects of variance on foraging decisions. Am. Zool. 36:402–34
Kahneman D, Tversky A. 1979. Prospect theory: an analysis of decision under risk. Econometrica 47:263–91
Kelleher RT, Fry WT. 1962. Stimulus functions in chained and fixed-interval schedules. J. Exp. Anal. Behav. 5:167–73
Kelleher RT, Gollub LR. 1962. A review of positive conditioned reinforcement. J. Exp. Anal. Behav. 5:541–97
Kello JE. 1972. The reinforcement-omission effect on fixed-interval schedules: frustration or inhibition? Learn. Motiv. 3:138–47
Killeen PR. 1968. On the measurement of rein-
forcement frequency in the study of prefer-
ence. J. Exp. Anal. Behav. 11:263–69
Killeen PR. 1982. Incentive theory: II. Mod-
els for choice. J. Exp. Anal. Behav. 38:217–
32
Killeen PR, Fantino E. 1990. Unification of
models for choice between delayed rein-
forcers. J. Exp. Anal. Behav. 53:189–200
Killeen PR, Fetterman JG. 1988. A behav-
ioral theory of timing. Psychol. Rev. 95:274–
95
Killeen PR, Hall SS. 2001. The principal com-
ponents of response strength. J. Exp. Anal.
Behav. 75:111–34
Lander DG, Irwin RJ. 1968. Multiple sched-
ules: effects of the distribution of reinforce-
ments between components on the distribu-
tion of responses between components. J.
Exp. Anal. Behav. 11:517–24
Lejeune H, Ferrara A, Simons F, Wearden JH.
1997. Adjusting to changes in the time of
reinforcement: peak-interval transitions in
rats. J. Exp. Psychol.: Anim. Behav. Process.
23:211–321
Logue AW. 1988. Research on self-control:
an integrating framework. Behav. Brain Sci.
11:665–709
Lowe CF, Harzem P, Spencer PT. 1979. Tem-
poral control of behavior and the power law.
J. Exp. Anal. Behav. 31:333–43
MacEwen D, Killeen P. 1991. The effects of rate and amount on the speed of the pacemaker in pigeons' timing behavior. Anim. Learn. Behav. 19:164–70
Machado A. 1997. Learning the temporal dy-
namics of behavior. Psychol. Rev. 104:241–
65
Matell MS, Meck WH. 1999. Reinforcement-
induced within-trial resetting of an internal
clock. Behav. Process. 45:159–71
Mazur JE. 1987. An adjusting procedure for
studying delayed reinforcement. In Quan-
titative Analyses of Behavior, Vol. 5. The
Effects of Delay and Intervening Events on
Reinforcement Value, ed. ML Commons, JE
Mazur, JA Nevin, H Rachlin, pp. 55–73.
Mahwah, NJ: Erlbaum
Mazur JE. 1997. Choice, delay, probability, and conditioned reinforcement. Anim. Learn. Behav. 25:131–47
Mazur JE. 2001. Hyperbolic value addition and
general models of animal choice. Psychol.
Rev. 108:96–112
McDowell JJ, Dallery J. 1999. Falsification of matching theory: changes in the asymptote of Herrnstein's hyperbola as a function of water deprivation. J. Exp. Anal. Behav. 72:251–68
Meck WH. 1983. Selective adjustment of the
speed of an internal clock and memory pro-
cesses. J. Exp. Psychol.: Anim. Behav. Pro-
cess. 9:171–201
Meck WH, Komeily-Zadeh FN, Church RM. 1984. Two-step acquisition: modification of an internal clock's criterion. J. Exp. Psychol.: Anim. Behav. Process. 10:297–306
Mischel W, Shoda Y, Rodriguez M. 1989.
Delay of gratification for children. Science
244:933–38
Nevin JA. 1974. Response strength in multi-
ple schedules. J. Exp. Anal. Behav. 21:389–
408
Perone M, Courtney K. 1992. Fixed-ratio pausing: joint effects of past reinforcer magnitude and stimuli correlated with upcoming magnitude. J. Exp. Anal. Behav. 57:33–46
Plowright CMS, Church D, Behnke P, Silver-
man A. 2000. Time estimation by pigeons
on a fixed interval: the effect of pre-feeding.
Behav. Process. 52:43–48
Powell RW. 1968. The effect of small sequential changes in fixed-ratio size upon the post-reinforcement pause. J. Exp. Anal. Behav. 11:589–93
Powell RW. 1969. The effect of reinforcement magnitude upon responding under fixed-ratio schedules. J. Exp. Anal. Behav. 12:605–8
Rachlin H. 1991. Introduction to Modern Be-
haviorism. New York: Freeman
Rachlin H. 2000. The Science of Self-Control.
Cambridge, MA: Harvard Univ. Press
Rachlin H, Green L. 1972. Commitment, choice and self-control. J. Exp. Anal. Behav. 17:15–22
Rachlin H, Green L, Kagel JH, Battalio RC.
1976. Economic demand theory and psycho-
logical studies of choice. In The Psychology
of Learning and Motivation, ed. GH Bower,
10:129–54. New York: Academic
Rachlin H, Laibson DI, eds. 1997. The Match-
ing Law: Papers in Psychology and Eco-
nomics. Cambridge, MA: Harvard Univ.
Press
Roberts S. 1981. Isolation of an internal clock.
J. Exp. Psychol.: Anim. Behav. Process. 7:
242–68
Roberts S. 1983. Properties and function of an
internal clock. In Animal Cognition and Be-
havior, ed. R Melgren, pp. 345–97. Amster-
dam: North-Holland
Royalty P, Williams B, Fantino E. 1987. Effects of delayed reinforcement in chain schedules. J. Exp. Anal. Behav. 47:41–56
Schneider BA. 1969. A two-state analysis of
fixed-interval responding in pigeons. J. Exp.
Anal. Behav. 12:677–87
Shimp CP. 1969. The concurrent reinforcement
of two interresponse times: the relative fre-
quency of an interresponse time equals its
relative harmonic length. J. Exp. Anal. Be-
hav. 1:403–11
Shimp CP. 2001. Behavior as a social construc-
tion. Behav. Process. 54:11–32
Shull RL. 1970. The response-reinforcement
dependency in fixed-interval schedules of
reinforcement. J. Exp. Anal. Behav. 14:55–
60
Shull RL. 1979. The postreinforcement pause: some implications for the correlational law of effect. In Reinforcement and the Organization of Behavior, ed. MD Zeiler, P Harzem, pp. 193–221. New York: Academic
Shull RL, Gaynor ST, Grimes JA. 2001. Response rate viewed as engagement bouts: effects of relative reinforcement and schedule type. J. Exp. Anal. Behav. 75:247–74
Shull RL, Pliskoff SS. 1967. Changeover delay
and concurrent schedules: some effects on
relative performance measures. J. Exp. Anal.
Behav. 10:517–27
Sidman M. 1960. Tactics of Scientific Research: Evaluating Experimental Data in Psychology. New York: Basic Books
Skinner BF. 1937. Two types of conditioned re-
flex: a reply to Konorski and Miller. J. Gen.
Psychol. 16:272–79
Skinner BF. 1938. The Behavior of Organisms.
New York: Appleton-Century
Skinner BF. 1966. Operant behavior. In Operant Behavior: Areas of Research and Application, ed. WK Honig, pp. 12–32. New York: Appleton-Century-Crofts
Skinner BF. 1986. Some thoughts about the fu-
ture. J. Exp. Anal. Behav. 45:229–35
Staddon JER. 1965. Some properties of spaced
responding in pigeons. J. Exp. Anal. Behav.
8:19–27
Staddon JER. 1968. Spaced responding and
choice: a preliminary analysis. J. Exp. Anal.
Behav. 11:669–82
Staddon JER. 1970. Temporal effects of re-
inforcement: a negative “frustration” effect.
Learn. Motiv. 1:227–47
Staddon JER. 1972. Reinforcement omission
on temporal go–no-go schedules. J. Exp.
Anal. Behav. 18:223–29
Staddon JER. 1974. Temporal control, attention and memory. Psychol. Rev. 81:375–91
Staddon JER. 1977. On Herrnstein’s equa-
tion and related forms. J. Exp. Anal. Behav.
28:163–70
Staddon JER. 1982. Behavioral competition, contrast, and matching. In Quantitative Analyses of Behavior, Vol. 2. Quantitative Analyses of Operant Behavior: Matching and Maximizing Accounts, ed. ML Commons, RJ Herrnstein, H Rachlin, pp. 243–61. Cambridge, MA: Ballinger. 5 Vols.
Staddon JER. 2001a. Adaptive Dynamics:
The Theoretical Analysis of Behavior. Cam-
bridge, MA: MIT/Bradford. 423 pp.
Staddon JER. 2001b. The New Behaviorism:
Mind, Mechanism and Society. Philadelphia:
Psychol. Press. 211 pp.
Staddon JER, Chelaru IM, Higa JJ. 2002. A tuned-trace theory of interval-timing dynamics. J. Exp. Anal. Behav. 77:105–24
Staddon JER, Higa JJ. 1999. Time and memory: towards a pacemaker-free theory of interval timing. J. Exp. Anal. Behav. 71:215–51
Staddon JER, Innis NK. 1966a. An effect
analogous to “frustration” on interval rein-
forcement schedules. Psychon. Sci. 4:287–
88
Staddon JER, Innis NK. 1966b. Preference for
fixed vs. variable amounts of reward. Psy-
chon. Sci. 4:193–94
Staddon JER, Innis NK. 1969. Reinforcement
omission on fixed-interval schedules. J. Exp.
Anal. Behav. 12:689–700
Staddon JER, Motheral S. 1978. On matching
and maximizing in operant choice experi-
ments. Psychol. Rev. 85:436–44
Starr B, Staddon JER. 1974. Temporal control
on fixed-interval schedules: signal properties
of reinforcement and blackout. J. Exp. Anal.
Behav. 22:535–45
Stubbs A. 1968. The discrimination of stimu-
lus duration by pigeons. J. Exp. Anal. Behav.
11:223–38
Stubbs DA, Dreyfus LR, Fetterman JG, Boyn-
ton DM, Locklin N, Smith LD. 1994.
Duration comparison: relative stimulus dif-
ferences, stimulus age and stimulus predic-
tiveness. J. Exp. Anal. Behav. 62:15–32
Treisman M. 1963. Temporal discrimination
and the indifference interval: implications
for a model of the “internal clock.” Psychol.
Monogr. 77(756): entire issue
Wearden JH. 1985. The power law and Weber's law in fixed-interval post-reinforcement pausing. Q. J. Exp. Psychol. B 37:191–211
Williams BA. 1988. Reinforcement, choice,
and responsestrength. In Stevens’Handbook
of Experimental Psychology, ed. RC Atkin-
son, RJ Herrnstein, G Lindzey, RDLuce, pp.
167–244. New York: Wiley. 2nd ed.
Williams BA. 1994. Conditioned reinforce-
ment: neglected or outmoded explanatory
construct? Psychon. Bull. Rev. 1:457–75
Williams BA. 1997. Conditioned reinforcement dynamics in three-link chained schedules. J. Exp. Anal. Behav. 67:145–59
Williams BA, Royalty P. 1990. Conditioned
reinforcement versus time to primary rein-
forcement in chain schedules. J. Exp. Anal.
Behav. 53:381–93
Wynne CDL, Staddon JER. 1988. Typical de-
laydetermines waitingtimeonperiodic-food
schedules: static and dynamic tests. J. Exp.
Anal. Behav. 50:197–210
Zeiler MD. 1977. Schedules of reinforcement:
the controlling variables. See Honig & Stad-
don 1977, pp. 201–32
Zeiler MD, Powell DG. 1994. Temporal con-
trol in fixed-interval schedules. J. Exp. Anal.
Behav. 61:1–9
Zuriff G. 1985. Behaviorism: A Conceptual
Reconstruction. New York: Columbia Univ.
Press
... Besides the 12-19 age group, juniors below 16 were only eligible for Sinovac. Younger age groups (20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39) preferred BioNTech more than Sinovac while the older age groups preferred Sinovac more than BioNTech. ...
... Most admitted that observation of their effect was needed before making decisions [9]. The policies targeting the lack of initiatives were classified into incentives (reward) and restrictions (punishment) by using the operant conditioning theory to analyze the AWAVR [26]. Similar strategies were applied in other countries. ...
Article
Full-text available
Background: The World Health Organization has set a target of at least 70% of the global population being vaccinated by the middle of 2022. There are only 17 countries that achieved a 70% vaccination rate (VR). This study aims to analyze the effectiveness of public policies to increase the COVID-19 VR. Methods: vaccination figures of all eligible population groups in Hong Kong from 22 February 2021 to 23 January 2022, were extracted for analysis. Weekly acceleration in the VR (AVR) was calculated as a measure of policy effectiveness. A total of 13 identified measures were classified into four policy categories: eligibility, accessibility, incentives, and restrictions. Age-weighted AVR (AWAVR) was compared by age group and policy presence vs. absence using Mann-Whitney U tests. Results: the AWAVR means across age groups ranged from -1.26% to +0.23% (p = 0.12) for eligibility; accessibility ranged from +0.18% to +1.51% (p < 0.0001); incentives ranged from +0.11% to +0.68% (p < 0.0001); and restrictions ranged from +0.02% to +1.25% (p < 0.0001). Conclusions: policies targeting accessibility, incentives, and restrictions are effective at increasing the VR. These results may serve as a policy reference.
... Operant Conditioning Theory of Learning B.F Skinner propounded the operant conditioning theory in 1937 (Staddon and Cerutti, 2003). ...
Article
Full-text available
Academic performance of students has been an issue of great concern to the students, teachers, educational administrators, employers of labour, and the government as well, as the strength of any nation rests greatly on the quantity and quality of its manpower. Effort was made to discuss extensively the influence of social media, peer pressure, age, study habit, and interest in advanced degrees on the academic performance of undergraduate students among various other factors that influence students' academic performance. The population was made up of undergraduate students of the Department of Educational Psychology, Guidance and Counselling, University of Port Harcourt. 200 students were sampled from the university. Four point Likert Scale questionnaire consisting of 25 items were used to elicit responses from the students on the five factors studied. The questionnaire was manually processed and data analyzed using mean and standard deviation. The findings however show that social media promotes learning, while addiction to social media influence academic performance negatively; good peer group has positive influence on academic performance, while bad peer group influence the students' academic performance negatively; though age has a little influence on the students' academic performance, interest in the course of study and self-discipline exert more influence on the students' academic performance more than age; study habit has strong influence on the students' academic performance as well as interest in advanced degrees. Based on these findings, it was recommended among others that Nigerian tertiary institutions should ensure effective guidance and counselling services within the institutions as the above enumerated problems can be drastically reduced through effective guidance and counselling services as well as the marketing of the effective and efficient adoption of the above mentioned academic performance variables to aid students achieve their academic sustainable goals in the current and future.
... Here, we focus on one area of learningreward-based instrumental conditioning, a form of associative learning. 'Instrumental' (Skinner, 1938) refers to the formation of an association between a behavior and its consequence and it requires the presence of reinforcement (Colwill and Rescorla, 1986;Dickinson, 1994;Staddon and Cerutti, 2003). Traditionally, instrumental forms of learning focus on the relationship between a behavioral response (R) and a biologically relevant outcome (O). ...
Article
Full-text available
Learning is fundamental to animal survival. Animals must learn to link sensory cues in the environment to actions that lead to reward or avoid punishment. Rapid learning can then be highly adaptive and the difference between life or death. To explore the neural dynamics and circuits that underlie learning, however, has typically required the use of laboratory paradigms with tight control of stimuli, action sets, and outcomes. Learning curves in such reward-based tasks are reported as slow and gradual, with animals often taking hundreds to thousands of trials to reach expert performance. The slow, highly variable, and incremental learning curve remains the largely unchallenged belief in modern systems neuroscience. Here, we provide historical and contemporary evidence that instrumental forms of reward-learning can be dissociated into two parallel processes: knowledge acquisition which is rapid with step-like improvements, and behavioral expression which is slower and more variable. We further propose that this conceptual distinction may allow us to isolate the associative (knowledge-related) and non-associative (performance-related) components that influence learning. We then discuss the implications that this revised understanding of the learning curve has for systems neuroscience.
... In order to potentially influence the behavior of the target population to repeat the required task, this paper explores the method of operant conditioning. B.F. Skinner [21] first introduced the concept of operant condition and defined it as "controlled by its consequences" [22]. There are four types of operant conditioning techniques that are used in this paper: positive reinforcement, negative reinforcement, positive punishment, and negative punishment. ...
Article
Full-text available
Greenhouse gas emission is a major contributor to climate change and global warming. Many sustainability efforts are aimed at reducing greenhouse gas emissions. These include recycling and the use of renewable energy. In the case of recycling, the general population is typically required to at least temporarily store, and possibly haul, the materials rather than simply throwing them away. This effort from the general population is a key aspect of recycling, and in order for it to work, some investment of time and effort is required by the public. In the case of corrugated cardboard boxes, it has been observed that there is less motivation for the general population to recycle them. This paper explores different means of motivating people to reuse, and not just recycle, with different types of incentives. The paper addresses the use of persuasion techniques and operant conditioning techniques together to incent the general population to adopt sustainable efforts. The paper makes an attempt to segment the general population based on persuasion preference, operant condition preference, and personality type to use different forms of incentives and motivational work unlike any approaches found in the literature review. Four types of persuasion techniques and four types of operant conditioning are combined to give 16 different types of incentives. Two online surveys are conducted, and their data are analyzed (using entropy, Hamming distance, chi-square, and ANOVA). The results indicate that “positive reinforcement ethos” is a cost-effective way to incent the general population. The results of this study can be applied to a wide range of applications such as incentives for solar panels, incentives for vaccination, and other areas wherein sustainability-centric behavior is encouraged.
... The most extended is Csikszentmihalyi's Flow theory (1990), which defines a state of intense concentration and involvement when the individual perceives a clear goal, feedback, proper challenge/skill ratio, and an environment for deep focus. Oppositely, Skinner's operant conditioning relies on extrinsic motivation to modify the subject's behavior via systematic reinforcement driven by previously defined rules ( (Staddon and Cerutti 2003)). ...
Article
Full-text available
Virtual reality (VR) has been widely used to simulate various real-like environments suitable to explore and interact, similar to being genuinely there (i.e., allowing presence). User experience in virtual environments (VE) is highly subjective, and presence-based self-reports have addressed its assessment; however, it is unclear how a diverse set of VR features relates to the subscales of the questionnaires (e.g., engagement, immersion, or Attention), which could be helpful to create and improve immersive VE. Consequently, most current studies have appealed to self-defined criteria to design their VE in response to a lack of accepted methodological frameworks. Therefore, we systematically reviewed the current publications to identify critical design elements to promote presence and realistic experiences in VR-games users. We extracted information from different databases (Scopus, Web of Science, PubMed, ACM, IEEE, Springer, and Scholar) and used inclusion and exclusion criteria to reduce the original set of 595 candidates to 53 final papers. The findings showed that better quality and quantity in resources allocation (software and hardware) and more accuracy in objects and characters, which all refer to higher immersion, provide Place Illusion (PI), i.e., the spatial dimension of presence. Furthermore, Scenario's Realism, external stimuli, and coherent match between virtual and real worlds (including body representation) are decisive to set Plausibility Illusion (PSI), i.e., the dimension associated with coherence. Finally, performance feedback, character customization, and multiplayer mechanics are crucial to assure motivation and agency, which are user-exclusive but crucial to defining pres-ence's perception. Moreover, about 65% of the analyzed studies agreed that immersive media and social interaction could simultaneously influence PI and PSI.
Article
Clinical fear is at the core of anxiety disorders. Considerable research has examined processes through which clinical fears are learned and unlearned (i.e., acquisition, generalization, extinction, return of fear) in anxiety disorders. Empirically supported models of these processes implicate both associative and instrumental learning. Research has also delineated that avoidance (i.e., behaviors intended to prevent aversive experiences) and fear approach (i.e., behaviors that involve exposure to one's fear) modulate fear learning, yet these processes remain under-researched in anxiety-based disorders. The purpose of the current review is to a) review existing research on clinical fear learning, incorporating fear approach, avoidance, and inhibitory learning, and b) extend this model to advance the understanding of fear-based learning in eating disorders. Implications for research and treatment are discussed, including how the anxiety field can inform eating disorder research and the importance of empirically testing fear learning in eating disorders to improve treatment.
Article
The use of established and discipline specific theories within research and practice is an indication of the maturity of a discipline. With computing education research as a relatively young discipline, there has been recent interest in investigating theories that may prove foundational to work in this area, with discipline specific theories and many theories from other disciplines emerging as relevant. A challenge for the researcher is to identify and select the theories that provide the best foundation for their work. Learning is a complex and multi-faceted process and, as such, a plethora of theories are potentially applicable to a research problem. Knowing the possible candidate theories and understanding their relationships and potential applicability, both individually or as a community of theories, is important to provide a comprehensive grounding for researchers and practitioners alike. In this work, we investigate the fundamental connections between learning theories foundational to research and practice in computing education. We build a comprehensive list of 84 learning theories and their source and influential papers, which are the papers that introduce or propagate specific theories within the research community. Using Scopus, ACM Digital Library and Google Scholar, we identify the papers that cite these learning theories. We subsequently consider all possible pairs of these theories and build the set of papers that cite each pair. On this weighted graph of learning theory connections, we perform a community analysis to identify groups of closely linked learning theories. We find that most of the computing education learning theories are closely linked with a number of broader learning theories, forming a separate cluster of 17 learning theories. We build a taxonomy of theory relationships to identify the depth of connections between learning theories. Of the 294 analysed links, we find deep connections in 32 links. This indicates that while the computing education research community is aware of a large number of learning theories, there is still a need to better understand how learning theories are connected and how they can be used together to benefit computing education research and practice.
Article
Animal interval timing is often studied through the peak interval (PI) procedure. In this procedure, the animal is rewarded for the first response after a fixed delay from the stimulus onset, but on some trials, the stimulus remains and no reward is given. The standard methods and models to analyse the response pattern describe it as break-run-break, a period of low rate response followed by rapid responding, followed by a low rate of response. The study of the pattern has found correlations between start, stop, and duration of the run period that hold across species and experiments. It is commonly assumed that to achieve the statistics with a pacemaker accumulator model, it is necessary to have start and stop thresholds. In this paper, we will develop a new model that varies response rate in relation to the likelihood of event occurrence, as opposed to a threshold, for changing the response rate. The new model reproduced the start and stop statistics that have been observed in 14 different PI experiments from 3 different papers. The developed model is also compared to the two-threshold Time-adaptive Drift–diffusion Model (TDDM), and the latest accumulator model subsuming the scalar expectancy theory (SET) on all 14 datasets. The results show that it is unnecessary to have explicit start and stop thresholds or an internal equivalent to break-run-break states to reproduce the individual trials statistics, the average behaviour, and the break-run-break analysis results. The new model also produces more realistic individual trials compared to TDDM.
Chapter
Full-text available
This chapter introduces a framework borrowed from economics within which choices between different commodities may be studied in a consistent manner. This framework is “demand theory” and the particular concept within demand theory that most directly applies to studies of choice is that of “substitutability.” The relationship between psychology and economic demand theory is explored. The chapter explores that demand theory is not so much a psychological theory as it is a definition of psychological utility. It specifies an internal mechanism or can be proven true or false. Using economic concepts of budget lines, indifference curves, and substitutability, two series of experiments are conducted involving rats' choices between two different commodities. Consumption of the commodities changed as changes are introduced into the budget set-the rats consumed more of the lower priced commodity and less of the higher priced commodity. Through computer simulation, the maximization of overall rate of reinforcement results in matching of relative rate of responding to relative rate of reinforcement.
Article
Full-text available
Contrary to data showing sensitivity to nontemporal properties of timed signals, current theories of interval timing assume that animals can use the presence or absence of a signal as equally valid cues as long as duration is the most predictive feature. Consequently, the authors examined rats' behavior when timing the absence of a visual or auditory stimulus in trace conditioning and in a "reversed" gap procedure. Memory for timing was tested by presenting the stimulus as a reversed gap into its timed absence. Results suggest that in trace conditioning (Experiment 1), rats time for the absence of a stimulus by using its offset as a time marker. As in the standard gap procedure, the insertion of a reversed gap was expected to "stop" rats' internal clock. In contrast, a reversed gap of 1-, 5-, or 15-s duration "reset" the timing process in both trace conditioning (Experiment 2) and the reversed gap procedure (Experiment 3). A direct comparison of the standard and reversed gap procedures (Experiment 4) supported these findings. Results suggest that attentional mechanisms involving the salience or content of the gap might contribute to the response rule adopted in a gap procedure.
Article
Humans and animals are capable of learning under real-time training conditions to generate appropriately timed responses to sensory stimuli. An extensive experimental literature exists describing the properties of such timed behavior during classical conditioning and instrumental learning experiments. The present work develops a neural network model that has been used to quantitatively simulate key properties of this data base on the computer. The present model clarifies how learning of properly timed individual responses is achieved. Learned timing of response sequences utilizes adaptive circuits for sensory-motor planning, such as avalanche circuits , into which copies of the circuit described may be naturally embedded.
Article
Temporal control of behavior was investigated within the framework of an internal clock model. Pigeons were exposed to signaled fixed-interval 30-s trials mixed with extended unreinforced (baseline) trials. On unreinforced break trials, the signal was interrupted for a period of time after trial onset. In Experiment 1, comparisons between the peak time obtained on baseline and on break trials produced peak time shifts that were longer than those expected if the clock had stopped during the break but shorter than if the clock had reset. In Experiment 2, systematic manipulations of duration and location of breaks produced peak time shifts that were nonlinear functions of break duration and that varied linearly with break location. The obtained peak times were more consistent with a continuous memory decay model than with the stop-retain or the reset hypotheses.
Four experiments studied the scaling of time by rats. The purpose was to determine if internal clock and memory processes could be selectively adjusted by pharmacological manipulations. All of the experiments used a temporal discrimination procedure in which one response ("short") was reinforced following a 2-sec noise signal and a different response ("long") was reinforced following an 8-sec noise signal; unreinforced signals of intermediate duration were also presented. The proportion of "long" responses increased as a function of signal duration. All drugs were administered intraperitoneally (ip) and their effect on clock or memory processes was inferred from the observed pattern of change in the point of subjective equality of the psychophysical functions under training and testing conditions. Experiment 1 demonstrated that methamphetamine (1.5 mg/kg) can selectively increase clock speed and that haloperidol (.12 mg/kg) can selectively decrease clock speed. Experiment 2 demonstrated that footshock stress (.2 mA) can selectively increase clock speed during continuous administration but leads to a decrease in clock speed below control values when the footshock is abruptly terminated. Experiment 3 demonstrated that vasopressin (.07 pressor units/kg) and oxytocin (.02 pressor units/kg) can selectively decrease the remembered durations of reinforced times, which suggests that memory storage speed increased. Experiment 4 demonstrated that physostigmine (.01 mg/kg) can selectively decrease the remembered durations of reinforced times and that atropine (.05 mg/kg) can selectively increase these remembered durations, which suggests that memory storage speed was differentially affected. The conclusion is that internal clock and memory processes can be dissociated by selectively adjusting their speed of operation and that these changes can be quantitatively modeled by a scalar timing theory.
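How a change in clock speed shows up as a shift in the point of subjective equality can be sketched with a toy scalar-timing model. Everything in the sketch (the Gaussian response rule, the noise level, the clock-speed multiplier) is an assumption for illustration; only the 2-s and 8-s anchor durations come from the abstract.

import math

# Toy bisection model: the animal compares the noisy, scaled perceived
# duration with the geometric mean of the remembered anchors and responds
# "long" when the perceived duration exceeds it. Parameters are assumed.

SHORT, LONG = 2.0, 8.0
CRITERION = math.sqrt(SHORT * LONG)   # 4 s, the usual bisection point
CV = 0.2                              # assumed scalar (Weber-like) noise

def p_long(t, clock_speed=1.0):
    """Probability of a 'long' response to a signal of real duration t."""
    perceived_mean = clock_speed * t
    sd = CV * perceived_mean
    z = (CRITERION - perceived_mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

def pse(clock_speed=1.0):
    """Real duration judged 'long' half the time (numerical search)."""
    lo, hi = SHORT, LONG
    for _ in range(60):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if p_long(mid, clock_speed) < 0.5 else (lo, mid)
    return (lo + hi) / 2.0

print(pse(1.0))   # about 4.0 s with a normal clock
print(pse(1.2))   # a faster clock moves the PSE to shorter real durations

A drug that speeds the clock shifts the psychophysical function leftward (shorter real durations are called "long"), whereas a drug that changes remembered durations shifts the function only after new reinforced durations are stored, which is the logic behind dissociating clock and memory effects.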
The hyperbolic-decay model is a mathematical expression of the relation between delay and reinforcer value. The model has been used to predict choices in discrete-trial experiments on delay-amount tradeoffs, on preference for variable over fixed delays, and on probabilistic reinforcement. Experiments manipulating the presence or absence of conditioned reinforcers on trials that end without primary reinforcement have provided evidence that the hyperbolic-decay model actually predicts the strength of conditioned reinforcers rather than the strength of delayed primary reinforcers. The model states that the strength of a conditioned reinforcer is inversely related to the time spent in its presence before a primary reinforcer is delivered. A possible way to integrate the model with Grace's (1994) contextual-choice model for concurrent-chain schedules is presented. Also discussed are unresolved difficulties in determining exactly when a stimulus will or will not serve as a conditioned reinforcer.
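The hyperbolic-decay model itself is compact: the value V of a reinforcer of amount A delayed by D is V = A/(1 + KD), with K a fitted discounting parameter. The sketch below uses made-up amounts, delays, and K to show the preference reversal this form produces in delay-amount tradeoffs.

# Hyperbolic-decay value, V = A / (1 + K*D).
# The amounts, delays, and K below are made-up illustrative values.

def value(amount, delay, k=0.5):
    return amount / (1.0 + k * delay)

# Smaller-sooner vs. larger-later reward as both are pushed into the future.
for added_delay in (0.0, 2.0, 10.0):
    ss = value(2.0, 1.0 + added_delay)    # small reward, short delay
    ll = value(5.0, 10.0 + added_delay)   # large reward, long delay
    choice = "smaller-sooner" if ss > ll else "larger-later"
    print(f"added delay {added_delay:>4}: SS={ss:.2f}  LL={ll:.2f} -> {choice}")

Adding a common delay to both outcomes flips preference from the smaller-sooner to the larger-later reward, the qualitative self-control pattern the model is used to predict.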
Publisher Summary: The term "time discrimination" has no exact meaning, but the common examples of it share two properties: (1) the probability of some learned response changes with the time since some event, and (2) the function relating response probability and time changes with the time of the motivating event—for example, food or shock. The inhibition-of-delay results of Pavlov were the first examples of animal time discrimination. One of Pavlov's examples, originally published in 1927, used as the conditioned stimulus (CS) the sound of a whistle. During the first minute of the whistle there were zero drops of saliva, during the second minute about five drops, and during the third minute about nine drops. In addition, Pavlov found that the pause before salivating was proportional to the length of the CS; this satisfies the second condition. The introduction by Skinner of fixed-interval schedules of reinforcement made it much easier to observe timing, especially in rats. Skinner was training rats in the first Skinner boxes. He began by rewarding every response, but changed to rewarding responses only once a minute (a fixed-interval schedule) to make his supply of pellets last longer. After training with this procedure, a cumulative record of responding showed the now-familiar fixed-interval scallop: response rate was low at the beginning of the interval and much higher near the end.
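The scalloped cumulative record described above is easy to reproduce with a toy response-rate model. The sketch below is not from the chapter; the accelerating rate function and every parameter in it are assumptions chosen only to show the qualitative pattern of little responding early in each fixed interval and rapid responding near the end.

# Toy generator of a fixed-interval "scallop". Response rate within each
# interval is assumed to grow as a power of elapsed time; all parameter
# values are illustrative, not estimates from data.

FI = 60.0            # fixed-interval value, seconds (assumed)
PEAK_RATE = 2.0      # responses per second near reinforcement (assumed)
DT = 1.0             # time step, seconds

def rate(elapsed):
    """Assumed accelerating within-interval response rate."""
    return PEAK_RATE * (elapsed / FI) ** 3

cumulative = 0.0
for interval in range(2):
    for step in range(1, int(FI / DT) + 1):
        cumulative += rate(step * DT) * DT
        if step % 15 == 0:   # sample every 15 s to show the acceleration
            print(f"interval {interval + 1}, t={step * DT:>4.0f}s: "
                  f"{cumulative:6.1f} cumulative responses")

Within each interval the cumulative count rises slowly at first and steeply near the end, which is the scallop pattern visible in Skinner's records.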