The quantification of stuttered speech and normally
disfluent speech has been the focus of considerable
debate, resulting in diverse research methodologies
and measures over the past 60 years. For example, opinions
have differed about which metric to use. Onslow,
Costa, and Rue (1990) used several measures, including
percentage of stuttered syllables; Cordes and Ingham
(1994) counted the frequency of time-interval–based
speech sections that were perceived as containing stutter-
ing; and Yairi and Ambrose (1999) used, among other
measures, frequency of specific disfluency types per 100
syllables. Multiple metrics, including speech naturalness,
speech rate, and severity rating scales, in addition to those
just mentioned, have been advocated to increase the
reliability and validity of characterizing stuttered speech
(Conture, 1997; J. C. Ingham & Riley, 1998). In addition to
using a variety of metrics, collecting samples repeatedly
across different times and speaking environments has
been advocated as a way to establish an adequate sample
of stuttering (Costello & Ingham, 1984; J. C. Ingham &
Riley, 1998).
Problems in quantification, however, are present at
earlier stages of the data collecting/recording process, that
is, during the phase when the objective is to secure a speech
sample that adequately represents the individual’s verbal
output. In this respect, one strong impression derived from
the literature is that the way in which speech samples are
collected may affect measurement of disfluent events or
events of perceived stuttering. Commensurate with this
broad objective, researchers (e.g., Costello & Ingham,
1984; Gordon & Luper, 1992; Gregory & Hill, 1993; J. C.
Ingham & Riley, 1998; Yairi, 1997) have argued that a
person’s speech samples should be recorded on different
days and/or in different situational contexts to increase the
level of representativeness of disfluent speech in such
samples. Indeed, several investigators who examined
the effects of different speaking situations on counts of
children’s disfluency or perceived stuttering reported some
variability (e.g., J. C. Ingham & Riley, 1998; Martin, Kuhl,
& Haroldson, 1972; Silverman, 1971; Wexler, 1982;
Yaruss, 1997a). Although for certain purposes multiple-
situation samples may be desired, it should be pointed out
that there are numerous studies, laboratory experiments,
and field screenings (e.g., prevalence surveys) that legit-
imately rely on a single speech sample. Certainly, in these
conditions, but also in research or clinical situations where
multiple samples are desired, the question of how long
each sample ought to be has remained open. From this
perspective, the question of how much speech would be
sufficient to improve the identification of stuttering, or to
American Journal of Speech-Language Pathology
Vol. 15
36–44
February 2006
© American Speech-Language-Hearing Association
1058-0360/06/1501-0036
36
Research
The Effect of Sample Size on the Assessment
of Stuttering Severity
Jean Sawyer
Ehud Yairi
The University of Illinois at Urbana-Champaign
The relationships between the length of the
speech sample and the resulting disfluency data
in 20 stuttering children who exhibited a wide
range of disfluency levels were investigated.
Specifically, the study examined whether the
relative number of stuttering-like disfluencies
(SLD) per 100 syllables, as well as the length of
disfluencies (number of iterations per disfluent
event), varied systematically across 4 consecu-
tive, 300-syllable sections in the same speech
sample. The difference in the number of SLD
per 100 syllables between the early and later
sections of the speech sample was statistically
significant. In addition, the length of the speech
sample had a critical influence on the identifica-
tion of stuttering in children exhibiting relatively
low levels of disfluency. Also, when a 20%
difference in the number of SLD per 100 syllables
was taken as a criterion, 50% of the children
exhibited upward shifts in continuous speech
samples that were longer than 300 syllables
(i.e., 600, 900, and 1,200 syllables). Results
indicated that, in general, group means for SLD
grew larger as the sample size increased. The
length of disfluent events did not significantly
differ as the sample size increased; however,
there were large differences for some children.
Implications for clinicians and investigators are
discussed.
Key Words: stuttering measurement, speech
sample size, stuttering distribution
assess its characteristics, such as frequency and severity,
in a thorough and accurate way, is a valid one in and
of itself.
Recommendations for collecting speech samples in
general, not just stuttered speech, have emphasized
that samples should contain a sufficient number of the
behavior(s) targeted for analysis and that larger samples
with varied contexts enhance reliability (Lahey, 1988;
Lund & Duchan, 1993). This recommendation appears
to be particularly applicable to stuttering because the
frequency of disfluent events and the severity of the
disorder tend to fluctuate even within a given situation. As
Van Riper (1982) opined, ‘‘It is also necessary to obtain a
speech sample adequate in length for frequency count’’
(p. 205). A speech sample that is too short runs the risk
of lacking a number of events perceived as stuttering,
events that contain specific types of stuttering, or other
parameters of disfluency that can result in misrepresenting
a person’s speech. On the other hand, ‘‘too large a sampling
can be onerous’’ (Van Riper, 1982, p. 205). Thus, defining
reasonably adequate sample sizes for research and
clinical purposes is among the unresolved issues related
to the collection of speech samples for counting either
stuttering events or specific types of disfluencies.
Although speech sample size has varied greatly from
study to study, there has been a noticeable tendency to use
short samples. Wexler (1982) analyzed 100 utterances of
normally fluent children; Riley (1984) used 100 words;
Onslow, Gardner, Bryant, Stuckings, and Knight (1992)
used 15 utterances taken from a 10-min conversation;
Yaruss (1997a) based his analysis on samples as short as
200 syllables; and Meyers (1986) used approximately
350-word samples. The use of 300 words or so has been
common in research (Conture & Kelly, 1991; Gutierrez &
Caruso, 1995; Pellowski & Conture, 2002; Schwartz,
Zebrowski, & Conture, 1990; Zebrowski, 1991) as well as
in clinical assessment: Riley’s (1972) original Stuttering
Severity Instrument (SSI) suggested collecting 100-word
sections from each of three speaking situations. Short
speech samples have also been time based. For example,
Onslow et al. (1990) allowed speech samples recorded at
children’s homes to be as short as 60 s. Martin and
Haroldson (1981) and MacDonald and Martin (1973) used
5 min of speech, and Young and Prather (1962) concluded
that speech samples (of adults) as short as 20 s could be
sufficient to estimate stuttering severity.
Several investigators have used longer speech samples.
For example, Yairi and Lewis (1984) and J. C. Ingham and
Riley (1998) used 500 words. Here too, some researchers
reported longer samples in terms of time. R. J. Ingham,
Southwood, and Horsburgh (1981) used a base-rate phase
that involved a minimum of six consecutive 5-min readings
and spontaneous speaking tasks. It should be noted,
however, that theirs was not a continuous sample. A related
phenomenon has been the use of speech samples that
were inconsistent in size within the same investigation.
Johnson and Associates’ (1959) study included speech
samples that ranged from 31 to 2,044 words; Silverman’s
(1971) samples ranged from 342 to 2,592 words; Schwartz
and Conture’s (1988) were from 85 to 650 words; and
Ambrose and Yairi’s (1999) samples were 750 to 1,500
syllables.
Additional examination of the clinical literature reveals
that recommendations for speech sample size for the
clinical purposes of diagnosis and evaluation of stuttering
also vary. Costello and Ingham (1984) emphasized
sampling from different situational contexts in their
10–15-min ‘‘standard sets of talking,’’ recommending
500 syllables for each of several situations. Curlee (1999)
recommended a minimum of 500 syllables for diagnosing
and evaluating childhood stuttering, and Adams (1977)
advocated obtaining a 300- to 500-word speech sample.
The current version of the SSI recommends a minimum of
200 syllables in each of three situational speaking sections,
advising that a longer sample, even up to 500 syllables,
may be more reliable (Riley, 1994). Still, others have
advocated collecting considerably shorter samples. Conture
(1990) stated that a 300-word sample was sufficient
for averaging across enough 100-word samples to ade-
quately assess variations in stuttering, and Yaruss (1997b)
advocated analyzing 200-syllable samples from each of
four speaking situations. None of the above investigators
provided data-based support for the choice of the particular
sample size, be it long or short. It is unfortunate that, to
date, few systematic investigations have examined the role
of sample size on the disfluency data obtained, and no
scientific basis for using any speech sample size has
been offered. One attempt in this direction was made
by Yaruss (1997b) as part of his study of situational
variability. Few details were provided, and sample sizes
ranged from 600 to 1,000 syllables. No significant
variability in disfluencies was found in 200-syllable
sections of these speech samples.
As stated earlier, the length of the sample influences the
representative level of the speech data because disfluencies
do not occur with precise regularity and are not equally
distributed throughout a person’s speech. Disfluencies may
occur in clusters or be separated by intervals of fluent
speech, the length of which varies within and between
participants, including preschool children (Hubbard &
Yairi, 1988; LaSalle & Conture, 1995; Sawyer & Yairi,
2003). Thus, particularly when investigators or clinicians
are interested in specific disfluent components of a person’s
stuttered or normally fluent speech, it is important for the
sample to include representative examples of all the
disfluencies in a person’s speech as well as several
tokens of each type so that the count is reasonably reliable.
Short samples may have a high risk of underrepresenting
different disfluency types or such disfluency characteristics
as length that are produced by the speaker, particularly
those disfluencies that occur less frequently. For
example, a child may produce only one or two sound
prolongations, or not even one instance of a syllable
repetition containing four repetition units. These types
of disfluencies tend to strongly impact the perception of
stuttering severity. Because disfluencies may occur
irregularly, sometimes only toward the end of a long speech
sample, the examples given above may not be adequately
sampled in only 200 or 300 syllables (Ambrose & Yairi,
1999). On the other hand, a short sample may also lead to
Sawyer & Yairi: Sample Size and Stuttering Severity 37
overrepresentation if, by chance, it contains an infrequently
occurring disfluent event, such as a 2- or 3-s sound
prolongation, giving the investigator or clinician an
erroneous estimate of its frequency of occurrence.
As stated above, to date, few if any studies have
systematically investigated the effect of a single speech
sample size on the disfluency or stuttering data obtained.
Such information, however, is important, even if multiple
samples are taken, because it would provide researchers
and clinicians empirically based guidelines for collecting
speech samples that are appropriate for their respective
needs. Therefore, in this study we investigated the effects
of speech sample size on the disfluency counts in the
speech of preschool-age children who stuttered, a group
that sometimes may present unique additional challenges in
identification and differentiation. We hypothesized that
longer speech samples would provide proportionally
different frequency counts of disfluency as well as different
disfluency-length data than short samples would provide.
Consecutive sections of 300 syllables within a single
sample were compared because of the frequent past
practice of using sample sizes of similar length (Conture &
Kelly, 1991; Gutierrez & Caruso, 1995; Pellowski &
Conture, 2002; Riley, 1972; Schwartz et al., 1990;
Zebrowski, 1991). We addressed the following research
questions: (a) Does the frequency of stuttering-like
disfluency (SLD) per 100 syllables increase when the
length of a speech sample increases as consecutive 300-
syllable sections of the speech sample are incorporated into
the analysis? (b) Does the number of SLD per 100 syllables
in a single, long speech sample vary across four con-
secutive, 300-syllable sections? (c) Are there significant
differences in the length of repetition as a function of the
sample size?
Method
Participants
The participants were 20 preschool children who
stuttered, ranging from 33 to 58 months of age
(M = 43.9 months). The sample included 14 boys and
6 girls.
All children met the following criteria: (a) age 6 years or
younger, (b) regarded by parents as having a stuttering
problem, (c) regarded by two certified speech pathologists
as exhibiting stuttering, (d) exhibiting at least three SLDs
(part- and single-syllable word repetitions and blocks/sound
prolongations) per 100 syllables, and (e) no history of
neurological disorders or abnormalities.
Procedures
Recording. A conversational speech sample with the
child interacting with a parent and an experimenter, taken
in a single session, was audio- and videotaped in a sound-
treated room. Speech samples were a minimum of 1,200
syllables in length. These samples were elicited while the
child was playing with Play-Doh. Although the children
were engaged in a free conversation, to increase uniformity,
the conversation was guided by having the children respond
to open-ended standard questions about what they were
making with the Play-Doh, their favorite story, favorite TV
show, and so forth.
Segmentation of samples. The first 1,200 syllables of
each recorded sample were transcribed orthographically.
Each of these long speech samples was arbitrarily divided
into four 300-syllable sections: Section 1 (Syllables 1 to
300), Section 2 (Syllables 301 to 600), Section 3 (Syllables
601 to 900), and Section 4 (Syllables 901 to 1,200).
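As an illustration only, the sectioning just described can be expressed as a short Python sketch. The function names are hypothetical and are not part of the study's procedure; only the 300-syllable section size and the 1,200-syllable limit come from the text.

```python
SECTION_SIZE = 300  # syllables per section, as in the study
N_SECTIONS = 4      # only the first 1,200 syllables are analyzed


def section_bounds(section):
    """Return the 1-based (first, last) syllable of a section numbered 1 to 4."""
    if not 1 <= section <= N_SECTIONS:
        raise ValueError("section must be between 1 and 4")
    first = (section - 1) * SECTION_SIZE + 1
    last = section * SECTION_SIZE
    return first, last


def section_of(syllable):
    """Return the section (1 to 4) that contains a 1-based syllable position."""
    if not 1 <= syllable <= SECTION_SIZE * N_SECTIONS:
        raise ValueError("syllable lies outside the analyzed 1,200 syllables")
    return (syllable - 1) // SECTION_SIZE + 1


print(section_bounds(3))  # (601, 900)
print(section_of(301))    # 2
```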
Analysis of speech samples. Disfluencies were identified
and marked using the Systematic Analysis of Language
Transcripts (SALT; Miller & Chapman, 1996) to facilitate
the counting. Unintelligible utterances were deleted from
the research materials. Isolated responses such as ‘‘yes,’’
‘‘no,’’ ‘‘okay,’’ and ‘‘yeah,’’ which tend to be fluent, were
excluded from the analysis. When an affirmative or
negative was immediately followed by a phrase (e.g., ‘‘Yes,
I like ice cream’’), however, it was retained. The isolated
affirmatives that were excluded represented a range of 1%
to 4% of the total speech sample for the children in the
study, with the mean being 3%.
1
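The exclusion rule described above can be sketched in Python as follows. This is a minimal illustration, not the procedure carried out in SALT; the function names and the simple word-matching approach are assumptions, and only the four example responses named in the text are treated as isolated responses.

```python
import re

# The four isolated responses named in the text; a real analysis may use more.
ISOLATED_RESPONSES = {"yes", "no", "okay", "yeah"}


def is_isolated_response(utterance):
    """True when the utterance is a single yes/no-type word and nothing else."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return len(words) == 1 and words[0] in ISOLATED_RESPONSES


def retained_utterances(utterances):
    """Drop isolated responses; keep those followed by a phrase."""
    return [u for u in utterances if not is_isolated_response(u)]


sample = ["Yes.", "Yes, I like ice cream", "No", "I made a snake"]
print(retained_utterances(sample))
# ['Yes, I like ice cream', 'I made a snake']
```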
For the purpose of the current investigation, three
disfluency types that constitute SLD—part-word repeti-
tion, monosyllabic word repetition, and disrhythmic
phonation—were coded and identified. These types were
chosen because they have been shown to typify stuttered
speech (Ambrose & Yairi, 1999; Conture, 2001; Van Riper,
1971; Yairi, 1996). To avoid a possible order effect, each of
the 300-syllable sections was treated as an independent data
unit. The data were analyzed for disfluency identification
and count by the first author, who has had extensive
experience (several hundred hours) in both disfluency and
language analyses. The data were also analyzed independently
by a second listener with a similar level of
experience, who was blind to the hypotheses of the study
and the order of the sections. Interjudge comparisons for
the syllable counts for 20% of the samples yielded a .98
agreement coefficient. The interjudge reliability for type
and location of SLD in the randomly analyzed sections was
.89, using the percentage-occurrence-agreement formula
described by Baird and Nelson-Gray (1999). That is, the
number of agreements on SLD occurrence was divided by
the number of agreements plus disagreements. Specifically,
agreement was calculated only for disfluent events. Last,
interjudge reliability for the count of the number of
repetition units (RU; for part-word and single-syllable word
repetition) using Baird and Nelson-Gray’s (1999) formula
was .94. Intrajudge agreement for the first author was
derived after a period of 1 year. The values for the syllable
counts, SLD, and number of repetition units were high
at .98, .92, and .96, respectively.
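The percentage-occurrence-agreement computation described above amounts to a single ratio. The following Python sketch shows it under stated assumptions: the function name is hypothetical, and the counts in the example are invented for illustration rather than taken from the study.

```python
def occurrence_agreement(agreements, disagreements):
    """Agreements divided by agreements plus disagreements.

    Only disfluent events are counted; stretches that both judges scored
    as fluent do not enter the calculation.
    """
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no disfluent events were scored")
    return agreements / total


# Hypothetical counts: two judges agree on 89 events and disagree on 11.
print(occurrence_agreement(89, 11))  # 0.89
```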
1
Typically, single-word utterances, especially ‘‘yes’’ and ‘‘no’’ responses,
are uttered fluently. Because during free conversation with very young
children such responses can be frequent, an incorrect assessment of the
child’s habitual stuttering may occur if these utterances are included.
Taken to extreme, a child with moderate–severe stuttering may never
stutter or do so very infrequently if he/she talks with only single-word
responses, especially yes and no. Thus, exclusion of such material provides
a better picture of the child’s level of stuttering.
The SLD count per 100 syllables was determined by
combining the frequencies of all three disfluency types that
constitute this class. RU was reported in terms of the mean
number of extra iterations for part-word repetitions and
single-syllable, whole-word repetitions. These data were
derived for each child, section by section. Group means for
each section were calculated.
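A minimal Python sketch of the SLD rate described above is given here for illustration. The function name and the example counts are hypothetical; the only elements taken from the text are the three disfluency types and the per-100-syllable scaling.

```python
def sld_per_100(part_word, single_syllable_word, disrhythmic, syllables):
    """Combine the three SLD types and scale the total to 100 syllables."""
    if syllables <= 0:
        raise ValueError("syllable count must be positive")
    total_sld = part_word + single_syllable_word + disrhythmic
    return 100 * total_sld / syllables


# Hypothetical 300-syllable section with 5 part-word repetitions,
# 3 single-syllable word repetitions, and 1 disrhythmic phonation.
print(sld_per_100(5, 3, 1, 300))  # 3.0
```

With 300-syllable sections, each additional disfluency changes the rate by one third of a point, which is why single-section values fall on steps such as 1.33, 1.67, and 2.00.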
Results
SLD
Cumulative SLD counts. To answer the main question of
this study—that is, whether the relative frequency of
disfluencies changes as a function of the speech sample
size—SLD counts per 100 syllables were derived for the
first 300 syllables, the first consecutive 600 syllables, the
first 900 consecutive syllables, and the entire sample of
1,200 syllables. Table 1 presents the individual data, with
the sample length containing the first highest SLD count
marked in bold. Individual participant numbers were
assigned on the basis of the number of SLD per 100
syllables, from low to high, in the full speech sample.
The direction of the critical change is indicated under the
sample length in which it could be first detected (as is
explained below): U indicates an upward shift, D indicates
a downward shift, and N indicates no shift. Group means
for the children for each of the sample lengths appear at the
bottom of the table.
Most of the children showed considerable variability, in
terms of SLD per 100 syllables, over the different lengths
of the speech sample. Child 5, for example, showed a nearly
threefold rise in SLD per 100 syllables (from 1.33 to 3.75) as the
sample size increased from 300 to 1,200 syllables. Two immediate
observations can be made on the individual data in the table:
1. The length of the speech sample made a critical
difference in the basic diagnostic decision of classifying
3 children (15% of the sample) as exhibiting a stuttering
problem commensurate with the common standard
of three SLD or similar measures (Conture, 2001;
Van Riper, 1971; Yairi, 1997). That is, their diagnosis
as children who stutter would have been missed if only
the first 300-syllable sample were used. Two of these
children required a 600-syllable sample, and 1 child
required 900-syllable samples to meet the criterion.
Three out of seven potential misses is a high proportion.
In addition, the 3 children were among the first 7
children listed in the table, all of whom represented
low-level SLD of five SLD or fewer per 100 syllables, a
level widely recognized as mild stuttering (see severity
scales by Johnson, Darley, & Spriestersbach, 1963;
Williams, 1978; Wingate, 1976).
2. Only 6 children (30% of the group) achieved the highest
SLD count in the first 300-syllable section, an
additional 3 children (15%) reached the highest count
after 600 syllables, and 3 more children reached their
peak after 900 syllables. Eight children, 40% of the
group, reached their maximum after speaking up to
1,200 syllables.
Regarding the last observation, however, an obvious
question presents itself: What magnitude of a difference in
the number of SLD is practically meaningful? For example,
what change in the frequency of SLD or other measures,
such as frequency of stuttering (Onslow et al., 1992), is
required for altering the perception of stuttering severity?
Except for the minimal criterion for the diagnosis of
stuttering (3 SLD per 100 syllables), where a difference of
almost any size can be critical, there is no direct evidence
currently to support an answer to the question posed above.
Regardless, it is obvious that a person’s level of disfluency
must be considered. A difference of 1 SLD for a child
exhibiting 4 SLD per 100 syllables is relatively much
more significant than the same difference for a child
exhibiting 15 SLD per 100 syllables. Therefore, although
it would appear reasonable to hypothesize that a 10%
up or down change in observable behavior should be
regarded as practically significant, for the sake of being
conservative we doubled this figure, settling on a 20%
change as the critical threshold for the purpose of this
initial study of the sample length issue. Accordingly, the
individual data in Table 1 were reviewed to identify
children whose level of disfluency seen in Section 1
(300 syllables) was critically altered and at what speech
sample length the change was first detected. An inspection
of Table 1 yielded three subgroups:
1. Children with upward SLD shifts: This subgroup
included 10 children, or 50% of the participants. Of
these, 4 children (Nos. 2, 7, 11, and 18) had upward shifts
in SLD in the 600-syllable count, 3 (Nos. 5, 16, and 19) in
the 900-syllable count, and 3 (Nos. 4, 6, and 12) in the
TABLE 1. Individual stuttering-like disfluency (SLD) counts,
group means, and standard deviations for 300, 600, 900, and
1,200 syllables.

                         Syllables
Participant   300        600        900        1,200
1             4.67       3.00 D     2.55       3.17
2             2.33       3.17 U     3.00       3.17
3             4.00       3.00 D     3.00       3.42
4             3.00       3.50       3.11       3.67 U
5             1.33       1.50       3.33 U     3.75
6             3.23       2.50       3.11       3.83 U
7             2.67       3.50 U     3.89       4.00
8             5.33 N     5.00       5.11       4.83
9             6.33       5.33       4.67 D     5.08
10            9.33       7.17       6.00 D     5.50
11            6.00       8.17 U     9.00       8.75
12            7.57       7.50       8.56       8.92 U
13            10.67 N    11.33      9.33       9.50
14            10.67 N    9.67       9.56       10.17
15            11.33 N    10.50      12.22      12.00
16            8.00       9.50       10.22 U    12.83
17            13.33 N    14.50      14.00      14.42
18            10.67      15.33 U    15.89      14.92
19            9.67       11.00      14.55 U    15.25
20            23.00 N    21.17      21.11      23.92
M             7.12       7.82       8.11       8.55
SD            3.69       5.15       5.28       5.62

Note. Boldface type represents the highest SLD level; U = upward
shift; D = downward shift; N = no shift.
1,200-syllable count. Note, again, that 5 of these
children were among the 7 listed at the beginning of
the table; that is, they all exhibited what is typically
regarded as mild stuttering. The other 5 children
exhibited what is typically regarded as either moderate
or severe stuttering.
2. Children with downward SLD shifts: This subgroup
included 4 children, or 20% of the participants. For
2 children in this subgroup (Nos. 1 and 3), the change
occurred in the 600-syllable speech count, and for the
other 2 (Nos. 9 and 10) in the 900-syllable count. The
stuttering of the first 2 children can be regarded as mild
and that of the last 2 children as moderate, according to
stuttering severity scales.
3. Children with no SLD shifts: This subgroup included
6 children (Nos. 8, 13, 14, 15, 17, and 20), or 30% of
the entire group. Interestingly, 5 of the children were
located along the moderate and severe range of stut-
tering severity.
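The three-way classification above can be illustrated with a short Python sketch. It encodes one plain reading of the 20% criterion: each longer cumulative sample is compared with the first 300 syllables, and the first sample length at which the change reaches 20% in either direction is reported. The function name and return format are hypothetical, and this reading is an assumption that need not reproduce every marker in Table 1.

```python
THRESHOLD = 0.20            # the 20% criterion adopted in the study
LENGTHS = (600, 900, 1200)  # cumulative sample lengths after the first 300


def first_shift(rates):
    """Classify a child from SLD per 100 syllables at 300, 600, 900, 1,200.

    Returns ("U", length) or ("D", length) for the first sample length at
    which the rate differs from the 300-syllable rate by at least 20%,
    or ("N", None) when no such shift occurs.
    """
    baseline = rates[0]
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    for length, rate in zip(LENGTHS, rates[1:]):
        change = (rate - baseline) / baseline
        if change >= THRESHOLD:
            return "U", length
        if change <= -THRESHOLD:
            return "D", length
    return "N", None


# Rows for Participants 1, 4, and 8 as listed in Table 1.
print(first_shift([4.67, 3.00, 2.55, 3.17]))  # ('D', 600)
print(first_shift([3.00, 3.50, 3.11, 3.67]))  # ('U', 1200)
print(first_shift([5.33, 5.00, 5.11, 4.83]))  # ('N', None)
```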
Turning to the group’s means at the bottom of the table,
the means gradually increased as the speech sample
grew longer. The mean of the longest sample was larger than
the mean of the shortest sample by almost 1.5 SLD points.
These means, however, could not be compared through
statistical analyses because the same disfluencies present
in shorter samples are also present in longer cumulative
samples (e.g., the 900 column includes data from the
previous two columns). Thus, any test would be a comparison
of a whole with some of its parts. Therefore, comparisons of
noncumulative means were calculated as section-by-section
data, wherein the speech samples are treated as four con-
secutive, but separate, 300-syllable sections.
Section-by-section data. Although the main concern of
this study was the effect of the increasing sample length on
the frequency of disfluency, variations in disfluency
between each of the four 300-syllable sections that we
arbitrarily marked were also of interest, providing a
somewhat different look at the data that allowed the
application of statistical assessment. Group data are
presented in Table 2, with the numbers 1, 2, 3, and 4 used
to denote the four consecutive, 300-syllable sections of the
speech sample. Section 1 contains the first 300 syllables
(1–300), Section 2 includes Syllables 301–600, Section 3
includes Syllables 601–900, and Section 4 includes
Syllables 901–1,200.
The group means showed a substantial increase in SLD,
from 7.12 in Section 1 to 9.85 in Section 4, a rise of 38%.
Differences between consecutive sections were 10% to
13%. Note that both the mean and standard deviation for
SLD were largest in the fourth section of the speech
sample. A one-way, repeated-measures analysis of variance
(ANOVA) with a Huynh–Feldt correction for sphericity
revealed significant differences in SLD across the sections,
F(1, 19) = 2.9, p = .049. Post hoc comparisons using
Tukey’s honestly significant difference test indicated that
the SLD means in Sections 1 and 4 were significantly
different (p = .05).
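The percentage rise in the group means can be checked with simple arithmetic on the Table 2 values. The Python sketch below is illustrative only; the helper name is hypothetical. Because the tabled means are rounded, the stepwise values it produces may differ slightly from the percentages reported in the text.

```python
# Group means for Sections 1 to 4, as listed in Table 2.
section_means = [7.12, 7.98, 8.70, 9.85]


def percent_change(old, new):
    """Percentage change from old to new, relative to old."""
    return 100 * (new - old) / old


# Rise from the first to the fourth section.
overall = percent_change(section_means[0], section_means[-1])

# Change between each pair of consecutive sections.
stepwise = [
    percent_change(a, b) for a, b in zip(section_means, section_means[1:])
]

print(round(overall))  # 38
print([round(s, 1) for s in stepwise])
```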
Again, SLD fluctuations over the speech samples were
examined in terms of individual data, this time for each
300-syllable section separately. Table 3 shows how many
children had a 20% or more increase in SLD per 100
syllables between Sections 1 and 2, 2 and 3, and 3 and 4. As
we expected, the data reinforced the findings presented
earlier in regard to a longer speech sample. For example,
Participant 4 had 3.00, 4.00, 2.33, and 5.33 SLD in Sections
1 through 4, respectively. Participant 20 had 12.00, 19.33,
21.00, and 32.33 SLD in the respective samples. Table 3
makes it clear that later sections in the long speech sample
tended to have more disfluencies. Individually, over half
the children had a large increase in the number of SLD
per 100 syllables in the last 300 syllables of the speech
sample.
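The link between the section-by-section rates and the cumulative rates can be made explicit with a short Python sketch. It converts each section rate back to a whole number of disfluencies (an inference from the 300-syllable section size, not a figure reported in the study), accumulates the counts, and rescales to 100 syllables. The function name is hypothetical. Applied to the section values quoted above for Participant 4, it returns the cumulative values listed for that child in Table 1.

```python
SECTION_SYLLABLES = 300  # length of each consecutive section


def cumulative_rates(section_rates):
    """Turn per-section SLD rates into rates over the growing sample."""
    # Recover whole-number disfluency counts from the rounded rates.
    counts = [round(rate * SECTION_SYLLABLES / 100) for rate in section_rates]
    rates = []
    running_count = 0
    for n, count in enumerate(counts, start=1):
        running_count += count
        syllables_so_far = n * SECTION_SYLLABLES
        rates.append(round(100 * running_count / syllables_so_far, 2))
    return rates


participant_4 = [3.00, 4.00, 2.33, 5.33]  # Sections 1 to 4
print(cumulative_rates(participant_4))    # [3.0, 3.5, 3.11, 3.67]
```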
RU
Cumulative RU counts. To determine whether the
number of RU, a measure of disfluency length, changed as
a function of the speech sample size, particularly for the
individual participants, RU means per 100 syllables were
derived for the 300-, 600-, 900-, and 1,200-syllable
samples. Table 4 presents the individual data, with the
highest RU count marked in bold. In 2 cases (Participants
14 and 20), the highest count occurred in two sections but
only the first one is marked in bold. Note that a child’s
mean number of repetition units could not be less than 1
because the RU was calculated by dividing the total number
of iterations by the number of instances of repetitions.
Group data are reported in the bottom of the table. The
only notable change was a slight increase in RU as the
sample size grew from 600 to 900 syllables. As with SLD,
these data could not be statistically compared because
disfluencies in later sections included those same dis-
fluencies present in earlier sections.
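The RU calculation described above, total iterations divided by the number of repetition instances, can be sketched in Python as follows. The function name and the example values are hypothetical and serve only to show why the mean cannot fall below 1.

```python
def mean_repetition_units(iterations_per_event):
    """Mean number of extra iterations per repetition event.

    Each entry is the number of extra iterations in one part-word or
    single-syllable word repetition, so every entry is at least 1.
    """
    if not iterations_per_event:
        raise ValueError("at least one repetition event is required")
    if min(iterations_per_event) < 1:
        raise ValueError("a repetition has at least one extra iteration")
    return sum(iterations_per_event) / len(iterations_per_event)


# Hypothetical child: three single-iteration repetitions ("I-I") and one
# repetition with two extra iterations ("b-b-but").
print(mean_repetition_units([1, 1, 2, 1]))  # 1.25
```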
When we reviewed the individual data, we found that
6 children had the highest number of repetition units in
the first 300 syllables of the speech sample, 2 reached
TABLE 2. Group ranges, means, and standard deviations for
SLD counts per 100 syllables for each 300-syllable section.

Section   Range          M      SD
1         1.33–13.33     7.12   3.69
2         1.33–20.00     7.98   5.74
3         1.67–21.67     8.70   6.27
4         3.33–32.33     9.85   7.23
TABLE 3. Number of children and percentage of total
participant sample with an increase of at least 20% in SLD
counts per 100 syllables between each 300-syllable section.

Sections   No. of children   % of sample
1–2        9                 45%
2–3        6                 30%
3–4        11                55%
the longest average disfluency in 600 syllables of speech,
and 5 children reached the longest average disfluency in 900
syllables. Seven children reached their highest level of RU
only when the entire 1,200-syllable sample was considered.
Although overall the differences seem small, several
studies of repetition units (e.g., Ambrose & Yairi, 1995,
1999) have reported that small differences in this dimen-
sion may be the single most important differential factor
between children who stutter and normally speaking
children. Not having a precise critical difference for RU,
we again took a 20% change criterion. Accordingly, the
individual data in Table 4 were marked as either N (no
change from the first 300 syllables) or U (upward change
at the speech sample length where it was first detected).
This analysis showed that 15 children (75% of the group)
remained stable on the dimension of length, whereas 5
children (25% of the group) increased the length. Of these,
3 (Nos. 17, 18, and 19) were all in the range of severe
stuttering and exhibited what can be regarded as large
differences in mean RU from one sample size to another.
Section-by-section data. As was done for the frequency
of SLD, group means for RU, reflecting the length of
disfluency, were compared among the four 300-syllable
sections for purposes of statistical analysis. The data are
presented in Table 5 with the numbers 1 through 4 denoting
the four consecutive 300-syllable sections in the speech
sample. Only small variations among sections could be
observed. A one-way, repeated-measures ANOVA revealed
no statistically significant differences among the four
means, F(1, 19) = 3.43, p = .080.
Discussion
We focused in the present study on the relationships
among the size of the speech sample, outcome measure-
ments of SLD, and mean length of disfluency (RU). To the
best of our knowledge, this is the only study to have
systematically investigated the effect of sample size on the
resultant frequency and length of disfluency. Our assump-
tion was that, given more opportunities for disfluency to
occur, that is, longer speech samples, more valid informa-
tion could be obtained regarding this speech phenomenon.
SLD
Although SLD occurs in the speech of both children who
stutter and normally fluent children, it has been shown in
several ways to serve as a strong index of stuttering: SLD
(a) occurs much more frequently in the speech of the first
group (Ambrose & Yairi, 1999; Pellowski & Conture,
2002); (b) is more likely to provoke listeners’ perception of
stuttered speech than are other disfluencies, such as
revisions, phrase repetitions, and interjections (Conture,
2001; Van Riper, 1971; Yairi, 1996); and (c) has been
shown to closely reflect developmental changes in stutter-
ing over time (Yairi & Ambrose, 1999) and to closely agree
with other measures of such changes, such as stuttering
severity ratings. Hence, valid, accurate data for this
disfluency class are particularly important both in clinical
situations and for research purposes.
Although the general findings showing that changes in
the frequency of disfluency occurred throughout the sample
were expected, the specific results of statistically signifi-
cant differences in the frequency of SLD between Sections
1 and 4 suggest that for fuller exploration of children’s
disfluency output, measured in SLD per 100 syllables,
later portions in longer speech samples may contain
information that can appreciably alter research outcome
where groups are concerned. Longer samples might alter
outcomes even more so in clinical evaluation where a
single child is the target. Inasmuch as half of the young
children we evaluated showed more than a 20% rise in their
SLD counts as the speech sample exceeded 300 syllables, a
commonly used speech sample size, clinicians may wish to
be mindful of possible ways to increase accuracy in their
diagnostic procedures. The finding that a few children
decreased their disfluency whereas several children re-
mained stable does not change this clinical consideration.
Overall, the results support the general recommendations
for collecting speech samples of sufficient length as
suggested by Lund and Duchan (1993) for the purpose of
assessing language deficits. The precaution of recording
more speech would appear to be especially warranted for
children who seem to exhibit a low level of SLD but are
suspected by parents or others to exhibit stuttering. For 3
out of 7 children in this category in this study, the critical
diagnosis of stuttering would have been missed had only
the first 300 syllables been analyzed, and 5 out of the
10 children showing appreciable increase were in this
group of mild stuttering.
TABLE 4. Individual repetition unit (RU) counts, group means,
and standard deviations for 300, 600, 900, and 1,200 syllables.
              Syllables
Participant   300      600      900      1,200
1             1.40 N   1.38     1.33     1.48
2             1.14 N   1.11     1.12     1.12
3             1.17 N   1.17     1.11     1.18
4             1.00 N   1.06     1.08     1.05
5             1.00     1.00     1.10     1.16 U
6             1.00     1.15     1.28 U   1.19
7             1.16 N   1.17     1.14     1.23
8             1.31 N   1.24     1.18     1.17
9             1.06 N   1.11     1.11     1.12
10            1.35 N   1.48     1.42     1.38
11            1.50 N   1.47     1.35     1.42
12            1.45 N   1.55     1.46     1.48
13            1.17 N   1.13     1.10     1.10
14            1.44 N   1.35     1.44     1.44
15            1.29 N   1.20     1.36     1.43
16            1.43 N   1.35     1.33     1.38
17            1.24     1.30     1.94 U   1.38
18            1.17     1.29     1.81 U   1.45
19            1.22     1.25     1.71 U   1.99
20            1.21 N   1.29     1.30     1.30
M             1.24     1.25     1.33     1.32
SD            0.16     0.15     0.25     0.21
TABLE 5. Group ranges, means, and standard deviations for RU
counts for each 300-syllable section.
Section   Range       M      SD
1         1.00–1.50   1.24   0.16
2         1.00–1.80   1.28   0.20
3         1.00–1.96   1.27   0.25
4         1.00–2.08   1.36   0.27
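The 20% criterion mentioned above amounts to a relative change from the 300-syllable baseline. The following is a minimal sketch of that arithmetic. Because individual SLD values are not reproduced in this section, it is illustrated with RU values copied from Table 4 for three participants; applying the 20% threshold to RU is an assumption made only for illustration, and the function names are hypothetical.

```python
from typing import Sequence

# RU values copied from Table 4 for sample sizes of 300, 600, 900,
# and 1,200 syllables (participants 2, 17, and 19).
RU_TABLE4 = {
    2: (1.14, 1.11, 1.12, 1.12),
    17: (1.24, 1.30, 1.94, 1.38),
    19: (1.22, 1.25, 1.71, 1.99),
}


def percent_change(baseline: float, later: float) -> float:
    """Relative change from the baseline value, in percent."""
    return 100.0 * (later - baseline) / baseline


def rises_more_than(values: Sequence[float], threshold_pct: float = 20.0) -> bool:
    """True if any later value exceeds the first value by more than threshold_pct."""
    baseline = values[0]
    return any(percent_change(baseline, v) > threshold_pct for v in values[1:])


if __name__ == "__main__":
    for participant, values in RU_TABLE4.items():
        changes = [round(percent_change(values[0], v), 1) for v in values[1:]]
        print(participant, changes, rises_more_than(values))
```

A measure that stays near its 300-syllable value, as for participant 2, is not flagged, whereas a later rise such as that of participant 17 at 900 syllables is.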
The significant group differences indicate that 300
syllables, or other similarly short samples reported in past
research (e.g., Adams, 1977; Conture, 1990; Curlee, 1999;
MacDonald & Martin, 1973; Martin & Haroldson, 1981;
Onslow et al., 1992; Yaruss, 1997b), could also be too short
for certain research purposes. One question that may arise
concerns the magnitude of the differences among the group
means for the different sections within the single sample
used in our study. Are the differences in SLD means
between Sections 1 and 2, 1 and 3, and 1 and 4 (0.86, 1.52,
and 1.43, respectively) meaningful? Is the difference of
1.43 SLD between the short (300 syllables) and long
(1,200 syllables) samples meaningful? There are good
reasons for an affirmative response, especially at the low-
to-medium level of disfluency. For example, it has been
widely accepted that 3.0 SLD per 100 syllables/words,
or similar measures, is the cut-off point between normal
disfluency and stuttering. Therefore, a child who exhibits
only 2.5 SLD is typically regarded as normally fluent.
By adding only 1.5 SLD, a child exhibiting 4.0 SLD is
clearly in the category of mild stuttering. Similarly, when
evaluating recovery from stuttering, either natural or the
result of treatment, sliding down from 4.0 to 2.5 SLD per
100 syllables is a very meaningful change. In this respect,
it is also worthwhile to note that when Yaruss (1997a)
studied the effect of situational variability on the level of
disfluency, his data showed that among four situations, the
group’s mean difference in disfluency varied only from 0.36
to 0.80, which is well below the group difference
in the current study. When a fifth situation was added, his
largest difference was 1.54, very similar to the magnitude of
differences reported by us. Still, Yaruss’s findings have been
frequently referred to as evidence for situational influence
on the level of disfluency. Of course, it should be recognized
that at high levels of stuttering severity, larger changes in
the frequency of SLD are necessary for making a substantial
impact. As to this issue in our study, few children exhibited
significant decline in SLD as the speech sample increased
in length, whereas others remained stable.
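The cut-off reasoning earlier in this paragraph can be sketched as a small calculation: convert a raw SLD count to a rate per 100 syllables and compare it with the 3.0 cut-off cited in the text. This is a minimal sketch, not a diagnostic procedure. The function names and category labels are hypothetical, and treating a rate exactly equal to 3.0 as falling in the stuttering range is an assumption, because the text does not say how the boundary value is handled.

```python
# Cut-off cited in the text (SLD per 100 syllables).
SLD_CUTOFF_PER_100 = 3.0


def sld_per_100(sld_count: float, syllables: int) -> float:
    """Convert a raw SLD count into a rate per 100 syllables."""
    if syllables <= 0:
        raise ValueError("syllables must be positive")
    return 100.0 * sld_count / syllables


def classify(rate_per_100: float) -> str:
    """Label a rate relative to the cut-off (>= is an assumption, see lead-in)."""
    if rate_per_100 >= SLD_CUTOFF_PER_100:
        return "stuttering range"
    return "normal disfluency range"


if __name__ == "__main__":
    # The two rates used as examples in the text.
    for rate in (2.5, 4.0):
        print(rate, classify(rate))
    # Hypothetical counts: 15 SLD in 600 syllables, 12 SLD in 300 syllables.
    print(sld_per_100(15, 600), sld_per_100(12, 300))
```

With the text's own example, a rate of 2.5 falls below the cut-off and a rate of 4.0 falls above it, so a shift of 1.5 SLD per 100 syllables is enough to change the category.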
One explanation of the findings is that the changes
through the sample reflect changes in linguistic variables,
such as utterance length and complexity, that might have
increased toward the end of a long speech sample, as the
children warmed up and talked more freely, getting more
involved and excited and using longer sentences. Such
factors have been shown to influence disfluency output
(Gaines, Runyan, & Meyers, 1991; Logan & Conture,
1995; Ratner & Sih, 1987; Yaruss, 1999). In addition,
longer sentences may have involved a faster rate. More
excitement and a faster rate are likely to result in more
disfluency. Alternatively, as the children became more com-
fortable over time, they might have responded more will-
ingly to questions that required explanations instead of
uttering brief utterances about what they were making with
the Play-Doh.
Last, although one could suggest that the changes in
disfluency during the sample were the result of such
situational variability, which reinforces the opinion that
multisituational speech sampling is necessary, a contrary
argument is just as reasonable—that what has been
suggested in the past as the effect of situational variability
on the level of disfluency is not really different than
variability that takes place within a single sample.
Furthermore, the magnitude of changes within a long
sample, either up or down, is similar to, or even somewhat
larger than, those reported by Yaruss (1997a) to have
occurred among five different situations. It is possible then,
that a single, long sample may provide similar information
about variability.
RU
RU, as one measure of the length of disfluent events, has
been reported to be reliably counted by judges (Ambrose &
Yairi, 1995; Zebrowski, 1991). It is also a measure that has
been shown to differentiate the speech of children who
stutter from normally fluent speech in young children
(Ambrose & Yairi, 1995, 1999; Johnson & Associates,
1959; Wexler, 1982), and RU has been used in several
clinical protocols for identification of incipient stuttering
(Adams, 1977; Cooper & Cooper, 1985; Curlee, 1980;
Pindzola, 1987; Riley, 1984; Van Riper, 1982).
In the current study, however, RU did not prove to be
significantly affected for the group as a whole by the length
of the speech sample. Nevertheless, clinicians may wish to take
notice that although the section-by-section mean differ-
ences were not statistically significant, some children,
especially among those with severe stuttering, showed large
increases in RU as the sample length increased. Inasmuch
as even one or two disfluent events containing more than
2 RU leave strong impressions on listeners, the combined
effects of SLD frequency and RU at the end of a long
sample might help clinicians arrive at a more accurate
assessment of the child’s stuttering.
Conclusions and Implications
Conture (2001) pointed out that clinicians do not always
have control over the size of the speech sample. For
example, the child might be tired, irritable, or unwilling to
speak much, and the parents might not be able to return
with the child for a second or multiple recording of speech
samples. Although short samples (e.g., 300 words) may
provide basic data, we posit that the longer the sample is,
the more valid its representation will be. The present
findings indicate that, particularly in children who are
suspected to exhibit stuttering but reveal relatively low
levels of disfluency during evaluation or other types of
testing, the additional information gained from a long
speech sample can be critical. The question is how far a
clinician or a researcher should go in terms of sample size.
Whereas one may argue that the upward trend in disfluency
associated with sample length, as reported here, suggests
that samples longer than 1,200 syllables might be neces-
sary, additional research is needed to provide more precise
and defensible answers about the effects of speech sample
length. However, one must also weigh the time factor
associated with sample length, in either clinical or research
settings, against the information to be gained. Yet, for those
who claim that 300 syllables are enough for valid quantifica-
tion of stuttering in very young children, the obvious
question to ask is why not only 150 syllables? It is surprising
that such a basic issue as the length of speech sample to be
used has received so little systematic research. Whereas
more data and further analyses of contributing factors are
necessary to identify the optimal speech sample length, in
the meantime some clues are provided by the present
findings in that for 70% of the children (14 out of 20)
substantial upward fluctuations were reflected in 600
syllables, whereas 85% of the children were covered by a
900-syllable sample. Although these clues are general-
izations, their application to children with borderline or mild
stuttering is warranted.
Regardless of the present findings, arguing in favor of
600-syllable samples as a minimum, Yairi and Ambrose
(2005) pointed out that at least three tokens for any given
type of disfluency are necessary to identify a pattern or
obtain a mean and to indicate that the behavior is more than
circumstantial. Although for some types of disfluency a
300-syllable sample could yield three tokens, for other, less
frequent disfluencies, such as disrhythmic phonation,
300 syllables may not be sufficient. Because disrhythmic
phonations also occur occasionally in the speech of normally
fluent children from a frequency of 0 up to a maximum of
0.49 per 100 syllables, it is important to sample them at
frequencies of 0.50 or above. To obtain three examples of a
disfluency at that rate, a 600-syllable sample is required.
As stated above, however, research is required to examine
this issue before a definitive recommendation can be made.
It is clear, however, that if only a single session is available,
more recording should be attempted.
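The token argument above is simple arithmetic: at a rate of r disfluencies per 100 syllables, obtaining t tokens requires about t × 100 / r syllables. The following is a minimal sketch of that calculation; the function name is hypothetical, and rounding up to a whole syllable is an assumption.

```python
import math


def min_syllables_for_tokens(rate_per_100: float, tokens: int = 3) -> int:
    """Smallest sample size (in syllables) expected to yield `tokens`
    instances of a disfluency occurring at `rate_per_100` per 100 syllables."""
    if rate_per_100 <= 0:
        raise ValueError("rate_per_100 must be positive")
    return math.ceil(tokens * 100 / rate_per_100)


if __name__ == "__main__":
    # Rate of 0.50 per 100 syllables, as in the text.
    print(min_syllables_for_tokens(0.50))
    # Rate of 3.0 per 100 syllables, for comparison.
    print(min_syllables_for_tokens(3.0))
```

At the 0.50 rate discussed in the text, this gives 600 syllables, whereas a disfluency type occurring 3.0 times per 100 syllables would be expected to reach three tokens within 100 syllables.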
Another implication involves the practice of analyzing
selected sections taken from a larger corpus. Often, clinicians
and investigators tend to disregard the early and the last
sections. For example, the SSI (Riley, 1972) advocates
deleting the first and last 25 syllables of the speech sample.
Our findings suggest that any truncation should be at the
beginning, rather than the end, of a long speech sample.
Caveats and Future Research Directions
More research is needed to investigate further the effect
of sample length and factors that might influence disfluency.
Were the differences observed in the present study due solely
to length? Do other factors surface along with this
parameter? As mentioned earlier, past research has shown
different factors affect disfluency, including linguistic
variables, such as mean length of utterance and complexity.
Other variables that may affect disfluency include a faster
rate of speech, an increased level of excitement, or an
increased level of comfort with the listener. Another
important research goal would be to study what constitutes
critical difference at various levels of disfluency. All these
possibilities are worthy of pursuing. Last, additional
research with larger numbers of children who exhibit
different disfluency levels would permit a finer analysis of
the parameter that has surfaced in the current study.
Acknowledgment
This research was supported by Research Grant R01-DC
05210 from the National Institute on Deafness and Other
Communication Disorders, National Institutes of Health, to the
second author.
References
Adams, M. R. (1977). A clinical strategy for differentiating the
normally nonfluent child and the incipient stutterer. Journal of
Fluency Disorders, 2, 141–148.
Ambrose, N. G., & Yairi, E. (1995). The role of repetition units
in the differential diagnosis of early childhood incipient
stuttering. American Journal of Speech-Language Pathology,
4, 82–88.
Ambrose, N., & Yairi, E. (1999). Normative disfluency data for
early childhood stuttering. Journal of Speech, Language, and
Hearing Research, 42, 895–909.
Baird, S., & Nelson-Gray, R. O. (1999). Direct observation and
self monitoring. In S. C. Hayes, D. H. Barlow, & R. O. Nelson-
Gray (Eds.), The scientist practitioner: Research and
accountability in the age of managed care (2nd ed., pp. 353–
386). Needham Heights, MA: Allyn & Bacon.
Conture, E. (1990). Stuttering (2nd ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Conture, E. (1997). Evaluating childhood stuttering. In R. F.
Curlee & G. M. Siegel (Eds.), Nature and treatment of
stuttering: New directions (2nd ed., pp. 239–256). Needham
Heights, MA: Allyn & Bacon.
Conture, E. (2001). Stuttering: Its nature, diagnosis, and
treatment. Needham Heights, MA: Allyn & Bacon.
Conture, E., & Kelly, E. (1991). Young stutterers’ non-speech
behaviors during stuttering. Journal of Speech and Hearing
Research, 34, 1041–1056.
Cooper, E., & Cooper, C. (1985). Cooper personalized fluency
control therapy (Rev.). Allen, TX: DLM Teaching Resources.
Cordes, A. K., & Ingham, R. J. (1994). Time-interval
measurement of stuttering: Effects of interval duration.
Journal of Speech and Hearing Research, 37, 779–788.
Costello, J. M., & Ingham, R. J. (1984). Assessment strategies
for stuttering. In R. F. Curlee & W. H. Perkins (Eds.),
The nature and treatment of stuttering: New directions
(pp. 303–334). Boston: College Hill.
Curlee, R. F. (1980). A case selection strategy for young disfluent
children. Seminars in Speech, Language, and Hearing, 1,
277–287.
Curlee, R. F. (1999). Identification and case selection guidelines
for early childhood stuttering. In R. F. Curlee (Ed.), Stuttering
and related disorders of fluency (2nd ed., pp. 1–21).
New York: Thieme Medical.
Gaines, N., Runyan, C., & Meyers, S. (1991). A comparison of
young stutterers’ fluent versus stuttered utterances on measures
of length and complexity. Journal of Speech and Hearing
Research, 34, 37–42.
Gordon, P. A., & Luper, H. L. (1992). The early identification of
beginning stuttering I: Protocols. American Journal of Speech-
Language Pathology: A Journal of Clinical Practice, 1, 43–53.
Gregory, H., & Hill, D. (1993). Differential evaluation—
Differential therapy for stuttering children. In R. F. Curlee
(Ed.), Stuttering and related disorders of fluency (pp. 23–44).
New York: Thieme Medical.
Gutierrez, J., & Caruso, A. (1995). The variable nature of
stuttering: A clinical case study. National Student Speech
Language Hearing Association Journal, 22, 29–35.
Hubbard, C., & Yairi, E. (1988). Clustering of disfluencies in
the speech of stuttering and nonstuttering preschool children.
Journal of Speech and Hearing Research, 31, 228–233.
Ingham, J. C., & Riley, G. (1998). Guidelines for documentation
of treatment efficacy for young children who stutter. Journal of
Speech, Language, and Hearing Research, 41, 753–770.
Ingham, R. J., Southwood, H., & Horsburgh, G. (1981). Some
effects of the Edinburgh masker on stuttering during oral
reading and spontaneous speech. Journal of Fluency
Disorders, 6, 135–154.
Johnson, W., & Associates. (1959). The onset of stuttering.
Minneapolis: University of Minnesota.
Johnson, W., Darley, F., & Spriestersbach, D. C. (1963). Diag-
nostic methods in speech pathology. New York: Harper & Row.
Lahey, M. (1988). Language disorders and language develop-
ment. New York: Macmillan.
LaSalle, L. R., & Conture, E. G. (1995). Disfluency clusters of
children who stutter: Relation of stutterings to self-repairs.
Journal of Speech and Hearing Research, 38, 965–977.
Logan, K., & Conture, E. (1995). Length, grammatical
complexity, and rate differences in stuttered and fluent
conversational utterances of children who stutter. Journal of
Fluency Disorders, 20, 35–61.
Lund, N., & Duchan, J. F. (1993). Assessing children’s language
in naturalistic contexts (3rd ed.). Englewood Cliffs,
NJ: Prentice-Hall.
MacDonald, J. D., & Martin, R. R. (1973). Stuttering and
disfluency as two reliable and unambiguous response classes.
Journal of Speech and Hearing Research, 16, 691–699.
Martin, R., & Haroldson, S. K. (1981). Stuttering identification:
Standard definition and moment of stuttering. Journal of
Speech and Hearing Research, 46, 59–63.
Martin, R., Kuhl, P., & Haroldson, S. (1972). An experimental
treatment with two preschool stuttering children. Journal of
Speech and Hearing Research, 15, 743–752.
Meyers, S. (1986). Qualitative and quantitative differences and
patterns of variability in disfluencies emitted by preschool
stutterers and nonstutterers during dyadic conversations.
Journal of Fluency Disorders, 11, 293–306.
Miller, J., & Chapman, R. (1996). SALT: Systematic Analysis of
Language Transcripts. Madison: University of Wisconsin.
Onslow, M., Costa, L., & Rue, S. (1990). Direct early
intervention with stuttering: Some preliminary data. Journal of
Speech and Hearing Disorders, 55, 405–416.
Onslow, M., Gardner, K., Bryant, K., Stuckings, C., &
Knight, T. (1992). Stuttered and normal speech events in
early childhood: The validity of a behavioral data language.
Journal of Speech and Hearing Research, 35, 79–87.
Pellowski, M., & Conture, E. (2002). Characteristics of speech
disfluency and stuttering behavior in 3- and 4-year old children.
Journal of Speech, Language, and Hearing Research, 45, 20–34.
Pindzola, R. H. (1987). Stuttering intervention program. Tulsa,
OK: Modern Education.
Ratner, N., & Sih, C. (1987). The effects of gradual increases in
sentence length and complexity on children’s dysfluency.
Journal of Speech and Hearing Research, 52, 278–287.
Riley, G. (1972). Stuttering severity instrument for children and
adults. Journal of Speech and Hearing Disorders, 37, 314–320.
Riley, G. (1984). Stuttering Prediction Instrument for Young
Children (Rev. ed.). Austin, TX: Pro-Ed.
Riley, G. (1994). Stuttering Severity Instrument for Children and
Adults (3rd ed.). Austin, TX: Pro-Ed.
Sawyer, J., & Yairi, E. (2003, November). How much is enough?
The effect of sample size on the assessment of stuttering.
Poster presented at the American Speech-Language-Hearing
Association Convention, Chicago.
Schwartz, H., & Conture, E. (1988). Subgrouping young
stutterers: Preliminary behavioral observations. Journal of
Speech and Hearing Research, 31, 62–71.
Schwartz, H., Zebrowski, P., & Conture, E. (1990). Behaviors
at the onset of stuttering. Journal of Fluency Disorders, 15,
77–86.
Silverman, E. (1971). Situational variability of preschoolers’
disfluency: Preliminary study. Perceptual and Motor Skills, 33,
1021–1022.
Van Riper, C. (1971). The nature of stuttering. Englewood Cliffs,
NJ: Prentice-Hall.
Van Riper, C. (1982). The nature of stuttering (2nd ed.).
Englewood Cliffs, NJ: Prentice-Hall.
Wexler, K. (1982). Developmental disfluency in 2-, 4-, and
6-year-old boys in neutral and stress situations. Journal of
Speech and Hearing Research, 25, 229–234.
Williams, D. E. (1978). Differential diagnosis of disorders of
fluency. In F. L. Darley & D. C. Spriestersbach (Eds.),
Diagnostic methods in speech pathology (2nd ed., pp. 409–
438). New York: Harper & Row.
Wingate, M. E. (1976). Stuttering theory and treatment.
New York: Irvington.
Yairi, E. (1996). Applications of disfluencies in measurements
of stuttering. Journal of Speech and Hearing Research, 39,
402–404.
Yairi, E. (1997). Disfluency characteristics of childhood stutter-
ing. In R. F. Curlee & G. M. Siegel (Eds.), Nature and
treatment of stuttering: New directions (2nd ed., pp. 49–78).
Needham Heights, MA: Allyn & Bacon.
Yairi, E., & Ambrose, N. (1999). Early childhood stuttering I:
Persistency and recovery rates. Journal of Speech, Language,
and Hearing Research, 42, 1097–1112.
Yairi, E., & Ambrose, N. (2005). Early childhood stuttering: For
clinicians, by clinicians. Austin, TX: Pro-Ed.
Yairi, E., & Lewis, B. (1984). Disfluencies at the onset of
stuttering. Journal of Speech and Hearing Research, 27,
154–159.
Yaruss, J. S. (1997a). Clinical implications of situational
variability in preschool children who stutter. Journal of
Fluency Disorders, 22, 187–203.
Yaruss, J. S. (1997b). Clinical measurement of stuttering
behaviors. Contemporary Issues in Communication Science
and Disorders, 24, 33–44.
Yaruss, J. S. (1999). Utterance length, syntactic complexity, and
childhood stuttering. Journal of Speech, Language, and
Hearing Research, 42, 329–344.
Young, M. A., & Prather, E. M. (1962). Measuring severity of
stuttering using short segments of speech. Journal of Speech
and Hearing Research, 5, 256–262.
Zebrowski, P. (1991). Duration of speech disfluencies of
beginning stutterers. Journal of Speech and Hearing Research,
34, 483–491.
Received October 25, 2004
Revision received May 24, 2005
Accepted October 13, 2005
DOI: 10.1044/1058-0360(2006/005)
Contact author: Jean Sawyer, Department of Speech Pathology
and Audiology, 204 Fairchild Hall, Illinois State University,
Normal, IL 61790. E-mail: jsawyer@ilstu.edu
American Journal of Speech-Language Pathology, Vol. 15, 36–44, February 2006