Animal Cognition (2018) 21:353–364
https://doi.org/10.1007/s10071-018-1172-4
ORIGINAL PAPER
‘Who’s agood boy?!’ Dogs prefer naturalistic dog‑directed speech
AlexBenjamin1· KatieSlocombe1
Received: 29 August 2017 / Revised: 22 February 2018 / Accepted: 23 February 2018 / Published online: 2 March 2018
© The Author(s) 2018. This article is an open access publication
Abstract
Infant-directed speech (IDS) is aspecial speech register thought to aid language acquisition and improve affiliation in human
infants. Although IDS shares some of its properties with dog-directed speech (DDS), it is unclear whether the production
of DDS is functional, or simply an overgeneralisation of IDS within Western cultures. One recent study found that, while
puppies attended more to a script read with DDS compared with adult-directed speech (ADS), adult dogs displayed no
preference. In contrast, using naturalistic speech and a more ecologically valid set-up, we found that adult dogs attended to
and showed more affiliative behaviour towards a speaker of DDS than of ADS. To explore whether this preference for DDS
was modulated by the dog-specific words typically used in DDS, the acoustic features (prosody) of DDS or a combination
of the two, we conducted a second experiment. Here the stimuli from experiment 1 were produced with reversed prosody,
meaning the prosody and content of ADS and DDS were mismatched. The results revealed no significant effect of speech
type, or content, suggesting that it is maybe the combination of the acoustic properties and the dog-related content of DDS
that modulates the preference shown for naturalistic DDS. Overall, the results of this study suggest that naturalistic DDS,
comprising of both dog-directed prosody and dog-relevant content words, improves dogs’ attention and may strengthen the
affiliative bond between humans and their pets.
Keywords Dog-directed speech · Human–dog communication · Infant-directed speech · Dog cognition · Affiliative behaviour · Dog attention
Introduction
When talking to an infant, adults use a special speech register characterised by elevated fundamental frequency (pitch), exaggerated intonation contours and high affect (Burnham et al. 2002). This phenomenon is evident across languages including English, Russian, Swedish and Japanese (Kuhl et al. 1997; Andruski et al. 1999). It is thought that infant-directed speech (IDS) facilitates infants’ linguistic development by amplifying the phonetic characteristics of native language vowels (Kuhl et al. 1997), allows infants to select appropriate social partners (Schachner and Hannon 2011) and increases social bonding between infant and caregiver (Kaplan et al. 1995).
In the same way that IDS is produced automatically when talking to infants, humans in Western cultures also produce a special speech register when talking to their pets. This pet-directed speech (PDS) shares some of the acoustic features of IDS including elevated pitch and exaggerated affect compared to adult-directed speech (ADS) (Burnham et al. 1998). It is possible that pitch is elevated in IDS and PDS in order to attract the listener’s attention, while affect is elevated to meet the listener’s emotional needs, possibly motivating affiliative interaction with the speaker. One crucial feature found in IDS but not in PDS is the hyperarticulation of vowels (Burnham et al. 1998). Hyperarticulation of vowels may be the aspect of IDS that assists spoken language acquisition (Kuhl et al. 1997) and the speaker’s hyperarticulation may be mediated by the perceived linguistic capacity of the receiver; evidence that supports this view is provided by a study that compared speech produced to dogs, parrots and infants. Speakers seem to hyperarticulate their vowels most with prelinguistic human infants, followed by parrots, with little evidence of this when addressing dogs, who in contrast to parrots have no ability to produce speech (Xu et al. 2013).
Electronic supplementary material The online version of this article (https://doi.org/10.1007/s10071-018-1172-4) contains supplementary material, which is available to authorized users.
* Katie Slocombe, Ks553@york.ac.uk
1 Department of Psychology, The University of York, York, UK
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
It is evident that speakers are sensitive to their audience in terms of acoustic preference, emotional needs and linguistic potential; however, in order to understand the function of special speech registers, it is crucial to understand how they affect the receiver. Human infants show a preference for IDS from a very early age (Kaplan et al. 1995), with Cooper and Aslin (1990) finding preferences for IDS over ADS in 2-day-old infants. Werker and McLeod (1989) measured affective responsiveness to ADS and IDS in 4–5- and 7–9-month-old infants. Two trained raters judged the affective responsiveness of infants, comprising how much they thought the infant was trying to interact with the speaker, how interested the infant appeared and the valence of the infant’s emotional state. They found that infants of both age groups showed greater affective responsiveness to IDS than to ADS. They also found that when presented with video recordings of infants listening to speech, unfamiliar observers rated the infants as more ‘appealing’ when the infants were listening to IDS than when they were listening to ADS. This indicates that the use of IDS may facilitate the development of an emotional bond between adults and infants. In contrast to IDS, there has been very little research into the effect of PDS on receivers, meaning that it is currently unclear whether PDS is a non-functional overgeneralisation of IDS in Western cultures where pets often have the status of infants or whether it functions to gain pets’ attention and strengthen the affiliative bond between humans and their pets.
Ben-Aderet et al. (2017) were the first to investigate both the production of dog-directed speech (DDS) and the behavioural response to DDS in puppies, adult dogs and older dogs. Acoustic analysis of DDS confirmed previous descriptions of the acoustic structure of this speech register, where DDS was higher in pitch, with more pitch variation over time, and higher harmonicity than ADS. They also showed that human adults produced DDS to dogs of all ages. Crucially, Ben-Aderet et al. (2017) then conducted playback experiments using the DDS and ADS recorded in the first part of the study to test dog responses to these types of speech. Stimuli consisted of repetitions of the phrase ‘Hi! Hello cutie! Who’s a good boy? Come here! Good Boy! Yes! Come here sweetie pie! What a good boy!’ in dog- and adult-directed prosody. Speech was played from a loudspeaker in the corner of the room, with no human near the source of the sound, and various measures of dogs’ attention to and approach of the loudspeaker were combined into a composite behavioural response measure. They found that puppies showed a higher behavioural response to DDS than to ADS, but this preference decreased as a function of age. The authors conclude that puppies are highly reactive to DDS and that pitch is a key feature in modulating this preference, but that adult dogs do not react differentially to DDS and ADS. They argue that DDS may have a functional value in puppies, but not adult dogs, and therefore, the use of DDS with adult dogs may simply be a ‘spontaneous attempt to facilitate interactions with non-verbal listeners’ (Ben-Aderet et al. 2017, p. 1).
It is, however, possible that alternative explanations of the null result with adult dogs exist. As Ben-Aderet et al. discuss, adult dogs may need additional cues (e.g. gestures) to respond to unfamiliar speakers. If DDS functions to facilitate social communication and interaction, it may only be relevant to attend to it when it comes from a human that can be attended to and socialised with. It is possible that if no human experimenter is present, adult dogs realise that there is no social benefit to reacting preferentially to any speech. Puppies, with little experience of the world, may not recognise this and therefore still responded to DDS in the absence of a feasible producer. While it is clear that puppies are more reactive to the prosody of DDS than adult dogs, further testing with a human speaker present during stimulus presentation is required in order to rigorously test whether adult dogs really are insensitive to DDS.
We therefore aimed to test the possible function of DDS with adult dogs in a more ecologically valid setting where attention and affiliation towards the individuals who produced DDS could be directly measured. Dogs were presented with two experimenters with audio speakers on their laps that played naturalistic DDS or ADS (differing in both prosody and content), and we measured the dogs’ attention to each individual during speech and then proximity to the experimenters once dogs were given the opportunity to approach them after the speech finished. We predicted that if DDS is functional for adult dogs, in experiment 1 they should attend more to DDS than ADS, and when given the opportunity to approach the experimenters, they should choose to spend more time in proximity to the individual who produced DDS.
We then ran a second experiment to investigate whether content or prosody was driving any preferences for naturalistic DDS. Here we presented content-mismatched stimuli (e.g. adult content with dog prosody and vice versa) and predicted that if the content of naturalistic DDS was driving preferences, dogs should attend to and spend more time near the individual producing dog-relevant content. If, on the other hand, the prosody of DDS was driving preferences, as was the case for the puppies studied by Ben-Aderet et al. (2017), dogs should attend to and spend more time near the individual producing dog-directed prosody. Finally, if preferences for naturalistic DDS are driven by both content and prosody, or result from the combination of dog-relevant content and DDS prosody, we expect to find no significant preference for either of the mismatched stimuli.
Experiment 1
As we were interested in naturalistic dog- and adult-directed speech, the stimuli used in this experiment varied in both content and prosody. The stimuli were ‘matched’ in prosody and content such that DDS consisted of dog-relevant content and dog-directed prosody, and ADS consisted of adult-relevant content and adult-directed prosody.
Methods
Study site and participants
Dogs were recruited from Redhouse Boarding Kennels, York, with permission from the kennel owner. In experiment 1, 37 dogs took part between January and May 2014 (17 females and 20 males; mean age 6 years ± 3.86). See supplementary material for more detailed age, gender and breed information (Table S1). Where dogs have been removed from various parts of the analysis due to interruptions, equipment failures or safety reasons, the details and N for each analysis are given.
Stimuli
Stimuli were recorded as uncompressed WAV files using a Marantz PMD661 solid-state recorder from the two human female experimenters (aged 20–21). The recordings from each experimenter were always presented through that experimenter’s own speaker. Although only presenting speech from the experimenters meant that multiple dogs heard the same recordings, it ensured that the stimuli were congruous with the physical characteristics of the experimenters (age, gender, height), thus maximising ecological validity and removing the possibility of looking time measures being affected by incongruity of the stimuli. DDS was chosen from a sample of recorded naturalistic interactions with a friendly dog (Irish setter). ADS was chosen from a sample of naturalistic adult–adult interactions that occurred between the experimenters (see supplementary material for transcripts).
Two different segments of DDS and ADS for each experimenter were selected from the continuous speech recordings (one 10-s segment and one 15-s segment). The amplitude of the speech in each segment was modified using Raven Pro (version 1.4), so that the mean RMS amplitude of each segment was equalised at approximately 3000. For each trial, the DDS track of one experimenter was paired with the ADS track of another. Figure 1 illustrates the stimulus timeline.
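The amplitude equalisation described above was carried out in Raven Pro; the following is only a minimal sketch of the underlying arithmetic, assuming a segment is available as a plain list of sample values. The function names and the toy sample values are hypothetical and are not taken from the study.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equalise_rms(samples, target_rms=3000.0):
    """Scale samples by a single gain so their RMS amplitude equals target_rms."""
    current = rms(samples)
    if current == 0:
        raise ValueError("cannot rescale a silent segment")
    gain = target_rms / current
    return [s * gain for s in samples]


# Hypothetical toy segment; real stimuli would be the samples of a WAV file.
segment = [1200.0, -800.0, 450.0, -1500.0, 950.0]
equalised = equalise_rms(segment)
```

Applying one gain to the whole segment changes loudness without altering the relative shape of the waveform, which is why it is a natural way to match two tracks in level.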
Design
This experiment used a within-subject design, where all dogs heard both DDS and ADS. All dogs heard simultaneous speech first, followed by DDS only and ADS only. The order of DDS and ADS only segments was counterbalanced across trials. Simultaneous speech was played again at the end, to eliminate the possibility that dogs would approach the individual who spoke last. We also counterbalanced the identity of the DDS speaker (experimenter 1 or 2) and the location from which DDS was played (left/right) across trials.
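The three counterbalanced factors in this design (segment order, DDS speaker identity and DDS location) can be crossed into eight conditions. The sketch below enumerates them and cycles dogs through the cells; the paper does not say how dogs were allocated to conditions, so the cycling rule, the labels and the dog identifiers are all assumptions made for illustration.

```python
from itertools import product


def counterbalanced_conditions():
    """Return every combination of the three counterbalanced factors."""
    orders = ("DDS_first", "ADS_first")
    dds_speakers = ("experimenter_1", "experimenter_2")
    dds_sides = ("left", "right")
    return [
        {"order": order, "dds_speaker": speaker, "dds_side": side}
        for order, speaker, side in product(orders, dds_speakers, dds_sides)
    ]


def assign(dog_ids):
    """Cycle through the cells so each is used as equally often as possible."""
    cells = counterbalanced_conditions()
    return {dog: cells[i % len(cells)] for i, dog in enumerate(dog_ids)}


# Hypothetical identifiers for 16 dogs: each of the 8 cells is used twice.
schedule = assign([f"dog_{i:02d}" for i in range(1, 17)])
```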
Fig. 1 A diagram illustrating the stimulus timeline. ADS only and DDS only segments were counterbalanced such that half the dogs heard ADS only first and half heard DDS only first. Each track was played simultaneously (DDS from one speaker, ADS from another speaker) from an iPod paired with an Anchor speaker. The same 10-s segment was used in simultaneous 1 and 2 for each speaker, though these segments differed from the 15-s segments in ADS and DDS only phases
Procedure
Equipment was set up as illustrated in Fig. 2. The speakers were equalised to 70 dB at 1 m away with white noise using a sound pressure meter, to ensure that speech broadcast from each speaker would be equal in volume. Experimenters 1 and 2 then left the room via door 2. The third experimenter (handler) retrieved the dog from its kennel and entered the experimental room through door 1. The dog was allowed to explore the experimental room for 1 min (to habituate to the environment in order to reduce distraction during the trial), before being put back on a lead and taken into a waiting room via door 3. Experimenters 1 and 2 entered through door 2 and sat in the chairs. The handler entered with the dog. Once the dog was in position, the stimulus was played.
For the duration of the stimulus, the experimenters sat still to ensure the dogs were not exposed to any body language cues. The experimenters did not attempt to move their mouths simulating the speech. Instead, the experimenters placed one hand covering their mouths so that the dog could not see their lips. They also maintained neutral expressions with eyes directed towards the dog to ensure the dog did not receive differential facial cues from the experimenters.
While the stimulus played, the dog was kept on a short lead to ensure it remained within camera visibility, while still allowing the dog to move around within 1 m of the handler. The handler did not interact with the dog and looked at the ground throughout. At the end of the stimulus phase, the lead was removed and the dog was allowed to explore freely for 1 min and approach experimenters 1 and 2 if it wished. The dog received no interaction from any experimenter.
Video coding
Video recordings of each session were analysed, and during the stimulus presentation, time spent looking towards DDS and ADS was recorded as measured by head direction. During the 1-min off-lead period following the stimulus presentation, time spent in proximity to DDS and ADS speakers was recorded, as measured by the position of the dog’s head in the 1.1 m² area surrounding the speaker (see Fig. 2).
The period after the dog entered the room, but before the stimulus began was used as a control period (mean duration 4.56 ± 2.14 s). Looking times during this phase were recorded in order to establish whether the dog displayed any preference for one experimenter in particular, or one location (left or right) that may have influenced looking times in the experiment.
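As a rough illustration of how coded video can be turned into the looking-time measures described above, the sketch below sums coded looking bouts per target and expresses one total as a proportion of a phase. The tuple format (target, start, end), the function names and the example values are hypothetical; the paper does not describe the coding software or its output format.

```python
def looking_totals(intervals):
    """Sum coded looking durations (in seconds) for each target."""
    totals = {}
    for target, start, end in intervals:
        if end < start:
            raise ValueError("interval ends before it starts")
        totals[target] = totals.get(target, 0.0) + (end - start)
    return totals


def looking_proportion(totals, target, phase_duration):
    """Share of a phase spent looking at one target."""
    return totals.get(target, 0.0) / phase_duration


# Hypothetical coding of one 10-s phase: (target, start_s, end_s).
coded = [("DDS", 0.5, 3.0), ("ADS", 3.2, 4.0), ("DDS", 5.0, 8.5)]
totals = looking_totals(coded)
dds_share = looking_proportion(totals, "DDS", 10.0)
```

Proportions rather than raw durations are the natural measure for the control period, whose length varied from trial to trial.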
Interobserver reliability
The primary observer (AB) coded 100% of videos. For experiment 1, two trained observers each coded 30% of videos (N = 24/36 trials total) and measured looking time at each speaker in each section of the stimulus (control silence, simultaneous 1, DDS only, ADS only, simultaneous 2; N = 10 measurements) and time in proximity to each speaker in the minute post-stimulus presentation (N = 2 measurements). The primary coder had high agreement with the two secondary coders, and there was also high agreement between the two secondary coders across all measurements (Spearman’s R > 0.90, p < 0.001 for all comparisons), indicating the videos had been coded reliably.
A third observer, who was blind to the hypotheses of the experiment, also coded 22% of the videos (N = 8/36 trials total) with the sound turned off so that they were unaware which speech type was heard by the dog. There was high agreement with the primary coder for looking time (R = 0.86, p < 0.001) and for proximity preference (R = 0.96, p < 0.001).
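Agreement between coders is reported here as Spearman's rank correlation. The authors computed it in SPSS; the following is only a minimal pure-Python sketch of the statistic (Pearson correlation of average ranks), and the two lists of looking times are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical looking times (s) coded by two observers for six trials.
coder_a = [4.2, 1.0, 6.5, 3.3, 8.1, 2.4]
coder_b = [4.0, 1.3, 6.1, 4.5, 7.6, 2.0]
rho = spearman(coder_a, coder_b)
```

Because the statistic depends only on rank order, two coders can differ in absolute durations and still agree closely, provided they order the trials similarly.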
Fig. 2 Diagram of experimental set-up at Redhouse Boarding Kennels in York. Position of dog marked with a cross. Cameras were positioned behind and to the right of the dog, and behind the speakers. Doors to other areas are marked. Dotted lines represent edges of areas in which proximity to speaker was recorded. Experimenters with speakers on their laps were seated on chairs in the centre of each area
Statistical analysis
All data were analysed using IBM SPSS (version 24) with the significance level set at p < .050. Attentive and affiliative preference was evaluated using mixed ANOVAs with the fixed within-subject factor speech prosody (DDS/ADS), between-subject factors DDS identity (experimenter 1/experimenter 2) and DDS location (right/left). A single mixed ANOVA was conducted on the proximity to speakers in the minute post-stimulus presentation. For looking time, after the ANOVA on the total looking time had been completed (Table 1), separate ANOVAs were then run for each section of the stimulus (simultaneous; ADS only; DDS only). We applied a more conservative Bonferroni-corrected alpha level to the separate section analyses (α = 0.01) to correct for family-wise error that might have arisen from running multiple tests on the same data set. Finally, we ran an ANOVA with between-subject factors DDS identity (experimenter 1/experimenter 2) and DDS location (right/left) on proportion of looking times in the control period. All assumptions of these parametric tests were tested and met.
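A Bonferroni correction divides the nominal significance level by the number of tests in the family. The sketch below shows that arithmetic; the paper reports the corrected level (0.01) but not the number of tests it was divided by, so the value of five used here is an assumption chosen because 0.05 / 5 reproduces the reported level. The p-values are invented.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-corrected level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Assumed family of five tests (not stated in the paper).
corrected = bonferroni_alpha(0.05, 5)

# Hypothetical p-values, one per test in the family.
flags = significant([0.001, 0.02, 0.2, 0.009, 0.5])
```

The correction is conservative: a result such as p = 0.02 would pass the uncorrected 0.05 level but is not treated as significant once five tests share the error budget.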
Results
Looking preference
For this analysis, four subjects were removed due to equipment failure (N = 33). During control silence, there was no significant main effect of Identity or Location, indicating that dogs did not display any preference for one particular experimenter or speaker location (Table 1). Dogs displayed a significant preference for DDS across the whole trial (Fig. 3; Table 1) and during each phase that contained DDS (Fig. 3; Table S3). Dogs tended to look more towards ADS when this was the only stimulus available; however, this preference was non-significant (Fig. 3). No significant interactions with speaker identity or location were found for total time (Table 1) or separate segments of the stimuli (simultaneous, DDS only, ADS only) (Supplementary Material: Table S3).
Proximity preference
For this analysis, three dogs were removed from the data set due to equipment failure or because the dog had to be kept on a lead, resulting in an N = 34. A mixed ANOVA revealed that after hearing content-matched stimuli, dogs spent significantly more time in close proximity to the DDS speaker than the ADS speaker (F (1, 30) = 5.54, p = 0.025; Fig. 4). No significant interactions with location or speaker identity were found (Table 2).
Table 1 Results of a between-subject ANOVA (df = 1, 29) on looking proportions in the control period and a mixed ANOVA (df = 1, 29) comparing main effects and interactions for looking times towards content-matched DDS and ADS. Values are F (p). Significant results are marked, where *** denotes p < 0.005
Effect                                 Control silence   Total looking
Within-subject effects
  Speech type                          -                 40.51 (< .001)***
  Speech type * identity               -                 0.15 (.704)
  Speech type * location               -                 1.61 (.215)
  Speech type * identity * location    -                 0.24 (.627)
Between-subject effects
  Identity                             0.38 (.543)       0.20 (.656)
  Location                             0.59 (.448)       1.37 (.251)
  Identity * location                  0.85 (.364)       0.43 (.517)
Fig. 3 Time spent looking towards content-matched DDS and ADS where error bars represent 1 standard error of the mean. *** refers to significant differences (p < 0.005) and n.s. denotes non-significant comparisons as revealed by mixed ANOVAs (total: Table 1; other time segments: Table S3)
Fig. 4 A graph to show the mean time spent in proximity to each experimenter (seconds), in the minute after the speech stimuli ended, when the dogs heard content-matched DDS and ADS. Error bars represent one standard error of the mean. (*) denotes a significant main effect of speech type (p < 0.050) based on the results of ANOVA presented in Table 2
Discussion
This experiment showed that dogs display a behavioural preference for naturalistic DDS (matched in prosody and content) compared with ADS when presented in the presence of an associated human. Dogs, on average, spent more time looking towards a speaker of DDS compared with a speaker of ADS in all segments of the stimulus containing DDS and across the trial as a whole. We also found that when given the subsequent opportunity to interact with the speakers, dogs chose to spend more time in proximity to the DDS speaker than the ADS speaker. Although the absolute differences in looking and proximity time were small and therefore their functional relevance may be questioned, we feel the substantial effect sizes obtained and the convergence of results across our behavioural measures indicate that we have detected functionally relevant differences in behaviour. Overall, our results support the hypothesis that dogs display attentive and affiliative preferences for naturalistic DDS over ADS.
The results from the control period show no significant preference for a specific location, or speaker identity, indicating that the dogs had no a priori preference for looking at one experimenter or location. In line with this, no significant main effects of location or speaker identity, or interactions of identity, location and speech type were found.
Although our results show a robust preference for naturalistic DDS over ADS, as the stimuli in this experiment differed in both content and prosody, it is not possible to determine whether this effect is driven by dog-directed prosody or content, as these factors did not vary independently. Therefore, although this experiment clearly shows that dogs discriminate between and show a behavioural preference for naturalistic DDS over ADS, further investigation is required to determine the extent to which prosody and content are driving this preference.
Experiment 2
Experiment 2 was designed to examine whether content alone or prosody alone was sufficient for driving the preference found in experiment 1. In experiment 2, the content from experiment 1 was reproduced but with reversed prosody such that the dog-related content was spoken with the prosody of ADS and vice versa. For simplicity, in all cases, DDS refers to stimuli with dog-directed prosody (with either dog- or adult-related content) and ADS refers to stimuli with adult-directed prosody (with either adult- or dog-related content). In experiment 2, we presented dogs with content-mismatched DDS (dog-directed prosody with adult-related content) and content-mismatched ADS (adult-directed prosody with dog-related content).
Methods
Study site and participants
In experiment 2, 32 dogs from Redhouse Boarding Kennels in York took part (16 females and 16 males; mean age 6 years ± 3.75). Data collection for this experiment was conducted 2 years after the first experiment (2016).
Stimuli
For experiment 2, uncompressed WAV files were recorded from two new female experimenters (age 20 and 21). The experimenters repeated the transcripts from experiment 1 with the opposing prosody, in order to produce content-mismatched DDS and ADS. All stimuli were still directed to an appropriate live audience (e.g. adult script was produced with dog prosody to a live dog; Irish setter) and processed as described in experiment 1.
For the stimuli used in experiment 2, some dog content was repeated in ADS, and some adult content was removed in DDS. This was in order to account for differences in word rate between naturalistic DDS and ADS. These alterations are indicated in Supplementary material. The amplitude of the speech segments was again equalised, and tracks were built as in experiment 1 (see Fig. 1).
Table 2 Results of a mixed ANOVA with degrees of freedom (1, 30) comparing the time spent near DDS and ADS speakers for content-matched speech. Values are F (p). Significant results indicated, where * denotes p < 0.050
Effect                                 Proximity preference
Within-subject effects
  Speech type                          5.54 (.025)*
  Speech type * identity               1.64 (.210)
  Speech type * location               0.29 (.592)
  Speech type * identity * location    0.05 (.833)
Between-subject effects
  Identity                             1.13 (.552)
  Location                             0.36 (.552)
  Identity * location                  0.62 (.438)
Acoustic analysis of stimuli
To ensure the prosody of the content-mismatched DDS and ADS for experiment 2 was convincing, we compared the acoustic properties of these stimuli with the stimuli used in experiment 1. Mean, minimum and maximum pitch (F0) were measured (Table 3) in PRAAT (version 6.0.05). Pitch settings were 75–1200 Hz and continuous segments of speech with a continuous visible pitch line were selected, and the mean, min and max pitch in the segment were extracted using the ‘get pitch’ function. Pitch modulation was calculated as maxF0 − minF0. Word rate was calculated as the number of words divided by the duration from the start of the first word to the end of the last word in a stimulus.
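The two derived measures defined above are simple arithmetic on values read off the pitch track and the transcript. The sketch below restates those definitions; the function and parameter names are hypothetical, the example numbers are invented, and because the paper does not state the time unit of its reported word rates, the per-minute option is offered only as an assumption.

```python
def pitch_modulation(max_f0_hz, min_f0_hz):
    """Pitch modulation of a segment: maximum F0 minus minimum F0 (Hz)."""
    return max_f0_hz - min_f0_hz


def word_rate(n_words, first_word_onset_s, last_word_offset_s, per_minute=False):
    """Words divided by the time from first word onset to last word offset."""
    duration = last_word_offset_s - first_word_onset_s
    if duration <= 0:
        raise ValueError("last word must end after the first word starts")
    rate = n_words / duration  # words per second
    return rate * 60 if per_minute else rate


# Hypothetical segment: F0 ranges from 400 to 640 Hz; 30 words over 10 s.
modulation = pitch_modulation(640.0, 400.0)
rate_per_second = word_rate(30, 0.5, 10.5)
rate_per_minute = word_rate(30, 0.5, 10.5, per_minute=True)
```

Measuring duration from the first word onset to the last word offset excludes leading and trailing silence, so the rate reflects speaking tempo rather than the length of the audio file.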
Generalised linear mixed models (GLMMs) were used to assess the effect of prosody (dog-directed/adult-directed); content (dog/adult) and content–prosody matching (matched (experiment 1)/mismatched (experiment 2)) on the acoustic measurements of stimuli in experiments 1 and 2. These factors were entered as fixed factors in models with (1) mean pitch and (2) pitch modulation as DVs. In order to ensure we were comparing the pitch-related measures of the same words or phrases, for mean pitch and pitch modulation, measurements of each continuous segment of speech with a continuous visible pitch line that were available in both experiments were entered into the analyses. Each speech segment was numbered and included as a random factor along with speaker identity, in order to control for repeated sampling at these two levels (Warmelink et al. 2013). For word rate, the rate of each 10- or 15-s stimulus produced by each speaker was entered into analyses, with speaker identity entered as a random factor to control for repeated sampling of each speaker. As we only had a small number of data points for this GLMM (N = 16), we ran three separate models, each with a single fixed factor (prosody, content or prosody–content matching) to avoid overfitting the models.
GLMMs revealed that the content-matched (experiment 1) and content-mismatched stimuli (experiment 2) did not significantly differ in pitch, pitch modulation or word rate (Tables 3, 4), indicating that the content-mismatched stimuli were produced with prosody representative of natural dog-directed and adult-directed speech. In line with previous descriptions of the prosody of DDS, the pitch was significantly higher, the pitch modulation significantly greater and word rate significantly slower for stimuli produced with dog-directed prosody compared to adult-directed prosody (Burnham et al. 1998; Ben-Aderet et al. 2017; Tables 3, 4). Content did not significantly affect pitch modulation or word rate, but dog content was significantly higher pitched than adult content (Tables 3, 4).
Design
As in experiment 1, this experiment used a within-sub-
ject design with all dogs hearing both DDS and ADS.
Table 3 Acoustic measurements
of the different types of speech
produced by each experimenter
Mean values from the 10- and 15-s segments are reported in each row
Speaker ID Prosody Content Mean pitch Pitch modulation Word rate
Experimenter 1 DDS Dog 598.88 240.26 172.85
ADS Adult 452.68 170.02 216.01
Experimenter 2 DDS Dog 794.51 207.49 195.37
ADS Adult 413.47 62.97 242.40
Experimenter 3 DDS Adult 684.58 285.92 138.97
ADS Dog 487.00 87.45 270.53
Experimenter 4 DDS Adult 535.02 172.18 128.95
ADS Dog 472.75 83.26 278.71
Table 4 Results of GLMMs
exploring the effect of prosody,
content and content–prosody
matching on pitch, pitch
modulation and word rate
Bold value denotes a significant finding
Significant results are indicated where *** denotes p < 0.005
df Prosody F(p) Content F(p) Content–pros-
ody matching
F(p)
Mean pitch 1, 328 245.86 (< .001)*** 13.97 (< .001)*** 0.58 (.447)
Pitch modulation 1, 328 49.13 (< .001)*** 0.07 (.792) 0.20 (.653)
Word rate 1, 6 34.22 (< 001)*** 3.24 (.094) < 0.01 (.937)
Between-subject factors such as DDS speaker, DDS loca-
tion and stimulus order were counterbalanced across trials.
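
One way to picture the counterbalancing is to enumerate every combination of the three factors and cycle through them so that each combination is used equally often. The sketch below is an assumption about how such a schedule could be built, not the authors' procedure: the factor labels, the function name `counterbalanced_schedule` and the fixed (unshuffled) cycling order are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the three counterbalanced factors named in the text.
DDS_SPEAKERS = ("experimenter 3", "experimenter 4")
DDS_LOCATIONS = ("left", "right")
STIMULUS_ORDERS = ("DDS first", "ADS first")


def counterbalanced_schedule(n_trials):
    """Cycle through all factor combinations so each occurs equally often."""
    cells = list(product(DDS_SPEAKERS, DDS_LOCATIONS, STIMULUS_ORDERS))
    return [cells[i % len(cells)] for i in range(n_trials)]


# 32 trials, matching the number of trials reported for experiment 2.
schedule = counterbalanced_schedule(32)
counts = Counter(schedule)
for cell, n in counts.items():
    print(cell, n)
```

With 32 trials and 2 x 2 x 2 = 8 combinations, each combination is assigned four times. A real schedule would normally also randomise the order in which the combinations are run.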
Procedure
The procedure for this experiment was identical to that of
experiment 1.
Interobserver reliability
The primary observer (AB) coded 100% of videos. Two
trained observers each coded 50% of the videos (N = 32/32
trials total). The primary observer had high agreement with
both secondary coders, who also had high agreement with
each other across all measurements (Spearman’s R > 0.90,
p < 0.001 for all comparisons).
A third observer, who was blind to the hypotheses of
the experiment, also coded 22% of the videos (N = 7/32
trials total) with the sound turned off so that they were
unaware which speech type was heard by the dog. There
was high agreement with the primary coder for looking
time (R = 0.93, p < 0.001) and for proximity preference
(R = 0.88, p < 0.001).
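
For readers unfamiliar with the agreement statistic, Spearman's rank correlation can be computed by ranking each coder's measurements and taking the Pearson correlation of the ranks. The sketch below uses only the standard library; the looking times are hypothetical values invented for illustration (seven trials, echoing the blind coder's subset) and are not data from this study, and the function names are likewise assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# HYPOTHETICAL looking times (seconds) from two coders for seven trials.
primary_coder = [12.4, 8.1, 15.0, 3.2, 9.9, 11.3, 6.5]
blind_coder = [12.0, 8.5, 14.2, 3.9, 11.5, 10.8, 6.1]
r = spearman_r(primary_coder, blind_coder)
print(round(r, 3))
```

Because the statistic depends only on rank order, two coders can differ in their absolute timings and still agree closely, provided they order the trials similarly.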
Statistical analysis
As above, attentive and affiliative preference was evaluated
using mixed ANOVAs with the fixed within-subject factor
speech prosody (DDS/ADS), between-subject factors DDS
identity (e.g. experimenter 1/experimenter 2) and DDS loca-
tion (right/left). All assumptions were tested and met.
Experiment 2: results
Looking preference
For content-mismatched DDS, 3 trials were removed due
to equipment failure and the following analysis is based
on n = 29. A mixed ANOVA revealed there was no significant
preference for DDS when content was incongruent
with prosody (Fig. 5; Table 5). During the control period,
there was a main effect of identity, with dogs preferring to
look towards experimenter 3 compared to experimenter 4
(Table 5). There was also an interaction of speech type and
identity for total looking time. To explore the nature of the
interaction between speech type and identity, four independent
samples t tests with Bonferroni-corrected alpha
(p < 0.0125) were conducted. Firstly, at the level of DDS,
there was a significant main effect of speaker identity, with
dogs preferring the speech of experimenter 3 over experimenter
4 (t(27) = 3.08, p = 0.005). However, at the level of
ADS, there was no significant effect of speaker identity
(t(27) = 0.82, p = 0.419). At the level of each speaker, there
was no preference for the DDS of experimenter 3 compared
with her ADS (t(27) = 0.77, p = 0.450), and the same was
true for experimenter 4 (t(27) = −1.50, p = 0.146).
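
The Bonferroni threshold used above follows from dividing the familywise alpha by the number of follow-up tests. The short sketch below reproduces that arithmetic with the p-values reported in the text; the familywise alpha of 0.05 is an assumption inferred from the reported threshold of 0.0125 across four tests, and the function name and dictionary labels are hypothetical.

```python
FAMILYWISE_ALPHA = 0.05  # assumed; implied by 0.0125 across four tests

# p-values transcribed from the four follow-up t tests reported above.
P_VALUES = {
    "DDS: experimenter 3 vs experimenter 4": 0.005,
    "ADS: experimenter 3 vs experimenter 4": 0.419,
    "Experimenter 3: DDS vs ADS": 0.450,
    "Experimenter 4: DDS vs ADS": 0.146,
}


def bonferroni(p_values, alpha=FAMILYWISE_ALPHA):
    """Return the corrected threshold and which tests fall below it."""
    threshold = alpha / len(p_values)
    decisions = {label: p < threshold for label, p in p_values.items()}
    return threshold, decisions


threshold, significant = bonferroni(P_VALUES)
print("corrected alpha:", threshold)
for label, is_significant in significant.items():
    print(label, "significant" if is_significant else "n.s.")
```

Only the between-speaker comparison at the level of DDS falls below the corrected threshold, consistent with the results described above.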
Proximity preference
This analysis is based on N = 30 following equipment failures.
For content-mismatched stimuli, dogs spent more time,
on average, in proximity to the ADS location, as illustrated in
Fig. 6. However, a mixed ANOVA revealed that this result
was non-significant (see Table 6).
To explore whether the failure to find a significant
preference for either type of speech was likely due to
reduced power associated with the slightly smaller sample
size in experiment 2 compared to experiment 1, we considered
effect sizes and conducted power analyses using
G*Power (version 3.1.9.2). The preference for attending
to DDS in experiment 1 was associated with a large effect
size (η² = 0.563), yet the same comparison in experiment
2 yielded a very small effect size (η² < 0.001). An a priori
power analysis for looking time in experiment 2 indicated
that to find a similar effect size based on partial η² of
0.56, with power of 0.80 and an alpha level of 0.05 for the
within-subject comparison of speech type, 6 participants
would have been needed, which we exceeded with our
29 participants in experiment 2. The proximity preference
for the DDS speaker in experiment 1 was associated
with a medium effect size (η² = 0.156), yet the same
comparison in experiment 2 yielded a small effect size
(η² = 0.038). An a priori power analysis for proximity
duration in experiment 2 indicated that to find a similar
effect size based on partial η² of 0.16, with power of 0.80
and an alpha level of 0.05 for the within-subject comparison
of speech type, 24 participants would have been
needed, which we exceeded with our 30 participants in
experiment 2. Together the effect sizes and power analysis
indicate that experiment 2 had sufficient power to find
differences similar to those found in experiment 1, had
they existed, and therefore, we can be relatively confident
in this null result.

Fig. 5 Time spent looking towards content-mismatched DDS and
ADS during each phase, where error bars represent 1 standard error
of the mean. n.s. denotes non-significant comparisons as revealed by
mixed ANOVAs (total: Table 5; other time segments: Table S4)
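
Power software for ANOVA designs, including G*Power, generally works with Cohen's f rather than partial η². The sketch below shows the standard conversion, f = sqrt(η² / (1 − η²)), applied to the two experiment 1 effect sizes quoted in the power analysis. It does not reproduce the required sample sizes, because those also depend on settings not stated here (for example, the assumed correlation among repeated measures); the function name and dictionary labels are hypothetical.

```python
import math


def cohens_f(partial_eta_squared):
    """Convert partial eta squared to Cohen's f."""
    if not 0 <= partial_eta_squared < 1:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(partial_eta_squared / (1 - partial_eta_squared))


# Experiment 1 effect sizes quoted in the power analysis text.
EXPERIMENT_1_EFFECTS = {
    "looking time": 0.563,
    "proximity duration": 0.156,
}

f_values = {
    label: cohens_f(eta_sq) for label, eta_sq in EXPERIMENT_1_EFFECTS.items()
}
for label, f in f_values.items():
    print(f"{label}: f = {f:.3f}")
```

The much larger f for looking time than for proximity duration is in line with far fewer participants being needed for the former (6) than for the latter (24).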
Discussion
The results from experiment 2 suggest that there is no sig-
nificant difference in dogs’ attention or proximity preference
to speakers of DDS or ADS where content and prosody did
not match. This suggests that neither content, nor prosody,
is solely responsible for the preference for DDS shown in
experiment 1. As the same scripts were used in both experi-
ments, this result also highlights that the preference shown
in experiment 1 could not be explained by the use of specific
words in the content of the original stimuli, such as ‘walk’
or ‘dog’, for example. If this were the case, we would have
observed a preference for content-mismatched ADS, which
not only contained the specific dog-related words used in
experiment 1, but more repetitions of them (see methods).
In order to explore alternative explanations for these null
results, we first considered whether the difficulty of producing these
content-mismatched stimuli had resulted in poor examples
of DDS and ADS prosody being produced. The acoustic
analysis of the stimuli, however, illustrates that the content-mismatched
stimuli followed the same patterns of acoustic
properties as the naturalistic DDS of experiment 1. This
supports the use of these stimuli and highlights that the null
result found in this experiment is unlikely to be due to failures
in producing authentic DDS or ADS when the content
is reversed. Second, although a broadly comparable number
of subjects were used in experiments 1 and 2, it is possible
that the slightly smaller N available in experiment 2 (33
vs 29 for looking duration; 34 vs 30 for proximity duration) left
experiment 2 with slightly less power to detect differences
Table 5 Results of between-subject ANOVA (1,25) for the control silence and a mixed ANOVA with degrees of freedom (1,25) comparing main effects and interactions for looking times towards content-mismatched DDS and ADS
Significant results are marked, where * indicates p < 0.050

Effect                               Control silence F(p)  Total looking F(p)
Within-subject effects
  Speech type                        -                     < 0.01 (.985)
  Speech type * identity             -                     5.75 (.024)*
  Speech type * location             -                     2.03 (.167)
  Speech type * identity * location  -                     1.00 (.328)
Between-subject effects
  Identity                           4.24 (.048)*          2.58 (.121)
  Location                           1.44 (.242)           0.99 (.330)
  Identity * location                1.02 (.322)           0.34 (.560)
Fig. 6 A graph to show mean time spent in proximity with each
speaker (seconds), for content-mismatched DDS and ADS. Error bars
represent one standard error of the mean
Table 6 Results of a mixed ANOVA with degrees of freedom (1,26) comparing the time spent near DDS and ADS speakers for content-mismatched speech

Effect                               Proximity preference F(p)
Within-subject effects
  Speech type                        1.03 (.319)
  Speech type * identity             0.85 (.365)
  Speech type * location             0.01 (.992)
  Speech type * identity * location  0.02 (.894)
Between-subject effects
  Identity                           1.20 (.283)
  Location                           0.59 (.448)
  Identity * location                0.52 (.477)
compared to experiment 1. However, examination of effect
sizes indicates that while the naturalistic speech in experiment
1 elicited a large effect size (η² = 0.563), effect sizes obtained
with the reversed stimuli were extremely small (η² < 0.001).
Power analyses confirmed that we had sufficient sample sizes
in experiment 2 to detect differences similar to those found in
experiment 1. We are therefore confident that the null result
in experiment 2 was not due to lack of power.
In experiment 2, a significant interaction between speech
type and experimenter revealed that experimenter 3’s DDS
was more effective at eliciting attention than experimenter
4’s DDS. This effect is likely mediated by what seemed to
be an a priori preference for experimenter 3, which resulted
in dogs looking significantly longer at this experimenter in
the control period before any speech was produced. It is not
clear whether visual or scent characteristics drove this preference,
although scent seems unlikely as the preference did
not persist in the post-stimulus proximity to experimenters,
where an attractive scent could have been actively explored.
It is interesting that dogs seemed to have an immediate preference
for one experimenter, and this may have enhanced
the efficacy of that experimenter’s dog-directed prosody. It is,
however, important to note that the preferred experimenter’s
DDS was still not significantly more effective in attracting
dogs’ attention than her ADS. Indeed, post hoc analyses of
the interaction term at the level of each speaker confirmed
the main finding that the different types of speech did not
elicit significantly different behaviour from the dogs.
General discussion
The results provide evidence that in an ecologically valid
setting, dogs attended more towards naturalistic DDS, where
prosody and content were matched, compared with ADS.
We also show for the first time that dogs subsequently spend
more time in proximity to an experimenter who has recently
produced naturalistic DDS than one who has recently pro-
duced ADS. This novel finding suggests that DDS may
fulfil a dual function of improving attention and increasing
social bonding. This fits with the current understanding of
infant research, which suggests not only that IDS serves to
facilitate language acquisition, but that it is also crucial for
developing meaningful social relationships with caregivers.
The second experiment was designed to investigate
whether prosody or content alone was driving this pref-
erence for naturalistic DDS; however, when content and
prosody were mismatched, we found there was no differ-
ence in the amount of time spent in proximity to the experi-
menters and there was no significant attentive preference for
DDS or ADS in any part of the trial, or across the session
as a whole. This suggests that neither content, nor prosody
alone was driving the preference observed in experiment
1. Instead, it is clear that both content and prosody matter
to dogs. Future research should aim to disentangle whether
dog-related prosody and content independently affect dog
behaviour, or whether they have to be combined congruently
in order to affect dog preferences. This study is unable to
distinguish between these possibilities; however, the results
from Ben-Aderet etal. (2017), who found that adult dogs did
not prefer dog-relevant content produced with dog-directed
prosody over adult-directed prosody, indicate that it may
be the congruent combination of dog-directed content and
prosody that underpins the preference for naturalistic DDS.
Further experiments, with large sample sizes, which manipu-
late both prosody and content independently, are required to
understand this relationship more fully.
Interestingly, Ben-Aderet etal. did find a significant
preference for DDS prosody in puppies, showing that pup-
pies are more sensitive to prosodic differences compared
to adult dogs. Puppies may be more sensitive to acoustic
differences than adult dogs in the same way that human
babies are most sensitive to IDS early in life (Newman
and Hussain 2006). Puppies also have less experience of
human language and time to form associations between
specific words and positive experience (e.g. walk) and
thus should be less sensitive to content. Therefore, while
puppies may rely wholly on prosodic information, adult
dogs seem to take both content and prosody into account,
and only when these two things are relevant to them, they
do display a behavioural preference. While preference for
dog-related content needs experience of human interaction
to develop, the origins of the preference for dog-directed
prosody are less clear: they may be routed in an innate
preference for higher pitched, tonal sounds, the domestica-
tion process or be a product of early learning environment.
If preferences for DDS prosody are based on preferences
for high pithed tonal sound, which across mammalian spe-
cies is associated with affiliation and submission rather
than aggression (Morton 1977), then other mammalian
species should show a preference for DDS over ADS.
Future research could test this possibility. Alternatively,
preference for DDS prosody may have arisen through
various routes during the domestication process. Firstly,
early in the domestication process, DDS may have pro-
vided dogs with a reliable cue that indicates safe social
partners at a time when joining human groups may have
been dangerous, and identifying those who would not be
hostile would have been important for a dog’s survival.
Secondly, as dogs are able to engage with humans in
joint attention (Miklósi etal. 2003) and can cooperate to
achieve goal-directed actions (Range and Virányi 2014),
it is possible that humans selected dogs for characteris-
tics that promoted social communication during domes-
tication, including attentive and affiliative preference for
DDS. It is, however, also possible that dogs kept as pets
are conditioned over their individual lifetimes to respond
positively to DDS as this type of speech is often paired
with positive events (e.g. food treat, toy, walk or affection).
Although Ben-Aderet etal. found a clear preference for
DDS in young dogs (2–5months), it is possible that such
associations could be formed in that time. Future research
with young puppies raised with extremely minimal human
contact would enable us to test whether environmental
input is needed to shape this preference or whether it is
an innate preference, as it seems to be in human infants
(Cooper and Aslin 1990).
Although the use of real people to deliver the speech to
the dogs increased the ecological validity of our experimen-
tal set-up, it did have potential drawbacks. First, the impor-
tance of providing speech from each experimenter (exact
match with characteristics including gender, height and size)
to ensure it was physically congruous meant that the same
stimuli were heard by multiple dogs. Although acoustic analysis
confirmed the structure of these stimuli was representative
of DDS and ADS reported in other studies, it is unclear
whether these findings would generalise to a wider sample
of DDS and our findings suggest that there may be indi-
vidual variation in the efficacy of DDS. Thus, further studies
without pseudoreplication at the level of the stimulus are
required to confirm the generalisability of our findings. Dif-
ferential a priori interest in the experimenters, as we found
in experiment 2, is a further complication associated with
the use of live models in these experiments, which highlights
the need for rigorous counterbalancing and a control period
where such a priori biases can be measured. In addition,
our results illustrate the interesting possibility that a priori
preferences for individuals may influence the effectiveness
of and sensitivity to other cues including speech register.
In conclusion, the results from this study support the
hypothesis that dogs pay more attention to naturalistic DDS
than to ADS. They also revealed that dogs spent more time
near someone who had just produced DDS rather than ADS,
indicating for the first time that DDS may not just modulate
attentive behaviour, but also play a role in the development
of affiliative preferences. This preference for naturalistic
DDS was not driven by preference for dog-directed content
or prosody alone, as no attentive or affiliative preferences
were shown when dogs were presented with content- and
prosody-mismatched stimuli. This study concludes that natu-
ralistic DDS elicits more attention from dogs than ADS and
has the potential to strengthen the affiliative bond a human
has with a dog.
Acknowledgements We would like to extend our thanks to Alyse and
all the staff at Redhouse Boarding Kennels in York for allowing us to
conduct our research study at the Pooches Paradise. We would par-
ticularly like to thank Lucy whose friendly and helpful advice was
extremely valuable during our visits. We are grateful to the dog owners
of Newton-upon-Derwent, whose dogs provided valuable pilot data
which helped inform our final study design. Thanks also to the under-
graduate research students, Kate Dibb, Emma Curran, Amy Wilson and
Charlotte Le Bourgeois, who provided the stimuli and helped collect
and code the data. Finally, thank you Poppy, for many hours of patience
during the production of the stimuli for this study.
Compliance with ethical standards
Conflicts of interest There are no conflicts of interest to disclose.
Ethical approval All applicable international, national and/or institu-
tional guidelines for the care and use of animals were followed. All
procedures performed in studies involving human participants were
in accordance with the ethical standards of the institutional and/or
national research committee and with the 1964 Helsinki Declaration
and its later amendments or comparable ethical standards.
Informed consent Informed consent was obtained from the kennel
owner for the use of resident dogs in this study. Experimenters provided
informed consent for the use of their voices.
Open Access This article is distributed under the terms of the Creative
Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,
distribution, and reproduction in any medium, provided you give appropriate
credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
References
Andruski JE, Kuhl PK, Hayashi A (1999) Point vowels in Japanese mothers’ speech to infants and adults. J Acoust Soc Am 105(2):1095. https://doi.org/10.1121/1.425135
Ben-Aderet T, Gallego-Abenza M, Reby D, Mathevon N (2017) Dog-directed speech: why do we use it and do dogs pay attention to it? Proc R Soc Lond B Biol Sci 284(1846):20162429
Burnham D, Francis E, Vollmer-Conna U (1998) Are you my little pussy-cat? acoustic, phonetic and affective qualities of infant- and pet-directed speech. ICSLP. http://www.isca-speech.org/archive/archive_papers/icslp_1998/i98_0916.pdf
Burnham D, Kitamura C, Vollmer-Conna U (2002) What’s new, pussycat? On talking to babies and animals. Science 296(5572):1435. https://doi.org/10.1126/science.1069587
Cooper RP, Aslin RN (1990) Preference for infant-directed speech in the first month after birth. Child Dev 61(5):1584–1595. https://doi.org/10.1111/j.1467-8624.1990.tb02885.x
Kaplan PS, Goldstein MH, Huckeby ER, Owren MJ, Cooper RP (1995) Dishabituation of visual attention by infant- versus adult-directed speech: effects of frequency modulation and spectral composition. Infant Behav Dev 18(2):209–223. https://doi.org/10.1016/0163-6383(95)90050-0
Kuhl PK, Andruski JE, Chistovich IA, Chistovich LA, Kozhevnikova EV, Ryskin VL, Stolyarova EI, Sundberg U, Lacerda F (1997) Cross-language analysis of phonetic units in language. Science 227:684–686
Miklósi Á, Kubinyi E, Topál J, Gácsi M, Virányi Z, Csányi V (2003) A simple reason for a big difference: wolves do not look back at humans, but dogs do. Curr Biol. https://doi.org/10.1016/S0960-9822(03)00263-X
Morton ES (1977) On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. Am Nat 111:855–869
Newman RS, Hussain I (2006) Changes in preference for infant-directed speech in low and moderate noise by 4.5- to 13-month-olds. Infancy 10(1):61–76. https://doi.org/10.1207/s15327078in1001_4
Range F, Virányi Z (2014) Wolves are better imitators of conspecifics than dogs. PLoS ONE 9(1):e86559. https://doi.org/10.1371/journal.pone.0086559
Schachner A, Hannon EE (2011) Infant-directed speech drives social preferences in 5-month-old infants. Dev Psychol 47(1):19–25
Waller BM, Warmelink L, Liebal K, Micheletta J, Slocombe KE (2013) Pseudoreplication: a widespread problem in primate communication research. Anim Behav 86(2):483–488. https://doi.org/10.1016/j.anbehav.2013.05.038
Werker JF, McLeod PJ (1989) Infant preference for both male and female infant-directed talk: a developmental study of attentional and affective responsiveness. Can J Psychol/Revue Canadienne de Psychologie 43(2):230–246. https://doi.org/10.1037/h0084224
Xu N, Burnham D, Kitamura C, Vollmer-Conna U (2013) Vowel hyperarticulation in parrot-, dog- and infant-directed speech. Anthrozoos Multidiscip J Interact People Animals 26(3):373–380. https://doi.org/10.2752/175303713X13697429463592
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com

Supplementary resource (1)

... The highpitched tone-one of the aspects that characterizes the nice speech-was also characteristic of DDS-type speeches. DDS has been used to attract the listener's attention-increasing the recipients' social responsiveness-and to meet the receiver's emotional needs, possibly enhancing affiliative communication with the speaker by stimulating affection and emotions with a positive valence [13,34]. ...
... An alternative interpretation of this association could be that the trainers used softer voices when the animals were attentive and closer to them. However, as preference for DDS has already been demonstrated to influence dog proximity and time spent looking at humans [13,48], we believe the alternative interpretation is unlikely. Although this association has not been demonstrated for wolves so far, the reduction in the time they spent close to the trainers when these used reproachful speech more often during the training sessions is consistent with a greater interest in approaching and interacting with someone using a DDS-type of speech [18,34]. ...
... In addition, the study authors observed that the duration of neutral speeches was shorter than that of angry speeches. Benjamin & Slocombe [13] also argue that DDS-style speeches, compared to ADS-style, have a higher pitch and exaggerated affect, characteristics that were also observed in this study. Therefore, nice speech in our study, to some extent, matched DDS, not only considering its acoustic characteristics, but also regarding dogs' behaviors associated to it. ...
Article
Full-text available
In a previous study, we found that Positive Reinforcement Training reduced cortisol of wolves and dogs; however, this effect varied across trainer–animal dyads. Here we investigate whether and how the trainers’ use of speech may contribute to this effect. Dogs’ great interest in high-pitched, intense speech (also known as Dog Directed Speech) has already been reported, but whether and how wolves respond similarly/differently to voice characteristics has never been studied before. We analyzed 270 training sessions, conducted by five trainers, with nine mixed-breed dogs and nine wolves, all human-socialized. Through Generalized Linear Mixed Models, we analyzed the effects of (a) three speech categories (nice, neutral, reprehensive) and laugh; and (b) acoustic characteristics of trainers’ voices on animals’ responses (correct responses, latency, orientation, time at less than 1 m, non-training behaviors, tail position/movements, cortisol variation). In both subspecies, tail wagging occurred more often in sessions with longer durations of nice speech, and less often in sessions with reprehensive speech. For dogs, the duration of reprehensive speech within a session was also negatively related to correct responses. For wolves, retreat time was associated with more reprehensive speech, whereas duration of nice speech was positively associated with time spent within one meter from the trainer. In addition, most dog behavioral responses were associated with higher average intonations within sessions, while wolf responses were correlated with lower intonations within sessions. We did not find any effects of the variables considered on cortisol variation. Our study highlights the relevance of voice tone and speech in a training context on animals’ performances and emotional reactions.
... Second, we compared speech neural processing in dogs and humans using noninvasive electroencephalography (EEG), to investigate (1) how dogs track speech modulations; and (2) if, like in humans, dogs' speech tracking accuracy predicts comprehension. Unlike previous studies, e.g., [24,30,[49][50][51], we selected command words as speech stimuli, which allowed us to use dogs' behavioural responses as an index of "intelligibility," while remaining within the structural definition of the DDS register, i.e., short (3 words on average), mostly one-node, imperative utterances [18]. ...
... However, when constructing acoustic stimuli, we ensured that the vocal rate would be within the natural DDS range (i.e., 3 ± 1.6 Hz) and complementary analyses revealed no differences in spectral characteristics between the 2 registers (S3 Fig). Furthermore, like previous studies that used praising DDS as stimuli [49,51], we found that eliciting successful responses required the full integration of prosodic and content information (Fig 3). This suggests that commanding and praising DDS may be similarly processed by dogs at least at the auditory level which we explored here. ...
Article
Full-text available
Within species, vocal and auditory systems presumably coevolved to converge on a critical temporal acoustic structure that can be best produced and perceived. While dogs cannot produce articulated sounds, they respond to speech, raising the question as to whether this heterospecific receptive ability could be shaped by exposure to speech or remains bounded by their own sensorimotor capacity. Using acoustic analyses of dog vocalisations, we show that their main production rhythm is slower than the dominant (syllabic) speech rate, and that human–dog-directed speech falls halfway in between. Comparative exploration of neural (electroencephalography) and behavioural responses to speech reveals that comprehension in dogs relies on a slower speech rhythm tracking (delta) than humans’ (theta), even though dogs are equally sensitive to speech content and prosody. Thus, the dog audio-motor tuning differs from humans’, and we hypothesise that humans may adjust their speech rate to this shared temporal channel as means to improve communication efficacy.
... Dogs are not as closely related to humans as bonobos or chimpanzees, but have unique exposure to human speech from living alongside humans. Pet dogs overhear speech in their everyday life, and people often direct speech to dogs (Ben-Aderet et al. 2017;Benjamin and Slocombe 2018). Dogs can quickly learn a vocabulary of commands and learn related words from exposure. ...
Article
Full-text available
Humans have an impressive ability to comprehend signal-degraded speech; however, the extent to which comprehension of degraded speech relies on human-specific features of speech perception vs. more general cognitive processes is unknown. Since dogs live alongside humans and regularly hear speech, they can be used as a model to differentiate between these possibilities. One often-studied type of degraded speech is noise-vocoded speech (sometimes thought of as cochlear-implant-simulation speech). Noise-vocoded speech is made by dividing the speech signal into frequency bands (channels), identifying the amplitude envelope of each individual band, and then using these envelopes to modulate bands of noise centered over the same frequency regions – the result is a signal with preserved temporal cues, but vastly reduced frequency information. Here, we tested dogs’ recognition of familiar words produced in 16-channel vocoded speech. In the first study, dogs heard their names and unfamiliar dogs’ names (foils) in vocoded speech as well as natural speech. In the second study, dogs heard 16-channel vocoded speech only. Dogs listened longer to their vocoded name than vocoded foils in both experiments, showing that they can comprehend a 16-channel vocoded version of their name without prior exposure to vocoded speech, and without immediate exposure to the natural-speech version of their name. Dogs’ name recognition in the second study was mediated by the number of phonemes in the dogs’ name, suggesting that phonological context plays a role in degraded speech comprehension. Supplementary Information The online version contains supplementary material available at 10.1007/s10071-024-01869-3.
... Indeed, contrary to popular beliefs, we found no evidence that dogs primarily rely on 'prosody' rather than 'content' to respond to human vocal cues. Instead, successful responses require the fully integrated signal (Figure 3), confirming and extending previous results that used a preference-looking paradigm to investigate responses to ADS and DDS in dogs [58]. To borrow a term from the multi-modal communication ...
Preprint
Full-text available
Within species, vocal and auditory systems co-evolve to converge on a critical temporal acoustic structure that can be best produced and perceived. While dogs cannot produce articulated sounds, they respond to speech, raising the question as to whether this heterospecific receptive ability is shaped by exposure to speech or bounded by their own sensorimotor capacity. Acoustic analyses of vocalisations show that dogs' main production rhythm is slower than the dominant (syllabic) speech rate, and that human dog-directed speech falls halfway in between. Comparative exploration of neural (electroencephalography) and behavioural responses to speech reveals that comprehension in dogs relies on a slower speech rhythm tracking (delta) than humans' (theta), even though dogs are equally sensitive to human speech content and prosody. Thus, the dog audio-motor tuning differs from humans', who vocally adjust their speech rate to this shared temporal channel.
... This led to the conclusion that dogs tend to rely on both lexical meaning and intonation in human speech processing [18]. It seems, however, that neither prosody nor linguistic content alone is responsible for dogs' response to auditory cues, suggesting that content and prosody account for this preference jointly [19]. Further evidence also suggests that dogs can rely on vocal intonation to solve object choice tasks, utilizing positive versus negative intonation as a social referencing cue [20]. ...
Article
Full-text available
Domestic dogs are well known for their abilities to utilize human referential cues for problem solving, including following the direction of the human voice. This study investigated whether dogs can locate hidden food relying only on the direction of the human voice and whether familiarity with the speaker (owner/stranger) and the relevance of auditory signal features (ostensive addressing indicating the intent for communication to the receiver; linguistic content) affect performance. N = 35 dogs and their owners participated in four conditions in a two-way object choice task. Dogs were presented with referential auditory cues representing different combinations of three contextual parameters: (I) ‘familiarity with the human informant’ (owner vs. stranger), (II) the communicative function of the attention getter (ostensive addressing vs. non-ostensive cueing) and (III) the ‘tone and content of the auditory cue’ (high-pitched/potentially relevant vs. low-pitched/potentially irrelevant). Dogs also participated in a ‘standard’ pointing condition where a visual cue was provided. Significant differences were observed between conditions regarding correct choices and response latencies, suggesting that dogs’ responses to auditory signals are influenced by the combination of content and intonation of the message and the identity of the speaker. Dogs made correct choices most frequently when context-relevant auditory information was provided by their owners and showed less success when auditory signals came from the experimenter. Correct choices in the ‘Pointing’ condition were similar to the experimenter auditory conditions, but less frequent compared to the owner condition with potentially relevant auditory information. This was paralleled by shorter response latencies in the owner condition compared to the experimenter conditions, although the two measures were not related. Subjects’ responses to the owner- and experimenter-given auditory cues were interrelated, but unrelated to responses to pointing gestures, suggesting that dogs’ abilities to understand the referential nature of auditory cues and visual gestures arise partly from different socio-cognitive skills.
... Looking at human faces also gives them the ability to differentiate between humans, recognize familiar individuals, or even generate an internal representation of their owner's face [23][24][25][26][27][28][29]. Regarding vocal communication, dogs are more attentive when humans talk to them using dog-directed speech, a register resembling "baby talk" [30][31][32]. Moreover, multimodal signalling in human-dog communication has been increasingly studied in recent years, with an interesting focus on contrasting command paradigms in which the vocal cues indicate an intent that mismatches visual cues [5,17,18]. ...
Article
Full-text available
Across all species, communication implies that an emitter sends signals to a receiver, through one or more channels. Cats can integrate visual and auditory signals sent by humans and modulate their behaviour according to the valence of the emotion perceived. However, the specific patterns and channels governing cat-to-human communication are poorly understood. This study addresses whether, in an extraspecific interaction, cats are sensitive to the communication channel used by their human interlocutor. We examined three types of interactions—vocal, visual, and bimodal—by coding video clips of 12 cats living in cat cafés. In a fourth (control) condition, the human interlocutor refrained from emitting any communication signal. We found that the modality of communication had a significant effect on the latency in the time taken for cats to approach the human experimenter. Cats interacted significantly faster to visual and bimodal communication compared to the “no communication” pattern, as well as to vocal communication. In addition, communication modality had a significant effect on tail-wagging behaviour. Cats displayed significantly more tail wagging when the experimenter engaged in no communication (control condition) compared to visual and bimodal communication modes, indicating that they were less comfortable in this control condition. Cats also displayed more tail wagging in response to vocal communication compared to the bimodal communication. Overall, our data suggest that cats display a marked preference for both visual and bimodal cues addressed by non-familiar humans compared to vocal cues only. Results arising from the present study may serve as a basis for practical recommendations to navigate the codes of human–cat interactions.
Article
Full-text available
Pet-directed speech (PDS) is often produced by humans when addressing dogs. Similar to infant-directed speech, PDS is marked by a relatively higher and more modulated fundamental frequency (f0) than is adult-directed speech. We tested the prediction that increasing eye size in dogs, one facial feature of neoteny (juvenilisation), would elicit exaggerated prosodic qualities of pet-directed speech. We experimentally manipulated eye size in photographs of twelve dog breeds by −15%, +15% and +30%. We first showed that dogs with larger eyes were indeed perceived as younger. We then recorded men and women speaking towards these photographs, who also rated these images for cuteness. Linear mixed-effects models demonstrated that increasing eye size by 15% significantly increased pitch range (f0 range) and variability (f0 CV) among women only. Cuteness ratings did not vary with eye size, due to a possible ceiling effect across eye sizes. Our results offer preliminary evidence that large eyes can elicit pet-directed speech and suggest that PDS may be modulated by perceived juvenility rather than cuteness. We discuss these findings in the context of inter-species vocal communication.
Article
Full-text available
In recent decades there has been a large increase in the number of research groups and publications on the behaviour, cognition and welfare of dogs. However, owing to several factors, such as the wide dissemination of outdated concepts by non-specialist media and the fragmentation of knowledge dissemination imposed by social media, much of what has been produced in recent decades arguably does not reach professionals or dog owners/guardians in Brazil. The aim of this chapter was to address this fact, focusing on the new knowledge generated, the difficulties in getting that knowledge to these people, and the initiatives that appear able to overcome those difficulties. Research on dog cognition and welfare has revealed many things in recent decades: these animals' great capacity to perceive human non-verbal communication, as well as the particularities of their learning; some subtle signs of anxiety that can be used as markers of emotional states; and the needs for and ways of measuring their welfare, among others. This content, for various reasons, rarely reaches all the professionals and people responsible for these animals, whether in a more professionalised setting such as working and sport dogs or in the enormous and highly heterogeneous Brazilian pet market. Some initiatives for understanding realities and disseminating knowledge can be highlighted: from projects to measure and improve the welfare of working dogs, such as the one recently started in the Brazilian Army, through new actors such as associations dedicated to behaviour and welfare, to the dissemination of knowledge made possible by new media, such as the growing number of videos and podcasts on the topic. Additionally, other possible initiatives, such as collaboration between institutions, the practice of collaborative science and the use of large databases, were raised as factors that could have an impact in the future.
Article
Full-text available
Pet-directed speech is strikingly similar to infant-directed speech, a peculiar speaking pattern with higher pitch and slower tempo known to engage infants' attention and promote language learning. Here, we report the first investigation of potential factors modulating the use of dog-directed speech, as well as its immediate impact on dogs' behaviour. We recorded adult participants speaking in front of pictures of puppies, adult and old dogs, and analysed the quality of their speech. We then performed playback experiments to assess dogs' reaction to dog-directed speech compared with normal speech. We found that human speakers used dog-directed speech with dogs of all ages and that the acoustic structure of dog-directed speech was mostly independent of dog age, except for sound pitch, which was relatively higher when communicating with puppies. Playback demonstrated that, in the absence of other non-auditory cues, puppies were highly reactive to dog-directed speech, and that the pitch was a key factor modulating their behaviour, suggesting that this specific speech register has a functional value in young dogs. Conversely, older dogs did not react differentially to dog-directed speech compared with normal speech. The fact that speakers continue to use dog-directed speech with older dogs therefore suggests that this speech pattern may mainly be a spontaneous attempt to facilitate interactions with non-verbal listeners.
Article
Full-text available
Domestication is thought to have influenced the cognitive abilities of dogs underlying their communication with humans, but little is known about its effect on their interactions with conspecifics. Since domestication hypotheses offer limited predictions in regard to wolf-wolf compared to dog-dog interactions, we extend the cooperative breeding hypothesis, suggesting that the dependency of wolves on close cooperation with conspecifics, including breeding but also territory defense and hunting, has created selection pressures on motivational and cognitive processes enhancing their propensity to pay close attention to conspecifics' actions. During domestication, dogs' dependency on conspecifics has been relaxed, leading to reduced motivational and cognitive abilities to interact with conspecifics. Here we show that 6-month-old wolves outperform same-aged dogs in a two-action-imitation task following a conspecific demonstration. While the wolves readily opened the apparatus after a demonstration, the dogs failed to solve the problem. This difference could not be explained by differential motivation, better physical insight of wolves, differential developmental pathways of wolves and dogs or a higher dependency of dogs on humans. Our results are best explained by the hypothesis that higher cooperativeness may come together with a higher propensity to pay close attention to detailed actions of others and offer an alternative perspective to domestication by emphasizing the cooperativeness of wolves as a potential source of dog-human cooperation.
Article
Full-text available
Pseudoreplication (the pooling fallacy) is a widely acknowledged statistical error in the behavioural sciences. Taking a large number of data points from a small number of animals creates a false impression of a better representation of the population. Studies of communication may be particularly prone to artificially inflating the data set in this way, as the unit of interest (the facial expression, the call or the gesture) is a tempting unit of analysis. Primate communication studies (551) published in scientific journals from 1960 to 2008 were examined for the simplest form of pseudoreplication (taking more than one data point from each individual). Of the studies that used inferential statistics, 38% presented at least one case of pseudoreplicated data. An additional 16% did not provide enough information to rule out pseudoreplication. Generalized linear mixed models determined that one variable significantly increased the likelihood of pseudoreplication: using observational methods. Actual sample size (number of animals) and year of publication were not associated with pseudoreplication. The high prevalence of pseudoreplication in the primate communication research articles, and the fact that there has been no decline since key papers warned against pseudoreplication, demonstrate that the problem needs to be more actively addressed.
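To make the pooling fallacy concrete, the sketch below collapses repeated measurements to one value per animal, so that the sample size equals the number of individuals. It shows one simple remedy chosen here for illustration, not the method used in the paper above; the function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def per_individual_means(observations):
    """Collapse (individual_id, value) pairs to one mean per individual."""
    by_id = defaultdict(list)
    for individual, value in observations:
        by_id[individual].append(value)
    return {individual: mean(values) for individual, values in by_id.items()}

# Hypothetical data: six call durations (s) recorded from only two animals.
calls = [("A", 1.0), ("A", 1.2), ("A", 1.1), ("A", 0.9), ("B", 2.0), ("B", 2.2)]
means = per_individual_means(calls)
pooled_n = len(calls)      # six data points, but not six independent samples
effective_n = len(means)   # two animals
```

Treating `pooled_n` as the sample size would be the pseudoreplicated analysis; `effective_n` is the number of independent units. A mixed model with a random effect per individual is the other common way to keep all data points without overstating the sample.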
Article
Full-text available
Vowel triangle area is a phonetic measure of the clarity of vowel articulation. Compared with speech to adults, people hyperarticulate vowels in speech to infants and foreigners but not to pets, despite other similarities in infant- and pet-directed-speech. This suggests that vowel hyperarticulation has a didactic function positively related to the actual, or even the expected, degree of linguistic competence of the audience. Parrots have some degree of linguistic competence yet no studies have examined vowel hyperarticulation in speech to parrots. Here, we compared the speech of 11 adults to another adult, a dog, a parrot, and an infant. A significant linear increase in vowel triangle area was found across the four conditions, showing that the degree of vowel hyperarticulation increased from adult- and dog-directed speech to parrot-directed speech, then to infant-directed speech. This suggests that the degree of vowel hyperarticulation is related to the audience's actual or expected linguistic competence. The results are discussed in terms of the relative roles of speakers' expectations versus listeners' feedback in the production of vowel hyperarticulation; and suggestions for further studies, manipulating speaker expectation and listener feedback, are provided.
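For readers unfamiliar with the measure, vowel triangle area can be computed with the shoelace formula once each corner vowel is placed in formant space. The sketch below assumes the usual convention of plotting /i/, /a/ and /u/ by their first and second formants (F1, F2) in Hz; the formant values are invented for illustration and are not taken from the study.

```python
def vowel_triangle_area(i, a, u):
    """Area (Hz^2) of the triangle whose corners are (F1, F2) pairs in Hz."""
    (x1, y1), (x2, y2), (x3, y3) = i, a, u
    # Shoelace formula for the area of a triangle from its vertices.
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

# Hypothetical mean formants for one speaker in two registers.
adult_directed = vowel_triangle_area(i=(300, 2300), a=(750, 1300), u=(350, 900))
infant_directed = vowel_triangle_area(i=(280, 2700), a=(900, 1400), u=(320, 800))
hyperarticulated = infant_directed > adult_directed
```

A larger area means the three corner vowels are further apart, which is what hyperarticulation refers to in the abstract above.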
Article
Full-text available
Although a large literature discusses infants' preference for infant-directed speech (IDS), few studies have examined how this preference might change over time or across listening situations. The work reported here compares infants' preference for IDS while listening in a quiet versus a noisy environment, and across 3 points in development: 4.5 months of age, 9 months of age, and 13 months of age. Several studies have suggested that IDS might help infants to pick out speech in the context of noise (Colombo, Frick, Ryther, Coldren, & Mitchell, 1995; Fernald, 1984; Newman, 2003); this might suggest that infants' preference for IDS would increase in these settings. However, this was not found to be the case; at all 3 ages, infants showed similar advantage (or lack thereof) for IDS as compared to adult-directed speech when presented in noise versus silence. There was, however, a significant interaction across ages: Infants aged 4.5 months showed an overall preference for IDS, whereas older infants did not, despite listening to the same stimuli. The lack of an effect with older infants replicates and extends recent findings by Hayashi, Tamekawa, and Kiritani (2001), suggesting that the variations in fundamental frequency and affect are not sufficient cues to IDS for older infants.
Article
American, Russian, and Swedish mothers produce acoustically more extreme point vowels (/i/, /u/, and /a/) when speaking to their infants than when speaking to another adult [Kuhl et al., Science 277, 684–686]. This study examines the three point vowels in Japanese mothers’ speech, and compares the acoustic structure of infant-directed (ID) and adult-directed (AD) tokens. Three target words containing /i/, /u/, and /a/ (bi:zu, batto, bu:tsu = beads, bat, boots) were recorded while mothers conversed with another native-speaking adult, and with their infants, aged either 5½ or 8½ months. F1, F2, and F0 were measured at vowel onset, center, and offset. Acoustic results will be compared for AD and ID speech, and expansion of the vowel space in Japanese mothers’ speech will be examined. [Work supported by NIH HD35465-01S1.]
Article
The present investigations were undertaken to compare interspecific communicative abilities of dogs and wolves, which were socialized to humans at comparable levels. The first study demonstrated that socialized wolves were able to locate the place of hidden food indicated by the touching and, to some extent, pointing cues provided by the familiar human experimenter, but their performance remained inferior to that of dogs. In the second study, we found that, after undergoing training to solve a simple manipulation task, dogs that are faced with an insoluble version of the same problem look/gaze at the human, while socialized wolves do not. Based on these observations, we suggest that the key difference between dog and wolf behavior is the dogs' ability to look at the human's face. Since looking behavior has an important function in initializing and maintaining communicative interaction in human communication systems, we suppose that by positive feedback processes (both evolutionary and ontogenetic) the readiness of dogs to look at the human face has led to complex forms of dog-human communication that cannot be achieved in wolves even after extended socialization.
Article
This paper describes a concept, not altogether new but largely neglected, that should lead to a greater understanding of the information contained in certain classes of vocal communication signals of birds and mammals. The concept is based on empirical data, first pointed out by Collias (1960, p. 382), showing that natural selection has resulted in the structural convergence of many animal sounds used in "hostile" and "friendly" contexts. Simply stated, birds and mammals use harsh, relatively low-frequency sounds when hostile and higher-frequency, more pure tone-like sounds when frightened, appeasing, or approaching in a friendly manner. Thus, there appears to be a general relationship between the physical structures of sounds and the motivation underlying their use. I hope to develop the idea that this relationship has had a far greater influence on the evolution of animal communication systems than has hitherto been discussed. I will discuss the idea that there exist motivation-structural rules (MS) governing the physical structure of close contact sounds in animal communication systems. The greatest value of the MS concept is that it provides the opportunity to compare the evolution of vocal communication in any species against an abstract concept. The adaptive nature of communication systems against varying backgrounds of environment, social system, and competition will appear in clear relief.
Article
Two experiments examined behavioral preferences for infant-directed (ID) speech over adult-directed (AD) speech in young infants. Using a modification of the visual-fixation-based auditory-preference procedure, Experiments 1 and 2 examined whether 12 1-month-old and 16 2-day-old infants looked longer at a visual stimulus when looking produced ID as opposed to AD speech. The results showed that both 1-month-olds and newborns preferred ID over AD speech. Although the absolute magnitude of the ID speech preference was significantly greater, with the older infants showing longer looking durations than the younger infants, subsequent analyses showed no significant difference in the relative magnitude of this effect. Differences in overall looking times between the 2 groups apparently reflect task variables rather than differences in speech processing. These results suggest that infants' preference for the exaggerated prosodic features of ID speech is present from birth and may not depend on any specific postnatal experience. However, the possible role of prenatal auditory experience with speech is considered.
Article
Dishabituation of visual attention by infant- and adult-directed (ID and AD) speech was investigated in four experiments. Four-month-olds received 12 10-s presentations of a checkerboard pattern with a speech segment compounded only on the ninth trial. Recovery of visual attention was observed on the compound trial in response to both ID and AD speech, but only ID speech dishabituated visual attention during the following pattern-alone retest (a Thompson-Spencer dishabituation effect, observed in the first two experiments). Synthetic analogs of these speech segments' fundamental frequencies (F0s) elicited equivalent increases in attention on the compound trial, but neither elicited Thompson-Spencer dishabituation (Experiment 3). A synthetic version of the intact ID signal elicited Thompson-Spencer dishabituation, but synthetic stimuli simulating the F0 only, the F0 plus the first harmonic above the F0, and the harmonics only did not (Experiment 4). These data have implications for the acoustic characteristics of ID speech that increase infant attention and arousal.