
Animal linguistics: Exploring referentiality and compositionality in bird calls


Abstract

Establishing a theory of language evolution is an ongoing challenge in science. One profitable approach in this regard is to seek the origins of linguistic capabilities by comparing language with the vocal communication systems of our closest living relatives (i.e., the great apes). However, several key capabilities of language appear to be absent in non-human primates, which limits the scope of approaches such as direct phylogenetic comparison. A further informative approach lies in identifying convergent features in phylogenetically distant animals and conducting comparative studies. This approach is particularly useful for establishing general rules for the evolution of linguistic capabilities. In this article, I review recent findings on linguistic capabilities in a passerine bird species, the Japanese tit (Parus minor). Field experiments have revealed that Japanese tits produce unique alarm calls when encountering predatory snakes, which serve to enhance the visual attention of call receivers to snake-like objects. Moreover, tits often combine discrete types of meaningful calls into fixed-order sequences according to an ordering rule, conveying a compositional message to receivers. These findings indicate that two core capabilities of language, namely, referentiality and compositionality, have evolved independently in the avian lineage. I describe how these linguistic capabilities can be examined under field conditions and discuss how such research may contribute to exploring the origins and evolution of language.

Understanding the origins and evolution of language is an ongoing challenge in science. Recent field studies have revealed that several key features of language (referentiality and compositionality) have evolved independently in the avian lineage, providing a unique opportunity to explore the principles and general rules of language evolution.
AWARD PAPER
Toshitaka N. Suzuki
The Hakubi Center for Advanced Research, Kyoto University, Kyoto, Japan
Correspondence
Toshitaka N. Suzuki, The Hakubi Center for Advanced Research, Kyoto University, Yoshida-honmachi, Kyoto 606-8501, Japan.
Email: toshi.n.suzuki@gmail.com
Funding information
Hakubi Project funding; Japan Society for the Promotion of Science, Grant/Award Numbers: 16K18616, 18H05074, 18K14789, 20H03325, 20H05001
KEYWORDS
bird, communication, compositionality, language evolution, referentiality
1 | INTRODUCTION
How did language evolve? This is a long-standing question in science (Fitch, 2010; Hurford, 2014; Tallerman & Gibson, 2012). The generative power of language is based on semantics and syntax; that is, signals convey independent meanings, and combinations of signals provide compositional messages (Hurford, 2007, 2012). In contrast, animal communication signals have long been considered essentially emotional or motivational in nature; that is, signals are assumed merely to reflect the internal states of signalers and to convey neither referential nor compositional information (Hurford, 2007, 2012). Given this widely accepted dichotomy, most previous studies on the evolution of language have concentrated on detailed analyses of various linguistic expressions, searching for the minimum set of capabilities (i.e., the faculty of language) required for verbal communication (i.e., the minimalist program; Chomsky, 1965, 1993). However, without knowledge of the evolutionary continuity or parallels between language and animal communication systems, the origins and evolution of language remain deep mysteries (Fitch, 2010; Hurford, 2014; Tallerman & Gibson, 2012).

Toshitaka N. Suzuki is the recipient of the 22nd Miyadi Award of the Ecological Society of Japan.
Received: 23 September 2020 Revised: 9 November 2020 Accepted: 25 November 2020
DOI: 10.1111/1440-1703.12200
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2021 The Authors. Ecological Research published by John Wiley & Sons Australia, Ltd on behalf of The Ecological Society of Japan
Ecological Research. 2021;36:221–231. wileyonlinelibrary.com/journal/ere
In our ongoing quest to trace the evolution of language, one profitable approach is to seek its origins in the vocal communication of closely related species (the great apes). In this regard, both field and laboratory research has revealed that humans share several linguistic capabilities with chimpanzees (Pan troglodytes) and bonobos (Pan paniscus), including the associative learning of signs and referents, gestural communication, and use of the direction of another individual's gaze (i.e., gaze following) (Genty & Zuberbühler, 2014; Terrace, Petitto, Sanders, & Bever, 1979; Tomasello, Hare, & Lehmann, 2007). In contrast, however, numerous other linguistic capabilities, such as learning to produce novel sounds (i.e., audio-vocal learning) and syntax, are apparently lacking in most non-human primates (Hauser, Chomsky, & Fitch, 2002; Hurford, 2012). Thus, direct phylogenetic comparison between humans and the great apes is insufficient on its own to gain a complete understanding of the origins and evolution of linguistic capabilities (Hurford, 2012; Suzuki, Wheatcroft, & Griesser, 2019).
An alternative, fruitful approach lies in identifying convergent cases of linguistic capabilities in phylogenetically distant animals and performing associated comparative studies (Griesser, Wheatcroft, & Suzuki, 2018; Suzuki, Griesser, & Wheatcroft, 2019; Suzuki, Wheatcroft, & Griesser, 2018). This approach has been applied in studies of the audio-vocal learning of songs in passerine birds (Catchpole & Slater, 2008), and research over the past few decades has revealed remarkable similarities between humans and passerines in the neural mechanisms underlying audio-vocal learning (Berwick, Okanoya, Beckers, & Bolhuis, 2011; Pfenning et al., 2014).
Recent studies on vocal communication in a passerine bird species, the Japanese tit (Parus minor; Figure 1), have provided novel insights into convergent linguistic capabilities. One of these is referentiality, that is, the ability to convey reference to external objects or events to receivers using specific signals. This ability had long been considered unique to humans; however, field studies conducted over the past four decades have indicated possible evolutionary parallels in several avian and mammalian species (Gill & Bierema, 2013; Suzuki, 2016a; Townsend & Manser, 2013; Zuberbühler, 2009). For example, common ravens (Corvus corax) produce so-called “yell” calls on finding carcasses (Heinrich, 1988; Heinrich & Marzluff, 1991), and playback of these calls has been found to attract conspecifics to the food source (Szipl, Boeckle, Wascher, Spreafico, & Bugnyar, 2015). However, most studies in this field cannot exclude the possibility that signals merely represent a specific motivational state of signalers, rather than denoting external referents (Rendall, Owren, & Ryan, 2009; Wheeler & Fischer, 2012). In this review, I describe studies that have used a novel paradigm to discriminate between these two possibilities in Japanese tits (Suzuki, 2018).
A further linguistic capability of the Japanese tit is compositional syntax, that is, the ability to combine meaning-bearing units into compositional expressions based on certain rules. Although compositional syntax has also been considered a uniquely human trait, tits have been shown to combine discrete types of meaning-bearing calls into fixed-order sequences. Moreover, playback experiments have revealed that receivers extract a compositional message from the call sequences using an ordering rule. Herein, I describe how compositional expressions in animals can be distinguished from non-compositional, holistic sequences of meaningless elements (see also Suzuki, Wheatcroft, et al., 2019). I also discuss how these new findings can enhance our understanding of the cognitive mechanisms underlying animal communication and how they can contribute to future comparative studies that seek to establish the origins and evolution of language.
FIGURE 1 The Japanese tit, Parus minor. This small passerine has evolved several key capabilities of language, including referentiality and compositionality [Color figure can be viewed at wileyonlinelibrary.com]
2 | REFERENTIALITY
2.1 | “Functionally referential” may be emotional
In human speech, words are often used to refer to objects or events (i.e., referents), leading to a triadic relationship among speakers, listeners and referents (Tomasello, 1995, 1999). Human infants begin to produce referential words at 9–12 months of age, contingent on the prior development of several cognitive skills, such as joint attention and audio-vocal learning (Tomasello, 1999). In contrast, animal signals have long been considered expressions of the emotional state of signalers, leading to a simple dyadic relationship between signalers and receivers (Rendall et al., 2009). However, Seyfarth, Cheney, and Marler (1980) challenged this historical assumption by examining the responses of vervet monkeys (Chlorocebus pygerythrus) to different types of alarm call. These monkeys produce acoustically discrete alarm calls for different predators, such as leopards, eagles and snakes (Struhsaker, 1967). By examining the responses of free-living vervet monkeys to playbacks of the different alarm call types, Seyfarth et al. (1980) found that different alarm calls evoke different, presumably adaptive, responses in receivers. Although such specific alarm calls have been described as “functionally referential” (Macedonia & Evans, 1993), it has also been claimed that different alarm calls may merely influence receivers' behavior without any retrieval of referential information (Rendall et al., 2009). Accordingly, even if monkeys produce different alarm calls for different threats, these sounds could merely be considered expressions of different types of fear, as Darwin (1871) claimed much earlier.
2.2 | Enhanced attention and search images
To verify whether animal signals are truly referential, it is necessary to determine whether these signals enhance the attention of receivers to target objects, thereby generating a triadic relationship among signalers, receivers and referents. However, simply examining the responses of call receivers to target objects is not sufficient to evaluate this possibility, as calls may merely evoke stereotyped behaviors (e.g., a fixed scanning pattern) that happen to assist in detecting the referents. If this is the case, the enhanced rate of referent detection can be explained in terms of a chain of actions (Bond, 2019).
If alarm calls are truly referential, then a key prediction is that these calls evoke a mental image (or concept) of predators in the receiver's mind. In the cognitive and neural sciences, the retrieval of mental images is defined as the representation, and the accompanying experience, of sensory information in the absence of a direct external stimulus (Pearson, Naselaris, Holmes, & Kosslyn, 2015). Therefore, to provide evidence that acoustic signals evoke visual mental images, it is necessary to demonstrate that receivers retrieve a mental image even in the absence of the target referent (Albers, Kok, Toni, Dijkerman, & de Lange, 2013; Kok, Mostert, & de Lange, 2017; Lee, Kravitz, & Baker, 2012). To this end, an object that resembles a predator to some extent, but does not by itself evoke a specific behavior, could be used to examine the evocation of visual search images. If alarm calls enhance the visual attention of receivers to predator-like objects before a predator has been detected, this would provide evidence that these calls evoke predator-specific search images in receivers (Suzuki, 2019). This paradigm is based on human studies showing that the evocation of visual mental images by referential words enables listeners to detect otherwise unseen objects (Forder & Lupyan, 2019; Lupyan & Ward, 2013).
2.3 | Alarm calls evoke a search image
Using a combination of snake-like objects and specific alarm calls, Suzuki (2018) examined the possibility that alarm calls evoke specific search images. On encountering a predatory snake, such as a Japanese rat snake (Elaphe climacophora), Japanese tits produce acoustically unique alarm calls (Suzuki, 2014), which evoke context-specific anti-snake behaviors in receivers (Suzuki, 2011, 2012a, 2015). For example, when female tits are incubating eggs in their nests, they respond to snake-specific alarm calls by immediately fleeing the nest cavity, thereby evading attacks from snakes, which can enter such cavities (Suzuki, 2015). When outside the cavity, tits respond to snake alarms by scanning the ground near the nesting tree or by looking inside the nest cavity, as if searching for snakes (Suzuki, 2012a). These findings accordingly indicate that the snake-specific alarm calls of Japanese tits do not merely convey the emotional or internal states of signalers (e.g., fear), but may serve to specifically indicate the presence of snakes, similar to the use of human referential words (e.g., “snake”).
In the experiment, Japanese tits were first attracted by the playback of snake-specific alarm calls and then exposed to a wooden stick moved in a snake-like manner using thin string. If tits retrieve a visual mental image of a snake from snake-specific alarm calls, they may use this image to search for a snake and then show a specific response to the snake-like moving stick. During the playback of snake-specific alarm calls, Japanese tits approached a stick moving in a snake-like manner along a tree trunk (Figure 2). However, they did not respond to the same stick when hearing other call types (general alarm calls for non-snake predators, or non-alarm recruitment calls used to attract birds in a non-predatory context). Similarly, tits approached the stick when it was moved in a snake-like manner on the ground in combination with snake alarm calls, but not when it was combined with general alarm calls. Consequently, stick approaches by tits are not considered to be part of a chain of reactions induced by differences in scanning patterns during different playbacks. In addition, tits did not approach moving sticks when the movement was dissimilar to that of a snake (i.e., swinging on a low shrub). Therefore, on hearing snake-specific alarm calls, tits do not simply approach any novel object out of increased curiosity. These results indicate that before detecting a real snake, tits retrieve a visual search image from snake-specific alarm calls and use it to search for snakes. Snakes are typically cryptic against the different types of substrates on which they move, such as the ground, tree trunks and branches; thus, the use of visual search images, as opposed to a stereotyped scanning pattern, may contribute to the efficient detection of snakes in complex environments.
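The logic of the stick experiments can be laid out as a comparison of predictions. The Python sketch below is only an illustration: it encodes two of the hypotheses discussed above (a snake-specific search image versus heightened curiosity toward any moving object) and lists the playback × stick-movement conditions in which they disagree. The condition labels and the fully crossed design are hypothetical simplifications; the actual study did not test every combination, and the scanning-pattern alternative is not modeled.

```python
from itertools import product

# Hypothetical labels for the playback treatments and stick movements described in the text.
# The full factorial crossing is an assumption for illustration only.
CALLS = ("snake_alarm", "general_alarm", "recruitment")
MOVEMENTS = ("snake_like_trunk", "snake_like_ground", "swinging_shrub")


def search_image_predicts_approach(call: str, movement: str) -> bool:
    """Search-image hypothesis: only snake-like motion matches the image evoked by snake alarms."""
    return call == "snake_alarm" and movement.startswith("snake_like")


def curiosity_predicts_approach(call: str, movement: str) -> bool:
    """Curiosity hypothesis: after snake alarms, any moving novel object is approached."""
    return call == "snake_alarm"


def discriminating_conditions() -> list[tuple[str, str]]:
    """Return the (call, movement) pairs for which the two hypotheses make different predictions."""
    return [
        (call, movement)
        for call, movement in product(CALLS, MOVEMENTS)
        if search_image_predicts_approach(call, movement)
        != curiosity_predicts_approach(call, movement)
    ]


if __name__ == "__main__":
    for call, movement in discriminating_conditions():
        print(f"{call} + {movement}: search image -> no approach, curiosity -> approach")
```

Under these assumptions, only the snake alarm + swinging-shrub condition separates the two hypotheses, which is why the absence of approaches in that condition argues against a curiosity explanation.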
2.4 | Search images evoked by eavesdropping
Suzuki (2020) subsequently extended the aforementioned studies on Japanese tits to include interspecific communication. In montane regions of Japan, coal tits (Periparus ater) occur in the same habitats as Japanese tits and often “eavesdrop” on Japanese tit alarm calls (Suzuki, 2016b). Experiments have revealed that, similar to Japanese tits, coal tits also approach snake-like moving sticks on hearing Japanese tit snake-specific alarms, whereas they do not approach the same sticks when hearing other call types, or if the movement of the stick is dissimilar to that of a snake. These findings accordingly reveal that the retrieval of specific search images from referential calls is not limited to intraspecific communication but can also occur through interspecific eavesdropping. Recent studies on other bird species have shown that eavesdropping on the alarm calls of other species depends on associative learning between known threats (e.g., visual stimuli from predators or known types of alarm call) and novel sounds (Keen, Cole, Sheehan, & Sheldon, 2020; Magrath, Haff, Fallow, & Radford, 2015; Magrath, Haff, McLachlan, & Igic, 2015; Potvin, Ratnayake, Radford, & Magrath, 2018). Therefore, it is likely that these birds assign mental images to heterospecific alarm calls through associative learning, although further investigations are needed to confirm this assumption.
2.5 | Referentiality in other animals?
Although only two studies have examined the influence of referential calls on visual attention to referents (Suzuki, 2018, 2020), several studies have shown that hearing alarm calls can influence how individuals respond to auditory cues relating to predators. For example, Diana monkeys (Cercopithecus diana) show a reduction in alert responses to predator vocalizations after hearing conspecific alarm calls, but only if the alarm call types match the predator types (Zuberbühler, Cheney, & Seyfarth, 1999). These findings thus indicate that these monkeys extract referential information about predator type from specific alarm calls and thereafter alter their response to subsequent predator vocalizations. This design is comparable to the paradigms used by Suzuki (2018, 2020), but differs in two respects. First, whereas the monkey study examined the association between two different auditory stimuli, Suzuki (2018, 2020) investigated the association between auditory (calls) and visual (sticks) stimuli, highlighting the importance of integrating cross-modal information in referential communication. Second, whereas Zuberbühler et al. (1999) used predator vocalizations as the test stimuli, Suzuki (2018, 2020) used an object that bears a certain resemblance to the target referents (i.e., snakes) but does not by itself evoke any specific response in receivers. This design makes it possible to distinguish the retrieval of mental images from the visual (or direct) perception of objects, as visual mental images are defined as the retrieval of mental representations without seeing, or being aware of, the external referents (Albers et al., 2013; Kok et al., 2017; Lee et al., 2012).

[Figure 2 schematic. Signaler: discrimination and categorization of predator types; production of distinct alarm calls to snakes. Signal → Receiver: retrieval of snake-specific search images; enhancement of visual attention to snake-like objects (test stimuli).]
FIGURE 2 The referentiality of a signal can be assessed by examining the attention receivers give to a target referent. By using an object that to a certain extent resembles a predatory snake, but alone does not evoke a response, it can be ascertained whether snake-specific alarm calls evoke predator-specific search images in receivers. If individuals form a snake-specific search image, they may respond to snake-like objects only when hearing snake-specific alarm calls [Color figure can be viewed at wileyonlinelibrary.com]
Research over the past four decades has revealed that numerous species of birds and mammals produce specific vocalizations on encountering predators or finding a source of food (Gill & Bierema, 2013; Suzuki, 2016a; Townsend & Manser, 2013; Zuberbühler, 2009). By adopting a paradigm similar to that used by Suzuki (2018, 2020), it could be determined whether these calls evoke mental images in receivers, thereby generating a triadic relationship among signalers, receivers and referents.
3 | COMPOSITIONALITY
3.1 | Syntax and compositionality
The generative power of language depends to a large extent on syntax and compositionality. Syntax is defined as a set of rules whereby words can be combined into well-formed complexes (Hurford, 2012). A combination of words is considered compositional if its overall meaning depends on its elements and the manner in which they are syntactically combined (the principle of compositionality; Partee, ter Meulen, & Wall, 1990; Pelletier, 1994). Studies on animal syntax began with analyses of bird songs. The songs of passerine birds typically consist of multiple sound elements, which are combined according to ordering rules (Catchpole & Slater, 2008; Podos, Huber, & Taft, 2004). Although bird songs can be structurally complex, their meaning is considered to be simple, with song phrases generally serving particular functions, notably mate attraction, territorial defense, or both (Catchpole & Slater, 2008; Podos et al., 2004). Combinations of sound elements are widespread in non-human animals, including bats (Bohn, Smarsh, & Smotherman, 2013), mice (Chabout, Sarkar, Dunson, & Jarvis, 2015), mongooses (Fitch, 2012; Jansen, Cant, & Manser, 2012; Rauber, Kranstauber, & Manser, 2020), cetaceans (Payne & McVay, 1971; Mercado, Herman, & Pack, 2005), gibbons (Clarke, Reichard, & Zuberbühler, 2006) and gorillas (Hedwig, Mundry, Robbins, & Boesch, 2015). However, most of these vocal sequences have been considered holistic sequences, whose meaning is conveyed by the sequence as a whole; consequently, such a sequence is not considered a compositional expression. Thus, in animal studies, the term “syntax” has long been used to signify the rules for combining meaningless elements (Hurford, 2012; Marler, 1998).
However, the findings of recent studies have indicated that several animals are able to combine meaningful elements into sequences (Suzuki, Griesser, et al., 2019; Suzuki, Wheatcroft, et al., 2019). Accordingly, it has become necessary to redefine the term “syntax” to correspond with its definition in linguistics. In this regard, Suzuki and Zuberbühler (2019) recently redefined syntax as a set of principles by which meaning-bearing units can be combined into well-formed complexes, which matches the definition of human syntax in terms of compositionality and can be applied to analyses of animal vocal sequences.
3.2 | Compositional syntax in bird calls
Initial support for the occurrence of compositional syntax in animal vocal sequences was provided by studies on Japanese tits (Suzuki, Wheatcroft, & Griesser, 2016, 2017). Japanese tits produce alert calls (so-called “chicka” calls), which serve to warn conspecifics of the presence of a range of different predators (Suzuki, 2014), whereas they produce acoustically distinct recruitment calls when attracting conspecifics in non-predatory situations (Suzuki et al., 2016). Interestingly, Japanese tits combine these two call types into alert-recruitment call sequences when attracting conspecifics to mob predators (Suzuki, 2014). Playback experiments have revealed that tits display different behaviors when hearing alert and recruitment calls: they move their head from side to side, as if scanning for danger, when hearing alert calls, but approach the sound source (i.e., a presumed signaler) when hearing recruitment calls (Suzuki et al., 2016). In response to alert-recruitment call sequences, tits combine both behaviors; that is, they progressively approach the sound source while continuously scanning the horizon (Figure 3a). Notably, however, receivers do not appear to produce these two behaviors merely by responding to the two meaningful units at the same time, as they have been found to reduce their response to artificially reversed versions of the same component calls (recruitment-alert call sequences) (Figure 3b). Thus, it is likely that Japanese tits use the same ordering rule (the alert-recruitment ordering rule) both when combining calls and when decoding call sequences.
3.3 | Decoding novel call sequences
Although Suzuki et al. (2016) suggested that Japanese tits decode compositional meaning from call sequences using an ordering rule, there remains the possibility that these birds simply reduce their response to reversed call sequences because these sequences are novel and unfamiliar. If this is true, then tits may recognize alert-recruitment call sequences as a holistic message, signifying “mobbing,” rather than perceiving the meanings of the component calls.
[Figure 3 schematic, plotting playback stimuli against receiver responses for natural and artificially generated call types. (a) Combinations of calls evoke a compound response: Japanese tit alert calls and recruitment calls each evoke a distinct response (Response A or Response B), and an alert call + recruitment call sequence (both Japanese tit) evokes a compound response (A + B). (b) Receiver response depends on the ordering of call elements: a recruitment call + alert call sequence (both Japanese tit) evokes an inappropriate response. (c) Receivers can use an ordering rule to decode novel call sequences: alert call (Japanese tit) + recruitment call (willow tit) evokes a compound response (A + B), whereas recruitment call (willow tit) + alert call (Japanese tit) evokes an inappropriate response.]
FIGURE 3 The compositionality of call sequences can be assessed by examining the responses of receivers to individual call elements and their combinations (a). According to the definition of syntax and compositionality, it is also necessary to examine the role of call ordering in receivers' responses (b). Furthermore, to rule out the possibility that call sequences provide a single, holistic meaning as a whole, it would be informative to assess the response to artificially generated novel call sequences, such as combinations of calls from two species (c) [Color figure can be viewed at wileyonlinelibrary.com]

Suzuki et al. (2017) subsequently developed a novel experimental approach to assess this latter possibility. During non-breeding seasons, Japanese tits form mixed-species flocks with willow tits (Poecile montanus), within which Japanese tits approach the recruitment calls of both willow tits and conspecifics to maintain flock cohesion (Suzuki, 2012b, 2012c; Suzuki & Kutsukake, 2017). However, when the recruitment calls of willow tits were artificially shortened to match the length of Japanese tit recruitment calls (while maintaining their natural pitch), Japanese tits did not approach these modified calls (Suzuki et al., 2017). This indicates that the responses of Japanese tits are not due to acoustic similarity between the recruitment calls of willow tits and conspecifics, but rather to their perceiving them as two distinct vocalizations with a shared meaning. Therefore, from the perspective of Japanese tit receivers, willow tit recruitment calls are synonymous with their own species' recruitment calls, which provides the opportunity to generate artificial novel call sequences composed of conspecific alert calls and heterospecific recruitment calls. Playback experiments have revealed that Japanese tits approach both natural and novel (mixed-species) call sequences only when the call units follow the alert-recruitment ordering, indicating that they use an ordering rule to decode even novel combinations of calls (Figure 3c).
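The decoding rule inferred from these playbacks can be sketched as a small function. In the Python example below, which is a toy illustration rather than a model from the original studies, each call label maps to a meaning from the perspective of a Japanese tit receiver, willow tit recruitment calls are treated as synonyms of Japanese tit recruitment calls, and a two-call sequence yields the compound response only when it follows the alert-recruitment order. The call labels, response strings and the handling of unrecognized calls are hypothetical choices made for illustration.

```python
# Hypothetical lexicon from a Japanese tit receiver's perspective: call label -> meaning.
# Willow tit recruitment calls share the meaning of Japanese tit recruitment calls (synonyms).
LEXICON = {
    "jt_alert": "scan",        # Japanese tit alert ("chicka") call
    "jt_recruit": "approach",  # Japanese tit recruitment call
    "wt_recruit": "approach",  # willow tit recruitment call
}

# Predicted responses for ordered sequences of meanings (illustrative labels only).
RESPONSES = {
    ("scan",): "scan",
    ("approach",): "approach",
    ("scan", "approach"): "approach while scanning",  # alert-recruitment ordering rule
}


def decode(sequence: list[str]) -> str:
    """Map a sequence of call labels to a predicted receiver response."""
    meanings = []
    for call in sequence:
        if call not in LEXICON:
            # Assumption: calls without a known meaning (e.g. artificially shortened
            # willow tit calls, which tits did not approach) yield no approach.
            return "no approach"
        meanings.append(LEXICON[call])
    # Sequences not licensed by the ordering rule (e.g. recruitment-alert) get a weak response.
    return RESPONSES.get(tuple(meanings), "reduced response")


if __name__ == "__main__":
    for seq in (["jt_alert", "jt_recruit"], ["jt_recruit", "jt_alert"],
                ["jt_alert", "wt_recruit"], ["wt_recruit", "jt_alert"]):
        print(" + ".join(seq), "->", decode(seq))
```

Because the function looks up the meaning of each unit rather than matching memorized whole sequences, the novel mixed-species sequence `jt_alert + wt_recruit` is decoded like the natural one, whereas both reversed sequences fall through to the reduced response. A purely holistic decoder that stored only familiar sequences would not generalize in this way.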
3.4 | Compositional syntax in other animal signals?
An increasing body of evidence indicates that several other animal species may also combine meaning-bearing elements into complex utterances. For example, Campbell's monkeys (Cercopithecus campbelli) produce acoustically discrete types of calls (“Krak,” “Hok,” and “Wak”) when perceiving a threat, such as leopards or crowned eagles (Ouattara, Lemasson, & Zuberbühler, 2009a, 2009b), and often combine these calls with a short “oo” sound at the end, producing “Krak-oo,” “Hok-oo,” and “Wak-oo” vocalizations. Field observations have revealed that two of these sequences (“Krak-oo” and “Hok-oo”) are more likely to be produced by monkeys experiencing low-threat situations, such as falling trees or the presence of non-predatory animals, indicating that “oo” may act as a suffix that modifies the warning content of alarm calls (Ouattara et al., 2009a, 2009b). Similarly, call combinations have been documented in several species of birds and mammals (Engesser & Townsend, 2019), although in most cases these studies could not rule out the possibility that individuals recognize call sequences as a single meaningful unit rather than as a compositional expression (Kuhn, Keenan, Arnold, & Lemasson, 2018; Suzuki, Griesser, et al., 2019; Suzuki, Wheatcroft, et al., 2019). Moreover, even if two meaning-bearing units are combined, the resulting call sequence may convey an unrelated, third meaning, comparable to idioms in human language. For example, putty-nosed monkeys (Cercopithecus nictitans) combine two discrete alarm calls, each of which seemingly denotes a different threat, whereas the call sequences are used to stimulate long-distance group movements (Arnold & Zuberbühler, 2006a, 2006b, 2008, 2012). Further experiments are therefore required to determine whether call combinations in other animals are semantically compositional or whether instead they convey holistic messages.
4 | CONCLUSIONS AND FUTURE DIRECTIONS
4.1 | The family Paridae
In this review, I have described recent findings on the linguistic capabilities (referentiality and compositionality) of the Japanese tit. This species belongs to the family Paridae, which comprises some 55 species of tits, titmice and chickadees worldwide (Johansson et al., 2013). Like Japanese tits, these birds may also use different types of alarm calls for specific predators. For example, “zi” or “seeet” calls are apparently associated with the detection of aerial predators, such as flying raptors (Haftorn, 2000; Zachau & Freeberg, 2012). In addition, snake-specific alarm calls have been recorded in coal tits, willow tits, and varied tits (Sittiparus varius) (Suzuki, personal observations). These calls may also convey referential information to receivers. Combinations of calls (or notes) have also been documented in numerous parid species (Krams, Krama, Freeberg, Kullberg, & Lucas, 2012; Lucas & Freeberg, 2007). For example, Carolina chickadees (Poecile carolinensis) produce multiple discrete types of notes that are combined to yield a diverse variety of sequences (Freeberg, 2008; Freeberg & Lucas, 2012; Lucas & Freeberg, 2007). The use of different sequences depends on the eliciting context, and the sequences may therefore convey different types of information (Lucas & Freeberg, 2007). Moreover, vocal complexity varies widely among parid species, thereby providing a model system for comparative studies (Krams et al., 2012).
4.2 | Evolutionary drivers
Species within the family Paridae may provide an ideal opportunity to conduct comparative studies exploring the socioecological factors that drive the evolution of
SUZUKI 227
linguistic capabilities. These birds occupy a variety of habitats, from forests to savannas, and thus represent an example of adaptive radiation. In addition, they have evolved different social systems; for example, in Oxford (United Kingdom), great tits (Parus major) form flocks with a high degree of fission–fusion dynamics, whereas blue tits (Cyanistes caeruleus) and marsh tits (Poecile palustris) form relatively stable groups (Farine, Aplin, Sheldon, & Hoppitt, 2015). In dense forests, or in species that form loose flocks, visual contact with other individuals may be limited, thereby favoring individuals that transmit multiple types of information at the same time. In such cases, syntax and compositionality would be selectively advantageous. In contrast, birds living in open habitats or forming cohesive flocks may not need such complex utterances, as a simple vocalization that attracts the visual attention of other birds might suffice to inform them of the prevailing circumstances.
Predator composition may also be an important factor driving the evolution of avian vocal repertoires. In Asia and southern Europe, several snake species prey on birds (Ha et al., 2020; Sorace et al., 2000), whereas in northern Europe most snake species are unable to climb trees and therefore do not pose a threat to birds. Consequently, snake-specific alarm calls may not have evolved in northern European populations. Instead, great spotted woodpeckers (Dendrocopos major) are among the major predators of tit eggs and nestlings in Europe (Skwarska, Kalinski, Wawrzyniak, & Banbura, 2009), whereas in Japan the same woodpecker species shares habitats with tits but does not attack their nests. Future comparative studies may reveal the socioecological factors that drive the evolution of referentiality and compositionality in bird calls, thereby providing a unique model for examining the principles and general rules that govern the evolution of linguistic capabilities.
4.3 | Genetic and neural bases
A comparative approach can also be applied to analyze the genetic and neural mechanisms underlying communication. For example, accelerated evolution of early growth response protein 1 (EGR1), a transcription factor believed to play a role in vocal communication, learning and memory (Clayton, 2013; Dragunow, 1996; Hara, Kubikova, Hessler, & Jarvis, 2007), has been found in the tit lineage (Laine et al., 2016). However, no similar positive selection has been detected in the ground pecker (Pseudopodoces humilis), a bird that inhabits open areas (savanna) and seemingly produces less complex vocalizations (Laine et al., 2016). In humans, forkhead box protein P2 (FOXP2) has been shown to play a role in the normal development of speech and language (Enard et al., 2002), and it is also associated with song learning in passerines (Haesler et al., 2007; Teramitsu & White, 2006). Although the mechanisms by which these two genes influence linguistic capabilities remain to be elucidated, further detailed studies may reveal the links between genetic structure, neural mechanisms and vocal communication in parids.
4.4 | Conclusion
Although referentiality and compositionality have long been considered uniquely human traits, recent studies have revealed that both have evolved in the avian lineage and may also be involved in the communication systems of a wide range of animals. Several new methodologies, such as object presentation in conjunction with call playbacks, or the use of mixed-species call sequences, will contribute to a better understanding of how receivers extract information from acoustic signals. Species within the family Paridae are widely distributed across the Northern Hemisphere and Africa, and accordingly represent a valuable group for comparative studies. Detailed investigations of different parid species would advance our understanding of how socioecological factors drive the evolution of linguistic capabilities and their underlying genetic and neural mechanisms.
ACKNOWLEDGMENTS
This work was supported by JSPS KAKENHI grant numbers 16K18616, 18K14789, 18H05074, 20H05001 and 20H03325, and by Hakubi Project funding. I thank two anonymous referees for valuable comments on the manuscript.
ORCID
Toshitaka N. Suzuki https://orcid.org/0000-0001-6405-7653
REFERENCES
Albers, A. M., Kok, P., Toni, I., Dijkerman, H. C., & de Lange, F. P. (2013). Shared representations for working memory and mental imagery in early visual cortex. Current Biology, 23, 1427–1431.
Arnold, K., & Zuberbühler, K. (2006a). Language evolution: Semantic combinations in primate calls. Nature, 441, 303.
Arnold, K., & Zuberbühler, K. (2006b). The alarm-calling system of adult male putty-nosed monkeys, Cercopithecus nictitans martini. Animal Behaviour, 72, 643–653.
Arnold, K., & Zuberbühler, K. (2008). Meaningful call combinations in a non-human primate. Current Biology, 18, R202–R203.
Arnold, K., & Zuberbühler, K. (2012). Call combinations in monkeys: Compositional or idiomatic expressions? Brain and Language, 120, 303–309.
Berwick, R. C., Okanoya, K., Beckers, G. J., & Bolhuis, J. J. (2011). Songs to syntax: The linguistics of birdsong. Trends in Cognitive Sciences, 15, 113–121.
Bohn, K. M., Smarsh, G. C., & Smotherman, M. (2013). Social context evokes rapid changes in bat song syntax. Animal Behaviour, 85, 1485–1491.
Bond, A. B. (2019). Searching images and the meaning of alarm calls. Learning and Behavior, 47, 109–110.
Catchpole, C. K., & Slater, P. J. (2008). Bird song: Biological themes and variations. Cambridge, England: Cambridge University Press.
Chabout, J., Sarkar, A., Dunson, D. B., & Jarvis, E. D. (2015). Male mice song syntax depends on social contexts and influences female preferences. Frontiers in Behavioral Neuroscience, 9, 76.
Chomsky, N. A. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. A. (1993). A minimalist program for linguistic theory. In K. Hale & S. J. Keyser (Eds.), The view from Building 20 (pp. 1–52). Cambridge, MA: MIT Press.
Clarke, E., Reichard, U. H., & Zuberbühler, K. (2006). The syntax and meaning of wild gibbon songs. PLoS One, 1, e73.
Clayton, D. F. (2013). The genomics of memory and learning in songbirds. Annual Review of Genomics and Human Genetics, 14, 45–65.
Darwin, C. (1871). The descent of man, and selection in relation to sex. London, England: John Murray.
Dragunow, M. (1996). A role for immediate-early transcription factors in learning and memory. Behavior Genetics, 26, 293–299.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., … Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418, 869–872.
Engesser, S., & Townsend, S. W. (2019). Combinatoriality in the vocal systems of nonhuman animals. WIREs Cognitive Science, 10, e1493.
Farine, D. R., Aplin, L. M., Sheldon, B. C., & Hoppitt, W. (2015). Interspecific social networks promote information transmission in wild songbirds. Proceedings of the Royal Society B: Biological Sciences, 282, 20142804.
Fitch, W. T. (2010). The evolution of language. Cambridge, England: Cambridge University Press.
Fitch, W. T. (2012). Segmental structure in banded mongoose calls. BMC Biology, 10, 98.
Forder, L., & Lupyan, G. (2019). Hearing words changes color perception: Facilitation of color discrimination by verbal and visual cues. Journal of Experimental Psychology: General, 148, 1105–1123.
Freeberg, T. M. (2008). Complexity in the chick-a-dee call of Carolina chickadees (Poecile carolinensis): Associations of context and signaler behavior to call structure. Auk, 125, 896–907.
Freeberg, T. M., & Lucas, J. R. (2012). Information theoretical approaches to chick-a-dee calls of Carolina chickadees (Poecile carolinensis). Journal of Comparative Psychology, 126, 68–81.
Genty, E., & Zuberbühler, K. (2014). Spatial reference in a bonobo gesture. Current Biology, 24, 1596–1600.
Gill, S. A., & Bierema, A. M. K. (2013). On the meaning of alarm calls: A review of functional reference in avian alarm calling. Ethology, 119, 449–461.
Griesser, M., Wheatcroft, D., & Suzuki, T. N. (2018). From bird calls to human language: Exploring the evolutionary drivers of compositional syntax. Current Opinion in Behavioral Sciences, 21, 6–12.
Ha, J., Lee, K., Yang, E., Kim, W., Song, H., Hwang, I., … Jablonski, P. (2020). Experimental study of alarm calls of the oriental tit (Parus minor) toward different predators and reactions they induce in nestlings. Ethology, 126, 610–619.
Haesler, S., Rochefort, C., Georgi, B., Licznerski, P., Osten, P., & Scharff, C. (2007). Incomplete and inaccurate vocal imitation after knockdown of FoxP2 in songbird basal ganglia nucleus area X. PLoS Biology, 5, e321.
Haftorn, S. (2000). Contexts and possible functions of alarm calling in the willow tit, Parus montanus; the principle of "better safe than sorry". Behaviour, 137, 437–449.
Hara, E., Kubikova, L., Hessler, N. A., & Jarvis, E. D. (2007). Role of the midbrain dopaminergic system in modulation of vocal brain activation by social context. European Journal of Neuroscience, 25, 3406–3416.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579.
Hedwig, D., Mundry, R., Robbins, M. M., & Boesch, C. (2015). Contextual correlates of syntactic variation in mountain and western gorilla close-distance vocalizations: Indications for lexical or phonological syntax? Animal Cognition, 18, 423–435.
Heinrich, B. (1988). Winter foraging at carcasses by three sympatric corvids, with emphasis on recruitment by the raven, Corvus corax. Behavioral Ecology and Sociobiology, 23, 141–156.
Heinrich, B., & Marzluff, J. (1991). Do common ravens yell because they want to attract others? Behavioral Ecology and Sociobiology, 28, 13–21.
Hurford, J. R. (2007). The origins of meaning: Language in the light of evolution. Oxford, England: Oxford University Press.
Hurford, J. R. (2012). The origins of grammar: Language in the light of evolution II. Oxford, England: Oxford University Press.
Hurford, J. R. (2014). The origins of language: A slim guide. Oxford, England: Oxford University Press.
Jansen, D. A. W. A. M., Cant, M. A., & Manser, M. B. (2012). Segmental concatenation of individual signatures and context cues in banded mongoose (Mungos mungo) close calls. BMC Biology, 10, 97.
Johansson, U. S., Ekman, J., Bowie, R. C. K., Halvarsson, P., Ohlson, J. I., Price, T. D., & Ericson, P. G. P. (2013). A complete multilocus species phylogeny of the tits and chickadees (Aves: Paridae). Molecular Phylogenetics and Evolution, 69, 852–860.
Keen, S. C., Cole, E. F., Sheehan, M. J., & Sheldon, B. C. (2020). Social learning of acoustic anti-predator cues occurs between wild bird species. Proceedings of the Royal Society B: Biological Sciences, 287, 20192513.
Kok, P., Mostert, P., & de Lange, F. P. (2017). Prior expectations induce prestimulus sensory templates. Proceedings of the National Academy of Sciences of the United States of America, 114, 10473–10478.
Krams, I., Krama, T., Freeberg, T. M., Kullberg, C., & Lucas, J. R. (2012). Linking social complexity and vocal complexity: A parid perspective. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 1879–1891.
Kuhn, J., Keenan, S., Arnold, K., & Lemasson, A. (2018). On the -oo suffix of Campbell's monkeys. Linguistic Inquiry, 49, 169–181.
Laine, V. N., Gossmann, T. I., Schachtschneider, K. M., Garroway, C. J., Madsen, O., Verhoeven, K. J. F., … Groenen, M. A. M. (2016). Evolutionary signals of selection on cognition from the great tit genome and methylome. Nature Communications, 7, 10474.
Lee, S.-H., Kravitz, D. J., & Baker, C. I. (2012). Disentangling visual imagery and perception of real-world objects. NeuroImage, 59, 4064–4073.
Lucas, J. R., & Freeberg, T. M. (2007). "Information" and the chick-a-dee call: Communicating with a complex vocal system. In K. A. Otter (Ed.), Ecology and behavior of chickadees and titmice: An integrated approach (pp. 199–213). Oxford, England: Oxford University Press.
Lupyan, G., & Ward, E. J. (2013). Language can boost otherwise unseen objects into visual awareness. Proceedings of the National Academy of Sciences of the United States of America, 110, 14196–14201.
Macedonia, J. M., & Evans, C. S. (1993). Variation among mammalian alarm call systems and the problem of meaning in animal signals. Ethology, 93, 177–197.
Magrath, R. D., Haff, T. M., Fallow, P. M., & Radford, A. N. (2015). Eavesdropping on heterospecific alarm calls: From mechanisms to consequences. Biological Reviews, 90, 560–586.
Magrath, R. D., Haff, T. M., McLachlan, J. R., & Igic, B. (2015). Wild birds learn to eavesdrop on heterospecific alarm calls. Current Biology, 25, 2047–2050.
Marler, P. (1998). Animal communication and human language. In N. G. Jablonski & L. C. Aiello (Eds.), The origin and diversification of language (pp. 1–20). San Francisco, CA: California Academy of Sciences.
Mercado, E., III, Herman, L. M., & Pack, A. A. (2005). Song copying by humpback whales: Themes and variations. Animal Cognition, 8, 93–102.
Ouattara, K., Lemasson, A., & Zuberbühler, K. (2009a). Campbell's monkeys concatenate vocalizations into context-specific call sequences. Proceedings of the National Academy of Sciences of the United States of America, 106, 22026–22031.
Ouattara, K., Lemasson, A., & Zuberbühler, K. (2009b). Campbell's monkeys use affixation to alter call meaning. PLoS ONE, 4, e7808.
Partee, B., ter Meulen, A., & Wall, R. E. (1990). Mathematical methods in linguistics. Dordrecht: Kluwer Academic.
Payne, R. S., & McVay, S. (1971). Songs of humpback whales. Science, 173, 587–597.
Pearson, J., Naselaris, T., Holmes, E. A., & Kosslyn, S. M. (2015). Mental imagery: Functional mechanisms and clinical applications. Trends in Cognitive Sciences, 19, 590–602.
Pelletier, F. J. (1994). The principle of semantic compositionality. Topoi, 13, 11–24.
Pfenning, A. R., Hara, E., Whitney, O., Rivas, M. V., Wang, R., Roulhac, P. L., … Jarvis, E. D. (2014). Convergent transcriptional specializations in the brains of humans and song-learning birds. Science, 346, 1256846.
Podos, J., Huber, S. K., & Taft, B. (2004). Bird song: The interface of evolution and mechanism. Annual Review of Ecology, Evolution, and Systematics, 35, 55–87.
Potvin, D. A., Ratnayake, C. P., Radford, A. N., & Magrath, R. D. (2018). Birds learn socially to recognize heterospecific alarm calls by acoustic association. Current Biology, 28, 2632–2637.
Rauber, R., Kranstauber, B., & Manser, M. B. (2020). Call order within vocal sequences of meerkats contains temporary contextual and individual information. BMC Biology, 18, 119.
Rendall, D., Owren, M. J., & Ryan, M. J. (2009). What do animal signals mean? Animal Behaviour, 78, 233–240.
Seyfarth, R. M., Cheney, D. L., & Marler, P. (1980). Monkey responses to three different alarm calls: Evidence of predator classification and semantic communication. Science, 210, 801–803.
Skwarska, J. A., Kalinski, A., Wawrzyniak, J., & Banbura, J. (2009). Opportunity makes a predator: Great spotted woodpecker predation on tit broods depends on nest box design. Ornis Fennica, 86, 109–112.
Sorace, A., Consiglio, C., Tanda, F., Lanzuisi, E., Cattaneo, A., & Iavicoli, D. (2000). Predation by snakes on eggs and nestlings of great tit Parus major and blue tit P. caeruleus. Ibis, 142, 328–330.
Struhsaker, T. T. (1967). Auditory communication among vervet monkeys (Cercopithecus aethiops). In S. Altmann (Ed.), Social communication among primates (pp. 281–324). Chicago, IL: University of Chicago Press.
Suzuki, T. N. (2011). Parental alarm calls warn nestlings about different predatory threats. Current Biology, 21, R15–R16.
Suzuki, T. N. (2012a). Referential mobbing calls elicit different predator-searching behaviours in Japanese great tits. Animal Behaviour, 84, 53–57.
Suzuki, T. N. (2012b). Long-distance calling by the willow tit, Poecile montanus, facilitates formation of mixed-species foraging flocks. Ethology, 118, 10–16.
Suzuki, T. N. (2012c). Calling at a food source: Context-dependent variation in note composition of combinatorial calls in willow tits. Ornithological Science, 11, 103–107.
Suzuki, T. N. (2014). Communication about predator type by a bird using discrete, graded and combinatorial variation in alarm calls. Animal Behaviour, 87, 59–65.
Suzuki, T. N. (2015). Assessment of predation risk through referential communication in incubating birds. Scientific Reports, 5, 10239.
Suzuki, T. N. (2016a). Semantic communication in birds: Evidence from field research over the past two decades. Ecological Research, 31, 307–319.
Suzuki, T. N. (2016b). Referential calls coordinate multi-species mobbing in a forest bird community. Journal of Ethology, 34, 79–84.
Suzuki, T. N. (2018). Alarm calls evoke a visual search image of a predator in birds. Proceedings of the National Academy of Sciences of the United States of America, 115, 1541–1545.
Suzuki, T. N. (2019). Imagery in wild birds: Retrieval of visual information from referential alarm calls. Learning and Behavior, 47, 111–114.
Suzuki, T. N. (2020). Other species' alarm calls evoke a predator-specific search image in birds. Current Biology, 30, 2616–2620.
Suzuki, T. N., Griesser, M., & Wheatcroft, D. (2019). Syntactic rules in avian vocal sequences as a window into the evolution of compositionality. Animal Behaviour, 151, 267–274.
Suzuki, T. N., & Kutsukake, N. (2017). Foraging intention affects whether willow tits call to attract members of mixed-species flocks. Royal Society Open Science, 4, 170222.
Suzuki, T. N., Wheatcroft, D., & Griesser, M. (2016). Experimental evidence for compositional syntax in bird calls. Nature Communications, 7, 10986.
Suzuki, T. N., Wheatcroft, D., & Griesser, M. (2017). Wild birds use an ordering rule to decode novel call sequences. Current Biology, 27, 2331–2336.
Suzuki, T. N., Wheatcroft, D., & Griesser, M. (2018). Call combinations in birds and the evolution of compositional syntax. PLoS Biology, 16, e2006532.
Suzuki, T. N., Wheatcroft, D., & Griesser, M. (2019). The syntax–semantics interface in animal vocal communication. Philosophical Transactions of the Royal Society B: Biological Sciences, 375, 20180405.
Suzuki, T. N., & Zuberbühler, K. (2019). Animal syntax. Current Biology, 29, R669–R671.
Szipl, G., Boeckle, M., Wascher, C. A. F., Spreafico, M., & Bugnyar, T. (2015). With whom to dine? Ravens' responses to food-associated calls depend on individual characteristics of the caller. Animal Behaviour, 99, 33–42.
Tallerman, M., & Gibson, K. R. (2012). The Oxford handbook of language evolution. Oxford, England: Oxford University Press.
Teramitsu, I., & White, S. A. (2006). FoxP2 regulation during undirected singing in adult songbirds. Journal of Neuroscience, 26, 7390–7394.
Terrace, H. S., Petitto, L. A., Sanders, R. J., & Bever, T. G. (1979). Can an ape create a sentence? Science, 206, 891–902.
Tomasello, M. (1995). Joint attention as social cognition. In C. Moore & P. J. Dunham (Eds.), Joint attention: Its origins and role in development (pp. 103–130). Hillsdale, NJ: Lawrence Erlbaum.
Tomasello, M. (1999). The human adaptation for culture. Annual Review of Anthropology, 28, 509–529.
Tomasello, M., Hare, B., & Lehmann, H. (2007). Reliance on head versus eyes in the gaze following of great apes and human infants: The cooperative eye hypothesis. Journal of Human Evolution, 52, 314–320.
Townsend, S. W., & Manser, M. B. (2013). Functionally referential communication in mammals: The past, present and the future. Ethology, 119, 1–11.
Wheeler, B. C., & Fischer, J. (2012). Functionally referential signals: A promising paradigm whose time has passed. Evolutionary Anthropology: Issues, News, and Reviews, 21, 195–205.
Zachau, C. E., & Freeberg, T. M. (2012). Chick-a-dee call variation in the context of "flying" avian predator stimuli: A field study of Carolina chickadees (Poecile carolinensis). Behavioral Ecology and Sociobiology, 66, 683–690.
Zuberbühler, K. (2009). Survivor signals: The biology and psychology of animal alarm calling. Advances in the Study of Behavior, 40, 277–322.
Zuberbühler, K., Cheney, D. L., & Seyfarth, R. M. (1999). Conceptual semantics in a nonhuman primate. Journal of Comparative Psychology, 113, 33–42.
How to cite this article: Suzuki TN. Animal linguistics: Exploring referentiality and compositionality in bird calls. Ecological Research. 2021;36:221–231. https://doi.org/10.1111/1440-1703.12200
SUZUKI 231
... This paper will not discuss semantic or syntactic compositionality (Gabrić, 2021;Suzuki, 2021), although relevant for a discussion of content richness in animal utterances. There is much research on supposedly differentiated and more advanced structuring of vocal forms, especially on non-human primates and birds (Suzuki, 2021;Townsend et al., 2018). ...
... This paper will not discuss semantic or syntactic compositionality (Gabrić, 2021;Suzuki, 2021), although relevant for a discussion of content richness in animal utterances. There is much research on supposedly differentiated and more advanced structuring of vocal forms, especially on non-human primates and birds (Suzuki, 2021;Townsend et al., 2018). It seems premature though to conclude on implications for the semanticity of animals in general. ...
Article
Full-text available
This meta-study of animal semantics is anchored in two claims, seemingly creating a fuzzy mismatch, that animal utterances generally appear to be simple in structure and content variation and that animals’ communicative understanding seems disproportionally more advanced. A set of excerpted, new studies is chosen as basis to discuss whether the semantics of animal uttering and understanding can be fused into one. Studies are prioritised due to their relatively complex designs, giving priority to dynamics between syntax, semantics, pragmatics, and between utterers and receivers in context. A communicational framework based on utterance theory is applied as a lens for inspection of how these aspects relate to the assumed mismatch. Inspection and discussions of the studies bring several features to surface of which five are stressed in the following. Firstly, both syntactic structures and possible semantic content are seen as lean, although richer than earlier believed, and research continues to reveal new complexities in utterances. Secondly, there is a clear willingness to broaden the perception of animals’ semantic capacity to comprehend communication both by arguing theoretically and by generating empirical research in new contexts. Thirdly, the ambition to make sense of these tendencies is still often motivated by an evolutionary search for early building blocks for verbal language, with the pro et cons that such a position can have. Fourthly, the ‘allowed’ scientific frame for studying semantic capacity among animals is extended to new fields and contexts challenging the only-in-the-wild norm. Fifthly, the dilemma of integrating uttering and understanding as aspects of an after all functional communicational system, calls for new epistemological concepts to make sense of the claimed mismatch. Affordances , abduction , life-genre , and lifeworld are suggested.
... Clark and Wrangham (1994) concluded from their analysis of arrival pant-hoots that they do not have food-associated semantic content but that they "mark status". Falk (2001) has described pant-hoots as "a kind of protosinging and protomusic", similar to Marler's (2001) views of pant-hoots as "protomusical", while Dessalless Riede et al., 2004) and pant-hoots are not typically featured in discussions on animal semantics (Egnor et al., 2006;Evans, 1997;Gabrić 2019Gabrić , 2021aHurford, 2007Hurford, , 2012Suzuki, 2021). ...
Preprint
Full-text available
I read with great interest the study by Leroux et al. [(2021) Anim Behav 179, 41–50] who investigated the nature of pant-hoot–food-call combinations in a community of wild chimpanzees (Pan troglodytes schweinfurthii) at the Budongo Conservation Field Station, Budongo Forest, Uganda. The authors propose, among others, that they reveal the first evidence that wild chimpanzees are able “to combine meaning-bearing units into larger structures” – i.e., that they are capable of semantic compositionality and, by extension, syntax. Their analysis represents an important addition to a growing body of research and discussions on communicational combinatoriality in wild primates and specifically apes, and, by extension, extinct hominins. Incidentally, I have recently published a paper in Animal Cognition in which I also suggested, based on a reanalysis of existing data, that wild chimpanzees can display semantic compositionality and syntax, i.e., are able to combine meaningful units [Gabrić (2021) Anim Cogn, online ahead of print]. In the present commentary, I argue that Leroux et al.’s (2021) interpretation of the data is ungrounded given that (1) unlike for food calls, there is currently very little if any indication in the scientific literature that pant-hoots have semantic content, i.e., are meaningful, (2) Leroux et al. (2021) did not investigate their a priori assumption that the observed pant-hoots are in fact meaningful/semantic, (3) they did not report on recipients’ behaviors in association with neither the individual nor combined calls, and (4) they did not compare the callers’ behaviours in association with the individual calls vs. combined calls. Since pant-hoots feature prominently in the chimpanzee vocal repertoire and the debate on their eventual meaningfulness/semanticity is still wide open, this represents a fine opportunity to revisit this issue in the context of Leroux et al.’s (2021) study. 
Their paper further raises several other less significant questions. Notwithstanding, their paper brings important novel insights into communicational combinatoriality in wild chimpanzees and supports the notion of using linguistic methods in wild animal communication research.
... Utterances comprised of a single denotative unit are found in wild nonhuman animals where they most often relate to vocalizations denoting predators and potentially intrusive species, as well as different food types (Struhsaker, 1967;Seyfarth et al., 1980;Karakashian et al., 1988;Cheney and Seyfarth, 1990;Evans et al., 1993;Uhlenbroek, 1996;Zuberbühler et al., 1999;Seddon et al., 2002;Crockford and Boesch, 2003;Brumm et al., 2005;Digweed et al., 2005;Slocombe and Zuberbühler, 2005;Egnor et al., 2006;Clay and Zuberbühler, 2009;Suzuki, 2012Suzuki, , 2016Suzuki, , 2019Suzuki, , 2020Fischer, 2020;Snowdon, 2020). On the other hand, compositional utterances have only seldomly been documented in wild animals (Arnold and Zuberbühler, 2006, 2008Ouattara et al., 2009a,b;Schlenker et al., 2016;Suzuki et al., 2017;Kuhn et al., 2018;Suzuki and Zuberbühler, 2019;Suzuki, 2021) and their status is disputed by some researchers, suggesting that the mere concatenation of (two) words to express a semantically compositional meaning may have been a paramount step in language evolution. Indeed, the currently undisputed (or little disputed) data on semantic compositionality in wild animals appear to be limited to cumulatively conjunctive meanings (i.e., "and"-meanings) (Boesch, 1991;Suzuki et al., 2016;Gabrić, 2021c). ...
Article
Full-text available
Several scholars have proposed that there was a two-word stage in the course of language evolution, in which utterances could not combine more than two words. These models agree that the putative two-word stage did not exhibit syntax. However, they disagree on whether or not there existed rules for inferring the semantic relationship between the two words expressing a compositional proposition. Focusing on semantically transitive events, I combine in the present paper language evolution models with previous empirical studies in linguistics to argue that the two-word stage was indeed governed by rules for inferring the compositional meaning of the utterance, in that (1) words were either associated with fixed (“predetermined”) semantic roles (i.e., agent, patient, predicate) or (2) there was a fixed order of semantic roles and the same words could be assigned different semantic roles in different utterances. Given the proposed existence of rules for producing and interpreting semantically compositional messages, it would appear that the putative two-word stage of language evolution did in fact exhibit syntax.
... Utterances comprised of a single denotative unit are found in wild nonhuman animals where they most often relate to vocalizations denoting predators and potentially intrusive species, as well as different food types (Struhsaker, 1967;Seyfarth et al., 1980;Q18 Karakashian et al., 1988;Cheney and Seyfarth, 1990;Evans et al., 1993;Uhlenbroek, 1996;Zuberbühler et al., 1999;Seddon et al., 2002;Crockford and Boesch, 2003;Brumm et al., 2004;Digweed et al., 2005;Slocombe and Zuberbühler, 2005;Egnor et al., 2006;Clay and Zuberbühler, 2009;Suzuki, 2012Suzuki, , 2016Suzuki, , 2019Suzuki, , 2020Fischer, 2020;Snowdon, 2020). On the other hand, compositional utterances have only seldomly been documented in wild animals (Arnold and Zuberbühler, 2006, 2008Ouattara et al., 2009a,b;Schlenker et al., 2016;Suzuki et al., 2017;Kuhn et al., 2018;Suzuki and Zuberbühler, 2019;Suzuki, 2021) and their status is disputed by some researchers, suggesting that the mere concatenation of (two) words to express a semantically compositional meaning may have been a paramount step in language evolution. Indeed, the currently undisputed (or little disputed) data on semantic compositionality in wild animals appear to be limited to cumulatively conjunctive meanings (i.e., "and"-meanings) (Boesch, 1991;Suzuki et al., 2016;Gabrić, 2021c). ...
Preprint
Full-text available
PUBLISHED IN: Frontiers in Psychology, 12, 684022 (https://www.frontiersin.org/articles/10.3389/fpsyg.2021.684022/full) .................................................................................................................................................. PLEASE CITE AS: Gabrić, P. (2021). Differentiation between agents and patients in the putative two-word stage of language evolution. Frontiers in Psychology, 12: 684022. https://doi.org/10.3389/fpsyg.2021.684022 .................................................................................................................................................. Several scholars from the latter school of thought have proposed that there was a two-word stage in the course of language evolution, in which utterances could not combine more than two words. These models agree that the putative two-word stage did not exhibit syntax. However, they disagree on whether or not there existed rules for inferring the semantic relationship between the two words expressing a compositional proposition. Focusing on semantically transitive events, I combine in the present paper language evolution models with previous empirical studies in linguistics to argue that the two-word stage was indeed governed by rules for inferring the compositional meaning of the utterance, in that (1) words were either associated with fixed (“predetermined”) semantic roles (i.e., agent, patient, predicate) or (2) there was a fixed order of semantic roles and the same words could be assigned different semantic roles in different utterances. Given the proposed existence of rules for producing and interpreting semantically compositional messages, it would appear that the putative two-word stage of language evolution did in fact exhibit syntax.
... Research has thus far demonstrated that some wild birds' and primates' vocalizations can be characterized as having a lexical or word-like character, in the sense that they denote concepts (e.g., predators in alarm calls or foods in food calls) similarly to how words in human languages denote (i.e., mean, map to) certain concepts (Gill and Bierema 2013; Macedonia and Evans 1993; Townsend and Manser 2013; Zuberbühler 2009). This has been reported for different taxa, including, for example, chickens (Gallus; Evans et al. 1993; Karakashian et al. 1988), trumpeters (Psophia; Seddon et al. 2002), tits (Paridae; Ha et al. 2020; Haftorn 2000; Suzuki 2011, 2012, 2014, 2015, 2016a, b, 2018, 2019, 2020, 2021), […] (Cheney and Seyfarth 1990; Fischer 2020; Lyn and Christopher 2020; Seyfarth et al. 1980a, b; Snowdon 2020; Struhsaker 1967), guenons (Cercopithecus; Arnold et al. […], 2010; Arnold and Zuberbühler 2006b; Zuberbühler et al. 1999), and chimpanzees (Pan; Clay and Zuberbühler 2009; Crockford and Boesch 2003; Slocombe and Zuberbühler 2005; Uhlenbroek 1996). ...
Preprint
Recent discoveries of semantic compositionality in Japanese tits have enlivened the discussions on the presence of this phenomenon in wild animal communication. Data on semantic compositionality in wild apes are lacking, even though language experiments with captive apes have demonstrated that they are capable of semantic compositionality. In this paper, I revisit the study by Boesch (1991, Hum Evol 6:81–89), who investigated drumming sequences by an alpha male in a chimpanzee (Pan troglodytes) community in the Taï National Park, Côte d’Ivoire. A reanalysis of the data reveals that the alpha male produced semantically compositional combined messages of travel direction change and resting period initiation. Unlike the Japanese tits, the elements of the compositional expression were not simply juxtaposed but displayed structural reduction, while one of the two elements in the expression coded the meanings of both elements. These processes show some resemblance to blending and fusion in human languages. Also unlike the tits, the elements of the compositional expression did not have a fixed order, although there was a fixed distribution of drumming events across the trees used for drumming. Because the elements of the expression appear to carry verb-like meanings, the compositional expression also resembles simple verb-verb constructions and short paratactic combinations of two clauses found across languages. In conclusion, the reanalysis suggests that semantic compositionality and phenomena resembling paratactic combinations of two clauses might have been present in the communication of the last common ancestor of chimpanzees and humans, not necessarily in the vocal modality.
Article
Mobbing is an important anti-predator behavior where prey harass and attack a predator to lower the immediate and long-term risk posed by it, warn others, and communicate about the predator's threat. While this behavior has been of interest to humans since antiquity, and aspects of mobbing have been well researched for the past 50 years, we still know little about its ecology and the evolutionary pressures that gave rise to this widespread anti-predator behavior. In this review, we explore what mobbing is, how it is used, what its functions are thought to be, and its use as a proxy for cognition, before providing suggestions for specific future avenues of research to improve our understanding of the ecology and evolution of mobbing.
Presentation
An invited lecture on language-like phenomena in animal communication.
Article
Recent discoveries of semantic compositionality in Japanese tits have enlivened the discussions on the presence of this phenomenon in wild animal communication. Data on semantic compositionality in wild apes are lacking, even though language experiments with captive apes have demonstrated that they are capable of semantic compositionality. In this paper, I revisit the study by Boesch (Hum. Evol. 6:81–89, 1991), who investigated drumming sequences by an alpha male in a chimpanzee (Pan troglodytes) community in the Taï National Park, Côte d’Ivoire. A reanalysis of the data reveals that the alpha male produced semantically compositional combined messages of travel direction change and resting period initiation. Unlike the Japanese tits, the elements of the compositional expression were not simply juxtaposed but displayed structural reduction, while one of the two elements in the expression coded the meanings of both elements. These processes show some resemblance to blending and fusion in human languages. Also unlike the tits, the elements of the compositional expression did not have a fixed order, although there was a fixed distribution of drumming events across the trees used for drumming. Because the elements of the expression appear to carry verb-like meanings, the compositional expression also resembles simple verb-verb constructions and short paratactic combinations of two clauses found across languages. In conclusion, the reanalysis suggests that semantic compositionality and phenomena resembling paratactic combinations of two clauses might have been present in the communication of the last common ancestor of chimpanzees and humans, not necessarily in the vocal modality.
Article
Background: The ability to recombine smaller units to produce infinite structures of higher-order phrases is unique to human language, yet evidence that animals combine multiple acoustic units into meaningful combinations is steadily increasing. Despite growing evidence for meaningful call combinations across contexts, little attention has been paid to the potential role of temporal variation in call type composition within longer vocal sequences in conveying information about subtle changes in the environment or individual differences. Here, we investigated the composition and information content of sentinel call sequences in meerkats (Suricata suricatta). While on sentinel guard, a coordinated vigilance behaviour, meerkats produce long sequences composed of six distinct sentinel call types and alarm calls. We analysed recordings of sentinels to test whether the order of the call types is graded and whether the sequences contain additional group-, individual-, age- or sex-specific vocal signatures.
Results: Our results confirmed that the six distinct types of sentinel calls, in addition to alarm calls, were produced in a highly graded way, likely reflecting changes in the perceived predation risk. Transitions between call types one step up or down the a priori assumed gradation were over-represented, while transitions over two or three steps were significantly under-represented. Analysing sequence similarity within and between groups and individuals demonstrated that sequences composed of the most commonly emitted sentinel call types showed high within-individual consistency, with adults and females having higher consistency scores than subadults and males, respectively.
Conclusions: We present a novel type of combinatoriality in which the order of the call types contains temporary contextual information and also relates to the identity of the caller.
By combining different call types in a graded way over long periods, meerkats constantly convey meaningful information about subtle changes in the external environment, while at the same time the temporal pattern of the distinct call types contains stable information about caller identity. Our study demonstrates how complex animal call sequences can be described by simple rules, in this case gradation across acoustically distinct, but functionally related call types, combined with individual-specific call patterns.
Article
Syntax (rules for combining words or elements) and semantics (meaning of expressions) are two pivotal features of human language, and interaction between them allows us to generate a limitless number of meaningful expressions. While both features were traditionally thought to be unique to human language, research over the past four decades has revealed intriguing parallels in animal communication systems. Many birds and mammals produce specific calls with distinct meanings, and some species combine multiple meaningful calls into syntactically ordered sequences. However, it remains largely unclear whether, like phrases or sentences in human language, the meaning of these call sequences depends on both the meanings of the component calls and their syntactic order. Here, leveraging recently demonstrated examples of meaningful call combinations, we introduce a framework for exploring the interaction between syntax and semantics (i.e. the syntax-semantic interface) in animal vocal sequences. We outline methods to test the cognitive mechanisms underlying the production and perception of animal vocal sequences and suggest potential evolutionary scenarios for syntactic communication. We hope that this review will stimulate phenomenological studies on animal vocal sequences as well as experimental studies on the cognitive processes, which promise to provide further insights into the evolution of language. This article is part of the theme issue ‘What can animal communication teach us about human language?’
Article
Japanese tits (Parus minor) produce specific alarm calls when they encounter a predatory snake. A recent field experiment showed that receiver tits became visually perceptive to an object resembling a snake when hearing these calls. However, the tits did not respond to the same object when hearing other call types or when the object was dissimilar to a snake. These findings provide the first experimental evidence for the retrieval of a visual search image from specific alarm calls, offering a novel approach for investigating the cognitive mechanisms underlying referential communication in wild animals.
Article
Understanding the origins and evolution of language remains a deep challenge, because its complexity and expressive power are unparalleled in the animal world. One of the key features of language is that the meaning of an expression is determined both by the meanings of its constituent parts and the syntactic rules used to combine them; known as the principle of compositionality. Although compositionality has been considered unique to language, recent field studies suggest that compositionality may have also evolved in vocal combinations in nonhuman animals. Here, we discuss how compositionality can be explored in animal communication systems and review recent evidence that birds use an ordering rule to generate compositional expressions composed of meaningful calls. Also, we suggest that birdsongs, particularly when incorporating calls, may represent unrecognized examples of compositionality in animal communication. Finally, we outline future research directions to uncover the development, neural mechanisms and evolution of compositionality.
Article
A key challenge in the field of human language evolution is the identification of the selective conditions that gave rise to language's generative nature. Comparative data on nonhuman animals provide a powerful tool to investigate similarities and differences among nonhuman and human communication systems and to reveal convergent evolutionary mechanisms. In this article, we provide an overview of the current evidence for combinatorial structures found in the vocal systems of diverse species. We show that considerable structural diversity exists across and within species in the forms of combinatorial structures used. Based on this, we suggest that a fine‐grained classification and differentiation of combinatoriality is a useful approach permitting systematic comparisons across animals. Specifically, this will help to identify factors that might promote the emergence of combinatoriality and, crucially, whether differences in combinatorial mechanisms might be driven by variations in social and ecological conditions or cognitive capacities. This article is categorized under:
• Cognitive Biology > Evolutionary Roots of Cognition
• Linguistics > Evolution of Language
Article
Many animals produce vocal alarm signals when they detect a predator, and heterospecific species sharing predators often eavesdrop on and respond to these calls [1]. Despite the widespread occurrence of interspecific eavesdropping in animals, its underlying cognitive process remains to be elucidated. If alarm calls, like human referential words, denote a specific predator type (e.g., “snake!”), then receivers may retrieve a mental image of the predator when hearing these calls [2, 3, 4]. Here, using a recently developed experimental paradigm [5], I test whether heterospecific alarm calls evoke a predator-specific visual search image in wild birds. During playback of snake-specific alarm calls produced by Japanese tits (Parus minor), coal tits (Periparus ater) approach a wooden stick being moved in a snake-like manner. However, coal tits do not approach the same stick when hearing other call types or if the stick’s movement is dissimilar to that of a snake. Thus, Japanese tit snake alarms cause coal tits to specifically enhance visual attention to snakelike objects. These results provide experimental evidence for the evocation of visual search images by heterospecific alarm calls, highlighting the importance of integrating cross-modal information in interspecific eavesdropping.
Article
Anti‐predatory strategies of birds are diverse and may include predator‐specific alarm calls. For example, oriental tit (Parus minor) parents can distinguish snakes from other predators and produce snake‐specific referential vocalizations ("jar" calls) when a snake poses a threat to their nest. The "jar" call has a very specific function: it induces fledging in nestlings close to fledging age. This reaction ensures nestlings' survival in natural encounters with snakes, which are capable of entering nest cavities and killing entire broods. Sciurid rodents, such as chipmunks, may pose a similar threat to cavity‐nesting birds. We explored the hypothesis that parents use the fledging‐inducing alarm vocalizations in this situation, because chipmunks, like snakes, can kill the brood upon entering the nest cavity. We compared alarm calls of parents toward two predators (chipmunk and snake) that pose a similar threat to nestlings in a nest cavity, and toward an avian predator (Eurasian jay) that cannot enter nest cavities and poses no threat to nestlings in the nest. Our results show that the vocal responses of oriental tits differed among the three predators. This suggests that the acoustic properties of vocal responses differ even between predators with a similar hunting strategy (nest‐cavity entering). Playback of recorded vocal responses of parents to chipmunks did not trigger fledging of older nestlings, whereas the vocalizations toward a snake did, as shown by earlier studies. Our study suggests that the vocal response of parents does not carry information about the ability of predators to enter the nest cavity and confirms the special status of alarm calls triggered by snakes.
Article
In many species, individuals gather information about their environment both through direct experience and through information obtained from others. Social learning, or the acquisition of information from others, can occur both within and between species and may facilitate the rapid spread of antipredator behaviour. Within birds, acoustic signals are frequently used to alert others to the presence of predators, and individuals can quickly learn to associate novel acoustic cues with predation risk. However, few studies have addressed whether such learning occurs only through direct experience or whether it has a social component, or whether such learning can occur between species. We investigate these questions in two sympatric species of Parids: blue tits (Cyanistes caeruleus) and great tits (Parus major). Using playbacks of unfamiliar bird vocalizations paired with a predator model in a controlled aviary setting, we find that blue tits can learn to associate a novel sound with predation risk via direct experience, and that the antipredator response to the sound can be socially transmitted to heterospecific observers despite their lack of first-hand experience. Our results suggest that social learning of acoustic cues can occur between species. Such interspecific social information transmission may help to mediate the formation of mixed-species aggregations.
Article
Toshitaka Suzuki and Klaus Zuberbühler introduce the syntactical features found in the communication systems of non-human animals.
Article
As part of learning some languages, people learn to name colors using categorical labels such as "red," "yellow," and "green." Such labeling clearly facilitates communicating about colors, but does it also impact color perception? We demonstrate that simply hearing color words enhances categorical color perception, improving people's accuracy in discriminating between simultaneously presented colors in an untimed task. Immediately after hearing a color word, participants were better able to distinguish between colors from the named category and colors from nearby categories. Discrimination between typical and atypical category members was also enhanced. Verbal cues slightly decreased discrimination accuracy between two typical shades of the named color. In contrast to verbal cues, a preview of the target color, an arguably more informative cue, failed to yield any changes to discrimination accuracy. The finding that color words strongly affect color discrimination accuracy suggests that categorical color perception may be caused by color representations being augmented in-the-moment by language.