Songbirds can learn flexible contextual control over syllable sequencing
Lena Veit*, Lucas Y Tian, Christian J Monroy Hernandez, Michael S Brainard*
Center for Integrative Neuroscience and Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, United States
*For correspondence: lena.veit@uni-tuebingen.de (LV); msb@phy.ucsf.edu (MSB)
Present address: Institute for Neurobiology, University of Tübingen, Tübingen, Germany; The Rockefeller University, New York, United States
Competing interests: The authors declare that no competing interests exist.
Funding: See page 15
Received: 30 July 2020
Accepted: 25 April 2021
Published: 01 June 2021
Reviewing editor: Jesse H Goldberg, Cornell University, United States
Copyright Veit et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Abstract
The flexible control of sequential behavior is a fundamental aspect of speech, enabling
endless reordering of a limited set of learned vocal elements (syllables or words). Songbirds are
phylogenetically distant from humans but share both the capacity for vocal learning and neural
circuitry for vocal control that includes direct pallial-brainstem projections. Based on these
similarities, we hypothesized that songbirds might likewise be able to learn flexible, moment-by-
moment control over vocalizations. Here, we demonstrate that Bengalese finches (Lonchura striata
domestica), which sing variable syllable sequences, can learn to rapidly modify the probability of
specific sequences (e.g. ‘ab-c’ versus ‘ab-d’) in response to arbitrary visual cues. Moreover, once
learned, this modulation of sequencing occurs immediately following changes in contextual cues
and persists without external reinforcement. Our findings reveal a capacity in songbirds for learned
contextual control over syllable sequencing that parallels human cognitive control over syllable
sequencing in speech.
Introduction
A crucial aspect of the evolution of human speech is the development of flexible control over
learned vocalizations (Ackermann et al., 2014;Belyk and Brown, 2017). Humans have unparalleled
control over their vocal output, with a capacity to reorder a limited number of learned elements to
produce an endless combination of vocal sequences that are appropriate for current contextual
demands (Hauser et al., 2002). This cognitive control over vocal production is thought to rely on
the direct innervation of brainstem and midbrain vocal networks by executive control structures in
the frontal cortex, which have become more elaborate over the course of primate evolution
(Hage and Nieder, 2016;Simonyan and Horwitz, 2011). However, because of the comparatively
limited flexibility of vocal production in nonhuman primates (Nieder and Mooney, 2020), the evolu-
tionary and neural circuit mechanisms that have enabled the development of this flexibility remain
poorly understood.
Songbirds are phylogenetically distant from humans, but they have proven a powerful model for
investigating neural mechanisms underlying learned vocal behavior. Song learning exhibits many par-
allels to human speech learning (Doupe and Kuhl, 1999); in particular, juveniles need to hear an
adult tutor during a sensitive period, followed by a period of highly variable sensory-motor explora-
tion and practice, during which auditory feedback is used to arrive at a precise imitation of the tutor
song (Brainard and Doupe, 2002). This capacity for vocal learning is subserved by a well-under-
stood network of telencephalic song control nuclei. Moreover, as in humans, this vocal control net-
work includes strong projections directly from cortical (pallial) to brainstem vocal control centers
(Doupe and Kuhl, 1999;Simonyan and Horwitz, 2011). These shared behavioral features and neu-
ral specializations raise the question of whether songbirds might also share the capacity to learn flex-
ible control over syllable sequencing.
Contextual variation of song in natural settings, such as territorial counter-singing or female-
directed courtship song, indicates that songbirds can rapidly alter aspects of their song, including
syllable sequencing and selection of song types (Chen et al., 2016;Heinig et al., 2014;King and
McGregor, 2016;Sakata et al., 2008;Searcy and Beecher, 2009;Trillo and Vehrencamp, 2005).
However, such modulation of song structure is often described as affectively controlled
(Berwick et al., 2011;Nieder and Mooney, 2020). For example, the presence of potential mates or
rivals elicits a global and unlearned modulation of song intensity (James et al., 2018) related to the
singer’s level of arousal or aggression (Alcami et al., 2021;Heinig et al., 2014;Jaffe and Brainard,
2020). Hence, while prior observations suggest that a variety of ethologically relevant factors can be
integrated to influence song production in natural settings, it remains unclear whether song can be
modified more flexibly by learned or cognitive factors.
Here, we tested whether Bengalese finches can learn to alter specifically targeted vocal sequen-
ces within their songs in response to arbitrarily chosen visual cues, independent of social or other
natural contexts. Each Bengalese finch song repertoire includes ~5–12 acoustically distinct elements
(‘syllables’) that are strung together into sequences in variable but non-random order. For a given
bird, the relative probabilities of specific transitions between syllables normally remain constant over
time (Okanoya, 2004;Warren et al., 2012), but previous work has shown that birds can gradually
adjust the probabilities of alternative sequences in response to training that reinforces the produc-
tion of some sequences over others. In this case, changes to syllable sequencing develop over a
period of hours to days (Warren et al., 2012). In contrast, we investigate here whether birds can
learn to change syllable sequencing on a moment-by-moment basis in response to arbitrary visual
cues that signal which sequences are adaptive at any given time. Our findings reveal that songbirds
can learn to immediately, flexibly, and adaptively adjust the sequencing of selected vocal elements
in response to learned contextual cues.
eLife digest
Human speech and birdsong share numerous parallels. Both humans and birds
learn their vocalizations during critical phases early in life, and both learn by imitating adults.
Moreover, both humans and songbirds possess specific circuits in the brain that connect the
forebrain to midbrain vocal centers.
Humans can flexibly control what they say and how by reordering a fixed set of syllables into
endless combinations, an ability critical to human speech and language. Birdsongs also vary
depending on their context, and melodies to seduce a mate will be different from aggressive songs
to warn other males to stay away. However, so far it was unclear whether songbirds are also capable
of modifying songs independent of social or other naturally relevant contexts.
To test whether birds can control their songs in a purposeful way, Veit et al. trained adult male
Bengalese finches to change the sequence of their songs in response to random colored lights that
had no natural meaning to the birds. A specific computer program was used to detect different
variations on a theme that the bird naturally produced (for example, “ab-c” versus “ab-d”), and
rewarded birds for singing one sequence when the light was yellow, and the other when it was
green. Gradually, the finches learned to modify their songs and were able to switch between the
appropriate sequences as soon as the light cues changed. This ability persisted for days, even
without any further training.
This suggests that songbirds can learn to flexibly and purposefully modify the way in which they
sequence the notes in their songs, in a manner that parallels how humans control syllable
sequencing in speech. Moreover, birds can learn to do this ‘on command’ in response to an
arbitrarily chosen signal, even if it is not something that would impact their song in nature.
Songbirds are an important model to study brain circuits involved in vocal learning. They are one
of the few animals that, like humans, learn their vocalizations by imitating conspecifics. The finding
that they can also flexibly control vocalizations may help shed light on the interactions between
cognitive processing and sophisticated vocal learning abilities.
Results
Bengalese finches can learn context-dependent syllable sequencing
For each bird in the study, we first identified variably produced syllable sequences that could be
gradually modified using a previously described aversive reinforcement protocol (‘single context
training’; Tumer and Brainard, 2007;Warren et al., 2012). For example, a bird that normally transi-
tioned from the fixed syllable sequence ‘ab’ to either ‘c’ or ‘d’ (Figure 1A,B, sequence probability
of ~36% for ‘ab-c’ and ~64% for ‘ab-d’) was exposed to an aversive burst of white noise (WN) feed-
back immediately after the ‘target sequence’ ‘ab-d’ was sung. In response, the bird learned over a
period of days to gradually decrease the relative probability of that sequence in favor of the alterna-
tive sequence ‘ab-c’ (Figure 1C). This change in sequence probabilities was adaptive in that it
enabled the bird to escape from WN feedback. Likewise, when the sequence ‘ab-c’ was targeted,
the probability of ‘ab-d’ increased gradually over several days of training (Figure 1D). These exam-
ples are consistent with prior work that showed such sequence modifications develop over a period
of several days, with the slow time course suggesting a gradual updating of synaptic connections
within syllable control networks in response to performance-related feedback (Warren et al., 2012).
In contrast, the ability to immediately and flexibly reorder vocal elements in speech must reflect
mechanisms that enable contextual factors to exert moment-by-moment control over selection and
sequencing of alternative vocal motor programs. Having identified sequences for each bird for which
the probability of production could be gradually modified in this manner, we then tested whether
birds could be trained to rapidly switch between those same sequences in a context-dependent
manner.
To determine whether Bengalese finches can learn to flexibly select syllable sequences on a
moment-by-moment basis, we paired WN targeting of specific sequences with distinct contextual
cues. In this context-dependent training protocol, WN was targeted to defined sequences in the
bird’s song as before, but the specific target sequence varied across alternating blocks, signaled by
different colored lights in the home cage (see Materials and methods). Figure 1E shows an example
experiment, with ‘ab-d’ targeted in yellow light, and ‘ab-c’ in green light. At baseline, without WN,
switches between yellow and green contexts (at random intervals of 0.5–1.5 hr) did not lead to sig-
nificant changes in the relative proportion of the target sequences, indicating that there was no
inherent influence of the light cues on sequence probabilities (Figure 1F, p(ab-d) in yellow vs. green
context was 67 ±1.6% vs. 64 ±1.5%, p=0.17, rank-sum test, n = 53 context blocks from baseline
period). Training was then initiated in which WN was alternately targeted to each sequence, over
blocks that were signaled by light cues. After 2 weeks of such context-specific training, significant
sequencing differences developed between light contexts that were appropriate to reduce aversive
feedback in each context (Figure 1G, p(ab-d) in yellow vs. green context shifted to 36.5 ±4.8% vs.
83.1 ±3.5%, p<0.01, rank-sum test, n = 22 context blocks, block duration between 1 and 2.5 hr).
Likewise, for all birds trained on this protocol (n = 8), context-dependent sequencing differences
developed in the appropriate direction over a period of weeks (27 ±6% difference in probabilities
between contexts after a mean of 33 days training, versus 1% ±2% average difference in probabili-
ties at baseline; p<0.01, n = 8, signed rank test, Figure 1H). Thus, Bengalese finches are able to
learn context-specific modifications to syllable sequencing.
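As a rough illustration of the block-level statistics reported above: the within-bird comparison uses an unpaired Wilcoxon rank-sum test across blocks, and the across-bird comparison uses a paired Wilcoxon signed-rank test. The original analyses were performed in MATLAB; the Python sketch below is illustrative only, and the function and variable names are assumptions.

```python
from scipy import stats

def within_bird_context_test(p_yellow_blocks, p_green_blocks):
    """Unpaired Wilcoxon rank-sum test comparing block-level probabilities of the
    yellow target sequence between yellow- and green-light blocks for one bird."""
    return stats.ranksums(p_yellow_blocks, p_green_blocks)

def across_birds_training_test(context_diff_baseline, context_diff_trained):
    """Paired Wilcoxon signed-rank test across birds, comparing the contextual
    difference in sequence probability at baseline versus after training
    (one value per bird in each list)."""
    return stats.wilcoxon(context_diff_baseline, context_diff_trained)
```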
Syllable sequencing shifts immediately following switches in context
Contextual differences between different blocks could arise through an immediate shift in sequence
probabilities upon entry into a new context and/or by rapid learning within each block. We exam-
ined whether trained birds exhibited any immediate shifts in their syllable sequencing when entering
a new light context by computing the average probability of target sequences across songs aligned
with the switch between contexts (Figure 2A,B, example experiment). This ‘switch-triggered aver-
age’ revealed that across all birds, switches to the yellow context were accompanied by an immedi-
ate decrease in the probability of the yellow target sequence, whereas switches out of the yellow
context (and into the green context) led to an immediate increase in the yellow target sequence
(Figure 2C,D, p<0.05, signed rank test comparing first and last song, n = 8). To quantify the size of
these immediate shifts, we calculated the difference in sequence probability from the last five songs
in the previous context to the first five songs in the current context; this difference averaged
Figure 1. Bengalese finches can learn context-dependent sequencing. (A) Example spectrogram highlighting points in song with variable
sequencing. Syllables are labeled based on their spectral structure, target sequences for the different experiments (ab-c and ab-d) are marked with
colored bars. Y-axis shows frequency in Hz. (B) Transition diagram with probabilities for sequences ab-c and ab-d. The sequence probability of ab-d
(and complementary probability ab-c) stayed relatively constant over five days. Shaded area shows 95% confidence interval for sequence probability.
Source data in Figure 1—source data 3. (C) Aversive reinforcement training. Schematic showing aversive WN after target sequence ab-d; spectrogram
shows WN stimulus, covering part of syllable d. WN targeted to sequence ab-d led to a gradual decrease in the probability of that sequence over
several days, and a complementary increase in the probability of ab-c. (D) WN targeted to ab-c led to a gradual increase in the sequence probability of
ab-d. Source data in Figure 1—source data 2. (E) Schematic of the contextual learning protocol, with target for WN signaled by colored lights. (F) Left:
Two example days of baseline without WN but with alternating blocks of green and yellow context. Colors indicate light context (black indicates
periods of lights off during the night), error bars indicate SEM across song bouts in each block. Right: Average sequence probability in yellow and
green blocks during baseline. Open circles show individual blocks, error bars show SEM across blocks. (G) Left: Two example days after training (WN
on). Right: Average sequence probability in yellow and green blocks after training. (H) Contextual difference in sequence probability for eight trained
birds before and after training (**p<0.01 signed rank test). Source data in Figure 1—source data 1.
Figure 1 continued on next page
0.24 ±0.06 for switches to green light and 0.22 ±0.06 for switches to yellow light (Figure 2E,F).
These results indicate that birds could learn to immediately recall an acquired memory of context-
appropriate sequencing upon entry into each context, even before having the chance to learn from
reinforcing feedback within that context.
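The ‘switch-triggered average’ described above amounts to aligning per-song sequence probabilities to each context switch and averaging across switches. A minimal sketch under that reading is shown below (Python for illustration; the data layout and names are assumptions, not the authors' code).

```python
import numpy as np

def switch_triggered_average(prob_by_song, switch_song_indices, n_post=80):
    """Average p(target sequence) per song bout, aligned to context switches.

    prob_by_song        : per-song-bout sequence probability, in chronological order
    switch_song_indices : index of the first song bout of each new context block
    n_post              : number of post-switch song bouts to include
    Returns (mean, sem) across switches for each post-switch song position.
    """
    segments = [prob_by_song[s:s + n_post] for s in switch_song_indices
                if s + n_post <= len(prob_by_song)]      # keep complete segments only
    segments = np.asarray(segments, dtype=float)
    mean = segments.mean(axis=0)
    sem = segments.std(axis=0, ddof=1) / np.sqrt(segments.shape[0])
    return mean, sem
```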
We next asked whether training additionally led to an increased rate of learning within each con-
text, which also might contribute to increased contextual differences over time. Indeed, such faster
re-learning for consecutive encounters of the same training context, or ‘savings’, is sometimes
observed in contextual motor adaptation experiments (Lee and Schweighofer, 2009). To compare
the magnitude of the immediate shift and the magnitude of within-block learning over the course of
training, we plotted the switch-aligned sequence probabilities at different points in the training pro-
cess. Figure 2G shows for the example bird that the magnitude of the shift (computed between the
first and last five songs across context switches) gradually increased over 11 days of training.
Figure 2H shows the switch-aligned sequence probability trajectories (as in Figure 2A,B) for this
bird early in training (red) and late in training (blue), binned into groups of seven context switches.
Qualitatively, there was both an abrupt change in sequence probability at the onset of each block
(immediate shift at time point 0) and a gradual adjustment of sequence probability within each block
(within-block learning over the first 80 songs following light switch). Over the course of training, the
immediate shift at the onset of each block got larger, while the gradual change within blocks stayed
approximately the same (learning trajectories remained parallel over training, Figure 2H). Linear fits
to the sequence probabilities for each learning trajectory (i.e. the right side of Figure 2H) reveal
that, indeed, the change in sequence probability at the onset of blocks (i.e. intercepts) increased
over the training process (Figure 2K), while the rate of change within blocks (i.e. slopes) stayed con-
stant (Figure 2I). To quantify this across birds, we measured the change over the course of learning
in both the magnitude of immediate shifts (estimated as the intercepts from linear fits) and the rate
of within-block learning (estimated as the slopes from linear fits). As for the example bird, we found
that the rate of learning within each block stayed constant over time for all five birds (Figure 2L). In
contrast, the magnitude of immediate shifts increased over time for all birds (Figure 2L). These anal-
yses indicate that adjustments to sequence probability reflect two dissociable processes: an immediate cue-dependent shift in sequence probability at the beginning of blocks, which increases with contextual training, and a gradual adaptation of sequence probability within blocks, which does not increase with contextual training.
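A minimal sketch of the linear-fit decomposition described here: fitting a line to the switch-aligned, baseline-subtracted sequence probabilities within a block yields an intercept (the immediate shift at block onset) and a slope (the within-block learning rate). Names and data layout are illustrative assumptions.

```python
import numpy as np

def immediate_shift_and_learning_rate(post_switch_probs):
    """Fit p = intercept + slope * song_number to one block's post-switch probabilities
    (expressed relative to the preceding block, adaptive direction positive).

    Returns (intercept, slope): the intercept estimates the immediate shift at block
    onset; the slope estimates the within-block learning rate per song bout."""
    song_number = np.arange(len(post_switch_probs))
    slope, intercept = np.polyfit(song_number, post_switch_probs, 1)
    return intercept, slope
```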
Visual cues in the absence of reinforcement are sufficient to evoke
sequencing changes
The ability of Bengalese finches to implement an immediate shift in sequencing on the first rendition
in a block – and thus before they have a chance to learn from reinforcing feedback – argues that
they can maintain context-specific motor memories and use contextual visual cues to anticipate cor-
rect sequencing in each context. To explicitly test whether birds can flexibly switch between
sequencing appropriate for distinct contexts using only visual cues, we included short probe blocks
which presented the same light cues without WN stimulation. Probe blocks were interspersed in the
sequence of training blocks so that each switch between types of blocks was possible and, on aver-
age, every third switch was into a probe block (see Materials and methods). Light switches into
probe blocks were associated with similar magnitude shifts in sequence probability as switches into
WN blocks of the corresponding color (0.22 ±0.06 to both yellow WN and yellow probe blocks
from green WN blocks, p=0.94, signed rank test; 0.24 ±0.06 to green WN and 0.23 ±0.07 to green
probe blocks from yellow WN blocks, p=0.64, signed rank test). As the most direct test of whether
light cues alone evoke adaptive sequencing changes, we compared songs immediately before and
Figure 1 continued
The online version of this article includes the following source data for figure 1:
Source data 1. Switch magnitude during baseline and after training for all birds, to generate Figure 1H, and plots like Figure 1F,G for all birds.
Source data 2. Sequence data for the example bird during single-context training, to generate Figure 1C,D.
Source data 3. Sequence data for the example bird during baseline, to generate Figure 1B.
after switches between probe blocks without intervening WN training blocks (probe-probe switches).
Figure 3A,B shows song bouts for one example bird (Bird 2) which were sung consecutively across a
switch from yellow probe to green probe blocks. In the first song following the probe-probe switch,
the yellow target sequence (‘f-ab’) was more prevalent, and the green target sequence (‘n-ab’) was
less prevalent, and such an immediate effect was also apparent in the average sequence probabili-
ties for this bird aligned to probe–probe switches (Figure 3C,D). Similar immediate and appropri-
ately directed shifts in sequencing at switches between probe blocks were observed for all eight
birds (Figure 3E,F, p<0.05 signed rank test, n = 8), with average shifts in sequence probabilities of
0.21 ±0.09 and 0.17 ±0.08 (Figure 3G,H). The presence of such changes in the first songs sung
Figure 2. Sequence probabilities shift immediately following a switch in context. (A, B) Average sequence probability per song for example Bird 1
aligned to switches from green to yellow context (A) and from yellow to green context (B). Error bars indicate SEM across song bouts (n = 35 switches
(A), n = 33 switches (B)). (C) Changes in sequence probability from the last song in green context to the first song in yellow context for all eight birds.
Example bird in (A,B) highlighted in bold. **p<0.01 signed rank test. (D) Changes in sequence probability from the last song in yellow context to the
first song in green context. *p<0.05 signed rank test. (E) Shift magnitudes for all birds, defined as the changes in sequence probability from the last five
songs in the green context to the first five songs in the yellow context. Open circles show individual birds, error bars indicate SEM across birds. (F)
Same as (E) for switches from yellow to green. Source data in Figure 2—source data 1. (G) Shift magnitudes over training time for the example bird (11
days and 49 context switches; seven of the original 56 context switches are excluded from calculations of shift magnitudes because at least one of the
involved blocks contained only one or two song bouts.). (H) Trajectory of switch-aligned sequence probabilities for the example bird early in training
(red) and late in training (blue). Probabilities are normalized by the sequence probability in preceding block, and plotted so that the adaptive direction
is positive for both switch directions (i.e. inverting the probabilities for switches to yellow.) (I) Slopes of fits to the sequence probability trajectories over
song bouts within block. Units in change of relative sequence probability per song bout. (K) Intercepts of fits to sequence probability trajectories over
song bouts within block. Units in relative sequence probability. (L) Changes in slopes and changes in intercepts for five birds over the training process,
determined as the slopes of linear fits to curves as in (I and K) for each bird. Source data in Figure 2—source data 2.
The online version of this article includes the following source data for figure 2:
Source data 1. Switch magnitude between all contexts after training, to generate Figures 2C–F and 3E–H.
Source data 2. Summary of training data, to generate Figure 2L.
after probe–probe switches indicates that visual cues alone are sufficient to cause anticipatory,
learned shifts between syllable sequences.
Figure 3. Contextual cues alone are sufficient to enable immediate shifts in syllable sequencing. (A,B) Examples of songs sung by Bird 2 immediately
before (A) and after (B) a switch from a yellow probe block to a green probe block (full song bouts in Figure 3—figure supplement 1). Scale for x-axis
is 500 ms; y-axis shows frequency in Hz. (C, D) Average sequence probability per song for Bird 2 aligned to switches from green probe to yellow probe
blocks (C) and from yellow probe to green probe blocks (D). Error bars indicate SEM across song bouts (n = 14 switches (C), 11 switches (D)). (E, F)
Average sequence probabilities for all eight birds at the switch from the last song in green probe context and the first song in yellow probe context,
and vice versa. Example Bird 2 is shown in bold. *p<0.05 signed rank test. (G, H) Shift magnitudes for probe–probe switches for all birds. Open circles
show individual birds; error bars indicate SEM across birds. Source data in Figure 2—source data 1.
The online version of this article includes the following figure supplement(s) for figure 3:
Figure supplement 1. Example song bouts surrounding a probe–probe context switch.
Contextual changes are specific to target sequences
A decrease in the probability of a target sequence in response to contextual cues must reflect
changes in the probabilities of transitions leading up to the target sequence. However, such changes
could be restricted to the transitions that immediately precede the target sequence, or alternatively
could affect other transitions throughout the song. For example, for the experiment illustrated in
Figure 1, the prevalence of the target sequence ‘ab-d’ was appropriately decreased in the yellow
context, in which it was targeted. The complete transition diagram and corresponding transition
matrix for this bird (Figure 4A,B) reveal that there were four distinct branch points at which syllables
were variably sequenced (after ‘cr’, ‘wr’, ‘i’, and ‘aab’). Therefore, the decrease in the target
sequence ‘ab-d’ could have resulted exclusively from an increase in the probability of the alternative
transition ‘ab-c’ at the branch point following ‘aab’. However, a reduction in the prevalence of the
target sequence could also have been achieved by changes in the probability of transitions earlier in
song such that the sequence ‘aab’ was sung less frequently. To investigate the extent to which con-
textual changes in probability were specific to transitions immediately preceding target sequences,
we calculated the difference between transition matrices in the yellow and green probe contexts
(Figure 4C). This difference matrix indicates that changes to transition probabilities were highly spe-
cific to the branch point immediately preceding the target sequences (specificity was defined as the
proportion of total changes which could be attributed to the branch points immediately preceding
target sequences; specificity for branch point ‘aab’ was 83.2%). Such specificity to branch points that
immediately precede target sequences was typical across experiments, including cases in which dif-
ferent branch points preceded each target sequence (Figure 4D–F, specificity 96.9%). Across all
eight experiments, the median specificity of changes to the most proximal branch points was
84.95%, and only one bird, which was also the worst learner in the contextual training paradigm,
had a specificity of less than 50% (Figure 4G). Hence, contextual changes were specific to target
sequences and did not reflect the kind of global sequencing changes that characterize innate social
modulation of song structure (Sakata et al., 2008; Sossinka and Böhner, 1980).
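The specificity measure can be illustrated as follows: subtract the two context-specific transition matrices and compute what fraction of the total absolute change falls at the branch point(s) immediately preceding the target sequences. This is a sketch of that idea only; the exact normalization used in the paper is described in the Materials and methods, and the names below are assumptions.

```python
import numpy as np

def branch_point_specificity(trans_green, trans_yellow, states, target_branch_points):
    """Fraction of total contextual change located at the target branch points.

    trans_green, trans_yellow : transition matrices (rows = preceding state,
                                columns = following state), identical state order
    states                    : list of state labels, one per row
    target_branch_points      : labels of branch points immediately preceding the
                                target sequences, e.g. ['aab'] or ['f', 'n']
    """
    diff = np.asarray(trans_green) - np.asarray(trans_yellow)
    total_change = np.abs(diff).sum()
    rows = [states.index(s) for s in target_branch_points]
    target_change = np.abs(diff[rows, :]).sum()
    return target_change / total_change
```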
Distinct sequence probabilities are specifically associated with different
visual cues
Our experiments establish that birds can shift between two distinct sequencing states in response to
contextual cues. In order to test whether birds were capable of learning to shift to these two states
from a third neutral context, we trained a subset of three birds with three different color-cued con-
texts. For these birds, after completion of training with WN targeted to distinct sequences in yellow
and green contexts (as described above), we introduced interleaved blocks cued by white light in
which there was no reinforcement. After this additional training, switches from the unreinforced con-
text elicited changes in opposite directions for the green and yellow contexts (example bird
Figure 5A). All birds (n = 3) showed adaptive sequencing changes for the first song bout in probe
blocks (Figure 5B,C) as well as immediate shifts in the adaptive directions for all color contexts
(Figure 5D, 0.11 ±0.04 and 0.19 ±0.05 for switches to green WN and green probe blocks, respec-
tively; 0.15 ±0.06 and 0.09 ±0.02 for switches to yellow WN and yellow probe blocks, respec-
tively). While additional data would be required to establish the number of distinct associations
between contexts and sequencing states that can be learned, these findings suggest that birds can
maintain at least two distinct sequencing states separate from a ‘neutral’ state and use specific asso-
ciations between cue colors and sequencing states to rapidly shift sequencing in distinct directions
for each context.
Discussion
Speech, thought, and many other behaviors are composed of ordered sequences of simpler ele-
ments. The flexible control of sequencing is thus a fundamental aspect of cognition and motor func-
tion (Aldridge and Berridge, 2002;Jin and Costa, 2015;Tanji, 2001). While the flexibility of
human speech is unrivaled, our contextual training paradigm revealed a simpler, parallel capacity in
birds to produce distinct vocal sequences in response to arbitrary contextual cues. The colors of the
cues had no prior relevance to the birds, so that their meaning had to be learned as a new associa-
tion between cues and the specific vocal sequences that were contextually appropriate (i.e. that
Figure 4. Contextual changes are local to the target sequences. (A) Transition diagram for the song of Bird 6 (spectrogram in Figure 1) in yellow probe
context. Sequences of syllables with fixed transition patterns (e.g. ‘aab’) as well as repeat phrases and introductory notes have been summarized as
single states to simplify the diagram. (B) Transition matrix for the same bird, showing same data as in (A). (C) Differences between the two contexts are
illustrated by subtracting the transition matrix in the yellow context from the one in the green context, so that sequence transitions which are more
frequent in green context are positive (colored green) and sequence transitions which are more frequent in yellow are negative (colored yellow). For this
bird, the majority of contextual differences occurred at the branch point (‘aab’) which most closely preceded the target sequences (‘ab-c’ and ‘ab-d’),
while very little contextual difference occurred at the other three branch points (‘i’, ‘wr’, ‘cr’). (D–F) Same for Bird 2 for which two different branch points
(‘f’ and ‘n’) preceded the target sequences (‘f-abcd’ and ‘n-abcd’) (spectrogram in Figure 3). (G) Proportion of changes at the branch point(s) most
closely preceding the target sequences, relative to the total magnitude of context differences for each bird (see Materials and methods). Most birds
exhibited high specificity of contextual changes to the relevant branch points. Source data in Figure 4—source data 1.
The online version of this article includes the following source data and figure supplement(s) for figure 4:
Source data 1. Overview of different experimental parameters and song features for each bird, to generate Figure 4G and Figure 4—figure supplement 1.
Figure supplement 1. Possible explanations for differences in contextual learning.
escaped WN, given the current cues). Learned modulation of sequencing was immediately
expressed in response to changes in cues, persisted following termination of training, and was
largely restricted to the targeted sequences, without gross modifications of global song structure.
Hence, for song, like speech, the ordering of vocal elements can be rapidly and specifically reconfig-
ured to achieve learned, contextually appropriate goals. This shared capacity for moment-by-
moment control of vocal sequencing in humans and songbirds suggests that the avian song system
could be an excellent model for investigating how neural circuits enable flexible and adaptive recon-
figuration of motor output in response to different cognitive demands.
Flexible control of vocalizations
Our demonstration of contextual control over the ordering of vocal elements in the songbird builds
on previous work showing that a variety of animals can learn to emit or withhold innate vocalizations
in response to environmental or experimentally imposed cues. For example, nonhuman primates
and other animals can produce alarm calls that are innate in their acoustic structure, but that are
deployed in a contextually appropriate fashion (Nieder and Mooney, 2020; Suzuki and Zuberbühler, 2019; Wheeler and Fischer, 2012). Similarly, animals, including birds, can be trained to control
their vocalizations in an experimental setting, by reinforcing the production of innate vocalizations in
response to arbitrary cues to obtain food or water rewards (Brecht et al., 2019;Hage and Nieder,
2013;Nieder and Mooney, 2020;Reichmuth and Casey, 2014). In relation to these prior findings,
our results demonstrate a capacity to flexibly reorganize the sequencing of learned vocal elements,
rather than select from a fixed set of innate vocalizations, in response to arbitrary cues. This ability
to contextually control the ordering, or syntax, of specifically targeted syllable transitions within the
overall structure of learned song parallels the human capacity to differentially sequence a fixed set
of syllables in speech.
The ability to alter syllable sequencing in a flexible fashion also contrasts with prior studies that
have demonstrated modulation of vocalizations in more naturalistic settings. For example, songs
produced in the context of courtship and territorial or aggressive encounters (‘directed song’) differ
in acoustic structure from songs produced in isolation (‘undirected song’) (Sakata et al., 2008;
Searcy and Beecher, 2009). This modulation of song structure by social context is characterized by
global changes to the intensity of song production, with directed songs exhibiting faster tempo, and
Figure 5. Contextual cues allow shifts in both directions. (A) Sequence probability for Bird 2 at the switch from neutral context to yellow and green WN
contexts, as well as yellow and green probe contexts (no WN). Error bars indicate SEM across song bouts (n = 68 switches [green WN], 78 switches
[yellow WN], 27 switches [green probe], 24 switches [yellow probe]). (B, C) Sequence probabilities for three birds for the last song in neutral context and
the first song in the following probe context. Example bird in (A) highlighted in bold. (D) Shift magnitude for three birds at the switch from neutral
context to all other contexts. Open circles show individual birds; error bars indicate SEM across birds. Source data in Figure 5—source data 1.
The online version of this article includes the following source data for figure 5:
Source data 1. Switch magnitude during third context experiment, to generate Figure 5B–D.
greater stereotypy of both syllable structure and syllable sequencing, than undirected songs
(Sakata et al., 2008; Searcy and Beecher, 2009; Sossinka and Böhner, 1980). This and other etho-
logically relevant modulation of song intensity may serve to communicate the singer’s affective state,
such as level of arousal or aggression (Alcami et al., 2021;Hedley et al., 2017;Heinig et al.,
2014), and may largely reflect innate mechanisms (James et al., 2018;Kojima and Doupe, 2011)
mediated by hypothalamic and neuromodulatory inputs to premotor regions (Berwick et al., 2011;
Gadagkar et al., 2019;James et al., 2018;Nieder and Mooney, 2020). In contrast, here we show
that birds can learn to locally modulate specific features of their songs (i.e. individually targeted syl-
lable transitions) in response to arbitrarily assigned contextual cues that have no prior ethological
relevance.
Evolution of control over vocal sequencing
The capacity for moment-by-moment adjustment of vocalizations in response to arbitrary learned
cues may depend on similar capacities that evolved to enable appropriate modulation of vocaliza-
tions in ethologically relevant natural contexts. For example, some species of songbirds preferen-
tially sing different song types depending on factors such as time of day, location of the singer, or
the presence of an audience (Alcami et al., 2021;Hedley et al., 2017;King and McGregor, 2016;
Searcy and Beecher, 2009;Trillo and Vehrencamp, 2005). Even birds with only a single song type,
such as Bengalese finches, vary parameters of their song depending on social context, including the
specific identity of the listener (Chen et al., 2016;Heinig et al., 2014;Sakata et al., 2008). The abil-
ity to contextually control vocalizations is also relevant for the customization of vocal signatures for
purposes of individual and group recognition (Vignal et al., 2004) and to avoid overlap and enhance
communication during vocal turn-taking and in response to environmental noises (Benichov and Val-
lentin, 2020;Brumm and Zollinger, 2013). Such capacities for vocal control likely reflect evolution-
ary advantages of incorporating sensory and contextual information about conspecifics and the
environment in generating increasingly sophisticated vocal signaling. Our results indicate a latent
capacity to integrate arbitrary sensory signals into the adaptive deployment of vocalizations in song-
birds and suggest that some of the contextual control observed in natural settings may likewise rely
on learned associations and other cognitive factors. Perhaps evolutionary pressures to develop
nuanced social communication led to the elaboration of cortical (pallial) control over brainstem vocal
circuitry (Hage and Nieder, 2016), and thereby established a conduit that facilitated the integration
of progressively more abstract cues and internal states in that control.
Neural implementation of context-dependent vocal motor sequencing
The ability of birds to switch between distinct motor programs using visual cues is reminiscent of
contextual speech and motor control studies in humans. For example, human subjects in both labo-
ratory studies and natural settings can learn multiple ‘states’ of vocal motor adaptation and rapidly
switch between them using contextual information (Houde and Jordan, 2002;Keough and Jones,
2011;Rochet-Capellan and Ostry, 2011). Similarly, subjects can learn two separate states of motor
adaptation for other motor skills, such as reaching, and switch between them using cues or other
cognitive strategies (Cunningham and Welch, 1994). Models of such context-dependent motor
adaptation frequently assume at least two parallel processes (Abrahamse et al., 2013;Ashe et al.,
2006;Green and Abutalebi, 2013;Hikosaka et al., 1999;Lee and Schweighofer, 2009;
McDougle et al., 2016;Rochet-Capellan and Ostry, 2011;Wolpert et al., 2011), one that is more
flexible, and sensitive to contextual information (McDougle et al., 2016), and a second that cannot
readily be associated with contextual cues and is only gradually updated during motor adaptation
(Howard et al., 2013). Specifically, in support of such a two-process model, Imamizu and Kawato,
2009 and Imamizu et al., 2007 found that contextual information can drive rapid shifts in adapta-
tion at the beginning of new blocks, without affecting the rate of adaptation within blocks. The simi-
lar separation in our study between rapid context-dependent shifts in sequence probability at the
onset of blocks, and gradual adaptation within blocks that does not improve with training
(Figure 2G–L), suggests that such contextual sequence learning in the Bengalese finch may also be
enabled by two distinct processes.
Human studies of two-process models suggest that slow adaptation occurs primarily within pri-
mary motor structures, while fast context-dependent state switches, including for cued switching
between languages in bilinguals, engage more frontal areas involved in executive control (Bialystok, 2017; Blanco-Elorrieta and Pylkkänen, 2016; De Baene et al., 2015; Imamizu and Kawato,
2009). In songbirds, the gradual adaptation of sequence probabilities within blocks might likewise
be controlled by motor and premotor song control structures, while visual contextual cues could be
processed in avian structures analogous to mammalian prefrontal cortex, outside the song system.
For example, the association area nidopallium caudolaterale (Güntürkün, 2005) is activated by arbi-
trary visual cues that encode learned rules (Veit and Nieder, 2013;Veit et al., 2015), and this or
other avian association areas (Jarvis et al., 2013) may serve as an intermediate representation of
the arbitrary contextual cues that can drive rapid learned shifts in syllable sequencing.
At the level of song motor control, our results indicate a greater capacity for rapid and flexible
adjustment of syllable transition probabilities than previously appreciated. Current models of song
production include networks of neurons in the vocal premotor nucleus HVC responsible for the tem-
poral control of individual syllables, which are linked together by activity in a recurrent loop through
brainstem vocal centers (Andalman et al., 2011;Ashmore et al., 2005;Cohen et al., 2020;
Hamaguchi et al., 2016). At branch points in songs with variable syllable sequencing, one influential
model posits that which syllable follows a branch point is determined by stochastic processes that
depend on the strength of the connections between alternative syllable production networks, and
thus dynamics local to HVC (Jin, 2009;Jin and Kozhevnikov, 2011;Troyer et al., 2017;
Zhang et al., 2017). Such models could account for a gradual adjustment of sequence probabilities
over a period of hours or days (Lipkind et al., 2013;Warren et al., 2012) through plasticity of
motor control parameters, such as the strength of synaptic connections within HVC. However, our
results demonstrate that there is not a single set of relatively fixed transition probabilities that
undergo gradual adjustments, as could be captured in synaptic connectivity of branched syllable
control networks. Rather, the song system has the capacity to maintain distinct representations of
transition probabilities and can immediately switch between those in response to visual cues. HVC
receives a variety of inputs that potentially could convey such visual or cognitive influences on
sequencing (Bischof and Engelage, 1985;Cynx, 1990;Seki et al., 2008;Ullrich et al., 2016;
Wild, 1994), and one of these inputs, Nif, has previously been shown to be relevant for sequencing
(Hosino and Okanoya, 2000;Vyssotski et al., 2016). It therefore is likely that the control of syllable
sequence in Bengalese finches involves a mix of processes local to nuclei of the song motor pathway
(Basista et al., 2014;Zhang et al., 2017) as well as inputs that convey a variety of sensory feedback
and contextual information. The well-understood circuitry of the avian song system makes this an
attractive model to investigate how such top-down pathways orchestrate the kind of contextual con-
trol of vocalizations demonstrated in this study, and more broadly to uncover how differing cognitive
demands can flexibly and adaptively reconfigure motor output.
Materials and methods
Subjects and sound recordings
The experiments were carried out on eight adult male Bengalese finches (Lonchura striata) obtained
from the lab’s breeding colony (age range 128–320 days post-hatch, median 178 days, at start of
experiment). Birds were placed in individual sound-attenuating boxes with continuous monitoring
and auditory recording of song. Song was recorded using an omnidirectional microphone above the
cage. We used custom software for the online recognition of target syllables and real-time delivery
of short 40 ms bursts of WN depending on the syllable sequence (Tumer and Brainard, 2007;
Warren et al., 2012). This LabView program, EvTAF, is included as an executable file with this sub-
mission, and further support is available from the corresponding authors upon request. All proce-
dures were performed in accordance with animal care protocols approved by the University of
California, San Francisco Institutional Animal Care and Use Committee (IACUC).
Training procedure and blocks
Bengalese finch song consists of a discrete number of vocal elements, called syllables, that are sepa-
rated by periods of silence. At the start of each experiment, a template was generated to recognize
a specific sequence of syllables (the target sequence) for each bird based on their unique spectral
structure. In the context-dependent auditory feedback protocol, the target sequence that received
aversive WN feedback switched between blocks of different light contexts. Colored LEDs
(superbrightleds.com, St. Louis, MO; green 520 nm, amber 600 nm) produced two visually distinct
environments (green and yellow) to serve as contextual cues to indicate which sequences would elicit
WN and which would ‘escape’ (i.e. not trigger WN). We wanted to test whether the birds would be
able to associate song changes with any arbitrary visual stimulus; therefore, there was no reason to
choose these specific colors, and the birds’ color perception in this range should not matter, as long
as they were able to discriminate the colors. The entire day was used for data acquisition by alternat-
ing the two possible light contexts. We determined sensitivity and specificity of the template to the
target sequence on a randomly selected set of 20 song bouts on which labels and delivery of WN
were hand-checked. Template sensitivity was defined as follows: sensitivity = (number of correct hits)/
(total number of target sequences). The average template sensitivity across experiments was 91.3%
(range 75.2–100%). Template specificity was defined as: specificity = (number of correct escapes)/
(number of correct escapes plus number of false alarms), where correct escapes were defined as the
number of target sequences of the currently inactive context that were not hit by WN, and false
alarms were defined as any WN that was delivered either on the target sequence of the currently
inactive context, or anywhere else in song. The average template specificity was 96.7% (range 90.6–
100%).
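Template sensitivity and specificity as defined above reduce to simple ratios of hand-checked event counts; a minimal sketch (names are illustrative):

```python
def template_sensitivity(n_correct_hits, n_target_sequences):
    """Sensitivity = correct hits / total target sequences of the active context."""
    return n_correct_hits / n_target_sequences

def template_specificity(n_correct_escapes, n_false_alarms):
    """Specificity = correct escapes / (correct escapes + false alarms)."""
    return n_correct_escapes / (n_correct_escapes + n_false_alarms)
```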
At the start of each experiment, before WN training, songs were recorded during a baseline
period in which cage illumination was switched between colors at random intervals. Songs from this
baseline period were separately analyzed for each light color to confirm that there was no system-
atic, unlearned effect of light cues on sequencing before training. During initial training, cage illumi-
nation was alternatingly switched between colors at random intervals. Intervals were drawn from
uniform distributions which differed between birds (60–150 min [four birds], 10–30 min [two birds],
60–240 min [one bird], 30–150 min [one bird]). Different training schedules were assigned to birds
arbitrarily and were not related to a bird’s performance. After an extended period of training (aver-
age 33 days, range 12–79 days), probe blocks without WN were included, to test whether sequenc-
ing changes could be elicited by visual cues alone. During this period, probe blocks were
interspersed with WN training blocks. Probe blocks made up approximately one third of total blocks
(10 of 34 blocks in the sequence) and 7–35% of total time, depending on the bird. The duration of
probe blocks was typically shorter or equal to the duration of WN blocks (10–30 min for six birds,
30–120 min for one bird, 18–46 min for one bird). The total duration of the experiment, consisting of
baseline, training, and probe periods, was on average 52 days. During this period, birds sang 226
(range 66–356) bouts per day during baseline days and 258 (range 171–368) bouts per day during
the period of probe collection at the end of training (14% increase). The average duration of song
bouts also changed little, with both the average number of target sequences per bout (8.7 during
baseline, 7.7 during probes, 7% decrease) and the average number of syllables per bout (74 during
baseline, 71 during probes, 2% decrease) decreasing slightly. In addition to the eight birds that com-
pleted this training paradigm, three birds were started on contextual training but never progressed
to testing with probe blocks, because they did not exhibit single-context learning (n = 1); because of
technical issues with consistent targeting at branch points (n = 1); or because they lost sequence
variability during initial stages of training (n = 1); these birds are excluded from the results. Of the
eight birds that completed training, three birds exhibited relatively small context-dependent
changes in sequencing (Figure 1H). We examined several variables to assess whether they could
account for differences in the magnitude of learning across birds, including the bird’s age, overall
transition entropy of the song (Katahira et al., 2013), transition entropy at the targeted branch
points (Warren et al., 2012), as well as the distance between the WN target and the closest preced-
ing branch point in the sequence. None of these variables were significantly correlated with the
degree of contextual learning that birds expressed (Figure 4—figure supplement 1), and conse-
quently, all birds were treated as a single group in analysis and reporting of results. In a subset of
experiments (n = 3), after completing measurements with probe blocks, we added a third, neutral
context (Figure 5), signaled by white light, in which there was no WN reinforcement.
Syllable sequence annotation
Syllable annotation for data analysis was performed offline. Each continuous period of singing that
was separated from others by at least 2 s of silence was treated as an individual ‘song’ or ‘song
bout’. Song was bandpass filtered between 500 Hz and 10,000 Hz and segmented into syllables and
gaps based on amplitude threshold and timing parameters determined manually for each bird. A
small sample of songs (approximately 20 song bouts) was then annotated manually based on visual
inspection of spectrograms. These data were used to train an offline autolabeler (‘hybrid-vocal-classi-
fier’, Nicholson, 2021), which was then used to label the remaining song bouts. Autolabeled songs
were processed further in a semi-automated way depending on each bird’s unique song, for exam-
ple to separate or merge syllables that were not segmented correctly (detected by their duration
distributions), to deal with WN covering syllables (detected by its amplitude), and to correct autolab-
eling errors detected based on the syllable sequence. A subset of songs was inspected manually for
each bird to confirm correct labeling.
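A minimal sketch of the offline segmentation step (bandpass filter, rectify and smooth into an amplitude envelope, threshold into syllables and gaps) is shown below. The threshold, smoothing window, and minimum duration are placeholders; in the paper these parameters were determined manually for each bird, and labeling itself was done with the 'hybrid-vocal-classifier' autolabeler.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def segment_syllables(audio, fs, threshold, min_dur=0.01, low=500.0, high=10000.0):
    """Return (onset, offset) times in seconds of putative syllables.

    audio     : 1-D waveform
    fs        : sampling rate (Hz)
    threshold : amplitude threshold on the smoothed envelope (tuned per bird)
    min_dur   : minimum syllable duration in seconds (placeholder)
    """
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, audio)
    win = max(1, int(0.002 * fs))                        # ~2 ms smoothing window
    envelope = np.convolve(np.abs(filtered), np.ones(win) / win, mode='same')
    above = envelope > threshold
    edges = np.diff(above.astype(int))
    onsets = np.where(edges == 1)[0] / fs
    offsets = np.where(edges == -1)[0] / fs
    if above[0]:
        offsets = offsets[1:]                            # drop offset with no matching onset
    return [(on, off) for on, off in zip(onsets, offsets) if off - on >= min_dur]
```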
Sequence probability analyses
Sequence probability was first calculated within each song bout as the frequency of the yellow target
sequence relative to the total number of yellow and green target sequences: $p = \frac{n(\mathrm{target}_Y)}{n(\mathrm{target}_Y) + n(\mathrm{target}_G)}$.
Note that this differs from transition probabilities at branch points in song in that it ignores possible
additional syllable transitions at the branch point, and does not require the targeted sequences to
be directly following the same branch point. For example for the experiment in Figure 3, the target
sequences were ‘n-ab’ and ‘f-ab’, so the syllable covered by WN (‘b’ in both contexts) was
two to three syllables removed from the respective branch point in the syllable sequence (‘n-f’ vs. ‘n-
a’ or ‘f-n’ vs. ‘f-a’). Note also that units of sequence probability are in percent; therefore, reported
changes in percentages (e.g. Figures 1H and 2E,F) describe absolute changes in sequence probabil-
ity, which reflect the proportion of each target sequence, not percent changes. Song bouts that did
not contain either of the two target sequences were discarded. In the plots of sequence probability
over several days in Figure 1A–C, we calculated sequence probability for all bouts on a given day
(average n = 1854 renditions of both target sequences per day). We estimated 95% confidence
intervals by approximation with a normal distribution as $p \pm z\sqrt{p(1-p)/n}$, with $n = n(\mathrm{target}_Y) + n(\mathrm{target}_G)$ and $z = 1.96$. Context switches were processed to include only switches
between adjacent blocks during the same day, that is, excluding overnight switches and treating
blocks as separate contexts if one day started with the same color that had been the last color on
the previous day. If a bird did not produce any song during one block, this block was merged with
any neighboring block of the same color (e.g. green probe without songs before green WN, where
the context switch would not be noticeable for the bird). If the light color switched twice (or more)
without any song bouts, those context switches were discarded.
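As a minimal sketch of this calculation, the following MATLAB example computes the sequence probability and its normal-approximation 95% confidence interval from hypothetical counts of the two target sequences:

```matlab
% Minimal sketch of the per-bout/per-day sequence probability and its 95% CI.
% The rendition counts are hypothetical.
nY = 930;                               % renditions of the yellow target sequence
nG = 920;                               % renditions of the green target sequence
n  = nY + nG;
p  = nY / n;                            % probability of the yellow target sequence
z  = 1.96;                              % 95% confidence level
halfWidth = z * sqrt(p * (1 - p) / n);  % normal approximation to the binomial CI
fprintf('p = %.3f (95%% CI %.3f to %.3f)\n', p, p - halfWidth, p + halfWidth);
```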
In order to reduce variability associated with changes across individual song bouts, shift magni-
tude was calculated as the difference between the first five song bouts in the new context and the
last five song bouts in the old context. Only context switches with at least three song bouts in each
adjacent block were included in analyses of shift magnitude. In plots showing songs aligned to con-
text switches, the x-axis is limited to show only points for which at least half of the blocks contrib-
uted data (i.e. in Figure 2D, half of the green probe blocks contained at least six songs). All
statistical tests were performed with MATLAB. We used non-parametric tests to compare changes
across birds (Wilcoxon rank-sum test for unpaired data, Wilcoxon signed-rank test for paired data),
because with only eight birds/data points, it is more conservative to assume that data are not Gauss-
ian distributed.
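A minimal MATLAB sketch of the shift-magnitude calculation for a single context switch, assuming a hypothetical vector of bout-by-bout sequence probabilities and the index of the first bout sung in the new context:

```matlab
% Minimal sketch of the shift magnitude at one context switch.
% pBout holds hypothetical bout-by-bout sequence probabilities (%), in temporal
% order; switchIdx is the index of the first bout sung in the new context.
pBout     = [62 65 60 63 61 38 35 40 33 36];
switchIdx = 6;

preBouts  = pBout(max(1, switchIdx - 5) : switchIdx - 1);        % last 5 bouts, old context
postBouts = pBout(switchIdx : min(numel(pBout), switchIdx + 4)); % first 5 bouts, new context

% Include the switch only if both adjacent blocks contributed at least 3 bouts.
if numel(preBouts) >= 3 && numel(postBouts) >= 3
    shiftMagnitude = mean(postBouts) - mean(preBouts)            % absolute change (%)
end
```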
Analysis of acquisition
In order to investigate how context-dependent performance developed over training (Figure 2G–L),
we quantified changes to sequence probabilities across block switches for five birds for which we
had a continuous record from the onset of training. Sequence probability curves (e.g. Figure 2H) for
yellow switches were inverted so that both yellow and green switches were plotted in the same
direction, aligned by the time of context switches, and were cut off at a time point relative to con-
text switches where fewer than five switches contributed data. We then subtracted the mean pre-
switch value from each sequence probability curve. For visual display of the example bird, sequence
probability curves were smoothed with a nine-bout boxcar window and displayed in bins of
seven context switches. To calculate the slope of slopes and slope of intercepts (Figure 2L), we
calculated a linear fit to the post-switch parts of the unsmoothed sequence probability curve for
each individual context switch.
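The per-switch fits and the slope-of-slopes measure could be sketched in MATLAB as follows, assuming a hypothetical cell array in which each entry holds one baseline-subtracted post-switch sequence probability curve:

```matlab
% Minimal sketch of the per-switch linear fits and the slope-of-slopes measure.
% Each cell holds one unsmoothed, mean-pre-switch-subtracted sequence probability
% curve following a context switch (hypothetical example data).
postCurves = {5*rand(1,20), 5*rand(1,20) + 2, 5*rand(1,20) + 4};

nSwitches  = numel(postCurves);
slopes     = nan(1, nSwitches);
intercepts = nan(1, nSwitches);
for k = 1:nSwitches
    y = postCurves{k};
    x = 1:numel(y);                    % bout number relative to the context switch
    c = polyfit(x, y, 1);              % c(1) = slope, c(2) = intercept
    slopes(k)     = c(1);
    intercepts(k) = c(2);
end

% How the fitted slopes and intercepts change over successive switches in training.
slopeOfSlopes     = polyfit(1:nSwitches, slopes,     1);
slopeOfIntercepts = polyfit(1:nSwitches, intercepts, 1);
```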
Specificity to relevant branch points
To calculate the specificity of the context difference to the targeted branch points in song, we gen-
erated transition diagrams for each bird. To simplify the diagrams, introductory notes were summa-
rized into a single introductory state. Introductory notes were defined for each bird as up to three
syllables occurring at the start of song bouts before the main motif, which tended to be quieter,
more variable, with high probabilities to repeat and to transition to other introductory notes. Repeat
phrases were also summarized into a single state. Motifs, or chunks, in the song with fixed order of
syllables were identified by the stereotyped transitions and short gap durations between syllables in
the motif (Isola et al., 2020;Suge and Okanoya, 2010) and were also summarized as a single state
in the diagram. Sometimes, the same syllable can be part of several fixed chunks (Katahira et al.,
2013), in which case it may appear several times in the transition diagram. We then calculated the
difference between the transition matrices for the two probe contexts at each transition that was a
branch point (defined as more than 3% and less than 97% transition probability). These context dif-
ferences were split into ‘targeted branch points’, i.e., the branch point or branch points most closely
preceding the target sequences in the two contexts, and ‘non-targeted branch points’, i.e., all other
branch points in the song. We calculated the proportion of absolute contextual difference in the
transition matrix that fell to the targeted branch points; for example, for the matrix in Figure 4C, (44 + 45) / (44 + 45 + 6 + 6 + 1 + 1 + 2 + 2) = 83.2%. Typically, birds with clear contextual differences at
the target sequence also had high specificity of sequence changes to the targeted branch points.
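A minimal MATLAB sketch of this specificity measure, using small hypothetical transition matrices and an assumed mask marking the rows that precede the target sequences:

```matlab
% Minimal sketch of the branch-point specificity calculation.
% Rows index the preceding state, columns the following state, and entries are
% transition probabilities in percent; matrices and the targeted mask are hypothetical.
Tgreen  = [50 50; 10 90];
Tyellow = [ 6 94; 12 88];
D       = Tgreen - Tyellow;                 % context difference (green minus yellow)

% Branch points: transitions with 3-97% probability (here, in either context).
isBranch   = (Tgreen > 3 & Tgreen < 97) | (Tyellow > 3 & Tyellow < 97);
isTargeted = [true true; false false];      % assumed rows preceding the target sequences

totalChange    = sum(abs(D(isBranch)));
targetedChange = sum(abs(D(isBranch & isTargeted)));
specificity    = targetedChange / totalChange   % proportion at targeted branch points
```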
To calculate the transition entropy of baseline song, we again summarized introductory notes into
a single introductory state. In addition, the same syllables as part of multiple fixed motifs, or in multi-
ple positions within the same fixed motif, were renamed as different syllables, so as not to count as
sequence variability what was really a stereotyped sequence (i.e. b-b 50% and b-c 50% in the fixed
sequence ‘abbc’). Transition entropy was then calculated as in Katahira et al., 2013, as $H = -\sum_{x} p(x) \sum_{y} p(y \mid x)\, \log_2 p(y \mid x)$, with $x$ denoting the preceding syllable and $y$ denoting the current syllable, summed over all syllables in the song.
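For illustration, the transition entropy could be computed from a labeled syllable string as in the following MATLAB sketch (the label sequence is hypothetical, and introductory notes and repeated motif syllables are assumed to have already been collapsed or renamed as described above):

```matlab
% Minimal sketch of the transition entropy calculation over a labeled song.
% The label string is hypothetical; introductory notes and repeated motif
% syllables are assumed to have been collapsed or renamed beforehand.
labels = 'iabcdabcdabce';
x = labels(1:end-1);                       % preceding syllable of each transition
y = labels(2:end);                         % current syllable of each transition
syls = unique(labels);

H = 0;
for i = 1:numel(syls)
    idx = (x == syls(i));
    if ~any(idx), continue; end
    px = sum(idx) / numel(x);              % p(x): frequency of the preceding syllable
    for j = 1:numel(syls)
        pyx = sum(y(idx) == syls(j)) / sum(idx);  % p(y|x)
        if pyx > 0
            H = H - px * pyx * log2(pyx);  % weighted conditional (transition) entropy
        end
    end
end
```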
Acknowledgements
We thank Alla Karpova, Jon Sakata, Dave Mets, William Mehaffey, Assaf Breska, and Guy Avraham
for helpful discussions and comments on earlier versions of this manuscript. This work was supported
by the Howard Hughes Medical Institute. Lena Veit was supported as a Howard Hughes Medical
Institute Fellow of the Life Sciences Research Foundation and by a postdoctoral fellowship from Leo-
poldina German National Academy of Sciences. Christian J Monroy Hernandez was supported by an
HHMI EXROP summer fellowship.
Additional information
Funding
Funder | Grant reference number | Author
Leopoldina German National Academy of Sciences | Postdoc Fellowship | Lena Veit
Life Sciences Research Foundation | Howard Hughes Medical Institute Fellowship | Lena Veit
Howard Hughes Medical Institute | | Michael S Brainard
Howard Hughes Medical Institute | EXROP summer fellowship | Christian J Monroy Hernandez
The funders had no role in study design, data collection and interpretation, or the
decision to submit the work for publication.
Author contributions
Lena Veit, Conceptualization, Data curation, Software, Formal analysis, Supervision, Funding acquisi-
tion, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing;
Lucas Y Tian, Conceptualization, Software, Supervision, Writing - review and editing; Christian J
Monroy Hernandez, Investigation, Visualization; Michael S Brainard, Conceptualization, Resources,
Supervision, Funding acquisition, Writing - review and editing
Author ORCIDs
Lena Veit https://orcid.org/0000-0002-9566-5253
Lucas Y Tian http://orcid.org/0000-0002-7346-7360
Christian J Monroy Hernandez http://orcid.org/0000-0002-3796-989X
Michael S Brainard https://orcid.org/0000-0002-9425-9907
Ethics
Animal experimentation: All procedures were performed in accordance with protocols (#AN170723-
02) approved by the University of California, San Francisco Institutional Animal Care and Use Committee
(IACUC).
Decision letter and Author response
Decision letter https://doi.org/10.7554/eLife.61610.sa1
Author response https://doi.org/10.7554/eLife.61610.sa2
Additional files
Supplementary files
Source code 1. MATLAB code to generate Figure 1.
Source code 2. MATLAB code to generate Figures 2C–F and 3E–H.
Source code 3. MATLAB code to generate Figure 2L.
Source code 4. MATLAB code to generate Figure 4G and Figure 4—figure supplement 1.
Source code 5. MATLAB code to generate Figure 5B–D.
Transparent reporting form
Data availability
Raw data are included in the manuscript and supporting files. Source data have been provided for
all summary analyses, along with code to reproduce the figures.
References
Abrahamse EL, Ruitenberg MF, de Kleine E, Verwey WB. 2013. Control of automated behavior: insights from the
discrete sequence production task. Frontiers in Human Neuroscience 7:82. DOI: https://doi.org/10.3389/
fnhum.2013.00082,PMID: 23515430
Ackermann H, Hage SR, Ziegler W. 2014. Brain mechanisms of acoustic communication in humans and
nonhuman primates: an evolutionary perspective. Behavioral and Brain Sciences 37:529–546. DOI: https://doi.
org/10.1017/S0140525X13003099
Alcami P, Ma S, Gahr M. 2021. Telemetry reveals rapid duel-driven song plasticity in a naturalistic social
environment. bioRxiv.DOI: https://doi.org/10.1101/803411
Aldridge JW, Berridge KC. 2002. Coding of Behavioral Sequences in the Basal Ganglia. In: Nicholson L. F. B,
Faull R. L. M (Eds). The Basal Ganglia VII. Boston: Springer. p. 53–66. DOI: https://doi.org/10.1007/978-1-4615-
0715-4
Andalman AS, Foerster JN, Fee MS. 2011. Control of vocal and respiratory patterns in birdsong: dissection of
forebrain and brainstem mechanisms using temperature. PLOS ONE 6:e25461. DOI: https://doi.org/10.1371/
journal.pone.0025461,PMID: 21980466
Ashe J, Lungu OV, Basford AT, Lu X. 2006. Cortical control of motor sequences. Current Opinion in
Neurobiology 16:213–221. DOI: https://doi.org/10.1016/j.conb.2006.03.008,PMID: 16563734
Ashmore RC, Wild JM, Schmidt MF. 2005. Brainstem and forebrain contributions to the generation of learned
motor behaviors for song. Journal of Neuroscience 25:8543–8554. DOI: https://doi.org/10.1523/JNEUROSCI.
1668-05.2005,PMID: 16162936
Basista MJ, Elliott KC, Wu W, Hyson RL, Bertram R, Johnson F. 2014. Independent premotor encoding of the
sequence and structure of birdsong in avian cortex. Journal of Neuroscience 34:16821–16834. DOI: https://doi.
org/10.1523/JNEUROSCI.1940-14.2014,PMID: 25505334
Belyk M, Brown S. 2017. The origins of the vocal brain in humans. Neuroscience & Biobehavioral Reviews 77:
177–193. DOI: https://doi.org/10.1016/j.neubiorev.2017.03.014,PMID: 28351755
Benichov JI, Vallentin D. 2020. Inhibition within a premotor circuit controls the timing of vocal turn-taking in
zebra finches. Nature Communications 11:1–10. DOI: https://doi.org/10.1038/s41467-019-13938-0, PMID: 31924758
Berwick RC, Okanoya K, Beckers GJ, Bolhuis JJ. 2011. Songs to syntax: the linguistics of birdsong. Trends in
Cognitive Sciences 15:113–121. DOI: https://doi.org/10.1016/j.tics.2011.01.002,PMID: 21296608
Bialystok E. 2017. The bilingual adaptation: how minds accommodate experience. Psychological Bulletin 143:
233–262. DOI: https://doi.org/10.1037/bul0000099,PMID: 28230411
Bischof HJ, Engelage J. 1985. Flash evoked responses in a song control nucleus of the zebra finch (Taeniopygia
guttata castanotis). Brain Research 326:370–374. DOI: https://doi.org/10.1016/0006-8993(85)90048-4, PMID: 3971162
Blanco-Elorrieta E, Pylkkänen L. 2016. Bilingual language control in perception versus action: MEG reveals
comprehension control mechanisms in anterior cingulate cortex and Domain-General control of production in
dorsolateral prefrontal cortex. The Journal of Neuroscience 36:290–301. DOI: https://doi.org/10.1523/
JNEUROSCI.2597-15.2016,PMID: 26758823
Brainard MS, Doupe AJ. 2002. What songbirds teach us about learning. Nature 417:351–358. DOI: https://doi.
org/10.1038/417351a,PMID: 12015616
Brecht KF, Hage SR, Gavrilov N, Nieder A. 2019. Volitional control of vocalizations in corvid songbirds. PLOS
Biology 17:e3000375. DOI: https://doi.org/10.1371/journal.pbio.3000375,PMID: 31454343
Brumm H, Zollinger SA. 2013. Avian Vocal Production in Noise. In: Brumm H (Ed). Animal Communication and
Noise. Heidelberg, Berlin: Springer. p. 187–227. DOI: https://doi.org/10.1007/978-3-642-41494-7_7
Chen Y, Matheson LE, Sakata JT. 2016. Mechanisms underlying the social enhancement of vocal learning in
songbirds. PNAS 113:6641–6646. DOI: https://doi.org/10.1073/pnas.1522306113,PMID: 27247385
Cohen Y, Shen J, Semu D, Leman DP, Liberti WA, Perkins LN, Liberti DC, Kotton DN, Gardner TJ. 2020. Hidden
neural states underlie canary song syntax. Nature 582:539–544. DOI: https://doi.org/10.1038/s41586-020-
2397-3,PMID: 32555461
Cunningham HA, Welch RB. 1994. Multiple concurrent visual-motor mappings: implications for models of
adaptation. Journal of Experimental Psychology: Human Perception and Performance 20:987–999.
DOI: https://doi.org/10.1037/0096-1523.20.5.987,PMID: 7964533
Cynx J. 1990. Experimental determination of a unit of song production in the zebra finch (Taeniopygia guttata).
Journal of Comparative Psychology 104:3–10. DOI: https://doi.org/10.1037/0735-7036.104.1.3,PMID: 2354628
De Baene W, Duyck W, Brass M, Carreiras M. 2015. Brain circuit for cognitive control is shared by task and
language switching. Journal of Cognitive Neuroscience 27:1752–1765. DOI: https://doi.org/10.1162/jocn_a_
00817,PMID: 25901448
Doupe AJ, Kuhl PK. 1999. Birdsong and human speech: common themes and mechanisms. Annual Review of
Neuroscience 22:567–631. DOI: https://doi.org/10.1146/annurev.neuro.22.1.567,PMID: 10202549
Gadagkar V, Puzerey PA, Goldberg JH. 2019. Dopamine neurons change their tuning according to courtship
context in singing birds. bioRxiv.DOI: https://doi.org/10.1101/822817
Green DW, Abutalebi J. 2013. Language control in bilinguals: the adaptive control hypothesis. Journal of
Cognitive Psychology 25:515–530. DOI: https://doi.org/10.1080/20445911.2013.796377,PMID: 25077013
Güntürkün O. 2005. The avian ’prefrontal cortex’ and cognition. Current Opinion in Neurobiology 15:686–693.
DOI: https://doi.org/10.1016/j.conb.2005.10.003,PMID: 16263260
Hage SR, Nieder A. 2013. Single neurons in monkey prefrontal cortex encode volitional initiation of vocalizations.
Nature Communications 4:2409. DOI: https://doi.org/10.1038/ncomms3409,PMID: 24008252
Hage SR, Nieder A. 2016. Dual neural network model for the evolution of speech and language. Trends in
Neurosciences 39:813–829. DOI: https://doi.org/10.1016/j.tins.2016.10.006,PMID: 27884462
Hamaguchi K, Tanaka M, Mooney R. 2016. A distributed recurrent network contributes to temporally precise
vocalizations. Neuron 91:680–693. DOI: https://doi.org/10.1016/j.neuron.2016.06.019,PMID: 27397518
Hauser MD, Chomsky N, Fitch WT. 2002. The faculty of language: what is it, who has it, and how did it evolve?
Science 298:1569–1579. DOI: https://doi.org/10.1126/science.298.5598.1569,PMID: 12446899
Hedley RW, Denton KK, Weiss RE. 2017. Accounting for syntax in analyses of countersinging reveals hidden
vocal dynamics in a songbird with a large repertoire. Animal Behaviour 131:23–32. DOI: https://doi.org/10.
1016/j.anbehav.2017.06.021
Heinig A, Pant S, Dunning J, Bass A, Coburn Z, Prather JF. 2014. Male mate preferences in mutual mate choice:
finches modulate their songs across and within male-female interactions. Animal Behaviour 97:1–12.
DOI: https://doi.org/10.1016/j.anbehav.2014.08.016,PMID: 25242817
Hikosaka O, Nakahara H, Rand MK, Sakai K, Lu X, Nakamura K, Miyachi S, Doya K. 1999. Parallel neural networks
for learning sequential procedures. Trends in Neurosciences 22:464–471. DOI: https://doi.org/10.1016/S0166-
2236(99)01439-3,PMID: 10481194
Hosino T, Okanoya K. 2000. Lesion of a higher-order song nucleus disrupts phrase level complexity in bengalese
finches. NeuroReport 11:2091–2095. DOI: https://doi.org/10.1097/00001756-200007140-00007, PMID: 10923650
Houde JF, Jordan MI. 2002. Sensorimotor adaptation of speech I: compensation and adaptation. Journal of
Speech, Language, and Hearing Research: JSLHR 45:295–310. DOI: https://doi.org/10.1044/1092-4388(2002/
023),PMID: 12003512
Howard IS, Wolpert DM, Franklin DW. 2013. The effect of contextual cues on the encoding of motor memories.
Journal of Neurophysiology 109:2632–2644. DOI: https://doi.org/10.1152/jn.00773.2012,PMID: 23446696
Imamizu H, Sugimoto N, Osu R, Tsutsui K, Sugiyama K, Wada Y, Kawato M. 2007. Explicit contextual information
selectively contributes to predictive switching of internal models. Experimental Brain Research 181:395–408.
DOI: https://doi.org/10.1007/s00221-007-0940-1,PMID: 17437093
Imamizu H, Kawato M. 2009. Brain mechanisms for predictive control by switching internal models: implications
for higher-order cognitive functions. Psychological Research Psychologische Forschung 73:527–544.
DOI: https://doi.org/10.1007/s00426-009-0235-1,PMID: 19347360
Isola GR, Vochin A, Sakata JT. 2020. Manipulations of inhibition in cortical circuitry differentially affect spectral
and temporal features of bengalese finch song. Journal of Neurophysiology 123:815–830. DOI: https://doi.org/
10.1152/jn.00142.2019,PMID: 31967928
Jaffe PI, Brainard MS. 2020. Acetylcholine acts on songbird premotor circuitry to invigorate vocal output. eLife 9:
e53288. DOI: https://doi.org/10.7554/eLife.53288,PMID: 32425158
James LS, Dai JB, Sakata JT. 2018. Ability to modulate birdsong across social contexts develops without
imitative social learning. Biology Letters 14:20170777. DOI: https://doi.org/10.1098/rsbl.2017.0777, PMID: 29540565
Jarvis ED, Yu J, Rivas MV, Horita H, Feenders G, Whitney O, Jarvis SC, Jarvis ER, Kubikova L, Puck AE, Siang-
Bakshi C, Martin S, McElroy M, Hara E, Howard J, Pfenning A, Mouritsen H, Chen CC, Wada K. 2013. Global
view of the functional molecular organization of the avian cerebrum: mirror images and functional columns.
Journal of Comparative Neurology 521:3614–3665. DOI: https://doi.org/10.1002/cne.23404,PMID: 23818122
Jin DZ. 2009. Generating variable birdsong syllable sequences with branching chain networks in avian premotor
nucleus HVC. Physical Review E 80:051902. DOI: https://doi.org/10.1103/PhysRevE.80.051902,
PMID: 20365001
Jin X, Costa RM. 2015. Shaping action sequences in basal ganglia circuits. Current Opinion in Neurobiology 33:
188–196. DOI: https://doi.org/10.1016/j.conb.2015.06.011,PMID: 26189204
Jin DZ, Kozhevnikov AA. 2011. A compact statistical model of the song syntax in bengalese finch. PLOS
Computational Biology 7:e1001108. DOI: https://doi.org/10.1371/journal.pcbi.1001108,PMID: 21445230
Katahira K, Suzuki K, Kagawa H, Okanoya K. 2013. A simple explanation for the evolution of complex song
syntax in bengalese finches. Biology Letters 9:20130842. DOI: https://doi.org/10.1098/rsbl.2013.0842,
PMID: 24284561
Keough D, Jones JA. 2011. Contextual cuing contributes to the independent modification of multiple internal
models for vocal control. Journal of Neurophysiology 105:2448–2456. DOI: https://doi.org/10.1152/jn.00291.
2010,PMID: 21346208
King SL, McGregor PK. 2016. Vocal matching: the what, the why and the how. Biology Letters 12:20160666.
DOI: https://doi.org/10.1098/rsbl.2016.0666,PMID: 28120803
Kojima S, Doupe AJ. 2011. Social performance reveals unexpected vocal competency in young songbirds. PNAS
108:1687–1692. DOI: https://doi.org/10.1073/pnas.1010502108,PMID: 21220335
Lee JY, Schweighofer N. 2009. Dual adaptation supports a parallel architecture of motor memory. Journal of
Neuroscience 29:10396–10404. DOI: https://doi.org/10.1523/JNEUROSCI.1294-09.2009,PMID: 19692614
Lipkind D, Marcus GF, Bemis DK, Sasahara K, Jacoby N, Takahasi M, Suzuki K, Feher O, Ravbar P, Okanoya K,
Tchernichovski O. 2013. Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants.
Nature 498:104–108. DOI: https://doi.org/10.1038/nature12173,PMID: 23719373
McDougle SD, Ivry RB, Taylor JA. 2016. Taking aim at the cognitive side of learning in sensorimotor adaptation
tasks. Trends in Cognitive Sciences 20:535–544. DOI: https://doi.org/10.1016/j.tics.2016.05.002,
PMID: 27261056
Nicholson D. 2021. NickleDave/hybrid-vocal-classifier. Github. 0.3.0. https://github.com/NickleDave/hybrid-vocal-classifier.git
Nieder A, Mooney R. 2020. The neurobiology of innate, volitional and learned vocalizations in mammals and
birds. Philosophical Transactions of the Royal Society B: Biological Sciences 375:20190054. DOI: https://doi.
org/10.1098/rstb.2019.0054
Okanoya K. 2004. The bengalese finch: a window on the behavioral neurobiology of birdsong syntax. Annals of
the New York Academy of Sciences 1016:724–735. DOI: https://doi.org/10.1196/annals.1298.026,
PMID: 15313802
Reichmuth C, Casey C. 2014. Vocal learning in seals, sea lions, and walruses. Current Opinion in Neurobiology
28:66–71. DOI: https://doi.org/10.1016/j.conb.2014.06.011,PMID: 25042930
Rochet-Capellan A, Ostry DJ. 2011. Simultaneous acquisition of multiple auditory-motor transformations in
speech. Journal of Neuroscience 31:2657–2662. DOI: https://doi.org/10.1523/JNEUROSCI.6020-10.2011,
PMID: 21325534
Sakata JT, Hampton CM, Brainard MS. 2008. Social modulation of sequence and syllable variability in adult
birdsong. Journal of Neurophysiology 99:1700–1711. DOI: https://doi.org/10.1152/jn.01296.2007, PMID: 18216221
Searcy WA, Beecher MD. 2009. Song as an aggressive signal in songbirds. Animal Behaviour 78:1281–1292.
DOI: https://doi.org/10.1016/j.anbehav.2009.08.011
Seki Y, Suzuki K, Takahasi M, Okanoya K. 2008. Song motor control organizes acoustic patterns on two levels in
bengalese finches (Lonchura striata var Domestica). Journal of Comparative Physiology A 194:533–543.
DOI: https://doi.org/10.1007/s00359-008-0328-0
Simonyan K, Horwitz B. 2011. Laryngeal motor cortex and control of speech in humans. The Neuroscientist 17:
197–208. DOI: https://doi.org/10.1177/1073858410386727,PMID: 21362688
Sossinka R, Böhner J. 1980. Song types in the zebra finch Poephila guttata castanotis. Zeitschrift für Tierpsychologie 53:123–132. DOI: https://doi.org/10.1111/j.1439-0310.1980.tb01044.x
Suge R, Okanoya K. 2010. Perceptual chunking in the self-produced songs of bengalese finches (Lonchura striata
var Domestica). Animal Cognition 13:515–523. DOI: https://doi.org/10.1007/s10071-009-0302-4, PMID: 20039089
Suzuki TN, Zuberbühler K. 2019. Animal syntax. Current Biology 29:R669–R671. DOI: https://doi.org/10.1016/j.
cub.2019.05.045,PMID: 31336078
Tanji J. 2001. Sequential organization of multiple movements: involvement of cortical motor Areas. Annual
Review of Neuroscience 24:631–651. DOI: https://doi.org/10.1146/annurev.neuro.24.1.631,PMID: 11520914
Trillo PA, Vehrencamp SL. 2005. Song types and their structural features are associated with specific contexts in
the banded wren. Animal Behaviour 70:921–935. DOI: https://doi.org/10.1016/j.anbehav.2005.02.004,
PMID: 17173097
Troyer TW, Brainard MS, Bouchard KE. 2017. Timing during transitions in bengalese finch song: implications for
motor sequencing. Journal of Neurophysiology 118:1556–1566. DOI: https://doi.org/10.1152/jn.00296.2017,
PMID: 28637816
Tumer EC, Brainard MS. 2007. Performance variability enables adaptive plasticity of ’crystallized’ adult birdsong.
Nature 450:1240–1244. DOI: https://doi.org/10.1038/nature06390,PMID: 18097411
Ullrich R, Norton P, Scharff C. 2016. Waltzing Taeniopygia: integration of courtship song and dance in the
domesticated australian zebra finch. Animal Behaviour 112:285–300. DOI: https://doi.org/10.1016/j.anbehav.
2015.11.012
Veit L, Pidpruzhnykova G, Nieder A. 2015. Associative learning rapidly establishes neuronal representations of
upcoming behavioral choices in crows. PNAS 112:15208–15213. DOI: https://doi.org/10.1073/pnas.
1509760112,PMID: 26598669
Veit L, Nieder A. 2013. Abstract rule neurons in the endbrain support intelligent behaviour in corvid songbirds.
Nature Communications 4:2878. DOI: https://doi.org/10.1038/ncomms3878,PMID: 24285080
Vignal C, Mathevon N, Mottin S. 2004. Audience drives male songbird response to partner’s voice. Nature 430:
448–451. DOI: https://doi.org/10.1038/nature02645,PMID: 15269767
Vyssotski AL, Stepien AE, Keller GB, Hahnloser RH. 2016. A neural code that is isometric to vocal output and
correlates with its sensory consequences. PLOS Biology 14:e2000317. DOI: https://doi.org/10.1371/journal.
pbio.2000317,PMID: 27723764
Warren TL, Charlesworth JD, Tumer EC, Brainard MS. 2012. Variable sequencing is actively maintained in a well
learned motor skill. Journal of Neuroscience 32:15414–15425. DOI: https://doi.org/10.1523/JNEUROSCI.1254-
12.2012,PMID: 23115179
Wheeler BC, Fischer J. 2012. Functionally referential signals: a promising paradigm whose time has passed.
Evolutionary Anthropology: Issues, News, and Reviews 21:195–205. DOI: https://doi.org/10.1002/evan.21319,
PMID: 23074065
Wild JM. 1994. Visual and somatosensory inputs to the avian song system via nucleus uvaeformis (Uva) and a
comparison with the projections of a similar thalamic nucleus in a nonsongbird, Columba livia. The Journal of
Comparative Neurology 349:512–535. DOI: https://doi.org/10.1002/cne.903490403,PMID: 7860787
Wolpert DM, Diedrichsen J, Flanagan JR. 2011. Principles of sensorimotor learning. Nature Reviews
Neuroscience 12:739–751. DOI: https://doi.org/10.1038/nrn3112,PMID: 22033537
Zhang YS, Wittenbach JD, Jin DZ, Kozhevnikov AA. 2017. Temperature manipulation in songbird brain implicates
the premotor nucleus HVC in birdsong syntax. The Journal of Neuroscience 37:2600–2611. DOI: https://doi.org/
10.1523/JNEUROSCI.1827-16.2017,PMID: 28159910