Contextual changes are local to the target sequences. (A) Transition diagram for the song of Bird 6 (spectrogram in Figure 1) in the yellow probe context. Sequences of syllables with fixed transition patterns (e.g. 'aab'), as well as repeat phrases and introductory notes, have been summarized as single states to simplify the diagram. (B) Transition matrix for the same bird, showing the same data as in (A). (C) Differences between the two contexts are illustrated by subtracting the transition matrix in the yellow context from the one in the green context, so that sequence transitions which are more frequent in the green context are positive (colored green) and sequence transitions which are more frequent in the yellow context are negative (colored yellow). For this bird, the majority of contextual differences occurred at the branch point ('aab') which most closely preceded the target sequences ('ab-c' and 'ab-d'), while very little contextual difference occurred at the other three branch points ('i', 'wr', 'cr'). (D-F) Same for Bird 2, for which two different branch points ('f' and 'n') preceded the target sequences ('f-abcd' and 'n-abcd') (spectrogram in Figure 3). (G) Proportion of changes at the branch point(s) most closely preceding the target sequences, relative to the total magnitude of context differences for each bird (see Materials and methods). Most birds exhibited high specificity of contextual changes to the relevant branch points. Source data in Figure 4-source data 1.
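As a rough illustration of the transition-matrix construction the caption describes, the sketch below builds a row-normalized transition matrix from annotated syllable sequences and flags branch points (states with more than one observed successor). The toy bouts and state labels are hypothetical stand-ins, not the paper's data.

```python
import numpy as np

bouts = [list("iaabc"), list("iaabd"), list("iaabc")]  # toy annotated bouts

states = sorted({s for bout in bouts for s in bout})
idx = {s: i for i, s in enumerate(states)}

# Count each first-order transition across all bouts.
counts = np.zeros((len(states), len(states)))
for bout in bouts:
    for a, b in zip(bout, bout[1:]):
        counts[idx[a], idx[b]] += 1

# Row-normalize counts into transition probabilities, guarding empty rows.
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Branch points: states from which more than one transition is observed.
branch_points = [s for s in states if (P[idx[s]] > 0).sum() > 1]
print(branch_points)  # -> ['a', 'b'] for these toy bouts
```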

Source publication
Article
Full-text available
The flexible control of sequential behavior is a fundamental aspect of speech, enabling endless reordering of a limited set of learned vocal elements (syllables or words). Songbirds are phylogenetically distant from humans but share both the capacity for vocal learning and neural circuitry for vocal control that includes direct pallial-brainstem pr...

Contexts in source publication

Context 1
... example, for the experiment illustrated in Figure 1, the prevalence of the target sequence 'ab-d' was appropriately decreased in the yellow context, in which it was targeted. The complete transition diagram and corresponding transition matrix for this bird (Figure 4A,B) reveal that there were four distinct branch points at which syllables were variably sequenced (after 'cr', 'wr', 'i', and 'aab'). Therefore, the decrease in the target sequence 'ab-d' could have resulted exclusively from an increase in the probability of the alternative transition 'ab-c' at the branch point following 'aab'. ...
Context 2
... a reduction in the prevalence of the target sequence could also have been achieved by changes in the probability of transitions earlier in song such that the sequence 'aab' was sung less frequently. To investigate the extent to which contextual changes in probability were specific to transitions immediately preceding target sequences, we calculated the difference between transition matrices in the yellow and green probe contexts (Figure 4C). This difference matrix indicates that changes to transition probabilities were highly specific to the branch point immediately preceding the target sequences (specificity was defined as the proportion of total changes which could be attributed to the branch points immediately preceding target sequences; specificity for branch point 'aab' was 83.2%). ...
Context 3
... difference matrix indicates that changes to transition probabilities were highly specific to the branch point immediately preceding the target sequences (specificity was defined as the proportion of total changes which could be attributed to the branch points immediately preceding target sequences; specificity for branch point 'aab' was 83.2%). Such specificity to branch points that immediately precede target sequences was typical across experiments, including cases in which different branch points preceded each target sequence (Figure 4D-F, specificity 96.9%). Across all eight experiments, the median specificity of changes to the most proximal branch points was 84.95%, and only one bird, which was also the worst learner in the contextual training paradigm, had a specificity of less than 50% (Figure 4G). ...
Context 4
... specificity to branch points that immediately precede target sequences was typical across experiments, including cases in which different branch points preceded each target sequence (Figure 4D-F, specificity 96.9%). Across all eight experiments, the median specificity of changes to the most proximal branch points was 84.95%, and only one bird, which was also the worst learner in the contextual training paradigm, had a specificity of less than 50% (Figure 4G). Hence, contextual changes were specific to target sequences and did not reflect the kind of global sequencing changes that characterize innate social modulation of song structure (Sakata et al., 2008; Sossinka and Böhner, 1980). ...
Context 5
... context differences were split into 'targeted branch points', i.e., the branch point or branch points most closely preceding the target sequences in the two contexts, and 'non-targeted branch points', i.e., all other branch points in the song. We calculated the proportion of absolute contextual difference in the transition matrix that fell to the targeted branch points, for example, for the matrix in Figure 4C: (44 + 45)/(44 + 45 + 6 + 6 + 1 + 1 + 2 + 2) = 83.2%. Typically, birds with clear contextual differences at the target sequence also had high specificity of sequence changes to the targeted branch points. ...
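For readers who want to check the arithmetic, here is a minimal sketch of that specificity calculation (our reconstruction from the description above, not the authors' code). `diff` stands in for the green-minus-yellow difference in transition probabilities, with one row per branch point; the entries are chosen to reproduce the 83.2% worked example.

```python
import numpy as np

# Rows: branch points; columns: their alternative transitions. Entries are
# green-minus-yellow changes in transition probability (percentage points),
# mirroring the worked example for the matrix in Figure 4C.
diff = np.array([
    [44.0, -45.0],  # targeted branch point 'aab' ('ab-c' up, 'ab-d' down)
    [ 6.0,  -6.0],  # non-targeted branch point
    [ 1.0,  -1.0],  # non-targeted branch point
    [ 2.0,  -2.0],  # non-targeted branch point
])
targeted = np.array([True, False, False, False])  # rows preceding the targets

# Specificity: share of total absolute contextual change that falls at the
# branch point(s) immediately preceding the target sequences.
specificity = np.abs(diff[targeted]).sum() / np.abs(diff).sum()
print(f"{specificity:.1%}")  # -> 83.2%
```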

Citations

... Conversely, rapid adaptation in response to changing environmental contexts has been demonstrated in both birdsong [67][68][69] and human speech [70]. This apparent plasticity of a well-learned, crystallized behavior suggests that vocalization may be controlled by a "malleable template," in which trial-by-trial variability is used to adapt learned behaviors [71]. ...
Article
Full-text available
To successfully traverse their environment, humans often perform maneuvers to achieve desired task goals while simultaneously maintaining balance. Humans accomplish these tasks primarily by modulating their foot placements. As humans are more unstable laterally, we must better understand how humans modulate lateral foot placement. We previously developed a theoretical framework and corresponding computational models to describe how humans regulate lateral stepping during straight-ahead continuous walking. We identified goal functions for step width and lateral body position that define the walking task and determine the set of all possible task solutions as Goal Equivalent Manifolds (GEMs). Here, we used this framework to determine if humans can regulate lateral stepping during non-steady-state lateral maneuvers by minimizing errors consistent with these goal functions. Twenty young healthy adults each performed four lateral lane-change maneuvers in a virtual reality environment. Extending our general lateral stepping regulation framework, we first re-examined the requirements of such transient walking tasks. Doing so yielded new theoretical predictions regarding how steps during any such maneuver should be regulated to minimize error costs, consistent with the goals required at each step and with how these costs are adapted at each step during the maneuver. Humans performed the experimental lateral maneuvers in a manner consistent with our theoretical predictions. Furthermore, their stepping behavior was well modeled by allowing the parameters of our previous lateral stepping models to adapt from step to step. To our knowledge, our results are the first to demonstrate humans might use evolving cost landscapes in real time to perform such an adaptive motor task and, furthermore, that such adaptation can occur quickly, over only one step. Thus, the predictive capabilities of our general stepping regulation framework extend to a much greater range of walking tasks beyond just normal, straight-ahead walking.
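To make the goal-function idea concrete, here is a toy sketch of the kind of error cost the abstract refers to (an assumption-based reading of the GEM framework, not the authors' code): step width and lateral body position are derived from the two feet's lateral placements and penalized relative to their goal values.

```python
def stepping_cost(z_left, z_right, w_star, zb_star, alpha=0.5):
    """Weighted squared errors on the step-width and position goals.

    z_left, z_right: lateral placements of the two feet on one step;
    w_star, zb_star: goal step width and goal lateral body position;
    alpha: relative weighting of the two goals (assumed, for illustration).
    """
    w = z_left - z_right           # step width
    zb = 0.5 * (z_left + z_right)  # lateral body position (midpoint proxy)
    return alpha * (w - w_star) ** 2 + (1 - alpha) * (zb - zb_star) ** 2

# All (z_left, z_right) pairs with zero cost form the task's Goal Equivalent
# Manifold; during a lane-change maneuver, shifting zb_star from step to step
# moves that manifold, which is one way to read "evolving cost landscapes".
```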
... Also similar to humans, songbirds have evolved specialized neural pathways for vocal learning, allowing the precise interrogation of the brain mechanisms of song plasticity (Jarvis, 2019; Brainard and Doupe, 2002). However, prior research on this brain network has focused almost exclusively on the role of auditory feedback, although recent work has shown the importance of visual cues (light) in shaping vocalizations (Veit et al., 2021; Zai et al., 2020). Previous studies have revealed that songbird brains have a basal ganglia-thalamocortical circuit, the anterior forebrain pathway (AFP), that is required for auditory-guided vocal learning but not vocal production (Figure 1a; Brainard and Doupe, 2000; Nordeen and Nordeen, 1993; Mooney, 2009; Bottjer et al., 1984). ...
Article
Full-text available
Songbirds and humans share the ability to adaptively modify their vocalizations based on sensory feedback. Prior studies have focused primarily on the role that auditory feedback plays in shaping vocal output throughout life. In contrast, it is unclear how non-auditory information drives vocal plasticity. Here, we first used a reinforcement learning paradigm to establish that somatosensory feedback (cutaneous electrical stimulation) can drive vocal learning in adult songbirds. We then assessed the role of a songbird basal ganglia thalamocortical pathway critical to auditory vocal learning in this novel form of vocal plasticity. We found that both this circuit and its dopaminergic inputs are necessary for non-auditory vocal learning, demonstrating that this pathway is critical for guiding adaptive vocal changes based on both auditory and somatosensory signals. The ability of this circuit to use both auditory and somatosensory information to guide vocal learning may reflect a general principle for the neural systems that support vocal plasticity across species.
... Vocalizations are generally considered to be involuntary when they appear to be consistently and "automatically" triggered by specific stimuli or contexts, and inborn when they are produced by individuals who have had no opportunity to learn them (Arriaga and Jarvis 2013). Determining whether a particular vocal act is involuntary and/or learned is a nontrivial task, and a variety of experimental procedures have been developed to clarify where specific vocalizations fall along the spectrum of control (e.g., Kloepper et al. 2014; Rose et al. 2022; Veit et al. 2021; Zhang and Ghazanfar 2019). Observational data have also been used to distinguish voluntary from involuntary vocalizations. ...
Article
Full-text available
Singing humpback whales are highly versatile vocalizers, producing complex sequences of sounds that they vary throughout adulthood. Past analyses of humpback whale song have emphasized yearly variations in structural features of songs made collectively by singers within a population with comparatively little attention given to the ways that individual singers vary consecutive songs. As a result, many researchers describe singing by humpback whales as a process in which singers produce sequences of repeating sound patterns. Here, we show that such characterizations misrepresent the degree to which humpback whales flexibly and dynamically control the production of sounds and sound patterns within song sessions. Singers recorded off the coast of Hawaii continuously morphed units along multiple acoustic dimensions, with the degree and direction of morphing varying across parallel streams of successive units. Individual singers also produced multiple phrase variants (structurally similar, but acoustically distinctive sequences) within song sessions. The precision with which individual singers maintained some acoustic properties of phrases and morphing trajectories while flexibly changing others suggests that singing humpback whales actively select and adjust acoustic elements of their songs in real time rather than simply repeating stereotyped sound patterns within song sessions.
... No other mammalian species produce vocalizations that are as richly structured as human language. Many songbird species can produce songs with complex syntax, in which syllable sequences follow certain rules [1][2][3][4] that are flexible in different contexts [5]. Therefore, songbirds have been widely studied as an animal model to investigate the evolution and neural mechanisms of complex vocal sequencing [6,7]. ...
Preprint
Full-text available
Vocal sequencing is a key element in human speech. Songbirds have been widely studied as an animal model to investigate neural mechanisms of vocal sequencing, due to the complex syntax of syllable sequences in their songs. However, songbirds are phylogenetically distant from humans. So far, there is little evidence of complex syntactic vocalizations in non-human primates. Here, we analyze phee sounds produced by 160 marmoset monkeys either in isolation or during vocal turn-taking and reveal complex sequencing rules at multiple levels. First, phee syllables exhibited consistent interval patterns among different marmosets, allowing categorization of calls with a single syllable or closely spaced 2-4 syllables into 4 grades. Second, the ordering of sequential calls followed distinct probabilistic rules, preferring repetition of the same-grade call, then transitions between calls of adjacent grades, but not skip-grade transitions. Moreover, inter-call intervals depended on the transition direction. Third, specific ABnA call patterns were discovered to be prominent in long call sequences, and their occurrence exhibited a power-law decrease with increasing 'n', reflecting a long-range sequencing rule in the dependence of later calls on the pattern of earlier calls. Finally, syllable and call intervals as well as call compositions were significantly modified during vocal turn-taking. This complex syntax of vocal sequences in marmosets offers opportunities for understanding the evolutionary origin and neural mechanisms of grammatical complexity in human language.
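One of the reported sequence statistics is easy to state in code. The sketch below counts ABnA patterns (a call of grade A, n consecutive calls of one other grade B, then A again) in a toy grade sequence; it is an illustrative reconstruction of the pattern definition, not the authors' analysis pipeline.

```python
from collections import Counter

def count_abna(grades, max_n=6):
    """Count A B^n A patterns: grade A, n repeats of one grade B != A, then A."""
    counts = Counter()
    for i, a in enumerate(grades):
        for n in range(1, max_n + 1):
            j = i + n + 1          # index of the closing A
            if j >= len(grades):
                break
            middle = grades[i + 1 : j]
            # The middle must be a uniform run of a single grade different from A.
            if grades[j] == a and len(set(middle)) == 1 and middle[0] != a:
                counts[n] += 1
    return counts

print(count_abna([1, 2, 2, 1, 3, 1]))  # -> Counter({2: 1, 1: 1})
```

Plotting counts[n] against n on log-log axes would then show the power-law decrease the abstract describes, under the assumption that the pattern definition above matches the authors' usage.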
... Conversely, rapid adaptation in response to changing environmental contexts has been demonstrated in both birdsong [67][68][69] and human speech [70]. This apparent plasticity of a well-learned, crystallized behavior suggests that vocalization may be controlled by a "malleable template," in which trial-by-trial variability is used to adapt learned behaviors [71]. ...
Preprint
Full-text available
To successfully traverse their environment, humans often perform maneuvers to achieve desired task goals while simultaneously maintaining balance. Humans accomplish these tasks primarily by modulating their foot placements. As humans are more unstable laterally, we must better understand how humans modulate lateral foot placement. We previously developed a theoretical framework and corresponding computational models to describe how humans regulate lateral stepping during straight-ahead continuous walking. We identified goal functions for step width and lateral body position that define the walking task and determine the set of all possible task solutions as Goal Equivalent Manifolds (GEMs). Here, we used this framework to determine if humans can regulate lateral stepping during non-steady-state lateral maneuvers by minimizing errors consistent with these goal functions. Twenty young healthy adults each performed four lateral lane-change maneuvers in a virtual reality environment. Extending our general lateral stepping regulation framework, we first re-examined the requirements of such transient walking tasks. Doing so yielded new theoretical predictions regarding how steps during any such maneuver should be regulated to minimize error costs, consistent with the goals required at each step and with how these costs are adapted at each step during the maneuver. Humans performed the experimental lateral maneuvers in a manner consistent with our theoretical predictions. Furthermore, their stepping behavior was well modeled by allowing the parameters of our previous lateral stepping models to adapt from step to step. To our knowledge, our results are the first to demonstrate humans might use evolving cost landscapes in real time to perform such an adaptive motor task and, furthermore, that such adaptation can occur quickly, over only one step. Thus, the predictive capabilities of our general stepping regulation framework extend to a much greater range of walking tasks beyond just normal, straight-ahead walking.

AUTHOR SUMMARY
When we walk in the real world, we rarely walk continuously in a straight line. Indeed, we regularly have to perform other tasks like stepping aside to avoid an obstacle in our path (either fixed or moving, like another person coming towards us). While we have to be highly maneuverable to accomplish such tasks, we must also maintain balance to avoid falling while doing so. This is challenging because walking humans are inherently more unstable side-to-side. Sideways falls are particularly dangerous for older adults as they can lead to hip fractures. Here, we establish a theoretical basis for how people might accomplish such maneuvers. We show that humans execute a simple lateral lane-change maneuver consistent with our theoretical predictions. Importantly, our simulations show they can do so by adapting at each step the same step-to-step regulation strategies they use to walk straight ahead. Moreover, these same control processes also explain how humans trade-off side-to-side stability to gain the maneuverability they need to perform such lateral maneuvers.
... In combination with such methods, supervised machine learning models have been used to successfully annotate large-scale behavioral experiments (e.g. Veit et al., 2021). But these additional clean-up steps add complexity and require the researcher to perform further tuning and validation. ...
Article
Full-text available
Songbirds provide a powerful model system for studying sensory-motor learning. However, many analyses of birdsong require time-consuming, manual annotation of its elements, called syllables. Automated methods for annotation have been proposed, but these methods assume that audio can be cleanly segmented into syllables, or they require carefully tuning multiple statistical models. Here we present TweetyNet: a single neural network model that learns how to segment spectrograms of birdsong into annotated syllables. We show that TweetyNet mitigates limitations of methods that rely on segmented audio. We also show that TweetyNet performs well across multiple individuals from two species of songbirds, Bengalese finches and canaries. Lastly, we demonstrate that using TweetyNet we can accurately annotate very large datasets containing multiple days of song, and that these predicted annotations replicate key findings from behavioral studies. In addition, we provide open-source software to assist other researchers, and a large dataset of annotated canary song that can serve as a benchmark. We conclude that TweetyNet makes it possible to address a wide range of new questions about birdsong.
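The core move such frame-classification networks make can be sketched without the real library: the model assigns a label to every spectrogram time bin, and annotation reduces to collapsing runs of identical labels into segments. The helper below is a generic illustration in that spirit, not TweetyNet's actual API.

```python
import itertools

def frames_to_segments(frame_labels, dt, background="-"):
    """Collapse per-frame labels into (label, onset_s, offset_s) segments.

    frame_labels: one predicted label per spectrogram time bin;
    dt: duration of one time bin in seconds;
    background: the label used for non-syllable (silent) bins.
    """
    segments, t = [], 0
    for label, run in itertools.groupby(frame_labels):
        n = len(list(run))
        if label != background:
            segments.append((label, t * dt, (t + n) * dt))
        t += n
    return segments

print(frames_to_segments(list("--aaa--bb-"), dt=0.002))
# segments: 'a' from 0.004 s to 0.010 s, 'b' from 0.014 s to 0.018 s
```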
Article
Full-text available
The zebra finch (ZF) and the Bengalese finch (BF) are animal models that have been commonly used for neurobiological studies on vocal learning. Although they largely share the brain structure for vocal learning and production, BFs produce more complex and variable songs than ZFs, providing a great opportunity for comparative studies to understand how animals learn and control complex motor behaviors. Here, we performed a comparative study between the two species by focusing on intrinsic motivation for non-courtship singing ("undirected singing"), which is critical for the development and maintenance of song structure. A previous study has demonstrated that ZFs dramatically increase intrinsic motivation for undirected singing when singing is temporarily suppressed by a dark environment. We found that the same procedure in BFs induced the enhancement of intrinsic singing motivation to much smaller degrees than in ZFs. Moreover, unlike ZFs that rarely sing in dark conditions, a substantial portion of BFs exhibited frequent singing in darkness, implying that such "dark singing" may attenuate the enhancement of intrinsic singing motivation during dark periods. In addition, measurements of blood corticosterone levels in dark and light conditions provided evidence that although BFs have lower stress levels than ZFs in dark conditions, such lower stress levels in BFs are not the major factor responsible for their frequent dark singing. Our findings highlight behavioral and physiological differences in spontaneous singing behaviors of BFs and ZFs and provide new insights into the interactions between singing motivation, ambient light, and environmental stress.
Preprint
Full-text available
Many sequenced behaviors, including locomotion, reaching, and vocalization, are patterned differently in different contexts, enabling animals to adjust to their current environments. However, how contextual information shapes neural activity to flexibly alter the patterning of actions is not yet understood. Prior work indicates such flexibility could be achieved via parallel motor circuits, with differing sensitivities to sensory context [1, 2, 3]; instead we demonstrate here how a single neural pathway operates in two different regimes dependent on recent sensory history. We leverage the Drosophila song production system [4] to investigate the neural mechanisms that support male song sequence generation in two contexts: near versus far from the female. While previous studies identified several song production neurons [5, 6, 7, 8], how these neurons are organized to mediate song patterning was unknown. We find that male flies sing 'simple' trains of only one syllable or mode far from the female but complex song sequences consisting of alternations between modes when near her. We characterize the male song circuit from the brain to the ventral nerve cord (VNC), and find that the VNC song pre-motor circuit is shaped by two key computations: mutual inhibition and rebound excitability [9] between nodes driving the two modes of song. Weak sensory input to a direct brain-to-VNC excitatory pathway (via pC2 brain and pIP10 descending neurons) drives simple song far from the female. Strong sensory input to the same pathway enables complex song production via simultaneous recruitment of P1a neuron-mediated disinhibition of the VNC song pre-motor circuit. Thus, proximity to the female effectively unlocks motor circuit dynamics in the correct sensory context. We construct a compact circuit model to demonstrate that these few computations are sufficient to replicate natural context-dependent song dynamics. These results have broad implications for neural population-level models of context-dependent behavior [10] and highlight that canonical circuit motifs [11, 12, 13] can be combined in novel ways to enable circuit flexibility required for dynamic communication.
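As a deliberately schematic cartoon of the two computations named here (based only on this abstract, not the authors' circuit model), mutual inhibition plus drive-gated rebound can be caricatured in a few lines: the active mode suppresses the other, and only when drive is strong enough to unlock the rebound pathway (by analogy with P1a-mediated disinhibition) does the suppressed mode take over once the active one has adapted.

```python
REBOUND_THRESHOLD = 1.0  # drive needed to unlock rebound (assumed value)

def sing(drive, n_steps=12, fatigue_per_step=0.4):
    """Cartoon two-mode song generator: mutual inhibition with gated rebound."""
    other = {"pulse": "sine", "sine": "pulse"}
    mode, adaptation, song = "pulse", 0.0, []
    for _ in range(n_steps):
        song.append(mode)
        adaptation += fatigue_per_step  # the active mode fatigues each step
        # Rebound only when drive exceeds the disinhibition threshold and the
        # active mode has adapted enough for its partner to escape inhibition.
        if drive > REBOUND_THRESHOLD and adaptation >= drive - REBOUND_THRESHOLD:
            mode, adaptation = other[mode], 0.0
    return song

print(sing(drive=0.5))  # weak drive (far): ['pulse'] * 12, a simple train
print(sing(drive=2.0))  # strong drive (near): alternating pulse/sine trains
```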