Article

It’s all in the sound and in the brain: A comment on Bannan, Dunbar, and Bamford 2024.

Article
Full-text available
Vocalizations differ substantially between the sexes in many primates, and low-frequency male vocalizations may be favored by sexual selection because they intimidate rivals and/or attract mates. Sexual dimorphism in fundamental frequency may be more pronounced in species with more intense male mating competition and in those with large group size, where social knowledge is limited and efficient judgment of potential mates and competitors is crucial. These non-mutually exclusive explanations have not been tested simultaneously across primate species. In a sample of vocalizations (n = 1914 recordings) across 37 anthropoid species, we investigated whether fundamental frequency dimorphism evolved in association with increased intensity of mating competition (H1), large group size (H2), multilevel social organization (H3), a trade-off against the intensity of sperm competition (H4), and/or poor acoustic habitats (H5), controlling for phylogeny and body size dimorphism. We show that fundamental frequency dimorphism increased in evolutionary transitions towards larger group size and polygyny. Findings suggest that low-frequency male vocalizations in primates may have been driven by selection to win mating opportunities by avoiding costly fights and may be more important in larger groups, where limited social knowledge affords advantages to rapid assessment of status and threat potential via conspicuous secondary sexual characteristics.
Preprint
Full-text available
Since Darwin (1871), researchers have proposed that musicality evolved in a reproductive context in which males produce music to signal their mate quality. The extent to which evidence supports this contention, however, remains unclear. Related traits in many non-human animals are sexually differentiated, and while some sex differences in human auditory perception have been documented, the pattern of results is murky. Here, we study melodic discrimination, mistuning perception, and beat alignment perception in 360,009 men and 194,291 women from 208 countries. We find that, in contrast to other non-music human traits, and in contrast to non-human traits, there was no overall advantage for either sex, and the observed sex differences were minuscule (Cohen's d: 0.009–0.11) and of inconsistent direction. These results do not provide compelling support for human music perception being a sexually dimorphic trait, and therefore it is unlikely to have been shaped by sexual selection.
Article
Full-text available
Characterizing non-human primate social complexity and its cognitive bases has proved challenging. Using principal component analyses, we show that primate social, ecological and reproductive behaviours condense into two components: socioecological complexity (including most social and ecological variables) and reproductive cooperation (comprising mainly a suite of behaviours associated with pair-bonded monogamy). We contextualize these results using a meta-analysis of 44 published analyses of primate brain evolution. These studies yield two main consistent results: cognition, sociality and cooperative behaviours are associated with absolute brain volume, neocortex size and neocortex ratio, whereas diet composition and life history are consistently associated with relative brain size. We use a path analysis to evaluate the causal relationships among these variables, demonstrating that social group size is predicted by the neocortex, whereas ecological traits are predicted by the volume of brain structures other than the neocortex. That a range of social and technical behaviours covary, and are correlated with social group size and brain size, suggests that primate cognition has evolved along a continuum resulting in an increasingly flexible, domain-general capacity to solve a range of socioecological challenges culminating in a capacity for, and reliance on, innovation and social information use in the great apes and humans. This article is part of the theme issue ‘Cognition, communication and social bonds in primates’.
Article
Full-text available
Albert Feng was a prominent comparative neurophysiologist whose research provided numerous contributions towards understanding how the spectral and temporal characteristics of vocalizations underlie sound communication in frogs and bats. The present study is dedicated to Al’s memory and compares the spectral and temporal representations of stochastic, complex sounds which underlie the perception of pitch strength in humans and chinchillas. Specifically, the pitch strengths of these stochastic sounds differ between humans and chinchillas, suggesting that humans and chinchillas may be using different cues. Outputs of auditory filterbank models based on human and chinchilla cochlear tuning were examined. Excitation patterns of harmonics are enhanced in humans as compared with chinchillas. In contrast, summary correlograms are degraded in humans as compared with chinchillas. Comparing summary correlograms and excitation patterns with corresponding behavioral data on pitch strength suggests that the dominant cue for pitch strength in humans is spectral (i.e., harmonic) structure, whereas the dominant cue for chinchillas is temporal (i.e., envelope) structure. The results support arguments that the broader cochlear tuning in non-human mammals emphasizes temporal cues for pitch perception, whereas the sharper cochlear tuning in humans emphasizes spectral cues.
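The temporal cue contrasted here with spectral (harmonic) structure is typically quantified with autocorrelation-style analyses. As a rough stand-in for the paper's summary correlograms (which are built on species-specific cochlear filterbanks, not modeled here), the following Python sketch scores a sound's periodicity by the height of its normalized autocorrelation peak within a candidate pitch range; the sampling rate, pitch range, and test signals are illustrative assumptions.

```python
import numpy as np

def periodicity_strength(x, fs, fmin=50.0, fmax=500.0):
    """Height of the largest normalized autocorrelation peak within the
    candidate pitch range: a crude temporal (envelope/periodicity) cue."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0..N-1
    ac = ac / ac[0]                                    # normalize: lag 0 == 1
    lo, hi = int(fs / fmax), int(fs / fmin)            # lags spanning fmax..fmin
    return float(ac[lo:hi].max())

# Example: a harmonic complex tone yields a strong peak; noise does not.
fs, dur, f0 = 16000, 0.3, 200.0
t = np.arange(int(fs * dur)) / fs
tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 6))
noise = np.random.default_rng(0).normal(size=t.size)
print("tone :", round(periodicity_strength(tone, fs), 2))   # close to 1
print("noise:", round(periodicity_strength(noise, fs), 2))  # close to 0
```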
Article
Full-text available
When interacting with infants, humans often alter their speech and song in ways thought to support communication. Theories of human child-rearing, informed by data on vocal signalling across species, predict that such alterations should appear globally. Here, we show acoustic differences between infant-directed and adult-directed vocalizations across cultures. We collected 1,615 recordings of infant- and adult-directed speech and song produced by 410 people in 21 urban, rural and small-scale societies. Infant-directedness was reliably classified from acoustic features only, with acoustic profiles of infant-directedness differing across language and music but in consistent fashions. We then studied listener sensitivity to these acoustic features. We played the recordings to 51,065 people from 187 countries, recruited via an English-language website, who guessed whether each vocalization was infant-directed. Their intuitions were more accurate than chance, predictable in part by common sets of acoustic features and robust to the effects of linguistic relatedness between vocalizer and listener. These findings inform hypotheses of the psychological functions and evolution of human communication.
Article
Full-text available
This review explores the role of oxytocin in the mediation of select social behaviours, with particular emphasis on female rodents. These behaviours include social recognition, social learning, pathogen detection and avoidance, and maternal care. Specific brain regions where oxytocin has been shown to directly mediate various aspects of these social behaviours, as well as other proposed regions, are discussed. Possible interactions between oxytocin and other regulatory systems, in particular that of oestrogens and dopamine, in the modulation of social behaviour are considered. Similarities and differences between males and females are highlighted. This article is part of the theme issue ‘Interplays between oxytocin and other neuromodulators in shaping complex social behaviours’.
Article
Full-text available
The influence of neuromodulators on brain activity and behaviour is undeniably profound, yet our knowledge of the underlying mechanisms, or ability to reliably reproduce effects across varying conditions, is still lacking. Oxytocin, a hormone that acts as a neuromodulator in the brain, is an example of this quandary; it powerfully shapes behaviours across nearly all mammalian species, yet when manipulated exogenously can produce unreliable or sometimes unexpected behavioural results across varying contexts. While current research is rapidly expanding our understanding of oxytocin, interactions between oxytocin and other neuromodulatory systems remain underappreciated in the current literature. This review highlights interactions between oxytocin and the opioid system that serve to influence social behaviour and proposes a parallel-mechanism hypothesis to explain the supralinear effects of combinatorial neuropharmacological approaches. This article is part of the theme issue ‘Interplays between oxytocin and other neuromodulators in shaping complex social behaviours’.
Article
Full-text available
Octave equivalence describes the perception that notes separated by a doubling in frequency sound similar. While the octave is used cross-culturally as a basis of pitch perception, experimental demonstration of the phenomenon has proved to be difficult. In past work, members of our group developed a three-range generalization paradigm that reliably demonstrated octave equivalence. In this study, we replicate and expand on this previous work, trying to answer three questions that help us understand the origins and potential cross-cultural significance of octave equivalence: (1) whether training with three ranges is strictly necessary or whether an easier-to-learn two-range task would be sufficient, (2) whether the task could demonstrate octave equivalence beyond neighbouring octaves, and (3) whether language skills and musical education impact the use of octave equivalence in this task. We conducted a large-sample study using variations of the original paradigm to answer these questions. Results found here suggest that the three-range discrimination task is indeed vital to demonstrating octave equivalence. In a two-range task, pitch height appears to be dominant over octave equivalence. Octave equivalence has an effect only when pitch height alone is not sufficient. Results also suggest that effects of octave equivalence are strongest between neighbouring octaves, and that tonal language and musical training have a positive effect on learning of discriminations but not on perception of octave equivalence during testing. We discuss these results considering their relevance to future research and to ongoing debates about the basis of octave equivalence perception.
Article
Full-text available
This study continues investigating the consonance-pattern-emerging neural network model introduced in our previous publication, specifically to test whether it reproduces the results at a 100-fold finer precision of 1/100th of a semitone (1 cent). The model is a simple, generic feed-forward Hebbian-learning neural network trained with multiple-harmonic complex sounds from the full auditory sound spectrum of 10 octaves. We use the synaptic weights between the neural correlates of each pair of tones from this spectrum to measure the model's preference for their inter-tonal interval (12,000² intervals), considering familiarity as a consonance predictor. We analyze all 12,000 intervals of a selected tone (the tonic), and the results reveal three distinct yet related features. Firstly, Helmholtz's list of consonant intervals re-emerges from the synaptic weights of the model, although with disordered dissonant intervals. Additionally, the results show a high preference for a small number of selected intervals, mapping the virtually continuous input sound spectrum to a discrete set of intervals. Finally, the model's most preferred (most consonant) intervals are from the Just Intonation scales. The model does not need to use cross-octave interval mapping due to octave equivalence to produce these results.
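The readout described in this abstract (synaptic weight between the neural correlates of two tones as a proxy for interval familiarity, and hence consonance) can be sketched briefly. The sketch below is an illustrative reconstruction, not the authors' published implementation: resolution is coarsened from 1 cent to 10 cents, the range from 10 octaves to 5, and the harmonic count and 1/k amplitude roll-off are assumptions.

```python
import numpy as np

CENTS_PER_UNIT = 10
N_UNITS = 5 * 1200 // CENTS_PER_UNIT         # 5 octaves of tonotopic units

def harmonic_activation(f0_unit, n_harmonics=8):
    """Units excited by a complex tone: harmonic k sits 1200*log2(k) cents
    above the fundamental (the 1/k amplitude roll-off is an assumption)."""
    act = np.zeros(N_UNITS)
    for k in range(1, n_harmonics + 1):
        u = f0_unit + int(round(1200 * np.log2(k) / CENTS_PER_UNIT))
        if u < N_UNITS:
            act[u] += 1.0 / k
    return act

rng = np.random.default_rng(0)
W = np.zeros((N_UNITS, N_UNITS))
for _ in range(2000):                         # expose the net to random complex tones
    a = harmonic_activation(rng.integers(0, N_UNITS))
    W += np.outer(a, a)                       # Hebbian: co-active units bind

# Read out interval preference as the weight between the tonic's unit and
# the unit one interval above it; familiar intervals accumulate weight.
tonic = 1200 // CENTS_PER_UNIT
for name, c in [("major 2nd", 200), ("fourth 4:3", 500),
                ("fifth 3:2", 700), ("octave 2:1", 1200)]:
    print(f"{name:>10}: {W[tonic, tonic + c // CENTS_PER_UNIT]:.2f}")
```

In this toy setup the octave and fifth accumulate far more weight than the major second, reproducing the familiarity-as-consonance pattern the study describes.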
Article
Full-text available
Frequency-to-place mapping, or tonotopy, is a fundamental organizing principle throughout the auditory system, from the earliest stages of auditory processing in the cochlea to subcortical and cortical regions. Although cortical maps are referred to as tonotopic, it is unclear whether they simply reflect a mapping of physical frequency inherited from the cochlea, a computation of pitch based on the fundamental frequency, or a mixture of these two features. We used high-resolution functional magnetic resonance imaging (fMRI) to measure BOLD responses as male and female human participants listened to pure tones that varied in frequency or complex tones that varied in either spectral content (brightness) or fundamental frequency (pitch). Our results reveal evidence for pitch tuning in bilateral regions that partially overlap with the traditional tonotopic maps of spectral content. In general, primary regions within Heschl's gyri (HGs) exhibited more tuning to spectral content, whereas areas surrounding HGs exhibited more tuning to pitch. SIGNIFICANCE STATEMENT: Tonotopy, an orderly mapping of frequency, is observed throughout the auditory system. However, it is not known whether the tonotopy observed in the cortex simply reflects the frequency spectrum (as in the ear) or instead represents the higher-level feature of fundamental frequency, or pitch. Using carefully controlled stimuli and high-resolution functional magnetic resonance imaging (fMRI), we separated these features to study their cortical representations. Our results suggest that tonotopy in primary cortical regions is driven predominantly by frequency, but also reveal evidence for tuning to pitch in regions that partially overlap with the tonotopic gradients but extend into nonprimary cortical areas. In addition to resolving ambiguities surrounding cortical tonotopy, our findings provide evidence that selectivity for pitch is distributed bilaterally throughout auditory cortex.
Article
Full-text available
Fundamental frequency (f0), perceived as voice pitch, is the most sexually dimorphic, perceptually salient and intensively studied voice parameter in human nonverbal communication. Thousands of studies have linked human f0 to biological and social speaker traits and life outcomes, from reproductive to economic. Critically, researchers have used myriad speech stimuli to measure f0 and infer its functional relevance, from individual vowels to longer bouts of spontaneous speech. Here, we acoustically analysed f0 in nearly 1000 affectively neutral speech utterances (vowels, words, counting, greetings, read paragraphs and free spontaneous speech) produced by the same 154 men and women, aged 18–67, with two aims: first, to test the methodological validity of comparing f0 measures from diverse speech stimuli, and second, to test the prediction that the vast inter-individual differences in habitual f0 found between same-sex adults are preserved across speech types. Indeed, despite differences in linguistic content, duration, scripted or spontaneous production and within-individual variability, we show that 42–81% of inter-individual differences in f0 can be explained between any two speech types. Beyond methodological implications, together with recent evidence that inter-individual differences in f0 are remarkably stable across the lifespan and generalize to emotional speech and nonverbal vocalizations, our results further substantiate voice pitch as a robust and reliable biomarker in human communication.
Article
Full-text available
Although auditory harmonic distortion has been demonstrated psychophysically in humans and electrophysiologically in experimental animals, the cellular origin of the mechanical harmonic distortion remains unclear. To demonstrate the outer hair cell-generated harmonics within the organ of Corti, we measured sub-nanometer vibrations of the reticular lamina from the apical ends of the outer hair cells in living gerbil cochleae using a custom-built heterodyne low-coherence interferometer. The harmonics in the reticular lamina vibration are significantly larger and have broader spectra and shorter latencies than those in the basilar membrane vibration. The latency of the second harmonic is significantly greater than that of the fundamental at low stimulus frequencies. These data indicate that the mechanical harmonics are generated by the outer hair cells over a broad cochlear region and propagate from the generation sites to their own best-frequency locations.
Article
Full-text available
The C-tactile (CLTM) peripheral nervous system is involved in social bonding in primates and humans through its capacity to trigger the brain’s endorphin system. Since the mammalian cochlea has an unusually high density of similar neurons (type-II spiral ganglion neurons, SGNs), we hypothesise that their function may have been exploited for social bonding by co-opting head movements in response to music and other rhythmic movements of the head in social contexts. Music provides one of many cultural behavioural mechanisms for ‘virtual grooming’ in that it is used to trigger the endorphin system with many people simultaneously so as to bond both dyadic relationships and large groups. Changes in pain threshold across an activity are a convenient proxy assay for endorphin uptake in the brain, and we use this, in two experiments, to show that pain thresholds are higher when nodding the head than when sitting still.
Article
Full-text available
Compared to most other mammals and birds, anthropoid primates have unusually complex societies characterised by bonded social groups. Among primates, this effect is encapsulated in the social brain hypothesis: the robust correlation between various indices of social complexity (social group size, grooming clique size, tactical behaviour, coalition formation) and brain size. Hitherto, this has always been interpreted as a simple, unitary relationship. Using data for five different indices of brain volume from four independent brain databases, we show that the distribution of group size plotted against brain size is best described as a set of four distinct, very narrowly defined grades which are unrelated to phylogeny. The allocation of genera to these grades is highly consistent across the different data sets and brain indices. We show that these grades correspond to the progressive evolution of bonded social groups. In addition, we show, for those species that live in multilevel social systems, that the typical sizes of the different grouping levels in each case coincide with different grades. This suggests that the grades correspond to demographic attractors that are especially stable. Using five different cognitive indices, we show that the grades correlate with increasing social cognitive skills, suggesting that the cognitive demands of managing group cohesion increase progressively across grades. We argue that the grades themselves represent glass ceilings on animals' capacity to maintain social and spatial coherence during foraging and that, in order to evolve more highly bonded groups, species have to be able to invest in costly forms of cognition.
Article
Full-text available
The human species possesses two complementary, yet distinct, universal communication systems—language and music. Functional imaging studies have revealed that some core elements of these two systems are processed in closely related brain regions, but there are also clear differences in brain circuitry that likely underlie differences in functionality. Music affects many aspects of human behavior, especially in encouraging prosocial interactions and promoting trust and cooperation within groups of culturally compatible but not necessarily genetically related individuals. Music, presumably via its impact on the limbic system, is also rewarding and motivating, and music can facilitate aspects of learning and memory. In this review these special characteristics of music are considered in light of recent research on the neuroscience of the peptide oxytocin, a hormone that has both peripheral and central actions, that plays a role in many complex human behaviors, and whose expression has recently been reported to be affected by music-related activities. I will first briefly discuss what is currently known about the peptide’s physiological actions on neurons and its interactions with other neuromodulator systems, then summarize recent advances in our knowledge of the distribution of oxytocin and its receptor (OXTR) in the human brain. Next, the complex links between oxytocin and various social behaviors in humans are considered. First, how endogenous oxytocin levels relate to individual personality traits, and then how exogenous, intranasal application of oxytocin affects behaviors such as trust, empathy, reciprocity, group conformity, anxiety, and overall social decision making under different environmental conditions. It is argued that many of these characteristics of oxytocin biology closely mirror the diverse effects that music has on human cognition and emotion, providing a link to the important role music has played throughout human evolutionary history and helping to explain why music remains a special prosocial human asset. Finally, it is suggested that there is a potential synergy in combining oxytocin- and music-based strategies to improve general health and aid in the treatment of various neurological dysfunctions.
Article
Full-text available
Why do humans make music? Theories of the evolution of musicality have focused mainly on the value of music for specific adaptive contexts such as mate selection, parental care, coalition signaling, and group cohesion. Synthesizing and extending previous proposals, we argue that social bonding is an overarching function that unifies all of these theories, and that musicality enabled social bonding at larger scales than grooming and other bonding mechanisms available in ancestral primate societies. We combine cross-disciplinary evidence from archaeology, anthropology, biology, musicology, psychology, and neuroscience into a unified framework that accounts for the biological and cultural evolution of music. We argue that the evolution of musicality involves gene-culture coevolution, through which proto-musical behaviors that initially arose and spread as cultural inventions had feedback effects on biological evolution due to their impact on social bonding. We emphasize the deep links between production, perception, prediction, and social reward arising from repetition, synchronization, and harmonization of rhythms and pitches, and summarize empirical evidence for these links at the levels of brain networks, physiological mechanisms, and behaviors across cultures and across species. Finally, we address potential criticisms and make testable predictions for future research, including neurobiological bases of musicality and relationships between human music, language, animal song, and other domains. The music and social bonding (MSB) hypothesis provides the most comprehensive theory to date of the biological and cultural evolution of music.
Chapter
Full-text available
Synopsis: The evolutionary origins of the modern mammalian ear structures can be traced back to their phylogenetic predecessors in the Mesozoic Era. This evolutionary history shows a step-wise acquisition of middle- and inner-ear structures along separate Mesozoic mammal lineages that led to convergence of several derived characters correlated with distinct hearing functions in extant mammals.
Article
Full-text available
Music perception is plausibly constrained by universal perceptual mechanisms adapted to natural sounds. Such constraints could arise from our dependence on harmonic frequency spectra for segregating concurrent sounds, but evidence has been circumstantial. We measured the extent to which concurrent musical notes are misperceived as a single sound, testing Westerners as well as native Amazonians with limited exposure to Western music. Both groups were more likely to mistake note combinations related by simple integer ratios as single sounds (‘fusion’). Thus, even with little exposure to Western harmony, acoustic constraints on sound segregation appear to induce perceptual structure on note combinations. However, fusion did not predict aesthetic judgments of intervals in Westerners, or in Amazonians, who were indifferent to consonance/dissonance. The results suggest universal perceptual mechanisms that could help explain cross-cultural regularities in musical systems, but indicate that these mechanisms interact with culture-specific influences to produce musical phenomena such as consonance.
Article
Full-text available
Fundamental frequency (F0, perceived as voice pitch) predicts sex and age, hormonal status, mating success and a range of social traits, and thus functions as an important biosocial marker in modal speech. Yet, the role of F0 in human nonverbal vocalizations remains unclear, and given considerable variability in F0 across call types, it is not known whether F0 cues to vocalizer attributes are shared across speech and nonverbal vocalizations. Here, using a corpus of vocal sounds from 51 men and women, we examined whether individual differences in F0 are retained across neutral speech, valenced speech and nonverbal vocalizations (screams, roars and pain cries). Acoustic analyses revealed substantial variability in F0 across vocal types, with mean F0 increasing as much as 10-fold in screams compared to speech in the same individual. Despite these extreme pitch differences, sexual dimorphism was preserved within call types and, critically, inter-individual differences in F0 correlated across vocal types (r = 0.36-0.80) with stronger relationships between vocal types of the same valence (e.g. 38% of the variance in roar F0 was predicted by aggressive speech F0). Our results indicate that biologically and socially relevant indexical cues in the human voice are preserved in simulated valenced speech and vocalizations, including vocalizations characterized by extreme F0 modulation, suggesting that voice pitch may function as a reliable individual and biosocial marker across disparate communication contexts.
Article
Full-text available
We contrast two related hypotheses of the evolution of dance: H1: Maternal bipedal walking influenced the fetal experience of sound and associated movement patterns; H2: The human transition to bipedal gait produced more isochronous/predictable locomotion sound, resulting in early music-like behavior associated with the acoustic advantages conferred by moving bipedally in pace. The cadence of walking is around 120 beats per minute, similar to the tempo of dance and music. Human walking displays long-term constancies. Dyads often subconsciously synchronize steps. The major amplitude component of the step is a distinctly produced beat. Human locomotion influences, and interacts with, emotions, and passive listening to music activates brain motor areas. Across dance genres, the footwork is most often performed in time to the musical beat. Brain development is largely shaped by early sensory experience, with hearing developed from week 18 of gestation. Newborns react to sounds, melodies, and rhythmic poems to which they have been exposed in utero. If the sound and vibrations produced by footfalls of a walking mother are transmitted to the fetus in coordination with the cadence of the motion, a connection between isochronous sound and rhythmical movement may be developed. The rhythmical sounds of human maternal locomotion differ substantially from those of nonhuman primates, while the maternal heartbeat heard is likely to have a similar isochronous character across primates, suggesting a relatively more influential role of footfall in the development of rhythmic/musical abilities in humans. Associations of gait, music, and dance are numerous. The apparent absence of musical and rhythmic abilities in nonhuman primates, which display little bipedal locomotion, supports the idea that bipedal gait may be linked to the development of rhythmic abilities in humans. Bipedal stimuli in utero may primarily boost the ontogenetic development. The acoustical advantage hypothesis proposes a mechanism in the phylogenetic development.
Article
Full-text available
Cross-cultural analysis of song: It is unclear whether there are universal patterns to music across cultures. Mehr et al. examined ethnographic data and observed music in every society sampled (see the Perspective by Fitch and Popescu). For songs specifically, three dimensions characterize more than 25% of the performances studied: formality of the performance, arousal level, and religiosity. There is more variation in musical behavior within societies than between societies, and societies show similar levels of within-society variation in musical behavior. At the same time, one-third of societies significantly differ from average for any given dimension, and half of all societies differ from average on at least one dimension, indicating variability across cultures.
Article
Full-text available
Previous research suggests that judgments about a male speaker’s trustworthiness vary due to the speaker’s voice pitch (mean F0) and differ across domains. Mixed results in terms of the direction and extent of such effects have been reported, however. Moreover, no study so far has investigated whether men’s mean F0 is, indeed, a valid cue to their self-reported and behavioral trustworthiness, and whether trustworthiness judgments are accurate. We tested the relation between mean F0 and actual general, economic and mating-related trustworthiness in 181 men, as well as trustworthiness judgments of 95 perceivers across all three domains. Analyses show that men’s mean F0 is not related to Honesty-Humility (as a trait indicator of general trustworthiness), trustworthy intentions, or trust game behavior, suggesting no relation of mean F0 to general or economic trustworthiness. In contrast, results suggest that mean F0 might be related to mating-related trustworthiness (as indicated by self-reported relationship infidelity). However, lower mean F0 was judged as more trustworthy in economic, but less trustworthy in mating-related domains and rather weakly related to judgments of general trustworthiness. Trustworthiness judgments were not accurate for general or economic trustworthiness, but exploratory analyses suggest that women might be able to accurately judge men’s relationship infidelity based on their voice pitch. Next to these analyses, we report exploratory analyses involving and controlling for additional voice parameters.
Article
Full-text available
Despite a long history of study, consensus on a human-typical mating system remains elusive. While a simple classification would be useful for cross-species comparisons, monogamous, polyandrous, and polygynous marriage systems exist across contemporary human societies. Moreover, sexual relationships occur outside of or in tandem with marriage, resulting in most societies exhibiting multiple kinds of marriage and mating relationships. Further complicating a straightforward classification of mating system are the multiple possible interpretations of biological traits typical of humans used to indicate ancestral mating patterns. While challenging to characterize, our review of the literature offers several key insights. 1) Although polygyny is socially sanctioned in most societies, monogamy is the dominant marriage-type within any one group cross-culturally. 2) Sex outside of marriage occurs across societies, yet human extra-pair paternity rates are relatively low when compared to those of socially monogamous birds and mammals. 3) Though the timing of the evolution of certain anatomical characteristics is open to debate, human levels of sexual dimorphism and relative testis size point to a diverging history of sexual selection from our great ape relatives. Thus, we conclude that while there are many ethnographic examples of variation across human societies in terms of marriage patterns, extramarital affairs, the stability of relationships, and the ways in which fathers invest, the pair-bond is a ubiquitous feature of human mating relationships. This may be expressed through polygyny and/or polyandry but is most commonly observed in the form of serial monogamy.
Preprint
Full-text available
The uniqueness of human music relative to speech and animal song has been extensively debated, but never directly measured. To address this, we applied an automated scale analysis algorithm to a sample of 86 recordings of human music, human speech, and bird songs from around the world. We found that human music throughout the world uniquely emphasized scales with small-integer ratios, particularly a perfect 5th (3:2 ratio), while human speech and bird song showed no clear evidence of scale-like tuning. We speculate that the uniquely human tendency toward scales with small-integer ratios may have resulted from the evolution of synchronized group performance among humans.
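The core measurement (how far an observed interval lies from the nearest small-integer frequency ratio, such as the 3:2 perfect 5th) reduces to arithmetic in cents. A minimal Python sketch, with an illustrative candidate ratio set rather than the paper's full inventory:

```python
import math

# Illustrative candidate small-integer ratios; the paper's algorithm and
# ratio inventory are not reproduced here.
RATIOS = {"unison 1:1": (1, 1), "fourth 4:3": (4, 3),
          "fifth 3:2": (3, 2), "octave 2:1": (2, 1)}

def cents(f_low, f_high):
    """Interval size in cents (1200 cents = one octave)."""
    return 1200 * math.log2(f_high / f_low)

def nearest_ratio(interval_cents):
    """Closest candidate ratio and the deviation from it in cents."""
    best_name, best_dev = None, float("inf")
    for name, (p, q) in RATIOS.items():
        dev = abs(interval_cents - 1200 * math.log2(p / q))
        if dev < best_dev:
            best_name, best_dev = name, dev
    return best_name, best_dev

# Example: a sung interval of 447 Hz against 300 Hz is ~12 cents flat of 3:2.
iv = cents(300.0, 447.0)
name, dev = nearest_ratio(iv)
print(f"{iv:.1f} cents -> {name} (off by {dev:.1f} cents)")
```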
Article
Full-text available
Pitch perception is critical for recognizing speech, music and animal vocalizations, but its neurobiological basis remains unsettled, in part because of divergent results across species. We investigated whether species-specific differences exist in the cues used to perceive pitch and whether these can be accounted for by differences in the auditory periphery. Ferrets accurately generalized pitch discriminations to untrained stimuli whenever temporal envelope cues were robust in the probe sounds, but not when resolved harmonics were the main available cue. By contrast, human listeners exhibited the opposite pattern of results on an analogous task, consistent with previous studies. Simulated cochlear responses in the two species suggest that differences in the relative salience of the two pitch cues can be attributed to differences in cochlear filter bandwidths. The results support the view that cross-species variation in pitch perception reflects the constraints of estimating a sound’s fundamental frequency given species-specific cochlear tuning.
Article
Full-text available
Despite widespread evidence that nonverbal components of human speech (e.g., voice pitch) communicate information about physical attributes of vocalizers and that listeners can judge traits such as strength and body size from speech, few studies have examined the communicative functions of human nonverbal vocalizations (such as roars, screams, grunts and laughs). Critically, no previous study has examined the acoustic correlates of strength in nonverbal vocalizations, including roars, nor identified reliable vocal cues to strength in human speech. In addition to being less acoustically constrained than articulated speech, agonistic nonverbal vocalizations function primarily to express motivation and emotion, such as threat, and may therefore communicate strength and body size more effectively than speech. Here, we investigated acoustic cues to strength and size in roars compared to screams and speech sentences produced in both aggressive and distress contexts. Using playback experiments, we then tested whether listeners can reliably infer a vocalizer’s actual strength and height from roars, screams, and valenced speech equivalents, and which acoustic features predicted listeners’ judgments. While there were no consistent acoustic cues to strength in any vocal stimuli, listeners accurately judged inter-individual differences in strength, and did so most effectively from aggressive voice stimuli (roars and aggressive speech). In addition, listeners more accurately judged strength from roars than from aggressive speech. In contrast, listeners’ judgments of height were most accurate for speech stimuli. These results support the prediction that vocalizers maximize impressions of physical strength in aggressive compared to distress contexts, and that inter-individual variation in strength may only be honestly communicated in vocalizations that function to communicate threat, particularly roars. Thus, in continuity with nonhuman mammals, the acoustic structure of human aggressive roars may have been selected to communicate, and to some extent exaggerate, functional cues to physical formidability.
Article
Full-text available
Octave equivalence describes the perceived similarity of notes separated by an octave or a doubling in frequency. In humans, octave equivalence perception is used in vocal learning, enabling young children to approximate adult sounds where the pitch lies outside of their vocal range. This makes sense because the octave is also the first harmonic of any tonal sound, including the human voice. We hypothesized that non-human animals may also need octave equivalence perception in vocal mimicry, the copying of other species or environmental sounds, to approximate sounds where the pitch lies outside their vocal range. Thus, in the current study, we tested budgerigars (Melopsittacus undulatus), a vocal mimicking species, for octave equivalence perception. Budgerigars were trained and tested in a go/no-go operant task previously verified in humans. Budgerigars did not show evidence of octave equivalence perception. This result suggests that vocal mimicking does not necessarily facilitate or presuppose octave equivalence perception.
Article
Full-text available
A subclass of C fibre sensory neurons found in hairy skin is activated by gentle touch [1] and responds optimally to stroking at ∼1–10 cm/s, serving a protective function by promoting affiliative behaviours. In adult humans, stimulation of these C-tactile (CT) afferents is pleasant, and can reduce pain perception [2]. Touch-based techniques, such as infant massage and kangaroo care, are designed to comfort infants during procedures, and a modest reduction in pain-related behavioural and physiological responses has been observed in some studies [3]. Here, we investigated whether touch can reduce noxious-evoked brain activity. We demonstrate that stroking (at 3 cm/s) prior to an experimental noxious stimulus or clinical heel lance can attenuate noxious-evoked brain activity in infants. CT fibres may represent a biological target for non-pharmacological interventions that modulate pain in early life.
Article
Full-text available
Rhythmic entrainment, defined as a stable temporal relationship between external periodic signals and endogenous rhythmic processes, allows individuals to coordinate with environmental rhythms. However, the impact of inter-individual differences on entrainment processes as a function of the tempo of external periodic signals remains poorly understood. To better understand the effects of endogenous differences and varying tempos on rhythmic entrainment, 20 young healthy adults participated in a spontaneous motor tempo (SMT) task and synchronization-continuation tasks at three experimental tempos (50, 70, and 128 bpm; 1200, 857, and 469 ms inter-onset interval (IOI)). We hypothesized that SMT task performance and tempo would influence externally paced synchronization-continuation task behavior. Indeed, intrinsic rhythmicity assessed through the SMT task predicted performance in the externally paced task, allowing us to characterize differences in entrainment behavior between participants with low and high endogenous rhythmicity. High-rhythmicity individuals, defined by better SMT performance, deviated from externally paced pulses sooner than individuals with low rhythmicity, who were able to maintain externally paced pulses for longer. The magnitude of these behavioral differences depended on the experimental tempo of the synchronization-continuation task. Our results indicate that differences in intrinsic rhythmicity vary between individuals and relate to tempo-dependent entrainment performance.
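The correspondence between the tempo and IOI values quoted above is simple arithmetic: IOI in milliseconds equals 60,000 divided by the tempo in beats per minute. A quick check in Python:

```python
def ioi_ms(bpm: float) -> float:
    """Inter-onset interval in ms: 60,000 ms per minute / beats per minute."""
    return 60_000 / bpm

for bpm in (50, 70, 128):
    print(f"{bpm:>3} bpm -> {ioi_ms(bpm):6.1f} ms IOI")
# -> 1200.0, 857.1 and 468.8 ms, matching the rounded values in the abstract.
```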
Article
Full-text available
Full text (read-only): https://rdcu.be/1OcE. Animal resources have been part of hominin diets since around 2.5 million years ago, with sharp-edged stone tools facilitating access to carcasses. How exactly hominins acquired animal prey and how hunting strategies varied through time and space is far from clear. The oldest possible hunting weapons known from the archaeological record are 300,000- to 400,000-year-old sharpened wooden staves. These may have been used as throwing and/or close-range thrusting spears, but actual data on how such objects were used are lacking, as unambiguous lesions caused by such weapon-like objects are unknown for most of human prehistory. Here, we report perforations observed on two fallow deer skeletons from Neumark-Nord, Germany, retrieved during excavations of 120,000-year-old lake shore deposits with abundant traces of Neanderthal presence. Detailed studies of the perforations, including micro-computed tomography imaging and ballistic experiments, demonstrate that they resulted from the close-range use of thrusting spears. Such confrontational ways of hunting require close cooperation between participants, and over time may have shaped important aspects of hominin biology and behaviour.
Article
Full-text available
Language, humans’ most distinctive trait, still remains a ‘mystery’ for evolutionary theory. It is underpinned by a universal infrastructure — cooperative turn-taking — which has been suggested as an ancient mechanism bridging the existing gap between the articulate human species and their inarticulate primate cousins. However, we know remarkably little about turn-taking systems of nonhuman animals, and methodological confounds have often prevented meaningful cross-species comparisons. Thus, the extent to which cooperative turn-taking is uniquely human or represents a homologous and/or analogous trait is currently unknown. The present paper draws attention to this promising research avenue by providing an overview of the state of the art of turn-taking in four animal taxa — birds, mammals, insects and anurans. It concludes with a new comparative framework to spur more research into this research domain and test which elements of the human turn-taking system are shared across species and taxa.
Article
Full-text available
Significance: The foundations of human music have long puzzled philosophers, mathematicians, psychologists, and neuroscientists. Although virtually all cultures use combinations of tones as a basis for musical expression, why humans favor some tone combinations over others has been debated for millennia. Here we show that our attraction to specific tone combinations played simultaneously (chords) is predicted by their spectral similarity to voiced speech sounds. This connection between auditory aesthetics and a primary characteristic of vocalization adds to other evidence that tonal preferences arise from the biological advantages of social communication mediated by speech and language.
Article
Full-text available
Relaxation and excitation are components of the effects of music listening. The tempo of music is often considered a critical factor when determining these effects: listening to slow-tempo and fast-tempo music elicits relaxation and excitation, respectively. However, the chemical bases that underlie these relaxation and excitation effects remain unclear. Since parasympathetic and sympathetic nerve activities are facilitated by oxytocin and glucocorticoid, respectively, we hypothesized that listening to relaxing slow-tempo and exciting fast-tempo music is accompanied by increases in the oxytocin and cortisol levels, respectively. We evaluated the change in the salivary oxytocin and cortisol levels of participants listening to slow-tempo and fast-tempo music sequences. We measured the heart rate (HR) and calculated the heart rate variability (HRV) to evaluate the strength of autonomic nerve activity. After listening to a music sequence, the participants rated their arousal and valence levels. We found that both the salivary oxytocin concentration and the high-frequency component of the HRV (HF) increased and the HR decreased when a slow-tempo music sequence was presented. The salivary cortisol level decreased and the ratio of the low-frequency HRV component (LF) to HF (LF/HF) increased when a fast-tempo music sequence was presented. The ratio of the change in the oxytocin level was correlated with the change in HF, LF/HF and HR, whereas that in the cortisol level did not show any correlation with indices of autonomic nerve activity. There was no correlation between the change in oxytocin level and self-reported emotions, while the change in cortisol level correlated with the arousal level. These findings suggest that listening to slow-tempo and fast-tempo music is accompanied by an increase in the oxytocin level and a decrease in the cortisol level, respectively, and imply that such music listening-related changes in oxytocin and cortisol are involved in physiological relaxation and emotional excitation, respectively.
Article
Full-text available
From the biological perspective, human musicality refers to the set of abilities that enable the recognition and production of music. Since music is a complex phenomenon consisting of features that represent different stages of the evolution of human auditory abilities, the question concerning the evolutionary origin of music must focus mainly on music-specific properties and their possible biological function or functions. What usually differentiates music from other forms of human sound expression is a syntactically organized structure based on pitch classes and rhythmic units measured in reference to a musical pulse. This structure is an auditory (not acoustical) phenomenon, meaning that it is a human-specific interpretation of sounds achieved thanks to certain characteristics of the nervous system. The historical and cross-cultural diversity of this structure indicates that learning is an important part of the development of human musicality. However, the fact that there is no culture without music, the syntax of which is implicitly learned and easily recognizable, suggests that human musicality may be an adaptive phenomenon. If the use of a syntactically organized structure as a communicative phenomenon were adaptive, it would be only in circumstances in which this structure is recognizable by more than one individual. Therefore, there is a problem in explaining the adaptive value of an ability to recognize a syntactically organized structure that appeared accidentally, as the result of mutation or recombination, in an environment without such a structure. A possible solution lies in the Baldwin effect, in which a culturally invented trait is transformed into an instinctive trait by means of natural selection. It is proposed that in the beginning musical structure was invented and learned thanks to neural plasticity. Because structurally organized music proved adaptive (phenotypic adaptation), e.g., as a tool of social consolidation, our predecessors started to spend a lot of time and energy on music. In such circumstances, eventually an individual was born with the genetically controlled development of new neural circuitry which allowed him or her to learn music faster and with less energy use.
Article
Full-text available
Human interaction through music is a vital part of social life across cultures. Influential accounts of the evolutionary origins of music favor cooperative functions related to social cohesion or competitive functions linked to sexual selection. However, work on non-human “chorusing” displays, as produced by congregations of male insects and frogs to attract female mates, suggests that cooperative and competitive functions may coexist. In such chorusing, rhythmic coordination between signalers, which maximizes the salience of the collective broadcast, can arise through competitive mechanisms by which individual males jam rival signals. Here, we show that mixtures of cooperative and competitive behavior also occur in human music. Acoustic analyses of the renowned St. Thomas Choir revealed that, in the presence of female listeners, boys with the deepest voices enhance vocal brilliance and carrying power by boosting high spectral energy. This vocal enhancement may reflect sexually mature males competing for female attention in a covert manner that does not undermine collaborative musical goals. The evolutionary benefits of music may thus lie in its aptness as a medium for balancing sexually motivated behavior and group cohesion.
Article
Human listeners prefer octave intervals slightly above the exact 2:1 frequency ratio. To study the neural underpinnings of this subjective preference, called the octave enlargement phenomenon, we compared neural responses between exact, slightly enlarged, oversized, and compressed octaves (or their multiples). The first experiment (n = 20) focused on the N1 and P2 event-related potentials (ERPs) elicited in the electroencephalogram (EEG) 50–250 ms after second-tone onset during passive listening to one-octave intervals. In the second experiment (n = 20), applying four-octave intervals, musician participants actively rated the different octave types as ‘low’, ‘good’ and ‘high’. The preferred slightly enlarged octave was individually determined prior to the second experiment. In both experiments, N1-P2 peak-to-peak amplitudes were attenuated for the exact and slightly enlarged octave intervals compared with compressed and oversized intervals, suggesting overlapping neural representations of tones an octave (or its multiples) apart. While there were no differences between the N1-P2 amplitudes to the exact and preferred enlarged octaves, ERP amplitudes differed after 500 ms from onset of the second tone of the pair. In the multivariate pattern analysis (MVPA) of the second experiment, the different octave types were distinguishable (spatial classification across EEG channels) 200 ms after second-tone onset. Temporal classification within channels suggested two separate discrimination processes peaking around 300 and 700 ms. These findings appear to be related to active listening, as no multivariate results were found in the first, passive-listening experiment. The present results suggest that the subjectively preferred octave size is resolved at late stages of auditory processing.
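The stimulus manipulation is naturally parameterized in cents: an octave enlarged or compressed by c cents corresponds to a frequency ratio of 2 × 2^(c/1200). A small Python sketch (the offset values are illustrative, not the study's exact stimuli):

```python
def octave_partner(f0_hz, n_octaves=1, offset_cents=0.0):
    """Frequency n octaves above f0_hz, stretched or compressed by offset_cents."""
    return f0_hz * 2 ** (n_octaves + offset_cents / 1200)

f0 = 440.0
for label, c in [("compressed", -30), ("exact", 0),
                 ("slightly enlarged", 15), ("oversized", 60)]:
    print(f"{label:>17}: {octave_partner(f0, 1, c):7.2f} Hz")
# exact octave = 880.00 Hz; +15 cents gives ~887.65 Hz, -30 gives ~864.87 Hz.
```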
Article
Two notes separated by a doubling in frequency sound similar to humans. This “octave equivalence” is critical to perception and production of music and speech and occurs early in human development. Because it also occurs cross-culturally, a biological basis of octave equivalence has been hypothesized. Members of our team previously suggested that four human traits are at the root of this phenomenon: (1) vocal learning, (2) clear octave information in vocal harmonics, (3) differing vocal ranges, and (4) vocalizing together. Using cross-species studies, we can test how relevant these respective traits are, while controlling for enculturation effects and addressing questions of phylogeny. Common marmosets possess forms of three of the four traits, lacking differing vocal ranges. We tested 11 common marmosets by adapting an established head-turning paradigm, creating a parallel test to an important infant study. Unlike human infants, marmosets responded similarly to tones shifted by an octave or other intervals. Because previous studies with the same head-turning paradigm produced differential results to discernible acoustic stimuli in common marmosets, our results suggest that marmosets do not perceive octave equivalence. Our work suggests that differing vocal ranges between adults and children, and between men and women, and the way they are used in singing together, may be critical to the development of octave equivalence. Research Highlights: A direct comparison of octave equivalence tests with common marmosets and human infants. Marmosets show no octave equivalence. Results emphasize the importance of differing vocal ranges between adults and infants.
Preprint
Both music and language are found in all known human societies, yet no studies have compared similarities and differences between song, speech, and instrumental music on a global scale. In this Registered Report, we analyzed two global datasets: 1) 300 annotated audio recordings representing matched sets of traditional songs, recited lyrics, conversational speech, and instrumental melodies from our 75 coauthors speaking 55 languages; and 2) 418 previously published adult-directed song and speech recordings from 209 individuals speaking 16 languages. Of our six pre-registered predictions, five were strongly supported: relative to speech, songs use 1) higher pitch, 2) slower temporal rate, and 3) more stable pitches, while both songs and speech used similar 4) pitch interval size, and 5) timbral brightness. Exploratory analyses suggest that features vary along a “musi-linguistic” continuum when including instrumental melodies and recited lyrics. Our study provides strong empirical evidence of cross-cultural regularities in music and speech.
Article
Social perceptions of speakers are influenced by their voice information, including vocal characteristics and semantic content. Our study investigated how individuals’ warmth- and competence-related perceptions of speakers were affected by vocal pitch levels (i.e., high/low) and three kinds of semantic cues (i.e., prosocial, antisocial, and neutral) simultaneously. We have three key findings. First, antisocial cues negatively affected social perceptions, regardless of speakers’ gender. However, prosocial cues did not have positive impacts on evaluations of speakers because ratings were similar between prosocial cues and neutral cues. Second, female vocal pitch mattered for warmth-related perceptions but not for competence-related perceptions. The role of semantic cues should be additionally considered when investigating the impact of male vocal pitch on these perceptions. For example, higher-pitched men in prosocial contexts were perceived as warmer, while low-pitched men in antisocial contexts were judged as more competent. Third, the connection between vocal pitch and two kinds of perceptions showed an opposite trend, in which high pitch was related to more warmth but less competence, while the low pitch was associated with less warmth but more competence. These findings extend the understanding of the role of vocal pitch in the formation of stereotypes of strangers in different semantic contexts.
Article
Human speech production obeys the same acoustic principles as vocal production in other animals but has distinctive features: A stable vocal source is filtered by rapidly changing formant frequencies. To understand speech evolution, we examined a wide range of primates, combining observations of phonation with mathematical modeling. We found that source stability relies upon simplifications in laryngeal anatomy, specifically the loss of air sacs and vocal membranes. We conclude that the evolutionary loss of vocal membranes allows human speech to mostly avoid the spontaneous nonlinear phenomena and acoustic chaos common in other primate vocalizations. This loss allows our larynx to produce stable, harmonic-rich phonation, ideally highlighting formant changes that convey most phonetic information. Paradoxically, the increased complexity of human spoken language thus followed simplification of our laryngeal anatomy.
Article
Musicians say that the pitches of tones with a frequency ratio of 2:1 (one octave) have a distinctive affinity, even if the tones do not have common spectral components. It has been suggested, however, that this affinity judgment has no biological basis and originates instead from an acculturation process: the learning of musical rules unrelated to auditory physiology. We measured, in young amateur musicians, the perceptual detectability of octave mistunings for tones presented alternately (melodic condition) or simultaneously (harmonic condition). In the melodic condition, mistuning was detectable only by means of explicit pitch comparisons. In the harmonic condition, listeners could use a different and more efficient perceptual cue: in the absence of mistuning, the tones fused into a single sound percept; mistunings decreased fusion. Performance was globally better in the harmonic condition, in line with the hypothesis that listeners used a fusion cue in this condition; this hypothesis was also supported by results showing that an illusory simultaneity of the tones was much less advantageous than a real simultaneity. In the two conditions, mistuning detection was generally better for octave compressions than for octave stretchings. This asymmetry varied across listeners, but crucially the listener-specific asymmetries observed in the two conditions were highly correlated. Thus, the perception of the melodic octave appeared to be closely linked to the phenomenon of harmonic fusion. As harmonic fusion is thought to be determined by biological factors rather than factors related to musical culture or training, we argue that octave pitch affinity also has, at least in part, a biological basis.
Chapter
Introduction The nature of the following work will be best understood by a brief account of how it came to be written. During many years I collected notes on the origin or descent of man, without any intention of publishing on the subject, but...
Article
Music comprises a diverse category of cognitive phenomena that likely represent both the effects of psychological adaptations that are specific to music (e.g., rhythmic entrainment) and the effects of adaptations for non-musical functions (e.g., auditory scene analysis). How did music evolve? Here, we show that prevailing views on the evolution of music — that music is a byproduct of other evolved faculties, evolved for social bonding, or evolved to signal mate quality — are incomplete or wrong. We argue instead that music evolved as a credible signal in at least two contexts: coalitional interactions and infant care. Specifically, we propose that (1) the production and reception of coordinated, entrained rhythmic displays is a co-evolved system for credibly signaling coalition strength, size, and coordination ability; and (2) the production and reception of infant-directed song is a co-evolved system for credibly signaling parental attention to secondarily altricial infants. These proposals, supported by interdisciplinary evidence, suggest that basic features of music, such as melody and rhythm, result from adaptations in the proper domain of human music. The adaptations provide a foundation for the cultural evolution of music in its actual domain, yielding the diversity of musical forms and musical behaviors found worldwide.
Article
Musical pitch perception is argued to result from nonmusical biological constraints and thus to have similar characteristics across cultures, but its universality remains unclear. We probed pitch representations in residents of the Bolivian Amazon (the Tsimane'), who live in relative isolation from Western culture, as well as US musicians and non-musicians. Participants sang back tone sequences presented in different frequency ranges. Sung responses of Amazonian and US participants approximately replicated heard intervals on a logarithmic scale, even for tones outside the singing range. Moreover, Amazonian and US reproductions both deteriorated for high-frequency tones even though they were fully audible. But whereas US participants tended to reproduce notes an integer number of octaves above or below the heard tones, Amazonians did not, ignoring the note "chroma" (C, D, etc.). Chroma matching in US participants was more pronounced in US musicians than non-musicians, was not affected by feedback, and was correlated with similarity-based measures of octave equivalence as well as the ability to match the absolute f0 of a stimulus in the singing range. The results suggest the cross-cultural presence of logarithmic scales for pitch, and biological constraints on the limits of pitch, but indicate that octave equivalence may be culturally contingent, plausibly dependent on pitch representations that develop from experience with particular musical systems.
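The distinction between reproducing an interval on a logarithmic scale and matching chroma can be made concrete. In the sketch below (a hypothetical scoring helper, not the authors' analysis code; frequencies are illustrative), a response matches chroma when it lies an integer number of octaves from the heard tone:

```python
import math

def semitones(f_ref, f):
    """Interval from f_ref to f on a logarithmic scale (12 semitones/octave)."""
    return 12 * math.log2(f / f_ref)

def chroma_error(f_heard, f_sung):
    """Distance in semitones from f_sung to the nearest tone sharing
    f_heard's chroma, i.e. lying an integer number of octaves away."""
    interval = semitones(f_heard, f_sung)
    return abs(interval - 12 * round(interval / 12))

# A heard 1760 Hz tone (two octaves above A4 at 440 Hz) sung back near
# 440 Hz preserves chroma; a response near 466 Hz (roughly A#4) does not:
print(chroma_error(1760.0, 441.0))  # ~0.04 semitones: chroma matched
print(chroma_error(1760.0, 466.0))  # ~0.99 semitones: chroma missed
```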
Chapter
Introduction This chapter explores the relationships between different types of play behaviours in apes and humans and their connection with the emergence of certain critical cognitive skills, including some of those required for ritual behaviours and supernatural beliefs. It examines the relationship between these play behaviours, especially pretend play, and life-history stages in ape and human development, in particular infancy and early childhood. It then goes on to discuss the palaeoanthropological evidence for the appearance in human evolution of a modern human-like pattern of these life-history stages, and the implications this may have for the emergence of pretend play and the abilities that underlie it, including imitation.

LIFE-HISTORY STAGES IN APES AND HUMANS

Before exploring the different types of play behaviours exhibited in apes and humans, it is necessary to outline the commonly recognised developmental stages undergone by immature apes and humans, as a clear conception of these periods is relevant to the discussion of the incidence of play behaviours. Humans and apes show broadly very similar life-history trajectories, but with some important differences to which great significance has been attributed in terms of the development of the cognitive abilities characteristic of modern humans. The stages of development in apes and humans have been variously named and defined, but it is necessary here to identify terms and characteristics that can be used consistently. For example, Hochberg (2008) and Geary and Bjorklund (2000) use the terms infancy, childhood, juvenility and adolescence. Many researchers prefer to refer to childhood as early childhood and the following period as middle childhood/juvenile (e.g. Smith 2010; Thompson & Nelson 2011), and this is the convention followed here. These stages are characterised by the following features in apes and humans (see also Figure 6.1):

Infancy: The period in which offspring are breastfed; in traditional (hunter-gatherer) human societies, this is typically until the age of around 3 years. In chimpanzees, weaning occurs around 4-5 years of age (Geary & Bjorklund 2000).
Article
Synchronization of behavior has repeatedly been shown to increase pain threshold, which is understood to be an indicator of endorphin activity. Although Weinstein et al. found that large and small groups showed the same effect, to date no study has directly manipulated group size to determine whether it affects the change in pain threshold. Thirty-three participants rowed two 20-min time trials in two counterbalanced conditions: paired and large group. Pain threshold was assessed before, immediately after, 5 min after, and 10 min after each session. A repeated-measures (3 × 2) factorial ANOVA revealed a significant interaction between condition and time; specifically, pain threshold was significantly higher in the large-group condition than in the paired condition after 10 min of exercise.
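As a rough illustration of the reported design, the following Python sketch runs a two-way repeated-measures ANOVA with statsmodels' AnovaRM on synthetic data; the column names, and the reading of the 3 × 2 design as three post-exercise change scores crossed with two conditions, are assumptions rather than the authors' analysis code:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)

# Long-format table: one row per participant x condition x time point,
# with change in pain threshold from the pre-exercise baseline as the
# dependent variable (synthetic values, for illustration only).
rows = [
    {"subject": s, "condition": cond, "time": time,
     "delta_threshold": rng.normal(loc=1.0, scale=0.5)}
    for s in range(33)
    for cond in ("paired", "large_group")
    for time in ("post_0", "post_5", "post_10")
]
df = pd.DataFrame(rows)

# Repeated-measures ANOVA with two within-subject factors
# (3 time points x 2 conditions); the condition x time interaction is
# the effect reported in the abstract.
result = AnovaRM(df, depvar="delta_threshold", subject="subject",
                 within=["time", "condition"]).fit()
print(result.anova_table)
```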