Content available from Humanities and Social Sciences Communications
ARTICLE
Languages in China link climate, voice quality, and
tone in a causal chain
Yuzhu Liang1, Lining Wang2, Søren Wichmann3, Quansheng Xia4, Shuai Wang1, Jun Ding1,
Tianheng Wang1 & Qibin Ran1,5 ✉
Are the sound systems of languages ecologically adaptive like other aspects of human
behavior? In previous substantive explorations of the climate–language nexus, the hypothesis
that desiccation affects the tone systems of languages was not well supported. The lack of
analysis of voice quality data from natural speech undermines the credibility of the following
two key premises: the compromised voice quality caused by desiccated ambient air and
constrained use of phonemic tone due to a desiccated larynx. Here, the full chain of
causation, humidity → voice quality → number of tones, is for the first time strongly supported
by direct experimental tests based on a large speech database (China’s Language Resources
Protection Project). Voice quality data is sampled from a recording set that includes 997
language varieties in China. Each language is represented by about 1200 sound files,
amounting to a total of 1,174,686 recordings. Tonally rich languages are distributed
throughout China and vary in their number of tones and in the climatic conditions of their
speakers. The results show that, first, the effect of humidity is large enough to influence the
voice quality of common speakers in a naturalistic environment; secondly, poorer voice
quality is more likely to be observed in speakers of non-tonal languages and languages with
fewer tones. Objective measures of phonatory capabilities help to disentangle the humidity
effect from the contribution of phylogenetic and areal relatedness to the tone system. The
prediction of ecological adaptation of speech is first verified through voice quality analysis.
Humidity is observed to be related to synchronic variation in tonality. Concurrently, the
findings offer a potential trigger for diachronic changes in tone systems.
https://doi.org/10.1057/s41599-023-01969-4 OPEN
1School of Liberal Arts, Nankai University, Tianjin, China. 2Center for the Protection and Research of Language Resources of China, Beijing Language and
Culture University, Beijing, China. 3Cluster of Excellence ROOTS, Kiel University, Kiel, Germany. 4College of Chinese Language and Culture, Nankai
University, Tianjin, China. 5Laboratory of Social Science of Tianjin, Nankai University, Tianjin, China. ✉email: ranqibin@nankai.edu.cn
HUMANITIES AND SOCIAL SCIENCES COMMUNICATIONS | (2023) 10:453 | https://doi.org/10.1057/s41599-023-01969-4 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Introduction
Human behavior, such as phenotypes and survival strate-
gies, is well-adapted to most features of contemporary
environments (Chagnon and Irons, 2002). For example,
the internal nasal fossa and mid-facial morphology show an
ecogeographical distribution consistent with climate adaptation,
partly because the nasal cavity plays a major role in adapting to
extremely cold and dry climates (Maddux et al., 2017; Evteev
et al., 2014). Ancient farmers developed adaptive strategies to
cope with changes in crop yields under environmental and cli-
mate changes such as cooling events (d’Alpoim Guedes and
Bocinsky, 2018). Among all human behaviors, using language to
communicate is one of the most distinctive traits separating
humans from other animals. In view of the universal adaptability of
human biology and human behavior, a growing number of studies
have attempted to trace the relationship between ecological
factors and possibly adaptive elements of language.
Previous studies have provided evidence for the hypothesis that
changes in the sound system are ecologically adaptive. These
studies have explored various correlations, such as the correlation
between the reduction of ambient air pressure and the use of
ejectives, and the correlation between climate and sonority classes
(Maddieson and Coupé, 2015; Munroe et al., 2009; Ember and
Ember, 2007; Everett, 2013; Everett et al., 2015; Everett, 2017).
One such exploration is presented by Everett et al. (2015), who
demonstrated a statistical association between ambient desiccation
and the absence of lexical tone. The authors submit that
complex tone should be more difficult to achieve in arid climates
than in warmer and more humid climates given that inhalation of
dry air impacts vocal fold physiology and that production of tones
requires relatively precise manipulation of the vocal folds. Most
commentaries have agreed in general that language is ecologically
adaptive (Boer, 2016; Donohue, 2016; Ladd, 2016; Everett et al.,
2016b; Hammarström, 2016; Collins, 2016; Winter and Wedel,
2016). However, there has been much debate about the more
specific hypothesis vis-à-vis desiccation and tonality. On the one
hand, the suggestion is supported, at least indirectly, by extensive
experimental evidence from laryngology. Everett et al. (2015)
offered global, continental, and linguistic family-level data con-
sistent with the geography–tone association. On the other hand,
the discovered patterns of humidity–tonality are not buttressed by
natural speech analysis. The two premises are that (1) desiccated
ambient air results in compromised voice quality and (2) a
desiccated larynx constrains the use of phonemic tone (Everett,
2017; Ladd, 2016; Everett, 2021). In the absence of support for
these two key premises, the discovered patterns of
humidity–tonality could potentially be interpreted as an epiphe-
nomenon of the geographical distribution of languages (Collins,
2016; Winter and Wedel, 2016). Subsequent works presented
analyses with a continuous measure of tone and found a positive
correlation between humidity and tone, but the significance dis-
appears when controlling for relative genealogical distance
(Hammarström, 2016; Roberts, 2018). Thus, whether the absence
of ambient humidity negatively correlates with the presence of
tone remains unresolved.
Regarding the first premise, namely that desiccated air affects
voice quality, the debate revolves around the question of whether
the desiccation effect is large enough to impact laryngeal pitch
control (Donohue, 2016). Everett et al. offered a brief meta-
analysis of relevant studies from laryngology, showing that lar-
yngeal desiccation impacts the viscoelasticity of the vocal folds
and that the desiccation of vocal cords leads to greater perceived
phonatory effort on the part of speakers (Everett et al., 2015;
Everett, 2017). However, some have argued that the impact of
desiccated air on the vocal cords is minor (Boer, 2016). Everett
et al. countered this view with reference to several lines of
evidence relating to special environments or special populations
such as winter athletes, singers, and patients with respiratory
diseases (Everett et al., 2016b; Sue-Chu, 2012; Koskela, 2007).
Although previous studies on the effect of hydration on voice
quality were conducted with normal subjects (Leydon et al., 2009;
Alves et al., 2019), they were conducted under laboratory con-
ditions. None of the studies has provided direct evidence that the
effect of desiccation, after long-term provocation periods, is large
enough to increase the jitter of common speakers, in a naturalistic
environment in which speech is used.
Concerning the second premise, that a desiccated larynx con-
strains the use of phonemic tone, it has been debated whether
languages with complex tones really do rely more on precise
laryngeal pitch control (Everett, 2017; Ladd, 2016). Everett et al.
(2015) categorized languages as having or not having ‘complex
tonality’. Here ‘complex tone’ is defined as a tone system with
three or more tonemic contrasts according to Maddieson (2013).
Yet, languages vary non-discretely in tone. The characterization
of complex tonality by Everett et al. (2015) is somewhat sim-
plistic. Tone is not a simple pitch or fundamental frequency but
can involve several other kinds of cues, and it is often not possible
to identify a single cue that is responsible for all contrasts
(Donohue, 2016). For instance, many Hmongic languages are
known to express different tonal categories through a mesh of
cues that includes breathiness and creakiness in addition to modal
phonation. Thus, exclusively investigating pitch or fundamental
frequency when contrasting tone systems cross-linguistically is
insufficient as a basis for judging whether tonal languages require
more precise laryngeal control (Ladd, 2016; Gussenhoven, 2016).
Whether tonal contrasts rely on phonemic pitch or on the length and
specific phonations involved in tonal complexity, regular vocal fold
vibration is required. Measurements of perturbation in voice frequency
(jitter) and amplitude (shimmer) can quantify the regularity and
hence the stability of vocal fold vibration (Brockmann et al.,
2011). It is not clear whether minor effects of humidity on jitter
rates can impact tone production in normal speech (Everett,
2017; Ladd, 2016), since studies have not yet addressed this issue.
Based on the two premises, the suggestion was made to use
voice quality measurements as an intermediate process to help
establish a full causal chain fleshing out the hypothesis of the
relationship between humidity and tonality (Everett et al., 2015).
Previous research (Everett et al., 2015; Hammarström, 2016;
Roberts, 2018) simply tested a correlation between the variables at
either end of the chain. That is, previous work analyzed corre-
lations between humidity and tones in databases only, without
analyzing actual speech data—the key middle link of the causal
chain. When directly predicting the distribution of tone using
humidity, juggling with multicollinearity between environmental
and sociocultural predictors (such as population, language
families, and language contact) makes it hard to derive causal
mechanisms from correlation patterns. A direct experimental test
of the full chain of causation, humidity → voice quality → tonality,
requires voice quality measurements extracted from natural speech,
and hence a cross-linguistic speech database. All audio in the
database should be produced according to the same standard
because acoustic voice features are sensitive to environmental
noise and signal amplitude, which depends on the location of
equipment and the quality of the microphone (Fahed et al., 2022;
Uloza et al., 2021). Additionally, acoustic voice features of dif-
ferent genders and age groups also show significant differences
(Brockmann et al., 2011; Schultz et al., 2021).
Here, we establish two more fine-grained causal links, humidity
→ voice quality and voice quality → number of tones, aiming to
strengthen the evidence linking humidity with tonality. Currently,
China’s Language Resources Protection Project (Zhongguo Yuyan
Ziyuan Baohu Gongcheng, abbreviated as YuBao) provides a
large, standardized, high-quality audio and video database that
can be used to extract voice quality data from natural speech. The
database covers language varieties from 1718 locations in China.
The creators of the database have set strict and uniform standards for
the recording process, including recording environment, equip-
ment, parameters, and speaker selection. Other speech databases
around the world struggle to meet all these requirements for
large-scale, high-quality recording, and uniform recording stan-
dards at the same time (Heggarty et al., 2019). In addition to
meeting the above conditions, the YuBao database offers two
advantages in studying humidity–tonality patterns: (1) the data-
base covers a large number of tonally rich languages, (2) the
locations of the sampled languages cover diverse climates. Owing
to tremendous differences in latitude, longitude, and altitude, the
climate of China is extremely diverse, ranging from tropical in the
far south to subarctic in the far north and alpine in the higher
elevations of the Tibetan Plateau (Lü and Li, 2012). China is a
natural testing ground for the effects of desiccated air on the
larynx, given the great variation in the number of tones across
different languages and differences in the climatic conditions
experienced by the speakers of these languages (Collins, 2016)
(see Fig. 1a, c).
We will examine the distribution of tonal patterns in China
and address the issue of humidity’s effect by drawing upon
phonetic and phonological data for a large set of languages. We
are pursuing three goals. First, we will analyze whether lower
humidity leads to poorer voice quality. Second, we will investigate
the hypothesis that poor voice quality has an effect on the number
of tones. Third, we re-test the correlation between humidity and
the number of tones, comparing our results with previous ones
(Hammarström, 2016; Roberts, 2018).
Methods
Rather than employing simplistic binning strategies in the cate-
gorization of linguistic and geographic variables, we use con-
tinuous variables, specifically humidity, jitter, shimmer, and
number of tones. We relied on recordings and phonotactics from
the YuBao database. With authorization, we downloaded
recordings pertaining to a list of 1200 lexical items in 997 language
varieties, all of which were used in the current study. We
analyzed a total of 1,174,686 recordings, which included samples
from a few locations where not all 1200 lexical items were present.
All the recordings in the case study were digitized at a sampling
rate of 44,100 Hz, 16 bits per sample. The recordings for every
language variety were made according to strict standards that
regulated the speakers, recordings, videos, and transcriptions. To
control for noise, it was recommended to record in a professional
recording studio or a quiet room with doors and windows closed
and with electrical appliances such as fans, air conditioners,
fluorescent lights, and mobile phones turned off. These standards
also require controlling background noise to be below −60 dB and
no louder than −48 dB, and speech volume should reach a
maximum of −18 dB or lower than −6 dB, with Audacity being
used as an example. For each language variety, the metadata
consist of a location (province, city/district, and town/county)
and the data consist of recordings of the 1200-item list from one
native male speaker aged 55–65.
Fig. 1 The 997 language varieties are depicted in the maps. The humidity values (g/kg) associated with them are visualized in plot (a), jitter rates in plot
(b), the number of tones in plot (c), and the language family membership in plot (d). The maps were generated using the pyecharts Python package.
Forty-eight (4.81%) varieties in the dataset are classified as
non-tonal, and the remainder are tonal. For the latter, we
extracted the number of tones for each language variety (Chinese
Academy of Social Sciences, 2017a, 2017b). The number of tones,
or rather the number of tone oppositions, includes all the con-
trasts that produce meaning differentiation at the word level. For
example, the Jianchuan variety of Bai (ISO 639-3 code: bca) has
eight tones, divided between those with modal (e.g., [tɕi33] ‘pull’)
and non-modal phonation (e.g., [tɕi42] ‘chase’). The Chao tone
numerals following the string of IPA characters indicate pitches
relative to the natural pitch range of a particular speaker’s voice.
The pitch level of words with non-modal phonation is higher
than that of words with modal phonation (Editorial Board of
Chinese Minority Languages, 2009). For the 997 languages used
in the analysis, the number of tones ranged from 0 to 14, with a mean of 5. The
concept of ‘complex tone system’ (Everett et al., 2015) is useful
for broad comparisons on a worldwide scale, but at the more local
level of Sino–Tibetan, Hmong–Mien, and Kam–Tai languages, a
more fine-grained categorization is needed, since most of the
languages would fit a general definition of complex tone systems.
For instance, the Mandarin tone system, which has four tonemic
contrasts, is evidently less complex than that of the Jianchuan
variety of Bai, even if both languages are plausibly classified as
having complex tone systems. Thus, the concept of ‘complex
tone’ is avoided here.
Commonly used acoustic measures for voice quality analysis
include jitter, shimmer, HNR (harmonics-to-noise ratio), fundamental
frequency, PTP (phonation threshold pressure), and PPE
(perceived phonatory effort) (Leydon et al., 2009; Alves et al.,
2019; Gussenhoven, 2016). Dehydration, water ingestion, and
steam inhalation (rehydration) can significantly affect jitter and
shimmer (Alves et al., 2019; Mahalingam and Boominathan,
2016). Jitter and shimmer are more sensitive to modest increases
or decreases in humidity than other measures. Jitter and shimmer
values were extracted using Praat (see Supplementary Text) and
averaged over the recordings of the 1200-item list at each location,
with both measures expressed as percentages. Specific humidity, the
mass of water vapor per unit mass of moist air, was chosen as the main
ecological variable. We obtained specific humidity data from the
WheatA database for locations associated with languages across
China from 1982 to 2021. Humidity is expressed in g/kg. For each
location, the mean specific humidity was calculated
across all years and months. The humidity ranged from 2.24 to
16.24 g/kg, with an average of 9.158 g/kg. The highest humidity was
recorded in Yazhou District, Sanya City, Hainan Province
(18.363°, 109.178°), at 16.24, while the lowest was in Ritu County,
Ngari Prefecture, Tibet Autonomous Region (33.383°, 79.739°), at
2.24. In total, there are 997 locations (speakers). Each location has
a humidity datum, a jitter datum, a shimmer datum, and a tone
total (number of tones).
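To make the two perturbation measures concrete, the following is a minimal sketch of the widely used "local" definitions of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value, expressed as a percentage. It assumes that glottal period durations and per-cycle peak amplitudes have already been extracted; the function names and the sample values are hypothetical, and the exact Praat settings used in the study are those given in its Supplementary Text, which are not reproduced here.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycles,
    relative to the mean value, expressed as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def jitter_percent(periods_s):
    """Local jitter: cycle-to-cycle perturbation of period durations (seconds)."""
    return local_perturbation_percent(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer: cycle-to-cycle perturbation of peak amplitudes."""
    return local_perturbation_percent(peak_amplitudes)


def location_mean(item_values):
    """Average a per-recording measure over all items recorded at one location."""
    return mean(item_values)


# Hypothetical values for one recording (not taken from the YuBao data).
example_periods = [0.0100, 0.0110, 0.0100, 0.0110]
example_amplitudes = [0.50, 0.55, 0.50, 0.55]

example_jitter = jitter_percent(example_periods)
example_shimmer = shimmer_percent(example_amplitudes)
```

Averaging `jitter_percent` over all recordings of a location with `location_mean` would give one jitter datum per location, in the spirit of the per-location averages described above.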
We used base R (R Core Team, 2018) and the lme4 package
(Bates et al., 2015) to perform linear mixed-effects analysis for
voice quality and generalized linear mixed-effects analysis for the
number of tones. When investigating a linguistic phenomenon
across multiple languages, neglecting the possibility that lan-
guages with a shared ancestor may also share similar features can
lead to incorrect conclusions. In this case, it is appropriate to
include linguistic family as a random effect in regression analysis
(Coupé, 2018). We expect that data pertaining to one and the
same language family are not independent and therefore model
language family, or linguistic groups, as random effects in all
analyses. As shown in Table 1 and Fig. 1d, the family factor has
six levels (Altaic, Austroasiatic, Sinitic, Hmong–Mien,
Tibeto–Burman, Kam–Tai). Generalized linear models allow for
the dependent variable to follow a non-normal distribution, such
as a Poisson distribution, which is suitable for data on phoneme
inventory sizes (Coupé, 2018).
We first fitted two linear mixed-effects models to examine the
effect of humidity on voice quality measures (jitter and shimmer).
Humidity was entered into the models as a fixed effect. As ran-
dom effects, we had intercepts for language families and a by-
family random slope for the effect of humidity. We also log-
transformed jitter and shimmer to achieve a more normal dis-
tribution of the data. Next, we used a generalized linear mixed-
effects model to predict the number of tones, using a Poisson
distribution to capture the discrete and skewed nature of the data.
We entered jitter and shimmer as fixed effects and did not include
interactions between independent variables to avoid making the
model more complex. As random effects, we included intercepts
for language families and by-family random slopes for the effects
of jitter and shimmer. Finally, we performed a generalized linear
mixed-effects analysis to examine the effect of humidity on the
number of tones, with only humidity included in the model.
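The Poisson part of this specification can be illustrated with a deliberately simplified sketch. It fits log(E[number of tones]) = b0 + b1 × jitter by Newton's method using only fixed effects; the by-family random intercepts and slopes of the models described above are omitted, so this is not the analysis performed in the study (which used lme4 in R). The function name and the data values are hypothetical.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method.

    Fixed effects only: random effects by language family are
    NOT modeled in this simplified sketch.
    """
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mus)
        h01 = sum(x * m for x, m in zip(xs, mus))
        h11 = sum(x * x * m for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data (not from the study): jitter in percent, tone counts.
jitter_values = [1.2, 1.5, 1.8, 2.1, 2.4, 2.7]
tone_counts = [8, 6, 6, 4, 3, 0]

intercept, slope = fit_poisson(jitter_values, tone_counts)
```

With these toy values the fitted slope is negative, meaning that the expected number of tones falls as jitter rises; exponentiating the slope gives the multiplicative change in the expected count per unit of jitter.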
Sinitic accounts for 76.62% of the total number of data points.
The remaining linguistic groups have fewer representatives. We
also carried out parallel statistical analyses of these linguistic
groups. We used a linear model fitting voice quality and a gen-
eralized linear model fitting the number of tones within each
linguistic group (see Supplementary Materials for details).
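The within-group analyses of voice quality can be sketched as a separate ordinary least-squares fit of log-transformed jitter on humidity for each linguistic group. The sketch below is illustrative only: the group labels, the helper names, and the data rows are hypothetical, and the synthetic rows are constructed to lie exactly on a line so that the recovered slopes are easy to check.

```python
import math
from collections import defaultdict


def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def within_group_fits(rows):
    """rows: iterable of (group, humidity_g_per_kg, jitter_percent).

    Returns {group: (slope, intercept)} for log(jitter) ~ humidity,
    fitted separately within each group.
    """
    groups = defaultdict(list)
    for group, humidity, jitter in rows:
        groups[group].append((humidity, math.log(jitter)))
    return {
        group: ols_slope_intercept([h for h, _ in pts], [lj for _, lj in pts])
        for group, pts in groups.items()
    }


# Synthetic rows (hypothetical, not study data), built to lie on exact lines.
rows = [("GroupA", h, math.exp(1.0 - 0.10 * h)) for h in (3.0, 5.0, 7.0, 9.0)]
rows += [("GroupB", h, math.exp(0.5 - 0.02 * h)) for h in (12.0, 13.0, 14.0, 15.0)]

fits = within_group_fits(rows)
```

A group whose locations span a narrow humidity range (like the hypothetical GroupB) has a small spread in the predictor, which in real, noisy data makes the slope harder to estimate reliably.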
Results and discussion
Humidity effect on voice quality. The linear mixed-effects models
for jitter and shimmer reveal a negative relationship with humidity,
as evidenced by the negative slopes in Fig. 2. The graph illustrates
that locations with lower humidity tend to have a higher percentage
of jitter and shimmer. The analysis of factorial experiments shows
that the fixed effect of humidity is significant (Bolker et al., 2022)
(jitter: χ² = 160.68, df = 1, p < 0.0001; shimmer: χ² = 42.58, df = 1,
p < 0.0001). Speakers living in more humid regions are more
likely to have better voice quality, and speakers living in drier
regions are more likely to have poorer voice quality (see Fig. 2).
Two groups, Sinitic and Tibeto–Burman, comprise 89.17% of the
varieties in the sample, with each having more than 100 repre-
sentatives. Although within-family regressions suggest that the
humidity effect is only significant in Sino–Tibetan, speakers of
linguistic groups located in humid regions (Kam–Tai,
Hmong–Mien, Austroasiatic) have lower jitter and shimmer.
Speakers of the linguistic group located in drier regions (Altaic)
have higher jitter and shimmer (see Fig. 3).
Table 1 Language varieties analyzed in this study.
Language family       Language subfamily        Number of language varieties
Sino–Tibetan [sit]    Sinitic (Chinese) [zhx]   764
Sino–Tibetan [sit]    Tibeto–Burman [tbq]       125
Hmong–Mien [hmx]      -                         35
Kra–Dai [taik1256]    Kam–Tai [kamt1241]        37
Altaic [tut]          Mongolic [xgn]            10
Altaic [tut]          Turkic [trk]              11
Altaic [tut]          Manchu–Tungusic [tuw]     1
Altaic [tut]          Koreanic [kore1284]       1
Austroasiatic [aav]   -                         13
ISO 639-3 codes and Glottocodes (if their ISO 639-3 codes are not given) are provided in square brackets.
This first experiment constitutes direct evidence in support
of the humidity–voice quality causal link. Why would less
humidity lead to higher jitter and shimmer in speakers? There
is a possible pathway linking humidity and voice quality. The
effect of desiccated air is the evaporation of the airway surface
liquid coating the vocal folds. Hypohydration alters the
vibratory characteristics of the vocal folds. Muscular function
varies according to hydration status, with increased fatigue
and decreased rapidity of movement resulting from water
deficit (Judelson et al., 2007). In naturalistic environments,
prolonged inhalation of relatively dry air results in decreased
efficiency of vocal fold vibration and compromised voice
quality. The laryngeal control of a speaker living in a dry
environment is not as precise as that of a speaker living in a
humid environment. This is a result of
long-term accumulative effects of climate rather than short-
term air provocation in experimental settings or extreme
environments.
In the within-family regressions for Kam–Tai, Hmong–Mien,
Austroasiatic, and Altaic, the effect of humidity on voice quality
was not significant, potentially due to the small sample sizes or
the small variance of humidity. The fact that humidified air does
not affect perturbations as systematically as dry air does can
account for the results pertaining to Kam–Tai, Hmong–Mien, and
Austroasiatic (Hemler et al., 1997). Everett et al. also used the
effect of humidified air to explain why humidity does not broadly
correlate with tonality (Everett et al., 2016a). Previous work has
reported that humidified air did not systematically influence
perturbation, at least in the short-term provocation period (Hemler
et al., 1997). Only two of the four previous studies found a
significant positive effect of higher humidity levels on PTP. Limited
significant effects were found for moderate humidity conditions
(Alves et al., 2019). In contrast, low-humidity environments
revealed more significant negative effects. One explanation is that
further humidification of inhaled air has no effect; another is that
a further decrease of perturbation may not be possible, especially
since control perturbation measurements of all subjects were low
and well within the normal range (Hemler et al., 1997). At Kam–Tai,
Hmong–Mien, and Austroasiatic locations, humidity not only varies
little but also has high average and minimum values (see
Supplementary Table S11). This suggests that, unlike low humidity,
limited variation within high humidity does not cause significant
changes in voice quality.
Fig. 2 Visualization of humidity effect on jitter, shimmer, and number of tones. The regression line was drawn using the generalized additive model
smooths method, and the figures were plotted using the ggplot2 R package.
Fig. 3 Jitter and shimmer for language varieties with associated specific humidity values. The regression line was drawn using the linear-smoothing
method.
Jitter and shimmer effects on number of tones. A generalized
linear mixed-effects model with the number of tones as the
dependent variable, and jitter and shimmer as the independent
variables, shows no significant relationship between jitter/shimmer
and the number of tones. Comparison of the full model with fixed
effects and the model without fixed effects reveals that including
jitter or shimmer as fixed effects in the model does not
significantly improve the model fit (χ² = 2.1275, df = 5, p = 0.8312).
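The p-values attached to such model comparisons can be recomputed from the reported statistics. The sketch below assumes that each reported χ² value is a likelihood-ratio statistic referred to a chi-square distribution with the stated degrees of freedom; it evaluates the upper-tail probability with closed-form series for integer degrees of freedom, using only the Python standard library. The function name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first series term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)  # next term of the series
    return total


# Statistics as reported in the text.
p_tones = chi2_sf(2.1275, 5)    # jitter/shimmer -> number of tones
p_jitter = chi2_sf(160.68, 1)   # humidity -> jitter
p_shimmer = chi2_sf(42.58, 1)   # humidity -> shimmer
```

Under that assumption, `p_tones` agrees with the reported p = 0.8312 to four decimal places, and both humidity tests fall far below the reported threshold of 0.0001.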
However, within-family regressions suggest that the jitter effect
is significant in Sino–Tibetan and Austroasiatic languages (see
Fig. 4 and Supplementary Table S5). Including shimmer as a fixed
effect in the models does not significantly improve the model.
Altaic languages, as well as some Austroasiatic and
Tibeto–Burman languages, are non-tonal; the mean humidity of
these is 4.972, which is only greater than the humidity in 10.63%
of locations. Their mean jitter and shimmer are 2.478 (greater
than the jitter in 78.94% of locations) and 12.10 (greater than the
shimmer in 83.85% of the locations), respectively.
In the previous section on the effect of humidity on voice
quality, we observed that humidity affects both jitter and
shimmer, but the impact on shimmer is not as prominent as
the impact on jitter. The adjusted R² results based on the linear
models of linguistic groups demonstrate that humidity predicts
jitter better than shimmer (refer to Supplementary
Tables S2 and S4). Studies that have investigated the effects of
systemic hydration on shimmer indicate that shimmer values are less
accurate in speech signals than jitter values (Alves et al., 2019).
Although both jitter and shimmer are time-based, jitter is more
dependent on fundamental frequency (Shu et al., 2022). Tone is
more associated with variations in fundamental frequency than
with jitter. However, no statistical difference was found for the
effect of systemic or surface hydration on fundamental frequency
(Alves et al., 2019). Therefore, the acoustic measure of voice
quality, jitter, is not only well-predicted by humidity but is also an
effective predictor of the number of tones.
A poorer voice quality is more likely to be observed in the
speakers of non-tonal languages and languages with fewer tones,
which occurs in Altaic and Sino–Tibetan. Speakers of
Sino–Tibetan languages with more tones are more likely to have
better voice quality. Although the jitter effect was not significant
across all within-family regressions, linguistic groups with the
lowest jitter (Kam–Tai, Hmong–Mien) have a larger number of
tones, while the linguistic group with the highest jitter (Altaic)
represents non-tonal languages. These results suggest that
synchronic variation in tonality is related to minor differences
in jitter rates.
Why might higher jitter be associated with tone reduction?
Maintaining adequate tone usage in communication is challen-
ging when the efficiency of vocal fold vibration decreases,
resulting in weaker lexical tone distinctions. Irregular and
aperiodic vocal fold vibrations impede the effortless production
of both vowels and tones. Vowels require high-amplitude vocal
fold vibration in nearly all cases. However, as phonatory effort
increases, the maintenance of adequate tone use becomes more
challenging than maintaining vowel use. Vowel differences
contribute more significantly than tone differences in dialect
perception (Liu et al., 2020). During language acquisition,
children demonstrated reduced sensitivity to tone mispronuncia-
tions relative to vowel mispronunciations (Wewalaarachchi and
Singh, 2015; Singh et al., 2015). Deficient use of tonemes is
permissible in communication, whereas vowel mispronunciation
is easily perceived by listeners.
A causal effect between voice quality and the number of tones
exists but is not robust. Beyond a certain point, an improvement
in voice quality may not result in a significant increase in the
number of tones produced, just as further humidification of
inhaled air beyond a certain point may not have a noticeable
effect on vocal fold vibration. The mean values of jitter of
Kam–Tai and Hmong–Mien are low (see Supplementary Table
S9). Hmong–Mien and Kam–Tai generally have the largest
number of tone categories due to tonogenesis and tone change.
Although Hmong–Mien and Kam–Tai maintain their tonal
complexity as they are less prone to influence from poor voice
quality, better voice quality may not increase the number of tones
any further, as the number of tones is unlikely to break the tone
inventory cap. The extreme tone inventories are limited by their
physiological basis (Ran, 2016). In humid areas or areas
categorized as not dry, the vocal fold control is not hampered
by humid air, and the normal voice quality does not impede the
use or development of phonemic tone.
Fig. 4 Number of tones and associated jitter values for 997 language varieties. The regression line was drawn based on the generalized linear-
smoothing method.
The higher jitter or shimmer in Altaic languages may be caused
by complex syllable structures. Altaic languages are non-tonal. As
Maddieson notes, non-tonal languages are considerably more
likely to have complex syllable structures (Maddieson, 2013), and
this is true of Altaic languages, which permit freer combinations
of two or more consonants in the position after a vowel. The first
consonants of clusters are mainly sonorants including liquids,
nasals, and glides. Meanwhile, languages with complex syllables
have fewer vowels (Everett, 2021). Including more consonant
clusters or fewer vowels in a syllable may increase the jitter of the
syllable, because consonants within normal speech will appear to
have little periodicity, whereas sustained vowels will appear to be
strongly periodic (Farideh et al., 2021). Vowels are richer in
glottal vibrations compared to consonants (Saggio and
Costantini, 2020). Higher jitter or shimmer may result from
complex syllable structures, which are more likely to occur in
non-tonal languages, and lower jitter or shimmer may result from
moderately complex syllable structures, which are associated with
the occurrence of complex tone systems. Thus, the association
between voice quality and tone systems might be due to a
confounding factor, namely syllable structure. However,
this assumption does not hold for three main reasons. The
first reason is that the tone category does not show any consistent
relationship to the occurrence of simple syllable structure
(Maddieson, 2013). Most morphemes in Sinitic consist of one
syllable, and most syllables are identifiable as morphemes
(Thurgood and LaPolla, 2008). Although Sinitic languages all
have simple syllable structures, the number of distinctive tones
varies across varieties of Sinitic, and so does speakers’ jitter. The
second reason is that although Hmong–Mien and Kam–Tai have
moderately complex syllable structures, their jitter and shimmer are
lower than those of Sinitic. Hmong–Mien and Kam–Tai
languages that permit liquids or glides in the second position of
consonant clusters are counted as having moderately complex
syllable structures. Third, polysyllabic (especially disyllabic)
words, most of which are transparently compounded of
monosyllabic morphemes, occur frequently in the lexicon of
most Sino–Tibetan languages (Thurgood and LaPolla, 2008).
Sampling voice quality data from lexical recordings mitigates the
effects of differences in syllable structure between Altaic and
Sino–Tibetan. Together, these three reasons make it unlikely that the
correlation between voice quality and tone system observed in
this study is a fortuitous by-product of syllable structure.
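The argument about periodicity can be made concrete with a minimal sketch of how cycle-to-cycle perturbation is commonly quantified. It assumes the widely used "local" definitions of jitter and shimmer (mean absolute difference between consecutive cycles divided by the mean value). The function names and the cycle durations are hypothetical illustrations, not the extraction pipeline or data of this study.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("at least two glottal cycles are needed")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods_s):
    """Local jitter: perturbation of glottal cycle durations (seconds)."""
    return local_perturbation(periods_s)


def local_shimmer(peak_amplitudes):
    """Local shimmer: perturbation of cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


# Hypothetical cycle durations in seconds (illustrative only, not study data).
steady_vowel = [0.0050, 0.0050, 0.0050, 0.0050, 0.0050]
less_periodic = [0.0050, 0.0054, 0.0047, 0.0053, 0.0048]

print(f"steady vowel jitter:  {local_jitter(steady_vowel):.2%}")
print(f"less periodic jitter: {local_jitter(less_periodic):.2%}")
```

Under these definitions, a signal whose cycles are less regular, as is expected when more consonantal material enters the analysis window, yields a higher jitter value than a sustained, strongly periodic vowel.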
Humidity effect on number of tones. A generalized linear
mixed model with the number of tones as the dependent variable
and humidity as the independent variable revealed a significant
nonlinear relationship between humidity and tone, as also found
in a previous study (Roberts, 2018). Our final model is expressed
as a polynomial of humidity and has a random intercept for
family and a by-family random slope for the humidity effect.
The results show that the fixed effect of humidity is significant
(I(Humidity^2): χ² = 10.59, df = 1, p = 0.001; I(Humidity^3):
χ² = 8.06, df = 1, p = 0.005) (see Fig. 2). Within-family regressions
suggest that the humidity effect is significant in
Sino–Tibetan (see Fig. 5). Thus, we conclude that
locations with higher humidity tend to have a higher number
of tones.
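As a quick consistency check on the reported statistics, the sketch below converts a χ² value with one degree of freedom into its upper-tail p-value. It relies on the identity that a χ² variable with df = 1 is the square of a standard normal variable, so the tail probability equals erfc(√(x/2)). The function name is a hypothetical helper. This is not the authors' model-fitting code, which is not shown in this section.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Chi-square values and p-values as reported in the text above.
reported = {
    "I(Humidity^2)": (10.59, 0.001),
    "I(Humidity^3)": (8.06, 0.005),
}

for term, (chi2, p_reported) in reported.items():
    p = chi2_sf_df1(chi2)
    print(f"{term}: chi2 = {chi2}, p = {p:.4f} (reported: {p_reported})")
```

Both recomputed values round to the p-values given in the text, which suggests the reported χ² statistics and p-values are mutually consistent.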
The correlation between climate and tonality is not particularly
surprising in China, because tonal ethnic languages
and Sinitic varieties with a larger number of tones are found across
southern China and are less concentrated in the northern and
northwestern parts of the country. The Qinling–Huaihe Line,
corresponding roughly to the 33rd parallel, is often used as the
geographical dividing line between northern and southern China.
This line approximates the 0 °C January isotherm and the 800
millimeters isohyet in China. It divides eastern China into
northern and southern regions with different climates, namely,
semi-humid and humid. Moreover, because of higher altitudes,
there are arid and semi-arid regions in northwest China and the
Qinghai–Tibet alpine regions in western China. Main dialect
groups of Sinitic (Wu, Xiang, Gan, Hakka, Min, and Yue) cover
the east and southeast of China, falling neatly into almost
complementary geographical distribution with Mandarin
(LaPolla, 2001). Populations that speak Hmong–Mien, Kam–Tai,
and Austroasiatic are in southern China (Sun et al., 2013).
Historical contingencies playing out in geographical space are an
ever-present factor in the evolution of languages, and a possible
confound for correlational studies and attempts to establish
causal chains. In China, successive waves of migration from the
north have, over many centuries, led to a successive superimposition
of layers of different northern Chinese dialects onto
evolving southern dialects. The southward expansion of Sinitic
resulted in the emergence of new varieties in Southern China with
substrate influence from the indigenous languages spoken there
(Chappell, 2001). Sino–Tibetan languages also spread into Burma
and throughout the Himalayas, from an origin which is
commonly assumed to have been the Yellow River. The Kam–Tai
and Hmong–Mien families expanded southward during historic
times (Diamond and Bellwood, 2003). The ‘Altaicization of
Northern Chinese’ hypothesis (Hashimoto, 1986) implies that
Northern Sinitic varieties are more likely to be stress-based and
have a smaller number of tone categories when they are spoken
near generally non-tonal Altaic languages. Southern Sinitic
varieties maintain or develop their tonal complexity, as they are
less prone to influence from the Altaic languages and instead
prone to influence from highly tonal Hmong–Mien languages
(Collins, 2016; LaPolla, 2001; Szeto and Yurayong, 2010). It is
necessary to be aware of the effects of language contact because of
its important contributions to the tone system. Still, its direct and
indirect effects on tone are mediated by humidity effects, which,
unlike the effects of language contact, are systematic (Collins, 2016;
Everett et al., 2016a). There is nothing to suggest that the
correlation between humidity and tone could be an artifact of the
history of language families and language contact. The objective
characteristics of phonatory capabilities captured from speech
samples are immune to contact-based effects and to the history of
language families. Our findings certainly do not deny or contradict
the great contributions of history and language contact to the
distribution of tone systems, but the fact that climate still emerges
as a correlate of variation in sound systems in the face of
historical contingencies underscores the importance of climate as a
factor.
Conclusion
The full chain of causation, humidity →voice quality →number
of tones, is for the first time strongly supported by direct
experimental tests on the basis of a large speech database, China’s
Language Resources Protection Project. Previous studies only
examined a correlation between the variables at either end of the
chain. Owing to the lack of large, standard, and high-quality
cross-linguistic speech databases for extracting voice quality
measurements from natural speech, direct experimental testing of
the entire causal chain has been hindered. In the absence of
intermediate links, the hypothesis about tone and humidity
cannot be verified.
Here, the prediction that climate affects the tone systems via
voice quality is verified. The chain of causal effects becomes
complete when we observe that relatively low ambient humidity
results in decreased efficiency of vocal fold vibration. The effect of
humidity on the vocal folds is sufficient to surface in natural
speech. Adequate tone usage in communication is hard to
maintain when the efficiency in vocal fold vibration decreases.
This leads to fewer distinctions in lexical tone, and then a change
in the whole sound system. The pattern holds with respect to
Sino–Tibetan languages. Meanwhile, the impact of humidified air
on the vocal folds is not as substantial as that of dry air. Humid
air does not impede laryngeal control. Languages maintain or
develop their tonal complexity as they are less prone to influence
from increased phonation effort. This pattern is observed in
Hmong–Mien and Kam–Tai. Objective measures of phonatory
capabilities disentangle the humidity effect from the effects of
language diversity, history, and contact. The prediction of eco-
logical adaptation of speech, then, is supported by the distribution
of tone systems in China. Our results suggest that humidity is
related to synchronic variation in tonality and offer a potential
trigger for diachronic changes in tone systems.
Armed with the ‘acoustic adaptation hypothesis’ of Maddieson
and Coupé (2015), which was originally used to account for the
relationship between characteristics of syllable structure and
ecological factors, Ladd (2016) went on to explore the
humidity–tonality correlation. Areas with high annual precipitation
and greater tree cover are observed to contain languages with
a lower dependence on consonants in their sound patterns. This
is because filtering effects of the environment are more likely to
degrade higher frequency sounds. Consonants, in general, rely on
higher frequency acoustic characteristics for identification, and
hence their predominance in languages spoken in these areas is
relatively lower (Maddieson and Coupé, 2015). Similarly,
humidity significantly affects sound transmission,
making sounds quieter and duller, because humidity
absorbs high-frequency energy, reducing the level of high frequencies
in the sound (Harris, 1966; Howard and Angus, 2009).
Ladd believes that although both production and transmission
factors play a role in explaining the correlation between humidity
and tone, the effect of humidity on the signal constitutes a more
plausible explanation for the uneven distribution of tone lan-
guages than effects on the organs (Ladd, 2016). Ladd pointed out
that, in all situations, higher frequencies fade more quickly than
lower frequencies. However, frequencies within the range of
fundamental frequency tend to fade more quickly in dry air
compared to humid air. According to Ladd, the
humidity–tonality correlation can thus be explained with refer-
ence to sound transmission. The attenuation of low-frequency
sounds in dry air results in a weakened distinction of the signal,
which can further influence the perception of listeners. This can
lead to miscommunication, and eventually to a selection pressure
against fine tonal distinctions (Roberts, 2018). However, accord-
ing to the ISO 9613-1 standard for calculating the attenuation of
sound as a result of atmospheric absorption (ISO 9613-1, 1993),
low-frequency pure-tone atmospheric attenuation coefficients in
dry air are only slightly higher than in humid air. Additionally,
the attenuation of low frequencies is much lower compared to
that of high frequencies, regardless of the humidity level. This
means that any changes in low-frequency attenuation may not be
noticeable to listeners. Thus, while we are not strongly opposed to
this idea, it seems to us unlikely that the causal chain linking tone
and humidity resides in the transmission of sound in air and the
effects of humidity on the signal that reaches the hearer. In
contrast, we believe that a causal pattern of the effects of humidity
on vocal organs has been demonstrated in the present paper.
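The comparison of low-frequency and high-frequency attenuation can be illustrated with a minimal sketch of the closed-form pure-tone absorption coefficient given in ISO 9613-1. The chosen frequencies (200 Hz as a stand-in for the fundamental-frequency range, 4000 Hz for a high frequency), the temperature of 20 °C, and the relative humidities of 10% and 70% are assumptions made for illustration only. The study itself works with specific humidity (g/kg), not relative humidity, and this is not the authors' calculation.

```python
import math

T0 = 293.15   # reference air temperature, K
T01 = 273.16  # triple-point isotherm temperature of water, K


def absorption_db_per_m(freq_hz, temp_c, rel_humidity_pct, p_rel=1.0):
    """Pure-tone atmospheric absorption coefficient in dB/m (ISO 9613-1 closed form).

    p_rel is the ambient pressure divided by the reference pressure (101.325 kPa).
    """
    t = temp_c + 273.15
    t_rel = t / T0
    # Saturation vapour pressure relative to the reference pressure.
    p_sat_rel = 10.0 ** (-6.8346 * (T01 / t) ** 1.261 + 4.6151)
    # Molar concentration of water vapour, in percent.
    h = rel_humidity_pct * p_sat_rel / p_rel
    # Relaxation frequencies of oxygen and nitrogen, in Hz.
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * t_rel ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * (t_rel ** (-1.0 / 3.0) - 1.0))
    )
    classical = 1.84e-11 * t_rel ** 0.5 / p_rel
    oxygen = 0.01275 * math.exp(-2239.1 / t) / (fr_o + freq_hz ** 2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / t) / (fr_n + freq_hz ** 2 / fr_n)
    return 8.686 * freq_hz ** 2 * (classical + t_rel ** -2.5 * (oxygen + nitrogen))


# Illustrative conditions (assumed, not taken from the study).
for freq in (200.0, 4000.0):
    for rh in (10.0, 70.0):
        alpha_km = 1000.0 * absorption_db_per_m(freq, 20.0, rh)
        print(f"{freq:6.0f} Hz, RH {rh:4.0f}%: {alpha_km:7.2f} dB/km")
```

Under these assumed conditions, the 200 Hz coefficient is higher in the dry case than in the humid case, but in both cases it stays far below the 4000 Hz values, which is the pattern the paragraph above describes.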
In research focusing on human behavior adaptation, it has only
rarely been possible to prove complete mechanisms of how
ecology drives human behavior. Natural selection theory can
directly support the connection between ecology and physiology.
For example, the brain expansion in Homo was mainly driven by
ecological challenges such as finding, caching, or processing food
(González-Forero and Gardner, 2018). However, it is doubtful
that other human behaviors are directly subject to the forces of
natural selection. Climate, voice quality, and tone are linked in a
causal chain, which provides a case study of a possible complete
mechanism through which ecology triggers human behavior. The
relationship between ecology and human behavior is mediated by
physiological mechanisms rather than being direct.
Despite the limited number of extremely dry regions in our
dataset, we were still able to identify a trend whereby relatively low
ambient humidity results in decreased efficiency of vocal fold
vibration. Moving forward, we will consider ways to expand our
dataset to include more regions with extreme aridity and conduct
further research to better understand the impact of humidity on
voice quality. In addition, the approach demonstrated here could
be extended to the exploration of global geo-phonetic correla-
tions, which calls for the further establishment of a global, large-
scale, high-quality, and standard speech database.
Fig. 5 Number of tones and associated specific humidity values for 997 language varieties. The regression line was drawn based on the generalized linear-smoothing method.
Data availability
All data generated or analyzed during this study are included in
the supplementary information file and the submitted dataset. The
dataset is also available in a GitHub repository (https://github.com/EL-CL/SI_Data).
It includes the locations of the
997 language varieties along with their corresponding humidity
values (g/kg), jitter rates, shimmer rates, number of tones,
language family membership, and raw acoustic data (jitter and
shimmer for each audio file). Zhongguo Yuyan Ziyuan Baohu
Gongcheng, abbreviated as YuBao, can be accessed at
https://zhongguoyuyan.cn/index. The WheatA database can be accessed at
http://www.wheata.cn/. The Python package pyecharts can be accessed
at https://github.com/pyecharts/pyecharts.
Received: 14 December 2022; Accepted: 24 July 2023;
References
Alves M, Krüger E, Pillay B, van Lierde K, van der Linde J (2019) The effect of
hydration on voice quality in adults: a systematic review. J Voice 33(1).
https://doi.org/10.1016/j.jvoice.2017.10.001
Bates D, Mächler M, Bolker BM, Walker SC (2015) Fitting linear mixed effects
models using lme4. J Stat Softw 67(1):1–48. https://www.jstatsoft.org/article/
view/v067i01
Boer BD (2016) Commentary: is the effect of desiccation large enough? J Lang Evol
1(1):55–57. https://doi.org/10.1093/jole/lzv008
Bolker B, Westfall J, Aust F, Ben-Shachar MS (2022) Analysis of factorial experi-
ments. https://github.com/singmann/afex
Brockmann M, Drinnan JM, Storck C, Carding NP (2011) Reliable jitter and
shimmer measurements in voice clinics: the relevance of vowel, gender, vocal
intensity, and fundamental frequency effects in a typical clinical task. J Voice
25(1):44–53. https://doi.org/10.1016/j.jvoice.2009.07.002
Coupé C (2018) Modeling linguistic variables with regression models: addressing
non-Gaussian distributions, non-independent observations, and non-linear
predictors with random effects and generalized additive models for location,
scale, and shape. Front Psychol 9. https://doi.org/10.3389/fpsyg.2018.00513
Chagnon N, Irons W (2002) Adaptation and human behavior: an anthropological
perspective, 1st edn. Aldine De Gruyter, New York
Chappell HM (2001) Synchrony and diachrony of Sinitic languages: A brief history
of Chinese dialects. In: Chappell HM (ed) Sinitic grammar: synchronic and
diachronic perspectives. Oxford University Press, Oxford, pp. 3–28
Chinese Academy of Social Sciences (2017a) Language atlas of China: Chinese
dialect volume, 2nd edn. The Commercial Press, Beijing
Chinese Academy of Social Sciences (2017b) Language atlas of China: minority
languages volume, 2nd edn. The Commercial Press, Beijing
Collins J (2016) Commentary: the role of language contact in creating correlations
between humidity and tone. J Lang Evol 1(1):46–52. https://doi.org/10.1093/
jole/lzv012
d’Alpoim Guedes J, Bocinsky RK (2018) Climate change stimulated agricultural
innovation and exchange across Asia. Sci Adv 4(10). https://www.science.org/
doi/10.1126/sciadv.aar4491
Diamond J, Bellwood P (2003) Farmers and their languages: the first expansions.
Science 300(5619):597–603. https://www.science.org/doi/10.1126/science.
1078208
Donohue M (2016) Commentary: culture mediates the effects of humidity on
language. J Lang Evol 1(1):57–60. https://doi.org/10.1093/jole/lzv009
Editorial Board of Chinese Minority Languages (2009) Brief chronicles of Chinese
minority languages series. The Ethnic Publishing House, Beijing
Ember C, Ember M (2007) Climate, econiche, and sexuality: influences of sonority in
language. Am Anthropol 109(1):180–185. https://www.jstor.org/stable/4496596
Everett C (2013) Evidence for direct geographic influences on linguistic sounds: the
case of ejectives. PLoS ONE 8(6):65275. https://doi.org/10.1371/journal.pone.
0065275
Everett C (2017) Languages in drier climates use fewer vowels. Front Psychol
8:1285. https://doi.org/10.3389/fpsyg.2017.01285
Everett C (2021) The sound systems of languages adapt, but to what extent?
Considerations of typological, diachronic and mercurial data. Cadernos de
Linguística 2(1):1–23. https://cadernos.abralin.org/index.php/cadernos/
article/view/342
Everett C, Blasi DE, Roberts SG (2015) Climate, vocal folds, and tonal languages:
connecting the physiological and geographic dots. Proc Natl Acad Sci USA
112(5):1322–1327. https://doi.org/10.1073/pnas.1417413112
Everett C, Blasi DE, Roberts SG (2016a) Language evolution and climate: the case
of desiccation and tone. J Lang Evol 1(1):33–46. https://doi.org/10.1093/jole/
lzv004
Everett C, Blasi DE, Roberts SG (2016b) Response: climate and language: has the
discourse shifted? J Lang Evol 1(1):83–87. https://doi.org/10.1093/jole/lzv013
Evteev AA, Cardini AL, Morozova IY, O’Higgins P (2014) Extreme climate, rather
than population history, explains mid-facial morphology of northern Asians.
Am J Phys Anthropol 153:449–462. https://doi.org/10.1002/ajpa.22444
Fahed VS, Doheny EP, Busse M, Hoblyn J, Lowery MM (2022) Comparison of
acoustic voice features derived from mobile devices and studio microphone
recordings. J Voice. https://doi.org/10.1016/j.jvoice.2022.10.006
Farideh J, Gadepalli C, Jarchi D, Cheetham B (2021) Acoustic analysis and digital
signal processing for the assessment of voice quality. Biomed Signal Process
Cont 70(4):103018. https://doi.org/10.1016/j.bspc.2021.103018
González-Forero M, Gardner A (2018) Inference of ecological and social drivers of
human brain-size evolution. Nature, 554–557. https://www.nature.com/
articles/s41586-018-0127-x
Gussenhoven C (2016) Commentary: tonal complexity in non-tonal languages. J
Lang Evol 1(1):62–64. https://doi.org/10.1093/jole/lzv016
Hammarström H (2016) Commentary: there is no demonstrable effect of desic-
cation. J Lang Evol 1(1):65–69. https://doi.org/10.1093/jole/lzv015
Harris CM (1966) Absorption of sound in air versus humidity and temperature. J
Acoust Soc Am 40(1):148–159. https://doi.org/10.1121/1.1910031
Hashimoto M (1986) The Altaicization of Northern Chinese. In: McCoy J, Light T
(eds) Contributions to Sino-Tibetan studies. Brill EJ, Leiden, pp. 76–97
Heggarty P, Shimelman A, Abete G, Anderson C, Sadowsky S (2019) Sound com-
parisons: a new online database and resource for research in phonetic diversity.
In: Calhoun S, Escudero P, Tabain M, Warren P (eds.) Proceedings of the 19th
International Congress of Phonetic Sciences (ICPhS), Melbourne, Australia
2019. Australasian Speech Science and Technology Association, Canberra,
Australia, pp. 280–284. https://www.internationalphoneticassociation.org/icphs-
proceedings/ICPhS2019/papers/ICPhS_329.pdf
Hemler R, Wieneke GH, Jonckere PD (1997) The effect of relative humidity of
inhaled air on acoustic parameters of voice in normal subjects. J Voice
11(3):295–300. https://doi.org/10.1016/S0892-1997(97)80007-0
Howard D, Angus J (2009) Acoustics and Psychoacoustics, 4th edn. Focal Press, Oxford
ISO 9613-1 (1993) Acoustics–attenuation of sound during propagation outdoors–part 1:
calculation of the absorption of sound by the atmosphere. Int Stand Organ.
https://www.iso.org/standard/17426.html
Judelson DA, Maresh CM, Anderson JM, Armstrong LE, Casa DJ, Kraemer WJ,
Volek JS (2007) Hydration and muscular performance: does fluid balance
affect strength, power and high-intensity endurance? Sports Med
37(10):907–921. https://doi.org/10.1111/j.1467-3010.2009.01790.x
Koskela HO (2007) Cold air-provoked respiratory symptoms: the mechanisms and
management. Int J Circumpolar Health 66(2):91–100. https://doi.org/10.
3402/ijch.v66i2.18237
Ladd DR (2016) Commentary: tone languages and laryngeal precision. J Lang Evol
1(1):70–72. https://doi.org/10.1093/jole/lzv014
LaPolla RJ (2001) The role of migration and language contact in the development
of the Sino-Tibetan language family. In: Aikhenvald AY, Dixon RMW (eds).
Areal diffusion and genetic inheritance: case studies in language change.
Oxford Univ Press, Oxford, pp. 225–254
Leydon C, Sivasankar M, Falciglia DL, Atkins C, Fisher KV (2009) Vocal fold
surface hydration: a review. J Voice 23(6):658–665. https://doi.org/10.1016/j.
jvoice.2008.03.010
Liu HM, Liang J, van Heuven VJ, Heeringa W (2020) Vowels and tones as acoustic
cues in Chinese subregional dialect identification. Speech Commun
123(3):59–69. https://doi.org/10.1016/j.specom.2020.06.006
Lü LC, Li WL (2012) Zhongguo dili [China Geography], 1st edn. Science Press,
Beijing
Maddieson I (2013) Tone. In: Dryer, M., Haspelmath, M. (eds) The world atlas of
language structures online. Max Planck Institute for Evolutionary Anthro-
pology, Leipzig. https://wals.info/chapter/13
Maddieson I, Coupé C (2015) Human spoken language diversity and the acoustic
adaptation hypothesis. J Acoust Soc Am 138(3):1838. https://doi.org/10.1121/
2.0000198
Maddux SD, Butaric LN, Yokley TR, Franciscus RG (2017) Ecogeographic varia-
tion across morphofunctional units of the human nose. Am J Phys Anthropol
162(1):103–119. https://doi.org/10.1002/ajpa.23100
Mahalingam S, Boominathan P (2016) Effects of steam inhalation on voice quality-
related acoustic measures. Laryngoscope 126(10):2305–2309. https://doi.org/
10.1002/lary.25933
Munroe RL, Fought JG, Macaulay R (2009) Warm climates and sonority classes not
simply more vowels and fewer consonants. Cross Cult Res 43(2):123–133.
https://doi.org/10.1177/106939710933148
R Core Team (2018) R: A language and environment for statistical computing.
Vienna. http://www.R-project.org/
Ran Q (2016) Hanyu fangyan jixian shengdiao qingdan yanjiu [Studies on extreme
tone inventories across Chinese dialects]. Nankai Univ Press, Tianjin
Roberts SG (2018) Robust, causal, and incremental approaches to investigating lin-
guistic adaptation. Front Psychol 9:166. https://doi.org/10.3389/fpsyg.2018.00166
Saggio G, Costantini G (2020) Worldwide healthy adult voice baseline parameters:
a comprehensive review. J Voice. https://doi.org/10.1016/j.jvoice.2020.08.028
Schultz BG, Rojas S, John MS, Kefalianos E, Vogel AP (2021) A cross-sectional
study of perceptual and acoustic voice characteristics in healthy aging. J
Voice. https://doi.org/10.1016/j.jvoice.2021.06.007
Shu M, Zhang Y, Jiang JJ (2022) The effect of Mandarin vowels on acoustic
analysis: a prospective observational study. J Voice 138(3):1–6
Singh L, Goh HH, Wewalaarachchi TD (2015) Spoken word recognition in early
childhood: comparative effects of vowel, consonant and lexical tone variation.
Cognition 142:1–11. https://doi.org/10.1016/j.jvoice.2022.03.028
Sue-Chu M (2012) Winter sports athletes: long-term effects of cold air exposure. Br
J of Sports Med 46(6):397–401. https://doi.org/10.1136/bjsports-2011-090822
Sun H, Zhou C, Huang X, Liu S, Lin K, Yu L, Huang K, Chu J, Yang Z (2013)
Correlation between the linguistic affinity and genetic diversity of Chinese
ethnic groups. J Hum Genet 58(10). https://www.nature.com/articles/jhg201379
Szeto PY, Yurayong C (2010) Sinitic as a typological sandwich: revisiting the
notions of Altaicization and Taicization. Linguistic Typol 25(3):6858–6868.
https://doi.org/10.1515/lingty-2021-2074
Thurgood G, LaPolla RJ (2008) The Sino-Tibetan Languages. Routledge, London
and New York
Uloza V, Ulozaite-Staniene N, Petrauskas T, Kregždyte R, Lithuania K (2021) Accuracy
of acoustic voice quality index captured with a smartphone–measurements with
added ambient noise. J Voice. 465.e19–465.e26 https://doi.org/10.1016/j.jvoice.2021.01.025
Wewalaarachchi TD, Singh L (2015) Vowel, consonant, and tone variation exert
asymmetrical effects on spoken word recognition: evidence from 6 year-old
monolingual and bilingual learners of Mandarin. J Exp Child Psychol
189(3):1838–1838. https://doi.org/10.1016/j.jecp.2019.104698
Winter B, Wedel A (2016) Commentary: desiccation and tone within linguistic
theory and language contact research. J Lang Evol 1(1):80–82. https://doi.org/
10.1093/jole/lzv010
Acknowledgements
This research was funded by a major project of the National Social Science Fund of
China (Grant No. 19ZDA300) and the Deutsche Forschungsgemeinschaft (DFG, German
Research Foundation) under Germany's Excellence Strategy (EXC 2150–390870439).
Author contributions
QR and QX designed research, analyzed data, reviewed, and edited the paper. YL and S
Wichmann wrote, reviewed, and edited the paper. LW contributed language recording
resources, supervised the research, and reviewed the paper. S Wang, JD, and YL
performed research and analyzed data. TW improved the visualization. YL and QX
contributed equally to this work.
Competing interests
The authors declare no competing interests.
Ethical approval
This article does not contain any studies with human participants performed by any of
the authors.
Informed consent
This article does not contain any studies with human participants performed by any of
the authors.
Additional information
Supplementary information The online version contains supplementary material
available at https://doi.org/10.1057/s41599-023-01969-4.
Correspondence and requests for materials should be addressed to Qibin Ran.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article’s Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article’s Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2023