Conference Paper

The interplay and interaction between prosody and syntax: evidence from Mandarin Chinese

Authors:

Abstract

This paper aims to (1) quantify possible correlations between syntactic structure and prosodic manifestation in Mandarin Chinese, (2) explore to what extent such correlations can be predicted from syntactic structure and what goes beyond them, and (3) add more operational characteristics to the description of Mandarin prosodic structure.
... The prosodic property in Fig. 1 indicates a correspondence with the syntactic property. This investigation concurs with the conclusion of Tseng et al. [1]. Fach [2] also concluded that prosodic boundaries can be obtained from the syntactic boundaries. ...
... The unit sequence with minimum cost between all potential target unit sequences in the unit lattice is selected and output as the synthesized speech. For a certain target unit sequence in the unit lattice, the unit sequence of length with minimum cost is selected according to the following cost function: (1) where denotes the candidate unit sequence, is the cost of concatenating two adjacent candidate units, and represents the substitution cost consisting of the syntactic cost and prosodic cost . The substitution cost for target word sequence is obtained by (2) where means the difference in syntactic structures between target word sequence and candidate word sequence , and will be introduced in Section III. ...
... This paper has presented a syntax-based framework for a Mandarin TTS system using a probabilistic CFG. The correspondence between syntactic and prosodic phrasing has been widely investigated [1], [2], [26]-[28], indicating that syntactic properties could help the prediction of prosodic structure. Furthermore, candidate units similar to the target units in syntactic structure can improve the naturalness of the synthetic speech. ...
Article
Full-text available
This paper presents a variable-length unit selection scheme based on syntactic cost to select text-to-speech (TTS) synthesis units. The syntactic structure of a sentence is derived from a probabilistic context-free grammar (PCFG) and represented as a syntactic vector. The syntactic difference between target and candidate units (words or phrases) is estimated by the cosine measure, with the inside probability of the PCFG acting as a weight. Latent semantic analysis (LSA) is applied to reduce the dimensionality of the syntactic vectors. A dynamic programming algorithm is adopted to obtain a concatenated unit sequence with minimum cost. A syntactic property-rich speech database is designed and collected as the unit inventory. Several experiments with statistical testing are conducted to assess the quality of the synthetic speech as perceived by human subjects. The proposed method outperforms a synthesizer that does not consider syntactic properties. Structural syntax estimates the substitution cost better than acoustic features alone.
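The selection procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`syntactic_cost`, `select_units`), the way the weight vector enters the cosine measure, and the caller-supplied concatenation cost are all hypothetical. The PCFG parsing and the LSA dimensionality reduction are omitted.

```python
import math

def syntactic_cost(target, candidate, weights):
    """Return 1 - weighted cosine similarity of two syntactic vectors.

    `weights` stands in for the PCFG inside probabilities (assumed form).
    """
    dot = sum(w * t * c for w, t, c in zip(weights, target, candidate))
    norm_t = math.sqrt(sum(w * t * t for w, t in zip(weights, target)))
    norm_c = math.sqrt(sum(w * c * c for w, c in zip(weights, candidate)))
    if norm_t == 0.0 or norm_c == 0.0:
        return 1.0  # no usable syntactic information: treat as maximal cost
    return 1.0 - dot / (norm_t * norm_c)

def select_units(targets, lattice, weights, concat_cost):
    """Dynamic programming over a unit lattice.

    targets:     list of target syntactic vectors, one per position
    lattice:     per position, a list of (unit_id, vector) candidates
    concat_cost: function(prev_unit_id, unit_id) -> float (hypothetical)
    Returns (total_cost, [unit_id, ...]) for the minimum-cost path.
    """
    # best[j] = (cost of the cheapest path ending in candidate j, that path)
    best = [(syntactic_cost(targets[0], vec, weights), [uid])
            for uid, vec in lattice[0]]
    for pos in range(1, len(targets)):
        new_best = []
        for uid, vec in lattice[pos]:
            substitution = syntactic_cost(targets[pos], vec, weights)
            # Cheapest predecessor once the concatenation cost is included.
            cost, path = min(
                ((pc + concat_cost(pp[-1], uid), pp) for pc, pp in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + substitution, path + [uid]))
        best = new_best
    return min(best, key=lambda item: item[0])
```

Because each step keeps only the cheapest path into every candidate, the search grows with the square of the number of candidates per position instead of enumerating every possible sequence.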
... We noted that speakers do respect PMs as an indication of syntactic structure, since only a small portion of PMs (commas in particular) were overlooked in speech data (4.3%, 4.41% in CNA; 13.41%, 8.50% in WF; 6.50%, 2.17% in CL). However, we note that a considerably higher percentage of inserted PPh boundaries (and pauses) exists in speech data where no corresponding PMs occur in text (26.56%, 30.49% in CNA; 23.08%, 20.19% in WF; and 11.69%, 3.14% in CL). ...
... Since paragraph and discourse involve semantic cohesion above the sentence, more information than syntactic government exists in the speech flow, and prosody does much more than disambiguate underlying syntactic structures [1,2,3,4,5]. We believe one feasible way to look into this aspect is to perform syntactic analyses of text data for nodes and boundaries [9,10] and subsequently compare them with actual speech data [11] to examine the speech-syntax non-overlap. In other words, we ask what speakers actually do and why they do it. ...
Article
Full-text available
We have previously addressed the functions of Mandarin fluent speech prosody from a top-down perspective in light of higher-level discourse information and cross-phrase prosodic associations. Postulating a prosody hierarchy HPG (Hierarchical Phrase Grouping) of multi-phrase speech paragraphs by systematic treatment of boundaries and breaks as discourse related, we were able to quantitatively account for higher-level contributions in the cross-phrase prosody context by acoustic parameters, and explain how such association triggers lower-level nodes to modify systematically (Tseng et al., 2004; 2005; 2006). In this paper, we further investigate within-paragraph prosody-syntax non-overlaps to look for contributions of higher-level information, rather than how prosody disambiguates underlying syntactic structures, the aspect most noted in the literature (1, 2, 3, 4, 5). We hypothesize that most such non-overlaps are due to higher-level information and can be accounted for. We define overlap by mapping annotated boundaries in speech corpora to punctuation marks (PM) in the corresponding text, whereas non-overlaps are where (1) there is no PM in the text but a boundary is tagged in the speech data, (2) there is a PM in the text but no boundary occurs in the speech data, and (3) the PM and the produced boundary mismatch. Three types of speech corpora differing in style and format were used: (1) reading of 26 discourse pieces up to 900 more syllables/characters by 2 radio announcers of plain text (CNA), (2) reading of weather forecasts by 2 untrained speakers (WF), and (3) reading of three Chinese Classics in three different rhyming formats by 2 untrained speakers (CL). We noted that speakers do respect PMs as an indication of syntactic structure, since only a small portion of PMs (commas in particular) were overlooked in speech data (4.3%, 4.41% in CNA; 13.41%, 8.50% in WF; 6.50%, 2.17% in CL). However, we note that a considerably higher percentage of inserted PPh boundaries (and pauses) exists in speech data where no corresponding PMs occur in text (26.56%, 30.49% in CNA; 23.08%, 20.19% in WF; and 11.69%, 3.14% in CL). We also find that relatively high between-speaker overlaps exist in these non-overlaps (45.15%, 37.02% in CNA; 36.57%, 39.20% in WF; and 8.33%, 30.00% in CL), indicating that such non-overlaps are by no means random. These non-overlaps are analyzed by syntactic structure and by paragraph position using quantitative methods to demonstrate contributions from higher-level paragraph information.
Index Terms: Hierarchical Prosody Group, HPG, discourse prosody, higher level contribution, prosody-syntax non-overlap
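The overlap and non-overlap categories defined in this abstract can be illustrated with a short sketch. It assumes that text and speech have already been aligned position by position, which the abstract does not describe. The `EXPECTED_BOUNDARY` table, the boundary labels in it, and the denominators used for the two rates are hypothetical choices made only for illustration.

```python
from collections import Counter

# Hypothetical mapping from a punctuation mark (PM) to the boundary label
# it is expected to produce; the source does not give such a table.
EXPECTED_BOUNDARY = {",": "PPh", ".": "BG"}

def classify(pm, boundary):
    """Classify one aligned position; None means 'nothing here'."""
    if pm is None and boundary is None:
        return "none"
    if pm is None:
        return "inserted_boundary"   # boundary in speech, no PM in text
    if boundary is None:
        return "omitted_pm"          # PM in text, no boundary in speech
    if EXPECTED_BOUNDARY.get(pm) == boundary:
        return "overlap"
    return "mismatch"                # PM and produced boundary disagree

def non_overlap_rates(pms, boundaries):
    """Count categories over aligned sequences and derive two rates.

    Assumed denominators: omitted PMs over all PMs, inserted boundaries
    over all produced boundaries.
    """
    counts = Counter(classify(p, b) for p, b in zip(pms, boundaries))
    n_pm = sum(1 for p in pms if p is not None)
    n_boundary = sum(1 for b in boundaries if b is not None)
    return {
        "counts": counts,
        "omitted_pm_rate": counts["omitted_pm"] / n_pm if n_pm else 0.0,
        "inserted_boundary_rate": (
            counts["inserted_boundary"] / n_boundary if n_boundary else 0.0
        ),
    }
```

Running the same classification for each speaker and intersecting the positions tagged `inserted_boundary` would give one way to estimate a between-speaker overlap, again only as an approximation of what the abstract reports.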
... Complex analysis by a linguist on high-level linguistic features is avoided. Moreover, the difficult analysis on the phrase [8,28,29], the preposition [30], the sub-sentence, the prosodic boundary [31], the breathable point or the prosodic phrase [32,33] can also be avoided. The large requirement on computation is reduced. ...
Conference Paper
A new approach for an efficient text analyzer is proposed. A prosody generator-driven method is employed to design an efficient text analyzer for Mandarin text-to-speech synthesis. Three heuristic and theoretical methods are used to examine the capability of each linguistic feature. First, the contribution of each linguistic feature to the prosody generator is examined experimentally. Second, the cross-influence of each linguistic feature on the prosody generator is analyzed. Third, the problem of over- and under-classification of the linguistic features is inspected. Finally, these three analytic results are used to design an efficient text analyzer. More than 39103 Chinese characters are employed to examine the performance of our text analyzer. Less than 78 ms is needed for word tagging on a P4 1.4 GHz PC. A correct rate of 97% is achieved, which confirms that the performance of our text analyzer is very good. Moreover, more natural and fluent speech is obtained at lower computational cost.
Article
A new approach for an efficient text analyser is proposed. A prosody generator-driven method is employed to design an efficient text analyser for Mandarin text-to-speech. A simpler structure for text analysis, a more suitable classification of linguistic features and a more efficient contribution of linguistic features to the prosody generator can be achieved. Three heuristic and theoretical methods are used to analyse and examine the capability of each linguistic feature: (1) the contribution of each linguistic feature to the prosody generator is examined experimentally; (2) the cross-influence of each linguistic feature on the prosody generator is analysed; (3) the problem of over- and under-classification of the linguistic features is inspected. Finally, these three analytic results are referenced to design an efficient text analyser. In total 35,243 Chinese characters are employed to examine the performance of our text analyser. Only 79 ms CPU time on a P4-1.4G PC is needed for word segmentation and POS tagging. Correction rates of 97.5% and 93.2% are achieved for word segmentation and POS tagging, respectively. This confirms that the performance of our text analyser is very good. Moreover, a Mandarin text-to-speech system is implemented to inspect the performance of the text analysis and the contribution to the prosody generator. More natural and fluent speech is obtained under the lower computation. The MOS of prosody of the synthesised and original speech are 4.2 and 4.8, respectively, which is reasonably good.
... Second, but more important, Chinese on the one hand has many morpho-syntactic features quite different from Western languages, such as the monosyllabic structure of morphemes and their flexibility and poly-synthetism in word formation; on the other hand, the speech units of natural Chinese are well organized as a hierarchy instead of a discrete linear alignment. According to Cao [1,2], Tseng [11] and Qian et al. [9], the prosodic hierarchy of Mandarin Chinese consists of at least three layers, i.e., prosodic word, prosodic phrase and intonation phrase. That is to say, the monosyllabic written form in Chinese is completely separated from the spoken form. ...
Article
Full-text available
This paper discusses the interrelation between prosody and syntax by clarifying some syntactic constraints on Chinese prosodic segmentation and grouping. The main attention is paid to (1) possible correlation between prosodic breaks and syntactic construction; (2) possible correlation between prosodic breaks and POS; and (3) the role of syntactic and lexical information in prosodic word chunking. Accordingly, an algorithm for the prediction of prosodic structure based on this information could be formed later on.
Conference Paper
Full-text available
A diverse group of speech scientists and engineers has developed the ToBI (TOnes and Break Indices) prosodic transcription system and materials to teach it to transcribers. ToBI consists of parallel tiers reflecting the multiple components of prosody, the most important being a tone tier, for intonational analysis, and a break index tier, for indicating the strength of coherence or disjuncture between adjacent words. To assess the system, we measured inter-transcriber agreement on utterances representative of the varied types of speech important to researchers, employing a diverse set of transcribers ranging from experts to newly trained users. Consistency was measured in terms of the number of transcriber pairs agreeing on the labeling of each particular word, a stringent metric. Using this metric, we observe 88% agreement on the presence or absence of a particular category of tonal element, and 81% agreement on the exact label for a tonal category. For break indices, agreement to within one level occurs 92% of the time. We conclude that the ToBI standard and its training materials have been refined to the point that they can be used fruitfully for large-scale annotation of prosodic phenomena in speech databases.
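The pairwise consistency metric mentioned in this abstract can be approximated with the sketch below. It is an illustrative reading of "number of transcriber pairs agreeing on the labeling of each particular word", not the evaluation code used in the study. The function name `pairwise_agreement` and the pluggable `agree` predicate are assumptions.

```python
from itertools import combinations

def pairwise_agreement(labels_by_transcriber, agree=lambda a, b: a == b):
    """Fraction of (transcriber pair, word) combinations that agree.

    labels_by_transcriber: one label sequence per transcriber, all
    covering the same words in the same order.
    agree: predicate deciding whether two labels count as agreeing.
    """
    agreeing = 0
    total = 0
    for seq_a, seq_b in combinations(labels_by_transcriber, 2):
        for label_a, label_b in zip(seq_a, seq_b):
            total += 1
            if agree(label_a, label_b):
                agreeing += 1
    return agreeing / total if total else 0.0
```

For break indices, passing `agree=lambda a, b: abs(a - b) <= 1` counts two labels as agreeing when they differ by at most one level, which corresponds to the "within one level" criterion quoted above.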
Article
Full-text available
This paper presents the methodology and guidelines for annotation in the CKIP Chinese Treebank. Under the framework of Information-based Case Grammar (ICG), a lexical feature-based grammar formalism which stipulates that each lexical item contains both syntactic and semantic information, the potential phrasal heads of the input are located and the semantic relations between words are identified. Thus both phrasal categories and thematic roles are annotated. In line with the Head-Driven Principle, some guidelines are also implemented for more consistent annotation of grammatical phenomena such as coordinate constructions, topicalization, and constructions with nominal predicates. In addition, we tag the CKIP Treebank with semantic categories to extract useful collocations among the semantic classes of the bracketed constituents, which should further enhance the performance of our parsing model.
Article
Full-text available
This paper describes the units and rules involved in (1) the lexicon-prosody interface, (2) the intonation grammar, (3) prosodic grouping, and (4) the syntax-prosody interface. All units are depicted in a multi-layered representation of prosodic structure.
Article
Full-text available
We propose a mapping between prosodic phenomena and semantico-pragmatic effects based upon the hypothesis that intonation conveys information about the intentional as well as the attentional structure of discourse. In particular, we discuss how variations in pitch range and choice of accent and tune can help to convey such information as: discourse segmentation and topic structure, appropriate choice of referent, the distinction between 'given' and 'new' information, conceptual contrast or parallelism between mentioned items, and subordination relationships between propositions salient in the discourse. Our goals for this research are practical as well as theoretical. In particular, we are investigating the problem of intonational assignment in synthetic speech.
Article
Four subjects recorded eight non-compound declarative sentences, containing from one to eight stress groups. Acoustic analysis reveals a tendency for fundamental frequency range to increase with increased utterance length, but in a non-linear and seemingly random fashion. The increase is brought about by higher starting points as well as lower ending points in the longer utterances. Concomitant with the range increase we find a decrease in overall downdrift in the longer utterances, but degree of downdrift is not simply inversely related to utterance length. With four and more stress groups the intonation contour is decomposed into prosodic phrase groups, i.e. the contour contains discontinuities in the shape of partial resettings. The prosodic phrase group boundaries are determined by but do not exactly coincide with major syntactic boundaries, and the data present an argument in favour of a hypothesis of prosodic categories as distinct entities with a non-isomorphous relation to syntactic structure.
Article
Examined the effect of morphological and syntactic boundaries on the temporal structure of spoken utterances. 2 speakers produced 20 tokens each of 4 sets of words consisting of a monosyllabic base form, disyllabic and trisyllabic words derived from the base by the addition of suffixes, and 3 short sentences in which the base form was followed by a syntactic boundary; this in turn was followed by a stressed syllable, 1 unstressed syllable, and 2 unstressed syllables. The sentences thus reproduced the syllabic sequences of the derived words. The duration of words and segments was measured from oscillograms. The manifestation of morphological and syntactic boundaries is discussed, and some implications of the findings relative to the temporal programming of spoken utterances are considered.
Tseng, C. "Investigating Mandarin Chinese prosody through speech database". Proceedings of the 1999 Oriental COCOSDA Workshop, Taipei, Taiwan, pp. 141-144, 1999.
Tseng, C. & Chou, F. "A Prosodic labeling system for Mandarin Speech Database". Proceedings of the XIVth International Congress of Phonetic Sciences, San Francisco, CA, pp. 2379-2382, 1999.
Hirschberg, J. & Pierrehumbert, J. "The intonational structuring of discourse". Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, New York, NY, pp. 136-144, 1986.