Hiroya Fujisaki
  • PhD
  • Professor Emeritus at The University of Tokyo

About

222
Publications
21,551
Reads
4,262
Citations
Current institution
The University of Tokyo
Current position
  • Professor Emeritus
Additional affiliations
March 2013 - present
Nanjing Normal University
Position
  • Distinguished Honorary Professor
April 1991 - March 2001
Tokyo University of Science
Position
  • Professor (Full)
April 1987 - March 1991
University of Science and Technology of China
Position
  • Guest Professor
Education
April 1954 - March 1956
The University of Tokyo Graduate School
Field of study
  • Electrical Engineering
April 1950 - March 1954
The University of Tokyo
Field of study
  • Electrical Engineering

Publications (222)
Article
We present here a framework for a systematic study of attitudes expressed by speech. Although the term “attitude” is commonly used to refer to phenomena at several different levels, we shall assign separate terms to each of these levels for the sake of clarity. The term “attitude” can refer both to a person and an object (either concrete or abstra...
Chapter
Full-text available
After differentiating attitudes from emotions, the present work proposes a method for collecting attitudinal speech, and investigates prosodic manifestation and perceptual interpretation of Mandarin utterances conveying various attitudes. Using the induction technique, a speech corpus was designed and acquired to incorporate seven classes of behavi...
Conference Paper
Full-text available
This study first examines the differences in the gross features of the fundamental frequency contour (the F0 contour) responsible for discriminating utterances of three sentence types, namely declarative, imperative and interrogative, in Bangla. In order to realize these differences in speech synthesis, these differences are then interpreted in ter...
Conference Paper
Pause and F0 play important roles in the process of understanding a spoken message. Up to the present, pause insertion and F0 contour generation have been modeled separately and evaluated independently from each other. However, the occurrence of a pause has a direct influence on the F0 contour of the portion of an utterance immediately after the pa...
Article
Full-text available
This paper presents a comparative study of prosodic features of utterances of two types of Bangla sentences: the declarative type and the interrogative (‘yes-no’ question) type whose textual contents are identical except for the punctuation marks in Bangla. The study is based on the analysis of 44 utterances each of the declarative type and the int...
Conference Paper
Full-text available
As we enter the age of information society, information retrieval over the Internet will be indispensable for everyday life, and spoken language will be an essential medium for human-machine dialogue. This paper presents a part of the outcome of a five-year project led by the first author on Human-Machine Spoken Dialogue Systems, conducted under th...
Conference Paper
Full-text available
After differentiating attitudes from emotions, the present work investigates prosodic manifestations and perceptual attributes of Mandarin utterances conveying various attitudes. A speech corpus was designed to incorporate five classes of attitudes: friendly/hostile, polite/rude, serious/joking, praising/blaming, and confident/uncertain. Perceptual...
Article
Full-text available
This study examines how closely Japanese utterances produced by German learners match those of Japanese natives, especially with respect to word and sentence prosody. Utterances were rated perceptually by Japanese native listeners as well as evaluated with respect to timing and F0 contours. We found that the German learners speak at a similar speec...
Conference Paper
Full-text available
Control of pause occurrence and duration is an important issue for text-to-speech synthesis systems. In text-readout speech, pauses occur unconditionally at sentence boundaries and with high probability at major syntactic boundaries such as clause boundaries, but more or less arbitrarily at minor syntactic boundaries. Pause duration tends to be lon...
Article
Contains research objectives and reports on three research projects. U.S. Air Force (Air Force Cambridge Research Center, Air Research and Development Command) under Contract AF19(604)-6102 National Science Foundation
Article
Contains reports on four research projects.
Conference Paper
Full-text available
In the present study, the F0 contours of continuing and terminating prosodic phrases of 4 Swiss German dialects are analyzed by means of the command-response model. In every model parameter, the two prosodic phrase types show significant differences: continuing prosodic phrases indicate higher phrase command magnitude and shorter durations. Locally...
Article
It is apparent that prosodic and segmental features of speech must be temporally coordinated in order to produce a consistent and meaningful message. The precise mechanism for the coordination, however, has not been clear. The present study looks into this problem in the case of word accent in spoken Japanese. The speech material consisted of utter...
Conference Paper
Full-text available
This paper first presents the author's personal view on the importance of modeling in scientific research in general, and then describes two of his works toward modeling certain aspects of human speech communication. The first work is concerned with the physiological and physical mechanisms of controlling the voice fundamental frequency of speech,...
Chapter
Full-text available
This wide-ranging survey of experimental methods in phonetics and phonology shows the insights and results provided by different methods of investigation, including laboratory-based, statistical, psycholinguistic, computational-modeling, corpus, and field techniques. The five chapters in the first part of the book examine the recent history and int...
Article
As one of the major Chinese dialects, Cantonese has a tone system consisting of nine lexical tones and three additional changed tones, which is considerably more complex than that of Mandarin. The most important acoustic feature characterizing these tones is the contour of the voice fundamental frequency (the F0 contour). In this article we present...
Article
Full-text available
As one of the major Chinese dialects, Cantonese has a tone system consisting of nine lexical tones and three additional changed tones, which is considerably more complex than that of Mandarin. The most important acoustic feature characterizing these tones is the contour of the voice fundamental frequency (the F0 contour). In this article we prese...
Chapter
Full-text available
Although there have been many studies on the prosodic structure of spoken Mandarin as well as many proposals for labeling the prosody of spoken Mandarin, the labeling of prosodic boundaries in all the existing annotation systems relies on auditory perception, and lacks a direct relation to the acoustic process of prosody generation. Besides, percep...
Article
The command‐response model for the F0 contours, first proposed for nontone languages such as Japanese and English, has been extended by the author and his co‐workers to cover F0 contours of tone languages and has been proved to be applicable to several dialects of Chinese including Mandarin, Cantonese, and Shanghainese, as well as other tone languag...
Article
Full-text available
Emphasis and question are two factors that have significant effects on F0 contours for various languages, among which tone languages require more careful study because their F0 contours show complex interaction between lexical tones and sentence intonation. This paper employs the command-response model for the process of F0...
Conference Paper
Full-text available
Although the command-response model for the process of F0 contour generation has been successfully applied to many languages, the inverse problem, viz., automatic derivation of the model parameters from an observed F0 contour, is more challenging, especially for tone languages which have tone commands of both polarities. Since the polarities of to...
Conference Paper
Full-text available
Emphasis has significant effect on F0 contours in various languages, among which tone languages require more careful study because their F0 contours show complex interaction between lexical tones and phrase intonation. Here we employ the command-response model to investigate the effect of paralinguistic emphasis in Cantonese, a typical tone langu...
Conference Paper
This paper describes the results of a preliminary study on the applicability of the command-response model to F0 contours of spoken Hindi, an official language of India with almost 400 million native speakers in the world. Analysis of observed F0 contours of a number of utterances by two native speakers indicated that the model with provisions for...
Conference Paper
Full-text available
This paper presents the evaluation of a system for speech F0 contour prediction for European Portuguese using the Fujisaki model. It is composed of two command-generating sub-systems, the phrase command sub-system and the accent command sub-system. The parameters for evaluating the ability of each sub-system are described. A comparison is made bet...
Article
While the tonal characteristics of Chinese syllables have been qualitatively described in traditional phonetics, quantitative analysis requires a mathematical model. This paper presents such a model for the fundamental frequency contours of Standard Chinese, based on an extension of a model that has already been proved to be applicable to non-tone...
Conference Paper
Full-text available
Cantonese is a well-known Chinese dialect with a quite complex tone system. We have successfully applied the command-response model to represent F0 contours of Cantonese speech by defining a set of appropriate tone command patterns. It provides an efficient means to describe Cantonese F0 contours with high accuracy. In this paper, both qualitat...
Conference Paper
Full-text available
Cantonese is a well-known Chinese dialect with a quite complex tone system. We have applied the command-response model to represent F0 contours of Cantonese speech by defining a set of appropriate tone command patterns. In this paper, the analysis is extended to Cantonese utterances at three different speech rates. By incorporating the e...
Conference Paper
Full-text available
As one of the major Chinese dialects, Shanghainese is well known for its complex tone sandhi system. This paper applies the command-response model to represent F0 contours of Shanghainese speech. Analysis-by-synthesis is conducted both on carrier sentences with monosyllabic target words and on isolated polysyllabic words, from which a se...
Conference Paper
Cantonese is well-known as a Chinese dialect with a highly complex tone system. Like Mandarin, the tones in Cantonese are traditionally denoted by a 5-scale notation system, which cannot provide a quantitative model for the F0 contours of continuous speech. This paper applies the command-response model, which has already been shown to ap...
Article
As a major Chinese dialect, Cantonese is well known for its complex tone system. This paper presents a preliminary analysis of the F0 contours of Cantonese using a command-response model. Experiments are conducted on a set of designed sentences, from which a set of appropriate tone command patterns for each tone is derived by Analysis-by-Synthesis...
Conference Paper
Full-text available
As a major Chinese dialect, Cantonese is well known for its complex tone system. This paper presents a preliminary analysis of the F0 contours of Cantonese using a command-response model. Experiments are conducted on a set of designed sentences, from which a set of appropriate tone command patterns for each tone is derived by Analysis-by-Synthesis...
Article
Full-text available
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Standard Chinese, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters fro...
Conference Paper
Full-text available
This paper first presents the physiological and physical properties of the vocal fold and the laryngeal structure involving intrinsic laryngeal muscles that are mainly responsible for the generation of F0 contours with global components and positive local components. It then takes up the mechanism involving extrinsic laryngeal muscles for generati...
Conference Paper
Full-text available
The model for the process of F0 contour generation, first proposed by Fujisaki and his coworkers, has been successfully applied to Mandarin, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. However, the inverse problem, viz., automatic derivation of the model parameters from an ob...
Article
Full-text available
This paper presents a model to predict the accent commands (henceforth ACs) of the Fujisaki Model for the F0 contour, given that the phrase commands (henceforth FCs) are known. Accent commands are associated with syllables. For each syllable, an artificial neural network (ANN) decides, with an accuracy of 89.4%, whether there will be an associated AC or not....
Article
Full-text available
Starting from the author's view on the process of information manifestation in the tonal features of speech, this paper emphasizes the importance of objective and quantitative modeling in the study of these features. It then describes a model for the process of fundamental frequency control of speech that has been originally proposed and establis...
Article
Full-text available
The authors have already presented a method for automatic extraction of accent and phrase commands of a model from a given F0 contour of speech. This paper describes improvements introduced to cope with difficulties encountered by the previous method, especially in connection with the extraction of accent commands, and reports the results of ex...
Conference Paper
Full-text available
The generation of naturally-sounding F0 contours in TTS enhances the intelligibility and perceived naturalness of synthetic speech. In earlier works the first author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0, and an automatic procedure for extracting the...
Conference Paper
Full-text available
A method was developed to utilize linguistic information (lexical accent types and syntactic boundaries) to improve the performance of the automatic extraction of the F0 contour generation process model commands. The extraction scheme is first to smooth the observed F0 contour by a piecewise 3rd order polynomial function and to locate accent comman...
Conference Paper
Full-text available
This paper presents a model to predict the phrase commands of the Fujisaki Model for F0 contour for the Portuguese Language. Phrase commands location in text is governed by a set of weighted rules. The amplitude (Ap) and timing (T0) of the phrase commands are predicted in separate neural networks. The features for both neural networks are discussed...
Conference Paper
Full-text available
The current paper presents a preliminary study on the production and perception of syllabic tones of Vietnamese. A speech corpus consisting of fifty-two six-syllable sequences with various combinations of tones was uttered by two speakers of Standard Vietnamese, one male and one female. The corpus was labeled on the syllabic level and analyzed usin...
Article
Full-text available
A method was developed to utilize linguistic information (lexical accent types and syntactic boundaries) to improve the performance of the automatic extraction of the F0 contour generation process model commands. The extraction scheme is first to smooth the observed F0 contour by a piecewise 3rd order polynomial function and to locate accent comman...
Conference Paper
Full-text available
The model of F0 contour generation has been successfully applied to Mandarin, which is a typical tone language with a distinct feature that both positive and negative tone commands are required. In this paper, a method is proposed for automatic extraction of parameters of the F0 contour model for Mandarin. With the same framew...
Conference Paper
This paper describes the results of a preliminary study on the modeling of fundamental frequency contours of Thai utterances. Based on our previous studies on the analysis and modeling of F0 contours of Standard Chinese, the command-response model is expected to apply also to Thai if we assume the existence of both positive and negative...
Conference Paper
Full-text available
The generation process model of the fundamental frequency contours (F0 contours) of speech is known to be capable of generating F0 contours quite close to observed ones. The extraction of model parameters from an observed contour, however, requires an iterative process starting from a set of initial parameter values. In order to guarantee a rapid c...
Conference Paper
Full-text available
The process of generating the F0 contour of speech has been modeled quite accurately in mathematical terms by Fujisaki and his coworkers, but the extraction of parameters of the underlying commands from an observed F0 contour is an inverse problem that can be solved only by successive approximation. In order to guarantee an efficient and accurate...
Article
Full-text available
The current study examines the interaction of syllable tones and vowel quantity in the production and perception of mono-syllabic words of Thai. A speech corpus containing groups of words differing only as to tone type and vowel quantity was designed. These were embedded in a short carrier sentence of five mid tone syllables, with the target word b...
Conference Paper
This paper presents an intelligent system for information retrieval based on human-machine dialogue through spoken language with novel features such as use of key concepts, unknown word processing, dialogue management through user and system modeling, and automatic acquisition of knowledge to adapt the system to individual users. It then describes...
Conference Paper
Full-text available
The prosodic quality of a text-to-speech system is important for the intelligibility and perceived naturalness of synthetic speech. In earlier works the author developed a linguistically motivated model of German intonation based on the quantitative Fujisaki model of the production process of F0. The current paper compares results yielded by automa...
Conference Paper
While it is quite straightforward to obtain a natural F0 contour from a set of parameters of a model for F0 contour generation, the inverse problem cannot be solved analytically. This paper presents a method for pre-processing a measured F0 contour to obtain its approximation consisting of third-order polynomial se...
Article
Full-text available
The problem of adequately describing F0 contours is far from being solved. Although symbolic representations such as the ToBI-system appear attractive and have been strongly promoted recently, they neither capture F0 contours in a way that permits their reproduction from the labels, a demand resulting from TTS, nor are the labels phonological in th...
Article
While it is well known that the cricothyroid (CT) muscle is mainly responsible for raising the voice fundamental frequency (F0) in many languages, the mechanism for F0 lowering in languages such as Standard Chinese (SC) has not been elucidated. Although several studies have shown that the sternohyoid (SH) muscle activity is strongly correlated wit...
Article
The command–response model by Fujisaki and his co‐workers formulates the generation process of the fundamental frequency contour (henceforth F0 contour) in terms of a set of input commands carrying linguistic and paralinguistic information and the mechanisms that respond to these commands. The parameters of the mechanisms...
Article
A command-response model has been presented by Fujisaki and his co-workers initially for the process of generation of the fundamental frequency contour (henceforth F0 contour) of the common Japanese. It consists of a set of input commands carrying linguistic and paralinguistic information, and the mechanisms that respond to these commands to genera...
Article
Conventional approaches for automatic speech recognition and understanding are essentially bottom‐up. Namely, they are based on the assumption that speech can be decomposed reliably into the smallest units, can be recognized by identifying these units first, and then by clustering them to find larger units and their interrelationships. It is thus t...
Conference Paper
This paper describes the command-response model for F0 contour generation originally developed for common Japanese, and demonstrates its capability of generating F0 contours of various other languages with minor language-specific modifications. The model is especially useful in multilingual speech synthesis, since the same mec...
Chapter
This paper presents a definition of prosody as the organization of linguistic units within an utterance and a coherent group of utterances, having manifestations both in segmental and suprasegmental features of speech, serving at the same time as a medium for conveying para- and nonlinguistic information. It then discusses the process of spontaneou...
Chapter
A text-to-speech conversion system for Japanese has been developed for the purpose of producing high-quality speech output. This system consists of four processing stages: linguistic processing, phonological processing, control parameter generation, and speech waveform generation. The chapter focuses on the second and the fourth stages, especially...
Conference Paper
The process of generating an F0 contour from a small number of linguistically meaningful parameters has been modeled quite accurately, and the model has been used extensively in speech synthesis. The paper deals with the inverse problem, i.e., that of extracting the model parameters from a given contour, which can only be solved by succ...
Conference Paper
Full-text available
Accentuation serves to express both the discrete information concerning the accent type of a prosodic word and the continuous information concerning its prominence. The paper examines the latter aspect of accentuation using recorded radio news read by announcers. The amplitude of the accent command was extracted from an F0 contour and us...
Conference Paper
Full-text available
On the basis of the short-time relative speech rate defined by the authors, this paper examines the optimum width of the smoothing window by perceptual experiments on the naturalness of re-synthesized speech. With the optimum window of 270 ms, relative speech rates are obtained both for 'fast' and 'slow' utterances of the same sentence, using an ut...
Conference Paper
Full-text available
The process of generating an F0 contour from a small number of linguistically meaningful parameters has been modeled quite accurately, and the model has been used extensively in speech synthesis. The study deals with the inverse problem, i.e., that of extracting the model parameters from a given contour, which can only be solved by succ...
Article
One of the fundamental issues in the study of speech perception is to model the cognitive process of phonetic judgment and to infer the underlying mechanism. Experimental results on categorical judgment as well as on judgment of typicality do not attest to the existence of a single prototype for each phonetic category, but rather to the existence o...
Article
It is well known that speech rate varies both globally and locally in natural discourse due to various factors such as contrastive stress, syntactic boundaries, emotion, etc. While the global speech rate can be clearly defined by the durations of utterances and pauses, the local speech rate has not been well defined. The present authors have propos...
