Kiyohiro Shikano

  • Nara Institute of Science and Technology

About

522
Publications
50,720
Reads
13,397
Citations
Current institution
Nara Institute of Science and Technology
Additional affiliations
April 1994 - present
Nara Institute of Science and Technology
Position
  • Professor (Full)
June 1984 - May 1986
Carnegie Mellon University
Position
  • Visiting researcher

Publications

Article
We introduce a new optimized microphone-array processing method for a spoken-dialogue robot in noisy and reverberant environments. The method is based on frequency-domain blind signal extraction, a signal separation algorithm that exploits the sparseness of a speech signal to separate the target speech and diffuse background noise from the sound mi...
Article
Full-text available
In this paper, we propose a musical-noise-free blind speech extraction method using a microphone array for application to nonstationary noise. In our previous study, it was found that optimized iterative spectral subtraction (SS) results in speech enhancement with almost no musical noise generation, but this method is valid only for stationary nois...
Conference Paper
In this study, we propose a novel toolkit to handle multiple speech-oriented guidance agents for mobile applications. The basic architecture of the toolkit is a server-and-client architecture. We assumed the servers are located in a cloud-computing environment, and the clients are mobile phones, such as the iPhone. A huge number of servers exist on th...
Conference Paper
We investigate a discrimination method for invalid and valid inputs, received by a speech-oriented guidance system operating in a real environment. Invalid inputs include background voices, which are not directly uttered to the system, and nonsense utterances. Such inputs should be rejected beforehand. We have reported methods using not only the li...
Chapter
In this work, we address the topic classification of spoken inquiries in Japanese that are received by a guidance system operating in a real environment, with a semi-supervised learning approach based on a transductive support vector machine (TSVM). Manual data labeling, which is required for supervised learning, is a costly process, and unlabeled...
Article
In this letter, we address monaural source separation based on supervised nonnegative matrix factorization (SNMF) and propose a new penalized SNMF. Conventional SNMF often degrades the separation performance owing to the basis-sharing problem. Our penalized SNMF forces nontarget bases to become different from the target bases, which increases the s...
Patent
Full-text available
A processing unit is provided which executes speech recognition on speech signals captured by a microphone for capturing sounds uttered in an environment. The processing unit has: an initial reflection component extraction portion that extracts initial reflection components by removing diffuse reverberation components from a reverberation pattern o...
Article
In this paper, we present novel speaking-aid systems based on one-to-many eigenvoice conversion (EVC) to enhance three types of alaryngeal speech: esophageal speech, electrolaryngeal speech, and body-conducted silent electrolaryngeal speech. Although alaryngeal speech allows laryngectomees to utter speech sounds, it suffers from the lack of speech...
Conference Paper
In this paper, we address a monaural source separation problem and propose a new penalized supervised nonnegative matrix factorization (SNMF). Conventional SNMF often degrades the separation performance owing to the basis-sharing problem between supervised bases and nontarget bases. To solve this problem, we employ two types of penalty term based o...
Conference Paper
In this paper, we propose an automatic optimization scheme of FD-BSE-based joint suppression of noise and late reverberation to improve the speech recognition accuracy for a spoken-dialogue system. First, we optimize the parameter of the conventional FD-BSE-based method using the assessment of musical noise measured by higher-order statistics and acousti...
Conference Paper
In this paper, we review a blind musical-noise-free speech extraction method using a microphone array that can be applied to nonstationary noise. In our previous study, it was found that optimized iterative spectral subtraction (SS) results in speech enhancement with almost no musical noise generation, but this method is valid only for stationary n...
Conference Paper
In this paper, we address a stereo signal separation problem and propose a new method utilizing both directional clustering and superresolution-based supervised nonnegative matrix factorization (NMF) via spectrogram extrapolation using supervised bases. In previous studies, a hybrid method concatenating supervised NMF after directional clustering w...
Conference Paper
In this paper, we address a music signal separation problem, and propose a new supervised algorithm for real instrumental signal separation employing a deformable capability for a spectral supervision trained in advance. Nonnegative matrix factorization (NMF) is one of the techniques used for the separation of an audio mixture that consists of mult...
Article
In this work, we address the topic classification of spoken inquiries in Japanese that are received by a speech-oriented guidance system operating in a real environment. The classification of spoken inquiries is often hindered by automatic speech recognition (ASR) errors, the sparseness of features and the shortness of spontaneous speech utterances...
Article
In this study, we perform a theoretical analysis of the amount of musical noise generated in Bayesian minimum mean-square error speech amplitude estimators. In our previous study, a musical noise assessment based on kurtosis has been successfully applied to spectral subtraction. However, it is difficult to apply this approach to the methods with...
Article
In this paper, we present statistical approaches to enhance body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called nonaudible murmur (NAM) microphone is effectively used to detect very soft unvoiced speech such as NAM or a whispered voice while keeping speech sounds emitted outside almost inaudible. Howe...
Article
Full-text available
In this paper, we propose a new interactive controller for audio object localization based on spatially representative vector operations on a stereo mixed source. First, we developed the interactive controller, which is equipped with a capacitive touchscreen panel so that the listener can intuitively operate audio objects displayed on the touchscre...
Article
In this paper, we provide a theoretical analysis of the amount of musical noise in iterative spectral subtraction, and its optimization method for the least musical noise generation. To achieve high-quality noise reduction with low musical noise, iterative spectral subtraction, i.e., iteratively applied weak nonlinear signal processing, has been pr...
Conference Paper
In this paper, we propose a new iterative signal extraction method using a microphone array that can be applied to nonstationary noise. In our previous study, it was found that optimized iterative spectral subtraction (SS) results in speech enhancement with almost no musical noise generation, but this method is valid only for stationary noise. The...
Conference Paper
Full-text available
This paper describes a voice quality control method in statistical esophageal speech enhancement. Esophageal speech is produced by one of the alternative speaking methods for laryngectomees. Its naturalness and intelligibility are much lower than those of natural voices and its voice quality sounds similar even if uttered by different laryngectomee...
Conference Paper
Full-text available
In this paper, we propose a new theory of nonlinear noise reduction with a perfectly musical-noise-free property, where no musical noise is generated even for a high signal-to-noise ratio. To achieve high-quality noise reduction with low musical noise, an iterative spectral subtraction method, i.e., recursively applied weak nonlinear signal process...
Conference Paper
Full-text available
In this paper, we propose a new method for stable estimation of the kurtosis of a speech power spectrum. Speech kurtosis can be used for the prediction of speech recognition accuracy as reported in recent studies. However, the conventional estimation method is very unstable owing to the high sensitivity of higher-order statistics. To overcome this...
Article
Full-text available
In this paper, we introduce a generalized minimum mean-square error short-time spectral amplitude estimator with a new prior estimation of the speech probability density function based on moment-cumulant transformation. From the objective and subjective evaluation experiments, we show the improved noise reduction performance of the proposed method.
Article
We propose a structure-generalized blind spatial subtraction array (BSSA), and the theoretical analysis of the amounts of musical noise and speech distortion. The structure of BSSA should be selected according to the application, i.e., a channelwise BSSA is recommended for listening but a conventional BSSA is suitable for speech recognition.
Article
An electrolarynx (EL) is a medical device that generates sound source signals to provide laryngectomees with a voice. In this article we focus on two problems of speech produced with an EL (EL speech). One problem is that EL speech is extremely unnatural and the other is that sound source signals with high energy are generated by an EL, and therefo...
Conference Paper
In this paper, we propose a new theoretical analysis of the amount of musical noise generated in several noise reduction methods with a decision-directed a priori SNR estimator using higher-order statistics. In our previous study, a musical noise assessment based on kurtosis has been successfully applied to spectral subtraction and the Wiener filter. Howev...
Conference Paper
An example-based response generation is a robust and practical approach for a real-environment information guidance system. However, this framework cannot reflect differences in nuance, because the set of answer sentences are fixed beforehand. To overcome this issue, we have proposed response generation using a statistical machine translation techn...
Conference Paper
In this paper, speech recognition accuracy improvement is addressed for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, to achieve high recognition accuracy for the early utterance of the target speaker, we introduce a new rapid ICA initialization method combining robot image information and a prestored initial separation fi...
Conference Paper
In this paper, we apply the higher-order statistics parameter to automatically improve the performance of blind speech enhancement. Recently, a method to suppress both diffuse background noise and late reverberation part of speech has been proposed combining blind signal extraction and Wiener filtering. However, this method requires a good strategy...
Conference Paper
In this paper, we propose a modified musical-noise-free blind spatial subtraction array (BSSA) based on ICA-based iterative noise estimation with channel selection. In our previous study, we have proposed a modified BSSA consisting of dynamic noise estimation by ICA and musical-noise-free iterative spectral subtraction (SS), where multiple iterativ...
Conference Paper
To build an acoustic system that can maintain the localization of sound images included in stereo mixed signals, we propose a new object-based up-mixer that performs sound source separation and sound location estimation. First, in a preliminary experiment, we show the effectiveness of sound location estimation using the proposed up-mixer via object...
Conference Paper
In this paper, we address an improved method of noise reduction used in multichannel Non-Audible Murmur (NAM) based on blind source separation. Recently, speech processing with NAM has been proposed for applying versatile speech interface into quiet environments where we hesitate to utter. NAM is a very soft whispered voice signal detected with the...
Conference Paper
In this paper, we address some variations of the source-localization-preserved MMSE STSA estimator used for binaural hearing aids. In our previous work, the sound-localization-preserved MMSE STSA estimator with ICA-based noise estimation has been proposed. However, this conventional method is based on an approximated optimization criterion and does...
Conference Paper
In this paper, we propose a new theoretical analysis of the amount of musical noise generated in several noise reduction methods with a decision-directed a priori SNR estimator using higher-order statistics. In our previous study, a musical noise assessment based on kurtosis has been successfully applied to spectral subtraction. However, this appro...
Article
In this paper, we propose a new sound-localization-preserved noise reduction method for binaural hearing-aid systems. In previous work, the sound-localization-preserved MMSE STSA estimator with ICA-based noise estimation was proposed. This method can preserve sound localization by using common spectral gain for noise reduction at eac...
Article
The need for robust pronunciation annotation over out-of-vocabulary (OOV) words has been increasing with the development of an application that deals with proper nouns and brand-new words, such as Voice Search. In robust pronunciation annotation over OOV words, the alignment between graphemes and phonemes is vital data. For a many-to-many alignment...
Article
In this paper, we address the separation of multiple instrumental sources based on semi-supervised nonnegative matrix factorization (SNMF) and propose a new constrained SNMF. Recently, various types of SNMF have been proposed. In particular, we focus our attention on one type of SNMF that utilizes information on a priori bases. Indeed, this type of...
Conference Paper
In this paper, speech recognition accuracy improvement is addressed for ICA-based multichannel noise reduction in a spoken-dialogue robot. First, a new permutation solving method using a probability statistics model is proposed for realistic sound mixtures consisting of point-source speech and diffuse noise. Next, to achieve high recognition accuracy...
Article
In this paper, we propose a musical-noise-controllable algorithm for array signal processing with the aim for high-performance and high-quality noise reduction. Recently, many methods of integrating linear microphone array signal processing and nonlinear signal processing for noise reduction have been studied, but these methods often suffer from th...
Article
In this paper, we provide a new theoretical analysis of the amount of musical noise generated via generalized spectral subtraction based on higher order statistics. Power spectral subtraction is the most commonly used spectral subtraction method, and in our previous study a musical noise assessment theory limited to the power spectral domain was pr...
Conference Paper
In this paper, we propose a structure-generalized parametric blind spatial subtraction array (BSSA), and the theoretical analysis of the amounts of musical noise and speech distortion is conducted via higher-order statistics. We theoretically prove a tradeoff between the amounts of musical noise and speech distortion in various BSSA structures. Fro...
Conference Paper
Full-text available
This paper describes a novel approach based on voice conversion (VC) to speaker-adaptive speech synthesis for speech-to-speech translation. Voice quality of translated speech in an output language is usually different from that of an input speaker of the translation system since a text-to-speech system is developed with another speaker’s voices in t...
Conference Paper
In this paper, to achieve high-quality speech enhancement, we introduce the generalized minimum mean-square error short-time spectral amplitude estimator with a new blind prior estimation of the speech probability density function (p.d.f.). To deal with various types of speech signals with different p.d.f., we propose an algorithm of speech kurtosis...
Conference Paper
Full-text available
For a reproduced sound field, the competing goals between the listening area and reproduction accuracy in an actual environment is one of the most important problems in sound field reproduction using loudspeakers. In this paper, we propose a new method of balancing these goals with absolute accuracy using an inverse filter of the room acoustics: th...
Conference Paper
Full-text available
In this study, we evaluate our proposed methods for enhancing alaryngeal speech based on statistical voice conversion techniques. Voice conversion based on a Gaussian mixture model has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-...
Conference Paper
Full-text available
In this paper we present a novel approach to acoustic model training for non-audible murmur (NAM) recognition using normal speech data transformed into NAM data. NAM is an extremely soft murmur that is so quiet that people around the speaker can hardly hear it. It is detected directly through the soft tissue of the head using a special body-conductiv...
Conference Paper
Full-text available
In this paper, to automatically generate musical thumbnails that contain the main part of the original tune, we propose a new estimation method for identifying structure changes in stereo tunes based on localization information. The proposed method can estimate the main parts of a musical tune by analyzing the specific timing when localization info...
Conference Paper
Recently, one of the authors has reported that the amount of generated musical noise is strongly correlated with higher-order statistics of the power spectra. On the basis of this finding, in this paper, we provide a new theoretical analysis of the amount of musical noise generated via the Wiener filtering family. Our theoretical analysis allows th...
Article
In this paper, an improved parametric postfiltering is introduced in our previously proposed blind spatial subtraction array (BSSA), and its theoretical analysis of the amounts of musical noise and noise reduction is conducted via higher-order statistics. Compared with the conventional BSSA, it is clarified that parametric BSSA can improve speech r...
Article
Full-text available
In this paper, we present a comparative study on directly aligned multi point controlled wavefront synthesis (DMCWS) and wave field synthesis (WFS) for the realization of a high-accuracy sound reproduction system, and the amplitude, phase and attenuation characteristics of the wavefronts generated by DMCWS and WFS are assessed. First, in the case o...
Article
Full-text available
Example-based question answering (QA) is an effective approach for real-world spoken dialogue systems. A limitation of example-based QA is that a system cannot appropriately respond to a user's question if a similar question-answer pair does not exist in the question and answer database (QADB). For a robust spoken dialogue system, it is import...
Article
Full-text available
In this paper, we propose a computationally efficient method of body-conducted voice conversion. A body-conducted voice is robust against external noise but its voice quality is severely degraded by mechanisms of body-conduction. The conventional body-conducted voice conversion method effectively enhances the body-conducted voice by converting b...
Article
Full-text available
An example-based question answering (QA) is a robust and practical approach for a real-environment information guidance system. However, it cannot appropriately respond to unexpected user's utterances if a similar example of a question-answer pair does not exist in the QA database; in addition, the answer sentences cannot reflect differences in nua...
Article
Full-text available
An alignment between graphemes and phonemes is vital data to annotate the pronunciation for out-of-vocabulary words. We desire an alignment to be (1) many-to-many and (2) fine-grained. A traditional one-to-one alignment model does not represent an intuitive mapping for logograms, such as Chinese characters, and has previously reported an inferior p...
Article
Stacked generalization is a method that allows combining output of multiple classifiers using a second-level classification, minimizing the generalization error of first-level classifiers and achieving greater predictive accuracy. In a previous work, we compared the performance of support vector machine (SVM) with radial basis function (RBF) kernel...
Article
Full-text available
We conduct an objective analysis on musical noise generated by two methods of integrating microphone array signal processing and spectral subtraction. To obtain better noise reduction, methods of integrating microphone array signal processing and nonlinear signal processing have been researched. However, nonlinear signal processing often generates...
Conference Paper
In this paper, we propose a microphone array structure for a spoken-oriented robot dialog system that is designed to discriminate the direction of arrival (DOA) of the target speech and that of the robot internal noise. First, we investigate the performance of the noise estimation conducted by semi-blind source separation (SBSS) in the presence of both...
Article
In this paper, we propose a new blind speech extraction microphone array combining an independent component analysis (ICA)-based noise estimator and nonlinear signal processing for achieving high-quality speech enhancement. The proposed method consists of three parts, namely, the ICA-based noise estimator for a robust target cancellation, channel-w...
Conference Paper
In this paper, we provide a new theoretical analysis of the amount of musical noise generated via iterative spectral subtraction based on higher-order statistics. To achieve high-quality noise reduction with low musical noise, the iterative spectral subtraction method, i.e., recursively applied weak nonlinear signal processing, has been proposed. A...
Conference Paper
Full-text available
In this paper, we propose a new blind speech extraction method combining ICA-based dynamic noise estimation and a generalized minimum mean-square-error short-time spectral amplitude estimator of the target speech. To deal with various types of speech signals with different probability density functions (p.d.f.), we also introduce a spectral-subtrac...
Conference Paper
Full-text available
This work addresses the classification in topics of utterances in Japanese, received by a speech-oriented guidance system operating in a real environment. For this, we compare the performance of Support Vector Machine and PrefixSpan Boosting, against a conventional Maximum Entropy classification method. We are interested in evaluating their strengt...
Conference Paper
Full-text available
In our previous work, we proposed a speaking-aid system converting electrolaryngeal speech (EL speech) to normal speech using a statistical voice conversion technique. The main weakness of our system is the difficulty of estimating natural contours of the fundamental frequency (F0) from EL speech including only built-in F0 contours. This paper prop...
Conference Paper
Full-text available
This paper presents adaptive voice-quality control methods based on one-to-many eigenvoice conversion. To intuitively control the converted voice quality by manipulating a small number of control parameters, a multiple regression Gaussian mixture model (MR-GMM) has been proposed. The MR-GMM also allows us to estimate the optimum control parameters...
Article
Full-text available
In this paper, we present linear transformation algorithms for many-to-one voice conversion (VC). Many-to-one VC is a technique for converting an arbitrary source speaker's voice into the target speaker's voice. A conversion model previously developed between many prestored source speakers and the target speaker is adapted into a new source speak...
Article
Full-text available
We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only...
Article
Full-text available
This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness...
Article
Non-audible murmur (NAM) is an unvoiced speech signal that can be received through the body tissue with the use of special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. The authors had previously reported experimental results for NAM recognition using a stethoscopic and a silicon NAM microphone. Using a small amount of...
Article
Full-text available
We have so far proposed a speaking-aid system for laryngectomees using a statistical voice conversion technique. In the proposed system, artificial speech articulated with extremely small sound source signals is detected with a Non-Audible Murmur (NAM) microphone, and then, the detected artificial speech is converted into more natural voice in a pr...
Article
Full-text available
In this paper, we describe a novel model training method for one-to-many eigenvoice conversion (EVC). One-to-many EVC is a technique for converting a specific source speaker's voice into an arbitrary target speaker's voice. An eigenvoice Gaussian mixture model (EV-GMM) is trained in advance using multiple parallel data sets consisting of utterance-...
Article
In this paper, we propose a musical-noise-controllable algorithm for array signal processing with the aim for high-performance and high-quality noise reduction. Recently, many methods of integrating linear microphone array signal processing and nonlinear signal processing for noise reduction have been studied, but these methods often suffer from th...
Conference Paper
Full-text available
This paper studies the blind estimation of the diffuse background noise for the hands-free speech interface. Some recent papers showed that it is possible to use blind signal separation (BSS) to estimate the diffuse background noise by suppressing the speech component after all the components were separated. In particular, the scale indeterminacy of...
Conference Paper
Full-text available
In this paper, we propose a new blind speech extraction method consisting of a minimum mean-square error short-time spectral amplitude (MMSE STSA) estimator and noise estimation based on independent component analysis (ICA). First, we perform a computer simulation using the artificial noise whose stationarity could be controlled parametrically, and...
Conference Paper
Full-text available
This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices sound unnatural. To improve the intelligibility and naturalness of esophageal speech, we propose a v...
Conference Paper
Several recent methods for speech enhancement in the presence of diffuse background noise use frequency domain blind signal separation to estimate the diffuse noise and a nonlinear post filter to suppress this estimated noise. This paper presents a frequency domain blind signal extraction method for estimating the diffuse noise in place of the frequenc...
Conference Paper
Full-text available
In this paper, we conduct a theoretical analysis of the amount of musical noise generated via methods of integrating beamforming and spectral subtraction (SS) based on higher-order statistics under the same noise reduction performance condition. In our previous analysis, we did not consider the effect of flooring technique in SS and the fact that t...
Article
The physical characteristics of weak body-conducted vocal-tract resonance signals called non-audible murmur (NAM) and the acoustic characteristics of three sensors developed for detecting these signals have been investigated. NAM signals attenuate 50 dB at 1 kHz; this attenuation consists of 30-dB full-range attenuation due to air-to-body transmiss...
Conference Paper
Full-text available
This paper presents a novel training method of an eigenvoice Gaussian mixture model (EV-GMM) effectively using non-parallel data sets for many-to-many eigenvoice conversion, which is a technique for converting an arbitrary source speaker's voice into an arbitrary target speaker's voice. In the proposed method, an initial EV-GMM is trained with the...
Article
Full-text available
In this paper, we propose a new extension framework of multichannel audio coding based on temporal quantization of spatial information. In our previous study, multiple-audio-object signal can be encoded/decoded via prototypes of directional clustering for each audio object. This paper, first, pays attention to the fact that quantized information co...
Article
Full-text available
The performance of automatic speech recognition for signals acquired through a hands-free speech interface is limited by the adverse effect of the noise and the reverberation. Frequency domain blind signal processing techniques, like blind signal separation, have been used with success for suppressing the noise in real situations but they usuall...
Article
Full-text available
In this paper, we provide a new theoretical analysis of the amount of musical noise generated via generalized spectral subtraction based on higher-order statistics. Power spectral subtraction is the most commonly used spectral subtraction method, and in our previous study a musical noise assessment theory limited to the power spectral domain was p...
Article
Full-text available
This paper presents an acoustic compensation method in body-conducted speech conversion that automatically compensates for acoustic differences caused by changes in recording conditions. An enhancement process for body-conducted speech recorded with a Non-Audible Murmur (NAM) microphone has successfully applied a statistical voice conversion tech...
Article
Full-text available
This paper proposes speaking-aid systems based on one-to-many eigenvoice conversion (EVC) for enhancing three types of alaryngeal speech: esophageal speech; electrolaryngeal speech; and body-conducted silent electrolaryngeal speech. Although alaryngeal speech allows laryngectomees to utter speech sounds, it suffers from lack of naturalness and spe...
Article
In this work, we address the topic classification of utterances in Japanese received by a speech-oriented guidance system operating in a real environment. The implementation of this kind of system requires the collection and manual labeling of actual users' utterances, which is a costly process. Because of this, we are interested in evaluating...
Article
In this study, in order to automatically generate thumbnail music that has a main part of the original tune, we propose a new estimation method of structure changes in stereo tunes based on localization information. The proposed method can estimate the main parts of the music tune by analyzing specific timing when localization information changes u...
Article
Full-text available
In this paper, we propose a fast and versatile blind source separation including closed-form estimation of sources' probability density functions (PDFs), where the ICA's activation function is automatically adapted to various noise conditions. In the proposed method, closed-form second-order ICA and closed-form PDF estimation are introduced as a co...
Conference Paper
Full-text available
Binaural cue coding (BCC), a representative low-bit-rate coding method for multichannel audio, generates large distortion when the audio data have a complex spatial image, as in a symphony. This distortion is caused by the low frequency resolution of the spatial information, because BCC quantizes the localization parameters. In this paper, we propose a new coding...
Conference Paper
The speech enhancement architecture presented in this paper is specifically developed for hands-free robot spoken dialog systems. It is designed to take advantage of additional sensors installed inside the robot to record the internal noises. First, a modified frequency-domain blind signal separation (FD-BSS) gives estimates of the noises generated...
Article
Full-text available
APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Speech Processing (7 October 2009). This paper presents a novel method of enhancing esophageal speech based on statistical voice conversion. Esophageal speech is one of the speaking method...
Conference Paper
In this paper, we propose an algorithm for selecting the appropriate structure to reduce musical-noise generation in methods that integrate a microphone array with spectral subtraction. In our previous work, we analyzed the musical-noise reduction structure of such integration methods based on higher-order statistics. H...
Conference Paper
This paper presents a new frequency-domain blind signal extraction (FD-BSE) method for the extraction of target speech in the presence of diffuse background noise. This is a fast alternative to frequency-domain blind signal separation (FD-BSS) for hands-free speech interfaces. Like the FD-BSS approach, the speech signal is enhanced by using a nonlinea...
Conference Paper
Full-text available
This paper presents a statistical approach to synthesizing emphasized speech based on hidden Markov models (HMMs). Context-dependent HMMs are trained using emphasized speech data uttered by intentionally emphasizing an arbitrary accentual phrase in a sentence. To model acoustic characteristics of emphasized speech, new contextual factors describing...
Conference Paper
Full-text available
In this paper, we review our recent research on technologies for processing body-conducted speech detected with a Non-Audible Murmur (NAM) microphone. The NAM microphone enables us to detect various types of body-conducted speech such as extremely soft whisper, normal speech, and so on. Moreover, it is robust against external noise due to its noise-pro...
Conference Paper
Full-text available
In this paper, we propose many-to-many voice conversion (VC) techniques to convert an arbitrary source speaker's voice into an arbitrary target speaker's voice. We have proposed one-to-many eigenvoice conversion (EVC) and many-to-one EVC. In EVC, an eigenvoice Gaussian mixture model (EV-GMM) is trained in advance using multiple parallel data s...
Article
Full-text available
In a spoken dialog system, the example-based response generation method generates a response by searching a dialog example database for the example question most similar to an input user utterance. That method has the advantage of ease of system expansion. It requires, however, a number of utterance examples whose correct responses are labeled. In...
Article
We develop a new blind source separation (BSS) microphone named SSM-001, which can separate multiple sounds in real time under noisy conditions. The BSS microphone is based on our previously proposed BSS algorithm, which combines Single-Input Multiple-Output (SIMO)-model-based BSS and SIMO-model-based binary masking. We modify this algorithm and im...
Article
We propose a new blind spatial subtraction array (BSSA) consisting of a noise estimator based on independent component analysis (ICA) for efficient speech enhancement. In this paper, first, we theoretically and experimentally point out that ICA is proficient in noise estimation under a non-point-source noise condition rather than in speech estimati...
Conference Paper
Full-text available
In this paper, we analyze the reduction of musical noise in methods that integrate microphone array signal processing with nonlinear signal processing. Recently, such integration methods have been studied to achieve better noise reduction. However, non-linear signal proces...
Conference Paper
In this paper, we describe and review our recent development of a hands-free speech dialogue system used for railway station guidance. In deployment at a real railway station, robustness against reverberation and noise is the most essential issue for the dialogue system. To address the problem, we introduce two key techniques in our p...
Article
Full-text available
A new blind source extraction method for widespread noise conditions is proposed, which is based on multiple frequency-domain independent component analysis (FDICA) combining projection back and spectral subtraction. In addition, we implement the proposed method on a digital signal processor (DSP) for more realistic real-time operation, and develop...
