J. Even

Nara Institute of Science and Technology, Ikoma, Nara, Japan

Publications (36) · Total impact: 31.53

  • ABSTRACT: This paper presents a multi-modal sensor approach for mapping sound sources using an omni-directional microphone array on an autonomous mobile robot. A fusion of audio data (from the microphone array), odometry information, and laser range scan data (from the robot) is used to precisely localize and map the audio sources in an environment. An audio map is created while the robot autonomously navigates through the environment, continuously generating audio scans with a steered response power (SRP) algorithm. Using the poses of the robot, rays are cast in the map in all directions given by the SRP. Each occupied cell in the geometric map hit by a ray is then assigned a likelihood of containing a sound source, derived from the SRP at that particular instant. Since the localization of the robot is probabilistic, the uncertainty in the robot's pose in the geometric map is propagated to the occupied cells hit during the ray casting. This process is repeated while the robot is in motion, and the map is updated after every audio scan. The generated sound maps can be reused, with the robot updating them as it identifies changes in the audio environment.
    Robotics and Automation (ICRA), 2013 IEEE International Conference on; 01/2013
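    A minimal sketch of the map update described in the abstract above, assuming a 2-D occupancy grid, a known robot pose, and one SRP power value per steering direction; the function name and the additive likelihood update are illustrative, and the propagation of pose uncertainty is omitted.

      import numpy as np

      def update_audio_map(audio_map, occupancy, pose, srp_powers, directions,
                           resolution=0.05, max_range=10.0):
          """Cast one ray per SRP steering direction and accumulate the SRP
          power (as a pseudo-likelihood) in the first occupied cell it hits."""
          x, y, theta = pose
          for power, ang in zip(srp_powers, directions):
              a = theta + ang                                  # direction in the map frame
              for r in np.arange(resolution, max_range, resolution):
                  i = int(np.floor((x + r * np.cos(a)) / resolution))
                  j = int(np.floor((y + r * np.sin(a)) / resolution))
                  if not (0 <= i < occupancy.shape[0] and 0 <= j < occupancy.shape[1]):
                      break                                    # ray left the map
                  if occupancy[i, j]:                          # first obstacle on the ray
                      audio_map[i, j] += power                 # likelihood of a source here
                      break
          return audio_map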
  • ABSTRACT: This paper presents a method for detecting moving entities that are in the robot's path but not in the field of view of sensors such as laser scanners, cameras, or ultrasonic sensors. The proposed system makes use of passive acoustic localization methods, which receive information from occluded regions (at intersections or corners) thanks to the multipath nature of sound propagation. Unlike conventional sensors, this method does not require a line of sight. In particular, specular reflections in the environment make it possible to detect moving entities that emit sound, such as a walking person or a rolling cart. This idea is exploited for safe navigation of a mobile platform at intersections. The passive acoustic localization output is combined with a 3D geometric map of the environment that is precise enough to estimate sound propagation and reflection using ray casting methods. This gives the robot the ability to detect a moving entity outside the field of view of the sensors that require a line of sight. The robot can then recalculate its path and wait until the detected entity has left its path, so that it is safe to move to its destination. To illustrate the performance of the proposed method, a comparison of the robot's navigation with and without audio sensing is provided for several intersection scenarios.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
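    An illustrative sketch of the ray-casting idea: a detected DOA ray is followed into the geometric map and, on hitting a wall, continued as a specular reflection, so a sound-emitting entity around a corner can be placed in the occluded region. The hit_wall and wall_normal callables, the step size, and the single-bounce limit are assumptions, not the paper's exact procedure.

      import numpy as np

      def trace_with_reflection(origin, direction, hit_wall, wall_normal,
                                step=0.05, max_range=15.0, max_bounces=1):
          """Return the points visited by a 2-D ray that may reflect off walls."""
          p = np.asarray(origin, dtype=float)
          d = np.asarray(direction, dtype=float)
          d = d / np.linalg.norm(d)
          points, bounces, travelled = [], 0, 0.0
          while travelled < max_range:
              p = p + step * d
              travelled += step
              points.append(p.copy())
              if hit_wall(p):                       # wall cell in the geometric map
                  if bounces >= max_bounces:
                      break
                  n = wall_normal(p)                # unit normal of the wall surface
                  d = d - 2.0 * np.dot(d, n) * n    # specular reflection of the ray
                  bounces += 1
          return points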
  • ABSTRACT: This paper presents a method for mapping the radiated sound intensity of an environment using an autonomous mobile platform. The sound intensities radiated by the objects are estimated by combining the sound intensity at the platform's position (estimated with a steered response power algorithm) and the distances to the objects (estimated using laser range finders). By combining the estimated sound intensity at the platform's position with the platform's pose obtained from a particle-filter-based localization algorithm, the sound intensity radiated from the objects is registered in the cells of a grid map covering the environment. This procedure creates a map of the radiated sound intensity that contains information about the sound directivity. To illustrate the effectiveness of the proposed method, a map of radiated sound intensity is created for a test environment. Then the position and the directivity of the sound sources in the test environment are estimated from this map.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
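    A minimal sketch of the back-projection step, assuming free-field spherical spreading (intensity falls off as 1/r²); the paper's actual propagation model and grid registration scheme may differ.

      def radiated_intensity(intensity_at_robot, r, r_ref=1.0):
          """Back-project the intensity measured at the platform to the
          intensity the object radiates at a reference distance r_ref,
          assuming spherical spreading (inverse-square law)."""
          return intensity_at_robot * (r / r_ref) ** 2

      # e.g. an object 3 m away, measured at 1e-6 W/m^2 at the platform,
      # radiates about 9e-6 W/m^2 at the 1 m reference distance.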
  • C.T. Ishi, J. Even, N. Hagita
    ABSTRACT: We propose a method for estimating sound source locations in 3D space by integrating the sound directions estimated by multiple microphone arrays and taking advantage of reflection information. Two types of sources with different directivity properties (human speech and loudspeaker speech) were evaluated at different positions and orientations. Experimental results showed the effectiveness of using reflection information, depending on the position and orientation of the sound sources relative to the arrays and walls, and on the source type. The use of reflection information increased the source position detection rates by 10% on average and by up to 60% in the best case.
    Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on; 01/2013
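    A hedged sketch of the DOA-integration step: the 3-D point closest, in the least-squares sense, to all the direction rays reported by the arrays. The reflection handling (e.g. mirroring rays across walls) is not shown, and the function name is illustrative.

      import numpy as np

      def triangulate(origins, directions):
          """origins: (N, 3) array positions; directions: (N, 3) DOA vectors.
          Returns the least-squares intersection point of the N rays."""
          A = np.zeros((3, 3))
          b = np.zeros(3)
          for p, u in zip(origins, directions):
              u = u / np.linalg.norm(u)
              P = np.eye(3) - np.outer(u, u)   # projector onto the plane perpendicular to u
              A += P
              b += P @ p
          return np.linalg.solve(A, b)         # fails only if all rays are parallel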
  • ABSTRACT: This paper focuses on the problem of environmental noise in human-human communication and in automatic speech recognition. To deal with this problem, the use of alternative acoustic sensors, which are attached to the talker and receive the uttered speech through skin or bone, is investigated. In the current study, throat microphones and ear bone microphones are integrated with standard microphones using several fusion methods. The results show that recognition rates in noisy environments increase drastically when these sensors are integrated with standard microphones. Moreover, the system shows no recognition degradation in clean environments; in fact, recognition rates there also increase slightly. Using late fusion to integrate a throat microphone, an ear bone microphone, and a standard microphone, we achieved a 44% relative improvement in recognition rate in a noisy environment and a 24% relative improvement in a clean environment.
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on; 01/2012
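    An illustrative late-fusion rule, assuming each sensor's recognizer returns a per-hypothesis log-likelihood; the weighting and hypothesis combination here are placeholder choices, not the paper's trained fusion.

      def late_fusion(scores_per_sensor, weights):
          """scores_per_sensor: one {hypothesis: log_likelihood} dict per
          sensor (throat, ear bone, standard). Returns the hypothesis with
          the highest weighted sum of scores."""
          fused = {}
          for scores, w in zip(scores_per_sensor, weights):
              for hyp, ll in scores.items():
                  fused[hyp] = fused.get(hyp, 0.0) + w * ll
          return max(fused, key=fused.get)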
  • ABSTRACT: This paper presents an audio monitoring system for detecting and identifying people engaged in a conversation. The proposed method is hands-free, as it uses a microphone array to acquire the sound. A particularity of the approach is the use of a human tracker based on laser range finders. The human tracker monitors the locations of people; local steered response power is then used to detect which people are speaking and to precisely localize their mouths. An audio stream is then created for each person and used to perform speaker identification. Experimental results show that the use of the human tracker has several benefits compared to an audio-only approach.
    Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on; 01/2012
  • ABSTRACT: Parkinson's disease (PD) is a severe disease with many symptoms, including speech disorders. Although many methods exist to treat some of PD's symptoms, therapies for speech impairment are not effective and satisfactory, leaving an open area of research. The current project aims at taking advantage of the Lombard reflex to improve the speech loudness of PD patients. As a first step, it was confirmed that Japanese PD patients experience the Lombard reflex, and the perception of PD patients' speech was evaluated by several subjects. In a following step, methods based on masking sound will be used for intensive training and for self-training of PD patients; after intensive training, PD patients may be able to talk louder even without masking noise. In addition, the design and development of a masking-sound device that PD patients can use while on the phone is under consideration.
    Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on; 01/2012
  • J. Even, N. Hagita
    ABSTRACT: This paper presents a novel method for solving the permutation problem inherent to frequency domain blind signal separation of multiple simultaneous speakers. Like conventional methods, the proposed method exploits the directions of arrival (DOA) of the different speakers to resolve the permutation, but it is designed to exploit the information from pairs of microphones that are usually discarded because of spatial aliasing. The proposed method is based on an explicit expression of the effect of spatial aliasing on the DOA estimation. By introducing a vector of integer values into the equation used to estimate the DOA, it becomes possible to compensate for the spatial aliasing by solving the equation with respect to that vector. The proposed method operates sequentially along the frequency bins. First, the spatial aliasing is compensated by an iterative procedure that also detects the permutations. Then the detected permutations are corrected and the DOAs are estimated using all available pairs of microphones. Simulation results demonstrate the effectiveness of the method.
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on; 06/2011
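    A sketch of the aliasing model underlying the abstract: the inter-microphone phase at frequency f is only observed modulo 2π, so each candidate integer k yields a candidate DOA, and consistency across frequency bins selects the right one. The far-field model and all names are illustrative.

      import numpy as np

      def candidate_doas(phase_obs, f, d, c=343.0):
          """All DOAs (radians, from broadside) consistent with an observed
          phase difference at frequency f for microphone spacing d."""
          doas = []
          k_max = int(np.ceil(f * d / c)) + 1
          for k in range(-k_max, k_max + 1):
              s = (phase_obs + 2 * np.pi * k) * c / (2 * np.pi * f * d)
              if -1.0 <= s <= 1.0:                 # physically valid sin(theta)
                  doas.append(np.arcsin(s))
          return doas

      # Above f = c / (2 * d), several values of k survive: this is the
      # spatial aliasing that the iterative procedure must resolve.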
  • INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31, 2011; 01/2011
  • ABSTRACT: Small informal meetings of two to four participants are very common in work environments. For this reason, a convenient way of recording and archiving these meetings is of great interest. In order to archive such meetings efficiently, an important task is to keep track of “who talked when” during a meeting. This paper proposes a new multi-modal approach to this speaker activity detection problem. One novelty of the proposed approach is that it uses a human tracker that relies on scanning laser range finders (LRFs) to localize the participants. This choice is especially relevant for robotic applications, as robots are often equipped with LRFs for navigation purposes. In the proposed system, a tabletop microphone array in the center of the meeting room acquires the audio data while the LRF-based human tracker monitors the movement of the participants. Speaker activity detection is then performed using Gaussian mixture models that were trained beforehand. An experiment reproducing a meeting configuration demonstrates the performance of the system for speaker activity detection. In particular, the proposed hands-free system maintains a good level of performance compared with close-talking microphones, even while participants are speaking simultaneously.
    2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2011, San Francisco, CA, USA, September 25-30, 2011; 01/2011
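    A minimal sketch of the GMM decision step, assuming "speech" and "silence" mixtures trained beforehand on features extracted at each tracked participant's position; the feature choice and the two-class setup are assumptions.

      from sklearn.mixture import GaussianMixture

      def speaker_activity(features, gmm_speech, gmm_silence):
          """features: (T, D) feature frames for one tracked participant.
          Returns one boolean per frame: True where speech is more likely."""
          return gmm_speech.score_samples(features) > gmm_silence.score_samples(features)

      # Training, beforehand:  gmm_speech = GaussianMixture(8).fit(speech_frames)
      #                        gmm_silence = GaussianMixture(8).fit(silence_frames)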
  • ABSTRACT: In this paper, we propose a microphone array structure for a speech-oriented robot dialog system, designed to discriminate between the direction of arrival (DOA) of the target speech and that of the robot's internal noise. First, we investigate the performance of the noise estimation conducted by semi-blind source separation (SBSS) in the presence of both diffuse background noise and robot internal noise. The result indicates that the SBSS noise estimate is poor. Next, we analyze the DOA of the robot internal noise to determine the reason for this result; we find that the internal noise is always in phase at the microphone array and overlaps spatially with the target speech. Based on this fact, we propose changing the microphone array structure from a broadside array to an end-fire array in order to discriminate the DOAs of the target speech and the internal noise. Finally, we evaluate the word accuracy in a dictation task in the presence of both diffuse background noise and robot internal noise to confirm the advantage of the proposed structure. Simulation results show that the proposed microphone array structure yields an approximately 10% improvement in speech recognition performance.
    Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on; 11/2010
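    A tiny numeric illustration of the geometric argument, assuming a two-microphone far-field model: noise that is in phase at the array behaves like a broadside (90°) source, so pointing the array end-fire at the user maximizes the delay gap between user and noise. The spacing and names are arbitrary.

      import numpy as np

      def intermic_delay(theta_deg, d=0.04, c=343.0):
          """Far-field delay (s) between two mics spaced d apart; theta is
          measured from the array axis (0 deg = end-fire direction)."""
          return d * np.cos(np.radians(theta_deg)) / c

      print(intermic_delay(0))    # user on the end-fire axis: maximal delay
      print(intermic_delay(90))   # in-phase internal noise: zero delay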
  • ABSTRACT: Several recent methods for speech enhancement in the presence of diffuse background noise use frequency domain blind signal separation to estimate the diffuse noise and a nonlinear post-filter to suppress this estimated noise. This paper presents a frequency domain blind signal extraction method for estimating the diffuse noise in place of the frequency domain blind signal separation. The method is based on the minimization, by means of a complex Newton algorithm, of a cost function that depends on the modulus of the extracted component. The proposed complex Newton method is compared to gradient descent on the same cost function and to the blind signal separation approach.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010
  • ABSTRACT: This paper studies the blind estimation of diffuse background noise for hands-free speech interfaces. Recent papers have shown that it is possible to use blind signal separation (BSS) to estimate the diffuse background noise by suppressing the speech component after all the components have been separated. In particular, the scale indeterminacy of BSS is avoided by using the projection back method. In this paper, we study an alternative to projection back for the noise estimation and justify the use of blind signal extraction (BSE) rather than BSS.
    Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on; 04/2010
  • INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010; 01/2010
  • ABSTRACT: The speech enhancement architecture presented in this paper is specifically developed for hands-free robot spoken dialog systems. It is designed to take advantage of additional sensors installed inside the robot to record the internal noises. First, a modified frequency domain blind signal separation (FD-BSS) gives estimates of the noises generated outside and inside the robot. These noises are then canceled from the acquired speech by a multichannel Wiener post-filter. Experimental results show the recognition improvement for a dictation task in the presence of both diffuse background noise and internal noises.
    Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on; 11/2009
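    A hedged sketch of the post-filter stage: a per-bin Wiener gain built from the separated speech and noise estimates. The paper's multichannel formulation is richer than this single-channel reduction, and the names are illustrative.

      import numpy as np

      def wiener_postfilter(speech_stft, noise_stft, floor=1e-10):
          """speech_stft, noise_stft: complex STFTs (freq x frames) of the
          FD-BSS speech estimate and of the estimated external + internal
          noise. Returns the post-filtered speech STFT."""
          ps = np.abs(speech_stft) ** 2
          pn = np.abs(noise_stft) ** 2
          gain = ps / np.maximum(ps + pn, floor)   # Wiener gain in [0, 1]
          return gain * speech_stft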
  • ABSTRACT: This paper presents a new frequency domain blind signal extraction (FD-BSE) method for extracting target speech in the presence of diffuse background noise. It is a fast alternative to frequency domain blind signal separation (FD-BSS) for hands-free speech interfaces. As in the FD-BSS approach, the speech signal is enhanced by a nonlinear filter that suppresses the noise estimated by the blind method. Simulation results in a realistic environment show the effectiveness of the proposed method.
    Statistical Signal Processing, 2009. SSP '09. IEEE/SP 15th Workshop on; 10/2009
  • ABSTRACT: This paper presents a method for enhancing target speech in the presence of a jammer and diffuse background noise. The method is based on frequency domain blind signal separation (FD-BSS). In particular, the permutation resolution uses both the direction of arrival (DOA) information contained in the estimated filters and statistical features computed from the estimated signals. This enables the separation of the target speech, the jammer, and the diffuse background noise, which is not possible using the DOA or the statistical features alone. Since FD-BSS cannot provide a good estimate of the target speech in the presence of diffuse noise, a channel-wise modified Wiener filter is proposed as post-processing to further enhance the target speech.
    Independent Component Analysis and Signal Separation, 8th International Conference, ICA 2009, Paraty, Brazil, March 15-18, 2009. Proceedings; 01/2009
  • ABSTRACT: This paper presents a new blind signal extraction method based on mutual information. Conventional blind signal extraction methods minimize the mutual information between the extracted signal and the remaining signals indirectly, by way of a cost function. The proposed method minimizes this mutual information directly through gradient descent. The derivation of the gradient exploits recent results on the differential of the mutual information, and the implementation is based on kernel density estimation. Simulation results show the performance of the proposed approach and underline the improvement obtained by using the proposed method as a post-processing step for conventional methods.
    Machine Learning for Signal Processing, 2008. MLSP 2008. IEEE Workshop on; 11/2008
  • ABSTRACT: The model of the human/machine hands-free speech interface is defined as a point source (the user's voice) plus diffuse background noise. This situation is very different from the usual cocktail-party model (separation of a mixture of speech signals) that is usually treated in frequency domain blind signal separation (FD-BSS). In particular, the fast permutation solvers proposed for the cocktail-party model result in poor separation performance in this case. In order to resolve the permutation more efficiently, this paper proposes a new approach that exploits the statistical discrepancy between the target speech and the diffuse background noise.
    Intelligent Robots and Systems, 2008. IROS 2008. IEEE/RSJ International Conference on; 10/2008
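    A sketch in the spirit of the abstract, with kurtosis standing in for the unspecified statistical discrepancy: in each frequency bin, the separated component whose magnitudes are most super-Gaussian is labeled as speech, the rest as diffuse noise. This is an assumed concrete choice, not necessarily the paper's statistic.

      import numpy as np
      from scipy.stats import kurtosis

      def label_speech_channel(separated_bin):
          """separated_bin: (channels, frames) complex STFT values of one
          frequency bin after separation. Returns the index of the channel
          to assign to the target speech in this bin."""
          k = [kurtosis(np.abs(y)) for y in separated_bin]
          return int(np.argmax(k))      # speech is spikier than diffuse noise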
  • ABSTRACT: A robust dereverberation technique for real-time hands-free speech recognition applications is proposed. Real-time implementation is made possible by avoiding time-consuming blind estimation. Instead, we use the measured impulse response and identify its late reflection components. Using this information, together with the concept of spectral subtraction (SS), we remove the effects of the late reflections from the reverberant signal. After dereverberation, only the early components remain and are used as input to the recognizer. In this method, multi-band SS is used in order to compensate for the error arising from the approximation. We also introduce a training strategy that optimizes the values of the multi-band coefficients to minimize this error.
    Hands-Free Speech Communication and Microphone Arrays, 2008. HSCMA 2008; 06/2008
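    A minimal sketch of the multi-band spectral-subtraction step, assuming the late-reflection power of the current frame has already been estimated from the measured impulse response; the band edges and per-band coefficients are placeholders for the trained values described above.

      import numpy as np

      def dereverb_frame(power_obs, power_late, band_edges, alphas, floor=1e-10):
          """power_obs, power_late: (F,) power spectra of the reverberant
          frame and of its estimated late reflections; one alpha per band."""
          out = power_obs.copy()
          for (lo, hi), a in zip(band_edges, alphas):
              out[lo:hi] = np.maximum(power_obs[lo:hi] - a * power_late[lo:hi], floor)
          return out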