R. Mukai

NTT DATA Corporation, Edo, Tōkyō, Japan

Publications (33) · 64.59 Total impact

  • ABSTRACT: This paper deals with a novel concept of an exponential IDF in the BM25 formulation and compares the search accuracy with that of the BM25 with the original IDF in a content-based video retrieval (CBVR) task. Our video retrieval method is based on a bag of keypoints (local visual features) and the exponential IDF estimates the keypoint importance weights more accurately than the original IDF. The exponential IDF is capable of suppressing the keypoints from frequently occurring background objects in videos, and we found that this effect is essential for achieving improved search accuracy in CBVR. Our proposed method is especially designed to tackle instance video search, one of the CBVR tasks, and we demonstrate its effectiveness in significantly enhancing the instance search accuracy using the TRECVID2012 video retrieval dataset.
    IEEE Transactions on Multimedia 10/2014; 16(6):1690-1699. DOI:10.1109/TMM.2014.2323945 · 1.78 Impact Factor
  • ABSTRACT: This paper discusses a new extension of hidden Markov models that can capture clusters embedded in transitions between the hidden states. In our model, the state-transition matrices are viewed as representations of relational data reflecting a network structure between the hidden states. We specifically present a nonparametric Bayesian approach to the proposed state-space model whose network structure is represented by a Mondrian Process-based relational model. We show an application of the proposed model to music signal analysis through some experimental results.
    ICASSP 2014 - 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 05/2014
  • ABSTRACT: In digital archiving for cultural heritage preservation, in the medical field, and in some industrial fields, the high-fidelity reproduction of color, gloss, texture, three-dimensional (3-D) shape, and movement is very important. Multi-spectrum imaging can provide accurate color reproduction. Although several types of multi-spectral camera systems have been developed, all of them, except for the six-band HDTV camera system developed by Ohsawa et al. [Ohsawa et al. 2004], are multi-shot and none can take still images of moving objects or moving pictures. However, Ohsawa et al.'s system requires very complex and expensive customized optics whose optical elements must be arranged precisely, which makes it far from practical. In order to make multi-spectrum video systems pervasive, the equipment costs must be reduced by ensuring they have as much compatibility with existing video camera systems as possible. To meet this requirement, several stereo one-shot six-band still image capturing systems that also combine multi-spectrum and stereo imaging techniques have been proposed [Tsuchida et al., 2010; Shrestha et al., 2011]. In this paper, we propose a system that applies their concept to existing 4K digital cinema cameras and show the possibility of using the proposed system for cinematography.
    ACM SIGGRAPH 2013 Posters; 07/2013
  • ABSTRACT: This paper proposes a novel representation of music that can be used for similarity-based music information retrieval, and also presents a method that converts an input polyphonic audio signal to the proposed representation. The representation involves a 2-dimensional tree structure, where each node encodes the musical note and the dimensions correspond to the time and simultaneous multiple notes, respectively. Since the temporal structure and the synchrony of simultaneous events are both essential in music, our representation reflects them explicitly. In the conventional approaches to music representation from audio, note extraction is usually performed prior to structure analysis, but accurate note extraction has been a difficult task. In the proposed method, note extraction and structure estimation are performed simultaneously and thus the optimal solution is obtained with a unified inference procedure. That is, we propose an extended 2-dimensional infinite probabilistic context-free grammar and a sparse factor model for spectrogram analysis. An efficient inference algorithm, based on Markov chain Monte Carlo sampling and dynamic programming, is presented. The experimental results show the effectiveness of the proposed approach.
    Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on; 03/2012
  • TRECVID 2010 workshop participants notebook papers, Gaithersburg, MD, USA, November 2010; 01/2010
  • ABSTRACT: We propose a music segment detection method for audio signals. Unlike many existing methods, ours specifically focuses on a background-music detection task, that is, detecting music used in the background of main sounds. This task is important because music is almost always overlapped by speech or other environmental sounds in visual materials such as TV programs. Our method consists of feature extraction, dimension reduction, and statistical discrimination steps. For each step, we analyzed a set of methods to maximize the detection accuracy. With a simple post-processing step, we achieved a framewise error rate as low as 8% even when the mixed speech was louder than the target music by 10 dB.
    Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on; 01/2008
  • ABSTRACT: This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure for a situation with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.
    IEEE Transactions on Audio Speech and Language Processing 08/2007; 15(5):1592-1604. DOI:10.1109/TASL.2007.899218 · 2.63 Impact Factor
  • ABSTRACT: This paper describes the frequency-domain blind source separation (BSS) of convolutively mixed acoustic signals using independent component analysis (ICA). The most critical issue related to frequency-domain BSS is the permutation problem. This paper presents two methods for solving this problem. Both methods are based on the clustering of information derived from a separation matrix obtained by ICA. The first method is based on direction of arrival (DOA) clustering. This approach is intuitive and easy to understand. The second method is based on normalized basis vector clustering. This method is less intuitive than the DOA-based method, but it has several advantages. First, it does not need sensor array geometry information. Second, it can fully utilize the information contained in the separation matrix, since the clustering is performed in high-dimensional space. Experimental results show that our methods realize BSS in various situations such as the separation of many speech signals located in a 3-dimensional space, and the extraction of primary sound sources surrounded by many background interferences.
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006
  • ABSTRACT: This paper presents a new method for estimating the direction of arrival (DOA) of source signals whose number N can exceed the number of sensors M. Subspace-based methods, e.g., the MUSIC algorithm, have been widely studied; however, they are only applicable when M > N. Another conventional method, based on independent component analysis, allows M ≥ N; however, it cannot be applied when M < N. By contrast, our new method can be applied where the sources outnumber the sensors (i.e., an underdetermined case M < N) by assuming source sparseness. Our method can cope with 2- or 3-dimensionally distributed sources with a 2- or 3-dimensional sensor array. We obtained promising experimental results for 3 × 4, 3 × 5 and 4 × 5 (#sensors × #speech sources) in a room (RT_60 = 120 ms).
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006
  • ABSTRACT: This paper describes a method for solving the permutation problem of frequency-domain blind source separation (BSS). The method analyzes the mixing system information estimated with independent component analysis (ICA). When we use widely spaced sensors or increase the sampling rate, spatial aliasing may occur for high frequencies due to the possibility of multiple cycles in the sensor spacing. In such cases, the estimated information would imply multiple possibilities for a source location. This causes some difficulty when analyzing the information. We propose a new method designed to overcome this difficulty. This method first estimates the model parameters for the mixing system at low frequencies where spatial aliasing does not occur, and then refines the estimations by using data at all frequencies. This refinement leads to precise parameter estimation and therefore precise permutation alignment. Experimental results show the effectiveness of the new method.
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on; 06/2006
  • ABSTRACT: This paper presents a prototype system for blind source separation (BSS) of many speech signals and describes the techniques used in the system. Our system uses 8 microphones located at the vertices of a 4 cm × 4 cm × 4 cm cube and has the ability to separate signals distributed in three-dimensional space. The mixed signals observed by the microphone array are processed by independent component analysis (ICA) in the frequency domain and separated into a given number of signals (up to 8). We carried out experiments in an ordinary office and obtained more than 20 dB of SIR improvement.
    Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on; 11/2005
  • ABSTRACT: This paper presents a method for estimating location information about multiple sources. The proposed method uses independent component analysis (ICA) as a main statistical tool. The near-field model as well as the far-field model can be assumed in this method. As an application of the method, we show experimental results for the direction-of-arrival (DOA) estimation of three sources that were positioned 3-dimensionally.
    Antennas and Propagation Society International Symposium, 2005 IEEE; 08/2005
  • ABSTRACT: Musical noise is a typical problem with blind source separation using a time-frequency mask. We report that a fine-shift and overlap-add method reduces the musical noise without degrading the separation performance. The effectiveness was confirmed by results of a listening test undertaken in a room with a reverberation time of RT_60 = 130 ms.
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on; 04/2005
  • ABSTRACT: This paper presents a method for enhancing a dominant target source that is close to sensors, and suppressing other interferences. The enhancement is performed blindly, i.e. without knowing the number of total sources or information about each source, such as position and active time. We consider a general case where the number of sources is larger than the number of sensors. We employ a two-stage processing technique where a spatial filter is first employed in each frequency bin and time-frequency masking is then used to improve the performance further. To obtain the spatial filter we employ independent component analysis and then select the component of the target source. Time-frequency masks in the second stage are obtained by calculating the angle between the basis vector corresponding to the target source and a sample vector. The experimental results for a simulated cocktail party situation were very encouraging.
    Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on; 04/2005
  • Proc. of Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA 2005); 03/2005
  • ABSTRACT: This work presents a method for solving the permutation problem of frequency-domain blind source separation (BSS) when the number of source signals is large, and the potential source locations are omnidirectional. We propose a combination of small and large spacing sensor pairs with various axis directions in order to obtain proper geometrical information for solving the permutation problem. Experimental results show that the proposed method can separate a mixture of speech signals that come from various directions, even when two of them come from the same direction.
    Circuits and Systems, 2004. ISCAS '04. Proceedings of the 2004 International Symposium on; 06/2004
  • ABSTRACT: In this paper, we propose a method for separating speech signals when there are more signals than sensors. Several methods have already been proposed for solving the underdetermined problem, and some of these utilize the sparseness of speech signals. These methods employ binary masks to extract the signals, and therefore, their extracted signals contain loud musical noise. To overcome this problem, we propose combining a sparseness approach and independent component analysis (ICA). First, using sparseness, we estimate the time points when only one source is active. Then, we remove this single source from the observations and apply ICA to the remaining mixtures. Experimental results show that our proposed sparseness and ICA (SPICA) method can separate signals with little distortion even in reverberant conditions of T_R = 130 and 200 ms.
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004
  • ABSTRACT: Blind source separation (BSS) for convolutive mixtures can be efficiently achieved in the frequency domain, where independent component analysis is performed separately in each frequency bin. However, frequency-domain BSS involves a permutation problem, which is well known as a difficult problem, especially when the number of sources is large. This paper presents a method for solving the permutation problem, which works well even for many sources. The successful solution for the permutation problem highlights another problem with frequency-domain BSS that arises from the circularity of the discrete frequency representation. This paper discusses the phenomena of the problem and presents a method for solving it. With these two methods, we can separate many sources with a practical execution time. Moreover, real-time processing is currently possible for up to three sources with our implementation.
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004
  • ABSTRACT: The paper presents a method for solving the permutation problem of frequency domain blind source separation (BSS) when source signals come from the same or similar directions. Geometric information such as the direction of arrival (DOA) is helpful for solving the permutation problem, and a combination of the DOA based and correlation based methods provides a robust and precise solution. However, when signals come from similar directions, the DOA based approach fails, and we have to use only the correlation based method whose performance is unstable. We show that an interpretation of the ICA solution by a near-field model yields information about spheres on which source signals exist, which can be used as an alternative to the DOA. Experimental results show that the proposed method can robustly separate a mixture of signals arriving from the same direction.
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004
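
Several of the abstracts above refer to the permutation problem of frequency-domain BSS and to a correlation-based way of addressing it. As a rough illustration only, the following Python sketch shows one common form of that idea: in each frequency bin, the separated outputs are reordered so that their amplitude envelopes correlate best with a running reference built from the bins already aligned. This is not the authors' implementation; the function name `align_permutations`, the envelope layout `(F, N, T)`, and the bin-by-bin greedy reference are all assumptions made for the example, and the exhaustive search over permutations is only practical for a small number of sources.

```python
from itertools import permutations

import numpy as np


def align_permutations(env):
    """Hypothetical helper: align output order across frequency bins.

    env: array of shape (F, N, T) holding the amplitude envelopes of the
         N separated outputs in each of F frequency bins over T frames.
    Returns an (F, N) integer array `perms` such that env[f, perms[f]]
    lists the outputs of bin f in a consistent source order.
    """
    F, N, T = env.shape
    # Zero-mean, unit-norm envelopes so that dot products act as correlations.
    z = env - env.mean(axis=2, keepdims=True)
    z = z / (np.linalg.norm(z, axis=2, keepdims=True) + 1e-12)

    perms = np.zeros((F, N), dtype=int)
    perms[0] = np.arange(N)      # bin 0 defines the reference ordering
    centroid = z[0].copy()       # running sum of already-aligned envelopes

    for f in range(1, F):
        # corr[i, k]: similarity of output i in bin f to reference source k.
        corr = z[f] @ centroid.T
        # Exhaustive search over all N! orderings (assumes small N).
        best = max(
            permutations(range(N)),
            key=lambda p: sum(corr[p[k], k] for k in range(N)),
        )
        perms[f] = best
        centroid += z[f, perms[f]]
    return perms
```

As the last abstract notes, a purely correlation-based alignment can be unstable in practice, which is why the listed papers combine it with geometric information such as the DOA or near-field model parameters; the sketch covers only the correlation part.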