Akira Maezawa

Kyoto University, Kyoto, Kyoto-fu, Japan

Publications (7) · 5.39 Total impact

  • ABSTRACT: This paper describes a monaural audio dereverberation method that operates in the power spectrogram domain. The method is robust to different kinds of source signals, such as speech and music. Moreover, it requires little manual intervention; for example, the complexity of the room acoustics need not be specified in advance. The method is based on a non-conjugate Bayesian model of the power spectrogram. It extends the idea of multi-channel linear prediction to the power spectrogram domain and models reverberation as a non-negative, infinite-order autoregressive process. To this end, the power spectrogram is interpreted as histogram count data, which allows a nonparametric Bayesian model to serve as the prior for the autoregressive process, so that the effective number of active components can grow without bound with the complexity of the data. To determine the marginal posterior distribution, a convergent algorithm inspired by the variational Bayes method is formulated; it uses the minorization-maximization technique to iteratively approximate the marginal posterior distribution. Both objective and subjective evaluations show an advantage over other power-spectrum-based methods. We also apply the method to a music information retrieval task and demonstrate its effectiveness.
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on. 01/2014; 22(12):1918-1930.
  • ABSTRACT: We present a method to recover fingerings for a given piece of violin music so as to recreate the timbre of a given audio recording of the piece. First, the audio signal is analyzed to determine the most likely sequence of two-dimensional fingerboard locations (string number and position along the string), which captures the elements of violin fingering relevant to timbre. This sequence is then used as a constraint for finding an ergonomic sequence of finger placements that satisfies both the notated pitch sequence and the estimated fingerboard-location sequence. Fingerboard-location-sequence estimation is based on a hidden Markov model, each state of which represents a particular fingerboard location and emits the relative strengths of harmonics according to a Gaussian mixture model. The relative strengths of harmonics are estimated from a polyphonic mixture using score-informed source segregation, and discrepancies between observed data and training data are compensated for through mean normalization. Fingering estimation is based on a cost function defined over a sequence of finger placements, which we tailor to the playing practices of the violin. We evaluate the fingerboard-location estimator on a polyphonic mixture and on recordings of a violin whose timbral characteristics differ significantly from those of the training data. We evaluate the fingering estimator subjectively and validate the effectiveness of tailoring the fingering model to the violin.
    Computer Music Journal 01/2012; 36(3):57-72. · 0.76 Impact Factor
  • A. Maezawa, H. G. Okuno, T. Ogata, M. Goto
    ABSTRACT: This paper presents a Bayesian method for temporally aligning a music score with an audio rendition. A critical problem in audio-to-score alignment is dealing with the wide variety of timbres and volumes in audio renditions. In contrast with existing work, which addresses this through ad hoc feature design or careful training of tone models, we propose a Bayesian audio-to-score alignment method that models a music performance as a Bayesian hidden Markov model, each state of which emits a Bayesian signal model based on Latent Harmonic Allocation. After reverberation is attenuated, the variational Bayes method is used to iteratively adapt the alignment, the instrument tone models, and the volume balance at each position in the score. The method is evaluated on sixty classical works with instrumentation ranging from solo piano to full orchestra. We verify that our method improves alignment accuracy over chroma-vector-based dynamic time warping for orchestral music, and over our own method employed in a maximum-likelihood setting.
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on; 06/2011
  • 01/2010; Springer.
  • Proceedings of the 11th International Society for Music Information Retrieval Conference, ISMIR 2010, Utrecht, Netherlands, August 9-13, 2010; 01/2010
  • ABSTRACT: This work presents an automated violin fingering estimation method that helps a student violinist acquire the “sound” of a favorite recording artist, which arises from the artist’s unique fingering. Our method analyzes an audio recording played by the artist and recovers the most playable fingering that recreates the aural characteristics of the recording. Recovering these aural characteristics requires estimating the bowed string from the audio recording and using the estimate to decide the optimal fingering. The former requires high accuracy and robustness against the use of different violins or brands of strings; the latter must produce a natural fingering for the violinist. We solve the first problem by detecting estimation errors with rule-based algorithms and by adapting the estimator to the recording through mean normalization. We solve the second problem by incorporating, in addition to the generic stringed-instrument model used in existing studies, a fingering model based on pedagogical practices of violin playing, defined on sequences of two or three notes. Incorporating error correction and mean normalization improved the accuracy of the bowed string estimator by 21 points in a realistic situation (38% → 59%). In a subjective evaluation of the optimal fingering decision algorithm by seven violinists on 22 musical excerpts, the proposed model was preferred over the model used in existing studies (p = 0.01), but no significant preference was observed between the variants defined on sequences of two notes and of three notes (p = 0.05).
    Trends in Applied Intelligent Systems - 23rd International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2010, Cordoba, Spain, June 1-4, 2010, Proceedings, Part III; 01/2010
  • ABSTRACT: The sequence of strings played on a bowed string instrument is essential to understanding the fingering, so its estimation is required for machine understanding of violin playing. For existing music recordings, audio-based identification is the only viable way to achieve this. A naive implementation based on audio classification alone, however, is inaccurate and not robust against variations in strings or instruments. We develop a bowed string sequence estimation method that combines audio-based bowed string classification with context-dependent error correction. Robustness against different instrument setups is improved by normalizing the F0-dependent features by the average feature of each recording. The performance of the error correction is evaluated using an electric violin with two different brands of strings and an acoustic violin. Mean normalization reduces the loss in recognition accuracy caused by changing the strings by 8 points, and that caused by changing the instrument by 12 points. Error correction reduces the error due to changing the strings by 8 points and that due to a different instrument by 9 points.
    ISM 2009, 11th IEEE International Symposium on Multimedia, San Diego, California, USA, December 14-16, 2009; 01/2009
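The TASLP 2014 abstract models reverberation as a non-negative autoregressive process in the power spectrogram domain. As a rough illustration only, the sketch below assumes one plausible finite-order form of such a model for a single frequency bin, obs[t] = dry[t] + sum over k of a[k] * obs[t-k] with every a[k] >= 0, and assumes the coefficients are already known. It does not reproduce the paper's infinite-order nonparametric Bayesian prior or its minorization-maximization inference; `reverberate`, `dereverberate`, and `_tail` are hypothetical names.

```python
from typing import List, Sequence


def _tail(obs: Sequence[float], coeffs: Sequence[float], t: int) -> float:
    """Reverberant tail predicted from past observed powers: sum_k a_k * obs[t-k]."""
    return sum(a * obs[t - k] for k, a in enumerate(coeffs, start=1) if t - k >= 0)


def _check(coeffs: Sequence[float]) -> None:
    # The assumed model is non-negative: reverberation only adds power.
    if any(a < 0 for a in coeffs):
        raise ValueError("autoregressive coefficients must be non-negative")


def reverberate(dry: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Synthesize one frequency bin of a reverberant power spectrogram
    under the assumed model obs[t] = dry[t] + sum_k coeffs[k-1] * obs[t-k]."""
    _check(coeffs)
    obs: List[float] = []
    for t, d in enumerate(dry):
        obs.append(d + _tail(obs, coeffs, t))
    return obs


def dereverberate(obs: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Invert the assumed model by subtracting the predicted tail,
    flooring at zero so the estimate stays a valid power."""
    _check(coeffs)
    return [max(x - _tail(obs, coeffs, t), 0.0) for t, x in enumerate(obs)]
```

With known coefficients the inversion is exact; in the paper, by contrast, the coefficients and their effective number are inferred from the data rather than supplied by hand.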
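The Computer Music Journal abstract describes finding the most likely sequence of fingerboard locations with a hidden Markov model whose states emit Gaussian-mixture scores over relative harmonic strengths. A standard Viterbi decoder is one common way to obtain such a sequence; we do not know whether the paper uses exactly this procedure. The sketch below assumes the per-frame log-likelihoods (`log_emit`) have already been computed, for example by a GMM, and that each (string, position) pair has been flattened into a single state index. All names are illustrative.

```python
from typing import List, Sequence


def viterbi(log_init: Sequence[float],
            log_trans: Sequence[Sequence[float]],
            log_emit: Sequence[Sequence[float]]) -> List[int]:
    """Return the most likely state sequence of a discrete HMM.

    log_init[s]     : log P(state_0 = s)
    log_trans[r][s] : log P(state_t = s | state_{t-1} = r)
    log_emit[t][s]  : log-likelihood of frame t under state s
                      (hypothetically, a GMM score over harmonic strengths)
    """
    n_states = len(log_init)
    if not log_emit:
        return []
    # Best log-score of any path ending in each state at frame 0.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_score: List[float] = []
        bp: List[int] = []
        for s in range(n_states):
            # Best predecessor state for reaching s at frame t.
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
            bp.append(prev)
        score = new_score
        backptrs.append(bp)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    return path[::-1]
```

The same dynamic-programming recursion, with costs in place of log-probabilities and minimization in place of maximization, could also serve for a finger-placement cost of the kind the abstract describes, although the paper's actual cost function is not reproduced here.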
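The ICASSP 2011 abstract compares its Bayesian alignment against dynamic time warping (DTW) on chroma vectors. For readers unfamiliar with that baseline, the sketch below is a minimal textbook DTW over 12-dimensional chroma frames; it does not show the paper's Bayesian HMM with Latent Harmonic Allocation. The cosine distance, the three-step local path, and the names `cosine_distance` and `dtw_align` are our assumptions, not details from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat all-zero (silent) frames as maximally dissimilar
    return 1.0 - dot / (na * nb)


def dtw_align(score: Sequence[Sequence[float]],
              audio: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Align score chroma frames to audio chroma frames with classic DTW.

    Returns the warping path as (score_index, audio_index) pairs.
    """
    n, m = len(score), len(audio)
    if n == 0 or m == 0:
        return []
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning score[:i] with audio[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(score[i - 1], audio[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the final cell to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((s for s in steps if s[0] >= 1 and s[1] >= 1),
                   key=lambda s: cost[s[0]][s[1]])
        path.append((i - 1, j - 1))
    return path[::-1]
```

Such a baseline uses one fixed feature and distance for every instrument, whereas the abstract's method adapts tone models and volume balance to the recording; that difference is what the reported orchestral-music comparison tests.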
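Several of the violin abstracts above (Computer Music Journal 2012, IEA/AIE 2010, and ISM 2009) adapt the estimator to a new recording through mean normalization, that is, normalizing features by the recording's average feature. A minimal sketch, assuming the features are already in a log-like domain where an instrument- or string-dependent bias appears as an additive offset, subtracts the per-recording mean from every frame. The feature layout and the name `mean_normalize` are assumptions; since the ISM 2009 abstract mentions F0-dependent features, the actual method may normalize per F0 rather than with one global mean.

```python
from typing import List, Sequence


def mean_normalize(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-recording mean feature vector from every frame.

    `features` is a list of frames, each a list of feature values
    (hypothetically, log relative strengths of harmonics for one note).
    """
    if not features:
        return []
    n_frames = len(features)
    dim = len(features[0])
    # Average each feature dimension over all frames of the recording.
    mean = [sum(frame[d] for frame in features) / n_frames for d in range(dim)]
    # Removing the recording-wide mean cancels any constant additive offset.
    return [[frame[d] - mean[d] for d in range(dim)] for frame in features]
```

If the bias introduced by a different violin or set of strings really is a constant additive offset, it cancels out entirely, which is the intuition behind the robustness gains the abstracts report; real timbre differences are only partly of this form.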