Conference Paper

Harmonic elimination structures for Karaoke mode in Spatial Audio Object Coding scheme


Abstract

In this paper, a modified SAOC (Spatial Audio Object Coding) scheme with a harmonic elimination structure is proposed. The proposed structure uses the harmonic information of the vocal object to remove that object more completely and to improve the quality of the vocal-removed sound. Subjective and objective evaluation results show that the proposed scheme is superior to the conventional ones.


... To improve the quality of the background music decoded by the SAOC, Park et al. proposed a harmonic elimination scheme [6], [7]. They tried to eliminate the undesired harmonic components of the vocal object that remain in the decoded background music, at the cost of a slightly increased bit rate. ...
... For both cases, the down-mix signal was coded with Advanced Audio Coding (AAC) at 128 kbps. The SAOC-HE in the table is the harmonic elimination scheme proposed in [7]. The SAOC-TSC I and II are the same scheme, but they use different bit allocations for the residual coding. ...
Article
Full-text available
Interactive audio services (IASs) usually provide users with audio editing functionality, so that users can render their own sounds according to their preferences. For IASs, spatial audio object coding (SAOC) is an appropriate multichannel coding tool that satisfies most of the required functionalities at a relatively low bit rate. Nevertheless, SAOC usually fails to remove a specific object completely, especially the vocal object in the case of the Karaoke service. In addition, expanding the service to mobile environments requires a lower bit rate and lower complexity. Thus, we propose a new SAOC vocal harmonic coding technique to improve the background music quality in the Karaoke service. Specifically, using the harmonic information of the vocal object, we remove the harmonics of the vocal object that remain in the background music. Our experimental results confirm that the proposed algorithm improves the background music quality even at a low bit rate and low complexity.
... was done for the purpose of providing a spatial teleconferencing system; however, the integration of its two-step encoding structure (Kim, Seo, Beack, Kang, & Hahn, 2011) was for improving the performance of each rendered audio object. As shown in (Park, Kim & Hahn, 2013; Park, Kim & Hahn, 2011), harmonic information can also be transmitted to the decoder side in order to improve the vocal removal efficiency of MPEG SAOC in a music re-composition application. ...
Chapter
Full-text available
Spatial audio encoding plays a fundamental role in Ultra High Definition TV (UHDTV) and the latest generation of television broadcasting, as well as in other devices, by providing three-dimensional (3D) audio content to consumers. This chapter presents the fundamental concepts of spatial audio coding, including its techniques, standards, and applications. The object-based audio reproduction system is presented and compared with the traditional channel-based system, both to give readers a good understanding of the system and to show how it gives users more flexibility in their preferred audio composition. The MPEG standard for encoding multi-channel audio signals is then described, followed by machine learning (ML) methods and their applications in acoustics and spatial audio scenes. Finally, further research directions are discussed.
... In [106], MPEG SAOC is combined with DirAC to provide a spatial teleconferencing system, while [107] introduces a two-step coding structure to improve the performance of every rendered audio object. To increase the vocal removal performance of MPEG SAOC in music re-composition applications, harmonic information can also be transmitted to the decoder side [108][109]. ...
Article
Full-text available
Market demand for more impressive entertainment media has motivated the delivery of three-dimensional (3D) audio content to home consumers through Ultra High Definition TV (UHDTV), the next generation of TV broadcasting, in which spatial audio coding plays a fundamental role. This paper reviews the fundamental concepts of spatial audio coding, including its technology, standards, and applications. The basic principles of the object-based audio reproduction system are also elaborated and compared with the traditional channel-based system, to provide a good understanding of this popular interactive audio reproduction system, which gives end users the flexibility to render their own preferred audio composition. Keywords: spatial audio, audio coding, multi-channel audio signals, MPEG standard, object-based audio
... As a method to enhance the SAOC performance, a harmonic elimination scheme was proposed by Park et al. [23], [24]. Park et al. tried to enhance the sound quality by eliminating the undesired harmonic components included in the decoded output signals when the specific audio object is fully suppressed. ...
Article
Full-text available
An interactive audio service is a new audio service concept that gives users the opportunity to experience a variety of alternative and advanced audio services. In the interactive audio service, users can freely control various audio objects to make their own audio sounds. Spatial audio object coding (SAOC) is a useful technology that can support most parts of the interactive audio service at a relatively low bit-rate, but it performs very poorly at perfect gain control of a certain audio object, i.e., the target audio object. In this paper, the SAOC with a two-step coding structure is proposed to efficiently handle the target audio object as well as the normal audio objects. A transform coded excitation (TCX) based residual coding scheme is presented in the context of sound quality enhancement. The experimental results show that the various audio objects can be successfully handled, with respect to both the bit-rate and the sound quality, by using the proposed two-step coding structure SAOC.
Article
Full-text available
In this paper, we present a vocal suppression algorithm that can enhance the quality of a music signal coded using Spatial Audio Object Coding (SAOC) in Karaoke mode. The residual vocal component in the coded music signal is estimated by a cross prediction method in which the music signal coded in Karaoke mode is used as the primary input and the vocal signal coded in Solo mode is used as the reference. However, both signals are extracted from the same downmix signal and are highly correlated, so the music signal can be severely damaged by the cross prediction. To prevent this, a psycho-acoustic disturbance rule is proposed, in which the level of disturbance applied to the reference input of the cross prediction filter is adapted according to the auditory masking property. Objective and subjective tests were performed, and the results confirm that the proposed algorithm offers improved quality.
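The cross prediction idea described above can be illustrated with a generic adaptive canceller. The following is a minimal sketch only: it uses a plain normalized LMS (NLMS) filter, which is a swapped-in standard technique and not necessarily the filter used in the paper, it omits the psycho-acoustic disturbance rule entirely, and it assumes an idealized clean vocal reference rather than a Solo-mode output taken from the same downmix. The function name `nlms_cancel`, the tap count, the step size, and the synthetic signals are all hypothetical.

```python
import math
import random

def nlms_cancel(primary, reference, num_taps=4, step=0.01, eps=1e-8):
    """Subtract from `primary` the part linearly predictable from `reference`.

    Plain NLMS; the psycho-acoustic disturbance rule is NOT modelled here.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        predicted = sum(w * h for w, h in zip(weights, history))
        error = d - predicted  # primary minus predicted vocal leakage
        norm = eps + sum(h * h for h in history)
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        output.append(error)
    return output, weights

# Synthetic stand-ins (assumptions): a sine as "music", noise as "vocal".
rng = random.Random(0)
n = 6000
music = [math.sin(2 * math.pi * 0.01 * t) for t in range(n)]
vocal = [rng.uniform(-1.0, 1.0) for _ in range(n)]
primary = [m + 0.5 * v for m, v in zip(music, vocal)]  # music with leaked vocal

cleaned, weights = nlms_cancel(primary, vocal)

# Mean residual vocal energy over the last 1000 samples, before and after.
tail = range(n - 1000, n)
leak_before = sum((primary[t] - music[t]) ** 2 for t in tail) / 1000
leak_after = sum((cleaned[t] - music[t]) ** 2 for t in tail) / 1000
```

In this idealized setting the reference is uncorrelated with the music, so the filter removes leakage without harming the music. The abstract's point is that this no longer holds when both inputs come from the same downmix, which is what motivates the disturbance rule.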
Article
Full-text available
This paper proposes a novel algorithm for separating vocals from polyphonic music accompaniment. Based on pitch estimation, the method first creates a binary mask indicating time-frequency segments in the magnitude spectrogram where harmonic content of the vocal signal is present. Second, non-negative matrix factorization (NMF) is applied on the non-vocal segments of the spectrogram in order to learn a model for the accompaniment. NMF predicts the amount of noise in the vocal segments, which allows separating vocals and noise even when they overlap in time and frequency. Simulations with commercial and synthesized acoustic material show an average improvement of 1.3 dB and 1.8 dB, respectively, in comparison with a reference algorithm based on sinusoidal modeling, and also the perceptual quality of the separated vocals is clearly improved. The method was also tested in aligning separated vocals and textual lyrics, where it produced better results than the reference method.
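The first step above, building a binary mask from a pitch track, can be sketched as follows. This is an illustrative assumption about how such a mask might be formed, not the paper's procedure: the function name `harmonic_mask`, the fixed mask half-width, and the convention that an F0 of 0 marks an unvoiced frame are all hypothetical, and the NMF stage is not shown.

```python
def harmonic_mask(f0_track_hz, sample_rate, fft_size, half_width_bins=1):
    """Return one 0/1 row per frame marking bins near vocal harmonics.

    f0_track_hz: per-frame fundamental frequency in Hz (0 = unvoiced frame).
    """
    num_bins = fft_size // 2 + 1
    bin_hz = sample_rate / fft_size
    mask = []
    for f0 in f0_track_hz:
        row = [0] * num_bins
        if f0 > 0:
            h = 1
            # Mark every harmonic below the Nyquist frequency.
            while h * f0 < sample_rate / 2:
                center = round(h * f0 / bin_hz)
                lo = max(0, center - half_width_bins)
                hi = min(num_bins, center + half_width_bins + 1)
                for k in range(lo, hi):
                    row[k] = 1
                h += 1
        mask.append(row)
    return mask

# One voiced frame at 500 Hz and one unvoiced frame; 125 Hz per bin.
mask = harmonic_mask([500.0, 0.0], sample_rate=8000, fft_size=64,
                     half_width_bins=0)
```

With these example values the voiced frame marks the bins at 500 Hz, 1000 Hz, and so on up to 3500 Hz, while the unvoiced frame stays empty. The unmarked bins are the ones an accompaniment model would be trained on.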
Article
A conventional audio service provides a single mixed audio scene to the user, so the user can control only the overall volume. In a personalized audio service, however, the user can control properties of audio objects, such as loudness, direction, and distance, to construct his/her own audio scene. Because creating an audio scene is not easy for ordinary users, we adopted a preset-based system that provides various audio scenes, from which the user can conveniently choose one according to his/her preference. The system consists of an authoring tool, a streaming server, and a terminal. In this paper, we present the design and implementation of a personalized preset-based audio system and describe the simulation results and applications.
Conference Paper
In the area of low bitrate audio coding, the emergence of "spatial audio coding" (SAC) technology is one of the most remarkable innovations of recent years. By exploiting the human perception of spatial sound, these coding schemes are capable of transmitting high quality surround sound using bitrates that have been used so far for carrying traditional two-channel stereo audio. After the recent finalization of the MPEG Surround (MPS) specification, a next generation of the technology is envisaged for standardization within ISO/MPEG, allowing bitrate-efficient and backward compatible coding of several sound objects. On the receiving side, such a "spatial audio object coding" (SAOC) system renders the objects interactively into a sound scene on a reproduction setup of choice. The paper reviews the principles and current status of SAC schemes and discusses their evolution towards SAOC.
Conference Paper
This paper proposes a conceptually simple and computationally efficient fundamental frequency (F0) estimator for polyphonic music signals. The studied class of estimators calculate the salience, or strength, of a F0 candidate as a weighted sum of the amplitudes of its harmonic partials. A mapping from the Fourier spectrum to a "F0 salience spectrum" is found by optimization using generated training material. Based on the resulting function, three different estimators are proposed: a "direct" method, an iterative estimation and cancellation method, and a method that estimates multiple F0s jointly. The latter two performed as well as a considerably more complex reference method. The number of concurrent sounds is estimated along with their F0s.
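The salience definition above, a weighted sum of the amplitudes of a candidate's harmonic partials, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 1/h weights are a placeholder and not the trained mapping described in the abstract, only a single F0 is estimated (roughly in the spirit of the "direct" method), and the names `dft_magnitudes`, `salience`, and `estimate_f0_bin` are hypothetical.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude spectrum (bins 0..N/2) of a real signal via a naive DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags

def salience(mags, f0_bin, num_harmonics=5):
    """Weighted sum of magnitudes at integer multiples of f0_bin.

    The 1/h weighting is an illustrative assumption, not a trained weight.
    """
    total = 0.0
    for h in range(1, num_harmonics + 1):
        k = h * f0_bin
        if k >= len(mags):
            break
        total += mags[k] / h
    return total

def estimate_f0_bin(mags, candidates, num_harmonics=5):
    """Pick the candidate bin with the highest salience."""
    return max(candidates, key=lambda b: salience(mags, b, num_harmonics))

# Synthetic harmonic tone: fundamental at bin 8 with three overtones.
N = 256
tone = [sum((1.0 / h) * math.sin(2 * math.pi * 8 * h * t / N)
            for h in range(1, 5))
        for t in range(N)]
mags = dft_magnitudes(tone)
best = estimate_f0_bin(mags, range(4, 40))
```

The decaying weights matter here: with uniform weights, the subharmonic candidate at bin 4 would collect the partials at bins 8 and 16 in full and come much closer to the true candidate at bin 8.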
Article
Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and side information. The companion paper (Part I) covers the psychoacoustic fundamentals of this method and outlines principles for the design of BCC schemes. The BCC analysis and synthesis methods of Part I are motivated and presented in the framework of stereophonic audio coding. This paper, Part II, generalizes the basic BCC schemes presented in Part I. It includes BCC for multichannel signals and employs an enhanced set of perceptual spatial cues for BCC synthesis. A scheme for multichannel audio coding is presented. Moreover, a modified scheme is derived that allows flexible rendering of the spatial image at the receiver supporting dynamic control. All aspects of complete BCC encoder and decoder implementations are discussed, such as down-mixing of the input signals, low complexity estimation of the spatial cues, and quantization and coding of the side information. Application examples are given and the performance of the coder implementations is evaluated and discussed based on subjective listening test results.
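The "one down-mixed channel plus side information" structure can be made concrete with a toy two-channel encoder step. This is a sketch under simplifying assumptions: it computes a single full-band inter-channel level difference for one block, whereas a practical scheme would estimate cues per subband and per frame and would also quantize them, and the function name `downmix_and_icld` is hypothetical.

```python
import math

def downmix_and_icld(left, right, eps=1e-12):
    """Return a mono downmix and the inter-channel level difference in dB.

    Full-band, single-block simplification; `eps` guards against log(0).
    """
    mono = [0.5 * (l + r) for l, r in zip(left, right)]
    p_left = sum(l * l for l in left)
    p_right = sum(r * r for r in right)
    icld_db = 10.0 * math.log10((p_left + eps) / (p_right + eps))
    return mono, icld_db

# Left channel has twice the amplitude of the right, i.e. about 6 dB more.
mono, icld_db = downmix_and_icld([2.0, -2.0, 2.0, -2.0],
                                 [1.0, -1.0, 1.0, -1.0])
```

A decoder holding only `mono` and `icld_db` could re-scale the downmix into two channels with the same level ratio, which is the basic idea behind transmitting cues instead of separate waveforms.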