Robust music identification based on low-order Zernike moment in the compressed domain.
ABSTRACT In this paper, we propose a novel robust music identification algorithm whose key feature is the compressed-domain audio Zernike moment, adapted from image processing techniques. Audio fingerprints derived from this feature are strongly robust against a range of audio signal distortions, including the challenging cases of pitch shifting and time-scale modification. Experiments on our test dataset of 1822 popular songs show that a 5 s query excerpt, even one that has been severely corrupted, is still sufficient to identify its original near-duplicate copy, with a top-five precision rate of more than 90%.
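The abstract does not spell out how the Zernike moments are computed, so the following is only a minimal sketch of the standard textbook definition from image analysis, applied to a 2D block of non-negative values. It assumes the block holds MDCT coefficient magnitudes arranged as frames by sub-bands; that layout, the function names `radial_poly`, `zernike_moment`, and `low_order_features`, and the choice of a maximum order of 2 are all illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho) for n >= |m| and n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * math.factorial(n - s)
                 / (math.factorial(s)
                    * math.factorial((n + m) // 2 - s)
                    * math.factorial((n - m) // 2 - s)))
        out += coeff * rho ** (n - 2 * s)
    return out


def zernike_moment(block, n, m):
    """Discrete approximation of the Zernike moment A_nm of a 2D block.

    The block is mapped onto the square [-1, 1] x [-1, 1]; only samples
    inside the unit disk contribute to the sum.
    """
    if abs(m) > n or (n - abs(m)) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    rows, cols = block.shape
    y = np.linspace(-1.0, 1.0, rows)
    x = np.linspace(-1.0, 1.0, cols)
    xx, yy = np.meshgrid(x, y)
    rho = np.hypot(xx, yy)
    theta = np.arctan2(yy, xx)
    inside = rho <= 1.0
    # Complex conjugate of the basis function V_nm(rho, theta).
    basis_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
    # Area of one sample cell, so the sum approximates the integral.
    cell_area = (2.0 / (cols - 1)) * (2.0 / (rows - 1))
    total = np.sum(block[inside] * basis_conj[inside])
    return (n + 1) / np.pi * total * cell_area


def low_order_features(block, max_order=2):
    """Magnitudes |A_nm| for all valid (n, m) with m >= 0 and n <= max_order."""
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):
            feats.append(abs(zernike_moment(block, n, m)))
    return np.array(feats)
```

Only the magnitudes are kept because a rotation of the input changes each moment by a phase factor alone. How such magnitudes are turned into fingerprint bits and matched against a database is not described in the abstract and is left out of this sketch.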
Conference Proceeding: Content-based retrieval of MP3 songs based on query by singing
ABSTRACT: With the growth of multimedia on the Internet, content analysis of multimedia plays an important role in user-oriented management. We investigate the content-based retrieval of MP3 songs through a query-by-singing interface. MDCT (modified DCT) spectral coefficients are used directly to represent the tonic characteristics of a short-term sound. This spectral profile is used for detailed matching between two audio segments. Perceptual features are also computed from the MDCT coefficients for audio classification. Two pre-stages based on SVM and k-means classification remove incorrect (or noisy) segment candidates and speed up the subsequent matching process. In addition, exponential key-scaling schemes and time-warping techniques are developed to overcome key differences and tempo variation between different singers. Experiments show that the retrieval probability of our design reaches up to 76% among the top 5 out of a total of 114 excerpts in the database.
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on; 06/2004
Conference Proceeding: MDCT-Based Perceptual Hashing for Compressed Audio Content Identification
ABSTRACT: In this paper, a perceptual audio hashing method in the compressed domain is proposed for content identification, in which MDCT coefficients, an intermediate decoding result, are selected for perceptual feature extraction and hash generation. The perceptual feature extraction is based on a psychoacoustic model and exhibits good discrimination between different audio contents as well as robustness against common audio signal processing operations. Because features are extracted in the compressed domain, MDCT-based compressed audio such as MP3 and AAC can be identified efficiently without complete decoding, which benefits practical applications with strict memory and computational constraints, such as online audio retrieval, indexing of massive compressed audio data, and audio identification by mobile phone. The algorithm is highly robust against MDCT compression, which is widely used in audio coding. Experiments demonstrate the effectiveness of the proposed scheme.
Multimedia Signal Processing, 2007. MMSP 2007. IEEE 9th Workshop on; 11/2007
Conference Proceeding: Content-based retrieval of audio example on MP3 compression domain
ABSTRACT: This paper considers content-based retrieval from an MP3-based (MPEG-1 Layer III) digital music archive. Our approach uses two kinds of values in an MP3 frame: the scale factor (SCF) and the sub-band coefficient (SBC). These values are extracted from the MP3 decoder to compute the MP3 features used for indexing the MP3 objects. Evaluations on a content-based MP3 retrieval system indicate that our approach achieves good performance.
Multimedia Signal Processing, 2004 IEEE 6th Workshop on; 01/2004