Conference Paper

Multi-modal music genre classification approach

Multimedia Lab, School of Information, Renmin University of China, Beijing, China
DOI: 10.1109/ICCSIT.2010.5564489 · Conference: 2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), Volume 8
Source: IEEE Xplore


As a fundamental and critical component of music information retrieval (MIR) systems, automatically classifying music by genre is a challenging problem. Traditional approaches that rely solely on low-level audio features may not obtain satisfactory results. In recent years, social tags have emerged as an important source of information about resources on the web. In this paper we therefore propose a novel multi-modal music genre classification approach that uses acoustic features and social tags together to classify music by genre. For the audio content-based classification, we design a new feature selection algorithm called IBFFS (Interaction Based Forward Feature Selection), which selects features according to pre-computed rules that take the interaction between different features into account. In addition, we are interested in another aspect: how to perform automatic music genre classification from the available tag data. Two classification methods based on social tags (including music-tags and artist-tags) crawled from the web are developed in our work: (1) we use the generative probabilistic model Latent Dirichlet Allocation (LDA) to analyze the music-tags, from which we obtain the probability of each tag belonging to each music genre; (2) the second method starts from the observation that a piece's artist is often more closely associated with particular genres, so we compute the similarity between artist-tag vectors to infer which genre the music belongs to. Finally, our experimental results demonstrate the benefit of the proposed multi-modal music genre classification approach.
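The two tag-based methods lend themselves to a compact illustration. The sketch below is not the authors' implementation: it uses a toy tag corpus invented here, approximates method (1) with scikit-learn's LatentDirichletAllocation (one topic per genre, so the normalised topic-tag matrix plays the role of the tag-to-genre probabilities), and approximates method (2) by cosine similarity between a track's artist-tag vector and per-genre mean artist-tag vectors. All names and data are illustrative assumptions.

```python
# Minimal sketch of the two tag-based methods (toy data, not the paper's code).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag data: one space-separated tag string per track / per artist.
music_tag_docs = ["mellow acoustic singer-songwriter folk",
                  "heavy riff distortion metal loud guitar"]
artist_tag_docs = {"rock": ["guitar riff loud distortion"],
                   "folk": ["acoustic singer-songwriter mellow"]}
n_genres = len(artist_tag_docs)              # assumption: one LDA topic per genre

# Shared tag vocabulary over both music-tags and artist-tags.
vec = CountVectorizer(token_pattern=r"[^ ]+")
vec.fit(music_tag_docs + [d for docs in artist_tag_docs.values() for d in docs])

# (1) LDA over music-tags: normalising components_ row-wise gives p(tag | topic),
#     i.e. how strongly each tag indicates a genre-like topic.
X = vec.transform(music_tag_docs)
lda = LatentDirichletAllocation(n_components=n_genres, random_state=0)
track_topic = lda.fit_transform(X)                      # p(topic | track)
tag_given_topic = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# (2) Artist-tag similarity: represent each genre by the mean tag vector of its
#     artists; a track is assigned to the genre whose artists' tags are closest.
genre_names = list(artist_tag_docs)
genre_vecs = np.vstack([np.asarray(vec.transform(docs).mean(axis=0))
                        for docs in artist_tag_docs.values()])

def genre_from_artist_tags(artist_tags: str) -> str:
    sims = cosine_similarity(vec.transform([artist_tags]), genre_vecs)[0]
    return genre_names[int(np.argmax(sims))]

print(genre_from_artist_tags("electric guitar loud anthem"))   # -> "rock"
```

How the tag-based scores are fused with the IBFFS-selected acoustic features is not specified in the abstract, so the fusion step is omitted from the sketch.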

  • "For classification, wrapper and filter methods are used. Zhen and Xu (2010) followed a multi-modal approach, using low-level acoustic features together with the corresponding social tags (music-tags and artist-tags) gathered from the web."

    International Journal of Computational Intelligence Studies 01/2015; 4(1):31. DOI:10.1504/IJCISTUDIES.2015.069831
  • ABSTRACT: A challenging open question in music classification is which music representation (i.e., audio features) and which machine learning algorithm are appropriate for a specific music classification task. To address this challenge, given a number of audio feature vectors for each training music recording that capture different aspects of music (i.e., timbre, harmony, etc.), the goal is to find a set of linear mappings from the several feature spaces to the semantic space spanned by the class indicator vectors. These mappings should reveal the common latent variables that characterize a given set of classes and simultaneously define a multi-class linear classifier operating on the extracted latent common features. Such a set of mappings is obtained, building on the notion of maximum margin matrix factorization, by minimizing a weighted sum of nuclear norms. Since the nuclear norm imposes rank constraints on the learnt mappings, the proposed method is referred to as low-rank semantic mappings (LRSMs). The performance of the LRSMs in music genre, mood, and multi-label classification is assessed by extensive experiments on seven manually annotated benchmark datasets. The reported results demonstrate the superiority of the LRSMs over the classifiers they are compared against, and the best reported classification results are comparable with, or slightly superior to, those obtained by state-of-the-art task-specific music classification methods. (A generic sketch of the nuclear-norm mapping idea follows this entry.)
    EURASIP Journal on Audio Speech and Music Processing 01/2013; 2013(1). DOI:10.1186/1687-4722-2013-13 · 0.39 Impact Factor
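The nuclear-norm machinery the LRSMs build on can be illustrated with a generic proximal-gradient sketch. This is not the LRSMs algorithm itself: it learns a single mapping from one feature space to the class-indicator space, whereas the paper minimises a weighted sum of nuclear norms over several feature spaces; the squared loss, step size, regularisation weight, and synthetic data below are all assumptions.

```python
# Hedged sketch: nuclear-norm-regularised linear mapping learnt by proximal
# gradient descent (singular value soft-thresholding). Not the LRSMs code.
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def low_rank_mapping(X, Y, lam=0.1, n_iter=200):
    """Minimise 0.5*||Y - X W||_F^2 + lam*||W||_* by proximal gradient."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)    # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)                         # gradient of the squared loss
        W = svt(W - step * grad, step * lam)             # proximal (thresholding) step
    return W

# Toy usage: 3 classes, 20 samples, 10-dimensional synthetic "audio" features.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))
labels = rng.integers(0, 3, size=20)
Y = np.eye(3)[labels]                                    # class indicator vectors
W = low_rank_mapping(X, Y)
pred = np.argmax(X @ W, axis=1)                          # classify via the learnt mapping
print("rank(W) =", np.linalg.matrix_rank(W, tol=1e-6))
```

The singular-value soft-thresholding step is what drives the learnt mapping toward low rank, which is the effect the nuclear-norm penalty is meant to produce.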