Article

Clave-direction analysis: A new arena for educational and creative applications of music technology

Author: Mehmet Vurkaç

Abstract

Music Information Research abounds with work on the recognition of style, genre, composer and singer, and the detection of beat, tempo, metre and even emotion. However, some musical attributes remain unexplored. One of these, clave direction, is a rhythmic principle found throughout Latin American music. The purpose of this article is to introduce the significance and potential pedagogical role of automated clave-direction identification, to propose a system-level implementation and to suggest application areas that may benefit performers, educators, students and industry. Clave direction is an inherent feature of rhythmic patterns, not just of the standard sequences commonly associated with clave. Technological aids to composition, arranging, education, search and music production are needed for a growing population of musicians. Existing and developing methods of music technology can be used to fulfil this need. Target applications include rhythm-training equipment, recording and sequencing software, auto-accompaniment and automated querying of databases or the Internet.
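To make the idea concrete, here is a minimal sketch of how a rule-based clave-direction detector might look, using an offbeatness-style weighting of a 16-step binary pattern (the weights and the decision rule are illustrative assumptions, not the system proposed in the article):

```python
import numpy as np

# Hypothetical offbeatness weights for a 16-step (two-bar) grid: on-beat
# positions score 0, strong syncopations score high. These weights are
# illustrative assumptions, not values from the article.
OFFBEAT_WEIGHT = np.array([0, 3, 1, 3, 0, 3, 1, 3] * 2)

def clave_direction(pattern):
    """Guess 3-2 vs 2-3 direction of a 16-step binary rhythm pattern.

    Heuristic: the '3' side of clave is the more syncopated half, so the
    half with the higher offbeatness score is taken to come first in a
    3-2 pattern and second in a 2-3 pattern.
    """
    p = np.asarray(pattern)
    first = p[:8] @ OFFBEAT_WEIGHT[:8]
    second = p[8:] @ OFFBEAT_WEIGHT[8:]
    if first == second:
        return "neutral/ambiguous"
    return "3-2" if first > second else "2-3"

# Son clave in 3-2: onsets on steps 0, 3, 6 (the 3 side), then 10, 12 (the 2 side).
son_32 = [1,0,0,1,0,0,1,0, 0,0,1,0,1,0,0,0]
print(clave_direction(son_32))                    # -> 3-2
print(clave_direction(son_32[8:] + son_32[:8]))   # rotated halves -> 2-3
```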


... As research deepens, computer technology has gradually shown its practicality, comprehensiveness and flexibility in music teaching. In music teaching and learning, accompaniment is the soul of music, and computer technology can assist in producing music audio and video, generating sheet music, and so on [12]. It also provides recording, editing and processing of sound, that is, audio extraction, sound editing and standardization [13]. ...
Article
Full-text available
The computer-based auto accompaniment (AA) reflects the integration of computer technology and music theory. If applied in music education, computer-based AA may kindle the flame of innovation in music teaching and creation and reshape the traditional music teaching mode. Therefore, this paper probes deep into the value of applying computer-based AA in music education. Firstly, the author discusses the merits and defects of the application. To disclose the application effect, computer-based AA was introduced into the teaching of ten songs among 92 music majors, and the test results were analysed in detail. The results show that computer-based AA can satisfy the requirements of music teaching, making teaching objectives easier to achieve; the students were very interested in it; it can adapt to various tones, melodies and chords, and thus suits different styles of music; and it enjoys high feasibility and flexibility and has high value in music teaching. The research findings lay a theoretical basis for applying computer arrangement in music teaching.
... Locke, D. (1982) provides a discussion of how important the principle of offbeat timing is in much sub-Saharan music. Vurkaç, M. (2011, 2012) uses the offbeatness measure to analyze the directionality of timelines in a variety of Afro-Latin musics. Handel, S. (1992). ...
... Clave Direction: The task for the Firm Teacher Clave Direction dataset is to classify the clave direction from rhythmic patterns (Vurkaç 2011). The original dataset contains four classes, so we select the two most common classes, resulting in a total of 8,606 examples. ...
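A minimal sketch of the experimental setup this excerpt describes, assuming the dataset is available as a CSV with 16 binary rhythm-step columns followed by a class label (file name and column layout are assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumed layout: 16 binary rhythm-step columns, then an integer class label.
data = np.loadtxt("clave_direction.csv", delimiter=",")
X, y = data[:, :16], data[:, 16].astype(int)

# Keep only the two most common classes, as in the experiment described above.
classes, counts = np.unique(y, return_counts=True)
top_two = classes[np.argsort(counts)[-2:]]
mask = np.isin(y, top_two)
X, y = X[mask], y[mask]

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())  # baseline binary accuracy
```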
Article
We consider the task of training classifiers without labels. We propose a weakly supervised method—adversarial label learning—that trains classifiers to perform well against an adversary that chooses labels for training data. The weak supervision constrains what labels the adversary can choose. The method therefore minimizes an upper bound of the classifier’s error rate using projected primal-dual subgradient descent. Minimizing this bound protects against bias and dependencies in the weak supervision. Experiments on real datasets show that our method can train without labels and outperforms other approaches for weakly supervised learning.
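A toy numpy sketch of the min-max idea described above: the classifier descends on its expected error under adversarially chosen soft labels, the adversary ascends on that error, and dual variables keep the adversary consistent with assumed error bounds for each weak labeler. This is an illustration of the formulation, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def expected_error(pred, y):
    # Expected 0-1 error of probabilistic predictions under soft labels y.
    return np.mean(y * (1 - pred) + (1 - y) * pred)

# Toy data and two noisy weak labelers with assumed error bounds.
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
weak = [sigmoid(X @ true_w + rng.normal(scale=s, size=n)) for s in (2.0, 3.0)]
bounds = [0.3, 0.4]

w = np.zeros(d)              # classifier parameters (minimizing player)
y = np.full(n, 0.5)          # adversarial soft labels (maximizing player)
lam = np.zeros(len(weak))    # duals enforcing the weak-supervision bounds
lr = 0.05

for _ in range(2000):
    p = sigmoid(X @ w)
    # Adversary ascends on classifier error, discounted by the dual penalties.
    grad_y = (1 - 2 * p) - sum(l * (1 - 2 * q) for l, q in zip(lam, weak))
    y = np.clip(y + lr * grad_y, 0.0, 1.0)          # projection onto [0, 1]^n
    # Classifier descends on its expected error under the adversarial labels.
    w -= lr * X.T @ ((1 - 2 * y) * p * (1 - p)) / n
    # Duals grow while a weak labeler's assumed error bound is violated.
    lam = np.maximum(0.0, lam + lr * np.array(
        [expected_error(q, y) - b for q, b in zip(weak, bounds)]))

print(expected_error(sigmoid(X @ w), sigmoid(X @ true_w) > 0.5))
```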
Chapter
In order to better understand and optimize the learning process and learning environment, educational data mining is becoming more and more important in processing large amounts of educational data. By analysing such data, students' academic performance can be predicted, identifying those at high risk of dropping out and forecasting their future achievement, e.g., on final exams. The predicted results can provide early warning for students' own learning and suggest how educators might allocate educational resources more reasonably and improve their teaching. The purpose of this chapter is to comprehensively introduce advanced supervised machine learning techniques, the various educational datasets available, and the latest research results.
Article
Full-text available
Learning rates in stochastic neural network training are currently determined prior to training, using expensive manual or automated iterative tuning. Attempts to resolve learning rates adaptively, using line searches, have proven computationally demanding. Reducing the computational cost by considering mini-batch sub-sampling (MBSS) introduces challenges due to significant variance in information between batches, which may present as discontinuities in the loss function, depending on the MBSS approach. This study proposes a robust approach to adaptively resolve learning rates in dynamic MBSS loss functions. This is achieved by finding sign changes from negative to positive along directional derivatives, which ultimately converge to a stochastic non-negative associated gradient projection point. Through a number of investigative studies, we demonstrate that gradient-only line searches (GOLS) resolve learning rates adaptively, improving convergence performance over minimization line searches, ignoring certain local minima and eliminating an otherwise expensive hyperparameter. We also show that poor search directions may benefit computationally from overstepping optima along a descent direction, which can be resolved by considering improved search directions. Having shown that GOLS is a reliable line search allows for comparative investigations between static and dynamic MBSS.
Preprint
Full-text available
Learning rates in stochastic neural network training are currently determined prior to training, using expensive manual or automated iterative tuning. This study proposes gradient-only line searches to resolve the learning rate for neural network training algorithms. Stochastic sub-sampling during training decreases computational cost and allows the optimization algorithms to progress over local minima. However, it also results in discontinuous cost functions. Minimization line searches are not effective in this context, as they use a vanishing derivative (a first-order optimality condition), which often does not exist in a discontinuous cost function; they therefore converge to discontinuities rather than to the minima indicated by the data trends. Instead, we base candidate solutions along a search direction purely on gradient information, in particular on a directional derivative sign change from negative to positive (a Non-negative Associated Gradient Projection Point, NN-GPP). Only considering a sign change from negative to positive always indicates a minimum, so NN-GPPs contain second-order information. Conversely, a vanishing gradient is purely a first-order condition, which may indicate a minimum, maximum or saddle point. This insight allows the learning rate of an algorithm to be reliably resolved as the step size along a search direction, increasing convergence performance and eliminating an otherwise expensive hyperparameter.
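A minimal sketch of the sign-change search described above: bisect along the search direction for the point where the directional derivative crosses from negative to positive (bracket growth, tolerance and the quadratic test function are assumptions):

```python
import numpy as np

def gradient_only_line_search(grad, x, d, alpha_max=10.0, tol=1e-6):
    """Locate a point where the directional derivative g(x + a*d) . d
    changes sign from negative to positive (an NN-GPP-style condition),
    using bisection. A minimal sketch of the idea, not the authors' code.
    """
    dd = lambda a: np.dot(grad(x + a * d), d)
    lo, hi = 0.0, alpha_max
    if dd(lo) >= 0:              # already non-descent: take no step
        return 0.0
    # Grow the bracket until the directional derivative turns positive.
    while dd(hi) < 0:
        lo, hi = hi, 2 * hi
        if hi > 1e6:
            return lo            # no sign change found within budget
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if dd(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Usage on a quadratic: the minimum along the steepest-descent direction.
grad = lambda x: 2 * x                        # gradient of f(x) = ||x||^2
x0 = np.array([3.0, -2.0])
print(gradient_only_line_search(grad, x0, -grad(x0)))   # -> ~0.5
```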
Conference Paper
Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their high number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective in improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields better results. Experimental results for three selected classification problems show that binarization allows standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when pre-emphasizing the training samples. Some research avenues that this finding opens are mentioned in the conclusions.
Conference Paper
Full-text available
We describe a new method for preprocessing STFT phase-vocoder frames for improved performance in real-time onset detection, which we term "adaptive whitening". The procedure involves normalising the magnitude of each bin according to a recent maximum value for that bin, with the aim of allowing each bin to achieve a similar dynamic range over time, which helps to mitigate the influence of spectral roll-off and strongly varying dynamics. Adaptive whitening requires no training, is relatively lightweight to compute, and can run in real time. Yet it can improve onset detector performance by more than ten percentage points (peak F-measure) in some cases, and improves the performance of most of the onset detectors tested. We present results demonstrating that adaptive whitening can significantly improve the performance of various STFT-based onset detection functions, including functions based on the power, spectral flux, phase deviation, and complex deviation measures. We find the process to be especially beneficial for certain types of audio signal (e.g. complex mixtures such as pop music).
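A minimal sketch of the procedure as described, operating on a magnitude spectrogram (the memory coefficient and floor value are assumed, not the paper's tuned settings):

```python
import numpy as np

def adaptive_whitening(spec, memory=0.997, floor=1e-3):
    """Normalize each STFT bin by a slowly decaying recent maximum.

    spec: magnitude spectrogram, shape (n_frames, n_bins).
    Per bin, track a peak that decays by `memory` each frame and divide
    the bin by it, so all bins reach a similar dynamic range over time.
    """
    peak = np.full(spec.shape[1], floor)
    out = np.empty_like(spec)
    for n in range(spec.shape[0]):
        # Peak is the largest of: current magnitude, decayed peak, floor.
        peak = np.maximum.reduce([spec[n], peak * memory,
                                  np.full_like(peak, floor)])
        out[n] = spec[n] / peak
    return out
```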
Article
Full-text available
John Blacking said "The main task of ethnomusicology is to explain music and music making with reference to the social, but in terms of the musical factors involved in performance and appreciation" (1979:10). For this reason, research in ethnomusicology has, from the beginning, involved analysis of sound, mostly in the form of transcriptions done "by ear" by trained scholars. Bartók's many transcriptions of folk music of his native Hungary are a notable example. Since the days of Charles Seeger, there have been many attempts to facilitate this analysis using various technological tools. We survey such existing work, outline some guidelines for scholars interested in working in this area, and describe some of our initial research efforts in this field. We will use the term "Computational Ethnomusicology" (CE) to refer to the design, development and usage of computer tools that have the potential to assist in ethnomusicological research. Although not new, CE is not an established term and existing work is scattered among the different disciplines involved. As we quickly enter an era in which all recorded media will be "online," meaning that it will be instantaneously available in digital form anywhere in the world that has an Internet connection, there is an unprecedented need for navigational/analytical methods that were entirely theoretical just a decade ago. This era of instantaneously available, enormous collections of music only intensifies the need for the tools that fall under the CE rubric. We will concentrate on the usefulness of a relatively new area of research in music called Music Information Retrieval (MIR). MIR is about designing and building tools that help us organize, understand and search large collections of music, and it is a field that has been rapidly evolving over the past few years, thanks in part to recent advances in computing power and digital music distribution. It encompasses a wide variety of ideas, algorithms, tools, and systems that have been proposed to handle the increasingly large and varied amounts of musical data available digitally. Researchers in this emerging field come from many different backgrounds including computer science, electrical engineering, library and information science, music, and psychology. The technology of MIR is ripe to be integrated into the practice of ethnomusicological research. To date, the majority of existing work in MIR has focused on either popular music with applications such as music recommendation systems, or on Western "classical" music with applications such as score following and query-by-humming. In addition, as microchips become smaller and faster and as sensor technology and actuators become cheaper and more precise, we are beginning to see ethnomusicological research incorporating both robotic systems and digital capture of music-related bodily gestures; music in general is embodied and involves more than a microphone can record. Our hope is that the material in this paper will help motivate more interdisciplinary and multidisciplinary researchers and scholars to explore these possibilities and solidify the field of computational ethnomusicology.
Conference Paper
Full-text available
This paper presents a framework for audio-driven human body motion analysis and synthesis. We address the problem in the context of a dance performance, where gestures and movements of the dancer are mainly driven by a musical piece and characterized by the repetition of a set of dance figures. The system is trained in a supervised manner using the multiview video recordings of the dancer. The human body posture is extracted from multiview video information without any human intervention using a novel marker-based algorithm based on annealing particle filtering. Audio is analyzed to extract beat and tempo information. The joint analysis of audio and motion features provides a correlation model that is then used to animate a dancing avatar when driven with any musical piece of the same genre. Results are provided showing the effectiveness of the proposed algorithm.
Conference Paper
Full-text available
We examine the performance of different classifiers on different audio feature sets to determine the genre of a given music piece. For each classifier, we also evaluate the performance of feature sets obtained by dimensionality reduction methods. Finally, we experiment with increasing classification accuracy by combining different classifiers. Using a set of different classifiers, we first obtain a test genre classification accuracy of around 79.6 ± 4.2% on a 10-genre set of 1000 music pieces. This performance is better than 71.1 ± 7.3%, the best that has been reported on this data set. We also obtain 80% classification accuracy by using dimensionality reduction or by combining different classifiers. We observe that the best feature set depends on the classifier used.
Conference Paper
Full-text available
In this paper we explore the relationship between the temporal and rhythmic structure of musical audio signals. Using automatically extracted rhythmic structure we present a rhythmically-aware method to combine note onset detection techniques. Our method uses top-down knowledge of repetitions of musical events to improve detection performance by modelling the temporal distribution of onset locations. Results on a publicly available database demonstrate that using musical knowledge in this way can lead to significant improvements by reducing the number of missed and spurious detections.
Conference Paper
Full-text available
This paper introduces a novel way to detect metrical structure in music. We introduce a way to compute autocorrelation such that the distribution of energy in phase space is preserved in a matrix. The resulting autocorrelation phase matrix is useful for several tasks involving metrical structure. First, we can use the matrix to enhance standard autocorrelation by calculating the Shannon entropy at each lag. This approach yields improved results for autocorrelation-based tempo induction. Second, we can efficiently search the matrix for combinations of lags that suggest particular metrical hierarchies. This approach yields a good model for predicting the meter of a piece of music. Finally, we can use the phase information in the matrix to align a candidate meter with music, making it possible to perform beat induction with an autocorrelation-based model. We present results for several meter prediction and tempo induction datasets, demonstrating that the approach is competitive with models designed specifically for these tasks. We also present preliminary beat induction results on a small set of artificial patterns; the details of computing the alignment online (for beat induction) are the topic of another paper.
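A minimal sketch of the two core steps described above: building the autocorrelation phase matrix and scoring each lag by the Shannon entropy of its phase distribution (array layout and the toy pattern are illustrative choices):

```python
import numpy as np

def autocorrelation_phase_matrix(x, max_lag):
    """apm[l-1][p] sums x[n] * x[n+l] over all n with n = p (mod l).

    Summing each row recovers ordinary autocorrelation; how energy spreads
    across a row's phases carries the metrical-alignment information.
    A sketch from the description above, not the authors' code.
    """
    apm = [np.zeros(l) for l in range(1, max_lag + 1)]
    for l in range(1, max_lag + 1):
        prod = x[:-l] * x[l:]
        for p in range(l):
            apm[l - 1][p] = prod[p::l].sum()
    return apm

def row_entropy(row):
    # Shannon entropy of a row: metrically aligned lags concentrate
    # energy in few phases, giving low entropy.
    p = row / row.sum() if row.sum() > 0 else np.ones_like(row) / len(row)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# Per-lag entropies of a strongly periodic pattern (period 4).
x = np.tile([1.0, 0.0, 0.2, 0.0], 8)
apm = autocorrelation_phase_matrix(x, 8)
print([round(row_entropy(r), 2) for r in apm])
```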
Conference Paper
Full-text available
The majority of existing research in Music Information Retrieval (MIR) has focused on either popular or classical music and frequently makes assumptions that do not generalize to other music cultures. We use the term Computational Ethnomusicology (CE) to describe the use of computer tools to assist the analysis and understanding of musics from around the world. Although existing MIR techniques can serve as a good starting point for CE, the design of effective tools can benefit from incorporating domain-specific knowledge about the musical style and culture of interest. In this paper we describe our realization of this approach in the context of studying Afro-Cuban rhythm. More specifically, we show how computer analysis can help us characterize and appreciate the complexities of tracking tempo and analyzing micro-timing in these particular music styles. A novel template-based method for tempo tracking in rhythmically complex Afro-Cuban music is proposed. Although our approach is domain-specific, we believe that the concepts and ideas used could also be applied to studying other music cultures after some adaptation.
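A minimal sketch of what a template-based tempo scorer of this kind might look like: stretch a rhythmic template (e.g. a 16-step clave cycle) to each candidate tempo and cross-correlate it against an onset-strength envelope (the cycle geometry and search range are assumptions, not the paper's method in detail):

```python
import numpy as np

def template_tempo(onset_env, fs, template, bpm_range=(60, 200)):
    """Score candidate tempi by correlating an onset envelope with a template.

    onset_env: onset-strength signal sampled at fs frames per second.
    template: 0/1 pattern over one rhythmic cycle (e.g. 16 clave steps).
    For each tempo, the template is stretched to the implied cycle length
    and cross-correlated at every phase; the best-scoring tempo wins.
    """
    steps = len(template)
    best = (None, -np.inf)
    for bpm in range(*bpm_range):
        # Assume a 16-step cycle spans two 4/4 bars, i.e. 8 beats.
        cycle = int(round(fs * 60.0 / bpm * 8))
        idx = (np.arange(steps) * cycle) // steps   # template step positions
        kernel = np.zeros(cycle)
        kernel[idx] = template
        score = np.correlate(onset_env, kernel, mode="valid").max()
        if score > best[1]:
            best = (bpm, score)
    return best[0]
```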
Article
Full-text available
In this short communication we describe some experiments in which methods of statistical pattern recognition are applied to musical style recognition and disputed musical authorship attribution. Values of a set of 20 features (also called "style markers") are measured in the scores of a set of compositions, mainly describing the different sonorities in the compositions. For a first study, over 300 different compositions by Bach, Handel, Telemann, Mozart and Haydn were used, and from this data set it was shown that even with a few features the styles of the various composers could be separated with leave-one-out error rates varying from 4% to 9%, with the exception of the confusion between Mozart and Haydn, which yielded a leave-one-out error rate of 24%. A second experiment included 30 fugues by J.S. Bach, W.F. Bach and J.L. Krebs, all of different style and character. This data set of compositions of undisputed authorship was then confronted with the F minor fugue for organ, BWV 534 (of which Bach's authorship is disputed). It could be concluded that there is experimental evidence that J.L. Krebs should in all probability be considered the composer of the fugue in question.
Article
Full-text available
This paper presents a non-conventional approach to the automatic music genre classification problem. The proposed approach uses multiple feature vectors and a pattern recognition ensemble approach, according to space and time decomposition schemes. Although music genre classification is a multi-class problem, we accomplish the task using a set of binary classifiers whose results are merged to produce the final music genre label (space decomposition). Music segments are also decomposed according to time segments obtained from the beginning, middle and end parts of the original music signal (time decomposition). The final classification is obtained from the set of individual results, according to a combination procedure. Classical machine learning algorithms such as Naïve Bayes, Decision Trees, k-Nearest Neighbours, Support Vector Machines and Multi-Layer Perceptron Neural Nets are employed. Experiments were carried out on a novel dataset called the Latin Music Database, which contains 3,160 music pieces categorized into 10 musical genres. Experimental results show that the proposed ensemble approach produces better results than those obtained from global and individual segment classifiers in most cases. Some experiments related to feature selection were also conducted, using the genetic algorithm paradigm. They show that the most important features for the classification task vary according to their origin in the music signal.
Article
Full-text available
Most people sometimes hum along with music or tap to its rhythm. A piece of music may be understood or felt differently by different people, so without music notation there is no single explicit answer. Tempo and beats are very important elements in the perception of music; tempo estimation and beat tracking are therefore fundamental techniques in automatic audio processing and crucial to multimedia applications. We first develop an artificial neural network to classify music excerpts by evaluation preference. Then, given the preference classification, we can obtain accurate estimates of tempo and beats using either Ellis's method or Dixon's method. We test our method on a mixed data set containing ten music genres from the 'ballroom dancer' database. Our experimental results show that the accuracy of our method is higher than that of Ellis's or Dixon's method alone.
Article
Full-text available
Content-based music genre classification is a key component for next-generation multimedia search agents. This paper introduces an audio classification technique based on audio content analysis. Artificial Neural Networks (ANNs), specifically multi-layer perceptrons (MLPs), are implemented to perform the classification task. Windowed audio files of finite length are analyzed to generate multiple feature sets, which are used as input vectors to a parallel neural architecture that performs the classification. This paper examines a combination of linear predictive coding (LPC), mel-frequency cepstrum coefficients (MFCCs), Haar wavelet, Daubechies wavelet and Symlet coefficients as feature sets for the proposed audio classifier. In parallel to the MLP, a Gaussian radial basis function (GRBF) based ANN is also implemented and analyzed. The obtained prediction accuracy of 87.3% in determining the audio genres demonstrates the efficiency of the proposed architecture. The ANN prediction values are processed by a rule-based inference engine (IE) that presents the final decision.
Article
Full-text available
This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or non-onsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. We present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. We describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross-validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
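A minimal sketch of the moving-average peak-picking stage described above, applied to a frame-wise onset activation curve (window size and threshold are assumptions; the networks producing the activation are omitted):

```python
import numpy as np

def pick_onsets(activation, window=7, threshold=0.1):
    """Report frame n as an onset when it exceeds the local moving average
    by `threshold` and is a local maximum of the activation curve.
    """
    pad = window // 2
    padded = np.pad(activation, pad, mode="edge")
    moving_avg = np.convolve(padded, np.ones(window) / window, mode="valid")
    onsets = []
    for n in range(1, len(activation) - 1):
        if (activation[n] > moving_avg[n] + threshold
                and activation[n] >= activation[n - 1]
                and activation[n] > activation[n + 1]):
            onsets.append(n)
    return onsets
```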
Article
Full-text available
We present an innovative tempo estimation system that processes acoustic audio signals and does not use any high-level musical knowledge. Our proposal relies on a harmonic + noise decomposition of the audio signal by means of a subspace analysis method. Then, a technique to measure the degree of musical accentuation as a function of time is developed and applied separately to the harmonic and noise parts of the input signal. This is followed by a periodicity estimation block that calculates the salience of musical accents for a large number of potential periods. Next, a multipath dynamic programming stage searches among all the potential periodicities for the most consistent prospects through time, and finally the most energetic candidate is selected as the tempo. Our proposal is validated using a manually annotated test base containing 961 music signals from various musical genres. In addition, the performance of the algorithm under different configurations is compared. The robustness of the algorithm when processing signals of degraded quality is also measured.
Article
Full-text available
We present a strategy for automatic genre classification of musical signals. The technique divides the signals into 21.3-millisecond frames, from which 4 features are extracted. The values of each feature are treated over 1-second analysis segments. Statistics of the features along each analysis segment are used to determine a vector of summary features that characterizes the respective segment. Next, a classification procedure uses those vectors to differentiate between genres. The classification procedure has two main characteristics: (1) a very wide and deep taxonomy, which allows a very meticulous comparison between different genres, and (2) a wide pairwise comparison of genres, which emphasizes the differences between each pair of genres. The procedure points out the genre that best fits the characteristics of each segment. The final classification of the signal is given by the genre that appears most often across all signal segments. The approach has shown very good accuracy even for the lowest layers of the hierarchical structure.
Conference Paper
Full-text available
In this paper we describe a system for detecting the tempo of sitar performance using a multimodal signal processing approach. Real-time measurements are obtained from sensors on the instrument and by wearable sensors on the performer's body. Experiments comparing audio-based and sensor-based tempo tracking are described. The real-time tempo tracking method is based on extracting onsets and applying Kalman filtering. We show how late fusion of the audio and sensor tempo estimates can improve tracking. The obtained results are used to inform design parameters for a real-time system for human-robot musical performance.
Article
Full-text available
In the field of computer music, pattern recognition algorithms are very relevant for music information retrieval applications. One challenging task in this area is the automatic recognition of musical style, which has a number of applications such as indexing and selecting from musical databases. From melodies symbolically represented as digital scores (standard musical instrument digital interface files), a number of melodic, harmonic, and rhythmic statistical descriptors are computed and their classification capability assessed in order to build effective description models. A framework for experimenting with this problem is presented, covering the feature extraction, feature selection, and classification stages, in such a way that new features and new musical styles can be easily incorporated and tested. Different classification methods, such as the Bayesian classifier, nearest neighbours, and self-organizing maps, are applied. The performance of these algorithms against different description models and parameters is analyzed for two particular musical styles, jazz and classical, used as an initial benchmark for our system.
Conference Paper
This paper presents a tempo tracking system adaptable to different styles of music. The system analyzes tonal music having a rhythmic structure based on a simple meter. The focus is on tracking gradual tempo changes during the live performance of a piece of music. A musician using the system plays an instrument connected to the computer through a MIDI interface. The system analyzes every note played according to a set of rules which identify temporal, melodic, harmonic and expressive qualities that contribute to the rhythmic structure in the music. This establishes the location of the note with respect to the beat, and allows another set of rules to predict the time of the next beat. The predicted beat can then be used to synchronize programmed accompaniment. The flexibility of the system stems from the use of distinct rules which can be prioritized. High priorities are assigned to rules which identify strong rhythmic qualities; low priority rules become effective in the absence of such qualities. Furthermore, the priorities can be tailored to suit a specific style, or a particular piece of music by identifying which qualities are most relevant to the rhythmic structure.
Article
Musical genre is probably the most popular music descriptor. In the context of large musical databases and Electronic Music Distribution, genre is therefore a crucial piece of metadata for the description of music content. However, genre is intrinsically ill-defined, and attempts at defining genre precisely have a strong tendency to end up in circular, ungrounded projections of fantasies. Is genre an intrinsic attribute of music titles, as, say, tempo is? Or is genre an extrinsic description of the whole piece? In this article, we discuss the various approaches to representing musical genre and propose to classify these approaches into three main categories: manual, prescriptive and emergent approaches. We discuss the pros and cons of each approach, and illustrate our study with results from the Cuidado IST project.
Article
This article describes a method to estimate and track the tempo of musical recordings which was submitted to the MIREX 2006 evaluation contest where it was ranked third out of seven submissions. The algorithm that we present is composed of three stages: first a front-end analyses the audio signal in order to extract a representation of the musically relevant events, the so-called “detection function”. Then, the periodicity of these events is estimated in contiguous and overlapping excerpts of the detection function signal. Finally, the periodicities are tracked through time and the most energetic are selected as tempi.
Article
This paper introduces a novel way to detect metrical structure in music. We introduce a way to compute autocorrelation such that the distribution of energy in phase space is preserved in a matrix. The resulting autocorrelation phase matrix is useful for several tasks involving metrical structure. First we can use the matrix to enhance standard autocorrelation by calculating the Shannon entropy at each lag. This approach yields improved results for autocorrelation-based tempo induction. Second, we can efficiently search the matrix for combinations of lags that suggest particular metrical hierarchies. This approach yields a good model for predicting the meter of a piece of music. Finally we can use the phase information in the matrix to align a candidate meter with music, making it possible to perform beat induction with an autocorrelation-based model. We argue that the autocorrelation phase matrix is a good, relatively efficient representation of temporal structure that is useful for a variety of applications. We present results for several relatively large meter prediction and tempo induction datasets, demonstrating that the approach is competitive with models designed specifically for these tasks. We also present preliminary beat induction results on a small set of artificial patterns.
Article
This study uses machine learning (ML) techniques to classify and cluster different Western music genres. Three artificial neural network models (the multi-layer perceptron [MLP], the probabilistic neural network [PNN] and the self-organizing map [SOM]), along with support vector machines (SVM), are compared to two standard statistical methods (linear discriminant analysis [LDA] and cluster analysis [CA]). The variable sets considered are average frequencies, variance frequencies, maximum frequencies, amplitude or loudness of the sound, and the median of the locations of the 15 highest peaks in the periodogram. The results show that machine learning models outperform traditional statistical techniques in classifying and clustering different music genres, owing to the robustness and flexibility of their modeling algorithms. The study also shows how it is possible to identify various dimensions of music genres by uncovering complex patterns in the multidimensional data.
Conference Paper
We present new developments in the improvisational robotic percussionist project, aimed at improving human-robot interaction through design, mechanics, and perceptual modeling. Our robot, named Haile, listens to live human players, analyzes perceptual aspects of their playing in real time, and uses the product of this analysis to play along in a collaborative and improvisatory manner. It is designed to combine the benefits of computational power in algorithmic music with the expression and visual interactivity of acoustic playing. Haile's new features include an anthropomorphic form, a linear-motor-based robotic arm, a novel perceptual modeling implementation, and a number of new interaction schemes. The paper begins with an overview of related work and a presentation of goals and challenges based on Haile's original design. We then describe new developments in physical design, mechanics, perceptual implementation, and interaction design, aimed at improving human-robot interactions with Haile. The paper concludes with a description of a user study, conducted in an effort to evaluate the new functionalities and their effectiveness in facilitating expressive musical human-robot interaction. The results of the study show correlation between humans' and Haile's rhythmic perception, as well as user satisfaction regarding Haile's perceptual and mechanical abilities. The study also indicates areas for improvement, such as the need for better timbre and loudness control and more advanced and responsive interaction schemes.
Conference Paper
In this paper we propose a template matching algorithm to address the tempo tracking problem in the MP3 domain. The algorithm is based on the MP3 Window-Switching Pattern (WSP) only, meaning that no frequency analysis is performed by the program itself. Because the WSP is structured coherently with the drum line, it is possible to compare this pattern with a simple metronome template. Experimental results are presented for a range of different musical styles, including rock, jazz, and popular songs, with a variety of BPMs and time signatures. Part of the experimentation is dedicated to analyzing the training set proposed for MIREX 2006. A computational cost analysis is also presented.
Conference Paper
This paper explores how data generated by meter induction models may be recycled to quantify metrical ambiguity, which is calculated by measuring the dispersion of metrical induction strengths across a population of possible meters. A measure of dispersion commonly used in economics to measure income inequality, the Gini coefficient, is introduced for this purpose. The value of this metric as a rhythmic descriptor is explored by quantifying the ambiguity of several common clave patterns and comparing the results to other metrics of rhythmic complexity and syncopation.
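The Gini coefficient itself is simple to compute from a vector of metrical induction strengths; a minimal sketch (the example profiles are illustrative):

```python
import numpy as np

def gini(strengths):
    """Gini coefficient of a set of metrical-induction strengths.

    0 means strengths are spread evenly over the candidate meters
    (maximal ambiguity); values near 1 mean one meter dominates.
    """
    x = np.sort(np.asarray(strengths, dtype=float))
    n = len(x)
    # Standard mean-absolute-difference formulation over sorted values.
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

print(gini([0.05, 0.05, 0.1, 0.8]))    # peaked profile -> high Gini (~0.58)
print(gini([0.25, 0.25, 0.25, 0.25]))  # flat profile (max ambiguity) -> 0.0
```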
Article
A scheme for complexity-scalable beat detection of popular music recordings that can run on different platforms, particularly battery-powered handheld devices, is presented. The detector complexity can be adjusted to match the constraints of the device and user requirements. Huffman bits from the compressed bitstream are used, without requiring any decoding, as the sole feature for onset detection. An efficient and robust graph-based beat induction algorithm is also provided. By applying the beat detector to compressed rather than uncompressed audio, the system execution time can be reduced by almost three orders of magnitude. The algorithm was implemented and tested on a targeted personal digital assistant (PDA) platform. Compressed- and transform-domain processing is particularly suitable for mobile applications, providing a satisfactory trade-off between detection accuracy and execution speed.
Article
Automatic beat tracking consists of estimating the number of beats per minute at which a music track is played and identifying exactly when these beats occur. Applications range from music analysis, sound-effect synchronization, and audio editing to automatic playlist generation and deejaying. An off-line beat-tracking technique for estimating a time-varying tempo in an audio track is presented. The algorithm uses an MMSE estimation of local tempo and beat location candidates, followed by a dynamic programming stage used to determine the optimum choice of candidate in each analysis frame. The algorithm is efficient in its use of computation resources, yet provides very good results on a wide range of audio tracks. The algorithm details are presented, followed by a discussion of the performance and suggestions for further improvements.
Conference Paper
We introduce the beat spectrum, a new method of automatically characterizing the rhythm and tempo of music and audio. The beat spectrum is a measure of acoustic self-similarity as a function of time lag. Highly structured or repetitive music will have strong beat spectrum peaks at the repetition times. This reveals both tempo and the relative strength of particular beats, and therefore can distinguish between different kinds of rhythms at the same tempo. We also introduce the beat spectrogram, which graphically illustrates rhythm variation over time. Unlike previous approaches to tempo analysis, the beat spectrum does not depend on particular attributes such as energy or frequency, and thus will work for any music or audio in any genre. We present tempo estimation results which are accurate to within 1% for a variety of musical genres. This approach has a variety of applications, including music retrieval by similarity and automatically generating music videos. Anyone who has ever tapped a foot in time to music has performed rhythm analysis. Though simple for humans, this task is considerably more difficult to automate. We introduce a new measure of tempo analysis called the beat spectrum. This is a measure of acoustic self-similarity versus lag time, computed from a representation of spectral similarity. Peaks in the beat spectrum correspond to major rhythmic components of the source audio. The repetition time of each component can be determined by the lag time of the corresponding peak, while the relative amplitudes of different peaks reflect the strengths of their corresponding rhythmic components. We also present the beat spectrogram, which graphically illustrates rhythmic variation over time. The beat spectrogram is an image formed from the beat spectrum over successive windows. Strong rhythmic components are visible as bright bars in the beat spectrogram, making changes in tempo or time signature visible. In addition, a measure of audio novelty can be computed that measures how novel the source audio is at any time (2). Instances when this measure is large correspond to significant audio changes. Periodic peaks correspond to rhythmic periodicity in the music. In the final section, we present various applications of the beat spectrum, including music retrieval by rhythmic similarity, an "automatic DJ" that can smoothly sequence music with similar tempos, and automatic music video generation.
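A minimal sketch of the beat spectrum as described: compare feature frames by cosine similarity and sum the self-similarity matrix along each lag's diagonal (feature choice and normalisation are assumptions):

```python
import numpy as np

def beat_spectrum(features, max_lag):
    """Acoustic self-similarity as a function of time lag.

    features: (n_frames, n_dims) feature matrix (e.g. per-frame spectra).
    B(l) averages the cosine self-similarity matrix along its l-th
    superdiagonal, so peaks in B mark lags at which the audio repeats.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T                          # cosine self-similarity matrix
    n = len(sim)
    return np.array([np.trace(sim, offset=l) / (n - l)
                     for l in range(1, max_lag + 1)])
```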
Article
A single voice extracted from the scores of compositions from the Baroque, Classical, Romantic, and Contemporary periods has been studied in order to determine the feasibility of determining musical meter by computer. The method of autocorrelation is appropriate for this calculation since it is a measure of the frequency of occurrence of events following an event at time zero. If a greater frequency of events occurs on the downbeat of a measure, as predicted by Palmer and Krumhansl ["Mental Representations for Musical Meter," J. Exp. Psychol.: Hum. Percept. Perform. 16, 728-741 (1990)], then a peak in the autocorrelation function should indicate the time of a single measure. The results of these computations indicate that computer determination of meter from score events is indeed possible. An example is included to show that this method of analysis can be applied to live performance data as well.
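A minimal sketch of the autocorrelation calculation described above, applied to an impulse train of event times (grid resolution and search range are assumptions):

```python
import numpy as np

def measure_period(onset_times, grid=0.01, max_period=4.0):
    """Estimate the measure length from score/performance event times.

    Events are binned onto a time grid; the autocorrelation of the
    resulting impulse train is computed, and if downbeats carry more
    events, the lag of the dominant peak approximates one measure.
    """
    idx = np.rint(np.asarray(onset_times) / grid).astype(int)
    train = np.zeros(idx.max() + 1)
    np.add.at(train, idx, 1.0)              # accumulate coincident events
    max_lag = int(max_period / grid)
    ac = np.array([train[:-l] @ train[l:] for l in range(1, max_lag + 1)])
    return (np.argmax(ac) + 1) * grid       # lag (seconds) of strongest peak
```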
Article
A method is presented for using a small number of bandpass filters and banks of parallel comb filters to analyze the tempo of, and extract the beat from, musical signals of arbitrary polyphonic complexity and containing arbitrary timbres. This analysis is performed causally, and can be used predictively to guess when beats will occur in the future. Results in a short validation experiment demonstrate that the performance of the algorithm is similar to the performance of human listeners in a variety of musical situations. Aspects of the algorithm are discussed in relation to previous high-level cognitive models of beat tracking.
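A minimal sketch of the comb-filter idea described above: drive one feedback comb filter per candidate tempo with an onset-strength envelope and pick the tempo whose filter resonates most (the full method runs such banks per frequency subband; the decay constant here is an assumption):

```python
import numpy as np

def comb_tempo(envelope, fs, bpm_candidates=range(60, 181)):
    """Pick the tempo whose resonant comb filter stores the most energy.

    envelope: onset-strength/energy envelope at fs frames per second.
    Each candidate tempo gets a feedback comb filter
    y[n] = a*y[n-T] + (1-a)*x[n] with delay T of one beat period;
    envelopes pulsing at the filter's period make it resonate.
    """
    best = (None, -np.inf)
    for bpm in bpm_candidates:
        T = int(round(fs * 60.0 / bpm))
        alpha = 0.5 ** (T / (fs * 1.5))   # ~1.5 s half-energy time (assumed)
        y = np.zeros(len(envelope))
        for n in range(len(envelope)):
            fb = alpha * y[n - T] if n >= T else 0.0
            y[n] = (1 - alpha) * envelope[n] + fb
        energy = np.sum(y ** 2)
        if energy > best[1]:
            best = (bpm, energy)
    return best[0]
```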
Conference Paper
This paper proposes a music accompaniment system capable of catching the tempo of music. The original signal is first reduced to a detection function revealing the pulses of the music. To induce the tempo and locate beats, a Fourier-analysis-based algorithm with high practicality and generality is designed that is robust to variations in beat strength and not restricted to specific music genres. The regularity and periodicity of music beats are extracted, and even missing beats are recovered. The performance is validated using a comprehensive test data set; results in both formal objective experiments and subjective listening evaluations show convincing performance. Moreover, an interactive interface is provided through which users can select an instrument to accompany the music based on the obtained tempo.
Conference Paper
An algorithm for analyzing the rhythmic content of acoustic signals of polyphonic and multitimbral Western music is presented. The analysis consists of detecting sound onsets, computing an inter-onset interval (IOI) histogram, and estimating the duration of the shortest notes, i.e. the tatum period, from the histogram. Robustness against tempo changes has explicitly been built into the system by using short-term memory for the tatum grid estimation. The results are directly applicable to computational music processing for making a musically useful segmentation and computing a musical time base. The proposed algorithm works causally, and a real-time software implementation is available online (see Seppanen, J. and Majdak, P., "Rhythm Estimator Software", ftp://iem.kug.ac.at/pd/Externals/RHYTHM/, 2000). The performance of the system was validated on 50 musical excerpts, and the algorithm was found to be capable of finding the tatum grid in music with a regular rhythm.
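A simplified sketch of the IOI-histogram approach described above, without the short-term memory used for tempo-change robustness (histogram resolution, the tatum search range and the multiple-fit scoring are assumptions):

```python
import numpy as np

def tatum_period(onset_times, resolution=0.01, max_ioi=1.0):
    """Estimate the tatum (shortest recurrent duration) from onset times.

    Builds an inter-onset-interval histogram, then scores candidate tatum
    periods by how well the histogram mass sits near integer multiples
    of each candidate.
    """
    t = np.sort(np.asarray(onset_times))
    # All pairwise forward IOIs up to max_ioi, not just successive ones.
    iois = (t[None, :] - t[:, None]).ravel()
    iois = iois[(iois > resolution) & (iois <= max_ioi)]
    hist, edges = np.histogram(iois, bins=int(max_ioi / resolution))
    centers = 0.5 * (edges[:-1] + edges[1:])
    best = (None, -np.inf)
    for cand in centers[(centers > 0.05) & (centers < 0.4)]:  # assumed range
        # Weight each bin by closeness to a multiple of the candidate.
        ratio = centers / cand
        fit = np.cos(2 * np.pi * (ratio - np.round(ratio))) @ hist
        if fit > best[1]:
            best = (cand, fit)
    return best[0]
```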
Article
We formulate tempo tracking in a Bayesian framework where a tempo tracker is modeled as a stochastic dynamical system. The tempo is modeled as a hidden state variable of the system and is estimated by a Kalman filter. The Kalman filter operates on a Tempogram, a wavelet-like multiscale expansion of a real performance. An important advantage of our approach is that it is possible to formulate both off-line and real-time algorithms. Simulation results on a systematically collected set of MIDI piano performances of Yesterday and Michelle by the Beatles show accurate tracking of most of the beats.
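A minimal sketch of the Kalman filtering step, with the state holding the next beat time and the beat period (noise covariances are assumed, and noisy beat times stand in for the Tempogram observations the paper actually uses):

```python
import numpy as np

def kalman_tempo(beat_observations, period0):
    """Track beat time and tempo with a linear Kalman filter.

    State s = [next beat time, beat period]; dynamics s <- F s with
    F = [[1, 1], [0, 1]] (the next beat arrives one period later), and
    each noisy observed beat time updates the state.
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])            # we observe the beat time only
    Q = np.diag([1e-4, 1e-5])             # process noise (assumed)
    R = np.array([[1e-2]])                # observation noise (assumed)
    s = np.array([beat_observations[0], period0])
    P = np.eye(2) * 0.1
    track = []
    for z in beat_observations[1:]:
        s, P = F @ s, F @ P @ F.T + Q                   # predict next beat
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
        s = s + K @ (np.array([z]) - H @ s)             # correct with z
        P = (np.eye(2) - K @ H) @ P
        track.append((s[0], 60.0 / s[1]))               # (time, tempo BPM)
    return track
```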
Article
This paper describes a real-time beat tracking system that recognizes a hierarchical beat structure comprising the quarter-note, half-note, and measure levels in real-world audio signals sampled from popular-music compact discs. Most previous beat-tracking systems dealt with MIDI signals and had difficulty in processing, in real time, audio signals containing sounds of various instruments, and in tracking beats above the quarter-note level. The system described here can process music with and without drums and can recognize the hierarchical beat structure by using three kinds of musical knowledge: of onset times, of chord changes, and of drum patterns. This paper also describes several applications of beat tracking, such as beat-driven real-time computer graphics and lighting control.
References

Cruz-Alcázar, P. P. and Castro-Bleda, M. J. (2008), 'Musical style identification with n-Grams and neural networks', in J. Ruiz-Shulcloper and W. G. Kropatsch (eds), Lecture Notes in Computer Science, Vol. 5197: Progress in Pattern Recognition, Image Analysis and Applications, Heidelberg: Springer, pp. 461-69.
Gouyon, F., Dixon, S., Pampalk, E. and Widmer, G. (2004), 'Evaluating rhythmic descriptors for musical genre classification', AES 25th International Conference, Church House, London, UK, 17-19 June, pp. 1-9.
Kotov, O., Paradzinets, A. and Bovbel, E. (2007), 'Musical genre classification using modified wavelet-like features and support vector machines', Proceedings of the IASTED European Conference on Internet and Multimedia Systems and Applications, Chamonix, France, 14-16 March, pp. 260-65.
Pérez-Sancho, C., Rizo, D. and Iñesta, J. M. (2008), 'Stochastic text models for music categorization', in N. Da Vitoria Lobo, T. Kasparis, F. Roli, J. Kwok, G. C. Anagnostopoulos and M. Loog (eds), Lecture Notes in Computer Science, Vol. 5342: Structural, Syntactic, and Statistical Pattern Recognition, Heidelberg: Springer, pp. 55-64.
Allen, P. E. and Dannenberg, R. B. (1990), 'Tracking musical beats in real time', Proceedings of the International Computer Music Conference, University of Glasgow, Glasgow, UK, San Francisco, CA: ICMA, pp. 140-43.
Desain, P. and Honing, H. (1994), 'Foot-tapping: A brief introduction to beat induction', Proceedings of the International Computer Music Conference, Aarhus, Denmark, 12-17 September, San Francisco, CA: ICMA, pp. 78-79.
Dixon, S. and Cambouropoulos, E. (2000), 'Beat tracking with musical knowledge', Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Germany, 20-25 August, pp. 626-30.
Hainsworth, S. and Macleod, M. (2003), 'Beat tracking with particle filtering algorithms', Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 19-23 October, Piscataway, NJ: IEEE, pp. 91-94.