Thesis

Estimation automatique des premiers temps dans un signal audio musical (Automatic estimation of downbeats in a musical audio signal).

Conference Paper
Recently, the end-to-end approach that learns hierarchical representations from raw data using deep convolutional neural networks has been successfully explored in the image, text, and speech domains. This approach has been applied to musical signals as well, but has not yet been fully explored. To this end, we propose sample-level deep convolutional neural networks that learn representations from very small grains of waveforms (e.g. 2 or 3 samples), going beyond typical frame-level input representations. This allows the networks to hierarchically learn filters that are sensitive to log-scaled frequency, such as the mel-frequency spectrogram widely used in music classification systems. It also helps the networks learn high-level abstractions of music as the depth of layers increases. We show how deep architectures with sample-level filters improve accuracy in music auto-tagging and provide results comparable to previous state-of-the-art performance on the MagnaTagATune and Million Song datasets. In addition, we visualize the filters learned in each layer of a sample-level DCNN to identify hierarchically learned features.
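To make the idea concrete, here is a minimal sketch of such a sample-level frontend, assuming PyTorch: a stack of length-3 one-dimensional convolutions applied directly to raw waveform samples. The channel width (128), block count, and tag count (50) are illustrative placeholders, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SampleLevelCNN(nn.Module):
    """Illustrative sample-level frontend: length-3 1-D convolutions
    applied directly to raw waveform samples (no spectrogram)."""
    def __init__(self, n_tags=50, n_blocks=9, channels=128):
        super().__init__()
        # First layer consumes raw samples, 3 at a time (stride 3).
        layers = [nn.Conv1d(1, channels, kernel_size=3, stride=3),
                  nn.BatchNorm1d(channels), nn.ReLU()]
        # Each further block halves nothing but pools time by 3,
        # building a hierarchy of increasingly abstract filters.
        for _ in range(n_blocks - 1):
            layers += [nn.Conv1d(channels, channels, 3, stride=1, padding=1),
                       nn.BatchNorm1d(channels), nn.ReLU(),
                       nn.MaxPool1d(3)]
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Linear(channels, n_tags)

    def forward(self, x):            # x: (batch, 1, n_samples)
        h = self.features(x)
        h = h.mean(dim=-1)           # global temporal pooling
        return torch.sigmoid(self.classifier(h))  # multi-label tags

# e.g. 3**10 = 59049 raw samples of mono audio
model = SampleLevelCNN()
tags = model(torch.randn(1, 1, 59049))  # -> (1, 50) tag probabilities
```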
Conference Paper
In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object-oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors: callable objects that run transparently on multiple cores. Processors can also be serialised, saved, and re-run so that results can easily be reproduced anywhere. Apart from low-level audio processing, madmom puts emphasis on musically meaningful high-level features. Many of these incorporate machine learning techniques, and madmom provides a module implementing methods commonly used in MIR, such as hidden Markov models and neural networks. Additionally, madmom comes with several state-of-the-art MIR algorithms for onset detection, beat, downbeat and meter tracking, tempo estimation, and piano transcription. These can easily be incorporated into bigger MIR systems or run as stand-alone programs.
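As an illustration of the Processor concept, a minimal downbeat-tracking pipeline with madmom might look as follows; the audio file name is a placeholder.

```python
# Minimal sketch of a two-stage madmom pipeline for downbeat tracking.
from madmom.features.downbeats import (RNNDownBeatProcessor,
                                       DBNDownBeatTrackingProcessor)

# Stage 1: an RNN produces frame-wise beat/downbeat activations.
activations = RNNDownBeatProcessor()('song.wav')  # placeholder path

# Stage 2: a dynamic Bayesian network decodes beat times and their
# position within the bar, allowing 3/4 and 4/4 meters here.
tracker = DBNDownBeatTrackingProcessor(beats_per_bar=[3, 4], fps=100)
beats = tracker(activations)  # rows of (time in s, position in bar)

downbeats = beats[beats[:, 1] == 1][:, 0]  # position 1 marks downbeats
print(downbeats)
```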
Book
This volume presents an overview of a relatively new field of psychoacoustic and hearing research that involves the perception of musical sound patterns. The material is organized as a set of chapters reflecting the current status of scientific scholarship on music perception; each chapter synthesizes a range of findings associated with one of several major research areas in the field. Contents:
- Overview (Mari Riess Jones, Richard R. Fay, Arthur N. Popper)
- Music Perception: Current Research and Future Directions (Mari Riess Jones)
- The Perception of Family and Register in Musical Notes (Roy D. Patterson, Etienne Gaudrain, and Thomas C. Walters)
- A Theory of Tonal Hierarchies in Music (Carol L. Krumhansl and Lola L. Cuddy)
- Music Acquisition and Effects of Musical Experience (Laurel J. Trainor and Kathleen A. Corrigall)
- Music and Emotion (Patrick G. Hunter and E. Glenn Schellenberg)
- Tempo and Rhythm (J. Devin McAuley)
- Neurodynamics of Music (Edward W. Large)
- Memory for Melodies (Andrea R. Halpern and James C. Bartlett)
About the editors: Mari Riess Jones is Full Professor in the Department of Psychology at The Ohio State University, Columbus, and Adjunct Professor of Psychology at the University of California, Santa Barbara. Arthur N. Popper is Professor in the Department of Biology and Co-Director of the Center for Comparative and Evolutionary Biology of Hearing at the University of Maryland, College Park. Richard R. Fay is Director of the Parmly Hearing Institute and Professor of Psychology at Loyola University of Chicago.
About the series: The Springer Handbook of Auditory Research presents a series of synthetic reviews of fundamental topics dealing with auditory systems. Each volume is independent and authoritative; taken as a set, the series is the definitive resource in the field.
Conference Paper
Can a large convolutional neural network trained for whole-image classification on ImageNet be coaxed into detecting objects in PASCAL? We show that the answer is yes, and that the resulting system is simple, scalable, and boosts mean average precision, relative to the venerable deformable part model, by more than 40% (achieving a final mAP of 48% on VOC 2007). Our framework combines powerful computer vision techniques for generating bottom-up region proposals with recent advances in learning high-capacity convolutional neural networks. We call the resulting system R-CNN: Regions with CNN features. The same framework is also competitive with state-of-the-art semantic segmentation methods, demonstrating its flexibility. Beyond these results, we execute a battery of experiments that provide insight into what the network learns to represent, revealing a rich hierarchy of discriminative and often semantically meaningful features.
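A rough sketch of that recipe in Python, using OpenCV's selective search as a stand-in for the proposal generator and a torchvision backbone for feature extraction; the image path is a placeholder and the linear scoring head is an untrained placeholder rather than the paper's per-class SVMs.

```python
# Region proposals -> warped crops -> CNN features -> per-class scores.
# Requires opencv-contrib-python (for selective search) and torchvision.
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

image = cv2.imread('image.jpg')                      # placeholder path

# 1. Bottom-up region proposals via selective search.
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()
boxes = ss.process()[:200]                           # rows of (x, y, w, h)

# 2. Warp each proposal to the CNN input size and extract features.
backbone = models.alexnet(weights='DEFAULT')
backbone.classifier = backbone.classifier[:-1]       # drop final FC -> 4096-d
backbone.eval()
# Proper RGB conversion and mean/std normalization omitted for brevity.
prep = T.Compose([T.ToTensor(), T.Resize((224, 224), antialias=True)])

crops = torch.stack([prep(image[y:y + h, x:x + w]) for x, y, w, h in boxes])
with torch.no_grad():
    feats = backbone(crops)                          # (200, 4096)

# 3. Score each region with per-class linear classifiers
#    (random weights here; R-CNN trains one SVM per class).
scores = feats @ torch.randn(4096, 20)               # 20 PASCAL VOC classes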
Article
While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks, in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations with true zeros that seem remarkably suitable for naturally sparse data. Even though they can take advantage of semi-supervised setups with extra unlabelled data, deep rectifier networks can reach their best performance without requiring any unsupervised pre-training on purely supervised tasks with large labelled datasets. Hence, these results can be seen as a new milestone in the attempts at understanding the difficulty of training deep but purely supervised neural networks, and at closing the performance gap between neural networks learnt with and without unsupervised pre-training.
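A tiny numerical illustration of the sparsity claim: a rectifier outputs exact zeros for roughly half of zero-centred inputs, whereas tanh essentially never does.

```python
# ReLU produces true zeros (sparse codes); tanh only saturates near -1/+1.
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal((1000, 256))   # zero-centred inputs

relu = np.maximum(pre_activations, 0.0)  # hard non-linearity, true zeros
tanh = np.tanh(pre_activations)          # smooth, never exactly zero

print(f"ReLU exact zeros: {np.mean(relu == 0.0):.1%}")  # ~50%
print(f"tanh exact zeros: {np.mean(tanh == 0.0):.1%}")  # ~0%
```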
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
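A compact sketch of the described topology (five convolutional layers with interleaved max-pooling, three fully-connected layers with dropout, and a 1000-way output), following torchvision's single-GPU channel layout, which differs slightly from the two-GPU original.

```python
import torch
import torch.nn as nn

# Five conv layers; max-pooling follows the 1st, 2nd, and 5th.
features = nn.Sequential(
    nn.Conv2d(3, 64, 11, stride=4, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(), nn.MaxPool2d(3, 2),
    nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3, 2),
)
# Three fully-connected layers with dropout; softmax is applied by the loss.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),          # 1000-way output over ImageNet classes
)

logits = classifier(features(torch.randn(1, 3, 224, 224)))  # -> (1, 1000)
```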
Conference Paper
MatConvNet is an open-source implementation of Convolutional Neural Networks (CNNs) deeply integrated into the MATLAB environment. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more. MatConvNet can be easily extended, often using only MATLAB code, allowing fast prototyping of new CNN architectures. At the same time, it supports efficient computation on CPU and GPU, making it possible to train complex models on large datasets such as ImageNet ILSVRC, which contains millions of training examples.