About
89 Publications · 29,706 Reads · 712 Citations
Introduction
Machine Listening for Sound and Music Understanding
Publications (89)
The deployment of machine listening algorithms in real-world application scenarios is challenging. In this paper, we investigate how the superposition of multiple sound events within complex sound scenes affects their recognition. As a basis for our research, we introduce the Urban Sound Monitoring (USM) dataset, which is a novel public benchmark d...
The deployment of machine listening algorithms in real-life applications is often impeded by a domain shift caused for instance by different microphone characteristics. In this paper, we propose a novel domain adaptation strategy based on disentanglement learning. The goal is to disentangle task-specific and domain-specific characteristics in the a...
In the context of music information retrieval, similarity-based approaches are useful for a variety of tasks that benefit from a query-by-example scenario. Music, however, naturally decomposes into a set of semantically meaningful factors of variation. Current representation learning strategies pursue the disentanglement of such factors from deep re...
The development of robust acoustic traffic monitoring (ATM) algorithms based on machine learning faces several challenges. The biggest challenge is to collect and annotate a suitable data set for model training and evaluation, which must reflect a broad variety of vehicle sounds since their emitted acoustic noise patterns depend on a variety of fac...
The development of robust acoustic traffic monitoring (ATM) algorithms based on machine learning faces several challenges. The biggest challenge is to collect and annotate large high-quality datasets for algorithm training and evaluation. Such a dataset must reflect a broad variety of vehicle sounds since their emitted acoustic noise patterns depen...
Musicological studies on jazz performance analysis commonly require a manual selection and transcription of improvised solo parts, both of which can be time-consuming. In order to expand these studies to larger corpora of jazz recordings, algorithms for automatic content analysis can accelerate these processes. In this study, we aim to detect the p...
In many urban areas, traffic load and noise pollution are constantly increasing. Automated systems for traffic monitoring are promising countermeasures, which make it possible to systematically quantify and predict local traffic flow in order to support municipal traffic planning decisions. In this paper, we present a novel open benchmark dataset, containi...
This paper introduces a novel dataset for polyphonic sound event detection in urban sound monitoring use-cases. Based on isolated sounds taken from the FSD50k dataset, 20,000 polyphonic soundscapes are synthesized with sounds being randomly positioned in the stereo panorama using different loudness levels. The paper gives a detailed discussion of p...
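The synthesis procedure described above — isolated sounds placed at random stereo positions with varying loudness — can be sketched as follows. This is a minimal illustration, not the dataset's actual generation pipeline; the panning law, gain range, and onset placement are assumptions for the sketch.

```python
import numpy as np

def mix_soundscape(events, length, rng=None):
    """Mix isolated mono sound events into one stereo soundscape.

    events : list of 1-D numpy arrays (isolated sounds)
    length : number of samples in the output scene

    Each event gets a random onset, a random gain (here -12..0 dB,
    an assumed range), and a random constant-power stereo pan.
    """
    rng = rng or np.random.default_rng(0)
    scene = np.zeros((2, length))
    for ev in events:
        onset = int(rng.integers(0, max(1, length - len(ev))))
        gain = 10 ** (rng.uniform(-12, 0) / 20)   # loudness in dB -> linear
        pan = rng.uniform(0, 1)                   # 0 = hard left, 1 = hard right
        left, right = np.cos(pan * np.pi / 2), np.sin(pan * np.pi / 2)
        seg = ev[: length - onset] * gain
        scene[0, onset:onset + len(seg)] += left * seg
        scene[1, onset:onset + len(seg)] += right * seg
    return scene
```

Constant-power panning keeps the perceived loudness of an event roughly stable across pan positions, which matters when loudness levels are themselves a controlled variable.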
In this work, we propose incorporating local polyphony information into multi-pitch estimation (MPE) in piano music recordings. To that aim, we propose a method for local polyphony estimation (LPE), which is based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony. We investigate...
In this paper, we adapt a recently proposed U-net deep neural network architecture from melody to bass transcription. We investigate pitch shifting and random equalization as data augmentation techniques. In a parameter importance study, we study the influence of the skip connection strategy between the encoder and decoder layers, the data augmenta...
Electroacoustic music is experienced primarily through auditory perception, as it is not usually based on a prescriptive score. For the analysis of such pieces, transcriptions are sometimes created to illustrate events and processes graphically in a readily comprehensible way. These are usually based on the spectrogram of the recording. Although th...
Research on sound event detection (SED) in environmental settings has seen increased attention in recent years. Large amounts of (private) domestic or urban audio data raise significant logistical and privacy concerns. The inherently distributed nature of these tasks makes federated learning (FL) a promising approach to take advantage of large-scal...
In this paper, we investigate a previously proposed algorithm for spoken language identification based on convolutional neural networks and convolutional recurrent neural networks. We improve the algorithm by modifying the training strategy to ensure equal class distribution and efficient memory usage. We successfully replicate previous experimenta...
In this paper, we approach the problem of detecting segments of singing voice activity in opera recordings. We consider three state-of-the-art methods for singing voice detection based on supervised deep learning. We train and test these models on a novel dataset comprising three annotated performances (versions) of Richard Wagner’s opera “Die Walk...
The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over recent years. This was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events (DCASE) competition with its first edition in 2013. All competitions so far involved one or multiple ASC tas...
Objectified measurement of subjective noise perception can help to trace the origins of noise more precisely. A first technical system for this purpose was developed in the research project "StadtLärm". Its sensors not only measure familiar noise levels, but simultaneously identify the sound class, i.e., the source of the noise. ...
Western classical music comprises a rich repertoire composed for different ensembles. Often, these ensembles consist of instruments from one or two of the families woodwinds, brass, piano, vocals, and strings. In this paper, we consider the task of automatically recognizing instrument families from music recordings. As one main contribution, we i...
Electroacoustic music is experienced primarily through hearing, as it is not usually based on a prescriptive score. For the analysis of such pieces, transcriptions are sometimes created to illustrate events and processes graphically in a readily comprehensible way. These are usually based on the spectrogram of the recording. Although transcriptions...
This is a late-breaking demo session submission. It describes a side project I have been working on this year: the concept of a music learning platform for bass players and drummers, the results of an initial user study, and an early app prototype.
In this paper, we build upon a recently proposed deep convolutional neural network architecture for automatic chord recognition (ACR). We focus on extending the commonly used major/minor vocabulary (24 classes) to an extended chord vocabulary of seven chord types with a total of 84 classes. In our experiments, we compare joint and separate classifi...
In this paper, we evaluate hand-crafted features as well as features learned from data using a convolutional neural network (CNN) for different fundamental frequency classification tasks. We compare classification based on full (variable-length) contours and classification based on fixed-sized subcontours in combination with a fusion strategy. Our...
For musicological studies on large corpora, the compilation of suitable data constitutes a time-consuming step. In particular, this is true for high-quality symbolic representations that are generated manually in a tedious process. A recent study on Western classical music has shown that musical phenomena such as the evolution of tonal complexity o...
Predominant instrument recognition in ensemble recordings remains a challenging task, particularly if closely related instruments such as alto and tenor saxophone need to be distinguished. In this paper, we build upon a recently-proposed instrument recognition algorithm based on a hybrid deep neural network: a combination of convolutional and full...
In this paper, we consider two methods to improve an algorithm for bass saliency estimation in jazz ensemble recordings which are based on deep neural networks. First, we apply label propagation to increase the amount of training data by transferring pitch labels from our labeled dataset to unlabeled audio recordings using a spectral similarity mea...
Retrieving short monophonic queries in music recordings is a challenging research problem in Music Information Retrieval (MIR). In jazz music, given a solo transcription, one retrieval task is to find the corresponding (potentially polyphonic) recording in a music collection. Many conventional systems approach such retrieval tasks by first extracti...
Web services allow permanent access to music from all over the world. Especially in the case of web services with user-supplied content, e.g., YouTube™, the available metadata is often incomplete or erroneous. On the other hand, a vast amount of high-quality and musically relevant metadata has been annotated in research areas such as Music Informat...
The use of pitch-informed solo and accompaniment separation as a tool for the creation of practice content
http://schott-campus.com/jazzomat/
Motivated by the recent success of deep learning techniques in various audio analysis tasks, this work presents a distributed sensor-server system for acoustic scene classification in urban environments based on deep convolutional neural networks (CNN). Stacked autoencoders are used to compress extracted spectrogram patches on the sensor side befor...
In this paper, we focus on transcribing walking bass lines, which provide clues for revealing the actual played chords in jazz recordings. Our transcription method is based on a deep neural network (DNN) that learns a mapping from a mixture spectrogram to a salience representation that emphasizes the bass line. Furthermore, using beat positions, we...
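A salience representation plus beat positions suggests a simple beat-quantised decoding step: average the salience within each inter-beat interval and pick the strongest pitch bin. The sketch below illustrates that idea only; it is a hedged stand-in, not the paper's actual decoding method, and the interval-averaging rule is an assumption.

```python
import numpy as np

def beatwise_pitches(salience, beat_frames):
    """Return one pitch-bin estimate per inter-beat interval.

    salience    : 2-D array, shape (n_pitch_bins, n_frames)
    beat_frames : sorted frame indices of the detected beats

    For each interval between consecutive beats, salience is averaged
    over time and the argmax pitch bin is taken as the bass note.
    """
    pitches = []
    for start, end in zip(beat_frames[:-1], beat_frames[1:]):
        seg = salience[:, start:end].mean(axis=1)
        pitches.append(int(np.argmax(seg)))
    return pitches
```

Quantising to beats is natural for walking bass lines, which typically place one note per beat.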
This paper deals with the automatic transcription of solo bass guitar recordings with an additional estimation of playing techniques and fretboard positions used by the musician. Our goal is to first develop a system for a robust estimation of the note parameters pitch, onset, and duration (score-level parameters). As a second step, we aim to autom...
Both the collection and analysis of large music repertoires constitute major challenges within musicological disciplines such as jazz research. Automatic methods of music analysis based on audio signal processing have the potential to assist researchers and to accelerate the transcription and analysis of music recordings significantly. In this pape...
The audio mixing process is an art that has proven to be extremely hard to model: What makes a certain mix better than another one? How can the mixing processing chain be automatically optimized to obtain better results in a more efficient manner? Over the last years, the scientific community has exploited methods from signal processing, music info...
The metaphor of storytelling is widespread among jazz performers and jazz researchers. However, little is known about the precise meaning of this metaphor on an analytical level. The present paper attempts to shed light on the connected semantic field of the metaphor and relate it to its musical basis by investigating time courses of selected music...
We present a novel approach to the analysis of jazz solos based on the categorisation and annotation of musical units on a middle level between single notes and larger form parts. A guideline during development was the hypothesis that these midlevel units (MLU) correspond to the improvising musicians’ playing ideas and action plans. A system of cat...
Tone input device having a tone signal input, a tone signal output and a sound classifier connected to the tone signal input for receiving a tone signal incoming at the tone signal input and for analyzing the tone signal for identifying, within the tone signal, one or several tone signal passages corresponding to at least one condition. Further, th...
The paper presents new approaches for analyzing the characteristics of intonation and pitch modulation of woodwind and brass solos in jazz recordings. To this end, we use score-informed analysis techniques for source separation and fundamental frequency tracking. After splitting the audio into a solo and a backing track, a reference tuning frequenc...
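Estimating a reference tuning offset from tracked fundamental frequencies can be sketched in a few lines: map each f0 to fractional MIDI pitch relative to an assumed A4 reference and take the median deviation from the nearest equal-tempered pitch, in cents. This is a generic illustration of the idea, not the paper's specific procedure.

```python
import numpy as np

def tuning_offset_cents(f0_hz, ref=440.0):
    """Median deviation (in cents) of f0 estimates from the nearest
    equal-tempered pitch, relative to a reference tuning frequency."""
    f0 = np.asarray(f0_hz, dtype=float)
    midi = 69 + 12 * np.log2(f0 / ref)       # fractional MIDI pitch
    dev = (midi - np.round(midi)) * 100      # semitone fraction -> cents
    return float(np.median(dev))
```

The median makes the estimate robust against octave errors and vibrato excursions in the f0 track; a consistent positive offset indicates the ensemble is tuned sharp of the reference.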
In this paper, we aim at analyzing the use of dynamics in jazz improvisation by applying score-informed source separation and automatic estimation of note intensities. A set of 120 jazz solos taken from the Weimar Jazz Database covering many different jazz styles was manually transcribed and annotated by musicology and jazz students within the Jazz...
In this paper, we focus on the automatic classification of jazz records. We propose a novel approach where we break down the ambiguous task, which is commonly referred to as genre classification, into three more specific semantic levels. First, the rhythm feel (swing, latin, funk, two-beat) characterizes the basic groove organization and most often...
In this paper, we propose an instrument-centered bass guitar transcription algorithm. Instead of aiming at a general-purpose bass transcription algorithm, we incorporate knowledge about the instrument construction and typical playing techniques of the electric bass guitar. In addition to the commonly extracted score-level parameters note onset, off...
Instrument recognition is an important task in music information retrieval (MIR). Whereas the recognition of musical instruments in monophonic recordings has been studied widely, the polyphonic case still is far from being solved. A new approach towards feature-based instrument recognition is presented that makes use of redundancies in the harmonic...
Over the past years, the detection of onset times of acoustic events has been investigated in various publications. However, to our knowledge, there is no research on event detection on a broader scale. In this paper, we introduce a method to automatically detect "big" events in music pieces in order to match them with events in videos. Furthermore...
In this paper, we present a novel approach to real-time detection of the string number and fretboard position from polyphonic guitar recordings. Our goal is to assess whether a music student is correctly performing guitar exercises presented via music education software or by a remote guitar teacher. We combine a state-of-the-art approach for multi-pitch...
In this paper, we propose a novel approach for music similarity estimation. It combines temporal segmentation of music signals with source separation into so-called tone objects. We solely use the timbre-related audio features Mel-Frequency Cepstral Coefficients (MFCC) and Octave-based Spectral Contrast (OSC) to describe the extracted tone objects....
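The Octave-based Spectral Contrast feature mentioned above captures, per octave band, the spread between spectral peaks and valleys. Below is a deliberately crude, self-contained approximation of that idea (single FFT, max-minus-min of log magnitudes per band, 100 Hz lower edge assumed) together with a cosine similarity for comparing the resulting timbre vectors; the published OSC definition uses sorted sub-band percentiles rather than this simplification.

```python
import numpy as np

def octave_contrast(signal, sr, n_bands=6, f_lo=100.0):
    """Toy octave-band spectral contrast of a mono signal.

    For each octave band starting at f_lo, return the difference
    between the strongest and weakest log-magnitude FFT bins.
    """
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)
    feats, lo = [], f_lo
    for _ in range(n_bands):
        band = spec[(freqs >= lo) & (freqs < 2 * lo)]
        logb = np.log1p(band)
        feats.append(float(logb.max() - logb.min()) if band.size else 0.0)
        lo *= 2
    return np.array(feats)

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

In a tone-object pipeline, each segmented object would get such a descriptor (alongside MFCCs), and object-to-object similarities would be aggregated into a track-level distance.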
In this paper, we present a comparative study of three different classification paradigms for genre classification based on repetitive basslines. In spite of a large variety in terms of instrumentation, a bass instrument can be found in most music genres. Thus, the bass track can be analysed to explore stylistic similarities between music genres. W...
In this paper, we present a feature-based approach to automatically estimate the string number in recordings of the bass guitar and the electric guitar. We perform different experiments to evaluate the classification performance on isolated note recordings. First, we analyze how factors such as the instrument, the playing style, and the pick-up se...
In this paper, we present a novel audio synthesis model that allows us to simulate bass guitar tones with 11 different playing techniques to choose from. In contrast, previous approaches focussing on bass guitar synthesis only implemented the two slap techniques. We apply a digital waveguide model extended by different modular parts to imitate the...
This paper addresses the use of Music Information Retrieval (MIR) techniques in music education and their integration in learning software. A general overview of systems that are either commercially available or in the research stage is presented. Furthermore, three well-known MIR methods used in music learning systems and their state-of-the-art are d...