About
60
Publications
20,492
Reads
702
Citations
Introduction
Hendrik Purwins currently works at the Faculty of Engineering Sciences, Department of Architecture, Design and Media Technology, Aalborg University.
Additional affiliations
August 2005 - April 2012
Publications (60)
We present a preliminary study on an end-to-end variational autoencoder (VAE) for sound morphing. Two VAE variants are compared: VAE with dilation layers (DC-VAE) and VAE only with regular convolutional layers (CC-VAE). We combine the following loss functions: 1) the time-domain mean-squared error for reconstructing the input signal, 2) the Kullbac...
This paper presents a user interface for the exploration of music libraries based on parametric t-SNE. Each song in the music library is represented as a stack of 34-dimensional vectors containing features related to genres, emotions and other musical characteristics. Parametric t-SNE is used to construct a model that extracts a pair of coordinates...
Learning to navigate in 3D environments from raw sensory input is an important step towards bridging the gap between human players and artificial intelligence in digital games. Recent advances in deep reinforcement learning have seen success in teaching agents to play Atari 2600 games from raw pixel information where the environment is always fully...
End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram tran...
An application for ballet training is presented that monitors, in real time, the deviation of the student's posture (straightness of the spine and rotation of the pelvis) from the ideal position. The human skeletal data is acquired through a Microsoft Kinect v2. The movement of the student is mirrored through an abstract skeletal figure and instructions are...
After having revolutionized image and speech processing, convolutional neural networks (CNN) are now starting to become more and more successful in music information retrieval as well. We compare four CNN types for classifying a dataset of more than 3000 acoustic and synthesized samples of the most prominent drum set instruments (bass, snare, hi-ha...
One of the recent developments in teaching that heavily relies on current technology is the “flipped classroom” approach. In a flipped classroom the traditional lecture and homework sessions are inverted. Students are provided with online material in order to gain necessary knowledge before class, while class time is devoted to clarifications and a...
Guided by the idea that musical human-computer interaction may become more effective, intuitive, and creative when basing its computer part on cognitively more plausible learning principles, we employ unsupervised incremental online learning (i.e. clustering) to build a system that predicts the next event in a musical sequence, given as audio input...
When listening to ensemble music, even nonmusicians can follow single instruments effortlessly. Electrophysiological indices for neural sensory encoding of separate streams have been described using oddball paradigms that utilize brain reactions to sound events that deviate from a repeating standard pattern. Obviously, these paradigms put constrain...
A system is presented that segments, clusters and predicts musical audio in an unsupervised manner, adjusting the number of (timbre) clusters instantaneously to the audio input. A sequence learning algorithm adapts its structure to a dynamically changing clustering tree. The flow of the system is as follows: 1) segmentation by onset detection, 2) t...
Objective. Polyphonic music (music consisting of several instruments playing in parallel) is an intuitive way of embedding multiple information streams. The different instruments in a musical piece form concurrent information streams that seamlessly integrate into a coherent and hedonistically appealing entity. Here, we explore polyphonic music as...
The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (virtual metrology). In this paper, a survey on regression methods is given to predict average silicon nitride cap layer thickness for the pl...
A framework is proposed for generating interesting, musically similar variations of a given monophonic melody. The focus is on pop/rock guitar and bass guitar melodies with the aim of eventual extensions to other instruments and musical styles. It is demonstrated here how learning musical style from segmented audio data can be formulated as an unsu...
Byzantine Chant performance practice is computationally compared to the Chrysanthine theory of the eight Byzantine Tones (octoechos). Intonation, steps, and prominence of scale degrees are quantified, based on pitch class profiles. The novel procedure introduced here comprises the following analysis steps: (1) the pitch trajectory is extracted and...
A system is presented that generates a sound sequence from an original audio chord sequence, having the following characteristics: The generation can be arbitrarily long, preserves certain musical characteristics of the original and has a reasonable degree of interestingness. The procedure comprises the following steps: 1) chord segmentation by ons...
Studies of Gaver (W. W. Gaver, “How do we hear in the world? Explorations in ecological acoustics,” Ecological Psychology, 1993) revealed that humans categorize everyday sounds considering the processes that have generated them: He defined these categories in a taxonomy according to the aggregate states of the involved materials (solid, liquid, gas...
In this paper we introduce a framework that represents environmental texture sounds as a linear superposition of independent foreground and background layers that roughly correspond to entities in the physical production of the sound. Sound samples are decomposed into a sparse representation with the matching pursuit algorithm and a dictionary o...
Up to now, there has only been little work on using features from temporal approximations of signals for audio recognition. Time-frequency tradeoffs are an important issue in signal processing; sparse representations using overcomplete dictionaries may (or may not, depending on the dictionary) have more time-frequency flexibility than standard shor...
Different approaches for the prediction of average Silicon Nitride cap layer thickness for the Plasma Enhanced Chemical Vapor Deposition (PECVD) dual-layer metal passivation stack process are compared, based on metrology and production equipment Fault Detection and Classification (FDC) data. Various sets of FDC parameters are processed by different...
A causal system to represent a stream of music into musical events, and to generate further expected events, is presented. Starting from an auditory front-end which extracts low-level (i.e. MFCC) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing musical st...
A system is presented that learns the structure of an audio recording of a rhythmical percussion fragment in an unsupervised manner and that synthesizes musical variations from it. The procedure consists of 1) segmentation, 2) symbolization (feature extraction, clustering, sequence structure analysis, temporal alignment), and 3) synthesis. The symb...
Although rare in the sound recognition literature, previous work using features derived from a sparse temporal representation has led to some success [8, 2, 9]. A great advantage of deriving features from a temporal representation is that such an approach does not face the trade-off problem between time and frequency resolution. Here, we present a...
Query-by-Humming (QBH) is an increasingly prominent technology that allows users to browse through a song database by singing/humming a part of the song they wish to retrieve. Besides these cases, QBH can also be used to track the performance of a user in applications such as Score Alignment and Real-Time Accompaniment. In this paper we present an...
Query-by-Humming (QBH) is an increasingly popular technology that allows users to browse through a song database by singing/humming a part of the song they wish to retrieve. Besides these cases, QBH can be also used in applications such as Score Alignment and Real-Time Accompaniment, to locate the exact position of the soloist within the piece. We...
In this paper we describe a new parametric model for synthesizing environmental sound textures, such as running water, rain, and fire. Sound texture analysis is cast in the framework of wavelet decomposition and multiresolution statistical models that have previously found application in image texture analysis and synthesis. We stochastically sam...
In this paper we present a system that learns rhythmic patterns from drum audio recording and synthesizes music variations from the learnt sequence. The procedure described is completely unsupervised and embodies the transcription of a percussion sequence into a fuzzy multilevel representation. Moreover, a tempo estimation procedure identifying the...
Measures such as entropy and mutual information can be used to characterise random processes. In this paper, we propose the use of several time-varying information measures, computed in the context of a probabilistic model that evolves as a sample of ...
A causal system to represent a stream of music into musical events, and to generate further expected events, is presented. Starting from an auditory front-end that extracts low-level (i.e. MFCC) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing musical str...
We describe a biophysically motivated model of auditory salience based on a model of cortical responses and present results that show that the derived measure of salience can be used to identify the position of perceptual onsets in a musical stimulus successfully. The salience measure is also shown to be useful to track beats and predict rhythmic s...
We present a review on perception and cognition models designed for or applicable to music. An emphasis is put on computational implementations. We include findings from different disciplines: neuroscience, psychology, cognitive science, artificial intelligence, and musicology. The article summarizes the methodology that these disciplines use to ap...
In Part I [Purwins H, Herrera P, Grachten M, Hazan A, Marxer R, Serra X. Computational models of music perception and cognition I: The perceptual and cognitive processing chain. Physics of Life Reviews 2008, in press, doi:10.1016/j.plrev.2008.03.004], we addressed the study of cognitive processes that underlie auditory perception of music, and thei...
A causal system for representing a musical stream and generating further expected events is presented. Starting from an auditory front-end which extracts low-level (e.g., spectral shape, mel frequency cepstral coefficients) and midlevel features such as onsets and beats, an unsupervised categorisation process builds and maintains a set of symbols a...
We introduce a generic model of emergence of musical categories during the listening process. The model is based on a preprocessing and a categorization module. Preprocessing results in a perceptually plausible representation of music events extracted from symbolic input. The categorization module lets a taxonomy of musical entities emerge accordin...
We present a system to produce expectations based on the observation of a rhythmic music signal at a constant tempo. The algorithms we use are causal, in order to fit cognitive constraints more closely and to allow a future real-time implementation. In a first step, an acoustic front-end based on the aubio library extracts onsets and beats from the inco...
In this study, the classification performance of two machine learning methods and two sound representation schemes is compared, with a focus on short, impact-like sounds: footsteps have been classified according to the material of the floor and the shoe type. The gammatone auditory filter bank is a spectral analyser that converts a given signal in...
We describe a biophysically motivated model of auditory salience and present results which show that the derived measure of salience can be used to successfully identify the position of perceptual onsets in a musical stimulus. We evaluate the method using a corpus of unaccompanied freely sung stimuli. We briefly show that perceptual onsets dete...
In the recognition and classification of sounds, extracting perceptually and biologically relevant features yields much better results than the standard low-level methods (e.g. zero-crossings, roll-off, centroid, energy, etc.). Gamma-tone filters are biologically relevant, as they simulate the motion of the basilar membrane. The representation techniq...
The relations among the 24 major and minor keys can be represented in a doubly circular fashion as a torus. We show the convergence of the derivations on the descriptive levels of a) the psychoacoustic experiment, b) geometric music theory, and c) the computer simulation of music listening. Generalizing Shepard (1964), circular pitch percep...
We apply correspondence analysis for visualization of interdependence of pitch class & key and key & composer. A co-occurrence matrix of key & pitch class frequencies is extracted from score (Bach's WTC). Keys are represented as high-dimensional pitch class vectors. Correspondence analysis then projects keys on a planar “keyscape”. Vice versa, on “...
In this paper the ingredients of computing auditory perception are reviewed. On the basic level there is neurophysiology, which is abstracted to artificial neural nets (ANNs) and enhanced by statistics to machine learning. There are high-level cognitive models derived from psychoacoustics (especially Gestalt principles). The gap between neuroscienc...
In the experiment we demonstrate the paradoxical perception of pitch. A tone sequence demonstrates this paradox: the sequence is perceived as descending, although the first and last tones are identical. The intransitivity of pitch perception generalizes (Shepard 1964). It also applies to harmonic complex tones with varying amplitude envelope and pa...
Cq-profiles are 12-dimensional vectors, each component referring to a pitch class. They can be employed to represent keys. Cq-profiles are calculated with the constant Q filter bank [4]. They have the following advantages: (i) they correspond to probe tone ratings. (ii) Calculation is possible in real-time. (iii) Stability is obtained with respect...
Cq-profiles are 12-dimensional vectors, each component referring to a pitch class. They can be employed to represent keys. Cq-profiles are calculated with the constant Q filter bank. They have the following advantages: 1) they correspond to probe tone ratings; 2) calculation is possible in real-time; 3) stability is obtained with respect to sound q...
IRCAM internal reference: Susini06c
Key finding in audio is based on the constant Q transform. A heuristic is suggested for compressing the constant Q transform into a 12-dimensional short-term pitch class profile. Short-term profiles are weighted by a cosine window and summed up yielding long-term profiles. The latter are matched against averaged major and minor prototype profil...
Cq-profiles are 12-dimensional vectors, each component referring to a pitch class. They can be employed to represent keys. Cq-profiles are calculated with the constant Q filter bank (4). They have the following advantages: (i) They correspond to probe tone ratings. (ii) Calculation is possible in real-time. (iii) Stability is obtained with respect to sou...
The abrupt change of loudness is a salient event that is not always expected by a music listener. Therefore loudness is an important cue when looking for events in a music stream that could violate human expectations. The concept of expectation and surprise in music has recently become the subject of extensive research, however mostly using symboli...
A causal system for representing a musical stream and generating further expected events is presented. Starting from an auditory front-end which extracts low-level (e.g. spectral shape, MFCC, pitch) and mid-level features such as onsets and beats, an unsupervised clustering process builds and maintains a set of symbols aimed at representing musical...
The doubly circular relations of the major and minor keys based on all twelve pitch-classes can be depicted in toroidal models. We demonstrate a convergence of derivations from the different bases of conventional harmonic theory and recent experiments in music psychology. We present a formalization of the music-theoretical derivation from Gottfri...
A system is introduced that learns the structure of an audio recording of a rhythmical percussion fragment in an unsupervised manner and synthesizes musical variations from it. The procedure consists of 1) segmentation, 2) symbolization (feature extraction, clustering, sequence structure analysis, temporal alignment), and 3) synthesis. The symbol...
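As a rough illustration of the key-finding abstract listed above (constant Q frames compressed into 12-dimensional pitch class profiles, weighted by a cosine window, summed, and matched against major and minor prototypes), the following is a minimal sketch and not the authors' implementation. Several things are assumptions made for the example: the input is a list of constant-Q magnitude frames with 12 bins per octave whose first bin is pitch class C; the window is a raised cosine; the match uses Pearson correlation; and `MAJOR_TEMPLATE` and `MINOR_TEMPLATE` are hypothetical placeholder weights, not the averaged prototype profiles from the paper. All function names are invented for the sketch.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical placeholder templates (NOT the prototypes from the paper):
# tonic weighted 3, third and fifth 2, remaining scale tones 1, others 0.
MAJOR_TEMPLATE = [3, 0, 1, 0, 2, 1, 0, 2, 0, 1, 0, 1]
MINOR_TEMPLATE = [3, 0, 1, 2, 0, 1, 0, 2, 1, 0, 1, 0]


def fold_to_pitch_classes(cq_frame):
    """Sum constant-Q magnitudes across octaves into a 12-dim profile.

    Assumes 12 bins per octave and that bin 0 corresponds to pitch class C.
    """
    profile = [0.0] * 12
    for k, magnitude in enumerate(cq_frame):
        profile[k % 12] += magnitude
    return profile


def long_term_profile(cq_frames):
    """Weight short-term profiles by a raised-cosine window and sum them."""
    n = len(cq_frames)
    total = [0.0] * 12
    for t, frame in enumerate(cq_frames):
        # Assumed window shape; strictly positive for every frame.
        weight = 0.5 - 0.5 * math.cos(2 * math.pi * (t + 1) / (n + 1))
        short_term = fold_to_pitch_classes(frame)
        for pc in range(12):
            total[pc] += weight * short_term[pc]
    return total


def correlation(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                     * sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm > 0 else 0.0


def estimate_key(cq_frames):
    """Return (tonic, mode) of the best-matching rotated template."""
    profile = long_term_profile(cq_frames)
    best = None
    for mode, template in (("major", MAJOR_TEMPLATE), ("minor", MINOR_TEMPLATE)):
        for tonic in range(12):
            # Rotate the template so its tonic weight lands on `tonic`.
            rotated = [template[(pc - tonic) % 12] for pc in range(12)]
            score = correlation(profile, rotated)
            if best is None or score > best[0]:
                best = (score, PITCH_CLASSES[tonic], mode)
    return best[1], best[2]
```

In practice the frames would come from a constant Q analysis of real audio, and the placeholder templates would be replaced by prototype profiles estimated from data, as the abstract describes.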