György Fazekas

  • Queen Mary University of London

About

139 Publications
33,280 Reads
2,022 Citations
Current institution
Queen Mary University of London
Additional affiliations
September 2007 - present
Queen Mary University of London
Position
  • Lecturer

Publications

Preprint
Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task tha...
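The permutation symmetry described in this abstract can be illustrated with a toy additive synthesizer (a minimal sketch under invented parameters, not the paper's setup): summing two oscillators is commutative, so swapping their frequency settings yields a bit-identical signal, and the sound-to-parameter inversion has no unique answer.

```python
import math

def two_osc(f1, f2, n=64):
    """Toy synthesizer: the sum of two sinusoidal oscillators."""
    return [math.sin(2 * math.pi * f1 * t / n) + math.sin(2 * math.pi * f2 * t / n)
            for t in range(n)]

# Swapped parameter configurations produce the identical output signal.
print(two_osc(3.0, 7.0) == two_osc(7.0, 3.0))  # True
```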
Preprint
Full-text available
Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and the reference. However, this method treats all possible configurations equal...
Preprint
The recent surge in the popularity of diffusion models for image synthesis has attracted new attention to their potential for generation tasks in other domains. However, their applications to symbolic music generation remain largely under-explored because symbolic music is typically represented as sequences of discrete events and standard diffusion...
Preprint
Full-text available
This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Voca...
Preprint
Full-text available
Contrastive learning has proven effective in self-supervised musical representation learning, particularly for Music Information Retrieval (MIR) tasks. However, reliance on augmentation chains for contrastive view generation and the resulting learnt invariances pose challenges when different downstream tasks require sensitivity to certain musical a...
Article
Full-text available
This paper introduces GlOttal‑flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and LPC filters to simulate the vocal tract, resulting in an interpretable and effici...
Preprint
Full-text available
This paper presents Tidal-MerzA, a novel system designed for collaborative performances between humans and a machine agent in the context of live coding, specifically focusing on the generation of musical patterns. Tidal-MerzA fuses two foundational models: ALCAA (Affective Live Coding Autonomous Agent) and Tidal Fuzz, a computational framework. By...
Preprint
Full-text available
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multi...
Preprint
Full-text available
Despite the success of contrastive learning in Music Information Retrieval, the inherent ambiguity of contrastive self-supervision presents a challenge. Relying solely on augmentation chains and self-supervised positive sampling strategies can lead to a pretraining objective that does not capture key musical information for downstream tasks. We int...
Preprint
Full-text available
Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep learning frameworks is slow due to its recursive formulation. In addition, frame-wise approximation as an acceleration method cannot generalise well to test time conditions where the LP is computed sample-wise. Efficient differentiable sample-wise LP for end-...
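The slow recursion in question is the all-pole LP synthesis filter, in which every output sample depends on earlier outputs and therefore cannot be parallelised naively across time. A pure-Python sketch with made-up coefficients (not the paper's implementation):

```python
def lp_synthesis(excitation, a):
    """Sample-wise all-pole filter: y[n] = x[n] - sum_k a[k] * y[n-1-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a_k in enumerate(a):
            if n - 1 - k >= 0:
                acc -= a_k * y[n - 1 - k]
        y.append(acc)
    return y

# A unit impulse through a one-tap filter decays geometrically.
print(lp_synthesis([1.0, 0.0, 0.0, 0.0], [-0.9]))
```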
Preprint
Full-text available
Effective music mixing requires technical and creative finesse, but clear communication with the client is crucial. The mixing engineer must grasp the client's expectations and preferences, and collaborate to achieve the desired sound. The tacit agreement for the desired sound of the mix is often established using guides like reference songs and d...
Preprint
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesi...
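The family of techniques this survey covers rests on one idea: gradients of a loss flow through a signal processor to its parameters. That idea can be shown without any framework using a one-parameter processor (a single gain) whose gradient is written out by hand; the signal and target values here are purely illustrative, standing in for what autodiff frameworks compute automatically.

```python
x = [1.0, -2.0, 0.5, 3.0]            # input signal
y_ref = [0.5 * s for s in x]         # reference output from a "target" gain of 0.5

g, lr = 2.0, 0.05                    # initial gain guess and learning rate
for _ in range(200):
    # loss = mean((g*x - y_ref)^2); its derivative w.r.t. g, by hand:
    grad = sum(2 * (g * s - r) * s for s, r in zip(x, y_ref)) / len(x)
    g -= lr * grad                   # gradient descent through the "processor"
print(round(g, 4))                   # recovers the target gain, 0.5
```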
Conference Paper
Full-text available
Music can convey ideological stances, and gender is just one of them. Evidence from musicology and psychology research shows that gender-loaded messages can be reliably encoded and decoded via musical sounds. However, much of this evidence comes from examining music in isolation, while studies of the gendering of music within multimodal communicati...
Preprint
Full-text available
This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and effici...
Preprint
We discuss the discontinuities that arise when mapping unordered objects to neural network outputs of fixed permutation, referred to as the responsibility problem. Prior work has proved the existence of the issue by identifying a single discontinuity. Here, we show that discontinuities under such models are uncountably infinite, motivating further...
Chapter
This paper introduces a prototype of SketchSynth, a system that enables users to graphically control synthesis using sketches of cross-modal associations between sound and shape. The development is motivated by finding alternatives to technical synthesiser controls to enable a more intuitive realisation of sound ideas. There is strong evidence that...
Preprint
Full-text available
Musical professionals who produce material for non-musical stakeholders often face communication challenges in the early ideation stage. Expressing musical ideas can be difficult, especially when domain-specific vocabulary is lacking. This position paper proposes the use of artificial intelligence to facilitate communication between stakeholders an...
Preprint
Full-text available
Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received li...
Preprint
Full-text available
Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-...
Preprint
Full-text available
Sinusoidal parameter estimation is a fundamental task in applications from spectral analysis to time-series forecasting. Estimating the sinusoidal frequency parameter by gradient descent is, however, often impossible as the error function is non-convex and densely populated with local minima. The growing family of differentiable signal processing m...
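The non-convexity claim is easy to check numerically: sampling the mean squared error between a fixed target sinusoid and a candidate sinusoid over a grid of candidate frequencies exposes many strict local minima around the single global one (toy frequencies and grid resolution, chosen for illustration).

```python
import math

def mse(f_est, f_true, n=256):
    """MSE between a candidate and a target sinusoid of n samples."""
    return sum((math.sin(2 * math.pi * f_est * t / n)
                - math.sin(2 * math.pi * f_true * t / n)) ** 2
               for t in range(n)) / n

f_true = 40.0
grid = [i / 10 for i in range(10, 1001)]          # candidate frequencies 1.0 .. 100.0
losses = [mse(f, f_true) for f in grid]
minima = [grid[i] for i in range(1, len(grid) - 1)
          if losses[i] < losses[i - 1] and losses[i] < losses[i + 1]]
print(len(minima))  # far more than one, so plain gradient descent gets trapped
```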
Article
Full-text available
Jazz is a musical tradition that is just over 100 years old; unlike in other Western musical traditions, improvisation plays a central role in jazz. Modelling the domain of jazz poses some ontological challenges due to specificities in musical content and performance practice, such as band lineup fluidity and importance of short melodic patterns fo...
Preprint
Full-text available
As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information Retrieval. In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain. To this en...
Preprint
Disembodied electronic sounds constitute a large part of the modern auditory lexicon, but research into timbre perception has focused mostly on the tones of conventional acoustic musical instruments. It is unclear whether insights from these studies generalise to electronic sounds, nor is it obvious how these relate to the creation of such sounds....
Preprint
Full-text available
Sound-shape associations, a subset of cross-modal associations between the auditory and visual domain, have been studied mainly in the context of matching a set of purposefully crafted shapes to sounds. Recent studies have explored how humans represent sound through free-form sketching and how a graphical sketch input could be used for sound produc...
Article
Disembodied electronic sounds constitute a large part of the modern auditory lexicon, but research into timbre perception has focused mostly on the tones of conventional acoustic musical instruments. It is unclear whether insights from these studies generalize to electronic sounds, nor is it obvious how these relate to the creation of such sounds....
Article
Full-text available
Smart Musical Instruments (SMIs) are an emerging category of musical instruments that belong to the wider class of Musical Things within the Internet of Musical Things paradigm. SMIs encompass sensors, actuators, embedded intelligence, and wireless connectivity to local networks and to the Internet. Interoperability represents a key issue with...
Preprint
Full-text available
The main challenges of Optical Music Recognition (OMR) come from the nature of written music, its complexity and the difficulty of finding an appropriate data representation. This paper provides a first look at DoReMi, an OMR dataset that addresses these challenges, and a baseline object detection model to assess its utility. Researchers often appr...
Preprint
Full-text available
Sound synthesiser controls typically correspond to technical parameters of signal processing algorithms rather than intuitive sound descriptors that relate to human perception of sound. This makes it difficult to realise sound ideas in a straightforward way. Cross-modal mappings, for example between gestures and sound, have been suggested as a more...
Preprint
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear t...
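The shaping mechanism the abstract describes, small networks with periodic activations acting as waveshapers, can be caricatured with a fixed-weight toy version (the weights here are invented; the real NEWT learns them and distributes its perceptrons across time).

```python
import math

def toy_shaper(x, w1=2.5, w2=1.3, b=0.1):
    """One hidden unit with a periodic (sine) activation and a tanh output."""
    h = math.sin(w1 * x + b)
    return math.tanh(w2 * h)

# Shape one cycle of a sine wave, sample by sample (fully causal).
shaped = [toy_shaper(math.sin(2 * math.pi * t / 32)) for t in range(32)]
print(max(abs(s) for s in shaped) <= 1.0)  # tanh keeps the output bounded
```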
Preprint
Convolutional Neural Networks have been extensively explored in the task of automatic music tagging. The problem can be approached by using either engineered time-frequency features or raw audio as input. Modulation filter bank representations that have been actively researched as a basis for timbre perception have the potential to facilitate the e...
Preprint
Full-text available
Detecting piano pedalling techniques in polyphonic music remains a challenging task in music information retrieval. While other piano-related tasks, such as pitch estimation and onset detection, have seen improvement through applying deep learning methods, little work has been done to develop deep learning models to detect playing techniques. In th...
Article
Full-text available
The recent introduction of Deep Learning has led to a vast array of breakthroughs in many fields of science and engineering [...]
Article
Full-text available
Music has been shown to be capable of improving runners’ performance in treadmill and laboratory-based experiments. This paper evaluates a generative music system, namely HEARTBEATS, designed to create biosignal synchronous music in real-time according to an individual athlete’s heartrate or cadence (steps per minute). The tempo, melody, and timbra...
Preprint
Full-text available
Optical Music Recognition (OMR) is concerned with transcribing sheet music into a machine-readable format. The transcribed copy should allow musicians to compose, play and edit music by taking a picture of a music sheet. Complete transcription of sheet music would also enable more efficient archival. OMR facilitates examining sheet music statistica...
Article
Full-text available
Large online music databases under Creative Commons licenses are rarely recorded by well-known artists, therefore conventional metadata-based search is insufficient in their adaptation to instrument players' needs. The emerging class of smart musical instruments (SMIs) can address this challenge. Thanks to direct internet connectivity and embedded...
Preprint
In recent years, Markov logic networks (MLNs) have been proposed as a potentially useful paradigm for music signal analysis. Because all hidden Markov models can be reformulated as MLNs, the latter can provide an all-encompassing framework that reuses and extends previous work in the field. However, just because it is theoretically possible to refo...
Article
Full-text available
The Internet of Musical Things (IoMusT) is an emerging research area consisting of the extension of the Internet of Things paradigm to the music domain. Interoperability represents a central issue within this domain, where heterogeneous objects dedicated to the production and/or reception of musical content (Musical Things) are envisioned to commun...
Conference Paper
Full-text available
It is not uncommon to hear musicians and audio engineers speak of warmth and brightness when describing analog technologies such as vintage mixing consoles, multitrack tape machines, and valve compressors. What is perhaps less common, is hearing this term used in association with retro digital technology. A question exists as to how much the low bi...
Chapter
Full-text available
With the advent of online audio resources and web technologies, digital tools for sound designers and music producers are changing. The Internet provides access to hundreds of thousands of digital audio files, from human- and nature-related environmental sounds, instrument samples and sound effects, to produced songs ready to use in media productio...
Preprint
In this paper, a siamese DNN model is proposed to learn the characteristics of the audio dynamic range compressor (DRC). This facilitates an intelligent control system that uses audio examples to configure the DRC, a widely used non-linear audio signal conditioning technique in the areas of music production, speech communication and broadcasting. S...
Conference Paper
Full-text available
The Internet of Musical Things is an emerging research area that relates to the network of Musical Things, which are computing devices embedded in physical objects dedicated to the production and/or reception of musical content. In this paper we propose a semantically-enriched Internet of Musical Things architecture which relies on a semantic audio...
Conference Paper
Full-text available
Semantic Web technologies are increasingly used in the Internet of Things due to their intrinsic propensity to foster interoperability among heterogenous devices and services. However, some of the IoT application domains have strict requirements in terms of timeliness of the exchanged messages, latency and support for constrained devices. An exampl...
Conference Paper
Dynamic range compressors (DRC) are one of the most commonly used audio effects in music production. The timing settings are particularly important for controlling the manner in which they will shape an audio signal. We present a subjective user study of DRC, where a series of different compressor attack and release settings are varied and applied to...
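Attack and release act through the level detector's ballistics: a one-pole smoother whose coefficient switches depending on whether the level is rising or falling. A generic textbook-style sketch with assumed time constants (not the exact compressors from the study):

```python
import math

def ballistics(levels, fs=48000, attack_ms=5.0, release_ms=100.0):
    """One-pole attack/release smoothing of a level signal."""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    y, out = 0.0, []
    for x in levels:
        a = a_att if x > y else a_rel     # rising -> attack, falling -> release
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out

env = ballistics([1.0] * 10 + [0.0] * 10)
print(env[0] < env[9] and env[15] < env[9])  # rises during attack, decays on release
```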
Conference Paper
Playsound is a simple and intuitive web-based tool for music composition based on sounds from Freesound, an online repository of diverse audio content with Creative Commons licenses. In this paper, we present an approach based on Semantic Web technologies to provide recommendations to Playsound users. A Semantic Web of Things architecture is outlin...
Conference Paper
Full-text available
A common problem in music education is finding varied and engaging material that is suitable for practising a specific musical concept or technique. At the same time, a number of large music collections are available under a Creative Commons (CC) licence (e.g. Jamendo, ccMixter), but their potential is largely untapped because of the relative obscu...
Chapter
Multiple online services host repositories of audio clips of different kinds, ranging from music tracks, albums, playlists, to instrument samples and loops, to a variety of recorded or synthesized sounds. Programmatic access to these resources may be used by client applications for tasks ranging from customized musical listening and exploration, to...
Conference Paper
Full-text available
Nowadays, a number of online music databases are available under Creative Commons licenses (e.g. Jamendo, ccMixter). Typically, it is possible to navigate and play their content through search interfaces based on metadata and file-wide tags. However, because this music is largely unknown, additional methods of discovery need to be explored. In this...
Article
This paper presents a study of piano pedaling gestures and techniques on the sustain pedal from the perspective of measurement, recognition, and visualization. Pedaling gestures can be captured by a dedicated measurement system where the sensor data can be simultaneously recorded alongside the piano sound under normal playing conditions. Using the...
Conference Paper
Full-text available
The recent increase in the accessibility and size of personal and crowd-sourced digital sound collections brought about a valuable resource for music creation. Finding and retrieving relevant sounds in performance leads to challenges that can be approached using music information retrieval (MIR). In this paper, we explore the use of MIR to retrieve...
Article
Full-text available
Deep neural networks (DNNs) have been successfully applied to music classification including music tagging. However, there are several open questions regarding the training, evaluation, and analysis of DNNs. In this paper, we investigate specific aspects of neural networks, the effects of noisy labels, to deepen our understanding of their propertie...
Conference Paper
We describe the publication of a linked data set exposing metadata from the Internet Archive Live Music Archive along with detailed feature analysis data of the audio files contained in the archive. The collection is linked to existing musical and geographical resources allowing for the extraction of useful or interesting subsets of data using addit...
Article
Full-text available
Following their success in Computer Vision and other areas, deep learning techniques have recently become widely adopted in Music Information Retrieval (MIR) research. However, the majority of works aim to adopt and assess methods that have been shown to be effective in other domains, while there is still a great need for more original research foc...
Conference Paper
Full-text available
Practical experience with audio effects as well as knowledge of their parameters and how they change the sound is crucial when controlling digital audio effects. This often presents barriers for musicians and casual users in the application of effects. These users are more accustomed to describing the desired sound verbally or using examples, rathe...
Conference Paper
This paper presents a study of piano pedalling technique recognition on the sustain pedal utilising gesture data that is collected using a novel measurement system. The recognition is comprised of two separate tasks: onset/offset detection and classification. The onset and offset time of each pedalling technique was computed through signal processi...
Conference Paper
As many cultural institutions are publishing digital heritage material on the web, a new type of user has emerged: one who casually interacts with the art collection in their free time, driven by intrinsic curiosity more than by professional duty or an informational goal. Can choices in how the interaction with data is structured increase engagement o...
Article
Full-text available
In this paper, we present a transfer learning approach for music classification and regression tasks. We propose to use a pretrained convnet feature, a concatenated feature vector using activations of feature maps of multiple layers in a trained convolutional network. We show how this convnet feature can serve as a general-purpose music repres...
Conference Paper
Creating an ecosystem that will tie together the content, technologies and tools in the field of digital music and audio is possible if all the entities of the ecosystem share the same vocabulary and high quality metadata. Creation of such metadata will allow the creative industries to retrieve and reuse the content of Creative Commons audio in inn...
Conference Paper
This paper presents MusicWeb, a novel platform for music discovery by linking music artists within a web-based application. MusicWeb provides a browsing experience using connections that are either extra-musical or tangential to music, such as the artists’ political affiliation or social influence, or intra-musical, such as the artists’ main instru...
Conference Paper
Feature extraction algorithms in Music Informatics aim at deriving statistical and semantic information directly from audio signals. These may range from energies in several frequency bands to musical information such as key, chords or rhythm. There is an increasing diversity and complexity of features and algorithms in this domain and applica...
Conference Paper
This paper introduces the Audio Effect Ontology (AUFX-O) building on previous theoretical models describing audio processing units and workflows in the context of music production. We discuss important conceptualisations of different abstraction layers, their necessity to successfully model audio effects, and their application method. We present us...
Conference Paper
Dynamic music is gaining increasing popularity outside of its initial environment, the videogame industry, and is gradually becoming an autonomous medium. Responsible for this is doubtlessly the prevalence of integrated multisensory platforms such as smartphones as well as the omnipresence of the internet as a provider of content on demand. The mus...
Conference Paper
In music production, descriptive terminology is used to define perceived sound transformations. By understanding the underlying statistical features associated with these descriptions, we can aid the retrieval of contextually relevant processing parameters using natural language, and create intelligent systems capable of assisting in audio engineer...
Article
Full-text available
Consumer cultures are increasingly shifting to cultures of participation supported by technology such as social computing. In the domain of interactive audio, listeners' roles are revisited giving more importance to desire, context, and the sense of control. We present new developments in our mood-driven music player Moodplay allowing both social (...
Code
The Audio Effect Ontology (AUFX-O) provides concepts and properties for describing audio effects in the context of music production in a studio environment. It provides means for describing effect implementations and their application within a music production project. http://isophonics.net/content/aufx
Conference Paper
Moodplay is a system that allows users to collectively control music and lighting effects to express desired emotions. The interaction is based on the Mood Conductor participatory performance system that uses web, data visualisation and affective computing technologies. We explore how artificial intelligence, semantic web and audio synthesis can be...
Conference Paper
Computational feature extraction provides one means of gathering structured analytic metadata for large media collections. We demonstrate a suite of tools we have developed that automate the process of feature extraction from audio in the Internet Archive. The system constructs an RDF description of the analysis workflow and results which is then r...