Dorien Herremans

Singapore University of Technology and Design · Information Systems Technology & Design Pillar

PhD in Applied Economics

About

127
Publications
42,437
Reads
1,143
Citations
Citations since 2016
110 Research Items
1087 Citations
[Chart: yearly totals, 2016 to 2022; axis range 0 to 300]
Introduction
Dorien Herremans is an Assistant Professor at Singapore University of Technology and Design (SUTD), where she is also Director of the SUTD Game Lab. She has a joint appointment at the Institute of High Performance Computing, A*STAR, and works as a certified instructor for the NVIDIA Deep Learning Institute. Before joining SUTD, she was a Marie Sklodowska-Curie Postdoctoral Fellow at the Centre for Digital Music at Queen Mary University of London, where she worked on the project "MorpheuS: Hybrid Machine Learning – Optimization techniques To Generate Structured Music Through Morphing And Fusion". She received her Ph.D. in Applied Economics on the topic of Computer Generation and Classification of Music through Operations Research Methods. She graduated as a commercial engineer in management information systems at the University of Antwerp in 2005. After that, she worked as a Drupal consultant and was an IT lecturer at the Les Roches University in Bluche, Switzerland. She also worked as a mandaatassistent at the University of Antwerp, in the domain of operations management, supply chain management and operations research. Dr. Herremans' research interests include machine learning for automatic music generation, data mining for music classification (hit prediction), and novel applications at the intersection of machine learning/optimization and music.
Additional affiliations
August 2017 - present
Singapore University of Technology and Design
Position
  • Professor (Assistant)
August 2017 - July 2020
Agency for Science, Technology and Research (A*STAR)
Position
  • Researcher
June 2015 - May 2017
Queen Mary, University of London
Position
  • Fellow
Education
October 2010 - December 2014
University of Antwerp
Field of study
  • PhD (in applied Economics): Generating and Classifying music through Operations Research Methods

Publications

Publications (127)
Article
Full-text available
Record companies invest billions of dollars in new talent around the globe each year. Gaining insight into what actually makes a hit song would provide tremendous benefits for the music industry. In this research we tackle this question by focussing on the dance hit song classification problem. A database of dance hit songs from 1985 until 2013 is...
Article
Full-text available
Automatic music generation systems have gained in popularity and sophistication as advances in cloud computing have enabled large-scale complex computations such as deep models and optimization algorithms on personal devices. Yet, they still face an important challenge, that of long-term structure, which is key to conveying a sense of musical coher...
Article
Full-text available
Digital advances have transformed the face of automatic music generation since its beginnings at the dawn of computing. Despite the many breakthroughs, issues such as the musical tasks targeted by different machines and the degree to which they succeed remain open questions. We present a functional taxonomy for music generation systems with referen...
Conference Paper
Full-text available
We propose an end-to-end approach for modeling polyphonic music with a novel graphical representation, based on music theory, in a deep neural network. Despite the success of deep learning in various applications, it remains a challenge to incorporate existing domain knowledge in a network without affecting its training routines. In this paper we p...
Article
Full-text available
A total of 34% of AI research and development projects fail or are abandoned, according to a recent survey by Rackspace Technology of 1,870 companies [1]. In this perspective paper, a new STrategic ROadMap, aiSTROM, is presented that empowers managers to create an AI strategy. A comprehensive approach is provided that guides managers and lead devel...
Preprint
Full-text available
Following the success of the transformer architecture in the natural language domain, transformer-like architectures have been widely applied to the domain of symbolic music recently. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute attributes (e.g., pitch) and relative attri...
Conference Paper
Full-text available
This paper puts forth a method that is easily transposable to a realtime environment by utilising a “Physics Approximating Neural Network” to predict a 1D output signal. The technique described in this paper is inspired by Physics Informed Neural Networks put forth by Raissi et al. The model demonstrated in this paper makes use of a recurrent input....
Preprint
Music is capable of conveying many emotions. The level and type of emotion of the music perceived by a listener, however, is highly subjective. In this study, we present the Music Emotion Recognition with Profile information dataset (MERP). This database was collected through Amazon Mechanical Turk (MTurk) and features dynamical valence and arousal...
Preprint
Full-text available
In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we think of it as a conditional generative task where we train our model to generate realistic looking piano rolls fro...
Preprint
Full-text available
The cryptocurrency market is highly volatile compared to traditional financial markets. Hence, forecasting its volatility is crucial for risk management. In this paper, we investigate CryptoQuant data (e.g. on-chain analytics, exchange and miner data) and whale-alert tweets, and explore their relationship to Bitcoin's next-day volatility, with a fo...
Article
Studies in affective audio–visual correspondence learning require ground-truth data to train, validate, and test models. The number of available datasets together with benchmarks, however, is still limited. In this paper, we create a collection of three datasets (called EmoMV) for affective correspondence learning between music and video modalities...
Preprint
Full-text available
Inspired by recent advancements in the field of computer vision, specifically models for generating higher-resolution images from low-resolution images, we investigate the utility of a deep convolutional autoencoder for downscaling and bias correcting climate projections for South East Asia (SEA). Downscaled projections of 2 m surface temperature a...
Preprint
Full-text available
In this paper, we introduce Jointist, an instrument-aware multi-instrument framework that is capable of transcribing, recognizing, and separating multiple musical instruments from an audio clip. Jointist consists of the instrument recognition module that conditions the other modules: the transcription module that outputs instrument-specific piano r...
Preprint
Full-text available
Bitcoin, with its ever-growing popularity, has demonstrated extreme price volatility since its origin. This volatility, together with its decentralised nature, makes Bitcoin highly subject to speculative trading as compared to more traditional assets. In this paper, we propose a multimodal model for predicting extreme price fluctuations. This mod...
Article
Full-text available
In this paper, we introduce an approach for future frames prediction based on a single input image. Our method is able to generate an entire video sequence based on the information contained in the input frame. We adopt an autoregressive approach in our generation process, i.e., the output from each time step is fed as the input to the next step. U...
Preprint
In this paper we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two different tasks, namely keyword spotting (KWS) and automatic speech recognition (ASR). For most neural network models, the architecture and hyperparameters are typically fine-...
Chapter
The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle...
Chapter
Full-text available
We present a novel music generation framework for music infilling, with a user friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible for new control tokens as the added music control tokens such as tonal tension per bar and trac...
Preprint
Full-text available
What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR 2021 NeurIPS challenge is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR 2021 evaluates audio rep...
Preprint
Full-text available
Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory a...
Preprint
Full-text available
We present a novel music generation framework for music infilling, with a user friendly interface. Infilling refers to the task of generating musical sections given the surrounding multi-track music. The proposed transformer-based framework is extensible for new control tokens as the added music control tokens such as tonal tension per bar and trac...
Preprint
Full-text available
The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle...
Article
Full-text available
While public awareness of climate change has grown over the years, many people still have misconceptions regarding effective individual environmental action. In this paper, we present a serious game called PEAR, developed using elements of geolocation and augmented reality (AR), aimed at increasing players’ awareness of climate change issues and pr...
Article
Full-text available
In this paper, we tackle the problem of predicting the affective responses of movie viewers, based on the content of the movies. Current studies on this topic focus on video representation learning and fusion techniques to combine the extracted features for predicting affect. Yet, these typically, while ignoring the correlation between multiple mod...
Article
Full-text available
Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy versus pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). To train a deep neur...
Preprint
Full-text available
Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issu...
Preprint
Full-text available
A total of 34% of AI research and development projects fail or are abandoned, according to a recent survey by Rackspace Technology of 1,870 companies. We propose a new strategic framework, aiSTROM, that empowers managers to create a successful AI strategy based on a thorough literature review. This provides a unique and integrated approach that gu...
Preprint
Full-text available
Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy versus pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). In order to train a...
Conference Paper
Full-text available
Recent advances in automatic music transcription (AMT) have achieved highly accurate polyphonic piano transcription results by incorporating onset and offset detection. The existing literature, however, focuses mainly on the leverage of deep and complex models to achieve state-of-the-art (SOTA) accuracy, without understanding model behaviour. In th...
Conference Paper
Full-text available
The field of automatic music composition has seen great progress in the last few years, much of which can be attributed to advances in deep neural networks. There are numerous studies that present different strategies for generating sheet music from scratch. The inclusion of high-level musical characteristics (e.g., perceived emotional qualities),...
Conference Paper
Full-text available
The rise of deep learning technologies has quickly advanced many fields, including generative music systems. There exist a number of systems that allow for the generation of musically sounding short snippets, yet these generated snippets often lack an overarching, longer-term structure. In this work, we propose CM-HRNN: a conditional melody gener...
Preprint
Full-text available
The field of automatic music composition has seen great progress in the last few years, much of which can be attributed to advances in deep neural networks. There are numerous studies that present different strategies for generating sheet music from scratch. The inclusion of high-level musical characteristics (e.g., perceived emotional qualities),...
Preprint
Full-text available
Recent advances in automatic music transcription (AMT) have achieved highly accurate polyphonic piano transcription results by incorporating onset and offset detection. The existing literature, however, focuses mainly on the leverage of deep and complex models to achieve state-of-the-art (SOTA) accuracy, without understanding model behaviour. In th...
Article
Underwater environments create a challenging channel for communications. In this paper, we design a novel receiver system by exploring a machine learning technique, the Deep Belief Network (DBN), to combat the signal distortion caused by the Doppler effect and multi-path propagation. We evaluate the performance of the proposed receiver system in both...
Preprint
Underwater environments create a challenging channel for communications. In this paper, we design a novel receiver system by exploring a machine learning technique, the Deep Belief Network (DBN), to combat the signal distortion caused by the Doppler effect and multi-path propagation. We evaluate the performance of the proposed receiver system in bot...
Preprint
Full-text available
The fields of music, health, and technology have seen significant interactions in recent years in developing music technology for health care and well-being. In an effort to strengthen the collaboration between the involved disciplines, the workshop ‘Music, Computing, and Health’ was held to discuss best practices and state-of-the-art at the inters...
Preprint
Full-text available
The rise of deep learning technologies has quickly advanced many fields, including that of generative music systems. There exist a number of systems that allow for the generation of good sounding short snippets, yet, these generated snippets often lack an overarching, longer-term structure. In this work, we propose CM-HRNN: a conditional melody gen...
Article
Full-text available
The fields of music, health, and technology have seen significant interactions in recent years in developing music technology for health care and well-being. In an effort to strengthen the collaboration between the involved disciplines, the workshop “Music, Computing, and Health” was held to discuss best practices and state-of-the-art at the inters...
Conference Paper
Full-text available
In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and incorporate the relation among multiple modalities by applying a self-attention mechanism in a novel manner to the extracted features for emotion prediction....
Conference Paper
Full-text available
Most of the state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated together and used as the input to train another model with the pitch labels to obtain the fi...
Preprint
In this work, we propose different variants of the self-attention based network for emotion prediction from movies, which we call AttendAffectNet. We take both audio and video into account and incorporate the relation among multiple modalities by applying a self-attention mechanism in a novel manner to the extracted features for emotion prediction....
Preprint
Full-text available
Most of the state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated together and used as the input to train another model with the pitch labels to obtain the fi...
Preprint
Billions of USD are invested in new artists and songs by the music industry every year. This research provides a new strategy for assessing the hit potential of songs, which can help record companies support their investment decisions. A number of models were developed that use both audio data, and a novel feature based on social media listening be...
Preprint
Full-text available
Many of the music generation systems based on neural networks are fully autonomous and do not offer control over the generation process. In this research, we present a controllable music generation system in terms of tonal tension. We incorporate two tonal tension measures based on the Spiral Array Tension theory into a variational autoencoder mode...
Preprint
In this paper we present a new dataset, with musical excerpts from the three main ethnic groups in Singapore: Chinese, Malay and Indian (both Hindi and Tamil). We use this new dataset to train different classification models to distinguish the origin of the music in terms of these ethnic groups. The classification models were optimized by exploring...
Article
Full-text available
In this paper, we present nnAudio, a new neural network-based audio processing framework with graphics processing unit (GPU) support that leverages 1D convolutional neural networks to perform time domain to frequency domain conversion. It allows on-the-fly spectrogram extraction due to its fast speed, without the need to store any spectrograms on t...
Conference Paper
Full-text available
High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels, or the subjectiveness (and hence large variance) in human-annotated labels. In this pa...
Conference Paper
We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow temporal conditions of two essential style features for piano performances: articulation and dynamics. We demonstrate how the model is able to apply...
Article
Full-text available
Cough is a common symptom presenting in asthmatic children. In this investigation, an audio-based classification model is presented that can differentiate between healthy and asthmatic children, based on the combination of cough and vocalised /ɑ:/ sounds. A Gaussian mixture model using mel-frequency cepstral coefficients and constant-Q cepstral coe...
Preprint
Full-text available
High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels, or the subjectiveness (and hence large variance) in human-annotated labels. In this pa...
Conference Paper
Full-text available
In this paper, we adapt triplet neural networks (TNNs) to a regression task, music emotion prediction. Since TNNs were initially introduced for classification, and not for regression, we propose a mechanism that allows them to provide meaningful low dimensional representations for regression tasks. We then use these new representations as the input...
Conference Paper
Full-text available
Generating an image from a provided descriptive text is quite a challenging task because of the difficulty in incorporating perceptual information (object shapes, colors, and their interactions) along with providing high relevancy related to the provided text. Current methods first generate an initial low-resolution image, which typically has irreg...
Preprint
Full-text available
Generating an image from a provided descriptive text is quite a challenging task because of the difficulty in incorporating perceptual information (object shapes, colors, and their interactions) along with providing high relevancy related to the provided text. Current methods first generate an initial low-resolution image, which typically has irreg...
Preprint
Full-text available
Information on liquid jet stream flow is crucial in many real world applications. In a large number of cases, these flows fall directly onto free surfaces (e.g. pools), creating a splash with accompanying splashing sounds. The sound produced is supplied by energy interactions between the liquid jet stream and the passive free surface. In this inves...
Preprint
Full-text available
We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow temporal conditions of two essential style features for piano performances: articulation and dynamics. We demonstrate how the model is able to apply...
Article
Full-text available
Separating a singing voice from its music accompaniment remains an important challenge in the field of music information retrieval. We present a unique neural network approach inspired by a technique that has revolutionized the field of vision: pixel-wise image classification, which we combine with cross entropy loss and pretraining of the CNN as a...