
Aurelio Uncini
- Full Professor
- Sapienza University of Rome
About
404 Publications
71,209 Reads
6,768 Citations
Introduction
Aurelio Uncini is a Full Professor at Sapienza University of Rome, Italy, where he teaches Digital Audio Signal Processing, Machine Learning, and Deep Learning. He is the director of the Intelligent Signal Processing and Multimedia (ISPAMM) laboratory and a co-founder of the Cyber Intelligence and Information Security (CIS) research center at Sapienza University of Rome.
Publications (404)
Artificial Intelligence (AI) holds transformative potential in education, enabling personalized learning, enhancing inclusivity, and encouraging creativity and curiosity. In this paper, we explore how Large Language Models (LLMs) can act as both patient tutors and collaborative partners to enhance education delivery. As tutors, LLMs personalize lea...
In recent years, diffusion models have emerged as a superior alternative to generative adversarial networks (GANs) for high-fidelity image generation, with wide applications in text-to-image generation, image-to-image translation, and super-resolution. However, their real-time feasibility is hindered by slow training and inference speeds. This stud...
Emotion recognition is relevant in various domains, ranging from healthcare to human-computer interaction. Physiological signals, being beyond voluntary control, offer reliable information for this purpose, unlike speech and facial expressions which can be controlled at will. They reflect genuine emotional responses, devoid of conscious manipulatio...
Emotion recognition is essential across numerous fields, including medical applications and brain-computer interface (BCI). Emotional responses include behavioral reactions, such as tone of voice and body movement, and changes in physiological signals, such as the electroencephalogram (EEG). The latter are involuntary, thus they provide a reliable...
Nanostructured metamaterials are now commonplace in contemporary photonics, providing a means for tailored manipulation of light-matter interactions. By deliberately altering the spatial distribution of materials at sub-wavelength scales, these materials yield optical responses surpassing those seen in natural counterparts. Among these, thin-film m...
Hypercomplex algebras have recently been gaining prominence in the field of deep learning owing to the advantages of their division algebras over real vector spaces and their superior results when dealing with multidimensional signals in real-world 3D and 4D paradigms. This article provides a foundational framework that serves as a road map for und...
The primary goal of the L3DAS (Learning 3D Audio Sources) project is to stimulate and support collaborative research studies concerning machine learning techniques applied to 3D audio signal processing. To this end, the L3DAS23 Challenge, presented at IEEE ICASSP 2023, focuses on two spatial audio tasks of paramount interest for practical uses: 3D...
Neural models based on hypercomplex algebra systems are growing and proliferating for a plethora of applications, ranging from computer vision to natural language processing. Hand in hand with their adoption, parameterized hypercomplex neural networks (PHNNs) are growing in size and no techniques have been adopted so far to control their convergenc...
In this paper, we investigate how vector arithmetic in the latent space of a MelGAN architecture can be exploited to generate new sounds that may appeal to musicians, similar to what has already been done for words and images. Specifically, since the MelGAN directly uses the spectrogram as input to its generator, we f...
In this work, we propose a machine learning-based classification approach aiming at identifying real-world sounds recorded in construction sites. The proposed approach is based on the leaky version of the Echo State Network (ESN), and it has been tested on a real-world dataset composed of recordings of five vehicles and tools usually used in constr...
Catastrophic forgetting (CF) happens whenever a neural network overwrites past knowledge while being trained on new tasks. Common techniques to handle CF include regularization of the weights (using, e.g., their importance on past tasks), and rehearsal strategies, where the network is constantly re-trained on past data. Generative models have also...
The hearing organ is capable of perceiving sounds in a very wide frequency range, covering about 4 decades: nominally from 20 Hz to 20 kHz. Most common audio signal sampling frequencies, although they ensure adequate acquisition of higher frequencies, may not be necessary for signals in the lower frequency range. For example, for processing a signal t...
In digital audio signal processing (DASP), there is a huge amount of literature about filtering systems designed for specific applications [1-88].
In all real physical situations, in communication processes, and in the broader sense of the term, it is usual to think of the signal that “carries information” as a variable physical, mathematical, or symbolic quantity with which some information is associated [1–8].
The possibility of generating sounds from computers has had an extraordinary impact on the way in which the musical message is created and enjoyed. In 1962 Mathews et al. conclude their article Musical Sounds from Digital Computers [1], by stating that.
Sound and acoustic wave propagation are very complex phenomena, and for the generation or numerical manipulation of audio signals, it is of primary importance to have both theoretical and practical tools to model this complexity in systematic terms.
Modern processing and circuit technology has made available a number of methods for processing the acoustic signal, covering various requirements. Among the different methods, the term effect generally refers to the processing of an existing sound in order to make it more suggestive.
The possibility of modifying the signal spectrum is a very common requirement for all digital audio signal processing (DASP) applications [1–9]. Considering a general scheme as shown in Fig. 3.1, the device for its implementation, generally known as a filter, is almost always present in both professional and consumer equipment. For example, the var...
One of the most important issues in sound synthesis techniques, as we saw in the previous chapter, is the dynamic adjustment of parameters related to the perceived acoustic quality or acousticity of the produced sound. Any musician considers important not only the timbre, but also the so-called playability of the instrument.
The generation of sound is commonly associated with the vibration of solid objects, and sound itself is often referred to as “air vibration.” The words sound and vibration are, in fact, commonly connected to each other. In musical instruments, for example, some sound generators are vibrating strings as in piano, guitar, violin, etc.; vibration of b...
In recent years, deep learning has permeated the field of medical image analysis gaining increasing attention from clinicians. However, medical images always require specific preprocessing that often includes downscaling due to computational constraints. This may cause a crucial loss of information magnified by the fact that the region of interest...
Spatial audio methods are gaining a growing interest due to the spread of immersive audio experiences and applications, such as virtual and augmented reality. For these purposes, 3D audio signals are often acquired through arrays of Ambisonics microphones, each comprising four capsules that decompose the sound field in spherical harmonics. In this...
Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realist...
In this article, we investigate the degree of explainability of graph neural networks (GNNs). The existing explainers work by finding global/local subgraphs to explain a prediction, but they are applied after a GNN has already been trained. Here, we propose a meta-explainer for improving the level of explainability of a GNN directly at training tim...
Image-to-image translation (I2I) aims at transferring the content representation from an input domain to an output one, bouncing along different target domains. Recent I2I generative models, which achieve outstanding results in this task, comprise a set of diverse deep networks, each with tens of millions of parameters. Moreover, images are usually three-d...
In recent years, several approaches have been proposed for the task of Sound Event Localization and Detection (SELD) with multiple overlapping sound events in the 3D sound field. However, accuracy improvements have often been achieved at the expense of more complex networks and a larger number of parameters. In this paper, we propose an efficient a...
The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments. This challenge improves and extends the tasks of the L3DAS21 edition. We generated a new dataset, which maintains the same general characteristics of L3DAS21 data...
Recent research has found that neural networks are vulnerable to several types of adversarial attacks, where the input samples are modified in such a way that the model produces a wrong prediction that misclassifies the adversarial sample. In this paper we focus on black-box adversarial attacks, that can be performed without knowing the inner struc...
Nonlinear models are known to provide excellent performance in real-world applications that often operate in nonideal conditions. However, such applications often require online processing to be performed with limited computational resources. To address this problem, we propose a new class of efficient nonlinear models for online applications. The...
In this paper, we propose a new approach to train a deep neural network with multiple intermediate auxiliary classifiers, branching from it. These ‘multi-exits’ models can be used to reduce the inference time by performing early exit on the intermediate branches, if the confidence of the prediction is higher than a threshold. They rely on the assum...
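The early-exit rule described above (stop at the first intermediate branch whose prediction confidence exceeds a threshold) can be sketched as follows. This is a minimal illustration, not the training approach proposed in the paper: the function names, the use of the maximum softmax probability as the confidence measure, and the default threshold of 0.9 are all assumptions made for the example, and the branch logits are passed in precomputed, whereas a real multi-exit model would evaluate them lazily to actually save inference time.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(branch_logits, threshold=0.9):
    """Return (exit_index, predicted_class, confidence).

    branch_logits: list of per-branch logit lists, ordered from the
    earliest auxiliary classifier to the final one (hypothetical layout).
    The first branch whose confidence exceeds `threshold` is used;
    the last branch is always accepted as a fallback.
    """
    if not branch_logits:
        raise ValueError("branch_logits must not be empty")
    last = len(branch_logits) - 1
    for i, logits in enumerate(branch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        predicted = probs.index(confidence)
        if confidence > threshold or i == last:
            return i, predicted, confidence


if __name__ == "__main__":
    logits = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
    print(early_exit_predict(logits, threshold=0.9))
```

With the sample logits, the first branch is too uncertain, so the prediction is taken from the second branch and the third is never consulted.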
In this paper, we investigate the degree of explainability of graph neural networks (GNNs). Existing explainers work by finding global/local subgraphs to explain a prediction, but they are applied after a GNN has already been trained. Here, we propose a meta-learning framework for improving the level of explainability of a GNN directly at training...
In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling ov...
Variational autoencoders are deep generative models that have recently received a great deal of attention due to their ability to model the latent distribution of any kind of input such as images and audio signals, among others. A novel variational autoencoder in the quaternion domain H, namely the QVAE, has been recently proposed, leveraging the au...
Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of the node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet, fairness in graph...
Nonlinear models are known to provide excellent performance in real-world applications that often operate in non-ideal conditions. However, such applications often require online processing to be performed with limited computational resources. In this paper, we propose a new efficient nonlinear model for online applications. The proposed algorithm...
The L3DAS21 Challenge is aimed at encouraging and fostering collaborative research on machine learning for 3D audio signal processing, with particular focus on 3D speech enhancement (SE) and 3D sound localization and detection (SELD). Alongside the challenge, we release the L3DAS21 dataset, a 65-hour 3D audio corpus, accompanied by a Python...
In this paper, we propose a Deep Belief Network (DBN) based approach for the classification of audio signals to improve work activity identification and remote surveillance of construction projects. The aim of the work is to obtain an accurate and flexible tool for consistently executing and managing the unmanned monitoring of construction sites by...
Bayesian Neural Networks (BNNs) are trained to optimize an entire distribution over their weights instead of a single set, having significant advantages in terms of, e.g., interpretability, multi-task learning, and calibration. Because of the intractability of the resulting optimization problem, most BNNs are either sampled through Monte Carlo meth...
Convolutional Neural Networks (CNNs) have been widely used in the field of audio recognition and classification, since they often provide positive results. Motivated by the success of this kind of approach and the lack of practical methodologies for the monitoring of construction sites by using audio data, we developed an application for the classi...
In this paper, we propose a quaternion widely linear approach for the forecasting of environmental data, in order to predict the air quality. Specifically, the proposed approach is based on a fusion of heterogeneous data via vector spaces. A quaternion data vector has been constructed by concatenating a set of four different measurements related to...
Generative adversarial networks (GANs) have become widespread models for complex density estimation tasks such as image generation or image-to-image synthesis. At the same time, training of GANs can suffer from several problems, either of stability or convergence, sometimes hindering their effective deployment. In this paper we investigate whether...
Multiple sclerosis (MS) is one of the most common chronic neurological diseases affecting the central nervous system. Lesions produced by MS can be observed through two modalities of magnetic resonance (MR), known as T2W and FLAIR sequences, both providing useful information for formulating a diagnosis. However, long acquisition time makes the acqui...
Recently, data augmentation in the semi-supervised regime, where unlabeled data vastly outnumbers labeled data, has received considerable attention. In this paper, we describe an efficient technique for this task, exploiting a recent framework we proposed for missing data imputation called graph imputation neural network (GINN). The key idea is t...
Deep probabilistic generative models have achieved incredible success in many fields of application. Among such models, variational autoencoders (VAEs) have proved their ability in modeling a generative process by learning a latent representation of the input. In this paper, we propose a novel VAE defined in the quaternion domain, which exploits th...
Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertexwise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: 1) how to design a differentiable exchange protocol (e.g., a one-hop Laplacian smoothing in the original GCN)...
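As an illustration of the exchange protocol mentioned above, the sketch below implements one-hop smoothing with the symmetrically normalized adjacency matrix commonly associated with the original GCN, i.e. multiplying the node features by D^(-1/2) (A + I) D^(-1/2). It is a simplified sketch under stated assumptions: the function name `gcn_propagate` is hypothetical, the input adjacency matrix is assumed to be dense, symmetric, and free of self-loops, and the learned weight matrix and nonlinearity of a full GCN layer are omitted.

```python
import math


def gcn_propagate(adj, features):
    """One-hop smoothing: returns D^(-1/2) (A + I) D^(-1/2) X.

    adj: n x n adjacency matrix as nested lists (assumed symmetric,
         without self-loops).
    features: n x f node feature matrix as nested lists.
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degrees of the self-looped graph (always >= 1, so no division by zero).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_tilde]
    a_hat = [[inv_sqrt_deg[i] * a_tilde[i][j] * inv_sqrt_deg[j]
              for j in range(n)] for i in range(n)]
    n_feat = len(features[0])
    # Each node's new features are a weighted average over its neighborhood.
    return [[sum(a_hat[i][k] * features[k][f] for k in range(n))
             for f in range(n_feat)] for i in range(n)]


if __name__ == "__main__":
    adj = [[0.0, 1.0], [1.0, 0.0]]
    x = [[1.0], [3.0]]
    print(gcn_propagate(adj, x))
```

For two connected nodes, every entry of the normalized matrix is 0.5, so both nodes end up with approximately the average of the two input features.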
Deep neural networks are generally designed as a stack of differentiable layers, in which a prediction is obtained only after running the full stack. Recently, some contributions have proposed techniques to endow the networks with early exits, allowing predictions to be obtained at intermediate points of the stack. These multi-output networks have a num...
The sounds of work activities and equipment operations at a construction site provide critical information regarding construction progress, task performance, and safety issues. The construction industry, however, has not investigated the value of sound data and their applications, which would offer an advanced approach to unmanned management and re...
In recent years, hyper-complex deep networks (such as complex-valued and quaternion-valued neural networks – QVNNs) have received a renewed interest in the literature. They find applications in multiple fields, ranging from image reconstruction to 3D audio processing. Similar to their real-valued counterparts, quaternion neural networks require cus...
Nonlinear adaptive filters often show some sparse behavior due to the fact that not all the coefficients are equally useful for the modeling of any nonlinearity. Recently, a class of proportionate algorithms has been proposed for nonlinear filters to leverage sparsity of their coefficients. However, the choice of the norm penalty of the cost functi...
Missing data imputation (MDI) is the task of replacing missing values in a dataset with alternative, predicted ones. Because of the widespread presence of missing data, it is a fundamental problem in many scientific disciplines. Popular methods for MDI use global statistics computed from the entire dataset (e.g., the feature-wise medians), or build...
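The global-statistics baseline mentioned above (filling each missing value with the feature-wise median) can be sketched in a few lines. This illustrates only that baseline, not the framework proposed in the paper; the function name `impute_feature_medians` and the use of `None` to mark missing entries are assumptions made for the example.

```python
from statistics import median


def impute_feature_medians(rows):
    """Replace None entries with the median of the observed values
    in the same column (feature).

    rows: list of equal-length lists; None marks a missing value.
    A column with no observed values is left as None.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed) if observed else None)
    return [[medians[j] if value is None else value
             for j, value in enumerate(row)]
            for row in rows]


if __name__ == "__main__":
    data = [[1.0, None], [3.0, 10.0], [None, 20.0], [5.0, 30.0]]
    print(impute_feature_medians(data))
```

Because the same median is used for every instance, this baseline ignores relationships between samples, which is the limitation that motivates more structured approaches.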
Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertex-wise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: (i) how to design a differentiable exchange protocol (e.g., a 1-hop Laplacian smoothing in the original GCN)...
The incoming IoT big data era requires efficient and resource-constrained mining of large sets of distributed data. This paper explores a possible approach to this end, combining the two emerging paradigms of Conditional Neural Networks with early exits and Fog Computing. Apart from describing the general framework, we provide four specific contrib...
Continual learning of deep neural networks is a key requirement for scaling them up to more complex applicative scenarios and for achieving real lifelong learning of these architectures. Previous approaches to the problem have considered either the progressive increase in the size of the networks, or have tried to regularize the network behavior to...
In this paper, we propose an architecture based on a stacked auto-encoder (SAE) for the classification of music genre. Each level in the stacked architecture works by stacking some hidden representations resulting from the previous level and related to different frames of the input signal. In this way, the proposed architecture shows a more robust...
In this paper, we characterize the main building blocks and numerically verify the classification accuracy and energy performance of SmartFog, a distributed and virtualized networked Fog technological platform for the support of Stacked Denoising Auto-Encoder (SDAE)-based anomaly detection in data flows generated by Smart-Meters (SMs). In SmartFog...
In recent years, hyper-complex deep networks (e.g., quaternion-based) have received increasing interest with applications ranging from image reconstruction to 3D audio processing. Similarly to their real-valued counterparts, quaternion neural networks might require custom regularization strategies to avoid overfitting. In addition, for many real-wo...
Convolutional Neural Networks (CNNs) have been widely used in the field of audio recognition and classification, often with extremely positive results. Due to the success of this kind of approach, we developed an application for the classification of different types and brands of construction vehicles and tools, which operates on the emitted audio...
Missing data imputation (MDI) is a fundamental problem in many scientific disciplines. Popular methods for MDI use global statistics computed from the entire data set (e.g., the feature-wise medians), or build predictive models operating independently on every instance. In this paper we propose a more general framework for MDI, leveraging recent wo...
In this brief we investigate the generalization properties of a recently-proposed class of non-parametric activation functions, the kernel activation functions (KAFs). KAFs introduce additional parameters in the learning process in order to adapt nonlinearities individually on a per-neuron basis, exploiting a cheap kernel expansion of every activat...
Complex-valued neural networks (CVNNs) have been shown to be powerful nonlinear approximators when the input data can be properly modeled in the complex domain. One of the major challenges in scaling up CVNNs in practice is the design of complex activation functions. Recently, we proposed a novel framework for learning these activation functions ne...
In this paper, we propose a deep recurrent neural network (DRNN), based on the Long Short-Term Memory (LSTM) unit, for the separation of drum and bass sources from a monaural audio track. In particular, a single DRNN with a total of six hidden layers (three feedforward and three recurrent) is used for each original source to be separated. In this w...
Neural networks require a careful design in order to perform properly on a given task. In particular, selecting a good activation function (possibly in a data-dependent fashion) is a crucial step, which remains an open problem in the research community. Despite a large number of investigations, most current implementations simply select one fixed f...
In this paper, the problem of the online modeling of nonlinear speech signals is addressed. In particular, the goal of this work is to provide a nonlinear model yielding the best tradeoff between performance results and required computational resources. Functional link adaptive filters have proved to be an effective model for this problem, providin...
The use of hypercomplex algebras in adaptive and non-adaptive filtering research is growing ever faster. The debate today concerns the usefulness and the benefits of representing multidimensional systems by means of these complicated mathematical structures and the criteria for choosing between one algebra and another. Th...