Agustín Álvarez-Marquina

Universidad Politécnica de Madrid | UPM · Centre for Biomedical Technology

PhD

About

122 Publications
25,710 Reads
881 Citations
Introduction
Agustín Álvarez-Marquina currently works at the Centre for Biomedical Technology, Universidad Politécnica de Madrid. Agustín does research in Computer Architecture and Biomedical Engineering. Their current project is 'Speech Neuromechanics'.


Publications (122)
Article
Full-text available
Smith–Magenis syndrome (SMS) is a rare, underdiagnosed condition due to limited public awareness of genetic testing and a lengthy diagnostic process. Voice analysis can be a noninvasive tool for monitoring and detecting SMS. In this paper, the cepstral peak prominence and mel-frequency cepstral coefficients are used as disease monitoring and detect...
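Cepstral peak prominence (CPP) measures how far the cepstral peak associated with the pitch period rises above the overall cepstral trend. A minimal NumPy sketch of the common recipe follows. The Hann window, the dB conversion, the 60 to 300 Hz search range and the function name are assumptions for illustration, not the settings used in the paper.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """CPP-style measure (dB) for one analysis frame (illustrative settings)."""
    frame = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_mag))          # real cepstrum
    cep_db = 20.0 * np.log10(np.abs(cepstrum) + 1e-12)

    # Quefrency range (in samples) matching plausible f0 values.
    q_lo, q_hi = int(fs / f0_max), int(fs / f0_min)
    q = np.arange(q_lo, q_hi + 1)
    seg = cep_db[q_lo:q_hi + 1]

    # Linear trend over the search range; CPP is the peak height above it.
    slope, intercept = np.polyfit(q, seg, 1)
    peak = int(np.argmax(seg))
    return float(seg[peak] - (slope * q[peak] + intercept))
```

A strongly periodic (voiced) frame yields a pronounced cepstral peak and hence a larger CPP than an aperiodic frame.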
Article
The production of phonation involves very complex processes, linked to the physical, clinical, and emotional state of the speaker. Thus, in populations with neurological diseases, it is possible to find the imprint in the voice signal left by the deterioration of certain cortical areas or part of the neurocognitive mechanisms that are involved in s...
Chapter
This research work proposes a novel, encryption-based method for comparing embeddings generated by neural networks on various information types (text, images, videos, audio, etc.). This approach prioritizes real-world applications dealing with sensitive or private data, particularly in biomedical and biometric analysis, where even minor information...
Article
Full-text available
This research work introduces a novel, nonintrusive method for the automatic identification of Smith–Magenis syndrome, traditionally studied through genetic markers. The method utilizes cepstral peak prominence and various machine learning techniques, relying on a single metric computed by the research group. The performance of these techniques is...
Article
Deep Learning (DL), a groundbreaking branch of Machine Learning (ML), has emerged as a driving force in both theoretical and applied Artificial Intelligence (AI). DL algorithms, rooted in complex and non-linear artificial neural systems, excel at extracting high-level features from data. DL has demonstrated human-level performance in real-world tas...
Preprint
Groundbreaking advances have been made in theoretical and applied Artificial Intelligence (AI). Deep Learning (DL) algorithms are grounded in non-linear and complex artificial neural systems that progressively extract higher-level features from data. DL performance is frequently compared with human-level performance in real-world tasks, such as clinical diagnostics. It is al...
Article
Full-text available
Pathological voice characterization has received increasing attention over the last 20 years. Hundreds of studies have been published showing inventive approaches with very promising findings. Nevertheless, methodological issues might undermine the trustworthiness of performance assessment. This study reviews some critical aspects regarding data collection a...
Chapter
Parkinson’s Disease (PD) is the most prevalent Neuromotor Disorder (ND), with steadily increasing incidence rates and no definitive cure. Nevertheless, dopaminergic medication and rehabilitation may improve the living conditions of people affected by PD. Neuroacoustical stimulation is a non-invasive technique which may improve some motor symptoms assoc...
Chapter
All the gestures and movements we make are influenced by our psychomotor abilities. This mobility deteriorates over the years. It is reasonable to expect that an older individual has worse mobility than a younger one if neither suffers from other pathologies. On this premise, the main aim of this research work is the detection of semantic b...
Chapter
Parkinson’s Disease (PD) is a major neurodegenerative disorder with steadily increasing incidence rates, demanding ever-growing resources from national health systems and imposing a considerable burden on caregivers. Cost-effective monitoring methods with short turn-around times are required to facilitate regular, longitudinal, accurate clinical ass...
Chapter
Parkinson’s Disease (PD) is a neurodegenerative disorder that severely impacts the motor capabilities of patients. Dysarthria is one of the symptoms that can be accurately characterized using speech analysis, tracking the deterioration associated with the evolution of the disease. In the present work, the use of machine learning-based technolog...
Article
Speech signal analysis is a powerful tool that facilitates the monitoring and tracking of symptom deterioration caused by neurodegenerative disorders, typically using sustained vowels, diadochokinetic exercises or running speech. This study extends our previous work on the movement produced by the jaw-tongue biomechanic...
Article
Full-text available
Aim: The present work studies the neuromotor activity of the masseter-jaw-tongue articulation during diadochokinetic exercises to establish functional statistical relationships between surface Electromyography (sEMG), 3D Accelerometry (3DAcc), and acoustic features extracted from the speech signal, with the aim of characterizing Hypok...
Article
Speech is controlled by axial neuromotor systems; therefore, it is highly sensitive to the effects of neurodegenerative illnesses such as Parkinson's Disease (PD). Patients suffering from PD present important alterations in speech, which manifest in phonation, articulation, prosody, and fluency. These alterations may be evaluated using statis...
Chapter
Speech serves as a vehicle for detecting neurological degeneration through certain accepted biomarkers derived from sustained vowels, diadochokinetic exercises, or running speech. Classically, mel-frequency cepstral coefficients (MFCCs) have been used in the organic and neurologic characterization of pathologic phonation using sustained vowels. In the pr...
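The usual MFCC pipeline (framing, windowing, power spectrum, mel filterbank, logarithm, DCT) can be sketched in NumPy as follows. The 25 ms frames at 16 kHz, the 26-filter bank, the 13 coefficients and the unnormalised DCT-II are common defaults assumed for illustration; the chapter's own configuration is not given in this summary.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Frame-by-frame MFCCs with illustrative default parameters."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)

    # Triangular filters equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / (right - center)

    # DCT-II matrix that decorrelates the log filterbank energies.
    idx = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), idx + 0.5) / n_filters)

    feats = np.empty((n_frames, n_ceps))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
        log_energies = np.log(fbank @ power + 1e-10)
        feats[i] = dct @ log_energies
    return feats
```

For one second of audio at 16 kHz these defaults give 98 frames of 13 coefficients each.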
Article
Speech serves as a vehicle for detecting neurological degeneration through certain accepted biomarkers derived from sustained vowels, diadochokinetic exercises or running speech. Classically, the Vowel Space Area (VSA) and the Formant Centralization Ratio (FCR) have been proposed to describe dysarthria in Parkinson's Disease (PD). These features are based i...
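Both classical features are computed from the first two formants (F1, F2) of the corner vowels /i/, /a/ and /u/. The triangular VSA is the area of the triangle those vowels span in the F1-F2 plane, and the FCR, as commonly defined, is (F2u + F2a + F1i + F1u) / (F2i + F1a). A minimal sketch, with hypothetical formant values in the usage check:

```python
def triangular_vsa(f1, f2):
    """Triangular vowel space area (Hz^2) of /i/, /a/, /u/ via the shoelace formula."""
    return abs(f1["i"] * (f2["a"] - f2["u"])
               + f1["a"] * (f2["u"] - f2["i"])
               + f1["u"] * (f2["i"] - f2["a"])) / 2.0

def formant_centralization_ratio(f1, f2):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); it grows as vowels centralize."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])
```

With F1 = 300/700/300 Hz and F2 = 2300/1200/800 Hz for /i/, /a/, /u/, the VSA is 300,000 Hz² and the FCR is 2600/3000 ≈ 0.87. Dysarthric centralization shrinks the VSA and raises the FCR.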
Chapter
Speech is controlled by axial neuromotor systems and is highly sensitive to certain neurodegenerative illnesses such as Parkinson’s Disease (PD). Patients suffering from PD present important alterations in speech, which manifest in phonation, articulation, prosody and fluency. Usually, phonation and articulation alterations are estimated using different statistical...
Article
Speech articulation is produced by the movements of muscles in the larynx, pharynx, mouth and face. Therefore, speech shows acoustic features, such as formants, which are directly related to the neuromotor actions of these muscles. The first two formants are strongly related to jaw and tongue muscular activity. Speech can be used as a simple and ubiquitous...
Article
Full-text available
Parkinson's Disease (PD) is a neuromotor illness affecting the movements of various muscles, including those involved in speech production. The relevance of speech in monitoring illness progression has been documented over the last two decades. Most studies have concentrated on dysarthria and dysphonia induced by the syndrome. The...
Conference Paper
Full-text available
Speech articulation is conditioned by the movements produced by well-determined groups of muscles in the larynx, pharynx, mouth and face. The resulting speech shows acoustic features which are directly related to muscle neuromotor actions. Formants are some of the observable correlates most related to certain muscle actions, such as the ones acti...
Article
Full-text available
Patients suffering from Parkinson's disease (PD) may be successfully treated pharmacologically and surgically to preserve and even improve their quality of life and health conditions. Although the progress of the disease cannot be stopped, at least the most handicapping symptoms can be mitigated. But both pharmacological and surgical trea...
Conference Paper
Vocal Fold Paralysis (VFP) can be a secondary consequence of neck and throat surgery. A possible corrective treatment of VFP is fat injection into the paralyzed vocal fold. Recently, this technique has been modified to enrich the injected fat with grafts of stem cells. Questions such as whether the implantation of fat plus stem cells is efficient enough comp...
Chapter
Full-text available
It is known that Parkinson’s Disease (PD) leaves marks in phonation in the form of dystonia and tremor. These marks can be expressed as a function of biomechanical characteristics monitoring vocal fold tension and imbalance. These features may assist in tracing the neuromotor activity of laryngeal pathways. Therefore, these features may be used in grading the stage o...
Chapter
It is known that the number of characteristics may be the bottleneck of a digital processing system. Finding a good method to detect which characteristics are most important for identifying a speaker would yield better results with fewer characteristics. Classifying an adult speaker by age is a big challenge, since adulthood is a lo...
Article
Full-text available
Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face reco...
Conference Paper
Full-text available
Organic as well as neurologic diseases leave important correlates in phonation. Parkinson’s Disease (PD) may leave marks in vocal fold dystonia and tremor. Biomechanical parameters monitoring vocal fold tension and imbalance, as well as tremor, are defined in the study. These correlates are known to help in tracing the neuromotor activity of b...
Article
Full-text available
Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometric speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, are proposed for dysphonic voice characterization in Speaker Verificati...
Conference Paper
Full-text available
Parkinson's Disease (PD), unlike other neurodegenerative diseases, admits certain treatments which can improve patients' conditions or at least mitigate disease effects. Treatments, whether pharmacological, surgical or rehabilitative, need longitudinal monitoring of patients to assess the progression or regression of their condition, to optimi...
Article
Speech production in patients suffering from dementia of the Alzheimer's type is known to undergo noticeable changes with respect to normative speakers. Classically, this kind of speech has been described as presenting altered prosody, rhythmic pace, anomia, or impaired semantics. Phonation, conceived as the production of voice in voiced speech fragmen...
Conference Paper
Full-text available
Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used in biometric speaker characterization. In the present paper, phonation features derived from the glottal source (GS) parameterization after vocal tract inversion are proposed for dysphonic voice characterization in Speaker Ver...
Article
Full-text available
The dramatic impact of neurological degenerative pathologies on quality of life is a growing concern nowadays. Many techniques have been designed for the detection, diagnosis, and monitoring of neurological disease. Most of them are too expensive or complex to be used by primary care medical services. On the other hand, it is well known t...
Conference Paper
Full-text available
This paper presents the system developed to participate in the 2013 Speaker Recognition Evaluation in Mobile Environments. The aim of the system is to show that selecting an adequate front-end that effectively characterizes the speaker is as important as the selection of the classifier. This component of the recognition system seems to be...
Conference Paper
Full-text available
Gender detection is a very important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here consists of obtaining uncorrelated glottal and vocal...
Conference Paper
Full-text available
Neurological Diseases (ND) affect larger segments of the aging population every year. Treatment depends on accurate and frequent monitoring, which is expensive. It is well known that ND leave correlates in speech and phonation. The present work shows a method to detect alterations in vocal fold tension during phonation. These may appear either as hyp...
Conference Paper
Full-text available
The Glottal Source correlates reconstructed from the phonated parts of voice may provide useful information applicable in different fields. One of them is defective closure (gap) detection. Throughout the paper, the background explaining the physical foundations of defective gap is reviewed. A possible method to estimate defective gap is a...
Conference Paper
Full-text available
MFCCs extracted from the power spectral density of speech as a whole seem to have become the de facto standard in the area of speaker recognition, as demonstrated by their use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], thus relegating to the background this component of the re...
Conference Paper
Full-text available
Gender detection from running speech is a very important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here discards f0 as a valid feature be...
Conference Paper
Full-text available
This paper evaluates the performance of the twelve primary systems submitted to the evaluation on speaker verification in the context of a mobile environment using the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rat...
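The equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. A minimal threshold-sweep sketch is given below; the score convention (higher means "same speaker") and the function name are assumptions, and more careful estimators, for example ones interpolating between thresholds, also exist.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping the threshold over all observed scores.

    Assumed convention: a trial is accepted when score >= threshold.
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)   # false acceptances
        frr = sum(s < t for s in genuine) / len(genuine)      # false rejections
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2.0
    return best_eer
```

Perfectly separated scores give an EER of 0; with one impostor score above one genuine score out of four trials each, the sweep finds a threshold where both rates are 0.25.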
Conference Paper
Full-text available
BioMet®Phon is a software application developed for the characterization of voice in voice quality evaluation. Initially it was conceived as plain research code to estimate the glottal source from voice and obtain the biomechanical parameters of the vocal folds from the spectral density of the estimate. This code grew into what is now the Glottex®Eng...
Article
Full-text available
BioMet®Tools is a set of software applications developed for the biometric characterization of voice in different fields, such as voice quality evaluation in laryngology, speech therapy and rehabilitation, education of the singing voice, forensic voice analysis in court, emotion detection in voice, secure access to facilities and services, etc. Initi...
Conference Paper
Kernel-PCA and PCA techniques are compared in the task of age and gender separation. A feature extraction process that discriminates between vocal tract and glottal source is implemented. Speech is processed in this way because vocal tract length and resonant characteristics are related to gender and age, and there is also a great...
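To make the comparison concrete, the sketch below contrasts linear PCA (projection onto the leading axes of the centred data) with kernel PCA, which performs the same eigen-analysis on a centred kernel matrix. The RBF kernel and its `gamma` are assumptions; the summary does not say which kernel was used.

```python
import numpy as np

def pca(X, n_components):
    """Linear PCA: project centred data onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with an RBF kernel (kernel and gamma are illustrative choices)."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # centre in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[idx], eigvecs[:, idx]
    # Projections of the training points onto component k: sqrt(lambda_k) * alpha_k.
    return alphas * np.sqrt(np.clip(lambdas, 0.0, None))
```

Linear PCA can only rotate the feature space, whereas kernel PCA can separate groups that are not linearly separable, at the cost of an n-by-n eigenproblem over the training set.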
Conference Paper
Full-text available
Vowels are important clues supporting speech perception. Nevertheless, there are no good definitions of the vowel from the perceptual and computational points of view, among others. The purpose of the present paper is to explain how the concept of vowel may be defined from the perceptual point of view as those patterns assigned to...
Article
Full-text available
Voice biometry is classically based mainly on the parameterization and patterning of speech features. The present approach is based instead on the characterization of phonation features (glottal features). The intention is to reduce intra-speaker variability due to the ‘text’. Through the study of larynx biomechanics it may be seen that the glottal...
Conference Paper
Full-text available
It is well known that many neurological diseases leave a fingerprint in voice and speech production. The dramatic impact of these pathologies on quality of life is a growing concern. Many techniques have been designed for the detection, diagnosis and monitoring of neurological disease. Most of them are costly or difficult to extend to primary services...
Conference Paper
Full-text available
In this paper a layered architecture to spot and characterize vowel segments in running speech is presented. The detection process is based on neuromorphic principles, as is the use of Hebbian units in layers to implement lateral inhibition, band probability estimation and mutual exclusion. Results are presented showing how the association between...
Article
Speech and voice technologies are undergoing a profound review as new paradigms are sought to overcome some specific problems which cannot be completely solved by classical approaches. Neuromorphic Speech Processing is an emerging area in which research is turning towards understanding the natural neural processing of speech by the Human Auditor...
Article
Full-text available
Current trends in the search for improvements in well-established technologies imitating human abilities, such as speech perception, try to find inspiration in the explanation of certain capabilities hidden in the natural system which are not yet well understood. A typical case is that of speech recognition, where the semantic gap going from spectral ti...
Article
Full-text available
Recent studies have shown that the correct labeling of phonetic classes may help current Automatic Speech Recognition (ASR) when combined with classical parsing automata based on Hidden Markov Models (HMM). Through the present paper a method for Phonetic Class Labeling (PCL) based on bio-inspired speech processing is described. The methodology is ba...
Article
The Glottal Source is an important component of voice as it can be considered as the excitation signal to the voice apparatus. The use of the Glottal Source for pathology detection or the biometric characterization of the speaker are important objectives in the acoustic study of the voice nowadays. Through the present work a biometric signature bas...
Conference Paper
Speech and voice technologies are undergoing a profound review as new paradigms are sought to overcome some specific problems which cannot be completely solved by classical approaches. Neuromorphic Speech Processing is an emerging area in which research is turning towards understanding the natural neural processing of speech by the Human Audito...
Article
Full-text available
In this paper, the determination of the optimal word-length of the variables involved in an adaptive noise canceller based on a gradient lattice-ladder algorithm is presented. Upper and lower bounds of the variables are determined from a set of spoken words.
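Once upper and lower bounds of a variable have been observed, a fixed-point format can be derived from them. The sketch below shows one generic way to do so for signed two's-complement variables; the function name and the use of a target resolution to fix the fractional bits are assumptions, not the paper's stated criterion.

```python
import math

def fixed_point_format(observed_values, resolution):
    """Return (integer_bits, fractional_bits, total_bits) for a signed
    two's-complement variable (hypothetical helper for illustration).

    integer_bits includes the sign bit and is chosen so that the largest
    observed magnitude is below 2**(integer_bits - 1).
    """
    max_abs = max(abs(v) for v in observed_values)
    if max_abs == 0:
        integer_bits = 1
    else:
        integer_bits = max(1, math.floor(math.log2(max_abs)) + 2)
    fractional_bits = max(0, math.ceil(-math.log2(resolution)))
    return integer_bits, fractional_bits, integer_bits + fractional_bits
```

For example, values bounded by ±0.9 at a resolution of 2**-15 fit a 16-bit Q1.15 word, while values reaching 3.5 at 2**-8 need 3 integer bits and 11 bits in total.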
Conference Paper
Full-text available
In this paper, a portable hardware design implementing a fast Fourier transform, oriented towards its reusability as a core, is presented. The module has been developed using the radix-2 Decimation-In-Time algorithm. Structural modeling is implemented using VHDL to describe, simulate and perform the design. The module is portable among different EDA tools and...
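The radix-2 decimation-in-time (DIT) algorithm reorders the input in bit-reversed order and then applies log2(N) stages of butterflies. A software reference model of the kind often used to check a hardware core is sketched below in plain Python; it is an illustration of the textbook algorithm, not the VHDL design itself.

```python
import cmath

def fft_radix2_dit(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) butterfly stages; the sub-transform size doubles each stage.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1.0 + 0j
            for k in range(half):
                top = a[start + k]
                bottom = a[start + k + half] * w
                a[start + k] = top + bottom
                a[start + k + half] = top - bottom
                w *= w_step
        size *= 2
    return a
```

The result matches a direct DFT; the hardware benefit comes from reusing one butterfly structure across all stages.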
Conference Paper
This paper describes the design and implementation results of an adaptive noise canceller useful for the construction of robust speech enhancement interfaces. The algorithm used has very good performance for real-time applications. Its main disadvantage is that it requires several division operations, having a high computatio...
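The summary does not name the algorithm beyond its need for divisions. Purely as an illustration, assume a textbook gradient adaptive lattice (GAL) joint-process canceller, from the lattice-ladder family named in another entry of this list: every lattice stage normalises its gradient step by a running power estimate, and the ladder uses a normalised LMS update, which is where per-sample divisions arise. The order, step sizes and the clipping safeguard are assumptions.

```python
import numpy as np

def gal_noise_canceller(primary, reference, order=4, mu_k=0.02, mu_h=0.1,
                        beta=0.99, init_power=1.0, eps=1e-6):
    """Gradient adaptive lattice (GAL) joint-process noise canceller, real signals.

    primary:   speech plus noise from the main sensor
    reference: noise-only reference, correlated with the noise in primary
    Returns e(n) = primary(n) - noise estimate, i.e. the enhanced signal.
    """
    M = order
    kappa = np.zeros(M)                 # lattice reflection coefficients
    h = np.zeros(M + 1)                 # ladder (joint-process) weights
    power = np.full(M, init_power)      # running power estimate per stage
    b_prev = np.zeros(M + 1)            # backward errors at time n-1
    out = np.zeros(len(primary))

    for n, (d, u) in enumerate(zip(primary, reference)):
        f = np.zeros(M + 1)
        b = np.zeros(M + 1)
        f[0] = b[0] = u
        for m in range(1, M + 1):
            power[m - 1] = beta * power[m - 1] + (1 - beta) * (f[m - 1] ** 2 + b_prev[m - 1] ** 2)
            f[m] = f[m - 1] + kappa[m - 1] * b_prev[m - 1]
            b[m] = b_prev[m - 1] + kappa[m - 1] * f[m - 1]
            grad = f[m - 1] * b[m] + b_prev[m - 1] * f[m]
            # Normalised gradient step: one division per stage and sample.
            # The clip to (-0.99, 0.99) is a stability safeguard added here.
            kappa[m - 1] = np.clip(kappa[m - 1] - (mu_k / power[m - 1]) * grad, -0.99, 0.99)
        e = d - h @ b                          # joint-process estimation error
        h += (mu_h / (b @ b + eps)) * e * b    # normalised LMS on backward errors
        out[n] = e
        b_prev = b
    return out
```

The lattice decorrelates the reference into backward prediction errors, so the ladder weights converge quickly even when the reference noise is coloured; the price is the divisions counted above.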
Article
Full-text available
Classical parameterization techniques in Speaker Identification tasks use the codification of the power spectral density of speech as a whole, not discriminating between articulatory features due to the dynamics of vocal tract (acoustic-phonetics) and those contributed by the glottal source. Through the present paper a study is conducted to separat...
Article
Full-text available
Cognitive Speech Perception is a field of growing interest, as studies in the cognitive sciences have advanced during the last decades, helping to provide better descriptions of the neural processes taking place in sound processing by the Auditory System and the Auditory Cortex. This knowledge may be applied to design new bio-inspired paradigms in t...
Article
This paper describes the design and implementation results of an adaptive Noise Canceller useful for the construction of Robust Speech Enhancement Interfaces. The algorithm used has very good performance for real-time applications. Its main disadvantage is that it requires several division operations, having a high computatio...
Article
Full-text available
The biometric signature derived from the estimation of the power spectral density singularities of a speaker’s glottal source is described in the present work. It consists of the collection of peak-trough profiles found in the spectral density, as related to the biomechanics of the vocal folds. Samples of parameter estimations from a set of 100 n...
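As a rough illustration of what a "peak-trough profile" involves, the sketch below locates the local maxima and minima of a sampled spectral envelope. Which extrema the signature keeps, and how they are normalised, is not described in this summary, so the helper name and its output format are hypothetical.

```python
def peak_trough_profile(psd):
    """Return local maxima and minima of a sampled spectral envelope as
    (index, value, 'peak' | 'trough') tuples. Flat plateaus are not reported."""
    extrema = []
    for k in range(1, len(psd) - 1):
        if psd[k - 1] < psd[k] > psd[k + 1]:
            extrema.append((k, psd[k], "peak"))
        elif psd[k - 1] > psd[k] < psd[k + 1]:
            extrema.append((k, psd[k], "trough"))
    return extrema
```

In practice the envelope would first be smoothed, so that noise does not create spurious extrema.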
Conference Paper
Full-text available
The biometric voice signature may be derived from voice as a whole, or from the separate vocal tract and glottal source after inverse filtering extraction. This last approach has been used by the authors in early work, where it has been shown that the biometric signature obtained from the glottal source provides a good description of speaker’s char...
Conference Paper
Full-text available
Voice-controlled devices provide a smart solution to operate add-on appliances in a car. Although speech recognition appears to be a key technology for producing useful end-user interfaces, the amount of acoustic disturbance existing in automotive platforms usually prevents satisfactory results. In most cases, noise reduction techniques involvin...
Article
Full-text available
Voice disorders are a source of increasing concern, as normal voice quality is a social demand for at least one third of the population in developed countries, for whom voice is an essential resource in professional practice. In addition, the growing exposure to certain pathogenic factors such as smoking, alcohol abuse, air pollution, and acous...
Conference Paper
A comprehensive view of speech and voice technologies now demands better and more complex tools capable of extracting as much knowledge about sound and speech as possible. Many knowledge-extraction tasks from speech and voice share well-known procedures at the algorithmic level from the point of view of bio-inspiration. The same resources em...
Conference Paper
Full-text available
In the present work, a biometric pattern of a speaker's glottal source is defined, based on the power spectral density profile of the mucosal wave correlate residual, estimated after removing the vocal tract transfer function by inverse filtering. This pattern may be parameterized according to its peak-trough profile, whic...
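Inverse filtering removes an estimate of the vocal tract transfer function from the voice signal. A common textbook way to obtain that estimate is linear prediction (autocorrelation method with the Levinson-Durbin recursion), sketched below. This is an assumption about the kind of estimator involved, not the authors' exact procedure; practical glottal inverse filtering usually adds refinements such as lip-radiation compensation, which are omitted here.

```python
import numpy as np

def lpc_autocorrelation(x, order):
    """LPC polynomial a = [1, a1, ..., ap] via autocorrelation + Levinson-Durbin."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # right side is evaluated first
        a[i] = k
        err *= 1.0 - k * k                      # prediction error power
    return a

def inverse_filter(x, a):
    """Apply A(z) as an FIR filter: e[n] = sum_k a[k] * x[n - k]."""
    return np.convolve(x, a)[:len(x)]
```

On a synthetic all-pole signal, the recursion recovers the generating coefficients and the inverse-filtered residual is close to the white excitation, which is the role the glottal source plays in the source-filter model.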
Conference Paper
In this paper we introduce the design of an HMM soft-core for the recognition stage of a speaker-independent isolated-word recognition system, which may be useful in many applications. The design is oriented towards reusability, with the number of vocabulary words, the number of hidden states and the number of data-format bits parameterized. Stru...
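In isolated-word recognition, each vocabulary word has its own HMM and the recognition stage selects the word whose model assigns the highest likelihood to the observation sequence. A minimal software sketch of that stage follows, assuming discrete observations and the log-domain forward algorithm; the hardware core may instead use Viterbi scoring or fixed-point arithmetic, which the summary does not specify.

```python
import numpy as np

def log_likelihood(obs, log_pi, log_A, log_B):
    """Forward algorithm in the log domain for a discrete-observation HMM."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log a_ij) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.logaddexp.reduce(alpha))

def recognize(obs, word_models):
    """Return the vocabulary word whose HMM gives the highest likelihood.

    word_models maps each word to (log_pi, log_A, log_B).
    """
    return max(word_models, key=lambda w: log_likelihood(obs, *word_models[w]))
```

Working in the log domain avoids the numerical underflow that products of many small probabilities would otherwise cause.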
Conference Paper
The biometric voice signature may be derived from voice as a whole, or from the separate vocal tract and glottal source after inverse filtering extraction. This last approach has been used by the authors in early work, where it has been shown that the biometric signature obtained from the glottal source provides a good description of speaker’s char...
Article
In this paper, we propose a portable hardware design that implements a Fast Fourier Transform oriented towards its reuse as a core. The number of samples and the number of data bits are parameterized. The module has been developed using an n-point radix-2 decimation-in-time algorithm. Structural modelling is implemented...
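Parameterizing the data width means every butterfly output must be rounded to the chosen number of bits. The sketch below models one radix-2 DIT butterfly in fixed point; the Q1.(bits-1) format, round-to-nearest with saturation, and the 1/2 scaling per stage (a common way to prevent overflow) are assumptions, since the summary only states that the sample count and bit width are parameters.

```python
def quantize(value, bits):
    """Round to a signed fraction with `bits` total bits (Q1.(bits-1) format),
    saturating to the representable range [-1, 1 - 2**-(bits-1)]."""
    step = 2.0 ** -(bits - 1)
    q = round(value / step) * step
    return min(max(q, -1.0), 1.0 - step)

def scaled_butterfly(a, b, w, bits):
    """Radix-2 DIT butterfly with 1/2 scaling and quantized complex outputs."""
    def q(z):
        return complex(quantize(z.real, bits), quantize(z.imag, bits))
    t = b * w                          # twiddle-factor multiplication
    return q((a + t) / 2), q((a - t) / 2)
```

Scaling by 1/2 at every stage keeps the outputs within range at the cost of dividing the final spectrum by N, which a reference model must take into account when comparing against a floating-point FFT.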