Humberto Maximiliano Torres

  • Doctor of Engineering
  • Research Associate at National Scientific and Technical Research Council

About

53
Publications
16,977
Reads
226
Citations
Introduction
Humberto Maximiliano Torres currently works at the Laboratorio de Investigaciones Sensoriales, National Scientific and Technical Research Council. Humberto does research in Human-computer Interaction, Prosody, and Artificial Intelligence. Their current project is 'Aromo: Argentine Spanish TTS System'.
Current institution
National Scientific and Technical Research Council
Current position
  • Research Associate
Additional affiliations
April 2009 - August 2019
National Scientific and Technical Research Council
Position
  • Research Associate
April 2006 - present
University of Buenos Aires
Position
  • Research Assistant
April 2006 - present
University of Buenos Aires
Position
  • Professor (Assistant)

Publications (53)
Article
The aim of this case was to verify the authorship of voice recordings circulating on a social network, at the request of a foreign non-governmental organization devoted to investigating the falsity or veracity of public information. For this task, the automatic speaker comparison system FORENSIA from BlackVOX-CONICET is used. The ...
Conference Paper
"How deep(fake) is your voice? Understanding the linguistic foundations of deepfakes" is a research project funded by the Spanish Ministry of Science and Innovation. We investigate how to bridge the gap between (1) the latest research in Artificial Intelligence and automatic system design to avoid spoofing attacks, and (2) the linguistic-phonetic k...
Article
Purpose: To provide voice experts with a method for determining the likelihood ratio (LR) from the perceptual evaluation of distinctive voice attribute scores. The proposed method aims to obtain the similarity and typicality judgments made by forensic voice experts (FVEs) during the comparison of attributes in voice pairs. Method: It is based on t...
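The likelihood ratio in this abstract compares how probable an observed score is under the same-speaker hypothesis versus the different-speaker hypothesis: LR = p(score | same speaker) / p(score | different speakers). The Python sketch below illustrates only that ratio. It assumes, purely for illustration, that both score distributions are Gaussian, with hypothetical means and standard deviations on a 0-10 scale; it is not the perceptual method proposed in the article.

```python
import math


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(score: float,
                     same_mean: float, same_std: float,
                     diff_mean: float, diff_std: float) -> float:
    """LR = p(score | same speaker) / p(score | different speakers).

    ASSUMPTION: both score distributions are modeled as Gaussians.
    """
    numerator = gaussian_pdf(score, same_mean, same_std)
    denominator = gaussian_pdf(score, diff_mean, diff_std)
    return numerator / denominator


# Hypothetical (mean, std) of similarity scores for each hypothesis.
SAME_SPEAKER = (7.5, 1.2)
DIFFERENT_SPEAKERS = (3.0, 1.5)

if __name__ == "__main__":
    for s in (2.0, 5.0, 8.0):
        lr = likelihood_ratio(s, *SAME_SPEAKER, *DIFFERENT_SPEAKERS)
        print(f"score={s:.1f}  LR={lr:.3g}  log10(LR)={math.log10(lr):+.2f}")
```

An LR above 1 supports the same-speaker hypothesis and an LR below 1 supports the different-speaker hypothesis; in practice the two distributions would be estimated from reference data rather than fixed by hand.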
Article
Full-text available
Abstract: Objective: to obtain the same/different-speaker discriminant distributions used to typify the male-voice model for computing the likelihood ratio (LR). Experimental development: it begins with the construction of a sentence database comprising the recording of 2520 sentences in different sessions and ...
Article
Full-text available
Abstract: Objective: to obtain the probability distributions of the responses to the evaluation of female voice pairs from the same and from different speakers, which make it possible to develop the indirect method for computing the likelihood ratio (LR). Experimental development: it begins with the creation of a database that ...
Article
In recent years, a field of study has developed around how we understand and produce grammatical structures and units, including intra- and inter-sentential relations. The analysis of prosody, one of the variables most involved in comprehension and in the mapping of structures (Frazier et al., 2006),...
Preprint
Full-text available
Objective: To inform the speech-language pathology community and forensic voice experts about the development of a system that transforms attribute scores into likelihood ratios. Development: Human evaluation in forensic cases is the necessary complement to current automatic and semi-automatic voice identification methods and adds to the ...
Article
This paper addresses the issue of local disturbances in the fundamental frequency contour of speech, caused by the articulation of voiced/unvoiced consonant phonemes. Depending on the intended use of the F0 contour, these disturbances are usually eliminated by a filtering, smoothing or stylization procedure. These procedures that seek to preserve o...
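For reference, the conventional smoothing approach this abstract mentions can be as simple as a running median over voiced frames. The Python sketch below is only that generic baseline, not the procedure proposed in the paper. It assumes a frame-based F0 contour in Hz where 0 marks an unvoiced frame; the function name and window size are hypothetical.

```python
from statistics import median


def median_smooth_f0(f0, window=5):
    """Median-filter a frame-based F0 contour (Hz).

    ASSUMPTION: frames with f0 <= 0 are unvoiced; they stay at 0.0 and
    are excluded from the neighborhoods used to compute the medians.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, value in enumerate(f0):
        if value <= 0:
            smoothed.append(0.0)  # keep unvoiced frames unvoiced
            continue
        neighborhood = [v for v in f0[max(0, i - half): i + half + 1] if v > 0]
        smoothed.append(float(median(neighborhood)))
    return smoothed


if __name__ == "__main__":
    # Hypothetical contour with a consonant-induced spike at frame 2.
    contour = [120, 121, 180, 122, 123, 0, 0, 118, 117]
    print(median_smooth_f0(contour, window=3))
```

A median filter removes isolated spikes while preserving steps better than a moving average, but like any smoothing it can also flatten genuine local F0 movements, which is the kind of trade-off the paper examines.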
Conference Paper
Full-text available
Objective: To present officers of the court, judges, court clerks, defense attorneys, prosecutors, and experts with a good-practice guide for requesting and conducting forensic voice identification examinations. This guide is expected to help standardize the steps and rules to be followed in order to achieve greater certainty in all ...
Article
Full-text available
This paper introduces Emilia, a speech corpus created to build a female voice of the Spanish spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit-selection text-to-speech system that uses diphones as synthesis units. The key requirements and design criteria for Emilia were: to synthesize any text in Spanish into high-qual...
Article
Full-text available
Background: Colorectal cancer is one of the most prevalent pathologies. Its prognosis is linked to early detection and treatment. Currently, diagnosis is performed by histological analysis of polyp biopsies, followed by morphological classification. Kudo's pit pattern classification is frequently used for the differentiation of neoplastic col...
Article
Objective: To present and test a production-matching method with external references, aimed at reducing the inter-rater variability of expert evaluations. Method: It consists of adjusting the quality attribute levels of a synthetic vowel so that they simultaneously match the attributes of the natural patient vowel (NPV). In an initial experiment, se...
Conference Paper
Full-text available
This paper explores the relationship between perceived syllable prominence and the acoustic properties of a speech utterance. It is aimed at establishing a link between the linguistic meaning of an utterance in terms of sentence modality and focus and its underlying prosodic features. Our acoustic analysis compares traditional parameters modified...
Article
Fujisaki's intonation model parameterizes the F0 contour efficiently and, because of its strong physiological basis, has been successfully tested in different languages. One problem that has not been fully addressed is the extraction of the model's parameters, i.e., given a sentence, which model parameter values best describe its intonation. Most...
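For readers unfamiliar with the model, Fujisaki's superpositional formulation generates the logarithm of F0 as a base value plus phrase components (impulse responses) and accent components (step responses): ln F0(t) = ln Fb + Σ Ap·Gp(t − T0) + Σ Aa·[Ga(t − T1) − Ga(t − T2)], with Gp(t) = α²·t·e^(−αt) and Ga(t) = min[1 − (1 + βt)·e^(−βt), γ] for t ≥ 0 (both zero for t < 0). The Python sketch below evaluates that forward model only; it does not perform the parameter extraction discussed in the abstract. The defaults α = 3/s, β = 20/s, γ = 0.9 are commonly cited values, and the commands in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PhraseCommand:
    onset: float      # T0 in seconds
    amplitude: float  # Ap


@dataclass
class AccentCommand:
    onset: float      # T1 in seconds
    offset: float     # T2 in seconds
    amplitude: float  # Aa


def phrase_response(t: float, alpha: float) -> float:
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t: float, beta: float, gamma: float) -> float:
    """Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """F0 in Hz at time t, from base frequency fb and the given commands."""
    log_f0 = math.log(fb)
    for p in phrases:
        log_f0 += p.amplitude * phrase_response(t - p.onset, alpha)
    for a in accents:
        log_f0 += a.amplitude * (accent_response(t - a.onset, beta, gamma)
                                 - accent_response(t - a.offset, beta, gamma))
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands for a short declarative utterance.
    phrases = [PhraseCommand(onset=-0.2, amplitude=0.5)]
    accents = [AccentCommand(onset=0.3, offset=0.6, amplitude=0.4)]
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}s  F0={fujisaki_f0(t, 80.0, phrases, accents):.1f} Hz")
```

Parameter extraction is the inverse problem: searching for the command timings and amplitudes whose generated contour best fits a measured F0 curve.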
Conference Paper
Full-text available
This paper explores the relationship between perceived syllable prominence and the acoustic properties of a speech utterance. It is aimed at establishing a link between the linguistic meaning of an utterance in terms of sentence modality and focus and its underlying prosodic features. Applications of such knowledge can be found in computer-based pro...
Research
Full-text available
This brief technical report presents a new method for the subjective evaluation of voice, in which intra- and inter-rater variability is lower than that observed with estimation methods. The original published paper is cited. (In Spanish and English)
Article
To explore the perceptual evaluation of jitter produced by fundamental frequency (F0) variation in a sustained vowel /a/ using two different methods, one based on listeners' internal references and the other based on external references provided by the experimenter. We used two methods: one is magnitude estimation-converging limits (ME-CL), whic...
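Jitter quantifies cycle-to-cycle variation of the glottal period. A common acoustic definition, local jitter, is the mean absolute difference between consecutive periods divided by the mean period, expressed as a percentage. The Python sketch below computes that measure and builds a hypothetical stimulus by randomly perturbing each period of a fixed-F0 vowel; the uniform perturbation scheme is an assumption for illustration, not the stimulus design used in the article.

```python
import random
from statistics import mean


def local_jitter(periods):
    """Local jitter (%): mean |T_i - T_(i+1)| divided by mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * mean(diffs) / mean(periods)


def perturbed_periods(f0_hz, n, perturbation, seed=0):
    """Hypothetical stimulus: n periods of a vowel at f0_hz, each scaled by a
    uniform random factor in [1 - perturbation, 1 + perturbation]."""
    rng = random.Random(seed)
    t0 = 1.0 / f0_hz
    return [t0 * (1.0 + rng.uniform(-perturbation, perturbation)) for _ in range(n)]


if __name__ == "__main__":
    for p in (0.0, 0.005, 0.01, 0.02):
        periods = perturbed_periods(f0_hz=120.0, n=200, perturbation=p)
        print(f"perturbation={p:.3f}  local jitter={local_jitter(periods):.3f}%")
```

With a fixed seed, the same random draws are reused at every perturbation level, so the measured jitter grows monotonically with the perturbation depth, which is convenient when building graded stimuli.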
Article
This work defines a guide for phonetic segmentation and transcription using the SAMPA alphabet (Speech Assessment Methods Phonetic Alphabet). Phonetic transcription with SAMPA is increasingly used in speech technologies for preparing acoustic data to be used in training recognition...
Article
Full-text available
This paper proposes two novel approaches for parameter estimation of a superpositional intonation model. These approaches present linguistic and paralinguistic assumptions for initializing a pre-existing standard method. In addition, all restrictions on the configuration of commands were eliminated. The proposed linguistic hypotheses can be based o...
Article
We evaluate here the application of two intonational models (quantitative and phonetic) to the analysis of an Argentine Spanish database of 741 broad-focus declarative sentences. The analytic model is the superpositional model proposed by Fujisaki (2003) for several languages. The phonetic model is the result of the application of a labelling metho...
Conference Paper
Full-text available
This work summarizes the perceptual evaluation of our recently developed text-to-speech system (S1), based on unit concatenation. We compare it with two commercially available systems (S2 and S3) using three different evaluation methods. One is the P.85 recommendation by the International Telecommunication Union (ITU), the second method, called Syn...
Conference Paper
Full-text available
This paper introduces the Aromo text-to-speech system for Argentine Spanish, which was designed for telephony applications. We implement a speech synthesis technique based on unit selection and concatenation. The system operates as a client-server engine that supports MRCP, SIP, and SSML. The perceptual evaluation results show that Aromo's voice...
Conference Paper
In this paper we model the segmental duration of Spanish spoken in Buenos Aires, with a view to its application in a text-to-speech system. The work was performed on two hand-labeled databases. We use artificial neural networks as predictors, and all the input features can be extracted automatically from the input text. We experimented with a neural ne...
Conference Paper
The goal of this study is to explore the position of pitch accent commands relative to the accented syllable in final and non-final words for absolute interrogative sentences in Spanish. Fundamental frequency parameters are obtained from the Fujisaki model. Results indicate that accent commands for three-syllable words in final position are associa...
Conference Paper
This paper presents a novel method for rescoring the n-best recognition hypotheses using intonation knowledge. The model synthesizes the f0 contours for each of the n-best hypotheses and estimates an intonative matching index between the synthetic shapes and the real f0 contour. This index is applied in the rescoring process, and can be viewed as a...
Article
Full-text available
The goal of this study is to explore the association between tonal accents and fundamental frequency parameters obtained from the Fujisaki model in Buenos Aires Spanish. Results indicate that three-syllable words in final position which are stressed on the third syllable are associated with early peaks. In non-final word accents, late peaks are fou...
Conference Paper
Full-text available
This work presents an approach for parameter estimation and prediction of the Fujisaki model for Argentine Spanish. Language hypotheses were proposed for estimation and tested by means of genetic algorithms. These hypotheses were validated by comparison of the estimation performance relative to the standard method. Prediction was then calculated ba...
Article
A quick intelligibility test for noisy environments is presented and evaluated in this article. It is designed for children aged 6 to 12 at schools or institutions where audiometric equipment is not readily available. Word identification ability is evaluated on sentences presented through headphones under controlled noise conditions. The test is pr...
Article
Synthesis by concatenation of natural speech improves perceptual results when phonemes and syllables are segmented at places where spectral variations are small [Klatt, D., 1987. Review of text-to-speech conversion for English. J. Acoust. Soc. Am 82 (3), 737–793]. An automatic segmentation method is explored here, using a tool based on a combinatio...
Article
The detection of changes in the parameter values of a nonlinear dynamic system is a field of study with multiple applications. In this paper, we explore a variant of the automatic method for detecting and clustering slight parameter variations in nonlinear dynamic systems proposed by Torres et al. [Automatic detection of slight changes in nonlinear dynamica...
Article
Full-text available
Until now, phonetography has been applied most successfully to patients with some degree of musical training. In its simplest form, the professional conducting the study uses a musical keyboard to generate the reference semitones and a sound pressure level (SPL) meter to measure the maximum and minimum levels in dB ...
Conference Paper
Full-text available
This work evaluates how efficiently different word classes (parts of speech), together with normalized vs. non-normalized counts of syllable and word occurrences, predict non-orthographic breaks in an Argentine Spanish database designed for developing the prosody component of a text-to-speech system. Within a set of 741 sentences, regression tre...
Article
Full-text available
We evaluate here the application of two intonational models (quantitative and phonetic) to the analysis of an Argentine Spanish database of 741 broad-focus declarative sentences. The analytic model is the superpositional model proposed by Fujisaki (2003) for several languages. The phonetic model is the result of the application of a labelling metho...
Conference Paper
We evaluate here the application of two intonational models (quantitative and phonetic) to the analysis of an Argentine Spanish database of 741 broad-focus declarative sentences. The analytic model is the superpositional model proposed by Fujisaki (2003) for several languages. The phonetic model is the result of the application of a labelling metho...
Conference Paper
Full-text available
The goal of this project was the design and realisation of a database to be used in an automatic speech recognition system for a fixed telephone network. One thousand speakers, native to five Argentine dialectal regions, were recorded. Each speaker answered five questions and read 38 texts, which consisted of numbers, names, last names, corporation...
Conference Paper
Full-text available
This project involved the design and development of a relational SQL-based database to generate an intonational model for an Argentine Spanish text-to-speech system. The first stage in populating the database involved the bulk loading of text, divided into three co-indexed files: sentences, orthographic words and phonological syllables. A...
Conference Paper
Full-text available
This work uses Mel-scale delta cepstral coefficients and wavelet and wavelet packet transforms to feed an automatic speaker identification system based on neural networks. Different alternatives are tested for the neural-network classifier, achieving very good performance for closed groups of speakers in a tex...
Article
Full-text available
An automatic segmentation method is tested here, which uses a combination of entropy coding, continuous multiresolution analysis, and Kohonen's self-organizing maps. The method assumes no boundaries imposed by any linguistic unit. The resulting waveforms represent phone chains dominated by spectral dynamic structures. Each obtained acoustic...
