Laura Fernández Gallardo

Technische Universität Berlin | TUB · Department of Software Engineering and Theoretical Computer Science

PhD

About

39
Publications
5,722
Reads
166
Citations
Introduction
When we are confronted with unknown voices, e.g. via telephone calls, we unintentionally try to compile a picture of the speakers from their voice characteristics. The impressions formed may determine our decisions and attitudes towards the speakers and their messages. How are human perceptions and automatic estimations of speaker personality and voice likability influenced by modern speech communication channels? How can these influences be assessed and predicted?
Additional affiliations
June 2015 - present
Technische Universität Berlin
Position
  • PostDoc Position
November 2011 - May 2015
University of Canberra
Position
  • PhD Student
Education
November 2011 - May 2015
University of Canberra
Field of study
  • Thesis: "Human and Automatic Speaker Recognition over Telecommunication Channels"
September 2005 - July 2011
University of Granada
Field of study
  • Telecommunications Engineering

Publications (39)
Conference Paper
Full-text available
Wideband communications permit the transmission of an extended frequency range compared to the traditional narrowband. While benefits for automatic speaker recognition can be expected, the extent of the contribution of the additional bandwidth in wideband is still unclear. This work compares the i-vector speaker verification performances employing...
Conference Paper
Full-text available
It is well known that the speaker discriminative information is not equally distributed over the spectral domain. However, it is still not clear whether that distribution is altered when the speech is transmitted through telecommunication channels, which introduce different kinds of degradations. In this paper we address the analysis of different f...
Conference Paper
Full-text available
With the advent of wideband technologies (50–7,000 Hz), higher transmitted signal quality can be achieved in contrast to traditional narrowband communications (300–3,400 Hz). It is commonly acknowledged that the low frequencies incorporated contribute to increased naturalness, presence, and comfort, whereas the high frequency extension facilitate...
Conference Paper
Full-text available
Voice biometrics are frequently exposed to channel degradations of transmitted speech and to channel mismatch between enrollment and test utterances, which cause speaker recognition systems to perform poorly. In this paper, the influence of channel bandwidth and speech coding on speaker verification is assessed employing the state-of-the-art i-vec...
Conference Paper
Full-text available
Together with the variety of networks, diverse terminals and devices, such as telephones with handset or hands-free mode, mobile phones and headsets, are commonly available for everyday calls. We conducted an auditory test to examine the combined influence of these user interfaces, audio bandwidths, coding schemes and packet loss on human speaker i...
Article
Modern human-computer interaction systems may not only be based on interpreting natural language but also on detecting speaker interpersonal characteristics in order to determine dialog strategies. This may be of high interest in different fields such as telephone marketing or automatic voice-based interactive services. However, when such systems e...
Conference Paper
Crowdsourcing provides an exceptional opportunity for the rapid collection of human input for data acquisition and labelling. This approach has been adopted in multiple domains and researchers are now able to reach a demographically diverse audience at low cost. However, the question remains whether the results are still valid and reliable....
Conference Paper
Full-text available
Human perceptions of speaker characteristics, needed to perform automatic predictions from speech features, have generally been collected by conducting demanding in-lab listening tests under controlled conditions. Concurrently, crowdsourcing has emerged as a valuable approach for running user studies through surveys or quantitative ratings. Micro-...
Conference Paper
Full-text available
A great number of investigations on person characterization rely on the assessment of the Big-Five personality traits, a prevalent and widely accepted model with a strong psychological foundation. However, in the context of characterizing unfamiliar individuals from their voices only, it may be hard for assessors to determine the Big-Five traits base...
Conference Paper
Full-text available
The performance of automatic speech recognition based on coded-decoded speech heavily depends on the quality of the transmitted signals, determined by channel impairments. This paper examines relationships between speech recognition performance and measurements of speech quality and intelligibility over transmission channels. Different to previous...
Article
Micro-task crowdsourcing has emerged as a powerful approach for rapid collection of user input from a large set of participants at low cost. While previous studies have investigated the acceptability of the crowdsourcing paradigm for obtaining reliable perceptual scores of audio or video quality, this work examines the suitability of crowdsourcing...
Article
It is important to realize in which environments a speech quality evaluation test can be carried out to achieve reliable results outside the laboratory. We report on our current activity on using microphone signals for evaluating environmental conditions in crowdtesting. In order to analyze the impact of environmental noise, a two-phase experiment...
Conference Paper
Full-text available
For participants of multi-party teleconferences, it can be challenging to attribute what was said to the individual talkers. Spatial audio reproduction may help to overcome this issue. Our work investigates the potential improvement for speaker recognition using binaural synthesis. Here, the individual talkers are distributed to different simulated...
Conference Paper
Full-text available
The ongoing process for recording a personality and likability database in German is motivated and described. Overall, high quality and consistency among recordings are pursued, in order to avoid possible biases when rating speaker characteristics and low performance when automatically detecting them. Prescribed and spontaneous human-human dialogs a...
Conference Paper
Full-text available
The research on acoustic correlates and on the automatic classification of voice likability commonly faces the undesirable low agreement between human raters. This may partly hinder the good performance of automatic likability detection techniques. Whereas only Likert scales have been employed for subjective likability assessments of utterances, th...
Conference Paper
Full-text available
The Social Relations Model is well-known for analyses of interpersonal attraction. As a novelty in this paper, the model is applied to assess different effects on likability ratings from speech only. A group of 30 unacquainted participants is considered in our experiment. Their voices were recorded and transmitted through communication channels, a...
Chapter
Telecommunication networks have been improved at a rapid pace in recent years. The capabilities of WB and SWB transmissions, e.g. in VoIP, were shown to be superior to those of NB regarding perceived quality. In addition, evidence was found of speaker-discriminative information being conveyed by frequencies beyond 4 kHz of microphone signals. A...
Chapter
Voice biometrics are frequently exposed to channel degradations of transmitted speech, which cause speaker recognition systems to perform poorly. Particularly, there may exist a severe mismatch between enrolment and test utterances when each of the transmissions presents different characteristics, causing an undesired increase of within-speaker var...
Chapter
It has been widely reported that the information of speaker individuality in the voice is not equally distributed on the speech spectrum, and that this is attributed to the occurrence of different phoneme events (e.g. [115, 156, 170]). Based on this finding, a variety of methods have been developed to conveniently extract the most useful informatio...
Chapter
Speech communication channels and their components (e.g. codecs) are generally designed for optimum perceived speech quality. However, transmission channels should also preserve principal speaker-specific characteristics that enable acceptable speaker identification performance by end listeners. This chapter proposes a first step towards effective...
Chapter
The phonemes that permit more accurate human speaker recognition are determined by means of speaker verification experiments, focusing on the differences in performance when the stimuli are presented to listeners in NB or in WB. It is known that nasal consonants and vowels are more effective than other phonemes for human speaker recognition [6]. Ho...
Chapter
The effects of different transmission channel impairments on the human speaker recognition performance are assessed in this chapter by conducting two listening tests. Comparisons between NB, WB, and SWB channels are shown, as well as how the human performance is affected by the degradations introduced by speech coding, random packet loss, and elect...
Article
This work addresses the evaluation of the human and the automatic speaker recognition performances under different channel distortions caused by bandwidth limitation, codecs, and electro-acoustic user interfaces, among other impairments. Its main contribution is the demonstration of the benefits of communication channels of extended bandwidth, toge...
Conference Paper
Full-text available
This contribution discusses how the recognizability of speakers improves in the transition from narrowband to wideband or super-wideband telephone transmission. To this end, an experimental paradigm was first defined in which speakers known within a group were, by test participants and on the basis of segments of different length...
Conference Paper
Full-text available
It is commonly acknowledged that the introduction of wideband and super-wideband speech transmission in Voice-over-IP leads to an improved overall quality compared to traditional narrowband telephony. However, beyond overall quality, dimensions such as coloration, continuity, noisiness, loudness, human speaker identification ability, as well as aut...
Conference Paper
Full-text available
Past studies have shown evidence of important speaker-specific content in the higher frequencies of the spectrum, which are filtered out by narrowband channels. Besides, wideband transmissions, which are gaining ground over narrowband communications, offer an extended range of frequencies which account not only for better speech quality and intelli...
Technical Report
ITU-T Contribution COM 12-198 (2014). Operational Quality Estimator: Comparison of Transmission Quality Dimensions of Narrowband, Wideband, and Super-Wideband Channels, Deutsche Telekom AG (Authors: S. Möller, F. Köster, L. Fernández Gallardo, M. Wagner), ITU-T SG12 Meeting, 2-11 Sept. 2014, CH-Geneva. Download Link: http://www.itu.int/md/T13-SG12...
Conference Paper
Full-text available
The automatic detection of people's identity and characteristics such as age, gender, emotion and personality from their voices generally requires the transmission of the speech to remote servers that perform the recognition task. This transmission may introduce severe distortions and channel mismatch that degrade the system performance or vary the...
Conference Paper
Full-text available
Telecommunication systems available today allow efficient voice transmission through channels of different audio bandwidths and terminated with different user interfaces. However, the sending and receiving user interface, the bandwidth limitation and the effects of lossy signal compression degrade the quality of the received signal and impede an un...
Technical Report
ITU-T Contribution COM 12-42 (2013). Human Speaker Identification Over Transmission Channels of Different Bandwidths and Impairments, Deutsche Telekom AG (Authors: L. Fernández Gallardo, S. Möller, M. Wagner), ITU-T SG12 Meeting, 19 – 28 Mar. 2013, CH-Geneva. Download link: http://www.itu.int/md/T13-SG12-C-0042/en
Conference Paper
Full-text available
The traditional Public Switched Telephone Network (PSTN) is the primary platform for voice communications and is commonly limited to narrowband (NB). It has been shown, however, that wideband (WB) communications produce a higher quality speech signal compared to conventional NB. Additionally, the channel bandwidth plays a critical role in enabling...
Conference Paper
Current speaker recognition applications involve the authentication of users by their voices for access to restricted information and privileges. The speech signal is often transmitted to the recognizer through communication channels presenting different transmission characteristics. The aim of this paper is to study the effects of speech bandwidth...
Conference Paper
The grasping skill is an indispensable quality for general service robotics. In a home-like natural environment, manipulated objects may be unknown in advance, which prevents the use of a combination of traditional grasp planning and visual pose estimation to realize grasping. Stereo vision is an inexpensive and relatively general sensor for 3-D ob...

Project (1)
Project
When listening to unknown voices, humans tend to make spontaneous inferences about the perceived personality and voice likability of their interlocutors. The voices heard are generally transmitted through communication channels, e.g. in telephone-based speech applications. However, the study of transmission channel effects has not yet been addressed in previous investigations of human and automatic detection of personality traits and likability. Besides, regarding the automatic prediction of these speaker characteristics, the binary classification task has principally been tackled despite the continuous nature of the perceptive ratings.

This project examines the influence of transmission channels of different settings, such as bandwidth, codec and user interface, on speaker personality and likability detection by humans and machines. Conversational speech data in German, needed for the proposed analyses, are recorded. On the human side, crowdsourcing can be employed to rapidly and reliably gather listeners’ assessments of large amounts of transmitted speech material. On the automatic side, regression models are considered for personality and likability prediction, employing state-of-the-art techniques such as deep neural networks. The validity of speech quality measures as predictors of these speaker characteristics is also studied.

The outcomes will elucidate which transmission channels can preserve the voice properties that determine the perceived personality and likability, and how these can be automatically predicted. This can be used in applications based on telephone speech which aim at estimating perceived speaker characteristics and at foreseeing subsequent user behavior.

Research questions:
1. Effects of transmission channels on human personality and likability perceptions: How do different transmission channel impairments influence the speaker personality perceived by a listener? And how do they alter the voice likability perceptions?
2. Effects of transmission channels on automatic systems: How do different transmission channel impairments affect the automatic prediction of speaker personality? And how do they modify the automatic prediction of voice likability?
3. Relations between transmitted speech quality and speaker personality and likability: Do subjective and instrumental speech quality measures correlate with perceived personality traits? And with the perceived likability? Can speech quality measures assist the prediction of perceived personality and likability?

Time Frame: 02/2016 - 06/2018
Funding by: Deutsche Forschungsgemeinschaft (DFG)
http://www.qu.tu-berlin.de/menue/research/running_projects/detection_of_speaker_personality_and_likability