Anders Eriksson · Stockholm University | SU · Department of Linguistics
MSc, PhD
About
Publications: 75
Reads: 39,115
Citations: 1,520
Additional affiliations
March 2014 - present
January 2009 - February 2014
Publications (75)
This study assessed the influence of speaker similarity and sample length on the performance of an automatic speaker recognition (ASR) system utilizing the SpeechBrain toolkit. The dataset comprised recordings from 20 male identical twin speakers engaged in spontaneous dialogues and interviews. Performance evaluations involved comparing identical t...
This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker's discriminatory performance concerning different acoustic-phonetic estimate...
This pilot study set out to assess the speaker discriminatory power asymmetry regarding parameters from different phonetic dimensions in spontaneous speech, i.e., spectral, melodic, and temporal. The speech material consisted of spontaneous telephone conversations between siblings. The participants were 20 male subjects, Brazilian Portuguese speake...
This study aimed to analyze the impact of the amount of data on the discriminatory performance of acoustic-phonetic parameters, some of which are frequently assessed in forensic speaker comparisons. Parameters from three distinct phonetic domains were considered, namely, spectral, melodic, and temporal, which were assessed separately within the sam...
The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings consisted of spontaneous dialogues between twin pairs, conducted over mobile phones and simultaneously recorded directly with professional headset microphones...
Objective:
To assess the speaker-discriminatory potential of a set of fundamental frequency estimates in comparisons within identical twin pairs and in cross-pair comparisons (i.e., among all speakers).
Participants:
A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recrui...
The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels’ acoustic distances on the discriminatory patterns of formant frequencies were also assessed. Acoustic extraction...
This work presents an experimental investigation of expressive speech that integrates perceptual and acoustic analysis procedures. The work focuses on voice quality and vocal dynamics. Speech samples from the four main personality-distinct characters in the animated feature film “Zootopia” dubb...
In this study, we outline a methodology to quantify the degree of similarity between pairs of f0 distributions based on the Anderson-Darling measure that underlies its namesake goodness-of-fit test. The procedure emphasizes differences due to more fine-grained f0 modulations rather than differences in measures of central tendency, such as the mean...
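A minimal Python sketch of a two-sample Anderson-Darling statistic for comparing two sets of f0 values. It uses the standard rank-based two-sample form A² = (1/mn) Σ_{j=1}^{N−1} (N·M_j − j·m)² / (j(N−j)), where m and n are the sample sizes, N = m + n, and M_j counts values from the first sample among the j smallest pooled values; this form assumes continuous data without ties. The abstract is truncated and does not say how the authors normalise the measure, so the `center` option (subtracting each sample's median so that shape differences weigh more than central tendency) is a hypothetical illustration, not the published procedure. The f0 values are invented.

```python
from statistics import median

def ad_two_sample(x, y, center=False):
    """Two-sample Anderson-Darling statistic A^2 (rank form, assumes no ties)."""
    if center:
        # Hypothetical normalisation: remove each sample's median so that
        # differences in shape weigh more than differences in central tendency.
        mx, my = median(x), median(y)
        x = [v - mx for v in x]
        y = [v - my for v in y]
    m, n = len(x), len(y)
    N = m + n
    # Pool and sort both samples, tagging each value with 1 (from x) or 0 (from y).
    pooled = sorted([(v, 1) for v in x] + [(v, 0) for v in y])
    total = 0.0
    m_j = 0  # number of x-values among the j smallest pooled values
    for j, (_, from_x) in enumerate(pooled[:-1], start=1):
        m_j += from_x
        total += (N * m_j - j * m) ** 2 / (j * (N - j))
    return total / (m * n)

# Hypothetical f0 samples (Hz): similar spread, clearly different means.
a = [100.0, 110.0, 120.0, 130.0]
b = [150.0, 161.0, 172.0, 179.0]
print(ad_two_sample(a, b))               # large: the samples do not overlap
print(ad_two_sample(a, b, center=True))  # much smaller once medians are removed
```

With `center=False` the statistic is dominated by the shift in mean f0; with the (assumed) median removal, what remains reflects differences in the shape of the two distributions.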
In this study, we expand on previous experiments designed with the aim of determining the minimum length that an audio sample should have in order for the speaking rate derived from it to be representative of the sample as a whole. We compare two different approaches to establishing that the time series of the cumulative speaking rate calculated ov...
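The abstract is truncated before the two approaches are named, so the Python sketch below shows only one simple, hypothetical way of checking stabilization: compute the cumulative speaking rate (syllables per second counted from the start of the sample) at regular steps and report the earliest time after which it stays within a relative tolerance of the rate for the whole sample. The syllable timestamps, the 1-second step and the 5 % tolerance are illustrative assumptions, not values from the study.

```python
from bisect import bisect_right

def cumulative_rate(syllable_times, duration, step=1.0):
    """Cumulative speaking rate (syllables/s from the start) every `step` seconds."""
    times = sorted(syllable_times)
    series = []
    for k in range(1, int(duration // step) + 1):
        t = k * step
        # Number of syllables whose timestamp falls at or before t.
        series.append((t, bisect_right(times, t) / t))
    return series

def stabilization_time(series, overall_rate, tol=0.05):
    """Earliest time from which the cumulative rate stays within a relative
    tolerance `tol` of `overall_rate` until the end; None if it never does."""
    stable_from = None
    for t, rate in series:
        if abs(rate - overall_rate) <= tol * overall_rate:
            if stable_from is None:
                stable_from = t
        else:
            stable_from = None  # left the band, so restart the search
    return stable_from

# Hypothetical 10-second sample: a slow start followed by a steady 5 syllables/s.
times = [0.5, 1.5] + [2.1 + 0.2 * i for i in range(40)]
series = cumulative_rate(times, duration=10.0)
overall = len(times) / 10.0
print(stabilization_time(series, overall))
```

Requiring the rate to stay in the band until the end (rather than merely entering it once) avoids reporting an early, accidental crossing as the stabilization point.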
This study of lexical stress in English is part of a series of studies, the goal of which is to describe the acoustics of lexical stress for a number of typologically different languages. When fully developed, the methodology should be applicable to any language. The database of recordings so far includes Brazilian Portuguese, English (U.K.), Estoni...
We investigated the long-term mean, median and base value of F0 to estimate how long it takes for their variability to stabilize. Change point analysis was used to locate stabilization points. In one experiment, stabilization points were calculated in recordings of the same text spoken in 26 languages. Average stabilization points are 5 seconds for base...
The major aim was to examine the effect of the perpetrator's tone of voice and time delay on voice recognition. In addition, the effect of two types of voice description interviews intended to strengthen voice encoding was tested. Both 11- to 13-year-olds (n = 160) and adults (n = 148) heard an unfamiliar voice for 40 s. The perpetrator either spok...
The study presented here is one in a series of studies intended to describe the acoustics of word stress for several typologically different languages in a common framework. The idea is that, when fully developed, the methodology should be applicable to any language in the same way regardless of prosodic type. The languages included in the present r...
This work aims to examine three classes of acoustic correlates of lexical stress in Brazilian Portuguese (BP) in three speaking styles: informal interview, phrase reading and word list reading. In the framework of an international collaboration, a parallel corpus was recorded in the three speaking styles with 10 subjects so far in each one of the...
Abstract: This work aims to evaluate the speaker recognition rate in groups of speakers and non-speakers of Brazilian Portuguese and to investigate which type of information (acoustic and/or lexical) is used during the speaker verification task, as well as to offer some considerations on possible acoustic cues that might be interfering...
This chapter focuses on speech analysis in a forensic context. Both so-called aural/acoustic approaches and automatic methods are considered, and their application in a forensic context is described. Forensic casework introduces many challenges not found in the laboratory settings where the applied methods were originally developed. Forens...
This work aims to characterise and compare acoustic correlates of lexical stress in Swedish and Brazilian Portuguese (BP). To do so, a parallel corpus was recorded in three speaking styles (spontaneous, read phrases and read words) with 10 subjects in each language. In both languages, duration, F0 standard deviation and spectral emphasis va...
The SweDia 2000 dialect database (SweDat as we refer to it in our daily work) is a speech database containing recordings of Swedish dialects from all over Sweden and Swedish-speaking communities in Finland. The database contains recordings of at least 12 speakers per dialect from 107 locations. A little over 1300 speakers have been recorded and the...
The aim of the study was to find ways to enhance earwitnesses’ memory for voices and content. Another aim was to evaluate an interview protocol used by the Swedish Security Service. Three different types of interviews were compared: the Cognitive Interview, the Swedish Security Service checklist, and a baseline interview. Both 11–13-year-olds (n =...
This study examined the reliability of earwitnesses using an ecologically realistic experimental set-up. A total of 282 participants, distributed over three age groups (7–9-year-olds vs. 11–13-year-olds vs. adults), were exposed to an unfamiliar voice for 40 seconds. After a two-week delay, they were presented with a 7-voice lineup. Half of the participants w...
The present study aimed to gain insight into the effect of mobile phone quality on voice identification using an ecologically realistic design. A total of 165 participants were exposed to an unfamiliar voice, either directly recorded or mobile phone recorded, for 40 seconds. After a two-week delay, they were asked to identify the target voice in a...
In this chapter we have seen how various types of disguise may affect both the recognition of a speaker by voice and discrimination between unfamiliar speakers. We have also seen how naturally occurring variation like foreign language, dialect or accent may influence recognition and discrimination in similar ways. One may perhaps say that some of t...
The project described here may be seen as a continuation of an earlier project, SweDia 2000, aimed at transforming the database collected in that project into a full-fledged e-science database. The database consists of recordings of Swedish dialects from 107 locations in Sweden and Swedish-speaking parts of Finland. The goal of the present project is...
In a set of experiments, subjects had to estimate the liveliness of an utterance in which variations in the speaker's age, sex, articulation rate, voice register, and liveliness were simulated. The results showed that listeners equate F0 intervals that are equal in semitones, as long as no variation in voice register is involved.
Proceedings of the NODALIDA 2009 workshop Nordic Perspectives on the CLARIN Infrastructure of Language Resources. Editors: Rickard Domeij, Kimmo Koskenniemi, Steven Krauwer, Bente Maegaard, Eiríkur Rögnvaldsson and Koenraad de Smedt. NEALT Proceedings Series, Vol. 5 (2009), 1-5. © 2009 The editors and contributors. Published by Northern European As...
A lie detector which can reveal lie and deception in some automatic and perfectly reliable way is an old idea we have often met with in science fiction books and comic strips. This is all very well. It is when machines claimed to be lie detectors appear in the context of criminal investigations or security applications that we need to be concerned....
This paper deals with the perception of linguistic and paralinguistic qualities conveyed by synthetic vowels produced with an articulatory model in which transfer functions of the French vowels /i y e ø ɛ œ/ characteristic of five growth stages were each combined with five different F0 values. Listeners had to judge the speaker's age and sex in addi...
In an experiment reported previously, subjects rated perceived syllable prominence in a Swedish utterance produced by ten speakers at various levels of vocal effort. The analysis showed that about half of the variance could be accounted for by acoustic factors. Slightly more than half could be accounted for by linguistic factors. Here, we report tw...
Transcription raises speakers' awareness of sound systems and, in the case of language learners, of pronunciation errors. It is also a valuable diagnostic technique in pronunciation competence assessment. Nevertheless, transcription requires extensive practice and feedback, making heavy demands on tutors. Autonomous transcription learning can be...
The Computer Aided Learning (CAL) working group of the SOCRATES thematic network in Speech Communication Science has studied how the Internet is being used and could be used for the provision of self-study materials for education. In this paper we follow up previous recommendations for the design of Internet tutorials with recommendations for thei...
This paper consists of two somewhat disparate parts. In the first part, some experiences from two years of fieldwork are summarized, concentrating, as the subtitle suggests, on the very heart of phonetic fieldwork: the encounters and interviews with the informants. As a result of the fieldwork, the project now has access to recordings from approxima...
The sound pressure level of vowels reflects several nonlinguistic and linguistic factors: distance from the speaker, vocal effort, and vowel quality. Increased vocal effort also involves the emphasis of higher frequency components and increases in F0 and F1. This should allow listeners to distinguish it from decreased distance, which does not have...
In this experiment, subjects had to rate the “prominence” of each of the syllables of 20 versions of the same utterance produced by men, women and children at various levels of vocal effort. The ratings were correlated with measurements of the SPL of the fundamental, spectral emphasis, vowel duration, F0max and F0 rise from the previous s...
A searchable database of speech samples from more than 100 Swedish dialects is being established for use in research and education. Each dialect is represented by at least 12 speakers. The recorded material comprises spontaneous speech as well as words and phrases elicited with a number of specific research goals in mind. This paper summarizes one...
The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3-187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys an...
In previous studies of formant frequency discrimination, variation in the stimuli has mainly concerned the formant frequencies while other factors which may affect formant frequency discrimination have largely been ignored. In most studies, fundamental frequencies typical of adult male speakers have been used. In the study presented here, formant f...
The sound pressure level (SPL) of vowels received by a listener reflects several non-linguistic and linguistic factors: It varies as a function of distance, vocal effort, and vowel quality. Increased vocal effort involves, in addition to an increase in SPL, an emphasis of higher frequency components and increases in F0 and F1. This should allow lis...
The notion that conversational speech is rhythmically organized has been explored in a variety of disciplinary frameworks, but the acoustic phonetic basis for this notion has been problematic. Intensive analysis of a one-minute fragment of dialogue between two American men utilizing segmental and prosodic analysis reveals a hierarchy of units that...
In two previous studies of the phonetics of impersonation (Wretling, 1997; Eriksson & Wretling, 1997), it was shown that whereas acoustic targets in the frequency domain (e.g. frequency means and formant frequencies) were attained with a high degree of similarity, mimicking the timing pattern of a target voice was less successful. In the samples st...
The paper describes a laboratory course for undergraduate students of phonetics developed and used at the Department of Phonetics at Umeå University. The course consists of exercises designed to acquaint the students with basic acoustic analysis tools and methods used in speech research and experimental methods used in the study of speech perceptio...
Accurate measurement of formant frequencies is important in many studies of speech perception and production. Errors in formant frequency estimation by eye, using a spectrogram, or automatically, using linear prediction, have been reported to be as high as 60 Hz at F0 < 300 Hz. This exceeds the typical auditory difference limens (DLs) for formant f...
In order to learn how listeners evaluate F0 excursions, a set of experiments was performed in which subjects had to estimate the liveliness of utterances. The stimuli were obtained by LPC analysis of one natural utterance that was modified by resynthesizing F0, the formant frequencies, and the time scale in order to simulate some of the natural ext...
Published data on the frequency of the voice fundamental (F0) in speech show its range of variation, often expressed in terms of two standard deviations (SD) of the F0 distribution, to be approximately the same for men and women if expressed in semitones, but the observed SD varies substantially between different investigations. Most of the diffe...
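As a small illustration of why semitones make F0 variation comparable across speakers, the Python sketch below converts F0 values in Hz to semitones with st = 12·log2(f / ref) and computes the standard deviation. Because the conversion is logarithmic, multiplying every F0 value by the same factor (e.g. a voice one octave higher) shifts all semitone values by a constant and leaves the SD unchanged, whereas the SD in Hz scales with the factor. The 100 Hz reference and the F0 values are arbitrary assumptions; the reference does not affect the SD.

```python
import math
import statistics

def f0_sd_semitones(f0_hz, ref_hz=100.0):
    """Sample standard deviation of F0 values after conversion to semitones."""
    semitones = [12.0 * math.log2(f / ref_hz) for f in f0_hz]
    return statistics.stdev(semitones)

# Hypothetical F0 samples: the second set is the first raised by one octave.
low = [100.0, 120.0, 150.0, 90.0]
high = [2 * f for f in low]
print(statistics.stdev(low), statistics.stdev(high))  # SD in Hz doubles
print(f0_sd_semitones(low), f0_sd_semitones(high))    # SD in semitones is the same
```

Reporting the range as two SDs in semitones therefore removes the dependence on a speaker's overall F0 level, which is one reason such values come out similar for men and women.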
This study examines some aspects of speech rhythm, with particular reference to Swedish.
A background to the problem area is given and some fundamental problems pointed out. Some theoretical issues are also studied. The question of how to describe and model interstress interval duration is addressed. It is shown, using published data from five lan...
The Computer Aided Learning working group of the SOCRATES thematic network in Speech Communication Sciences has studied how the Internet is being used and could be used for the provision of self-study materials. In this paper we build on our findings and make recommendations that should be useful to any current or potential author of tutorial mat...
Speech communication science is highly multidisciplinary and therefore requires access to a wide variety of materials and specialist knowledge in many different fields, such as linguistics, psychology, anatomy, acoustics and signal processing. For obvious reasons, specialist competence in all these fields is not normally found in a single departme...
The present report is a summary of a talk given to the Workshop on Education and Research in Speech Communication Sciences. The focus of the talk was on web-accessible resources for teaching speech sciences and is closely related to work done within the ERASMUS/SOCRATES Thematic Network in Phonetics and Speech Communication of which the present aut...
Comparison between the way human listeners judge voice similarity and how state-of-the-art GMM-UBM systems for voice recognition compare voices is a little-explored area of research. In this study, groups of informants judged the similarity between voice samples taken from a set of fairly similar male voices that had previously been used in a voice...