
Phil Rose
Australian National University | ANU · Emeritus faculty
PhD Cambridge 82; MA Manchester 74; BA Hons 1st 72
Publications: 74
Reads: 14,078
Citations: 1,815
Introduction
I am a speech scientist with expertise in forensic voice comparison and the phonetics of Asian tone languages. You can view and download all my papers on forensic voice comparison and tones (not just the ones on ResearchGate) from my web-site: philjohnrose.net. My web-site also contains some numerical data-sets for download that I have used in my forensic and tonal research and that might be useful for other researchers in forensic voice comparison and tone.
Publications (74)
Examples are given of forensic voice comparison with higher level features in real-world cases and research. A pilot experiment relating to estimation of strength of evidence in forensic voice comparison is described which explores the use of higher-level features extracted over a disyllabic word as a whole, rather than over individual monosyllable...
An experiment relating to estimation of strength of evidence in forensic voice comparison is described which explores the use of F-pattern and tonal F0 trajectories extracted over a disyllabic word as a whole, rather than over individual monosyllables as conventionally practiced. The first three formants and tonal F0 (measured in Hz) of Cantonese d...
An acoustically-based description is given of the isolation tones and right-dominant tone sandhi in disyllabic words of a male speaker of the Chinese Oūjiāng Wú dialect of Wénchéng. His seven isolation tones show typical Wu complexity, comprising two mid-level, two rising, two falling-rising and one depressed level pitch shapes. Typical too is his...
The first use of likelihood ratios for the evaluation of forensic voice comparison evidence in a real trial in Australia is documented. Important steps in the process of estimating the strength of evidence - from acoustic-phonetic features in the utterances 'yes' and 'not too bad' - are described and explained. These comprise the nature and currenc...
The suitability of vowel cepstral spectra for forensic voice comparison is explored within a likelihood ratio-based framework, and non-technical explanations provided for some basic concepts of cepstral analysis and forensic voice comparison. Non-contemporaneous landline telephone recordings of 297 male Japanese speakers are compared using only two...
Experiments are described which investigate the ability of listeners to identify and distinguish between individuals with similar voices whom they know well. Subjects were tested on speech of varying duration from four familiar male speakers and two foils. Results in identification tasks range from chance for single words to almost perfect for long...
A protocol for the collection of databases of audio recordings for forensic-voice-comparison research and practice is described. The protocol fulfils the following requirements. (1) The database contains at least two non-contemporaneous recordings of each speaker. (2) The database contains recordings of each speaker using different speaking styles,...
The suitability of voiceless fricative spectra for forensic voice comparison is explored within a Likelihood Ratio-based framework. Non-contemporaneous landline telephone recordings of 99 male Japanese speakers are compared using only tokens of their voiceless alveolo-palatal fricative [ç]. A subset of mean-cepstrally-subtracted LPC CCs from the fr...
Auditory and acoustic descriptions are presented for the tones and tone sandhi of two Wenzhou speakers in disyllabic words with tones from the historical ping + qu categories. Differences between the speakers both in isolation tones and tone sandhi are demonstrated. It is suggested that the opaque morphotonemic relationships between sandhi and isol...
The suitability of vowel cepstral spectra for forensic voice comparison is explored within a likelihood ratio-based framework. Non-contemporaneous landline telephone recordings of 297 male Japanese speakers are compared using only two replicates each of their five vowels. 14 cepstrally-mean-subtracted LPC CCs from dc to 5 kHz are used as features....
An acoustic-phonetic forensic-voice-comparison system was constructed using the time-averaged formant values of tokens of 61 male Chinese speakers' /i/, /e/, and /a/ monophthongs as input. Likelihood ratios were calculated using a multivariate kernel density formula. A separate set of likelihood ratios was calculated for each vowel phoneme, and the...
Recently there has been a great deal of concern in forensic science about validity and reliability (accuracy and precision). The log-likelihood-ratio cost (C(llr)), developed for automatic speaker recognition, is increasingly applied as a standard measure of accuracy in forensic voice comparison, but so far there has been little work on developing...
In the last decade, forensic voice comparison has experienced a remarkable paradigm shift [Morrison, Sci. Justice 49, 298-308 (2009)]. Both automatic and traditional phonetic approaches have been developed within the new paradigm. The main difference is that traditional approaches are typically local in both time and frequency domains, with feature...
The consequences of ignoring correlations between features in traditional forensic speaker recognition are investigated. Two likelihood ratio-based discrimination experiments on the same multivariate formant data are described, one taking correlation into account and the other not doing so. The discrimination is performed using Naive Bayes univaria...
Under an ARC Linkage Infrastructure, Equipment and Facilities (LIEF) grant, speech science and technology experts from across Australia have joined forces to organise the recording of audio-visual (AV) speech data from representative speakers of Australian English in all capital cities and some regional centres. The Big Australian Speech Corpus (th...
Despite its many prima facie attractive properties for forensic speaker recognition, F0 is regarded as having limited forensic value due to its large within-speaker variability. However, its forensic use to date has been limited mostly to its long-term mean and standard deviation. This paper examines the discriminatory potential, within a Likelihoo...
Acoustic and auditory data are presented for the citation tones of single speakers from nine sites (eight hitherto undescribed in English) from the little-studied Chuqu subgroup of Wu in East Central China: Lìshuǐ, Lóngquán, Qìngyuán, Lóngyóu, Jìnyún, Qingtián, Yúnhé, Jǐngníng, and Táishùn. The data demonstrate a high degree of complexity, having no...
Contemporary speech science is driven by the availability of large, diverse speech corpora. Such infrastructure underpins research and technological advances in various practical, socially beneficial and economically fruitful endeavours, from ASR to hearing prostheses. Unfortunately, speech corpora are not easy to come by because they are both expe...
Large auditory-visual (AV) speech corpora are the grist of modern research in speech science, but no such corpus exists for Australian English. This is unfortunate, for speech science is the brains behind speech technology and applications such as text-to-speech (TTS) synthesis, automatic speech recognition (ASR), speaker recognition and forensic i...
The likelihood-ratio approach to forensic speaker recognition seeks to determine the likelihood that one would observe the evidence, the acoustic difference between suspect and offender speech samples, under the hypothesis that they were produced by the same speaker versus under the hypothesis that they were produced by different speakers. Before t...
Forensic DNA profiling is acknowledged as the model for a scientifically defensible approach in forensic identification science, as it meets the most stringent court admissibility requirements demanding transparency in scientific evaluation of evidence and testability of systems and protocols. In this paper, we propose a unified approach to forensi...
A large-scale forensic discrimination experiment is described that investigates how well same-speaker speech samples can be discriminated from different-speaker speech samples using acoustic parameters from Australian English vowels. A multivariate likelihood ratio is used as a discriminant function on the five tense and six lax vowel phonemes of 1...
Long-term mean F0 (LTF0) is a popular parameter in Forensic Speaker Recognition (FSR). Its popularity probably stems from promising results in early SR research (e.g. Atal, 1972; Sambur, 1975), together with its conforming to three of Nolan's (1983) desiderata for FSR parameters, namely: robustness, measurability, and availability. On the other hand...
The necessity of taking correlation between variables into account when estimating strength of forensic speaker recognition evidence is argued for. A modest forensic speaker discrimination experiment is described which investigates how well non-contemporaneous speech samples from the same speaker can be discriminated from different-speaker samples...
Important aspects of Technical Forensic Speaker Recognition, particularly those associated with evidence, are exemplified and critically discussed, and comparisons drawn with generic Speaker Recognition. The centrality of the Likelihood Ratio of Bayes’ theorem in correctly evaluating strength of forensic speech evidence is emphasised, as well as th...
This paper describes an experiment investigating how well same-speaker speech samples can be discriminated from different-speaker speech samples using acoustic parameters from Australian English diphthongs. A two-level kernel density multivariate likelihood ratio is used as a discriminant function on five of the diphthongs of the 171 speakers of th...
Acoustic descriptions of citation tones are presented for speakers from seven sites in the Oujiang sub-group of the Wu dialects of east central China. The homogeneity of two tones within the Oujiang sub-group is demonstrated by quantified comparison with Shanghai tones, and it is pointed out that the normalised tonal acoustics can be interpreted hi...
This paper describes a discrimination experiment in forensic speaker recognition using the Australian English diphthong /a/. A two-level kernel density multivariate likelihood ratio is used as a discriminant function to investigate how well non-contemporaneous same-speaker speech samples of /a/ can be forensically discriminated from different-spe...
This paper has discussed some important aspects of forensic speaker recognition. It has emphasised that the task of a forensic speaker recognition expert is, after first quantifying the differences or similarities between the samples they are comparing, to estimate how much more likely this evidence is, assuming the samples have come from the same...
Auditory and acoustic descriptions and hermeneutic tonological analysis are presented for the lexical tone sandhi in two subsets of disyllabic tonal combinations in the Southern Wu dialect of Wenzhou. The effects of stress are shown to be an important factor in accounting for the tones in one of the combinations.
Mean fundamental frequency and duration data are presented for citation tones of Hong Kong Cantonese on short stopped syllables for five male and five female young native speakers. The relationship between the acoustics of the speakers' stopped and unstopped tones is examined and it is shown that the low-stopped tone F0 is for most speakers acousti...
A forensic-phonetic speaker identification experiment is described which tests to what extent same-speaker pairs from a 60 speaker Japanese data base can be discriminated from different-speaker pairs using a Bayesian likelihood ratio (LR) as discriminant function. Non-contemporaneous telephone recordings are used, with comparison based on mean valu...
This paper presents an analysis based on new acoustic data from tones and tone sandhi in Wenzhou dialect. The data provide evidence for the independent existence of a Depressor as well as the Tonal Register and Tonal melodic component in the tonology. For example, a convex [343 or 342] pitch is shown to result from a depressor effect on a tone with...
A pilot forensic-phonetic experiment is described which compares the performance of formant- and cepstrally-based analyses on forensically realistic speech: intonationally varying tokens of the word hello said by six demonstrably similar-sounding speakers in recording sessions separated by at least a year. The two approaches are compared with respe...
Forty-one pairs of words with CVCVC structure selected from Old Japanese and Old Javanese dictionaries are presented. It is claimed that these are the result of borrowing into an antecedent of Old Japanese from an Indonesian source. Semantic relationships are discussed, and sound correspondences are specified within a discussion of the segmental ph...
This paper reports the results of a forensic phonetic experiment which investigates the nature of long- and short-term within-speaker differences in the F-pattern of the same word hello said by six similar-sounding male speakers of Australian English. Short-term differences are obtained from recordings separated by about one minute, long-term differ...
Forensic Phonetics is an important application of Linguistics that has emerged as a discipline over the last decade. This paper describes a Forensic Phonetic experiment which investigates the nature of within- and between-speaker variation in the acoustic characteristics of the word hello in demonstrably similar-sounding voices. The nature of withi...
This paper investigates the possibility of describing vowels phonetically using an automated method. Models of the phonetic dimensions of the vowel space are built using two multi-layer perceptrons trained using eight cardinal vowels. The paper aims to improve the positioning of vowels in the open-close dimension by experimenting with a parameter i...
The linguistic phonetic properties of Shanghai tones are specified from normalised mean fundamental frequency and duration data of four male and three female speakers. Corroborative normalised F0 shapes are derived for an additional nine Shanghai speakers, and the Shanghai data compared with another Wu dialect. A linguistic phonetic contrast is dem...
An attempt is described to ascertain how well the F0 of the isolation tones of a variety of Chinese can be normalised using the mean and standard deviation from seven speakers' long term F0 distributions. Six of the seven speakers' cumulative mean and standard deviation stabilised after approximately 20 s and 10 s, respectively, of voiced speech. L...
This paper presents a detailed acoustic and auditory description of the kind of complex tone sandhi found in the Northern Wu dialects of Chinese. Mean fundamental frequency, amplitude and duration values from many tokens of 1 native speaker of Zhenhai dialect are used to show how the acoustical characteristics of the 6 citation tones can be related...
The author gives a perceptual, acoustic and physiological description of the phonation types - whisper, whispery voice, and...
Some considerations in the normalisation of tone are discussed, and their application demonstrated on the fundamental frequency data of seven speakers of a variety of Wu Chinese. It is argued that, although a considerable reduction in between-speaker variance can be achieved by either a Z-Score or Fraction of Range normalisation, the former strateg...
Many phoneticians are remarkably expert at 'reading' speech waveforms. This paper describes an attempt to capture this knowledge for use as a segmentation and early labelling knowledge source for a continuous speech recognition system. As well as deriving information from the waveform directly, the decisions made by the waveform deciphering knowled...
The first likelihood ratio-based forensic voice comparison on female voices, and the first forensic use of Gaussian mixture models on traditional features, are described. A GMM-UBM LR-based comparison is performed on the first three formants of the five long /monophthongs/ of 20 General Australian English female speakers in non-contemporaneous reco...
Auditory and acoustic data are presented to document some tonologically challenging aspects of the seven tones of Wencheng, a south-west Oujiang Wu dialect of Chinese. Tones are presented both in isolation form and in selected lexical tone sandhi combinations. It is shown how the complexity of the tonal system results in clashes between tonologi...
Thesis (Ph. D.)--University of Cambridge, 1982.