Phil Rose

University of Canberra, Canberra, Australian Capital Territory, Australia

Publications (4) · 3.1 Total impact

  • Article: Combining linguistic and non-linguistic information in likelihood-ratio-based forensic voice comparison: A hybrid automatic-traditional system.
    Phil Rose
    ABSTRACT: In the last decade, forensic voice comparison has experienced a remarkable paradigm shift [Morrison, Sci. Justice 49, 298-308 (2009)]. Both automatic and traditional phonetic approaches have been developed within the new paradigm. The main difference is that traditional approaches are typically local in both time and frequency domains, with features like formant frequencies extracted from linguistically comparable items (e.g., words or phonemes), whereas automatic approaches are typically global, with long-term spectral properties used and linguistic information treated as noise. Since neither makes use of all the information present, combining them could improve performance. A fully automatic and a partially traditional system were compared. Data were pairs of non-contemporaneous landline-telephone recordings of 60 speakers from the Japanese National Research Institute of Police Science database (net 35-40 s speech per recording). In the fully automatic system, the whole speech-active portion of the recording was analyzed using 12th-order LPCCs, mean cepstral subtraction, GMM-UBM, and logistic-regression calibration. In the partially traditional system, the same procedures were applied only to tokens of [o:], [ɴ], and [ç] extracted from the recordings, with logistic-regression fusion of the results. The performance of each system and the fusion of the two were compared using the log-likelihood-ratio cost (C_llr).
    The Journal of the Acoustical Society of America 10/2010; 128(4):2378. · 1.55 Impact Factor
  • Article: Extraction of likelihood-ratio forensic evidence from the formant trajectories of diphthongs.
    ABSTRACT: The likelihood-ratio approach to forensic speaker recognition seeks to determine the likelihood that one would observe the evidence, the acoustic difference between suspect and offender speech samples, under the hypothesis that they were produced by the same speaker versus under the hypothesis that they were produced by different speakers. Before the results of a scientific forensic technique can be presented in court, it is necessary to demonstrate its efficacy. This presentation tests the efficacy of extracting information from the formant trajectories of diphthongs. Differences in physiology and learned motor patterns could potentially lead to different speakers producing quite different formant trajectories which could in turn lead to strong forensic evidence. The data tested were /aɪ/, /aʊ/, /eɪ/, /ɔɪ/, /oʊ/, /iə/, and /ɛə/ tokens produced in several phonetic contexts by 27 male speakers of Australian English. Cubic polynomials were fitted to each vowel token, and the coefficient values were used in a multivariate-kernel-density procedure which calculated likelihood ratios. Cross-validated same-speaker and different-speaker comparisons were made, resulting in a series of same-speaker and different-speaker likelihood ratios for each vowel phoneme. Results indicated that substantial strength of evidence with respect to speaker identity can be extracted from diphthong formant trajectories.
    The Journal of the Acoustical Society of America 06/2008; 123(5):3877. · 1.55 Impact Factor
  • Article: A response to the UK position statement on forensic speaker comparison
  • Article: Realistic extrinsic forensic speaker discrimination with the diphthong /aɪ/
    ABSTRACT: This paper describes a discrimination experiment in forensic speaker recognition using the Australian English diphthong /aɪ/. A two-level kernel density multivariate likelihood ratio is used as a discriminant function to investigate how well non-contemporaneous same-speaker speech samples of /aɪ/ can be forensically discriminated from different-speaker speech samples using just this diphthong's F-pattern at its two targets. Natural speech elicited from 25 Australian-English-speaking males is extrinsically evaluated against a reference population of 166 male speakers from Bernard's database. Comparing samples with 12 diphthong tokens each, a respectable well-calibrated EER of between ca. 8% and 10% is obtained. Forensically important aspects of the results are discussed, including an assessment of the suitability of the reference population.