Figure 3 - uploaded by Rafal Samborski
Source publication
Playback attacks constitute one of the biggest threats in biometric speaker verification systems, in which a previously recorded passphrase is played back by an unprivileged person in order to gain access. This paper features a description of the playback attack detection (PAD) algorithm, designed to protect text-dependent speaker verification syst...
Similar publications
Progress in the development of spoofing countermeasures for automatic speaker recognition is less advanced than equivalent work related to other biometric modalities. This chapter outlines the potential for even state-of-the-art automatic speaker recognition systems to be spoofed. While the use of a multitude of different datasets, protocols and me...
Advances in Automatic Speaker Verification (ASV) systems for voice biometric purposes come with the danger of spoofing attacks. The replay attack is the most accessible attack, in which the attacker imitates the speaker's identity by replaying pre-recorded speech samples of the target speaker. Most of the conventional features, such as Mel Frequenc...
Research in the area of automatic speaker verification (ASV) has advanced enough for the industry to start using ASV systems in practical applications. However, as it was also shown for fingerprints, face, and other verification systems, ASV systems are highly vulnerable to spoofing or presentation attacks, limiting their wide practical deployment....
Citations
... This method is mainly developed on two different signal features: one is based on a special physical phenomenon, as we discuss in this paper, and the other is based on characteristics that can indirectly reveal the difference between the original signal and the replayed signal, as in challenge-response based detection methods [10], detection methods based on speech randomness [11], and detection methods based on other clues (lip movement [12], voice liveness detection [13]). In addition to looking for such indirect features, researchers also consider the distortion and additional noise mixed into the speech signal after it has been replayed. ...
Automatic Speaker Verification (ASV) has benefits compared to other biometric verification methods, such as face recognition. It is convenient, low cost, and better at protecting privacy, so it is starting to be used in various practical applications. However, voice verification systems are vulnerable to unknown spoofing attacks and need to keep pace with forgery techniques. This paper investigates a low-cost attack scenario in which a playback device is used to impersonate the real speaker. The replay attack needs only a recording device and a playback device, so it can be one of the most widespread spoofing methods. In this paper, we investigate spectral clues in signals recorded at high sampling rates and use them to detect the replay attack effectively. First, a small-scale genuine-replay dataset with high sampling rates is constructed using low-cost mobile terminals; then, the signal features are investigated by comparing their spectra; machine learning models are also applied for evaluation. The experimental results verify that the high-frequency spectral clue in the replay signal provides a convenient and reliable way to detect the replay attack.
... Replay attacks are easy to perform and their threat to the reliability of ASV has been studied widely [1,8,22]. Replay attacks use recordings of a target speaker's voice, which are replayed to the ASV system in place of genuine speech [14,19]. ...
... Many audio forensic methods have been proposed for various audio editing operations. In addition to common operations such as pitch-shifting [24,25,27], device source [15,21,30], replaying [6] and operation-chain detection [23], fake-quality operations such as double compression [13,14,28] have attracted more attention in recent years. Luo et al. [14] proposed a method for detecting double compressed AMR audio based on deep learning in 2014. ...
Fake-quality audio detection is an important branch of digital audio forensics. Resampling and recompression are the two typical operations used to fake audio quality, in which audio with a low sampling or bit rate is converted to a higher sampling or bit rate so that it appears to be of high quality. Stereo-faking is another fake-quality operation, in which mono audio is converted into stereo. A few forensic methods have been proposed to detect stereo-faking. Little consideration, however, has been given to the security of these methods themselves. To expose the weakness of these stereo-faking detectors, an anti-forensic framework based on a generative adversarial network is proposed. The fake stereo audio is created by generating a new audio channel from mono audio. Skip connections are adopted to ensure the quality of the generated audio. Considering that stereo application scenarios are mostly music and film recording, a large number of music and film recordings are downloaded from the Internet as our datasets. These datasets are used to train our model. The anti-forensic samples generated by the model are used to attack the most effective fake stereo audio detectors. Experimental results show that the fake stereo audio generated from music can reduce detection accuracy from about 99% to about 30%, and the false acceptance rate can increase from 0.08% to about 69%. The fake stereo audio generated from film recordings can reduce detection accuracy from about 99% to about 1.7%, and the false acceptance rate can increase from 0.02% to about 98%.
... In this paper, we focus on this difference as a criterion for distinguishing spoofed and bona fide utterances. As shown in Fig 1, the production of replay speech has two more steps than genuine speech, namely recording and playback [25]. The noise that remains from different types of recording and playback devices helps countermeasure systems to solve the problem more easily. ...
... Audio forensics [1] is an important branch of multimedia security, which can be used to evaluate the authenticity of digital audio. Many audio forensics methods have been proposed for various speech operations in addition to common audio forgeries, such as double compression [2,3], pitch shifting [4][5][6], device source [7][8][9], replaying [10] and the detection of the operation type and sequence of digital speech [11]. Fake-quality detection is a very important part of the field of audio forensics, such as in [12], in which the authors recompressed low bit rate audio into high bit rate audio. ...
The number of channels is one of the important criteria in regard to digital audio quality. Generally, stereo audio with two channels can provide better perceptual quality than mono audio. To seek illegal commercial benefit, one might convert a mono audio system to stereo with fake quality. Identifying stereo-faking audio is a lesser-investigated audio forensic issue. In this paper, a stereo faking corpus is first presented, which is created using the Haas effect technique. Two identification algorithms for fake stereo audio are proposed. One is based on Mel-frequency cepstral coefficient features and support vector machines. The other is based on a specially designed five-layer convolutional neural network. The experimental results on two datasets with five different cut-off frequencies show that the proposed algorithm can effectively detect stereo-faking audio and has good robustness.
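The Haas (precedence) effect mentioned in the abstract above can be illustrated with a minimal sketch: a second channel is produced by delaying a copy of the mono signal by a few milliseconds. The function name `haas_fake_stereo` and the 15 ms default delay are assumptions made for illustration only; the abstract does not state the delay, and it mentions cut-off frequencies, which suggests filtering that this sketch does not model.

```python
def haas_fake_stereo(mono, sample_rate, delay_ms=15.0):
    """Return (left, right) channels built from one mono signal.

    Hypothetical helper: the left channel is the mono signal unchanged and
    the right channel is the same signal delayed by `delay_ms` milliseconds
    (an assumed value), zero-padded at the start and truncated to the
    original length.
    """
    delay = int(round(sample_rate * delay_ms / 1000.0))
    left = list(mono)
    right = ([0.0] * delay + left)[:len(left)]
    return left, right


# Usage: a 2 ms delay at a 1 kHz sampling rate shifts the copy by 2 samples.
left, right = haas_fake_stereo([1.0, 2.0, 3.0, 4.0, 5.0], sample_rate=1000, delay_ms=2.0)
```

Because the two channels differ only by a short delay, a detector can in principle look for that near-duplicate relationship between channels, which is the kind of regularity the identification algorithms described above would need to capture.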
... Many audio forensic methods have been proposed for various speech operations [2,3], in addition to common audio forgeries such as pitch-shifting [4,5,6], device source [7,8,9], replaying [10] and the detection of the operation type and sequence of digital speech [11]. Fake-quality detection, such as that in [12], is a very important part of the field of audio forensics. ...
The number of channels is one of the important criteria for digital audio quality. Generally, stereo audio with two channels can provide better perceptual quality than mono audio. To seek illegal commercial benefit, one might convert mono audio to stereo audio with fake quality. Identifying stereo-faking audio is still a less-investigated audio forensic issue. In this paper, a stereo-faking corpus is first presented, which is created using the Haas effect technique. Then the effect of stereo faking on Mel Frequency Cepstral Coefficients (MFCC) is analyzed to find the difference between real and faked stereo audio. Finally, an effective algorithm for identifying stereo-faking audio is proposed, in which 80-dimensional MFCC features and a Support Vector Machine (SVM) classifier are adopted. The experimental results on three datasets with five different cut-off frequencies show that the proposed algorithm can effectively detect stereo-faking audio and achieves good robustness.
... Recent advances in speech technologies have posed a great threat to ASV systems through various spoofing attacks. There are four well-known attacks that present a serious threat to ASV systems, namely mimicry [2], text-to-speech (TTS) [3], voice conversion (VC) [4], and replay [5]. To counteract these spoofing attacks, countermeasures (CM) have been developed to detect spoofed speech before speaker verification. ...
... In recent decades, Shang et al. [3] and Jakub et al. [4] proposed replay attack detection algorithms that compare a test recording with the recordings that exist in the database. Wang et al. [5] developed a method that uses channel information to detect replay attacks. ...
Spoofing attacks, especially replay attacks, pose great security challenges to automatic speaker verification (ASV) systems. Current work on replay attack detection has primarily focused on either developing new features or improving classifier performance, ignoring the effects of feature variability, e.g., channel variability. In this paper, we first establish a mathematical model for replay speech and introduce a method for eliminating the negative interference of the channel. Then a novel feature is proposed to detect replay attacks. To further boost detection performance, four post-processing methods using normalization techniques are investigated. We evaluate our proposed method on the ASVspoof 2017 dataset. The experimental results show that our approach outperforms competing methods in terms of detection accuracy. More interestingly, we find that the proposed normalization strategy can also improve the performance of existing algorithms.
... For synthetic attacks, a speaker model is first trained using speech from the true client; the resulting speaker model is then employed to synthesize the utterance to be used in the attack (De Leon et al., 2012; McClanahan et al., 2014; Sanchez et al., 2015; Satoh et al., 2001). Finally, for playback attacks, a recording of the true client's utterance is used (Alegre et al., 2014b; Galka et al., 2015; Shang and Stevenson, 2008a; Wu et al., 2012; Gonzalez-Rodriguez et al., 2018). Of the three types of spoofing attacks, playback attacks pose the most serious threat, providing an effective means of spoofing the security system while requiring very little technical knowledge or skill to execute (Alegre et al., 2014a; Evans et al., 2014b; Lindberg and Blomberg, 1999; Kinnunen et al., 2017). ...
... The development of methods to detect playback attacks has attracted the interest of many researchers in recent years (Bredin et al., 2006; Shang and Stevenson, 2008a; 2008b; Greenhall and Atlas, 2010; Malik, 2012; Villalba and Lleida, 2011; Wang et al., 2011; Galka et al., 2015; Yamagishi et al., 2017; Wu et al., 2017; Gonzalez-Rodriguez et al., 2018). One approach focuses on detecting the distortion associated with the recording and playback devices that are used to execute a playback attack (Greenhall and Atlas, 2010; Villalba and Lleida, 2011; Wang et al., 2011; Kinnunen et al., 2017). ...
... Previous publications suggest that a more suitable approach for applications such as telephone banking is to detect "identical" utterances (Shang and Stevenson, 2008a; Galka et al., 2015; Gonzalez-Rodriguez et al., 2018). This approach, termed copy-detection, takes advantage of the uniqueness of each utterance to detect playback attacks. ...
In this paper, a new feature set is proposed for use in a playback attack detector (PAD) aimed at safeguarding a passphrase and speaker-verified protected system that can be remotely accessed from an arbitrary location using an arbitrary telecommunication channel. The new feature set, termed VoicedTracks, is a time-frequency map of the most robust harmonic trajectories in an utterance and serves as an audio fingerprint that can uniquely identify an utterance despite a moderate amount of noise and channel distortion. Experimental results are obtained using a specially designed in-house database; the impact of various noise types and SNR levels is further investigated using a publicly available database. An analysis of playback scores across several combinations of telecommunication channel types, playback devices and additive noise demonstrates robustness of the feature set to channel distortion and additive noise, thus making it suitable for use in a copy-detection based PAD (cd-PAD) designed for applications such as telephone banking. Relative to other cd-PADs the proposed approach was better able to defend against playback attacks when telephone channels were involved. An analysis of its performance across the replay configurations used in the ASVspoof 2017 V2 evaluation set suggests that the proposed cd-PAD is highly effective in detecting those playback attacks that are most likely to spoof the speaker verification system.
... Later, such methods are applied to replay detection in text-independent speaker verification in terms of average spectral bitmap models [15]. Another study based on spectral features and score normalization was carried out for replay speech detection in [16]. ...
Replay attacks have been proven to be a potential threat to practical automatic speaker verification systems. In this work, we explore a novel feature based on spectral entropy for the detection of replay attacks. Spectral entropy is a measure that captures spectral distortion and flatness. It is found that replay speech carries artifacts from the process of recording and playback. We hypothesize that spectral entropy can provide useful information for capturing such artifacts. In this regard, we explore a multi-band spectral entropy feature for replay attack detection. The studies are conducted on the ASVspoof 2017 Version 2.0 database, which deals with replay speech attacks. A baseline system with the popular constant-Q cepstral coefficient (CQCC) feature is also developed. Finally, a combined system is proposed with multi-band spectral entropy and CQCC features that outperforms the baseline. The experiments validate the idea of the multi-band spectral entropy feature.
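As a rough illustration of the multi-band spectral entropy idea described above, the sketch below normalizes the power spectrum within each band to a probability distribution and takes its Shannon entropy. The function names, the equal-width band split, the default of four bands, and the naive DFT are all assumptions chosen to keep the example self-contained; the abstract does not specify the band layout or framing used in the paper.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_entropy(powers):
    """Shannon entropy in bits of one band's power, normalized to sum to 1."""
    total = sum(powers)
    if total <= 0.0:
        return 0.0  # empty or silent band: define entropy as zero
    probs = [p / total for p in powers]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def multiband_spectral_entropy(frame, num_bands=4):
    """Split the spectrum into equal-width bands (assumed layout) and
    return one entropy value per band; the last band takes leftover bins."""
    spec = power_spectrum(frame)
    width = len(spec) // num_bands
    bands = [spec[i * width:(i + 1) * width] for i in range(num_bands - 1)]
    bands.append(spec[(num_bands - 1) * width:])
    return [band_entropy(band) for band in bands]


# Usage: an impulse has a flat spectrum, so every band reaches its maximum
# entropy, log2 of the number of bins in that band.
features = multiband_spectral_entropy([1.0] + [0.0] * 63, num_bands=4)
```

A band whose energy sits in a single bin gives an entropy of zero, while a perfectly flat band gives the maximum, which is why the measure tracks spectral flatness. In practice the per-band values would be computed frame by frame and passed to a classifier; that stage is not shown here.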