Sonal Joshi's research while affiliated with Johns Hopkins University and other places

Publications (12)

Preprint
Full-text available
Adversarial attacks are a threat to automatic speech recognition (ASR) systems, and it becomes imperative to propose defenses to protect them. In this paper, we perform experiments to show that K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses--denoiser pre-processor, adversarially fine-tuning...
Preprint
Full-text available
Adversarial attacks pose a severe security threat to the state-of-the-art speaker identification systems, thereby making it vital to propose countermeasures against them. Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adve...
Article
Adversarial examples are designed to fool the speaker recognition (SR) system by adding a carefully crafted, human-imperceptible noise to the speech signals. Since they pose a severe security threat to state-of-the-art SR systems, it becomes vital to study their vulnerabilities in depth. Moreover, it is even more important to propose countermeasures t...
Preprint
Adversarial attacks have become a major threat to machine learning applications. There is growing interest in studying these attacks in the audio domain, e.g., speech and speaker recognition, and in finding defenses against them. In this work, we focus on using representation learning to classify/detect attacks w.r.t. the attack algorithm, threat model...
Preprint
Full-text available
In this study, we analyze the use of speech and speaker recognition technologies and natural language processing to detect Alzheimer's disease (AD) and estimate Mini-Mental State Examination (MMSE) scores. We used speech recordings from the Interspeech 2021 ADReSSo challenge dataset. Our work focuses on adapting state-of-the-art speaker recognition and l...
Preprint
The ubiquitous presence of machine learning systems in our lives necessitates research into their vulnerabilities and appropriate countermeasures. In particular, we investigate the effectiveness of adversarial attacks and defenses against automatic speech recognition (ASR) systems. We select two ASR models - a thoroughly studied DeepSpeech model an...
Preprint
Full-text available
Research in automatic speaker recognition (SR) has been undertaken for several decades, achieving strong performance. However, researchers discovered potential loopholes in these technologies, such as spoofing attacks. Recently, a new class of attack, termed adversarial attacks, has proved highly damaging in computer vision, and it is vital to stud...

Citations

... Conv-TasNet [48]: We use off-the-shelf Conv-TasNet for the feedforward model for deep regression and the generator for GANs. It is a time-domain model that has been used for other tasks like speech enhancement [49], [50] and source separation [48]. It consists of an encoder, a separator, and a decoder. ...
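The encoder/separator/decoder pipeline mentioned above can be illustrated with a toy, pure-Python stand-in (the real Conv-TasNet uses learned 1-D convolutional bases and a temporal convolutional separator; the framing, the energy-gate mask, and all parameters below are illustrative assumptions, not the paper's implementation):

```python
# Toy sketch of a time-domain encoder -> separator -> decoder pipeline.
# NOT Conv-TasNet itself: the real model learns its encoder/decoder bases
# and estimates masks with a temporal convolutional network.

def encode(signal, frame_len=4):
    """Split the waveform into non-overlapping frames (toy 'basis' encoder)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def separate(frames):
    """Estimate a per-frame mask; here a toy energy gate standing in for
    the learned separator."""
    masks = []
    for f in frames:
        energy = sum(x * x for x in f) / len(f)
        masks.append(1.0 if energy > 0.01 else 0.0)  # keep high-energy frames
    return masks

def decode(frames, masks):
    """Apply the masks and concatenate frames back into a waveform."""
    out = []
    for f, m in zip(frames, masks):
        out.extend(m * x for x in f)
    return out

signal = [0.5, -0.4, 0.6, -0.5,       # high-energy frame (kept)
          0.001, -0.002, 0.001, 0.0]  # near-silent frame (suppressed)
frames = encode(signal)
enhanced = decode(frames, separate(frames))
```

The same three-stage shape holds whether the model is trained for enhancement, separation, or, as in the citing work, as a deep-regression denoiser or a GAN generator.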
... These results further verify the effectiveness of using pretrained models. (Samangouei et al., 2018) and Joint Adversarial Finetuning (Joshi et al., 2022). For DefenseGAN, which is originally designed to defend against adversarial images by finding the optimal noise that generates the most similar image to the adversarial counterpart, we adopt it to the audio domain, choosing WaveGAN (Donahue et al., 2018) as the GAN model in this pipeline. ...
... Passive defenses do not modify the ASV model; instead, they defend against adversarial attacks through a mitigation or detection component. For example, the works in [25], [26], [27] proposed to remove the adversarial noise with an adversarial separation network, a Parallel-Wave-GAN (PWG) module, and a cascaded self-supervised-learning-based reformer (SSLR), respectively. Wu et al. [28] also employed a voting strategy with random sampling to mitigate the adversarial attacks. ...
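The voting strategy mentioned last can be sketched as follows (a hedged toy version: here "random sampling" is implemented as random cropping, and the classifier, vote count, and crop ratio are made-up stand-ins, not the cited method's actual components):

```python
import random

def majority_vote_decision(signal, classify, n_votes=11, crop=0.9, seed=0):
    """Score randomly cropped copies of the input and take a majority vote.
    A perturbation tuned to one exact input is less likely to flip every vote."""
    rng = random.Random(seed)
    crop_len = int(len(signal) * crop)
    votes = []
    for _ in range(n_votes):
        start = rng.randrange(len(signal) - crop_len + 1)
        votes.append(classify(signal[start:start + crop_len]))
    return max(set(votes), key=votes.count)  # most common vote wins

# Toy classifier: accept (return 1) iff mean amplitude exceeds a threshold.
def toy_classifier(x):
    return 1 if sum(x) / len(x) > 0.1 else 0

clean = [0.3] * 100
decision = majority_vote_decision(clean, toy_classifier)
```

The randomness makes the defense passive: the underlying model is untouched, and only the inference procedure changes.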
... Cummins et al. [36] and Rohanian et al. [37] combined public acoustic features with neural networks and achieved accuracies of 70.8% and 66.6%, respectively. Acoustic embeddings as speech features started to attract the attention of numerous researchers and have achieved good performance in AD detection [38], [40]-[43]. There appears to be a trade-off between accuracy and convenience. ...
... We focus on attacks on speaker recognition systems, particularly on the state-of-the-art x-vector based system. Our previous work [15] proposed to use embeddings obtained by representation learning of adversarial examples as attack signatures to retrieve information about the adversarial attack. This information, which includes the attack algorithm type, the threat model, and the signal-to-adversarial-noise ratio, could help reveal the attacker's identity and intentions. ...
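The attack-signature idea above can be illustrated with a toy nearest-centroid rule: if embeddings of adversarial examples cluster by attack algorithm, the closest centroid retrieves the attack type. The two-dimensional embeddings, attack labels, and centroid rule below are all illustrative assumptions, not the cited system's actual representation learning:

```python
# Toy attack-signature classification: nearest centroid in embedding space.
# Embeddings and labels are made up for illustration.

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(c) / len(vectors) for c in zip(*vectors)]

def nearest_attack(embedding, signatures):
    """Return the attack label whose centroid is closest (squared L2)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(signatures, key=lambda label: dist2(embedding, signatures[label]))

# Hypothetical signature centroids learned from known attacks.
signatures = {
    "FGSM": centroid([[0.9, 0.1], [1.1, -0.1]]),
    "PGD":  centroid([[0.0, 1.0], [0.2, 0.8]]),
}
label = nearest_attack([0.8, 0.2], signatures)
```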