Rita Singh’s research while affiliated with Carnegie Mellon University and other places


Publications (4)


Krait: A Backdoor Attack Against Graph Prompt Tuning
  • Preprint

July 2024 · 7 Reads

Rita Singh · Balaji Palanisamy

Graph prompt tuning has emerged as a promising paradigm to effectively transfer general graph knowledge from pre-trained models to various downstream tasks, particularly in few-shot contexts. However, its susceptibility to backdoor attacks, where adversaries insert triggers to manipulate outcomes, raises a critical concern. We conduct the first study to investigate such vulnerability, revealing that backdoors can be disguised as benign graph prompts, thus evading detection. We introduce Krait, a novel graph prompt backdoor. Specifically, we propose a simple yet effective model-agnostic metric called label non-uniformity homophily to select poisoned candidates, significantly reducing computational complexity. To accommodate diverse attack scenarios and advanced attack types, we design three customizable trigger generation methods to craft prompts as triggers. We propose a novel centroid similarity-based loss function to optimize prompt tuning for attack effectiveness and stealthiness. Experiments on four real-world graphs demonstrate that Krait can efficiently embed triggers in merely 0.15% to 2% of training nodes, achieving high attack success rates without sacrificing clean accuracy. Notably, in one-to-one and all-to-one attacks, Krait can achieve 100% attack success rates by poisoning as few as 2 and 22 nodes, respectively. Our experiments further show that Krait remains potent across different transfer cases, attack types, and graph neural network backbones. Additionally, Krait can be successfully extended to the black-box setting, posing more severe threats. Finally, we analyze why Krait can evade both classical and state-of-the-art defenses, and provide practical insights for detecting and mitigating this class of attacks.
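To make the candidate-selection idea concrete, the sketch below ranks nodes by how mixed the labels in their immediate neighborhood are, on the reading that nodes with non-uniform neighborhood labels are attractive poisoning targets. The function names, scoring rule, and toy graph are illustrative assumptions; the paper's exact label non-uniformity homophily metric may be defined differently.

```python
# Illustrative sketch only: a plausible reading of a "label
# non-uniformity homophily"-style selection metric. The exact
# formula used by Krait may differ.

def label_homophily(neighbors, labels, node):
    """Fraction of a node's neighbors sharing its label."""
    nbrs = neighbors[node]
    if not nbrs:
        return 1.0
    same = sum(labels[n] == labels[node] for n in nbrs)
    return same / len(nbrs)

def select_poison_candidates(neighbors, labels, budget):
    """Rank nodes with the most label-non-uniform neighborhoods
    (lowest homophily) first and keep the top `budget`."""
    nodes = range(len(labels))
    scores = {v: label_homophily(neighbors, labels, v) for v in nodes}
    return sorted(nodes, key=scores.get)[:budget]

# Toy usage: a 4-node graph with two classes.
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
labels = [0, 1, 0, 1]
print(select_poison_candidates(neighbors, labels, budget=2))
```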


Figure 2. Approximating the vocal folds with mass-spring oscillators in the phonation process. Airflow from the lungs, driven by the subglottal pressure P_s, passes through the glottis, and the vocal folds are set into a state of self-sustained vibration, producing the glottal flow u_g, a quasiperiodic pressure wave. The vibration of the vocal folds is analogous to a pair of mass-spring-damper oscillators. The glottal flow then resonates in the speaker's vocal tract and nasal tract, producing voiced sound.
Figure 4. The VFO model models the generation of glottal signals by the movements of the vocal folds. The VT model models the transformation of the glottal signal generated by the vocal folds into the final voice signal. The joint VFO-VT model combines the two, using the output of the VFO model as the input to the VT model. ADLES compares the glottal signal u_0(t) generated by the VFO model to a reference glottal signal u_g(t) to estimate VFO parameters. ADLES-VFT compares the output of the joint model, u_L(t), to a reference signal u_m(t) obtained from an actual voice recording, to estimate both VFO and VT parameters. The output of the VFO model is the desired vocal fold oscillation.
Figure 5. (a) A 3D bifurcation diagram of the asymmetric vocal fold model. The third dimension, perpendicular to the parameter plane shown, depicts the entrainment ratio n:m (encoded in different shades of gray) as a function of the model parameters α and Δ, where n and m are the numbers of intersections of the orbits of the right and left oscillators with the Poincaré section ξ̇_{r,l} = 0 at stable status. This is consistent with the theoretical results in [24]. (b) Phase-space trajectories (phase portraits) corresponding to points A (left panel), B (center panel), and C (right panel). The horizontal axis is the displacement of a vocal fold; the vertical axis is its velocity.
Figure 6. Phase portraits showing the coupling of the left and right oscillators (ADLES-based estimation) for (a) normal speech: 1 limit cycle, (b) neoplasm: 1 limit cycle, (c) phonotrauma: 2 limit cycles, (d) vocal palsy: limit torus. The convergence trajectory is also shown; the limit cycles appear as the emergent geometries in these plots.
Figure 7. Glottal flows from inverse filtering and ADLES estimation for (a) normal speech (control), (b) neoplasm, (c) phonotrauma, and (d) vocal palsy.


Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation
  • Article
  • Full-text available

July 2023 · 143 Reads · 2 Citations

Entropy

During phonation, the vocal folds exhibit a self-sustained oscillatory motion, which is influenced by the physical properties of the speaker’s vocal folds and driven by the balance of bio-mechanical and aerodynamic forces across the glottis. Subtle changes in the speaker’s physical state can affect voice production and alter these oscillatory patterns. Measuring these changes can be valuable in developing computational tools that analyze voice to infer the speaker’s state. Traditionally, vocal fold oscillations (VFOs) are measured directly using physical devices in clinical settings. In this paper, we propose a novel analysis-by-synthesis approach that allows us to infer the VFOs directly from recorded speech signals on an individualized, speaker-by-speaker basis. The approach, called the ADLES-VFT algorithm, is proposed in the context of a joint model that combines a phonation model (with a glottal flow waveform as the output) and a vocal tract acoustic wave propagation model such that the output of the joint model is an estimated waveform. The ADLES-VFT algorithm is a forward-backward algorithm which minimizes the error between the recorded waveform and the output of this joint model to estimate its parameters. Once estimated, these parameter values are used in conjunction with a phonation model to obtain its solutions. Since the parameters correlate with the physical properties of the vocal folds of the speaker, model solutions obtained using them represent the individualized VFOs for each speaker. The approach is flexible and can be applied to various phonation models. In addition to presenting the methodology, we show how the VFOs can be quantified from a dynamical systems perspective for classification purposes. Mathematical derivations are provided in an appendix for better readability.
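For intuition about the phonation-model side, here is a minimal numerical sketch of the general family of models the paper works with: two coupled van der Pol-like oscillators standing in for the left and right vocal folds, with a coupling coefficient α and an asymmetry coefficient Δ (the parameter names of Figure 5). The specific equations, coefficient values, and initial conditions are assumptions for illustration, not the paper's exact model; ADLES-VFT would wrap such a forward simulation in a forward-backward estimation loop that adjusts the parameters to minimize the error against a recorded waveform.

```python
# Minimal sketch (assumed form): an asymmetric pair of coupled
# van der Pol-like oscillators for the left/right vocal folds.
# ALPHA (coupling), BETA (damping), DELTA (asymmetry) are
# illustrative values, not fitted ones.
from scipy.integrate import solve_ivp

ALPHA, BETA, DELTA = 0.6, 0.32, 0.4

def vocal_fold_rhs(t, y):
    xl, vl, xr, vr = y  # displacement/velocity, left and right folds
    coupling = ALPHA * (vl + vr)  # shared aerodynamic coupling term
    al = coupling - BETA * (1 + xl**2) * vl - (1 - DELTA / 2) * xl
    ar = coupling - BETA * (1 + xr**2) * vr - (1 + DELTA / 2) * xr
    return [vl, al, vr, ar]

sol = solve_ivp(vocal_fold_rhs, (0.0, 200.0), [0.1, 0.0, 0.1, 0.0],
                max_step=0.05)
xl, vl = sol.y[0], sol.y[1]  # phase-portrait data, as in Figure 6
```

Plotting vl against xl after the transient dies out exposes the limit-cycle (or, under stronger asymmetry, torus-like) geometries that the paper uses to characterize disordered voices.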


Figure 1. Voice chains of different levels. In this reversed perspective ideogram, a link depicts a pathway, the black dots on it are genes, and the chromosomes that they belong to are shown as rods where relevant. The lines connecting genes are only meant for visual clarity. Chains are formed with respect to genes on a chromosome (a microdeletion region in this case, shown shaded in yellow). In this ideogram, Gene-1 and FOXP2 lie on the same pathway, contributing to a level-1 voice chain. In a level-2 chain, FOXP2 and the microdeletion region are on different pathways, but the pathways share a set of genes. Gene 2 and Gene 3 have level-2 voice chains.
(a) Code distance between the (sets of) chainlink counts for different types of speech disorders. (b) Code distance between the (sets of) chainlink connectivities for different types of speech disorders.
A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker’s Voice

June 2023 · 49 Reads · 1 Citation

Entropy

Over the past decades, many machine-learning- and artificial-intelligence-based technologies have been created to deduce biometric or bio-relevant parameters of speakers from their voice. These voice profiling technologies have targeted a wide range of parameters, from diseases to environmental factors, based largely on the fact that they are known to influence voice. Recently, some have also explored the prediction of parameters whose influence on voice is not easily observable through data-opportunistic biomarker discovery techniques. However, given the enormous range of factors that can possibly influence voice, more informed methods for selecting those that may be potentially deducible from voice are needed. To this end, this paper proposes a simple path-finding algorithm that attempts to find links between vocal characteristics and perturbing factors using cytogenetic and genomic data. The links represent reasonable selection criteria for use by computational voice profiling technologies only, and are not intended to establish any unknown biological facts. The proposed algorithm is validated using a simple example from medical literature—that of the clinically observed effects of specific chromosomal microdeletion syndromes on the vocal characteristics of affected people. In this example, the algorithm attempts to link the genes involved in these syndromes to a single example gene (FOXP2) that is known to play a broad role in voice production. We show that in cases where strong links are exposed, vocal characteristics of the patients are indeed reported to be correspondingly affected. Validation experiments and subsequent analyses confirm that the methodology could be potentially useful in predicting the existence of vocal signatures in naïve cases where their existence has not been otherwise observed.
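To illustrate the flavor of such a path-finding step, the sketch below runs a breadth-first search over a gene-pathway mapping: genes sharing a pathway are linked at level 1, and genes whose pathways share member genes are linked at level 2, loosely following Figure 1. The data structure, level semantics, and helper names are assumptions based on the abstract, not the paper's actual algorithm.

```python
# Hypothetical sketch of a BFS-style "voice chain" search between
# genes through shared pathways. Level semantics are assumed from
# the abstract and Figure 1.

def chain_level(gene_to_pathways, query, target, max_level=2):
    """Smallest number of pathway hops linking query to target,
    or None if no chain exists within max_level hops."""
    pathway_to_genes = {}  # invert the mapping: pathway -> genes
    for gene, pathways in gene_to_pathways.items():
        for p in pathways:
            pathway_to_genes.setdefault(p, set()).add(gene)
    frontier, seen, level = {query}, {query}, 0
    while frontier and level < max_level:
        level += 1
        reached = set()
        for g in frontier:
            for p in gene_to_pathways.get(g, ()):
                reached |= pathway_to_genes[p] - seen
        if target in reached:
            return level
        seen |= reached
        frontier = reached
    return None

# Toy usage mirroring Figure 1: Gene-1 shares a pathway with FOXP2.
g2p = {"FOXP2": {"pw_A"}, "Gene1": {"pw_A"}, "Gene2": {"pw_B"}}
print(chain_level(g2p, "Gene1", "FOXP2"))  # -> 1
```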


GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content

May 2023 · 35 Reads · 3 Citations

Yutian Chen · Hao Kang · Vivian Zhai · [...] · Bhiksha Ramakrishnan

This paper presents a novel approach for detecting ChatGPT-generated vs. human-written text using language models. To this end, we first collected and released a pre-processed dataset named OpenGPTText, which consists of rephrased content generated using ChatGPT. We then designed, implemented, and trained two different models for text classification, using Robustly Optimized BERT Pretraining Approach (RoBERTa) and Text-to-Text Transfer Transformer (T5), respectively. Our models achieved remarkable results, with an accuracy of over 97% on the test dataset, as evaluated through various metrics. Furthermore, we conducted an interpretability study to showcase our model's ability to extract and differentiate key features between human-written and ChatGPT-generated text. Our findings provide important insights into the effective use of language models to detect generated text.
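As a rough picture of what such a detector looks like in code, here is a minimal RoBERTa-based binary classifier using the Hugging Face transformers API. The checkpoint name, label convention, and inference wrapper are assumptions; the paper's actual models were fine-tuned on OpenGPTText, and the freshly initialized classification head below would need analogous fine-tuning before its predictions mean anything.

```python
# Sketch only: a RoBERTa sequence classifier in the spirit of
# GPT-Sentinel. "roberta-base" and the 0=human / 1=chatgpt label
# convention are assumptions; the head is untrained as loaded.
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)
model.eval()

def classify(text: str) -> str:
    inputs = tokenizer(text, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return "chatgpt" if logits.argmax(dim=-1).item() == 1 else "human"

print(classify("The mitochondria is the powerhouse of the cell."))
```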

Citations (3)


... In the recent past, numerous mechanical, mathematical and computational models have been developed for analysing vocal fold oscillation, either in isolation from or coupled with vocal tract dynamics. The aims pursued range from, e.g., deriving the VF oscillations directly from recorded voice signals to explaining mechanobiological processes at cellular and molecular levels [11][12][13][14]. ...

Reference:

The Physics of the Human Vocal Folds as a Biological Oscillator
Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation

Entropy

... The model provided mechanistic information on CP pathogenesis. Rita Singh [47] developed an AI-based algorithm using breadth-first analysis to find the significance of FOXP2 in voice disorders. A path-finding algorithm was used to determine the relationships between vocal characteristics and perturbing factors. ...

A Gene-Based Algorithm for Identifying Factors That May Affect a Speaker’s Voice

Entropy

... Authors in [6] presented an ML-based solution to differentiate between human-generated and ChatGPT-generated text, achieving an accuracy of 77%. Authors in [7] distinguished between human-generated text and that produced by ChatGPT, using the T5 and RoBERTa language models and obtaining over 97% accuracy. Authors in [8] focused on the discrimination of medical texts written by human specialists from those created by ChatGPT, obtaining an accuracy of over 95%. ...

GPT-Sentinel: Distinguishing Human and ChatGPT Generated Content
  • Citing Preprint
  • May 2023