April 2025
What is this page?
This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.
Publications (230)
April 2025 · 2 Reads
April 2025 · 13 Reads
March 2025 · 20 Reads
Purpose: Phonetic forced alignment has a multitude of applications in automated analysis of speech, particularly in studying nonstandard speech such as children's speech. Manual alignment is tedious but serves as the gold standard for clinical-grade alignment, and current tools do not support direct training on manual alignments. We therefore developed Wav2TextGrid, a trainable, speaker-adaptive phonetic forced alignment system for children's speech. The source code and a graphical user interface are publicly available at https://github.com/pkadambi/Wav2TextGrid.
Method: We propose a trainable, speaker-adaptive, neural forced aligner developed using a corpus of 42 neurotypical children from 3 to 6 years of age. Evaluation on both child speech and the TIMIT corpus demonstrates aligner performance across age and dialectal variations.
Results: The trainable alignment tool markedly improved accuracy over baseline on several alignment quality metrics, for all phoneme categories. Accuracy for plosives and affricates in children's speech improved more than 40% over baseline. Performance matched existing methods using approximately 13 min of labeled data, while approximately 45–60 min of labeled alignments yielded significant further improvement.
Conclusion: The Wav2TextGrid tool enables alternate alignment workflows in which the forced alignments are, via training, directly tailored to match clinical-grade, manually provided alignments.
Supplemental Material: https://doi.org/10.23641/asha.28593971
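To make the boundary-level evaluation of such an aligner concrete, the sketch below compares hypothetical automatic phone boundaries against manual gold-standard ones. The phone tuples and the two metrics (mean absolute boundary error, fraction of boundaries within a tolerance) are common conventions in alignment work, not the paper's exact protocol, and the numbers are invented.

```python
# Hypothetical sketch: scoring automatic alignments against manual ones.
# Each phone entry is (label, start_sec, end_sec).

def boundary_errors(manual, auto):
    """Absolute differences between corresponding phone boundaries."""
    errors = []
    for (_, m_start, m_end), (_, a_start, a_end) in zip(manual, auto):
        errors.append(abs(m_start - a_start))
        errors.append(abs(m_end - a_end))
    return errors

def alignment_metrics(manual, auto, tol=0.020):
    """Mean absolute boundary error and fraction within `tol` seconds."""
    errs = boundary_errors(manual, auto)
    mae = sum(errs) / len(errs)
    within_tol = sum(e <= tol for e in errs) / len(errs)
    return mae, within_tol

# Invented example alignments for one short word.
manual = [("p", 0.10, 0.18), ("aa", 0.18, 0.35), ("t", 0.35, 0.42)]
auto   = [("p", 0.11, 0.19), ("aa", 0.19, 0.34), ("t", 0.34, 0.43)]
mae, pct = alignment_metrics(manual, auto)
```

A 20 ms tolerance is a widely used benchmark threshold in forced-alignment papers; the tolerance can be tightened for clinical-grade comparisons.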
March 2025 · 6 Reads
Journal of Clinical and Translational Science
Objectives/Goals: Aspiration causes or aggravates lung diseases. Bedside swallow evaluations are neither sensitive nor specific, while gold-standard tests for aspiration are invasive, uncomfortable, expose patients to radiation, and are resource intensive. We propose the development and validation of an AI model that analyzes voice to noninvasively predict aspiration.
Methods/Study Population: Retrospectively recorded [i] phonations from 163 unique ENT patients were analyzed for acoustic features including jitter, shimmer, and harmonics-to-noise ratio (HNR). Patients were classified into three groups based on videofluoroscopic swallow study (VFSS) findings: aspirators (Penetration-Aspiration Scale, PAS 6–8), probable aspirators (PAS 3–5), and non-aspirators (PAS 1–2). Multivariate analysis evaluated patient demographics, history of head and neck surgery, radiation, neurological illness, obstructive sleep apnea, esophageal disease, body mass index, and vocal cord dysfunction. Supervised machine learning using five-fold cross-validated neural additive modeling (NAM) was performed on the phonations of aspirators versus non-aspirators. The model was then validated on an independent, external database.
Results/Anticipated Results: Aspirators had quantifiably worse voice quality, with higher jitter and shimmer but a lower harmonics-to-noise ratio. NAM modeling classified aspirators and non-aspirators as distinct groups (aspirator NAM risk score 0.528 ± 0.2478 [mean ± SD] vs. non-aspirator [control] risk score 0.252 ± 0.241 [mean ± SD]; p …).
Discussion/Significance of Impact: We report the use of voice as a novel, noninvasive biomarker to detect aspiration risk using machine learning techniques. This tool has the potential to enable safe and early detection of aspiration in a variety of clinical settings, including intensive care units, wards, outpatient clinics, and remote monitoring.
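Jitter and shimmer, two of the acoustic features named above, are cycle-to-cycle perturbation measures. Below is a minimal sketch of their textbook "local" definitions (mean absolute difference of consecutive glottal periods or peak amplitudes, normalized by the mean), not the study's actual extraction pipeline, which would operate on recorded phonations.

```python
import numpy as np

def local_jitter(periods):
    """Local jitter: mean |difference of consecutive periods| / mean period."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))

def local_shimmer(amplitudes):
    """Local shimmer: same ratio, computed on per-cycle peak amplitudes."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))

# Invented cycle data for a ~100 Hz phonation.
periods = [0.0100, 0.0102, 0.0099, 0.0101]  # seconds per glottal cycle
jit = local_jitter(periods)
```

In practice these values are usually extracted with a tool such as Praat after pitch-period detection; the ratio form above is why higher values indicate a rougher, less periodic voice.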
March 2025 · 38 Reads
Millimeter Wave (mmWave) radar has emerged as a promising modality for speech sensing, offering advantages over traditional microphones. Prior works have demonstrated that radar captures motion signals related to vocal vibrations, but there is a gap in the understanding of the analytical connection between radar-measured vibrations and acoustic speech signals. We establish a mathematical framework linking radar-captured neck vibrations to speech acoustics and derive an analytical relationship between neck surface displacements and speech. We use data from 66 human participants and statistical spectral-distance analysis to empirically assess the model. Our results show that the radar-measured signal aligns more closely with the model-filtered vibration signal derived from speech than with raw speech itself. These findings provide a foundation for improved radar-based speech processing for applications in speech enhancement, coding, surveillance, and authentication.
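The core empirical claim, that the radar signal sits closer to a model-filtered version of speech than to raw speech, can be illustrated with a toy spectral-distance comparison. Everything below is a stand-in for the paper's actual model and metric: the moving-average "tissue filter," the log-spectral distance, and the synthetic signals are all invented for illustration.

```python
import numpy as np

def log_spectral_distance(x, y, n_fft=256, eps=1e-10):
    """RMS difference between the two log-magnitude spectra, in dB."""
    X = np.abs(np.fft.rfft(x, n_fft)) + eps
    Y = np.abs(np.fft.rfft(y, n_fft)) + eps
    d = 20 * np.log10(X / Y)
    return float(np.sqrt(np.mean(d ** 2)))

fs = 8000
t = np.arange(2048) / fs
# Toy "speech": a low-frequency voicing component plus a high-frequency one.
speech = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 2400 * t)
# Crude stand-in for the neck-tissue transfer function: a low-pass filter.
kernel = np.ones(16) / 16
filtered = np.convolve(speech, kernel, mode="same")
# Simulated radar measurement: the filtered signal plus sensor noise.
rng = np.random.default_rng(0)
radar = filtered + 0.01 * rng.standard_normal(len(t))

closer = log_spectral_distance(radar, filtered) < log_spectral_distance(radar, speech)
```

Because the low-pass filter suppresses the 2400 Hz component, the simulated radar signal is spectrally much nearer to the filtered speech, mirroring the direction of the paper's finding.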
February 2025 · 18 Reads
Existing methods for analyzing linguistic content from picture descriptions for assessment of cognitive-linguistic impairment often overlook the participant's visual narrative path, which typically requires eye tracking to assess. Spatio-semantic graphs are a useful tool for analyzing this narrative path from transcripts alone; however, they are limited by the need for manual tagging of content information units (CIUs). In this paper, we propose an automated approach for estimation of spatio-semantic graphs (via automated extraction of CIUs) from the Cookie Theft picture commonly used in cognitive-linguistic analyses. The method enables the automatic characterization of the visual semantic path during picture description. Experiments demonstrate that the automatic spatio-semantic graphs effectively differentiate between cognitively impaired and unimpaired speakers. Statistical analyses reveal that the features derived by the automated method produce comparable results to the manual method, with even greater group differences between clinical groups of interest. These results highlight the potential of the automated approach for extracting spatio-semantic features in developing clinical speech models for cognitive impairment assessment.
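To make the idea of a spatio-semantic graph concrete, the sketch below derives a narrative path from the order in which CIUs are mentioned in a transcript. The region names, picture coordinates, and path-length feature are invented for illustration and are not the paper's annotation scheme or feature set.

```python
# Hypothetical picture regions: CIU -> normalized (x, y) position in the scene.
REGIONS = {"boy": (0.2, 0.8), "stool": (0.25, 0.5), "cookie_jar": (0.3, 0.9),
           "mother": (0.7, 0.5), "sink": (0.75, 0.3)}

def narrative_path(cius):
    """Consecutive region-to-region edges, in mention order; unknown CIUs skipped."""
    path = [c for c in cius if c in REGIONS]
    return list(zip(path, path[1:]))

def total_path_length(cius):
    """Sum of Euclidean distances along the narrative path (a toy path feature)."""
    length = 0.0
    for a, b in narrative_path(cius):
        (x1, y1), (x2, y2) = REGIONS[a], REGIONS[b]
        length += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return length

# CIU mention order as it might be extracted from one transcript.
edges = narrative_path(["boy", "cookie_jar", "mother", "sink"])
```

Features such as the total traversal length or the number of returns to a region can then be compared between clinical groups, which is the role the paper's automatically extracted graphs play.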
February 2025 · 4 Reads · 1 Citation
Patterns
January 2025 · 6 Reads
Purpose: This commentary introduces how artificial intelligence (AI) can be leveraged to advance cross-language intelligibility assessment of dysarthric speech. Method: We propose a dual-component framework consisting of a universal module that generates language-independent speech representations and a language-specific intelligibility model that incorporates linguistic nuances. Additionally, we identify key barriers to cross-language intelligibility assessment, including data scarcity, annotation complexity, and limited linguistic insights, and present AI-driven solutions to overcome these challenges. Conclusion: Advances in AI offer transformative opportunities to enhance cross-language intelligibility assessment for dysarthric speech by balancing scalability across languages with adaptability to individual languages.
January 2025 · 9 Reads
Hypothesis testing is a statistical inference approach used to determine whether data support a specific hypothesis. An important type is the two-sample test, which evaluates whether two sets of data points come from identical distributions. This test is widely used, for example by clinical researchers comparing treatment effectiveness. This tutorial explores two-sample testing in a setting where an analyst has many features from two samples, but determining the sample membership (or labels) of these features is costly. In machine learning, a similar scenario is studied in active learning. This tutorial extends active-learning concepts to two-sample testing within this label-costly setting while maintaining statistical validity and high testing power. Additionally, the tutorial discusses practical applications of these label-efficient two-sample tests.
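The classical building block the tutorial's label-efficient variants extend is the permutation two-sample test. The minimal sketch below uses a difference-of-means statistic and assumes all labels are available, whereas the tutorial's setting queries labels selectively; the data and parameters are illustrative.

```python
import numpy as np

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test on |mean(x) - mean(y)|; returns a p-value."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle sample-membership labels
        stat = abs(np.mean(perm[:len(x)]) - np.mean(perm[len(x):]))
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing keeps p > 0

rng = np.random.default_rng(1)
same_p = permutation_test(rng.normal(0, 1, 100), rng.normal(0, 1, 100))
diff_p = permutation_test(rng.normal(0, 1, 100), rng.normal(1, 1, 100))
```

Because each permutation re-randomizes which points belong to which sample, the test is valid under minimal assumptions; the label-costly setting makes each such membership label expensive, which is what motivates the active-learning extension.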
Citations (57)
... Valuing and rewarding perceived novelty and potential impact over basic rigor and responsible reporting can lead researchers to inflate claims in hopes of acceptance in the most prestigious venues. It can also skew the literature in other ways, leading to so-called "publication bias" (Saidi, Dasarathy, and Berisha 2024). Here, in addition to spin and the aforementioned selective reporting of (usually positive) results, the role of the peer review system is also in question, given known biases on the part of reviewers that can lead to preferential treatment for researchers from specific regions, institutions, or demographics, or for certain types of research (Lee et al. 2013). ...
- Citing Article
February 2025
Patterns
... Children's ASR remains a considerable difficulty, as evidenced by the decline in performance of contemporary state-of-the-art systems compared to adult speech. The decline can be partly ascribed to the significant acoustic heterogeneity in the speech of kids resulting from developmental alterations in the language production equipment (Berisha & Liss, 2024). These physical alterations result in variations in formant and basic frequency positions. ...
- Citing Article
- Full-text available
August 2024
npj Digital Medicine
... Therefore, speech assessment may provide clues for differential diagnosis among neurological diseases with differing pathophysiology but similar clinical manifestations. Based on cross-sectional designs, digital speech biomarkers were researched mainly in Huntington's disease, multiple sclerosis, cerebellar ataxia, amyotrophic lateral sclerosis, multiple system atrophy, and progressive supranuclear palsy (Neumann et al., 2024; Noffs, 2020; Simmatis, 2023; Stegmann, 2024; Stegmann et al., 2020) (Table 3; see also Supplementary Material for associated references). Interestingly, in Huntington's disease and cerebellar ataxia, subliminal speech impairment has been detected in prodromal periods (Vogel et al., 2022; Kouba et al., 2023). ...
- Citing Article
June 2024
... This may be the result of the richness of the data used in the studies. Also, it is important to remember that due to the complex spatial-temporal data structure, EEG research can often suffer from non-reproducibility due to inadequate data analysis methods and overfitting [92]. HRV focuses on changes in the beat-to-beat interval, which are ANS activity-dependent [93,94], while ECG signal morphology should be understood as the shape of the voltage curve over time. ...
- Citing Conference Paper
June 2024
... It necessitates automating CIU annotation to efficiently extract the features from spatiosemantic graphs. Given the Cookie Theft picture's usage in the largest Alzheimer's disease and dementia cohort studies to date, the availability of these automatically extracted spatiosemantic features will serve as a great aid to clinicians in automatically assessing cognitive impairment [18]- [21]. ...
- Citing Article
- Full-text available
June 2024
... The approach presented in this work does not assume knowledge of constraints yet allows recovery of interpretable constraints. This work also deals with error detection in ML models, or "metacognition" [17,21,31]. Recent efforts include neurosymbolic methods [7,9,16] that rely on a metacognitive approach for detecting errors in the output of a trained machine learning model that assigns a label to a given sample. ...
- Citing Conference Paper
January 2024
... In [110], acoustic-based speech measures, such as the duration ratio of different syllable sequences, variability in syllable duration, and F2 slope, were compared with validated perceptual ratings of coordination, consistency, and speed in ALS speech. The Pearson and Spearman correlation coefficients confirmed the validity of these speech measures for profiling articulatory deficits in motor speech disorders. Similarly, in [111], Yawer et al. evaluated the external validity of a publicly available speech AI tool designed to assess stress. The tool's stress measures were compared with the well-established ...
- Citing Article
- Full-text available
November 2023
... First is the increasing availability of Big Data that enables deployment of data-hungry ML techniques. Big Data sources include large, curated databases such as the AphasiaBank or PhonBank (MacWhinney & Fromm, 2016; Rose & MacWhinney, 2014) as well as more ad hoc data sources that have emerged as speech-language researchers increasingly employ remote monitoring approaches (Cordella et al., 2022; Kadambi et al., 2023; Liu et al., 2023) or pool years of research data on a specific population or theme. These approaches include wearable sensors (Cao et al., 2023; Coyle & Sejdić, 2020; Van Stan et al., 2015), smartphone digital recordings (Connaghan et al., 2019; Kadambi et al., 2023; Van Stan et al., 2017), and ecological momentary assessment (Hester et al., 2023; Marks et al., 2021), to name a few. ...
- Citing Article
- Publisher preview available
August 2023
... When considering the ALS subtype, bulbar or spinal, the analysis showed that individuals with bulbar ALS retain a percentage of this function for a shorter period than those with spinal ALS. One study indicated that only 30% of individuals with ALS begin with bulbar onset (20), which corroborates the results of our study, considering the sample size, which shows a higher percentage of individuals with spinal ALS. However, bulbar-type subjects experience speech loss over a short period (4), as bulbar ALS involves the upper motor neurons (UMNs), lower motor neurons (LMNs), or both, located in the cortex and brainstem, causing difficulties in verbal fluency and voice, making speech slower, breathy, and hypernasal (1,5,10). ...
- Citing Article
June 2023
... Thus, they are adopted as powerful tools for multiple applications in public health and education. Therefore, the use of LLMs has been investigated for applications in the field of mental health as supportive tools [8] in the medical/clinical field: for diagnosis of mental distress [9], for cognitive impairments such as Alzheimer's disease [10], to predict the mini-mental state examination score related to cognitive impairments [11], for ASD detection [12], and more. ...
- Citing Conference Paper
June 2023