Samyak Shah’s research while affiliated with Johns Hopkins Medicine and other places


Publications (8)


Online click detection pipeline
a The participant was seated upright with his forearms on the armrests of a chair facing a computer monitor where the switch scanning speller application was displayed. b Position of both 64-electrode grids overlaid on the left cortical surface of a virtual reconstruction of the participant’s brain. The dorsal and ventral grids primarily covered cortical upper limb and face regions, respectively. The electrodes are numbered in increasing order from left to right and from bottom to top. Magenta: pre-central gyrus; Orange: post-central gyrus. c ECoG voltage signals were streamed in 100 ms packets to update a 256 ms running buffer for online spectral pre-processing. A sample of signals from 20 channels is shown. d A Fast Fourier Transform filter was used to compute the spectral power of the 256 ms buffer, from which the high gamma (HG, 110-170 Hz) log-power was placed into a 1 s running buffer (10 feature vectors). e The running buffer was then used as time history for the recurrent neural network (RNN). f An RNN-FC (RNN-fully connected) network then predicted rest or grasp every 100 ms depending on the higher output probability. g Each classification result was stored as a vote in a 7-vote running buffer such that the number of grasp votes had to surpass a predetermined voting threshold (4-vote threshold shown) to initiate a click. A lock-out period of 1 s immediately followed every detected click to prohibit multiple clicks from occurring during the same attempted movement. Transparent images of the hand are shown to indicate the attempted grasp before a click and a relaxed configuration otherwise. h Once a click was detected, the switch scanning speller application selected the highlighted row or element within that row. Two clicks were necessary to type a letter or autocomplete a word. The example sentence shown is “the birch canoe slid on the smooth planks.”
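
The voting stage in panels (f)-(g) lends itself to a compact sketch. Below is a minimal Python illustration, assuming a 0.5 rest/grasp probability cut-off, that reaching the vote threshold (rather than strictly exceeding it) triggers a click, and that the vote buffer is cleared on each click; the RNN itself is stubbed out as a per-100 ms grasp probability.

```python
from collections import deque

class ClickDetector:
    """Voting logic sketched from panels (f)-(g): one vote per 100 ms output."""

    def __init__(self, window=7, threshold=4, lockout_steps=10):
        self.votes = deque(maxlen=window)    # 7-vote running buffer
        self.threshold = threshold           # grasp votes needed for a click
        self.lockout_steps = lockout_steps   # 1 s lock-out at one step/100 ms
        self.lockout = 0

    def update(self, p_grasp: float) -> bool:
        """Feed one RNN grasp probability; return True to issue a click."""
        if self.lockout > 0:                 # lock-out suppresses new clicks
            self.lockout -= 1
            return False
        self.votes.append(p_grasp > 0.5)     # assumed rest/grasp cut-off
        if sum(self.votes) >= self.threshold:
            self.votes.clear()               # assumed reset after each click
            self.lockout = self.lockout_steps
            return True
        return False
```

Feeding ten consecutive probabilities above 0.5 into `update` yields exactly one click: the fourth vote triggers it, and the remaining six calls fall inside the lock-out.
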
Long-term use of a fixed click detector
a Training data was collected during 4 sessions that occurred within a period of 16 days. For each day, each sub-bar represents a separate block of training data collection (6 training blocks total). b Using the fixed detector, one block of switch scanning with the communication board was performed +21 days post-training data collection (purple). From Day +46 to Day +81, the fixed detector was used for switch scan spelling with a 7-vote threshold (blue). From Day +81 to Day +111, the fixed detector was used for switch scan spelling with a 4-vote threshold (teal). For each day, each sub-bar represents a separate spelling block of 3-4 sentences. The horizontal axis spanning both (a) and (b) represents the number of days relative to the last day of training data collection (Day 0).
Switch scanning applications
The participant was instructed to select an experimenter-cued graphical button (a) or to spell the sentence prompt (pale gray text) (b) by timing his clicks to the appropriate highlighted row or column during the switch scanning cycle. For a detailed description of (a) and (b), refer to Supplementary Figs. 8 and 9, respectively.
Long-term switch scan spelling performance
Across all subplots, triangular and circular markers represent metrics using a 7-vote and 4-vote voting threshold, respectively. a Sensitivity of click detection for each session. b True positive and false positive frequencies (TPF and FPF, which are represented by blue and green markers, respectively) were measured as detections per minute. c Latencies of grasp onset to correct algorithm detection (green markers) and on-screen click (blue markers). For each of the sessions using a 7-vote threshold, there were 284, 182, 372, 264, 451, 382, 233, 453, and 292 latency measurements, respectively. For each of the sessions using a 4-vote threshold, there were 135, 513, 591, 209, 289, 547, 511, 579, and 466 latency measurements, respectively. Mean latencies are shown as triangular or circular markers that are overlaid on top of box-and-whisker plots. The distribution of latencies across all spelling blocks in a session was used to compute the mean latency and box-and-whisker plot for that session. For each box-and-whisker plot, the median is shown as the center line, the quartiles are shown as the top and bottom edges of the box, and the whiskers are shown at 1.5 times the interquartile range. Using 7-vote and 4-vote voting thresholds, on-screen clicks occurred an average of 207 ms and 203 ms, respectively, after detection. Note that algorithmic detection latencies were not registered in the first six sessions. d Characters per minute (CPM) are assessed in terms of correct and wrong characters per minute (CCPM and WCPM, which are represented by blue and green markers, respectively). e Correct words per minute (CWPM).
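
Metrics like sensitivity, TPF, and FPF can be derived from detection and cue timestamps. The sketch below is illustrative only: the matching rule (a detection counts as a true positive if it falls within a tolerance window after an unmatched grasp onset) and the 2 s tolerance are assumptions, not the paper's exact definitions.

```python
import numpy as np

def detection_metrics(grasp_onsets, detections, block_minutes, tol=2.0):
    """grasp_onsets, detections: event times in seconds within one block."""
    grasp_onsets = np.asarray(grasp_onsets, dtype=float)
    detections = np.asarray(detections, dtype=float)
    matched = np.zeros(len(grasp_onsets), dtype=bool)
    true_pos = 0
    for t in detections:
        # first unmatched grasp onset that this detection could belong to
        ok = ~matched & (t >= grasp_onsets) & (t - grasp_onsets <= tol)
        if ok.any():
            matched[np.argmax(ok)] = True
            true_pos += 1
    false_pos = len(detections) - true_pos
    return {
        "sensitivity": matched.mean() if len(grasp_onsets) else float("nan"),
        "TPF_per_min": true_pos / block_minutes,
        "FPF_per_min": false_pos / block_minutes,
    }
```
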
Channel importance for grasp classification
Saliency maps for the model used online, a model using HG features from all channels except for channel 112, and a model using HG features only from channels covering cortical hand-knob are shown in (a), (c) and (e), respectively. Channels overlaid with larger and more opaque circles represent greater importance for grasp classification. White and transparent circles represent channels that were not used for model training. Mean confusion matrices from repeated 10-fold CV using models trained on HG features from all channels, all channels except for channel 112, and channels covering only the cortical hand-knob are shown in (b), (d), and (f), respectively. g Box-and-whisker plot showing the offline classification accuracies from 10 cross-validated testing folds using models with the above-mentioned channel subsets. Specifically, for one model configuration, each dot represents the average accuracy of the same validation fold across 20 repetitions of 10-fold CV (see Channel contributions and offline classification comparisons). Offline classification accuracies from CV-models trained on features from all channels were significantly higher than those from CV-models trained on features from channels only over cortical hand-knob (* P = 0.015, two-sided Wilcoxon rank-sum test with 3-way Holm-Bonferroni correction).
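
The caption does not specify how the saliency maps were computed, so the following is a generic input-gradient sketch in PyTorch, a common way to rank channels by importance: backpropagate the grasp-class score to the input features and average gradient magnitudes over batch and time. The class index and feature layout are assumptions.

```python
import torch

def channel_saliency(model, x):
    """x: (batch, time, channels) high-gamma features (assumed layout).
    Returns one non-negative importance score per channel."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                  # (batch, n_classes)
    logits[:, 1].sum().backward()      # assumes class 1 = grasp
    # mean |d score / d input| over batch and time -> per-channel importance
    return x.grad.abs().mean(dim=(0, 1))
```
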
A click-based electrocorticographic brain-computer interface enables long-term high-performance switch scan spelling
  • Article
  • Full-text available

October 2024

·

27 Reads

·

2 Citations

Communications Medicine

Daniel N. Candrea

·

Samyak Shah

·

[...]

·

Nathan E. Crone

Background Brain-computer interfaces (BCIs) can restore communication for movement- and/or speech-impaired individuals by enabling neural control of computer typing applications. Single-command click detectors provide a basic yet highly functional capability. Methods We sought to test the performance and long-term stability of click decoding using a chronically implanted high-density electrocorticographic (ECoG) BCI with coverage of the sensorimotor cortex in a human clinical trial participant (ClinicalTrials.gov, NCT03567213) with amyotrophic lateral sclerosis. We trained the participant’s click detector using a small amount of training data (<44 min across 4 days) collected up to 21 days prior to BCI use, and then tested it over a period of 90 days without any retraining or updating. Results Using the click detector to navigate a switch scanning speller interface, the study participant maintained a median spelling rate of 10.2 characters per min. Though a transient reduction in signal power modulation interrupted use of the fixed model, a new click detector achieved comparable performance despite being trained with even less data (<15 min, within 1 day). Conclusions These results demonstrate that a click detector can be trained with a small ECoG dataset while retaining robust performance for extended periods, providing functional text-based communication to BCI users.


Iterative alignment discovery of speech-associated neural activity

August 2024

·

2 Reads

·

1 Citation

Journal of Neural Engineering

Objective. Brain–computer interfaces (BCIs) have the potential to preserve or restore speech in patients with neurological disorders that weaken the muscles involved in speech production. However, successful training of low-latency speech synthesis and recognition models requires alignment of neural activity with intended phonetic or acoustic output with high temporal precision. This is particularly challenging in patients who cannot produce audible speech, as ground truth with which to pinpoint neural activity synchronized with speech is not available. Approach. In this study, we present a new iterative algorithm for neural voice activity detection (nVAD) called iterative alignment discovery dynamic time warping (IAD-DTW) that integrates DTW into the loss function of a deep neural network (DNN). The algorithm is designed to discover the alignment between a patient’s electrocorticographic (ECoG) neural responses and their attempts to speak during collection of data for training BCI decoders for speech synthesis and recognition. Main results. To demonstrate the effectiveness of the algorithm, we tested its accuracy in predicting the onset and duration of acoustic signals produced by able-bodied patients with intact speech undergoing short-term diagnostic ECoG recordings for epilepsy surgery. We simulated a lack of ground truth by randomly perturbing the temporal correspondence between neural activity and an initial single estimate for all speech onsets and durations. We examined the model’s ability to overcome these perturbations to estimate ground truth. IAD-DTW showed no notable degradation (<1% absolute decrease in accuracy) in performance in these simulations, even in the case of maximal misalignments between speech and silence. Significance. IAD-DTW is computationally inexpensive and can be easily integrated into existing DNN-based nVAD approaches, as it pertains only to the final loss computation. This approach makes it possible to train speech BCI algorithms using ECoG data from patients who are unable to produce audible speech, including those with Locked-In Syndrome.
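
The core idea, DTW discovering the correspondence between a predicted frame-wise speech trace and an imperfect initial label estimate, can be sketched in isolation. The Python below is plain DTW with backtracking, not the paper's IAD-DTW loss (which embeds the alignment inside a DNN's loss computation); the absolute-difference local cost is an assumption.

```python
import numpy as np

def dtw_path(pred, labels):
    """pred: (T,) predicted speech probabilities in [0, 1].
    labels: (S,) initial 0/1 speech-label estimate.
    Returns the minimum-cost (t, s) alignment path."""
    T, S = len(pred), len(labels)
    cost = np.abs(np.asarray(pred)[:, None] - np.asarray(labels)[None, :])
    acc = np.full((T + 1, S + 1), np.inf)   # accumulated-cost matrix
    acc[0, 0] = 0.0
    for t in range(1, T + 1):
        for s in range(1, S + 1):
            acc[t, s] = cost[t - 1, s - 1] + min(
                acc[t - 1, s], acc[t, s - 1], acc[t - 1, s - 1])
    path, t, s = [], T, S                   # backtrack from the end
    while t > 0 and s > 0:
        path.append((t - 1, s - 1))
        moves = {(t - 1, s): acc[t - 1, s],
                 (t, s - 1): acc[t, s - 1],
                 (t - 1, s - 1): acc[t - 1, s - 1]}
        t, s = min(moves, key=moves.get)
    return path[::-1]
```

In an iterative scheme, the recovered path would be used to re-time the speech labels before the next training round, which is how misaligned initial estimates can be progressively corrected.
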



Overview of the closed-loop speech synthesizer. (A) Neural activity is acquired from a subset of 64 electrodes (highlighted in orange) from two 8 × 8 ECoG electrode arrays covering sensorimotor areas for face and tongue, and for upper limb regions. (B) The closed-loop speech synthesizer extracts high-gamma features to reveal speech-related neural correlates of attempted speech production and propagates each frame to a neural voice activity detection (nVAD) model (C) that identifies and extracts speech segments (D). When the participant finishes speaking a word, the nVAD model forwards the high-gamma activity of the whole extracted sequence to a bidirectional decoding model (E) which estimates acoustic features (F) that can be transformed into an acoustic speech signal. (G) The synthesized speech is played back as acoustic feedback.
Evaluation of the synthesized words. (A) Visual example of time-aligned original and reconstructed acoustic speech waveforms and their spectral representations (B) for 6 words that were recorded during one of the closed-loop sessions. Speech spectrograms are shown between 100 and 8000 Hz with a logarithmic frequency range to emphasize formant frequencies. (C) The confusion matrix between human listeners and ground truth. (D) Distribution of accuracy scores from all who performed the listening test for the synthesized speech samples. Dashed line shows chance performance (16.7%).
Changes in high-gamma activity across motor, premotor and somatosensory cortices trigger detection of speech output. (A) Saliency analysis shows that changes in high-gamma activity predominantly from 300 to 100 ms prior to predicted speech onset (PSO) strongly influenced the nVAD model’s decision. Electrodes covering motor, premotor and somatosensory cortices had the strongest impact on model decisions, while electrodes covering the dorsal laryngeal area only modestly added information to the prediction. Grey electrodes were either not used, were bad channels, or had no notable contributions. (B) Illustration of the general procedure for computing relevance scores. For each time step t, relevance scores were computed by backpropagation through time across all previous high-gamma frames X_t. Predictions of 0 correspond to no-speech, while 1 represents speech frames. (C) Temporal progression of mean magnitudes of the absolute relevance score in 3 selected channels that strongly contributed to PSOs. Shaded areas reflect the standard error of the mean (N = 60). Relevance scores are in units of 10⁻³.
System overview of the closed-loop architecture. The computational graph is designed as a directed acyclic network. Solid shapes represent ezmsg units, dotted ones represent initialization parameters. Each unit is responsible for a self-contained task and distributes its output to all of its subscribers. Logger units run in separate processes so that they do not interrupt the main processing chain for synthesizing speech.
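
The figure describes an ezmsg graph; rather than guess at that framework's API, the sketch below uses a generic asyncio stand-in to show the same pattern: each unit consumes from its inbox and fans its output out to every subscriber, so a logger can tap the stream without blocking the synthesis chain. All unit names and string payloads are illustrative.

```python
import asyncio

class Unit:
    """Generic pub/sub unit (a stand-in, not the actual ezmsg API)."""

    def __init__(self, work):
        self.work = work                   # per-message processing function
        self.inbox = asyncio.Queue()
        self.subscribers = []              # downstream inboxes

    def subscribe(self, other: "Unit"):
        self.subscribers.append(other.inbox)

    async def run(self):
        while True:
            out = self.work(await self.inbox.get())
            for q in self.subscribers:     # fan out to all subscribers
                q.put_nowait(out)

async def main():
    features = Unit(lambda ecog: f"hg({ecog})")   # feature-extraction stub
    nvad = Unit(lambda hg: f"segment({hg})")      # voice-activity stub
    logger = Unit(print)                          # side-channel logger
    features.subscribe(nvad)
    features.subscribe(logger)   # logging taps the stream without blocking it
    tasks = [asyncio.create_task(u.run()) for u in (features, nvad, logger)]
    features.inbox.put_nowait("frame0")
    await asyncio.sleep(0.1)     # let the message propagate, then shut down
    for t in tasks:
        t.cancel()

asyncio.run(main())
```
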
Online speech synthesis using a chronically implanted brain–computer interface in an individual with ALS

April 2024

·

185 Reads

·

14 Citations

Brain–computer interfaces (BCIs) that reconstruct and synthesize speech using brain activity recorded with intracranial electrodes may pave the way toward novel communication interfaces for people who have lost their ability to speak, or who are at high risk of losing this ability, due to neurological disorders. Here, we report online synthesis of intelligible words using a chronically implanted brain-computer interface (BCI) in a man with impaired articulation due to ALS, participating in a clinical trial (ClinicalTrials.gov, NCT03567213) exploring different strategies for BCI communication. The 3-stage approach reported here relies on recurrent neural networks to identify, decode and synthesize speech from electrocorticographic (ECoG) signals acquired across motor, premotor and somatosensory cortices. We demonstrate a reliable BCI that synthesizes commands freely chosen and spoken by the participant from a vocabulary of 6 keywords previously used for decoding commands to control a communication board. Evaluation of the intelligibility of the synthesized speech indicates that 80% of the words can be correctly recognized by human listeners. Our results show that a speech-impaired individual with ALS can use a chronically implanted BCI to reliably produce synthesized words while preserving the participant’s voice profile, and provide further evidence for the stability of ECoG for speech-based BCIs.


Schematics of the speech BCI for functional control. a) Neural signals were acquired from two 64‐channel ECoG arrays implanted over the motor and somatosensory areas responsible for upper extremity and speech functions. Only the inferior array was used in this study. b) A sample of high gamma energy (HGE, 70–170 Hz, z‐scored) for six channels. c) 1‐s rolling average of channel‐averaged HGE (updated every 10 ms). The peak of this signal was used to detect speech intent. Once the intended speech was detected, a decoding window consisting of HGE 2 s before and 0.5 s after the peak was sent to the classifier. d) The CNN model (InceptionTime[25]) classified the window of HGE into commands that facilitated navigation of a communication board or control of external devices.
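
Panel (c)'s detection step reduces to a rolling average and a peak rule. A minimal sketch, assuming a fixed detection threshold and offline (whole-recording) processing rather than the paper's real-time update loop:

```python
import numpy as np

STEP = 0.01                    # 10 ms frame rate
AVG_WIN = int(1.0 / STEP)      # 1 s rolling-average window (100 frames)

def detect_and_window(hge, threshold):
    """hge: (frames, channels) z-scored high-gamma energy.
    Returns the decoding window around the detected peak, or None."""
    trace = hge.mean(axis=1)                        # channel-averaged HGE
    kernel = np.ones(AVG_WIN) / AVG_WIN
    smooth = np.convolve(trace, kernel, mode="same")
    peak = int(np.argmax(smooth))
    if smooth[peak] < threshold:
        return None                                 # no speech intent
    lo = max(0, peak - int(2.0 / STEP))             # 2 s before the peak
    hi = min(len(hge), peak + int(0.5 / STEP))      # 0.5 s after the peak
    return hge[lo:hi]                               # window for the CNN
```
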
Stable performance of the BCI in online self‐paced experiments over 3 months. a) Online accuracy of the BCI system. Each dot represents one session. Average chance = 16.16% (n = 10000 simulations, dashed line). The blue line is the linear least squares regression line between accuracy and days after implant. b) Correct decoding results performed by the BCI per minute. Each dot represents one session. The blue line is the linear least squares regression line between correct decodes per minute and days after implant. c) Number of false detections (blue dot) and missed detections (purple triangle) per minute. Each symbol represents one experiment session. d) Time between speech offset and when the decoding result was registered by the BCI system for every successful decode per day. For all boxplots, the center line represents the median, and the top and bottom edges of the box represent the quartiles. Data beyond 1.5 times the interquartile range are shown as outlier points, and the whiskers mark the maximum and minimum of non-outliers.
Stability of the event‐related high gamma activities acquired from the ECoG arrays. a) Anatomical location of the ECoG array used in this study. Example channels in (b) are highlighted. b) Examples of event‐related HGE in both training and real‐time usage phases for two different commands. A vertical dotted line at 0 s indicates speech onset. The shaded area represents the 95% CI. CB: Communication Board (real‐time usage). WP: Word Production (training data). c) Correlation between the real‐time usage trials and average training data per channel. For each real‐time usage trial, the Pearson's correlation coefficient between its HGE and the average HGE of the corresponding command in the training data collection phase was calculated. Each dot represents the average (weighted by the frequency of the command) of the correlation coefficients per usage day per channel. The blue line represents the linear least squares regression line between channel‐averaged correlation and days after implant. d) Rate of change for correlation. Each dot represents one channel. Filled dots represent statistically significant linear relationships between correlation values and days after implant (p < 0.05, Wald test with t‐distribution). Unfilled dots indicate that a relationship could not be established (p ≥ 0.05). e) Channel‐average of logarithmic HGE (unnormalized) for each command during online usage. Lines represent the linear least squares regression lines between HGE and days after implant for each command. f) Same as (d), but for HGE.
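
The per-channel stability measure in panel (c) is a straightforward computation. A sketch, assuming trial and template HGE are already aligned to speech onset and share the same (time, channels) shape:

```python
import numpy as np

def trial_training_correlation(trial_hge, train_mean_hge):
    """trial_hge: (time, channels) HGE for one usage trial.
    train_mean_hge: (time, channels) average training HGE, same command.
    Returns one Pearson correlation coefficient per channel."""
    n_ch = trial_hge.shape[1]
    r = np.empty(n_ch)
    for ch in range(n_ch):
        r[ch] = np.corrcoef(trial_hge[:, ch], train_mean_hge[:, ch])[0, 1]
    return r
```
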
Electrode contribution during the study period. a) MRI reconstruction of the participant's brain, overlaid on top of which are the ECoG grids implanted as part of the clinical trial. Electrodes used in this study are colored in red (motor) and blue (sensory). The grey electrodes were not used in this study. b) Simulated online accuracy when the decoding model is trained with both motor and sensory electrodes, only motor electrodes, only sensory electrodes, and only the most salient electrode. Chance = 16.67% (shown as dashed line). Each box corresponds to the accuracy for n = 33 testing days (****p < 0.0001, two‐sided Mann–Whitney‐Wilcoxon test with Bonferroni correction). c) Relative contribution of each of the electrodes to the decoding results for each real‐time usage month.
Performance in functional control and during silent (mimed) speech. a) Online accuracy of the BCI system during functional control. Each data point represents one session. Chance = 16.67% (dashed line). The blue line is the linear least squares regression line between accuracy and days after implant. b) Correct decoding results performed by the BCI per minute during functional control. Each dot represents one session. The blue line is the linear least squares regression line between correct decodes per minute and days after implant. c) Number of false detections (blue dots) and missed detections (purple triangles) per minute during functional control. Each symbol represents one session. d) Online accuracy of silent speech decoder. Each dot represents one day. Average chance = 16.73% (dashed line, n = 10000 simulations). The purple line represents the linear least squares regression between accuracy and days after implant. e) Correct decoding results performed by the BCI per minute using the silent speech decoder. Each dot represents one day. The purple line is the linear least squares regression line between correct decodes per minute and days after implant.
Stable Decoding from a Speech BCI Enables Control for an Individual with ALS without Recalibration for 3 Months

October 2023

·

134 Reads

·

26 Citations

Brain‐computer interfaces (BCIs) can be used to control assistive devices by patients with neurological disorders like amyotrophic lateral sclerosis (ALS) that limit speech and movement. For assistive control, it is desirable for BCI systems to be accurate and reliable, preferably with minimal setup time. In this study, a participant with severe dysarthria due to ALS operates computer applications with six intuitive speech commands via a chronic electrocorticographic (ECoG) implant over the ventral sensorimotor cortex. Speech commands are accurately detected and decoded (median accuracy: 90.59%) throughout a 3‐month study period without model retraining or recalibration. Use of the BCI does not require exogenous timing cues, enabling the participant to issue self‐paced commands at will. These results demonstrate that a chronically implanted ECoG‐based speech BCI can reliably control assistive devices over long time periods with only initial model training and calibration, supporting the feasibility of unassisted home use.


A click-based electrocorticographic brain-computer interface enables long-term high-performance switch-scan spelling

July 2023

·

69 Reads

·

2 Citations

Background Brain-computer interfaces (BCIs) can restore communication in movement- and/or speech-impaired individuals by enabling neural control of computer typing applications. Single command “click” decoders provide a basic yet highly functional capability. Methods We sought to test the performance and long-term stability of click-decoding using a chronically implanted high density electrocorticographic (ECoG) BCI with coverage of the sensorimotor cortex in a human clinical trial participant (ClinicalTrials.gov, NCT03567213) with amyotrophic lateral sclerosis (ALS). We trained the participant’s click decoder using a small amount of training data (< 44 minutes across four days) collected up to 21 days prior to BCI use, and then tested it over a period of 90 days without any retraining or updating. Results Using this click decoder to navigate a switch-scanning spelling interface, the study participant was able to maintain a median spelling rate of 10.2 characters per min. Though a transient reduction in signal power modulation interrupted testing with this fixed model, a new click decoder achieved comparable performance despite being trained with even less data (< 15 min, within one day). Conclusion These results demonstrate that a click decoder can be trained with a small ECoG dataset while retaining robust performance for extended periods, providing functional text-based communication to BCI users.


Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS

July 2023

·

116 Reads

·

4 Citations

Recent studies have shown that speech can be reconstructed and synthesized using only brain activity recorded with intracranial electrodes, but until now this has only been done using retrospective analyses of recordings from able-bodied patients temporarily implanted with electrodes for epilepsy surgery. Here, we report online synthesis of intelligible words using a chronically implanted brain-computer interface (BCI) in a clinical trial participant (ClinicalTrials.gov, NCT03567213) with dysarthria due to amyotrophic lateral sclerosis (ALS). We demonstrate a reliable BCI that synthesizes commands freely chosen and spoken by the user from a vocabulary of 6 keywords originally designed to allow intuitive selection of items on a communication board. Our results show for the first time that a speech-impaired individual with ALS can use a chronically implanted BCI to reliably produce synthesized words that are intelligible to human listeners while preserving the participant's voice profile.


Seizure triggers identified postictally using a smart watch reporting system

January 2022

·

45 Reads

·

5 Citations

Epilepsy & Behavior

Persons with epilepsy (PWE) often report that seizure triggers can influence the occurrence and timing of seizures. Some previous studies of seizure triggers have relied on retrospective daily seizure diaries or surveys pertaining to all past seizures, recent and/or remote, in respondents. To assess the characteristics of seizure triggers at the granularity of individual seizures, we used a seizure-tracking app, called EpiWatch, on a smart watch system (Apple Watch and iPhone) in a national study of PWE. Participants tracked seizures during a 16-month study period using the EpiWatch app. Seizure tracking was initiated during a pre-ictal state or as the seizure was occurring and included collection of biosensor data, responsiveness testing, and completion of an immediate post-seizure survey. The survey evaluated seizure types, auras or warning symptoms, loss of awareness, use of rescue medication, and seizure triggers for each tracked seizure. Two hundred thirty-four participants tracked 2493 seizures. Ninety-six participants reported triggers in 650 seizures: stress (65.8%), lack of sleep (30.5%), menstrual cycle (19.7%), and overexertion (18%) were the most common. Participants often reported multiple combined triggers, most frequently stress with lack of sleep, overexertion, or menses. Participants who reported triggers were more likely to be taking 3 or more anti-seizure medications compared to participants who did not report triggers. Participants were able to interact with the app and use mobile technology in this national study to record seizures and report common seizure triggers. These findings demonstrate the promise of longitudinal, self-reported data to improve our understanding of epilepsy and its related comorbidities.

Citations (5)


... For the past few decades, a major focus of the BCI field has been decoding neural activity associated with attempted movements to control a computer cursor. [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17] By controlling a computer cursor with their neural signals, a person with paralysis can type sentences using an on-screen keyboard, send emails and text messages, or use a web browser and many other software applications. [...] outcomes; instead, it describes scientific and engineering discoveries that were made using data collected in the context of the ongoing clinical trial. ...

Reference:

Speech motor cortex enables BCI cursor control and click
A click-based electrocorticographic brain-computer interface enables long-term high-performance switch scan spelling

Communications Medicine

... Speech production is a highly complex process involving an extensive and interconnected network of brain regions 2,3. Yet, the current state-of-the-art speech BCIs are primarily focused on decoding signals from the sensorimotor cortex [4][5][6][7][8][9]. While this region has shown promise in enabling communication for patients with dysarthria or anarthria due to amyotrophic lateral sclerosis (ALS) or a brain-stem stroke, it may not cover the full intent of the user 10 nor translate to individuals with different diseases or disease progressions. ...

Online speech synthesis using a chronically implanted brain–computer interface in an individual with ALS

... Speech production is a highly complex process involving an extensive and interconnected network of brain regions 2,3. Yet, the current state-of-the-art speech BCIs are primarily focused on decoding signals from the sensorimotor cortex [4][5][6][7][8][9]. While this region has shown promise in enabling communication for patients with dysarthria or anarthria due to amyotrophic lateral sclerosis (ALS) or a brain-stem stroke, it may not cover the full intent of the user 10 nor translate to individuals with different diseases or disease progressions. ...

Stable Decoding from a Speech BCI Enables Control for an Individual with ALS without Recalibration for 3 Months

... In the late stages of their disease, they become completely locked in and may benefit from brain-computer interfaces (BCIs) to communicate [2]. Among various types of BCIs, speech neuroprostheses measure the electrical activity in brain areas involved in language production, decode activation patterns during attempted utterances, and convert predictions into useful commands to restore communication [3][4][5]. In this study, we focus on the use of intracranial electrodes, which record the summed local field potentials arising from underlying neuronal populations [6]. ...

Online speech synthesis using a chronically implanted brain-computer interface in an individual with ALS

... As a diagnostic tool, smartphones have been used in Neurology and led to innovations in the longitudinal care of patients for a wide range of disorders including stroke, multiple sclerosis, Parkinson's disease, sleep disorders, and epilepsy (8,9,26). Looking at publications on Neurology and smartphone use over time, research was initially concentrated on brain tumors and safety risks attributed to smartphones (27-29), but transitioned to focus on the application of this technology for disease detection, clinical management, and as a tool for applied medical science more broadly (17)(18)(19)(20)(21)(22)(23)(24)(25). Emerging trends have become increasingly evident by elucidating the exponential rise in the annual number of publications during the COVID-19 pandemic in developed and developing countries. ...

Seizure triggers identified postictally using a smart watch reporting system
  • Citing Article
  • January 2022

Epilepsy & Behavior