Xianda Hou’s research while affiliated with University of California, Davis and other places


Publications (6)


Figure 2. Accurate cursor control and click. a. Trial-averaged firing rates (mean ± s.e.) recorded from four example electrodes during the Grid Evaluation Task. Activity is aligned to when the cursor entered the cued target (left), and then to when the click decoder registered a click (right). Firing rates were Gaussian smoothed (std 20 ms) before trial-averaging. b. The Grid Evaluation Task. The participant attempted to move the cursor (white circle) to the cued target (green square) and click on it. c. Location of every click that was performed, relative to the current trial's cued target (central black square), during blocks with the 6x6 grid (left) and with the 14x14 grid (right). Small gray squares indicate where the cursor began each trial, relative to the trial's cued target. d. Timeline of the seventeen 3-minute Grid Evaluation Task blocks. Each point represents a trial and indicates the trial length and trial result (success or failure). Each gray region is a single block. e. T15's online bitrate performance in the Grid Evaluation Task, compared to the highest-performing prior dPCG cursor control study. Circles are individual blocks (only shown for this study). Triangles are averages per participant (from this study and others).
Figure 3. The dorsal 6v array contributed the most to cursor velocity decoding. a. Zoomed-in view of T15's array locations shown in Fig. 1b. Triangles indicate arrays providing the best decoding performance for speech (orange) and for cursor control (crimson). The best speech arrays were identified in Card et al. 2024. b. Offline analysis of cursor decoders trained using neural features from […]
Figure 5. Participant T15 controlled his personal desktop computer with the cursor BCI. a. Over-the-shoulder view of T15 neurally controlling the mouse cursor on his personal computer. The red arrow points to the cursor. b-c. Screenshots of T15's personal computer usage, with cursor trajectories (pink lines) overlaid. Cursor position every 250 ms (circles) and clicks (stars) are also drawn. In b., T15 first opened the Settings application (left) and then switched his computer to Light Mode (right). In c., T15 opened Netflix from Chrome's New Tab menu (top) and then selected his Netflix user (bottom).
Speech motor cortex enables BCI cursor control and click
  • Preprint
  • File available

November 2024 · 27 Reads · Xianda Hou · Nicholas S. Card · [...] · David M. Brandman

Decoding neural activity from ventral (speech) motor cortex is known to enable high-performance speech brain-computer interface (BCI) control. It was previously unknown whether this brain area could also enable computer control via neural cursor and click, as is typically associated with dorsal (arm and hand) motor cortex. We recruited a clinical trial participant with ALS and implanted intracortical microelectrode arrays in ventral precentral gyrus (vPCG), which the participant used to operate a speech BCI in a prior study. We developed a cursor BCI driven by the participant's vPCG neural activity, and evaluated performance on a series of target selection tasks. The reported vPCG cursor BCI enabled rapidly-calibrating (40 seconds), accurate (2.90 bits per second) cursor control and click. The participant also used the BCI to control his own personal computer independently. These results suggest that placing electrodes in vPCG to optimize for speech decoding may also be a viable strategy for building a multi-modal BCI which enables both speech-based communication and computer control via cursor and click.
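The 2.90 bits-per-second figure is most likely the achieved bitrate conventionally reported for grid selection tasks (Nuyujukian et al., 2015): each selection among N on-screen targets conveys log2(N − 1) bits, and incorrect selections are penalized. Below is a minimal sketch of that standard metric, not the preprint's own evaluation code, and with illustrative trial counts:

```python
import math

def achieved_bitrate(n_targets: int, correct: int, incorrect: int,
                     elapsed_s: float) -> float:
    """Achieved bitrate for a grid selection task.

    Each selection among N targets conveys log2(N - 1) bits; incorrect
    selections are penalized by subtracting them from the correct count.
    """
    bits_per_selection = math.log2(n_targets - 1)
    net_selections = max(correct - incorrect, 0)
    return bits_per_selection * net_selections / elapsed_s

# Hypothetical counts for a 3-minute block on the 14x14 grid (196 targets):
print(achieved_bitrate(n_targets=196, correct=70, incorrect=2,
                       elapsed_s=180.0))  # ~2.87 bits per second
```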


An instantaneous voice synthesis neuroprosthesis

August 2024 · 49 Reads · 2 Citations

Brain-computer interfaces (BCIs) have the potential to restore communication to people who have lost the ability to speak due to neurological disease or injury. BCIs have been used to translate the neural correlates of attempted speech into text. However, text communication fails to capture the nuances of human speech, such as prosody, intonation, and immediately hearing one's own voice. Here, we demonstrate a "brain-to-voice" neuroprosthesis that instantaneously synthesizes voice with closed-loop audio feedback by decoding neural activity from 256 microelectrodes implanted into the ventral precentral gyrus of a man with amyotrophic lateral sclerosis and severe dysarthria. We overcame the challenge of lacking ground-truth speech for training the neural decoder and were able to accurately synthesize his voice. Along with phonemic content, we were also able to decode paralinguistic features from intracortical activity, enabling the participant to modulate his BCI-synthesized voice in real time to change intonation, emphasize words, and sing short melodies. These results demonstrate the feasibility of enabling people with paralysis to speak intelligibly and expressively through a BCI.
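The abstract does not spell out the synthesis pipeline. As a rough sketch of the final step only, assuming the decoder emits a mel spectrogram at 10 ms hops, a Griffin-Lim inverter can turn it into a waveform; the actual system presumably uses a trained neural vocoder rather than this stand-in, and all parameters below are assumptions:

```python
import numpy as np
import librosa
import soundfile as sf

# Hypothetical decoder output: an (n_mels x n_frames) mel spectrogram
# predicted from binned neural features. Random values stand in for a
# real prediction; the actual system presumably uses a learned vocoder.
decoded_mel = np.abs(np.random.randn(80, 200)).astype(np.float32)

waveform = librosa.feature.inverse.mel_to_audio(
    decoded_mel,
    sr=16000,        # assumed audio sampling rate
    n_fft=1024,
    hop_length=160,  # 10 ms hops, matching typical 10 ms neural bins
)
sf.write("synthesized.wav", waveform, 16000)
```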


An Accurate and Rapidly Calibrating Speech Neuroprosthesis

August 2024 · 34 Reads · 44 Citations

The New England Journal of Medicine

Background: Brain-computer interfaces can enable communication for people with paralysis by transforming cortical activity associated with attempted speech into text on a computer screen. Communication with brain-computer interfaces has been restricted by extensive training requirements and limited accuracy.

Methods: A 45-year-old man with amyotrophic lateral sclerosis (ALS) with tetraparesis and severe dysarthria underwent surgical implantation of four microelectrode arrays into his left ventral precentral gyrus 5 years after the onset of the illness; these arrays recorded neural activity from 256 intracortical electrodes. We report the results of decoding his cortical neural activity as he attempted to speak in both prompted and unstructured conversational contexts. Decoded words were displayed on a screen and then vocalized with the use of text-to-speech software designed to sound like his pre-ALS voice.

Results: On the first day of use (25 days after surgery), the neuroprosthesis achieved 99.6% accuracy with a 50-word vocabulary. Calibration of the neuroprosthesis required 30 minutes of cortical recordings while the participant attempted to speak, followed by subsequent processing. On the second day, after 1.4 additional hours of system training, the neuroprosthesis achieved 90.2% accuracy using a 125,000-word vocabulary. With further training data, the neuroprosthesis sustained 97.5% accuracy over a period of 8.4 months after surgical implantation, and the participant used it to communicate in self-paced conversations at a rate of approximately 32 words per minute for more than 248 cumulative hours.

Conclusions: In a person with ALS and severe dysarthria, an intracortical speech neuroprosthesis reached a level of performance suitable to restore conversational communication after brief training. (Funded by the Office of the Assistant Secretary of Defense for Health Affairs and others; BrainGate2 ClinicalTrials.gov number, NCT00912041.)
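The accuracy figures here are word accuracies, i.e. 1 − word error rate (WER), where WER counts substitutions, deletions, and insertions against the reference length. A minimal sketch of the standard Levenshtein-based metric (not the study's own evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein alignment over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of seven -> WER ~14.3%, word accuracy ~85.7%:
print(word_error_rate("i would like a glass of water",
                      "i would like a drink of water"))
```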


Software architecture schematic. (a) BRAND consists of a set of processes, or ‘nodes’, that each receive inputs and/or produce outputs during an experiment. (b) If nodes were run sequentially (as if they are in a script), all nodes would need to finish processing a given sample before the next one could be processed. Delays in any part of the processing chain would cause the whole system to fall behind and delay critical events like acquiring an incoming sample. (c) In BRAND, nodes run in parallel and communicate asynchronously, allowing them to maximize the rate at which data are processed and minimize the chance that delays in downstream nodes would cause the system to fall behind.
BRAND achieves low-latency inter-node communication. (a) To test inter-node communication latency, a publisher node sends 30 kHz neural data (grouped in 1 ms packets) to a subscriber node via the Redis database. Violin plot of the resulting latency measurements showing that the inter-node communication latency is consistently below 600 microseconds even as (b) the channel count is scaled up to 1024 channels, (c) the sampling rate is changed, and (d) additional subscriber nodes are added. Vertical lines indicate the location of the median in each violin. Histograms of these data show the distribution of latency measurements for each (e) channel count, (f) output rate, and (g) number of nodes.
BRAND can be used for low-latency iBCI control. (a) To test end-to-end iBCI control latency, we ran a graph that received 30 kHz 96-channel neural spiking data via UDP (Ethernet) from two Blackrock NSPs (total of 192 channels), extracted spiking features at 1 kHz, binned spikes into 10 millisecond bins, ran decoding, and updated the location of the cursor in the task. This test used a recurrent neural network (RNN) decoder. This graph was benchmarked using simulated data. (b) Latency measurements for each node were plotted as histograms (N = 30 000 packets). (c) The cumulative latency is plotted relative to the time at which each node (vertical axis) wrote its output to the Redis database. On the horizontal axis, zero is the time at which the last sample in each bin was received over the network from the NSPs. (d) Cursor positions during iBCI-enabled cursor control.
BRAND runs ANN latent variable models with low latency. (a) To test the inference latency of LFADS and NDT, we inserted them into an iBCI control graph that receives 256 channels of simulated threshold crossings at 1 kHz, bins them, runs inference with LFADS or NDT, runs decoding, and updates the task state. Both LFADS and NDT used a sequence length of 30 bins. (b) LFADS and NDT use different types of sequence models, an RNN and a Transformer, respectively. Reprinted from Ye and Pandarinath 2021 [19]. (c) NDT inference times were consistently below 2 ms, while LFADS inference times were consistently below 6 ms. Reproduced from [19]. CC BY 4.0.
BRAND enables low-latency, real-time simulation of neural data. (a) In the speech simulator, spoken audio is translated into spectral features and, from there, into neural firing rates with a cosine tuning model. These firing rates are used to generate and broadcast 30 kHz voltage recordings as Ethernet packets. In the cursor control simulator, computer mouse movements are translated into neural firing rates with a cosine tuning model. (b) Examples of data recorded from the speech simulator. (c) Examples of data recorded from the cursor control simulator. (d) Latency of each node in the speech simulator and (e) cumulative latency. (f) Latency of each node in the cursor control simulator and (g) cumulative latency.
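The cosine tuning model in the caption above is the classic Georgopoulos-style encoding: a channel's firing rate rises with the dot product between the movement direction and that channel's preferred direction. Here is a minimal sketch of the cursor-control variant; the channel count and parameters are illustrative, as the paper's simulator details are not given here:

```python
import numpy as np

def cosine_tuned_rates(velocity, preferred_dirs, baseline, modulation):
    """Georgopoulos-style cosine tuning: rate_i = b_i + m_i * (p_i . v).

    velocity       : (2,) cursor velocity
    preferred_dirs : (n_channels, 2) unit preferred directions
    baseline       : (n_channels,) baseline firing rates in Hz
    modulation     : (n_channels,) modulation depths in Hz
    """
    rates = baseline + modulation * (preferred_dirs @ velocity)
    return np.clip(rates, 0.0, None)  # firing rates cannot be negative

# One hypothetical simulator step for 96 channels (parameters illustrative):
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, 96)
pds = np.stack([np.cos(angles), np.sin(angles)], axis=1)
rates = cosine_tuned_rates(np.array([0.5, -0.2]), pds,
                           baseline=np.full(96, 20.0),
                           modulation=np.full(96, 15.0))
spikes = rng.poisson(rates * 0.001)  # Poisson counts per 1 ms bin
```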
BRAND: a platform for closed-loop experiments with deep network models

April 2024 · 43 Reads · 6 Citations

Objective. Artificial neural networks (ANNs) are state-of-the-art tools for modeling and decoding neural activity, but deploying them in closed-loop experiments with tight timing constraints is challenging due to their limited support in existing real-time frameworks. Researchers need a platform that fully supports high-level languages for running ANNs (e.g. Python and Julia) while maintaining support for languages that are critical for low-latency data acquisition and processing (e.g. C and C++). Approach. To address these needs, we introduce the Backend for Realtime Asynchronous Neural Decoding (BRAND). BRAND comprises Linux processes, termed nodes, which communicate with each other in a graph via streams of data. Its asynchronous design allows for acquisition, control, and analysis to be executed in parallel on streams of data that may operate at different timescales. BRAND uses Redis, an in-memory database, to send data between nodes, which enables fast inter-process communication and supports 54 different programming languages. Thus, developers can easily deploy existing ANN models in BRAND with minimal implementation changes. Main results. In our tests, BRAND achieved <600 microsecond latency between processes when sending large quantities of data (1024 channels of 30 kHz neural data in 1 ms chunks). BRAND runs a brain-computer interface with a recurrent neural network (RNN) decoder with less than 8 ms of latency from neural data input to decoder prediction. In a real-world demonstration of the system, participant T11 in the BrainGate2 clinical trial (ClinicalTrials.gov Identifier: NCT00912041) performed a standard cursor control task, in which 30 kHz signal processing, RNN decoding, task control, and graphics were all executed in BRAND. This system also supports real-time inference with complex latent variable models like Latent Factor Analysis via Dynamical Systems. Significance. By providing a framework that is fast, modular, and language-agnostic, BRAND lowers the barriers to integrating the latest tools in neuroscience and machine learning into closed-loop experiments.
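The abstract names the transport (Redis) but not BRAND's node API. Below is a minimal sketch of the underlying pattern only: publisher and subscriber processes exchanging samples through a Redis stream via redis-py, with illustrative stream and field names that are not BRAND's actual schema:

```python
import redis

r = redis.Redis()  # assumes a Redis server on localhost, as BRAND uses

# Publisher node: append one binned-features sample to a stream.
# The stream and field names here are illustrative, not BRAND's schema.
r.xadd("binned_features", {"ts": "12345", "features": b"\x00" * 512})

# Subscriber node: read the next unseen entry, blocking up to 1000 ms.
# "0" replays the stream from its start for this demo; a real node would
# start from "$" (entries newer than now) and track the last ID it read.
last_id = "0"
for stream_name, entries in r.xread({"binned_features": last_id},
                                    count=1, block=1000):
    for entry_id, fields in entries:
        features = fields[b"features"]  # decode and pass to the ANN here
        last_id = entry_id
```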


Figure 1. Real-time neural decoding of attempted speech. a, Diagram of the brain-to-text speech BCI system. Neural activity is measured from the left ventral precentral gyrus using four 64-electrode Utah arrays and processed into neural features (threshold crossings and spikeband power), temporally binned, and […]
Figure 3. Offline decoding analyses indicate rapidly-calibrating, stable and generalizable decoding. a, Offline recreation of "day 1" performance for 50-word (red) and 125,000-word (blue) vocabularies with optimal decoding hyperparameters. Word error rate is plotted as a function of the number of training sentences. b, Decoding stability over time with no recalibration or model fine-tuning. Decoders were trained on data from 5 (black) or 10 (gray) sequential sessions, and then evaluated on all future evaluation blocks. Word error rate is […]
Figure 4. Decoding attempted speech during open conversations. a, Photograph of the participant's BCI interface during self-initiated speech. Sentence construction initiates when any phoneme's RNN output probability surpasses that of silence and concludes after 6 seconds of speech inactivity, or upon SP2's optional activation of an on-screen button via eye tracking. After the decoded sentence was finalized, SP2 used the on-screen confirmation buttons to indicate whether the decoded sentence was correct. This photo has been cropped to not include the participant, per medRxiv policy. b, Sample transcript of a conversation between SP2 and a family member, on the second day of use. c, Evaluating speech decoding accuracy in open conversations (n=925 sentences with known true labels). Average word error rate was 3.7% (95% CI: [3.3%, 4.3%]). d, Timeline of two example sentences showing the most probable phoneme at each time step, as indicated by RNN outputs. Gray intervals indicate the highest output probability is silence, while colored segments show the […]
An accurate and rapidly calibrating speech neuroprosthesis

December 2023 · 318 Reads · 5 Citations

Brain-computer interfaces (BCIs) can provide a rapid, intuitive way for people with paralysis to communicate by transforming the cortical activity associated with attempted speech into text. Despite recent advances, communication with BCIs has been restricted by requiring many weeks of training data, and by inadequate decoding accuracy. Here we report a speech BCI that decodes neural activity from 256 microelectrodes in the left precentral gyrus of a person with ALS and severe dysarthria. This system achieves daily word error rates as low as 1% (2.66% average; 9 times fewer errors than previous state-of-the-art speech BCIs) using a comprehensive 125,000-word vocabulary. On the first day of system use, following only 30 minutes of attempted speech training data, the BCI achieved 99.6% word accuracy with a 50-word vocabulary. On the second day of use, we increased the vocabulary size to 125,000 words and, after an additional 1.4 hours of training data, the BCI achieved 90.2% word accuracy. At the beginning of subsequent days of use, the BCI reliably achieved 95% word accuracy, and adaptive online fine-tuning continuously improved this accuracy throughout the day. Our participant used the speech BCI in self-paced conversation for over 32 hours to communicate with friends, family, and colleagues (both in person and over video chat). These results indicate that speech BCIs have reached a level of performance suitable to restore naturalistic communication to people living with severe dysarthria.


Figure 2. BRAND achieves low-latency inter-node communication. a) To test inter-node communication latency, a publisher node sends 30 kHz neural data (grouped in 1-millisecond packets) to a subscriber node via the Redis database. Violin plot of the resulting latency measurements showing that the inter-node communication latency is consistently below 600 microseconds even as b) the channel count is scaled up to 1024 channels, c) the sampling rate is changed, and d) additional subscriber nodes are added. Histograms of these data show the distribution of latency measurements for each e) channel count, f) output rate, and g) number of nodes.
Figure 4. BRAND runs ANN latent variable models with low latency. a) To test the inference latency of LFADS and NDT, we inserted them into an iBCI control graph that receives 256 channels of simulated threshold crossings at 1 kHz, bins them, runs inference with LFADS or NDT, runs decoding, and updates the task state. b) LFADS and NDT use different types of sequence models, an RNN and a Transformer, respectively. Reprinted from Ye and Pandarinath 2021 [18]. c) NDT inference times are consistently below 2 ms, while LFADS inference times are consistently below 6 ms.
BRAND: A platform for closed-loop experiments with deep network models

August 2023 · 39 Reads

Artificial neural networks (ANNs) are state-of-the-art tools for modeling and decoding neural activity, but deploying them in closed-loop experiments with tight timing constraints is challenging due to their limited support in existing real-time frameworks. Researchers need a platform that fully supports high-level languages for running ANNs (e.g., Python and Julia) while maintaining support for languages that are critical for low-latency data acquisition and processing (e.g., C and C++). To address these needs, we introduce the Backend for Realtime Asynchronous Neural Decoding (BRAND). BRAND comprises Linux processes, termed nodes, which communicate with each other in a graph via streams of data. Its asynchronous design allows for acquisition, control, and analysis to be executed in parallel on streams of data that may operate at different timescales. BRAND uses Redis to send data between nodes, which enables fast inter-process communication and supports 54 different programming languages. Thus, developers can easily deploy existing ANN models in BRAND with minimal implementation changes. In our tests, BRAND achieved <600 microsecond latency between processes when sending large quantities of data (1024 channels of 30 kHz neural data in 1-millisecond chunks). BRAND runs a brain-computer interface with a recurrent neural network (RNN) decoder with less than 8 milliseconds of latency from neural data input to decoder prediction. In a real-world demonstration of the system, participant T11 in the BrainGate2 clinical trial performed a standard cursor control task, in which 30 kHz signal processing, RNN decoding, task control, and graphics were all executed in BRAND. This system also supports real-time inference with complex latent variable models like Latent Factor Analysis via Dynamical Systems. By providing a framework that is fast, modular, and language-agnostic, BRAND lowers the barriers to integrating the latest tools in neuroscience and machine learning into closed-loop experiments.
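A quick back-of-envelope check on the benchmark load described above, assuming 2-byte (int16) samples, which the abstract does not specify:

```python
# 1024 channels of 30 kHz data sent in 1 ms chunks; sample width assumed
# to be 2 bytes (int16), which the abstract does not specify.
channels = 1024
sample_rate_hz = 30_000
bytes_per_sample = 2  # assumption

samples_per_chunk = sample_rate_hz // 1000          # 30 samples per channel per ms
chunk_bytes = channels * samples_per_chunk * bytes_per_sample
print(chunk_bytes)                                  # 61440 bytes per 1 ms chunk
print(chunk_bytes * 1000 / 1e6, "MB/s sustained")   # ~61.4 MB/s
```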

Citations (4)


... 5,6,8–12 In recent years, speech BCIs have emerged as a viable path toward restoring fast, naturalistic communication for people with paralysis by instead decoding attempted speech movements. 18–23 In contrast to hand-based BCIs, speech BCIs have typically been driven by neural activity in sensorimotor cortical areas further ventral, such as middle precentral gyrus (midPCG) and ventral precentral gyrus (vPCG), which are most often associated with production of orofacial movements and speech. 18,20–22 Speech BCIs far outperform cursor BCIs with regard to communication rate, 10,20 but are not as well-suited for general-purpose computer control. ...

Reference:

Speech motor cortex enables BCI cursor control and click
An instantaneous voice synthesis neuroprosthesis

... Brain-Computer Interfaces (BCIs) represent a promising means for restoring communicative abilities in individuals with severe language production impairments (1-6). By circumventing the compromised neural-motor pathways, BCIs aim to directly translate the patient's intended speech into meaningful output (7,8). Over the past decade, the majority of BCI approaches have employed both invasive and non-invasive neural recording techniques to reconstruct various phonetic features of speech, including acoustic waveforms and phoneme sequences (4,9). ...

An Accurate and Rapidly Calibrating Speech Neuroprosthesis
  • Citing Article
  • August 2024

The New England Journal of Medicine

... Feature extraction, behavioral tasks, and neural decoders were implemented in Python 3.8 and 3.9, and were run using the BRAND framework [41]. BRAND encourages modularizing the pipeline into reusable components (processes) called BRAND nodes, which use Redis for inter-process communication and logging. Calculation of the decoders' tuning matrices used the Python package scikit-learn (1.1.1). ...

BRAND: a platform for closed-loop experiments with deep network models

... Intracortical brain-computer interfaces (iBCIs) can restore functional capabilities for people with paralysis by monitoring cortical neural activity and mapping it to an external variable [1,2], such as intended cursor movements, actuations of a robotic effector, handwritten characters, spoken words, and even muscle contractions [3–19]. These devices typically use implanted electrodes to measure spiking activity, which in this work refers to unsorted threshold crossing events consisting primarily of action potentials. ...

An accurate and rapidly calibrating speech neuroprosthesis