Conference Paper

Aligning context-based statistical models of language with brain activity during reading

... This progress has motivated scientists to start using state-of-the-art LMs to study neural activity in the human brain during language processing (Wehbe et al., 2014b; Huth et al., 2016; Schrimpf et al., 2021; Toneva et al., 2022b). Conversely, it has also prompted NLP researchers to start using neuroimaging data to evaluate and improve their models (Søgaard, 2016; Bingel et al., 2016; Hollenstein et al., 2019; Aw and Toneva, 2023). ...
... They train per-voxel linear regression models and evaluate the predicted per-word fMRI images by their per-voxel Pearson correlation with the real fMRI images, showing that 3-4 dimensions explained a significant amount of variance in the fMRI data. Wehbe et al. (2014b) are among the first to use neural language models, using recurrent models to compute contextualized embeddings, hidden-state vectors of previous words, and word probabilities. They run their experiments on MEG recordings of participants reading Harry Potter, obtained in a follow-up study to Wehbe et al. (2014a). ...
... Several metrics are used: pairwise-matching accuracy, Pearson correlation (or Brain Score), mean squared error, and representational similarity analysis. Even studies that report the same performance metrics are not directly comparable because they often report results on different datasets and use slightly different protocols, e.g., Murphy et al. (2012) and Wehbe et al. (2014b). Beinborn et al. (2023) compare various encoding experiments and obtain very different results across evaluation metrics. ...
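The pairwise-matching accuracy mentioned above has a simple form: predictions for two held-out stimuli are matched to the two true recordings, and the metric counts how often the correct assignment yields a smaller total distance than the swapped one. A minimal NumPy sketch; the function name and toy data are illustrative, not taken from any of the cited studies:

```python
import numpy as np

def pairwise_matching_accuracy(pred, true):
    """Fraction of stimulus pairs (i, j) for which matching each prediction
    to its own target gives a smaller total distance than swapping them."""
    n = len(pred)
    correct, total = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            same = np.linalg.norm(pred[i] - true[i]) + np.linalg.norm(pred[j] - true[j])
            swapped = np.linalg.norm(pred[i] - true[j]) + np.linalg.norm(pred[j] - true[i])
            correct += same < swapped
            total += 1
    return correct / total

rng = np.random.default_rng(0)
true = rng.standard_normal((20, 50))                 # 20 toy "brain images"
pred = true + 0.5 * rng.standard_normal((20, 50))    # noisy but informative predictions
print(pairwise_matching_accuracy(pred, true))        # well above the 0.5 chance level
```

Chance level is 0.5, which is what makes the metric easy to interpret but also, as the survey above notes, comparatively lenient.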
... From the three sets of representations, they then train linear regression models to predict the MEG vectors corresponding to each word, and the regression models are then evaluated by computing pairwise-matching accuracy. ...
Preprint
Full-text available
Over the years, many researchers have seemingly made the same observation: Brain and language model activations exhibit some structural similarities, enabling linear partial mappings between features extracted from neural recordings and computational language models. In an attempt to evaluate how much evidence has been accumulated for this observation, we survey over 30 studies spanning 10 datasets and 8 metrics. How much evidence has been accumulated, and what, if anything, is missing before we can draw conclusions? Our analysis of the evaluation methods used in the literature reveals that some of the metrics are less conservative. We also find that the accumulated evidence, for now, remains ambiguous, but correlations with model size and quality provide grounds for cautious optimism.
... Encoding models (EM) are an alternative computational approach for leveraging naturalistic experimental data (Bhattasali et al., 2019; Goldstein et al., 2021; Huth et al., 2016; Jain et al., 2020; Jain & Huth, 2018; Schrimpf et al., 2021; Wehbe, Vaswani, et al., 2014). These predictive models learn to simulate elicited brain responses r = f(s) to natural language stimuli by building a computational approximation to the function f for each brain element, typically in every participant individually. ...
... In the example of action and object words, the feature space could indicate that hand-related, foot-related, and mouth-related words were all types of actions, and distinguish all action words from multiple subcategories of objects. One recent example of such a high-dimensional feature space is "word embeddings" that capture semantic similarity (Mikolov et al., 2013; Pennington et al., 2014), which have been used to characterize semantic language representations across the human brain (de Heer et al., 2017; Huth et al., 2016; Wehbe, Murphy, et al., 2014; Wehbe, Vaswani, et al., 2014). With a suitably rich linearizing transform L_s, this approach vastly expands the set of hypotheses that can be reasonably explored with a limited dataset. ...
... (Z. Li et al., 2021; Linzen & Leonard, 2018; Marvin & Linzen, 2018; Prasad et al., 2019; Tenney et al., 2018, 2019). While this by no means is a complete representation of phrase meaning (Bender & Koller, 2020), using an LM as a linearizing transform has been shown to effectively predict natural language responses in both the cortex and cerebellum, with different neuroimaging techniques and stimulus presentation modalities (Abnar et al., 2019; Anderson et al., 2021; Goldstein et al., 2021; Jain et al., 2020; Jain & Huth, 2018; LeBel et al., 2021; Schrimpf et al., 2021; Toneva et al., 2020; Toneva & Wehbe, 2019; Wehbe, Murphy, et al., 2014; Wehbe, Vaswani, et al., 2014). Moreover, these models easily outperform earlier "word embedding" encoding models that use one static feature vector for each word in the stimulus and thus ignore the effects of context (Antonello et al., 2021; Jain & Huth, 2018). ...
Article
Full-text available
Language neuroscience currently relies on two major experimental paradigms: controlled experiments using carefully hand-designed stimuli, and natural stimulus experiments. These approaches have complementary advantages which allow them to address distinct aspects of the neurobiology of language, but each approach also comes with drawbacks. Here we discuss a third paradigm—in silico experimentation using deep learning-based encoding models—that has been enabled by recent advances in cognitive computational neuroscience. This paradigm promises to combine the interpretability of controlled experiments with the generalizability and broad scope of natural stimulus experiments. We show four examples of simulating language neuroscience experiments in silico and then discuss both the advantages and caveats of this approach.
... This linear mapping approach can be equally applied at the sentence level by replacing the word embedding with an embedding vector for the integrated sentence meaning. Using sentence-level embeddings, recent works have extended this approach to other regression techniques, such as linear ridge regression (Caucheteux & King, 2022), other models, such as GPT-2 (Goldstein, Dabush, et al., 2022), and other brain recording modalities, such as MEG (Wehbe, Vaswani, Knight, & Mitchell, 2014) or ECoG. These works are reviewed in detail in the next part. ...
... When mapping language models onto fMRI data, most researchers have relied on averaging or concatenating word embeddings to match the resolution of the neural data (Anderson et al., 2021; Toneva, Mitchell, & Wehbe, 2020, 2022a; Wehbe, Vaswani, et al., 2014). In recent transformer models, contextualised embeddings aggregate information on preceding word meaning, which makes it possible to simply rely on individual embeddings from the final hidden layer, such as that of the sentence-final word (Schrimpf et al., 2021). ...
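The averaging strategy described above can be sketched in a few lines: word-rate embeddings are pooled within each fMRI acquisition window (TR) so that stimulus features and brain data share one time axis. The function name, windowing convention, and toy stimulus below are assumptions for illustration, not any cited study's exact pipeline:

```python
import numpy as np

def embeddings_to_tr(word_embs, word_onsets, tr, n_trs):
    """Average the embeddings of all words whose onset falls inside each TR."""
    out = np.zeros((n_trs, word_embs.shape[1]))
    for t in range(n_trs):
        in_window = (word_onsets >= t * tr) & (word_onsets < (t + 1) * tr)
        if in_window.any():
            out[t] = word_embs[in_window].mean(axis=0)  # TRs with no words stay zero
    return out

# toy stimulus: 10 words with 3-dimensional embeddings, spread over 4 TRs of 2 s
embs = np.arange(10, dtype=float).reshape(10, 1).repeat(3, axis=1)
onsets = np.linspace(0.0, 7.5, 10)
features = embeddings_to_tr(embs, onsets, tr=2.0, n_trs=4)
print(features.shape)  # (4, 3)
```

Concatenation instead of averaging would keep one slot per word position within the TR, at the cost of a larger and rate-dependent feature dimension.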
... Finally, after the presentation of the word, the updated representation of the context predicts MEG activity. This can be interpreted as evidence that the brain represents the context as a latent variable, and integrates the meaning of the word to this context representation to produce a context-dependent embedding (Wehbe, Vaswani, et al., 2014). ...
Preprint
Full-text available
Recent artificial neural networks that process natural language achieve unprecedented performance in tasks requiring sentence-level understanding. As such, they could be interesting models of the integration of linguistic information in the human brain. We review works that compare these artificial language models with human brain activity and we assess the extent to which this approach has improved our understanding of the neural processes involved in natural language comprehension. Two main results emerge. First, the neural representation of word meaning aligns with the context-dependent, dense word vectors used by the artificial neural networks. Second, the processing hierarchy that emerges within artificial neural networks broadly matches the brain, but is surprisingly inconsistent across studies. We discuss current challenges in establishing artificial neural networks as process models of natural language comprehension. We suggest exploiting the highly structured representational geometry of artificial neural networks when mapping representations to brain data.
... Deep neural network language models are currently the most powerful tools for building such representations [7][8][9] . Though these natural language processing (NLP) systems are not specifically designed to mimic the processing of language in the brain, representations of language extracted from these NLP systems have been shown to predict the brain activity of a person comprehending language better than ever before [10][11][12][13][14][15] . After being trained to predict a word in a specific position from its context on extremely large corpora of text, neural network language models achieve unprecedented performance on various NLP tasks [7][8][9] . ...
... The same paradigm was recorded for nine participants (five female, four male; age 18-40 years) using MEG by the authors of ref. 10 and shared upon our request. Written informed consent was obtained from all participants, and participants were compensated for their time. ...
... We estimate a function f, such that f(e_t) = b, where b is the brain activity recorded with either MEG or fMRI. We follow previous work 10,35,54,65,66 and model f as a linear function, regularized by the ridge penalty. ...
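The ridge-regularized linear mapping f(e_t) = b described above has a closed-form solution and is typically scored by held-out per-voxel Pearson correlation. A self-contained NumPy sketch on simulated data; the dimensions, regularization strength, and variable names are illustrative assumptions, not the cited work's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, emb_dim, n_voxels = 150, 50, 50, 30

# simulated stimulus embeddings e_t and brain responses b
E = rng.standard_normal((n_train + n_test, emb_dim))
W_true = rng.standard_normal((emb_dim, n_voxels))
B = E @ W_true + 0.1 * rng.standard_normal((n_train + n_test, n_voxels))

# closed-form ridge fit on the training split: W = (E'E + alpha*I)^-1 E'B
alpha = 1.0
E_tr, B_tr = E[:n_train], B[:n_train]
W = np.linalg.solve(E_tr.T @ E_tr + alpha * np.eye(emb_dim), E_tr.T @ B_tr)

# evaluate per-voxel Pearson correlation on held-out data
pred = E[n_train:] @ W
r = np.array([np.corrcoef(pred[:, v], B[n_train:, v])[0, 1] for v in range(n_voxels)])
print(r.mean())  # high on this easy, low-noise simulation
```

In practice alpha is tuned per voxel by cross-validation, and significance is assessed against a permutation or noise-ceiling baseline rather than raw correlation alone.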
Article
Full-text available
To study a core component of human intelligence—our ability to combine the meaning of words—neuroscientists have looked to linguistics. However, linguistic theories are insufficient to account for all brain responses reflecting linguistic composition. In contrast, we adopt a data-driven approach to study the composed meaning of words beyond their individual meaning, which we term ‘supra-word meaning’. We construct a computational representation for supra-word meaning and study its brain basis through brain recordings from two complementary imaging modalities. Using functional magnetic resonance imaging, we reveal that hubs that are thought to process lexical meaning also maintain supra-word meaning, suggesting a common substrate for lexical and combinatorial semantics. Surprisingly, we cannot detect supra-word meaning in magnetoencephalography, which suggests that composed meaning might be maintained through a different neural mechanism than the synchronized firing of pyramidal cells. This sensitivity difference has implications for past neuroimaging results and future wearable neurotechnology.
... Language models (LMs) that have been pretrained to predict the next word over billions of text documents have also been shown to significantly predict brain recordings of people comprehending language (Wehbe et al., 2014b;Jain and Huth, 2018;Toneva and Wehbe, 2019;Caucheteux and King, 2020;Schrimpf et al., 2021;Goldstein et al., 2022). Understanding the reasons behind the observed similarities between representations of language in machines and representations of language in the brain can lead to more insight into both systems. ...
... Several previous studies have investigated the alignment between pretrained language models and brain recordings of people comprehending language, finding significant similarities (Wehbe et al., 2014b;Jain and Huth, 2018;Toneva and Wehbe, 2019;Abdou et al., 2021;Schrimpf et al., 2021;Hosseini et al., 2024). Our work builds on these and further studies the reasons for these similarities. ...
... Some of these distance metrics make comparisons based on kernel matrices (Kornblith et al., 2019;Cristianini et al., 2001;Cortes et al., 2012) or relative distances (Kriegeskorte et al., 2008) between sample representations in a set. Others compute linear (Wehbe et al., 2014;Schrimpf et al., 2018) or orthogonal projections (Beauducel, 2018) from one set of representations to another. Others use canonical correlation analysis which finds linear relationships between pairs of vectors (Raghu et al., 2017;Morcos et al., 2018). ...
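One widely used kernel-matrix-based comparison of the kind mentioned above is linear centered kernel alignment (CKA), which scores two sets of representations of the same stimuli while remaining invariant to rotations and isotropic scaling. A minimal sketch; the toy representations are illustrative:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices.
    Rows index the same stimuli; columns are each system's features."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))   # a random rotation

print(round(linear_cka(X, X @ Q), 3))                # 1.0: invariant to rotation
print(linear_cka(X, rng.standard_normal((100, 20)))) # low for unrelated representations
```

Unlike the linear-projection metrics, CKA is symmetric and fits no parameters, which is part of why the metrics listed above can rank the same model pairs differently.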
... Such approaches have been commonly applied in neuroscience for measuring the representational distance between network activations and activity in the brain, in order to understand which neural networks are architecturally most similar to the brain (Wehbe et al., 2014; Conwell et al., 2021a; Subramaniam et al., 2024; Goldstein et al., 2020). In this context, Han et al. (2023) have shown that current representational distance metrics are unable to distinguish representations based on architecture. ...
Preprint
Full-text available
We demonstrate that architectures which traditionally are considered to be ill-suited for a task can be trained using inductive biases from another architecture. Networks are considered untrainable when they overfit, underfit, or converge to poor results even when tuning their hyperparameters. For example, plain fully connected networks overfit on object recognition while deep convolutional networks without residual connections underfit. The traditional answer is to change the architecture to impose some inductive bias, although what that bias is remains unknown. We introduce guidance, where a guide network guides a target network using a neural distance function. The target is optimized to perform well and to match its internal representations, layer-by-layer, to those of the guide; the guide is unchanged. If the guide is trained, this transfers over part of the architectural prior and knowledge of the guide to the target. If the guide is untrained, this transfers over only part of the architectural prior of the guide. In this manner, we can investigate what kinds of priors different architectures place on untrainable networks such as fully connected networks. We demonstrate that this method overcomes the immediate overfitting of fully connected networks on vision tasks, makes plain CNNs competitive to ResNets, closes much of the gap between plain vanilla RNNs and Transformers, and can even help Transformers learn tasks which RNNs can perform more easily. We also discover evidence that better initializations of fully connected networks likely exist to avoid overfitting. Our method provides a mathematical tool to investigate priors and architectures, and in the long term, may demystify the dark art of architecture creation, even perhaps turning architectures into a continuous optimizable parameter of the network.
... The increasing development and use of neuroimaging techniques represent a significant advance in the field of brain decoding, in which computational scientists interpret implicit brain activities based on explicit linguistic representations and deep-learning algorithms (Wehbe et al. 2014b; Hale et al. 2018; Gauthier and Levy 2019; Cao and Zhang 2019). The main research task is to establish a mapping between concepts and neural activation patterns through neuroimaging experiments. ...
... However, as an important prerequisite of brain decoding, the measurement of brain activation remains a challenge due to limitations of neuroimaging technology. Existing methods include electrocorticography (ECoG) (Kuruvilla and Flink 2003), electroencephalography (EEG) (Murphy, Baroni, and Poesio 2009), functional magnetic resonance imaging (fMRI) (Pereira, Just, and Mitchell 2001; Wehbe et al. 2014a; Gauthier and Levy 2019), and magnetoencephalography (MEG) (Wehbe et al. 2014b; Fyshe et al. 2014), each having its own relative advantages and weaknesses. One comparatively less studied tool is functional near-infrared spectroscopy (fNIRS), which can interpret the brain through cerebral hemodynamic responses associated with neuronal behavior. ...
Preprint
Full-text available
Brain activation can reflect semantic information elicited by natural words and concepts. Increasing research has been conducted on decoding such neural activation patterns using representational semantic models. However, prior work decoding semantic meaning from neurophysiological responses has been largely limited to ECoG, fMRI, MEG, and EEG techniques, each having its own advantages and limitations. More recently, functional near-infrared spectroscopy (fNIRS) has emerged as an alternative hemodynamic-based approach and possesses a number of strengths. We investigate brain decoding tasks with the help of fNIRS and empirically compare fNIRS with fMRI. Primarily, we find that: 1) like fMRI scans, activation patterns recorded from fNIRS encode rich information for discriminating concepts, but show limits on the possibility of decoding fine-grained semantic clues; 2) fNIRS decoding shows robustness across different brain regions, semantic categories, and even subjects; 3) fNIRS achieves higher decoding accuracy from multi-channel patterns than from single-channel ones, which is in line with our intuition of the working mechanism of the human brain. Our findings prove that fNIRS has the potential to promote a deep integration of NLP and cognitive neuroscience from the perspective of language understanding. We release the largest fNIRS dataset by far to facilitate future research.
... Language models that have been pretrained to predict the next word over billions of text documents have been shown to also significantly predict brain recordings of people comprehending language (Wehbe et al., 2014a;Jain and Huth, 2018;Toneva and Wehbe, 2019;Caucheteux and King, 2020;Schrimpf et al., 2021;Goldstein et al., 2022). Understanding the reasons behind the observed similarities between language in machines and language in the brain can lead to more insight into both systems. ...
... A number of previous works have investigated the alignment between pretrained language models and brain recordings of people comprehending language. Wehbe et al. (2014a) aligned MEG brain recordings with a Recurrent Neural Network (RNN), trained on an online archive of Harry Potter Fan Fiction. Jain and Huth (2018) aligned layers from a Long Short-Term Memory (LSTM) model to fMRI recordings of subjects listening to stories. ...
Preprint
Pretrained language models that have been trained to predict the next word over billions of text documents have been shown to also significantly predict brain recordings of people comprehending language. Understanding the reasons behind the observed similarities between language in machines and language in the brain can lead to more insight into both systems. Recent works suggest that the prediction of the next word is a key mechanism that contributes to the alignment between the two. What is not yet understood is whether prediction of the next word is necessary for this observed alignment or simply sufficient, and whether there are other shared mechanisms or information that is similarly important. In this work, we take a first step towards a better understanding via two simple perturbations in a popular pretrained language model. The first perturbation is to improve the model's ability to predict the next word in the specific naturalistic stimulus text that the brain recordings correspond to. We show that this indeed improves the alignment with the brain recordings. However, this improved alignment may also be due to any improved word-level or multi-word level semantics for the specific world that is described by the stimulus narrative. We aim to disentangle the contribution of next word prediction and semantic knowledge via our second perturbation: scrambling the word order at inference time, which reduces the ability to predict the next word, but maintains any newly learned word-level semantics. By comparing the alignment with brain recordings of these differently perturbed models, we show that improvements in alignment with brain recordings are due to more than improvements in next word prediction and word-level semantics.
... The few studies using EEG data implemented the encoding setup, as in Murphy et al. (2009), where pictures of concrete entities were used as stimuli, and both encoding and decoding in Sassenhagen and Fiebach (2020), a study where both concrete and abstract common nouns were used. MEG data, which affords higher machine-learning performance given its superior signal quality, was instead used for decoding word vectors from the brain's processing of pictures referring to concrete concepts (Sudre et al., 2012) and visually presented stories (Wehbe et al., 2014b). ...
... We chose to repeat the stimuli 24 times, since Grootswagers et al. (2017) clearly demonstrate that, in order to reach optimal decoding and classification results using evoked potentials, between 16 and 32 trials per stimulus are needed. Each trial consisted of two parts, as shown in Figure 1: first, the presentation of a stimulus name or noun for 750 ms (word presentation times in EEG experiments are kept below 1 s, as word processing begins already at 150 ms after the stimulus appears; Simanova et al., 2010; Wehbe et al., 2014b; Sassenhagen and Fiebach, 2020); the stimulus was preceded and followed by the presentation of a white fixation cross for 1,000 ms at the center of the screen. Participants were instructed to read the word and then mentally visualize the referent of the stimulus while the cross was on screen. ...
Article
Full-text available
Semantic knowledge about individual entities (i.e., the referents of proper names such as Jacinda Ardern) is fine-grained, episodic, and strongly social in nature, when compared with knowledge about generic entities (the referents of common nouns such as politician). We investigate the semantic representations of individual entities in the brain; and for the first time we approach this question using both neural data, in the form of newly-acquired EEG data, and distributional models of word meaning, employing them to isolate semantic information regarding individual entities in the brain. We ran two sets of analyses. The first set of analyses is only concerned with the evoked responses to individual entities and their categories. We find that it is possible to classify them according to both their coarse and their fine-grained category at appropriate timepoints, but that it is hard to map representational information learned from individuals to their categories. In the second set of analyses, we learn to decode from evoked responses to distributional word vectors. These results indicate that such a mapping can be learnt successfully: this counts not only as a demonstration that representations of individuals can be discriminated in EEG responses, but also as a first brain-based validation of distributional semantic models as representations of individual entities. Finally, in-depth analyses of the decoder performance provide additional evidence that the referents of proper names and categories have little in common when it comes to their representation in the brain.
... Using this encoding (linear regression) model, they were able to examine the brain areas that were sensitive to the different types of features, enabling them to distinguish between areas on the basis of the type of information they represent. Using MEG data gathered for the same chapter of Harry Potter, Wehbe et al. (2014b) was one of the earliest works to investigate the alignment between the representations used by RNN language models and brain activity as subjects read a story. They train auto-regressive neural language models (NLMs) (Mikolov et al., 2011) on a corpus of Harry Potter fan fiction and extract three classes of features per time-step: the embeddings, the hidden-state vectors (previous and current), and the predicted output probabilities. ...
... They propose a Bayesian algorithm that constructs a generative model of areas tiling the cortex across subjects, resulting in a single atlas that describes the distribution of semantically selective functional areas in the human cerebral cortex. Jain and Huth (2018) follow Wehbe et al. (2014b) in using an RNN language model to incorporate context into encoding models that predict the neural response (fMRI in this case) of subjects listening to natural speech. They find that representations from NLM hidden states outperform previously used non-contextual word-embedding models in predicting brain responses, and that context length and choice of layer differentially predict activation across cortical regions. ...
Preprint
Full-text available
Understanding the neural basis of language comprehension in the brain has been a long-standing goal of various scientific research programs. Recent advances in language modelling and in neuroimaging methodology promise potential improvements in both the investigation of language's neurobiology and in the building of better and more human-like language models. This survey traces a line from early research linking Event Related Potentials and complexity measures derived from simple language models to contemporary studies employing Artificial Neural Network models trained on large corpora in combination with neural response recordings from multiple modalities using naturalistic stimuli.
... Recent neuroimaging studies suggest that they might, at least partially [8][9][10][11][12]. First, word embeddings (high-dimensional dense vectors trained to predict lexical neighborhood [13][14][15][16]) have been shown to linearly map onto the brain responses elicited by words presented either in isolation [17][18][19] or within narratives [20][21][22][23][24][25][26][27][28][29][30]. Second, the contextualized activations of language transformers improve the precision of this mapping, especially in the prefrontal, temporal and parietal cortices [31][32][33]. ...
... First, our work complements previous studies 26,27,[30][31][32][33][34] and confirms that the activations of deep language models significantly map onto the brain responses to written sentences (Fig. 3). This mapping peaks in a distributed and bilateral brain network (Fig. 3a, b) and is best estimated by the middle layers of language transformers (Fig. 4a, e). ...
Article
Full-text available
Deep learning algorithms trained to predict masked words from large amount of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains currently unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing. Charlotte Caucheteux and Jean-Rémi King examine the ability of transformer neural networks trained on word prediction tasks to fit representations in the human brain measured with fMRI and MEG. Their results provide further insight into the workings of transformer language models and their relevance to brain responses.
... Many neuroscience experiments record brain responses while participants listen to natural language stimuli, such as narrative stories [Wehbe et al., 2014, Huth et al., 2016, Nastase et al., 2021]. These data are used to fit encoding models that predict the response at each location in the brain as a function of the stimulus. ...
Preprint
Full-text available
Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. Most performant encoding models linearly map the hidden states of artificial neural networks to brain data, but this linear restriction may limit their effectiveness. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective, producing a model we name BrainWavLM. We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA. This improvement comes at the expense of low-level regions like auditory cortex (AC), but selectively fine-tuning on these areas improves performance in AC, while largely retaining gains made in the rest of cortex. Fine-tuned models generalized across subjects, indicating that they learned robust brain-like representations of the speech stimuli. Finally, by training linear probes, we showed that the brain data strengthened semantic representations in the speech model without any explicit annotations. Our results demonstrate that brain fine-tuning produces best-in-class speech encoding models, and that non-linear methods have the potential to bridge the gap between artificial and biological representations of semantics.
... Previous research in neurolinguistics (Wehbe et al., 2014; Reddy and Wehbe, 2021; Schwartz et al., 2019) has relied primarily on gathering data using fMRI, which is slow, expensive, and not real-time compared to other brain recording techniques. Recent studies have attempted to unravel how information needs are represented in the brain (Allegretti et al., 2015; Moshfeghi et al., 2016). ...
Preprint
Full-text available
Brain decoding has emerged as a rapidly advancing and extensively utilized technique within neuroscience. This paper centers on the application of raw electroencephalogram (EEG) signals for decoding human brain activity, offering a more expedited and efficient methodology for enhancing our understanding of the human brain. The investigation specifically scrutinizes the efficacy of brain-computer interfaces (BCI) in deciphering neural signals associated with speech production, with particular emphasis on the impact of vocabulary size, electrode density, and training data on the framework's performance. The study reveals the competitive word error rates (WERs) achievable on the Librispeech benchmark through pre-training on unlabelled data for speech processing. Furthermore, the study evaluates the efficacy of voice recognition under configurations with limited labeled data, surpassing previous state-of-the-art techniques while utilizing significantly fewer labels. Additionally, the research provides a comprehensive analysis of error patterns in voice recognition and the influence of model size and unlabelled training data. It underscores the significance of factors such as vocabulary size and electrode density in enhancing BCI performance, advocating for an increase in microelectrodes and refinement of language models.
... Making predictions using the fMRI recordings and the extracted features. The second step was to train a ridge regression model to learn a mapping from brain representations to neural network representations. This approach followed previous work [38,36,39,28,21] that fit a linear function with a ridge penalty to map brain representations to neural network representations. Although ridge regression is a relatively simple model, we chose it because it has previously been demonstrated to be useful [37]. ...
Preprint
Full-text available
Contemporary neural networks intended for natural language processing (NLP) are not designed with specific linguistic rules. This suggests that they may acquire a general understanding of language. This attribute has led to extensive research in deciphering their internal representations. A pioneering method involves an experimental setup using human brain data to explore whether a translation between brain and neural network representations can be established. Since this technique emerged, more sophisticated NLP models have been developed. In our study, we apply this method to evaluate four new NLP models, aiming to identify the one most compatible with brain activity. Additionally, to explore how the brain comprehends text semantically, we alter the text by removing punctuation in four different ways to understand its impact on semantic processing by the human brain. Our findings indicate that the RoBERTa model aligns best with brain activity, outperforming BERT in accuracy according to our metrics. Furthermore, for BERT, higher accuracy was noted when punctuation was excluded, and increased context length did not significantly diminish accuracy compared to the original results with punctuation.
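The brain-to-network mapping described in these excerpts is typically a linear regression with a ridge penalty, scored by the Pearson correlation between predicted and held-out representations. A minimal sketch on synthetic stand-in data (the array shapes and the alpha value are illustrative assumptions, not any paper's settings):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 stimuli, 100 "voxels", 64-dim network embeddings.
n_stim, n_vox, n_dim = 200, 100, 64
brain = rng.standard_normal((n_stim, n_vox))
W_true = rng.standard_normal((n_vox, n_dim)) * 0.1
embeddings = brain @ W_true + 0.5 * rng.standard_normal((n_stim, n_dim))

X_tr, X_te, y_tr, y_te = train_test_split(
    brain, embeddings, test_size=0.25, random_state=0)

# A linear map with a ridge penalty, as in the cited work.
model = Ridge(alpha=100.0).fit(X_tr, y_tr)
pred = model.predict(X_te)

def pearson_per_column(a, b):
    """Pearson r between matching columns of two (samples x dims) arrays."""
    a = a - a.mean(0)
    b = b - b.mean(0)
    return (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))

r = pearson_per_column(pred, y_te)
print(f"mean Pearson r over embedding dimensions: {r.mean():.3f}")
```

In practice the design matrix, the regularization strength, and the train/test split vary from study to study, which is one reason the reported metrics are hard to compare directly.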
... Distributional semantics is also increasingly influential in semantic theory [8,9] and widely applied in computational linguistics where it yields state-of-the-art results in applications for natural language processing [10]. Vector representations for words account for experimental findings from psycholinguistics [11,12] and are not unrelated to cortical representations: They have been shown to allow for the decoding of neural activity during single word comprehension [13] or narrative reading [14,15], and distances between vectors are predictive of neural activation during written and spoken language comprehension [16,17]. We use these vectors differently here: Rather than comparing vector (distances) to neural activation, we compute power spectra directly over sequences of vectors, as Ding et al. do for recorded MEG signals. ...
Preprint
Results from a recent neuroimaging study on spoken sentence comprehension have been interpreted as evidence for cortical entrainment to hierarchical syntactic structure. We present a simple computational model that predicts the power spectra from this study, even though the model's linguistic knowledge is restricted to the lexical level, and word-level representations are not combined into higher-level units (phrases or sentences). Hence, the cortical entrainment results can also be explained from the lexical properties of the stimuli, without recourse to hierarchical syntax.
... The relationship between word vectors and word processing in reading comprehension has been well documented. Researchers have identified neural correlates between word vectors in both single-word comprehension and narrative reading, indicating the usefulness of word vectors in predicting reading behavior (Mitchell et al., 2008; Wehbe, Vaswani, Knight, & Mitchell, 2014). With the advancement of NLP techniques and the availability of experimental databases on language comprehension, plentiful studies have explored the feasibility of using word embeddings to predict reading behavior (Hollenstein et al., 2019, 2021). ...
... Natural language representations in fMRI In recent years, predicting brain responses to natural language using LLM representations has become common in the field of language neuroscience (Jain & Huth, 2018; Wehbe et al., 2014; Schrimpf et al., 2021). This paradigm of using predictive "encoding models" to better understand how the brain processes language has been applied in a wide literature to explore to what extent syntax, semantics, or discourse drives brain activity (Wu et al., 2006; Caucheteux et al., 2021; Kauf et al., 2023; Reddy & Wehbe, 2020; Kumar et al., 2022; Oota et al., 2022; Tuckute et al., 2023; Benara et al., 2024; Antonello et al., 2024a) or to understand the cortical organization of language timescales (Jain et al., 2020; Chen et al., 2023a). ...
Preprint
Full-text available
Recent large language models (LLMs) have excelled across a wide range of tasks, but their use in high-stakes and compute-limited settings has intensified the demand for interpretability and efficiency. We address this need by proposing Induction-head ngram models (Induction-Gram), a method that builds an efficient, interpretable LM by bolstering modern ngram models with a hand-engineered "induction head". This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions. This process enables Induction-Gram to provide ngram-level grounding for each generated token. Moreover, experiments show that this simple method significantly improves next-word prediction over baseline interpretable models (up to 26%p) and can be used to speed up LLM inference for large models through speculative decoding. We further study Induction-Gram in a natural-language neuroscience setting, where the goal is to predict the next fMRI response in a sequence. It again provides a significant improvement over interpretable models (20% relative increase in the correlation of predicted fMRI responses), potentially enabling deeper scientific investigation of language selectivity in the brain. The code is available at https://github.com/ejkim47/induction-gram.
... We believe our method of fine-tuning LLMs with EEG input is the first of its kind. However, we believe that directly presenting participants with textual stimuli is not ideal, as using EEG data collected in this manner for thought analysis introduces additional complexities of language processing, such as determining brain activity windows for specific words, handling the retention of word context post-onset, and managing the overlap of contexts when words are shown in different time frames (Wehbe et al., 2014;Murphy et al., 2022). Additionally, vocabulary size presents a challenge: while EEG-to-text systems perform well in closedvocabulary settings, open-vocabulary decoding becomes inefficient as vocabulary size increases (Martin et al., 2018;Wang and Ji, 2022;Liu et al., 2024a). ...
Preprint
Full-text available
Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected for six subjects with image stimuli demonstrate the efficacy of multimodal LLMs (LLaMa-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics, GPT-4 based assessments, and evaluations by human experts. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing (NLP).
... The progress made in recent years in Natural Language Processing (NLP) models has allowed Cognitive Neuroscience to use them to deepen our knowledge of the processes underlying language [6,18,19,20]. In previous work, we made progress in understanding how these kinds of models, in particular causal language models, are able to model cloze predictability (cloze-Pred) as a covariate in statistical models used to understand eye movements [10,13]. In the present work, we deepen this line of research by analyzing the results of the GPT-2 architecture [3]. ...
Preprint
Full-text available
The advancement of the Natural Language Processing field has enabled the development of language models with a great capacity for generating text. In recent years, Neuroscience has been using these models to better understand cognitive processes. In previous studies, we found that models like Ngrams and LSTM networks can partially model Predictability when used as a co-variable to explain readers' eye movements. In the present work, we further this line of research by using GPT-2 based models. The results show that this architecture achieves better outcomes than its predecessors.
... The range is from 170 ms before the onset to 700 ms after onset. This range is wider than the [0, 400] ms period reported in Wehbe et al. (2014b), who used word embedding vectors to predict MEG signals (i.e., encoding) during a controlled reading paradigm. This indicates that the EEG recordings have captured contextual representations that are predictive of the MT-LSTM embeddings. ...
Article
Full-text available
The brain’s ability to perform complex computations at varying timescales is crucial, ranging from understanding single words to grasping the overarching narrative of a story. Recently, multi-timescale long short-term memory (MT-LSTM) models (Mahto et al. 2020; Jain et al. 2020) have been introduced, which use temporally tuned parameters to induce sensitivity to different timescales of language processing (i.e., related to near/distant words). However, there has not been an exploration of the relationship between such temporally tuned information processing in MT-LSTMs and the brain’s processing of language using high temporal resolution recording modalities, such as electroencephalography (EEG). To bridge this gap, we used an EEG dataset recorded while participants listened to Chapter 1 of “Alice in Wonderland” and trained ridge regression models to predict the temporally tuned MT-LSTM embeddings from EEG responses. Our analysis reveals that EEG signals can be used to predict MT-LSTM embeddings across various timescales. For longer timescales, our models produced accurate predictions within an extended time window of ±2 s around word onset, while for shorter timescales, significant predictions are confined to a narrower window ranging from −180 ms to 790 ms. Intriguingly, we observed that short timescale information is not only processed in the vicinity of word onset but also at more distant time points. These observations underscore the parallels and discrepancies between computational models and the neural mechanisms of the brain. As word embeddings are used more as in silico models of semantic representation in the brain, a more explicit consideration of timescale-dependent processing enables more targeted explorations of language processing in humans and machines.
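The lag-dependent decoding analysis described above can be caricatured in a few lines: fit one ridge model per lag relative to word onset and track where the predicted-versus-actual correlation peaks. Everything below is synthetic; the channel count, lags, and the assumed "informative window" of [0, 400] ms are illustrative assumptions, not the study's parameters:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Synthetic stand-ins: 300 words, 32 EEG channels, 16-dim model embeddings.
n_words, n_chan, n_dim = 300, 32, 16
true_map = rng.standard_normal((n_dim, n_chan)) * 0.3
embeddings = rng.standard_normal((n_words, n_dim))

def corr_mean(pred, actual):
    """Mean per-dimension Pearson r between predicted and actual embeddings."""
    a = pred - pred.mean(0)
    b = actual - actual.mean(0)
    r = (a * b).sum(0) / (np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0))
    return float(r.mean())

scores = {}
for lag in range(-200, 801, 200):          # lag in ms relative to word onset
    # Assumption: the EEG carries word information only within [0, 400] ms.
    signal = embeddings @ true_map if 0 <= lag <= 400 else 0.0
    eeg = signal + rng.standard_normal((n_words, n_chan))
    model = Ridge(alpha=10.0).fit(eeg[:200], embeddings[:200])
    scores[lag] = corr_mean(model.predict(eeg[200:]), embeddings[200:])

print({lag: round(r, 2) for lag, r in scores.items()})
```

In a real analysis the EEG is epoched around each word onset, so the "per-lag" design matrix is a slice of the recording rather than a freshly simulated array; the pattern of interest is the same, a correlation profile that rises and falls across lags.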
... Natural language representations in fMRI Using LLM representations to help predict brain responses to natural language has recently become popular among neuroscientists studying language processing [55–60] (see [61,62] for reviews). This paradigm of using "encoding models" [63] to better understand how the brain processes language has been applied to help understand the cortical organization of language timescales [64,65], examine the relationship between visual and semantic information in the brain [66], and explore to what extent syntax, semantics, or discourse drives brain activity [18, 22, 67–73]. ...
Preprint
Full-text available
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.
... The source of prediction error in these models is operationalized as surprisal, the negative log probability of a specific word given the preceding context (Hale, 2001; Levy, 2008). Word-by-word surprisal from these models correlates with language processing in humans, measured by reading times (Frank, 2013; Frank & Hoeks, 2019; Goodkind & Bicknell, 2018; Monsalve et al., 2012; Van Schijndel & Linzen, 2018), N400 amplitudes during EEG (Frank et al., 2015), and MEG responses (Wehbe et al., 2014), suggesting that humans are sensitive to the same statistical properties of language (surprisal) which generate prediction error in computational models. ...
Article
Full-text available
Inverse probability adaptation effects (the finding that encountering a verb in an unexpected structure increases long-term priming for that structure) have been observed in both L1 and L2 speakers. However, participants in these studies all had established representations of the syntactic structures to be primed. It therefore remains an open question whether inverse probability adaptation effects could take place with newly encountered L2 structures. In a pre-registered experiment, we exposed participants ( n = 84) to an artificial language with active and passive constructions. Training on Day 1 established expectations for specific co-occurrence patterns between verbs and structures. On Day 2, established patterns were violated for the surprisal group ( n = 42), but not for the control group ( n = 42). We observed no immediate priming effects from exposure to high-surprisal items. On Day 3, however, we observed an effect of input variation on comprehension of verb meaning in an auditory grammaticality judgment task. The surprisal group showed higher accuracy for passive structures in both tasks, suggesting that experiencing variation during learning had promoted the recognition of optionality in the target language.
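Surprisal, as operationalized in the excerpt above, is simply the negative log probability of a word given its context; any model that yields P(word | context) can supply it. A toy sketch with an add-one-smoothed bigram model (the corpus and the smoothing choice are illustrative assumptions; the cited studies use much larger models):

```python
import math
from collections import Counter

# A toy corpus; real studies estimate probabilities from large corpora or LMs.
corpus = "the dog chased the cat and the cat chased the mouse".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def surprisal(prev, word):
    """Surprisal in bits: -log2 P(word | prev), with add-one smoothing."""
    vocab = len(unigrams)
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)
    return -math.log2(p)

sentence = "the dog chased the mouse".split()
for prev, word in zip(sentence, sentence[1:]):
    print(f"{word!r} after {prev!r}: {surprisal(prev, word):.2f} bits")
```

Words that are more predictable in context ("cat" after "the" in this corpus) get lower surprisal than less frequent continuations, which is exactly the quantity correlated with reading times and evoked responses in the studies above.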
... Interpretation inspired by neuroscience With more understanding of the functional specialization of the human brain, researchers attempt to interpret deep learning models with brain activities in specialized regions (Wehbe et al., 2014;Toneva and Wehbe, 2019;Zhuang et al., 2021;Bakhtiari et al., 2021). For example, Toneva and Wehbe (2019) studied the representations of NLP models across different layers by aligning with two groups of brain areas among the language network. ...
... Unlike convolutional neural networks, whose architectural design principles are roughly inspired by biological vision 26, the design of current neural-network language models is largely uninformed by psycholinguistics and neuroscience. However, there is an ongoing effort to adopt and adapt neural-network language models to serve as computational hypotheses of how humans process language, making use of a variety of different architectures, training corpora and training tasks 11,27–35. We found that RNNs make markedly human-inconsistent predictions once pitted against transformer-based neural networks. ...
Article
Full-text available
Neural network language models appear to be increasingly aligned with how humans process and generate language, but identifying their weaknesses through adversarial examples is challenging due to the discrete nature of language and the complexity of human language perception. We bypass these limitations by turning the models against each other. We generate controversial sentence pairs where two language models disagree about which sentence is more likely to occur. Considering nine language models (including n-gram, recurrent neural networks and transformers), we created hundreds of controversial sentence pairs through synthetic optimization or by selecting sentences from a corpus. Controversial sentence pairs proved highly effective at revealing model failures and identifying models that aligned most closely with human judgements of which sentence is more likely. The most human-consistent model tested was GPT-2, although experiments also revealed substantial shortcomings in its alignment with human perception.
... Following the work of Bemis & Pylkkänen (2011), a few studies have tried to leverage computational models to identify the neural bases of compositionality and quantify brain regions' sensitivity to increasing sizes of context. Some of them, using ecological paradigms, have found a hierarchy of brain regions that are sensitive to different types of contextual information and different temporal receptive fields (e.g., Jain & Huth, 2018; Toneva et al., 2022; Wehbe et al., 2014). A notable investigation (Jain & Huth, 2018) used pre-trained LSTM (Hochreiter & Schmidhuber, 1997) models to study context integration. ...
Preprint
Full-text available
Two fundamental questions in neurolinguistics concern the brain regions that integrate information beyond the lexical level, and the size of their window of integration. To address these questions we introduce a new approach named masked-attention generation. It uses GPT-2 transformers to generate word embeddings that capture a fixed amount of contextual information. We then tested whether these embeddings could predict fMRI brain activity in humans listening to naturalistic text. The results showed that most of the cortex within the language network is sensitive to contextual information, and that the right hemisphere is more sensitive to longer contexts than the left. Masked-attention generation supports previous analyses of context-sensitivity in the brain, and complements them by quantifying the window size of context integration per voxel.
... Natural language representations in fMRI Using the representations from LLMs to help predict brain responses to natural language has become common among neuroscientists studying language processing in recent years [27, 75–79] (see [80] and [81] for reviews). This paradigm of using "encoding models" [82] to better understand how the brain processes language has been applied to help understand the cortical organization of language timescales [83,84], examine the relationship between visual and semantic information in the brain [85], and explore to what extent syntax, semantics or discourse drives brain activity [86–92]. ...
Preprint
Full-text available
Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.
... In line with previous studies 5,7,40,41, the activations of GPT-2 accurately map onto a distributed and bilateral set of brain areas. Brain scores peaked in the auditory cortex and in the anterior temporal and superior temporal areas (Fig. 2a, Supplementary Fig. 1). ...
Article
Full-text available
Considerable progress has recently been made in natural language processing: deep learning algorithms are increasingly able to generate, summarize, translate and classify texts. Yet, these language models still fail to match the language abilities of humans. Predictive coding theory offers a tentative explanation to this discrepancy: while language models are optimized to predict nearby words, the human brain would continuously predict a hierarchy of representations that spans multiple timescales. To test this hypothesis, we analysed the functional magnetic resonance imaging brain signals of 304 participants listening to short stories. First, we confirmed that the activations of modern language models linearly map onto the brain responses to speech. Second, we showed that enhancing these algorithms with predictions that span multiple timescales improves this brain mapping. Finally, we showed that these predictions are organized hierarchically: frontoparietal cortices predict higher-level, longer-range and more contextual representations than temporal cortices. Overall, these results strengthen the role of hierarchical predictive coding in language processing and illustrate how the synergy between neuroscience and artificial intelligence can unravel the computational bases of human cognition.
... Such a property would in the limit entail that brain encodings are isomorphic to language model representations (Peng et al., 2020). Other research articles that seem to suggest that language model representations are generally isomorphic to brain activity patterns include Mitchell et al. (2008), Søgaard (2016), Wehbe et al. (2014), Pereira et al. (2018), Gauthier and Levy (2019), and Caucheteux and King (2022). ...
Article
Full-text available
Most, if not all, philosophers agree that computers cannot learn what words refer to from raw text alone. While many attacked Searle’s Chinese Room thought experiment, no one seemed to question this most basic assumption. For how can computers learn something that is not in the data? Emily Bender and Alexander Koller (2020) recently presented a related thought experiment—the so-called Octopus thought experiment, which replaces the rule-based interlocutor of Searle’s thought experiment with a neural language model. The Octopus thought experiment was awarded a best paper prize and was widely debated in the AI community. Again, however, even its fiercest opponents accepted the premise that what a word refers to cannot be induced in the absence of direct supervision. I will argue that what a word refers to is probably learnable from raw text alone. Here’s why: higher-order concept co-occurrence statistics are stable across languages and across modalities, because language use (universally) reflects the world we live in (which is relatively stable). Such statistics are sufficient to establish what words refer to. My conjecture is supported by a literature survey, a thought experiment, and an actual experiment.
... Another example is Wehbe et al. [110], who proposed an analogy between the recurrent neural network language model (RNNLM) and the working mechanism of the reading brain. They found that the way the human brain works when reading a story is somewhat similar to how RNNLMs work when processing sentences. ...
Preprint
Full-text available
Language understanding is a key scientific issue in the fields of cognitive and computer science. However, the two disciplines differ substantially in the specific research questions. Cognitive science focuses on analyzing the specific mechanism of the brain and investigating the brain's response to language; few studies have examined the brain's language system as a whole. By contrast, computer scientists focus on the efficiency of practical applications when choosing research questions but may ignore the most essential laws of language. Given these differences, can a combination of the disciplines offer new insights for building intelligent language models and studying language cognitive mechanisms? In the following text, we first review the research questions, history, and methods of language understanding in cognitive and computer science, focusing on the current progress and challenges. We then compare and contrast the research of language understanding in cognitive and computer sciences. Finally, we review existing work that combines insights from language cognition and language computation and offer prospects for future development trends.
... Language models that have been pretrained for the next word prediction task using millions of text documents can significantly predict brain recordings of people comprehending language (Wehbe et al., 2014; Jain and Huth, 2018; Caucheteux and King, 2020; Schrimpf et al., 2021). Understanding the reasons behind the observed similarities between language comprehension in machines and brains can lead to more insight into both systems. ...
Preprint
Full-text available
Language models have been shown to be very effective in predicting brain recordings of subjects experiencing complex language stimuli. For a deeper understanding of this alignment, it is important to understand the alignment between the detailed processing of linguistic information by the human brain and by language models. In NLP, linguistic probing tasks have revealed a hierarchy of information processing in neural language models that progresses from simple to complex with an increase in depth. On the other hand, in neuroscience, the strongest alignment with high-level language brain regions has consistently been observed in the middle layers. These findings leave an open question as to what linguistic information actually underlies the observed alignment between brains and language models. We investigate this question via a direct approach, in which we eliminate information related to specific linguistic properties in the language model representations and observe how this intervention affects the alignment with fMRI brain recordings obtained while participants listened to a story. We investigate a range of linguistic properties (surface, syntactic and semantic) and find that the elimination of each one results in a significant decrease in brain alignment across all layers of a language model. These findings provide direct evidence for the role of specific linguistic information in the alignment between brain and language models, and open new avenues for mapping the joint information processing in both systems.
... Many studies have reported that word vectors are closely related to neural and cognitive effects in language processing and reading comprehension. For instance, researchers have found neural correlates between the word vectors themselves (rather than the distances between them), in both single-word comprehension (Mitchell et al., 2008) and in narrative reading (Wehbe, Vaswani, Knight, & Mitchell, 2014). With the recent advancement of natural language processing (NLP) techniques and the availability of experimental databases on language comprehension, more studies have been done to test the feasibility of using word embeddings to predict reading behavior. ...
Article
Full-text available
Predictions about upcoming content play an important role during language comprehension and processing. Semantic similarity as a metric has been used to predict how words are processed in context in language comprehension and processing tasks. This study proposes a novel, dynamic approach for computing contextual semantic similarity, evaluates the extent to which the semantic similarity measures computed using this approach can predict fixation durations in reading tasks recorded in a corpus of eye-tracking data, and compares the performance of these measures to that of semantic similarity measures computed using the cosine and Euclidean methods. Our results reveal that the semantic similarity measures generated by our approach are significantly predictive of fixation durations in reading and outperform those generated by the two existing approaches. The findings of this study contribute to a better understanding of how humans process words in context and make predictions in language comprehension and processing. The effective and interpretable approach to computing contextual semantic similarity proposed in this study can also facilitate further explorations of other experimental data on language comprehension and processing.
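Cosine and Euclidean similarity, the two baselines mentioned in this abstract, are one-liners over word vectors, and a contextual variant can compare a word to an aggregate of its context. A minimal sketch with made-up 4-dimensional vectors (real studies use embeddings from a pretrained model; the averaging scheme below is an illustrative assumption, not the paper's method):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1 = same direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors; 0 = identical."""
    return float(np.linalg.norm(u - v))

# Toy 4-d "embeddings"; related words point in similar directions.
vecs = {
    "dog": np.array([0.9, 0.8, 0.1, 0.0]),
    "cat": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def contextual_similarity(word, context):
    """Similarity of a word to the mean vector of its preceding context."""
    ctx = np.mean([vecs[w] for w in context], axis=0)
    return cosine_similarity(vecs[word], ctx)

print(cosine_similarity(vecs["dog"], vecs["cat"]))   # related pair: high
print(cosine_similarity(vecs["dog"], vecs["car"]))   # unrelated pair: low
print(contextual_similarity("cat", ["dog"]))
```

Measures of this kind are then entered as predictors of fixation durations in regression models over eye-tracking corpora.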
... Multimodal Learning of Language and Other Brain Signals. Recently, language and cognitive data were also used together in multimodal settings to complete desirable tasks (Wang and Ji, 2021; Hollenstein et al., 2019, 2021; Hollenstein, Barrett, and Beinborn, 2020). Wehbe et al. (2014) used a recurrent neural network to perform word alignment between MEG activity and the generated word embeddings. Toneva and Wehbe (2019) utilized word-level MEG and fMRI recordings to compare word embeddings from large language models. ...
Preprint
Full-text available
Electroencephalography (EEG) and language have been widely explored independently for many downstream tasks (e.g., sentiment analysis, relation detection, etc.). Multimodal approaches that study both domains have not been well explored, even though in recent years, multimodal learning has been seen to be more powerful than its unimodal counterparts. In this study, we want to explore the relationship and dependency between EEG and language, i.e., how one domain reflects and represents the other. To study the relationship at the representation level, we introduced MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities, and thus employ the transformed representations for downstream applications. We used various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transfigure low-level language and EEG features to high-level transformed features. On downstream applications, sentiment analysis, and relation detection, we achieved new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 16.5% on sentiment analysis for K-EmoCon, 26.6% on sentiment analysis of ZuCo, and 31.1% on relation detection of ZuCo. In addition, we provide interpretation of the performance improvement by: (1) visualizing the original feature distribution and the transformed feature distribution, showing the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) visualizing word-level and sentence-level EEG-language alignment weights, showing the influence of different language semantics as well as EEG frequency features; and (3) visualizing brain topographical maps to provide an intuitive demonstration of the connectivity of EEG and language response in the brain regions.
... The availability of language models that can process connected text has increased the scope of cognitive neuroscientists' toolkit for probing the relationship between computational language representations and neural signals. Mirroring the successes in computer vision [24] and the subsequent modeling of neural processing in visual perceptual hierarchies [25][26][27], computational linguists are beginning to interpret how language models achieve their task performance [28][29][30] and what the correspondence is between such pretrained model representations and neural responses recorded when participants engage in similar language tasks [31][32][33][34][35][36][37]. On the one hand, task-optimized ANNs therefore serve as a tool and a framework that allow us to operationalize and identify which computational primitives serve as candidate hypotheses for explaining neural data [38][39][40][41]. ...
Article
Full-text available
Recently, cognitive neuroscientists have increasingly studied brain responses to narratives. At the same time, we are witnessing exciting developments in natural language processing, where large-scale neural network models can be used to instantiate cognitive hypotheses in narrative processing. Yet they learn from text alone, and we lack ways of incorporating biological constraints during training. To mitigate this gap, we provide a narrative comprehension magnetoencephalography (MEG) data resource that can be used to train neural network models directly on brain data. We recorded from 3 participants, each completing 10 separate hour-long recording sessions, while they listened to audiobooks in English. After story listening, participants answered short questions about their experience. To minimize head movement, the participants wore MEG-compatible head casts, which immobilized their head position during recording. We report a basic evoked-response analysis showing that the responses accurately localize to primary auditory areas. The responses are robust and conserved across 10 sessions for every participant. We also provide usage notes and briefly outline possible future uses of the resource.
... In natural language processing, self-supervised language models (LM) encode diverse linguistic information and achieve excellent zeroshot performance on many language tasks (Peters et al., 2018;Radford et al., 2018;Devlin et al., 2019). Capitalizing on these findings, neuroimaging studies have shown that representations extracted from LMs are highly effective at predicting brain activity elicited by natural language (Wehbe et al., 2014;Jain & Huth, 2018;Toneva & Wehbe, 2019;Schrimpf et al., 2021;Caucheteux et al., 2021;Goldstein et al., 2020) and can help reveal how linguistic representations are organized across human cortex (Jain et al., 2020). ...
Preprint
Full-text available
Self-supervised language models are very effective at predicting high-level cortical responses during language comprehension. However, the best current models of lower-level auditory processing in the human brain rely on either hand-constructed acoustic filters or representations from supervised audio neural networks. In this work, we capitalize on the progress of self-supervised speech representation learning (SSL) to create new state-of-the-art models of the human auditory system. Compared against acoustic baselines, phonemic features, and supervised models, representations from the middle layers of self-supervised models (APC, wav2vec, wav2vec 2.0, and HuBERT) consistently yield the best prediction performance for fMRI recordings within the auditory cortex (AC). Brain areas involved in low-level auditory processing exhibit a preference for earlier SSL model layers, whereas higher-level semantic areas prefer later layers. We show that these trends are due to the models' ability to encode information at multiple linguistic levels (acoustic, phonetic, and lexical) along their representation depth. Overall, these results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
... Unlike convolutional neural networks, whose architectural design principles are roughly inspired by biological vision [Lindsay, 2021], the design of current neural network language models is largely uninformed by psycholinguistics and neuroscience. And yet, there is an ongoing effort to adopt and adapt neural network language models to serve as computational hypotheses of how humans process language, making use of a variety of different architectures, training corpora, and training tasks [e.g., Wehbe et al., 2014, Toneva and Wehbe, 2019, Heilbron et al., 2020, Jain et al., 2020, Lyu et al., 2021, Schrimpf et al., 2021, Wilcox et al., 2021, Goldstein et al., 2022, Caucheteux and King, 2022]. We found that recurrent neural networks make markedly human-inconsistent predictions once pitted against transformer-based neural networks. ...
Preprint
Full-text available
Neural network language models can serve as computational hypotheses about how humans process language. We compared the model-human consistency of diverse language models using a novel experimental approach: controversial sentence pairs. For each controversial sentence pair, two language models disagree about which sentence is more likely to occur in natural text. Considering nine language models (including n-gram, recurrent neural networks, and transformer models), we created hundreds of such controversial sentence pairs by either selecting sentences from a corpus or synthetically optimizing sentence pairs to be highly controversial. Human subjects then provided judgments indicating for each pair which of the two sentences is more likely. Controversial sentence pairs proved highly effective at revealing model failures and identifying models that aligned most closely with human judgments. The most human-consistent model tested was GPT-2, although experiments also revealed significant shortcomings of its alignment with human perception.
... The improvement in the ability to predict neural signals to each word while relying on autoregressive DLM's contextual embeddings was robust and apparent even at the single-electrode level (Extended Data Fig. 4). These results agree with concurrent studies demonstrating that contextual embeddings model neural responses to words better than static semantic embeddings 15,16,45,46 . Next, we asked which aspects of the contextual embedding drive the improvement in modeling the neural activity. ...
Article
Full-text available
Departing from traditional linguistic models, advances in deep learning have resulted in a new type of predictive (autoregressive) deep language models (DLMs). Using a self-supervised next-word prediction task, these models generate appropriate linguistic responses in a given context. In the current study, nine participants listened to a 30-min podcast while their brain responses were recorded using electrocorticography (ECoG). We provide empirical evidence that the human brain and autoregressive DLMs share three fundamental computational principles as they process the same natural narrative: (1) both are engaged in continuous next-word prediction before word onset; (2) both match their pre-onset predictions to the incoming word to calculate post-onset surprise; (3) both rely on contextual embeddings to represent words in natural contexts. Together, our findings suggest that autoregressive DLMs provide a new and biologically feasible computational framework for studying the neural basis of language.
... We estimate participant-specific encoding models that predict the brain activity of a brain region (i.e., ROI, voxel) as a function of a feature-space matrix. We use ridge regression (L2-regularized regression) to estimate the encoding models, similar to previous work [11,14,17,18,63,64]. For each brain region the ridge regularization parameter is selected independently using nested 10-fold cross-validation. ...
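The closed-form ridge solution underlying such encoding models can be illustrated on a toy two-feature problem. This is a hedged pure-Python sketch with invented data; real encoding models use many stimulus features, voxel-wise fits, and nested cross-validation to choose the regularization strength:

```python
def ridge_fit(X, y, lam):
    """Closed-form ridge for two features: solve (X^T X + lam*I) w = X^T y."""
    # Accumulate the 2x2 normal matrix (with ridge penalty on the diagonal)
    # and the right-hand side X^T y.
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    p = sum(r[0] * t for r, t in zip(X, y))
    q = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    # Cramer's rule for the 2x2 system.
    return [(d * p - b * q) / det, (a * q - b * p) / det]

# Toy "stimulus features -> voxel response" data, generated by w = [1, 2].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
w_small = ridge_fit(X, y, 0.001)   # weak penalty: close to ordinary least squares
w_big = ridge_fit(X, y, 10.0)      # strong penalty: weights shrunk toward zero
print([round(v, 2) for v in w_small])  # near the OLS solution [1.0, 2.0]
```

The regularization strength trades off fit against shrinkage, which is why it is tuned per region by cross-validation in the paper's setup.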
Preprint
Full-text available
Similar to how differences in the proficiency of the cardiovascular and musculoskeletal system predict an individual's athletic ability, differences in how the same brain region encodes information across individuals may explain their behavior. However, when studying how the brain encodes information, researchers choose different neuroimaging tasks (e.g., language or motor tasks), which can rely on processing different types of information and can modulate different brain regions. We hypothesize that individual differences in how information is encoded in the brain are task-specific and predict different behavior measures. We propose a framework using encoding-models to identify individual differences in brain encoding and test if these differences can predict behavior. We evaluate our framework using task functional magnetic resonance imaging data. Our results indicate that individual differences revealed by encoding-models are a powerful tool for predicting behavior, and that researchers should optimize their choice of task and encoding-model for their behavior of interest.
... In line with previous studies (5,7,33,34), the activations of GPT-2 accurately map onto a distributed and bilateral set of brain areas. Brain scores peak in the auditory cortex, as well as in the anterior temporal and superior temporal areas (Figure 2A and Figure S8). ...
Preprint
Full-text available
Deep learning has recently made remarkable progress in natural language processing. Yet, the resulting algorithms remain far from competing with the language abilities of the human brain. Predictive coding theory offers a potential explanation to this discrepancy: while deep language algorithms are optimized to predict adjacent words, the human brain would be tuned to make long-range and hierarchical predictions. To test this hypothesis, we analyze the fMRI brain signals of 304 subjects each listening to 70min of short stories. After confirming that the activations of deep language algorithms linearly map onto those of the brain, we show that enhancing these models with long-range forecast representations improves their brain-mapping. The results further reveal a hierarchy of predictions in the brain, whereby the fronto-parietal cortices forecast more abstract and more distant representations than the temporal cortices. Overall, this study strengthens predictive coding theory and suggests a critical role of long-range and hierarchical predictions in natural language processing.
Chapter
We describe how we can analyze the representational structure of a model by examining similarity relations in the representational space. We discuss the criteria for comparing the distributional spaces and how these representations can be tested through probing methods. Finally, we argue for grounding language in multimodal aspects by integrating cognitive signals into the models.
Article
Full-text available
We present the Radboud Coregistration Corpus of Narrative Sentences (RaCCooNS), the first freely available corpus of eye-tracking-with-EEG data collected while participants read narrative sentences in Dutch. The corpus is intended for studying human sentence comprehension and for evaluating the cognitive validity of computational language models. RaCCooNS contains data from 37 participants (3 of which eye tracking only) reading 200 Dutch sentences each. Less predictable words resulted in significantly longer reading times and larger N400 sizes, replicating well-known surprisal effects in eye tracking and EEG simultaneously. We release the raw eye-tracking data, the preprocessed eye-tracking data at the fixation, word, and trial levels, the raw EEG after merger with eye-tracking data, and the preprocessed EEG data both before and after ICA-based ocular artifact correction.
Preprint
Full-text available
One of the greatest puzzles of all time is how understanding arises from neural mechanics. Our brains are networks of billions of biological neurons transmitting chemical and electrical signals along their connections. Large language models are networks of millions or billions of digital neurons, implementing functions that read the output of other functions in complex networks. The failure to see how meaning would arise from such mechanics has led many cognitive scientists and philosophers to various forms of dualism -- and many artificial intelligence researchers to dismiss large language models as stochastic parrots or jpeg-like compressions of text corpora. We show that human-like representations arise in large language models. Specifically, the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging.
Article
Recent artificial neural networks that process natural language achieve unprecedented performance in tasks requiring sentence-level understanding. As such, they could be interesting models of the integration of linguistic information in the human brain. We review works that compare these artificial language models with human brain activity and we assess the extent to which this approach has improved our understanding of the neural processes involved in natural language comprehension. Two main results emerge. First, the neural representation of word meaning aligns with the context-dependent, dense word vectors used by the artificial neural networks. Second, the processing hierarchy that emerges within artificial neural networks broadly matches the brain, but is surprisingly inconsistent across studies. We discuss current challenges in establishing artificial neural networks as process models of natural language comprehension. We suggest exploiting the highly structured representational geometry of artificial neural networks when mapping representations to brain data.
Chapter
Modern neural networks specialised in natural language processing (NLP) are not implemented with any explicit rules regarding language. It has been hypothesised that they might learn something generic about language, and because of this property much research has been conducted on interpreting their inner representations. A novel approach uses an experimental procedure based on human brain recordings to investigate whether a mapping from brain to neural network representations can be learned. Since this approach was introduced, more advanced NLP models have appeared. In this research we use this approach to test four new NLP models and identify the most brain-aligned one. Moreover, in our effort to unravel how the brain processes text semantically, we modify the text in the hope of obtaining a better mapping from the models. We remove punctuation under four different scenarios to determine the effect of punctuation on semantic understanding by the human brain. Our results show that RoBERTa is the most brain-aligned model, achieving a higher accuracy score on our evaluation than BERT. They also show that BERT achieves higher accuracy when punctuation is removed, and that as context length increases, accuracy does not decrease as much as in the original results that include punctuation.
Chapter
Language–brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli. The evaluation scenarios for this task have not yet been standardized which makes it difficult to compare and interpret results. We perform a series of evaluation experiments with a consistent encoding setup and compute the results for multiple fMRI datasets. In addition, we test the sensitivity of the evaluation measures to randomized data and analyze the effect of voxel selection methods. Our experimental framework is publicly available to make modelling decisions more transparent and support reproducibility for future comparisons.
Conference Paper
Nowadays, injection molds are designed manually by humans using computer-aided design (CAD) systems. The placement of ejector pins is a critical step in injection mold design, enabling the demolding of complex parts in production. Since each injection mold is unique, designers are limited in reusing standard ejector layouts or previous mold designs, which results in long design times, so automation of the design process is needed. For such a system, human knowledge is essential. Therefore, we propose a human-centric machine learning (HCML) approach for the automatic placement of ejector pins for injection molds. In this work, we extract mental models of injection mold designers to obtain machine-readable fundamental design rules and train a machine learning model using an ongoing human-machine learning approach.
Preprint
Full-text available
How the brain captures the meaning of linguistic stimuli across multiple views is still a critical open question in neuroscience. Consider three different views of the concept apartment: (1) picture (WP) presented with the target word label, (2) sentence (S) using the target word, and (3) word cloud (WC) containing the target word along with other semantically related words. Unlike previous efforts, which focus only on single-view analysis, in this paper we study the effectiveness of brain decoding in a zero-shot cross-view learning setup. Further, we propose brain decoding in the novel context of cross-view-translation tasks like image captioning (IC), image tagging (IT), keyword extraction (KE), and sentence formation (SF). Using extensive experiments, we demonstrate that cross-view zero-shot brain decoding is practical, leading to ~0.68 average pairwise accuracy across view pairs. Also, the decoded representations are sufficiently detailed to enable high accuracy for cross-view-translation tasks with the following pairwise accuracies: IC (78.0), IT (83.0), KE (83.7), and SF (74.5). Analysis of the contribution of different brain networks reveals exciting cognitive insights: (1) A high percentage of visual voxels are involved in the image captioning and image tagging tasks, and a high percentage of language voxels are involved in the sentence formation and keyword extraction tasks. (2) Zero-shot accuracy of the model trained on the S view and tested on the WC view is better than the same-view accuracy of the model trained and tested on the WC view.
Article
Full-text available
Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that focus each on one aspect of language processing and offer new insights on which type of information is processed by different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.
Article
Full-text available
Consider the ridge estimate β̂(λ) for β in the model y = Xβ + ε, with σ² unknown: β̂(λ) = (XᵀX + nλI)⁻¹Xᵀy. We study the method of generalized cross-validation (GCV) for choosing a good value of λ from the data. The estimate is the minimizer of V(λ) given by V(λ) = (1/n)‖(I − A(λ))y‖² / [(1/n)Tr(I − A(λ))]², where A(λ) = X(XᵀX + nλI)⁻¹Xᵀ. This estimate is a rotation-invariant version of Allen's PRESS, or ordinary cross-validation. It behaves like a risk-improvement estimator, but does not require an estimate of σ², so it can be used when n − p is small, or even if p ≥ 2n in certain cases. The GCV method can also be used in subset selection and singular-value truncation methods for regression, and even to choose from among mixtures of these methods.
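As a rough illustration of the GCV criterion, the sketch below evaluates V(λ) on a toy one-feature problem, where A(λ) reduces to the rank-one matrix xxᵀ/(xᵀx + nλ); the data and the λ grid are invented for the example:

```python
def gcv_score(x, y, lam):
    """V(lam) for one-feature ridge: A(lam) = x x^T / (x^T x + n*lam)."""
    n = len(x)
    xtx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / (xtx + n * lam)
    resid = [t - beta * v for v, t in zip(x, y)]
    rss = sum(r * r for r in resid)          # ||(I - A(lam)) y||^2
    trace_a = xtx / (xtx + n * lam)          # Tr A(lam), rank-one case
    return (rss / n) / ((n - trace_a) / n) ** 2

# Toy data, roughly y = x plus small noise; GCV should favor light shrinkage.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
grid = [10 ** k for k in range(-6, 2)]
best = min(grid, key=lambda lam: gcv_score(x, y, lam))
print(best)
```

The numerator is the residual sum of squares and the denominator penalizes effective degrees of freedom, so V(λ) is minimized without ever estimating σ².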
Conference Paper
Full-text available
We present several modifications of the original recurrent neural net work language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, the remaining problem is the computational complexity. In this work, we show approaches that lead to more than 15 times speedup for both training and testing phases. Next, we show importance of using a backpropagation through time algorithm. An empirical comparison with feedforward networks is also provided. In the end, we discuss possibilities how to reduce the amount of parameters in the model. The resulting RNN model can thus be smaller, faster both during training and testing, and more accurate than the basic one.
Article
Full-text available
The study compared the brain activation patterns associated with the comprehension of written and spoken Portuguese sentences. An fMRI study measured brain activity while participants read and listened to sentences about general world knowledge. Participants had to decide if the sentences were true or false. To mirror the transient nature of spoken sentences, visual input was presented in rapid serial visual presentation format. The results showed a common core of amodal left inferior frontal and middle temporal gyri activation, as well as modality specific brain activation associated with listening and reading comprehension. Reading comprehension was associated with more left-lateralized activation and with left inferior occipital cortex (including fusiform gyrus) activation. Listening comprehension was associated with extensive bilateral temporal cortex activation and more overall activation of the whole cortex. Results also showed individual differences in brain activation for reading comprehension. Readers with lower working memory capacity showed more activation of right-hemisphere areas (spillover of activation) and more activation in the prefrontal cortex, potentially associated with more demand placed on executive control processes. Readers with higher working memory capacity showed more activation in a frontal-posterior network of areas (left angular and precentral gyri, and right inferior frontal gyrus). The activation of this network may be associated with phonological rehearsal of linguistic information when reading text presented in rapid serial visual format. The study demonstrates the modality fingerprints for language comprehension and indicates how low- and high working memory capacity readers deal with reading text presented in serial format.
Article
Full-text available
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate t. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which a procedure with proven FDR control can be offered is greatly increased.
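The step-up procedure itself is short. Below is a hedged sketch of the Benjamini-Hochberg rule (reject the hypotheses with the k largest ranks satisfying p_(k) ≤ (k/m)·q) on invented p-values:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: find the largest rank k with
    p_(k) <= (k/m)*q and reject the k smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(m)]

# Invented p-values for m = 10 tests; thresholds are k * 0.005 for q = 0.05.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
print(benjamini_hochberg(pvals, q=0.05))  # rejects only the first two
```

In voxel-wise brain analyses this procedure is applied to thousands of per-voxel p-values at once, which is exactly the large-scale multiple-testing setting the abstract addresses.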
Article
Full-text available
Multichannel measurement with hundreds of channels oversamples a curl-free vector field, like the magnetic field in a volume free of sources. This is based on the constraint imposed by Laplace's equation on the magnetic scalar potential; outside of the source volume the signals are spatially band-limited. A functional solution of Laplace's equation enables one to separate the signals arising from the sphere enclosing the interesting sources, e.g. the currents in the brain, from the magnetic interference. Signal space separation (SSS) is accomplished by calculating individual basis vectors for each term of the functional expansion to create a signal basis covering all measurable signal vectors. Because the SSS basis is linearly independent for all practical sensor arrangements, any signal vector has a unique SSS decomposition with separate coefficients for the interesting signals and signals coming from outside the interesting volume. Thus, the SSS basis provides an elegant method to remove external disturbances. The device-independent SSS coefficients can be used in transforming the interesting signals to virtual sensor configurations. This can also be used in compensating for distortions caused by movement of the object by modeling it as movement of the sensor array around a static object. The device-independence of the decomposition also enables physiological DC phenomena to be recorded using voluntary head movements. When used with a properly designed sensor array, SSS does not affect the morphology or the signal-to-noise ratio of the interesting signals.
Article
Full-text available
Functional dissociations within the neural basis of auditory sentence processing are difficult to specify because phonological, syntactic and semantic information are all involved when sentences are perceived. In this review I argue that sentence processing is supported by a temporo-frontal network. Within this network, temporal regions subserve aspects of identification and frontal regions the building of syntactic and semantic relations. Temporal analyses of brain activation within this network support syntax-first models because they reveal that building of syntactic structure precedes semantic processes and that these interact only during a later stage.
Article
Full-text available
The development of high-resolution neuroimaging and multielectrode electrophysiological recording provides neuroscientists with huge amounts of multivariate data. The complexity of the data creates a need for statistical summary, but the local averaging standardly applied to this end may obscure the effects of greatest neuroscientific interest. In neuroimaging, for example, brain mapping analysis has focused on the discovery of activation, i.e., of extended brain regions whose average activity changes across experimental conditions. Here we propose to ask a more general question of the data: Where in the brain does the activity pattern contain information about the experimental condition? To address this question, we propose scanning the imaged volume with a "searchlight," whose contents are analyzed multivariately at each location in the brain. Keywords: neuroimaging, functional magnetic resonance imaging, statistical analysis.
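A searchlight scan can be sketched on a toy 1-D voxel array. The score used here, the squared distance between condition-mean patterns within each local neighborhood, is a simple stand-in for the multivariate statistic (e.g., classifier accuracy) used in practice, and the data are invented:

```python
def searchlight_scores(data_a, data_b, radius=1):
    """Scan a 1-D voxel array: for each center voxel, score how strongly the
    local multivoxel pattern distinguishes two conditions (squared distance
    between the condition-mean patterns in the neighborhood)."""
    n_vox = len(data_a[0])
    mean_a = [sum(trial[v] for trial in data_a) / len(data_a) for v in range(n_vox)]
    mean_b = [sum(trial[v] for trial in data_b) / len(data_b) for v in range(n_vox)]
    scores = []
    for center in range(n_vox):
        lo, hi = max(0, center - radius), min(n_vox, center + radius + 1)
        scores.append(sum((mean_a[v] - mean_b[v]) ** 2 for v in range(lo, hi)))
    return scores

# Toy data: two trials per condition, 6 voxels; the conditions differ only at
# voxels 3-4, so the searchlight map should peak there.
cond_a = [[0, 0, 0, 1.0, 1.1, 0], [0, 0, 0, 0.9, 1.0, 0]]
cond_b = [[0, 0, 0, 0.0, 0.1, 0], [0, 0, 0, 0.1, 0.0, 0]]
scores = searchlight_scores(cond_a, cond_b)
print(max(range(len(scores)), key=scores.__getitem__))
```

Real searchlights move a small sphere through a 3-D volume and run a full multivariate analysis at each location; the principle of local pattern information is the same.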
Article
Full-text available
Limitations of traditional magnetoencephalography (MEG) exclude some important patient groups from MEG examinations, such as epilepsy patients with a vagus nerve stimulator, patients with magnetic particles on the head or having magnetic dental materials that cause severe movement-related artefact signals. Conventional interference rejection methods are not able to remove the artefacts originating this close to the MEG sensor array. For example, the reference array method is unable to suppress interference generated by sources closer to the sensors than the reference array, about 20-40 cm. The spatiotemporal signal space separation method proposed in this paper recognizes and removes both external interference and the artefacts produced by these nearby sources, even on the scalp. First, the basic separation into brain-related and external interference signals is accomplished with signal space separation based on sensor geometry and Maxwell's equations only. After this, the artefacts from nearby sources are extracted by a simple statistical analysis in the time domain, and projected out. Practical examples with artificial current dipoles and interference sources as well as data from real patients demonstrate that the method removes the artefacts without altering the field patterns of the brain signals.
Article
A new nonparametric approach to the problem of testing the independence of two random processes is developed. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was previously used to test independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC is established when computed from samples drawn from random processes. It is shown that earlier bootstrap procedures which worked in the i.i.d. case fail for random processes, and an alternative consistent estimate of the p-values is proposed. Tests on artificial data and real-world Forex data indicate that the new test procedure discovers dependence that is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives.
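The biased empirical HSIC estimate, (1/n²)·trace(K_c L_c) with Gaussian kernels, can be sketched as follows. The samples are invented, and this i.i.d.-style estimator is only the starting point that the paper extends to random processes:

```python
import math

def _gram(xs, sigma):
    # Gaussian kernel Gram matrix: K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)).
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in xs] for a in xs]

def _center(K):
    # Double-center: K_c = H K H with H = I - (1/n) 11^T.
    n = len(K)
    row = [sum(r) / n for r in K]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + tot for j in range(n)] for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC: (1/n^2) * trace(K_c L_c)."""
    Kc, Lc = _center(_gram(xs, sigma)), _center(_gram(ys, sigma))
    n = len(xs)
    return sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n)) / n ** 2

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
dependent = [x * x for x in xs]  # nonlinear (quadratic) dependence on xs
print(round(hsic(xs, dependent), 3), hsic(xs, [1.0] * len(xs)))
```

HSIC is zero when one variable is constant and grows under dependence, including nonlinear dependence that a linear correlation could miss.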
Article
We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
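The k-best reranking integration is the simpler of the two: each hypothesis's decoder score is interpolated with the neural LM's log-probability, and the list is re-sorted. A minimal sketch, with a toy lookup table standing in for the neural LM (the weight and the stand-in LM are illustrative assumptions, not the paper's tuned system):

```python
import math

def rerank(kbest, lm_logprob, lm_weight=0.5):
    """Pick the hypothesis maximizing
    decoder_score + lm_weight * lm_logprob(hypothesis)."""
    return max(kbest, key=lambda hs: hs[1] + lm_weight * lm_logprob(hs[0]))

# Toy stand-in for a neural LM: a fixed log-probability table.
toy_lm = {"the cat sat": math.log(0.02), "cat the sat": math.log(1e-6)}

# k-best list of (hypothesis, decoder score); the decoder slightly
# prefers the ungrammatical hypothesis.
kbest = [("cat the sat", -1.0), ("the cat sat", -1.2)]
best = rerank(kbest, lambda h: toy_lm[h])
```

Here the LM penalty on the ungrammatical word order outweighs the small decoder-score gap, so reranking flips the 1-best choice.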
Conference Paper
We investigated the effect of word surprisal on the EEG signal during sentence reading. On each word of 205 experimental sentences, surprisal was estimated by three types of language model: Markov models, probabilistic phrase-structure grammars, and recurrent neural networks. Four event-related potential components were extracted from the EEG of 24 readers of the same sentences. Surprisal estimates under each model type formed a significant predictor of the amplitude of the N400 component only, with more surprising words resulting in more negative N400s. This effect was mostly due to content words. These findings provide support for surprisal as a generally applicable measure of processing difficulty during language comprehension.
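Surprisal is the negative log-probability of a word given its context, -log2 P(w_i | context), so any of the three model types can supply it. A minimal sketch using the simplest of them, an add-alpha-smoothed bigram Markov model (the corpus and function names are illustrative):

```python
import math
from collections import Counter

def bigram_surprisal(corpus, sentence, alpha=1.0):
    """Per-word surprisal -log2 P(w_i | w_{i-1}) under an
    add-alpha-smoothed bigram Markov model trained on `corpus`."""
    tokens = corpus.split()
    words = sentence.split()
    vocab = set(tokens) | set(words)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    V = len(vocab)
    out = []
    for prev, w in zip(words, words[1:]):
        p = (bigrams[(prev, w)] + alpha) / (unigrams[prev] + alpha * V)
        out.append((w, -math.log2(p)))
    return out

corpus = "the dog barks the dog runs the dog sleeps the cat naps"
```

A frequent continuation like "the dog" gets low surprisal, while an unseen one like "the zebra" gets high surprisal — the quantity whose word-by-word values are regressed against the N400 amplitude.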
Conference Paper
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
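The construction above has a concise numerical form: the expected total activity of the copies with biases shifted by -(i - 0.5) is a sum of shifted sigmoids, which closely tracks the softplus log(1 + e^x), and sampling can be approximated by a rectified linear unit with sigmoid-variance Gaussian noise. A sketch (the noise form follows the standard NReLU approximation; names are illustrative):

```python
import numpy as np

def stepped_sigmoid_mean(x, n_copies=100):
    """Expected total activity of n binary copies sharing weights,
    with biases shifted by -(i - 0.5): sum_i sigmoid(x - i + 0.5)."""
    i = np.arange(1, n_copies + 1)
    return float(np.sum(1.0 / (1.0 + np.exp(-(x - i + 0.5)))))

def softplus(x):
    """log(1 + e^x), the smooth limit of the stepped-sigmoid sum."""
    return float(np.log1p(np.exp(x)))

def noisy_relu(x, rng):
    """Noisy rectified linear approximation:
    max(0, x + N(0, sigmoid(x)))."""
    sigma = np.sqrt(1.0 / (1.0 + np.exp(-x)))
    return max(0.0, x + sigma * rng.normal())
```

Because the unit's output grows linearly with its input above zero, relative intensity information survives across layers, which is the property the abstract credits for the recognition gains.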
Article
Clinical evaluation of language function and basic neuroscience research into the neurophysiology of language are tied together. Whole-head MEG systems readily facilitate detailed spatiotemporal characterization of language processes. A fair amount of information is available about the cortical sequence of word perception and comprehension in the auditory and visual domain, which can be applied for clinical use. Language production remains, at present, somewhat less well charted. In clinical practice, the most obvious needs are noninvasive evaluation of the language-dominant hemisphere and mapping of areas involved in language performance to assist surgery. Multiple experimental designs and analysis approaches have been proposed for estimation of language lateralization. Some of them have been compared with the invasive Wada test and need to be tested further. Development of approaches for more comprehensive pre-surgical characterization of language cortex should build on basic neuroscience research, making use of parametric designs that allow functional mapping. Studies of the neural basis of developmental and acquired language disorders, such as dyslexia, stuttering, and aphasia can currently be regarded more as clinical or basic neuroscience research rather than as clinical routine. Such investigations may eventually provide tools for development of individually targeted training procedures and their objective evaluation.
Article
Syntax is one of the components in the architecture of language processing that allows the listener/reader to bind single-word information into a unified interpretation of multiword utterances. This paper discusses ERP effects that have been observed in relation to syntactic processing. The fact that these effects differ from the semantic N400 indicates that the brain honors the distinction between semantic and syntactic binding operations. Two models of syntactic processing attempt to account for syntax-related ERP effects. One type of model is serial, with a first phase that is purely syntactic in nature (syntax-first model). The other type of model is parallel and assumes that information immediately guides the interpretation process once it becomes available. This is referred to as the immediacy model. ERP evidence is presented in support of the latter model. Next, an explicit computational model is proposed to explain the ERP data. This Unification Model assumes that syntactic frames are stored in memory and retrieved on the basis of the spoken or written word form input. The syntactic frames associated with the individual lexical items are unified by a dynamic binding process into a structural representation that spans the whole utterance. On the basis of a meta-analysis of imaging studies on syntax, it is argued that the left posterior inferior frontal cortex is involved in binding syntactic frames together, whereas the left superior temporal cortex is involved in retrieval of the syntactic frames stored in memory. Lesion data that support the involvement of this left frontotemporal network in syntactic processing are discussed.
  • Kacper Chwialkowski
  • Arthur Gretton
Kacper Chwialkowski and Arthur Gretton. 2014. A kernel independence test for random processes. arXiv preprint arXiv:1402.4501.
RNNLM - Recurrent Neural Network Language Modeling Toolkit
  • Tomas Mikolov
  • Stefan Kombrink
  • Anoop Deoras
  • Lukas Burget
  • J. Cernocky
Tomas Mikolov, Stefan Kombrink, Anoop Deoras, Lukas Burget, and J. Cernocky. 2011. RNNLM - Recurrent Neural Network Language Modeling Toolkit. In Proc. of the 2011 ASRU Workshop, pages 196-201.