Article

Analyzing Neuroimaging Data with Subclasses: a Shrinkage Approach


Abstract

Among the numerous methods used to analyze neuroimaging data, Linear Discriminant Analysis (LDA) is commonly applied for binary classification problems. LDA’s popularity derives from its simplicity and its competitive classification performance, which has been reported for various types of neuroimaging data. Yet the standard LDA approach proves less than optimal for binary classification problems when additional label information (i.e. subclass labels) is present. Subclass labels make it possible to model structure in the data, which can be used to facilitate the classification task. In this paper, we illustrate how neuroimaging data exhibit subclass labels that may contain valuable information. We also show that the standard LDA classifier is unable to exploit subclass labels. We introduce a novel method that allows subclass labels to be incorporated efficiently into the classifier. The novel method, which we call Relevance Subclass LDA (RSLDA), computes an individual classification hyperplane for each subclass. It is based on regularized estimators of the subclass mean and uses other subclasses as regularization targets. We demonstrate the applicability and performance of our method on data drawn from two different neuroimaging modalities: (I) EEG data from brain-computer interfacing with event-related potentials, and (II) fMRI data in response to different levels of visual motion. We show that RSLDA outperforms the standard LDA approach for both types of datasets. These findings illustrate the benefits of exploiting subclass structure in neuroimaging data. Finally, we show that our classifier also outputs regularization profiles, enabling researchers to interpret it meaningfully. RSLDA therefore yields increased classification accuracy as well as a better interpretation of neuroimaging data. Since both results are highly favorable, we suggest applying RSLDA to various classification problems within neuroimaging and beyond.
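
The construction described in the abstract can be sketched compactly. Below is a minimal, hypothetical Python sketch of the idea (not the authors' implementation): one LDA hyperplane per subclass, with each subclass mean shrunk toward the means of the other subclasses. The shrinkage weights are taken as given inputs here, whereas RSLDA estimates them from the data via multi-target shrinkage.

```python
import numpy as np

def shrunk_mean(means, j, gamma):
    """Shrink the empirical mean of subclass j toward the means of the
    other subclasses. `means` maps subclass ids to mean vectors; `gamma`
    maps the other subclass ids to their shrinkage weights. The weights
    are illustrative placeholders; RSLDA selects them by minimizing an
    estimation-error criterion (multi-target shrinkage)."""
    weight_on_own = 1.0 - sum(gamma.values())
    return weight_on_own * means[j] + sum(g * means[k] for k, g in gamma.items())

def subclass_hyperplane(mu_target, mu_nontarget, cov):
    """Standard LDA weight vector and bias for one subclass, computed
    from (regularized) class means and a shared covariance estimate."""
    w = np.linalg.solve(cov, mu_target - mu_nontarget)
    b = -0.5 * w @ (mu_target + mu_nontarget)
    return w, b
```
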


... However, as such a specialization entails reduced training data, it may be detrimental when applied to objects with similar optical properties. A more data-efficient treatment of subclass structure has been proposed by Höhne et al. (2016). While using traditional mean electrode potentials as ERP features, the authors proposed to regularize each subclass classifier toward other subclasses. ...
... These aspects can be viewed as subclasses of the data. Höhne et al. addressed the subclass structure in neuroimaging applications by training separate LDA classifiers for each subclass while regularizing them toward other subclasses using multi-target shrinkage (Bartz et al., 2014; Höhne et al., 2016). In principle, adapting subclass-specific classifiers to other subclasses can also be interpreted as a special case of transfer learning (e.g., Jayaram et al., 2016), which aims to improve performance by leveraging data from related tasks or datasets. ...
... As a second way to leverage the subclass information, we adapt the regularization approach proposed by Höhne et al. (2016) (denoted by RSLDA in their paper): Rather than calculating the class means µ_t, µ_nt for the LDA classifier of subclass j only on the subset of the data corresponding to K_j, data from other subclasses j′ ≠ j is also used by calculating a weighted class mean. ...
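
Written out, the weighted class mean sketched in this excerpt plausibly takes a convex-combination form such as the following (a reconstruction in the excerpt's notation, with weights γ_{j′} determined by multi-target shrinkage in the cited work; the same construction applies to µ_nt):

```latex
\tilde{\mu}_t^{(j)} = \Bigl(1 - \sum_{j' \neq j} \gamma_{j'}\Bigr)\,\hat{\mu}_t^{(j)}
                    + \sum_{j' \neq j} \gamma_{j'}\,\hat{\mu}_t^{(j')},
\qquad \gamma_{j'} \in [0, 1].
```
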
Article
Full-text available
Brain signals represent a communication modality that can allow users of assistive robots to specify high-level goals, such as the object to fetch and deliver. In this paper, we consider a screen-free Brain-Computer Interface (BCI), where the robot highlights candidate objects in the environment using a laser pointer, and the user goal is decoded from the evoked responses in the electroencephalogram (EEG). Having the robot present stimuli in the environment allows for more direct commands than traditional BCIs that require the use of graphical user interfaces. Yet bypassing a screen entails less control over stimulus appearances. In realistic environments, this leads to heterogeneous brain responses for dissimilar objects—posing a challenge for reliable EEG classification. We model object instances as subclasses to train specialized classifiers in the Riemannian tangent space, each of which is regularized by incorporating data from other objects. In multiple experiments with a total of 19 healthy participants, we show that our approach not only increases classification performance but is also robust to both heterogeneous and homogeneous objects. While especially useful in the case of a screen-free BCI, our approach can naturally be applied to other experimental paradigms with potential subclass structure.
... They therefore affect results only minimally. In a multidimensional space, however, subclasses can form distinguishable clusters [10,11]. A classifier can learn to separate one or several of these clusters based on their distinguishing features instead of on those features shared by all elements of the class. ...
... Nevertheless, one should note that the results of this paper are confined to the application of classification in data with nested subclasses. In data sets with crossed designs, where the subclass structure is the same within all classes, the subclass information can actually be used to improve classification accuracy [11,23,24]. ...
Article
Full-text available
Biological data sets are typically characterized by high dimensionality and low effect sizes. A powerful method for detecting systematic differences between experimental conditions in such multivariate data sets is multivariate pattern analysis (MVPA), particularly pattern classification. However, in virtually all applications, data from the classes that correspond to the conditions of interest are not homogeneous but contain subclasses. Such subclasses can, for example, arise from individual subjects that contribute multiple data points, or from correlations of items within classes. We show here that in multivariate data that have subclasses nested within their class structure, these subclasses introduce systematic information that improves classifiability beyond what is expected from the size of the class difference. We analytically prove that this subclass bias systematically inflates correct classification rates (CCRs) of linear classifiers depending on the number of subclasses as well as on the portion of variance induced by the subclasses. In simulations, we demonstrate that subclass bias is highest when between-class effect size is low and subclass variance high. This bias can be reduced by increasing the total number of subclasses. However, we can account for the subclass bias by using permutation tests that explicitly consider the subclass structure of the data. We illustrate our result in several experiments that recorded human EEG activity, demonstrating that parametric statistical tests as well as typical trial-wise permutation fail to determine significance of classification outcomes correctly.
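
The blocked permutation test this abstract advocates can be sketched in a few lines for a nested design (Python; all names are assumptions, not the authors' implementation): class labels are exchanged between whole subclasses rather than between individual trials, so the null distribution respects the subclass structure.

```python
import numpy as np

def subclass_permutation(labels, subclass_ids, rng=None):
    """Permute class labels at the level of whole subclasses (nested
    design: each subclass belongs to exactly one class). The number of
    subclasses per class is preserved; only the assignment is shuffled."""
    rng = rng if rng is not None else np.random.default_rng()
    subs = np.unique(subclass_ids)
    # class label carried by each subclass
    sub_labels = np.array([labels[subclass_ids == s][0] for s in subs])
    shuffled = rng.permutation(sub_labels)
    permuted = labels.copy()
    for s, lab in zip(subs, shuffled):
        permuted[subclass_ids == s] = lab
    return permuted
```

Running the classifier on many such permuted label vectors yields a null distribution of correct classification rates against which the observed rate can be compared.
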
... By letting each method's strengths compensate for the weaknesses of the other, we intend to obtain a self-learning decoder that is both reliable and effective. The proposed method is inspired by the shrinkage approach for regularisation of supervised models (Höhne et al., 2016) and the mixing of parametric and non-parametric statistical estimators (Olkin and Spiegelman, 1987). ...
... Inspired by the concept of mean shrinkage for supervised classification (Höhne et al., 2016), the optimal mixing coefficient γ* is obtained as the value that minimises the expected mean squared error between the estimator value μ̂ and the unknown true parameter value µ: ...
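
The expression truncated at the colon presumably has the standard mean-squared-error form; a reconstruction from the sentence (with μ̂₁ and μ̂₂ denoting the two estimators being mixed, notation assumed):

```latex
\gamma^{*} = \arg\min_{\gamma \in [0,1]}
\mathbb{E}\bigl\| (1-\gamma)\,\hat{\mu}_1 + \gamma\,\hat{\mu}_2 - \mu \bigr\|^{2}.
```
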
... It has been recently proposed by Höhne et al. (2016) that additional label information (i.e. subclass labels) should be incorporated into the classifier to improve the accuracy of pattern classification in neuroimaging studies. ...
... It is important to note that this is only true for designs with crossed factors, i.e., when every subclass coexists in both categories. While exploiting the information that is shared between crossed subclasses can improve classification performance (Höhne et al., 2016), the contribution of such information in nested designs, i.e. when each subclass pertains only to one of the categories (see Experiments 1 and 2), represents a confound and can lead to false positive results. Nested data are characterized by a hierarchical, multi-level structure (e.g. ...
Article
Multivariate pattern analysis (MVPA) methods are now widely used in life-science research. They have great potential but their complexity also bears unexpected pitfalls. In this paper, we explore the possibilities that arise from the high sensitivity of MVPA for stimulus-related differences, which may confound estimations of class differences during decoding of cognitive concepts. We propose a method that takes advantage of concept-unrelated grouping factors, uses blocked permutation tests, and gradually manipulates the proportion of concept-related information in data while the stimulus-related, concept-irrelevant factors are held constant. This results in a concept-response curve, which shows the relative contribution of these two components, i.e. how much of the decoding performance is specific to higher-order category processing and to lower order stimulus processing. It also allows separating stimulus-related from concept-related neuronal processing, which cannot be achieved experimentally. We applied our method to three different EEG data sets with different levels of stimulus-related confound to decode concepts of digits vs. letters, faces vs. houses, and animals vs. fruits based on event-related potentials at the single trial level. We show that exemplar-specific differences between stimuli can drive classification accuracy to above chance levels even in the absence of conceptual information. By looking into time-resolved windows of brain activity, concept-response curves can help characterize the time-course of lower-level and higher-level neural information processing and detect the corresponding temporal and spatial signatures of the corresponding cognitive processes. In particular, our results show that perceptual information is decoded earlier in time than conceptual information specific to processing digits and letters. In addition, compared to the stimulus-level predictive sites, concept-related topographies are spread more widely and, at later time points, reach the frontal cortex. Thus, our proposed method yields insights into cognitive processing as well as corresponding brain responses.
... The classifier was linear discriminant analysis (LDA) with covariance shrinkage regularization [42]. LDA was used as it is a common and often successful choice for BCI applications [43,44], but also because log-power features follow an approximately normal distribution [45]: LDA is the optimal choice for normally distributed features, assuming both classes have the same noise covariance [46]. Since the electrophysiology analysis had shown that the effects of attention were spread over ICs and covered a wide range of frequencies, regularization was used to cope with the relatively high feature dimensionality. ...
... From the computational aspect of BCI design we see several attractive avenues for future research. First, while shrinkage LDA is considered a competitive method for classification in BCIs [44], other classification approaches might improve accuracy. While some other classifiers commonly used in BCI research (QDA, linear and RBF SVM) did not show improvements over LDA in our preliminary tests, using dynamical classifiers that take into account the temporal structure of EEG data might be beneficial to the estimation of attentional states, which have an inherent ebb and flow. ...
Article
Objective: Attention is known to modulate the plasticity of the motor cortex, and plasticity is crucial for recovery in motor rehabilitation. This study addresses the possibility of using an EEG-based brain-computer interface (BCI) to detect kinesthetic attention to movement. Approach: A novel experiment emulating physical rehabilitation was designed to study kinesthetic attention. The protocol involved continuous mobilization of lower limbs during which participants reported levels of attention to movement, from focused kinesthetic attention to mind wandering. For this protocol an asynchronous BCI detector of kinesthetic attention and deliberate mind wandering was designed. Main results: EEG analysis showed significant differences in theta, alpha, and beta bands, related to the attentional state. These changes were further pinpointed to bands relative to the frequency of the individual alpha peak. The accuracy of the designed BCI ranged between 60.8% and 68.4% (significantly above chance level), depending on the analysis window length used, i.e. the acceptable detection delay. Significance: This study shows it is possible to use self-reporting to study attention-related changes in EEG during continuous mobilization. Such a protocol is used to develop an asynchronous BCI detector of kinesthetic attention, with potential applications to motor rehabilitation.
... Inspired by the concept of mean shrinkage for supervised classification [72], the optimal mixing coefficient γ* is obtained as the value that minimizes the expected mean squared error between the estimated value μ̂ and the unknown true parameter value µ: ...
Thesis
The principal idea of brain-computer interfaces (BCIs) is that a decoder translates brain signals into messages or control commands by utilizing machine learning (ML) methods. BCIs hold great promise to improve the living conditions of patients by providing a communication channel that is independent of motor control or by providing feedback about the ongoing brain state that can be used in a training scenario. Applying this neurotechnology is not without difficulties. A common observation is that brain signals strongly differ across patients, but also vary across different sessions of the same patient or even change within a single session. The reasons range from human factors, e.g. differences or changes in anatomical or functional network structures, to non-human factors such as differences in the measurement environment. While these changes clearly challenge the ML model and require a subject- and session-specific decoder, they can also be partly desired, for instance, in cases where BCI-based feedback is used to trigger targeted neuroplasticity. This thesis addresses both aspects: the quest for learning a good decoder that can cope with changing brain signals and the quest for finding new brain state dependent training protocols that can lead to functional improvements. In my methodical contributions, I demonstrate how unsupervised ML methods can quickly and reliably learn a good decoder even in the complete absence of labeled data, i.e. data where the user's intentions are unknown. This task is substantially more difficult than traditional supervised ML, where labeled data is collected during a calibration session and then used to associate brain signals with certain tasks. In contrast to supervised learning, unsupervised methods allow for continuous learning, adapt to changes in the data distribution, and have the prospect of skipping or shortening the calibration session. I present a new approach called learning from label proportions (LLP) for BCIs based on event-related potentials (ERPs), where the unsupervised ML model exploits the existence of groups in the data that have different proportions of target and non-target stimuli. For some applications, these groups occur naturally, e.g. when the number of items varies across different selection steps such as in a BCI chess application, while in other applications the groups can be created by changing the user interface. Notably, LLP is the first unsupervised method in BCI that is guaranteed to converge to the optimal decoder even if no labels are available. When combined with an expectation-maximization algorithm, the resulting classifier shows remarkable performance. In a visual matrix speller, only 3 minutes of unlabeled electroencephalography (EEG) data were necessary for the unsupervised ML approach to learn a reliable decoder. In addition, classification performance was almost as good as for a supervised decoder that has full access to the labels. The results also showed that the unsupervised approaches even work well on challenging patient data from an auditory ERP paradigm. On the application side, I present the first successful BCI-based language training for patients with chronic aphasia. Aphasia refers to a language impairment that frequently occurs after brain strokes. The new training was developed together with the University Medical Center Freiburg.
In contrast to previous speech and language therapies, ML methods were used to continuously monitor the ongoing brain state of patients using EEG signals in an auditory ERP paradigm. Based on this information, feedback intended to reinforce beneficial brain states was continuously provided to the patients. In a pilot study, 10 stroke patients with chronic aphasia underwent 30 hours of high-intensity BCI-based language training. The results are extremely promising: compared to other therapies, our patients showed large improvements in their verbal abilities, and 5 patients were diagnosed as non-aphasic after the training even though their stroke had occurred several months to years before the start of our training. Taken together, these contributions increase the usability of BCI systems and open the door for a completely new application field of BCIs with an enormous potential user group.
... In EEG, machine learning methods have been widely used over the last decade; see for example (AlZoubi, Calvo, and Stevens, 2009; Sohaib, Qureshi, Hagelbäck, and Hilborn, 2013; Höhne, Bartz, Hebart, Müller, and Blankertz, 2015). ...
Article
Full-text available
Since it was first used in 1926, EEG has been one of the most useful instruments of neuroscience. In order to start using EEG data we need not only EEG apparatus, but also some analytical tools and skills to understand what our data mean. This article describes several classical analytical tools and also new one which appeared only several years ago. We hope it will be useful for those researchers who have only started working in the field of cognitive EEG.
Article
Full-text available
Objective: The reliable estimation of parameters such as mean or covariance matrix from noisy and high-dimensional observations is a prerequisite for successful application of signal processing and machine learning algorithms in brain-computer interfacing (BCI). This challenging task becomes significantly more difficult if the data set contains outliers, e.g. due to subject movements, eye blinks or loose electrodes, as they may heavily bias the estimation and the subsequent statistical analysis. Although various robust estimators have been developed to tackle the outlier problem, they ignore important structural information in the data and thus may not be optimal. Typical structural elements in BCI data are the trials consisting of a few hundred EEG samples and indicating the start and end of a task. Approach: This work discusses the parameter estimation problem in BCI and introduces a novel hierarchical view on robustness which naturally comprises different types of outlierness occurring in structured data. Furthermore, the class of minimum divergence estimators is reviewed and a robust mean and covariance estimator for structured data is derived and evaluated with simulations and on a benchmark data set. Main results: The results show that state-of-the-art BCI algorithms benefit from robustly estimated parameters. Significance: Since parameter estimation is an integral part of various machine learning algorithms, the presented techniques are applicable to many problems beyond BCI.
Article
Objective: Brain-computer interfaces (BCI) based on event-related potentials (ERP) incorporate a decoder to classify recorded brain signals and subsequently select a control signal that drives a computer application. Standard supervised BCI decoders require a tedious calibration procedure prior to every session. Several unsupervised classification methods have been proposed that tune the decoder during actual use and as such omit this calibration. Each of these methods has its own strengths and weaknesses. Our aim is to improve overall accuracy of ERP-based BCIs without calibration. Approach: We consider two approaches for unsupervised classification of ERP signals. Learning from label proportions (LLP) was recently shown to be guaranteed to converge to a supervised decoder when enough data is available. In contrast, the formerly proposed expectation maximization (EM) based decoding for ERP-BCI does not have this guarantee. However, while this decoder has high variance due to random initialization of its parameters, it obtains a higher accuracy faster than LLP when the initialization is good. We introduce a method to optimally combine these two unsupervised decoding methods, letting one method's strengths compensate for the weaknesses of the other and vice versa. The new method is compared to the aforementioned methods in a resimulation of an experiment with a visual speller. Main results: Analysis of the experimental results shows that the new method exceeds the performance of the previous unsupervised classification approaches in terms of ERP classification accuracy and symbol selection accuracy during the spelling experiment. Furthermore, the method shows less dependency on random initialization of model parameters and is consequently more reliable. Significance: Improving the accuracy and subsequent reliability of calibrationless BCIs makes these systems more appealing for frequent use.
Article
Full-text available
In this paper we review classification algorithms used to design brain–computer interface (BCI) systems based on electroencephalography (EEG). We briefly present the commonly employed algorithms and describe their critical properties. Based on the literature, we compare them in terms of performance and provide guidelines to choose the suitable classification algorithm(s) for a specific BCI.
Article
Full-text available
We introduce STOUT (spatio-temporal unifying tomography), a novel method for the source analysis of electroencephalographic (EEG) recordings, which is based on a physiologically-motivated source representation. Our method assumes that only a small number of brain sources is active throughout a measurement, where each of the sources exhibits focal (smooth but localized) characteristics in space, time and frequency. This structure is enforced through an expansion of the source current density into appropriate spatio-temporal basis functions in combination with sparsity constraints. This approach combines the main strengths of two existing methods, namely Sparse Basis Field Expansions [26] and Time-Frequency Mixed-Norm Estimates [18]. By adjusting the ratio between two regularization terms, STOUT is capable of trading temporal for spatial reconstruction accuracy and vice versa, depending on the requirements of specific analyses and the provided data. Due to allowing for non-stationary source activations, STOUT is particularly suited for the localization of event-related potentials (ERP) and other evoked brain activity. We demonstrate its performance on simulated ERP data for varying signal-to-noise ratios and numbers of active sources. Our analysis of the generators of visual and auditory evoked N200 potentials reveals that the most active sources originate in the temporal and occipital lobes, in line with the literature on sensory processing.
Conference Paper
Full-text available
Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage, orthogonal complement shrinkage, which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
Article
Full-text available
The multivariate analysis of brain signals has recently sparked a great amount of interest, yet accessible and versatile tools to carry out decoding analyses are scarce. Here we introduce The Decoding Toolbox (TDT) which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data. TDT is written in Matlab and equipped with an interface to the widely used brain data analysis package SPM. The toolbox allows running fast whole-brain analyses, region-of-interest analyses and searchlight analyses, using machine learning classifiers, pattern correlation analysis, or representational similarity analysis. It offers automatic creation and visualization of diverse cross-validation schemes, feature scaling, nested parameter selection, a variety of feature selection methods, multiclass capabilities, and pattern reconstruction from classifier weights. While basic users can implement a generic analysis in one line of code, advanced users can extend the toolbox to their needs or exploit the structure to combine it with external high-performance classification toolboxes. The toolbox comes with an example data set which can be used to try out the various analysis methods. Taken together, TDT offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns.
Conference Paper
Full-text available
Linear discriminant analysis (LDA) is the most commonly used classification method for single trial data in a brain-computer interface (BCI) framework. The popularity of LDA arises from its robustness, simplicity and high accuracy. However, the standard LDA approach is not capable of exploiting sublabel information (such as stimulus identity), which is accessible in data from event-related potentials (ERPs): it assumes that the evoked potentials are independent of the stimulus identity and dependent only on the users' attentional state. We question this assumption and investigate several methods which extract subclass-specific features from ERP data. Moreover, we propose a novel classification approach which exploits subclass-specific features using mean shrinkage. Based on a reanalysis of two BCI data sets, we show that our novel approach outperforms the standard LDA approach, while being computationally highly efficient.
Article
Full-text available
Perceptual confidence refers to the degree to which we believe in the accuracy of our percepts. Signal detection theory suggests that perceptual confidence is computed from an internal "decision variable," which reflects the amount of available information in favor of one or another perceptual interpretation of the sensory input. The neural processes underlying these computations have, however, remained elusive. Here, we used fMRI and multivariate decoding techniques to identify regions of the human brain that encode this decision variable and confidence during a visual motion discrimination task. We used observers' binary perceptual choices and confidence ratings to reconstruct the internal decision variable that governed the subjects' behavior. A number of areas in prefrontal and posterior parietal association cortex encoded this decision variable, and activity in the ventral striatum reflected the degree of perceptual confidence. Using a multivariate connectivity analysis, we demonstrate that patterns of brain activity in the right ventrolateral prefrontal cortex reflecting the decision variable were linked to brain signals in the ventral striatum reflecting confidence. Our results suggest that the representation of perceptual confidence in the ventral striatum is derived from a transformation of the continuous decision variable encoded in the cerebral cortex.
Article
Full-text available
Objective: Most BCIs have to undergo a calibration session in which data is recorded to train decoders with machine learning. Only recently zero-training methods have become a subject of study. This work proposes a probabilistic framework for BCI applications which exploit event-related potentials (ERPs). For the example of a visual P300 speller we show how the framework harvests the structure suitable to solve the decoding task by (a) transfer learning, (b) unsupervised adaptation, (c) language model and (d) dynamic stopping. Approach: A simulation study compares the proposed probabilistic zero-training framework (using transfer learning and task structure) to a state-of-the-art supervised model on n = 22 subjects. The individual influence of the involved components (a)-(d) is investigated. Main results: Without any need for a calibration session, the probabilistic zero-training framework with inter-subject transfer learning shows excellent performance, competitive with a state-of-the-art supervised method using calibration. Its decoding quality is carried mainly by the effect of transfer learning in combination with continuous unsupervised adaptation. Significance: A high-performing zero-training BCI is within reach for one of the most popular BCI paradigms: ERP spelling. Recording calibration data for a supervised BCI would require valuable time which is lost for spelling. The time spent on calibration would allow a novel user to spell 29 symbols with our unsupervised approach. It could be of use for various clinical and non-clinical ERP-applications of BCI.
Article
Full-text available
The increase in spatiotemporal resolution of neuroimaging devices is accompanied by a trend towards more powerful multivariate analysis methods. Often it is desired to interpret the outcome of these methods with respect to the cognitive processes under study. Here we discuss which methods allow for such interpretations, and provide guidelines for choosing an appropriate analysis for a given experimental goal: For a surgeon who needs to decide where to remove brain tissue it is most important to determine the origin of cognitive functions and associated neural processes. In contrast, when communicating with paralyzed or comatose patients via brain-computer interfaces, it is most important to accurately extract the neural processes specific to a certain mental state. These equally important but complementary objectives require different analysis methods. Determining the origin of neural processes in time or space from the parameters of a data-driven model requires what we call a forward model of the data; such a model explains how the measured data was generated from the neural sources. Examples are general linear models (GLMs). Methods for the extraction of neural information from data can be considered as backward models, as they attempt to reverse the data generating process. Examples are multivariate classifiers. Here we demonstrate that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study. In contrast, the interpretation of backward model parameters can lead to wrong conclusions regarding the spatial or temporal origin of the neural signals of interest, since significant nonzero weights may also be observed at channels the activity of which is statistically independent of the brain process under study. As a remedy for the linear case, we propose a procedure for transforming backward models into forward models. This procedure enables the neurophysiological interpretation of the parameters of linear backward models. We hope that this work raises awareness for an often encountered problem and provides a theoretical basis for conducting better interpretable multivariate neuroimaging analyses.
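
For a single linear filter w extracting ŝ = wᵀx, the transformation from backward to forward model proposed here has a compact closed form; a sketch consistent with the abstract, with Σ_x denoting the data covariance:

```latex
a = \frac{\Sigma_x\, w}{w^{\top} \Sigma_x\, w},
\qquad\text{so that}\qquad x \approx a\,\hat{s} + \varepsilon .
```

The resulting pattern a, unlike the filter w, admits a neurophysiological interpretation: its entries indicate how strongly each channel reflects the extracted signal.
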
Article
Full-text available
Moving from well-controlled, brisk artificial stimuli to natural and less-controlled stimuli seems counter-intuitive for event-related potential (ERP) studies. As natural stimuli typically contain a richer internal structure, they might introduce higher levels of variance and jitter in the ERP responses. Both characteristics are unfavorable for a good single-trial classification of ERPs in the context of a multi-class brain-computer interface (BCI) system, where the class-discriminant information between target stimuli and non-target stimuli must be maximized. For the application in an auditory BCI system, however, the transition from simple artificial tones to natural syllables can be useful despite the variance introduced. In the presented study, healthy users (N = 9) participated in an offline auditory nine-class BCI experiment with artificial and natural stimuli. It is shown that the use of syllables as natural stimuli does not only improve the users' ergonomic ratings; also the classification performance is increased. Moreover, natural stimuli obtain a better balance in multi-class decisions, such that the number of systematic confusions between the nine classes is reduced. Hopefully, our findings may contribute to make auditory BCI paradigms more user friendly and applicable for patients.
Article
Full-text available
Most debate on home ownership and risk has focused on the management of mortgage debt. But there are other risks for home buyers in settings where housing dominates people's wealth portfolios: where the investment dimensions of property are at a premium; and where housing wealth is, de facto, an asset base for welfare. This article draws from qualitative research with 150 UK mortgage holders to assess the character, extent and possible mitigation of this wider risk regime. The analysis first explores the value home buyers attach to the financial returns on housing. Next we document the extent to which home equity is earmarked and used as a financial buffer. Finally, reflecting on the merits and limitations of this tactic, we conclude by asking whether, in the interests of housing and social policy, as well as with a view to managing the economy, there is any need, scope or appetite for more actively sharing the financial risks and investment gains of housing systems anchored on owner-occupation.
Article
Full-text available
In this letter, mixture subclass discriminant analysis (MSDA), which alleviates two shortcomings of subclass discriminant analysis (SDA), is proposed. In particular, it is shown that for data with Gaussian homoscedastic subclass structure (a) SDA is not guaranteed to provide the discriminant subspace that minimizes the Bayes error, and (b) the sample covariance matrix cannot be used as the minimization metric of the discriminant analysis stability criterion (DSC). Based on this analysis, MSDA modifies the objective function of SDA and utilizes a novel partitioning procedure to aid discrimination of data with Gaussian homoscedastic subclass structure. Experimental results confirm the improved classification performance of MSDA.
Article
Full-text available
This study assesses the relative performance characteristics of five established classification techniques on data collected using the P300 Speller paradigm, originally described by Farwell and Donchin [5]. Four linear methods: Pearson's correlation method (PCM), Fisher's Linear Discriminant (FLD), stepwise linear discriminant analysis (SWLDA), and a linear support vector machine (LSVM); and one nonlinear method: Gaussian kernel support vector machine (GSVM), are compared for classifying offline data from eight users. The relative performance of the classifiers is evaluated, along with the practical concerns regarding the implementation of the respective methods. The results indicate that while all methods attained acceptable performance levels, SWLDA and FLD provide the best overall performance and implementation characteristics for practical classification of P300 Speller data.
Article
Full-text available
Representing an intuitive spelling interface for brain-computer interfaces (BCI) in the auditory domain is not straight-forward. In consequence, all existing approaches based on event-related potentials (ERP) rely at least partially on a visual representation of the interface. This online study introduces an auditory spelling interface that eliminates the necessity for such a visualization. In up to two sessions, a group of healthy subjects (N = 21) was asked to use a text entry application, utilizing the spatial cues of the AMUSE paradigm (Auditory Multi-class Spatial ERP). The speller relies on the auditory sense both for stimulation and the core feedback. Without prior BCI experience, 76% of the participants were able to write a full sentence during the first session. By exploiting the advantages of a newly introduced dynamic stopping method, a maximum writing speed of 1.41 char/min (7.55 bits/min) could be reached during the second session (average: 0.94 char/min, 5.26 bits/min). For the first time, the presented work shows that an auditory BCI can reach performances similar to state-of-the-art visual BCIs based on covert attention. These results represent an important step toward a purely auditory BCI.
Article
Full-text available
Brain-computer interfaces (BCIs) based on event-related potentials (ERPs) strive for offering communication pathways which are independent of muscle activity. While most visual ERP-based BCI paradigms require good control of the user's gaze direction, auditory BCI paradigms overcome this restriction. The present work proposes a novel approach using auditory evoked potentials for the example of a multiclass text spelling application. To control the ERP speller, BCI users focus their attention on two-dimensional auditory stimuli that vary in both pitch (high/medium/low) and direction (left/middle/right) and that are presented via headphones. The resulting nine different control signals are exploited to drive a predictive text entry system. It enables the user to spell a letter by a single nine-class decision plus two additional decisions to confirm a spelled word. This paradigm, called PASS2D, was investigated in an online study with 12 healthy participants. Users spelled with more than 0.8 characters per minute on average (3.4 bits/min), which makes PASS2D a competitive method. It could enrich the toolbox of existing ERP paradigms for BCI end users such as people in a late stage of amyotrophic lateral sclerosis.
Article
Full-text available
Interpreting brain image experiments requires analysis of complex, multivariate data. In recent years, one analysis approach that has grown in popularity is the use of machine learning algorithms to train classifiers to decode stimuli, mental states, behaviours and other variables of interest from fMRI data and thereby show the data contain information about them. In this tutorial overview we review some of the key choices faced in using this approach as well as how to derive statistically significant results, illustrating each point from a case study. Furthermore, we show how, in addition to answering the question of 'is there information about a variable of interest' (pattern discrimination), classifiers can be used to tackle other classes of question, namely 'where is the information' (pattern localization) and 'how is that information encoded' (pattern characterization).
Article
Full-text available
In this paper, we describe a simple set of "recipes" for the analysis of high spatial density EEG. We focus on a linear integration of multiple channels for extracting individual components without making any spatial or anatomical modeling assumptions, instead requiring particular statistical properties such as maximum difference, maximum power, or statistical independence. We demonstrate how corresponding algorithms, for example, linear discriminant analysis, principal component analysis and independent component analysis, can be used to remove eye-motion artifacts, extract strong evoked responses, and decompose temporally overlapping components. The general approach is shown to be consistent with the underlying physics of EEG, which specifies a linear mixing model of the underlying neural and non-neural current sources.
Article
Full-text available
The development of high-resolution neuroimaging and multielectrode electrophysiological recording provides neuroscientists with huge amounts of multivariate data. The complexity of the data creates a need for statistical summary, but the local averaging standardly applied to this end may obscure the effects of greatest neuroscientific interest. In neuroimaging, for example, brain mapping analysis has focused on the discovery of activation, i.e., of extended brain regions whose average activity changes across experimental conditions. Here we propose to ask a more general question of the data: Where in the brain does the activity pattern contain information about the experimental condition? To address this question, we propose scanning the imaged volume with a “searchlight,” whose contents are analyzed multivariately at each location in the brain.
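
A generic searchlight loop is easy to sketch (Python with scikit-learn; the sphere construction and all names are assumptions, not the authors' procedure): the local pattern around each voxel is decoded, and the cross-validated accuracy is assigned to the sphere's center.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def searchlight_map(data, labels, neighbors):
    """`data` is trials x voxels; `neighbors[i]` lists the voxel indices
    inside the sphere centered on voxel i (construction not shown).
    Returns one cross-validated decoding accuracy per sphere center."""
    scores = np.zeros(len(neighbors))
    for i, idx in enumerate(neighbors):
        clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
        scores[i] = cross_val_score(clf, data[:, idx], labels, cv=5).mean()
    return scores
```
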
Article
Full-text available
The reliable operation of brain-computer interfaces (BCIs) based on spontaneous electroencephalogram (EEG) signals requires accurate classification of multichannel EEG. The design of EEG representations and classifiers for BCI are open research questions whose difficulty stems from the need to extract complex spatial and temporal patterns from noisy multidimensional time series obtained from EEG measurements. The high-dimensional and noisy nature of EEG may limit the advantage of nonlinear classification methods over linear ones. This paper reports the results of a linear (linear discriminant analysis) and two nonlinear classifiers (neural networks and support vector machines) applied to the classification of spontaneous EEG during five mental tasks, showing that nonlinear classifiers produce only slightly better classification results. An approach to feature selection based on genetic algorithms is also presented with preliminary results of application to EEG during finger movement.
Article
Fisher‐Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class. In this paper, we fit Gaussian mixtures to each class to facilitate effective classification in non‐normal settings, especially when the classes are clustered. Low dimensional views are an important by‐product of LDA—our new techniques inherit this feature. We can control the within‐class spread of the subclass centres relative to the between‐class spread. Our technique for fitting these models permits a natural blend with nonparametric versions of LDA.
Article
Active and flexible manipulations of memory contents "in the mind's eye" are believed to occur in a dedicated neural workspace, frequently referred to as visual working memory. Such a neural workspace should have two important properties: the ability to store sensory information across delay periods and the ability to flexibly transform sensory information. Here we used a combination of functional MRI and multivariate decoding to identify such neural representations. Subjects were required to memorize a complex artificial pattern for an extended delay, then rotate the mental image as instructed by a cue and memorize this transformed pattern. We found that patterns of brain activity already in early visual areas and posterior parietal cortex encode not only the initially remembered image, but also the transformed contents after mental rotation. Our results thus suggest that the flexible and general neural workspace supporting visual working memory can be realized within posterior brain regions.
Article
It has long been customary to measure the adequacy of an estimator by the smallness of its mean squared error. The least squares estimators were studied by Gauss and by other authors later in the nineteenth century. A proof that the best unbiased estimator of a linear function of the means of a set of observed random variables is the least squares estimator was given by Markov [12], a modified version of whose proof is given by David and Neyman [4]. A slightly more general theorem is given by Aitken [1]. Fisher [5] indicated that for large samples the maximum likelihood estimator approximately minimizes the mean squared error when compared with other reasonable estimators. This paper will be concerned with optimum properties or failure of optimum properties of the natural estimator in certain special problems with the risk usually measured by the mean squared error or, in the case of several parameters, by a quadratic function of the estimators. We shall first mention some recent papers on this subject and then give some results, mostly unpublished, in greater detail.
Article
Objective: A Brain Computer Interface (BCI) speller is a communication device, which can be used by patients suffering from neurodegenerative diseases to select symbols in a computer application. For patients unable to overtly fixate the target symbol, it is crucial to develop a speller independent of gaze shifts. In the present online study, we investigated rapid serial visual presentation (RSVP) as a paradigm for mental typewriting. Methods: We investigated the RSVP speller in three conditions, regarding the Stimulus Onset Asynchrony (SOA) and the use of color features. A vocabulary of 30 symbols was presented one-by-one in a pseudo random sequence at the same location of display. Results: All twelve participants were able to successfully operate the RSVP speller. The results show a mean online spelling rate of 1.43 symb/min and a mean symbol selection accuracy of 94.8% in the best condition. Conclusion: We conclude that the RSVP is a promising paradigm for BCI spelling and its performance is competitive with the fastest gaze-independent spellers in literature. Significance: The RSVP speller does not require gaze shifts towards different target locations and can be operated by non-spatial visual attention, therefore it can be considered as a valid paradigm in applications with patients for impaired oculo-motor control.
Article
Motion visually evoked potentials (mVEPs) have recently been explored as input features for brain-computer interfaces, in particular for the implementation of visual spellers. Due to low contrast and luminance requirements, motion-based intensification is less discomforting to the user than conventional approaches. So far, mVEP spellers were operated in the overt attention mode, wherein eye movements were allowed. However, the dependence on eye movements limits clinical applicability. Hence, the purpose of this study was to evaluate the suitability of mVEPs for gaze-independent communication. Sixteen healthy volunteers participated in an online study. We used a conventional speller layout wherein the possible selections are presented at different spatial locations both in the overt attention mode (fixation of the target) and the covert attention mode (central fixation). Additionally, we tested an alternative speller layout wherein all stimuli are sequentially presented at the same spatial location (foveal stimulation), i.e. eye movements are not required for selection. As can be expected, classification performance breaks down when switching from the overt to the covert operation. Despite reduced performance in the covert setting, conventional mVEP spellers are still potentially useful for users with severely impaired eye movements. In particular, they may offer advantages, such as less visual fatigue, over spellers using flashing stimuli. Importantly, the novel mVEP speller presented here recovers good performance in a gaze-independent setting by resorting to the foveal stimulation.
Article
Many applied problems require a covariance matrix estimator that is not only invertible, but also well-conditioned (that is, inverting it does not amplify estimation error). For large-dimensional covariance matrices, the usual estimator—the sample covariance matrix—is typically not well-conditioned and may not even be invertible. This paper introduces an estimator that is both well-conditioned and more accurate than the sample covariance matrix asymptotically. This estimator is distribution-free and has a simple explicit formula that is easy to compute and interpret. It is the asymptotically optimal convex linear combination of the sample covariance matrix with the identity matrix. Optimality is meant with respect to a quadratic loss function, asymptotically as the number of observations and the number of variables go to infinity together. Extensive Monte Carlo simulations confirm that the asymptotic results tend to hold well in finite sample.
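
The estimator itself is a single convex combination; a minimal Python sketch, with the shrinkage intensity passed in as an argument (the paper's contribution is an analytic, asymptotically optimal formula for that intensity):

```python
import numpy as np

def shrink_covariance(X, gamma):
    """Shrink the sample covariance of X (n samples x p variables) toward
    a scaled identity. `gamma` in [0, 1] is the shrinkage intensity."""
    p = X.shape[1]
    S = np.cov(X, rowvar=False)    # p x p sample covariance
    nu = np.trace(S) / p           # average variance sets the target's scale
    return (1.0 - gamma) * S + gamma * nu * np.eye(p)
```

For reference, scikit-learn's sklearn.covariance.LedoitWolf implements the analytic choice of the intensity.
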
Article
Analyzing brain states that correspond to event related potentials (ERPs) on a single trial basis is a hard problem due to the high trial-to-trial variability and the unfavorable ratio between signal (ERP) and noise (artifacts and neural background activity). In this tutorial, we provide a comprehensive framework for decoding ERPs, elaborating on linear concepts, namely spatio-temporal patterns and filters as well as linear ERP classification. However, the bottleneck of these techniques is that they require an accurate covariance matrix estimation in high dimensional sensor spaces which is a highly intricate problem. As a remedy, we propose to use shrinkage estimators and show that appropriate regularization of linear discriminant analysis (LDA) by shrinkage yields excellent results for single-trial ERP classification that are far superior to classical LDA classification. Furthermore, we give practical hints on the interpretation of what classifiers learned from the data and demonstrate in particular that the trade-off between goodness-of-fit and model complexity in regularized LDA relates to a morphing between a difference pattern of ERPs and a spatial filter which cancels non task-related brain activity.
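
In this setting, the regularized classifier replaces the sample covariance Σ̂ by a shrinkage estimate before computing the LDA projection; a standard formulation consistent with (though not quoted from) the tutorial, for feature dimension D:

```latex
\tilde{\Sigma} = (1-\gamma)\,\hat{\Sigma} + \gamma\,\nu I,
\qquad \nu = \tfrac{1}{D}\,\operatorname{tr}\hat{\Sigma},
\qquad w = \tilde{\Sigma}^{-1}\bigl(\hat{\mu}_2 - \hat{\mu}_1\bigr).
```
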
Article
There is evidence that conventional visual brain-computer interfaces (BCIs) based on event-related potentials cannot be operated efficiently when eye movements are not allowed. To overcome this limitation, the aim of this study was to develop a visual speller that does not require eye movements. Three different variants of a two-stage visual speller based on covert spatial attention and non-spatial feature attention (i.e. attention to colour and form) were tested in an online experiment with 13 healthy participants. All participants achieved highly accurate BCI control. They could select one out of thirty symbols (chance level 3.3%) with mean accuracies of 88%-97% for the different spellers. The best results were obtained for a speller that was operated using non-spatial feature attention only. These results show that, using feature attention, it is possible to realize high-accuracy, fast-paced visual spellers that have a large vocabulary and are independent of eye gaze.
Article
Pattern-information analysis has become an important new paradigm in functional imaging. Here I review and compare existing approaches with a focus on the question of what we can learn from them in terms of brain theory. The most popular and widespread method is stimulus decoding by response-pattern classification. This approach addresses the question whether activity patterns in a given region carry information about the stimulus category. Pattern classification uses generic models of the stimulus-response relationship that do not mimic brain information processing and treats the stimulus space as categorical, a simplification that is often helpful, but also limiting in terms of the questions that can be addressed. We can address the question whether representations are consistent across different stimulus sets or tasks by cross-decoding, where the classifier is trained with one set of stimuli (or task) and tested with another. Beyond pattern classification, a major new direction is the integration of computational models of brain information processing into pattern-information analysis. This approach enables us to address the question to what extent competing computational models are consistent with the stimulus representations in a brain region. Two methods that test computational models are voxel receptive-field modeling and representational similarity analysis. These methods sample the stimulus (or mental-state) space more richly, estimate a separate response pattern for each stimulus, and can generalize from the stimulus sample to a stimulus population. Computational models that mimic brain information processing predict responses from stimuli. The reverse transform can be modeled to reconstruct stimuli from responses. Stimulus reconstruction is a challenging feat of engineering, but the implications of the results for brain theory are not always clear. Exploratory pattern analyses complement the confirmatory approaches mentioned so far and can reveal strong, unexpected effects that might be missed when testing only a restricted set of predefined hypotheses.
Article
Machine learning and pattern recognition algorithms have in the past years developed into a workhorse in brain imaging and the computational neurosciences, as they are instrumental for mining vast amounts of neural data of ever-increasing measurement precision and detecting minuscule signals from an overwhelming noise floor. They provide the means to decode and characterize task-relevant brain states and to distinguish them from non-informative brain signals. While undoubtedly this machinery has helped to gain novel biological insights, it also holds the danger of potential unintentional abuse. Ideally machine learning techniques should be usable by any non-expert; unfortunately, however, they typically are not. Overfitting and other pitfalls may occur and lead to spurious and nonsensical interpretation. The goal of this review is therefore to provide an accessible and clear introduction to the strengths and also the inherent dangers of machine learning usage in the neurosciences.
Article
A popular method for investigating whether stimulus information is present in fMRI response patterns is to attempt to "decode" the stimuli from the response patterns with a multivariate classifier. The sensitivity for detecting the information depends on the particular classifier used. However, little is known about the relative performance of different classifiers on fMRI data. Here we compared six multivariate classifiers and investigated how the response-amplitude estimate used (beta- or t-value) and different pattern normalizations affect classification performance. The compared classifiers were a pattern-correlation classifier, a k-nearest-neighbors classifier, Fisher's linear discriminant, Gaussian naïve Bayes, and linear and nonlinear (radial-basis-function kernel) support vector machines. We compared these classifiers' accuracy at decoding the category of visual objects from response patterns in human early visual and inferior temporal cortex acquired in an event-related design with BOLD fMRI at 3T using SENSE and isotropic voxels of about 2-mm width. Overall, Fisher's linear discriminant (with an optimal-shrinkage covariance estimator) and the linear support vector machine performed best. The pattern-correlation classifier often performed similarly to those two classifiers. The nonlinear classifiers never performed better and sometimes significantly worse than the linear classifiers, suggesting overfitting. Defining response patterns by t-values (or in error-standard-deviation units) rather than by beta estimates (in % signal change) appeared advantageous. Cross-validation by a leave-one-stimulus-pair-out method gave higher accuracies than a leave-one-run-out method, suggesting that generalization to independent runs (which more safely ensures independence of the test set) is more challenging than generalization to novel stimuli within the same category. Independent selection of fewer more visually responsive voxels tended to yield better decoding performance for all classifiers. Normalizing mean and standard deviation of the response patterns either across stimuli or across voxels had no significant effect on decoding performance. Overall our results suggest that linear decoders based on t-value patterns may perform best in the present scenario of visual object representations measured for about 60 min per subject with 3T fMRI.
Article
Univariate statistical approaches are often used for the analysis of neuroimaging data but are unable to detect subtle interactions between different components of brain activity. In contrast, multivariate approaches that use classification as a basis are well suited to detect such interactions, allowing the analysis of neuroimaging data on the single-trial level. However, multivariate approaches typically assign a non-zero contribution to every component, making interpretation of the results troublesome. This paper introduces groupwise regularisation as a novel method for finding sparse, and therefore easy-to-interpret, models that are able to predict the experimental condition to which single trials belong. Furthermore, the obtained models can be constrained in various ways by placing features that are thought to belong together into groups. In order to learn models from data, we introduce a new algorithm that makes use of stability conditions derived in this paper. The algorithm is used to classify multisensor EEG signals recorded during a motor imagery task, using (groupwise) regularised logistic regression as the underlying classifier. We show that regularisation dramatically reduces the number of features without reducing the classification rate. This improves model interpretability, as the method finds features in the data such as mu and beta desynchronisation in the motor cortex contralateral to the imagined movement. By choosing particular groupings, we can constrain the regularised solutions such that fewer sensors are used or a model is obtained that generalises well across subjects. The identification of a small number of feature groups that best explain the data makes groupwise regularisation a useful new tool for single-trial analysis.
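Groupwise regularisation of this kind is commonly implemented with a group-lasso penalty, whose proximal operator shrinks entire groups of weights to zero at once. The following is a minimal proximal-gradient sketch for group-regularised logistic regression, assuming NumPy; the grouping, step size, and penalty strength are illustrative and do not reproduce the authors' algorithm or its stability conditions.

import numpy as np

def group_lasso_logreg(X, y, groups, lam=0.05, n_iter=500):
    # Proximal gradient descent for (mean) logistic loss + group-lasso penalty.
    n, d = X.shape
    w = np.zeros(d)
    step = 4 * n / (np.linalg.norm(X, 2) ** 2)    # 1/L for the logistic loss
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        w = w - step * (X.T @ (p - y)) / n        # gradient step on the loss
        for g in groups:                          # proximal step: group-wise
            norm = np.linalg.norm(w[g])           # soft-thresholding
            w[g] = 0.0 if norm <= step * lam else (1 - step * lam / norm) * w[g]
    return w

# Example: 12 features in 4 groups of 3 (e.g., one group per sensor).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = (X[:, :3].sum(axis=1) > 0).astype(float)      # only group 0 is informative
groups = [np.arange(i, i + 3) for i in range(0, 12, 3)]
print(np.round(group_lasso_logreg(X, y, groups), 2))  # noise groups shrink to 0

Because the penalty acts on whole groups, the fitted model keeps or discards each sensor's features jointly, which is what makes the sparse solutions easy to interpret.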
Article
This paper treats support vector machine (SVM) classification applied to block-design fMRI, extending our previous work with linear discriminant analysis [LaConte, S., Anderson, J., Muley, S., Ashe, J., Frutiger, S., Rehm, K., Hansen, L.K., Yacoub, E., Hu, X., Rottenberg, D., Strother, S., 2003a. The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics. NeuroImage 18, 10-27; Strother, S.C., Anderson, J., Hansen, L.K., Kjems, U., Kustra, R., Siditis, J., Frutiger, S., Muley, S., LaConte, S., Rottenberg, D., 2002. The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework. NeuroImage 15, 747-771]. We compare SVM to canonical variates analysis (CVA) by examining the relative sensitivity of each method to ten combinations of preprocessing choices consisting of spatial smoothing, temporal detrending, and motion correction. Important to the discussion are the issues of classification performance, model interpretation, and validation in the context of fMRI. As the SVM has many unique properties, we examine the interpretation of support vector models with respect to neuroimaging data. We propose four methods for extracting activation maps from SVM models, and we examine one of these in detail. For both CVA and SVM, we have classified individual time samples of whole-brain data, with TRs of roughly 4 s, thirty slices, and nearly 30,000 brain voxels, with no averaging of scans or prior feature selection.
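For a linear SVM, the most direct route to an activation map is to fold the learned weight vector back into voxel space, since each weight multiplies exactly one voxel. The following is a minimal sketch assuming scikit-learn and a synthetic volume; it illustrates the general idea of weight-map extraction rather than the four specific methods proposed in the paper.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
shape = (10, 10, 10)                       # toy brain volume of 1000 "voxels"
n_scans = 80
X = rng.normal(size=(n_scans, np.prod(shape)))
y = rng.integers(0, 2, n_scans)
X[y == 1, :50] += 0.5                      # condition effect in a voxel subset

clf = LinearSVC(C=1.0).fit(X, y)

# The weight vector has one entry per voxel; reshaping it to the volume's
# dimensions yields a map of each voxel's contribution to the decision.
activation_map = clf.coef_.reshape(shape)
print(activation_map.shape)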
Article
Although it is known that responses in the auditory cortex are evoked predominantly contralateral to the side of stimulation, the lateralization of responses at lower levels of the human central auditory system has hardly been studied. Furthermore, little is known about the functional interactions between the involved processing centers. In this study, functional MRI was performed using sound stimuli of varying left and right intensities. In normal-hearing subjects, contralateral activation was consistently detected in the temporal lobe, thalamus and midbrain. Connectivity analyses showed that auditory information crosses to the contralateral side in the lower brainstem, followed by ipsilateral signal conduction towards the auditory cortex, similar to the flow of auditory signals in other mammals. In unilaterally deaf subjects, activation was more symmetrical in the cortices but remained contralateral in the midbrain and thalamus. Input connection strengths differed only at cortical levels, and there was no evidence for plastic reorganization at subcortical levels.
Article
Over the years, many Discriminant Analysis (DA) algorithms have been proposed for the study of high-dimensional data in a large variety of problems. Each of these algorithms is tuned to a specific type of data distribution (that which best models the problem at hand). Unfortunately, in most problems the form of each class's probability density function (pdf) is unknown a priori, and the DA algorithm that best fits the data is selected by trial and error. Ideally, one would like to have a single formulation that can be used for most distribution types. This can be achieved by approximating the underlying distribution of each class with a mixture of Gaussians. In this approach, the major problem to be addressed is that of determining the optimal number of Gaussians per class, i.e., the number of subclasses. In this paper, two criteria able to find the most convenient division of each class into a set of subclasses are derived. Extensive experimental results are shown using five databases. Comparisons are given against Linear Discriminant Analysis (LDA), Direct LDA (DLDA), Heteroscedastic LDA (HLDA), Nonparametric DA (NDA), and Kernel-Based LDA (K-LDA). We show that our method is always the best or comparable to the best.
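The core idea, approximating each class density by a mixture of Gaussians and letting the data determine the number of subclasses, can be sketched with standard tooling. The following minimal sketch assumes scikit-learn and uses BIC as a stand-in for the paper's own subclass-selection criteria, which are not reproduced here.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_class_mixture(X, max_subclasses=5):
    # Choose the number of Gaussian subclasses by BIC (a stand-in criterion).
    models = [GaussianMixture(k, random_state=0).fit(X)
              for k in range(1, max_subclasses + 1)]
    return min(models, key=lambda m: m.bic(X))

rng = np.random.default_rng(0)
# Class 0 is bimodal (two subclasses); class 1 is unimodal.
X0 = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
X1 = rng.normal(0, 1, (200, 2))

gmm0, gmm1 = fit_class_mixture(X0), fit_class_mixture(X1)
print(gmm0.n_components, gmm1.n_components)   # expect roughly 2 and 1

# Classify a new point by the larger class log-likelihood (equal priors).
# The subclass model assigns it to class 0 via class 0's second mode, even
# though the point is far from class 0's overall mean.
x = np.array([[2.5, 2.5]])
print(int(gmm1.score_samples(x)[0] > gmm0.score_samples(x)[0]))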
Article
This study assesses the relative performance characteristics of five established classification techniques on data collected using the P300 Speller paradigm, originally described by Farwell and Donchin (1988 Electroenceph. Clin. Neurophysiol. 70 510). Four linear methods, namely Pearson's correlation method (PCM), Fisher's linear discriminant (FLD), stepwise linear discriminant analysis (SWLDA) and a linear support vector machine (LSVM), and one nonlinear method, a Gaussian kernel support vector machine (GSVM), are compared for classifying offline data from eight users. The relative performance of the classifiers is evaluated, along with practical concerns regarding the implementation of the respective methods. The results indicate that while all methods attained acceptable performance levels, SWLDA and FLD provide the best overall performance and implementation characteristics for practical classification of P300 Speller data.
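SWLDA pairs a linear discriminant with stepwise feature selection. A rough modern approximation combines forward feature selection with LDA; note this is a simplified stand-in, as proper SWLDA adds and removes predictors based on regression p-values. A minimal sketch assuming scikit-learn and synthetic ERP features:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))       # e.g., flattened time points x channels
y = rng.integers(0, 2, 200)
X[y == 1, :8] += 0.4                 # target ERPs differ in a few features

# Forward selection keeps a small, interpretable feature subset for LDA,
# loosely mirroring SWLDA's stepwise inclusion of predictors.
swlda_like = make_pipeline(
    SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                              n_features_to_select=10, direction='forward'),
    LinearDiscriminantAnalysis(),
)
print(cross_val_score(swlda_like, X, y, cv=3).mean())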
Article
At the Second International Meeting on Brain-Computer Interfaces (BCIs), held in June 2002 in Rensselaerville, NY, a formal debate was held on the pros and cons of linear and nonlinear methods in BCI research. Specific examples of applying linear and nonlinear methods to EEG data sets are given, and the various pros and cons of each approach are summarized. Overall, it was agreed that simplicity is generally best and, therefore, that the use of linear methods is recommended wherever possible. It was also agreed that nonlinear methods can provide better results in some applications, particularly with complex and/or very large data sets.
Article
Many image enhancement and thresholding techniques make use of spatial neighbourhood information to boost belief in extended areas of signal. The most common such approach in neuroimaging is cluster-based thresholding, which is often more sensitive than voxel-wise thresholding. However, a limitation is the need to define the initial cluster-forming threshold. This threshold is arbitrary, and yet its exact choice can have a large impact on the results, particularly at the lower (e.g., t, z < 4) cluster-forming thresholds frequently used. Furthermore, the amount of spatial pre-smoothing is also arbitrary (given that the expected signal extent is very rarely known in advance of the analysis). In light of such problems, we propose a new method which attempts to keep the sensitivity benefits of cluster-based thresholding (and indeed the general concept of "clusters" of signal) while avoiding, or at least minimising, these problems. The method takes a raw statistic image and produces an output image in which the voxel-wise values represent the amount of cluster-like local spatial support. The method is thus referred to as "threshold-free cluster enhancement" (TFCE). We present the TFCE approach and discuss in detail ROC-based optimisation and comparisons with cluster-based and voxel-based thresholding. We find that TFCE gives generally better sensitivity than other methods over a wide range of test signal shapes and SNR values. We also show an example on a real imaging dataset, suggesting that TFCE provides not just improved sensitivity but richer and more interpretable output than cluster-based thresholding.
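The TFCE transform has a compact discrete form: each voxel's output value sums, over all thresholds h below that voxel's statistic value, the supporting cluster extent e(h) raised to a power E times the height h raised to a power H (with defaults E = 0.5, H = 2). The following is a minimal 1-D sketch assuming NumPy and SciPy; production implementations operate on 3-D volumes with proper neighbourhood connectivity.

import numpy as np
from scipy import ndimage

def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    # Discrete approximation of the TFCE integral for a 1-D statistic map.
    out = np.zeros_like(stat, dtype=float)
    for h in np.arange(dh, stat.max() + dh, dh):
        labels, n = ndimage.label(stat >= h)   # clusters surviving threshold h
        for i in range(1, n + 1):
            cluster = labels == i
            # Every voxel in the cluster gains extent^E * height^H * dh.
            out[cluster] += cluster.sum() ** E * h ** H * dh
    return out

# A broad low bump and a narrow high peak illustrate the trade-off between
# spatial support and height that TFCE integrates over.
x = np.linspace(-6, 6, 200)
stat = 2.0 * np.exp(-x ** 2 / 8) + 3.5 * np.exp(-(x - 4) ** 2 / 0.1)
print(np.round(tfce_1d(stat).max(), 2))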
Zhu, M., Martinez, A. M., 2006. Subclass discriminant analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (8), 1274-1286.
Bartz, D., Höhne, J., Müller, K.-R., 2014. Multi-target shrinkage. Submitted; preprint available on arXiv.