Iowa State University
From the SelectedWorks of Michael C. Dorneich
*&&%') $" #%$ ) +))
"(( /) %$%$% "$ + *"
 "%'$ 
)&$ )"%,
)'  -'+'(
$ .'%#*( Oregon Health & Science University)"
+ "") 0&(,%'!(&'((%## "%'$ 
Supporting Real-Time Cognitive State
Classification on a Mobile Individual
Michael C. Dorneich
Stephen D. Whitlow
Santosh Mathan
Patricia May Ververs
Honeywell Laboratories
Deniz Erdogmus
Andre Adami
Misha Pavel
Tian Lan
Oregon Health & Science University
ABSTRACT: The effectiveness of neurophysiologically triggered adaptive systems hinges
on reliable and effective signal processing and cognitive state classification. Although
this presents a difficult technical challenge in any context, these concerns are particularly
pronounced in a system designed for mobile contexts. This paper describes a
neurophysiologically derived cognitive state classification approach designed for ambulatory
task contexts. We highlight signal processing and classification components that render
the electroencephalogram (EEG)-based cognitive state estimation system robust to
noise. Field assessments show classification performance that exceeds 70% for all
participants in a context that many have regarded as intractable for cognitive state
classification using EEG.
current task environment, can either provide adaptive aiding, which makes a certain
component of a task simpler, or can provide adaptive task allocation, which shifts an
entire task from a larger multitask context to automation (Parasuraman, Mouloua,
& Hilburn, 1999). Adaptive systems must make timely decisions on how to use vary-
ing levels of (adaptive) automation to provide support in a joint human-automation
In order for an adaptive system to decide when to intervene, it must have some
model of the context of operations, be it a functional model of system performance or
possibly a model of the operator’s functional state. Currently, many adaptive systems
derive their inferences about the cognitive state of the operator from mental models,
performance on the task, or external factors related directly to the task environ-
ment (Wickens & Hollands, 2000). For example, Scott (1999) developed the Ground
Collision-Avoidance System (GCAS) for test on an F-16D. GCAS used the projected
time until an aircraft broke through a pilot-determined minimum altitude as an external
condition to infer that a pilot’s attention was incapacitated, at which point the system
would perform a “fly-up” evasive maneuver to avoid a ground collision. In that
case, the automation took over control of the aircraft from the pilot.
ADDRESS CORRESPONDENCE TO: Michael C. Dorneich, Honeywell Laboratories, Human-Centered
Systems, 3660 Technology Dr., Minneapolis, MN 55418, 612/951-7488,
Journal of Cognitive Engineering and Decision Making, Volume 1, Number 3, Fall 2007, pp. 240–270. DOI
10.1518/155534307X255618. © 2007 Human Factors and Ergonomics Society. All rights reserved.
Downloaded from edm.sagepub.com at IOWA STATE UNIV on April 7, 2015.
Neurophysiologically and physiologically triggered adaptive automation offers
many advantages over more traditional approaches to automation by basing estimates
of operator state directly on sensed data. These systems offer the promise of
leveraging the strengths of humans and machines, augmenting human performance
with automation specifically when assessed human cognitive capacity falls short of
the demands imposed by task environments. With more refined estimates of the oper-
ator’s cognitive state, measured in real time, adaptive automation also offers the oppor-
tunity to provide aid even before the operator knows he or she is getting into trouble.
Operational Problem
The aim of augmented cognition research is to use physiological and neurophys-
iological sensors to detect states in which cognitive resources may be inadequate
to cope with mission-relevant demands. The goal is to enhance human performance
when task-related demands surpass the human’s assessed current cognitive capacity,
which fluctuates subject to fatigue, stress, overload, or boredom. Efforts have focused
on ways to leverage cognitive state information to drive adaptive systems to manage
information flow when detected human cognitive resources may be inadequate
for the tasks.
The Honeywell team has focused on the dismounted soldier in the future mili-
tary. The research program described in this article was conducted in support of the
U.S. Army Future Force Warrior (FFW) Advanced Technology Development pro-
gram. The FFW program seeks to push information exchange requirements to the
lowest levels, with the goal of enhancing the capabilities of a squad so that it can cover
the battlefield in the same way that a platoon now does. A critical element of the FFW
program is a reliance on networked communications and high-density information
exchange. These capabilities are expected to increase situation awareness at every level
of the operational hierarchy. Introducing information technologies within the trans-
formation of the military will facilitate better individual and collaborative decision
making at every level. However, effective use of these information sources is con-
strained by the limitations of the human cognitive system.
This revolutionary concept of operations could dramatically increase the likelihood
of information overload, which could turn the postulated information superiority
into a profound liability. The potential data overload, coupled with the efficiency
of information flow required in executing Army doctrine, makes critical information
throughput overly reliant on a single point of contact, the individual warfighter.
To ensure that warfighters are supported appropriately, intelligent information
management is needed so that the system can support superior situation awareness
on the battlefield.
Adaptive information management systems have an important role in this context.
The efficacy of such a system is contingent on reliable and timely cognitive assessment.
An example instantiation of such a system is the Communications Scheduler as
described in Dorneich, Whitlow, Mathan, Carciofini, and Ververs (2005). The system
changes the information presentation (e.g., high-priority messages preceded by a prim-
ing alert, low-priority messages delivered via text messages) based on the message
priority and cognitive workload of the soldier during critical times. The system not
only reduces the overall number of transmissions at key moments but also improves
the likelihood of receipt of essential information with reduced bandwidth and power
usage. But such strong mitigations of an adaptive system can be effective only if they
are properly tuned to the current cognitive capabilities of the user, as well as thor-
oughly evaluated with the anticipated users of the system. An accurate real-time clas-
sification of the cognitive state of the soldier is an essential first step in this process.
Neurophysiologically driven prototypes for regulating information flow were
developed and tested by a team of researchers led by Honeywell (Dorneich, Whitlow,
Mathan, Ververs, Pavel, & Erdogmus, 2005; Dorneich, Whitlow, Ververs, Mathan, et
al., 2004) to evaluate the potential benefits to the ground soldier who will receive vol-
umes of information from a variety of sensors and sources. Information regarding a
soldier’s cognitive state was integrated with information systems to manage assets and
communications. Cognitive state classification was applied to and focused on those
Army roles that require significant cognitive processing, information integration, and
information management on the part of the recipient. Such roles include the battle-
field commander, robotics noncommissioned officer, platoon leader, and other roles
that support the Network Centric Information Environment.
The current FFW approach to cognitive state assessment relies on cardiac and
physical sensors to assess general cognitive state based on the level of sleep debt in
the last 24 hours and the phase of the circadian cycle (Institute of Medicine of the
National Academies, 2004). If a truly adaptive system that manages information flow
is to be implemented, a higher degree of fidelity in the cognitive state assessment and
temporal resolution is needed.
Cognitive State Classification Techniques
Neurophysiological- and physiological-based assessment of cognitive state has
been captured in several different ways, including but not limited to cardiac meas-
ures, electroencephalogram (EEG), and functional near-infrared (fNIR) imaging.
There is an extensive research history of using cardiac, or electrocardiogram (ECG),
measures to evaluate cognitive activity under a variety of task conditions. Measures
include heart rate variability in the time domain to assess mental load (Kalsbeek &
Ettema, 1963), tonic heart rate to evaluate the impact of continuous information pro-
cessing (Wildervanck, Mulder, & Michon, 1978), variability in the spectral domain
as an index of cognitive workload (Wilson & Eggemeier, 1991), and T-wave amplitude
during math interruption task performance (Heslegrave & Furedy, 1979). fNIR
spectroscopy uses wavelengths of light introduced at the scalp to measure
cognition-related hemodynamic changes and has been used to assess cognitive
state (Izzetoglu & Bunce, 2004).
Other physiological measures used to inform cognitive state assessment are gal-
vanic skin response (Verwey & Veltman, 1996), eyelid movement (Neumann, 2002;
Stern, Boyer, & Schroeder, 1994; Veltman & Gaillard, 1998; Yamada, 1998), pupil
response (Beatty, 1982; Partala & Surakka, 2003), and respiratory patterns (Backs
& Seljos, 1994; Boiten, 1998; Porges & Byrne, 1992; Veltman & Gaillard, 1998;
Wientjes, 1992).
As the gold standard for providing high-resolution spatial and temporal indices of
cortical electrical activity from scalp electrodes, EEG has been used in the context
of adaptive systems. For instance, researchers have used the engagement index, devel-
oped by NASA, in the context of mixed-initiative control of an automated system
(Pope, Bogart, & Bartolome, 1995). This method uses a ratio of power in common
frequency bands (beta / [alpha + theta]), whereby beta reflects a cognitively alert and
focused state, alpha a wakeful and relaxed state, and theta a daydream state. Thus,
higher engagement index values indicate increased levels of task engagement.
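The ratio described above can be written out directly. The following is a minimal sketch, assuming band powers have already been estimated for one analysis window; the function name and the numeric values are illustrative and are not taken from the cited studies.

```python
def engagement_index(beta: float, alpha: float, theta: float) -> float:
    """Return beta / (alpha + theta) for band powers in the same units."""
    denominator = alpha + theta
    if denominator <= 0:
        raise ValueError("alpha + theta power must be positive")
    return beta / denominator


# Illustrative band powers in arbitrary units (not measured data).
relaxed = engagement_index(beta=2.0, alpha=6.0, theta=4.0)
engaged = engagement_index(beta=6.0, alpha=3.0, theta=1.0)
```

With these made-up inputs, the window dominated by beta power yields the larger index, which is the direction of the relationship the text describes.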
The efficacy of the engagement index as the basis for adaptive task allocation has
been experimentally established. For instance, under manipulations of vigilance lev-
els (Mikulka, Hadley, Freeman, & Scerbo, 1999) and workload (Prinzel, Freeman,
Scerbo, Mikulka, & Pope, 2000), an adaptive system effectively detected states in
which human performance was likely to fail and took steps to allocate tasks in a man-
ner that would raise overall task performance. The results associated with the engage-
ment index highlighted the potential benefits of a neurophysiologically triggered
adaptive automation. There are several ways in which this promising work needs to
be extended in order to be effective in the dynamic, ambulatory contexts of the re-
search reported here.
1. Individual differences. As Scerbo and Gustafson (2001) point out, there were
unique individual EEG responses to task demands. Although the characterization of
the relationship between engagement and EEG activity, in terms of activity within
certain frequency bands and sites, was useful for synthesizing broadly observed trends,
a given individual’s responses may deviate substantially from assumptions derived
from averaged data. In response, some researchers have called for an approach that
was more sensitive to individual variability in EEG expression (Mathan et al., 2005).
2. Linear relationships. The engagement index was based on a linear relationship
between power estimates at specific frequency bands. However, there are potential-
ly informative nonlinear relationships across spectral features at various sites that
could help discriminate between various cognitive states. Research indicates that more
advanced pattern recognition techniques, such as multilayer neural networks, could
exploit relationships among features that do not conform to linearity assumptions
(Scerbo et al., 2001; Wilson & Russell, 2003).
3. Analysis windows. The engagement index was designed to estimate cognitive
state over an analysis window that was close to a minute in duration. Developers of
the engagement index made no claims about its efficacy at temporal resolutions of a
few seconds or hundreds of milliseconds. In the authors’ own laboratory experience,
the engagement index was able to discriminate between periods of high-intensity vir-
tual combat and periods of rest in a first-person video game over the course of analysis
windows that spanned minutes, but not at a resolution of less than 10 s (Dorneich,
Whitlow, Ververs, Mathan, et al., 2004). The demands of the task environment may
require techniques that provide reliable cognitive state estimates with a fairly high de-
gree of temporal resolution.
4. Validation context. Much of the literature associated with cognitive state estima-
tion relies on findings from data collected in relatively stationary laboratory settings
(Schmorrow & Kruse, 2002). Data collection in laboratory environments has sev-
eral attributes that cannot be realized in mobile contexts. For example, the exper-
imental setup can be controlled in order to facilitate better performance, various
precautions to improve signal quality can be implemented, and large-scale data col-
lection, analysis, and signal-processing hardware and software can be used. These
constraints have to be relaxed in mobile environments. In mobile applications, EEG
signals can be very noisy and contaminated by a wide range of artifacts. Furthermore,
the system must be portable and able to work in real time.
The work reported here addressed some of the shortcomings that were previous-
ly highlighted by creating a system that was optimized to the unique EEG spectral
characteristics of each individual in response to specific task demands. Pattern recog-
nition techniques that make no restrictive assumptions about the form of the data
being modeled were used. The system provided cognitive state estimates at a high
degree of temporal resolution and was designed to work in real time in mobile con-
texts. Three aspects of the approach are highlighted in the pages that follow: hard-
ware integration into a wireless wearable form factor, real-time signal processing to
detect and correct for artifacts, and a nonlinear classification approach.
The remainder of this paper is organized as follows. The next section will discuss
the technical challenges in creating and evaluating robust mobile EEG classification.
Preliminary laboratory experiments that formed the foundations of the work dis-
cussed in this paper will be briefly reviewed. Finally, the mobile field evaluation and
results will be discussed in detail, concluding with a discussion of future directions.
Technical Challenges
Realizing the vision of an augmented cognition system in the context of an am-
bulatory soldier has been constrained by several challenges. First, as Schmorrow and
Kruse (2002) noted, processing and analysis of neurophysiological data have been
largely conducted offline by researchers and practitioners. However, in order for aug-
mented cognition technologies to work in practical settings, effective and compu-
tationally efficient artifact reduction and signal-processing solutions are necessary.
Second, inferring the cognitive state of users demands pattern recognition solutions
that are robust both to noise and to the inherent nonstationarity in neurophysiolog-
ical signals (Popivanov & Mineva, 1999). Third, understanding the fluctuations of the
cognitive state in applied environments requires the development of means to collect
reliable neurophysiological data outside the laboratory. Fourth, experiments must be
designed, often under conflicting constraints (e.g., operational realistic tasks vs. well-
understood, controlled laboratory tasks), to effectively evaluate classification accura-
cy. Finally, compact and robust form factors (e.g., size, weight, ruggedness) associat-
ed with neurophysiological sensors and processors are a matter of critical concern.
Real-Time Signal-Processing Challenges
Conducting military maneuvers in operational environments such as urban
terrain often does not allow an individual to remain stationary and can demand si-
multaneous cognitive and physical activity. Consequently, difficulties related to the
processing of EEG signals in real-world settings include factors associated with both
participant motion and the operational environment itself. Thus, utilization of re-
search methods involving EEG in operational environments necessitates the use of
real-time algorithms for signal detection and removal of artifacts. Although real-time
signal processing and classification of the EEG has been implemented previously
(Berka et al., 2004; Gevins & Smith, 2003), it has not been realized in a truly mobile,
ambulatory environment.
Inferring cognitive state from noninvasive neurophysiological sensors is a chal-
lenging task even in pristine laboratory environments. High-amplitude artifacts, rang-
ing from eye blinks to muscle artifacts and electrical line noise, can easily mask the
lower-amplitude electrical signals associated with cognitive functions. These concerns
are particularly pronounced in the context of ongoing efforts to realize neurophys-
iologically driven adaptive automation for the dismounted ambulatory soldier.
In addition to the typical sources of signal contamination, mobile applications
must consider the effects of artifacts induced by shock, cable movement, and gross
muscle movement. Specifically, artifacts related to participant motion include high-
frequency muscle activity, verbal communication, and ocular artifacts consisting of
eye movements and blinks; whereas artifacts related to the operational environment
include instrumental artifacts such as electrical noise that creates interference with
the EEG signal (cf. Kramer, 1991).
Classification Challenges
The use of EEG as the basis for cognitive state assessment was motivated by char-
acteristics such as good temporal resolution, low invasiveness, low cost, and porta-
bility. Although EEG offers several benefits, there are shortcomings related to the noise
artifacts described previously and the nonstationarity of the neural signal pattern
over time. Despite these challenges, research has shown that EEG activity can be used
to assess a variety of cognitive states that affect complex task performance. These in-
clude working memory (Gevins & Smith, 2000), alertness (Makeig & Jung, 1995),
executive control (Garavan, Ross, Li, & Stein, 2000), and visual information process-
ing (Thorpe, Fize, & Marlot, 1996). These findings point to the potential for using
EEG measurements as the basis for driving adaptive systems that demonstrate a high
degree of sensitivity and adaptability to human operators in complex task environments.
Scenario Design Challenges
In addition to the practical and system configuration challenges faced when mov-
ing from the laboratory to field studies, there are issues of experimental control and
the characterization of cognitive state in less constrained environments. It is essen-
tial to select tasks that are both operationally relevant and afford reasonable adap-
tations that improve performance. In the laboratory, it is possible to develop simple
tasks in which workload is manipulated precisely and consistently. Additionally, a user’s
performance can be collected and evaluated accurately. This makes it relatively easy
to establish ground truth about a user’s likely workload.
However, when developing operationally relevant tasks in a field environment,
it becomes substantially more difficult to manipulate workload precisely and to inter-
pret and assess a user’s performance without compromising operational realism. The
mobile field evaluation reported herein had two objectives: first, to determine wheth-
er an operationally relevant task load manipulation had a measurable impact on a
user’s workload; and second, to establish whether a sensor-based classification ap-
proach could effectively classify a user’s workload in a mobile setting.
System Description
This section describes the mobile classification hardware and software approach-
es. Subsequent sections will describe how this system was evaluated in a mobile
The wireless sensor suite employed by Honeywell was assembled using a variety
of off-the-shelf hardware components tied together with a custom agent-based infor-
mation architecture based on the work of the Institute for Human and Machine
Cognition (IHMC; see Dorneich, Whitlow, Ververs, Mathan et al., 2004, for more
information). EEG data were collected with a 32-channel BioSemi Active Two system
as well as a more deployable six-channel EEG sensor headset made by Advanced
Brain Monitoring (ABM). The BioSemi Active Two system integrates an amplifier
with an Ag-AgCl electrode, which affords extremely low noise measurements with-
out skin preparation. The ABM system includes two differential channels (FzPOz and
CzPOz) and four referential channels (Fz, Cz, POz, and linked mastoids acting as
a reference site).
Information from either of the EEG systems was processed on a body-worn laptop
that ran the IHMC information architecture. The BioSemi and ABM systems interfaced
with the laptop via a USB 2.0 port and Bluetooth® serial port, respectively. The
sensor electronics and the laptop were mounted in a backpack worn by the partici-
pant (see Figure 1). Sensor data were collected and processed on the laptop comput-
er during the experiment.
Signal Processing
For the BioSemi Active Two EEG system, vertical and horizontal eye movements
and blinks were recorded with electrodes below and lateral to the left eye. All channels
were referenced to the right mastoid. EEG was sampled at 256 Hz from 7 channels (Cz,
P3, P4, Pz, O2, PO4, F7), which were selected based on a saliency analysis on EEG
that was collected from various participants performing cognitive test battery tasks
(Russell & Gustafson, 2001). EEG signals were preprocessed to remove eye blinks
using an adaptive linear filter that is based on the Widrow-Hoff training rule (Widrow
& Hoff, 1960). Information from the VEOGLB (electrode that measures vertical eye
activity) ocular reference channel was used as the noise reference source for the adap-
tive ocular filter. DC drifts were removed using high pass filters (0.5 Hz cutoff). A band
pass filter (between 2 Hz and 50 Hz) was also employed, as this interval was gener-
ally associated with cognitive activity.
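The adaptive ocular filter can be illustrated with a least-mean-squares (LMS) noise canceller, which is the usual form of the Widrow-Hoff rule. This is a minimal sketch under stated assumptions, not the implementation used in the study: the tap count, step size `mu`, and the synthetic blink and EEG signals are all hypothetical.

```python
import math


def lms_ocular_filter(eeg, eog, n_taps=3, mu=0.0005):
    """Subtract the part of `eeg` that is linearly predictable from `eog`."""
    weights = [0.0] * n_taps
    cleaned = []
    for t in range(len(eeg)):
        # Most recent reference samples, zero-padded before the recording starts.
        ref = [eog[t - k] if t - k >= 0 else 0.0 for k in range(n_taps)]
        artifact_estimate = sum(w * r for w, r in zip(weights, ref))
        error = eeg[t] - artifact_estimate  # the error is the cleaned EEG sample
        # Widrow-Hoff update: step each weight along error * reference input.
        weights = [w + 2.0 * mu * error * r for w, r in zip(weights, ref)]
        cleaned.append(error)
    return cleaned, weights


# Synthetic demonstration (assumed signals): a 10 Hz rhythm plus one blink per second.
fs = 256
n = 8 * fs
brain = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]
blink = [5.0 * math.exp(-(((t % fs) - fs / 2) / 20.0) ** 2) for t in range(n)]
contaminated = [b + 0.5 * e for b, e in zip(brain, blink)]
cleaned, weights = lms_ocular_filter(contaminated, blink)
```

The step size has to be small relative to the power of the reference channel for the update to remain stable, so in practice it would need tuning to the recorded ocular amplitudes.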
The power spectral density (PSD) of the EEG signals was estimated using the
Welch method (Welch, 1967). The PSD process used 1-s sliding windows with 50%
overlap. PSD estimates were integrated over five frequency bands: 4–8 Hz (theta),
8–12 Hz (alpha), 12–16 Hz (low beta), 16–30 Hz (high beta), and 30–44 Hz (gamma).
The classifier received a PSD feature vector of the five bands as input every 100 ms. The
particular selection of the frequency bands was based on well-established interpre-
tations of EEG signals in prior cognitive and clinical contexts (e.g., Gevins, Smith,
McEvoy, & Yu, 1997).
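The band-power features can be sketched as follows. This is an illustrative standard-library version of Welch-style averaging, assuming a Hann window, half-open band edges, and a plain discrete Fourier transform; none of these details are specified in the text, and the one-sided scaling factor is omitted because it does not change comparisons between bands.

```python
import cmath
import math

BANDS = {"theta": (4, 8), "alpha": (8, 12), "low_beta": (12, 16),
         "high_beta": (16, 30), "gamma": (30, 44)}


def welch_psd(signal, fs=256, seg_len=256, overlap=0.5):
    """Average Hann-windowed periodograms over overlapping segments."""
    step = int(seg_len * (1 - overlap))
    starts = range(0, len(signal) - seg_len + 1, step)
    if len(starts) == 0:
        raise ValueError("signal is shorter than one segment")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / seg_len) for i in range(seg_len)]
    scale = fs * sum(w * w for w in window)
    n_bins = seg_len // 2 + 1
    # Precompute the DFT twiddle factors exp(-2*pi*i*m/seg_len).
    twiddle = [cmath.exp(-2j * math.pi * m / seg_len) for m in range(seg_len)]
    psd = [0.0] * n_bins
    for s in starts:
        seg = [signal[s + i] * window[i] for i in range(seg_len)]
        for k in range(n_bins):
            coeff = sum(x * twiddle[(k * i) % seg_len] for i, x in enumerate(seg))
            psd[k] += abs(coeff) ** 2 / scale
    freqs = [k * fs / seg_len for k in range(n_bins)]
    return freqs, [p / len(starts) for p in psd]


def band_powers(freqs, psd):
    """Integrate the PSD over each band, treating bands as [low, high)."""
    df = freqs[1] - freqs[0]
    return {name: sum(p for f, p in zip(freqs, psd) if lo <= f < hi) * df
            for name, (lo, hi) in BANDS.items()}


# Synthetic check: a pure 10 Hz tone sampled at 256 Hz for 3 s.
tone = [math.sin(2 * math.pi * 10 * t / 256) for t in range(3 * 256)]
freqs, psd = welch_psd(tone)
features = band_powers(freqs, psd)
```

For the synthetic 10 Hz tone, nearly all of the power falls in the alpha band. Computing one such five-element vector per channel and concatenating them would give the classifier input, although the exact feature layout used in the study is not stated.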
The ABM system supported an independent signal-processing stream. Six channels
were sampled at 256 samples/s with a bandpass from 0.5 Hz to 65 Hz (at 3 dB
attenuation) that was obtained digitally with Sigma-Delta A/D converters. Data were
transmitted across a Bluetooth RF (radio) link to the collection laptop via an RS232
interface. Quantification of the EEG in real time was achieved using signal analysis
techniques that identified and decontaminated eye blinks and identified and rejected
data points that were contaminated with electromyographic (EMG) signals, amplifier
saturation, and/or excursions attributable to movement artifacts (see Berka et al.,
2004, for a detailed description of the artifact decontamination procedures).
Figure 1. Body-worn sensor suite and signal-processing system.
Decontaminated EEG was then segmented into overlapping 256 data point win-
dows called overlays. An epoch (the temporal window of analysis) consisted of three
consecutive overlays. Fast-Fourier Transform (FFT) was applied to each overlay of
the decontaminated EEG signal multiplied by the Kaiser window (α = 6.0) to compute
the power spectral densities (PSD). The PSD values were adjusted to take into
account zero values inserted for artifact-contaminated data points. The PSD between
70 and 128 Hz was used to detect EMG artifact. Overlays with excessive EMG arti-
facts or with fewer than 128 data points were rejected.
The remaining overlays were then averaged to derive PSD for each epoch with a
50% overlapping window. Epochs with two or more overlays with EMG or missing
data were classified as invalid. For each channel, PSD values were derived for each
1-Hz bin from 3 Hz to 40 Hz and the total PSD from 3 to 40 Hz. Relative power vari-
ables were also computed for each channel and bin using the formula (total band
power/total bin power).
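The overlay and epoch rejection rules can be summarized in a short sketch. The data structure and field names below are hypothetical, and the EMG flag is assumed to come from an upstream detector operating on the 70 to 128 Hz power; only the two thresholds (128 valid points per overlay, two or more rejected overlays per epoch) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Overlay:
    psd: List[float]       # PSD per frequency bin (illustrative values)
    valid_points: int      # samples remaining after artifact zeroing, out of 256
    excessive_emg: bool    # set by an assumed upstream EMG detector


def overlay_ok(overlay: Overlay) -> bool:
    """An overlay is kept if it has no excessive EMG and at least 128 valid points."""
    return not overlay.excessive_emg and overlay.valid_points >= 128


def epoch_psd(overlays: List[Overlay]) -> Optional[List[float]]:
    """Average the accepted overlays of a three-overlay epoch, or return None if invalid."""
    if len(overlays) != 3:
        raise ValueError("an epoch consists of three consecutive overlays")
    good = [o for o in overlays if overlay_ok(o)]
    if len(overlays) - len(good) >= 2:
        return None  # two or more rejected overlays: the epoch is invalid
    n_bins = len(good[0].psd)
    return [sum(o.psd[i] for o in good) / len(good) for i in range(n_bins)]


# Illustrative overlays with two bins each.
clean_a = Overlay(psd=[1.0, 3.0], valid_points=256, excessive_emg=False)
clean_b = Overlay(psd=[3.0, 5.0], valid_points=200, excessive_emg=False)
sparse = Overlay(psd=[9.0, 9.0], valid_points=100, excessive_emg=False)
noisy = Overlay(psd=[9.0, 9.0], valid_points=256, excessive_emg=True)

valid_epoch = epoch_psd([clean_a, clean_b, sparse])
invalid_epoch = epoch_psd([clean_a, sparse, noisy])
```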
Real-Time Classification
Estimates of spectral power formed the input features to a pattern classification
system. The classification system used parametric and nonparametric techniques to
assess the likely cognitive state on the basis of spectral features; that is, to estimate
p(cognitive state | spectral features). The classification process relied on probability density
estimates derived from a set of spectral samples. These spectral samples were gath-
ered in conjunction with tasks that were as close as possible to the eventual task
The classification system (Figure 2) used a fusion of three distinct classification
approaches: K-nearest neighbor (KNN), Parzen windows, and Gaussian mixture
models (GMM).
Gaussian mixture models. Gaussian mixture models provided a way to model the
probability density functions of spectral features that were associated with each cog-
nitive state. This was accomplished using a superposition of Gaussian kernels. The
unknown probability density associated with each class or cognitive state was approx-
imated by a weighted linear combination of Gaussian density components. Given an
appropriate number of Gaussian components and appropriately chosen component
parameters (mean and covariance matrix associated with each component), a Gaussian
mixture model can model any probability density to an arbitrary degree of precision.
The parameters associated with component Gaussians were iteratively deter-
mined using the expectation maximization algorithm (Dempster, Laird, & Rubin,
1977). Once the Gaussian parameters were initialized, the system iterated through
a two-step procedure for each sample that was associated with each class. In the first
step (expectation step), the system computed the probability of a particular training
sample belonging to a particular class based on current model parameters (posterior
probability). In the maximization step, the model parameters were adjusted in the
direction of increasing the class membership likelihood.
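The two-step procedure can be sketched for a one-dimensional mixture fitted to the samples of a single class. This is a simplified illustration under assumptions: the features in the study are multidimensional with full covariance matrices, whereas the sketch uses scalar samples, and the initialization scheme, iteration count, and synthetic data are hypothetical. In this form the expectation step assigns each sample a posterior probability of belonging to each Gaussian component.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(samples, n_components=2, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with expectation maximization."""
    n = len(samples)
    lo, hi = min(samples), max(samples)
    mean_all = sum(samples) / n
    var_all = sum((x - mean_all) ** 2 for x in samples) / n
    weights = [1.0 / n_components] * n_components
    # Assumed initialization: means spread evenly across the data range.
    means = [lo + (hi - lo) * (k + 0.5) / n_components for k in range(n_components)]
    variances = [max(var_all, min_var)] * n_components
    for _ in range(n_iter):
        # Expectation step: posterior probability of each component per sample.
        resp = []
        for x in samples:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # Maximization step: re-estimate parameters from the weighted samples.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            variances[k] = max(spread, min_var)
    return weights, means, variances


# Synthetic samples drawn from two well-separated Gaussians.
random.seed(0)
samples = ([random.gauss(0.0, 1.0) for _ in range(200)]
           + [random.gauss(5.0, 1.0) for _ in range(200)])
weights, means, variances = fit_gmm_1d(samples)
```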
Once probability density functions associated with each cognitive state were gen-
erated, it became possible to classify individual spectral samples. Each spectral vec-
tor was attributed to a class that had the highest posterior probability of representing
it. Posterior probabilities were computed using Bayes’ rule. For example, Figure 3
shows the probability density functions associated with three distinct classes (i.e.,
cognitive states). These probability densities are estimated using three Gaussians. Very
high values of the data point x are most likely to have come from Class 3, whereas
very low values of x are most likely to have come from Class 1.
Figure 2. Classification system.
Figure 3. Gaussian mixture models. Small numbers of Gaussian kernels (dotted lines)
are used to approximate the distribution of features in each class.
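Classification by highest posterior probability can be sketched with Bayes' rule over class-conditional mixtures. The mixture parameters and the equal priors below are hypothetical values chosen only to mimic three classes ordered along a single feature axis; they are not the fitted models from the study.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_pdf(x, components):
    """Class-conditional density p(x | class) as a weighted sum of Gaussians."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


# Hypothetical (weight, mean, variance) components for each class.
CLASS_MODELS = {
    "class_1": [(0.6, -2.0, 1.0), (0.4, -0.5, 0.5)],
    "class_2": [(0.5, 1.0, 0.8), (0.5, 2.5, 0.8)],
    "class_3": [(0.7, 5.0, 1.0), (0.3, 6.5, 1.5)],
}
PRIORS = {name: 1.0 / len(CLASS_MODELS) for name in CLASS_MODELS}  # assumed equal


def posteriors(x):
    """Bayes' rule: p(class | x) = p(x | class) p(class) / p(x)."""
    joint = {c: PRIORS[c] * mixture_pdf(x, comps) for c, comps in CLASS_MODELS.items()}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


def classify(x):
    post = posteriors(x)
    return max(post, key=post.get)


high_label = classify(8.0)
low_label = classify(-4.0)
```

With these illustrative parameters, a very high value of x is attributed to the third class and a very low value to the first, matching the behavior described for Figure 3.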
K-nearest neighbor. The K-nearest neighbor approach is a nonparametric technique
that makes no assumption about the form of the probability densities underlying a
particular set of data. Given a particular sample x, the classification process identifies
the k samples whose features come closest (as assessed by Euclidean or Mahalanobis
distance metrics) to the features represented in x. The sample x is assigned the modal
class of the nearest k neighbors. For example, consider the data point represented by
the question mark in Figure 4. Based on k = 5, it would be assigned the label associated
with the most common class category of its five nearest neighbors.
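A minimal sketch of this rule, assuming Euclidean distance and a small set of two-dimensional training points, is shown here. The labels and coordinates are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(x, training, k=5):
    """Assign x the modal label of its k nearest training samples."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Illustrative training set: (feature vector, label) pairs.
training = [
    ((0.0, 0.0), "low"), ((0.0, 1.0), "low"), ((1.0, 0.0), "low"), ((1.0, 1.0), "low"),
    ((5.0, 5.0), "high"), ((5.0, 6.0), "high"), ((6.0, 5.0), "high"), ((6.0, 6.0), "high"),
]

near_low = knn_classify((0.5, 0.5), training)
near_high = knn_classify((5.5, 5.5), training)
```

Each query is compared with every stored sample, which is why this approach needs the whole training set in memory at run time.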
Parzen windows. Parzen windows (Parzen, 1967) are a generalization of the k-nearest
neighbor technique. Instead of choosing the nearest neighbors and assigning
a sample x the label associated with the modal class of its neighbors, each vote
is weighted using a kernel function. With Gaussian kernels, the weight decreases
exponentially with the square of the distance. As a consequence, far-away points become
insignificant. Kernel volumes constrain the region within which neighbors are
considered. Consequently, Parzen windows are a better choice when there are large
differences in the variability associated with each class. The data point shown in Fig-
ure 5 is assigned to the dominant class in its immediate vicinity.
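The kernel-weighted vote can be sketched as follows, assuming an isotropic Gaussian kernel with a hand-picked bandwidth. Both the bandwidth and the example points are hypothetical.

```python
import math
from collections import defaultdict


def parzen_classify(x, training, bandwidth=1.0):
    """Sum Gaussian-kernel votes per class and return the class with the largest total."""
    scores = defaultdict(float)
    for features, label in training:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        # The vote decays exponentially with the squared distance to x.
        scores[label] += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return max(scores, key=scores.get)


# Two close samples of class "a" and five distant samples of class "b".
training = [
    ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((3.0, 0.0), "b"), ((0.0, 3.0), "b"), ((-3.0, 0.0), "b"),
    ((0.0, -3.0), "b"), ((3.0, 3.0), "b"),
]

label = parzen_classify((0.0, 0.0), training)
```

In this example the two nearby samples outweigh the five distant ones, whereas an unweighted vote over the five nearest neighbors would favor class "b".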
Composite classifier. These statistical classification techniques were chosen over
multilayer neural networks because they required minimal training time. KNN and
Parzen windows required no training, whereas the expectation-maximization algorithm
used to generate GMMs converged relatively quickly. KNN and Parzen window
approaches required all training patterns to be held in memory. Every new feature
Figure 4. K-nearest neighbor. A given feature vector is assigned the class label associated
with the modal class of the n samples that are the most similar to it.
vector had to be compared with each of these patterns. However, despite the com-
putational cost of these comparisons at run time, the system was able to output clas-
sification decisions well within real-time constraints.
The composite classification system regarded the output from each classifier as a
vote for the likely cognitive state. The majority vote of the three component classifiers
formed the output of the composite classifier. Fusing the outputs of multiple clas-
sifiers using a voting scheme is a widely used strategy to increase the robustness of
the classification system. The equal weighting of different classifiers implicit in the vot-
ing scheme reflected the fact that no single classifier produced consistently superior
results across subjects and tasks in pilot experiments. Although simple vote-based
fusion improves the overall performance of classification systems (Kittler, Hatef,
Duin, & Matas, 1998), there are a variety of alternative options for combining diverse
classifiers. Exploring these options will be an objective of future research.
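The equal-weight voting scheme can be illustrated with a short sketch. The function and argument names are hypothetical. With two workload classes and three component classifiers a strict majority always exists; with more classes a tie is possible, and this sketch simply returns the tied label that was counted first.

```python
from collections import Counter

def composite_vote(gmm_label, knn_label, parzen_label):
    """Equal-weight majority vote over the three component classifier outputs."""
    votes = Counter([gmm_label, knn_label, parzen_label])
    label, _count = votes.most_common(1)[0]
    return label

print(composite_vote("high", "low", "high"))  # high
```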
A classification decision was output at a rate of 10 Hz. Outputs from the com-
posite classifier were passed through a modal filter before an assessment of cognitive
state was output by the classification system. Modal filtering served to make the cog-
nitive state assessment process more robust to undesirable fluctuations in the under-
lying EEG signal. Modal filtering was done over a sliding two-second window with
the assumption that cognitive state remains stable over that period of time.
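A modal filter of this kind can be sketched as a sliding window over the most recent classifier outputs. At 10 Hz, a two-second window holds 20 decisions. The class name and the tie handling (the tied label encountered first in the window is returned) are assumptions made for illustration.

```python
from collections import Counter, deque

class ModalFilter:
    """Report the most frequent label among the most recent classifier outputs."""

    def __init__(self, rate_hz=10, window_s=2.0):
        # 10 Hz x 2 s = 20 decisions held in the sliding window.
        self.window = deque(maxlen=int(rate_hz * window_s))

    def update(self, label):
        self.window.append(label)
        return Counter(self.window).most_common(1)[0][0]

# A brief burst of "high" outputs does not change the filtered state.
f = ModalFilter()
stream = ["low"] * 20 + ["high"] * 3 + ["low"] * 5
filtered = [f.update(label) for label in stream]
print(set(filtered))  # {'low'}
```

The cost of this smoothing is latency: a genuine change of state is reported only after the new label has become the most frequent one in the window.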
Laboratory Evaluation
This section briefly discusses one classification validation experiment conducted
in a laboratory setting before moving on to the focus of this paper – mobile field eval-
uation. The laboratory evaluation described here is representative of the multiple
Figure 5. Parzen windows. Gaussian kernels placed over each data point are used to
estimate the distribution of features in each class.
preliminary experiments conducted to validate the approach described in the pre-
vious section. For a more detailed discussion of the previous work that provided the
foundation of the mobile classification field evaluation, see Dorneich, Whitlow, Ververs,
Carciofini, and Creaser (2004); Erdogmus, Adami, Pavel, Lan, Mathan, Whitlow, et al.
(2005); Lan, Erdogmus, Adami, Pavel, and Mathan (2005); and Mathan et al. (2005).
The objective of this experiment was to validate the classification approach using
a well-understood laboratory task, the n-back task, that has been used to manipulate
working memory demands. In addition, two different EEG detection systems were
Data were collected from five participants. All were male researchers at Honeywell.
EEG data were collected with a 32-channel BioSemi Active Two system as well
as a more deployable six-channel ABM EEG sensor (see System Description section
for details). Three participants wore the BioSemi system and two participants wore
the ABM system.
The working memory assessment was conducted using the n-back task. The n-
back task required participants to process a sequence of letters presented on a com-
puter screen. With every presentation of a letter, a participant had to both encode the
letter in memory and indicate whether the letter corresponded to a letter shown n
presentations ago. Working memory load encountered by a participant was manip-
ulated by manipulating the value of n.
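The matching rule of the n-back task can be made concrete with a small sketch. The letter sequence and function name are hypothetical; the sketch only marks which presentations are targets and does not model timing or participant responses.

```python
def nback_targets(letters, n):
    """Mark each position whose letter matches the one shown n presentations earlier."""
    return [i >= n and letters[i] == letters[i - n] for i in range(len(letters))]

sequence = "ABABCB"
print(nback_targets(sequence, 2))  # [False, False, True, True, False, True]
print(nback_targets(sequence, 1))  # [False, False, False, False, False, False]
```

Increasing n lengthens the run of letters that must be held and updated in memory, which is how the task manipulates working memory load.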
Participants were seated and performed the task twice under 1-back and 2-back
conditions. Data associated with the first performance under the two conditions were
used to train the classifiers. The classifier was tested with data from the second
performance under each working memory condition. The features used for classification
consisted of estimates of spectral power in the theta, alpha, beta, and gamma frequency
bands at each EEG site.
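A band-power feature of this kind can be sketched for a single channel as follows. The band edges, the 128 Hz sampling rate, and the one-second window are conventional values assumed for illustration; the article does not state them here. A naive discrete Fourier transform is used only to keep the sketch free of external libraries.

```python
import cmath
import math

# Assumed band edges in Hz (lower bound inclusive, upper bound exclusive).
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs):
    """Sum the squared DFT magnitudes of one channel's window within each band."""
    n = len(signal)
    powers = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2):
        freq = k * fs / n
        # Naive DFT coefficient for frequency bin k.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(signal))
        for name, (low, high) in BANDS.items():
            if low <= freq < high:
                powers[name] += abs(coeff) ** 2 / n
    return powers

# Synthetic one-second window containing a pure 10 Hz oscillation.
fs = 128
window = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
features = band_powers(window, fs)
print(max(features, key=features.get))  # alpha
```

Repeating this for every EEG site yields one feature per site and band, which is the form of feature vector the classifiers consume.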
Data Analysis and Results
The accuracy metric used in our evaluations was derived from a confusion matrix.
The confusion matrix is a square matrix that allows comparison of the accuracy of a
classifier by comparing the predicted class membership against actual membership
(see Figure 6). Typically, rows represent the actual class, whereas columns represent
the predicted class. Counts in each cell provide an indication of how well the classi-
fier performed in classifying each sample in the data set. The counts in each cell are
normalized by the total count of the samples in each class to produce the proportion
of samples correctly and incorrectly classified. The accuracy metric used here is the
average of the values on the diagonal; that is, the average proportion of samples
from each class that were correctly classified. See Figure 6 for an example.
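The metric can be written out in a few lines. The counts below are hypothetical, chosen only so that the per-class proportions match the 0.75 and 0.91 quoted in the Figure 6 caption; the function name is likewise an assumption.

```python
def class_averaged_accuracy(confusion):
    """confusion[i][j] is the number of class-i samples predicted as class j."""
    # Proportion correct within each true class (the normalized diagonal).
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    # Average the per-class proportions so each class counts equally.
    return sum(per_class) / len(per_class)

# Hypothetical counts: rows are actual classes, columns are predicted classes.
counts = [[75, 25],
          [9, 91]]
print(round(class_averaged_accuracy(counts), 2))  # 0.83
```

Averaging per-class proportions, rather than pooling all samples, keeps the score from being dominated by whichever class contributes more samples.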
Data used for training and testing the classification system were drawn from ex-
perimental sessions that were separated by gaps spanning several minutes. The tasks
used for the training and testing sessions were identical in nature. The metric used
to assess the efficacy of the classification system was the proportion of testing data
correctly classified by the classifier as represented by the confusion matrix-derived
accuracy metric. The average of the true positive and true negative classification rates
of the system reflected both the sensitivity and specificity of the classifier.
The trained classifier assigned each data sample to the 1-back or 2-back catego-
ry. Results based on chance alone would yield a classification accuracy of 50%. The
system was able to classify testing data with an average accuracy of 83% (3 participants,
s.d. 10%) with data from the BioSemi system and 75% (2 participants, s.d. 12%)
with the ABM system (Lan et al., 2005). The difference in performance associated with
the two systems might lie in the difference in the number of sensors provided by
each system. The challenge for the Honeywell team was to test whether the classifi-
cation method could assist in a mobile, more realistic, environment.
Mobile Field Evaluation: Method
The objectives of the mobile field evaluation were to test the effectiveness of the
cognitive state classification approaches and assess the impact of mobility on classi-
fication performance. The tasks were designed to approximate operationally relevant
dismounted soldier tasks while still affording some experimental control. The tasks
used in the evaluation required the participant to be mobile in all scenarios. The sen-
sors and output of the artifact removal algorithms were required to provide the clas-
sifiers with good signals to discriminate between the low and high workload during
completion of the scenarios.
Figure 6. Confusion matrix. The left table counts the number of samples correctly and
incorrectly classified. The right table represents the sample proportions (number of sam-
ples divided by total population of the true class), and derives an accuracy score based
on the average of the accuracy value for each class (0.75 and 0.91, respectively).
It was hypothesized that the scenario design would reliably put participants into
high or low states of workload. This hypothesis was tested as part of the evaluation.
If the hypothesis held, the classification algorithms were expected to achieve
better-than-chance agreement between the moment-to-moment cognitive state
classification output and the known levels of task load. However, it was anticipated
that signal degradation and loss attributable to significant artifacts might preclude
the levels of classification performance seen during laboratory studies.
Eight participants completed the evaluation. All were males between the ages of
21 and 42 (mean = 29.5, SD = 7.8), with between 16 and 21 years of education
(mean = 18.6, SD = 1.8). None had military experience. All had normal or corrected
20/20 vision and normal hearing.
Efforts focused on deployment of a cognitive state sensor system in a mobile ex-
perimental test environment. The primary challenge was fielding an integrated sens-
ing, computational, and interactive system within a mobile hardware ensemble. The
prototype ensemble was organized around the U.S. Army MOLLE (Modular Lightweight
Load-Carrying Equipment) backpack that provided the framework on which
to integrate multiple sensors, interface devices, network adapters, and data collection
Transitioning from a laboratory environment with computer simulations to a
field exercise required network communications to support experimental require-
ments such as scripting and stimuli presentation. During the field experiment, a
remote computer ran scripts that played prerecorded radio broadcasts to simulate
communication traffic to a dismounted infantry leader. Initially, all sensed data were
transmitted wirelessly to a remote desktop computer, which calculated the cognitive
workload state of the participant and triggered the adaptive automation. The remote
computer also logged data for post hoc analysis.
However, network connectivity and reliability across the experimental test field
posed a considerable challenge and motivated the migration of all data logging and
reasoning to be done on the backpack laptop carried by the participant. After stream-
lining the EEG signal conditioning algorithms, migrating all hardware interfaces to
the backpack laptop, and integrating and testing other external hardware modules,
we performed an early system integration test. Subsequently, all software components
for signal processing, adaptive automation reasoning, and data logging were migrat-
ed to the backpack computer.
The design of the scenarios to empirically assess classification accuracy was sub-
ject to a multitude of sometimes contrary constraints, as noted previously. Tasks
were chosen to be “classifiable,” meaning the tasks within the scenario reliably put
participants in the cognitive workload state of interest. The Honeywell team worked
with the U.S. Army Natick Soldier Center to develop an operational scenario that close-
ly aligned with operational doctrine, training, and execution of military missions.
Each participant played the role of a platoon leader navigating along a known and
secure route to an objective while communicating over the radio. Each of the partic-
ipants completed four experimental trials, each with periods of low and high task
loads. The navigation task increased the overall task complexity as well as testing the
performance of the neurophysiological and physiological sensors and cognitive state
classifiers while the participant was mobile. In addition to navigation, participants
performed the following tasks:
• Maintain Radio Counts. The participant kept a running total of civilians, enemies,
  and friendlies reported to him over the radio by the company commander while
  ignoring the counts reported to two other platoon leaders. The participant was
  periodically prompted to report his counts.
• Mission Monitoring. The participant monitored three virtual squads moving
  in bounded overwatch (one squad moves while the other two squads provide
  protection). When all three squads reported that they were in position, the
  participant ordered the appropriate squad to move forward. The order of
  the squads reporting, as well as the squad to move forward, was randomized.
• Interruption Task. A series of math problems was periodically (one problem/min)
  presented to the participants as an interruption task during the scenario.
  This task was representative of any type of unanticipated interruption that
  requires significant cognitive resources and an immediate response from the
  platoon leader. Once the interruption task started, participants had 10 s to
  answer the problem correctly.
• Maintain Situation Awareness. In addition to the situation awareness they needed
  to perform on the other tasks listed here, participants were asked about the
  content of additional low-priority messages they received.
Stressors were used to make the scenarios more representative of the actual envi-
ronment in which soldiers operate. Stressors included time pressure to complete tasks
(for example, the countdown clock on the mathematical task) and the increased rate
of messages in the high task load elements of the scenario. Participants were encour-
aged to keep moving throughout the scenarios. The stress and anxiety brought on
by competition was explored by offering a monetary award for the highest score at
the end of the evaluation.
Independent variable. Task load was either high or low. Within each scenario there
were blocks of high and low task load conditions that lasted approximately 5 min
and 3 min, respectively. The primary difference between high and low task load
periods was the pace of radio communications. The composite rate of Maintain Counts
and Mission Monitoring messages was approximately 2.4 times faster in the high task
load period (8.7 messages/min) than the low task load period (3.6 messages/min).
Experimental design. This was a single-factor (task load block: high/low), within-participants
design. Each scenario had four task load blocks in a fixed order: high,
low, high, low.
Training trials. There were two components to the training that were conducted
before the participant performed the experimental trials. The first training session
was to ensure that all participants had basic familiarity and proficiency with all the
tasks they were to perform in the experiment. The second training session was to col-
lect data with which to train the cognitive state classifiers. After collecting between
5 and 10 min of EEG spectral data for both low and high task load training conditions,
we submitted the data to the composite classification system to identify patterns
that distinguish the workload conditions. This was done on the same day as the
experimental trials.
Experimental trials. Scenarios were run in a large grassy field surrounded by light
forest situated behind Honeywell in northeast Minneapolis, Minnesota. Participants
primarily interacted with a handheld radio and a personal digital assistant (PDA).
Input for the Mission Monitoring and Maintain Counts tasks came over the radio,
and they responded over the radio as well. The math interruption task, which was
completed on a PDA, occurred at equal frequencies under both task load condi-
tions. At the end of each block, participants were asked to fill out subjective work-
load surveys.
Data Analysis
The principal goal of the data analysis was twofold: (a) determine whether the
difference in task load invoked a concomitant difference in cognitive workload, and
(b) validate that the cognitive state classification algorithms can distinguish these
differences in task load.
Subjective workload ratings of mental demand, physical demand, temporal de-
mand, performance, effort, and frustration were taken via the NASA-TLX rating scale
(Hart & Staveland, 1988). NASA-TLX was given at the end of each experimental task
load block. Successful cognitive workload manipulation was assessed by comparing
the subjective workload ratings with the task load manipulation. In addition, objec-
tive performance measures on the tasks were compared across low and high task load
blocks as another indication of differentiated workload. Objective measures included
• Maintain Counts: Reported versus actual counts of civilians, enemies, and friendlies.
• Mission Monitoring: Errors in choice of which squad to send forward, and
  errors in the timing of the move command.
• Tertiary Mathematical Task: Response time to initiation alert, time to solve the
  problem, and response accuracy.
Classification accuracy was assessed by comparing the cognitive state classifi-
cation accuracy across the low and high task load periods within each block. The
classification system provided cognitive state assessments every 2 s, providing a
moment-to-moment assessment. As mentioned earlier, the accuracy metric used to
evaluate the classifier was derived from a confusion matrix.
Mobile Field Evaluation: Results
Subjective Results
Workload was manipulated by varying the task load (rate of incoming messages)
over a block of time. The NASA-TLX was administered to confirm that the partici-
pants experienced a change in perceived workload. The TLX scores were compared
in the high and low task load blocks (see Figure 7). An analysis of variance (ANOVA)
was performed on the measures to study within-participants contrasts. Differences
were considered significant for alpha < .05. During the high task load blocks, participants
recorded a significant increase in mental demand, F(1, 7) = 13.4, p < .01; temporal
demand, F(1, 7) = 23.5, p < .01; performance, F(1, 7) = 20.0, p < .01; effort,
F(1, 7) = 25.9, p < .01; and frustration, F(1, 7) = 15.0, p < .01, as compared with the
low task load blocks. The only measure that did not change significantly was physical
demand, F(1, 7) = .006, p > .10, which was expected, as the scenario design did not
vary the physical demands in the two task load conditions.
Performance Results
Figure 8 (page 258) illustrates the task-related ANOVA results (alpha < .05) in
the low and high task load blocks. Participants showed reduced accuracy on the Mis-
sion Monitoring task in the high task load periods (67.4%) as compared with the
low task load periods (95.8%). This difference was significant, F(1, 7) = 24.7, p < .01.
The difference in the Maintain Counts performance was not significant. On the math
Figure 7. Subjective assessment of workload in the high and low task load blocks; signif-
icant differences are denoted with an asterisk.
interruption task, participants responded faster in the low task load block (loss of data
left only n = 4, so the difference was not significant), and the solve time and accuracy
showed no difference.
The subjective ratings of workload, as well as the behavioral results from the
Mission Monitoring task during the low and high task load blocks, all lend confi-
dence to the hypothesis that the scenario design did indeed create two distinct lev-
els of cognitive workload among the participants. The ability of the real-time cognitive
state classification system to correctly characterize the task load blocks is the topic
of the next section.
Classification Results
A crucial component of classification in field settings was a systematic procedure
for selecting a subset of EEG features that was robust to potential artifacts and pro-
vided a basis to discriminate between workload classes. One way to do this was
through an exhaustive selection of every possible feature combination drawn from
the training data. Then the feature subset producing the best classification perform-
ance could be selected for classifying cognitive state in the field. However, such an
exhaustive search would result in 2^n searches, where n represents the number of
features. Instead, backward elimination (Langley, 1994) was used, a heuristic proce-
dure that searches the space of possible feature subsets to identify those that would
provide reliable classification. Feature selection was based on the training data that
were obtained prior to the testing data and under the same task conditions. With an
appropriate selection of channels, this approach made it possible to classify cognitive
state with an accuracy that exceeded 70% for all participants. The mean classification
accuracy was 74.4% with a standard deviation of 9.01%. A classification
accuracy as high as 95% was observed for one participant (see Figure 9). Data from
one participant (s6) were lost because of a system malfunction.
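A greedy form of backward elimination can be sketched as follows. This is a generic illustration rather than the procedure used in the study: the scoring function is a toy stand-in for classification accuracy on training data, and the feature names are hypothetical. The search starts from the full feature set and, at each step, removes the single feature whose removal gives the best score, remembering the best subset seen.

```python
def backward_elimination(features, score, min_features=1):
    """Greedily drop one feature at a time, keeping the best-scoring subset seen."""
    current = list(features)
    best_subset, best_score = list(current), score(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [[f for f in current if f != drop] for drop in current]
        scored = [(score(c), c) for c in candidates]
        top_score, top_subset = max(scored, key=lambda pair: pair[0])
        current = top_subset
        if top_score >= best_score:
            best_score, best_subset = top_score, list(top_subset)
    return best_subset, best_score

# Toy score (assumption): informative features help, noisy features hurt.
INFORMATIVE = {"site1_theta", "site2_alpha"}

def toy_score(subset):
    useful = len(INFORMATIVE & set(subset))
    noisy = len(set(subset) - INFORMATIVE)
    return 0.5 + 0.2 * useful - 0.05 * noisy

all_features = ["site1_theta", "site2_alpha", "site3_gamma", "site4_beta"]
selected, selected_score = backward_elimination(all_features, toy_score)
print(selected)  # ['site1_theta', 'site2_alpha']
```

With n features this search evaluates on the order of n^2 subsets instead of the 2^n required by exhaustive search, at the price of possibly missing the globally best subset.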
Figure 8. Task metrics across task load conditions; significant differences denoted with
an asterisk.
Performance with the BioSemi (2 participants: s7 and s8) and ABM (5 participants:
s1–s5) systems was nearly identical in the field environment. This finding
was in contrast to lab assessments, in which the 32-channel BioSemi system provided
better performance than the six-channel ABM system (Dorneich, Whitlow,
Ververs, Mathan et al., 2005). One possible explanation for this discrepancy is
the difference in hardware design. The large number of relatively unconstrained
cables associated with the BioSemi system might have been susceptible
to movement-induced vibration, a potential source of noise.
Any benefits of additional channels that the BioSemi system provided may have been
lost because of vulnerabilities to these movement artifacts. In contrast, the ABM sys-
tem was specifically designed for mobile use. If these results are replicated with a
larger group of participants, there may be a need for hardware specifically designed
to withstand the rigors of mobility in the field.
A series of substantial scientific and engineering issues needed to be successful-
ly addressed in order to deliver compelling results for the mobile cognitive state clas-
sification. First, participants needed to be placed in reliably high and low task loading
conditions within an operationally relevant mobile task scenario. This was validated
across multiple performance and subjective measures. These results lent confidence
that the classification approach was tested against task conditions that were
perceived as, and elicited performance commensurate with, low and high cognitive
task loads.
Figure 9. Moment-to-moment classification accuracy for each participant.
Second, the evaluation confirmed that the signal-processing and classification
algorithms not only ran on a mobile computing platform in real time but delivered
moment-to-moment cognitive state classification performance greater than 70% for
all participants. In order to deal with poor signal-to-noise ratio under mobile EEG
collection, real-time signal processing was developed to remove eye blinks, exclude
data contaminated by muscle artifacts, account for eye movements, correct for DC
drift, eliminate spikes, and remove motion-induced, high-frequency components.
The net result was that the signal-processing solution preserved sufficient signal
quality to decipher differences in the EEG spectral dynamics under low and high
cognitive loads.
There are many reasons why these results constituted a significant contribution
to the emerging field of augmented cognition as well as to the broader field of exper-
imental neuroscience. First, the granularity of the classification performance was at
the 2-s resolution and did not depend on larger samples to classify disparate states.
Classification performance represented the percentage of all data samples (approximately
300 in high blocks and 180 in low blocks for all participants) that were correctly
classified. It is far more common practice to report average classification
performance over an entire experimental block or over time windows substantially
greater than 2 s, neither of which is a particularly germane measure when evaluating
a system that adapts in real time.
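The sample counts quoted above are consistent with the scenario design if they are read as applying to one scenario: two high blocks of about 5 min and two low blocks of about 3 min, with one assessment every 2 s. That reading is an assumption; the short calculation below only checks the arithmetic, and the variable names are illustrative.

```python
ASSESSMENT_PERIOD_S = 2      # one cognitive state assessment every 2 s
BLOCKS_PER_CONDITION = 2     # fixed block order: high, low, high, low
HIGH_BLOCK_S = 5 * 60        # high task load blocks lasted about 5 min
LOW_BLOCK_S = 3 * 60         # low task load blocks lasted about 3 min

high_samples = BLOCKS_PER_CONDITION * HIGH_BLOCK_S // ASSESSMENT_PERIOD_S
low_samples = BLOCKS_PER_CONDITION * LOW_BLOCK_S // ASSESSMENT_PERIOD_S
print(high_samples, low_samples)  # 300 180
```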
Second, the task conditions were far more heterogeneous, variable, and ecolog-
ically valid than was typically seen in prior classification studies in which participants
performed a single well-defined laboratory task. In both low- and high-task condi-
tions, participants were required to perform three separate tasks along with requisite
task switching and working memory rehearsal. As is the case for most “cognition in
the wild,” participants adopted different strategies to manage the multiple-task exe-
cution (as evidenced in postexperimental questionnaire responses). To achieve rea-
sonably good classification rates under these conditions indicates that the utility of
EEG in classification was likely to extend to more ecologically valid task conditions.
Third, the classifiers were trained with data from a distinct period that was com-
pleted before the test phase. In many classification studies, researchers sample train-
ing and test samples from the same block, often from temporally adjacent samples.
It is well known that EEG baselines drift over time, described as nonstationarity, as
is common to many physiological processes; therefore, running a classifier on train-
ing data from a previous period was a technical risk but resulted in validating the
approach in a more rigorous manner.
Fourth, the system used data from relatively few sensor sites – six sites from the
ABM system and seven from the BioSemi system – because any imagined field deploy-
ment needs to minimize the number of sensors. Many researchers strive to maximize
the number of sensors to ensure adequate coverage to provide them with the spatial
resolution to capture subtle differences across the cortex. These findings suggest that
even relatively sparse EEG arrays provide sufficient coverage to distinguish between
the two task loading conditions.
Fifth, the current study achieved encouraging classification between two states
that are very similar in the classes of cognitive processing that are required, such as
working memory, but differ substantially in the intensity or tempo of processing re-
quired. This suggests that the approach detected differences in executive functions
that supported the management of multiple tasks over time.
Finally, all these findings indicate that this reported EEG approach will be an ef-
fective means of triggering adaptive systems in real-world applications. This approach
provides the temporal resolution to respond to short-term changes in cognitive
state that would be required for applications such as communications scheduling
(Dorneich, Whitlow, Mathan, Carciofini et al., 2005). In this study, the communi-
cations scheduler (adaptation) applied messaging techniques that included drawing
attention to higher-priority items with additional alerting tones or visual text mes-
sages and deferring lower-priority messages to a commander’s display device for later
review. Communication scheduling significantly increased the accuracy in maintain-
ing counts in high task load conditions (67.4% accuracy unmitigated, 95.7% miti-
gated). Likewise, the communications scheduler significantly increased the accuracy
of mission monitoring in high task load when mitigation was available (68.2% un-
mitigated, 95.8% mitigated). Because the focus of this article was the feasibility of
assessing cognitive state in a mobile participant, space constraints precluded full
discussion of the adaptive automation performance results in the current evalua-
tion (see Dorneich, Whitlow, Mathan, Ververs et al., 2005).
Lessons Learned
In addition to the performance findings discussed previously, many practical
lessons were learned in the assembling, fielding, and evaluating of EEG-based clas-
sifiers in a mobile setting. A summary of lessons learned in this work is presented
in the table below.
Lessons Learned
Area Lesson Learned
Task definition Consult domain experts. The U.S Army Natick Soldier Center was consulted
in designing “operationally relevant” tasks. This saved considerable time, and
results will be better received due to their ecological validity. The use of rep-
resentative tasks lends more confidence that the findings will be transferable
to an actual domain.
Task definition Baseline tasks early and often to ensure that representative participants per-
form and perceive different task loads as low and high. Initial assumptions
about what participants could handle in terms of a “high” communications
tempo were quickly challenged by the data collected with pilot participants.
Signal processing Develop the capability to collect data in an actual environment. A novel stabili-
ty control was created to improve filtering of ocular activity. When faced with
the extreme artifacts in a mobile environment, most adaptive filters would
become unstable and unusable.
Continued on next page
at IOWA STATE UNIV on April 7, 2015edm.sagepub.comDownloaded from
262 Journal of Cognitive Engineering and Decision Making / Fall 2007
Lessons Learned (continued)
Area Lesson Learned
Signal processing Critically review similar research to understand application to the target
domain. Findings from prior research were quickly identified as inadequate
for identifying relevant EEG sites for use in applied operational domains. Given
the dynamic multitasking nature of the mobile task environment, the limited
relevance of controlled laboratory studies in down-selecting to a subset of
channels was discovered. Most studies involved well-defined, homogeneous,
stationary tasks that typically reported averaged results and not moment-
to-moment classification accuracy.
Collect sufficient data to determine how much training data are required
to provide good classification performance. Use pilot studies to determine
how much training data were required to provide robust classification perfor-
mance. The amount of data needed varies depending on the nature of the
task environment, signal-to-noise ratio, and classification techniques used.
Classification: Fit the approach to the constraints of the environment. Explore multiple
temporal windows while weighing the constraints imposed by sensor density,
computational efficiency, the need for precise task adaptation, and the high
classification accuracy required during ongoing research studies.
Determine the ideal number of sensors by considering the processing
demands, operational environment, and generalizability of the classification
across multiple situations. More sites were not always better for machine
learning classification: once the classifier goes beyond the most informative
features (site by frequency band), it begins to overfit to noise and
classification performance degrades, much like adding unnecessary parameters
to a regression model.
System integration: Ruggedize the equipment for testing in a field environment. Most
ruggedized laptops not only come with shock-mounted hard drives to protect your
data but also include better thermal management, which is sorely lacking in
traditional laptops (as was found one warm, sunny spring day).
Select an EEG system that preamplifies the signal at the electrode site to
enable low-noise measurements.
Program management: Whenever possible, simplify the experimental design to reduce the
complexity of conducting field studies. Inevitably the system integration phase
will take three times longer than expected. Limiting the number of research
questions of interest, and not rolling everything into a single study, makes
implementing the study’s overall findings more manageable. This ambitious
study involved making a novel system fieldable, creating realistic operational
tasks with separable cognitive task loads, and adapting a classification
approach to the operationally relevant tasks, all of which seriously challenged
timetables, budgets, and overall resources.
Risk management: Consider an experimental design that includes segments with severable
benefits (meaning that if something breaks or if it starts raining, the data
collected up to that point will be usable) so that a lengthy data collection does
not become “all or nothing.” With a lengthy, elaborate experiment using an
elaborate system, the probability of running start to finish without some
glitch approaches zero.
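The classification lesson in the table above (more sensor sites are not always better) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the classifiers used in the study: it uses a nearest-centroid classifier, one synthetic informative feature, and a hypothetical number of pure-noise features standing in for uninformative site-by-band features. All names and parameter values are illustrative.

```python
import random

def make_sample(label, n_noise, rng):
    """One feature vector: a single informative feature plus n_noise noise features."""
    mean = 1.0 if label == 1 else -1.0
    return [rng.gauss(mean, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]

def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def accuracy(n_noise, n_train=10, n_test=1000, seed=0):
    """Train a nearest-centroid classifier on a small set, test on fresh samples."""
    rng = random.Random(seed)
    train = {c: [make_sample(c, n_noise, rng) for _ in range(n_train)] for c in (0, 1)}
    cents = {c: centroid(rows) for c, rows in train.items()}
    correct = 0
    for i in range(n_test):
        label = i % 2  # alternate classes so the test set is balanced
        x = make_sample(label, n_noise, rng)
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in cents.items()}
        correct += min(dist, key=dist.get) == label
    return correct / n_test

# Hypothetical comparison: informative feature alone vs. 300 added noise features.
acc_informative_only = accuracy(n_noise=0)
acc_with_noise_sites = accuracy(n_noise=300)
print(f"informative feature only: {acc_informative_only:.2f}")
print(f"plus 300 noise features:  {acc_with_noise_sites:.2f}")
```

With only ten training samples per class, the centroid estimates on the noise features are themselves noisy, so each added uninformative feature contributes variance to the distance computation without contributing signal.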
at IOWA STATE UNIV on April 7, 2015edm.sagepub.comDownloaded from
Although the classification results reported here are promising, several short-
comings need to be addressed in future work. First, some of the results we described
will have to be validated against larger groups of participants. Second, although the
classification approach seems to generalize over periods spanning minutes and hours,
it is unclear whether the system can generalize over larger temporal gaps between
training and testing. Third, the work reported here has focused only on EEG; how-
ever, considering other information sources, such as cardiac sensors and fNIR, may
make cognitive state estimation more robust under circumstances in which EEG may
be compromised. Fourth, the cognitive state classifiers evaluated in this study use
Bayes’ rule to make decisions about the cognitive state, based on EEG feature vectors.
However, making optimal classification decisions within a Bayesian framework also
requires consideration of the prior probability of various workload states and the
cost of actions that are associated with the cognitive state-related decisions. The cur-
rent implementation assumes equal priors for each state and does not weigh the cost
of actions. Consideration of priors and costs will be an important priority as the tech-
nology described here is transitioned into an operationally relevant system.
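The role of priors and costs can be made concrete with a small worked example. The sketch below is illustrative only: the likelihood values, the prior probabilities, and the cost matrices are hypothetical numbers chosen to show how the decision can change, and the function names are not taken from the system described here.

```python
def posterior(likelihood, prior):
    """Bayes' rule: P(state | x) is proportional to p(x | state) * P(state)."""
    joint = {s: likelihood[s] * prior[s] for s in likelihood}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}

def min_risk_decision(post, cost):
    """Return the declared state with the lowest expected cost.

    cost[declared][true] is the cost of declaring `declared` when `true` holds.
    """
    risk = {d: sum(cost[d][t] * post[t] for t in post) for d in cost}
    return min(risk, key=risk.get)

# Hypothetical class-conditional likelihoods of one EEG feature vector.
likelihood = {"low": 0.2, "high": 0.3}

equal_prior = {"low": 0.5, "high": 0.5}    # the assumption in the current system
skewed_prior = {"low": 0.8, "high": 0.2}   # hypothetical: high workload is rare

# Zero-one cost reduces to picking the most probable state.
zero_one = {"low": {"low": 0, "high": 1}, "high": {"low": 1, "high": 0}}
# Hypothetical asymmetric cost: missing a high-workload state is 5x worse.
asymmetric = {"low": {"low": 0, "high": 5}, "high": {"low": 1, "high": 0}}

decision_equal = min_risk_decision(posterior(likelihood, equal_prior), zero_one)
decision_prior = min_risk_decision(posterior(likelihood, skewed_prior), zero_one)
decision_cost = min_risk_decision(posterior(likelihood, skewed_prior), asymmetric)
print(decision_equal, decision_prior, decision_cost)
```

In this example the same feature vector is labeled high workload under equal priors, low workload once the prior reflects that high workload is rare, and high workload again once the cost of a missed detection is weighted more heavily than a false alarm.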
Next Steps
As the technology transitions from mobile, experimental scenarios to future
operational integration events, the Honeywell classification approach will be tailored
to address likely deployment challenges. Feedback from Army partners indicates the
Honeywell sensor and computational component must address the following high-
level requirements:
• Provide reliable performance under harsh dismounted conditions.
• Integrate with other FFW subsystems in a manner that does not appreciably
increase weight, size, power consumption, network bandwidth utilization, or
computational resources.
• Garner very high levels of user acceptance and operational acceptance.
Lessons Learned (continued)
Area: Lesson Learned
Participant recruitment: Within the bounds of any Institutional Review Board (IRB)
agreement, recruit motivated participants for lengthy experiments of this nature.
From the time the participants arrived until they cleaned the EEG gel out of their
hair, these experimental sessions lasted a minimum of 5 to 7 hr, during which they
wore a 35-lb backpack and an EEG sensor headset with gel, walked the navigation
course for at least an hour, and performed very challenging cognitive tasks.
Fortunately, this study recruited individuals who were intrinsically motivated,
competitive, and highly intelligent.
Classification Accuracy
The classifier approach will continue to be developed to address some of the
limitations discussed earlier. Evaluating the classification approach with a larger set
of users who are operating in their natural task environment will be the focus of the
next evaluation. In addition, cardiac sensors as well as EEG sensors will be assessed
with the goal of fusing the sensor streams to provide more robust, reliable, and
accurate classification. In future work, we will also consider priors and costs in the
classification decision.
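One way such sensor fusion could be approached is to combine the posterior estimates of separate per-sensor classifiers. The sketch below shows two standard combination rules, the product rule and a weighted sum rule; it is an illustration under assumptions, not the fusion method planned for this system. The posterior values and reliability weights are hypothetical, and the product rule as written assumes equal class priors.

```python
def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}

def product_rule(posteriors):
    """Multiply per-sensor posteriors (equal class priors assumed), then renormalize."""
    states = list(posteriors[0])
    fused = {s: 1.0 for s in states}
    for p in posteriors:
        for s in states:
            fused[s] *= p[s]
    return normalize(fused)

def weighted_sum_rule(posteriors, weights):
    """Weighted average of per-sensor posteriors; weights express sensor reliability."""
    states = list(posteriors[0])
    return normalize(
        {s: sum(w * p[s] for w, p in zip(weights, posteriors)) for s in states}
    )

# Hypothetical posteriors from an EEG classifier and a cardiac classifier.
eeg = {"low": 0.4, "high": 0.6}
cardiac = {"low": 0.7, "high": 0.3}

fused_product = product_rule([eeg, cardiac])
fused_sum = weighted_sum_rule([eeg, cardiac], weights=[0.5, 0.5])
# Down-weight EEG when its signal quality is judged to be compromised.
fused_eeg_noisy = weighted_sum_rule([eeg, cardiac], weights=[0.2, 0.8])
print(fused_product, fused_sum, fused_eeg_noisy)
```

The weighted form makes it possible to shift reliance toward the cardiac stream during periods in which EEG is contaminated by motion or muscle artifacts.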
System Reliability
Harsh conditions are the reality of the dismounted soldier domain, and the system
must remain reliable under them. Beyond the ruggedization required of all battlefield
electronics, a system that measures neurophysiological signals must confront the
considerable noise introduced by motion, sweating, and muscle activity. In previous
sections, we discussed the means by which these artifacts were addressed for the
participants operating in the mobile, multitasking scenarios.
The next steps to improve system reliability will involve rigorous testing within
dismounted operational environments, which will expose the system to increased
physical stress and a variety of environmental conditions and will likely introduce
new classes of signal artifacts that have not yet been encountered. This will pro-
vide an opportunity to improve signal processing by isolating and addressing the new
sources of noise, through either advanced data filtering or physical integration
improvements.
System Fieldability
Effective integration with FFW component systems essentially implies the need
to continue to reduce the hardware, software, computational, and power footprint of
the system. In a matter of 2 years, the computational platform has transitioned from
a five-desktop immobile system to a fully wearable mobile system that relies on only
a laptop computer in the participant’s backpack (see Figure 10). In addition to the
dramatic hardware reduction, the sensing and signal-processing requirements have
been streamlined to be tractable on a single standard laptop. There will be continued
efforts to streamline the sensing system to ensure that it is as small, power-efficient,
and reliable as possible.
In the future, much of the signal-processing and classification calculations could
be done on dedicated hardware rather than utilizing software processing capacity. The
determining factor in the computational load of the classification system is the num-
ber of sensor sites necessary for robust classification. Toward that goal, the system
has transitioned from using the BioSemi Active Two system with 32 channels of EEG
to the ABM 6-channel sensor headset.
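The effect of channel count on computational load can be illustrated with simple arithmetic. The sketch below assumes that features are per-site band powers and that the classifier estimates a full covariance matrix per class; the number of frequency bands (five) is a hypothetical value, since it is not stated in this section.

```python
N_BANDS = 5  # hypothetical number of frequency bands per sensor site

def feature_dim(n_channels, n_bands=N_BANDS):
    """Feature vector length when each site contributes one power value per band."""
    return n_channels * n_bands

def covariance_params(dim):
    """Free parameters in one symmetric dim x dim covariance matrix."""
    return dim * (dim + 1) // 2

for name, channels in [("32-channel BioSemi", 32), ("6-channel ABM", 6)]:
    d = feature_dim(channels)
    print(f"{name}: {d} features, {covariance_params(d)} covariance parameters per class")
```

Under these assumptions the feature dimension scales linearly with the number of sites, while the number of covariance parameters scales quadratically, which is why reducing the site count has a disproportionate effect on processing demands.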
Furthermore, reducing computational requirements will be explored by encod-
ing neurophysiological signal processing onto a hardware system that would require
less software computation from the FFW wearable computer. Finally, potential net-
work protocols that utilize the minimum bandwidth while still transmitting the req-
uisite volume of feedback to provide value to the FFW suite will be explored. This
requires secure, efficient, and wireless data transmission from the integrated sensors
264 Journal of Cognitive Engineering and Decision Making / Fall 2007
to a local signal processor for managing artifacts and spectrally decomposing signals
for subsequent classification. Ultimately, a fielded FFW augmented cognition system
will likely require advanced sensors, integrated hardware signal processing, and high-
ly efficient software agents running on the FFW mobile computer. Such a system
would be capable of triggering adaptations to the warfighters’ task environment based
on their cognitive state.
The next steps to improve fieldability include exploring sensor options that have
a reduced footprint compared with current sensing systems. For example, free-field
or minimal-preparation EEG electrode-based systems that are easily integrated into
a helmet liner or embedded within helmet pads will be considered.
System Form and Function Acceptability
For a system to be fielded successfully, user acceptance is critical to ensuring
use in the battlefield environment. User acceptance for an augmented cognition sys-
tem includes ease of donning and doffing, comfortable integration with the Advanced
Combat Helmet (ACH), and satisfaction of functional expectations. The ACH is the
replacement for the old Kevlar Army helmet and is designed to be lighter, stronger,
and compatible with current night vision devices, communications packages, and
nuclear, biological, and chemical defense equipment and body armor (Global Secu-
rity, 2006). Specifically, the system would need to be seamlessly integrated into the
ACH to a degree that a warfighter could simply don the helmet to enable the sensors
that are either integrated within the helmet liner or helmet padding, without adhe-
sives or electrolyte gel. The sensor-enabled helmet must be reasonably comfortable
to wear for extended durations.
Finally, the augmented cognition system should deliver value and satisfy func-
tional expectations to justify the addition, however small, of power, weight, and
computational requirements. Initial implementations of the augmented cognition
system would involve providing cognitive state information to remotely located com-
manders or key leaders to assess the cognitive combat readiness of their subordinates.
Figure 10. Initial (left) and current (right) systems.
The next step in addressing these challenges is experimentation in an operational
environment that will further constrain the form and functional requirements. This
step will also provide a test environment to perform cognitive classification studies
with considerably more ecological validity, further proving the feasibility and utility
of determining cognitive states of interest in an operational environment.
Adaptive System Triggering
Work continues on building adaptive systems that use cognitive state assessment
as triggers. Automation is an effective means to allow users to conserve cognitive re-
sources to allocate to other higher priority tasks (Dixon & Wickens, 2004; Rovira,
Zinni, & Parasuraman, 2002). Basing decisions about when to apply automation on an
assessment of the user’s cognitive state is one method of adaptive
automation. The work described here focuses on real-time assessment of a human’s
capacity to understand and use information while under high task load conditions,
in which cognitive capacity can fluctuate greatly. In task management, mitigation
strategies might include intelligent interruption to improve limited working mem-
ory, attention management to improve focus during complex tasks, or cued memory
retrieval to improve situational awareness and context recovery.
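A cognitive state estimate that fluctuates from moment to moment needs some decision logic before it can switch a mitigation on or off. The sketch below shows one hypothetical way to do this, using exponential smoothing and two thresholds (hysteresis) so that the adaptation does not toggle on every noisy sample. The class name, the thresholds, and the smoothing factor are all assumptions made for illustration and do not describe the triggering logic of the system reported here.

```python
class AdaptationTrigger:
    """Turns a stream of P(high workload) estimates into an on/off mitigation flag."""

    def __init__(self, on_threshold=0.7, off_threshold=0.4, smoothing=0.5):
        self.on_threshold = on_threshold    # smoothed level needed to switch on
        self.off_threshold = off_threshold  # smoothed level needed to switch off
        self.smoothing = smoothing          # weight given to the previous level
        self.level = 0.0
        self.active = False

    def update(self, p_high):
        """Fold in one new classifier output and return the current flag."""
        self.level = self.smoothing * self.level + (1 - self.smoothing) * p_high
        if not self.active and self.level >= self.on_threshold:
            self.active = True
        elif self.active and self.level <= self.off_threshold:
            self.active = False
        return self.active

trigger = AdaptationTrigger()
# Hypothetical classifier outputs: sustained high load, one dip, then low load.
stream = [0.9, 0.9, 0.9, 0.9, 0.5, 0.9, 0.1, 0.1, 0.1]
states = [trigger.update(p) for p in stream]
print(states)
```

In this example the mitigation switches on only after several consecutive high-workload estimates, stays on through the single dip, and switches off once the low-workload estimates persist.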
Ultimately, the goals of adaptive automation are similar to those of automation in
general: improve overall performance while avoiding “operator out of the loop” con-
flicts or mistrust in the automation. Such technologies not only have the potential
to significantly reduce the strain on soldiers’ cognitive resources but also provide the
opportunity to improve overall decision making by better managing information
flow (Schmorrow, Raley, & Ververs, 2004). The overall benefit comes from making
smarter decisions about what information is presented, when it is presented, and
how it is presented.
Acknowledgments
The authors thank Danni Bayn, Jim Carciofini, Natalia Mazaeva, Trent Reusser,
James Sampson, and Jeff Rye for their contributions to this work. We also thank Glenn
Wilson for an early review of the manuscript, as well as the anonymous reviewers,
all of whom provided excellent comments and feedback.
This research was supported by a contract with DARPA and funded through the
U.S. Army Natick Soldier Center under Contract No. DAAD16-03-C-0054, for which
Dylan Schmorrow and Amy Kruse served as the program managers of the DARPA
Improving Warfighter Information Intake Under Stress/Augmented Cognition pro-
gram and Henry Girolamo was the U.S. Army program manager and DARPA agent.
Any opinions, findings, conclusions, or recommendations expressed herein are those
of the authors and do not necessarily reflect the views of DARPA or the U.S. Army.
References
Backs, R. W., & Seljos, K. A. (1994). Metabolic and cardiorespiratory measures of mental effort:
The effects of level of difficulty in a working-memory task. International Journal of Psychophysi-
ology, 16, 57–68.
Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of process-
ing resources. Psychological Bulletin, 91, 276–292.
Berka, C., Levendowski, C., Cvetinovic, M. M., Petrovic, M. M., Davis, G., Lumicao, M. N., Zivkovic,
V. T., Popovic, M. V., & Olmstead, R. (2004). Real-time analysis of EEG indices of alertness,
cognition, and memory acquired with a wireless EEG headset. International Journal of Human-
Computer Interaction, 17(2), 151–170.
Boiten, F. A. (1998). The effects of emotional behaviour on components of the respiratory cycle.
Biological Psychology, 49(1–2), 29–51.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data
via the EM algorithm. Journal of the Royal Statistical Society, 39, 1–38.
Dixon, S. R., & Wickens, C. D. (2004). Automation reliability in unmanned aerial vehicle flight
control. In D. Vincenzi (Ed.), Proceedings of Human Performance and Situation Awareness and
Automation (HPSAA) II. Daytona Beach, FL: HPSAA.
Dorneich, M. C., Whitlow, S. D., Mathan, S., Carciofini, J., & Ververs, P. M. (2005). The commu-
nications scheduler: A task scheduling mitigation for a closed loop adaptive system. Proceedings
of the 11th International Conference on Human-Computer Interaction (HCI International 2005).
Mahwah, NJ: Erlbaum.
Dorneich, M. C., Whitlow, S. D., Mathan, S., Ververs, P. M., Pavel, M., & Erdogmus, D. (2005).
DARPA improving warfighter information intake under stress: Augmented cognition phase
III final report. Technical report for DARPA Augmented Cognition Phase 3 (contract DAAD 16-
Dorneich, M. C., Whitlow, S. D., Ververs, P. M., Carciofini, J., & Creaser, J. (2004). Closing the loop
of an adaptive system with cognitive state. Proceedings of the Human Factors and Ergonomics
Society 48th Annual Meeting (pp. 590–594). Santa Monica, CA: Human Factors and Ergo-
nomics Society.
Dorneich, M. C., Whitlow, S. D., Ververs, P. M., Mathan, S., Raj, A., Muth, E., Hoover, A.,
DuRousseau, D., Parra, L., & Sajda, P. (2004). DARPA improving warfighter information
intake under stress. Augmented cognition: Concept validation experiment (CVE) analysis
report for the Honeywell team. Technical report for DARPA Augmented Cognition Phase 2B (con-
tract DAAD 16-03-C-0054).
Erdogmus, D., Adami, A., Pavel, M., Lan, T., Mathan, S., Whitlow, S., & Dorneich, M. (2005). Cog-
nitive state estimation based on EEG for augmented cognition, presented at the 2nd IEEE EMBS
International Conference on Neural Engineering, Arlington, VA, March 16–19.
Garavan, H., Ross, T. J., Li, S.-J., & Stein, E. A. (2000). A parametric manipulation of central exec-
utive functioning using fMRI. Cerebral Cortex, 10, 585–592.
Gevins, A., Smith, M. E., McEvoy, L., & Yu, D. (1997). High resolution EEG mapping of cortical
activation related to working memory: Effects of task difficulty, type of processing, and prac-
tice. Cerebral Cortex, 7, 374–385.
Gevins, A., & Smith, M. E. (2000). Neurophysiological measures of working memory and individ-
ual differences in cognitive ability and cognitive style. Cerebral Cortex, 10, 829–839.
Gevins, A., & Smith, M. (2003). Neurophysiological measure of cognitive workload during human-
computer interaction. Theoretical Issues in Ergonomics Science, 4(1–2), 113–132.
Global Security. (2006). Advanced combat helmet (ACH). Retrieved October 24, 2007, from http://
Hart, S. G., & Staveland, L. E. (1988). Development of a multi-dimensional workload rating scale:
Results of empirical and theoretical research. In P. Hancock & N. Meshkati (Eds.), Human
mental workload. Amsterdam: Elsevier.
Heslegrave, R. J., & Furedy, J. J. (1979). Sensitivities of HR and T-wave amplitude for detecting cogni-
tive and anticipatory stress. Physiology & Behavior, 22(1), 17–23.
Institute of Medicine of the National Academies. (2004). Strategies for monitoring cognitive per-
formance. In Monitoring metabolic status: Predicting decrements in physiological and cognitive
performance (pp. 171–172). Washington, DC: National Academy Press.
Izzetoglu, K., Bunce, S., Onaral, B., Pourrezaei, K., & Chance, B. (2004). Functional optical brain
imaging using near-infrared during cognitive tasks. International Journal of Human-Computer
Interaction, 17, 211–227.
Kalsbeek, J. W. H., & Ettema, J. H. (1963). Scored irregularity of the heart pattern and measure-
ment of perceptual or mental load. Ergonomics, 6, 306–307.
Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 20(3), 226–239.
Kramer, A. (1991). Physiological metrics of mental workload: A review of recent progress. In D.
Damos (Ed.), Multiple task performance (pp. 279–328). London: Taylor & Francis.
Lan, T., Erdogmus, D., Adami, A., Pavel, M., & Mathan, S. (2005). Salient EEG channel selection
in brain computer interfaces by mutual information maximization. 27th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (pp. 7064–7067). Los Ala-
mitos, CA: IEEE.
Makeig, S., & Jung, T.-P. (1995). Changes in alertness are a principal component of variance in the
EEG spectrum. NeuroReport, 7(1), 213–216.
Mathan, S., Mazaeva, N., Whitlow, S., Adami, A., Erdogmus, D., Lan, T., & Pavel, M. (2005). Sensor-
based cognitive state assessment in a mobile environment. Proceedings of the 11th International
Conference on Human-Computer Interaction. Mahwah, NJ: Erlbaum.
Mikulka, P., Hadley, G., Freeman, F., & Scerbo, M. (1999). The effects of a biocybernetic system
on vigilance decrement. Proceedings of the Human Factors and Ergonomics Society 43rd Annual
Meeting (p. 1410, abstract). Santa Monica, CA: Human Factors and Ergonomics Society.
Neumann, D. L. (2002). Effect of varying levels of mental workload on startle eyeblink modula-
tion. Ergonomics, 45, 583–602.
Parasuraman, R., Mouloua, M., & Hilburn, B. (1999). Adaptive aiding and adaptive task alloca-
tion enhance human-machine interaction. In M. W. Scerbo & M. Mouloua (Eds.), Automation
technology and human performance: Current research and trends (pp. 129–133). Mahwah, NJ:
Partala, T., & Surakka, V. (2003). Pupil size variation as an indication of affective processing.
International Journal of Human-Computer Studies, 59(1–2), 185–198.
Parzen, E. (1967). On estimation of a probability density function and mode. In Time series analy-
sis papers. San Diego, CA: Holden-Day, Inc.
Pope, A. T., Bogart, E. H., & Bartolome, D. (1995). Biocybernetic system evaluates indices of
operator engagement. Biological Psychology, 40, 187–196.
Popivanov, D., & Mineva, A. (1999). Testing procedures for non-stationarity and non-linearity in
physiological signals. Mathematical Biosciences, 157(1–2), 303–320.
Porges, S. W., & Byrne, E. A. (1992). Research methods for measurement of heart-rate and respi-
ration. Biological Psychology, 34(2–3), 93–130.
Prinzel, L. J., Freeman, F. G., Scerbo, M. W., Mikulka, P. J., & Pope, A. T. (2000). A closed-loop sys-
tem for examining psychophysiological measures for adaptive automation. International Journal
of Aviation Psychology, 10, 393–410.
Rovira, E., Zinni, M., & Parasuraman, R. (2002). Effects of information and decision automation
on multi-task performance. Proceedings of the Human Factors Society 26th Annual Meeting (pp.
327–331). Santa Monica, CA: Human Factors and Ergonomics Society.
Russell, C. A., & Gustafson, S. G. (2001). Selecting salient features of psychophysiological measures
(Air Force Research Laboratory Technical Report AFRL-HE-WP-TR-2001-0136).
Scerbo, M. W., Freeman, F. G., Mikulka, P. J., Parasuraman, R., DiNocero, F., & Prinzel, L. J. (2001).
The efficacy of psychophysiological measures for implementing adaptive technology (NASA Technical
Report TP-2001-211018, June 2001). Hampton, VA: National Aeronautics and Space Admini-
stration, Langley Research Center.
Schmorrow, D. D., & Kruse, A. A. (2002). Improving human performance through advanced cog-
nitive system technology. Proceedings of the Interservice/Industry Training, Simulation and Educa-
tion Conference. Washington, DC: National Defense Industrial Association.
Schmorrow, D., Raley, C., & Ververs, P. (2004). Toward effective warfighting in stressful environ-
ments. Poster presented at Human Performance and Situation Awareness and Automation
(HPSAA) II. Daytona Beach, FL.
Scott, W. B. (1999). Automatic GCAS: You can’t fly any lower. Aviation Week and Space Technology,
150(5), 76–79.
Stern, J. A., Boyer, D., & Schroeder, D. (1994). Blink rate: A possible measure of fatigue. Human
Factors, 36, 285–297.
Thorpe, S., Fize, D., & Marlot, C. (1996). Speed of processing in the human visual system. Nature,
381, 520–522.
Veltman, J. A., & Gaillard, A. W. K. (1998). Physiological workload reactions to increasing levels
of task difficulty. Ergonomics, 41, 656–669.
Verwey, W. B., & Veltman, H. A. (1996). Detecting short periods of elevated workload: A com-
parison of nine workload assessment techniques. Journal of Experimental Psychology-Applied,
2, 270–285.
Welch, P. D. (1967). The use of fast Fourier transform for the estimation of power spectra: A
method based on time averaging over short modified periodograms. IEEE Transactions on
Audio and Electroacoustics, 15(2), 70–73.
Wickens, C. D., & Hollands, J. (2000). Engineering psychology and human performance (3rd ed.).
New York: Prentice Hall.
Widrow, B., & Hoff, M. E. (1960). Adaptive switching circuits. IRE WESCON Convention Record
(pp. 96–104). New York: Institute of Radio Engineers.
Wientjes, C. J. E. (1992). Respiration in psychophysiology: Methods and applications. Biological
Psychology, 34(2–3), 179–203.
Wildervanck, C., Mulder, G., & Michon, J. A. (1978). Mapping mental load in car driving. Ergo-
nomics, 21, 225–229.
Wilson, G., & Russell, C. (2003). Operator functional state classification using multiple psycho-
physiological features in an air traffic control task. Human Factors, 45, 381–389.
Wilson, G. F., & Eggemeier, F. T. (1991). Physiological measures of workload in multi-task envi-
ronments. In D. Damos (Ed.), Multiple-task performance (pp. 329–360). London: Taylor &
Yamada, F. (1998). Frontal midline theta rhythm and eyeblinking activity during a VDT task and
a video game: Useful tools for psychophysiology in ergonomics. Ergonomics, 41, 678–688.
Michael Dorneich (Ph.D., cognitive engineering, University of Illinois, 1999) is a senior
research scientist at Honeywell Laboratories, Human Centered Systems Group. His expertise
includes design of mixed-initiative systems, adaptive human-machine interaction in complex
domains, interactive learning environments, and collaborative decision support systems.
Recent work includes developing neuroadaptive systems in the DARPA Augmented Cogni-
tion program, for which he is co-PI. For NASA he has designed alerting systems, decision aids,
and displays/controls for the Orion program.
Stephen D. Whitlow (M.S., cognitive psychology, University of Illinois, 1995) is a senior
research scientist at Honeywell Laboratories, Human Centered Systems Group. His technical
expertise includes neuroadaptive systems, mixed-initiative design, and data visualization. He
has worked in domains including aviation, medical systems, cyber-security, military operations,
unmanned vehicles, satellite control, and petrochemical processing. He is the PI of Honeywell’s
Augmented Cognition program, which is developing adaptive systems that manage informa-
tion flow based on real-time sensed cognitive state.
Santosh Mathan (Ph.D., human-computer interaction, Carnegie Mellon University, 2004) is a
senior research scientist at Honeywell Laboratories, Human Centered Systems Group. His proj-
ects focus on sensor-based approaches to cognitive state estimation in real-world application
contexts. He is PI on DARPA’s Neurotechnology for Intelligence Analysts program and leads
pattern recognition and signal-processing efforts under DARPA’s Augmented Cognition pro-
gram. He is also exploring the potential for using sensor-based workload estimation techniques
in human-computer interaction assessment contexts.
Patricia May Ververs (Ph.D., Honeywell Aerospace, research scientist) is a principal research
scientist in the Human Centered Systems section at Honeywell Aerospace in Columbia, MD.
She received her Ph.D. in engineering psychology from the University of Illinois at Urbana-
Champaign in 1998. Since joining Honeywell in 1998, she has led programs in the areas of high-
speed research, cockpit alerting, advanced primary flight displays, augmented cognition, and
neurotechnologies. Ververs serves as the Synthetic Vision System Integrated Product Team
lead for the DARPA Sandblaster program, developing technologies to support helicopter land-
ings in brownout conditions.
Deniz Erdogmus is an assistant professor jointly with the Computer Science and Electrical
Engineering Department and the Biomedical Engineering Department of Oregon Health and
Science University. His research focuses on information-theoretic adaptive signal processing
and biomedical applications, including brain interfaces. He received two B.S. degrees in elec-
trical and electronics engineering and mathematics in 1997 and an M.S. in electrical and
electronics engineering in 1999 from Middle East Technical University, Turkey. He received his
Ph.D. in electrical and computer engineering from the University of Florida (2002).
Andre Gustavo Adami is an assistant professor in the Computer Science Department of the
University of Caxias do Sul, Brazil. He received his B.Sc. in computer science in 1994 from
the University of Caxias do Sul, his M.Sc. in computer science in 1997 from the Federal Univer-
sity of Rio Grande do Sul, Brazil, and his Ph.D. in electrical engineering in 2004 from the Oregon
Health and Science University.
Misha Pavel is a professor in the Department of Biomedical Engineering at Oregon Health
and Science University. He is the director of Point-of-Care Laboratory, focused on unobtrusive
monitoring and neurobehavioral assessment and modeling. His research interests include
analysis and modeling of perceptual and cognitive processes, pattern recognition, information
fusion, and decision making in healthy and impaired individuals. He received his Ph.D. in exper-
imental psychology from New York University and earned his M.S. in electrical engineering
from Stanford University.
Tian Lan received both a B.S. and an M.S. in electronics engineering from Beijing University of
Aeronautics and Astronautics in 1997 and 2000, respectively. He is a Ph.D. candidate and
research assistant in electrical engineering at OGI School of Science & Engineering, Oregon
Health & Science University. His research work focuses on signal processing, adaptive systems,
machine learning, and information theory and their applications in biomedical signal process-
ing, including EEG, MRI, and brain-computer interfaces.
270 Journal of Cognitive Engineering and Decision Making / Fall 2007
at IOWA STATE UNIV on April 7, 2015edm.sagepub.comDownloaded from
... This enabled the experimenters to assess the validity of displaying the workload distribution to pilots via the CWLM without confounding the results with the accuracy of the cognitive state assessment itself (an area of future work). For reference, previous work with EEG and ECG achieved an overall classification accuracy >90% (Dorneich et al., 2007). ...
... Future work is also needed to support the premise that long term workload balancing improvements would result in a reduction in fatigue and potential benefits in crew responsiveness to non-normal and off-nominal events. As cognitive state assessment improves in diagnostic accuracy in ever more realistic operational environments, there is the potential to create closedloop adaptive automation to respond to unbalanced workload (Dorneich et al., 2007). However, such automated interventions need to be designed with an understanding of the interplay between potential near-term benefits of the adaptations and the long term costs that may be associated with use of such systems (Dorneich et al., 2016). ...
Full-text available
This paper presents an adaptive system intended to address workload imbalances between pilots in future flight decks. Team performance can be maximized when task demands are balanced within crew capabilities and resources. Good communication skills enable teams to adapt to changes in workload, and include the balancing of workload between team members This work addresses human factors priorities in the aviation domain with the goal to develop concepts that balance operator workload, support future operator roles and responsibilities, and support new task requirements, while allowing operators to focus on the most safety critical tasks. A traditional closed-loop adaptive system includes the decision logic to turn automated adaptations on and off. This work takes a novel approach of replacing the decision logic, normally performed by the automation, with human decisions. The Crew Workload Manager (CWLM) was developed to objectively display the workload between pilots and recommend task sharing; it is then the pilots who " close the loop " by deciding how to best mitigate unbalanced workload. The workload was manipulated by the Shared Aviation Task Battery (SAT-B), which was developed to provide opportunities for pilots to mitigate imbalances in workload between crew members. Participants were put in situations of high and low workload (i.e., workload was manipulated as opposed to being measured), the workload was then displayed to pilots, and pilots were allowed to decide how to mitigate the situation. An evaluation was performed that utilized the SAT-B to manipulate workload and create workload imbalances. Overall, the CWLM reduced the time spent in unbalanced workload and improved the crew coordination in task sharing while not negatively impacting concurrent task performance. Balancing workload has the potential to improve crew resource management and task performance over time, and reduce errors and fatigue. 
Paired with a real-time workload measurement system, the CWLM could help teams manage their own task load distribution.
... The measurement of the physiological response and inferring cognitive states, with and without system adaptation has been demonstrated in previous studies [12][13][14][15][16][17][18]. However, there are still considerable challenges with the implementation of such methods, where some extensive reviews have identified that measures of MWL are not universally valid for all task scenarios [19,20]. ...
... More commonly however, studies have mainly measured cognitive states in response to task load without dynamic task adaptation [13,15,17,35]. Nonetheless, the inference of cognitive states based on physiological data is still an active area of research, with the most promising avenue being the use of AI techniques including supervised Machine Learning (ML) to generate models of the users' cognitive states based on labeled data [12,13,[15][16][17]. For a more detailed review on the various physiological sensors, and the corresponding methods implemented for processing MWL measurements see the following reference [3]. ...
The continuing development of avionics for Unmanned Aircraft Systems (UASs) is introducing higher levels of intelligence and autonomy both in the flight vehicle and in the ground mission control, allowing new promising operational concepts to emerge. One-to-Many (OTM) UAS operations is one such concept and its implementation will require significant advances in several areas, particularly in the field of Human-Machine Interfaces and Interactions (HMI2). Measuring cognitive load during OTM operations, in particular Mental Workload (MWL), is desirable as it can relieve some of the negative effects of increased automation by providing the ability to dynamically adapt avionics HMI2 to achieve an optimal sharing of tasks between the autonomous flight vehicles and the human operator. The novel Cognitive Human Machine System (CHMS) proposed in this paper is a Cyber-Physical Human (CPH) system that exploits the recent technological developments of affordable physiological sensors. This system focuses on physiological sensing and Artificial Intelligence (AI) techniques that can support a dynamic adaptation of the HMI2 in response to the operators' cognitive state (including MWL), external/environmental conditions and mission success criteria. However, significant research gaps still exist, one of which relates to a universally valid method for determining MWL that can be applied to UAS operational scenarios. As such, in this paper we present results from a study on measuring MWL on five participants in an OTM UAS wildfire detection scenario, using Electroencephalogram (EEG) and eye tracking measurements. These physiological data are compared with a subjective measure and a task index collected from mission-specific data, which serves as an objective task performance measure. The results show statistically significant differences for all measures including the subjective, performance and physiological measures performed on the various mission phases. 
Additionally, a good correlation is found between the two physiological measurements and the task index. Fusing the physiological data and correlating with the task index gave the highest correlation coefficient (CC = 0.726 ± 0.14) across all participants. This demonstrates how fusing different physiological measurements can provide a more accurate representation of the operators' MWL, whilst also allowing for increased integrity and reliability of the system.
... Recent studies proposed a real-time estimation protocol [120]. However, to obtain real-time fatigue estimation using connectivity based features will require further research and overcome several challenges like reducing computational complexity, choice of classifier, adequate training of the classifier/model and account for the subjective variability [122]. ...
From brain computer interfaces to human-machine systems, neuroergonomics and neuromarketing, cognitive state estimation based on techniques assessing human brain activity has turned from fiction into reality. From this perspective, studying brain function as resulting from complex interactions between different brain regions has the advantage of uncovering the intricacies of collective neural activity underlying different cognitive states. Further, blending methods from network science, neuroimaging and neuropsychology allows for a principled and quantifiable interpretation of cognitive states. In this chapter, we discuss current state-of-the-art in characterizing various cognitive states and the role of network science and graph theory measures in their investigation. Further, we also present our view on future directions in cognitive state estimation in order to bridge the gap between fundamental research and translational real-world applications.
... The calibration phase can be performed either in an online or offline context and trains the classifier on a dataset specific to the user. Classification techniques include regression [433], support vector machines [434,435], fuzzy systems [436,437], discriminant analysis [438,439], neural networks [440][441][442], Bayesian networks [342,443,444], extreme learning machines [440] or committee machines [445]. The algorithmic nature of classification-based methods makes them well-suited for a system implementation. ...
Technological advances in avionics systems and components have facilitated the introduction of progressively more integrated and automated Human-Machine Interfaces and Interactions (HMI²) on-board civil and military aircraft. A detailed review of these HMI² evolutions is presented, addressing both manned aircraft (fixed and rotary wing) and Remotely Piloted Aircraft System (RPAS) specificities for the most fundamental flight tasks: aviate, navigate, communicate and manage. Due to the large variability in mission requirements, greater emphasis is given to safety-critical displays, command and control functions as well as associated technology developments. Additionally, a top-level definition of RPAS mission-essential functionalities is provided, addressing planning and real-time decision support for single and multi-aircraft operations. While current displays are able to integrate and fuse information from several sources to perform a range of different functions, these displays have limited adaptability. Further development to increase HMI² adaptiveness has significant potential to enhance the human operator's effectiveness, thereby contributing to safer and more efficient operations. The adaptive HMI² concepts in the literature contain three common elements. These elements comprise the ability to assess the system and environmental states; the ability to assess the operator states; and the ability to adapt the HMI² according to the first two elements. While still an emerging area of research, HMI² adaptation driven by human performance and cognition has the potential to greatly enhance human-machine teaming through varying the system support according to the user's needs. However, one of the outstanding challenges in the design of such adaptive systems is the development of suitable models and algorithms to describe human performance and cognitive states based on real-time sensor measurements. 
After reviewing the state of research in human performance assessment and adaptation techniques, detailed recommendations are provided to support the integration of such techniques in the HMI² of future Communications, Navigation, Surveillance and Air Traffic Management (CNS/ATM) and Avionics (CNS + A) systems.
... One of the more promising methods for use in adaptive systems is the use of psychophysiological measures to assess changes in operator functional state (e.g., mental workload, fatigue, stress) (Scerbo et al., 2001). This information is collected through real-time sensors that attempt to detect when cognitive resources are increasingly challenged, such as when they become inadequate to fulfill the mission's demands (Dorneich et al., 2007). Identifying states of mental overload is a high priority in adaptive systems as excessive workloads may inhibit the ability to process incoming information leading to deleterious effects on operator performance (Chen, Haas, & Barnes, 2007; Cummings, Bruni, Mercier, & Mitchell, 2007; Durantin, Gagnon, Tremblay, & Dehais, 2014; Ruff, Narayanan, & Draper, 2002). ...
Presently, adaptive systems use various cognitive and cardiovascular measures to evaluate the functional state of the operator. One marker that has been largely ignored as an assessment tool is baroreflex sensitivity (BRS). This study examined the extent to which BRS changed in response to acute psychological and physical stressors. A total of 20 participants underwent 6-min exposures to a psychological stressor and a physical stressor. Baroreceptor sensitivity, blood pressure, heart rate, heart rate variability, stroke volume, cardiac output, mean blood pressure, total peripheral resistance, left ventricular ejection time, and pre-ejection period were continuously measured at rest and throughout the testing period. Compared to rest, BRS significantly decreased during both the psychological and physical stressors. BRS was reduced more with the psychological stressor than the physical stressor. Heart rate and systolic blood pressure significantly increased above rest during the psychological stressor but not during the physical stressor. There were no significant differences from rest or between stressors for the other physiological markers. BRS was more robustly responsive than other cardiovascular measures commonly used to assess the psychophysiological response to stress, suggesting BRS is a useful marker for evaluating operator functional state during psychological and physical tasks.
Objective: We demonstrate and discuss the use of mobile electroencephalogram (EEG) for neuroergonomics. Both the technical state of the art and the relevant measures and cognitive concepts are systematically addressed. Background: Modern work is increasingly characterized by information processing. Therefore, the examination of mental states, mental load, or cognitive processing during work is becoming increasingly important for ergonomics. Results: Mobile EEG allows mental states and processes to be measured under real-life conditions. It can be used for various research questions in cognitive neuroergonomics. Besides measures in the frequency domain, which have a long tradition in the investigation of mental fatigue, task load, and task engagement, new approaches such as blink-evoked potentials make event-related analyses of the EEG possible even during unrestricted behavior. Conclusion: Mobile EEG has become a valuable tool for evaluating mental states and mental processes on a highly objective level during work. The main advantage of this technique is that working environments don't have to be changed while systematically measuring brain functions at work. Moreover, the workflow is unaffected by such neuroergonomic approaches.
The capability of measuring human performance objectively is hard to overstate, especially in the context of the instructor and student relationship within the process of learning. In this work, we investigate the automated classification of cognitive load leveraging the aviation domain as a surrogate for complex task workload induction. We use a mixed virtual and physical flight environment, given a suite of biometric sensors utilizing the HTC Vive Pro Eye and the E4 Empatica. We create and evaluate multiple models, taking advantage of advancements in deep learning such as generative learning, multi-modal learning, multi-task learning, and x-vector architectures to classify multiple tasks across 40 subjects inclusive of three subject types: pilots, operators, and novices. Our cognitive load model can automate the evaluation of cognitive load agnostic to subject, subject type, and flight maneuver (task) with an accuracy of over 80%. Further, this approach is validated with real-flight data from five test pilots collected over two test and evaluation flights on a C-17 aircraft.
New technologies in safety-critical systems offer the promise of next generation system features and capabilities; predictive analytics; enhanced and remote monitoring; and perhaps improved operator performance. At the same time, however, questions arise about the impact of such technologies on system safety, operator performance, and decision processes, in settings where safe and effective performance are of paramount importance. Wearable, immersive augmented reality (WIAR) technology is one such technology whose introduction sparks these questions. Despite the proliferation of WIAR technology in safety-critical settings, few studies have examined its impacts on operator performance, decision processes and situation awareness in these settings. As a result, this paper considers research needs for evaluating WIAR technology in safety-critical systems. To illustrate the research needed, we consider the use case of a WIAR technology in marine navigation, and propose a research framework, summarizing research needs and identifying needed next steps.
Objective: Human performance risks and benefits of adaptive systems were identified through a systematic analysis and pilot evaluation of adaptive system component types and characteristics. Background: As flight-deck automation is able to process ever more types of information in sophisticated ways to identify situations, it is becoming more realistic for adaptive systems to adapt behavior based on their own authority. Method: A framework was developed to describe the types and characteristics of adaptive system components and was used to perform a risk–benefit analysis to identify potential issues. Subsequently, eight representative adaptive system storyboards were developed for an evaluation with pilots to augment the analysis results and to explore more detailed issues and potential risk mitigations. Results: Analysis identified the principal drivers of adaptive “triggering conditions” risk as complexity and transparency. It also identified the drivers of adaptations risks and benefits as the task level and the level of control versus information adaptation. Conclusion: Pilots did not seem to distinguish between adaptive automation and normal automation if the rules were simple and obvious; however, their perception of risk increased when the level of complexity and opacity of triggering conditions reached a point where its behavior was perceived as nondeterministic.
Rapid technological progress in data and information gathering has made today's workplace a complex knowledge environment. It is estimated that at the current rate of technological innovation, the human will have become the weak link in the decision making chain by the year 2030. Although a number of innovations in data visualization and decision support have had positive impacts on human performance in the workplace, these techniques typically do not utilize information related to operator cognitive state, and thus may not function optimally in dynamic work environments. A critical step in the continued advancement of technology-driven decision making will be the development of methods to sense, assess, and augment operator performance in real-time.
This paper describes an adaptive system that “closes the loop” by utilizing a real-time, directly sensed measure of cognitive state of the human operator. The Honeywell Augmented Cognition team has developed a Closed Loop Integrated Prototype (CLIP) of a Communications Scheduler, for application to the U.S. Army's Future Force Warrior (FFW) program. It is expected that in a highly networked environment the sheer magnitude of communication traffic could overwhelm the individual soldier. The CLIP exploits real-time neurophysiological and physiological measurements of the human operator in order to create a cognitive state profile, which is used to augment the work environment to improve human-automation joint performance. An experiment showed that the Communications Scheduler enabled higher situation awareness and message comprehension in high workload conditions. Based solely on cognitive state, the system inferred a subject's message comprehension and repeated unattended messages in the majority of cases, without yielding an unacceptably high false alarm rate.
A physiological measure of processing load or "mental effort" required to perform a cognitive task should accurately reflect within-task, between-task, and between-individual variations in processing demands. This article reviews all available experimental data and concludes that the task-evoked pupillary response fulfills these criteria. Alternative explanations are considered and rejected. Some implications for neurophysiological and cognitive theories of processing resources are discussed.
Automation purported to assist human operators may itself be an additional source of complexity and uncertainty. Because high reliability cannot always be assured, imperfect automation can add to uncertainty and thereby degrade performance. The present study examined the relative benefits and costs of information and decision automation and investigated the effects of uncertainty resulting from automation unreliability during multiple task performance. Subjects were either provided with status information ("information" automation) or a recommendation for action ("decision" automation) for the system monitoring sub-task of the Multi-Attribute Task Battery (MAT). Two levels of automation reliability were compared. The detrimental effect of unreliable automation (a decrease in the detection rate of malfunctions) was greater for automation of higher reliability, a result consistent with previous findings of automation-related complacency. This effect of automation unreliability was also greater for decision than for information automation.
Perhaps the most basic issue in the study of cognitive workload is the problem of how to actually measure it. The electroencephalogram (EEG) continues to be the clinical method of choice for monitoring brain function in assessing sleep disorders, level of anaesthesia and epilepsy. This preference reflects the EEG's high sensitivity to variations in alertness and attention, the unimposing conditions under which it can be recorded, and the low cost of the technology it requires. These characteristics also suggest that EEG-based monitoring methods might provide a useful tool in ergonomics. This paper reviews a long-term programme of research aimed at developing cognitive workload monitoring methods based on EEG measures. This research programme began with basic studies of the way neuroelectric signals change in response to highly controlled variations in task demands. The results yielded from such studies provided a basis on which to develop appropriate signal processing methodologies to automatically differentiate mental effort-related changes in brain activity from artifactual contaminants and for gauging relative magnitudes of mental effort in different task conditions. These methods were then evaluated in the context of more naturalistic computer-based work. The results obtained from these studies provide initial evidence for the scientific and technical feasibility of using EEG-based methods for monitoring cognitive load during human–computer interaction.