Adaptive Probabilistic Classification of Dynamic Processes: A Case Study on Human Trust in Automation

Kumar Akash, Tahira Reid, and Neera Jain
Abstract: Classification algorithms have traditionally been
developed based on the assumption of independent data samples
characterized by a stationary distribution. However, some data
types, including human-subject data, typically do not satisfy
the aforementioned assumptions. This is relevant given the
growing need for models of human behavior (as they relate
to research in human-machine interaction). In this paper, we
propose an adaptive probabilistic classification algorithm using
a generative model. We model the prior probabilities using
a Markov decision process to incorporate temporal dynamics.
The conditional probabilities use an adaptive Bayes quadratic
discriminant analysis classifier. We implement and validate the
proposed algorithm for prediction of human trust in automation
using electroencephalography (EEG) and behavioral response
data. The proposed classifier achieves higher accuracy than an adaptive classifier that does not account for the temporal dynamics of the underlying process.
The proposed algorithm can be used for classification of other
human behaviors measured using psychophysiological data
and behavioral responses, as well as other dynamic processes
characterized by data with non-stationary distributions.
I. INTRODUCTION
Motivation and Problem Definition: In the application
of most classification algorithms, it is assumed that data
samples are independent, identically distributed, and are
characterized by a stationary distribution. Numerous classi-
fication algorithms have been developed for data that satisfy
these assumptions (see [1] for a review). However, many
real-world problems are characterized by data with temporal
variations and a non-stationary distribution. One example is
the use of human behavioral responses and psychophysio-
logical data for prediction of human behavior.
Human behavior and emotion estimation is becoming an
important segment in the fields of modern human-machine
interaction, brain-computer interface (BCI) design, and med-
ical care [2], among others. Human behavior inference for
decision making is critical for building synergistic relation-
ships between humans and autonomous systems. Researchers
have attempted to predict human behavior using dynamic
models that rely on the behavioral responses or self-reported
behavior of humans [3], [4]. An alternative is the use of psychophysiological signals such as the electroencephalogram (EEG), which measures the electrical activity of the brain.
*This material is based upon work supported by the National Science
Foundation under Award No. 1548616. Any opinions, findings, and con-
clusions or recommendations expressed in this material are those of the
author(s) and do not necessarily reflect the views of the National Science
Foundation.
School of Mechanical Engineering, Purdue University, West Lafayette, IN
47907 USA kakash@purdue.edu, tahira@purdue.edu,
neerajain@purdue.edu
In order to infer human behavior from psychophysiological
signals, different brain activity patterns must be identified. A
common approach for this identification is the use of classification algorithms [5]. However, most EEG-based classification algorithms in the literature are static classifiers that do not account for the dynamic characteristics of human behavior [5]. Therefore, our goal is to use both
behavioral responses and psychophysiological measurements
to create a more accurate and robust classification algorithm
that considers the dynamics of human behavior.
Gaps in Literature: Most existing classification algorithms
do not consider the temporal dynamics of the process under
consideration. For classification of dynamic processes such
as human behavior, including the temporal dynamics is expected to improve prediction accuracy. However, dynamic classification algorithms (e.g., hidden Markov models) are typically computationally expensive to train adaptively and are therefore ill-suited to data with non-stationary characteristics [6], [7], [8].
Contribution: In this paper, we propose an adaptive probabilistic classification algorithm that incorporates the temporal dynamics of the underlying process. We use a generative model with the prior probability
modeled using a Markov decision process and the conditional
probability modeled using an existing adaptive quadratic
discriminant analysis classifier. We implement the proposed
algorithm for classification of human trust in automation
using psychophysiological measurements along with human
behavioral responses. Finally, we validate the classifier on a separate set of participants and show the improvement in its performance as compared to the adaptive classification algorithm alone.
Outline: This paper is organized as follows. Section II
provides background on classification algorithms using EEG.
The proposed classification model framework is described in
Section III. The implementation of the proposed model for
predicting human trust is presented in Section IV. Results
and discussions are presented in Section V, followed by
concluding statements in Section VI.
II. BACKGROUND
Several classification algorithms are used in BCI applications and human behavior prediction. These include linear classifiers (e.g., linear discriminant analysis, support vector machines), nonlinear Bayesian classifiers, artificial neural networks, and k-nearest neighbors [5]. These classifiers can be categorized
using two taxonomies: Generative vs. Discriminative and
Static vs. Dynamic.
2018 Annual American Control Conference (ACC)
June 27–29, 2018. Wisconsin Center, Milwaukee, USA
978-1-5386-5427-9/$31.00 ©2018 AACC 246
Generative classifiers, e.g., Bayes quadratic discriminant
analysis (QDA), learn the distribution of each class and
compute the likelihood of each class for classification. Dis-
criminative classifiers, e.g., support vector machines (SVM),
only learn the explicit decision boundaries between the
classes, which are then used for classification [9]. Since the
EEG signals have non-stationary distributions, data collected
on-line may be characterized by different underlying dis-
tributions than the training data. Therefore, for an adaptive
implementation, it is preferable to identify the changes in
the underlying distribution and update a generative model
accordingly than to update the decision boundary in a
discriminative classifier. Furthermore, generative models are typically specified as probabilistic models; by modeling how the data are actually generated, they provide a richer description of the relationship between features and classes than discriminative models can.
Static classifiers, e.g., SVM, do not account for temporal
information during classification as they classify a single
feature vector. In contrast, dynamic classifiers, e.g., hidden
Markov models (HMM), account for temporal dynamics by
classifying a sequence of feature vectors. HMMs have been
used for classification of temporal sequences of EEG features
as described in [6], [7], [8]. While these studies showed that HMMs are promising classifiers for BCI systems, the Viterbi algorithm used for training HMMs is both computationally expensive and memory intensive [10]. Therefore, HMMs are undesirable for use in an adaptive algorithm. Instead, to design an adaptive probabilistic classifier, we use a generative model, namely, the Bayesian quadratic discriminant analysis (QDA) classifier. To include temporal dynamics in the classification, we propose to supplement the QDA classifier with a dynamic behavioral model based on a Markov decision process.
III. PROBABILISTIC CLASSIFICATION ALGORITHM
Probabilistic classifiers predict a probability distribution over the classes instead of predicting only the most likely class. To predict the probability of a class label $C_k$ from the feature vector $x$, we use training data to learn a model for the posterior class probability $P(C_k \mid x)$. A subsequent decision stage uses these posterior class probabilities to assign class labels. Generative models first determine the class-conditional probabilities $P(x \mid C_k)$ for each class $C_k$ and also assume prior class probabilities $P(C_k)$. They then use Bayes' theorem,

$$P(C_k \mid x) = \frac{P(x \mid C_k)\,P(C_k)}{P(x)}, \tag{1}$$

to estimate the posterior class probabilities $P(C_k \mid x)$. The denominator $P(x) = \sum_j P(x \mid C_j)\,P(C_j)$ is a normalization constant.
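Bayes' theorem (1) can be evaluated numerically as in the minimal sketch below; it is an illustration, not the authors' code. It works in log space and computes the normalizer $P(x)$ with the log-sum-exp trick, a standard precaution we assume here because multivariate densities can underflow. The function name `posterior_from_logs` is hypothetical.

```python
import numpy as np

def posterior_from_logs(log_likelihood, log_prior):
    """Posterior P(C_k | x) from log P(x | C_k) and log P(C_k), per Bayes' theorem (1)."""
    log_joint = np.asarray(log_likelihood, dtype=float) + np.asarray(log_prior, dtype=float)
    # log P(x) = log sum_k P(x | C_k) P(C_k), computed stably with log-sum-exp
    log_evidence = np.logaddexp.reduce(log_joint)
    return np.exp(log_joint - log_evidence)

# Equal likelihoods leave the prior unchanged; unequal ones reweight it.
print(posterior_from_logs(np.log([0.2, 0.2]), np.log([0.3, 0.7])))  # approx. [0.3, 0.7]
print(posterior_from_logs(np.log([0.1, 0.3]), np.log([0.5, 0.5])))  # approx. [0.25, 0.75]
```

Because only the ratio of the joint terms matters, very small log-likelihoods (e.g., around -1000) still yield a valid posterior instead of 0/0.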
We consider generative models in this work and incorporate dynamic characteristics through prior class probabilities based on a Markov decision process, as discussed in Section III-B. In Section III-A, we provide the mathematical
foundations for the QDA classifier as well as an adaptive
implementation of it based on [11].
A. Adaptive Quadratic Discriminant Analysis Classifier
A Quadratic Discriminant Analysis (QDA) classifier uses a generative approach for classification. The posterior probability that a point $x$ belongs to class $C_k$ is calculated using (1) from the product of the prior probability $P(C_k)$ and the multivariate normal density $P(x \mid C_k)$ [12]. The density function of the multivariate normal distribution with mean $\mu_k$ and covariance $\Sigma_k$ at a point $x$ is

$$P(x \mid C_k) = \frac{1}{\sqrt{\lvert 2\pi\Sigma_k \rvert}} \exp\!\left(-\frac{1}{2}(x-\mu_k)^{T}\Sigma_k^{-1}(x-\mu_k)\right), \tag{2}$$

where $\lvert\cdot\rvert$ denotes the determinant [12]. QDA assigns $x$ to the class that maximizes the posterior probability, i.e.,

$$\hat{C} = \arg\max_{i=1,\dots,K} \hat{P}(C_i \mid x). \tag{3}$$
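A minimal batch sketch of (2) and (3) follows, using the maximum likelihood estimates of $\mu_k$ and $\Sigma_k$ and the class-frequency priors described in the next paragraph. It is illustrative only: `fit_qda`, `log_gaussian`, and `predict_qda` are hypothetical names, the logarithm of (2) is used for numerical stability, and $P(x)$ is dropped from the arg max because it is the same for every class.

```python
import numpy as np

def fit_qda(X, y):
    """Per-class MLE mean, covariance (1/n normalization) and sample-frequency prior."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        sigma = Xc.T @ Xc / len(Xc) - np.outer(mu, mu)  # (1/n) sum x x^T - mu mu^T
        params[c] = (mu, sigma, len(Xc) / len(X))
    return params

def log_gaussian(x, mu, sigma):
    """Logarithm of the multivariate normal density (2)."""
    d = x - mu
    _, logdet = np.linalg.slogdet(2 * np.pi * sigma)  # log |2 pi Sigma|
    return -0.5 * (logdet + d @ np.linalg.solve(sigma, d))

def predict_qda(x, params):
    """MAP rule (3): maximize log P(x | C_k) + log P(C_k) over classes."""
    scores = {c: log_gaussian(x, mu, s) + np.log(p) for c, (mu, s, p) in params.items()}
    return max(scores, key=scores.get)

# Toy usage with two well-separated synthetic classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y = np.repeat([0, 1], 200)
params = fit_qda(X, y)
print(predict_qda(np.array([0.0, 0.0]), params), predict_qda(np.array([5.0, 5.0]), params))  # 0 1
```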
Therefore, to train a QDA classifier, we need to estimate the means ($\mu_k$) and covariance matrices ($\Sigma_k$) for each class label. These are given by the maximum likelihood estimates (MLE) $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^{T} - \hat{\mu}\hat{\mu}^{T}$, computed from the $n$ training samples of each class. Moreover, the prior probability of each class, $P(C_k)$, is estimated using the sample frequency of that class in the training data. The parameters are typically estimated offline from a training dataset and then used for prediction. In contrast, the adaptive implementation of the QDA classifier developed by Anagnostopoulos et al. [11] uses online learning with a forgetting factor $\lambda$, as shown in (4).
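$$\hat{\mu}_t = \left(1 - \frac{1}{n_t}\right)\hat{\mu}_{t-1} + \frac{1}{n_t}x_t, \qquad \hat{\mu}_0 = 0 \tag{4a}$$

$$\hat{\Pi}_t = \left(1 - \frac{1}{n_t}\right)\hat{\Pi}_{t-1} + \frac{1}{n_t}x_t x_t^{T}, \qquad \hat{\Pi}_0 = 0 \tag{4b}$$

$$\hat{\Sigma}_t = \hat{\Pi}_t - \hat{\mu}_t\hat{\mu}_t^{T} \tag{4c}$$

$$n_t = \lambda_{t-1} n_{t-1} + 1 \tag{4d}$$

Here, the subscript $t$ denotes the value of a variable at the $t$-th discrete time step. The prior probabilities can be calculated as

$$P(C_k)_t = \left(1 - \frac{1}{n_t}\right)P(C_k)_{t-1} + \frac{1}{n_t}\,\mathbb{I}(C_t = C_k), \tag{5}$$

where $C_t$ is the class label of the sample at time $t$, and $\mathbb{I}(x = k)$ is the indicator function, equal to 1 when the value of $x$ equals that of $k$ and 0 otherwise. A complete derivation can be found in [11].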
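The recursions (4) and (5) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation of [11]: the class names are hypothetical, $n_0 = 0$ is assumed (the text does not state it), a constant forgetting factor `lam` stands in for $\lambda_{t-1}$, and in a classifier one `RecursiveGaussian` would be kept per class and updated only with samples of that class (our reading of how (4) is applied). With `lam = 1`, $n_t = t$ and the recursions reproduce the batch MLE mean and $(1/n)$-normalized covariance, which the usage lines check.

```python
import numpy as np

class RecursiveGaussian:
    """Online mean/covariance of one class via (4a)-(4d)."""

    def __init__(self, dim, lam=1.0):
        self.lam = lam                  # forgetting factor (assumed constant here)
        self.n = 0.0                    # effective sample size, n_0 = 0 (assumption)
        self.mu = np.zeros(dim)         # mu_0 = 0
        self.pi = np.zeros((dim, dim))  # Pi_0 = 0, running second moment

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n = self.lam * self.n + 1.0                  # (4d)
        w = 1.0 / self.n
        self.mu = (1 - w) * self.mu + w * x               # (4a)
        self.pi = (1 - w) * self.pi + w * np.outer(x, x)  # (4b)

    @property
    def sigma(self):
        return self.pi - np.outer(self.mu, self.mu)       # (4c)

class RecursivePrior:
    """Online class priors via (5): exponentially weighted label frequencies."""

    def __init__(self, n_classes, lam=1.0):
        self.lam, self.n = lam, 0.0
        self.p = np.zeros(n_classes)

    def update(self, label):
        self.n = self.lam * self.n + 1.0
        indicator = np.zeros_like(self.p)
        indicator[label] = 1.0                            # I(C_t = C_k)
        self.p = (1 - 1 / self.n) * self.p + indicator / self.n

# With lam = 1 the recursions match the batch estimates.
data = np.random.default_rng(1).normal(size=(50, 3))
g = RecursiveGaussian(dim=3)
for x in data:
    g.update(x)
print(np.allclose(g.mu, data.mean(axis=0)), np.allclose(g.sigma, np.cov(data.T, bias=True)))  # True True
```

With `lam < 1`, $n_t$ converges to $1/(1-\lambda)$, so older samples are down-weighted geometrically rather than discarded.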
B. Dynamic probabilistic model for prior probability
Apart from model adaptation, the adaptive QDA classifier
is static in nature; that is, the classifier only considers the
present data without considering the dynamics of the data.
Although past data could be included as part of $x$, doing so would significantly increase the number of parameters to be estimated. Instead, we propose a dynamic probabilistic model to estimate the prior probability $P(C_k)$, which supplements the estimation of the posterior probability $P(C_k \mid x)$ using (1). The input to this model could include variables from $x$ and/or other variables that are not used by the classifier. The
modeling frameworks for this dynamic probabilistic model
can include state space models (SSM), Markov decision
processes (MDP), or HMMs. Here, we use an MDP to model the prior probability for classification.
An MDP is a 5-tuple $(\mathcal{S}, \mathcal{A}, T, R, \gamma)$, with a finite set of states $\mathcal{S}$, a finite set of actions $\mathcal{A}$, a state transition probability function $T(s' \mid s, a) = P[S_{t+1} = s' \mid S_t = s, A_t = a]$, a reward function $R$, and a discount factor $\gamma \in [0, 1]$. MDPs are typically used in reinforcement learning to identify the best policy, i.e., the one that maximizes the reward. Policy identification is outside the scope of this work; therefore, for our application of probabilistic dynamic modeling, the reward function $R$ and the reward discount factor $\gamma$ are not considered.
If $T(s' \mid s, a)$ is not known, it can be estimated empirically from data consisting of actions and the corresponding state transitions, using the MLE

$$\hat{T}(i, j, k) = \frac{N_{ijk}}{\sum_{j} N_{ijk}}, \qquad N_{ijk} = \sum_{t=1}^{n} \mathbb{I}(s_t = i)\,\mathbb{I}(s_{t+1} = j)\,\mathbb{I}(a_t = k), \tag{6}$$

where $\mathbb{I}(s_t = i)$ is the indicator function, equal to 1 when the state $s$ at time $t$ is $i$ and 0 otherwise; the other two indicator functions are defined similarly. Once the state transition probability function is known, the probability of moving from the present state $s$ to the next state $s'$ under action $a$ is given by $T(s' \mid s, a)$. Further, given the sequence of actions $a_t, a_{t+1}, \dots, a_{t+i}, \dots, a_{t+n-1}$, the $n$-step-ahead transition matrix $T_n$ can be calculated as

$$T_n = \prod_{i=0}^{n-1} T(:, :, a_{t+i}), \tag{7}$$

where the matrices are multiplied in order of increasing $i$. The $n$-step-ahead state probabilities $p_n$ can then be calculated as $p_n = p_0 T_n$, where $p_0$ is the row vector of initial state probabilities. These probabilities $p_n$ are used as the prior probability $P(C_k)$ in (1), with each state $s$ of the MDP corresponding to a label $C_k$ in the QDA classifier.
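The estimator (6) and the propagation (7) are short enough to sketch directly. This is an illustrative sketch, not the authors' code: states and actions are assumed to be integer-coded, the action at time $t$ drives the transition from $s_t$ to $s_{t+1}$, counts from several participants are simply summed (one possible way to aggregate data), and state-action pairs that never occur are left as all-zero rows. The function names are hypothetical.

```python
import numpy as np

def estimate_transitions(episodes, n_states, n_actions):
    """MLE (6): T[i, j, k] = N_ijk / sum_j N_ijk, pooled over all episodes.

    Each episode is a pair (states, actions) of equal-length integer sequences.
    """
    counts = np.zeros((n_states, n_states, n_actions))
    for states, actions in episodes:
        for s, s_next, a in zip(states[:-1], states[1:], actions[:-1]):
            counts[s, s_next, a] += 1                      # N_ijk
    totals = counts.sum(axis=1, keepdims=True)             # sum over next state j
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def n_step_prior(p0, T, actions):
    """(7): p_n = p_0 T(:,:,a_t) T(:,:,a_{t+1}) ... T(:,:,a_{t+n-1})."""
    p = np.asarray(p0, dtype=float)
    for a in actions:
        p = p @ T[:, :, a]                                 # row vector times matrix
    return p

T = estimate_transitions([([0, 1, 1, 0, 1], [1, 1, 0, 1, 0])], n_states=2, n_actions=2)
print(T[:, :, 1])                        # rows: current state, columns: next state
print(n_step_prior([0.5, 0.5], T, [1]))  # [0. 1.]
```

Multiplying the row vector step by step is equivalent to forming $T_n$ first, but avoids building the matrix product explicitly.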
IV. CLASSIFICATION OF HUMAN TRUST IN HMI
In this section, we describe the classification of human
trust behavior using psychophysiological measurements of
participants, specifically EEG, along with their behavioral
responses. We used behavioral responses to model the prior probability $P(C_k)$ as described in Section III-B. The features extracted from the psychophysiological measurements were then used as the input $x$ for the adaptive QDA model described in Section III-A. The framework for our adaptive classification model for human trust is shown in Fig. 1.
A. Methods and Procedures
In our previous work [13], [14], [15], we developed an
experiment to elicit human trust dynamics in a simulated
autonomous system. The participants interacted with a com-
puter interface in which they were told that they would
be driving a car equipped with an image-based obstacle
detection sensor. The sensor would detect obstacles on the
road in front of the car, and the participant would need to evaluate the reports of the detection algorithm and choose to either trust or distrust each report based on their experience with the algorithm. The study used a within-subjects design with
respect to trust wherein both behavioral and psychophysi-
ological data were collected. We used the data to estimate
and validate the classification model for each participant. A
detailed description of the study design and methods can be
found in [14], [15].
Five hundred eighty-one participants (340 males, 235 females, and 6 unknown), recruited using Amazon Mechanical Turk [16], participated in our study online. Each participant was compensated $0.50 and provided consent electronically. The Institutional Review Board at Purdue University approved the study. These data consisted only of behavioral responses and were used to estimate the MDP model parameters.
Forty-eight adults between 18 and 46 years of age (mean:
25.0 years old, standard deviation: 6.9 years) from West
Lafayette, Indiana (USA) were recruited using fliers and
email lists and participated in an in-lab study. All participants
were compensated at a rate of $15/hr. The group of participants was diverse with respect to age, professional field, and cultural background (i.e., nationality). Psychophysiological data along with behavioral data were collected from these participants and used for modeling and validation of the proposed trust classification algorithm. We removed data for three participants who had anomalous EEG spectra, possibly due to bad channels or dislocation of EEG electrodes during the study, resulting in 45 participants to analyze.
B. Trust behavior modeling using MDP
At each trial, each participant was presented with a stimulus (obstacle detected or clear road), to which they had to respond 'trust' or 'distrust' based on their previous experience (reliable or faulty trials), which was shaped by the feedback they received about the sensor after each response. For this experiment, we define human trust behavior as the process to be modeled using an MDP, as follows:

• The trust decision of the human forms the finite set of states, i.e., $\mathcal{S} = \{\text{Distrust}, \text{Trust}\}$.
• The decision process of human trust is influenced by the actions of the machine, which determine the machine performance (experience); these form the finite set of actions, i.e., $\mathcal{A} = \{\text{Reliable}, \text{Faulty}\}$.
• The experience from trial $t$ acts as the action for the new process state at $t+1$. Therefore, the human trust state $s$ at $t$ moves to a new state $s'$ at $t+1$ due to the action (i.e., machine performance or experience) at $t$.
• The state transition probability function $T(s' \mid s, a)$ can be represented as a $2 \times 2 \times 2$ array, such that $T(i, j, k)$ represents the transition probability from the $i$th state to the $j$th state given the $k$th action. Therefore, each slice $T(:, :, k)$ is the state transition matrix for the $k$th action.
[Figure: block diagram. The experience (machine performance) drives a Markov decision process built from behavioral responses, with states Distrust and Trust and transition probabilities labeled p_F-D, p_F-T, p_R-D, p_R-T and their complements for Faulty and Reliable actions, which outputs P(Trust). Feature extraction from psychophysiological data gives x, and a multivariate normal distribution gives the conditional probability P(x|Trust). Bayesian probability estimation combines the two into P(Trust|x).]
Fig. 1. A framework for adaptive probabilistic classification of human dynamic trust behavior. A Markov decision process model is used for estimating prior probability using the behavioral responses of participants. Psychophysiological measurements from the participants are used for estimating the conditional probability for each trust state.

We estimated the transition probability function and the initial state probabilities using the behavioral data collected from Amazon Mechanical Turk. We used the aggregated data of 581 participants for the estimation and therefore assumed that a single transition probability function is representative of general human trust behavior. The estimated probability matrices are

$$T(s, s', a = \text{Faulty}) = \begin{bmatrix} 0.5343 & 0.4857 \\ 0.3131 & 0.6869 \end{bmatrix}, \qquad T(s, s', a = \text{Reliable}) = \begin{bmatrix} 0.3177 & 0.6823 \\ 0.1191 & 0.8809 \end{bmatrix}, \tag{8}$$

where $s$ and $s'$ are the initial and final states, respectively (rows index $s$ and columns index $s'$, each ordered as $\mathcal{S} = \{\text{Distrust}, \text{Trust}\}$). For example, after a reliable trial, a participant in the Trust state remains in Trust with probability 0.8809 and transitions to Distrust with probability 0.1191. The estimated initial state probabilities for Distrust and Trust are

$$p_0 = \begin{bmatrix} 0.1985 & 0.8015 \end{bmatrix}. \tag{9}$$
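Using the published estimates (8) and (9), the MDP prior for a sequence of trial outcomes can be computed with (7), as in the minimal sketch below (names and the integer action encoding are hypothetical; this is not the authors' code). One caveat: as printed, the first row of the Faulty matrix in (8) sums to 1.02 rather than 1, so this sketch renormalizes each row, an assumption made only to keep the prior a valid probability distribution. The Reliable rows already sum to 1 and are unaffected.

```python
import numpy as np

DISTRUST, TRUST = 0, 1   # state order used in (8) and (9)
FAULTY, RELIABLE = 0, 1  # action codes (hypothetical encoding)

# T[s, s', a]: rows are the current state s, columns the next state s'.
T = np.zeros((2, 2, 2))
T[:, :, FAULTY] = [[0.5343, 0.4857], [0.3131, 0.6869]]
T[:, :, RELIABLE] = [[0.3177, 0.6823], [0.1191, 0.8809]]
T /= T.sum(axis=1, keepdims=True)  # renormalize rows (assumption, see lead-in)

p0 = np.array([0.1985, 0.8015])  # initial [P(Distrust), P(Trust)] from (9)

def mdp_prior(experiences):
    """Prior [P(Distrust), P(Trust)] after the given trial outcomes, via p_n = p_0 T_n."""
    p = p0.copy()
    for a in experiences:
        p = p @ T[:, :, a]
    return p

print(mdp_prior([RELIABLE]))  # approximately [0.1585, 0.8415]
```

A faulty trial lowers the prior probability of trust relative to $p_0$, while a reliable trial raises it, which is the temporal effect the MDP contributes to (1).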
C. Adaptive QDA model using Psychophysiological Data
Adaptive implementation of the classification algorithm
inherently requires processing the data and estimating trust
in real-time. Therefore, we need to continuously extract
features from psychophysiological measurements, which is
achieved by continuously considering short segments of
signals for calculations. We divided the entire duration of
the study into multiple 4-second epochs (segments) with 50%
overlap between consecutive epochs. We assume that the decisive cognitive activity occurs after the participant sees the feedback on their previous response. Therefore, for training the classifier, we considered only the epochs between the beginning of each trial and the participant's response (trust/distrust). All epochs were used for prediction.
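The epoching scheme above (4-second windows with 50% overlap, with training restricted to epochs between trial onset and response) can be sketched as follows. This is a sketch under assumptions: the sampling rate `fs` is a parameter because it is not given in this section, an epoch counts as a training epoch only if it lies entirely between a trial's onset and the participant's response, and all names are hypothetical.

```python
import numpy as np

def make_epochs(signal, fs, epoch_s=4.0, overlap=0.5):
    """Split a (channels, samples) array into overlapping fixed-length epochs.

    Returns the epochs, shaped (n_epochs, channels, window), and their start samples.
    """
    signal = np.atleast_2d(signal)
    win = int(round(epoch_s * fs))
    step = int(round(win * (1 - overlap)))  # 50% overlap -> hop of half a window
    starts = np.arange(0, signal.shape[1] - win + 1, step)
    if starts.size == 0:                    # recording shorter than one epoch
        return np.empty((0, signal.shape[0], win)), starts
    return np.stack([signal[:, s:s + win] for s in starts]), starts

def training_mask(starts, win, onsets, responses):
    """True for epochs lying entirely between some trial onset and its response (sample indices)."""
    starts = np.asarray(starts)
    ends = starts + win
    mask = np.zeros(starts.shape, dtype=bool)
    for onset, response in zip(onsets, responses):
        mask |= (starts >= onset) & (ends <= response)
    return mask

# Hypothetical 7-channel, 10 s recording at an assumed 256 Hz.
epochs, starts = make_epochs(np.zeros((7, 2560)), fs=256)
print(epochs.shape)  # 4 epochs of 7 x 1024 samples, starting at samples 0, 512, 1024, 1536
print(training_mask(starts, 1024, onsets=[0], responses=[1600]))  # only the first two qualify
```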
We extracted an exhaustive set of potential features from
the data for each epoch. We then reduced the dimension of
this feature set to include only the statistically significant
variables of trust. This reduced feature set was used for
classifier modeling and validation.
1) Feature Extraction: For each of the seven EEG channels (Fz, C3, Cz, C4, P3, POz, and P4), we extracted both frequency- and time-domain features from each epoch as described in [15]. For frequency-domain features, we decomposed each channel's data into four spectral bands, namely delta (0-4 Hz), theta (4-8 Hz), alpha (8-16 Hz), and beta (16-32 Hz), and calculated the mean, variance, and signal energy of each band for each epoch. This introduced 84 (7×4×3) potential features. For time-domain features, we included the mean, variance, peak-to-peak value, mean frequency, root-mean-square, and signal energy of each channel for each epoch, thus introducing 42 (7×6) more potential features. Furthermore, to consider the interaction between different regions of the brain, we calculated the correlation between each pair of channels for each epoch, adding another 21 features.

TABLE I
FEATURES USED AS INPUT VARIABLES FOR TRUST CLASSIFICATION

| No. | Feature | Domain |
|---|---|---|
| 1 | Mean Frequency - P4 | Time |
| 2 | Mean Frequency - C4 | Time |
| 3 | Mean Frequency - P3 | Time |
| 4 | Peak-to-peak - C4 | Time |
| 5 | Peak-to-peak - C3 | Time |
| 6 | Root Mean Square - Fz | Time |
| 7 | Energy - Fz | Time |
| 8 | Variance - Fz | Time |
| 9 | Correlation - C4 & P4 | Time |
| 10 | Energy of Beta Band - P3 | Frequency |
| 11 | Energy of Beta Band - Cz | Frequency |
| 12 | Energy of Beta Band - C3 | Frequency |
| 13 | Variance of Beta Band - P3 | Frequency |
| 14 | Variance of Beta Band - Cz | Frequency |
| 15 | Variance of Beta Band - C3 | Frequency |
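The 147 candidate features described in 1) can be computed per epoch roughly as below. This is a sketch under assumptions that go beyond the text: the band decomposition zeroes FFT bins outside each band (the text does not say which filter was used), signal energy is taken as the sum of squared samples, and mean frequency as the power-weighted average frequency. Function names are hypothetical.

```python
import numpy as np
from itertools import combinations

BANDS = [(0, 4), (4, 8), (8, 16), (16, 32)]  # delta, theta, alpha, beta (Hz)

def band_limit(x, fs, lo, hi):
    """Keep only FFT components with lo <= f < hi (an assumed decomposition)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spectrum, n=len(x))

def mean_frequency(x, fs):
    """Power-weighted mean frequency of one channel (assumed definition)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return np.sum(freqs * power) / np.sum(power)

def epoch_features(epoch, fs):
    """Feature vector for one (channels, samples) epoch: 7 channels give 84 + 42 + 21 = 147."""
    feats = []
    for x in epoch:  # frequency domain: mean, variance, energy per band
        for lo, hi in BANDS:
            xb = band_limit(x, fs, lo, hi)
            feats += [xb.mean(), xb.var(), np.sum(xb ** 2)]
    for x in epoch:  # time domain: six statistics per channel
        feats += [x.mean(), x.var(), np.ptp(x), mean_frequency(x, fs),
                  np.sqrt(np.mean(x ** 2)), np.sum(x ** 2)]
    for i, j in combinations(range(len(epoch)), 2):  # pairwise channel correlation
        feats.append(np.corrcoef(epoch[i], epoch[j])[0, 1])
    return np.array(feats)

epoch = np.random.default_rng(2).normal(size=(7, 1024))  # hypothetical 4 s epoch at 256 Hz
print(epoch_features(epoch, fs=256).shape)  # (147,)
```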
2) Feature Selection: The EEG data resulted in 147 (84+42+21) potential features. To avoid “the curse of dimensionality” [5], these features were reduced to a smaller feature set using a filter-approach feature selection algorithm [12]. Participants were randomly divided into two sets: a training set of 23 participants and a validation set of 22 participants. Using only the training-set participants' data, we selected the best 15 features using the Scalar Feature Selection technique [12], [13]. The Fisher Discriminant Ratio (FDR) was used as the class separability criterion, with a penalty proportional to the cross-correlation between features. This penalty favors features that are minimally correlated with those already selected, thereby reducing redundancy between features. The selected features are shown in Table I.
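Scalar feature selection with an FDR criterion and a cross-correlation penalty can be sketched as below, broadly following the greedy scheme described in [12]. This is an assumption-laden sketch: the weights `a1` and `a2` are hypothetical (they are not reported here), FDR uses the two-class form $(\mu_0-\mu_1)^2/(\sigma_0^2+\sigma_1^2)$, the penalty is the average absolute correlation with the features already selected, and the function names are illustrative.

```python
import numpy as np

def fisher_discriminant_ratio(X, y):
    """Two-class FDR per feature: (mu_0 - mu_1)^2 / (var_0 + var_1)."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0))

def scalar_feature_selection(X, y, n_select, a1=0.5, a2=0.5):
    """Greedy selection trading off FDR against correlation with already-chosen features."""
    fdr = fisher_discriminant_ratio(X, y)
    rho = np.abs(np.corrcoef(X, rowvar=False))  # |cross-correlation| between features
    selected = [int(np.argmax(fdr))]            # first pick: highest FDR alone
    while len(selected) < n_select:
        penalty = rho[selected].mean(axis=0)    # mean |rho| to the features chosen so far
        score = a1 * fdr - a2 * penalty
        score[selected] = -np.inf               # never select a feature twice
        selected.append(int(np.argmax(score)))
    return selected

# Synthetic check: features 0 and 1 are near-duplicates, 2 is informative, 3 is noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 2000)
f0 = 2.0 * y + rng.normal(size=y.size)
X = np.column_stack([f0, f0 + 0.01 * rng.normal(size=y.size),
                     2.0 * y + rng.normal(size=y.size), rng.normal(size=y.size)])
print(scalar_feature_selection(X, y, n_select=2))  # two of {0, 1, 2}, never both 0 and 1
```

Without the penalty, the second pick would simply be the next-highest FDR, which here is the redundant copy of the first feature.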
[Figure: two panels, (a) Group 1 and (b) Group 2, plotting trust level (probability of trust response, 0.4 to 1) against trial number (1 to 100).]
Fig. 2. Participants' trust level (blue dots). Faulty trials are highlighted in gray, and black lines mark the breaks between databases.

3) Modeling and validation: The selected feature set was extracted from the EEG data to construct the input $x$ used to evaluate $P(x \mid C_k)$ in (2). Note that for each class label $C_k$, $\mu_k \in \mathbb{R}^{n \times 1}$ and $\Sigma_k \in \mathbb{R}^{n \times n}$, where $n$ is the cardinality of the feature set. Because $\Sigma_k$ is symmetric, $n(n+3)/2$ parameters ($n$ for the mean and $n(n+1)/2$ for the covariance) need to be estimated for each class label. This is a relatively large number of parameters given our number of data points. For example, for a two-class problem with 15 features, 270 parameters must be estimated from approximately 270 data points in our study. This leads to significant variations in the estimated covariance matrices and often to ill-conditioned matrices that cannot be inverted. This is particularly challenging during the initial estimation period, when even fewer data are available. Therefore, to avoid inverting ill-conditioned matrices and to reduce the number of parameters to be estimated, we assume that the features are independent of each other. This results in diagonal covariance matrices that are easily invertible. Furthermore, it reduces the number of parameters to be estimated to $2n$ for each class label (i.e., 30 per class, or 60 in total, in the example above).
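The parameter counts above, and the diagonal-covariance density that results from assuming independent features, can be checked with a few lines of Python. This is an illustrative sketch; the function names and the small variance floor `var_floor` (a guard against zero variance early in adaptation) are our assumptions, not part of the described method.

```python
import numpy as np

def n_params_full(n):
    """Mean (n) plus symmetric covariance (n(n+1)/2) = n(n+3)/2 per class."""
    return n * (n + 3) // 2

def n_params_diag(n):
    """Mean (n) plus diagonal variances (n) = 2n per class."""
    return 2 * n

def log_gaussian_diag(x, mu, var, var_floor=1e-6):
    """Log of (2) when Sigma_k is diagonal: a sum of independent 1-D normal log-densities."""
    var = np.maximum(np.asarray(var, dtype=float), var_floor)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

n, classes = 15, 2
print(classes * n_params_full(n), classes * n_params_diag(n))  # 270 60
```

Since the diagonal form needs no matrix inversion, it stays well defined even when only a handful of samples per class have been observed.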
We included psychophysiological measurements in order
to identify any latent indicators of trust and distrust. We
hypothesized that the trust level would be high in reliable
trials and be low in faulty trials, which was validated using
responses collected from 581 online participants via Amazon
Mechanical Turk [16] as shown in Fig. 2 [14]. Therefore,
data from reliable trials were labeled as trust, and data from
faulty trials were labeled as distrust. In the next section, we
use these features extracted from psychophysiological data,
along with the dynamic behavioral model derived in Section
III-B, to implement the proposed classification algorithm.
V. RESULTS AND DISCUSSIONS
We implemented the Adaptive Quadratic Discriminant Analysis classifier with Markov Decision Process-based prior probability (hereafter called AQDA-MDP) using the selected features $x$ shown in Table I, class labels $C_k \in \{\text{Distrust}, \text{Trust}\}$, the state transition matrices given in (8), and the initial state probabilities given in (9). For comparison, we also consider the Adaptive Quadratic Discriminant Analysis classifier alone (hereafter called AQDA), with the prior probability estimated using (5). The forgetting factor $\lambda$ was set to 1, i.e., no forgetting was used. Both algorithms were used for online training and validation of trust classification models from the real-time data of each participant individually.
[Figure: predicted probability of trust response (0 to 1) versus trial number for AQDA and AQDA-MDP, two panels.]
Fig. 3. Training-set participants' trust level predictions using AQDA-MDP and AQDA algorithms: (a) participant 5 and (b) participant 7 in the training set. Faulty trials are highlighted in gray.

[Figure: predicted probability of trust response (0 to 1) versus trial number for AQDA and AQDA-MDP, two panels.]
Fig. 4. Validation-set participants' trust level predictions using AQDA-MDP and AQDA algorithms: (a) participant 36 and (b) participant 34 in the validation set. Faulty trials are highlighted in gray.

The results for two training-set participants and two validation-set participants are shown in Fig. 3 and Fig. 4, respectively. Faulty trials are highlighted in gray, and reliable trials are shown in white. A high probability of trust is expected in reliable trials, and a low probability of trust is expected in faulty trials. To observe the benefits of adaptation and to compare the performance of the two models, we calculate the mean trial accuracy for each trial. Mean trial accuracy is the average, across participants, of the percentage of epochs in a given trial that are correctly classified. The variation of mean trial accuracy across trials for training-set and validation-set participants is shown in Fig. 5(a) and Fig. 5(b), respectively. The performance of the classifier is consistent between training-set and validation-set participants, which suggests that the selected features are capable of predicting trust behavior in participants who were not used for feature selection.
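Mean trial accuracy, as defined above, can be computed as in the sketch below. Assumptions: predictions are already hard labels (for example, the posterior probability of trust thresholded at 0.5, which is our assumption), each epoch carries the index of the trial it belongs to, and a trial with no epochs for a participant is skipped for that participant. Names are hypothetical.

```python
import numpy as np

def mean_trial_accuracy(predictions, labels, trial_ids, n_trials):
    """Per-trial accuracy (%) averaged over participants.

    predictions, labels, trial_ids: lists with one epoch-level array per participant.
    """
    per_participant = np.full((len(predictions), n_trials), np.nan)
    for p, (pred, lab, trial) in enumerate(zip(predictions, labels, trial_ids)):
        correct = np.asarray(pred) == np.asarray(lab)
        trial = np.asarray(trial)
        for t in range(n_trials):
            in_trial = trial == t
            if in_trial.any():  # skip trials with no epochs for this participant
                per_participant[p, t] = 100.0 * correct[in_trial].mean()
    return np.nanmean(per_participant, axis=0)  # average across participants

# Two hypothetical participants, two trials, two epochs per trial.
pred = [np.array([1, 1, 0, 0]), np.array([1, 1, 1, 1])]
true = [np.array([1, 0, 0, 0]), np.array([1, 1, 0, 0])]
trials = [np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1])]
print(mean_trial_accuracy(pred, true, trials, n_trials=2))  # [75. 50.]
```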
[Figure: mean trial accuracy (%) versus trial (0 to 100) for AQDA and AQDA-MDP; panels (a) training-set participants and (b) validation-set participants.]
Fig. 5. Mean trial accuracy for AQDA and AQDA-MDP algorithms.

The accuracy of the classifier is high for the first 20 trials (see Fig. 5). This is a consequence of the experiment design: initially, data are available for only one of the classes (either trust or distrust), which biases the classifier towards the initial training data. Consequently, the classifier accuracy just after the 20th trial is poor, and it takes approximately 4-5 trials to eliminate the bias effect and accumulate a sufficient sample size for both classes. After the 55th trial, the classifier prediction accuracy decreases, as shown in Fig. 5. One potential reason is improper class labeling of the data. We assumed that the participants trusted the obstacle detection sensor during the reliable trials and distrusted it during the faulty trials. However, in the later trials, during which the sensor reliability changes more rapidly, participants may have been unsure about the system performance. Therefore, our class labeling assumption may not hold for data collected during these trials. As a result, the adaptive algorithm trains itself on incorrect labels in the later trials, resulting in accuracy between approximately 40% and 65%, as shown in Fig. 5. A better way to label the trials as trust or distrust could improve the performance of the classifier and is the subject of future work. The mean trial accuracy for AQDA-MDP is, in general, higher than that of AQDA. Despite the limitations of class labeling in our experiment, the proposed algorithm systematically combines two different types of modeling frameworks, a static QDA classifier and a dynamic MDP, using a Bayesian approach to yield a classifier with improved accuracy. More generally, this algorithm can be used for classification of other human behaviors measured using psychophysiological data and behavioral responses, as well as other dynamic processes characterized by data with non-stationary distributions.
VI. CONCLUSION
To achieve symbiotic human-machine interactions, hu-
man behavior modeling is of utmost importance. This can
be accomplished with classification algorithms using psy-
chophysiological measurements and behavioral responses of
humans. Traditional classification algorithms, however, do
not consider the temporal dynamics of human behavior
and the non-stationary characteristics of psychophysiological
signals. In this paper, we described an adaptive probabilistic classification algorithm for human behavior that uses a dynamic MDP model to incorporate these temporal dynamics. First, we estimated the parameters of an MDP using behavioral responses. We then extracted an exhaustive set of features from the psychophysiological data of 23 training-set participants and reduced the dimension of the feature space using scalar feature selection. We trained a real-time adaptive QDA-based classifier online using data from these 23 participants. The classifiers were validated against human-subject data from another 22 validation-set participants, and the classifier augmented with a dynamic MDP achieved improved accuracy. Future work will include comparing the performance of the proposed classification algorithm against other dynamic classifiers.
ACKNOWLEDGMENT
The authors are extremely grateful and sincerely acknowledge the guidance and help of Dr. Wan-Lin Hu in the design of experiments and collection of psychophysiological data.
REFERENCES
[1] S. Kotsiantis, “Supervised Machine Learning: A Review of Classification Techniques,” Informatica, vol. 31, pp. 249–268, 2007.
[2] D. Tan and A. Nijholt, Brain-Computer Interfaces and Human-
Computer Interaction. London: Springer London, 2010, pp. 3–19.
[3] C. M. Jonker and J. Treur, “Formal analysis of models for the
dynamics of trust based on experiences,” in European Workshop on
Modelling Autonomous Agents in a Multi-Agent World. Springer
Berlin Heidelberg, 1999, pp. 221–231.
[4] M. Hoogendoorn, S. W. Jaffry, P.-P. Van Maanen, and J. Treur,
“Modeling and validation of biased human trust,” in Proceedings of the
2011 IEEE/WIC/ACM International Conferences on Web Intelligence
and Intelligent Agent Technology - Volume 02. IEEE Computer
Society, 2011, pp. 256–263.
[5] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi,
“A review of classification algorithms for EEG-based brain–computer
interfaces,” Journal of Neural Engineering, vol. 4, no. 2, pp. R1–R13,
June 2007.
[6] B. Obermaier, C. Guger, C. Neuper, and G. Pfurtscheller, “Hidden
Markov models for online classification of single trial EEG data,”
Pattern recognition letters, vol. 22, no. 12, pp. 1299–1309, 2001.
[7] B. Obermaier, C. Neuper, C. Guger, and G. Pfurtscheller, “Information transfer rate in a five-classes brain-computer interface,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 9, no. 3, pp. 283–288, 2001.
[8] F. Cincotti, A. Scipione, A. Timperi, D. Mattia, A. Marciani, J. Millan,
S. Salinari, L. Bianchi, and F. Babiloni, “Comparison of different
feature classifiers for brain computer interfaces,” in Neural Engineer-
ing, 2003. Conference Proceedings. First International IEEE EMBS
Conference on. IEEE, 2003, pp. 645–647.
[9] A. Y. Ng and M. I. Jordan, “On discriminative vs. generative
classifiers: A comparison of logistic regression and naive Bayes,”
in Advances in Neural Information Processing Systems 14, T. G.
Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press, 2002,
pp. 841–848.
[10] G. D. Forney, “The Viterbi algorithm,” Proceedings of the IEEE,
vol. 61, no. 3, pp. 268–278, 1973.
[11] C. Anagnostopoulos, D. K. Tasoulis, N. M. Adams, N. G. Pavlidis, and
D. J. Hand, “Online linear and quadratic discriminant analysis with
adaptive forgetting for streaming classification,” Statistical Analysis
and Data Mining, vol. 5, no. 2, pp. 139–166, Apr. 2012.
[12] S. Theodoridis and K. Koutroumbas, Pattern Recognition, ser. Pattern
Recognition Series. Elsevier Science, 2006.
[13] W.-L. Hu, K. Akash, N. Jain, and T. Reid, “Real-Time Sensing of
Trust in Human-Machine Interactions,” in 1st IFAC Conference on
Cyber-Physical & Human-Systems, Florianopolis, Brazil, 2016.
[14] K. Akash, W.-L. Hu, T. Reid, and N. Jain, “Dynamic Modeling of
Trust in Human–Machine Interactions,” in 2017 American Control
Conference, Seattle, WA, 2017.
[15] K. Akash, W.-L. Hu, N. Jain, and T. Reid, “A Classification Model
for Sensing Human Trust in Machines Using EEG and GSR,” ACM
Transactions on Interactive Intelligent Systems, 2018. (In Press).
[16] Amazon, “Amazon Mechanical Turk,” 2005. [Online]. Available:
https://www.mturk.com/ [Accessed 22 February 2017]
This paper was published by Akash, Reid, and Jain in the Proceedings of the 2018 American Control Conference (ACC).