Sensors Model Student Self Concept in the Classroom
David G. Cooper1, Ivon Arroyo1, Beverly Park Woolf1, Kasia Muldner2,
Winslow Burleson2, and Robert Christopherson2
1University of Massachusetts, Department of Computer Science,
140 Governors Drive, Amherst MA 01003, USA,
2Arizona State University, School of Computing and Informatics,
Tempe AZ 85287, USA
Abstract. In this paper we explore findings from three experiments
that use minimally invasive sensors with a web based geometry tutor
to create a user model. Minimally invasive sensor technology is mature
enough to equip classrooms of up to 25 students with four sensors at the
same time while using a computer based intelligent tutoring system. The
sensors, which are on each student’s chair, mouse, monitor, and wrist,
provide data about posture, movement, grip tension, arousal, and facially
expressed mental states. This data may provide adaptive feedback to
an intelligent tutoring system based on an individual student’s affective
states. The experiments show that when sensor data supplements a user
model based on tutor logs, the model reflects a larger percentage of the
students’ self-concept than a user model based on the tutor logs alone.
The models are further expanded to classify four ranges of emotional
self-concept including frustration, interest, confidence, and excitement
with over 78% accuracy. The emotional predictions are a first step for
intelligent tutor systems to create sensor based personalized feedback
for each student in a classroom environment. Bringing sensors to our
children’s schools addresses real problems of students’ relationship to
mathematics as they are learning the subject.
1 Introduction
Traditionally, the User Model of an Intelligent Tutoring System (ITS) consists
of registration information with or without statistics about interactions with the
ITS [1,2]. Registration information often includes age, gender, class standing,
teacher, and other static information about learners. A limitation of this ap-
proach is that the only dynamic information that the ITS uses is based on the
performance of the students. With the use of non-invasive sensors, we have the
opportunity to enhance user models with sensor data that is a natural byproduct
of the student’s interaction with the ITS. Though the cost of such sensors has
previously made them less accessible for classroom deployment, recent strides
have been made to address this limitation. Arizona State University (ASU), in
collaboration with the Affective Computing Group (ACG) at MIT, has devel-
oped 30 lower-cost versions of four sensors that have shown promise for their
ability to detect elements of students’ emotional expression. These sensors in-
clude a pressure sensitive mouse, a pressure sensitive chair, a skin conductance
wristband, and a camera based facial expression recognition system that incor-
porates a computational framework that aims to infer a user’s state of mind. At
UMass Amherst, we have built on ASU’s work by integrating the sensors and
an Emotional Query intervention module with traditional ITS models based on
user interaction, in order to obtain the students’ reported emotions as they interact
with the tutor. This enables the User Model System (UMS) to compare the reported
emotions with the sensor readings at the time of the emotional queries.
Ultimately we plan to have a UMS that models the student’s interaction
with an ITS in real-time and enables the ITS to intelligently tailor its behavior
to a given student’s needs. By personalizing the student’s experience, the ITS
can keep the student engaged and maintain or increase the student’s interest
and confidence in the subject. [3] is an example of having a character as part of
the tutor giving non-verbal feedback; [4] is an example of a tutor that changes
its feedback based on the tutor’s emotional state in response to the student’s
emotion. For instance, a positive student emotional state elicits happiness in
the tutor, which in turn rewards the student. In order to create the desired
UMS, we have developed a platform comprised of three functional interacting
components. These are (1) a sensor system for processing and integrating the
sensor data described in Sec. 4, (2) a pedagogical engine for tutoring the student
and collecting tutor data described in Sec. 2, and (3) a User Model system
for integrating the sensor and tutor data to create a model of the student. We
conducted three experiments using this framework in order to determine which
sensor features have the best utility in terms of modeling students’ perceived
This paper describes our progress. Section 2 describes the Wayang Tutor
and the student features that are used for the model. Section 3 describes related
work. Section 4 describes the sensors that we use, their history, and the features
for input to the User Model. Section 5 describes the integration of the sensor and
tutor features. Section 6 describes the three studies performed to collect data
for the user models. Finally, Section 7 discusses how the model can be used and
ways to improve on the model we created.
2 The Tutor: Wayang Outpost
Our test-bed application for the experiments we describe in Sec. 6 was Wayang
Outpost, a multimedia Intelligent Tutoring System (ITS) for geometry [5]. The
tutoring software is adaptive in that it iterates through different topic sections
(e.g., the Pythagorean theorem). Within each topic section, Wayang adjusts the
difficulty of problems provided depending on past student performance. Students
are presented with a problem and asked to choose the solution from a list of
multiple choice options (typically four or five) as shown in Fig. 1.
Fig. 1. An example problem presented by the Wayang system. Jake is on the lower
right corner. The Hint Toolbar is on the right.
As students solve problems, they may ask the tutor for one or several mul-
timedia hints, consisting of text messages, audio and animations. The software
includes gendered learning companions that are actual “companions” only: they
don’t provide help; instead, they encourage students to use the help function;
they have the capability of expressing emotions; and they emphasize the im-
portance of effort and perseverance. Wayang has been used with thousands of
students in the past and has demonstrated improved learning gains in state
standard exams [5].
Wayang collects student interaction features in order to predict each student’s
level of effort on the problems presented. These features, described in Table 1,
are derived from the tutor data that is sent to the UMS. The majority of the
tutor features could be extracted from other tutor systems with similar structure
including a clear delineation of when attempts are made to answer the problem.
Some features of Wayang are more specific, such as the number of hints or
whether a particular gendered learning companion was used.
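As a minimal sketch of how per-problem tutor features of the kind listed in Table 1 might be derived from a time-ordered interaction log, consider the following. The event format, the function name `tutor_features`, and the dictionary keys are assumptions made for illustration and are not part of Wayang.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical log event: (seconds since the problem was shown, kind, correct).
# kind is "attempt" or "hint"; the correct flag is ignored for hints.
Event = Tuple[float, str, bool]


def tutor_features(events: List[Event]) -> Dict[str, Union[bool, int, float, None]]:
    """Derive per-problem features from events that are already sorted by time."""
    attempts = [(t, ok) for t, kind, ok in events if kind == "attempt"]
    # Time of the first correct attempt, or None if the problem was never solved.
    solved_at = next((t for t, ok in attempts if ok), None)
    return {
        "solved_on_first": bool(attempts) and attempts[0][1],
        "seconds_to_first_attempt": attempts[0][0] if attempts else None,
        "seconds_to_solved": solved_at,
        "num_incorrect": sum(1 for _, ok in attempts if not ok),
        "num_hints": sum(1 for _, kind, _ in events if kind == "hint"),
    }
```

Session-level features such as time in session would need timestamps across problems and are left out of this sketch.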
3 Related Work
There are a number of systems that already exist that either use similar sensors,
detect similar affective states, or incorporate both tutor data and sensor data in
order to model the student’s self reported emotion.
[6] uses a number of sensors to detect facial expressions, physiological
features (heart rate, temperature, and skin conductance), and speech signals. The
experiment uses 32 students simultaneously. Their application is to elicit
emotional responses by the presentation of images rather than from using a tutor
system. The emotions that they model are fear, anger, and frustration.
Table 1. The nine tutor features below are selected along with the sensor features
by using regression models to predict confidence, frustration, excitement, and interest.
This table lists each tutor feature with an abbreviation and a definition.
Feature                     Definition
Solved On First             Student’s first attempt was correct.
Seconds to First Attempt    Time in seconds to the first attempt.
Seconds to Solved           Time in seconds to a correct attempt.
TNumInc                     The number of incorrect responses.
Number of Hints             The number of hints the student selected.
Learning Companion (LC)     A value of 1 for LC and 0 for No LC.
-                           2 for Jake, 1 for Jane, 0 for Neither.
Time In Session             Time student has spent on interactive problems in the current session.
Time In Tutor (TtutT)       Time student has spent on problems since the first use of the Tutor.
[7,8] use a 3-D learning environment as their tutoring system. The systems
monitor heart-rate and skin conductance in addition to the student-tutor
interactions. [7] creates a model of frustration, while [8] creates a model of
self-efficacy, i.e. the student’s belief in producing a correct answer.
Other work such as [4] does not use sensors at all, but only uses self reports
to determine emotional state. They use three emotional ranges to model the stu-
dent: boredom vs. curiosity, distress vs. enthusiasm, and anxiety vs. confidence.
With the model of the student, they then create a model of their tutor to have
emotional states that guide the tutor’s responses. The focus of this system is the
repair rather than the detection of emotional states.
Much of the past research has focused on small populations of students or lab
studies, while our research uses large groups of students in real school settings.
This is relevant because much research has shown that students lose interest and
self-confidence in math over the course of the K-12 school system [9–11]. Bringing
sensors to our children’s schools addresses real problems of students’ relationship
to mathematics as they are learning the subject. This brings new tools to address
their frustration, anxiety and disinterest/boredom while learning.
4 The Sensors
The sensors used in this study are similar to sensors that have been used in
previous studies done by the Affective Computing Group (ACG) at MIT, but
we have invested considerable effort on decreasing the overall production cost
and improving the non-invasive nature of the sensors. Below we describe how
our sensors compare to earlier sensors as well as some of the past uses of such
sensors.
Skin Conductance Bracelet. The current system used in our research
employs the next generation of HandWave electronics [12], providing greater
reliability, lower power requirements through wireless RFID transmission, and
a smaller form. This smaller form was redesigned to minimize the visual im-
pact and increase the wearable aspects of previous versions. ASU integrated and
tested these electronic components into a wearable package suitable for students
in classrooms. Our version reports at 1Hz.
Pressure Sensitive Mouse. ACG developed the pressure sensitive mouse.
It uses six pressure sensors embedded in the surface of the mouse to detect the
tension in a user’s grip and has been used to infer elements of a user’s frustration
level [13]. Our endeavors replicated ACG’s pressure sensitive mouse through a
production of 30 units. The new design of the mouse minimized the changes
made to the physical appearances of the original mouse in order to maintain a
visually non-invasive sensor, while maintaining functionality.
Pressure Sensitive Chair. The chair sensor system was developed at ASU
using a series of six force sensitive resistors as pressure sensors dispersed strategi-
cally in the seat and back of a readily available seat cover cushion. It is a greatly
simplified version of the Tek-Scan Pressure system (costing around $10,000) used
in [14,15]. This posture chair sensor was developed at ASU at an approximate
cost of $500 per chair for a production volume of 30 chairs.
Mental State Camera. The studies in [14,15] utilized IBM Research’s
Blue-Eyes camera hardware. This is special purpose hardware for facial feature
detection. In our current research we are using a standard web-camera to obtain
30fps at 320x240 pixels. The camera is placed on the monitor of each student’s
computer. This is coupled with the MindReader library from [16] using a Java
Native Interface (JNI) wrapper developed at UMass. The interface starts a ver-
sion of the MindReader software, and can be queried at any time to acquire
the most recent mental state values that have been computed by the library.
In the version used in the experiments, only the six mental state features were
available, but in future versions we will have the Facial Action Units available
as well. These six mental features have a 65% to 89% accuracy with 5 out of the
six features reported at above 76% accuracy.
4.2 Sensor Features
In order to create effective user models, we want to select the best feature set for
our classification of the user’s emotional self concept. Given that we don’t have
a huge number of examples, it is important to use as few features as possible
while still receiving the value from each sensor. Thus the data from each sensor
has been aggregated in the case of the Mouse and the Chair, and processed
into five mental states, in the case of the Camera. We are using the raw Skin
Conductance values for the Bracelet. The sensor features that are used for the
studies are summarized in Table 2. These are used in conjunction with tutor
features described in Sec. 2.
Table 2. The ten sensor features below are summarized by their mean, standard
deviation, min and max values, and these 40 summarized features are then selected
by using regression models to predict confidence, frustration, excitement, and interest.
This table defines the abbreviations for each feature.
Feature                      Abbreviations
Bracelet Skin Conductance    BmeanC
Net Seat Change              SmeanS
Net Back Change              SmeanB
Mouse Feature                MmeanP, MdevP, MminP, MmaxP
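The summarization step can be sketched as follows: each sensor feature is reduced to four statistics whose names follow the pattern of abbreviations such as MmeanP, MdevP, MminP and MmaxP. The function name `summarize`, the choice of the population standard deviation, and the set of readings being summarized are assumptions of this sketch; the text does not say which form of the standard deviation was used.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def summarize(prefix: str, suffix: str, values: Sequence[float]) -> Dict[str, float]:
    """Return mean/dev/min/max features named like MmeanP, MdevP, MminP, MmaxP."""
    if not values:
        raise ValueError("need at least one sensor reading to summarize")
    return {
        f"{prefix}mean{suffix}": mean(values),
        # Population standard deviation is an assumption of this sketch.
        f"{prefix}dev{suffix}": pstdev(values),
        f"{prefix}min{suffix}": min(values),
        f"{prefix}max{suffix}": max(values),
    }
```

Applying this to each of the ten sensor features yields the 40 summarized features mentioned in the caption of Table 2.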
The classifiers in [14] used a similar sensor set in order to predict whether
a user would click a button indicating frustration. They used the mean values
computed over the previous 150 second window from when clicking the frustrated
button. Fourteen sensor features were used to make four classifier systems using
data from 24 students. Each system performed better than a classifier always
picking no frustration, but no classifier was more than 80% accurate.
In addition to predicting frustration, our model is meant to predict excite-
ment, interest, and confidence. The sensor features considered in our analysis
are described below.
Mouse Feature. From the six pressure values from the mouse, each having the
range [0,1023], we compute the following feature:
\[
\frac{\text{leftMouseFront} + \text{leftMouseRear} + \text{middleMouseFront} + \text{middleMouseRear} + \text{rightMouseFront} + \text{rightMouseRear}}{1023}
\]
which gives a potential range of [0,6], but empirically has the range of [0,2.5]
in the High School (HS) study, and [0,1] in the two other studies.
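A minimal sketch of the mouse feature, assuming that the six raw values are summed and divided by the sensor maximum of 1023, which is what the stated potential range of [0,6] implies. The function and constant names are illustrative only.

```python
from typing import Sequence

SENSOR_MAX = 1023  # each raw pressure value lies in [0, 1023]


def mouse_pressure(raw: Sequence[int]) -> float:
    """Sum the six raw pressure values and rescale so the result lies in [0, 6]."""
    if len(raw) != 6:
        raise ValueError("expected six pressure values")
    # Dividing by 1023 is an assumption inferred from the stated range [0, 6].
    return sum(raw) / SENSOR_MAX
```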
Chair Features. We compute three features from the 6 chair sensors. The first
two are based on the most useful features from . These are the net change
in pressure of the seat, and the net change in pressure of the back:
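\[
\text{netSeatChange}[t] = (\text{leftSeat}[t-1] - \text{leftSeat}[t]) + (\text{middleSeat}[t-1] - \text{middleSeat}[t]) + (\text{rightSeat}[t-1] - \text{rightSeat}[t])
\]
\[
\text{netBackChange} = (\text{lastLeftBack} - \text{leftBack}) + (\text{lastMiddleBack} - \text{middleBack}) + (\text{lastRightBack} - \text{rightBack})
\]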
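The net change features can be sketched as a sum of differences between the previous and the current reading of the left, middle and right sensors. This sketch uses signed differences, as the terms appear in the text; whether the original formula took the absolute value of each difference is not certain, and the function name is illustrative.

```python
from typing import Sequence


def net_change(previous: Sequence[float], current: Sequence[float]) -> float:
    """Sum of (previous - current) over the left, middle and right sensors."""
    if len(previous) != len(current):
        raise ValueError("previous and current readings must have the same length")
    # Signed differences are an assumption; use abs(p - c) for an absolute variant.
    return sum(p - c for p, c in zip(previous, current))
```

The same function applies to the three seat sensors and to the three back sensors.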
The third chair feature is meant to determine if the student is sitting forward.
From the three pressure values from the back of the chair, each having the range
[0,1023], we compute the Sit Forward feature as follows:
0 if leftBack
>= middleBack > −1 and
leftBack> −1 and
where NA is treated as no data.
Bracelet Feature. We obtain two values from the wrist sensor: the battery
voltage, which informs us when the battery charge is low, and the skin
conductance in microsiemens. Since there was no need to reduce
the number of features, we processed basic statistics on the raw sensor values.
In the future we plan to examine more sophisticated use of the skin conductance
data such as the methods described in .
Mental State Camera Features. Of the six mental state features that the
MindReader software identifies, we left out the disagree state, since agree and
disagree are opposites. The five features we are left with are agreeing, concen-
trating, interested, thinking, and unsure. These mental states have a range from
[0,1] as they are confidence values.
In our framework, each feature source from each student is a separate stream of
data. Hence we have five streams of data that each report asynchronously and
at different rates. In order to merge all of the data sources, the wrist ID of
each student and the time of the report were needed from each source. An example
of one client connected to our User Model Framework is shown in Fig. 2.
In our experiments, we used the logs rather than the sensor streams, since
the streams are not yet informing a user model. In addition, the tutor does not
yet create a stream of tutor data. Instead we used a database query to obtain
the relevant tutor information, and fed it to the User Model System with the
four sensor sources in order to time align the data and merge it with the correct
student. The result is a database table with a row for every time stamp and
wrist ID pair, and a column for each reported sensor value and tutor data value.
Each cell in a row represents the latest report of the data source. If the data
source has never reported or has not reported since the last tutor login or logout
event with a corresponding wrist ID, then the value is -1 until the data source
reports again. In this way the wrist IDs can be used by more than one student
at separate time intervals, and the system will continue to work.
Fig. 2. A student at the client computer puts on a bracelet and starts the two client
programs indicating the wrist ID of the bracelet. The bracelet sends Skin Conductance
data to the Wrist Node, which logs bracelet data from all of the students in the classroom.
The User Model System (UMS) receives the bracelet data through the Wrist Stream.
The UMS client performs the same task as the Wrist Node for each of the other three
sensor sources. The ITS logs student interactions, and sends Tutor Data to the UMS.
The data is time synced based on the client’s system time. The UMS uses all available
streams of data to make user predictions to improve the ITS Client interaction.
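The merging rule for the time-aligned table (each cell holds the latest report of a source, and -1 stands for a source that has not reported since the last login or logout) can be sketched as follows. The report tuple format, the source names and the function `merge_reports` are assumptions for illustration and do not describe the actual UMS implementation.

```python
from typing import Dict, Iterable, List, Tuple

MISSING = -1  # placeholder until a source has reported

# Hypothetical report: (timestamp, wrist_id, source, value). The source is a
# data source name, or "login"/"logout" for tutor session events.
Report = Tuple[float, int, str, float]


def merge_reports(reports: Iterable[Report], sources: List[str]) -> List[Dict[str, float]]:
    """Build one row per report holding the latest value of every source."""
    latest: Dict[int, Dict[str, float]] = {}
    rows: List[Dict[str, float]] = []
    for ts, wrist_id, source, value in sorted(reports, key=lambda r: r[0]):
        state = latest.setdefault(wrist_id, {s: MISSING for s in sources})
        if source in ("login", "logout"):
            # Another student may now use this bracelet: forget old values.
            for s in sources:
                state[s] = MISSING
        else:
            state[source] = value
        rows.append({"time": ts, "wrist_id": wrist_id, **state})
    return rows
```

Resetting on login and logout is what allows one wrist ID to be reused by different students at separate time intervals.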
We conducted three studies during Fall 2008 using our sensor system with
Wayang Outpost. The HS study involved 35 students in a public high school
in Massachusetts; the UMASS study involved 29 students at the University of
Massachusetts; the AZ study involved 29 undergraduate students from Arizona
State University. In the HS and UMASS studies, students used the software as
part of their regular math class for 4-5 days, as it covered topics in the class.
The AZ study was a lab study, where students would come to a lab in the uni-
versity and use the software for one single session. Wayang worked the same
way for all students, as introduced in Sec. 2, except for the fact that a student
could be randomly assigned the female learning companion (Jane), the male
learning companion (Jake) or no learning companion. In order to gather infor-
mation on students’ emotions, Wayang prompted students to report how they
were feeling (e.g., “how [interested/excited/confident/frustrated] do you feel right
now?”). Students answered this prompt by choosing one item from a five-point
scale, where a three corresponded to a neutral value and the ends were labeled
with extreme values (e.g., “I feel anxious / very confident”). The queried emotion
was randomly chosen, obtaining a report per student per emotion for most
subjects. Wayang queried students on their emotions every five minutes, but did
not interrupt students as they were solving a problem. During each student’s
interaction with Wayang, the four sensors described in Sec. 4 gathered data on
his or her physiological responses.
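The query policy described above (one randomly chosen emotion roughly every five minutes, never while a problem is being solved) can be sketched as a small decision function. The function name, its arguments, and the use of seconds as the time unit are assumptions of this sketch rather than details of Wayang.

```python
import random
from typing import Optional

EMOTIONS = ["interested", "excited", "confident", "frustrated"]
QUERY_INTERVAL_S = 5 * 60  # five minutes between emotional queries


def next_query(now: float, last_query: float, solving_problem: bool,
               rng: random.Random) -> Optional[str]:
    """Return the emotion to ask about, or None if no query is due yet."""
    if solving_problem or now - last_query < QUERY_INTERVAL_S:
        return None  # never interrupt a problem; wait for the interval to pass
    return rng.choice(EMOTIONS)
```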
The three experiments yielded the results of 588 Emotional Queries from 80 stu-
dents that include valid data from at least one sensor. The queries were separated
into the four emotion variables as follows: 149 were about confidence/anxiety,
163 were about excitement/depression, 135 were about interest/boredom, and
141 were about frustration/no frustration. 16 of the student responses gave no
answer to the Emotional Query. These results were used as examples for the
Regression and the training and testing of the classification models.
In order to select a subset of the available features, a Stepwise Linear Re-
gression was done with each of the emotions as the dependent variable, and
tutor and sensor features as the independent variables. Since some students had
missing sensor data, separate models were run pairing the Tutor Features with
Sensor Features from one sensor at a time, and then finally with all of the Sensor
Features. Results from the regression in Table 3 show that the best models for
confidence, frustration, and excitement came from the subset of examples where
all of the sensor data was available, and the best model for interest came from
the subset of examples with mouse data available.
Table 3. Each cell corresponds to a linear model to predict emotion self-reports. Models
were generated using Stepwise Regression, and variables entered into the model are
shown in Table 4. The top row lists the feature sets that are available. The left column
lists the emotional self-reports being predicted. R values correspond to the fit of the
model (best fit models for each emotion are in bold). N values vary because some
students are missing data for a sensor.
                                              Wrist +    Mouse +    All Sensors
Confident    R = 0.44   R = 0.61   R = 0.48   R = 0.40   R = 0.48   R = 0.63
             N = 143    N = 77     N = 115    N = 106    N = 107    N = 68
Frustrated   R = 0.55   R = 0.60   R = 0.61   R = 0.55   R = 0.59   R = 0.62
             N = 138    N = 78     N = 105    N = 109    N = 102    N = 67
Excited      R = 0.39   R = 0.40   R = 0.45   R = 0.39   R = 0.45   R = 0.56
             N = 154    N = 74     N = 122    N = 106    N = 119    N = 64
Interested   R = 0.42   R = 0.56   R = 0.53   R = 0.36   R = 0.67   R = 0.66
             N = 133    N = 75     N = 107    N = 101    N = 102    N = 62
Table 4 shows the features selected for each of the linear models. Looking at
the best fitting models, highlighted in bold, it is interesting to see that at most
two of the sensor sources and at most five of the available features are significant.
Table 4. This table lists the variables that the Stepwise Regression method selected as
relevant, for each of the regression models in Table 3. Each of these features significantly
contribute to the prediction of emotion self-reports (p < 0.01), and are listed in order
of relevance (The feature at the top is the best predictor.) The abbreviations of these
features are defined in Tables 1 and 2.
+ Tutorcontext only + Tutor
ConfidentTNumInc TNumInc TNumInc TNumInc TNumInc
TNumInc TNumInc TGroup
CmeanITGroup TNumInc TNumInc
6.2 Cross Validation of the Linear Models
In order for the User Model system to give feedback to the ITS, the available
sensor and tutor features can be put into a classifier and report when a user is
likely to report a high value of a particular emotion. This likelihood could reduce
and possibly eliminate the need for querying the user of their affective state. To
test the efficacy of this idea, we made a classifier based on each linear model in
the table. Rather than using the scale of one to five, the dependent variable of
the classifier was 1 if the emotion level was high and -1 if the emotion level was
not. Hence we used a classification threshold of 0 on the prediction.
For each model we performed leave-one-student-out cross validation. We
recorded the number of True Positives, False Negatives, True Negatives, and
False Positives at each test. Table 5 shows that the best classifier of each emo-
tion in terms of Accuracy ranges from 78% to 87.5%. The best classification
results are obtained by only training on examples that are not in the middle.
This is likely the case because the middle values indicate indifference.
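The evaluation procedure can be sketched as follows: labels are +1 for a high emotion report and -1 otherwise, a linear model is fit on all students but one, and the held-out student's examples are classified by thresholding the prediction at 0. For brevity this sketch fits a least-squares line on a single feature rather than on the features selected by Stepwise Regression; the data format and function names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical example: (student id, feature value, label in {+1, -1}).
Example = Tuple[str, float, int]


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def leave_one_student_out(examples: List[Example]) -> Dict[str, int]:
    """Hold out each student in turn and count tp/fp/tn/fn on their examples."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for held_out in sorted({s for s, _, _ in examples}):
        train = [(x, y) for s, x, y in examples if s != held_out]
        test = [(x, y) for s, x, y in examples if s == held_out]
        slope, intercept = fit_line([x for x, _ in train], [float(y) for _, y in train])
        for x, y in test:
            predicted = 1 if slope * x + intercept > 0 else -1  # threshold at 0
            key = ("t" if predicted == y else "f") + ("p" if predicted == 1 else "n")
            counts[key] += 1
    return counts
```

Holding out whole students, rather than individual queries, keeps examples from the same student out of both the training and the test set.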
Table 5. This shows results of the best classifier of each emotional response. Accuracy
of no classifier is a prediction that the emotional state is always not high. Values in
parentheses include the middle values in the testing set as negative examples.
Classifier         True Pos.   False Pos.   True Neg.   False Neg.   Accuracy (%)   Accuracy (%)
-                  28(28)      5(24)        10(16)      1(1)         86.36(63.77)   -
-                  3(3)        0(0)         46(58)      7(7)         -              -
-                  25(25)      9(37)        25(40)      5(5)         -              -
Interested Mouse   24(25)      4(19)        28(53)      7(7)         82.54(74.76)   -
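The accuracy figures in Table 5 follow from the four counts in each row, read in the order True Pos., False Pos., True Neg., False Neg. A minimal sketch of the computation is given below; the function name is illustrative.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Percentage of test examples classified correctly."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no test examples")
    return 100.0 * (tp + tn) / total
```

For the first row, accuracy(28, 5, 10, 1) gives 86.36 after rounding to two decimals, and the counts in parentheses, accuracy(28, 24, 16, 1), give 63.77. The row with counts 3, 0, 46 and 7 gives 87.5, which matches the upper end of the accuracy range reported in the text.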
We have presented a User Model framework to predict emotional self concept.
The framework is the first of its kind – including models based on sensor data
integrated with an ITS used in classrooms of up to 25 students. By using Stepwise
Regression we have isolated key features for predicting user emotional responses
to four categories of emotion. These results are supported by cross validation, and
show improvement using a very basic classifier. The models from these classifiers
can be used in future studies to predict a student’s self-concept of emotional state
on four ranges of emotion. These ranges are interest, frustration, confidence and
excitement.
There are a number of places for improvement in our system. The first is that
we used summary information of all of the sensor values. We may find better
results by considering the time series of each of these sensors. In addition, the
MindReader library can be trained for new mental states. This is one avenue of
future work. Another place for improvement is to look at individual differences
in the sensors. Creating a baseline for emotional detection before using the tutor
system could help us to better interpret the sensor features.
Now that we have a basic User Model of students, the next step is to use this
Model in the next experiments to send recommendations to the ITS. In order
for this to be useful, the ITS needs to have some repair mechanisms based on
the predictions from the User Model. Examples of this include encouragement,
suggesting to the student to ask for a hint, and mirroring the emotion of the
student.
Acknowledgments. We acknowledge contributions to the system development
from Rana el Kaliouby, Ashish Kapoor, Selene Mota and Carson Reynolds. We
also thank Joshua Richman, Roopesh Konda, and Assegid Kidane at ASU for
their work on sensor manufacturing. This research was funded by awards from
the National Science Foundation, 0705554, IIS/HCC Affective Learning Compan-
ions: Modeling and Supporting Emotion During Teaching, Woolf and Burleson
(PIs) with Arroyo, Barto, and Fisher and the U.S. Department of Education
to Woolf, B. P. (PI) with Arroyo, Maloy and the Center for Applied Special
Technology (CAST), Teaching Every Student: Using Intelligent Tutoring and
Universal Design To Customize The Mathematics Curriculum. Any opinions,
findings, conclusions or recommendations expressed in this material are those of
the authors and do not necessarily reflect the views of the funding agencies.
References
1. Koedinger, K.R., Anderson, J.R., Hadley, W.H., Mark, M.A.: Intelligent tutoring
goes to school in the big city. International Journal of Artificial Intelligence in
Education 8(1) (1997) 30–43
2. Shute, V.J., Psotka, J.: Intelligent tutoring systems past, present and future. In
Jonassen, D., ed.: Handbook of Research on Educational Communications and
Technology. Scholastic Publications (1996)
3. Bailenson, J.N., Yee, N.: Digital chameleons. Psychological Science 16(10) (2005)
4. Florea, A., Kalisz, E.: Embedding emotions in an artificial tutor. In: SYNASC
2005. (Sept. 2005)
5. Arroyo, I., Beal, C., Murray, T., Walles, R., Woolf, B.P.: Web-based intelligent
multimedia tutoring for high stakes achievement tests. In Lester, J.C., Vicari,
R.M., Paraguacu, F., eds.: Intelligent Tutoring Systems. Springer, Maceio, Alagoas,
Brazil (2004) 468–477
6. Zhou, J., Wang, X.: Multimodal affective user interface using wireless devices for
emotion identification. (2005) 7155–7157
7. McQuiggan, S., Lee, S., Lester, J.: Early prediction of student frustration. Affective
Computing and Intelligent Interaction (2007) 698–709
8. McQuiggan, S., Mott, B., Lester, J.: Modeling self-efficacy in intelligent tutoring
systems: An inductive approach. User Modeling and User-Adapted Interaction
18(1) (2008) 81–123
9. Royer, J.M., Walles, R.: Influences of gender, motivation and socioeconomic status
on mathematics performance. In Berch, D.B., Mazzocco, M.M.M., eds.: Why is
Math so Hard for Some Children. Paul H. Brookes Publishing Co., Baltimore, MD
10. Catsambis, S.: The path to math: Gender and racial-ethnic differences in math-
ematics participation from middle school to high school. Sociology of Education
67(3) (1994) 199–215
11. Tobias, S.: Overcoming Math Anxiety, Revised and Expanded. W.W. Norton &
Company, New York (1995)
12. Strauss, M., Reynolds, C., Hughes, S., Park, K., McDarby, G., Picard, R.: The
handwave bluetooth skin conductance sensor. Affective Computing and Intelligent
Interaction (2005) 699–706
13. Qi, Y., Picard, R.: Context-sensitive bayesian classifiers and application to mouse
pressure pattern classification. Pattern Recognition, 2002. Proceedings. 16th In-
ternational Conference on 3 (2002) 448–451 vol.3
14. Kapoor, A., Burleson, W., Picard, R.W.: Automatic prediction of frustration.
International Journal of Human-Computer Studies 65(8) (August 2007) 724–736
15. Burleson, W., Picard, R.W.: Gender-specific approaches to developing emotionally
intelligent learning companions. IEEE Intelligent Systems 22(4) (2007) 62–69
16. el Kaliouby, R.: Mind-reading Machines: the automated inference of complex men-
tal states from video. PhD thesis, University of Cambridge (2005)
17. D’Mello, S., Picard, R.W., Graesser, A.: Toward an affect-sensitive autotutor.
IEEE Intelligent Systems 22(4) (2007) 53–61